# Basic Python for Linguists in 15 Minutes

[Jackson Lee](https://jacksonllee.com/)

April 2021

Source of this Jupyter notebook: https://github.com/jacksonllee/pycantonese/blob/main/docs/tutorials/lee-python-2021-april.ipynb

The easiest way to immediately play with this notebook is to log on to your Google account (Gmail, etc.) and [open this notebook in Google Colab](https://colab.research.google.com/github/jacksonllee/pycantonese/blob/main/docs/tutorials/lee-python-2021-april.ipynb). You'll have your own copy to run cells from, make changes, and save in your Google Drive.

## Introduction

Perhaps you're attending a class or tutorial that says "prior experience in Python programming is helpful but not required" or similar. You may not have a background in computer programming or the Python language, but would like to attend and get something out of it nonetheless. If you have 15 minutes to spare before attending, the following topics are more or less the bare minimum that you'd want to understand, at least by reading through the code snippets in this Jupyter notebook (that's the name of the thing you're reading right now; Jupyter is pronounced like the planet in our solar system):

* *Strings*, probably the most important type of Python object you need to know when handling language data
* *Lists*, to handle a bunch of things
* *Loops*, to do something repeatedly
* *Dicts*, to map something to something else
* *Conditionals*, to do X instead of Y by some condition
* *Outputting A File*, to keep results in a text file (plain text, CSV, etc.)

If you're a linguist, picking up a language should be a piece of cake. 🙂

## Strings

To work with language data in Python, understand how **strings** work.

In [1]:
x = "this is a string"

What's just happened? We've assigned the string `"this is a string"` to the variable `x`.

In [2]:
type(x)

str

How many characters are there in the string? The **`len()`** (for "length") function is defined for strings (and many other data types for which the idea of "number of things in it" makes sense).

In [3]:
len(x)

16

Access the first character. Python counts from zero, like many programming languages.

In [4]:
x[0]

't'

You can **slice** a string. This example with `[5:10]` means accessing the characters from index 5 to (but excluding) index 10.
You get back a string whose length is 5, for indices {5, 6, 7, 8, 9}.
(Yes, the space is a character and has a length of one.)

In [5]:
x[5:10]

'is a '
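Here's a small sketch with a few more slices of the same string (redefined here so the snippet runs on its own; the variable names `first_word`, `last_word`, and `last_char` are just for illustration). Leaving out the start or the end of a slice means "from the beginning" or "to the end", and negative indices count from the right.

```python
x = "this is a string"

first_word = x[:4]   # start omitted: from index 0 up to (but excluding) 4
last_word = x[-6:]   # negative start, end omitted: the last six characters
last_char = x[-1]    # a single negative index: the final character

print(first_word)  # this
print(last_word)   # string
print(last_char)   # g
```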

## Lists

We can handle a single string like `x` above. What about multiple strings? Sure, we can define more variables:

In [6]:
y = "this is another string"
z = "this is yet another string"

Okay, you see where this is going. What if we want to handle many more, or an unspecified number of them? Defining variables one by one isn't sustainable. We need a way to deal with _a bunch of things_. That's where containers in Python come in. We're going to see how **lists** in Python work below. (There are many other options, but hey, you've got only 15 minutes!)

Note how square brackets `[ ]` are used to define a list.

In [7]:
words = ["cats", "dogs"]

In [8]:
type(words)

list

Just like strings, you can ask for the length of a list.

In [9]:
len(words)

2

Remember the string `x` from above? We can break it up by the spaces to create a list of words.

In [10]:
x

'this is a string'

`.split()` applies to a string.

In [11]:
result = x.split()

In [12]:
result

['this', 'is', 'a', 'string']

In [13]:
type(result)

list

In [14]:
len(result)

4
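As a quick aside, here's a sketch of two related string methods. `.split()` can also take a separator of your choice, and `.join()` goes in the opposite direction. The comma-separated string and the variable names below are made up for illustration.

```python
x = "this is a string"

# With no argument, .split() breaks the string on any run of whitespace.
words_in_x = x.split()

# With an argument, .split() breaks on that exact separator instead.
fields = "cat,dog,pig".split(",")

# .join() goes the other way: it glues a list of strings back together.
rejoined = " ".join(words_in_x)

print(words_in_x)  # ['this', 'is', 'a', 'string']
print(fields)      # ['cat', 'dog', 'pig']
print(rejoined)    # this is a string
```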

Things are getting more interesting now. We've just begun combining knowledge of strings and lists!

A list behaves just like a string when it comes to the indexing and slicing syntax (`[0]`, `[5:10]`, etc.) that we've seen above.

In [15]:
result[0]

'this'

In [16]:
result[-1]  # [-1] gives the final element.

'string'

Slicing a list gives you a list, just as slicing a string gives you a string.

In [17]:
result[:2]  # If the starting index isn't specified, it's 0, i.e., from the beginning.

['this', 'is']

## `for` loops

You've probably heard that computers are very good at repeating a task over and over. Let's see how Python **loops** work. We'll have time for **`for`** loops only.

In [18]:
result

['this', 'is', 'a', 'string']

In [19]:
for word in result:
    print(word)
    print(word[0])

this
t
is
i
a
a
string
s


Okay, let's unpack what you've just seen.

* `result` is a list of four strings.
* We iterate over each of them and give it a variable name `word` -- that's the `for word in result:` part.
* With every `word`, we do the following: just print it, and then just print the first character of it.

This is a toy example, but the power comes in when you have a lot of elements to iterate over, and when you can control what the computer should do in specific iterations.
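To go one small step beyond printing, here's a sketch of a loop that also keeps track of where it is and collects something along the way. It assumes the same four-word list as above (redefined so the snippet runs on its own); `enumerate()` is a Python built-in that hands you each element together with its index, and `.append()` adds an element to the end of a list. The names `position` and `lengths` are just for illustration.

```python
result = ["this", "is", "a", "string"]

lengths = []  # an empty list to collect the word lengths
for position, word in enumerate(result):
    # `position` counts from 0, matching Python's indexing.
    print(position, word, len(word))
    lengths.append(len(word))

print(lengths)  # [4, 2, 1, 6]
```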

## Dicts

A dict (for "dictionary") is a map from something to something else.

You can map a string to another string:

In [20]:
# What does each animal say?
animals = {
    "dog": "woof",
    "cat": "meow",
    "pig": "oink",
}

In [21]:
type(animals)

dict

In [22]:
animals["dog"]

'woof'

In [23]:
animals["pig"]

'oink'
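A few more things you'll likely want to do with a dict, as a minimal sketch (the dict is redefined so the snippet runs on its own, and the cow and the fox are my additions for illustration): add a new pair, check whether a key exists, and look something up without risking an error.

```python
animals = {"dog": "woof", "cat": "meow", "pig": "oink"}

# Add a new key-value pair (or overwrite an existing one) by assignment.
animals["cow"] = "moo"

# `in` checks whether a key is present.
has_cow = "cow" in animals
has_fox = "fox" in animals

# animals["fox"] would raise a KeyError; .get() returns a fallback value instead.
fox_sound = animals.get("fox", "unknown")

print(len(animals))      # 4
print(has_cow, has_fox)  # True False
print(fox_sound)         # unknown
```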

You can also map a string to a number. Whenever you're counting things (e.g., word frequencies), it's very natural to model a counter as a dict from the things you care about to their counts. In fact, counting is such a common use case that Python's standard library ships with a ready-made `Counter` class (in the `collections` module).

Let's use the string variables `x`, `y`, and `z` from above and tally up the words.

In [24]:
x

'this is a string'

In [25]:
y

'this is another string'

In [26]:
z

'this is yet another string'

In [27]:
from collections import Counter

word_counter = Counter()  # Initialize an empty counter

# `[x, y, z]` is a list (note the square brackets) of the three strings.
# `s` represents each of these strings in the `for` loop.
for s in [x, y, z]:

    words = s.split()

    # For each word, add it to the counter.
    # If the word doesn't already exist in the counter, its count is now 1.
    # If the word already exists in the counter, its count increments by 1.
    for word in words:
        word_counter[word] += 1

In [28]:
word_counter

Counter({'this': 3, 'is': 3, 'a': 1, 'string': 3, 'another': 2, 'yet': 1})

We can see that "this" appears three times, "another" two times, etc.
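To see that there's no magic involved, here's a rough sketch of the same tally done with a plain dict instead of a `Counter`. This isn't how `Counter` is implemented internally, just an illustration of the idea; the names `sentences` and `counts` are made up for this snippet.

```python
sentences = [
    "this is a string",
    "this is another string",
    "this is yet another string",
]

counts = {}  # a plain dict, not a Counter
for s in sentences:
    for word in s.split():
        # .get(word, 0) returns the current count, or 0 if the word is new.
        counts[word] = counts.get(word, 0) + 1

print(counts)
# {'this': 3, 'is': 3, 'a': 1, 'string': 3, 'another': 2, 'yet': 1}
```

The `Counter` version saves you the `.get(word, 0)` bookkeeping, because a missing word is treated as having a count of zero.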

A dict has the `items` method for iterating through the key-value pairs (i.e., the pairs of things in a dict).

In [29]:
for word, count in word_counter.items():
    print(word, count)

this 3
is 3
a 1
string 3
another 2
yet 1


Another advantage of using a `Counter` object is that it has a couple of pretty nifty methods, e.g., `most_common`.

In [30]:
word_counter.most_common(4)  # top 4 most common items and their counts

[('this', 3), ('is', 3), ('string', 3), ('another', 2)]

## Conditionals

No, we aren't talking about counterfactuals, etc. Conditionals in programming languages are just about the plain, indicative kind.

In [31]:
if 3 > 2:
    print("hi")

hi


In [32]:
if 3 < 2:
    print("hi")

If condition C is true, do X. If condition C is not true, don't do X. Got it? (For the semanticists / pragmaticists / logicians out there, I know what you have in mind...)

In [33]:
if 3 < 2:
    print("hi")
else:
    print("bye")

bye


This is an if-else code block. If some condition is true, do X, or else do Y instead.
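If there are more than two cases to tell apart, Python also has `elif` (short for "else if"). Here's a minimal sketch; the word and the length thresholds are arbitrary choices for illustration.

```python
word = "linguistics"

if len(word) > 8:
    label = "long"
elif len(word) > 4:
    label = "medium"
else:
    label = "short"

print(label)  # long
```

The conditions are checked from top to bottom, and only the first one that is true gets to run its block.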

This is the point where, in a language class, you'd be handed an example that combines some of what you've been explicitly introduced to, possibly with new stuff as well, and you'd be left squinting and trying to make sense of it. Behold!

In [34]:
text = "Among the languages that are spoken today, only few are even tolerably well known to science. Of many we have inadequate information, of others none at all."
# The opening two sentences of the chapter "The Languages of the World" in Bloomfield's (1933) _Language_

start_with_a = []
not_start_with_a = []

for word in text.strip().lower().split():
    if word.startswith("a"):
        start_with_a.append(word)
    else:
        not_start_with_a.append(word)

In [35]:
start_with_a

['among', 'are', 'are', 'at', 'all.']

In [36]:
not_start_with_a

['the',
 'languages',
 'that',
 'spoken',
 'today,',
 'only',
 'few',
 'even',
 'tolerably',
 'well',
 'known',
 'to',
 'science.',
 'of',
 'many',
 'we',
 'have',
 'inadequate',
 'information,',
 'of',
 'others',
 'none']
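You may have noticed that punctuation is still glued to some of the words, e.g., `'all.'` and `'today,'`. One simple way to deal with that, sketched below under the assumption that stripping punctuation from both ends of each token is good enough for your data, uses `string.punctuation` from the standard library. The names `sentence` and `cleaned` are just for illustration.

```python
import string

sentence = "Of many we have inadequate information, of others none at all."

cleaned = []
for token in sentence.lower().split():
    # Remove punctuation characters from both ends of the token only.
    cleaned.append(token.strip(string.punctuation))

print(cleaned)
# ['of', 'many', 'we', 'have', 'inadequate', 'information', 'of', 'others', 'none', 'at', 'all']
```

Note that `string.punctuation` covers ASCII punctuation only, so real-world text may need more careful tokenization.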

## Outputting A File

We're going to wrap up this short tutorial by briefly getting into how to write a text file to the local disk. I'm re-using the word counter from earlier.

In [37]:
word_counter

Counter({'this': 3, 'is': 3, 'a': 1, 'string': 3, 'another': 2, 'yet': 1})

The `with open("results.csv", "w") as f:` part below creates a file at the specified file path, opens it for writing (that's what the `"w"` means; an existing file of the same name gets overwritten), and represents the file by the variable `f` so that we can specify what to write to the file and when.

In [38]:
with open("results.csv", "w") as f:

    for word, count in word_counter.items():
        f.write(f"{word},{count}\n")
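In case the `f"{word},{count}\n"` bit in the cell above looks cryptic: it's an f-string, which builds a string by filling in the values of the variables inside the curly braces. A minimal sketch with made-up values:

```python
word = "this"
count = 3

# An f-string fills each {...} with the value of the expression inside it.
line = f"{word},{count}\n"  # "\n" is the newline character that ends the line

print(repr(line))  # 'this,3\n'
```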

#### If you are running this notebook locally on your computer...

You should have created a text file called `results.csv` in the same directory as this notebook. The text file looks like this:

```
this,3
is,3
a,1
string,3
another,2
yet,1
```

You may open this CSV file with a spreadsheet program.
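You can also read the file back into Python. Here's a sketch using the `csv` module from the standard library. So that the snippet runs on its own, it first writes a small three-line file to a temporary folder; with your own file, you'd just set `path = "results.csv"` and skip that part. The names `path` and `counts_from_file` are made up for illustration.

```python
import csv
import os
import tempfile

# Recreate a small results.csv in a temporary folder so this sketch runs on its own.
folder = tempfile.mkdtemp()
path = os.path.join(folder, "results.csv")
with open(path, "w", encoding="utf-8") as f:
    f.write("this,3\nis,3\na,1\n")

counts_from_file = {}
with open(path, encoding="utf-8", newline="") as f:
    for word, count in csv.reader(f):
        # Everything read from a file is a string, so convert the count to int.
        counts_from_file[word] = int(count)

print(counts_from_file)  # {'this': 3, 'is': 3, 'a': 1}
```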

#### If you are running this notebook on Google Colab...

You can run the following cell to retrieve `results.csv` to your computer.

In [None]:
from google.colab import files

files.download("results.csv")

I hope you've found this tutorial helpful!