# Python recap (Units 100-110)

Python is a very high-level programming language, which is why some call it a *scripting language* in contrast to "proper", heavy-weight programming languages like Java or C.
Python is not as fast as these languages, but it is just as powerful while being much easier to write code in.
When **fast coding** is more important than **fast code**, Python is a good choice.
That's generally the case for data analysis, and increasingly so in computational linguistics, too.
This makes Python the ideal language for this course.

The rest of this notebook gives a quick summary of some Python basics (corresponding to units 100-105 of LIN 120).
Whereas the LIN 120 notebooks have detailed explanations in plain English, this summary is much more concise.
It is assumed that you have enough of a programming background to quickly pick up the core ideas from a few code snippets.

## `print`, `input`, and variables

In [None]:
print("Use print to show messages to the user")

In [None]:
print('Strings can also occur between single quotes')

In [None]:
print('But that\'s not a good idea for English because apostrophes have to be escaped')

In [None]:
print("That's much better!")
print("We'll always use double quotes for strings in this course.")

In [None]:
print("Let's get some input from the user")
# ask user for input and store it in variable user_input
# (note how comments start with #)
user_input = input()

print("Here's what you said:", user_input)

In [None]:
print("Enter a number!")
n = input()
print("I believe", n, "is the number you entered.")

Notice how Python automatically inserts a space between the arguments of `print`.

The `print` command above is a little clunky.
We can do better with **f-strings** (short for *formatted string literals*).
Other programming languages call this *string interpolation*.

In [None]:
print("Enter a number!")
n = input()
print(f"I believe {n} is the number you entered.")
print(f"Without curly braces we only get n, not {n}.")

While f-strings are powerful, `print` with multiple arguments still has its uses.

In [None]:
# a creative way of printing banana;
# `sep` is inserted between all arguments of print
print("ba", "a", "a", sep="n")

In [None]:
# an even more creative way of printing banana;
# `end` is printed after the last argument (instead of the default newline)
print("b", "n", "n", sep="a", end="a")

In [None]:
# print all arguments with the empty string as the separator
print("d", "o", "w", "n", sep="")

In [None]:
# print all arguments with a newline character as the separator
print("d", "o", "w", "n", sep="\n")

### Summary

```python
print(arg_1, arg_2, ..., arg_n, sep=" ", end="\n")
print(f"some string containing {some_variable}")
user_input = input()
```

### Common mistakes

- Don't confuse `print` (showing a message on the screen) and `input` (getting user input).
- Don't use `==` when defining variables.
  Only a single `=` is used for defining variables.
- Don't forget to add the prefix `f` when using variables inside a string.
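As a minimal sketch of the last point, the snippet below uses a fixed string in place of `input()` (which always returns a string) so that it runs without user interaction. The names `with_f` and `without_f` are just for illustration.

```python
n = "42"  # stands in for the string that input() would return

with_f = f"I believe {n} is the number you entered."
without_f = "I believe {n} is the number you entered."  # prefix f forgotten

print(with_f)     # the value of n is inserted
print(without_f)  # the curly braces and the name n are printed literally
```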

## `if`, `else`, and `elif`

The `if`-`else` construct in Python behaves just like in pretty much every other language.

In [None]:
n = 5

if n > 5:
    print("This won't be printed because n is not strictly greater than 5.")
    
if n >= 5:
    print("This will be printed because n is greater than or equal to 5.")

In [None]:
n = 5

if n > 5:
    print("This won't be printed because n is not strictly greater than 5.")
print("This will be printed because it is not indented.")
print("Whitespace indicates scope, so indentation matters a lot in Python!")

In [None]:
n = 5

if n > 5:
    print("This won't be printed because n is not strictly greater than 5.")
else:
    print("This **will** be printed because n is not strictly greater than 5.")

In [None]:
n = 5

if n > 5:
    print("This won't be printed because n is not strictly greater than 5.")
else:
    if n < 5:
        print("This won't be printed because n is not strictly less than 5.")
    else:
        print("This **will** be printed because n fails both conditions.")

Nested conditions are hard to read, in particular because of Python's mandatory indentation.
For complex conditions, use `elif` (short for *else if*) to keep hierarchies flat.

In [None]:
n = 5

if n > 5:
    print("This won't be printed because n is not strictly greater than 5.")
elif n < 5:
    print("This won't be printed because n is not strictly less than 5.")
else:
    print("This **will** be printed because n fails both conditions.")

Conditions are evaluated from top to bottom, and only the block of the first satisfied condition runs.
So if a higher condition subsumes a lower one, the lower one's block will never be executed.

In [None]:
n = 5

if n <= 5:
    print("This message will be printed.")
elif n == 5:
    print("Nothing in this block will ever be executed.")
    print("That's because whenever n == 5 holds, the higher-ranked n <= 5 holds, too.")

Particularly simple uses of `if` can be put on a single line, which is known as a *ternary* `if` (officially a *conditional expression*).
This can make code more elegant, but if you're not perfectly sure how to use ternary `if`, it's better to sacrifice some elegance for code that definitely works as expected.

In [None]:
n = 5

print("This message will be printed.") if n == 5 else print("We always need an else part for single-line if.")

print("This message won't be printed.") if n < 5 else print("But this message will be.")

# an even shorter version with print
print("n is 5." if n == 5 else "n is not 5.")

# ternary if for defining a variable
b = 9 if n > 5 else 3
print(b)

# the minimally different line below would cause a SyntaxError
# b = 9 if n > 5 else b = 3

### Summary

```python
if condition_1:
    # any code you want, but properly indented
elif condition_2:
    # some other code
elif condition_3:
    # some other code
else:
    # what to do in all other cases
```

```python
code_if_true if some_condition else code_if_false
```

### Common mistakes

- Don't forget the colon `:` after the condition.
- Never forget about proper indentation.
- The order of conditions matters.
  More specific conditions should be tested before more general ones.
- Equality is tested with `==` (two equal signs), not `=` (one equal sign).
  The latter is only for defining variables.

## Conditions

Anything can be used as a condition as long as it evaluates to `True` or `False`, which are called **Booleans**.
Conditions often involve one of the following operators:

- `==` (equals),
- `!=` (does not equal),
- `<` (strictly less than),
- `>` (strictly greater than),
- `<=` (less than or equals),
- `>=` (greater than or equals).

In [None]:
if False:
    print("This message is never printed because False is never true.")
elif True:
    print("This message is always printed because True can never be false.")
else:
    print("This message is never printed because `elif True` preempts it.")

In [None]:
n = 5

if n == 5:
    print("Yes, n equals 5.")
if n != 5:
    print("This doesn't get printed; n != 5 is false.")
if n < 5:
    print("This doesn't get printed; n < 5 is false.")
if n > 5:
    print("This doesn't get printed; n > 5 is false.")
if n <= 5:
    print("Yes, n is less than or equal to 5.")
if n >= 5:
    print("Yes, n is greater than or equal to 5.")

Conditions can be negated with `not`, and they can be combined with `and` and `or`.

In [None]:
n = 5

if n > 5 or (n < 10 and not n > 7):
    print("This condition is satisfied.")
    
if n < 10 or n > 1:
    print("An or-condition holds even if both requirements are met.")
    
if n > 10 and n < 10:
    print("This can never be satisfied. It's equivalent to `if False`.")

### Common mistakes

- Only conditions can be modified by `and`, `or`, and `not`.
  Something like `if n == 3 or 5` does not work: the code runs, but Python reads it as `(n == 3) or 5`, which is always satisfied. Write `if n == 3 or n == 5` instead.
- Don't use `not ==`. Use `!=` instead.
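Here is a small sketch of the `n == 3 or 5` pitfall. It relies on the fact that Python treats any nonzero number as true when it is used as a condition; the variable names `buggy` and `correct` are only for illustration.

```python
n = 4

# Buggy: Python reads this as (n == 3) or 5.
# n == 3 is False, so the whole expression evaluates to 5, which counts as true.
buggy = n == 3 or 5
if buggy:
    print("Oops, this is printed even though n is neither 3 nor 5.")

# Correct: each side of `or` is a complete condition.
correct = n == 3 or n == 5
if correct:
    print("This is not printed because n is 4.")
```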

## `while`-loops

As in other programming languages, `while`-loops are like an `if` that keeps repeating until the condition is no longer met.

In [None]:
n = 0

while n < 5:
    print(f"n is currently {n}")
    n = n + 1
    
print(f"n has reached value {n}. We have left the loop.")

We can use `break` to force Python to leave the loop right away.
This sometimes allows for more elegant code.
Compare the two below.

In [None]:
reply = ""   # hmm, why is this needed??? can't tell at this point

while reply != "No":
    print("What a nice loop.")
    print("Continue looping?")
    reply = input()  # oh, so now we finally know what reply is used for
    
print("We've left the loop!")

In [None]:
while True:
    # this would loop forever because `True` can never be false
    print("What a nice loop.")
    print("Continue looping?")
    reply = input()
    if reply == "No":
        break  # we're exiting the while loop
        
print("We've left the loop!")

### Summary

```python
while some_condition:
    some_code  # can contain one or more breaks to exit the loop
```

### Common mistakes

- Just as with `if`, don't forget the colon `:` at the end.
- Keep in mind that `if` and `while` serve different purposes.
  Use `if` for code that should be run once when a condition is satisfied.
  Use `while` for code that should be run over and over again until a condition is no longer met.
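The difference can be seen in a minimal side-by-side sketch: the same condition and the same body, once under `if` and once under `while`.

```python
# `if` checks its condition once and runs its block at most once
n = 0
if n < 3:
    n = n + 1
print(n)  # 1

# `while` keeps re-checking and re-running until the condition fails
m = 0
while m < 3:
    m = m + 1
print(m)  # 3
```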

## Lists

Lists are one of the simplest **data structures** in Python.

In [None]:
number_list = [0, 1, 2, 3, 4]

Lists can contain objects of any type, including long strings, the values of variables, and even other lists.

In [None]:
number_list = [0, 1, 2, 3, 4]
n = 5
another_list = [0, "some string", n, "another string", number_list, [0, [5, 10]], "the last item"]

Items can be added to lists with the `list.append` function.
Whether an item is in a list can be tested with the `in` and `not in` operators:

In [None]:
# we start an empty list
memory = []

while "Star Trek" not in memory:
    print("What's the best Sci-Fi franchise?")
    list.append(memory, input())

The `list.append` function is one of many functions that follow the template `type_of_object.do_something(object, arg_1, ..., arg_n)`.
For all these functions, one can instead use the shorthand `object.do_something(arg_1, ..., arg_n)`.
This is known as a **method** (see the section on functions for details).

In [None]:
# we start an empty list
memory = []

while "Star Trek" not in memory:
    print("What's the best Sci-Fi franchise?")
    memory.append(input())

Two lists can be concatenated into one with the `+` operator.
The order of arguments matters.

In [None]:
list1 = [0, 1, 2, 3, 4]
list2 = ["a", "b", "c"]
print(list1 + list2)
print(list2 + list1)
print(list1 + list1 + list2 + list1)

Each item in a list can be referenced by its **index**.

In [None]:
list2 = ["a", "b", "c"]
print(list2[0])
print(list2[1])
print(list2[2])

Note that indexing starts at 0, not 1.
Intuitively, picture the indices as sitting between the items of the list; each item is referenced by the number to its left.

```
0 a 1 b 2 c 3
```

If you use an index that is larger than the index of the last item, you'll get an `IndexError`.

In [None]:
print(["a", "b", "c"][2])  # 2 is the index of the last item
print(["a", "b", "c"][3])  # 3 is greater than 2; this means trouble

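We can also use indices to pick out parts of a list with **slices**.
A slice `[start_index:end_index]` returns the items between those two positions in the picture above, so the item at `end_index` itself is not included.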

In [None]:
list2 = ["a", "b", "c"]
print(list2[1:3])
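A few more slices of the same list make the "between the numbers" picture concrete; the variable names are only for illustration.

```python
list2 = ["a", "b", "c"]

# Picture the indices between the items:  0 a 1 b 2 c 3
middle = list2[1:3]     # between 1 and 3: b and c
first_two = list2[0:2]  # between 0 and 2: a and b
empty = list2[1:1]      # nothing lies between 1 and 1

print(middle)
print(first_two)
print(empty)
```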

Either index can be omitted.
Without a start index, the slice begins at the start of the list; without an end index, it runs to the end of the list.

In [None]:
list2 = ["a", "b", "c"]
print(list2[:2])  # from start to 2
print(list2[1:])  # from 1 to end
print(list2[:])   # from start to end

With slices, there is no problem when the second index is too large for the list.

In [None]:
print(["a", "b", "c"][:999])  # a-okay with slices, no problem here
print(["a", "b", "c"][999])   # ouch, IndexError

This may seem horribly inconsistent to you, but if slices were as picky as single indices they would be very tricky to work with.
Compare the following two code snippets.
Each one prints the first 5 items of a list (or all of them if the list is shorter), but the second does so without ever using a slice that extends beyond the end of the list.
As you can see, the code is a bit more convoluted with no clear gain.

In [None]:
list1 = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
list2 = ["a", "b", "c"]

for l in [list1, list2]:
    print(l[:5])

In [None]:
list1 = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
list2 = ["a", "b", "c"]

for l in [list1, list2]:
    end = min(5, len(l))
    print(l[:end])

**Caution:** slices always return lists, whereas a single index returns an item.
Compare the following:

In [None]:
list2 = ["a", "b", "c"]
print(list2[2])
print(list2[2:])

### Summary

```python
# general list format
[item1, item2, item3]
# adding an item to a list
list.append(some_list, item)
# using the append-method instead
some_list.append(item)
# checking membership
item in some_list
item not in some_list
# concatenating lists
list1 + list2 + list3 + list4  # and so on
# using indices
list1[some_index]
# and slices for extracting sublists
list1[start_index:end_index]
```

### Common mistakes

- Lists use square brackets, not parentheses or curly braces.
- Don't confuse `append` (adding items to a list) and `+` (concatenating lists).
  Something like `some_list + 5` won't work!
- Don't use `some_list = some_list + [item]` for adding an item to a list.
  The code works, but it's clunky and slower than `list.append(some_list, item)` because it builds a whole new list.
- Do not confuse functions and methods.
  You can use `list.append(some_list, 5)` or `some_list.append(5)`, but not `append(some_list, 5)`, or `some_list.list.append(5)`, or `some_list.append(some_list, 5)`.
- Indices are numbered left-to-right starting from 0, not 1.
- Never use an index that's too large for the list.
- Don't confuse index notation `[some_index]` and slice notation `[start_index:end_index]`.
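A minimal sketch contrasting `+` and `append`: `+` builds a new list and leaves the original alone, while `append` changes the list itself. The names `some_list` and `combined` are placeholders.

```python
some_list = ["a", "b"]

# + builds a new list; some_list itself is unchanged unless we reassign it
combined = some_list + ["c"]
print(some_list)  # ['a', 'b']
print(combined)   # ['a', 'b', 'c']

# append changes the list in place
some_list.append("c")
print(some_list)  # ['a', 'b', 'c']

# some_list + "c" would raise a TypeError: only lists can be added to lists
```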

## More on strings

Strings are very similar to lists: they support concatenation with `+` and membership tests with `in` and `not in`.

In [5]:
# defining strings
string1 = "What a"
string2 = "lovely string"

# concatenation with +; note that we have to manually add the space between them
print(string1 + " " + string2)

# membership test with in and not in
if "hat " in string1:
    print(string1)
if "ING" not in string2:
    print(string2)

What a lovely string
What a
lovely string
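Strings also share index and slice notation with lists, where each character counts as one item. A short sketch, reusing `string2` from the cell above:

```python
string2 = "lovely string"

# indices pick out single characters, starting at 0
first_char = string2[0]
# slices pick out substrings; the end index is excluded, as with lists
first_word = string2[:6]
second_word = string2[7:]

print(first_char)
print(first_word)
print(second_word)
```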


In addition, the functions `str.upper`, `str.lower`, and `str.title` (or their corresponding methods) return a copy of a string with different capitalization; the original string is left unchanged.

In [6]:
improper_cap = "This String, it is CaPitaLizEd BADLY!!!"

# all upper case
print(str.upper(improper_cap))  # function
print(improper_cap.upper())     # method; don't forget about () at the end

# all lower case
print(str.lower(improper_cap))  # function
print(improper_cap.lower())     # method; don't forget about () at the end

# all title case
print(str.title(improper_cap))  # function
print(improper_cap.title())     # method; don't forget about () at the end

THIS STRING, IT IS CAPITALIZED BADLY!!!
THIS STRING, IT IS CAPITALIZED BADLY!!!
this string, it is capitalized badly!!!
this string, it is capitalized badly!!!
This String, It Is Capitalized Badly!!!
This String, It Is Capitalized Badly!!!


Keep in mind that Python does not ignore capitalization differences by default.

In [7]:
print("String" == "STRING")
print("String".lower() == "STRING".lower())

False
True
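One practical use is accepting user replies regardless of capitalization, e.g. for a loop that should stop on "No". The sketch below uses a fixed list of replies instead of `input()`; the names `replies` and `accepted` are illustrative.

```python
replies = ["No", "no", "NO", "Nope"]
accepted = []

for reply in replies:
    # lower() makes "No", "no", "NO", ... compare equal to "no"
    if reply.lower() == "no":
        accepted.append(reply)

print(accepted)
```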


### Summary

```python
# use double quotes to avoid issues with apostrophes
"somebody's favorite string"
# concatenation with +
"some string" + " and some other string"
# change capitalization with function
str.upper(some_string)
str.lower(some_string)
str.title(some_string)
# change capitalization with method
some_string.upper()
some_string.lower()
some_string.title()
```

### Common mistakes

- When concatenating strings with `+`, you have to handle whitespace yourself.
  The output of `"some" + "string"` is `"somestring"`, not `"some string"`.
- When using methods for changing capitalization, don't forget about the parentheses at the end.
  It is `some_string.upper()`, not `some_string.upper`.

## Sets

Sets are very similar to lists except that they

1. are unordered, and
1. do not contain duplicates.

In [None]:
list1 = ["some string"]
list2 = ["some string", "another string"]
list3 = ["another string", "some string"]
list4 = ["some string", "some string", "another string"]

# all four lists are distinct from each other
the_lists = [list1, list2, list3, list4]
for l1 in the_lists:
    print(l1, "is the same as")
    for l2 in the_lists:
        if l1 != l2:
            print(l2, l1 == l2, sep=": ")
    print("-----")

In [None]:
list1 = ["some string"]
list2 = ["some string", "another string"]
list3 = ["another string", "some string"]
list4 = ["some string", "some string", "another string"]

# but as sets, list2, list3, and list4 are all equal
the_lists = [list1, list2, list3, list4]
for l1 in the_lists:
    print(set(l1), "is the same as")
    for l2 in the_lists:
        if l1 != l2:
            print(set(l2), set(l1) == set(l2), sep=": ")
    print("-----")

Sets can be defined in two different ways.
Either you convert a list to a set using the `set()` function, or you directly define the set using curly braces.

In [None]:
set(["some string", "another string"]) == {"some string", "another string"}

Sets can be much faster than lists for membership tests, which are still done with `in` and `not in`.

In [1]:
print(0 in {0,1})
print(0 not in {0,1})

True
False


But sets are also more limited than lists.
They do not preserve order, cannot contain duplicates, and can only contain so-called *hashable* objects, e.g. strings and numbers.
Lists, sets, and counters are not hashable and thus cannot be contained in sets.

In [None]:
# allowed
good_set1 = {"this", "is", "okay"}
good_set2 = {0, 3, 938, 16, -5, 3.7, "see", "numbers", "work", "111!11", 5, 5}

# bad
# bad_set1 = {["this", "is"], "not", "okay"}
# bad_set2 = {{"this", "is"}, "not", "okay"}
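If you uncomment one of the bad definitions, Python stops with a `TypeError`. The sketch below catches that error with `try`/`except` (not otherwise covered in this recap) just so the message can be inspected without crashing the notebook; the exact wording of the message may differ between Python versions.

```python
message = ""
try:
    bad_set1 = {["this", "is"], "not", "okay"}
except TypeError as error:
    # Python refuses because lists are not hashable
    message = str(error)

print(message)
```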

A common use for sets is to remove duplicates from a list.
One first converts the list to a set, and then the set back to a list.

In [None]:
redundant_list = [0,0,0,0,0,0,0]
print(set(redundant_list))
print(list(set(redundant_list)))
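Because sets are unordered, the list you get back may have its items in any order. A minimal sketch of one way to get a predictable result, using the built-in `sorted` (the word list is made up for illustration):

```python
words = ["the", "cat", "saw", "the", "other", "cat"]

unique_words = set(words)  # duplicates removed, but order is not guaranteed
print(unique_words)

# sorted() returns a new list in ascending (here: alphabetical) order
print(sorted(unique_words))
```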

### Summary

```python
# convert some_list to a set
set(some_list)
# define a set
{item1, item2, ...}
# membership test
item in some_set
item not in some_set
```

### Common mistakes

- Sets are not lists.
  You cannot use indices, slices, `+`, or `.append`.
- Sets are less flexible than lists; they can only contain certain types of objects.
  Don't try to put lists or sets inside sets.
- When in doubt, stick with lists and only use sets if you really need the extra speed.

## `for`-loops

Use a `for` loop to iterate over a "container-like" object, e.g. a list, a string, or a set.

In [8]:
for word in ["list", "items"]:
    print(word)
    
for word in {"set", "items"}:  # a set; its items may come out in any order
    print(word)
    
for char in "string":
    print(char)

list
items
set
items
s
t
r
i
n
g


Since there are no restrictions on the code block under a `for`, it can contain other loops.

In [10]:
for word in ["list", "items"]:
    for char in word:
        if char in "aeiou":
            while True:
                print("I found a vowel!")
                print("Guess what it is!")
                guess = input()
                if guess == char:
                    break
            print("Yes, that's it! How did you know?")
    print(f"Done with {word}")
print("That's it folks!")

I found a vowel!
Guess what it is!
i
Yes, that's it! How did you know?
Done with list
I found a vowel!
Guess what it is!
i
Yes, that's it! How did you know?
I found a vowel!
Guess what it is!
e
Yes, that's it! How did you know?
Done with items
That's it folks!
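For a nested loop that runs without user input, here is a sketch that counts the vowels in each word; the names `counts` and `vowel_count` are illustrative.

```python
words = ["list", "items"]
counts = []

for word in words:
    vowel_count = 0            # reset for each word
    for char in word:          # inner loop over the characters of the word
        if char in "aeiou":
            vowel_count = vowel_count + 1
    counts.append(vowel_count)
    print(f"{word} contains {vowel_count} vowel(s)")
```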


### Summary

```python
for object in container:
    do_something
```

### Common mistakes

- You can only iterate over container-like objects (*iterables*).
  Something like `for i in 5` or `for n < 5` does not work.
- As always, pay attention to proper indentation.

## Counters

## Built-in functions

- `len`, `sum`, `enumerate`, `print`, `input`, `max`, `min`, `sorted`
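A quick sketch of the non-interactive built-ins listed above, applied to a made-up list of numbers:

```python
numbers = [3, 1, 4, 1, 5]

print(len(numbers))     # number of items
print(sum(numbers))     # total of all items
print(max(numbers))     # largest item
print(min(numbers))     # smallest item
print(sorted(numbers))  # new sorted list; numbers itself is unchanged

# enumerate pairs each item with its index
for index, number in enumerate(numbers):
    print(index, number)
```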

## Custom functions

### Common mistakes

- Don't confuse **defining** a function and **calling** a function.
  First you define the function to specify what it does.
  Later on, you can call it to execute the code in the function.
  Just defining the function does not do anything.
  
- Don't confuse `print` and `return`.
  The `print` function is only used to show messages to the user, whereas `return` is needed to pass a value out of a function.
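Both mistakes can be seen in a minimal sketch. Functions are defined with `def`; the two function names below are hypothetical and exist only for this illustration.

```python
# Defining the functions: nothing is executed yet
def double_with_print(n):
    print(n * 2)  # only shows the value on screen

def double_with_return(n):
    return n * 2  # passes the value back to the caller

# Calling the functions
a = double_with_print(5)   # shows 10 on screen, but a is None
b = double_with_return(5)  # shows nothing, but b is 10

print(a)
print(b)
```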

## Libraries/modules/packages

Python comes with **libraries** that provide additional functionality for specialized purposes.
A different name for libraries is **modules**.
A **package** is a collection of modules.
Packages and modules are imported in exactly the same way; packages just tend to be much larger than modules.

In [None]:
# we load the random library
import random

# and now we use one of its specialized functions
chosen_item = random.choice([0, 1, 2, 3])
print(chosen_item)

If you import some library `foo` with `import foo`, the function `bar` of library `foo` will be available as `foo.bar`.
In some cases, this can create very long function names.

In [None]:
import urllib.request
url = "http://thomasgraf.net/images/graf.jpg"
urllib.request.urlretrieve(url, "graf.jpg")  # this only works if your CoCalc account has been upgraded already!

In those cases, we can use `from foo import bar` instead, which makes `bar` available without the `foo.` prefix.

In [None]:
from urllib.request import urlretrieve
url = "http://thomasgraf.net/images/graf.jpg"
urlretrieve(url, "graf.jpg")  # this only works if your CoCalc account has been upgraded already!
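If downloading files isn't possible in your environment, the same `from ... import ...` pattern can be tried offline with the `random` library from earlier:

```python
from random import choice

# choice is now available directly, without the random. prefix
chosen_item = choice([0, 1, 2, 3])
print(chosen_item)
```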

This is particularly useful with the `pprint` library, which is often only used for its `pprint` function.
The name `pprint` is short for *pretty print*.

In [None]:
import pprint
long_list = ["This", "long", "list", "contains", "many", "words", ",", "including", "supercalifragilisticexpialidocious"]
print("long_list with `print`:")
print(long_list)
print("\nAnd now with `pprint`:")
pprint.pprint(long_list)

In [None]:
from pprint import pprint
long_list = ["This", "long", "list", "contains", "many", "words", ",", "including", "supercalifragilisticexpialidocious"]
print("long_list with `print`:")
print(long_list)
print("\nAnd now with `pprint`:")
pprint(long_list)