# Python recap (Units 100-110)

Python is a very high-level programming language, which is why some call it a *scripting language* in contrast to "proper", heavy-weight programming languages like Java or C.
Python is not as fast as these languages, but it is just as powerful and much easier to write code in.
When **fast coding** is more important than **fast code**, Python is a good choice.
That's generally the case for data analysis, and increasingly so in computational linguistics, too.
This makes Python the ideal language for this course.

The rest of this notebook gives a quick summary of some Python basics (corresponding to units 100-105 of LIN 120).
Whereas the LIN 120 notebooks have detailed explanations in plain English, this summary is much more concise.
It is assumed that you have enough of a programming background to quickly pick up the core ideas from a few code snippets.

## `print`, `input`, and variables

In [None]:
print("Use print to show messages to the user")

In [None]:
print('Strings can also occur between single quotes')

In [None]:
print('But that\'s not a good idea for English because apostrophes have to be escaped')

In [None]:
print("That's much better!")
print("We'll always use double quotes for strings in this course.")

In [None]:
print("Let's get some input from the user")
# ask user for input and store it in variable user_input
# (note how comments start with #)
user_input = input()

print("Here's what you said:", user_input)

In [None]:
print("Enter a number!")
n = input()
print("I believe", n, "is the number you entered.")

Notice how Python automatically inserts a space between the arguments of `print`.

The `print` command above is a little clunky.
We can do better with **f-strings** (which is short for **formatted string literals**).
The same technique is known as *string interpolation* in other programming languages.

In [None]:
print("Enter a number!")
n = input()
print(f"I believe {n} is the number you entered.")
print(f"Without curly braces we only get n, not {n}.")

While f-strings are powerful, `print` with multiple arguments still has its uses.

In [None]:
# a creative way of printing banana;
# `sep` is inserted between all arguments of print
print("ba", "a", "a", sep="n")

In [None]:
# an even more creative way of printing banana;
# `end` is printed after the last argument (instead of the default newline)
print("b", "n", "n", sep="a", end="a")

In [None]:
# print all arguments with the empty string as the separator
print("d", "o", "w", "n", sep="")

In [None]:
# print all arguments with a newline character as the separator
print("d", "o", "w", "n", sep="\n")
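
The effect of `sep` and `end` can be checked without reading the screen. The sketch below is only an illustration: it relies on the `file` argument of `print` and on `io.StringIO`, neither of which is covered in this recap, to collect the output in a string.

```python
import io

# collect the output in a text buffer instead of showing it on screen
buffer = io.StringIO()

# `sep` goes between the arguments; the default `end` is a newline
print("ba", "a", "a", sep="n", file=buffer)
# here `end` replaces the newline, so nothing follows the final "a"
print("b", "n", "n", sep="a", end="a", file=buffer)

captured = buffer.getvalue()
# repr makes the newline character visible as \n
print(repr(captured))
```

The first call contributes `banana` plus a newline, the second one `banana` with no newline at all.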

### Summary

```python
print(arg_1, arg_2, ..., arg_n, sep="...", end="...")
print(f"some string containing {some_variable}")
input()
```

### Common mistakes

- Don't confuse `print` (showing a message on the screen) and `input` (getting user input).
- Don't use `==` when defining variables.
  Only a single `=` is used for defining variables.
- Don't forget to add the prefix `f` when using variables inside a string.
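
The last mistake is easy to miss because the code still runs. Here is a minimal sketch, assuming the user typed `42`; the variable names are made up for illustration, and a fixed string stands in for `input()` so that the cell runs without interaction.

```python
# stand-in for a value obtained with input(), which always returns a string
n = "42"

# with the prefix f, {n} is replaced by the value of n
with_prefix = f"I believe {n} is the number you entered."
# without the prefix, the curly braces are printed literally
without_prefix = "I believe {n} is the number you entered."

print(with_prefix)
print(without_prefix)
```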

## `if`, `else`, and `elif`

The `if`-`else` construct in Python behaves as in pretty much every other language.

In [None]:
n = 5

if n > 5:
    print("This won't be printed because n is not strictly greater than 5.")
    
if n >= 5:
    print("This will be printed because n is greater than or equal to 5.")

In [None]:
n = 5

if n > 5:
    print("This won't be printed because n is not strictly greater than 5.")
print("This will be printed because it is not indented.")
print("Whitespace indicates scope, so indentation matters a lot in Python!")

In [None]:
n = 5

if n > 5:
    print("This won't be printed because n is not strictly greater than 5.")
else:
    print("This **will** be printed because n is not strictly greater than 5.")

In [None]:
n = 5

if n > 5:
    print("This won't be printed because n is not strictly greater than 5.")
else:
    if n < 5:
        print("This won't be printed because n is not strictly less than 5.")
    else:
        print("This **will** be printed because n fails both conditions.")

Nested conditions are hard to read, in particular because of Python's mandatory indenting.
For complex conditions, use `elif` (short for *else if*) to keep hierarchies flat.

In [None]:
n = 5

if n > 5:
    print("This won't be printed because n is not strictly greater than 5.")
elif n < 5:
    print("This won't be printed because n is not strictly less than 5.")
else:
    print("This **will** be printed because n fails both conditions.")

Conditions are evaluated from top to bottom, so if a higher one subsumes a lower one, the lower one will never be checked.

In [None]:
n = 5

if n <= 5:
    print("This message will be printed.")
elif n == 5:
    print("Nothing in this block will ever be executed.")
    print("That's because whenever n == 5 holds, the higher-ranked n <= 5 holds, too.")
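
One way to repair the example above is to test the more specific condition first. This is only a sketch; the variable `message` and the wording of the strings are assumptions made for illustration.

```python
n = 5

# the specific test n == 5 comes before the more general n <= 5
if n == 5:
    message = "n is exactly 5"
elif n <= 5:
    # only reached when n is strictly less than 5
    message = "n is less than 5"
else:
    message = "n is greater than 5"

print(message)
```

With this order every branch can be reached for some value of `n`.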

### Summary

```python
if condition_1:
    # any code you want, but properly indented
elif condition_2:
    # some other code
elif condition_3:
    # some other code
else:
    # what to do in the elsewhere case
```

### Common mistakes

- Don't forget the colon `:` after the condition.
- Never forget about proper indentation.
- The order of conditions matters.
  More specific conditions should be tested before more general ones.
- Equality is tested with `==` (two equal signs), not `=` (one equal sign).
  The latter is only for defining variables.

## Conditions

Anything can be used as a condition as long as it evaluates to `True` or `False`, which are called **Booleans**.
Conditions often involve one of the following operators:

- `==` (equals),
- `!=` (does not equal),
- `<` (strictly less than),
- `>` (strictly greater than),
- `<=` (less than or equal to),
- `>=` (greater than or equal to).

In [None]:
if False:
    print("This message is never printed because False is never true.")
elif True:
    print("This message is always printed because True can never be false.")
else:
    print("This message is never printed because `elif True` preempts it.")

In [None]:
n = 5

if n == 5:
    print("Yes, n equals 5.")
if n != 5:
    print("This doesn't get printed; n != 5 is false.")
if n < 5:
    print("This doesn't get printed; n < 5 is false.")
if n > 5:
    print("This doesn't get printed; n > 5 is false.")
if n <= 5:
    print("Yes, n is less than or equal to 5.")
if n >= 5:
    print("Yes, n is greater than or equal to 5.")

Conditions can be negated with `not`, and they can be combined with `and` and `or`.

In [None]:
n = 5

if n > 5 or (n < 10 and not n > 7):
    print("This condition is satisfied.")
    
if n < 10 or n > 1:
    print("An or-condition holds even if both requirements are met.")
    
if n > 10 and n < 10:
    print("This can never be satisfied. It's equivalent to `if False`.")

### Common mistakes

- Only conditions can be modified by `and`, `or`, and `not`.
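  Something like `if n == 3 or 5` does not work: Python reads it as `(n == 3) or 5`, and since `5` counts as true, the condition always holds.
  The code will run, but it won't do what you want; write `if n == 3 or n == 5` instead.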
- Don't use `not ==`. Use `!=` instead.
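
The first mistake can be made visible with a value that is neither 3 nor 5. The sketch below uses made-up variable names and the built-in `bool` function, which is not covered in this recap, to show how Python treats each condition.

```python
n = 4

# read by Python as (n == 3) or 5; the 5 counts as true
wrong = bool(n == 3 or 5)
# each side of `or` is a full condition
right = n == 3 or n == 5
# a membership test is a compact alternative
also_right = n in [3, 5]

print(wrong, right, also_right)
```

Only the last two conditions correctly report that 4 is neither 3 nor 5.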

## `while`-loops

As in other programming languages, `while`-loops are like an `if` that keeps repeating until the condition is no longer met.

In [None]:
n = 0

while n < 5:
    print(f"n is currently {n}")
    n = n + 1
    
print(f"n has reached value {n}. We have left the loop.")

We can use `break` to force Python to leave the loop right away.
This sometimes allows for more elegant code.
Compare the two below.

In [None]:
reply = ""   # what does this do???

while reply != "No":
    print("What a nice loop.")
    print("Continue looping?")
    reply = input()  # oh, so now we finally know what reply is used for
    
print("We've left the loop!")

In [None]:
while True:
    # this would loop forever because `True` can never be false
    print("What a nice loop.")
    print("Continue looping?")
    reply = input()
    if reply == "No":
        break  # we're exiting the while loop
        
print("We've left the loop!")
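
The same `while True` plus `break` pattern also works without user input. In this sketch the threshold 50 is an arbitrary choice for illustration.

```python
n = 0

while True:
    # leave the loop as soon as the square of n exceeds 50
    if n * n > 50:
        break
    n = n + 1

print(f"The first number whose square exceeds 50 is {n}.")
```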

### Summary

```python
while some_condition:
    some_code  # can contain one or more breaks to exit the loop
```

### Common mistakes

- Just as with `if`, don't forget the colon `:` at the end.
- Keep in mind that `if` and `while` serve different purposes.
  Use `if` for code that should be run once when a condition is satisfied.
  Use `while` for code that should be run over and over again until a condition is no longer met.
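
The difference shows up when the same condition and the same body are used with both keywords. This is a small sketch with made-up variable names.

```python
# with `if`, the body runs at most once
n = 0
if n < 3:
    n = n + 1
after_if = n

# with `while`, the body runs until n < 3 is no longer true
n = 0
while n < 3:
    n = n + 1
after_while = n

print(f"after if: {after_if}, after while: {after_while}")
```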

## Lists

Lists are one of the simplest **data structures** in Python.

In [None]:
number_list = [0, 1, 2, 3, 4]

Lists can contain even very complex objects, such as long strings, variables or other lists.

In [None]:
number_list = [0, 1, 2, 3, 4]
n = 5
another_list = [0, "some string", n, "another string", number_list, [0, [5, 10]], "the last item"]

Items can be added to lists with the `list.append` function.
Whether an item is in a list can be tested with the `in` and `not in` operators:

In [None]:
# we start an empty list
memory = []

while "Star Trek" not in memory:
    print("What's the best Sci-Fi franchise?")
    list.append(memory, input())
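
Here is a non-interactive sketch of the same operations, with fixed strings standing in for user input. It also shows the method spelling `memory.append(item)`, which does the same thing as `list.append(memory, item)` and is the form you will see in most Python code outside this course.

```python
memory = []

# the two spellings below are equivalent ways of appending an item
list.append(memory, "Star Wars")
memory.append("Star Trek")

has_trek = "Star Trek" in memory
lacks_dune = "Dune" not in memory

print(memory, has_trek, lacks_dune)
```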

Two lists can be concatenated into one with the `+` operator.
The order of arguments matters.

In [1]:
list1 = [0, 1, 2, 3, 4]
list2 = ["a", "b", "c"]
print(list1 + list2)
print(list2 + list1)
print(list1 + list1 + list2 + list1)

[0, 1, 2, 3, 4, 'a', 'b', 'c']
['a', 'b', 'c', 0, 1, 2, 3, 4]
[0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 'a', 'b', 'c', 0, 1, 2, 3, 4]


### Summary

```python
# general list format
[item1, item2, item3]
# adding an item to a list
list.append(some_list, item)
# checking membership
item in some_list
item not in some_list
# concatenating lists
list1 + list2 + list3 + list4  # and so on
```

### Common mistakes

- Lists use square brackets, not parentheses or curly braces.
- Don't confuse `append` (adding items to a list) and `+` (concatenating lists).
  Something like `some_list + 5` won't work!
- Don't use `some_list = some_list + [item]` for adding an item to a list.
  The code works, but it's clunky and slower than `list.append(some_list, item)` because it builds a whole new list.
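
The difference between `+` and `append` can be sketched as follows. The variable names are made up, and `try`/`except` is not covered in this recap; it is used here only so that the cell keeps running after the error.

```python
some_list = [1, 2, 3]

# + builds a new list and leaves some_list untouched
combined = some_list + [4]
still_original = some_list == [1, 2, 3]

# append changes some_list itself
list.append(some_list, 4)

# + with a non-list on the right raises a TypeError
plus_failed = False
try:
    some_list + 5
except TypeError:
    plus_failed = True

print(combined, some_list, still_original, plus_failed)
```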

## Libraries/modules/packages

Python comes with **libraries** that provide additional functionality for specialized purposes.
A different name for libraries is **modules**.
A **package** is a collection of modules.
Packages and modules are imported in exactly the same way; packages just tend to be much larger than modules.

In [None]:
# we load the random library
import random

# and now we use one of its specialized functions
chosen_item = random.choice([0, 1, 2, 3])
print(chosen_item)

If you import some library `foo` with `import foo`, the function `bar` of library `foo` will be available as `foo.bar`.
In some cases, this can create very long function names.

In [None]:
import urllib.request
url = "http://thomasgraf.net/images/graf.jpg"
urllib.request.urlretrieve(url, "graf.jpg")  # this only works if your CoCalc account has been upgraded already!

In those cases, we can use `from foo import bar` instead.

In [None]:
from urllib.request import urlretrieve
url = "http://thomasgraf.net/images/graf.jpg"
urlretrieve(url, "graf.jpg")  # this only works if your CoCalc account has been upgraded already!

This is particularly useful with the `pprint` library, which is often only used for its `pprint` function.
The name `pprint` is short for *pretty print*.

In [None]:
import pprint
long_list = ["This", "long", "list", "contains", "many", "words", ",", "including", "supercalifragilisticexpialidocious"]
print("long_list with `print`:")
print(long_list)
print("\nAnd now with `pprint`:")
pprint.pprint(long_list)

In [None]:
from pprint import pprint
long_list = ["This", "long", "list", "contains", "many", "words", ",", "including", "supercalifragilisticexpialidocious"]
print("long_list with `print`:")
print(long_list)
print("\nAnd now with `pprint`:")
pprint(long_list)
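
Both import styles give access to the same function, only under different names. The sketch below illustrates this with the `random` library from earlier; the variable names are made up for illustration.

```python
import random
from random import choice

items = [0, 1, 2, 3]

# long name: library.function
first_pick = random.choice(items)
# short name: available thanks to `from random import choice`
second_pick = choice(items)

# both names refer to the very same function
same_function = random.choice is choice

print(first_pick, second_pick, same_function)
```

The two picks can differ from run to run because the choice is random.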