# Python recap (Units 100-110)

Python is a very high-level programming language, which is why some call it a *scripting language* in contrast to "proper", heavy-weight programming languages like Java or C.
Python is not as fast as these languages, but it is just as powerful and much easier to write code in.
When **fast coding** is more important than **fast code**, Python is a good choice.
That's generally the case for data analysis, and increasingly so in computational linguistics, too.
This makes Python the ideal language for this course.

The rest of this notebook gives a quick summary of some Python basics (corresponding to [units 100-110 of LIN 120 Language and Technology](https://github.com/CompLab-StonyBrook/lin120_public/tree/master/notebooks)).
Whereas the LIN 120 notebooks have detailed explanations in plain English, this summary is much more concise.
I assume throughout this recap that you have enough of a programming background to quickly pick up the core ideas from a few code snippets.

## `print`, `input`, and variables

In [None]:
print("Use print to show messages to the user")

In [None]:
print('Strings can also occur between single quotes')

In [None]:
print('But that\'s not a good idea for English because apostrophes have to be escaped')

In [None]:
print("That's much better!")
print("As a convention, we'll always use double quotes for strings in this course.")

In [None]:
print("Let's get some input from the user")
# ask user for input and store it in variable user_input
# (note how comments start with #)
print("Dear user, please say something:")
user_input = input()

print("Here's what you said:", user_input)

In [None]:
# input() can also take a string as an argument;
# note the difference in output compared to how we did it with print() above
input("Dear user, please say something:")

In [None]:
# we can store the user input in a variable, e.g. n
print("Enter a number!")
n = input()
print("I believe", n, "is the number you entered.")

In [None]:
# EXERCISE
# reimplement the cell above but now use input() instead of the first instance of print()

Notice how Python automatically inserts a space between the arguments of `print`.

In [None]:
print("Enter a number!")
n = input()
print("I believe", n, "is the number you entered. Yes, indeed, you have entered", n, "for sure!")

The `print` command above is a little clunky.
We can do better with **f-strings** (which is short for **formatted string literals**).
This is also known as string interpolation in other programming languages.
In string interpolation, a variable `var` can be included in a string as `{var}`.

In [None]:
print("Enter a number!")
n = input()
print(f"I believe {n} is the number you entered. Yes, indeed, you have entered {n} for sure!")
print(f"Without curly braces we only get n, not {n}.")

While f-strings are powerful, `print` with multiple arguments still has its uses.

In [None]:
# a creative way of printing banana;
# `sep` is inserted between all arguments of print;
# if sep isn't specified, then sep=" " (i.e. a space)
print("ba", "a", "a", sep="n")

In [None]:
# an even more creative way of printing banana;
# `end` is appended to the last argument
print("b", "n", "n", sep="a", end="a")

In [None]:
# an even more creative way that uses the empty string ""
# so that we do not need to specify `end`
print("b", "n", "n", "", sep="a")

In [None]:
# print all arguments with the empty string as the separator
print("d", "o", "w", "n", sep="")

In [None]:
# print all arguments with a newline character as the separator
print("d", "o", "w", "n", sep="\n")

In [None]:
# EXERCISE
# What is the shortest possible print statement for printing the following string?
# abcdefghXabcdefghYabcdefghZabcdefghabcdefgh
print("", "X", "Y", "Z", "", "", sep="abcdefgh")

Don't forget that variables aren't limited to strings, they can refer to pretty much anything.
Effectively, a variable is a name you attach to something that you want to refer back to later on.
Writing `var = foo` means that from here on we can refer to `foo` as `var`.
If `foo` itself is a variable, then `var = foo` means that the thing `foo` currently refers to can also be referred to as `var`.

In [None]:
var = 5
newvar = var + 8
print(var, newvar)

In [None]:
# variables can be redefined
var = 5
newvar = var + 8
var = 0
print(var, newvar)

In [None]:
var = 5
newvar = var + 8
var = newvar
print(var, newvar)

In [None]:
# do you remember what happens if we have two variables a and b with a = b
# and then change a?
var = 5
newvar = var
var = 10
print(var, newvar)

In [None]:
# what if we have two variables a and b with a = b
# and then change b?
var = 5
newvar = var
newvar = 10
print(var, newvar)

### Summary

```python
print(arg_1, arg_2, ..., arg_n, sep=" ", end="\n")
print(f"some string containing {some_variable}")
input()
input("with a message right before the input field")
```

### Common mistakes

- Don't confuse `print` (showing a message on the screen) and `input` (getting user input).
  Yes, `input` can be used to print a message on the same line, but that doesn't make `input` the same as `print`.
- Don't use `==` when defining variables.
  Only a single `=` is used for defining variables.
- Don't forget to add the prefix `f` when using variables inside a string.

## `if`, `else`, and `elif`

The `if`-`else` construct in Python behaves just like in pretty much every other language.
We give `if` a condition, and if that condition is met we execute the indented code below `if`.
Otherwise, we move on to the next piece of the `if` block, assuming there is one.

In [None]:
# two `if` code blocks without any `else`
n = 5

if n > 5:
    print("This won't be printed because n is not strictly greater than 5.")

if n >= 5:
    print("This will be printed because n is greater than or equal to 5.")

In [None]:
# indentation MATTERS!
n = 5

if n > 5:
    print("This won't be printed because n is not strictly greater than 5.")
print("This will be printed because it is not indented.")
print("Whitespace indicates scope, so indentation matters a lot in Python!")

In [None]:
# a single `if` code block with an `else`
n = 5

if n > 5:
    print("This won't be printed because n is not strictly greater than 5.")
else:
    print("This **will** be printed because n is not strictly greater than 5.")

In [None]:
# code blocks can be nested
n = 5

if n > 5:
    print("This won't be printed because n is not strictly greater than 5.")
else:
    if n < 5:
        print("This won't be printed because n is not strictly less than 5.")
    else:
        print("This **will** be printed because n fails both conditions.")

Nested conditions are hard to read, in particular because of Python's mandatory indenting.
For complex conditions, use `elif` (short for *else if*) to keep hierarchies flat.

In [None]:
n = 5

if n > 5:
    print("This won't be printed because n is not strictly greater than 5.")
elif n < 5:
    print("This won't be printed because n is not strictly less than 5.")
else:
    print("This **will** be printed because n fails all the conditions above.")

Conditions are evaluated from top to bottom, so if a higher one subsumes a lower one, the lower one will never be checked.

In [None]:
n = 5

if n <= 5:
    print("This message will be printed.")
elif n == 5:
    print("Nothing in this block will ever be executed.")
    print("That's because whenever n == 5 holds, the higher-ranked n <= 5 holds, too.")

Particularly simple uses of `if` can be put on a single line using a *ternary if*.
This can make code more elegant, but if you're not perfectly sure how to use ternary `if`, just avoid it.
It's better to sacrifice some elegance than write code that you do not fully understand.

In [None]:
n = 5

print("This message will be printed.") if n == 5 else print("We always need an else part for single-line if.")

print("This message won't be printed.") if n < 5 else print("But this message will be.")

# an even shorter version with print
print("n is 5." if n == 5 else "n is not 5.")

# ternary if for defining a variable
b = 9 if n > 5 else 3
print(b)

# the minimally different line below would not work
# b = 9 if n > 5 else b = 3

### Summary

Normal `if`:

```python
if condition_1:
    # any code you want, but properly indented
elif condition_2:
    # some other code
    # only executed if condition_2 is met and condition_1 isn't
elif condition_3:
    # some other code
    # only executed if condition_3 is met and the previous ones aren't
else:
    # what to do when none of the conditions are met;
    # if this is missing, no code is executed
```

Ternary `if`:
```python
code_if_true if some_condition else code_if_false
```

Ternary `if` is often used to define variables:

```python
var = ternary_if_code
```

### Common mistakes

- Don't forget the colon `:` after the condition.
- Never forget about proper indentation.
- The order of conditions matters.
  More specific conditions should be tested before more general ones.
- Equality is tested with `==` (two equal signs), not `=` (one equal sign).
  The latter is only for defining variables.
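
To see why the order of conditions matters, here is a minimal sketch that runs the same two tests in both orders.
The variable names `wrong_order` and `right_order` are made up for this illustration.

```python
n = 5

# general condition first: the more specific branch can never be reached
if n <= 5:
    wrong_order = "at most 5"
elif n == 5:
    wrong_order = "exactly 5"

# specific condition first: both branches can be reached
if n == 5:
    right_order = "exactly 5"
elif n <= 5:
    right_order = "at most 5"

print(wrong_order)
print(right_order)
```

With `n = 5`, only the second ordering reports that `n` is exactly 5.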

## Conditions

Anything can be used as a condition as long as it evaluates to `True` or `False`, which are called **Booleans**.
Conditions often involve one of the following operators:

- `==` (equals),
- `!=` (does not equal),
- `<` (strictly less than),
- `>` (strictly greater than),
- `<=` (less than or equals),
- `>=` (greater than or equals).

In [None]:
if False:
    print("This message is never printed because False is never true.")
elif True:
    print("This message is always printed because True can never be false.")
else:
    print("This message is never printed because `elif True` preempts it.")

In [None]:
n = 5

if n == 5:
    print("Yes, n equals 5.")
if n != 5:
    print("This doesn't get printed; n != 5 is false.")
if n < 5:
    print("This doesn't get printed; n < 5 is false.")
if n > 5:
    print("This doesn't get printed; n > 5 is false.")
if n <= 5:
    print("Yes, n is less than or equal to 5.")
if n >= 5:
    print("Yes, n is greater than or equal to 5.")

Conditions can be negated with `not`, and they can be combined with `and` and `or`.
Use parentheses to indicate in what order `not`, `and`, and `or` have to be evaluated.

In [None]:
n = 5

if n > 1 or (n < 10 and not n > 4):
    print("This condition is satisfied (because n > 1 holds).")

if (n > 1 or n < 10) and not n > 4:
    print("not gonna happen")
else:
    print("The condition wasn't satisfied (because not n > 4 is false)")

if n > 1 or n < 10:
    print("An or-condition holds even if both requirements are met.")

if n > 10 and n < 10:
    print("not gonna happen")
else:
    print("The condition can never be satisfied. It's equivalent to `if False`.")

### Common mistakes

- Only conditions can be modified by `and`, `or`, and `not`.
  Something like `if n == 3 or 5` does not work (the code will run, but it won't do what you want: Python reads it as `(n == 3) or 5`, and `5` always counts as true, so the condition is met irrespective of the value of `n`).
- Don't use `not ==`. Use `!=` instead.
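
The first mistake can be repaired in at least two ways, sketched below.
The names `broken`, `fixed`, and `also_fixed` are made up, and `bool` is used only to make visible whether the broken condition counts as true.

```python
n = 4

# `n == 3 or 5` is read as `(n == 3) or 5`; since 5 counts as true,
# the condition is met no matter what n is
broken = bool(n == 3 or 5)

# option 1: compare n against each value separately
fixed = n == 3 or n == 5

# option 2: test membership in a list of allowed values
also_fixed = n in [3, 5]

print(broken, fixed, also_fixed)
```

Even though `n` is neither 3 nor 5 here, the broken condition is satisfied, whereas the two repaired ones are not.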

## `while`-loops

As in other programming languages, `while`-loops are like an `if` that keeps repeating until the condition is no longer met.

In [None]:
n = 0

while n < 5:
    print(f"n is currently {n}")
    n = n + 1

print(f"n has reached value {n}. We have left the loop.")

We can use `break` to force Python to leave the loop right away.
This sometimes allows for more elegant code.
Compare the two below.

In [None]:
reply = ""   # hmm, why is this needed??? can't tell at this point

while reply != "No":
    print("What a nice loop.")
    print("Continue looping?")
    reply = input()  # oh, so now we finally know what reply is used for

print("We've left the loop!")

In [None]:
while True:
    # this would loop forever because `True` can never be false
    print("What a nice loop.")
    print("Continue looping?")
    reply = input()
    if reply == "No":
        break  # we're exiting the while loop

print("We've left the loop!")

### Summary

```python
while some_condition:
    some_code  # can contain one or more breaks to exit the loop
```

Instead of the `while`-loop pattern above, we will often use `while True` in combination with `break`.

```python
while True:
    some_code
    if some_condition:
        break
```

### Common mistakes

- Just as with `if`, don't forget the colon `:` at the end of the condition.
- Keep in mind that `if` and `while` serve different purposes.
  Use `if` for code that should be run once when a condition is satisfied.
  Use `while` for code that should be run over and over again until a condition is no longer met.

## Lists

Lists are one of the simplest **data structures** in Python.

In [None]:
number_list = [0, 1, 2, 3, 4]

Lists can contain even very complex objects, such as long strings, variables or other lists.

In [None]:
number_list = [0, 1, 2, 3, 4]
n = 5
another_list = [0, "some string", n, "another string", number_list, [0, [5, 10]], "the last item"]

Items can be added to lists with the `list.append` function.
Whether an item is in a list can be tested with the `in` and `not in` operators:

In [None]:
# we start an empty list
memory = []

while "Star Trek" not in memory:
    print("What's the best Sci-Fi franchise?")
    list.append(memory, input())

The `list.append` function is a specific instance of a range of functions that follow the template
```python
type_of_object.do_something(object, arg_1, ..., arg_n)
```
For all these functions, one can instead use the shorthand `object.do_something(arg_1, ..., arg_n)`.
This is known as a **method** (see the section on functions for details).

With the method shorthand, the loop from above looks as follows.

In [None]:
# we start an empty list
memory = []

while "Star Trek" not in memory:
    print("What's the best Sci-Fi franchise?")
    memory.append(input())

Two lists can be concatenated into a new list with the `+` operator.
The order of arguments matters.

In [None]:
list1 = [0, 1, 2, 3, 4]
list2 = ["a", "b", "c"]
print(list1 + list2)
print(list2 + list1)
print(list1 + list1 + list2 + list1)

Each item in a list can be referenced by its **index**.

In [None]:
list2 = ["a", "b", "c"]
print(list2[0])
print(list2[1])
print(list2[2])

Note that indexation starts at 0, not 1.
Intuitively, each element of a list occurs between two numbers, and we use the one to the left to show us the item.

```python
0 a 1 b 2 c 3
```

If you accidentally use an index that's larger than the one before the last item, you'll get an error message.

In [None]:
print(["a", "b", "c"][2])  # 2 is the index of the last item

In [None]:
print(["a", "b", "c"][3])  # 3 is greater than 2; this means trouble

We can use negative indices to pick out items counting from right to left instead.

```python
-3 a -2 b -1 c -0
```

But note that we still pick out each item based on the index to its left, so there is no point in using `-0`.
Instead, Python treats this the same as `0` (which makes sense if you think of indices as integers).

As with positive indices, though, negative indices aren't allowed to get out of range.

In [None]:
print(["a", "b", "c"][-2])  # -2 is the index of the second item from the right
print(["a", "b", "c"][-3])  # -3 is the index of the third item from the right; still good, even though 3 didn't work

In [None]:
print(["a", "b", "c"][-4])  # oops, no item at this index; this means trouble

In [None]:
print(["a", "b", "c"][-0])  # and what do we get here?
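
One way to think about negative indices: for a list of length `len(l)`, the index `-k` picks out the same item as the index `len(l) - k`.
The sketch below checks this for every valid `k`; the names `letters` and `matches` are made up for the illustration.

```python
letters = ["a", "b", "c"]

# for k from 1 to len(letters), compare the item at index -k
# with the item at index len(letters) - k
k = 1
matches = []
while k <= len(letters):
    matches.append(letters[-k] == letters[len(letters) - k])
    k = k + 1

print(matches)

# -0 is just the integer 0, so it picks out the first item
print(letters[-0] == letters[0])
```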

We can also use indices to pick out parts of a list with **slices**.

In [None]:
list2 = ["a", "b", "c"]
print(list2[1:3])

The index before/after the colon `:` can be omitted.
In that case Python uses the first/last index of the list.

In [None]:
list2 = ["a", "b", "c"]
print(list2[:2])  # from start to 2
print(list2[1:])  # from 1 to end
print(list2[:])   # from start to end

With slices, there is no problem when the second index is too large for the list.

In [None]:
print(["a", "b", "c"][:999])  # a-okay with slices, no problem here

In [None]:
print(["a", "b", "c"][999])  # ouch

This may seem horribly inconsistent to you, but if slices were as picky as single indices they would be very tricky to work with.
Compare the following two code snippets.
Each one returns the first 5 items of a list, but the second does so in a manner that never uses slices that extend beyond the end of the list.
As you can see, the code is a bit more convoluted with no clear gain.

In [None]:
list1 = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
list2 = ["a", "b", "c"]

for l in [list1, list2]:
    print(l[:5])

In [None]:
list1 = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
list2 = ["a", "b", "c"]

for l in [list1, list2]:
    end = min(5, len(l))
    print(l[:end])

**Caution:** slices always return lists, whereas a single index returns an item from the list.
Compare the following:

In [None]:
list2 = ["a", "b", "c"]
print(list2[2])
print(list2[2:])

**Caution:** Lists and variable assignments can interact in unexpected ways.
When two variables `foo` and `bar` refer to the same list, using `foo` with an operation that changes the list also affects `bar`.
But using `foo` with an operation that creates a new list means that `foo` now refers to that new list whereas `bar` still refers to the original list without any changes.

In [None]:
# .append changes an existing list, variables behave as expected
foo = ["a"]
bar = foo
bar.append("b")
print(foo)
print(bar)

In [None]:
# + creates a new list, old variable assignments still refer to the original list
foo = ["a"]
bar = foo
bar = bar + ["b"]
print(foo)
print(bar)
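
If `bar` should start out with the same items as `foo` but change independently, one option is to copy the list with a full slice, since slices always build a new list.
This is only a minimal sketch of that idea.

```python
foo = ["a"]
bar = foo[:]     # slice from start to end: a new list with the same items
bar.append("b")  # changes only the copy

print(foo)
print(bar)
```

Here `.append` leaves `foo` untouched because `bar` no longer refers to the same list.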

### Summary

```python
# general list format
[item1, item2, item3]
# adding an item to a list
list.append(some_list, item)
# using the append-method instead
some_list.append(item)
# checking membership
item in some_list
item not in some_list
# concatenating lists
list1 + list2 + list3 + list4  # and so on
# using indices
list1[some_index]
# and slices for extracting sublists
list1[start_index:end_index]
```

### Common mistakes

- Lists use square brackets, not parentheses or curly braces.
- Don't confuse `.append` (adding items to an existing list) and `+` (concatenating lists).
  Something like `some_list + 5` won't work!
- Also, `.append` modifies the existing list, whereas `+` constructs a new list.
  This matters for variable assignment.
  So `some_list.append(5)` does not do the same as `some_list + [5]`.
- Do not confuse functions and methods.
  You can use `list.append(some_list, 5)` or `some_list.append(5)`, but not `append(some_list, 5)`, or `some_list.list.append(5)`, or `some_list.append(some_list, 5)`.
- Indices are numbered left-to-right starting from 0, not 1.
- Never use an index that's too large for the list.
- Don't confuse index notation `[some_index]` and slice notation `[start_index:end_index]`.

## More on strings

Strings are very similar to lists.

In [None]:
# defining strings
string1 = "What a"
string2 = "lovely string"

# concatenation with +; note that we have to manually add the space between them
print(string1 + " " + string2)

# membership test with in and not in
if "hat " in string1:
    print("We can easily check for substrings.")
if "ING" not in string2:
    print("But the checks are case sensitive.")

However, there is no counterpart to `.append`.

In [None]:
str.append("foo", "bar")

In [None]:
"foo".append("bar")

In addition, the functions `str.upper`, `str.lower`, `str.title` (or their corresponding methods) can be used to convert a string into a new one with appropriate capitalization.

In [None]:
improper_cap = "This String, it is CaPitaLizEd BADLY!!!"

# all upper case
print(str.upper(improper_cap))  # function
print(improper_cap.upper())     # method; don't forget about () at the end

# all lower case
print(str.lower(improper_cap))  # function
print(improper_cap.lower())     # method; don't forget about () at the end

# all title case
print(str.title(improper_cap))  # function
print(improper_cap.title())     # method; don't forget about () at the end

**Caution:** These methods create new strings; they do not modify the current string.
This matters when using variables.

In [None]:
var = "foo"
print(var.upper())
print(var)  # still lowercase

In [None]:
var = "foo"
var = var.upper()
print(var)  # now it is uppercase

Keep in mind that Python does not ignore capitalization differences by default.

In [None]:
print("String" == "STRING")
print("String".lower() == "STRING".lower())

### Summary

```python
# use double quotes to avoid issues with apostrophes
"somebody's favorite string"
# concatenation with +
"some string" + " and some other string"
# change capitalization with function
str.upper(some_string)
str.lower(some_string)
str.title(some_string)
# change capitalization with method
some_string.upper()
some_string.lower()
some_string.title()
# check substring
some_substring in some_string
some_substring not in some_string
```

### Common mistakes

- When concatenating strings with `+`, you have to handle whitespace yourself.
  The output of `"some" + "string"` is `"somestring"`, not `"some string"`.
- When using methods for changing capitalization, don't forget about the parentheses at the end.
  It is `some_string.upper()`, not `some_string.upper`.

## Sets

Sets are very similar to lists except that they

1. are unordered, and
1. do not contain duplicates.

In [None]:
list1 = ["some string"]
list2 = ["some string", "another string"]
list3 = ["another string", "some string"]
list4 = ["some string", "some string", "another string"]

# all four lists are distinct from each other
the_lists = [list1, list2, list3, list4]
for l1 in the_lists:
    print(l1, "is the same as")
    for l2 in the_lists:
        if l1 != l2:
            print(l2, l1 == l2, sep=": ")
    print("-----")

In [None]:
list1 = ["some string"]
list2 = ["some string", "another string"]
list3 = ["another string", "some string"]
list4 = ["some string", "some string", "another string"]

# but as sets, list2, list3, and list4 are all the same
the_lists = [list1, list2, list3, list4]
for l1 in the_lists:
    print(set(l1), "is the same as")
    for l2 in the_lists:
        if l1 != l2:
            print(set(l2), set(l1) == set(l2), sep=": ")
    print("-----")

Sets can be defined in two different ways.
Either you convert a list to a set using the `set()` function, or you directly define the set using curly braces.

In [None]:
set(["some string", "another string"]) == {"some string", "another string"}

You can also use `.add` to add elements to a set.
Since sets have no order, it does not matter in what order elements are added.

In [None]:
var = set()  # start with an empty set
var.add(5)
var.add("a")
print(var)

In [None]:
foo = set()
foo.add(5)
foo.add("a")

bar = set()
bar.add("a")
bar.add(5)

print(foo == bar)

Sets can be much faster than lists for membership tests, which are still done with `in` and `not in`.

In [None]:
print(0 in {0,1})
print(0 not in {0,1})

But sets are also more limited than lists.
They do not preserve order, cannot contain duplicates, and they can only contain so-called *hashable* objects, e.g. strings and numbers.
Lists, sets, and counters are not hashable and thus cannot be contained in sets.

In [None]:
# allowed
good_set1 = {"this", "is", "okay"}
good_set2 = {0, 3, 938, 16, -5, 3.7, "see", "numbers", "work", "111!11", 5, 5}

In [None]:
bad_set1 = {["this", "is"], "not", "okay"}

In [None]:
bad_set2 = {{"this", "is"}, "not", "okay", "either"}

A common use for sets is to remove duplicates from a list.
One first converts the list to a set, and then the set back to a list.

In [None]:
redundant_list = [0, 0, 0, 0, 0, 0, 0]
print(set(redundant_list))
print(list(set(redundant_list)))
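
Since sets are unordered, the round trip through `set` gives no guarantee that the items come back in their original order.
When the order matters, one possible workaround combines a list with a set.
This is only a sketch; the names `seen` and `unique` are made up for the illustration.

```python
words = ["b", "a", "b", "c", "a"]

seen = set()   # used for fast membership tests
unique = []    # keeps first occurrences in their original order

for word in words:
    if word not in seen:
        seen.add(word)
        unique.append(word)

print(unique)
```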

### Summary

```python
# convert some_list to a set
set(some_list)
# define a set
{item1, item2, ...}
# membership test
item in some_set
item not in some_set
```

### Common mistakes

- Sets are not lists.
  You cannot use indices, slices, `+`, or `.append`.
- Sets are less flexible than lists, they can only contain certain types of objects.
  Don't try to put lists or sets inside sets.
- When in doubt, stick with lists and only use sets if you really need the extra speed.

## `for`-loops

Use a `for` loop to iterate over a "container-like" object, e.g. a list, a string, or a set.
For each item to be iterated over, the `for` loop identifies it with a variable `var` and then carries out some computations that may or may not involve `var`.

In [None]:
for word in ["list", "items"]:
    print(word)

for word in {"set", "items"}:
    print(word)

for char in "string":
    print(char)

for char in "string":
    print("x", end="")

Since there are no restrictions on the code block under a `for`, it can contain other code blocks such as `while` loops and `if` blocks.

In [None]:
for word in ["list", "items"]:
    for char in word:
        if char in "aeiou":
            while True:
                print("I found a vowel!")
                print("Guess what it is!")
                guess = input()
                if guess == char:
                    break
            print("Yes, that's it! How did you know?")
    print(f"Done with {word}")
print("That's it folks!")

You already saw that the code inside a `for` block does not actually have to use the variable defined by the `for`-loop.
As a convention, we will use `_` as the variable name in these cases.

In [None]:
for _ in "string":
    print("x", end="")

### Summary

```python
for var in container:
    do_something
```

### Common mistakes

- You can only iterate over container-like objects (*iterables*).
  Something like `for i in 5` or `for n < 5` does not work.
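
If the goal behind something like `for i in 5` is to repeat code a fixed number of times, one common option is the built-in `range` function, which this recap does not otherwise cover.
A minimal sketch, with the made-up variable name `squares`:

```python
# range(5) is an iterable that yields 0, 1, 2, 3, 4
squares = []
for i in range(5):
    squares.append(i * i)

print(squares)
```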
- As always, pay attention to proper indentation.

## Counters

Counters may look similar to lists, but they work very differently.
A Counter is a collection of **keys**, each one of which is associated with a specific numerical **value**.
The default use for a Counter is to keep track of how often a specific item occurs in a list.

In [None]:
from collections import Counter

words = ["John", "likes" , "John", "and", "only", "John"]
print(Counter(words))

The keys are used to retrieve specific values from the Counter.
You cannot use indices because in contrast to lists, Counters have no fixed order.

In [None]:
from collections import Counter

words = ["John", "likes" , "John", "and", "only", "John"]
counts = Counter(words)

for word in words:
    print(f"Count of {word}: {counts[word]}")
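
Looking up a key that never occurred in the list does not cause an error.
A Counter simply reports a count of 0 for it, and the key is not added to the Counter.
A short sketch with a made-up word list:

```python
from collections import Counter

counts = Counter(["John", "likes", "John"])

print(counts["John"])    # a key that occurs in the list
print(counts["Mary"])    # a key that never occurred
print("Mary" in counts)  # looking up "Mary" did not add it as a key
```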

To get a list of the most common words and their respective counts, use the `Counter.most_common` function, or the `.most_common` method.

In [None]:
from collections import Counter

words = ["John", "likes" , "John", "and", "only", "John", "and", "that", "'s", "a", "fact"]
counts = Counter(words)

# print the 2 most common items; function style
print(Counter.most_common(counts, 2))

# print the 2 most common items; method style
print(counts.most_common(2))

Counters can be iterated over with `for`.
But this will only iterate over the keys, not the values.

In [None]:
from collections import Counter

words = ["John", "likes" , "John", "and", "only", "John", "and", "that", "'s", "a", "fact"]
counts = Counter(words)

for c in counts:
    print(c)

You can use `.keys()` and `.values()` to get the keys and values of a Counter, respectively.
Both are also available as functions (`Counter.keys()` and `Counter.values()`).

In [None]:
from collections import Counter

words = ["John", "likes" , "John", "and", "only", "John", "and", "that", "'s", "a", "fact"]
counts = Counter(words)

# print keys, as function
print(Counter.keys(counts))
# print keys, as method
print(counts.keys())

# print values, as function
print(Counter.values(counts))
# print values, as method
print(counts.values())
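
To go through keys and values together, one option is the `.items()` method, which pairs each key with its value.
The sketch below assumes you are comfortable with a `for`-loop that uses two variables at once; the name `pairs` is made up for the illustration.

```python
from collections import Counter

words = ["John", "likes", "John", "and", "only", "John"]
counts = Counter(words)

# each step of the loop provides one key together with its value
pairs = []
for word, count in counts.items():
    pairs.append(f"{word}: {count}")

print(pairs)
```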

### Summary

```python
# Counters need to be imported
from collections import Counter
# create a counter counts from some_list
counts = Counter(some_list)
# get the n most common items
Counter.most_common(counts, n)  # function style
counts.most_common(n)           # method style
# get the list of keys (i.e. items)
Counter.keys(counts)  # function style
counts.keys()         # method style
# get the list of values (i.e. counts)
Counter.values(counts)  # function style
counts.values()         # method style
```

### Common mistakes

- Counters use keys, not indices.
  Something like `some_counter[1]` won't give you the second item; it looks up the count of the key `1`, which is 0 unless `1` is itself a key.
- If the key is a string, don't forget the quotes.
  To look at the value of `"some_string"`, you have to use `some_counter["some_string"]`, not `some_counter[some_string]`.
- When using the `keys` and `values` methods, don't forget the parentheses.
  Just `some_counter.keys` won't work, it has to be `some_counter.keys()`.
- By default, iterating over a counter with `for` iterates over the keys, not the values.

## Built-in functions

Python has several built-in functions.
Two of them are `print` and `input`, discussed at the beginning of this notebook.
Other important ones are `len`, `sorted`, `max`, `min`, and `sum`.

The `len` function tells you the length of an item.
It works with strings, lists, sets, Counters, and many other container-like objects.

In [None]:
# length of string, by total number of characters
print(len("This is short, but not that short"))

# length of list, by number of items
print(len(["This", "is", "short", "but", "not", "that", "short"]))

# length of set, by number of items;
# note the shorter length because sets do not contain duplicates
print(len({"This", "is", "short", "but", "not", "that", "short"}))

# Counters need to be imported before they can be used
from collections import Counter

# length of Counter, by number of items;
# note the shorter length because Counters do not contain duplicate keys
print(len(Counter(["This", "is", "short", "but", "not", "that", "short"])))

The `sorted` function orders objects.

In [None]:
sorted(["flowers", "are", "pretty"])

It works with any container-like object, including strings, lists, sets, and Counters (for a Counter, the keys are sorted), and it always returns a new list.
However, it cannot be used if the items are of incomparable type.
A list of strings is fine, and so is a list of numbers, but a list of numbers and strings cannot be sorted.

In [None]:
# fine
sorted(["Anne", "punched", "Peter"])

In [None]:
# fine
sorted([.7, 5, -3])

In [None]:
# bad
sorted(["Anne", "punched", "Peter", 3, "times"])

In [None]:
# fine, because 3 is now a string
sorted(["Anne", "punched", "Peter", "3", "times"])

In [None]:
# bad, because we have both strings and lists of strings
sorted(["Anne", "punched", "Peter", ["3", "times"]])

Numbers are sorted from smallest to largest.
Strings are compared character by character, which for ordinary English text roughly yields the following order:

1. punctuation
1. words that start with upper case, in alphabetical order
1. words that start with lower case, in alphabetical order
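
The ordering above can be seen in a small sketch with a made-up word list.
The second call uses the optional `key` argument of `sorted`, which is not covered in this recap; passing `str.lower` makes `sorted` compare lower-cased versions of the strings so that capitalization is ignored.

```python
words = ["pear", "Zebra", "apple", "?"]

# default ordering: punctuation, then upper case, then lower case
default_order = sorted(words)

# comparing lower-cased versions ignores capitalization
ignore_case = sorted(words, key=str.lower)

print(default_order)
print(ignore_case)
```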

The same ordering principles are used by the functions `max` and `min`.
Each one takes an arbitrary number of arguments and returns a specific one of them.
The `max` function returns the argument that would be last if the arguments were rearranged with `sorted`.
Similarly, the `min` function returns the argument that would be first.

In [None]:
print(max(.7, 5, -3))
print(min(.7, 5, -3))
print(max("Anne", "punched", "Peter"))
print(min("Anne", "punched", "Peter"))

Since `max` and `min` presuppose an ordering, their arguments must be comparable in the same manner outlined above for `sorted`.

Finally, the `sum` function takes a list of numbers and returns their sum.

In [None]:
sum([3, 17, .7, 8])

In [None]:
# this does not work
sum(3, 17, .7, 8)

In [None]:
# nor does this
sum([3, 17], [.7, 8])

In [None]:
# strings cannot be summed
sum(["does", "not", "work"])

In [None]:
# but summing the values of a Counter is fine
from collections import Counter
sum(Counter(["does", "not", "not", "work"]).values())

## Custom functions

Like almost every other programming language, Python allows you to define your own functions.
Functions take a fixed number of arguments and return a specific output.
You should always add a docstring that describes what the function does.
For simple functions, docstrings can be very short.

In [None]:
# define a new function for adding 1
def add_1(n):
    """Increment n by 1."""
    # return the result of increasing n by 1
    return n + 1


# a definition by itself does not do anything;
# but we can now call the function wherever we want
m = 0
print("The number is", m)
while m <= 5:
    print("Call add_1 function to increment number by 1")
    m = add_1(m)
    print("The number is now", m)
    print("---")

Functions can have multiple `return` statements.
But as soon as the first `return` is reached, the function ends and outputs the value specified by this `return` statement.

In [None]:
def towards_10(n):
    """Gradually shift n towards 10."""
    if n < 10:
        print("towards_10 info: n too small, incrementing by 3")
        return n + 3
    elif n > 10:
        print("towards_10 info: n too large, decrementing by 1")
        return n - 1
    else:
        print("towards_10 info: n is already 10")
        return n


m = 0
print("The number is now", m)
print("---")
while True:
    print("Call towards_10 function to get the number closer to 10")
    m = towards_10(m)
    print("The number is now", m)
    print("---")
    if m == 10:
        print("Call towards_10 function to get the number closer to 10")
        m = towards_10(m)
        print("The number is now", m)
        print("---")
        break

In [None]:
def contains_small_prime(number_list):
    """Check list for primes < 10."""
    for n in number_list:
        if n in [2, 3, 5, 7]:
            # we found a small prime;
            # return True and stop here
            return True
    # we have made it through the whole for-loop without returning;
    # hence the list does not contain a small prime
    return False


print(contains_small_prime(["Alex", "Babs"]))
print(contains_small_prime([0, 593, 7]))

Always keep in mind that a function ends as soon as the first `return` is encountered.
You cannot use `return` inside a loop to get multiple outputs.

In [None]:
def broken_search_all_shorts(string_list):
    """Return all short strings in the list."""
    for s in string_list:
        if len(s) < 3:
            return s

print(broken_search_all_shorts(["a", "bee", "is", "not", "dangerous"]))
print("Intended output: a, is")
print("But as you can see, the function stops with the first return.")
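
One common way around this is to collect the matches in a list and return that list once the loop is done.
The cell below is a minimal sketch of this idea; the name `search_all_shorts` and the example words are made up for illustration.

```python
def search_all_shorts(string_list):
    """Return a list of all strings with fewer than 3 characters."""
    shorts = []
    for s in string_list:
        if len(s) < 3:
            # remember the match and keep going instead of returning
            shorts.append(s)
    # a single return after the loop hands back everything we collected
    return shorts


found = search_all_shorts(["a", "wasp", "is", "quite", "dangerous"])
print(found)
```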

You can use `return` together with ternary `if` (see also the earlier section on `if`).

In [None]:
# the long version without ternary if
def same_length_concat(string1, string2):
    """Concatenate strings of same length, otherwise return first string."""
    if len(string1) == len(string2):
        return string1 + string2
    else:
        return string1


print(same_length_concat("Hi ", "Sue"))
print(same_length_concat("Hi ", "John"))

In [None]:
# same functionality, but shorter and more readable
def same_length_concat(string1, string2):
    """Concatenate strings of same length, otherwise return first string."""
    return string1 + string2 if len(string1) == len(string2) else string1


print(same_length_concat("Hi ", "Sue"))
print(same_length_concat("Hi ", "John"))

Functions are black boxes in the sense that nothing that happens inside the function is accessible from the outside.
The only accessible material is whatever is output by `return`.

In [None]:
def blackbox():
    """Define variable var_a."""
    var_a = "This variable exists inside the function"
    print(var_a)


# calling blackbox to define var_a
blackbox()
# but this still gives an error because var_a does not exist outside blackbox
print(var_a)

For the same reasons, variables inside functions do not overwrite the value of variables outside the function.

In [None]:
var_b = "This variable exists outside the function"
print(var_b)


def blackbox():
    """Define variable var_b inside the function."""
    var_b = "This variable exists inside the function"
    print(var_b)

# call blackbox
blackbox()
# but the value of var_b outside the function is still the same
print(var_b)
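
To get a value out of the black box, the function has to `return` it, and the caller has to store the returned value.
Here is a minimal sketch; the names `make_greeting`, `greeting`, and `received` are made up for illustration.

```python
def make_greeting():
    """Build a string inside the function and return it."""
    greeting = "This string was created inside the function"
    return greeting


# the variable name greeting stays inside the function,
# but its value can be stored under any name outside of it
received = make_greeting()
print(received)
```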

### Summary

```python
def function_name(arg_1, arg_2, ..., arg_n):
    # some code block, perhaps containing one or more returns
```

### Common mistakes

- Don't confuse **defining** a function and **calling** a function.
  First you define the function to specify what it does.
  Later on, you can call it to execute the code in the function.
  Just defining the function does not do anything.
  
- Don't confuse `print` and `return`.
  The `print` function is only used to show messages to the user, whereas `return` is needed to pass a value out of a function.
  
- The function ends as soon as it encounters the first `return`.
  You cannot return multiple times within the same function call.
  
- Functions are black boxes.
  Variables defined inside a function cannot be used outside of it.
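
The difference between `print` and `return` is easy to see when we try to store a function's output in a variable.
The two functions below are hypothetical variants of `add_1`, written only for this comparison.

```python
def add_1_print(n):
    """Show n + 1 to the user, but return nothing."""
    print(n + 1)


def add_1_return(n):
    """Return n + 1 without showing anything."""
    return n + 1


printed = add_1_print(5)    # shows 6, but printed is None
returned = add_1_return(5)  # shows nothing, but returned is 6
print("Value from add_1_print:", printed)
print("Value from add_1_return:", returned)
```

A function without a `return` statement still outputs a value, namely `None`, which is rarely what you want.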

## Libraries/modules/packages

Python comes with **libraries** that provide additional functionality for specialized purposes.
A different name for libraries is **modules**.
A **package** is a collection of modules.
Packages and modules are imported in exactly the same way; packages just tend to be much larger than modules.

In [None]:
# we load the random library
import random

# and now we use one of its specialized functions
chosen_item = random.choice([0, 1, 2, 3])
print(chosen_item)

If you import some library `foo` with `import foo`, the function `bar` of library `foo` will be available as `foo.bar`.
In some cases, this can create very long function names.

In [None]:
import urllib.request
url = "http://thomasgraf.net/images/graf.jpg"
urllib.request.urlretrieve(url, "graf.jpg")  # this only works if your CoCalc account has been upgraded already!

In those cases, we can use `from foo import bar` instead.

In [None]:
from urllib.request import urlretrieve
url = "http://thomasgraf.net/images/graf.jpg"
urlretrieve(url, "graf.jpg")  # this only works if your CoCalc account has been upgraded already!

This is particularly useful with the `pprint` library, which is often only used for its `pprint` function.
The name `pprint` is short for *pretty print*.

In [None]:
import pprint
long_list = ["This", "long", "list", "contains", "many", "words", ",", "including", "supercalifragilisticexpialidocious"]
print("long_list with `print`:")
print(long_list)
print("\nAnd now with `pprint`:")
pprint.pprint(long_list)

In [None]:
from pprint import pprint
long_list = ["This", "long", "list", "contains", "many", "words", ",", "including", "supercalifragilisticexpialidocious"]
print("long_list with `print`:")
print(long_list)
print("\nAnd now with `pprint`:")
pprint(long_list)   # now we can use the shorter command

## Regular expressions

A regular expression (regex) is a way of defining string patterns.
What happens to the parts of a string that match the pattern depends on the function that uses the regex:

- `re.findall`: return a list of all matches
- `re.sub`: replace each instance of the matched pattern by some other string

In [None]:
import re
re.findall(r"a", "banana")  # find all instances of "a" in banana

In [None]:
import re
re.findall(r"an", "banana")  # find all instances of "an" in banana

In [None]:
import re
re.findall(r"a+", "aardvark")  # find all sequences of one or more "a" in aardvark

In [None]:
import re
re.findall(r"[ae]n", "unrepentant")  # find all instances of a or e, followed by n

In [None]:
import re
re.findall(r"[ae]+", "these algae meals")  # find all sequences of one or more characters that are a or e

In [None]:
import re
re.findall(r"\w", "HI! I'm Cheffie!")  # find all word characters (excludes space and punctuation)

In [None]:
import re
re.findall(r"\w+", "HI! I'm Cheffie!")  # find all sequences of word characters (i.e. words)

In [None]:
import re
re.findall(r"[\w']+", "HI! I'm Cheffie!")  # find all sequences of word characters and/or apostrophes

In [None]:
import re
re.sub(r"\w+", r"word", "HI! I'm Cheffie!")  # replace every word by "word"

For practicing regular expressions on your own, try [pythex](https://pythex.org/).

### Summary

```python
re.findall(regex, some_string)  # return list of matching substrings
re.sub(regex, replacement, some_string)  # replace all matches of regex by the string replacement
```

- Regex patterns are written as raw strings, that is, strings prefixed by `r`.
- Syntax for regex matching:
    - `a` matches character `a`
    - `[abc]` matches every character that is `a`, `b`, or `c`
    - `+` matches one or more instances of the preceding regex
- Special regex abbreviations:
    - `\w` matches a word character (letter, digit, or underscore)
    - `\d` matches a digit (0-9)
    - `\s` matches a whitespace character (space, tab, newline)
    - `\W` matches a character that is **not** a word character
    - `\D` matches a character that is **not** a digit
    - `\S` matches a character that is **not** a whitespace character
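
The abbreviations that did not show up in the cells above can be tried out in the same way.
This is just a sketch; all example strings and variable names are made up for illustration.

```python
import re

text = "Room 101 is on floor 3"  # made-up example string

digits = re.findall(r"\d+", text)                # sequences of digits
non_digits = re.findall(r"\D+", "ab12cd")        # sequences of non-digits
spaces = re.findall(r"\s", "a b\tc")             # single whitespace characters
non_words = re.findall(r"\W", "Hi! I'm")         # characters that are not word characters
chunks = re.findall(r"\S+", "Hi! I'm Cheffie!")  # sequences of non-whitespace characters

print(digits)
print(non_digits)
print(spaces)
print(non_words)
print(chunks)
```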

### Common mistakes

- Don't forget about the `r` before the regex-string.
- When defining alternatives, don't use commas.
  You have to use `[abc]`, not `[a,b,c]`.
- `\w` and `\d` are not mutually exclusive.
  Whatever matches `\d` also matches `\w` because the latter matches both letters and digits.
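
Two of these mistakes are easy to reproduce.
The following sketch uses made-up test strings: the first pair shows that a comma inside `[...]` is treated as just another character to match, and the second pair shows that `\w` also matches whatever `\d` matches.

```python
import re

# the commas inside [...] are treated as characters to match
with_commas = re.findall(r"[a,b,c]", "a,b")
without_commas = re.findall(r"[abc]", "a,b")
print(with_commas)
print(without_commas)

# \d matches only the digits, whereas \w matches every character of this string
only_digits = re.findall(r"\d", "a1_b2")
word_chars = re.findall(r"\w", "a1_b2")
print(only_digits)
print(word_chars)
```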

## Comprehensions

Comprehensions allow you to build up container-like objects like sets and lists without the use of a separate `for`-loop.
The most common type is list comprehensions, but set comprehensions work almost exactly the same way.

In [None]:
# doubling all items in a list, long way
number_list = [1, 3, 17, 91]
doubled = []
for n in number_list:
    doubled.append(2 * n)
print(doubled)

In [None]:
# doubling all items in a list, with comprehension
number_list = [1, 3, 17, 91]
doubled = [2 * n for n in number_list]  # list comprehension uses [ and ]
print(doubled)

In [None]:
# only add number and double it if n > 10
number_list = [1, 3, 17, 91]
doubled = [2 * n for n in number_list if n > 10]
print(doubled)

In [None]:
# a set comprehension
number_list = [1, 3, 17, 91, 17, 3, 1]
doubled = {2 * n for n in number_list}  # set comprehension uses { and }
print(doubled)

In [None]:
# a more complex list comprehension
words = ["C", "Python", "LaTeX", "awk"]
multipliers = [1, 2, 3]
copies = [w.lower() * m for w in words for m in multipliers]
print(copies)

In [None]:
# the previous example without using comprehension
words = ["C", "Python", "LaTeX", "awk"]
multipliers = [1, 2, 3]
copies = []
for w in words:
    for m in multipliers:
        copies.append(w.lower() * m)
print(copies)

Comprehensions are not only more convenient than standard `for`-loops, but usually also **faster**.
But they cannot do everything.

In [None]:
# there are no string comprehensions
abc = ""
for m in [1, 2, 3]:
    for char in ["a", "b", "c"]:
        abc = abc + char * m
print(abc)
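
A possible workaround, sketched here under the assumption that you are comfortable with the standard string method `join`, is to build a list of the pieces with a comprehension and then glue the pieces together.
The variable name `pieces` is made up for illustration.

```python
# build the pieces with a list comprehension, then join them into one string
pieces = [char * m for m in [1, 2, 3] for char in ["a", "b", "c"]]
abc = "".join(pieces)
print(abc)
```

This should produce the same string as the nested loops.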

In [None]:
# some lists cannot be built with comprehensions
memory = [1, 2]
for n in [3, 4, 5, 6, 7, 8]:
    if n == memory[-1] + memory[-2]:
        memory.append(n)
print(memory)

### Summary

```python
# list comprehension
[some_func(var) for var in some_container if some_condition]
# set comprehension
{some_func(var) for var in some_container if some_condition}
```

### Common mistakes

- Don't underuse comprehensions.
  They can make your code a lot more readable and faster.
  Whenever you want to define an empty list and add elements to it with a `for`-loop, ask yourself if a comprehension would do instead.
- Don't overuse comprehensions.
  Nested comprehensions are allowed, but can be hard to figure out.

## Miscellaneous

### Basic mathematics

In [None]:
# addition
print(5 + 3)     # produces an integer
print(5.0 + 3)   # produces a float (= decimal number)
print(5 + 3.0)   # also produces a float

In [None]:
# subtraction
print(5 - 3)
print(5.0 - 3)
print(5 - 3.0)

In [None]:
# multiplication
print(5 * 3)
print(5.0 * 3)
print(5 * 3.0)

In [None]:
# exponentiation
print(5 ** 3)
print(5.0 ** 3)
print(5 ** 3.0)

In [None]:
# division
print(5 / 3)    # produces a float, not an integer!
print(5.0 / 3)  # produces a float, as expected
print(5 / 3.0)  # produces a float, also as expected

In [None]:
print(int(5 / 3))  # this converts the float to an integer
                   # -> cuts off all the decimals
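
Cutting off the decimals is not the same as rounding down, which matters for negative numbers.
The sketch below contrasts `int` with Python's floor division operator `//`, which is not used elsewhere in this recap and is included here only for comparison.

```python
# int() drops the decimals, so it rounds towards zero
print(int(5 / 3))    # 1.666... becomes 1
print(int(-5 / 3))   # -1.666... becomes -1

# the // operator (floor division) always rounds down instead
print(5 // 3)
print(-5 // 3)
```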

### Augmented assignments

When changing the value of a variable, use augmented assignments if possible.
They are more concise and sometimes also faster.

In [None]:
# clunky
some_var = "hi"
some_var = some_var + " there"
print(some_var)

In [None]:
# augmented assignment
some_var = "hi"
some_var += " there"
print(some_var)

In [None]:
# clunky
some_var = "ma"
some_var = some_var * 2
print(some_var)

In [None]:
# augmented assignment
some_var = "ma"
some_var *= 2
print(some_var)

In [None]:
# clunky
some_var = 0
some_var = some_var + 1
print(some_var)

In [None]:
# augmented assignment
some_var = 0
some_var += 1
print(some_var)

In [None]:
# clunky
some_var = 2
some_var = some_var * 3
print(some_var)

In [None]:
# augmented assignment
some_var = 2
some_var *= 3
print(some_var)

```python
# some augmented assignments
some_var += n   # addition/concatenation
some_var *= n   # multiplication/string copying
some_var -= n   # subtraction/does not work with strings
some_var /= n   # division/does not work with strings
some_var **= n  # exponentiation/does not work with strings
```
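
As a quick check of this overview, here is a sketch that chains all five augmented assignments on a number and then tries one that is not defined for strings.
The `try`/`except` wrapper is only there so that the cell keeps running after the error; the name `some_string` is made up for illustration.

```python
some_var = 10
some_var += 2    # 12
some_var *= 3    # 36
some_var -= 6    # 30
some_var /= 4    # 7.5 (division always produces a float)
some_var **= 2   # 56.25
print(some_var)

# subtraction is not defined for strings, so this raises a TypeError
some_string = "hi there"
try:
    some_string -= " there"
except TypeError as error:
    print("Does not work:", error)
```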

### Type annotations for functions

In Python, each object has a **type** associated with it.

**Object** | **Type**
-:         | :-
string     | `str`
list       | `list`
set        | `set`
Counter    | `collections.Counter`
integer    | `int`
decimal number | `float`

In [None]:
type("hi")

In [None]:
type(["some", "list", 2, 3, ["with some stuff", "in it"]])

In [None]:
type({"a", "set"})

In [None]:
from collections import Counter
type(Counter(["a", "counter", "from", "a", "list"]))

In [None]:
type(3)

In [None]:
type(3.0)

Sometimes it can be useful to add type information to a function to avoid ambiguities.

In [None]:
# Is the function below meant for numbers or strings?
# It works with both...
def mystery_function(a, b):
    return a + b * 2


print(mystery_function(5, 3))
print(mystery_function("ba", "na"))

In [None]:
# Oh, I see:
# a is a string, and
# b is a string, and
# the function outputs a string.
def mystery_function(a: str, b: str) -> str:
    return a + b * 2


# but this information is just for us,
# Python still allows any arbitrary argument
print(mystery_function(5, 3))
print(mystery_function("ba", "na"))

Type annotations are not essential for normal usage, but in combination with type checking tools like *mypy* they can help avoid some bugs.
Still, many Python programmers don't use type annotations and instead describe the types in the function's docstring if necessary.
In this course, however, we will sometimes use type annotations where they increase clarity.
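
The following sketch shows both sides of type annotations: Python stores them with the function, but it does not check them when the function is called.
The function `repeat_word` is made up for illustration.

```python
def repeat_word(word: str, times: int) -> str:
    """Return word repeated the given number of times."""
    return word * times


# the annotations are stored with the function ...
print(repeat_word.__annotations__)

# ... but Python does not check them when the function is called
print(repeat_word("na", 2))   # matches the annotations
print(repeat_word(3, 2))      # does not match, yet runs without error
```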