#| label: intro

# Course material 2
## Lesson 3 (25.10.2023)

> Disclaimer: Material is taken from:
> 
> + The Alan Turing Institute, course "Research Software Engineering with Python". Retrieved from: https://alan-turing-institute.github.io/rse-course/html/module01_introduction_to_python/01_05_dictionaries.html
> + Downey, A. (2012). Think Python. O'Reilly Media, Inc.

## Dictionaries

Last time we introduced, among others, lists, tuples, and ranges.<br/>
We also learned how to access an element of these *sequence* types.<br/>
Now we turn to a further type, the Python dictionary (`dict`), in which we access an element (a `value`) by using a `key`.<br/>
There are different ways of creating a dictionary:

+ use curly brackets: `{"key": value}`
+ create an empty dictionary and then add new entries: `d = dict()` (or `d = {}`) and then `d["key"] = value`

In [4]:
# access an element from a list
var_list = [13, 25, 16]
# get the 2nd element
var_list[1]

# create a dict using curly brackets
dict1 = {"lea": 13,
         "tom": 25,
         "gerrit": 16
        }
dict1

{'lea': 13, 'tom': 25, 'gerrit': 16}
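
The second way listed above, starting from an empty dictionary and adding entries one at a time, might look like the following minimal sketch. The name `ages` is only illustrative and does not appear elsewhere in this material.

```python
# start with an empty dictionary ({} and dict() are equivalent)
ages = {}

# add entries one at a time by assigning to a key
ages["lea"] = 13
ages["tom"] = 25
ages["gerrit"] = 16

# assigning to an existing key overwrites its value
ages["tom"] = 26

print(ages)  # {'lea': 13, 'tom': 26, 'gerrit': 16}
```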

An advantage of the dictionary is that you can create keys and values without knowing in advance which or how many keys you need.<br/>
See the following example:

In [7]:
def count_letters(string):
    # create an empty dictionary
    dic = dict()
    # for each letter in the input string, do:
    for letter in string:
        # check whether the letter is already in the dictionary
        if letter not in dic:
            # create new entry and pass a count value of 1
            dic[letter] = 1
        # letter is already in the dictionary
        else:
            # set the counter for the respective letter one count up
            dic[letter] += 1
    # after each letter in string was checked return the dictionary        
    return dic

count_letters("helloWorld")

{'h': 1, 'e': 1, 'l': 3, 'o': 2, 'W': 1, 'r': 1, 'd': 1}

You can get all keys and values of a dictionary by using `keys()` and `values()`.<br/>
When you want to access the value of a particular key, you can do this much as we learned for sequences: `dict[key]`.

When you write a function and do not know which of the possible keys are actually in a dictionary, <br/>
you can use the `get(key, default)` method. It returns the value of the key if the key exists <br/>
and the default value otherwise; if no default is specified, it returns `None`.

In [21]:
letter_dict = count_letters("helloworld")
# show all keys
letter_dict.keys()
# show all values
letter_dict.values()

# access the value of a specific key
letter_dict["e"]

# sometimes you know all possible keys but not which ones are actually present.
# the get method returns the value of a key if it is in the dictionary
# or a default value otherwise (here we set 0 as the default value)
letter_dict.get("x", 0)
# Note that the following will give an error
letter_dict["x"]

KeyError: 'x'
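
As a sketch of how `get` can simplify code, the counting function from above could be written without the `if`/`else` branch. The name `count_letters_get` is hypothetical and is chosen here only to distinguish it from the original `count_letters`.

```python
def count_letters_get(string):
    # hypothetical variant of count_letters that relies on dict.get
    dic = dict()
    for letter in string:
        # get returns the current count, or 0 if the letter is new
        dic[letter] = dic.get(letter, 0) + 1
    return dic

print(count_letters_get("helloworld"))
# {'h': 1, 'e': 1, 'l': 3, 'o': 2, 'w': 1, 'r': 1, 'd': 1}
```

Whether this reads better than the explicit branch is a matter of taste; both versions produce the same dictionary.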

Lists can appear as values in a dictionary. For example, if you were given a dictionary that maps from letters to frequencies, <br/>
you might want to invert it; that is, create a dictionary that maps from frequencies to letters.

In [25]:
def invert_dict(dic_old):
    # create an empty dictionary
    dic_new = dict()
    # for each key in the input dictionary, do:
    for key in dic_old:
        # get the value of the corresponding key from the old dict
        val = dic_old[key]
        # use the value as key for the new dict
        # check whether the key (old value) already exists
        if val not in dic_new:
            # if the key is new, create a new entry in the dict whose
            # value is a list
            dic_new[val] = [key]
        else:
            # if the key already exists, add the new value to the existing list
            dic_new[val].append(key)
    # return the inverted dictionary
    return dic_new

invert_dict(letter_dict)

{1: ['h', 'e', 'w', 'r', 'd'], 3: ['l'], 2: ['o']}

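Lists can be values in a dictionary, but they cannot be keys. A dictionary finds its entries by hashing the key, <br/>
and this works fine as long as the keys are immutable. If the keys were mutable, like lists, bad things would happen. <br/>
For example, when you create a key-value pair, Python hashes the key and stores it in the corresponding location. <br/>
If you modified the key and then hashed it again, it would go to a different location. <br/>
In that case you might have two entries for the same key, or you might not be able to find a key.

For this reason, only hashable objects can be used as keys. In practice these are immutable types such as strings, numbers, and tuples whose elements are themselves immutable.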

In [31]:
# lists are mutable and therefore unhashable, so this raises
# TypeError: unhashable type: 'list'
{["key11", "key12"]: 1,
 ["key21", "key22"]: 2}

# tuples are immutable, so here it works
{("key11", "key12"): 1,
 ("key21", "key22"): 2}

{('key11', 'key12'): 1, ('key21', 'key22'): 2}
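
Whether an object can serve as a key can be checked with the built-in `hash` function. The following is a minimal sketch; the helper `is_hashable` is hypothetical, and it uses `try`/`except`, which has not been introduced in this course yet, only to catch the error instead of stopping the cell.

```python
def is_hashable(obj):
    # hypothetical helper: True if Python can compute a hash for obj
    try:
        hash(obj)
        return True
    except TypeError:
        return False

print(is_hashable(("key11", "key12")))    # True
print(is_hashable(["key11", "key12"]))    # False
# a tuple that contains a list is not hashable either
print(is_hashable(("key11", ["key12"])))  # False
```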

Since Python 3.7, dictionaries retain the order in which their elements were inserted.

In [36]:
# print original order of keys
letter_dict

# create a new dictionary with keys ordered alphabetically
ordered_dict = {key: letter_dict[key] for key in sorted(letter_dict)}
ordered_dict

{'d': 1, 'e': 1, 'h': 1, 'l': 3, 'o': 2, 'r': 1, 'w': 1}

## Sets

A set is an unordered collection which cannot contain the same element twice. We can create a set in two ways:

+ either by calling `set()` on any iterable, e.g. a list or string,
+ or by defining one explicitly using curly brackets: `{1,2,3,4}` (note that empty curly brackets `{}` create a dictionary, not a set)

In [38]:
# use the set function on a string
set("hello world")

# create a set explicitly
{1,2,3,4}


{1, 2, 3, 4}

+ sets have no particular order
+ you can join the elements of a set of strings by using `"".join()`
+ typical set operations can be applied

In [52]:
# create a string
day = "Today is Thursday"
# convert string into a set
day_set = set(day)
# as a set has no particular order it won't maintain the given order 
day_set
# you can join the elements of the set back into a string (in arbitrary order)
"".join(day_set)

# Set operations
A = {1,2,3,4}
B = {3,4,5,6}
# intersection
A & B
A.intersection(B)
# union
A | B
A.union(B)
# difference
A - B
# symmetric difference
A.symmetric_difference(B)

{1, 2, 5, 6}
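
Each set operator has a method counterpart. The sketch below checks that both forms agree and adds a small use case; the `^` operator and the variable `shared` are not part of the cell above and are shown only for illustration.

```python
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

# operator form and method form give the same result
print(A & B == A.intersection(B))          # True
print(A | B == A.union(B))                 # True
print(A - B == A.difference(B))            # True
print(A ^ B == A.symmetric_difference(B))  # True

# a small use case: which letters do two words share?
shared = set("hello") & set("world")
# sorted() returns a list, which gives a reproducible order for printing
print(sorted(shared))  # ['l', 'o']
```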

## Data Structures

We have already seen several ways of structuring data, such as nested lists, dictionaries, and dictionaries with lists. <br/>
When data become more and more complicated, complex nested data structures are very useful.

In [65]:
# let us start by creating two dictionaries
participant1 = {"subject_id": 1, "condition": "A", "item_id": 1}
participant2 = {"subject_id": 2, "condition": "B", "item_id": 1}

# now we collect the participants' information in one list
participants = [participant1, participant2]

# each participant might have a list of responses they gave to a survey
participant1["response"] = [2,3,5,6,3,2,1]
participant2["response"] = [2,5,7,3,2,5,3]

# we may also want to record whether a participant dropped out
participant1["missing"] = False
participant2["missing"] = False

# and we can add even further information ...
# the expression on the right-hand side is called a list comprehension;
# we will come to it later in this session
participant1["buffer"] = [participant1["response"][r] for r in [0,-1]]
participant2["buffer"] = [participant2["response"][r] for r in [0,-1]]

participants

[{'subject_id': 1,
  'condition': 'A',
  'item_id': 1,
  'response': [2, 3, 5, 6, 3, 2, 1],
  'missing': False,
  'buffer': [2, 1]},
 {'subject_id': 2,
  'condition': 'B',
  'item_id': 1,
  'response': [2, 5, 7, 3, 2, 5, 3],
  'missing': False,
  'buffer': [2, 3]}]
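
To read from such a nested structure, you chain the access operations from the outside in. The following sketch uses a shortened copy of the participants data so that it runs on its own; the names `first_response` and `totals` are illustrative.

```python
# illustrative data with the same shape as the participants list above
participants = [
    {"subject_id": 1, "condition": "A", "response": [2, 3, 5, 6, 3, 2, 1]},
    {"subject_id": 2, "condition": "B", "response": [2, 5, 7, 3, 2, 5, 3]},
]

# index into the list first, then into the dictionary, then into the inner list
first_response = participants[0]["response"][0]
print(first_response)  # 2

# collect the sum of responses per subject in a new dictionary
totals = {}
for p in participants:
    totals[p["subject_id"]] = sum(p["response"])

print(totals)  # {1: 22, 2: 27}
```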

# Control flow
## Conditionality 
We almost always need the ability to check conditions and change the behavior of the program accordingly. <br/>
Conditional statements give us this ability. 

### if-statement
The simplest form is the `if` statement. <br/>
The boolean expression after `if` is called the condition. <br/>
If it is true, then the indented statement gets executed. If not, nothing happens.

In [70]:
#x = 2
x = 3
if x%2 == 0:              
    print("x is even.")

There is no limit on the number of statements that can appear in the body, but there has to be at least one. <br/> 
Occasionally, it is useful to have a body with no statements (usually as a placeholder for code you haven’t written yet). <br/>
In that case, you can use the `pass` statement, which does nothing.

In [71]:
if x%2 != 0:
    pass

*Short remark on indentation*

In Python, indentation is semantically significant. You can choose how much indentation to use, <br/>
so long as you are consistent, but **four spaces** is conventional. Note: many editors insert spaces when you press `<tab>`, but not all do, and Python 3 rejects indentation that mixes tabs and spaces inconsistently.<br/>

If there is a problem with the indentation in your code, you will get an `IndentationError`.

In [103]:
if x > 2:
print("something")

IndentationError: expected an indented block after 'if' statement on line 1 (891092710.py, line 2)

### if-else statement
A second form of the `if` statement is alternative execution, in which there are two <br/>
possibilities and the condition determines which one gets executed.

In [72]:
x = 3

if x%2 == 0:
    print("x is even")
else: 
    print("x is odd")

x is odd


Sometimes there are more than two possibilities and we need more than two branches. <br/>
One way to express a computation like that is a chained conditional:

In [85]:
x = -3

if x > 0:
    print("x is positive")
elif x < 0:
    print("x is negative")
else:
    print("x is zero")

# it is also possible to chain multiple elif statements
x = float("inf")

# note: `not float("inf")` is always False, so compare x explicitly
if x > 0 and x != float("inf"):
    print("x is positive")
elif x < 0:
    print("x is negative")
elif x == float("inf"):
    print("x is + infinity")
else:
    print("x is zero")

x is negative
x is + infinity


One conditional can also be nested within another. Although the indentation of the statements makes the structure apparent, <br/>
nested conditionals become difficult to read very quickly. 
In general, it is a good idea to avoid them when you can.

In [89]:
x = 0

if x == float("inf"):
    print("x is + infinity")
else:
    if x > 0:
        print("x is positive")
    elif x < 0:
        print("x is negative")
    else:
        print("x is zero")

x is zero
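
One way to avoid the nesting is to fold the inner conditional into a single chain. The sketch below makes the same decisions as the nested version above; the function name `describe` is hypothetical, and it returns the text instead of printing it so that the result can be reused.

```python
def describe(x):
    # hypothetical helper: same decisions as the nested version,
    # written as one flat if/elif/else chain
    if x == float("inf"):
        return "x is + infinity"
    elif x > 0:
        return "x is positive"
    elif x < 0:
        return "x is negative"
    else:
        return "x is zero"

print(describe(0))             # x is zero
print(describe(float("inf")))  # x is + infinity
print(describe(-3))            # x is negative
```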


### Recursion
It is legal for one function to call another; it is also legal for a function to call itself. <br/>
It may not be obvious why that is a good thing, but it turns out to be one of the most magical things a program can do.

In [92]:
def countdown(n):
    if n <= 0:
        print('Blastoff!')
    else:
        print(n)
        countdown(n-1)

countdown(10)

10
9
8
7
6
5
4
3
2
1
Blastoff!
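
The `countdown` function stops because the branch for `n <= 0` does not call the function again; this branch is called the base case. Recursive functions can also return values. A minimal sketch, assuming the input is a non-negative integer (the function `factorial` is not part of the lesson code):

```python
def factorial(n):
    # base case: stops the recursion
    if n <= 1:
        return 1
    # recursive case: n! = n * (n-1)!
    return n * factorial(n - 1)

print(factorial(5))  # 120
```

Without a base case the function would keep calling itself until Python stops it with a `RecursionError`.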


### Keyboard input
The programs we have written so far are a bit rude in the sense that they accept no input <br/>
from the user. They just do the same thing every time. <br/>
Python provides the `input` function, which gets input from the keyboard. <br/>
When this function is called, the program stops and waits for the user to type something. <br/>
When the user presses Return or Enter, the program resumes and `input` returns what the user typed as a string.

In [102]:
def welcome():
    name = input("What is your name?")
    print("Hello,", name)

welcome()

def addition():
    a = input("a =")
    b = input("b =")
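    # input returns strings, so convert them to int before adding;
    # the name total avoids shadowing the built-in sum
    total = int(a) + int(b)
    print(f"{a} + {b} = {total}")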

addition()

What is your name? f


Hello, f


a = 3
b = 53


3 + 53 = 56
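
Because `input` always returns a string, the conversion with `int` in `addition` matters. The sketch below uses fixed strings as stand-ins for what the user would type, so it runs without keyboard input.

```python
# stand-ins for what input() would return; both are strings
a = "3"
b = "53"

# + on strings concatenates them
print(a + b)            # 353

# converting to int first gives the arithmetic sum
print(int(a) + int(b))  # 56
```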


## Iteration

Another aspect of control flow is *looping* over objects. <br/>
A statement for repetition is the `for` statement.

The *syntax* of a `for` statement is similar to a function definition:

+ It has a header that ends with a colon and
+ an indented body.

The body can contain any number of statements.<br/>
A `for` statement is sometimes called a *loop* because the flow of execution runs through the body and then loops back to the top. 

In [7]:
my_list = [1,2,3,4]
vowels = "aeiou"

for i in my_list:
    print(i)

# Any sequence type is iterable:
for letter in "Hello":
    if letter in vowels:
        print("vowel")
    else:
        print("consonant")

# a dictionary is also iterable
dic = {"Lea": 23, "Paul": 30, "Gerrit": 29}

for name in dic:
    print(f"{name} is {dic[name]} years old.")

# a range is also iterable
for i in range(5):
    print(i)

1
2
3
4
consonant
vowel
consonant
consonant
vowel
Lea is 23 years old.
Paul is 30 years old.
Gerrit is 29 years old.
0
1
2
3
4


Last time we learned about unpacking. You can also use unpacking in the header of a `for` loop.

In [6]:
triples = [[4, 11, 15], [39, 4, 18]]

for first, middle, last in triples:
    print(middle)

# as we don't need the first and last item, we can use _ as a placeholder name
for _, middle, _ in triples:
    print(middle)

11
4
11
4
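
The same unpacking works for dictionaries. The `items()` method, which is not used elsewhere in this lesson, yields each entry as a `(key, value)` pair; a minimal sketch:

```python
dic = {"Lea": 23, "Paul": 30, "Gerrit": 29}

# items() yields (key, value) pairs that can be unpacked in the loop header
for name, age in dic.items():
    print(f"{name} is {age} years old.")
```

This prints the same three lines as the loop over `dic` shown earlier, without the extra lookup `dic[name]`.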


### Break, Continue

+ `continue` skips to the next turn of a loop
+ `break` stops the loop early

Sometimes you don’t know it’s time to end a loop until you get halfway through the body. In that case you can use the `break` statement to jump out of the loop.

For example, suppose you want to take input from the user until they type `done`. You could write:

In [11]:
# The loop condition is `True`, which is always true, 
# so the loop runs until it hits the `break` statement.
while True:
    line = input('> ')
    if line == 'done':
        break
    print(line)
print('Done!')

# another example with break and continue
# for each number in the range from 0 to 8, do:
# if the number is eight, stop the loop
# if the number is even, skip the rest of the body and go on to the next number
for n in range(9):
    if n == 8:
        break
    if n % 2 == 0:
        continue
    print(n)

>  done


Done!
1
3
5
7
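
A common use of `break` in a `for` loop is to stop searching as soon as the first match is found. The list `numbers` and the variable `first_negative` in this sketch are made up for illustration.

```python
numbers = [4, 7, -2, 9, -5]
first_negative = None

for n in numbers:
    if n < 0:
        first_negative = n
        # no need to look at the remaining elements
        break

print(first_negative)  # -2
```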


### List comprehension

If you write a `for` loop inside a pair of square brackets, Python builds a list from the expression in front of it, one element per iteration.

List comprehensions can make code more concise but also harder to read, so use them with care.

In [24]:
# use a for loop
# initialize an empty list
res = []
for x in range(8):
    #if number is even: double
    if x%2 == 0:
        res.append(x + x)
    # if number is odd: square
    else:
        res.append(x * x)

print(res)

# use list comprehension with if-else statement
res = 0
res2 = [x+x if x%2 == 0 else x*x for x in range(8)]
res2

# list comprehension with if statement
# keep only the even numbers and add one to each of them
res3 = [x+1 for x in range(5) if x%2 == 0]
res3

[0, 1, 4, 9, 8, 25, 12, 49]


[1, 3, 5]

If you write two `for` clauses in a comprehension, <br/>
you get a single flat list generated over all the pairs:

In [26]:
# Difference between all combinations of two
# numbers x and y
[x - y for x in range(4) for y in range(4)]

# If you want something more like a matrix, 
# you need to do two nested comprehensions
[[x - y for x in range(4)] for y in range(4)]

[[0, 1, 2, 3], [-1, 0, 1, 2], [-2, -1, 0, 1], [-3, -2, -1, 0]]
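
To see the order in which a comprehension with two `for` clauses runs, it can help to write it out as nested loops. In this sketch the name `flat` is illustrative; the first `for` of the comprehension becomes the outer loop.

```python
flat = []
for x in range(4):        # first for clause: outer loop
    for y in range(4):    # second for clause: inner loop
        flat.append(x - y)

# the loops build the same list as the comprehension
print(flat == [x - y for x in range(4) for y in range(4)])  # True
print(flat[:4])  # [0, -1, -2, -3]
```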

There are also dictionary comprehensions:

In [28]:
{str(x) * 3: x for x in range(3)}

{'000': 0, '111': 1, '222': 2}
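
As with list comprehensions, a dictionary comprehension can include a condition. A minimal sketch with made-up names (`words`, `lengths`, `long_names`):

```python
words = ["lea", "tom", "gerrit"]

# map each word to its length
lengths = {word: len(word) for word in words}
print(lengths)  # {'lea': 3, 'tom': 3, 'gerrit': 6}

# keep only the entries whose value is greater than 3
long_names = {word: n for word, n in lengths.items() if n > 3}
print(long_names)  # {'gerrit': 6}
```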