# <center><u>Introduction to data analysis using Python</u></center>
### <center>July 21-22, 2024</center>
### <center>Eitan Hemed, PhD</center>
### <center>Department of Psychology, University of Haifa</center>


---
<p align="center">
 <b> All materials and exercise solutions are available on </b>
<a style="font-weight:bold" href="https://github.com/EitanHemed/python-workshop-2023">Github</a>
</p>


Welcome to the <u>Introduction to data analysis using Python</u> workshop.
I hope this workshop will show you what can be done with Python (or programming languages in general), and will give you the tools and motivation to continue learning on your own.
-Eitan

# Aim and scope





## Python

Python is a very popular programming language both in industry and academia, for many reasons. One of the main reasons is that Python is highly readable, and can often be read almost as plain English. This makes Python highly accessible to people with little to no experience with code.

In recent years it has become extremely popular in the data science community, as it has a rich ecosystem of libraries for data analysis, machine learning, and visualization. Thus, it is very useful to researchers in the social sciences, who often do not have a strong background in programming.

Python is about 30 years old, and was named after [Monty Python](https://en.wikipedia.org/wiki/Monty_Python). We will be using a modern version of Python (3.10).


### What our workshop is and isn't

Assuming some participants have no experience programming at all, we will spend some time on foundations. It is crucial to get a firm grasp on the basics of Python in order to be able to use it for more complex tasks, such as reading, pre-processing, analyzing and plotting your research data.

The program is packed, and we might skip some planned topics or exercises depending on time constraints. However, the materials will remain available to you indefinitely here or in the GitHub repository.

By the end of this workshop you should have a basic understanding of the Python language, and be able to solve simple data analysis problems. Note that this is not a comprehensive Python course and there are many topics that we will not cover, simply because of time constraints (e.g., writing code modules, object-oriented programming). Suggestions for continued learning will be provided.

----

Learning to program is like learning a new language, and certain concepts are likely to be challenging. This is an introductory workshop, so don't be shy about asking for help and clarifications.

### What are code notebooks?
The workshop uses code notebooks. Code notebooks are a type of file that combines both code and markup (formatted text, images, graphs, etc.). They are popular among researchers and data scientists as they allow you to create a document which combines both the analysis code and the summary of the analysis (e.g., figures, tables, etc.).

### What is Google Colab?

[Google Colaboratory](https://colab.research.google.com/notebooks/intro.ipynb) is a freemium service offered by Google. It allows you to run Python code from your browser and comes pre-installed with almost any package you would ever need for data-related work (including GPU capabilities), and it is easy to add any missing packages related to your project. If you don't want to use Colab you can download the workshop notebooks and run them locally on your computer or on other web services like [Kaggle](https://www.kaggle.com/).


## Schedule (Tentative)

---


*Day 1:*


| Slot        | Subject                        |
|-------------|--------------------------------|
| 08:45-10:15 | Python fundamentals            |
| 10:15-10:30 | Break                          |
| 10:30-12:00 | Python fundamentals (cont.), Modules |
| 12:00-12:30 | Break                          |
| 12:30-14:15 | Modules                        |


---


*Day 2:*


| Slot        | Subject                                                     |
|-------------|-------------------------------------------------------------|
| 09:00-10:30 | Modules?                                                    |
| 10:30-10:45 | Break                                                       |
| 10:45-12:15 | Installing Modules, Working in Google Drive, Group Projects |
| 12:15-12:45 | Break                                                       |
| 12:45-14:00 | Local installation, Continued learning                      |


# Python fundamentals




## Variables

A basic building block in Python (and most other programming languages) is the variable.
In Python, a variable is a name that references some value.
This value can represent a number, some text, or other objects that we will soon get to know.

---

### Your first variable
Variables in Python are assigned using a statement of the form `a = b`. Below we assign the value `3` to `x`. Now `x` will reference the value `3`, until it is changed.

To execute the code cell either place your mouse on the square brackets (`[ ]`) on the left of the cell, or click anywhere within the cell and press `Ctrl+Enter`.


In [None]:
x = 3

Python has a built-in function named `print`, which displays the value of a variable in the notebook or console that we are working in.

While `print` is quite simple, it can also be used for generating complex reports, and is often used for debugging code (although using a debugger is recommended in the long run).

In [None]:
print(x)


More generally, a function is an object which (usually) receives an input and (usually) emits output. We will cover functions more explicitly at a later stage in the workshop. 



A **variable** can **vary** - hence we change the value of `x` to be something else. Note that this overwrites the value that `x` points to.

In [None]:
x = 6 # Assign a new value to x
print(x) # Print to the console the new value of x

Note that in Python code, some of the code can be marked as a comment.

`# This is a comment`

Comments are not evaluated, meaning that the Python interpreter ignores them. 

Comments can be used to clarify for the reader parts of the code that are non-trivial. We will see at a later point how comments are used when documenting more complex code structures than a variable assignment.

Although in a notebook environment you don't necessarily need comments (as you have text cells), you might want to use them to document specific operations within a code cell.

#### $\color{dodgerblue}{\text{Exercise!}}$

So far we have only defined a variable. We said we can refer to a specific value using a variable.

In the next cell, define a variable named `new_x` that has the value of `x` plus two. Remove the comment (`#`) and replace the ellipsis (`...`) where needed.

In [None]:
x = 3
## Remove the comment and replace the ellipsis to create a new variable named new_x
# ... = ... ... ...
# new_x = x + 2

## Constraints on variable names

There are several limitations on the format of variable names you can use.



* Variable names can contain uppercase and lowercase Latin letters.
* Variable names can contain underscores (`_`).
* Variable names can contain digits (0-9), but cannot start with a digit.
* Variable names cannot contain spaces or other special characters, except for underscores (`_`).
* Variable names cannot be one of several reserved [keywords](https://docs.python.org/3/reference/lexical_analysis.html#keywords).

In [None]:
my_age = 31
__AGE__ = 40
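
As a rough sketch, the rules above can also be checked programmatically. The example below relies on the string method `isidentifier` and on the standard-library `keyword` module (importing modules is only covered later in the workshop). The candidate names are made up for illustration.

```python
import keyword  # Standard-library module that knows Python's reserved keywords

# isidentifier() checks the character rules: letters, digits and underscores,
# with no digit in the first position
valid_name = "my_age".isidentifier()
starts_with_digit = "2nd_trial".isidentifier()
contains_space = "reaction time".isidentifier()

# A reserved keyword passes the character rules but still cannot be used as a name
keyword_passes_character_rules = "class".isidentifier()
is_reserved = keyword.iskeyword("class")

print(valid_name, starts_with_digit, contains_space)  # True False False
print(keyword_passes_character_rules, is_reserved)    # True True
```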

Similarly to most programming languages, variable names are case-sensitive. If you run the following code you will get an error, as the Python interpreter looks for a variable named `AGE`, which was never defined.

In [None]:
age = 30
print(AGE)

While errors can be discouraging, they are extremely important, as they are informative, and can help you understand what went wrong, especially when you are working with a long piece of code. Errors have specific types, related to the problem that occurred, and they point to the line in which the error occurred (click on traceback).

There are naming conventions for variables in Python (e.g., when to use all caps as in `AGE`), but they will not be explicitly covered in the workshop. For more info [see here](https://www.python.org/dev/peps/pep-0008/#naming-conventions).

These conventions help you make sense of the code you are reading, whether it is your own or someone else's.


Note that errors are informative - they usually have an elaborate message of what went wrong, have different types which adds more context, and point to the line in which the error occurred. When you are running a long script rather than a cell in a notebook, they can be very helpful in debugging your code.


## Data types
There are several built-in data types, that can be used to represent different types of information, in various ways. They can also be put together to create more complex data structures.

![](https://miro.medium.com/v2/resize:fit:631/1*8J9yViDiXqNYeEWctWkdfQ.jpeg)

[image source](https://miro.medium.com/v2/resize:fit:631/1*8J9yViDiXqNYeEWctWkdfQ.jpeg)


### Integers
Integers are simply whole numbers - numbers that can be written without a fractional component. We worked with them in the previous section. 



Integers can be subjected to various arithmetic operations.

| Operator | Operation        |
|----------|------------------|
| +        | Addition         |
| -        | Subtraction      |
| *        | Multiplication   |
| /        | Division         |
| **       | Exponentiation   |




In [None]:
result = 3 ** 2 - (4 / 2)
print(result, type(result))

Note that the result we received is not an `int`, but a `float`. This is because the division operator (`/`) returns a float, even if the result is a whole number.

### Floats

Floats are numbers that contain a decimal fraction. 

In [None]:
pi = 3.14
p = 0.001
p_without_leading_zero = .001 # You can omit the leading zero, for probability for example

Together with `int`s and other numeric types they can be used to perform mathematical operations.

In [None]:
x = 4
y = 0.5
print(x ** y)

Here are two other operators which are useful in various cases when using numeric types in Python.

| Operator | Operation          |
|----------|--------------------|
| //       | Floor division     |
| %        | Modulo (remainder) |



Floor division (`//`) divides and rounds the result down to the nearest whole number. When both operands are integers, the result is an integer.



In [None]:
print(1 // 2) # Rather than 0.5
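
Two details of floor division are easy to miss, so here is a small sketch of them. The variable names are arbitrary and only serve the illustration.

```python
# With two integers, floor division returns an integer
int_result = 7 // 2
print(int_result, type(int_result))      # 3 <class 'int'>

# If either operand is a float, the result is a (whole-valued) float
float_result = 7.0 // 2
print(float_result, type(float_result))  # 3.0 <class 'float'>

# "Rounding down" means towards negative infinity, not towards zero
negative_result = -7 // 2
print(negative_result)                   # -4
```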

 #### $\color{dodgerblue}{\text{Exercise!}}$

Complete the expression by entering an `int` instead of the `...`, such that the first printed result would be a fractional number and the second would be an integer.

In [None]:
an_int = 4 # 10 / 4 gives 2.5 (fractional), while 10 // 4 gives 2 (an integer)
print(10 / an_int)
print(10 // an_int)

Modulo operation (`%`) returns the remainder when dividing the left-hand number by the right-hand number.

This is very useful when we want to iterate over a set of numbers and identify when a specific condition occurs along a cycle (e.g., differentiate between odd and even numbered trials in your experiment).

In [None]:
# Odd and even numbers
print(4 % 2, 5 % 2, 6 % 2, 7 % 2)
# Days of the week
print(6 % 7, 7 % 7, 8 % 7)


### Booleans
Booleans (`bool` for short) are actually a special case of integers that are used to mark truth values.
There are two booleans, `True` and `False`, which evaluate to 1 and 0, respectively.

We will later see the importance of booleans when learning how to control the flow of a program, or when trying to select a subset of data based on a condition.

In [None]:
print(True + 1)
print(False - True)

However, we can use the built-in `type` function to see that `True` and `False` have a data type of their own and are not simply other names for 1 and 0.

In [None]:
print(type(False), type(0), type(True), type(1))
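
The relationship between `bool` and `int` can be sketched with the built-in `isinstance` function, which this section has not introduced. It checks whether a value belongs to a given type.

```python
# True is a bool, and every bool also counts as an int
true_is_bool = isinstance(True, bool)
true_is_int = isinstance(True, int)

# The values compare equal, even though the types differ
equal_values = (True == 1)
same_type = (type(True) == type(1))

print(true_is_bool, true_is_int)  # True True
print(equal_values, same_type)    # True False
```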

When we make comparison between two values, we get the corresponding `bool`. To make a comparison we can use the following operators:

| Operator | Name                     |
|----------|--------------------------|
| ==       | Equal                    |
| !=       | Not equal                |
| <        | Less than                |
| <=       | Less than or equal to    |
| >        | Greater than             |
| >=       | Greater than or equal to |

In [None]:
3 == 4

In [None]:
3 <= 4

Sometimes we want to evaluate a more complex condition, such as "is this number positive and less than 3?". To do this we can combine comparisons using `and`, `or`, and `not`.
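
As a minimal sketch, the "positive and less than 3" condition could be written as follows. The variable `number` and its value are arbitrary.

```python
number = 2

# Combine two comparisons with `and`
is_positive_and_small = number > 0 and number < 3
print(is_positive_and_small)  # True

# Python also accepts the shorter, math-like chained form
chained_form = 0 < number < 3
print(chained_form)           # True

# `not` flips the truth value of the expression that follows it
is_outside_range = not (0 < number < 3)
print(is_outside_range)       # False
```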



 #### $\color{dodgerblue}{\text{Exercise!}}$

 What would be the result of the following comparisons? Try to answer before running the next cell code.

| Expression       | Result           |
|------------------|------------------|
| 3 == 5 or not 3 == 5 |        ?         |
| 2 < 3 and 3 == 5 |        ?         |
| 2 < 3 or 3 == 5  |        ?         |
| not 2 < 3 or not 3 == 5  |        ?         |
| not 3 == 5       |        ?         |
| not not 3 == 5       |        ?         |



Please ignore the `f`, quotation marks and curly brackets. These will be explained later in the workshop.

In [None]:
print(
    f"{(3 == 5 or not 3 == 5) = }",
    f"{(2 < 3 and 3 == 5) = }",
    f"{(2 < 3 or 3 == 5) = }",
    f"{(not 2 < 3 or not 3 == 5) = }",
    f"{(not 3 == 5) = }",
    f"{(not not 3 == 5) = }",
    sep='\n'
)

As you've seen,
*   `and` returns `True` only if both combined conditions are `True`.
*   `or` returns `True` if either of the expressions (or both) is `True`.
*   `not` simply negates the following expression (i.e., returns the opposite truth value).

In Python (almost) any object can be tested for its truth value, which is either `True` or `False`. Keep that in mind for later.

One gotcha that we should be aware of is related to the order in which Python evaluates expressions.

Python evaluates `and`/`or` expressions from left to right, and stops evaluating as soon as the truth value of the expression is known. This is called short-circuiting. However, there is also a precedence order, on top of the order of evaluation.

1. Parentheses `()`: They have the highest precedence and can be used to force an expression to evaluate in the order you want.
2. Comparison operators (`==`, `!=`, `<`, `<=`, `>`, `>=`): They bind more tightly than any of the logical operators.
3. Logical NOT (`not`): It has the highest precedence among the logical operators.
4. Logical AND (`and`): It has lower precedence than `not`.
5. Logical OR (`or`): It has the lowest precedence.


So, for example, in the expression `not 3 == 5 or 2 < 3`, Python first evaluates `3 == 5` (which is `False`) and then negates it with `not` (giving `True`). Because the left-hand side of `or` is already `True`, short-circuiting means that `2 < 3` is never evaluated, and the result is `True`.

While redundant parentheses are often frowned upon, you might find that you need them as a beginner. It is better to be explicit than to be wrong.
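
The following sketch shows how parentheses can change the result of the expression discussed above, and how short-circuiting can skip part of an expression entirely. The division by zero is a deliberately broken expression, included only to show that it is never evaluated.

```python
# Without parentheses: (not (3 == 5)) or (2 < 3)
without_parentheses = not 3 == 5 or 2 < 3
print(without_parentheses)  # True

# With parentheses, `not` applies to the whole `or` expression
with_parentheses = not (3 == 5 or 2 < 3)
print(with_parentheses)     # False

# Short-circuiting: the left side of `or` is already True, so the
# right side (which would raise a ZeroDivisionError) is never evaluated
short_circuited = True or (1 / 0 == 1)
print(short_circuited)      # True
```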



### Strings
Strings are ordered collections of any non-negative number of characters.

We define a string by using single or double quotation marks.


In [None]:
my_alma_mater = 'University of Haifa'
current_month = "July"
single_character_string = '@'
just_an_empty_string = ''

Strings are collections, so by definition they contain other elements. In Python there is no separate data type for a single character. A single character is simply a string of length 1.

We can get the number of characters in a string using the built-in `len` function.

In [None]:
print(len('T'))

In [None]:
len("This sentence is composed of 42 characters")

#### String indexing
We noted that strings are ordered collections, meaning that we can use indices to retrieve characters at specific positions, either single characters or slices of a string. One way to do so is to use square brackets that contain the indices we are interested in. This syntax works for almost all ordered collections in Python.

In [None]:
s = "Hello World!"
print(s[0])

Note that in Python indexing is 0-based. You can think of an index as *the number of elements before a specific position*: there are exactly 0 elements before the first element.


---

Slicing in Python is end-exclusive. When we want to retrieve a slice of a collection, the syntax is `[start:stop]`, where the element at index `stop` is not included. This means that if we want the first five characters, we would use the slice `[0:5]` (indices 0, 1, 2, 3 and 4).


In [None]:
s = "Hello World!"
print(s[0:5])

Omitting the stop argument in a slicing operation returns the string from `start` through its end. If the start argument is dropped as well (i.e., `s[:]`), the whole string is returned.

In [None]:
s = "Hello World!"
print(s[0 : ])

#### $\color{dodgerblue}{\text{Exercise!}}$
Print the exclamation mark (`!`) that is the last character in the string defined under `s`.

In [None]:
s = "Hello World!"
print(s[len(s) - 1]) # The expected output is '!'

Luckily, we don't need to use the number of characters in a string to index specific characters or sets of characters towards the end of the string. We can simply use end-based indexing. The last character in a collection is indexed at -1, the second to last at -2, and so on.

To make sense of end-based indexing we can think of it as *the number of elements after a specific index*, offset by one: there are exactly 0 elements after the element in the last index (-1), and 1 element after the element in the second to last index (-2).

Experiment with the following code to get a better understanding of end-based indexing.

In [None]:
s = "Hello World!"
print(s[-6:]) # The expected output is 'World!'

One last thing to know about string indexing is that the step size of indexing a slice can be specified (i.e., `[start:stop:step]`). 

By default, we are implicitly using a step size of `1`. However, we can use other values to skip every fixed number of characters.

In [None]:
s = "Hello World!"
# Indexing between the first and last character (excluding)
#  skipping every 2nd character
print(s[0 : -1 : 2])

In [None]:
s = "Hello World!"
# Less verbose by dropping the start and stop arguments,
print(s[::2])


A negative step size would traverse the string in the opposite direction.
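
A brief sketch of negative step sizes, using the same example string as above:

```python
s = "Hello World!"

# A step of -1 walks through the whole string backwards
reversed_s = s[::-1]
print(reversed_s)            # !dlroW olleH

# A step of -2 walks backwards, taking every second character
every_second_reversed = s[::-2]
print(every_second_reversed) # !lo le

# With a negative step, start should come after stop:
# from the last character back to (but not including) index -7
last_word_reversed = s[-1:-7:-1]
print(last_word_reversed)    # !dlroW
```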

#### $\color{dodgerblue}{\text{Exercise!}}$
A palindrome is a sentence or a word that reads the same when written in reverse. Test whether the following string is a palindrome by printing the truth value of the comparison.

In [None]:
s1 = 'kayak'
is_s1_a_palindrome = (s1 == s1[::-1])
print(is_s1_a_palindrome)

Finally, since a string has a specific length, trying to index a character beyond its length leads to an error.

In [None]:
"This sentence is composed of 42 characters"[45]

As a side note, slices are an actual data type in Python, and can be used to represent a range of values. You can create a `slice` object by using the built-in `slice` function, and then use it to index various collections.

In [None]:
s1 = "Hello World!"
s2 = "This is a sentence"

# Create a slice object: from index 0 up to (but not including) index 5, taking every second character
slc = slice(0, 5, 2)

print(s1[slc])
print(s2[slc])

#### String mutability
Strings are immutable objects, meaning that once created, individual characters or slices within the string cannot be changed. Trying to do so leads to an error. Down the road we will encounter mutable collections.




In [None]:
s = 'kayak'
s[0] = 'K'

#### Common string operations and methods


You might have heard that in Python, everything is an object.

Broadly speaking, objects are entities which have attributes and methods. 

Attributes are properties of the object (i.e., "things the object knows"). 

Methods are functions that are bound to the object ("things the object can do"). To add more to the confusion, functions are also objects in Python.


-------------

Accessing an attribute of an object is done by using the dot (`.`) operator, followed by the attribute name. As methods are functions, they can be called by using the dot (`.`) operator, followed by the method name and parentheses.

Strings include several methods that can be used to manipulate them.

One example is changing letter case, achieved by using the `upper`, `lower` and `title` methods.

In [None]:
lower_case = 'some letters'
upper_case = lower_case.upper()
print(upper_case)
lower_case_2 = upper_case.lower()
print(lower_case_2)
titled = lower_case.title()
print(titled)

We have mentioned that strings are immutable, so the methods we used above do not change the original string, but rather return a new object. We can see that by comparing the two strings below, and then explicitly checking if they are the same object using the `is` operator.

In [None]:
lower_case == lower_case_2

In [None]:
lower_case is lower_case_2

Strings can be concatenated. We do this mostly using the `+` operator. The operator performs a different action when flanked by two strings, rather than numeric types.

This operation behind the scene calls a string method called `__add__` that is defined for strings. Part of the appeal of Python is hiding away these details from the user, and allowing them to use the same syntax for different types of objects (i.e., integer addition and string concatenation).

In [None]:
print('Hello' + ' ' + 'World!')
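
To make the link between the `+` operator and `__add__` concrete, here is a small sketch. Calling `__add__` directly is shown for illustration only. In everyday code you would simply write `+`.

```python
# The `+` operator and an explicit call to __add__ give the same string
via_operator = 'Hello' + ' World!'
via_method = 'Hello'.__add__(' World!')
print(via_operator == via_method)  # True

# Integers define their own __add__, which performs addition instead
int_sum = (1).__add__(2)
print(int_sum)                     # 3
```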

Multiplication is another operator that is overloaded for strings: it simply repeats the string the specified number of times.

In [None]:
'La ' * 3 # However, multiplying by a float (e.g., 3.0) would not work

You cannot add a string and a numeric type.

In [None]:
3 + "5"

But you can change the data type of a numeric type into a string, or the other way around, using the built-in functions `str`, `int`, `float` etc.

In [None]:
print(str(3) + "2",
    (int("3") + float("5.2")))

#### String formatting
Formatted strings are "templates" that receive different values, which are integrated into the string. There are several ways to format a string. 

The most common method today is called f-strings. They use curly brackets to refer to already defined variables, or even contain whole expressions.

In [None]:
x = 3 
y = 5
s = f'The sum of x and y is {x + y}'
print(s)

Formatted strings are highly effective when preparing automated reports (i.e., writing a script which runs some statistical analysis and generates a formatted report). 

As a researcher, think of the utility enabled by the following:

In [None]:
Z = 1.98
p = 0.005
df = 40
bf = 10050.352

# Use the df variable rather than a hard-coded 40, and `.2e` for scientific notation
f"The difference was significant [Z({df:.0f}) = {Z:.2f}, p-value = {p}] and the Bayes factor 1:0 was {bf:.2e}"

Note that we have used specific notation for formatting the numbers. `:.xf` denotes that the number should be formatted as a float with `x` decimal places.

`:.xe` denotes that the number should be formatted in scientific notation with `x` decimal places.
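
As a quick sketch of these two format codes, here they are applied to one arbitrary number (the value is made up for illustration):

```python
value = 1234.56789

# Fixed-point notation with two and with zero decimal places
two_decimals = f"{value:.2f}"
no_decimals = f"{value:.0f}"
print(two_decimals, no_decimals)  # 1234.57 1235

# Scientific notation with two decimal places
scientific = f"{value:.2e}"
print(scientific)                 # 1.23e+03
```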

We have seen above that we can use curly brackets and an equal sign to easily print an expression and its value. 

In [None]:
print(f"{1 + 3 = } ")

A different way to format strings is by calling the string method called `format`. Each way has its use-cases, and you can read more about it [here](https://realpython.com/python-string-formatting/).

In [None]:
x, y, z = 3, 5, 10
"The value of x is {}, the value of y is {}, their sum is {}, and not {z_var}".format(x, y, x+y,
                                                                                      z_var=z)

For this method you can combine variables referenced by their name (`{z_var}`) and by their order (as with `x`).

#### Unicode

In Python 3, strings are sequences of Unicode characters by default. This means that they can contain any Unicode character, which allows you to easily use non-Latin characters and even emojis.

In [None]:
s = ( # Specifying the character name
    "\N{GREEK SMALL LETTER PSI} \n" 
     # Specifying the unicode character code
      "\U0001F44C \n"
     # Just plain characters
      "The University of Haifa - אוניברסיטת חיפה - جامعة حيفا \n" 
     # Using an emoji name
      "\N{Snake}")
print(s)

Note that the strings above included the expression `\n`. The backslash tells Python that what follows is a special character. Specifically, `\n` is a newline character, but there are others as well. If you want to actually write the expression `\n` in a string, you will need to add another backslash before it.

In [None]:
print("This is a \t tab")
print("Here is new \n-line")
print("These do nothing \\t \\n")

This issue can be very frustrating if you are using Windows and wish to read, write or manipulate some file. That is because Windows file paths are written with backslashes (`C:\Users\User\Desktop\data.xlsx`) rather than forward slashes (`/`).

To avoid this, you can do one of the following: use the `r` prefix, which tells Python to treat the string as a raw string and not interpret backslashes as escape characters; double the backslashes (to escape each backslash); or specify the path using forward slashes. There are more advanced methods, of course.
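
The three options can be sketched with the example path from above. The path is purely illustrative, and no file is actually opened.

```python
# Option 1: a raw string, in which backslashes are kept as they are
raw_path = r"C:\Users\User\Desktop\data.xlsx"

# Option 2: a regular string in which every backslash is doubled
escaped_path = "C:\\Users\\User\\Desktop\\data.xlsx"

# Option 3: forward slashes, which Python also accepts in Windows paths
forward_path = "C:/Users/User/Desktop/data.xlsx"

# The first two options produce exactly the same string
print(raw_path == escaped_path)  # True
print(raw_path)
print(forward_path)
```

Note that writing this path as a regular string with single backslashes would not even run, because Python would try to read `\U` as the start of a Unicode escape.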

### Lists

Lists are mutable, ordered collections of heterogeneous objects (i.e., unlike strings, which contain only characters). We can construct a list by using square brackets.

In [None]:
empty_list = []
nonempty_list = [1, 3, "Hi!", 5000]
print(empty_list)
print(nonempty_list)

We can also create a list using the built-in `list` function. We usually do this when creating a list from another collection, like a string. If you only call `list()` you would get an empty list.

In [None]:
chars = list('many chars')
print(chars)

Lists are useful when we want to group and store items. Similar to strings, we can retrieve specific items from a list using indices.

In [None]:
departments = ['Psychology', 'Geography', 'Political Science']
departments[-1]

#### $\color{dodgerblue}{\text{Exercise!}}$

Lists are very flexible, as they allow us to group together objects of different data types, and even lists of variable lengths (or lists within lists). We'll see later that this flexibility comes at a cost.

We can chain indexing operations to retrieve specific indices.
Complete the code below to print the string `'C D'`.

In [None]:
letters = 'ABC DEF'
digits = [1, 2, 3, 4, 5, 6]
chars = [letters, digits]
print(chars[0][2 : 5])

#### $\color{dodgerblue}{\text{Exercise!}}$

Once created, we can mutate lists by using an indexing operation (i.e., `var[0]`) and an assignment.

Complete the code below, so it would print "Hello-World-!".

In [None]:
text = ['Hello', 457, '!']
text[1] = 'World'
print('-'.join(text))

We used a string method called `join`, which receives a list (or another collection) of strings and inserts the string between each pair of elements in the input.

As we've seen, lists can be mutated using indexing and assignment. However, there are also dedicated methods. These methods are slightly different from what we've seen before, as they mutate the object in-place rather than returning a new object. Here is a demonstration of what this means.

In [None]:
x = ['a']
x.append(5) # No reassignment
print(f'x is {x}')

#### $\color{dodgerblue}{\text{Exercise!}}$

What would be the outcome of re-assigning the result of `append` call onto a variable? Check the `type` of the variable `x` after running the code below.

In [None]:
x = ['a']
x = x.append(5)
print(f'x is now {type(x)}')

`None` is a dedicated null value in Python. When a function (or a method) mutates an object in-place rather than returning a new object, it implicitly returns `None`.
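
The same pattern shows up in sorting, sketched below. Neither `sorted` (a built-in function) nor `sort` (a list method) has been covered in this section. They are used here only to contrast "returns a new object" with "mutates in-place and returns `None`".

```python
numbers = [3, 1, 2]

# sorted() returns a new list and leaves the original untouched
sorted_copy = sorted(numbers)
print(sorted_copy, numbers)     # [1, 2, 3] [3, 1, 2]

# The sort() method reorders the list in-place and returns None
result_of_sort = numbers.sort()
print(result_of_sort, numbers)  # None [1, 2, 3]
```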



Here are additional `list` methods, used to modify and retrieve information about the items in a list.



In [None]:
# Create a list
_ = list('aaaa_bbbbc')

print(_)

# Remove and return the last element
last_element = _.pop()
print(f'The removed item was {last_element}')

a_count = _.count('a')
print(f'The number of occurrences for a in the list is {a_count}')

# Get the index of the first occurrence of b
b_index = _.index('b')
print(f'The first occurrence of b in the list is at index {b_index}')

### Dictionaries

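Dictionaries are mutable collections of mappings between keys and values (just like a dictionary). Values are retrieved by key rather than by position. Since Python 3.7, dictionaries also preserve the order in which keys were inserted.

Dictionaries can be created using curly brackets and the following syntax: `{k1: v1, k2: v2}`.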

In [None]:
d = {'a': 1, 
     5: 'ZZZ', 
     3.45: ['a', 'b', 'c']}
print(d.keys()) # The keys in a dictionary
print(d.values()) # The values in a dictionary
print(d.items()) # The items in a dictionary

print({}) # Empty dictionary

A key in a dictionary has to be unique. Assigning to an existing key will overwrite the previous key:value pair.

In [None]:
d = {'Year': 2021, 'Month': 'September'}
print(f"The current month is: {d['Month']}")
d['Month'] = 9 # Indexing via the `Month` key we reassign the value under it.
print(f"The current month is: {d['Month']}")

A dictionary key has to be a hashable object, which in practice usually means an immutable one, like a string or an integer (but not a list). The type of the value under each key is unrestricted.



In [None]:
d = {['a']: False}
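
When a key needs to combine several pieces of information, one possible workaround is a tuple. Tuples are an immutable relative of lists, written with parentheses, that this section has not introduced. The participant labels and scores below are made up for illustration.

```python
# Hypothetical data: each key combines a participant label and a session number
scores = {('participant_1', 2): 0.53}

# New entries are added the same way as with simpler keys
scores[('participant_2', 1)] = 0.61

value = scores[('participant_1', 2)]
print(value)        # 0.53
print(len(scores))  # 2
```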

Dictionaries can also be formed using the built-in `dict` function (note that when you pass keys as keyword arguments, as in the next cell, the keys are always strings).

New entries can be added through indexing and assignment or the `update` method.


Querying the dictionary for a nonexistent key raises a `KeyError`.

In [None]:
d = dict(name='Eitan') 
d['university'] = 'Haifa'
d.update(status='Postdoc') # d is modified in-place
print(d)
d['department']
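
Two related points can be sketched briefly. First, `dict` also accepts a list of (key, value) pairs, in which case the keys do not have to be strings. Second, the `in` operator can test whether a key exists before indexing, which avoids the `KeyError`. The keys and values below are arbitrary.

```python
# Build a dictionary from a list of (key, value) pairs
d = dict([(1, 'one'), ('name', 'Eitan')])
print(d)  # {1: 'one', 'name': 'Eitan'}

# Test for a key before indexing, instead of risking a KeyError
has_name = 'name' in d
has_department = 'department' in d
print(has_name, has_department)  # True False
```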

A useful method of dictionaries is `get`,
it allows you to query a dictionary for a key, and return a default value in case the key is not defined in the dictionary. The default value for an undefined key is `None`.

In [None]:
address = {'Country': 'Israel', 'City': 'Haifa'}
# Default value for undefined keys
print(address.get('Street'),
      address.get('Street', 'Undefined street'), sep='\n')

Other features:

As mentioned before, there is no limitation on the type of objects that can be assigned as dictionary values.

References to dictionary keys can of course be stored in variables. 

The length of a dict is the number of key-value pairs.

In [None]:
lower_level_dict = {1: 'a', 'b': 3, 'non_empty_list': [{}, ]}
top_level_dict = dict(lower=lower_level_dict, empty_list=[])

print("Number of key-value pairs in the lower-level dict: "
   f"{len(top_level_dict['lower'])}")

key_to_lower_level_dict = 'lower'

print(f"\nUsing the key '{key_to_lower_level_dict}', we retrieved this: \n"
        f"{top_level_dict[key_to_lower_level_dict]}")

## Control Flow 1

### if statements

We previously said that in Python (almost) any object can be checked for its truth value, which is either `True` or `False`.

#### $\color{dodgerblue}{\text{Exercise!}}$
Print the truth value of the following objects. Use the built-in function `bool`. 

In [None]:
print("The truth value of the following are: ")
print(f"Zero - {bool(0)}",
      f"Empty list - {bool([])}",
      f"Empty dictionary - {bool({})}",
      f"None - {bool(None)}",
      f"Empty string - {bool('')}", sep='\n')

As you've seen, the truth values of 0, None and empty collections (dictionary, list, string) are `False`. 

The truth values of virtually all other objects are `True`. 
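
A few cases that sometimes surprise beginners are sketched below. What matters for a collection is whether it is empty, not what it contains.

```python
# A list with one element is non-empty, even if that element is itself falsy
list_with_zero = bool([0])

# A string containing only a space is still a non-empty string
space_string = bool(' ')

# Any non-zero number is truthy, including negative numbers
negative_number = bool(-1)

# Zero is falsy whether it is an int or a float
float_zero = bool(0.0)

print(list_with_zero, space_string, negative_number, float_zero)  # True True True False
```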



---



We can use truth values to control the flow of our program, using `if` and `else` statements. For example:

In [None]:
fruits = ['Mango', 'Pears', 'Grapes']

number_of_fruit_items = len(fruits)

if number_of_fruit_items: # State our decision rule
    print(fruits) # Define what should happen if the decision rule is True
else: # Define what should happen if the decision rule is not fulfilled
    print("No need to print an empty list.") # If decision rule is False

#### $\color{dodgerblue}{\text{Exercise!}}$
Modify the list in the previous cell such that it would print "No need to print an empty list.".


This is the first time that we have encountered indentation.

    First level indentation

        Second level indentation

            Third level indentation

Indentation is a key concept in Python and allows the interpreter to parse the code into different blocks of code. The standard is four spaces per level. Don't use tabs (in most code editors, the tab key is already set to insert four spaces).

See here for [more conventions](https://www.python.org/dev/peps/pep-0008/#indentation) regarding code indentation.

In [None]:
if "Hello world":
 x = 3
 z = 3
   y = 5

Code in the block nested under an `if` statement whose condition is `False` is not evaluated. Thus, we might miss an error such as a reference to a nonexistent variable.

#### $\color{dodgerblue}{\text{Exercise}}$
Complete the following code so that only the first and third blocks of code will be executed, and the second block will be skipped.

In [None]:
this_variable_exists = [1, 2]

if True:
    # This is executed
    print(this_variable_exists)
if False:
    # This isn't
    print(this_variable_does_not_exist)
else: 
    # This also will be executed!
    print(this_variable_does_not_exist_as_well)

We just saw you can chain multiple `if` statements. 
You can't chain two `else` statements. An `else` has to be preceded by `if` or `elif`.

The code in the block following an `elif` statement will be executed only if the `elif` condition is `True` and the conditions of the preceding `if` (and any earlier `elif`) were `False`.
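The following sketch illustrates how such a chain is evaluated from top to bottom. The score and the grade boundaries are made up for the example.

```python
score = 72  # Hypothetical exam score

if score >= 90:
    grade = "A"
elif score >= 70:
    # Reached only because the `if` above was False
    grade = "B"
elif score >= 50:
    # Skipped: an earlier branch already matched, although 72 >= 50 is also True
    grade = "C"
else:
    # Runs only if none of the conditions above was True
    grade = "F"

print(grade)  # B
```

Only the first matching branch is executed, so the order of the conditions matters.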


#### $\color{dodgerblue}{\text{Exercise}}$
Complete the following code so that the list `fruits` will be printed. Note that you can test if an item is in a collection using the `x in y` expression.

In [None]:
fruits = ['Mango', 'Pears', 'Grapes']

number_of_fruit_items = len(fruits)

if number_of_fruit_items == 1: 
    print("There is only 1 fruit")
elif 'Banana' in fruits:
    print(fruits)


We can use the logical operators we met earlier, `not`, `and` and `or`, to flip or combine the truth values of conditions.
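As a minimal sketch, here is how each of the three operators can shape a decision rule, reusing the `fruits` list from the cells above (the helper names `has_banana` and `has_mango` are only for illustration):

```python
fruits = ['Mango', 'Pears', 'Grapes']

has_banana = 'Banana' in fruits  # False
has_mango = 'Mango' in fruits    # True

# `not` flips a truth value
if not has_banana:
    print("No bananas today.")

# `and` is True only if both sides are True
if has_mango and has_banana:
    print("We can make a mango-banana shake.")
else:
    print("Missing an ingredient for the shake.")

# `or` is True if at least one side is True
if has_mango or has_banana:
    print("At least one of the two fruits is available.")
```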

## Functions

Up until now we implicitly used functions, without ever defining functions on our own.

Functions are objects which receive input (usually) and return output (usually).

Functions are very useful whenever you want to wrap a lengthy operation behind a short name, or repeat the same process many times, possibly with slight modifications on each run.

Functions are objects defined using the `def` statement. The `def` statement is actually an assignment of a function object to a variable. 



In [None]:
def perform_nothing(): # Function name
    pass

print(f"perform_nothing is a {type(perform_nothing)}") # perform_nothing is currently a function

perform_nothing() # Calling perform_nothing
perform_nothing() # Calling perform_nothing again

Functions are objects. As such they can be passed to other functions, as input. Also, as functions are referenced via variables, they can be overwritten, when their name is assigned to a different object.
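For instance, the built-in functions `sorted` and `max` accept another function through their `key` argument. The sketch below passes the built-in `len` as input (the word list and the name `measure` are arbitrary). Note that `len` is written without parentheses, because we pass the function object itself rather than call it.

```python
words = ['kiwi', 'fig', 'banana']

# `len` is handed over as an object; `sorted` calls it on every item
sorted_by_length = sorted(words, key=len)
longest_word = max(words, key=len)

# A function object can also be referenced by a second name
measure = len
number_of_words = measure(words)

print(sorted_by_length)  # ['fig', 'kiwi', 'banana']
print(longest_word)      # banana
print(number_of_words)   # 3
```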

In [None]:
perform_nothing = 3
print(f"perform_nothing is now a {type(perform_nothing)}")

As functions usually describe a process, the convention is to name functions using verbs. 

By convention, function names are all lower-case, with words separated by underscores.


#### $\color{dodgerblue}{\text{Exercise}}$
The previous function did not have input arguments. We can set a function to accept input arguments.
Complete the following function definition. 

In [None]:
def increment_by_2(a):
    result = a + 2
    return result

increment_by_2(10) # should return 12

Variables defined outside a function are available to the function, even if they are not yet defined when the function is defined (as long as they are defined before the first call to the function). 

Variables defined inside a function cannot be referenced outside the function, unless their value is returned and assigned to a variable by the caller.

This concept is called `scope`, and is otherwise out of the scope of this workshop.

#### $\color{green}{\text{Sidenote}}$ 
Actually, you can declare a variable to be of global scope, making it accessible outside of the function's namespace without returning it. However, this practice is rarely used. Separating the function's namespace from the global namespace is good practice.

See more [here](https://docs.python.org/3/faq/programming.html#what-are-the-rules-for-local-and-global-variables-in-python).
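For completeness, here is a minimal sketch of the `global` statement next to the usually preferred alternative of returning a value. The function and variable names are hypothetical.

```python
def set_greeting():
    global greeting  # Declare that `greeting` lives in the global namespace
    greeting = "Hello"

def make_greeting():
    # Preferred: keep the name local and hand the value back to the caller
    local_greeting = "Hello"
    return local_greeting

set_greeting()
print(greeting)  # Works, although `greeting` was never returned

returned_greeting = make_greeting()
print(returned_greeting)
```

The second style makes it obvious where `returned_greeting` comes from, which is one reason `global` is seldom needed.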


In [None]:
y = 3

def increment_2(a):
    a_incremented = a + 2
    print(f"I can print the value of a_incremented ({a_incremented}) from inside the function")
    print(f"Also, y is {y}, although it wasn't defined inside the function.")
    print(f"Also, x is {x}, although it was defined right after the function was defined.")
    return a_incremented

x = []

increment_2(10)

# The next line raises a NameError: a_incremented exists only inside the function
print(f"I can't print the value of {a_incremented} from outside the function.")

We can have multiple return statements in a function, to handle different cases.

In [None]:
def test_if_number_is_even(n):
    if n % 2 == 0:
        return "Even"
    else:
        return "Odd"

for i in [0, -1, 1, 44]:
    print(f"{i} is {test_if_number_is_even(i)}")

A function can call other built-in or user-defined functions. It can also accept multiple arguments and return multiple results.

#### $\color{dodgerblue}{\text{Exercise}}$
Complete the following function definitions. 



In [None]:
def raise_to_the_power(base, exp): # Multiply `base` by itself `exp` times
    return base ** exp

def test_if_negative(n):
    return n < 0 # Check if the result is negative

def process_number_power(number, exponent):
    power = raise_to_the_power(number, exponent) # Call our first function
    number_is_negative = test_if_negative(power) # Call our second function
    return power, number_is_negative # Returning the results, can also be `return (power, ...)`

Note that `process_number_power` returns a `tuple` of two values. We can assign the results to a single variable:

In [None]:
results_in_tuple = process_number_power(-3, 4)
print(results_in_tuple) # Should print (81, False)

or to multiple variables:

In [None]:
unpacked1, unpacked2 = process_number_power(-3, 3)
print(unpacked1, unpacked2) # Should print -27 True

#### $\color{green}{\text{Sidenote}}$ 
When we assigned the two returned values to `results_in_tuple`, we received a `tuple`. A `tuple` is an ordered collection that can contain different data types (like a list). However, tuples are immutable, meaning that they cannot be modified once created (which is also why they can be used as keys in a dictionary).

A tuple is defined using parentheses `(a, b, c)`. A common gotcha is that a tuple with a single element is defined using `(a,)` and not `(a)`.
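The sketch below illustrates the single-element gotcha and the use of a tuple as a dictionary key. The participant IDs and scores are invented for the example.

```python
not_a_tuple = (5)    # Just the integer 5 wrapped in parentheses
single_item = (5,)   # The trailing comma is what makes it a tuple
print(type(not_a_tuple), type(single_item))

# Tuples are immutable, so item assignment is not allowed;
# uncommenting the next line would raise a TypeError:
# single_item[0] = 6

# A tuple can serve as a dictionary key (a list cannot)
scores = {('s01', 'pre'): 12, ('s01', 'post'): 17}  # Hypothetical data
post_score = scores[('s01', 'post')]
print(post_score)  # 17
```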




### Keyword arguments
Up until now we looked at *positional arguments*, which are matched to parameters by their order. Sometimes it is easier to use *keyword arguments*, especially when we are creating a function with a large number of arguments.

Keyword arguments can be set to a default. The best practice is to use an immutable value such as an integer, string or simply `None`.

In [None]:
def calculate_weekly_wage(
    hourly_wage, hours_per_week=None, seniority_in_years=0,
         wage_modifier_by_seniority=1):
    return (hours_per_week * hourly_wage * 
        (wage_modifier_by_seniority ** seniority_in_years))

hours = 35
rate = 85.5

print(f"Working {hours} hours a week, amounts to about "
        f"{calculate_weekly_wage(hourly_wage=rate, hours_per_week=hours)} over a week")

#### $\color{green}{\text{Sidenote}}$
Avoid using mutable objects as default values. The default value is evaluated only once, when the function is defined.
If the default is a mutable object such as a list, and you mutate the object, the change is reflected in every call to the function. Then, your default value is no longer what you expect it to be.

```
def append_to(element, to=[]):
    to.append(element)
    return to

# VS.

def append_to(element, to=None):
    if to is None:
        to = []
    to.append(element)
    return to
```
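To see the difference in action, the sketch below calls both variants twice. The two functions are renamed here (`append_to_shared`, `append_to_fresh`) only so that both can coexist in one cell.

```python
def append_to_shared(element, to=[]):
    # The same default list object is reused on every call
    to.append(element)
    return to

def append_to_fresh(element, to=None):
    # A new list is created on every call that omits `to`
    if to is None:
        to = []
    to.append(element)
    return to

first_shared = append_to_shared(1)
second_shared = append_to_shared(2)
print(second_shared)                  # [1, 2] - the 1 from the first call is still there
print(first_shared is second_shared)  # True - both names point to the same list

first_fresh = append_to_fresh(1)
second_fresh = append_to_fresh(2)
print(second_fresh)                   # [2]
```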


#### $\color{dodgerblue}{\text{Exercise}}$

Now you will explore several ways of calling a function.

Complete the code so that calls to `calculate_weekly_wage` will return the same values.

In [None]:
base_wage = 120.2
hours_worked = 40
years_on_job = 5
seniority_modifier = 1.0025

Use positional arguments as we've done so far. As the name suggests, the order of positional arguments matters.

In [None]:
wage1 = calculate_weekly_wage(base_wage, hours_worked,
                              years_on_job, seniority_modifier)
print(f"The monthly wage is around \N{New Sheqel Sign}{4 * wage1:.2f}.")

Use keyword arguments. Note that the order of keyword arguments doesn't matter, as long as they come after any positional arguments.

In [None]:
wage2 = calculate_weekly_wage(
    base_wage, hours_worked,wage_modifier_by_seniority=seniority_modifier,
                              seniority_in_years=years_on_job)

Use a dictionary to store the arguments and unpack it into the function call ("explode"). Note that this uses the double-star (`**`) operator.

In [None]:
wage3_dict = {'hourly_wage': base_wage, 'hours_per_week': hours_worked,
              'seniority_in_years': years_on_job,
              'wage_modifier_by_seniority': seniority_modifier}
wage3 = calculate_weekly_wage(**wage3_dict)

Use a list to store the arguments and unpack it into the function call. Note that this uses the single-star (`*`) operator.

In [None]:
wage4_list = [base_wage, hours_worked,
                              years_on_job, seniority_modifier]
wage4 = calculate_weekly_wage(*wage4_list)

print(wage1 == wage2) # Should print True
print(wage1 == wage3) # Should print True
print(wage1 == wage4) # Should print True

### Docstrings

Docstrings are strings that are defined under functions (and some other objects) and used to document their behavior and usage. 


We can use the built-in `help` function to get the docstring out of a function. 

In many modern IDEs (or notebook-environments like Jupyter), you can hover over a function to view the docstring (or even open it in a new tab). 

In [None]:
# help(print)
# help(len)
# help(dict.update)

There are several styles of writing a docstring; here is the style used by Google (not necessarily the most frequent one).

Regardless of style, a docstring contains a brief non-technical summary, an extended summary (which can be slightly more technical), the list of input arguments (hopefully with their types) and a description of the function's output. For scientific functions, it is also common to include references to relevant papers describing the method.



In [None]:
def calculate_weekly_wage(
    hourly_wage, hours_per_week=None, seniority_in_years=0,
         wage_modifier_by_seniority=1):
    
    """Calculate and return the weekly wage of a worker.

    Returns the product of `hourly_wage` and `hours_per_week` multiplied 
    by `wage_modifier_by_seniority` raised to the power of `seniority_in_years`.

    Args:
        hourly_wage (float): Sum paid for each working hour.
        hours_per_week (int): Number of hours worked during the week.
        seniority_in_years (int): Number of years in the workplace.
        wage_modifier_by_seniority (float): A benefit applied to wage that
            depends on seniority.


    Returns:
        float: The weekly wage of a worker.
    """

    return (hours_per_week * hourly_wage * 
        (wage_modifier_by_seniority ** seniority_in_years))


# help(calculate_weekly_wage)

# print(calculate_weekly_wage.__doc__)


These days, docstrings are also used to automatically generate documentation for code. You would rarely have to call `help`, because you can usually find the documentation online.

Docstrings can also serve as prompts for AI-based code generation tools.

## Control Flow 2

The final control flow structure that we will cover is loops. Loops are used when we want to execute a process iteratively (for a fixed number of iterations, for a fixed time duration, or until a condition is met).

### while loops

Loops are composed of a loop-head and a loop-body. Generally, the loop keeps repeating as long as the condition specified in the loop-head is met (i.e., evaluates to `True`).


Here is a demonstration of a looping style that is very dated in modern Python, and comes as a legacy from prior languages. You'd rarely encounter this, but you should know it, and it is pretty straightforward to understand.

In [None]:
counter = 0

while counter < 10:
    counter = counter + 1 
    print(f'counter is now: {counter}')

While loops are very useful when waiting for something to happen (e.g., when a program waits for the user to perform an action). However, they are not very frequent in data analysis.
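A typical case is repeating a step until a condition is met, without knowing in advance how many iterations are needed. The sketch below, with made-up numbers, counts how many years it takes a balance to double at a fixed yearly growth rate.

```python
balance = 1000.0          # Hypothetical starting sum
target = 2000.0           # Stop once the balance reaches this value
yearly_multiplier = 1.07  # Hypothetical growth of 7% per year
years = 0

while balance < target:
    balance = balance * yearly_multiplier
    years = years + 1

print(f"The balance passes {target} after {years} years")  # 11 years
```

The loop-head is checked before every iteration, so the loop stops as soon as `balance < target` becomes `False`.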

---
Two keywords you can encounter in loops are `break` and `continue`. They are pretty self-explanatory, but here is a demonstration. `break` stops the loop, while `continue` skips the current iteration and moves on to the next one.

In [None]:
iterations_completed = 0
while True:
    user_input = input('Please enter a number (Q/q to quit): ')
    if user_input.lower() == 'q':
        print('Quitting...')
        break
    elif user_input == 'skip':
        print("Skipping this iteration...")
        continue
    else:
        print(f'You entered: {user_input}')
    iterations_completed += 1
    print(f'Iterations completed: {iterations_completed}')
print("Done!")

Here is a very rudimentary demo of a one-off investment calculator, just to get the idea, and to give you some practice. We won't go over it in class, but you can run it later (or now) and analyze it to gather some insights about while loops.
 
This is **not** how you would or should build an interactive console-based calculator (for example, you'd handle errors and exceptions, and you'd wrap everything in a designated class).

The following example makes use of the built-in `input` function, which reads text from the user and returns it as a string. It is used here only for demonstration purposes, as it is rarely needed in data analysis. Never pass user input to `eval`, due to security issues (see [here](https://realpython.com/python-eval-function/#minimizing-the-security-issues-of-eval)).


In [None]:
def get_investment_input():
    """Get input from the user and return it as a tuple."""
    initial_sum = float(input("Please enter initial sum (integer or float; e.g., 100): "))
    years = int(input("Please enter number of years (integer; e.g., 5): "))
    rate_percent = float(input("Please enter expected annual interest rate (float; e.g., 10 for 10 %): "))
    # Convert the percentage (e.g., 10) into a yearly multiplier (e.g., 1.10)
    return initial_sum, years, 1 + rate_percent / 100

def calculate_revenue(initial_sum: float, years: int = 1,
                      annual_interest_rate: float = 1.05):
    """Calculate the revenue of an investment.

    Args:
        initial_sum (float): The initial sum of money invested.
        years (int): The number of years the money is invested.
        annual_interest_rate (float): The yearly growth multiplier
            (e.g., 1.05 for an annual interest rate of 5 %).

    Returns:
        float: The value of the investment after `years` years.
    """
    return initial_sum * annual_interest_rate ** years

def report_revenue(total: float):
    # Named `total` rather than `sum`, to avoid shadowing the built-in `sum`
    print(f"Investment results in {total:.2f}.")

def main():
    name = input("Please enter your name: ")
    print(f"Hi {name}, Starting calculator...")

    # Start an infinite loop. True is always True. 
    while True:

        if input(
            "Type Q/q to quit. Anything else to continue. ") in ['q', 'Q']:
            break
            # Outside Colab, you can also use ctrl+c to exit an infinite loop,
            #  but it would crash the program. 

        # Here there is an implicit "else" statement
        report_revenue(calculate_revenue(*get_investment_input()))
        
    print(f"Bye {name}!")

In [None]:
main()

### for loops

For loops are similar to while loops in the sense of performing an action repeatedly. However, instead of setting the number of iterations based on a truth value, we are using an iterable.

Basically, an iterable can be any collection that we encountered so far - lists, strings, dictionaries and tuples. We can also create iterables using the `range` built-in function.




#### range
`range` returns a range object: an iterable sequence of integers. It can be converted to a list of integers, or used directly to iterate over a sequence of integers.

Why range rather than a list? Because it is more memory efficient. A list stores all of its items in memory, while a range object stores only its start, stop and step, and produces each integer on demand. Generators work on a similar principle; we won't talk much about them, except that they save memory by producing values on the fly rather than storing the entire collection.
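A rough way to see the memory difference is the standard library function `sys.getsizeof`. The sketch below compares a range of one million integers with the equivalent list. The exact numbers depend on your Python version and platform, so none are quoted here.

```python
import sys

big_range = range(1_000_000)
big_list = list(big_range)

# getsizeof reports the size of the container itself in bytes
# (for the list, this does not even include the integer objects it refers to)
range_size = sys.getsizeof(big_range)
list_size = sys.getsizeof(big_list)

print(f"range: {range_size} bytes, list: {list_size} bytes")
print(len(big_range) == len(big_list))  # Same number of items either way
```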

In [None]:
some_integers = [0, 1, 2, 3, 4, 5] # A regular list
integers_from_range = range(6) # A range object covering the same integers
# The regular list we know
print(some_integers)
# A range object
print(integers_from_range)
# A list from a range object
print(list(integers_from_range))

print(type(some_integers), type(integers_from_range))

Instead of turning range into a list, we can use it to loop.

In [None]:
# Specifying start, stop and step like in a slicing operation.
for i in range(0, 6, 2):
    print(f"i is now {i}")
# Remember that, like slicing, range is end-exclusive (6 itself is not included).

We can use for loops to iterate over items in a list or dictionary, or characters in a string rather than mere indices.
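As a short sketch, here is a loop over the items of a list followed by a loop over the characters of a string (the example values are arbitrary):

```python
fruits = ['Mango', 'Pears', 'Grapes']

# Iterate over list items directly, no index needed
for fruit in fruits:
    print(f"{fruit} has {len(fruit)} letters")

# Iterate over the characters of a string
vowel_count = 0
for character in 'Introduction':
    if character.lower() in 'aeiou':
        vowel_count = vowel_count + 1

print(vowel_count)  # 5
```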

In [None]:
d = {'Language': 'Python', 'Environment': 'Google Colab', 'Day': 'Sunday'}

# Iterate over dictionary keys
for this_key in d:
    if this_key == "Language":
        print(f"The current language is {d[this_key]}!")
    elif this_key == 'Environment':
        print(f"We are using {d[this_key]}!")
    elif this_key == 'Day':
        print(f"Today is {d[this_key]}!")
    else:
        pass # Does nothing

Previously we encountered the dictionary `items` method, which returns the pairs of keys and values. Using `items` is an example of how we can assign two names on each iteration of the loop.

This can be much more readable and elegant at times. 

In [None]:
d = {'Language': 'Python', 'Environment': 'Google Colab', 'Day': 'Thursday'}

# Iterate over dictionary items
for this_key, this_value in d.items():
    if this_key == "Language":
        # No need now to look for the value in the dictionary
        print(f"The current language is {this_value}!")
    elif this_key == 'Day':
        print(f"Today is {this_value}!")
    else:
        pass # Does nothing



#### zip
Another way to iterate over pairs (or triplets, or any arbitrary number) of elements is by constructing a zip object. 

The built-in `zip` function takes two or more iterables and yields tuples of matching elements. The number of tuples matches the length of the shortest iterable.
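The sketch below shows the "shortest iterable" rule with two lists of unequal length. The names and ages are invented for the example.

```python
names = ['Dana', 'Omer', 'Noa']  # Three hypothetical participants
ages = [31, 27]                  # Only two ages were recorded

pairs = list(zip(names, ages))
print(pairs)  # [('Dana', 31), ('Omer', 27)] - 'Noa' is silently dropped
```

Because the extra item is dropped without any warning, it is worth checking that the iterables have the same length when a mismatch would indicate a problem in the data.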

#### $\color{dodgerblue}{\text{Exercise!}}$
Complete the following code to create a zip object from the three iterables, and then iterate over it to print on each iteration a letter, a digit and a special character.

In [None]:
letters = 'ABC' # Create some letters
digits = range(-18, -12, 3) # Generate negative integers
special_characters = dict(face='\N{Upside-Down Face}', shin='ש') # A dictionary
zipped_values = zip(letters, digits, special_characters.values())

for letter, digit, char in zipped_values: # iterate over the resulting object, to print the following:
    print(letter, digit, char)

One last thing we will demonstrate is how loops can be used to implement computational processes. In this exercise, you need to complete the code to calculate the mean of a sample of values. 

#### $\color{dodgerblue}{\text{Exercise}}$
Complete the following code.

In [None]:
# iq_scores is a list containing a sample of values from a normal distribution (m = 100, sd = 15)
iq_scores = [90, 87, 117, 112, 93, 113, 100, 110, 94, 102,
     109, 92, 94, 90, 69, 111, 138, 96, 101, 95]

sum_for_mean = 0

for score in iq_scores: # Iterate over the IQ scores
    sum_for_mean = sum_for_mean + score # Add the score to the sum

# Mean is sum / N
mean = sum_for_mean / len(iq_scores)

print(mean) # should print 100.65

The crux here was that this seemed needlessly cumbersome. Which is why we can be thankful for modules and packages, our next major subject. 

# Misc.

Many times, shorter code is also more elegant. 

While we won't dedicate much time to this, there are a few operations and expressions you should be aware of - comprehensions, lambdas and ternary expressions. They are fairly easy to grasp, and you can go over them on your own in more depth if you wish.




#### List comprehension

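List comprehension is a form of syntax that constructs a list from an iterable. The syntax of a list comprehension is `[action_on_item for item in iterable]`. Similar syntax exists for dictionaries (using curly braces and `key: value` pairs), and a tuple can be built by wrapping the same expression in `tuple(...)`.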
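Here is a small sketch of these forms on an arbitrary list of numbers. The optional `if` filter at the end of a comprehension is an addition beyond what is described above.

```python
numbers = [1, 2, 3, 4]

squares_list = [n ** 2 for n in numbers]                # [1, 4, 9, 16]
squares_dict = {n: n ** 2 for n in numbers}             # {1: 1, 2: 4, 3: 9, 4: 16}
squares_tuple = tuple(n ** 2 for n in numbers)          # (1, 4, 9, 16)
even_squares = [n ** 2 for n in numbers if n % 2 == 0]  # [4, 16]

print(squares_list, squares_dict, squares_tuple, even_squares, sep='\n')
```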




#### $\color{dodgerblue}{\text{Exercise!}}$

Here is a loop that iterates over a lower-case string and appends to a list the upper-case characters. Complete the list comprehension below, so the printed result would be `True`.

In [None]:
results = []
my_str = 'abcdef'

for c in my_str:
    results.append(c.upper())

results == [c.upper() for ... in ...]


#### lambda functions

Lambdas are single-expression functions. They are often used on the fly, anonymously (i.e., without assigning them to a variable), when we want to specify some short operation.
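A common anonymous use is the `key` argument of the built-in `sorted`. In the sketch below (with invented data), the lambda tells `sorted` to order the pairs by their second element.

```python
participants = [('Dana', 31), ('Omer', 27), ('Noa', 45)]  # Hypothetical (name, age) pairs

# The lambda is defined in place and never assigned to a name
by_age = sorted(participants, key=lambda pair: pair[1])

print(by_age)  # [('Omer', 27), ('Dana', 31), ('Noa', 45)]
```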


In [None]:
f = (lambda x: x ** 3 - 1)
[f(x) for x in range(5)]

#### Ternary Operators
Finally, we briefly touched earlier upon a form of control flow called the ternary operator (also known as a conditional expression).
It can sometimes produce elegant and succinct code. The syntax for conditional expressions is `value_if_true if condition else value_if_false`.
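Since a conditional expression produces a value, it is often used on the right-hand side of an assignment. A minimal sketch, with a made-up variable:

```python
n_participants = 1  # Hypothetical sample size

# Pick the singular or plural form in a single line
noun = "participant" if n_participants == 1 else "participants"

print(f"We tested {n_participants} {noun}")  # We tested 1 participant
```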


#### $\color{dodgerblue}{\text{Exercise!}}$
Complete the following code so that both the if-else statement and the ternary operator would print the same text. 

In [None]:
budget = 50
expenses = 51

# Verbose (4 lines)
if budget < expenses:
    print('In deficit')
else:
    print("We're OK")

# Concise (1 line)
print("In deficit" if budget < expenses else "We're OK")


# Built-in Modules

Modules are files ('my_module.py') containing Python definitions and statements. Anytime you quit your Python session, everything you've typed into the console is wiped from memory. This is why you should use files (modules) to save your work.

Modules allow us to share meaningful Python programs, or mere scripts (e.g., a set of variables and loops).

Modules are objects (like functions, or variables, or almost anything in Python). This means that they have to be assigned to a name. We do this assignment using the `import` statement. Python comes with many built-in libraries that are composed of many modules, and additional modules can also be installed.

Modules can contain variables, like strings.

In [None]:
import string

print(type(string))
print(string.ascii_lowercase)

Or numbers

In [None]:
import math
print(math.pi)

We can import a specific name from a module

In [None]:
from statistics import mean

iq_scores = [90, 87, 117, 112, 93, 113, 100, 110, 94, 102,
     109, 92, 94, 90, 69, 111, 138, 96, 101, 95]

mean(iq_scores)

In [None]:
import this

In each Python session, you need to import a module only once. If you try to use a module that you didn't import in the current session (e.g., because your session crashed and restarted), Python raises a `NameError`, just as if you referred to a nonexistent name (i.e., variable).

First run the following cell, then uncomment the import and run it again.

In [None]:
# import itertools
list(itertools.product(list('ab'), range(2)))

The same goes if you previously imported a name from a module (`from statistics import mean`), but not the whole module.

In [None]:
from statistics import mean
statistics.stdev # Raises NameError: only `mean` was imported, not `statistics` itself

#### $\color{green}{\text{Sidenote}}$

In the past it was more prevalent to see people using the `from x import *` style, which imports nearly **every** name (e.g., variable, function, sub-module) from a package. While it is an option, it is generally discouraged, see [here](https://stackoverflow.com/a/2386943).

#### $\color{green}{\text{Sidenote}}$

One of the major strengths of Python is the abundance of add-on packages that allow you to complete all sorts of tasks.

Before we move on to modules that are not built-in in Python, it is a good place to say that often the best tool for knowing how to do something in Python is **using Google**.

Let's say you want to sample random numbers in Python. If you google for example [how to normal distribution python](https://www.google.com/search?channel=fs&client=ubuntu&q=how+to+normal+distribution+python%60), you will find many relevant results.

The same is actually true for any task in Python. One of the main sites for finding how-to's is the forums site [StackOverflow](https://stackoverflow.com).

It is so common that Colab developers even put in Colab a dedicated 'search StackOverflow' button when an error occurs in a code cell.

The availability of AI coding assistants is a major change in this regard, but we'll get to that later in the workshop.