# Programming in Python: Basic Concepts

Tutor: Jaime Rodríguez-Guerra (jaime.rodriguez@charite.de). Built on previous contributions from Jan Philipp Albrecht (j.p.albrecht@fu-berlin.de).

## 1. Aims of this talktorial/session

This notebook will teach you the basic concepts necessary to understand and write basic Python code using practical examples based on the datasets you will use during the course!

## 2. Learning goals

In general, the goal is to understand the fundamental concepts behind the material, as other lectures depend on them. Do not worry if you don't know the answer to every question. Try to understand as much as possible and ask often so we can solve your problems.

### 2.1 Theory

- Python and Jupyter Notebooks
- Variables
- Flow control
- Functions
- Modules
- Imports

### 2.2 Practical

- How to use notebooks
- Data structures:
  - Assigning variables
  - Perform operations on variables
  - Indexing and slicing lists
  - Create and alter dictionaries
- Flow control:
  - Taking decisions: `if-else` conditions
  - Repeating actions: `for` loops
- Reusing code:
  - Defining and calling functions
  - Importing modules

Don't hesitate to interrupt and ask if concepts remain unclear or tasks are not understood!


### 2.3 Datasets

Getting familiar with the data you will handle is one of the first steps in every data science pipeline.

During the first three lessons we will work with the number of corona cases registered in Germany. RKI.de maintains a record of all COVID-19 cases in Germany. Berlin.de also offers a dataset of corona cases split by neighborhood. We will use both to illustrate different aspects of Python while staying close to current events.

The rest of the lessons will use the Pima Indians Diabetes Database.

***


## 3. References

The official documentation for Python:
- https://docs.python.org/3/
- https://docs.python.org/3/tutorial/modules.html

Sources for datasets:
- [RKI dataset](https://npgeo-corona-npgeo-de.hub.arcgis.com/datasets/dd4580c810204019a7b8eb3e0b329dd6_0/data?orderBy=Bundesland&where=Bundesland%20%3D%20%27Berlin%27)
- [Berlin.de dataset](https://www.berlin.de/lageso/gesundheit/infektionsepidemiologie-infektionsschutz/corona/tabelle-bezirke/)
- [PIMA Indians diabetes dataset](https://www.kaggle.com/uciml/pima-indians-diabetes-database)

***

## 4. Theory and Practice

This is a learning-by-doing notebook alternating between theory and practice. First, a short introduction to a particular topic is given, showing examples with code. You are then asked to perform tasks similar to the examples in the theory.

You can check your results by viewing the sample solution.


### 4.1 Python and Jupyter Notebooks

#### What is Python?

Python is a widely used general-purpose high-level programming language.
The term "high-level" means that the language has abstracted away most of the technical details and manages them for you (e.g. memory allocation).

In this course we will use Python 3.7.

#### What is this interface?

This web-based interface is a so-called **Jupyter notebook**, a web application that allows you to create and share documents which contain executable code, equations, visualizations and explanatory text.

It allows you to define so-called *cells* of different formats:

- Rich-text or Markdown cells (like this cell)
- Code cells

You can *run* (execute) each cell separately by pressing <kbd>Shift</kbd>+<kbd>Enter</kbd> or <kbd>Ctrl</kbd>+<kbd>Enter</kbd>. All names you define (variables, functions...) are available in **both** preceding _and_ following cells once executed. Be careful with the order in which you execute your cells!

Some code cells additionally produce an output (text, images, etc.). In that case, the output will appear underneath the code cell.


> __Exercise__
>
> Try to execute the following cell and make sure the output appears.

In [1]:
print("This is a cell with code.")
# This is a comment line. 
# Lines in a code cell starting with the '#' symbol are
# ignored when you run a piece of code

# print('This sentence will NOT appear underneath the cell!')

This is a cell with code.


### 4.2 Variables and Operations

There are several **types** of objects depending on their contents.

In the following cells, you will learn about:

- scalar (one element) objects:
    - integers (`int`)
    - boolean (`bool`)
    - floating-point numbers (`float`)
    - strings (`str`)
- collection objects:
    - lists (`list`)
    - dictionaries (`dict`)

#### Assignment

Regardless of the type, assignment can be done with the following syntax:

```python
name = value
```

You can think of _names_ as a label hanging from a _value_. As with real objects, you can attach several names to the same value:

```python
color = "pink"
# add a label to the same object
colour = color
# you can also do this
colour = color = "pink"
```

Two things to note:

1. The _type_ of the value will be implicitly inferred from the value _contents_.
2. You cannot use spaces in the name. The Python style guide (PEP 8) recommends using `lowercase_words_separated_by_underscores`.
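
The label analogy can be tried out with a small sketch. It uses the built-in `is` operator, which is not covered elsewhere in this notebook, to check whether two names hang from the very same object:

```python
color = "pink"
colour = color  # a second label on the same object

# `is` is True when both names are attached to the very same object
same_object = colour is color
print(same_object)  # True

# Reassigning one name only moves that label; the other one stays put
colour = "blue"
print(color)   # pink
print(colour)  # blue
```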


## Scalar objects

### Integers
Integers (`int` for short) are numbers without decimal digits. In the following example, you see the definition of a variable named `numeric_variable`. After its assignment, the name will be reassigned twice. After each operation the new value is shown with the function `print()`. Everything inside the `(...)` will be shown as output.

```python
# assigning the name 'numeric_variable' to a value of '3'.
numeric_variable = 3
# printing a variable will show its content underneath the cell
print(numeric_variable)

# assigning a different value to the variable named 'numeric_variable', thereby OVERWRITING the previous definition:
numeric_variable = 4
print(numeric_variable)

# assigning a new value to the variable named 'numeric_variable', computed from the old one.
numeric_variable = numeric_variable + 1
print(numeric_variable)
    
```  

The output will be:
```python
    3
    4
    5
```

Note that Python doesn't care about the actual _name_. You can use any word (e.g. `ugly_table = 4`), but it is recommended to use **meaningful** names so that readers get an idea of what the name refers to.

### Boolean

Boolean variables can only have two values: `True`  and `False`. They can be considered a very small subset of `int`: `True` is equivalent to `1` and `False` equivalent to `0`. Assignment works similarly, but you have to use the _Capitalized_ spelling!


```python
this_is_a_boolean = True
```
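
To see the relationship between `bool` and `int` for yourself, here is a minimal sketch. It relies on the built-in function `isinstance()`, which this notebook does not otherwise introduce:

```python
this_is_a_boolean = True

# bool is a special kind of int, so this check succeeds
print(isinstance(this_is_a_boolean, int))  # True

# True and False compare equal to 1 and 0
print(True == 1)   # True
print(False == 0)  # True

# Booleans even take part in arithmetic like 1 and 0
total = True + True
print(total)  # 2
```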

> *TASK:* <br>
*Try to assign the value `False` to the variable from the example above.*

In [None]:
# note the capitalized spelling: lowercase `false` would raise a NameError
this_is_a_boolean = False
print(this_is_a_boolean)

### Strings

A `str` value stores _text_, i.e. words, sentences or symbols. You need to surround the text with quotes to define it; both double quotes `"..."` and single quotes `'...'` work.

```python
# assigning the variable with the name 'my_string' the value "This is a string". 
my_string = "This is a string"
```

Note that a `str` can contain the text representation of a number. The resulting type depends on the presence of the quotes!

```python
also_a_string = "1"  # I am a string due to the "", not a number!
i_am_an_int = 1  # I am an int due to the missing ""
```
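
If you are ever unsure about the type of a value, you can ask Python. The sketch below uses the built-in functions `type()`, `int()` and `str()`, which are not introduced in the text above, so treat it as an optional extra:

```python
also_a_string = "1"
i_am_an_int = 1

print(type(also_a_string))  # <class 'str'>
print(type(i_am_an_int))    # <class 'int'>

# int() builds an integer from a string of digits; str() goes the other way
converted = int(also_a_string)
print(converted == i_am_an_int)           # True
print(str(i_am_an_int) == also_a_string)  # True
```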

### Float
Variables of type `float` are numbers with a decimal point (floating-point numbers). Whenever you write a number that contains a `.`, the type of the variable is inferred to be a `float`.

```python
one_and_a_half = 1.5  # a typical float

# The following is also a float. 
# Although this number mathematically has no floating decimal, it is written with a "."
one_dot_zero = 1.0
also_one_dot_zero = 1.  # and this one too!
```



In [3]:
my_string = "This is a string"
# your lines of code here

#### Exercise time!

> Let's go to the [RKI.de dataset website](https://npgeo-corona-npgeo-de.hub.arcgis.com/datasets/dd4580c810204019a7b8eb3e0b329dd6_0/data?orderBy=Bundesland&where=Bundesland%20%3D%20%27Berlin%27) and try to identify the different _types_ contained in each row.

## Collection objects

### Lists

Python has a datatype called `list`, where other variables (regardless of their type) can be positionally stored. In other words, a `list` is an ordered collection of elements: each element keeps the position at which it was stored.

The syntax for list definition uses square brackets `[ ]`, which surround the values separated by commas `,`:

![def_list.png](attachment:def_list.png)

The code `my_list = [5, 10]` would therefore store `5` at first position, and `10` at second position.

  
> *TASK:*<br>
> *Try to:*
> - *define a list called `animals`*
> - *at the first* **position**, *store the string `"horse"`.*
> - *store the string `"spider"` TWICE at* **position** *2 and 3.*
 

In [2]:
neighborhoods = ["mitte", "kreuzberg", "kreuzberg"]
print(neighborhoods)

['mitte', 'kreuzberg', 'kreuzberg']


Once defined, you can access the contained elements in different ways. The most common way is by using the **index** of the element. This is the position number, but the count starts from `0`.

```python
# print the first element of the list
print(neighborhoods[0])
```

Note that the `index` itself could be a variable of the type `int`!

```python
index = 0
print(neighborhoods[index])
```


> *TASK:*
> *Now print out your favorite kiez in the list in two ways:*
> - *by using an integer number corresponding to the `index` of your favorite kiez.*
> - *by using a variable called `index`.*

In [3]:
neighborhoods = ["mitte", "kreuzberg", "neukölln"]
print(neighborhoods[0])

index = 0
print(neighborhoods[index])

# index 3 does not exist in a list with three elements, so the next line raises an error
print(neighborhoods[3])

mitte
mitte


IndexError: list index out of range

***

Oh! Was that an error? 

<font color=red>Setting or accessing a list index that does not exist results in an error!</font>

```python
list1[3]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-9-831b15cbf272> in <module>()
----> 1 list1[3]

IndexError: list index out of range

```
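
A quick way to find out which indices are valid is the built-in `len()` function (used again later in this notebook). The following is only a sketch; the variable names `number_of_elements` and `last_valid_index` are made up for illustration:

```python
neighborhoods = ["mitte", "kreuzberg", "neukölln"]

# Valid indices run from 0 up to len(neighborhoods) - 1
number_of_elements = len(neighborhoods)
last_valid_index = number_of_elements - 1

print(number_of_elements)               # 3
print(last_valid_index)                 # 2
print(neighborhoods[last_valid_index])  # neukölln
```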

### Slicing (range-indexing) a list

In the following you find a definition of a list called `more_neighborhoods`. By giving a **range** of indices you have access to a contiguous subset of the list. A range can be set in the following way:

```python
more_neighborhoods = ["mitte", "kreuzberg", "neukölln", "spandau"]
more_neighborhoods[start:stop]
```
where `start` and `stop` are numbers (integers). Thus, `more_neighborhoods[1:3]` would give an output of `["kreuzberg", "neukölln"]`. Notice that the `stop` index is _not_ included in the output.

> *TASK:*
>
> *print out all strings in the list `more_neighborhoods` ending in a consonant*

In [7]:
more_neighborhoods = ["mitte", "kreuzberg", "neukölln", "spandau"]
print(more_neighborhoods[1:3])

['kreuzberg', 'neukölln']


In [9]:
# More examples
print(more_neighborhoods[3:5])  # even if we are out of bounds, this will not raise an error
print(more_neighborhoods[3])
print(more_neighborhoods[-1])  # negative indices count from the end; -1 == last one!

['spandau']
spandau
spandau


Cool trick: Python considers `str` types to be a _sequence_ of characters, which is very similar to the structure of a `list`. In fact, we can apply the same syntax to get a **substring** of our string. Keep that in mind, it will become necessary later on.

```python
far_away = "spandau"
print(far_away[1:4])
# will print:
# pan
```
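
Slices also work when `start` or `stop` is left out, in which case Python goes from the very beginning or up to the very end. This is a small sketch beyond what the text above covers:

```python
more_neighborhoods = ["mitte", "kreuzberg", "neukölln", "spandau"]

print(more_neighborhoods[:2])   # ['mitte', 'kreuzberg']
print(more_neighborhoods[2:])   # ['neukölln', 'spandau']
print(more_neighborhoods[-2:])  # ['neukölln', 'spandau']

# The same syntax works on strings
far_away = "spandau"
print(far_away[:4])   # span
print(far_away[-3:])  # dau
```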

### Dictionaries

We have learnt that different objects can be grouped in a sequential container called `list`. There's another useful container called a dictionary (type `dict`). 

With `dict` objects, you don't obtain a _sequence_ of variables, but a _mapping_ of variable _key-value_ pairs, like in real life dictionaries! (The word is the key, the value is the definition for that word). 

They can be useful, for example, to assign properties to variables. Let's say, classmates and their age:


![def_dict.png](attachment:def_dict.png)


Or, neighborhoods and the [number of cases as of 30.06.2020](https://www.berlin.de/lageso/gesundheit/infektionsepidemiologie-infektionsschutz/corona/tabelle-bezirke/):

```python
cases_by_neighborhood = {"mitte": 1188, "kreuzberg": 6666, "neukölln": 1011, "spandau": 445}
```

The keys in this example are `"mitte"`, `"kreuzberg"`, `"neukölln"` and `"spandau"`. The values are `1188`, `6666`, `1011` and `445`, respectively.

Values can be accessed by referring to the specific **key** using `[]` brackets (same as lists!):

```python
cases_by_neighborhood["mitte"]

    1188
```

Variables of type `dict` can be easily extended by assigning a **key**-**value** pair to the dictionary. This looks like the following:

```python
cases_by_neighborhood["lichtenberg"] = 419
```

In the same way, existing **key**-**value** pairs can be overwritten.
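
Both cases side by side, as a minimal sketch. The dictionary `ages` and its contents are invented for illustration (following the classmates-and-ages idea above) so that the task below stays unsolved:

```python
# Hypothetical example data: classmates and their ages
ages = {"anna": 24, "ben": 27}

ages["carla"] = 25  # new key: the pair is added
ages["ben"] = 28    # existing key: the old value is overwritten

print(ages)       # {'anna': 24, 'ben': 28, 'carla': 25}
print(len(ages))  # 3
```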

> **TASK:**
> *We made a typo in the dict definition above. Kreuzberg (or more specifically, Friedrichshain-Kreuzberg) has 666 cases, not 6666! Correct the **key**-**value** pair and set its value to `666` without redefining the entire `cases_by_neighborhood` dictionary. 
> Check your correction by printing the new dictionary value.*

In [11]:
cases_by_neighborhood = {"mitte": 1188, "kreuzberg": 6666, "neukölln": 1011, "spandau": 445}
# Your lines of code here

print(cases_by_neighborhood)

{'mitte': 1188, 'kreuzberg': 6666, 'neukölln': 1011, 'spandau': 445}


***

### Operations

Every type defines some **operations** to do basic tasks.

For example, a variable of type `int` defines, among others, the following operations:
- Addition `+`
- Subtraction `-`
- Multiplication `*`
- Division `/`


Division with `/` is allowed, but the result is always a `float`, even when both operands are of type `int`.


```python
# applying basic operations on the cases registered in Mitte and Friedrichshain-Kreuzberg
print(cases_by_neighborhood["mitte"] + cases_by_neighborhood["kreuzberg"])  # 7854 (type int)
print(cases_by_neighborhood["mitte"] - cases_by_neighborhood["kreuzberg"])  # -5478 (type int)
print(cases_by_neighborhood["mitte"] * cases_by_neighborhood["kreuzberg"])  # 7919208 (type int)
print(cases_by_neighborhood["mitte"] / cases_by_neighborhood["kreuzberg"])  # 0.1782178217821782 (type float)


```
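
If you need a whole-number result instead, Python also offers floor division `//` and the remainder operator `%`. These two operators are not part of the list above; the sketch below is just an optional extra using two of the case numbers:

```python
mitte = 1188
spandau = 445

print(mitte // spandau)  # 2, floor division keeps the int type
print(mitte % spandau)   # 298, the remainder of that division

# Regular division returns a float even when the result is a whole number
quarter = mitte / 4
print(quarter)        # 297.0
print(type(quarter))  # <class 'float'>
```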

> *TASK:* <br>
*Now try to use the addition-operation for the variable named `my_string` already defined in the cell below. <br>
Save the addition in a variable called `res1` and print the result.*


Depending on the type of the object these operations are **contextually** defined.

Note how mixing two types of variables with an operation can result in an error! For example the following is forbidden:

```python
var1 = 5
res1 = "test"
# this should produce an error, since the operator '+' can not combine 'int' and 'str' variables! 
var1 + res1

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-6-05639e44053b> in <module>()
      1 # this should produce an error, since the operator '+' can not combine 'int' and 'str' variables!
----> 2 var1 + res1

TypeError: unsupported operand type(s) for +: 'int' and 'str'
```
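
As long as both operands share a type, `+` works, although what it does depends on that type. The sketch below also shows one possible way around the error, using the built-in `str()` function for an explicit conversion:

```python
print(5 + 5)            # 10, numeric addition
print("test" + "test")  # testtest, strings are glued together
print([1, 2] + [3])     # [1, 2, 3], lists are glued together too

# Mixing int and str only works after an explicit conversion
var1 = 5
res1 = "test"
combined = str(var1) + res1
print(combined)  # 5test
```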

### Comparison operations
An **equality operation** (a comparison operation) can be performed by using the `==` operator. The result is always a `bool`-type having either the value `True` or `False`, depending on whether the contents of the two variables are indeed equal or not. Besides equality, we can also check for inequality by using the `!=` operation.

Imagine this as a question you ask to the computer. "Is the content of var1 **equal** to the content of var2?"

- Equality `==`
- Inequality `!=`

```python
print(var1 == var1)  # True (type bool)
print(var1 != var1)  # False (type bool)
```

Note how comparisons _always_ produce a `bool` type!

In [11]:
x = 7
y = 7
print("Is x equal to y? The answer is...", x == y)
print("Is x not equal to y? The answer is...", x != y)

Is x equal to y? The answer is... True
Is x not equal to y? The answer is... False


Besides equality (`==`), there are other comparison operations which return a `bool`-type value. Think of them as other questions you can ask the computer. These "questions" are typically used as conditions in `if` statements and `while` loops:
- `<` strictly less than
- `>` strictly greater than
- `<=` less than or equal
- `>=` greater than or equal
- `!=` not equal
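
A few of these questions in action, as a small sketch using two of the case numbers from above (the variable names are chosen for illustration):

```python
cases_mitte = 1188
cases_spandau = 445

print(cases_mitte > cases_spandau)   # True
print(cases_mitte <= cases_spandau)  # False

# The answer can be stored in a variable like any other value
is_above_threshold = cases_spandau >= 1000
print(is_above_threshold)        # False
print(type(is_above_threshold))  # <class 'bool'>
```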


***

### 4.3 Flow control

Like any programming language, Python offers ways to control the flow of a program: `if`, `else`, `for` and `while`.

#### Decisions: if, elif, else

Conditionals allow you to easily define automated decisions.

We could have a piece of code checking the number of cases in our neighborhood every day so we can get an alert if the number of cases exceeds a certain threshold. For example:

```python
if cases_by_neighborhood["kreuzberg"] > 1000:
    print("The number of cases in Friedrichshain-Kreuzberg is above a thousand!!!!!")
    # here, more lines could follow
else:
    # If the `if` condition is not true, then this block gets executed instead.
    print("Kreuzberg has not reached 1000 cases yet.")

# These lines are not indented anymore.
# They will be executed regardless of the outcome of the condition above.
print("Number of cases in Kreuzberg:", cases_by_neighborhood["kreuzberg"])
```

Notice the syntax: 
* Keyword `if` indicates the start of an `if`-statement.
* A **conditional expression** (here `cases_by_neighborhood["kreuzberg"] > 1000`) follows, which will return (explicitly or implicitly) a `bool`-type. Thus, either `True` or `False`.
* A `:` closes the `if`-statement.
* An **indented** block follows. All contiguous lines sharing the same level of indentation (or larger) belong to the same block and are executed only if the condition is met.
* After the `if` block, an `else` block is defined. This is optional!

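Now think of a scenario where we have to decide which code to run based on three or more different conditions. Alternative, mutually exclusive conditions can be expressed with `elif`. The conditions are checked from top to bottom and only the block of the first one that is true gets executed.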

```python
if cases_by_neighborhood["kreuzberg"] > 1500:
    print("The number of cases in Friedrichshain-Kreuzberg is above 1500!!!!!")
    # here, more lines could follow
elif cases_by_neighborhood["kreuzberg"] > 750:
    print("The number of cases in Friedrichshain-Kreuzberg is above 750!!!")
    # here, more lines could follow
elif cases_by_neighborhood["kreuzberg"] > 375:
    print("The number of cases in Friedrichshain-Kreuzberg is above 375!!!")
    # here, more lines could follow
else:
    # If none of the conditions above is true, then this block gets executed instead.
    print("Kreuzberg has not reached 375 cases yet.")
```
***

> **TASK**
>
> Copy the code above in a new cell. Before running it, think: What's the expected output of the cell below? Why?

In [14]:
# Copy the if-elif-elif-else code here



### Repetition: for-loop

When we want to repeat the same action for different parameters, we wouldn't like to write the same code N times, right? That's what computers are for, anyway.

Let's check which neighborhoods are above a thousand cases. You might be tempted to do this:

```python
if cases_by_neighborhood["kreuzberg"] > 1000:
    print("The number of cases in Friedrichshain-Kreuzberg is above a thousand!!!!!")
if cases_by_neighborhood["mitte"] > 1000:
    print("The number of cases in Mitte is above a thousand!!!!!")
if cases_by_neighborhood["neukölln"] > 1000:
    print("The number of cases in Neukölln is above a thousand!!!!!")
if cases_by_neighborhood["spandau"] > 1000:
    print("The number of cases in Spandau is above a thousand!!!!!")

```

This is really boring because we have to copy and paste the same code N times, and then replace the _key_ of the dictionary, and the full name of the neighborhood. It also introduces a lot of duplication. What if we want to change the threshold from `1000` to `2000` in the future? That would be four replacements. Imagine this for all neighborhoods in Berlin, or for all cities in Germany! No way!

That's what loops are for! Let's analyze each block. They are really similar because, well, they are copies:

```python
if cases_by_neighborhood[THIS_IS_THE_DICT_KEY] > 1000:
    print("The number of cases in THIS_IS_THE_FULL_NAME is above a thousand!!!!!")
```

We need to repeat _that_ action, using different dictionary keys.

```python
neighborhoods = ["mitte", "kreuzberg", "neukölln", "spandau"]

for neighborhood_key in neighborhoods:
    if cases_by_neighborhood[neighborhood_key] > 1000:
        print("The number of cases in", neighborhood_key, "is above a thousand!!!!!")    
```


What's happening here?

1. `for NAME in COLLECTION:`. Just as `if`, a `:` is needed to end the line.
2. `NAME` will be assigned to the first element in `COLLECTION`.
3. The indented block(s) will be executed with `NAME = FIRST VALUE`.
4. `NAME` will take the second element.
5. The indented block(s) will be executed with `NAME = SECOND VALUE`
6. ... and so on. The loop ends when there are no more elements in the list to assign.
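
The steps above can be made visible with a small sketch that simply prints the value of the loop name in every round. It also shows a detail not mentioned above: after the loop has finished, the name still holds the last element.

```python
neighborhoods = ["mitte", "kreuzberg", "neukölln", "spandau"]

for neighborhood_key in neighborhoods:
    # This indented line runs once per element of the list
    print("neighborhood_key is now:", neighborhood_key)

# Not indented anymore: this runs once, after the loop has ended
print("The loop is over. Last value:", neighborhood_key)
```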

> **TASK**
>
> How many entries does `neighborhoods` contain? You can use `len(neighborhoods)` to get the answer, but you can also compute it with a `for` loop, a reassignable `int` variable and the `+` operator.

In [16]:
#  Try your best here
count = 0
for neighborhood in neighborhoods:
    ...

Note: There are more ways to _repeat actions_ in Python, like using `while` loops or recursion, but `for` is by far the most common!
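
For completeness, here is a minimal sketch of a `while` loop. It keeps repeating its indented block for as long as the condition is `True`; the variable name `countdown` is made up for this example:

```python
countdown = 3

while countdown > 0:
    print(countdown)
    # Without this line the condition would stay True forever!
    countdown = countdown - 1

print("Done")
```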

***

### 4.4 Introduction to functions

Repeating a single operation is useful so you do not get bored writing the same lines again and again. In fact, avoiding the repetition of tasks is one of the core concepts in programming! We are lazy and we want to do as little as possible. This leads to devising mechanisms for reusing the same code in different places so we do not have to write it again!

This is what functions are there for: **reusable pieces of code that can be parametrized**. 

This means they accept arguments! They can optionally _return_ a result too. They are very similar to mathematical functions in that sense!

A task like "counting up all entries in a list" is probably something somebody has already coded before. That's why we have `len()`. For common tasks it is always worth asking (Google knows everything) if this task is generally known and a **function** has already been created.

Functions are sequences of instructions that perform a specific task. They can be used (called) in other parts of the program whenever that task should be executed. The values you pass to a function are called arguments.

Think of it as a task which you give to somebody who then gives you back the result. Imagine a task for a cook like "cutting vegetables". The function would contain the instructions on **how** to move the knife, but not **what** to cut, since there is more than one vegetable for which these instructions are valid. You then give, for example, a carrot to the cook (call the function with a "carrot" as argument) and get the cut carrot back.

Calling a routine with the name `sum_list` would look like this:

![funct_call_easy.png](attachment:funct_call_easy.png)


In this example nothing happens with the answer of the call. The returning value will simply be ignored. Think of the cook you asked to cut the carrot. He did it properly and hands you the cut carrot, but you don't take it.

Thus, we need to **store** the returning value in a variable to "take" it.

```python
num_legs_list = [4,2,8,150]
sum_num_legs_list = sum_list(num_legs_list)
```

So, as a recap, in order to use a function, you first have to know:

1. If the function **exists**, either because you have written it or because you have imported it from another file (see below).
2. What **arguments** or **parameters** the function is expecting. A function can take arguments or not. Arguments can be required (positional arguments) or optional (keyword arguments).
3. If the function **returns** something. If it does not return anything, it doesn't mean it didn't _do_ anything. It might have some kind of side effect (writing a file to disk, for example).
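
The three points can be illustrated with functions that ship with Python. This is only a sketch: `max()` and `sorted()` are real built-in functions, but they are not discussed elsewhere in this notebook.

```python
num_legs_list = [4, 2, 8, 150]

# max() exists out of the box, takes one required argument and returns a value
largest = max(num_legs_list)
print(largest)  # 150

# sorted() takes a required argument and, optionally, the keyword argument 'reverse'
ascending = sorted(num_legs_list)
descending = sorted(num_legs_list, reverse=True)
print(ascending)   # [2, 4, 8, 150]
print(descending)  # [150, 8, 4, 2]
```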


>*TASK:*
*You already know and even used one particular function. Do you know how it is named and what it does?*

Variables of nearly every datatype (you will learn more about other types in the next class) come with their own **functions** (usually called _methods_) and **attributes**. Put simply, attributes are variables attached to a value; every value of a given type has attributes with the same names. Functions and attributes can be accessed by writing a `.` after the **NAME** of the variable.

They help the programmer performing basic operations and tasks.


A `float` datatype for example has an attribute called `real` and a function called `is_integer()`.
```python

x = 1.2  # a float value
print(x.real)  # accessing attribute of variable "x" called "real". Notice: No () brackets!
print(x.is_integer())  # calling function is_integer() of variable "x" with no argument.

    1.2
    False
```

The **attribute** `real` contains the real part of the floating point number (in case of a complex number this becomes important).<br>
The **function** `is_integer()` does not need any arguments and returns a `bool` type: `True` if `x` has no fractional part (like `1.0`) and `False` otherwise (like `1.2`).
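
Other types follow the same dot syntax. A short sketch with a few real `str` functions (`upper()`, `startswith()` and `count()`), picked here only as examples:

```python
kiez = "kreuzberg"

print(kiez.upper())          # KREUZBERG
print(kiez.startswith("k"))  # True
print(kiez.count("e"))       # 2

# For comparison: a float without a fractional part
z = 2.0
print(z.is_integer())  # True
print(z.real)          # 2.0
```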


<font color=green>Make sure you understood this concept before you move on. Other lectures depend on these concepts!</font>


### 4.5 Modules and imports

If you are expecting to use the functions you have defined in this notebook in a different one, you will discover it's not possible right away! To support this, Python has a way to put definitions in a file for you to reuse it. Definition-containing files are called modules; definitions from a module can be imported into other modules, scripts or notebooks.

A module is therefore a file containing Python definitions and statements. You can easily identify them thanks to the `*.py` file extension.

> More info on the [Python documentation](https://docs.python.org/3/tutorial/modules.html)!

The Python [`import` system](https://docs.python.org/3/reference/import.html) can be rather complex. We therefore only tell you the very basic information you need to know to follow the course. 

Python ships with a large battery of modules that you can use right away. Since it would be unfeasible to load all definitions directly when starting Python (it simply would take too long and would be too memory consuming), only a very small subset is available upon initialization. The other ones must be _explicitly_ invoked or **imported**. That can be achieved with the `import` command.

For example, to get access to more scalar functions like the square root, you need to import the `math` module, which defines `sqrt`:

```python
import math
help(math.sqrt)
```
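
Once imported, everything the module defines is reached through the module name followed by a `.`. A minimal sketch using `math.sqrt` plus two other members of the `math` module (`floor` and `pi`) chosen as examples:

```python
import math

root = math.sqrt(16)
print(root)  # 4.0, note that the result is a float

print(math.floor(2.7))  # 2
print(math.pi)          # the constant pi, stored as a float
```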

The import system can also be used to load modules developed by third parties. Two of the most popular ones are **numpy** and **pandas**. Import statements look like the following:

```python
import numpy
import pandas
```

In order to use a module, you need to **know** that it exists and what it contains. Unfortunately, there is no other way than checking the documentation of the corresponding module. Functions from the module can then be used with the following syntax:

```python
# calling the function "mean" from the numpy module, giving the list "[1,2,3,4]" as argument.
numpy.mean([1,2,3,4])

```

Since module names are sometimes long (and programmers are lazy), we can give them a different name by using the keyword `as`.

```python
import numpy as np
import pandas as pd
```

Whenever we need a function/definition from the `numpy` module we can further refer to the module as `np`.
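
The same mechanism works for any module. The sketch below uses the standard library module `statistics` so that it runs without installing anything; the alias `stats` is an arbitrary choice made for this example, not an established convention like `np` or `pd`:

```python
import statistics as stats

# From now on the module is reachable under its shorter name
average = stats.mean([1, 2, 3, 4])
print(average)  # 2.5
```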


***

### 4.6 Optional: Defining your own function

What if you have a task but you can not find a function for it which is already defined? Then you can write your own instructions to solve a particular task!

The general structure of a function in Python follows this syntax:

![def_function.png](attachment:def_function.png)

The keyword `def` indicates the beginning of the definition. `ARG_1, ARG_2, ..., ARG_N` are arbitrarily many **arguments** expected to be passed to the function when calling it. Try to understand the following definition of a function:

![funct_sum_list.png](attachment:funct_sum_list.png)

The names of the arguments and their values will only exist in the scope of this function. Outside, these definitions are <font color=red>NOT</font> available.
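
Here is a minimal sketch of a definition and of the scope rule. The function `double_value` and all names in it are invented for this example, and the `try`/`except` construct (not covered in this notebook) is only used to show the error without stopping the cell:

```python
def double_value(number):
    # 'number' and 'local_result' only exist while the function is running
    local_result = number * 2
    return local_result

doubled = double_value(21)
print(doubled)  # 42

# Outside the function, the name 'local_result' is unknown
try:
    print(local_result)
except NameError as error:
    print("NameError:", error)
```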

> *TASK:*<br>
*Use the knowledge to define a function which builds the **average** of a list, using the definition of function `sum_list()` from this cell. Test your implementation with the list called `num_legs_list`.*

In [None]:
# This function sums up all entries in a list.
# It expects exactly 1 argument, which is called 'any_list' within the scope of the function.
# It returns (to the calling code) the sum of the list, called 'list_sum' (only within the scope of the function).
def sum_list(any_list):
    list_sum = 0 
    for entry in any_list:
        list_sum += entry
    return list_sum

# define the desired number of arguments (if any necessary) and give them reasonable names
def average():
    # here you should probably call the function sum_list() and do something with the return value
    
    return avrg

# here you should probably call your average function with your list 'num_legs_list'
num_legs_list = [4,2,8,150]

*TASK:*<br>
*Call the function named `average` again. This time, pass `new_list` defined in the cell below as an argument. Check the correctness of the result.*

In [None]:
new_list = [1,4,10,25,10]

# call the function "average" using "new_list" as an argument.


## 5. Discussion

Python is a very expressive language that has risen in popularity in recent years. The ecosystem of packages and modules for science and, more specifically, data science is _vast_. Learning the basics will be one of your main assets throughout your career, especially if you decide to pursue an academic research path!

Right now, all these concepts might feel scattered and might not make sense together yet. Hopefully, after seeing more application-oriented examples in the following days, you will have a better understanding of the potential usefulness and power of Python.

_Hello world, future Pythonista!_

***

## 6. Quiz

- What is the expected output of the following code?
```python
x = ""
for i in "python_is_amazing!":
    if i == "_":
        x += " "
    else:
        x += i
print(x)
```

- When executing the following code, will there be an error?

```python
x = 5
y = "5"

print(x + y)

```

- What do we need to change in order to get the correct output "10"?
***

In [1]:
# copy the code from the quiz and run the cell to check your answers!