# Exercise - Fundamentals of Python for Data Science

<h3><span style="color:blue">MOTD: 'Google is your friend!'</span></h3>

## 0. A Python Easter Egg and More
<h4><span style="color:blue">0.1 Answer the following questions:</span></h4>
* What happens when you `import this` in Python? (try it out and maybe someone is even "contemplating" what happens on "medium")

* What does this have to do with PEP 8? (you should see what the people at [Real Python](https://realpython.com/python-pep8/) have to say about the latter and follow it in your Python programming!)

* Does this in any way relate to *pythonic code*? (again, there might be some evidence found on "towardsdatascience")

* Is there a way to check automagically whether your code complies with PEP 8? Is this also the case for Jupyter Notebooks?
  If so, load the appropriate extension and allow lines that are longer than 79 characters (120, for example).

```python
# make sure here that you can check your code for compliance with PEP 8

```

```python
# allow longer lines (not just 79 characters)

```

## Scenario

In the following, we shall concentrate on a very simple *scenario*: Suppose we have $n$ scores $x_1,x_2,\dots,x_n$ and we want to calculate the `mean` (a.k.a. `average`), `variance` and `standard deviation` of these scores. 

To recapitulate (yes, we divide by $n$, not $n-1$, so we are computing the [*population variance*](http://www.differencebetween.net/science/mathematics-statistics/difference-between-sample-variance-population-variance/)): 

**`mean`**:  $$\overline{x} = \frac{1}{n} \sum_{i=1}^n x_i = \frac{x_1+x_2+\dots+x_n}{n}$$
---
**`variance`**: $$ var = \frac{1}{n} \sum_{i=1}^n (x_i - \overline{x})^2$$
---
**`standard deviation`**: $$ \sigma = \sqrt{\frac{1}{n} \sum_{i=1}^n (x_i - \overline{x})^2}$$
---

To broaden our scenario somewhat, let us further assume that the scores were produced by applying some model to a real-life task and that, in addition, we also have $n$ *observations* $y_1,y_2,\dots,y_n$ that we can compare the scores to. Then we often compute the [*sum of squared errors*](https://365datascience.com/sum-squares/) as a measure of how well our model was able to reproduce the observations.

**`SSE`**: $$ s =  \sum_{i=1}^n (x_i - y_i)^2$$
---
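As a quick sanity check of the formulas above, here is a minimal sketch in plain Python (no imports, only built-ins such as `sum`, `len` and `zip`). The lists `x` and `y` are hypothetical toy data chosen so the results are easy to verify by hand; they are not the scores and observations of this exercise.

```python
# Hypothetical toy data (NOT the scores/observations of the exercise)
x = [2, 4, 4, 4, 5, 5, 7, 9]   # "scores"
y = [1, 4, 5, 4, 5, 6, 7, 10]  # "observations"
n = len(x)

mean = sum(x) / n                                   # 40 / 8
var = sum((xi - mean) ** 2 for xi in x) / n         # population variance: divide by n
sigma = var ** 0.5                                  # standard deviation = sqrt(variance)
sse = sum((xi - yi) ** 2 for xi, yi in zip(x, y))   # sum of squared errors

print(mean, var, sigma, sse)  # 5.0 4.0 2.0 4
```

The squared deviations from the mean are 9, 1, 1, 1, 0, 0, 4, 16, which sum to 32; dividing by $n = 8$ gives the variance 4.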

## Concrete Starting Scenario
To make things concrete, let us start with the following:
* 8 scores: 45, 68, 73, 85, 95, 88, 76, 82 and 
* 10 observations: 50, 65, 78, 84, 95, 85, 74, 83, 90, 88

## 1. Python as a calculator

<h4><span style="color:blue">1.1 Complete the following tasks based on the concrete starting scenario above</span></h4>

1. Calculate the mean of the scores.
2. Find the data type of the individual scores

## 2. Variables, expressions, statements
<h4><span style="color:blue">2.1 Complete the following tasks:</span></h4>

1. Save the number of scores available to a variable.
2. Use the variable to calculate the mean and again save the result in a variable. 
3. Display the data type of the result. 
4. Print out the result. 

>**Note**: Make sure you always use markdown (or comments) to make it clear what exactly you are doing (i.e. which one of the tasks you are tackling)

### 2.A Formatting Output
<h4><span style="color:blue"> 2.2 Find three ways to print out a whole sentence that makes clear what the printed value means.</span></h4>


>**Hint**: Check out [this tutorial](http://zetcode.com/python/fstring/) or [this one](https://realpython.com/python-f-strings/) to help you with the task. Take good note of the power of *f-strings*!
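To illustrate what "formatting" means here, the following sketch formats one hypothetical value (`value = 3.14159`, not taken from the exercise) with three string-formatting mechanisms that Python supports: printf-style `%` formatting, `str.format()`, and an f-string. The `:.2f` format specification rounds to two decimal places in all three.

```python
# A hypothetical value, used only to illustrate the formatting styles
value = 3.14159

s1 = "The value is %.2f" % value           # printf-style (%) formatting
s2 = "The value is {:.2f}".format(value)   # str.format()
s3 = f"The value is {value:.2f}"           # f-string (Python 3.6+)

print(s1)  # The value is 3.14
print(s2)  # The value is 3.14
print(s3)  # The value is 3.14
```

The f-string is usually the most readable of the three, since the expression sits directly where its value appears in the sentence.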

## 3. Lists

<h4><span style="color:blue">3.1 Complete the following tasks:</span></h4>

1. Create a list with the 8 scores above as elements in the list.
2. Add two more scores, 86 and 93, to the newly created list.
                
>**Hints**: 
* You should check out [`append()` or `extend()`](https://www.geeksforgeeks.org/append-extend-python/)
* The final result should be a list of 10 elements: [45, 68, 73, 85, 95, 88, 76, 82, 86, 93] 
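The difference between the two methods in the hint is easy to miss, so here is a small sketch on hypothetical toy lists (not the scores): `append()` adds its argument as a single element, whereas `extend()` adds each item of an iterable one by one.

```python
# Hypothetical toy lists (NOT the scores of the exercise)
a = [1, 2]
a.append([3, 4])   # append() adds its argument as ONE new element
print(a)           # [1, 2, [3, 4]]

b = [1, 2]
b.extend([3, 4])   # extend() adds each item of the iterable separately
print(b)           # [1, 2, 3, 4]
```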

### 3.A Slicing

Try and test the following examples yourself first and then do the exercises below:                

> **Hint**:                    
List slicing uses the symbol `:` to access parts of a list:
```python
list[first_index:last_index:step]
list[:]
```
Note that `first_index` is included in the result, while `last_index` is **not**!
For a positive `step`, the defaults are: `first_index` is 0, `last_index` is the length of the list (so the slice runs to the end and includes the last element), and `step` is 1. All three are optional. A good explanation may be found on [StackOverflow](https://stackoverflow.com/questions/509211/understanding-slice-notation).

```python
# Example: 
# List slicing
a = [0, 1, 2, 3, 4, 5]
a[:2]    # the first 2 elements, i.e. a[0] and a[1] -> [0, 1]
a[2:]    # elements from a[2] to the end -> [2, 3, 4, 5]
a[2:-1]  # from a[2] up to, but excluding, the last element (index -1) -> [2, 3, 4]
a[::2]   # every other element -> [0, 2, 4]
```
```python
# The following slices are equivalent
a = [1, 2, 3, 4, 5, 6, 7, 8]
a[:]
a[::1]
a[0::1]

```
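One consequence of the fact that a slice always builds a new list: `a[:]` (like `a[::1]` and `a[0::1]`) gives you a shallow copy, not a second name for the same list. A small sketch with made-up values:

```python
a = [1, 2, 3]
alias = a        # a second name for the SAME list object
sliced = a[:]    # a NEW list with the same elements (shallow copy)

a.append(4)
print(alias)     # [1, 2, 3, 4] -> the change is visible through the alias
print(sliced)    # [1, 2, 3]    -> the slice copy is unaffected
print(sliced is a, alias is a)  # False True
```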

<h4><span style="color:blue">3.2 Select from the list of scores you created above:</span></h4>

1. the first 4 elements
2. all elements but the last one
3. the last element 
4. every second element from the list of scores
5. the elements in reverse order
6. all the elements in the list (at least 3 ways)

## 4. Flow control
<h4><span style="color:blue">4.1 Find at least two ways to print out all the elements in the list of scores <em>individually</em>.</span></h4>

>**Note**: Not every way has to be pythonic here - indeed, this should be impossible since according to the Zen of Python "There should be one-- and preferably only one --obvious way to do it." Obviously, you will have to use some form of a *loop*.

<h4><span style="color:blue">4.2 Find at least two ways to print out elements greater than 80 from the list of scores.</span></h4>

<h4><span style="color:blue"> 4.3 Find a way to print both the index and the value of each element greater than 80 <em>in an informative way</em></span></h4>

> **Hint**: Check out `enumerate` for a pythonic solution!
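A minimal sketch of how `enumerate()` works, using a hypothetical list of fruits rather than the scores: it yields `(index, element)` pairs, counting from 0 unless you pass a different `start`.

```python
# Hypothetical toy data (NOT the scores of the exercise)
fruits = ["apple", "banana", "cherry"]

# enumerate() yields (index, element) pairs; counting starts at 0 by default
for index, fruit in enumerate(fruits):
    print(f"Element {index}: {fruit}")

# The optional start argument shifts the counter, e.g. for 1-based numbering
pairs = list(enumerate(fruits, start=1))
print(pairs)  # [(1, 'apple'), (2, 'banana'), (3, 'cherry')]
```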

## 5. Functions

### Define functions
<h4><span style="color:blue"> 5.1 Given a list of values, define functions</span></h4>

1. to compute the *mean* of the values;
2. to compute the *variance* of the values;
3. to compute the *standard deviation* of the values.
4. Finally, assuming you have two lists of values, one of which contains observations and the other contains model outputs (scores), define a function to compute the *sum of squared errors*.

>**Notes/Hints**: 
* *Do not import* any additional libraries, just use the built-in abilities of Python (one of which is the `sum` function)
* You may assume that the lists contain numeric values only, i.e., you do not have to check this!
* You may assume that the lists containing the observations and the scores are of equal length.
* Prove that your functions work using the list of scores (e.g. by comparing the mean to the result you got before).

## 6. Numpy

### 6.A Arrays
<h4><span style="color:blue">6.1 Complete the following tasks:</span></h4>

1. Create a `numpy` array holding all 10 scores. 
2. Check the type of the array
3. Find out the dimensions of the array
4. Find out the data type of the array's elements 

(Google is your friend, especially for tasks 3 and 4).

>**Hint**: You must import the necessary package `numpy` as `np` first and then use `np.array()` to create an array. 
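A sketch of creating an array and inspecting it, on hypothetical toy values (not the 10 scores). It relies on the standard `ndarray` attributes `ndim`, `shape` and `dtype`; the exact integer `dtype` depends on your platform, so no fixed output is claimed for it.

```python
import numpy as np

# Hypothetical toy values (NOT the scores of the exercise)
arr = np.array([1, 2, 3, 4])

print(type(arr))   # <class 'numpy.ndarray'>
print(arr.ndim)    # 1     -> number of dimensions
print(arr.shape)   # (4,)  -> length along each dimension
print(arr.dtype)   # an integer type; the exact one (e.g. int64) is platform-dependent
```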

### 6.B Slicing
<h4><span style="color:blue">6.2 Select from the <tt>numpy</tt> array of scores you created above (see exercise 3.2):</span></h4>

1. the first 4 elements
2. all elements but the last one
3. the last element 
4. every second element from the array of scores
5. the elements in reverse order
6. all the elements in the array (at least 3 ways)

Once you have done this, consider what difference it makes to use `numpy` arrays instead of Python lists.
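Two differences show up even in tiny examples, sketched here with made-up values: arithmetic operators act element-wise on arrays (vectorization) but mean something else for lists, and basic slicing of an array returns a *view* of the same data, whereas slicing a list copies.

```python
import numpy as np

lst = [1, 2, 3]
arr = np.array([1, 2, 3])

print(lst * 2)   # [1, 2, 3, 1, 2, 3] -> list repetition
print(arr * 2)   # [2 4 6]            -> element-wise (vectorized) multiplication

# Slicing a list creates a copy; basic slicing of an array creates a view
lst_part = lst[:2]
arr_part = arr[:2]
lst_part[0] = 99
arr_part[0] = 99
print(lst)       # [1, 2, 3] -> original list unchanged
print(arr)       # the original array now starts with 99 (changed through the view)
```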

### 6.C 2-Dimensional Arrays

<h4><span style="color:blue">6.3 Complete the following tasks:</span></h4>

1. Convert the array of scores into a two-dimensional array with 2 rows and 5 columns
2. Display the shape of the new array
3. Show the data type of the new array's elements
4. Check the type of the new array

### 6.D Multi-Dimensional Slicing

<h4><span style="color:blue">6.4 Select from the 2-dimensional <tt>numpy</tt> array of scores</span></h4>

1. all but the last column 
2. just the last column

### 6.E Array Calculations

<h4><span style="color:blue">6.5 Use <tt>numpy</tt> to compute</span></h4>

1. the mean of the scores
2. the variance of the scores
3. the standard deviation of the scores
4. the sum of squared errors of the scores vs. the observations

Do this both for the Python lists and for the (one-dimensional) `numpy` arrays!
>**Note**: More on *vectorized operations* in `numpy` may be found on [Python Like You Mean It](https://www.pythonlikeyoumeanit.com/Module3_IntroducingNumpy/VectorizedOperations.html).
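One detail worth checking when comparing `numpy` with the formulas above: `np.var` and `np.std` default to `ddof=0`, i.e. they divide by $n$ and therefore compute the population variance used in this exercise; passing `ddof=1` divides by $n-1$ instead. A sketch on hypothetical toy data (the same made-up values as in the formula check earlier, not the exercise data):

```python
import numpy as np

# Hypothetical toy data (NOT the scores/observations of the exercise)
x = np.array([2, 4, 4, 4, 5, 5, 7, 9])
y = np.array([1, 4, 5, 4, 5, 6, 7, 10])

print(np.mean(x))            # 5.0
print(np.var(x))             # 4.0 (ddof=0: divide by n)
print(np.std(x))             # 2.0
print(np.var(x, ddof=1))     # 32/7, the sample variance (divide by n - 1)
print(np.sum((x - y) ** 2))  # 4 -> vectorized sum of squared errors

# numpy functions also accept plain Python lists
print(np.mean([2, 4, 4, 4, 5, 5, 7, 9]))  # 5.0
```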

## 7. Lambda Functions

>Note: A lambda function is a small anonymous function. A lambda function can take any number of arguments, but can only have one expression.                   
**Syntax:** 
**_`lambda argument_list : expression`_**

Example: instead of using the (named) function `add()`, we could use an (anonymous) lambda function:

```python
# function to add 2 numbers
def add(x, y):
    return x + y


add(2, 5)  # -> 7

# using a lambda function to add numbers "anonymously"
(lambda x, y: x + y)(2, 5)  # -> 7

```

<h4><span style="color:blue">7.1 What happens when you use automatic checking for PEP 8 compatibility and assign a name to a lambda function?</span></h4>

>**Hint**: Try it out using `add_lam` as a name for the lambda function to add two numbers above

<h4><span style="color:blue">7.2 Use an anonymous lambda function to compute the mean of a Python list (or <tt>numpy</tt> array) </span></h4>


>**Hint**: You may use the built-in function `sum()` in this context.

<h4><span style="color:blue">7.3 Do you think lambda functions are useful at all in Python?</span></h4>

> **Note**: Substantiate your answer with a suitable reference!
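As a starting point for 7.3: a frequently cited use of lambdas is as short, throwaway arguments to higher-order functions such as `sorted()`, `max()` and `map()`, where defining a named function would add little. A sketch with hypothetical `(name, value)` pairs:

```python
# Hypothetical (name, value) pairs
pairs = [("a", 3), ("b", 1), ("c", 2)]

by_value = sorted(pairs, key=lambda p: p[1])    # sort by the second item
largest = max(pairs, key=lambda p: p[1])        # pair with the largest value
doubled = list(map(lambda p: p[1] * 2, pairs))  # transform each pair

print(by_value)  # [('b', 1), ('c', 2), ('a', 3)]
print(largest)   # ('a', 3)
print(doubled)   # [6, 2, 4]
```

Alternatives exist for each line (e.g. `operator.itemgetter(1)` as the key, or a list comprehension instead of `map`), which is part of what makes the question worth arguing.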