# Statistics and Data Science: Exercises library

## Data normalization with Euclidean norm

A very common operation is to transform your data by normalization. Imagine you have a list of data points $x=$`[21.4,45.7,38.5,76.4,61.9,43.4,52.6,27.2]` and you want to normalize your data using the [Euclidean norm](https://en.wikipedia.org/wiki/Norm_(mathematics)#Euclidean_norm), i.e., rescale the data so that every value lies between 0 and 1 (which holds here because all values are non-negative) with the following operation:

$\hat{x}_{i} = \frac{x_{i}}{||x||}$

where: $||x||=\sqrt{x_1^2+...+x_n^2}$

Normalization is common, and often necessary, when you deal with several variables that have very different scales.

- Using list comprehension, create a new list with normalized $x$ using the Euclidean norm. 

In [None]:
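raw_data = [21.4, 45.7, 38.5, 76.4, 61.9, 43.4, 52.6, 27.2]
# Sum of the squares of all data points
sum_sq = sum(i**2 for i in raw_data)
# Euclidean norm: square root of the sum of squares
norm = sum_sq**(1/2)
# Divide each data point by the norm (list comprehension)
normalized = [x/norm for x in raw_data]

print(normalized)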
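
As a quick sanity check (not part of the original exercise), a vector divided by its Euclidean norm should itself have norm 1. The sketch below recomputes the normalization and verifies this with `math.isclose`; the variable names are illustrative only.

```python
import math

x = [21.4, 45.7, 38.5, 76.4, 61.9, 43.4, 52.6, 27.2]

# Euclidean norm of the raw data
x_norm = math.sqrt(sum(i**2 for i in x))

# Normalized data
x_hat = [i / x_norm for i in x]

# The normalized vector should have (approximately) unit length
hat_norm = math.sqrt(sum(i**2 for i in x_hat))
print(math.isclose(hat_norm, 1.0))

# All values are non-negative here, so each normalized value lies in [0, 1]
print(all(0 <= i <= 1 for i in x_hat))
```

The comparison uses `math.isclose` rather than `==` because floating-point arithmetic rarely yields exactly 1.0.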


""" pretty solution
# We first compute the square of each element using comprehension
x_square = [i**2 for i in x]

# Then we compute the Euclidean norm
x_norm = (sum(x_square))**(1/2)

# Finally we normalize our list
x_hat = [i/x_norm for i in x]

print(x_hat)
"""

## Data cleaning with comprehension

Suppose we have the following list: $x=$`[21.4, 'NaN', 45.7,38.5,76.4,61.9, 'NaN', 43.4,52.6,27.2]`. Unfortunately we have some `'NaN'` values (Not a Number).

- Clean your list, dropping `'NaN'` values, using list comprehension

In [None]:
liste = [21.4, 'NaN', 45.7, 38.5, 76.4, 61.9, 'NaN', 43.4, 52.6, 27.2]
# Keep every element that is not equal to the string 'NaN'.
# Note: calling liste.remove() inside a for loop over liste can skip elements,
# and dropna() is a pandas method, not a list method.
x_clean = [i for i in liste if i != 'NaN']
print(x_clean)
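
Removing items from a list while looping over that same list is a common pitfall: each removal shifts the remaining items, so the element right after a removed one is skipped. The sketch below illustrates this on a small hypothetical list with two consecutive `'NaN'` values and compares it with a comprehension.

```python
data = [1.0, 'NaN', 'NaN', 2.0]  # hypothetical data with two consecutive 'NaN'

# Pitfall: removing while iterating shifts the remaining items,
# so the loop skips the element right after each removal
looped = list(data)
for value in looped:
    if value == 'NaN':
        looped.remove(value)
print(looped)   # one 'NaN' survives

# A comprehension builds a new list and never mutates the one it reads
cleaned = [value for value in data if value != 'NaN']
print(cleaned)
```

The loop version happens to work when the `'NaN'` values are not adjacent, which is why the bug is easy to miss.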


## Data manipulation using dictionary comprehension

Comprehension is not only for lists; it works for dictionaries too! Suppose you have the following dictionary, with the grades of some students on a 0-100 scale:

`{'Adam': 72, 'Elena': 91, 'Xiang': 87, 'Julie': 81, 'Takafumi': 79}`

- Use dictionary comprehension to convert the grade from the 0-100 scale to the Swiss 0-6 scale.
- Use dictionary comprehension to round to the nearest 0.25 (for instance 4.2 should be converted to 4.25). 

Tip: you can use the `round()` function.

In [None]:
pairs = {'Adam': 72, 'Elena': 91, 'Xiang': 87, 'Julie': 81, 'Takafumi': 79}
# Dictionary comprehension: keep each key, rescale each value from 0-100 to 0-6
new_6 = {key: value/100*6 for key, value in pairs.items()}
print(new_6)

# Multiply by 4, round to the nearest integer, then divide by 4:
# the result is always a multiple of 0.25
round_6 = {key: round(value*4)/4 for key, value in new_6.items()}
print(round_6)
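
The multiply, round, divide pattern generalizes to any step size. The sketch below wraps it in a hypothetical helper, `round_to_step` (a name introduced here for illustration), and also shows that Python's `round()` sends exact halves to the nearest even integer.

```python
def round_to_step(value, step=0.25):
    """Round value to the nearest multiple of step (hypothetical helper)."""
    return round(value / step) * step

print(round_to_step(4.2))    # 4.25
print(round_to_step(4.32))   # 4.25
print(round_to_step(5.46))   # 5.5

# Exact halves go to the nearest even multiple (round-half-to-even)
print(round_to_step(4.125))  # 4.0
print(round_to_step(4.375))  # 4.5
```

Calling `round(value, 2)` would only limit the number of decimals; it would not snap the grade to quarter points.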

## Green Bonds

You have a list of green bonds identifiers: 
`gb_ID = ['CH843556=S', 'CH843556=', 'CH868037=', 'CH6YT=RR', 'CH30YT=RR', 'CH975519=', 'CH1580323=', 'CH1580323=S', 'CH2452496=S']`

- Create a new list with the elements of `gb_ID`, removing the `'='` sign and everything that follows it. For instance, `'CH843556=S'` should become `'CH843556'`.
- Create a new list selecting the elements of `gb_ID` with nothing after the `'='` sign, i.e., we disregard elements such as `'CH843556=S'`.

Hints: 
- You can use list comprehension inside another list comprehension.
- For the second question, you could use regular expressions ([RegEx](https://docs.python.org/3/library/re.html)). See also this [tutorial](https://www.w3schools.com/python/python_regex.asp).

In [None]:
import re
gb_ID = ['CH843556=S', 'CH843556=', 'CH868037=', 'CH6YT=RR', 'CH30YT=RR', 'CH975519=', 'CH1580323=', 'CH1580323=S', 'CH2452496=S']
# First question: keep only the part before the '=' sign
clean_gb = [i.split("=")[0] for i in gb_ID]
# Second question: keep identifiers that end with '=' (nothing follows it);
# an equivalent test without RegEx is i.endswith("=")
cleaned_gb = [i for i in gb_ID if re.search("=$", i)]

print(clean_gb)
print(cleaned_gb)
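
Both questions can also be answered with plain string methods instead of RegEx. The following is a minimal sketch, assuming every identifier contains exactly one `'='` sign, as in the list above; the variable names `prefixes`, `bare`, and `bare_prefixes` are illustrative.

```python
gb_ID = ['CH843556=S', 'CH843556=', 'CH868037=', 'CH6YT=RR', 'CH30YT=RR',
         'CH975519=', 'CH1580323=', 'CH1580323=S', 'CH2452496=S']

# str.partition splits at the first '=' and returns (head, separator, tail)
prefixes = [s.partition("=")[0] for s in gb_ID]
print(prefixes)

# Keep only identifiers with nothing after the '=' sign
bare = [s for s in gb_ID if s.endswith("=")]
print(bare)

# A comprehension inside another comprehension:
# select the bare identifiers first, then strip the trailing '='
bare_prefixes = [s[:-1] for s in [t for t in gb_ID if t.endswith("=")]]
print(bare_prefixes)
```

String methods are usually easier to read for fixed patterns like this one; RegEx becomes worthwhile once the pattern varies.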

## Optimizing recursive function

During the lecture, we defined a function to calculate Fibonacci numbers:

$F(0)=0$

$F(1)=1$

$F(n)=F(n-1)+F(n-2)$

However, our function was not efficient since we needed to repeat operations. For example, to compute $F(5)$, we needed $F(4)$ and $F(3)$, but to know $F(4)$ we needed to compute $F(3)$ and $F(2)$, and so on. Since Fibonacci numbers were not stored in memory, the function calculated many identical subproblems over and over again.

- Design a function that calculates Fibonacci numbers and solves the repetition issue.
- Create a list of the first 12 Fibonacci numbers.

Hint: you can use a dictionary

In [None]:
# Cache of already computed Fibonacci numbers, seeded with the base cases
fibonacci_numbers = {0: 0, 1: 1}

def Fibonacci(x):
    # Compute and store F(x) only if it is not in the cache yet
    if x not in fibonacci_numbers:
        fibonacci_numbers[x] = Fibonacci(x-1) + Fibonacci(x-2)
    return fibonacci_numbers[x]

# First 12 Fibonacci numbers: F(0) to F(11)
print([Fibonacci(x) for x in range(12)])
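
Storing results in a dictionary is a form of memoization. As an alternative to the hint, the standard library offers `functools.lru_cache`, which adds such a cache to a function automatically. The sketch below is one possible variant, not the lecture's solution; the function name `fib` is illustrative.

```python
from functools import lru_cache

@lru_cache(maxsize=None)  # cache every result, with no size limit
def fib(n):
    """Return the n-th Fibonacci number, with F(0)=0 and F(1)=1."""
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

first_12 = [fib(n) for n in range(12)]
print(first_12)  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
```

With the cache in place, each $F(n)$ is computed only once, so the number of calls grows linearly with $n$ instead of exponentially.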

## Book information

We have some information about two books:

`(
Title = 'Sapiens: A Brief History of Humankind', 
Author = 'Yuval Noah Harari',
Year = 2011,
Language = 'Hebrew',
ISBN = '978-0062316097')`

`(
Title = 'Les Racines du ciel',
Author = 'Romain Gary',
Year = 1956,
Publisher = 'Gallimard'
)`

As you can see, the information we have differs.

- Write a function that prints, for each key: 'The (key) is (value).'. The key should be in lowercase, except for `ISBN`.
- Call your function with our two books.

For instance, the output for the second book should look like this:

The title is Les Racines du ciel.
The author is Romain Gary.
The year is 1956.
The publisher is Gallimard.

Hint: Try to use arbitrary keyword arguments (`**kwargs`) and the string `format()` method.
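
To illustrate the mechanism on an unrelated example: a parameter written as `**kwargs` collects all keyword arguments into a dictionary, and `**` in a call unpacks a dictionary back into keyword arguments. The function `describe` and its data below are hypothetical.

```python
def describe(**kwargs):
    """Hypothetical helper: return one 'name=value' string per keyword argument."""
    # kwargs is a plain dict, so .items() yields (name, value) pairs
    return ["{}={}".format(name, value) for name, value in kwargs.items()]

print(describe(city="Lausanne", year=2020))

# A dictionary can be unpacked into keyword arguments with **
info = {"city": "Geneva", "year": 2021}
print(describe(**info))
```

Because the function accepts any set of keywords, it copes with inputs that do not share the same fields.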

In [None]:
def text(**book):
    for key, value in book.items():
        # Keep 'ISBN' in upper case; lowercase every other key
        label = key if key == "ISBN" else key.lower()
        print("The {} is {}.".format(label, value))
    print()  # blank line between books

bookinfo_1 = {'Title': 'Sapiens: A Brief History of Humankind',
              'Author': 'Yuval Noah Harari',
              'Year': 2011,
              'Language': 'Hebrew',
              'ISBN': '978-0062316097'}

bookinfo_2 = {'Title': 'Les Racines du ciel',
              'Author': 'Romain Gary',
              'Year': 1956,
              'Publisher': 'Gallimard'}

# Unpack each dictionary into keyword arguments
text(**bookinfo_1)
text(**bookinfo_2)
    