# Really short introduction

## Introduction to Jupyter notebooks

Jupyter notebooks are able to execute code interactively.
To execute code, click into a cell and press the "play" button or (recommended) press Shift+Enter.

In [None]:
# press shift + enter here:
for i in range(10):
    print("Hello {}".format(i))

## A very short python introduction

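Python is a dynamically (but strongly) typed language: a name can be rebound to a value of another type, but values of unrelated types are not converted implicitly.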

In [None]:
i = 2
print(i)
print(type(i))
i = "string"
print(i)
print(type(i))
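The cell above shows the "dynamic" part. The "strong" part can be sketched as follows; the variable names are only illustrative and not part of the original notebook:

```python
# Dynamic typing: a name can be rebound to a value of a different type.
# Strong typing: values of unrelated types are never mixed implicitly.
number = 2
text = "2"

try:
    result = number + text  # int + str is not defined
except TypeError as error:
    result = None
    print("TypeError:", error)

# An explicit conversion states which meaning is intended.
as_number = number + int(text)
as_text = str(number) + text
print(as_number, as_text)
```
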

Built for being easy to learn:
* minimal boilerplate
* human readable syntax
* batteries included: huge standard libraries
* plenty of useful data types: dicts, sets, lists,...
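As a quick illustration of the last point, here is a minimal sketch that uses a list, a set and a dict together. The data and names are made up for this example:

```python
# list: ordered, mutable, may contain duplicates
measurements = [3, 1, 3, 2]

# set: unordered collection of unique items
unique_measurements = set(measurements)

# dict: maps keys to values, here each value to how often it occurs
counts = {}
for value in measurements:
    counts[value] = counts.get(value, 0) + 1

print(sorted(unique_measurements))
print(counts)
```
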

## Look Mom, no external library!

Let's try to do some data science without any external library, but still with readable syntax and some level of convenience.

In [None]:
# list comprehension
a = [i for i in range(10)]
print(a)

In [None]:
# but in this simple case, the constructor of a list does the job
b = list(range(0,20,2))
print(b)

In [None]:
# dicts are soooo useful, use them just everywhere (at least never again write a class as a data structure)
my_table = {"column_a": a, 
            "column_b": b} 
print(my_table)

In [None]:
# 'keys' returns a view of the keys (our columns); wrap it in list(...) if you need a real list
my_table.keys()

In [None]:
# 'values' returns a view of the values; wrap it in list(...) if you need a real list
my_table.values()

In [None]:
# zip is one of those useful little helpers, glues together lists like a zipper
for row in zip(my_table["column_a"], my_table["column_b"]):
    print(row)

In [None]:
# So easy to define a function
def pretty_print_table(table):
    # "*" is not a pointer, it "unpacks" a list, very handy to pass its items as separate function arguments
    # note: use the parameter "table" here, not the global "my_table"
    for row in zip(*table.values()):
        print(row)
pretty_print_table(my_table)
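To see what the "*" does in isolation, here is a small sketch with made-up columns: calling `zip(*columns)` is the same as passing each inner list as its own argument, and applying it a second time turns the rows back into columns.

```python
columns = [[0, 1, 2], [0, 2, 4]]

# These two calls are equivalent: "*" passes each inner list as its own argument.
rows_explicit = list(zip(columns[0], columns[1]))
rows_unpacked = list(zip(*columns))
print(rows_unpacked)

# Applying zip(*...) to the rows transposes them back into columns (as tuples).
columns_again = list(zip(*rows_unpacked))
print(columns_again)
```
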

In [None]:
def square(x):
    return x*x

# lets do some map reduce!
# map applies the function "square" to every item in my_table["column_b"]; list(...) collects the results into a new list
my_table["column_c"] = list(map(square, my_table["column_b"]))
pretty_print_table(my_table)
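In Python 3, `map` returns a lazy iterator, which is why the cell above wraps it in `list(...)`. A list comprehension gives the same result; the following sketch compares both on made-up values:

```python
def square(x):
    return x * x

values = [0, 2, 4, 6]

# map is lazy: nothing is computed until the iterator is consumed
lazy_squares = map(square, values)
squares_with_map = list(lazy_squares)

# the equivalent list comprehension builds the list directly
squares_with_comprehension = [square(x) for x in values]

print(squares_with_map == squares_with_comprehension)
```

Which of the two to use is mostly a matter of taste; the comprehension is often considered easier to read.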

In [None]:
from functools import reduce # <- map is a builtin, but reduce has to be imported from functools
def add(a, b):
    # named "add" so that the builtin sum() is not shadowed
    return a + b

# "reduce" aggregates a list by iteratively calling "add" with the result of the previous call and the next item
# example for the list [1, 2, 3, 4]:
# 1 2 3 4
# \/ | |
#  \/  |
#   \/
# 3 = add(1, 2)
# 6 = add(3, 3)
# 10 = add(6, 4)
my_sum = reduce(add, my_table["column_c"])
print(my_sum)
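The intermediate results from the diagram can be made visible with `itertools.accumulate` from the standard library. This is a side-by-side sketch on the example list `[1, 2, 3, 4]`, using `operator.add` in place of a hand-written function:

```python
from functools import reduce
from itertools import accumulate
import operator

numbers = [1, 2, 3, 4]

# accumulate yields every intermediate result that reduce computes internally
running_totals = list(accumulate(numbers, operator.add))
print(running_totals)

# reduce returns only the last of those values
total = reduce(operator.add, numbers)
print(total)

# an optional third argument is the start value, which also makes empty input safe
empty_total = reduce(operator.add, [], 0)
print(empty_total)
```
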

In [None]:
mean = my_sum/len(my_table["column_c"])
print(mean)

In [None]:
# A general (population) variance function using map and reduce in 5 lines of code
def variance(samples):
    # lambdas are "inline" no-name function definitions for lazy people
    mean = reduce(lambda a, b: a + b, samples)/len(samples)
    quadratic_residuals = list(map(lambda a: (a - mean) * (a - mean), samples))
    variance = reduce(lambda a, b: a + b, quadratic_residuals)/len(quadratic_residuals)
    return variance
# can you spot the code duplication?

In [None]:
variance(my_table["column_c"])
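One possible answer to the duplication question: the "sum everything and divide by the length" step appears twice. The sketch below pulls it into a helper; `mean_of` and `variance_refactored` are hypothetical names chosen for this example, and the sample values are made up. The result is cross-checked against `statistics.pvariance` from the standard library, which also divides by the number of samples.

```python
from functools import reduce
import statistics

def mean_of(samples):
    # the sum-and-divide step that appeared twice in the version above
    return reduce(lambda a, b: a + b, samples) / len(samples)

def variance_refactored(samples):
    # population variance: the mean of the squared deviations from the mean
    mean = mean_of(samples)
    return mean_of([(x - mean) ** 2 for x in samples])

samples = [0, 4, 16, 36, 64]
print(variance_refactored(samples))
print(statistics.pvariance(samples))
```
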

# But now forget everything we did here...
...because there are third party libraries doing all that!

* numpy for the low-level stuff: mean, variance, high-performance array operations,...
* pandas for the high-level stuff: dataframes, statistics, visualisations,...
* dask for data which doesn't fit into your laptop's memory: out-of-core and cluster computations