# <a href="https://girafe.ai/" target="_blank" rel="noopener noreferrer"><img src="https://raw.githubusercontent.com/girafe-ai/ml-mipt/7096a5df4cada5ee651be1e3215c2f7fb8a7e0bf/logo_margin.svg" alt="girafe-ai logo" width="150px" align="left"></a> [ml-mipt](https://github.com/girafe-ai/ml-mipt) basic course <a class="tocSkip">

# Seminar 01: Intro to Jupyter and Python tools <a class="tocSkip">

# Intro to Jupyter

Jupyter (pronounced **/ˈdʒuː.pɪ.tər/**) is a play on the names Julia, Python, and R, and is a de facto standard for educational programming.<br>
Jupyter lets you mix notes, explanations, and even images with code.

This notebook provides a brief intro to Jupyter.<br>
If you need a comprehensive manual, check out the [Jupyter Notebook Users Manual](https://jupyter.brynmawr.edu/services/public/dblank/Jupyter%20Notebook%20Users%20Manual.ipynb) by Bryn Mawr College.

## Setup

To run a Jupyter server locally:
```bash
pip install notebook
jupyter notebook
```

Another option is to use a cloud-based server, for example:
* [Google Colab](https://colab.research.google.com/)
* [Binder](https://mybinder.org/)
* [Amazon SageMaker Studio Lab](https://studiolab.sagemaker.aws/)

### Advanced setup

#### [Jupyter Nbextensions Configurator](https://github.com/Jupyter-contrib/jupyter_nbextensions_configurator)

provides various extensions for a local notebook server. Consider activating the following:

* Table of Contents (2)
* Collapsible Headings
* ExecuteTime
* Ruler
* ScrollDown
* Autopep8

#### Jupyter Notebook Viewer

For macOS users there is a [beautiful app](https://github.com/tuxu/nbviewer-app) to preview Jupyter notebooks (viewing only, no editing), which can be installed via:

`brew install --cask jupyter-notebook-viewer`

This is very handy for dealing with many notebooks locally (rendering is faster than in a Jupyter Notebook server).

## Cell types

A notebook consists of cells, which come in different types (the shortcuts below work in command mode):

1. code (press `Y` to change the cell type to code)
2. Markdown (press `M`)
3. raw text (press `R`)

In [None]:
# Cell with code
a = 1

Cell with Markdown text

`Shift` + `Enter` runs the cell (a Markdown cell gets rendered).

In [None]:
a = 1

In [None]:
print(a)

TODO:

* add modes description (editing [green] and navigating [blue])
* shortcuts to create [`A`, `B`], cut (delete) [`X`], undo delete [`Z`] and move cells
* shortcuts to restart [`00`, double 0] and interrupt [`II`] kernel

# Markdown

![Markdown logo](https://upload.wikimedia.org/wikipedia/commons/thumb/4/48/Markdown-mark.svg/208px-Markdown-mark.svg.png)

Markdown is a lightweight markup language that is converted to HTML. It covers the most commonly used formatting operations, such as headers, lists, links, and so on.

Markdown is widely used in Jupyter, GitHub, and myriad other places (take a look at the [Obsidian project](https://obsidian.md/) to create and manage your own knowledge base).

Great cheat sheets (quick self-explanatory references) for Markdown:
1. Best to start: [Markdown-Cheatsheet](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet) by Adam Pritchard
1. Explains advanced features: [Markdown Cheat Sheet](https://www.markdownguide.org/cheat-sheet) by markdownguide

In short, it allows you to:

0. Build lists
1. 
## Make <a class="tocSkip">
### titles <a class="tocSkip">
#### of different <a class="tocSkip">
##### sizes <a class="tocSkip">
3. Format *text* <s>in</s> **different** ways
4. Add [hyperlinks](https://github.com/girafe-ai/ml-mipt)

* Build unordered lists

Use $\LaTeX$:

$
\left\{
\begin{array}{ll}
x = 16 \sin^3 (t) \\
y = 13 \cos (t) - 5 \cos (2t) - 2 \cos (3t) - \cos (4t) \\
t \in [0, 2 \pi]
\end{array}
\right.
$
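The system above is a parametric curve (a heart shape). As a small sketch, assuming only NumPy, it can be evaluated on a grid of `t` values; the names `t`, `x`, `y` simply mirror the formula.

```python
import numpy as np

# Sample the parameter t uniformly over [0, 2*pi]
t = np.linspace(0, 2 * np.pi, 400)

# Evaluate both coordinates element-wise, exactly as in the formula
x = 16 * np.sin(t) ** 3
y = 13 * np.cos(t) - 5 * np.cos(2 * t) - 2 * np.cos(3 * t) - np.cos(4 * t)

# At t = 0 the point is (0, 13 - 5 - 2 - 1) = (0, 5)
print(x[0], y[0])
```

Passing `x` and `y` to a plotting function such as Matplotlib's `plot` would draw the curve.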

And insert images (a URL works if an internet connection is available):
![](https://images.immediate.co.uk/production/volatile/sites/4/2018/08/iStock_13967830_XLARGE-90f249d.jpg?webp=true&quality=90&resize=400%2C200)

# Python

Python is our primary language. If you haven't had much practice or want to refresh key concepts, welcome to [snakify](https://snakify.org/) or [питонтьютор](https://pythontutor.ru/) (the same materials in Russian).

You always work within a community, so please respect its standards and guidelines.

## PEP 8
In our case, __[PEP 8](https://www.python.org/dev/peps/pep-0008/)__ is the standard. (Dare to open it: __it's made for humans!__)

[Google Python Style Guide](https://google.github.io/styleguide/pyguide.html) contains reasonable extensions and motivations.
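A small illustration of a few PEP 8 naming and layout conventions in one place. The class, function, and constant names are made up for this example.

```python
MAX_ITERATIONS = 100  # constants: UPPER_CASE_WITH_UNDERSCORES


class RunningMean:  # classes: CapWords
    """Keeps a running mean of the values passed to `update`."""

    def __init__(self):
        self.count = 0  # attributes and variables: snake_case
        self.total = 0.0

    def update(self, value):  # methods and functions: snake_case
        self.count += 1  # 4 spaces per indentation level
        self.total += value
        return self.total / self.count


def mean_of(values):  # two blank lines separate top-level definitions
    tracker = RunningMean()
    result = 0.0
    for value in values[:MAX_ITERATIONS]:
        result = tracker.update(value)
    return result


print(mean_of([1, 2, 3]))
```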

### Also don't forget about _this_
[Zen of Python, PEP 20](https://www.python.org/dev/peps/pep-0020/)

In [1]:
import this  # noqa: F401

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!


# NumPy

**NumPy** is a great Python library for matrix computations. It makes it easy to work with matrices, arrays, math, etc. It also supports (and encourages) vectorized operations, which are **much** faster than equivalent pure-Python loops.

 - [numpy](http://www.numpy.org)
 - [numpy tutorial](http://cs231n.github.io/python-numpy-tutorial/)
 - [100 numpy exercises](http://www.labri.fr/perso/nrougier/teaching/numpy.100/)

In [None]:
import numpy as np

The central object in NumPy is [numpy.ndarray](http://docs.scipy.org/doc/numpy-1.10.0/reference/generated/numpy.array.html), an N-dimensional array.  
Every `ndarray` has a number of dimensions, also called axes.

In [None]:
vec = np.array([1, 2, 3])
vec.ndim  # number of axes

Pressing `Shift` + `Tab` while the cursor is inside open parentheses lets you peek into the docs. Try pressing `Tab` several times while holding `Shift`.

In [None]:
mat = np.array([[1, 2, 3], [4, 5, 6]])
mat.ndim

To get the shape, use the `.shape` attribute:

In [None]:
vec.shape

To get the element type (`dtype`) and the size of a single element in bytes (`itemsize`):

In [None]:
mat.dtype

In [None]:
mat.itemsize
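These attributes are related: the total memory used by the elements, `nbytes`, equals `size * itemsize`. A quick check; here `dtype=np.int64` is fixed explicitly because the default integer type is platform-dependent (it may be 32-bit on some platforms).

```python
import numpy as np

mat = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int64)

print(mat.shape)     # (2, 3): length along each axis
print(mat.size)      # 6: total number of elements
print(mat.itemsize)  # 8: bytes per int64 element
print(mat.nbytes)    # 48: size * itemsize
```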

## Constructing NumPy arrays

* Wrap list (or other iterable) with `np.array()` constructor

In [None]:
A = np.array([1, 2, 3])
A

In [None]:
A = np.array([1, 2, 3], dtype=np.float64)
A

In [None]:
B = np.array([(1, 2, 3), (4, 5, 6)])
B

* Some widely used arrays can be created with special functions: `zeros`, `ones`, `empty`, `identity`:

In [None]:
np.zeros((3,))

In [None]:
np.ones((3, 4))

In [None]:
np.identity(3)

In [None]:
np.empty((2, 5))

Be careful with the `empty` function: it only allocates the array (like `malloc` in C). Its elements are **not initialized** and hold whatever values happened to be in that memory.
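A safe pattern is either to overwrite an `empty` array completely before reading it, or to request initialized memory directly with `np.zeros` or `np.full`. A small sketch:

```python
import numpy as np

# np.empty only allocates memory: the initial contents are arbitrary,
# so overwrite every element before reading any of them
buf = np.empty((2, 3))
buf[:] = 0.0

# When a known starting value is needed, request it explicitly instead
zeros = np.zeros((2, 3))
sevens = np.full((2, 3), 7.0)

print(np.array_equal(buf, zeros))  # True
```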

* `np.arange` creates a range of evenly spaced values with a given step

In [None]:
np.arange(2, 20, 3)  # As almost everywhere in Python, the left border is included, the right one excluded

In [None]:
np.arange(2.5, 8.7, 0.9)  # Works with real numbers too

In [None]:
np.linspace(2, 18, 14)  # Warning! Right border is INCLUDED by default
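The two functions are parameterized differently: `arange` takes a step, `linspace` takes the number of points. The NumPy documentation suggests `linspace` for non-integer steps, since the length of an `arange` result is computed as `ceil((stop - start) / step)` and is therefore sensitive to floating-point rounding. A short comparison:

```python
import numpy as np

# linspace takes the NUMBER of points; the right end is included by default
with_end = np.linspace(0, 1, 5)                     # 0, 0.25, 0.5, 0.75, 1
without_end = np.linspace(0, 1, 5, endpoint=False)  # 0, 0.2, 0.4, 0.6, 0.8

# arange takes a STEP; its length is ceil((stop - start) / step)
stepped = np.arange(2.5, 8.7, 0.9)
print(len(stepped))  # 7
```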

* What do you think the `reshape` method does?

In [None]:
np.arange(9).reshape(3, 3)

If one of the dimensions is set to `-1`, it is computed automatically from the array size and the other dimensions.

In [None]:
np.arange(8).reshape(2, -1)

In [None]:
C = np.arange(6).reshape(2, -1)
C
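The inferred length must come out as a whole number, and only one dimension may be `-1`. A sketch showing both a successful inference and a shape that NumPy rejects with a `ValueError`:

```python
import numpy as np

flat = np.arange(12)

# -1 means "infer this length so that the total size stays 12"
print(flat.reshape(3, -1).shape)  # (3, 4)
print(flat.reshape(-1, 6).shape)  # (2, 6)

# 12 elements can't be split into 5 equal rows, so this fails
failed = False
try:
    flat.reshape(5, -1)
except ValueError as err:
    failed = True
    print("ValueError:", err)
```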

Transposition is easy:

In [None]:
C.T

* Stacking arrays:

In [None]:
A = np.arange(6).reshape(2, -1)
np.hstack((A, A**2))

In [None]:
np.vstack((A, A**2))

In [None]:
np.concatenate((A, A**2), axis=1)
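For 2-D arrays, `hstack` and `vstack` are shortcuts for `concatenate` along `axis=1` and `axis=0` respectively. A quick check of that equivalence:

```python
import numpy as np

A = np.arange(6).reshape(2, -1)  # shape (2, 3)

# hstack joins along columns (axis=1), vstack along rows (axis=0)
h = np.hstack((A, A**2))
v = np.vstack((A, A**2))

print(h.shape, v.shape)  # (2, 6) (4, 3)
print(np.array_equal(h, np.concatenate((A, A**2), axis=1)))  # True
print(np.array_equal(v, np.concatenate((A, A**2), axis=0)))  # True
```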

* Repeating an existing array

In [None]:
a = np.arange(3)
np.tile(a, (2, 2))

In [None]:
np.tile(a, (4, 1))
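`np.tile` repeats the whole array, which is easy to confuse with `np.repeat`, which repeats each element in place. A side-by-side sketch:

```python
import numpy as np

a = np.arange(3)

print(np.tile(a, 2))          # [0 1 2 0 1 2]: whole array twice
print(np.repeat(a, 2))        # [0 0 1 1 2 2]: each element twice
print(np.tile(a, (2, 2)).shape)  # (2, 6): 2 copies along rows, 2 along columns
```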

## Basic operations

* Basic arithmetic operations are element-wise

In [None]:
A = np.arange(9).reshape(3, 3)
B = np.arange(1, 10).reshape(3, 3)

In [None]:
print(A)
print(B)

In [None]:
A + B

In [None]:
A * 1.0 / B

In [None]:
A + 1

In [None]:
3 * A

In [None]:
A**2

Multiplication via `*` is **elementwise** too, so it is not matrix multiplication!

In [None]:
A * B

The matrix product is available via `.dot`:

In [None]:
A.dot(B)

Or simply via the `@` operator:

In [None]:
A @ B
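The difference is easiest to see on small matrices where the results can be checked by hand. A sketch comparing `*` with `@`, `.dot`, and `np.matmul` (the last three agree for 2-D arrays):

```python
import numpy as np

P = np.array([[1, 2], [3, 4]])
Q = np.array([[5, 6], [7, 8]])

elementwise = P * Q  # [[ 5, 12], [21, 32]]
matrix = P @ Q       # [[1*5 + 2*7, 1*6 + 2*8], [3*5 + 4*7, 3*6 + 4*8]] = [[19, 22], [43, 50]]

print(np.array_equal(matrix, P.dot(Q)), np.array_equal(matrix, np.matmul(P, Q)))  # True True
```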

Arrays in binary operations are expected to have the same shape.<br>
However, if their shapes can be [broadcast](http://www.scipy-lectures.org/intro/numpy/operations.html#broadcasting) to a common shape, you won't get an error.<br>
But **be careful** with this: broadcasting can silently produce a result of an unexpected shape.
![](images/numpy_broadcasting.png)

In [None]:
np.tile(np.arange(0, 40, 10), (3, 1)).T + np.array([0, 1, 2])
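The cell above builds the table with an explicit `np.tile`; broadcasting can do the stretching implicitly. A sketch of the same table, relying on NumPy's rule that shapes are aligned from the trailing dimension and axes of length 1 are stretched:

```python
import numpy as np

col = np.arange(0, 40, 10).reshape(-1, 1)  # shape (4, 1)
row = np.array([0, 1, 2])                  # shape (3,)

# (4, 1) and (3,) are aligned from the right and broadcast to (4, 3)
table = col + row
print(table.shape)  # (4, 3)

# Same values as the tile-based version
tiled = np.tile(np.arange(0, 40, 10), (3, 1)).T + np.array([0, 1, 2])
print(np.array_equal(table, tiled))  # True
```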

* Unary functions (sin, cos, exp etc.) are elementwise as well:

In [None]:
np.exp(A)

* Some operations aggregate the array values: min, max, sum, etc.:

In [None]:
A

In [None]:
A.min()

In [None]:
A.max(axis=0)

In [None]:
A.sum(axis=1)
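A handy way to read `axis`: it is the axis that gets collapsed by the aggregation. A sketch on the same matrix, plus `keepdims=True`, which keeps the collapsed axis with length 1 so the result broadcasts back against the original array:

```python
import numpy as np

A = np.arange(9).reshape(3, 3)  # [[0 1 2], [3 4 5], [6 7 8]]

col_max = A.max(axis=0)  # rows collapsed: one value per column -> [6 7 8]
row_sum = A.sum(axis=1)  # columns collapsed: one value per row -> [3 12 21]

# Subtract each row's mean from that row; shape (3, 1) broadcasts against (3, 3)
centered = A - A.mean(axis=1, keepdims=True)
print(col_max, row_sum, centered.shape)
```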

## Indexing

NumPy supports many [different ways of indexing](http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html). A short recap:

* *Indices* and *slices* are the main approaches. Slicing returns a **view** instead of a copy, which saves memory, but modifying the view also modifies the original array.

In [None]:
a = np.arange(10)
a

In [None]:
a[2:5]

In [None]:
a[3:8:2]

In [None]:
A = np.arange(81).reshape(9, -1)
A

In [None]:
A[2:4]

In [None]:
A[:, 2:4]

In [None]:
A[2:4, 2:4]

In [None]:
A[-1]
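Because slices are views, writing through a slice changes the original array, while indexing with a list produces an independent copy. A sketch using `np.shares_memory` to make this visible:

```python
import numpy as np

a = np.arange(10)

view = a[2:5]   # slicing returns a view into the same memory
view[0] = 100   # ...so writing through it changes `a` as well
print(a[2])     # 100

picked = a[[2, 3, 4]]  # indexing with a list returns a copy
picked[0] = -1         # `a` is not affected
print(a[2])            # still 100

print(np.shares_memory(a, view), np.shares_memory(a, picked))  # True False

safe = a[2:5].copy()  # call .copy() explicitly when independent data is needed
```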

* A list of indices for every axis (so-called *fancy indexing*; unlike slicing, it returns a copy):

In [None]:
A = np.arange(81).reshape(9, -1)
A

In [None]:
A[[2, 4, 5], [0, 1, 3]]
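Note that the index lists are paired element by element, so the cell above picks three single elements rather than a 3x3 block. To get the full sub-matrix for a set of rows and a set of columns, `np.ix_` builds the matching index grid. A sketch:

```python
import numpy as np

A = np.arange(81).reshape(9, -1)  # A[r, c] == 9 * r + c

# Paired indices: A[2, 0], A[4, 1], A[5, 3]
points = A[[2, 4, 5], [0, 1, 3]]
print(points)  # [18 37 48]

# Every combination of rows {2, 4, 5} and columns {0, 1, 3}
sub = A[np.ix_([2, 4, 5], [0, 1, 3])]
print(sub.shape)  # (3, 3)
```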

* Boolean indexing is a really cool feature!

In [None]:
A = np.arange(11)
A

In [None]:
A[A % 5 != 3]

In [None]:
A[np.logical_and(A != 7, A % 5 != 3)]  # Boolean operations are available as well
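Boolean masks can also be combined with the element-wise operators `&` (and), `|` (or), and `~` (not); Python's `and`/`or` do not work on arrays. Each comparison needs its own parentheses, because `&` and `|` bind tighter than `==` and `!=`. A sketch equivalent to the `np.logical_and` cell above:

```python
import numpy as np

A = np.arange(11)

mask = (A != 7) & (A % 5 != 3)
print(np.array_equal(A[mask], A[np.logical_and(A != 7, A % 5 != 3)]))  # True

print(A[~mask])  # the excluded elements: [3 7 8]
```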

## Examples

In [None]:
A = np.arange(120).reshape(10, -1)
A

1. Select all odd rows of matrix A.
2. Build a one-dimensional array of all elements not divisible by 3 from all even rows of A.
3. Compute the sum of the diagonal elements of A.
4. Return every second diagonal element of A in reverse order.

Pressing `Shift` + `Tab` while the cursor is inside open parentheses lets you peek into the docs. Try pressing `Tab` several times while holding `Shift`.

In [None]:
# YOUR CODE HERE

## Compare with pure Python

As we have already said, NumPy is **fast**. Let's take a look:

In [None]:
A_quick_arr = np.random.normal(size=(1000000,))
B_quick_arr = np.random.normal(size=(1000000,))

A_slow_list, B_slow_list = list(A_quick_arr), list(B_quick_arr)

In [None]:
from time import perf_counter  # proper way to measure performance in Python, not `time.time()`

In [None]:
start = perf_counter()
ans = 0

for i in range(len(A_slow_list)):
    ans += A_slow_list[i] * B_slow_list[i]

print(perf_counter() - start)  # run time in seconds

In [None]:
start = perf_counter()
ans = sum([A_slow_list[i] * B_slow_list[i] for i in range(1000000)])
print(perf_counter() - start)

In [None]:
start = perf_counter()
ans = np.sum(A_quick_arr * B_quick_arr)
print(perf_counter() - start)

In [None]:
start = perf_counter()
ans = A_quick_arr.dot(B_quick_arr)
print(perf_counter() - start)
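A single `perf_counter` measurement can be noisy. A sketch using the standard-library `timeit` module, which runs a callable several times and reports the total (in Jupyter, the `%timeit` magic does a similar job). The array size here is smaller than above only to keep it quick, and the seed is arbitrary; timings will vary by machine.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=100_000)
b = rng.normal(size=100_000)
a_list, b_list = a.tolist(), b.tolist()  # plain Python floats


def python_dot():
    return sum(x * y for x, y in zip(a_list, b_list))


# timeit.timeit runs the callable `number` times and returns the total seconds
python_time = timeit.timeit(python_dot, number=5)
numpy_time = timeit.timeit(lambda: a.dot(b), number=5)
print(f"pure Python: {python_time:.4f} s, NumPy: {numpy_time:.4f} s")

# Both compute the same dot product, up to floating-point rounding
print(np.isclose(python_dot(), a.dot(b)))  # True
```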

# Pandas

Pandas (Python Data Analysis Library) originated in quantitative finance as a tool for _simple_ data analysis.

Let's take a look at the famous [data](https://www.kaggle.com/c/titanic/data) from the Titanic [Kaggle competition](https://www.kaggle.com/c/titanic). The task is to predict whether a passenger survived.
* What type of problem is this?
* What are the objects?
* What is the target?
* Which features could we use?

In [None]:
# Download data

!wget -nc https://raw.githubusercontent.com/girafe-ai/ml-mipt/a46ca1f1d2c4aae45a36a367b5003ad775b4de8c/datasets/titanic.csv

In [None]:
import pandas as pd

Pandas uses the `pd.DataFrame` object to load and store tabular data.

In [None]:
pass_data = pd.read_csv("titanic.csv")

The dataset is a table: every row is an object, and every column is a feature (the target is stored as a column too).<br>
Let's look at the first rows of the table with the `.head()` method:

In [None]:
pass_data.head(3)

Column names:

In [None]:
pass_data.columns

Indexing by zero-based positions (plain slices or `.iloc`) or by column names:

In [None]:
pass_data[2:5]

In [None]:
pass_data.iloc[1:5, 1:3]

In [None]:
pass_data["name"].head()

In [None]:
pass_data[["name", "sex", "parch"]].head()
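Selection by index *labels* (rather than positions) goes through `.loc`; the difference matters once the index is not simply `0, 1, 2, ...`. Note that `.loc` slices include the right end. A sketch on a small made-up table (the data is invented for illustration):

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["Ann", "Bob", "Cid", "Dan"], "age": [22, 35, 58, 41]},
    index=[10, 20, 30, 40],
)

print(df.iloc[0:2])       # by position: first two rows, right end excluded
print(df.loc[10:30])      # by label: rows 10, 20 AND 30, right end included
print(df.loc[20, "age"])  # single value by row label and column name: 35
```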

SQL-like filtering queries are supported too:

In [None]:
pass_data[pass_data["sex"] == "female"].head()

In [None]:
# Women aged 60 or older, and all men:
condition = (pass_data["sex"] == "female") & (pass_data["age"] >= 60) | (pass_data["sex"] == "male")
pass_data[condition].head()
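The condition above works because `&` binds tighter than `|`, so it reads as "(female AND age >= 60) OR male". Extra parentheses make that intent explicit, and `DataFrame.query` can express the same filter as a string. A sketch on a few made-up rows that reuse the Titanic column names:

```python
import pandas as pd

df = pd.DataFrame({"sex": ["female", "female", "male", "male"], "age": [65, 30, 20, 70]})

# Explicit grouping: (female AND age >= 60) OR male
condition = ((df["sex"] == "female") & (df["age"] >= 60)) | (df["sex"] == "male")

# The same filter written with DataFrame.query
same = df.query("(sex == 'female' and age >= 60) or sex == 'male'")
print(df[condition].equals(same))  # True
```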

## Example
Let's see how many young women (older than 18 and younger than 25) were travelling without family (no siblings, spouses, parents or children aboard).

In [None]:
complex_condition = (
    (pass_data.sex == "female")
    & (pass_data.age > 18)
    & (pass_data.age < 25)
    & (pass_data.sibsp == 0)
    & (pass_data.parch == 0)
)

pass_data[complex_condition].shape

Histograms are available as well:

In [None]:
pass_data.age.hist(bins=30)

## Dealing with DataFrame objects

* Renaming columns:

In [None]:
pass_data.rename(columns={"sex": "Sex"}, inplace=True)
pass_data.head()

* Applying functions to columns or rows (e.g. for preprocessing):

In [None]:
def get_last_name(name: str):
    return name.split(",")[0].strip()


last_names = pass_data["name"].apply(get_last_name)
last_names.head()
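`apply` calls a Python function once per element. For string processing, pandas also offers the vectorized `.str` accessor, which is usually more concise. A sketch that mirrors `get_last_name`, on two invented names in the same `Last, Title. First` format:

```python
import pandas as pd

names = pd.Series(["Smith, Mr. John", "Brown, Mrs. Anna"])

# Split on the comma, take the first part, strip surrounding spaces
last_names = names.str.split(",").str[0].str.strip()
print(last_names.tolist())  # ['Smith', 'Brown']
```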

* Adding columns:

In [None]:
pass_data["Last_name"] = last_names
pass_data.head()

* Removing columns:

In [None]:
pass_data.drop("Last_name", axis=1, inplace=True)
pass_data.head()

* Dealing with missing values:

The `.isnull()` and `.notnull()` methods return a boolean mask that marks missing (or present) values:

In [None]:
pass_data["boat"].isnull().head()

In [None]:
pass_data[pass_data["boat"].notnull()].head()  # passengers with a known lifeboat number
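Once missing values are found, the usual next steps are to count them, drop the affected rows, or fill them in. A sketch with `dropna` and `fillna` on a tiny invented table (filling `age` with the column mean is just one possible choice):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"age": [22.0, np.nan, 58.0], "boat": ["5", None, np.nan]})

print(df.isnull().sum())          # number of missing values per column
print(df.dropna(subset=["age"]))  # drop rows where age is missing

# Fill each column with its own replacement value
filled = df.fillna({"age": df["age"].mean(), "boat": "unknown"})
print(filled)
```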

* Sorting by one or several columns:

In [None]:
pass_data.sort_values(by=["pclass", "fare"], ascending=True).head()

In [None]:
pass_data.sort_values(by=["pclass", "fare"], ascending=[True, False]).head()

## Aggregating the data

The `.groupby()` method splits the data into groups by some criteria, just like `GROUP BY` in SQL.

Pressing `Shift` + `Tab` while the cursor is inside open parentheses lets you peek into the docs. Try pressing `Tab` several times while holding `Shift`.

In [None]:
pass_data.groupby("Sex")

In [None]:
pass_data.groupby("Sex")["pclass"].value_counts()

In [None]:
pass_data.groupby("pclass")["fare"].describe()

In [None]:
pass_data.groupby("Sex")["age"].mean()  # average age of passengers depending on sex

Women and children were supposedly saved first. Let's check:

In [None]:
pass_data.groupby("Sex")["survived"].mean()

The same for passengers of different classes:

In [None]:
pass_data.groupby("pclass")["survived"].mean()
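Several statistics per group can be computed in one call with `.agg` and named aggregations (`new_column=(source_column, function)`). A sketch on a small invented table that reuses the Titanic column names:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "pclass": [1, 1, 2, 3, 3, 3],
        "survived": [1, 0, 1, 0, 0, 1],
        "fare": [80.0, 60.0, 20.0, 8.0, 7.0, 9.0],
    }
)

# One row per class, one column per named statistic
stats = df.groupby("pclass").agg(
    survival_rate=("survived", "mean"),
    passengers=("survived", "count"),
    median_fare=("fare", "median"),
)
print(stats)
```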

## Saving data

The data can be saved to disk as well:

In [None]:
pass_data.to_csv("another_titanic.csv", index=False)
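`index=False` keeps the row index out of the file; otherwise it would come back as an extra unnamed column when the file is read again. A sketch of a round trip through an in-memory buffer (`io.StringIO`) instead of a real file:

```python
import io

import pandas as pd

df = pd.DataFrame({"name": ["Ann", "Bob"], "age": [22, 35]})

buffer = io.StringIO()
df.to_csv(buffer, index=False)  # don't store the row index as a column
buffer.seek(0)                  # rewind before reading
restored = pd.read_csv(buffer)

print(restored.equals(df))  # True
```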

# [Matplotlib](http://matplotlib.org)

is a great Python library for visualization.

* [Basic Usage Tutorial](https://matplotlib.org/stable/tutorials/introductory/usage.html)
* [2D and 3D plotting](http://nbviewer.jupyter.org/github/jrjohansson/scientific-python-lectures/blob/master/Lecture-4-Matplotlib.ipynb)
* [Visualization in pandas](http://pandas.pydata.org/pandas-docs/stable/visualization.html)

![](https://matplotlib.org/stable/_images/anatomy.png)

In [None]:
import matplotlib.pyplot as plt

In [None]:
x = np.linspace(1, 10, 20)

Take a look at the `axes` object created with `fig.add_axes`: for example, it lets you combine different plots on the same axes.

In [None]:
fig = plt.figure(figsize=(10, 6))

axes = fig.add_axes([0.1, 0.1, 0.8, 0.8])

axes.plot(x, x**2, "r")
axes.plot(x, x**3, "b*--")

axes.set_xlabel("x")
axes.set_ylabel("y")
axes.set_title("title")
axes.legend([r"$x^2$", r"$x^3$"], loc=0)

plt.show()

In [None]:
fig = plt.figure(figsize=(10, 6))

axes = fig.add_axes([0.1, 0.1, 0.8, 0.8])

axes.scatter(x, x**2, color="red", marker="*", s=80)
axes.scatter(x, x**3)

axes.set_xlabel("x")
axes.set_ylabel("y")
axes.set_title("title")

plt.show()

Matplotlib offers extensive options for customizing plots.

In [None]:
fig = plt.figure(figsize=(10, 6))

axes = fig.add_axes([0.1, 0.1, 0.8, 0.8])

axes.plot(
    x,
    x**2,
    "r^-",
    label="$y = x^2$",
    markersize=8,
    markerfacecolor="yellow",
    markeredgewidth=1,
    markeredgecolor="green",
)
axes.plot(x, x**3, "b*--", label="$y = x^3$", alpha=0.5)

axes.set_xlabel("x")
axes.set_ylabel("y")
axes.set_title("title")
axes.legend(loc=0, fontsize=18)

plt.show()

## Subaxes

It also lets you set the position and size of each axes as `[left, bottom, width, height]` in fractions of the figure size:

In [None]:
fig = plt.figure()

axes1 = fig.add_axes([0.1, 0.1, 0.8, 0.8])  # main axes
axes2 = fig.add_axes([0.2, 0.5, 0.4, 0.3])  # inset axes

# main figure
axes1.plot(x, x**2, "r")
axes1.set_xlabel("x")
axes1.set_ylabel("y")
axes1.set_title("title")

# inset
axes2.plot(x**2, x, "g")
axes2.set_xlabel("y")
axes2.set_ylabel("x")
axes2.set_title("inset title")

plt.show()

## Subplots

There is also a more conventional approach.

**This is the recommended way to create multiple plots on a single figure:**

In [None]:
fig, axes = plt.subplots(nrows=1, ncols=3, figsize=(16, 5))

for pow_num, ax in enumerate(axes):
    ax.plot(x, x ** (pow_num + 1), "r")
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    ax.set_title(f"$y = x^{pow_num + 1}$", fontsize=18)
fig.tight_layout()  # adjust spacing so that subplots don't overlap

The figure can be saved to a file as well:

In [None]:
fig.savefig("pows.png", dpi=200)

## 3d plots

are also available in Matplotlib, e.g. a surface plot:

In [None]:
alpha = 0.7
phi_ext = 2 * np.pi * 0.5


def flux_qubit_potential(phi_m, phi_p):
    return 2 + alpha - 2 * np.cos(phi_p) * np.cos(phi_m) - alpha * np.cos(phi_ext - 2 * phi_p)


phi_m = np.linspace(0, 2 * np.pi, 100)
phi_p = np.linspace(0, 2 * np.pi, 100)
X, Y = np.meshgrid(phi_p, phi_m)
Z = flux_qubit_potential(X, Y).T

In [None]:
fig = plt.figure(figsize=(14, 6))

ax = fig.add_subplot(111, projection="3d")

p = ax.plot_surface(X, Y, Z, rstride=4, cstride=4, linewidth=0, cmap="jet")

More advanced 3D plots are available in [plotly](https://plotly.com/python/getting-started/).

## Histograms
are available too (the pandas histogram actually calls Matplotlib's function under the hood). Matplotlib gives you more freedom in customizing histograms, though.

In [None]:
fig = plt.figure()
axes = fig.add_axes([0.0, 0.0, 1.0, 1.0])
bins = 20
axes.hist(pass_data[pass_data["Sex"] == "male"]["age"].dropna(), bins=bins, alpha=0.6, label="male")
axes.hist(
    pass_data[pass_data["Sex"] == "female"]["age"].dropna(), bins=bins, alpha=0.6, label="female"
)

axes.legend()
axes.set_xlabel("Age", fontsize=18)
axes.set_ylabel("Count", fontsize=18)
axes.set_title("Age by gender", fontsize=18)

plt.show()
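A histogram is just counts per bin; when you need the numbers rather than a picture, `np.histogram` returns the counts and the bin edges (one more edge than bins) without drawing anything. A sketch on a few invented ages:

```python
import numpy as np

ages = np.array([2.0, 15.0, 22.0, 23.0, 35.0, 41.0, 58.0, 70.0])

# 4 equal-width bins between the minimum (2) and the maximum (70)
counts, edges = np.histogram(ages, bins=4)
print(counts)  # [2 3 1 2]
print(edges)   # bin edges 2, 19, 36, 53, 70; the last bin includes its right edge
```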

Matplotlib is huge, and many more features are available.<br>
For example, see the [gallery](http://matplotlib.org/gallery.html) and [this lecture](http://nbviewer.jupyter.org/github/jrjohansson/scientific-python-lectures/blob/master/Lecture-4-Matplotlib.ipynb) for great examples.

# Bonus track

If you are familiar with everything above this line or it was too easy and you got ahead of the whole class, here is the bonus task:

![The Game of Life](https://upload.wikimedia.org/wikipedia/commons/e/e5/Gospers_glider_gun.gif)

## The game of life
Let's implement [The Game of Life](https://en.wikipedia.org/wiki/Conway%27s_Game_of_Life) using numpy matrix operations.

If you don't want to read the Wikipedia page, here are the rules in short:

* There is a 2D grid of cells. Every cell is either *alive* (1) or *dead* (0).
* A cell's neighbours are the 8 cells around it (horizontally, vertically and diagonally adjacent).
* If a living cell has 2 or 3 living neighbours, it survives. Otherwise it dies (0, 1, 4 or more neighbours).
* If a dead cell has exactly 3 living neighbours, it becomes alive.

In [None]:
%matplotlib notebook

In [None]:
def np_life_tick(curr_state: np.ndarray) -> np.ndarray:
    # YOUR CODE HERE
    pass

Here is the visualization code, provided for you:

In [None]:
plt.ion()

# Start life
life_state = np.random.choice([0, 1], size=(100, 100))

fig = plt.figure()
ax = fig.add_subplot(111)
fig.show()

for _ in range(100):
    # make a tick
    life_state = np_life_tick(life_state)

    # display the tick
    ax.clear()
    ax.imshow(life_state, cmap="gray")
    fig.canvas.draw()

And some beautiful initializations if you succeeded:

In [None]:
life_state = np.arange(100) % 2 + np.zeros([100, 100])

life_state[47:51, 49:51] = 1

fig = plt.figure()
ax = fig.add_subplot(111)
fig.show()

for _ in range(100):
    life_state = np_life_tick(life_state)
    ax.clear()
    ax.imshow(life_state, cmap="gray")
    fig.canvas.draw()

# Credits <a class="tocSkip">

Authors: [Radoslav Neychev](https://github.com/neychev), [Vladislav Goncharenko](https://github.com/v-goncharenko)

Based on [Evgeny Sokolov](https://github.com/esokolov) and [YSDA](https://github.com/yandexdataschool) open materials.