In [None]:
%matplotlib inline

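import collections.abc
import sklearn.datasets
import numpy as np
import matplotlib.pyplot as plt

def show_image(img):
    """Display an image without axis ticks; 2-D input is drawn in grayscale."""
    plt.figure(figsize=(8, 8))
    # A scalar pixel means a single-channel (grayscale) image.
    if not isinstance(img[0][0], collections.abc.Iterable):
        plt.imshow(img, cmap='gray')
    else:
        plt.imshow(img)
    plt.xticks([])
    plt.yticks([])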

In [None]:
# You can reload data here
olivetti = sklearn.datasets.fetch_olivetti_faces()
iris = sklearn.datasets.load_iris()
# Note: load_boston was removed in scikit-learn 1.2; this line needs an older release
boston = sklearn.datasets.load_boston()
digits = sklearn.datasets.load_digits()
locals().update(np.load('data/toy_data.npz'))
starry_bw_lst = list(map(list, starry_bw))

### Array shape manipulation

* `np.reshape`, `np.ndarray.flatten`
* `np.newaxis`
* `np.transpose`
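
As a quick illustration of the functions listed above, here is a minimal sketch on a small throwaway array. The name `v` and its values are made up for this example and deliberately differ from the exercises below.

```python
import numpy as np

v = np.arange(6)                 # [0 1 2 3 4 5], shape (6,)

m = v.reshape(2, 3)              # same data arranged as 2 rows x 3 columns
flat = m.flatten()               # 1-D copy of m, shape (6,)
col = v[:, np.newaxis]           # add a trailing axis: shape (6, 1)
row = v[np.newaxis, :]           # add a leading axis: shape (1, 6)
mt = np.transpose(m)             # swap the two axes: shape (3, 2), same as m.T

print(m.shape, flat.shape, col.shape, row.shape, mt.shape)
```

Note that `reshape` fills rows first, so transposing afterwards is what turns consecutive values into columns.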

**Exercise:** Form the 2-D array (without typing it in explicitly):
```python
[[1,  6, 11],
 [2,  7, 12],
 [3,  8, 13],
 [4,  9, 14],
 [5, 10, 15]]
```

**Exercise:** Transform the Olivetti data set into a set of half faces

In [None]:
show_image(olivetti.images[0, :, :32]) # element 0
show_image(olivetti.images[0, :, 32:]) # element 1
# ... (shape=(800, 64, 32))

### Broadcasting

> When operating on two arrays, NumPy compares their shapes element-wise. It starts with the trailing dimensions, and works its way forward. Two dimensions are compatible when  
> * they are equal, or  
> * one of them is 1  

[More info](https://scipy.github.io/old-wiki/pages/EricsBroadcastingDoc)
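
The rule quoted above can be checked directly on shapes. The following is a small sketch with made-up arrays (`x`, `y`, `z` are illustrative names only), showing one compatible pair, one incompatible pair, and how `np.newaxis` repairs the latter.

```python
import numpy as np

x = np.ones((4, 3))
y = np.arange(3)                 # shape (3,)
z = np.arange(4)                 # shape (4,)

# Trailing dimensions 3 and 3 are equal, so y is stretched across the 4 rows.
ok_shape = (x + y).shape

# Trailing dimensions 3 and 4 differ and neither is 1, so this fails.
try:
    x + z
    failed = False
except ValueError:
    failed = True

# A new axis gives z the shape (4, 1); the 1 is stretched to 3.
fixed_shape = (x + z[:, np.newaxis]).shape

print(ok_shape, failed, fixed_shape)
```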

**Exercise:** Divide each column of the array

```python
import numpy as np
a = np.arange(25).reshape(5, 5)
```

elementwise by the array `b = np.array([1., 5, 10, 15, 20])`. (Hint: `np.newaxis`.)

**Exercise:** Normalize every image of the Olivetti faces dataset, then show some of them after normalization

**Exercise:** Make Starry Night $r$ times redder than it is, then normalize the colors (at each pixel)

In [None]:
r = 1.25
img = starry_night.copy()
    
# Your code here

show_image(img)

**Exercise:** Write a script that computes the Mandelbrot fractal. The Mandelbrot iteration:

![](https://www.scipy-lectures.org/_images/sphx_glr_plot_mandelbrot_001.png)

```python
N_max = 50
some_threshold = 50

c = x + 1j*y

z = 0
for j in range(N_max):
    z = z**2 + c
```

A point $(x, y)$ belongs to the Mandelbrot set if $|z| <$ `some_threshold` once the iteration has finished.
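
One possible way to build `c` for a whole grid of points at once is sketched below. The grid size `n` and the coordinate ranges are assumptions chosen for illustration, not part of the exercise statement; the iteration itself and the final mask are left for the exercise.

```python
import numpy as np

n = 5                                   # grid resolution (assumed, kept small)
x = np.linspace(-2.0, 1.0, n)           # real parts (assumed range)
y = np.linspace(-1.5, 1.5, n)           # imaginary parts (assumed range)

# Broadcasting (n, 1) against (1, n) yields an (n, n) grid of complex points,
# where c[i, j] == x[i] + 1j * y[j].
c = x[:, np.newaxis] + 1j * y[np.newaxis, :]

print(c.shape, c.dtype)
```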

### Sorting

* `np.sort`, `np.argsort`
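
To show how the two functions differ, here is a minimal sketch on made-up data (`scores` and `names` are hypothetical arrays, not taken from any dataset in this notebook). `np.sort` returns the sorted values, while `np.argsort` returns the indices that would sort them, which is what lets you reorder a second, aligned array.

```python
import numpy as np

scores = np.array([0.3, 0.9, 0.1, 0.5])   # made-up values
names = np.array(['a', 'b', 'c', 'd'])    # made-up labels, aligned with scores

sorted_scores = np.sort(scores)           # ascending copy of the values
order = np.argsort(scores)                # indices that sort scores ascending
names_by_score = names[order]             # reorder the aligned array
top2 = scores[order[::-1][:2]]            # two largest values, descending

print(sorted_scores, order, names_by_score, top2)
```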

**Exercise:** Print the 10 highest tax values in the Boston housing dataset

**Exercise:** Sort the Boston housing dataset by crime rate

**Exercise:** Compute the top-3 accuracy of the following classifier on the test set

In [None]:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

train_x, test_x, train_y, test_y = train_test_split(digits.data, digits.target, test_size=1700)

clf = LogisticRegression(solver='lbfgs', multi_class='ovr', max_iter=1000)
clf.fit(train_x, train_y)

predictions = clf.predict_proba(test_x)