## NumPy + pandas introduction

This notebook is a short introduction to the NumPy and pandas libraries, intended as a quick reference for their most common operations.

First, we import the libraries under their standard aliases, `np` and `pd`.

```python
import numpy as np
import pandas as pd
```

### Task 1 (1 point)
Create two 1D NumPy arrays of 100 random integers each, drawn from the range [0, 100). Then build a new array from the elements that appear in both arrays. Next, replace every element less than 50 with 0 and every other element (50 or greater) with 1. Finally, calculate the mean of the resulting array.

```python
a = np.random.randint(0, 100, 100)
b = np.random.randint(0, 100, 100)
# Sorted unique values that occur in both arrays
c = np.intersect1d(a, b)
# 0 for values below 50, 1 for values of 50 and above
c = np.where(c < 50, 0, 1)
print(np.mean(c))
```

```
0.4418604651162791
```

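`np.intersect1d` returns the sorted unique values shared by both arrays, so a value that repeats in `a` or `b` is counted only once in the mean. If "elements that appear in both arrays" is meant to keep repeats, `a[np.isin(a, b)]` selects every element of `a` that also occurs in `b`. The sketch below contrasts the two on small hand-picked arrays (illustrative values, not part of the task) and also shows `(c >= 50).astype(int)` as an equivalent way to write the 0/1 replacement.

```python
import numpy as np

# Small hand-picked inputs (illustrative only)
a = np.array([5, 60, 5, 70, 20])
b = np.array([5, 70, 70, 99])

# Sorted unique values found in both arrays: [5, 70]
common_unique = np.intersect1d(a, b)
# Every element of `a` that also occurs in `b`, repeats kept: [5, 5, 70]
common_with_repeats = a[np.isin(a, b)]

# Two equivalent ways to map values < 50 to 0 and values >= 50 to 1
binary_where = np.where(common_unique < 50, 0, 1)
binary_cast = (common_unique >= 50).astype(int)

print(common_unique, common_with_repeats)
print(binary_where.mean(), binary_cast.mean())
```

For results that are reproducible across runs, a seeded generator such as `np.random.default_rng(0).integers(0, 100, 100)` could stand in for `np.random.randint`; the seed value is arbitrary.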

### Task 2 (1 point)
Construct the following block matrix (without explicitly writing it!):
```
[[10, 10, 10, 10, 10, 1,  0,  0,  0,  0],
 [10, 10, 10, 10, 10, 0,  1,  0,  0,  0],
 [10, 10, 10, 10, 10, 0,  0,  1,  0,  0],
 [10, 10, 10, 10, 10, 0,  0,  0,  1,  0],
 [10, 10, 10, 10, 10, 0,  0,  0,  0,  1],
 [ 1,  0,  0,  0,  0, 10, 10, 10, 10, 10],
 [ 0,  1,  0,  0,  0, 10, 10, 10, 10, 10],
 [ 0,  0,  1,  0,  0, 10, 10, 10, 10, 10],
 [ 0,  0,  0,  1,  0, 10, 10, 10, 10, 10],
 [ 0,  0,  0,  0,  1, 10, 10, 10, 10, 10]]
```
Then calculate its determinant.

```python
a = 10 * np.ones((5, 5))  # 5x5 block filled with 10
b = np.eye(5)             # 5x5 identity block
# kron(eye(2), a) places `a` on the two diagonal blocks;
# kron([[0, 1], [1, 0]], b) places `b` on the two off-diagonal blocks
c = np.kron(np.eye(2), a) + np.kron(np.eye(2)[::-1], b)
print(c)
print(np.linalg.det(c))
```

```
[[10. 10. 10. 10. 10.  1.  0.  0.  0.  0.]
 [10. 10. 10. 10. 10.  0.  1.  0.  0.  0.]
 [10. 10. 10. 10. 10.  0.  0.  1.  0.  0.]
 [10. 10. 10. 10. 10.  0.  0.  0.  1.  0.]
 [10. 10. 10. 10. 10.  0.  0.  0.  0.  1.]
 [ 1.  0.  0.  0.  0. 10. 10. 10. 10. 10.]
 [ 0.  1.  0.  0.  0. 10. 10. 10. 10. 10.]
 [ 0.  0.  1.  0.  0. 10. 10. 10. 10. 10.]
 [ 0.  0.  0.  1.  0. 10. 10. 10. 10. 10.]
 [ 0.  0.  0.  0.  1. 10. 10. 10. 10. 10.]]
2499.000000000004
```

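The same matrix can also be assembled with `np.block`, which lays out the 5 x 5 blocks explicitly and may read more directly than the Kronecker-product trick. The determinant can be checked by hand as well: writing A = 10 * ones(5, 5) and I for the 5 x 5 identity, the matrix is `[[A, I], [I, A]]`, and because I commutes with A its determinant equals `det(A @ A - I)`. A has eigenvalues 50 (once) and 0 (four times), so `A @ A - I` has eigenvalues 2499 and -1 (four times), giving 2499 * (-1)^4 = 2499, which matches the printed value up to floating-point rounding. A small sketch that checks both points numerically (the names `A`, `I`, `M_block`, `M_kron` are our own):

```python
import numpy as np

A = 10 * np.ones((5, 5))  # diagonal blocks
I = np.eye(5)             # off-diagonal blocks

# Explicit block layout instead of Kronecker products
M_block = np.block([[A, I], [I, A]])
M_kron = np.kron(np.eye(2), A) + np.kron(np.eye(2)[::-1], I)

# Both constructions should produce the same 10 x 10 matrix
same = np.array_equal(M_block, M_kron)

# Because I commutes with A, det([[A, I], [I, A]]) = det(A @ A - I)
det_direct = np.linalg.det(M_block)
det_reduced = np.linalg.det(A @ A - I)

print(same, det_direct, det_reduced)
```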

### Task 3 (1 point)
Replace all NaN values with the mean of the non-NaN elements. Then normalize the array by subtracting its mean and dividing by its standard deviation.

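```python
a = np.array([1, 2, 3, np.nan, 5, 6, 7, np.nan, 8, 9, 10, np.nan])
# Fill NaNs with the mean of the remaining (non-NaN) values
a[np.isnan(a)] = np.mean(a[~np.isnan(a)])
# Standardize: zero mean, unit (population) standard deviation
a = (a - np.mean(a)) / np.std(a)
print(a)
```

```
[-1.80739223 -1.42009389 -1.03279556  0.         -0.25819889  0.12909944
  0.51639778  0.          0.90369611  1.29099445  1.67829278  0.        ]
```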

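NumPy's NaN-aware reductions can shorten the fill step: `np.nanmean` skips NaNs, so `np.where(np.isnan(x), np.nanmean(x), x)` fills them without boolean-mask assignment and without modifying the input. Note that `np.std` defaults to the population standard deviation (`ddof=0`); pass `ddof=1` if the sample estimate is wanted. A sketch wrapped in a small helper, where the name `standardize_with_fill` is hypothetical (our own, not a NumPy function):

```python
import numpy as np

def standardize_with_fill(x: np.ndarray) -> np.ndarray:
    """Fill NaNs with the mean of the non-NaN values, then z-score the result.

    Hypothetical helper for illustration; uses the population std (ddof=0).
    """
    filled = np.where(np.isnan(x), np.nanmean(x), x)  # returns a new array
    return (filled - filled.mean()) / filled.std()

a = np.array([1, 2, 3, np.nan, 5, 6, 7, np.nan, 8, 9, 10, np.nan])
z = standardize_with_fill(a)
print(z.mean(), z.std())
```

Because the NaNs are replaced by the mean, those positions become exactly the centre of the distribution and map to (approximately) 0 after standardization.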

### Task 4 (2 points)
Generate a 1000 x 1000 NumPy array of random numbers. Then set some randomly chosen elements to NaN and replace all NaN values with the mean of the remaining elements. Finally, save the array to a file and, at the same time, compute a label for each element: the label is 1 if the element is greater than 70% of all numbers in the array, and 0 otherwise. The labels should be saved in a separate file.

```python
a = np.random.uniform(0, 100, size=(1000, 1000))
# Set 1000 randomly chosen positions to NaN (a position may be drawn twice)
rows = np.random.randint(0, 1000, size=1000)
cols = np.random.randint(0, 1000, size=1000)
a[rows, cols] = np.nan
# Replace every NaN with the mean of the non-NaN values
a[np.isnan(a)] = np.nanmean(a)
# The 70th percentile over all elements is the threshold:
# an element above it is greater than 70% of the numbers in the array
border = np.percentile(a, 70)
labels = (a > border).astype(int)
# Save the data and the labels to separate .npy files
np.save('data.npy', a)
np.save('labels.npy', labels)
print(np.load('data.npy'))
print(np.load('labels.npy'))
```

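A quick way to sanity-check the labelling and the saved files is sketched below on a smaller seeded array so it runs fast. The 100 x 100 size, the seed, the number of NaNs and the temporary directory are illustrative assumptions, and `np.quantile(a, 0.7)` is used as an equivalent of `np.percentile(a, 70)`. Since the threshold is the 70th percentile of all elements, about 30% of the labels should be 1, and the arrays loaded back from disk should equal the ones that were saved.

```python
import os
import tempfile

import numpy as np

rng = np.random.default_rng(0)            # seeded for reproducibility (assumption)
a = rng.uniform(0, 100, size=(100, 100))  # smaller than the task's 1000 x 1000
rows = rng.integers(0, 100, size=50)
cols = rng.integers(0, 100, size=50)
a[rows, cols] = np.nan
a[np.isnan(a)] = np.nanmean(a)            # fill NaNs with the non-NaN mean

border = np.quantile(a, 0.7)              # same threshold as np.percentile(a, 70)
labels = (a > border).astype(int)

# Round-trip both arrays through .npy files in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    data_path = os.path.join(tmp, "data.npy")
    labels_path = os.path.join(tmp, "labels.npy")
    np.save(data_path, a)
    np.save(labels_path, labels)
    data_ok = np.array_equal(np.load(data_path), a)
    labels_ok = np.array_equal(np.load(labels_path), labels)

share_of_ones = labels.mean()
print(data_ok, labels_ok, share_of_ones)
```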