## Numpy introduction

This notebook is a quick introduction to the numpy and pandas libraries, intended as a quick reference for the most common operations. The tasks in this section use only numpy.

The first thing we need to do is import the library. We use its standard alias, `np` (pandas, when needed, is conventionally imported as `pd`).

In [1]:
```python
import numpy as np
```

### Task 0 (0.5 point)
Define the `maxx` function that takes two numpy arrays and returns the one with the larger sum of elements. If the sums are equal, the function should return the first array.


In [2]:
def maxx(arr1, arr2):
    if np.sum(arr1) >= np.sum(arr2):
        return arr1
    return arr2

In [3]:
arr1 = np.array([6, 7, 8, 9, 10, 11])
arr2 = np.array([22, 23])
maxx(arr1, arr2)

array([ 6,  7,  8,  9, 10, 11])
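
The tie-breaking rule can be checked with a small sketch. The arrays `first` and `second` are made-up examples with equal sums, and `maxx` is repeated so the snippet runs on its own.

```python
import numpy as np

def maxx(arr1, arr2):
    # >= keeps the first array when the sums are equal
    if np.sum(arr1) >= np.sum(arr2):
        return arr1
    return arr2

first = np.array([1, 2, 3])   # sum 6
second = np.array([6])        # sum 6 (tie)
tie_result = maxx(first, second)
print(tie_result is first)  # True: the first array wins the tie
```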

### Task 1 (0.5 point)
Define the `meanest_of_them_all` function that takes a list of numpy arrays and returns the one with the largest mean of elements. If several arrays share the largest mean, the function should return the first of them. The function should work for arrays of arbitrary shape.

In [4]:
def meanest_of_them_all(arrays):
    idx = np.argmax(np.array([np.mean(arr) for arr in arrays]))
    return arrays[idx]

In [5]:
arr1 = np.array([1, 2, 3, 4, 5])
arr2 = np.array([6, 7, 8, 9, 10])
arr3 = np.array([22, 23])
meanest_of_them_all([arr1, arr2, arr3])

array([22, 23])
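
As a hedged illustration of the "arbitrary shapes" and tie requirements, the sketch below feeds the function a 2D array, a 3D array, and a 1D array. The inputs `matrix`, `cube`, and `vector` are invented for this example. It relies on `np.mean` averaging over all elements by default and on `np.argmax` returning the first index of the maximum.

```python
import numpy as np

def meanest_of_them_all(arrays):
    # np.mean averages over all elements by default, so any shape works;
    # np.argmax returns the first index of the maximum, which resolves ties
    means = [np.mean(arr) for arr in arrays]
    return arrays[int(np.argmax(means))]

matrix = np.array([[1, 2], [3, 4]])   # mean 2.5
cube = np.full((2, 2, 2), 2.5)        # mean 2.5 (tie with matrix)
vector = np.array([0, 1])             # mean 0.5
winner = meanest_of_them_all([matrix, cube, vector])
print(winner is matrix)  # True: the first of the tied arrays is returned
```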

### Task 2 (1 point)
Create two 1D numpy arrays of 100 random integers each, drawn from the range [0, 100). Then create a new array from the elements that appear in both arrays. Replace all elements that are less than 50 with 0 and all elements that are greater than 50 with 1. Finally, calculate the mean of the resulting array. The task does not say what to do with elements equal to 50; the solution here maps them to 1.

In [6]:
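# No seed is set, so the printed mean changes from run to run
a = np.random.randint(0, 100, 100)  # 100 integers in [0, 100)
b = np.random.randint(0, 100, 100)
c = np.intersect1d(a, b)            # sorted unique values found in both arrays
c = np.where(c < 50, 0, 1)          # values equal to 50 also become 1
print(np.mean(c))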

0.35714285714285715
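
For a reproducible run, one option is a seeded generator. This is a minimal sketch that swaps in `np.random.default_rng` for the legacy `np.random.randint` call. The seed value and the names `common`, `binary`, and `fraction` are arbitrary choices made for this example.

```python
import numpy as np

rng = np.random.default_rng(seed=0)    # the seed value is an arbitrary choice
a = rng.integers(0, 100, size=100)     # integers in [0, 100), upper bound excluded
b = rng.integers(0, 100, size=100)
common = np.intersect1d(a, b)          # sorted unique values present in both arrays
binary = np.where(common < 50, 0, 1)   # note: 50 itself maps to 1
fraction = binary.mean()               # share of common values that are >= 50
print(fraction)
```

Because `np.intersect1d` returns unique values, the mean is the share of distinct common values that are at least 50, not a count weighted by how often each value occurs.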


### Task 3 (1 point)
Construct the following block matrix (without explicitly writing it!):
```
[[10, 10, 10, 10, 10, 1,  0,  0,  0,  0],
 [10, 10, 10, 10, 10, 0,  1,  0,  0,  0],
 [10, 10, 10, 10, 10, 0,  0,  1,  0,  0],
 [10, 10, 10, 10, 10, 0,  0,  0,  1,  0],
 [10, 10, 10, 10, 10, 0,  0,  0,  0,  1],
 [ 1,  0,  0,  0,  0, 10, 10, 10, 10, 10],
 [ 0,  1,  0,  0,  0, 10, 10, 10, 10, 10],
 [ 0,  0,  1,  0,  0, 10, 10, 10, 10, 10],
 [ 0,  0,  0,  1,  0, 10, 10, 10, 10, 10],
 [ 0,  0,  0,  0,  1, 10, 10, 10, 10, 10]]
```
Then calculate its determinant.

In [7]:
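a = 10*np.ones((5, 5))  # 5x5 block of tens
b = np.eye(5)           # 5x5 identity
# Place a on the block diagonal and b on the block anti-diagonal
c = np.kron(np.eye(2), a) + np.kron(np.eye(2)[::-1], b)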
print(c)
print(np.linalg.det(c))

[[10. 10. 10. 10. 10.  1.  0.  0.  0.  0.]
 [10. 10. 10. 10. 10.  0.  1.  0.  0.  0.]
 [10. 10. 10. 10. 10.  0.  0.  1.  0.  0.]
 [10. 10. 10. 10. 10.  0.  0.  0.  1.  0.]
 [10. 10. 10. 10. 10.  0.  0.  0.  0.  1.]
 [ 1.  0.  0.  0.  0. 10. 10. 10. 10. 10.]
 [ 0.  1.  0.  0.  0. 10. 10. 10. 10. 10.]
 [ 0.  0.  1.  0.  0. 10. 10. 10. 10. 10.]
 [ 0.  0.  0.  1.  0. 10. 10. 10. 10. 10.]
 [ 0.  0.  0.  0.  1. 10. 10. 10. 10. 10.]]
2499.000000000004
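
An alternative construction, sketched here with `np.block`, keeps integer entries and makes the block layout explicit. The names `A`, `I`, and `M` are chosen for this example. The determinant can also be cross-checked: because the blocks `A` and `I` commute, det(M) = det(A·A − I), and since `A` has eigenvalues 50 (once) and 0 (four times), that product is 2499 · (−1)^4 = 2499.

```python
import numpy as np

A = np.full((5, 5), 10)      # 5x5 block of tens (integer dtype)
I = np.eye(5, dtype=int)     # 5x5 identity
M = np.block([[A, I],
              [I, A]])       # 10x10 integer block matrix

# A and I commute, so det(M) = det(A @ A - I)
det_direct = np.linalg.det(M)
det_blocks = np.linalg.det(A @ A - I)
print(det_direct, det_blocks)
```

Both values should agree with the result printed by the notebook cell up to floating-point rounding.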


### Task 4 (1 point)
Replace all NaN values with the mean of the remaining (non-NaN) elements of the array. Then normalize the array by subtracting the mean and dividing by the standard deviation.

In [8]:
a = np.array([1,2,3,np.nan,5,6,7,np.nan,8,9,10,np.nan])
a[np.isnan(a)] = np.mean(a[~np.isnan(a)])
a = (a - np.mean(a)) / np.std(a)
print(a)

[-1.80739223 -1.42009389 -1.03279556  0.         -0.25819889  0.12909944
  0.51639778  0.          0.90369611  1.29099445  1.67829278  0.        ]
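
The same steps can be written with `np.nanmean`, which ignores NaN values when averaging. The sketch below uses the names `filled` and `normalized`, introduced here, and leaves the input array untouched. It also checks that the normalized array ends up with mean 0 and standard deviation 1.

```python
import numpy as np

a = np.array([1, 2, 3, np.nan, 5, 6, 7, np.nan, 8, 9, 10, np.nan])
# Replace NaN entries with the mean of the non-NaN values
filled = np.where(np.isnan(a), np.nanmean(a), a)
# np.std uses the population formula (ddof=0) by default
normalized = (filled - filled.mean()) / filled.std()
print(normalized.mean(), normalized.std())
```

The former NaN positions sit exactly at the mean after filling, which is why they show up as 0 in the normalized output.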


### Task 5 (1 point)

Treating separate rows as independent vectors, calculate the projection of vector v onto vector w, that is (v·w / w·w) w, for each pair of corresponding rows of matrices Vs and Ws.

In [9]:
Vs = np.random.uniform(size=(100, 20))
Ws = np.random.uniform(size=(100, 20))

In [10]:
def projection(v, w):
    return np.dot(v, w) / np.dot(w, w) * w

In [11]:
res = np.array([projection(v, w) for v, w in zip(Vs, Ws)])
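
The list comprehension can be replaced by a vectorized computation. The following is one possible sketch using `np.einsum` for the row-wise dot products. The seeded generator and the names `vw`, `ww`, `res_vectorized`, and `res_loop` are assumptions made for this example, and the loop version is kept only as a reference for comparison.

```python
import numpy as np

rng = np.random.default_rng(0)   # seeded so the example is reproducible
Vs = rng.uniform(size=(100, 20))
Ws = rng.uniform(size=(100, 20))

# Row-wise dot products: v.w and w.w for every pair of rows
vw = np.einsum('ij,ij->i', Vs, Ws)
ww = np.einsum('ij,ij->i', Ws, Ws)
# Scale each row of Ws by its coefficient; [:, None] broadcasts over columns
res_vectorized = (vw / ww)[:, None] * Ws

# Reference: loop-based projection, one pair of rows at a time
res_loop = np.array([np.dot(v, w) / np.dot(w, w) * w for v, w in zip(Vs, Ws)])
print(np.allclose(res_vectorized, res_loop))  # True
```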

### Task 6 (2 points)
Generate a 1000 x 1000 random numpy array. Then fill some random elements with NaNs and replace all NaN values with the mean of the remaining elements. Finally, save the array to a file and, at the same time, save a label for each element. An element's label is 1 if the element is greater than 70% of all numbers in the array, and 0 otherwise. The labels should be saved in a separate file.

In [12]:
a = np.random.uniform(0, 100, size=(1000, 1000))

# Fill up to 1000 random positions with nans (sampled index pairs may repeat)
a[np.random.randint(0, 1000, size=1000), np.random.randint(0, 1000, size=1000)] = np.nan

# If value is nan replace it with mean of the array
a[np.isnan(a)] = np.mean(a[~np.isnan(a)])

# Sort the array and determine the threshold value
values_sorted = np.sort(a.flatten())
threshold = values_sorted[int(len(values_sorted) * 0.7)]

# Label values at or above the threshold as 1 and the rest as 0
labels = np.where(a < threshold, 0, 1)

# Save the data and labels
np.save('labels.npy', labels)
np.save('data.npy', a)
print(np.load('data.npy'))
print(np.load('labels.npy'))

[[7.40214100e+01 5.88891311e+01 2.36034417e+01 ... 6.61243835e+01
  5.05846904e+01 6.24111660e+01]
 [3.12088085e+01 7.40067052e+01 9.26969330e+01 ... 8.55937632e+01
  8.32142759e+01 2.58733264e+01]
 [6.17190077e+01 7.88541330e+01 9.28315696e+01 ... 7.52583687e+01
  1.98473064e+01 5.62319867e+01]
 ...
 [5.73772481e+00 9.37632423e+01 6.07264053e+01 ... 3.41938171e+01
  5.35388439e+01 4.57939955e+01]
 [1.71136953e+00 4.41008057e+01 7.14846887e+01 ... 9.10550348e+01
  1.35255139e+00 3.89461426e+01]
 [4.53032979e-02 5.38392776e+00 8.20562565e+01 ... 1.96225347e+01
  9.13341033e+00 2.86104024e+01]]
[[1 0 0 ... 0 0 0]
 [0 1 1 ... 1 1 0]
 [0 1 1 ... 1 0 0]
 ...
 [0 1 0 ... 0 0 0]
 [0 0 1 ... 1 0 0]
 [0 0 1 ... 0 0 0]]
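
The threshold can also be obtained without sorting by hand. The sketch below swaps in `np.quantile` for the sort-and-index step, so the threshold may differ slightly from the one above because `np.quantile` interpolates linearly by default. It uses a smaller array, skips the NaN step, and writes to a temporary directory. The file names and the seeded generator are assumptions made for this example.

```python
import os
import tempfile
import numpy as np

rng = np.random.default_rng(0)
a = rng.uniform(0, 100, size=(200, 200))   # smaller than 1000 x 1000 to keep the sketch fast

threshold = np.quantile(a, 0.7)            # 70th percentile of all elements
labels = (a > threshold).astype(np.int8)   # 1 where the element exceeds the threshold

with tempfile.TemporaryDirectory() as tmp:
    data_path = os.path.join(tmp, 'data.npy')       # hypothetical file names
    labels_path = os.path.join(tmp, 'labels.npy')
    np.save(data_path, a)                  # data and labels go to separate files
    np.save(labels_path, labels)
    data_loaded = np.load(data_path)
    labels_loaded = np.load(labels_path)

share_of_ones = labels_loaded.mean()       # should be close to 0.3
print(share_of_ones)
```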
