## NumPy introduction

This notebook is a short introduction to the NumPy library, intended as a quick reference for its most common operations.

The first thing we need to do is import `numpy`. We will use the standard alias for it: `np`.

In [1]:
import numpy as np

### Task 0 (0.5 point)
Define the `maxx` function that takes two NumPy arrays and returns the one with the larger sum of elements. If the sums are equal, the function should return the first array.


In [16]:
def maxx(r1, r2):
    # Sum over all elements, whatever the array's shape
    sum1 = r1.sum()
    sum2 = r2.sum()
    # >= keeps the first array when the sums are equal
    if sum1 >= sum2:
        return r1
    else:
        return r2

r1 = np.array([[1,2,5,3],[2,3,5,1]])
r2 = np.array([1,2,5])
maxx(r1, r2)


array([[1, 2, 5, 3],
       [2, 3, 5, 1]])
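
A more compact variant, shown only as a sketch: Python's built-in `max` returns the first maximal item when several compare equal, so it satisfies the tie rule without an explicit comparison. The name `maxx_compact` and the sample arrays are hypothetical.

```python
import numpy as np

def maxx_compact(r1, r2):
    # Built-in max keeps the first item on ties, so r1 wins when sums are equal.
    return max((r1, r2), key=np.sum)

a = np.array([[1, 2], [3, 4]])   # sum 10
b = np.array([10])               # sum 10 (tie with a)
c = np.array([5, 6])             # sum 11

print(maxx_compact(a, b) is a)   # True: tie goes to the first argument
print(maxx_compact(a, c) is c)   # True: larger sum wins
```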

### Task 1 (0.5 point)
Define the `meanest_of_them_all` function that takes a list of NumPy arrays and returns the one with the largest mean of its elements. If several arrays share the largest mean, the function should return the first of them. The function should work for arrays of arbitrary shape.

In [30]:
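def meanest_of_them_all(np_arrays):
    # nanmean averages over all elements regardless of shape and ignores NaNs
    means = [np.nanmean(ar) for ar in np_arrays]
    max_mean = max(means)
    # Means within rtol=1e-10 (plus isclose's default atol of 1e-8) of the maximum count as ties
    max_means_positions = [index for index, mean in enumerate(means) if np.isclose(mean, max_mean, rtol=1e-10)]
    # The first matching position wins, so ties go to the earliest array
    return np_arrays[max_means_positions[0]]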

meanest_of_them_all([np.array([3., 2, 1]), np.array([3+1.0e-6, 2, 1]), np.array([1, 2, 3]), np.array([2, 2, 2])])

array([3.000001, 2.      , 1.      ])
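
If exact comparison of means is acceptable, `np.argmax` already returns the index of the first maximum, which covers the tie rule. The sketch below assumes exact ties (no tolerance) and no NaNs; `meanest_argmax` and the sample arrays are hypothetical.

```python
import numpy as np

def meanest_argmax(np_arrays):
    # np.mean averages over all elements, so any shape is accepted.
    means = [np.mean(arr) for arr in np_arrays]
    # np.argmax returns the index of the first maximum, so ties go to the earliest array.
    return np_arrays[int(np.argmax(means))]

arrays = [
    np.array([[1, 2], [3, 4]]),      # mean 2.5
    np.array([2.5, 2.5, 2.5]),       # mean 2.5
    np.arange(8).reshape(2, 2, 2),   # mean 3.5
    np.array([7, 0]),                # mean 3.5 (tie with the previous array)
]
best = meanest_argmax(arrays)
print(best.shape)  # (2, 2, 2)
```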

### Task 2 (1 point)
Create two 1D NumPy arrays of 100 random integers each, drawn from the range 0 to 100, then create a new array from the elements that appear in both. Replace every element less than 50 with 0 and every element greater than 50 with 1. Finally, calculate the mean of the resulting array.

In [33]:
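# high is exclusive, so the values fall in 0..99
ra = np.random.randint(low=0,high=100,size=100,dtype=int)
rb = np.random.randint(low=0,high=100,size=100,dtype=int)
print(ra)
print(rb)
# Sorted unique values present in both arrays
common = np.intersect1d(ra,rb)
print(common)
# Boolean mask: True (counts as 1) above 50, False (counts as 0) otherwise;
# a value of exactly 50 also maps to False
common_replaced = common > 50
print(common_replaced)
# The mean of a boolean array is the fraction of True entries
print(common_replaced.mean())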

[10 75 27 64 27  1 98 59 28 17 28 66 36 44 81 42 57 74 64 58 28 40 38 64
 84 82  5 40 64 66 34 94 37 27 51 84  1 67 28 47  5 60 72 83  2 66 86 11
 44 57 71  2 91  5 78 89 84  3 69 99  6 45 16 32 26  6 68 54  3 88 21 94
 60 78 60  2 98 43 72 42 10 52 66 96 52 92 82  6  4 22 10 23 78 24 32 68
 33 63 44 28]
[54 92 50 91 55  5  3 87 85 84 35 68 21 76 76 13 51  5 44  2 94 36 78 17
 24 47 51 72 90  0 95 60 26 59 54 49 65 71 64 95 86 87 11 48 83 28 25 79
 39 63 82 28 43 76 94 20 27  4 93 14 41  4  3 83 19 25 65 68  1 63 17 91
 41 14 85  7 82 66 12 84 84 49  0 80 98 77 60 54 50 84 68  0 74 97 40  3
 93 68  7 48]
[ 1  2  3  4  5 11 17 21 24 26 27 28 36 40 43 44 47 51 54 59 60 63 64 66
 68 71 72 74 78 82 83 84 86 91 92 94 98]
[False False False False False False False False False False False False
 False False False False False  True  True  True  True  True  True  True
  True  True  True  True  True  True  True  True  True  True  True  True
  True]
0.5405405405405406
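
The task does not say what should happen to an element equal to exactly 50. The sketch below follows the wording literally and leaves such an element unchanged, which is an assumption; the cell above maps it to `False` instead. The helper name `binarize_common`, its `threshold` parameter, and the sample arrays are hypothetical.

```python
import numpy as np

def binarize_common(a, b, threshold=50):
    # Sorted unique values that occur in both arrays.
    common = np.intersect1d(a, b)
    result = common.copy()
    # Both masks are computed from `common`, so the second assignment
    # cannot be affected by the first.
    result[common < threshold] = 0
    result[common > threshold] = 1
    return result

a = np.array([10, 50, 75, 75, 99, 3])
b = np.array([75, 50, 3, 99, 42])
out = binarize_common(a, b)
print(out.tolist())  # [0, 50, 1, 1]
print(out.mean())    # 13.0
```

The untouched 50 dominates the mean here, which shows why the boundary case is worth deciding explicitly.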


### Task 3 (1 point)
Construct the following block matrix (without explicitly writing it!):
```
[[10, 10, 10, 10, 10, 1,  0,  0,  0,  0],
 [10, 10, 10, 10, 10, 0,  1,  0,  0,  0],
 [10, 10, 10, 10, 10, 0,  0,  1,  0,  0],
 [10, 10, 10, 10, 10, 0,  0,  0,  1,  0],
 [10, 10, 10, 10, 10, 0,  0,  0,  0,  1],
 [ 1,  0,  0,  0,  0, 10, 10, 10, 10, 10],
 [ 0,  1,  0,  0,  0, 10, 10, 10, 10, 10],
 [ 0,  0,  1,  0,  0, 10, 10, 10, 10, 10],
 [ 0,  0,  0,  1,  0, 10, 10, 10, 10, 10],
 [ 0,  0,  0,  0,  1, 10, 10, 10, 10, 10]]
```
Then calculate its determinant.

In [35]:
a = np.ones((5,5))*10
b = np.eye(5)

arr1 = np.concatenate((a,b),axis=1)
arr2 = np.concatenate((b,a), axis=1)

arr = np.concatenate((arr1,arr2))

print(arr)
print(np.linalg.det(arr))

[[10. 10. 10. 10. 10.  1.  0.  0.  0.  0.]
 [10. 10. 10. 10. 10.  0.  1.  0.  0.  0.]
 [10. 10. 10. 10. 10.  0.  0.  1.  0.  0.]
 [10. 10. 10. 10. 10.  0.  0.  0.  1.  0.]
 [10. 10. 10. 10. 10.  0.  0.  0.  0.  1.]
 [ 1.  0.  0.  0.  0. 10. 10. 10. 10. 10.]
 [ 0.  1.  0.  0.  0. 10. 10. 10. 10. 10.]
 [ 0.  0.  1.  0.  0. 10. 10. 10. 10. 10.]
 [ 0.  0.  0.  1.  0. 10. 10. 10. 10. 10.]
 [ 0.  0.  0.  0.  1. 10. 10. 10. 10. 10.]]
2498.9999999999973
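
The printed determinant is 2499 up to floating-point rounding. As a cross-check, here is a sketch that builds the same matrix with `np.block` and compares the result with the value obtained by hand; the variable name `m` is hypothetical.

```python
import numpy as np

a = np.full((5, 5), 10.0)
b = np.eye(5)

# np.block assembles the matrix from a nested list of blocks.
m = np.block([[a, b], [b, a]])

# The blocks commute, so det(m) = det(a @ a - b @ b) = det(a @ a - I).
# a has eigenvalues 50 (once) and 0 (four times), so a @ a - I has
# eigenvalues 2499 (once) and -1 (four times): 2499 * (-1)**4 = 2499.
print(m.shape)                               # (10, 10)
print(np.isclose(np.linalg.det(m), 2499.0))  # True
```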


### Task 4 (1 point)
Replace all NaN values with the mean of the array (computed over the non-NaN elements). Then normalize the array by subtracting the mean and dividing by the standard deviation.

In [38]:
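def normalize_array(array):
    # Work on a float copy so the caller's array is not modified
    array = np.array(array, dtype=float)
    # Mean over the non-NaN elements
    mean = np.nanmean(array)
    array[np.isnan(array)] = mean
    # Filling with the mean leaves the mean unchanged, so it can be reused here
    return (array - mean) / np.std(array)

a = np.array([1, 2, 3, 4, np.nan])
print(normalize_array(a))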

[-1.5 -0.5  0.5  1.5  0. ]
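
The output can be checked by hand. A short derivation, assuming $k$ of the $n$ entries are observed (non-NaN) with mean $\mu$, and recalling that `np.std` computes the population standard deviation by default:

$$
\mu' = \frac{k\mu + (n-k)\mu}{n} = \mu, \qquad
\sigma'^2 = \frac{1}{n}\sum_{i\ \text{observed}} (x_i - \mu)^2
$$

Filled entries contribute nothing to the sum, so they normalize to exactly 0. For `[1, 2, 3, 4, nan]`: $\mu = 2.5$, the squared deviations sum to $2.25 + 0.25 + 0.25 + 2.25 = 5$, and $\sigma'^2 = 5/5 = 1$, which is why each output entry equals $x_i - 2.5$.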


### Task 5 (1 point)

Treating separate rows as independent vectors, calculate the projection of vector v onto vector w for each row of the matrices Vs and Ws.

In [51]:
v1 = [1,2,3,4,5]
v2 = [[1,1,1,1,1],[1,1,1,1,2],[1,1,1,3,1]]


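def projection(a, arrays):
    # Project the vector a onto each vector w in arrays: (a.w / w.w) * w
    projections = []
    for arr in arrays:
        projections.append(np.dot(a, arr) / np.dot(arr, arr) * arr)

    return projections

def task_5(a):
    # res[i][j] is the projection of a[i] onto a[j], for every pair of vectors in a
    res = []
    for vector in a:
        res.append(projection(vector, a))

    return res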


# projection(np.array([1, 2]), [np.array([1, 2]), np.array([2, 3]), np.array([3, 4])])

task_5([np.array([1, 2]), np.array([2, 3]), np.array([3, 4])])


[[array([1., 2.]), array([1.23076923, 1.84615385]), array([1.32, 1.76])],
 [array([1.6, 3.2]), array([2., 3.]), array([2.16, 2.88])],
 [array([2.2, 4.4]), array([2.76923077, 4.15384615]), array([3., 4.])]]
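
The task statement pairs row `i` of `Vs` with row `i` of `Ws`. A vectorized sketch of that row-wise reading follows; it assumes both matrices have the same shape and that no row of `Ws` is the zero vector. The function name `project_rows` and the sample matrices are hypothetical.

```python
import numpy as np

def project_rows(Vs, Ws):
    Vs = np.asarray(Vs, dtype=float)
    Ws = np.asarray(Ws, dtype=float)
    # Row-wise dot products v.w and w.w, kept as column vectors for broadcasting.
    vw = np.sum(Vs * Ws, axis=1, keepdims=True)
    ww = np.sum(Ws * Ws, axis=1, keepdims=True)
    # Row i of the result is (v_i.w_i / w_i.w_i) * w_i.
    return vw / ww * Ws

Vs = np.array([[1, 2], [2, 3], [3, 4]])
Ws = np.array([[1, 0], [0, 2], [1, 1]])
print(project_rows(Vs, Ws).tolist())
# [[1.0, 0.0], [0.0, 3.0], [3.5, 3.5]]
```

Keeping the dot products as `(n, 1)` columns lets broadcasting scale each row of `Ws` by its own coefficient without a Python loop.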

### Task 6 (2 points)
Generate a 1000 x 1000 random NumPy array. Fill some randomly chosen elements with NaNs, then replace all NaN values with the mean of the array. Finally, save the array to a file and, at the same time, save a label for each element to a separate file. The label is 1 if the element is greater than 70% of all numbers in the array, and 0 otherwise.

In [58]:
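N = 1000
rnd_arr = np.random.rand(N, N)
n_nans = int(0.25 * N * N)  # "some" elements: 25% of the array

# Pick distinct flat positions so that exactly n_nans elements become NaN
nan_positions = np.random.choice(N * N, size=n_nans, replace=False)
rnd_arr.flat[nan_positions] = np.nan

# Replace NaNs with the mean of the remaining values
mean = np.nanmean(rnd_arr)
rnd_arr[np.isnan(rnd_arr)] = mean

# Label is 1 if the element exceeds the 70th percentile of the array, else 0
perc70 = np.percentile(rnd_arr, 70)
labels = (rnd_arr > perc70).astype(int)

# Both files keep the N x N layout, so labels line up with elements
np.savetxt("rnd_arr.txt", rnd_arr)
np.savetxt("labels.txt", labels, fmt='%d')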
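
To confirm that the two saved files stay aligned element by element, they can be read back with `np.loadtxt`. The sketch below uses a small 20 x 20 stand-in array, a fixed seed, and a temporary directory, all of which are assumptions made to keep the example fast and side-effect free.

```python
import os
import tempfile
import numpy as np

rng = np.random.default_rng(0)   # fixed seed, only for reproducibility
arr = rng.random((20, 20))       # small stand-in for the 1000 x 1000 array

threshold = np.percentile(arr, 70)
labels = (arr > threshold).astype(int)

with tempfile.TemporaryDirectory() as tmp:
    data_path = os.path.join(tmp, "rnd_arr.txt")
    labels_path = os.path.join(tmp, "labels.txt")
    np.savetxt(data_path, arr)
    np.savetxt(labels_path, labels, fmt="%d")

    # Read both files back before the temporary directory is removed.
    arr_back = np.loadtxt(data_path)
    labels_back = np.loadtxt(labels_path, dtype=int)

print(arr_back.shape == labels_back.shape)   # True
print(np.allclose(arr_back, arr))            # True
print(np.array_equal(labels_back, labels))   # True
```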