# Problem Solving Session 12
## Topic: Lecture 8 - Numpy
##### Date: March 6, 2021
##### By: Hermine Grigoryan

[Numpy - cheat sheet](https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Numpy_Python_Cheat_Sheet.pdf)

## Imports

In [1]:
import numpy as np
np.set_printoptions(suppress=True, precision=4)

## Exercises

### Exercise 1

Write a function which calculates the Euclidean distance between two arrays.
$d\left( p,q\right)   = \sqrt {\sum _{i=1}^{n}  \left( q_{i}-p_{i}\right)^2 }$

1. Use a list comprehension
2. Use NumPy functionality
3. Compare the running times of the two approaches



In [2]:
np.random.seed(42)
one = np.random.randint(0, 10, 1_000_000) # 1,000,000 random integers [0-9]
two = np.random.randint(0, 10, 1_000_000)

In [3]:
print('len:', len(one), len(two))
print('dim:', one.ndim, two.ndim)
print('shape:', one.shape, two.shape)
print('type:', type(one), type(two))
print('dtype:', one.dtype, two.dtype)

len: 1000000 1000000
dim: 1 1
shape: (1000000,) (1000000,)
type: <class 'numpy.ndarray'> <class 'numpy.ndarray'>
dtype: int32 int32


In [4]:
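def euclidean_distance(one, two, calc_type='numpy'):
    """Return the Euclidean distance between two equal-length 1-D sequences."""
    if calc_type == 'list_comp':
        # pure-Python loop over element pairs
        return sum([(i - j)**2 for i, j in zip(one, two)])**0.5
    if calc_type == 'numpy':
        # vectorized: subtract, square, and sum without a Python-level loop
        return np.sqrt(np.sum((np.asarray(one) - np.asarray(two))**2))
    # fail loudly instead of returning an error message as the "distance"
    raise ValueError('calc_type must be "list_comp" or "numpy"')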

In [5]:
%%time
euclidean_distance(one, two, 'numpy')

Wall time: 11 ms


4061.5244674875466

In [6]:
%%time
euclidean_distance(one, two, 'list_comp')

Wall time: 641 ms


4061.5244674875466

In [7]:
%%timeit
euclidean_distance(one, two, 'numpy')

6.93 ms ± 705 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [8]:
%%timeit
euclidean_distance(one, two, 'list_comp')

578 ms ± 7.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
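
As a cross-check on the two variants timed above, NumPy also provides `np.linalg.norm`, whose default 2-norm of a difference vector is the Euclidean distance. The sketch below uses two small hypothetical arrays, `p` and `q` (not the exercise data), so the result can be verified by hand.

```python
import numpy as np

# small illustrative inputs (hypothetical, chosen so the answer is easy to verify)
p = np.array([1, 2, 3])
q = np.array([4, 6, 3])

# the 2-norm of the difference vector is the Euclidean distance
d_norm = np.linalg.norm(q - p)

# same value from the explicit formula used in the exercise
d_formula = np.sqrt(np.sum((q - p) ** 2))

print(d_norm, d_formula)  # both are 5.0, since sqrt(3**2 + 4**2 + 0**2) = 5
```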


### Exercise 2
Write a function to create a 2D array of arbitrary shape. This array should have all zero values, except for the elements around the border (i.e., the first and last rows, and the first and last columns), which should have a value of one.

In [9]:
def border_matrix(rows, cols):
    ones = np.ones((rows,cols))
    ones[1:rows-1, 1:cols-1] = 0
    return ones

In [10]:
border_matrix(8, 10)

array([[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 0., 0., 0., 0., 0., 0., 0., 0., 1.],
       [1., 0., 0., 0., 0., 0., 0., 0., 0., 1.],
       [1., 0., 0., 0., 0., 0., 0., 0., 0., 1.],
       [1., 0., 0., 0., 0., 0., 0., 0., 0., 1.],
       [1., 0., 0., 0., 0., 0., 0., 0., 0., 1.],
       [1., 0., 0., 0., 0., 0., 0., 0., 0., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]])
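
The same matrix can also be built from the inside out. The sketch below starts from the zero interior and wraps it in a one-element border with `np.pad`. The name `border_matrix_pad` is hypothetical, and the sketch assumes at least two rows and two columns.

```python
import numpy as np

def border_matrix_pad(rows, cols):
    """Hypothetical variant: surround a zero block with a one-element border of ones."""
    if rows < 2 or cols < 2:
        raise ValueError("this sketch assumes rows >= 2 and cols >= 2")
    inner = np.zeros((rows - 2, cols - 2))  # the all-zero interior
    # pad one element on every side, filling the new cells with 1
    return np.pad(inner, pad_width=1, mode="constant", constant_values=1)

m = border_matrix_pad(4, 5)
print(m)
```

Whether slicing or padding reads better is largely a matter of taste; the slicing version above avoids the extra size check.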

### Exercise 3
Now we will build a table of 100 basketball players. Generate the height (cm), weight (kg), age, and BMI (BMI = kg/m^2) of the 100 players.
* Calculate summary statistics.
* What is the age of the fifth player?
* What is the weight of the tallest player?
* How many players are older than 30?
* Convert the height from centimeters to inches, and the weight from kilograms to pounds.
  - 1 cm = 0.393701 inches, 1 kg = 2.20462 pounds
* Normalize the age so that the values range between 0 and 1.

In [11]:
np.random.seed(42)
height = np.random.normal(190, 5, 100)
weight = np.random.normal(85, 10, 100)
age = np.floor(np.random.normal(25, 3, 100))
bmi = weight/(height/100)**2

In [12]:
player_data = np.column_stack((height, weight, age, bmi))
player_data[:10] # show the first 10 rows

array([[192.4836,  70.8463,  26.    ,  19.1218],
       [189.3087,  80.7935,  26.    ,  22.5442],
       [193.2384,  81.5729,  28.    ,  21.8453],
       [197.6151,  76.9772,  28.    ,  19.7116],
       [188.8292,  83.3871,  20.    ,  23.3863],
       [188.8293,  89.0405,  22.    ,  24.9717],
       [197.8961, 103.8619,  26.    ,  26.5205],
       [193.8372,  86.7458,  26.    ,  23.0874],
       [187.6526,  87.5755,  26.    ,  24.8699],
       [192.7128,  84.2555,  36.    ,  22.687 ]])

In [13]:
player_data.shape

(100, 4)

In [14]:
# Summary statistics, computed column-wise (height, weight, age, BMI)
print('Mean', np.mean(player_data, axis=0))
print('St.dev', np.std(player_data, axis=0))
print('Median', np.median(player_data, axis=0))
print('Min', np.min(player_data, axis=0))
print('Max', np.max(player_data, axis=0))

Mean [189.4808  85.223   24.71    23.7966]
St.dev [4.5181 9.4889 3.2227 3.0762]
Min [176.9013  65.8123  15.      18.6835]
Max [199.2614 112.2017  36.      34.6062]
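
`np.mean` and `np.median` take the same `axis` argument but can give quite different answers when a column contains an outlier. The sketch below illustrates this on a small hypothetical table (`table` is made up for illustration and is not the player data).

```python
import numpy as np

# hypothetical 5 x 2 table: the second column contains one outlier
table = np.array([[1.0, 10.0],
                  [2.0, 11.0],
                  [3.0, 12.0],
                  [4.0, 13.0],
                  [5.0, 100.0]])

col_mean = np.mean(table, axis=0)      # averages down each column
col_median = np.median(table, axis=0)  # middle value of each column

print(col_mean)    # column means: 3.0 and 29.2
print(col_median)  # column medians: 3.0 and 12.0
```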


In [15]:
# Age of the fifth player (zero-based row index 4)
player_data[4, 2]

20.0

In [16]:
# Weight of the tallest player
idx = player_data[:,0].argmax() #identifying the index of the max height
player_data[idx, 1]

85.68562974806028

In [17]:
# Number of players older than 30
np.sum(player_data[:,2]>30)

5

In [18]:
conversion = np.array([0.393701, 2.20462, 1, 1]) # per-column factors: cm -> inches, kg -> pounds, age and BMI unchanged
player_data = player_data*conversion
player_data[:10]

array([[ 75.781 , 156.1892,  26.    ,  19.1218],
       [ 74.531 , 178.1191,  26.    ,  22.5442],
       [ 76.0782, 179.8371,  28.    ,  21.8453],
       [ 77.8013, 169.7055,  28.    ,  19.7116],
       [ 74.3423, 183.837 ,  20.    ,  23.3863],
       [ 74.3423, 196.3005,  22.    ,  24.9717],
       [ 77.9119, 228.9759,  26.    ,  26.5205],
       [ 76.3139, 191.2415,  26.    ,  23.0874],
       [ 73.879 , 193.0707,  26.    ,  24.8699],
       [ 75.8712, 185.7515,  36.    ,  22.687 ]])
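
The multiplication above relies on broadcasting: an array of shape `(n, 4)` multiplied by an array of shape `(4,)` applies the four factors to every row. A minimal sketch with a hypothetical two-player table (`small` and `factors` are illustrative names) makes the shapes explicit.

```python
import numpy as np

# hypothetical two-player table: height (cm), weight (kg), age, BMI
small = np.array([[190.0, 85.0, 25.0, 23.5],
                  [200.0, 100.0, 30.0, 25.0]])
factors = np.array([0.393701, 2.20462, 1, 1])  # one factor per column

# shapes (2, 4) and (4,) broadcast: factors is applied to every row
converted = small * factors

# equivalent explicit form, stretching factors to the full shape first
explicit = small * np.broadcast_to(factors, small.shape)

print(converted)
```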

In [19]:
# Normalizing Age using min-max scaling
max_val = player_data[:,2].max()
min_val = player_data[:,2].min()

player_data[:,2] = (player_data[:,2] - min_val)/(max_val - min_val)
player_data[:10]

array([[ 75.781 , 156.1892,   0.5238,  19.1218],
       [ 74.531 , 178.1191,   0.5238,  22.5442],
       [ 76.0782, 179.8371,   0.619 ,  21.8453],
       [ 77.8013, 169.7055,   0.619 ,  19.7116],
       [ 74.3423, 183.837 ,   0.2381,  23.3863],
       [ 74.3423, 196.3005,   0.3333,  24.9717],
       [ 77.9119, 228.9759,   0.5238,  26.5205],
       [ 76.3139, 191.2415,   0.5238,  23.0874],
       [ 73.879 , 193.0707,   0.5238,  24.8699],
       [ 75.8712, 185.7515,   1.    ,  22.687 ]])
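
The min-max formula divides by `max - min`, which is zero when all values are identical. A reusable helper might guard against that case. The sketch below is one possible version; the name `min_max_scale` and the choice to return zeros for a constant input are assumptions, not part of the exercise.

```python
import numpy as np

def min_max_scale(values):
    """Hypothetical helper: rescale a 1-D array linearly to the range [0, 1]."""
    values = np.asarray(values, dtype=float)
    span = values.max() - values.min()
    if span == 0:
        # all values are identical, so the formula would divide by zero;
        # returning zeros is an assumption of this sketch
        return np.zeros_like(values)
    return (values - values.min()) / span

ages = np.array([15, 20, 26, 36])  # illustrative ages, not the generated data
scaled = min_max_scale(ages)
print(scaled)  # the smallest age maps to 0.0 and the largest to 1.0
```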

### Exercise 4

* Convert the following 1-D array with 12 elements into a 3-D array.
* Flatten the array back to 1-D.
* Find the sum of all elements that are multiples of 3 or 5.

In [20]:
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])

In [21]:
arr.reshape((2, 3, 2)) # (blocks, rows, columns); arr itself keeps its 1-D shape

array([[[ 1,  2],
        [ 3,  4],
        [ 5,  6]],

       [[ 7,  8],
        [ 9, 10],
        [11, 12]]])

In [22]:
arr.resize((2, 3, 2)) # changes the shape of arr in place and returns None

In [23]:
arr

array([[[ 1,  2],
        [ 3,  4],
        [ 5,  6]],

       [[ 7,  8],
        [ 9, 10],
        [11, 12]]])

In [24]:
arr.reshape(-1) #flattens the nD array to 1D array

#OR

arr.ravel()

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12])
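
Reshaping and flattening do not always copy data. The sketch below uses `np.shares_memory` to show that `ravel()` can return a view of the original buffer, while `flatten()` always returns a copy. The names `base`, `cube`, `flat_view`, and `flat_copy` are illustrative.

```python
import numpy as np

base = np.arange(1, 13)          # 1-D array with the values 1..12
cube = base.reshape((2, 3, 2))   # new shape, base itself stays 1-D

flat_view = cube.ravel()     # returns a view when the memory layout allows it
flat_copy = cube.flatten()   # always returns an independent copy

print(np.shares_memory(base, flat_view))  # True
print(np.shares_memory(base, flat_copy))  # False
```

This matters when the flattened array is modified afterwards: writing into a view changes the original array as well.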

In [25]:
np.sum(arr[(arr % 3 == 0) | (arr % 5 == 0)])

45
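
The expression above combines two boolean masks with `|` (element-wise OR), so an element that is a multiple of both 3 and 5 is counted only once. A small sketch on a hypothetical range that includes 15 shows this.

```python
import numpy as np

values = np.arange(1, 16)  # hypothetical input: 1..15, so 15 is a multiple of both

mask = (values % 3 == 0) | (values % 5 == 0)  # element-wise OR of two boolean arrays
selected = values[mask]                        # keeps only entries where mask is True

print(selected)        # 3, 5, 6, 9, 10, 12, 15
print(selected.sum())  # 60
```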

### Exercise 5
Generate a 10 x 3 array of random numbers in the range [0, 1). For each row, pick the number closest to 0.5.

In [26]:
mat = np.random.rand(10, 3) # uniform random numbers in [0, 1)
mat

array([[0.282 , 0.1774, 0.7506],
       [0.8068, 0.9905, 0.4126],
       [0.372 , 0.7764, 0.3408],
       [0.9308, 0.8584, 0.429 ],
       [0.7509, 0.7545, 0.1031],
       [0.9026, 0.5053, 0.8265],
       [0.32  , 0.8955, 0.3892],
       [0.0108, 0.9054, 0.0913],
       [0.3193, 0.9501, 0.9506],
       [0.5734, 0.6318, 0.4484]])

In [27]:
indx = np.abs(mat - 0.5).argmin(axis=1) # for each row, the column index of the value closest to 0.5
indx

array([0, 2, 0, 2, 0, 1, 2, 1, 0, 2], dtype=int64)

In [28]:
mat[np.arange(mat.shape[0]), indx] # fancy indexing: pairs each row number with the column index found above

array([0.282 , 0.4126, 0.372 , 0.429 , 0.7509, 0.5053, 0.3892, 0.9054,
       0.3193, 0.4484])
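
An alternative to pairing row and column indices by hand is `np.take_along_axis`, which selects one entry per row from an index array of matching dimensionality. The sketch below compares both approaches on freshly generated data; the generator seed and the names `m`, `cols`, `picked_fancy`, and `picked_taa` are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)   # seeded generator; the seed is an arbitrary choice
m = rng.random((10, 3))          # uniform values in [0, 1)

cols = np.abs(m - 0.5).argmin(axis=1)   # per-row column index of the closest value

# fancy indexing with explicit row numbers
picked_fancy = m[np.arange(m.shape[0]), cols]
# same selection: the index array needs a trailing axis of length 1
picked_taa = np.take_along_axis(m, cols[:, None], axis=1).ravel()

print(np.array_equal(picked_fancy, picked_taa))  # True
```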

### Exercise 6
A magic square is a square matrix in which every row sum, every column sum, and the sums of the two diagonals are all the same. (One diagonal of a matrix goes from the top left to the bottom right; the other goes from the top right to the bottom left.) Write a program to check whether the provided matrix is a magic square.

In [29]:
A=np.array(
   [[17, 24, 1, 8, 15],
    [23, 5, 7, 14, 16],
    [ 4, 6, 13, 20, 22],
    [10, 12, 19, 21, 3],
    [11, 18, 25, 2, 9]]
)

In [30]:
# instead of min-max, we could use np.unique() as well.
col = A.sum(axis=0).min() == A.sum(axis=0).max() # all column sums are equal if their min and max coincide
row = A.sum(axis=1).min() == A.sum(axis=1).max() # same check for the row sums
diag = np.diag(A).sum() == np.diag(np.fliplr(A)).sum() # fliplr reverses the column order, so its main diagonal is A's anti-diagonal

In [31]:
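if col and row and diag:
    # rows, columns, and diagonals each agree internally;
    # now check that the three sums also agree with one another
    magic = np.array([A.sum(axis=0).min(),
                      A.sum(axis=1).min(),
                      np.diag(A).sum()])
    is_magic = len(np.unique(magic)) == 1
else:
    is_magic = False

print('Magic square!' if is_magic else 'Not a magic square!')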

Magic square!
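
The same check can be packaged as a single function that collects every relevant sum and compares them all at once. The sketch below is one way to do it; `is_magic_square` is a hypothetical name, and the sketch follows the definition above (equal sums only, with no requirement that the entries be distinct).

```python
import numpy as np

def is_magic_square(matrix):
    """Hypothetical helper: True if all row, column, and diagonal sums are equal."""
    matrix = np.asarray(matrix)
    if matrix.ndim != 2 or matrix.shape[0] != matrix.shape[1]:
        return False  # only square 2-D matrices can qualify
    sums = np.concatenate([
        matrix.sum(axis=0),               # column sums
        matrix.sum(axis=1),               # row sums
        [np.trace(matrix)],               # main diagonal
        [np.trace(np.fliplr(matrix))],    # anti-diagonal
    ])
    return bool(np.all(sums == sums[0]))

magic = np.array([[2, 7, 6],
                  [9, 5, 1],
                  [4, 3, 8]])
not_magic = np.array([[1, 2],
                      [3, 4]])
print(is_magic_square(magic), is_magic_square(not_magic))  # True False
```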


### Exercise 7
Using Numpy, create the following 8x8 matrix.

      [[0, 1, 0, 1, 0, 1, 0, 1],
       [1, 0, 1, 0, 1, 0, 1, 0],
       [0, 1, 0, 1, 0, 1, 0, 1],
       [1, 0, 1, 0, 1, 0, 1, 0],
       [0, 1, 0, 1, 0, 1, 0, 1],
       [1, 0, 1, 0, 1, 0, 1, 0],
       [0, 1, 0, 1, 0, 1, 0, 1],
       [1, 0, 1, 0, 1, 0, 1, 0]]

In [32]:
mat = np.zeros((8,8), dtype='int')
mat[1::2, ::2] = 1
mat[::2, 1::2] = 1
mat

array([[0, 1, 0, 1, 0, 1, 0, 1],
       [1, 0, 1, 0, 1, 0, 1, 0],
       [0, 1, 0, 1, 0, 1, 0, 1],
       [1, 0, 1, 0, 1, 0, 1, 0],
       [0, 1, 0, 1, 0, 1, 0, 1],
       [1, 0, 1, 0, 1, 0, 1, 0],
       [0, 1, 0, 1, 0, 1, 0, 1],
       [1, 0, 1, 0, 1, 0, 1, 0]])
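
Another way to see the pattern: a cell holds 1 exactly when its row index plus its column index is odd. The sketch below builds the same matrix from that rule with `np.indices`; the names `n`, `rows`, `cols`, and `board` are illustrative.

```python
import numpy as np

n = 8
rows, cols = np.indices((n, n))   # rows[i, j] == i and cols[i, j] == j
board = (rows + cols) % 2         # 1 where i + j is odd, 0 where it is even

print(board)
```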