# Coding Temple's Data Analytics Program  
---
# Advanced Python - Intro to `numpy`
---



## Part 1: Working with `numpy`


### 1.1 Importing `numpy`

We've already used the `numpy` package by importing it and assigning it the standard alias of `np`. Do this again in the following cell - the more you practice typing these lines of code, the easier it will be to remember.

In [22]:
# Import numpy and assign it the standard alias
# YOUR CODE HERE

import numpy as np

### 1.1 Solution - Run this cell to check your answer in 1.1. Please do not edit the values in this cell!

In [15]:
# DO NOT EDIT THIS CELL
assert np.__name__ == 'numpy', 'Make sure that you have properly imported numpy and aliased it as np!'

### 1.2 Generate random numbers

Create a `(5,3)` `numpy` array of random integer values between 0 and 100.

Use `numpy`'s `random` module to generate these integers, and name your new variable `myarray`. You should also print the array to check its dimensions and values.

In [28]:
# Generate your random numbers
np.random.seed(1) #Seed generated for reproducibility

#YOUR CODE HERE
myarray = np.random.randint(0, 101,(5, 3))

# Print out the array
print(myarray)

[[37 12 72]
 [ 9 75  5]
 [79 64 16]
 [ 1 76 71]
 [ 6 25 50]]


### 1.2 Solution - Run the following cell to check your answer.

In [27]:
#DO NOT EDIT THIS CELL

#Verify the array was created with the correct name and has the proper shape
assert myarray.shape == (5,3), 'Make sure you create an array with the proper shape!'
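
One detail worth checking in 1.2: `np.random.randint(low, high)` excludes `high`, which is why the upper bound has to be 101 for 100 to be a possible value. The sketch below repeats the draw with NumPy's newer `Generator` interface. The names `rng`, `sample`, and `sample_inclusive` are illustrative and not part of the exercise, and the values will differ from those produced by `np.random.seed(1)`.

```python
import numpy as np

# default_rng returns a Generator; passing a seed makes the draw reproducible
rng = np.random.default_rng(1)

# integers() excludes the upper bound by default, just like randint
sample = rng.integers(0, 101, size=(5, 3))

# endpoint=True makes the upper bound inclusive instead
sample_inclusive = rng.integers(0, 100, size=(5, 3), endpoint=True)

print(sample.shape)
print(sample.min() >= 0, sample.max() <= 100)
```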

### 1.3 Calculate BMI 

Using the two lists provided, calculate the BMI (body mass index) of each individual using NDArrays. Save your results in a variable named `bmi`.

With weight in pounds and height in inches, the formula for BMI is: $BMI = \frac{703 \times weight}{height^2}$

In [43]:
height = [55, 120, 90, 100]
weight = [170, 180, 190, 200]

arr_height = np.array(height)
arr_weight = np.array(weight)

bmi = 703 * (arr_weight / arr_height ** 2)
bmi = np.round(bmi, 2)
bmi

array([39.51,  8.79, 16.49, 14.06])

### 1.3 Solution: Run the following cell to check your answer.

In [38]:
assert 'bmi' in dir() , 'Make sure you have saved your results to the proper variable name!'
assert type(bmi) == np.ndarray, 'Make sure that you made the calculation using an NDArray for both height and weight!'
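
To see why the exercise asks for NDArrays, it may help to compare the array calculation with a plain-Python loop. This is only a sketch using the same `height` and `weight` lists; the names `bmi_loop` and `bmi_vec` are illustrative.

```python
import numpy as np

height = [55, 120, 90, 100]    # inches, from the exercise
weight = [170, 180, 190, 200]  # pounds, from the exercise

# Plain-Python version: one calculation per person
bmi_loop = [round(703 * w / h ** 2, 2) for w, h in zip(weight, height)]

# NumPy version: the arithmetic is applied element by element
bmi_vec = np.round(703 * np.array(weight) / np.array(height) ** 2, 2)

print(bmi_loop)
print(bmi_vec)
print(np.allclose(bmi_loop, bmi_vec))
```

Both versions give the same numbers. The array version needs no explicit loop, because operators such as `*`, `/`, and `**` act on every element of an NDArray at once.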

### 1.4 Create a function 

Create a function named `my_func` that takes in two parameters and creates a random matrix based on those parameters. Extra: accept additional parameters that let the user choose the shape and data type of the matrix.

In [108]:
np.random.seed(42)  # Seed for reproducibility

def my_func(value1, value2):
    # Build a value1 x value2 matrix of random integers from 0 to 9
    mymatrix = np.random.randint(0, 10, (value1, value2))
    return mymatrix

my_func(8, 8)

array([[6, 3, 7, 4, 6, 9, 2, 6],
       [7, 4, 3, 7, 7, 2, 5, 4],
       [1, 7, 5, 1, 4, 0, 9, 5],
       [8, 0, 9, 2, 6, 3, 8, 2],
       [4, 2, 6, 4, 8, 6, 1, 3],
       [8, 1, 9, 8, 9, 4, 1, 3],
       [6, 7, 2, 0, 3, 1, 7, 3],
       [1, 5, 5, 9, 3, 5, 1, 9]])
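
The "Extra" part of 1.4 could be approached along the following lines. This is a minimal sketch, not the course's reference solution: the function name `my_func_extended` and the parameters `low`, `high`, `dtype`, and `seed` are hypothetical choices made for illustration.

```python
import numpy as np

def my_func_extended(rows, cols, low=0, high=10, dtype=int, seed=None):
    """Return a (rows, cols) matrix of random integers in [low, high), cast to dtype."""
    rng = np.random.default_rng(seed)               # seed=None gives a fresh draw each call
    matrix = rng.integers(low, high, size=(rows, cols))
    return matrix.astype(dtype)                     # convert to the requested data type

m_int = my_func_extended(2, 4, seed=42)
m_float = my_func_extended(3, 3, low=0, high=100, dtype=float, seed=42)

print(m_int.shape, m_int.dtype)
print(m_float.shape, m_float.dtype)
```

Keeping the seed as a parameter, rather than calling `np.random.seed()` outside the function, means each call can be made reproducible on its own.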

### 1.5 Array practice

Time for some more practice. Complete each of these tasks in its own code cell below:

1. Return the first row
2. Return the last column
3. Return the third column values from the 4th and 5th rows
4. Multiply every value in the array by 2
5. Divide every value by 3
6. Increase the values in the first row by 12
7. Calculate the mean of the first column
8. Calculate the median of the array _after_ removing the 2 smallest values in the array
9. Calculate the standard deviation of the first 3 rows
10. Return values greater than 25 in the second column
11. Return values less than 40 in the array

In [109]:
np.random.seed(42)  # Same seed as in 1.4, so M matches the matrix shown above

def my_func(value1, value2):
    mymatrix = np.random.randint(0, 10, (value1, value2))
    return mymatrix

M = my_func(8, 8)

# 1. Return the first row:
print(M[0,:])

[6 3 7 4 6 9 2 6]


In [110]:
# 2. Return the last column
print(M[:,7])

[6 4 5 2 3 3 3 9]


In [115]:
# 3. Return the third column values from the 4th and 5th rows
print(M[3:5,2])


[9 6]
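
The first three tasks rely on two indexing rules: positions start at 0, and the end of a slice is excluded. The sketch below uses a small illustrative array, `grid`, whose values encode their own position (row * 10 + column), so each result can be checked by eye.

```python
import numpy as np

# 5 rows x 4 columns; each value is row * 10 + column
grid = np.arange(5).reshape(5, 1) * 10 + np.arange(4)
print(grid)

first_row = grid[0, :]       # row index 0
last_col = grid[:, -1]       # -1 selects the last column, whatever the width
picked = grid[3:5, 2]        # rows 4 and 5 (indices 3 and 4), third column (index 2)

print(first_row)
print(last_col)
print(picked)
```

Using `-1` for the last column avoids hard-coding the column count, so the same line works if the array's width changes.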


In [119]:
# 4. Multiply every value in the array by 2
print(M * 2)

# or

print(M[:,:] * 2)

[[12  6 14  8 12 18  4 12]
 [14  8  6 14 14  4 10  8]
 [ 2 14 10  2  8  0 18 10]
 [16  0 18  4 12  6 16  4]
 [ 8  4 12  8 16 12  2  6]
 [16  2 18 16 18  8  2  6]
 [12 14  4  0  6  2 14  6]
 [ 2 10 10 18  6 10  2 18]]
[[12  6 14  8 12 18  4 12]
 [14  8  6 14 14  4 10  8]
 [ 2 14 10  2  8  0 18 10]
 [16  0 18  4 12  6 16  4]
 [ 8  4 12  8 16 12  2  6]
 [16  2 18 16 18  8  2  6]
 [12 14  4  0  6  2 14  6]
 [ 2 10 10 18  6 10  2 18]]


In [137]:
# 5. Divide every value by 3

print(M / 3)

[[2.         1.         2.33333333 1.33333333 2.         3.
  0.66666667 2.        ]
 [2.33333333 1.33333333 1.         2.33333333 2.33333333 0.66666667
  1.66666667 1.33333333]
 [0.33333333 2.33333333 1.66666667 0.33333333 1.33333333 0.
  3.         1.66666667]
 [2.66666667 0.         3.         0.66666667 2.         1.
  2.66666667 0.66666667]
 [1.33333333 0.66666667 2.         1.33333333 2.66666667 2.
  0.33333333 1.        ]
 [2.66666667 0.33333333 3.         2.66666667 3.         1.33333333
  0.33333333 1.        ]
 [2.         2.33333333 0.66666667 0.         1.         0.33333333
  2.33333333 1.        ]
 [0.33333333 1.66666667 1.66666667 3.         1.         1.66666667
  0.33333333 3.        ]]


In [138]:
# 6. Increase the values in the first row by 12
print(M[0,:] + 12)

[18 15 19 16 18 21 14 18]
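
Note that `M[0,:] + 12` in task 6 returns a new array and leaves `M` itself unchanged. If the intent is to update the array, an in-place assignment is needed. The sketch below shows the difference on a small illustrative array, `arr`.

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6]])

shifted = arr[0, :] + 12              # new array; arr is not modified
first_row_before = arr[0, :].copy()   # snapshot to show arr is still intact
print(shifted)
print(first_row_before)

arr[0, :] += 12                       # in-place update of the first row
print(arr)
```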


In [140]:
# 7. Calculate the mean of the first column
# (wrapping the result in int() would truncate 5.125 to 5)
print(np.mean(M[:,0]))

5.125


In [145]:
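# 8. Calculate the median of the array after removing the 2 smallest values in the array
# NOTE: this uses the first row from task 6 (after adding 12) as a sample array
myarray = np.array([18, 15, 19, 16, 18, 21, 14, 18])
sorted_myarray = np.sort(myarray)
print(sorted_myarray)
# Drop the two smallest values, then take the median of what remains
clipped_myarray = sorted_myarray[2:]
print(clipped_myarray)
print(np.median(clipped_myarray))

[14 15 16 18 18 18 19 21]
[16 18 18 18 19 21]
18.0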


In [172]:
# 9. Calculate the standard deviation of the first 3 rows
# Flatten the first 3 rows of M into a 1D array to work on
newarray = M[0:3, :].flatten()
print(newarray)
print(np.std(newarray))


[6 3 7 4 6 9 2 6 7 4 3 7 7 2 5 4 1 7 5 1 4 0 9 5]
2.419538523492996
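
Two details about `np.std` are worth knowing for task 9. It accepts a 2D slice directly and, by default, returns a single value over all of its elements, so flattening first is optional. It also computes the population standard deviation (`ddof=0`) unless told otherwise. The sketch below uses a small illustrative array, `block`, chosen so the population result is exactly 2.

```python
import numpy as np

block = np.array([[2, 4, 4, 4],
                  [5, 5, 7, 9]])

population_std = np.std(block)          # all 8 values, divides by n
sample_std = np.std(block, ddof=1)      # divides by n - 1 instead
per_row_std = np.std(block, axis=1)     # one value per row

print(population_std)
print(sample_std)
print(per_row_std)
```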


In [159]:
# 10. Return values in the second column greater than 25
# NOTE: M only holds values from 0 to 9, so a threshold of 3 is used instead of 25
mask = M[:,1] > 3
print(M[mask,1])

[4 7 7 5]


In [160]:
# 11. Return values < 40 in the array
# NOTE: M only holds values from 0 to 9, so a threshold of 3 is used instead of 40
mask = M < 3
print(M[mask])

[2 2 1 1 0 0 2 2 2 1 1 1 2 0 1 1 1]
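
Tasks 10 and 11 both use Boolean masks, but what comes back depends on how the mask is applied. The sketch below uses a small illustrative array, `scores`, to contrast three cases: masking one column, masking whole rows, and masking the entire array.

```python
import numpy as np

scores = np.array([[43, 33, 73],
                   [61, 99, 13],
                   [94, 47, 14]])

mask = scores[:, 1] > 40           # one Boolean per row, based on the 2nd column

col_values = scores[mask, 1]       # only the 2nd-column values that pass
full_rows = scores[mask]           # every column of the rows that pass
below_40 = scores[scores < 40]     # 2D mask: the result is flattened to 1D

print(col_values)
print(full_rows)
print(below_40)
```

The row mask keeps the 2D shape, while a mask with the same shape as the array always returns a 1D result in row-major order.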


### Solution 1.5: Run the following cell to view the solution for each of the above tasks.

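A new array will be generated to demonstrate the solution, so the values will not be the same as in your array, but the code for each task will still apply. Note that task 8 overwrites `myarray` and task 9 generates a fresh array, so the outputs for tasks 9 to 11 do not correspond to the first array printed.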

In [146]:
# DO NOT EDIT THIS CELL
# SOLUTION 1.5

# SOLUTION: Make sure you have completed all of the above tasks
# Generate your random numbers (NEW ARRAY)
myarray = np.random.randint(100, size=(5, 3))
print('The generated array: \n', myarray)
print('\n')

# 1. Return the first row:
print('1. The first row: ', myarray[0])
print('\n')

# 2. Return the last column
print('2. The last column: ', myarray[:,-1])
print('\n')

# 3. Return the third column values from the 4th and 5th rows
print('3. The 3rd column, 4th & 5th rows: ', myarray[3:5,2])
print('\n')

# 4. Multiply every value in the array by 2
# (operates on the original array)
print('4. Multiply by 2: \n', myarray * 2)
print('\n')

# 5. Divide every value by 3
# (operates on the original array)
print('5. Divide by 3: \n', myarray / 3)
print('\n')

# 6. Increase the values in the first row by 12
# (operates on the original array)
print('6. Add 12 to the first row: \n', myarray[0,:] + 12)
print('\n')

# 7. Calculate the mean of the first column
print('7. The mean of the 1st column: ', myarray[:,0].mean())
print('\n')

# 8. Calculate the median of the array after removing the 2 smallest values in the array
# flatten and sort (axis=None does the flattening)
myarray = np.sort(myarray, axis=None)
# remove two smallest values
myarray = myarray[2:]
# calculate the median
print('8. The median after removing the 2 smallest values: ', np.median(myarray))
print('\n')

# 9. Calculate the standard deviation of the first 3 rows
# Generate new array first:
myarray = np.random.randint(100, size=(5, 3))
# Then calculate the std:
print('9. The standard deviation is: ', np.std(myarray[0:3,:]))
print('\n')

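# 10. Return values in the second column greater than 25
# create a Boolean mask where values in the 2nd column > 25 are True
condition = myarray[:,1] > 25
# Apply the mask
# Note: myarray[condition] returns the full rows whose 2nd-column value is > 25;
# use myarray[condition, 1] to return only the 2nd-column values
print('10. All values in 2nd column > 25: \n', myarray[condition])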
print('\n')

# 11. Return values < 40 in the array
# create another Boolean mask for values < 40
condition = myarray < 40
# apply the mask
print('11. All values < 40: \n', myarray[condition])

The generated array: 
 [[43 33 73]
 [61 99 13]
 [94 47 14]
 [71 77 86]
 [61 39 84]]


1. The first row:  [43 33 73]


2. The last column:  [73 13 14 86 84]


3. The 3rd column, 4th & 5th rows:  [86 84]


4. Multiply by 2: 
 [[ 86  66 146]
 [122 198  26]
 [188  94  28]
 [142 154 172]
 [122  78 168]]


5. Divide by 3: 
 [[14.33333333 11.         24.33333333]
 [20.33333333 33.          4.33333333]
 [31.33333333 15.66666667  4.66666667]
 [23.66666667 25.66666667 28.66666667]
 [20.33333333 13.         28.        ]]


6. Add 12 to the first row: 
 [55 45 85]


7. The mean of the 1st column:  66.0


8. The median after removing the 2 smallest values:  71.0


9. The standard deviation is:  24.03598125855752


10. All values in 2nd column > 25: 
 [[79 81 52]
 [59 40 28]
 [14 44 64]
 [88 70  8]]


11. All values < 40: 
 [23 25 28 14  8]
