## Numpy practice

##### Numpy is a library for representing and working with large and multi-dimensional arrays. Most other libraries in the data-science ecosystem depend on numpy, making it one of the fundamental data science libraries. It provides a number of useful tools for scientific programming.

In [1]:
import numpy as np

#### The Numpy Array
- numpy provides an array type that goes beyond what Python lists can do: it supports multiple dimensions, element-wise arithmetic, and boolean indexing
- We can create a numpy array by passing a list to the `np.array()` function

In [3]:
a = np.array([1, 2, 3])
a

array([1, 2, 3])

- We can create multi-dimensional arrays by passing a list of lists to the `array` function.

In [4]:
matrix = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])
matrix

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
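- A small sketch of how to inspect an array once it exists. The `ndim`, `shape`, and `size` attributes are standard on numpy arrays; the `matrix` variable simply repeats the one defined above.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

# ndim: number of dimensions
# shape: length along each dimension (rows, columns)
# size: total number of elements
print(matrix.ndim)   # 2
print(matrix.shape)  # (3, 3)
print(matrix.size)   # 9
```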

#### Indexing
- Referencing elements in numpy arrays, at its most basic, is the same as referencing elements in Python lists.

In [5]:
a[0]

1

In [6]:
print(f'a    == {a}')
print(f'a[0] == {a[0]}')
print(f'a[1] == {a[1]}')
print(f'a[2] == {a[2]}')

a    == [1 2 3]
a[0] == 1
a[1] == 2
a[2] == 3


- However, multidimensional numpy arrays are easier to index into. To obtain the element at the second column in the second row, we would write:

In [7]:
matrix[1, 1]

5

- To get the first 2 elements of the last 2 rows:

In [10]:
matrix[1:,:2]

array([[4, 5],
       [7, 8]])
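- To make the comparison with plain Python concrete, here is a rough side-by-side sketch. The `nested` list is a hypothetical list-of-lists version of `matrix`, introduced only for this illustration.

```python
import numpy as np

# Hypothetical list-of-lists counterpart of the matrix above
nested = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
matrix = np.array(nested)

# Plain lists: chained indexing for one element,
# and a comprehension to slice columns out of each row
from_list = [row[:2] for row in nested[1:]]

# numpy: row and column slices go inside one pair of brackets
from_array = matrix[1:, :2]

print(nested[1][1], matrix[1, 1])  # 5 5
print(from_list)                   # [[4, 5], [7, 8]]
print(from_array.tolist())         # [[4, 5], [7, 8]]
```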

#### Vectorized Operations
- Another useful feature of numpy arrays is being able to perform vectorized operations
- For example, adding 1 to a normal Python list raises a TypeError, so we have to use a loop or a list comprehension instead:


In [17]:
python_list = [1, 2, 3, 4, 5]
try:
    python_list + 1
except TypeError as e:
    print('An Error Occurred')
    print(f'TypeError: {e}')

An Error Occurred
TypeError: can only concatenate list (not "int") to list


In [12]:
list_plus_1 = [x + 1 for x in python_list]
list_plus_1

[2, 3, 4, 5, 6]

- Vectorizing allows an operation to be applied to every element in a vector, which in our case is a numpy array.
- So, with a numpy array, we can just add one like this:


In [13]:
# let's convert that original python list to a numpy array:
numpy_array = np.array(python_list)
numpy_array

array([1, 2, 3, 4, 5])

In [14]:
# now we can just add one or use any other basic operator to transform the entire array
numpy_array + 1

array([2, 3, 4, 5, 6])
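- The same idea carries over to the other arithmetic operators. The sketch below is illustrative only; the variable names `doubled`, `squared`, and `pairwise_sum` are made up for this example.

```python
import numpy as np

numpy_array = np.array([1, 2, 3, 4, 5])

doubled = numpy_array * 2    # every element multiplied by 2
squared = numpy_array ** 2   # every element squared

# Adding two arrays of the same length works element by element,
# whereas adding two Python lists would concatenate them
pairwise_sum = numpy_array + numpy_array

print(doubled)       # [ 2  4  6  8 10]
print(squared)       # [ 1  4  9 16 25]
print(pairwise_sum)  # [ 2  4  6  8 10]
print([1, 2] + [1, 2])  # [1, 2, 1, 2]
```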

- Comparison operators are also vectorized in an array:

In [19]:
my_array = np.array([-3, 0, 3, 16])

print(f'my_array       == {my_array}')
print(f'my_array == -3 == {my_array == -3}')
print('my_array >= 0  == {}'.format(my_array >= 0))
print('my_array < 10  == {}'.format(my_array < 10))


my_array       == [-3  0  3 16]
my_array == -3 == [ True False False False]
my_array >= 0  == [False  True  True  True]
my_array < 10  == [ True  True  True False]


- we can also use comparison operators to select a certain subset of an array.
- this is essentially a boolean "slice" instead of an index slice.


In [21]:
# select all values in array greater than zero.
my_array[my_array > 0]

array([ 3, 16])

In [22]:
# get the even numbers in the array
my_array[my_array % 2 == 0]

array([ 0, 16])
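- It can help to look at the boolean array on its own. In this sketch, `mask` is just a name chosen for the illustration; it shows one way to count matches and to combine two conditions with `&`.

```python
import numpy as np

my_array = np.array([-3, 0, 3, 16])

# The comparison itself produces an array of booleans
mask = my_array > 0
print(mask)        # [False False  True  True]

# True counts as 1 when summed, so this counts the matching elements
print(mask.sum())  # 2

# Conditions are combined with & (and) or | (or);
# each condition needs its own parentheses
positive_evens = my_array[(my_array > 0) & (my_array % 2 == 0)]
print(positive_evens)  # [16]
```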

#### Array Creation
- Make an array of a specified length from the standard normal distribution

In [24]:
np.random.randn(10)

array([ 0.03025169, -0.63835195, -1.27993281, -0.1023321 ,  1.79367735,
        0.35724932,  0.14806355,  0.97087283, -0.89338799,  1.69387885])

- we transform this to a normal distribution with mean $\mu$ and standard deviation $\sigma$ by applying some basic math:

In [28]:
μ = 100
σ = 30

σ * np.random.randn(20) + μ

array([132.59280904,  88.54152996, 106.25803488,  96.66699514,
       139.55411366,  96.58333529, 109.45252063,  91.90875516,
       145.19008195, 139.12366075,  82.07056689,  75.53923575,
       150.82739714, 110.06025711, 116.9824605 , 116.70664166,
        52.976519  ,  74.60577164, 131.10376021, 126.86892314])
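- One way to sanity-check the transformation is to draw a large sample and look at its mean and standard deviation. This sketch assumes numpy's newer `default_rng` generator rather than `np.random.randn`, and the seed and sample size are arbitrary choices made so the run is repeatable.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed, for repeatability only

mu = 100
sigma = 30

# Scale by sigma and shift by mu, exactly as in the cell above
samples = sigma * rng.standard_normal(100_000) + mu

# With this many draws, both values should land close to mu and sigma
print(samples.mean())
print(samples.std())
```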

- the `zeros` and `ones` functions allow creation of arrays of a specified size of 0's and 1's.
- the `full` function creates arrays of specified size populated with a particular value.

In [33]:
print(f'np.zeros(3)     == {np.zeros(3)}')
print(f'np.ones(3)      == {np.ones(3)}')
print(f'np.full(3, 17)  == {np.full(3, 17)}')
      

np.zeros(3)     == [0. 0. 0.]
np.ones(3)      == [1. 1. 1.]
np.full(3, 17)  == [17 17 17]


- We can also create multi-dimensional arrays by passing a tuple of the dimensions of the desired array, instead of a single integer value.

In [34]:
np.zeros((2, 3))

array([[0., 0., 0.],
       [0., 0., 0.]])
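- The tuple form works for `ones` and `full` as well. The sketch below also passes the `dtype` argument, which is one way to get integers instead of the default floats from `zeros`; the names `grid` and `int_zeros` are illustrative.

```python
import numpy as np

# A 2 x 3 array filled with the value 17
grid = np.full((2, 3), 17)
print(grid.shape)  # (2, 3)

# zeros and ones produce floats by default
print(np.zeros(3).dtype)  # float64

# dtype asks for a different element type
int_zeros = np.zeros(3, dtype=int)
print(int_zeros)  # [0 0 0]
```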

- numpy's `arange()` function is similar to python's builtin `range()` function.
- it can take a single argument and generate a range from zero up to, but not including, the passed number.

In [35]:
np.arange(4)

array([0, 1, 2, 3])

- we can also specify a starting point for the range:

In [36]:
np.arange(3, 8)

array([3, 4, 5, 6, 7])

- unlike `range()`, `arange()` can handle decimal numbers

In [38]:
np.arange(3, 5, 0.5) # the third argument is the 'step' interval.

array([3. , 3.5, 4. , 4.5])

- the `linspace()` function creates a set number of evenly spaced values between a minimum and a maximum.

In [43]:
# note here that max value is inclusive
print(f'min: 1, max: 4, length = 4 -- {np.linspace(1, 4, 4)}')
print(f'min: 1, max: 4, length = 7 -- {np.linspace(1, 4, 7)}')

min: 1, max: 4, length = 4 -- [1. 2. 3. 4.]
min: 1, max: 4, length = 7 -- [1.  1.5 2.  2.5 3.  3.5 4. ]
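- A short sketch relating `linspace()` to `arange()`. It uses two optional `linspace` arguments, `retstep` and `endpoint`; the variable names are illustrative.

```python
import numpy as np

# retstep=True also returns the spacing between consecutive values
points, step = np.linspace(1, 4, 7, retstep=True)
print(step)  # 0.5

# endpoint=False leaves out the maximum, which mirrors arange's behaviour
half_open = np.linspace(1, 4, 6, endpoint=False)
print(half_open)              # [1.  1.5 2.  2.5 3.  3.5]
print(np.arange(1, 4, 0.5))   # [1.  1.5 2.  2.5 3.  3.5]
```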


#### Array Methods
- numpy has many built-in methods that make mathematical operations easier.
- some common ones are `.min`, `.max`, `.mean`, `.sum`, `.std` (standard deviation)

In [46]:
a = np.arange(1, 6)
a

array([1, 2, 3, 4, 5])

In [48]:
a.min(), a.max(), a.mean(), a.sum(), a.std()

(1, 5, 3.0, 15, 1.4142135623730951)
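- On multi-dimensional arrays these methods accept an `axis` argument. The sketch below reuses the 3 x 3 `matrix` from earlier to show the difference between summarizing the whole array, each column, and each row.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

print(matrix.sum())         # 45, over every element
print(matrix.sum(axis=0))   # [12 15 18], one sum per column
print(matrix.sum(axis=1))   # [ 6 15 24], one sum per row
print(matrix.mean(axis=0))  # [4. 5. 6.], one mean per column
```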

#### Exercises

In [81]:
a = np.array([4, 10, 12, 23, -2, -1, 0, 0, 0, -6, 3, -7])
len(a)


12

1. How many negative numbers are there?

In [54]:
len(a[a < 0])

4

2. How many positive numbers are there?

In [68]:
# if zero is neither positive nor negative
print(a)
print(a > 0)
print(a[a > 0])
len(a[a > 0])

[ 4 10 12 23 -2 -1  0  0  0 -6  3 -7]
[ True  True  True  True False False False False False False  True False]
[ 4 10 12 23  3]


5

3. How many even positive numbers are there?

In [70]:
print(a[a > 0])
print(a[a % 2 == 0])
print(a[(a > 0) & (a % 2 == 0)])
len(a[(a > 0) & (a % 2 == 0)])

[ 4 10 12 23  3]
[ 4 10 12 -2  0  0  0 -6]
[ 4 10 12]


3

4. If you were to add 3 to each data point, how many positive numbers would there be?



In [80]:
print(a)
print(a + 3)
print((a + 3)[(a + 3) > 0])
len(a[(a + 3) > 0])

[ 4 10 12 23 -2 -1  0  0  0 -6  3 -7]
[ 7 13 15 26  1  2  3  3  3 -3  6 -4]
[ 7 13 15 26  1  2  3  3  3  6]


10

5. If you squared each number, what would the new mean and standard deviation be?



In [90]:
print(f'a = {a}')
print(f'mean of a = {a.mean()}')
print(f'std of a = {a.std()}')
print(f'square of a = {a**2}')
print(f'mean of a-squared = {(a**2).mean()}')
print(f'std of a-squared = {(a**2).std()}')

a = [ 4 10 12 23 -2 -1  0  0  0 -6  3 -7]
mean of a = 3.0
std of a = 8.06225774829855
square of a = [ 16 100 144 529   4   1   0   0   0  36   9  49]
mean of a-squared = 74.0
std of a-squared = 144.0243035046516


6. A common statistical operation on a dataset is centering. This means to adjust the data such that the mean of the data is 0. This is done by subtracting the mean from each data point. Center the data set.

In [92]:
print(f'a = {a}')
print(f'mean of a = {a.mean()}')
print(f'centered a = {a - a.mean()}')

a = [ 4 10 12 23 -2 -1  0  0  0 -6  3 -7]
mean of a = 3.0
centered a = [  1.   7.   9.  20.  -5.  -4.  -3.  -3.  -3.  -9.   0. -10.]


7. Calculate the z-score for each data point. Recall that the z-score is given by:
   $$Z = \frac{x - \mu}{\sigma}$$




In [104]:
print('z-scores for each element in a:\n')
print(*((a - a.mean()) / a.std()), sep='\n')

z-scores for each element in a:

0.12403473458920847
0.8682431421244593
1.116312611302876
2.4806946917841692
-0.6201736729460423
-0.49613893835683387
-0.3721042037676254
-0.3721042037676254
-0.3721042037676254
-1.116312611302876
0.0
-1.2403473458920846
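- A quick way to check z-scores is to confirm that they have a mean of 0 and a standard deviation of 1. The sketch below does that, and also notes that `.std()` divides by n by default (the population standard deviation); passing `ddof=1` gives the sample version instead.

```python
import numpy as np

a = np.array([4, 10, 12, 23, -2, -1, 0, 0, 0, -6, 3, -7])

z = (a - a.mean()) / a.std()

# Standardized data should be centered on 0 with a spread of 1
# (isclose allows for tiny floating-point error)
print(np.isclose(z.mean(), 0.0))  # True
print(np.isclose(z.std(), 1.0))   # True

# ddof=1 divides by n - 1, so the result is slightly larger
print(a.std(), a.std(ddof=1))
```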


8. Copy the setup and exercise directions from More Numpy Practice into your numpy_exercises.py and add your solutions.



In [None]:

# Life w/o numpy to life with numpy

In [None]:
## Setup 1
a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

In [None]:
# Use python's built in functionality/operators to determine the following:
# Exercise 1 - Make a variable called sum_of_a to hold the sum of all the numbers in above list

In [None]:
# Exercise 2 - Make a variable named min_of_a to hold the minimum of all the numbers in the above list

In [None]:
# Exercise 3 - Make a variable named max_of_a to hold the max number of all the numbers in the above list

In [None]:
# Exercise 4 - Make a variable named mean_of_a to hold the average of all the numbers in the above list

In [None]:
# Exercise 5 - Make a variable named product_of_a to hold the product of multiplying all the numbers in the above list together

In [None]:
# Exercise 6 - Make a variable named squares_of_a. It should hold each number in a squared like [1, 4, 9, 16, 25...]

In [None]:
# Exercise 7 - Make a variable named odds_in_a. It should hold only the odd numbers

In [None]:
# Exercise 8 - Make a variable named evens_in_a. It should hold only the evens.

In [None]:
## What about life in two dimensions? A list of lists is a matrix, a table, a spreadsheet, a chessboard...
## Setup 2: Consider what it would take to find the sum, min, max, average, product, and list of squares for this list of two lists.
b = [
    [3, 4, 5],
    [6, 7, 8]
]

In [None]:
# Exercise 1 - refactor the following to use numpy. Use sum_of_b as the variable. **Hint, you'll first need to make sure that the "b" variable is a numpy array**
sum_of_b = 0
for row in b:
    sum_of_b += sum(row)

In [None]:
# Exercise 2 - refactor the following to use numpy. 
min_of_b = min(b[0]) if min(b[0]) <= min(b[1]) else min(b[1])  

In [None]:
# Exercise 3 - refactor the following maximum calculation to find the answer with numpy.
max_of_b = max(b[0]) if max(b[0]) >= max(b[1]) else max(b[1])

In [None]:
# Exercise 4 - refactor the following using numpy to find the mean of b
mean_of_b = (sum(b[0]) + sum(b[1])) / (len(b[0]) + len(b[1]))

In [None]:
# Exercise 5 - refactor the following to use numpy for calculating the product of all numbers multiplied together.
product_of_b = 1
for row in b:
    for number in row:
        product_of_b *= number

In [None]:
# Exercise 6 - refactor the following to use numpy to find the list of squares 
squares_of_b = []
for row in b:
    for number in row:
        squares_of_b.append(number**2)

In [None]:
# Exercise 7 - refactor using numpy to determine the odds_in_b
odds_in_b = []
for row in b:
    for number in row:
        if(number % 2 != 0):
            odds_in_b.append(number)

In [None]:
# Exercise 8 - refactor the following to use numpy to filter only the even numbers
evens_in_b = []
for row in b:
    for number in row:
        if(number % 2 == 0):
            evens_in_b.append(number)

In [None]:
# Exercise 9 - print out the shape of the array b.

In [None]:
# Exercise 10 - transpose the array b.

In [None]:
# Exercise 11 - reshape the array b to be a single list of 6 numbers. (1 x 6)

In [None]:
# Exercise 12 - reshape the array b to be a list of 6 lists, each containing only 1 number (6 x 1)

In [None]:
## Setup 3
c = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
]

# HINT, you'll first need to make sure that the "c" variable is a numpy array prior to using numpy array methods.

In [None]:
# Exercise 1 - Find the min, max, sum, and product of c.

In [None]:
# Exercise 2 - Determine the standard deviation of c.

In [None]:
# Exercise 3 - Determine the variance of c.

In [None]:
# Exercise 4 - Print out the shape of the array c

In [None]:
# Exercise 5 - Transpose c and print out transposed result.

In [None]:
# Exercise 6 - Get the dot product of the array c with c. 

In [None]:
# Exercise 7 - Write the code necessary to sum up the result of c times c transposed (element-wise multiplication). Answer should be 261

In [None]:
# Exercise 8 - Write the code necessary to determine the product of c times c transposed (element-wise multiplication). Answer should be 131681894400.

In [None]:
## Setup 4
d = [
    [90, 30, 45, 0, 120, 180],
    [45, -90, -30, 270, 90, 0],
    [60, 45, -45, 90, -45, 180]
]

In [None]:
# Exercise 1 - Find the sine of all the numbers in d

In [None]:
# Exercise 2 - Find the cosine of all the numbers in d

In [None]:
# Exercise 3 - Find the tangent of all the numbers in d

In [None]:
# Exercise 4 - Find all the negative numbers in d

In [None]:
# Exercise 5 - Find all the positive numbers in d

In [None]:
# Exercise 6 - Return an array of only the unique numbers in d.

In [None]:
# Exercise 7 - Determine how many unique numbers there are in d.

In [None]:
# Exercise 8 - Print out the shape of d.

In [None]:
# Exercise 9 - Transpose and then print out the shape of d.

In [None]:
# Exercise 10 - Reshape d into an array of 9 x 2