# Lesson 04 - NumPy

### The following topics are discussed in this notebook:
* Create NumPy arrays.
* Array operations.
* Boolean masking.
* Random number generation.

### Additional Resources
* [Python Data Science Handbook, Ch 2](https://jakevdp.github.io/PythonDataScienceHandbook/02.00-introduction-to-numpy.html)
* [DataCamp: Intro to Python for Data Science, Ch 4](https://www.datacamp.com/courses/intro-to-python-for-data-science)





### Packages 
A **package** is a pre-built set of functions and data types that can be loaded into a Python session to extend the language's functionality. 

The following block of code imports the `math` package, which contains many useful mathematical functions and constants. 

In [None]:
import math

The `math` package contains the following functions (along with many others):

* **`sqrt()`** which is used to calculate the square root of a number. 
* **`factorial()`** which is used to calculate the factorial of an integer.

It also contains an object named `pi`, which stores the value of the constant π.

To access any of these items within the `math` package, we must precede its name with `math.`.

In [None]:
print(math.sqrt(20))
print(math.factorial(10))
print(math.pi)
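
As a small illustration of the dotted access described above, the following sketch computes the area of a circle and the number of ways to order five items. The names `radius`, `area`, and `orderings`, and the input values, are chosen here for illustration only.

```python
import math

# Hypothetical example value, not taken from the lesson.
radius = 2.0

# Area of a circle: pi * r^2
area = math.pi * radius ** 2

# Number of ways to order 5 distinct items: 5!
orderings = math.factorial(5)

print(area)
print(orderings)
```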

When the name of a package is long, it can become tedious to type the entire name every time you wish to use a function from it. Fortunately, we can rename packages when we import them. The following code imports the `math` package under the name `mt`.

In [None]:
import math as mt

In [None]:
print(mt.sqrt(40))
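
An alias is only a second name for the same package. The short check below is a sketch that imports `math` under both names and confirms they refer to the same object.

```python
import math
import math as mt

# Both names refer to the same module object.
print(mt is math)

# So both names give the same result for the same call.
print(mt.sqrt(40) == math.sqrt(40))
```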

### NumPy

**NumPy**, which is short for "Numerical Python", is a package that provides additional functionality for numerical calculations on collections of values. It can greatly simplify tasks that would otherwise require looping over a list. In the next cell, we will import NumPy under the name `np`.

In [None]:
import numpy as np

At the core of NumPy is a new data type called an **array**. Arrays are similar to lists, and in many ways, arrays and lists behave the same. In the following cell, we create a list and an array, each containing the same elements. 

In [None]:
my_list = [4, 1, 7, 3, 5]
my_array = np.array([4, 1, 7, 3, 5])

In the next few cells, we show that lists and arrays can behave in very similar ways. 

In [None]:
print(my_list[3])
print(my_array[3])

In [None]:
print(my_list[:3])
print(my_array[:3])

In [None]:
print(len(my_list))
print(len(my_array))

In [None]:
print(type(my_list))
print(type(my_array))

### Array Operations

The key difference between arrays and lists is that many operations can be performed more easily on arrays. Suppose we want to print a list or array whose elements are 5 times the elements of the one defined earlier. Multiplying the array by 5 does this directly. Multiplying the list by 5 instead repeats the list five times, so scaling the elements of a list requires a loop.

In [None]:
print(5 * my_array)


In [None]:
print(5 * my_list)

In [None]:
temp = []
for i in range(0, len(my_list)):
    temp.append(5 * my_list[i])
print(temp)
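
The same comparison can be sketched in a single cell, with a list comprehension as a more compact alternative to the loop. The names `repeated`, `scaled_list`, and `scaled_array` are illustrative and do not appear elsewhere in the lesson.

```python
import numpy as np

my_list = [4, 1, 7, 3, 5]
my_array = np.array(my_list)

# Multiplying a list by an integer repeats the list; it does not scale the elements.
repeated = 5 * my_list

# A list comprehension scales each element of the list.
scaled_list = [5 * x for x in my_list]

# Multiplying an array by a number scales every element.
scaled_array = 5 * my_array

print(len(repeated))
print(scaled_list)
print(scaled_array)
```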

We can perform other types of operations on NumPy arrays:

In [None]:
print(my_array ** 2)

In [None]:
print(my_array +  100)

NumPy also multiplies two arrays element by element, as long as they have the same length. If the lengths differ, as in the second cell below, NumPy raises a `ValueError`.

In [None]:
array1 = np.array([2,1,4])
array2 = np.array([3,9,2])

print(array1 * array2)

In [None]:
array1 = np.array([2,1,4])
array2 = np.array([3,9,2,7])

print(array1 * array2)
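
Since the cell above stops with an error, here is a minimal sketch of the same multiplication wrapped in `try`/`except`, so the message can be inspected without halting the notebook. The name `product` is introduced here for illustration.

```python
import numpy as np

array1 = np.array([2, 1, 4])
array2 = np.array([3, 9, 2, 7])

try:
    product = array1 * array2
except ValueError as err:
    # NumPy cannot pair up the elements of arrays with lengths 3 and 4.
    product = None
    print("Could not multiply:", err)
```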

In [None]:
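def find_sse_v1(y, y_hat):
    # Assumed intent: sum of squared errors, computed element by element.
    return sum((y[i] - y_hat[i]) ** 2 for i in range(len(y)))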

In [None]:
y_actual = [3.1, 4.5, 6.4, 7.2]
y_pred = [2.9, 4.4, 6.7, 7.1]

In [None]:
errors = []
sse = 0
for i in range(0, len(y_actual)):
    temp = y_actual[i] - y_pred[i]
    errors.append(temp)
    sse += temp**2
print(errors)    
print(sse)                

In [None]:
y_actual = np.array(y_actual)
y_pred = np.array(y_pred)

errors = y_actual - y_pred
sse = sum(errors**2)
print(errors)
print(sse)
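
The array-based calculation above could be packaged as a function. The sketch below is one way to do so; the name `find_sse_v2` is hypothetical, and `np.asarray` is used so that the function accepts either lists or arrays.

```python
import numpy as np

def find_sse_v2(y, y_hat):
    """Return the sum of squared errors between two equal-length sequences."""
    # Convert the inputs so that subtraction is performed element by element.
    errors = np.asarray(y) - np.asarray(y_hat)
    return np.sum(errors ** 2)

y_actual = [3.1, 4.5, 6.4, 7.2]
y_pred = [2.9, 4.4, 6.7, 7.1]

sse = find_sse_v2(y_actual, y_pred)
print(sse)
```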

### Boolean Masking

**Boolean masking** is a tool for creating subsets of NumPy arrays. We will explain this concept in steps.

In the cell below, we create two NumPy arrays. The array `bool_array` contains boolean values, while the other, `my_array`, contains numerical values. 

We will pass `bool_array` to `my_array` as if it were an index, and will store the result in `sub_array`. 

In [None]:
bool_array = np.array([True, True, False, True, False])
my_array = np.array([1,2,3,4,5])

sub_array = my_array[bool_array]
print(sub_array)
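
To make the selection rule concrete, the sketch below reproduces the same result with plain Python: each value is kept only if the boolean in the matching position is `True`. The name `kept` is illustrative.

```python
import numpy as np

bool_array = np.array([True, True, False, True, False])
my_array = np.array([1, 2, 3, 4, 5])

# Pair each value with its boolean and keep the values paired with True.
kept = [int(value) for value, keep in zip(my_array, bool_array) if keep]

print(kept)
print(my_array[bool_array])
```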

Unlike lists, we can perform numerical comparisons with arrays. The comparison is carried out for each element of the array, and the result is an array of boolean values, containing the results of each comparison. 

In [None]:
some_array = np.array([4, 7, 6, 3, 9, 8])
print(some_array < 5)

In [None]:
print(some_array % 2 == 0)
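
For contrast, the sketch below attempts the same comparison on a list. Python raises a `TypeError` when a list is compared with an integer, so a list needs a comprehension to produce the per-element results. The names `some_list`, `result`, `list_flags`, and `array_flags` are introduced here for illustration.

```python
import numpy as np

some_list = [4, 7, 6, 3, 9, 8]
some_array = np.array(some_list)

try:
    result = some_list < 5
except TypeError:
    # Python does not define "<" between a list and an int.
    result = None

# The comprehension performs the comparison one element at a time.
list_flags = [x < 5 for x in some_list]

# The array comparison does the same thing in a single expression.
array_flags = some_array < 5

print(result)
print(list_flags)
print(array_flags)
```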

We can combine array comparisons with boolean indexing to select the elements of an array that satisfy a given condition. This process is called **boolean masking**.

In [None]:
sel = some_array % 2 == 0
print(some_array[sel])

In [None]:
sel = some_array > 5
print(some_array[sel])

In [None]:
print(some_array[some_array > 5])

### Using Boolean Masks to Count

Since Python treats `True` as being equal to 1 and `False` as being equal to 0, we can use the `sum()` function along with Boolean masking to count the number of elements in an array that satisfy certain criteria. Conditions can be combined with `&`, provided each condition is wrapped in parentheses.

In [None]:
cat = np.array(['A', 'C', 'A', 'B', 'B', 'C', 'A', 'A' ,'C', 'B', 'C', 'C', 'A', 'B', 'A', 'A'])

In [None]:
print(sum(cat == 'A'))
print(sum(cat == 'B'))
print(sum(cat == 'C'))

In [None]:
val = np.array([8, 1, 3, 6, 10, 6, 12, 4, 6, 1, 4, 8, 5, 4, 12, 4])

In [None]:
print(sum(val > 5) )
print(sum(val < 5) )
print(sum(val % 2 == 0) )
print(sum(val % 2 != 0) )

In [None]:
sum( (val > 5) & (val % 2 == 0) )

In [None]:
sum( (val > 7) & (val % 2 == 0) & (val % 3 == 0))
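
The counts above use Python's built-in `sum()`. NumPy offers its own functions for the same job, and the sketch below shows that `np.sum` and `np.count_nonzero` agree with the built-in on a combined mask. The names `mask`, `count_builtin`, `count_numpy`, and `count_nonzero` are illustrative.

```python
import numpy as np

val = np.array([8, 1, 3, 6, 10, 6, 12, 4, 6, 1, 4, 8, 5, 4, 12, 4])

# Each comparison gives a boolean array; "&" combines them element by element.
mask = (val > 5) & (val % 2 == 0)

count_builtin = sum(mask)               # Python's built-in sum
count_numpy = np.sum(mask)              # NumPy's sum
count_nonzero = np.count_nonzero(mask)  # counts the True entries directly

print(count_builtin, count_numpy, count_nonzero)
```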

### Random Number Generation

We can use NumPy to draw random samples from a set. 

In [None]:
sample1 = np.random.choice(['A', 'B', 'C', 'D', 'E'], 10)
print(sample1)

In [None]:
sample2 = np.random.choice(['A', 'B', 'C', 'D', 'E'], 3, replace=False)
print(sample2)

In [None]:
sample3 = np.random.choice(['A', 'B', 'C', 'D', 'E'], 1)
print(sample3)
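
The effect of `replace=False` can be checked directly. In the sketch below, drawing all five letters without replacement yields each letter exactly once, and asking for six raises a `ValueError`. The names `letters`, `sample`, and `too_many` are introduced here for illustration.

```python
import numpy as np

letters = ['A', 'B', 'C', 'D', 'E']

# Without replacement, no letter can appear twice in the sample.
sample = np.random.choice(letters, 5, replace=False)
print(sample)
print(len(set(sample)))

try:
    too_many = np.random.choice(letters, 6, replace=False)
except ValueError:
    # Six distinct letters cannot be drawn from a set of five.
    too_many = None
```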

We can generate random numbers according to a distribution, such as the normal or uniform distribution. 

In [None]:
x1 = np.random.uniform(5, 15, 20)
print(x1)

In [None]:
x2 = np.random.normal(10, 5, 20)
print(x2)
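
The arguments in the two cells above are, in order, the lower and upper bounds for `uniform` and the mean and standard deviation for `normal`, followed by the sample size. The sketch below draws larger samples to make this visible; the seed value and the sample size of 10000 are arbitrary choices made here.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, used only to make this sketch repeatable

u = np.random.uniform(5, 15, 10000)  # low=5, high=15
n = np.random.normal(10, 5, 10000)   # mean=10, standard deviation=5

# The uniform draws stay between the bounds.
print(u.min(), u.max())

# The sample mean and standard deviation are close to 10 and 5.
print(n.mean(), n.std())
```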

We can set the seed using `np.random.seed()`. Setting the seed makes the sequence of random draws reproducible.

In [None]:
np.random.seed(32)
x3 = np.random.normal(10,5,20)
print(x3)
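
A short sketch of what the seed controls: resetting the seed to the same value replays the same draws, while drawing again without resetting continues the sequence. The names `first`, `second`, and `third` are illustrative.

```python
import numpy as np

np.random.seed(32)
first = np.random.normal(10, 5, 5)

# Resetting the seed restarts the sequence, so the draws repeat.
np.random.seed(32)
second = np.random.normal(10, 5, 5)

# Without resetting the seed, the generator continues its sequence.
third = np.random.normal(10, 5, 5)

print(np.array_equal(first, second))
print(np.array_equal(second, third))
```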