# Numpy

- https://numpy.org/
- https://numpy.org/devdocs/user/quickstart.html

## Intro

### Overview of Data Science Libraries

Read: https://ds.codeup.com/python/ds-libraries-overview/

Two libraries that are not covered here, but that you will use throughout the rest of the course, are Scikit-Learn for machine learning and scipy.stats for statistical testing. 

### About Numpy

- Numpy is one of the main reasons why Python is so powerful and popular for scientific computing

- It is fast. Numpy arrays are implemented in C, so operations on whole arrays run in compiled code instead of in the Python interpreter.

- It is the most popular linear algebra library for Python. 

- It provides loop-like behavior without the overhead of Python loops or list comprehensions (vectorized operations)

- It provides list + loop + conditional behavior for filtering arrays. 

### In this Lesson you will learn about...

- Numpy Arrays
- Vectorization and Vectorized Operations
- Array Indexing or Slicing
- Boolean Masking
- Data types of array values

### By the end of this lesson, you should be able to...

- Create an n-dimensional array
- Access elements of an array using slicing notation
- Use built-in functions for common statistical and mathematical operations
- Perform vectorized arithmetic operations on arrays
- Filter Arrays using boolean masks


### Vocabulary

- Vectorized Operations: The concept of relational or arithmetic operators extended to vectors of any arbitrary length, where the comparison or calculation is performed on each item in the array. 

- Boolean Masking: Filtering the values of a numpy array by placing either a condition or an array of `True`/`False` values inside the indexing brackets, []. Only the items where the mask is `True` are kept.

- Array: A collection of items of the same data type stored at contiguous memory locations, indexed starting at 0. The array object in NumPy is called ndarray, for n-dimensional array.

- Scalar: A single value, also called a 0-D array. Each value in an array is a scalar. 

- 1-Dimensional Array: An array that has 0-D arrays/scalars as its elements. 

- 2-D Array: An array that has 1-D arrays as its elements. A matrix.

- 3-D Array: An array that has 2-D arrays (matrices) as its elements. 

- Slicing: An operation that extracts a subset of elements from an array and packages them as another array. 
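
The dimension vocabulary above can be checked directly with the `ndim` and `shape` attributes. This is a minimal sketch; the variable names (`scalar`, `vector`, `matrix`, `cube`) and the values are made up for illustration.

```python
import numpy as np

# Illustrative values only; the names are hypothetical.
scalar = np.array(42)                        # 0-D array: a single value
vector = np.array([1, 2, 3])                 # 1-D array: scalars as elements
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])               # 2-D array: 1-D arrays as elements
cube = np.array([[[1, 2], [3, 4]],
                 [[5, 6], [7, 8]]])          # 3-D array: 2-D arrays as elements

for name, arr in [("scalar", scalar), ("vector", vector),
                  ("matrix", matrix), ("cube", cube)]:
    # ndim is the number of dimensions, shape is the length along each one
    print(name, arr.ndim, arr.shape)

# Slicing extracts a subset and packages it as another array
first_two = vector[:2]
print(type(first_two), first_two)
```
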


### Agenda

1. Import Numpy
2. Numpy Speed
3. Create arrays
4. Access items in arrays using indexing, slicing
5. Built-in functions: a.sum(), a.mean(), a.min(), a.max(), a.std(), np.sqrt(a)
6. Vectorized Operations
7. Filtering Arrays using boolean masks
8. Array data types 

## Lesson

### Import Numpy

It is a common practice to import numpy with the alias `np`

In [54]:
import numpy as np

### Numpy Speed 

As we said, numpy is fast. This is because Numpy arrays are implemented in C, and C is "closer to the metal" than Python. Assembly is closer to the metal than C, and processor instruction sets are the metal!

In [57]:
%%timeit
# using base python
[x ** 2 for x in range(1, 1_000_000)]

216 ms ± 532 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [58]:
%%timeit 
# the %%timeit magic times the entire cell

# using numpy
np.arange(1, 1_000_000) ** 2

1.16 ms ± 9.95 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
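
The `%%timeit` magic only works inside IPython or Jupyter. Outside a notebook, a similar comparison can be sketched with the standard library's `timeit` module. The helper names `squares_list` and `squares_numpy` and the smaller size `n` are assumptions chosen for illustration, and actual timings will vary by machine.

```python
import timeit

import numpy as np

n = 100_000  # smaller than the notebook example so the script runs quickly


def squares_list():
    # base Python: one interpreter-level multiplication per element
    return [x ** 2 for x in range(1, n)]


def squares_numpy():
    # numpy: a single vectorized operation over the whole array
    # dtype is set explicitly so large squares do not overflow a 32-bit default
    return np.arange(1, n, dtype=np.int64) ** 2


list_seconds = timeit.timeit(squares_list, number=10)
numpy_seconds = timeit.timeit(squares_numpy, number=10)

print(f"list comprehension: {list_seconds:.4f} s for 10 runs")
print(f"numpy:              {numpy_seconds:.4f} s for 10 runs")
```
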


### Create Arrays

#### Create an array from a list

Create a one-dimensional array from a single list. 

In [4]:
my_list = [1, 2, 3, 4]
a = np.array(my_list)
a

array([1, 2, 3, 4])

In [62]:
# create a list

my_list = [1,2,3,4]
# convert to an array using np.array

a = np.array(my_list)  # assign the new array to the variable a
a

array([1, 2, 3, 4])

In [5]:
type(a)

numpy.ndarray

We can create a 2-D array by passing a list of lists to the array function. 

In [6]:
matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])
matrix

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [66]:
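# create a list of lists: this becomes a 2-D array

# avoid naming the variable `list`, which would shadow Python's built-in list type
list_of_lists = [[1,0,1],
                 [3,6,2],
                 [6,19,44]]

# convert to array
my_matrix = np.array(list_of_lists)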
my_matrix

array([[ 1,  0,  1],
       [ 3,  6,  2],
       [ 6, 19, 44]])

#### Create an array of random numbers drawn from the standard normal distribution

`np.random.randn(length_of_array)`

In [7]:
np.random.randn(10)

array([ 0.46927531, -0.96331603, -0.47724536, -0.94130309, -0.99824283,
       -0.28762746,  0.30513432,  1.81567593,  0.75341926,  0.24763535])

In [68]:
# create array from random numbers instead of from a list :

np.random.randn(13) # 13 values drawn from the standard normal distribution (mean 0, std dev 1); 1-D because only one number is passed

array([-1.43417367, -1.23761524, -0.3412221 , -1.02215636,  0.55864172,
        1.3634234 ,  0.62091367, -0.13730021,  1.6052836 ,  0.21539666,
        0.99154151,  0.02243931,  0.86461276])

In [69]:
# pass a second argument to define the shape of the array :
# 13 == rows, 4 == columns

np.random.randn(13, 4)

array([[ 0.01161059, -2.43770031,  2.52167423,  0.40068233],
       [ 0.10315252,  1.21507445, -0.27121223,  0.73603603],
       [ 0.703108  ,  0.35721749, -0.58301587,  1.08917784],
       [-1.70412012,  0.39296725,  2.42652762, -1.37412376],
       [ 0.28472297,  0.24565699,  0.29560816, -0.50492594],
       [ 0.86561439,  0.19747631, -0.93434353,  0.98005373],
       [-0.74266578,  1.09284957, -0.4938523 ,  0.41475973],
       [ 1.19325248,  0.5169049 , -0.37128241,  0.24991132],
       [ 0.49664015,  0.1048089 , -0.09750643, -0.50528278],
       [ 0.04328388,  1.73745761, -0.46528303,  1.87051737],
       [ 1.04690267, -0.30184377,  0.41939984, -2.26116677],
       [ 0.86248582, -1.11445617,  1.6271588 , -0.42696506],
       [-0.62648859,  1.28702551, -1.38034096,  0.76680797]])

We can pass a second argument to this function to define the shape of a two-dimensional array. The first number can be thought of as the number of rows in the matrix, while the second number is the number of columns.

In [8]:
np.random.randn(10, 2)

array([[-0.58435402, -0.13131378],
       [ 0.03611437,  1.23933298],
       [-0.95121296,  1.19280027],
       [ 0.13546603,  0.24763969],
       [-0.06735086, -1.16384514],
       [-0.11874943,  1.26464739],
       [ 2.1941625 ,  0.45275272],
       [ 0.40802577,  0.21221754],
       [ 2.34817488,  1.79171257],
       [ 0.1425013 ,  0.23294827]])

#### Create an array of a single value

The `zeros` and `ones` functions create arrays of a specified size full of either 0s or 1s, and the `full` function creates an array of the specified size filled with a value of our choosing.

In [9]:
np.zeros(3)

array([0., 0., 0.])

In [71]:
# np.zeros returns an array (not a list) filled with 0.0
# useful as a placeholder array whose values are filled in later,
# for example to hold the results of calculations
    
np.zeros(5)

array([0., 0., 0., 0., 0.])

We can also create multi-dimensional arrays by passing a tuple of the dimensions of the desired array, instead of a single integer value.

In [10]:
np.zeros((3, 2))

array([[0., 0.],
       [0., 0.],
       [0., 0.]])

In [None]:
# a tuple is an immutable sequence, e.g. (10, 5). For 2 or more dimensions the shape is passed as a tuple, hence the doubled parentheses: np.zeros((10, 5))

In [72]:
np.zeros((10,5))

array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])

In [11]:
np.ones(3)

array([1., 1., 1.])

In [73]:
np.ones((5,4))

array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])

How can I make a 5 x 4 matrix of 1's?

In [12]:
np.ones((5, 4))

array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])

Create an array with 3 items, all of the value 21. 

In [74]:
np.full(3, 21)

array([21, 21, 21])

In [None]:
# np.full takes the shape first and the fill value second
# a 1-D shape can be a plain integer; a multi-dimensional shape is passed as a tuple, e.g. (3, 2)

In [76]:
np.full((3, 2), -10.3)

array([[-10.3, -10.3],
       [-10.3, -10.3],
       [-10.3, -10.3]])

Create a 3x2 matrix of -10. 

In [14]:
# np.full builds the 3x2 array directly; the np.matrix class is no longer recommended by NumPy
create = np.full((3, 2), -10)
create

array([[-10, -10],
       [-10, -10],
       [-10, -10]])

#### Create an array of numbers in a designated range

Numpy's `arange` function is very similar to python's builtin range function. It can take a single argument and generate a range from zero up to, but not including, the passed number.

In [15]:
np.arange(3)

array([0, 1, 2])

In [77]:
np.arange(10) # array of 0 to 9, excluding the 10 : 10 values starting at 0.

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [78]:
np.arange(2, 10) # a range: starts at 2 and stops before 10 (i.e., at 9)

array([2, 3, 4, 5, 6, 7, 8, 9])

Specify a starting point:

In [16]:
np.arange(1,4)

array([1, 2, 3])

Specify a step:

In [80]:
# count by 4
np.arange(2, 10, 4) # starts at 2, stops before 10, counting by 4

array([2, 6])

In [17]:
np.arange(1,10,2)

array([1, 3, 5, 7, 9])

The `linspace` function creates a range of evenly spaced numbers between a minimum and a maximum (both included), with a set number of elements.

In [18]:
# min, max, length
np.linspace(2, 5, 4)

array([2., 3., 4., 5.])

In [94]:
np.linspace(1, 10, 23)
# min of 1, max of 10, with 23 elements 
# includes max and min

# the numbers are not random, but are evenly spaced

array([ 1.        ,  1.40909091,  1.81818182,  2.22727273,  2.63636364,
        3.04545455,  3.45454545,  3.86363636,  4.27272727,  4.68181818,
        5.09090909,  5.5       ,  5.90909091,  6.31818182,  6.72727273,
        7.13636364,  7.54545455,  7.95454545,  8.36363636,  8.77272727,
        9.18181818,  9.59090909, 10.        ])

#### Create an array of random integers

`np.random.randint(start, stop)` returns a random integer between start (inclusive) and stop (exclusive). Passing a third argument, a size or shape, returns an array of random integers instead. 

In [19]:
# So the following line is like rolling a 6-sided die
x = np.random.randint(1, 7)
x

3

In [124]:
b = np.random.randint(1,9, 10)
b
# put the cursor inside the first parenthesis and press Shift+Tab to see the docstring

# random integers from 1 up to, but not including, 9. The 10 is how many values to draw.

array([8, 7, 3, 4, 7, 3, 3, 7, 2, 1])

In [127]:
b = np.random.randint(1,9, (5,2)) # the tuple (5, 2) is the shape of the array: 5 rows, 2 columns
b

array([[2, 1],
       [6, 7],
       [2, 3],
       [5, 6],
       [4, 2]])

### Access items in arrays using indexing, slicing

At its most basic, referencing elements in a numpy array works the same way as referencing elements in a Python list. To obtain the 2nd item in the array, we would write `a[1]`

In [128]:
a = np.array([2,3,5,8,13])
a

array([ 2,  3,  5,  8, 13])

In [136]:
ab = np.array([0,1,1,2,3,5,8,13])

ab[3] # calls the 4th in the sequence

2

To obtain the 2nd, 3rd, and 4th items, we would write `a[1:4]`. The starting index is inclusive and the ending index is exclusive. 

In [138]:
a[1:4]


array([3, 5, 8])

To obtain the 3rd item (index 2) through the end of the array, we would write `a[2:]`. 

In [22]:
a[2:]

array([ 5,  8, 13])

In [139]:
# to access the final item in the array
ab[-1]

13

In [140]:
# access the final 2 items
ab[-2:]

array([ 8, 13])

For a 2-D numpy array, or matrix, called `m`, to obtain the element in second row (index = 1) and third column (index = 2), we would write `m[1,2]`.

In [23]:
m = np.array([[2,3,5],
              [8,13,21],
              [34,55,89]]
            )

In [143]:
# second row (index = 1) and third column (index = 2), we would write m[1,2].
m[1,2]

21

To access the 2nd and 3rd rows of the 1st column of the matrix, we write `m[1:, 0:1]`:

In [144]:
m[1:, 0:1]
# 1: selects rows 1 and 2; 0:1 selects column 0 (kept as a column)

array([[ 8],
       [34]])
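
As a small sketch of the 2-D indexing rules above, the example below rebuilds the same matrix `m` and compares a few selections. The variable names on the left-hand side are made up for illustration. The point to notice is that an integer index drops a dimension, while a slice such as `0:1` keeps it.

```python
import numpy as np

m = np.array([[2, 3, 5],
              [8, 13, 21],
              [34, 55, 89]])

element = m[1, 2]          # single element: second row, third column
second_row = m[1]          # whole second row, as a 1-D array
first_col_1d = m[:, 0]     # first column as a 1-D array, shape (3,)
first_col_2d = m[:, 0:1]   # first column kept as a column, shape (3, 1)
lower_left = m[1:, 0:1]    # rows 2 and 3 of the first column, shape (2, 1)

print(element)
print(second_row)
print(first_col_1d.shape, first_col_2d.shape, lower_left.shape)
```
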

### Built-in methods and functions

Methods are called *on* the numpy object, so they begin with the numpy array variable name, which is `a` in this case. 
These are methods: a.sum(), a.mean(), a.min(), a.max(), a.std()

Functions begin with `np.` and the array is one of the function arguments, such as `np.sqrt(a)`. 

Some operations have both a method and a function, such as summing all items in an array. `np.sum(a)` and `a.sum()` do the same thing. 

In [25]:
a.sum(), a.mean(), a.min(), a.max(), a.std()

(31, 6.2, 2, 13, 3.9698866482558417)

In [150]:
# note: the last two values are computed from a, not ab
ab.sum(), ab.mean(), ab.min(), a.max(), round(a.std(),4)

(33, 4.125, 0, 13, 3.9699)

In [151]:
np.sum(ab)

33

In [152]:
np.median(ab)

2.5

In [28]:
np.sqrt(a)

array([1.41421356, 1.73205081, 2.23606798, 2.82842712, 3.60555128])

`.all` -- every single element is `True`

In [29]:
# Are all the elements in a less than 10?
(a < 10).all()

False

`.any` -- at least one element is `True`

In [30]:
# Are there any negative numbers?
(a < 0).any()

False

### Vectorized Operations

If we wanted to add 1 to every element in a list, without numpy, we can't simply add 1 to the list, as that will result in a TypeError: 

`[1, 2, 3, 4, 5] + 1`

In [31]:
[1, 2, 3, 4, 5] + 1

TypeError: can only concatenate list (not "int") to list

We would have to use a loop or a list comprehension:

In [32]:
my_list = [1, 2, 3, 4, 5]
new_list = [n + 1 for n in my_list]
new_list

[2, 3, 4, 5, 6]

Vectorizing operations means that operations are automatically applied to every element in a vector, which in our case will be a numpy array. So if we are working with a numpy array, we can simply add 1:

In [33]:
a = np.array(my_list)
a + 1

array([2, 3, 4, 5, 6])

Multiply `a` by 3.2 and reassign the result to `a`: 

In [35]:
# reassign a to hold the result of a * 3.2
a = a * 3.2
a

array([ 3.2,  6.4,  9.6, 12.8, 16. ])

Or, using augmented assignment to multiply in place:

In [36]:
a *= 2
a

array([ 6.4, 12.8, 19.2, 25.6, 32. ])

In [37]:
a ** 2

array([  40.96,  163.84,  368.64,  655.36, 1024.  ])

Write an operation that divides each element by 2 and then adds 3. 

In [38]:
a / 2 + 3

array([ 6.2,  9.4, 12.6, 15.8, 19. ])

What happens if we subtract `a` from itself? 

In [39]:
a - a

array([0., 0., 0., 0., 0.])

Find even numbers

In [43]:
a = np.array([2, 3, 5, 8, 13, 21])
a % 2


array([0, 1, 1, 0, 1, 1])

The items with a value of 0 are even because they have no remainder. Those with a one are odd, because they have a remainder of 1. We can use a boolean mask to filter to just the even numbers. 

### Filtering Arrays using boolean masks

In [44]:
is_even = a % 2 == 0
is_even

array([ True, False, False,  True, False, False])

In [45]:
a[is_even]

array([2, 8])

**or**

In [46]:
a[a % 2 == 0]

array([2, 8])

Expressions that return true or false can be used as our filter. 

In [47]:
a = np.array([-3, 0, 3, 6, 9])
a > 0

array([False, False,  True,  True,  True])

It might help to read this as SQL in your head: "select a where a > 0"

In [None]:
a[a > 0]

In [None]:
# select a where a == 3
a[a == 3]

In [None]:
# select a where a != 3
a[a != 3]

In the Python admissions test, there was a question called "remove_evens" where you write a function that removes evens. Using numpy, your function could look like: 

In [None]:
def remove_evens(x):
    x = np.array(x)
    return x[x % 2 == 1]

odds = remove_evens([2, 3, 5, 8, 13])
odds

Combine boolean arrays with `&` for "and", `|` for "or"

For example, create an array of all positive, even numbers from the original array below. 

In [None]:
a = np.array([-3, 0, 1, 1, 2, 3, 5, 8, 13, 21])
new_a = a[(a > 0) & (a % 2 == 0)]
new_a

Create an array of all positive OR even numbers from the original array below. 

In [None]:
new_a = a[(a > 0) | (a % 2 == 0)]
new_a
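
The two combined filters above can be sketched as a self-contained example using the same array. The names `positive_and_even` and `positive_or_even` are made up for illustration. Each condition is wrapped in its own parentheses because Python's `&` and `|` operators bind more tightly than comparison operators such as `>` and `==`.

```python
import numpy as np

a = np.array([-3, 0, 1, 1, 2, 3, 5, 8, 13, 21])

is_positive = a > 0
is_even = a % 2 == 0

# "and": both masks must be True for an item to be kept
positive_and_even = a[(a > 0) & (a % 2 == 0)]

# "or": an item is kept if either mask is True
positive_or_even = a[(a > 0) | (a % 2 == 0)]

print(positive_and_even)
print(positive_or_even)

# Named masks combine the same way and can be easier to read
print(a[is_positive & is_even])
```
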

Negate a mask

In [49]:
a = np.array([-3, 0, 1, 1, 2, 3, 5, 8, 13, 21])
odds = a % 2 == 1
odds

array([ True, False,  True,  True, False,  True,  True, False,  True,
        True])

In [50]:
evens = ~ odds
evens

array([False,  True, False, False,  True, False, False,  True, False,
       False])

In [None]:
# these will all return the same thing, the even numbers 

a[~(a % 2 == 1)]

# a[~odds]

# a[evens]

# a[a % 2 == 0]


### Array data types 

The data type of an array is the LCD, or lowest common datatype: the most specific type that can represent every value in the array. 

- Most datatypes can be converted to strings or objects, so if any value is a string, the array's datatype will be a string type. 

- All numbers can be converted to decimals (floats), so a mix of integers and floats becomes a float array. 

- Only integers can be stored as integers, so the datatype is an integer type only when all values are integers. 

In [51]:
a = np.array([1, 2, '3', 4])
a, a.dtype

# <U21 = Unicode string of up to 21 characters

(array(['1', '2', '3', '4'], dtype='<U21'), dtype('<U21'))

In [52]:
a = np.array([1, 2.01, 3, 4])
a, a.dtype

(array([1.  , 2.01, 3.  , 4.  ]), dtype('float64'))

In [53]:
a = np.array([1, 2, 3, 4])
a, a.dtype

(array([1, 2, 3, 4]), dtype('int64'))
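
The data type does not have to be left to inference. As a hedged sketch, the example below sets a type with the `dtype` argument and converts an existing array with the `astype` method. The variable names and values are made up for illustration, and the exact integer type (for example `int64`) can differ by platform.

```python
import numpy as np

ints = np.array([1, 2, 3, 4])
print(ints.dtype)                     # an integer type, e.g. int64

# Request floats at creation time instead of relying on inference
as_float = np.array([1, 2, 3], dtype=float)
print(as_float, as_float.dtype)

# Convert an existing array; astype returns a new array
floats = ints.astype(float)
print(floats, floats.dtype)

# Numeric strings can be converted back to numbers
strings = np.array(['1', '2', '3'])
numbers = strings.astype(int)
print(numbers, numbers.dtype)

# Mixing integers and floats produces a float array
mixed = np.array([1, 2.5])
print(mixed.dtype)
```
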

## More Examples

We can use numpy to answer some questions:

In [None]:
# 1. How many data points are there?
a.shape

In [None]:
# 2. How many data points are greater than 70? (.shape + .sum)
a[a > 70].shape

In [None]:
# 2. How many data points are greater than 70?
(a > 70).sum()

In [None]:
# 3. What is the sum of the odd numbers?
a[a % 2 == 1].sum()

In [None]:
a[a < 10].shape

In [None]:
# 4. Take all the numbers between 30 and 80 (inclusive), square them, what is the highest resulting number?
(a[(a >= 30) & (a <= 80)] ** 2).max()

In [None]:
# 4. Take all the numbers between 30 and 80 (inclusive), square them, what is the highest resulting number?
more_than_30 = a >= 30
less_than_80 = a <= 80

in_our_desired_range = more_than_30 & less_than_80

desired_numbers = a[in_our_desired_range]
desired_numbers_squared = desired_numbers ** 2

desired_numbers_squared.max()

`np.where` will produce values conditionally, based on a boolean array

In [None]:
# 5. Square the odd numbers in the array. What is the average of the resulting data set? (np.where)
odd_numbers_squared = np.where(a % 2 == 1, a ** 2, a)

odd_numbers_squared.mean()
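
Here is a minimal, self-contained sketch of the same `np.where` pattern on a small made-up array (the values in `sample` are hypothetical, not the lesson's data). Where the condition is `True` the result takes the value from the second argument, otherwise from the third.

```python
import numpy as np

sample = np.array([1, 2, 3, 4, 5])  # hypothetical data for illustration

# Square the odd numbers, leave the even numbers unchanged
odd_squared = np.where(sample % 2 == 1, sample ** 2, sample)

print(odd_squared)         # odd values squared, even values as they were
print(odd_squared.mean())  # average of the resulting data set
```

Unlike a boolean mask, `np.where` used this way returns an array with the same shape as the input, so no items are dropped.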

In [None]:
# 6. Square the even numbers in the array. Remove any odd number less than 40.
#    Double odd numbers greater than 80. What is the sum of the resulting dataset?
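evens_squared = np.where(a % 2 == 0, a ** 2, a)
# keep everything except the odd numbers below 40
is_small_odd = (evens_squared % 2 == 1) & (evens_squared < 40)
x = evens_squared[~is_small_odd]
# double only the odd numbers greater than 80
x = np.where((x % 2 == 1) & (x > 80), x * 2, x)
x.sum()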