# Numpy

- https://numpy.org/
- https://numpy.org/devdocs/user/quickstart.html

## Intro

### Overview of Data Science Libraries

Read: https://ds.codeup.com/python/ds-libraries-overview/

Two libraries are not covered here but appear throughout the rest of the course: Scikit-Learn for machine learning and scipy.stats for statistical testing.

### About Numpy

- Numpy is one of the main reasons why Python is so powerful and popular for scientific computing

- It is fast. Numpy arrays are implemented in C, so array operations avoid much of the overhead of the Python interpreter.

- It is the most popular linear algebra library for Python. 

- It provides loop-like behavior without the overhead of loops or list comprehensions (vectorized operations)

- It provides list + loop + conditional behavior for filtering arrays. 

### In this Lesson you will learn about...

- Numpy Arrays
- Vectorization and Vectorized Operations
- Array Indexing or Slicing
- Boolean Masking
- Data types of array values

### By the end of this lesson, you should be able to...

- Create an n-dimensional array
- Access elements of an array using slicing notation
- Use built-in functions for common statistical and mathematical operations
- Perform vectorized arithmetic operations on arrays
- Filter Arrays using boolean masks


### Vocabulary

- Vectorized Operations: The concept of relational or arithmetic operators extended to vectors of any arbitrary length, where the comparison or calculation is performed on each item in the array. 

- Boolean Masking: Filtering of values in a numpy array by passing a condition in the indexing brackets, [], or by using a boolean array that only contains the boolean values of either True or False, and then passing that into the indexing brackets, [].

- Array: An array is a collection of items of same data type stored at contiguous memory locations, starting at index 0. The array object in NumPy is called ndarray, for n-dimensional array.

- Scalar: A single value, also known as a 0-D array. Each element of an array is a scalar.

- 1-Dimensional Array: An array that has 0-D arrays/scalars as its elements. 

- 2-D Array: An array that has 1-D arrays as its elements. A matrix.

- 3-D Array: An array that has 2-D arrays (matrices) as its elements. 

- Slicing: An operation that extracts a subset of elements from an array and packages them as another array. 
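
The dimension vocabulary above can be checked with the `ndim` and `shape` attributes. This is a minimal sketch; the variable names are made up for illustration.

```python
import numpy as np

scalar = np.array(7)                      # 0-D: a single value
vector = np.array([1, 2, 3])              # 1-D: its elements are scalars
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])            # 2-D: its elements are 1-D arrays
cube = np.array([[[1, 2], [3, 4]],
                 [[5, 6], [7, 8]]])       # 3-D: its elements are 2-D arrays

# ndim is the number of dimensions, shape is the size along each one
for arr in (scalar, vector, matrix, cube):
    print(arr.ndim, arr.shape)
```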


### Agenda

1. Import Numpy
2. Numpy Speed
3. Create arrays
4. Access items in arrays using indexing, slicing
5. Built-in functions: a.sum(), a.mean(), a.min(), a.max(), a.std(), np.sqrt(a)
6. Vectorized Operations
7. Filtering Arrays using boolean masks
8. Array data types 

## Lesson

### Import Numpy

It is a common practice to import numpy with the alias `np`

In [6]:
import numpy as np

### Numpy Speed 

As we said, numpy is fast. This is because Numpy arrays are implemented in C, and C is "closer to the metal" than Python. Assembly is closer to the metal than C, and processor instruction sets are the metal!

In [2]:
%%timeit
# using base python
[x ** 2 for x in range(1, 1_000_000)]

215 ms ± 430 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [3]:
%%timeit
## %%timeit times the whole cell

# using numpy
np.arange(1, 1_000_000) ** 2

1.15 ms ± 82 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
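
`%%timeit` only works inside Jupyter/IPython. Outside a notebook, a similar comparison can be sketched with the standard library's `timeit` module. The array size and repeat count below are arbitrary choices, and the timings will differ from machine to machine.

```python
import timeit
import numpy as np

n = 10_000  # arbitrary size, kept small so the script finishes quickly

# time 100 runs of each approach
list_time = timeit.timeit(lambda: [x ** 2 for x in range(1, n)], number=100)
numpy_time = timeit.timeit(lambda: np.arange(1, n) ** 2, number=100)

print(f"list comprehension: {list_time:.4f} s for 100 runs")
print(f"numpy:              {numpy_time:.4f} s for 100 runs")

# both approaches compute the same values
same = [x ** 2 for x in range(1, n)] == (np.arange(1, n) ** 2).tolist()
print(same)
```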


### Create Arrays

#### Create an array from a list

Create a one-dimensional array from a single list. 

In [54]:
my_list = [1, 2, 3, 4]
a = np.array(my_list)
a

array([1, 2, 3, 4])

In [55]:
# create a list

my_list = [1,2,3,4]
#convert to array using np.array

a = np.array(my_list)  # assigning a variable to the np.array of my_list
a

array([1, 2, 3, 4])

In [56]:
type(a)

numpy.ndarray

We can create a 2-D array by passing a list of lists to the array function. 

In [57]:
matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])
matrix

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [8]:
# create a list of lists to build a 2-D array
# (named list_of_lists so the built-in `list` is not overwritten)

list_of_lists = [[1,0,1],
                 [3,6,2],
                 [6,19,44]]

# convert to array
my_matrix = np.array(list_of_lists)
my_matrix


array([[ 1,  0,  1],
       [ 3,  6,  2],
       [ 6, 19, 44]])

#### Create an array of random numbers drawn from the standard normal distribution

`np.random.randn(length_of_array)`

In [58]:
np.random.randn(10)

array([-1.12760673,  0.85461286, -0.69124164,  1.3220827 , -0.45563252,
       -1.76433412, -2.391447  , -0.43534963,  0.81422191,  1.68311738])

In [59]:
# create an array of random numbers instead of from a list:

np.random.randn(13) # 13 values drawn from the standard normal distribution (mean 0, std dev 1); 1-D because only one number is passed

array([-1.08606826,  0.60908185, -0.22013365,  0.17554286,  0.0072689 ,
        0.922925  , -0.70091409, -1.71344639,  1.2446491 ,  1.30255786,
        0.42767825, -0.28751146,  1.07291903])

In [60]:
# pass a second argument to define the shape of the array :
# 13 == rows, 4 == columns

np.random.randn(13, 4)

array([[-0.45683301,  2.55693331, -0.81498948, -0.10284629],
       [-0.96673579,  0.90710419,  0.00752351, -2.20817668],
       [ 0.12510817, -0.20042566, -0.24654266,  0.88908391],
       [ 0.21110169, -0.49375137,  2.61661683,  1.26004496],
       [ 0.73631103, -2.51325485,  0.0486234 ,  0.88913273],
       [-1.26826863,  0.47030393, -0.02690265,  0.80144543],
       [-0.79912692, -0.3040565 , -0.34742728,  0.53672757],
       [ 0.01875482, -1.52554405, -0.2470314 ,  0.49816051],
       [-0.50052088, -0.50571749, -0.07988312,  0.01680783],
       [ 0.62266323, -2.05451949,  0.23503995, -0.6020305 ],
       [-2.09413646,  0.53241198, -0.65785379,  0.10905511],
       [-0.67272314,  1.56983359,  0.06907353, -0.26090722],
       [-0.94939297,  0.02285191,  0.49093692, -0.38159875]])

Passing a second argument defines the shape of a two-dimensional array. The first number is the number of rows in the matrix, and the second is the number of columns.

In [61]:
np.random.randn(10, 2)

array([[-0.97753283, -1.89034278],
       [-0.82575961,  0.85493003],
       [ 2.9281118 ,  0.23757387],
       [-0.36277453, -0.90458715],
       [ 0.74044575,  0.12201803],
       [-0.84436998, -0.40400252],
       [ 1.08992895, -1.51702967],
       [-0.78869022, -0.35639456],
       [ 0.35123322,  0.59186255],
       [ 0.15744296, -0.4764423 ]])
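
Because the values are random, every run gives different numbers. When repeatable results are needed, one option is NumPy's `Generator` API with a seed. This is a sketch: the seed value 42 and the sample size are arbitrary, and `standard_normal` draws from the same distribution as `randn`.

```python
import numpy as np

rng = np.random.default_rng(seed=42)          # arbitrary seed for repeatability
sample = rng.standard_normal((100_000, 2))    # shape is passed as a tuple here

print(sample.shape)                # (100000, 2)
print(sample.mean(), sample.std()) # close to 0 and 1, but not exactly
```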

#### Create an array of a single value

The `zeros` and `ones` functions provide the ability to create arrays of a specified size full of either 0s or 1s, and the `full` function creates an array of the specified size filled with a value we choose.

In [62]:
np.zeros(3)

array([0., 0., 0.])

In [63]:
# an array of zeros
# useful as a placeholder array: create it first, then fill in values
    # as calculations produce them.
    
np.zeros(5)

array([0., 0., 0., 0., 0.])
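
The placeholder idea in the comment above can be sketched as follows. The names `n` and `squares` are illustrative; when a calculation can be vectorized, that is usually the simpler route.

```python
import numpy as np

n = 5
squares = np.zeros(n)          # placeholder array of floats
for i in range(n):
    squares[i] = i ** 2        # overwrite each slot with a computed value

print(squares.tolist())        # [0.0, 1.0, 4.0, 9.0, 16.0]

# the vectorized equivalent needs no placeholder at all
vectorized = np.arange(n) ** 2
```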

We can also create multi-dimensional arrays by passing a tuple of the dimensions of the desired array, instead of a single integer value.

In [64]:
np.zeros((3, 2))

array([[0., 0.],
       [0., 0.],
       [0., 0.]])

In [16]:
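# The shape is passed as a tuple, e.g. (10, 5). A tuple is an immutable sequence written in parentheses, which is why the call has double parentheses for arrays of 2 or more dimensions.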

In [65]:
np.zeros((10,5))

array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])

In [18]:
np.ones(3)

array([1., 1., 1.])

In [19]:
np.ones((5,4))

array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])

How can I make a 5 x 4 matrix of 1's?

In [20]:
np.ones((5, 4))

array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])

Create an array with 3 items, all of the value 21. 

In [21]:
np.full(3, 21)

array([21, 21, 21])

In [22]:
# np.full takes the shape first, then the fill value.
# For a 1-D array the shape is a single integer, so no double () is needed; for 2-D and up, pass a tuple as with np.zeros.

In [23]:
np.full((3, 2), -10.3)

array([[-10.3, -10.3],
       [-10.3, -10.3],
       [-10.3, -10.3]])

Create a 3x2 matrix of -10. 

In [24]:
# np.full with a (rows, columns) tuple; np.matrix also works, but NumPy no longer recommends it
create = np.full((3, 2), -10)
create

array([[-10, -10],
       [-10, -10],
       [-10, -10]])

#### Create an array of numbers in a designated range

Numpy's `arange` function is very similar to python's builtin range function. It can take a single argument and generate a range from zero up to, but not including, the passed number.

In [25]:
np.arange(3)

array([0, 1, 2])

In [67]:
np.arange(10) # array of 0 to 9, excluding the 10 : 10 values starting at 0.

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [66]:
np.arange(2, 10) # a range: starts at 2 and stops before 10 (i.e., at 9)

array([2, 3, 4, 5, 6, 7, 8, 9])

Specify a starting point:

In [28]:
np.arange(1,4)

array([1, 2, 3])

Specify a step:

In [29]:
# count by 4
np.arange(2, 10, 4) # starts at 2, up to (not including) 10, counting by 4

array([2, 6])

In [30]:
np.arange(1,10,2)

array([1, 3, 5, 7, 9])

The `linspace` function creates evenly spaced numbers between a minimum and a maximum (both included), with a set number of elements.

In [68]:
# min, max, length
np.linspace(2, 5, 4)

array([2., 3., 4., 5.])

In [69]:
np.linspace(1, 10, 23)
# min of 1, max of 10, with 23 elements 
# includes max and min

# the numbers are not random, but are evenly spaced

array([ 1.        ,  1.40909091,  1.81818182,  2.22727273,  2.63636364,
        3.04545455,  3.45454545,  3.86363636,  4.27272727,  4.68181818,
        5.09090909,  5.5       ,  5.90909091,  6.31818182,  6.72727273,
        7.13636364,  7.54545455,  7.95454545,  8.36363636,  8.77272727,
        9.18181818,  9.59090909, 10.        ])
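
To contrast the two functions: `arange` fixes the step and excludes the stop value, while `linspace` fixes the number of elements and includes the stop value by default. A small sketch, with values chosen only for illustration:

```python
import numpy as np

by_step = np.arange(0, 1, 0.25)     # fixed step, stop excluded
by_count = np.linspace(0, 1, 5)     # fixed count, stop included

print(by_step.tolist())    # [0.0, 0.25, 0.5, 0.75]
print(by_count.tolist())   # [0.0, 0.25, 0.5, 0.75, 1.0]

# endpoint=False makes linspace exclude the stop value as well
half_open = np.linspace(0, 1, 4, endpoint=False)
print(half_open.tolist())  # [0.0, 0.25, 0.5, 0.75]
```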

#### Create an array of random integers

`np.random.randint(start, stop)` returns a random integer between start (inclusive) and stop (exclusive). Passing a third argument, the size, returns an array of that many random integers.

In [70]:
# So the following line is like rolling a 6-sided die
x = np.random.randint(1, 7)
x

2

In [71]:
b = np.random.randint(1,9, 10)
b
# put the cursor inside the first parenthesis and press Shift+Tab to see the docstring

# random integers, starting at 1 and up to, but not including, 9. The 10 = how many times to roll.

array([5, 3, 8, 2, 6, 4, 5, 5, 2, 8])

In [72]:
b = np.random.randint(1,9, (5,2)) # the tuple (5, 2) is the shape of the result: 5 rows, 2 columns
b

array([[5, 4],
       [6, 8],
       [3, 8],
       [1, 4],
       [1, 8]])
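
Extending the die-rolling idea, the sketch below rolls a six-sided die many times and counts how often each face appears with `np.unique`. The number of rolls is arbitrary, and the counts change on every run because the rolls are random.

```python
import numpy as np

rolls = np.random.randint(1, 7, 6_000)   # 6,000 rolls; 7 is excluded

# return_counts=True also returns how many times each distinct value occurs
faces, counts = np.unique(rolls, return_counts=True)

for face, count in zip(faces, counts):
    print(face, count)                   # each count should be near 1,000
```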

### Access items in arrays using indexing, slicing

Referencing elements in numpy arrays at its most basic is the same as referencing elements in Python lists. To obtain the 2nd item in the array, we would write `a[1]`

In [73]:
a = np.array([2,3,5,8,13])
a

array([ 2,  3,  5,  8, 13])

In [74]:
ab = np.array([0,1,1,2,3,5,8,13])

ab[3] # returns the 4th item in the sequence (index 3)

2

To obtain the 2nd, 3rd, and 4th items, we would write `a[1:4]`. The starting index is inclusive and the ending index is exclusive. 

In [75]:
a[1:4]


array([3, 5, 8])

To obtain the 3rd item (index 2) through the end of the array, we would write `a[2:]`.

In [76]:
a[2:]

array([ 5,  8, 13])

In [77]:
# to access the final item in the array
ab[-1]

13

In [78]:
# access the final 2 items
ab[-2:]

array([ 8, 13])
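
One difference from Python lists is worth knowing: slicing a list makes a copy, but slicing a numpy array returns a view that shares memory with the original. A short sketch of the consequence (variable names are illustrative):

```python
import numpy as np

a = np.array([2, 3, 5, 8, 13])

view = a[1:4]          # a slice shares memory with `a`
view[0] = 99           # so this also changes a[1]
print(a.tolist())      # [2, 99, 5, 8, 13]

independent = a[1:4].copy()   # .copy() makes a separate array
independent[0] = -1
print(a.tolist())      # still [2, 99, 5, 8, 13]
```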

For a 2-D numpy array, or matrix, called `m`, to obtain the element in the second row (index = 1) and third column (index = 2), we would write `m[1,2]`.

In [79]:
m = np.array([[2,3,5],
              [8,13,21],
              [34,55,89]]
            )

In [80]:
# second row (index = 1) and third column (index = 2), we would write m[1,2].
m[1,2]

21

To access the 2nd and 3rd rows of the 1st column of the matrix, we write `m[1:, 0:1]`:

In [81]:
m[1:, 0:1]
# 1: selects the 2nd and 3rd rows; 0:1 selects only the 1st column

array([[ 8],
       [34]])
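
A few more 2-D selections on the same matrix, as a sketch. Note that indexing a column with a single integer drops that dimension, while a slice such as `0:1` keeps it.

```python
import numpy as np

m = np.array([[2, 3, 5],
              [8, 13, 21],
              [34, 55, 89]])

first_row = m[0]          # [2, 3, 5]
second_col = m[:, 1]      # every row, column index 1: [3, 13, 55]
top_right = m[:2, 1:]     # first two rows, last two columns: [[3, 5], [13, 21]]

print(m[1:, 0].shape)     # (2,)   integer index: result is 1-D
print(m[1:, 0:1].shape)   # (2, 1) slice: result stays 2-D
```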

### Built-in methods and functions

Methods are called *on* the numpy object, so they begin with the numpy array variable name, which is `a` in this case. 
These are methods: a.sum(), a.mean(), a.min(), a.max(), a.std()

Functions begin with `np.` and the array is one of the function arguments, such as `np.sqrt(a)`. 

Some operations have both a method and a function, such as summing all items in an array. `np.sum(a)` and `a.sum()` do the same thing. 

In [82]:
a.sum(), a.mean(), a.min(), a.max(), a.std()

(31, 6.2, 2, 13, 3.9698866482558417)

In [83]:
ab.sum(), ab.mean(), ab.min(), ab.max(), round(ab.std(),4)

(33, 4.125, 0, 13, 4.1363)

In [84]:
%%timeit
np.sum(ab)

1.4 µs ± 3.84 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)


In [85]:
np.median(ab)

2.5

In [86]:
np.sqrt(a)

array([1.41421356, 1.73205081, 2.23606798, 2.82842712, 3.60555128])

`.all` -- every single element is `True`

In [87]:
# Are all the elements in a less than 10?

(a < 10).all()

False

`.any` -- at least one element is `True`

In [88]:
# Are there any negative numbers?

(a < 0).any()

False
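
On a 2-D array these methods summarize every element by default. The `axis` argument summarizes along one dimension instead. A sketch using the matrix `m` from the indexing section:

```python
import numpy as np

m = np.array([[2, 3, 5],
              [8, 13, 21],
              [34, 55, 89]])

total = m.sum()              # every element: 230
column_sums = m.sum(axis=0)  # collapse the rows: [44, 71, 115]
row_sums = m.sum(axis=1)     # collapse the columns: [10, 42, 178]
row_maxes = m.max(axis=1)    # largest value in each row: [5, 21, 89]

print(total, column_sums, row_sums, row_maxes)
```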

### Vectorized Operations

If we wanted to add 1 to every element in a list, without numpy, we can't simply add 1 to the list, as that will result in a TypeError: 

`[1, 2, 3, 4, 5] + 1`

In [89]:
[1, 2, 3, 4, 5] + 1

## this will error : "TypeError: can only concatenate list (not "int") to list"

TypeError: can only concatenate list (not "int") to list

We would have to use a loop or a list comprehension:

In [90]:
my_list = [1, 2, 3, 4, 5]
new_list = [n + 1 for n in my_list]
new_list

# this is how to do it

[2, 3, 4, 5, 6]

In [91]:
## easier : take array and + 1

a = np.array(my_list)   # converts list to array

a + 1

array([2, 3, 4, 5, 6])

Vectorizing operations means that operations are automatically applied to every element in a vector, which in our case will be a numpy array. So if we are working with a numpy array, we can simply add 1:

In [92]:
a = np.array(my_list)
a + 1

array([2, 3, 4, 5, 6])

Multiply `a` by 3.2 and reassign the result to `a`:

In [93]:
# reassign a to hold the result of a * 3.2
a = a * 3.2
a

array([ 3.2,  6.4,  9.6, 12.8, 16. ])

Or...

In [94]:
a *= 2
a

# this multiplies the results of 'a' by 2 and reassigns this as the new value of a

array([ 6.4, 12.8, 19.2, 25.6, 32. ])

In [95]:
a ** 2

# this squares each element of 'a'
# a itself is unchanged because the result is not assigned


array([  40.96,  163.84,  368.64,  655.36, 1024.  ])

Write an operation that divides each element by 2 and then adds 3. 

In [96]:
a / 2 + 3

# true division (/) returns floats

array([ 6.2,  9.4, 12.6, 15.8, 19. ])

What happens if we subtract `a` from itself? 

In [97]:
a - a

# every element is 0; they are floats because a holds floats

array([0., 0., 0., 0., 0.])
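
Vectorized operations also work between two arrays: arrays of the same shape are combined element by element, and a 1-D array can be combined with each row of a matching 2-D array (broadcasting). A minimal sketch with made-up values:

```python
import numpy as np

prices = np.array([1.5, 2.0, 4.0])
quantities = np.array([2, 0, 3])

totals = prices * quantities        # element by element: [3.0, 0.0, 12.0]
is_pricey = prices > 1.75           # comparisons are vectorized too

m = np.array([[1, 2, 3],
              [4, 5, 6]])
shifted = m + np.array([10, 20, 30])   # the 1-D array is added to every row

print(totals.tolist())      # [3.0, 0.0, 12.0]
print(is_pricey.tolist())   # [False, True, True]
print(shifted.tolist())     # [[11, 22, 33], [14, 25, 36]]
```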

Find even numbers

In [98]:
a = np.array([2, 3, 5, 8, 13, 21])
a % 2

# remainder after dividing by 2: 0 means even, 1 means odd

array([0, 1, 1, 0, 1, 1])

In [99]:
ab = np.array([0,1,1,2,3,5,8,13])

ab % 2

# returns 0 for even numbers, 1 for odd numbers. Use a boolean mask to keep the zeros.

array([0, 1, 1, 0, 1, 1, 0, 1])

The items with a value of 0 are even because they have no remainder. Those with a one are odd, because they have a remainder of 1. We can use a boolean mask to filter to just the even numbers. 

### Filtering Arrays using boolean masks

In [100]:
is_even = a % 2 == 0
is_even

array([ True, False, False,  True, False, False])

In [101]:
_evn_nos = ab % 2
type(_evn_nos)

numpy.ndarray

In [102]:
a[is_even]

array([2, 8])

In [103]:
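# filtering array
# ab where ab is even
# _evn_nos holds remainders (0 or 1), not booleans, so compare it to 0 to get a mask;
# ab[_evn_nos] would instead pick the items at positions 0 and 1
ab[_evn_nos == 0]

array([0, 2, 8])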

In [104]:
a[is_even]

array([2, 8])

**or**

In [105]:
a[a % 2 == 0]
# this is the same as above : a[is_even]

array([2, 8])

Expressions that return true or false can be used as our filter. 

In [106]:
a = np.array([-3, 0, 3, 6, 9])
a > 0

# reassigning a again.

array([False, False,  True,  True,  True])

In [107]:
a[a > 0]
# the boolean expression in [] is evaluated for each item in the array
# select a where a is > 0

array([3, 6, 9])

It might help to read this as SQL in your head: "select a where a > 0"

In [108]:
a[a > 0]

array([3, 6, 9])

In [109]:
# select a where a == 3

a[a == 3]


array([3])

In [110]:
# select a where a != 3

a[a != 3]

array([-3,  0,  6,  9])

In the Python admissions test, there was a question called "remove_evens" where you write a function that removes evens. Using numpy, your function could look like: 

In [111]:
def remove_evens(x):
    x = np.array(x)
    return x[x % 2 == 1]

odds = remove_evens([2, 3, 5, 8, 13])
odds

array([ 3,  5, 13])

In [112]:
def remove_odds(ab):
    ab = np.array(ab)
    return ab[ab % 2 != 1]

evens = remove_odds(ab)
evens

array([0, 2, 8])

Combine boolean arrays with `&` for "and", `|` for "or"

For example, create an array of all positive, even numbers from the original array below. 

In [113]:
a = np.array([-3, 0, 1, 1, 2, 3, 5, 8, 13, 21])
new_a = a[(a > 0) & (a % 2 == 0)]
new_a


array([2, 8])

In [114]:
# a where a > 3 and even
a[(a > 3) & (a % 2 == 0)]

array([8])

Create an array of all positive OR even numbers from the original array below. 

In [115]:
new_a = a[(a > 0) | (a % 2 == 0)]
new_a



array([ 0,  1,  1,  2,  3,  5,  8, 13, 21])

In [116]:
ab[(ab > 3) | (ab % 2 == 0)]

array([ 0,  2,  5,  8, 13])
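
Two details about combining masks are easy to trip over: Python's `and` / `or` keywords do not work element by element, and each condition needs its own parentheses because `&` and `|` bind more tightly than comparisons such as `>`. A short sketch of what happens with `and`:

```python
import numpy as np

a = np.array([-3, 0, 1, 1, 2, 3, 5, 8, 13, 21])

try:
    a[(a > 0) and (a % 2 == 0)]      # `and` asks for one True/False for the whole array
except ValueError as err:
    message = str(err)
    print("ValueError:", message)

# & compares the two masks element by element
positive_even = a[(a > 0) & (a % 2 == 0)]
print(positive_even.tolist())        # [2, 8]
```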

Negate a mask

In [117]:
a = np.array([-3, 0, 1, 1, 2, 3, 5, 8, 13, 21])
odds = a % 2 == 1
odds


array([ True, False,  True,  True, False,  True,  True, False,  True,
        True])

In [118]:

## the even mask, built directly; ~ (used in the next cells) reverses a mask, i.e., True is exchanged for False

is_even = a % 2 == 0
is_even

array([False,  True, False, False,  True, False, False,  True, False,
       False])

In [119]:
a[~is_even]

# returns the NOT even numbers

array([-3,  1,  1,  3,  5, 13, 21])

In [120]:
evens = ~ odds
evens

array([False,  True, False, False,  True, False, False,  True, False,
       False])

In [121]:
# these will all return the same thing, the even numbers 

a[~(a % 2 == 1)]

# a[~odds]

# a[evens]

# a[a % 2 == 0]


array([0, 2, 8])

In [122]:
a[a % 2 == 1]
# returns odd numbers

array([-3,  1,  1,  3,  5, 13, 21])

### Array data types 

A numpy array holds a single data type. When the values are mixed, numpy picks the "lowest common datatype" (LCD): the simplest type that every value can be converted to.

- Most datatypes can be converted to strings or objects, so if there is a string, that will be the LCD. 

- All numbers can be converted to floats (decimals), so float is the LCD of mixed numbers. 

- Only integers can be stored as integers without losing information, so the datatype will be an integer type only when all values are integers. 

In [123]:
a = np.array([1, 2, '3', 4])
a, a.dtype

# <U21 = Unicode strings of up to 21 characters

(array(['1', '2', '3', '4'], dtype='<U21'), dtype('<U21'))

In [127]:
ab = np.array([1,2,'8',5])
ab, ab.dtype, type(ab)

# returns the array ab, the data type of its values, and the type of the container (ndarray)

(array(['1', '2', '8', '5'], dtype='<U21'), dtype('<U21'), numpy.ndarray)

In [128]:
a = np.array([1, 2.01, 3, 4])
a, a.dtype

# mixing floats and integers.

(array([1.  , 2.01, 3.  , 4.  ]), dtype('float64'))

In [131]:
a = np.array([1, 2, 3, 4])
a, a.dtype

# all integers

(array([1, 2, 3, 4]), dtype('int64'))
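
When numpy's chosen data type is not the one we want, `astype` converts an array to another type, and the `dtype` argument sets the type at creation. A sketch with illustrative names:

```python
import numpy as np

as_text = np.array([1, 2, '3', 4])     # stored as strings because of the '3'
as_ints = as_text.astype(int)          # convert every string back to an integer
print(as_ints.sum())                   # 10

floats = np.array([1.9, -1.9, 3.0])
truncated = floats.astype(int)         # drops the decimal part, does not round
print(truncated.tolist())              # [1, -1, 3]

forced = np.array([1, 2, 3], dtype=float)   # choose the dtype up front
print(forced.tolist())                 # [1.0, 2.0, 3.0]
```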

## More Examples

We can use numpy to answer some questions:

In [7]:
# 1. How many data points are there?
# Use .shape to tell me. This is an attribute, so it doesn't take ()
ac = np.array([0,1,1,2,3,5,8,13,71])
ac.shape
# 9 items

(9,)

In [8]:
# 2. How many data points are greater than 2? (first approach: filter, then .shape)

ac[ac > 2].shape


(5,)

In [9]:
# 2. How many data points are greater than 2? (second approach: .sum)
# (ac > 2) is a boolean array; summing it counts the True values

(ac > 2).sum()

5
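
Summing a mask works because `True` counts as 1 and `False` as 0. By the same logic, the mean of a mask is the proportion of values that meet the condition. A small sketch on the same data:

```python
import numpy as np

ac = np.array([0, 1, 1, 2, 3, 5, 8, 13, 71])
mask = ac > 2

count = mask.sum()                  # number of True values: 5
share = mask.mean()                 # proportion of True values: 5 / 9
also_count = np.count_nonzero(mask) # another way to count the True values

print(count, round(share, 3), also_count)
```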

In [10]:
# 3. What is the sum of the odd numbers?
ac[ac % 2 == 1].sum()

# ac where ac % 2 == 1, and take the sum of the odd numbers with .sum()

94

In [146]:
ac[ac < 10].shape

(7,)

In [16]:
# 4-a

# (ac >= 8) & (ac <= 20) returns a boolean array

ac[(ac >= 8) & (ac <= 20)]

array([ 8, 13])

In [11]:
# 4. Take all the numbers between 8 and 20 (inclusive) and square them.
#    What is the highest resulting number?

(ac[(ac >= 8) & (ac <= 20)] ** 2).max()

169

In [None]:
# 4. (step by step) Take all the numbers between 30 and 80 (inclusive), square them, what is the highest resulting number?
# uses ac; the array `a` from the data types section has no values in this range, so .max() would fail on it
more_than_30 = ac >= 30
less_than_80 = ac <= 80

in_our_desired_range = more_than_30 & less_than_80

desired_numbers = ac[in_our_desired_range]
desired_numbers_squared = desired_numbers ** 2

desired_numbers_squared.max()

`np.where` will produce values conditionally, based on a boolean array

In [None]:
# 5. Square the odd numbers in the array. What is the average of the resulting data set? (np.where)
odd_numbers_squared = np.where(ac % 2 == 1, ac ** 2, ac)

odd_numbers_squared.mean()

In [23]:
# np.where with only a condition returns the positions (indices) where it is True


np.where(ac % 2 == 1) # positions of the odd numbers, not the numbers themselves


(array([1, 2, 4, 5, 7, 8]),)

In [22]:
np.where(ac % 2 == 1, ac ** 2, ac)   ## squaring the odd numbers.
                                        # The final 'ac' is the value kept where the condition is False.

array([   0,    1,    1,    2,    9,   25,    8,  169, 5041])
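
To summarize the two forms of `np.where`: with one argument it returns positions, which can then be used for indexing; with three arguments it returns an array of the same length as the input, choosing a value for each position. A sketch (the `'odd'` / `'even'` labels are just an illustration):

```python
import numpy as np

ac = np.array([0, 1, 1, 2, 3, 5, 8, 13, 71])
is_odd = ac % 2 == 1

# one argument: a tuple holding one array of positions per dimension
(positions,) = np.where(is_odd)
print(positions.tolist())            # [1, 2, 4, 5, 7, 8]

# indexing with those positions gives the same result as the boolean mask
print(np.array_equal(ac[positions], ac[is_odd]))   # True

# three arguments: pick a value for every element, so the length is unchanged
labels = np.where(is_odd, 'odd', 'even')
print(labels.tolist())
```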

In [None]:
# 6. Square the even numbers in the array. Remove any odd number less than 40.
#    Double odd numbers greater than 80. What is the sum of the resulting dataset?
evens_squared = np.where(ac % 2 == 0, ac ** 2, ac)
# ~ negates the mask, so this keeps everything except odd numbers below 40
x = evens_squared[~((evens_squared % 2 == 1) & (evens_squared < 40))]
# double only the odd numbers greater than 80
x = np.where((x % 2 == 1) & (x > 80), x * 2, x)
x.sum()