# Numpy

https://numpy.org/

## What is it? 
- Numpy is a Python library used for working with arrays
- Numpy is the fundamental package for scientific computing in Python


## Why do we care? 
- Numpy is one of the main reasons why Python is so powerful and popular for scientific computing
- Fast. Numpy arrays are implemented in C, so operations on them run much faster than the equivalent Python loops.
- Arrays allow vectorized operations: one expression is applied to every element at once.

## Show us! 

### Create a 1D array

#### create a list
format: list()

In [1]:
# Previously:

my_list = ['apple', 'banana', 'peach', 'plum']

In [2]:
# indexing:
my_list[0]

'apple'

In [3]:
my_list[1:]

['banana', 'peach', 'plum']

In [4]:
added_string = 'my favorite fruit is: '

In [5]:
[added_string + fruit for fruit in my_list]

['my favorite fruit is: apple',
 'my favorite fruit is: banana',
 'my favorite fruit is: peach',
 'my favorite fruit is: plum']

In [6]:
my_list.append(added_string)

In [7]:
my_list

['apple', 'banana', 'peach', 'plum', 'my favorite fruit is: ']

In [8]:
# create a new list (this replaces the fruit list above)
my_list = [1,2,3,4]
my_list

[1, 2, 3, 4]

In [9]:
# what's the type?
type(my_list)

list

In [10]:
# what is the dtype?
# a list has no dtype: to figure out the data type of its contents,
# I would need to iterate over the individual pieces
for i in my_list:
    print(type(i))

<class 'int'>
<class 'int'>
<class 'int'>
<class 'int'>


In [11]:
# this does not work: dtype(my_list)
# NameError: plain Python has no built-in dtype function

In [12]:
# what's the shape? a list only knows its length
len(my_list)

4

In [13]:
my_second_list = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9] 
]

In [14]:
len(my_second_list)

3

In [15]:
for little_list in my_second_list:
    print(len(little_list))

3
3
3


In [16]:
# importing numpy:
# industry standard alias: np
import numpy as np

#### create an array
format: np.array()

In [17]:
# create the array (from the contents of the list)
# np.array?  <- in Jupyter, this shows the docstring
# np.array is a function in the numpy library that converts your list into an ndarray
my_array = np.array(my_list)

In [18]:
# what's the type?
type(my_array)

numpy.ndarray

In [19]:
# what's the shape?
# the old method will work, but it only reports the first dimension:
len(my_array)

4

In [20]:
my_array.shape

(4,)

In [21]:
#dtype
my_array.dtype

dtype('int64')
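
An array can report a single `dtype` because every element shares one data type. The sketch below shows how `np.array` settles on that type; the variable names are made up for illustration, and the exact integer width can differ by platform.

```python
import numpy as np

# all integers: numpy picks an integer dtype (the width can vary by platform)
ints = np.array([1, 2, 3, 4])
print(ints.dtype)

# one float in the list is enough to upcast every element to float64
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)       # float64

# the dtype can also be requested explicitly
as_floats = np.array([1, 2, 3, 4], dtype=np.float64)
print(as_floats.dtype)   # float64
```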

#### access elements of our new array

#### slice the array

In [22]:
my_2d_array = np.array(my_second_list)

In [23]:
# indexing my_array will be similar to the way we index a list

In [24]:
my_array[0]

1

In [25]:
my_array[1:2]

array([2])

In [26]:
my_array

array([1, 2, 3, 4])

#### create an array from 1 to 100

In [27]:
my_100 = np.array(range(1,101))

In [28]:
my_100[:10]

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

### Create a 2D array

In [29]:
my_2d_array

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [30]:
#type
type(my_2d_array)

numpy.ndarray

In [31]:
my_2d_array.shape

(3, 3)

#### access elements

In [32]:
my_2d_array[0]

array([1, 2, 3])

In [33]:
my_second_list[0][2]

3

In [34]:
my_2d_array[0][2]

3

In [35]:
my_2d_array[0,2]

3

In [36]:
# this breaks for the list: my_second_list[0,2]
# TypeError: list indices must be integers or slices, not tuple

In [37]:
my_2d_array

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [38]:
my_2d_array[1,:]

array([4, 5, 6])
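
The `[row, column]` form also accepts slices on both sides of the comma, which is how a column or a smaller block can be pulled out. A minimal sketch, rebuilding the same 3x3 array so it runs on its own (the names `middle_column` and `top_right` are just for illustration):

```python
import numpy as np

my_2d_array = np.array([[1, 2, 3],
                        [4, 5, 6],
                        [7, 8, 9]])

# a single column: every row, column index 1
middle_column = my_2d_array[:, 1]
print(middle_column)      # [2 5 8]

# a block: rows 0 and 1, columns 1 and 2
top_right = my_2d_array[0:2, 1:3]
print(top_right)
# [[2 3]
#  [5 6]]
```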

### Descriptive Stats

In [39]:
#pulling back my big array from 1 - 100


#### using methods: the method is called on the numpy object
format: object.method()

In [40]:
# descriptive stats on the 1-100 array:

In [41]:
my_100.mean()

50.5

In [42]:
# this breaks: my_100.avg()
# AttributeError: arrays have no avg method, use mean() instead

In [43]:
my_100.std()

28.86607004772212

In [44]:
my_100.max()

100

In [45]:
my_100.min()

1

In [46]:
my_100.sum()

5050

#### using functions: the function is called from the numpy library
format: np.function(object)

In [47]:
# calling the function from the np namespace:
np.max(my_100)

100
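
Both the method form and the function form accept an `axis` argument, which matters once the array has more than one dimension. This goes a little beyond the cells above, so treat it as a side sketch on the same 3x3 array:

```python
import numpy as np

my_2d_array = np.array([[1, 2, 3],
                        [4, 5, 6],
                        [7, 8, 9]])

# with no axis, the whole array is reduced to one number
total = my_2d_array.sum()
print(total)          # 45

# axis=0 collapses the rows, giving one result per column
column_sums = my_2d_array.sum(axis=0)
print(column_sums)    # [12 15 18]

# axis=1 collapses the columns, giving one result per row
row_means = np.mean(my_2d_array, axis=1)
print(row_means)      # [2. 5. 8.]
```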

In [48]:
# more interesting numpy functions to follow :)

### Array of Booleans! 

### Boolean Masks

1. create an array
2. make a list of booleans (mask)
3. combine

    format: array [ list of booleans ] 

In [49]:
#pull back our small array
my_array

array([1, 2, 3, 4])

In [50]:
#make list of booleans aka our mask
my_mask = [True, True, False, True]

In [51]:
len(my_array), len(my_mask)

(4, 4)

In [52]:
#combine them 
my_array[my_mask]

array([1, 2, 4])

> Only the elements where the mask is True are returned. This is known as boolean masking.
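
To make the three steps concrete, here is the same mask applied twice: once with numpy, and once spelled out by hand in plain Python. The `by_hand` version is only an illustration of the idea, not how numpy does it internally.

```python
import numpy as np

my_array = np.array([1, 2, 3, 4])
my_mask = [True, True, False, True]

# numpy: keep the elements whose mask entry is True
masked = my_array[my_mask]

# the same idea in plain Python: pair each value with its flag
by_hand = [value for value, keep in zip(my_array.tolist(), my_mask) if keep]

print(masked)    # [1 2 4]
print(by_hand)   # [1, 2, 4]
```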

#### how else can we get our array of boolean values?

In [53]:
# say I want to make an assessment 
# this is going to be like numpy's version of a where clause

In [54]:
# previously,
# we may have written something like:
'''
SELECT *
FROM my_table
WHERE ham > 10;
'''

'\nSELECT *\nFROM my_table\nWHERE ham > 10;\n'

In [55]:
# my_100, where my_100 is greater than 10

In [56]:
# if it was a list:
# [thing for thing in my_100 if thing > 10]

In [57]:
my_over_ten_mask = my_100 > 10

In [58]:
# my 100, where the contents are greater than 10
# (the [:-10] drops the last ten results to keep the output short)
my_100[my_over_ten_mask][:-10]

array([11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27,
       28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44,
       45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61,
       62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78,
       79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90])

In [59]:
# I can do this in-line without making a var:
# elements from my_array that are even:
my_array[my_array % 2 == 0]

array([2, 4])

#### what about with multiple conditions?

In [60]:
# because the comparison happens element by element,
# we need the element-wise operators instead of the keywords and / or,
# and each condition needs its own parentheses

In [61]:
# and: &
# or: |

In [62]:
# my array,
# where the element divided by two has no remainder
# and where that element is less than 3
my_array[(my_array % 2 == 0) & (my_array < 3)]

array([2])

In [63]:
# my array,
# where the element divided by two has no remainder
# OR where that element is less than 3
my_array[(my_array % 2 == 0) | (my_array < 3)]

array([1, 2, 4])
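
A short sketch of why `&` is needed here instead of the keyword `and`. The helper names (`is_even`, `is_small`, `keyword_worked`) are made up for this example.

```python
import numpy as np

my_array = np.array([1, 2, 3, 4])

is_even = my_array % 2 == 0
is_small = my_array < 3

# the keyword `and` asks for ONE truth value for a whole array,
# which numpy refuses to guess, so it raises a ValueError
try:
    is_even and is_small
    keyword_worked = True
except ValueError:
    keyword_worked = False

# & compares the two masks element by element
both = is_even & is_small

print(keyword_worked)    # False
print(my_array[both])    # [2]

# written in one line, the parentheses matter because & binds
# more tightly than == and <
print(my_array[(my_array % 2 == 0) & (my_array < 3)])   # [2]
```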

#### do it with a matrix!

In [64]:
my_2d_array[my_2d_array < 7]

array([1, 2, 3, 4, 5, 6])

In [65]:
my_2d_array

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [66]:
# can't do this comparison with a list (TypeError):
# my_second_list < 7

In [67]:
# can't mask with a list either (TypeError):
# my_list[my_mask]

#### cool! can we just do it with a list instead of an array?

In [68]:
# nope!

In [69]:
# we would need to use a loop!

In [70]:
# this breaks with our list: my_second_list + 5
# TypeError: can only concatenate list (not "int") to list

In [71]:
my_2d_array + 5

array([[ 6,  7,  8],
       [ 9, 10, 11],
       [12, 13, 14]])
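
For comparison, this is roughly what the list-only versions look like. It is a sketch of the loops we would otherwise have to write; the names `list_plus_five` and `list_under_seven` are made up for the example.

```python
import numpy as np

my_second_list = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
my_2d_array = np.array(my_second_list)

# list version of my_2d_array + 5: loop over each row, then each value
list_plus_five = [[value + 5 for value in row] for row in my_second_list]

# list version of my_2d_array[my_2d_array < 7]
list_under_seven = [value for row in my_second_list for value in row if value < 7]

print(list_plus_five)     # [[6, 7, 8], [9, 10, 11], [12, 13, 14]]
print(list_under_seven)   # [1, 2, 3, 4, 5, 6]
```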

#### what if the mask is all False? what would it return?

In [72]:
my_2d_array[(my_2d_array + 5) < 6]

array([], dtype=int64)

#### can we convert our arrays back to lists?

In [73]:
list(my_2d_array)

[array([1, 2, 3]), array([4, 5, 6]), array([7, 8, 9])]

In [74]:
new_list = []
for x in list(my_2d_array):
    new_list.extend(x)

In [75]:
new_list

[1, 2, 3, 4, 5, 6, 7, 8, 9]

In [76]:
new_list = []
for x in list(my_2d_array):
    new_list.append(list(x))

In [77]:
new_list

[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
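
The loops above work, but arrays also carry a `tolist()` method that does the conversion in one call and hands back plain Python numbers. A minimal sketch (the names `nested` and `flat` are made up):

```python
import numpy as np

my_2d_array = np.array([[1, 2, 3],
                        [4, 5, 6],
                        [7, 8, 9]])

# nested list, same shape as the array
nested = my_2d_array.tolist()
print(nested)    # [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# flatten first to get a single flat list
flat = my_2d_array.flatten().tolist()
print(flat)      # [1, 2, 3, 4, 5, 6, 7, 8, 9]

# the elements are plain Python ints, not numpy integers
print(type(nested[0][0]))   # <class 'int'>
```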

#### what if we wanted the opposite of our mask?

In [78]:
# we saw & and |
# we also have the tilde ~ for "not"

In [79]:
my_mask

[True, True, False, True]

In [80]:
# note: ~my_mask breaks while the mask is a plain list.
# masks can be lists, but if we want to use full
# numpy functionality (like ~), we need them to be arrays too
my_mask = (my_array % 3 == 0)

In [81]:
type(my_mask)

numpy.ndarray

In [82]:
~my_mask

array([ True,  True, False,  True])

In [83]:
# my array, where the content is not divisible by 3
my_array[~my_mask]

array([1, 2, 4])
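
If a mask already exists as a list, it can be converted with `np.array` before it is flipped. A small sketch, using the original list mask under the made-up name `list_mask`:

```python
import numpy as np

my_array = np.array([1, 2, 3, 4])
list_mask = [True, True, False, True]

# ~ is not defined for a plain list, so this raises a TypeError
try:
    ~list_mask
    tilde_on_list = True
except TypeError:
    tilde_on_list = False

# convert the list to an array, then flip it
array_mask = np.array(list_mask)
flipped = ~array_mask

print(tilde_on_list)        # False
print(flipped)              # [False False  True False]
print(my_array[flipped])    # [3]
```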

## Vectorized Operations
- apply math to every element without writing a loop: easy and fast!

### Add one to every element

#### hard way: doing it in a list

In [84]:
my_list_plus_one = []
for i in my_list:
    my_list_plus_one.append(i+1)

In [85]:
my_list_plus_one

[2, 3, 4, 5]

#### easy way: doing it with an array

In [86]:
my_array_plus_one = my_array + 1

### more operations

In [87]:
my_array % 2

array([1, 0, 1, 0])

In [88]:
my_array ** 2

array([ 1,  4,  9, 16])

In [89]:
my_array / 3

array([0.33333333, 0.66666667, 1.        , 1.33333333])

In [90]:
my_array - 16

array([-15, -14, -13, -12])

### show us the speed

In [91]:
my_big_number = 1_000_000

In [92]:
type(my_big_number)

int

In [93]:
my_big_number

1000000

In [94]:
big_array = np.array(range(0,1_000_001))

In [95]:
import time
# I'm using time to grab timestamps
# you can play with the magic function %%timeit
# (it should work in jupyter notebook with an anaconda install)

In [96]:
# time a calculation on the really big array
# timestamp before:
t_o = time.time()
# do the thing. there is no reassignment here,
# so big_array is not changing;
# the result of the transformation is simply discarded
big_array ** 3
# get the timestamp after:
t_1 = time.time()
# difference in time:
time_spent = t_1-t_o

In [97]:
time_spent

0.001965045928955078

In [98]:
big_list = list(big_array)

In [99]:
len(big_list)

1000001

In [100]:
t_o = time.time()
[thing ** 3 for thing in big_list]
t_1 = time.time()
t_1 - t_o

0.03613114356994629
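
A single pair of timestamps can be noisy, so here is a hedged alternative using the standard library's `timeit` module, which repeats each measurement. The array is smaller than the one above so the sketch runs quickly, and the function names are made up for illustration. No timings are shown because they depend on the machine.

```python
import timeit
import numpy as np

# assumption: a smaller array than big_array, to keep the sketch quick
numbers = np.arange(100_000, dtype=np.int64)
numbers_list = numbers.tolist()   # plain Python ints

def cube_with_numpy():
    # vectorized: one expression for the whole array
    return numbers ** 3

def cube_with_loop():
    # plain Python: one multiplication per trip through the loop
    return [n ** 3 for n in numbers_list]

# run each version 10 times per measurement, 3 measurements, keep the best
numpy_seconds = min(timeit.repeat(cube_with_numpy, number=10, repeat=3))
loop_seconds = min(timeit.repeat(cube_with_loop, number=10, repeat=3))

print(f"numpy: {numpy_seconds:.4f}s  loop: {loop_seconds:.4f}s")
```

In a notebook, the `%timeit` magic does the same kind of repeated measurement with less typing.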

## Numpy ways to create arrays

#### full of zeros

In [101]:
np.zeros([5,5])

array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])

#### full of ones

In [102]:
np.ones([25,1])

array([[1.],
       [1.],
       [1.],
       [1.],
       [1.],
       [1.],
       [1.],
       [1.],
       [1.],
       [1.],
       [1.],
       [1.],
       [1.],
       [1.],
       [1.],
       [1.],
       [1.],
       [1.],
       [1.],
       [1.],
       [1.],
       [1.],
       [1.],
       [1.],
       [1.]])

#### full of whatever you want 

In [103]:
np.full(10,3)

array([3, 3, 3, 3, 3, 3, 3, 3, 3, 3])

#### a quicker way to make a range

In [104]:
range(10,20,2)

range(10, 20, 2)

In [105]:
np.arange(10,20,.3)

array([10. , 10.3, 10.6, 10.9, 11.2, 11.5, 11.8, 12.1, 12.4, 12.7, 13. ,
       13.3, 13.6, 13.9, 14.2, 14.5, 14.8, 15.1, 15.4, 15.7, 16. , 16.3,
       16.6, 16.9, 17.2, 17.5, 17.8, 18.1, 18.4, 18.7, 19. , 19.3, 19.6,
       19.9])
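
`np.arange` takes a step size and leaves the stop value out. When it is easier to say how many points you want, `np.linspace` is a related function that takes a count and includes the stop value. A sketch that rebuilds the same array both ways (the names `stepped` and `spaced` are made up):

```python
import numpy as np

# arange: start, stop (excluded), step
stepped = np.arange(10, 20, .3)

# linspace: start, stop (included), number of points
spaced = np.linspace(10, 19.9, 34)

print(len(stepped), len(spaced))      # 34 34
print(np.allclose(stepped, spaced))   # True
```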

#### an array of random integers

In [106]:
# 10 random integers from 2 up to (but not including) 20
np.random.randint(2,20,10)

array([ 4,  7,  8,  6,  2,  4,  6, 10, 11,  2])
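
The numbers above change on every run. If the same "random" values are needed twice, a seeded generator is one option. This sketch uses numpy's `default_rng` interface rather than `np.random.randint`; the seed value 42 is an arbitrary choice for the example.

```python
import numpy as np

# two generators started from the same seed
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

# 10 integers from 2 up to (but not including) 20
first = rng_a.integers(2, 20, 10)
second = rng_b.integers(2, 20, 10)

# same seed, same sequence
print(np.array_equal(first, second))   # True
```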

#### an array of random numbers from the standard normal distribution

In [107]:
# careful: one billion float64 values need roughly 8 GB of memory
np.random.randn(1000000000).mean()

3.8606427621272e-06
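
A much smaller sample shows the same behavior without the memory cost: the mean lands near 0 and the standard deviation near 1. A minimal sketch, where the seed and the sample size are arbitrary choices for the example:

```python
import numpy as np

# fix the seed so the sample is repeatable
np.random.seed(0)

# 100,000 draws from the standard normal distribution
sample = np.random.randn(100_000)

sample_mean = sample.mean()   # close to 0
sample_std = sample.std()     # close to 1

print(sample.shape)           # (100000,)
print(sample_mean, sample_std)
```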