# Numpy

This lesson will discuss the Numpy package, which provides fast array-based math in Python.

In [1]:
#import numpy package using the 'np' abbreviation
import numpy as np

Numpy arrays can do much of what our normal lists can do, and more. When we want to store only a few values, we can use a standard Python list. However, once we have many values, a list can quickly become a performance bottleneck. This is where numpy arrays come in.

In [2]:
#Let's start with Python lists
a=[1,2,3]
b=[4,5,6]
print('a + b=', a+b)

a + b= [1, 2, 3, 4, 5, 6]


In [3]:
#Here's where we run into a problem. Let's try to multiply the two lists

try:
    print(a*b)
except TypeError:
    print('Lists cannot be multiplied in Python')

Lists cannot be multiplied in Python


To multiply lists of values element by element without writing a loop, we use numpy, Python's library for numerical computing. You can use numpy for everything from arithmetic to highly complex math. Numpy can even be your calculator.

<b>Performance: Lists vs. numpy arrays</b>

In [4]:
#Numpy arrays are stored in memory in a much more efficient way than Python lists.
#Let's compare them to see the respective advantages

#create the objects
import numpy as np
my_array = np.arange(1000000)
my_list = list(range(1000000))

In [5]:
#%timeit
#now let's compare the two
%time for _ in range(10): my_array2 = my_array * 2

CPU times: user 22.3 ms, sys: 11.2 ms, total: 33.5 ms
Wall time: 42 ms


In [6]:
%time for _ in range(10): my_list2 = [x*2 for x in my_list]

CPU times: user 660 ms, sys: 185 ms, total: 844 ms
Wall time: 871 ms


In [7]:
#In this run the numpy array was roughly 20 times faster than the list (42 ms vs. 871 ms wall time).
#This is why many scientific and data science applications use numpy when they can.
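
The `%time` magic only works inside IPython or Jupyter. As a rough sketch of the same comparison in plain Python, the standard library's `timeit` module can be used. The array size here is an assumption chosen so the example runs quickly, and the measured times will differ from machine to machine.

```python
import timeit
import numpy as np

n = 100_000  # smaller than the lesson's 1,000,000 so the sketch runs quickly
my_array = np.arange(n)
my_list = list(range(n))

# Time 10 repetitions of each doubling operation
array_seconds = timeit.timeit(lambda: my_array * 2, number=10)
list_seconds = timeit.timeit(lambda: [x * 2 for x in my_list], number=10)

print('numpy array:', array_seconds)
print('python list:', list_seconds)

# Both approaches compute the same values
same_values = (my_array * 2).tolist() == [x * 2 for x in my_list]
print('same values:', same_values)
```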

<b>Copying Arrays</b>

We copy arrays with the `copy` method (the `np.copy` function does the same thing).

In [8]:
array_1 = np.array([2, 4, 6])

In [9]:
array_2 = array_1.copy()

In [10]:
print('Array one is:',array_1)
print('Array two is:',array_2)

Array one is: [2 4 6]
Array two is: [2 4 6]


In [11]:
#Let's change something in array one to show it's a separate array
array_1[0]=3
print('Array one is now:', array_1)
print('Array two is now:', array_2)

Array one is now: [3 4 6]
Array two is now: [2 4 6]


In [12]:
#We copied them successfully, now let's look at an easy mistake to make

<i>Warning: assigning an array to a new name does not copy it</i>

In [13]:
#let's illustrate
a = np.array([1,2,3])

In [14]:
b = a
print('a', a)
print('b', b)

a [1 2 3]
b [1 2 3]


Seems to work fine, but let's change something

In [15]:
a[0] = 2
print('Array a:', a)
print('Array b:', b)
print('It is the same array, we need to use the copy method')

Array a: [2 2 3]
Array b: [2 2 3]
It is the same array, we need to use the copy method
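
One way to check whether two names refer to the same data is sketched below, using the `is` operator and `np.shares_memory`. The variable names `alias` and `independent` are only illustrative.

```python
import numpy as np

a = np.array([1, 2, 3])
alias = a                # a second name for the same array, no data is copied
independent = a.copy()   # a separate array with its own data

a[0] = 99

print(alias is a)                          # True: both names refer to one object
print(np.shares_memory(a, independent))    # False: the copy has separate memory
print(alias)                               # [99  2  3]
print(independent)                         # [1 2 3]
```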


<b>Saving and loading arrays</b>

Numpy comes with built-in save and load ability. We can save the arrays to disk and then load them again.

In [16]:
#let's get our directory
import os
os.getcwd()
os.listdir()

a = [1,2,3]

#let's make a file:
np.save('array_one.npy', a)

In [17]:
#let's load the file

a1 = np.load('array_one.npy')
print(a1)

[1 2 3]
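
The sketch below repeats the save and load round trip inside a temporary directory, so it leaves no files behind, and also shows `np.savez` for storing several arrays in one file. The file names and the keyword names `first` and `second` are assumptions made for the example.

```python
import os
import tempfile
import numpy as np

a = np.array([1, 2, 3])
b = np.array([[1.5, 2.5], [3.5, 4.5]])

# Use a temporary directory so the sketch leaves no files behind
with tempfile.TemporaryDirectory() as tmp_dir:
    single_path = os.path.join(tmp_dir, 'array_one.npy')
    np.save(single_path, a)
    a_loaded = np.load(single_path)

    # np.savez stores several named arrays in one .npz file
    multi_path = os.path.join(tmp_dir, 'arrays.npz')
    np.savez(multi_path, first=a, second=b)
    with np.load(multi_path) as archive:
        first_loaded = archive['first']
        second_loaded = archive['second']

print(np.array_equal(a, a_loaded))
print(np.array_equal(b, second_loaded))
```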


<b>Numpy array basics</b>

In [18]:
#numpy has a basic object known as the array
#here's an array with only one number
a = np.array([1])
b=np.array(2)

In [19]:
print(a)
print(b)
a*b

[1]
2


array([2])

In [20]:
#More commonly, numpy arrays have only one type of value. We can initialize them with a Python list.
array_1 = [1, 2, 3]
a_1 = np.array(array_1)

#this is the same as
array_2 = np.array([4,5,6])
print(array_1, 'is a', type(array_1))
print(a_1, 'is a', type(a_1))

[1, 2, 3] is a <class 'list'>
[1 2 3] is a <class 'numpy.ndarray'>


We can add two numpy arrays together

In [21]:
ar1 = np.array([1,2,3])
ar2 = np.array([2,4,6])
ar3 = ar1+ar2
print('ar1', ar1, '+', 'ar2', ar2, '=', ar3)

ar1 [1 2 3] + ar2 [2 4 6] = [3 6 9]


In [22]:
#Let's subtract two numpy arrays:

ar4 = ar1-ar2
print(ar4)

[-1 -2 -3]


In [23]:
#Let's multiply two arrays:
ar5 = ar1*ar2
print(ar5)

[ 2  8 18]


In [24]:
#Let's divide two arrays
ar6 = ar1/ar2
print(ar6)

[0.5 0.5 0.5]


<i>Making arrays from functions</i>

We can also use functions to make arrays. One example of this is the `range` function.

In [25]:
c = np.array(range(1, 11)) #all the numbers from 1 to 10 (the end point 11 is excluded)
print(c)

[ 1  2  3  4  5  6  7  8  9 10]


In [26]:
a_squared= lambda x: x**2
a_sq = np.array([a_squared(i) for i in range(1, 10)])
print(a_sq)

[ 1  4  9 16 25 36 49 64 81]


Numpy also offers the `arange` function to create arrays without using the `range` function.

In [27]:
print(np.arange(11))

c = np.arange(21)
print(c, 'Notice how we specify 21 as the end point to have it end at 20', sep='\n')

[ 0  1  2  3  4  5  6  7  8  9 10]
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20]
Notice how we specify 21 as the end point to have it end at 20
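
Like `range`, `np.arange` also accepts a start value and a step. A short sketch (the variable names are illustrative):

```python
import numpy as np

one_to_ten = np.arange(1, 11)       # start at 1, stop before 11
by_fives = np.arange(0, 21, 5)      # step of 5, stop before 21
counting_down = np.arange(10, 0, -2)  # a negative step counts down

print(one_to_ten)
print(by_fives)
print(counting_down)
```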


In [28]:
#We can also use the np.linspace function to create an evenly spaced array of numbers
print(np.linspace(1, 10, 10)) #creates an array with numbers from 1 to 10

[ 1.  2.  3.  4.  5.  6.  7.  8.  9. 10.]


In [29]:
#The advantage of np.linspace is that it lets us create arrays with different quantities of numbers
#let's get an array of five numbers between 1 and 10
print(np.linspace(1, 10,5)) #These numbers are all evenly spaced

[ 1.    3.25  5.5   7.75 10.  ]


In [30]:
#or 10 numbers between 1 and 20
print(np.linspace(1, 20, 10))

[ 1.          3.11111111  5.22222222  7.33333333  9.44444444 11.55555556
 13.66666667 15.77777778 17.88888889 20.        ]


In [34]:
#we can also use the linspace function to create an array between 1 and 4:
a = np.linspace(1, 4, 4)
print(a)
#or 1 and 10:
b = np.linspace(1, 10, 10)
print(b)

[1. 2. 3. 4.]
[ 1.  2.  3.  4.  5.  6.  7.  8.  9. 10.]
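
The spacing that `np.linspace` uses is (stop - start) / (num - 1) when the end point is included. The sketch below checks this with the `retstep` option and shows the `endpoint` option; the variable names are illustrative.

```python
import numpy as np

# retstep=True also returns the spacing between neighbouring values
values, step = np.linspace(1, 10, 5, retstep=True)
print(values)
print(step)    # (10 - 1) / (5 - 1) = 2.25

# endpoint=False leaves out the stop value, so the step becomes (stop - start) / num
half_open = np.linspace(0, 1, 4, endpoint=False)
print(half_open)
```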


In [32]:
#Let's get the maximum and minimum values
print('This:', np.max(a_sq))
print('does the same thing as this:', a_sq.max())
#
print('Minimum:', np.min(a_sq))
print('Also the minimum: ', a_sq.min())

This: 81
does the same thing as this: 81
Minimum: 1
Also the minimum:  1


In [33]:
#Let's get the type of the array
print(a_sq.dtype)
#np.dtype(a_sq)

int64


In [30]:
#Here the default dtype for integers is int64, a 64 bit integer (the default
#depends on the platform and numpy version). For smaller values we can use
#types taking less memory.

#the default integer type on this machine is the 64 bit integer
a = [1,2,3]
a_arr = np.array(a)
print('Array', a, 'dtype:', a_arr.dtype)

#Let's create an array with a specific datatype:
d = [1,2,3,4]
d_arr = np.array(d, dtype='int32') #32 bit integer datatype
print(d_arr)

d_fl_arr = np.array(d, dtype='float32') #32 bit float datatype
print(d_fl_arr)

d_fl64_arr = np.array(d, dtype='float64') #64 bit float datatype
print(d_fl64_arr)

#Now let's make an array with integers and floats and see what happens
b=np.array([1, 2.4, 3])
print(b)
print(b.dtype)
#The integers were converted to floats, because an array holds a single dtype

Array [1, 2, 3] dtype: int64
[1 2 3 4]
[1. 2. 3. 4.]
[1. 2. 3. 4.]
[1.  2.4 3. ]
float64
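
To see how much memory a dtype actually saves, we can inspect `itemsize` and `nbytes`. This is a small sketch; the array length of 1000 and the choice of `int16` are assumptions made for the example.

```python
import numpy as np

values = np.arange(1000, dtype='int64')
small = values.astype('int16')   # astype returns a converted copy

print(values.itemsize, values.nbytes)   # 8 bytes per element, 8000 in total
print(small.itemsize, small.nbytes)     # 2 bytes per element, 2000 in total

# np.iinfo reports the range an integer dtype can hold
int16_info = np.iinfo('int16')
print(int16_info.min, int16_info.max)   # -32768 32767
```

Smaller types only work when every value fits in their range, so it is worth checking the range before converting.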


<b>Nested Arrays</b>

We can nest arrays inside one another to create matrices. To create a 3x3 matrix, we simply create an array consisting of three 1x3 subarrays.

In [31]:
data = [[1,2,3], [4,5,6], [7,8,9]]
data_array = np.array(data)
print(data_array)

[[1 2 3]
 [4 5 6]
 [7 8 9]]


In [32]:
#we can do the same thing with lists of any length:
b=[[1,2],[3,4]]
b_array=np.array(b)
print(b_array)

[[1 2]
 [3 4]]


In [33]:
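#or even different lengths. The rows are ragged, so numpy cannot build a regular
#grid; we pass dtype=object to store the Python objects as they are
#(recent numpy versions raise a ValueError without it)
a=[1, [2,3]]
a_array = np.array(a, dtype=object)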
print(a_array)

[1 list([2, 3])]


<b>Special types of arrays</b>

Numpy provides us with the ability to create several types of arrays automatically. Sometimes we don't know the contents of an array when we create it.

In [34]:
#We can make an empty array, which will usually have whatever data was in the 
#memory before we created it

a1 = np.empty((2,2))
a2 = np.empty((2,3))
print(a1)
print(a2)

#The numbers below are whatever was in the memory when the array was created.

[[2.00000000e+000 1.29074474e-231]
 [6.95034959e-310 3.47637728e-309]]
[[ 2.00000000e+000 -2.32036296e+077  1.48219694e-323]
 [ 0.00000000e+000  0.00000000e+000  4.17201348e-309]]


In [35]:
#We can make an array with all zeros:

one_by_one_zeros = np.zeros(1)
one_by_two_zeros = np.zeros(2)
print(one_by_one_zeros)
print(one_by_two_zeros)

#We can also use the shape of an existing array:
a=[[1,2,3],[4,5,6], [7,8,9]]
a=np.array(a)
print('array a', a)
a_zeros = np.zeros_like(a)
print('an array with all zeros with the same shape as a', a_zeros)

[0.]
[0. 0.]
array a [[1 2 3]
 [4 5 6]
 [7 8 9]]
an array with all zeros with the same shape as a [[0 0 0]
 [0 0 0]
 [0 0 0]]


In [36]:
#We can also make an array with all ones:

#make a single row numpy array
one_by_three_ones = np.ones(3)
print(one_by_three_ones)

[1. 1. 1.]


In [37]:
two_by_three_ones = np.ones((2, 3))
print('two by three array with all ones', two_by_three_ones)
three_by_three_ones = np.ones((3,3))

#We can also use the shape of an existing array, like we did with zeros
d=[[1,2,3],[4,5,6], [7,8,9]]
print('array d')
d=np.array(d)
print(d)
d_ones = np.ones_like(d)
print('An array of ones with the same shape as d', d_ones)

#Be careful when passing a tuple to ones_like: the tuple is treated as the
#data of a two-element array, not as a shape, so the result has shape (2,).
arr_size = (2, 3)
e = np.ones_like(arr_size)
print('We used the ',e)

two by three array with all ones [[1. 1. 1.]
 [1. 1. 1.]]
array d
[[1 2 3]
 [4 5 6]
 [7 8 9]]
An array of ones with the same shape as d [[1 1 1]
 [1 1 1]
 [1 1 1]]
We used the  [1 1]


We can also create arrays that use any other number using `np.full`

In [38]:
#We pass in the dimensions as a tuple, and the value as an integer
full_99s = np.full((3,3), 99)
print('This array has all 99s', full_99s, sep='\n')

#full_like also treats (3,4) as data, so the result has two elements, not 3x4
full_fours = np.full_like((3,4), fill_value=4)
print('This array has all 4s', full_fours)

This array has all 99s
[[99 99 99]
 [99 99 99]
 [99 99 99]]
This array has all 4s [4 4]
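
The difference between passing a shape and passing an array-like can be sketched as follows. The variable names are illustrative.

```python
import numpy as np

shape = (2, 3)

from_shape = np.ones(shape)        # the tuple is read as dimensions
from_like = np.ones_like(shape)    # the tuple is read as the data [2, 3]

print(from_shape.shape)   # (2, 3)
print(from_like.shape)    # (2,)

# np.full takes a shape, so this gives a 2x3 array of fours
full_from_shape = np.full(shape, 4)
print(full_from_shape.shape)   # (2, 3)
```

In short, the plain functions (`ones`, `zeros`, `full`) take a shape, while the `_like` functions take an existing array to imitate.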


There is one more type of array we need to create: the identity matrix. It is a square array with ones on the main diagonal and zeros everywhere else. We use `np.eye()` to create it.

In [45]:
two_by_two_eye = np.eye(2)
print('two by two identity matrix', two_by_two_eye, sep='\n')
#We only need to enter one number because the identity matrix is square
four_by_four = np.eye(4)
print('four by four identity matrix', four_by_four, sep='\n')

two by two identity matrix
[[1. 0.]
 [0. 1.]]
four by four identity matrix
[[1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]]


In [53]:
#Note: when we multiply a matrix by the identity matrix we get the same matrix back
a = np.array([[1,2], [3,4]])
print('Multiply a times the two by two identity matrix', a.dot(two_by_two_eye), sep='\n')

Multiply a times the two by two identity matrix
[[1. 2.]
 [3. 4.]]
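
The identity property holds whichever side the identity matrix is on. A minimal check is sketched below; passing `dtype=a.dtype` is an optional choice made here so the result stays an integer array instead of becoming floats.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
identity = np.eye(2, dtype=a.dtype)   # integer identity to match a

left = identity.dot(a)
right = a.dot(identity)

print(np.array_equal(left, a))    # True
print(np.array_equal(right, a))   # True
print(left.dtype == a.dtype)      # True
```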


<b>Array datatypes</b>

In [113]:
#Each array consists of only one datatype: we can choose these datatypes when we create 
#the array:

a=np.array([1,2,3])
#Let's get the type
print(a)
print(a.dtype)

b = np.array([1,2,3], dtype='float64') #explicitly request 64 bit floats
print(b)
print(b.dtype)

#this is somewhat advanced, but if we want to limit the size of the memory 
#consumed we can use a smaller type 

b_float32 = np.array([0,2,3], dtype='float32')
print('Array b in float32:',b_float32)
print('Array b in float32 dtype', b_float32.dtype)

#Note that float32 has a smaller range and less precision (about 7 significant digits) than float64

[1 2 3]
int64
[1. 2. 3.]
float64
Array b in float32: [0. 2. 3.]
Array b in float32 dtype float32


<i>Datatypes for creating special array types</i>

Sometimes we may need to create special arrays with a certain datatype. These might include an array of ones that is made up of floats, for example. We can choose the datatype by specifying it when we create the array.

In [59]:
#Array of zeros made of integers
zeros_ints = np.zeros(3, dtype='int64')
print(zeros_ints)

#Array of zeros made of floats
zeros_floats = np.zeros(3, dtype='float64')
print(zeros_floats)

[0 0 0]
[0. 0. 0.]


<b>Multidimensional arrays</b>

As in the Nested Arrays section above, a nested list becomes a two-dimensional array. Here we look at how to access its rows and how to find its number of dimensions.

In [57]:
#We can create a matrix with nested lists in numpy
data = [[1,2,3], [4,5,6], [7,8,9]]
data_array = np.array(data)

In [58]:
#Let's print out our array and get some information about it
print(data_array)

[[1 2 3]
 [4 5 6]
 [7 8 9]]


In [59]:
#Let's access the individual elements:
#The first row:
print('The first row is: ', data_array[0])
print('The second row is: ', data_array[1])
print('The third row is: ', data_array[2])

In [92]:
#let's get the number of dimensions in data_array
data_array.ndim
#The number of dimensions in an array is also called the array's rank
print('The rank of the array is', data_array.ndim)

In [96]:
#We can also go to arrays with more than two dimensions by nesting lists to
#create them

three_dim_list = [[[2,3],[1,3]],[[3,2], [8, 1]]]
result = np.array(three_dim_list)
print(result)
print('The rank of the array is', result.ndim)

In [154]:
#We access these arrays via the index
print('The first subarray', result[0])
print('The first row of the first subarray', result[0,0])
print('The first element in the first row', result[0,0,0])

#likewise
print('The first element in the second row of the second subarray: ', result[1,1,0])

The first subarray [[2 3]
 [1 3]]
The first row of the first subarray [2 3]
The first element in the first row 2
The first element in the second row of the second subarray:  8
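
Each index we supply removes one dimension from the result. The sketch below rebuilds the same three-dimensional array and prints the shape at each step.

```python
import numpy as np

result = np.array([[[2, 3], [1, 3]], [[3, 2], [8, 1]]])

print(result.shape)        # (2, 2, 2)
print(result.ndim)         # 3
print(result.size)         # 8 elements in total
print(result[1].shape)     # (2, 2): one index leaves a 2D array
print(result[1, 1].shape)  # (2,): two indices leave a 1D array
print(result[1, 1, 0])     # 8: three indices give a single value
```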


<b>Slicing Arrays</b>

In [60]:
#Let's access the elements within each row. It's indexed like a tuple. We start with the 
#outermost part of the array and move inwards.

#get the first row (as above):

data2 = [[1,2,3], [4,5,6], [7,8,9]]
data_array2 = np.array(data2)
print('The first row is: ', data_array2[0])
print('The first element of the first row is: ', data_array2[0, 0])
print('The second element of the first row is: ', data_array2[0, 1])
print('The third element of the first row is: ', data_array2[0, 2])
#Likewise, we access the second row of the matrix with
print('The second row is:', data_array2[1])
#get the third element of the third row
print('The third element of the third row is: ', data_array2[2,2])
#We can also index the elements from the end:
print('The last element of the first row is:', data_array2[0, -1])
print('The last row is:', data_array2[-1])

The first row is:  [1 2 3]
The first element of the first row is:  1
The second element of the first row is:  2
The third element of the first row is:  3
The second row is: [4 5 6]
The third element of the third row is:  9
The last element of the first row is: 3
The last row is: [7 8 9]


In [61]:
#Accessing elements by column
#make a list with list comprehension
a = [[i for i in range(1, 4)],[i for i in range(4, 7)],[i for i in range(7, 10)]]
#make it into an array
a = np.array(a)
print(a)
#The first column only (not displayed, because only a cell's last expression is shown)
a[:,0]
#the second column only
a[:,1]

[[1 2 3]
 [4 5 6]
 [7 8 9]]


array([2, 5, 8])
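
Slices can also select ranges of rows and columns at once. A short sketch on the same 3x3 array (the variable names are illustrative):

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

first_two_rows = a[:2]      # rows 0 and 1
last_two_cols = a[:, 1:]    # columns 1 and 2 of every row
top_right = a[:2, 1:]       # the 2x2 block in the top right corner
corners = a[::2, ::2]       # every other row and column: the four corners

print(first_two_rows)
print(last_two_cols)
print(top_right)
print(corners)
```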

In [89]:
#Changing elements of an array:
#We can change array elements with the same indexing we use to access them
a1 = [[i for i in range(1, 4)],[i for i in range(4, 7)],[i for i in range(7, 10)]]
a1 = np.array(a1)
print(a1)
#get the value of the top left element in a1:
print('Top left value in ',a1[0,0])
#change the value of the top left element to 2:
a1[0,0] = 2
print('The array has a new top left value', a1)
#replacing a column:
print('The first column of a1',a1[:,0])
a1[:,0] = [1,3,4]
print(a1)

[[1 2 3]
 [4 5 6]
 [7 8 9]]
Top left value in  1
The array has a new top left value [[2 2 3]
 [4 5 6]
 [7 8 9]]
The first column of a1 [2 4 7]
[[1 2 3]
 [3 5 6]
 [4 8 9]]


<b>Basic math operations on arrays</b>

Let's say that we have two arrays, and we want to combine them with arithmetic.
Let's start with two arrays filled with random integers.

In [164]:
#We can get random numbers using numpy. Let's get random integers with numpy's
#random.randint function
random_3_3 = np.random.randint(-2,2, size=(3,3))
print(random_3_3)
#Let's make another array of the same dimensions:
random2 = np.random.randint(-2, 4, size=(3, 3))
print(random2)

[[-1  1  1]
 [ 1 -2 -1]
 [-1  0 -1]]
[[ 0  0  3]
 [-2  2  2]
 [-1  3  3]]


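We don't need to know how to do matrix multiplication for this lesson, but we do need to know that matrices are not multiplied element by element. Instead, each row of the first matrix is multiplied by each column of the second.

<i>The `*` operator runs without error, but it multiplies element by element, so it is not matrix multiplication.</i>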

In [167]:
print('Multiply each element by its corresponding element')
print(random_3_3*random2)

Multiply each element by its corresponding element
[[ 0  0  3]
 [-2 -4 -2]
 [ 1  0 -3]]


To compute the matrix product of two arrays, we use the `dot` method (or the equivalent `np.dot` function).

In [169]:
print('Proper matrix multiplication')
print(random_3_3.dot(random2))

Proper matrix multiplication
[[-3  5  2]
 [ 5 -7 -4]
 [ 1 -3 -6]]
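
For two-dimensional arrays, the `@` operator and `np.matmul` give the same result as `dot`. The sketch below uses small fixed arrays (chosen for this example instead of random ones) so the results can be checked by hand.

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])
y = np.array([[0, 1], [1, 0]])

elementwise = x * y        # multiplies matching positions
matrix_product = x @ y     # rows of x times columns of y

print(elementwise)         # [[0 2] [3 0]]
print(matrix_product)      # [[2 1] [4 3]]
print(np.array_equal(matrix_product, x.dot(y)))         # True
print(np.array_equal(matrix_product, np.matmul(x, y)))  # True
```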


<b>Array oriented programming</b>

With numpy arrays, we can apply normal math functions across the entire array.
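
As a brief sketch of what this looks like in practice, the functions below operate on every element at once, with no explicit loop. The sample values are assumptions chosen so the results are easy to verify.

```python
import numpy as np

values = np.array([1.0, 4.0, 9.0, 16.0])

roots = np.sqrt(values)     # square root of every element: [1. 2. 3. 4.]
shifted = values * 2 + 1    # arithmetic applies to every element: [ 3.  9. 19. 33.]
total = values.sum()        # 30.0

# On a 2D array, the axis argument chooses the direction of the sum
grid = np.array([[1, 2, 3], [4, 5, 6]])
column_sums = grid.sum(axis=0)   # [5 7 9]
row_sums = grid.sum(axis=1)      # [ 6 15]

print(roots, shifted, total)
print(column_sums, row_sums)
```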

<b>Manipulating array sizes</b>

We can change the shape of arrays with numpy

In [49]:
a = np.array(((1,2), (3,4)))
print(a)

[[1 2]
 [3 4]]


In [50]:
#Let's get the shape of a
print('The shape of array a is:', a.shape)

The shape of array a is: (2, 2)


In [51]:
#Now let's make array a into a one-dimensional array with all the elements in the same row
a_flat = a.flatten()
print(a_flat)
#Flatten creates a new copy of an array, so we can use it even if we change the original array
a[0] = ([3,4])
print(a)

[1 2 3 4]
[[3 4]
 [3 4]]


In [52]:
#If we don't need an independent copy, and we just want to view the array flattened, we can use
#the "ravel" method, as in "unravel". It returns a view of the same data whenever possible.
a_view = a.ravel()

In [53]:
#Let's look at ravel some more. We will create a new array and then use ravel
b = a.copy()
print('The original array b',b)
print('The flattened view of the array' , b.ravel())
print('ravel() does not change the original array:', b)

The original array b [[3 4]
 [3 4]]
The flattened view of the array [3 4 3 4]
ravel() does not change the original array: [[3 4]
 [3 4]]


Remember the difference between flatten and ravel. Flatten always returns a new flat array with a copy of the elements, while ravel returns a view of the existing array whenever it can.

If we change the original array, then we've also changed the view.

In [90]:
#let's make a new array we'll call original
original = np.array([[1,2,3],[4,5,6],[7,8,9]])
print(original)

[[1 2 3]
 [4 5 6]
 [7 8 9]]


In [91]:
#now we'll make a duplicate with flatten
flat_original = original.flatten()
print(flat_original)

[1 2 3 4 5 6 7 8 9]


In [92]:
#Now we'll use ravel to create a view
ravelled_original = original.ravel()
print('We have called ravel:' ,ravelled_original)

We have called ravel: [1 2 3 4 5 6 7 8 9]


In [93]:
#now we'll manually swap two of the rows in the original.

original[1]=[1,2,3] #replace the second row with the first row
original[0]=[4,5,6] #replace the first row with the second row

#print the original again
print(original)
print('The ravelled version has changed', ravelled_original)
print('But the flattened version has not', flat_original)

[[4 5 6]
 [1 2 3]
 [7 8 9]]
The ravelled version has changed [4 5 6 1 2 3 7 8 9]
But the flattened version has not [1 2 3 4 5 6 7 8 9]
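
Whether a flattened result is a copy or a view can be checked directly with `np.shares_memory`. The sketch below also shows one case, a transposed array, where `ravel` cannot return a view and copies the data instead.

```python
import numpy as np

original = np.array([[1, 2, 3], [4, 5, 6]])

flat_copy = original.flatten()
flat_view = original.ravel()
print(np.shares_memory(original, flat_copy))   # False: flatten copied the data
print(np.shares_memory(original, flat_view))   # True: ravel returned a view

# A transpose is not stored row by row, so ravel has to copy here
transposed_ravel = original.T.ravel()
print(transposed_ravel)                              # [1 4 2 5 3 6]
print(np.shares_memory(original, transposed_ravel))  # False
```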


<b>Reshaping Numpy Arrays</b>

We can change the shape of numpy arrays. For example, we can create a one-dimensional sequence and then rearrange it into multiple rows, as long as the total number of elements stays the same.

In [11]:
new = np.linspace(1,9,9)
print('Before any changes:', new)
#let's call the resize method on it, which changes the shape in place
new.resize(3,3)
print('After reshaping to 3x3:', new, sep='\n')

Before any changes: [1. 2. 3. 4. 5. 6. 7. 8. 9.]
After reshaping to 3x3:
[[1. 2. 3.]
 [4. 5. 6.]
 [7. 8. 9.]]
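
The `reshape` method is a common alternative to `resize`: it returns a reshaped array and leaves the original untouched. A minimal sketch, using a 12-element array chosen for this example:

```python
import numpy as np

new = np.arange(1, 13)

three_by_four = new.reshape(3, 4)
two_rows = new.reshape(2, -1)   # -1 lets numpy work out the 6 columns

print(new.shape)            # (12,): reshape left the original unchanged
print(three_by_four.shape)  # (3, 4)
print(two_rows.shape)       # (2, 6)

# The element count must match, otherwise reshape raises ValueError
reshape_failed = False
try:
    new.reshape(5, 5)
except ValueError as error:
    reshape_failed = True
    print('Cannot reshape:', error)
```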
