# Lesson: numpy Package

numpy is a very important package, described as "the fundamental package for scientific computing with Python". It is the underpinning for pandas and many AI and ML packages.

It provides the np.ndarray data type (array for short). Unlike lists, all the elements in an array must be the same data type (usually ints or floats). We can do element-wise operations on these arrays to make our code simpler and shorter. Operations on arrays are usually much faster than the equivalent loops over lists.
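
As a rough illustration of both points, the sketch below squares the same values with a list comprehension and with an element-wise array operation, then times each with the standard-library `timeit` module. The size `n` and the repeat count are arbitrary choices, and the measured times depend on the machine, so no particular speed-up is claimed here.

```python
import timeit

import numpy as np

n = 100_000  # arbitrary size chosen for illustration
values_list = list(range(n))
values_array = np.arange(n, dtype=np.int64)

# Element-wise operation: no explicit loop is needed for the array
squares_list = [v * v for v in values_list]
squares_array = values_array * values_array

# Both approaches compute the same values
same_result = squares_list == squares_array.tolist()
print("same result:", same_result)

# Time 10 repetitions of each approach (results vary by machine)
list_time = timeit.timeit(lambda: [v * v for v in values_list], number=10)
array_time = timeit.timeit(lambda: values_array * values_array, number=10)
print(f"list: {list_time:.4f}s, array: {array_time:.4f}s")
```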


In [1]:
import numpy as np

## Create numpy arrays

Create a numpy array from a list.  
We will use this np_squares array in many of the code cells below.

In [4]:
np_squares = np.array([1, 4, 9, 16, 25])
print("np_squares:", np_squares)
print("type:", type(np_squares))

np_squares: [ 1  4  9 16 25]
type: <class 'numpy.ndarray'>
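
Because every element shares one data type, numpy converts mixed input to a common type when it builds the array. The short sketch below, using made-up values, shows a list of ints and floats becoming a float array; the `dtype` attribute reports the element type.

```python
import numpy as np

# Mixing ints and floats: the ints are converted to floats
mixed = np.array([1, 2.5, 3])
print("mixed:", mixed)
print("dtype:", mixed.dtype)  # float64

# The element type can also be requested explicitly
as_float = np.array([1, 4, 9], dtype=np.float64)
print("as_float:", as_float)
```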


A numpy array can be 1, 2 or more dimensions.

In [5]:
#  A numpy array can also be 2 or more dimensions
list_of_lists = [[1,2,3], [4,5,6]]
a2 = np.array(list_of_lists) # from a list of lists
a2, a2.shape, a2.ndim, a2[0,0], type(a2[0,0])

(array([[1, 2, 3],
        [4, 5, 6]]),
 (2, 3),
 2,
 np.int64(1),
 numpy.int64)

An array has many useful properties, such as its shape and its number of dimensions.

In [8]:
print("shape:", a2.shape)
print("number of dimensions:", a2.ndim)


shape: (2, 3)
number of dimensions: 2


Access the array elements with index notation.

In [7]:
print("element in first row first column:", a2[0,0])
print("type:", type(a2[0,0]))

element in first row first column: 1
type: <class 'numpy.int64'>
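
Index notation on a 2D array can also pick out a whole row or column. The sketch below rebuilds the same 2 x 3 values under the illustrative name `grid`, so it runs on its own.

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6]])

first_row = grid[0]          # all of row 0
first_column = grid[:, 0]    # column 0 from every row
last_element = grid[-1, -1]  # negative indexes count from the end

print("first row:", first_row)
print("first column:", first_column)
print("last element:", last_element)
```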


Create numpy arrays using numpy functions

In [10]:
np_zeros = np.zeros(3) # initialise all elements to 0
np_zeros

array([0., 0., 0.])

In [11]:
np_ones = np.ones(3) # initialise all elements to 1
np_ones

array([1., 1., 1.])
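
The outputs above show floats because `np.zeros` and `np.ones` default to a float data type. As a small sketch, the shape can also be given as a tuple to get a 2D array, and `dtype` selects a different element type; `np.full` fills an array with any chosen value. The variable names here are illustrative only.

```python
import numpy as np

zeros_2d = np.zeros((2, 3))       # 2 rows, 3 columns of 0.0
ones_int = np.ones(3, dtype=int)  # integer ones instead of floats
sevens = np.full(4, 7)            # four elements, all equal to 7

print(zeros_2d)
print(ones_int)
print(sevens)
```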

In [13]:
# Create a numpy array with a set of random values
np_random_normal = np.random.randn(5) # standard normal distribution
np_random_normal

array([ 1.5786748 , -0.56674104, -0.42957394, -0.73846063, -0.79449132])

In [14]:
# simulate 10 throws of a fair die - a discrete, uniform distribution
# high is exclusive, so high = 7 gives values from 1 to 6
dice_throws = np.random.randint(low = 1, high = 7, size = 10) 
dice_throws

array([5, 4, 1, 5, 1, 1, 5, 3, 4, 3], dtype=int32)
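
The random cells above give different values on every run. If repeatable results are wanted, one option is numpy's `Generator` interface, sketched below with an arbitrary seed of 42. This is an alternative to the `np.random.randint` call used in this lesson, not something the lesson requires. Note that `integers`, like `randint`, excludes the upper bound by default.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # the seed value is an arbitrary choice

# low is inclusive, high is exclusive, so high=7 gives values 1 to 6
dice_throws = rng.integers(low=1, high=7, size=10)
print("dice throws:", dice_throws)

# A generator created with the same seed repeats the same sequence
rng_again = np.random.default_rng(seed=42)
repeat_throws = rng_again.integers(low=1, high=7, size=10)
print("same again:", np.array_equal(dice_throws, repeat_throws))
```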

Create a 2D array with random values

In [16]:

a2 = np.random.randn(4, 2) 
print("array:\n", a2) 
print("shape:", a2.shape)

array:
 [[-0.77778751  1.05493573]
 [-0.24322534 -1.06412731]
 [-0.00541027 -1.47459934]
 [-0.16191043  1.33885742]]
shape: (4, 2)


## Operate on numpy arrays
We can take advantage of element-wise operations. We don't need to loop through the elements of the array.

Add (or multiply, subtract, divide...) a constant scalar value to each element in the array

In [17]:
np_squares, np_squares + 10

(array([ 1,  4,  9, 16, 25]), array([11, 14, 19, 26, 35]))

Add (or multiply...) two arrays.  This is an element-wise operation.  The arrays must have the same shape (or shapes that numpy can broadcast to match).

In [18]:
a = np.array([3,4,5])
b = np.array([30,40,50])
print("add two arrays:", a + b)
print("multiply two arrays:", a * b)


add two arrays: [33 44 55]
multiply two arrays: [ 90 160 250]
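
To see what the element-wise operations replace, here is the same addition written with plain lists, which needs an explicit loop (a list comprehension with `zip`). The sketch reuses the values of `a` and `b` from the cell above.

```python
import numpy as np

a = np.array([3, 4, 5])
b = np.array([30, 40, 50])

# With lists, + concatenates, so an explicit loop is needed to add pairs
list_sum = [x + y for x, y in zip([3, 4, 5], [30, 40, 50])]
list_joined = [3, 4, 5] + [30, 40, 50]
print("list sum:", list_sum)        # [33, 44, 55]
print("list + list:", list_joined)  # [3, 4, 5, 30, 40, 50]

# With arrays, + adds the elements pairwise
array_sum = a + b
print("array sum:", array_sum)
```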


## Filter arrays with boolean expressions

A boolean expression on an array will return an array of booleans the same length as the array.

In [19]:
np_squares = np.array([1, 4, 9, 16, 25])
np_squares < 15

array([ True,  True,  True, False, False])

We can use this boolean array inside the square brackets (boolean indexing) to return a smaller array with only those elements that meet the criteria.

In [20]:
np_squares[np_squares < 15]

array([1, 4, 9])
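
Conditions can also be combined. A minimal sketch, reusing the `np_squares` values: numpy uses the operators `&`, `|` and `~` here rather than Python's `and`, `or` and `not`, and each condition needs its own parentheses. The result names are illustrative.

```python
import numpy as np

np_squares = np.array([1, 4, 9, 16, 25])

# & means "and", | means "or", ~ means "not"
between = np_squares[(np_squares > 3) & (np_squares < 20)]
outside = np_squares[(np_squares < 3) | (np_squares > 20)]
not_small = np_squares[~(np_squares < 15)]

print("between 3 and 20:", between)
print("below 3 or above 20:", outside)
print("not below 15:", not_small)
```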

## Slicing and Indexing Arrays

We can index and slice arrays in the usual fashion.

In [21]:
# returns the element at index 2
indexed = np_squares[2] 
indexed

np.int64(9)

In [22]:
# returns the elements from index 2 up to, but not including, index 4
sliced = np_squares[2:4] 
sliced

array([ 9, 16])
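
One difference from lists is worth knowing: a slice of a numpy array is a view onto the same data, not a copy, so changing the slice also changes the original. The sketch below shows this, and uses `.copy()` when an independent array is wanted. The names `view` and `independent` are illustrative.

```python
import numpy as np

np_squares = np.array([1, 4, 9, 16, 25])

view = np_squares[2:4]  # a view: shares data with np_squares
view[0] = 99            # also changes np_squares[2]
print("after changing the view:", np_squares)

independent = np_squares[2:4].copy()  # a copy: has its own data
independent[0] = -1                   # np_squares is not affected
print("after changing the copy:", np_squares)
```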

## Math (aggregation) operations

We can aggregate arrays of numeric elements.


In [23]:
print("np_squares:", np_squares)
print("total sum:", np_squares.sum())
print("smallest value:", np_squares.min())
print("largest value:", np_squares.max())
print("average:", np_squares.mean())


np_squares: [ 1  4  9 16 25]
total sum: 55
smallest value: 1
largest value: 25
average: 11.0
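
On a 2D array the same methods accept an `axis` argument to aggregate down the columns or along the rows. A small sketch with made-up values, using the illustrative name `table`:

```python
import numpy as np

table = np.array([[1, 2, 3], [4, 5, 6]])

total = table.sum()               # every element
column_sums = table.sum(axis=0)   # one result per column
row_sums = table.sum(axis=1)      # one result per row
row_means = table.mean(axis=1)    # average of each row

print("sum of everything:", total)         # 21
print("sum of each column:", column_sums)  # [5 7 9]
print("sum of each row:", row_sums)        # [ 6 15]
print("mean of each row:", row_means)      # [2. 5.]
```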


## Reshape Arrays

In [25]:
evens = np.arange(2, 26, 2) # start at 2, stop before 26, step by 2
print("evens\n", evens)
print("shape:", evens.shape) # note the trailing comma in the result to indicate a tuple of 1 element

evens
 [ 2  4  6  8 10 12 14 16 18 20 22 24]
shape: (12,)


In [26]:
reshaped = evens.reshape(3, 4) # reshape to 3 rows, 4 columns
print("reshaped\n", reshaped)
print("shape:", reshaped.shape)

reshaped
 [[ 2  4  6  8]
 [10 12 14 16]
 [18 20 22 24]]
shape: (3, 4)
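
Two related details, sketched with the same `evens` values: passing -1 for one dimension lets numpy work it out from the array's size, and the new shape must hold exactly the same number of elements, otherwise `reshape` raises a `ValueError`. The names `two_rows` and `flat` are illustrative.

```python
import numpy as np

evens = np.arange(2, 26, 2)  # 12 elements

# -1 asks numpy to work out that dimension from the array's size
two_rows = evens.reshape(2, -1)
print("shape with 2 rows:", two_rows.shape)  # (2, 6)

# ravel() turns the array back into one dimension
flat = two_rows.ravel()
print("flat shape:", flat.shape)  # (12,)

# 5 x 3 = 15 does not match 12 elements, so this raises ValueError
try:
    evens.reshape(5, 3)
except ValueError as error:
    print("cannot reshape:", error)
```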


## Stack Arrays

In [27]:
# Create a couple of 2D arrays
array1 = np.array([[1,2,3], [4,5,6]])
array2 = np.array([[7,8,9], [10,11,12]])
print("array1\n", array1)
print("array2\n", array2)


array1
 [[1 2 3]
 [4 5 6]]
array2
 [[ 7  8  9]
 [10 11 12]]


In [28]:
# Stack vertically
vstacked = np.vstack((array1, array2)) 
print("vstacked\n", vstacked)

vstacked
 [[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]


In [29]:
# Stack horizontally
hstacked = np.hstack((array1, array2))
print("hstacked\n", hstacked)

hstacked
 [[ 1  2  3  7  8  9]
 [ 4  5  6 10 11 12]]
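
For 2D arrays like these, the same results can be produced with `np.concatenate` and an explicit `axis`. A brief sketch, reusing `array1` and `array2` from above; the result names are illustrative.

```python
import numpy as np

array1 = np.array([[1, 2, 3], [4, 5, 6]])
array2 = np.array([[7, 8, 9], [10, 11, 12]])

# axis=0 joins along rows, matching np.vstack for 2D arrays
rows_joined = np.concatenate((array1, array2), axis=0)
# axis=1 joins along columns, matching np.hstack for 2D arrays
columns_joined = np.concatenate((array1, array2), axis=1)

print("rows joined shape:", rows_joined.shape)        # (4, 3)
print("columns joined shape:", columns_joined.shape)  # (2, 6)
```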
