# 3. NumPy

### Sustainable Investment Group/Biokind Analytics

##### Lucien Chen, Pranay Jha

### 3.1 What is NumPy and why do we use it?

NumPy, short for Numerical Python, is an essential Python library for scientific computing. It provides a multidimensional array object, derived objects such as matrices, and efficient array operations with applications in linear algebra, statistics, random simulation, and more. It is widely used across science and engineering and underpins other Python libraries such as pandas and scikit-learn.

### 3.2 Libraries and Import Statements

In [2]:
# we install packages with pip; the leading '!' runs the line as a shell command from the notebook cell
! pip install numpy



A library in Python is a set of modules of code that can be reused in various applications. We import them with the following syntax:

In [3]:
import numpy as np

We type `import` followed by the library name, `numpy`, and use `as np` to give it a shorter alias that keeps our code readable. When we see `np` in our code, we know that the object or function comes from the NumPy library.

### 3.3 NumPy arrays and ndarrays: shape, indexing, slicing, iterating

Initializing an array in NumPy is simple with the `np.array` function.

In [4]:
# an example of a one-dimensional array
arr = np.array([1, 2, 3, 4, 5, 6])
arr

array([1, 2, 3, 4, 5, 6])

In [5]:
# we can also initialize a two dimensional array with a nested list
two_d = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
two_d

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

In [6]:
# numpy is also zero-indexed like base python
# accessing the first row in our two-dimensional array
two_d[0]

array([1, 2, 3, 4])

In [7]:
# accessing a specific element in a two-dimensional array
# gets the element in the last column from the second row
two_d[1][-1]

8

In [8]:
# we can also access elements with a single bracket and comma-separated indices; this is preferred because it avoids creating an intermediate array
two_d[:, 0] # : gets all rows, and 0 selects the first column

array([1, 5, 9])

In [9]:
# we can also index arbitrary elements in an array with the following syntax
two_d[np.array([0, 2]), 0] # selects rows 0 and 2 (the first and last rows) and takes the first element of each

array([1, 9])
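The section title mentions slicing, so here is a minimal sketch of it, reusing the `two_d` array from above. The variable names `last_in_second_row`, `top_left`, and `middle_cols` are only illustrative.

```python
import numpy as np

two_d = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

# one bracket with comma-separated indices: row 1, last column
last_in_second_row = two_d[1, -1]

# slices use start:stop (stop is exclusive), one slice per axis
top_left = two_d[:2, :2]     # first two rows, first two columns
middle_cols = two_d[:, 1:3]  # all rows, columns 1 and 2

print(last_in_second_row)
print(top_left)
print(middle_cols)
```

Note that `two_d[1, -1]` returns the same element as `two_d[1][-1]` but does so in a single indexing step.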

We use array attributes like `.ndim` to find the number of dimensions in our array, `.size` to find the total number of elements, and `.shape` to find the length of the array along each dimension.

In [10]:
two_d.ndim # 2

2

In [11]:
two_d.size # 12

12

In [12]:
# similar to n x m of matrices in the mathematical sense
two_d.shape # 3, 4

(3, 4)

In [13]:
# we can also iterate through numpy arrays
for x in two_d:
    print(x) # prints every row in the array

[1 2 3 4]
[5 6 7 8]
[ 9 10 11 12]


In [14]:
for x in two_d:
    for y in x:
        print(y) # prints every element, row by row

1
2
3
4
5
6
7
8
9
10
11
12


In [15]:
# nditer allows us to condense this, helpful for navigating through arrays with higher dimensionality
for x in np.nditer(two_d):
    print(x)

1
2
3
4
5
6
7
8
9
10
11
12
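If we also need the position of each element while iterating, one option is `np.ndenumerate`, which yields an index tuple together with the value. This is a small sketch on the same `two_d` array; the list `pairs` is just an illustrative name.

```python
import numpy as np

two_d = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

# ndenumerate yields ((row, column), value) for every element
pairs = []
for idx, value in np.ndenumerate(two_d):
    pairs.append((idx, int(value)))

print(pairs[0])   # index and value of the first element
print(pairs[-1])  # index and value of the last element
```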


### 3.4 Common operations: element-wise, sum(), min(), max(), median(), mean(), std(), append(), concatenate(), reshape(), flatten()

Many operators and functions on NumPy arrays are performed element-wise, meaning they are applied to each element individually.

In [16]:
np.random.seed(0) # sets a seed so everyone will have the same "random" results
arr = np.random.rand(3, 3) # creates a 3 x 3 matrix of random elements
arr

array([[0.5488135 , 0.71518937, 0.60276338],
       [0.54488318, 0.4236548 , 0.64589411],
       [0.43758721, 0.891773  , 0.96366276]])

In [17]:
# multiplying a scalar by the matrix
2 * arr

array([[1.09762701, 1.43037873, 1.20552675],
       [1.08976637, 0.8473096 , 1.29178823],
       [0.87517442, 1.783546  , 1.92732552]])

In [18]:
# adding 1 to each element in the matrix
arr + np.ones((3, 3)) # we will cover np.ones later on

array([[1.5488135 , 1.71518937, 1.60276338],
       [1.54488318, 1.4236548 , 1.64589411],
       [1.43758721, 1.891773  , 1.96366276]])

In [19]:
arr.sum() # adds up every element

5.7742213143196475

In [20]:
arr.sum(axis=0) # sums down the rows (along axis 0), giving one total per column

array([1.5312839 , 2.03061717, 2.21232025])

In [21]:
arr.sum(axis=1) # sums across the columns (along axis 1), giving one total per row

array([1.86676625, 1.6144321 , 2.29302297])
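The `axis` argument is easier to follow with small integers. In this sketch, `m` is a hypothetical 2 x 3 array chosen so the sums can be checked by hand.

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])

# axis=0 collapses the rows: 1+4, 2+5, 3+6
col_sums = m.sum(axis=0)

# axis=1 collapses the columns: 1+2+3, 4+5+6
row_sums = m.sum(axis=1)

print(col_sums)  # [5 7 9]
print(row_sums)  # [ 6 15]
```

The axis we name is the one that disappears from the result, which is why `axis=0` on a 2 x 3 array returns 3 values.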

In [22]:
arr.max() # finds the max

0.9636627605010293

In [23]:
arr.min() # finds the min

0.4236547993389047

In [24]:
np.median(arr) # finds the median

0.6027633760716439

In [25]:
arr.mean() # finds the mean

0.6415801460355164

In [26]:
arr.std() # finds the standard deviation

0.17648980804276407

Unlike Python lists, NumPy arrays have a fixed size once created, so appending an element returns a new copy rather than modifying the array in place. This is one advantage Python lists have over NumPy arrays: to keep the changes we make to an array, we must reassign the result each time.

In [27]:
np.append(arr, np.array([[0, 0, 0]]), axis=0) # when specifying the axis, the shapes must match, which is why we pass a 1 x 3 array rather than a 1-dimensional array

array([[0.5488135 , 0.71518937, 0.60276338],
       [0.54488318, 0.4236548 , 0.64589411],
       [0.43758721, 0.891773  , 0.96366276],
       [0.        , 0.        , 0.        ]])

In [28]:
np.append(arr, np.array([0, 0, 0])) # without an axis, both arrays are flattened before appending

array([0.5488135 , 0.71518937, 0.60276338, 0.54488318, 0.4236548 ,
       0.64589411, 0.43758721, 0.891773  , 0.96366276, 0.        ,
       0.        , 0.        ])

In [29]:
arr # notice how the new elements aren't there

array([[0.5488135 , 0.71518937, 0.60276338],
       [0.54488318, 0.4236548 , 0.64589411],
       [0.43758721, 0.891773  , 0.96366276]])

We can use np.concatenate to concatenate or merge two arrays along a specific axis.

In [30]:
a = np.array([[1, 2]])
b = np.array([[3, 4], [5, 6]])
np.concatenate((a, b), axis=0) # pass the arrays to concatenate as a tuple and specify the axis; the shapes must match except along that axis

array([[1, 2],
       [3, 4],
       [5, 6]])

In [31]:
np.concatenate((a, b), axis=None) # specifying axis=None returns a 1 dimensional array, known as flattening the array

array([1, 2, 3, 4, 5, 6])

In [32]:
b.flatten() # making a multidimensional array "flatter" by converting it to a one dimensional array

array([3, 4, 5, 6])

In [33]:
a # originally a 1 x 2 matrix

array([[1, 2]])

In [34]:
# we can also reshape arrays
a.reshape(2, 1) # reshapes a to 2 x 1 matrix

array([[1],
       [2]])

In [35]:
b

array([[3, 4],
       [5, 6]])

In [36]:
b.reshape(1, -1) # -1 means that we "don't know" the dimension so numpy will calculate a number for us

array([[3, 4, 5, 6]])

In [37]:
arr.flatten() # flattens our array from earlier to 1-d

array([0.5488135 , 0.71518937, 0.60276338, 0.54488318, 0.4236548 ,
       0.64589411, 0.43758721, 0.891773  , 0.96366276])
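The following sketch pulls together the `-1` placeholder and the fact that `flatten()` returns a copy. It rebuilds the 2 x 2 array `b` from above; `column`, `row`, and `flat` are illustrative names.

```python
import numpy as np

b = np.array([[3, 4], [5, 6]])

# -1 asks NumPy to infer that dimension from the total number of elements
column = b.reshape(-1, 1)  # shape (4, 1)
row = b.reshape(1, -1)     # shape (1, 4)

# flatten() always returns a copy, so editing it leaves b untouched
flat = b.flatten()
flat[0] = 99

print(column.shape, row.shape)
print(b[0, 0])  # still 3
```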

### 3.5 Preset array creation: empty(), zeros(), full(), arange(), random.rand()

NumPy has many built-in functions for creating arrays that are ready for further operations.

In [38]:
np.empty([2, 2]) # creates an array of the specified shape without initializing its values, so the contents are arbitrary

array([[-0.00000000e+000, -1.49457921e-154],
       [-9.88131292e-324,  2.82476581e-309]])

In [39]:
np.zeros([2, 2]) # creates an array of specified dimension but initializes values to be 0; marginally slower than empty

array([[0., 0.],
       [0., 0.]])

In [40]:
np.ones([2, 2]) # creates an array of specified dimension but initializes values to be 1

array([[1., 1.],
       [1., 1.]])

In [41]:
np.full([2, 2], fill_value=100) # creates an array of specified dimension with a specified value

array([[100, 100],
       [100, 100]])

In [42]:
np.arange(100) # numpy equivalent of Python's range, except that instead of returning a lazy range object it creates a 1-d array

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
       34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
       51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67,
       68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84,
       85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99])
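Like `range`, `np.arange` also accepts a start, stop, and step, and unlike `range` the step may be a float. A short sketch follows; the variable names are illustrative.

```python
import numpy as np

evens = np.arange(2, 11, 2)       # start at 2, stop before 11, step by 2
quarters = np.arange(0, 1, 0.25)  # float steps are allowed

# combining arange with reshape rebuilds the 3 x 4 array from section 3.3
two_d = np.arange(1, 13).reshape(3, 4)

print(evens)
print(quarters)
print(two_d)
```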

In [43]:
np.random.rand(2, 4) # creates a 2 x 4 matrix of random numbers drawn uniformly from [0, 1)

array([[0.38344152, 0.79172504, 0.52889492, 0.56804456],
       [0.92559664, 0.07103606, 0.0871293 , 0.0202184 ]])

### 3.6 Vector/matrix operations: transpose, dot(), matmul()

As we discussed earlier, NumPy has built-in support for linear algebra applications and operations.

In [44]:
vec = np.array([[1, 2, 3, 4, 5]]) # think of this as a 1 x 5 vector
vec

array([[1, 2, 3, 4, 5]])

In [45]:
# we can transpose matrices and vectors with .T
vec.T # converts to 5 x 1

array([[1],
       [2],
       [3],
       [4],
       [5]])

In [46]:
v = np.array([1, 2, 3])
w = np.array([4, 5, 6])
np.dot(v, w) # calculates a dot product of two arrays or "vectors"

32

In [47]:
np.outer(v, w) # creates the outer product of two vectors

array([[ 4,  5,  6],
       [ 8, 10, 12],
       [12, 15, 18]])

In [48]:
a = np.random.rand(3, 3)
b = np.random.rand(3, 3)
np.matmul(a, b) # performs matrix multiplication for two matrices

array([[0.83888711, 1.48696026, 1.0533238 ],
       [0.68217276, 1.39821275, 1.13809693],
       [0.45283551, 1.13238465, 0.51091387]])
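Matrix multiplication is easy to confuse with the element-wise `*` operator from section 3.4. The sketch below uses two small hypothetical integer matrices, `p` and `q`, so the results can be verified by hand, and also shows the `@` operator, which is shorthand for `np.matmul`.

```python
import numpy as np

p = np.array([[1, 2],
              [3, 4]])
q = np.array([[5, 6],
              [7, 8]])

matrix_product = np.matmul(p, q)  # rows of p times columns of q
operator_product = p @ q          # same result using the @ operator
elementwise = p * q               # multiplies matching entries only

print(matrix_product)
print(elementwise)
```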

In [49]:
np.linalg.eigvals(a) # finds the eigenvalues of a square matrix

array([ 2.10969709, -0.28492506,  0.44692739])
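One way to sanity-check eigenvalues is to compare their sum with the trace of the matrix, since the two should agree. This sketch uses a hypothetical symmetric matrix `s` whose eigenvalues are known to be 1 and 3; the values are sorted because `eigvals` does not guarantee an order.

```python
import numpy as np

s = np.array([[2.0, 1.0],
              [1.0, 2.0]])

eigenvalues = np.sort(np.linalg.eigvals(s))

# the sum of the eigenvalues should equal the trace (sum of the diagonal)
trace_matches = np.isclose(eigenvalues.sum(), np.trace(s))

print(eigenvalues)
print(trace_matches)
```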