# Module 3 - Common Python Libraries 1 

## <u> NumPy </u>

Working with numbers is central to almost all scientific and engineering computations. 
The topic is so important that there are many dedicated libraries to help implement efficient numerical
computations. NumPy (http://www.numpy.org/), or numerical Python, is a library that provides an extensive range of data structures and functions for numerical
computation. It provides an interface to efficiently store and operate on arrays. NumPy arrays are similar to Python's list data type, but they can be operated on much more efficiently, especially when they are large. This speed, and the range of tools that NumPy provides, have made NumPy the foundation of almost all of the data science tools available in Python.

The first thing to do is import NumPy into our script so that we can use its functions. One of the advantages of using Anaconda is that NumPy comes pre-installed. The same applies to Azure, as it comes with Conda pre-installed. If it isn't installed, refer to Module 1 for information on how to install it.


In [2]:
# We begin by importing the numpy package. We can specify how we want to refer to numpy throughout the rest of the
# script by putting our chosen alias after the 'as' keyword. In this case we have gone for 'np'.

import numpy as np

### <u> Built in documentation </u> 

As you read through this notebook there might be packages that you haven't come across, or ones where you aren't sure of the proper way to use them. If so, you can quickly explore the contents of a package using the tab-completion feature of IPython (these notebooks): type the package name followed by a dot, e.g. `np.`, and press Tab to list what it contains. To see the documentation for a function, place the cursor within the parentheses of the function and press Shift+Tab. If you press it multiple times, more information will appear on your screen.

You can also bring up the full documentation of a function by using the '?' character, or by specifying 'help()' with the function in question written within the parenthesis.
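Outside a notebook the `?` syntax is not available, but the built-in `help()` function and a function's `__doc__` attribute give access to essentially the same documentation. A minimal sketch, using `np.eye` as an arbitrary example:

```python
import numpy as np

# help() prints the full docstring, essentially what np.eye? shows in IPython
help(np.eye)

# The docstring is also stored as a plain string on the function itself
doc = np.eye.__doc__
print(doc.strip().splitlines()[0])  # the one-line summary
```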



### <u> Creating arrays from Python Lists </u> 

We have already seen Python 'lists', which hold 'arrays' of data. We can access the elements of a list using an index because the entries are stored in order. Python lists are very flexible and can hold mixed data types, e.g. combinations of floats and strings, or even lists of lists.

The flexibility of Python lists comes at the expense of performance. Many science, engineering and mathematics problems involve operations on very large collections of numbers, and for such problems computational speed is important. To serve this need, we normally use specialised functions and data structures for numerical computation, and in particular for arrays of numbers. Some of the flexibility of lists is traded for performance.

In [3]:
# Here we convert a python list into a numpy array

our_list = [1,2,3,4,5]
our_np_array = np.array(our_list)

In [4]:
# Or we can pass a list straight into np.array without storing it in a variable first

a_quick_np_array = np.array([10,20,30,40,50])

In [6]:
# Notice here the subtle difference between the outputs of the Python lists and the numpy arrays. Also look at the
# data types returned for each

print(our_list, type(our_list))
print(our_np_array, type(our_np_array))
print(a_quick_np_array, type(a_quick_np_array))

[1, 2, 3, 4, 5] <class 'list'>
[1 2 3 4 5] <class 'numpy.ndarray'>
[10 20 30 40 50] <class 'numpy.ndarray'>


Unlike Python lists, NumPy arrays are constrained to hold a single data type. When given mixed types, NumPy will, however, help out by upcasting where it can, e.g. converting integers to floats.

In [8]:
# Notice that the integers are upcast to floats

diff_dtypes_array = np.array([1., 2, 3, 4])
print(diff_dtypes_array)

[1. 2. 3. 4.]


We can, however, set the data type explicitly using the `dtype` keyword argument.

In [None]:
# Notice the output is all floats, stored as 32-bit floating point numbers

set_dtype = np.array([1, 2, 3, 4], dtype='float32')
print(set_dtype, type(set_dtype), set_dtype.dtype)
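The `dtype` keyword accepts either a string such as `'float32'` or a NumPy type object such as `np.float64`. An existing array can also be converted afterwards with the `.astype()` method, which returns a new array. A short sketch:

```python
import numpy as np

# dtype can be given as a string or as a NumPy type object
as_float32 = np.array([1, 2, 3, 4], dtype='float32')
as_float64 = np.array([1, 2, 3, 4], dtype=np.float64)
print(as_float32.dtype, as_float64.dtype)  # float32 float64

# .astype() returns a new array with the requested dtype; the original is unchanged
as_int = as_float64.astype(np.int32)
print(as_int, as_int.dtype)  # [1 2 3 4] int32
```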

NumPy arrays can also be multidimensional. One way of creating one is from a list of lists:

In [14]:
# A two dimensional array is created by having the inside lists treated as rows of the array

matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(matrix)

[[1 2 3]
 [4 5 6]
 [7 8 9]]


#### <u> Quick questions </u> 

Can you use a for loop to write a function to initialise an n dimensional array?

### <u> A more efficient method of creating arrays </u> 

There are several routines in NumPy that allow you to create large arrays very quickly. We will look at some in this section. For more information on different NumPy array building routines, look at the documentation found here: https://numpy.org/doc/stable/reference/routines.html

In [15]:
# Create a 4 x 4 identity matrix

np.eye(4)

array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.]])

In [16]:
# Create a 4 x 4 array of random integers in the interval [5, 12)

np.random.randint(5, 12, (4, 4))

array([[ 5,  5, 10, 10],
       [10,  6,  6, 10],
       [ 7, 10,  8,  6],
       [ 7, 10, 11,  6]])

In [26]:
# Create an array filled with a linear sequence. The arange function works like this: arange(start, stop, step)

linear_array = np.arange(15)
spaced_linear_array = np.arange(0, 20, 2)
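Printing these arrays shows that `arange` follows the same convention as Python's `range`: the start value is included but the stop value is not. A quick check (the float-step line is an extra illustration, not part of the original cell):

```python
import numpy as np

# arange(start, stop, step): the stop value is NOT included
linear_array = np.arange(15)               # start defaults to 0, step to 1
spaced_linear_array = np.arange(0, 20, 2)

print(linear_array)         # [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14]
print(spaced_linear_array)  # [ 0  2  4  6  8 10 12 14 16 18]

# The step can also be a float
print(np.arange(0, 1, 0.25))  # [0.   0.25 0.5  0.75]
```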

#### <u> Quick questions </u> 

> 1) Can you create an array of zeroes? Is there a NumPy function to do this? <br>
> 2) Using NumPy functions, create a 3 dimensional array of random normally distributed values <br>
> 3) Create an array of 8 values equally spaced between 0 and 2 <br>
> 4) What uses can you think of for initialising an empty array? <br>
> 5) Access the extended NumPy array docstring from the Jupyter Notebook. What information is included here? <br>
> 6) Find out the standard NumPy data types. Which ones do you think are used most commonly?


### <u> The Basics of NumPy Arrays </u> 

Much of the data manipulation in Python is done via NumPy. Even tools such as Pandas are built around NumPy arrays. In this section we will cover a few categories of basic array manipulation: indexing, slicing, attributes, reshaping and joining.

#### <u> Array Indexing </u> 

Indexing involves getting and/or setting the value of an element in an array. In NumPy, indexing is very similar to that of the indexing carried out on normal Python lists. To access the ith element of an array (remembering Python counts from 0) the desired index is specified in square brackets.  

In [21]:
# Indexing is exactly the same as it is when indexing Python lists

array_to_be_indexed = np.arange(0, 20, 2)
print(array_to_be_indexed[4])

8


In [23]:
# Remember: to index from the end of the array, use a negative index

index_from_end = np.array([2, 4, 7, 3, 7])
print(index_from_end[-2])

3
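Indexing can also be used on the left-hand side of an assignment to set values in place. A minimal sketch, reusing the values from the cell above (the name `values` is just for illustration):

```python
import numpy as np

values = np.array([2, 4, 7, 3, 7])

# Indexing on the left of "=" overwrites the element in place
values[0] = 10
values[-1] = 99

print(values)  # [10  4  7  3 99]
```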


#### <u> Performance example: computing the norm of a long vector </u> 

The norm of a vector $x$ is given by: 

$$
\| x \| = \sqrt{\sum_{i=0}^{n-1} x_{i} x_{i}}
$$

where $x_{i}$ is the $i$th entry of $x$. Equivalently, it is the square root of the dot product of the vector with itself.
To compute the norm, we could loop/iterate over the entries of the vector and sum the square of each entry, and then take the square root of the result.

We will compare the performance of two methods for computing the norm of an array of length 10 million. We first create a vector with 10 million random entries, using NumPy:

In [3]:
# Create a NumPy array with 10 million random values
x = np.random.rand(10000000)
print(type(x))

<class 'numpy.ndarray'>


We now time how long it takes to compute the norm of the vector using the NumPy function '`numpy.dot`'. We use the Jupyter 'magic command' [`%time`](Notebook%20tips.ipynb#Simple-timing) to time the operation: 

In [4]:
%time norm = np.sqrt(np.dot(x, x))
print(norm)

CPU times: user 19.7 ms, sys: 6.99 ms, total: 26.7 ms
Wall time: 36.7 ms
1825.984107874267


The time output of interest is '`Wall time`'.

> The details of how `%time` works are not important for this course. We use it as a compact and convenient tool to 
> measure how much time a command takes to execute.

We now perform the same operation with our own function for computing the norm:

In [6]:
def compute_norm(x):
    norm = 0.0
    for xi in x:
        norm += xi*xi
    return np.sqrt(norm)

%time norm = compute_norm(x)
print(norm)

CPU times: user 4.09 s, sys: 22.8 ms, total: 4.12 s
Wall time: 4.15 s
1825.9841078742058


You should see that the two approaches give the same result (up to rounding in the last few digits), but the 
NumPy function is more than 100 times faster (36.7 ms versus 4.15 s in the run above). The exact speed-up will depend on your computer.

The message is that specialised functions and data structures for numerical computations can be many orders of magnitude faster than your own general implementations. On top of that, the specialised functions are much less 
likely to have bugs!
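Outside Jupyter, where `%time` is not available, the standard library's `time.perf_counter()` can be used for rough timings. The sketch below also uses `np.linalg.norm`, NumPy's own norm function, as a further example of a specialised routine. It uses a smaller array (1 million entries, an arbitrary choice) so that the loop finishes quickly; the timings you see will depend on your machine:

```python
import time
import numpy as np

def compute_norm(x):
    """Loop-based norm, as in the example above."""
    norm = 0.0
    for xi in x:
        norm += xi * xi
    return np.sqrt(norm)

x = np.random.rand(1_000_000)  # smaller than above so the loop finishes quickly

start = time.perf_counter()
norm_builtin = np.linalg.norm(x)
t_builtin = time.perf_counter() - start

start = time.perf_counter()
norm_loop = compute_norm(x)
t_loop = time.perf_counter() - start

# Both give the same value, up to floating-point rounding
print(np.isclose(norm_builtin, norm_loop))
print(f"np.linalg.norm: {t_builtin:.4f} s, loop: {t_loop:.4f} s")
```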

#### <u> Quick questions </u> 

> 1) How would you access an item in a multidimensional array? Create some multidimensional arrays and access different items within them. <br>
> 2) Can you modify some values in the above array? <br>
> 3) What happens if you add a float into an integer array? Why could this be a problem?

#### <u> Array Slicing </u> 

Slicing involves getting smaller arrays from larger arrays. This works in much the same way as slicing Python lists: by using a colon in the square brackets we can return any subarray we choose. The standard notation is x[start:stop:step]. One important difference is that a NumPy slice is a view of the original array rather than a copy, so modifying the slice also modifies the original.

In [28]:
# We can find the first 6 elements of an array

array_to_be_sliced = np.arange(20)

array_to_be_sliced[0:6]

array([0, 1, 2, 3, 4, 5])

In [29]:
# Or we can find a middle slice

array_to_be_sliced[5:9]

array([5, 6, 7, 8])

In [32]:
# We can even find the rows or columns of a multi dimensional array

ndarray = np.array(([1,2,3], 
                    [7,8,9], 
                    [4,5,6]))

first_col = ndarray[:, 0] # Returns the first column
first_row = ndarray[0,:] # Returns the first row

print(first_col)
print(first_row)

[1 7 4]
[1 2 3]


#### <u> Quick questions </u>

> 1) What happens if you don't specify a value when slicing? What happens if you don't specify multiple values? What are the defaults? <br>
> 2) Can you reverse the direction of an array? (Hint: Remember that the index -1 is the last value of your array) <br>
> 3) How would you slice a multidimensional array?


#### <u> Array Attributes </u> 

The attributes of an array refer to its size and shape, the amount of space it takes in memory and the types of data it contains.

To illustrate the different attributes of arrays we can look at a one-, two- and three-dimensional array.

#### <u> n dimensional arrays </u> 

A one-dimensional array is a collection of numbers which we can access by index (it preserves order).

Two-dimensional arrays are very useful for arranging data in many engineering applications and for performing mathematical operations. Commonly, 2D arrays are used to represent matrices.

In [51]:
# We can use NumPy's random number generator, seeded with a set value, so that the same random arrays are generated
# each time.

np.random.seed(5) 

array1 = np.random.randint(15, size = 5) # 1D array
array2 = np.random.randint(15, size = (5, 6)) # 2D array
array3 = np.random.randint(15, size = (5, 6, 7)) # 3D array
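Seeding matters because it makes 'random' results reproducible. The sketch below re-seeds with the same value and checks that the same array comes back. It also shows the newer `np.random.default_rng` Generator interface (available in NumPy 1.17 and later), which the rest of this notebook does not rely on:

```python
import numpy as np

# Re-seeding with the same value reproduces exactly the same "random" numbers
np.random.seed(5)
first = np.random.randint(15, size=5)
np.random.seed(5)
second = np.random.randint(15, size=5)
print(np.array_equal(first, second))  # True

# Newer NumPy versions also offer a Generator object that carries its own seed
rng = np.random.default_rng(5)
print(rng.integers(15, size=5))
```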

In [47]:
# We can find the attribute 'ndim' (the number of dimensions) quite easily

print('Array 1 has', array1.ndim, 'dimension')
print('Array 2 has', array2.ndim, 'dimensions')
print('Array 3 has', array3.ndim, 'dimensions')

Array 1 has 1 dimension
Array 2 has 2 dimensions
Array 3 has 3 dimensions


In [52]:
# Finding the data type is also trivial 

print("Array 1's data type is:", array1.dtype)
print("Array 2's data type is:", array2.dtype)
print("Array 3's data type is:", array3.dtype)

Array 1's data type is: int64
Array 2's data type is: int64
Array 3's data type is: int64


In [132]:
# Finding the shape is also easy

print("Array 1's shape is:", array1.shape)
print("Array 2's shape is:", array2.shape)
print("Array 3's shape is:", array3.shape)

Array 1's shape is: (5,)
Array 2's shape is: (5, 6)
Array 3's shape is: (5, 6, 7)
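The memory-related attributes mentioned at the start of this section can be inspected in the same way. In this sketch the dtype is fixed to `int64` so that the byte counts do not depend on the platform's default integer type:

```python
import numpy as np

# Explicit dtype so the byte counts below are the same on every platform
array2 = np.zeros((5, 6), dtype=np.int64)

print(array2.size)      # 30 elements in total (5 * 6)
print(array2.itemsize)  # 8 bytes per int64 element
print(array2.nbytes)    # 240 bytes = size * itemsize
```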


#### <u> Quick questions </u>

> 1) Research what other attributes there are. In what situations would they be useful?


### <u> 2D array (matrix) operations </u>

For those who have seen matrices previously, the operations in this section will be familiar. For those who have not encountered matrices, you might want to revisit this section once matrices have been covered in the mathematics lectures.

#### <u> Matrix-vector and matrix-matrix multiplication </u>

Consider the matrix $A$:

$$
A  = 
\begin{bmatrix}
3.4 & 2.6 \\
2.1 & 4.5
\end{bmatrix}
$$

and the vector $x$:

$$
x  = 
\begin{bmatrix}
0.2 \\ -1.1
\end{bmatrix}
$$

In [7]:
A = np.array([[3.4, 2.6], [2.1, 4.5]])
print("Matrix A:\n {}".format(A))

x = np.array([0.2, -1.1])
print("Vector x:\n {}".format(x))

Matrix A:
 [[3.4 2.6]
 [2.1 4.5]]
Vector x:
 [ 0.2 -1.1]


We can compute the matrix-vector product $y = Ax$ by:

In [None]:
y = A.dot(x)
print(y)
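Since Python 3.5, the `@` operator performs matrix multiplication and gives the same result as `.dot()` for these arrays. A quick comparison:

```python
import numpy as np

A = np.array([[3.4, 2.6], [2.1, 4.5]])
x = np.array([0.2, -1.1])

# A @ x is equivalent to A.dot(x) for a 2D matrix and a 1D vector
y_dot = A.dot(x)
y_at = A @ x

print(np.allclose(y_dot, y_at))  # True
print(y_at)  # approximately [-2.18, -4.53]
```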

Matrix-matrix multiplication is performed similarly. Computing $C = AB$, where $A$, $B$, and $C$ are all matrices:

In [8]:
B = np.array([[1.3, 0], [0, 2.0]])

C = A.dot(B)
print(C)

[[4.42 5.2 ]
 [2.73 9.  ]]


The inverse of a matrix ($A^{-1}$) and the determinant ($\det(A)$) can be computed using functions in the NumPy submodule `linalg`:

In [9]:
Ainv = np.linalg.inv(A)
print("Inverse of A:\n {}".format(Ainv))

Adet = np.linalg.det(A)
print("Determinant of A: {}".format(Adet))

Inverse of A:
 [[ 0.45731707 -0.26422764]
 [-0.21341463  0.34552846]]
Determinant of A: 9.839999999999998
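A quick sanity check is that $A A^{-1}$ should equal the identity matrix, up to floating-point rounding. The sketch also shows `np.linalg.solve`, which is usually preferred over computing the inverse explicitly when the goal is to solve $Ax = b$; the right-hand side `b` here is an arbitrary example:

```python
import numpy as np

A = np.array([[3.4, 2.6], [2.1, 4.5]])
Ainv = np.linalg.inv(A)

# A times its inverse gives the identity matrix, up to rounding error
print(np.allclose(A @ Ainv, np.eye(2)))  # True

# Solving A x = b directly, and via the inverse, gives the same answer
b = np.array([1.0, 2.0])
x_solve = np.linalg.solve(A, b)
x_inv = Ainv @ b
print(np.allclose(x_solve, x_inv))  # True
```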


NumPy is a large library, so it uses submodules, such as `linalg`, to organise its functionality.

A very common matrix is the *identity matrix* $I$. We can create a $4 \times 4$ identity matrix using:

In [None]:
I = np.eye(4)
print(I)

#### <u> Array Reshaping, joining and splitting </u>

It is sometimes very important to change the shape of an array, and the 'reshape()' method makes this very easy. It can also be very useful to join arrays together or split an array up, depending on what you are doing. We present some common methods for all of these here, although the list is by no means exhaustive.

In [70]:
# Here we reshape our 1D array into a 3x3 matrix. Because reshape is a method, it is called on the object we want
# to reshape, using dot notation

our_1D_array = np.arange(9)
our_2D_reshaped_matrix = our_1D_array.reshape(3,3)

print(our_1D_array, our_1D_array.ndim)
print(our_2D_reshaped_matrix, our_2D_reshaped_matrix.ndim)

[0 1 2 3 4 5 6 7 8] 1
[[0 1 2]
 [3 4 5]
 [6 7 8]] 2
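`reshape` requires the new shape to contain the same number of elements as the original, and one dimension may be given as `-1` to let NumPy infer it. A minimal sketch with a 12-element array (chosen arbitrarily):

```python
import numpy as np

our_1D_array = np.arange(12)

# -1 tells NumPy to work out that dimension from the total number of elements
print(our_1D_array.reshape(3, -1).shape)  # (3, 4)
print(our_1D_array.reshape(-1, 6).shape)  # (2, 6)

# The new shape must hold the same number of elements (12 != 5 * 5)
try:
    our_1D_array.reshape(5, 5)
except ValueError as err:
    print("ValueError:", err)
```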


In [71]:
# To join two arrays we can use the concatenate routine

x = np.array([1, 2, 3])
y = np.array([6, 32, 6])

z = np.concatenate([x,y])
print(z)

[ 1  2  3  6 32  6]


In [72]:
# To split an array we can use the split routine. The list gives the indices at which to cut,
# so the pieces are z[:2], z[2:4] and z[4:]

z_split1, z_split2, z_split3 = np.split(z, [2, 4])
print(z_split1, z_split2, z_split3)

[1 2] [3 6] [32  6]
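`np.split` can also be given a single integer, in which case it cuts the array into that many equal pieces (and raises an error if that is not possible). `np.array_split` relaxes this and allows pieces of unequal size. A short sketch:

```python
import numpy as np

z = np.array([1, 2, 3, 6, 32, 6])

# An integer splits the array into that many equal-sized pieces
first_half, second_half = np.split(z, 2)
print(first_half, second_half)  # [1 2 3] [ 6 32  6]

# 6 elements cannot be split into 4 equal pieces, but array_split allows it
pieces = np.array_split(z, 4)
print(*pieces)  # [1 2] [3 6] [32] [6]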


#### <u> Quick questions </u>

> 1) Look up `np.newaxis`. Use it to reshape some of the above arrays <br>
> 2) Look up vstack and hstack. How do these routines differ from the concatenate routine we have given in the example? <br>
> 3) Look up hsplit and vsplit. How do these routines differ from the split routine presented in the example?


#### <u> Computations on NumPy arrays  </u>

In this section we will whizz through some common mathematical procedures that you can carry out on NumPy arrays. They all come under the banner of Universal Functions, or UFuncs. UFuncs perform 'vectorised' operations: the loop over the array elements runs in fast compiled code rather than in Python, which is one of the reasons that NumPy is so fast. This level of detail is slightly outside the scope of this course, but you can read more here: https://numpy.org/devdocs/reference/ufuncs.html

In [87]:
example1 = np.array([5, 10, 15])
example2 = np.array([3, 6, 9])

Here we look at the arithmetic operators that are available through NumPy. They will feel familiar, as they
are broadly similar to those we saw in Module 2.

In [106]:
# Each of these pairs of arithmetic functions return the same result. The operator is always given first, with 
# the equivalent UFunc coming second.

addition = example1 + example2
addition1 = np.add(example1, example2)

subtraction = example1 - example2
subtraction1 = np.subtract(example1, example2)

negation = -example1
negation1 = np.negative(example1)

multiplication = example1 * example2
multiplication1 = np.multiply(example1, example2)

division = example1 / example2
division1 = np.divide(example1, example2)

floor_divide = example1 // example2
floor_divide1 = np.floor_divide(example1, example2)

power = example1 ** example2
power1 = np.power(example1, example2)

mod = example1 % example2
mod1 = np.mod(example1, example2)

In [138]:
print() # print some of the above examples to see the results
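As an example of what the empty `print()` cell above could show, the sketch below prints a few of the results. It also checks that an operator and its UFunc agree, that `/` always returns floats, and that floor division and `%` are consistent with each other:

```python
import numpy as np

example1 = np.array([5, 10, 15])
example2 = np.array([3, 6, 9])

# The operator and its UFunc give identical results
print(np.array_equal(example1 + example2, np.add(example1, example2)))  # True

print(example1 + example2)   # [ 8 16 24]
print(-example1)             # [ -5 -10 -15]
print(example1 // example2)  # [1 1 1]
print(example1 % example2)   # [2 4 6]

# "/" always returns floats, even for integer inputs
print(example1 / example2)   # [1.66666667 1.66666667 1.66666667]

# Floor division and modulo are consistent: a == b * (a // b) + a % b
print(np.array_equal(example1, example2 * (example1 // example2) + example1 % example2))  # True
```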




Here we will look at some of the more useful aggregation functions available in NumPy. Most aggregation functions in NumPy also have a NaN-safe counterpart (for example `np.nansum` for `np.sum`), which computes the value while ignoring any missing (NaN) values.

In [112]:
example3 = np.array([10, 20, 30, 40])

In [130]:
# Try entering some of these examples into the print function below to see the outputs

_sum = np.sum(example3)
product = np.prod(example3)
mean = np.mean(example3)
std_dev = np.std(example3)
variance = np.var(example3)
_min = np.min(example3)
_max = np.max(example3)
minimum_index = np.argmin(example3)
maximum_index = np.argmax(example3)
median = np.median(example3)
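For reference, the sketch below prints a few of these aggregations for `example3`, and then shows what the NaN-safe counterparts are for, using `np.nanmean` on a small made-up array (`with_missing`) with one missing value:

```python
import numpy as np

example3 = np.array([10, 20, 30, 40])

print(np.sum(example3), np.mean(example3), np.median(example3))  # 100 25.0 25.0
print(np.argmin(example3), np.argmax(example3))                   # 0 3

# With a missing value (NaN) the ordinary function returns nan,
# while the NaN-safe counterpart ignores the missing entry
with_missing = np.array([10.0, np.nan, 30.0])
print(np.mean(with_missing))     # nan
print(np.nanmean(with_missing))  # 20.0
```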




In [129]:
print()



And finally, here we look at some of the comparison operators that NumPy provides. Each one returns an array of booleans.

In [147]:
# Each of these pairs of comparisons returns the same result. The operator is always given first, with 
# the equivalent UFunc coming second

equal = example1 == example2
equal1 = np.equal(example1, example2)

not_equal = example1 != example2
not_equal1 = np.not_equal(example1, example2)

less = example1 < example2
less1 = np.less(example1, example2)

less_equal = example1 <= example2
less_equal1 = np.less_equal(example1, example2)

greater = example1 > example2
greater1 = np.greater(example1, example2)

greater_equal = example1 >= example2
greater_equal1 = np.greater_equal(example1, example2)
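Comparisons are applied element by element and produce boolean arrays; comparing an array with a single number also works, via broadcasting. A short sketch:

```python
import numpy as np

example1 = np.array([5, 10, 15])
example2 = np.array([3, 6, 9])

# Element-by-element comparisons return arrays of booleans
print(example1 > example2)   # [ True  True  True]
print(example1 == example2)  # [False False False]

# A comparison with a single number is applied to every element
print(example1 > 7)          # [False  True  True]
```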

In [149]:
print()




#### <u> Quick questions </u>

> 1) Given how straightforward the arithmetic functions were, can you guess what the trigonometric functions are?<br>
> 2) Try out these trigonometric functions <br>
> 3) Have a practise with some exponent and logarithmic functions <br>
> 4) Try out some of the NaN-safe aggregation functions. Why is it important that these functions exist? <br>
> 5) Why is it that some of the variable names for the aggregation functions start with a _? For example, the variable name for np.min is _min, not min.

#### <u> Broadcasting  </u>

We need to be careful when performing arithmetic operations on n-dimensional arrays of different shapes and sizes. So far we have only seen operations performed on an element by element basis, but broadcasting allows these same operations to be performed on arrays of different sizes. Think, for example, of adding a scalar to an array.

In [133]:
# In this example 5 has been added to each element of the array.

a = np.array([0, 1, 2])
print(a + 5)

[5 6 7]


We can also see examples of this in higher dimensions:

In [137]:
# Here we add a 1D array to a 2D array. 

b = np.array([2, 3, 4])
matrix_of_ones = np.ones((3,3)) # We set it as a 2D 3 by 3 matrix

print(matrix_of_ones)
print('') # Added to space out the two matrices when printed to screen
print(matrix_of_ones + b)

[[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]

[[3. 4. 5.]
 [3. 4. 5.]
 [3. 4. 5.]]


#### <u> Rules of Broadcasting  </u>

Broadcasting in NumPy follows strict rules:

> 1) If the two arrays have a different number of dimensions, the shape of the one with fewer dimensions is padded with ones on its leading (left) side <br>
> 2) If the shape of the two arrays doesn't match in any dimension, then the array with shape 1 in that dimension is stretched to match the other shape <br>
> 3) If, in any dimension, the sizes disagree and neither is equal to 1, an error is raised
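As a worked example of these rules, consider adding an array of shape (3, 1) to one of shape (3,). Rule 1 pads the second shape to (1, 3), and rule 2 then stretches both arrays to (3, 3):

```python
import numpy as np

column = np.arange(3).reshape(3, 1)  # shape (3, 1): values 0, 1, 2 in a column
row = np.array([10, 20, 30])         # shape (3,)

# Rule 1: row's shape (3,) is padded on the left to (1, 3)
# Rule 2: column is stretched along axis 1 and row along axis 0, giving (3, 3)
result = column + row
print(result.shape)  # (3, 3)
print(result)
# [[10 20 30]
#  [11 21 31]
#  [12 22 32]]
```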

#### <u> Quick questions </u> 

> 1) Produce some arrays that cannot be broadcast together and study the error message that is returned. 


### <u> Congratulations </u>

You are now all set for using NumPy!