## Working with Numbers

### Introduction to Numerical Python (NumPy)

NumPy is one of the most powerful Python libraries for numerical modelling and simulation. At its core, NumPy provides n-dimensional arrays of numbers for numerical calculations, which makes it very useful for mathematical operations involving matrices. The library is very broad; however, due to the diverse backgrounds of the participants, our use of NumPy will be limited to simple arrays. For more details, kindly visit the official NumPy quickstart guide: https://numpy.org/devdocs/user/quickstart.html

Like any other library, NumPy must be imported prior to first use. A simple syntax to import numpy is given below:

<code>import numpy as np</code>

In [None]:
import numpy as np

### Creating numpy arrays

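Next, we define a list arr_elements as: <code> arr_elements = [2, 0, 1, -1, 3, 5, 2, -3, 4] </code>.

Now, we can define our first numpy array as <code> arr = np.array([arr_elements]) </code>. Note the extra pair of square brackets: it wraps the list as a single row, so the result is a 2-dimensional array with one row.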

Remember we said numpy gives n-dimensional arrays of numbers. So, the question is: how can we examine the dimensions **(nrows, ncols)** of a given array? That can be achieved using the **shape** attribute of the array. The simple syntax to do that is given as
<code>array_name.shape</code>.

In [None]:
# the shape of our previous example array is given as 

This means **arr** contains *1 row and 9 columns* or, simply put, it is a *1 x 9 array*.
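
A minimal sketch of the steps so far, using the list and array names introduced above. The name `flat` is only an illustrative addition, showing what happens without the extra brackets:

```python
import numpy as np

arr_elements = [2, 0, 1, -1, 3, 5, 2, -3, 4]

# wrapping the list in an extra pair of brackets makes it a single row
arr = np.array([arr_elements])
print(arr.shape)   # (1, 9)

# without the extra brackets, the array is 1-dimensional
flat = np.array(arr_elements)
print(flat.shape)  # (9,)
```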

You may be wondering if it's possible to change the dimensions of an array; for example, changing the above array to a 3 x 3 array. The quick answer is yes. This can be done using any of the following two syntaxes:

1. <code> np.reshape(array_name, new_shape, order) </code>
2. <code> array_name.reshape(new_shape, order) </code>

In both cases, new_shape is a tuple given by (nrows, ncols) and order is optional.
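
As a rough illustration, both syntaxes can be tried on the 1 x 9 array from earlier. The names `A2` and `A_col` are hypothetical and only serve to compare the results:

```python
import numpy as np

arr = np.array([[2, 0, 1, -1, 3, 5, 2, -3, 4]])  # the 1 x 9 array from earlier

A = np.reshape(arr, (3, 3))   # function form
A2 = arr.reshape((3, 3))      # method form, same result

# order="F" fills the new array column by column
# instead of row by row (the default, order="C")
A_col = arr.reshape((3, 3), order="F")

print(A)
print(A_col)
```

With the default order, the first three elements form the first row; with `order="F"`, they form the first column instead.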

In [None]:
# let's form a new  3 by 3 array A by reshaping our previous example

In [None]:
# let's examine the shape of A

It is obvious from this result that multiplying nrows by ncols gives the number of elements in a given array. But can we check the number of elements without first obtaining the shape of the array? Again, the answer is yes. The **size** attribute does this for us. It can be accessed using the syntax <code>array_name.size</code>

In [None]:
# let's try this with our array A
A.size

In the first example, we created an array from a single list wrapped in an outer list and, as such, ended up with an array containing a single row (a row vector in mathematical terms). We then reshaped that array into a 3 by 3 array using the reshape function, which simply partitions the original elements into three rows. This means that we could directly create arrays of any shape by specifying the individual sub-lists making up each row of the array. To achieve this, the sub-lists are first collected in a parent list. The general syntax can therefore be written as:

<code>array = np.array([sublist1, sublist2, sublist3, ..., sublistn])</code> 

Note that tuples can be used instead of sublists, but the tuples also have to be collected in a parent list as follows:

<code>array = np.array([tuple1, tuple2, tuple3, ...,tuplen])</code> 

In [None]:
# let's implement each of these
A = np.array([[2,  0,  1], [-1,  3,  5], [ 2, -3,  4]])
B = np.array([(2,  0,  1), (-1,  3,  5), (2, -3,  4)])

In [None]:
# let's see what array A is
A

In [None]:
# what about array B?
B

In [None]:
# are they equal, element by element?
A == B

Element-wise, A & B are equal, so the comparison above returns an array of True values. However, they are stored in different locations in memory and, as such, are not recognised as the same object by Python.

In [None]:
# are they the same object in memory?
A is B
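
The difference between equality and identity can be sketched as follows. The name `C` is a hypothetical addition, used to show a case where `is` does return True:

```python
import numpy as np

A = np.array([[2, 0, 1], [-1, 3, 5], [2, -3, 4]])
B = np.array([(2, 0, 1), (-1, 3, 5), (2, -3, 4)])

print((A == B).all())        # True: every pair of elements matches
print(np.array_equal(A, B))  # True: same shape and same elements
print(A is B)                # False: two separate objects in memory

C = A                        # C is just another name for the same object
print(C is A)                # True
```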

### Slicing numpy arrays

Numpy arrays can be indexed and sliced with a simple syntax as follows:
<code> array_name[start_row : stop_row , start_col : stop_col] </code>

To illustrate, let's form a numpy array using **range()** to generate the elements and then reshape the array as follows:
<code> np.array(range(36)).reshape(4,9) </code>
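
A minimal sketch of this definition is shown below. The `np.arange` variant and the name `arr_alt` are additions for comparison, not part of the original exercise:

```python
import numpy as np

arr = np.array(range(36)).reshape(4, 9)
print(arr.shape)  # (4, 9)

# np.arange builds the same array without going through range()
arr_alt = np.arange(36).reshape(4, 9)
print(np.array_equal(arr, arr_alt))  # True
```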


In [None]:
# let's define our array here

In [None]:
arr

Now, let's try out the following:
<code>
1. arr[:]
2. arr[0:3]
3. arr[0:3, 2:5]
4. arr[:3, 2:5]
5. arr[0:, 2:5]
6. arr[:, 2:5]
7. arr[2, 2:5]
</code>

In [None]:
# this returns all
arr[:]

In [None]:
# this returns first 3 rows (row 0 to row 2)
arr[0:3]

In [None]:
# this returns elements on the first 3 rows and columns #2 to #4
arr[0:3, 2:5]

<font size = 3, color = "blue"> Try the rest! </font>
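
To check your answers, the remaining slices can be sketched as below, with item 6 written as `arr[:, 2:5]` since a slice position cannot be left completely empty:

```python
import numpy as np

arr = np.array(range(36)).reshape(4, 9)

print(arr[:3, 2:5])   # same as arr[0:3, 2:5]: an omitted start means 0
print(arr[0:, 2:5])   # an omitted stop means "to the end": all 4 rows
print(arr[:, 2:5])    # a bare colon selects every row
print(arr[2, 2:5])    # a single row index drops that dimension: [20 21 22]
```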

### Creating numpy matrices

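By default, numpy operates with arrays because they are faster; however, many numerical operations involve matrices. Thus, it is important to understand how numpy handles matrices. Note that the NumPy documentation no longer recommends the matrix class for new code and suggests regular arrays instead; it is used here for illustration.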

The following syntax can be used to define a matrix or convert a numpy array to a matrix:

<code>np.matrix(array)</code>
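
As a small sketch, the 4 x 9 array from the slicing section can be converted like this (the name `my_matrix` is the one used in the cells that follow):

```python
import numpy as np

arr = np.array(range(36)).reshape(4, 9)
my_matrix = np.matrix(arr)

print(type(arr))        # <class 'numpy.ndarray'>
print(type(my_matrix))  # <class 'numpy.matrix'>
print(my_matrix.shape)  # (4, 9)
```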

In [None]:
# for example, let's convert the previous array to a matrix

Let's obtain a slice of **my_matrix** as we did for **arr**. 

In [None]:
mat_A = my_matrix[0:3, 2:5]
mat_B = my_matrix[1:4, 0:3]

In [None]:
mat_A

In [None]:
mat_B

These can now be used for matrix operations, for example, finding the **determinant of mat_A**.
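
One way to compute the determinant is `np.linalg.det`, sketched below under the assumption that `my_matrix` was built from the 4 x 9 array defined earlier. The name `det_A` is hypothetical:

```python
import numpy as np

my_matrix = np.matrix(np.array(range(36)).reshape(4, 9))
mat_A = my_matrix[0:3, 2:5]

det_A = np.linalg.det(mat_A)

# the rows of mat_A are evenly spaced, so the matrix is singular:
# the result is zero up to floating-point rounding
print(det_A)
```

`np.linalg.det` also accepts a regular square array, so the same call works on `arr[0:3, 2:5]`.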

### Basic operations with numpy arrays and matrices

Numpy operates element-by-element on arrays. Hence, most arithmetic operations involving two or more numpy arrays require that those arrays have the same shape (or shapes that numpy can stretch to a common shape). When working with numpy matrices, all the usual rules of matrix algebra must be obeyed.

Let's define two arrays A & B as follows:
<code>
A = arr[0:3, 2:5]
B = arr[1:4, 0:3]
</code>

We will use these to illustrate some mathematical operations.

In [None]:
# let's define the arrays here
A = arr[0:3, 2:5]
B = arr[1:4, 0:3]

In [None]:
# let's see each one separately
A

In [None]:
B

In [None]:
# Now, let's add A & B

Did you notice that each element in A is added to the corresponding element in B? That's why A & B must have the same shape (or compatible shapes); otherwise, Python will throw an error.
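
A short sketch of the addition, followed by a deliberately mismatched case. The array `C` is a hypothetical example chosen so that its shape cannot be combined with that of A:

```python
import numpy as np

arr = np.array(range(36)).reshape(4, 9)
A = arr[0:3, 2:5]
B = arr[1:4, 0:3]

print(A + B)
# [[11 13 15]
#  [29 31 33]
#  [47 49 51]]

# shapes (3, 3) and (2, 4) cannot be combined element by element
C = arr[0:2, 0:4]
try:
    A + C
except ValueError as err:
    print("shape mismatch:", err)
```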

Can you try **A - B**?

In [None]:
# let's do A multiplied by B

In [None]:
# how about B multiplied by A?

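Both are the same because array multiplications are also done element-by-element. Such a multiplication is known as element-wise multiplication (the Hadamard product). It should not be confused with broadcasting, which is how numpy combines arrays of different but compatible shapes.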
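
The sketch below checks that element-wise multiplication does not depend on the order of the operands and, for contrast, shows a case of broadcasting, where arrays of different but compatible shapes are combined. The array `row` is a hypothetical example:

```python
import numpy as np

arr = np.array(range(36)).reshape(4, 9)
A = arr[0:3, 2:5]
B = arr[1:4, 0:3]

# element-wise product: the order of the operands does not matter
print(np.array_equal(A * B, B * A))  # True

# broadcasting: the 1-D row is stretched across every row of A
row = np.array([1, 10, 100])
print(A * row)
# [[   2   30  400]
#  [  11  120 1300]
#  [  20  210 2200]]
```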

In [None]:
A/B

Direct division of one array by another is also possible because the operation is carried out element-by-element.

In [None]:
# Let's try similar operations with matrices instead
mat_A + mat_B

This is an element-by-element operation, just as with arrays.

In [None]:
mat_A*mat_B

Unlike with arrays, this is **NOT** an element-by-element operation but a matrix multiplication.

In [None]:
mat_B*mat_A

Unlike arrays, mat_A multiplied by mat_B is **NOT** the same as mat_B multiplied by mat_A 
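
A sketch of this comparison is given below, assuming `my_matrix` was built from the 4 x 9 array used earlier. The last lines show that the `@` operator gives the same matrix product on regular arrays:

```python
import numpy as np

arr = np.array(range(36)).reshape(4, 9)
my_matrix = np.matrix(arr)
mat_A = my_matrix[0:3, 2:5]
mat_B = my_matrix[1:4, 0:3]

print(mat_A * mat_B)  # matrix product, not element-wise
print(np.array_equal(mat_A * mat_B, mat_B * mat_A))  # False

# on regular arrays, the @ operator performs the same matrix product
A = arr[0:3, 2:5]
B = arr[1:4, 0:3]
print(np.array_equal(A @ B, mat_A * mat_B))  # True
```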

### Commonly used numpy functions

Among others, the following is a list of some of the commonly used numpy functions:

<code>
1. linspace(start, stop, n) - to generate an array of "n" evenly spaced elements from "start" to "stop" (both included) on a linear scale

2. arange(start, stop, step_size) - to generate a uniformly spaced array of numbers from "start" up to, but not including, "stop" at a given interval (step_size)

3. logspace(start, stop, n) - same as linspace() but on a log scale; "start" and "stop" are exponents of the base (10 by default)

4. sum()

5. max()

6. min()

7. mean()

8. median()
</code>

Let's try some of them out
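
As a starting point, here is a small sketch of linspace, logspace and the summary functions. The input values and the names `x`, `y` and `data` are chosen only for illustration:

```python
import numpy as np

x = np.linspace(0, 1, 5)   # 5 points from 0 to 1, both ends included
print(x)                   # 0, 0.25, 0.5, 0.75, 1

y = np.logspace(0, 3, 4)   # 4 points from 10**0 to 10**3
print(y)                   # 1, 10, 100, 1000

data = np.array([2, 0, 1, -1, 3, 5, 2, -3, 4])
print(data.sum(), data.max(), data.min())  # 13 5 -3
print(data.mean())         # 13 / 9, about 1.44
print(np.median(data))     # 2.0
```

Note that sum, max, min and mean are available both as functions (`np.sum(data)`) and as array methods (`data.sum()`), whereas median is only available as the function `np.median`.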

In [None]:
# let's apply linspace

In [None]:
# another example with linspace

In [None]:
# let's try arange

Just like range, the array stops one step before the endpoint specified.
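
A few illustrative calls make the excluded endpoint visible. The names `a`, `b` and `c` are hypothetical:

```python
import numpy as np

a = np.arange(0, 10, 2)
print(a)   # [0 2 4 6 8]  (10 itself is excluded)

b = np.arange(5)
print(b)   # [0 1 2 3 4]  (start defaults to 0 and step_size to 1)

c = np.arange(0, 1, 0.25)
print(c)   # [0.   0.25 0.5  0.75]  (non-integer steps are allowed)
```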

In [None]:
# another example with arange

In [None]:
# let's generate some random numbers with numpy
x = np.random.random_sample(50)

In [None]:
# random numbers with normal distribution
np.random.seed(1)   # to ensure reproducibility of results
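
One possible way to draw normally distributed numbers after seeding is `np.random.normal`, sketched below. The sample size of 50 mirrors the earlier cell, and the names `y` and `y_again` are hypothetical:

```python
import numpy as np

np.random.seed(1)                                   # fix the seed first
y = np.random.normal(loc=0.0, scale=1.0, size=50)   # mean 0, standard deviation 1

print(y.shape)             # (50,)
print(y.mean(), y.std())   # sample estimates, roughly 0 and 1

# re-seeding reproduces exactly the same numbers
np.random.seed(1)
y_again = np.random.normal(loc=0.0, scale=1.0, size=50)
print(np.array_equal(y, y_again))  # True
```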

In [None]:
# let's use this to generate a plot

### Working with data in a text file

Numpy provides a **loadtxt** function for loading datasets from text files. This function has several input arguments that help us control how the file is loaded. A simplified syntax for this function is provided below:

<code>np.loadtxt(filename, delimiter = None, skiprows = 0, usecols = None)</code> 

Kindly click on this link for a complete list of these arguments and their meanings:

https://numpy.org/doc/stable/reference/generated/numpy.loadtxt.html

To illustrate, we will load the file named **boston_structured.txt** with numpy and extract some data from it. It is a good idea to open the file manually and examine its contents before loading it with Python.
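
Before working with the real file, the arguments of loadtxt can be tried on a small stand-in. The sketch below uses an in-memory text buffer with made-up values and column names; it does not reflect the contents or layout of boston_structured.txt, and the names `fake_file`, `sample` and `subset` are hypothetical:

```python
import io
import numpy as np

# made-up stand-in for a text file: one header row, comma-separated columns
fake_file = io.StringIO(
    "colA,colB,colC\n"
    "0.1,10.0,1.0\n"
    "0.2,20.0,2.0\n"
    "0.3,30.0,3.0\n"
)

# skiprows=1 skips the header; delimiter="," splits each line on commas
sample = np.loadtxt(fake_file, delimiter=",", skiprows=1)
print(sample.shape)  # (3, 3)

# usecols keeps only the listed columns (here the first and the last)
fake_file.seek(0)    # rewind the buffer before reading it again
subset = np.loadtxt(fake_file, delimiter=",", skiprows=1, usecols=(0, 2))
print(subset.shape)  # (3, 2)
```

For a file on disk, the first argument would be the file name instead of the buffer; the right delimiter and number of rows to skip depend on what you see when you open the file.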

In [None]:
# let's load the data here and name it boston_data

You can see the data is loaded as an array by default. One more thing to note is that each row of the file is converted to a **sublist** while forming the array. If you examine the first element of the array, for example, you will notice that it corresponds to the first row of the table.

In [None]:
# extract the first element of the array

Now, from the file, we know what each column represents. The first column is CRIM (per capita crime rate by town). How can we extract that information from the array?

Well, we know from the array that the elements of the CRIM column form the first element of each sub-array (or sublist) in our data. So, if we iterate over the array (using **for loop**), we can extract this information.
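
The loop described above can be sketched on a small made-up array standing in for the loaded data (the values are not taken from the real file, and the names `sample` and `first_col` are hypothetical):

```python
import numpy as np

# small made-up array standing in for the loaded data
sample = np.array([[0.1, 10.0, 1.0],
                   [0.2, 20.0, 2.0],
                   [0.3, 30.0, 3.0]])

# iterate over the rows and collect the first element of each one
first_col = []
for row in sample:
    first_col.append(row[0])
print(first_col)

# the slicing syntax from earlier gives the same column in one step
print(sample[:, 0])
```

Both approaches return the same values; the loop produces a Python list, while the slice produces a numpy array.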