## Introduction to the Numpy library 

In this lecture, we will learn a few features of the **Numpy** library (from Numerical Python). 

Together with Scipy, of which we will see more next year (but you should now be able to read its documentation and learn about it on the [project page]()!), Numpy is possibly one of the two most important Python libraries for scientific computing. Note also that many other libraries, for example for Machine Learning, rely on functions and objects defined in Numpy!

In general, NumPy contains, among other things:
- A powerful object, the **numpy array**. It is so ubiquitous that we will just call it array, without further specifications. Arrays are possibly the most important part of the Numpy package, especially considering that almost any other package having to do any sort of heavy computation is actually built on them. The reason for their widespread usage is that numpy arrays allow very fast [vectorized computation](https://www.pythonlikeyoumeanit.com/Module3_IntroducingNumpy/VectorizedOperations.html). 
- Sophisticated mathematical functions
- Packages for linear algebra, random number generation (which we will see in the next lecture!) and Fourier transform.

Another aspect to keep in mind, although we will not cover it in this course, is that Numpy can be combined with tools for interfacing Python with other programming languages such as C/C++ ("Cython") and Fortran ("F2PY"). This is useful to combine the (much better) computational performance of these languages with the ease and flexibility of Python.

### An important note

The Numpy library is gigantic and **we will not cover all of it in the short time we have**. In fact, we will concentrate on *numpy arrays* and on the sub-library that allows us to generate random numbers (because it is quite important for different kinds of computational modelling techniques, such as Monte Carlo simulations). However, you can find the most comprehensive documentation describing the whole Numpy library in the [Numpy project page](https://numpy.org/devdocs/reference/index.html).  

I also suggest that you read this recent [review published in the journal Nature](https://www.nature.com/articles/s41586-020-2649-2), specifically on numpy arrays, which, besides providing a historical perspective on their development, describes the direction in which the Python community is pushing this project.

### The Numpy arrays class: attributes and selected methods

A numpy array or simply array (the actual exact Python name being *ndarray*) is an **n-dimensional array of homogeneous data types**, for example, an ordered sequence of numbers. 

Superficially, it might simply look like a less flexible Python list, because its elements *must* be of the same type (this is what we mean by homogeneous) instead of being a (potentially) heterogeneous collection like lists. There are indeed some useful properties of lists that have also been copied by numpy arrays, in particular with respect to the way we manipulate them. For example, each element of an array `a` can be retrieved by providing the set of indexes corresponding to its position. Similarly, pieces of a numpy array can be taken by slicing it as we do with lists, and numpy arrays are also copied by reference (assigning an array to a new variable copies only the reference, not the data). However, this is possibly where the similarities stop and important differences begin. Let us see a few of them:

- Operations on numpy arrays are performed in compiled code (instead of being interpreted element by element, as happens when looping over a Python list), which drastically improves performance. 

- Numpy arrays have a fixed size at creation, unlike lists, which can grow on the fly (reminder: using the `.append()` method). If instead you try to change the size of an array, a new array is created and the original is deleted.

- Numpy arrays are required to contain elements of the same data type.

- Numpy arrays speed up basic mathematical operations on large amounts of data through "vectorization" (more on this below), whereas operations on lists have to go through the elements one at a time. This is possible because arithmetic operations on numpy arrays are defined element-wise, which helps to write much more compact and readable code, in many cases replacing loops.

If what is written in the list above is not clear, do not worry. We will have a look at some examples to illustrate all these concepts!
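As a first taste of the second and third points in the list, here is a minimal sketch (the variable names are only illustrative). It assumes the function `np.append`, which does not grow an array in place but returns a new one, and it shows that mixing integers with a float makes every element be stored as a float:

```python
import numpy as np

original = np.array([1, 2, 3])

# np.append does not modify `original`: it builds and returns a NEW array
extended = np.append(original, 4)
print(original.size, extended.size)

# A list mixing integers and one float: every element is stored as a float
mixed = np.array([1, 2, 3.5])
print(mixed.dtype)
```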

#### Creating a Numpy array

The standard way to instantiate ( = create ) a one-dimensional numpy array is the following:
   
```Python
myArray = np.array( [a list of objects of the same type] )
```
for example:

```Python 
myArray = np.array( [1.0, 2.0, 3.0, 4.0 ])
myArray = np.array( [1, 2, 3, 4 ], dtype = float )
```
where in the second case we have added the optional argument `dtype = typeOfData`, where `typeOfData` can be any valid data type, for example, `float` or `int`, although Numpy typically infers which data type you want to use from the form in which the elements are provided in the list.
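To check which data type has been chosen, you can look at the `.dtype` attribute of the array. The following is a small sketch with illustrative variable names; note that the exact integer type that is printed (for example `int64` or `int32`) may depend on your platform:

```python
import numpy as np

ints = np.array([1, 2, 3, 4])                  # integers in, integer array out
floats = np.array([1.0, 2.0, 3.0, 4.0])        # floats in, float array out
forced = np.array([1, 2, 3, 4], dtype=float)   # integers in, but stored as floats

print(ints.dtype, floats.dtype, forced.dtype)
```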

We will now see a few examples of how numpy arrays work, including similarities and differences with lists. **Read and run the cells below to see what happens**. As usual, before running them *think about what you expect to see*.


In [1]:
import numpy as np

Let us first see some similarities with lists:

In [2]:
aa = np.array( range( 10 ) )

print(aa)

# You can see we can access an array as if it was a list
aa[ 1 ] = -4
aa[ 2 ] = 0
print(aa)

# Whereas this exemplifies the use of slicing techniques!
print( aa[ 1:4 ] )
print( aa[ 1::2 ] )

[0 1 2 3 4 5 6 7 8 9]
[ 0 -4  0  3  4  5  6  7  8  9]
[-4  0  3]
[-4  3  5  7  9]


...and now some differences:

In [3]:
aa = np.array( range( 5 ) )
bb = np.array( range( 5 ) )

aa2 = list( range( 5 ) )
bb2 = list( range( 5 ) )

print( "This is the results with arrays {}".format( aa + bb ) )
print( "This is the results with lists {}".format( aa2 + bb2 ) )

This is the results with arrays [0 2 4 6 8]
This is the results with lists [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]


The previous cell illustrates what we mean when we say that operations on arrays are performed element-wise (which is also called vectorisation). For example, the sum of two arrays `a` and `b` is another array `c` whose elements are `c[i] = a[i] + b[i]`, similarly to the sum of two vectors that you have met in your Math lectures (although arrays and vectors are *not* exactly the same thing!).
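To make the relation `c[i] = a[i] + b[i]` concrete, the sketch below (with made-up values) builds the sum once with an explicit loop and once with the vectorised `+`, and the two results coincide:

```python
import numpy as np

a = np.array([0, 1, 2, 3, 4])
b = np.array([10, 20, 30, 40, 50])

# Explicit loop: fill c element by element
c_loop = np.zeros(a.size, dtype=int)
for i in range(a.size):
    c_loop[i] = a[i] + b[i]

# Vectorised version: the same result in one line
c_vec = a + b

print(c_loop)
print(c_vec)
```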

Run the next cells to see what happens and compare the results between lists and (numpy) arrays:

In [4]:
# Multiplication of an array by a scalar. Before running this cell ask yourself: What do you expect?

aa = np.array( range( 10 ) )
aa2 = list( range( 10 ) )

aa *= 2
aa2 *= 2

print( "This is the results with arrays {}".format( aa ) )
print( "This is the results with lists {}".format( aa2 ) )

This is the results with arrays [ 0  2  4  6  8 10 12 14 16 18]
This is the results with lists [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]


In [5]:
# Multiplication of two arrays...What do you expect?
aa = np.array( range( 10 ) )
bb = np.array( range( 10 ) )

aa2 = list( range( 10 ) )
bb2 = list( range( 10 ) )

print( "This is the results with arrays {}".format( aa * bb ) )
print( "This is the results with lists {}".format( aa2 * bb2 ) )

This is the results with arrays [ 0  1  4  9 16 25 36 49 64 81]


TypeError: can't multiply sequence by non-int of type 'list'

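Because operations on arrays are performed element-wise, summing or multiplying two one-dimensional arrays is in general only possible when they have the same length (the main exception, known as broadcasting, is when one of them contains a single element). In fact, you will get an error when `a` and `b` have different lengths, as in the next cell. Check below!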

In [6]:
aa = np.array( range( 10 ) )
bb = np.array( range( 20 ) )

print( "The product of two arrays of different length is:" )
print( aa * bb )

The product of two arrays of different length is:


ValueError: operands could not be broadcast together with shapes (10,) (20,) 
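The word "broadcast" in the error message refers to the rules Numpy uses to combine arrays of different shapes. We will not go into the details here, but the simplest case is worth knowing: a scalar, or an array with a single element, is combined with every element of the other array. A minimal sketch:

```python
import numpy as np

aa = np.arange(5)

# A scalar is combined with every element of the array
shifted = aa + 10
print(shifted)

# An array with a single element behaves in the same way
doubled = aa * np.array([2])
print(doubled)
```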

In [7]:
# Again, clearly this would not work with lists, for which multiplication 
# is not even defined! Check by running this cell
aa = list( range( 10 ) )
bb = list( range( 10 ) )

print( "The product of the list" )
print( aa )
print( "with the list" )
print( bb )
print( "is:" )
print( aa * bb )


The product of the list
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
with the list
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
is:


TypeError: can't multiply sequence by non-int of type 'list'

As we have previously said, numpy arrays are copied *by reference*. If you want to make a "proper" hard copy (so that you can modify the elements of the copy independently), this can be done, as for lists, via the `.copy()` method, which also exists for numpy arrays!

Let us now look at a few useful commands to create certain specific numpy arrays. Read the comments in each of the subsequent cells and then run them to see what they do. 
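The difference between a reference and a hard copy can be seen in the following sketch (variable names are only illustrative). It also shows one point where arrays differ from lists: a slice of an array is a *view* on the same data, not a new independent object.

```python
import numpy as np

a = np.array([1, 2, 3])

b = a           # `b` is just another name for the SAME array
c = a.copy()    # `c` is an independent copy
view = a[:2]    # a slice is a view on the data of `a`

a[0] = 100      # modify the original...
view[1] = -7    # ...and modify it again through the slice

print(a)   # both changes are visible here
print(b)   # identical to `a`
print(c)   # still the original values
```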

In [8]:
#  If we want to create arrays with all elements equal to zero or one, 
#  this can be done via the functions np.zeros( N ) and np.ones( N ), `N` being the 
#  number of elements you want to create. Have a look!

xx = np.ones( 4 )
yy = np.zeros( 4 )
print( xx )
print( yy )

[1. 1. 1. 1.]
[0. 0. 0. 0.]


In [9]:
# The function `np.arange( x )` is instead equivalent to the declaration 
# np.array( range( x ) ) and thus creates an array of numbers from 0 to x-1. 
# Run this cell and see!

xx = np.arange( 4 )
print( xx )

[0 1 2 3]


In [10]:
# The function np.linspace( x1, x2, N ) creates an array of N elements equally 
# spaced, from x1 to x2, both included

xx = np.linspace( 1, 5, 9 )
print( xx )

[1.  1.5 2.  2.5 3.  3.5 4.  4.5 5. ]


In [11]:
# The function np.logspace( x1, x2, N ) creates an array of N elements from 10^x1 
# to 10^x2. This is similar to linspace, but elements are equally spaced in 
# logarithmic space, so that the RATIO between consecutive elements is constant

xx = np.logspace( 1, 5, 8 )
print( xx )
print( "The first element is {0}".format( xx[ 0 ] ) )
print( "The last element is {0}".format( xx[ -1 ] ) )
print( "The ratios are" )
print( "{0}/ {1} = {2}".format( xx[1], xx[0], xx[1]/xx[0] ) )
print( "{0}/ {1} = {2}".format( xx[2], xx[1], xx[2]/xx[1] ) )
print( "{0}/ {1} = {2}".format( xx[3], xx[2], xx[3]/xx[2] ) )
print( "{0}/ {1} = {2}".format( xx[4], xx[3], xx[4]/xx[3] ) )

[1.00000000e+01 3.72759372e+01 1.38949549e+02 5.17947468e+02
 1.93069773e+03 7.19685673e+03 2.68269580e+04 1.00000000e+05]
The first element is 10.0
The last element is 100000.0
The ratios are
37.2759372031494/ 10.0 = 3.72759372031494
138.94954943731375/ 37.2759372031494 = 3.7275937203149403
517.9474679231213/ 138.94954943731375 = 3.7275937203149416
1930.6977288832495/ 517.9474679231213 = 3.7275937203149376
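Another way to read this result: `np.logspace( x1, x2, N )` gives the same numbers as raising 10 to the power of each element of `np.linspace( x1, x2, N )`. The sketch below checks this with `np.allclose`, which compares two arrays up to floating-point rounding:

```python
import numpy as np

exponents = np.linspace(1, 5, 8)   # equally spaced exponents
by_hand = 10.0 ** exponents        # 10 raised to each exponent
xx = np.logspace(1, 5, 8)

print(np.allclose(xx, by_hand))
```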


Let us now have a look at a few typical methods and attributes (= object-specific functions and data) of numpy arrays. 

Read the documentation for each of them (for example, by using the help command in Jupyter, or the documentation trick shown to you in the very first lecture notes about data types!). After reading the documentation, run the following cells to see what happens. 

You might want to take note of these methods, as you should become familiar with them for the exercises provided in the other Jupyter notebook!

In [12]:
# the .sum() command

a = np.array( [1, 2, 3, 4 ] )

print( a.sum() )

10


In [13]:
# the .mean() command

a = np.array( [1, 2, 3, 4 ] )

print( a.mean() )

2.5


In [14]:
# the .std() command

a = np.array( [1, 0, 1, 0 ] )
b = np.array( [1, 1, 1, 1 ] )

print( a.std() )
print( b.std() )



0.5
0.0
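With its default arguments, `.std()` computes the square root of the mean squared deviation from the mean (that is, it divides by the number of elements `N`, not by `N - 1`). The following sketch reproduces the first result above by hand:

```python
import numpy as np

a = np.array([1, 0, 1, 0])

deviations = a - a.mean()                    # distance of each element from the mean
by_hand = np.sqrt((deviations ** 2).mean())  # square root of the mean squared deviation

print(by_hand, a.std())
```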


In [15]:
# the .size attribute (it is a data attribute, not a method: note that there are no parentheses!)

a = np.array( [1, 2, 3, 4 ] )
b = np.array( [1, 2, 3, 4, 10 ] )
c = np.array( [1, 2, 3, 4, 22, -19 ] )

print( a.size )
print( b.size )
print( c.size )

4
5
6


Sometimes it is useful to check if a condition is satisfied for all elements of an array, or for at least one of them. This can be done using the `.all()` and `.any()` methods. 

Run this cell and check what happens in the examples. You can also check using the numpy documentation (either online or printing the relevant `__doc__` for the method...).

In [16]:
# the .all( ) command...
a = np.array( [ True, True, True, False ] )
b = np.array( [ True, True, True, True ] )

print( a.all()  )
print( b.all() )

a = np.array( [ 1, 1, 1, 1 ] )
b = np.array( [ 1, 0, 1, 1 ] )

print( a.all() )
print( b.all() )



False
True
True
False


In practice, what you should realise from the previous examples is that the `.all()` method returns `True` if *all the elements of the array* are `True` and `False` otherwise (for numbers, zero counts as `False` and any non-zero value as `True`). It is equivalent to connecting all input values via a logical `AND`.

In [17]:
# the .any( ) command...
a = np.array( [ False, True, False ] )
b = np.array( [ False, False, False ] )

print( a.any()  )
print( b.any()  )

a = np.array( [ 1, 1, 1, 1 ] )
b = np.array( [ 1, 0, 1, 1 ] )

print( a.any()  )
print( b.any()  )

True
False
True
True


In practice, the `.any()` command returns `True` if *at least one element* of the array is `True` and `False` otherwise. It is like connecting all the array values via a logical `OR`.  
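These two methods become really useful when combined with a comparison: writing something like `x > 0` on an array produces an array of booleans, on which `.all()` or `.any()` can be called directly. A small sketch with made-up values:

```python
import numpy as np

temperatures = np.array([280.5, 301.2, 295.0, 310.7])  # made-up values

all_positive = (temperatures > 0).all()     # is EVERY element positive?
any_above_300 = (temperatures > 300).any()  # is AT LEAST ONE element above 300?

print(all_positive, any_above_300)
```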

When you think about arrays, you might have in mind operations like the dot or cross product between two of them. These operations are available for numpy arrays, but note that they are *not* what the `*` operator does: as we have seen, `a * b` is an element-wise product, which is different from the usual mathematical definition. Let us see with an example.

In [18]:
# the .dot( anotherArray ) command (to understand, you can experiment with other
# input values by changing the lists below in the array declaration...)

a = np.array( [1, 2, 3, 4 ] )
b = np.array( [0, 1, 0, 1 ] )

# As you shall see, a * b does not give the dot product as defined in math...
# but the dot product can be obtained from it by summing its elements!
print( a.dot( b ) )

c = a * b 
print( c.sum() )


6
6


In [19]:
# But if you want to do the cross product, you need the function
# np.cross( array1, array2 ) of the numpy library

a = np.array( [ 0, 0, 1 ] )
b = np.array( [ 1, 0, 0 ] )

print( np.cross( a, b ) )
print( np.cross( b, a ) )

aa = np.array( [ 3, 0, 0 ] )
bb = np.array( [ 1, 0, 0 ] )

print( np.cross( aa, bb ) )

[0 1 0]
[ 0 -1  0]
[0 0 0]


#### Arrays and functions 

Let us now see something extremely powerful that you can do with numpy arrays. 

If you have defined *any* function that takes a number and transforms it, you can apply the very same function to a numpy array and it will be applied element-wise! In other words, when you apply a function to an array, the return value is another array, where the function has been applied to each of its elements.  

This way of applying functions to arrays works for (almost) any function, including arithmetic operators such as +, -, but also sin, cos, tan and any mathematical function.

Look at the following example by running the cell below:

In [20]:
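def myFunction( x ):
    """This function takes the square of a number and subtracts 10"""
    # We build a new value instead of using *= and -=, because in-place
    # operators would silently modify an array passed as input
    return x * x - 10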
    
a = 4
b = -2
aaa = np.array( [ 1, 2, 10 ] )
print( myFunction( a ) )
print( myFunction( b ) )
print( myFunction( aaa ) )

6
-6
[-9 -6 90]


There is an exception to the rule above, or, better, **a certain class of functions cannot be applied to arrays** directly. These are functions that, within their body, **contain a conditional (if) statement** that tests the input itself. 

Have a look and see what happens by running this cell:

In [21]:

def myFunction2( xx ):
    """This function squares a number and subtracts 10, but only if the number is negative"""
    if xx < 0:
        xx *= xx
        xx -= 10
    else:
        pass
    
    return xx
    
a = 4
b = -2
aaa = np.array( [ 1, 2, 10 ] )
print( myFunction2( a ) )
print( myFunction2( b ) )
print( myFunction2( aaa ) )

4
-6


ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
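The error arises because `xx < 0` is an array of booleans, and Python cannot decide whether a whole array counts as `True` or `False`. One common way around this, sketched below, is the function `np.where( condition, valueIfTrue, valueIfFalse )`, which makes the choice element by element. The function name `myFunction2_vectorised` is just a hypothetical name chosen for this illustration:

```python
import numpy as np

def myFunction2_vectorised(xx):
    """Square and subtract 10 where xx is negative, leave the other elements unchanged."""
    xx = np.asarray(xx)
    # For each element: take xx*xx - 10 if the element is negative, xx otherwise
    return np.where(xx < 0, xx * xx - 10, xx)

aaa = np.array([1, -2, 10])
result = myFunction2_vectorised(aaa)
print(result)
```

Note that both alternatives are computed for the whole array before the selection takes place, so this is a sketch for simple expressions rather than a general replacement for `if`.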

#### Some special numpy functions

Numpy comes with almost all mathematical functions that you possibly know: for example, you can call sine, cosine, tangent and exponential simply via

- np.sin( x )
- np.cos( x )
- np.tan( x )
- np.exp( x )

You can look it up in the documentation at the link given in the first cell if you are interested in knowing all of them...or just try to use these functions little by little when building new programs! 

Here I just want to present a few which you should definitely know because of their very general usefulness in analysing data. In this regard, let us look at a couple of special, non-arithmetic functions to manipulate data, for example, functions for sorting data and for searching through long sets of data. Run the cells below and see what happens!

In [22]:
# The np.sort( x ) function. Run to see what it does.

x = np.array( [-1,5983,3,434897,43095,312,23,45 ] )
print( np.sort( x ) )



[    -1      3     23     45    312   5983  43095 434897]


In [23]:
# The np.argmax( x ) function...what does it do? Run and see, or check
# the documentation of the argmax method
x = np.array( [-1,5983,3,434897,43095,312,23,45 ] )
indexOfMax = np.argmax( x )

print( indexOfMax )
print( x[ indexOfMax ] )


3
434897


In [24]:
# Or the opposite, the np.argmin( x ) function...(this returns the index of the minimum
# value in the array )
x = np.array( [ 1, 5983, 3, 434897, 43095, 312, -23, 45 ] )
indexOfMin = np.argmin( x )

print( indexOfMin )
print( x[ indexOfMin ] )

6
-23


In [25]:
# Finally, the extract function
#
# np.extract( condition, array )
#
# which returns the elements satisfying a certain condition
#
x = np.array( [ 0, 1, 1, 2, 0 ,1 ] )



Remember that operations are applied element-wise to arrays and the returned value is an array containing their results. Hence, the following creates an array of the same size as `x` above, containing `True` where the condition is satisfied and `False` otherwise. 

Look at the following code:

In [26]:
condition = x > 0 
print( condition )
z = np.extract( condition, x )
print(z)

[False  True  True  True False  True]
[1 1 2 1]
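The same selection can also be written by placing the boolean array directly inside the square brackets (this is often called boolean indexing). The following sketch checks that the two approaches give the same result for a one-dimensional array:

```python
import numpy as np

x = np.array([0, 1, 1, 2, 0, 1])
condition = x > 0

z_extract = np.extract(condition, x)
z_index = x[condition]     # boolean indexing

print(z_index)
print(np.array_equal(z_extract, z_index))
```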


#### Masked Arrays
 
Masked arrays are a slightly different type of object from normal numpy arrays (in fact, they have been derived from them, in the sense that they **inherited** most of the functionalities of standard arrays).    

In general, masked arrays have been developed to work with arrays that may have missing or invalid entries, a common situation when dealing with large datasets. For example, a sensor may have failed to record a data point, or, due to a malfunction, recorded a negative value for a quantity like the absolute temperature, which we know can only be positive.  

In order to use masked arrays, we need to import the `numpy.ma` module of the standard numpy library:

```Python
import numpy.ma as ma  # (we load the subpackage numpy.ma, this has to be done once only)
```

Then we can instantiate masked arrays. 

In practice, a masked array is the combination of a standard numpy array and a **mask**. When instantiating a masked array, this mask can be set to `nomask`, indicating that no value of the associated array is invalid, or it can be another array, of the same length, containing boolean values (`True/1` or `False/0`) that determine, for each element of the associated array, whether the value is valid or not.  

It might be a bit counterintuitive but actually **when an element of the mask is `False`, the corresponding element of the associated array is valid** and is said to be **unmasked**. When an element of the mask is `True`, the corresponding element of the associated array is said to be **masked** (invalid).  

The general declaration to instantiate (create a variable of type) masked arrays is:

```Python
var = ma.masked_array( array, mask )
```

where `array` is a numpy array and `mask` an array of boolean (`True` or `False`) values, which for obvious reasons must be of the same length as `array`.  

Let us look at an example (read until the end, then run the cell to see what happens). Here, we wish to mark all negative values as invalid and then take the average of the remaining ones. With a masked array, this can be done easily via:

In [29]:
import numpy.ma as ma
#This is our example array
x = np.array( [1, -2, 3, -1, 5, -30, 45, 1, 18, -12, 0, -21, 24 ] ) 

condition = x < 0  # Remember we want to mask/hide all the values that are INVALID
masked = ma.masked_array( x, mask = condition )

#We can now compute the mean of the dataset, without considering the invalid data:
print( masked.mean() )

12.125


In [30]:
# In this cell, write some code to do the same job but using normal numpy arrays...
# Can you see how masked arrays provide a more compact way to do it?



In general, all operations that are normally done on a numpy array will be done also on the masked version, but in this case the masked (True) elements will be automatically excluded and not be taken into account.
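As a small sketch of this behaviour (with a shortened, made-up array), the methods `.count()` and `.compressed()` of masked arrays tell us, respectively, how many elements are valid and which ones they are, and `.sum()` only adds up the valid ones:

```python
import numpy as np
import numpy.ma as ma

x = np.array([1, -2, 3, -1, 5])
masked = ma.masked_array(x, mask=x < 0)   # negative values are masked (invalid)

print(masked.count())        # number of valid (unmasked) elements
print(masked.compressed())   # plain array containing only the valid elements
print(masked.sum())          # sum of the valid elements only: 1 + 3 + 5
```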

The mask of a masked array can be seen with the `.mask` data attribute of the masked array object. Read and run this cell to see what happens.

In [31]:
x = np.array([1, 2, 3])
myMask = [ 0, 0, 1 ]
myMask2 = [ False, True, False ]

# Remember that in Python 0 == False and 1 == True!

masked = ma.masked_array( x, myMask )
print( masked.mask )

masked = ma.masked_array( x, myMask2 )
print( masked.mask )

[False False  True]
[False  True False]


Another powerful way to generate masks, without going through a function like `np.extract()`, is to use list comprehensions (implicit `list` declarations) to mask specific elements. You should remember the construct:

```Python
i = [ valueOfAFunction(i) for i in someRangeOfValues]
```

```Python
squares = [ i**2 for i in range( 4 ) ]   # ==> [ 0, 1, 4, 9 ]
```

We can thus use these lists to generate a mask of booleans to check if a condition in our numpy array is satisfied for each element. To see a practical example, read and run the next cell but as usual think: what did you expect to see?

In [32]:
x = np.arange( 10 )
mask = [ True if x[ i ]**2 > 9 else False for i in range( len(x ) ) ]

print( "My array is")
print(x)
print( "The square would be")
y = x**2
print(y)

print( "The mask is True if the square of the element is greater than 9, False otherwise")
print(mask)

print( "Now we take the mean considering only UNmasked elements, that is, for which the mask is False!")
print( "Do it by hand to check! The result computed here is:")
xx = ma.masked_array( x, mask )
print( xx.mean() )


My array is
[0 1 2 3 4 5 6 7 8 9]
The square would be
[ 0  1  4  9 16 25 36 49 64 81]
The mask is True if the square of the element is greater than 9, False otherwise
[False, False, False, False, True, True, True, True, True, True]
Now we take the mean considering only UNmasked elements, that is, for which the mask is False!
Do it by hand to check! The result computed here is:
1.5
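Since comparisons on arrays are element-wise, the same mask can also be obtained without a list comprehension. The sketch below builds it directly from the array and, as an alternative, uses the function `ma.masked_where( condition, array )`, which creates the masked array in a single call:

```python
import numpy as np
import numpy.ma as ma

x = np.arange(10)

mask = x**2 > 9                     # boolean array, True where the square exceeds 9
xx = ma.masked_array(x, mask)
yy = ma.masked_where(x**2 > 9, x)   # same masked array, built in one call

print(xx.mean(), yy.mean())
```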


And let us make a second example for even more clarity:

In [33]:
x = np.array( [ 200, 39248, 2494, 2482, 9913, 23824, 589035, 999, 939, 38942,
              200, 39248, 2494, 2482, 93, 224, 9035, 999, 939, 38942,
              200, 39248, 2494, 282, 913, 23824, 589035, 999, 939, 38942,
              200, 357385, 3994294, 2482, 9913, 23824, 589035, 999, 939, 38942 ] )

# We want to verify that the square root of all elements in the array "x" above is bigger than 10. 
# We can simply do

mask = [ True if np.sqrt( x[ i ] ) > 10 else False for i in range( len( x ) ) ]

# Then we transform the mask into an array and combine it with .all()

mask2 = np.array( mask )

print( "Is the square root of all elements of x bigger than 10? {0} ".format( mask2.all() ) )


Is the square root of all elements of x bigger than 10? False 
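The same check can be written more compactly by applying `np.sqrt` and the comparison to the whole array at once. The sketch below uses a shortened, illustrative version of the array above, and also uses `np.where` with a single argument to find *which* elements fail the check:

```python
import numpy as np

x = np.array([200, 39248, 93, 224, 999])   # shortened, illustrative version of the array

all_bigger = (np.sqrt(x) > 10).all()
print(all_bigger)

# Indices of the elements whose square root is NOT bigger than 10
failing = np.where(np.sqrt(x) <= 10)[0]
print(failing)
```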


Ok, this was a lot! But hopefully you are now ready to use most of the functionalities of this very powerful object!