## Part 2: Packages and modules

You are rarely the only person with a particular problem that needs solving, so there is a really good chance someone has already implemented a function that you need. If they were nice and organised, they put it in a bundle so it's easy to use. We call such a collection of functions a *module*. A collection of modules is called a *package*. There are tons of really good packages available for anyone to use (remember, Python is open-source). Most of the common ones are already installed with the Anaconda distribution of Python.

You can start using a package with the *import* statement followed by the package name. There are a couple of ways to do this. For example, for the package numpy:

In [2]:
import numpy
numpy.mean([1,2])

1.5

In [9]:
import numpy as np
np.mean([1,2])

1.5

In [6]:
from numpy import mean
mean([1,2])

1.5

In [8]:
# DO NOT USE THIS
from numpy import *
mean([1,2])
nanmean([1,2])

1.5

In [10]:
np.mean

* In the first case you import the whole numpy package. If you want to use any of its functions, you'll have to type 'numpy.*function_name*()'.
* In the second case, we have abbreviated numpy to np, so you'll just have to type 'np.*function_name*()'.
* In the third case, we have just imported one specific function that we wanted, and can now use it by typing the function name directly (in this case mean()).
* The fourth one is something you generally want to avoid, because it imports all the functions without requiring numpy or np before the function name. This can cause confusion, because a given package can contain many functions and you might have another function with the same name, either from a different package, or because you've implemented it yourself. Try to avoid this option.
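The name clash described in the last bullet is easy to reproduce. The sketch below is only an illustration and not part of the original notebook: rather than running a star import, it calls Python's built-in sum and numpy's sum side by side, to show what would silently change if `from numpy import *` replaced the built-in.

```python
import builtins
import numpy as np

values = [0, 1, 2, 3, 4]

# Built-in sum: the second argument is the start value, so this is -1 + 10
builtin_result = builtins.sum(values, -1)

# numpy sum: the second argument is the axis, so this sums along the last axis
numpy_result = np.sum(values, -1)

print(builtin_result)  # 9
print(numpy_result)    # 10
```

The same call gives two different answers depending on which sum is in scope, which is why keeping the np. prefix is the safer habit.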

#### Module help

For most packages there is great documentation available. In the case of some of the most-used scientific packages, such as numpy, scikit-learn and matplotlib, this even contains examples and specific use-cases. You can usually just google 'function name package name' (e.g. nanmean numpy) to get all the information.

Another way to get information on a function is to type either help(function_name) or, in IPython and Jupyter, function_name?:

In [11]:
help(np.mean)

Help on function mean in module numpy.core.fromnumeric:

mean(a, axis=None, dtype=None, out=None, keepdims=<no value>)
    Compute the arithmetic mean along the specified axis.
    
    Returns the average of the array elements.  The average is taken over
    the flattened array by default, otherwise over the specified axis.
    `float64` intermediate and return values are used for integer inputs.
    
    Parameters
    ----------
    a : array_like
        Array containing numbers whose mean is desired. If `a` is not an
        array, a conversion is attempted.
    axis : None or int or tuple of ints, optional
        Axis or axes along which the means are computed. The default is to
        compute the mean of the flattened array.
    
        .. versionadded:: 1.7.0
    
        If this is a tuple of ints, a mean is performed over multiple axes,
        instead of a single axis or all the axes as before.
    dtype : data-type, optional
        Type to use in computing the mean.  Fo

In [12]:
np.mean?

This will give you an overview of the input parameters, what the default values are, what the function does, the output, and sometimes even some examples.

### Numpy

The most important part of numpy is the n-dimensional array, or ndarray. ndarrays are objects with a number of built-in methods and attributes that optimize array calculations and indexing. For numerical work they are faster and more powerful to work with than lists, and in most cases it is recommended to work with these rather than lists.
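To make the difference with lists concrete, here is a small sketch (the variable names are made up for illustration). It doubles every value once with a list and once with an ndarray.

```python
import numpy as np

my_values = [1, 2, 3, 4]

# With a list, * repeats the list instead of multiplying its elements
repeated_list = my_values * 2

# so doubling every element needs an explicit loop
doubled_list = [value * 2 for value in my_values]

# With an ndarray the same calculation is a single expression
doubled_array = np.array(my_values) * 2

print(repeated_list)  # [1, 2, 3, 4, 1, 2, 3, 4]
print(doubled_list)   # [2, 4, 6, 8]
print(doubled_array)  # [2 4 6 8]
```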

You can make an array in a number of ways:
* Make a list and turn it into an array
* Initialize an array with one of the array creation functions
* Use another function that results in an ndarray

We're going to start with taking a list and turning it into an array:

In [22]:
my_list = [1,2,3,4]
np.array(my_list)
np.asarray(my_list)

array([1, 2, 3, 4])

In [20]:
multi_dim = [[1,2],
             [3,4],
             [5,6]]
# multi_dim[:,0]  # This doesn't work: indexing a list of lists like this raises a TypeError

multi_array = np.array(multi_dim)
multi_array[:,0]
print(multi_array)

[[1 2]
 [3 4]
 [5 6]]


The numpy array function has a number of parameters that you can use. 

The dtype is the type of the elements in the array (for numpy arrays, all elements must have the same type). Our list only had integers, so the function automatically assigns an integer dtype. If we want the elements to be floats instead, we can specify this with the dtype parameter:

In [25]:
float_array = np.array(multi_dim, dtype='float')
print(float_array)

[[1. 2.]
 [3. 4.]
 [5. 6.]]
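As a rough sketch of how numpy picks a dtype when you don't specify one, and how to convert afterwards with the astype method (the variable names are illustrative only):

```python
import numpy as np

# Only integers in the input: numpy chooses an integer dtype
int_array = np.array([1, 2, 3])

# One float in the input is enough to make every element a float
mixed_array = np.array([1, 2.5, 3])

# astype returns a new array with the requested dtype
as_float = int_array.astype(float)

# Converting floats to ints drops the fractional part
truncated = np.array([1.7, 2.2]).astype(int)

print(mixed_array.dtype)  # float64
print(as_float.dtype)     # float64
print(truncated)          # [1 2]
```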


In [26]:
multi_array/2

array([[0.5, 1. ],
       [1.5, 2. ],
       [2.5, 3. ]])

Another important parameter is the 'copy' parameter. If copy is True, the array function always makes a copy of whatever the input was, which means that the values are copied to a new place in memory with a new pointer. However, if copy is False and the input was already an ndarray, the values stay in the same place and the new variable points to the same place in memory. This means that if you change a value through the new variable, the old one will also change!

In [31]:
second_array = np.array(float_array, copy=True)
second_array[0,0] = 5
print(second_array)
print(float_array)

[[5. 2.]
 [3. 4.]
 [5. 6.]]
[[1. 2.]
 [3. 4.]
 [5. 6.]]


In [None]:
third_array = np.array(float_array, copy = False)
third_array[0,0] = 1
print(third_array)
print(float_array)
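If you are unsure whether two variables point at the same memory, you can check it directly. The sketch below is an illustration under one assumption: it uses np.asarray instead of copy=False to get the "no copy" behaviour, because in NumPy 2.0 and later copy=False raises an error whenever a copy cannot be avoided. np.shares_memory reports whether two arrays overlap in memory.

```python
import numpy as np

original = np.array([[1.0, 2.0], [3.0, 4.0]])

independent = np.array(original, copy=True)  # values copied to new memory
alias = np.asarray(original)                 # already an ndarray, so no copy is made

print(np.shares_memory(original, independent))  # False
print(np.shares_memory(original, alias))        # True

# Changing the alias also changes the original, but not the copy
alias[0, 0] = 99.0
print(original[0, 0])     # 99.0
print(independent[0, 0])  # 1.0
```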


You can also specify the minimum number of dimensions you want your array to have with the 'ndmin' parameter. This adds singleton dimensions (of size 1) to the front of the shape. This may seem a bit silly now, but it could come in handy, for example if you want to combine multiple arrays along that extra dimension.

In [37]:
float_array.ndim
print(float_array.shape)
print(float_array)

(3, 2)
[[5. 2.]
 [3. 4.]
 [5. 6.]]


In [42]:
ndim_array = np.array(multi_dim, ndmin=3)
ndim_array.shape

(1, 3, 2)
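A minimal sketch of the "combine along the extra dimension" idea, using np.concatenate and two made-up arrays:

```python
import numpy as np

first = np.array([[1, 2], [3, 4], [5, 6]], ndmin=3)
second = np.array([[7, 8], [9, 10], [11, 12]], ndmin=3)

# Both arrays have shape (1, 3, 2), so they can be joined along axis 0
stacked = np.concatenate([first, second], axis=0)

print(first.shape)    # (1, 3, 2)
print(stacked.shape)  # (2, 3, 2)
```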

Now let's have a look at the array creation functions. You can find a full list here: https://numpy.org/devdocs/reference/routines.array-creation.html

In numpy you can initialize an array of given dimensions with zeros, ones or empty. You have to give the dimensions as either a list or a tuple. The empty array will look a bit weird when you print it, because empty only reserves space in memory without setting any values: which values are in the array depends on whatever was in that memory space before. You should therefore be careful when using empty, and make sure you set all values at some point.

In [46]:
np.zeros([3,3])
np.ones([5,2,2])
np.empty([2,2])

array([[1., 1.],
       [1., 0.]])
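A typical safe pattern for empty is to reserve the array first and then overwrite every element before reading any of them. A small sketch with an illustrative variable name:

```python
import numpy as np

squares = np.empty(5)  # the contents are arbitrary at this point

# Overwrite every element, so none of the arbitrary values survive
for i in range(squares.size):
    squares[i] = i ** 2

print(squares)  # [ 0.  1.  4.  9. 16.]
```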

In [49]:
np.nan

nan

You can also fill your array with a specified fill value by using the full function. Like all of the functions above, it has additional parameters, most importantly dtype, which you can use to specify the type of the elements in your array.

In [50]:
np.full([4,2], np.nan)

array([[nan, nan],
       [nan, nan],
       [nan, nan],
       [nan, nan]])

In [51]:
np.full([4,2], 5)

array([[5, 5],
       [5, 5],
       [5, 5],
       [5, 5]])
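Note that the two outputs above have different dtypes: filling with np.nan gives floats, while filling with 5 gives integers. A short sketch of how the dtype parameter overrides this (variable names are illustrative):

```python
import numpy as np

int_fives = np.full((4, 2), 5)                 # integer fill value, integer array
float_fives = np.full((4, 2), 5, dtype=float)  # same fill value, forced to float
nan_filled = np.full((4, 2), np.nan)           # nan is a float, so the array is float

print(float_fives.dtype)  # float64
print(nan_filled.dtype)   # float64
```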

All of the functions above have a \_like function (empty\_like, ones\_like, etc), which instead of taking dimensions as input, takes an array as input and makes a new array with the same dimensions:

In [53]:
zeros_array = np.zeros_like(ndim_array)
zeros_array.shape
zeros_array

array([[[0, 0],
        [0, 0],
        [0, 0]]])

The last way to make numpy arrays is through certain functions that result in an array. We will discuss them in more depth when we get to each function; for now, just a couple of examples:

In [54]:
np.eye(5)

array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])
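Two more functions of this kind that come up often are np.arange and np.linspace. This is just a quick sketch: arange takes a start, stop (exclusive) and step, whereas linspace takes a start, stop (inclusive) and the number of points.

```python
import numpy as np

steps = np.arange(0, 10, 2)    # from 0 up to (not including) 10, in steps of 2
points = np.linspace(0, 1, 5)  # 5 evenly spaced values from 0 to 1, both included

print(steps)   # [0 2 4 6 8]
print(points)  # [0.   0.25 0.5  0.75 1.  ]
```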

#### Indexing and iterating

Indexing (i.e. selecting values out of an array) in ndarrays works very similarly to lists.


In [57]:
print(float_array)
float_array[0,:]
float_array[:,0]

[[5. 2.]
 [3. 4.]
 [5. 6.]]


array([5., 3., 5.])

In [58]:
float_array[1,1]

4.0

You can also combine indices for 2 or more dimensions, as in the example above.

You can also index using the values of another array. The example below uses a boolean array to select the values on the diagonal.

In [72]:
ori_array = np.arange(9).reshape((3,3))
ori_array[ori_array>3]
ori_array[np.eye(3, dtype=bool)]

array([0, 4, 8])
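Indexing with another array behaves differently depending on whether that array holds booleans or integers. The sketch below (with illustrative names) shows both: a boolean array acts as a mask, while integers are treated as positions.

```python
import numpy as np

grid = np.arange(9).reshape((3, 3))

# Boolean mask: keeps the elements where the mask is True
above_three = grid[grid > 3]
print(above_three)  # [4 5 6 7 8]

# A boolean identity matrix is True on the diagonal only
diagonal_from_mask = grid[np.eye(3, dtype=bool)]
print(diagonal_from_mask)  # [0 4 8]

# Integer list: each value is interpreted as a row index
first_and_last_rows = grid[[0, 2]]
print(first_and_last_rows.shape)  # (2, 3)

# One integer list per dimension: the indices are paired up element by element
diagonal_from_indices = grid[[0, 1, 2], [0, 1, 2]]
print(diagonal_from_indices)  # [0 4 8]
```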

Like lists, you can *iterate* over an ndarray. This means you can loop through the array to look at every element. When you loop through an array, you only loop through the first dimension. That means that if you have a 3D array, every element will be a 2D array. You can use 'nested loops' (i.e. a loop within a loop) to get the individual elements.

In [67]:
print(ori_array)
for row in ori_array:
    for el in row:
        print(el)

[[0 1 2]
 [3 4 5]
 [6 7 8]]
0
1
2
3
4
5
6
7
8


An easier way to get the individual elements is to use the attribute 'flat', which lets you iterate over all the elements of an ndarray as if it were one long list:

In [70]:
for el in ori_array.flat:
    print(el)

0
1
2
3
4
5
6
7
8
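The same ideas carry over to more dimensions. As a sketch with a made-up 3D array: looping gives 2D blocks, flat gives the individual values, and np.ndenumerate gives each value together with its index.

```python
import numpy as np

cube = np.arange(24).reshape((2, 3, 4))

# Looping over a 3D array yields one 2D array per step
block_shapes = [block.shape for block in cube]
print(block_shapes)  # [(3, 4), (3, 4)]

# flat visits every individual element
flat_values = list(cube.flat)
print(len(flat_values))  # 24

# ndenumerate pairs every element with its full index
indexed = list(np.ndenumerate(cube))
print(indexed[0])  # index (0, 0, 0) together with the value 0
```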


#### Array attributes
An array is an object with built-in attributes that give you information about the array. The most important ones are:
* ndim - number of dimensions
* shape - dimensions of the array
* dtype - type of the array 
* T - the transpose of the array

Try to see if you can predict what the result is of the code below

In [None]:
ori_array.ndim
ori_array.shape
ori_array.dtype
ori_array.T

ndarray.T holds the transpose of an array (so the columns become rows and the rows become columns)
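So as not to give away the answer for ori_array, here is the same set of attributes on a different, non-square array (the name table is just for illustration):

```python
import numpy as np

table = np.array([[1.5, 2.5, 3.5],
                  [4.5, 5.5, 6.5]])

print(table.ndim)     # 2
print(table.shape)    # (2, 3)
print(table.dtype)    # float64
print(table.T.shape)  # (3, 2)
```

With a non-square array you can see that the transpose swaps the two entries of the shape.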

#### Element-wise multiplications
You can easily perform element-wise multiplications on any ndarray. There are special methods for matrix multiplication, which we'll talk about later.

In [28]:
ori_array * 2
ori_array ** 2

array([[ 0,  1,  4],
       [ 9, 16, 25],
       [36, 49, 64]])
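Element-wise operations also work between two arrays of the same shape. A small sketch with made-up arrays; note that the square root is taken with the function np.sqrt, because arrays do not have a sqrt method.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[10, 20], [30, 40]])

product = a * b  # each element of a times the matching element of b
total = a + b    # each element of a plus the matching element of b
roots = np.sqrt(np.array([1.0, 4.0, 9.0]))  # np.sqrt(array), not array.sqrt()

print(product)
print(total)
print(roots)  # [1. 2. 3.]
```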

#### Array methods
There are countless methods you can use on arrays. You will probably encounter most of them while you are working with arrays. Here I'm just going to highlight a few that might come in handy.
* reshape
* sum, min, max
* exp, sqrt (these are numpy functions rather than array methods, so you call them as np.exp(ori_array) and np.sqrt(ori_array))

Reshape is used to change the shape of an array. There is one restriction: the total number of elements needs to stay the same.

In [None]:
ori_array.reshape( (9,1) )

If you know one dimension, but not the other, you can put in -1 for the unknown dimension. Numpy will then automatically fill in the right size, if possible.
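A short sketch of reshape with -1, including a case where no valid size exists (the array here is made up for illustration):

```python
import numpy as np

values = np.arange(12)

column = values.reshape((-1, 1))      # 12 elements in 1 column: 12 rows
three_rows = values.reshape((3, -1))  # 12 elements in 3 rows: 4 columns

print(column.shape)      # (12, 1)
print(three_rows.shape)  # (3, 4)

# 12 elements cannot be divided over 5 rows, so this raises a ValueError
try:
    values.reshape((5, -1))
except ValueError as error:
    print("reshape failed:", error)
```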

The sum, min and max methods do exactly what they say on the tin: they find the sum of the array, the minimum value in the array and the maximum value in the array respectively.

In [None]:
ori_array.sum()
ori_array.min()
ori_array.max()

As you can see, each of these functions finds the value over the entire array. However, you might want to find the sum or max of each row or column. In that case you can use the 'axis' parameter, which specifies the dimension along which the function is performed. Remember that the dimensions start counting at 0, so for a 2D array axis=0 works down the rows and gives one value per column, while axis=1 gives one value per row.

In [None]:
ori_array.sum(axis=0)
ori_array.min(axis=1)
ori_array.max()
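The effect of axis is easiest to see on a non-square array, because the length of the result tells you which dimension was collapsed. A sketch with a made-up 2x3 array:

```python
import numpy as np

scores = np.array([[1, 2, 3],
                   [4, 5, 6]])

print(scores.sum())        # 21, over the whole array
print(scores.sum(axis=0))  # [5 7 9], one value per column
print(scores.sum(axis=1))  # [ 6 15], one value per row
print(scores.max(axis=1))  # [3 6], the largest value in each row
```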

#### Numpy constants

Numpy has some constants (variables that always have the same value) built in. They are:
* inf - float representation of positive infinity
* -inf - float representation of negative infinity (written as -np.inf)
* nan - 'Not a Number', used as a placeholder for an unknown value, or may be used to represent an undefined number (e.g. 0/0)
* e - Euler's number, the base of the natural logarithm
* pi - pi

The constants also have associated functions, for example isnan to check if a variable is nan. 


In [None]:
np.pi
np.e
np.nan

In [None]:
np.inf
-np.inf

NaN can behave in unexpected ways at times. For example, any ordering or equality comparison (<, >, <=, >=, ==) with nan results in 'False'; only != results in 'True':

In [30]:
print(np.nan<0)
print(np.nan>=0)

False
False
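Because of this, you cannot test for nan with ==. The sketch below shows the comparisons next to the isnan and isinf functions mentioned earlier (the array data is made up for illustration):

```python
import numpy as np

value = np.nan

print(value == np.nan)  # False, even though value is nan
print(value != value)   # True, nan is not even equal to itself
print(np.isnan(value))  # True, this is the reliable check

data = np.array([1.0, np.nan, np.inf, -np.inf])
print(np.isnan(data))   # [False  True False False]
print(np.isinf(data))   # [False False  True  True]
```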


If you do any computation with nans, it will always result in a nan:

In [31]:
np.nan*2

nan

However, numpy has some functions to deal with this. If you have a numpy array of numbers, but some of them are nans, you can use the 'nanmean' function to calculate the mean. Similarly there are 'nanmax', 'nansum' and 'nanstd' functions. These functions ignore the nan values and calculate the answer based on the other values in the array:

In [32]:
nan_array = np.array([1,2,np.nan, 4,5])
print(np.mean(nan_array))
print(np.nanmean(nan_array))

nan
3.0
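The other nan-aware functions work the same way. A quick sketch on the same values as above:

```python
import numpy as np

nan_array = np.array([1, 2, np.nan, 4, 5])

print(np.max(nan_array))     # nan, the regular function is "infected" by the nan
print(np.nanmax(nan_array))  # 5.0
print(np.nansum(nan_array))  # 12.0
print(np.nanstd(nan_array))  # standard deviation of 1, 2, 4 and 5
```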


### Dot products

One of the most important calculations you will do with matrices is the dot product. If you are used to MATLAB, * is automatically a matrix product there, whereas you need to type .* to get the element-wise operation. In numpy it is the other way around: * is element-wise, and you need to use either the 'dot' function or simply @ to do matrix multiplications: