## Part 2: Packages and modules

Almost always, you are not the only person with a particular problem that needs solving, and there is a good chance someone has already implemented a function that you need. If they were organised, they put it in a bundle so it's easy to reuse. We call these collections of functions *modules*, and a collection of modules is called a *package*. There are many good packages available for anyone to use (remember, Python is open-source), and most of the common scientific ones come pre-installed with the Anaconda distribution of Python.

You can start using a package with the *import* statement followed by the package name. There are several ways to do this. For example, for the package numpy:

In [1]:
import numpy
import numpy as np
from numpy import nanmean, nanstd
from numpy import * # Don't do this!!

* In the first case you import the whole numpy package. To use any of its functions, you have to type 'numpy.*function_name*()'.
* In the second case, numpy is abbreviated to np, so you just type 'np.*function_name*()'.
* In the third case, only the specific functions we wanted (here nanmean() and nanstd()) are imported, and you can use them by typing the function name directly, e.g. nanmean().
* The fourth case is something you generally want to avoid. It imports all the functions, and you don't have to type numpy or np before the function name. This can cause confusion, because a package can contain many functions, and one of them may have the same name as another function, either from a different package or one you have written yourself. Whichever name was bound last silently replaces the other.
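A minimal sketch of the three recommended import styles side by side, each computing the same NaN-ignoring mean. The list `data` is made-up example data; the functions are the numpy ones imported above.

```python
import numpy
import numpy as np
from numpy import nanmean, nanstd

data = [1.0, 2.0, float("nan"), 3.0]  # the NaN is ignored by nanmean/nanstd

m1 = numpy.nanmean(data)  # style 1: full package name as prefix
m2 = np.nanmean(data)     # style 2: abbreviated alias
m3 = nanmean(data)        # style 3: imported function used directly
s3 = nanstd(data)         # the second function imported in style 3

print(m1, m2, m3)  # 2.0 2.0 2.0
print(s3)
```

All three styles call exactly the same function; they differ only in how much you type and how clearly the origin of the name is visible in your code.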

#### Module help

For most packages there is great documentation available. In the case of some of the most-used scientific packages, such as numpy, scikit-learn and matplotlib, this even contains examples and specific use-cases. You can usually just google 'function name package name' (e.g. nanmean numpy) to get all the information.

Another way to get information on a function is to type help(function_name) or, in Jupyter/IPython only, function_name?:

In [2]:
np.nanmean?
help(np.nanmean)

Help on function nanmean in module numpy.lib.nanfunctions:

nanmean(a, axis=None, dtype=None, out=None, keepdims=<no value>)
    Compute the arithmetic mean along the specified axis, ignoring NaNs.
    
    Returns the average of the array elements.  The average is taken over
    the flattened array by default, otherwise over the specified axis.
    `float64` intermediate and return values are used for integer inputs.
    
    
    .. versionadded:: 1.8.0
    
    Parameters
    ----------
    a : array_like
        Array containing numbers whose mean is desired. If `a` is not an
        array, a conversion is attempted.
    axis : {int, tuple of int, None}, optional
        Axis or axes along which the means are computed. The default is to compute
        the mean of the flattened array.
    dtype : data-type, optional
        Type to use in computing the mean.  For integer inputs, the default
        is `float64`; for inexact inputs, it is the same as the input
        dtype.
    out

This will give you an overview of the input parameters, what the default values are, what the function does, the output, and sometimes even some examples.
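help() builds its text from the function's *docstring*, the string literal right after the def line. As a sketch, here is a hypothetical function `celsius_to_fahrenheit` (not part of any package) with a docstring laid out in the same section style as the numpy help text above; calling help() on it prints that text.

```python
def celsius_to_fahrenheit(celsius):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit.

    Parameters
    ----------
    celsius : float
        Temperature in degrees Celsius.

    Returns
    -------
    float
        Temperature in degrees Fahrenheit.
    """
    return celsius * 9 / 5 + 32

help(celsius_to_fahrenheit)        # prints the signature and the docstring
print(celsius_to_fahrenheit(100))  # 212.0
```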

### Numpy

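The most important part of numpy is the n-dimensional array, or ndarray. It is an object with a number of built-in methods and attributes that optimize array calculations and indexing. Because all elements of an ndarray have the same type and operations are applied to the whole array at once (instead of element by element in a Python loop), ndarrays are usually much faster and more convenient than lists for numerical work, so in most cases it is recommended to use them instead of lists.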

To start working with numpy, you have to make a numpy array, either by turning a list into an array or by using one of the numpy array creation functions. Turning a list into an array is easy:


In [5]:
a = [[1,2],[3,4]]
print(type(a))

<class 'list'>


In [12]:
import numpy as np
b = np.array(a)
print(type(b))
b

<class 'numpy.ndarray'>


array([[1, 2],
       [3, 4]])
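To see why arrays are more convenient than lists for calculations, a small sketch comparing the same operators on the list a and the array b from above: on a list, + and * act on the list as a container, while on an array they are applied to every element.

```python
import numpy as np

a = [[1, 2], [3, 4]]
b = np.array(a)

print(a + a)    # list concatenation: [[1, 2], [3, 4], [1, 2], [3, 4]]
print(a * 2)    # list repetition, same result as a + a
print(b + b)    # elementwise addition: [[2 4] [6 8]]
print(b * 10)   # elementwise multiplication: [[10 20] [30 40]]
print(b.sum())  # built-in method summing all elements: 10
```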

Now let's have a look at the array creation functions. You can find a full list here: https://numpy.org/devdocs/reference/routines.array-creation.html

In numpy you can initialize an array of given dimensions with the functions np.zeros, np.ones or np.empty. You give the dimensions as either a list or a tuple. Printing the array made by np.empty looks a bit strange: np.empty only reserves space in memory and does not set any values, so the array contains whatever happened to be in that memory before. When using np.empty, you should therefore be careful and make sure you set all values at some point before reading them.

In [19]:
a = np.zeros([3, 2])   # dimensions given as a list
print(a)
b = np.ones((2, 2))    # dimensions given as a tuple
print(b)
c = np.empty([3, 3])   # values are whatever was in memory
print(c)


[[0. 0.]
 [0. 0.]
 [0. 0.]]
[[1. 1.]
 [1. 1.]]
[[0.0000000e+000 0.0000000e+000 0.0000000e+000]
 [0.0000000e+000 0.0000000e+000 5.7509241e-321]
 [8.4560344e-307 1.2461147e-306 3.4969413e-317]]
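A short sketch of a few more creation functions from the linked list (np.full, np.arange, np.linspace), the optional dtype argument, and one safe way to use np.empty by overwriting every value before reading it. The shape and fill values are arbitrary examples.

```python
import numpy as np

shape = (2, 3)                    # a tuple; a list such as [2, 3] works too
z = np.zeros(shape, dtype=int)    # integer zeros instead of the default float64
f = np.full(shape, 7.5)           # every element set to 7.5
r = np.arange(0, 10, 2)           # like range(0, 10, 2): 0, 2, 4, 6, 8
lin = np.linspace(0, 1, 5)        # 5 evenly spaced values from 0 to 1 inclusive

# Safe use of np.empty: set all values before using the array
e = np.empty(shape)
e.fill(0.0)                       # equivalent to e[:] = 0.0

print(z, f, r, lin, e, sep="\n")
```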


#### Indexing and iterating

Indexing (i.e. selecting values from an array) in ndarrays works very similarly to indexing lists.


In [23]:
a = np.arange(16).reshape((4,4))
print(a)


[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]]


In [24]:
# Select the first row
print(a[0])


[0 1 2 3]


In [25]:
# Select the first column
print(a[:,0])

[ 0  4  8 12]


You can also combine indices for 2 or more dimensions:

In [26]:
print(a[1,1])
print(a[2:,2:])

5
[[10 11]
 [14 15]]
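One important difference from lists is worth knowing early. A sketch using the same 4x4 array: list-style chained indexing also works, but a slice of an array is a *view* that shares memory with the original, whereas a slice of a list is a copy. Use .copy() when you need an independent array.

```python
import numpy as np

a = np.arange(16).reshape((4, 4))

print(a[1][1], a[1, 1])  # 5 5: chained indexing works, a[1, 1] is the idiomatic form
print(a[-1, -1])         # negative indices count from the end: 15

# Array slices are views: changing the slice changes the original
corner = a[2:, 2:]
corner[0, 0] = 100
print(a[2, 2])           # 100

# .copy() gives an independent array
safe = a[2:, 2:].copy()
safe[0, 0] = -1
print(a[2, 2])           # still 100

# List slices, in contrast, are copies
lst = [0, 1, 2, 3]
part = lst[1:]
part[0] = 100
print(lst)               # [0, 1, 2, 3], unchanged
```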


You can also index using the values of another array, for example to select the values on the diagonal.
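A minimal sketch of such diagonal selection, rebuilding the 4x4 array from above so the block runs on its own. With two integer index arrays, element k of the result is a[rows[k], cols[k]]. np.diag does the same job directly, and a boolean array (a mask) is another way to index with an array.

```python
import numpy as np

a = np.arange(16).reshape((4, 4))

rows = np.arange(4)    # [0 1 2 3]
cols = np.arange(4)    # [0 1 2 3]
print(a[rows, cols])   # [ 0  5 10 15]: picks a[0, 0], a[1, 1], a[2, 2], a[3, 3]

print(np.diag(a))      # [ 0  5 10 15]: dedicated numpy function for the diagonal

mask = a % 5 == 0      # boolean array with the same shape as a
print(a[mask])         # [ 0  5 10 15]: keeps elements where mask is True
```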

#### Array attributes
An array is an object with built-in attributes that give you information about the array. The most important ones are:
* ndim - number of dimensions
* shape - dimensions of the array
* dtype - data type of the array's elements
* T - the transpose of the array

Try to predict the output of the code below:

In [41]:
a = np.ones((3,4,3), dtype=int)
print(a.ndim)
print(a.shape)
print(a.dtype)  # int32 or int64, depending on platform and NumPy version

3
(3, 4, 3)
int32
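The output above shows int32, but on many other systems the same code prints int64, because the integer type behind dtype=int depends on the platform and NumPy version. A sketch that requests a fixed-width type instead, and shows how the attributes relate; size and astype are extra attributes/methods not listed above.

```python
import numpy as np

# A fixed-width dtype gives the same result on every platform
a = np.ones((3, 4, 3), dtype=np.int64)
print(a.dtype)                 # int64
print(a.ndim == len(a.shape))  # True: ndim is the number of entries in shape
print(a.size)                  # 36: total number of elements, 3 * 4 * 3

# astype returns a converted copy; the original keeps its dtype
b = a.astype(np.float32)
print(b.dtype, a.dtype)        # float32 int64
```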


ndarray.T holds the transpose of an array (the rows become columns and the columns become rows):

In [39]:
a = np.arange(12).reshape(4,3)
print(a)
print(a.T)

[[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]]
[[ 0  3  6  9]
 [ 1  4  7 10]
 [ 2  5  8 11]]
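A few behaviours of .T that often surprise beginners, shown as a short sketch: transposing a 1-D array changes nothing, for arrays with more than two dimensions .T reverses the order of all axes, and .T is a view, so writing through it changes the original array.

```python
import numpy as np

v = np.arange(3)
print(v.shape, v.T.shape)  # (3,) (3,): a 1-D array has no rows/columns to swap

col = v.reshape(3, 1)      # explicit 2-D column
print(col.T.shape)         # (1, 3)

t = np.ones((3, 4, 2))
print(t.T.shape)           # (2, 4, 3): axis order reversed

a = np.arange(12).reshape(4, 3)
a.T[0, 1] = 99             # a.T[0, 1] is the same element as a[1, 0]
print(a[1, 0])             # 99
```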
