<a href="https://colab.research.google.com/github/acnavasolive/2021_seminars/blob/main/04_Modules.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 4. Modules

Complete tutorial: <a href="https://docs.python.org/3/tutorial/modules.html" target="_blank">Modules </a>

If you quit from the Python interpreter and enter it again, the definitions you have made (
functions and variables) are lost. Therefore, if you want to write a somewhat longer program, 
you are better off using a text editor to prepare the input for the interpreter and running it 
with that file as input instead. This is known as creating a script. As your program gets longer, 
you may want to split it into several files for easier maintenance. You may also want to use a handy 
function that you’ve written in several programs without copying its definition into each program.

To support this, Python has a way to put definitions in a file and use them in a script or in
an interactive instance of the interpreter. Such a file is called a module; definitions from 
a module can be imported into other modules or into the main module (the collection of variables 
that you have access to in a script executed at the top level and in calculator mode).

A module is a file containing Python definitions and statements.
The file name is the module name with the suffix .py appended.

## 4.1. Importing modules
There is a huge community developing open-source modules that can be used for many tasks.
To use one of them, you first need to install it.
In this tutorial we won't need to install anything, but here are the instructions 
in case you need them in the hackathon: https://docs.python.org/3/installing/index.html

Once a module is installed, you import it with the `import` statement

In [None]:
# Import a module called numpy
import numpy

We can access its functions by writing the name of the module, followed by a dot and the name of the function

In [None]:
# Function sum of the numpy module
numpy.sum([1, 2])

The word "numpy" is too long to type all the time.
We can import it under a shorter name

In [None]:
import numpy as np
np.sum([1, 2])

In fact, these abbreviations are conventions and everybody uses the same ones,
so you won't see things like

In [None]:
# NO!  import numpy as num
# NO!  import numpy as npy

## 4.2. Scientific Modules (SciPy)
These are the main modules used in science, maths and engineering. 
One of the best-known collections is <a href="https://www.scipy.org/" target="_blank">SciPy </a>

### 4.2.1 NumPy
NumPy tutorials:
* https://numpy.org/numpy-tutorials/
* https://numpy.org/doc/stable/user/whatisnumpy.html

NumPy is the fundamental package for scientific computing in Python. It is a Python library 
that provides a multidimensional array object, various derived objects (such as masked arrays 
and matrices), and an assortment of routines for fast operations on arrays, including mathematical, 
logical, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms, basic linear algebra,
basic statistical operations, random simulation and much more.

Since we have already imported numpy, we don't need to import it again :)

### Numpy Arrays
NumPy’s array class is called ndarray. It is also known by the alias array. Note that numpy.array 
is not the same as the Standard Python Library class array.array, which only handles one-dimensional
arrays and offers less functionality. The more important attributes of an ndarray object are:

* ndarray.__ndim__

  the number of axes (dimensions) of the array.


* ndarray.__shape__

  the dimensions of the array. This is a tuple of integers indicating the size of the array in each dimension. For a matrix with n rows and m columns, shape will be (n,m). The length of the shape tuple is therefore the number of axes, ndim.


* ndarray.__size__

  the total number of elements of the array. This is equal to the product of the elements of shape.


* ndarray.__dtype__

  an object describing the type of the elements in the array. One can create or specify dtype’s using standard Python types. Additionally NumPy provides types of its own. numpy.int32, numpy.int16, and numpy.float64 are some examples.


* ndarray.__data__

  the buffer containing the actual elements of the array. Normally, we won’t need to use this attribute because we will access the elements in an array using indexing facilities.

In [None]:
a = np.arange(15).reshape(3, 5)
print('a =', a)
print('a.shape =', a.shape)
print('a.ndim =', a.ndim)
print('a.dtype.name =', a.dtype.name)
print('a.size =', a.size)

We can also transform lists to numpy arrays

In [None]:
b_list = [1, 2, 3]
b_array = np.array(b_list)
print('b_array =', b_array)

Even lists of lists, also called matrices

In [None]:
c_list = [[1, 2, 3], [4, 5, 6]]
c_array = np.array(c_list)
print(c_array)

Then indexing becomes much simpler

In [None]:
print('c_list[0][1] =', c_list[0][1])
print('c_array[0][1] =', c_array[0][1])
print('c_array[0, 1] =', c_array[0, 1]) # It's equivalent to the previous line!

Ok, same as list

In [None]:
print('c_list[0][:] =', c_list[0][:])
print('c_array[0, :] =', c_array[0, :])

Taking one column does not work with lists!

In [None]:
print('c_list[:][0] =', c_list[:][0])
print('c_array[:, 0] =', c_array[:, 0])
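
As a small sketch of why the list version fails: `c_list[:]` is only a copy of the outer list, so `[0]` still picks the first row. With plain lists a column has to be collected row by row. The variable names `first_row`, `first_col_list` and `first_col_array` are made up for this illustration.

```python
import numpy as np

c_list = [[1, 2, 3], [4, 5, 6]]
c_array = np.array(c_list)

# c_list[:] is a shallow copy of the outer list, so [0] returns the first ROW
first_row = c_list[:][0]

# With plain lists, a column has to be collected row by row
first_col_list = [row[0] for row in c_list]

# With an array, the column is a single indexing expression
first_col_array = c_array[:, 0]

print('first_row =', first_row)
print('first_col_list =', first_col_list)
print('first_col_array =', first_col_array)
```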

We can take slices very easily

In [None]:
print('c_array[1:3, :2] =', c_array[1:3, :2])
print('c_array[1, 1:] =', c_array[1, 1:])

We can see what shape the resulting slice has

In [None]:
print('shape of c_array[1:3, :2] =', c_array[1:3, :2].shape)
print('shape of c_array[1, 1:] =', c_array[1, 1:].shape)

However, we cannot make matrices with different numbers of elements in each row

In [None]:
d_list = [ [1,2,3], [4,5,6,7] ]
d_array = np.array(d_list)
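
What the cell above does depends on the NumPy version: recent versions raise a `ValueError`, while older ones only emitted a warning. If you really need to keep rows of different lengths, one possible workaround (an assumption about what you want, not a recommendation) is to request an array of Python objects explicitly. The result is a one-dimensional array whose elements are lists, not a matrix.

```python
import numpy as np

d_list = [[1, 2, 3], [4, 5, 6, 7]]

# Assumption: we accept an array of Python objects instead of a numeric matrix
d_array = np.array(d_list, dtype=object)

# Only one axis: each element is a whole list
print('d_array.shape =', d_array.shape)
print('row lengths =', [len(row) for row in d_array])
```

Most numerical operations (sums along axes, slicing columns) will not work on such an array, which is why padding or separate arrays are usually a better fit.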

### Basic Operations
Arithmetic operators on arrays apply elementwise. A new array is created and filled with the result.

In [None]:
a = np.array([20, 30, 40, 50])
b = np.arange(4)
c = a - b

print(a)
print(b)
print(c)

Operations affect all elements in arrays

In [None]:
print(b**2)

In [None]:
print(10 * np.sin(a))

We can check statements also for each element in the array

In [None]:
print(a < 35)

In [None]:
# Elements less than 35 must be 0
a[a < 35] = 0
print(a)
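
The assignment above modifies `a` in place. If the original values are still needed, a possible alternative is `np.where`, which builds a new array from a condition. This is a minimal sketch; the name `a_thresholded` is made up for the example.

```python
import numpy as np

a = np.array([20, 30, 40, 50])

# np.where(condition, value_if_true, value_if_false) builds a new array
a_thresholded = np.where(a < 35, 0, a)

print('a =', a)                          # the original array is unchanged
print('a_thresholded =', a_thresholded)
```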

We can make matrices of ones, zeros, and random numbers

In [None]:
x_ones = np.ones((2, 3), dtype=int)
print(x_ones)

In [None]:
x_zeros = np.zeros((2, 3), dtype=float)
print(x_zeros)

In [None]:
x_random01 = np.random.rand(9)
x_randnorm = np.random.randn(3, 3)
print('Random between 0 and 1\n', x_random01)
print('Random with gaussian distribution\n', x_randnorm)

Many unary operations, such as computing the sum of all the 
elements in the array, are implemented as methods of the ndarray class.

In [None]:
a = np.random.rand(2, 3)
print('sum:', a.sum())
print('min:', a.min())
print('max:', a.max())
print('mean:', a.mean())
print('std:', a.std())

By default, these operations apply to the array as though it were a list of numbers, 
regardless of its shape. However, by specifying the axis parameter you can apply an
operation along the specified axis of an array:

In [None]:
b = np.arange(12).reshape(3, 4)
print(b)
print('\nsum of each column\n', b.sum(axis=0))
print('\nsum of each row\n', b.sum(axis=1))
print('\nsum of each row, keeping dimensions\n', b.sum(axis=1, keepdims=True))
print('\nmin of each row\n', b.min(axis=1))
print('\ncumulative sum along each row\n', b.cumsum(axis=1))

But wait a minute, what is that __reshape__ thingy?

In [None]:
b = np.arange(12)
print('b =\n', b , end='\n\n')
print('b.reshape(3, 4) =\n', b.reshape(3, 4), end='\n\n')
print('b =\n', b, end='\n\n')
print('b.reshape(4, 3) =\n', b.reshape(4, 3), end='\n\n')
print('b.reshape(1, 12) =\n', b.reshape(1, 12), end='\n\n')
print('b.reshape(2, 3, 2) =\n', b.reshape(2, 3, 2), end='\n\n')
print('b.reshape(3, -1, 2) =\n', b.reshape(3, -1, 2), end='\n\n')
print('b.reshape(3, -1, 2).shape =\n', b.reshape(3, -1, 2).shape, end='\n\n')
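
Two details of the cell above are easy to miss, so here is a short sketch of them: `-1` asks NumPy to work out that dimension from the total number of elements, and `reshape` does not change the shape of `b` itself. The variable `r` is just a name chosen for this example.

```python
import numpy as np

b = np.arange(12)

# -1 asks NumPy to infer that axis: 12 elements / (3 * 2) = 2
r = b.reshape(3, -1, 2)
print('r.shape =', r.shape)

# reshape returns a new array object; b keeps its original shape
print('b.shape =', b.shape)

# The total number of elements must match, otherwise NumPy raises ValueError
try:
    b.reshape(5, -1)
except ValueError as err:
    print('cannot reshape:', err)
```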

### Universal Functions
NumPy provides familiar mathematical functions such as sin, cos, and exp. 
In NumPy, these are called “universal functions” (ufunc). Within NumPy, 
these functions operate elementwise on an array, producing an array as output.

In [None]:
b = np.arange(1,13).reshape(3, 4)
print('exp(b) =', np.exp(b))
print('sqrt(b) =', np.sqrt(b))
print('log(b) =', np.log(b))

Other useful operations

In [None]:
c = np.random.rand(5)

# Round
print('round(c) =', np.round(c))
print('floor(c) =', np.floor(c))
print('ceil(c) =', np.ceil(c))
print()

# Index of...
print('index of min =', np.argmin(c))
print('index of max =', np.argmax(c))
print('indexes of c>0 =', np.argwhere(c>0))
print('indexes for sorting =', np.argsort(c))

# Histogram
d = np.array([0, 0, 1, 2, 2, 2, 2, 3, 3])
print('In', d)
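# Bin edges 0, 1, 2, 3, 4 give one bin per integer value (with np.arange(4) the 2s and 3s would share a bin)
heights, vals = np.histogram(d, np.arange(5))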
for h, v in zip(heights, vals):
    print('... there are %d %ds'%(h, v))
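
For integer data like `d`, counting occurrences can also be sketched with `np.unique`, which avoids having to choose bin edges. This is only an alternative illustration, not a replacement for histograms of continuous data.

```python
import numpy as np

d = np.array([0, 0, 1, 2, 2, 2, 2, 3, 3])

# return_counts=True also returns how often each distinct value appears
values, counts = np.unique(d, return_counts=True)

for v, n in zip(values, counts):
    print('... there are %d %ds' % (n, v))
```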

## Exercise
Let's go back to our last exercise. We had our results in a dictionary `performances`
and we had a mean_perfs dictionary with the mean performance per method and parameter

In [None]:
# Dictionary with performances 

#                Methods         <------  all sessions ------>
performances = {'method_1' : [ [.2, .4, .6, .8, .4, .5, .3, .1],   # parameter 0
                               [.4, .2, .8, .4, .0, .1, .5, .2],   # parameter 1
                               [.2, .8, .7, .4, .8, .7, .9, .8]],  # parameter 2

                'method_2' : [ [.6, .4, .6, .5, .5, .4, .6, .6],   # parameter 0
                               [.2, .8, .7, .5, .0, .0, .1, .8]],  # parameter 1

                'method_3' : [ [.1, .1, .2, .2, .1, .2, .1, .5],   # parameter 0
                               [.1, .6, .2, .4, .5, .1, .2, .8],   # parameter 1
                               [.8, .6, .8, .9, .4, .1, .2, .8],   # parameter 2
                               [.2, .1, .1, .2, .4, .6, .1, .2]],  # parameter 3
                }


use_numpy = False

# Without numpy it would be:
if use_numpy == False:

    # Initialize dictionary of mean performances
    mean_perfs = {method:[] for method in performances}

    # For every method
    for method in performances:
        for p_param_list in performances[method]:
            mean_perfs[method].append( sum(p_param_list) / len(p_param_list) )

    # Once we have the mean performances, we take the best parameter 
    # of each method as well as the best performance
    best_params = [ (i, mean_perf) for method in mean_perfs 
                        for i, mean_perf in enumerate(mean_perfs[method]) 
                            if mean_perf==max(mean_perfs[method])]

    # And finally we take the best method+param of all
    best_methodparam = [(j, i)  for j, (i, mean_perf) in enumerate(best_params) 
                                    if mean_perf == max([best_p[1] for best_p in best_params])]

    print('Best method is %s with parameter %d'%( list(performances.keys())[best_methodparam[0][0]], 
                                                  best_methodparam[0][1]) )

# With numpy
else:

    '''
    Try your code here
    '''

### 4.2.2 Pandas
Pandas tutorials:
* https://pandas.pydata.org/docs/user_guide/index.html
* https://pandas.pydata.org/docs/user_guide/10min.html#min

pandas is an open source, BSD-licensed library providing high-performance, 
easy-to-use data structures and data analysis tools for the Python programming language.

In [None]:
import pandas as pd

Example: I want to store passenger data of the Titanic. For a number of passengers, 
I know the name (characters), age (integers) and sex (male/female) data.

In [None]:
df = pd.DataFrame(
    {
        "Name": [
            "Braund, Mr. Owen Harris",
            "Allen, Mr. William Henry",
            "Bonnell, Miss. Elizabeth",
        ],
        "Age": [22, 35, 58],
        "Sex": ["male", "male", "female"],
    }
)

print(df)
print(df["Age"])
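
Once the data is in a DataFrame, typical questions can be answered in one line. The sketch below rebuilds the same table and shows row filtering, a column statistic and a grouped aggregation; the variable name `older` and the chosen age threshold are arbitrary.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": [
            "Braund, Mr. Owen Harris",
            "Allen, Mr. William Henry",
            "Bonnell, Miss. Elizabeth",
        ],
        "Age": [22, 35, 58],
        "Sex": ["male", "male", "female"],
    }
)

# Boolean mask, the same idea as a[a < 35] with NumPy arrays
older = df[df["Age"] > 30]
print(older)

# Statistic of one column
print("mean age:", df["Age"].mean())

# Group rows by one column and aggregate another
print(df.groupby("Sex")["Age"].max())
```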

### 4.2.3 SciPy
SciPy tutorials:
* https://docs.scipy.org/doc/scipy/reference/tutorial/index.html#user-guide

SciPy is a collection of mathematical algorithms and convenience functions built
on the NumPy extension of Python. It adds significant power to the interactive 
Python session by providing the user with high-level commands and classes for 
manipulating and visualizing data. With SciPy, an interactive Python session becomes 
a data-processing and system-prototyping environment rivaling systems such as MATLAB, 
IDL, Octave, R-Lab, and SciLab.

SciPy is organized into subpackages covering different scientific computing domains:

* __cluster__           Clustering algorithms
* __constants__         Physical and mathematical constants
* __fftpack__           Fast Fourier Transform routines
* __integrate__         Integration and ordinary differential equation solvers
* __interpolate__       Interpolation and smoothing splines
* __io__                Input and Output
* __linalg__            Linear algebra
* __ndimage__           N-dimensional image processing
* __odr__               Orthogonal distance regression
* __optimize__          Optimization and root-finding routines
* __signal__            Signal processing
* __sparse__            Sparse matrices and associated routines
* __spatial__           Spatial data structures and algorithms
* __special__           Special functions
* __stats__             Statistical distributions and functions
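
To give a feeling for how these subpackages are used, here is a minimal sketch with two of them, `integrate` and `stats`. The sample data `x` and `y` are made up for the illustration.

```python
import numpy as np
from scipy import integrate, stats

# Numerical integration: the integral of sin(x) from 0 to pi is exactly 2
area, abs_error = integrate.quad(np.sin, 0, np.pi)
print('integral =', area, '(estimated error %g)' % abs_error)

# Pearson correlation between two small made-up samples
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2 * x + 1           # perfectly linear, so the correlation is 1
r, p_value = stats.pearsonr(x, y)
print('pearson r =', r)
```

Subpackages are imported explicitly (`from scipy import integrate`) in the same way as any other module.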

### 4.2.4 Matplotlib
Matplotlib tutorials:
* https://matplotlib.org/stable/users/index.html

Matplotlib is a comprehensive library for creating static, animated, and interactive 
visualizations in Python.

In [None]:
import matplotlib.pyplot as plt

Let's create something to plot

In [None]:
fs = 1000 # Hz
time = np.arange(0, 4, 1/fs) # seconds
LFP = np.sin(time * 2*np.pi) + 0.3*np.sin(time * 7*np.pi) + 0.2*np.sin(time * 20*np.pi) + 0.1*np.random.rand(len(time))
true_events = np.array([[1.0, 1.5],
                        [2.1, 2.5],
                        [3.0, 3.4]])
pred_events = np.array([[0.8, 1.2],
                        [3.1, 3.5]])

Plot our artificial LFP and true and predicted events

In [None]:
plt.figure(figsize=(8, 4))
plt.plot(time, LFP, color='k')
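# Label only the first patch of each kind so the legend shows one entry per kind
for i, event in enumerate(true_events):
    plt.fill_between(event, [-1.5, -1.5], [1.5, 1.5], color='b', alpha=0.2, label='true' if i == 0 else '_nolegend_')
for i, event in enumerate(pred_events):
    plt.fill_between(event, [-1.3, -1.3], [1.8, 1.8], color='g', alpha=0.2, label='pred' if i == 0 else '_nolegend_')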
plt.legend()
plt.xlabel('Time (sec)')
plt.ylabel('LFP')
plt.title('My plot')
# plt.savefig('my_plot.png')

Plot bar plot of coincidences between true and predicted events

In [None]:
# Useful function called Intersection over Union
def iou(x, y):
    """Compute the intersection over union (IoU) between segments x and y
    
    Arguments:
    x -- first segment, numpy array with coordinates (x_ini, x_end)
    y -- second segment, numpy array with coordinates (y_ini, y_end)
    """

    # Calculate the (xi1, xi2) coordinates of the intersection of x and y. Calculate its duration.
    xi1 = np.max([x[0], y[0]])
    xi2 = np.min([x[1], y[1]])
    # Clip at 0: segments that do not overlap have no intersection
    inter_duration = np.maximum(xi2 - xi1, 0)

    # Calculate the Union duration by using Formula: Union(A,B) = A + B - Inter(A,B)
    x_duration = x[1] - x[0]
    y_duration = y[1] - y[0]
    union_duration = x_duration + y_duration - inter_duration
    
    # compute the IoU
    return inter_duration / union_duration

# Number of true events detected
n_hits = 0

# Go through all true events
for te in true_events:
    # Create a comprehension list with the IoUs of every prediction with this true event
    hits = [ (iou(te, pe) > 0.0) for pe in pred_events]
    # If there is any hit, then add it to the counter
    if any(hits):
        n_hits += 1

# Plot
plt.figure()
plt.bar([0, 1], [n_hits, len(true_events)-n_hits], width=0.4)
plt.xticks([0, 1], labels=['Hits', 'Bads'])
plt.xlim([-1, 2])
plt.ylabel('Number of true events')

### 4.2.5 Keras
Keras tutorials:
* https://keras.io/guides/

There are many machine learning and deep learning frameworks available today; Keras is one of them.
Others include:
* PyTorch
* scikit-learn
* Caffe
* ...

Let's see one example of keras+tensorflow (<a href="https://keras.io/examples/vision/mnist_convnet/" target="_blank">See original example </a>)

In [None]:
# Import modules (the cells below use the names `keras` and `layers`)
from tensorflow import keras
from tensorflow.keras import layers

Prepare the data

In [None]:
# Model / data parameters
num_classes = 10
input_shape = (28, 28, 1)

# the data, split between train and test sets
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

# Scale images to the [0, 1] range
x_train = x_train.astype("float32") / 255
x_test = x_test.astype("float32") / 255

# Make sure images have shape (28, 28, 1)
x_train = np.expand_dims(x_train, -1)
x_test = np.expand_dims(x_test, -1)
print("x_train shape:", x_train.shape)
print(x_train.shape[0], "train samples")
print(x_test.shape[0], "test samples")

# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

Build the model

In [None]:
model = keras.Sequential(
    [
        keras.Input(shape=input_shape),
        layers.Conv2D(32, kernel_size=(3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Conv2D(64, kernel_size=(3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Flatten(),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),
    ]
)

model.summary()

Train the model

In [None]:
batch_size = 128
epochs = 15

model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])

model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, validation_split=0.1)

Evaluate the trained model

In [None]:
score = model.evaluate(x_test, y_test, verbose=0)
print("Test loss:", score[0])
print("Test accuracy:", score[1])

## 4.3 Standard modules

* sys: System-specific parameters and functions (https://docs.python.org/3/library/sys.html#module-sys)
* os: Miscellaneous operating system interfaces (https://docs.python.org/3/library/os.html)
* io: Core tools for working with streams (https://docs.python.org/3/library/io.html)
* time: Time access and conversions (https://docs.python.org/3/library/time.html)
* argparse: Parser for command-line options, arguments and sub-commands (https://docs.python.org/3/library/argparse.html)
* csv: CSV File Reading and Writing (https://docs.python.org/3/library/csv.html)
* pickle: Python object serialization (https://docs.python.org/3/library/pickle.html)
* math: Mathematical functions (https://docs.python.org/3/library/math.html)
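
A minimal sketch of several of these modules working together. All of them ship with Python, so nothing has to be installed. The file path, the sample rows and the dictionary are made up for the illustration.

```python
import csv
import io
import math
import os
import pickle
import time

# math: mathematical functions and constants
print('sqrt(2) =', math.sqrt(2))

# os: query the working directory and build a path portably
print('cwd =', os.getcwd())
path = os.path.join('results', 'session_01.csv')   # hypothetical path, not created

# time: measure elapsed time
t0 = time.perf_counter()
total = sum(range(100000))
elapsed = time.perf_counter() - t0
print('sum took %.6f s' % elapsed)

# csv + io: write rows to an in-memory text stream and read them back
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerows([['name', 'age'], ['Ada', 36]])
buffer.seek(0)
rows = list(csv.reader(buffer))
print('rows =', rows)   # note: csv always reads values back as strings

# pickle: serialize a Python object to bytes and restore it
data = {'method_1': [0.2, 0.4]}
restored = pickle.loads(pickle.dumps(data))
print('restored == data:', restored == data)
```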

## 4.4 Your own modules (do not execute this, it won't work!)

A module is a file containing Python definitions and statements. 
The file name is the module name with the suffix .py appended. 
For instance, if you create a file called fibo.py in the current
directory with the following contents:

In [None]:
# Fibonacci numbers module (saved as fibo.py)

def fib(n):    # write Fibonacci series up to n
    a, b = 0, 1
    while a < n:
        print(a, end=' ')
        a, b = b, a+b
    print()

def fib2(n):   # return Fibonacci series up to n
    result = []
    a, b = 0, 1
    while a < n:
        result.append(a)
        a, b = b, a+b
    return result

You could enter a Python interpreter (we cannot do it because we are using
a notebook!) and import this module with the following command:

In [None]:
import fibo

This does not enter the names of the functions defined in fibo directly in 
the current symbol table; it only enters the module name fibo there. Using the 
module name you can access the functions:

In [None]:
fibo.fib(1000)
fibo.fib2(100)

It's also possible to import particular functions, instead of the whole module

In [None]:
from fibo import fib

Then we can simply call it without the `fibo.` prefix

In [None]:
fib(1000)

We can also import several functions

In [None]:
from fibo import fib, fib2
fib(1000)
fib2(100)

This works if we are in the same folder as our `fibo.py`. If `fibo.py` is in another folder, we
need to add that folder to Python's module search path

In [None]:
# String with full path to the folder that contains fibo.py
path_to_fibo = '/path/to/fibo'

# Add path to the module search path
import sys
sys.path.insert(1, path_to_fibo)
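
The whole procedure can be tried end to end, even from a notebook, if the module file is first written to disk. The sketch below does that in a temporary folder. The module name `fibo_demo` and the variable names are hypothetical, chosen only to avoid clashing with a real `fibo.py`.

```python
import os
import sys
import tempfile

# Source code of the module: the same fib2 function as in fibo.py
module_source = '''
def fib2(n):   # return Fibonacci series up to n
    result = []
    a, b = 0, 1
    while a < n:
        result.append(a)
        a, b = b, a+b
    return result
'''

# Write it to <temporary folder>/fibo_demo.py
folder = tempfile.mkdtemp()
with open(os.path.join(folder, 'fibo_demo.py'), 'w') as f:
    f.write(module_source)

# Tell Python where to look, then import as usual
sys.path.insert(1, folder)
import fibo_demo

print(fibo_demo.fib2(100))
```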

## 4.5 Scripts and executing from terminal
This part will be shown outside the notebook

## 4.6 IPython
IPython documentation:
* https://ipython.readthedocs.io/en/stable/

One of Python’s most useful features is its interactive interpreter. It allows for very fast 
testing of ideas without the overhead of creating test files as is typical in most programming 
languages. However, the interpreter supplied with the standard Python distribution is somewhat 
limited for extended interactive use.