# Introduction to Scipy
Data analysis needs effective computational resources to read, write and process data. Usually, the data set to be processed is a set of arrays. The [Scipy](https://www.scipy.org/) (*Scientific Python*) package is a dedicated tool for operating on arrays efficiently. According to the *FAQ*, Scipy is a "*set of open source (BSD licensed) scientific and numerical tools for Python. It currently supports special functions, integration, ordinary differential equation (ODE) solvers, gradient optimization, parallel programming tools, an expression-to-C++ compiler for fast execution, and others. A good rule of thumb is that if it’s covered in a general textbook on numerical computing (for example, the well-known Numerical Recipes series), it’s probably implemented in scipy*". It sits at the core of most data analysis packages in Python.

The main data structure exposed by Scipy, inherited from NumPy, is the *fixed-type array*: **ndarray**. It is an efficient way of storing and processing data.
This notebook can be run on Colab: [here](https://colab.research.google.com/github/DataScience4Geoscience/Toulouse2020/blob/master/Notebooks/Introduction_to_Python/N2_Introduction_to_Scipy.ipynb).


In [None]:
import scipy as sp
A = sp.array(range(10)) # Create an array from an existing sequence
print("A = {}".format(A))
B = sp.arange(10) # Create the same array directly
print("B = {}".format(B))
sp.array? # IPython/Jupyter only: display the documentation of the function

There are plenty of functions to create and initialize specific arrays (sp.zeros, sp.ones, sp.empty ...). In each case, the element type (int8, uint8, float64 ...) can be set through the `dtype` parameter. More information regarding the different array types can be found here: https://docs.scipy.org/doc/numpy-1.13.0/user/basics.types.html and https://docs.scipy.org/doc/numpy-1.13.0/reference/arrays.dtypes.html.
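
As a minimal sketch of the creation functions mentioned above, the cell below builds three small arrays with explicit types. It calls NumPy directly, which is an assumption on my side: the `sp.*` array functions used in this notebook are NumPy functions re-exported by Scipy, and recent Scipy versions may no longer expose them. The variable names are illustrative only.

```python
import numpy as np  # assumption: NumPy called directly instead of the sp.* aliases

zeros_i8 = np.zeros((2, 3), dtype=np.int8)    # 2x3 array of zeros, 8-bit signed integers
ones_f64 = np.ones(4, dtype=np.float64)       # four ones, 64-bit floats
empty_u8 = np.empty((2, 2), dtype=np.uint8)   # uninitialized memory: values are arbitrary

print(zeros_i8.dtype, zeros_i8.shape)
print(ones_f64.dtype, ones_f64.shape)
print(empty_u8.dtype, empty_u8.shape)

# The memory footprint (in bytes) depends on the element type
print(zeros_i8.nbytes, ones_f64.nbytes)
```

Choosing a small type such as `int8` saves memory, at the price of a reduced range of representable values.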

## Basics of Arrays 
### Getting attributes

In [None]:
# Attributes
print("Number of elements in A: {}".format(A.size))
print("Number of dimension of A: {}".format(A.ndim))
print("Dimension of A: {}".format(A.shape))
print("Type of element in A: {}".format(A.dtype))

It is possible to modify some attributes explicitly, in particular the *shape*:

In [None]:
B.shape = (2,5) # Change the shape to 2 rows and 5 columns: the total number of elements must stay the same
print("B = {}".format(B))
C = B.reshape(10) # reshape returns a new array object with the requested shape; B is left unchanged
print(B.shape)
print(C.shape)
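
The constraint on the number of elements can be illustrated with a short sketch. It uses NumPy directly (an assumption, since the `sp.*` aliases of this notebook forward to NumPy), and the array names are illustrative.

```python
import numpy as np  # assumption: NumPy called directly instead of the sp.* aliases

B = np.arange(10)
B2 = B.reshape(2, 5)    # new array object with shape (2, 5); B keeps its own shape
B3 = B.reshape(5, -1)   # -1 asks NumPy to infer the missing dimension (here 2)
print(B.shape, B2.shape, B3.shape)

try:
    B.reshape(3, 4)     # 12 elements requested, only 10 available
except ValueError as err:
    print("Cannot reshape:", err)
```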

### Accessing elements

In [None]:
print("A = {}".format(A))
print(A[0]) # First element
print(A[1]) # Second element
print(A[-1]) # Last element
print(A[-2]) # Penultimate (second-to-last) element

In [None]:
# Some slicing
print(A[0:3]) # Return an array of elements of A from the first to the third
print(A[::2]) # All elements with a step of 2
print(A[-3:-1]) # Negative indices: from the third-to-last element up to, but excluding, the last one
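
The same indexing and slicing syntax extends to several dimensions, with one index or slice per axis separated by commas. Here is a small sketch on a hypothetical 3x4 array, using NumPy directly (an assumption, since the `sp.*` aliases forward to NumPy):

```python
import numpy as np  # assumption: NumPy called directly instead of the sp.* aliases

M = np.arange(12).reshape(3, 4)  # hypothetical example: rows [0..3], [4..7], [8..11]
print(M[1, 2])      # element at row 1, column 2
print(M[0, :])      # first row
print(M[:, -1])     # last column
print(M[:2, ::2])   # first two rows, every second column
```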

## Computation on Arrays
### Universal functions
A general comment for interpreted languages: **avoid loops whenever you can**! They are slow and inefficient.

The comment applies here to Python. Scipy provides a large set of operations that are optimized to work on arrays directly (as in Matlab, R ...). In particular, *universal functions* (ufuncs) are a set of functions for fast element-wise operations (+, -, power ...). Let us see a short example:

In [None]:
def my_add(M,N): # Suppose that M and N are 2D arrays with the same shape
    P = sp.empty_like(M)
    nl, nr = M.shape
    for i in range(nl):
        for j in range(nr):
            P[i,j] = M[i,j] + N[i,j]
    return P

M, N = sp.arange(100000).reshape(1000,100), sp.arange(100000).reshape(1000,100)
%timeit my_add(M,N) # using explicit loops
%timeit M + N # using a ufunc, equivalent to sp.add(M,N)

Almost all conventional functions exist: arithmetic, trigonometric, log/exp ... A detailed list is available here: https://docs.scipy.org/doc/numpy-1.13.0/reference/ufuncs.html
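
As a quick illustration of a few of these functions, the sketch below applies trigonometric, logarithmic/exponential and power ufuncs element-wise, without any loop. It uses NumPy directly (an assumption, since the `sp.*` aliases forward to NumPy), and the variable names are illustrative.

```python
import numpy as np  # assumption: NumPy called directly instead of the sp.* aliases

x = np.linspace(0.0, np.pi, 5)      # 5 evenly spaced values between 0 and pi
s = np.sin(x)                       # trigonometric ufunc, applied to every element
e = np.exp(np.log(x[1:]))           # log then exp recovers the input (x[0] = 0 is skipped)
p = np.power(x, 2)                  # same result as x**2

print(s)
print(np.allclose(e, x[1:]))
print(np.allclose(p, x**2))
print(np.allclose(np.add(x, x), x + x))  # the + operator calls the add ufunc
```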

### Reduction
Scipy provides a set of functions to extract summary values from an array, either over the whole array or along a specific dimension.

In [None]:
A = sp.random.rand(5,4)
print("A = \n{}".format(A))
print(A.sum()) # Sum over all elements
print(A.sum(axis=0)) # Sum over the rows (axis 0): returns one value per column
print(A.sum(axis=1)) # Sum over the columns (axis 1): returns one value per row

Using the same convention, it is possible to get the cumulative sum (cumsum), the product of elements (prod, cumprod), the maximum/minimum value (max, min) and their position (argmax, argmin), and the first and second statistical moments (mean, var/std). It is also possible to check whether a condition is fulfilled for all or any elements of the array.

In [None]:
sp.any(A>0)

In [None]:
sp.all(A>0.5)
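
The reductions listed above can be tried on a small 1D array. This is only a sketch with hypothetical values, using NumPy directly (an assumption, since the `sp.*` aliases forward to NumPy):

```python
import numpy as np  # assumption: NumPy called directly instead of the sp.* aliases

v = np.array([3.0, 1.0, 4.0, 1.0, 5.0])  # hypothetical example values
print(v.cumsum())                 # running sum: 3, 4, 8, 9, 14
print(v.prod())                   # product of all elements: 60
print(v.max(), v.argmax())        # largest value (5) and its index (4)
print(v.min(), v.argmin())        # smallest value (1) and the index of its first occurrence (1)
print(v.mean(), v.var(), v.std()) # mean, variance and standard deviation
print(np.any(v > 4), np.all(v > 0))
```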

### Some exercises
- Find the maximum and minimum value of A
- Find the maximum of each row
- Find the mean value of each row
- Find the position of the minimum value of each row

### Broadcasting
Broadcasting makes it possible to define efficient operations between arrays of different sizes, provided their shapes are compatible. An extreme example is adding a scalar to a matrix:

In [None]:
A+3

Easy? Now, if I need to center the data, it is also super easy:

In [None]:
A - A.mean(axis=0) # Suppose that each row is a sample, and each column a measurement (i.e., a variable)

If we need to standardize the data (subtract the mean and divide by the standard deviation), it can be achieved easily:

In [None]:
As = (A-A.mean(axis=0))/A.std(axis=0)
print(As)

More details about broadcasting can be found here: https://docs.scipy.org/doc/numpy-1.13.0/user/basics.broadcasting.html
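
To make the notion of compatible shapes concrete: shapes are compared starting from the last dimension, and two dimensions are compatible when they are equal or when one of them is 1. The sketch below shows a compatible case, checks the standardization on hypothetical random data, and triggers an incompatible case. It uses NumPy directly (an assumption, since the `sp.*` aliases forward to NumPy).

```python
import numpy as np  # assumption: NumPy called directly instead of the sp.* aliases

col = np.arange(3).reshape(3, 1)   # shape (3, 1)
row = np.arange(4).reshape(1, 4)   # shape (1, 4)
table = col + row                  # both operands are stretched to shape (3, 4)
print(table.shape)

# Standardization on hypothetical random data: 5 samples (rows), 4 variables (columns)
rng = np.random.default_rng(0)
A = rng.random((5, 4))
As = (A - A.mean(axis=0)) / A.std(axis=0)   # (5, 4) combined with (4,)
print(np.allclose(As.mean(axis=0), 0.0))    # each column now has zero mean
print(np.allclose(As.std(axis=0), 1.0))     # and unit standard deviation

try:
    A + np.arange(5)               # (5, 4) and (5,): last dimensions 4 and 5 differ
except ValueError as err:
    print("Broadcast error:", err)
```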

## Plotting in Python
The [Matplotlib](https://matplotlib.org/) package offers several functions to plot data. Below is an example using 2D data; more complicated plots can be constructed when needed.

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
x = sp.arange(0,10,0.01)
y = x**2
plt.plot(x,y)
plt.grid()
plt.show()