# SciPy
The SciPy package contains various toolboxes dedicated to common issues in scientific computing.

Its different __submodules__ correspond to different applications, such as interpolation, integration, optimization, image processing, statistics, special functions, etc.

It is the core package for scientific routines in Python. It is designed to operate efficiently on NumPy arrays, so NumPy and SciPy work hand in hand. SciPy's routines are optimized and tested, so prefer them over hand-written equivalents whenever possible.

In [None]:
import numpy as np

# io: the submodule for input/output (saving and loading files):
from scipy import io as spio

### TO DO:
### CREATE A MULTIDIMENSIONAL ARRAY OF YOUR CHOICE USING NUMPY
array = ...

# saving as a matlab file
spio.savemat('example.mat', {'ar': array})

# loading from a matlab file
# struct_as_record=True (the default) loads MATLAB structs as numpy record arrays, which is usually the best option
data = spio.loadmat('example.mat', struct_as_record=True)
data['ar']

In [None]:
#linalg: the submodule for linear algebra
from scipy import linalg

### TO DO:
### CREATE A 2x2 NUMPY ARRAY WITH VALUES 1, 2, 3 and 4
matrix = np.array([[1,2],[3,4]])

matrix_determinant = linalg.det(matrix)
matrix_inverse = linalg.inv(matrix)

print(matrix_determinant)
print(matrix_inverse)

In [None]:
new_matrix = np.array([[3,2],[6,4]]) # this matrix has a determinant of zero: can you calculate it by hand?

matrix_determinant = linalg.det(new_matrix)
print(new_matrix)
print(int(matrix_determinant))

# since the determinant is 0, what will happen if we run this next line? uncomment it to see for yourself
# matrix_inverse = linalg.inv(new_matrix)

In [None]:
#stats: the submodule for statistics
from scipy import stats

a = np.random.normal(loc=0, scale=1, size=100)      #loc=mean, scale=standard deviation
b = np.random.normal(1, 1, 10)

stats.ttest_ind(a, b)  #how can we interpret these results?

### Intro to Signal Processing, Sampling and Curve Fitting
In signal processing, sampling is the reduction of a continuous-time signal to a discrete-time signal. Sampling can be done for functions varying in space, time, or any other dimension, and similar results are obtained in two or more dimensions.

For functions that vary with time, let s(t) be a continuous function (or "signal") to be sampled, and let sampling be performed by measuring the value of the continuous function every T seconds, where T is called the sampling interval or the sampling period. The sampled function is then given by the sequence:

s(nT),   for integer values of n.

The sampling frequency or sampling rate, fs, is the average number of samples obtained in one second, so fs = 1/T. Its unit is samples per second, or hertz; for example, 48 kHz is 48,000 samples per second.

Reconstructing a continuous function from samples is done by interpolation algorithms.
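
As a minimal sketch of the definitions above, the snippet below samples s(t) = sin(t) every T seconds and then reconstructs intermediate values by linear interpolation. The helper name `sample_signal`, the choice of signal, and the value T = 0.25 are assumptions made for illustration only.

```python
import numpy as np

def sample_signal(s, T, n_samples):
    """Return the sample times n*T and the values s(n*T) for n = 0 .. n_samples - 1."""
    n = np.arange(n_samples)
    t = n * T
    return t, s(t)

T = 0.25        # sampling period in seconds (illustrative value)
fs = 1 / T      # sampling frequency in hertz, fs = 1/T

# sample s(t) = sin(t) at t = 0, T, 2T, ...
t, samples = sample_signal(np.sin, T, 8)

# reconstruct the signal between the samples with linear interpolation
t_fine = np.linspace(0, t[-1], 200)
reconstructed = np.interp(t_fine, t, samples)

# largest deviation between the reconstruction and the true signal
max_error = np.max(np.abs(reconstructed - np.sin(t_fine)))

print(fs)
print(t)
print(max_error)
```

A shorter sampling period (a higher sampling frequency) should make the reconstruction error smaller, which you can check by changing T.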

Figure: Signal sampling. The continuous signal S(t) is shown as a green line, while the discrete samples are indicated by the blue vertical lines.  <img align = center src="Desktop/Signal_Sampling.png"  />



In [None]:
# signal: the submodule for signal processing
from scipy import signal as sig

### TO DO:
### CREATE AN ARRAY WITH 100 VALUES FROM 0 TO 5
time = ...

data = np.sin(time)

data_resampled_25 = sig.resample(data, 25)
data_resampled_10 = sig.resample(data, 10)


# Let's visualize this
import seaborn as sns

plot = sns.lineplot(x=time, y=data, label="Original signal")
plot = sns.scatterplot(x=time[::4], y=data_resampled_25, label="Resampled signal (25)", color="darkorange")
plot = sns.scatterplot(x=time[::10], y=data_resampled_10, label="Resampled signal (10)", color="green", marker="*", s=150)

### Linear Interpolation

Linear interpolation is a method of curve fitting using linear polynomials to construct new data points within the range of a discrete set of known data points.
If the two known points are given by the coordinates $(x_0, y_0)$ and $(x_1, y_1)$, the linear interpolant is the straight line between these points. For a value $x$ in the interval $(x_0, x_1)$, the value $y$ along the straight line follows from equating slopes:
\begin{align}
\frac{y- y_0}{x-x_0} = \frac{y_1-y_0}{x_1-x_0}
\end{align}
Solving this equation for $y$ gives the formula for linear interpolation in the interval $(x_0, x_1)$:
\begin{align}
y = y_0\frac{x_1-x}{x_1-x_0}+ y_1 \frac{x-x_0}{x_1-x_0}
\end{align}

Linear interpolation is often used to approximate a value of some function f using two known values of that function at other points.

**_In the context of data analysis, it is useful for fitting a function to experimental data and thus estimating values at points where no measurement exists._**
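
The interpolation formula can be checked numerically with a short sketch. The function name `lerp` and the sample coordinates are made up for illustration; `np.interp` is NumPy's built-in one-dimensional linear interpolation and is used here only as a cross-check.

```python
import numpy as np

def lerp(x, x0, y0, x1, y1):
    """Linear interpolation between (x0, y0) and (x1, y1), evaluated at x."""
    return y0 * (x1 - x) / (x1 - x0) + y1 * (x - x0) / (x1 - x0)

# two known points (illustrative values)
x0, y0 = 2.0, 10.0
x1, y1 = 3.0, 20.0

# halfway between the two points, the result is halfway between y0 and y1
y_manual = lerp(2.5, x0, y0, x1, y1)
y_numpy = np.interp(2.5, [x0, x1], [y0, y1])

print(y_manual, y_numpy)
```

At the endpoints the formula returns the known values exactly: `lerp(x0, ...)` gives `y0` and `lerp(x1, ...)` gives `y1`.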

In [None]:
#interpolate: the submodule for interpolation (easy)
from scipy.interpolate import interp1d
import seaborn as sns


### TO DO:
### CREATE A NUMPY ARRAY WITH 10 VALUES RANGING FROM 0 TO 1
time_points = ...

noise = np.random.uniform(-0.1, 0.1, 10)
print(noise)

datapoints = np.sin(2 * np.pi * time_points) + noise
print(datapoints.shape)


plot = sns.scatterplot(x=time_points, y=datapoints, label="Data points")
# points are close to the sin function, but slightly off because of the noise

In [None]:
interpolation_time = np.linspace(0, 1, 50)

linear_interp = interp1d(time_points, datapoints)
linear_results = linear_interp(interpolation_time)

cubic_interp = interp1d(time_points, datapoints, kind='cubic')
cubic_results = cubic_interp(interpolation_time)



plot = sns.scatterplot(x=time_points, y=datapoints, label="Data points", color="green")
plot = sns.lineplot(x=interpolation_time, y=cubic_results, label = "Cubic interpolation")
plot = sns.lineplot(x=interpolation_time, y=linear_results, label = "Linear interpolation", color="darkorange")

### Curve fitting
Curve fitting is a type of optimization that finds an optimal set of parameters for a defined function that best fits a given set of observations.

Unlike supervised learning, curve fitting requires that you define the function that maps examples of inputs to outputs.

The mapping function, also called the basis function, can have any form you like, including a straight line (linear regression), a curved line (polynomial regression), and much more. This gives you the flexibility and control to define the form of the curve, while an optimization process finds the specific optimal parameters of the function.

The scipy.optimize module provides algorithms for function minimization (scalar or multi-dimensional), curve fitting and root finding.
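
Besides curve fitting, the other two tasks mentioned above can be sketched briefly. The functions `g` and `h` below are arbitrary examples chosen for illustration; `optimize.minimize_scalar` and `optimize.brentq` are only two of the routines the module offers.

```python
import numpy as np
from scipy import optimize

# function minimization: g has its minimum at x = 2
def g(x):
    return (x - 2.0) ** 2 + 1.0

result = optimize.minimize_scalar(g)
x_min = result.x

# root finding: h changes sign between 0 and 1, so a root lies in that bracket
def h(x):
    return np.cos(x) - x

root = optimize.brentq(h, 0.0, 1.0)

print(x_min)
print(root, h(root))
```

`brentq` requires a bracket whose endpoints give function values of opposite sign, which is why the interval [0, 1] is passed explicitly.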

In [None]:
from scipy import optimize

x_data = np.linspace(0,10,15)
y_data = np.linspace(0,10,15) + np.random.normal(size = 15)

plot = sns.scatterplot(x=x_data, y=y_data)

In [None]:
def f(x,a,b):
    return a*x + b

# given our function "f", whose first argument is the independent variable (x),
# determine the other arguments (a and b) so that f(x_data) is as close to y_data as possible:
params, params_covariance = optimize.curve_fit(f, x_data, y_data)
optimal_a, optimal_b = params

line = f(x_data, optimal_a, optimal_b)

plot = sns.scatterplot(x=x_data, y=y_data)
plot = sns.lineplot(x=x_data, y=line, color="r")

In [None]:
from scipy import optimize

x_data = np.linspace(-5, 5, 50)
y_data = 2.9 * np.sin(1.5 * x_data) + np.random.normal(size = 50)

plot = sns.scatterplot(x=x_data, y=y_data)

In [None]:
# our points look like they could be approximated by a sine function:
def sin_func(x,a,b):
    return a*np.sin(b*x)

### TO DO:
### FILL IN THE BLANKS!
params, params_covariance = ...
...
plot = sns.lineplot(...)



plot = sns.scatterplot(x=x_data, y=y_data)


### PRINT THE OPTIMIZED PARAMETERS:
# how well did we approximate the values used to create y_data in the first place?
print(optimal_a, optimal_b)