# Notebook 6: SciPy (WIP)

In this section we introduce the `scipy` package, which is built on top of `numpy` and contains many miscellaneous but useful functions. Unlike `numpy`, you may find that you do not use `scipy` frequently. However, `scipy` functions can often improve your code and save you time in the long run!

This notebook gives a brief description of some of the things you can do with `scipy` and highlights why it may be useful to you in the future!

In [None]:
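import numpy as np
# Import the submodules explicitly; a plain `import scipy` may not load them
import scipy.linalg
import scipy.interpolate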

## More linear algebra!

`scipy` includes many of the linear algebra functions given in `numpy` and more! For example, `scipy` has a very convenient function for constructing block diagonal matrices, `block_diag`.

In [None]:
# Let's construct some matrices
a = np.array([[1,2],[3,4]])
b = np.array([[5,6,7],[8,9,10],[11,12,13]])
c = np.array([[14,15],[16,17]])

# Let's have a quick look at our matrices
print('a:')
print(a)
print('b:')
print(b)
print('c:')
print(c)

# Now let's combine them as one large block diagonal matrix!
d = scipy.linalg.block_diag(a,b,c)

# Let's have a look at d
print('---------------------------')
print('d:')
print(d)

# Try changing the order of a,b and c in the `block_diag` function - how does d change?

Another useful feature of `scipy` is that it allows you to easily create matrices with special structures, such as [Toeplitz](https://en.wikipedia.org/wiki/Toeplitz_matrix) and [Circulant](https://en.wikipedia.org/wiki/Circulant_matrix) matrices. These are matrices you may find yourself using quite a lot if you work with Fourier analysis or autoregressive models. See if you can work out what Toeplitz and Circulant matrices are using the code below!

In [None]:
# To create a Toeplitz matrix we need to specify the first row and column
row = np.array([1,2,3,6])
col = np.array([1,8,9,7,10])

# Let's make the Toeplitz matrix!
toep = scipy.linalg.toeplitz(col,row)

# Let's print the Toeplitz matrix!
print(toep)

# To create a Circulant matrix we only need its first column!
# (here we reuse the values stored in `row`)
circ = scipy.linalg.circulant(row)

# Let's print the circulant matrix!
print(circ)

A very useful function in `scipy` is the permuted [LU decomposition](https://en.wikipedia.org/wiki/LU_decomposition) (which is currently not supported in `numpy`). The LU decomposition has a wide range of applications in statistics and can be performed as shown below!

In [None]:
import scipy.linalg

# Make a random matrix with numpy
x = np.random.randn(3,3)
print(x)
print('----------------------')

# Get the lu decomposition
P, L, U = scipy.linalg.lu(x, permute_l=False)

# P should be a permutation matrix
print(P)
# L should be lower triangular
print(L)
# U should be upper triangular
print(U)
print('----------------------')

# The product of PLU should be our original matrix
print(P @ L @ U)
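
One common use of the LU decomposition is solving linear systems: the matrix is factorised once and the factors are then reused for each right-hand side. The cell below is a minimal sketch of this idea using `scipy.linalg.lu_factor` and `scipy.linalg.lu_solve`; the names `A_demo`, `rhs` and `solution` are made up for this illustration and are not used elsewhere in the notebook.

```python
import numpy as np
import scipy.linalg

# Seeded generator so the example is repeatable
rng = np.random.default_rng(0)

# A small random system A_demo @ solution = rhs (illustrative names)
A_demo = rng.standard_normal((3, 3))
rhs = rng.standard_normal(3)

# Factorise once: `lu` holds L and U in compact form, `piv` the pivot indices
lu, piv = scipy.linalg.lu_factor(A_demo)

# Reuse the factorisation to solve the system
solution = scipy.linalg.lu_solve((lu, piv), rhs)

# Check that the solution satisfies the original system
print(np.allclose(A_demo @ solution, rhs))
```

If you had several right-hand sides, you could call `lu_solve` again with the same `(lu, piv)` pair rather than factorising the matrix each time.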

## Interpolation

Another useful feature in `scipy` is the interpolation module which you can use to interpolate data. To interpolate 1-dimensional data, we can use the `interp1d` function. To see how this works, let's first make some data for our example!

In [None]:
# Make some data
x = np.linspace(0, 5, num=11, endpoint=True)
y = np.cos(-x**3/9.0)


# Make a plot showing the data
import matplotlib.pyplot as plt
plt.plot(x, y, 'o')
plt.legend(['data'], loc='best')
plt.show()

So this is our data! Below, we use `scipy` to perform [Nearest Neighbour](https://en.wikipedia.org/wiki/Nearest-neighbor_interpolation) interpolation. If you aren't familiar with Nearest Neighbour interpolation, don't worry; see if you can guess what is happening based on the figure below!

> **Note:** We haven't discussed creating plots yet. Don't worry if the above code seems very unfamiliar. The `matplotlib` package, which is used for making plots, will be covered in more detail in Notebook 7!

In [None]:
# interp1d will give us a special function we can use to
# interpolate our data!
interpolated = scipy.interpolate.interp1d(x, y, kind='nearest')

# Let's make a new range of x values which we wish to
# interpolate our data on!
x_interpolate = np.linspace(0, 5, num=1001, endpoint=True)

# Let's get our new interpolated values
y_interpolate = interpolated(x_interpolate)

# Plot the original data
plt.plot(x, y, 'o')

# Plot the interpolated data
plt.plot(x_interpolate, y_interpolate, '--')

# Add a key
plt.legend(['data', 'interpolated'], loc='best')

# Show the plot
plt.show()

Now let's try linear interpolation instead! What has changed in the code below? Which interpolation do you think seems most appropriate for this dataset?

In [None]:
# interp1d will give us a special function we can use to
# interpolate our data!
interpolated = scipy.interpolate.interp1d(x, y, kind='linear')

# Let's make a new range of x values which we wish to
# interpolate our data on!
x_interpolate = np.linspace(0, 5, num=1001, endpoint=True)

# Let's get our new interpolated values
y_interpolate = interpolated(x_interpolate)

# Plot the original data
plt.plot(x, y, 'o')

# Plot the interpolated data
plt.plot(x_interpolate, y_interpolate, '--')

# Add a key
plt.legend(['data', 'interpolated'], loc='best')

# Show the plot
plt.show()

Finally, let's try cubic interpolation! Which of the three interpolation methods do you think was most useful and why (be careful with your answer!)?

In [None]:
# interp1d will give us a special function we can use to
# interpolate our data!
interpolated = scipy.interpolate.interp1d(x, y, kind='cubic')

# Let's make a new range of x values which we wish to
# interpolate our data on!
x_interpolate = np.linspace(0, 5, num=1001, endpoint=True)

# Let's get our new interpolated values
y_interpolate = interpolated(x_interpolate)

# Plot the original data
plt.plot(x, y, 'o')

# Plot the interpolated data
plt.plot(x_interpolate, y_interpolate, '--')

# Add a key
plt.legend(['data', 'interpolated'], loc='best')

# Show the plot
plt.show()

Functions similar to `interp1d` also exist for [2-dimensional](https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.interp2d.html#scipy.interpolate.interp2d) and [n-dimensional](https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.interpn.html#scipy.interpolate.interpn) datasets, and further interpolation methods, such as splines and B-splines, are available as well. This notebook only gives a brief overview of what is possible with `scipy`; many more interpolation options are described in the [documentation here](https://docs.scipy.org/doc/scipy/reference/interpolate.html).
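
As one example of the spline options, the cell below is a small sketch using `scipy.interpolate.CubicSpline` on the same kind of data as above. The variable names (`x_data`, `y_data`, `spline`, `x_fine`) are chosen for this illustration only, and this is just one of several ways to fit a spline in `scipy`.

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Recreate the example data used earlier (illustrative names)
x_data = np.linspace(0, 5, num=11, endpoint=True)
y_data = np.cos(-x_data**3 / 9.0)

# Fit a cubic spline; the result is a callable object
spline = CubicSpline(x_data, y_data)

# Evaluate the spline on a finer grid
x_fine = np.linspace(0, 5, num=1001, endpoint=True)
y_fine = spline(x_fine)

# The spline passes through the original data points
print(np.allclose(spline(x_data), y_data))

# The second argument asks for a derivative (here the first derivative)
dy_fine = spline(x_fine, 1)
print(dy_fine.shape)
```

Unlike the object returned by `interp1d`, a `CubicSpline` can also return derivatives of the fitted curve, which can be handy when you need slopes as well as values.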

## Sparse matrices

`scipy` includes support for sparse matrices through the `scipy.sparse` module. Sparse formats store, and compute with, only the non-zero elements, which makes them a faster way of working with matrices that have few non-zero elements.

On very sparse examples, the `scipy` sparse functions can be much faster than `numpy`:

In [None]:
import scipy.sparse
import time

# Make a random matrix and keep only a few elements
X = np.random.randn(1000,1000)
X[X<2.5] = 0

print('Example sparse matrix (only small block printed):')
print(X[1:10,1:10])
print('------------------------------')

# Make a sparse version of X using scipy
X_sparse = scipy.sparse.csr_matrix(X)

# Let's work out X transpose multiplied by X
# using the original dense (numpy) matrix
t1 = time.time()
XtX = X.T @ X
t2 = time.time()
print("Using dense (numpy) matrices the calculation X.T @ X took:")
print(t2-t1, "seconds")

# Let's work out X transpose multiplied by X
# using the sparse (scipy) matrix
t1 = time.time()
XtX = X_sparse.T @ X_sparse
t2 = time.time()
print("Using sparse (scipy) matrices the calculation X.T @ X took:")
print(t2-t1, "seconds")

`scipy.sparse` is not the only option for working with sparse matrices in Python; the `cvxopt` package is also a good choice.
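
Speed is not the only benefit: because only the non-zero elements are stored, a sparse matrix can also use far less memory. The cell below is a rough sketch that compares the storage of a dense array with its CSR version; the names `dense_mat` and `sparse_mat` are made up for this example, and the byte count for the sparse matrix simply adds up the three arrays that the CSR format stores.

```python
import numpy as np
import scipy.sparse

# Seeded generator so the example is repeatable
rng = np.random.default_rng(0)

# Build a mostly-zero matrix, as in the timing example above
dense_mat = rng.standard_normal((1000, 1000))
dense_mat[dense_mat < 2.5] = 0

# Convert to compressed sparse row (CSR) format
sparse_mat = scipy.sparse.csr_matrix(dense_mat)

# How many elements are actually stored?
print(sparse_mat.nnz, "non-zero elements out of", dense_mat.size)

# Memory used by the dense array versus the three CSR arrays
dense_bytes = dense_mat.nbytes
sparse_bytes = (sparse_mat.data.nbytes
                + sparse_mat.indices.nbytes
                + sparse_mat.indptr.nbytes)
print("dense:", dense_bytes, "bytes")
print("sparse:", sparse_bytes, "bytes")

# Both representations give the same product
same = np.allclose((sparse_mat.T @ sparse_mat).toarray(),
                   dense_mat.T @ dense_mat)
print(same)
```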

## Optimization

`scipy.optimize` provides several minimizers, including Nelder-Mead and Powell.

What does the output of a minimizer tell you?
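
As a minimal sketch of the minimizers mentioned above, the cell below runs `scipy.optimize.minimize` with the Nelder-Mead and Powell methods on a simple test function. The function `bowl`, its minimum at (1.0, 2.5) and the starting point are all made up for this illustration. The printed fields are attributes of the result object that `minimize` returns.

```python
import numpy as np
from scipy.optimize import minimize

def bowl(p):
    """Hypothetical test function: a quadratic bowl with its minimum at (1.0, 2.5)."""
    return (p[0] - 1.0) ** 2 + (p[1] - 2.5) ** 2

# Starting guess for the minimizers (arbitrary choice)
start = np.array([0.0, 0.0])

results = {}
for method in ("Nelder-Mead", "Powell"):
    res = minimize(bowl, start, method=method)
    results[method] = res

    print(method)
    print("  converged:", res.success)           # did the method report success?
    print("  message:", res.message)             # why it stopped
    print("  minimizer x:", res.x)               # location of the minimum found
    print("  minimum f(x):", res.fun)            # function value at that location
    print("  function evaluations:", res.nfev)   # how much work it took
```

Comparing `nfev` between methods gives a rough idea of how much work each one needed on this particular problem; the ranking can change for other functions.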

## Integration
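
As a small taste of `scipy.integrate`, the cell below is a sketch that numerically integrates sin(x) from 0 to pi using `quad`. The integrand and limits are chosen only for illustration; the exact answer for this integral is 2, so it makes an easy check.

```python
import numpy as np
from scipy.integrate import quad

# quad returns the estimated integral and an estimate of the absolute error
value, abs_error = quad(np.sin, 0, np.pi)

print("integral estimate:", value)
print("estimated absolute error:", abs_error)

# Compare against the known exact answer of 2
print(np.isclose(value, 2.0))
```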

## Random Numbers
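
Random numbers and probability distributions live in `scipy.stats`. The cell below is a brief sketch using a standard normal distribution; the seed, sample size and variable names are arbitrary choices made for this illustration.

```python
import numpy as np
from scipy import stats

# Seeded generator so the draws are repeatable
rng = np.random.default_rng(42)

# A "frozen" standard normal distribution (mean 0, standard deviation 1)
dist = stats.norm(loc=0.0, scale=1.0)

# Draw 1000 random samples from it
samples = dist.rvs(size=1000, random_state=rng)
print("sample mean:", samples.mean())
print("sample standard deviation:", samples.std())

# The same object also gives the cumulative distribution function...
print("P(Z <= 1.96):", dist.cdf(1.96))

# ...and its inverse (the percent point function)
print("97.5th percentile:", dist.ppf(0.975))
```

The same pattern (create a distribution object, then call `rvs`, `cdf`, `ppf` and so on) works for the other distributions in `scipy.stats`.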

## And much, much more...

`scipy` has a wide range of useful and interesting functions for working on numeric data. This section has barely scratched the surface in terms of listing `scipy` features. A list, by no means complete, of `scipy` capabilities includes:

 - Integration (`scipy.integrate`)
 - Optimization (`scipy.optimize`)
 - Fourier Transforms (`scipy.fft`; `scipy.fftpack` is the older, legacy interface)
 - Signal Processing (`scipy.signal`)
 - Spatial data structures and algorithms (`scipy.spatial`)
 - Statistics and random numbers (`scipy.stats`)
 - Multidimensional image processing (`scipy.ndimage`)
 - File handling (`scipy.io`)
 - Other special functions (`scipy.special`)
 
And much, much more...