<center> 
# R406: Using Python for data analysis and modelling

<br> <br> 

## Lecture 9: Overview of SciPy. Numerical linear programming

<br>

<center> **Andrey Vassilev**

<br> 

<center> **2016/2017**
 

# Outline

1. The scientific computing ecosystem in Python and the SciPy library
2. Selected examples of SciPy applications
3. Linear programming: a short review and SciPy applications

# The SciPy stack

- Python has many libraries that take care of various scientific computing needs.
- A core set of such libraries forms the so-called *SciPy stack*.
- Some of the key [SciPy stack](https://scipy.org/stackspec.html) libraries include:
  - NumPy
  - SciPy (library)
  - SymPy
  - Pandas
  - Matplotlib

# The SciPy stack and the SciPy library

- The SciPy stack is described as  
> a collection of open source software for scientific computing in Python, and particularly a specified set of core packages
- The SciPy library is described as
> a collection of numerical algorithms and domain-specific toolboxes, including signal processing, optimization, statistics and much more

# Highlights of the SciPy library

SciPy includes routines for:
 - linear algebra (an upgrade over NumPy routines)
 - integration and ODE solution
 - statistical analysis
 - optimization and root finding
 - ...
    
Browse the [SciPy reference](https://docs.scipy.org/doc/scipy/reference/) for a more detailed description.

# SciPy vs NumPy

The [SciPy FAQ](https://www.scipy.org/scipylib/faq.html#what-is-the-difference-between-numpy-and-scipy) puts it better than I could:
> In an ideal world, NumPy would contain nothing but the array data type and the most basic operations: indexing, sorting, reshaping, basic elementwise functions, et cetera. All numerical code would reside in SciPy. However, one of NumPy’s important goals is compatibility, so NumPy tries to retain all features supported by either of its predecessors. Thus NumPy contains some linear algebra functions, even though these more properly belong in SciPy. 

> In any case, SciPy contains more fully-featured versions of the linear algebra modules, as well as many other numerical algorithms. If you are doing scientific computing with Python, you should probably install both NumPy and SciPy. Most new features belong in SciPy rather than NumPy.

In case you are interested in further details, you can also read [*Why both numpy.linalg and scipy.linalg? What’s the difference?*](https://www.scipy.org/scipylib/faq.html#why-both-numpy-linalg-and-scipy-linalg-what-s-the-difference)
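To make the difference concrete, here is a minimal sketch. The matrix `A` is made up for illustration. The sketch assumes that the LU decomposition and the matrix exponential are available in `scipy.linalg` but not in `numpy.linalg`, while basics such as the determinant exist in both.

```python
import numpy as np
from scipy import linalg

# A small illustrative matrix (not taken from the lecture)
A = np.array([[4.0, 3.0],
              [6.0, 3.0]])

# The determinant is available in both libraries
det_numpy = np.linalg.det(A)
det_scipy = linalg.det(A)
print(det_numpy, det_scipy)

# LU decomposition with partial pivoting: A = P L U
P, L, U = linalg.lu(A)
print(np.allclose(P @ L @ U, A))

# Matrix exponential; exp of the zero matrix is the identity
E = linalg.expm(np.zeros((2, 2)))
print(E)
```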

In [None]:
import numpy as np
import scipy as sp
import matplotlib.pyplot as plt

# Statistical functions in SciPy

The statistical functions of SciPy are contained in module `stats`.

In [None]:
from scipy import stats

The module contains a rich variety of continuous and discrete distributions that can be accessed through a more or less harmonized interface.

In [None]:
x = np.linspace(-5.0,5.0,500) # An array of 500 evenly spaced 
                              # grid points from -5.0 to 5.0
ydensity = stats.norm.pdf(x)  # The standard normal density
ycdf = stats.norm.cdf(x)      # The standard normal CDF
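The "harmonized interface" means that different distributions expose the same methods. The sketch below illustrates this with three distributions and parameter values chosen arbitrarily for the example. Note that discrete distributions provide `pmf` where continuous ones provide `pdf`.

```python
from scipy import stats

# Three distributions with illustrative parameter values
normal_rv = stats.norm(loc=0.0, scale=1.0)
gamma_rv = stats.gamma(a=2.0)        # a is the shape parameter
poisson_rv = stats.poisson(mu=3.0)   # a discrete distribution

# cdf, mean and var are called the same way on each of them
for name, rv in [("normal", normal_rv),
                 ("gamma", gamma_rv),
                 ("poisson", poisson_rv)]:
    print(name, rv.cdf(1.0), rv.mean(), rv.var())

# Continuous distributions have a density, discrete ones a mass function
print(normal_rv.pdf(0.0))
print(poisson_rv.pmf(2))
```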

In [None]:
# The following should be self-explanatory.
# Feel free to play around with them.
plt.plot(x,ydensity)
plt.ylim(0,max(ydensity)+0.1)
plt.xlabel(r"$x$",fontsize = 16)
plt.ylabel(r"$f(x)$",fontsize = 16)
plt.show()
# This can be executed to resize plots
# Don't do it unless you really need to!
# plt.rcParams['figure.figsize'] = (8,8)

In [None]:
plt.plot(x,ycdf,'r')
plt.ylim(-0.1,max(ycdf)+0.05)
plt.xlabel(r"$x$",fontsize = 16)
plt.ylabel(r"$F(x)$",fontsize = 16)
plt.show()

You can also create different objects of the same type if you need to work with them simultaneously:

In [None]:
rv1 = stats.norm()
rv2 = stats.norm(2,1.5) # Mean and standard deviation are passed
rv3 = stats.norm(-1,0.5)

In [None]:
plt.plot(x,rv1.pdf(x),'k')
plt.plot(x,rv2.pdf(x),'k--') # Notice the line formatting commands
plt.plot(x,rv3.pdf(x),'k-.')
plt.xlabel(r"$x$",fontsize = 16)
plt.ylabel(r"$f(x)$",fontsize = 16)
plt.show()

Called without arguments, the `stats()` method returns the mean and the variance:

In [None]:
Mean,Variance = rv2.stats()
print(Mean)
print(Variance)
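Higher moments can be requested through the `moments` argument of `stats()`. A minimal sketch, using the same parameters as `rv2` and assuming you also want the skewness and the excess kurtosis:

```python
from scipy import stats

rv = stats.norm(2, 1.5)  # same mean and standard deviation as rv2

# 'm' = mean, 'v' = variance, 's' = skewness, 'k' = excess kurtosis
mean, variance, skewness, kurt = rv.stats(moments='mvsk')
print(mean, variance, skewness, kurt)

# The standard deviation has its own method
print(rv.std())
```

For a normal distribution the skewness and the excess kurtosis are both zero, so the last two values serve as a quick check.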

For the exponential distribution, whose density is $f(x)=\lambda e^{-\lambda x},~x\geq 0,~\lambda>0$, we have:

In [None]:
lmbda = 3
expRV = stats.expon(scale = 1.0/lmbda) # SciPy expects the scale 1/lmbda,
                                       # not the rate lmbda itself

In [None]:
expRV.mean()

In [None]:
expRV.median()

In [None]:
expRV.var()
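These values can be checked against the closed-form expressions for the exponential distribution: mean $1/\lambda$, median $\ln 2/\lambda$ and variance $1/\lambda^2$. A short sketch with the same rate as above (the variable `rv` is introduced here only for the check):

```python
import numpy as np
from scipy import stats

lmbda = 3.0
rv = stats.expon(scale=1.0 / lmbda)

# Compare SciPy's results with the textbook formulas
mean_ok = np.isclose(rv.mean(), 1.0 / lmbda)
median_ok = np.isclose(rv.median(), np.log(2.0) / lmbda)
var_ok = np.isclose(rv.var(), 1.0 / lmbda**2)
print(mean_ok, median_ok, var_ok)
```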

In [None]:
x = np.arange(0,5,0.1)
plt.plot(x,expRV.pdf(x))
plt.show()

In [None]:
# Percentiles come from ppf (percent point function), the inverse of the CDF
p95 = expRV.ppf(0.95)
p60 = expRV.ppf(0.60)
p30 = expRV.ppf(0.30)
p50 = expRV.median() # :-)
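Since `ppf` is the inverse of `cdf`, applying `cdf` to the percentiles should return the original probabilities. For the exponential distribution the inverse also has the closed form $F^{-1}(p) = -\ln(1-p)/\lambda$. A minimal sketch of both checks, using the same probabilities as above:

```python
import numpy as np
from scipy import stats

lmbda = 3.0
rv = stats.expon(scale=1.0 / lmbda)

probs = np.array([0.30, 0.50, 0.60, 0.95])
quantiles = rv.ppf(probs)  # ppf accepts arrays as well as scalars

# Closed-form inverse CDF of the exponential distribution
closed_form = -np.log(1.0 - probs) / lmbda

print(np.allclose(quantiles, closed_form))
print(np.allclose(rv.cdf(quantiles), probs))
```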

In [None]:
x = np.arange(0,1.5,0.1)
plt.plot(x,expRV.cdf(x),'k',linewidth=2)
plt.axvline(x=p95,color='k',linestyle='--')
# plt.axhline(y=0.95,color='k',linestyle='--') # If you insist...
plt.axvline(x=p60,color='b',linestyle='--')
plt.axvline(x=p50,color='g',linestyle='--')
plt.axvline(x=p30,color='r',linestyle='--')
plt.show()

The SciPy documentation shows the following fancy way to get lists of the available distributions:

In [None]:
dist_continu = [d for d in dir(stats) if isinstance(getattr(stats,d), stats.rv_continuous)]
dist_discrete = [d for d in dir(stats) if isinstance(getattr(stats,d), stats.rv_discrete)]

But you might decide to simply read the [docs](https://docs.scipy.org/doc/scipy/reference/stats.html) instead.
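If you do build the lists, a quick sanity check is to count the entries and confirm that a few familiar names appear. The sketch below repeats the list comprehensions under different variable names so that it runs on its own. The counts depend on the installed SciPy version, so none are quoted here.

```python
from scipy import stats

continuous_names = [d for d in dir(stats)
                    if isinstance(getattr(stats, d), stats.rv_continuous)]
discrete_names = [d for d in dir(stats)
                  if isinstance(getattr(stats, d), stats.rv_discrete)]

# How many distributions of each kind are available
print(len(continuous_names), len(discrete_names))

# Familiar distributions should be in the appropriate list
print('norm' in continuous_names, 'expon' in continuous_names)
print('poisson' in discrete_names)
```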

# Regression

# Solving ordinary differential equations

# Optimization and root finding

# Linear programming