# SciPy Optimize

SciPy contains many different packages that can be used to solve a variety of problems.

This notebook will be dealing with the package `scipy.optimize` which is used to solve a variety of problems within the fields of mathematics and statistics.

Within this notebook we'll be addressing the three problems of:

1. **Finding the Minimum of a Single Variable Function**
2. **Finding the Global Minimum of a Multivariable Function**
3. **Linear Regression Models**

For a more in-depth look into `scipy.optimize` and the other packages that it works well with, please refer to the [SciPy 2015 Lecture](https://www.youtube.com/watch?v=avRx2cdNZmk).

The respective functions we'll use to solve these problems will be:

* `brent`
* `differential_evolution`
* `curve_fit`

#### Outline
1. Getting Started
2. Finding the Minimum of Single Variable Functions: [`brent`](https://docs.scipy.org/doc/scipy-0.18.1/reference/generated/scipy.optimize.brent.html)
3. Finding the Minimum of a Multivariable Function: [`differential_evolution`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.differential_evolution.html#scipy.optimize.differential_evolution)
4. Linear Regression Models and Making Predictions: [`curve_fit`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html)
5. Exercises

## Getting Started

Optimization is all about finding the *best* outcome for a situation. We may want to find the largest value or maybe the smallest. Maybe we want the best prediction a linear regression model can give us. 

Prior to jumping into the notebook we'll need to import the required packages to handle the questions and be able to work with `scipy.optimize`.

The packages we'll need are:
* NumPy
* Matplotlib
* scipy.optimize

In [None]:
import scipy.optimize as opt
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

## 1. Finding the Minimum of Single Variable Functions

We'll start off simple by finding the **global** minimum of single variable functions. This is simply to get you comfortable with some basic functions found within `scipy.optimize`.

As many of you already know, a function can have one or several minimum and maximum points. The global minimum and maximum are the lowest and highest values the function takes anywhere on its domain.

For example:
$$
f(x) = x^2
$$

As we know, $f(x)$ goes off to $\infty$ as $x$ goes to $-\infty$ or $\infty$, but the function has an overall minimum value of $y = 0$ at $x = 0$, as seen below.

In [None]:
x = np.linspace(-20,20)
f = x**2

plt.plot(x, f), plt.plot(0,0,'r.')

The minimum of a function will correspond to a point where the tangent line of the function has a slope of zero. The normal way to find it is to take the first derivative and find where that derivative equals zero. Then analyze the derivative at points near the zero to see how the function behaves (increasing or decreasing around the point). You can further confirm the behaviour using the second derivative.

Let's look at the function
$$
f(x) = \cos \left( \frac{1}{2} x + 1 \right)
$$
Within the range $[-9,9]$ to keep it simple.

In [None]:
x = np.linspace(-9,9)
f = np.cos(0.5 * x + 1)
plt.plot(x,f), plt.plot((-5,1),(1,1),'r-'), plt.plot((1,7),(-1,-1),'r-')

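As we can see on the restricted interval, the red lines mark two points where the tangent is horizontal: a maximum at $x = -2$, where $f(x) = 1$, and a minimum at $x = 2\pi - 2 \approx 4.28$, where $f(x) = -1$.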
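As a rough numerical check of where the tangent of $f(x) = \cos \left( \frac{1}{2} x + 1 \right)$ is horizontal on $[-9, 9]$, the sketch below looks for sign changes in a numerical derivative. The grid size and the variable names are assumptions made for illustration; they are not part of the original notebook.

```python
import numpy as np

# Dense grid on [-9, 9]; 2000 points is an arbitrary illustrative choice
x = np.linspace(-9, 9, 2000)
f = np.cos(0.5 * x + 1)

# Numerical first derivative on the grid
slope = np.gradient(f, x)

# A turning point lies between neighbouring grid points whose slopes differ in sign
crossings = np.where(np.sign(slope[:-1]) * np.sign(slope[1:]) < 0)[0]
turning_points = x[crossings]

for xt in turning_points:
    print(f"x = {xt:.2f}, f(x) = {np.cos(0.5 * xt + 1):.3f}")
```

Each reported location is only accurate to about one grid spacing. Note that a flat tangent can mark either a maximum or a minimum, so the value of $f$ at each point still has to be inspected.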

#### Example: $f(x) = x^2$

$$
f(x) = x^2 \ \ , \ f'(x) = 2x \ \ , \ f'(x) = 0 = 2x
$$

It's clear that the derivative is zero when $x = 0$. So we can look at points around $x=0$, such as $x = -1$ and $x = 2$.

$$
f'(-1) = 2(-1) = -2 \\ 
f'(2) = 2(2) = 4
$$

As you can see, when $x < 0$ the derivative (the slope) is negative, but for points $x > 0$ the derivative is positive. So the function is decreasing until it reaches $x = 0$ and then increases, showing that the point $x = 0$ is a minimum.
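The sign check above can also be reproduced numerically. This is a minimal sketch that estimates the slope with a central difference; the helper `derivative` and its step size `h` are hypothetical and chosen only for illustration.

```python
def f(x):
    return x**2

def derivative(func, x, h=1e-6):
    """Central-difference estimate of the slope of func at x (illustrative helper)."""
    return (func(x + h) - func(x - h)) / (2 * h)

left = derivative(f, -1.0)    # slope to the left of the critical point
centre = derivative(f, 0.0)   # slope at the critical point
right = derivative(f, 2.0)    # slope to the right of the critical point

print(left, centre, right)
```

A negative slope on the left, a zero slope at the centre, and a positive slope on the right is the pattern that identifies a minimum.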

Now that we've gone over the idea of a minimum, how can we solve this problem using `scipy.optimize`? The problem itself isn't difficult to do, but it can be cumbersome especially when the functions become more difficult. This is where the function `scipy.optimize.brent` can help us.

In [None]:
def f(x):
    return x**2

minimum = opt.brent(f)
minimum

x = np.linspace(-10,10,100)
y = x**2
plt.plot(x,y), plt.plot(minimum,f(minimum),'r.')
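`brent` returns only the location of the minimum by default. As a small sketch, its `brack` and `full_output` arguments can be used to supply a starting interval and to get back the function value and iteration counts as well. The starting interval `(-1, 2)` is an arbitrary choice made for this example.

```python
from scipy import optimize as opt

def f(x):
    return x**2

# brack gives a starting interval for the search;
# full_output=True also returns the function value and the work done
xmin, fval, n_iter, n_calls = opt.brent(f, brack=(-1, 2), full_output=True)

print("location of minimum:", xmin)
print("value at minimum:", fval)
print("iterations:", n_iter, "function calls:", n_calls)
```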

#### Example: Rosenbrock Banana Function

In [None]:
from mpl_toolkits.mplot3d import Axes3D

How about we try a more difficult example now.
$$
f(x,y) = (1-x)^2 + 100(y - x^2)^2
$$

This is the Rosenbrock Banana Function, which is known to have a minimum at $(1,1)$.

In [None]:
plt.figure()
ax = plt.axes(projection = '3d')
x = np.linspace(-1,1,1000)
y = np.linspace(-1,1,1000)
X, Y = np.meshgrid(x,y)
Z = (1 - X)**2 + 100 * (Y - X**2)**2

ax.plot_surface(X,Y,Z,cmap = plt.cm.hsv), ax.view_init(40,-50)

For our example we'll restrict the Banana Function to a single variable by fixing $y = 1$, which gives a 2D plot.
$$
f(x) = (1- x)^2 + 100(1- x^2)^2
$$

In [None]:
def f(x):
    return (1 - x)**2 + 100 * (1 - x**2)**2

In [None]:
x = np.linspace(-5,5,100)
simple_plot = f(x)
plt.plot(x,simple_plot)

In [None]:
minimum = opt.brent(f)
minimum

In [None]:
plt.plot(x,simple_plot), plt.plot(minimum,f(minimum),'r.')
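One caveat worth illustrating: `brent` is a local method, so the minimum it reports depends on where the search starts. The restricted Banana Function has two dips, one at $x = 1$ and a shallower one near $x = -1$. The sketch below, with brackets chosen by hand as an assumption for this example, shows that each bracket leads to a different answer.

```python
from scipy import optimize as opt

def f(x):
    return (1 - x)**2 + 100 * (1 - x**2)**2

# Each bracket (xa, xb, xc) is chosen so that f(xb) is below f(xa) and f(xc)
x_right = opt.brent(f, brack=(0.5, 1.0, 1.5))     # search around x = 1
x_left = opt.brent(f, brack=(-1.5, -1.0, -0.5))   # search around x = -1

print("right dip:", x_right, f(x_right))
print("left dip: ", x_left, f(x_left))
```

Comparing the two function values shows that the dip at $x = 1$ is the lower one, so it is the global minimum of the restricted function.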

## 2. Finding the Global Minimum of a Multivariable Function

We've looked at single variable functions, and it's fairly straightforward. However, how does one find the minimum of a multivariable function, especially the **global** minimum on a given boundary?

The first step is similar to the one followed in the previous section.
1. Find the critical points within the boundary.

But then you need to follow a few more steps.
2. Find the extreme values on the boundary itself.
3. Compare all of these points: the largest and smallest values are the global maximum and minimum of the function.

The difference here compared to the previous section is that there are more calculations required to examine the points. You need to examine points within the boundary, then the boundary itself, and while some may find it easy, others may find it tedious. After all, the faster we can do this the faster we get to the more interesting stuff. So let's find a way to automate this process and move through these problems faster.
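To get a feel for the "compare all the points" idea, here is a crude brute-force sketch: evaluate the Banana Function from the previous section on a grid that covers both the interior and the edges of the region, then pick the smallest value. The grid resolution and the two-argument helper `rosen_xy` are assumptions made for illustration only.

```python
import numpy as np

def rosen_xy(x, y, a=1, b=100):
    """Rosenbrock function written with separate x and y arguments (illustrative)."""
    return (a - x)**2 + b * (y - x**2)**2

# Grid over -5 <= x <= 5 and -5 <= y <= 5, edges included
x = np.linspace(-5, 5, 201)
y = np.linspace(-5, 5, 201)
X, Y = np.meshgrid(x, y)
Z = rosen_xy(X, Y)

# Index of the smallest value on the grid, converted back to (row, column)
i, j = np.unravel_index(np.argmin(Z), Z.shape)
x_best, y_best = X[i, j], Y[i, j]

print("best grid point:", x_best, y_best, "value:", Z[i, j])
```

This only works well here because the true minimum happens to sit on a grid point; in general a grid search is limited by its spacing and becomes expensive as the number of variables grows.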

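We'll use the function `differential_evolution`. Rather than following the analytic steps above, it is a stochastic, population-based search: it evaluates the function at many candidate points inside the bounds and repeatedly combines the better ones. This usually locates the global minimum much faster than working through the calculus by hand, although, being a heuristic, it does not guarantee it.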

The function `differential_evolution` has two required parameters: the function to find the minimum of, and the bounds to search within. The bounds give one `(min, max)` pair for each variable of the function.

#### Example: The Rosenbrock Banana Function
Let's use the function from the end of the previous section, *The Rosenbrock Banana Function*, but this time without fixing $y$.

$$
f(x,y) = (a-x)^2 + b(y-x^2)^2
$$
Find its minimum on the region $-5 < x < 5$ and $-5 < y < 5$.

The function is known to have a minimum at $(a,a^2)$.

In [None]:
plt.figure()
ax = plt.axes(projection = '3d')
x = np.linspace(-5,5,1000)
y = np.linspace(-5,5,1000)
X, Y = np.meshgrid(x,y)
Z = (1 - X)**2 + 100 * (Y - X**2)**2

ax.plot_surface(X,Y,Z,cmap = plt.cm.hsv), ax.view_init(40,-50);

*Please note that the `bounds` parameter takes a sequence of `(min, max)` pairs, one per variable: here the first pair bounds $x$ and the second bounds $y$.*

In [None]:
#Initialize a=1 and b=100 to be used in the function

def rosen(x,a = 1,b = 100):
    return (a-x[0])**2 + b*(x[1] - x[0]**2)**2

In [None]:
# Set the bounds to search: one (min, max) pair per variable
bounds = [(-5,5),(-5,5)]

# Run the optimization to find the global minimum
min_point = opt.differential_evolution(rosen,bounds)
# Display the location of the minimum
min_point.x

The function `differential_evolution` returns a result object; its `x` attribute is an array holding the $x$ and $y$ values of the minimum.
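The result object carries more than the location. The sketch below prints a few of its fields and also uses the `args` parameter to pass different values of `a` and `b` through to `rosen`, which lets us check the $(a, a^2)$ claim for $a = 2$. Because the search is stochastic, the printed values will vary slightly from run to run; the choice $a = 2$ is an assumption made for this example.

```python
from scipy import optimize as opt

def rosen(x, a=1, b=100):
    return (a - x[0])**2 + b * (x[1] - x[0]**2)**2

bounds = [(-5, 5), (-5, 5)]

# Default parameters a=1, b=100: the minimum should be close to (1, 1)
result = opt.differential_evolution(rosen, bounds)
print("location:", result.x)
print("function value:", result.fun)
print("converged:", result.success)

# Extra arguments for rosen are passed through args; with a=2 the
# minimum should be close to (a, a**2) = (2, 4)
result_a2 = opt.differential_evolution(rosen, bounds, args=(2, 100))
print("location for a=2:", result_a2.x)
```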

## 3. Solving Linear Regression Models

In statistics, linear regression models aid us in assessing how well a linear relationship describes a set of data. The linear model itself is a linear function that passes through a cloud of points such that the sum of the squared vertical distances from the points to the model line is as small as possible.


The key components needed to compute a linear regression model are the correlation coefficient and the standard deviations of the x and y values.

The [correlation coefficient](https://en.wikipedia.org/wiki/Correlation_coefficient) measures the strength and direction of the linear relationship between two variables. The coefficient itself takes values in $[-1, 1]$.

   * The closer it is to 1, the stronger the positive relationship.
        This means the model slopes upward: y tends to increase as x increases.
        
   * The closer it is to -1, the stronger the negative relationship.
        This means the model slopes downward: y tends to decrease as x increases.
        
   * If the coefficient is 0, then there is no **LINEAR** relationship between x and y.
   
It's the coefficient itself which can be very tedious to calculate.

The regression line equation is:
$$
\hat{y} = b_0 + b_1 x   
$$

The slope of the regression line is calculated as:
$$
b_1 = \frac{r s_y}{s_x}
$$

With:
$r$ = correlation coefficient,
$s_x$ = Std Deviation in $X$ values,
$s_y$ = Std Deviation in $Y$ Values

The Intercept is calculated as:
$$
b_0 = \bar{y} - b_1 \bar{x}
$$

With:
$\bar{y}$ = mean of the $Y$ values,
$\bar{x}$ = mean of the $X$ values,
$b_1$ = slope of the line
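To make the formulas concrete, here is a minimal sketch that computes $r$, $s_x$, $s_y$, $b_1$ and $b_0$ directly with NumPy and compares the result against NumPy's own least-squares line fit, `np.polyfit`. The synthetic data and the random seed are assumptions chosen only so the example is reproducible.

```python
import numpy as np

# Synthetic data scattered around the line y = 20 + 55x (illustrative)
rng = np.random.default_rng(0)
x = 0.2 * rng.random(100)
y = 55 * x + 20 + rng.standard_normal(100)

# Ingredients of the formulas
r = np.corrcoef(x, y)[0, 1]    # correlation coefficient
s_x = np.std(x, ddof=1)        # sample standard deviation of the x values
s_y = np.std(y, ddof=1)        # sample standard deviation of the y values

# Slope and intercept of the regression line
b1 = r * s_y / s_x
b0 = y.mean() - b1 * x.mean()
print("by hand:   ", b0, b1)

# Cross-check with a degree-1 least-squares polynomial fit
slope, intercept = np.polyfit(x, y, 1)
print("np.polyfit:", intercept, slope)
```

Both approaches give the same line, since the formulas above are exactly the least-squares solution for a straight line.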

So we need all of these quantities, which can be very time consuming to calculate, with the correlation coefficient itself being quite a burden. This is where the function `scipy.optimize.curve_fit` can aid us.

In [None]:
epsilon = np.random.randn(100)
x = 0.2 * np.random.rand(100)
y = 55*x + 20 + epsilon
plt.scatter(x,y)

Here, we'll be using the function `curve_fit` to help us construct models for a linear set of data.

The function itself can expand to work on non-linear data and functions however.

In [None]:
epsilon = np.random.rand(100)
x = 0.01 * np.linspace(0,50,100)
y = np.cos(x**0.5) + epsilon
plt.scatter(x,y)
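As a sketch of the non-linear case, `curve_fit` can fit any model function whose first argument is the independent variable and whose remaining arguments are the parameters. The exponential-decay model, the true parameter values, the seed, and the starting guess `p0` below are all assumptions chosen for illustration; they are not the data plotted above.

```python
import numpy as np
from scipy import optimize as opt

def model(x, a, b, c):
    """Illustrative non-linear model: exponential decay plus an offset."""
    return a * np.exp(-b * x) + c

# Synthetic data generated from known parameters plus a little noise
rng = np.random.default_rng(0)
x = np.linspace(0, 4, 50)
y = model(x, 2.5, 1.3, 0.5) + 0.05 * rng.standard_normal(x.size)

# p0 is the starting guess for (a, b, c)
popt, pcov = opt.curve_fit(model, x, y, p0=(1.0, 1.0, 0.0))
print("fitted a, b, c:", popt)
```

For non-linear models the starting guess can matter: a poor `p0` may lead the fit to a different local solution or prevent it from converging.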

While it is possible to construct a linear model using `scipy.linalg`, it requires setting the problem up in matrix form. The data have to be arranged into NumPy arrays that stand in for the matrices, and solving for the coefficients in the model $y = a + bx + \epsilon$ requires reshaping those arrays, which can be quite a hassle if the shapes are difficult to visualize.

The difficult part of finding the linear model is finding the optimal values of $a$ and $b$, the ones that minimize the sum of squared vertical distances between the scatter points and the model's predicted values. Packages such as `scipy.linalg` can solve for the $a, b$ values, but they can be quite clunky to use for this.

#### For Example:
Using the model: $y = a + bx + \epsilon$ we'll use `scipy.optimize.curve_fit` to find $a$ and $b$ and then create the model and plot it using `matplotlib`.

In [None]:
def f(x,a,b):
    # Linear model: intercept a, slope b
    return a + b*x

In [None]:
# Generate noisy data around a known line to plot
x = np.linspace(0,10,100)
y = f(x,2,0.15)
noise = 0.2 * np.random.randn(x.size)
y = y + noise
plt.scatter(x,y)

In [None]:
# Now find parameters a and b for the model.
# curve_fit returns the fitted parameters and their covariance matrix.
params, covariance = opt.curve_fit(f,x,y)
a_model, b_model = params

#Plot the data and lay the model over top
plt.scatter(x,y)
plt.plot(x,f(x,*params),'r');

As you can see, much less work is required by the user when utilizing `curve_fit`. The user is not required to calculate coefficients by hand as shown at the beginning of this section, nor to set up and reshape matrices.

`curve_fit` simply requires the model function and the data. It finds the optimized values for the linear model by itself, and you can then plot the model on top of the scatter plot.
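`curve_fit` returns two things: the fitted parameters and their estimated covariance matrix. As a small sketch, the square roots of the diagonal of that matrix give one-standard-deviation uncertainties for the parameters. The function name `line`, the true values, and the seed below are assumptions made so the example is self-contained.

```python
import numpy as np
from scipy import optimize as opt

def line(x, a, b):
    """Linear model with intercept a and slope b."""
    return a + b * x

# Noisy data around a known line (illustrative values)
rng = np.random.default_rng(1)
x = np.linspace(0, 10, 100)
y = line(x, 2.0, 0.15) + 0.2 * rng.standard_normal(x.size)

# popt holds the fitted (a, b); pcov is their estimated covariance matrix
popt, pcov = opt.curve_fit(line, x, y)

# One-standard-deviation uncertainty on each parameter
perr = np.sqrt(np.diag(pcov))

print("a = %.3f +/- %.3f" % (popt[0], perr[0]))
print("b = %.3f +/- %.3f" % (popt[1], perr[1]))
```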

## 4. Exercises

##### 1. Find the Minimum of the Function $f(x) = \cos(x + 1)$

##### 2. Find the Minimum of the Function: $f(x) = -\frac{1 + \cos(12 \sqrt{x^2 + 1})}{0.5x + 2}$

##### 3. Find the Global Minimum of [Beale's Function](https://www.sfu.ca/~ssurjano/beale.html) on the boundary $-5 < x < 5$ and $-5 < y < 5$

##### 4. Plot a regression model for data generated from $y = \cos(x^2 + 1)$