# Physics 494/594
## Linear Regression

In [None]:
# %load ./include/header.py
import numpy as np
import matplotlib.pyplot as plt
import sys
from tqdm import trange,tqdm
sys.path.append('./include')
import ml4s
%matplotlib inline
%config InlineBackend.figure_format = 'svg'
plt.style.use('./include/notebook.mplstyle')
np.set_printoptions(linewidth=120)
ml4s.set_css_style('./include/bootstrap.css')
colors = plt.rcParams['axes.prop_cycle'].by_key()['color']

## Last Time

### [Notebook Link: 05_Batch_Processing.ipynb](./05_Batch_Processing.ipynb)

- Explored linear algebra in `numpy` for batch processing of samples
- Observed massive speedups! Use array operations whenever possible

## Today
- Cost functions and formulating a machine learning task as an optimization problem
- Understand linear regression 


### Example: Steady-State One-Dimensional Heat Conduction

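Fourier's law of heat conduction, combined with energy conservation, gives a simple differential equation for the spatial dependence of the temperature $T$ along a bar of constant cross-sectional area (and uniform conductivity) connected between two reservoirs, in the steady-state limit. Integrating twice shows that $T$ is linear in $x$, with integration constants $w$ and $b$: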

\begin{align}
\frac{d^2 T(x)}{d x^2} &= 0 \\
\frac{d T(x)}{dx} &= w \\
T(x) &= w x + b 
\end{align}
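
As a quick numerical check of this result, one can discretize $d^2T/dx^2 = 0$ with a central difference on a grid, fix the two reservoir temperatures at the ends, and solve the resulting linear system. The solution should lie on a straight line. The rod length, grid size, and boundary temperatures below are hypothetical values chosen for illustration, not taken from the experiment.

```python
import numpy as np

# Hypothetical setup: rod of given length with reservoirs at T_left and T_right
length, n_interior = 0.02, 50     # rod length (m) and number of interior grid points
T_left, T_right = 5.0, 20.0       # boundary (reservoir) temperatures (°C), illustrative only
h = length / (n_interior + 1)     # grid spacing (m)

# Central difference: (T[i-1] - 2 T[i] + T[i+1]) / h^2 = 0 at every interior point
A = (np.diag(-2.0 * np.ones(n_interior))
     + np.diag(np.ones(n_interior - 1), 1)
     + np.diag(np.ones(n_interior - 1), -1))
rhs = np.zeros(n_interior)
rhs[0] -= T_left                  # known boundary values move to the right-hand side
rhs[-1] -= T_right

T_interior = np.linalg.solve(A, rhs)

x_grid = np.linspace(0, length, n_interior + 2)
T_grid = np.concatenate(([T_left], T_interior, [T_right]))

# Integration constants of T(x) = w x + b fixed by the boundary values
w_exact = (T_right - T_left) / length
b_exact = T_left
print("max deviation from a line:", np.max(np.abs(T_grid - (w_exact * x_grid + b_exact))))
```

The deviation should be at floating-point round-off level, since the discrete equations are satisfied exactly by a linear profile.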

Load the experimental data from `../data/rod_temperature.dat` using the convenient `np.loadtxt()` function.

In [None]:
!head ../data/rod_temperature.dat

In [None]:
x,T,ΔT = np.loadtxt('../data/rod_temperature.dat', unpack=True)
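
`np.loadtxt` reads whitespace-separated columns into a 2D array of shape `(rows, columns)`; with `unpack=True` the array is transposed so that each column can be assigned to its own variable, as in the cell above. The sketch below uses a small in-memory file with made-up numbers (not the contents of `rod_temperature.dat`) and assumes the same three-column layout: position, temperature, uncertainty. The `_fake` variable names are hypothetical and chosen so the real data are not overwritten.

```python
import io
import numpy as np

# Made-up three-column data: position (m), temperature (°C), uncertainty (°C).
# Lines starting with '#' are treated as comments by np.loadtxt and skipped.
fake_file = io.StringIO("""# x  T  dT
0.00   5.1  0.5
0.01  12.4  0.5
0.02  19.8  0.5
0.03  27.0  0.5
""")

data = np.loadtxt(fake_file)          # one row per line of the file
print(data.shape)                     # (rows, columns)

fake_file.seek(0)                     # rewind the in-memory file to read it again
x_fake, T_fake, dT_fake = np.loadtxt(fake_file, unpack=True)   # one array per column
print(x_fake, T_fake, dT_fake)
```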

In [None]:
plt.errorbar(x,T,ΔT, marker='o', linestyle='')
plt.xlabel('x  (m)')
plt.ylabel('T  (°C)');

Physics tells us to expect a linear relationship! Let's start with a random guess for $w$ and $b$, and then try to fit a few more lines by eye.

In [None]:
# Add your guesses for the slope w (°C/m) and the intercept b (°C) to these lists
w = []
b = []
x_fit = np.linspace(np.min(x),np.max(x),100)

fig,ax = plt.subplots(1,2, figsize=(10,3.5))

for i in range(len(w)):
    ax[0].plot(x_fit,w[i]*x_fit + b[i], color=colors[i+1])
    ax[1].plot(w[i],b[i], 'o', color=colors[i+1])
    
ax[0].plot(x,T, 'o', ms=6)
ax[0].set_xlabel('x (m)')
ax[0].set_ylabel('T (°C)')
ax[0].set_title('Data Space')

ax[1].set_xlabel('w (°C/m)')
ax[1].set_ylabel('b (°C)')
ax[1].set_title('Weight Space')

## Goal

We want to predict a scalar $T$ as a function of a scalar $x$, given a dataset of pairs $\{(x^{(n)},T^{(n)})\}_{n=1}^N$. Here the $x^{(n)}$ are inputs and the $T^{(n)}$ are targets (observations). From physics, we have a model:

\begin{equation}
F(x) = w x + b
\end{equation}

i.e. $F^{(n)} = w x^{(n)} + b$.

We can think of this as the simplest possible **shallow** neural network: a single neuron with no hidden layer and no non-linearity, i.e. the identity activation $a(x) = x$.

In [None]:
labels = [[r'$x$'],[r'$F(x) = wx + b$']]
ml4s.draw_network([1,1],weights=[np.array(['w'])],biases=[np.array(['b'])], node_labels=labels, annotate=True)
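
To make the network picture concrete, here is a minimal sketch of a single neuron: a weighted input plus a bias, passed through an activation function $a$. With the identity activation $a(z) = z$ it reduces exactly to the linear model $F(x) = wx + b$, evaluated for a whole batch of inputs at once. The function `neuron` and the `_demo` values are hypothetical, chosen only for illustration.

```python
import numpy as np

def neuron(x, w, b, activation=lambda z: z):
    """Hypothetical single-neuron forward pass: a(w*x + b).

    The default activation is the identity a(z) = z, which gives linear regression.
    x may be a scalar or a NumPy array of samples (processed as a batch by broadcasting).
    """
    return activation(w * x + b)

x_demo = np.array([0.0, 0.005, 0.01, 0.015])   # example inputs (m)
w_demo, b_demo = 800.0, 5.0                     # example weight (°C/m) and bias (°C)

F_demo = neuron(x_demo, w_demo, b_demo)         # identical to w_demo*x_demo + b_demo
print(F_demo)
```

Passing a different function, e.g. `activation=np.tanh`, would turn this into a non-linear neuron; for linear regression the identity is exactly what we want.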

We want to *learn* the **parameters** (weight $w$ and bias $b$) so that the **prediction** $F$ (here a linear function) matches the observed targets. We will do this by minimizing (optimizing) a **loss** function. For a single data point (observation) this is defined to be:

\begin{equation}
\mathcal{L}^{(n)} = \frac{1}{2} \lvert \lvert F^{(n)} - T^{(n)} \rvert \rvert^2
\end{equation}

which quantifies how poorly the model fits this data point, for each point in our **hypothesis** space (every possible choice of the parameters $w$ and $b$).

$F-T$ is the residual, which we want to make as small as possible for all data points simultaneously. We therefore minimize the **cost** function, the loss function averaged over all $N$ training examples (input data):

\begin{equation}
\boxed{
\mathcal{C} = \frac{1}{2N} \sum_{n=1}^N  \lvert \lvert F^{(n)} - T^{(n)} \rvert \rvert^2
}
\end{equation}
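
A minimal vectorized sketch of the per-sample loss and the cost, assuming the inputs and targets are 1D NumPy arrays of equal length. The helper names `loss` and `cost` and the synthetic, noise-free data are hypothetical. For targets that lie exactly on a line, the cost vanishes at the true parameters and is positive elsewhere.

```python
import numpy as np

def loss(w, b, x, T):
    """Per-sample squared-error loss L^(n) = (F^(n) - T^(n))^2 / 2 with F = w x + b."""
    F = w * x + b
    return 0.5 * (F - T) ** 2

def cost(w, b, x, T):
    """Cost C = (1/2N) sum_n (F^(n) - T^(n))^2, i.e. the loss averaged over all N samples."""
    return np.mean(loss(w, b, x, T))

# Synthetic, noise-free data on the line T = 750 x + 5 (hypothetical values)
x_syn = np.linspace(0, 0.02, 5)
T_syn = 750.0 * x_syn + 5.0

print(cost(750.0, 5.0, x_syn, T_syn))   # zero at the true parameters
print(cost(750.0, 6.0, x_syn, T_syn))   # every residual is (up to round-off) 1, so C ≈ 0.5
```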

Let's use what we learned last time about batch processing to evaluate this cost function. Here, our input samples are the individual values of $x$.

In [None]:
# for a specific hypothesis (i.e. individual values of w and b)
C_hyp = []
for i in range(len(w)):
    F = np.dot(x,w[i]) + b[i]
    C_hyp.append(0.5*np.average((F-T)**2))
print(C_hyp)
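
The loop over hypotheses can itself be replaced by a batch operation, in the spirit of last time: broadcasting a column of weights against a row of inputs produces all predictions at once, one row per hypothesis. The arrays below (`x_syn`, `T_syn`, `w_try`, `b_try`) are hypothetical synthetic stand-ins so that the sketch runs on its own.

```python
import numpy as np

# Synthetic data and a few guessed hypotheses (hypothetical values)
x_syn = np.linspace(0, 0.02, 5)
T_syn = 750.0 * x_syn + 5.0
w_try = np.array([600.0, 750.0, 900.0])
b_try = np.array([8.0, 5.0, 2.0])

# Loop version, as in the cell above
C_loop = [0.5 * np.average((w_try[i] * x_syn + b_try[i] - T_syn) ** 2) for i in range(len(w_try))]

# Batch version: (n_hyp, 1) * (N,) + (n_hyp, 1) broadcasts to shape (n_hyp, N)
F_all = w_try[:, np.newaxis] * x_syn + b_try[:, np.newaxis]
C_batch = 0.5 * np.average((F_all - T_syn) ** 2, axis=1)   # average over the samples

print(C_batch)
```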

### Now we can do this over the entire space of weights and biases

In [None]:
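grid_size = 100
# grid of candidate parameters: weights vary along columns, biases along rows
weights,biases = np.meshgrid(np.linspace(400,1200,grid_size),np.linspace(-1,18,grid_size))
C = np.zeros_like(weights)

# evaluate the cost for every (w, b) pair on the grid
for i in range(grid_size):
    for j in range(grid_size):
        F = np.dot(x,weights[i,j]) + biases[i,j]   # predictions for all samples
        C[i,j] = 0.5*np.average((F-T)**2)         # cost for this hypothesis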
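
The double loop over the grid can also be removed. Adding a trailing axis to the two meshgrid arrays lets NumPy broadcast them against the $N$ data points, giving predictions of shape `(n_grid, n_grid, N)` that are then averaged over the last axis. This is a sketch on synthetic data with hypothetical `_syn` names; for large grids and datasets the intermediate array can use a lot of memory.

```python
import numpy as np

# Synthetic data and a small parameter grid (hypothetical values)
x_syn = np.linspace(0, 0.02, 5)
T_syn = 750.0 * x_syn + 5.0
n_grid = 50
w_syn, b_syn = np.meshgrid(np.linspace(400, 1200, n_grid), np.linspace(-1, 18, n_grid))

# Loop version (same structure as the cell above)
C_loop = np.zeros_like(w_syn)
for i in range(n_grid):
    for j in range(n_grid):
        F_ij = w_syn[i, j] * x_syn + b_syn[i, j]
        C_loop[i, j] = 0.5 * np.average((F_ij - T_syn) ** 2)

# Batch version: (n_grid, n_grid, 1) broadcast against (N,) -> (n_grid, n_grid, N)
F_grid = w_syn[..., np.newaxis] * x_syn + b_syn[..., np.newaxis]
C_batch = 0.5 * np.average((F_grid - T_syn) ** 2, axis=-1)   # average over samples

print(np.allclose(C_loop, C_batch))
```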

In [None]:
plt.contour(weights,biases,C, cmap='Spectral_r', levels=100)

for i in range(len(w)):
    plt.plot(w[i],b[i], 'o', ms=10, color=colors[i+1])

plt.xlabel('w (°C/m)')
plt.ylabel('b (°C)')
plt.colorbar(label='Cost Function');
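
Reading the minimum off the contour plot can be made quantitative with a brute-force grid search: `np.argmin` returns the flat index of the smallest cost, and `np.unravel_index` converts it back to a `(row, column)` index into the meshgrid arrays. This only locates the minimum to within the grid spacing. The sketch uses hypothetical synthetic data so that it runs standalone; with the notebook's arrays the same two calls would be applied to `C`, `weights`, and `biases`.

```python
import numpy as np

# Synthetic data on the line T = 750 x + 5 and a cost grid (hypothetical values)
x_syn = np.linspace(0, 0.02, 5)
T_syn = 750.0 * x_syn + 5.0
w_syn, b_syn = np.meshgrid(np.linspace(400, 1200, 161), np.linspace(-1, 18, 191))
F_grid = w_syn[..., np.newaxis] * x_syn + b_syn[..., np.newaxis]
C_syn = 0.5 * np.average((F_grid - T_syn) ** 2, axis=-1)

# Grid search: position of the smallest cost, converted from a flat index to (row, column)
i_min, j_min = np.unravel_index(np.argmin(C_syn), C_syn.shape)
w_best, b_best = w_syn[i_min, j_min], b_syn[i_min, j_min]
print(f"w ≈ {w_best:.1f} °C/m, b ≈ {b_best:.2f} °C, C = {C_syn[i_min, j_min]:.3g}")
```

With real, noisy measurements the smallest cost will not be zero, and a finer grid only improves the estimate at a steep computational price.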

### Viewing in 3D

In [None]:
from mpl_toolkits.mplot3d import Axes3D

fig = plt.figure(figsize=(8,6))

ax = plt.axes(projection='3d')
surf = ax.plot_surface(weights, biases,C , rstride=1, cstride=1, cmap='Spectral_r', 
                       linewidth=0, antialiased=True, rasterized=True)

# plot the points
for i in range(len(w)):
    ax.plot3D(w[i],b[i],C_hyp[i], 'o', color='k', ms=10)

ax.set_xlabel('w (°C/m)',labelpad=8)
ax.set_ylabel('b (°C)',labelpad=8)
ax.set_zlabel('C(w,b)',labelpad=8);

## How do we identify the minimum of this cost function to extract the *best* parameters for our model?