# Example #1: Neural Network for $y = \sin(x)$

This is the same example as yesterday: a sine curve with 11 points (from $x = 0$ to $x = 6$ in steps of 0.6) as training values:

In [0]:
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(0,6.6, 0.6)
y = np.sin(x)

xplot = np.arange(0, 6.6, 0.01)
yplot = np.sin(xplot)

plt.scatter(x,y, color="b", label="Training")
plt.plot(xplot, yplot, color="g", label="sin(x)")

plt.legend()
plt.show()

## Defining the architecture of our neural network:

A fully connected network with 1 input node, 1 hidden layer, and 1 output node.




<img src="https://i.imgur.com/v27q53W.png" width="400">


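Layer connections: each node computes a weighted sum of its inputs $x_i$ plus a bias $b$ (the `model` function in this example leaves out the bias term):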
\begin{equation}
y = b+\sum_i x_i w_i
\end{equation}
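
As a minimal sketch of this formula, the weighted sum for a single node can be written with `np.dot`. The inputs, weights, and bias below are made-up numbers chosen only for illustration; the names `x_in`, `w`, `b`, and `y_node` are hypothetical and not used elsewhere in this notebook.

```python
import numpy as np

# Hypothetical inputs, weights and bias for a single node
x_in = np.array([1.0, 2.0, 3.0])
w = np.array([0.5, -1.0, 0.25])
b = 0.1

# y = b + sum_i x_i * w_i
y_node = b + np.dot(x_in, w)
print(y_node)
```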


**Question:** "How many weights are there in the above example?"

### Defining the Activation function (sigmoid):
\begin{equation}
\sigma\left(x\right) = \frac{1}{1 + \exp\left(-x\right)}
\end{equation}
The sigmoid is popular because its derivative is simple:
\begin{equation}
\frac{\mathrm{d}}{\mathrm{d}x}\sigma\left(x\right) = \sigma\left(x\right)\left(1 - \sigma\left(x\right)\right)
\end{equation}
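
The derivative formula can be checked numerically. The sketch below compares the analytic expression with a central finite difference; the step size `h` and the test points `z` are arbitrary choices made for this illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Points at which to compare the two derivatives
z = np.linspace(-5.0, 5.0, 11)
h = 1e-5  # finite-difference step (an arbitrary small value)

# Analytic derivative: sigma(z) * (1 - sigma(z))
analytic = sigmoid(z) * (1.0 - sigmoid(z))

# Central finite-difference approximation of the derivative
numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2.0 * h)

# Largest absolute difference between the two; should be tiny
print(np.max(np.abs(analytic - numeric)))
```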

In [0]:
def activation(val):

  sigmoid = 1.0 / (1.0 + np.exp(-val))
  return sigmoid

### Defining the architecture (i.e. the layers):

*   `input_value` - Input value
*   `w_ih` - Weights that connect input layer with hidden layer
*   `w_ho` - Weights that connect hidden layer with output layer



In [0]:
def model(input_value, w_ih, w_ho):

  hidden_layer = activation(input_value * w_ih)
  output_value = np.sum(hidden_layer*w_ho)

  return output_value
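
The `model` function above handles one input value at a time. As a sketch of an alternative, the same forward pass can be evaluated for many inputs at once with `np.outer`. The name `model_vectorized` and the demo weights are hypothetical and only serve this illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def model_vectorized(input_values, w_ih, w_ho):
    # Row k holds input_values[k] * w_ih: the hidden-layer inputs for sample k
    hidden = sigmoid(np.outer(input_values, w_ih))
    # Weighted sum over the hidden nodes, for every sample at once
    return hidden @ w_ho

# Made-up weights for a hidden layer with 3 nodes
demo_w_ih = np.array([0.1, 0.5, -0.3])
demo_w_ho = np.array([1.0, -2.0, 0.5])
demo_x = np.array([0.0, 1.0, 2.0])

demo_out = model_vectorized(demo_x, demo_w_ih, demo_w_ho)
print(demo_out)
```

Each entry of `demo_out` should agree with calling the scalar `model` on the corresponding input value.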

Let's start by testing the neural network with random weights:

In [0]:
np.random.seed(1000)
random_weights_ih = np.random.random(10)
random_weights_ho = np.random.random(10)

print(random_weights_ih)
print(random_weights_ho)
print()

val = 2.0
sinx_predicted = model(val, random_weights_ih, random_weights_ho)

print("Predicted:", sinx_predicted)
print("True:     ", np.sin(2.0))

Setting our Model parameters:

In [0]:
# The number of nodes in the hidden layer
HIDDEN_LAYER_SIZE = 40

# L2-norm regularization
L2REG = 0.01

## Optimizing the weights:

We want to find the set of weights $\mathbf{w}$ that minimizes some loss function. For example, we can minimize the squared error (as we did in least-squares fitting):

\begin{equation}
L\left(\mathbf{w}\right) = \sum_i \left(y_i^\mathrm{true} -  y_i^\mathrm{predicted}(\mathbf{w}) \right)^{2}
\end{equation}
Or with L2 regularization, where $\lambda$ (`L2REG` in the code) sets the strength of the penalty on large weights:
\begin{equation}
L\left(\mathbf{w}\right) = \sum_i \left(y_i^\mathrm{true} -  y_i^\mathrm{predicted}(\mathbf{w}) \right)^{2} + \lambda\sum_j w_j^{2}
\end{equation}
Just like in the numerics lectures and exercises, we can use a function from SciPy to do this minimization: `scipy.optimize.minimize()`.
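
As a sketch of the regularized loss above, the function below takes the training data, hidden layer size, and $\lambda$ as explicit arguments instead of reading them from global variables. That is an assumption made to keep the example self-contained; the names `regularized_loss`, `x_train`, `y_train`, `n_hidden`, and `lam` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def regularized_loss(parameters, x_train, y_train, n_hidden, lam):
    # First n_hidden entries: input->hidden weights; the rest: hidden->output weights
    w_ih = parameters[:n_hidden]
    w_ho = parameters[n_hidden:]

    # Forward pass for all training points at once
    y_pred = sigmoid(np.outer(x_train, w_ih)) @ w_ho

    squared_error = np.sum((y_train - y_pred) ** 2)
    penalty = lam * np.sum(parameters ** 2)  # lambda * sum_j w_j^2
    return squared_error + penalty

# Tiny made-up check: one training point, one hidden node
demo_loss = regularized_loss(np.array([1.0, 1.0]), np.array([0.0]),
                             np.array([0.0]), n_hidden=1, lam=0.1)
print(demo_loss)
```

The extra arguments can be forwarded by `scipy.optimize.minimize()` through its `args` parameter, for example `args=(x, y, HIDDEN_LAYER_SIZE, L2REG)`.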

In [0]:
def loss_function(parameters):

  w_ih = parameters[:HIDDEN_LAYER_SIZE]
  w_ho = parameters[HIDDEN_LAYER_SIZE:]

  squared_error = 0.0

  for i in range(len(x)):

    # Predict y for x[i]
    y_predicted = model(x[i], w_ih, w_ho)
    
    # Without regularization
    squared_error = squared_error + (y[i] - y_predicted)**2

  # With regularization, return this instead:
  # return squared_error + L2REG * np.sum(parameters**2)

  return squared_error

## Running the minimization with `scipy.optimize.minimize()`:

Documentation: https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html

Since we haven't implemented the gradient of the neural network, we can't use optimizers that require the gradient. One algorithm we can use is the Nelder-Mead optimizer.
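
To see gradient-free minimization in isolation before applying it to the network, here is a small sketch that runs Nelder-Mead on a toy quadratic. The function `toy_loss` and its minimum at $(1, -2)$ are made up for this illustration.

```python
import numpy as np
from scipy.optimize import minimize

def toy_loss(params):
    # Simple quadratic bowl with its minimum at (1, -2)
    return (params[0] - 1.0) ** 2 + (params[1] + 2.0) ** 2

# Only function values are needed; no gradient is passed
toy_result = minimize(toy_loss, np.array([0.5, 0.5]), method="Nelder-Mead")

print(toy_result.x)    # should be close to [1, -2]
print(toy_result.fun)  # should be close to 0
```

Nelder-Mead only evaluates the function itself, which is why it works here without a gradient. The number of function evaluations it needs tends to grow quickly with the number of parameters.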

In [0]:
from scipy.optimize import minimize

# Define random initial weights
np.random.seed(666)
p = np.random.random(size=2*HIDDEN_LAYER_SIZE)

# Minimize the loss function with parameters p
result = minimize(loss_function, p, method="Nelder-Mead",
                  options={"maxiter": 100000, "disp": True})

wfinal_in = result.x[:HIDDEN_LAYER_SIZE]
wfinal_hl = result.x[HIDDEN_LAYER_SIZE:]

print(wfinal_in)
print(wfinal_hl)

In [0]:

# Print sin(2.5) and model(2.5)
val = 2.5
sinx_predicted = model(val, wfinal_in, wfinal_hl)

print("Predicted:", sinx_predicted)
print("True:     ", np.sin(val))

Let's make a plot with pyplot!

In [0]:
xplot = np.arange(0,6.6, 0.01)
yplot = np.sin(xplot)

ypred = np.array([model(val, wfinal_in, wfinal_hl) for val in xplot])

plt.plot(xplot, yplot, color="g", label="sin(x)")
plt.scatter(x, y, color="b", label="Training")
plt.plot(xplot, ypred, color="r", label="Predicted")

plt.ylim([-2, 2])
plt.legend()
plt.show()



## What to do about "crazy" behaviour?
*   Regularization
*   Adjust hyperparameters (hidden layer size)