# Neural Networks Lab

In this lab, we'll be exploring a visual proof of the universal approximation theorem and building (from scratch) a neural network that will approximate a pretty ridiculous function.

Head over to [this site](http://neuralnetworksanddeeplearning.com/chap4.html) and read from the beginning of the page until the "Many Input Variables" section. (You do not need to read the "Many Input Variables" section and beyond but are certainly welcome to do so!) You'll read the introduction, the "Two Caveats" section, and the "Universality with One Input and One Output" section.

Your answers to problems 1-5 should come directly from this reading.

**Problem 1**: Summarize the Universal Approximation Theorem. (Don't copy it; use your own words!)

A neural network with a single hidden layer can approximate any continuous function to whatever accuracy we want, provided the hidden layer has enough neurons.

**Problem 2**: Summarize the two caveats the author uses to describe the statement "a neural network can compute any function."

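First, the network only approximates the function. It won't compute it exactly, but adding hidden neurons lets us make the approximation as accurate as we like. Second, the function must be continuous; a function with sudden jumps generally can't be approximated this way.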
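To see both caveats numerically, here is a minimal sketch. It uses the limiting case of the chapter's construction: a bump built from infinitely sharp steps is just a constant on its interval. As an assumption for illustration, each bump's height is taken as the function's value at the interval midpoint (the chapter tunes heights by hand instead). The maximum error shrinks as the number of bumps grows, because the target is continuous, but it never reaches exactly zero.

```python
import numpy as np

def f(x):
    # Target function from the reading
    return 0.2 + 0.4 * x**2 + 0.3 * x * np.sin(15 * x) + 0.05 * np.cos(50 * x)

def bump_approximation(func, n_bumps, x):
    # Split [0, 1] into n_bumps equal intervals; on each one the approximation
    # is constant, equal to func evaluated at the interval midpoint
    edges = np.linspace(0, 1, n_bumps + 1)
    midpoints = (edges[:-1] + edges[1:]) / 2
    heights = func(midpoints)
    # Index of the interval each x falls in (clipped so x = 1.0 uses the last one)
    idx = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, n_bumps - 1)
    return heights[idx]

x = np.linspace(0, 1, 1000)
for n in [5, 20, 100]:
    err = np.max(np.abs(bump_approximation(f, n, x) - f(x)))
    print(f"{n:>3} bumps: max error = {err:.4f}")
```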

**Problem 3:** For a sigmoidal activation function to closely resemble a step function, how would you describe the value of $w$? What constraints exist on the value of $b$? How do we calculate $s$? What does the value of $s$ indicate?

Try playing around with the applets on the page to test how different parts of the perceptron affect the output. This should be helpful in answering the questions above.

$w$ should be very large. It controls how steep the sigmoid is: the larger $w$, the faster the output moves from 0 to 1, and the more closely the curve resembles a step function.

$b$ has no fixed limit on its own; what matters is its size relative to $w$. For a given $w$, $b$ shifts the step left or right, so as $w$ grows, $b$ has to grow in proportion to keep the step in the same place. To keep the step inside the input range $[0, 1]$ (with $w > 0$), $b$ must lie between $-w$ and $0$.

The step's position is set by the ratio of bias to weight: $s = -b/w$.

$s$ is the value of $x$ at which the step occurs, i.e. where the neuron's output jumps from about 0 to about 1 (precisely, where $\sigma(wx+b) = 0.5$).
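A quick numerical check of these answers: with a large $w$, $\sigma(wx + b)$ is close to 0 to the left of $s = -b/w$, close to 1 to the right of it, and exactly 0.5 at $s$. The values $w = 500$ and $s = 0.4$ below are arbitrary choices for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

w = 500.0
s = 0.4          # desired step position (arbitrary choice)
b = -w * s       # rearranged from s = -b/w

x = np.array([0.3, 0.39, 0.4, 0.41, 0.5])
vals = sigmoid(w * x + b)
print(np.round(vals, 4))   # near 0 left of s, 0.5 at s, near 1 right of s
```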

**Problem 4**: When the author wants us to approximate $f(x)=0.2+0.4x^2+0.3x\sin(15x)+0.05\cos(50x)$ with a neural network, the function on the applet where we manipulate the values of $h_i$ is not $f(x)$. It's a different function. What is this function, and why are we working with this one instead of $f(x)$?

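The function is $\sigma^{-1}\circ f(x)$. The output neuron applies a sigmoid to the weighted sum of the hidden-layer outputs, so the network's final output is $\sigma\left(\sum_j w_j a_j + b\right)$. For that to equal $f(x)$, the weighted sum itself has to equal $\sigma^{-1}(f(x))$. The $h_i$ values control exactly that weighted sum, so we tune them against $\sigma^{-1}\circ f(x)$ and let the final sigmoid map the result back to $f(x)$.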
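As a minimal sketch of this reasoning: the inverse of the sigmoid has the closed form $\sigma^{-1}(y) = \ln\left(\frac{y}{1-y}\right)$ (the logit), defined for $0 < y < 1$. On $[0, 1]$, $f(x)$ stays strictly between 0 and 1, so the target is well defined, and applying the output sigmoid to it recovers $f(x)$.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def inverse_sigmoid(y):
    # Logit: closed-form inverse of the sigmoid, valid for 0 < y < 1
    return np.log(y / (1 - y))

def f(x):
    return 0.2 + 0.4 * x**2 + 0.3 * x * np.sin(15 * x) + 0.05 * np.cos(50 * x)

x = np.linspace(0, 1, 1000)
target = inverse_sigmoid(f(x))   # what the weighted hidden-layer sum must equal
recovered = sigmoid(target)      # what the output neuron then produces
print(np.max(np.abs(recovered - f(x))))   # round-trip error at floating-point level
```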

**Problem 5**: The author asks you to find values of $h_i$ that make your neural network closely approximate $\sigma^{-1}\circ f(x)$. Record your values of $h_i$ here and your best "average deviation" score.

h[0.0, 0.2) = -1.2

h[0.2, 0.4) = -1.5

h[0.4, 0.6) = -0.4

h[0.6, 0.8) = -1.0

h[0.8, 1.0] = 1.0

Avg. dev = 0.38

**Problem 6**: Build the neural network from your work in Problem 5 here.

A few things to keep in mind:
* How many inputs are there? 
* How many outputs are there?
* How many neurons are in the hidden layer?
* In order to create step functions between 0 and 0.2, 0.2 and 0.4, etc., what does this suggest about the activation function in these neurons? Note that these activation functions will be different, but related.
* What do the values of $h_i$ represent?
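One way to read these hints, as a sketch rather than the notes' implementation: there is 1 input, 1 output, and 2 hidden neurons per interval (10 for five intervals). Each pair of neurons steps up at the left and right edges of its interval; giving them output weights $h$ and $-h$ produces a bump of height $h$ on that interval. The helper names `bump_weights` and `weighted_sum` are hypothetical, and $w = 500$ is an assumed steepness.

```python
import numpy as np

def bump_weights(h_values, w=500.0):
    # Hypothetical helper: turn one height h_i per interval into the hidden
    # weights/biases and output weights of a 1-input, 1-output network
    n = len(h_values)
    edges = np.linspace(0, 1, n + 1)
    # Two step positions per interval: its left edge and its right edge
    steps = np.repeat(edges, 2)[1:-1]   # [e0, e1, e1, e2, ..., e(n-1), en]
    hidden_w = np.full(2 * n, w)
    hidden_b = -hidden_w * steps        # from s = -b/w
    # Output weights in (h, -h) pairs: +h on the left-edge step, -h on the right
    output_w = np.ravel([[h, -h] for h in h_values])
    return hidden_w, hidden_b, output_w

def weighted_sum(x, hidden_w, hidden_b, output_w):
    # Pre-activation value of the output neuron (what the applet plots)
    a = 1 / (1 + np.exp(-(hidden_w * x + hidden_b)))
    return np.dot(output_w, a)

h = [-1.2, -1.5, -0.4, -1.0, 1.0]   # h values recorded in Problem 5
params = bump_weights(h)
print(weighted_sum(0.1, *params), weighted_sum(0.9, *params))   # approx -1.2 and 1.0
```

Under this reading, the $h_i$ are the output-layer weights (up to sign within each pair), and the hidden activation is the same steep sigmoid in every neuron, shifted by a different bias.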

Check out the Neural Networks I notes for an implementation in NumPy; you should be able to use this as a starting point for your model.

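```python
import numpy as np
import matplotlib.pyplot as plt

def sig_act(z):
    # Sigmoid activation; clipping z keeps np.exp from overflowing for large |z|
    # while leaving the function smooth and strictly increasing (so it is invertible)
    z = np.clip(z, -500, 500)
    return 1 / (1 + np.exp(-z))

def lin(x):
    # Identity (linear) activation
    return x

# The step of sigma(w*x + b) sits at s = -b/w, so b = -w*s.
# w = 500 makes each sigmoid close to a true step function.
def solve_for_bias(s, w=500):
    return -w * s
```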

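```python
# Five intervals, each built from a pair of step neurons (one at each edge),
# so the hidden layer has 10 neurons
steps = np.array([0.0, 0.2, 0.2, 0.4, 0.4, 0.6, 0.6, 0.8, 0.8, 1.0])
weights = np.full(len(steps), 500.0)
bias = -weights * steps   # b = -w*s, from s = -b/w
```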

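```python
# Output weights come in (h, -h) pairs, one pair per interval:
# each pair of step neurons combines into a bump of height h on its interval
weights_output = np.array([-1.1, 1.1, -1.4, 1.4, -0.4, 0.4, -1, 1, 1.1, -1.1])
output_bias = 0
```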

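```python
def network(x, activation_function=lin):
    # Hidden layer: pre-activations w*x + b, one per neuron
    node_vals = x * weights + bias

    # Steep sigmoids turn each neuron into an (approximate) step function
    activation_vals = np.array([sig_act(z) for z in node_vals])

    # Output neuron: weighted sum of the steps, i.e. the sum of the h-bumps
    output = np.sum(activation_vals * weights_output) + output_bias

    # lin gives the approximation of sigma^{-1}(f(x)); sig_act gives f(x)
    return activation_function(output)
```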

```python
x_vals = np.linspace(0, 1, 1000)
y_vals = [network(x) for x in x_vals]

def f(x):
    # f(x) = 0.2 + 0.4x^2 + 0.3x sin(15x) + 0.05 cos(50x)
    return 0.2 + 0.4 * x**2 + 0.3 * x * np.sin(15 * x) + 0.05 * np.cos(50 * x)

y_true = [f(x) for x in x_vals]
```

**Problem 7**: Once you've built the neural network, use `np.linspace` to generate 1000 values of $x$ between 0 and 1 and use the `pynverse` [library](https://pypi.python.org/pypi/pynverse) to manually estimate the performance of your neural network using mean squared error.

Recall that mean squared error is given by:

$$
\frac{1}{n}\sum_{i=1}^n (\hat{y}_i-y_i)^2
$$


* Your $\hat{y}$ in this case are your predicted values from your neural network for each of the $x$ that you generated using `np.linspace`. Make sure to take into account the final activation function!
* Your $y$ values are the actual observed values of $f(x)=0.2+0.4x^2+0.3x\sin(15x)+0.05\cos(50x)$ for each of the $x$ that you generated using `np.linspace`.
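The formula maps directly onto NumPy array arithmetic, which avoids an explicit Python loop. A minimal sketch; the name `mse` is our own, not something from the notes:

```python
import numpy as np

def mse(y_hat, y):
    # (1/n) * sum over i of (y_hat_i - y_i)^2
    y_hat = np.asarray(y_hat, dtype=float)
    y = np.asarray(y, dtype=float)
    if y_hat.shape != y.shape:
        raise ValueError("y_hat and y must have the same shape")
    return np.mean((y_hat - y) ** 2)

print(mse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))   # (0 + 0 + 4) / 3 = 1.333...
```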

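```python
from pynverse import inversefunc

# Numerical inverse of the sigmoid, used to get the target sigma^{-1}(f(x))
inv_sigmoid = inversefunc(sig_act)

y_hat = [network(x) for x in x_vals]          # network output before the final sigmoid
y_invsig = [inv_sigmoid(y) for y in y_true]   # sigma^{-1}(f(x))

plt.plot(x_vals, y_invsig, label=r"$\sigma^{-1}(f(x))$")
plt.plot(x_vals, y_hat, label="network (pre-sigmoid)")
plt.xlim((0, 1))
plt.legend()
```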

```python
def my_MSE(y_hat, y):
    # Mean squared error: (1/n) * sum of (y_hat_i - y_i)^2
    if len(y_hat) != len(y):
        raise ValueError("y_hat and y must have the same length")
    n = len(y_hat)
    SE = [(y_hat[i] - y[i]) ** 2 for i in range(n)]
    return sum(SE) / n
```

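```python
# Apply the final sigmoid so predictions are on the same scale as f(x)
y_pred = [network(x, activation_function=sig_act) for x in x_vals]
my_MSE(y_pred, y_true)
```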

**Problem 8**: Suppose you wanted to increase the performance of this neural network. How might you go about doing so?

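Adding more hidden neurons (more, narrower bumps) would let the network follow $\sigma^{-1}\circ f(x)$ more closely, at the cost of more $h_i$ values to tune. Rather than tuning the $h_i$ by hand, we could also fit them by minimizing the mean squared error, for example with gradient descent.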
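As a sketch of the fitting idea (our own assumption, not a method from the reading): if the steep step neurons are held fixed and only the output weights are fit, the pre-sigmoid output is linear in the $h_i$, so ordinary least squares (`np.linalg.lstsq`) against $\sigma^{-1}\circ f(x)$ can stand in for gradient descent. Least squares minimizes the error before the final sigmoid, not the MSE on $f(x)$ itself, so treat the result as a strong starting point rather than the exact optimum. The helper `fit_bump_heights` and $w = 500$ are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def f(x):
    return 0.2 + 0.4 * x**2 + 0.3 * x * np.sin(15 * x) + 0.05 * np.cos(50 * x)

def fit_bump_heights(n_intervals, x, w=500.0):
    # Bump i is the difference of two steep sigmoids stepping at the edges of interval i
    edges = np.linspace(0, 1, n_intervals + 1)
    up = sigmoid(w * (x[:, None] - edges[None, :-1]))     # steps at left edges
    down = sigmoid(w * (x[:, None] - edges[None, 1:]))    # steps at right edges
    bumps = up - down                                     # shape (len(x), n_intervals)
    target = np.log(f(x) / (1 - f(x)))                    # sigma^{-1}(f(x))
    h, *_ = np.linalg.lstsq(bumps, target, rcond=None)    # least-squares bump heights
    prediction = sigmoid(bumps @ h)                       # apply the final sigmoid
    return h, np.mean((prediction - f(x)) ** 2)           # MSE against f(x)

x = np.linspace(0, 1, 1000)
for n in [5, 20, 50]:
    h, err = fit_bump_heights(n, x)
    print(f"{n:>2} intervals: MSE = {err:.6f}")
```

The loop shows the first suggestion too: with the fit automated, adding intervals (hidden neurons) costs nothing in manual effort.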