# Summary

The video series by 3Blue1Brown introduced you to the mathematical concepts behind how neural networks work. Let's now try to implement these calculations with TensorFlow (TF), a machine learning framework developed by Google. TensorFlow makes it easy to parallelize computation and to run it across CPUs, GPUs, and TPUs, the processing units that execute the commands from your programs. Many TensorFlow operations closely mirror NumPy operations, and many of the more verbose operations, such as one-hot encoding, are included in the library. Begin by importing NumPy and TensorFlow:

In [6]:
import numpy as np
import tensorflow as tf

# Neural Network Model

In its simplest form, a neural network consists of an input layer, a hidden layer, and an output layer. Each layer consists of a selected number of neurons (displayed as circles), which are connected to every neuron in its neighboring layers. The number of neurons in the hidden layer is ours to choose, so it is called a <b>hyperparameter</b>. Similarly, the number of hidden layers we add is another hyperparameter. Terminology and architecture will be discussed in more detail as we code the neural network with TensorFlow step by step.

<img src="Assignment1images/neural_network.jpg">
<center> Image source: https://www.nicolamanzini.com/single-hidden-layer-neural-network/ </center>

### Inputs & Outputs

Observe the imaginary dataset given below. Haha, a civil engineering example! Earthquakes rated on the Richter scale are classified as minor or severe based on casualties and cost of damage. Our inputs are numerical and their associated outputs are categorical. *fyi: I might edit this, but all the steps should be the same...

In [4]:
inputs = np.array([3, 3, 3, 4, 4, 6, 6, 6, 7, 7, 8, 8, 9, 9, 9])
labels = np.array(['minor', 'minor', 'minor', 'minor', 'minor', 'minor', 'severe', 'minor', 'minor', 'severe', 'severe', 'severe', 'severe', 'severe'])

TensorFlow allows us to define operations before actually computing them, which is useful if you want to run calculations in a specific order or even multiple times. TensorFlow creates a <b>dataflow graph</b> that maps out operation order and dependencies.

Placeholders are TF objects that are given information at the time of computation (when we "run the <b>session</b>"). Think about the shape of our inputs and outputs and construct TF placeholders for them (input placeholder as x and label placeholder as y):

In [None]:
x =
y = 
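The "define now, compute later" idea can be illustrated without TensorFlow. The sketch below is only a plain-Python analogy and not how TF is implemented: `Placeholder`, `Op`, and `run` are hypothetical names made up for this example. It builds a tiny graph for `x * 2 + 1` and then evaluates it twice with different fed-in values.

```python
# A toy "define first, run later" graph. All class and function names here
# are hypothetical stand-ins for illustration; they are not TensorFlow APIs.

class Placeholder:
    """A node with no value until one is fed in at run time."""
    def __init__(self, name):
        self.name = name

class Op:
    """A node that combines the values of its input nodes."""
    def __init__(self, fn, *parents):
        self.fn = fn
        self.parents = parents

def run(node, feed):
    """Evaluate a node by first evaluating everything it depends on."""
    if isinstance(node, Placeholder):
        return feed[node.name]
    if isinstance(node, Op):
        return node.fn(*(run(p, feed) for p in node.parents))
    return node  # a plain constant

# Build the graph: nothing is computed yet.
x_node = Placeholder("x")
doubled = Op(lambda a, b: a * b, x_node, 2)
result = Op(lambda a, b: a + b, doubled, 1)

# Run the same graph twice with different inputs.
first = run(result, {"x": 3})
second = run(result, {"x": 10})
print(first, second)
```

The dictionary passed to `run` plays a role loosely similar to the values you feed into placeholders when a session runs.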

Remember the discussion about one-hot encoding your categorical variables? TensorFlow makes this task extremely simple. Make y one-hot encoded:

In [None]:
y = 
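To see what one-hot encoding produces, here is a minimal NumPy sketch. It uses a short hypothetical label array (`example_labels`, not the dataset above) and is not the TF answer to the exercise. One thing to keep in mind: TF's `tf.one_hot` works on integer class indices, so string labels such as `'minor'` and `'severe'` need to be mapped to integers first, as the sketch does.

```python
import numpy as np

# Hypothetical labels for illustration only (not the dataset above).
example_labels = np.array(['minor', 'severe', 'minor', 'severe', 'severe'])

# Map each class name to an integer index: 'minor' -> 0, 'severe' -> 1.
classes, class_ids = np.unique(example_labels, return_inverse=True)

# Row i of the identity matrix is the one-hot vector for class i.
one_hot = np.eye(len(classes))[class_ids]
print(classes)
print(one_hot)
```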

### Connecting Input to Hidden Layer

Every neuron in the input layer is connected to every neuron in the hidden layer by a <b>linear classifier</b>. It is easiest to visualize a linear classifier as a line that spatially separates the data points. You should know from algebra that a line has the form $y = mx + b$.

<img src="Assignment1images/linearclassifier.png">

<center> Image source: http://mlpy.sourceforge.net/docs/3.2/lin_class.html </center>

In very simple terms, our hope is to solve for m (<b>weights</b>) and b (<b>biases</b>) so that when we plug in x (input), we get an output that provides us with an accurate prediction. The neural network is a bit more complex: notice that a single neuron in the hidden layer is connected to every neuron in the input layer. These connections are weighted and summed for each neuron in the hidden layer. The collection of equations is typically written in array & matrix form, but don't stress too much about this.
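If the matrix form feels abstract, the following NumPy sketch may help. The sizes and numbers are hypothetical (3 input neurons, 2 hidden neurons), picked only so the arithmetic is easy to follow. It computes one hidden neuron as an explicit weighted sum and then all hidden neurons at once with a matrix product.

```python
import numpy as np

# Hypothetical sizes and values, chosen only for illustration.
x = np.array([[0.5, -1.0, 2.0]])          # one sample with 3 input neurons
W = np.array([[0.1, 0.4],
              [0.2, 0.5],
              [0.3, 0.6]])                # shape [3 inputs, 2 hidden neurons]
b = np.array([0.01, 0.02])                # one bias per hidden neuron

# Hidden neuron 0, written out as an explicit weighted sum plus bias.
neuron0 = x[0, 0] * W[0, 0] + x[0, 1] * W[1, 0] + x[0, 2] * W[2, 0] + b[0]

# All hidden neurons at once, in matrix form.
hidden = x @ W + b

print(neuron0, hidden)
```

Column `j` of `W` holds the weights feeding hidden neuron `j`, which is why the weight matrix has shape [# neurons in previous layer, # neurons in next layer].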

Because weights and biases are updated during training, they should be defined as TF variables. Weights should be initialized with a truncated normal distribution with a reasonably selected standard deviation, and biases should be initialized as zeros. Shapes must be specified.
<ul>
<li> Weights: [# neurons in previous layer, # neurons in next layer], use standard deviation = 0.1 </li>
<li> Biases: [# neurons in next layer] </li>
</ul>

Let's say that the hidden layer has 10 neurons. Define the weights and biases with TF:

In [None]:
with tf.variable_scope('', reuse=tf.AUTO_REUSE): 
    w1 = 
    b1 = 
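For intuition about the initialization, here is a NumPy sketch of a truncated normal draw. TF's truncated normal re-draws any value that lands more than 2 standard deviations from the mean, and the hypothetical helper `truncated_normal` below mimics that behavior. The layer sizes assume a single input feature (the earthquake magnitude) and 10 hidden neurons; the names `w1_init` and `b1_init` are made up for this example.

```python
import numpy as np

def truncated_normal(shape, stddev, rng):
    """Draw normal samples, redrawing any that fall beyond 2 standard deviations."""
    values = rng.normal(0.0, stddev, size=shape)
    out_of_range = np.abs(values) > 2 * stddev
    while out_of_range.any():
        values[out_of_range] = rng.normal(0.0, stddev, size=out_of_range.sum())
        out_of_range = np.abs(values) > 2 * stddev
    return values

rng = np.random.default_rng(0)   # fixed seed, an arbitrary choice

n_inputs, n_hidden = 1, 10       # assumed sizes: one input feature, 10 hidden neurons
w1_init = truncated_normal((n_inputs, n_hidden), stddev=0.1, rng=rng)
b1_init = np.zeros(n_hidden)

print(w1_init.shape, b1_init.shape)
```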

Perform matrix multiplication on the inputs and weights, and add bias with TF:

In [None]:
hidden_layer =

### Hidden Layer

Notice that the hidden layer is connected to the output layer with more linear classifiers. Adding linear classifiers in series is essentially the same as applying one linear classifier. Thus, we need to add a nonlinearity after the first linear classifier. There are many <b>nonlinear activations</b>, such as sigmoid, tanh and ReLU. Sigmoid squashes its outputs to values between 0 and 1 and tanh to values between -1 and 1, while ReLU sets negative values to 0 and leaves positive values unchanged.

<img src="Assignment1images/nonlinearactivations.png">
<center> Image source: https://www.researchgate.net/figure/The-most-common-nonlinear-activation-functions-in-neural-networks_fig1_325694563 </center>
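The claim that linear classifiers in series collapse into one can be checked numerically. The NumPy sketch below uses random, hypothetical weights (3 inputs, 4 hidden neurons, 2 outputs) and compares two stacked linear layers against a single layer with combined weights `W1 @ W2` and combined bias `b1 @ W2 + b2`. It then shows that the match breaks once a sigmoid sits in between.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer sizes: 3 inputs -> 4 hidden -> 2 outputs.
W1, b1 = rng.normal(size=(3, 4)), rng.normal(size=4)
W2, b2 = rng.normal(size=(4, 2)), rng.normal(size=2)
x = rng.normal(size=(5, 3))   # 5 random samples

# Two linear layers applied in series, with no activation in between.
two_layers = (x @ W1 + b1) @ W2 + b2

# The same mapping as a single linear layer.
W_combined = W1 @ W2
b_combined = b1 @ W2 + b2
one_layer = x @ W_combined + b_combined

print(np.allclose(two_layers, one_layer))

# With a sigmoid in between, the single layer no longer matches.
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
with_activation = sigmoid(x @ W1 + b1) @ W2 + b2
print(np.allclose(with_activation, one_layer))
```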

Apply a sigmoid activation function to your previous answer:

In [None]:
hidden_layer = 
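To get a feel for how the three activations behave, this NumPy sketch applies each one to a few sample values. The helper names `sigmoid` and `relu` are defined here for illustration; they are not the TF functions you will use in the exercise.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(0.0, z)

z = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])

sig_out = sigmoid(z)    # stays strictly between 0 and 1
tanh_out = np.tanh(z)   # stays strictly between -1 and 1
relu_out = relu(z)      # negatives become 0, positives pass through

for name, out in [("sigmoid", sig_out), ("tanh", tanh_out), ("relu", relu_out)]:
    print(name, np.round(out, 3))
```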

### Connecting Hidden Layer to Output

Similarly, every neuron in the hidden layer is connected to every neuron in the output layer with a linear classifier. Connect the hidden layer to your output layer with the same procedure:

In [None]:
with tf.variable_scope('', reuse=tf.AUTO_REUSE): 
    w2 = 
    b2 = 
output = 

The activation at the output depends on the type of problem we are solving. Sigmoid is normally applied to a binary classification problem while softmax is applied to a multi-class classification problem. Apply a sigmoid activation to your output:

In [None]:
pred = 

### Loss Function & Optimization

A loss function allows our program to evaluate how close the prediction is to the actual label. Gradient descent will attempt to minimize this loss. Because our problem is a classification problem, a good loss function to use is softmax cross entropy with logits. Note that this loss applies softmax internally, so it expects the raw output (the logits) rather than an already-activated prediction. Define the loss function in TF:

In [None]:
loss = 
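Here is roughly what softmax cross entropy computes, written as a NumPy sketch with hypothetical numbers (3 samples, 2 classes). The function names `softmax` and `softmax_cross_entropy` are defined here for illustration. The loss takes raw scores (logits), turns them into probabilities with softmax, and then penalizes a low probability on the true class.

```python
import numpy as np

def softmax(logits):
    # Subtract the row maximum before exponentiating for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def softmax_cross_entropy(labels_one_hot, logits):
    probs = softmax(logits)
    # Per-sample loss: negative log of the probability given to the true class.
    return -(labels_one_hot * np.log(probs)).sum(axis=1)

# Hypothetical values: 3 samples, 2 classes.
logits = np.array([[2.0, 0.0], [0.0, 2.0], [0.0, 0.0]])
targets = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])

per_sample = softmax_cross_entropy(targets, logits)
mean_loss = per_sample.mean()
print(per_sample, mean_loss)
```

The first sample is confidently right and gets a small loss, the second is confidently wrong and gets a large one, and the third is undecided and lands in between.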

Defining an accuracy measure is useful for understanding the performance of our model. Think about what accuracy means in a classification problem and define it in TF ***: 

In [None]:
accuracy = 
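One common way to think about classification accuracy is the fraction of samples whose most probable predicted class matches the true class. The NumPy sketch below uses made-up predictions and one-hot labels for 4 samples to show the idea.

```python
import numpy as np

# Hypothetical predictions and one-hot labels for 4 samples, 2 classes.
pred_probs = np.array([[0.9, 0.1], [0.3, 0.7], [0.6, 0.4], [0.2, 0.8]])
true_one_hot = np.array([[1, 0], [0, 1], [0, 1], [0, 1]])

# The predicted class is the column with the highest probability.
predicted_class = pred_probs.argmax(axis=1)
true_class = true_one_hot.argmax(axis=1)

# Accuracy is the fraction of samples where the two agree.
accuracy_value = (predicted_class == true_class).mean()
print(accuracy_value)
```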

Inputs are fed into the model, and the resulting predictions are compared to the actual labels with the loss function. Backpropagation computes the gradient of the loss with respect to each weight and bias, and <b>gradient descent</b> is the optimization technique that uses those gradients to update the initial weights and biases. Use the Adam optimizer (a variant of gradient descent with adaptive learning rates) with a learning rate of 0.0001:

In [None]:
train_op = 
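The core update rule is easier to see on a toy problem. The sketch below runs plain gradient descent (not Adam) on a hypothetical one-parameter loss, `(w - 3)^2`, whose minimum is at `w = 3`. The learning rate here is deliberately larger than the one in the exercise so that the toy converges in a few steps.

```python
# Plain gradient descent on a hypothetical one-parameter loss, (w - 3)^2.
def loss_fn(w):
    return (w - 3.0) ** 2

def grad_fn(w):
    # Derivative of (w - 3)^2 with respect to w.
    return 2.0 * (w - 3.0)

w = 0.0                # arbitrary starting point
learning_rate = 0.1    # larger than the 0.0001 above so the toy converges quickly

for step in range(100):
    w = w - learning_rate * grad_fn(w)   # step against the gradient

print(w, loss_fn(w))
```

Adam follows the same idea of stepping against the gradient, but adapts the step size for each parameter as training goes on.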

### Training & Testing data

Our neural network model is now completely built! We will need to split the dataset into a training set and a testing set, which is essential for evaluating how the model generalizes to data it has never seen. We do not want the model to <b>overfit</b>, which means that it memorizes the training data instead of learning patterns that carry over to new data. A common split is 4:1 (train:test).

Prepare the dataset by randomly shuffling it and splitting it into a training set (call the inputs x_train and the outputs y_train) and a testing set (call the inputs x_test and the outputs y_test):
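One possible way to shuffle and split, sketched with NumPy on a hypothetical 10-sample dataset (`features` and `targets` are made-up stand-ins, not the earthquake data). The key detail is that inputs and labels are shuffled with the same permutation so that each input stays paired with its label.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed, an arbitrary choice

# Hypothetical dataset of 10 samples, for illustration only.
features = np.arange(10, dtype=float).reshape(-1, 1)
targets = (features[:, 0] >= 5).astype(int)

# Shuffle features and targets with the same permutation so pairs stay aligned.
order = rng.permutation(len(features))
features, targets = features[order], targets[order]

# 4:1 split: the first four fifths for training, the rest for testing.
n_train = len(features) * 4 // 5
x_train, y_train = features[:n_train], targets[:n_train]
x_test, y_test = features[n_train:], targets[n_train:]

print(x_train.shape, x_test.shape)
```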

### Train model

To start a calculation, we need to launch the graph in a TensorFlow session and initialize all global TF variables:

An <b>epoch</b> is one full pass of the whole training set through the model to update the weights and biases in backpropagation; the number of epochs is how many times this pass is repeated. Retraining the model on the reshuffled dataset helps to decrease loss and increase accuracy. To feed in the training set, use feed_dict from TF. We will need to fetch train_op as well as loss and accuracy for each epoch.

With the number of epochs set to 100, train the model and print out the loss and accuracy at each epoch ***:
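The overall shape of an epoch loop can be sketched without TF. The example below is an assumption-laden stand-in: it trains a tiny one-weight logistic model on hypothetical random data with plain mini-batch gradient descent, reshuffling every epoch and printing loss and accuracy after each one. The batch size, learning rate, and data are all made up for illustration; in the exercise, the session run with feed_dict takes the place of the manual update.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-D data: the label is 1 when the feature is positive.
features = rng.normal(size=(40, 1))
targets = (features[:, 0] > 0).astype(float)

w, b = 0.0, 0.0
learning_rate, batch_size, n_epochs = 0.5, 8, 100
history = []

for epoch in range(n_epochs):
    order = rng.permutation(len(features))      # reshuffle every epoch
    for start in range(0, len(features), batch_size):
        idx = order[start:start + batch_size]
        xb, yb = features[idx, 0], targets[idx]
        p = 1.0 / (1.0 + np.exp(-(w * xb + b)))  # sigmoid prediction
        # Gradient of the mean cross entropy loss with respect to w and b.
        w -= learning_rate * np.mean((p - yb) * xb)
        b -= learning_rate * np.mean(p - yb)

    # Report loss and accuracy on the whole training set after each epoch.
    p_all = 1.0 / (1.0 + np.exp(-(w * features[:, 0] + b)))
    p_all = np.clip(p_all, 1e-12, 1 - 1e-12)     # avoid log(0)
    loss = -np.mean(targets * np.log(p_all) + (1 - targets) * np.log(1 - p_all))
    acc = np.mean((p_all > 0.5) == (targets == 1))
    history.append((loss, acc))
    print(epoch, round(loss, 4), round(acc, 3))
```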

### Evaluate model

Now that our model is trained, feed in the test set and print out the accuracy:

# Begin Project 0

Pretty cool, right? You just coded a simple neural network for classifying earthquakes. You now have the skills to start constructing neural networks to classify datasets of your choice. Find or construct a dataset, and try building the neural network on your own in a text editor/IDE (Atom, Brackets, Visual Studio, etc.). Depending on the dataset, you may run into difficulties, but don't worry, we'll discuss common problems next and add complexities to our models!