# Introduction to Using Neural Networks  


Hello! 

If you're here, you are probably a member of the Hamilton lab and want (or need) to figure out how to use neural networks for your projects. Great! This notebook should give you everything you need to get started. It is far from a comprehensive manual on machine learning, but it provides a full working example of how to code up an input pipeline and how to define and run a model, along with what's required to do so.   

This notebook will specifically work on basic feed forward networks, sometimes called multi-layer perceptrons (MLPs) or artificial neural networks (ANNs), applied to supervised learning (classification) tasks. There are other types of networks, like convolutional and recurrent networks, which perform different operations. I'll cover those in a different notebook, as they differ in some pretty substantial ways.  Supervised learning is certainly not the only kind of task you can perform with ANNs, but it forms a good basis for decoding, encoding, EEG signal and stimuli analysis, and other applications for lab projects.

___
## Getting Started 
### Requirements 

Everything we'll be doing requires Python 3.6 and some extra modules you may not have.  
To work with these examples, you're going to need the following modules on your machine:

* __Numpy__ - This is Python's array and linear algebra package.
* __Tensorflow__ (`tf`) - Google's machine learning library (Extensive and powerful, but takes some time to master).
* __Keras__ - High level neural net library that uses Tensorflow backend (Not as flexible as `tf`, but far easier to get up and running).



### Installation 
If you don't have these, they can be installed several ways.

I'd recommend getting Anaconda and creating a conda environment with these packages. Anaconda is a Python distribution, with its own package manager (conda), that is useful for scientific computing.  
_More on getting [Anaconda](https://www.anaconda.com/distribution/)._

It is not necessary to install both Keras and Tensorflow for this notebook, as the tensorflow module `tf.keras` provides the same API as standalone `keras`. I'm including both so you know that they're two ways of calling the same stuff. Once you get the hang of Keras, you can move on to, or even combine it with, regular tensorflow if you want to do something really fancy! 

##### After you install Anaconda, you'll run something like this in your terminal:  
`conda create -n tensorflow_env tensorflow`  
   
Here, `tensorflow_env` is the name of the environment, and `tensorflow` is the primary package being installed.  
_You can use any name you want for your environment._

_For more on installing tensorflow with anaconda, follow the instructions [here](https://www.anaconda.com/tensorflow-in-anaconda/)._ 

__Note:__ You may have to up- or down-grade the version. As of this writing, the stable versions are 1.12 or 1.13, but 2.0 has been rolled out. This isn't too big of an issue as we'll be using Keras anyway, and we only need a compatible version of tensorflow to work with Keras.  

##### To activate your environment, run:  
`conda activate tensorflow_env`
  
You'll want this environment activated before you install Keras.
When active, your terminal will look something like:  
    
`(my_env_name) userid:/cur/open/path$` 
    
##### Then set the correct python version:  

`conda install python=3.6`  

_*You can use pip instead of conda here_
    
##### Now you'll want to install Keras with pip:  
    
`pip install keras`
    
_*Make sure pip is up to date!!!_

For everything you'll ever need on Keras, check out [this](https://keras.io/) resource.



*** 

### Check to see if the modules work

In [3]:
import tensorflow as tf
import numpy as np
import keras as K 

***
## Using Keras 

### First things First 

Ok, so we've gotten our stuff set up. Now what?  

Well, we need to know what a neural network is and how to use it. This isn't going to be a thorough description of all the math and moving parts behind a network, just a high level take on what's happening, with some background and usable code examples that can be modified and applied to your projects. So let's start with what a network is, and how to use it!




### Defining Networks and Layers 
Feed forward artificial neural networks (ANNs) stem from the way real neurons pass on information:  
- receive input from the previous neuron(s).  
- determine if input exceeds firing threshold via some activation function.
- pass on supra-threshold information to the next neuron(s). 
- weights between neurons are determined via Hebbian learning.
    - neurons with correlated firing strengthen their connections
    - possible way of preserving/encoding memory in neural circuits. 

In real neurons, information is electrical current. The activation function is the difference between the summation of postsynaptic potentials and the neuron's threshold potential _(see [Kandel's book](https://neurology.mhmedical.com/content.aspx?bookid=1049&sectionid=59138629) for more)_. The physiological implementation of Hebbian learning can be seen as long and short term potentiation or depression (_[more](https://neurology.mhmedical.com/content.aspx?bookid=1049&sectionid=59138710) from Kandel_) in a neural circuit, altering the dynamics of how neurons interact with each other (i.e., firing rates, synaptic strength).

In artificial neural networks, information is encoded as numerical values and we use mathematical functions to determine activation. ANNs can be described as a series of layers that feed forward from an input layer, through an arbitrary number of hidden layers, to a final output layer, wherein each successive layer receives activated information from the previous layer as input.


### What this actually means for us.  
Each layer is actually made up of a few components: the input, weights, biases, and the activation function.  

At a high level, the layer weights are represented as a matrix $W$ that is multiplied with the input. Each element at position $i,j$ in $W$ represents how important input feature $i$ is to neuron $j$ in determining the correct output. Similarly, the biases are represented as a vector, usually denoted $b$, that is added after the input-by-weight multiplication to further shape how each part of the input affects the output.  Finally, after this multiplication and addition, the layer's activation function is applied to give us an output.  

In terms of linear algebra, we get something looking like this:
$$ H_i = \sigma(H_{i-1}W + b)$$

   where $\sigma$ represents our activation function, $H_i$ is our current layer, and $H_{i-1}$ is our previous layer (or input if this is the first layer). 
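
As a minimal sketch of the equation above, here is one layer's forward pass written in plain NumPy. The sizes, the random weights, and the choice of ReLU for $\sigma$ are assumptions made purely for illustration; Keras does this bookkeeping for you.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    # ReLU activation: negative values become 0, positive values pass through.
    return np.maximum(x, 0.0)

# Illustrative sizes (assumptions): 5 samples, 3 input features, 4 neurons.
H_prev = rng.normal(size=(5, 3))   # previous layer output (or the raw input)
W = rng.normal(size=(3, 4))        # one column of weights per neuron
b = np.zeros(4)                    # one bias per neuron

# H_i = sigma(H_{i-1} W + b)
H = relu(H_prev @ W + b)

print(H.shape)  # (5, 4): one row per sample, one column per neuron
```

Each row of `H` is what the next layer would receive as input for that sample.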

#### What this looks like in practice:
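When we define a layer, we're not creating an individual neuron, but a layer of $k$ neurons that each perform the math above.  
We do this like so: 
```
from keras.layers import Dense 

# Placeholder values: swap in whatever suits your project.
K_neurons = 32              # number of neurons in the layer
chosen_activation = 'relu'  # name of the activation function

layer1 = Dense(units=K_neurons, activation=chosen_activation)
```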
`from keras.layers` becomes `from tensorflow.keras.layers` if using tensorflow instead of Keras directly.  
_* This would be the only change. Nice, right?_


Seems legit, but what do we do for a network? What really is a network? 

### Defining a Network for Classification

So we've said a network is an interconnected series of these layers that allow information to pass from the input layer all the way through to the output. In a few words, an ANN is a way of classifying an input by finding what parts of that input are meaningful in determining class membership, where each layer only passes on meaningful information to the next.  

But more specifically, the ANN/MLP framework learns how to categorize which arbitrary features within an input sample are meaningful representations of a higher level class membership. It does so by passing on the activated output of one layer to the next, where the final layer computes a probability distribution based on the pattern of layer activations it receives. This distribution is then used to assign each input sample to the class it has the highest probability of belonging to. 
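
To make that last step concrete, here is a small NumPy sketch of how a final layer's raw values can be turned into a probability distribution (using softmax, the activation used for the output layer later in this notebook) and then into a class assignment. The scores are made-up numbers, not output from a real model.

```python
import numpy as np

def softmax(z):
    # Subtract the row-wise max for numerical stability, then normalize
    # so that each row sums to 1.
    shifted = z - z.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

# Made-up final-layer values for 2 samples and 4 classes.
scores = np.array([[2.0, 1.0, 0.1, -1.0],
                   [0.0, 0.0, 3.0, 0.0]])

probs = softmax(scores)            # one probability distribution per sample
predicted = probs.argmax(axis=1)   # index of the most probable class

print(predicted)  # [0 2]
```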


If this sounds complicated, don't worry. In practice, a network is just a stack of these Keras layers we described earlier. I'll get a bit more into the details of how to train a model to perform classification, but first, let's look at what coding a model looks like.  It really can be easy! 

_(There will be more detailed notebooks in the future, but you can check out_ [wiki](https://en.wikipedia.org/wiki/Artificial_neural_network), _[the Keras docs](https://keras.io/getting-started/sequential-model-guide/#examples), and [work by Geoffrey Hinton](https://scholar.google.com/citations?user=JicYPdAAAAAJ&hl=en&oi=sra) for deeper explanations of ANNs)_


### Brief Model Code

The following cell is going to create a four layer feed forward network. I'll explain what's going on below the code, but just take a look at it for now.  

In [2]:
from keras.models import Sequential
from keras.layers import Dense 

# This sets up a keras class for a feed forward (sequential) model.
model = Sequential()

model.add(Dense(units=32, activation='relu', input_shape=(32,)))
model.add(Dense(units=64, activation='relu'))
model.add(Dense(units=64, activation='relu'))
model.add(Dense(units=4, activation='softmax'))


Just like that, your model exists and has all the necessary components to run! Well almost, we'll need to do just a few more things to get it going, but let me explain some of the stuff above.

Assigning `model = Sequential()` instantiates the Keras Sequential model class. This is an empty container that lets you define models layer by layer. It has many useful class methods and attributes that'll make building, running, and debugging models pretty easy.

After this, we can use the `.add` method to add layers of our choosing to the model. Calls to `.add` stack layers in the sequential order in which `.add` is called (hence the class name `Sequential`).  

`Dense` is the specific type of layer used: a fully connected layer, where every neuron receives every value from the previous layer. It expects `2d` input, with rows holding individual samples and columns holding the meaningful info (features) for each sample.  You may use other layers than `Dense` later, but they follow the same gist.

Here's what each of the arguments passed to `Dense` does.

`units`: This is the number of neurons used in this layer.
   - It's pretty typical (but not necessary) to start with a lower power of two, and jump to the next power with the next layer.  
   - Alternatively, start with the number of (or a multiple of) feature channels in your input. 
   - Determining the number of units is largely experimental, so start somewhat small and test what works!

`activation`: this is the specific activation function used. More about these later on.

`input_shape`: defines the number of ___feature columns___ in your input data. The full input has shape `(batch_size, features)`, where `batch_size` is the number of samples passed in at a given step and `features` is the number of descriptive elements (time points, attributes, etc.) present in a sample. `input_shape` leaves out the batch dimension, which is why it is `(32,)` in the code above. 
- `input_shape` is only needed in the first layer. Once it has this, Keras takes care of handling all the other resizing for you! 
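
As a rough sketch of that resizing, the snippet below works out how many weights and biases each `Dense` layer in the model above needs, given that every layer's input size is the previous layer's `units`. The helper `dense_param_count` is a hypothetical name used only here; calling `model.summary()` on the model should report the same per-layer counts.

```python
# (inputs, units) for each Dense layer in the model above.
layer_sizes = [(32, 32), (32, 64), (64, 64), (64, 4)]

def dense_param_count(n_inputs, n_units):
    # One weight per input-neuron pair, plus one bias per neuron.
    return n_inputs * n_units + n_units

counts = [dense_param_count(i, u) for i, u in layer_sizes]
print(counts)       # [1056, 2112, 4160, 260]
print(sum(counts))  # 7588
```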

Now that we have the code for the model, we just have to specify what math it's going to use when performing a classification task.

***

## Supervised Learning, and other fancy names associated with Classification

### What is Supervised Learning?

___Supervised learning___ is the machine learning world's name for learning from labeled data, which covers almost any kind of *classification task*.  This means that we are trying to get an ANN to distinguish different features or categories present within a dataset, while we already know which labels (also called ground truth) belong to each sample in the set. This is an __iterative process__ in which the network guesses the labels for a batch of samples in discrete steps. At every step, guesses are made and scored. The 'supervised' component comes into play because while we train the model (the learning stage), the labels are used as an answer key at each iteration to teach the model how to perform better. 

#### Uh.. OK. How do we do this? 
### Training-Lite
The way a model "learns" is by passing in a bunch of data points at a time, correcting the model's guess as to what those data points were, and then repeating this process a bunch of times. Before I jump into a training script, we need to talk about how this actually works.
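
The loop described above can be sketched in NumPy as follows. The dataset is random and the training step itself is left as a comment, so this only illustrates how "a bunch of data points at a time" (a batch) and "a bunch of times" (epochs) fit together; the sizes are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up dataset (assumption): 100 samples with 32 features each.
data = rng.normal(size=(100, 32))
labels = rng.integers(0, 4, size=100)

epochs = 2        # how many times to go through the whole dataset
batch_size = 32   # how many samples the model sees per training step

steps = 0
for epoch in range(epochs):
    # Shuffle once per epoch so batches differ between passes.
    order = rng.permutation(len(data))
    for start in range(0, len(data), batch_size):
        idx = order[start:start + batch_size]
        batch_x, batch_y = data[idx], labels[idx]
        # A real training step would go here: guess labels for batch_x,
        # score the guesses against batch_y, and correct the model.
        steps += 1

print(steps)  # 8: four batches per epoch (32 + 32 + 32 + 4 samples)
```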

### Those Fancy Names

So remember how our final model layer computes a probability distribution, and uses that to pick the most likely class for a sample? Well, we need a way of correcting this process and teaching the model how to do a better job.  That correction takes the form of what's called an ***objective function***, which is used while training the model. Keeping with our high level take, the objective function evaluates how well the model is performing by comparing its guesses to the corresponding labels. 

For most classification tasks, [categorical cross entropy](https://en.wikipedia.org/wiki/Cross_entropy) (sometimes called softmax loss) is used as the ___objective function.___ Basically, it compares the guessed distribution to the label distribution and calculates how much information from the guesses is needed to correctly predict labels, at each step. This comparison returns a ___loss value___ which quantifies the amount of info needed for this comparison (low is good, high is bad). ___Loss___ is the primary metric for model evaluation, and is the metric used by the following concepts to implement network performance ___optimization___.  
___*Note___ _the objective function is also called the loss or cost function._  
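
Here is a small NumPy sketch of categorical cross entropy, assuming one-hot labels and predicted probabilities whose rows sum to 1. The guesses are made-up numbers chosen to show that confident, correct guesses give a low loss and wrong guesses give a high one.

```python
import numpy as np

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    # y_true: one-hot labels, y_pred: predicted probabilities (rows sum to 1).
    # Clip to avoid log(0), then average the per-sample losses.
    y_pred = np.clip(y_pred, eps, 1.0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

# Two samples, three classes; the true classes are 0 and 2.
y_true = np.array([[1, 0, 0],
                   [0, 0, 1]])

good_guess = np.array([[0.9, 0.05, 0.05],
                       [0.1, 0.1, 0.8]])
bad_guess = np.array([[0.2, 0.5, 0.3],
                      [0.6, 0.3, 0.1]])

print(categorical_cross_entropy(y_true, good_guess))  # roughly 0.16
print(categorical_cross_entropy(y_true, bad_guess))   # roughly 1.96
```
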
###### We'll be keeping a sky high level for these following concepts, as you won't need the nuts and bolts to use the code. However, I've included links to more thorough explanations that I'd recommend going through at some point.
#### Optimization
Loosely, optimization is how a network learns during the training procedure. It is the process of correcting a network to maximize performance.  It does this by using an ___Optimization Algorithm___ that informs how changes to the weights and biases of model layers should be made, and how those changes affect performance. This is implemented by some form of ___Stochastic Gradient Descent___ and ___Backpropagation___.  These are pretty thorny concepts, so I'll briefly talk about what is essential and include links to more detailed explanations.  
##### Optimization with [Gradient Descent](https://ml-cheatsheet.readthedocs.io/en/latest/gradient_descent.html)

There are [several](https://ml-cheatsheet.readthedocs.io/en/latest/gradient_descent.htmlA) specific algorithms that perform gradient descent with their own flavor, but they all essentially do the same thing: find the weights and biases that give the lowest value of the objective function.  There are tons of ways to think about this; the most common is to picture the loss as a three dimensional surface, like a rough hill, where height represents the loss in classification performance. The gradient points in the direction in which the hill climbs most steeply, so the algorithm iteratively steps in the opposite direction to find the lowest point ([not my metaphor](https://ml-cheatsheet.readthedocs.io/en/latest/gradient_descent.html)).  
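
The hill metaphor can be sketched in a few lines of Python. The "hill" here is a made-up one-dimensional bowl with its lowest point at `w = 3`, and the starting point and learning rate are arbitrary assumptions; a real network does the same thing over all of its weights and biases at once.

```python
# Made-up bowl-shaped "loss" with its lowest point at w = 3.
def loss(w):
    return (w - 3.0) ** 2

def gradient(w):
    # Derivative of the loss; it points uphill.
    return 2.0 * (w - 3.0)

w = -4.0             # arbitrary starting point on the hill
learning_rate = 0.1  # step size (assumption)

for step in range(100):
    # Step in the opposite direction of the gradient (downhill).
    w = w - learning_rate * gradient(w)

print(round(w, 4))  # 3.0
```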

##### [Backpropagation](https://skymind.ai/wiki/backpropagation) (Backprop for short)

___Backpropagation___ is the process of _how_ the information from the gradient descent process affects the network. Because networks are trained iteratively, information from a sample is passed forward to make a guess at each step, but the resulting performance of that guess can be passed backwards as well in the same step. This backwards pass works out how much each weight and bias contributed to a given training step's error or ___loss value___, and is used to alter the weights and biases of all layers in the network accordingly. Loosely, if the change made at the previous training step increases loss, the __optimization algorithm__ changes direction (literally) with respect to the gradient, and backprops that change through the network.  If the change decreases loss, we keep moving downhill.
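
For the curious, here is a minimal NumPy sketch of a forward pass, a backward pass, and a gradient descent update on a tiny two-layer network. The data, layer sizes, and learning rate are all made up, and this is only meant to show the idea; it is not how Keras implements it internally.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up data (assumption): 16 samples, 3 features, 2 classes.
X = rng.normal(size=(16, 3))
y = (X[:, 0] > 0).astype(int)
Y = np.eye(2)[y]                       # one-hot labels

# Tiny network: 3 inputs -> 5 hidden (ReLU) -> 2 outputs (softmax).
W1, b1 = rng.normal(scale=0.5, size=(3, 5)), np.zeros(5)
W2, b2 = rng.normal(scale=0.5, size=(5, 2)), np.zeros(2)
learning_rate = 0.1

losses = []
for step in range(200):
    # Forward pass: make a guess.
    Z1 = X @ W1 + b1
    H1 = np.maximum(Z1, 0.0)
    Z2 = H1 @ W2 + b2
    Z2 = Z2 - Z2.max(axis=1, keepdims=True)   # numerical stability
    P = np.exp(Z2) / np.exp(Z2).sum(axis=1, keepdims=True)
    losses.append(float(-np.mean(np.sum(Y * np.log(P + 1e-12), axis=1))))

    # Backward pass: push the error from the output back through each layer.
    dZ2 = (P - Y) / len(X)             # gradient at the output layer
    dW2, db2 = H1.T @ dZ2, dZ2.sum(axis=0)
    dH1 = dZ2 @ W2.T                   # error passed back to the hidden layer
    dZ1 = dH1 * (Z1 > 0)               # ReLU passes gradient only where active
    dW1, db1 = X.T @ dZ1, dZ1.sum(axis=0)

    # Gradient descent update: move every parameter a small step downhill.
    W1 -= learning_rate * dW1
    b1 -= learning_rate * db1
    W2 -= learning_rate * dW2
    b2 -= learning_rate * db2

print(f"first loss: {losses[0]:.3f}, last loss: {losses[-1]:.3f}")
```

The loss recorded at the last step should be lower than at the first, which is the "moving downhill" described above.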



***
## All Together Now

We've jumped through a ton of ideas and concepts (if you did in fact read that absolute wall of text).
Luckily, with Keras, coding this is a breeze. Let's look at the code needed to actually do this stuff! 

### Pieces of the .py


In [None]:
"""The commented out lines are not needed if you've run the cell above.
   They're just here for reference """
# from keras.models import Sequential
# from keras.layers import Dense 

# # This sets up a keras class for a feed forward (sequential) model.
# model = Sequential()

# model.add(Dense(units=32, activation='relu', input_shape=(32,)))
# model.add(Dense(units=64, activation='relu'))
# model.add(Dense(units=64, activation='relu'))
# model.add(Dense(units=4, activation='softmax'))

# Pick your objective/loss function, optimizer, and what metrics to report.
# The loss is always reported, so only extra metrics are listed here.
model.compile(loss='categorical_crossentropy',
              optimizer='sgd',
              metrics=['accuracy'])

# to train: 
# training data, training labels, how many times to go through the data,
# and how many datapoints are shown per training step

model.fit(train_data, train_labels, epochs=5, batch_size=32)

# to evaluate:
# test data, test labels, how many samples at a time (use as many as you can without crashing)
loss_and_metrics = model.evaluate(x_test, y_test, batch_size=128)
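
The cell above uses `train_data`, `train_labels`, `x_test`, and `y_test` without defining them. As a stand-in, here is a NumPy sketch that builds random arrays with the shapes the model expects: 32 feature columns to match `input_shape=(32,)`, and one-hot labels with 4 columns, since `categorical_crossentropy` expects one-hot labels and the final layer has 4 units. The helper `make_fake_split` and the sample counts are made up for illustration, and the data carries no real signal.

```python
import numpy as np

rng = np.random.default_rng(0)

# Match input_shape=(32,) and the 4-unit softmax layer of the model above.
n_features, n_classes = 32, 4

def make_fake_split(n_samples):
    # Random features and random integer class ids (made up, no real signal).
    data = rng.normal(size=(n_samples, n_features)).astype("float32")
    class_ids = rng.integers(0, n_classes, size=n_samples)
    # One-hot encode: row i has a 1 in the column of sample i's class.
    labels = np.eye(n_classes, dtype="float32")[class_ids]
    return data, labels

train_data, train_labels = make_fake_split(1000)
x_test, y_test = make_fake_split(200)

print(train_data.shape, train_labels.shape)  # (1000, 32) (1000, 4)
```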


That's actually all there is to coding a model with Keras! 

An actual use case may have different layers and such, but let's go through an example of what running this will look like.



***

## Put It To Use

Now you know what we need for a neural network. But what about applications?

Let's look at an example where we're trying to tell apart the good from the bad. We'll make up a dataset in which 