
# Documentation
[Keras](https://keras.io/)

[spaCy](https://spacy.io/api/doc)

[NLTK](https://www.nltk.org/)

# Definitions
**Input Layer (Visible Layer)** - This layer is composed of artificial input neurons and brings the initial data into the system for further processing by subsequent layers of artificial neurons. The input layer is the very beginning of the workflow.

**Hidden Layer** - A layer in between the input layer and output layer, where artificial neurons take in a set of weighted inputs and produce an output through an activation function.

**Output Layer** - The output layer produces a vector of values in a format suited to the problem being addressed. Typically these values are passed through an activation function that transforms them into a form that makes sense for that problem (for example, class probabilities in a classification task).
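
A minimal sketch of two common output-layer activations, assuming hypothetical raw output scores (logits): a sigmoid for a single binary-classification output and a softmax for a multi-class output. A regression output would typically use no activation (the identity). The helper names `sigmoid` and `softmax` are illustrative, not from a specific library.

```python
import numpy as np

def sigmoid(z):
    # Squashes each value into (0, 1): a probability for the positive class
    return 1 / (1 + np.exp(-z))

def softmax(z):
    # Turns a vector of scores into probabilities that sum to 1.
    # Subtracting the max first avoids overflow in np.exp without changing the result.
    e = np.exp(z - np.max(z))
    return e / e.sum()

# Hypothetical raw outputs (logits) of an output layer
binary_logit = 0.0
class_logits = np.array([2.0, 1.0, 0.1])

print(sigmoid(binary_logit))   # → 0.5
print(softmax(class_logits))   # one probability per class, largest for the largest score
```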

**Neuron** - The elementary unit in an artificial neural network. The neuron receives one or more inputs and sums them to produce an output (or activation). Usually each input is separately weighted, and the sum is passed through a non-linear function known as an activation function or transfer function. Classic transfer functions have a sigmoid shape, but other non-linear functions such as tanh and ReLU are also common.
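
The computation a single neuron performs can be sketched in a few lines, assuming a sigmoid activation and a bias term (the input, weight, and bias values are hypothetical, chosen so the weighted sum is exactly zero):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def neuron(inputs, weights, bias):
    # 1. Weight each input and sum the results, plus a bias term
    z = np.dot(inputs, weights) + bias
    # 2. Pass the sum through a non-linear activation function
    return sigmoid(z)

# Hypothetical example values
x = np.array([1.0, 2.0, 3.0])
w = np.array([0.5, -1.0, 0.25])
b = 0.75

print(neuron(x, w, b))   # 0.5 - 2.0 + 0.75 + 0.75 = 0.0, so sigmoid(0.0) → 0.5
```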

**Weight** - The strength or amplitude of a connection between two nodes. Weights play a role similar to the slope (coefficient) in linear regression: each input is multiplied by its weight, and the products are summed to form the output. Weights are numerical parameters that determine how strongly each neuron affects the others.

**Activation Function** - The function that determines a node's output from its input or set of inputs. The node calculates the weighted sum of its inputs, adds a bias, and passes the result through the activation function, which decides whether (and how strongly) the neuron is activated.
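
A short comparison of three widely used activation functions, applied to hypothetical weighted sums (bias already added); the function names are illustrative helpers written with NumPy:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))   # output in (0, 1)

def tanh(z):
    return np.tanh(z)             # output in (-1, 1)

def relu(z):
    return np.maximum(0, z)       # 0 for negative inputs, identity otherwise

z = np.array([-2.0, 0.0, 2.0])   # hypothetical weighted sums
for name, f in [("sigmoid", sigmoid), ("tanh", tanh), ("relu", relu)]:
    print(name, f(z))
```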

**Node Map** - A visual diagram of the architecture (topology) of a neural network. Like a flow chart, it shows the path from inputs to outputs. Node maps are usually color coded and help show, at a very high level, some of the architectural differences between kinds of neural networks.

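**Perceptron** - The simplest neural network: a single node (neuron) and nothing else. It can take any number of inputs and produces a single output. A single perceptron can only learn a linear decision boundary between classes, so it cannot solve problems such as XOR that are not linearly separable.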
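
A minimal sketch of the classic perceptron learning rule (step activation, weights updated only on mistakes), using the hypothetical helpers `train_perceptron` and `predict`. It learns AND, which is linearly separable, but no setting of its weights can reproduce XOR:

```python
import numpy as np

def train_perceptron(X, y, epochs=20, lr=1.0):
    # Classic perceptron learning rule with a step activation (illustrative helper)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, target in zip(X, y):
            prediction = 1 if np.dot(xi, w) + b > 0 else 0
            # update is 0 when the prediction is already correct
            update = lr * (target - prediction)
            w += update * xi
            b += update
    return w, b

def predict(X, w, b):
    return (X @ w + b > 0).astype(int)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y_and = np.array([0, 0, 0, 1])   # linearly separable
y_xor = np.array([0, 1, 1, 0])   # NOT linearly separable

w_and, b_and = train_perceptron(X, y_and)
w_xor, b_xor = train_perceptron(X, y_xor)
print("AND:", predict(X, w_and, b_and))   # matches y_and
print("XOR:", predict(X, w_xor, b_xor))   # cannot match y_xor
```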

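**Feedforward Neural Network** - A neural network in which signals flow only forward, from the input layer through the hidden layers to the output layer, with no loops. It is made up of multiple perceptron-like neurons and has at least one hidden layer (the input and output layers do not count as hidden layers).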
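
A forward pass through a feedforward network can be sketched as a loop over layers, each computing a weighted sum plus bias followed by an activation. The architecture (4 inputs, hidden layers of 5 and 3 nodes, 1 output), the random weights, and the ReLU/sigmoid choice are all assumptions for illustration:

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

rng = np.random.default_rng(0)

# Hypothetical architecture: 4 inputs -> 5 hidden -> 3 hidden -> 1 output
layer_sizes = [4, 5, 3, 1]
params = [(rng.normal(size=(n_in, n_out)), np.zeros(n_out))
          for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(X, params):
    a = X
    # Hidden layers: weighted sum plus bias, then ReLU
    for W, b in params[:-1]:
        a = relu(a @ W + b)
    # Output layer: sigmoid, as for a binary-classification output
    W, b = params[-1]
    return sigmoid(a @ W + b)

X = rng.normal(size=(2, 4))        # a batch of 2 hypothetical samples
print(forward(X, params).shape)    # → (2, 1)
```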

**Epoch** - One complete pass through the entire training dataset. During each epoch the data is passed forward through the network, the error is measured with the specified cost function, and the weights are updated via gradient descent (once per batch, so an epoch usually contains many updates) to improve the quality of the predictions on the next pass.
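
A minimal sketch of how epochs relate to individual weight updates, assuming mini-batch gradient descent on a hypothetical noiseless linear-regression dataset (a simple model chosen for brevity, not a neural network). With 100 samples and a batch size of 32, each epoch visits every sample once and performs ceil(100 / 32) = 4 updates:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dataset: 100 samples, 3 features, linear target
X = rng.normal(size=(100, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w

w = np.zeros(3)
lr = 0.1
batch_size = 32
n_epochs = 50
updates = 0

for epoch in range(n_epochs):
    # Shuffle once per epoch, then visit every sample exactly once
    order = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        Xb, yb = X[idx], y[idx]
        # Gradient of the mean squared error for a linear model
        grad = 2 * Xb.T @ (Xb @ w - yb) / len(Xb)
        w -= lr * grad
        updates += 1

print(updates)   # → 200 (50 epochs * 4 updates per epoch)
print(w)         # should be close to true_w
```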

**Backpropagation** - A neural network propagates the signal of the input data forward through its parameters towards the moment of decision, and then backpropagates information about the error, in reverse through the network, so that it can alter the parameters. This happens step by step:
- The network makes a guess about data, using its parameters
- The network’s error is measured with a loss function
- The error is backpropagated to adjust the wrong-headed parameters
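
The three steps above can be sketched for a single sigmoid neuron with squared-error loss (a simplified, hypothetical setup rather than a full network; no bias for brevity). The chain rule gives the gradient analytically, a finite-difference check confirms it, and one gradient-descent step lowers the loss:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def loss(w, x, y):
    # Squared error of a single sigmoid neuron
    return 0.5 * (sigmoid(x @ w) - y) ** 2

def backprop_grad(w, x, y):
    # Forward pass, keeping the intermediate values
    z = x @ w
    a = sigmoid(z)
    # Backward pass: chain rule from the loss back to the weights
    dL_da = a - y          # d(0.5 * (a - y)^2) / da
    da_dz = a * (1 - a)    # sigmoid'(z), written in terms of its output a
    dz_dw = x              # d(x . w) / dw
    return dL_da * da_dz * dz_dw

def numerical_grad(w, x, y, eps=1e-6):
    # Central finite differences, one weight at a time
    grad = np.zeros_like(w)
    for i in range(len(w)):
        step = np.zeros_like(w)
        step[i] = eps
        grad[i] = (loss(w + step, x, y) - loss(w - step, x, y)) / (2 * eps)
    return grad

x = np.array([0.5, -1.0, 2.0])   # hypothetical input
w = np.array([0.1, 0.2, -0.3])   # hypothetical starting weights
y = 1.0                          # hypothetical target

print(backprop_grad(w, x, y))
print(numerical_grad(w, x, y))   # should agree to many decimal places

# Gradient descent step: move each weight against its gradient
w_new = w - 0.5 * backprop_grad(w, x, y)
print(loss(w, x, y), loss(w_new, x, y))   # the loss should decrease
```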


In [0]:
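# simple perceptron (a single sigmoid neuron trained with gradient descent)
import numpy as np

# Example training data (assumed here so the cell runs on its own):
# 4 samples with 3 features each; the label equals the first feature
inputs = np.array([[0, 0, 1],
                   [1, 1, 1],
                   [1, 0, 1],
                   [0, 1, 1]])
ground_truth = np.array([[0], [1], [1], [0]])

# define the sigmoid function to use as our activation function
def sigmoid(x):
  return 1/(1+np.exp(-x))

# derivative of the sigmoid: the slope of the curve at a given point x
def sigmoid_derivative(x):
  sx = sigmoid(x)
  return sx * (1-sx)

# random starting weights, one per input feature
weights = np.random.random((3,1))

for iteration in range(100000):
  # Weighted sum of inputs/weights
  weighted_sum = np.dot(inputs, weights)
  # Activate - where we are on the sigmoid graph
  activated_output = sigmoid(weighted_sum)
  # Calculate error
  error = ground_truth - activated_output
  # Adjust up or down by error amount, scaled by the slope at our current position
  # (the derivative is evaluated at the weighted sum, not the already-activated output)
  adjustments = error * sigmoid_derivative(weighted_sum)
  # Calculate new weights based on our adjustments
  weights += np.dot(inputs.T, adjustments)

activated_output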

In [0]:
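# Simple NN with Backpropagation
import numpy as np

# Example data (assumed here so the cell runs on its own):
# 3 samples with 2 features each and one target per sample,
# scaled to the 0-1 range to match the sigmoid output
X = np.array([[2, 9], [1, 5], [3, 6]], dtype=float)
y = np.array([[92], [86], [89]], dtype=float)
X = X / np.amax(X, axis=0)
y = y / 100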

class NeuralNetwork:
    def __init__(self):
        #set up architecture of NN
        self.input = 2
        self.hiddenNodes = 3
        self.outputNodes = 1
        
        #initial weights
        self.weights1 = np.random.randn(self.input, self.hiddenNodes)
        self.weights2 = np.random.randn(self.hiddenNodes, self.outputNodes)
        
    def sigmoid(self, s):
        return 1 / (1+np.exp(-s))
    
    def sigmoidPrime(self, s):
        return s * (1 - s)
    
    def feed_forward(self, X):
        '''calculate the NN inference using feed forward'''
        
        #weighted sum of inputs and hidden
        self.hidden_sum = np.dot(X, self.weights1)
            
        #activation of weighted sum
        self.activated_hidden = self.sigmoid(self.hidden_sum)
          
        #weighted sum between hidden and output
        self.output_sum = np.dot(self.activated_hidden, self.weights2)
        
        #final activation
        self.activated_output = self.sigmoid(self.output_sum)
            
        return self.activated_output
    
    def backward(self, X, y, o):
        '''backward propagate through the network'''
        self.o_error = y-o #error in output
        self.o_delta = self.o_error * self.sigmoidPrime(o) #apply derivative of sigmoid to error
        
        self.z2_error = self.o_delta.dot(self.weights2.T) #z2 error: how much each hidden node contributed to the output error
        self.z2_delta = self.z2_error*self.sigmoidPrime(self.activated_hidden) #apply derivative of sigmoid to hidden error
        
        self.weights1 += X.T.dot(self.z2_delta) #adjust first set(input => hidden) weights
        self.weights2 += self.activated_hidden.T.dot(self.o_delta) #adjust second set (hidden => output) weights
        
    def train(self, X, y):
        o = self.feed_forward(X)
        self.backward(X, y, o)


nn = NeuralNetwork()
for i in range(1000):
    if (i+1 in [1,2,3,4,5]) or ((i+1) % 50 ==0):
        print('+' + '---' * 3 + f'EPOCH {i+1}' + '---'*3 + '+')
        print('Input: \n', X)
        print('Actual Output: \n', y)
        print('Predicted Output: \n', str(nn.feed_forward(X)))
        print("Loss: \n", str(np.mean(np.square(y - nn.feed_forward(X)))))
    nn.train(X,y)

In [0]:
# Keras with Hyperparameter Tuning
# Note: keras.wrappers.scikit_learn only exists in older Keras releases and has been
# removed in recent Keras/TensorFlow versions; the SciKeras package
# (from scikeras.wrappers import KerasClassifier) provides a replacement.

import numpy as np
from sklearn.model_selection import GridSearchCV
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier
np.random.seed(42)

# load dataset (expects pima-indians-diabetes.csv in the working directory)
dataset = np.loadtxt("pima-indians-diabetes.csv", delimiter=",")

# split into input (X) and output (Y) variables: 8 feature columns, 1 label column
X = dataset[:,0:8]
Y = dataset[:,8]

# Function to create model, required for KerasClassifier
def create_model():
    model = Sequential()
    model.add(Dense(12, input_dim=8, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    # Compile model
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

# wrap the model so scikit-learn can treat it as an estimator
model = KerasClassifier(build_fn=create_model, verbose=1)

# define the grid search parameters
param_grid = {'batch_size': [10, 20, 40, 60, 80, 100],
              'epochs': [20]}

# Create Grid Search
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=1)
grid_result = grid.fit(X, Y)

# Report Results
print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print(f"Mean: {mean}, Stdev: {stdev} with: {param}")