# <center> Introduction to Neural Networks </center>

<center><img src="Images/nn_image.jpg"> </center>
    
[image by fdecomite](https://www.flickr.com/photos/fdecomite/3238821080) 


Neural networks are an exciting and flexible class of machine learning models used in a wide variety of applications.  They can be used for regression, classification, and for generating new outputs, as you may have seen with generative AI models.

The basic model structure underlies many of the most exciting AI and ML applications such as image recognition, text interpretation, self-driving cars, and much more.  

In this lesson we will explore the foundations of this widely varied class of models, so that you can extend what you learn here to future applications.

## What is a Neural Network?

Neural networks were first described in 1943 by neurophysiologist Warren McCulloch and mathematician Walter Pitts, as a model for biological brains.  In 1959 Bernard Widrow and Marcian Hoff of Stanford adapted the idea to create MADALINE, the first neural network put into production, which was used to eliminate echoes in phone lines.  It's still in use today! -[Stanford History of Neural Networks](https://cs.stanford.edu/people/eroberts/courses/soco/projects/neural-networks/History/history1.html)

The model is built of many 'neurons'.  Each neuron takes in input features and multiplies them by learned coefficients specific to each feature.  The results are summed together, along with a bias term (similar to an intercept), and finally an activation function is applied. The resulting number is the output of that neuron.
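Written as a formula, the description above looks like this.  The symbols are chosen here for illustration: $x_i$ for the input features, $w_i$ for the learned coefficients, $b$ for the bias, $f$ for the activation function, and $a$ for the output of a neuron with $n$ inputs.

$$
a = f\left(\sum_{i=1}^{n} w_i x_i + b\right)
$$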

You may already know the general model behind these neurons.  They are linear models!  You may even already know a linear model with an output function: the logistic regression model.  A neuron with a sigmoid activation function is the same as a logistic regression model.

The network consists of layers of these neurons. The neurons in each layer process data in parallel and the outputs of each neuron are passed to the next layer.  The output of the output layer is the model prediction.

## Neural Network Structure

Neural networks can have many structures, but the simplest is the dense neural network, also called a fully connected neural network or multilayer perceptron.  It is made up of three kinds of layers:

1. **Input Layer**: Every network has an input layer with no weights or bias.  This simply accepts the input features and passes them to the next layer.
2. **Output Layer**: Every network must have an output layer.  This layer produces the model prediction.  It should have one node for each number the model needs to predict.  This is often one, but can be more.
3. **Hidden Layers**: Any network that needs to create a non-linear function must have one or more layers between these.  The layers between input and output are called 'hidden layers'.  This is where the real magic happens and where neural networks gain their flexibility.  Hidden layers allow networks to solve linear problems, as a linear regression does, or non-linear problems, as a decision tree can.  With more layers or more neurons, a model can solve extremely complex problems with very complex decision boundaries or functions between input and output.

<font color='green'> Important: The neurons of each layer MUST have the same number of weights as the number of neurons from the previous layer.  </font>
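The rule above can be checked with a small sketch.  The layer sizes below (4 inputs, 2 hidden neurons, 1 output neuron) and the names `layer_sizes` and `weights` are assumptions chosen for illustration; storing each layer's weights as one matrix is just one convenient layout.

```python
import numpy as np

# Hypothetical layer sizes: 4 input features, 2 hidden neurons, 1 output neuron
layer_sizes = [4, 2, 1]

rng = np.random.default_rng(0)

# One weight matrix per non-input layer, with shape
# (neurons in this layer, neurons in the previous layer)
weights = [rng.standard_normal((n_out, n_in))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

for i, W in enumerate(weights, start=1):
    print(f"Layer {i}: {W.shape[0]} neuron(s), {W.shape[1]} weights per neuron")
```

Each row of a matrix holds the weights of one neuron, so the number of columns always equals the number of neurons (or features) in the layer before it.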

## A Simple Neural Network in Action

We can create a very simple neural network in NumPy.  Let's start with the 'iris' dataset for some data.  If you have not encountered this dataset, each row holds four measurements of one flower: the length and width of the sepal and the length and width of the petal, resulting in 4 features.  The goal is to predict the species of each flower.

This is usually treated as a multi-class classification problem with 3 possible species.  However, we are just going to use the first 2 species to make this a binary classification problem.  This simplifies the math below.

## Prepare Data

In [1]:
import numpy as np
from sklearn.datasets import load_iris
np.random.seed(42)

## Load the data
iris = load_iris()
features = iris.data
labels = iris.target
features[:5]

array([[5.1, 3.5, 1.4, 0.2],
       [4.9, 3. , 1.4, 0.2],
       [4.7, 3.2, 1.3, 0.2],
       [4.6, 3.1, 1.5, 0.2],
       [5. , 3.6, 1.4, 0.2]])

### Convert to Binary Classification
We will convert the multiclass iris problem to just be a binary classification problem by eliminating one of the species from the dataset.

In [2]:
## Filter out all of the class 2 species
## (named `mask` to avoid shadowing Python's built-in `filter`)
mask = labels != 2
print(f'The shape of the features before filtering is {features.shape}')
features = features[mask]
print(f'the shape of the features after filtering is {features.shape}') 
labels = labels[mask]
print(f'The unique labels are {np.unique(labels)}')

The shape of the features before filtering is (150, 4)
the shape of the features after filtering is (100, 4)
The unique labels are [0 1]


### Scale the Data

We will scale the data to lie between 0 and 1 with a simplified form of min-max scaling.  We will subtract the smallest value in the whole feature array from every value and then divide by the largest value in the array.  (Standard min-max scaling works one feature at a time and divides by the range, maximum minus minimum.)

While not required, neural networks generally train more reliably when the input features are on a small, similar scale, such as values between 0 and 1.

In [3]:
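# Define a simplified min/max scaling function.
# Note: this uses the global minimum and maximum of the whole array,
# not per-feature values, and divides by the maximum rather than the range.
def min_max_scale(features):
    global_min = np.min(features)
    global_max = np.max(features)
    features = (features - global_min) / global_max
    return features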

# Scale the features
features = min_max_scale(features)
features[:5]

array([[0.71428571, 0.48571429, 0.18571429, 0.01428571],
       [0.68571429, 0.41428571, 0.18571429, 0.01428571],
       [0.65714286, 0.44285714, 0.17142857, 0.01428571],
       [0.64285714, 0.42857143, 0.2       , 0.01428571],
       [0.7       , 0.5       , 0.18571429, 0.01428571]])
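The scaling cell above uses the global minimum and maximum of the whole array.  For comparison, here is a minimal sketch of the more common per-feature form of min-max scaling.  The function name `min_max_scale_per_feature` and the small `sample` array are hypothetical and are not used elsewhere in this lesson.

```python
import numpy as np

def min_max_scale_per_feature(features):
    """Scale each column to [0, 1] using that column's own min and max."""
    col_min = features.min(axis=0)
    col_max = features.max(axis=0)
    # Assumes no column is constant (otherwise this divides by zero)
    return (features - col_min) / (col_max - col_min)

# Hypothetical sample with 3 rows and 4 features
sample = np.array([[5.1, 3.5, 1.4, 0.2],
                   [7.0, 3.2, 4.7, 1.4],
                   [6.4, 2.0, 4.5, 1.5]])

scaled = min_max_scale_per_feature(sample)
print(scaled.min(axis=0))  # smallest value in each column
print(scaled.max(axis=0))  # largest value in each column
```

With this version every feature spans the full 0 to 1 range, so a feature with small raw values (like petal width) is not squeezed into a narrow band near zero.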

We will create a simple network with a hidden layer of two neurons.  Each hidden neuron takes the 4 features of this dataset as input, multiplies each by a coefficient, sums the results together with a bias, and applies a sigmoid activation.  The two hidden outputs are then passed to a single output neuron, which produces the model prediction.  Since the model will not yet have been trained, the prediction will be essentially random.

### Input Layer

In [4]:
# Isolate the first sample from the dataset.  This will be the output of our input layer.
input_vector = features[0]
input_vector

array([0.71428571, 0.48571429, 0.18571429, 0.01428571])

## Hidden Layer
Our simple network will have one hidden layer with two neurons.  

### First Neuron

#### Weights

In [5]:
# Create the weights for the first neuron in the hidden layer.
neuron_1_weights = np.random.randn(4)
neuron_1_weights

array([ 0.49671415, -0.1382643 ,  0.64768854,  1.52302986])

#### Bias

In [6]:
# Create a bias term for the first neuron
neuron_1_bias = np.random.randn(1)
neuron_1_bias

array([-0.23415337])

#### Activation Function

In [8]:
# Define a sigmoid activation function
def sigmoid(x):
  return 1 / (1 + np.exp(-x))

### Putting the pieces of the first neuron together

Now that we have the pieces we need we will:

1. Multiply the input_vector, or the measurements of the first flower, by the weights of the first neuron
2. Sum the results and add the bias term.
3. Apply the sigmoid activation function.

In [9]:
# Calculate the output of the first neuron

# Multiply the input vector and the first neuron weights
x =  input_vector * neuron_1_weights
# Sum the resulting vector
x = np.sum(x)
# add the bias
x += neuron_1_bias

# and apply the sigmoid activation to the result.
neuron_1_output = sigmoid(x)
neuron_1_output

array([0.54872688])
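The multiply-then-sum step is a dot product, so the same neuron can be computed with `np.dot`.  The sketch below copies the (rounded) values displayed in the cells above; the names `z_manual` and `z_dot` are introduced here for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Values copied from the cell outputs above (rounded as displayed)
input_vector = np.array([0.71428571, 0.48571429, 0.18571429, 0.01428571])
neuron_1_weights = np.array([0.49671415, -0.1382643, 0.64768854, 1.52302986])
neuron_1_bias = np.array([-0.23415337])

# Element-wise multiply, then sum, then add the bias ...
z_manual = np.sum(input_vector * neuron_1_weights) + neuron_1_bias
# ... gives the same pre-activation value as a dot product plus the bias
z_dot = np.dot(input_vector, neuron_1_weights) + neuron_1_bias

print(np.allclose(z_manual, z_dot))
print(sigmoid(z_dot))
```

The second printed value should agree with the neuron output shown above, up to the rounding of the copied inputs.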

The output of the first neuron is shown above.  Let's functionalize this process so we can more easily apply it to more neurons.

We will allow the user to pass the activation function as well, since many different activation functions can be used for a neuron, not just sigmoid.

In [10]:
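# Define a function for a neuron.
# (the first parameter is named `inputs` to avoid shadowing Python's built-in `input`)
def neuron(inputs, weights, bias, activation_function):
    # Multiply each input by its weight
    x = inputs * weights
    # Sum the weighted inputs
    x = np.sum(x)
    # Add the bias
    x += bias
    # Apply the activation function
    return activation_function(x)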
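Because the activation is passed in as an argument, the same neuron can be tried with other activation functions.  The sketch below is illustrative only: `relu` is defined here and is not part of the lesson's code, the bias is written as a plain float for brevity, and the input and weight values are copied from the first neuron above.

```python
import numpy as np

def neuron(inputs, weights, bias, activation_function):
    # Weighted sum of the inputs plus the bias, then the activation
    return activation_function(np.sum(inputs * weights) + bias)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def relu(x):
    # Rectified linear unit: negative values become 0
    return np.maximum(0, x)

# Values copied from the first neuron above (rounded as displayed)
inputs = np.array([0.71428571, 0.48571429, 0.18571429, 0.01428571])
weights = np.array([0.49671415, -0.1382643, 0.64768854, 1.52302986])
bias = -0.23415337

outputs = {name: neuron(inputs, weights, bias, fn)
           for name, fn in [("sigmoid", sigmoid), ("tanh", np.tanh), ("relu", relu)]}

for name, value in outputs.items():
    print(f"{name}: {value:.4f}")
```

Sigmoid squashes the weighted sum into the range 0 to 1, tanh into -1 to 1, and ReLU passes positive values through unchanged.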

### Second Neuron

In [7]:
# Create the weights and bias for the second neuron in the hidden layer
# (note: this bias is drawn with np.random.rand, i.e. uniformly from [0, 1), not with randn)
neuron_2_weights = np.random.randn(4)
neuron_2_bias = np.random.rand(1)

print(neuron_2_weights)
print(neuron_2_bias)

[-0.23413696  1.57921282  0.76743473 -0.46947439]
[0.18182497]


In [11]:
# Calculate the output of the second neuron using the neuron function
neuron_2_output = neuron(input_vector, 
                         neuron_2_weights, 
                         neuron_2_bias, 
                         sigmoid)

neuron_2_output

array([0.71452169])

Finally, the layer output is simply the outputs of its neurons collected into one array, with one value per neuron.  

In [12]:
hidden_layer_output = np.array([neuron_1_output, neuron_2_output])
hidden_layer_output

array([[0.54872688],
       [0.71452169]])
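A whole layer can also be computed in one step by stacking the neuron weights into a matrix.  This is a minimal sketch under that assumption; the names `W_hidden` and `b_hidden` are introduced here, and the values are copied (rounded) from the cells above.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Values copied from the cells above (rounded as displayed)
input_vector = np.array([0.71428571, 0.48571429, 0.18571429, 0.01428571])

# One row of weights per hidden neuron: shape (2 neurons, 4 inputs)
W_hidden = np.array([[0.49671415, -0.1382643, 0.64768854, 1.52302986],
                     [-0.23413696, 1.57921282, 0.76743473, -0.46947439]])
# One bias per hidden neuron
b_hidden = np.array([-0.23415337, 0.18182497])

# Matrix-vector product computes both weighted sums at once
hidden = sigmoid(W_hidden @ input_vector + b_hidden)
print(hidden.shape)
print(hidden)
```

The result is a flat array of shape `(2,)` holding the same two values as the hidden layer output above.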

## Output Layer
Since there are 2 possible species, we could construct our output layer 2 different ways.  We could have 2 neurons in the output layer, and each would output the probability that the flower belongs to the corresponding class: `[prob_class_0, prob_class_1]`.  Or, we could just use one output neuron whose output is the probability that the sample belongs to class 1.  The probability of class 0 would then just be the complement: 1 minus the probability of class 1.  

We will choose the 2nd option for simplicity.  This also allows us to keep using the sigmoid activation.  If we had 2 output neurons we would be better off using a 'softmax' activation function.  Remember, we have many choices in activation functions!
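For reference, here is a minimal sketch of a softmax function, which turns a vector of raw scores into probabilities that sum to 1.  The `logits` values are hypothetical.  The example also illustrates why one sigmoid neuron is enough for two classes: with two scores, the softmax probability of class 1 equals the sigmoid of the difference between the scores.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def softmax(z):
    # Subtracting the max does not change the result but avoids overflow
    exp_z = np.exp(z - np.max(z))
    return exp_z / np.sum(exp_z)

# Hypothetical raw scores from two output neurons: [class 0, class 1]
logits = np.array([0.3, -0.7])

probs = softmax(logits)
print(probs, probs.sum())

# Two-class softmax matches a single sigmoid on the score difference
print(np.isclose(probs[1], sigmoid(logits[1] - logits[0])))
```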

### Output Neuron Shape

Our hidden layer neurons each had 4 weights because the input layer, or the original input data, passed 4 features.  However, since there are only 2 neurons in the hidden layer, only 2 numbers will be passed to the output layer, one from each neuron.  The output neuron will only have 2 weights.

Since there will be only one neuron in the output layer, only one number will be output as the model prediction.

In [13]:
## Output Neuron
output_neuron_weights = np.random.randn(2)
output_neuron_bias = np.random.randn(1)
output_neuron_weights

array([ 0.54256004, -0.57138017])

## Prediction

The output of the final layer of the model is the model output.

In [14]:
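# Calculate the output of the output layer.
# Note: hidden_layer_output has shape (2, 1) and output_neuron_weights has shape (2,),
# so the product inside neuron() broadcasts to shape (2, 2) and np.sum adds all four
# products.  Passing hidden_layer_output.ravel() would give the usual two-term
# weighted sum, and a different value from the one shown below.
model_output = neuron(hidden_layer_output,
                      output_neuron_weights,
                      output_neuron_bias,
                      sigmoid)
model_output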

array([0.27678014])
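A detail worth checking in the cell above is the array shapes.  The hidden layer output was displayed as a column of shape `(2, 1)`, and NumPy broadcasting treats a `(2, 1)` array times a `(2,)` array as a `(2, 2)` product.  The sketch below, using the displayed values and illustrative variable names, compares that with the flattened version, which gives one product per hidden neuron.

```python
import numpy as np

# Values copied from the cells above (rounded as displayed)
hidden_column = np.array([[0.54872688],
                          [0.71452169]])       # shape (2, 1)
weights = np.array([0.54256004, -0.57138017])  # shape (2,)

# (2, 1) * (2,) broadcasts to (2, 2): every output paired with every weight
broadcast_product = hidden_column * weights
# Flattening first gives shape (2,): each output paired with its own weight
flat_product = hidden_column.ravel() * weights

print(broadcast_product.shape, flat_product.shape)
print(np.sum(broadcast_product))  # (h1 + h2) * (w1 + w2)
print(np.sum(flat_product))       # h1 * w1 + h2 * w2
```

The second sum is the weighted sum a neuron is meant to compute, so flattening the hidden layer output (or building it with `np.concatenate`) before passing it on keeps the output neuron consistent with the hidden neurons.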

Our model predicted the probability that the flower is class 1.  To output an actual class prediction, we would simply round this number.  This would round to 0, so the model predicts the flower to be class 0.

Remember that our weights and biases were randomly set.  This is how neural networks start, before fitting on training data.  This prediction is therefore random and not the product of any kind of learning or fitting at all.

In [15]:
prediction = np.rint(model_output)
prediction

array([0.])
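Rounding works here, but an explicit threshold makes the decision rule easier to see and to change.  This is a small sketch with hypothetical probabilities (the first is the model output above).  One difference to be aware of: `np.rint` rounds exact halves to the nearest even number, so 0.5 becomes 0, while a `>= 0.5` threshold assigns it to class 1.

```python
import numpy as np

# The model output above plus a few hypothetical probabilities
probabilities = np.array([0.27678014, 0.5, 0.5000001, 0.9])

# Rounding: exact halves go to the nearest even number
rounded = np.rint(probabilities)

# Explicit threshold: class 1 when the probability is at least 0.5
threshold = 0.5
thresholded = (probabilities >= threshold).astype(int)

print(rounded)
print(thresholded)
```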

## Conclusion

The simple model we created above took a single sample and applied random weights and biases, followed by sigmoid activation functions, to create an essentially random prediction.

### Forward Propagation
A true neural network would make a prediction for every sample in the dataset by passing the inputs through each layer in turn.  This is called <font color='green'> Forward Propagation </font>.  The predictions are then compared to the true labels to measure how well the model did.

### Backward Propagation
It would then work backward through the layers to determine how much each weight and bias contributed to the error, and apply a learning algorithm called <font color='orange'> Gradient Descent </font> to adjust those values in order to make a better prediction on the next attempt.  This step is called <font color='red'> Backward Propagation </font>.

### Epochs
The combination of one <font color='green'> Forward Propagation </font> step and one <font color='red'> Backward Propagation </font> step over the full training set is called an <font color='pink'> Epoch </font>.  A neural network performs multiple <font color='pink'> Epochs </font> to adjust the weights and biases again and again to converge toward more accurate predictions using <font color='orange'> Gradient Descent </font>.  
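To make the cycle concrete, here is a minimal sketch that trains a single sigmoid neuron with gradient descent.  Everything in it is an assumption made for illustration: the toy data `X` and `y`, the learning rate, the number of epochs, and the use of binary cross-entropy as the loss.  A multi-layer network follows the same loop, but its backward step passes gradients through each layer.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

rng = np.random.default_rng(42)

# Hypothetical toy data: 4 samples, 2 features, binary labels
X = np.array([[0.1, 0.2], [0.2, 0.1], [0.8, 0.9], [0.9, 0.7]])
y = np.array([0, 0, 1, 1])

# Random starting weights, as in the lesson
weights = rng.standard_normal(2)
bias = 0.0
learning_rate = 1.0
losses = []

for epoch in range(300):
    # Forward propagation: a prediction for every sample
    p = sigmoid(X @ weights + bias)
    # Compare predictions to the true labels (binary cross-entropy)
    losses.append(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))
    # Backward propagation: gradients of the loss for the weights and bias
    error = p - y
    grad_w = X.T @ error / len(y)
    grad_b = np.mean(error)
    # Gradient descent: step against the gradient
    weights -= learning_rate * grad_w
    bias -= learning_rate * grad_b

predictions = np.rint(sigmoid(X @ weights + bias))
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
print(predictions)
```

Each pass through the loop is one epoch, and the loss recorded at the start of each epoch should shrink as the weights and bias improve.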

This is the fundamental structure and algorithm of a very simple densely connected neural network.

# Challenge

Challenge yourself.  Can you create a single function that will calculate the prediction of a model with an arbitrary number of layers and neurons per layer?  Give it a try!