# Deep Learning 101
Deep learning is one of the most impactful technologies of the past few years and will remain hugely important for our technological advancement. It is therefore worth having a good understanding of the basics before tackling more complex problems with existing libraries like TensorFlow or PyTorch. However, a lot of people still make wrong assumptions about what deep learning is, or just see it as some kind of black magic. To help remedy this, I wrote a series of blog posts together with this repository. This notebook follows along with the blog posts, explaining step by step what is going on.

The blog posts on Medium:
- Part 1 discusses the basics and the forward pass: TBD
- Part 2 discusses backpropagation: TBD
- Part 3 goes over several techniques to improve the learning ability of a network: TBD

This notebook covers the contents of the first blog post.

In [1]:
# To make our lives a bit easier, I'll be using NumPy
import numpy as np

# Later on we will also be creating some plots to visualize our results.
import matplotlib.pyplot as plt

## The neuron
A neuron is the most basic building block of a neural network. It receives one or more numbers as input, multiplies each of them by a weight, sums the results, adds a bias, and returns that single number as its output.

In [2]:
# Let's say our input contains 5 numbers
inputs = [4, 9, 2, 1.3, 5.6]

# The neuron therefore needs to have 5 weights and of course a bias
weights = [0.78, -0.34, 0.7, -0.2, -0.11]
bias = 1.3

# The output of the neuron can then be computed as follows:
output = weights[0] * inputs[0] + weights[1] * inputs[1] + weights[2] * inputs[2] + weights[3] * inputs[3] + weights[4] * inputs[4] + bias

# Resulting in:
print(output)

1.884


In [3]:
# This is actually the same as doing a dot product of the input and the weights, and then adding the bias to it
output = np.dot(weights, inputs) + bias

print(output)

1.884
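
The two cells above compute the same thing, and the pattern works for any number of inputs. As a minimal sketch, the computation could be wrapped in a small helper; the name `neuron_forward` and the shape check are assumptions made for illustration and are not part of the original notebook.

```python
import numpy as np

def neuron_forward(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias (hypothetical helper)."""
    inputs = np.asarray(inputs, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # A neuron needs exactly one weight per input
    if inputs.shape != weights.shape:
        raise ValueError("inputs and weights must have the same shape")
    return float(np.dot(weights, inputs) + bias)

inputs = [4, 9, 2, 1.3, 5.6]
weights = [0.78, -0.34, 0.7, -0.2, -0.11]
bias = 1.3

# Explicit sum, mirroring the long expression from the first cell
manual = sum(w * x for w, x in zip(weights, inputs)) + bias

# Vectorized version using the dot product
vectorized = neuron_forward(inputs, weights, bias)

# Both agree up to floating-point rounding
print(np.isclose(manual, vectorized))  # True
```

The shape check is there only to fail early when the number of weights does not match the number of inputs.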


## Activations
As you can see, the neuron is completely linear (strictly speaking affine, because of the bias). So how can neural networks learn nonlinear mappings between input and output spaces? The answer is activation functions: we pass the output of the neuron through a nonlinear function. There are many activation functions, and one of the most widely used is the Rectified Linear Unit, or ReLU for short. This activation function is quite simple:

In [4]:
# The ReLU returns 0 if x is smaller than 0, and x otherwise
def ReLU(x):
    if x < 0:
        return 0
    else:
        return x

# Or simply (this definition replaces the one above and also works element-wise on arrays):
def ReLU(x):
    return np.maximum(0, x)

In [5]:
# Let's activate the output of the neuron
output = ReLU(output)
print(output)

1.884
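
To see why the nonlinearity matters, here is a small sketch with hand-picked toy values (all weights, biases, and inputs below are assumptions chosen for illustration). Without an activation in between, two stacked layers can always be merged into a single linear layer; with a ReLU in between, that merged layer no longer gives the same result.

```python
import numpy as np

def relu(v):
    return np.maximum(0, v)

# Toy values: 2 inputs -> 2 neurons -> 1 neuron
W1 = np.array([[1.0, -1.0],
               [-1.0, 1.0]])
b1 = np.array([0.5, 0.0])
W2 = np.array([[1.0, 1.0]])
b2 = np.array([0.1])
x = np.array([1.0, 2.0])

# Two layers applied in sequence, without any activation
stacked = W2 @ (W1 @ x + b1) + b2

# The same mapping expressed as ONE layer with merged weights and bias
W_merged = W2 @ W1
b_merged = W2 @ b1 + b2
single = W_merged @ x + b_merged

print(np.allclose(stacked, single))     # True

# With a ReLU between the layers, the negative pre-activation of the first
# neuron is clipped to 0, so the merged layer no longer matches
with_relu = W2 @ relu(W1 @ x + b1) + b2

print(np.allclose(with_relu, single))   # False
```

In this toy case the first hidden neuron produces -0.5 before the activation, which the ReLU sets to 0. That clipping is what the merged linear layer cannot reproduce.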


## Layers
A neuron by itself can only do so much, so we combine neurons into neural networks. We structure a network in layers, where each neuron of one layer receives the outputs of all neurons of the previous layer. Having multiple neurons in a single layer lets each of them focus on a different aspect of the mapping to be learned, while stacking multiple layers creates depth, meaning that later layers can build on combinations of features found by earlier layers.

We know that the forward pass of a single neuron is the dot product of its weight vector and the input vector, to which we add the bias. This extends easily to a full layer: the weights become a matrix with one row per neuron, and the bias becomes a vector holding one bias per neuron.

In [6]:
# We keep the same inputs as before
inputs = [4, 9, 2, 1.3, 5.6]

# Our layer has three neurons, so we need three sets of weights and three biases
weights = [
    [0.78, -0.34, 0.7, -0.2, -0.11],  # Same neuron as before
    [-0.24, 1.24, 0.43, 0.04, -0.3],
    [0.5, 0.65, -0.3, 0.64, 0.1]
]

biases = [1.3, -0.5, 2.4]

# The forward pass remains the same
output = np.dot(weights, inputs) + biases

# The activation also remains the same, since the NumPy maximum function works element-wise on vectors as well
output = ReLU(output)

# As a result we get three numbers, one for each neuron
print(output)

[ 1.884  8.932 11.042]
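
The same layer can also process several input samples at once. The sketch below assumes the samples are stored as the rows of a 2D array; the second, all-zero sample is a made-up example added only to show the shapes. With samples as rows, the weights matrix has to be transposed so that the inner dimensions line up.

```python
import numpy as np

weights = np.array([
    [0.78, -0.34, 0.7, -0.2, -0.11],
    [-0.24, 1.24, 0.43, 0.04, -0.3],
    [0.5, 0.65, -0.3, 0.64, 0.1],
])
biases = np.array([1.3, -0.5, 2.4])

# Hypothetical batch: the sample used above plus an all-zero sample
batch = np.array([
    [4, 9, 2, 1.3, 5.6],
    [0, 0, 0, 0, 0],
])

# (2, 5) @ (5, 3) -> (2, 3); the biases are broadcast across the rows
batch_output = np.maximum(0, batch @ weights.T + biases)

print(batch_output.shape)  # (2, 3)
print(batch_output)
```

Each row of `batch_output` holds the three neuron outputs for one sample. For the all-zero sample the weighted sum vanishes, so its row is just the biases after the ReLU.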


## Structuring our code
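Before continuing with the backpropagation algorithm, let's structure our code a bit better by wrapping the layer and the activation in classes. Each class keeps its latest inputs and output as attributes during the forward pass. Note that the `ReLU` class below replaces the `ReLU` function defined earlier.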

In [329]:
class Dense:
    def __init__(self, input_size, output_size):
        """Constructor of a single fully connected layer

        Args:
            input_size (int): number of inputs to this layer
            output_size (int): number of neurons in this layer, which is equivalent to the number of outputs
        """
        # Weights: one row per neuron, drawn from a standard normal distribution.
        # Biases: one per neuron, initialized to zero.
        self.weights = np.random.randn(output_size, input_size)
        self.biases = np.zeros(output_size)
        
    def forward(self, inputs):
        """Forward pass of the layer"""
        self.inputs = inputs
        self.output = np.dot(self.inputs, self.weights.T) + self.biases
        return self.output

    
class ReLU:
    def forward(self, inputs):
        """Forward pass of the activation function"""
        self.inputs = inputs
        self.output = np.maximum(0, self.inputs)
        return self.output
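
As a usage sketch, the two classes can be chained to reproduce the result from the Layers section. The classes are repeated here so the example runs on its own. Overwriting `weights` and `biases` by hand is done purely for illustration, and the second layer (named `head` here) is an assumption that only shows how layers stack.

```python
import numpy as np

class Dense:
    def __init__(self, input_size, output_size):
        self.weights = np.random.randn(output_size, input_size)
        self.biases = np.zeros(output_size)

    def forward(self, inputs):
        self.inputs = inputs
        self.output = np.dot(self.inputs, self.weights.T) + self.biases
        return self.output

class ReLU:
    def forward(self, inputs):
        self.inputs = inputs
        self.output = np.maximum(0, self.inputs)
        return self.output

# Build a layer with 5 inputs and 3 neurons, then load the hand-picked
# values from the Layers section instead of the random initialization
layer = Dense(input_size=5, output_size=3)
layer.weights = np.array([
    [0.78, -0.34, 0.7, -0.2, -0.11],
    [-0.24, 1.24, 0.43, 0.04, -0.3],
    [0.5, 0.65, -0.3, 0.64, 0.1],
])
layer.biases = np.array([1.3, -0.5, 2.4])
activation = ReLU()

inputs = np.array([4, 9, 2, 1.3, 5.6])
hidden = activation.forward(layer.forward(inputs))
print(hidden)  # same three values as in the Layers section

# A second, randomly initialized layer stacked on top (its values differ per run)
head = Dense(input_size=3, output_size=1)
prediction = head.forward(hidden)
print(prediction.shape)  # (1,)
```

Because `forward` multiplies the inputs by the transposed weights, the same classes accept either a single sample of shape `(5,)` or a batch of shape `(n, 5)`.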