In [1]:
%matplotlib inline
import matplotlib
import numpy as np

# Intro
Neural networks consist of layers of neurons and the connections between the neurons of adjacent layers. Tuning the weights of these connections and the biases of the neurons allows the network to "learn" and make predictions.

# Neuron code
Suppose we are looking at a single neuron that takes 3 inputs from the previous layer. The following is a simplified view of what a neuron does: it takes a weighted sum of its inputs and adds the bias associated with the neuron. The result of this calculation is the output of the neuron.

In [2]:
inputs = [1, 2, 3] # Output of previous layer's neurons (could be from an actual input layer or a hidden layer)
weights = [0.2, 0.8, -0.5] # Strength of each connection from the previous layer's neurons to this neuron
bias = 2 # Bias associated with this particular neuron

output = inputs[0]*weights[0] + inputs[1]*weights[1] + inputs[2]*weights[2] + bias
print(output)

2.3
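
The same calculation can be written for any number of inputs by pairing each input with its weight. The following is a minimal sketch; `neuron_output` is a hypothetical helper name that is not used elsewhere in this notebook.

```python
def neuron_output(inputs, weights, bias):
    # Weighted sum of the inputs plus the neuron's bias
    return sum(i * w for i, w in zip(inputs, weights)) + bias

inputs = [1, 2, 3]
weights = [0.2, 0.8, -0.5]
bias = 2

output = neuron_output(inputs, weights, bias)
print(output)
```

This should agree with the hand-written sum above, up to floating-point rounding.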


# Layer Code
Now suppose we are looking at a single layer of 3 neurons that takes inputs from a previous layer of 4 neurons. This time there are 3 sets of weights (one set per neuron) and 3 biases.

In [3]:
inputs = [1, 2, 3, 2.5]
weights1 = [0.2, 0.8, -0.5, 1.0]
weights2 = [0.5, -0.91, 0.26, -0.5]
weights3 = [-0.26, -0.27, 0.17, 0.87]

bias1 = 2
bias2 = 3
bias3 = 0.5

output = [inputs[0]*weights1[0] + inputs[1]*weights1[1] + inputs[2]*weights1[2] + inputs[3]*weights1[3] + bias1,
		  inputs[0]*weights2[0] + inputs[1]*weights2[1] + inputs[2]*weights2[2] + inputs[3]*weights2[3] + bias2,
		  inputs[0]*weights3[0] + inputs[1]*weights3[1] + inputs[2]*weights3[2] + inputs[3]*weights3[3] + bias3]
print(output)

[4.8, 1.21, 2.385]
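
Writing out every term by hand does not scale well, so the same layer can be sketched as a loop over the neurons. This assumes we group the three weight lists into one nested list and the three biases into one list; the names `layer_outputs`, `neuron_weights` and `neuron_bias` are only illustrative.

```python
inputs = [1, 2, 3, 2.5]
weights = [[0.2, 0.8, -0.5, 1.0],
           [0.5, -0.91, 0.26, -0.5],
           [-0.26, -0.27, 0.17, 0.87]]
biases = [2, 3, 0.5]

layer_outputs = []
# Each neuron has its own list of weights and its own bias
for neuron_weights, neuron_bias in zip(weights, biases):
    # Weighted sum of the shared inputs plus this neuron's bias
    neuron_output = sum(i * w for i, w in zip(inputs, neuron_weights)) + neuron_bias
    layer_outputs.append(neuron_output)

print(layer_outputs)
```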


# Numpy and Dot Product
To make things faster and more concise, we can use NumPy's dot product function, `np.dot`.

In [4]:
inputs = [1, 2, 3, 2.5]
weights = [[0.2, 0.8, -0.5, 1.0],
		   [0.5, -0.91, 0.26, -0.5],
		   [-0.26, -0.27, 0.17, 0.87]]
biases = [2, 3, 0.5]

output = np.dot(weights, inputs) + biases
print(output)

[4.8   1.21  2.385]
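
To see why this works, it can help to look at the shapes involved. The sketch below converts the lists to arrays (an assumption made here only so that `.shape` is available) and checks that each entry of the result is the dot product of one row of weights with the inputs, plus that neuron's bias.

```python
import numpy as np

inputs = np.array([1, 2, 3, 2.5])
weights = np.array([[0.2, 0.8, -0.5, 1.0],
                    [0.5, -0.91, 0.26, -0.5],
                    [-0.26, -0.27, 0.17, 0.87]])
biases = np.array([2, 3, 0.5])

# (3, 4) matrix times a length-4 vector gives a length-3 vector
print(weights.shape, inputs.shape)
output = np.dot(weights, inputs) + biases

# Same result computed one neuron (one row of weights) at a time
manual = np.array([np.dot(row, inputs) + b for row, b in zip(weights, biases)])
print(np.allclose(output, manual))
```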


# Batching
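Now suppose that, instead of passing a single input at a time, we wish to pass a batch of inputs. Processing a batch in one matrix operation reduces computation time. The following code shows how the output is calculated for a batch of 3 inputs. The weights are transposed so that the inner dimensions match: the inputs have shape (3, 4) and the transposed weights have shape (4, 3).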

In [5]:
inputs = [[1, 2, 3, 2.5],
		  [2.0, 5.0, -1.0, 2.0],
		  [-1.5, 2.7, 3.3, -0.8]]
weights = [[0.2, 0.8, -0.5, 1.0],
		   [0.5, -0.91, 0.26, -0.5],
		   [-0.26, -0.27, 0.17, 0.87]]
biases = [2, 3, 0.5]
output = np.dot(inputs, np.array(weights).T) + biases
print(output)

[[ 4.8    1.21   2.385]
 [ 8.9   -1.81   0.2  ]
 [ 1.41   1.051  0.026]]
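
As a sanity check on the batched calculation, the sketch below compares it with running each sample through the single-input formula. The arrays are the same values as above; `per_sample` is an illustrative name.

```python
import numpy as np

inputs = np.array([[1, 2, 3, 2.5],
                   [2.0, 5.0, -1.0, 2.0],
                   [-1.5, 2.7, 3.3, -0.8]])
weights = np.array([[0.2, 0.8, -0.5, 1.0],
                    [0.5, -0.91, 0.26, -0.5],
                    [-0.26, -0.27, 0.17, 0.87]])
biases = np.array([2, 3, 0.5])

# (3 samples, 4 features) times (4 features, 3 neurons) gives (3 samples, 3 neurons)
batch_output = np.dot(inputs, weights.T) + biases  # biases broadcast across rows

# One sample at a time, using the single-input formula
per_sample = np.array([np.dot(weights, sample) + biases for sample in inputs])
print(np.allclose(batch_output, per_sample))
```

Each row of the batched output corresponds to one sample, and each column to one neuron.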


# OOP
Now suppose we wish to add more layers of neurons. The simplest way would be to type out another set of weights and biases, but this becomes restrictive when we want to modify the neural network. Instead, we'll abstract neurons and layers into classes.

## Layer

In [6]:
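class Layer_Dense:
	def __init__(self, n_inputs, n_neurons):
		# Randomly initialise weights to small values; the shape (n_inputs, n_neurons)
		# means no transpose is needed in the forward pass
		self.weight = 0.10*np.random.randn(n_inputs, n_neurons)
		# One bias per neuron, initialised to zero
		self.biases = np.zeros((1, n_neurons))
	def forward(self, inputs):
		# Weighted sum of the inputs plus biases, for every sample in the batch
		self.output = np.dot(inputs, self.weight) + self.biases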

**Note**: here we initialise the biases to zero. This can sometimes cause zeros to propagate through the network, resulting in a "dead" network, for example when a neuron's output is zero for every input. Hence we should consider the initial biases when creating or tuning a network.
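
One way to experiment with the initial biases is to make them a constructor argument. The sketch below is a hypothetical variant of the layer, not part of the class above: the class name `LayerDenseWithBias` and the `bias_init` parameter are assumptions introduced only for illustration.

```python
import numpy as np

class LayerDenseWithBias:
    # Hypothetical variant of the dense layer with a configurable initial bias
    def __init__(self, n_inputs, n_neurons, bias_init=0.0):
        # Small random weights, shaped (n_inputs, n_neurons)
        self.weight = 0.10 * np.random.randn(n_inputs, n_neurons)
        # Every bias starts at bias_init instead of always zero
        self.biases = np.full((1, n_neurons), bias_init, dtype=float)

    def forward(self, inputs):
        self.output = np.dot(inputs, self.weight) + self.biases

layer = LayerDenseWithBias(4, 3, bias_init=0.1)
layer.forward([[1, 2, 3, 2.5]])
print(layer.weight.shape, layer.biases.shape, layer.output.shape)
```

Whether a nonzero starting bias actually helps depends on the network and the data, so this is something to try rather than a rule.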

With our dense layer abstracted, we can now make and use multiple layers as follows:

In [7]:
# Inputs
X = [[1, 2, 3, 2.5],
	[2.0, 5.0, -1.0, 2.0],
	[-1.5, 2.7, 3.3, -0.8]]

layer1 = Layer_Dense(4, 5)
layer2 = Layer_Dense(5, 2) # n_inputs must match the number of neurons in layer1 (5)

layer1.forward(X)
layer2.forward(layer1.output)

print(layer1.output)
print(layer2.output)

[[-0.04496121  0.0411305  -0.04768364 -0.87879176 -0.3194631 ]
 [-0.03422296  0.52952711  0.65998613 -0.12569279 -0.54148672]
 [-0.3462541  -0.63243509  0.39965169 -0.82635177 -0.12040413]]
[[ 0.05249861 -0.20362281]
 [-0.03093784  0.16089939]
 [-0.01137082 -0.20589618]]
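
With more than a couple of layers, the forward passes can be chained in a loop. The following is a minimal sketch that repeats the layer class so that it runs on its own; the `layers` list and the `activations` variable are illustrative names, and the printed values will differ from run to run because the weights are random.

```python
import numpy as np

class Layer_Dense:
    def __init__(self, n_inputs, n_neurons):
        self.weight = 0.10 * np.random.randn(n_inputs, n_neurons)
        self.biases = np.zeros((1, n_neurons))

    def forward(self, inputs):
        self.output = np.dot(inputs, self.weight) + self.biases

X = [[1, 2, 3, 2.5],
     [2.0, 5.0, -1.0, 2.0],
     [-1.5, 2.7, 3.3, -0.8]]

# Each layer's n_inputs matches the previous layer's n_neurons
layers = [Layer_Dense(4, 5), Layer_Dense(5, 2)]

activations = X
for layer in layers:
    layer.forward(activations)
    activations = layer.output  # output of one layer feeds the next

print(activations.shape)
```

The final shape is (3, 2): one row per sample and one column per neuron in the last layer.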


# Activation Functions
Activation functions allow the network to fit the data better. The layer class that we have used so far effectively uses the identity activation function, `f(x) = x`. However, to solve more difficult problems we require nonlinear activation functions, because a stack of layers with only linear activations is equivalent to a single linear layer. Some popular nonlinear activation functions are the sigmoid function and the rectified linear unit.
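
To illustrate why the identity activation is limiting, the sketch below checks numerically that two dense layers without a nonlinear activation produce the same output as one suitably chosen dense layer. All arrays are random stand-ins, and the names `W_combined` and `b_combined` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 4))    # batch of 3 samples, 4 features
W1 = rng.standard_normal((4, 5))   # first layer: 4 inputs, 5 neurons
b1 = rng.standard_normal((1, 5))
W2 = rng.standard_normal((5, 2))   # second layer: 5 inputs, 2 neurons
b2 = rng.standard_normal((1, 2))

# Two layers applied one after the other, identity activation in between
two_layers = np.dot(np.dot(X, W1) + b1, W2) + b2

# A single layer with combined weights and biases
W_combined = np.dot(W1, W2)
b_combined = np.dot(b1, W2) + b2
one_layer = np.dot(X, W_combined) + b_combined

print(np.allclose(two_layers, one_layer))
```

Placing a nonlinear function between the layers breaks this equivalence, which is what lets deeper networks represent more than a single linear map.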

## Rectified Linear Unit (ReLU)
The ReLU activation function returns `f(x) = x` for `x > 0` and `f(x) = 0` for `x <= 0`.
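
A quick check of this definition on a small hand-picked array, using NumPy's `np.maximum`:

```python
import numpy as np

inputs = np.array([[-2.0, -0.5, 0.0, 0.5, 2.0]])

# Element-wise maximum of 0 and each input:
# negative values become 0, positive values pass through unchanged
output = np.maximum(0, inputs)
print(output)
```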

In [8]:
class Activation_ReLU:
	def forward(self, inputs):
		self.output = np.maximum(0, inputs)

The activation function can then be used as follows:

In [9]:
from nnfs.datasets import spiral_data
# 100 samples for each of 3 classes (each sample has 2 features: its x and y coordinates)
X, y = spiral_data(100, 3)

In [10]:
layer1 = Layer_Dense(2, 5)
activation1 = Activation_ReLU()

layer1.forward(X)
activation1.forward(layer1.output)

print(activation1.output)

[[0.         0.         0.         0.         0.        ]
 [0.         0.00113527 0.00052279 0.         0.        ]
 [0.         0.00194636 0.         0.00132865 0.        ]
 ...
 [0.01747815 0.         0.13669794 0.         0.14357159]
 [0.01212957 0.02505011 0.16120295 0.         0.13918204]
 [0.01317252 0.01895553 0.16127414 0.         0.14292897]]


**Note**: here we see that there are no negative values and that many values have been set to zero, as expected from the ReLU function. However, if we find that the network is "dying" (too many neurons outputting zero for every input), our initial biases may need to be tweaked.
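
One rough way to look for "dying" neurons is to measure how many activations are zero and whether any neuron outputs zero for every sample. The sketch below uses random inputs as a stand-in for the spiral data so that it runs without `nnfs`; the names `zero_fraction` and `dead_neurons` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((300, 2))             # stand-in for 300 samples with 2 features
weights = 0.10 * rng.standard_normal((2, 5))  # 2 inputs, 5 neurons
biases = np.zeros((1, 5))

relu_output = np.maximum(0, np.dot(X, weights) + biases)

# Fraction of all activations that ReLU set to zero
zero_fraction = np.mean(relu_output == 0)
# A neuron counts as "dead" here if it outputs zero for every sample
dead_neurons = np.all(relu_output == 0, axis=0)

print(zero_fraction)
print(dead_neurons)
```

A high zero fraction is normal for ReLU; a column of `dead_neurons` that is `True` is the case worth investigating.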