# Neural Networks

An *artificial neural network* (or neural network for short) is a predictive model motivated
by the way the brain operates. Think of the brain as a collection of neurons
wired together. Each neuron looks at the outputs of the other neurons that feed into
it, does a calculation, and then either fires (if the calculation exceeds some threshold)
or doesn’t (if it doesn’t).

Accordingly, artificial neural networks consist of artificial neurons, which perform
similar calculations over their inputs. Neural networks can solve a wide variety of
problems like handwriting recognition and face detection, and they are used heavily
in deep learning, one of the trendiest subfields of data science.

However, most neural
networks are “black boxes”—inspecting their details doesn’t give you much understanding
of how they’re solving a problem. And large neural networks can be difficult
to train. For most problems you’ll encounter as a budding data scientist, they’re probably
not the right choice. Nevertheless, it is important to introduce this model, because if you
continue on the path of data science, you will have to dig into it sooner or later.

## Perceptrons

Pretty much the simplest neural network is the *perceptron*, which approximates a single
neuron with n binary inputs. It computes a weighted sum of its inputs and “fires”
if that weighted sum is 0 or greater:

![Perceptron](perceptron.png)

Image Credit: https://towardsdatascience.com/the-perceptron-3af34c84838c

```python
import numpy as np
from typing import List

Vector = List[float]
```

```python
def step_function(x: float) -> float:
    return 1.0 if x >= 0 else 0.0
```

```python
def perceptron_output(weights: Vector, bias: float, x: Vector) -> float:
    """Returns 1 if the perceptron 'fires', 0 if not"""
    calculation = np.dot(weights, x) + bias
    return step_function(calculation)
```

For those of you familiar with geometry, the perceptron simply distinguishes between the two half-spaces separated by the
hyperplane of points `x` for which:

```python
dot(weights, x) + bias == 0
```

With properly chosen weights, perceptrons can solve a number of simple problems. For example, we can create an AND gate (which returns 1 if both its
inputs are 1 but returns 0 if either of its inputs is 0) with:

```python
and_weights = [2, 2]
and_bias = -3.

print(f"1 AND 1 = {perceptron_output(and_weights, and_bias, [1, 1])}")
print(f"0 AND 1 = {perceptron_output(and_weights, and_bias, [0, 1])}")
print(f"1 AND 0 = {perceptron_output(and_weights, and_bias, [1, 0])}")
print(f"0 AND 0 = {perceptron_output(and_weights, and_bias, [0, 0])}")
```

```
1 AND 1 = 1.0
0 AND 1 = 0.0
1 AND 0 = 0.0
0 AND 0 = 0.0
```


If both inputs are 1, the `calculation` equals 2 + 2 – 3 = 1, and the output is 1. If only
one of the inputs is 1, the `calculation` equals 2 + 0 – 3 = –1, and the output is 0. And
if both of the inputs are 0, the `calculation` equals –3, and the output is 0.
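The sketch below reproduces that arithmetic for all four input pairs, printing the raw `calculation` next to the thresholded output. It swaps `np.dot` for a plain-Python dot product (an assumption made only so the snippet runs without NumPy); the weights and bias are the AND values from above.

```python
import itertools

def step_function(x: float) -> float:
    return 1.0 if x >= 0 else 0.0

and_weights = [2, 2]
and_bias = -3.0

calculations = {}
for x in itertools.product([0, 1], repeat=2):  # (0, 0), (0, 1), (1, 0), (1, 1)
    # Plain-Python equivalent of np.dot(and_weights, x) + and_bias
    calculation = sum(w * xi for w, xi in zip(and_weights, x)) + and_bias
    calculations[x] = calculation
    print(f"inputs={x}  calculation={calculation:+.0f}  output={step_function(calculation)}")
```

Only the (1, 1) row lands on the nonnegative side of the threshold, which is exactly what makes this perceptron an AND gate.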

Using similar reasoning, we could build an OR gate with:

```python
or_weights = [2, 2]
or_bias = -1.

print(f"1 OR 1 = {perceptron_output(or_weights, or_bias, [1, 1])}")
print(f"0 OR 1 = {perceptron_output(or_weights, or_bias, [0, 1])}")
print(f"1 OR 0 = {perceptron_output(or_weights, or_bias, [1, 0])}")
print(f"0 OR 0 = {perceptron_output(or_weights, or_bias, [0, 0])}")
```

```
1 OR 1 = 1.0
0 OR 1 = 1.0
1 OR 0 = 1.0
0 OR 0 = 0.0
```


![Decision space for a two-input perceptron](decision_space.jpg)

We could also build a NOT gate (which has one input and converts 1 to 0 and 0 to 1)
with:

```python
not_weights = [-2]
not_bias = 1

print(f"NOT 0 = {perceptron_output(not_weights, not_bias, [0])}")
print(f"NOT 1 = {perceptron_output(not_weights, not_bias, [1])}")
```

```
NOT 0 = 1.0
NOT 1 = 0.0
```


However, there are some problems that simply can’t be solved by a single perceptron.
For example, no matter how hard you try, you cannot use a perceptron to build an
XOR gate that outputs 1 if exactly one of its inputs is 1 and 0 otherwise. This is where
we start needing more complicated neural networks.
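A brute-force search can illustrate this claim, though it cannot prove it. The sketch below tries every integer weight pair and bias in a small, arbitrarily chosen range (-5 to 5) and counts how many of those perceptrons reproduce each gate's truth table. AND and OR have solutions in that range; XOR has none. The general reason is geometric: no single line separates XOR's 1s at (0, 1) and (1, 0) from its 0s at (0, 0) and (1, 1).

```python
import itertools

def step_function(x: float) -> float:
    return 1.0 if x >= 0 else 0.0

def perceptron_output(weights, bias, x):
    # Plain-Python stand-in for the NumPy-based perceptron_output
    return step_function(sum(w * xi for w, xi in zip(weights, x)) + bias)

truth_tables = {
    "AND": {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1},
    "OR":  {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 1},
    "XOR": {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0},
}

search_range = range(-5, 6)  # arbitrary grid of integer weights and biases
solutions = {name: 0 for name in truth_tables}

for w1, w2, b in itertools.product(search_range, repeat=3):
    for name, table in truth_tables.items():
        # Count this (w1, w2, b) if it matches every row of the truth table.
        if all(perceptron_output([w1, w2], b, x) == y for x, y in table.items()):
            solutions[name] += 1

print(solutions)
```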

Of course, you don’t need to approximate a neuron in order to build a logic gate:

```python
and_gate = min
or_gate = max
xor_gate = lambda x, y: 0 if x == y else 1
```

Like real neurons, artificial neurons start getting more interesting (and more complicated) when you start
connecting them together.
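As a first taste of that, the AND, OR, and NOT perceptrons defined earlier can be wired together by hand to compute XOR, using the identity XOR(a, b) = AND(OR(a, b), NOT(AND(a, b))). This is a sketch: the helper `xor_from_perceptrons` is a hypothetical name, and the weights are the hand-picked values from the gates above rather than learned ones.

```python
def step_function(x: float) -> float:
    return 1.0 if x >= 0 else 0.0

def perceptron_output(weights, bias, x):
    # Plain-Python stand-in for the NumPy-based perceptron_output
    return step_function(sum(w * xi for w, xi in zip(weights, x)) + bias)

def xor_from_perceptrons(a: float, b: float) -> float:
    # First stage: two perceptrons look at the raw inputs.
    a_or_b = perceptron_output([2, 2], -1.0, [a, b])
    a_and_b = perceptron_output([2, 2], -3.0, [a, b])
    # Second stage: negate the AND result...
    not_and = perceptron_output([-2], 1.0, [a_and_b])
    # ...and combine it with the OR result using a final AND.
    return perceptron_output([2, 2], -3.0, [a_or_b, not_and])

for a in (0, 1):
    for b in (0, 1):
        print(f"{a} XOR {b} = {xor_from_perceptrons(a, b)}")
```

Each perceptron alone draws a single line, but feeding one perceptron's output into another lets the combination carve out a region that no single line could.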

## Feed-Forward Neural Networks

The topology of the brain is enormously complicated, so it’s common to approximate
it with an idealized *feed-forward* neural network that consists of discrete layers of
neurons, each connected to the next.

This typically entails an input layer (which
receives inputs and feeds them forward unchanged), one or more “hidden layers”
(each of which consists of neurons that take the outputs of the previous layer, perform
some calculation, and pass the result to the next layer), and an output layer
(which produces the final outputs).

![Feedforward Neural Networks](nn.jpg)

Image Credit: https://www.learnopencv.com/understanding-feedforward-neural-networks/

Just as in the perceptron, each (noninput) neuron has a weight for each of its inputs,
plus a bias. To make our representation simpler, we'll append the bias to the end of
its weights vector and give each neuron a bias input that always equals 1. This is similar to the way we represent the intercept in multiple linear regression with a column of 1s.
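The bias trick can be checked directly: appending the bias to the weights and a constant 1 to the inputs gives exactly the same weighted sum as keeping them separate. The sketch below uses the AND perceptron's parameters and a plain-Python `dot` helper (a hypothetical stand-in for `np.dot`).

```python
import itertools

def dot(v, w):
    """Plain-Python dot product, standing in for np.dot."""
    return sum(vi * wi for vi, wi in zip(v, w))

weights, bias = [2, 2], -3.0

for x in itertools.product([0, 1], repeat=2):
    explicit = dot(weights, x) + bias                # bias kept separate
    folded = dot(weights + [bias], list(x) + [1])    # bias folded into the weights
    print(x, explicit, folded, explicit == folded)
```

Folding the bias in means every neuron is described by a single vector, which is what makes the list-of-vectors representation of a network possible.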

As with the perceptron, for each neuron we’ll sum up the products of its inputs and
its weights. But here, rather than outputting `step_function` applied to that sum,
we’ll output a smooth approximation of it. Here we’ll use the `sigmoid` function:

```python
import math

def sigmoid(t: float) -> float:
    return 1 / (1 + math.exp(-t))
```

![sigmoid](sigmoid.jpg)

Why use `sigmoid` instead of the simpler `step_function`? Training a neural
network requires calculus, and calculus requires smooth (differentiable) functions.
`step_function` isn’t even continuous, whereas `sigmoid` is a good smooth approximation
of it.
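Two properties make `sigmoid` convenient here. First, scaling its input by a large factor pushes it toward `step_function` everywhere except exactly at 0, where `sigmoid` gives 0.5 while `step_function` gives 1. Second, its derivative has the closed form σ'(t) = σ(t)(1 − σ(t)). The sketch below checks both; the scale factor 100, the step size `h`, and the sample points are arbitrary choices, and `sigmoid_derivative` is a hypothetical helper name.

```python
import math

def sigmoid(t: float) -> float:
    return 1 / (1 + math.exp(-t))

def step_function(x: float) -> float:
    return 1.0 if x >= 0 else 0.0

# 1. A steep sigmoid approximates the step function away from t = 0.
for t in (-1.0, -0.1, 0.1, 1.0):
    print(t, step_function(t), round(sigmoid(100 * t), 6))

# 2. The closed-form derivative agrees with a central finite difference.
def sigmoid_derivative(t: float) -> float:
    s = sigmoid(t)
    return s * (1 - s)

h = 1e-5
for t in (-2.0, 0.0, 3.0):
    numeric = (sigmoid(t + h) - sigmoid(t - h)) / (2 * h)
    print(t, sigmoid_derivative(t), numeric)
```

The scale factor is kept modest on purpose: `math.exp` raises `OverflowError` for arguments above roughly 709, so a much steeper sigmoid would crash on large negative inputs.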

We then calculate the output as:

```python
def neuron_output(weights: Vector, inputs: Vector) -> float:
    # weights includes the bias term, inputs includes a 1
    return sigmoid(np.dot(weights, inputs))
```

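```python
# Illustrative values only: under the convention above, the last input
# would normally be the constant 1 that multiplies the bias weight.
neuron_output((1.3, -0.6, 0.2), (0.2, 0.8, 0.5))
```

```
0.4700359482354282
```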
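Working through that call by hand shows where the number comes from: first the dot product of the weights $w$ and inputs $x$, then the sigmoid $\sigma(t) = 1/(1 + e^{-t})$:

$$
\begin{aligned}
w \cdot x &= (1.3)(0.2) + (-0.6)(0.8) + (0.2)(0.5) = 0.26 - 0.48 + 0.10 = -0.12 \\
\sigma(-0.12) &= \frac{1}{1 + e^{0.12}} \approx \frac{1}{1 + 1.1275} \approx 0.4700
\end{aligned}
$$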

Given this function, we can represent a neuron simply as a vector of weights whose
length is one more than the number of inputs to that neuron (because of the bias
weight). Then we can represent a neural network as a list of (noninput) layers, where
each layer is just a list of the neurons in that layer.

That is, we’ll represent a neural network as a list (layers) of lists (neurons) of vectors
(weights).
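As a concrete example of this list-of-lists-of-vectors layout, the sketch below builds a hypothetical network with 2 inputs, a hidden layer of 3 neurons, and 1 output neuron, then checks the rule that each neuron has one weight per input from the previous layer plus one bias weight. The weight values and the helper `check_shapes` are made up for illustration.

```python
from typing import List

Vector = List[float]

# Hypothetical weights: 2 inputs -> 3 hidden neurons -> 1 output neuron.
network: List[List[Vector]] = [
    # hidden layer: 3 neurons, each with 2 input weights + 1 bias weight
    [[0.5, -0.2, 0.1],
     [0.3, 0.8, -0.5],
     [-0.7, 0.4, 0.2]],
    # output layer: 1 neuron with 3 input weights (one per hidden neuron) + 1 bias weight
    [[1.0, -1.0, 0.5, 0.0]],
]

def check_shapes(neural_network: List[List[Vector]], num_inputs: int) -> List[int]:
    """Check that every neuron has (inputs to its layer + 1) weights; return layer sizes."""
    sizes = []
    fan_in = num_inputs
    for layer in neural_network:
        for neuron in layer:
            if len(neuron) != fan_in + 1:
                raise ValueError(f"expected {fan_in + 1} weights, got {len(neuron)}")
        sizes.append(len(layer))
        fan_in = len(layer)  # this layer's outputs feed the next layer
    return sizes

print(check_shapes(network, num_inputs=2))  # → [3, 1]
```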

Given such a representation, using the neural network is quite simple:

```python
def feed_forward(neural_network: List[List[Vector]],
                 input_vector: Vector) -> List[Vector]:
    """
    Feeds the input vector through the neural network.
    Returns the outputs of all layers (not just the last one).
    """
    outputs: List[Vector] = []

    for layer in neural_network:
        input_with_bias = input_vector + [1]              # Add a constant.
        output = [neuron_output(neuron, input_with_bias)  # Compute the output
                  for neuron in layer]                    # for each neuron.
        outputs.append(output)                            # Add to results.

        # Then the input to the next layer is the output of this one
        input_vector = output

    return outputs
```
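Because each layer is a list of equal-length weight vectors, it can also be viewed as a matrix with one row per neuron, so a whole layer's outputs become a single matrix-vector product followed by an elementwise sigmoid. The sketch below is an alternative formulation under that assumption, not a replacement for `feed_forward`; `feed_forward_matrix` and the 2-2-1 network are hypothetical.

```python
import numpy as np

def feed_forward_matrix(neural_network, input_vector):
    """Same computation as feed_forward, but one matrix-vector product per layer."""
    outputs = []
    activation = np.asarray(input_vector, dtype=float)
    for layer in neural_network:
        weights = np.asarray(layer, dtype=float)   # shape: (neurons, inputs + 1)
        with_bias = np.append(activation, 1.0)     # add the constant bias input
        activation = 1 / (1 + np.exp(-(weights @ with_bias)))  # elementwise sigmoid
        outputs.append(activation)
    return outputs

# Hypothetical 2-2-1 network used only to exercise the function.
network = [[[0.5, -0.2, 0.1],
            [0.3, 0.8, -0.5]],
           [[1.0, -1.0, 0.0]]]

for layer_output in feed_forward_matrix(network, [1.0, 0.0]):
    print(layer_output)
```

The main behavioral difference is that each layer's output is a NumPy array rather than a list; for networks as small as the ones in this chapter, the list-based version is perfectly adequate.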

Now we can build the XOR gate that we couldn’t build with a single perceptron. Here is one design:

```python
xor_network = [# hidden layer
               [[20, 20, -30],      # 'and' neuron
                [20, 20, -10]],     # 'or'  neuron
               # output layer
               [[-60, 60, -30]]]    # 'or' AND NOT 'and' (2nd input AND NOT 1st input) neuron
```

```python
# feed_forward returns the outputs of all layers, so the [-1] gets the
# final output, and the [0] gets the value out of the resulting vector
print(f"1 XOR 1 = {feed_forward(xor_network, [1, 1])[-1][0]}")
print(f"0 XOR 1 = {feed_forward(xor_network, [0, 1])[-1][0]}")
print(f"1 XOR 0 = {feed_forward(xor_network, [1, 0])[-1][0]}")
print(f"0 XOR 0 = {feed_forward(xor_network, [0, 0])[-1][0]}")
```

```
1 XOR 1 = 9.383146683006828e-14
0 XOR 1 = 0.9999999999999059
1 XOR 0 = 0.9999999999999059
0 XOR 0 = 9.38314668300676e-14
```


![XOR](xor.jpg)

One suggestive way of thinking about this is that the hidden layer is computing features
of the input data (in this case “and” and “or”) and the output layer is combining
those features in a way that generates the desired output.
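To see those features, we can print the hidden-layer outputs alongside the final output for each input pair. The sketch below re-declares minimal versions of `sigmoid`, `feed_forward`, and `xor_network` (using a plain-Python dot product instead of `np.dot`) so that it runs on its own; rounding to two decimals is only for readability.

```python
import math
from typing import List

Vector = List[float]

def sigmoid(t: float) -> float:
    return 1 / (1 + math.exp(-t))

def feed_forward(neural_network: List[List[Vector]], input_vector: Vector) -> List[Vector]:
    outputs = []
    for layer in neural_network:
        input_with_bias = input_vector + [1]  # constant bias input
        output = [sigmoid(sum(w * x for w, x in zip(neuron, input_with_bias)))
                  for neuron in layer]
        outputs.append(output)
        input_vector = output
    return outputs

xor_network = [[[20, 20, -30],    # 'and' neuron
                [20, 20, -10]],   # 'or' neuron
               [[-60, 60, -30]]]  # 'or' AND NOT 'and' neuron

for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    hidden, output = feed_forward(xor_network, x)
    print(x, "and:", round(hidden[0], 2), "or:", round(hidden[1], 2),
          "xor:", round(output[0], 2))
```

The output neuron fires only when the 'or' feature is on and the 'and' feature is off, which is precisely XOR.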