#### An *artificial neural network* is a predictive model motivated by the way the brain operates. Think of the brain as a collection of neurons wired together. Each neuron looks at the outputs of the other neurons that feed into it, does a calculation, and then either fires (if the calculation exceeds some threshold) or doesn't (if it doesn't).
#### Accordingly, artificial neural networks consist of artificial neurons, which perform similar calculations over their inputs. Neural networks can solve a wide variety of problems like handwriting recognition and face detection, and they are used heavily in deep learning, one of the trendiest subfields of data science. However, most neural networks are "black boxes": inspecting their details doesn't give you much understanding of *how* they're solving a problem. Large neural networks can also be difficult to train. For most problems you'll encounter as a budding data scientist, they're probably not the right choice.

# Perceptrons
#### Pretty much the simplest neural network is the *perceptron*, which approximates a single neuron with *n* binary inputs. It computes a weighted sum of its inputs and "fires" if that weighted sum is 0 or greater:

In [1]:
from ml.linear_algebra import Vector, dot

def step_function(x: float) -> float:
    return 1.0 if x >= 0 else 0.0

def perceptron_output(weights: Vector, bias: float, x: Vector) -> float:
    """Returns 1 if the perceptron 'fires', 0 if not."""
    calculation = dot(weights, x) + bias
    return step_function(calculation)

### With properly chosen weights, perceptrons can solve a number of simple problems. For example, we can create an *AND gate* (which returns 1 if both of its inputs are 1 and returns 0 if either input is 0) with:

In [2]:
and_weights = [2., 2]
and_bias = -3

assert perceptron_output(and_weights, and_bias, [1, 1]) == 1
assert perceptron_output(and_weights, and_bias, [0, 1]) == 0
assert perceptron_output(and_weights, and_bias, [1, 0]) == 0
assert perceptron_output(and_weights, and_bias, [0, 0]) == 0
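
#### To see why these weights work, it can help to look at the weighted sum before the step function is applied. The sketch below is only an illustration: `weighted_sum` and `and_sums` are hypothetical helper names, not part of the original code. Only the input `[1, 1]` pushes the sum to 0 or above, so only that input fires.

```python
from typing import Dict, List, Tuple

and_weights = [2.0, 2.0]
and_bias = -3.0

def weighted_sum(weights: List[float], bias: float, x: List[float]) -> float:
    """Hypothetical helper: the value the perceptron passes to step_function."""
    return sum(w * x_i for w, x_i in zip(weights, x)) + bias

# Weighted sum for every possible pair of binary inputs
and_sums: Dict[Tuple[int, int], float] = {
    (a, b): weighted_sum(and_weights, and_bias, [a, b])
    for a in (0, 1)
    for b in (0, 1)
}

for inputs, total in and_sums.items():
    fires = total >= 0   # same threshold that step_function uses
    print(inputs, total, fires)
```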


### Using similar reasoning, we could build an *OR gate* with:

In [6]:
or_weights = [2., 2]
or_bias = -1.

assert perceptron_output(or_weights, or_bias, [1, 1]) == 1
assert perceptron_output(or_weights, or_bias, [0, 1]) == 1
assert perceptron_output(or_weights, or_bias, [1, 0]) == 1
assert perceptron_output(or_weights, or_bias, [0, 0]) == 0

### We can also create a *NOT gate*. However, there are some problems that simply can't be solved by a single perceptron. For example, no matter how hard you try, you cannot use a perceptron to build an *XOR gate* that outputs 1 if exactly one of its inputs is 1 and 0 otherwise. This is where we start needing more complicated neural networks. Like real neurons, artificial neurons start getting more interesting when you start connecting them together.
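
#### As a minimal sketch of both claims, the block below builds a NOT gate (one input, a negative weight, and a positive bias) and then searches a small grid of weights and biases for an XOR perceptron. The specific `not_weights` and `not_bias` values, the grid, and the names `xor_table` and `xor_solutions` are assumptions chosen for illustration. The grid search is a demonstration, not a proof, but it finds nothing, as expected.

```python
from itertools import product
from typing import List

Vector = List[float]

def dot(v: Vector, w: Vector) -> float:
    """Stand-in for ml.linear_algebra.dot so the block is self-contained."""
    return sum(v_i * w_i for v_i, w_i in zip(v, w))

def step_function(x: float) -> float:
    return 1.0 if x >= 0 else 0.0

def perceptron_output(weights: Vector, bias: float, x: Vector) -> float:
    return step_function(dot(weights, x) + bias)

# NOT gate: input 0 gives 0 + 1 >= 0 (fires), input 1 gives -2 + 1 < 0 (doesn't)
not_weights = [-2.0]
not_bias = 1.0

assert perceptron_output(not_weights, not_bias, [0]) == 1
assert perceptron_output(not_weights, not_bias, [1]) == 0

# XOR truth table: output is 1 if exactly one input is 1
xor_table = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

# Candidate values -5.0, -4.5, ..., 5.0 for each weight and for the bias
grid = [i / 2 for i in range(-10, 11)]

xor_solutions = [
    (w1, w2, b)
    for w1, w2, b in product(grid, repeat=3)
    if all(perceptron_output([w1, w2], b, list(x)) == y
           for x, y in xor_table.items())
]

print(len(xor_solutions))  # no candidate on the grid reproduces XOR
```

#### The search comes up empty for a reason that holds beyond this grid: XOR needs `bias < 0`, `w1 + bias >= 0`, and `w2 + bias >= 0`, which together force `w1 + w2 + bias > 0`, so the perceptron would also fire on `[1, 1]`.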

#### As with the perceptron, for each neuron we'll sum up the products of its inputs and its weights. But here, rather than outputting the `step_function` applied to that sum, we'll output a smooth approximation of it. Here we'll use the `sigmoid` function:

In [7]:
import math

def sigmoid(t: float) -> float:
    return 1 / (1 + math.exp(-t))

### Why use `sigmoid` instead of the simpler `step_function`? In order to train a neural network, we need to use calculus, and in order to use calculus, we need *smooth* functions. `step_function` isn't even continuous, and `sigmoid` is a good smooth approximation of it.
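
#### A small sketch of what "smooth" buys us: `sigmoid` has a derivative everywhere, given by the standard identity `sigmoid(t) * (1 - sigmoid(t))`, while `step_function` jumps from 0 to 1 at the origin. The helper names `sigmoid_derivative` and `numerical_derivative` are assumptions introduced here for illustration and are not part of the original code.

```python
import math
from typing import Callable

def step_function(x: float) -> float:
    return 1.0 if x >= 0 else 0.0

def sigmoid(t: float) -> float:
    return 1 / (1 + math.exp(-t))

def sigmoid_derivative(t: float) -> float:
    """Closed form: sigmoid'(t) = sigmoid(t) * (1 - sigmoid(t))."""
    s = sigmoid(t)
    return s * (1 - s)

def numerical_derivative(f: Callable[[float], float], t: float, h: float = 1e-6) -> float:
    """Central-difference estimate of f'(t)."""
    return (f(t + h) - f(t - h)) / (2 * h)

# The closed form agrees with the numerical estimate at a few sample points
for t in [-2.0, 0.0, 2.0]:
    assert abs(sigmoid_derivative(t) - numerical_derivative(sigmoid, t)) < 1e-6

# Across a tiny interval around 0, step_function jumps by a full 1.0,
# while sigmoid barely moves
gap_step = step_function(1e-9) - step_function(-1e-9)
gap_sigmoid = sigmoid(1e-9) - sigmoid(-1e-9)
print(gap_step, gap_sigmoid)
```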

In [8]:
def neuron_output(weights: Vector, inputs: Vector) -> float:
    # weights includes the bias term, inputs includes a 1
    return sigmoid(dot(weights, inputs))

In [9]:
from typing import List

def feed_forward(neural_network: List[List[Vector]], input_vector: Vector) -> List[Vector]:
    """
    Feeds the input vector through the neural network.
    Returns the outputs of all layers (not just the last one).
    """
    outputs: List[Vector] = []

    for layer in neural_network:
        input_with_bias = input_vector + [1]
        output = [neuron_output(neuron, input_with_bias) for neuron in layer]
        outputs.append(output)

        # Then the input to the next layer is the output of this one
        input_vector = output
    return outputs

### Now it's easy to build the XOR gate that we couldn't build with a single perceptron. We just need to scale the weights up so that the outputs of `neuron_output` are either really close to 0 or really close to 1:

In [11]:
xor_network = [ # hidden layer
                [[20., 20, -30], # 'and' neuron
                 [20., 20, -10]], # 'or' neuron
                 # output layer
                [[-60., 60, -30]]] # '2nd input but not 1st input' neuron

# feed_forward returns the outputs of all the layers, so the [-1] gets the
# final output, and the [0] gets the value out of the resulting vector
print(feed_forward(xor_network, [0, 0])[-1][0])
print(feed_forward(xor_network, [1, 0])[-1][0])
print(feed_forward(xor_network, [0, 1])[-1][0])
print(feed_forward(xor_network, [1, 1])[-1][0])

9.38314668300676e-14
0.9999999999999059
0.9999999999999059
9.383146683006828e-14


#### For a given input, the hidden layer produces a two-dimensional vector consisting of the "and" of the two input values and the "or" of the two input values.

#### And the output layer takes a two-dimensional vector and computes "second element but not first element". The result is a network that performs "or, but not and", which is precisely XOR.
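
#### The layer-by-layer description can be checked with a short sketch. It repeats the definitions from earlier cells so that it runs on its own (with a local `dot` standing in for `ml.linear_algebra.dot`), then rounds each layer's outputs and compares them against Python's `and`, `or`, and `^` operators. The rounding step is an assumption made for illustration, since the network's outputs are only very close to 0 or 1, never exactly equal.

```python
import math
from typing import List

Vector = List[float]

def dot(v: Vector, w: Vector) -> float:
    """Stand-in for ml.linear_algebra.dot so the block is self-contained."""
    return sum(v_i * w_i for v_i, w_i in zip(v, w))

def sigmoid(t: float) -> float:
    return 1 / (1 + math.exp(-t))

def neuron_output(weights: Vector, inputs: Vector) -> float:
    # weights includes the bias term, inputs includes a 1
    return sigmoid(dot(weights, inputs))

def feed_forward(neural_network: List[List[Vector]],
                 input_vector: Vector) -> List[Vector]:
    outputs: List[Vector] = []
    for layer in neural_network:
        input_with_bias = input_vector + [1]
        output = [neuron_output(neuron, input_with_bias) for neuron in layer]
        outputs.append(output)
        input_vector = output
    return outputs

xor_network = [[[20., 20, -30],     # 'and' neuron
                [20., 20, -10]],    # 'or' neuron
               [[-60., 60, -30]]]   # '2nd input but not 1st input' neuron

for x in (0, 1):
    for y in (0, 1):
        hidden, output = feed_forward(xor_network, [x, y])
        assert round(hidden[0]) == (x and y)   # first hidden neuron acts as AND
        assert round(hidden[1]) == (x or y)    # second hidden neuron acts as OR
        assert round(output[0]) == (x ^ y)     # final output acts as XOR
```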