# Chapter 3: Introduction to Neural Prediction
## Forward Propagation

### TOC
- A simple network making a prediction
- What neural nets are and what they do
- Making predictions with multiple inputs
- Making predictions with multiple outputs
- Making predictions with multiple inputs and outputs
- Predicting on predictions

## Step 1: Predict

In the previous chapter, we learned the paradigm: 'Predict, Compare, Learn'. This chapter is all about the first part: predict.

We also learned that the predict step looks a lot like: 

> Data --> Machine --> Prediction

We'll start with a single input, predicting from one datapoint at a time, like so:

> \# of toes --> Machine --> Prediction (98%)

Later, we'll explore how our predictions are affected by the number of datapoints we pass in at a time. For example, predicting whether a picture shows a cat from one pixel at a time won't be accurate at all. You'd need all the pixels in order to make a real prediction! A good general rule of thumb is 'Always present enough information to the network', where 'enough information' is defined loosely as how much a human would need to make the same prediction.

We can only create our network once we understand the 'shape' of our input and output datasets. Shape means the number of columns and the number of datapoints we are processing at once. For now, our input is the single datapoint _# of toes_ and our output is the single prediction _likelihood the team will win_. Since we have only one input and one output, we need only one knob (knobs are also called weights). Our network looks like:

> [input] -- # of toes --> weight 1 --> win? -- [output]

## A Simple NN making a prediction

Starting with the simplest NN possible (our NN from above):

In [5]:
weight = 0.1
def neural_network(input, weight):
    prediction = input * weight
    return prediction

# Now let's feed in an input point: 
number_of_toes = [8.5, 9.5, 10, 9]
input = number_of_toes[0]
pred = neural_network(input, weight)
print pred

0.85


Wooo first NN. 

A few important questions answered by the code above:
- What is input data?
    - A number that we recorded in the real world somewhere, like temperature, batting average, or stock price.
- What is a prediction?
    - What the NN tells us given our input data. For example, given the temperature, an X% chance that people wear sweaters today.
- Is this prediction right?
    - Not always. The network sometimes makes mistakes; making it more accurate is a matter of adjusting the weights.
- How does this network learn?
    - Trial and error. It makes a prediction, then adjusts its weights based on the outcome!
- Does the NN remember the last set of inputs when it gets the next one?
    - No, it forgets them. That is not true of every network, but for now ours has no memory and no backpropagation.
- What is a weight?
    - Think of a weight as a measure of sensitivity to the input coming in on that channel.
    - Or as a volume knob: a bigger weight means the input is louder and has more influence on the final prediction.
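
To make the volume-knob picture concrete, here is a minimal sketch that feeds the same input through the single-weight network with several different weights. The weight values other than 0.1 are made up for illustration and do not come from the chapter.

```python
def neural_network(input, weight):
    # Same single-weight network as above: prediction = input * weight
    return input * weight

number_of_toes = 8.5

# Hypothetical weights, chosen only to show the effect of turning the knob
for weight in [0.0, 0.1, 0.5, 1.0]:
    pred = neural_network(number_of_toes, weight)
    # round() hides floating-point noise in the printed value
    print("weight = %s -> prediction = %s" % (weight, round(pred, 2)))
```

A weight of 0 silences the input entirely, and the prediction grows in proportion to the weight.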

## Making a Prediction with multiple inputs
> NNs can combine intelligence from multiple datapoints. 

One datapoint at a time sometimes doesn't make for a very accurate prediction. In that case we can build a new network (or graph), this time with 3 inputs, where each edge has its own weight.

```
          # toes - - - (0.1) - 
        /                      \
      /                         \
  Input - Win / Loss - - (.2) - - Win? --> (Some %)
      \                         /
        \                      /
          # fans - - - (0) - -
```

Now let's transcribe this in code: 

In [3]:
weights = [0.1, 0.2, 0]
def neural_network(input, weights):
    pred = w_sum(input, weights)
    return pred

def w_sum(a,b):
    assert(len(a) == len(b))
    output = 0
    for i in range(len(a)):
        output += (a[i] * b[i])
    return output

# Now if we pass in a dataset with stats from the last 4 games
# toes = current # of toes
# wlrec = games won (percent)
# nfans = fan count (in millions)

toes = [8.5, 9.5, 9.9, 9.0]
wlrec = [0.65, 0.8, 0.8, 0.9]
nfans = [1.2, 1.3, 0.5, 1.0]

# let's only deal with the first game's data
input = [toes[0], wlrec[0], nfans[0]]
pred = neural_network(input, weights)
print pred

0.98


## Multiple Inputs - What does this Neural Net do? 

With the neural net defined above, we can accept multiple inputs per prediction, so the network can make more informed decisions. However, the fundamental logic of the network has not changed: we still multiply each input by its weight and sum the results.
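
As a worked example, the first game's prediction from the code above breaks down as follows, using the inputs 8.5, 0.65 and 1.2 and the weights 0.1, 0.2 and 0:

$$
\text{pred} = (8.5 \times 0.1) + (0.65 \times 0.2) + (1.2 \times 0) = 0.85 + 0.13 + 0 = 0.98
$$

The fan count contributes nothing to this prediction because its weight is 0.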

Our input has now become a **vector**. A vector is just a list of numbers.

Vectors are incredibly useful when performing operations on groups of numbers. In our case, we want to perform a weighted sum: we multiply each input by the weight at the same position, then add up the products. An operation that pairs up values by position, like the multiplication here, is called an _element-wise_ operation.

#### Challenge: Vector Math: 
Write 4 functions: elementwise_mult, elementwise_add, vector_sum and vector_average.

In [6]:
def elementwise_mult(a, b):
    assert(len(a) == len(b))
    output = [a[i] * b[i] for i in range(len(a))]
    return output

def elementwise_add(a, b):
    assert(len(a) == len(b))
    output = [a[i] + b[i] for i in range(len(a))]
    return output

def vector_sum(a):
    # built-in sum() adds the elements left to right,
    # like reduce with x + y, and needs no import on Python 3
    return sum(a)

def vector_average(a):
    return float(vector_sum(a)) / len(a)

a = [1,2,3]
b = [3,4,5]

print "Element-mult %s" % elementwise_mult(a,b)
print "Element-add %s" % elementwise_add(a,b)
print "Vector add %s" % vector_sum(a)
print "Vector average %s" % vector_average(a)

# perform a dot product by using element mult + vector sum
print "Dot prod: %s" % (vector_sum(elementwise_mult(a,b)))

Element-mult [3, 8, 15]
Element-add [4, 6, 8]
Vector add 6
Vector average 2.0
Dot prod: 26
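
For comparison, NumPy provides these operations directly. The sketch below assumes NumPy is installed and reuses the same `a` and `b`; the variable names are chosen here for illustration, and this is one way to write the challenge, not the only one.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([3, 4, 5])

product = a * b        # multiplies matching positions
added = a + b          # adds matching positions
total = a.sum()        # adds up all elements of a
average = a.mean()     # total divided by the number of elements
dot_prod = a.dot(b)    # element-wise multiply, then sum

print(product)
print(added)
print(total)
print(average)
print(dot_prod)
```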


The dot product can also be thought of as a measure of _similarity_ between two vectors. Among vectors of the same magnitude, the highest weighted sum (dot product) a vector can have is with itself. Some have also equated the dot product to an 'AND' statement of sorts. For binary arrays a = [0,1] and b = [1,0], computing a · b is like asking whether there is a value at a[0] AND b[0], or at a[1] AND b[1]; here neither is the case, so the result is 0.
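
A small sketch of the similarity idea, using made-up binary vectors. All three vectors contain the same number of ones, so their dot products can be compared directly.

```python
def w_sum(a, b):
    # Weighted sum (dot product) of two equal-length lists
    assert len(a) == len(b)
    return sum(x * y for x, y in zip(a, b))

# Hypothetical binary vectors, each with exactly two ones
a = [0, 1, 0, 1]
b = [1, 0, 1, 0]
c = [0, 1, 1, 0]

print(w_sum(a, a))  # a against itself: every one lines up
print(w_sum(a, c))  # a against c: only position 1 lines up
print(w_sum(a, b))  # a against b: nothing lines up
```

The score drops as fewer positions are set in both vectors, which is the 'AND' reading of the dot product.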

Neural nets can also use fractional weights to represent partial AND-ing, which is very useful when modeling probabilities in neural networks.

Finally, there are also negative weights, which imply a logical NOT operator. 

With this kind of logical mapping to our weights, we can almost develop a crude way to read these arrays. For example: 
```
weights = [1, 0, 1] => if input[0] OR input[2]
weights = [0, 0, 1] => if input[2]
weights = [1, 0, -1] => if input[0] OR NOT input[2]
weights = [-1, 0, -1] => if NOT input[0] OR NOT input[2]
weights = [0.5, 0, 1] => if BIG input[0] OR input[2]
```

For the last example, the smaller weight means that the input at position 0 needs to be larger to have an effect on the score at the end. 
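
These readings can be checked by scoring a few binary inputs against some of the weight vectors. This is only a rough sketch: the inputs are invented for illustration, and the raw scores are printed without any thresholding.

```python
def w_sum(a, b):
    # Weighted sum (dot product) of two equal-length lists
    assert len(a) == len(b)
    return sum(x * y for x, y in zip(a, b))

# Hypothetical binary inputs: 1 means "present", 0 means "absent"
inputs = [[1, 0, 0], [0, 0, 1], [1, 0, 1], [0, 1, 0]]

for weights in [[1, 0, 1], [1, 0, -1], [0.5, 0, 1]]:
    scores = [w_sum(x, weights) for x in inputs]
    print("%s -> %s" % (weights, scores))
```

With `[1, 0, -1]`, the input `[1, 0, 1]` scores 0 because the negative weight on position 2 cancels the positive weight on position 0. With `[0.5, 0, 1]`, position 0 alone contributes only half as much as position 2 alone. The input `[0, 1, 0]` scores 0 everywhere, since position 1 always has a weight of 0.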

So, given these intuitions, we can posit that when our NNet makes a prediction, it is really telling us how similar our inputs are to our weights. But the weights alone do not decide the final score: it is the combination of a weight's value AND the input's value that determines the impact on the final score.

## Multiple Inputs - Complete Runnable Code

Instead of the clunky code we wrote at the beginning, there is a much simpler way to write the same network with numpy!

In [7]:
import numpy as np

weights = np.array([0.1, 0.2, 0])
def neural_network(input, weights):
    return input.dot(weights)

toes = np.array([8.5, 9.5, 9.9, 9.0]) 
wlrec = np.array([0.65, 0.8, 0.8, 0.9]) 
nfans = np.array([1.2, 1.3, 0.5, 1.0])

input = np.array([toes[0], wlrec[0], nfans[0]])
print neural_network(input, weights)

0.98
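
As a sketch of where this leads, the same weights can score all four games in one call by stacking the three arrays into a matrix with one row per game. The names `games` and `predict_all` are introduced here for illustration and are not part of the chapter's code.

```python
import numpy as np

weights = np.array([0.1, 0.2, 0])

toes = np.array([8.5, 9.5, 9.9, 9.0])
wlrec = np.array([0.65, 0.8, 0.8, 0.9])
nfans = np.array([1.2, 1.3, 0.5, 1.0])

# One row per game, one column per input: shape (4, 3)
games = np.column_stack([toes, wlrec, nfans])

def predict_all(games, weights):
    # Matrix-vector product: one weighted sum per row (game)
    return games.dot(weights)

preds = predict_all(games, weights)
print(games.shape)
print(preds)
```

The first entry of `preds` is the same weighted sum as the single-game prediction above, up to floating-point rounding.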


## Making a Prediction with Multiple Outputs
> NNets can also make multiple predictions using only a single input

A single input -> multiple output NNet is simple as well. 
```
           Hurt? 
         /
        /
input - -  Win? 
        \
         \
           Sad? 
```

This is not a 'dense' neural net. Notice that each prediction is completely separate from the other two. This makes this network easy to implement. 

In [9]:
def neural_network(input, weights):
    return list(map(lambda x: x * input, weights))

weights = [0.3, 0.2, 0.9]
wlrec = [0.9, 0.8, 0.8, 0.9]
print neural_network(wlrec[0], weights)

[0.27, 0.18000000000000002, 0.81]
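
The same single-input, multiple-output prediction can also be sketched with numpy, where multiplying a number by an array scales every weight at once. The order of the outputs (hurt, win, sad) is assumed from the diagram above.

```python
import numpy as np

# Assumed order of outputs, following the diagram: hurt?, win?, sad?
weights = np.array([0.3, 0.2, 0.9])
wlrec = np.array([0.9, 0.8, 0.8, 0.9])

def neural_network(input, weights):
    # Scalar times vector: every weight is multiplied by the same input
    return input * weights

pred = neural_network(wlrec[0], weights)
print(pred)
```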


## Predicting with Multiple Inputs AND Outputs (Dense Graph)
> NNets can predict multiple outputs given multiple inputs

Finally, we can combine what we've learned to create a graph / net where each input node is connected to each output prediction, each connection with its own weight. The steps remain mostly the same:
1. Insert one datapoint
2. For each output, perform a weighted sum of the inputs
3. Deposit the predictions

How does it work? It performs 3 independent weighted sums of the input to make 3 predictions.
It is easier to think of this as 3 weights coming into each output node, with the three single-output graphs overlaid on each other. Check out the code below:

In [10]:
def neural_network(input, weights):
    return vect_mat_mul(input, weights)

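def vect_mat_mul(vect, matrix):
    # One output per row of weights in the matrix.
    # Relies on w_sum from the multiple-inputs example earlier.
    output = [0] * len(matrix)
    for i in range(len(matrix)):
        output[i] = w_sum(vect, matrix[i])
    return output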

There are more variables flying around here. We are choosing to think of this network as a series of weighted sums. In this code we created a helper function, vect_mat_mul, that iterates through each vector of 'weights', and makes a prediction using the w_sum (dot product) function. 

A few new terms are introduced here as well. First, a list of vectors is called a **matrix**. Second, of the many functions that operate on matrices, the one used here is called **vector-matrix multiplication**: it is essentially our series of weighted sums, taking a vector and performing a dot product with every vector (row) in the matrix. As usual, there are special numpy functions that can help us out here.
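
Here is a minimal, self-contained sketch comparing the list-based approach with numpy's `dot`. The weight values are borrowed from the `ih_wgt` rows used later in this chapter purely as example numbers; which output each row stands for is not specified here.

```python
import numpy as np

def w_sum(a, b):
    # Weighted sum (dot product) of two equal-length lists
    assert len(a) == len(b)
    return sum(x * y for x, y in zip(a, b))

def vect_mat_mul(vect, matrix):
    # One weighted sum per row of the matrix
    return [w_sum(vect, row) for row in matrix]

# Example weights: one row of 3 weights per output
weights = [
    [0.1, 0.2, -0.1],
    [-0.1, 0.1, 0.9],
    [0.1, 0.4, 0.1],
]
input = [8.5, 0.65, 1.2]  # toes, wlrec, nfans for the first game

pred_lists = vect_mat_mul(input, weights)
pred_numpy = np.array(weights).dot(np.array(input))

print(pred_lists)
print(pred_numpy)
```

Both approaches compute the same three weighted sums; numpy does the looping for us.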

## Predicting on Predictions

**Neural networks can be stacked!** 
This is what 'deep' learning is: you stack multiple layers into one network, taking the output of one network and feeding it into the next as input. Specifics are coming later; for now it's enough to know that it's possible.

In [1]:
import numpy as np

ih_wgt = np.array([
    [0.1, 0.2, -0.1],
    [-0.1, 0.1, 0.9],
    [0.1, 0.4, 0.1],
]).T

hp_wgt = np.array([
    [0.3, 1.1, -0.3],
    [0.1, 0.2, 0.0],
    [0.0, 1.3, -0.1],
]).T

weights = [ih_wgt, hp_wgt]

def neural_network(input, weights):
    hid = input.dot(weights[0])
    pred = hid.dot(weights[1])
    return pred

toes = np.array([8.5, 9.5, 9.9, 9.0])
wlrec = np.array([0.65, 0.8, 0.8, 0.9])
nfans = np.array([1.2, 1.3, 0.5, 1.0])

input = np.array([toes[0], wlrec[0], nfans[0]])
pred = neural_network(input, weights)
print pred

[ 0.2135  0.145   0.2605]
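
To see the stacking step by step, the sketch below reruns the same two weight matrices on the first game's data but keeps the intermediate values, which are the predictions of the first network that become the inputs of the second. It assumes the same weights and data as the cell above.

```python
import numpy as np

ih_wgt = np.array([
    [0.1, 0.2, -0.1],
    [-0.1, 0.1, 0.9],
    [0.1, 0.4, 0.1],
]).T

hp_wgt = np.array([
    [0.3, 1.1, -0.3],
    [0.1, 0.2, 0.0],
    [0.0, 1.3, -0.1],
]).T

# First game's data: toes, wlrec, nfans
input = np.array([8.5, 0.65, 1.2])

hid = input.dot(ih_wgt)   # first network: inputs -> intermediate values
pred = hid.dot(hp_wgt)    # second network: intermediate values -> predictions

print(hid)
print(pred)
```

The final `pred` matches the output above; `hid` is simply the result of three weighted sums that the second set of weights then treats as its input.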
