# CSS

In [1]:
from IPython.display import HTML
style = """
<style>
.expo {
  line-height: 150%;
}

.visual {
  width: 400px;
}

</style>
"""
HTML(style)

# Let's make a prediction using a neural net

Our goal will be to:

* Start with our three original features.
* Transform them into four "intermediate features" using a logistic-regression-like transformation.
* Take these four "intermediate features" and use _them_ to predict our final output.

## Step 1: feeding features to the "intermediate" or "hidden" layer

How do we transform our original three features into an intermediate, or hidden, layer? Let's call our intermediate features $a_1$, $a_2$, $a_3$, and $a_4$.

We want $a_1$, for example, to be a linear combination of $x_1$, $x_2$, and $x_3$. That is, we want some weights $v_{11}$, $v_{21}$, and $v_{31}$ so that:

$$ a_1 = x_1 * v_{11} + x_2 * v_{21} + x_3 * v_{31} $$

and similarly for $a_2$, $a_3$, and $a_4$.

A way to concisely express this is to define your features as a vector

$$ X = \begin{bmatrix}x_1 & x_2 & x_3\end{bmatrix} $$

This should be intuitive, since $X$ is already a row in your data! 

Then, multiplying this vector by a matrix $V$:

$$ V = \begin{bmatrix}v_{11} & v_{12} & v_{13} & v_{14} \\
                      v_{21} & v_{22} & v_{23} & v_{24} \\
                      v_{31} & v_{32} & v_{33} & v_{34}
                      \end{bmatrix} $$

gives us what we want, since $A = X * V$ is equivalent to:

$$ a_1 = x_1 * v_{11} + x_2 * v_{21} + x_3 * v_{31} $$
$$ a_2 = x_1 * v_{12} + x_2 * v_{22} + x_3 * v_{32} $$
$$ a_3 = x_1 * v_{13} + x_2 * v_{23} + x_3 * v_{33} $$
$$ a_4 = x_1 * v_{14} + x_2 * v_{24} + x_3 * v_{34} $$

which gives us our four intermediate features.
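
As a quick check on the claim above, here is a minimal sketch comparing the matrix product with the four sums written out by hand. The values of `x` and `V` are hypothetical, chosen only so the arithmetic is easy to follow; they are not the notebook's data.

```python
import numpy as np

# Hypothetical values for illustration only (not the notebook's data).
x = np.array([[1.0, 2.0, 3.0]])                 # 1 x 3 row of features
V = np.arange(12, dtype=float).reshape(3, 4)    # 3 x 4 weight matrix

# Step 1 as a single matrix product.
A = np.dot(x, V)                                # 1 x 4 intermediate features

# The same thing written out: a_j = x_1 * v_1j + x_2 * v_2j + x_3 * v_3j
A_by_hand = np.array(
    [[sum(x[0, i] * V[i, j] for i in range(3)) for j in range(4)]]
)

print(A.shape)                      # (1, 4)
print(np.allclose(A, A_by_hand))    # True
```

Each column of `V` holds the weights for one intermediate feature, which is why the second index of $v_{ij}$ matches the index of $a_j$.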

Let's code this up.

In [6]:
x = np.array(X[0], ndmin=2)
array_print(x)

The array:
 [[1 0 0]]
The dimensions are 1 row and 3 columns


In [7]:
V = np.random.randn(3, 4)
array_print(V)

The array:
 [[-0.69  1.1  -0.99  0.52]
 [-0.75  3.41  0.02 -0.89]
 [-0.48 -0.56  0.89  1.2 ]]
The dimensions are 3 rows and 4 columns


In [8]:
A = np.dot(x, V)
array_print(A)

The array:
 [[-0.69  1.1  -0.99  0.52]]
The dimensions are 1 row and 4 columns


## Where are we?

<div class="visual">
<img src='img/neural_net_4_first_layer.png'>
</div>

## Step 2: feeding these intermediate features through an "activation function"

We're going to use the sigmoid function, a classic, easy-to-understand activation function, though one that is not often used in cutting-edge applications:

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$
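
To get a feel for this function, here is a small sketch that evaluates the formula on a few hypothetical inputs. It checks three properties that follow directly from the formula: the output has the same shape as the input, $\sigma(0) = 0.5$, and every output lies strictly between 0 and 1.

```python
import numpy as np

def sigmoid(x):
    # Element-wise 1 / (1 + e^(-x)); works on scalars and NumPy arrays.
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical inputs for illustration only.
z = np.array([[-5.0, -1.0, 0.0, 1.0, 5.0]])
s = sigmoid(z)

print(s.shape)                         # (1, 5): same shape as the input
print(s[0, 2])                         # 0.5, since sigmoid(0) = 1 / (1 + 1)
print(np.all((s > 0) & (s < 1)))       # True: outputs stay between 0 and 1
print(np.all(np.diff(s[0]) > 0))       # True: larger inputs give larger outputs
```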

We apply the sigmoid element-wise to $A$:

$$ B = \sigma(A) $$

or, written out:

$$ b_1 = \sigma(a_1) $$
$$ b_2 = \sigma(a_2) $$
$$ b_3 = \sigma(a_3) $$
$$ b_4 = \sigma(a_4) $$

In [9]:
def sigmoid(x):
    return 1.0/(1.0+np.exp(-x))

In [10]:
B = sigmoid(A)
array_print(B)

The array:
 [[ 0.33  0.75  0.27  0.63]]
The dimensions are 1 row and 4 columns


## Where are we?

<div class="visual">
<img src='img/neural_net_4_first_sigmoid.png'>
</div>

## Step 3: use a linear combination of these intermediate features to get the output

We'll multiply these "sigmoided" results by another matrix $W$ to get a single output. Since we want to transform 4 features down into 1, we can use a 4 x 1 matrix:

$$ W = \begin{bmatrix}w_{11} \\
                      w_{21} \\
                      w_{31} \\
                      w_{41}
                      \end{bmatrix} $$

We want the result to be:

$$ c_1 = w_{11} * b_1 + w_{21} * b_2 + w_{31} * b_3 + w_{41} * b_4 $$

This is equivalent to writing:

$$ C = B * W $$

or:

$$ \begin{bmatrix}
c_1 \end{bmatrix} = 
\begin{bmatrix}b_1 &
                  b_2 &
                  b_3 &
                  b_4
                  \end{bmatrix} * 
\begin{bmatrix}w_{11} \\
               w_{21} \\
               w_{31} \\
               w_{41}
               \end{bmatrix} $$

So we can simply code this up as:

In [11]:
W = np.random.randn(4, 1)
array_print(W)

The array:
 [[-0.5 ]
 [ 0.65]
 [-0.41]
 [-1.03]]
The dimensions are 4 rows and 1 column


In [12]:
C = np.dot(B, W)
array_print(C)

The array:
 [[-0.44]]
The dimensions are 1 row and 1 column
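
As a sanity check on both the shapes and the arithmetic, the sketch below redoes this step using the rounded values of $B$ and $W$ printed in this section. Because the inputs are rounded to two decimals, the result only matches the notebook's output after rounding.

```python
import numpy as np

# Rounded values copied from the printed outputs in this section.
B = np.array([[0.33, 0.75, 0.27, 0.63]])            # 1 x 4
W = np.array([[-0.5], [0.65], [-0.41], [-1.03]])    # 4 x 1

# (1 x 4) times (4 x 1) gives a 1 x 1 result.
C = np.dot(B, W)

# The same sum written out: c_1 = w_11 * b_1 + w_21 * b_2 + w_31 * b_3 + w_41 * b_4
c_1 = sum(W[k, 0] * B[0, k] for k in range(4))

print(C.shape)                       # (1, 1)
print(round(float(C[0, 0]), 2))      # -0.44
print(np.isclose(C[0, 0], c_1))      # True
```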


## Where are we?

<div class="visual">
<img src='img/neural_net_4_second_layer.png'>
</div>

## Step 4: apply the sigmoid to make a final prediction

Mathematically, we want:

$$ p_1 = \sigma(c_1) $$

So we can simply code this up as:

In [13]:
P = sigmoid(C)
array_print(P)

The array:
 [[ 0.39]]
The dimensions are 1 row and 1 column
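
Chaining steps 1 through 4 together, the whole prediction can be written as a single expression, using the same symbols as in the steps above:

$$ P = \sigma(C) = \sigma(B * W) = \sigma(\sigma(X * V) * W) $$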


## Where are we?

<div class="visual">
<img src='img/neural_net_4_final_prediction.png'>
</div>

## Step 5: compute the loss

Mathematically, we'll compute the squared error loss for this single observation (the mean squared error with one sample, scaled by $\frac{1}{2}$):

$$ L = \frac{1}{2}(y - P)^2 $$

And coding this up is simply:

In [14]:
y = np.array(Y[0], ndmin=2)
L = 0.5 * (y - P) ** 2
array_print(L)

The array:
 [[ 0.18]]
The dimensions are 1 row and 1 column


## Where are we?

<div class="visual">
<img src='img/neural_net_4_loss.png'>
</div>
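
Putting steps 1 through 5 together, here is a minimal sketch of the whole forward pass as one function. The function name `forward`, the random seed, and the target value `y` are assumptions made for illustration; the notebook itself runs the steps cell by cell with its own data.

```python
import numpy as np

def sigmoid(x):
    # Element-wise 1 / (1 + e^(-x)).
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, y, V, W):
    """Hypothetical helper: one forward pass for a single observation."""
    A = np.dot(x, V)            # step 1: 1 x 3 times 3 x 4 gives 1 x 4
    B = sigmoid(A)              # step 2: element-wise activation
    C = np.dot(B, W)            # step 3: 1 x 4 times 4 x 1 gives 1 x 1
    P = sigmoid(C)              # step 4: final prediction
    L = 0.5 * (y - P) ** 2      # step 5: squared error loss, scaled by 1/2
    return P, L

rng = np.random.default_rng(0)      # fixed seed, assumed for reproducibility
x = np.array([[1, 0, 0]])           # same shape as the observation used above
y = np.array([[1]])                 # hypothetical target value
V = rng.standard_normal((3, 4))     # random initial weights, input to hidden
W = rng.standard_normal((4, 1))     # random initial weights, hidden to output

P, L = forward(x, y, V, W)
print(P.shape, L.shape)             # (1, 1) (1, 1)
```

Keeping every intermediate result two-dimensional, as the notebook does with `ndmin=2`, means each `np.dot` call lines up with the matrix dimensions written in the math.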