In [1]:
import os
setup_script = os.path.join(os.environ['ENV_JUPYTER_SETUPS_DIR'], 'setup_sci_env_basic.py')
%run $setup_script



# Neural networks

## Neuron in the brain

- core: nucleus
- input: dendrite
- output: axon

## Single neuron

- inputs: $x_{i}$
- weights/parameters: $\theta$
- output:  $h_{\theta}(x)$
- activation function: $g$

**General output:**

\begin{equation}
    h_{\theta}(x)  = g(\theta^{T}x)
\end{equation}

**Example:**

Logistic sigmoid activation function:
\begin{equation}
    h_{\theta}(x)  = \frac{1}{1 + e^{-\theta^{T}x}}
\end{equation}
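
To make the formula concrete, here is a minimal sketch of a single sigmoid neuron in plain Python. The names `sigmoid` and `neuron_output`, and the values of `theta` and `x`, are assumptions chosen only for illustration.

```python
import math

def sigmoid(z):
    """Logistic sigmoid g(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(theta, x):
    """h_theta(x) = g(theta^T x) for equal-length sequences theta and x."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return sigmoid(z)

# Hypothetical weights and inputs (not from the notes).
theta = [0.5, -1.0, 2.0]
x = [1.0, 2.0, 0.5]          # theta^T x = 0.5 - 2.0 + 1.0 = -0.5
h = neuron_output(theta, x)  # g(-0.5), a value between 0 and 0.5
```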

### Definitions

- *hidden layer*: any layer that is neither the input layer nor the output layer (an inside layer)

### Notation

- $x_{i}$: input unit, $x_{i} = a^{(1)}_{i}$
- $a^{(j)}_{i}$: activation of unit $i$ in layer $j$, i.e. the output of the neuron. The 0th unit of each layer is the bias unit, defined as $a^{(j)}_{0} = 1$; it is multiplied by the bias weight $\Theta^{(j)}_{i0}$.
- $\Theta^{(j)}$: matrix of weights controlling the function mapping from layer $j$ to layer $j+1$
- $z^{(j)}_{i} = \sum_{k} \Theta^{(j)}_{ik} a^{(j)}_{k}$: weighted input to unit $i$ of layer $j+1$, so that $a^{(j+1)}_{i} = g(z^{(j)}_{i})$

### Example of a (3,3,1) network:

#### Layer 1 (input layer)

Input units:
$x_{i} = a^{(1)}_{i}$, where $i=1,2,3$.

#### Layer 2 (hidden layer)

The output vector $\underline{a}^{(2)}$ is determined by the weights $\Theta^{(1)}_{ij}$ and the input units $a^{(1)}_{j} = x_{j}$, together with the bias unit $x_{0} = 1$:

\begin{equation}
    a^{(2)}_{1} =
    g\left(
    \Theta^{(1)}_{10}x_{0} +
    \Theta^{(1)}_{11}x_{1} +
    \Theta^{(1)}_{12} x_{2} +
    \Theta^{(1)}_{13} x_{3}
    \right)
    =
    g\left(
    z^{(1)}_{1}
    \right)\\
    a^{(2)}_{2} =
    g\left(
    \Theta^{(1)}_{20}x_{0} +
    \Theta^{(1)}_{21}x_{1} +
    \Theta^{(1)}_{22} x_{2} +
    \Theta^{(1)}_{23} x_{3}
    \right)
     =
    g\left(
    z^{(1)}_{2}
    \right)\\
    a^{(2)}_{3} =
    g\left(
    \Theta^{(1)}_{30}x_{0} +
    \Theta^{(1)}_{31}x_{1} +
    \Theta^{(1)}_{32} x_{2} +
    \Theta^{(1)}_{33} x_{3}
    \right)
     =
    g\left(
    z^{(1)}_{3}
    \right)
\end{equation}

or written in matrix form:

\begin{equation}
    \underline{a}^{(2)}
    =
    \underline{g}
    \left(
    \underline{\underline{\Theta}} \underline{x} 
    \right)
\end{equation}

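where $\underline{\underline{\Theta}} = \Theta^{(1)}$ is a $3 \times 4$ matrix, $\underline{x} = (x_{0}, x_{1}, x_{2}, x_{3})^{T}$ with $x_{0} = 1$, and $g$ is applied element-wise.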
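
The matrix form can be sketched in plain Python as follows. The helper `layer_forward` and the numbers in `theta1` and `x` are hypothetical and serve only to show the shapes involved: the bias unit is prepended to the input, then each row of the weight matrix produces one activation.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(theta, a_prev):
    """Return g(Theta [1, a_prev]) for one layer.

    theta is a list of rows, each of length len(a_prev) + 1;
    column 0 holds the bias weights.
    """
    a_with_bias = [1.0] + list(a_prev)
    return [sigmoid(sum(w * a for w, a in zip(row, a_with_bias)))
            for row in theta]

# Hypothetical 3x4 weight matrix and 3 input units (illustrative values).
theta1 = [
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [-1.0, 0.0, 0.0, 1.0],
]
x = [0.0, 2.0, 1.0]
a2 = layer_forward(theta1, x)  # z = [0, 2, 0], so a2 = [g(0), g(2), g(0)]
```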

#### Layer 3 (output layer)

\begin{equation}
    h_{\Theta}(x)
    =
    a^{(3)}_{1}
    =
    g
    \left(
    \Theta^{(2)}_{10} a^{(2)}_{0} + 
    \Theta^{(2)}_{11} a^{(2)}_{1} + 
    \Theta^{(2)}_{12} a^{(2)}_{2} + 
    \Theta^{(2)}_{13} a^{(2)}_{3}
    \right)
\end{equation}

Here $a^{(3)}_{1}$ is just a logistic regression unit. Its inputs are not the raw inputs $x_{i}$ but the activations $a^{(2)}_{i}$ of layer 2, so the features it works with are themselves "learnt" as a function of the input.
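
Chaining the two layers gives the full forward pass of the (3,3,1) network. The sketch below assumes a hypothetical helper `forward` and made-up weights; it is one straightforward way to organise the computation, not a reference implementation.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(thetas, x):
    """Propagate x through each weight matrix; return the activations of every layer."""
    activations = [list(x)]
    for theta in thetas:
        a_with_bias = [1.0] + activations[-1]  # prepend the bias unit
        z = [sum(w * a for w, a in zip(row, a_with_bias)) for row in theta]
        activations.append([sigmoid(zi) for zi in z])
    return activations

# Hypothetical weights: Theta^(1) is 3x4, Theta^(2) is 1x4.
theta1 = [
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
theta2 = [[-1.5, 1.0, 1.0, 1.0]]

layers = forward([theta1, theta2], [0.0, 0.0, 0.0])
h = layers[-1][0]  # hidden activations are all g(0) = 0.5, so h = g(-1.5 + 1.5) = 0.5
```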

**Size of $\Theta^{(j)}$:**

If a network has $s_{j}$ units in layer $j$ and $s_{j+1}$ units in layer $j+1$, then $\Theta^{(j)}$ has dimension $s_{j+1} \times (s_{j}+1)$. The extra column holds the weights of the bias unit.
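
The size rule can be checked with a small helper. `theta_shapes` is a hypothetical name, and the layer sizes are those of the (3,3,1) example, counted without bias units.

```python
def theta_shapes(layer_sizes):
    """Shape (rows, cols) of each Theta^(j), given unit counts per layer without bias units."""
    return [(s_next, s_curr + 1)
            for s_curr, s_next in zip(layer_sizes, layer_sizes[1:])]

shapes = theta_shapes([3, 3, 1])          # [(3, 4), (1, 4)]
n_params = sum(r * c for r, c in shapes)  # 12 + 4 = 16 weights in total
```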

## Non-linear classification examples

### AND

One unit with 2 inputs, $x_{1}, x_{2} \in \{0,1\}$ (+ bias)

- $\Theta_{10} = -30$
- $\Theta_{11} = 20$
- $\Theta_{12} = 20$

Scenarios:

| $x_{1}$ | $x_{2}$ | $h_{\Theta} (x)$  |
| ------- | ------- | ------------------ |
| 0 | 0 | $g(-30)\approx0$ |
| 0 | 1 | $g(-10)\approx0$ |
| 1 | 0 | $g(-10)\approx0$ |
| 1 | 1 | $g(10)\approx1$  |
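
The table can be reproduced numerically. This is a small sketch using the weights listed above; the helper `unit` is a hypothetical name for a single sigmoid unit whose first weight is the bias.

```python
import math
from itertools import product

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def unit(theta, inputs):
    """Single sigmoid unit; theta[0] is the bias weight."""
    return sigmoid(theta[0] + sum(w * x for w, x in zip(theta[1:], inputs)))

theta_and = [-30, 20, 20]
and_table = {(x1, x2): unit(theta_and, (x1, x2))
             for x1, x2 in product((0, 1), repeat=2)}
and_bits = {k: round(v) for k, v in and_table.items()}
# and_bits == {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}
```

The raw outputs are not exactly 0 or 1: for example $g(10) \approx 0.99995$, which is why the table uses $\approx$.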

### OR

One unit with 2 inputs, $x_{1}, x_{2} \in \{0,1\}$ (+ bias)

- $\Theta_{10} = -10$
- $\Theta_{11} = 20$
- $\Theta_{12} = 20$

Scenarios:

| $x_{1}$ | $x_{2}$ | $h_{\Theta} (x)$  |
| ------- | ------- | ------------------ |
| 0 | 0 | $g(-10)\approx0$ |
| 0 | 1 | $g(10)\approx1$ |
| 1 | 0 | $g(10)\approx1$ |
| 1 | 1 | $g(30)\approx1$  |
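
OR and AND use the same input weights and differ only in the bias, which sets how many inputs must be on before the unit fires. The sketch below illustrates this; `gate_output` is a hypothetical helper, and the weights are those listed in the two sections.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gate_output(bias, x1, x2, weight=20):
    """Sigmoid unit with the same weight on both inputs."""
    return sigmoid(bias + weight * x1 + weight * x2)

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
or_bits = [round(gate_output(-10, x1, x2)) for x1, x2 in inputs]   # one active input suffices
and_bits = [round(gate_output(-30, x1, x2)) for x1, x2 in inputs]  # both inputs are needed
```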


### NOT

One unit with 1 input, $x_{1} \in \{0,1\}$ (+ bias)

- $\Theta_{10} = 10$
- $\Theta_{11} = -20$

Scenarios:

| $x_{1}$ | $h_{\Theta} (x)$  |
| ------- | ------------------ |
| 0 | $g(10)\approx1$ |
| 1 | $g(-10)\approx0$ |
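
As a worked check of the two cases, substituting $x_{1} = 0$ and $x_{1} = 1$ into $g(\Theta_{10} + \Theta_{11} x_{1})$ with the weights above gives (values rounded):

\begin{equation}
    g(10 - 20 \cdot 0) = \frac{1}{1 + e^{-10}} \approx 0.99995,
    \qquad
    g(10 - 20 \cdot 1) = \frac{1}{1 + e^{10}} \approx 0.00005
\end{equation}

The large positive bias makes the unit fire by default, and the larger negative weight switches it off when the input is on.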

### XNOR

Scenarios:

| $x_{1}$ | $x_{2}$ | $a^{(2)}_{1}$ | $a^{(2)}_{2}$ | $h_{\Theta} (x)$  |
| --- | --- | --- | --- | --------- |
| 0 | 0 | 0 | 1 | 1 |
| 0 | 1 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0 |
| 1 | 1 | 1 | 0 | 1 |
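
The notes do not list weights for XNOR. A minimal sketch, assuming the common construction in which $a^{(2)}_{1}$ is an AND unit, $a^{(2)}_{2}$ is a (NOT $x_{1}$) AND (NOT $x_{2}$) unit, and the output unit is an OR of the two: the AND and OR weights are taken from the sections above, while `THETA_NOR` is an assumed set of weights chosen to produce the $a^{(2)}_{2}$ column.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def unit(theta, inputs):
    """Single sigmoid unit; theta[0] is the bias weight."""
    return sigmoid(theta[0] + sum(w * x for w, x in zip(theta[1:], inputs)))

THETA_AND = [-30, 20, 20]   # from the AND section
THETA_NOR = [10, -20, -20]  # assumption: fires only when both inputs are 0
THETA_OR = [-10, 20, 20]    # from the OR section

def xnor(x1, x2):
    a1 = unit(THETA_AND, (x1, x2))   # hidden unit 1
    a2 = unit(THETA_NOR, (x1, x2))   # hidden unit 2
    return a1, a2, unit(THETA_OR, (a1, a2))

rows = [(x1, x2) + tuple(round(v) for v in xnor(x1, x2))
        for x1 in (0, 1) for x2 in (0, 1)]
# each row is (x1, x2, a1, a2, h)
```

Under these assumptions the output is 1 exactly when the two inputs are equal, which a single unit cannot represent; the hidden layer is what makes it possible.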