# week 5: neural networks


## artificial neural networks

- model a mathematical function from inputs to outputs, based on the structure and parameters of the network
- allow the network's parameters to be learned from data

### activation functions

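- step function
  - $g(x)=1$ if $x\ge0$ else $0$
  - output jumps from 0 to 1 where the weighted sum $\vec w\cdot\vec x$ crosses 0
- logistic sigmoid
  - $g(x)=\frac{e^x}{e^x+1}$
  - squashes the weighted sum $\vec w\cdot\vec x$ into a value between 0 and 1, which can be read as a probability
- ReLU
  - $g(x)=\max(0, x)$
  - zero while the weighted sum $\vec w\cdot\vec x$ is negative, then increases linearly (a hockey-stick shape)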

Applying an activation function $g$ to the weighted sum of two inputs gives the hypothesis
$
h(x_1,x_2)=g(w_0+w_1x_1+w_2x_2)
$
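
A minimal sketch of the three activation functions and the hypothesis $h$ in plain Python. The function names and the example weights are illustrative choices, not something fixed by these notes.

```python
import math


def step(x):
    """Step activation: 1 when x >= 0, otherwise 0."""
    return 1 if x >= 0 else 0


def sigmoid(x):
    """Logistic sigmoid e^x / (e^x + 1), written as the equivalent 1 / (1 + e^-x).

    Fine for moderate x; very large negative x would overflow math.exp.
    """
    return 1 / (1 + math.exp(-x))


def relu(x):
    """ReLU activation: max(0, x)."""
    return max(0, x)


def h(x1, x2, w0, w1, w2, g):
    """Hypothesis for two inputs: activation g applied to the weighted sum."""
    return g(w0 + w1 * x1 + w2 * x2)


# Same weights, three different activation functions (weighted sum is 1 here).
for g in (step, sigmoid, relu):
    print(g.__name__, h(1, 1, w0=-1, w1=1, w2=1, g=g))
```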

### network structure

#### OR function

- $w_0=-1$
- $w_1=1$
- $w_2=1$
- $g(-1+1x_1+1x_2)$
- step activation function

#### AND function

- $w_0=-2$
- $w_1=1$
- $w_2=1$
- $g(-2+1x_1+1x_2)$
- step activation function
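
The weights above can be checked against the full truth tables. This is a small sketch; `perceptron`, `or_gate` and `and_gate` are names made up for the example.

```python
def step(x):
    """Step activation: 1 when x >= 0, otherwise 0."""
    return 1 if x >= 0 else 0


def perceptron(w0, w1, w2):
    """Build a two-input unit with fixed weights and a step activation."""
    def h(x1, x2):
        return step(w0 + w1 * x1 + w2 * x2)
    return h


or_gate = perceptron(-1, 1, 1)   # fires when at least one input is 1
and_gate = perceptron(-2, 1, 1)  # fires only when both inputs are 1

print("x1 x2 OR AND")
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, or_gate(x1, x2), and_gate(x1, x2))
```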

### multiple nodes

$
g\left(\sum_{i=1}^{n} x_i w_i + w_0\right)
$
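
The same computation for any number of inputs, as a short sketch. `unit_output` and the example numbers are hypothetical.

```python
def relu(x):
    """ReLU activation: max(0, x)."""
    return max(0, x)


def unit_output(inputs, weights, bias, g):
    """Activation g applied to bias (w_0) plus the sum of x_i * w_i."""
    total = bias + sum(x * w for x, w in zip(inputs, weights))
    return g(total)


# 0.5 + (1 * 0.5) + (2 * -1) + (3 * 2) = 5.0
print(unit_output([1, 2, 3], [0.5, -1, 2], bias=0.5, g=relu))
```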

## training: gradient descent

- algorithm for minimising loss when training a neural network

1. start with a random choice of weights
2. repeat
   1. calculate the gradient based on all data points: direction that will lead to decreasing loss
   2. update weights according to the gradient

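- calculating the gradient from all data points is very expensive (in terms of time and computation)
- so sometimes we use one data point at a time: **stochastic gradient descent**
- or sometimes we use a small batch of data points: **mini-batch gradient descent**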
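
A rough sketch of the loop above for a single sigmoid unit, where `batch_size` switches between the three variants. The cross-entropy loss, the learning rate, the epoch count and the tiny OR dataset are all assumptions picked to keep the example small.

```python
import math
import random


def sigmoid(z):
    return 1 / (1 + math.exp(-z))


def predict(weights, x):
    """weights[0] is the bias w_0; the rest pair up with the inputs."""
    return sigmoid(weights[0] + sum(w * xi for w, xi in zip(weights[1:], x)))


def loss(weights, data):
    """Mean binary cross-entropy over the data."""
    eps = 1e-12  # avoids log(0)
    total = 0.0
    for x, y in data:
        p = predict(weights, x)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(data)


def gradient(weights, batch):
    """Average gradient of the loss over one batch."""
    grad = [0.0] * len(weights)
    for x, y in batch:
        error = predict(weights, x) - y  # derivative w.r.t. the weighted sum
        grad[0] += error
        for i, xi in enumerate(x, start=1):
            grad[i] += error * xi
    return [g / len(batch) for g in grad]


def gradient_descent(data, batch_size, learning_rate=1.0, epochs=1000, seed=0):
    rng = random.Random(seed)
    # 1. start with a random choice of weights
    weights = [rng.uniform(-1, 1) for _ in range(len(data[0][0]) + 1)]
    # 2. repeat
    for _ in range(epochs):
        shuffled = data[:]
        rng.shuffle(shuffled)
        for start in range(0, len(shuffled), batch_size):
            batch = shuffled[start:start + batch_size]
            grad = gradient(weights, batch)  # direction of increasing loss
            weights = [w - learning_rate * g for w, g in zip(weights, grad)]
    return weights


data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]  # OR function
results = {}
for name, size in [("full batch", len(data)), ("stochastic", 1), ("mini-batch", 2)]:
    results[name] = gradient_descent(data, batch_size=size)
    print(name, round(loss(results[name], data), 4))
```

Smaller batches give a noisier gradient estimate but make each update cheaper.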

### binary classification

- we can only predict things that are linearly separable
- **a single perceptron is only capable of learning a linearly separable decision boundary**
- how do we make more complex decision boundaries, such as circular ones?
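
To illustrate the limit, the sketch below searches a grid of weights for a single step unit. XOR is used here as a standard example of a function that is not linearly separable; it is not taken from these notes. The grid search is only an illustration, not a proof, and `separable_on_grid` is a made-up helper.

```python
from itertools import product


def step(x):
    return 1 if x >= 0 else 0


def separable_on_grid(truth_table, grid):
    """Return the first (w0, w1, w2) on the grid that reproduces truth_table, else None."""
    for w0, w1, w2 in product(grid, repeat=3):
        if all(step(w0 + w1 * x1 + w2 * x2) == y
               for (x1, x2), y in truth_table.items()):
            return (w0, w1, w2)
    return None


grid = [i / 2 for i in range(-10, 11)]  # -5.0, -4.5, ..., 5.0

AND = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}
XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

print("AND weights:", separable_on_grid(AND, grid))  # some weights exist
print("XOR weights:", separable_on_grid(XOR, grid))  # no single unit fits XOR
```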


## multilayer neural network

- an ANN with an input layer, output layer, and at least one hidden layer
- Input -> hidden -> Output
- this can model more complex functions
- if you know the loss of the output node, you can estimate the loss contribution from the hidden layer from the weights: **backpropagation**
  - start with a random choice of weights
  - repeat:
    - calculate error for output layer
    - for every layer, starting with output layer and moving back towards the earliest hidden layer:
      - propagate error back one layer
      - update weights
- an ANN with multiple hidden layers is called a **deep neural network**
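
A minimal sketch of backpropagation for one hidden layer, following the steps above. The sigmoid activations, squared-error loss, hidden layer size, learning rate and the XOR data are assumptions chosen to keep the example small; whether it fully learns XOR depends on the random starting weights.

```python
import math
import random


def sigmoid(z):
    return 1 / (1 + math.exp(-z))


def forward(x, W1, b1, W2, b2):
    """Return the hidden activations and the single output."""
    hidden = [sigmoid(b + sum(w * xi for w, xi in zip(row, x)))
              for row, b in zip(W1, b1)]
    output = sigmoid(b2 + sum(w * h for w, h in zip(W2, hidden)))
    return hidden, output


def train(data, n_hidden=4, learning_rate=0.5, epochs=5000, seed=1):
    rng = random.Random(seed)
    n_in = len(data[0][0])
    # start with a random choice of weights
    W1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    W2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    b2 = rng.uniform(-1, 1)
    history = []  # mean squared error seen during each epoch
    for _ in range(epochs):
        total = 0.0
        for x, y in data:
            hidden, out = forward(x, W1, b1, W2, b2)
            total += (out - y) ** 2
            # calculate error for output layer (sigmoid derivative is out * (1 - out))
            delta_out = (out - y) * out * (1 - out)
            # propagate error back one layer through the output weights
            delta_hidden = [delta_out * W2[j] * hidden[j] * (1 - hidden[j])
                            for j in range(n_hidden)]
            # update weights
            for j in range(n_hidden):
                W2[j] -= learning_rate * delta_out * hidden[j]
                b1[j] -= learning_rate * delta_hidden[j]
                for i in range(n_in):
                    W1[j][i] -= learning_rate * delta_hidden[j] * x[i]
            b2 -= learning_rate * delta_out
        history.append(total / len(data))
    return (W1, b1, W2, b2), history


XOR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
params, history = train(XOR_DATA)
print("first epoch loss:", round(history[0], 4))
print("last epoch loss:", round(history[-1], 4))
```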

### overfitting

to combat overfitting with DNNs, we can use **dropout**

- temporarily removing units, selected at random, from an ANN to prevent over-reliance on certain units
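
A small sketch of dropout applied to one layer's activations. Scaling the surviving units by `1 / (1 - rate)` ("inverted dropout") is a common convention assumed here, not something stated in these notes, and `dropout` is a hypothetical helper.

```python
import random


def dropout(activations, rate, rng, training=True):
    """Zero each activation with probability `rate` (0 <= rate < 1) while training.

    Survivors are scaled by 1 / (1 - rate) so the expected sum stays the same.
    At inference time the activations pass through unchanged.
    """
    if not training or rate == 0:
        return list(activations)
    keep = 1 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)
layer = [0.2, 0.9, 0.5, 0.7, 0.1, 0.4]
print(dropout(layer, rate=0.5, rng=rng))                  # some units zeroed
print(dropout(layer, rate=0.5, rng=rng, training=False))  # unchanged
```

A different random set of units is dropped on every call, so no single unit can be relied on during training.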


```python
import csv
import tensorflow as tf

from sklearn.model_selection import train_test_split

# read data from file
with open("banknotes.csv") as f:
    reader = csv.reader(f)
    next(reader)  # skip the header row

    data = []
    for row in reader:
        data.append({
            "evidence": [float(cell) for cell in row[:4]],
            "label": 1 if row[4] == "0" else 0
        })

# separate data into train and test sets
evidence = [row["evidence"] for row in data]
labels = [row["label"] for row in data]
X_training, X_testing, y_training, y_testing = train_test_split(evidence, labels, test_size=0.4)

# create a neural network
model = tf.keras.models.Sequential()

# add a hidden layer with 8 units, with ReLU activation function
model.add(tf.keras.layers.Dense(8, input_shape=(4,), activation="relu"))

# add output layer with 1 unit, with sigmoid activation function
model.add(tf.keras.layers.Dense(1, activation="sigmoid"))

# configure training: adam optimiser, binary cross-entropy loss and accuracy metric
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# fit the model with 20 epochs
model.fit(X_training, y_training, epochs=20)

# evaluate model performance
model.evaluate(X_testing, y_testing, verbose=2)
```


```
Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20
18/18 - 0s - loss: 0.1187 - accuracy: 0.9945
[0.118703693151474, 0.994535505771637]
```

## computer vision

- computational methods for analysing and understanding digital images
- flattening an image into a single list of pixel values throws away a lot of useful information, such as which pixels are next to each other
  
### image convolution

#### kernels

- applying a filter that adds each pixel value of an image to its neighbors, weighted according to a kernel matrix
- example kernels:
  - sharpening:
    - [0, -1, 0]
    - [-1, 5, -1]
    - [0, -1, 0]
  - edge detection:
    - [-1, -1, -1]
    - [-1, 8, -1]
    - [-1, -1, -1]
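
A plain-Python sketch of applying a kernel to a greyscale image stored as a list of rows. It assumes no padding, so the output shrinks by the kernel size minus one, and it does not flip the kernel (which makes no difference for symmetric kernels like these). `convolve` and the tiny images are made up for the example.

```python
def convolve(image, kernel):
    """Slide a square kernel over the image and take the weighted sum at each position."""
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(k) for j in range(k))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


EDGE = [
    [-1, -1, -1],
    [-1, 8, -1],
    [-1, -1, -1],
]

flat = [[20, 20, 20, 20] for _ in range(4)]     # uniform region
edge = [[10, 10, 50, 50] for _ in range(4)]     # vertical boundary in the middle

print(convolve(flat, EDGE))  # [[0, 0], [0, 0]]: no edges in a uniform region
print(convolve(edge, EDGE))  # [[-120, 120], [-120, 120]]: large values at the boundary
```

The kernel weights sum to zero, so uniform regions cancel out and only changes between neighboring pixels survive.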

#### pooling

- reducing the size of an input by sampling from regions in the input
- max-pooling: pooling by choosing the maximum value in a region
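
A short sketch of max-pooling with non-overlapping square regions. The region size of 2 and the sample image are assumptions for the example.

```python
def max_pool(image, size=2):
    """Keep the maximum of each non-overlapping size x size region."""
    return [
        [
            max(image[r + i][c + j] for i in range(size) for j in range(size))
            for c in range(0, len(image[0]) - size + 1, size)
        ]
        for r in range(0, len(image) - size + 1, size)
    ]


image = [
    [1, 3, 2, 4],
    [5, 6, 7, 8],
    [9, 2, 1, 0],
    [3, 4, 5, 6],
]

print(max_pool(image))  # [[6, 8], [9, 6]]: a 4x4 input becomes 2x2
```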

### convolutional neural network

- neural networks that use convolution, usually for analysing images
- convolution -> pooling -> deep neural network


## types of ANNs

### feed-forward neural network

- neural network that has connections only in one direction

### recurrent neural network

- the network maintains a state and feeds its own output back in as input for the next step
- this allows structures other than one input to one output:
  - one-to-many: one input, with an output at each step (such as building up a sentence from an image)
  - many-to-one: pass the state forward at each step and only output at the end (such as analysing an entire video image-by-image)
  - many-to-many: a sequence in, a sequence out, such as Google Translate, which can't just translate word-by-word; often built with a Long Short-Term Memory (LSTM) network
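
A toy sketch of the recurrent idea with a single scalar state. The `tanh` activation and the weight values are arbitrary assumptions; a real RNN would learn its weights and use vectors.

```python
import math


def rnn_step(state, x, w_state, w_input, bias):
    """New state depends on the current input and on the previous state."""
    return math.tanh(w_state * state + w_input * x + bias)


def run_rnn(inputs, w_state=0.5, w_input=1.0, bias=0.0):
    """Feed the inputs in one step at a time, carrying the state forward."""
    state = 0.0
    states = []
    for x in inputs:
        state = rnn_step(state, x, w_state, w_input, bias)
        states.append(state)
    return states


states = run_rnn([1.0, 0.0, 0.0])
many_to_many = states     # read an output at every step
many_to_one = states[-1]  # read only the final output
print(many_to_many)
print(many_to_one)
```

The later inputs are 0, yet the later states are non-zero: the first input is still remembered through the state.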
