# 3. ... Perceptron to perform ... using plain Swift

**[Christopher Boone](https://github.com/cboone)**

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](http://colab.research.google.com/github/cboone/swift-neural-intuition/blob/master/2-perceptron-exclusive-disjunction.ipynb)

_Using the network from [part one](1-perceptron-inclusive-disjunction.ipynb)._

The Perceptron we've created is a simple network that:

1. Takes an input vector $\mathbf{x}_j \in \mathbb{R}^{n}$ where:
    - $\mathbf{x}_{j,i} \in \{0, 1\}$
    - $\mathbf{x}_{j,0} = 1$
    - $n = m + 1 = 3$, where $m = 2$ is the number of actual inputs
    

2. Takes a weights vector $\mathbf{w} \in \mathbb{R}^{n}$ where:
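    - the starting values satisfy $w_i \in [0, 1]$; training can move them outside that range, so trained weights may be negative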
    

3. Calculates the dot (inner, scalar) product of the inputs and the weights where:
    - $f \colon \mathbb{R}^{n} \to \mathbb{R}$
    - $f(\mathbf{x}_j) = \mathbf{x}_j \cdot \mathbf{w}$
    - $f(\mathbf{x}_j) = \sum_{i=0}^{n-1} x_{j,i} w_{i}$


4. Applies a binary step activation function $g(h_j)$ where:
    - $h_j = f(\mathbf{x}_j)$ is the weighted sum of the inputs
    - $g(h_j)={\begin{cases} 0 &{\text{for }} h_j \leq 0 \\1 &{\text{for }} h_j > 0 \end{cases}}$
    - therefore $g$ is monotonic, but not continuous
    

5. Produces an output $\hat{y}_j$ where:
    - $\hat{y}_j \in \{0, 1\}$
    - $\hat{y}_j = g \! \left( \sum_{i=0}^{n-1} x_{j,i} w_{i} \right)$

_NB:_ Setting $n = m + 1$ and $x_{j,0} = 1$ removes the need for a separate bias term: every input vector has $1$ as its first value, so $w_0$ acts as the bias.
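The forward pass described above can be sketched in a few lines of plain Swift. This is a minimal illustration rather than the code from part one: `weightedSum` and `predict` are hypothetical helper names, the example weights are made up, and returning $0$ when $h_j$ is exactly $0$ is an assumption.

```swift
/// Binary step activation: 1 for strictly positive input, otherwise 0.
/// (Mapping h == 0 to 0 is an assumption of this sketch.)
func unitStep(_ h: Double) -> Double {
    return h > 0 ? 1 : 0
}

/// Weighted sum h_j of one input vector (bias input included) and the weights.
/// Hypothetical helper, not taken from part one.
func weightedSum(_ inputs: [Double], _ weights: [Double]) -> Double {
    precondition(inputs.count == weights.count, "inputs and weights must have the same length")
    return zip(inputs, weights).reduce(0.0) { $0 + $1.0 * $1.1 }
}

/// Prediction for one sample: the activation applied to the weighted sum.
func predict(
    _ inputs: [Double],
    weights: [Double],
    activation: (Double) -> Double = unitStep
) -> Double {
    return activation(weightedSum(inputs, weights))
}

// Two actual inputs (m = 2) plus the constant bias input x_0 = 1, so n = 3.
let exampleWeights = [-0.5, 1.0, 1.0]  // illustrative values, not trained
let exampleInput = [1.0, 0.0, 1.0]     // x_0 = 1, followed by the inputs 0 and 1
print(predict(exampleInput, weights: exampleWeights))  // h = 0.5, so this prints 1.0
```

Passing the activation as a parameter keeps the structure unchanged if we later try a different $g$.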

And its training:

1. Measures the prediction $\hat{y}_j$ against the expectation $y_j$ using an objective (cost, error, loss) function $L(\hat{y}_j, y_j)$

1. Updates the weights $\mathbf{w}$ using the delta rule $\Delta w_{j,i} = \alpha (y_j - \hat{y}_j) x_{j,i}$, with $w_i \leftarrow w_i + \Delta w_{j,i}$, simplified to leave out the derivative $g'(h_j)$, where:
    - $\alpha \in [0, 1]$ is the learning rate
    - $h_j$ is the scalar product of the inputs and the weights
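As a rough sketch of a single training step, the update can be written so that each weight moves by $\alpha (y_j - \hat{y}_j) x_{j,i}$, which is the same as subtracting $\alpha (\hat{y}_j - y_j) x_{j,i}$. The function name `updatedWeights` and the sample values are hypothetical; the `trainWeights` function from part one may be organised differently.

```swift
/// Binary step activation (h == 0 mapped to 0 is an assumption of this sketch).
func unitStep(_ h: Double) -> Double {
    return h > 0 ? 1 : 0
}

/// One simplified delta-rule step for a single sample.
/// Hypothetical helper, not the notebook's `trainWeights`.
func updatedWeights(
    _ weights: [Double],
    inputs: [Double],
    expected: Double,
    learningRate: Double,
    activation: (Double) -> Double
) -> [Double] {
    // h_j: scalar product of the inputs and the weights.
    let h = zip(inputs, weights).reduce(0.0) { $0 + $1.0 * $1.1 }
    // Error term (y_j - ŷ_j): zero when the prediction is already correct.
    let error = expected - activation(h)
    // Move every weight in proportion to its own input.
    return zip(weights, inputs).map { weight, input in
        weight + learningRate * error * input
    }
}

// Starting from zero weights, a sample that should be classified as 1.
let stepped = updatedWeights(
    [0.0, 0.0, 0.0],
    inputs: [1.0, 0.0, 1.0],
    expected: 1.0,
    learningRate: 0.1,
    activation: unitStep
)
print(stepped)  // [0.1, 0.0, 0.1]
```

Only the weights whose inputs were non-zero change, and a correctly classified sample leaves the weights untouched.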

This network can be modified, without changing its structure, by:

1. Changing $n$ depending on the shape of the input data $\mathbf{x}_j$
1. Adding features to the input data $\mathbf{x}_j$
1. Providing different starting $\mathbf{w}$ values
1. Using a different activation function $g(x)$
1. Measuring the predictions with a different objective function $L(\hat{y}_j, y_j)$
1. Including the gradient $g'(h_j)$ when updating the weights $\mathbf{w}$
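To make the activation and gradient modifications concrete, here is a hedged sketch that swaps the step function for a logistic sigmoid and keeps the derivative $g'(h_j)$ in the update. The sigmoid is our choice for illustration only; the notebook does not prescribe it, and `updatedWeightsWithGradient` is a hypothetical name.

```swift
import Foundation

/// Logistic sigmoid, used here only as an example of a differentiable activation.
func sigmoid(_ h: Double) -> Double {
    return 1 / (1 + exp(-h))
}

/// Derivative of the sigmoid: g'(h) = g(h) * (1 - g(h)).
func sigmoidDerivative(_ h: Double) -> Double {
    let s = sigmoid(h)
    return s * (1 - s)
}

/// One delta-rule step that includes g'(h_j).
/// Hypothetical helper for illustration.
func updatedWeightsWithGradient(
    _ weights: [Double],
    inputs: [Double],
    expected: Double,
    learningRate: Double,
    activation: (Double) -> Double,
    derivative: (Double) -> Double
) -> [Double] {
    let h = zip(inputs, weights).reduce(0.0) { $0 + $1.0 * $1.1 }
    let error = expected - activation(h)
    // The correction is scaled by the slope of the activation at h_j.
    let scale = learningRate * error * derivative(h)
    return zip(weights, inputs).map { weight, input in weight + scale * input }
}

// With zero weights, h = 0, g(h) = 0.5 and g'(h) = 0.25.
let nudged = updatedWeightsWithGradient(
    [0.0, 0.0, 0.0],
    inputs: [1.0, 0.0, 1.0],
    expected: 1.0,
    learningRate: 0.5,
    activation: sigmoid,
    derivative: sigmoidDerivative
)
print(nudged)  // [0.0625, 0.0, 0.0625]
```

The structure of the network is unchanged; only the two function arguments differ from the step-function version.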

## The Iris dataset

```swift
import PythonKit

// scikit-learn must be installed in the Python environment that PythonKit loads.
let sklearnDatasets = Python.import("sklearn.datasets")
```

```swift
let irisData = sklearnDatasets.load_iris()
// Prepend the bias input 1.0 to each sample; label target 0 (setosa) as 0 and every other species as 1.
let irisTrainingSamples: [([Double], Double)] = zip(irisData.data, irisData.target).map { features, target in
    ([1.0] + features.map { Double($0)! }, target == 0 ? 0.0 : 1.0)
}
let irisTrainingSampleInputs: [[Double]] = irisTrainingSamples.map { $0.0 }
let irisTrainingSampleOutputs: [Double] = irisTrainingSamples.map { $0.1 }
```

```swift
// `trainWeights` and `unitStep` come from the network built in part one.
trainWeights(
    startingFrom: Array(repeating: 0, count: irisTrainingSampleInputs[0].count),
    samples: irisTrainingSamples,
    learningRate: 0.1,
    errorThreshold: 0.25,
    maximumIterationCount: Int(1e3),
    activation: unitStep
)
```

```
Iterations: 0
Epochs: 0
Current weights: [0.0, 0.0, 0.0, 0.0, 0.0]
Mean error: 0.6666666666666667
Accuracy: 0.3333333333333333
Precision: 0.0
Recall: 0.0
F1: 0.0

Iterations: 8
Epochs: 0
Current weights: [-0.1, -0.19999999999999996, -0.45000000000000007, 0.74, 0.4099999999999999]
Mean error: 0.0
Accuracy: 1.0
Precision: 1.0
Recall: 1.0
F1: 1.0
```
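The accuracy, precision, recall, and F1 figures in the log can be reproduced with a small helper like the one below. This is a sketch under assumptions: `BinaryMetrics` and `binaryMetrics` are hypothetical names, and reporting $0$ when a ratio's denominator is $0$ is our guess at the convention, chosen because it matches the `Precision: 0.0` and `F1: 0.0` lines printed for the all-zero starting weights.

```swift
/// Hypothetical container for the four metrics shown in the training log.
struct BinaryMetrics {
    let accuracy: Double
    let precision: Double
    let recall: Double
    let f1: Double
}

/// Computes binary classification metrics from 0/1 predictions and expectations.
func binaryMetrics(predictions: [Double], expectations: [Double]) -> BinaryMetrics {
    precondition(predictions.count == expectations.count && !predictions.isEmpty)
    var tp = 0.0, fp = 0.0, tn = 0.0, fn = 0.0
    for (predicted, expected) in zip(predictions, expectations) {
        let predictedPositive = predicted > 0.5
        let actuallyPositive = expected > 0.5
        if predictedPositive && actuallyPositive {
            tp += 1
        } else if predictedPositive {
            fp += 1
        } else if actuallyPositive {
            fn += 1
        } else {
            tn += 1
        }
    }
    // Assumed convention: an undefined ratio (zero denominator) is reported as 0.
    func ratio(_ numerator: Double, _ denominator: Double) -> Double {
        return denominator == 0 ? 0 : numerator / denominator
    }
    let precision = ratio(tp, tp + fp)
    let recall = ratio(tp, tp + fn)
    return BinaryMetrics(
        accuracy: (tp + tn) / Double(predictions.count),
        precision: precision,
        recall: recall,
        f1: ratio(2 * precision * recall, precision + recall)
    )
}

// Same class balance as the relabelled Iris data (one third 0s, two thirds 1s),
// with every sample predicted as 0, as happens with all-zero weights.
let untrained = binaryMetrics(predictions: [0, 0, 0], expectations: [0, 1, 1])
print(untrained.accuracy, untrained.precision, untrained.recall, untrained.f1)
```

With every sample predicted as $0$, accuracy equals the share of class-$0$ samples (one third) and the other three metrics are $0$, which is the pattern in the first log block.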



```
▿ 5 elements
  - 0 : -0.1
  - 1 : -0.19999999999999996
  - 2 : -0.45000000000000007
  - 3 : 0.74
  - 4 : 0.4099999999999999
```