Artificial neural networks (ANNs) are models inspired by networks of biological neurons. They are built from artificial neurons, each of which performs a simple computation.

Topics:

- Overview
- Implementation
- Tuning

# Overview

One simple ANN architecture is the **Perceptron**, which is made up of a single layer of **threshold logic units (TLUs)**. A TLU computes a weighted sum of inputs and applies a step function to determine an output.

*Equation 1: Common step functions*

\begin{equation*}
\text{heaviside}(z) = \begin{cases}
0 \;\text{ if }\; z \lt t\\
1 \;\text{ if }\; z \geq t\\
\end{cases}
\end{equation*}

\begin{equation*}
\text{sgn}(z) = \begin{cases}\begin{aligned}
-1 \;\text{ if }\; z \lt t\\
0 \;\text{ if }\; z = t\\
1 \;\text{ if }\; z \gt t\\
\end{aligned}\end{cases}
\end{equation*}

where $t$ is some numeric threshold.
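
As a minimal sketch, the two step functions in Equation 1 could be written with NumPy as follows. The function names and the default threshold `t=0.0` are illustrative choices, not part of any library API.

```python
import numpy as np

def heaviside(z, t=0.0):
    """Return 0 where z < t and 1 where z >= t."""
    return (np.asarray(z) >= t).astype(int)

def sgn(z, t=0.0):
    """Return -1 where z < t, 0 where z == t, and 1 where z > t."""
    return np.sign(np.asarray(z) - t).astype(int)

z = np.array([-2.0, 0.0, 3.0])
print(heaviside(z))  # [0 1 1]
print(sgn(z))        # [-1  0  1]
```

The two functions differ only in how they label the "off" state (0 versus -1) and in how `sgn` treats an input exactly at the threshold.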

Each TLU in a Perceptron is connected to every input in the input layer, which makes the layer a **fully connected layer**. A **bias neuron** is an extra input that always outputs 1, so its connection weights act as the bias terms.

*Equation 2: Computing fully connected layer outputs*

\begin{equation*}
h_{\mathbf{W},\mathbf{b}}(\mathbf{X}) = \phi(\mathbf{XW}+\mathbf{b})
\end{equation*}

- $\mathbf{X}$ is the matrix of input features
- $\mathbf{W}$ is the matrix of non-bias connection weights (one row per input, one column per neuron)
- $\mathbf{b}$ is the vector of bias connection weights
- $\phi$ is the activation function (such as a step function)
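
Equation 2 can be illustrated with a few lines of NumPy. This is only a sketch: the helper name `dense_layer`, the toy values, and the use of a Heaviside step with threshold 0 are assumptions made for the example.

```python
import numpy as np

def heaviside(z):
    """Step activation with threshold 0: 1 where z >= 0, else 0."""
    return (z >= 0).astype(int)

def dense_layer(X, W, b, phi):
    """Compute phi(XW + b) for a batch of instances."""
    return phi(X @ W + b)

# 2 instances with 2 features each
X = np.array([[1.0, 2.0],
              [3.0, -1.0]])
# 2 inputs feeding 3 TLUs: one row per input, one column per TLU
W = np.array([[0.5, -1.0, 0.0],
              [0.25, 0.5, -1.0]])
# One bias term per TLU
b = np.array([-1.0, 0.0, 0.5])

outputs = dense_layer(X, W, b, heaviside)
print(outputs)
# [[1 1 0]
#  [1 0 1]]
```

Each row of `outputs` holds the three TLU outputs for one instance, so the result has one row per instance and one column per neuron.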

A Perceptron is trained one instance at a time: after each prediction, the weights of the connections that contributed to a wrong output are adjusted to reduce the error.

*Equation 3: Perceptron learning rule*

\begin{equation*}
w_{i,j}^{\text{next step}} = w_{i,j} + \eta  \bigl(y_j - \hat{y}_j\bigr)x_i
\end{equation*}

- $w_{i,j}$ is the connection weight between the $i^{\text{th}}$ input neuron and the $j^{\text{th}}$ output neuron
- $x_i$ is the $i^{\text{th}}$ input value of the current training instance
- $\hat{y}_j$ is the output of the $j^{\text{th}}$ output neuron for the current training instance
- $y_j$ is the target output of the $j^{\text{th}}$ output neuron for the current training instance
- $\eta$ is the learning rate
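
The learning rule in Equation 3 can be sketched as a single update step. The function name `perceptron_step` is hypothetical, the TLUs are assumed to use a Heaviside step with threshold 0, and the bias update is an assumption that treats the bias as a weight on a constant input of 1.

```python
import numpy as np

def perceptron_step(W, b, x, y, eta=1.0):
    """Apply one learning-rule update for a single training instance."""
    # Current TLU outputs (Heaviside step with threshold 0)
    y_hat = (x @ W + b >= 0).astype(int)
    error = y - y_hat
    # w_ij <- w_ij + eta * (y_j - y_hat_j) * x_i for every input i and output j
    W_next = W + eta * np.outer(x, error)
    # Assumption: the bias is a weight on a constant input of 1
    b_next = b + eta * error
    return W_next, b_next

W = np.zeros((2, 1))     # 2 inputs, 1 output neuron
b = np.zeros(1)
x = np.array([2.0, 0.5])
y = np.array([0])        # target output

W1, b1 = perceptron_step(W, b, x, y)
print(W1.ravel(), b1)    # [-2.  -0.5] [-1.]

# The instance is now classified correctly, so a second step changes nothing
W2, b2 = perceptron_step(W1, b1, x, y)
```

Weights change only when the prediction is wrong, and the size of the change scales with both the learning rate and the input value.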

In [1]:
# Perceptron demonstration with sklearn
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import Perceptron

iris = load_iris()

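# Use petal length and petal width as the two features
X = iris.data[:, (2, 3)]
# Binary target: 1 for Iris setosa (class 0), 0 otherwise.
# The np.int alias was removed in NumPy 1.24, so use the built-in int.
y = (iris.target == 0).astype(int)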

per_clf = Perceptron()
per_clf.fit(X, y)

Perceptron(alpha=0.0001, class_weight=None, early_stopping=False, eta0=1.0,
           fit_intercept=True, max_iter=1000, n_iter_no_change=5, n_jobs=None,
           penalty=None, random_state=0, shuffle=True, tol=0.001,
           validation_fraction=0.1, verbose=0, warm_start=False)

In [3]:
y_pred = per_clf.predict([[2, 0.5]])
y_pred

array([0])