#+NAME: Perceptron
#+YEAR: 1958
#+AUTHORS: Frank Rosenblatt
#+CATEGORIES: Fully Connected
#+DESCRIPTION: Neural network research begins with the first implementation of an artificial neuron called the perceptron. The theory for the perceptron was introduced in 1943 by McCulloch and Pitts as a binary threshold classifier. The first implementation was actually intended to be a machine rather than a program. Photocells were interconnected with potentiometers that were updated during learning with electric motors.
#+PAPER: /static/perceptron.pdf
#+IMAGE:

# Perceptron

The origins of artificial neural network research can be traced back to the work of Warren McCulloch and Walter Pitts in 1943, who proposed a mathematical model of a biological neuron based on the all-or-nothing character of nervous activity. This simplified computational model casts the neuron as a binary threshold unit: it computes a weighted sum of its inputs and outputs 1 when that sum exceeds a threshold, and 0 otherwise.

$$
f(x) = \begin{cases}
1, & \text{if } \displaystyle\sum_{i=1}^n w_i x_i + b > 0, \\
0, & \text{otherwise}
\end{cases}
$$
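As a minimal sketch of such a threshold unit in plain Julia: the function name `threshold_unit`, the keyword argument `θ` (the firing threshold, defaulting to 0 under the common convention that the bias absorbs the threshold), and the hand-picked AND-gate weights are all assumptions made for illustration, not part of the original model description.

```julia
using LinearAlgebra  # provides dot

# Binary threshold unit: returns 1 when the weighted sum of the inputs plus
# the bias exceeds the threshold θ, and 0 otherwise.
# θ is a hypothetical keyword argument so the firing rule can be adjusted.
threshold_unit(w, x, b; θ = 0) = dot(w, x) + b > θ ? 1 : 0

# Hand-picked (not learned) weights that make the unit act as a logical AND.
w = [1.0, 1.0]
b = -1.5

for x in ([0, 0], [0, 1], [1, 0], [1, 1])
    println(x, " => ", threshold_unit(w, x, b))
end
```

With these weights the unit fires only for the input `[1, 1]`, since that is the only case in which the weighted sum plus bias is positive.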

A single threshold unit can only draw a linear decision boundary, so it cannot represent functions such as XOR. Multilayer perceptrons remove this limitation: the network is organized into several successive layers, each consisting of multiple neurons with weighted connections to the neurons in adjacent layers.
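To illustrate why stacking layers helps, the sketch below wires two hidden threshold units into one output unit so that the network computes XOR, which no single threshold unit can do. The helper names (`heaviside`, `layer`, `xor_net`) and all weights are hypothetical and chosen by hand for illustration; nothing here is learned.

```julia
# Heaviside step activation applied to a scalar pre-activation.
heaviside(z) = z > 0 ? 1 : 0

# One fully connected layer: W has one row per output unit, b one bias per unit.
layer(W, b, x) = heaviside.(W * x .+ b)

# Hidden layer with two units: the first row acts as OR, the second as NAND.
W1 = [1.0 1.0; -1.0 -1.0]
b1 = [-0.5, 1.5]

# Output layer with a single unit that ANDs the two hidden activations.
W2 = [1.0 1.0]
b2 = [-1.5]

# Forward pass: input -> hidden layer -> output layer.
xor_net(x) = layer(W2, b2, layer(W1, b1, x))[1]

for x in ([0, 0], [0, 1], [1, 0], [1, 1])
    println(x, " => ", xor_net(x))
end
```

The hidden layer maps the four inputs into a space where the two classes become linearly separable, which is the role the hidden layer plays in a trained multilayer perceptron as well.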

The following Flux example trains a small multilayer perceptron on a noisy XOR problem:

```julia
# At the Julia REPL this prompts to install any missing package, including CUDA;
# elsewhere, run `import Pkg; Pkg.add("CUDA")` first:
using Flux, CUDA, Statistics, ProgressMeter

# Generate some data for the XOR problem: vectors of length 2, as columns of a matrix:
noisy = rand(Float32, 2, 1000)                                    # 2×1000 Matrix{Float32}
truth = [xor(col[1]>0.5, col[2]>0.5) for col in eachcol(noisy)]   # 1000-element Vector{Bool}

# Define our model, a multi-layer perceptron with one hidden layer of size 3:
model = Chain(
    Dense(2 => 3, tanh),   # activation function inside layer
    BatchNorm(3),
    Dense(3 => 2),
    softmax) |> gpu        # move model to GPU, if available

# The model encapsulates parameters, randomly initialised. Its initial output is:
out1 = model(noisy |> gpu) |> cpu                                 # 2×1000 Matrix{Float32}

# To train the model, we use batches of 64 samples, and one-hot encoding:
target = Flux.onehotbatch(truth, [true, false])                   # 2×1000 OneHotMatrix
loader = Flux.DataLoader((noisy, target) |> gpu, batchsize=64, shuffle=true);
# 16-element DataLoader with first element: (2×64 Matrix{Float32}, 2×64 OneHotMatrix)

optim = Flux.setup(Flux.Adam(0.01), model)  # will store optimiser momentum, etc.

# Training loop, using the whole data set 1000 times:
losses = []
@showprogress for epoch in 1:1_000
    for (x, y) in loader
        loss, grads = Flux.withgradient(model) do m
            # Evaluate model and loss inside gradient context:
            y_hat = m(x)
            Flux.crossentropy(y_hat, y)
        end
        Flux.update!(optim, model, grads[1])
        push!(losses, loss)  # logging, outside gradient context
    end
end

optim # parameters, momenta and output have all changed
out2 = model(noisy |> gpu) |> cpu  # first row is prob. of true, second row p(false)

mean((out2[1,:] .> 0.5) .== truth)  # accuracy 94% so far!
```
