# Deep Learning with Julia

## Housing Data

First, we import the libraries we need.

In [None]:
using Flux, Flux.Tracker
using Base.Iterators: repeated

Then we download some example data on house prices.

In [None]:
isfile("housing.data") ||
  download("https://raw.githubusercontent.com/MikeInnes/notebooks/master/housing.data",
           "housing.data")

Next we load the data, which is as easy as calling the `readdlm` function. We transpose the result (note the trailing `'`) so that each column represents an individual property. The last row holds the price of the property, in thousands of dollars (this is an old data set).

In [None]:
rawdata = readdlm("housing.data")'

The price is what we actually want to predict, so we split that out as our target output `y`, while the rest of the data is used as an input `x`.

In [None]:
x = rawdata[1:13,:]
y = rawdata[14:14,:]
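Why `14:14` rather than just `14`? Indexing with a one-element range keeps `y` as a 1×N matrix, which is the same shape as the model's output. A plain integer index returns a vector instead. Here is a quick check on a small toy matrix (the names are hypothetical, plain Julia):

```julia
# Toy 3×4 matrix standing in for `rawdata` (which has 14 rows).
A_toy = [1 2 3 4; 5 6 7 8; 9 10 11 12]

row_matrix = A_toy[3:3, :]  # range index keeps the row dimension: a 1×4 Matrix
row_vector = A_toy[3, :]    # scalar index drops it: a 4-element Vector

println(size(row_matrix))   # (1, 4)
println(size(row_vector))   # (4,)
```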

We want to *normalise* the data, such that all features have mean $0$ and standard deviation $1$. This ensures that features don't have more influence on the model just because they are numerically larger.

In [None]:
x = (x .- mean(x,2)) ./ std(x,2)
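The normalisation is plain broadcasting. `mean(x,2)` and `std(x,2)` produce one value per feature as a column, and `.-` and `./` stretch that column across every sample. The notebook uses the older (pre-1.0) positional form `mean(x,2)`. The sketch below assumes Julia 1.x, where these functions live in the `Statistics` standard library and take a `dims` keyword. The toy data is made up.

```julia
using Statistics  # on Julia 1.x, mean and std come from this standard library

# Toy data: 2 features (rows) × 5 samples (columns) on very different scales.
X_toy = [1.0 2.0 3.0 4.0 5.0; 100.0 300.0 200.0 500.0 400.0]

# dims=2 reduces along each row, giving a 2×1 column that broadcasts over all samples.
Xn_toy = (X_toy .- mean(X_toy; dims=2)) ./ std(X_toy; dims=2)

println(mean(Xn_toy; dims=2))  # each entry is (numerically) 0
println(std(Xn_toy; dims=2))   # each entry is (numerically) 1
```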

Our model is simply an affine transform: we multiply by some weight matrix `W` and add a bias `b`. The `param` function tells Flux that these should be *trainable parameters* that we can tweak in order to get better predictions.

In [None]:
W = param(randn(1, 13))
b = param(randn(1))
model(x) = W*x .+ b
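It helps to check the shapes. With 13 features per column, `W*x` maps each house to a single number, and `.+ b` adds the same bias to every prediction. The sketch below uses plain random arrays with hypothetical `_toy` names, chosen so they don't overwrite the notebook's `W`, `b` and `model`.

```julia
# Plain arrays standing in for the notebook's W, b and x (no Flux tracking here).
W_toy = randn(1, 13)      # 1×13 weights: one per feature
b_toy = randn(1)          # length-1 bias
x_aff = randn(13, 5)      # 13 features × 5 houses

# (1×13) * (13×5) gives a 1×5 matrix; `.+` broadcasts the single bias across columns.
affine_toy(x) = W_toy * x .+ b_toy

yhat_toy = affine_toy(x_aff)
println(size(yhat_toy))   # (1, 5): one predicted price per house
```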

We can immediately apply our model to our input data to see what predictions it generates, compared to the ground truth `y`.

In [None]:
model(x)

In [None]:
y

Not great predictions! We can quantify exactly how bad they are with a *loss function* that computes the distance between `model(x)` and `y`.

In [None]:
loss(x, y) = Flux.mse(model(x), y)
loss(x, y)
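A mean-squared-error loss averages the squared differences between predictions and targets, $\frac{1}{N}\sum_i (\hat y_i - y_i)^2$, which is the quantity `Flux.mse` computes. Here is a hand-written equivalent (the helper name `mse_sketch` is hypothetical) on a tiny example:

```julia
# Hypothetical helper: mean of the squared differences between predictions and targets.
mse_sketch(ŷ, y) = sum((ŷ .- y) .^ 2) / length(y)

preds_toy   = [2.0 4.0 6.0]   # 1×3 predictions
targets_toy = [1.0 4.0 8.0]   # 1×3 ground truth
println(mse_sketch(preds_toy, targets_toy))  # (1 + 0 + 4) / 3 ≈ 1.667
```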

If we call `Flux.train!`, Flux will try to reduce this loss value by adjusting the parameters `W` and `b`.

In [None]:
opt = SGD([W, b], 0.1)
Flux.train!(loss, [(x, y)], opt)
@show loss(x, y)
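Roughly speaking, each `train!` step with `SGD` computes the gradient of the loss with respect to `W` and `b`. It then moves both a small step against that gradient, scaled by the learning rate (`0.1` here). Flux obtains those gradients automatically, but for this linear model with a mean-squared-error loss we can derive them by hand. The sketch below is plain Julia on made-up data with hypothetical `_gd` names. It is not Flux's actual implementation.

```julia
using Random

rng_gd = MersenneTwister(0)
x_gd = randn(rng_gd, 3, 20)               # toy inputs: 3 features × 20 samples
y_gd = [1.0 -2.0 0.5] * x_gd .+ 0.3       # toy targets from a known linear rule

W_gd = zeros(1, 3)                        # parameters to learn
b_gd = zeros(1)
η_gd = 0.1                                # learning rate, as in SGD([W, b], 0.1)

loss_gd() = sum((W_gd * x_gd .+ b_gd .- y_gd) .^ 2) / length(y_gd)

# One gradient-descent step with hand-derived gradients of the mean squared error:
# ∂L/∂W = (2/N) * r * x'   and   ∂L/∂b = (2/N) * sum(r),   where r = W*x .+ b .- y.
function gd_step!()
    N = length(y_gd)
    r = W_gd * x_gd .+ b_gd .- y_gd       # 1×N residuals
    W_gd .-= η_gd .* (2 / N) .* (r * x_gd')
    b_gd .-= η_gd .* (2 / N) .* sum(r)
    return nothing
end

before_gd = loss_gd()
gd_step!()
after_gd = loss_gd()
println((before_gd, after_gd))  # the second value should be smaller
```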

We can see that the loss has already decreased a little. We can train on `x` and `y` multiple times, and supply a callback that shows the loss at each iteration.

In [None]:
Flux.train!(loss, repeated((x, y), 10), opt,
            cb = () -> @show loss(x, y))
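`repeated((x, y), 10)` is just an iterator that yields the same `(x, y)` tuple ten times. `train!` takes one optimisation step per element and calls `cb` as it goes. Below is a simplified loop with that shape (the name `train_sketch!` is hypothetical), applied to a one-parameter toy problem:

```julia
using Base.Iterators: repeated

# Hypothetical loop with the same shape as train!: one step per data item, then the callback.
function train_sketch!(step!, data; cb = () -> nothing)
    for batch in data
        step!(batch...)
        cb()
    end
    return nothing
end

# Toy problem: gradient descent on (w - target)^2 for a single scalar w.
w_toy = Ref(0.0)
toy_step!(target, lr) = (w_toy[] -= lr * 2 * (w_toy[] - target))

history_toy = Float64[]
train_sketch!(toy_step!, repeated((3.0, 0.1), 10);
              cb = () -> push!(history_toy, (w_toy[] - 3.0)^2))
println(length(history_toy))  # 10: the callback ran once per step
```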

The loss goes down pretty quickly. Now let's see our predictions again.

In [None]:
model(x)

In [None]:
y

You can see that the predictions are a lot closer, and there's some correlation between the predicted prices and the real ones. If you keep running `train!`, you'll see the predictions keep improving until training converges.

## MNIST

Here's a slightly harder example: classifying handwritten MNIST digits. We can grab the dataset from the `MNIST` package.

In [None]:
using Flux, MNIST, Images
using Base.Iterators: repeated
x, y = traindata()

Each MNIST digit is represented as a column in a large matrix. We can grab a column, reshape it into a 28×28 square, and then convert the numbers to colour values to see what it looks like.

In [None]:
Gray.(reshape(x[:,5], 28, 28)./256)
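Julia arrays are stored column-major, so `reshape` fills the new matrix one column at a time. The first 28 pixels of the 784-element column become the first column of the 28×28 image, the next 28 become the second, and so on. A tiny example (hypothetical names):

```julia
v_toy = collect(1:6)
M_toy = reshape(v_toy, 2, 3)  # fills column 1 with 1,2; column 2 with 3,4; column 3 with 5,6
println(M_toy)                # [1 3 5; 2 4 6]
```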

We can also visualise many images at once; here are the first 1000, one per column.

In [None]:
Gray.(x[:, 1:1000]/256)

The target output `y` is a list of digits; you'll recognise the fifth element, `9`, as being the 9 that's drawn above.

In [None]:
y[1:5]

However, this is not a good format for our machine learning model, which needs to predict the probability that an image is each of the ten digits. To do this we put the outputs in "one-hot" form, where each column is an image and each row is a given class. For example, the first image is a $5$. Since the rows run over the classes `0:9`, the sixth row of the first column is `true` and every other row is `false`.

In [None]:
y = Flux.onehotbatch(y, 0:9)
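Conceptually, one-hot encoding just compares every label with every class. The plain-Julia sketch below illustrates the idea. The helper `onehot_sketch` is hypothetical and returns an ordinary `BitMatrix` rather than Flux's specialised one-hot matrix type. It also confirms that the label `5` lands in row 6 when the classes are `0:9`.

```julia
# Broadcasting a column of classes against a row of labels gives a classes × samples Bool matrix.
onehot_sketch(labels, classes) = classes .== permutedims(labels)

Y_toy = onehot_sketch([5, 0, 4], 0:9)
println(size(Y_toy))             # (10, 3)
println(findfirst(Y_toy[:, 1]))  # 6: class 5 is the sixth entry of 0:9

# Decoding: the row of each `true`, shifted back by 1 to recover the digit.
decoded_toy = [findfirst(Y_toy[:, j]) - 1 for j in 1:size(Y_toy, 2)]
println(decoded_toy)             # [5, 0, 4]
```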

We can also visualise `y` in one-hot form.

In [None]:
Gray{N0f8}.(y[:,1:100])

Here's our model, a multi-layer perceptron.

In [None]:
m = Chain(
  Dense(28^2, 32, relu),
  Dense(32, 10),
  softmax)
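Written out by hand, the `Chain` amounts to two affine maps with a ReLU in between and a softmax at the end, applied column by column. The sketch below uses hypothetical `relu_sketch`/`softmax_sketch` helpers and randomly initialised weights. The weights are scaled by hand here, which is not necessarily Flux's default initialisation. As a result, the outputs are meaningless apart from their shape and the fact that each column sums to 1.

```julia
using Random

relu_sketch(z) = max(z, zero(z))

# Column-wise softmax; subtracting each column's maximum avoids overflow in exp.
function softmax_sketch(z)
    e = exp.(z .- maximum(z; dims=1))
    return e ./ sum(e; dims=1)
end

rng_mlp = MersenneTwister(1)
W1_mlp, b1_mlp = randn(rng_mlp, 32, 28^2) ./ 28, zeros(32)        # 784 → 32
W2_mlp, b2_mlp = randn(rng_mlp, 10, 32) ./ sqrt(32), zeros(10)    # 32 → 10

mlp_sketch(x) = softmax_sketch(W2_mlp * relu_sketch.(W1_mlp * x .+ b1_mlp) .+ b2_mlp)

x_mlp = rand(rng_mlp, 28^2, 4)    # four fake "images" with pixel values in [0, 1)
probs_mlp = mlp_sketch(x_mlp)
println(size(probs_mlp))          # (10, 4): ten class probabilities per image
println(sum(probs_mlp; dims=1))   # each column sums to 1 (up to rounding)
```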

Calling the model on our input images `x` gives us its predictions: one column of ten class probabilities per image.

In [None]:
m(x)

In [None]:
Gray.(m(x).data)[:,1:100]

Compared to the `y` above, our model is assigning low probabilities to every class for every image; it can't make any confident predictions without being trained. Once again we can quantify how far we are off target with a loss function.

In [None]:
loss(x, y) = Flux.mse(m(x), y)
loss(x, y)

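Now we train for 100 iterations over the full dataset. Wrapping the callback in `Flux.throttle` means it redraws the first 100 predictions at most once every 5 seconds.

In [None]: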
dataset = repeated((x, y), 100)
evalcb = () -> display(Gray.(m(x).data)[:,1:100])
opt = SGD(params(m), 0.1)
Flux.train!(loss, dataset, opt, cb = Flux.throttle(evalcb, 5))
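`Flux.throttle(evalcb, 5)` wraps the callback so that it runs at most once every 5 seconds, however often `train!` calls it. This keeps the redrawing from dominating the training time. Below is a simplified sketch of the idea. The name `throttle_sketch` is hypothetical, and Flux's own version may differ in details such as whether a trailing call is made.

```julia
# Hypothetical throttle: wrap `f` so it runs at most once every `timeout` seconds.
# Leading-edge only: the first call runs immediately, later calls inside the window are dropped.
function throttle_sketch(f, timeout)
    last_call = -Inf
    return function (args...)
        now_t = time()
        if now_t - last_call >= timeout
            last_call = now_t
            f(args...)
        end
        return nothing
    end
end

calls_toy = Ref(0)
ping_toy = throttle_sketch(() -> calls_toy[] += 1, 5)
for _ in 1:1000
    ping_toy()
end
println(calls_toy[])  # 1: every call after the first landed inside the 5-second window
```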

You can easily see the predictions gradually getting more confident, and they're looking very similar to the ground truth.

In [None]:
Gray.(y)[:,1:100]

The predicted classes, while noisier, are very close to the true values; our model has learned how to tell a 9 from a 3 and so on. You can also dig into the cases where the model is less confident. For example, it isn't sure whether image 34 is a 7 or a 9. Pulling that image out, it's easy to see why.

In [None]:
Gray.(m(x).data)[:,32:36]

In [None]:
Gray.(reshape(x[:,34], 28, 28)./256)
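To find uncertain images like this one systematically, one option is to take each column's largest probability as the model's confidence and then look for the smallest. The sketch below runs that on a made-up 10×3 probability matrix (hypothetical names). In the notebook, the matrix would come from `m(x).data`.

```julia
# Toy 10×3 matrix of predicted probabilities (made-up values; each column sums to 1).
probs_toy = zeros(10, 3)
probs_toy[2, 1] = 0.95; probs_toy[1, 1] = 0.05    # confident: digit 1
probs_toy[8, 2] = 0.90; probs_toy[4, 2] = 0.10    # confident: digit 7
probs_toy[8, 3] = 0.48; probs_toy[10, 3] = 0.52   # unsure between 7 and 9

# Rows 1:10 correspond to digits 0:9, hence the `- 1`.
predicted_toy  = [argmax(probs_toy[:, j]) - 1 for j in 1:size(probs_toy, 2)]
confidence_toy = [maximum(probs_toy[:, j]) for j in 1:size(probs_toy, 2)]
least_sure_toy = argmin(confidence_toy)

println(predicted_toy)   # [1, 7, 9]
println(least_sure_toy)  # 3
```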