# Getting Started
-------

In this notebook we will implement a simple convolutional neural net using Flux.jl to predict digits in the famous MNIST data set. This is a canonical neural net example, so it's not particularly novel, but it's a welcome presence in any data scientist's GitHub.

Additionally, we will pair the Julia code examples with a Python example using PyTorch. What stands out most in this comparison are the similarities. In almost all languages, neural nets are designed and run in very similar ways, both because libraries borrow things like naming conventions from each other to ease adoption, and because neural nets share a common structure that can be exploited through generalizable code architecture. Here we will explore those structural similarities, along with the differences.

---




Starting off, we will import the necessary packages.  If you don't already have the packages listed in your Julia or Python environments, uncomment the commented code before running.

For this example, we are using the Julia Flux.jl package, which is the most popular package for working with neural networks in Julia. This package is great because of its Zygote dependency, which serves as the package's automatic differentiation engine. It also provides many common neural network layer types, activation functions, and loss functions.
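To make "automatic differentiation engine" concrete, here is a toy reverse-mode AD in plain Python. It only illustrates the idea: Zygote differentiates by transforming Julia source code, whereas this sketch records a graph of objects at run time, and the `Var` class is a hypothetical name invented for this example.

```python
class Var:
    """A scalar that remembers how it was computed so gradients can flow back."""

    def __init__(self, value, parents=()):
        self.value = value
        self.grad = 0.0
        self.parents = parents  # pairs of (parent Var, local derivative)

    def __add__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    def backward(self):
        # Topologically order the graph, then apply the chain rule in reverse.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent, _ in node.parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            for parent, local in node.parents:
                parent.grad += local * node.grad


x, y = Var(3.0), Var(4.0)
f = x * y + x          # f = xy + x, so df/dx = y + 1 and df/dy = x
f.backward()
print(f.value, x.grad, y.grad)  # 15.0 5.0 3.0
```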

We import Parameters.jl because its `@with_kw` macro lets us define structs with default field values and an automatically generated keyword constructor, which keeps the code readable. (Base Julia offers a more limited version of this through `Base.@kwdef`.) I will point out where Parameters is used when the time comes and discuss it further.
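For readers coming from Python, the closest standard-library analog to `@with_kw` is probably `dataclasses.dataclass`, which also builds a keyword constructor from field defaults. The sketch below mirrors the fields of the `Args` struct used later; renaming `η` to `eta` and storing `device` as a plain string (Flux stores the `gpu` or `cpu` function itself) are simplifying assumptions.

```python
from dataclasses import dataclass


@dataclass
class Args:
    """Rough Python analog of the Julia `Args` struct built with @with_kw."""
    eta: float = 3e-3             # learning rate (η in the Julia version)
    epochs: int = 1
    batch_size: int = 50
    savepath: str = "./"
    device: str = "cpu"           # simplified: a device name instead of a function
    imgsize: tuple = (28, 28, 1)
    nclasses: int = 10


# Override only the fields you care about; the rest keep their defaults.
args = Args(epochs=5, batch_size=1000)
print(args.epochs, args.batch_size, args.eta)  # 5 1000 0.003
```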

We import the Plots.jl package to handle any plotting we may need. Plotting in Julia can feel strange at first because so much sophistication hides behind a simple interface: thanks to Julia's multiple dispatch, calling `plot()` on different types can automatically generate plots that are relevant to each type. I won't cover it much here, but it's very nice. One downside is that Plots.jl lacks some of the customizability of a package like Python's Matplotlib, though for almost all plots you'll likely find Plots.jl an improvement.

Next we import CUDA.jl. CUDA.jl merged several previously separate CUDA-related packages (CuArrays.jl, CUDAapi.jl, and others) into one, and it provides everything needed to move Julia arrays onto the GPU. The beauty of multiple dispatch is that once our arrays have a CUDA array type (`CuArray`), the same code executes on the GPU without GPU-specific functions; the only extra step is moving the data and the model over.
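Python has no multiple dispatch built into the language, but `functools.singledispatch` offers a single-argument version that conveys the idea: the calling code stays the same, and the argument's type selects the implementation. The `CpuArray` and `GpuArray` classes here are hypothetical stand-ins, not real GPU types; in Julia, CUDA.jl's `CuArray` plays the role of the second one.

```python
from functools import singledispatch


class CpuArray(list):
    """Hypothetical stand-in for an ordinary host array."""


class GpuArray(list):
    """Hypothetical stand-in for a device array (think CUDA.jl's CuArray)."""


@singledispatch
def relu(x):
    raise TypeError(f"no relu method for {type(x).__name__}")


@relu.register
def _(x: CpuArray):
    return CpuArray(max(v, 0.0) for v in x)


@relu.register
def _(x: GpuArray):
    # A real implementation would launch a GPU kernel here.
    return GpuArray(max(v, 0.0) for v in x)


def layer(x):
    # Written once; which relu runs depends only on the type of x.
    return relu(x)


print(type(layer(CpuArray([-1.0, 2.0]))).__name__)  # CpuArray
print(type(layer(GpuArray([-1.0, 2.0]))).__name__)  # GpuArray
```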

Finally, we import MLDatasets, which provides common benchmark datasets such as MNIST. To split the data into smaller batches and feed them lazily to our network as required, we will use Flux's `DataLoader`.
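The batching idea can be sketched with a plain Python generator. This is not how `Flux.Data.DataLoader` or PyTorch's `DataLoader` are implemented; it only shows shuffling the indices once per pass and yielding one mini-batch at a time, and `iterate_batches` is a hypothetical name.

```python
import random


def iterate_batches(xs, ys, batch_size, shuffle=False, seed=None):
    """Yield (x_batch, y_batch) pairs lazily, one mini-batch at a time."""
    indices = list(range(len(xs)))
    if shuffle:
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        chunk = indices[start:start + batch_size]  # the last batch may be smaller
        yield [xs[i] for i in chunk], [ys[i] for i in chunk]


xs = list(range(10))
ys = [v % 2 for v in xs]
batches = list(iterate_batches(xs, ys, batch_size=4, shuffle=True, seed=0))
print([len(bx) for bx, _ in batches])  # [4, 4, 2]
```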

In [7]:
# ################## JULIA CODE #########################


# using Pkg
# Pkg.add("Flux")
# Pkg.add("MLDatasets")
# Pkg.add("Parameters")
# Pkg.add("Plots")
# Pkg.add("CUDA")


using Flux
using Parameters
using Plots
using MLDatasets
using CUDA: has_cuda


# This little bit of code is used in all of the Model Zoo examples as a way to test whether you have
# CUDA GPU support. If so, CUDA is imported and we will process our model on the GPU.
if has_cuda()
    @info "CUDA is on"
    import CUDA
    CUDA.allowscalar(false)
    println("CUDA ACTIVATED")
end


# #######################################################

CUDA ACTIVATED


┌ Info: CUDA is on
└ @ Main In[7]:16




Uncomment the code below to install the torch and torchvision packages if you don't already have them. Notice that Julia has a single built-in package manager, so missing Julia packages can be installed quite easily, whereas in Python you first have to know whether you are using conda or pip.

In [28]:
################## PYTHON CODE ########################


# import sys
# !conda install --yes --prefix {sys.prefix} -c pytorch pytorch   # uncomment if in a conda environment
# !conda install --yes --prefix {sys.prefix} -c pytorch torchvision # uncomment if in a conda environment

# !{sys.executable} -m pip install torch # uncomment if not in a conda environment
# !{sys.executable} -m pip install torchvision # uncomment if not in a conda environment

import torch
from torch import nn
from torch.utils.data import DataLoader  # the class is DataLoader; lowercase `dataloader` is a module
from torchvision import datasets, transforms

#######################################################

One thing you will notice right away is how long the Julia packages take to load. This is because Julia compiles code the first time it is loaded and run. That compile time is behind the common "time to first plot" complaint in Julia: Plots.jl can take 30+ seconds to compile on some computers. The benefit of that compilation, though, is fast, interactive plotting once everything is loaded, which in my opinion puts Python and Matplotlib to shame.

In [11]:
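import os
from torchvision import datasets, transforms

# Replace with your directory of choice; os.path expands "~" and builds a valid path on any OS.
download_dir = os.path.join(os.path.expanduser("~"), "GitHub", "JuliaLearning", "Data")
# ToTensor() converts the PIL images into tensors so that a DataLoader can batch them.
train = datasets.MNIST(download_dir, train=True, download=True, transform=transforms.ToTensor())
test = datasets.MNIST(download_dir, train=False, download=True, transform=transforms.ToTensor())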

In [31]:
train.data.shape

torch.Size([60000, 28, 28])

In [16]:
learning_rate = 3e-3
epochs = 5
batch_size = 100
imgsize = (1, 28, 28)  # PyTorch is channels-first (C, H, W); the Julia version uses (28, 28, 1)
nclasses = 10

The frustrating thing about learning deep learning is that the architectures can seem arbitrary. Why ReLU? Why a layer of that particular width? It turns out that people mostly try things out and report how they perform, and whatever performs well gets copied. Sometimes people tune certain hyperparameters using cross-validation and LOTS of compute. It's a mix of empirical results on particular datasets and heuristic approaches.

Take, for example, the Flux.jl model zoo code for a convolutional NN on MNIST. Why did they choose their layers the way they did? Why a max pool after every convolution? There's no explanation; it just works.

In [None]:
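from torch.utils.data import DataLoader
train_loader = DataLoader(train, batch_size=batch_size, shuffle=True)  # reshuffled every epoch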

In [163]:
################# JULIA CODE #########################

# Let's create a struct to store our model arguments.
# This will help organize our code and allow arguments to be passed into our functions in a clean and readable manner.

# The @with_kw macro (from Parameters.jl) automatically creates a keyword constructor with these defaults
@with_kw mutable struct Args
    η::Float64 = 3e-3
    epochs::Int = 1
    batch_size::Int = 50
    savepath::String = "./"
    device::Function = has_cuda() ? gpu : cpu # if CUDA is available, use it!
    imgsize::Tuple = (28,28,1)
    nclasses::Int = 10
end

#######################################################

Args


I finally figured out why I was getting errors! You have to be careful with the dimensions of the dataset. One of the model zoo examples loads data using a DataLoader with MLDatasets.MNIST, while the other uses Flux.Data.MNIST, and the two return data with different dimensions. The examples also flatten the data.

After undoing those transformations and examining the dimensions of the data, all that was required was reshaping the 28x28xN dataset into a 28x28x1xN one. Flux's `Conv` layers expect 4-D input ordered as width x height x channels x batch, so the additional singleton dimension is the single input channel that the first layer, `Conv((3, 3), 1=>16, ...)`, expects. Once the data matched the model parameters' dimensions, everything began to work again.

Dimensions matter; I won't forget it. I'd been debugging this model for a full day!
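Why is that reshape safe? Inserting a size-1 axis changes only the shape metadata; no element moves in memory. The sketch below checks this in plain Python by computing column-major (Julia-order) linear offsets for both shapes, using small stand-in sizes; `colmajor_index` is a hypothetical helper written for this illustration.

```python
from itertools import product


def colmajor_index(idx, shape):
    """Zero-based linear offset of a multi-index in column-major (Julia) order."""
    offset, stride = 0, 1
    for i, n in zip(idx, shape):
        offset += i * stride
        stride *= n
    return offset


W, H, N = 4, 3, 5                 # small stand-ins for 28, 28, 60000
old_shape = (W, H, N)             # like xtrain straight from MLDatasets
new_shape = (W, H, 1, N)          # width x height x channels x batch

for w, h, n in product(range(W), range(H), range(N)):
    # Element (w, h, n) sits at the same memory offset as (w, h, 0, n).
    assert colmajor_index((w, h, n), old_shape) == colmajor_index((w, h, 0, n), new_shape)
print("reshape to 28x28x1xN leaves every element in place")
```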

In [164]:
# Define a function to get data
args = Args()  # an instance with the default values (not the Args type itself)
function getdata(args)
    # Loading Dataset
    xtrain, ytrain = MLDatasets.MNIST.traindata(Float32)
    xtest, ytest = MLDatasets.MNIST.testdata(Float32)

    # Reshape data from 28x28xN to 28x28x1xN; note that this is only needed for the convolutional model
    xtrain = reshape(xtrain,size(xtrain)[1:2]...,1,size(xtrain)[3])
    xtest = reshape(xtest,size(xtest)[1:2]...,1,size(xtest)[3])

    # One-hot-encode the labels
    ytrain, ytest = Flux.onehotbatch(ytrain, 0:9), Flux.onehotbatch(ytest, 0:9)

    # Batching
    train_data = Flux.Data.DataLoader(xtrain, ytrain, batchsize=args.batch_size, shuffle=true)
    test_data = Flux.Data.DataLoader(xtest, ytest, batchsize=args.batch_size)
    return train_data, test_data
end

getdata (generic function with 1 method)
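`Flux.onehotbatch(y, 0:9)` turns each label into one column of a 10xN matrix containing a single 1, and `Flux.onecold` (used later in `accuracy`) goes back by taking each column's argmax. Below is a plain-Python sketch of that round trip, with nested lists standing in for Flux's one-hot matrix type. One difference: this sketch's `onecold` returns the label itself, while Flux's `onecold` without a label list returns 1-based positions; `accuracy` compares two such results, so the offset cancels out there.

```python
def onehotbatch(labels, classes):
    """Return a len(classes) x len(labels) matrix; column j one-hot encodes labels[j]."""
    return [[1 if label == c else 0 for label in labels] for c in classes]


def onecold(matrix, classes):
    """Inverse of onehotbatch: pick the class with the largest entry in each column."""
    result = []
    for j in range(len(matrix[0])):
        column = [row[j] for row in matrix]
        result.append(classes[column.index(max(column))])
    return result


classes = list(range(10))    # mirrors 0:9 in the Julia code
y = [7, 2, 1, 0]
Y = onehotbatch(y, classes)
print(len(Y), len(Y[0]))     # 10 4  (classes x samples, like Flux)
print(onecold(Y, classes))   # [7, 2, 1, 0]
```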

In [68]:
function build_conv_model(args)
    cnn_output_size = Int.(floor.([args.imgsize[1]/8,args.imgsize[2]/8,32]))
    return Chain(
    # First convolution, operating upon a 28x28 image
    Conv((3, 3), 1=>16, pad=(1,1), relu),
    MaxPool((2,2)),

    # Second convolution, operating upon a 14x14 image
    Conv((3, 3), 16=>32, pad=(1,1), relu),
    MaxPool((2,2)),

    # Third convolution, operating upon a 7x7 image
    Conv((3, 3), 32=>32, pad=(1,1), relu),
    MaxPool((2,2)),

    # Reshape 3d tensor into a 2d one using `Flux.flatten`, at this point it should be (3, 3, 32, N)
    flatten,
    Dense(prod(cnn_output_size), 10))
end



build_conv_model (generic function with 1 method)
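The `/8` in `cnn_output_size` comes from the three 2x2 max pools: a 3x3 convolution with `pad=(1,1)` and stride 1 keeps the spatial size, and each pool halves it, rounding down (28 to 14 to 7 to 3). The standard output-size formula makes this explicit. This Python sketch uses hypothetical helper names and reproduces the 288 inputs of the final `Dense` layer that appear in the parameter sizes below.

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Spatial output size of a convolution or pooling window along one axis."""
    return (size + 2 * pad - kernel) // stride + 1


def cnn_output_size(width, height, channels_out=32, n_blocks=3):
    """Track (width, height) through n_blocks of Conv(3x3, pad=1) + MaxPool(2x2)."""
    for _ in range(n_blocks):
        # The convolution keeps the size: (n + 2 - 3) // 1 + 1 == n
        width, height = conv_out(width, 3, pad=1), conv_out(height, 3, pad=1)
        # The pool (window 2, stride 2) halves it, rounding down
        width, height = conv_out(width, 2, stride=2), conv_out(height, 2, stride=2)
    return width, height, channels_out


w, h, c = cnn_output_size(28, 28)
print((w, h, c), w * h * c)  # (3, 3, 32) 288
```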

In [161]:
m = build_conv_model(args)
for row in params(m)
    @show size(row)
end

size(row) = (3, 3, 1, 16)
size(row) = (16,)
size(row) = (3, 3, 16, 32)
size(row) = (32,)
size(row) = (3, 3, 32, 32)
size(row) = (32,)
size(row) = (10, 288)
size(row) = (10,)


In [158]:
function loss_all(dataloader, model)
    l = 0f0
    for (x,y) in dataloader

        l += Flux.Losses.logitcrossentropy(model(x), y)
    end
    l/length(dataloader)
end

function accuracy(data_loader, model)
    acc = 0
    for (x,y) in data_loader
        acc += sum(Flux.onecold(cpu(model(x))) .== Flux.onecold(cpu(y)))*1 / size(x,4)
    end
    acc/length(data_loader)
end

accuracy (generic function with 1 method)
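`Flux.Losses.logitcrossentropy` applies a log-softmax to the raw model outputs (the logits) and averages the cross-entropy over the batch, which is why the model ends in a plain `Dense` layer with no softmax. Here is a plain-Python sketch of the same computation plus the batch accuracy used in `accuracy`; the function names are hypothetical, and samples are stored as rows for readability, whereas Flux stores them as columns.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax of one sample's logits."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]


def logit_cross_entropy(batch_logits, batch_onehot):
    """Mean over samples of -sum(y * log_softmax(logits))."""
    losses = [
        -sum(y * lp for y, lp in zip(onehot, log_softmax(logits)))
        for logits, onehot in zip(batch_logits, batch_onehot)
    ]
    return sum(losses) / len(losses)


def argmax(values):
    """Index of the largest entry (the first one on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def batch_accuracy(batch_logits, batch_onehot):
    """Fraction of samples whose predicted class matches the labelled class."""
    hits = sum(argmax(p) == argmax(y) for p, y in zip(batch_logits, batch_onehot))
    return hits / len(batch_logits)


logits = [[2.0, -1.0], [0.5, 1.5]]  # raw scores: 2 samples, 2 classes
labels = [[1, 0], [1, 0]]           # both samples belong to class 0
print(logit_cross_entropy(logits, labels))
print(batch_accuracy(logits, labels))  # 0.5
```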

In [160]:
args = Args(epochs=5, batch_size=1000)
function train(args)

    # Construct model
    m = build_conv_model(args)
    train_data,test_data = getdata(args)
    train_data = args.device.(train_data)
    test_data = args.device.(test_data)
    m = args.device(m)
    loss(x,y) = Flux.Losses.logitcrossentropy(m(x), y)

    ## Training
    evalcb = () -> @show(loss_all(train_data, m))
    opt = ADAM(args.η)

    Flux.@epochs args.epochs Flux.train!(loss, params(m), train_data, opt, cb = evalcb)

    @show accuracy(train_data, m)

    @show accuracy(test_data, m)
end

train(args)


┌ Info: Epoch 1
└ @ Main C:\Users\Bleep\.julia\packages\Flux\05b38\src\optimise\train.jl:114


loss_all(train_data, m) = 2.2776525f0
loss_all(train_data, m) = 2.248435f0
loss_all(train_data, m) = 2.2152865f0
loss_all(train_data, m) = 2.1728847f0
loss_all(train_data, m) = 2.1165485f0
loss_all(train_data, m) = 2.0444007f0
loss_all(train_data, m) = 1.9534538f0
loss_all(train_data, m) = 1.839233f0
loss_all(train_data, m) = 1.7029536f0
loss_all(train_data, m) = 1.5498775f0
loss_all(train_data, m) = 1.3873558f0
loss_all(train_data, m) = 1.2115502f0
loss_all(train_data, m) = 1.0439117f0
loss_all(train_data, m) = 0.8958241f0
loss_all(train_data, m) = 0.7873061f0
loss_all(train_data, m) = 0.69131976f0
loss_all(train_data, m) = 0.62354285f0
loss_all(train_data, m) = 0.5629688f0
loss_all(train_data, m) = 0.5218593f0
loss_all(train_data, m) = 0.49033296f0
loss_all(train_data, m) = 0.4573032f0
loss_all(train_data, m) = 0.44973817f0
loss_all(train_data, m) = 0.41917577f0
loss_all(train_data, m) = 0.40969753f0
loss_all(train_data, m) = 0.366155f0
loss_all(train_data, m) = 0.3984281f0

┌ Info: Epoch 2
└ @ Main C:\Users\Bleep\.julia\packages\Flux\05b38\src\optimise\train.jl:114


loss_all(train_data, m) = 0.17277148f0
loss_all(train_data, m) = 0.17605901f0
loss_all(train_data, m) = 0.16929227f0
loss_all(train_data, m) = 0.16183974f0
loss_all(train_data, m) = 0.16953257f0
loss_all(train_data, m) = 0.15832387f0
loss_all(train_data, m) = 0.14549473f0
loss_all(train_data, m) = 0.15611799f0
loss_all(train_data, m) = 0.16809325f0
loss_all(train_data, m) = 0.15138428f0
loss_all(train_data, m) = 0.1408373f0
loss_all(train_data, m) = 0.14237304f0
loss_all(train_data, m) = 0.14286514f0
loss_all(train_data, m) = 0.14565895f0
loss_all(train_data, m) = 0.14595693f0
loss_all(train_data, m) = 0.13978532f0
loss_all(train_data, m) = 0.13584372f0
loss_all(train_data, m) = 0.1364515f0
loss_all(train_data, m) = 0.1331088f0
loss_all(train_data, m) = 0.12994474f0
loss_all(train_data, m) = 0.13306803f0
loss_all(train_data, m) = 0.13463661f0
loss_all(train_data, m) = 0.12684658f0
loss_all(train_data, m) = 0.12401478f0
loss_all(train_data, m) = 0.12391225f0

┌ Info: Epoch 3
└ @ Main C:\Users\Bleep\.julia\packages\Flux\05b38\src\optimise\train.jl:114


loss_all(train_data, m) = 0.099982806f0
loss_all(train_data, m) = 0.108037524f0
loss_all(train_data, m) = 0.113099895f0
loss_all(train_data, m) = 0.10078212f0
loss_all(train_data, m) = 0.101065464f0
loss_all(train_data, m) = 0.10426667f0
loss_all(train_data, m) = 0.09523626f0
loss_all(train_data, m) = 0.09512427f0
loss_all(train_data, m) = 0.10730527f0
loss_all(train_data, m) = 0.09919824f0
loss_all(train_data, m) = 0.08822267f0
loss_all(train_data, m) = 0.09600993f0
loss_all(train_data, m) = 0.09792208f0
loss_all(train_data, m) = 0.09450543f0
loss_all(train_data, m) = 0.094810806f0
loss_all(train_data, m) = 0.09030684f0
loss_all(train_data, m) = 0.08977219f0
loss_all(train_data, m) = 0.09186118f0
loss_all(train_data, m) = 0.095904894f0
loss_all(train_data, m) = 0.090484954f0
loss_all(train_data, m) = 0.086523674f0
loss_all(train_data, m) = 0.09238333f0
loss_all(train_data, m) = 0.091473594f0
loss_all(train_data, m) = 0.088602796f0
loss_all(train_data, m) = 0.087422095f0

┌ Info: Epoch 4
└ @ Main C:\Users\Bleep\.julia\packages\Flux\05b38\src\optimise\train.jl:114


loss_all(train_data, m) = 0.07572326f0
loss_all(train_data, m) = 0.080911025f0
loss_all(train_data, m) = 0.08440767f0
loss_all(train_data, m) = 0.0791189f0
loss_all(train_data, m) = 0.0748353f0
loss_all(train_data, m) = 0.077109165f0
loss_all(train_data, m) = 0.07513511f0
loss_all(train_data, m) = 0.07174246f0
loss_all(train_data, m) = 0.07603264f0
loss_all(train_data, m) = 0.07951641f0
loss_all(train_data, m) = 0.07171177f0
loss_all(train_data, m) = 0.0679686f0
loss_all(train_data, m) = 0.070156656f0
loss_all(train_data, m) = 0.074707836f0
loss_all(train_data, m) = 0.076179735f0
loss_all(train_data, m) = 0.071818344f0
loss_all(train_data, m) = 0.07066049f0
loss_all(train_data, m) = 0.07075668f0
loss_all(train_data, m) = 0.07220874f0
loss_all(train_data, m) = 0.074729f0
loss_all(train_data, m) = 0.07058056f0
loss_all(train_data, m) = 0.068514705f0
loss_all(train_data, m) = 0.06803015f0
loss_all(train_data, m) = 0.07101832f0
loss_all(train_data, m) = 0.07264062f0

┌ Info: Epoch 5
└ @ Main C:\Users\Bleep\.julia\packages\Flux\05b38\src\optimise\train.jl:114


loss_all(train_data, m) = 0.065258905f0
loss_all(train_data, m) = 0.06945584f0
loss_all(train_data, m) = 0.06427868f0
loss_all(train_data, m) = 0.05975722f0
loss_all(train_data, m) = 0.05984291f0
loss_all(train_data, m) = 0.059782997f0
loss_all(train_data, m) = 0.05778915f0
loss_all(train_data, m) = 0.058370486f0
loss_all(train_data, m) = 0.061772455f0
loss_all(train_data, m) = 0.061132055f0
loss_all(train_data, m) = 0.05827368f0
loss_all(train_data, m) = 0.055853672f0
loss_all(train_data, m) = 0.057137605f0
loss_all(train_data, m) = 0.06007219f0
loss_all(train_data, m) = 0.060703333f0
loss_all(train_data, m) = 0.06133113f0
loss_all(train_data, m) = 0.060369495f0
loss_all(train_data, m) = 0.056733675f0
loss_all(train_data, m) = 0.057668015f0
loss_all(train_data, m) = 0.05788345f0
loss_all(train_data, m) = 0.057998262f0
loss_all(train_data, m) = 0.056284603f0
loss_all(train_data, m) = 0.056328926f0
loss_all(train_data, m) = 0.058143124f0
loss_all(train_data, m) = 0.05863417f0

0.9832999999999998
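The training loop above leaves the parameter updates to `opt = ADAM(args.η)`. As a rough picture of what one Adam step does, here is a plain-Python sketch that minimizes a one-dimensional quadratic. The defaults β₁ = 0.9, β₂ = 0.999 and ε = 1e-8 are the commonly used ones; the function name and the toy objective are hypothetical, and Flux's implementation differs in detail (it updates whole parameter arrays).

```python
import math


def adam_minimize(grad, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    """Minimize a 1-D function with Adam, given a function that returns its gradient."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g          # running mean of gradients
        v = beta2 * v + (1 - beta2) * g * g      # running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)             # bias correction for the zero init
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


# Toy objective f(x) = (x - 3)^2 with gradient 2(x - 3); its minimum is at x = 3.
x_min = adam_minimize(lambda x: 2 * (x - 3), x0=0.0)
print(round(x_min, 2))
```

Dividing by the running root-mean-square of the gradients makes the step size roughly `lr` early on regardless of the gradient's scale, which is one reason Adam is a forgiving default for models like this one.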