# Julia Things

### Environment

First things first. Let us set up the environment with the packages required for this notebook:

In [1]:
for p in ("Knet", "Plots", "Plotly.jl")
    Pkg.installed(p) == nothing && Pkg.add(p)
end

using Knet, Plots
gr()

Knet.gpu(0); # set the desired GPU to use
atype = KnetArray{Float32}; # atype = KnetArray{Float32} for gpu usage, Array{Float32} for cpu. 

println("OS: ", Sys.KERNEL)
println("Julia: ", VERSION)
println("Knet: ", Pkg.installed("Knet"))
println("GPU: ", readstring(`nvidia-smi --query-gpu=name --format=csv,noheader`))

OS: Linux
Julia: 0.6.0
Knet: 0.8.5+
GPU: NVS 310
TITAN X (Pascal)



### New Stuff

In this notebook we introduce the following Julia/Knet packages and functions:

* ...

# Generative Adversarial Networks (GANs)

Many applications of GANs are in the context of images. Since training on images takes too long in a Jupyter notebook on a laptop, we provide a simpler example by fitting a much simpler distribution: we illustrate what happens if we use GANs to build the world's most inefficient estimator of the parameters of a Gaussian. Let's get started. Since this is going to be the world's lamest example, we simply generate data drawn from a Gaussian. The computation runs on the array type `atype` chosen in the environment cell above.

In [2]:
xtrn = randn(2, 1000);
ytrn = ones(UInt8, 1, 1000);
w    = [[1 2; -0.1 0.5]', [1, 2]];
xtrn = w[1] * xtrn .+ w[2];

In [3]:
batch_size = 4;
dtrn = minibatch(xtrn, ytrn, batch_size, xtype=atype, shuffle=true);

Let's see what we got. Each sample is $x = A^\top z + b$ with $z$ standard normal, where $b$ is `w[2]` and $A$ is the transpose of `w[1]`. This should therefore be a Gaussian with mean $b$ and covariance matrix $A^\top A$.

In [4]:
print("The covariance matrix is:\n")
A = w[1]'
A' * A   # x = A'z + b with z ~ N(0, I), so cov(x) = A'A

The covariance matrix is:


2×2 Array{Float64,2}:
 1.01  1.95
 1.95  4.25
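
As a quick sanity check, the mean $b$ and covariance $A^\top A$ can be compared with estimates from a larger sample. The sketch below is plain Julia with no packages; `mu_hat` and `S_hat` are names introduced here for illustration, and the sample size is an arbitrary choice.

```julia
# Same construction as xtrn, but with more samples so the estimates are tight.
A = [1 2; -0.1 0.5]
b = [1, 2]
N = 100_000
z = randn(2, N)
x = A' * z .+ b              # each column is one sample

# Sample mean and unbiased sample covariance, computed by hand.
mu_hat = x * ones(N) / N
xc     = x .- mu_hat
S_hat  = xc * xc' / (N - 1)

println("mean error:       ", maximum(abs.(mu_hat - b)))
println("covariance error: ", maximum(abs.(S_hat - A' * A)))
```

Both errors should shrink roughly like $1/\sqrt{N}$ as the sample grows.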

In [5]:
scatter(xtrn[1, :], xtrn[2, :], legend=false)
xticks!([-2, 1, 4]); yticks!([-4, 2, 8])

## Define the networks

Next we need to define how to fake data. Our generator network will be the simplest network possible: a single-layer linear model. Because we drive that linear network with Gaussian noise, it only needs to learn the weights and bias of an affine map to fake the data perfectly. For the discriminator we will be a bit more discriminating: we will use an MLP with 3 layers to make things a bit more interesting.

The cool thing here is that we have *two* different networks, each of them with their own gradients, optimizers, losses, etc. that we can optimize as we please. 
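
One common way to write the two losses is sketched below. It assumes the discriminator emits two scores per sample, writes $p_D(c \mid x)$ for the softmax probability of class $c$ (class 1 for real, class 2 for fake, matching the labels used in this notebook), and writes $G(z)$ for the generator output on noise $z$. This is the standard non-saturating formulation, stated here as an assumption rather than something specific to Knet:

$$
\begin{aligned}
L_D &= -\frac{1}{2N}\left[\sum_{i=1}^{N} \log p_D(1 \mid x_i) + \sum_{i=1}^{N} \log p_D(2 \mid G(z_i))\right],\\
L_G &= -\frac{1}{N}\sum_{i=1}^{N} \log p_D(1 \mid G(z_i)).
\end{aligned}
$$

The discriminator weights are updated to decrease $L_D$ with the generator held fixed, and the generator weights are updated to decrease $L_G$ with the discriminator held fixed.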

The function `initweights` initializes a model of arbitrary depth, assuming that the parameters of each layer consist of an $H\times X$ weight matrix and a bias vector of size $H$, where $X$ is the layer's input size and $H$ its output size. The argument `hidden` is a vector with the output size of each layer. The argument `d` sets the dimension of the input layer, such that $x_i\in\mathbb{R}^d$ for $i=1,\dots,N$, where $N$ is the number of samples. For example, if we want a single densely connected layer with `xtrn` as input and output size `1`, we can call `initweights(2, [1])`. Using `initweights` we can define both the generator and discriminator networks:

In [16]:
function initweights(d, hidden)
    model = Vector{Any}(2 * length(hidden))
    X = d
    for k = 1:length(hidden)
        H = hidden[k]
        model[2k - 1] = randn(H, X)
        model[2k]     = zeros(H, 1)
        X = H
    end
    return model
end

initweights (generic function with 1 method)

In [17]:
generator_init(d, atype)     = map(atype, initweights(d, [2]));
discriminator_init(d, atype) = map(atype, initweights(d, [5, 3, 2]));
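
To make the parameter layout concrete, the sketch below lists the shapes that a call such as `initweights(2, [5, 3, 2])` produces, without allocating any arrays. `layer_shapes` is a hypothetical helper introduced only for this illustration.

```julia
# Hypothetical helper: shapes of the weight/bias pairs for input size d.
function layer_shapes(d, hidden)
    shapes = Tuple{Int,Int}[]
    X = d
    for H in hidden
        push!(shapes, (H, X))   # weight matrix of the layer
        push!(shapes, (H, 1))   # bias column of the layer
        X = H                   # this layer's output feeds the next layer
    end
    return shapes
end

println(layer_shapes(2, [2]))        # generator: one 2-to-2 layer
println(layer_shapes(2, [5, 3, 2]))  # discriminator: 2 -> 5 -> 3 -> 2
```

The last entry of `hidden` is the output size, so the discriminator ends in two scores per sample, one per class.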

We can also create a single `predict` function for both models:

In [18]:
function predict(w, x)
    x = mat(x)
    for i=1:2:length(w) - 2
        x = tanh.(w[i] * x .+ w[i+1])
    end
    return w[end - 1]*x .+ w[end]
end

predict (generic function with 1 method)
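
The same forward pass can be tried on plain CPU arrays to see what each network returns. The sketch below uses a hypothetical stand-in named `forward` that mirrors `predict` but omits Knet's `mat`, so it assumes the input is already a matrix with one sample per column.

```julia
# Hypothetical plain-array stand-in for `predict` (no Knet required).
function forward(w, x)
    for i = 1:2:length(w) - 2
        x = tanh.(w[i] * x .+ w[i + 1])   # hidden layers use tanh
    end
    return w[end - 1] * x .+ w[end]       # last layer is linear
end

x = randn(2, 4)   # four samples in R^2

# Discriminator-shaped weights: 2 -> 5 -> 3 -> 2.
wdis = Any[randn(5, 2), zeros(5, 1), randn(3, 5), zeros(3, 1), randn(2, 3), zeros(2, 1)]
scores = forward(wdis, x)
println(size(scores))   # one column of two class scores per sample

# Generator-shaped weights: a single layer, so the loop body never runs.
wgen = Any[randn(2, 2), zeros(2, 1)]
println(forward(wgen, x) == wgen[1] * x .+ wgen[2])
```

With a single weight/bias pair the loop range is empty, which is why the generator reduces to a purely affine map.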

In [9]:
loss(w, x, y) = nll(predict(w, x), y)
lossgradient  = grad(loss)

(::gradfun) (generic function with 1 method)
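
For intuition about the loss, here is a plain-Julia sketch of a softmax negative log likelihood. It assumes that `nll` treats each column of the scores as one sample, uses integer class labels starting at 1, and averages over samples, which is how it is used in this notebook. `softmax_nll` is a hypothetical name and not Knet's implementation.

```julia
# Hypothetical stand-in: average softmax cross-entropy over the columns of `scores`.
function softmax_nll(scores, labels)
    n = size(scores, 2)
    total = 0.0
    for j = 1:n
        col  = scores[:, j]
        m    = maximum(col)
        logz = m + log(sum(exp.(col .- m)))   # numerically stable log-sum-exp
        total += logz - col[labels[j]]        # -log softmax probability of the label
    end
    return total / n
end

# Uninformative scores: every class gets probability 1/2.
println(softmax_nll(zeros(2, 4), UInt8[1 2 1 2]))

# Confident scores: column 1 favors class 1, column 2 favors class 2.
confident = [5.0 -5.0; -5.0 5.0]
println(softmax_nll(confident, [1, 2]))   # labels agree with the scores
println(softmax_nll(confident, [2, 1]))   # labels contradict the scores
```

The loss is small when the scores agree with the labels and large when they contradict them, which is the signal both networks are trained on.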

In [10]:
function train(w, x, y, optim)
    g = lossgradient(w, x, y)   # gradient of the loss with respect to w
    update!(w, g, optim)        # optimizer step, updates w in place
    return w
end

train (generic function with 1 method)

In [15]:
wg = generator_init(2, atype);
wd = discriminator_init(2, atype);

optimg = optimizers(wg, Adam;  lr=0.01)
optimd = optimizers(wd, Adam;  lr=0.05)

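# Generator loss: the discriminator should classify generated samples as real (label 1).
# grad differentiates with respect to the first argument, so wd stays fixed here.
gloss(wg, wd, noise, y) = nll(predict(wd, predict(wg, noise)), y)
glossgradient = grad(gloss)

Fake_x = []
for epoch = 1:10
    
    # keep a sample of generated points for plotting
    push!(Fake_x, predict(wg, atype(randn(2, 100))));
    for (x, y) in dtrn
        
        # generate fake inputs from noise
        noise  = atype(randn(size(x)))
        fake_x = copy(predict(wg, noise));
        fake_y = Array{UInt8}(2ones(size(y)))
        
        # discriminator step: real samples have label 1, fake samples label 2
        x_ = hcat(x, fake_x)
        y_ = hcat(y, fake_y)
        wd = train(wd, x_, y_, optimd)
        
        # generator step: backpropagate through the discriminator into wg
        g = glossgradient(wg, wd, noise, y)
        update!(wg, g, optimg)
    end  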
    
    if epoch % 2 == 0
        scatter(xtrn[1, :], xtrn[2, :], label=:true_data)
        xpred = Array(Fake_x[epoch])
        display(scatter!(xpred[1, :], xpred[2, :],  label=:synthetic_data, size=(400,300)))
    end
end
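
Because the generator is a single affine layer driven by standard normal noise, its weights imply a Gaussian whose parameters can be read off directly and compared with the truth. The sketch below assumes a hypothetical helper `implied_gaussian` and, to stay self-contained, uses the true data-generating parameters as stand-in values instead of trained weights.

```julia
# Hypothetical helper: mean and covariance of W*z .+ b for z ~ N(0, I).
implied_gaussian(W, b) = (vec(b), W * W')

# Stand-in values: the true parameters from the data-generating cell.
W_true = [1 2; -0.1 0.5]'
b_true = [1, 2]

mu, Sigma = implied_gaussian(W_true, b_true)
println("mean:       ", mu)
println("covariance: ", Sigma)
```

In the notebook, `implied_gaussian(Array(wg[1]), Array(wg[2]))` would give the generator's own estimates, which can then be compared against these target values.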