# Julia Things

### Environment

First things first. Let us set up the environment with the required packages for this notebook:

In [1]:
for p in ("Knet", "Plots", "Images")  # install every package loaded below
    Pkg.installed(p) == nothing && Pkg.add(p)
end

using Knet, Plots, Images
gr()

Knet.gpu(0); # select the GPU to use (device 0)
atype = KnetArray{Float32}; # KnetArray{Float32} for GPU usage, Array{Float32} for CPU
srand(1)

println("OS: ", Sys.KERNEL)
println("Julia: ", VERSION)
println("Knet: ", Pkg.installed("Knet"))
println("GPU: ", readstring(`nvidia-smi --query-gpu=name --format=csv,noheader`))

OS: Linux
Julia: 0.6.0
Knet: 0.8.5+
GPU: NVS 310
TITAN X (Pascal)



# Deep Convolutional Generative Adversarial Networks

In [our introduction to generative adversarial networks (GANs)](section2-generative-adversarial-networks.ipynb), 
we introduced the basic ideas behind how GANs work.
We showed that they can draw samples from some simple, easy-to-sample distribution,
like a uniform or normal distribution, 
and transform them into samples that appear to match the distribution of some data set. 
And while our example of matching a 2D Gaussian distribution got the point across, it's not especially exciting.

In this notebook, we'll demonstrate how you can use GANs 
to generate photorealistic images. 
We'll be basing our models on the deep convolutional GANs introduced in [this paper](https://arxiv.org/abs/1511.06434). 
We'll borrow the convolutional architectures that have proven so successful for discriminative computer vision problems
and show how, via GANs, they can be leveraged to generate photorealistic images. 

In this tutorial, we concentrate on the [LFW Face Dataset](http://vis-www.cs.umass.edu/lfw/), 
which contains roughly 13,000 images of faces. 
By the end of the tutorial, you'll know how to generate photorealistic images of your own, given any dataset of images. First, we'll get the preliminaries out of the way.



### Training parameters

In [2]:
epochs        = 2;
batch_size    = 64;
latent_z_size = 100;

lr    = 2e-4;
beta1 = 0.5;

## Dataset

In [3]:
lfw_url   = "http://vis-www.cs.umass.edu/lfw/lfw-deepfunneled.tgz";
data_path = joinpath(pwd(), "../datasets/lfw-deepfunneled");
#download(lfw_url, data_path);

Let us load all `.jpg` image files in the `data_path` directory:

In [4]:
rgb_array = [load(joinpath(root, f)) for (root, dirs, files) in walkdir(data_path) for f in files if endswith(f, "jpg")];

We transform all images the same way as the [mxnet](https://github.com/zackchase/mxnet-the-straight-dope/blob/master/chapter14_generative-adversarial-networks/dcgan.ipynb) version of this tutorial: pixel values are normalized to `[-1, 1]` and every array is given 3 channels. The `transform` function below processes the first `nsamples` images (all of them by default):

In [5]:
function transform(rgb_array; target_wd=64, target_ht=64, nsamples=nothing)
    if nsamples === nothing; nsamples = length(rgb_array); end  # default: use every image
    output = zeros(Float32, target_wd, target_ht, 3, nsamples)
    for k = 1:nsamples
        x = imresize(rgb_array[k], (target_wd, target_ht));
        x = convert(Array, reinterpret(Float32, float32.(x)));  # raw channel values in [0, 1]
        x = (x .* 255) ./ 127.5 .- 1  # normalize to [-1, 1]
        if size(x, 1) == 1; x = cat(1, x, x, x); end;  # replicate a single channel three times
        output[:, :, :, k] = permutedims(reshape(x, (size(x)..., 1)), [2, 3, 1, 4])  # channel dimension moves from first to third
    end
    return output
end

transform (generic function with 1 method)

In [111]:
img_arr = transform(rgb_array; nsamples=10);

In [116]:
visualize(array_rgb) = colorview(RGB, permutedims( (127.5(array_rgb+1))/255, [3, 1, 2]));

In [121]:
[visualize(img_arr[:, :, :, k]) for k = 1:4]
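
As a quick sanity check on the scaling, the sketch below applies the same arithmetic as `transform` to a few made-up intensities and then inverts it the way `visualize` does. The helper names `normalize_pixels` and `denormalize_pixels` are hypothetical and exist only for this illustration; the code uses plain Julia arrays and no image packages.

```julia
# Map intensities from [0, 1] to [-1, 1], the same arithmetic as in `transform`.
normalize_pixels(x) = (x .* 255) ./ 127.5 .- 1

# Map values from [-1, 1] back to [0, 1], the same arithmetic as in `visualize`.
denormalize_pixels(y) = (127.5 .* (y .+ 1)) ./ 255

pixels = Float32[0.0, 0.25, 0.5, 1.0]   # made-up intensities, not real image data
scaled = normalize_pixels(pixels)

println("normalized: ", scaled)
println("restored:   ", denormalize_pixels(scaled))
```

Black (0) maps to -1 and white (1) maps to 1, which matches the range of the `tanh` output used by the generator.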

## Defining the networks

At its core, the DCGAN architecture uses a standard CNN for the discriminative model. For the generator,
convolutions are replaced with upconvolutions, so the representation grows at each layer of the generator as it maps from a low-dimensional latent vector onto a high-dimensional image. The DCGAN paper gives the following architecture guidelines:

* Replace any pooling layers with strided convolutions (discriminator) and fractional-strided convolutions (generator).

* Use batch normalization in both the generator and the discriminator.

* Remove fully connected hidden layers for deeper architectures.

* Use ReLU activation in the generator for all layers except for the output, which uses Tanh.

* Use LeakyReLU activation in the discriminator for all layers.
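
The three activations named in the list can be written in a few lines of plain Julia. This is only an illustrative sketch: the names `relu_sketch` and `leaky_sketch` are hypothetical, and the slope of 0.2 is the value this notebook's discriminator code uses.

```julia
# ReLU: negative inputs become zero, positive inputs pass through.
relu_sketch(x) = max.(0, x)

# LeakyReLU: negative inputs are scaled by a small slope instead of being zeroed.
leaky_sketch(x; slope=0.2) = max.(slope .* x, x)

v = [-2.0, -0.5, 0.0, 1.5]   # made-up pre-activation values

println("relu:  ", relu_sketch(v))
println("leaky: ", leaky_sketch(v))
println("tanh:  ", tanh.(v))   # squashes every value into (-1, 1)
```

Because `tanh` is bounded by -1 and 1, the generator output lives in the same range as the normalized training images.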

![](../img/dcgan.png "DCGAN Architecture")

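Note that in this case we're going to change the number of discriminator output channels to 2: one score per class (real or fake), so that the softmax-based `nll` loss can be used.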

In [138]:
function initkernelweights(d, hidden; use_bias=false)
    # one kernel per layer, plus one bias per layer when use_bias is set
    model = Vector{Any}( (use_bias ? 2 : 1) * length(hidden) )
    X = d # input channels of the current layer
    for i = 1:length(hidden)
        k = hidden[i][1]; # kernel size
        H = hidden[i][2]; # output channels
        j = use_bias ? 2i - 1 : i # index of this layer's kernel in model
        model[j] = xavier(Float32, k, k, X, H)
        if use_bias; model[j + 1] = zeros(Float32, 1, 1, H, 1); end
        X = H
    end
    return model
end

initkernelweights (generic function with 1 method)

In [139]:
wginit(ngf, nc, nz) = initkernelweights(nz, [(1,ngf*8), (1,ngf*4), (1,ngf*2), (1,ngf), (1,nc)]);
wdinit(ndf, nc)     = initkernelweights(nc, [(4,ndf), (4,ndf*2), (4,ndf*4), (4,ndf*8), (4,2)]);

In [140]:
function predict(w, x; generate=false, training=true)
    if generate
        x = conv4(w[1], unpool(x, window=4) )
        x = relu.( batchnorm( x, training=training ) )
        for i=2:length(w) - 1
            x = conv4(w[i], unpool(x, window=2) )
            x = relu.( batchnorm( x, training=training ) )
        end
        x = tanh.(conv4(w[end], unpool(x, window=2) ))
    else
        for i=1:length(w) - 1
            x = batchnorm( conv4(w[i], x, stride=2, padding=1), training=training ) 
            x = max.(0.2x, x)
        end
        x = mat(conv4(w[end], x))
    end
    return x 
end

predict (generic function with 1 method)
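
To see why `predict` maps 64×64 images to a single score position and a 1×1 latent vector to a 64×64 image, it can help to trace the spatial size through each layer. The sketch below does this with the standard convolution output-size formula. It assumes that unpooling with window `w` multiplies the spatial size by `w`, and the helper names are hypothetical.

```julia
# Output size of a convolution along one spatial dimension.
conv_size(n; kernel=1, stride=1, padding=0) = div(n + 2padding - kernel, stride) + 1

# Assumption: unpooling with window `w` enlarges the spatial size by a factor of `w`.
unpool_size(n, window) = n * window

# Discriminator: four 4x4 convolutions with stride 2 and padding 1, then one 4x4 convolution.
function discriminator_sizes(n)
    sizes = [n]
    for layer = 1:4
        n = conv_size(n; kernel=4, stride=2, padding=1)
        push!(sizes, n)
    end
    n = conv_size(n; kernel=4)
    push!(sizes, n)
    return sizes
end

# Generator: unpool by 4 once, then unpool by 2 four times, each followed by a 1x1 convolution.
function generator_sizes(n)
    sizes = [n]
    n = conv_size(unpool_size(n, 4))
    push!(sizes, n)
    for layer = 2:5
        n = conv_size(unpool_size(n, 2))
        push!(sizes, n)
    end
    return sizes
end

println("discriminator: ", discriminator_sizes(64))
println("generator:     ", generator_sizes(1))
```

Under these assumptions the discriminator halves the size at each strided layer until a single position remains, while the generator mirrors that progression in reverse.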

In [141]:
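function loss(w, x, y; wd=nothing, o...)
    if wd === nothing  # discriminator loss: w = discriminator weights, x = images
        return nll(predict(w, x; o...), y)
    else               # generator loss: w = generator weights, x = noise, wd scores the fakes
        return nll(predict(wd, predict(w, x; generate=true, o...); o...), y)
    end
end
lossgradient = grad(loss)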

(::gradfun) (generic function with 1 method)
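
For intuition about what the loss measures, here is a minimal plain-Julia sketch of a two-class negative log-likelihood. It assumes that `nll` averages the softmax negative log-probability of the correct class over the columns of a score matrix, with class labels starting at 1 (here 1 = real, 2 = fake). `nll_sketch` is a hypothetical stand-in for illustration, not Knet's implementation.

```julia
# Average softmax negative log-likelihood.
# scores: (classes, samples) matrix; labels: one class index per sample.
function nll_sketch(scores, labels)
    n = size(scores, 2)
    total = 0.0
    for j = 1:n
        col = scores[:, j]
        m = maximum(col)                        # subtract the max for numerical stability
        logz = m + log(sum(exp.(col .- m)))     # log of the softmax normalizer
        total += logz - col[labels[j]]          # negative log-probability of the true class
    end
    return total / n
end

labels = UInt8[1 2 1 2]                                  # 1 = real, 2 = fake

undecided = zeros(2, 4)                                  # equal scores for both classes
confident = [10.0 -10.0 10.0 -10.0; -10.0 10.0 -10.0 10.0]  # correct class scores much higher

println("undecided: ", nll_sketch(undecided, labels))
println("confident: ", nll_sketch(confident, labels))
```

An undecided discriminator scores `log(2)` per sample, and the loss approaches zero as it becomes confidently correct. The generator step reuses the same loss but labels its fakes as real, so it is rewarded when the discriminator is fooled.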

In [142]:
function train(w, x, y, optim; o...)
    g = lossgradient(w, x, y; o...)  # gradient of the loss with respect to w
    update!(w, g, optim)             # in-place optimizer step
    return w
end

train (generic function with 1 method)

In [143]:
noise(zsize, batch_size) = randn(Float32, 1, 1, zsize, batch_size);

In [145]:
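num_of_samples = 10000  # number of images passed to transform
latent_z_size  = 100
batch_size     = 10;    # overrides the batch size of 64 set under "Training parameters"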

ngf = 64;
ndf = 64;
nc  = 3;

wg = map(atype, wginit(ngf, nc, latent_z_size));
wd = map(atype, wdinit(ndf, nc));

optimg  = optimizers(wg, Adam;  lr=0.0002)
optimd  = optimizers(wd, Adam;  lr=0.0002)

img_arr = transform(rgb_array; nsamples=num_of_samples);

dtrn    = minibatch(img_arr, ones(UInt8, 1, size(img_arr, 4)), batch_size, xtype=atype, shuffle=true);

## Training

In [None]:
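for epoch = 1:20
    for (x, y) in dtrn
        z     = atype(noise(latent_z_size, batch_size));  # latent vectors
        xfake = predict(wg, z; generate=true );            # generated images
        yfake = Array{UInt8}(2ones(size(y)));               # label 2 = fake (label 1 = real)
        # stack real and fake samples into one discriminator batch
        X     = reshape(hcat(mat(x), mat(xfake)), (64, 64, 3, 2batch_size));
        Y     = reshape(hcat(mat(y), mat(yfake)), (1, 2batch_size));
        wd = train(wd, X, Y, optimd)          # discriminator step
        wg = train(wg, z, y, optimg; wd=wd)   # generator step: fakes carry the "real" label
    end 
    # show five generated samples after each epoch
    z = atype(noise(latent_z_size, 5))
    xfake = Array(predict(wg, z; generate=true ));
    display([visualize(xfake[:, :, :, k]) for k = 1:5])
end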
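
The way the loop stacks real and generated samples into one discriminator batch can be checked on small CPU arrays. The sketch below assumes that `mat` flattens every dimension except the last into columns; `flatten_batch` is a hypothetical stand-in for it, and the tiny 4×4 "images" are random placeholders.

```julia
# Flatten a 4-D batch (height, width, channels, n) into a matrix with one column per sample.
flatten_batch(a) = reshape(a, (div(length(a), size(a, 4)), size(a, 4)))

n = 2
real_batch = rand(Float32, 4, 4, 3, n)   # placeholder for a minibatch of real images
fake_batch = rand(Float32, 4, 4, 3, n)   # placeholder for generator output

# Columns of real samples first, then columns of fake samples, reshaped back to 4-D.
X = reshape(hcat(flatten_batch(real_batch), flatten_batch(fake_batch)), (4, 4, 3, 2n))

# Matching labels: 1 for every real sample, 2 for every fake sample.
Y = hcat(ones(UInt8, 1, n), fill(UInt8(2), 1, n))

println(size(X))
println(Y)
```

Because Julia arrays are column-major, the first `n` slices of `X` along the fourth dimension are exactly the real images and the last `n` are the fakes, so the labels in `Y` line up with the samples.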