<a href="https://colab.research.google.com/github/abel-bernabeu/facecompressor/blob/master/train.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Compression models training

This notebook is provided as a scripted recipe for training the models from the autoencoder.models package. It can also be read as a journal of the trial-and-error process that led to the best known pre-trained model.

The notebook is divided into sections, each corresponding to one experiment. In each experiment we either craft an incrementally more functional prototype or test a different idea. A series of seven experiments leads to the final model.

For each experiment we do:

- Specify the hyperparameters (hparams).
- Instantiate a model from the autoencoder.models package.
- Create dataloaders with the input patch size and batch size specified in hparams.
- Embed a TensorBoard for visualizing the training.
- Kick off a training session that lasts until the number of epochs specified in hparams is reached.
- Summarize the results.

# Setup

First, we download the dataset and the models' source code from Dropbox, if needed.

Optionally, when the download_trained_models flag is set to True in the next code cell, the notebook also downloads all the pre-trained weights and the TensorBoard logs. Setting this option to True is useful if you only intend to browse the TensorBoard instances from Google Colab.

In [None]:
download_trained_models = True

# When on Google Colab force PyTorch version to 1.4.0 (for compatibility of the .pt files)
try:
    from google.colab import drive
    !pip install torch==1.4.0 torchvision==0.5.0
except:
    pass

# Get the dataset if needed
import os.path
if not os.path.isdir('./data'):
    !rm -rf image_dataset.zip
    !wget https://www.dropbox.com/s/n03i55xxwqnned4/image_dataset.zip && \
    unzip -q image_dataset.zip && rm image_dataset.zip
    !wget https://www.dropbox.com/s/ds815v4i8a8vyep/ginger.tgz && \
    tar xzf ginger.tgz && rm ginger.tgz

# Get the latest source code if needed
if not os.path.isdir('./autoencoder'):
    !wget https://www.dropbox.com/s/at2d0nmmiw118ap/facecompressor-master.zip && \
    unzip -q facecompressor-master.zip && rm facecompressor-master.zip && \
    mv facecompressor-master/autoencoder/ . && \
    rm facecompressor-master -rf

if not os.path.isdir('./share') and download_trained_models:
    # The fallback when not in Colab is to download share from Dropbox
    !wget https://www.dropbox.com/s/76w9gsga8mz5ve4/share.tgz && tar xzf share.tgz && rm share.tgz
    !wget https://www.dropbox.com/s/xd9noc1tbfd173o/experiment5.tgz && tar xzf experiment5.tgz && rm experiment5.tgz
    !wget https://www.dropbox.com/s/zt2hrac0dslzyy9/experiment6.tgz && tar xzf experiment6.tgz && rm experiment6.tgz
    !wget https://www.dropbox.com/s/anfb3jrk72fta6y/experiment7.tgz && tar xzf experiment7.tgz && rm experiment7.tgz

In [None]:
import autoencoder.models
import autoencoder.utils

# Experiment 1: sparsity at 1/2

Our baseline effort focuses on training the neural network proposed in "Lossy Image Compression with Compressive Autoencoders", by Lucas Theis, Wenzhe Shi, Andrew Cunningham and Ferenc Huszár, published in 2017 (see the [original paper](https://arxiv.org/pdf/1703.00395v1.pdf) for details). We add batch normalization layers for improved robustness, but otherwise we stick to the proposed model as closely as possible.

The purpose of this experiment is to confirm that we have understood the architecture, and confirm that we can extract features and use them for reconstructing the original image.

The paper's authors claim that their "subpixel" operator is a better upsampler than transposed convolution, because it does not suffer from the checkerboard artifacts produced by kernel overlaps (see [this blog post](https://distill.pub/2016/deconv-checkerboard/) for illustrated examples). This is a bold claim that needs, at least, visual confirmation.

For this first experiment we do not implement any kind of quantization and only perform a 50% dimensionality reduction (which we call sparsity from now on). This reduction is achieved by using 96 channels in the features tensor, as opposed to the 192 channels that would be needed to keep the dimensionality of the input.
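The channel arithmetic behind the sparsity figures can be sketched as follows. This is only an illustration: the spatial downsampling factor of 8 is an assumption inferred from the 192-channel figure above (3 input channels × 8 × 8 = 192), and the helper name `sparsity` is hypothetical, not part of the autoencoder package.

```python
def sparsity(hidden_channels, input_channels=3, downsampling=8):
    """Ratio of feature values to input values for one feature location.

    Assumes (not confirmed by the notebook) that each feature location
    covers a downsampling x downsampling patch of the input image.
    """
    values_per_patch = input_channels * downsampling * downsampling  # 192
    return hidden_channels / values_per_patch


# Channel counts used across the experiments in this notebook
for channels in (192, 96, 48, 24):
    print(channels, 'channels -> sparsity', sparsity(channels))
```

Under this assumption, 96, 48 and 24 channels correspond to the 1/2, 1/4 and 1/8 sparsity ratios used in the experiment titles.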

## Hyperparameters

In [None]:
hparams = {
    'batch_size' : 32,
    'lr' : 1e-6,
    'device' : 'cuda',
    'block_width' : 128,
    'block_height' : 128,
    'hidden_state_num_channels' : 96,
    'quantize' : False,
    'num_bits' : 0,
    'train_dataset_size' : 5000,
    'test_dataset_size' : 500,
    'num_epochs' : 12577,
    'num_workers' : 4,
    'name' : "experiment1",
    'port' : 6100,
    'checkpointing_freq' : 10,
    'inference_freq' : 200,
}

!mkdir -p share/{hparams['name']}

## Model instantiation

In [None]:
model = autoencoder.models.TwitterCompressor(
    hidden_state_num_channels=hparams['hidden_state_num_channels'],
    quantize=hparams['quantize'],
    num_bits=hparams['num_bits'])

## Data loaders

In [None]:
train_loader, test_loader, few_train_x, few_train_y, few_test_x, few_test_y = autoencoder.utils.create_dataloaders(hparams)

## TensorBoard

In [None]:
try:
    # When on Google Colab try to launch an embedded TensorBoard
    from google.colab import drive
    %load_ext tensorboard
    from tensorboard import notebook
    notebook.start('--logdir share/' + hparams['name'] + '/runs/ --port ' + str(hparams['port']))
except:
    pass

## Training

In [None]:
try:
  autoencoder.utils.train(hparams=hparams, \
        model=model, \
        train_loader=train_loader, \
        test_loader=test_loader, \
        few_train_x=few_train_x, few_train_y=few_train_y, \
        few_test_x=few_test_x, few_test_y=few_test_y)
except KeyboardInterrupt:
    print('Exiting from training early')

## Results

We see there is no blurriness, which is very pleasant to see. The subpixel operator certainly delivers a sharp reconstruction.

The quality of this model sets 43 dB as an upper bound on the accuracy we can expect: the quality measurement is not expected to get any better as we try smaller sparsity ratios in the next experiments.

Training the model took 4 days on a Tesla P100, which also sets a lower bound on how long it will take us to train state-of-the-art models for image compression.

# Experiment 2: sparsity at 1/4

In this second experiment we further squeeze the features tensor, going from 96 channels to only 48, which reduces the dimensionality to 25% of the input. The goal is to confirm that the images can be squeezed further without serious damage, which we define as a test-set PSNR below 32 dB.

Again, no quantization is applied.
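For reference, the standard PSNR definition relates the figure in dB to the mean squared error (MSE) of the reconstruction. The sketch below assumes pixel values normalized to the range [0, 1]; whether autoencoder.utils uses that range is an assumption, and the function names are hypothetical.

```python
import math


def psnr(mse, max_value=1.0):
    """Peak signal-to-noise ratio in dB for a given mean squared error."""
    if mse <= 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def mse_for_psnr(target_db, max_value=1.0):
    """Mean squared error that corresponds to a target PSNR in dB."""
    return max_value ** 2 / 10 ** (target_db / 10.0)


# MSE at the 32 dB "serious damage" threshold, under the [0, 1] assumption
threshold_mse = mse_for_psnr(32.0)
print('MSE at 32 dB:', threshold_mse)
print('PSNR check  :', psnr(threshold_mse))
```

Because the scale is logarithmic, every additional 10 dB corresponds to a tenfold reduction in MSE.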

## Hyperparameters

In [None]:
hparams = {
    'batch_size' : 40,
    'lr' : 1e-6,
    'device' : 'cuda',
    'block_width' : 224,
    'block_height' : 224,
    'hidden_state_num_channels' : 48,
    'quantize' : False,
    'num_bits' : 0,
    'train_dataset_size' : 1000,
    'test_dataset_size' : 500,
    'num_epochs' : 16000,
    'num_workers' : 4,
    'name' : "experiment2",
    'port' : 6200,
    'checkpointing_freq' : 10,
    'inference_freq' : 200,
}

!mkdir -p share/{hparams['name']}

## Model instantiation

In [None]:
model = autoencoder.models.TwitterCompressor(
    hidden_state_num_channels=hparams['hidden_state_num_channels'],
    quantize=hparams['quantize'],
    num_bits=hparams['num_bits'])

## Data loaders

In [None]:
train_loader, test_loader, few_train_x, few_train_y, few_test_x, few_test_y = autoencoder.utils.create_dataloaders(hparams)

## TensorBoard

In [None]:
try:
    # When on Google Colab try to launch an embedded TensorBoard
    from google.colab import drive
    %load_ext tensorboard
    from tensorboard import notebook
    notebook.start('--logdir share/' + hparams['name'] + '/runs/ --port ' + str(hparams['port']))
except:
    pass

## Training

In [None]:
try:
  autoencoder.utils.train(hparams=hparams, \
        model=model, \
        train_loader=train_loader, \
        test_loader=test_loader, \
        few_train_x=few_train_x, few_train_y=few_train_y, \
        few_test_x=few_test_x, few_test_y=few_test_y)
except KeyboardInterrupt:
    print('Exiting from training early')

## Results

Training reaches a test-set PSNR of 32 dB in just 31 hours, with the slope of the curve suggesting that the quality is far from stagnating.

# Experiment 3: 3-bit quantization

In this third experiment we introduce 3-bit quantization of the features. The experiment is intended to show empirically that a novel scheme for training a quantizing model, which we call **training in two stages**, is suitable:

1. A quantizing model is trained with the quantization and dequantization modules bypassed.

2. The quantizing model is trained with the encoder weights frozen and the quantization and dequantization modules enabled, with the purpose of training the decoder to undo the quantization.
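The effect of bypassing or enabling the quantization step can be illustrated with a toy bottleneck. This is a sketch only: it assumes a uniform quantizer over features in [0, 1], which may differ from what autoencoder.models.TwitterCompressor actually implements, and all function names are hypothetical.

```python
def quantize(value, num_bits, lo=0.0, hi=1.0):
    """Map a float in [lo, hi] to an integer code in [0, 2**num_bits - 1]."""
    levels = 2 ** num_bits - 1
    clipped = min(max(value, lo), hi)
    return round((clipped - lo) / (hi - lo) * levels)


def dequantize(code, num_bits, lo=0.0, hi=1.0):
    """Map an integer code back to a float in [lo, hi]."""
    levels = 2 ** num_bits - 1
    return lo + code / levels * (hi - lo)


def bottleneck(features, num_bits, quantization_enabled):
    """Stage 1 bypasses quantization; stage 2 quantizes and dequantizes."""
    if not quantization_enabled:
        return list(features)
    return [dequantize(quantize(f, num_bits), num_bits) for f in features]


features = [0.03, 0.41, 0.77, 0.98]  # made-up feature values
stage1 = bottleneck(features, num_bits=3, quantization_enabled=False)
stage2 = bottleneck(features, num_bits=3, quantization_enabled=True)
max_error = max(abs(a - b) for a, b in zip(stage1, stage2))
print('stage 1 features:', stage1)
print('stage 2 features:', stage2)
print('largest quantization error:', max_error)
```

In the second stage the decoder only ever sees the quantized values, so it is the decoder that has to learn to compensate for the rounding error.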

## Hyperparameters

In [None]:
hparams = {
    'batch_size' : 40,
    'lr' : 1e-6,
    'device' : 'cuda',
    'block_width' : 224,
    'block_height' : 224,
    'hidden_state_num_channels' : 48,
    'quantize' : True,
    'num_bits' : 3,
    'train_dataset_size' : 1000,
    'test_dataset_size' : 500,
    'num_epochs' : 2500,
    'num_workers' : 4,
    'name' : "experiment3",
    'port' : 6300,
    'checkpointing_freq' : 10,
    'inference_freq' : 200,
}

!mkdir -p  share/{hparams['name']}

## Model instantiation

In [None]:
qmodel = autoencoder.models.TwitterCompressor(
    hidden_state_num_channels=hparams['hidden_state_num_channels'],
    quantize=hparams['quantize'],
    num_bits=hparams['num_bits'])

# Transfer learning from the non-quantized model
qmodel.encoder = model.encoder
qmodel.decoder = model.decoder

# Freeze the encoder
for param in qmodel.encoder.parameters():
    param.requires_grad = False

## Data loaders

In [None]:
train_loader, test_loader, few_train_x, few_train_y, few_test_x, few_test_y = autoencoder.utils.create_dataloaders(hparams)

## TensorBoard

In [None]:
try:
    # When on Google Colab try to launch an embedded TensorBoard
    from google.colab import drive
    %load_ext tensorboard
    from tensorboard import notebook
    notebook.start('--logdir share/' + hparams['name'] + '/runs/ --port ' + str(hparams['port']))
except:
    pass

## Training

In [None]:
try:
  autoencoder.utils.train(hparams=hparams, \
        model=qmodel, \
        train_loader=train_loader, \
        test_loader=test_loader, \
        few_train_x=few_train_x, few_train_y=few_train_y, \
        few_test_x=few_test_x, few_test_y=few_test_y)
except KeyboardInterrupt:
    print('Exiting from training early')

## Results

The test PSNR improves by about 0.2 dB, showing that the decoder can learn to undo some of the noise introduced by the quantization.

# Experiment 4: sparsity at 1/8

At this point it becomes evident that, if we want a compression ratio in the range of 1/10 to 1/20 to compare with JPEG and JPEG 2000, a 1/4 sparsity is unlikely to bring us anywhere close, no matter what PSNR the codec achieves.

Although we could try to rely on quantization and entropy coding to bridge the compression ratio gap from 1/4 to 1/10, that seems a stretch, to say the least. Achieving a sparsity of 1/8 in the autoencoder would be a better starting point for the quantization and entropy coding effort to bridge the gap with JPEG.

Hence, in this experiment we train a model that reduces the dimensionality to 1/8, but we do not yet perform quantization, because we learned in experiment 3 that it is possible to train first without quantization and then introduce it in a second stage.

## Hyperparameters

In [None]:
hparams = {
    'batch_size' : 40,
    'lr' : 1e-6,
    'device' : 'cuda',
    'block_width' : 224,
    'block_height' : 224,
    'hidden_state_num_channels' : 24,
    'quantize' : False,
    'num_bits' : 0,
    'train_dataset_size' : 1000,
    'test_dataset_size' : 500,
    'num_epochs' : 110000,
    'num_workers' : 4,
    'name' : "experiment4",
    'port' : 6400,
    'checkpointing_freq' : 10,
    'inference_freq' : 200,
}

!mkdir -p share/{hparams['name']}

## Model instantiation

In [None]:
model = autoencoder.models.TwitterCompressor(
    hidden_state_num_channels=hparams['hidden_state_num_channels'],
    quantize=hparams['quantize'],
    num_bits=hparams['num_bits'])

## Data loaders

In [None]:
train_loader, test_loader, few_train_x, few_train_y, few_test_x, few_test_y = autoencoder.utils.create_dataloaders(hparams)

## TensorBoard

In [None]:
try:
    # When on Google Colab try to launch an embedded TensorBoard
    from google.colab import drive
    %load_ext tensorboard
    from tensorboard import notebook
    notebook.start('--logdir share/' + hparams['name'] + '/runs/ --port ' + str(hparams['port']))
except:
    pass

## Training

In [None]:
try:
  autoencoder.utils.train(hparams=hparams, \
        model=model, \
        train_loader=train_loader, \
        test_loader=test_loader, \
        few_train_x=few_train_x, few_train_y=few_train_y, \
        few_test_x=few_test_x, few_test_y=few_test_y)
except KeyboardInterrupt:
    print('Exiting from training early')

## Results

The test PSNR is 40.1 dB, but this result could only be achieved at the expense of training for 14 days on a Tesla P100 (at an approximate cost of 350 euros on Google Cloud Platform).

# Experiment 5: 6-bit quantization

We now run the second stage of training the quantizing model, expecting to confirm once again that training in two stages helps reduce the noise introduced by the quantization.

As in experiment 3, the second stage is carried out by transferring the encoder and decoder weights learned in experiment 4, freezing the encoder weights and further training the decoder to undo the quantization.

## Hyperparameters

In [None]:
hparams = {
    'batch_size' : 40,
    'lr' : 1e-8,
    'device' : 'cuda',
    'block_width' : 224,
    'block_height' : 224,
    'hidden_state_num_channels' : 24,
    'quantize' : True,
    'num_bits' : 6,
    'train_dataset_size' : 1000,
    'test_dataset_size' : 500,
    'num_epochs' : 12650,
    'num_workers' : 4,
    'name' : "experiment5",
    'port' : 6500,
    'checkpointing_freq' : 10,
    'inference_freq' : 200,
}

!mkdir -p share/{hparams['name']}

## Model instantiation

In [None]:
qmodel = autoencoder.models.TwitterCompressor(
    hidden_state_num_channels=hparams['hidden_state_num_channels'],
    quantize=hparams['quantize'],
    num_bits=hparams['num_bits'])

# Transfer learning from the non-quantized model
qmodel.encoder = model.encoder
qmodel.decoder = model.decoder

# Freeze the encoder
for param in qmodel.encoder.parameters():
    param.requires_grad = False

## Data loaders

In [None]:
train_loader, test_loader, few_train_x, few_train_y, few_test_x, few_test_y = autoencoder.utils.create_dataloaders(hparams)

## TensorBoard

In [None]:
try:
    # When on Google Colab try to launch an embedded TensorBoard
    from google.colab import drive
    %load_ext tensorboard
    from tensorboard import notebook
    notebook.start('--logdir share/' + hparams['name'] + '/runs/ --port ' + str(hparams['port']))
except:
    pass

## Training

In [None]:
try:
  autoencoder.utils.train(hparams=hparams, \
        model=qmodel, \
        train_loader=train_loader, \
        test_loader=test_loader, \
        few_train_x=few_train_x, few_train_y=few_train_y, \
        few_test_x=few_test_x, few_test_y=few_test_y)
except KeyboardInterrupt:
    print('Exiting from training early')

## Results

Adding the quantization worsened the test PSNR, which went from 40.1 dB to 39.7 dB. However, by further training the decoder we improved the test PSNR from 39.7 dB to 39.9 dB.

# Experiment 6: blending in more training data

In this experiment we try a radically different approach for training the same model from experiment 4. Rather than running for as many epochs as possible (110K in experiment 4) we do fewer epochs with an increased dataset size (60K samples as opposed to 1K samples in experiment 4) and an increased learning rate.
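To make the trade-off concrete, the training budgets of the two experiments can be compared from their hyperparameters alone. The sketch below assumes that each epoch visits every training sample exactly once and that a final partial batch is kept; the helper `training_budget` is hypothetical and not part of autoencoder.utils.

```python
import math


def training_budget(train_dataset_size, batch_size, num_epochs):
    """Total samples seen and optimizer steps for one training run."""
    steps_per_epoch = math.ceil(train_dataset_size / batch_size)
    return {
        'samples_seen': train_dataset_size * num_epochs,
        'optimizer_steps': steps_per_epoch * num_epochs,
    }


# Values taken from the hparams cells of experiments 4 and 6
experiment4 = training_budget(train_dataset_size=1000, batch_size=40, num_epochs=110000)
experiment6 = training_budget(train_dataset_size=60000, batch_size=40, num_epochs=960)

print('experiment 4:', experiment4)
print('experiment 6:', experiment6)
```

Under these assumptions experiment 6 takes roughly half as many optimizer steps as experiment 4, but each step draws from a dataset 60 times larger and uses a learning rate 20 times higher.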

## Hyperparameters

In [None]:
hparams = {
    'batch_size' : 40,
    'lr' : 2e-5,
    'device' : 'cuda',
    'block_width' : 224,
    'block_height' : 224,
    'hidden_state_num_channels' : 24,
    'quantize' : False,
    'num_bits' : 0,
    'train_dataset_size' : 60000,
    'test_dataset_size' : 6000,
    'num_epochs' : 960,
    'num_workers' : 4,
    'name' : "experiment6",
    'port' : 6600,
    'checkpointing_freq' : 10,
    'inference_freq' : 200,
}

!mkdir -p share/{hparams['name']}

## Model instantiation

In [None]:
model = autoencoder.models.TwitterCompressor(
    hidden_state_num_channels=hparams['hidden_state_num_channels'],
    quantize=hparams['quantize'],
    num_bits=hparams['num_bits'])

## Data loaders

In [None]:
train_loader, test_loader, few_train_x, few_train_y, few_test_x, few_test_y = autoencoder.utils.create_dataloaders(hparams)

## TensorBoard

In [None]:
try:
    # When on Google Colab try to launch an embedded TensorBoard
    from google.colab import drive
    %load_ext tensorboard
    from tensorboard import notebook
    notebook.start('--logdir share/' + hparams['name'] + '/runs/ --port ' + str(hparams['port']))
except:
    pass

## Training

In [None]:
try:
  autoencoder.utils.train(hparams=hparams, \
        model=model, \
        train_loader=train_loader, \
        test_loader=test_loader, \
        few_train_x=few_train_x, few_train_y=few_train_y, \
        few_test_x=few_test_x, few_test_y=few_test_y)
except KeyboardInterrupt:
    print('Exiting from training early')

## Results

The approach really pays off, achieving higher accuracy with just 5 days of training (as opposed to 14 days in experiment 4).

# Experiment 7: 6-bit quantization of the final model

In this experiment we introduce 6-bit quantization into the model from experiment 6. Training took only 12 additional hours on a Tesla P100.

## Hyperparameters

In [None]:
hparams = {
    'batch_size' : 40,
    'lr' : 1e-8,
    'device' : 'cuda',
    'block_width' : 224,
    'block_height' : 224,
    'hidden_state_num_channels' : 24,
    'quantize' : True,
    'num_bits' : 6,
    'train_dataset_size' : 60000,
    'test_dataset_size' : 6000,
    'num_epochs' : 350,
    'num_workers' : 4,
    'name' : "experiment7",
    'port' : 6700,
    'checkpointing_freq' : 10,
    'inference_freq' : 200,
}

!mkdir -p share/{hparams['name']}

## Model instantiation

In [None]:
qmodel = autoencoder.models.TwitterCompressor(
    hidden_state_num_channels=hparams['hidden_state_num_channels'],
    quantize=hparams['quantize'],
    num_bits=hparams['num_bits'])

# Transfer learning from the non-quantized model
qmodel.encoder = model.encoder
qmodel.decoder = model.decoder

# Freeze the encoder
for param in qmodel.encoder.parameters():
    param.requires_grad = False

## Data loaders

In [None]:
train_loader, test_loader, few_train_x, few_train_y, few_test_x, few_test_y = autoencoder.utils.create_dataloaders(hparams)

## TensorBoard

In [None]:
try:
    # When on Google Colab try to launch an embedded TensorBoard
    from google.colab import drive
    %load_ext tensorboard
    from tensorboard import notebook
    notebook.start('--logdir share/' + hparams['name'] + '/runs/ --port ' + str(hparams['port']))
except:
    pass

## Training

In [None]:
try:
  autoencoder.utils.train(hparams=hparams, \
        model=qmodel, \
        train_loader=train_loader, \
        test_loader=test_loader, \
        few_train_x=few_train_x, few_train_y=few_train_y, \
        few_test_x=few_test_x, few_test_y=few_test_y)
except KeyboardInterrupt:
    print('Exiting from training early')

## Results

We introduced the 6-bit quantization needed to increase the compression ratio to 10.66, which lowered the PSNR from 44.02 dB to 43.08 dB. Further training of the decoder to remove that noise then raised the PSNR from 43.08 dB to 43.4 dB.
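One plausible reading of where the compression ratio comes from is sketched below. It assumes 8-bit input samples and no entropy coding on top of the quantized features; neither assumption is stated in the notebook, and the helper `compression_ratio` is hypothetical.

```python
def compression_ratio(sparsity, num_bits, input_bits=8):
    """Input size divided by compressed size.

    sparsity   -- feature values per input value (e.g. 1/8)
    num_bits   -- bits stored per quantized feature value
    input_bits -- bits per input sample (assumed to be 8 here)
    """
    return (1.0 / sparsity) * (input_bits / num_bits)


# Experiment 7: sparsity 1/8 with 6-bit quantization
ratio = compression_ratio(sparsity=1 / 8, num_bits=6)
print('compression ratio:', ratio)
```

Under these assumptions the 1/8 sparsity contributes a factor of 8 and the reduction from 8 to 6 bits per value contributes a further factor of 8/6.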

# Final results

The model from experiment 7 achieves a compression ratio of 10.6 with a PSNR of 43.4 dB, making it the best choice so far.