### Mount drive
Mount Google Drive to access the dataset and save model checkpoints.  

Set `drive_path` to the directory on your drive that contains the resources for training the NIC model, such as *data.zip*, an archive with image features. TensorBoard logs and model checkpoints will go into this directory too.  

In [None]:
from google.colab import drive

drive.mount("/content/drive")
drive_path = "drive/MyDrive/ML/neural_image_caption"

### Install nic
Install the **nic** package into the session.  

In [None]:
!pip install nic

### Extract preprocessed features into the session 
Preprocessed image features shouldn't take up much disk space, so they can be extracted from *data.zip* directly into the current session. To train or evaluate the entire model rather than just the decoder, the images themselves would have to be read from Drive, provided they fit there.  
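
If you prefer to stay in Python rather than shell out to `unzip`, the extraction can be sketched with the standard `zipfile` module. This is only an illustration: `extract_archive` is a hypothetical helper, not part of **nic**.

```python
import zipfile
from pathlib import Path


def extract_archive(archive_path, target_dir="."):
    """Extract every member of a zip archive into target_dir.

    Returns the sorted list of member names found in the archive.
    (Hypothetical helper for illustration; not part of nic.)
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)

    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target)
        return sorted(archive.namelist())
```

With the notebook's variables this would be called as `extract_archive(os.path.join(drive_path, "data.zip"))`.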

In [None]:
import os

os.environ["MSCOCO_DATA_ARCHIVE"] = os.path.join(drive_path, "data.zip")

!unzip "$MSCOCO_DATA_ARCHIVE"

### Define the model
Here we define the decoder module of the model.  

The feature size and vocabulary size can be computed from the preprocessed data, so only the RNN-related parameters need to be set (as `rnn_options` below).  

In [None]:
import os
from pathlib import Path

from matplotlib import pyplot as plt
import nic
import tensorflow as tf

In [None]:
data_dir = "data"
rnn_options = nic.RNNOptions(size=512)
decoder_name = "nic-decoder"

decoder = nic.define_decoder_model(
    nic.dp.features_size(data_dir),
    nic.dp.vocabulary_size(data_dir),
    rnn_options,
    decoder_name,
)
tf.keras.utils.plot_model(decoder,
                          "decoder.png",
                          show_shapes=True,
                          show_dtype=True)

In [None]:
decoder.summary()

### Train the model
We first compile the model or, with `start_from_scratch = False`, restore one of its checkpoints. Then we train the model, or continue training a restored one. When continuing training, make sure to set `initial_epoch` to the number of the last completed epoch and **increase** `max_epochs`.  

To restore the best model so far, set `restore_best` to `True`. Otherwise, the latest checkpoint is restored.  

The training algorithm and its related parameters like `decay_patience` and `perplexity_delta` are described below.  
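
Since `perplexity_delta` is a threshold on perplexity, it helps to recall what that quantity measures. The sketch below assumes the standard definition (the exponential of the mean per-token cross-entropy, in nats). How **nic** computes it exactly, for example how padding tokens are masked, is not shown in this notebook.

```python
import math


def perplexity(token_log_probs):
    """Perplexity from the natural-log probabilities a model assigned
    to each ground-truth token of a sequence (standard definition)."""
    mean_cross_entropy = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_cross_entropy)


# A model that is equally unsure between 4 words at every step
# has a perplexity of (approximately, up to rounding) 4.
uniform_guesses = [math.log(1 / 4)] * 10
print(perplexity(uniform_guesses))
```

Read this way, a `perplexity_delta` of 0.01 asks for a very small absolute improvement per epoch.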

In [None]:
start_from_scratch = True

learning_rate = 0.00001
batch_size = 40
buffer_size = 1024

learning_rate_decay = 0.9
decay_patience = 2
perplexity_delta = 0.01
min_learning_rate = 0.0
early_stop_patience = 2

initial_epoch = 0
max_epochs = 10
shuffle_for_each_epoch = True

tensor_board_dir = os.path.join(drive_path, "tensor_board")
tensor_board_update_freq = 5000

restore_best = False
checkpoint_freq = "epoch"
checkpoint_dir = os.path.join(drive_path, "checkpoints")

Create subdirectories for this training run's TensorBoard logs and checkpoints (if they do not already exist).  
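
The subdirectory name built in the next cell encodes the learning rate with six decimals, followed by the RNN size. The sketch below reproduces that format string to show the resulting name and one caveat: learning rates much smaller than 1e-6 all format to the same name. `run_subdir_name` is a hypothetical helper used only for illustration.

```python
def run_subdir_name(learning_rate, hidden_size):
    """Same format string as the notebook's `subdir_name`
    (hypothetical helper, for illustration only)."""
    return f"lr={learning_rate:.6f}_hs={hidden_size}"


print(run_subdir_name(0.00001, 512))    # lr=0.000010_hs=512
print(run_subdir_name(0.0000001, 512))  # lr=0.000000_hs=512
```

If you experiment with very small learning rates, a format such as `:.1e` would keep the runs apart.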

In [None]:
subdir_name = f"lr={learning_rate:.6f}_hs={rnn_options.size}"
tensor_board_subdir = os.path.join(tensor_board_dir, subdir_name)
checkpoints_subdir = os.path.join(checkpoint_dir, subdir_name)

In [None]:
Path(tensor_board_subdir).mkdir(parents=True, exist_ok=True)
Path(checkpoints_subdir).mkdir(parents=True, exist_ok=True)

Now we compile the model or restore a saved model.  

In [None]:
if start_from_scratch:
    compiled_decoder = nic.compile_model(
        decoder,
        learning_rate
    )
else:
    compiled_decoder = nic.restore_model(
        checkpoints_subdir,
        restore_best
    )

Here we train the decoder module of the model for at most `max_epochs` epochs, possibly shuffling the train data prior to each epoch (`shuffle_for_each_epoch`).  

The initial learning rate is `learning_rate` if the process is started from scratch; restored models come with their optimizers, which include the latest learning rate. If the validation perplexity does not improve by at least `perplexity_delta` for `decay_patience` epochs in a row, the learning rate is reduced by multiplying it by `learning_rate_decay` ($lr = decay * lr$). If `early_stop_patience` learning rate changes still lead to no perplexity improvement (or the loss becomes NaN), the training process is terminated.  
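
The rule above can be replayed on a list of per-epoch validation perplexities. The sketch below is one plausible reading of that description, not **nic**'s actual implementation, which this notebook does not show. The function name and the two counters are assumptions, and the NaN check is left out.

```python
def simulate_schedule(val_perplexities, learning_rate,
                      learning_rate_decay=0.9, decay_patience=2,
                      perplexity_delta=0.01, min_learning_rate=0.0,
                      early_stop_patience=2):
    """Return the learning rate used in each epoch that gets run.

    Illustrative reading of the decay / early-stop rule; not nic's code.
    """
    best = float("inf")
    stale_epochs = 0      # epochs in a row without enough improvement
    fruitless_decays = 0  # decays since the last improvement
    used_rates = []

    for value in val_perplexities:
        used_rates.append(learning_rate)

        if best - value >= perplexity_delta:
            # Enough improvement: remember it and reset both counters
            best = value
            stale_epochs = 0
            fruitless_decays = 0
            continue

        stale_epochs += 1
        if stale_epochs < decay_patience:
            continue

        if fruitless_decays >= early_stop_patience:
            break  # the allowed decays did not help: stop early

        # Decay the learning rate, but never below the configured minimum
        learning_rate = max(learning_rate * learning_rate_decay,
                            min_learning_rate)
        fruitless_decays += 1
        stale_epochs = 0

    return used_rates


print(simulate_schedule([10, 9, 9, 9, 9, 9, 9, 9, 9],
                        learning_rate=1.0, learning_rate_decay=0.5))
```

In this run the perplexity stalls at 9, so the rate is halved after two stale epochs, halved again after two more, and training stops before the ninth epoch.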

TensorBoard logs go to `tensor_board_subdir` with `tensor_board_update_freq` frequency.  

Checkpoints (`SavedModel`s) go to `checkpoints_subdir` with `checkpoint_freq` frequency.  



In [None]:
history, metrics = nic.train_model(
    model=compiled_decoder,
    path_to_data=data_dir,
    is_decoder_only=True,
    batch_size=batch_size,
    buffer_size=buffer_size,
    tensor_board_dir=tensor_board_subdir,
    tensor_board_update_freq=tensor_board_update_freq,
    checkpoint_dir=checkpoints_subdir,
    checkpoint_freq=checkpoint_freq,
    learning_rate_decay=learning_rate_decay,
    decay_patience=decay_patience,
    perplexity_delta=perplexity_delta,
    min_learning_rate=min_learning_rate,
    early_stop_patience=early_stop_patience,
    max_epochs=max_epochs,
    shuffle_for_each_epoch=shuffle_for_each_epoch,
    initial_epoch=initial_epoch
)

In [None]:
print("Test metrics:")

for name, value in metrics.items():
    print(f"{name}: {value:.4f}")

### Plots
Here we plot the training history and view TensorBoard logs.  

In [None]:
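def plot(history, metric, title=None):
    """Plot one metric from the training history, one point per epoch."""
    y = history.history[metric]

    # Default the title to the metric name, e.g. "loss" -> "Loss"
    if title is None:
        title = metric.capitalize()

    plt.plot(y)
    plt.xlabel("Epochs")
    plt.ylabel(title)
    plt.title(f"{title} plot")
    plt.show()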

In [None]:
plot(history, "loss")

In [None]:
plot(history, "val_loss", title="Validation loss")

In [None]:
plot(history, "perplexity")

In [None]:
plot(history, "val_perplexity", title="Validation perplexity")

In [None]:
plot(history, "lr", title="Learning rate")

Copy the output of the next cell and paste it inside the quotes after `--logdir`.  

In [None]:
tensor_board_subdir

In [None]:
%load_ext tensorboard
%tensorboard --logdir ""

### Evaluate
Here we evaluate the decoder module by computing its [BLEU-4](https://aclanthology.org/P02-1040.pdf) score on test and validation images.  
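
To make the metric concrete, here is a minimal sketch of unsmoothed BLEU for a single tokenized caption, following the definition in the linked paper: clipped n-gram precisions for n = 1..4, combined by a geometric mean and multiplied by a brevity penalty. It is not **nic**'s implementation. How `nic.bleu_score_of` tokenizes, aggregates over the corpus, or scales the score is not shown in this notebook.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n])
                   for i in range(len(tokens) - n + 1))


def bleu(candidate, references, max_n=4):
    """Unsmoothed sentence-level BLEU (illustrative sketch only)."""
    log_precision_sum = 0.0

    for n in range(1, max_n + 1):
        candidate_counts = ngrams(candidate, n)

        # Clip each n-gram by its highest count in any single reference
        max_reference_counts = Counter()
        for reference in references:
            for gram, count in ngrams(reference, n).items():
                max_reference_counts[gram] = max(
                    max_reference_counts[gram], count)

        clipped = sum(min(count, max_reference_counts[gram])
                      for gram, count in candidate_counts.items())
        total = sum(candidate_counts.values())

        if clipped == 0 or total == 0:
            return 0.0  # one zero precision zeroes the unsmoothed score
        log_precision_sum += math.log(clipped / total) / max_n

    # Brevity penalty, using the reference length closest to the candidate's
    c = len(candidate)
    r = min((len(reference) for reference in references),
            key=lambda length: (abs(length - c), length))
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_precision_sum)


candidate = "the cat sat on the mat".split()
reference = "the cat is on the mat".split()
print(bleu(candidate, [reference]))  # 0.0, no 4-gram is shared
```

The example shows why sentence-level BLEU-4 is harsh on short captions: a single differing word in the middle leaves no matching 4-gram, so the unsmoothed score is zero.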

In [None]:
data_types = [
    "test",
    "val",
]
scores = dict()

for t in data_types:
    scores[t] = nic.bleu_score_of(
        compiled_decoder,
        is_decoder_only=True,
        path_to_data=data_dir,
        data_type=t,
        batch_size=32,
        caption_limit=100
    )
    print(f"Model {t} BLEU score: {scores[t]:.2f}")

### Generate captions
At this point we can generate captions for images which are not necessarily part of MSCOCO.  

We need to set `path_to_images` to a directory storing the images to be captioned. The images can be in JPEG or PNG format.  

We also need to connect the decoder module with the CNN encoder.  
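
The cell that builds `image_paths` further down lists every entry of the directory, so any stray file or subfolder ends up in the list too. Whether `nic.generate_captions_from_paths` tolerates such entries is not shown here. If your directory is mixed, a filter along these lines may help; `list_image_paths` and `IMAGE_SUFFIXES` are hypothetical names, not part of **nic**.

```python
from pathlib import Path

# File extensions treated as JPEG or PNG images (an assumption)
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def list_image_paths(directory):
    """Return the sorted paths of JPEG and PNG files directly inside
    directory (hypothetical helper, for illustration only)."""
    return sorted(
        str(path)
        for path in Path(directory).iterdir()
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
    )
```

It could replace the list comprehension as `image_paths = list_image_paths(path_to_images)`.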



In [None]:
from PIL import Image

path_to_images = os.path.join(drive_path, "images")

In [None]:
nic_model = nic.connect(
    compiled_decoder,
    encoder_model=None,
    image_shape=(299, 299, 3)
)
tf.keras.utils.plot_model(nic_model,
                          "nic.png",
                          show_shapes=True,
                          show_dtype=True)

In [None]:
nic_model.summary()

In [None]:
image_paths = [
    os.path.join(path_to_images, image_name)
    for image_name in os.listdir(path_to_images)
]
image_paths

In [None]:
captions = list(nic.generate_captions_from_paths(
    image_paths,
    nic_model,
    data_dir,
    batch_size=32,
    caption_limit=100
))
captions

Let's view one of the images and the caption generated for it.  
Pick an image by specifying its index (`image_index`) in the list of paths.  

In [None]:
image_index = 0
assert 0 <= image_index < len(image_paths)
image_path = image_paths[image_index]
image_path

In [None]:
Image.open(image_path)

In [None]:
captions[image_index]