## Example of using TorchCNNBuilder on the MovingMnist dataset

#### The MovingMnist dataset is a benchmark for the video forecasting task. It consists of 10,000 samples, each a series of 20 frames with digits moving along different trajectories. It can be downloaded from the [official page](https://www.cs.toronto.edu/~nitish/unsupervised_video/).

In [1]:
import numpy as np
import torch
from matplotlib import pyplot as plt
from torch import nn, optim, tensor
from torch.utils.data import TensorDataset, DataLoader
from torchcnnbuilder.models import ForecasterBase
from torch.optim.lr_scheduler import ReduceLROnPlateau

##### Data preparation includes normalization and splitting into train, validation and test parts. The first 10 frames of each series are used as features (input) for the model, and the last 10 frames are used as the target (output).

Note that there is **no cyclic component** in any of the spatio-temporal series. The model therefore has to learn the dynamics of the moving digits from examples in other series (with other digits), while the convolutional layers should help to reproduce the appearance of a digit from the previous frames of the series.

In [2]:
from skimage.transform import resize

data = np.load('data/moving_mnist.npy').astype(np.float32)/255

train_set = data[:, :1000, :, :]
validation_set = data[:, 1000:1500, :, :]
test_set = data[:, 1500:2000, :, :]

dim = 45

train_set = resize(train_set, (train_set.shape[0], train_set.shape[1], dim, dim))
validation_set = resize(validation_set, (validation_set.shape[0], validation_set.shape[1], dim, dim))
test_set = resize(test_set, (test_set.shape[0], test_set.shape[1], dim, dim))

train_features = train_set[:10, :, :, :]
train_features = np.swapaxes(train_features, 0, 1)
train_target = train_set[10:, :, :, :]
train_target = np.swapaxes(train_target, 0, 1)
train_dataset = TensorDataset(tensor(train_features), tensor(train_target))

validation_features = validation_set[:10, :, :, :]
validation_features = np.swapaxes(validation_features, 0, 1)
validation_target = validation_set[10:, :, :, :]
validation_target = np.swapaxes(validation_target, 0, 1)
validation_dataset = TensorDataset(tensor(validation_features), tensor(validation_target))
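The split above can be sanity-checked on a synthetic array before loading the real file. This is a minimal sketch: the array sizes are illustrative assumptions, and only the axis layout (frames, samples, height, width) follows the cell above.

```python
import numpy as np

# Synthetic stand-in for the dataset: (frames, samples, height, width).
# The sizes are illustrative assumptions, not the real data.
n_frames, n_samples, dim = 20, 8, 45
series = np.zeros((n_frames, n_samples, dim, dim), dtype=np.float32)

in_frames = 10  # number of frames fed to the model

# Split along the time axis, then move the sample axis to the front
features = np.swapaxes(series[:in_frames], 0, 1)
target = np.swapaxes(series[in_frames:], 0, 1)

print(features.shape)  # (8, 10, 45, 45)
print(target.shape)    # (8, 10, 45, 45)
```

After `swapaxes`, the first axis indexes samples and the second axis indexes frames, so the frames act as input channels for the convolutional model.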

##### Model building with a simple structure: 5 convolutional and 5 transposed convolutional layers. The image resolution (here 45x45 pixels, after resizing from the original 64x64) is also specified.

In [3]:
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(f'Calculation on device: {device}')
model = ForecasterBase(input_size=[dim, dim],
                       in_time_points=10,
                       out_time_points=10,
                       n_layers=5,
                       finish_activation_function=nn.ReLU())
model = model.to(device)

Calculation on device: cuda


##### Set parameters for training. A simple strategy without a scheduler is presented. The number of epochs and the batch size can be changed depending on the device and quality requirements.

In [4]:
epochs = 1000000
batch_size = 400
dataloader = DataLoader(train_dataset, batch_size=batch_size, shuffle=False)
val_dataloader = DataLoader(validation_dataset, batch_size=batch_size, shuffle=False)
optimizer = optim.AdamW(model.parameters(), lr=0.0001)
criterion = nn.MSELoss()
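The setup above keeps the learning rate fixed. `ReduceLROnPlateau` is imported in the first cell but not used; the sketch below shows one way it could be attached if a scheduler is wanted. The toy model, the `factor` and `patience` values and the simulated losses are assumptions for illustration only, not part of the original training scheme.

```python
import torch
from torch import nn, optim
from torch.optim.lr_scheduler import ReduceLROnPlateau

# Tiny stand-in model (assumption); in the notebook this would be the forecaster
toy_model = nn.Linear(4, 1)
toy_optimizer = optim.AdamW(toy_model.parameters(), lr=1e-4)

# Halve the learning rate when the monitored loss has not improved
# for more than 2 consecutive epochs (illustrative values, not tuned)
scheduler = ReduceLROnPlateau(toy_optimizer, mode='min', factor=0.5, patience=2)

# Simulated validation losses that stop improving after the third epoch
simulated_val_losses = [1.0, 0.8, 0.7, 0.7, 0.7, 0.7, 0.7]
lr_history = []
for simulated_loss in simulated_val_losses:
    scheduler.step(simulated_loss)  # called once per epoch, after validation
    lr_history.append(toy_optimizer.param_groups[0]['lr'])

print(lr_history)
```

In the training loop, the corresponding call would be `scheduler.step(val_loss_value)` at the end of each epoch.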

##### Model training on 1000 samples. Loss values per epoch are saved for convergence visualization


In [None]:
from tqdm.notebook import tqdm

progress_bar = tqdm(total=epochs, desc="Epoch", colour="white")
info_bar = {"Loss": 0}
losses = []
val_losses = []
epoches = []
for epoch in range(epochs):
    loss = 0
    for train_features, train_targets in dataloader:
        train_features = train_features.to(device)
        train_targets = train_targets.to(device)
        optimizer.zero_grad()
        outputs = model(train_features)
        train_loss = criterion(outputs, train_targets)
        train_loss.backward()
        optimizer.step()
        loss += train_loss.item()
    loss = loss / len(dataloader)
    
    # Validation: evaluation only, without gradients or weight updates
    model.eval()
    val_loss_value = 0
    with torch.no_grad():
        for v_train_features, v_train_targets in val_dataloader:
            v_train_features = v_train_features.to(device)
            v_train_targets = v_train_targets.to(device)
            outputs = model(v_train_features)
            val_loss = criterion(outputs, v_train_targets)
            val_loss_value += val_loss.item()
    val_loss_value = val_loss_value / len(val_dataloader)
    model.train()
    
    info_bar['Loss'] = np.round(loss, 5)
    info_bar['Validation loss'] = np.round(val_loss_value, 5)
    progress_bar.update()
    progress_bar.set_postfix_str(info_bar)

    losses.append(loss)
    val_losses.append(val_loss_value)
    epoches.append(epoch)
    if epoch%10000==0:
        torch.save(model.state_dict(), f'models_mnist/{dim}_mnist_{epoch}(small).pt')

torch.save(model.state_dict(), f'models_mnist/10to10_{dim}_mnist_{epoch}(small).pt')

Epoch:   0%|          | 0/1000000 [00:00<?, ?it/s]

##### Loss value per epoch visualization. A gradual decrease in loss values indicates that the task has been set correctly.

In [None]:
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(epoches, losses)
ax1.set_ylabel('MSE Loss')
ax1.set_xlabel('Epoch')
ax1.set_title('Train')
ax1.grid()

ax2.plot(epoches, val_losses)
ax2.set_ylabel('MSE Loss')
ax2.set_xlabel('Epoch')
ax2.set_title('Validation')
ax2.grid()
plt.suptitle('Convergence plot')
plt.tight_layout()
plt.savefig(f'models_mnist/10to10_convergence_{dim}_mnist_{epoch}_(small).png')
plt.show()

### Quality estimation on test set
##### *Loading features and target for test set* 

In [None]:
test_features = test_set[:10, :, :, :]
test_features = np.swapaxes(test_features, 0, 1)
test_target = test_set[10:, :, :, :]
test_target = np.swapaxes(test_target, 0, 1)
print('Data loaded')

##### *MAE (mean absolute error) and SSIM (structural similarity index) calculation for each sample of test set*

In [None]:
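from pytorch_msssim import ssim

l1_errors = []
ssim_scores = []
model.eval()
with torch.no_grad():
    for s in range(test_features.shape[0]):
        features = tensor(test_features[s], dtype=torch.float32).to(device)
        prediction = model(features).cpu()
        target = tensor(test_target[s], dtype=torch.float32)
        l1_errors.append(torch.mean(torch.abs(prediction - target)).item())
        # ssim expects 4D tensors (batch, channels, height, width): the frames act
        # as channels, and data_range=1.0 matches the [0, 1] normalization
        ssim_scores.append(ssim(prediction[None], target[None], data_range=1.0).item())
print(f'Mean MAE for test set = {np.mean(l1_errors)}')
print(f'Mean SSIM for test set = {np.mean(ssim_scores)}')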

##### Visualization of prediction results on test set (for first 5 samples) 

In [None]:
for s in range(5):
    tensor_features = tensor(test_features[s]).to(device)
    prediction = model(tensor_features).detach().cpu().numpy()
    plt.rcParams["figure.figsize"] = (12, 4)
    fig, axs = plt.subplots(2, 10)
    for i in range(10):
        axs[0][i].imshow(prediction[i])
        axs[1][i].imshow(test_target[s][i])
    plt.suptitle(f'Sample {s}')
    plt.show()

It can be concluded that the predictive ability of such a model with the described training scheme is limited to 2 frames, despite the high value of the quality metric.
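A metric averaged over all 10 output frames can hide this kind of degradation with forecast horizon. One way to make it visible is to compute the error separately for each predicted frame. The sketch below is a hedged illustration on synthetic data: the helper `per_frame_mae` is hypothetical, and the arrays are stand-ins in which the error is made to grow by 0.01 per frame.

```python
import numpy as np


def per_frame_mae(predictions, targets):
    """MAE per forecast step for arrays shaped (samples, frames, height, width)."""
    return np.mean(np.abs(predictions - targets), axis=(0, 2, 3))


# Synthetic stand-in data (assumption): the error grows by 0.01 with each frame
n_samples, n_frames, dim = 5, 10, 45
rng = np.random.default_rng(0)
targets = rng.random((n_samples, n_frames, dim, dim))
offsets = 0.01 * np.arange(1, n_frames + 1)
predictions = targets + offsets[None, :, None, None]

frame_errors = per_frame_mae(predictions, targets)
print(np.round(frame_errors, 3))
```

With real model outputs stacked into arrays of the same layout, plotting `frame_errors` against the frame index would show at which step the forecast starts to degrade.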