# Optimizer Boiler Plate
---
This notebook is a boilerplate to use WandB optimizer to optimize the C-VAE model. Please be noted that the classes and functions that are used here are slightly different from the one that are persented in the ```./src``` folder. All the function here are designed to match **WandB** optimization process.

## Imports

In [1]:
import torch
from torch.nn import functional as F

import torch.nn as nn
from torch import optim as optim

# wandb
import wandb

# misc
import numpy as np
import time

# modules
from src.main_utils import Configuration
from src.evaluation_utils import stroke_visualizer_mix 
from src.thirdHand_data_loader import get_min_max_from_dataset
from src.train_utils import train_model
from src.motion_visualization_tools import compare_orig_rec_gen, show_generated_motions_advanced
from main import create_the_model

___


## WandB Settings

In [None]:
%env "WANDB_NOTEBOOK_NAME" "Optimization_boilerplate.ipynb"
wandb.login()

### Setting Sweep
Sweep is the WandB toolkit for optimizing the model. 

In [3]:
sweep_config = {
    'method': 'random'
    }

metric = {
    'name': 'loss',
    'goal': 'minimize'   
    }

sweep_config['metric'] = metric

### Setting hyper-parameter options
All the hyper parameters that are going to be tested will be presented as a dictionary, Sweep randomly goes through different combinations of these options and report on the progress.

In [4]:
parameters_dict = {
                'optimizer': {'values':['adam', 'rmsprop', 'sgd', 'nadam']}, 
                'first_filter_size': {'values':[5, 6, 7, 8, 9, 10]},
                'latent_dim': {'values':[4, 5, 6, 7, 8]},
                'depth':{'values':[2, 3, 4]},
                'kernel_size': {'values':[3, 5]},
                'dropout': {'values':[0.1, 0.2, .4, .5]},
                'epochs': {'values':[300]}, 
                'learning_rate': {'values':[0.01, 0.001, 0.0001]},
                'batch_size': {'values':[128]},
                'reduction': {'values':['sum']}, #'mean'
                'kld_weight': {'values':[0.1, 1, 10]},
                'rec_loss': {'values':['L1', 'L2']},
            }

sweep_config['parameters'] = parameters_dict

In [5]:
sweep_id = wandb.sweep(sweep_config, project="thirdHand_C_VAE_GitHub_Demo")

Create sweep with ID: uqs31f9g
Sweep URL: https://wandb.ai/ardibid/thirdHand_C_VAE_GitHub_Demo/sweeps/uqs31f9g


---

## C-VAE Dynamic Model Architecture

### Encoder

In [6]:
class Encoder(nn.Module):
    def __init__(self, device, first_filter_size, kernel_size, depth, dropout, latent_dim):
        super(Encoder, self).__init__()
        self.device= device
        self.first_filter_size= first_filter_size
        self.kernel_size= kernel_size
        self.encoder_padding = kernel_size//2 -1
        self.depth = depth 
        self.latent_dim = 2**latent_dim
        self.filter_number = [2**(i) for i in range(first_filter_size+1)]
        self.filter_number.reverse()
        
        self.filter_number = self.filter_number[:self.depth]
        self.last_encoder_filter_size = None
        
        self.dropout = dropout
        self.encoder_layers = self.make_encoder() 
        
        self.last_filter_size = self.filter_number[0]
        self.last_feature_size= (10-(depth*2+1))
        self.last_dim =  self.last_filter_size*self.last_feature_size

        self.flatten_layer = nn.Flatten().to(device)
        self.convert_to_latent = nn.Linear(self.last_dim, 2*self.latent_dim).to(device)
        
    def make_encoder(self):
        encoder_cnn_blocks = []
        
        for i in range(len(self.filter_number)):
            if i ==0:
                in_dim = 20
                out_dim = self.filter_number[i]   
            else:
                in_dim = self.filter_number[i-1]
                out_dim = self.filter_number[i]
                
            cnn_block_layers=[
                            nn.Conv1d(in_channels= in_dim, 
                                    out_channels= out_dim, 
                                    kernel_size= self.kernel_size, 
                                    padding= self.encoder_padding),
                            nn.BatchNorm1d(out_dim),
                            nn.ReLU(),
                            nn.Dropout(self.dropout),
                            ]
            
            cnn_block = nn.Sequential(*cnn_block_layers).to(self.device)
            
            encoder_cnn_blocks.append(cnn_block)
            self.last_encoder_filter_size = out_dim
            
        self.filter_number.reverse()
        
        return nn.ModuleList(encoder_cnn_blocks)
    
    
    def reparametrization(self, mean, log_var):
        """
        Samples from a normal distribution with a given set of
        means and log_vars
        """
        # epsilon is a vector of size (1, latent_dim)
        # it is samples from a Standard Normal distribution
        # mean = 0. and std = 1.
        epsilon = torch.normal(mean= 0, std= 1, size = log_var.shape).to(self.device) 

        # we need to convert log(var) into var:
        var = torch.exp(log_var*0.5)
        # epsilon = torch.randn_like(var)
        # now, we change the standard normal distributions to
        # a set of non standard normal distributions
        z = mean + epsilon*var
        return z
    
    def forward(self, x, y):
        
        for block in self.encoder_layers:
            x = block(x) 

        latent_ready = self.flatten_layer(x) 
        latent = self.convert_to_latent(latent_ready)

        mean = latent[:, : self.latent_dim]
        log_var = latent[:,self.latent_dim:]

        z = self.reparametrization(mean, log_var)

        return z, mean, log_var

### Decoder

In [7]:
class Decoder(nn.Module):
    def __init__(self, device, first_filter_size, kernel_size, depth, latent_dim, last_filter_size, last_feature_size):
        super(Decoder, self).__init__()
        
        self.device= device
        self.first_filter_size= first_filter_size
        self.kernel_size= kernel_size
        self.encoder_padding = kernel_size//2 -1
        self.depth = depth 
        self.latent_dim = 2**latent_dim
        self.last_filter_size= last_filter_size
        self.last_feature_size= last_feature_size
        
        
        
        self.filter_number = [2**(i) for i in range(first_filter_size+1)]        
        self.filter_number = self.filter_number[-self.depth:]
        self.filter_number.reverse()
        
        self.decoder_layers = self.make_decoder()
        self.z_to_decoder = nn.Linear(self.latent_dim+1,self.last_filter_size*self.last_feature_size).to(device)
       
    def make_decoder(self):
        decoder_cnn_blocks = []
        
        for i in range(len(self.filter_number)+1):
            self.decoder_padding = self.encoder_padding
            if i == 0:
                in_dim = self.last_filter_size
                out_dim = self.filter_number[i]
                
            elif i == len(self.filter_number):
                in_dim = self.filter_number[i-1]
                out_dim = 20  
                self.decoder_padding += 1
                
            else:
                in_dim = self.filter_number[i-1]
                out_dim = self.filter_number[i]
                        
            cnn_block_layers = [
                                nn.ConvTranspose1d(in_channels= in_dim, 
                                                    out_channels= out_dim, 
                                                    kernel_size= self.kernel_size, 
                                                    padding= self.decoder_padding,
                                                    ),
                                ]
            
            
            cnn_block = nn.Sequential(*cnn_block_layers).to(self.device)
            decoder_cnn_blocks.append(cnn_block)
            
        return nn.ModuleList(decoder_cnn_blocks)
    
    def forward(self, z, y):
        # print(z.shape, y.shape)
        fused_data =  torch.cat((z,y[:,0,:]), dim=1)
        # print( z.shape, y.shape, fused_data.shape)
        decoded = self.z_to_decoder(fused_data).view(-1, self.last_filter_size, self.last_feature_size)
        
        #decoded = self.z_to_decoder(z).view(-1, self.last_filter_size, self.last_feature_size)
        
        for block in self.decoder_layers:
            decoded = block(decoded)
        
        return decoded 

### C-VAE

In [8]:
class VAE_CNN(nn.Module):
    def __init__(self, device, first_filter_size, kernel_size, depth, dropout, latent_dim, rec_loss, reduction, kld_weight):
        super(VAE_CNN, self).__init__()

        self.encoder = Encoder(
                                device, 
                                first_filter_size, 
                                kernel_size, 
                                depth, 
                                dropout, 
                                latent_dim,
                                )

        self.decoder = Decoder(
                                device, 
                                first_filter_size, 
                                kernel_size, 
                                depth, 
                                latent_dim, 
                                self.encoder.last_filter_size,
                                self.encoder.last_feature_size,
                                )
        
        self.reduction = reduction
        self.kld_weight = kld_weight
        self.rec_loss = rec_loss
        
    def vae_loss_function(self, x, x_rec, log_var, mean):
        if self.rec_loss == "L1":
            train_rec_loss = F.l1_loss(x_rec, x, reduction=self.reduction)   
        else:
            train_rec_loss = F.mse_loss(x_rec, x, reduction=self.reduction)     
        train_kld_loss = torch.mean(-0.5 * torch.sum(1 + log_var - mean**2 - log_var.exp(), dim = 1), dim = 0)

        train_loss = train_rec_loss  + train_kld_loss*self.kld_weight
        
        return train_loss, train_rec_loss, train_kld_loss*self.kld_weight  
    
    def forward(self, x, y):
        z, mean, log_var = self.encoder(x, y)
        x_rec = self.decoder(z, y)
        return x_rec, mean, log_var
        

In [9]:
def vae_loss_function(x, x_rec, log_var, mean, rec_loss, reduction, kld_weight):
    if rec_loss == "L1":
        train_rec_loss = F.l1_loss(x_rec, x, reduction=reduction)   
    else:
        train_rec_loss = F.mse_loss(x_rec, x, reduction=reduction)     
    train_kld_loss = torch.mean(-0.5 * torch.sum(1 + log_var - mean**2 - log_var.exp(), dim = 1), dim = 0)

    train_loss = train_rec_loss  + train_kld_loss*kld_weight
    
    return train_loss, train_rec_loss, train_kld_loss*kld_weight

## Training functions

---
### Only for training outside of **wandb**

In [10]:
def eval_epoch(
                model, 
                config, 
                data_iterator, 
                plot= False, 
                scaled_plot= False, 
                update_tensorboard= False, 
                epoch=None, 
                loss_function= None, 
                show_one_sample= False,
                save_plot= False,
                path_to_save_plot = None,
                is_vae= False,
                rec_loss= "L1", 
                reduction="sum", 
                kld_weight= 1e-1):
    
    
    model.eval()
    batch_eval_loss = 0
    batch_eval_rec_losses= 0 
    batch_eval_kld_losses= 0
    
    item_to_show = np.random.randint(len(config.valid_iterator))
    
    for i, data in enumerate(config.valid_iterator):
        x= data[config.data_item]
        y= data["Y"]
        
        x_rec = model(x, y)
        if is_vae:
            x_rec, mean, log_var = model(x,y)
            eval_loss, eval_rec_losses, eval_kld_losses = loss_function(
                                                                        x, 
                                                                        x_rec, 
                                                                        log_var,
                                                                        mean, 
                                                                        rec_loss= "L1", 
                                                                        reduction='sum', 
                                                                        kld_weight= kld_weight,
                                                                        )
            batch_eval_loss += eval_loss.item()
            batch_eval_rec_losses += eval_rec_losses.item()
            batch_eval_kld_losses += eval_kld_losses.item()
        
        else:
            eval_loss = loss_function(x, x_rec)
            batch_eval_loss += eval_loss.item()
        
        if i == 0:
            if plot or update_tensorboard or save_plot:
                if scaled_plot:
                    if config.data_item == 2:
                        min = data[5][0]
                        max = data[6][0]
                        x = (x + min) * (max-min)
                        x_rec = (x_rec + min) * (max-min)

                    elif config.data_item == 4:
                        min = data[7][0]
                        max = data[8][0]
                        x = (x + min) * (max-min)
                        x_rec = (x_rec + min) * (max-min)               
                    
                fig = stroke_visualizer_mix(x, x_rec)
                
                if plot:
                    fig.show()
                
                if update_tensorboard:
                    fig.write_image(path_to_save_plot) 
                    image =  Image.open(path_to_save_plot)
                    image = np.asarray(image)
                    
                    config.writer.add_image(tag='test_progress', 
                                        img_tensor = image,
                                        global_step= epoch,
                                        dataformats='HWC') 
                if save_plot:
                    fig.write_image(path_to_save_plot) 
                    
    counter = len(config.valid_iterator)   
    
    if is_vae:
        return (batch_eval_loss/counter,
                batch_eval_rec_losses/counter, 
                batch_eval_kld_losses/counter)
    else:     
        return batch_eval_loss/counter               


___
# WandB Process

### Supporting functions

In [11]:
def optimizer_selector(model, optimizer_name, lr):
    if optimizer_name == "adam":
        optimzer = optim.Adam(model.parameters(), lr = lr)
        
    elif optimizer_name == "rmsprop":
        optimzer = optim.RMSprop(model.parameters(), lr = lr)
        
    elif optimizer_name == "sgd":
        optimzer = optim.SGD(model.parameters(), lr = lr)
        
    elif optimizer_name == "nadam":
        optimzer = optim.NAdam(model.parameters(), lr = lr)
        
    return optimzer

def dataset_maker(batch_size, project_config):
    project_config.batch_size = batch_size
    project_config.process_dataset_dataloaders() 
    
    
def train_epoch(model, project_config, optimizer):
    model.train()
    epoch_loss = 0
    epoch_rec_loss = 0
    epoch_kld_loss = 0
    
    for data in project_config.train_iterator:
        x= data[project_config.data_item]
        y= data["Y"]
        
        optimizer.zero_grad()
        x_rec, mean, log_var = model(x,y)
        
        
        train_loss, train_rec_loss, train_kld_loss = vae_loss_function (x, 
                                                                        x_rec, 
                                                                        log_var, 
                                                                        mean, 
                                                                        rec_loss= model.rec_loss, 
                                                                        reduction= model.reduction,
                                                                        kld_weight= model.kld_weight)
        
        # updating the history
        epoch_rec_loss += train_rec_loss.item()
        epoch_kld_loss += train_kld_loss.item()
        epoch_loss += train_loss.item() 
        
        train_loss.backward()
        optimizer.step()
    
        
                   
    counter = len(project_config.train_iterator)  
     
    results = [epoch_loss/counter,
                epoch_rec_loss/counter, 
                epoch_kld_loss/counter]   
  
    return results

---

## Optimization
WandB will use this train function to generate various architectures and train them to figure out which architecture is the best. 

In [12]:
def optimizer_loop(config=None):    
    path_to_save_plot = "runs/progress/tmp_fig.png"
    
    with wandb.init(config=config):
        config = wandb.config # this one basically iterates over the configuration settings

        # going through data, model, and optimizer variations
        dataset_maker(config.batch_size, project_config)

        model = VAE_CNN(device= project_config.device, 
                        first_filter_size= config.first_filter_size, 
                        kernel_size= config.kernel_size, 
                        depth= config.depth, 
                        dropout= config.dropout,
                        latent_dim= config.latent_dim,
                        rec_loss= config.rec_loss, 
                        reduction= config.reduction, 
                        kld_weight= config.kld_weight)

        optimizer= optimizer_selector(model, config.optimizer, config.learning_rate)
        
        # main train loop
        for epoch in range(config.epochs):
            train_loss, train_rec_loss, train_kld_loss = train_epoch(model, 
                                                                    project_config,
                                                                    optimizer)
            
            if epoch % 100 == 99:
                log_plot= True
            else:
                log_plot= False  

            try:
                eval_loss, eval_rec_losses, eval_kld_loss = eval_epoch(
                                                                model= model, 
                                                                config= project_config, 
                                                                data_iterator= project_config.valid_iterator, 
                                                                plot= False,
                                                                scaled_plot= False,
                                                                update_tensorboard= False,
                                                                epoch= epoch,
                                                                loss_function= vae_loss_function,
                                                                show_one_sample= False,
                                                                save_plot= log_plot,
                                                                path_to_save_plot = path_to_save_plot,
                                                                is_vae= True,
                                                                rec_loss= model.rec_loss, 
                                                                reduction=model.reduction, 
                                                                kld_weight= model.kld_weight,
                                                                )
            
                if epoch % 10 == 0:
                    # updating the training progress after each 10 epochs 
                    # we don't do it every epoch to save time
                    wandb.log(dict(loss= train_loss, epoch= epoch))
                    wandb.log(dict(loss_rec = train_rec_loss, epoch= epoch))
                    wandb.log(dict(loss_KLD= train_kld_loss, epoch= epoch))
                    wandb.log(dict(eval_loss=eval_loss,epoch= epoch))
                    wandb.log(dict(eval_rec=eval_rec_losses,epoch= epoch))
                    wandb.log(dict(eval_KLD=eval_kld_loss,epoch= epoch))
                
                if log_plot:
                    # saving an image of the process every 100 epochs
                    # if we do it more than this, it will kill time!
                    wandb.log({"eval_sample":wandb.Image(path_to_save_plot)})
            except:
                print ("something happend to the eval process, some issues with colors")

### Running the optimization process

In [13]:
project_config = Configuration()
wandb.agent(sweep_id, optimizer_loop, count= 1)

Data loaded from 11 filse, stored in a dataframe with shape (248486, 10)
Dataframe headers are: ['px', 'py', 'pz', 'v1x', 'v1y', 'v1z', 'v2x', 'v2y', 'v2z', 'hand']
Data loaded from 2 filse, stored in a dataframe with shape (54402, 10)
Dataframe headers are: ['px', 'py', 'pz', 'v1x', 'v1y', 'v1z', 'v2x', 'v2y', 'v2z', 'hand']


[34m[1mwandb[0m: Agent Starting Run: yreojszz with config:
[34m[1mwandb[0m: 	batch_size: 128
[34m[1mwandb[0m: 	depth: 2
[34m[1mwandb[0m: 	dropout: 0.1
[34m[1mwandb[0m: 	epochs: 300
[34m[1mwandb[0m: 	first_filter_size: 10
[34m[1mwandb[0m: 	kernel_size: 5
[34m[1mwandb[0m: 	kld_weight: 10
[34m[1mwandb[0m: 	latent_dim: 5
[34m[1mwandb[0m: 	learning_rate: 0.0001
[34m[1mwandb[0m: 	optimizer: adam
[34m[1mwandb[0m: 	rec_loss: L2
[34m[1mwandb[0m: 	reduction: sum
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


0,1
epoch,▁▁▁▁▂▂▂▂▂▂▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▆▆▆▆▆▆▇▇▇▇▇████
eval_KLD,▁█▇▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▇▆▆▆▇▆▆▆▆▆▆
eval_loss,█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
eval_rec,█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
loss,█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
loss_KLD,█▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
loss_rec,█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁

0,1
epoch,290.0
eval_KLD,45.14679
eval_loss,864.91986
eval_rec,819.77304
loss,99.95659
loss_KLD,46.21644
loss_rec,53.74015


---

## Inserting the optimized hyper-parameters to make the final model

Once the model is optimized, fill in the blanks and test the model

In [14]:
model, project_config, model_config = create_the_model(device= 'cuda', 
                                                        csv_folder_path= None, 
                                                        tresh_l= 0.289, 
                                                        tresh_h_normal= 0.4, 
                                                        tresh_h_riz= 0.27, 
                                                        dist= 15, 
                                                        peak_dist= 30, 
                                                        motion_fixed_length= 20, 
                                                        data_item="X_centered_scaled",
                                                        batch_size= 128, 
                                                        kernel_size= 5,  
                                                        first_filter_size= 9,  
                                                        depth= 2, 
                                                        dropout= 0.1,
                                                        epochs= 300, 
                                                        latent_dim= 8,
                                                        rec_loss= "L1",
                                                        reduction= "sum",
                                                        kld_weight= 1e-1,
                                                        model_name_to_save= "c_vae_model")

Data loaded from 11 filse, stored in a dataframe with shape (248486, 10)
Dataframe headers are: ['px', 'py', 'pz', 'v1x', 'v1y', 'v1z', 'v2x', 'v2y', 'v2z', 'hand']
Data loaded from 2 filse, stored in a dataframe with shape (54402, 10)
Dataframe headers are: ['px', 'py', 'pz', 'v1x', 'v1y', 'v1z', 'v2x', 'v2y', 'v2z', 'hand']


In [15]:
train_losses ,train_rec_losses,train_kld_losses, eval_losses = train_model(model, project_config, model_config)

Image 0 saved
0:	Total: 6005.45348	Eval loss: 8098.43213	 Rec loss: 5998.88065	 KLD loss: 6.57288	 time: 1.7s
Image 50 saved
50:	Total: 449.63744	Eval loss: 447.07079	 Rec loss: 392.88063	 KLD loss: 56.75681	 time: 10.1s
Image 100 saved
100:	Total: 381.84488	Eval loss: 313.89047	 Rec loss: 324.97815	 KLD loss: 56.86673	 time: 18.1s
Image 150 saved
150:	Total: 354.65965	Eval loss: 287.46407	 Rec loss: 298.74484	 KLD loss: 55.91481	 time: 26.3s
Image 200 saved
200:	Total: 313.29176	Eval loss: 271.97845	 Rec loss: 258.30891	 KLD loss: 54.98285	 time: 34.5s
Image 250 saved
250:	Total: 302.87928	Eval loss: 251.59362	 Rec loss: 248.91885	 KLD loss: 53.96043	 time: 43.4s
Image 300 saved
300:	Total: 296.06186	Eval loss: 244.19494	 Rec loss: 242.87177	 KLD loss: 53.19009	 time: 51.9s


Comparing a sample from the original dataset, the same sample reconstructed, and a sample that is generated with a slight difference.

In [20]:
compare_orig_rec_gen(model, project_config)