<a href="https://colab.research.google.com/github/aachen6/deepTC/blob/master/colab/deepTC_net_image.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# DeepTC - Post-binding Architecture

New readers can find the objective of *deepTC* on the [deepTC GitHub page](https://github.com/aachen6/deepTC/). The first two notebooks prepared the best track and satellite image datasets of historical TCs. This notebook covers the architecture of *deepTC*, which is built on *pytorch*. To explore different deep neural network architectures efficiently, *deepTC* uses a post-binding approach: the network is constructed at run time from a configuration file, so a new architecture can be tried without revising a single line of the code.

1. Data Preprocess
   - 1.1 [Satellite images and tracks of TCs](https://github.com/aachen6/deepTC/blob/master/colab/deepTC_images_tracks_sync.ipynb)

   - 1.2 [Statistics of satellite images and tracks](https://github.com/aachen6/deepTC/blob/master/colab/deepTC_images_tracks_stats.ipynb)

2. Model for TC Image
   - **2.1 [Post-binding architecture of TC image](https://github.com/aachen6/deepTC/blob/master/colab/deepTC_net_image.ipynb)**

   - 2.2 [CNN model for intensity classification](https://github.com/aachen6/deepTC/blob/master/colab/deepTC_classification_cnn5.ipynb)

   - 2.3 [Resnet model for intensity classification](https://github.com/aachen6/deepTC/blob/master/colab/deepTC_classification_resnet.ipynb)

   - 2.4 [Resnet model for TC intensity estimation](https://github.com/aachen6/deepTC/blob/master/colab/deepTC_intensity_resnet.ipynb)

3. Generative Adversarial Model for TC Images
   - 3.1 [DCGAN model for deepTC](https://github.com/aachen6/deepTC/blob/master/colab/deepTC_dcgan.ipynb)

4. Model for TC Track
   - 4.1 [Post-binding architecture of TC Track](https://github.com/aachen6/deepTC/blob/master/colab/deepTC_net_track.ipynb)
    
   - 4.2 [LSTM model for TC track prediction](https://github.com/aachen6/deepTC/blob/master/colab/deepTC_track_lstm.ipynb)
 
   - 4.3 [LSTM-CNN model for TC track prediction](https://github.com/aachen6/deepTC/blob/master/colab/deepTC_track_lstmcnn.ipynb)


Let's start by importing the necessary Python modules and installing pytorch with GPU support on Google Colab.

In [0]:
# basics
import os
import yaml
import pickle
import numpy as np
import pandas as pd
from copy import deepcopy

# install an earlier version of pillow, as the latest pillow causes an error 
# when loading images from zip file
!pip install pillow==4.1.1

# handling the images
from PIL import Image
from zipfile import ZipFile

# install and load pytorch
from os.path import exists
from wheel.pep425tags import get_abbr_impl, get_impl_ver, get_abi_tag
platform = '{}{}-{}'.format(get_abbr_impl(), get_impl_ver(), get_abi_tag())
cuda_output = !ldconfig -p|grep cudart.so|sed -e 's/.*\.\([0-9]*\)\.\([0-9]*\)$/cu\1\2/'
accelerator = cuda_output[0] if exists('/dev/nvidia0') else 'cpu'

!pip install -q http://download.pytorch.org/whl/{accelerator}/torch-1.0.0-{platform}-linux_x86_64.whl torchvision

import torch
torch.backends.cudnn.enabled = False  # it seems that cudnn doesn't work

Collecting pillow==4.1.1
  Downloading https://files.pythonhosted.org/packages/36/e5/88b3d60924a3f8476fa74ec086f5fbaba56dd6cee0d82845f883b6b6dd18/Pillow-4.1.1-cp36-cp36m-manylinux1_x86_64.whl (5.7MB)
    100% |████████████████████████████████| 5.7MB 7.1MB/s 
Installing collected packages: pillow
  Found existing installation: Pillow 4.0.0
    Uninstalling Pillow-4.0.0:
      Successfully uninstalled Pillow-4.0.0
Successfully installed pillow-4.1.1


## Post-Binding Deep Neural Network

It is beneficial to test different deep neural network architectures. To make the process more efficient, we decouple model construction from the code by post-binding the deep neural network from a configuration file. A *YAML* configuration file defines the architecture, and two classes construct the network from it: a static class that maps method string names to pytorch methods or class instances, and a pytorch module subclass that generates the model instance. Currently, a sequential model with an extension for residual blocks/nets is implemented. It is straightforward to extend the idea to more complex deep neural network architectures.

The first class, *PyTorchCall*, is simply a static class that maps string names to pytorch methods and passes on the corresponding arguments. Only the methods needed for the current application are included at the moment. The implementation is straightforward and relies on Python's built-in *getattr* function.

~~~python
class PyTorchCall:
    @staticmethod
    def map_torch_call(func_str):
        # look up the static method named '_' + func_str
        return getattr(PyTorchCall, '_' + func_str)
    # any pytorch calls to be added below
~~~
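
To make the dispatch concrete without depending on pytorch, the sketch below applies the same *getattr* pattern to two standard-library classes. The class name `StdlibCall` and its two entries are hypothetical stand-ins chosen for illustration; only the lookup-and-unpack mechanism mirrors *PyTorchCall*.

```python
from datetime import timedelta
from fractions import Fraction


class StdlibCall:
    """Hypothetical stand-in for PyTorchCall that maps names to stdlib classes."""

    @staticmethod
    def map_call(func_str):
        # look up the static method whose name is '_' + func_str
        return getattr(StdlibCall, '_' + func_str)

    @staticmethod
    def _fraction(args):
        # positional and keyword arguments are unpacked from the config entry
        return Fraction(*args['args'], **args['kwargs'])

    @staticmethod
    def _timedelta(args):
        return timedelta(*args['args'], **args['kwargs'])


# each entry has the same args/kwargs shape as one layer entry in the YAML file
half = StdlibCall.map_call('fraction')({'args': [1, 2], 'kwargs': {}})
delay = StdlibCall.map_call('timedelta')({'args': [], 'kwargs': {'hours': 6}})
print(half, delay)
```

A name with no matching static method raises `AttributeError`, which gives a reasonably direct signal that the configuration file contains a typo or an unsupported layer type.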

In [0]:
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F

class PyTorchCall:

    # will update to unroll function variables using **kwarg
  
    @staticmethod
    def map_torch_call(func_str): return getattr(PyTorchCall, '_' + func_str)

    # pytorch nn calls
    @staticmethod 
    def _linear(args): return nn.Linear(*args['args'], **args['kwargs'])
    @staticmethod
    def _dropout(args): return nn.Dropout(*args['args'], **args['kwargs'])
    @staticmethod
    def _conv2d(args):
        return nn.Conv2d(*args['args'], **args['kwargs'])
    @staticmethod
    def _deconv2d(args):
        return nn.ConvTranspose2d(*args['args'], **args['kwargs'])
    @staticmethod
    def _upsample(args):
        return nn.Upsample(*args['args'], **args['kwargs'])
    @staticmethod
    def _batchnorm2d(args):
        return nn.BatchNorm2d(*args['args'], **args['kwargs'])
    @staticmethod     
    def _avgpool2d(args):
        return nn.AvgPool2d(*args['args'], **args['kwargs'])
    @staticmethod    
    def _maxpool2d(args):
        return nn.MaxPool2d(*args['args'], **args['kwargs'])

    @staticmethod
    def _relu(args):
        return nn.ReLU(*args['args'], **args['kwargs'])
    @staticmethod
    def _tanh(args):
        return nn.Tanh(*args['args'], **args['kwargs'])
    @staticmethod
    def _sigmoid(args):
        return nn.Sigmoid(*args['args'], **args['kwargs'])
    @staticmethod 
    def _leakyrelu(args):
        return nn.LeakyReLU(*args['args'], **args['kwargs'])

    # pytorch functional
    #@staticmethod
    #def _relu(args): return F.relu
    @staticmethod 
    def _batchnorm(args): return F.batch_norm
    @staticmethod
    def _softmax(args): return F.softmax

    @staticmethod
    def _view1d(args): return  
    @staticmethod
    def _view2d(args): return
      
    # pytorch loss
    @staticmethod
    def _mseloss(): return nn.MSELoss() 
    @staticmethod
    def _bceloss(): return nn.BCELoss()
    @staticmethod
    def _crossentropy(): return nn.CrossEntropyLoss() 
		
    # pytorch optimizer
    @staticmethod
    def _adam(): return optim.Adam
		

The second class, *YML2Model*, inherits pytorch *nn.Module* and generates a pytorch model instance from the *YAML* configuration file. The configuration file defines each layer of the deep neural network in terms of pytorch *nn.Module* classes. An example of a two-layer convolutional network is shown below:

~~~yaml
model: 
 - cnn2
cnn2:
  - layer1-sequential: # grouped into a sequential block, but can also be expanded into individual layers
    - conv2d:
        args: [1, 32, 3]
        kwargs: {padding: 1, stride: 1}
    - maxpool2d:
        args: [2]
        kwargs: {padding: 0, stride: 2}
    - relu:
        args: []
        kwargs: {}
  - layer2-sequential:
    - conv2d:
        args: [32, 32, 3]
        kwargs: {padding: 1, stride: 1}
    - maxpool2d:
        args: [2]
        kwargs: {padding: 0, stride: 2}
    - relu:
        args: []
        kwargs: {}
  - layer3-view1d:
      args: []
      kwargs: {}
  - layer4-linear:
      args: [8192, 10]
      kwargs: {}
~~~
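
A quick shape calculation helps confirm that the input size of the final linear layer matches the flattened output of the convolution stages. The sketch below walks a Python structure that mirrors the two sequential stages of the example and applies the usual output-size formula for convolution and pooling windows, `floor((n + 2*padding - kernel) / stride) + 1`. The 64x64 single-channel input is an assumption made here for illustration (the configuration does not state the image size), and the helper names are hypothetical.

```python
# Python equivalent of the two sequential stages in the YAML example
cnn2 = [
    {'layer1-sequential': [
        {'conv2d': {'args': [1, 32, 3], 'kwargs': {'padding': 1, 'stride': 1}}},
        {'maxpool2d': {'args': [2], 'kwargs': {'padding': 0, 'stride': 2}}},
        {'relu': {'args': [], 'kwargs': {}}},
    ]},
    {'layer2-sequential': [
        {'conv2d': {'args': [32, 32, 3], 'kwargs': {'padding': 1, 'stride': 1}}},
        {'maxpool2d': {'args': [2], 'kwargs': {'padding': 0, 'stride': 2}}},
        {'relu': {'args': [], 'kwargs': {}}},
    ]},
]


def out_size(n, kernel, padding, stride):
    # output size along one dimension of a conv or pooling window (dilation of 1)
    return (n + 2 * padding - kernel) // stride + 1


def flattened_features(config, channels, size):
    for lyr_cfg in config:
        lyr_key = list(lyr_cfg.keys())[0]
        lyr_name, lyr_type = lyr_key.split('-')   # e.g. 'layer1', 'sequential'
        if lyr_type != 'sequential':
            continue
        for row in lyr_cfg[lyr_key]:
            func = list(row.keys())[0]
            args, kwargs = row[func]['args'], row[func]['kwargs']
            if func == 'conv2d':
                channels = args[1]                # out_channels
                size = out_size(size, args[2], kwargs['padding'], kwargs['stride'])
            elif func == 'maxpool2d':
                size = out_size(size, args[0], kwargs['padding'], kwargs['stride'])
            # activations such as relu leave the shape unchanged
    return channels * size * size


n_features = flattened_features(cnn2, channels=1, size=64)   # assumed 64x64 input
print(n_features)
```

Under that assumption the flattened size comes out equal to the 8192 inputs declared for the linear layer; a different image size would require a different first argument there.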

The pytorch module subclass should inherit *nn.Module* and define the following methods:

~~~python
__init__ : # define each layer as an attribute of the module
forward : # forward pass through the network 
~~~
The *init* method constructs each layer as an attribute of the class, and the *forward* method then applies the layers in order. The *view* operation is included as a layer type but is handled separately in the forward pass; alternatively, it could be wrapped in its own pytorch *nn.Module* subclass. Since the layers of this model are applied sequentially, a residual block class *YML2Resblock*, which also inherits pytorch *nn.Module*, is defined to enable residual networks. It is straightforward to extend the idea to more complex deep neural network architectures.

In [0]:
class YML2Model(torch.nn.Module):
    
    def __init__(self, config, model_name):
      
        super(YML2Model, self).__init__()
      
        self.layers = []
        cfg_model = config[model_name]
        for lyr_cfg in cfg_model:
            # get current layer name, type, and arguments
            lyr_key = list(lyr_cfg.keys())[0]
            [lyr_name, lyr_type] = lyr_key.split('-')
            lyr_args = lyr_cfg[lyr_key]
            
            # residual block
            if lyr_type == 'resblock':     # resblock
                lyr_module = YML2Resblock(config['resblocks'][lyr_args])
            
            # sequential layer
            elif lyr_type == 'sequential': # layer in sequential
                modules = []
                for row in lyr_args:
                    r_func = list(row.keys())[0]
                    r_args = row[r_func]                  
                    r_module = PyTorchCall.map_torch_call(r_func)(r_args)
                    modules.append(r_module)
                    if r_func == 'lstm': 
                        self.hidden_dim = r_args['args'][1]  
                        self.num_layers = r_args['kwargs']['num_layers']
                        #self.init_hidden(1, r_func) # need to take care of bidirection
                lyr_module = nn.Sequential(*modules)
                               
            else: # individual nn.module
                lyr_module = PyTorchCall.map_torch_call(lyr_type)(lyr_args) 
               
            # register layer to the class
            setattr(self, lyr_name, lyr_module)
            self.layers.append([lyr_type, lyr_args, getattr(self, lyr_name)])
            
            
    def forward(self, x):
 
        # implementation of Module forward method
        for lyr_type, ly_arg, lyr_module in self.layers:
            # print (lyr_module)
            if lyr_type == 'view1d': 
                n = int(np.prod(x.size()[1:]))
                x = x.view(-1, n)
            elif lyr_type == 'view2d':
                kwargs = ly_arg['kwargs']
                x = x.view(x.size()[0], kwargs['channel'], kwargs['size'], kwargs['size'])
            else:  # nn.module instance
                x = lyr_module(x)
                
        return x     


This is the implementation of the residual block class, which is essentially similar to the *YML2Model* class, except that it allows non-sequential links between pytorch *nn.Module* instances. The details will be covered later in the implementation of the residual network.
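
As a rough illustration of a non-sequential link, the sketch below adds a shortcut branch to the output of a main branch, using plain Python lists in place of tensors so that it runs without pytorch. The function and helper names are hypothetical, and this is a simplified picture of a residual connection rather than the exact control flow of *YML2Resblock*.

```python
def relu(values):
    return [max(0.0, v) for v in values]


def scale(factor):
    # returns a toy "layer" that multiplies every element by factor
    return lambda values: [factor * v for v in values]


def residual_forward(x, main_branch, shortcut=None, activation=None):
    # the shortcut is the identity unless a projection layer is given
    residual = x if shortcut is None else shortcut(x)
    out = x
    for layer in main_branch:                      # main branch runs sequentially
        out = layer(out)
    out = [o + r for o, r in zip(out, residual)]   # element-wise skip connection
    return activation(out) if activation is not None else out


x = [1.0, -2.0, 3.0]
identity_block = residual_forward(x, [scale(0.5), relu], activation=relu)
projected_block = residual_forward(x, [scale(0.5), relu], shortcut=scale(2.0), activation=relu)
print(identity_block, projected_block)
```

The second call corresponds to the case where the shortcut itself passes through a layer, which is typically needed when the main branch changes the number of channels or the spatial size.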

In [0]:
class YML2Resblock(torch.nn.Module):
    
    def __init__(self, cfg_block):
      
        super(YML2Resblock, self).__init__()
        
        self.layers = []
        for cfg_lyr in cfg_block:
            # get current layer name, type, and arguments
            lyr_key = list(cfg_lyr.keys())[0]
            [lyr_name, lyr_type] = lyr_key.split('-')
            lyr_args = cfg_lyr[lyr_key]
            if lyr_type == 'residual':
                row = lyr_args[0]
                r_func = list(row.keys())[0]
                r_args = row[r_func]
                lyr_module = PyTorchCall.map_torch_call(r_func)(r_args)

            # sequential layer
            elif lyr_type == 'sequential':  # layers grouped in sequential
                modules = []
                for row in lyr_args:
                    r_func = list(row.keys())[0]
                    r_args = row[r_func]                  
                    r_module = PyTorchCall.map_torch_call(r_func)(r_args)
                    modules.append(r_module)
                lyr_module = nn.Sequential(*modules)
                              
            else: #  torch module
                lyr_module = PyTorchCall.map_torch_call(lyr_type)(lyr_args) 
               
            # register layer to the class
            setattr(self, lyr_name, lyr_module)
            self.layers.append([lyr_type, getattr(self, lyr_name)])
            
            
    def forward(self, x):
      
        residual = x
        
        for i, (lyr_type, lyr_module) in enumerate(self.layers):
            if lyr_type == 'residual':  
                residual = lyr_module(x)
                break
                
            if i==0: 
                out = lyr_module(x)
            else:  # nn.module instance
                out = lyr_module(out)
                       
        out += residual
        
        if self.layers[-1][0]=='relu':
            out = self.layers[-1][1](out)
        
        return out

## Storm Dataset Class

A lot of the effort in solving any machine learning problem goes into preparing the data. To make the process easier, this class inherits the pytorch *Dataset* class and handles data preparation before the data is fed into training. According to the pytorch documentation, a custom dataset class should inherit *Dataset* and override the following methods:

~~~python
__len__ : # so that len(dataset) returns the size of the dataset.
__getitem__: # to support the indexing such that dataset can be used to get ith sample
~~~
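
A minimal sketch of this protocol follows. To keep it runnable without pytorch, the hypothetical `ToyStormDataset` is a plain Python class rather than a *Dataset* subclass, and its file names and category labels are made up for illustration; the two methods are what matters.

```python
class ToyStormDataset:
    """Hypothetical in-memory dataset; records are (image_name, category) pairs."""

    def __init__(self, records):
        self.records = records

    def __len__(self):
        # lets len(dataset) report the number of samples
        return len(self.records)

    def __getitem__(self, idx):
        # returns the index together with the sample and its target
        sample, target = self.records[idx]
        return idx, sample, target


dataset = ToyStormDataset([('img_000.jpg', 'TS'),
                           ('img_001.jpg', 'H1'),
                           ('img_002.jpg', 'H3')])
print(len(dataset), dataset[1])
```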

The *getitem* method returns the training input and target together with the index of the sample, which can be used later during post-processing, e.g. to identify misclassified images from the confusion matrix. A data split method is also implemented to split the data into training, validation, and test sets. As shown earlier, the classes are not well balanced in sample count; therefore, the split method has a label-aware option that preserves the per-class ratio of samples within the training, validation, and test sets.
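
The effect of a label-aware split can be sketched with the standard library alone. The function name, the toy labels, and the 80/20 ratio below are assumptions for illustration; the notebook's own `random_split` method works on the storm DataFrame instead.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, pct, seed=64):
    """Split indices in two, keeping a fraction `pct` of every class in the first part."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)            # local generator for a reproducible split
    part_1, part_2 = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        isplit = int(len(indices) * pct)
        part_1 += indices[:isplit]
        part_2 += indices[isplit:]
    return part_1, part_2


# imbalanced toy labels: 80 samples of one class and 20 of another
labels = ['TS'] * 80 + ['H3'] * 20
train, valid = stratified_split(labels, pct=0.8)
train_counts = Counter(labels[i] for i in train)
valid_counts = Counter(labels[i] for i in valid)
print(train_counts, valid_counts)
```

Both parts keep the original 4:1 class ratio, whereas a split that ignores the labels could by chance leave the smaller class under-represented in one of the sets.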


In [0]:
from torch.utils.data import Dataset
from torch.utils.data import DataLoader
from torchvision.transforms import transforms
from torch.utils.data.sampler import SubsetRandomSampler

class ImageDataSet(Dataset):
    
    def __init__(self, config, transform=None, cats=None, hotstart=False):
        
        self.model_object = config['model_object']
        self.batch_size = config['batch_size']
        self.num_workers = config['num_workers']
        
        # read input files
        f_storm_msg = config['f_storm_msg']
        self.pd_storm = pd.read_msgpack(f_storm_msg)
        
        # load image archive
        f_image_zip = config['f_image_zip']
        self.img_archv = ZipFile(f_image_zip, 'r')
        
        # define transformation    
        self.transform = transform
        
        # train valid test split
        f_data_yml = config['f_data_yml']
        if hotstart: # read from file
            with open(f_data_yml, 'rb') as fp: 
                data = yaml.load(fp, Loader=yaml.Loader)  # explicit Loader; newer PyYAML versions require it
                self.data_indices = data['indices']
                if 'class' in self.model_object:
                    self.one_hot_key = data['one_hot_key'] 
                    self.one_hot_rev = data['one_hot_rev']  
        else: # create a new split
            if 'class' in self.model_object:
                # define one hot key map
                self.one_hot_key = {}
                self.one_hot_rev = {}
                if cats is None:
                    cats = set(self.pd_storm[b'cat'].tolist())
                    cats = sorted(list(cats))
                for i, cat in enumerate(sorted(cats)):
                    self.one_hot_key[cat] = i
                    self.one_hot_rev[i] = cat
            
            # train validation split
            self.data_indices = self.train_valid_test(config)

            data = {} # save dataset
            data['indices'] = self.data_indices  
            if 'class' in self.model_object:
                data['one_hot_key'] = self.one_hot_key
                data['one_hot_rev'] = self.one_hot_rev            
                      
            with open(f_data_yml, 'w') as fp: 
                yaml.dump(data, fp)
            
        # summary of dataset
        valid_pct = 1. - config['valid_pct']
        test_pct = 1. - (config['test_pct'] or 0.)  # test_pct may be None (no test set)
        n_train = len(self.data_indices['train'])
        n_valid = len(self.data_indices['valid'])
        n_test = len(self.data_indices['test'] or [])
        batch = self.batch_size
        
        divider = '-' * 36
        header  = '{:<10s}{:>10s}{:>10s}{:>10s}'
        record1 = '{:<10s}{:>10.2f}{:>10.2f}{:>10.2f}'
        record2 = '{:<10s}{:>10d}{:>10d}{:>10d}'

        print (divider)
        print ('summary of dataset')
        print (divider)
        print (header.format(' ', 'train', 'valid', 'test')) 
        print (record1.format('percent', test_pct*valid_pct, test_pct*(1-valid_pct), 1-test_pct))
        print (record2.format('size', n_train, n_valid, n_test))
        print (record2.format('batch', int(n_train/batch), int(n_valid/batch), int(n_test/batch)))

        return            
        
        
    def __len__(self):
        return len(self.pd_storm)  # number of rows (samples) in the storm table
            

    def __getitem__(self, idx):
 
        row = self.pd_storm.iloc[idx]
        cat = row[b'cat']

        if 'class' in self.model_object:
            target = self.one_hot_key[cat]   # no need to create one hot for pytorch
        else:
            target = [row[b'wind'], row[b'pres']]
                
        image = row[b'image'].decode('utf-8')
        temp = image.split('.')[0].split('_')
        f_image = temp[0] + '_' + temp[1] + '.jpg'
        
        sample = Image.open(self.img_archv.open(f_image))
        if self.transform is not None: 
            sample = self.transform(sample)
                
        return idx, sample, target 
		
    
    def random_split(self, indices, pct, label_aware=True, shuffle=True, seed=64):
      
        # creating data indices for two splits:
        indices_1 = []  # first half of the indices
        indices_2 = []  # second half of the indices
        cats = set(self.pd_storm[b'cat'].tolist()) if label_aware else ['all']
        pd_sub = self.pd_storm.iloc[indices]
        for cat in cats:
            sub_indices = pd_sub[pd_sub[b'cat']==cat].index.tolist() if label_aware else list(indices)
            if shuffle:
                np.random.seed(seed)  # reseed so that the split is reproducible
                np.random.shuffle(sub_indices)

            isplit  = int(np.floor(len(sub_indices)*pct))            
            indices_1 = indices_1 + sub_indices[:isplit]
            indices_2 = indices_2 + sub_indices[isplit:]
        
        return indices_1, indices_2

      
    def train_valid_test(self, config):
        
        data_indices = {} 
        # fraction of samples kept in the first part of each split
        valid_pct = 1. - config['valid_pct']
        label_aware = config['label_aware']
        shuffle = config['shuffle']
        seed = config['seed']
        
        indices = self.pd_storm.index.tolist()
        
        if config['test_pct'] is None:  # no test set requested
            test_indices = None
            train_indices, valid_indices = self.random_split(indices, valid_pct, label_aware, shuffle, seed)
        else:
            test_pct = 1. - config['test_pct']
            _indices, test_indices = self.random_split(indices, test_pct, label_aware, shuffle, seed)
            train_indices, valid_indices = self.random_split(_indices, valid_pct, label_aware, shuffle, seed) 
            
        data_indices['train'] = train_indices
        data_indices['valid'] = valid_indices
        data_indices['test'] = test_indices
            
        return data_indices

      
    def load_data(self):
      
        data_split = {} 
        
        batch_size = self.batch_size
        num_workers = self.num_workers
       
        train_indices = self.data_indices['train']
        valid_indices = self.data_indices['valid']
        test_indices = self.data_indices['test']
            
        train_sampler = SubsetRandomSampler(train_indices)
        valid_sampler = SubsetRandomSampler(valid_indices)

        data_split['train'] = DataLoader(self, batch_size=batch_size, sampler=train_sampler, num_workers=num_workers)
        data_split['valid'] = DataLoader(self, batch_size=batch_size, sampler=valid_sampler, num_workers=num_workers)        
        
        if test_indices is None:
            data_split['test'] = None 
        else:
            test_sampler = SubsetRandomSampler(test_indices)  
            data_split['test'] = DataLoader(self, batch_size=batch_size, sampler=test_sampler, num_workers=num_workers)
        
        return data_split
      
     
    def normalization_factor(self, sample_a, sample_b):
      
        (n_a, mean_a, std_a) = sample_a
        (n_b, mean_b, std_b) = sample_b
  
        n_c = n_a + n_b
        mean_c = n_a*mean_a + n_b*mean_b
        mean_c = mean_c/n_c
  
        numerator = (n_a-1)*std_a**2. + (n_b-1)*std_b**2. + \
                    n_a*(mean_a-mean_c)**2. + n_b*(mean_b-mean_c)**2.
  
        denominator = n_c - 1
  
        std_c = np.sqrt(numerator/denominator)
  
        return np.array([n_c, mean_c, std_c])
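
The `normalization_factor` method combines the count, mean, and standard deviation of two samples into those of their union, presumably so that normalization statistics can be accumulated piece by piece. The standard-library check below is a sketch with made-up numbers and a hypothetical `pooled_stats` helper; it assumes the standard deviations are sample estimates (denominator n - 1), as the formula in the method implies.

```python
from math import isclose, sqrt
from statistics import mean, stdev


def pooled_stats(sample_a, sample_b):
    # each sample is a (count, mean, sample standard deviation) tuple
    n_a, mean_a, std_a = sample_a
    n_b, mean_b, std_b = sample_b
    n_c = n_a + n_b
    mean_c = (n_a * mean_a + n_b * mean_b) / n_c
    # within-group plus between-group sums of squares
    numerator = ((n_a - 1) * std_a ** 2 + (n_b - 1) * std_b ** 2
                 + n_a * (mean_a - mean_c) ** 2 + n_b * (mean_b - mean_c) ** 2)
    return n_c, mean_c, sqrt(numerator / (n_c - 1))


a = [0.2, 0.4, 0.9, 0.5]   # made-up pixel values
b = [0.1, 0.8, 0.7]
n_c, mean_c, std_c = pooled_stats((len(a), mean(a), stdev(a)),
                                  (len(b), mean(b), stdev(b)))

# the combined statistics agree with those computed on the concatenated data
print(isclose(mean_c, mean(a + b)), isclose(std_c, stdev(a + b)))
```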

## Image Trainer Class

This is where things start to get interesting. The trainer class links everything together and performs the training to optimize the network against the loss objective. During initialization of the trainer instance, parameters such as the number of epochs, batch size, and loss function are passed in from the configuration file. Some state parameters are also initialized to record the training state for model assessment, such as train_batch_loss and train_batch_accu. The model is passed to the trainer class, and the loss function and optimizer are initialized based on the configuration file. Finally, we simply loop over the data iterator, feed the inputs to the network, and optimize. 
~~~python
for i_epoch in range(self.max_epochs):
    for i_batch, (_, images, labels) in enumerate(data['train']): 
        self.optimizer.zero_grad()  # set the gradient to zero 
        predicts = self.model(images)  # make prediction
        loss = self.criterion(predicts, labels)  # calculate loss 
        loss.backward() # backpropagation to get the weight update
        self.optimizer.step() # update weight using the optimizer
~~~
During the training, we follow an approach similar to [this tutorial](https://github.com/GokuMohandas/practicalAI/blob/master/notebooks/11_Convolutional_Neural_Networks.ipynb) by Goku Mohandas to calculate the running epoch loss and accuracy. 
~~~python
batch_accu = self.accuracy(predicts, labels)
epoch_accu += (batch_accu - epoch_accu) / (i_batch + 1)
~~~              
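
This incremental update is the standard running-mean formula: after each batch, the stored value equals the arithmetic mean of all batch values seen so far, so no list of past values has to be kept. A short check with made-up batch accuracies:

```python
batch_accus = [50.0, 62.5, 75.0, 68.75]   # made-up per-batch accuracies (%)

epoch_accu = 0.0
for i_batch, batch_accu in enumerate(batch_accus):
    # move the running mean toward the new value by 1 / (number of batches so far)
    epoch_accu += (batch_accu - epoch_accu) / (i_batch + 1)

plain_mean = sum(batch_accus) / len(batch_accus)
print(epoch_accu, plain_mean)
```

Note that this is an unweighted mean over batches, so a smaller final batch counts as much as a full one.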
Depending on the number of parameters, training may take a long time to run. It is cost-efficient to stop early if the proposed architecture does not work well for the problem. To monitor the progress and performance of the training while it is running, two methods are implemented: one shows a progress bar for each epoch, and the other provides a dynamic visualization of the batch and running epoch loss and accuracy. Both methods embed HTML in the notebook for dynamic refresh.

~~~python
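from IPython.display import HTML

def html_loss_plot(self, image):
    # image is a URI (e.g. base64-encoded PNG) that the <img> tag can display
    return HTML("<img src='{0}'/>".format(image))
  
def html_progress(self, var, value, max=100):
    # HTML5 progress bar; attributes are separated by spaces, not commas
    return HTML("""{var}: <progress value='{value}' max='{max}' style='width: 80%'>{value}
                            </progress>""".format(var=var, value=value, max=max))
~~~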
During training, the state parameters are saved to a file for post-processing and/or hot-starting the training. The model state dict is saved as well using the *torch.save* method.
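
The training state also carries the fields needed for the early stopping mentioned earlier. The sketch below shows one way those fields could be used. It is a hypothetical patience-based rule, not the notebook's own `update_save_state` method (which is not shown in this section); the `patience` parameter and the function name are assumptions, while the state keys reuse names from the trainer's state dictionary.

```python
def update_early_stopping(state, patience=3):
    """Flag early stopping when validation accuracy has not improved for `patience` epochs."""
    latest = state['valid_epoch_accu'][-1]
    if latest > state['best_accu']:
        state['best_accu'] = latest
        state['best_epoch'] = state['epoch_index']
        state['stop_step'] = 0            # an improvement resets the counter
    else:
        state['stop_step'] += 1           # one more epoch without improvement
    state['stop_early'] = state['stop_step'] >= patience
    return state


state = {'stop_early': False, 'stop_step': 0, 'epoch_index': 0,
         'best_epoch': -1, 'best_accu': -1, 'valid_epoch_accu': []}

# made-up validation accuracies, one per epoch
for epoch, accu in enumerate([61.0, 70.5, 69.8, 70.1, 70.2, 72.0]):
    state['epoch_index'] = epoch
    state['valid_epoch_accu'].append(accu)
    update_early_stopping(state)
    if state['stop_early']:
        break

print(state['best_epoch'], state['best_accu'], state['epoch_index'])
```

With a patience of three, the loop stops after three consecutive epochs without a new best accuracy and never reaches the final value in the list; a larger patience trades longer training for a lower chance of stopping too soon.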



In [0]:
import io
import base64
import matplotlib.pyplot as plt
from IPython.display import HTML
from IPython.display import display

class ImageTrainer(object):
    def __init__(self, params, model, hotstart=False):
        # CUDA for PyTorch
        self.use_cuda = torch.cuda.is_available()
        self.device = torch.device('cuda:0' if self.use_cuda else 'cpu')
        if self.use_cuda:
            divider = '-' * 36
            print(divider)
            print('summary of GPU')
            print(divider)
            print(torch.cuda.get_device_name(0))
            print('Memory Usage:')
            print('Allocated:', round(torch.cuda.memory_allocated(0)/1024**3,1), 'GB')
            print('Cached:   ', round(torch.cuda.memory_cached(0)/1024**3,1), 'GB')
        else:
            print('training with cpu')   
 
        # model objective
        self.model_object = params['model_object']

        # hyper params
        self.max_epochs = params['max_epochs']
        self.batch_size = params['batch_size']
        self.r_learning = params['r_learning']
        self.loss_func  = params['loss_func']
        self.optim_func = params['optimizer']
 
        # path for output
        self.f_state    = params['f_state_yml']
        self.f_model    = params['f_model_pth']
        self.f_test     = params['f_test_yml']
        
        # training state 
        self.state = {'stop_early':   False,
                      'stop_criteria': 99.9,
                      'stop_step':    0,
                      'epoch_index':  0,
                      'best_epoch':   -1,
                      'best_accu' :   -1,
                      'test_loss':    -1,
                      'test_accu':    -1,
                      'train_epoch_loss': [],
                      'train_epoch_accu': [],
                      'train_batch_loss': [],
                      'train_batch_accu': [],
                      'valid_epoch_loss': [],
                      'valid_epoch_accu': [],
                      'valid_batch_loss': [],
                      'valid_batch_accu': []}

        # model
        self.model = model.to(self.device)
        # loss
        self.criterion = PyTorchCall.map_torch_call(self.loss_func)()
        # optimizer
        self.optimizer = PyTorchCall.map_torch_call(self.optim_func)()
        self.optimizer = self.optimizer(model.parameters(), lr=self.r_learning)
        
        if hotstart: # hotstart 
            model_state = torch.load(params['f_model_pth'])
            self.model.load_state_dict(model_state['model_state_dict'])
            for key, value in self.state.items():
                self.state[key] = model_state[key]
    

    def train_loop(self, data):
        
        divider = '-' * 36
        header  = '{:<10s}{:>10s}{:>10s}'
        record  = '{:<10s}{:>10.3f}{:>10.3f}'
        print (divider)
        print ('training')
        print (divider)
        
        loss_plot = display(self.html_loss_plot('PLOT'), display_id=True)
        
        # loop over epochs
        for i_epoch in range(self.max_epochs):
            
            self.state['epoch_index'] = i_epoch
            
            # training
            epoch_loss = 0.
            epoch_accu = 0.
            self.model.train()
            bar_train = display(self.html_progress('Train', 0, 100), display_id=True)
            for i_batch, (_, inputs, targets) in enumerate(data['train']):

                # transfer to GPU
                inputs = inputs.to(self.device)
                targets = targets.to(self.device)
                
                # model computations
                self.optimizer.zero_grad()
                predicts = self.model(inputs)
 
                loss = self.criterion(predicts, targets)
                batch_loss = loss.item()
                epoch_loss += (batch_loss - epoch_loss) / (i_batch + 1)
                
                loss.backward()
                self.optimizer.step()
                
                batch_accu = self.accuracy(predicts, targets)
                epoch_accu += (batch_accu - epoch_accu) / (i_batch + 1)
                
                self.state['train_batch_loss'].append(batch_loss)
                self.state['train_batch_accu'].append(batch_accu)

                pct_done = (i_batch+1)/len(data['train'])*100
                bar_train.update(self.html_progress('Train', pct_done, 100))

            self.state['train_epoch_loss'].append(epoch_loss)
            self.state['train_epoch_accu'].append(epoch_accu)
  
            # validation
            epoch_loss = 0.
            epoch_accu = 0.
            self.model.eval()
            bar_valid = display(self.html_progress('Valid', 0, 100), display_id=True)
            with torch.no_grad():  # no gradients or weight updates during validation
                for i_batch, (_, inputs, targets) in enumerate(data['valid']):
                    # transfer to GPU
                    inputs = inputs.to(self.device)
                    targets = targets.to(self.device)

                    # model computations (forward pass only)
                    predicts = self.model(inputs)

                    loss = self.criterion(predicts, targets)
                    batch_loss = loss.item()
                    epoch_loss += (batch_loss - epoch_loss) / (i_batch + 1)

                    batch_accu = self.accuracy(predicts, targets)
                    epoch_accu += (batch_accu - epoch_accu) / (i_batch + 1)

                    self.state['valid_batch_loss'].append(batch_loss)
                    self.state['valid_batch_accu'].append(batch_accu)

                    pct_done = (i_batch+1)/len(data['valid'])*100
                    bar_valid.update(self.html_progress('Valid', pct_done, 100))
                
            self.state['valid_epoch_loss'].append(epoch_loss)
            self.state['valid_epoch_accu'].append(epoch_accu)
            
            # epoch summary
            if i_epoch%1==0:
                print (divider)
                print ('summary of epoch:', i_epoch)
                print (divider)
                print (header.format(' ', 'loss', 'accuracy')) 
                print (record.format('train', self.state['train_epoch_loss'][-1], self.state['train_epoch_accu'][-1]))
                print (record.format('valid', self.state['valid_epoch_loss'][-1], self.state['valid_epoch_accu'][-1]))
                uri = self.update_loss_plot()
                loss_plot.update(self.html_loss_plot(uri))
                print (' ')
            
            self.update_save_state()
            if self.state['stop_early']: break
             
        
    def test_loop(self, data, apply_softmax=True):
        total = 0
        correct = 0
        self.model.eval()
        with torch.no_grad():
            for i_batch, (idxs, images, labels) in enumerate(data['test']):
                images = images.to(self.device)
                labels = labels.to(self.device)
                predicts = self.model(images)
                if apply_softmax:
                    #print (np.array(predicts.data)[0])
                    predicts  = F.softmax(predicts, 1) 
                    #print (np.array(predicts.data)[0])
                    #break

                _, predicted = torch.max(predicts.data, 1)
            
                if i_batch==0: 
                    test_idxs = idxs.data
                    test_labels = labels.data
                    test_predicts = predicted.data
                else:
                    test_idxs = torch.cat((test_idxs, idxs.data))
                    test_labels = torch.cat((test_labels, labels.data))
                    test_predicts = torch.cat((test_predicts, predicted.data))
            
                total += labels.size(0)
                correct += (predicted==labels).sum().item()  # keep a Python int, not a tensor
            
        test_idxs = test_idxs.cpu().detach().numpy()
        test_labels = test_labels.cpu().detach().numpy()
        test_predicts = test_predicts.cpu().detach().numpy()
       
        # test summary
        divider = '-' * 36
        print(divider)
        print('summary of test')
        print(divider)
        print('{:<10s}{:>10s}{:>10s}'.format('', 'total', 'accuracy'))
        print('{:<10s}{:>10d}{:>10.3f}'.format('test', total, 100. * correct / total))    
        
        # save test
        test_results = {}
        test_results['idxs'] = test_idxs
        test_results['labels'] = test_labels
        test_results['predicts'] = test_predicts
        test_results['accuracy'] = float(100 * correct / total)
        
        with open(self.f_test, 'w') as fp:
            yaml.dump(test_results, fp)
            
        return test_results
      
        
    def accuracy(self, predicts, targets):
      
        if self.model_object=='classification': 
            _, predicts_indices = predicts.max(dim=1)
            n_correct = torch.eq(predicts_indices, targets).sum().item()
            return n_correct / len(predicts_indices) * 100
        
        if self.model_object=='regression':
            # NOTE: this branch currently reuses the argmax-based classification metric
            _, predicts_indices = predicts.max(dim=1)
            n_correct = torch.eq(predicts_indices, targets).sum().item()
            return n_correct / len(predicts_indices) * 100
          
   
    def html_loss_plot(self, image):
        
        h = HTML("<img src='{0}'/>".format(image))
    
        return h

    
    def html_progress(self, var, value, max=100):
      
        h = HTML("""{var}: <progress value='{value}' max='{max}' style='width: 80%'>{value}
                           </progress>""".format(var=var, value=value, max=max))
    
        return h
       
      
    def update_loss_plot(self):
        
        train_batch_loss = self.state['train_batch_loss']
        train_batch_accu = self.state['train_batch_accu']
        train_epoch_loss = self.state['train_epoch_loss']
        train_epoch_accu = self.state['train_epoch_accu']

        ntb = len(train_batch_loss)
        nte = len(train_epoch_loss)
        nnn = ntb/nte 
        xtb = np.arange(ntb)/nnn 
        xte = np.arange(nte) + 1

        valid_batch_loss = self.state['valid_batch_loss']
        valid_batch_accu = self.state['valid_batch_accu']
        valid_epoch_loss = self.state['valid_epoch_loss']
        valid_epoch_accu = self.state['valid_epoch_accu']

        nvb = len(valid_batch_loss)
        nve = len(valid_epoch_loss)
        nnn = nvb/nve 
        xvb = np.arange(nvb)/nnn 
        xve = np.arange(nve) + 1

        fig, axes = plt.subplots(2,2, figsize=(8,6))
        axes[0,0].plot(xtb, train_batch_loss)
        axes[0,0].plot(xte, train_epoch_loss)
        axes[0,1].plot(xtb, train_batch_accu)
        axes[0,1].plot(xte, train_epoch_accu)
        axes[1,0].plot(xvb, valid_batch_loss)
        axes[1,0].plot(xve, valid_epoch_loss)
        axes[1,1].plot(xvb, valid_batch_accu)
        axes[1,1].plot(xve, valid_epoch_accu)

        bio = io.BytesIO()
        fig.savefig(bio, format='png')
        bio.seek(0)
        uri = 'data:image/png;base64,' + base64.encodebytes(bio.getvalue()).decode()

        plt.close(fig)

        return uri
      
      
    def update_save_state(self):
        
        # save state
        with open(self.f_state, 'w') as fp:
            yaml.dump(self.state, fp)
            
        # save model
        if self.state['epoch_index']==0:
            self.state['best_accu'] = 0 
            self.state['best_epoch'] = self.state['epoch_index']

        cur_accu = self.state['valid_epoch_accu'][-1]
        if self.state['best_accu']<cur_accu:
            self.state['best_accu'] = cur_accu
            self.state['best_epoch'] = self.state['epoch_index']
            if cur_accu>self.state['stop_criteria']: self.state['stop_early'] = True
            # save the model
            state_cp = deepcopy(self.state)
            state_cp['model_state_dict'] = self.model.state_dict()
            state_cp['optim_state_dict'] = self.optimizer.state_dict()
            torch.save(state_cp, self.f_model)
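
`update_loss_plot` returns the figure as a base64 data URI so that `html_loss_plot` can embed it directly in an `<img>` tag. The sketch below isolates that encoding step using only the standard library. The helper names `to_data_uri` and `from_data_uri` are hypothetical and not part of deepTC, and the payload is stand-in bytes rather than a real PNG. It uses `base64.b64encode`, which yields a single-line string, whereas the `base64.encodebytes` call in the notebook inserts a newline after every 76 characters.

```python
import base64

def to_data_uri(payload, mime='image/png'):
    """Wrap raw bytes in a single-line base64 data URI."""
    encoded = base64.b64encode(payload).decode('ascii')
    return 'data:{0};base64,{1}'.format(mime, encoded)

def from_data_uri(uri):
    """Recover the raw bytes from a base64 data URI."""
    _, encoded = uri.split(',', 1)
    return base64.b64decode(encoded)

# stand-in bytes; in the notebook these would come from fig.savefig(bio, format='png')
payload = bytes(range(256)) * 2
uri = to_data_uri(payload)

# encodebytes wraps its output, so the resulting string contains newlines
wrapped = base64.encodebytes(payload).decode('ascii')
```

Either form decodes back to the same bytes; the single-line variant is simply a little easier to paste into an HTML attribute.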
            

## Storm Inference Class

Once the model is trained and optimized, this class rebuilds the model from the configuration file, loads the saved state dict, and predicts on new samples.

In [0]:
class ImageInference(object):
  
    def __init__(self, config, model_name):
        # CUDA for PyTorch
        self.use_cuda = torch.cuda.is_available()
        self.device = torch.device('cuda:0' if self.use_cuda else 'cpu')
 
        # Model
        model = YML2Model(config, model_name)
        model_state = torch.load(config['params']['f_model_pth'], map_location=self.device)
        model.load_state_dict(model_state['model_state_dict'])
        self.model = model.to(self.device)
        
    def inference(self, imgs, apply_softmax=True):
     
        #data = torch.zeros([64, 1, 256, 256])
        #for i, im in enumerate(imgs): # assume less than batch size for now
        #    data[i] = im
                      
        self.model.eval() 
        with torch.no_grad():
            imgs = imgs.to(self.device)  
            predicts = self.model(imgs)
            if apply_softmax:
                predicts = F.softmax(predicts, 1)
        
        return predicts.cpu().detach().numpy() 
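
Both `test_loop` and `inference` take an `apply_softmax` flag. Softmax is monotonic, so the flag changes the scale of the returned scores (raw logits versus probabilities that sum to one) but not which class ranks highest. The pure-Python sketch below illustrates this. The `softmax` and `argmax` helpers and the logits are made up for the example and are not part of deepTC.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)

# made-up raw scores for four intensity classes
logits = [1.2, -0.3, 3.1, 0.4]
probs = softmax(logits)

# the predicted class is the same with or without softmax
same_class = argmax(logits) == argmax(probs)
```

In practice this means `apply_softmax=False` is sufficient when only the predicted class is needed, while `apply_softmax=True` is useful when the per-class probabilities themselves are of interest.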
      

## In-place Test of CNN/ResNet

In [0]:
# In place test code block to be commented out 

'''
torch.cuda.empty_cache()

from google.colab import drive
drive.mount('/content/drive', force_remount=True)

work_dir = r'/content/drive/My Drive/Colab Notebooks/deepTC'
p_data  = work_dir + os.sep + 'data/AL'
p_image = work_dir + os.sep + 'image/AL'
p_model = work_dir + os.sep + 'model/tc_cnn5'

# load configuration file
f_config = p_model + os.sep + 'config_cnn5.yaml'
with open(f_config, 'r') as fp: config = yaml.load(fp, Loader=yaml.FullLoader)
  
# construct the model
storm_cnn = YML2Model(config, 'cnn5')

# update path for config
config_params = config['params']
config_params['f_image_zip'] = p_image + os.sep + config_params['f_image_zip']
config_params['f_storm_msg'] = p_data  + os.sep + config_params['f_storm_msg']
config_params['f_data_yml']  = p_model + os.sep + config_params['f_data_yml']
config_params['f_state_yml'] = p_model + os.sep + config_params['f_state_yml']
config_params['f_model_pth'] = p_model + os.sep + config_params['f_model_pth']
config_params['f_test_yml']  = p_model + os.sep + config_params['f_test_yml']

image_transforms = transforms.Compose([
    transforms.Grayscale(1),
    transforms.ToTensor(),
    transforms.Normalize((0.456,), (0.222,))])

# dataset
storm_data = ImageDataSet(config_params, image_transforms, hotstart=False)
data_split = storm_data.load_data()

# model
storm_train = ImageTrainer(config_params, storm_cnn, hotstart=False)

# train & valid
storm_train.train_loop(data_split)

# test
test_results = storm_train.test_loop(data_split)

'''


## In-place Test of GAN

In [0]:

# In place test code block to be commented out 

'''
%matplotlib inline
import io
import base64

from torch.autograd import Variable
from torchvision.utils import save_image

import matplotlib.pyplot as plt
from IPython.display import HTML
from IPython.display import display

class ImageTrainerGAN(object):
  
    def __init__(self, params, g_model, d_model, hotstart=False):
        # CUDA for PyTorch
        self.use_cuda = torch.cuda.is_available()
        self.device = torch.device('cuda:0' if self.use_cuda else 'cpu')
        if self.use_cuda:
            divider = '-' * 36
            print(divider)
            print('summary of GPU')
            print(divider)
            print(torch.cuda.get_device_name(0))
            print('Memory Usage:')
            print('Allocated:', round(torch.cuda.memory_allocated(0)/1024**3,1), 'GB')
            print('Reserved: ', round(torch.cuda.memory_reserved(0)/1024**3,1), 'GB')
        else:
            print('training with cpu')  
               
        # model object
        self.model_object = params['model_object']
 
        # send model to device (CPU or GPU)
        self.g_model = g_model.to(self.device)
        self.d_model = d_model.to(self.device)
    
        # parameters
        self.latent_dim = 128
        self.f_state    = params['f_state_yml']
        self.f_model    = params['f_model_pth']
        self.f_test     = params['f_test_yml']
        self.max_epochs = params['max_epochs']
        self.batch_size = params['batch_size']
        self.r_learning = params['r_learning']
        self.loss_func  = params['loss_func']
        self.g_optim_func = params['g_optimizer']
        self.d_optim_func = params['d_optimizer']
        self.state = {'stop_early':   False,
                      'stop_criteria': 99.,
                      'stop_step':    0,
                      'epoch_index':  0,
                      'best_epoch':   -1,
                      'best_accu' :   -1,
                      'test_loss':    -1,
                      'test_accu':    -1,
                      'g_train_epoch_loss': [],
                      'g_train_epoch_accu': [],
                      'g_train_batch_loss': [],
                      'g_train_batch_accu': [],
                      'd_train_epoch_loss': [],
                      'd_train_epoch_accu': [],
                      'd_train_batch_loss': [],
                      'd_train_batch_accu': []}

        self.criterion = PyTorchCall.map_torch_call(self.loss_func)()
        
        self.g_optimizer = PyTorchCall.map_torch_call(self.g_optim_func)()
        self.g_optimizer = self.g_optimizer(g_model.parameters(), lr=0.0001, betas=(0.9, 0.999))
        
        self.d_optimizer = PyTorchCall.map_torch_call(self.d_optim_func)()
        self.d_optimizer = self.d_optimizer(d_model.parameters(), lr=0.0001, betas=(0.5, 0.999))
        
        self.criterion.to(self.device)
        
        self.g_model.apply(self.weights_init)
        self.d_model.apply(self.weights_init)
     
        self.fig = None
        self.ax = None
        self.gplot = None
        self.dplot = None
        
        if hotstart:
            model_state = torch.load(params['f_model_pth'], map_location=self.device)
            self.g_model.load_state_dict(model_state['g_model_state_dict'])
            self.d_model.load_state_dict(model_state['d_model_state_dict'])
            for key, value in self.state.items():
                self.state[key] = model_state[key]
    

    def train_loop(self, data):
        
        divider = '-' * 36
        header  = '{:<16s}{:>10s}{:>10s}'
        record  = '{:<16s}{:>10.3f}{:>10.3f}'
              
        print (divider)
        print ('training')
        print (divider)
        
        g_epoch_loss = 0. 
        g_epoch_accu = 0.
        d_epoch_loss = 0. 
        d_epoch_accu = 0.    
        
        fixed_noise = torch.randn((36, self.latent_dim), device=self.device)
        loss_plot = display(self.html_loss_plot('PLOT'), display_id=True)     
        
        # loop over epochs
        for i_epoch in range(self.max_epochs):
            
            self.state['epoch_index'] = i_epoch
            
            epoch_loss = 0.
            epoch_accu = 0.
            
            bar_train = display(self.html_progress('Train', 0, 100), display_id=True)
              
            for i_batch, (_, real_imgs, _) in enumerate(data['train']):
                
                # transfer to GPU
                real_imgs = real_imgs.to(self.device)

                # set up label for fake and real images
                n_imgs = real_imgs.shape[0]
                fake = torch.zeros((n_imgs, 1), device=self.device)
                real = torch.ones((n_imgs, 1), device=self.device)


                # --------------------
                #  Train Discriminator
                # --------------------
                
                self.d_optimizer.zero_grad()

                predict_real = self.d_model(real_imgs)                
                d_loss_real = self.criterion(predict_real, real)
                d_loss_real.backward()
                
                # Measure discriminator's ability to classify real from generated samples
                #r = np.random.normal(0, 1, (real_imgs.shape[0], self.latent_dim))
                #z = Variable(torch.tensor(r, device=self.device, dtype=torch.float))
                z = torch.randn((real_imgs.shape[0], self.latent_dim), device=self.device)
                fake_imgs = self.g_model(z)
 
                predict_fake = self.d_model(fake_imgs.detach())
                d_loss_fake = self.criterion(predict_fake, fake)
                d_loss_fake.backward()
                
                d_loss = 0.5*(d_loss_real + d_loss_fake)
                
                #predicts = torch.cat((predict_real, predict_fake))
                #targets = torch.cat((real, fake))
                #d_loss = self.criterion(predicts, targets) / 2.
                #d_loss.backward()
                
                self.d_optimizer.step()
                
                d_batch_loss = d_loss.item()
                d_epoch_loss += (d_batch_loss - d_epoch_loss) / (i_batch + 1)
                
                #d_batch_accu = self.accuracy(predicts, targets)
                d_batch_accu = self.accuracy(predict_real, real)
                d_batch_accu += self.accuracy(predict_fake, fake)
                d_batch_accu = 0.5 * d_batch_accu 
                d_epoch_accu += (d_batch_accu - d_epoch_accu) / (i_batch + 1)

                self.state['d_train_batch_loss'].append(d_batch_loss)
                self.state['d_train_batch_accu'].append(d_batch_accu)   
                
                
                # -----------------
                #  Train Generator
                # -----------------

                self.g_optimizer.zero_grad()

                # sample noise as generator input
                #r = np.random.normal(0, 1, (real_imgs.shape[0], self.latent_dim))
                #z = Variable(torch.tensor(r, device=self.device, dtype=torch.float))

                # generate a batch of images
                #fake_imgs = self.g_model(z)
  
                # loss measures generator's ability to fool the discriminator
                predict_fake = self.d_model(fake_imgs)
                g_loss = self.criterion(predict_fake, real)

                g_loss.backward()
                self.g_optimizer.step()

                g_batch_loss = g_loss.item()
                g_epoch_loss += (g_batch_loss - g_epoch_loss) / (i_batch + 1)
                
                g_batch_accu = self.accuracy(predict_fake, real)
                g_epoch_accu += (g_batch_accu - g_epoch_accu) / (i_batch + 1)

                self.state['g_train_batch_loss'].append(g_batch_loss)
                self.state['g_train_batch_accu'].append(g_batch_accu)              
                
                pct_done = (i_batch+1)/len(data['train'])*100
                bar_train.update(self.html_progress('Train', pct_done, 100))
                #print ('{0:>3d}'.format(i_batch), end = ' ') 
                #if (i_batch+1)%30==0: print (' ')
                #    #print (divider)
                #    #print ('summary of i_batch:', i_batch+1)
                #    #print (divider)
                #    #print (header.format(' ', 'loss', 'accuracy')) 
                #    #print (record.format('Generator', self.state['g_train_batch_loss'][-1], self.state['g_train_batch_accu'][-1]))
                #    #print (record.format('Discriminator', self.state['d_train_batch_loss'][-1], self.state['d_train_batch_accu'][-1]))
                               

            self.state['g_train_epoch_loss'].append(g_epoch_loss)
            self.state['g_train_epoch_accu'].append(g_epoch_accu) 
            self.state['d_train_epoch_loss'].append(d_epoch_loss)
            self.state['d_train_epoch_accu'].append(d_epoch_accu)
 
            print (' ')
            # epoch summary
            if i_epoch%1==0:
                print (divider)
                print ('summary of epoch:', i_epoch)
                print (divider)
                print (header.format(' ', 'loss', 'accuracy'))
                print (record.format('Generator', self.state['g_train_epoch_loss'][-1], self.state['g_train_epoch_accu'][-1]))
                print (record.format('Discriminator', self.state['d_train_epoch_loss'][-1], self.state['d_train_epoch_accu'][-1]))
                print (' ')
                uri = self.update_loss_plot()
                loss_plot.update(self.html_loss_plot(uri))
                
            if i_epoch%1==0:
                # save a grid of samples generated from the fixed noise
                f_image = '/content/drive/My Drive/Colab Notebooks/deepTC/model/dcgan/image/images_{0}.png'.format(i_epoch)
                with torch.no_grad():
                    fixed = self.g_model(fixed_noise)
                save_image(fixed.data, f_image, nrow=6, normalize=True)
            
            self.update_save_state()
            if self.state['stop_early']: break
             
        
    def accuracy(self, predicts, targets):
      
        pred = predicts >= 0.5
        truth = targets >= 0.5
        n_correct = torch.eq(pred, truth).sum().item()
        return n_correct / len(predicts) * 100
      
      
    def html_loss_plot(self, image):
        
        h = HTML("""<img src='{0}'/>""".format(image))
    
        return h

    
    def html_progress(self, var, value, max=100):
        h = HTML("""{var}: <progress value='{value}' max='{max}' style='width: 80%'>{value}
                           </progress>""".format(var=var, value=value, max=max))
    
        return h
    
        
    def update_loss_plot(self):
        
        g_batch_loss = self.state['g_train_batch_loss']
        d_batch_loss = self.state['d_train_batch_loss']      
        x = range(len(g_batch_loss))
        
        # initiate plot
        fig, ax = plt.subplots() 
        ax.plot(x, g_batch_loss)
        ax.plot(x, d_batch_loss)
        
        bio = io.BytesIO()
        fig.savefig(bio, format='png')
        bio.seek(0)
        uri = 'data:image/png;base64,' + base64.encodebytes(bio.getvalue()).decode()

        plt.close(fig)
        
        return uri
        
        
    def update_save_state(self):
        
        # save state
        with open(self.f_state, 'w') as fp:
            yaml.dump(self.state, fp)
            
        # save the model
        state_cp = deepcopy(self.state)
        state_cp['d_model_state_dict'] = self.d_model.state_dict()
        state_cp['d_optim_state_dict'] = self.d_optimizer.state_dict()
        state_cp['g_model_state_dict'] = self.g_model.state_dict()
        state_cp['g_optim_state_dict'] = self.g_optimizer.state_dict()
 
        torch.save(state_cp, self.f_model)
  
  
    def weights_init(self, m):
        classname = m.__class__.__name__
        if classname.find('Conv') != -1:
            nn.init.normal_(m.weight.data, 0.0, 0.02)
        elif classname.find('BatchNorm') != -1:
            nn.init.normal_(m.weight.data, 1.0, 0.02)
            nn.init.constant_(m.bias.data, 0)
            
'''
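
The generator and discriminator epoch statistics in `ImageTrainerGAN.train_loop` are accumulated with an incremental mean, `mean += (value - mean) / (i_batch + 1)`. The sketch below isolates that update in plain Python. The function name `running_mean`, its `start` argument, and the loss values are hypothetical. It also shows why the accumulators need no explicit reset between epochs: at `i_batch == 0` the update replaces whatever value was left over from the previous epoch.

```python
def running_mean(values, start=0.0):
    """Incrementally average `values`, beginning from an arbitrary `start`."""
    mean = start
    for i, value in enumerate(values):
        # at i == 0 this reduces to mean = value, discarding `start`
        mean += (value - mean) / (i + 1)
    return mean

# made-up batch losses for one epoch
batch_losses = [0.9, 0.7, 0.65, 0.5]

plain_mean = sum(batch_losses) / len(batch_losses)
fresh = running_mean(batch_losses)                # accumulator starts at zero
carried = running_mean(batch_losses, start=123.0) # accumulator holds a stale value
```

Both `fresh` and `carried` agree with the ordinary arithmetic mean up to floating-point rounding.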

In [0]:
# In place test code block to be commented out 

'''
torch.cuda.empty_cache()

from google.colab import drive
drive.mount('/content/drive', force_remount=True)
torch.cuda.empty_cache()

work_dir = r'/content/drive/My Drive/Colab Notebooks/deepTC'
p_data  = work_dir + os.sep + 'data/AL'
p_image = work_dir + os.sep + 'image/AL'
p_model = work_dir + os.sep + 'model/dcgan'

# load configuration file
f_config = p_model + os.sep + 'config_dcgan.yaml'
with open(f_config, 'r') as fp: config = yaml.load(fp, Loader=yaml.FullLoader)
  

# update path for config
config_params = config['params']
config_params['f_image_zip'] = p_image + os.sep + config_params['f_image_zip']
config_params['f_storm_msg'] = p_data  + os.sep + config_params['f_storm_msg']
config_params['f_data_yml']  = p_model + os.sep + config_params['f_data_yml']
config_params['f_state_yml'] = p_model + os.sep + config_params['f_state_yml']
config_params['f_model_pth'] = p_model + os.sep + config_params['f_model_pth']
config_params['f_test_yml']  = p_model + os.sep + config_params['f_test_yml']

# construct the model
generator = YML2Model(config, 'generator')
discriminator = YML2Model(config, 'discriminator')

image_transforms = transforms.Compose([
    transforms.Grayscale(1),
    transforms.Resize(64),
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))])

# dataset
# filter HU1 data only 
f_img_msg = p_data + os.sep + 'al_ir_track_filtered.msg'
f_hu1_msg = p_data + os.sep + 'al_ir_track_hu1.msg'
# NOTE: read_msgpack/to_msgpack were removed in pandas 1.0, so this cell needs pandas<1.0
pd_storm = pd.read_msgpack(f_img_msg) 
pd_storm = pd_storm[pd_storm[b'cat']==b'HU1']
pd_storm = pd_storm.reset_index(drop=True)
pd_storm.to_msgpack(f_hu1_msg)
#print (pd_storm.head())

storm_data = ImageDataSet(config_params, image_transforms, hotstart=False)
data_split = storm_data.load_data()

# model
storm_train = ImageTrainerGAN(config_params, generator, discriminator, hotstart=False)

# train & valid
storm_train.train_loop(data_split)

'''