<a href="https://colab.research.google.com/github/AdilZouitine/outfit/blob/master/Outfit_MNIST_tutorial.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [4]:
import sys
import datetime

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
from torch.autograd import Variable

if torch.cuda.is_available():
    device = torch.device('cuda')
else:
    device = torch.device('cpu')

from torchsummary import summary

print('Using PyTorch version:', torch.__version__, ' Device:', device)

Using PyTorch version: 1.1.0  Device: cuda


In [5]:
! git clone https://github.com/AdilZouitine/outfit
% cd outfit
! pip install -r requirements.txt
! pip install -e .
% cd ..

fatal: destination path 'outfit' already exists and is not an empty directory.
/content/outfit
Obtaining file:///content/outfit
Installing collected packages: outfit
  Found existing installation: outfit 0.1
    Can't uninstall 'outfit'. No files were found to uninstall.
  Running setup.py develop for outfit
Successfully installed outfit
/content


# **Google Colab bug: you must restart the runtime before the library can be imported**


**Keyboard shortcut**: `Ctrl + M`, then `.`




`Wardrobe` is a class that keeps your experiments tidy by recording them in a database.

`getlog` is a function decorator that writes everything the decorated function prints to the console into a file.

`Logger` displays a message on the console and writes it to a file at the same time.
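
To make the idea behind these two helpers concrete, here is a minimal standard-library sketch of a "tee" logger and a logging decorator. The names `TeeLogger` and `tee_to_file` are hypothetical and this is not outfit's implementation; only the `terminal` and `logfile` attribute names mirror what the tutorial itself uses on `Logger`.

```python
import contextlib
import functools
import os
import sys
import tempfile


class TeeLogger:
    """Hypothetical stand-in: echo every write to a terminal stream and a file."""

    def __init__(self, filepath, terminal=None):
        # Remember the stream that was active when the logger was created.
        self.terminal = terminal if terminal is not None else sys.stdout
        self.logfile = open(filepath, "a")

    def write(self, message):
        self.terminal.write(message)
        self.logfile.write(message)

    def flush(self):
        self.terminal.flush()
        self.logfile.flush()


def tee_to_file(filepath):
    """Hypothetical decorator: copy everything the function prints into filepath."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            logger = TeeLogger(filepath)
            try:
                # redirect_stdout restores sys.stdout even if func raises.
                with contextlib.redirect_stdout(logger):
                    return func(*args, **kwargs)
            finally:
                logger.logfile.close()
        return wrapper
    return decorator


log_path = os.path.join(tempfile.mkdtemp(), "demo_log.txt")


@tee_to_file(log_path)
def greet(name):
    print("hello", name)
    return len(name)


result = greet("outfit")
with open(log_path) as f:
    logged = f.read()
```

The decorated function still prints to the console and returns its value as usual; the file simply receives a copy of the same text.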




In [0]:
from outfit import Wardrobe, getlog, Logger

**Data import**:

In [0]:
# This is the dictionary where we will store the parameters.
param = {'batch_size': 128}

In [8]:
train_dataset = datasets.MNIST('./data', 
                               train=True, 
                               download=True, 
                               transform=transforms.Compose([
                       transforms.ToTensor(),
                       transforms.Normalize((0.1307,), (0.3081,))
                   ]))

validation_dataset = datasets.MNIST('./data', 
                                    train=False, 
                                    transform=transforms.Compose([
                       transforms.ToTensor(),
                       transforms.Normalize((0.1307,), (0.3081,))
                   ]))

train_loader = torch.utils.data.DataLoader(dataset=train_dataset, 
                                           batch_size=param['batch_size'], 
                                           shuffle=True)

validation_loader = torch.utils.data.DataLoader(dataset=validation_dataset, 
                                                batch_size=param['batch_size'], 
                                                shuffle=False)

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to ./data/MNIST/raw/train-images-idx3-ubyte.gz
Extracting ./data/MNIST/raw/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to ./data/MNIST/raw/train-labels-idx1-ubyte.gz
Extracting ./data/MNIST/raw/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to ./data/MNIST/raw/t10k-images-idx3-ubyte.gz
Extracting ./data/MNIST/raw/t10k-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to ./data/MNIST/raw/t10k-labels-idx1-ubyte.gz
Extracting ./data/MNIST/raw/t10k-labels-idx1-ubyte.gz
Processing...
Done!
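
The `Normalize((0.1307,), (0.3081,))` transform used above standardises each pixel as `(pixel - mean) / std`, applied after `ToTensor` has scaled the image to the range [0, 1]. The two constants are the commonly quoted mean and standard deviation of the MNIST training pixels. A plain-Python sketch of the arithmetic (the helper name `normalize` is illustrative only):

```python
MNIST_MEAN = 0.1307  # value passed to transforms.Normalize in this tutorial
MNIST_STD = 0.3081   # value passed to transforms.Normalize in this tutorial


def normalize(pixel, mean=MNIST_MEAN, std=MNIST_STD):
    """Standardise one pixel value that has already been scaled to [0, 1]."""
    return (pixel - mean) / std


black = normalize(0.0)           # darkest possible pixel
white = normalize(1.0)           # brightest possible pixel
average = normalize(MNIST_MEAN)  # a pixel at the dataset mean maps to zero
```

After normalisation the inputs are roughly centred on zero, which is why black pixels become slightly negative and white pixels become values well above one.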





**Database initialisation**:

You can specify a path to a database file. If the file does not exist, it is created and the database is initialised.

For this tutorial we use `:memory:` to keep the database in RAM.
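
The difference between a file path and `:memory:` can be illustrated with Python's built-in `sqlite3` module, where `:memory:` has the same meaning. That `Wardrobe` sits on top of SQLite is an assumption suggested by this naming convention, not something the tutorial states; the table and helper below are hypothetical.

```python
import os
import sqlite3
import tempfile


def insert_and_count(db_path):
    """Open the database, add one row, and return the total number of rows."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS experiment "
            "(id INTEGER PRIMARY KEY, name TEXT)"
        )
        conn.execute("INSERT INTO experiment (name) VALUES (?)", ("Hello world",))
        conn.commit()
        return conn.execute("SELECT COUNT(*) FROM experiment").fetchone()[0]
    finally:
        conn.close()


# A file-backed database is created on first use and keeps its rows.
file_path = os.path.join(tempfile.mkdtemp(), "wardrobe_demo.db")
file_counts = [insert_and_count(file_path), insert_and_count(file_path)]

# An in-memory database disappears as soon as its connection is closed.
memory_counts = [insert_and_count(":memory:"), insert_and_count(":memory:")]
```

An in-memory database is convenient for a tutorial, but a file path is the option to choose when experiments should survive a runtime restart.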

In [0]:
wardrobe = Wardrobe(db_path=':memory:')

Adding an experiment: you MUST declare the experiment before adding its parameters, outputs, features or scores.

In [0]:
exp = {
    'experiment_name': 'Hello world',
    'comment': 'Using a simple cnn on MNIST',
    'date_experiment': datetime.datetime.now()
}

wardrobe.add_experiment(**exp)

**Model declaration** :

In [0]:
class SimpleNet(nn.Module):

    def __init__(self, dropout_proba, **kwargs):
        super(SimpleNet, self).__init__()
        
        self.conv1 = nn.Conv2d(1, 16, kernel_size=3, padding=1)
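        self.conv2 = nn.Conv2d(16, 32, kernel_size=3, padding=1)
        
        # Note: this layer is applied to the flat (N, 64) output of linear1,
        # where nn.Dropout would be the conventional choice.
        self.dropout = nn.Dropout2d(p=dropout_proba)
    
        # 32 channels * 7 * 7 spatial positions = 1568 flattened features
        self.linear1 = nn.Linear(1568, 64)
        self.linear2 = nn.Linear(64, 10)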

    def forward(self, x):

        x = self.conv1(x)
        x = F.max_pool2d(x, 2)
        x = F.relu(x)
        
        x = self.conv2(x)
        x = F.max_pool2d(x, 2)
        x = F.relu(x)
       
        x = x.view(x.size(0), -1)

        x = self.linear1(x)
        x = F.relu(x)
        x = self.dropout(x)
        
        x = self.linear2(x)
        x = F.log_softmax(x, dim=1)
        return x
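
The `1568` input features of `linear1` follow from the layer shapes: a 3x3 convolution with padding 1 keeps the spatial size, and each 2x2 max pooling halves it. The sketch below reproduces that arithmetic with the standard output-size formulas; the helper names are illustrative only.

```python
def conv2d_out(size, kernel_size, padding=0, stride=1):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel_size) // stride + 1


def max_pool2d_out(size, kernel_size, stride=None):
    """Spatial output size of max pooling; stride defaults to the kernel size."""
    stride = kernel_size if stride is None else stride
    return (size - kernel_size) // stride + 1


size = 28                              # MNIST images are 28 x 28
size = conv2d_out(size, 3, padding=1)  # conv1 keeps the size
size = max_pool2d_out(size, 2)         # first pooling halves it
size = conv2d_out(size, 3, padding=1)  # conv2 keeps the size
size = max_pool2d_out(size, 2)         # second pooling halves it again

out_channels = 32                      # channels produced by conv2
flattened = out_channels * size * size
```

If the input resolution or the number of channels changes, `linear1` has to be resized accordingly.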

In [0]:
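# CrossEntropyLoss applies log_softmax internally, so the model's final log_softmax is redundant (but harmless).
loss = nn.CrossEntropyLoss()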

Update the parameters

In [0]:
param_model = {'dropout_proba': 0.2}
param.update(param_model)

The `getlog` function decorator writes everything the decorated function prints to the terminal into a file.

In [0]:
output = {'training log': 'training_log.txt'}

In [0]:
@getlog(filepath=output['training log'])
def train(model, loss_function, loaders, optimizer, device, max_epoch):

    dataset_sizes = {'train': len(loaders['train'].dataset),
                     'valid': len(loaders['valid'].dataset)}
    
    metrics = {
               'train loss': [],
               'train acc': [],
               'valid loss': [],
               'valid acc': []
              }
    for epoch in range(1, max_epoch + 1):
        print("-"* 40)
        print(" \n Epoch {}/{} \n".format(epoch, max_epoch))
        
        for phase in ['train', 'valid']:
            # Enable dropout only while training; disable it for validation.
            model.train(phase == 'train')

            running_loss = 0.0
            running_corrects = 0
            for batch_idx, (data, target) in enumerate(loaders[phase]):

                data, target = data.to(device), target.to(device)
                
                # Tensors track gradients directly since PyTorch 0.4, so no Variable wrapper is needed.

                optimizer.zero_grad()

                with torch.set_grad_enabled(phase == 'train'):
                    output = model(data)
                    loss = loss_function(output, target)
                    predicted = output.argmax(dim=1)

                if phase == 'train':
                    loss.backward()

                    optimizer.step()


                running_loss += loss.item() * data.size(0)

                running_corrects += (predicted == target).sum().item()
            
            epoch_loss = running_loss / dataset_sizes[phase]
            epoch_accuracy = running_corrects / dataset_sizes[phase]
            print('{} : Loss: {:.4f} and Accuracy : {:.4f}'.format(phase, epoch_loss, epoch_accuracy))
            
            
            metrics['{} loss'.format(phase)].append(epoch_loss)
            metrics['{} acc'.format(phase)].append(epoch_accuracy)
    
        
    return model, metrics
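
The training loop multiplies each batch loss by `data.size(0)` because `nn.CrossEntropyLoss` returns the mean loss of the batch by default, and the last batch is usually smaller than the others (the `DataLoader` keeps it unless `drop_last=True`). Weighting by batch size gives the exact per-sample mean over the epoch. A small sketch with made-up batch losses:

```python
def batch_sizes(dataset_size, batch_size):
    """Sizes of the batches a loader yields when the last batch is kept."""
    full, remainder = divmod(dataset_size, batch_size)
    return [batch_size] * full + ([remainder] if remainder else [])


def epoch_mean(batch_means, sizes):
    """Per-sample mean over the epoch, weighting each batch mean by its size."""
    return sum(m * n for m, n in zip(batch_means, sizes)) / sum(sizes)


# 60,000 MNIST training images in batches of 128: the last batch is partial.
mnist_train_batches = batch_sizes(60000, 128)

# Toy example with hypothetical batch losses.
sizes = batch_sizes(10, 4)
batch_means = [1.0, 1.0, 4.0]
weighted = epoch_mean(batch_means, sizes)
naive = sum(batch_means) / len(batch_means)  # over-weights the small batch
```

With batches of equal size the two averages coincide; they differ only when the final batch is shorter.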

In [0]:
dloaders = {'train': train_loader, 'valid': validation_loader}

In [0]:
model = SimpleNet(**param_model)

model.to(device)

optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)


In [0]:
param.update({'loss': 'Cross Entropy',
              'learning rate': 0.001,
              'Optimizer':'SGD',
              'momentum': 0.9,
              'max epoch': 5
             })

We write the model summary to a file.

In [23]:
output.update({'model summary': 'summary.txt'})

sys.stdout = Logger(filepath=output['model summary'])

summary(model, input_size=(1, 28, 28))

sys.stdout.logfile.close()
sys.stdout = sys.stdout.terminal


----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1           [-1, 16, 28, 28]             160
            Conv2d-2           [-1, 32, 14, 14]           4,640
            Linear-3                   [-1, 64]         100,416
         Dropout2d-4                   [-1, 64]               0
            Linear-5                   [-1, 10]             650
Total params: 105,866
Trainable params: 105,866
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 0.14
Params size (MB): 0.40
Estimated Total Size (MB): 0.55
----------------------------------------------------------------
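
The parameter counts in the summary can be checked by hand: a convolution has one weight per input channel, output channel and kernel position plus one bias per output channel, and a linear layer has one weight per input/output pair plus one bias per output. A short sketch of that arithmetic (helper names are illustrative only):

```python
def conv2d_params(in_channels, out_channels, kernel_size):
    """Weights plus biases of a square-kernel convolution."""
    return in_channels * out_channels * kernel_size * kernel_size + out_channels


def linear_params(in_features, out_features):
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features


counts = {
    'conv1': conv2d_params(1, 16, 3),
    'conv2': conv2d_params(16, 32, 3),
    'linear1': linear_params(1568, 64),
    'linear2': linear_params(64, 10),
}
total = sum(counts.values())
```

Almost all of the parameters sit in `linear1`, which is typical for a small CNN that flattens its feature maps before the classifier.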


In [19]:
trained_model, metrics = train(model, loss, dloaders, optimizer, device, param['max epoch'])

----------------------------------------
 
 Epoch 1/5 

train : Loss: 1.1247 and Accuracy : 0.6660
valid : Loss: 0.3924 and Accuracy : 0.8817
----------------------------------------
 
 Epoch 2/5 

train : Loss: 0.3270 and Accuracy : 0.9010
valid : Loss: 0.2477 and Accuracy : 0.9273
----------------------------------------
 
 Epoch 3/5 

train : Loss: 0.2352 and Accuracy : 0.9284
valid : Loss: 0.1881 and Accuracy : 0.9434
----------------------------------------
 
 Epoch 4/5 

train : Loss: 0.1905 and Accuracy : 0.9432
valid : Loss: 0.1615 and Accuracy : 0.9507
----------------------------------------
 
 Epoch 5/5 

train : Loss: 0.1602 and Accuracy : 0.9524
valid : Loss: 0.1481 and Accuracy : 0.9549


We insert it all into the database.


In [0]:
score = {key: val[-1] for key, val in metrics.items()}  # keep the last epoch's value
wardrobe.add_dict_score(score)

In [0]:
wardrobe.add_dict_output(output)


In [0]:
wardrobe.add_dict_parameter(param)

In [0]:
wardrobe.tidy() # commit in database

Let's now look at the result.
We use the `get_best_scores` method to list the best experiments according to a criterion.
As there is only one experiment in this tutorial, the generator yields a single result.

Once you have recorded several experiments, this gives you a simple way to see which one is best.
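
Conceptually, picking the best experiment means sorting the recorded experiments by one score, in descending order for `mode='max'` and ascending order for `mode='min'`. The plain-Python sketch below illustrates that idea only; `rank_experiments` is a hypothetical helper, not how `get_best_scores` is implemented, and the second experiment uses made-up values.

```python
def rank_experiments(scores_by_experiment, on_score, mode='max'):
    """Return (experiment, score) pairs ordered from best to worst."""
    if mode not in ('max', 'min'):
        raise ValueError("mode must be 'max' or 'min'")
    scored = [(name, scores[on_score])
              for name, scores in scores_by_experiment.items()
              if on_score in scores]
    return sorted(scored, key=lambda pair: pair[1], reverse=(mode == 'max'))


experiments = {
    # Final scores of the run recorded in this tutorial.
    'Hello world': {'valid acc': 0.9549, 'valid loss': 0.1481},
    # Hypothetical second experiment with made-up values.
    'Hypothetical run B': {'valid acc': 0.9412, 'valid loss': 0.1903},
}

by_accuracy = rank_experiments(experiments, 'valid acc', mode='max')
by_loss = rank_experiments(experiments, 'valid loss', mode='min')
```

Accuracy-like scores call for `mode='max'`, while loss-like scores call for `mode='min'`.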



In [33]:
for exp in wardrobe.get_best_scores(mode='max',on_score='valid acc'):
    _ = exp 

════════════════════
│ TOP 1 EXPERIMENT │
════════════════════



Table : Experiment 

╒════╤═════════════════╤═══════════════════╤═════════════════════════════╤════════════════════╕
│    │   id_experiment │ experiment_name   │ comment                     │ date_experiement   │
╞════╪═════════════════╪═══════════════════╪═════════════════════════════╪════════════════════╡
│  0 │               1 │ Hello world       │ Using a simple cnn on MNIST │                    │
╘════╧═════════════════╧═══════════════════╧═════════════════════════════╧════════════════════╛


Table : Parameter 

╒════╤════════════════╤══════════════════╤═══════════════╤══════════════╕
│    │   id_parameter │ parameter_name   │ parameter     │   experiment │
╞════╪════════════════╪══════════════════╪═══════════════╪══════════════╡
│  0 │              1 │ batch_size       │ 128           │            1 │
├────┼────────────────┼──────────────────┼───────────────┼──────────────┤
│  1 │              2 │ dropout_proba    