## Comet.ml: crash course

Welcome to Comet.ml, an ML platform that lets data scientists and teams track, compare, explain, and optimize experiments and models.

https://www.comet.ml/site/

Benefits:
- easy to share experiments
- easy to reproduce other people's experiments
- easy to get rich, interactive plots
- you can link an experiment's `html` page alongside your paper's results, a practice that could become a standard way of sharing ML research

### Let's try it out:

In [1]:
#!pip install comet-ml

In [11]:
# import comet_ml at the top of your file
from comet_ml import Experiment

# Add the following code anywhere in your machine learning file
# (replace YOUR_API_KEY with your own Comet API key; never publish a real key)
experiment = Experiment(api_key="YOUR_API_KEY",
                        project_name="sklearn-dummy-project", workspace="kondratevakate")

COMET INFO: ---------------------------
COMET INFO: Comet.ml Experiment Summary
COMET INFO: ---------------------------
COMET INFO:   Data:
COMET INFO:     display_summary_level : 1
COMET INFO:     url                   : https://www.comet.ml/kondratevakate/pytorch-dummy-project/7796b3978a114d51a799ad948c87421a
COMET INFO:   Metrics [count] (min, max):
COMET INFO:     mean_fit_time [4]           : (0.000997447967529297, 0.0017934322357177734)
COMET INFO:     mean_score_time [4]         : (0.0007973670959472656, 0.0009967327117919923)
COMET INFO:     mean_test_score [4]         : (0.9666666666666666, 0.9800000000000001)
COMET INFO:     param_C [4]                 : (1, 10)
COMET INFO:     param_kernel [4]            : rbf
COMET INFO:     rank_test_score [4]         : (1, 4)
COMET INFO:     split0_test_score [4]       : (0.9666666666666667, 1.0)
COMET INFO:     split1_test_score [4]       : (0.9666666666666667, 1.0)
COMET INFO:     split2_test_score [4]       : (0.9, 0.9666666666666667)
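Hard-coding the API key in a notebook makes it easy to leak when the notebook is shared. A minimal sketch, assuming you keep the key in an environment variable: Comet's Python SDK can also pick up `COMET_API_KEY` from the environment by itself, so the helper below (`get_comet_api_key`, a hypothetical name) only makes the lookup explicit and fails early with a readable message.

```python
import os


def get_comet_api_key(var_name: str = "COMET_API_KEY") -> str:
    """Return the Comet API key stored in an environment variable.

    Hypothetical helper: raises a clear error instead of letting the
    experiment start without credentials.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {var_name} environment variable to your Comet API key"
        )
    return key


# Usage in the notebook (requires comet_ml):
# experiment = Experiment(api_key=get_comet_api_key(),
#                         project_name="sklearn-dummy-project",
#                         workspace="kondratevakate")
```

Keeping the key out of the source means the notebook, and any exported `html` page, can be shared as-is.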


In [12]:
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import f1_score, precision_score, recall_score, confusion_matrix
random_state = 42

cancer = load_breast_cancer()
print("cancer.keys(): {}".format(cancer.keys()))
print("Shape of cancer data: {}\n".format(cancer.data.shape))
print("Sample counts per class:\n{}".format(
      {n: v for n, v in zip(cancer.target_names, np.bincount(cancer.target))}))
print("\nFeature names:\n{}".format(cancer.feature_names))

cancer.keys(): dict_keys(['data', 'target', 'target_names', 'DESCR', 'feature_names', 'filename'])
Shape of cancer data: (569, 30)

Sample counts per class:
{'malignant': 212, 'benign': 357}

Feature names:
['mean radius' 'mean texture' 'mean perimeter' 'mean area'
 'mean smoothness' 'mean compactness' 'mean concavity'
 'mean concave points' 'mean symmetry' 'mean fractal dimension'
 'radius error' 'texture error' 'perimeter error' 'area error'
 'smoothness error' 'compactness error' 'concavity error'
 'concave points error' 'symmetry error' 'fractal dimension error'
 'worst radius' 'worst texture' 'worst perimeter' 'worst area'
 'worst smoothness' 'worst compactness' 'worst concavity'
 'worst concave points' 'worst symmetry' 'worst fractal dimension']
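The class counts printed above are imbalanced (357 benign vs 212 malignant), which is why the split in the next cell passes `stratify=cancer.target`; the test confusion matrix further down indeed holds 53 malignant and 90 benign samples, roughly the same ratio. A quick stdlib sketch of the arithmetic, with a hypothetical helper `class_report`: the class proportions, and the accuracy a trivial classifier would reach by always predicting the majority class.

```python
def class_report(counts: dict) -> tuple:
    """Return (proportion of each class, majority-class baseline accuracy)."""
    total = sum(counts.values())
    proportions = {name: n / total for name, n in counts.items()}
    # Always predicting the most frequent class is correct this often
    baseline = max(counts.values()) / total
    return proportions, baseline


proportions, baseline = class_report({"malignant": 212, "benign": 357})
print({k: round(v, 3) for k, v in proportions.items()})  # → {'malignant': 0.373, 'benign': 0.627}
print(round(baseline, 3))  # → 0.627
```

Any model worth logging should clearly beat this ~63% baseline.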


In [13]:
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data,
    cancer.target,
    stratify=cancer.target,
    random_state=random_state)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

logreg = LogisticRegression()

param_grid = {'C':[0.001,0.01,0.1,1,5,10,20,50,100]}

clf = GridSearchCV(logreg,
                    param_grid=param_grid,
                    cv=10,
                    n_jobs=-1)

clf.fit(X_train_scaled, y_train)

y_pred = clf.predict(X_test_scaled)

print("\nResults\nConfusion matrix \n {}".format(confusion_matrix(y_test, y_pred)))

f1 = f1_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)

print("F1 score is {:6.3f}".format(f1))
print("Precision score is {:6.3f}".format(precision))
print("Recall score is {:6.3f}".format(recall))


Results
Confusion matrix 
 [[52  1]
 [ 1 89]]
F1 score is  0.989
Precision score is  0.989
Recall score is  0.989
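To see where the three scores come from, here is a stdlib sketch that recomputes them from the confusion matrix printed above. scikit-learn's `confusion_matrix` puts true labels on rows and predicted labels on columns, and `f1_score`, `precision_score` and `recall_score` treat label `1` as the positive class by default; in this dataset label 1 is `benign`. The function name `scores_from_confusion` is hypothetical, and it does not guard against empty denominators.

```python
def scores_from_confusion(cm):
    """Binary precision, recall and F1 with class 1 as the positive class.

    cm[i][j] counts samples with true label i predicted as label j,
    the layout used by sklearn.metrics.confusion_matrix.
    """
    tp = cm[1][1]  # true 1, predicted 1
    fp = cm[0][1]  # true 0, predicted 1
    fn = cm[1][0]  # true 1, predicted 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


precision, recall, f1 = scores_from_confusion([[52, 1], [1, 89]])
print(round(precision, 3), round(recall, 3), round(f1, 3))  # → 0.989 0.989 0.989
```

Here precision and recall are both 89 / 90, so their harmonic mean (F1) is the same value, which matches the three identical scores in the output.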


### Grid search:

In [14]:
from sklearn import svm, datasets
from sklearn.model_selection import GridSearchCV

iris = datasets.load_iris()
parameters = {'kernel': ('linear', 'rbf'), 'C': [1, 10]}
svr = svm.SVC()
clf = GridSearchCV(svr, parameters)
clf.fit(iris.data, iris.target)

# Log each hyperparameter candidate and its CV results as one step
for i in range(len(clf.cv_results_['params'])):
    for k, v in clf.cv_results_.items():
        if k == "params":
            experiment.log_parameters(v[i])
        else:
            experiment.log_metric(k, v[i], step=i)
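`GridSearchCV` tries every combination in the grid, so `{'kernel': ('linear', 'rbf'), 'C': [1, 10]}` yields 2 × 2 = 4 candidates, which is why every metric in the Comet summary shows a count of `[4]`. `cv_results_` is column-oriented: a dict mapping each key to a sequence with one entry per candidate, and the logging loop above transposes it into one record per candidate. The stdlib sketch below mimics both steps; `expand_grid` and `candidate_records` are hypothetical helpers, scikit-learn's own candidate order may differ, and the toy dict only stands in for a real `cv_results_` (its scores are placeholders, not results).

```python
from itertools import product


def expand_grid(grid: dict) -> list:
    """Every combination of the grid's values, one dict per candidate."""
    keys = sorted(grid)  # fixed key order so the output is deterministic
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]


def candidate_records(cv_results: dict) -> list:
    """Transpose column-oriented results into one record per candidate."""
    records = []
    for i in range(len(cv_results["params"])):
        metrics = {k: v[i] for k, v in cv_results.items() if k != "params"}
        records.append({"params": cv_results["params"][i], "metrics": metrics})
    return records


candidates = expand_grid({"kernel": ("linear", "rbf"), "C": [1, 10]})
print(len(candidates))  # → 4

# Toy stand-in for clf.cv_results_ (placeholder scores, not real results)
toy_results = {
    "params": candidates,
    "mean_test_score": [0.1, 0.2, 0.3, 0.4],
    "rank_test_score": [4, 3, 2, 1],
}
for i, record in enumerate(candidate_records(toy_results)):
    # Each record corresponds to one pass of the logging loop above
    print(i, record["params"], record["metrics"])
```

The same counting applies to the logistic-regression search earlier: 9 values of `C` with `cv=10` means 90 model fits.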

### Let's check this out: 

https://www.comet.ml/kondratevakate/sklearn-dummy-project/

## PyTorch tutorial:

In [7]:
# Add the following code anywhere in your machine learning file
# (replace YOUR_API_KEY with your own Comet API key; never publish a real key)
experiment = Experiment(api_key="YOUR_API_KEY",
                        project_name="pytorch-dummy-project", workspace="kondratevakate")

COMET INFO: ---------------------------
COMET INFO: Comet.ml Experiment Summary
COMET INFO: ---------------------------
COMET INFO:   Data:
COMET INFO:     display_summary_level : 1
COMET INFO:     url                   : https://www.comet.ml/kondratevakate/sklearn-dummy-project/43a3ce31b60e45b6a969adc9f7911070
COMET INFO:   Metrics [count] (min, max):
COMET INFO:     mean_fit_time [4]     : (0.0006013393402099609, 0.0035867691040039062)
COMET INFO:     mean_score_time [4]   : (0.000199127197265625, 0.0012012481689453124)
COMET INFO:     mean_test_score [4]   : (0.9666666666666666, 0.9800000000000001)
COMET INFO:     param_C [4]           : (1, 10)
COMET INFO:     param_kernel [4]      : rbf
COMET INFO:     rank_test_score [4]   : (1, 4)
COMET INFO:     split0_test_score [4] : (0.9666666666666667, 1.0)
COMET INFO:     split1_test_score [4] : (0.9666666666666667, 1.0)
COMET INFO:     split2_test_score [4] : (0.9, 0.9666666666666667)
COMET INFO:     split3_test_score [4] : (0.93333333333

In [8]:
import torch
import torch.nn as nn
import torchvision.datasets as dsets
import torchvision.transforms as transforms
from torch.autograd import Variable

In [9]:
hyper_params = {
    "sequence_length": 28,
    "input_size": 28,
    "hidden_size": 128,
    "num_layers": 2,
    "num_classes": 10,
    "batch_size": 50,
    "num_epochs": 2,
    "learning_rate": 0.01
}

experiment.log_parameters(hyper_params)

# MNIST Dataset
train_dataset = dsets.MNIST(root='./data/',
                            train=True,
                            transform=transforms.ToTensor(),
                            download=True)

test_dataset = dsets.MNIST(root='./data/',
                           train=False,
                           transform=transforms.ToTensor())

# Data Loader (Input Pipeline)
train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                           batch_size=hyper_params['batch_size'],
                                           shuffle=True)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=hyper_params['batch_size'],
                                          shuffle=False)

# RNN Model (Many-to-One)
class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, num_classes):
        super(RNN, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        # Set initial states
        h0 = Variable(torch.zeros(self.num_layers, x.size(0), self.hidden_size))
        c0 = Variable(torch.zeros(self.num_layers, x.size(0), self.hidden_size))

        # Forward propagate RNN
        out, _ = self.lstm(x, (h0, c0))

        # Decode hidden state of last time step
        out = self.fc(out[:, -1, :])
        return out

rnn = RNN(hyper_params['input_size'], hyper_params['hidden_size'], hyper_params['num_layers'], hyper_params['num_classes'])

# Loss and Optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(rnn.parameters(), lr=hyper_params['learning_rate'])

# Train the Model

with experiment.train():
    step = 0
    for epoch in range(hyper_params['num_epochs']):
        correct = 0
        total = 0
        for i, (images, labels) in enumerate(train_loader):
            images = Variable(images.view(-1, hyper_params['sequence_length'], hyper_params['input_size']))
            labels = Variable(labels)

            # Forward + Backward + Optimize
            optimizer.zero_grad()
            outputs = rnn(images)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

            # Compute train accuracy
            _, predicted = torch.max(outputs.data, 1)
            batch_total = labels.size(0)
            total += batch_total

            # .item() turns the 0-d tensor into a Python int, safe to divide and log
            batch_correct = (predicted == labels).sum().item()
            correct += batch_correct

            # Log batch_accuracy to Comet.ml; step is each batch
            step += 1
            experiment.log_metric("batch_accuracy", batch_correct / batch_total, step=step)

            if (i + 1) % 100 == 0:
                print('Epoch [%d/%d], Step [%d/%d], Loss: %.4f'
                      % (epoch + 1, hyper_params['num_epochs'], i + 1, len(train_dataset) // hyper_params['batch_size'], loss.item()))

        # Log epoch accuracy to Comet.ml under its own name; step is each epoch
        experiment.log_metric("epoch_accuracy", correct / total, step=epoch)


with experiment.test():
    # Test the Model: evaluation mode, no gradient tracking
    rnn.eval()
    correct = 0
    total = 0
    with torch.no_grad():
        for images, labels in test_loader:
            images = Variable(images.view(-1, hyper_params['sequence_length'], hyper_params['input_size']))
            outputs = rnn(images)
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()  # Python int

    experiment.log_metric("accuracy", correct / total)
    print('Test Accuracy of the model on the 10000 test images: %d %%' % (100 * correct / total))

Epoch [1/2], Step [100/600], Loss: 0.5517
Epoch [1/2], Step [200/600], Loss: 0.2987
Epoch [1/2], Step [300/600], Loss: 0.0777
Epoch [1/2], Step [400/600], Loss: 0.0827
Epoch [1/2], Step [500/600], Loss: 0.1044
Epoch [1/2], Step [600/600], Loss: 0.0601
Epoch [2/2], Step [100/600], Loss: 0.1374
Epoch [2/2], Step [200/600], Loss: 0.1368
Epoch [2/2], Step [300/600], Loss: 0.0716
Epoch [2/2], Step [400/600], Loss: 0.0857
Epoch [2/2], Step [500/600], Loss: 0.3013
Epoch [2/2], Step [600/600], Loss: 0.0592
Test Accuracy of the model on the 10000 test images: 97 %
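The step counts follow from the hyperparameters. Each 28 × 28 MNIST image is fed to the LSTM as a sequence of 28 rows with 28 pixels each (`sequence_length` × `input_size`), and the 60,000 training images are cut into batches. With `batch_size = 50` that gives 1,200 batches per epoch, so the `step` counter used for `batch_accuracy` runs from 1 to 2,400 over two epochs. The log above reports 600 steps per epoch, which matches a batch size of 100, so it appears to come from a run with a different batch size. A stdlib sketch of this bookkeeping; `steps_per_epoch` and `global_step` are hypothetical helpers, and the ceiling division assumes the `DataLoader` default `drop_last=False`.

```python
def steps_per_epoch(num_samples: int, batch_size: int) -> int:
    """Number of batches a DataLoader yields with drop_last=False."""
    return -(-num_samples // batch_size)  # ceiling division


def global_step(epoch: int, batch_index: int, per_epoch: int) -> int:
    """1-based step that keeps growing across epochs (epoch and batch are 0-based)."""
    return epoch * per_epoch + batch_index + 1


per_epoch = steps_per_epoch(60000, 50)
print(per_epoch)                                 # → 1200
print(steps_per_epoch(60000, 100))               # → 600
print(global_step(1, per_epoch - 1, per_epoch))  # → 2400
```

Because `epoch_accuracy` is logged with `step=epoch` (0 or 1), it lives on a different x-axis scale than the per-batch metric, which is one reason to keep the two under separate names.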


##### Tutorials adapted from: https://www.comet.ml/docs/python-sdk