<a href="https://colab.research.google.com/github/Latse-M/comet_ml/blob/main/Copy_of_sklearn_comet_starter_notebook.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Introduction to Comet ML  

Comet is a great tool for model versioning and experiment tracking. It records the parameters and conditions of each of your experiments, allowing you to reproduce your results or go back to a previous version of an experiment.  

To create an account, visit https://www.comet.ml/  
Follow the instructions for a single-user account. Once the account is created, you will see a project folder, which is where the records of your experiments can be viewed.

Comet has an abundance of tutorials and scripts; this notebook simply runs through the basics to get you started on the right track. For this illustration, we will be using one of the examples found on the Comet ML GitHub repo.

To begin, install `comet_ml` as illustrated below if you don't already have it. *Always import Experiment at the top of your notebook/script*, before your machine learning libraries, so that Comet can hook into the libraries imported after it.


In [None]:
!pip install comet_ml

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting comet_ml
  Downloading comet_ml-3.31.19-py3-none-any.whl (441 kB)
Collecting semantic-version>=2.8.0
  Downloading semantic_version-2.10.0-py2.py3-none-any.whl (15 kB)
Collecting websocket-client<1.4.0,>=0.55.0
  Downloading websocket_client-1.3.3-py3-none-any.whl (54 kB)
Collecting wurlitzer>=1.0.2
  Downloading wurlitzer-3.0.2-py3-none-any.whl (7.3 kB)
Collecting sentry-sdk>=1.1.0
  Downloading sentry_sdk-1.11.1-py2.py3-none-any.whl (168 kB)
Collecting everett[ini]>=1.0.1
  Downloading everett-3.1.0-py2.py3-none-any.whl (35 kB)
Collecting simplejson
  Downloading simplejson-3.18.0-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (130 kB)

In [None]:
from comet_ml import Experiment

When you click on an experiment, you will see an API key button at the top of the page. Use this key as illustrated below to link your current workspace to Comet. (If a project is empty, Comet autogenerates the code below for you on the project page; just copy and paste it in here.)

In [None]:
# Create the experiment with your API key, project name and workspace
# (replace the values below with your own)
experiment = Experiment(
    api_key="k3nHY42OVmQjQpRKOzQ5u6Z8O",
    project_name="advanced-clasification",
    workspace="motlanthimahlatse-gmail-com")


COMET ERROR: Failed to calculate active processors count. Fall back to default CPU count 1
COMET INFO: Couldn't find a Git repository in '/content' nor in any parent directory. You can override where Comet is looking for a Git Patch by setting the configuration `COMET_GIT_DIRECTORY`
COMET INFO: Experiment is live on comet.com https://www.comet.com/motlanthimahlatse-gmail-com/advanced-clasification/84bd9ab229ad4e9f8d6483dd9dfa102b
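
Hard-coding an API key in a shared notebook makes it easy to leak. As a hedged alternative, the sketch below collects the same three settings from environment variables. `COMET_API_KEY`, `COMET_PROJECT_NAME` and `COMET_WORKSPACE` are variable names that Comet recognizes; the helper `comet_settings_from_env` and the placeholder values are hypothetical and introduced here only for illustration.

```python
import os


def comet_settings_from_env(env=None):
    """Build keyword arguments for Experiment from environment variables.

    Hypothetical helper for illustration; only settings that are present
    in the environment are returned, and a missing API key is an error.
    """
    env = os.environ if env is None else env
    names = {
        "api_key": "COMET_API_KEY",
        "project_name": "COMET_PROJECT_NAME",
        "workspace": "COMET_WORKSPACE",
    }
    settings = {arg: env[var] for arg, var in names.items() if var in env}
    if "api_key" not in settings:
        raise KeyError("COMET_API_KEY is not set")
    return settings


# Example with a stand-in environment (placeholder values, not real credentials)
example_env = {
    "COMET_API_KEY": "your-api-key",
    "COMET_PROJECT_NAME": "advanced-clasification",
    "COMET_WORKSPACE": "your-workspace",
}
settings = comet_settings_from_env(example_env)
print(sorted(settings))
```

With the variables set in your real environment, `Experiment(**comet_settings_from_env())` could stand in for the hard-coded call above.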



Import the rest of your libraries as you usually would. For this demonstration we will be using the breast cancer dataset for classification, so we will also import that from sklearn.

In [None]:
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import f1_score, precision_score, recall_score, confusion_matrix

In [None]:
# Have a look at your dataset
cancer = load_breast_cancer()
print("cancer.keys(): {}".format(cancer.keys()))
print("Shape of cancer data: {}\n".format(cancer.data.shape))
print("Sample counts per class:\n{}".format(
      {n: v for n, v in zip(cancer.target_names, np.bincount(cancer.target))}))
print("\nFeature names:\n{}".format(cancer.feature_names))

cancer.keys(): dict_keys(['data', 'target', 'frame', 'target_names', 'DESCR', 'feature_names', 'filename', 'data_module'])
Shape of cancer data: (569, 30)

Sample counts per class:
{'malignant': 212, 'benign': 357}

Feature names:
['mean radius' 'mean texture' 'mean perimeter' 'mean area'
 'mean smoothness' 'mean compactness' 'mean concavity'
 'mean concave points' 'mean symmetry' 'mean fractal dimension'
 'radius error' 'texture error' 'perimeter error' 'area error'
 'smoothness error' 'compactness error' 'concavity error'
 'concave points error' 'symmetry error' 'fractal dimension error'
 'worst radius' 'worst texture' 'worst perimeter' 'worst area'
 'worst smoothness' 'worst compactness' 'worst concavity'
 'worst concave points' 'worst symmetry' 'worst fractal dimension']


Split your data into train and test sets. Keep in mind that you need to set a random state for your results to be reproducible!

In [None]:
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data,
    cancer.target,
    stratify=cancer.target,
    random_state=7)
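
To see what the fixed `random_state` and `stratify` arguments do, here is a small sketch on toy data. The arrays `X_toy` and `y_toy` and the helper `split` are assumptions made up for this illustration; they are not part of the notebook's dataset.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (an assumption for illustration): 20 samples, 12 of class 0 and 8 of class 1
X_toy = np.arange(40).reshape(20, 2)
y_toy = np.array([0] * 12 + [1] * 8)


def split(seed):
    """Stratified split with a fixed seed; returns X_train, X_test, y_train, y_test."""
    return train_test_split(X_toy, y_toy, stratify=y_toy, random_state=seed)


first = split(7)
second = split(7)

# The same seed gives identical arrays on every call
same_split = all(np.array_equal(a, b) for a, b in zip(first, second))
print("Same seed, same split:", same_split)

# Stratification keeps the 60/40 class ratio in both subsets
train_counts = np.bincount(first[2])
test_counts = np.bincount(first[3])
print("Train class counts:", train_counts)
print("Test class counts:", test_counts)
```

Logging the seed as a parameter, as this notebook does later, is what lets someone else recreate exactly the same split.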

In [None]:
# Scale your data! Fit the scaler on the training set only, then apply it to both sets

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

## GridSearch 

For this example we've used a grid search, but you may use a model with default parameters or your own parameters too. Just remember to add or remove the necessary entries when you log your parameters at the end of the experiment.

The `param_grid` variable contains the values of `C` (the inverse regularization strength of `LogisticRegression`) that we want the grid search to iterate through.



In [None]:
logreg = LogisticRegression()
param_grid = {'C': [0.001, 0.01, 0.1, 1, 5, 10, 20, 50, 100]}

In [None]:
# Training and testing using GridSearch
clf = GridSearchCV(logreg,
                   param_grid=param_grid,
                   cv=10,
                   n_jobs=-1)

clf.fit(X_train_scaled, y_train)
y_pred = clf.predict(X_test_scaled)
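
One design choice worth noting: scaling the full training set before the grid search means each cross-validation fold has already seen statistics from its own validation part. A hedged alternative, which is not what this notebook does, is to wrap the scaler and the model in a scikit-learn `Pipeline` so that scaling is refit inside every fold. The step names `scaler` and `logreg`, the shorter grid, `cv=5` and `max_iter=1000` are choices made for this sketch.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, stratify=cancer.target, random_state=7)

# Scaling happens inside the pipeline, so each CV fold fits its own scaler
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("logreg", LogisticRegression(max_iter=1000)),
])

# Pipeline parameters are addressed as <step name>__<parameter>
pipe_grid = {"logreg__C": [0.01, 0.1, 1, 10, 100]}

search = GridSearchCV(pipe, param_grid=pipe_grid, cv=5)
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
print("Mean CV accuracy of best candidate:", search.best_score_)
print("Test accuracy:", search.score(X_test, y_test))
```

The contents of `search.best_params_` could also be added to the `params` dictionary before logging, so that the chosen `C` is recorded alongside the grid.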

## Results

Now that our model has been trained, we can have a look at the results. Below is a confusion matrix indicating that, at first glance, we have a fairly good model: only 3 of the 143 test samples are misclassified. We then save the F1 score, precision, and recall as individual variables to go into our metrics dictionary for logging.

P.S. have a look at the Comet tutorial page for interesting confusion matrix plots.

In [None]:
print("\nResults\nConfusion matrix \n {}".format(
    confusion_matrix(y_test, y_pred)))


Results
Confusion matrix 
 [[52  1]
 [ 2 88]]


In [None]:
# Saving each metric to add to a dictionary for logging

f1 = f1_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
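
The three scores can also be recomputed by hand from the confusion matrix printed above, which is a useful sanity check. This is a minimal sketch; the `_manual` variable names are introduced here for illustration only.

```python
# Counts read from the confusion matrix printed above.
# scikit-learn puts true labels on rows and predictions on columns,
# and the scores default to class 1 (benign) as the positive class.
tn, fp = 52, 1
fn, tp = 2, 88

precision_manual = tp / (tp + fp)   # share of predicted positives that are correct
recall_manual = tp / (tp + fn)      # share of actual positives that are found
f1_manual = (2 * precision_manual * recall_manual
             / (precision_manual + recall_manual))

print(f"precision = {precision_manual:.4f}")
print(f"recall    = {recall_manual:.4f}")
print(f"f1        = {f1_manual:.4f}")
```

These values match the `precision`, `recall` and `f1` entries that Comet reports in the experiment summary at the end of the notebook.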

In [None]:
# Create dictionaries for the data we want to log

params = {"random_state": 7,
          "model_type": "logreg",
          "scaler": "standard scaler",
          "param_grid": str(param_grid),
          "stratify": True
          }
metrics = {"f1": f1,
           "recall": recall,
           "precision": precision
           }

In [None]:
# Log our parameters and results
experiment.log_parameters(params)
experiment.log_metrics(metrics)

If you're using Comet within a Jupyter notebook, it's important to end your experiment when you've finished, as illustrated below.

In [None]:
experiment.end()

COMET INFO: ----------------------------
COMET INFO: Comet.ml Experiment Summary:
COMET INFO:   Data:
COMET INFO:     url: https://www.comet.ml/jo-moon/general/96b60794fd8747a084b2a1c0cc015a33
COMET INFO:   Metrics [count] (min, max):
COMET INFO:     f1                      : (0.9832402234636872, 0.9832402234636872)
COMET INFO:     precision               : (0.9887640449438202, 0.9887640449438202)
COMET INFO:     recall                  : (0.9777777777777777, 0.9777777777777777)
COMET INFO:     sys.cpu.percent.01 [31] : (0.9, 10.3)
COMET INFO:     sys.cpu.percent.02 [31] : (0.9, 7.5)
COMET INFO:     sys.cpu.percent.avg [31]: (0.95, 8.9)
COMET INFO:     sys.ram.total [31]      : (13653561344.0, 13653561344.0)
COMET INFO:     sys.ram.used [31]       : (592932864.0, 735928320.0)
COMET INFO: ----------------------------
COMET INFO: Uploading stats to Comet before program termination (may take several seconds)


## Display  

Running `experiment.display()` will show your experiment's Comet page inside your notebook, as illustrated below. You can do this immediately after an experiment has been run and logged.

In [None]:
experiment.display()