This tutorial is adapted from the Towards Data Science blog post titled "Getting Started with Comet ML" by Angelica Lo Duca, which can be found [here](https://towardsdatascience.com/getting-started-with-comet-ml-549d44aff0c9) (be aware that it is a "member-only" story, so do not use your last free article on it!).

# Overview

**What is Comet ML?**

Comet ML is an online experimentation platform for testing machine learning projects (similar to Neptune.ai, Guild.ai, etc.). Its main advantage is that it makes building reporting dashboards and monitoring systems very easy.

Moreover, Comet ML integrates easily with most popular ML libraries, such as scikit-learn and Keras. Experiments can be written in Python, JavaScript, Java, or R, or driven through the REST API. Today we will only discuss the Python SDK, but the SDKs for other languages should be similar enough.


**Features of Comet ML**

- Users can easily build and compare the results of different experiments for the same project

- The model can be monitored from the early stages up to debugging

- Makes collaborating with other project devs easy (Note: this feature is **not** included with the free account)

- Easily build reports and panels

- Share projects publicly through their platform

# Working with Comet ML

Step 1: Create a free account, log in, and create a new project.

Step 2: Enter a name and description for the project.

Step 3: You will now arrive at an empty dashboard. Select the 'Add' button in the upper right-hand corner, followed by 'New Experiment'.

Step 4: The website will then generate a snippet of code to place in the project you are working on (shown in the cell below).

Step 5: Install the Comet ML library using `pip3 install comet_ml` or `python3 -m pip install comet_ml`.

Now we are ready to get started.

In [9]:
# import comet_ml at the top of your file
from comet_ml import Experiment

# Create an experiment with your api key
experiment = Experiment(
    api_key="IkwyddQVVCoRe8psqPzNBu9uk",
    project_name="decision-tree3",
    workspace="klucke",
)


COMET INFO: Couldn't find a Git repository in '/home/klucke/git-projects/school/22-fall/python-data-science/comet-ml' nor in any parent directory. You can override where Comet is looking for a Git Patch by setting the configuration `COMET_GIT_DIRECTORY`
COMET INFO: Experiment is live on comet.com https://www.comet.com/klucke/decision-tree3/6bfc7f3eac2242799fadde723d289acb
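
The cell above hardcodes the API key, which is easy to leak when a notebook is shared. A minimal sketch of one alternative is to read the settings from environment variables instead. The helper name `comet_settings_from_env` is hypothetical (it is not part of `comet_ml`), and the sketch assumes the variables `COMET_API_KEY`, `COMET_PROJECT_NAME`, and `COMET_WORKSPACE` have been exported in your shell.

```python
import os


def comet_settings_from_env(env=None):
    """Collect Experiment settings from environment variables.

    Hypothetical helper for illustration; not part of the comet_ml package.
    """
    env = os.environ if env is None else env
    api_key = env.get("COMET_API_KEY")
    if not api_key:
        raise RuntimeError("Set COMET_API_KEY before creating an experiment.")
    return {
        "api_key": api_key,
        # Fall back to the project name used in this tutorial.
        "project_name": env.get("COMET_PROJECT_NAME", "decision-tree3"),
        # None lets Comet fall back to the account's default workspace.
        "workspace": env.get("COMET_WORKSPACE"),
    }


# Usage (requires comet_ml to be installed and the variables to be set):
# from comet_ml import Experiment
# experiment = Experiment(**comet_settings_from_env())
```

With this approach the key stays out of the notebook, so the notebook can be committed or shared without exposing it.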



# Example usage

To better show how to use the library, I will walk through an example project using the [heart attack dataset](https://www.kaggle.com/datasets/rashikrahmanpritom/heart-attack-analysis-prediction-dataset) from Kaggle (sign-up with Kaggle is required). The task for this dataset is to predict whether or not a patient has a high chance of heart attack, given several features such as age, sex, and resting blood pressure. I know the task seems a bit ill-defined, but this is a toy example.

In [10]:
import pandas as pd

# Load data:
df = pd.read_csv('archive/heart.csv')
df.head()

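| | age | sex | cp | trtbps | chol | fbs | restecg | thalachh | exng | oldpeak | slp | caa | thall | output |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 63 | 1 | 3 | 145 | 233 | 1 | 0 | 150 | 0 | 2.3 | 0 | 0 | 1 | 1 |
| 1 | 37 | 1 | 2 | 130 | 250 | 0 | 1 | 187 | 0 | 3.5 | 0 | 0 | 2 | 1 |
| 2 | 41 | 0 | 1 | 130 | 204 | 0 | 0 | 172 | 0 | 1.4 | 2 | 0 | 2 | 1 |
| 3 | 56 | 1 | 1 | 120 | 236 | 0 | 1 | 178 | 0 | 0.8 | 2 | 0 | 2 | 1 |
| 4 | 57 | 0 | 0 | 120 | 354 | 0 | 1 | 163 | 1 | 0.6 | 2 | 0 | 2 | 1 |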


In [11]:
# separate data into features and targets, respectively.
x = df.drop(columns=['output'])
y = df['output']

In [12]:
x.head()

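| | age | sex | cp | trtbps | chol | fbs | restecg | thalachh | exng | oldpeak | slp | caa | thall |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 63 | 1 | 3 | 145 | 233 | 1 | 0 | 150 | 0 | 2.3 | 0 | 0 | 1 |
| 1 | 37 | 1 | 2 | 130 | 250 | 0 | 1 | 187 | 0 | 3.5 | 0 | 0 | 2 |
| 2 | 41 | 0 | 1 | 130 | 204 | 0 | 0 | 172 | 0 | 1.4 | 2 | 0 | 2 |
| 3 | 56 | 1 | 1 | 120 | 236 | 0 | 1 | 178 | 0 | 0.8 | 2 | 0 | 2 |
| 4 | 57 | 0 | 0 | 120 | 354 | 0 | 1 | 163 | 1 | 0.6 | 2 | 0 | 2 |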


In [5]:
y.head()

0    1
1    1
2    1
3    1
4    1
Name: output, dtype: int64

In [13]:
from sklearn.preprocessing import MinMaxScaler

# scale inputs
scaler = MinMaxScaler()
x_scaled = scaler.fit_transform(x)

print('scaled features min:', x_scaled.min(), 'scaled features max:', x_scaled.max())

scaled features min: 0.0 scaled features max: 1.0
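
To make the scaling step concrete, here is a small pure-Python sketch of the min-max formula, `(value - min) / (max - min)`, applied to a single column. It only illustrates the arithmetic and is not scikit-learn's implementation. The function name `min_max_scale` is made up for this example, and mapping a constant column to 0.0 is a choice made here to avoid dividing by zero.

```python
def min_max_scale(column):
    """Rescale a list of numbers to the range [0, 1] (illustrative helper)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column has no spread; map every value to 0.0.
        return [0.0 for _ in column]
    return [(value - lo) / (hi - lo) for value in column]


# The 'age' values from the first five rows shown by df.head().
ages = [63, 37, 41, 56, 57]
scaled_ages = min_max_scale(ages)

# The largest age maps to 1.0 and the smallest to 0.0.
print(min(scaled_ages), max(scaled_ages))
```

`MinMaxScaler` applies the same idea to each column independently, which is why the overall minimum and maximum printed above are 0.0 and 1.0.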


In [14]:
from sklearn.model_selection import train_test_split

# create train/test split
x_train, x_test, y_train, y_test = train_test_split(x_scaled, y, random_state=42)
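
Because no `test_size` is passed, `train_test_split` falls back to its default of holding out 25% of the rows. The sketch below reproduces that arithmetic under two assumptions that are not stated in this tutorial: that the dataset has 303 rows, and that the test count is rounded up. The helper `split_sizes` is hypothetical and only mirrors the size calculation, not the shuffling.

```python
import math


def split_sizes(n_samples, test_size=0.25):
    """Return (n_train, n_test) for a fractional test size (illustrative helper)."""
    n_test = math.ceil(test_size * n_samples)  # assumed rounding: up
    n_train = n_samples - n_test
    return n_train, n_test


# Assumption: the heart attack dataset contains 303 rows.
n_train, n_test = split_sizes(303)

# The training loop later in this tutorial runs over range(min_samples, len(x_train)).
min_samples = 5
n_steps = len(range(min_samples, n_train))

print(n_train, n_test, n_steps)
```

Under these assumptions the loop would run 222 times, which agrees with the `[222]` counts reported in the experiment summary at the end of the tutorial.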

In [15]:
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report
import numpy as np

# create model:
model = DecisionTreeClassifier(random_state=42, max_depth=2)
min_samples = 5 

target_names = ['class 0', 'class 1']

# examine how the number of training samples affects model performance:
for step in range(min_samples, len(x_train)):
    model.fit(x_train[:step], y_train[:step])

    y_pred = model.predict(x_test)
    report = classification_report(y_test, y_pred, target_names=target_names, output_dict=True)

    for label, metric in report.items():
        if isinstance(metric, dict):
            # per-class and averaged entries are dicts of metrics; log them under a prefix
            experiment.log_metrics(metric, prefix=label, step=step)
        else:
            # 'accuracy' is a single float, so log it on its own
            experiment.log_metric(label, metric, step=step)

    experiment.log_confusion_matrix(y_test.tolist(), y_pred.tolist())
    
experiment.display(tab='confusion-matrices')
experiment.end()
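
The loop above relies on the nested dictionary that `classification_report(..., output_dict=True)` returns: per-class and averaged entries are dictionaries of metrics, while `accuracy` is a single float. The sketch below shows that shape with made-up numbers and flattens it into prefixed names. The helper `flatten_report` is hypothetical and only imitates the `prefix_name` pattern; it is not Comet's implementation.

```python
def flatten_report(report):
    """Flatten a classification-report dict into {metric_name: value}."""
    flat = {}
    for label, metric in report.items():
        if isinstance(metric, dict):
            # e.g. 'class 0' + 'recall' becomes 'class 0_recall'
            for name, value in metric.items():
                flat[f"{label}_{name}"] = value
        else:
            # 'accuracy' is already a single number
            flat[label] = metric
    return flat


# Made-up values, for illustration only.
example_report = {
    "class 0": {"precision": 0.8, "recall": 0.5, "f1-score": 0.6, "support": 10},
    "class 1": {"precision": 0.7, "recall": 0.9, "f1-score": 0.8, "support": 10},
    "accuracy": 0.7,
}

for name, value in flatten_report(example_report).items():
    print(name, value)
```

The prefixed names produced here follow the same pattern as the metric names in the summary printed by `experiment.end()`.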

COMET INFO: ---------------------------
COMET INFO: Comet.ml Experiment Summary
COMET INFO: ---------------------------
COMET INFO:   Data:
COMET INFO:     display_summary_level : 1
COMET INFO:     url                   : https://www.comet.com/klucke/decision-tree3/6bfc7f3eac2242799fadde723d289acb
COMET INFO:   Metrics [count] (min, max):
COMET INFO:     accuracy [222]               : (0.42105263157894735, 0.7631578947368421)
COMET INFO:     class 0_f1-score [222]       : (0.3888888888888889, 0.7272727272727272)
COMET INFO:     class 0_precision [222]      : (0.3783783783783784, 0.8421052631578947)
COMET INFO:     class 0_recall [222]         : (0.34285714285714286, 0.9714285714285714)
COMET INFO:     class 0_support              : 35
COMET INFO:     class 1_f1-score [222]       : (0.046511627906976744, 0.7906976744186047)
COMET INFO:     class 1_precision [222]      : (0.46153846153846156, 0.8421052631578947)
COMET INFO:     class 1_recall [222]         : (0.024390243902439025, 0.9268