### Objective: 
This notebook gives an overview of how to use the `exptree` framework for experiment tracking.

#### Data scientist / ML engineer experimentation journey
```
Data Scientist
     |
     v
Create Experiment ("Model V1")
     |
     v
+---------------------+        +-----------------+        +------------------------+
|    Run 1: Train     | -----> | Run 1: Evaluate | -----> | Accuracy (metric): 92% |
| LR=0.01, Epochs=10  |        +-----------------+        +------------------------+
+---------------------+

     |
     v

+---------------------+        +-----------------+        +------------------------+
|    Run 2: Train     | -----> | Run 2: Evaluate | -----> | Accuracy (metric): 94% |
| LR=0.005, Epochs=20 |        +-----------------+        +------------------------+
+---------------------+

     |
     v

More Runs (optional)
     |
     v
Compare & Analyze Results
```


In [1]:
# Import packages
from exptree.src import tree
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.metrics import precision_score, recall_score

#### Create the project

In [2]:
tree.set_project(name = "IRIS CLASSIFICATION",
                 created_by = "member 1")

Project IRIS CLASSIFICATION - (ID: a1422e94-e7a5-4597-8e7f-7980e9596aea) created successfully


#### Create Experiment

In [3]:
tree.set_experiment(name = "RandomForestModel",
                 created_by = "member 1",
                 description = """The Iris dataset is a classic multivariate dataset introduced by Ronald Fisher, 
                 containing 150 samples of iris flowers with four features: sepal length, sepal width, petal length, and petal width. 
                 The problem involves classifying the flowers into one of three species—Setosa, Versicolor, or Virginica—based on these features.""")

Experiment RandomForestModel - (ID: c56e18c0-e5e2-4a2e-af58-f222054b0bcc) created successfully


#### Start Runs
A run is a single execution of the experiment with a specific set of hyperparameters, together with the metrics and artifacts logged for it.

In [4]:
tree.start_run(name="run1", created_by="member 1")

Run run1 - (ID: 315bcfe1-8fe6-49cb-864e-aa248a9b43bf) created successfully


#### Prepare dataset

In [5]:
# Load Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split into train/test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=42)

#### Perform training with hyperparameters and log the experiment metadata

In [6]:
rf_params = {
    "n_estimators": 1,
    "max_depth": 8,
    "min_samples_split": 4,
    "min_samples_leaf": 2,
    "max_features": "sqrt",
    "random_state": 42
}

tree.log_hyperparameters(value = rf_params)

'hyperparameter : n_estimators' logged (ID: 913e5a32-1120-42ae-bbce-3c625f7e99ce)
'hyperparameter : max_depth' logged (ID: 3178ce83-8874-488b-a0a0-4185da326352)
'hyperparameter : min_samples_split' logged (ID: e7840bae-da35-44cc-8269-aa07ff2b4cc2)
'hyperparameter : min_samples_leaf' logged (ID: 53b9e12b-4126-4b1a-9042-289f9d62aa4b)
'hyperparameter : max_features' logged (ID: 14a3fd2a-ea13-4cbc-b200-f6d7714bb170)
'hyperparameter : random_state' logged (ID: 02392cea-1a39-4f22-854d-db832b46c7ee)


In [7]:
# Create and train the RandomForest model
model = RandomForestClassifier(**rf_params)
model.fit(X_train, y_train)

In [8]:
# Predict and evaluate
y_pred = model.predict(X_test)
metrics = {"accuracy": accuracy_score(y_test, y_pred),
           "precision" : precision_score(y_test, y_pred, average='macro'),
           "recall": recall_score(y_test, y_pred, average='macro')}

In [9]:
tree.log_metrics(value = metrics)

'metric : accuracy' logged (ID: e516397b-daae-447b-b061-15bf910f6579)
'metric : precision' logged (ID: bb0b592f-ab58-474c-a156-4eb53cf6a9aa)
'metric : recall' logged (ID: bc9d5a52-c377-44ce-8f48-1bf52ba369c5)


In [10]:
tree.log_artifacts(name = "Dataset", 
                   value = "Iris flower data set", 
                   artifact_type="data")

'artifact : Dataset' logged (ID: 9a218d8c-522e-442e-9618-f3981c0b6f43)


In [11]:
tree.stop_run()

In [12]:
metrics

{'accuracy': 0.95,
 'precision': 0.9463937621832358,
 'recall': 0.9463937621832358}
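
The precision and recall above are computed with `average='macro'`, i.e. the unweighted mean of the per-class scores. The sketch below reproduces that averaging in plain Python on a tiny set of made-up labels (not the Iris results). The helper name `macro_precision_recall` is hypothetical, and scoring a class with no predictions as 0.0 is an assumption of this sketch.

```python
from collections import Counter


def macro_precision_recall(y_true, y_pred):
    """Return (macro precision, macro recall): unweighted means over classes."""
    # True positives per class: positions where prediction equals the label.
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    predicted = Counter(y_pred)  # how often each class was predicted
    actual = Counter(y_true)     # how often each class actually occurs
    classes = sorted(set(y_true) | set(y_pred))
    # Assumption: a class that is never predicted (or never occurs) scores 0.0.
    precisions = [tp[c] / predicted[c] if predicted[c] else 0.0 for c in classes]
    recalls = [tp[c] / actual[c] if actual[c] else 0.0 for c in classes]
    return sum(precisions) / len(classes), sum(recalls) / len(classes)


# Toy labels invented for illustration only.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 2]
precision, recall = macro_precision_recall(y_true, y_pred)
print(f"macro precision={precision:.4f}, macro recall={recall:.4f}")
# prints "macro precision=0.8889, macro recall=0.8333"
```

Because every class counts equally, a weak score on one class pulls the macro average down regardless of how many samples that class has.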

#### Train with different hyperparameters

In [13]:
tree.start_run(name="run3", created_by="member 1")  # create a new run

Run run3 - (ID: 7ce3ac99-2121-4f4b-9199-d4865525747d) created successfully


In [14]:
rf_params = {
    "n_estimators": 3,
    "max_depth": 3,
    "min_samples_split": 5,
    "min_samples_leaf": 3,
    "max_features": "log2",
    "random_state": 42
}

tree.log_hyperparameters(value = rf_params)

'hyperparameter : n_estimators' logged (ID: 3e7e028a-d5c7-4651-99fc-346f57d9edef)
'hyperparameter : max_depth' logged (ID: eff68de4-139a-4007-8ca1-6f7d305a2488)
'hyperparameter : min_samples_split' logged (ID: 3adb957a-1666-4bb5-a7fb-4c338c109dde)
'hyperparameter : min_samples_leaf' logged (ID: 73318a71-7d25-4b62-a4a9-1d4374be8120)
'hyperparameter : max_features' logged (ID: c0537b3d-3c1e-4ac0-a74a-7076a1a3f885)
'hyperparameter : random_state' logged (ID: d5701081-9f56-4ac8-8e0a-5e59c95d1130)


In [15]:
# Create and train the RandomForest model
model = RandomForestClassifier(**rf_params)
model.fit(X_train, y_train)
# Predict and evaluate
y_pred = model.predict(X_test)
metrics = {"accuracy": accuracy_score(y_test, y_pred),
           "precision" : precision_score(y_test, y_pred, average='macro'),
           "recall": recall_score(y_test, y_pred, average='macro')}
tree.log_metrics(value = metrics)
tree.log_artifacts(name = "Dataset", 
                   value = "Iris flower data set", 
                   artifact_type="data")
tree.stop_run()

'metric : accuracy' logged (ID: 12989fe7-2cbb-4c2b-9d47-a687f263d285)
'metric : precision' logged (ID: b4d5d7e3-8ce3-4a19-8233-404d460d2e4b)
'metric : recall' logged (ID: ed1527b1-1a74-4f16-aa2f-90396f51098e)
'artifact : Dataset' logged (ID: 94d27dac-a08b-4578-9ea1-ab82c0d93a9c)


#### Explore Experiment and compare Runs 

In [16]:
tree.view_experiments()

[Interactive widget output: Experiment Dashboard with an 'Experiment:' dropdown (option: RandomForestModel) and a 'Filter runs:' box, displaying 2 runs for experiment 'RandomForestModel']
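
The dashboard compares runs interactively. For a quick scripted comparison, one option is to keep each run's `metrics` dictionary in Python and pick the best run by a chosen metric. This is a minimal sketch: `best_run` is a hypothetical helper, not part of the framework, and the run names and numbers below are placeholders rather than results from this notebook.

```python
def best_run(run_metrics, metric="accuracy"):
    """Return (run_name, value) for the run with the highest `metric`."""
    name = max(run_metrics, key=lambda run: run_metrics[run][metric])
    return name, run_metrics[name][metric]


# Placeholder values for illustration only (not measured results).
runs = {
    "run_a": {"accuracy": 0.90, "precision": 0.88, "recall": 0.89},
    "run_b": {"accuracy": 0.93, "precision": 0.92, "recall": 0.91},
}

name, value = best_run(runs, metric="accuracy")
print(f"best run by accuracy: {name} ({value:.2f})")
```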

#### Look at individual runs

In [17]:
tree.view_runs(experiment_name="RandomForestModel")

[Interactive widget output: ML Run Explorer]

#### Export the tree and track the file in git
Experiment metadata can be tracked in git like any other code or file.

In [18]:
tree.export_tree(filename="randomforestmodel.json")
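
Before committing the exported file, it can help to confirm that it parses cleanly. The sketch below assumes the export is plain JSON, as the `.json` extension suggests. The internal layout of the file is not documented in this notebook, so the hypothetical helper `describe_export` only reports the top-level type and size.

```python
import json
from pathlib import Path


def describe_export(path):
    """Parse a file as JSON and return (top-level type name, entry count)."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    # Scalars have no length; count them as a single entry.
    size = len(data) if isinstance(data, (dict, list)) else 1
    return type(data).__name__, size


if __name__ == "__main__":
    export_path = Path("randomforestmodel.json")
    if export_path.exists():
        kind, size = describe_export(export_path)
        print(f"{export_path}: top-level {kind} with {size} entries")
    else:
        print(f"{export_path} not found; run tree.export_tree first")
```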

#### Load the file when resuming work

In [19]:
tree.load_tree(filename="randomforestmodel.json")

Imported the experiment tree successfully


#### View the loaded tree

In [20]:
tree.view_tree()

| | project_name | project_id | experiment_name | experiment_id | run_name | run_id |
|---|---|---|---|---|---|---|
| 0 | IRIS CLASSIFICATION | a1422e94-e7a5-4597-8e7f-7980e9596aea | RandomForestModel | c56e18c0-e5e2-4a2e-af58-f222054b0bcc | run1 | 315bcfe1-8fe6-49cb-864e-aa248a9b43bf |
| 1 | IRIS CLASSIFICATION | a1422e94-e7a5-4597-8e7f-7980e9596aea | RandomForestModel | c56e18c0-e5e2-4a2e-af58-f222054b0bcc | run3 | 7ce3ac99-2121-4f4b-9199-d4865525747d |


#### To resume work we need to set project and experiment

In [21]:
tree.set_project(name = "IRIS CLASSIFICATION")
tree.set_experiment(name = "RandomForestModel")

Using existing project IRIS CLASSIFICATION - (ID: ['a1422e94-e7a5-4597-8e7f-7980e9596aea'])
Using existing experiment RandomForestModel - (ID: ['c56e18c0-e5e2-4a2e-af58-f222054b0bcc'])


#### >>
```1. Resume work: start new run -> log hyperparameters -> log metrics -> view experiments```<br>
```2. Visualization only (e.g. for a PM/TL): view experiments/runs```
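
The first workflow repeats the same calls for every run, so it can be wrapped in a small helper. The sketch below is one possible shape, not part of the framework: `tracked_run` and `train_and_score` are hypothetical names, and `tracker` is assumed to expose `start_run`, `log_hyperparameters`, `log_metrics` and `stop_run` with the keyword arguments used earlier in this notebook.

```python
def tracked_run(tracker, run_name, created_by, params, train_and_score):
    """Start a run, log hyperparameters and metrics, then stop the run.

    `train_and_score` is a caller-supplied function that takes the
    hyperparameter dict and returns a dict of metrics.
    """
    tracker.start_run(name=run_name, created_by=created_by)
    try:
        tracker.log_hyperparameters(value=params)
        metrics = train_and_score(params)
        tracker.log_metrics(value=metrics)
    finally:
        # Assumption: stop_run is safe to call even if training failed.
        tracker.stop_run()
    return metrics
```

With the objects from this notebook, a call might look like `tracked_run(tree, "run4", "member 1", rf_params, fit_and_score)`, where `fit_and_score` is a function you write that trains the model and returns the `metrics` dictionary.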

#### Additional notes:
-> TensorBoard links (or any other external links) can be added to artifacts to view epoch-level metrics or fine-grained analysis of training<br>
-> Images such as plots can be added to artifacts to view experiment results (e.g. training loss/accuracy curves, ROC AUC)<br>
-> tree.delete(`<name of node>`,`<type of node>`) removes logged info from the tree (type: experiment, run, metric, hyperparameter, prompt, artifact)