### Objective: 
This notebook gives an overview of how to use the framework for experiment tracking.

#### Data scientist / ML engineer experimentation journey
```
Data Scientist
     |
     v
Create Experiment ("Model V1")
     |
     v
+---------------------+        +------------------+        +------------------------+
|    Run 1: Train     | -----> | Run 1: Evaluate  | -----> | Accuracy (metric): 92% |
| LR=0.01, Epochs=10  |        +------------------+        +------------------------+
+---------------------+

     |
     v

+---------------------+        +------------------+        +------------------------+
|    Run 2: Train     | -----> | Run 2: Evaluate  | -----> | Accuracy (metric): 94% |
| LR=0.005, Epochs=20 |        +------------------+        +------------------------+
+---------------------+

     |
     v

More Runs (optional)
     |
     v
Compare & Analyze Results
```


In [1]:
# Import packages
from src import tree
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.metrics import precision_score, recall_score

#### Create the project

In [2]:
tree.set_project(name = "IRIS CLASSIFICATION",
                 created_by = "member 1")

Project IRIS CLASSIFICATION - (ID: 5f891f77-4950-4069-be50-0cfcdf56cefe) created successfully


#### Create Experiment

In [3]:
tree.set_experiment(name = "RandomForestModel",
                 created_by = "member 1",
                 description = """The Iris dataset is a classic multivariate dataset introduced by Ronald Fisher, 
                 containing 150 samples of iris flowers with four features: sepal length, sepal width, petal length, and petal width. 
                 The problem involves classifying the flowers into one of three species—Setosa, Versicolor, or Virginica—based on these features.""")

Experiment RandomForestModel - (ID: 2b392d76-d017-4016-b53b-19ad17ddd8c9) created successfully


#### Start Runs
A run is one execution of the experiment with a specific set of hyperparameters, together with the metrics and artifacts logged for it.

In [4]:
tree.start_run(name="run1", created_by="member 1")

Run run1 - (ID: 6cabe5ed-728f-4751-80f4-ec0a5c7c7a61) created successfully
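
To make the project, experiment, and run hierarchy concrete, here is a minimal sketch of how such a tree could be modelled. The class names (`Project`, `Experiment`, `Run`) and the `new_id` helper are hypothetical and are not the framework's actual internals; they only mirror the names and UUID-style IDs printed by the cells above.

```python
from dataclasses import dataclass, field
from uuid import uuid4


def new_id() -> str:
    """Return a random UUID string, similar in shape to the IDs printed above."""
    return str(uuid4())


@dataclass
class Run:
    # Hypothetical container for everything logged during one run.
    name: str
    created_by: str
    hyperparameters: dict = field(default_factory=dict)
    metrics: dict = field(default_factory=dict)
    run_id: str = field(default_factory=new_id)


@dataclass
class Experiment:
    name: str
    runs: list = field(default_factory=list)
    experiment_id: str = field(default_factory=new_id)


@dataclass
class Project:
    name: str
    experiments: list = field(default_factory=list)
    project_id: str = field(default_factory=new_id)


# Build the same shape as the notebook: one project, one experiment, one run.
project = Project(name="IRIS CLASSIFICATION")
experiment = Experiment(name="RandomForestModel")
project.experiments.append(experiment)

run = Run(name="run1", created_by="member 1")
run.hyperparameters["n_estimators"] = 1
experiment.runs.append(run)

print(project.name, "->", experiment.name, "->", run.name)
```

Each level owns a list of its children, which is why a run always needs an active experiment and an experiment always needs an active project.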


#### Prepare dataset

In [5]:
# Load Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split into train/test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=42)

#### Perform training with hyperparameters and log the experiment metadata

In [6]:
rf_params = {
    "n_estimators": 1,
    "max_depth": 8,
    "min_samples_split": 4,
    "min_samples_leaf": 2,
    "max_features": "sqrt",
    "random_state": 42
}

tree.log_hyperparameters(value = rf_params)

'hyperparameter : n_estimators' logged (ID: 877b706a-dfb4-46a4-99c5-10ca22e04861)
'hyperparameter : max_depth' logged (ID: 93a925bf-1b5a-4b7d-b747-3ea2a5fb9117)
'hyperparameter : min_samples_split' logged (ID: 5e75520e-ff66-465b-97f8-db3e7cf5ccbb)
'hyperparameter : min_samples_leaf' logged (ID: 684b9652-a75b-454e-9add-827709097a5a)
'hyperparameter : max_features' logged (ID: 9315558b-0095-4b98-aacb-dad1413720d3)
'hyperparameter : random_state' logged (ID: 417cc40c-2f06-4f83-820c-a53194a9dcfe)


In [7]:
# Create and train the RandomForest model
model = RandomForestClassifier(**rf_params)
model.fit(X_train, y_train)

In [8]:
# Predict and evaluate
y_pred = model.predict(X_test)
metrics = {"accuracy": accuracy_score(y_test, y_pred),
           "precision" : precision_score(y_test, y_pred, average='macro'),
           "recall": recall_score(y_test, y_pred, average='macro')}

In [9]:
tree.log_metrics(value = metrics)

'metric : accuracy' logged (ID: f818de21-f922-4ef9-a2ff-167d7bdd109e)
'metric : precision' logged (ID: 977e142d-3866-4859-8769-ecb17400eae2)
'metric : recall' logged (ID: 9e5203f8-ca71-4559-a680-3e13e269715f)


In [10]:
tree.log_artifacts(name = "Dataset", 
                   value = "Iris flower data set", 
                   artifact_type="data")

'artifact : Dataset' logged (ID: 6489da23-9856-4608-9117-303892f397d5)


In [11]:
tree.stop_run()
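
If training raises an exception between `start_run` and `stop_run`, the run may be left open. One way to guard against that is a small context manager. The sketch below is an illustration, not part of the framework: `tracked_run` and `StubTracker` are hypothetical names, and the only assumption about `tree` is that `start_run(name=..., created_by=...)` and `stop_run()` behave as in the cells above.

```python
from contextlib import contextmanager


@contextmanager
def tracked_run(tracker, name, created_by):
    """Start a run on `tracker` and always stop it, even if the body raises."""
    tracker.start_run(name=name, created_by=created_by)
    try:
        yield tracker
    finally:
        tracker.stop_run()


class StubTracker:
    """Stand-in for `tree` that only records calls, so the sketch runs on its own."""

    def __init__(self):
        self.calls = []

    def start_run(self, name, created_by):
        self.calls.append(("start_run", name))

    def stop_run(self):
        self.calls.append(("stop_run",))


stub = StubTracker()
try:
    with tracked_run(stub, name="run1", created_by="member 1"):
        raise RuntimeError("training failed")  # simulate a failure mid-run
except RuntimeError:
    pass

print(stub.calls)  # stop_run is recorded despite the exception
```

In the notebook, passing `tree` in place of `stub` should work under the same assumption about the two function signatures.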

In [12]:
metrics

{'accuracy': 0.95,
 'precision': 0.9463937621832358,
 'recall': 0.9463937621832358}
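
The precision and recall above use `average='macro'`, which computes the score per class and then takes the unweighted mean across classes. The following sketch reproduces that calculation by hand on a small made-up label set (the toy labels are illustrative and unrelated to the Iris results).

```python
def macro_precision_recall(y_true, y_pred):
    """Compute macro-averaged precision and recall without external libraries."""
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls = [], []
    for label in labels:
        # True positives: predicted `label` and actually `label`.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        predicted = sum(1 for p in y_pred if p == label)
        actual = sum(1 for t in y_true if t == label)
        # Score a class as 0.0 when its denominator is empty.
        precisions.append(tp / predicted if predicted else 0.0)
        recalls.append(tp / actual if actual else 0.0)
    return sum(precisions) / len(labels), sum(recalls) / len(labels)


# Toy example with three classes; one sample of class 1 is misclassified as 2.
y_true_toy = [0, 0, 1, 1, 2, 2]
y_pred_toy = [0, 0, 1, 2, 2, 2]

toy_precision, toy_recall = macro_precision_recall(y_true_toy, y_pred_toy)
print(f"macro precision={toy_precision:.4f}, macro recall={toy_recall:.4f}")
```

Here the per-class precisions are 1, 1, and 2/3 and the per-class recalls are 1, 1/2, and 1, so the macro averages are 8/9 and 5/6. Because every class counts equally, a weak minority class lowers the macro score more than it would lower plain accuracy.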

#### Train with different hyperparameters

In [13]:
tree.start_run(name="run3", created_by="member 1")  # create a new run

Run run3 - (ID: a7e5fff4-fa8d-4fb7-a274-da8929135a14) created successfully


In [14]:
rf_params = {
    "n_estimators": 3,
    "max_depth": 3,
    "min_samples_split": 5,
    "min_samples_leaf": 3,
    "max_features": "log2",
    "random_state": 42
}

tree.log_hyperparameters(value = rf_params)

'hyperparameter : n_estimators' logged (ID: 1bd68214-8ace-4a36-8cbf-02974ef5dba0)
'hyperparameter : max_depth' logged (ID: 6d6d4374-15f4-4db6-8101-8c4d4fe78a32)
'hyperparameter : min_samples_split' logged (ID: d4059c77-e1ef-4f92-8cac-40ef67497847)
'hyperparameter : min_samples_leaf' logged (ID: eb87590d-8d54-4ce7-b5fc-25cf669ba30e)
'hyperparameter : max_features' logged (ID: 7151d4cd-9a94-450c-b867-500b033cf988)
'hyperparameter : random_state' logged (ID: 4e0b7d41-1775-4471-9808-b9d99b2ba6c5)


In [15]:
# Create and train the RandomForest model
model = RandomForestClassifier(**rf_params)
model.fit(X_train, y_train)
# Predict and evaluate
y_pred = model.predict(X_test)
metrics = {"accuracy": accuracy_score(y_test, y_pred),
           "precision" : precision_score(y_test, y_pred, average='macro'),
           "recall": recall_score(y_test, y_pred, average='macro')}
tree.log_metrics(value = metrics)
tree.log_artifacts(name = "Dataset", 
                   value = "Iris flower data set", 
                   artifact_type="data")
tree.stop_run()

'metric : accuracy' logged (ID: 5581147e-a56b-449a-9ffd-588a45fbe521)
'metric : precision' logged (ID: a863427f-c7c7-4d52-8637-621a92c65327)
'metric : recall' logged (ID: d385cfd3-4417-4c8d-8d1f-05af95e6c12d)
'artifact : Dataset' logged (ID: 4e8ff202-bcd5-4dc2-bbbe-c507d8b8c71f)


#### Explore Experiment and compare Runs 

In [16]:
tree.view_experiments()

HTML(value="<h1 style='color: #2c3e50; margin-bottom: 20px;'>Experiment Dashboard</h1>")

Dropdown(description='Experiment:', layout=Layout(width='350px'), options=('RandomForestModel',), style=Descri…

HTML(value="\n        <div style='border: 1px solid #e0e0e0; border-radius: 8px; padding: 15px; margin-bottom:…

HTML(value="<div style='font-size: 1.3em; font-weight: bold; color: #3a3a3a; margin-bottom: 12px;'>Metrics Sum…

HBox(children=(Text(value='', description='Filter runs:', layout=Layout(margin='0 20px 0 0', width='350px'), p…

VBox(children=(HBox(children=(HTML(value="<div style='font-size: 1.3em; font-weight: bold; color: #3a3a3a; mar…

HTML(value="<div style='padding: 8px; color: #555;'>Displaying 2 runs for experiment 'RandomForestModel'</div>…
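
The dashboard compares runs interactively. For a programmatic comparison, the same idea can be sketched with plain dictionaries. The structure below, the `rank_runs` helper, and the metric values are all hypothetical placeholders; they are not the framework's export format or results from this notebook.

```python
# Placeholder run records; names and values are made up for illustration.
runs = [
    {
        "name": "run_a",
        "hyperparameters": {"n_estimators": 1},
        "metrics": {"accuracy": 0.90, "recall": 0.88},
    },
    {
        "name": "run_b",
        "hyperparameters": {"n_estimators": 3},
        "metrics": {"accuracy": 0.93, "recall": 0.91},
    },
]


def rank_runs(runs, metric):
    """Return runs sorted from best to worst on `metric` (higher is better).

    Runs that never logged the metric are skipped rather than ranked last.
    """
    scored = [r for r in runs if metric in r["metrics"]]
    return sorted(scored, key=lambda r: r["metrics"][metric], reverse=True)


for r in rank_runs(runs, "accuracy"):
    print(f"{r['name']}: accuracy={r['metrics']['accuracy']:.2f}")
```

Sorting on a single metric is the simplest comparison; for metrics where lower is better (such as loss), the `reverse` flag would need to be flipped.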

#### Look at individual runs

In [17]:
tree.view_runs(experiment_name="RandomForestModel")

HTML(value='\n        <style>\n            .run-view-container { max-width: 1000px; margin: 0 auto; font-famil…

VBox(children=(HTML(value='<div class="run-view-header"><div class="run-view-title">ML Run Explorer</div></div…

#### Export the tree and track the file in git (experiment metadata can be versioned like any other code or file)

In [18]:
tree.export_tree(filename="randomforestmodel.json")

#### Load the file when resuming work

In [19]:
tree.load_tree(filename="randomforestmodel.json")

Imported the experiment tree successfully
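
The export and load steps amount to a JSON round trip. The sketch below shows that idea with the standard `json` module. The dictionary layout and the file name `tree_sketch.json` are assumptions made for illustration; the actual schema written by `tree.export_tree` is not shown in this notebook.

```python
import json
import os
import tempfile

# Hypothetical layout; the real export schema may differ.
tree_data = {
    "project": "IRIS CLASSIFICATION",
    "experiments": [
        {
            "name": "RandomForestModel",
            "runs": [
                {
                    "name": "run1",
                    "hyperparameters": {"n_estimators": 1, "max_features": "sqrt"},
                }
            ],
        }
    ],
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "tree_sketch.json")

    # indent and sort_keys keep the file layout stable, which keeps git diffs small.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(tree_data, f, indent=2, sort_keys=True)

    with open(path, encoding="utf-8") as f:
        restored = json.load(f)

print(restored == tree_data)
```

Plain-text JSON is what makes the metadata practical to version in git: each new run shows up as a readable diff.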


#### View the loaded tree

In [20]:
tree.view_tree()

|   | project_name | project_id | experiment_name | experiment_id | run_name | run_id |
|---|---|---|---|---|---|---|
| 0 | IRIS CLASSIFICATION | 5f891f77-4950-4069-be50-0cfcdf56cefe | RandomForestModel | 2b392d76-d017-4016-b53b-19ad17ddd8c9 | run1 | 6cabe5ed-728f-4751-80f4-ec0a5c7c7a61 |
| 1 | IRIS CLASSIFICATION | 5f891f77-4950-4069-be50-0cfcdf56cefe | RandomForestModel | 2b392d76-d017-4016-b53b-19ad17ddd8c9 | run3 | a7e5fff4-fa8d-4fb7-a274-da8929135a14 |


#### To resume work, set the project and experiment

In [21]:
tree.set_project(name = "IRIS CLASSIFICATION")
tree.set_experiment(name = "RandomForestModel")

Using existing project IRIS CLASSIFICATION - (ID: ['5f891f77-4950-4069-be50-0cfcdf56cefe'])
Using existing experiment RandomForestModel - (ID: ['2b392d76-d017-4016-b53b-19ad17ddd8c9'])


#### >>
```1. start new run -> log hyperparameters -> log metrics -> view experiments (resume work)```<br>
```2. view experiments/run (if only visualization needed - for PM/TL)```

#### Additional notes:
-> TensorBoard links (or any other external links) can be added as artifacts to view epoch-level metrics or fine-grained training analysis<br>
-> Images such as plots can be added as artifacts to view experiment results (for example training loss, accuracy curves, or AUC-ROC)<br>
-> Use tree.delete(`<name of node>`,`<type of node>`) to remove logged info from the tree (type: experiment, run, metric, hyperparameter, prompt, artifact)