<h1 style="text-align: center; color: darkblue;">Results</h1>

### 📑 <font color='blue'> Table of Contents </font>
1. [Introduction](#1)
2. [Setup](#2)
3. [Helper Functions](#3)
4. [Results](#4) <br>
    4.1. [Model summary and configuration](#4.1) <br>
    4.2. [Validation performance](#4.2) <br>
    4.3. [Test performance](#4.3)
5. [Analysis & Conclusion](#5)


## <a id="1" style="color: darkred; text-decoration: none;">1. Introduction</a>

This notebook presents the final results of a model developed to predict the dipole moment of molecules from the QM9 dataset.

Here I summarize the final model and its configuration, its evaluation metrics on the validation and test sets, and the main lessons learned throughout the project.


## <a id="2" style="color: darkred; text-decoration: none;">2. Setup </a>

In [1]:
import os, sys
sys.path.append(os.path.abspath("..")) # to be able to import src



In [37]:
import pandas as pd
import mlflow
from mlflow.tracking import MlflowClient
from pathlib import Path

from src.utils.logging import select_best_model

In [38]:
EXPERIMENT_NAME = "qm9"  # TODO: load from config

In [29]:
# mlflow tracking URI
mlflow_path = Path().resolve().parent / "mlflow.db"
mlflow.set_tracking_uri(f"sqlite:///{mlflow_path}")
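The f-string in the cell above relies on the SQLAlchemy URL format that MLflow uses for database-backed tracking stores: `sqlite:///` is followed directly by the file path, so an absolute POSIX path produces four slashes in total, while a relative path would be interpreted relative to the current working directory. The helper below is a hypothetical sketch (not part of `src`) that makes this explicit.

```python
from pathlib import Path


def sqlite_tracking_uri(db_path) -> str:
    """Build an SQLAlchemy-style SQLite URI for an MLflow tracking database.

    "sqlite:///" is followed directly by the file path. For an absolute POSIX
    path (which itself starts with "/") this yields four slashes. The path is
    made absolute first so the URI does not depend on the working directory.
    """
    return f"sqlite:///{Path(db_path).absolute()}"


uri = sqlite_tracking_uri("/tmp/project/mlflow.db")
print(uri)  # on Linux/macOS: sqlite:////tmp/project/mlflow.db
```

Resolving the path up front, as the notebook does with `Path().resolve()`, avoids silently creating a second, empty `mlflow.db` when the notebook is started from a different directory.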

## <a id="3" style="color: darkred; text-decoration: none;">3. Helper Functions </a>

In [77]:
def get_experiment_id_by_name(name):
    """Return the ID of the MLflow experiment called `name`."""
    client = MlflowClient()  # picks up the tracking URI set above
    exp = client.get_experiment_by_name(name)
    if exp is None:
        raise ValueError(f"Experiment '{name}' not found.")
    return exp.experiment_id


def get_run_data(run_id):
    """Return info, params, metrics, tags and local artifact paths for a run."""
    client = MlflowClient()
    run = client.get_run(run_id)

    data = {
        "info": run.info,
        "params": run.data.params,
        "metrics": run.data.metrics,
        "tags": run.data.tags,
        "artifacts": {}
    }

    # download artifacts
    for a in client.list_artifacts(run_id):
        local_path = client.download_artifacts(run_id, a.path)
        data["artifacts"][a.path] = local_path

    return data




# Runs named "test_evaluation" hold the test-set results of the model that was
# previously selected as best according to its validation results.
def get_best(experiment_id, run_name="test_evaluation"):
    """Return the test run with the lowest test MSE, its tuning run ID and its test metrics."""
    df = mlflow.search_runs(
        experiment_ids=[experiment_id],
        filter_string=f"attributes.run_name = '{run_name}'",
    )
    if df.empty:
        raise ValueError(f"No runs named '{run_name}' in experiment {experiment_id}.")

    # Keep only the best (lowest) test_mse
    best_run = df.sort_values("metrics.test_mse", ascending=True).iloc[0]

    return {
        "run_id": best_run["run_id"],
        "tuning_run_id": best_run["tags.tuning_run_id"],
        "metrics": {
            "mse": best_run["metrics.test_mse"],
            "rmse": best_run["metrics.test_rmse"],
            "mae": best_run["metrics.test_mae"],
            "r2": best_run["metrics.test_r2"],
            "ev": best_run["metrics.test_ev"],
        },
    }


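def get_all_data(exp_name):
    """Collect model type, validation/test metrics and hyperparameters of the best model."""
    exp_id = get_experiment_id_by_name(exp_name)
    best_data = get_best(exp_id)
    # validation metrics and hyperparameters were logged on the tuning run
    tuning_data = get_run_data(best_data["tuning_run_id"])
    return {
        "model_type": tuning_data["tags"]["model_type"],
        "test_metrics": best_data["metrics"],
        "val_metrics": tuning_data["metrics"],
        "params": tuning_data["params"],
        # TODO: add artifacts (e.g. the loss curve)
    }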
    
    



## <a id="4" style="color: darkred; text-decoration: none;">4. Results </a>

In [79]:
# get data
data = get_all_data(EXPERIMENT_NAME)


### <a id="4.1" style="color: darkorange; text-decoration: none;">4.1. Model summary and configuration </a>


In [82]:
df_model = pd.DataFrame([{"model_type": data["model_type"]}])

# one row per hyperparameter
df_params = pd.DataFrame(list(data["params"].items()), columns=["parameter", "value"])

display(df_model)  # `display` is provided by Jupyter/IPython
display(df_params)

| model_type |
|---|
| schnet |


| parameter | value |
|---|---|
| val_ratio | 0.2 |
| subset | 6000.0 |
| target | 0.0 |
| batch_size | 16.0 |
| lr | 0.0015625918075966 |
| hidden_channels | 128.0 |
| num_filters | 128.0 |
| num_interactions | 4.0 |
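MLflow stores every logged parameter as a string, which is why integer hyperparameters such as `batch_size` come back as `16.0`-style values. If these parameters are reused (for example to rebuild the model), they need to be cast back. The helper `parse_param` below is a hypothetical sketch: it turns integral values into `int`, other numbers into `float`, and leaves non-numeric strings untouched.

```python
def parse_param(value):
    """Convert an MLflow parameter string back to a Python number where possible.

    Integral values such as "128.0" become int, other numeric strings become
    float, and anything non-numeric (e.g. a model name) is returned unchanged.
    """
    try:
        number = float(value)
    except (TypeError, ValueError):
        return value
    return int(number) if number.is_integer() else number


# Illustrative raw values in the same format as the table above.
raw_params = {"batch_size": "16.0", "lr": "0.0015625918075966", "val_ratio": "0.2"}
params = {name: parse_param(v) for name, v in raw_params.items()}
print(params)
```

In this notebook it would be applied as `{k: parse_param(v) for k, v in data["params"].items()}`. Treating every integral float as an `int` is a simplification: a float hyperparameter that happens to be `1.0` would also become `1`.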


In [None]:
# TODO: add the entire model configuration and the number of training epochs

### <a id="4.2" style="color: darkorange; text-decoration: none;">4.2. Validation performance </a>

In [83]:
df_val = pd.DataFrame(list(data["val_metrics"].items()), columns=["validation_metric", "value"])

df_val

| validation_metric | value |
|---|---|
| val_mse | 0.12033 |
| val_rmse | 0.346886 |
| val_mae | 0.206557 |
| val_r2 | 0.951106 |
| val_ev | 0.951205 |
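For reference, the five metrics can be written out in plain Python. This sketch assumes they follow the usual scikit-learn definitions and that `ev` stands for explained variance; the function `regression_metrics` is hypothetical and not the project's own implementation. The example uses a constant prediction offset to show why `ev` can be higher than `r2`: explained variance ignores a systematic shift of the predictions, while R² penalizes it.

```python
import math


def regression_metrics(y_true, y_pred):
    """MSE, RMSE, MAE, R^2 and explained variance for two equal-length sequences.

    Uses population (ddof=0) statistics, as scikit-learn does.
    """
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    mean_res = sum(residuals) / n

    mse = sum(r * r for r in residuals) / n
    mae = sum(abs(r) for r in residuals) / n
    var_true = sum((t - mean_true) ** 2 for t in y_true) / n
    var_res = sum((r - mean_res) ** 2 for r in residuals) / n

    return {
        "mse": mse,
        "rmse": math.sqrt(mse),
        "mae": mae,
        # R^2 penalizes both the spread of the residuals and a constant offset.
        "r2": 1 - mse / var_true,
        # Explained variance ignores the mean residual, so ev >= r2.
        "ev": 1 - var_res / var_true,
    }


# Every prediction is 0.5 too high: ev is exactly 1.0, r2 is about 0.8.
m = regression_metrics([1, 2, 3, 4], [1.5, 2.5, 3.5, 4.5])
print(m)
```

Since MSE equals the residual variance plus the squared mean residual, `ev - r2` is the squared mean residual divided by the target variance; on the validation set above the two scores are almost identical.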


### <a id="4.3" style="color: darkorange; text-decoration: none;">4.3. Test performance </a>

In [84]:
df_test = pd.DataFrame(list(data["test_metrics"].items()), columns=["test_metric", "value"])

df_test

| test_metric | value |
|---|---|
| mse | 0.265931 |
| rmse | 0.515685 |
| mae | 0.385589 |
| r2 | 0.693977 |
| ev | 0.736624 |





## <a id="5" style="color: darkred; text-decoration: none;">5. Analysis & Conclusion </a>
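A natural first step for the analysis is to quantify how much worse the model does on the test set than on the validation set. The sketch below uses only the numbers reported in Section 4; the helper `split_gap` and the convention of comparing error metrics as ratios and scores as differences are my own illustrative choices.

```python
# Values copied from the validation and test tables in Section 4.
VAL = {"mse": 0.12033, "rmse": 0.346886, "mae": 0.206557, "r2": 0.951106, "ev": 0.951205}
TEST = {"mse": 0.265931, "rmse": 0.515685, "mae": 0.385589, "r2": 0.693977, "ev": 0.736624}
ERROR_METRICS = {"mse", "rmse", "mae"}  # lower is better; r2 and ev: higher is better


def split_gap(val, test):
    """Compare each test metric with its validation counterpart.

    Error metrics are compared as a ratio (test / val), scores (r2, ev) as a
    difference (test - val).
    """
    return {
        name: test[name] / v if name in ERROR_METRICS else test[name] - v
        for name, v in val.items()
    }


for name, value in split_gap(VAL, TEST).items():
    if name in ERROR_METRICS:
        print(f"{name}: test is {value:.2f}x the validation value")
    else:
        print(f"{name}: test - val = {value:+.3f}")

# ev - r2 equals (mean residual)^2 / Var(y) under the standard definitions.
for split, metrics in (("val", VAL), ("test", TEST)):
    print(f"{split}: ev - r2 = {metrics['ev'] - metrics['r2']:.4f}")
```

On these numbers the test MSE is roughly 2.2 times the validation MSE and R² drops by about 0.26. Some gap is expected because the model was selected on its validation performance. In addition, `ev - r2` is about 0.043 on the test set versus about 0.0001 on validation, which, under the standard metric definitions, hints at a systematic offset in the test predictions that may be worth checking.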



TODO: consider splitting this into separate "Analysis" and "Conclusion" sections.