# Analysis of Model Performance and Splitting Strategies for Molecular Datasets

# Table of Contents

## 1. Setup and Data Loading
- [1.1 Import Libraries](#Import-Libraries)
  - Data manipulation and visualization libraries
  - RDKit for molecular handling  
  - Custom utilities for ranking analysis

## 2. Model Performance Analysis
- [2.1 Overall Performance Gap between ID and OOD](#Performance-GAP-between-ID-and-OOD-\(ALL-Models\))
  - Analysis across all models and datasets
  - ROC-AUC performance comparison between in-distribution and out-of-distribution test sets
  - Performance gap quantification (ID - OOD)
  - Combined results table with mean ± standard deviation

- [2.2 Model Type Comparison](#Performance-GAP-between-ID-and-OOD-\(ML-and-GNN-Models-separately\))
  - Classical ML vs Graph Neural Networks
  - Separate analysis for each model type
  - Comparative performance gaps across splitting strategies
  - Model-specific performance tables and LaTeX export

## 3. Splitting Strategy Evaluation
- [3.1 Correlation Analysis between Splitting Methods](#Correlation-between-Splitters-\(Ranking-Splitters\))
  - Tanimoto similarity vs performance gaps
  - TMD (Tree Mover's Distance) analysis
  - Spearman and Kendall correlation metrics
  - Pairwise ranking comparison across different splitting strategies

## 4. Supplementary Analysis
- [4.1 Dataset Statistics and Additional Metrics](#Misc)
  - Dataset size and activity ratio analysis
  - Multi-level indexing examples
  - Experimental data structures

---

**Overview:** This notebook analyzes model performance and splitting strategies for molecular datasets. It compares classical machine learning models with graph neural networks and quantifies how much ROC-AUC drops from in-distribution (ID) to out-of-distribution (OOD) test sets under each splitting strategy.

# Import Libraries

In [1]:
# Import required libraries
# For plotting
%matplotlib inline

# System and file operations
import os
import sys
import yaml
from typing import List  # For type hints

# Data manipulation libraries
import numpy as np  # For numerical operations
import pandas as pd  # For data frame operations

# RDKit libraries for molecular visualization
from rdkit.Chem import Draw
from rdkit.Chem.Draw import IPythonConsole

# Import custom utilities
from alinemol.utils import compare_rankings  # For comparing ranking methods

# Set up paths
# Get repository root path
repo_path = os.path.dirname(os.path.abspath(""))
CHECKOUT_PATH = repo_path
# Path to datasets directory
DATASET_PATH = os.path.join(repo_path, "datasets")

# Change working directory to repo root
os.chdir(CHECKOUT_PATH)
# Add repo root to Python path for imports
sys.path.insert(0, CHECKOUT_PATH)

# Configure RDKit drawing options to use comic style
Draw.SetComicMode(IPythonConsole.drawOptions)

In [2]:
# Load the configuration file which contains datasets, models, and splitting strategies
# Use a context manager so the file handle is closed after reading
with open(os.path.join(DATASET_PATH, "config.yml"), "r") as f:
    CFG = yaml.safe_load(f)

# Extract different model types from config
# Classical machine learning models
ML_MODELS: List = CFG["models"]["ML"]
# Graph neural network models trained from scratch
SCRATCH_GNN_MODELS: List = CFG["models"]["GNN"]["scratch"]
# Pre-trained graph neural network models
PRETRAINED_GNN_MODELS: List = CFG["models"]["GNN"]["pretrained"]
# Combine all GNN models into one list
GNN_MODELS: List = SCRATCH_GNN_MODELS + PRETRAINED_GNN_MODELS
# Create list of all model types
ALL_MODELS: List[List] = [ML_MODELS, SCRATCH_GNN_MODELS, PRETRAINED_GNN_MODELS]

# Get dataset names from TDC (Therapeutic Data Commons)
DATASET_NAMES: List = CFG["datasets"]["TDC"]
# Get different splitting strategies used for train/test splits
SPLIT_TYPES: List = CFG["splitting"]

# Read the pre-computed results from CSV file
results = pd.read_csv(os.path.join("classification_results", "TDC", "results.csv"))

# Add a column to identify if model is classical ML or GNN
results["model_type"] = results["model"].apply(lambda x: "Classical_ML" if x in ML_MODELS else "GNN")

# Dictionary mapping metric names to their display names
metric_mapping = {"accuracy": "Accuracy", "roc_auc": "ROC-AUC", "pr_auc": "PR-AUC"}

# Performance GAP between ID and OOD (ALL Models)

This section analyzes the performance gap between in-distribution (ID) and out-of-distribution (OOD) test sets
across different splitting strategies and datasets. The analysis includes both classical ML models and GNNs.
 
The performance is measured using ROC-AUC scores, with results showing:
- Test performance on ID data (data similar to training)
- Test performance on OOD data (data different from training) 
- The gap between ID and OOD performance (ID - OOD)

Results are reported as mean (standard deviation), computed over all result rows (models and runs) for each dataset and splitting strategy.
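To make the "mean (std)" cells concrete, here is a minimal sketch using only the standard library. The scores are hypothetical and do not come from `results.csv`, and the helper name `fmt` is an assumption made for illustration. The sketch highlights one detail of the analysis: the gap is computed row by row as ID minus OOD and only then summarized, so its standard deviation comes from the paired differences rather than from the ID and OOD standard deviations. `statistics.stdev` is the sample standard deviation, which is also the default of pandas' `.std()`.

```python
from statistics import mean, stdev

# Hypothetical ROC-AUC scores for one (dataset, split) pair, one entry per result row.
id_scores = [0.88, 0.86, 0.90, 0.88]
ood_scores = [0.84, 0.83, 0.85, 0.84]

# Compute the gap per row first (ID - OOD), then summarize it.
gaps = [i - o for i, o in zip(id_scores, ood_scores)]


def fmt(values):
    """Format values as 'mean (std)' with two decimals, using the sample std."""
    return f"{mean(values):.2f} ({stdev(values):.2f})"


print("Test (ID): ", fmt(id_scores))
print("Test (OOD):", fmt(ood_scores))
print("Gap:       ", fmt(gaps))
```

With these made-up numbers the gap cell reads `0.04 (0.01)`: the paired differences vary less than the ID scores do on their own.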

In [3]:
# Specify the metric we want to analyze (ROC-AUC score)
metric = "roc_auc"

# Initialize DataFrames to store means and standard deviations
# For in-distribution (ID) test set performance
mean_df_id = pd.DataFrame(index=SPLIT_TYPES, columns=DATASET_NAMES)
std_df_id = pd.DataFrame(index=SPLIT_TYPES, columns=DATASET_NAMES)

# For out-of-distribution (OOD) test set performance
mean_df_ood = pd.DataFrame(index=SPLIT_TYPES, columns=DATASET_NAMES)
std_df_ood = pd.DataFrame(index=SPLIT_TYPES, columns=DATASET_NAMES)

# For the performance gap between ID and OOD
diff_mean = pd.DataFrame(index=SPLIT_TYPES, columns=DATASET_NAMES)
diff_std = pd.DataFrame(index=SPLIT_TYPES, columns=DATASET_NAMES)

# Calculate statistics for each dataset and splitting strategy combination
for dataset in DATASET_NAMES:
    for splits in SPLIT_TYPES:
        # Filter results for current dataset and split type
        df = results[(results["dataset"] == dataset) & (results["split"] == splits)]

        # Calculate mean and std dev of ID test performance
        mean_df_id.loc[splits, dataset] = df[f"ID_test_{metric}"].mean()
        std_df_id.loc[splits, dataset] = df[f"ID_test_{metric}"].std()

        # Calculate mean and std dev of OOD test performance
        mean_df_ood.loc[splits, dataset] = df[f"OOD_test_{metric}"].mean()
        std_df_ood.loc[splits, dataset] = df[f"OOD_test_{metric}"].std()

        # Calculate mean and std dev of performance gap (ID - OOD)
        diff_mean.loc[splits, dataset] = (df[f"ID_test_{metric}"] - df[f"OOD_test_{metric}"]).mean()
        diff_std.loc[splits, dataset] = (df[f"ID_test_{metric}"] - df[f"OOD_test_{metric}"]).std()

In [4]:
# Format mean and standard deviation values for ID test set results
# Convert numeric values to strings with 2 decimal places
mean_df_id = mean_df_id.map(lambda x: f"{x:.2f}" if isinstance(x, (int, float)) else x)
std_df_id = std_df_id.map(lambda x: f"{x:.2f}" if isinstance(x, (int, float)) else x)
# Combine mean and std into single string with format "mean (std)"
df_id = mean_df_id + " (" + std_df_id + ")"

# Format mean and standard deviation values for OOD test set results
mean_df_ood = mean_df_ood.map(lambda x: f"{x:.2f}" if isinstance(x, (int, float)) else x)
std_df_ood = std_df_ood.map(lambda x: f"{x:.2f}" if isinstance(x, (int, float)) else x)
# Combine mean and std into single string with format "mean (std)"
df_ood = mean_df_ood + " (" + std_df_ood + ")"

# Format mean and standard deviation values for performance gap (ID - OOD)
diff_mean = diff_mean.map(lambda x: f"{x:.2f}" if isinstance(x, (int, float)) else x)
diff_std = diff_std.map(lambda x: f"{x:.2f}" if isinstance(x, (int, float)) else x)
# Combine mean and std into single string with format "mean (std)"
df_diff = diff_mean + " (" + diff_std + ")"

# Combine all dataframes into one, with hierarchical index
# Keys indicate whether values are for ID test set, OOD test set, or performance gap
combined_df = pd.concat([df_id, df_ood, df_diff], keys=["Test (ID)", "Test (OOD)", "Gap"]).swaplevel(0, 1).sort_index()

# Define custom orders for the hierarchical index levels
split_order = SPLIT_TYPES  # Order for different data splitting strategies
performance_order = ["Test (ID)", "Test (OOD)", "Gap"]  # Order for performance metrics

# Create new index with desired ordering and rename levels
idx = pd.MultiIndex.from_product([split_order, performance_order], names=["Domain", "Performance"])
# Reorder the dataframe using the new index
combined_df = combined_df.reindex(idx)
combined_df

Domain,Performance,CYP1A2,CYP2C9,CYP2C19,CYP2D6,CYP3A4,HIV,AMES,HERG
random,Test (ID),0.77 (0.01),0.78 (0.01),0.85 (0.01),0.85 (0.01),0.76 (0.01),0.80 (0.02),0.67 (0.02),0.88 (0.02)
random,Test (OOD),0.77 (0.01),0.79 (0.01),0.86 (0.01),0.86 (0.01),0.76 (0.02),0.80 (0.02),0.67 (0.02),0.88 (0.02)
random,Gap,-0.00 (0.02),-0.00 (0.01),-0.00 (0.01),-0.00 (0.01),0.00 (0.01),-0.00 (0.03),0.00 (0.02),0.00 (0.01)
scaffold,Test (ID),0.78 (0.01),0.78 (0.01),0.85 (0.01),0.85 (0.02),0.76 (0.02),0.79 (0.03),0.67 (0.02),0.88 (0.02)
scaffold,Test (OOD),0.76 (0.02),0.77 (0.02),0.85 (0.01),0.85 (0.02),0.75 (0.02),0.77 (0.03),0.64 (0.02),0.84 (0.02)
scaffold,Gap,0.02 (0.02),0.01 (0.02),0.01 (0.02),0.01 (0.02),0.01 (0.03),0.02 (0.04),0.03 (0.02),0.04 (0.01)
scaffold_generic,Test (ID),0.77 (0.01),0.78 (0.01),0.85 (0.01),0.85 (0.01),0.76 (0.01),0.79 (0.03),0.67 (0.02),0.89 (0.02)
scaffold_generic,Test (OOD),0.76 (0.02),0.77 (0.01),0.84 (0.01),0.85 (0.02),0.75 (0.02),0.77 (0.04),0.64 (0.02),0.82 (0.02)
scaffold_generic,Gap,0.01 (0.02),0.01 (0.01),0.01 (0.01),0.01 (0.02),0.01 (0.02),0.03 (0.04),0.03 (0.02),0.07 (0.02)
molecular_weight,Test (ID),0.78 (0.01),0.76 (0.01),0.86 (0.01),0.86 (0.01),0.76 (0.01),0.73 (0.03),0.66 (0.01),0.88 (0.02)


In [94]:
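# Write the table to a LaTeX file. to_latex returns None when buf is given,
# and the cells are already formatted strings, so float_format is not needed.
combined_df.to_latex(buf="assets/Model_comparison.tex", escape=False, index=True)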

# Performance GAP between ID and OOD (ML and GNN Models separately)

In [None]:
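# Repeat the ID/OOD analysis separately for classical ML and GNN models
metric = "roc_auc"
# One mean table and one std table per (test set, model type) combination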
mean_df_id_ML = pd.DataFrame(index=SPLIT_TYPES, columns=DATASET_NAMES)
std_df_id_ML = pd.DataFrame(index=SPLIT_TYPES, columns=DATASET_NAMES)
mean_df_id_GNN = pd.DataFrame(index=SPLIT_TYPES, columns=DATASET_NAMES)
std_df_id_GNN = pd.DataFrame(index=SPLIT_TYPES, columns=DATASET_NAMES)
mean_df_ood_ML = pd.DataFrame(index=SPLIT_TYPES, columns=DATASET_NAMES)
std_df_ood_ML = pd.DataFrame(index=SPLIT_TYPES, columns=DATASET_NAMES)
mean_df_ood_GNN = pd.DataFrame(index=SPLIT_TYPES, columns=DATASET_NAMES)
std_df_ood_GNN = pd.DataFrame(index=SPLIT_TYPES, columns=DATASET_NAMES)
diff_mean_ML = pd.DataFrame(index=SPLIT_TYPES, columns=DATASET_NAMES)
diff_std_ML = pd.DataFrame(index=SPLIT_TYPES, columns=DATASET_NAMES)
diff_mean_GNN = pd.DataFrame(index=SPLIT_TYPES, columns=DATASET_NAMES)
diff_std_GNN = pd.DataFrame(index=SPLIT_TYPES, columns=DATASET_NAMES)

# Fill each table from the rows matching the dataset, split, and model type
for dataset in DATASET_NAMES:
    for splits in SPLIT_TYPES:
        for model_type in ["Classical_ML", "GNN"]:
            df = results[
                (results["dataset"] == dataset) & (results["split"] == splits) & (results["model_type"] == model_type)
            ]
            if model_type == "Classical_ML":
                mean_df_id_ML.loc[splits, dataset] = df[f"ID_test_{metric}"].mean()
                std_df_id_ML.loc[splits, dataset] = df[f"ID_test_{metric}"].std()
                mean_df_ood_ML.loc[splits, dataset] = df[f"OOD_test_{metric}"].mean()
                std_df_ood_ML.loc[splits, dataset] = df[f"OOD_test_{metric}"].std()
                diff_mean_ML.loc[splits, dataset] = (df[f"ID_test_{metric}"] - df[f"OOD_test_{metric}"]).mean()
                diff_std_ML.loc[splits, dataset] = (df[f"ID_test_{metric}"] - df[f"OOD_test_{metric}"]).std()
            else:
                mean_df_id_GNN.loc[splits, dataset] = df[f"ID_test_{metric}"].mean()
                std_df_id_GNN.loc[splits, dataset] = df[f"ID_test_{metric}"].std()
                mean_df_ood_GNN.loc[splits, dataset] = df[f"OOD_test_{metric}"].mean()
                std_df_ood_GNN.loc[splits, dataset] = df[f"OOD_test_{metric}"].std()
                diff_mean_GNN.loc[splits, dataset] = (df[f"ID_test_{metric}"] - df[f"OOD_test_{metric}"]).mean()
                diff_std_GNN.loc[splits, dataset] = (df[f"ID_test_{metric}"] - df[f"OOD_test_{metric}"]).std()

In [102]:
# Format the classical ML statistics as "mean (std)" strings
mean_df_id_ML = mean_df_id_ML.map(lambda x: f"{x:.2f}" if isinstance(x, (int, float)) else x)
std_df_id_ML = std_df_id_ML.map(lambda x: f"{x:.2f}" if isinstance(x, (int, float)) else x)
df_id_ML = mean_df_id_ML + " (" + std_df_id_ML + ")"
mean_df_ood_ML = mean_df_ood_ML.map(lambda x: f"{x:.2f}" if isinstance(x, (int, float)) else x)
std_df_ood_ML = std_df_ood_ML.map(lambda x: f"{x:.2f}" if isinstance(x, (int, float)) else x)
df_ood_ML = mean_df_ood_ML + " (" + std_df_ood_ML + ")"
diff_mean_ML = diff_mean_ML.map(lambda x: f"{x:.2f}" if isinstance(x, (int, float)) else x)
diff_std_ML = diff_std_ML.map(lambda x: f"{x:.2f}" if isinstance(x, (int, float)) else x)
df_diff_ML = diff_mean_ML + " (" + diff_std_ML + ")"


# Format the GNN statistics as "mean (std)" strings
mean_df_id_GNN = mean_df_id_GNN.map(lambda x: f"{x:.2f}" if isinstance(x, (int, float)) else x)
std_df_id_GNN = std_df_id_GNN.map(lambda x: f"{x:.2f}" if isinstance(x, (int, float)) else x)
df_id_GNN = mean_df_id_GNN + " (" + std_df_id_GNN + ")"
mean_df_ood_GNN = mean_df_ood_GNN.map(lambda x: f"{x:.2f}" if isinstance(x, (int, float)) else x)
std_df_ood_GNN = std_df_ood_GNN.map(lambda x: f"{x:.2f}" if isinstance(x, (int, float)) else x)
df_ood_GNN = mean_df_ood_GNN + " (" + std_df_ood_GNN + ")"
diff_mean_GNN = diff_mean_GNN.map(lambda x: f"{x:.2f}" if isinstance(x, (int, float)) else x)
diff_std_GNN = diff_std_GNN.map(lambda x: f"{x:.2f}" if isinstance(x, (int, float)) else x)
df_diff_GNN = diff_mean_GNN + " (" + diff_std_GNN + ")"

# Stack the ID, OOD, and gap tables for each model type
combined_df_ml = (
    pd.concat([df_id_ML, df_ood_ML, df_diff_ML], keys=["Test (ID)", "Test (OOD)", "Gap"]).swaplevel(0, 1).sort_index()
)
combined_df_gnn = (
    pd.concat([df_id_GNN, df_ood_GNN, df_diff_GNN], keys=["Test (ID)", "Test (OOD)", "Gap"])
    .swaplevel(0, 1)
    .sort_index()
)

combined_df = pd.concat([combined_df_ml, combined_df_gnn], keys=["Classical_ML", "GNN"]).swaplevel(0, 1).sort_index()
# Define custom orders for each level
split_order = SPLIT_TYPES  # custom order for splits
model_order = ["Classical_ML", "GNN"]  # custom order for models
performance_order = ["Test (ID)", "Test (OOD)", "Gap"]  # custom order

idx = pd.MultiIndex.from_product(
    [split_order, model_order, performance_order], names=["Domain", "Model Type", "Performance"]
)

combined_df = combined_df.reindex(idx)
combined_df

Domain,Model Type,Performance,CYP1A2,CYP2C9,CYP2C19,CYP2D6,CYP3A4,HIV,AMES,HERG
random,Classical_ML,Test (ID),0.69 (0.01),0.72 (0.01),0.77 (0.01),0.79 (0.01),0.71 (0.01),0.83 (0.01),0.63 (0.02),0.82 (0.01)
random,Classical_ML,Test (OOD),0.69 (0.01),0.73 (0.01),0.78 (0.01),0.79 (0.01),0.71 (0.01),0.84 (0.01),0.62 (0.01),0.82 (0.01)
random,Classical_ML,Gap,0.00 (0.02),-0.00 (0.01),-0.01 (0.01),0.00 (0.02),-0.00 (0.01),-0.01 (0.02),0.00 (0.02),-0.00 (0.01)
random,GNN,Test (ID),0.69 (0.01),0.72 (0.02),0.78 (0.01),0.79 (0.02),0.69 (0.02),0.79 (0.02),0.64 (0.02),0.79 (0.03)
random,GNN,Test (OOD),0.69 (0.01),0.72 (0.01),0.78 (0.01),0.79 (0.02),0.69 (0.02),0.80 (0.02),0.63 (0.02),0.79 (0.03)
random,GNN,Gap,-0.00 (0.01),-0.00 (0.01),-0.00 (0.02),-0.00 (0.01),0.00 (0.01),-0.00 (0.02),0.00 (0.02),0.00 (0.01)
scaffold,Classical_ML,Test (ID),0.69 (0.01),0.73 (0.01),0.77 (0.01),0.79 (0.01),0.71 (0.01),0.83 (0.01),0.63 (0.02),0.82 (0.01)
scaffold,Classical_ML,Test (OOD),0.68 (0.02),0.71 (0.01),0.77 (0.01),0.78 (0.01),0.70 (0.02),0.81 (0.02),0.61 (0.03),0.78 (0.01)
scaffold,Classical_ML,Gap,0.01 (0.02),0.02 (0.02),0.01 (0.02),0.01 (0.02),0.01 (0.02),0.02 (0.02),0.02 (0.02),0.04 (0.01)
scaffold,GNN,Test (ID),0.69 (0.01),0.72 (0.01),0.78 (0.01),0.79 (0.02),0.68 (0.02),0.79 (0.03),0.64 (0.02),0.79 (0.02)


In [103]:
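# Write the table to a LaTeX file. to_latex returns None when buf is given,
# and the cells are already formatted strings, so float_format is not needed.
combined_df.to_latex(buf="assets/ML_GNN_comparison.tex", escape=False, index=True)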

# Correlation between Splitters (Ranking Splitters)

In [34]:
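# Load the pre-computed nearest distances and take the median per splitting strategy
dist_df = pd.read_csv(os.path.join(DATASET_PATH, "TDC", "nearest_distances.csv"))
jaccard_df = dist_df.groupby(["split"])["tanimoto"].median().reset_index()  # median Tanimoto value per split
tmd_df = dist_df.groupby(["split"])["tmd"].median().reset_index()  # median TMD per split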

metric = "roc_auc"
metric_mapping = {"accuracy": "Accuracy", "roc_auc": "ROC-AUC", "pr_auc": "PR-AUC"}

diff = results[f"ID_test_{metric}"] - results[f"OOD_test_{metric}"]
results["diff"] = diff

# groupby based on split and model_type
grouped = results.groupby(["split", "model_type"])["diff"].median().reset_index()
grouped_ml = grouped[grouped["model_type"] == "Classical_ML"]
grouped_gnn = grouped[grouped["model_type"] == "GNN"]

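# Split names in the sorted order produced by groupby (shared by all four conditions)
categories = jaccard_df["split"].tolist()

condition1 = jaccard_df["tanimoto"].tolist()  # median Tanimoto value per split
condition2 = tmd_df["tmd"].tolist()  # median TMD per split
condition3 = grouped_ml["diff"].tolist()  # median ID - OOD gap, classical ML
condition4 = grouped_gnn["diff"].tolist()  # median ID - OOD gap, GNN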

# For every unordered pair of conditions, compute the Spearman correlation and Kendall tau
all_conditions = [condition1, condition2, condition3, condition4]
all_pairs = [(i, j) for i in range(len(all_conditions)) for j in range(i + 1, len(all_conditions))]

corr = {}
for i, j in all_pairs:
    c1, c2 = all_conditions[i], all_conditions[j]
    r = compare_rankings(c1, c2, categories)
    spearman, kendall = r["spearman_correlation"], r["kendall_tau"]
    print(f"Pairwise comparison between conditions {i} and {j}:")
    print(f"Pairwise comparison between conditions {c1} and {c2}:")
    print(f"Spearman correlation: {spearman:.3f}")
    print(f"Kendall tau: {kendall:.3f}")
    print("\n")

Pairwise comparison between conditions 0 and 1:
Pairwise comparison between conditions [0.6714285714285714, 0.6794871794871795, 0.631578947368421, 0.6307692307692307, 0.6825396825396826, 0.5774647887323944, 0.6103896103896104, 0.6142857142857143] and [163.63, 176.13, 175.37, 237.65, 152.27, 120.56, 133.94, 138.21]:
Spearman correlation: 0.595
Kendall tau: 0.429


Pairwise comparison between conditions 0 and 2:
Pairwise comparison between conditions [0.6714285714285714, 0.6794871794871795, 0.631578947368421, 0.6307692307692307, 0.6825396825396826, 0.5774647887323944, 0.6103896103896104, 0.6142857142857143] and [0.07947796702675464, 0.06905058792194707, 0.05643878315723588, 0.03969601668821565, 0.08337859035966705, -0.0006075039333802157, 0.022156833333412784, 0.018824789490587412]:
Spearman correlation: 0.952
Kendall tau: 0.857


Pairwise comparison between conditions 0 and 3:
Pairwise comparison between conditions [0.6714285714285714, 0.6794871794871795, 0.631578947368421, 0.6307692307

In [27]:
len(all_pairs)

6
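The correlation cell relies on `compare_rankings` from `alinemol.utils`, whose implementation is not shown in this notebook. As a rough cross-check, and assuming it reports the standard Spearman rank correlation and Kendall tau, the sketch below recomputes both coefficients in plain Python for the first printed pair (median Tanimoto values vs. median TMD). The helper names `ranks`, `spearman`, and `kendall` are illustrative, not part of the library, and the closed-form Spearman formula used here is only valid when there are no tied values.

```python
from itertools import combinations


def ranks(values):
    """Return 1-based ascending ranks; assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def spearman(x, y):
    """Spearman correlation via 1 - 6 * sum(d^2) / (n * (n^2 - 1)), no ties."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n**2 - 1))


def kendall(x, y):
    """Kendall tau: (concordant - discordant) / number of pairs, no ties."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (len(x) * (len(x) - 1) // 2)


# Values copied from the printed output for conditions 0 and 1.
tanimoto = [0.6714285714285714, 0.6794871794871795, 0.631578947368421, 0.6307692307692307,
            0.6825396825396826, 0.5774647887323944, 0.6103896103896104, 0.6142857142857143]
tmd = [163.63, 176.13, 175.37, 237.65, 152.27, 120.56, 133.94, 138.21]

print(f"Spearman correlation: {spearman(tanimoto, tmd):.3f}")
print(f"Kendall tau: {kendall(tanimoto, tmd):.3f}")
```

For this pair the sketch gives 0.595 and 0.429, matching the values printed for conditions 0 and 1.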

# Misc

In [14]:
# Define custom orders for each level
split_order = SPLIT_TYPES  # custom order for splits
stat_order = ["Dataset Size", "Activity ratio"]  # custom order for statistics
set_order = ["Train", "Val (ID)", "Test (ID)", "Test (OOD)"]  # custom order for data subsets

idx = pd.MultiIndex.from_product([split_order, stat_order, set_order], names=["Domain", "Statistics", "Set"])

In [16]:
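# Placeholder random values (not real statistics), sized to match the index and columns
data = np.random.rand(len(idx), len(DATASET_NAMES))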
df = pd.DataFrame(data, index=idx, columns=DATASET_NAMES)

In [17]:
df

Domain,Statistics,Set,CYP1A2,CYP2C9,CYP2C19,CYP2D6,CYP3A4,HIV,AMES,HERG
random,Dataset Size,Train,0.454838,0.190829,0.523167,0.286667,0.998031,0.158023,0.782483,0.484382
random,Dataset Size,Val (ID),0.845606,0.745452,0.477773,0.894201,0.471229,0.154896,0.246393,0.211783
random,Dataset Size,Test (ID),0.949295,0.190642,0.474821,0.498215,0.212113,0.287133,0.419908,0.938806
random,Dataset Size,Test (OOD),0.610071,0.028546,0.226658,0.202620,0.348207,0.565516,0.672860,0.472450
random,Activity ratio,Train,0.687896,0.730727,0.682380,0.535350,0.426055,0.411397,0.334243,0.656494
...,...,...,...,...,...,...,...,...,...,...
max_dissimilarity,Dataset Size,Test (OOD),0.810158,0.561266,0.516745,0.905754,0.125019,0.577599,0.431944,0.412205
max_dissimilarity,Activity ratio,Train,0.663777,0.439199,0.187821,0.483549,0.844098,0.286383,0.173870,0.159953
max_dissimilarity,Activity ratio,Val (ID),0.966530,0.037359,0.171360,0.437220,0.418223,0.615298,0.502105,0.618655
max_dissimilarity,Activity ratio,Test (ID),0.204601,0.163413,0.638684,0.058742,0.351547,0.592279,0.813251,0.256821
