### LazyMIL: Automated MIL Benchmarking and Smart Consensus Modeling

The **LazyMIL** module provides a high-level interface for applying **Multiple Instance Learning (MIL)** to real-world or benchmark datasets.  
It combines **descriptor calculation**, **model training**, and **evaluation** into a single workflow, which suits competitions, benchmarks, and quick exploratory studies.

LazyMIL automatically:
- Handles **descriptor calculation** for molecules or fragments.  
- Trains **multiple MIL estimators** in parallel.  
- Collects predictions and metrics for model comparison.  
- Optionally integrates **smart consensus optimization** using a genetic algorithm.

For consensus modeling, LazyMIL leverages the **QSARcons** package — a flexible framework for discovering optimal model ensembles.

> 🧩 **Install QSARcons before running this tutorial:**
> ```bash
> pip install qsarcons
> ```

**In summary:**  
LazyMIL simplifies the process of testing, comparing, and combining MIL models, making it a practical tool for QSAR researchers and ML competitions alike.

In [1]:
import polaris
import pandas as pd

from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

from qsarmil.lazy import LazyMIL
from qsarcons.consensus import RandomSearchRegressor, SystematicSearchRegressor, GeneticSearchRegressor

### 1. Loading a Polaris Benchmark Dataset

In this example, we use one of the **Polaris** benchmark datasets — a curated collection of molecular property prediction tasks.  
Specifically, we load **`adme-fang-solu-1`**, a dataset focused on **aqueous solubility (LOG_SOLUBILITY)** prediction.

The benchmark provides pre-defined training and testing splits to ensure reproducibility and fair comparison across models.

In [2]:
# Load the benchmark from the Hub
benchmark = polaris.load_benchmark("polaris/adme-fang-solu-1")

# Get the train and test data-loaders
data_train, data_test = benchmark.get_train_test_split()
data_train, data_test = data_train.as_dataframe(), data_test.as_dataframe()

smi_train, prop_train = data_train["smiles"].to_list(), data_train["LOG_SOLUBILITY"].to_list()

data_train, data_val = train_test_split(data_train, test_size=0.2, random_state=42)
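
The validation split is held out so that the consensus search in section 3 can be run on predictions for molecules the base models were not trained on. As a rough illustration of what an 80/20 hold-out does, here is a minimal standard-library sketch. It mimics the idea of `train_test_split`, not scikit-learn's exact shuffling; `holdout_split` is a hypothetical helper, and the row count of 1578 is inferred from the 1262 + 316 molecules reported in the run log of the next section.

```python
import math
import random


def holdout_split(rows, test_size=0.2, seed=42):
    """Shuffle rows and hold out a fraction of them (illustrative helper)."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    # The held-out size is rounded up, as scikit-learn does for a float test_size
    n_val = math.ceil(test_size * len(shuffled))
    return shuffled[n_val:], shuffled[:n_val]


rows = list(range(1578))  # stand-in for the training molecules
train_rows, val_rows = holdout_split(rows)
print(len(train_rows), len(val_rows))  # 1262 316
```

Fixing the seed keeps the split reproducible between runs, which matters here because the validation predictions are later reused for model selection.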

### 2. Build multiple MIL models with conformers as instances

In [3]:
# The Polaris test split carries no labels, so add a placeholder target column of zeros
data_test["LogS"] = [0 for i in data_test.index]

lazy_mil = LazyMIL(task="regression", hopt=False, output_folder="logs_bench", n_cpu=20, verbose=True)
lazy_mil.run(data_train, data_val, data_test)

Generating conformers: 100%|████████████████████████████████████████████████████████| 1262/1262 [00:47<00:00, 26.52it/s]
Generating conformers: 100%|██████████████████████████████████████████████████████████| 316/316 [00:13<00:00, 23.16it/s]
Generating conformers: 100%|██████████████████████████████████████████████████████████| 400/400 [00:16<00:00, 24.81it/s]
Calculating descriptors: 100%|█████████████████████████████████████████████████████| 1262/1262 [00:03<00:00, 325.96it/s]
Calculating descriptors: 100%|███████████████████████████████████████████████████████| 316/316 [00:01<00:00, 292.11it/s]
Calculating descriptors: 100%|███████████████████████████████████████████████████████| 400/400 [00:01<00:00, 360.03it/s]
Calculating descriptors: 100%|█████████████████████████████████████████████████████| 1262/1262 [00:05<00:00, 218.20it/s]
Calculating descriptors: 100%|███████████████████████████████████████████████████████| 316/316 [00:01<00:00, 204.15it/s]
Calculating descriptors: 100%|██

72 / 72 — MolFeatPmapper|DynamicPoolingNetworkRegressor


<qsarmil.lazy.LazyMIL at 0x7f8d6db73910>
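
In this setup each molecule is a bag and its conformers are the instances; only the bag carries a label. The estimator names in the results table of section 4 (`MeanBagWrapper...`, `MeanInstanceWrapper...`) suggest two simple ways of getting one prediction per bag. The sketch below illustrates that idea on toy numbers. The reading of the names is an assumption, `toy_model` is a hypothetical stand-in for a trained regressor, and none of this is qsarmil code.

```python
from statistics import mean


def toy_model(x):
    """Hypothetical nonlinear predictor standing in for a trained network."""
    return x[0] ** 2 + x[1]


def mean_bag_predict(bag, model):
    """Average the instance descriptors first, then predict once per bag."""
    pooled = [mean(column) for column in zip(*bag)]
    return model(pooled)


def mean_instance_predict(bag, model):
    """Predict every instance separately, then average the predictions."""
    return mean(model(instance) for instance in bag)


# One molecule (bag) with two conformers (instances), two descriptors each
bag = [[1.0, 0.0], [3.0, 2.0]]

print(mean_bag_predict(bag, toy_model))       # 5.0
print(mean_instance_predict(bag, toy_model))  # 6.0
```

The two strategies agree for a linear model and diverge for a nonlinear one, which is one plausible reason the same descriptors score differently under different wrappers.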

### 3. Build model consensus

In [4]:
metric = "auto"
cons_size = "auto"

In [5]:
cons_methods = [
    ("Best", SystematicSearchRegressor(cons_size=1, metric=metric)),         
    ("Random", RandomSearchRegressor(cons_size=cons_size, n_iter=1000, metric=metric)),       
    ("Systematic", SystematicSearchRegressor(cons_size=cons_size, metric=metric)),
    ("Genetic", GeneticSearchRegressor(cons_size=cons_size, n_iter=50, pop_size=50, mut_prob=0.2, metric=metric))
]
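
All four entries search for a subset of base models whose combined prediction scores best on the validation set. To make that concrete, here is a minimal sketch of a random subset search on made-up predictions. It assumes the consensus prediction is the plain mean of the selected models and that R2 is the metric; `random_subset_search` and the model names are hypothetical, and this is not the QSARcons implementation.

```python
import random
from statistics import mean


def r2(y_true, y_pred):
    """Coefficient of determination."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def consensus_predict(preds, members):
    """Average the predictions of the selected models, row by row."""
    n_rows = len(preds[members[0]])
    return [mean(preds[m][i] for m in members) for i in range(n_rows)]


def random_subset_search(preds, y_true, cons_size, n_iter, seed=0):
    """Try n_iter random subsets of size cons_size and keep the best one."""
    rng = random.Random(seed)
    names = sorted(preds)
    best, best_score = None, float("-inf")
    for _ in range(n_iter):
        members = rng.sample(names, cons_size)
        score = r2(y_true, consensus_predict(preds, members))
        if score > best_score:
            best, best_score = sorted(members), score
    return best, best_score


y_true = [1.0, 2.0, 3.0, 4.0]
preds = {
    "model_a": [1.5, 2.5, 3.5, 4.5],  # biased high
    "model_b": [0.5, 1.5, 2.5, 3.5],  # biased low
    "model_c": [4.0, 3.0, 2.0, 1.0],  # anti-correlated
}

best, score = random_subset_search(preds, y_true, cons_size=2, n_iter=50)
print(best, score)
print(r2(y_true, preds["model_a"]))  # best single model, for comparison
```

In this toy case the errors of `model_a` and `model_b` cancel, so their average should beat either model alone. That is the effect a consensus search tries to exploit.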

In [6]:
# load model predictions
df_val = pd.read_csv("logs_bench/val.csv")
df_test = pd.read_csv("logs_bench/test.csv")

# skip first two columns (smiles and true property value)
x_val, true_val = df_val.iloc[:, 2:], df_val.iloc[:, 1]
x_test = df_test.iloc[:, 2:]

In [7]:
for name, cons_searcher in cons_methods:

    # run search
    best_cons = cons_searcher.run(x_val, true_val)
    
    # make val and test predictions
    pred_val = cons_searcher._consensus_predict(x_val[best_cons])
    pred_test = cons_searcher._consensus_predict(x_test[best_cons])
    
    # store the consensus predictions as new columns
    df_val[name] = pred_val
    df_test[name] = pred_test

### 4. Summarize results

In [8]:
res = pd.DataFrame()
for model in df_val.columns[2:]:
    res.loc[model, "R2"] = r2_score(df_val["Y_TRUE"], df_val[model])

In [9]:
res.sort_values(by="R2", ascending=False)

Model,R2
Genetic,0.396709
Random,0.384397
Systematic,0.372215
RDKitGETAWAY|DynamicPoolingNetworkRegressor,0.311626
Best,0.311626
...,...
MolFeatUSRD|DynamicPoolingNetworkRegressor,0.064956
RDKitWHIM|MeanInstanceWrapperMLPNetworkRegressor,0.010812
MolFeatUSRD|MeanInstanceWrapperMLPNetworkRegressor,-0.015897
MolFeatPmapper|MeanBagWrapperMLPNetworkRegressor,-0.027218
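
For reading the table, recall that `r2_score` computes the coefficient of determination, where $y_i$ are the true values, $\hat{y}_i$ the predictions, and $\bar{y}$ the mean of the true values:

$$
R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}
$$

A model that always predicts $\bar{y}$ scores 0, so the negative values at the bottom of the table mark models that do worse on the validation set than that constant baseline.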


In [10]:
y_pred = df_test["Genetic"].to_list()
results = benchmark.evaluate(y_pred)
results

test_set,target_label,benchmark_artifact_id
test,LOG_SOLUBILITY,polaris/adme-fang-solu-1

metric,score
pearsonr,0.5320304922182691
mean_squared_error,0.3906108625360475
r2,0.2795532068179439
mean_absolute_error,0.4336216957394166
spearmanr,0.413671670214811
explained_var,0.2827843471060763
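
The scores above are standard regression metrics. As a reminder of what three of them measure, here is a minimal pure-Python sketch on made-up numbers. The helper functions are illustrative and are not the implementation Polaris uses.

```python
import math
from statistics import mean


def mean_squared_error(y_true, y_pred):
    """Average squared difference between true and predicted values."""
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted values."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def pearsonr(y_true, y_pred):
    """Pearson correlation: covariance scaled by both standard deviations."""
    mt, mp = mean(y_true), mean(y_pred)
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mt) ** 2 for t in y_true)
    var_p = sum((p - mp) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.5, 2.5, 4.5]

print(mean_squared_error(y_true, y_pred))   # 0.25
print(mean_absolute_error(y_true, y_pred))  # 0.5
print(pearsonr(y_true, y_pred))
```

The error metrics penalize offsets from the true values, while the correlation only measures how well the predictions track them, so the two can disagree for a systematically biased model.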
