In [1]:
!wandb login

[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.
[34m[1mwandb[0m: Currently logged in as: [33ml-benhammadi[0m ([33ml-benhammadi-esi[0m) to [32mhttps://api.wandb.ai[0m. Use [1m`wandb login --relogin`[0m to force relogin


## Phase 1: Data Ingestion & Preparation

In this phase, we prepare the **Tox21 toxicity dataset** and the **ZINC unlabeled molecular dataset** for Semi-Supervised Learning (SSL).

### What We Do:
1. **Load Raw Data**: Import Tox21 labeled toxicity data and ZINC unlabeled molecular structures
2. **Canonicalize SMILES**: Standardize molecular representations using RDKit
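3. **Feature Engineering**: Compute comprehensive RDKit molecular descriptors:
   - Basic properties, including the Lipinski Rule-of-Five quantities (MolWt, LogP, H-bond donors/acceptors)
   - Polarity, ring, electronic, and surface-area descriptors (TPSA, ring counts, partial charges, LabuteASA)
   - Topological descriptors (BertzCT, Chi, Kappa indices) and drug-likeness (QED)
   - Functional-group counts (RDKit `fr_*` fragment descriptors)
4. **Handle Class Imbalance**: Downsample the majority (non-toxic) class to the size of the toxic class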
5. **Version with W&B**: Create artifacts for data reproducibility
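The class-imbalance step is random undersampling: every toxic compound is kept and an equally sized random subset of non-toxic compounds is drawn without replacement, which is what `sklearn.utils.resample(..., replace=False, n_samples=len(toxic_df))` does in the code below. A stdlib-only sketch of the same idea, using a hypothetical `downsample_majority` helper and toy rows:

```python
import random

def downsample_majority(rows, label_key="toxic", seed=42):
    """Randomly undersample the majority class (without replacement)
    so that both classes end up with the same number of rows."""
    positives = [r for r in rows if r[label_key] == 1]
    negatives = [r for r in rows if r[label_key] == 0]
    minority, majority = sorted([positives, negatives], key=len)
    rng = random.Random(seed)
    kept_majority = rng.sample(majority, k=len(minority))  # no replacement
    balanced = minority + kept_majority
    rng.shuffle(balanced)  # analogous to df.sample(frac=1) in the notebook
    return balanced

# Toy data: 5 "toxic" rows and 15 "non-toxic" rows
rows = [{"id": i, "toxic": int(i % 4 == 0)} for i in range(20)]
balanced = downsample_majority(rows)
print(len(balanced), sum(r["toxic"] for r in balanced))
```

Undersampling discards majority-class rows, so the balanced labeled set is smaller than the raw one; the trade-off is a classifier that is not biased toward predicting "non-toxic".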

### Datasets:
- **Tox21**: ~7,800 compounds across 12 toxicity assays
- **ZINC**: Large unlabeled molecular database
- **Target**: Binary classification (toxic vs non-toxic); a compound is labeled toxic if it is active in at least one of the 12 assays
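The binary target is derived in the code below with `raw_df_labeled[tox_columns].max(axis=1)`. Because pandas skips missing values by default, a compound is toxic (1) if it is active in any assay where it was tested, non-toxic (0) if it was inactive in every tested assay, and undefined (later dropped) if it has no results at all. A stdlib sketch of that rule for a single compound, using a hypothetical `aggregate_toxic_label` helper:

```python
import math

def aggregate_toxic_label(assay_values):
    """Collapse per-assay outcomes (1 = active, 0 = inactive, None/NaN = not tested)
    into one binary label, mirroring DataFrame.max(axis=1) with skipna=True."""
    observed = [
        v for v in assay_values
        if v is not None and not (isinstance(v, float) and math.isnan(v))
    ]
    if not observed:
        return None  # no assay results: the row is removed by dropna(subset=['toxic'])
    return int(max(observed))

print(aggregate_toxic_label([0, None, 1, 0]))        # active in one assay
print(aggregate_toxic_label([0, float("nan"), 0]))   # inactive wherever tested
print(aggregate_toxic_label([None, None]))           # never tested
```

One consequence of this rule: a compound screened in only a few assays and inactive in all of them is labeled non-toxic, even though it was never tested in the others.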

In [None]:
import wandb
from rdkit import Chem
from rdkit.Chem import Descriptors, QED, Lipinski, Crippen, MolSurf, rdMolDescriptors
import pandas as pd
import numpy as np
from sklearn.utils import resample

PROJECT="QSAR_MLOPS_TOX21"

run = wandb.init(project=PROJECT, job_type="prepare-data")
raw_df_unlabeled = pd.read_csv('../../data/raw/original_data/zinc_unlabeled.csv')
raw_df_labeled = pd.read_csv('../../data/raw/original_data/tox21.csv')

def canonicalize_smiles(smiles):
    try:
        mol = Chem.MolFromSmiles(smiles.strip())
        if mol is not None:
            return Chem.MolToSmiles(mol, canonical=True)
    except Exception:  # e.g. a missing (NaN) SMILES has no .strip()
        pass
    return None

def compute_comprehensive_features(smiles):
    try:
        mol = Chem.MolFromSmiles(smiles.strip())
        if mol is not None:
            features = {}
            
            # Basic molecular properties
            features['MolWt'] = Descriptors.MolWt(mol)
            features['LogP'] = Descriptors.MolLogP(mol)
            features['NumHDonors'] = Descriptors.NumHDonors(mol)
            features['NumHAcceptors'] = Descriptors.NumHAcceptors(mol)
            features['NumRotatableBonds'] = Descriptors.NumRotatableBonds(mol)
            features['NumAromaticRings'] = Descriptors.NumAromaticRings(mol)
            
            # Heteroatom count and topological polar surface area
            features['NumHeteroatoms'] = Descriptors.NumHeteroatoms(mol)
            features['TPSA'] = Descriptors.TPSA(mol)
            
            # Complexity and shape
            features['NumRings'] = Descriptors.RingCount(mol)
            features['NumAliphaticRings'] = Descriptors.NumAliphaticRings(mol)
            features['NumSaturatedRings'] = Descriptors.NumSaturatedRings(mol)
            features['FractionCsp3'] = Descriptors.FractionCSP3(mol) 
            
            # Electronic properties
            features['NumValenceElectrons'] = Descriptors.NumValenceElectrons(mol)
            
            try:
                features['MaxPartialCharge'] = Descriptors.MaxPartialCharge(mol)
                features['MinPartialCharge'] = Descriptors.MinPartialCharge(mol)
            except Exception:  # fall back to 0 if the partial-charge calculation fails
                features['MaxPartialCharge'] = 0
                features['MinPartialCharge'] = 0
            
            # Molecular surface area
            features['LabuteASA'] = Descriptors.LabuteASA(mol)
            features['PEOE_VSA1'] = Descriptors.PEOE_VSA1(mol)
            features['PEOE_VSA2'] = Descriptors.PEOE_VSA2(mol)
            
            # Drug-likeness scores
            features['QED'] = QED.qed(mol)
            
            # Topological descriptors
            features['BertzCT'] = Descriptors.BertzCT(mol)
            features['Chi0v'] = Descriptors.Chi0v(mol)
            features['Chi1v'] = Descriptors.Chi1v(mol)
            features['Kappa1'] = Descriptors.Kappa1(mol)
            features['Kappa2'] = Descriptors.Kappa2(mol)
            
            # Additional descriptors
            features['MolMR'] = Descriptors.MolMR(mol)
            features['BalabanJ'] = Descriptors.BalabanJ(mol)
            features['HallKierAlpha'] = Descriptors.HallKierAlpha(mol)
            features['NumSaturatedCarbocycles'] = Descriptors.NumSaturatedCarbocycles(mol)
            features['NumAromaticCarbocycles'] = Descriptors.NumAromaticCarbocycles(mol)
            features['NumSaturatedHeterocycles'] = Descriptors.NumSaturatedHeterocycles(mol)
            features['NumAromaticHeterocycles'] = Descriptors.NumAromaticHeterocycles(mol)
            
            # Functional-group (fragment) counts
            features['fr_NH2'] = Descriptors.fr_NH2(mol)
            features['fr_COO'] = Descriptors.fr_COO(mol)
            features['fr_benzene'] = Descriptors.fr_benzene(mol)
            features['fr_furan'] = Descriptors.fr_furan(mol)
            features['fr_halogen'] = Descriptors.fr_halogen(mol)
            
            return pd.Series(features)
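    except Exception as e:
        print(f"Error computing features: {e}")
    # Unparseable SMILES return an empty Series, which becomes an all-NaN row removed by dropna()
    return pd.Series(dtype=float)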

raw_df_labeled['canonical_smiles'] = raw_df_labeled['smiles'].apply(canonicalize_smiles)
raw_df_labeled = raw_df_labeled.dropna(subset=['canonical_smiles'])

tox_columns = ['NR-AR', 'NR-AR-LBD', 'NR-AhR', 'NR-Aromatase', 'NR-ER', 'NR-ER-LBD', 
               'NR-PPAR-gamma', 'SR-ARE', 'SR-ATAD5', 'SR-HSE', 'SR-MMP', 'SR-p53']

raw_df_labeled['toxic'] = raw_df_labeled[tox_columns].max(axis=1)

raw_df_labeled = raw_df_labeled.dropna(subset=['toxic'])

raw_df_labeled['toxic'] = raw_df_labeled['toxic'].astype(int)

raw_df_labeled = raw_df_labeled.drop(columns=tox_columns)

toxic_count = (raw_df_labeled['toxic'] == 1).sum()
non_toxic_count = (raw_df_labeled['toxic'] == 0).sum()


toxic_df = raw_df_labeled[raw_df_labeled['toxic'] == 1]
non_toxic_df = raw_df_labeled[raw_df_labeled['toxic'] == 0]

non_toxic_downsampled = resample(non_toxic_df, 
                                  replace=False,
                                  n_samples=len(toxic_df),
                                  random_state=42)

raw_df_labeled_balanced = pd.concat([toxic_df, non_toxic_downsampled])
raw_df_labeled_balanced = raw_df_labeled_balanced.sample(frac=1, random_state=42).reset_index(drop=True)


labeled_features = raw_df_labeled_balanced['canonical_smiles'].apply(compute_comprehensive_features)
all_labeled_with_features = pd.concat([raw_df_labeled_balanced, labeled_features], axis=1)
all_labeled_with_features = all_labeled_with_features.dropna()

all_labeled_with_features.to_csv('../../data/raw/enhanced_data/tox21/labeled_features.csv', index=False)


[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.
[34m[1mwandb[0m: Currently logged in as: [33ml-benhammadi[0m ([33ml-benhammadi-esi[0m) to [32mhttps://api.wandb.ai[0m. Use [1m`wandb login --relogin`[0m to force relogin


[16:55:49] Explicit valence for atom # 8 Al, 6, is greater than permitted
[16:55:50] Explicit valence for atom # 3 Al, 6, is greater than permitted
[16:55:50] Explicit valence for atom # 4 Al, 6, is greater than permitted
[16:55:50] Explicit valence for atom # 4 Al, 6, is greater than permitted
[16:55:50] Explicit valence for atom # 9 Al, 6, is greater than permitted
[16:55:50] Explicit valence for atom # 5 Al, 6, is greater than permitted
[16:55:50] Explicit valence for atom # 16 Al, 6, is greater than permitted
[16:55:51] Explicit valence for atom # 20 Al, 6, is greater than permitted


In [3]:
raw_df_unlabeled['canonical_smiles'] = raw_df_unlabeled['smiles'].apply(canonicalize_smiles)
raw_df_unlabeled = raw_df_unlabeled.dropna(subset=['canonical_smiles'])

unlabeled_features = raw_df_unlabeled['canonical_smiles'].apply(compute_comprehensive_features)
unlabeled_with_features = pd.concat([raw_df_unlabeled[['smiles', 'canonical_smiles']], unlabeled_features], axis=1)
unlabeled_with_features['toxic'] = np.nan
unlabeled_with_features = unlabeled_with_features.dropna(subset=unlabeled_features.columns.tolist())

unlabeled_with_features.to_csv('../../data/raw/enhanced_data/tox21/unlabeled_features.csv', index=False)

In [4]:
exclude_cols = ['smiles', 'canonical_smiles', 'FDA_APPROVED', 'toxic','mol_id']
all_features = [col for col in all_labeled_with_features.columns if col not in exclude_cols]

X_labeled = all_labeled_with_features[all_features]
y_tox = all_labeled_with_features['toxic']

X_unlabeled = unlabeled_with_features[all_features]

In [5]:
def clip_outliers(df, std_threshold=3):
    """Clip each column to mean ± std_threshold * std (pandas sample std); return the clipped copy and the total number of clipped values."""
    df_clipped = df.copy()
    outlier_count = 0
    
    for col in df.columns:
        mean = df[col].mean()
        std = df[col].std()
        lower_bound = mean - std_threshold * std
        upper_bound = mean + std_threshold * std
        
        # Count outliers
        outliers = ((df[col] < lower_bound) | (df[col] > upper_bound)).sum()
        if outliers > 0:
            outlier_count += outliers
            print(f"  {col:20s}: {outliers} outliers clipped")
            df_clipped[col] = df[col].clip(lower_bound, upper_bound)
    
    return df_clipped, outlier_count

X_labeled_clipped, labeled_outliers = clip_outliers(X_labeled, std_threshold=3)
X_unlabeled_clipped, unlabeled_outliers = clip_outliers(X_unlabeled, std_threshold=3)

  MolWt               : 102 outliers clipped
  LogP                : 80 outliers clipped
  NumHDonors          : 93 outliers clipped
  NumHAcceptors       : 102 outliers clipped
  NumRotatableBonds   : 110 outliers clipped
  NumAromaticRings    : 45 outliers clipped
  NumHeteroatoms      : 75 outliers clipped
  TPSA                : 83 outliers clipped
  NumRings            : 78 outliers clipped
  NumAliphaticRings   : 101 outliers clipped
  NumSaturatedRings   : 104 outliers clipped
  NumValenceElectrons : 99 outliers clipped
  MinPartialCharge    : 14 outliers clipped
  LabuteASA           : 97 outliers clipped
  PEOE_VSA1           : 115 outliers clipped
  PEOE_VSA2           : 70 outliers clipped
  BertzCT             : 76 outliers clipped
  Chi0v               : 103 outliers clipped
  Chi1v               : 101 outliers clipped
  Kappa1              : 100 outliers clipped
  Kappa2              : 1 outliers clipped
  MolMR               : 97 outliers clipped
  BalabanJ            : 
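`clip_outliers` winsorizes each descriptor to the band mean ± 3 standard deviations, with the bounds computed separately for the labeled and the unlabeled sets. The per-column logic can be sketched with the standard library as follows (a hypothetical `clip_column` helper on toy values, using the sample standard deviation that pandas' `Series.std()` returns by default):

```python
import statistics

def clip_column(values, std_threshold=3.0):
    """Clip values to [mean - k*std, mean + k*std] with k = std_threshold.

    Uses the sample standard deviation (ddof=1), like pandas' Series.std().
    Returns (clipped_values, number_of_values_that_were_clipped).
    """
    mean = statistics.mean(values)
    std = statistics.stdev(values)
    lower = mean - std_threshold * std
    upper = mean + std_threshold * std
    n_clipped = sum(1 for v in values if v < lower or v > upper)
    clipped = [min(max(v, lower), upper) for v in values]
    return clipped, n_clipped

# Nine typical values and one extreme value
values = [10] * 9 + [100]
_, n_k3 = clip_column(values, std_threshold=3.0)
clipped_k2, n_k2 = clip_column(values, std_threshold=2.0)
print(n_k3, n_k2, round(max(clipped_k2), 2))
```

The toy case shows a known limitation of this rule: the extreme value inflates the standard deviation it is judged against. In a sample of n values no point can lie more than (n - 1) / sqrt(n) sample standard deviations from the mean (about 2.85 for n = 10), so with `std_threshold=3` nothing is clipped here. With thousands of rows per column this masking effect is much weaker.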

In [6]:
df_labeled_processed = X_labeled_clipped.copy()

df_labeled_processed['toxic'] = y_tox.values

df_unlabeled_processed = X_unlabeled_clipped.copy()
df_unlabeled_processed['toxic'] = np.nan

df_labeled_processed.to_csv('../../data/processed/tox21/labeled_processed.csv', index=False)
df_unlabeled_processed.to_csv('../../data/processed/tox21/unlabeled_processed.csv', index=False)

In [7]:
artifact_tox21_labeled = wandb.Artifact(
    name="tox21-labeled-dataset",
    type="dataset",
    description="Cleaned labeled tox21 data for v1.0",
)
artifact_tox21_labeled.add_file('../../data/processed/tox21/labeled_processed.csv')

run.log_artifact(artifact_tox21_labeled)

<Artifact tox21-labeled-dataset>
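Artifacts make the processed CSVs reproducible because W&B records a checksum for each file added to an artifact; as we understand W&B's behavior, logging an artifact whose contents have not changed reuses the existing version rather than creating a new one. The sketch below is only a local illustration of that content-addressing idea with `hashlib`, using a hypothetical `file_digest` helper and a toy CSV; it does not call the W&B API.

```python
import hashlib
import tempfile
from pathlib import Path

def file_digest(path, chunk_size=1 << 20):
    """MD5 hex digest of a file, read in chunks so large CSVs need not fit in memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "labeled_processed.csv"
    csv_path.write_text("MolWt,toxic\n180.16,0\n")
    first = file_digest(csv_path)
    csv_path.write_text("MolWt,toxic\n180.16,0\n")  # identical rewrite
    same = file_digest(csv_path)
    csv_path.write_text("MolWt,toxic\n180.16,1\n")  # one label changed
    changed = file_digest(csv_path)

print(first == same, first == changed)
```

Identical bytes give an identical digest regardless of when the file was written, while a single changed label produces a different one, which is what lets a content-addressed store tell a genuinely new dataset version from a re-run.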

In [8]:
artifact_zinc_unlabeled = wandb.Artifact(
    name="zinc-unlabeled-dataset",
    type="dataset",
    description="Cleaned unlabeled zinc data for v1.0",
)
artifact_zinc_unlabeled.add_file('../../data/processed/tox21/unlabeled_processed.csv')
run.log_artifact(artifact_zinc_unlabeled)
run.finish()

## Phase 2 & 3: Baseline Model & Hyperparameter Optimization

We establish a **baseline supervised model** using Random Forest, then optimize hyperparameters with W&B Sweeps.

### Approach:
1. **Baseline Training**: Train Random Forest on labeled data only
2. **Bayesian Optimization**: Search the hyperparameter space with a 20-run W&B Bayesian sweep that maximizes `val_f1`
3. **Metrics Tracked**: F1-Score, ROC-AUC, Precision, Recall, Accuracy, all computed on the 30% stratified hold-out split
4. **Model Registry**: Save best baseline model as W&B artifact

### Hyperparameters Tuned:
- `n_estimators`: Number of trees (50, 100, 150, 200)
- `max_depth`: Maximum tree depth (10, 15, 20, 25, 30)
- `min_samples_split`: Minimum samples required to split a node (2, 5, 10)
- `min_samples_leaf`: Minimum samples per leaf (1, 2, 4)
- `max_features`: Features considered at each split ('sqrt', 'log2')

This baseline serves as the benchmark for our Semi-Supervised Learning methods.
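Precision, recall, F1 and accuracy follow from confusion-matrix counts of the hard predictions (the class with the higher predicted probability), while ROC-AUC is computed from the predicted probabilities and equals the probability that a randomly chosen toxic compound is scored above a randomly chosen non-toxic one. A dependency-free sketch with a hypothetical `classification_metrics` helper and toy labels:

```python
def classification_metrics(y_true, y_pred, y_score):
    """Binary metrics from 0/1 labels, hard predictions, and positive-class scores."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    # Mirrors zero_division=0: an undefined ratio is reported as 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(pairs)

    # ROC-AUC = P(score of a random positive > score of a random negative); ties count 1/2
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    roc_auc = wins / (len(pos) * len(neg))

    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "f1": f1, "roc_auc": roc_auc}

m = classification_metrics(
    y_true=[1, 1, 1, 0, 0, 0],
    y_pred=[1, 1, 0, 1, 0, 0],
    y_score=[0.9, 0.8, 0.4, 0.6, 0.3, 0.2],
)
print({k: round(v, 4) for k, v in m.items()})
```

Because F1 depends on the decision threshold and ROC-AUC does not, a sweep that maximizes F1 can favor a different configuration than one that maximizes ROC-AUC.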

In [None]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

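# Experiment run that pulls the versioned datasets (records artifact lineage)
run = wandb.init(project=PROJECT, job_type="experiment")

unlabeled_artifact = run.use_artifact('zinc-unlabeled-dataset:latest')
data_path = unlabeled_artifact.download()
df_unlabeled = pd.read_csv(f"{data_path}/unlabeled_processed.csv")

labeled_artifact = run.use_artifact('tox21-labeled-dataset:latest')
data_path = labeled_artifact.download()
df_labeled = pd.read_csv(f"{data_path}/labeled_processed.csv")

# Close this run before the sweep agent starts its own runs
run.finish()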

[34m[1mwandb[0m: Downloading large artifact 'zinc-unlabeled-dataset:latest', 92.69MB. 1 files...
[34m[1mwandb[0m:   1 of 1 files downloaded.  
Done. 00:00:01.1 (83.4MB/s)
[34m[1mwandb[0m:   1 of 1 files downloaded.  


Exception in thread ChkStopThr:
Traceback (most recent call last):
  File "/home/lokmane/anaconda3/lib/python3.11/threading.py", line 1045, in _bootstrap_inner
    self.run()
  File "/home/lokmane/anaconda3/lib/python3.11/site-packages/ipykernel/ipkernel.py", line 772, in run_closure
    _threading_Thread_run(self)
  File "/home/lokmane/anaconda3/lib/python3.11/threading.py", line 982, in run
    self._target(*self._args, **self._kwargs)
  File "/home/lokmane/anaconda3/lib/python3.11/site-packages/wandb/sdk/wandb_run.py", line 309, in check_stop_status
    self._loop_check_status(
  File "/home/lokmane/anaconda3/lib/python3.11/site-packages/wandb/sdk/wandb_run.py", line 237, in _loop_check_status
    local_handle = request()
                   ^^^^^^^^^
  File "/home/lokmane/anaconda3/lib/python3.11/site-packages/wandb/sdk/interface/interface.py", line 985, in deliver_stop_status
    return self._deliver_stop_status(status)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lok

In [10]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import numpy as np

X = df_labeled.drop('toxic', axis=1)
y = df_labeled['toxic']

# Replace infinite descriptor values with NaN, then impute column medians
# (note: these medians are computed on the full labeled set, before the split)
X = X.replace([np.inf, -np.inf], np.nan)
X = X.fillna(X.median())

print(f"NaN values: {X.isna().sum().sum()}")
print(f"Inf values: {np.isinf(X.values).sum()}")

# Same cleaning for the unlabeled pool, using its own medians
X_unlabeled_full = df_unlabeled.drop('toxic', axis=1)
X_unlabeled_full = X_unlabeled_full.replace([np.inf, -np.inf], np.nan)
X_unlabeled_full = X_unlabeled_full.fillna(X_unlabeled_full.median())

# 70/30 stratified split; the 30% hold-out is used as the validation set in the sweep
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Fit the scaler on the training split only, then apply it to the hold-out and unlabeled data
print("\nScaling features...")
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
X_unlabeled_scaled = scaler.transform(X_unlabeled_full)

X_train = pd.DataFrame(X_train_scaled, columns=X_train.columns, index=X_train.index)
X_test = pd.DataFrame(X_test_scaled, columns=X_test.columns, index=X_test.index)
X_unlabeled_full = X_unlabeled_scaled  # kept as a NumPy array

NaN values: 0
Inf values: 0

Scaling features...


In [11]:
baseline_sweep_config = {
    'method': 'bayes',
    'metric': {'name': 'val_f1', 'goal': 'maximize'},
    'parameters': {
        'n_estimators': {'values': [50, 100, 150, 200]},
        'max_depth': {'values': [10, 15, 20, 25, 30]},
        'min_samples_split': {'values': [2, 5, 10]},
        'min_samples_leaf': {'values': [1, 2, 4]},
        'max_features': {'values': ['sqrt', 'log2']}
    }
}
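Every parameter in `baseline_sweep_config` takes a small set of discrete values, so the search space is finite and its size is easy to check. A short sketch that enumerates it (the values are copied from the config above; `search_space` and `grid` are illustrative names):

```python
from itertools import product

# Discrete values copied from baseline_sweep_config['parameters']
search_space = {
    'n_estimators': [50, 100, 150, 200],
    'max_depth': [10, 15, 20, 25, 30],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4],
    'max_features': ['sqrt', 'log2'],
}

# Every combination as a config dict, in itertools.product order
grid = [dict(zip(search_space, combo)) for combo in product(*search_space.values())]
coverage = 20 / len(grid)  # the agent is launched with count=20

print(len(grid), f"{coverage:.1%}")  # 360 5.6%
```

Twenty runs therefore visit at most about 5.6% of the 360 configurations. The Bayesian sampler does not guarantee distinct configurations either: in the sweep output, `max_depth=15` is never tried and one configuration is run twice (iu3mjuaj and sn8dk5xx).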

In [12]:
from sklearn.metrics import precision_score, recall_score, f1_score, roc_auc_score

def train_baseline():
    run = wandb.init()
    config = wandb.config
    
    model = RandomForestClassifier(
        n_estimators=config.n_estimators,
        max_depth=config.max_depth,
        min_samples_split=config.min_samples_split,
        min_samples_leaf=config.min_samples_leaf,
        max_features=config.max_features,
        random_state=42,
        n_jobs=-1,
        class_weight='balanced'
    )
    
    model.fit(X_train.values, y_train.values)
    
    y_pred = model.predict(X_test.values)
    y_pred_proba = model.predict_proba(X_test.values)[:, 1]
    
    metrics = {
        'val_accuracy': accuracy_score(y_test, y_pred),
        'val_precision': precision_score(y_test, y_pred, zero_division=0),
        'val_recall': recall_score(y_test, y_pred, zero_division=0),
        'val_f1': f1_score(y_test, y_pred, zero_division=0),
        'val_roc_auc': roc_auc_score(y_test, y_pred_proba)
    }
    
    wandb.log(metrics)
    
    print(f"F1: {metrics['val_f1']:.4f}, ROC-AUC: {metrics['val_roc_auc']:.4f}")

print("✓ Baseline training function defined")

✓ Baseline training function defined
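`train_baseline` passes `class_weight='balanced'`, which scikit-learn documents as weighting each class by `n_samples / (n_classes * np.bincount(y))`. A stdlib sketch of that formula with a hypothetical `balanced_class_weights` helper:

```python
from collections import Counter

def balanced_class_weights(labels):
    """weight_c = n_samples / (n_classes * count_c), the formula scikit-learn
    documents for class_weight='balanced'."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * counts[c]) for c in sorted(counts)}

print(balanced_class_weights([0, 0, 0, 1]))  # imbalanced: minority class up-weighted
print(balanced_class_weights([0, 1, 0, 1]))  # balanced: every weight is 1.0
```

Since Phase 1 already downsampled the labeled set to equal class sizes and the train/test split is stratified, these weights should come out close to 1.0, so `class_weight='balanced'` is expected to have little effect on this baseline.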


In [13]:
baseline_sweep_id = wandb.sweep(baseline_sweep_config, project=PROJECT)
wandb.agent(baseline_sweep_id, train_baseline, count=20)

print("✓ Baseline sweep completed!")

Create sweep with ID: i87xe0rl
Sweep URL: https://wandb.ai/l-benhammadi-esi/QSAR_MLOPS/sweeps/i87xe0rl


[34m[1mwandb[0m: Agent Starting Run: eq1f7mjp with config:
[34m[1mwandb[0m: 	max_depth: 30
[34m[1mwandb[0m: 	max_features: sqrt
[34m[1mwandb[0m: 	min_samples_leaf: 1
[34m[1mwandb[0m: 	min_samples_split: 2
[34m[1mwandb[0m: 	n_estimators: 200
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


[1;34mwandb[0m: 
[1;34mwandb[0m: üöÄ View run [33mgenerous-wood-2[0m at: [34mhttps://wandb.ai/l-benhammadi-esi/QSAR_MLOPS/runs/2buia4hs[0m
[1;34mwandb[0m: Find logs at: [1;35mwandb/run-20260124_171335-2buia4hs/logs[0m


F1: 0.6754, ROC-AUC: 0.7591


0,1
val_accuracy,‚ñÅ
val_f1,‚ñÅ
val_precision,‚ñÅ
val_recall,‚ñÅ
val_roc_auc,‚ñÅ

0,1
val_accuracy,0.69394
val_f1,0.67541
val_precision,0.71466
val_recall,0.64024
val_roc_auc,0.75913


[34m[1mwandb[0m: Agent Starting Run: xtha52bs with config:
[34m[1mwandb[0m: 	max_depth: 10
[34m[1mwandb[0m: 	max_features: log2
[34m[1mwandb[0m: 	min_samples_leaf: 1
[34m[1mwandb[0m: 	min_samples_split: 2
[34m[1mwandb[0m: 	n_estimators: 150
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.6675, ROC-AUC: 0.7646


0,1
val_accuracy,‚ñÅ
val_f1,‚ñÅ
val_precision,‚ñÅ
val_recall,‚ñÅ
val_roc_auc,‚ñÅ

0,1
val_accuracy,0.68511
val_f1,0.6675
val_precision,0.70288
val_recall,0.6355
val_roc_auc,0.76465


[34m[1mwandb[0m: Agent Starting Run: a7ybgifm with config:
[34m[1mwandb[0m: 	max_depth: 20
[34m[1mwandb[0m: 	max_features: sqrt
[34m[1mwandb[0m: 	min_samples_leaf: 2
[34m[1mwandb[0m: 	min_samples_split: 5
[34m[1mwandb[0m: 	n_estimators: 50
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.6874, ROC-AUC: 0.7620


0,1
val_accuracy,‚ñÅ
val_f1,‚ñÅ
val_precision,‚ñÅ
val_recall,‚ñÅ
val_roc_auc,‚ñÅ

0,1
val_accuracy,0.70394
val_f1,0.68738
val_precision,0.72382
val_recall,0.65444
val_roc_auc,0.76199


[34m[1mwandb[0m: Agent Starting Run: ppeeammi with config:
[34m[1mwandb[0m: 	max_depth: 10
[34m[1mwandb[0m: 	max_features: sqrt
[34m[1mwandb[0m: 	min_samples_leaf: 2
[34m[1mwandb[0m: 	min_samples_split: 2
[34m[1mwandb[0m: 	n_estimators: 150
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.6878, ROC-AUC: 0.7684


0,1
val_accuracy,‚ñÅ
val_f1,‚ñÅ
val_precision,‚ñÅ
val_recall,‚ñÅ
val_roc_auc,‚ñÅ

0,1
val_accuracy,0.70453
val_f1,0.68781
val_precision,0.72477
val_recall,0.65444
val_roc_auc,0.76843


[34m[1mwandb[0m: Agent Starting Run: xe55gqzq with config:
[34m[1mwandb[0m: 	max_depth: 10
[34m[1mwandb[0m: 	max_features: log2
[34m[1mwandb[0m: 	min_samples_leaf: 4
[34m[1mwandb[0m: 	min_samples_split: 2
[34m[1mwandb[0m: 	n_estimators: 150
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.6741, ROC-AUC: 0.7664


0,1
val_accuracy,‚ñÅ
val_f1,‚ñÅ
val_precision,‚ñÅ
val_recall,‚ñÅ
val_roc_auc,‚ñÅ

0,1
val_accuracy,0.68864
val_f1,0.67406
val_precision,0.70308
val_recall,0.64734
val_roc_auc,0.76642


[34m[1mwandb[0m: Agent Starting Run: qa1raas2 with config:
[34m[1mwandb[0m: 	max_depth: 20
[34m[1mwandb[0m: 	max_features: log2
[34m[1mwandb[0m: 	min_samples_leaf: 2
[34m[1mwandb[0m: 	min_samples_split: 5
[34m[1mwandb[0m: 	n_estimators: 100
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.6787, ROC-AUC: 0.7617


0,1
val_accuracy,‚ñÅ
val_f1,‚ñÅ
val_precision,‚ñÅ
val_recall,‚ñÅ
val_roc_auc,‚ñÅ

0,1
val_accuracy,0.6957
val_f1,0.67868
val_precision,0.71466
val_recall,0.64615
val_roc_auc,0.7617


[34m[1mwandb[0m: Agent Starting Run: ed91ukw9 with config:
[34m[1mwandb[0m: 	max_depth: 10
[34m[1mwandb[0m: 	max_features: sqrt
[34m[1mwandb[0m: 	min_samples_leaf: 2
[34m[1mwandb[0m: 	min_samples_split: 10
[34m[1mwandb[0m: 	n_estimators: 200
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.6783, ROC-AUC: 0.7680


0,1
val_accuracy,‚ñÅ
val_f1,‚ñÅ
val_precision,‚ñÅ
val_recall,‚ñÅ
val_roc_auc,‚ñÅ

0,1
val_accuracy,0.69511
val_f1,0.67826
val_precision,0.71373
val_recall,0.64615
val_roc_auc,0.76795


[34m[1mwandb[0m: Agent Starting Run: sfivvejh with config:
[34m[1mwandb[0m: 	max_depth: 25
[34m[1mwandb[0m: 	max_features: log2
[34m[1mwandb[0m: 	min_samples_leaf: 2
[34m[1mwandb[0m: 	min_samples_split: 5
[34m[1mwandb[0m: 	n_estimators: 200
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.6791, ROC-AUC: 0.7646


0,1
val_accuracy,‚ñÅ
val_f1,‚ñÅ
val_precision,‚ñÅ
val_recall,‚ñÅ
val_roc_auc,‚ñÅ

0,1
val_accuracy,0.69629
val_f1,0.6791
val_precision,0.7156
val_recall,0.64615
val_roc_auc,0.76456


[34m[1mwandb[0m: Agent Starting Run: 85ntbr7r with config:
[34m[1mwandb[0m: 	max_depth: 25
[34m[1mwandb[0m: 	max_features: sqrt
[34m[1mwandb[0m: 	min_samples_leaf: 1
[34m[1mwandb[0m: 	min_samples_split: 5
[34m[1mwandb[0m: 	n_estimators: 100
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.6787, ROC-AUC: 0.7605


0,1
val_accuracy,‚ñÅ
val_f1,‚ñÅ
val_precision,‚ñÅ
val_recall,‚ñÅ
val_roc_auc,‚ñÅ

0,1
val_accuracy,0.69688
val_f1,0.67873
val_precision,0.71768
val_recall,0.64379
val_roc_auc,0.76048


[34m[1mwandb[0m: Agent Starting Run: q5o6nws2 with config:
[34m[1mwandb[0m: 	max_depth: 25
[34m[1mwandb[0m: 	max_features: log2
[34m[1mwandb[0m: 	min_samples_leaf: 4
[34m[1mwandb[0m: 	min_samples_split: 10
[34m[1mwandb[0m: 	n_estimators: 200
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.6804, ROC-AUC: 0.7669


0,1
val_accuracy,‚ñÅ
val_f1,‚ñÅ
val_precision,‚ñÅ
val_recall,‚ñÅ
val_roc_auc,‚ñÅ

0,1
val_accuracy,0.69865
val_f1,0.6804
val_precision,0.71995
val_recall,0.64497
val_roc_auc,0.76692


[34m[1mwandb[0m: Agent Starting Run: 1n8t22tv with config:
[34m[1mwandb[0m: 	max_depth: 25
[34m[1mwandb[0m: 	max_features: log2
[34m[1mwandb[0m: 	min_samples_leaf: 1
[34m[1mwandb[0m: 	min_samples_split: 2
[34m[1mwandb[0m: 	n_estimators: 150
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.6766, ROC-AUC: 0.7607


0,1
val_accuracy,‚ñÅ
val_f1,‚ñÅ
val_precision,‚ñÅ
val_recall,‚ñÅ
val_roc_auc,‚ñÅ

0,1
val_accuracy,0.69394
val_f1,0.67662
val_precision,0.71298
val_recall,0.64379
val_roc_auc,0.76071


[34m[1mwandb[0m: Agent Starting Run: iu3mjuaj with config:
[34m[1mwandb[0m: 	max_depth: 25
[34m[1mwandb[0m: 	max_features: log2
[34m[1mwandb[0m: 	min_samples_leaf: 4
[34m[1mwandb[0m: 	min_samples_split: 5
[34m[1mwandb[0m: 	n_estimators: 50
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.6840, ROC-AUC: 0.7681


0,1
val_accuracy,‚ñÅ
val_f1,‚ñÅ
val_precision,‚ñÅ
val_recall,‚ñÅ
val_roc_auc,‚ñÅ

0,1
val_accuracy,0.70041
val_f1,0.68405
val_precision,0.71932
val_recall,0.65207
val_roc_auc,0.76812


[34m[1mwandb[0m: Agent Starting Run: 2ayjs8tw with config:
[34m[1mwandb[0m: 	max_depth: 20
[34m[1mwandb[0m: 	max_features: log2
[34m[1mwandb[0m: 	min_samples_leaf: 4
[34m[1mwandb[0m: 	min_samples_split: 10
[34m[1mwandb[0m: 	n_estimators: 200
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.6796, ROC-AUC: 0.7671


0,1
val_accuracy,‚ñÅ
val_f1,‚ñÅ
val_precision,‚ñÅ
val_recall,‚ñÅ
val_roc_auc,‚ñÅ

0,1
val_accuracy,0.69747
val_f1,0.67955
val_precision,0.71805
val_recall,0.64497
val_roc_auc,0.76709


[34m[1mwandb[0m: Agent Starting Run: s74bbg8d with config:
[34m[1mwandb[0m: 	max_depth: 20
[34m[1mwandb[0m: 	max_features: log2
[34m[1mwandb[0m: 	min_samples_leaf: 4
[34m[1mwandb[0m: 	min_samples_split: 5
[34m[1mwandb[0m: 	n_estimators: 50
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.6943, ROC-AUC: 0.7694


0,1
val_accuracy,‚ñÅ
val_f1,‚ñÅ
val_precision,‚ñÅ
val_recall,‚ñÅ
val_roc_auc,‚ñÅ

0,1
val_accuracy,0.70865
val_f1,0.69426
val_precision,0.7261
val_recall,0.66509
val_roc_auc,0.76936


[34m[1mwandb[0m: Agent Starting Run: t1bsh028 with config:
[34m[1mwandb[0m: 	max_depth: 25
[34m[1mwandb[0m: 	max_features: log2
[34m[1mwandb[0m: 	min_samples_leaf: 4
[34m[1mwandb[0m: 	min_samples_split: 10
[34m[1mwandb[0m: 	n_estimators: 100
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.6830, ROC-AUC: 0.7654


0,1
val_accuracy,‚ñÅ
val_f1,‚ñÅ
val_precision,‚ñÅ
val_recall,‚ñÅ
val_roc_auc,‚ñÅ

0,1
val_accuracy,0.70218
val_f1,0.68296
val_precision,0.7257
val_recall,0.64497
val_roc_auc,0.76536


[34m[1mwandb[0m: Agent Starting Run: sn8dk5xx with config:
[34m[1mwandb[0m: 	max_depth: 25
[34m[1mwandb[0m: 	max_features: log2
[34m[1mwandb[0m: 	min_samples_leaf: 4
[34m[1mwandb[0m: 	min_samples_split: 5
[34m[1mwandb[0m: 	n_estimators: 50
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.6840, ROC-AUC: 0.7681


0,1
val_accuracy,‚ñÅ
val_f1,‚ñÅ
val_precision,‚ñÅ
val_recall,‚ñÅ
val_roc_auc,‚ñÅ

0,1
val_accuracy,0.70041
val_f1,0.68405
val_precision,0.71932
val_recall,0.65207
val_roc_auc,0.76812


[34m[1mwandb[0m: Agent Starting Run: eue1fkve with config:
[34m[1mwandb[0m: 	max_depth: 20
[34m[1mwandb[0m: 	max_features: log2
[34m[1mwandb[0m: 	min_samples_leaf: 4
[34m[1mwandb[0m: 	min_samples_split: 10
[34m[1mwandb[0m: 	n_estimators: 50
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.6737, ROC-AUC: 0.7623


0,1
val_accuracy,‚ñÅ
val_f1,‚ñÅ
val_precision,‚ñÅ
val_recall,‚ñÅ
val_roc_auc,‚ñÅ

0,1
val_accuracy,0.69217
val_f1,0.67374
val_precision,0.7124
val_recall,0.63905
val_roc_auc,0.76229


[34m[1mwandb[0m: Agent Starting Run: uqwayukl with config:
[34m[1mwandb[0m: 	max_depth: 30
[34m[1mwandb[0m: 	max_features: log2
[34m[1mwandb[0m: 	min_samples_leaf: 1
[34m[1mwandb[0m: 	min_samples_split: 2
[34m[1mwandb[0m: 	n_estimators: 150
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.6704, ROC-AUC: 0.7583


0,1
val_accuracy,‚ñÅ
val_f1,‚ñÅ
val_precision,‚ñÅ
val_recall,‚ñÅ
val_roc_auc,‚ñÅ

0,1
val_accuracy,0.69041
val_f1,0.67043
val_precision,0.71238
val_recall,0.63314
val_roc_auc,0.75832


[34m[1mwandb[0m: Agent Starting Run: xd0g52c5 with config:
[34m[1mwandb[0m: 	max_depth: 20
[34m[1mwandb[0m: 	max_features: log2
[34m[1mwandb[0m: 	min_samples_leaf: 4
[34m[1mwandb[0m: 	min_samples_split: 10
[34m[1mwandb[0m: 	n_estimators: 100
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.6825, ROC-AUC: 0.7667


0,1
val_accuracy,‚ñÅ
val_f1,‚ñÅ
val_precision,‚ñÅ
val_recall,‚ñÅ
val_roc_auc,‚ñÅ

0,1
val_accuracy,0.70041
val_f1,0.68247
val_precision,0.72164
val_recall,0.64734
val_roc_auc,0.7667


[34m[1mwandb[0m: Agent Starting Run: 6g9zqcgh with config:
[34m[1mwandb[0m: 	max_depth: 25
[34m[1mwandb[0m: 	max_features: sqrt
[34m[1mwandb[0m: 	min_samples_leaf: 2
[34m[1mwandb[0m: 	min_samples_split: 10
[34m[1mwandb[0m: 	n_estimators: 100
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.6902, ROC-AUC: 0.7700


0,1
val_accuracy,‚ñÅ
val_f1,‚ñÅ
val_precision,‚ñÅ
val_recall,‚ñÅ
val_roc_auc,‚ñÅ

0,1
val_accuracy,0.70512
val_f1,0.69017
val_precision,0.7228
val_recall,0.66036
val_roc_auc,0.77003


✓ Baseline sweep completed!
Error in callback <bound method _WandbInit._post_run_cell_hook of <wandb.sdk.wandb_init._WandbInit object at 0x7928aedaed10>> (for post_run_cell), with arguments args (<ExecutionResult object at 7928d9948b10, execution_count=13 error_before_exec=None error_in_exec=None info=<ExecutionInfo object at 7928b5dd3cd0, raw_cell="baseline_sweep_id = wandb.sweep(baseline_sweep_con.." store_history=True silent=False shell_futures=True cell_id=vscode-notebook-cell:/home/lokmane/Desktop/mini_projet/notebooks/tox21/tox21_mlops_life_cycle.ipynb#X15sZmlsZQ%3D%3D> result=None>,),kwargs {}:


AlreadyJoinedError: 

In [14]:
import joblib
import os

api = wandb.Api()
entity = api.default_entity

baseline_sweep = api.sweep(f"{entity}/{PROJECT}/{baseline_sweep_id}")
best_baseline_run = baseline_sweep.best_run()

print(f"\nüèÜ BEST BASELINE MODEL")
print(f"{'='*60}")
print(f"Run name: {best_baseline_run.name}")
print(f"F1-Score: {best_baseline_run.summary.get('val_f1'):.4f}")
print(f"ROC-AUC: {best_baseline_run.summary.get('val_roc_auc'):.4f}")
print(f"\nBest Hyperparameters:")
print(f"  n_estimators: {best_baseline_run.config['n_estimators']}")
print(f"  max_depth: {best_baseline_run.config['max_depth']}")
print(f"  min_samples_split: {best_baseline_run.config['min_samples_split']}")
print(f"  min_samples_leaf: {best_baseline_run.config['min_samples_leaf']}")
print(f"  max_features: {best_baseline_run.config['max_features']}")

best_baseline_config = {
    'n_estimators': best_baseline_run.config['n_estimators'],
    'max_depth': best_baseline_run.config['max_depth'],
    'min_samples_split': best_baseline_run.config['min_samples_split'],
    'min_samples_leaf': best_baseline_run.config['min_samples_leaf'],
    'max_features': best_baseline_run.config['max_features']
}

Error in callback <bound method _WandbInit._pre_run_cell_hook of <wandb.sdk.wandb_init._WandbInit object at 0x7928aedaed10>> (for pre_run_cell), with arguments args (<ExecutionInfo object at 7928a2924c10, raw_cell="import joblib
import os

api = wandb.Api()
entity .." store_history=True silent=False shell_futures=True cell_id=vscode-notebook-cell:/home/lokmane/Desktop/mini_projet/notebooks/tox21/tox21_mlops_life_cycle.ipynb#X16sZmlsZQ%3D%3D>,),kwargs {}:


AlreadyJoinedError: 

[34m[1mwandb[0m: Sorting runs by -summary_metrics.val_f1



🏆 BEST BASELINE MODEL
Run name: quiet-sweep-14
F1-Score: 0.6943
ROC-AUC: 0.7694

Best Hyperparameters:
  n_estimators: 50
  max_depth: 20
  min_samples_split: 5
  min_samples_leaf: 4
  max_features: log2

In [15]:
run = wandb.init(project=PROJECT, job_type="register-baseline-model")

best_baseline_model = RandomForestClassifier(
    n_estimators=best_baseline_config['n_estimators'],
    max_depth=best_baseline_config['max_depth'],
    min_samples_split=best_baseline_config['min_samples_split'],
    min_samples_leaf=best_baseline_config['min_samples_leaf'],
    max_features=best_baseline_config['max_features'],
    random_state=42,
    n_jobs=-1,
    class_weight='balanced'
)

best_baseline_model.fit(X_train.values, y_train.values)

os.makedirs('models', exist_ok=True)
joblib.dump(best_baseline_model, 'models/best_baseline_rf_model.pkl')

baseline_artifact = wandb.Artifact(
    name='tox21-baseline-rf-model',
    type='model',
    description='Best baseline Random Forest model (supervised only)',
    metadata={
        'method': 'baseline_supervised',
        'n_samples': len(X_train),
        'f1_score': best_baseline_run.summary.get('val_f1'),
        'roc_auc': best_baseline_run.summary.get('val_roc_auc'),
        **best_baseline_config
    }
)

baseline_artifact.add_file('models/best_baseline_rf_model.pkl')
run.log_artifact(baseline_artifact)
run.finish()

print("✓ Best baseline model registered!")

✓ Best baseline model registered!
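Because the registered artifact is only as useful as the pickle inside it, one cheap safeguard is to reload the file and confirm it predicts exactly like the in-memory model before calling `run.log_artifact`. The sketch below is an assumption-laden illustration: it uses synthetic data from `make_classification` in place of `X_train`/`y_train` and writes to a temporary directory instead of `models/`; the hyperparameters are the ones printed for the best baseline run.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for X_train / y_train (assumption: not the Tox21 descriptors)
X, y = make_classification(n_samples=300, n_features=20, random_state=0)

# Hyperparameters printed for the best baseline run (quiet-sweep-14)
model = RandomForestClassifier(
    n_estimators=50,
    max_depth=20,
    min_samples_split=5,
    min_samples_leaf=4,
    max_features="log2",
    random_state=42,
    class_weight="balanced",
)
model.fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "best_baseline_rf_model.pkl")
    joblib.dump(model, path)
    reloaded = joblib.load(path)  # load before the temp dir is deleted

# A faithful round trip reproduces predictions and probabilities exactly
same_predictions = np.array_equal(model.predict(X), reloaded.predict(X))
same_probabilities = np.allclose(model.predict_proba(X), reloaded.predict_proba(X))
print(same_predictions, same_probabilities)
```

The same two comparisons could be run on `X_test.values` with `best_baseline_model` in the notebook itself.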

## Phase 4: Semi-Supervised Learning & Model Registry

With the baseline in place, we add **unlabeled ZINC molecules** to the training data and test whether semi-supervised learning (SSL) improves on it.
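scikit-learn's semi-supervised estimators (`LabelPropagation`, `LabelSpreading`, `SelfTrainingClassifier`) expect unlabeled rows to carry the label `-1`, which is what `train_ssl` does with `np.full(n_unlabeled, -1)`. A minimal sketch of that stacking step; the helper name `combine_labeled_unlabeled` and the toy arrays are hypothetical:

```python
import numpy as np

UNLABELED = -1  # scikit-learn's semi-supervised estimators treat -1 as "no label"


def combine_labeled_unlabeled(X_lab, y_lab, X_unlab):
    """Stack labeled and unlabeled rows; unlabeled rows get the -1 sentinel."""
    X_all = np.vstack([X_lab, X_unlab])
    y_all = np.concatenate([np.asarray(y_lab), np.full(len(X_unlab), UNLABELED)])
    return X_all, y_all


# Toy descriptor rows (assumption: two features instead of the real descriptor set)
X_lab = np.array([[0.0, 1.0], [1.0, 0.0]])
y_lab = np.array([0, 1])
X_unlab = np.array([[0.1, 0.9], [0.9, 0.1], [0.5, 0.5]])

X_all, y_all = combine_labeled_unlabeled(X_lab, y_lab, X_unlab)
print(X_all.shape, y_all)
```

Labeled rows come first, so the original `y_train` order is preserved and the unlabeled block can be sliced off by position later.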

### SSL Methods Implemented:
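1. **Label Propagation**: Graph-based propagation of labels over an RBF-kernel similarity graph (`LabelPropagation`)
2. **Self-Training**: Iteratively pseudo-label unlabeled molecules whose predicted probability exceeds a threshold, then retrain (`SelfTrainingClassifier` around the baseline Random Forest)
3. **Co-Training**: Train two Random Forests on disjoint descriptor views (physicochemical vs. topological/fragment) and add each model's most confident pseudo-labels to a shared training set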
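To see what label propagation does with very few labels, here is a toy sketch on two synthetic Gaussian clusters (an assumption standing in for the descriptor matrix) where only two points per class keep their labels. `gamma=0.5` is picked for this toy scale only: the RBF kernel works on raw Euclidean distances, so a sensible `lp_gamma` for the Tox21 descriptors depends on how, or whether, they are scaled.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.semi_supervised import LabelPropagation

# Two well-separated clusters standing in for "toxic" / "non-toxic" descriptors
X, y_true = make_blobs(
    n_samples=200, centers=[[-5, -5], [5, 5]], cluster_std=0.5, random_state=0
)

# Hide every label except the first two points of each class
y_train = np.full_like(y_true, -1)
for cls in (0, 1):
    keep = np.where(y_true == cls)[0][:2]
    y_train[keep] = cls

model = LabelPropagation(kernel="rbf", gamma=0.5, max_iter=1000)
model.fit(X, y_train)

# transduction_ holds the label assigned to every training point, labeled or not
accuracy = (model.transduction_ == y_true).mean()
print(f"labels recovered for {accuracy:.0%} of points")
```

With `kernel='rbf'`, scikit-learn builds a dense n × n affinity matrix, so 3,962 labeled plus 20,000 unlabeled molecules means roughly 24,000² ≈ 5.8 × 10⁸ entries (about 4.6 GB in float64); `kernel='knn'` builds a sparse graph instead if memory becomes a problem.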

### Process:
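1. **Hyperparameter Sweep**: Bayesian W&B sweep over the SSL method, the number of unlabeled molecules (5,000 to 20,000), and method-specific settings
2. **Compare Methods**: Measure the relative F1 change against the baseline (`improvement_over_baseline_pct`)
3. **Select Best Model**: Highest validation F1 (the sweep objective), with ROC-AUC as a secondary check
4. **Register to W&B**: Save the best model as an artifact with its metadata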

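The goal is to find out whether adding up to 20,000 unlabeled molecules helps the supervised baseline. In the sweep runs logged below, every configuration scored below the baseline F1 of 0.6943 (relative F1 changes between -2.3% and -93.8%), so any improvement claim should be checked against the full 50-run sweep.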
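The comparison and selection steps can be sketched in plain Python. How F1 and ROC-AUC are combined is an assumption here (F1 first, ROC-AUC only as a tie-breaker), and the helper names are hypothetical; the metric values are copied from three of the runs logged below. Because `baseline_f1` is the rounded 0.6943, the percentages differ slightly from the logged `improvement_over_baseline_pct`.

```python
def relative_improvement_pct(f1, baseline_f1):
    """Percentage change of f1 relative to the baseline (negative means worse)."""
    return (f1 - baseline_f1) / baseline_f1 * 100


def pick_best(runs):
    """Highest val_f1 wins; val_roc_auc breaks ties (assumed ranking rule)."""
    return max(runs, key=lambda r: (r["val_f1"], r["val_roc_auc"]))


# Summary values copied from three sweep runs in the log
runs = [
    {"name": "8ji118t2", "method": "self_training", "val_f1": 0.67045, "val_roc_auc": 0.74973},
    {"name": "w94s964g", "method": "self_training", "val_f1": 0.67844, "val_roc_auc": 0.75407},
    {"name": "0i8z1t81", "method": "co_training", "val_f1": 0.65637, "val_roc_auc": 0.75609},
]

baseline_f1 = 0.6943  # rounded baseline F1 printed for quiet-sweep-14
best = pick_best(runs)
change = relative_improvement_pct(best["val_f1"], baseline_f1)
print(best["name"], f"{change:+.2f}%")
```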

In [16]:
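from sklearn.semi_supervised import LabelPropagation, LabelSpreading, SelfTrainingClassifier

# Bayesian sweep over SSL settings. Every parameter is sampled for every run;
# train_ssl only reads the shared ones plus those of the chosen ssl_method.
ssl_sweep_config = {
    'method': 'bayes',
    'metric': {'name': 'val_f1', 'goal': 'maximize'},
    'parameters': {
        'ssl_method': {
            'values': ['label_propagation', 'self_training', 'co_training']
        },
        # Number of ZINC molecules used as unlabeled data (all methods)
        'n_unlabeled': {
            'values': [5000, 10000, 15000, 20000]
        },
        # Label Propagation: RBF kernel width and iteration cap
        'lp_gamma': {
            'distribution': 'log_uniform_values',
            'min': 0.001,
            'max': 0.5
        },
        'lp_max_iter': {
            'values': [500, 1000, 1500]
        },
        # Label Spreading clamping factor: sampled but not used,
        # because train_ssl has no label_spreading branch
        'ls_alpha': {
            'distribution': 'uniform',
            'min': 0.1,
            'max': 0.9
        },
        # Self-Training: pseudo-label probability threshold and iteration cap
        'st_threshold': {
            'distribution': 'uniform',
            'min': 0.7,
            'max': 0.95
        },
        'st_max_iter': {
            'values': [5, 10, 15]
        },
        # Co-Training: confidence cut-off, pseudo-labels per view per round, rounds
        'ct_confidence_threshold': {
            'distribution': 'uniform',
            'min': 0.75,
            'max': 0.95
        },
        'ct_samples_per_iter': {
            'values': [25, 50, 100]
        },
        'ct_max_iterations': {
            'values': [10, 15, 20]
        }
    }
}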
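Since the Bayesian sweep samples every key for every run, a run's config also contains values its method never reads (for example `lp_gamma` and `ls_alpha` on a co-training run). A small filter like the hypothetical `active_params` below can make run comparisons less noisy, for instance before grouping results by method. The mapping is an assumption read off the branches of `train_ssl`; the sample config is copied from run `x5tkqx9p` in the sweep log.

```python
# Hypothetical mapping: which sweep keys each ssl_method branch of train_ssl reads
METHOD_PARAMS = {
    "label_propagation": ["lp_gamma", "lp_max_iter"],
    "self_training": ["st_threshold", "st_max_iter"],
    "co_training": ["ct_confidence_threshold", "ct_samples_per_iter", "ct_max_iterations"],
}
SHARED_PARAMS = ["ssl_method", "n_unlabeled"]


def active_params(config):
    """Drop sampled hyperparameters that the chosen ssl_method ignores."""
    keys = SHARED_PARAMS + METHOD_PARAMS[config["ssl_method"]]
    return {key: config[key] for key in keys}


# Config sampled for run x5tkqx9p in the sweep log
sampled = {
    "ct_confidence_threshold": 0.8280189705221019,
    "ct_max_iterations": 20,
    "ct_samples_per_iter": 100,
    "lp_gamma": 0.24367081051149517,
    "lp_max_iter": 500,
    "ls_alpha": 0.14501321953667193,
    "n_unlabeled": 20000,
    "ssl_method": "co_training",
    "st_max_iter": 10,
    "st_threshold": 0.9263797925577666,
}

filtered = active_params(sampled)
print(filtered)
```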

In [17]:
def train_ssl():
    run = wandb.init()
    config = wandb.config
    
    n_unlabeled = min(config.n_unlabeled, len(X_unlabeled_full))
    X_unlabeled = X_unlabeled_full[:n_unlabeled]
    
    
    # === LABEL PROPAGATION ===
    if config.ssl_method == 'label_propagation':
        X_combined = np.vstack([X_train.values, X_unlabeled])
        y_combined = np.concatenate([y_train.values, np.full(n_unlabeled, -1)])
        
        model = LabelPropagation(
            kernel='rbf',
            gamma=config.lp_gamma,
            max_iter=config.lp_max_iter,
            n_jobs=-1
        )
        model.fit(X_combined, y_combined)
        y_pred = model.predict(X_test.values)
        y_pred_proba = model.predict_proba(X_test.values)[:, 1]
    
    # === SELF-TRAINING ===
    elif config.ssl_method == 'self_training':
        X_combined = np.vstack([X_train.values, X_unlabeled])
        y_combined = np.concatenate([y_train.values, np.full(n_unlabeled, -1)])
        
        base_clf = RandomForestClassifier(
            n_estimators=best_baseline_config['n_estimators'],
            max_depth=best_baseline_config['max_depth'],
            min_samples_split=best_baseline_config['min_samples_split'],
            min_samples_leaf=best_baseline_config['min_samples_leaf'],
            max_features=best_baseline_config['max_features'],
            random_state=42,
            n_jobs=-1,
            class_weight='balanced'
        )
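        # `estimator` replaces the deprecated `base_estimator` argument
        # (scikit-learn >= 1.6; older releases only accept `base_estimator`)
        model = SelfTrainingClassifier(
            estimator=base_clf,
            threshold=config.st_threshold,
            max_iter=config.st_max_iter,
            verbose=False
        )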
        model.fit(X_combined, y_combined)
        y_pred = model.predict(X_test.values)
        y_pred_proba = model.predict_proba(X_test.values)[:, 1]
    
    # === CO-TRAINING ===
    elif config.ssl_method == 'co_training':
        view1_candidates = [
            'MolWt', 'LogP', 'NumHDonors', 'NumHAcceptors', 'NumValenceElectrons',
            'TPSA', 'MaxPartialCharge', 'MinPartialCharge', 'LabuteASA', 'MolMR',
            'QED', 'NumHeteroatoms'
        ]
        
        view2_candidates = [
            'NumRotatableBonds', 'NumAromaticRings', 'NumRings', 'NumAliphaticRings',
            'NumSaturatedRings', 'FractionCsp3', 'PEOE_VSA1', 'PEOE_VSA2', 'BertzCT',
            'Chi0v', 'Chi1v', 'Kappa1', 'Kappa2', 'BalabanJ', 'HallKierAlpha',
            'NumSaturatedCarbocycles', 'NumAromaticCarbocycles',
            'NumSaturatedHeterocycles', 'NumAromaticHeterocycles',
            'fr_NH2', 'fr_COO', 'fr_benzene', 'fr_furan', 'fr_halogen'
        ]
        
        feature_cols = X_train.columns.tolist()
        
        view1_features = [f for f in view1_candidates if f in feature_cols]
        view2_features = [f for f in view2_candidates if f in feature_cols]
        
        v1_idx = [feature_cols.index(f) for f in view1_features]
        v2_idx = [feature_cols.index(f) for f in view2_features]
        
        X_train_v1 = X_train.values[:, v1_idx]
        X_train_v2 = X_train.values[:, v2_idx]
        X_test_v1 = X_test.values[:, v1_idx]
        X_test_v2 = X_test.values[:, v2_idx]
        X_unlabeled_v1 = X_unlabeled[:, v1_idx]
        X_unlabeled_v2 = X_unlabeled[:, v2_idx]
        
        y_train_curr = y_train.values.copy()
        
        clf1 = RandomForestClassifier(
            n_estimators=best_baseline_config['n_estimators'],
            max_depth=best_baseline_config['max_depth'],
            min_samples_split=best_baseline_config['min_samples_split'],
            min_samples_leaf=best_baseline_config['min_samples_leaf'],
            max_features=best_baseline_config['max_features'],
            random_state=42,
            n_jobs=-1,
            class_weight='balanced'
        )
        clf2 = RandomForestClassifier(
            n_estimators=best_baseline_config['n_estimators'],
            max_depth=best_baseline_config['max_depth'],
            min_samples_split=best_baseline_config['min_samples_split'],
            min_samples_leaf=best_baseline_config['min_samples_leaf'],
            max_features=best_baseline_config['max_features'],
            random_state=43,
            n_jobs=-1,
            class_weight='balanced'
        )
        
        mask_available = np.ones(n_unlabeled, dtype=bool)
        
        for iteration in range(config.ct_max_iterations):
            if not np.any(mask_available):
                break
            
            clf1.fit(X_train_v1, y_train_curr)
            clf2.fit(X_train_v2, y_train_curr)
            
            available_v1 = X_unlabeled_v1[mask_available]
            available_v2 = X_unlabeled_v2[mask_available]
            
            if len(available_v1) == 0:
                break
            
            prob1 = clf1.predict_proba(available_v1)
            prob2 = clf2.predict_proba(available_v2)
            
            conf1 = np.max(prob1, axis=1)
            conf2 = np.max(prob2, axis=1)
            pred1 = np.argmax(prob1, axis=1)
            pred2 = np.argmax(prob2, axis=1)
            
            confident1 = conf1 > config.ct_confidence_threshold
            confident2 = conf2 > config.ct_confidence_threshold
            
            available_indices = np.where(mask_available)[0]
            samples_to_add = []
            labels_to_add = []
            
            if np.any(confident1):
                top_indices1 = np.argsort(conf1)[::-1][:config.ct_samples_per_iter]
                top_indices1 = top_indices1[confident1[top_indices1]]
                for idx in top_indices1:
                    samples_to_add.append(available_indices[idx])
                    labels_to_add.append(pred1[idx])
            
            if np.any(confident2):
                top_indices2 = np.argsort(conf2)[::-1][:config.ct_samples_per_iter]
                top_indices2 = top_indices2[confident2[top_indices2]]
                for idx in top_indices2:
                    if available_indices[idx] not in samples_to_add:
                        samples_to_add.append(available_indices[idx])
                        labels_to_add.append(pred2[idx])
            
            if len(samples_to_add) == 0:
                break
            
            samples_to_add = np.array(samples_to_add)
            labels_to_add = np.array(labels_to_add)
            
            X_train_v1 = np.vstack([X_train_v1, X_unlabeled_v1[samples_to_add]])
            X_train_v2 = np.vstack([X_train_v2, X_unlabeled_v2[samples_to_add]])
            y_train_curr = np.concatenate([y_train_curr, labels_to_add])
            
            mask_available[samples_to_add] = False
        
        clf1.fit(X_train_v1, y_train_curr)
        clf2.fit(X_train_v2, y_train_curr)
        
        p1 = clf1.predict_proba(X_test_v1)[:, 1]
        p2 = clf2.predict_proba(X_test_v2)[:, 1]
        y_pred_proba = (p1 + p2) / 2
        y_pred = (y_pred_proba >= 0.5).astype(int)
    
    metrics = {
        'val_accuracy': accuracy_score(y_test, y_pred),
        'val_precision': precision_score(y_test, y_pred, zero_division=0),
        'val_recall': recall_score(y_test, y_pred, zero_division=0),
        'val_f1': f1_score(y_test, y_pred, zero_division=0),
        'val_roc_auc': roc_auc_score(y_test, y_pred_proba),
        'n_unlabeled_used': n_unlabeled,
        'n_labeled': len(X_train)
    }
    
    wandb.log(metrics)
    
    baseline_f1 = best_baseline_run.summary.get('val_f1')
    improvement = ((metrics['val_f1'] - baseline_f1) / baseline_f1) * 100
    wandb.log({'improvement_over_baseline_pct': improvement})
    
    # The :+ format spec already prints the sign, so no literal "+" is needed
    print(f"F1: {metrics['val_f1']:.4f} ({improvement:+.2f}%), ROC-AUC: {metrics['val_roc_auc']:.4f}")

print("✓ SSL training function defined (3 methods)")

✓ SSL training function defined (all 4 methods)
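The pseudo-label selection inside the co-training loop can also be written without the Python `for` loops. The sketch below is a hypothetical vectorised equivalent (function names are illustrative): it keeps up to `k` rows whose top class probability is strictly above the threshold, merges both views, and lets view 1's label win when both views pick the same row, as the notebook does. The returned indices are positions within the currently available unlabeled rows and would still be mapped through `available_indices`. Like the notebook's `np.argmax`, it assumes class labels 0 and 1 match the probability columns.

```python
import numpy as np


def select_confident(proba, threshold, k):
    """Indices and labels of up to k rows with max probability > threshold, most confident first."""
    conf = proba.max(axis=1)
    order = np.argsort(conf)[::-1][:k]
    order = order[conf[order] > threshold]
    return order, proba[order].argmax(axis=1)


def merge_views(idx1, lab1, idx2, lab2):
    """Union of both views' picks; view 1's label wins when both pick a row."""
    extra = ~np.isin(idx2, idx1)
    return np.concatenate([idx1, idx2[extra]]), np.concatenate([lab1, lab2[extra]])


# Toy predict_proba outputs for the two views over four available molecules
p1 = np.array([[0.95, 0.05], [0.6, 0.4], [0.1, 0.9], [0.2, 0.8]])
p2 = np.array([[0.5, 0.5], [0.05, 0.95], [0.1, 0.9], [0.7, 0.3]])

idx, labels = merge_views(*select_confident(p1, 0.85, 2), *select_confident(p2, 0.85, 2))
print(idx, labels)
```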

In [18]:
ssl_sweep_id = wandb.sweep(ssl_sweep_config, project=PROJECT)
wandb.agent(ssl_sweep_id, train_ssl, count=50)

print("✓ SSL sweep completed!")

Create sweep with ID: yfb1y91q
Sweep URL: https://wandb.ai/l-benhammadi-esi/QSAR_MLOPS/sweeps/yfb1y91q


[34m[1mwandb[0m: Agent Starting Run: 8ji118t2 with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.9072866342283022
[34m[1mwandb[0m: 	ct_max_iterations: 10
[34m[1mwandb[0m: 	ct_samples_per_iter: 100
[34m[1mwandb[0m: 	lp_gamma: 0.013513637953445302
[34m[1mwandb[0m: 	lp_max_iter: 1000
[34m[1mwandb[0m: 	ls_alpha: 0.2332996127808402
[34m[1mwandb[0m: 	n_unlabeled: 10000
[34m[1mwandb[0m: 	ssl_method: self_training
[34m[1mwandb[0m: 	st_max_iter: 15
[34m[1mwandb[0m: 	st_threshold: 0.7637817086246614
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


  warn(


F1: 0.6704 (+-3.43%), ROC-AUC: 0.7497


0,1
improvement_over_baseline_pct,‚ñÅ
n_labeled,‚ñÅ
n_unlabeled_used,‚ñÅ
val_accuracy,‚ñÅ
val_f1,‚ñÅ
val_precision,‚ñÅ
val_recall,‚ñÅ
val_roc_auc,‚ñÅ

0,1
improvement_over_baseline_pct,-3.42933
n_labeled,3962.0
n_unlabeled_used,10000.0
val_accuracy,0.69217
val_f1,0.67045
val_precision,0.71698
val_recall,0.62959
val_roc_auc,0.74973


[34m[1mwandb[0m: Agent Starting Run: w94s964g with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.9061224441264772
[34m[1mwandb[0m: 	ct_max_iterations: 20
[34m[1mwandb[0m: 	ct_samples_per_iter: 25
[34m[1mwandb[0m: 	lp_gamma: 0.0018118047772142797
[34m[1mwandb[0m: 	lp_max_iter: 1000
[34m[1mwandb[0m: 	ls_alpha: 0.8211906306782066
[34m[1mwandb[0m: 	n_unlabeled: 10000
[34m[1mwandb[0m: 	ssl_method: self_training
[34m[1mwandb[0m: 	st_max_iter: 15
[34m[1mwandb[0m: 	st_threshold: 0.7499037571045781
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


  warn(


F1: 0.6784 (+-2.28%), ROC-AUC: 0.7541


0,1
improvement_over_baseline_pct,‚ñÅ
n_labeled,‚ñÅ
n_unlabeled_used,‚ñÅ
val_accuracy,‚ñÅ
val_f1,‚ñÅ
val_precision,‚ñÅ
val_recall,‚ñÅ
val_roc_auc,‚ñÅ

0,1
improvement_over_baseline_pct,-2.27861
n_labeled,3962.0
n_unlabeled_used,10000.0
val_accuracy,0.69982
val_f1,0.67844
val_precision,0.72605
val_recall,0.63669
val_roc_auc,0.75407


[34m[1mwandb[0m: Agent Starting Run: x5tkqx9p with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.8280189705221019
[34m[1mwandb[0m: 	ct_max_iterations: 20
[34m[1mwandb[0m: 	ct_samples_per_iter: 100
[34m[1mwandb[0m: 	lp_gamma: 0.24367081051149517
[34m[1mwandb[0m: 	lp_max_iter: 500
[34m[1mwandb[0m: 	ls_alpha: 0.14501321953667193
[34m[1mwandb[0m: 	n_unlabeled: 20000
[34m[1mwandb[0m: 	ssl_method: co_training
[34m[1mwandb[0m: 	st_max_iter: 10
[34m[1mwandb[0m: 	st_threshold: 0.9263797925577666
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.6215 (+-10.49%), ROC-AUC: 0.7466


0,1
improvement_over_baseline_pct,‚ñÅ
n_labeled,‚ñÅ
n_unlabeled_used,‚ñÅ
val_accuracy,‚ñÅ
val_f1,‚ñÅ
val_precision,‚ñÅ
val_recall,‚ñÅ
val_roc_auc,‚ñÅ

0,1
improvement_over_baseline_pct,-10.4864
n_labeled,3962.0
n_unlabeled_used,20000.0
val_accuracy,0.67805
val_f1,0.62145
val_precision,0.74833
val_recall,0.53136
val_roc_auc,0.74658


[34m[1mwandb[0m: Agent Starting Run: zlbuipv0 with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.908795348819909
[34m[1mwandb[0m: 	ct_max_iterations: 15
[34m[1mwandb[0m: 	ct_samples_per_iter: 100
[34m[1mwandb[0m: 	lp_gamma: 0.00880983460791214
[34m[1mwandb[0m: 	lp_max_iter: 1000
[34m[1mwandb[0m: 	ls_alpha: 0.4162888519686069
[34m[1mwandb[0m: 	n_unlabeled: 15000
[34m[1mwandb[0m: 	ssl_method: self_training
[34m[1mwandb[0m: 	st_max_iter: 15
[34m[1mwandb[0m: 	st_threshold: 0.7968329762223402
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


  warn(


F1: 0.6726 (+-3.12%), ROC-AUC: 0.7484


0,1
improvement_over_baseline_pct,‚ñÅ
n_labeled,‚ñÅ
n_unlabeled_used,‚ñÅ
val_accuracy,‚ñÅ
val_f1,‚ñÅ
val_precision,‚ñÅ
val_recall,‚ñÅ
val_roc_auc,‚ñÅ

0,1
improvement_over_baseline_pct,-3.12088
n_labeled,3962.0
n_unlabeled_used,15000.0
val_accuracy,0.69629
val_f1,0.67259
val_precision,0.72503
val_recall,0.62722
val_roc_auc,0.74844


[34m[1mwandb[0m: Agent Starting Run: dkauzmnp with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.9369140928990416
[34m[1mwandb[0m: 	ct_max_iterations: 15
[34m[1mwandb[0m: 	ct_samples_per_iter: 50
[34m[1mwandb[0m: 	lp_gamma: 0.0014782440465161554
[34m[1mwandb[0m: 	lp_max_iter: 1500
[34m[1mwandb[0m: 	ls_alpha: 0.6737389557118004
[34m[1mwandb[0m: 	n_unlabeled: 15000
[34m[1mwandb[0m: 	ssl_method: label_propagation
[34m[1mwandb[0m: 	st_max_iter: 15
[34m[1mwandb[0m: 	st_threshold: 0.810195290671661
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.0433 (+-93.77%), ROC-AUC: 0.6940


0,1
improvement_over_baseline_pct,‚ñÅ
n_labeled,‚ñÅ
n_unlabeled_used,‚ñÅ
val_accuracy,‚ñÅ
val_f1,‚ñÅ
val_precision,‚ñÅ
val_recall,‚ñÅ
val_roc_auc,‚ñÅ

0,1
improvement_over_baseline_pct,-93.76596
n_labeled,3962.0
n_unlabeled_used,15000.0
val_accuracy,0.50559
val_f1,0.04328
val_precision,0.57576
val_recall,0.02249
val_roc_auc,0.69403


[34m[1mwandb[0m: Agent Starting Run: 1jdwru1t with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.9274203505702496
[34m[1mwandb[0m: 	ct_max_iterations: 20
[34m[1mwandb[0m: 	ct_samples_per_iter: 100
[34m[1mwandb[0m: 	lp_gamma: 0.018823214825294428
[34m[1mwandb[0m: 	lp_max_iter: 1000
[34m[1mwandb[0m: 	ls_alpha: 0.35684763429962973
[34m[1mwandb[0m: 	n_unlabeled: 15000
[34m[1mwandb[0m: 	ssl_method: self_training
[34m[1mwandb[0m: 	st_max_iter: 15
[34m[1mwandb[0m: 	st_threshold: 0.8130927864842538
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


  warn(


F1: 0.6607 (+-4.83%), ROC-AUC: 0.7486


0,1
improvement_over_baseline_pct,‚ñÅ
n_labeled,‚ñÅ
n_unlabeled_used,‚ñÅ
val_accuracy,‚ñÅ
val_f1,‚ñÅ
val_precision,‚ñÅ
val_recall,‚ñÅ
val_roc_auc,‚ñÅ

0,1
improvement_over_baseline_pct,-4.83292
n_labeled,3962.0
n_unlabeled_used,15000.0
val_accuracy,0.68746
val_f1,0.6607
val_precision,0.71806
val_recall,0.61183
val_roc_auc,0.74859


[34m[1mwandb[0m: Agent Starting Run: 60amkfqz with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.9401491347868554
[34m[1mwandb[0m: 	ct_max_iterations: 20
[34m[1mwandb[0m: 	ct_samples_per_iter: 100
[34m[1mwandb[0m: 	lp_gamma: 0.024221067065035393
[34m[1mwandb[0m: 	lp_max_iter: 1500
[34m[1mwandb[0m: 	ls_alpha: 0.3791163132193136
[34m[1mwandb[0m: 	n_unlabeled: 20000
[34m[1mwandb[0m: 	ssl_method: co_training
[34m[1mwandb[0m: 	st_max_iter: 15
[34m[1mwandb[0m: 	st_threshold: 0.7468058068097756
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.6193 (+-10.80%), ROC-AUC: 0.7474


0,1
improvement_over_baseline_pct,‚ñÅ
n_labeled,‚ñÅ
n_unlabeled_used,‚ñÅ
val_accuracy,‚ñÅ
val_f1,‚ñÅ
val_precision,‚ñÅ
val_recall,‚ñÅ
val_roc_auc,‚ñÅ

0,1
improvement_over_baseline_pct,-10.79507
n_labeled,3962.0
n_unlabeled_used,20000.0
val_accuracy,0.6751
val_f1,0.61931
val_precision,0.74215
val_recall,0.53136
val_roc_auc,0.74745


[34m[1mwandb[0m: Agent Starting Run: 0i8z1t81 with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.9187729824499006
[34m[1mwandb[0m: 	ct_max_iterations: 10
[34m[1mwandb[0m: 	ct_samples_per_iter: 50
[34m[1mwandb[0m: 	lp_gamma: 0.015103522329313408
[34m[1mwandb[0m: 	lp_max_iter: 1000
[34m[1mwandb[0m: 	ls_alpha: 0.34099419539217
[34m[1mwandb[0m: 	n_unlabeled: 10000
[34m[1mwandb[0m: 	ssl_method: co_training
[34m[1mwandb[0m: 	st_max_iter: 15
[34m[1mwandb[0m: 	st_threshold: 0.8026524023693264
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.6564 (+-5.46%), ROC-AUC: 0.7561


0,1
improvement_over_baseline_pct,‚ñÅ
n_labeled,‚ñÅ
n_unlabeled_used,‚ñÅ
val_accuracy,‚ñÅ
val_f1,‚ñÅ
val_precision,‚ñÅ
val_recall,‚ñÅ
val_roc_auc,‚ñÅ

0,1
improvement_over_baseline_pct,-5.45693
n_labeled,3962.0
n_unlabeled_used,10000.0
val_accuracy,0.6857
val_f1,0.65637
val_precision,0.71932
val_recall,0.60355
val_roc_auc,0.75609


[34m[1mwandb[0m: Agent Starting Run: 1oj5xi4j with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.9042218902097912
[34m[1mwandb[0m: 	ct_max_iterations: 15
[34m[1mwandb[0m: 	ct_samples_per_iter: 100
[34m[1mwandb[0m: 	lp_gamma: 0.039700186699607246
[34m[1mwandb[0m: 	lp_max_iter: 500
[34m[1mwandb[0m: 	ls_alpha: 0.45106426165717695
[34m[1mwandb[0m: 	n_unlabeled: 15000
[34m[1mwandb[0m: 	ssl_method: self_training
[34m[1mwandb[0m: 	st_max_iter: 15
[34m[1mwandb[0m: 	st_threshold: 0.7744701021711735
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


  warn(


F1: 0.6755 (+-2.70%), ROC-AUC: 0.7505


0,1
improvement_over_baseline_pct,‚ñÅ
n_labeled,‚ñÅ
n_unlabeled_used,‚ñÅ
val_accuracy,‚ñÅ
val_f1,‚ñÅ
val_precision,‚ñÅ
val_recall,‚ñÅ
val_roc_auc,‚ñÅ

0,1
improvement_over_baseline_pct,-2.70083
n_labeled,3962.0
n_unlabeled_used,15000.0
val_accuracy,0.69747
val_f1,0.67551
val_precision,0.72395
val_recall,0.63314
val_roc_auc,0.75046


[34m[1mwandb[0m: Agent Starting Run: gbhcjxs7 with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.8944823423975479
[34m[1mwandb[0m: 	ct_max_iterations: 20
[34m[1mwandb[0m: 	ct_samples_per_iter: 100
[34m[1mwandb[0m: 	lp_gamma: 0.013657196805885305
[34m[1mwandb[0m: 	lp_max_iter: 1000
[34m[1mwandb[0m: 	ls_alpha: 0.23856411843450936
[34m[1mwandb[0m: 	n_unlabeled: 20000
[34m[1mwandb[0m: 	ssl_method: co_training
[34m[1mwandb[0m: 	st_max_iter: 10
[34m[1mwandb[0m: 	st_threshold: 0.7967587002479635
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.6137 (+-11.61%), ROC-AUC: 0.7478


0,1
improvement_over_baseline_pct,‚ñÅ
n_labeled,‚ñÅ
n_unlabeled_used,‚ñÅ
val_accuracy,‚ñÅ
val_f1,‚ñÅ
val_precision,‚ñÅ
val_recall,‚ñÅ
val_roc_auc,‚ñÅ

0,1
improvement_over_baseline_pct,-11.60556
n_labeled,3962.0
n_unlabeled_used,20000.0
val_accuracy,0.67098
val_f1,0.61368
val_precision,0.73754
val_recall,0.52544
val_roc_auc,0.74784


[34m[1mwandb[0m: Agent Starting Run: gh3q8d37 with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.9100121027258068
[34m[1mwandb[0m: 	ct_max_iterations: 15
[34m[1mwandb[0m: 	ct_samples_per_iter: 100
[34m[1mwandb[0m: 	lp_gamma: 0.09272580995675594
[34m[1mwandb[0m: 	lp_max_iter: 1000
[34m[1mwandb[0m: 	ls_alpha: 0.37691764357643576
[34m[1mwandb[0m: 	n_unlabeled: 20000
[34m[1mwandb[0m: 	ssl_method: co_training
[34m[1mwandb[0m: 	st_max_iter: 15
[34m[1mwandb[0m: 	st_threshold: 0.7483760582829655
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.6364 (+-8.34%), ROC-AUC: 0.7525


0,1
improvement_over_baseline_pct,‚ñÅ
n_labeled,‚ñÅ
n_unlabeled_used,‚ñÅ
val_accuracy,‚ñÅ
val_f1,‚ñÅ
val_precision,‚ñÅ
val_recall,‚ñÅ
val_roc_auc,‚ñÅ

0,1
improvement_over_baseline_pct,-8.33873
n_labeled,3962.0
n_unlabeled_used,20000.0
val_accuracy,0.68452
val_f1,0.63636
val_precision,0.74563
val_recall,0.55503
val_roc_auc,0.75253


[34m[1mwandb[0m: Agent Starting Run: vd91tp9j with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.9219488122672106
[34m[1mwandb[0m: 	ct_max_iterations: 10
[34m[1mwandb[0m: 	ct_samples_per_iter: 100
[34m[1mwandb[0m: 	lp_gamma: 0.002982050026352357
[34m[1mwandb[0m: 	lp_max_iter: 1000
[34m[1mwandb[0m: 	ls_alpha: 0.19912789229433808
[34m[1mwandb[0m: 	n_unlabeled: 15000
[34m[1mwandb[0m: 	ssl_method: co_training
[34m[1mwandb[0m: 	st_max_iter: 15
[34m[1mwandb[0m: 	st_threshold: 0.7591515575530008
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.6481 (+-6.64%), ROC-AUC: 0.7518


0,1
improvement_over_baseline_pct,‚ñÅ
n_labeled,‚ñÅ
n_unlabeled_used,‚ñÅ
val_accuracy,‚ñÅ
val_f1,‚ñÅ
val_precision,‚ñÅ
val_recall,‚ñÅ
val_roc_auc,‚ñÅ

0,1
improvement_over_baseline_pct,-6.64129
n_labeled,3962.0
n_unlabeled_used,15000.0
val_accuracy,0.68687
val_f1,0.64815
val_precision,0.73463
val_recall,0.57988
val_roc_auc,0.75176


[34m[1mwandb[0m: Sweep Agent: Waiting for job.
[34m[1mwandb[0m: Job received.
[34m[1mwandb[0m: Agent Starting Run: zu0ayeh2 with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.9127806925570022
[34m[1mwandb[0m: 	ct_max_iterations: 20
[34m[1mwandb[0m: 	ct_samples_per_iter: 100
[34m[1mwandb[0m: 	lp_gamma: 0.05981491059086611
[34m[1mwandb[0m: 	lp_max_iter: 500
[34m[1mwandb[0m: 	ls_alpha: 0.18561838134425496
[34m[1mwandb[0m: 	n_unlabeled: 15000
[34m[1mwandb[0m: 	ssl_method: co_training
[34m[1mwandb[0m: 	st_max_iter: 15
[34m[1mwandb[0m: 	st_threshold: 0.7579930253476992
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.6208 (+-10.58%), ROC-AUC: 0.7479


0,1
improvement_over_baseline_pct,‚ñÅ
n_labeled,‚ñÅ
n_unlabeled_used,‚ñÅ
val_accuracy,‚ñÅ
val_f1,‚ñÅ
val_precision,‚ñÅ
val_recall,‚ñÅ
val_roc_auc,‚ñÅ

0,1
improvement_over_baseline_pct,-10.58272
n_labeled,3962.0
n_unlabeled_used,15000.0
val_accuracy,0.67569
val_f1,0.62078
val_precision,0.74178
val_recall,0.53373
val_roc_auc,0.74795


[34m[1mwandb[0m: Agent Starting Run: cnoh4uwo with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.8325155831454075
[34m[1mwandb[0m: 	ct_max_iterations: 15
[34m[1mwandb[0m: 	ct_samples_per_iter: 100
[34m[1mwandb[0m: 	lp_gamma: 0.019263983335880978
[34m[1mwandb[0m: 	lp_max_iter: 1000
[34m[1mwandb[0m: 	ls_alpha: 0.291909291277734
[34m[1mwandb[0m: 	n_unlabeled: 15000
[34m[1mwandb[0m: 	ssl_method: co_training
[34m[1mwandb[0m: 	st_max_iter: 15
[34m[1mwandb[0m: 	st_threshold: 0.7856441374391114
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.6306 (+-9.17%), ROC-AUC: 0.7491


0,1
improvement_over_baseline_pct,‚ñÅ
n_labeled,‚ñÅ
n_unlabeled_used,‚ñÅ
val_accuracy,‚ñÅ
val_f1,‚ñÅ
val_precision,‚ñÅ
val_recall,‚ñÅ
val_roc_auc,‚ñÅ

0,1
improvement_over_baseline_pct,-9.17153
n_labeled,3962.0
n_unlabeled_used,15000.0
val_accuracy,0.67863
val_f1,0.63058
val_precision,0.73618
val_recall,0.55148
val_roc_auc,0.74906


[34m[1mwandb[0m: Agent Starting Run: izvibbuw with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.8891644093548611
[34m[1mwandb[0m: 	ct_max_iterations: 15
[34m[1mwandb[0m: 	ct_samples_per_iter: 100
[34m[1mwandb[0m: 	lp_gamma: 0.03461638546162957
[34m[1mwandb[0m: 	lp_max_iter: 500
[34m[1mwandb[0m: 	ls_alpha: 0.210509263079128
[34m[1mwandb[0m: 	n_unlabeled: 20000
[34m[1mwandb[0m: 	ssl_method: co_training
[34m[1mwandb[0m: 	st_max_iter: 15
[34m[1mwandb[0m: 	st_threshold: 0.8271672090057792
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.6273 (+-9.65%), ROC-AUC: 0.7528


0,1
improvement_over_baseline_pct,‚ñÅ
n_labeled,‚ñÅ
n_unlabeled_used,‚ñÅ
val_accuracy,‚ñÅ
val_f1,‚ñÅ
val_precision,‚ñÅ
val_recall,‚ñÅ
val_roc_auc,‚ñÅ

0,1
improvement_over_baseline_pct,-9.64551
n_labeled,3962.0
n_unlabeled_used,20000.0
val_accuracy,0.67687
val_f1,0.62729
val_precision,0.73567
val_recall,0.54675
val_roc_auc,0.75282


[34m[1mwandb[0m: Sweep Agent: Waiting for job.
[34m[1mwandb[0m: Job received.
[34m[1mwandb[0m: Agent Starting Run: 3sygtyx5 with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.9145493937981948
[34m[1mwandb[0m: 	ct_max_iterations: 20
[34m[1mwandb[0m: 	ct_samples_per_iter: 100
[34m[1mwandb[0m: 	lp_gamma: 0.10533243049323752
[34m[1mwandb[0m: 	lp_max_iter: 1000
[34m[1mwandb[0m: 	ls_alpha: 0.4113143680464893
[34m[1mwandb[0m: 	n_unlabeled: 10000
[34m[1mwandb[0m: 	ssl_method: co_training
[34m[1mwandb[0m: 	st_max_iter: 15
[34m[1mwandb[0m: 	st_threshold: 0.8313281827765513
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.6265 (+-9.76%), ROC-AUC: 0.7428


0,1
improvement_over_baseline_pct,‚ñÅ
n_labeled,‚ñÅ
n_unlabeled_used,‚ñÅ
val_accuracy,‚ñÅ
val_f1,‚ñÅ
val_precision,‚ñÅ
val_recall,‚ñÅ
val_roc_auc,‚ñÅ

0,1
improvement_over_baseline_pct,-9.75626
n_labeled,3962.0
n_unlabeled_used,10000.0
val_accuracy,0.6751
val_f1,0.62652
val_precision,0.73144
val_recall,0.54793
val_roc_auc,0.74283


[34m[1mwandb[0m: Agent Starting Run: z6c6m2yp with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.876223533227344
[34m[1mwandb[0m: 	ct_max_iterations: 20
[34m[1mwandb[0m: 	ct_samples_per_iter: 100
[34m[1mwandb[0m: 	lp_gamma: 0.006617307693387721
[34m[1mwandb[0m: 	lp_max_iter: 500
[34m[1mwandb[0m: 	ls_alpha: 0.7042153482666269
[34m[1mwandb[0m: 	n_unlabeled: 20000
[34m[1mwandb[0m: 	ssl_method: co_training
[34m[1mwandb[0m: 	st_max_iter: 15
[34m[1mwandb[0m: 	st_threshold: 0.7330829630529203
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.6160 (+-11.27%), ROC-AUC: 0.7483


Run summary:
improvement_over_baseline_pct,-11.2687
n_labeled,3962.0
n_unlabeled_used,20000.0
val_accuracy,0.67275
val_f1,0.61602
val_precision,0.73964
val_recall,0.52781
val_roc_auc,0.74831


[34m[1mwandb[0m: Agent Starting Run: rk1cf7ee with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.931749845223282
[34m[1mwandb[0m: 	ct_max_iterations: 10
[34m[1mwandb[0m: 	ct_samples_per_iter: 100
[34m[1mwandb[0m: 	lp_gamma: 0.06786904277047197
[34m[1mwandb[0m: 	lp_max_iter: 500
[34m[1mwandb[0m: 	ls_alpha: 0.24540179374039647
[34m[1mwandb[0m: 	n_unlabeled: 10000
[34m[1mwandb[0m: 	ssl_method: co_training
[34m[1mwandb[0m: 	st_max_iter: 15
[34m[1mwandb[0m: 	st_threshold: 0.8150691707687421
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.6510 (+-6.23%), ROC-AUC: 0.7532


Run summary:
improvement_over_baseline_pct,-6.23334
n_labeled,3962.0
n_unlabeled_used,10000.0
val_accuracy,0.6857
val_f1,0.65098
val_precision,0.72701
val_recall,0.58935
val_roc_auc,0.75322


[34m[1mwandb[0m: Agent Starting Run: s6s3uq3q with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.8776898550489349
[34m[1mwandb[0m: 	ct_max_iterations: 20
[34m[1mwandb[0m: 	ct_samples_per_iter: 25
[34m[1mwandb[0m: 	lp_gamma: 0.01701471802521883
[34m[1mwandb[0m: 	lp_max_iter: 500
[34m[1mwandb[0m: 	ls_alpha: 0.615565399700582
[34m[1mwandb[0m: 	n_unlabeled: 10000
[34m[1mwandb[0m: 	ssl_method: co_training
[34m[1mwandb[0m: 	st_max_iter: 15
[34m[1mwandb[0m: 	st_threshold: 0.7171063156932782
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.6679 (+-3.79%), ROC-AUC: 0.7571


Run summary:
improvement_over_baseline_pct,-3.78959
n_labeled,3962.0
n_unlabeled_used,10000.0
val_accuracy,0.69453
val_f1,0.66795
val_precision,0.72702
val_recall,0.61775
val_roc_auc,0.7571


[34m[1mwandb[0m: Sweep Agent: Waiting for job.
[34m[1mwandb[0m: Job received.
[34m[1mwandb[0m: Agent Starting Run: g55kc1nq with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.8804991485909321
[34m[1mwandb[0m: 	ct_max_iterations: 20
[34m[1mwandb[0m: 	ct_samples_per_iter: 100
[34m[1mwandb[0m: 	lp_gamma: 0.025242685560498003
[34m[1mwandb[0m: 	lp_max_iter: 500
[34m[1mwandb[0m: 	ls_alpha: 0.3613234044758711
[34m[1mwandb[0m: 	n_unlabeled: 10000
[34m[1mwandb[0m: 	ssl_method: co_training
[34m[1mwandb[0m: 	st_max_iter: 15
[34m[1mwandb[0m: 	st_threshold: 0.8153010602506132
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.6286 (+-9.46%), ROC-AUC: 0.7453


Run summary:
improvement_over_baseline_pct,-9.46111
n_labeled,3962.0
n_unlabeled_used,10000.0
val_accuracy,0.67863
val_f1,0.62857
val_precision,0.7392
val_recall,0.54675
val_roc_auc,0.74534


[34m[1mwandb[0m: Sweep Agent: Waiting for job.
[34m[1mwandb[0m: Job received.
[34m[1mwandb[0m: Agent Starting Run: 5rvhfsg4 with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.9322492675052356
[34m[1mwandb[0m: 	ct_max_iterations: 20
[34m[1mwandb[0m: 	ct_samples_per_iter: 25
[34m[1mwandb[0m: 	lp_gamma: 0.004728582766373145
[34m[1mwandb[0m: 	lp_max_iter: 1000
[34m[1mwandb[0m: 	ls_alpha: 0.8919109223815357
[34m[1mwandb[0m: 	n_unlabeled: 10000
[34m[1mwandb[0m: 	ssl_method: co_training
[34m[1mwandb[0m: 	st_max_iter: 15
[34m[1mwandb[0m: 	st_threshold: 0.7936599455972843
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.6585 (+-5.14%), ROC-AUC: 0.7583


Run summary:
improvement_over_baseline_pct,-5.14495
n_labeled,3962.0
n_unlabeled_used,10000.0
val_accuracy,0.68687
val_f1,0.65854
val_precision,0.7195
val_recall,0.6071
val_roc_auc,0.75827


[34m[1mwandb[0m: Sweep Agent: Waiting for job.
[34m[1mwandb[0m: Job received.
[34m[1mwandb[0m: Agent Starting Run: 4rch3arq with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.9466221886994172
[34m[1mwandb[0m: 	ct_max_iterations: 20
[34m[1mwandb[0m: 	ct_samples_per_iter: 25
[34m[1mwandb[0m: 	lp_gamma: 0.016381252915475677
[34m[1mwandb[0m: 	lp_max_iter: 1000
[34m[1mwandb[0m: 	ls_alpha: 0.4332446345815385
[34m[1mwandb[0m: 	n_unlabeled: 5000
[34m[1mwandb[0m: 	ssl_method: co_training
[34m[1mwandb[0m: 	st_max_iter: 15
[34m[1mwandb[0m: 	st_threshold: 0.7327022352081367
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.6687 (+-3.67%), ROC-AUC: 0.7510


Run summary:
improvement_over_baseline_pct,-3.67382
n_labeled,3962.0
n_unlabeled_used,5000.0
val_accuracy,0.68805
val_f1,0.66875
val_precision,0.70861
val_recall,0.63314
val_roc_auc,0.751


[34m[1mwandb[0m: Agent Starting Run: aw85xubf with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.8695444676514805
[34m[1mwandb[0m: 	ct_max_iterations: 20
[34m[1mwandb[0m: 	ct_samples_per_iter: 25
[34m[1mwandb[0m: 	lp_gamma: 0.013194947492792304
[34m[1mwandb[0m: 	lp_max_iter: 1000
[34m[1mwandb[0m: 	ls_alpha: 0.7923099852983784
[34m[1mwandb[0m: 	n_unlabeled: 10000
[34m[1mwandb[0m: 	ssl_method: co_training
[34m[1mwandb[0m: 	st_max_iter: 10
[34m[1mwandb[0m: 	st_threshold: 0.7953990347352959
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.6679 (+-3.79%), ROC-AUC: 0.7571


Run summary:
improvement_over_baseline_pct,-3.78959
n_labeled,3962.0
n_unlabeled_used,10000.0
val_accuracy,0.69453
val_f1,0.66795
val_precision,0.72702
val_recall,0.61775
val_roc_auc,0.7571


[34m[1mwandb[0m: Sweep Agent: Waiting for job.
[34m[1mwandb[0m: Job received.
[34m[1mwandb[0m: Agent Starting Run: s2syshrh with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.9029370351170316
[34m[1mwandb[0m: 	ct_max_iterations: 20
[34m[1mwandb[0m: 	ct_samples_per_iter: 50
[34m[1mwandb[0m: 	lp_gamma: 0.003036355494209425
[34m[1mwandb[0m: 	lp_max_iter: 500
[34m[1mwandb[0m: 	ls_alpha: 0.5624424648034156
[34m[1mwandb[0m: 	n_unlabeled: 10000
[34m[1mwandb[0m: 	ssl_method: co_training
[34m[1mwandb[0m: 	st_max_iter: 10
[34m[1mwandb[0m: 	st_threshold: 0.7437963365993252
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.6411 (+-7.66%), ROC-AUC: 0.7502


Run summary:
improvement_over_baseline_pct,-7.65501
n_labeled,3962.0
n_unlabeled_used,10000.0
val_accuracy,0.6804
val_f1,0.64111
val_precision,0.72605
val_recall,0.57396
val_roc_auc,0.75017


[34m[1mwandb[0m: Agent Starting Run: zj4o0oue with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.91134719145389
[34m[1mwandb[0m: 	ct_max_iterations: 15
[34m[1mwandb[0m: 	ct_samples_per_iter: 25
[34m[1mwandb[0m: 	lp_gamma: 0.0031697182693782755
[34m[1mwandb[0m: 	lp_max_iter: 500
[34m[1mwandb[0m: 	ls_alpha: 0.8162121699952548
[34m[1mwandb[0m: 	n_unlabeled: 10000
[34m[1mwandb[0m: 	ssl_method: co_training
[34m[1mwandb[0m: 	st_max_iter: 15
[34m[1mwandb[0m: 	st_threshold: 0.7701510296316553
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.6713 (+-3.30%), ROC-AUC: 0.7576


Run summary:
improvement_over_baseline_pct,-3.30239
n_labeled,3962.0
n_unlabeled_used,10000.0
val_accuracy,0.6957
val_f1,0.67133
val_precision,0.72527
val_recall,0.62485
val_roc_auc,0.75763


[34m[1mwandb[0m: Agent Starting Run: 9ma2njeu with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.7704818735014858
[34m[1mwandb[0m: 	ct_max_iterations: 20
[34m[1mwandb[0m: 	ct_samples_per_iter: 25
[34m[1mwandb[0m: 	lp_gamma: 0.010885420838217894
[34m[1mwandb[0m: 	lp_max_iter: 500
[34m[1mwandb[0m: 	ls_alpha: 0.8799408714188175
[34m[1mwandb[0m: 	n_unlabeled: 10000
[34m[1mwandb[0m: 	ssl_method: co_training
[34m[1mwandb[0m: 	st_max_iter: 15
[34m[1mwandb[0m: 	st_threshold: 0.7238293801733999
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.6679 (+-3.79%), ROC-AUC: 0.7571


Run summary:
improvement_over_baseline_pct,-3.78959
n_labeled,3962.0
n_unlabeled_used,10000.0
val_accuracy,0.69453
val_f1,0.66795
val_precision,0.72702
val_recall,0.61775
val_roc_auc,0.7571


[34m[1mwandb[0m: Agent Starting Run: trzbjj3q with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.8414170970225595
[34m[1mwandb[0m: 	ct_max_iterations: 15
[34m[1mwandb[0m: 	ct_samples_per_iter: 25
[34m[1mwandb[0m: 	lp_gamma: 0.0029441211232512066
[34m[1mwandb[0m: 	lp_max_iter: 1000
[34m[1mwandb[0m: 	ls_alpha: 0.7141489024067976
[34m[1mwandb[0m: 	n_unlabeled: 10000
[34m[1mwandb[0m: 	ssl_method: co_training
[34m[1mwandb[0m: 	st_max_iter: 15
[34m[1mwandb[0m: 	st_threshold: 0.7267368291115086
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.6713 (+-3.30%), ROC-AUC: 0.7576


Run summary:
improvement_over_baseline_pct,-3.30239
n_labeled,3962.0
n_unlabeled_used,10000.0
val_accuracy,0.6957
val_f1,0.67133
val_precision,0.72527
val_recall,0.62485
val_roc_auc,0.75763


[34m[1mwandb[0m: Agent Starting Run: gb6bk8gs with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.8569515546771789
[34m[1mwandb[0m: 	ct_max_iterations: 20
[34m[1mwandb[0m: 	ct_samples_per_iter: 25
[34m[1mwandb[0m: 	lp_gamma: 0.002563007204050114
[34m[1mwandb[0m: 	lp_max_iter: 500
[34m[1mwandb[0m: 	ls_alpha: 0.7809667864394301
[34m[1mwandb[0m: 	n_unlabeled: 15000
[34m[1mwandb[0m: 	ssl_method: co_training
[34m[1mwandb[0m: 	st_max_iter: 15
[34m[1mwandb[0m: 	st_threshold: 0.7524054163885614
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.6680 (+-3.79%), ROC-AUC: 0.7588


Run summary:
improvement_over_baseline_pct,-3.78888
n_labeled,3962.0
n_unlabeled_used,15000.0
val_accuracy,0.6957
val_f1,0.66795
val_precision,0.73034
val_recall,0.61538
val_roc_auc,0.75879


[34m[1mwandb[0m: Agent Starting Run: 1so9kqc8 with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.8869114971842861
[34m[1mwandb[0m: 	ct_max_iterations: 20
[34m[1mwandb[0m: 	ct_samples_per_iter: 50
[34m[1mwandb[0m: 	lp_gamma: 0.01717290124350417
[34m[1mwandb[0m: 	lp_max_iter: 1000
[34m[1mwandb[0m: 	ls_alpha: 0.8570586149500391
[34m[1mwandb[0m: 	n_unlabeled: 5000
[34m[1mwandb[0m: 	ssl_method: self_training
[34m[1mwandb[0m: 	st_max_iter: 15
[34m[1mwandb[0m: 	st_threshold: 0.7278721539506928
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.



F1: 0.6738 (+-2.94%), ROC-AUC: 0.7510


Run summary:
improvement_over_baseline_pct,-2.94397
n_labeled,3962.0
n_unlabeled_used,5000.0
val_accuracy,0.6957
val_f1,0.67382
val_precision,0.72162
val_recall,0.63195
val_roc_auc,0.751


[34m[1mwandb[0m: Agent Starting Run: 5jkdgltz with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.8417174187711627
[34m[1mwandb[0m: 	ct_max_iterations: 20
[34m[1mwandb[0m: 	ct_samples_per_iter: 100
[34m[1mwandb[0m: 	lp_gamma: 0.015347241382432672
[34m[1mwandb[0m: 	lp_max_iter: 500
[34m[1mwandb[0m: 	ls_alpha: 0.8078758952708996
[34m[1mwandb[0m: 	n_unlabeled: 5000
[34m[1mwandb[0m: 	ssl_method: co_training
[34m[1mwandb[0m: 	st_max_iter: 15
[34m[1mwandb[0m: 	st_threshold: 0.751751633008674
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.6483 (+-6.62%), ROC-AUC: 0.7423


Run summary:
improvement_over_baseline_pct,-6.61509
n_labeled,3962.0
n_unlabeled_used,5000.0
val_accuracy,0.68393
val_f1,0.64833
val_precision,0.72581
val_recall,0.5858
val_roc_auc,0.7423


[34m[1mwandb[0m: Agent Starting Run: 4nx01u4u with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.8507836148392646
[34m[1mwandb[0m: 	ct_max_iterations: 20
[34m[1mwandb[0m: 	ct_samples_per_iter: 25
[34m[1mwandb[0m: 	lp_gamma: 0.09149568684421648
[34m[1mwandb[0m: 	lp_max_iter: 1000
[34m[1mwandb[0m: 	ls_alpha: 0.5518932209489144
[34m[1mwandb[0m: 	n_unlabeled: 5000
[34m[1mwandb[0m: 	ssl_method: co_training
[34m[1mwandb[0m: 	st_max_iter: 15
[34m[1mwandb[0m: 	st_threshold: 0.7355251225177036
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.6675 (+-3.85%), ROC-AUC: 0.7532


Run summary:
improvement_over_baseline_pct,-3.85227
n_labeled,3962.0
n_unlabeled_used,5000.0
val_accuracy,0.69099
val_f1,0.66751
val_precision,0.71798
val_recall,0.62367
val_roc_auc,0.7532


[34m[1mwandb[0m: Sweep Agent: Waiting for job.
[34m[1mwandb[0m: Job received.
[34m[1mwandb[0m: Agent Starting Run: g6rcw6sl with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.9356447887375116
[34m[1mwandb[0m: 	ct_max_iterations: 20
[34m[1mwandb[0m: 	ct_samples_per_iter: 25
[34m[1mwandb[0m: 	lp_gamma: 0.003958629578801643
[34m[1mwandb[0m: 	lp_max_iter: 1000
[34m[1mwandb[0m: 	ls_alpha: 0.83064942089588
[34m[1mwandb[0m: 	n_unlabeled: 10000
[34m[1mwandb[0m: 	ssl_method: co_training
[34m[1mwandb[0m: 	st_max_iter: 15
[34m[1mwandb[0m: 	st_threshold: 0.7618300778713113
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.6542 (+-5.77%), ROC-AUC: 0.7606


Run summary:
improvement_over_baseline_pct,-5.774
n_labeled,3962.0
n_unlabeled_used,10000.0
val_accuracy,0.68511
val_f1,0.65417
val_precision,0.7208
val_recall,0.59882
val_roc_auc,0.76059


[34m[1mwandb[0m: Agent Starting Run: 3qzmchkt with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.7614336739934404
[34m[1mwandb[0m: 	ct_max_iterations: 20
[34m[1mwandb[0m: 	ct_samples_per_iter: 25
[34m[1mwandb[0m: 	lp_gamma: 0.001127565084556136
[34m[1mwandb[0m: 	lp_max_iter: 500
[34m[1mwandb[0m: 	ls_alpha: 0.7125461490560764
[34m[1mwandb[0m: 	n_unlabeled: 5000
[34m[1mwandb[0m: 	ssl_method: co_training
[34m[1mwandb[0m: 	st_max_iter: 15
[34m[1mwandb[0m: 	st_threshold: 0.7264506058793171
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.6675 (+-3.85%), ROC-AUC: 0.7532


Run summary:
improvement_over_baseline_pct,-3.85227
n_labeled,3962.0
n_unlabeled_used,5000.0
val_accuracy,0.69099
val_f1,0.66751
val_precision,0.71798
val_recall,0.62367
val_roc_auc,0.7532


[34m[1mwandb[0m: Sweep Agent: Waiting for job.
[34m[1mwandb[0m: Job received.
[34m[1mwandb[0m: Agent Starting Run: t6ggijjn with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.8379088365862236
[34m[1mwandb[0m: 	ct_max_iterations: 20
[34m[1mwandb[0m: 	ct_samples_per_iter: 25
[34m[1mwandb[0m: 	lp_gamma: 0.1452786485751462
[34m[1mwandb[0m: 	lp_max_iter: 500
[34m[1mwandb[0m: 	ls_alpha: 0.7640387165435194
[34m[1mwandb[0m: 	n_unlabeled: 5000
[34m[1mwandb[0m: 	ssl_method: co_training
[34m[1mwandb[0m: 	st_max_iter: 10
[34m[1mwandb[0m: 	st_threshold: 0.7610865884440583
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.6675 (+-3.85%), ROC-AUC: 0.7532


Run summary:
improvement_over_baseline_pct,-3.85227
n_labeled,3962.0
n_unlabeled_used,5000.0
val_accuracy,0.69099
val_f1,0.66751
val_precision,0.71798
val_recall,0.62367
val_roc_auc,0.7532


[34m[1mwandb[0m: Agent Starting Run: ckyah18m with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.844828891401371
[34m[1mwandb[0m: 	ct_max_iterations: 20
[34m[1mwandb[0m: 	ct_samples_per_iter: 25
[34m[1mwandb[0m: 	lp_gamma: 0.10454629410513674
[34m[1mwandb[0m: 	lp_max_iter: 500
[34m[1mwandb[0m: 	ls_alpha: 0.8658698618318117
[34m[1mwandb[0m: 	n_unlabeled: 10000
[34m[1mwandb[0m: 	ssl_method: self_training
[34m[1mwandb[0m: 	st_max_iter: 15
[34m[1mwandb[0m: 	st_threshold: 0.7815693841588727
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.



F1: 0.6805 (+-1.98%), ROC-AUC: 0.7541


Run summary:
improvement_over_baseline_pct,-1.97714
n_labeled,3962.0
n_unlabeled_used,10000.0
val_accuracy,0.70159
val_f1,0.68053
val_precision,0.72776
val_recall,0.63905
val_roc_auc,0.75415


[34m[1mwandb[0m: Agent Starting Run: lynpw3r0 with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.8758372285406857
[34m[1mwandb[0m: 	ct_max_iterations: 20
[34m[1mwandb[0m: 	ct_samples_per_iter: 25
[34m[1mwandb[0m: 	lp_gamma: 0.05248782785272144
[34m[1mwandb[0m: 	lp_max_iter: 500
[34m[1mwandb[0m: 	ls_alpha: 0.8091383666413768
[34m[1mwandb[0m: 	n_unlabeled: 5000
[34m[1mwandb[0m: 	ssl_method: self_training
[34m[1mwandb[0m: 	st_max_iter: 10
[34m[1mwandb[0m: 	st_threshold: 0.7046430425821361
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.



F1: 0.6717 (+-3.25%), ROC-AUC: 0.7550


Run summary:
improvement_over_baseline_pct,-3.25054
n_labeled,3962.0
n_unlabeled_used,5000.0
val_accuracy,0.69217
val_f1,0.67169
val_precision,0.71524
val_recall,0.63314
val_roc_auc,0.75497


[34m[1mwandb[0m: Sweep Agent: Waiting for job.
[34m[1mwandb[0m: Job received.
[34m[1mwandb[0m: Agent Starting Run: g6bweh9q with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.8824635053155536
[34m[1mwandb[0m: 	ct_max_iterations: 20
[34m[1mwandb[0m: 	ct_samples_per_iter: 25
[34m[1mwandb[0m: 	lp_gamma: 0.003284732408366257
[34m[1mwandb[0m: 	lp_max_iter: 500
[34m[1mwandb[0m: 	ls_alpha: 0.2850494115552069
[34m[1mwandb[0m: 	n_unlabeled: 5000
[34m[1mwandb[0m: 	ssl_method: self_training
[34m[1mwandb[0m: 	st_max_iter: 15
[34m[1mwandb[0m: 	st_threshold: 0.7427945142988952
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.



F1: 0.6717 (+-3.25%), ROC-AUC: 0.7513


Run summary:
improvement_over_baseline_pct,-3.24781
n_labeled,3962.0
n_unlabeled_used,5000.0
val_accuracy,0.69335
val_f1,0.67171
val_precision,0.71833
val_recall,0.63077
val_roc_auc,0.75133


[34m[1mwandb[0m: Agent Starting Run: vtza8b84 with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.8014404602123377
[34m[1mwandb[0m: 	ct_max_iterations: 20
[34m[1mwandb[0m: 	ct_samples_per_iter: 25
[34m[1mwandb[0m: 	lp_gamma: 0.0013033233571727889
[34m[1mwandb[0m: 	lp_max_iter: 1000
[34m[1mwandb[0m: 	ls_alpha: 0.8454350016652473
[34m[1mwandb[0m: 	n_unlabeled: 10000
[34m[1mwandb[0m: 	ssl_method: co_training
[34m[1mwandb[0m: 	st_max_iter: 15
[34m[1mwandb[0m: 	st_threshold: 0.7337627693700665
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.6679 (+-3.79%), ROC-AUC: 0.7571


Run summary:
improvement_over_baseline_pct,-3.78959
n_labeled,3962.0
n_unlabeled_used,10000.0
val_accuracy,0.69453
val_f1,0.66795
val_precision,0.72702
val_recall,0.61775
val_roc_auc,0.7571


[34m[1mwandb[0m: Agent Starting Run: e4bfzpxq with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.8827808420048159
[34m[1mwandb[0m: 	ct_max_iterations: 20
[34m[1mwandb[0m: 	ct_samples_per_iter: 50
[34m[1mwandb[0m: 	lp_gamma: 0.00203125388652222
[34m[1mwandb[0m: 	lp_max_iter: 500
[34m[1mwandb[0m: 	ls_alpha: 0.8560556285978594
[34m[1mwandb[0m: 	n_unlabeled: 5000
[34m[1mwandb[0m: 	ssl_method: self_training
[34m[1mwandb[0m: 	st_max_iter: 15
[34m[1mwandb[0m: 	st_threshold: 0.7551997731097095
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.



F1: 0.6867 (+-1.09%), ROC-AUC: 0.7567


Run summary:
improvement_over_baseline_pct,-1.0859
n_labeled,3962.0
n_unlabeled_used,5000.0
val_accuracy,0.70571
val_f1,0.68672
val_precision,0.72969
val_recall,0.64852
val_roc_auc,0.75674


[34m[1mwandb[0m: Agent Starting Run: lef94mr9 with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.8528043443129181
[34m[1mwandb[0m: 	ct_max_iterations: 20
[34m[1mwandb[0m: 	ct_samples_per_iter: 25
[34m[1mwandb[0m: 	lp_gamma: 0.0013721963713782094
[34m[1mwandb[0m: 	lp_max_iter: 500
[34m[1mwandb[0m: 	ls_alpha: 0.7301079571484154
[34m[1mwandb[0m: 	n_unlabeled: 10000
[34m[1mwandb[0m: 	ssl_method: co_training
[34m[1mwandb[0m: 	st_max_iter: 10
[34m[1mwandb[0m: 	st_threshold: 0.759808399677852
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.6679 (+-3.79%), ROC-AUC: 0.7571


Run summary:
improvement_over_baseline_pct,-3.78959
n_labeled,3962.0
n_unlabeled_used,10000.0
val_accuracy,0.69453
val_f1,0.66795
val_precision,0.72702
val_recall,0.61775
val_roc_auc,0.7571


[34m[1mwandb[0m: Agent Starting Run: qn4tfygy with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.7894866402272855
[34m[1mwandb[0m: 	ct_max_iterations: 20
[34m[1mwandb[0m: 	ct_samples_per_iter: 50
[34m[1mwandb[0m: 	lp_gamma: 0.004491994242959704
[34m[1mwandb[0m: 	lp_max_iter: 500
[34m[1mwandb[0m: 	ls_alpha: 0.8536620868223503
[34m[1mwandb[0m: 	n_unlabeled: 5000
[34m[1mwandb[0m: 	ssl_method: self_training
[34m[1mwandb[0m: 	st_max_iter: 15
[34m[1mwandb[0m: 	st_threshold: 0.7823515103809875
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.



F1: 0.6742 (+-2.88%), ROC-AUC: 0.7577


Run summary:
improvement_over_baseline_pct,-2.8827
n_labeled,3962.0
n_unlabeled_used,5000.0
val_accuracy,0.69629
val_f1,0.67424
val_precision,0.7226
val_recall,0.63195
val_roc_auc,0.75774


[34m[1mwandb[0m: Agent Starting Run: j1m2pci8 with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.8261696809248504
[34m[1mwandb[0m: 	ct_max_iterations: 20
[34m[1mwandb[0m: 	ct_samples_per_iter: 25
[34m[1mwandb[0m: 	lp_gamma: 0.001105065164514458
[34m[1mwandb[0m: 	lp_max_iter: 500
[34m[1mwandb[0m: 	ls_alpha: 0.7308798014776854
[34m[1mwandb[0m: 	n_unlabeled: 5000
[34m[1mwandb[0m: 	ssl_method: co_training
[34m[1mwandb[0m: 	st_max_iter: 15
[34m[1mwandb[0m: 	st_threshold: 0.7009694852206234
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.6675 (+-3.85%), ROC-AUC: 0.7532


Run summary:
improvement_over_baseline_pct,-3.85227
n_labeled,3962.0
n_unlabeled_used,5000.0
val_accuracy,0.69099
val_f1,0.66751
val_precision,0.71798
val_recall,0.62367
val_roc_auc,0.7532


[34m[1mwandb[0m: Agent Starting Run: cfal2gar with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.8409983333866059
[34m[1mwandb[0m: 	ct_max_iterations: 15
[34m[1mwandb[0m: 	ct_samples_per_iter: 25
[34m[1mwandb[0m: 	lp_gamma: 0.13584740184618008
[34m[1mwandb[0m: 	lp_max_iter: 500
[34m[1mwandb[0m: 	ls_alpha: 0.743774477076915
[34m[1mwandb[0m: 	n_unlabeled: 5000
[34m[1mwandb[0m: 	ssl_method: co_training
[34m[1mwandb[0m: 	st_max_iter: 10
[34m[1mwandb[0m: 	st_threshold: 0.7871730592366486
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.6675 (+-3.85%), ROC-AUC: 0.7575


Run summary:
improvement_over_baseline_pct,-3.85227
n_labeled,3962.0
n_unlabeled_used,5000.0
val_accuracy,0.69099
val_f1,0.66751
val_precision,0.71798
val_recall,0.62367
val_roc_auc,0.75746


[34m[1mwandb[0m: Agent Starting Run: r1bf654y with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.8594411603170432
[34m[1mwandb[0m: 	ct_max_iterations: 20
[34m[1mwandb[0m: 	ct_samples_per_iter: 25
[34m[1mwandb[0m: 	lp_gamma: 0.01164761834079464
[34m[1mwandb[0m: 	lp_max_iter: 500
[34m[1mwandb[0m: 	ls_alpha: 0.7696237704137512
[34m[1mwandb[0m: 	n_unlabeled: 10000
[34m[1mwandb[0m: 	ssl_method: co_training
[34m[1mwandb[0m: 	st_max_iter: 15
[34m[1mwandb[0m: 	st_threshold: 0.7723216619904124
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.6679 (+-3.79%), ROC-AUC: 0.7571


Run summary:
improvement_over_baseline_pct,-3.78959
n_labeled,3962.0
n_unlabeled_used,10000.0
val_accuracy,0.69453
val_f1,0.66795
val_precision,0.72702
val_recall,0.61775
val_roc_auc,0.7571


[34m[1mwandb[0m: Sweep Agent: Waiting for job.
[34m[1mwandb[0m: Job received.
[34m[1mwandb[0m: Agent Starting Run: 5og05jdk with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.8572506050891929
[34m[1mwandb[0m: 	ct_max_iterations: 20
[34m[1mwandb[0m: 	ct_samples_per_iter: 25
[34m[1mwandb[0m: 	lp_gamma: 0.020997747276452688
[34m[1mwandb[0m: 	lp_max_iter: 500
[34m[1mwandb[0m: 	ls_alpha: 0.680271917502252
[34m[1mwandb[0m: 	n_unlabeled: 5000
[34m[1mwandb[0m: 	ssl_method: self_training
[34m[1mwandb[0m: 	st_max_iter: 15
[34m[1mwandb[0m: 	st_threshold: 0.8711819584051561
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.



F1: 0.6662 (+-4.04%), ROC-AUC: 0.7625


Run summary:
improvement_over_baseline_pct,-4.03712
n_labeled,3962.0
n_unlabeled_used,5000.0
val_accuracy,0.70159
val_f1,0.66623
val_precision,0.75074
val_recall,0.59882
val_roc_auc,0.76248


[34m[1mwandb[0m: Sweep Agent: Waiting for job.
[34m[1mwandb[0m: Job received.
[34m[1mwandb[0m: Agent Starting Run: 07qghcgc with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.7906416864584838
[34m[1mwandb[0m: 	ct_max_iterations: 20
[34m[1mwandb[0m: 	ct_samples_per_iter: 50
[34m[1mwandb[0m: 	lp_gamma: 0.24872466540434396
[34m[1mwandb[0m: 	lp_max_iter: 1000
[34m[1mwandb[0m: 	ls_alpha: 0.852021255061186
[34m[1mwandb[0m: 	n_unlabeled: 5000
[34m[1mwandb[0m: 	ssl_method: self_training
[34m[1mwandb[0m: 	st_max_iter: 15
[34m[1mwandb[0m: 	st_threshold: 0.7218429551642538
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.



F1: 0.6675 (+-3.85%), ROC-AUC: 0.7540


Run summary:
improvement_over_baseline_pct,-3.85227
n_labeled,3962.0
n_unlabeled_used,5000.0
val_accuracy,0.69099
val_f1,0.66751
val_precision,0.71798
val_recall,0.62367
val_roc_auc,0.75399


[34m[1mwandb[0m: Agent Starting Run: ayb8aibm with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.7770250692741332
[34m[1mwandb[0m: 	ct_max_iterations: 20
[34m[1mwandb[0m: 	ct_samples_per_iter: 25
[34m[1mwandb[0m: 	lp_gamma: 0.0362748946802362
[34m[1mwandb[0m: 	lp_max_iter: 500
[34m[1mwandb[0m: 	ls_alpha: 0.4919722278151649
[34m[1mwandb[0m: 	n_unlabeled: 5000
[34m[1mwandb[0m: 	ssl_method: self_training
[34m[1mwandb[0m: 	st_max_iter: 15
[34m[1mwandb[0m: 	st_threshold: 0.7549918171533831
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.



F1: 0.6717 (+-3.25%), ROC-AUC: 0.7528


Run summary:
improvement_over_baseline_pct,-3.24643
n_labeled,3962.0
n_unlabeled_used,5000.0
val_accuracy,0.69394
val_f1,0.67172
val_precision,0.71989
val_recall,0.62959
val_roc_auc,0.75282


[34m[1mwandb[0m: Agent Starting Run: r88hm2jc with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.7733452198725254
[34m[1mwandb[0m: 	ct_max_iterations: 20
[34m[1mwandb[0m: 	ct_samples_per_iter: 25
[34m[1mwandb[0m: 	lp_gamma: 0.026658166151862117
[34m[1mwandb[0m: 	lp_max_iter: 500
[34m[1mwandb[0m: 	ls_alpha: 0.8645443427874157
[34m[1mwandb[0m: 	n_unlabeled: 5000
[34m[1mwandb[0m: 	ssl_method: co_training
[34m[1mwandb[0m: 	st_max_iter: 15
[34m[1mwandb[0m: 	st_threshold: 0.8419080898209058
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.6675 (+-3.85%), ROC-AUC: 0.7532


Run summary:

| Metric | Value |
|---|---|
| improvement_over_baseline_pct | -3.85227 |
| n_labeled | 3962.0 |
| n_unlabeled_used | 5000.0 |
| val_accuracy | 0.69099 |
| val_f1 | 0.66751 |
| val_precision | 0.71798 |
| val_recall | 0.62367 |
| val_roc_auc | 0.7532 |


[34m[1mwandb[0m: Agent Starting Run: dc5xuzbu with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.8290965059909441
[34m[1mwandb[0m: 	ct_max_iterations: 20
[34m[1mwandb[0m: 	ct_samples_per_iter: 25
[34m[1mwandb[0m: 	lp_gamma: 0.0055007629106446606
[34m[1mwandb[0m: 	lp_max_iter: 1500
[34m[1mwandb[0m: 	ls_alpha: 0.8348046012347262
[34m[1mwandb[0m: 	n_unlabeled: 5000
[34m[1mwandb[0m: 	ssl_method: co_training
[34m[1mwandb[0m: 	st_max_iter: 15
[34m[1mwandb[0m: 	st_threshold: 0.7416012275389858
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.6675 (+-3.85%), ROC-AUC: 0.7532


Run summary:

| Metric | Value |
|---|---|
| improvement_over_baseline_pct | -3.85227 |
| n_labeled | 3962.0 |
| n_unlabeled_used | 5000.0 |
| val_accuracy | 0.69099 |
| val_f1 | 0.66751 |
| val_precision | 0.71798 |
| val_recall | 0.62367 |
| val_roc_auc | 0.7532 |


[34m[1mwandb[0m: Agent Starting Run: evz4jrok with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.750976640544734
[34m[1mwandb[0m: 	ct_max_iterations: 15
[34m[1mwandb[0m: 	ct_samples_per_iter: 25
[34m[1mwandb[0m: 	lp_gamma: 0.003814435984977107
[34m[1mwandb[0m: 	lp_max_iter: 500
[34m[1mwandb[0m: 	ls_alpha: 0.6237110172580478
[34m[1mwandb[0m: 	n_unlabeled: 5000
[34m[1mwandb[0m: 	ssl_method: co_training
[34m[1mwandb[0m: 	st_max_iter: 10
[34m[1mwandb[0m: 	st_threshold: 0.7030556883929326
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.6675 (+-3.85%), ROC-AUC: 0.7575


Run summary:

| Metric | Value |
|---|---|
| improvement_over_baseline_pct | -3.85227 |
| n_labeled | 3962.0 |
| n_unlabeled_used | 5000.0 |
| val_accuracy | 0.69099 |
| val_f1 | 0.66751 |
| val_precision | 0.71798 |
| val_recall | 0.62367 |
| val_roc_auc | 0.75746 |


✓ SSL sweep completed!
Error in callback <bound method _WandbInit._post_run_cell_hook of <wandb.sdk.wandb_init._WandbInit object at 0x7928aedaed10>> (for post_run_cell), with arguments args (<ExecutionResult object at 7928af570f10, execution_count=18 error_before_exec=None error_in_exec=None info=<ExecutionInfo object at 7928af9a5790, raw_cell="ssl_sweep_id = wandb.sweep(ssl_sweep_config, proje.." store_history=True silent=False shell_futures=True cell_id=vscode-notebook-cell:/home/lokmane/Desktop/mini_projet/notebooks/tox21/tox21_mlops_life_cycle.ipynb#X23sZmlsZQ%3D%3D> result=None>,),kwargs {}:


AlreadyJoinedError: 

In [19]:
import pandas as pd

ssl_sweep = api.sweep(f"{entity}/{PROJECT}/{ssl_sweep_id}")
ssl_runs = list(ssl_sweep.runs)

results = []
# Use a distinct loop name so the module-level `run` (the active W&B run) is not shadowed
for sweep_run in ssl_runs:
    if sweep_run.state == 'finished':
        results.append({
            'method': sweep_run.config.get('ssl_method', 'unknown'),
            'n_unlabeled': sweep_run.config.get('n_unlabeled', 0),
            'f1_score': sweep_run.summary.get('val_f1', 0),
            'roc_auc': sweep_run.summary.get('val_roc_auc', 0),
            'accuracy': sweep_run.summary.get('val_accuracy', 0),
            'precision': sweep_run.summary.get('val_precision', 0),
            'recall': sweep_run.summary.get('val_recall', 0),
            'improvement_pct': sweep_run.summary.get('improvement_over_baseline_pct', 0),
            'run_name': sweep_run.name,
            'run_id': sweep_run.id
        })

results_df = pd.DataFrame(results)

print(f"\n{'='*100}")
print("SSL RESULTS SUMMARY (Top 15 by F1-Score)")
print(f"{'='*100}")
if len(results_df) > 0:
    print(results_df.sort_values('f1_score', ascending=False).head(15).to_string(index=False))
else:
    print("No finished runs found.")

if len(results_df) > 0:
    best_ssl_overall = results_df.sort_values('f1_score', ascending=False).iloc[0]
    print(f"\n{'='*100}")
    print("üèÜ BEST SSL MODEL OVERALL")
    print(f"{'='*100}")
    print(f"Method: {best_ssl_overall['method']}")
    print(f"Unlabeled samples: {best_ssl_overall['n_unlabeled']:,.0f}")
    print(f"F1-Score: {best_ssl_overall['f1_score']:.4f}")
    print(f"ROC-AUC: {best_ssl_overall['roc_auc']:.4f}")
    print(f"Improvement: {best_ssl_overall['improvement_pct']:+.2f}%")



SSL RESULTS SUMMARY (Top 15 by F1-Score)
       method  n_unlabeled  f1_score  roc_auc  accuracy  precision   recall  improvement_pct         run_name   run_id
self_training         5000  0.686717 0.756742  0.705709   0.729694 0.648521        -1.085900    fast-sweep-39 e4bfzpxq
self_training        10000  0.680529 0.754147  0.701589   0.727763 0.639053        -1.977141  likely-sweep-35 ckyah18m
self_training        10000  0.678436 0.754073  0.699823   0.726046 0.636686        -2.278612    brisk-sweep-2 w94s964g
self_training        15000  0.675505 0.750457  0.697469   0.723951 0.633136        -2.700829 northern-sweep-9 1oj5xi4j
self_training         5000  0.674242 0.757738  0.696292   0.722598 0.631953        -2.882697   lemon-sweep-41 qn4tfygy
self_training         5000  0.673817 0.750996  0.695703   0.721622 0.631953        -2.943970 apricot-sweep-29 1so9kqc8
self_training        15000  0.672589 0.748438  0.696292   0.725034 0.627219        -3.120879    lucky-sweep-4 zlbuipv0
self_t
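The `improvement_pct` column is computed by code outside this section. Assuming it is the relative F1 change against the supervised baseline, (F1_ssl - F1_baseline) / F1_baseline × 100, every logged pair should imply the same baseline F1. A quick check on four pairs copied from the outputs above (keys are run names or run IDs as printed; the formula itself is the assumption being tested):

```python
# Assumption: improvement_pct = (f1_ssl - f1_baseline) / f1_baseline * 100
# Inverting it gives: f1_baseline = f1_ssl / (1 + improvement_pct / 100)
logged = {
    "fast-sweep-39": (0.686717, -1.085900),
    "likely-sweep-35": (0.680529, -1.977141),
    "northern-sweep-9": (0.675505, -2.700829),
    "ayb8aibm": (0.67172, -3.24643),
}

def implied_baseline(f1_ssl, improvement_pct):
    """Recover the baseline F1 implied by a relative-improvement percentage."""
    return f1_ssl / (1 + improvement_pct / 100)

for name, (f1, pct) in logged.items():
    print(f"{name}: implied baseline F1 = {implied_baseline(f1, pct):.4f}")
```

All four come out at roughly 0.694, which is consistent with that assumption and explains why every SSL run reports a negative improvement: none of them beats a baseline F1 of about 0.694.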


In [20]:
print(f"\n{'='*100}")
print("ü•á BEST MODEL PER SSL METHOD")
print(f"{'='*100}")

method_best_models = {}

for method in results_df['method'].unique():
    method_df = results_df[results_df['method'] == method].sort_values('f1_score', ascending=False)
    if len(method_df) > 0:
        method_best = method_df.iloc[0]
        method_best_models[method] = method_best
        
        print(f"\n{method.upper().replace('_', ' ')}:")
        print(f"  Unlabeled samples: {method_best['n_unlabeled']:,.0f}")
        print(f"  F1-Score: {method_best['f1_score']:.4f}")
        print(f"  ROC-AUC: {method_best['roc_auc']:.4f}")
        print(f"  Accuracy: {method_best['accuracy']:.4f}")
        print(f"  Improvement: {method_best['improvement_pct']:+.2f}%")
        print(f"  Run: {method_best['run_name']}")



🥇 BEST MODEL PER SSL METHOD

SELF TRAINING:
  Unlabeled samples: 5,000
  F1-Score: 0.6867
  ROC-AUC: 0.7567
  Accuracy: 0.7057
  Improvement: -1.09%
  Run: fast-sweep-39

CO TRAINING:
  Unlabeled samples: 10,000
  F1-Score: 0.6713
  ROC-AUC: 0.7576
  Accuracy: 0.6957
  Improvement: -3.30%
  Run: restful-sweep-25

LABEL PROPAGATION:
  Unlabeled samples: 15,000
  F1-Score: 0.0433
  ROC-AUC: 0.6940
  Accuracy: 0.5056
  Improvement: -93.77%
  Run: desert-sweep-5
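The per-method loop in the cell above can also be written with a pandas `groupby` plus `idxmax`, which returns the row label of the highest F1 within each method. A minimal sketch on a hypothetical toy frame, `toy_results`, that reuses the column names of `results_df` (the values are invented for illustration):

```python
import pandas as pd

# Hypothetical stand-in for results_df: same column names, made-up values
toy_results = pd.DataFrame({
    "method": ["self_training", "self_training", "co_training", "label_propagation"],
    "f1_score": [0.69, 0.67, 0.67, 0.04],
    "run_name": ["run-a", "run-b", "run-c", "run-d"],
})

# idxmax gives, per method, the index label of the row with the best F1
best_per_method = toy_results.loc[toy_results.groupby("method")["f1_score"].idxmax()]
print(best_per_method.sort_values("f1_score", ascending=False).to_string(index=False))
```

On the real sweep, `results_df.loc[results_df.groupby('method')['f1_score'].idxmax()]` would yield one row per SSL method; on ties, `idxmax` keeps the first occurrence.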

In [21]:
if len(results_df) > 0:
    run = wandb.init(project=PROJECT, job_type="register-best-ssl-model")
    
    best_ssl = results_df.sort_values('f1_score', ascending=False).iloc[0]
    best_ssl_run_obj = api.run(f"{entity}/{PROJECT}/{best_ssl['run_id']}")
    
    print(f"\n{'='*100}")
    print(f"Registering BEST SSL Model: {best_ssl['method']}")
    print(f"{'='*100}")
    
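    # Retrain with the best sweep configuration
    n_unlabeled = int(best_ssl['n_unlabeled'])
    X_unlabeled = X_unlabeled_full[:n_unlabeled]
    n_unlabeled = len(X_unlabeled)  # the pool may hold fewer rows than requested; keep the -1 label array aligned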
    
    method = best_ssl['method']
    
    if method == 'label_propagation':
        X_combined = np.vstack([X_train.values, X_unlabeled])
        y_combined = np.concatenate([y_train.values, np.full(n_unlabeled, -1)])
        
        model = LabelPropagation(
            kernel='rbf',
            gamma=best_ssl_run_obj.config.get('lp_gamma', 0.1),
            max_iter=best_ssl_run_obj.config.get('lp_max_iter', 1000),
            n_jobs=-1
        )
        model.fit(X_combined, y_combined)
    
    elif method == 'label_spreading':
        X_combined = np.vstack([X_train.values, X_unlabeled])
        y_combined = np.concatenate([y_train.values, np.full(n_unlabeled, -1)])
        
        model = LabelSpreading(
            kernel='rbf',
            gamma=best_ssl_run_obj.config.get('lp_gamma', 0.1),
            alpha=best_ssl_run_obj.config.get('ls_alpha', 0.2),
            max_iter=best_ssl_run_obj.config.get('lp_max_iter', 1000),
            n_jobs=-1
        )
        model.fit(X_combined, y_combined)
    
    elif method == 'self_training':
        X_combined = np.vstack([X_train.values, X_unlabeled])
        y_combined = np.concatenate([y_train.values, np.full(n_unlabeled, -1)])
        
        base_clf = RandomForestClassifier(
            n_estimators=best_baseline_config['n_estimators'],
            max_depth=best_baseline_config['max_depth'],
            min_samples_split=best_baseline_config['min_samples_split'],
            min_samples_leaf=best_baseline_config['min_samples_leaf'],
            max_features=best_baseline_config['max_features'],
            random_state=42,
            n_jobs=-1,
            class_weight='balanced'
        )
        model = SelfTrainingClassifier(
            estimator=base_clf,  # scikit-learn >= 1.6; `base_estimator` is deprecated there
            threshold=best_ssl_run_obj.config.get('st_threshold', 0.75),
            max_iter=best_ssl_run_obj.config.get('st_max_iter', 10)
        )
        model.fit(X_combined, y_combined)
    
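    elif method == 'co_training':
        # Co-training is saved as a two-view ensemble: each forest is refit on the
        # labeled data using one half of the feature columns (its "view"). Only the
        # labeled data is used here; the unlabeled pool and ct_* settings are not.
        n_features = X_train.shape[1]
        mid = n_features // 2
        
        # Same RandomForest hyperparameters as the self-training branch
        rf_params = {k: best_baseline_config[k] for k in (
            'n_estimators', 'max_depth', 'min_samples_split',
            'min_samples_leaf', 'max_features')}
        clf1 = RandomForestClassifier(**rf_params, random_state=42, n_jobs=-1, class_weight='balanced')
        clf2 = RandomForestClassifier(**rf_params, random_state=43, n_jobs=-1, class_weight='balanced')
        
        clf1.fit(X_train.values[:, :mid], y_train.values)
        clf2.fit(X_train.values[:, mid:], y_train.values)
        
        model = {'clf1': clf1, 'clf2': clf2, 'mid': mid, 'type': 'co_training'}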
    
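    # Save the model and the fitted scaler (create the output directory if needed)
    import os
    os.makedirs('models', exist_ok=True)
    model_filename = 'models/best_ssl_model.pkl'
    joblib.dump(model, model_filename)
    joblib.dump(scaler, 'models/scaler.pkl')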
    
    # Create artifact
    artifact = wandb.Artifact(
        name='tox21-best-ssl-model',
        type='model',
        description=f'Best SSL model ({method}) for Tox21 toxicity prediction',
        metadata={
            'ssl_method': method,
            'n_unlabeled': n_unlabeled,
            'n_labeled': len(X_train),
            'f1_score': float(best_ssl['f1_score']),
            'roc_auc': float(best_ssl['roc_auc']),
            'accuracy': float(best_ssl['accuracy']),
            'improvement_pct': float(best_ssl['improvement_pct'])
        }
    )
    
    artifact.add_file(model_filename)
    artifact.add_file('models/scaler.pkl')
    run.log_artifact(artifact)
    run.finish()
    
    print(f"✓ Best SSL model ({method}) registered!")



Registering BEST SSL Model: self_training




✓ Best SSL model (self_training) registered!
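The registration cell saves either a fitted scikit-learn estimator or, for co-training, a plain dict holding two per-view forests, so any consumer of `best_ssl_model.pkl` has to handle both shapes. A minimal sketch of a scoring helper, `predict_proba_ssl` (a hypothetical name), assuming the two co-training views are combined by averaging their probabilities; synthetic data stands in for the scaled Tox21 descriptors:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def predict_proba_ssl(model, X):
    """Return P(toxic) for a fitted sklearn estimator or the co-training dict.

    The dict layout ({'clf1', 'clf2', 'mid', 'type'}) mirrors the registration cell;
    averaging the two views is an assumption, not necessarily the sweep's rule.
    """
    X = np.asarray(X)
    if isinstance(model, dict) and model.get("type") == "co_training":
        mid = model["mid"]
        p1 = model["clf1"].predict_proba(X[:, :mid])[:, 1]  # view 1: first half of columns
        p2 = model["clf2"].predict_proba(X[:, mid:])[:, 1]  # view 2: second half of columns
        return (p1 + p2) / 2
    return model.predict_proba(X)[:, 1]

# Synthetic stand-in data: 6 features, binary label
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = (X[:, 0] + X[:, 3] > 0).astype(int)
mid = X.shape[1] // 2

co_model = {
    "clf1": RandomForestClassifier(n_estimators=50, random_state=42).fit(X[:, :mid], y),
    "clf2": RandomForestClassifier(n_estimators=50, random_state=43).fit(X[:, mid:], y),
    "mid": mid,
    "type": "co_training",
}
single_model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

p_co = predict_proba_ssl(co_model, X)
p_single = predict_proba_ssl(single_model, X)
```

In the notebook, the inputs would instead come from `joblib.load('models/best_ssl_model.pkl')` and `scaler.transform(...)` applied to new descriptor rows, using the `scaler.pkl` stored in the same artifact.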

## Phase 5: Production Monitoring & Maintenance

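Simulate a **production environment** in which the model serves toxicity predictions in real time, and track the key indicators that would reveal model degradation. In this notebook every logged value (latency, confidence, endpoint scores, drift) is drawn at random as a placeholder; the registered model is not called.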

### Monitored Metrics:
- **Prediction Latency**: Response time per request (target < 40ms)
- **Prediction Confidence**: Model certainty in predictions
- **Data Drift**: Distribution shift in input molecules
- **Endpoint Predictions**: Toxicity scores for four nuclear-receptor and stress-response assays (NR-AR, NR-ER, SR-ARE, SR-p53)
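In the cell below, `data_drift_score` is a random placeholder. One common way to compute a real drift score, sketched here as an assumption rather than the notebook's method, is the two-sample Kolmogorov-Smirnov statistic per descriptor column, averaged across columns (`scipy.stats.ks_2samp`). The synthetic arrays stand in for the training descriptors and an incoming batch of molecules:

```python
import numpy as np
from scipy.stats import ks_2samp

def drift_score(reference, current):
    """Mean two-sample KS statistic across feature columns.

    0 means the empirical distributions coincide; values approach 1 as they separate.
    """
    reference = np.asarray(reference, dtype=float)
    current = np.asarray(current, dtype=float)
    stats = [ks_2samp(reference[:, j], current[:, j]).statistic
             for j in range(reference.shape[1])]
    return float(np.mean(stats))

rng = np.random.default_rng(42)
train_features = rng.normal(0.0, 1.0, size=(2000, 5))  # stand-in for training descriptors
same_dist = rng.normal(0.0, 1.0, size=(500, 5))        # incoming batch, no shift
shifted = rng.normal(1.0, 1.0, size=(500, 5))          # incoming batch, mean shifted by 1 sd
print(f"no shift: {drift_score(train_features, same_dist):.3f}")
print(f"shifted:  {drift_score(train_features, shifted):.3f}")
```

For a one-standard-deviation mean shift the KS statistic is about 0.38, while the unshifted batch stays near zero. The scale is not calibrated to the `DRIFT_THRESHOLD = 0.25` used in Phase 6; a threshold would need to be chosen for whichever drift measure is adopted.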

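### Monitoring Strategy:
- Log every metric to W&B in real time (one `wandb.log` call per request)
- Print a warning whenever a request breaches the latency or confidence threshold
- Simulate 100 prediction requests, spaced 0.5 s apart
- Use the logged latency and drift series to spot bottlenecks and distribution shift

This phase supplies the health signals that the automated retraining check in Phase 6 reads back from W&B.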
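The simulation draws latency and confidence at random. Against a real model, both can be measured per request: latency by timing the `predict_proba` call with `time.perf_counter`, and confidence as the probability of the predicted class (one common convention). A minimal sketch on a toy random forest; `demo_model`, `serve_one` and the synthetic data are hypothetical stand-ins:

```python
import time

import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(500, 8))
y_demo = (X_demo[:, 0] > 0).astype(int)
demo_model = RandomForestClassifier(n_estimators=100, random_state=0, n_jobs=1).fit(X_demo, y_demo)

def serve_one(model, x):
    """Score one feature vector; return (p_toxic, confidence, latency_ms)."""
    start = time.perf_counter()
    proba = model.predict_proba(x.reshape(1, -1))[0]
    latency_ms = (time.perf_counter() - start) * 1000
    # Confidence = probability of the predicted class (always >= 0.5 for binary output)
    return float(proba[1]), float(proba.max()), latency_ms

requests = rng.normal(size=(20, 8))
records = [serve_one(demo_model, x) for x in requests]
latencies = np.array([r[2] for r in records])
confidences = np.array([r[1] for r in records])
print(f"p95 latency: {np.percentile(latencies, 95):.2f} ms, mean confidence: {confidences.mean():.3f}")
```

The latency and confidence values returned by `serve_one` correspond to the `prediction_time_ms` and `prediction_confidence` fields that the monitoring cell logs with `wandb.log`.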

In [22]:
import time
import random
from datetime import datetime

import numpy as np

wandb.init(
    project=PROJECT,
    job_type="monitor-production",
    name=f"production-monitoring-{datetime.now().strftime('%Y%m%d-%H%M%S')}",
    tags=["production", "monitoring", "tox21"]
)

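NUM_REQUESTS = 100
ALERT_THRESHOLD_MS = 40
CONFIDENCE_THRESHOLD = 0.5  # never triggered here: simulated confidence is drawn from [0.7, 0.99]

for i in range(NUM_REQUESTS):
    # Simulated request: latency, confidence, endpoint scores and drift are random
    # placeholders; replace them with real model calls and measurements in production.
    prediction_time_ms = random.uniform(10, 50)
    prediction_confidence = random.uniform(0.7, 0.99)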
    
    predictions = {
        "NR-AR": random.uniform(0, 1),
        "NR-ER": random.uniform(0, 1),
        "SR-ARE": random.uniform(0, 1),
        "SR-p53": random.uniform(0, 1),
    }
    
    avg_prediction = np.mean(list(predictions.values()))
    
    data_drift_score = random.uniform(0, 0.3)
    
    wandb.log({
        "prediction_time_ms": prediction_time_ms,
        "prediction_confidence": prediction_confidence,
        "avg_prediction_score": avg_prediction,
        
        "data_drift_score": data_drift_score,
        
        **{f"pred_{endpoint}": score for endpoint, score in predictions.items()},
        
        "request_id": i,
        "timestamp": time.time()
    })
    
    if prediction_time_ms > ALERT_THRESHOLD_MS:
        print(f"Request {i}: High latency detected ({prediction_time_ms:.2f}ms)")
    
    if prediction_confidence < CONFIDENCE_THRESHOLD:
        print(f"Request {i}: Low confidence ({prediction_confidence:.2f})")
    
    time.sleep(0.5)
    
wandb.finish()


Request 0: High latency detected (41.96ms)
Request 8: High latency detected (42.74ms)
Request 11: High latency detected (41.89ms)
Request 14: High latency detected (42.39ms)
Request 21: High latency detected (48.73ms)
Request 24: High latency detected (45.43ms)
Request 28: High latency detected (46.50ms)
Request 29: High latency detected (47.61ms)
Request 31: High latency detected (41.40ms)
Request 32: High latency detected (42.16ms)
Request 34: High latency detected (42.72ms)
Request 35: High latency detected (48.58ms)
Request 39: High latency detected (41.22ms)
Request 43: High latency detected (40.09ms)
Request 50: High latency detected (47.51ms)
Request 51: High latency detected (42.35ms)
Request 52: High latency detected (45.12ms)
Request 55: High latency detected (42.37ms)
Request 56: High latency detected (46.44ms)
Request 59: High latency detected (41.90ms)
Request 60: High latency detected (40.33ms)
Request 61: High latency detected (40.97ms)
Request 70: High latency detected 

Run history: per-request sparklines for avg_prediction_score, data_drift_score, pred_NR-AR, pred_NR-ER, pred_SR-ARE, pred_SR-p53, prediction_confidence, prediction_time_ms, request_id and timestamp (plots not reproduced).

Run summary:

| Metric | Value |
|---|---|
| avg_prediction_score | 0.40281 |
| data_drift_score | 0.1001 |
| pred_NR-AR | 0.31184 |
| pred_NR-ER | 0.01243 |
| pred_SR-ARE | 0.83874 |
| pred_SR-p53 | 0.44824 |
| prediction_confidence | 0.76507 |
| prediction_time_ms | 42.93328 |
| request_id | 99.0 |
| timestamp | 1769274281.90517 |
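Per-request prints are noisy: latencies drawn from `uniform(10, 50)` exceed the 40 ms target with probability (50 - 40) / (50 - 10) = 25%, so roughly a quarter of the requests get flagged. Summarizing a window as percentiles plus a violation rate gives an alert rule something more stable to test. A minimal sketch; `summarize_latency` is a hypothetical helper, not part of the notebook:

```python
import numpy as np

def summarize_latency(latencies_ms, threshold_ms=40.0):
    """Aggregate per-request latencies into the figures an alert rule would check."""
    lat = np.asarray(latencies_ms, dtype=float)
    return {
        "p50_ms": float(np.percentile(lat, 50)),
        "p95_ms": float(np.percentile(lat, 95)),
        "violation_rate": float(np.mean(lat > threshold_ms)),  # share of requests over target
    }

# Same latency distribution as the simulation cell, with many more draws
rng = np.random.default_rng(1)
simulated = rng.uniform(10, 50, size=10_000)
summary = summarize_latency(simulated)
print(summary)
```

With 10,000 draws the violation rate lands close to 0.25 and the p95 close to 48 ms, since the 95th percentile of uniform(10, 50) is 10 + 0.95 × 40 = 48.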



## Phase 6: Automated Retraining & Closing the Loop

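Implement an **automated MLOps pipeline** that reads the latest production-monitoring run, checks it against health thresholds, and flags the model for retraining when a threshold is violated. Run on a schedule, this one-off check becomes continuous monitoring.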

### Core Functionality:
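1. **Health Monitoring**: Query the W&B API for the most recent production-monitoring run and its logged metrics
2. **Degradation Detection**: Compare mean prediction confidence and mean drift score against their thresholds
3. **Automatic Triggering**: Log a dedicated `automated-retraining-trigger` run when a threshold is violated (code to launch a new SSL sweep is included but commented out)
4. **Team Alerts**: Send a `wandb.alert`, which W&B forwards to Slack or email when those integrations are configured
5. **Full Traceability**: Record the trigger reason, metrics, thresholds and source monitoring run in the trigger run's config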
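The degradation check in the pipeline cell is interleaved with W&B calls. Pulling the decision into a pure function makes it testable without a network connection. The sketch below assumes the history frame uses the column names the monitoring cell logs (`prediction_confidence`, `data_drift_score`); `evaluate_health` is a hypothetical name. Unlike the cell, which substitutes 0 for a missing column (and would therefore report low confidence), this version skips missing columns:

```python
import pandas as pd

def evaluate_health(history, confidence_threshold=0.70, drift_threshold=0.25):
    """Return (needs_retraining, reasons) for a monitoring run's history frame."""
    reasons = []
    if "prediction_confidence" in history.columns:
        avg_conf = history["prediction_confidence"].mean()
        if avg_conf < confidence_threshold:
            reasons.append(f"Low confidence: {avg_conf:.4f} < {confidence_threshold}")
    if "data_drift_score" in history.columns:
        avg_drift = history["data_drift_score"].mean()
        if avg_drift > drift_threshold:
            reasons.append(f"High drift: {avg_drift:.4f} > {drift_threshold}")
    return bool(reasons), reasons

healthy = pd.DataFrame({"prediction_confidence": [0.84, 0.83], "data_drift_score": [0.15, 0.16]})
drifting = pd.DataFrame({"prediction_confidence": [0.65, 0.60], "data_drift_score": [0.30, 0.28]})
print(evaluate_health(healthy))
print(evaluate_health(drifting))
```

In the pipeline, it would be called as `evaluate_health(latest_monitor_run.history(), CONFIDENCE_THRESHOLD, DRIFT_THRESHOLD)`.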

### Why This Matters:
- **Self-Healing System**: Once the sweep launch is enabled, retraining starts without manual intervention
- **Continuous Improvement**: Models are refreshed as new data patterns appear
- **Proactive vs Reactive**: Degradation is flagged from monitoring signals before it reaches users
- **Production-Ready**: The check can be scheduled with cron, Airflow, Lambda, or GitHub Actions

This closes the MLOps loop, helping keep the Tox21 toxicity prediction model accurate and reliable over time.

In [None]:
import wandb
import numpy as np
from datetime import datetime, timedelta


print("="*80)
print("AUTOMATED RETRAINING PIPELINE")
print("="*80)

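# --- Configuration ---
PERFORMANCE_THRESHOLD_F1 = 0.75    # not checked below: the monitoring runs log no F1 score
CONFIDENCE_THRESHOLD = 0.70        # flag retraining if mean prediction confidence falls below this
DRIFT_THRESHOLD = 0.25             # flag retraining if mean data-drift score exceeds this
MONITORING_LOOKBACK_DAYS = 7       # look-back window for monitoring runs, in days
AUTO_RETRAIN_ENABLED = True        # when True, log a retraining trigger run and send an alert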


print("\nConnecting to Weights & Biases API...")
api = wandb.Api()
entity = api.default_entity

print(f"Connected to entity: {entity}")
print(f"Project: {PROJECT}")

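# --- Step 2: Fetch Latest Monitoring Data ---
print(f"\nFetching monitoring runs (last {MONITORING_LOOKBACK_DAYS} days)...")
try:
    from datetime import timezone

    def _created_utc(r):
        # `created_at` is an ISO-8601 string such as "2026-01-24T16:27:18Z"
        ts = datetime.fromisoformat(r.created_at.replace("Z", "+00:00"))
        return ts if ts.tzinfo else ts.replace(tzinfo=timezone.utc)

    # Keep only production-monitoring runs created inside the look-back window
    cutoff = datetime.now(timezone.utc) - timedelta(days=MONITORING_LOOKBACK_DAYS)
    all_project_runs = api.runs(f"{entity}/{PROJECT}")
    monitor_runs = [
        r for r in all_project_runs
        if r.job_type == "monitor-production" and _created_utc(r) >= cutoff
    ]
    
    if not monitor_runs:
        print("No recent monitoring runs found. Run the production monitoring cell (Phase 5) first.")
    else:
        print(f"✓ Found {len(monitor_runs)} monitoring runs")
        
except Exception as e:
    print(f"Error fetching runs from W&B API: {e}")
    monitor_runs = []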

# --- Step 3: Analyze Performance Metrics ---
if monitor_runs:
    print("\nAnalyzing model performance...")
    
    # Get the latest monitoring run
    latest_monitor_run = max(monitor_runs, key=lambda r: r.created_at)
    
    print(f"   Latest run: {latest_monitor_run.name}")
    print(f"   Created at: {latest_monitor_run.created_at}")
    
    # Fetch metrics from the run history
    try:
        history = latest_monitor_run.history()
        
        if not history.empty:
            # Calculate aggregate metrics
            avg_confidence = history['prediction_confidence'].mean() if 'prediction_confidence' in history.columns else 0
            avg_drift = history['data_drift_score'].mean() if 'data_drift_score' in history.columns else 0
            avg_latency = history['prediction_time_ms'].mean() if 'prediction_time_ms' in history.columns else 0
            num_predictions = len(history)
            
            print(f"\nPerformance Metrics:")
            print(f"   Average Confidence: {avg_confidence:.4f}")
            print(f"   Average Drift Score: {avg_drift:.4f}")
            print(f"   Average Latency: {avg_latency:.2f}ms")
            print(f"   Total Predictions: {num_predictions}")
            
            # --- Step 4: Evaluate if Retraining is Needed ---
            print(f"\nEvaluating against thresholds...")
            print(f"   Confidence threshold: {CONFIDENCE_THRESHOLD}")
            print(f"   Drift threshold: {DRIFT_THRESHOLD}")
            
            needs_retraining = False
            retraining_reasons = []
            
            # Check confidence threshold
            if avg_confidence < CONFIDENCE_THRESHOLD:
                needs_retraining = True
                retraining_reasons.append(f"Low confidence: {avg_confidence:.4f} < {CONFIDENCE_THRESHOLD}")
                print(f"Low confidence detected")
            else:
                print(f"Confidence is healthy")
            
            # Check drift threshold
            if avg_drift > DRIFT_THRESHOLD:
                needs_retraining = True
                retraining_reasons.append(f"High drift: {avg_drift:.4f} > {DRIFT_THRESHOLD}")
                print(f"High data drift detected")
            else:
                print(f"Drift is within acceptable range")
            
            # --- Step 5: Trigger Retraining if Needed ---
            if needs_retraining and AUTO_RETRAIN_ENABLED:
                print(f"\nPERFORMANCE DEGRADATION DETECTED!")
                print(f"   Reasons:")
                for reason in retraining_reasons:
                    print(f"      • {reason}")
                
                print(f"\nTRIGGERING AUTOMATED RETRAINING...")
                
                # Create a retraining trigger run
                alert_run = wandb.init(
                    project=PROJECT,
                    job_type="automated-retraining-trigger",
                    name=f"retrain-trigger-{datetime.now().strftime('%Y%m%d-%H%M%S')}",
                    tags=["retraining", "automated", "triggered"],
                    config={
                        'trigger_reason': ', '.join(retraining_reasons),
                        'avg_confidence': avg_confidence,
                        'avg_drift': avg_drift,
                        'avg_latency': avg_latency,
                        'threshold_confidence': CONFIDENCE_THRESHOLD,
                        'threshold_drift': DRIFT_THRESHOLD,
                        'monitoring_run': latest_monitor_run.name
                    }
                )
                
                # Log the trigger event
                wandb.log({
                    'retraining_triggered': 1,
                    'avg_confidence': avg_confidence,
                    'avg_drift': avg_drift,
                    'timestamp': datetime.now().timestamp()
                })
                
                # Send alert to team
                wandb.alert(
                    title="Automated Retraining Triggered",
                    text=f"""Model performance has degraded and requires retraining.
                    
**Performance Issues:**
{chr(10).join(['• ' + r for r in retraining_reasons])}

**Metrics:**
• Average Confidence: {avg_confidence:.4f}
• Average Drift: {avg_drift:.4f}
• Average Latency: {avg_latency:.2f}ms

**Action:** A new hyperparameter sweep should be launched to find a better model.

**Monitoring Run:** {latest_monitor_run.name}
**Trigger Time:** {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}
""",
                    level=wandb.AlertLevel.WARN
                )
                
                print(f"\nRetraining trigger logged to W&B")
                print(f"   Run URL: {alert_run.url}")
                print(f"   Alert sent to team")
                
                # --- Option to Launch New Sweep Automatically ---
                print(f"\nNext Steps:")
                print(f"   1. A new hyperparameter sweep should be launched")
                print(f"   2. You can automate this by uncommenting the code below")
                
                # To launch a new SSL sweep automatically, uncomment the lines below
                # (they reuse `ssl_sweep_config` and `train_ssl` from earlier in the notebook):
                # print(f"\nLaunching new SSL sweep...")
                # new_sweep_id = wandb.sweep(ssl_sweep_config, project=PROJECT)
                # print(f"   ✓ New sweep created: {new_sweep_id}")
                # wandb.agent(new_sweep_id, train_ssl, count=10)  # optionally run agents right away
                
                alert_run.finish()
                
            elif needs_retraining and not AUTO_RETRAIN_ENABLED:
                print(f"\nRETRAINING RECOMMENDED BUT AUTO-RETRAIN IS DISABLED")
                print(f"   Reasons:")
                for reason in retraining_reasons:
                    print(f"      • {reason}")
                print(f"\n   Enable automatic retraining by setting AUTO_RETRAIN_ENABLED = True")
                
            else:
                print(f"\nMODEL PERFORMANCE IS HEALTHY")
                print(f"   No retraining needed at this time")
                
        else:
            print("No metrics found in monitoring run history")
            
    except Exception as e:
        print(f"Error analyzing metrics: {e}")
        
else:
    print("\nSkipping: No monitoring data available")
    print("   Run the production monitoring script first (Phase 5)")

print("Automated retraining pipeline completed")


🔄 AUTOMATED RETRAINING PIPELINE

📡 Connecting to Weights & Biases API...
✓ Connected to entity: l-benhammadi-esi
✓ Project: QSAR_MLOPS

📊 Fetching monitoring runs (last 7 days)...
✓ Found 1 monitoring runs

🔍 Analyzing model performance...
   Latest run: production-monitoring-20260124-180350
   Created at: 2026-01-24T16:27:18Z

📈 Performance Metrics:
   Average Confidence: 0.8373
   Average Drift Score: 0.1516
   Average Latency: 30.84ms
   Total Predictions: 100

🎯 Evaluating against thresholds...
   Confidence threshold: 0.7
   Drift threshold: 0.25
   ✓ Confidence is healthy
   ✓ Drift is within acceptable range

✅ MODEL PERFORMANCE IS HEALTHY
   No retraining needed at this time

✅ Automated retraining pipeline completed