In [1]:
!wandb login

wandb: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.
wandb: Currently logged in as: l-benhammadi (l-benhammadi-esi) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin


## Phase 1: Data Ingestion & Preparation

In this phase, we prepare the **Tox21 toxicity dataset** and **ZINC unlabeled molecular dataset** for Semi-Supervised Learning (SSL).

### What We Do:
1. **Load Raw Data**: Import Tox21 labeled toxicity data and ZINC unlabeled molecular structures
2. **Canonicalize SMILES**: Standardize molecular representations using RDKit
3. **Feature Engineering**: Compute comprehensive molecular descriptors:
   - Basic properties (MolWt, LogP, H-donors/acceptors)
   - Lipinski's Rule of Five features
   - Topological descriptors (BertzCT, Kappa indices)
   - Functional-group fragment counts (e.g. `fr_NH2`, `fr_benzene`, `fr_halogen`) as simple pharmacophore-style features
4. **Handle Class Imbalance**: Downsample majority class for balanced training
5. **Version with W&B**: Create artifacts for data reproducibility
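The Rule of Five mentioned in step 3 can be checked directly from four of the descriptors computed in this notebook (`MolWt`, `LogP`, `NumHDonors`, `NumHAcceptors`). The following is a minimal sketch, not part of the pipeline: `lipinski_violations` is a hypothetical helper that takes precomputed values so it runs without RDKit, and the example numbers are illustrative rather than taken from Tox21.

```python
def lipinski_violations(mol_wt, log_p, h_donors, h_acceptors):
    """Count Rule-of-Five violations from precomputed descriptor values."""
    checks = [
        mol_wt > 500,       # molecular weight above 500 Da
        log_p > 5,          # octanol-water partition coefficient above 5
        h_donors > 5,       # more than 5 hydrogen-bond donors
        h_acceptors > 10,   # more than 10 hydrogen-bond acceptors
    ]
    return sum(checks)

# Illustrative values only (hypothetical molecules)
small_polar = lipinski_violations(mol_wt=180.2, log_p=1.3, h_donors=1, h_acceptors=3)
large_lipophilic = lipinski_violations(mol_wt=650.0, log_p=6.2, h_donors=2, h_acceptors=8)
print(small_polar, large_lipophilic)
```

A violation count like this could be added as one more feature column, though the notebook itself feeds the raw descriptors to the model instead.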

### Datasets:
- **Tox21**: ~7,800 compounds across 12 toxicity assays
- **ZINC**: Large unlabeled molecular database
- **Target**: Binary classification (toxic vs non-toxic); a compound is labeled toxic if it is active in at least one of the 12 assays
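The binary target is derived by collapsing the 12 assay columns into one label. The sketch below mirrors, in plain Python, what a row-wise pandas `max(axis=1)` with its default `skipna=True` followed by `dropna` does; `aggregate_toxicity` and the toy rows are hypothetical names introduced only for illustration.

```python
def aggregate_toxicity(assay_results):
    """Return 1 if any measured assay is active, 0 if all measured assays are
    inactive, and None if no assay was measured (such rows get dropped)."""
    measured = [v for v in assay_results if v is not None]
    if not measured:
        return None
    return int(max(measured))

rows = {
    "active_in_one": [0, None, 1, 0],
    "all_inactive": [0, 0, None, 0],
    "never_measured": [None, None, None, None],
}
labels = {name: aggregate_toxicity(vals) for name, vals in rows.items()}
print(labels)
```

One consequence worth keeping in mind: a compound with missing assays and no positive result is labeled non-toxic even though it was never tested on some endpoints.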

In [2]:
import wandb
from rdkit import Chem
from rdkit.Chem import Descriptors, QED, Lipinski, Crippen, MolSurf, rdMolDescriptors
import pandas as pd
import numpy as np
from sklearn.utils import resample

PROJECT="QSAR_MLOPS_TOX21"

run = wandb.init(project=PROJECT, job_type="prepare-data")
raw_df_unlabeled = pd.read_csv('../../data/raw/original_data/zinc_unlabeled.csv')
raw_df_labeled = pd.read_csv('../../data/raw/original_data/tox21.csv')

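def canonicalize_smiles(smiles):
    """Return the RDKit canonical SMILES, or None if the input cannot be parsed."""
    try:
        mol = Chem.MolFromSmiles(smiles.strip())
        if mol is not None:
            return Chem.MolToSmiles(mol, canonical=True)
    except Exception:
        # Non-string input (e.g. NaN) or an RDKit failure: treat as invalid
        pass
    return None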

def compute_comprehensive_features(smiles):
    try:
        mol = Chem.MolFromSmiles(smiles.strip())
        if mol is not None:
            features = {}
            
            # Basic molecular properties
            features['MolWt'] = Descriptors.MolWt(mol)
            features['LogP'] = Descriptors.MolLogP(mol)
            features['NumHDonors'] = Descriptors.NumHDonors(mol)
            features['NumHAcceptors'] = Descriptors.NumHAcceptors(mol)
            features['NumRotatableBonds'] = Descriptors.NumRotatableBonds(mol)
            features['NumAromaticRings'] = Descriptors.NumAromaticRings(mol)
            
            # Heteroatom count and topological polar surface area
            features['NumHeteroatoms'] = Descriptors.NumHeteroatoms(mol)
            features['TPSA'] = Descriptors.TPSA(mol)
            
            # Complexity and shape
            features['NumRings'] = Descriptors.RingCount(mol)
            features['NumAliphaticRings'] = Descriptors.NumAliphaticRings(mol)
            features['NumSaturatedRings'] = Descriptors.NumSaturatedRings(mol)
            features['FractionCsp3'] = Descriptors.FractionCSP3(mol) 
            
            # Electronic properties
            features['NumValenceElectrons'] = Descriptors.NumValenceElectrons(mol)
            
            try:
                features['MaxPartialCharge'] = Descriptors.MaxPartialCharge(mol)
                features['MinPartialCharge'] = Descriptors.MinPartialCharge(mol)
            except:
                features['MaxPartialCharge'] = 0
                features['MinPartialCharge'] = 0
            
            # Molecular surface area
            features['LabuteASA'] = Descriptors.LabuteASA(mol)
            features['PEOE_VSA1'] = Descriptors.PEOE_VSA1(mol)
            features['PEOE_VSA2'] = Descriptors.PEOE_VSA2(mol)
            
            # Drug-likeness scores
            features['QED'] = QED.qed(mol)
            
            # Topological descriptors
            features['BertzCT'] = Descriptors.BertzCT(mol)
            features['Chi0v'] = Descriptors.Chi0v(mol)
            features['Chi1v'] = Descriptors.Chi1v(mol)
            features['Kappa1'] = Descriptors.Kappa1(mol)
            features['Kappa2'] = Descriptors.Kappa2(mol)
            
            # Additional descriptors
            features['MolMR'] = Descriptors.MolMR(mol)
            features['BalabanJ'] = Descriptors.BalabanJ(mol)
            features['HallKierAlpha'] = Descriptors.HallKierAlpha(mol)
            features['NumSaturatedCarbocycles'] = Descriptors.NumSaturatedCarbocycles(mol)
            features['NumAromaticCarbocycles'] = Descriptors.NumAromaticCarbocycles(mol)
            features['NumSaturatedHeterocycles'] = Descriptors.NumSaturatedHeterocycles(mol)
            features['NumAromaticHeterocycles'] = Descriptors.NumAromaticHeterocycles(mol)
            
            # Functional-group fragment counts (pharmacophore-style features)
            features['fr_NH2'] = Descriptors.fr_NH2(mol)
            features['fr_COO'] = Descriptors.fr_COO(mol)
            features['fr_benzene'] = Descriptors.fr_benzene(mol)
            features['fr_furan'] = Descriptors.fr_furan(mol)
            features['fr_halogen'] = Descriptors.fr_halogen(mol)
            
            return pd.Series(features)
    except Exception as e:
        print(f"Error computing features: {e}")
    # Empty Series: the row becomes all-NaN and is dropped downstream
    return pd.Series(dtype=float)

raw_df_labeled['canonical_smiles'] = raw_df_labeled['smiles'].apply(canonicalize_smiles)
raw_df_labeled = raw_df_labeled.dropna(subset=['canonical_smiles'])

tox_columns = ['NR-AR', 'NR-AR-LBD', 'NR-AhR', 'NR-Aromatase', 'NR-ER', 'NR-ER-LBD', 
               'NR-PPAR-gamma', 'SR-ARE', 'SR-ATAD5', 'SR-HSE', 'SR-MMP', 'SR-p53']

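# Toxic if active in any measured assay. max() skips NaN, so a row is NaN only
# when no assay was measured; those rows are dropped in the next step.
raw_df_labeled['toxic'] = raw_df_labeled[tox_columns].max(axis=1)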

raw_df_labeled = raw_df_labeled.dropna(subset=['toxic'])

raw_df_labeled['toxic'] = raw_df_labeled['toxic'].astype(int)

raw_df_labeled = raw_df_labeled.drop(columns=tox_columns)

toxic_count = (raw_df_labeled['toxic'] == 1).sum()
non_toxic_count = (raw_df_labeled['toxic'] == 0).sum()


toxic_df = raw_df_labeled[raw_df_labeled['toxic'] == 1]
non_toxic_df = raw_df_labeled[raw_df_labeled['toxic'] == 0]

non_toxic_downsampled = resample(non_toxic_df, 
                                  replace=False,
                                  n_samples=len(toxic_df),
                                  random_state=42)

raw_df_labeled_balanced = pd.concat([toxic_df, non_toxic_downsampled])
raw_df_labeled_balanced = raw_df_labeled_balanced.sample(frac=1, random_state=42).reset_index(drop=True)


labeled_features = raw_df_labeled_balanced['canonical_smiles'].apply(compute_comprehensive_features)
all_labeled_with_features = pd.concat([raw_df_labeled_balanced, labeled_features], axis=1)
all_labeled_with_features = all_labeled_with_features.dropna()

all_labeled_with_features.to_csv('../../data/raw/enhanced_data/tox21/labeled_features.csv', index=False)

[10:45:39] Explicit valence for atom # 8 Al, 6, is greater than permitted
[10:45:39] Explicit valence for atom # 3 Al, 6, is greater than permitted
[10:45:39] Explicit valence for atom # 4 Al, 6, is greater than permitted
[10:45:40] Explicit valence for atom # 4 Al, 6, is greater than permitted
[10:45:40] Explicit valence for atom # 9 Al, 6, is greater than permitted
[10:45:40] Explicit valence for atom # 5 Al, 6, is greater than permitted
[10:45:40] Explicit valence for atom # 16 Al, 6, is greater than permitted
[10:45:41] Explicit valence for atom # 20 Al, 6, is greater than permitted
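
The class-balancing step in the cell above keeps every toxic row and draws an equally sized random subset of non-toxic rows without replacement. Below is a minimal standard-library sketch of that idea, assuming nothing beyond a list of 0/1 labels; `downsample_majority` is a hypothetical helper and, unlike the notebook code, it detects the majority class instead of assuming it is the non-toxic one.

```python
import random

def downsample_majority(labels, seed=42):
    """Return sorted row indices keeping all minority rows and an equal-sized
    random subset of majority rows (sampling without replacement)."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    kept = minority + rng.sample(majority, len(minority))
    return sorted(kept)

labels = [1] * 3 + [0] * 9          # toy data: 3 toxic, 9 non-toxic
kept = downsample_majority(labels)
balanced = [labels[i] for i in kept]
print(len(kept), sum(balanced))
```

Downsampling discards majority-class rows, so the balanced set is smaller than the original; that trade-off is the price of a 50/50 class ratio.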


In [3]:
raw_df_unlabeled['canonical_smiles'] = raw_df_unlabeled['smiles'].apply(canonicalize_smiles)
raw_df_unlabeled = raw_df_unlabeled.dropna(subset=['canonical_smiles'])

unlabeled_features = raw_df_unlabeled['canonical_smiles'].apply(compute_comprehensive_features)
unlabeled_with_features = pd.concat([raw_df_unlabeled[['smiles', 'canonical_smiles']], unlabeled_features], axis=1)
unlabeled_with_features['toxic'] = np.nan
unlabeled_with_features = unlabeled_with_features.dropna(subset=unlabeled_features.columns.tolist())

unlabeled_with_features.to_csv('../../data/raw/enhanced_data/tox21/unlabeled_features.csv', index=False)

In [4]:
exclude_cols = ['smiles', 'canonical_smiles', 'FDA_APPROVED', 'toxic','mol_id']
all_features = [col for col in all_labeled_with_features.columns if col not in exclude_cols]

X_labeled = all_labeled_with_features[all_features]
y_tox = all_labeled_with_features['toxic']

X_unlabeled = unlabeled_with_features[all_features]

In [5]:
def clip_outliers(df, std_threshold=3):
    """Clip outliers using standard deviation method"""
    df_clipped = df.copy()
    outlier_count = 0
    
    for col in df.columns:
        mean = df[col].mean()
        std = df[col].std()
        lower_bound = mean - std_threshold * std
        upper_bound = mean + std_threshold * std
        
        # Count outliers
        outliers = ((df[col] < lower_bound) | (df[col] > upper_bound)).sum()
        if outliers > 0:
            outlier_count += outliers
            print(f"  {col:20s}: {outliers} outliers clipped")
            df_clipped[col] = df[col].clip(lower_bound, upper_bound)
    
    return df_clipped, outlier_count

X_labeled_clipped, labeled_outliers = clip_outliers(X_labeled, std_threshold=3)
X_unlabeled_clipped, unlabeled_outliers = clip_outliers(X_unlabeled, std_threshold=3)

  MolWt               : 102 outliers clipped
  LogP                : 80 outliers clipped
  NumHDonors          : 93 outliers clipped
  NumHAcceptors       : 102 outliers clipped
  NumRotatableBonds   : 110 outliers clipped
  NumAromaticRings    : 45 outliers clipped
  NumHeteroatoms      : 75 outliers clipped
  TPSA                : 83 outliers clipped
  NumRings            : 78 outliers clipped
  NumAliphaticRings   : 101 outliers clipped
  NumSaturatedRings   : 104 outliers clipped
  NumValenceElectrons : 99 outliers clipped
  MinPartialCharge    : 14 outliers clipped
  LabuteASA           : 97 outliers clipped
  PEOE_VSA1           : 115 outliers clipped
  PEOE_VSA2           : 70 outliers clipped
  BertzCT             : 76 outliers clipped
  Chi0v               : 103 outliers clipped
  Chi1v               : 101 outliers clipped
  Kappa1              : 100 outliers clipped
  Kappa2              : 1 outliers clipped
  MolMR               : 97 outliers clipped
  BalabanJ            : 
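
The clipping rule used by `clip_outliers` bounds each column at the mean plus or minus three standard deviations. As a small worked example under stated assumptions (a toy column, standard library only, hypothetical helper name `clip_3sigma`), the sketch below shows how the bounds are computed and how a single extreme value is pulled back to the upper bound.

```python
import statistics

def clip_3sigma(values, k=3.0):
    """Clip values to mean +/- k * sample standard deviation."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)      # sample std (ddof=1), like pandas .std()
    lower, upper = mean - k * std, mean + k * std
    clipped = [min(max(v, lower), upper) for v in values]
    n_outliers = sum(v < lower or v > upper for v in values)
    return clipped, n_outliers, (lower, upper)

values = [10.0] * 19 + [100.0]          # toy column with one extreme value
clipped, n_outliers, (lower, upper) = clip_3sigma(values)
print(n_outliers, round(upper, 2), round(max(clipped), 2))
```

Note that the mean and standard deviation are computed from data that still contain the extreme value, so the outlier itself widens the bounds it is clipped to.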

In [6]:
df_labeled_processed = X_labeled_clipped.copy()

df_labeled_processed['toxic'] = y_tox.values

df_unlabeled_processed = X_unlabeled_clipped.copy()
df_unlabeled_processed['toxic'] = np.nan

df_labeled_processed.to_csv('../../data/processed/tox21/labeled_processed.csv', index=False)
df_unlabeled_processed.to_csv('../../data/processed/tox21/unlabeled_processed.csv', index=False)

In [7]:
artifact_tox21_labeled   = wandb.Artifact(
            name="tox21-labeled-dataset",
            type="dataset",
            description="Cleaned labeled tox21 data for v1.0" 
            )
artifact_tox21_labeled.add_file('../../data/processed/tox21/labeled_processed.csv')

run.log_artifact(artifact_tox21_labeled)

<Artifact tox21-labeled-dataset>

In [8]:
artifact_zinc_unlabeled   = wandb.Artifact(
            name="zinc-unlabeled-dataset",
            type="dataset",
            description="Cleaned unlabeled zinc data for v1.0" 
            )
artifact_zinc_unlabeled.add_file('../../data/processed/tox21/unlabeled_processed.csv')
run.log_artifact(artifact_zinc_unlabeled)
run.finish()

## Phase 2 & 3: Baseline Model & Hyperparameter Optimization

We establish a **baseline supervised model** using Random Forest, then optimize hyperparameters with W&B Sweeps.

### Approach:
1. **Baseline Training**: Train Random Forest on labeled data only
2. **Bayesian Optimization**: Systematically search hyperparameter space
3. **Metrics Tracked**: F1-Score, ROC-AUC, Precision, Recall, Accuracy
4. **Model Registry**: Save best baseline model as W&B artifact
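Since F1 is the metric the sweep optimizes, it helps to see how it follows from precision and recall. The sketch below recomputes it as the harmonic mean, using the precision and recall logged for sweep run `4bwclpzw` in this notebook; `f1_from_precision_recall` is a hypothetical helper, not a scikit-learn function.

```python
def f1_from_precision_recall(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Precision and recall logged for sweep run 4bwclpzw in this notebook
f1 = f1_from_precision_recall(0.72513, 0.65562)
print(round(f1, 4))
```

The result agrees with the `val_f1` of 0.68863 logged for that run, up to rounding of the inputs.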

### Hyperparameters Tuned:
- `n_estimators`: Number of trees (50-200)
- `max_depth`: Tree depth (10-30)
- `min_samples_split`: Split threshold (2, 5, 10)
- `min_samples_leaf`: Leaf size (1, 2, 4)
- `max_features`: Feature sampling ('sqrt', 'log2')

This baseline serves as the benchmark for our Semi-Supervised Learning methods.
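
To put the sweep budget in perspective, the discrete search space can be enumerated. This is a rough sketch using the value lists from the sweep configuration in this notebook and the `count=20` passed to the agent; it only counts combinations and does not call W&B.

```python
from itertools import product

search_space = {
    "n_estimators": [50, 100, 150, 200],
    "max_depth": [10, 15, 20, 25, 30],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 4],
    "max_features": ["sqrt", "log2"],
}

all_configs = list(product(*search_space.values()))
n_runs = 20  # the `count` passed to wandb.agent in this notebook
coverage = n_runs / len(all_configs)
print(len(all_configs), f"{coverage:.1%}")
```

Some grid points are effectively equivalent. Runs `8n3uo1b1` and `nqns13uu` differ only in `min_samples_split` (5 vs 2) with `min_samples_leaf=4` and report identical metrics, which is expected: a node needs at least 8 samples to produce two leaves of 4, so neither split threshold is ever the binding constraint.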

In [None]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

run = wandb.init(project=PROJECT, job_type="experiment")

# Download the versioned datasets logged in Phase 1
unlabeled_artifact = run.use_artifact('zinc-unlabeled-dataset:latest')
data_path = unlabeled_artifact.download()
df_unlabeled = pd.read_csv(f"{data_path}/unlabeled_processed.csv")

labeled_artifact = run.use_artifact('tox21-labeled-dataset:latest')
data_path = labeled_artifact.download()
df_labeled = pd.read_csv(f"{data_path}/labeled_processed.csv")

wandb: Downloading large artifact 'zinc-unlabeled-dataset:latest', 92.69MB. 1 files...
wandb:   1 of 1 files downloaded.  
Done. 00:00:00.8 (122.6MB/s)
wandb:   1 of 1 files downloaded.  


Exception in thread IntMsgThr:
Traceback (most recent call last):
  File "/home/lokmane/anaconda3/lib/python3.11/threading.py", line 1045, in _bootstrap_inner
    self.run()
  File "/home/lokmane/anaconda3/lib/python3.11/site-packages/ipykernel/ipkernel.py", line 772, in run_closure
    _threading_Thread_run(self)
  File "/home/lokmane/anaconda3/lib/python3.11/threading.py", line 982, in run
    self._target(*self._args, **self._kwargs)
  File "/home/lokmane/anaconda3/lib/python3.11/site-packages/wandb/sdk/wandb_run.py", line 336, in check_internal_messages
    self._loop_check_status(
  File "/home/lokmane/anaconda3/lib/python3.11/site-packages/wandb/sdk/wandb_run.py", line 237, in _loop_check_status
    local_handle = request()
                   ^^^^^^^^^
  File "/home/lokmane/anaconda3/lib/python3.11/site-packages/wandb/sdk/interface/interface.py", line 1007, in deliver_internal_messages
    return self._deliver_internal_messages(internal_message)
           ^^^^^^^^^^^^^^^^^^^^^^^

In [10]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import numpy as np

X = df_labeled.drop('toxic', axis=1)
y = df_labeled['toxic']

X = X.replace([np.inf, -np.inf], np.nan)
X = X.fillna(X.median())

print(f"NaN values: {X.isna().sum().sum()}")
print(f"Inf values: {np.isinf(X.values).sum()}")

X_unlabeled_full = df_unlabeled.drop('toxic', axis=1)
X_unlabeled_full = X_unlabeled_full.replace([np.inf, -np.inf], np.nan)
X_unlabeled_full = X_unlabeled_full.fillna(X_unlabeled_full.median())

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

print("\nScaling features...")
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
X_unlabeled_scaled = scaler.transform(X_unlabeled_full)

X_train = pd.DataFrame(X_train_scaled, columns=X_train.columns, index=X_train.index)
X_test = pd.DataFrame(X_test_scaled, columns=X_test.columns, index=X_test.index)
X_unlabeled_full = X_unlabeled_scaled

NaN values: 0
Inf values: 0

Scaling features...
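
The scaling cell fits `StandardScaler` on the training split only and then applies the same statistics to the test and unlabeled data. The sketch below illustrates that fit/transform separation for a single column with the standard library; `fit_scaler` and `transform` are hypothetical helpers, and the population standard deviation (ddof=0) is used because that is what `StandardScaler` computes.

```python
import statistics

def fit_scaler(train_column):
    """Learn mean and population std (ddof=0) from training values only."""
    mean = statistics.fmean(train_column)
    std = statistics.pstdev(train_column) or 1.0   # guard constant columns
    return mean, std

def transform(column, mean, std):
    """Standardize a column with previously learned statistics."""
    return [(v - mean) / std for v in column]

train = [1.0, 2.0, 3.0]
test = [4.0]
mean, std = fit_scaler(train)
print([round(z, 3) for z in transform(train, mean, std)],
      [round(z, 3) for z in transform(test, mean, std)])
```

Because the test value is scaled with the training mean and standard deviation, it can fall outside the range seen during training, which is the intended behavior.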


In [11]:
baseline_sweep_config = {
    'method': 'bayes',
    'metric': {'name': 'val_f1', 'goal': 'maximize'},
    'parameters': {
        'n_estimators': {'values': [50, 100, 150, 200]},
        'max_depth': {'values': [10, 15, 20, 25, 30]},
        'min_samples_split': {'values': [2, 5, 10]},
        'min_samples_leaf': {'values': [1, 2, 4]},
        'max_features': {'values': ['sqrt', 'log2']}
    }
}

In [12]:
from sklearn.metrics import precision_score, recall_score, f1_score, roc_auc_score

def train_baseline():
    run = wandb.init()
    config = wandb.config
    
    model = RandomForestClassifier(
        n_estimators=config.n_estimators,
        max_depth=config.max_depth,
        min_samples_split=config.min_samples_split,
        min_samples_leaf=config.min_samples_leaf,
        max_features=config.max_features,
        random_state=42,
        n_jobs=-1,
        class_weight='balanced'
    )
    
    model.fit(X_train.values, y_train.values)
    
    y_pred = model.predict(X_test.values)
    y_pred_proba = model.predict_proba(X_test.values)[:, 1]
    
    metrics = {
        'val_accuracy': accuracy_score(y_test, y_pred),
        'val_precision': precision_score(y_test, y_pred, zero_division=0),
        'val_recall': recall_score(y_test, y_pred, zero_division=0),
        'val_f1': f1_score(y_test, y_pred, zero_division=0),
        'val_roc_auc': roc_auc_score(y_test, y_pred_proba)
    }
    
    wandb.log(metrics)
    
    print(f"F1: {metrics['val_f1']:.4f}, ROC-AUC: {metrics['val_roc_auc']:.4f}")

print("✓ Baseline training function defined")

✓ Baseline training function defined


In [13]:
baseline_sweep_id = wandb.sweep(baseline_sweep_config, project=PROJECT)
wandb.agent(baseline_sweep_id, train_baseline, count=20)

print("✓ Baseline sweep completed!")

Create sweep with ID: 6z4ddpg3
Sweep URL: https://wandb.ai/l-benhammadi-esi/QSAR_MLOPS_TOX21/sweeps/6z4ddpg3

wandb: 🚀 View run vocal-plasma-2 at: https://wandb.ai/l-benhammadi-esi/QSAR_MLOPS_TOX21/runs/w8krzjlj
wandb: Find logs at: wandb/run-20260129_110301-w8krzjlj/logs

Run summary (one row per sweep run):

| run_id | max_depth | max_features | min_samples_leaf | min_samples_split | n_estimators | val_accuracy | val_f1 | val_precision | val_recall | val_roc_auc |
|---|---|---|---|---|---|---|---|---|---|---|
| 8n3uo1b1 | 25 | log2 | 4 | 5 | 150 | 0.69923 | 0.67922 | 0.72326 | 0.64024 | 0.76973 |
| a4uun9sj | 25 | log2 | 2 | 10 | 50 | 0.70159 | 0.6849 | 0.7212 | 0.65207 | 0.76058 |
| xku0x0aj | 10 | sqrt | 1 | 5 | 200 | 0.70159 | 0.68451 | 0.72178 | 0.65089 | 0.76754 |
| 4bwclpzw | 30 | log2 | 2 | 10 | 50 | 0.70512 | 0.68863 | 0.72513 | 0.65562 | 0.76096 |
| reltkbo0 | 30 | log2 | 4 | 10 | 50 | 0.69747 | 0.67915 | 0.71863 | 0.64379 | 0.76131 |
| 4nferbhd | 30 | log2 | 2 | 10 | 100 | 0.69453 | 0.67461 | 0.71733 | 0.63669 | 0.76355 |
| 6kra62d0 | 20 | sqrt | 1 | 10 | 150 | 0.70689 | 0.68797 | 0.73103 | 0.6497 | 0.76682 |
| 97qeg64o | 25 | log2 | 2 | 10 | 150 | 0.6957 | 0.67708 | 0.71693 | 0.64142 | 0.76474 |
| 4v71fn5n | 20 | sqrt | 2 | 10 | 200 | 0.70394 | 0.68503 | 0.72739 | 0.64734 | 0.76825 |
| tjq9gso9 | 10 | log2 | 1 | 5 | 200 | 0.70041 | 0.68326 | 0.72047 | 0.6497 | 0.76709 |
| hjb0o22r | 15 | log2 | 1 | 5 | 200 | 0.6957 | 0.67586 | 0.71867 | 0.63787 | 0.7652 |
| d43iqqwe | 30 | log2 | 2 | 2 | 50 | 0.69629 | 0.68266 | 0.71063 | 0.6568 | 0.76185 |
| qvtp9hxc | 25 | sqrt | 4 | 5 | 200 | 0.701 | 0.6817 | 0.72437 | 0.64379 | 0.76793 |
| jlbpvutq | 30 | log2 | 1 | 10 | 150 | 0.6957 | 0.67868 | 0.71466 | 0.64615 | 0.76475 |
| dptgwfet | 20 | sqrt | 1 | 5 | 100 | 0.69923 | 0.68202 | 0.71916 | 0.64852 | 0.7621 |
| nqns13uu | 25 | log2 | 4 | 2 | 150 | 0.69923 | 0.67922 | 0.72326 | 0.64024 | 0.76973 |
| 0wpbkzhq | 25 | sqrt | 1 | 2 | 200 | 0.6957 | 0.68067 | 0.71189 | 0.65207 | 0.75942 |
| h89mm4xd | 20 | sqrt | 4 | 5 | 100 | 0.6957 | 0.67627 | 0.71809 | 0.63905 | 0.7636 |
| xzmfhuxc | 20 | log2 | 2 | 10 | 50 | 0.701 | 0.68642 | 0.71742 | 0.65799 | 0.76197 |
| lv9esd8h | 30 | log2 | 1 | 5 | 50 | 0.69747 | 0.68311 | 0.713 | 0.65562 | 0.75673 |

✓ Baseline sweep completed!
Error in callback <bound method _WandbInit._post_run_cell_hook of <wandb.sdk.wandb_init._WandbInit object at 0x7d22ec3bc0d0>> (for post_run_cell), with arguments args (<ExecutionResult object at 7d22f7755810, execution_count=13 error_before_exec=None error_in_exec=None info=<ExecutionInfo object at 7d22ebd13f50, raw_cell="baseline_sweep_id = wandb.sweep(baseline_sweep_con.." store_history=True silent=False shell_futures=True cell_id=vscode-notebook-cell:/home/lokmane/Desktop/mini_projet/notebooks/tox21/tox21_mlops_life_cycle.ipynb#X20sZmlsZQ%3D%3D> result=None>,),kwargs {}:


AlreadyJoinedError: 

In [14]:
import joblib
import os

api = wandb.Api()
entity = api.default_entity

baseline_sweep = api.sweep(f"{entity}/{PROJECT}/{baseline_sweep_id}")
best_baseline_run = baseline_sweep.best_run()

print("\n🏆 BEST BASELINE MODEL")
print("=" * 60)
print(f"Run name: {best_baseline_run.name}")
print(f"F1-Score: {best_baseline_run.summary.get('val_f1'):.4f}")
print(f"ROC-AUC: {best_baseline_run.summary.get('val_roc_auc'):.4f}")
print(f"\nBest Hyperparameters:")
print(f"  n_estimators: {best_baseline_run.config['n_estimators']}")
print(f"  max_depth: {best_baseline_run.config['max_depth']}")
print(f"  min_samples_split: {best_baseline_run.config['min_samples_split']}")
print(f"  min_samples_leaf: {best_baseline_run.config['min_samples_leaf']}")
print(f"  max_features: {best_baseline_run.config['max_features']}")

best_baseline_config = {
    'n_estimators': best_baseline_run.config['n_estimators'],
    'max_depth': best_baseline_run.config['max_depth'],
    'min_samples_split': best_baseline_run.config['min_samples_split'],
    'min_samples_leaf': best_baseline_run.config['min_samples_leaf'],
    'max_features': best_baseline_run.config['max_features']
}

Error in callback <bound method _WandbInit._pre_run_cell_hook of <wandb.sdk.wandb_init._WandbInit object at 0x7d22ec3bc0d0>> (for pre_run_cell), with arguments args (<ExecutionInfo object at 7d22f35c8d50, raw_cell="import joblib
import os

api = wandb.Api()
entity .." store_history=True silent=False shell_futures=True cell_id=vscode-notebook-cell:/home/lokmane/Desktop/mini_projet/notebooks/tox21/tox21_mlops_life_cycle.ipynb#X21sZmlsZQ%3D%3D>,),kwargs {}:


AlreadyJoinedError: 

wandb: Sorting runs by -summary_metrics.val_f1

🏆 BEST BASELINE MODEL
============================================================
Run name: giddy-sweep-4
F1-Score: 0.6886
ROC-AUC: 0.7610

Best Hyperparameters:
  n_estimators: 50
  max_depth: 30
  min_samples_split: 10
  min_samples_leaf: 2
  max_features: log2

In [15]:
run = wandb.init(project=PROJECT, job_type="register-baseline-model")

best_baseline_model = RandomForestClassifier(
    n_estimators=best_baseline_config['n_estimators'],
    max_depth=best_baseline_config['max_depth'],
    min_samples_split=best_baseline_config['min_samples_split'],
    min_samples_leaf=best_baseline_config['min_samples_leaf'],
    max_features=best_baseline_config['max_features'],
    random_state=42,
    n_jobs=-1,
    class_weight='balanced'
)

best_baseline_model.fit(X_train.values, y_train.values)

os.makedirs('models', exist_ok=True)
joblib.dump(best_baseline_model, 'models/best_baseline_rf_model.pkl')

baseline_artifact = wandb.Artifact(
    name='tox21-baseline-rf-model',
    type='model',
    description='Best baseline Random Forest model (supervised only)',
    metadata={
        'method': 'baseline_supervised',
        'n_samples': len(X_train),
        'f1_score': best_baseline_run.summary.get('val_f1'),
        'roc_auc': best_baseline_run.summary.get('val_roc_auc'),
        **best_baseline_config
    }
)

baseline_artifact.add_file('models/best_baseline_rf_model.pkl')
run.log_artifact(baseline_artifact)
run.finish()

print("✓ Best baseline model registered!")

✓ Best baseline model registered!

## Phase 4: Semi-Supervised Learning & Model Registry

After establishing our baseline, we leverage **unlabeled ZINC molecules** to improve model performance using SSL techniques.

### SSL Methods Implemented:
1. **Label Propagation**: Propagate labels over an RBF-kernel similarity graph built from labeled and unlabeled molecules
2. **Self-Training**: Iteratively label high-confidence predictions
3. **Co-Training**: Train two models on different feature views
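
To make the first method concrete, here is a minimal NumPy sketch of label propagation on toy 2-D points. It is an illustration under stated assumptions, not the scikit-learn implementation used in this notebook: `rbf_affinity` and `propagate_labels` are hypothetical helper names, the data is synthetic, and `-1` marks unlabeled rows as in scikit-learn.

```python
import numpy as np

def rbf_affinity(X, gamma):
    """Dense RBF affinity: W[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-gamma * sq_dists)

def propagate_labels(X, y, gamma=1.0, max_iter=1000, tol=1e-6):
    """Toy label propagation; rows with y == -1 are unlabeled."""
    labeled = y != -1
    classes = np.unique(y[labeled])
    # Row-normalise affinities into transition probabilities.
    W = rbf_affinity(X, gamma)
    T = W / W.sum(axis=1, keepdims=True)
    # One-hot label distributions; unlabeled rows start at zero.
    F = np.zeros((len(y), len(classes)))
    F[labeled] = (y[labeled, None] == classes[None, :]).astype(float)
    clamp = F[labeled].copy()
    for _ in range(max_iter):
        F_next = T @ F
        F_next[labeled] = clamp  # keep the known labels fixed
        converged = np.abs(F_next - F).sum() < tol
        F = F_next
        if converged:
            break
    return classes[np.argmax(F, axis=1)]

# Two well-separated synthetic clusters with one labeled point each.
rng = np.random.default_rng(0)
X_toy = np.vstack([
    rng.normal(0.0, 0.3, size=(10, 2)),
    rng.normal(5.0, 0.3, size=(10, 2)),
])
y_toy = np.full(20, -1)
y_toy[0], y_toy[10] = 0, 1

pred = propagate_labels(X_toy, y_toy, gamma=1.0)
print(pred)
```

Each unlabeled point ends up with the label of the single labeled point in its own cluster, because the RBF affinity between the two clusters is negligible.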

### Process:
1. **Hyperparameter Sweep**: Test different SSL configurations
2. **Compare Methods**: Evaluate improvement over baseline
3. **Select Best Model**: Based on F1-Score and ROC-AUC
4. **Register to W&B**: Save best model with metadata

The sweep tests whether SSL can beat the supervised baseline (`giddy-sweep-4`: F1 0.6886, ROC-AUC 0.7610) by adding 5,000 to 20,000 unlabeled ZINC molecules to the 3,962 labeled training samples.
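
Improvement over the baseline is measured in this notebook as the relative change in F1. A minimal sketch of that arithmetic follows, using the rounded baseline F1 printed for `giddy-sweep-4` and the F1 of one SSL run from the sweep log; `relative_improvement_pct` is a hypothetical helper name.

```python
def relative_improvement_pct(candidate_f1, baseline_f1):
    """Relative change of candidate_f1 versus baseline_f1, in percent."""
    if baseline_f1 <= 0:
        raise ValueError("baseline_f1 must be positive")
    return (candidate_f1 - baseline_f1) / baseline_f1 * 100.0

baseline_f1 = 0.6886   # rounded F1 printed for giddy-sweep-4
candidate_f1 = 0.6692  # rounded F1 printed for SSL run vq9kuxtt

delta = relative_improvement_pct(candidate_f1, baseline_f1)
# The ':+' format spec prints the sign itself, so no literal '+' is needed.
print(f"{delta:+.2f}%")  # prints -2.82%
```

A negative value means the SSL run scored below the baseline.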

In [16]:
from sklearn.semi_supervised import LabelPropagation, LabelSpreading, SelfTrainingClassifier

ssl_sweep_config = {
    'method': 'bayes',
    'metric': {'name': 'val_f1', 'goal': 'maximize'},
    'parameters': {
        'ssl_method': {
            'values': ['label_propagation', 'self_training', 'co_training']
        },
        
        'n_unlabeled': {
            'values': [5000, 10000, 15000, 20000]
        },
        
        'lp_gamma': {
            'distribution': 'log_uniform_values',
            'min': 0.001,
            'max': 0.5
        },
        'lp_max_iter': {
            'values': [500, 1000, 1500]
        },
        
        # NOTE: sampled for every run but never read; train_ssl() has no LabelSpreading branch.
        'ls_alpha': {
            'distribution': 'uniform',
            'min': 0.1,
            'max': 0.9
        },
        
        'st_threshold': {
            'distribution': 'uniform',
            'min': 0.7,
            'max': 0.95
        },
        'st_max_iter': {
            'values': [5, 10, 15]
        },
        
        'ct_confidence_threshold': {
            'distribution': 'uniform',
            'min': 0.75,
            'max': 0.95
        },
        'ct_samples_per_iter': {
            'values': [25, 50, 100]
        },
        'ct_max_iterations': {
            'values': [10, 15, 20]
        }
    }
}
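
The sweep samples every parameter for every run, but `train_ssl()` only reads the ones that belong to the selected `ssl_method`. The sketch below filters a sampled configuration down to those values, which makes the sweep log easier to read. `METHOD_PARAMS` and `active_params` are hypothetical names, and the example values are copied from the first run in the sweep output.

```python
# Parameters that each branch of train_ssl() reads, by ssl_method.
METHOD_PARAMS = {
    "label_propagation": ("lp_gamma", "lp_max_iter"),
    "self_training": ("st_threshold", "st_max_iter"),
    "co_training": (
        "ct_confidence_threshold",
        "ct_samples_per_iter",
        "ct_max_iterations",
    ),
}

def active_params(config):
    """Return only the sampled values that the chosen method uses."""
    method = config["ssl_method"]
    keys = ("ssl_method", "n_unlabeled") + METHOD_PARAMS[method]
    return {key: config[key] for key in keys}

# Sampled configuration of run vq9kuxtt, as printed by the sweep agent.
sampled = {
    "ct_confidence_threshold": 0.8823214756927386,
    "ct_max_iterations": 20,
    "ct_samples_per_iter": 50,
    "lp_gamma": 0.46633720675856694,
    "lp_max_iter": 1000,
    "ls_alpha": 0.6173144766786385,
    "n_unlabeled": 15000,
    "ssl_method": "self_training",
    "st_max_iter": 10,
    "st_threshold": 0.8473665564102328,
}

print(active_params(sampled))
```

For this run only `n_unlabeled`, `st_threshold` and `st_max_iter` influence the result; the Bayesian search still treats the other sampled values as inputs.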


In [17]:
def train_ssl():
    run = wandb.init()
    config = wandb.config
    
    n_unlabeled = min(config.n_unlabeled, len(X_unlabeled_full))
    X_unlabeled = X_unlabeled_full[:n_unlabeled]
    
    
    # === LABEL PROPAGATION ===
    if config.ssl_method == 'label_propagation':
        X_combined = np.vstack([X_train.values, X_unlabeled])
        y_combined = np.concatenate([y_train.values, np.full(n_unlabeled, -1)])
        
        model = LabelPropagation(
            kernel='rbf',
            gamma=config.lp_gamma,
            max_iter=config.lp_max_iter,
            n_jobs=-1
        )
        model.fit(X_combined, y_combined)
        y_pred = model.predict(X_test.values)
        y_pred_proba = model.predict_proba(X_test.values)[:, 1]
    
    # === SELF-TRAINING ===
    elif config.ssl_method == 'self_training':
        X_combined = np.vstack([X_train.values, X_unlabeled])
        y_combined = np.concatenate([y_train.values, np.full(n_unlabeled, -1)])
        
        base_clf = RandomForestClassifier(
            n_estimators=best_baseline_config['n_estimators'],
            max_depth=best_baseline_config['max_depth'],
            min_samples_split=best_baseline_config['min_samples_split'],
            min_samples_leaf=best_baseline_config['min_samples_leaf'],
            max_features=best_baseline_config['max_features'],
            random_state=42,
            n_jobs=-1,
            class_weight='balanced'
        )
        model = SelfTrainingClassifier(
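            # scikit-learn 1.6+ deprecates `base_estimator` in favour of `estimator`;
            # the truncated `warn(` lines in the sweep output likely come from this.
            base_estimator=base_clf,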
            threshold=config.st_threshold,
            max_iter=config.st_max_iter,
            verbose=False
        )
        model.fit(X_combined, y_combined)
        y_pred = model.predict(X_test.values)
        y_pred_proba = model.predict_proba(X_test.values)[:, 1]
    
    # === CO-TRAINING ===
    elif config.ssl_method == 'co_training':
        view1_candidates = [
            'MolWt', 'LogP', 'NumHDonors', 'NumHAcceptors', 'NumValenceElectrons',
            'TPSA', 'MaxPartialCharge', 'MinPartialCharge', 'LabuteASA', 'MolMR',
            'QED', 'NumHeteroatoms'
        ]
        
        view2_candidates = [
            'NumRotatableBonds', 'NumAromaticRings', 'NumRings', 'NumAliphaticRings',
            'NumSaturatedRings', 'FractionCsp3', 'PEOE_VSA1', 'PEOE_VSA2', 'BertzCT',
            'Chi0v', 'Chi1v', 'Kappa1', 'Kappa2', 'BalabanJ', 'HallKierAlpha',
            'NumSaturatedCarbocycles', 'NumAromaticCarbocycles',
            'NumSaturatedHeterocycles', 'NumAromaticHeterocycles',
            'fr_NH2', 'fr_COO', 'fr_benzene', 'fr_furan', 'fr_halogen'
        ]
        
        feature_cols = X_train.columns.tolist()
        
        view1_features = [f for f in view1_candidates if f in feature_cols]
        view2_features = [f for f in view2_candidates if f in feature_cols]
        
        v1_idx = [feature_cols.index(f) for f in view1_features]
        v2_idx = [feature_cols.index(f) for f in view2_features]
        
        X_train_v1 = X_train.values[:, v1_idx]
        X_train_v2 = X_train.values[:, v2_idx]
        X_test_v1 = X_test.values[:, v1_idx]
        X_test_v2 = X_test.values[:, v2_idx]
        X_unlabeled_v1 = X_unlabeled[:, v1_idx]
        X_unlabeled_v2 = X_unlabeled[:, v2_idx]
        
        y_train_curr = y_train.values.copy()
        
        clf1 = RandomForestClassifier(
            n_estimators=best_baseline_config['n_estimators'],
            max_depth=best_baseline_config['max_depth'],
            min_samples_split=best_baseline_config['min_samples_split'],
            min_samples_leaf=best_baseline_config['min_samples_leaf'],
            max_features=best_baseline_config['max_features'],
            random_state=42,
            n_jobs=-1,
            class_weight='balanced'
        )
        clf2 = RandomForestClassifier(
            n_estimators=best_baseline_config['n_estimators'],
            max_depth=best_baseline_config['max_depth'],
            min_samples_split=best_baseline_config['min_samples_split'],
            min_samples_leaf=best_baseline_config['min_samples_leaf'],
            max_features=best_baseline_config['max_features'],
            random_state=43,
            n_jobs=-1,
            class_weight='balanced'
        )
        
        mask_available = np.ones(n_unlabeled, dtype=bool)
        
        for iteration in range(config.ct_max_iterations):
            if not np.any(mask_available):
                break
            
            clf1.fit(X_train_v1, y_train_curr)
            clf2.fit(X_train_v2, y_train_curr)
            
            available_v1 = X_unlabeled_v1[mask_available]
            available_v2 = X_unlabeled_v2[mask_available]
            
            if len(available_v1) == 0:
                break
            
            prob1 = clf1.predict_proba(available_v1)
            prob2 = clf2.predict_proba(available_v2)
            
            conf1 = np.max(prob1, axis=1)
            conf2 = np.max(prob2, axis=1)
            pred1 = np.argmax(prob1, axis=1)
            pred2 = np.argmax(prob2, axis=1)
            
            confident1 = conf1 > config.ct_confidence_threshold
            confident2 = conf2 > config.ct_confidence_threshold
            
            available_indices = np.where(mask_available)[0]
            samples_to_add = []
            labels_to_add = []
            
            if np.any(confident1):
                top_indices1 = np.argsort(conf1)[::-1][:config.ct_samples_per_iter]
                top_indices1 = top_indices1[confident1[top_indices1]]
                for idx in top_indices1:
                    samples_to_add.append(available_indices[idx])
                    labels_to_add.append(pred1[idx])
            
            if np.any(confident2):
                top_indices2 = np.argsort(conf2)[::-1][:config.ct_samples_per_iter]
                top_indices2 = top_indices2[confident2[top_indices2]]
                for idx in top_indices2:
                    if available_indices[idx] not in samples_to_add:
                        samples_to_add.append(available_indices[idx])
                        labels_to_add.append(pred2[idx])
            
            if len(samples_to_add) == 0:
                break
            
            samples_to_add = np.array(samples_to_add)
            labels_to_add = np.array(labels_to_add)
            
            X_train_v1 = np.vstack([X_train_v1, X_unlabeled_v1[samples_to_add]])
            X_train_v2 = np.vstack([X_train_v2, X_unlabeled_v2[samples_to_add]])
            y_train_curr = np.concatenate([y_train_curr, labels_to_add])
            
            mask_available[samples_to_add] = False
        
        clf1.fit(X_train_v1, y_train_curr)
        clf2.fit(X_train_v2, y_train_curr)
        
        p1 = clf1.predict_proba(X_test_v1)[:, 1]
        p2 = clf2.predict_proba(X_test_v2)[:, 1]
        y_pred_proba = (p1 + p2) / 2
        y_pred = (y_pred_proba >= 0.5).astype(int)
    
    metrics = {
        'val_accuracy': accuracy_score(y_test, y_pred),
        'val_precision': precision_score(y_test, y_pred, zero_division=0),
        'val_recall': recall_score(y_test, y_pred, zero_division=0),
        'val_f1': f1_score(y_test, y_pred, zero_division=0),
        'val_roc_auc': roc_auc_score(y_test, y_pred_proba),
        'n_unlabeled_used': n_unlabeled,
        'n_labeled': len(X_train)
    }
    
    wandb.log(metrics)
    
    baseline_f1 = best_baseline_run.summary.get('val_f1')
    improvement = ((metrics['val_f1'] - baseline_f1) / baseline_f1) * 100
    wandb.log({'improvement_over_baseline_pct': improvement})
    
    # The ':+' format spec already prints the sign, so no literal '+' prefix is needed.
    print(f"F1: {metrics['val_f1']:.4f} ({improvement:+.2f}%), ROC-AUC: {metrics['val_roc_auc']:.4f}")

print("✓ SSL training function defined (all 3 methods)")

✓ SSL training function defined (all 3 methods)
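
The co-training branch of `train_ssl()` picks pseudo-labels in two steps: take the `ct_samples_per_iter` most confident unlabeled rows, then keep only those above `ct_confidence_threshold`. The sketch below isolates that selection on a small hand-written probability matrix; `select_confident` is a hypothetical helper name, not a function from the notebook.

```python
import numpy as np

def select_confident(proba, threshold, k):
    """Pick at most k rows whose top class probability exceeds threshold.

    Returns (row indices, predicted labels), most confident first.
    """
    conf = proba.max(axis=1)
    pred = proba.argmax(axis=1)
    top_k = np.argsort(conf)[::-1][:k]     # k most confident rows
    keep = top_k[conf[top_k] > threshold]  # drop rows below the threshold
    return keep, pred[keep]

# Class probabilities for five unlabeled samples (toy values).
proba = np.array([
    [0.95, 0.05],
    [0.60, 0.40],
    [0.10, 0.90],
    [0.20, 0.80],
    [0.55, 0.45],
])

idx, labels = select_confident(proba, threshold=0.75, k=2)
print(idx, labels)  # prints [0 2] [0 1]
```

Because the threshold is applied after the top-`k` cut, an iteration can add fewer than `k` samples, or none, in which case the loop in `train_ssl()` stops.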

In [18]:
ssl_sweep_id = wandb.sweep(ssl_sweep_config, project=PROJECT)
wandb.agent(ssl_sweep_id, train_ssl, count=50)

print("✓ SSL sweep completed!")

Create sweep with ID: 415gsqzj
Sweep URL: https://wandb.ai/l-benhammadi-esi/QSAR_MLOPS_TOX21/sweeps/415gsqzj


[34m[1mwandb[0m: Agent Starting Run: vq9kuxtt with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.8823214756927386
[34m[1mwandb[0m: 	ct_max_iterations: 20
[34m[1mwandb[0m: 	ct_samples_per_iter: 50
[34m[1mwandb[0m: 	lp_gamma: 0.46633720675856694
[34m[1mwandb[0m: 	lp_max_iter: 1000
[34m[1mwandb[0m: 	ls_alpha: 0.6173144766786385
[34m[1mwandb[0m: 	n_unlabeled: 15000
[34m[1mwandb[0m: 	ssl_method: self_training
[34m[1mwandb[0m: 	st_max_iter: 10
[34m[1mwandb[0m: 	st_threshold: 0.8473665564102328
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


  warn(


F1: 0.6692 (+-2.82%), ROC-AUC: 0.7508


Run summary:
improvement_over_baseline_pct: -2.81586
n_labeled: 3962.0
n_unlabeled_used: 15000.0
val_accuracy: 0.69688
val_f1: 0.66924
val_precision: 0.73174
val_recall: 0.61657
val_roc_auc: 0.75077


[34m[1mwandb[0m: Agent Starting Run: na7n2pmd with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.9332641418827948
[34m[1mwandb[0m: 	ct_max_iterations: 20
[34m[1mwandb[0m: 	ct_samples_per_iter: 25
[34m[1mwandb[0m: 	lp_gamma: 0.23804407762226623
[34m[1mwandb[0m: 	lp_max_iter: 500
[34m[1mwandb[0m: 	ls_alpha: 0.3555222817709298
[34m[1mwandb[0m: 	n_unlabeled: 20000
[34m[1mwandb[0m: 	ssl_method: label_propagation
[34m[1mwandb[0m: 	st_max_iter: 10
[34m[1mwandb[0m: 	st_threshold: 0.7199141935625343
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.6790 (+-1.40%), ROC-AUC: 0.7232


Run summary:
improvement_over_baseline_pct: -1.39914
n_labeled: 3962.0
n_unlabeled_used: 20000.0
val_accuracy: 0.66274
val_f1: 0.67899
val_precision: 0.64468
val_recall: 0.71716
val_roc_auc: 0.72317


[34m[1mwandb[0m: Sweep Agent: Waiting for job.
[34m[1mwandb[0m: Job received.
[34m[1mwandb[0m: Agent Starting Run: 6s4mwkdv with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.7530548811797468
[34m[1mwandb[0m: 	ct_max_iterations: 15
[34m[1mwandb[0m: 	ct_samples_per_iter: 50
[34m[1mwandb[0m: 	lp_gamma: 0.35962502578969324
[34m[1mwandb[0m: 	lp_max_iter: 1000
[34m[1mwandb[0m: 	ls_alpha: 0.3119631450697973
[34m[1mwandb[0m: 	n_unlabeled: 5000
[34m[1mwandb[0m: 	ssl_method: label_propagation
[34m[1mwandb[0m: 	st_max_iter: 10
[34m[1mwandb[0m: 	st_threshold: 0.771215409375232
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.6650 (+-3.43%), ROC-AUC: 0.7243


Run summary:
improvement_over_baseline_pct: -3.42915
n_labeled: 3962.0
n_unlabeled_used: 5000.0
val_accuracy: 0.68217
val_f1: 0.66501
val_precision: 0.69883
val_recall: 0.63432
val_roc_auc: 0.72428


[34m[1mwandb[0m: Agent Starting Run: cqtfq4n1 with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.8646608337642059
[34m[1mwandb[0m: 	ct_max_iterations: 15
[34m[1mwandb[0m: 	ct_samples_per_iter: 50
[34m[1mwandb[0m: 	lp_gamma: 0.01869853661646963
[34m[1mwandb[0m: 	lp_max_iter: 1500
[34m[1mwandb[0m: 	ls_alpha: 0.46638488012831025
[34m[1mwandb[0m: 	n_unlabeled: 15000
[34m[1mwandb[0m: 	ssl_method: label_propagation
[34m[1mwandb[0m: 	st_max_iter: 5
[34m[1mwandb[0m: 	st_threshold: 0.856315221862691
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.6804 (+-1.20%), ROC-AUC: 0.7001


Run summary:
improvement_over_baseline_pct: -1.19591
n_labeled: 3962.0
n_unlabeled_used: 15000.0
val_accuracy: 0.63449
val_f1: 0.68039
val_precision: 0.602
val_recall: 0.78225
val_roc_auc: 0.70007


[34m[1mwandb[0m: Agent Starting Run: g5weo8x5 with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.7571429871927647
[34m[1mwandb[0m: 	ct_max_iterations: 20
[34m[1mwandb[0m: 	ct_samples_per_iter: 25
[34m[1mwandb[0m: 	lp_gamma: 0.09301695213762542
[34m[1mwandb[0m: 	lp_max_iter: 1500
[34m[1mwandb[0m: 	ls_alpha: 0.6431397041109093
[34m[1mwandb[0m: 	n_unlabeled: 10000
[34m[1mwandb[0m: 	ssl_method: self_training
[34m[1mwandb[0m: 	st_max_iter: 15
[34m[1mwandb[0m: 	st_threshold: 0.7604111116398501
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


  warn(


F1: 0.6755 (+-1.90%), ROC-AUC: 0.7511


Run summary:
improvement_over_baseline_pct: -1.90301
n_labeled: 3962.0
n_unlabeled_used: 10000.0
val_accuracy: 0.69806
val_f1: 0.67552
val_precision: 0.72554
val_recall: 0.63195
val_roc_auc: 0.75112


[34m[1mwandb[0m: Agent Starting Run: n9uhajvy with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.7755862673098514
[34m[1mwandb[0m: 	ct_max_iterations: 10
[34m[1mwandb[0m: 	ct_samples_per_iter: 50
[34m[1mwandb[0m: 	lp_gamma: 0.007180064699407413
[34m[1mwandb[0m: 	lp_max_iter: 1000
[34m[1mwandb[0m: 	ls_alpha: 0.659663315469458
[34m[1mwandb[0m: 	n_unlabeled: 20000
[34m[1mwandb[0m: 	ssl_method: co_training
[34m[1mwandb[0m: 	st_max_iter: 10
[34m[1mwandb[0m: 	st_threshold: 0.8463049323806413
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.6722 (+-2.38%), ROC-AUC: 0.7645


Run summary:
improvement_over_baseline_pct: -2.3832
n_labeled: 3962.0
n_unlabeled_used: 20000.0
val_accuracy: 0.69865
val_f1: 0.67222
val_precision: 0.73222
val_recall: 0.6213
val_roc_auc: 0.76448


[34m[1mwandb[0m: Agent Starting Run: m0on6hyr with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.7812025954309124
[34m[1mwandb[0m: 	ct_max_iterations: 15
[34m[1mwandb[0m: 	ct_samples_per_iter: 50
[34m[1mwandb[0m: 	lp_gamma: 0.005719648299362614
[34m[1mwandb[0m: 	lp_max_iter: 1500
[34m[1mwandb[0m: 	ls_alpha: 0.2403400242377
[34m[1mwandb[0m: 	n_unlabeled: 5000
[34m[1mwandb[0m: 	ssl_method: self_training
[34m[1mwandb[0m: 	st_max_iter: 15
[34m[1mwandb[0m: 	st_threshold: 0.7821292734636225
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


  warn(


F1: 0.6822 (+-0.93%), ROC-AUC: 0.7559


Run summary:
improvement_over_baseline_pct: -0.92613
n_labeled: 3962.0
n_unlabeled_used: 5000.0
val_accuracy: 0.70394
val_f1: 0.68225
val_precision: 0.73171
val_recall: 0.63905
val_roc_auc: 0.75587


[34m[1mwandb[0m: Agent Starting Run: o31xbhe0 with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.8314389317440976
[34m[1mwandb[0m: 	ct_max_iterations: 10
[34m[1mwandb[0m: 	ct_samples_per_iter: 100
[34m[1mwandb[0m: 	lp_gamma: 0.001688530930942546
[34m[1mwandb[0m: 	lp_max_iter: 1500
[34m[1mwandb[0m: 	ls_alpha: 0.424582882086462
[34m[1mwandb[0m: 	n_unlabeled: 5000
[34m[1mwandb[0m: 	ssl_method: self_training
[34m[1mwandb[0m: 	st_max_iter: 5
[34m[1mwandb[0m: 	st_threshold: 0.7545058676292611
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


  warn(


F1: 0.6768 (+-1.72%), ROC-AUC: 0.7590


Run summary:
improvement_over_baseline_pct: -1.72209
n_labeled: 3962.0
n_unlabeled_used: 5000.0
val_accuracy: 0.69865
val_f1: 0.67677
val_precision: 0.7253
val_recall: 0.63432
val_roc_auc: 0.75902


[34m[1mwandb[0m: Agent Starting Run: 78mgpyf4 with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.8644641611119304
[34m[1mwandb[0m: 	ct_max_iterations: 10
[34m[1mwandb[0m: 	ct_samples_per_iter: 50
[34m[1mwandb[0m: 	lp_gamma: 0.3088434626814073
[34m[1mwandb[0m: 	lp_max_iter: 1000
[34m[1mwandb[0m: 	ls_alpha: 0.8993179297861408
[34m[1mwandb[0m: 	n_unlabeled: 20000
[34m[1mwandb[0m: 	ssl_method: self_training
[34m[1mwandb[0m: 	st_max_iter: 15
[34m[1mwandb[0m: 	st_threshold: 0.9007896152347018
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


  warn(


F1: 0.6701 (+-2.69%), ROC-AUC: 0.7532


Run summary:
improvement_over_baseline_pct: -2.6899
n_labeled: 3962.0
n_unlabeled_used: 20000.0
val_accuracy: 0.69865
val_f1: 0.6701
val_precision: 0.7355
val_recall: 0.61538
val_roc_auc: 0.75321


[34m[1mwandb[0m: Agent Starting Run: 19wf4kwi with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.8441292576972764
[34m[1mwandb[0m: 	ct_max_iterations: 15
[34m[1mwandb[0m: 	ct_samples_per_iter: 50
[34m[1mwandb[0m: 	lp_gamma: 0.0018742425738619236
[34m[1mwandb[0m: 	lp_max_iter: 500
[34m[1mwandb[0m: 	ls_alpha: 0.3579471523144705
[34m[1mwandb[0m: 	n_unlabeled: 5000
[34m[1mwandb[0m: 	ssl_method: label_propagation
[34m[1mwandb[0m: 	st_max_iter: 5
[34m[1mwandb[0m: 	st_threshold: 0.8941934154105724
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.2881 (+-58.16%), ROC-AUC: 0.6943


Run summary:
improvement_over_baseline_pct: -58.15793
n_labeled: 3962.0
n_unlabeled_used: 5000.0
val_accuracy: 0.55503
val_f1: 0.28814
val_precision: 0.70507
val_recall: 0.18107
val_roc_auc: 0.69433


[34m[1mwandb[0m: Agent Starting Run: rjy0tc7d with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.8231451813908486
[34m[1mwandb[0m: 	ct_max_iterations: 20
[34m[1mwandb[0m: 	ct_samples_per_iter: 50
[34m[1mwandb[0m: 	lp_gamma: 0.003665058651400653
[34m[1mwandb[0m: 	lp_max_iter: 1500
[34m[1mwandb[0m: 	ls_alpha: 0.3596390437812126
[34m[1mwandb[0m: 	n_unlabeled: 15000
[34m[1mwandb[0m: 	ssl_method: label_propagation
[34m[1mwandb[0m: 	st_max_iter: 5
[34m[1mwandb[0m: 	st_threshold: 0.8870603740947061
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.4469 (+-35.10%), ROC-AUC: 0.6948


Run summary:
improvement_over_baseline_pct: -35.10129
n_labeled: 3962.0
n_unlabeled_used: 15000.0
val_accuracy: 0.58917
val_f1: 0.44691
val_precision: 0.67626
val_recall: 0.33373
val_roc_auc: 0.69479


[34m[1mwandb[0m: Agent Starting Run: w94ai961 with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.750725616456419
[34m[1mwandb[0m: 	ct_max_iterations: 15
[34m[1mwandb[0m: 	ct_samples_per_iter: 100
[34m[1mwandb[0m: 	lp_gamma: 0.006119535574062913
[34m[1mwandb[0m: 	lp_max_iter: 500
[34m[1mwandb[0m: 	ls_alpha: 0.30054486540289754
[34m[1mwandb[0m: 	n_unlabeled: 10000
[34m[1mwandb[0m: 	ssl_method: co_training
[34m[1mwandb[0m: 	st_max_iter: 10
[34m[1mwandb[0m: 	st_threshold: 0.752898613795169
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.6403 (+-7.02%), ROC-AUC: 0.7480


Run summary:
improvement_over_baseline_pct: -7.01545
n_labeled: 3962.0
n_unlabeled_used: 10000.0
val_accuracy: 0.67863
val_f1: 0.64032
val_precision: 0.72214
val_recall: 0.57515
val_roc_auc: 0.74798


[34m[1mwandb[0m: Agent Starting Run: 9geeuojj with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.9139854080122738
[34m[1mwandb[0m: 	ct_max_iterations: 15
[34m[1mwandb[0m: 	ct_samples_per_iter: 100
[34m[1mwandb[0m: 	lp_gamma: 0.004164124237975971
[34m[1mwandb[0m: 	lp_max_iter: 1500
[34m[1mwandb[0m: 	ls_alpha: 0.6598007489593425
[34m[1mwandb[0m: 	n_unlabeled: 20000
[34m[1mwandb[0m: 	ssl_method: self_training
[34m[1mwandb[0m: 	st_max_iter: 5
[34m[1mwandb[0m: 	st_threshold: 0.8510419128042936
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


  warn(


F1: 0.6632 (+-3.69%), ROC-AUC: 0.7476


Run summary:
improvement_over_baseline_pct: -3.68763
n_labeled: 3962.0
n_unlabeled_used: 20000.0
val_accuracy: 0.69217
val_f1: 0.66323
val_precision: 0.7274
val_recall: 0.60947
val_roc_auc: 0.74761


[34m[1mwandb[0m: Sweep Agent: Waiting for job.
[34m[1mwandb[0m: Job received.
[34m[1mwandb[0m: Agent Starting Run: 35mpi256 with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.9247047271830712
[34m[1mwandb[0m: 	ct_max_iterations: 10
[34m[1mwandb[0m: 	ct_samples_per_iter: 100
[34m[1mwandb[0m: 	lp_gamma: 0.0011794577861284586
[34m[1mwandb[0m: 	lp_max_iter: 1500
[34m[1mwandb[0m: 	ls_alpha: 0.6648694870138654
[34m[1mwandb[0m: 	n_unlabeled: 10000
[34m[1mwandb[0m: 	ssl_method: self_training
[34m[1mwandb[0m: 	st_max_iter: 15
[34m[1mwandb[0m: 	st_threshold: 0.7083218956405546
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


  warn(


F1: 0.6742 (+-2.09%), ROC-AUC: 0.7492


Run summary:
improvement_over_baseline_pct: -2.09088
n_labeled: 3962.0
n_unlabeled_used: 10000.0
val_accuracy: 0.6957
val_f1: 0.67423
val_precision: 0.72102
val_recall: 0.63314
val_roc_auc: 0.74921


[34m[1mwandb[0m: Agent Starting Run: srqjul7m with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.8629813643286175
[34m[1mwandb[0m: 	ct_max_iterations: 20
[34m[1mwandb[0m: 	ct_samples_per_iter: 100
[34m[1mwandb[0m: 	lp_gamma: 0.056684775166711096
[34m[1mwandb[0m: 	lp_max_iter: 1500
[34m[1mwandb[0m: 	ls_alpha: 0.3854346172706101
[34m[1mwandb[0m: 	n_unlabeled: 5000
[34m[1mwandb[0m: 	ssl_method: label_propagation
[34m[1mwandb[0m: 	st_max_iter: 5
[34m[1mwandb[0m: 	st_threshold: 0.9101949183910312
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.6736 (+-2.18%), ROC-AUC: 0.7122


Run summary:
improvement_over_baseline_pct: -2.18223
n_labeled: 3962.0
n_unlabeled_used: 5000.0
val_accuracy: 0.66745
val_f1: 0.6736
val_precision: 0.65801
val_recall: 0.68994
val_roc_auc: 0.71218


[34m[1mwandb[0m: Agent Starting Run: b1ykgz09 with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.8612582198064579
[34m[1mwandb[0m: 	ct_max_iterations: 10
[34m[1mwandb[0m: 	ct_samples_per_iter: 100
[34m[1mwandb[0m: 	lp_gamma: 0.006423812192991901
[34m[1mwandb[0m: 	lp_max_iter: 1500
[34m[1mwandb[0m: 	ls_alpha: 0.822866448963789
[34m[1mwandb[0m: 	n_unlabeled: 5000
[34m[1mwandb[0m: 	ssl_method: label_propagation
[34m[1mwandb[0m: 	st_max_iter: 5
[34m[1mwandb[0m: 	st_threshold: 0.707505332777709
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.5669 (+-17.68%), ROC-AUC: 0.6960


Run summary:
improvement_over_baseline_pct: -17.67721
n_labeled: 3962.0
n_unlabeled_used: 5000.0
val_accuracy: 0.63037
val_f1: 0.5669
val_precision: 0.67934
val_recall: 0.48639
val_roc_auc: 0.69604


[34m[1mwandb[0m: Agent Starting Run: esbitp8i with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.8892918613393876
[34m[1mwandb[0m: 	ct_max_iterations: 10
[34m[1mwandb[0m: 	ct_samples_per_iter: 25
[34m[1mwandb[0m: 	lp_gamma: 0.4052901591345897
[34m[1mwandb[0m: 	lp_max_iter: 1000
[34m[1mwandb[0m: 	ls_alpha: 0.648884484405326
[34m[1mwandb[0m: 	n_unlabeled: 5000
[34m[1mwandb[0m: 	ssl_method: co_training
[34m[1mwandb[0m: 	st_max_iter: 15
[34m[1mwandb[0m: 	st_threshold: 0.7683173117438684
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.6742 (-2.09%), ROC-AUC: 0.7569


Run summary:
improvement_over_baseline_pct,-2.09295
n_labeled,3962.0
n_unlabeled_used,5000.0
val_accuracy,0.69511
val_f1,0.67421
val_precision,0.71946
val_recall,0.63432
val_roc_auc,0.75687


[34m[1mwandb[0m: Sweep Agent: Waiting for job.
[34m[1mwandb[0m: Job received.
[34m[1mwandb[0m: Agent Starting Run: dd3mznke with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.8172341403905938
[34m[1mwandb[0m: 	ct_max_iterations: 20
[34m[1mwandb[0m: 	ct_samples_per_iter: 25
[34m[1mwandb[0m: 	lp_gamma: 0.05034916965600952
[34m[1mwandb[0m: 	lp_max_iter: 500
[34m[1mwandb[0m: 	ls_alpha: 0.7882732977476813
[34m[1mwandb[0m: 	n_unlabeled: 20000
[34m[1mwandb[0m: 	ssl_method: self_training
[34m[1mwandb[0m: 	st_max_iter: 15
[34m[1mwandb[0m: 	st_threshold: 0.8570496691680861
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.6684 (-2.94%), ROC-AUC: 0.7478


Run summary:
improvement_over_baseline_pct,-2.93958
n_labeled,3962.0
n_unlabeled_used,20000.0
val_accuracy,0.69688
val_f1,0.66838
val_precision,0.73305
val_recall,0.6142
val_roc_auc,0.74778


[34m[1mwandb[0m: Agent Starting Run: 5nmwjfln with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.8116522338372681
[34m[1mwandb[0m: 	ct_max_iterations: 15
[34m[1mwandb[0m: 	ct_samples_per_iter: 100
[34m[1mwandb[0m: 	lp_gamma: 0.2172393085155521
[34m[1mwandb[0m: 	lp_max_iter: 500
[34m[1mwandb[0m: 	ls_alpha: 0.6507118693260316
[34m[1mwandb[0m: 	n_unlabeled: 10000
[34m[1mwandb[0m: 	ssl_method: co_training
[34m[1mwandb[0m: 	st_max_iter: 5
[34m[1mwandb[0m: 	st_threshold: 0.7555070302281153
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.6397 (-7.10%), ROC-AUC: 0.7465


Run summary:
improvement_over_baseline_pct,-7.09984
n_labeled,3962.0
n_unlabeled_used,10000.0
val_accuracy,0.67981
val_f1,0.63974
val_precision,0.72632
val_recall,0.5716
val_roc_auc,0.74651


[34m[1mwandb[0m: Agent Starting Run: qgwm8g65 with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.9434409040372576
[34m[1mwandb[0m: 	ct_max_iterations: 15
[34m[1mwandb[0m: 	ct_samples_per_iter: 50
[34m[1mwandb[0m: 	lp_gamma: 0.004241827020853349
[34m[1mwandb[0m: 	lp_max_iter: 1500
[34m[1mwandb[0m: 	ls_alpha: 0.5858613429944941
[34m[1mwandb[0m: 	n_unlabeled: 15000
[34m[1mwandb[0m: 	ssl_method: self_training
[34m[1mwandb[0m: 	st_max_iter: 15
[34m[1mwandb[0m: 	st_threshold: 0.9191807988625896
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.6693 (-2.80%), ROC-AUC: 0.7681


Run summary:
improvement_over_baseline_pct,-2.80168
n_labeled,3962.0
n_unlabeled_used,15000.0
val_accuracy,0.70806
val_f1,0.66933
val_precision,0.76641
val_recall,0.59408
val_roc_auc,0.76813


[34m[1mwandb[0m: Sweep Agent: Waiting for job.
[34m[1mwandb[0m: Job received.
[34m[1mwandb[0m: Agent Starting Run: j93e8z4k with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.9010050356767576
[34m[1mwandb[0m: 	ct_max_iterations: 20
[34m[1mwandb[0m: 	ct_samples_per_iter: 50
[34m[1mwandb[0m: 	lp_gamma: 0.011326073082325316
[34m[1mwandb[0m: 	lp_max_iter: 1000
[34m[1mwandb[0m: 	ls_alpha: 0.6693474754573933
[34m[1mwandb[0m: 	n_unlabeled: 20000
[34m[1mwandb[0m: 	ssl_method: co_training
[34m[1mwandb[0m: 	st_max_iter: 10
[34m[1mwandb[0m: 	st_threshold: 0.865708707081346
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.6436 (-6.54%), ROC-AUC: 0.7544


Run summary:
improvement_over_baseline_pct,-6.54471
n_labeled,3962.0
n_unlabeled_used,20000.0
val_accuracy,0.67922
val_f1,0.64356
val_precision,0.7193
val_recall,0.58225
val_roc_auc,0.75442


[34m[1mwandb[0m: Sweep Agent: Waiting for job.
[34m[1mwandb[0m: Job received.
[34m[1mwandb[0m: Agent Starting Run: 7cit7ash with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.8676565995973706
[34m[1mwandb[0m: 	ct_max_iterations: 20
[34m[1mwandb[0m: 	ct_samples_per_iter: 50
[34m[1mwandb[0m: 	lp_gamma: 0.05326201029114889
[34m[1mwandb[0m: 	lp_max_iter: 1000
[34m[1mwandb[0m: 	ls_alpha: 0.22328301706510464
[34m[1mwandb[0m: 	n_unlabeled: 20000
[34m[1mwandb[0m: 	ssl_method: label_propagation
[34m[1mwandb[0m: 	st_max_iter: 15
[34m[1mwandb[0m: 	st_threshold: 0.8446967158180672
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.




F1: 0.6825 (-0.88%), ROC-AUC: 0.7085


Run summary:
improvement_over_baseline_pct,-0.88278
n_labeled,3962.0
n_unlabeled_used,20000.0
val_accuracy,0.61566
val_f1,0.68255
val_precision,0.57921
val_recall,0.83077
val_roc_auc,0.70848


[34m[1mwandb[0m: Sweep Agent: Waiting for job.
[34m[1mwandb[0m: Job received.
[34m[1mwandb[0m: Agent Starting Run: 5r44gzr5 with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.7737628797084966
[34m[1mwandb[0m: 	ct_max_iterations: 15
[34m[1mwandb[0m: 	ct_samples_per_iter: 25
[34m[1mwandb[0m: 	lp_gamma: 0.4328056399706195
[34m[1mwandb[0m: 	lp_max_iter: 1500
[34m[1mwandb[0m: 	ls_alpha: 0.5039654052232798
[34m[1mwandb[0m: 	n_unlabeled: 15000
[34m[1mwandb[0m: 	ssl_method: co_training
[34m[1mwandb[0m: 	st_max_iter: 15
[34m[1mwandb[0m: 	st_threshold: 0.7704263754207106
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.6797 (-1.29%), ROC-AUC: 0.7630


Run summary:
improvement_over_baseline_pct,-1.29307
n_labeled,3962.0
n_unlabeled_used,15000.0
val_accuracy,0.70159
val_f1,0.67972
val_precision,0.729
val_recall,0.63669
val_roc_auc,0.76303


[34m[1mwandb[0m: Agent Starting Run: zetjon9v with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.8248664110512159
[34m[1mwandb[0m: 	ct_max_iterations: 20
[34m[1mwandb[0m: 	ct_samples_per_iter: 50
[34m[1mwandb[0m: 	lp_gamma: 0.10475652327260214
[34m[1mwandb[0m: 	lp_max_iter: 500
[34m[1mwandb[0m: 	ls_alpha: 0.1420817262409031
[34m[1mwandb[0m: 	n_unlabeled: 10000
[34m[1mwandb[0m: 	ssl_method: label_propagation
[34m[1mwandb[0m: 	st_max_iter: 10
[34m[1mwandb[0m: 	st_threshold: 0.8437938691440819
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.6770 (-1.68%), ROC-AUC: 0.7161


Run summary:
improvement_over_baseline_pct,-1.68214
n_labeled,3962.0
n_unlabeled_used,10000.0
val_accuracy,0.65803
val_f1,0.67704
val_precision,0.63836
val_recall,0.72071
val_roc_auc,0.71612


[34m[1mwandb[0m: Agent Starting Run: 016qqvn7 with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.7808767072526267
[34m[1mwandb[0m: 	ct_max_iterations: 20
[34m[1mwandb[0m: 	ct_samples_per_iter: 50
[34m[1mwandb[0m: 	lp_gamma: 0.006257581423128602
[34m[1mwandb[0m: 	lp_max_iter: 1500
[34m[1mwandb[0m: 	ls_alpha: 0.851028711500481
[34m[1mwandb[0m: 	n_unlabeled: 20000
[34m[1mwandb[0m: 	ssl_method: self_training
[34m[1mwandb[0m: 	st_max_iter: 15
[34m[1mwandb[0m: 	st_threshold: 0.8414322069136916
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.6658 (-3.31%), ROC-AUC: 0.7465


Run summary:
improvement_over_baseline_pct,-3.31218
n_labeled,3962.0
n_unlabeled_used,20000.0
val_accuracy,0.69099
val_f1,0.66582
val_precision,0.72039
val_recall,0.61893
val_roc_auc,0.7465


[34m[1mwandb[0m: Agent Starting Run: tj567th5 with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.8029522600429685
[34m[1mwandb[0m: 	ct_max_iterations: 15
[34m[1mwandb[0m: 	ct_samples_per_iter: 50
[34m[1mwandb[0m: 	lp_gamma: 0.08084696496848395
[34m[1mwandb[0m: 	lp_max_iter: 500
[34m[1mwandb[0m: 	ls_alpha: 0.15820692064806546
[34m[1mwandb[0m: 	n_unlabeled: 10000
[34m[1mwandb[0m: 	ssl_method: label_propagation
[34m[1mwandb[0m: 	st_max_iter: 10
[34m[1mwandb[0m: 	st_threshold: 0.7825083290257757
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.6762 (-1.81%), ROC-AUC: 0.7113


Run summary:
improvement_over_baseline_pct,-1.8097
n_labeled,3962.0
n_unlabeled_used,10000.0
val_accuracy,0.65215
val_f1,0.67616
val_precision,0.62959
val_recall,0.73018
val_roc_auc,0.71128


[34m[1mwandb[0m: Agent Starting Run: js5ukvad with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.816497416344522
[34m[1mwandb[0m: 	ct_max_iterations: 20
[34m[1mwandb[0m: 	ct_samples_per_iter: 100
[34m[1mwandb[0m: 	lp_gamma: 0.0669160254244479
[34m[1mwandb[0m: 	lp_max_iter: 500
[34m[1mwandb[0m: 	ls_alpha: 0.22471372941146425
[34m[1mwandb[0m: 	n_unlabeled: 20000
[34m[1mwandb[0m: 	ssl_method: label_propagation
[34m[1mwandb[0m: 	st_max_iter: 10
[34m[1mwandb[0m: 	st_threshold: 0.8138947342371908
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.




F1: 0.6819 (-0.98%), ROC-AUC: 0.7085


Run summary:
improvement_over_baseline_pct,-0.97885
n_labeled,3962.0
n_unlabeled_used,20000.0
val_accuracy,0.62272
val_f1,0.68189
val_precision,0.58718
val_recall,0.81302
val_roc_auc,0.70852


[34m[1mwandb[0m: Sweep Agent: Waiting for job.
[34m[1mwandb[0m: Job received.
[34m[1mwandb[0m: Agent Starting Run: lmoj97gx with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.8131305585411793
[34m[1mwandb[0m: 	ct_max_iterations: 10
[34m[1mwandb[0m: 	ct_samples_per_iter: 25
[34m[1mwandb[0m: 	lp_gamma: 0.02760532659656193
[34m[1mwandb[0m: 	lp_max_iter: 1500
[34m[1mwandb[0m: 	ls_alpha: 0.155083953282462
[34m[1mwandb[0m: 	n_unlabeled: 15000
[34m[1mwandb[0m: 	ssl_method: co_training
[34m[1mwandb[0m: 	st_max_iter: 10
[34m[1mwandb[0m: 	st_threshold: 0.7810768463207437
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.6809 (-1.13%), ROC-AUC: 0.7599


Run summary:
improvement_over_baseline_pct,-1.12912
n_labeled,3962.0
n_unlabeled_used,15000.0
val_accuracy,0.69982
val_f1,0.68085
val_precision,0.72244
val_recall,0.64379
val_roc_auc,0.75989


[34m[1mwandb[0m: Sweep Agent: Waiting for job.
[34m[1mwandb[0m: Job received.
[34m[1mwandb[0m: Agent Starting Run: qorygt1s with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.8791886058838245
[34m[1mwandb[0m: 	ct_max_iterations: 20
[34m[1mwandb[0m: 	ct_samples_per_iter: 100
[34m[1mwandb[0m: 	lp_gamma: 0.02836880919393947
[34m[1mwandb[0m: 	lp_max_iter: 1500
[34m[1mwandb[0m: 	ls_alpha: 0.4365656912855421
[34m[1mwandb[0m: 	n_unlabeled: 20000
[34m[1mwandb[0m: 	ssl_method: label_propagation
[34m[1mwandb[0m: 	st_max_iter: 10
[34m[1mwandb[0m: 	st_threshold: 0.8225993871808266
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.6910 (+0.34%), ROC-AUC: 0.7029


Run summary:
improvement_over_baseline_pct,0.34399
n_labeled,3962.0
n_unlabeled_used,20000.0
val_accuracy,0.61624
val_f1,0.691
val_precision,0.57628
val_recall,0.86272
val_roc_auc,0.70294


[34m[1mwandb[0m: Agent Starting Run: j776rkun with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.8655197085162513
[34m[1mwandb[0m: 	ct_max_iterations: 10
[34m[1mwandb[0m: 	ct_samples_per_iter: 50
[34m[1mwandb[0m: 	lp_gamma: 0.004929099575066193
[34m[1mwandb[0m: 	lp_max_iter: 1000
[34m[1mwandb[0m: 	ls_alpha: 0.15960592388698266
[34m[1mwandb[0m: 	n_unlabeled: 20000
[34m[1mwandb[0m: 	ssl_method: label_propagation
[34m[1mwandb[0m: 	st_max_iter: 15
[34m[1mwandb[0m: 	st_threshold: 0.8711926736697768
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.5689 (-17.39%), ROC-AUC: 0.6951


Run summary:
improvement_over_baseline_pct,-17.38877
n_labeled,3962.0
n_unlabeled_used,20000.0
val_accuracy,0.62978
val_f1,0.56888
val_precision,0.6759
val_recall,0.49112
val_roc_auc,0.69511


[34m[1mwandb[0m: Sweep Agent: Waiting for job.
[34m[1mwandb[0m: Job received.
[34m[1mwandb[0m: Agent Starting Run: dpr5wzpk with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.9377692621613604
[34m[1mwandb[0m: 	ct_max_iterations: 20
[34m[1mwandb[0m: 	ct_samples_per_iter: 50
[34m[1mwandb[0m: 	lp_gamma: 0.31589698937450195
[34m[1mwandb[0m: 	lp_max_iter: 500
[34m[1mwandb[0m: 	ls_alpha: 0.3035475660274517
[34m[1mwandb[0m: 	n_unlabeled: 20000
[34m[1mwandb[0m: 	ssl_method: label_propagation
[34m[1mwandb[0m: 	st_max_iter: 15
[34m[1mwandb[0m: 	st_threshold: 0.7148350943732966
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.6771 (-1.68%), ROC-AUC: 0.7220


Run summary:
improvement_over_baseline_pct,-1.67888
n_labeled,3962.0
n_unlabeled_used,20000.0
val_accuracy,0.67098
val_f1,0.67707
val_precision,0.6614
val_recall,0.69349
val_roc_auc,0.72201


[34m[1mwandb[0m: Sweep Agent: Waiting for job.
[34m[1mwandb[0m: Job received.
[34m[1mwandb[0m: Agent Starting Run: gnry7fyh with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.8787998271439011
[34m[1mwandb[0m: 	ct_max_iterations: 20
[34m[1mwandb[0m: 	ct_samples_per_iter: 50
[34m[1mwandb[0m: 	lp_gamma: 0.28202292906378956
[34m[1mwandb[0m: 	lp_max_iter: 1500
[34m[1mwandb[0m: 	ls_alpha: 0.4995192552319031
[34m[1mwandb[0m: 	n_unlabeled: 20000
[34m[1mwandb[0m: 	ssl_method: label_propagation
[34m[1mwandb[0m: 	st_max_iter: 10
[34m[1mwandb[0m: 	st_threshold: 0.7574706599898182
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.6796 (-1.32%), ROC-AUC: 0.7226


Run summary:
improvement_over_baseline_pct,-1.31552
n_labeled,3962.0
n_unlabeled_used,20000.0
val_accuracy,0.66863
val_f1,0.67957
val_precision,0.65461
val_recall,0.70651
val_roc_auc,0.72264


[34m[1mwandb[0m: Agent Starting Run: 87kswl6v with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.8555521109739522
[34m[1mwandb[0m: 	ct_max_iterations: 20
[34m[1mwandb[0m: 	ct_samples_per_iter: 100
[34m[1mwandb[0m: 	lp_gamma: 0.0796458748504077
[34m[1mwandb[0m: 	lp_max_iter: 1500
[34m[1mwandb[0m: 	ls_alpha: 0.41168991532254295
[34m[1mwandb[0m: 	n_unlabeled: 20000
[34m[1mwandb[0m: 	ssl_method: self_training
[34m[1mwandb[0m: 	st_max_iter: 15
[34m[1mwandb[0m: 	st_threshold: 0.7690661494350025
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.6751 (-1.96%), ROC-AUC: 0.7443


Run summary:
improvement_over_baseline_pct,-1.96036
n_labeled,3962.0
n_unlabeled_used,20000.0
val_accuracy,0.69865
val_f1,0.67513
val_precision,0.72777
val_recall,0.62959
val_roc_auc,0.74426


[34m[1mwandb[0m: Agent Starting Run: hqvst3da with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.8599944279927741
[34m[1mwandb[0m: 	ct_max_iterations: 20
[34m[1mwandb[0m: 	ct_samples_per_iter: 100
[34m[1mwandb[0m: 	lp_gamma: 0.1055218117246287
[34m[1mwandb[0m: 	lp_max_iter: 1500
[34m[1mwandb[0m: 	ls_alpha: 0.23293336656227848
[34m[1mwandb[0m: 	n_unlabeled: 20000
[34m[1mwandb[0m: 	ssl_method: self_training
[34m[1mwandb[0m: 	st_max_iter: 10
[34m[1mwandb[0m: 	st_threshold: 0.7064230745116207
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.6747 (-2.02%), ROC-AUC: 0.7426


Run summary:
improvement_over_baseline_pct,-2.02253
n_labeled,3962.0
n_unlabeled_used,20000.0
val_accuracy,0.69806
val_f1,0.6747
val_precision,0.72678
val_recall,0.62959
val_roc_auc,0.74263


[34m[1mwandb[0m: Agent Starting Run: jepcgzve with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.8560002915187034
[34m[1mwandb[0m: 	ct_max_iterations: 20
[34m[1mwandb[0m: 	ct_samples_per_iter: 100
[34m[1mwandb[0m: 	lp_gamma: 0.10535531606188706
[34m[1mwandb[0m: 	lp_max_iter: 1500
[34m[1mwandb[0m: 	ls_alpha: 0.3906715759212206
[34m[1mwandb[0m: 	n_unlabeled: 20000
[34m[1mwandb[0m: 	ssl_method: self_training
[34m[1mwandb[0m: 	st_max_iter: 15
[34m[1mwandb[0m: 	st_threshold: 0.7299631408832638
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.6722 (-2.39%), ROC-AUC: 0.7441


Run summary:
improvement_over_baseline_pct,-2.38782
n_labeled,3962.0
n_unlabeled_used,20000.0
val_accuracy,0.69688
val_f1,0.67218
val_precision,0.72727
val_recall,0.62485
val_roc_auc,0.74407


[34m[1mwandb[0m: Agent Starting Run: qqigz1wz with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.8573490918971223
[34m[1mwandb[0m: 	ct_max_iterations: 15
[34m[1mwandb[0m: 	ct_samples_per_iter: 100
[34m[1mwandb[0m: 	lp_gamma: 0.04902227181614376
[34m[1mwandb[0m: 	lp_max_iter: 1500
[34m[1mwandb[0m: 	ls_alpha: 0.2587942168391442
[34m[1mwandb[0m: 	n_unlabeled: 20000
[34m[1mwandb[0m: 	ssl_method: self_training
[34m[1mwandb[0m: 	st_max_iter: 15
[34m[1mwandb[0m: 	st_threshold: 0.7500175066426087
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.6646 (-3.49%), ROC-AUC: 0.7445


Run summary:
improvement_over_baseline_pct,-3.49413
n_labeled,3962.0
n_unlabeled_used,20000.0
val_accuracy,0.68687
val_f1,0.66456
val_precision,0.7112
val_recall,0.62367
val_roc_auc,0.74451


[34m[1mwandb[0m: Agent Starting Run: shel2mrk with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.8710751820265774
[34m[1mwandb[0m: 	ct_max_iterations: 15
[34m[1mwandb[0m: 	ct_samples_per_iter: 50
[34m[1mwandb[0m: 	lp_gamma: 0.3401919580510973
[34m[1mwandb[0m: 	lp_max_iter: 1500
[34m[1mwandb[0m: 	ls_alpha: 0.5624111707092008
[34m[1mwandb[0m: 	n_unlabeled: 20000
[34m[1mwandb[0m: 	ssl_method: self_training
[34m[1mwandb[0m: 	st_max_iter: 10
[34m[1mwandb[0m: 	st_threshold: 0.8915256888151457
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.6601 (-4.14%), ROC-AUC: 0.7586


Run summary:
improvement_over_baseline_pct,-4.14367
n_labeled,3962.0
n_unlabeled_used,20000.0
val_accuracy,0.6957
val_f1,0.66009
val_precision,0.7426
val_recall,0.59408
val_roc_auc,0.75859


[34m[1mwandb[0m: Agent Starting Run: d87un4lo with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.8458512571527396
[34m[1mwandb[0m: 	ct_max_iterations: 20
[34m[1mwandb[0m: 	ct_samples_per_iter: 100
[34m[1mwandb[0m: 	lp_gamma: 0.4262342865184976
[34m[1mwandb[0m: 	lp_max_iter: 1500
[34m[1mwandb[0m: 	ls_alpha: 0.32822383000156974
[34m[1mwandb[0m: 	n_unlabeled: 10000
[34m[1mwandb[0m: 	ssl_method: self_training
[34m[1mwandb[0m: 	st_max_iter: 15
[34m[1mwandb[0m: 	st_threshold: 0.7727173011573862
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.6747 (-2.02%), ROC-AUC: 0.7506


Run summary:
improvement_over_baseline_pct,-2.02475
n_labeled,3962.0
n_unlabeled_used,10000.0
val_accuracy,0.69747
val_f1,0.67468
val_precision,0.72517
val_recall,0.63077
val_roc_auc,0.75063


[34m[1mwandb[0m: Agent Starting Run: wmvxgd02 with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.7972971401467215
[34m[1mwandb[0m: 	ct_max_iterations: 20
[34m[1mwandb[0m: 	ct_samples_per_iter: 100
[34m[1mwandb[0m: 	lp_gamma: 0.29547100285516414
[34m[1mwandb[0m: 	lp_max_iter: 1000
[34m[1mwandb[0m: 	ls_alpha: 0.23480920505059963
[34m[1mwandb[0m: 	n_unlabeled: 15000
[34m[1mwandb[0m: 	ssl_method: self_training
[34m[1mwandb[0m: 	st_max_iter: 15
[34m[1mwandb[0m: 	st_threshold: 0.7914694577065885
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.6829 (-0.83%), ROC-AUC: 0.7526


Run summary:
improvement_over_baseline_pct,-0.82768
n_labeled,3962.0
n_unlabeled_used,15000.0
val_accuracy,0.70159
val_f1,0.68293
val_precision,0.72414
val_recall,0.64615
val_roc_auc,0.75262


[34m[1mwandb[0m: Sweep Agent: Waiting for job.
[34m[1mwandb[0m: Job received.
[34m[1mwandb[0m: Agent Starting Run: tmm9ffcz with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.8667340606607012
[34m[1mwandb[0m: 	ct_max_iterations: 20
[34m[1mwandb[0m: 	ct_samples_per_iter: 100
[34m[1mwandb[0m: 	lp_gamma: 0.2934471377873774
[34m[1mwandb[0m: 	lp_max_iter: 1500
[34m[1mwandb[0m: 	ls_alpha: 0.5272723292162519
[34m[1mwandb[0m: 	n_unlabeled: 20000
[34m[1mwandb[0m: 	ssl_method: label_propagation
[34m[1mwandb[0m: 	st_max_iter: 10
[34m[1mwandb[0m: 	st_threshold: 0.7873359676088103
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.6785 (-1.47%), ROC-AUC: 0.7223


Run summary:
improvement_over_baseline_pct,-1.47203
n_labeled,3962.0
n_unlabeled_used,20000.0
val_accuracy,0.66922
val_f1,0.67849
val_precision,0.6567
val_recall,0.70178
val_roc_auc,0.72233


[34m[1mwandb[0m: Sweep Agent: Waiting for job.
[34m[1mwandb[0m: Job received.
[34m[1mwandb[0m: Agent Starting Run: uvs7iytm with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.77394683816463
[34m[1mwandb[0m: 	ct_max_iterations: 20
[34m[1mwandb[0m: 	ct_samples_per_iter: 50
[34m[1mwandb[0m: 	lp_gamma: 0.06441470030405215
[34m[1mwandb[0m: 	lp_max_iter: 1500
[34m[1mwandb[0m: 	ls_alpha: 0.13031325093311277
[34m[1mwandb[0m: 	n_unlabeled: 10000
[34m[1mwandb[0m: 	ssl_method: co_training
[34m[1mwandb[0m: 	st_max_iter: 15
[34m[1mwandb[0m: 	st_threshold: 0.7117936192496462
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.6463 (-6.15%), ROC-AUC: 0.7514


Run summary:
improvement_over_baseline_pct,-6.15318
n_labeled,3962.0
n_unlabeled_used,10000.0
val_accuracy,0.6804
val_f1,0.64625
val_precision,0.71884
val_recall,0.58698
val_roc_auc,0.75139


[34m[1mwandb[0m: Agent Starting Run: nulln96q with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.828850782741102
[34m[1mwandb[0m: 	ct_max_iterations: 20
[34m[1mwandb[0m: 	ct_samples_per_iter: 100
[34m[1mwandb[0m: 	lp_gamma: 0.3415355005222878
[34m[1mwandb[0m: 	lp_max_iter: 1500
[34m[1mwandb[0m: 	ls_alpha: 0.19163256788403535
[34m[1mwandb[0m: 	n_unlabeled: 15000
[34m[1mwandb[0m: 	ssl_method: self_training
[34m[1mwandb[0m: 	st_max_iter: 15
[34m[1mwandb[0m: 	st_threshold: 0.7075385747365832
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.6688 (-2.88%), ROC-AUC: 0.7450


Run summary:
improvement_over_baseline_pct,-2.88179
n_labeled,3962.0
n_unlabeled_used,15000.0
val_accuracy,0.69276
val_f1,0.66878
val_precision,0.72093
val_recall,0.62367
val_roc_auc,0.74503


[34m[1mwandb[0m: Agent Starting Run: 6fclyrzd with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.8313691231601882
[34m[1mwandb[0m: 	ct_max_iterations: 20
[34m[1mwandb[0m: 	ct_samples_per_iter: 100
[34m[1mwandb[0m: 	lp_gamma: 0.2710630705815631
[34m[1mwandb[0m: 	lp_max_iter: 500
[34m[1mwandb[0m: 	ls_alpha: 0.16558796132408188
[34m[1mwandb[0m: 	n_unlabeled: 15000
[34m[1mwandb[0m: 	ssl_method: self_training
[34m[1mwandb[0m: 	st_max_iter: 15
[34m[1mwandb[0m: 	st_threshold: 0.8054801683930426
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.6705 (-2.64%), ROC-AUC: 0.7505


Run summary:
improvement_over_baseline_pct,-2.63782
n_labeled,3962.0
n_unlabeled_used,15000.0
val_accuracy,0.69335
val_f1,0.67046
val_precision,0.72011
val_recall,0.62722
val_roc_auc,0.75046


[34m[1mwandb[0m: Sweep Agent: Waiting for job.
[34m[1mwandb[0m: Job received.
[34m[1mwandb[0m: Agent Starting Run: vi66cysg with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.9181157215599122
[34m[1mwandb[0m: 	ct_max_iterations: 20
[34m[1mwandb[0m: 	ct_samples_per_iter: 50
[34m[1mwandb[0m: 	lp_gamma: 0.472305096308031
[34m[1mwandb[0m: 	lp_max_iter: 1500
[34m[1mwandb[0m: 	ls_alpha: 0.5029748118949174
[34m[1mwandb[0m: 	n_unlabeled: 15000
[34m[1mwandb[0m: 	ssl_method: label_propagation
[34m[1mwandb[0m: 	st_max_iter: 10
[34m[1mwandb[0m: 	st_threshold: 0.8020088566140348
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.6679 (-3.01%), ROC-AUC: 0.7197


Run summary:
improvement_over_baseline_pct,-3.01227
n_labeled,3962.0
n_unlabeled_used,15000.0
val_accuracy,0.67863
val_f1,0.66788
val_precision,0.68711
val_recall,0.6497
val_roc_auc,0.71966


[34m[1mwandb[0m: Agent Starting Run: cq6w1jkj with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.7673738720821724
[34m[1mwandb[0m: 	ct_max_iterations: 20
[34m[1mwandb[0m: 	ct_samples_per_iter: 50
[34m[1mwandb[0m: 	lp_gamma: 0.2989233178820782
[34m[1mwandb[0m: 	lp_max_iter: 1000
[34m[1mwandb[0m: 	ls_alpha: 0.393889046194387
[34m[1mwandb[0m: 	n_unlabeled: 20000
[34m[1mwandb[0m: 	ssl_method: label_propagation
[34m[1mwandb[0m: 	st_max_iter: 15
[34m[1mwandb[0m: 	st_threshold: 0.7014454999534998
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.6785 (-1.47%), ROC-AUC: 0.7221


Run summary:
improvement_over_baseline_pct,-1.46908
n_labeled,3962.0
n_unlabeled_used,20000.0
val_accuracy,0.66981
val_f1,0.67851
val_precision,0.65778
val_recall,0.70059
val_roc_auc,0.72214


[34m[1mwandb[0m: Agent Starting Run: px55zn7z with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.8524004897710115
[34m[1mwandb[0m: 	ct_max_iterations: 15
[34m[1mwandb[0m: 	ct_samples_per_iter: 50
[34m[1mwandb[0m: 	lp_gamma: 0.09175572402532454
[34m[1mwandb[0m: 	lp_max_iter: 1500
[34m[1mwandb[0m: 	ls_alpha: 0.13478867725644603
[34m[1mwandb[0m: 	n_unlabeled: 15000
[34m[1mwandb[0m: 	ssl_method: self_training
[34m[1mwandb[0m: 	st_max_iter: 15
[34m[1mwandb[0m: 	st_threshold: 0.7349859878657353
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.6809 (-1.12%), ROC-AUC: 0.7505


Run summary:
improvement_over_baseline_pct,-1.12136
n_labeled,3962.0
n_unlabeled_used,15000.0
val_accuracy,0.701
val_f1,0.6809
val_precision,0.72557
val_recall,0.64142
val_roc_auc,0.75046


[34m[1mwandb[0m: Sweep Agent: Waiting for job.
[34m[1mwandb[0m: Job received.
[34m[1mwandb[0m: Agent Starting Run: 2n9vw38z with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.8298716718312295
[34m[1mwandb[0m: 	ct_max_iterations: 20
[34m[1mwandb[0m: 	ct_samples_per_iter: 50
[34m[1mwandb[0m: 	lp_gamma: 0.4314198064075579
[34m[1mwandb[0m: 	lp_max_iter: 1000
[34m[1mwandb[0m: 	ls_alpha: 0.4217360324821381
[34m[1mwandb[0m: 	n_unlabeled: 20000
[34m[1mwandb[0m: 	ssl_method: self_training
[34m[1mwandb[0m: 	st_max_iter: 15
[34m[1mwandb[0m: 	st_threshold: 0.7919288409818835
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.6658 (+-3.31%), ROC-AUC: 0.7459


Run summary:
improvement_over_baseline_pct,-3.31147
n_labeled,3962.0
n_unlabeled_used,20000.0
val_accuracy,0.68923
val_f1,0.66582
val_precision,0.71565
val_recall,0.62249
val_roc_auc,0.74589


[34m[1mwandb[0m: Agent Starting Run: 2qp2y5dj with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.9436009306282142
[34m[1mwandb[0m: 	ct_max_iterations: 15
[34m[1mwandb[0m: 	ct_samples_per_iter: 100
[34m[1mwandb[0m: 	lp_gamma: 0.10103817293547722
[34m[1mwandb[0m: 	lp_max_iter: 1500
[34m[1mwandb[0m: 	ls_alpha: 0.4421726843491523
[34m[1mwandb[0m: 	n_unlabeled: 20000
[34m[1mwandb[0m: 	ssl_method: self_training
[34m[1mwandb[0m: 	st_max_iter: 5
[34m[1mwandb[0m: 	st_threshold: 0.8040287068106626
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.6671 (+-3.13%), ROC-AUC: 0.7467


Run summary:
improvement_over_baseline_pct,-3.12707
n_labeled,3962.0
n_unlabeled_used,20000.0
val_accuracy,0.69335
val_f1,0.66709
val_precision,0.725
val_recall,0.61775
val_roc_auc,0.74675


[34m[1mwandb[0m: Sweep Agent: Waiting for job.
[34m[1mwandb[0m: Job received.
[34m[1mwandb[0m: Agent Starting Run: 96kxcx7n with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.8297232189405546
[34m[1mwandb[0m: 	ct_max_iterations: 10
[34m[1mwandb[0m: 	ct_samples_per_iter: 50
[34m[1mwandb[0m: 	lp_gamma: 0.33338773905159497
[34m[1mwandb[0m: 	lp_max_iter: 1500
[34m[1mwandb[0m: 	ls_alpha: 0.3932021600061191
[34m[1mwandb[0m: 	n_unlabeled: 15000
[34m[1mwandb[0m: 	ssl_method: co_training
[34m[1mwandb[0m: 	st_max_iter: 15
[34m[1mwandb[0m: 	st_threshold: 0.7581465947762465
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.6616 (+-3.93%), ROC-AUC: 0.7605


Run summary:
improvement_over_baseline_pct,-3.92936
n_labeled,3962.0
n_unlabeled_used,15000.0
val_accuracy,0.68746
val_f1,0.66157
val_precision,0.71685
val_recall,0.6142
val_roc_auc,0.76048


[34m[1mwandb[0m: Agent Starting Run: 6i34pm8d with config:
[34m[1mwandb[0m: 	ct_confidence_threshold: 0.9001896118385664
[34m[1mwandb[0m: 	ct_max_iterations: 15
[34m[1mwandb[0m: 	ct_samples_per_iter: 100
[34m[1mwandb[0m: 	lp_gamma: 0.3904950523624947
[34m[1mwandb[0m: 	lp_max_iter: 1500
[34m[1mwandb[0m: 	ls_alpha: 0.14289985129762145
[34m[1mwandb[0m: 	n_unlabeled: 20000
[34m[1mwandb[0m: 	ssl_method: self_training
[34m[1mwandb[0m: 	st_max_iter: 5
[34m[1mwandb[0m: 	st_threshold: 0.7713261878158286
[34m[1mwandb[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/lokmane/.netrc.


F1: 0.6756 (+-1.90%), ROC-AUC: 0.7471


Run summary:
improvement_over_baseline_pct,-1.89811
n_labeled,3962.0
n_unlabeled_used,20000.0
val_accuracy,0.69923
val_f1,0.67556
val_precision,0.72877
val_recall,0.62959
val_roc_auc,0.74713


✓ SSL sweep completed!
Error in callback <bound method _WandbInit._post_run_cell_hook of <wandb.sdk.wandb_init._WandbInit object at 0x7d22ec3bc0d0>> (for post_run_cell), with arguments args (<ExecutionResult object at 7d22fd1c7ed0, execution_count=18 error_before_exec=None error_in_exec=None info=<ExecutionInfo object at 7d22fd120a50, raw_cell="ssl_sweep_id = wandb.sweep(ssl_sweep_config, proje.." store_history=True silent=False shell_futures=True cell_id=vscode-notebook-cell:/home/lokmane/Desktop/mini_projet/notebooks/tox21/tox21_mlops_life_cycle.ipynb#X26sZmlsZQ%3D%3D> result=None>,),kwargs {}:


AlreadyJoinedError: 

In [19]:
import pandas as pd

ssl_sweep = api.sweep(f"{entity}/{PROJECT}/{ssl_sweep_id}")
ssl_runs = list(ssl_sweep.runs)

results = []
for run in ssl_runs:
    if run.state == 'finished':
        results.append({
            'method': run.config.get('ssl_method', 'unknown'),
            'n_unlabeled': run.config.get('n_unlabeled', 0),
            'f1_score': run.summary.get('val_f1', 0),
            'roc_auc': run.summary.get('val_roc_auc', 0),
            'accuracy': run.summary.get('val_accuracy', 0),
            'precision': run.summary.get('val_precision', 0),
            'recall': run.summary.get('val_recall', 0),
            'improvement_pct': run.summary.get('improvement_over_baseline_pct', 0),
            'run_name': run.name,
            'run_id': run.id
        })

results_df = pd.DataFrame(results)

print(f"\n{'='*100}")
print("SSL RESULTS SUMMARY (Top 15 by F1-Score)")
print(f"{'='*100}")
if len(results_df) > 0:
    print(results_df.sort_values('f1_score', ascending=False).head(15).to_string(index=False))
else:
    print("No finished runs found.")

if len(results_df) > 0:
    best_ssl_overall = results_df.sort_values('f1_score', ascending=False).iloc[0]
    print(f"\n{'='*100}")
    print("🏆 BEST SSL MODEL OVERALL")
    print(f"{'='*100}")
    print(f"Method: {best_ssl_overall['method']}")
    print(f"Unlabeled samples: {best_ssl_overall['n_unlabeled']:,.0f}")
    print(f"F1-Score: {best_ssl_overall['f1_score']:.4f}")
    print(f"ROC-AUC: {best_ssl_overall['roc_auc']:.4f}")
    print(f"Improvement: {best_ssl_overall['improvement_pct']:+.2f}%")


SSL RESULTS SUMMARY (Top 15 by F1-Score)
           method  n_unlabeled  f1_score  roc_auc  accuracy  precision   recall  improvement_pct           run_name   run_id
label_propagation        20000  0.690995 0.702938  0.616245   0.576285 0.862722         0.343987    smooth-sweep-29 qorygt1s
    self_training        15000  0.682927 0.752616  0.701589   0.724138 0.646154        -0.827683     peach-sweep-39 wmvxgd02
label_propagation        20000  0.682547 0.708477  0.615656   0.579208 0.830769        -0.882783     solar-sweep-22 7cit7ash
    self_training         5000  0.682249 0.755865  0.703943   0.731707 0.639053        -0.926131      lucky-sweep-7 m0on6hyr
label_propagation        20000  0.681886 0.708520  0.622719   0.587179 0.813018        -0.978850     quiet-sweep-27 js5ukvad
    self_training        15000  0.680905 0.750459  0.701001   0.725569 0.641420        -1.121356   restful-sweep-46 px55zn7z
      co_training        15000  0.680851 0.759887  0.699823   0.722444 0.643787    

In [20]:
print(f"\n{'='*100}")
print("🥇 BEST MODEL PER SSL METHOD")
print(f"{'='*100}")

method_best_models = {}

for method in results_df['method'].unique():
    method_df = results_df[results_df['method'] == method].sort_values('f1_score', ascending=False)
    if len(method_df) > 0:
        method_best = method_df.iloc[0]
        method_best_models[method] = method_best
        
        print(f"\n{method.upper().replace('_', ' ')}:")
        print(f"  Unlabeled samples: {method_best['n_unlabeled']:,.0f}")
        print(f"  F1-Score: {method_best['f1_score']:.4f}")
        print(f"  ROC-AUC: {method_best['roc_auc']:.4f}")
        print(f"  Accuracy: {method_best['accuracy']:.4f}")
        print(f"  Improvement: {method_best['improvement_pct']:+.2f}%")
        print(f"  Run: {method_best['run_name']}")


🥇 BEST MODEL PER SSL METHOD

SELF TRAINING:
  Unlabeled samples: 15,000
  F1-Score: 0.6829
  ROC-AUC: 0.7526
  Accuracy: 0.7016
  Improvement: -0.83%
  Run: peach-sweep-39

LABEL PROPAGATION:
  Unlabeled samples: 20,000
  F1-Score: 0.6910
  ROC-AUC: 0.7029
  Accuracy: 0.6162
  Improvement: +0.34%
  Run: smooth-sweep-29

CO TRAINING:
  Unlabeled samples: 15,000
  F1-Score: 0.6809
  ROC-AUC: 0.7599
  Accuracy: 0.6998
  Improvement: -1.13%
  Run: soft-sweep-28
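
The per-method results above disagree depending on the metric: label propagation has the highest F1, while co-training has the highest ROC-AUC. The following is a minimal sketch, using only the three per-method values printed above, of how the winner changes with the selection metric. The `pick_best` helper is hypothetical and not part of the notebook.

```python
# Per-method best runs, copied from the printed output above.
method_best = [
    {"method": "self_training", "run": "peach-sweep-39",
     "f1": 0.6829, "roc_auc": 0.7526, "accuracy": 0.7016},
    {"method": "label_propagation", "run": "smooth-sweep-29",
     "f1": 0.6910, "roc_auc": 0.7029, "accuracy": 0.6162},
    {"method": "co_training", "run": "soft-sweep-28",
     "f1": 0.6809, "roc_auc": 0.7599, "accuracy": 0.6998},
]


def pick_best(rows, metric):
    """Return the row with the highest value of `metric` (hypothetical helper)."""
    return max(rows, key=lambda row: row[metric])


for metric in ("f1", "roc_auc", "accuracy"):
    best = pick_best(method_best, metric)
    print(f"best by {metric}: {best['method']} ({best[metric]:.4f})")
```

Ranking by F1 alone selects label propagation even though its accuracy and ROC-AUC are the lowest of the three, so the choice of selection metric is worth making explicitly.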
In [21]:
if len(results_df) > 0:
    run = wandb.init(project=PROJECT, job_type="register-best-ssl-model")
    
    best_ssl = results_df.sort_values('f1_score', ascending=False).iloc[0]
    best_ssl_run_obj = api.run(f"{entity}/{PROJECT}/{best_ssl['run_id']}")
    
    print(f"\n{'='*100}")
    print(f"Registering BEST SSL Model: {best_ssl['method']}")
    print(f"{'='*100}")
    
    # Retrain with best config
    n_unlabeled = int(best_ssl['n_unlabeled'])
    X_unlabeled = X_unlabeled_full[:n_unlabeled]
    
    method = best_ssl['method']
    
    if method == 'label_propagation':
        X_combined = np.vstack([X_train.values, X_unlabeled])
        y_combined = np.concatenate([y_train.values, np.full(n_unlabeled, -1)])
        
        model = LabelPropagation(
            kernel='rbf',
            gamma=best_ssl_run_obj.config.get('lp_gamma', 0.1),
            max_iter=best_ssl_run_obj.config.get('lp_max_iter', 1000),
            n_jobs=-1
        )
        model.fit(X_combined, y_combined)
    
    elif method == 'label_spreading':
        X_combined = np.vstack([X_train.values, X_unlabeled])
        y_combined = np.concatenate([y_train.values, np.full(n_unlabeled, -1)])
        
        model = LabelSpreading(
            kernel='rbf',
            gamma=best_ssl_run_obj.config.get('lp_gamma', 0.1),
            alpha=best_ssl_run_obj.config.get('ls_alpha', 0.2),
            max_iter=best_ssl_run_obj.config.get('lp_max_iter', 1000),
            n_jobs=-1
        )
        model.fit(X_combined, y_combined)
    
    elif method == 'self_training':
        X_combined = np.vstack([X_train.values, X_unlabeled])
        y_combined = np.concatenate([y_train.values, np.full(n_unlabeled, -1)])
        
        base_clf = RandomForestClassifier(
            n_estimators=best_baseline_config['n_estimators'],
            max_depth=best_baseline_config['max_depth'],
            min_samples_split=best_baseline_config['min_samples_split'],
            min_samples_leaf=best_baseline_config['min_samples_leaf'],
            max_features=best_baseline_config['max_features'],
            random_state=42,
            n_jobs=-1,
            class_weight='balanced'
        )
        model = SelfTrainingClassifier(
            base_estimator=base_clf,
            threshold=best_ssl_run_obj.config.get('st_threshold', 0.75),
            max_iter=best_ssl_run_obj.config.get('st_max_iter', 10)
        )
        model.fit(X_combined, y_combined)
    
    elif method == 'co_training':
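        # Save co-training as a two-view ensemble: each classifier sees one half
        # of the feature columns. Both views are fit on the labeled data only
        # here; X_unlabeled is not pseudo-labeled in this retrain.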
        n_features = X_train.shape[1]
        mid = n_features // 2
        
        clf1 = RandomForestClassifier(**best_baseline_config, random_state=42, n_jobs=-1, class_weight='balanced')
        clf2 = RandomForestClassifier(**best_baseline_config, random_state=43, n_jobs=-1, class_weight='balanced')
        
        clf1.fit(X_train.values[:, :mid], y_train.values)
        clf2.fit(X_train.values[:, mid:], y_train.values)
        
        model = {'clf1': clf1, 'clf2': clf2, 'mid': mid, 'type': 'co_training'}
    
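    # Save model (create the output directory first in case it does not exist)
    import os
    os.makedirs('models', exist_ok=True)
    model_filename = 'models/best_ssl_model.pkl'
    joblib.dump(model, model_filename)
    joblib.dump(scaler, 'models/scaler.pkl')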
    
    # Create artifact
    artifact = wandb.Artifact(
        name='tox21-best-ssl-model',
        type='model',
        description=f'Best SSL model ({method}) for Tox21 toxicity prediction',
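        metadata={
            # The metrics below are the values logged by the winning sweep run,
            # not a re-evaluation of the model retrained in this cell.
            'ssl_method': method,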
            'n_unlabeled': n_unlabeled,
            'n_labeled': len(X_train),
            'f1_score': float(best_ssl['f1_score']),
            'roc_auc': float(best_ssl['roc_auc']),
            'accuracy': float(best_ssl['accuracy']),
            'improvement_pct': float(best_ssl['improvement_pct'])
        }
    )
    
    artifact.add_file(model_filename)
    artifact.add_file('models/scaler.pkl')
    run.log_artifact(artifact)
    run.finish()
    
    print(f"✓ Best SSL model ({method}) registered!")


Registering BEST SSL Model: label_propagation


✓ Best SSL model (label_propagation) registered!

## Phase 5: Production Monitoring & Maintenance

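This phase simulates a **production environment** in which the model serves toxicity predictions in real time. Every metric is synthetic (drawn from `random.uniform`; no model is called), so the goal is to exercise the logging and alerting path that would detect model degradation, not to measure real performance.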

### Monitored Metrics:
- **Prediction Latency**: Response time per request (target < 40ms)
- **Prediction Confidence**: Model certainty in predictions
- **Data Drift**: Distribution shift in input molecules
- **Endpoint Predictions**: Toxicity scores for four assays: NR-AR and NR-ER (nuclear receptor), SR-ARE and SR-p53 (stress response)
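
The monitoring cell below draws `data_drift_score` from `random.uniform`, so it is only a placeholder. As a rough sketch of how a real score could be computed for a single descriptor, the following uses the Population Stability Index (PSI). PSI is an assumption here, not the notebook's method, and the function name, bin count, and sample values are illustrative.

```python
import math


def population_stability_index(reference, current, n_bins=10, eps=1e-6):
    """PSI between two 1-D samples, binned on the reference sample's range."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for constant data

    def bin_fractions(values):
        counts = [0] * n_bins
        for v in values:
            # Clamp out-of-range values into the first or last bin.
            idx = min(max(int((v - lo) / width), 0), n_bins - 1)
            counts[idx] += 1
        # Floor empty bins at eps so the logarithm below stays defined.
        return [max(c / len(values), eps) for c in counts]

    ref_frac = bin_fractions(reference)
    cur_frac = bin_fractions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_frac, cur_frac))


# Illustrative values only: a training-time descriptor sample and a shifted copy.
train_values = [0.1 * i for i in range(100)]
shifted_values = [v + 3.0 for v in train_values]

print(population_stability_index(train_values, list(train_values)))  # 0.0 for identical samples
print(population_stability_index(train_values, shifted_values))      # grows with the shift
```

In a real deployment the reference sample would come from the training descriptors and the current sample from recent requests, with one score per descriptor.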

### Monitoring Strategy:
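- Log every metric to W&B as each request completes
- Flag threshold violations (latency above 40 ms, confidence below 0.5) with a console message
- Track 100 simulated prediction requests, spaced 0.5 s apart
- Use the logged latency series to spot slow requests

This phase produces the monitoring data that the retraining check in Phase 6 consumes.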
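
Printing one line per slow request gets noisy: the run in this notebook flags at least 23 of its 100 requests. A minimal sketch of summarizing a window of latencies instead is shown here. `summarize_latency` and the sample values are hypothetical, and the 40 ms target is taken from the metric list above.

```python
import statistics


def summarize_latency(latencies_ms, target_ms=40.0):
    """Summarize a window of request latencies against a target (illustrative)."""
    slow = [t for t in latencies_ms if t > target_ms]
    # With n=20, statistics.quantiles returns 19 cut points; the last one
    # is the 95th percentile.
    p95 = statistics.quantiles(latencies_ms, n=20)[-1]
    return {
        "mean_ms": statistics.fmean(latencies_ms),
        "p95_ms": p95,
        "slow_fraction": len(slow) / len(latencies_ms),
    }


# Made-up latencies in milliseconds for one monitoring window.
window = [12.0, 18.5, 25.0, 31.2, 46.1, 22.4, 45.6, 17.4, 38.9, 41.3]
summary = summarize_latency(window)
print(summary)
```

A single alert on `slow_fraction` or `p95_ms` per window is usually easier to act on than one message per request.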

In [22]:
import time
import random
from datetime import datetime

wandb.init(
    project=PROJECT,
    job_type="monitor-production",
    name=f"production-monitoring-{datetime.now().strftime('%Y%m%d-%H%M%S')}",
    tags=["production", "monitoring", "tox21"]
)

NUM_REQUESTS = 100
ALERT_THRESHOLD_MS = 40
CONFIDENCE_THRESHOLD = 0.5

for i in range(NUM_REQUESTS):
    prediction_time_ms = random.uniform(10, 50)
    prediction_confidence = random.uniform(0.7, 0.99)
    
    predictions = {
        "NR-AR": random.uniform(0, 1),
        "NR-ER": random.uniform(0, 1),
        "SR-ARE": random.uniform(0, 1),
        "SR-p53": random.uniform(0, 1),
    }
    
    avg_prediction = np.mean(list(predictions.values()))
    
    data_drift_score = random.uniform(0, 0.3)
    
    wandb.log({
        "prediction_time_ms": prediction_time_ms,
        "prediction_confidence": prediction_confidence,
        "avg_prediction_score": avg_prediction,
        
        "data_drift_score": data_drift_score,
        
        **{f"pred_{endpoint}": score for endpoint, score in predictions.items()},
        
        "request_id": i,
        "timestamp": time.time()
    })
    
    if prediction_time_ms > ALERT_THRESHOLD_MS:
        print(f"Request {i}: High latency detected ({prediction_time_ms:.2f}ms)")
    
    if prediction_confidence < CONFIDENCE_THRESHOLD:
        print(f"Request {i}: Low confidence ({prediction_confidence:.2f})")
    
    time.sleep(0.5)
    
wandb.finish()

Request 4: High latency detected (46.10ms)
Request 10: High latency detected (45.60ms)
Request 15: High latency detected (44.95ms)
Request 16: High latency detected (45.35ms)
Request 22: High latency detected (42.63ms)
Request 29: High latency detected (41.35ms)
Request 40: High latency detected (40.49ms)
Request 44: High latency detected (45.56ms)
Request 46: High latency detected (40.84ms)
Request 48: High latency detected (48.89ms)
Request 58: High latency detected (40.25ms)
Request 59: High latency detected (41.97ms)
Request 60: High latency detected (44.02ms)
Request 62: High latency detected (42.55ms)
Request 69: High latency detected (41.33ms)
Request 72: High latency detected (43.87ms)
Request 74: High latency detected (49.54ms)
Request 75: High latency detected (41.81ms)
Request 76: High latency detected (42.37ms)
Request 79: High latency detected (46.87ms)
Request 82: High latency detected (43.22ms)
Request 93: High latency detected (49.50ms)
Request 94: High latency detected

Run history (sparklines omitted): avg_prediction_score, data_drift_score, pred_NR-AR, pred_NR-ER, pred_SR-ARE, pred_SR-p53, prediction_confidence, prediction_time_ms, request_id, timestamp

Run summary:
avg_prediction_score,0.61047
data_drift_score,0.07386
pred_NR-AR,0.65759
pred_NR-ER,0.53328
pred_SR-ARE,0.67009
pred_SR-p53,0.5809
prediction_confidence,0.88109
prediction_time_ms,17.3988
request_id,99.0
timestamp,1769683484.78213



## Phase 6: Automated Retraining & Closing the Loop

This phase implements an **automated MLOps check** that reads the latest monitoring run from W&B and logs a retraining trigger when model health degrades.

### Core Functionality:
1. **Health Monitoring**: Query the W&B API for the most recent `monitor-production` run
2. **Degradation Detection**: Compare average confidence and average drift against thresholds
3. **Automatic Triggering**: Log a retraining trigger run when a threshold is crossed (launching the new sweep itself is left commented out)
4. **Team Alerts**: Send a notification with `wandb.alert` (Slack/Email integration)
5. **Full Traceability**: Record the trigger reason, metrics, and thresholds in the trigger run's config
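
The degradation check in steps 2 and 3 reduces to comparing two averages against thresholds. A minimal sketch of that rule as a pure function, which makes it easy to test in isolation, could look like this. `evaluate_health` is a hypothetical helper, and its default thresholds mirror the 0.70 confidence and 0.25 drift values used in the cell below.

```python
def evaluate_health(avg_confidence, avg_drift,
                    confidence_threshold=0.70, drift_threshold=0.25):
    """Return (needs_retraining, reasons) for a window of monitoring metrics."""
    reasons = []
    if avg_confidence < confidence_threshold:
        reasons.append(
            f"Low confidence: {avg_confidence:.4f} < {confidence_threshold}"
        )
    if avg_drift > drift_threshold:
        reasons.append(f"High drift: {avg_drift:.4f} > {drift_threshold}")
    return bool(reasons), reasons


# Averages reported by the monitoring run in this notebook: healthy.
print(evaluate_health(0.8467, 0.1558))  # (False, [])

# Hypothetical degraded window: both checks fire.
needs_retraining, reasons = evaluate_health(0.62, 0.31)
print(needs_retraining, reasons)
```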

### Why This Matters:
- **Self-Healing System**: With the sweep launch enabled, no manual intervention is required
- **Continuous Improvement**: Models adapt to new data patterns
- **Proactive vs Reactive**: Catch issues before they impact users
- **Production-Ready**: The check can be scheduled with cron, Airflow, Lambda, or GitHub Actions

This closes the MLOps loop: the Tox21 toxicity prediction model is checked against fresh monitoring data and flagged for retraining when it degrades.
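
The cell below reports a lookback window (`MONITORING_LOOKBACK_DAYS`) but does not filter runs by date. The following is a sketch of such a filter, assuming each run exposes `created_at` as an ISO 8601 UTC string like the `2026-01-29T10:37:38Z` shown in the output. `within_lookback` and the `FakeRun` stand-in are hypothetical, and other W&B versions may return a different type.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class FakeRun:
    """Stand-in for a W&B run object; only the fields used here."""
    name: str
    created_at: str


def within_lookback(runs, lookback_days, now=None):
    """Keep runs whose created_at falls within the last `lookback_days` days."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=lookback_days)
    kept = []
    for run in runs:
        # Normalize a trailing "Z" so fromisoformat also works before Python 3.11.
        created = datetime.fromisoformat(run.created_at.replace("Z", "+00:00"))
        if created.tzinfo is None:
            created = created.replace(tzinfo=timezone.utc)  # assume UTC if naive
        if created >= cutoff:
            kept.append(run)
    return kept


runs = [
    FakeRun("production-monitoring-20260129-114353", "2026-01-29T10:37:38Z"),
    FakeRun("older-monitoring-run", "2026-01-10T08:00:00Z"),
]
reference_time = datetime(2026, 1, 30, tzinfo=timezone.utc)
recent = within_lookback(runs, lookback_days=7, now=reference_time)
print([run.name for run in recent])
```

Passing `now` explicitly keeps the function deterministic, which helps when the check is run from a scheduler and needs to be tested.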

In [23]:
import wandb
import numpy as np
from datetime import datetime, timedelta


print("="*80)
print("AUTOMATED RETRAINING PIPELINE")
print("="*80)

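# --- Configuration ---
PERFORMANCE_THRESHOLD_F1 = 0.75         # defined for reference; not checked below
CONFIDENCE_THRESHOLD = 0.70             # retrain if average confidence falls below this
DRIFT_THRESHOLD = 0.25                  # retrain if average drift rises above this
MONITORING_LOOKBACK_DAYS = 7            # reported in the log; no date filter is applied below
AUTO_RETRAIN_ENABLED = True             # set to False to report only, without logging a trigger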


# --- Step 1: Connect to the W&B API ---
print("\nConnecting to Weights & Biases API...")
api = wandb.Api()
entity = api.default_entity

print(f"Connected to entity: {entity}")
print(f"Project: {PROJECT}")

# --- Step 2: Fetch Latest Monitoring Data ---
print(f"\nFetching monitoring runs (last {MONITORING_LOOKBACK_DAYS} days)...")
try:
    # Get all project runs
    all_project_runs = api.runs(f"{entity}/{PROJECT}")
    
    # Filter for monitoring runs
    monitor_runs = [run for run in all_project_runs if run.job_type == "monitor-production"]
    
    if not monitor_runs:
        print("No monitoring runs found. Run the production monitoring script first.")
        monitor_runs = []
    else:
        print(f"✓ Found {len(monitor_runs)} monitoring runs")
        
except Exception as e:
    print(f"Error fetching runs from W&B API: {e}")
    monitor_runs = []

# --- Step 3: Analyze Performance Metrics ---
if monitor_runs:
    print("\nAnalyzing model performance...")
    
    # Get the latest monitoring run
    latest_monitor_run = max(monitor_runs, key=lambda r: r.created_at)
    
    print(f"   Latest run: {latest_monitor_run.name}")
    print(f"   Created at: {latest_monitor_run.created_at}")
    
    # Fetch metrics from the run history
    try:
        history = latest_monitor_run.history()
        
        if not history.empty:
            # Calculate aggregate metrics
            avg_confidence = history['prediction_confidence'].mean() if 'prediction_confidence' in history.columns else 0
            avg_drift = history['data_drift_score'].mean() if 'data_drift_score' in history.columns else 0
            avg_latency = history['prediction_time_ms'].mean() if 'prediction_time_ms' in history.columns else 0
            num_predictions = len(history)
            
            print(f"\nPerformance Metrics:")
            print(f"   Average Confidence: {avg_confidence:.4f}")
            print(f"   Average Drift Score: {avg_drift:.4f}")
            print(f"   Average Latency: {avg_latency:.2f}ms")
            print(f"   Total Predictions: {num_predictions}")
            
            # --- Step 4: Evaluate if Retraining is Needed ---
            print(f"\nEvaluating against thresholds...")
            print(f"   Confidence threshold: {CONFIDENCE_THRESHOLD}")
            print(f"   Drift threshold: {DRIFT_THRESHOLD}")
            
            needs_retraining = False
            retraining_reasons = []
            
            # Check confidence threshold
            if avg_confidence < CONFIDENCE_THRESHOLD:
                needs_retraining = True
                retraining_reasons.append(f"Low confidence: {avg_confidence:.4f} < {CONFIDENCE_THRESHOLD}")
                print(f"Low confidence detected")
            else:
                print(f"Confidence is healthy")
            
            # Check drift threshold
            if avg_drift > DRIFT_THRESHOLD:
                needs_retraining = True
                retraining_reasons.append(f"High drift: {avg_drift:.4f} > {DRIFT_THRESHOLD}")
                print(f"High data drift detected")
            else:
                print(f"Drift is within acceptable range")
            
            # --- Step 5: Trigger Retraining if Needed ---
            if needs_retraining and AUTO_RETRAIN_ENABLED:
                print(f"\nPERFORMANCE DEGRADATION DETECTED!")
                print(f"   Reasons:")
                for reason in retraining_reasons:
                    print(f"      • {reason}")
                
                print(f"\nTRIGGERING AUTOMATED RETRAINING...")
                
                # Create a retraining trigger run
                alert_run = wandb.init(
                    project=PROJECT,
                    job_type="automated-retraining-trigger",
                    name=f"retrain-trigger-{datetime.now().strftime('%Y%m%d-%H%M%S')}",
                    tags=["retraining", "automated", "triggered"],
                    config={
                        'trigger_reason': ', '.join(retraining_reasons),
                        'avg_confidence': avg_confidence,
                        'avg_drift': avg_drift,
                        'avg_latency': avg_latency,
                        'threshold_confidence': CONFIDENCE_THRESHOLD,
                        'threshold_drift': DRIFT_THRESHOLD,
                        'monitoring_run': latest_monitor_run.name
                    }
                )
                
                # Log the trigger event
                wandb.log({
                    'retraining_triggered': 1,
                    'avg_confidence': avg_confidence,
                    'avg_drift': avg_drift,
                    'timestamp': datetime.now().timestamp()
                })
                
                # Send alert to team
                wandb.alert(
                    title="Automated Retraining Triggered",
                    text=f"""Model performance has degraded and requires retraining.
                    
**Performance Issues:**
{chr(10).join(['• ' + r for r in retraining_reasons])}

**Metrics:**
• Average Confidence: {avg_confidence:.4f}
• Average Drift: {avg_drift:.4f}
• Average Latency: {avg_latency:.2f}ms

**Action:** A new hyperparameter sweep should be launched to find a better model.

**Monitoring Run:** {latest_monitor_run.name}
**Trigger Time:** {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}
""",
                    level=wandb.AlertLevel.WARN
                )
                
                print(f"\nRetraining trigger logged to W&B")
                print(f"   Run URL: {alert_run.url}")
                print(f"   Alert sent to team")
                
                # --- Option to Launch New Sweep Automatically ---
                print(f"\nNext Steps:")
                print(f"   1. A new hyperparameter sweep should be launched")
                print(f"   2. You can automate this by uncommenting the code below")
                
                # UNCOMMENT TO ENABLE AUTOMATIC SWEEP LAUNCH
                """
                print(f"\nLaunching new SSL sweep...")
                
                # Use the SSL sweep config from earlier
                new_sweep_id = wandb.sweep(ssl_sweep_config, project=PROJECT)
                print(f"   ✓ New sweep created: {new_sweep_id}")
                
                # Optionally run some agents automatically
                # wandb.agent(new_sweep_id, train_ssl, count=10)
                """
                
                alert_run.finish()
                
            elif needs_retraining and not AUTO_RETRAIN_ENABLED:
                print(f"\nRETRAINING RECOMMENDED BUT AUTO-RETRAIN IS DISABLED")
                print(f"   Reasons:")
                for reason in retraining_reasons:
                    print(f"      • {reason}")
                print(f"\n   Enable automatic retraining by setting AUTO_RETRAIN_ENABLED = True")
                
            else:
                print(f"\nMODEL PERFORMANCE IS HEALTHY")
                print(f"   No retraining needed at this time")
                
        else:
            print("No metrics found in monitoring run history")
            
    except Exception as e:
        print(f"Error analyzing metrics: {e}")
        
else:
    print("\nSkipping: No monitoring data available")
    print("   Run the production monitoring script first (Phase 5)")

print("Automated retraining pipeline completed")

AUTOMATED RETRAINING PIPELINE

Connecting to Weights & Biases API...
Connected to entity: l-benhammadi-esi
Project: QSAR_MLOPS_TOX21

Fetching monitoring runs (last 7 days)...
✓ Found 1 monitoring runs

Analyzing model performance...
   Latest run: production-monitoring-20260129-114353
   Created at: 2026-01-29T10:37:38Z

Performance Metrics:
   Average Confidence: 0.8467
   Average Drift Score: 0.1558
   Average Latency: 29.37ms
   Total Predictions: 100

Evaluating against thresholds...
   Confidence threshold: 0.7
   Drift threshold: 0.25
Confidence is healthy
Drift is within acceptable range

MODEL PERFORMANCE IS HEALTHY
   No retraining needed at this time
Automated retraining pipeline completed