# 🧪 Student Lab: Hyperparameter Experiments (Manual vs Tuner)

**Goal:** Reproduce a recall-first HPO workflow where **you choose the model and hyperparameters**. The notebook saves artifacts and can run on SageMaker or locally.

> You’ll complete the **TODO** slots marked with `# <- TODO ✏️` to plug your model and search spaces.


## ✅ What You’ll Do
1. Capture environment for reproducibility  
2. Configure data location, instance types, training scripts, and **metrics regex**  
3. **Choose your model** inside your training script (e.g., CatBoost, XGBoost, LightGBM, PyTorch, etc.)  
4. Run **Manual Search** (many single jobs) or **Tuner HPO**  
5. Collect & compare metrics (recall-first mindset)  
6. Export results CSV and best params


## 🧰 Prerequisites
- AWS creds with SageMaker + S3 access
- Packages: `sagemaker`, `boto3`, `pandas`, `numpy`, `pyyaml`, optionally `mlflow`
- A `config.yaml` with your dataset pointer
- A training script that prints metrics with regex-friendly lines (you choose the **model inside the script**)
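
A quick local check that the script's log lines and the `Regex` entries agree can save a wasted job. The sketch below assumes the script prints the four lines targeted by the default `regex_metrics` in the CONFIG cell; the sample log values and the `extract_metrics` helper are invented for illustration.

```python
import re

# Patterns copied from the default `regex_metrics` in the CONFIG cell.
METRIC_REGEX = {
    'roc_auc': r'Final ROC-AUC: ([0-9]+\.[0-9]+)',
    'f1_score': r'Final F1 @ target recall: ([0-9]+\.[0-9]+)',
    'churner_recall': r'Churner recall:\s+([0-9]+\.[0-9]+)',
    'churner_precision': r'Churner precision:\s+([0-9]+\.[0-9]+)',
}

# Hypothetical log output from a training script (values are invented).
SAMPLE_LOG = """\
Final ROC-AUC: 0.9123
Final F1 @ target recall: 0.6410
Churner recall:    0.8034
Churner precision: 0.5332
"""

def extract_metrics(log_text, patterns):
    """Return {metric_name: float} for every pattern that matches the log."""
    found = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, log_text)
        if match:
            found[name] = float(match.group(1))
    return found

metrics = extract_metrics(SAMPLE_LOG, METRIC_REGEX)
missing = sorted(set(METRIC_REGEX) - set(metrics))
print(metrics)
print('missing:', missing)
```

Note that the `[0-9]+\.[0-9]+` pattern only matches plain decimals, so a value printed as `1` or `1e-3` would not be captured. Formatting metrics with something like `f"{value:.4f}"` in the script avoids that.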

## ♻️ Reproducibility & Run Folder

In [None]:
import os, sys, json, hashlib, platform, random
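from datetime import datetime, timezone
import numpy as np
import pandas as pd

SEED = 42
random.seed(SEED)
np.random.seed(SEED)

# datetime.utcnow() is deprecated since Python 3.12; use an aware UTC timestamp instead
RUN_TS = datetime.now(timezone.utc).strftime('%Y%m%dT%H%M%SZ')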
RUN_ID = hashlib.sha1(f"{RUN_TS}-{SEED}".encode()).hexdigest()[:10]
ARTIFACT_DIR = os.environ.get('ARTIFACT_DIR', f"artifacts/hpo_lab_{RUN_TS}_{RUN_ID}")
os.makedirs(ARTIFACT_DIR, exist_ok=True)

env_info = {
    'python': sys.version,
    'platform': platform.platform(),
    'timestamp_utc': RUN_TS,
    'seed': SEED,
    'packages': { 'pandas': pd.__version__, 'numpy': np.__version__ }
}
with open(os.path.join(ARTIFACT_DIR, 'env_info.json'), 'w') as f:
    json.dump(env_info, f, indent=2)
env_info

## ⚙️ CONFIG — ✏️ Edit this cell

In [None]:
from pathlib import Path
import yaml

CONFIG = {
    'aws': {
        'instance_type_manual': os.getenv('INSTANCE_TYPE_MANUAL', 'ml.m5.large'),   # <- TODO ✏️ choose
        'instance_type_tuner': os.getenv('INSTANCE_TYPE_TUNER', 'ml.m5.xlarge'),   # <- TODO ✏️ choose
        'use_spot': True,
        'max_run_manual_sec': 2*60*60,
        'max_wait_manual_sec': 4*60*60,
        'max_run_tuner_sec': 60*60,
        'max_wait_tuner_sec': 2*60*60,
    },
    'data': {
        'parquet_uri': os.getenv('PARQUET_URI', 's3://your-bucket/path/*.parquet'),  # <- TODO ✏️ your S3 path
    },
    'training': {
        'entry_point_manual': 'train_sagemaker_large.py',  # <- TODO ✏️ your script (you choose model *inside* it)
        'entry_point_tuner': 'train_sagemaker.py',         # <- TODO ✏️ your script (can be the same as manual)
        'source_dir': 'training',                          # <- TODO ✏️ folder with training code
        'requirements': 'requirements.txt',                # <- TODO ✏️ requirements for remote training
        'config_yaml': 'config.yaml',                      # <- TODO ✏️ config file passed to script
        'mlflow_mode': os.getenv('MLFLOW_MODE', 'disabled'),
        'recall_target': float(os.getenv('RECALL_TARGET', '0.80')),  # <- TODO ✏️ if recall differs
        'chunk_size': int(os.getenv('CHUNK_SIZE', '50000')),
        'sample_ratio': float(os.getenv('SAMPLE_RATIO', '1.0')),
    },
    'regex_metrics': [  # <- TODO ✏️ align with your script's printouts
        { 'Name': 'roc_auc', 'Regex': r'Final ROC-AUC: ([0-9]+\.[0-9]+)' },
        { 'Name': 'f1_score', 'Regex': r'Final F1 @ target recall: ([0-9]+\.[0-9]+)' },
        { 'Name': 'churner_recall', 'Regex': r'Churner recall:\s+([0-9]+\.[0-9]+)' },
        { 'Name': 'churner_precision', 'Regex': r'Churner precision:\s+([0-9]+\.[0-9]+)' },
    ],
    'manual_search': {
        'strategy': 'focused',   # 'focused' | 'broad'  # <- TODO ✏️
        'max_parallel_jobs': 3,  # <- TODO ✏️
        'max_combinations': None,
        # Define YOUR hyperparameter grid (keys are mapped to your script's argument names in `_estimator`)
        'grid_focused': {        # <- TODO ✏️ your focused grid
            'n_estimators': [2200, 2500, 3000],
            'learning_rate': [0.08],
            'depth': [6],
            'l2_leaf_reg': [5]
        },
        'grid_broad': {          # <- TODO ✏️ your broad grid
            'n_estimators': [1500, 2000, 2500, 3000, 3500],
            'learning_rate': [0.05, 0.07, 0.09, 0.11, 0.13],
            'depth': [4, 5, 6, 7, 8],
            'l2_leaf_reg': [2.0, 4.0, 6.0, 8.0, 10.0]
        }
    },
    'tuner': {
        'max_jobs': 8,                # <- TODO ✏️
        'max_parallel_jobs': 2,       # <- TODO ✏️
        'objective_metric': 'roc_auc',# <- TODO ✏️ consider stability
        'early_stopping_type': 'Auto',
        # Ranges MUST match the hyperparameter names your script accepts
        'ranges': {                   # <- TODO ✏️ your ranges
            'n-estimators': ['IntegerParameter', 500, 3000],
            'learning-rate': ['ContinuousParameter', 0.01, 0.2],
            'depth': ['IntegerParameter', 4, 10],
            'l2-leaf-reg': ['ContinuousParameter', 0.5, 10.0]
        }
    },
    'output': {
        'artifact_dir': ARTIFACT_DIR,
        'results_csv': str(Path(ARTIFACT_DIR)/'manual_search_results.csv'),
    }
}
CONFIG

## 🧭 Approach A — Manual Hyperparameter Search (you choose hyperparams)
The grid **keys** are mapped to your training script's hyperparameter names in `_estimator`; edit that mapping if your script uses different names.
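
Because every grid combination becomes one training job, it helps to count combinations before launching anything. The sketch below expands the two default grids from the CONFIG cell with `itertools.product`, the same approach the search class uses; `expand_grid` is an illustrative helper name.

```python
import itertools

# Grids copied from the defaults in the CONFIG cell; replace with your own.
grid_focused = {
    'n_estimators': [2200, 2500, 3000],
    'learning_rate': [0.08],
    'depth': [6],
    'l2_leaf_reg': [5],
}
grid_broad = {
    'n_estimators': [1500, 2000, 2500, 3000, 3500],
    'learning_rate': [0.05, 0.07, 0.09, 0.11, 0.13],
    'depth': [4, 5, 6, 7, 8],
    'l2_leaf_reg': [2.0, 4.0, 6.0, 8.0, 10.0],
}

def expand_grid(grid):
    """Return one dict per point in the Cartesian product of the grid values."""
    names = list(grid)
    return [dict(zip(names, values)) for values in itertools.product(*grid.values())]

focused = expand_grid(grid_focused)
broad = expand_grid(grid_broad)
print(len(focused), len(broad))  # 3 625
print(focused[0])
```

With these defaults the broad grid expands to 625 jobs, so set `max_combinations` (or trim the lists) before choosing `strategy='broad'`.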

In [None]:
import boto3, sagemaker, itertools, time, os
from sagemaker.pytorch import PyTorch  # <- TODO ✏️ If you don't use PyTorch Estimator, import the right one
from datetime import datetime
from typing import List, Dict, Tuple

class ManualHyperparameterSearch:
    def __init__(self, config: dict):
        self.cfg = config
        self.sess = sagemaker.Session()
        try:
            self.role = sagemaker.get_execution_role()
        except Exception:
            self.role = os.getenv('SAGEMAKER_ROLE_ARN', 'YOUR-ROLE-ARN')  # <- TODO ✏️ set for local
        self.region = boto3.Session().region_name
        self.training_input = sagemaker.inputs.TrainingInput(
            s3_data=self.cfg['data']['parquet_uri'], content_type='application/x-parquet')
        print('🔬 Manual HPO ready | role:', self.role, '| region:', self.region)

    def _grid(self, strategy='focused'):
        grid = self.cfg['manual_search']['grid_focused'] if strategy=='focused' else self.cfg['manual_search']['grid_broad']
        names = list(grid.keys()); vals = list(grid.values())
        return [dict(zip(names, c)) for c in itertools.product(*vals)]

    def define_hyperparameter_grid(self, strategy='focused') -> List[Dict]:
        combos = self._grid(strategy)
        print(f"📊 {len(combos)} combinations | strategy={strategy}")
        print('Examples:', combos[:3])
        return combos

    def _estimator(self, params: Dict, job_index: int):
        ts = datetime.now().strftime('%Y%m%d-%H%M%S')
        job_name = f"student-manual-hpo-{job_index:02d}-{ts}"
        # NOTE: You can swap PyTorch for another Estimator if your script/framework differs
        est = PyTorch(
            entry_point=self.cfg['training']['entry_point_manual'],            # <- TODO ✏️ your script
            source_dir=self.cfg['training']['source_dir'],                     # <- TODO ✏️ your folder
            role=self.role,
            instance_type=self.cfg['aws']['instance_type_manual'],
            instance_count=1,
            framework_version='2.0.0',    # <- TODO ✏️ adapt if not PyTorch
            py_version='py310',
            hyperparameters={
                'mlflow-mode': self.cfg['training']['mlflow_mode'],
                'sample-ratio': str(self.cfg['training']['sample_ratio']),
                'chunk-size': str(self.cfg['training']['chunk_size']),
                'config': self.cfg['training']['config_yaml'],
                # Map grid keys to your script arg names (edit if your names differ)
                'n-estimators': str(params.get('n_estimators', '')),          # <- TODO ✏️ map
                'learning-rate': str(params.get('learning_rate', '')),        # <- TODO ✏️ map
                'depth': str(params.get('depth', '')),                        # <- TODO ✏️ map
                'l2-leaf-reg': str(params.get('l2_leaf_reg', ''))             # <- TODO ✏️ map
            },
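            max_run=self.cfg['aws']['max_run_manual_sec'],
            use_spot_instances=self.cfg['aws']['use_spot'],
            # max_wait applies only to managed spot training; pass None for on-demand runs
            max_wait=self.cfg['aws']['max_wait_manual_sec'] if self.cfg['aws']['use_spot'] else None,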
            base_job_name=f"student-manual-hpo-{job_index:02d}",
            dependencies=[self.cfg['training']['config_yaml'], self.cfg['training']['requirements']],
            metric_definitions=self.cfg['regex_metrics']                       # <- TODO ✏️ align regex with script prints
        )
        return est, job_name

    def launch_search(self, param_combinations: List[Dict], max_parallel_jobs: int = 3):
        running, completed, failed = [], [], []
        for i, params in enumerate(param_combinations):
            while len(running) >= max_parallel_jobs:
                time.sleep(60)
                running, completed, failed = self._check_status(running, completed, failed)
            try:
                est, job_name = self._estimator(params, i+1)
                est.fit({'training': self.training_input}, wait=False)
                info = { 'job_index': i+1, 'job_name': est.latest_training_job.name, 'estimator': est, 'parameters': params, 'status': 'InProgress', 'start_time': datetime.now() }
                running.append(info)
                print('✅ Launched', info['job_name'], params)
                time.sleep(10)
            except Exception as e:
                print('❌ Launch failed:', e)
                failed.append({'job_index': i+1, 'parameters': params, 'error': str(e)})
        print('⏳ Waiting for all jobs ...')
        while running:
            time.sleep(120)
            running, completed, failed = self._check_status(running, completed, failed)
        print('🎯 Done | completed:', len(completed), '| failed:', len(failed))
        return completed, failed

    def _check_status(self, running, completed, failed):
        still = []
        for j in running:
            try:
                st = j['estimator'].latest_training_job.describe()['TrainingJobStatus']
                if st == 'Completed':
                    j['status'] = 'Completed'; j['end_time'] = datetime.now(); completed.append(j)
                    print('✅ Completed:', j['job_name'])
                elif st in ['Failed','Stopped']:
                    j['status'] = st; j['end_time'] = datetime.now(); failed.append(j)
                    print(f"❌ {st}:", j['job_name'])
                else:
                    still.append(j)
            except Exception as e:
                print('⚠️  Status error:', j.get('job_name'), e)
                still.append(j)
        if len(still) != len(running):
            print(f"📊 Update | completed: {len(completed)} | failed: {len(failed)} | running: {len(still)}")
        return still, completed, failed

    def collect_results(self, completed_jobs: List[Dict]):
        import pandas as pd
        rows = []
        for j in completed_jobs:
            try:
                desc = j['estimator'].latest_training_job.describe()
                m = {r['MetricName']: r['Value'] for r in desc.get('FinalMetricDataList', [])}
                rows.append({ 'job_name': j['job_name'], 'job_index': j['job_index'], 'training_time_minutes': (j['end_time']-j['start_time']).total_seconds()/60, **j['parameters'], **m })
            except Exception as e:
                print('⚠️  Collect error:', e)
        df = pd.DataFrame(rows)
        if not df.empty:
            if 'roc_auc' in df.columns:
                df = df.sort_values('roc_auc', ascending=False)
            Path(self.cfg['output']['results_csv']).parent.mkdir(parents=True, exist_ok=True)
            df.to_csv(self.cfg['output']['results_csv'], index=False)
            print('💾 Saved:', self.cfg['output']['results_csv'])
        else:
            print('⚠️  No metrics collected.')
        return df

    def analyze_results(self, df):
        if df.empty:
            print('❌ No results to analyze'); return df
        # Use the instance config rather than the notebook-global CONFIG
        keep = [c for c in ['job_index','roc_auc','f1_score','churner_recall','churner_precision','training_time_minutes'] + list(self.cfg['manual_search']['grid_focused'].keys()) if c in df.columns]
        print('\n🏅 Top results:')
        print(df[keep].head(10).to_string(index=False))
        return df

### ▶️ Run Manual Search
Uncomment and execute. **Costs real money** on AWS.

In [None]:
# search = ManualHyperparameterSearch(CONFIG)
# combos = search.define_hyperparameter_grid(strategy=CONFIG['manual_search']['strategy'])
# if CONFIG['manual_search']['max_combinations']: combos = combos[:CONFIG['manual_search']['max_combinations']]
# completed, failed = search.launch_search(combos, max_parallel_jobs=CONFIG['manual_search']['max_parallel_jobs'])
# results_df = search.collect_results(completed)
# search.analyze_results(results_df)

## 🤖 Approach B — SageMaker Hyperparameter Tuning Job (HPO)
Define **ranges** matching your script’s hyperparameter names. The tuner will maximize your chosen objective.
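
A cheap sanity check of the `ranges` spec before launching can catch typos early. The builder in the next cell treats any kind other than `'IntegerParameter'` as continuous, so a misspelled kind would pass silently. This sketch assumes the `[kind, low, high]` list format from the CONFIG cell; `validate_ranges` is a hypothetical helper, not part of the SageMaker SDK.

```python
def validate_ranges(ranges):
    """Return a list of human-readable problems found in a ranges spec."""
    problems = []
    for name, spec in ranges.items():
        if len(spec) != 3:
            problems.append(f"{name}: expected [kind, low, high], got {spec!r}")
            continue
        kind, lo, hi = spec
        if kind not in ('IntegerParameter', 'ContinuousParameter'):
            problems.append(f"{name}: unknown kind {kind!r}")
        elif lo >= hi:
            problems.append(f"{name}: low ({lo}) must be smaller than high ({hi})")
        elif kind == 'IntegerParameter' and (int(lo) != lo or int(hi) != hi):
            problems.append(f"{name}: integer range has non-integer bounds")
    return problems

# Default ranges copied from the CONFIG cell.
default_ranges = {
    'n-estimators': ['IntegerParameter', 500, 3000],
    'learning-rate': ['ContinuousParameter', 0.01, 0.2],
    'depth': ['IntegerParameter', 4, 10],
    'l2-leaf-reg': ['ContinuousParameter', 0.5, 10.0],
}
# Deliberately broken spec to show what gets reported.
broken_ranges = {
    'depth': ['IntegerParameter', 10, 4],
    'learning-rate': ['ContinousParameter', 0.01, 0.2],
}

print(validate_ranges(default_ranges))
for problem in validate_ranges(broken_ranges):
    print(problem)
```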

In [None]:
import boto3, sagemaker, os
from sagemaker.pytorch import PyTorch  # <- TODO ✏️ swap Estimator class if not using PyTorch
from sagemaker.tuner import HyperparameterTuner, IntegerParameter, ContinuousParameter

def create_hyperparameter_tuning_job(config: dict):
    sess = sagemaker.Session()
    try:
        role = sagemaker.get_execution_role()
    except Exception:
        role = os.getenv('SAGEMAKER_ROLE_ARN', 'YOUR-ROLE-ARN')  # <- TODO ✏️ for local
    region = boto3.Session().region_name
    print('SageMaker role:', role); print('Region:', region)

    # Optional: emit requirements for remote installs used by your training script
    with open('requirements.txt','w') as f:
        f.write('pandas>=1.5.0\nnumpy>=1.24.0\nscikit-learn>=1.2.0\ncatboost>=1.2.0\nmlflow>=2.8.0\ns3fs>=2023.1.0\npyarrow>=10.0.0\nsqlalchemy>=2.0.0\nredshift-connector>=2.0.0\nsagemaker>=2.190.0\nboto3>=1.26.0\nawswrangler>=3.0.0\npyyaml>=6.0\n')

    est = PyTorch(
        entry_point=config['training']['entry_point_tuner'],     # <- TODO ✏️ script where YOU choose the model
        source_dir=config['training']['source_dir'],
        role=role,
        instance_type=config['aws']['instance_type_tuner'],
        instance_count=1,
        framework_version='2.0.0',  # <- TODO ✏️ adapt if not PyTorch
        py_version='py310',
        hyperparameters={'mlflow-mode': 'disabled', 'config': config['training']['config_yaml']},
        max_run=config['aws']['max_run_tuner_sec'],
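        use_spot_instances=config['aws']['use_spot'],
        # max_wait applies only to managed spot training; pass None for on-demand runs
        max_wait=config['aws']['max_wait_tuner_sec'] if config['aws']['use_spot'] else None,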
        dependencies=[config['training']['config_yaml'], 'requirements.txt'],
        metric_definitions=config['regex_metrics']               # <- TODO ✏️ must match your script prints
    )

    # Build ranges from CONFIG (keys must match your script's hypargs)
    rng_cfg = config['tuner']['ranges']
    ranges = {}
    for k, v in rng_cfg.items():
        kind, lo, hi = v
        if kind == 'IntegerParameter':
            ranges[k] = IntegerParameter(int(lo), int(hi))
        else:
            ranges[k] = ContinuousParameter(float(lo), float(hi))

    tuner = HyperparameterTuner(
        estimator=est,
        objective_metric_name=config['tuner']['objective_metric'],  # <- TODO ✏️ choose objective
        hyperparameter_ranges=ranges,
        objective_type='Maximize',
        max_jobs=config['tuner']['max_jobs'],
        max_parallel_jobs=config['tuner']['max_parallel_jobs'],
        base_tuning_job_name='student-hpo-tuning',
        early_stopping_type=config['tuner']['early_stopping_type']
    )

    print('Starting tuning job ...')
    # wait=False returns as soon as the job is submitted, so monitor_tuning_job can do the polling
    tuner.fit({
        'training': sagemaker.inputs.TrainingInput(
            s3_data=config['data']['parquet_uri'], content_type='application/x-parquet')
    }, wait=False)
    print('Tuning job started:', tuner.latest_tuning_job.job_name)
    return tuner

def monitor_tuning_job(tuner: HyperparameterTuner):
    import time
    while True:
        desc = tuner.describe(); status = desc['HyperParameterTuningJobStatus']
        print('Status:', status)
        if status in ['Completed','Failed','Stopped']:
            break
        # describe() carries a 'BestTrainingJob' summary once a training job has finished;
        # tuner.best_training_job() returns only the job name (a string), not a dict
        best = desc.get('BestTrainingJob')
        if best and 'FinalHyperParameterTuningJobObjectiveMetric' in best:
            print('Best so far:', best['TrainingJobName'], '| value:', best['FinalHyperParameterTuningJobObjectiveMetric']['Value'])
        else:
            print('No completed jobs yet ...')
        time.sleep(60)
    print('Final status:', status)
    best = desc.get('BestTrainingJob')
    if status=='Completed' and best:
        print('\nBest hyperparameters:')
        for p,v in best['TunedHyperParameters'].items():
            print(' ', p, ':', v)
        print('\nBest objective value:', best['FinalHyperParameterTuningJobObjectiveMetric']['Value'])
        try:
            print('\nTop jobs:')
            # Sort so that the head really is the top of the leaderboard (objective is maximized)
            print(tuner.analytics().dataframe().sort_values('FinalObjectiveValue', ascending=False).head())
        except Exception as e:
            print('Analytics unavailable:', e)

### ▶️ Launch Tuning Job or Attach

In [None]:
# tuner = create_hyperparameter_tuning_job(CONFIG)   # <- TODO ✏️ run when ready
# monitor_tuning_job(tuner)

# from sagemaker.tuner import HyperparameterTuner
# tuner = HyperparameterTuner.attach('student-hpo-tuning-YYYY-MM-DD-HH-MM-SS-XYZ')  # <- TODO ✏️ job name
# monitor_tuning_job(tuner)

## 🧠 Tips
- Put **model choice** inside your training script; here we only pass hyperparameters and collect metrics.
- For **recall-first** selection, make sure your script prints recall (and F1 @ target recall). Use ROC-AUC as a stable tuner objective, then review recall in your results.
- Ensure the keys in `manual_search.grid_*` are mapped to the arg names your script expects (see the `hyperparameters` dict in `_estimator`), and that the `tuner.ranges` keys **match** those arg names exactly.
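
One way to apply the recall-first mindset to the collected results is to filter on the recall target first and only then rank by another metric. The sketch below is one possible selection rule, not a prescribed one; the rows are invented, and `pick_recall_first` is a hypothetical helper. With the results DataFrame, `results_df.to_dict('records')` would produce rows in the same shape.

```python
RECALL_TARGET = 0.80  # same value as the CONFIG default for 'recall_target'

# Hypothetical results rows (values are invented for illustration).
rows = [
    {'job_index': 1, 'roc_auc': 0.915, 'churner_recall': 0.76, 'churner_precision': 0.61},
    {'job_index': 2, 'roc_auc': 0.908, 'churner_recall': 0.82, 'churner_precision': 0.52},
    {'job_index': 3, 'roc_auc': 0.911, 'churner_recall': 0.81, 'churner_precision': 0.55},
]

def pick_recall_first(rows, recall_target):
    """Keep jobs that meet the recall target, then prefer precision, then ROC-AUC."""
    eligible = [r for r in rows if r['churner_recall'] >= recall_target]
    if not eligible:
        return None  # nothing met the target; revisit the search space or threshold
    return max(eligible, key=lambda r: (r['churner_precision'], r['roc_auc']))

best = pick_recall_first(rows, RECALL_TARGET)
print(best['job_index'])  # 3
```

Job 1 has the highest ROC-AUC here but misses the recall target, which is why sorting by ROC-AUC alone can pick the wrong model for a recall-first use case.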