# 🧪 Hyperparameter Experimentation (SageMaker) — Manual vs Tuner

**Purpose:** Provide a *stable*, **reproducible**, and **config-driven** notebook to explore hyperparameters for a churn model (or any binary classifier) in a way that’s safe to promote to production.

> Two approaches:
>
> 1) **Manual Search** — launch individual SageMaker training jobs for deterministic control and to work around tuner constraints.
> 2) **SageMaker Hyperparameter Tuning Job** — a dedicated HPO job with metric-based early stopping and parallel exploration.

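Both approaches assume you have a training script (e.g., `training/train_sagemaker.py` or `training/train_sagemaker_large.py`) that prints each final metric on its own log line in a fixed format, so that the patterns under `regex_metrics` in the configuration can extract the values.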
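
As a rough illustration of what the metric patterns expect, the sketch below formats four metric lines and checks that each pattern from `regex_metrics` captures a value. The `emit_metrics` and `parse_metrics` helpers and the sample numbers are hypothetical; the real training scripts may format their output differently.

```python
import re

# Patterns copied from the `regex_metrics` entries in this notebook's CONFIG.
METRIC_REGEX = {
    "roc_auc": r"Final ROC-AUC: ([0-9]+\.[0-9]+)",
    "f1_score": r"Final F1 @ target recall: ([0-9]+\.[0-9]+)",
    "churner_recall": r"Churner recall:\s+([0-9]+\.[0-9]+)",
    "churner_precision": r"Churner precision:\s+([0-9]+\.[0-9]+)",
}

def emit_metrics(roc_auc, f1, recall, precision):
    """Hypothetical helper: format the final metrics, one per line."""
    return "\n".join([
        f"Final ROC-AUC: {roc_auc:.4f}",
        f"Final F1 @ target recall: {f1:.4f}",
        f"Churner recall: {recall:.4f}",
        f"Churner precision: {precision:.4f}",
    ])

def parse_metrics(log_text):
    """Apply each pattern to the log text; metrics that do not match map to None."""
    found = {}
    for name, pattern in METRIC_REGEX.items():
        match = re.search(pattern, log_text)
        found[name] = float(match.group(1)) if match else None
    return found

log = emit_metrics(0.91, 0.62, 0.80, 0.51)  # sample values, not real results
print(log)
print(parse_metrics(log))
```

Note that every pattern requires a decimal point, so a line such as `Final ROC-AUC: 1` would not match. Formatting with a fixed number of decimals, as in the sketch, avoids that case.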


## 📦 What You Get
- **Config-first** setup for S3 paths, instance types, and metric regex
- **Manual** launcher for many single training jobs in parallel (spot instances supported)
- **Dedicated** Hyperparameter Tuning Job (HPO) via `HyperparameterTuner`
- Recall-first **metrics extraction** (ROC-AUC, F1@target recall, churner recall/precision)
- Optional **MLflow** logging
- **Artifacts**: CSV of search results; best params printed


## 🧰 Prerequisites
- Python 3.9+
- AWS credentials with SageMaker permissions (role, S3 access)
- Packages: `sagemaker`, `boto3`, `pandas`, `numpy`, `pyyaml`, `scikit-learn` (for local utils), `mlflow` (optional)
- A `config.yaml` in the project root (next to this notebook or parent) with:
  ```yaml
  data:
    parquet_uri: s3://your-bucket/path/to/data/*.parquet
  ```
- Training scripts:
  - `training/train_sagemaker.py` (for Tuner)
  - `training/train_sagemaker_large.py` (for Manual jobs; can be the same if you prefer)

```python
# If running on a fresh environment, uncomment as needed
# %pip install sagemaker boto3 pandas numpy pyyaml scikit-learn mlflow
```
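
The prerequisite that `config.yaml` sits next to the notebook or in its parent directory can be checked before launching anything. The sketch below is one way to do that with the standard library only; `find_config` is a hypothetical helper, not part of the training scripts.

```python
from pathlib import Path
from typing import Optional

def find_config(start: Path, name: str = "config.yaml", max_up: int = 1) -> Optional[Path]:
    """Look for `name` in `start`, then in up to `max_up` parent directories."""
    current = start.resolve()
    for _ in range(max_up + 1):
        candidate = current / name
        if candidate.is_file():
            return candidate
        if current.parent == current:  # reached the filesystem root
            break
        current = current.parent
    return None

config_path = find_config(Path.cwd())
print(config_path if config_path else "config.yaml not found next to the notebook or in its parent")
```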


## ♻️ Reproducibility & Environment Capture
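- Fixed seeds for Python's `random` and NumPy in the notebook process (training jobs must set their own seeds)
- Unique, timestamped run folder for artifacts
- Python, platform, and pandas/NumPy versions written to `env_info.json`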


In [None]:
import os, sys, json, hashlib, platform, random
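from datetime import datetime, timezone
import numpy as np
import pandas as pd

SEED = 42
random.seed(SEED)
np.random.seed(SEED)

# Timezone-aware UTC timestamp (datetime.utcnow() is deprecated since Python 3.12)
RUN_TS = datetime.now(timezone.utc).strftime('%Y%m%dT%H%M%SZ')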
RUN_ID = hashlib.sha1(f"{RUN_TS}-{SEED}".encode()).hexdigest()[:10]
ARTIFACT_DIR = os.environ.get('ARTIFACT_DIR', f"artifacts/hpo_{RUN_TS}_{RUN_ID}")
os.makedirs(ARTIFACT_DIR, exist_ok=True)

env_info = {
    'python': sys.version,
    'platform': platform.platform(),
    'timestamp_utc': RUN_TS,
    'seed': SEED,
    'packages': { 'pandas': pd.__version__, 'numpy': np.__version__ },
}
with open(os.path.join(ARTIFACT_DIR, 'env_info.json'), 'w') as f:
    json.dump(env_info, f, indent=2)
env_info
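
The cell above records versions for pandas and NumPy only. If you want the other prerequisites captured as well, a minimal sketch using `importlib.metadata` from the standard library could look like the following. The `capture_versions` helper and the `TRACKED` list are assumptions for illustration; adjust the list to the packages you actually depend on.

```python
from importlib import metadata

def capture_versions(packages):
    """Return {package: version or None} from installed distribution metadata."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions

# Distribution names taken from the prerequisites list in this notebook.
TRACKED = ["sagemaker", "boto3", "pandas", "numpy", "pyyaml", "scikit-learn", "mlflow"]
print(capture_versions(TRACKED))
```

The resulting dictionary can be merged into `env_info['packages']` before the JSON file is written.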

## ⚙️ Configuration (Edit here)
Single source of truth for inputs, outputs, and behavior. You can flip between manual and tuner flows.


In [None]:
from pathlib import Path
import yaml

CONFIG = {
    'aws': {
        'instance_type_manual': os.getenv('INSTANCE_TYPE_MANUAL', 'ml.m5.large'),
        'instance_type_tuner': os.getenv('INSTANCE_TYPE_TUNER', 'ml.m5.xlarge'),
        'use_spot': True,
        'max_run_manual_sec': 2*60*60,   # 2 hours
        'max_wait_manual_sec': 4*60*60,  # 4 hours
        'max_run_tuner_sec': 60*60,      # 1 hour
        'max_wait_tuner_sec': 2*60*60,   # 2 hours
    },
    'data': {
        'parquet_uri': os.getenv('PARQUET_URI', 's3://your-bucket/path/to/data/*.parquet'),
    },
    'training': {
        'entry_point_manual': 'train_sagemaker_large.py',
        'entry_point_tuner': 'train_sagemaker.py',
        'source_dir': 'training',
        'requirements': 'requirements.txt',
        'config_yaml': 'config.yaml',
        'mlflow_mode': os.getenv('MLFLOW_MODE', 'disabled'),  # 'local' | 'sagemaker' | 'disabled'
        'recall_target': float(os.getenv('RECALL_TARGET', '0.80')),
        'chunk_size': int(os.getenv('CHUNK_SIZE', '50000')),
        'sample_ratio': float(os.getenv('SAMPLE_RATIO', '1.0')),
    },
    'regex_metrics': [
        { 'Name': 'roc_auc', 'Regex': r'Final ROC-AUC: ([0-9]+\.[0-9]+)' },
        { 'Name': 'f1_score', 'Regex': r'Final F1 @ target recall: ([0-9]+\.[0-9]+)' },
        { 'Name': 'churner_recall', 'Regex': r'Churner recall:\s+([0-9]+\.[0-9]+)' },
        { 'Name': 'churner_precision', 'Regex': r'Churner precision:\s+([0-9]+\.[0-9]+)' },
    ],
    'manual_search': {
        'strategy': 'focused',   # 'focused' | 'broad'
        'max_parallel_jobs': 3,
        'max_combinations': None,
        # focused grid (edit as needed)
        'grid_focused': {
            'n_estimators': [2200, 2500, 3000],
            'learning_rate': [0.08],
            'depth': [6],
            'l2_leaf_reg': [5]
        },
        # broad grid (edit as needed)
        'grid_broad': {
            'n_estimators': [1500, 2000, 2500, 3000, 3500],
            'learning_rate': [0.05, 0.07, 0.09, 0.11, 0.13],
            'depth': [4, 5, 6, 7, 8],
            'l2_leaf_reg': [2.0, 4.0, 6.0, 8.0, 10.0]
        }
    },
    'tuner': {
        'max_jobs': 5,
        'max_parallel_jobs': 2,
        'objective_metric': 'roc_auc',  # Stable for HPO
        'early_stopping_type': 'Auto',
        'ranges': {
            'n-estimators': ['IntegerParameter', 500, 3000],
            'learning-rate': ['ContinuousParameter', 0.01, 0.2],
            'depth': ['IntegerParameter', 4, 10],
            'l2-leaf-reg': ['ContinuousParameter', 0.5, 10.0]
        }
    },
    'output': {
        'artifact_dir': ARTIFACT_DIR,
        'results_csv': str(Path(ARTIFACT_DIR)/'manual_search_results.csv'),
    },
    'mlflow': {
        'enabled': False,
        'tracking_uri': os.getenv('MLFLOW_TRACKING_URI',''),
        'experiment_name': 'manual-hyperparameter-search'
    }
}
CONFIG
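
Before choosing a strategy, it helps to know how many jobs each grid expands to, because the manual search launches one training job per combination. The sketch below counts combinations for the two grids defined above; `grid_size` is a hypothetical helper.

```python
from math import prod

# Grids copied from CONFIG['manual_search'] in this notebook.
grid_focused = {
    "n_estimators": [2200, 2500, 3000],
    "learning_rate": [0.08],
    "depth": [6],
    "l2_leaf_reg": [5],
}
grid_broad = {
    "n_estimators": [1500, 2000, 2500, 3000, 3500],
    "learning_rate": [0.05, 0.07, 0.09, 0.11, 0.13],
    "depth": [4, 5, 6, 7, 8],
    "l2_leaf_reg": [2.0, 4.0, 6.0, 8.0, 10.0],
}

def grid_size(grid):
    """Number of combinations in a full Cartesian grid."""
    return prod(len(values) for values in grid.values())

print(grid_size(grid_focused))  # 3
print(grid_size(grid_broad))    # 625
```

With `strategy='broad'` and `max_combinations` left at `None`, all 625 combinations would be launched, so setting `max_combinations` is advisable for the broad grid.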

## 🧭 Approach A — Manual Hyperparameter Search (many single jobs)
Use when you want full control, need to work around tuner issues, or want to stage/batch jobs for very large data.
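
One thing worth verifying before launching: the class below passes `parquet_uri` straight to `TrainingInput`. SageMaker's default `S3Prefix` data type selects objects by key prefix rather than by glob, so a URI ending in `*.parquet` may not select the files you expect. The sketch below shows one way to derive a plain prefix; `s3_prefix_from_glob` is a hypothetical helper, and whether you need it depends on how your data and training script are set up.

```python
import posixpath

def s3_prefix_from_glob(uri: str) -> str:
    """Hypothetical helper: drop a trailing glob segment such as '*.parquet'."""
    if not uri.startswith("s3://"):
        raise ValueError(f"not an S3 URI: {uri}")
    head, tail = posixpath.split(uri)
    # Only the last path segment is inspected for glob characters.
    if any(ch in tail for ch in "*?["):
        return head + "/"
    return uri

print(s3_prefix_from_glob("s3://your-bucket/path/to/data/*.parquet"))
print(s3_prefix_from_glob("s3://your-bucket/path/to/data/"))
```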


In [None]:
import boto3, sagemaker, itertools, time
from sagemaker.pytorch import PyTorch
from datetime import datetime
from typing import List, Dict, Tuple

class ManualHyperparameterSearch:
    def __init__(self, config: dict):
        self.cfg = config
        self.sess = sagemaker.Session()
        try:
            self.role = sagemaker.get_execution_role()
        except Exception:
            # For local notebooks where get_execution_role isn't available, allow role via env
            self.role = os.getenv('SAGEMAKER_ROLE_ARN', 'YOUR-ROLE-ARN')
        self.region = boto3.Session().region_name
        self.bucket = self.sess.default_bucket()

        # Data channel
        self.training_input = sagemaker.inputs.TrainingInput(
            s3_data=self.cfg['data']['parquet_uri'],
            content_type='application/x-parquet'
        )

        self.jobs = []
        print('🔬 Manual Hyperparameter Search Initialized')
        print('   Instance type:', self.cfg['aws']['instance_type_manual'])
        print('   Sample ratio:', self.cfg['training']['sample_ratio'])
        print('   Region:', self.region)

    def _grid(self, strategy='focused'):
        grid = self.cfg['manual_search']['grid_focused'] if strategy=='focused' else self.cfg['manual_search']['grid_broad']
        names = list(grid.keys())
        vals = list(grid.values())
        combos = list(itertools.product(*vals))
        combos = [dict(zip(names, c)) for c in combos]
        return combos

    def define_hyperparameter_grid(self, strategy='focused') -> List[Dict]:
        combos = self._grid(strategy)
        print(f"\n📊 Generated {len(combos)} combos | strategy: {strategy}")
        for i, p in enumerate(combos[:3]):
            print(f"   {i+1}: {p}")
        if len(combos)>3:
            print(f"   ... and {len(combos)-3} more")
        return combos

    def _estimator(self, params: Dict, job_index: int) -> Tuple[PyTorch,str]:
        ts = datetime.now().strftime('%Y%m%d-%H%M%S')
        job_name = f"manual-hpo-{job_index:02d}-{ts}"
        est = PyTorch(
            entry_point=self.cfg['training']['entry_point_manual'],
            source_dir=self.cfg['training']['source_dir'],
            role=self.role,
            instance_type=self.cfg['aws']['instance_type_manual'],
            instance_count=1,
            framework_version='2.0.0',
            py_version='py310',
            hyperparameters={
                'mlflow-mode': self.cfg['training']['mlflow_mode'],
                'sample-ratio': str(self.cfg['training']['sample_ratio']),
                'chunk-size': str(self.cfg['training']['chunk_size']),
                'config': self.cfg['training']['config_yaml'],
                'n-estimators': str(params['n_estimators']),
                'learning-rate': str(params['learning_rate']),
                'depth': str(params['depth']),
                'l2-leaf-reg': str(params['l2_leaf_reg'])
            },
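            max_run=self.cfg['aws']['max_run_manual_sec'],
            use_spot_instances=self.cfg['aws']['use_spot'],
            # max_wait is only meaningful for managed spot training; leave it unset for on-demand jobs
            max_wait=self.cfg['aws']['max_wait_manual_sec'] if self.cfg['aws']['use_spot'] else None,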
            base_job_name=f"manual-hpo-{job_index:02d}",
            dependencies=[self.cfg['training']['config_yaml'], self.cfg['training']['requirements']],
            metric_definitions=self.cfg['regex_metrics']
        )
        return est, job_name

    def launch_search(self, param_combinations: List[Dict], max_parallel_jobs: int = 3):
        running, completed, failed = [], [], []
        total = len(param_combinations)
        print(f"\n🚀 Manual search: total {total}, parallel {max_parallel_jobs}")
        for i, params in enumerate(param_combinations):
            while len(running) >= max_parallel_jobs:
                print('⏳ Waiting for capacity ...')
                time.sleep(60)
                running, completed, failed = self._check_status(running, completed, failed)
            try:
                est, job_name = self._estimator(params, i+1)
                est.fit({'training': self.training_input}, wait=False)
                info = { 'job_index': i+1, 'job_name': est.latest_training_job.name, 'estimator': est, 'parameters': params, 'status': 'InProgress', 'start_time': datetime.now() }
                running.append(info); self.jobs.append(info)
                print(f"✅ Launched {info['job_name']} with {params}")
                time.sleep(10)
            except Exception as e:
                print('❌ Launch failed:', e)
                failed.append({'job_index': i+1, 'parameters': params, 'error': str(e)})
        print('\n⏳ Waiting for all jobs to finish ...')
        while running:
            time.sleep(120)
            running, completed, failed = self._check_status(running, completed, failed)
        print(f"\n🎯 Done. Completed: {len(completed)} | Failed: {len(failed)}")
        return completed, failed

    def _check_status(self, running, completed, failed):
        still = []
        for j in running:
            try:
                st = j['estimator'].latest_training_job.describe()['TrainingJobStatus']
                if st == 'Completed':
                    j['status'] = 'Completed'; j['end_time'] = datetime.now(); completed.append(j)
                    print('✅ Completed:', j['job_name'])
                elif st in ['Failed','Stopped']:
                    j['status'] = st; j['end_time'] = datetime.now(); failed.append(j)
                    print(f"❌ {st}:", j['job_name'])
                else:
                    still.append(j)
            except Exception as e:
                print('⚠️  Status check error:', j.get('job_name'), e)
                still.append(j)
        if len(still) != len(running):
            print(f"📊 Update — completed: {len(completed)}, failed: {len(failed)}, running: {len(still)}")
        return still, completed, failed

    def collect_results(self, completed_jobs: List[Dict]):
        import pandas as pd
        print(f"\n📊 Collecting results from {len(completed_jobs)} jobs ...")
        rows = []
        for j in completed_jobs:
            try:
                desc = j['estimator'].latest_training_job.describe()
                metrics = {m['MetricName']: m['Value'] for m in desc.get('FinalMetricDataList', [])}
                rows.append({
                    'job_name': j['job_name'],
                    'job_index': j['job_index'],
                    'training_time_minutes': (j['end_time'] - j['start_time']).total_seconds()/60,
                    **j['parameters'],
                    **metrics
                })
            except Exception as e:
                print('⚠️  Collect error for', j.get('job_name'), e)
        df = pd.DataFrame(rows)
        if not df.empty:
            if 'roc_auc' in df.columns:
                df = df.sort_values('roc_auc', ascending=False)
            out = self.cfg['output']['results_csv']
            Path(out).parent.mkdir(parents=True, exist_ok=True)
            df.to_csv(out, index=False)
            print('💾 Saved to', out)
        else:
            print('⚠️  No metrics collected.')
        return df

    def analyze_results(self, df: pd.DataFrame):
        if df.empty:
            print('❌ No results to analyze')
            return df
        print('\n📈 Analysis — top 5 by roc_auc')
        keep = [c for c in ['job_index','roc_auc','f1_score','churner_recall','churner_precision','n_estimators','learning_rate','depth','l2_leaf_reg'] if c in df.columns]
        print(df[keep].head(5).to_string(index=False))
        if 'roc_auc' in df.columns:
            print('\nSummary:')
            print('  best roc_auc:', df['roc_auc'].max())
            print('  mean roc_auc:', df['roc_auc'].mean())
            print('  std  roc_auc:', df['roc_auc'].std())
        return df
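
The scheduling logic in `launch_search` and `_check_status` can be tried out without touching AWS. The sketch below reproduces the same bounded-parallelism pattern with a fake status poller; `run_bounded` and `fake_poll` are hypothetical stand-ins, and the sleeps between polls are omitted.

```python
from collections import deque

def run_bounded(jobs, max_parallel, poll):
    """Start jobs in order, never keeping more than max_parallel running.

    `jobs` is a list of job ids and `poll(job_id)` returns 'InProgress',
    'Completed', 'Failed', or 'Stopped'. Both stand in for SageMaker calls.
    """
    pending = deque(jobs)
    running, completed, failed = [], [], []
    peak = 0
    while pending or running:
        # Fill free slots before polling.
        while pending and len(running) < max_parallel:
            running.append(pending.popleft())
        peak = max(peak, len(running))
        still = []
        for job in running:
            status = poll(job)
            if status == "Completed":
                completed.append(job)
            elif status in ("Failed", "Stopped"):
                failed.append(job)
            else:
                still.append(job)
        running = still
    return completed, failed, peak

# Fake poller: every job finishes on its second poll; job 3 fails.
poll_counts = {}
def fake_poll(job_id):
    poll_counts[job_id] = poll_counts.get(job_id, 0) + 1
    if poll_counts[job_id] < 2:
        return "InProgress"
    return "Failed" if job_id == 3 else "Completed"

completed, failed, peak = run_bounded([1, 2, 3, 4, 5], max_parallel=2, poll=fake_poll)
print("completed:", completed, "| failed:", failed, "| peak parallelism:", peak)
```

Running a dry pass like this is a cheap way to confirm that a change to the loop still respects `max_parallel_jobs` before real jobs are launched.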


### ▶️ Run Manual Search
Uncomment and execute to launch. **Note:** This creates multiple training jobs and incurs AWS cost.


In [None]:
# search = ManualHyperparameterSearch(CONFIG)
# combos = search.define_hyperparameter_grid(strategy=CONFIG['manual_search']['strategy'])
# if CONFIG['manual_search']['max_combinations']:
#     combos = combos[:CONFIG['manual_search']['max_combinations']]
# completed, failed = search.launch_search(combos, max_parallel_jobs=CONFIG['manual_search']['max_parallel_jobs'])
# results_df = search.collect_results(completed)
# search.analyze_results(results_df)
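
To put a rough ceiling on that cost, you can multiply the number of jobs by the per-job runtime limit. The sketch below computes an upper bound in instance-hours, assuming each job runs for at most `max_run_manual_sec`; it does not model instance prices or spot discounts, and `worst_case_instance_hours` is a hypothetical helper.

```python
def worst_case_instance_hours(n_jobs: int, max_run_sec: int) -> float:
    """Upper bound on training time if every job runs until its max_run limit."""
    return n_jobs * max_run_sec / 3600

MAX_RUN_MANUAL_SEC = 2 * 60 * 60  # same value as CONFIG['aws']['max_run_manual_sec']

print(worst_case_instance_hours(3, MAX_RUN_MANUAL_SEC))    # 6.0
print(worst_case_instance_hours(625, MAX_RUN_MANUAL_SEC))  # 1250.0
```

Actual usage is normally lower, since jobs stop as soon as training finishes.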

## 🤖 Approach B — SageMaker Hyperparameter Tuning Job (HPO)
Create a dedicated tuning job that explores ranges and maximizes a chosen objective metric.


In [None]:
import boto3, sagemaker
from sagemaker.pytorch import PyTorch
from sagemaker.tuner import HyperparameterTuner, IntegerParameter, ContinuousParameter

def create_hyperparameter_tuning_job(config: dict):
    sess = sagemaker.Session()
    try:
        role = sagemaker.get_execution_role()
    except Exception:
        role = os.getenv('SAGEMAKER_ROLE_ARN', 'YOUR-ROLE-ARN')
    region = boto3.Session().region_name
    print('SageMaker role:', role)
    print('Region:', region)

    # Write a requirements.txt for the training container
    # (this overwrites any requirements.txt already in the working directory)
    req_txt = '''\
pandas>=1.5.0
numpy>=1.24.0
scikit-learn>=1.2.0
catboost>=1.2.0
mlflow>=2.8.0
s3fs>=2023.1.0
pyarrow>=10.0.0
sqlalchemy>=2.0.0
redshift-connector>=2.0.0
sagemaker>=2.190.0
boto3>=1.26.0
awswrangler>=3.0.0
pyyaml>=6.0
'''
    with open('requirements.txt','w') as f:
        f.write(req_txt)

    est = PyTorch(
        entry_point=config['training']['entry_point_tuner'],
        source_dir=config['training']['source_dir'],
        role=role,
        instance_type=config['aws']['instance_type_tuner'],
        instance_count=1,
        framework_version='2.0.0',
        py_version='py310',
        hyperparameters={ 'mlflow-mode': 'disabled', 'config': config['training']['config_yaml'] },
        max_run=config['aws']['max_run_tuner_sec'],
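        use_spot_instances=config['aws']['use_spot'],
        # max_wait is only meaningful for managed spot training; leave it unset for on-demand jobs
        max_wait=config['aws']['max_wait_tuner_sec'] if config['aws']['use_spot'] else None,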
        dependencies=[config['training']['config_yaml'], 'requirements.txt'],
        metric_definitions=config['regex_metrics']
    )

    # Build hyperparameter ranges from config
    ranges_cfg = config['tuner']['ranges']
    ranges = {}
    for k, v in ranges_cfg.items():
        kind, lo, hi = v
        if kind == 'IntegerParameter':
            ranges[k] = IntegerParameter(int(lo), int(hi))
        else:
            ranges[k] = ContinuousParameter(float(lo), float(hi))

    tuner = HyperparameterTuner(
        estimator=est,
        objective_metric_name=config['tuner']['objective_metric'],
        hyperparameter_ranges=ranges,
        objective_type='Maximize',
        max_jobs=config['tuner']['max_jobs'],
        max_parallel_jobs=config['tuner']['max_parallel_jobs'],
        base_tuning_job_name='churn-model-tuning',
        early_stopping_type=config['tuner']['early_stopping_type']
    )

    print('Starting tuning job ...')
    print('Objective metric:', config['tuner']['objective_metric'])
    print('Max jobs:', config['tuner']['max_jobs'], '| Parallel:', config['tuner']['max_parallel_jobs'])

    # wait=False returns once the tuning job is created; poll it with monitor_tuning_job()
    tuner.fit({
        'training': sagemaker.inputs.TrainingInput(
            s3_data=config['data']['parquet_uri'],
            content_type='application/x-parquet'
        )
    }, wait=False)

    print('\nTuning job started:', tuner.latest_tuning_job.job_name)
    return tuner

def monitor_tuning_job(tuner: HyperparameterTuner):
    import time
    while True:
        desc = tuner.describe()
        status = desc['HyperParameterTuningJobStatus']
        print('Status:', status)
        if status in ['Completed','Failed','Stopped']:
            break
        # tuner.best_training_job() returns only the job name, so read the
        # BestTrainingJob summary from the describe() response instead.
        best = desc.get('BestTrainingJob') or {}
        metric = best.get('FinalHyperParameterTuningJobObjectiveMetric')
        if metric:
            print('Best so far:', best['TrainingJobName'], '| value:', metric['Value'])
        else:
            print('No completed jobs yet ...')
        time.sleep(60)
    print('Final status:', status)
    best = desc.get('BestTrainingJob') or {}
    if status=='Completed' and best:
        print('\nBest hyperparameters:')
        for p,v in best['TunedHyperParameters'].items():
            print(' ', p, ':', v)
        print('\nBest metric value:', best['FinalHyperParameterTuningJobObjectiveMetric']['Value'])
        try:
            print('\nTop 5 jobs:')
            # Sort by objective value so head() really shows the best jobs
            df = tuner.analytics().dataframe()
            print(df.sort_values('FinalObjectiveValue', ascending=False).head())
        except Exception as e:
            print('Analytics unavailable:', e)
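
The tuner reports its best hyperparameters with hyphenated names and string values, while the manual grid uses underscored names and numeric values. If you want to feed a tuner result into a focused manual grid, a small conversion like the sketch below may help. `to_manual_params`, `INT_PARAMS`, and the example values are hypothetical and not output from a real tuning job.

```python
# Hyperparameters declared as IntegerParameter in CONFIG['tuner']['ranges'].
INT_PARAMS = {"n-estimators", "depth"}

def to_manual_params(tuned: dict) -> dict:
    """Convert hyphenated, string-valued tuner output to the manual grid's form."""
    params = {}
    for key, value in tuned.items():
        name = key.replace("-", "_")
        if key in INT_PARAMS:
            # int(float(...)) also accepts values formatted like '7.0'
            params[name] = int(float(value))
        else:
            params[name] = float(value)
    return params

# Made-up example in the shape of a 'TunedHyperParameters' mapping.
example = {"n-estimators": "2310", "learning-rate": "0.0734", "depth": "7", "l2-leaf-reg": "4.2"}
print(to_manual_params(example))
```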

### ▶️ Launch Tuning Job
Uncomment and execute to start. You can later attach to a running job by name and monitor.


In [None]:
# tuner = create_hyperparameter_tuning_job(CONFIG)
# monitor_tuning_job(tuner)

# Or attach to an existing job by name:
# from sagemaker.tuner import HyperparameterTuner
# tuner = HyperparameterTuner.attach('churn-model-tuning-YYYY-MM-DD-HH-MM-SS-XYZ')
# monitor_tuning_job(tuner)

## 🧭 Guidance — When to Use Which Approach?
- **Manual Search**: you want explicit control of each job, different sample ratios/chunk sizes, or staged waves of experiments.
- **Tuner (HPO)**: you want automatic exploration, early stopping, and a consolidated analytics view in SageMaker.

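**Metric of interest:** If your business goal is recall-first, make sure the training script reports churner recall and precision at the chosen threshold, plus F1 at the target recall (`recall_target`, 0.80 by default). ROC-AUC can remain the tuner objective because it does not depend on a threshold, while the recall-oriented metrics are compared afterwards from the logged values.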
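
To make "F1 at target recall" concrete, the sketch below shows one plausible definition: take the highest score threshold whose recall reaches the target, then report precision, recall, and F1 at that threshold. This is an assumption for illustration; the training scripts may define the metric differently, and the labels and scores are toy data.

```python
def metrics_at_target_recall(y_true, scores, target_recall=0.80):
    """Precision, recall and F1 at the highest threshold meeting the target recall."""
    if not 0 < target_recall <= 1:
        raise ValueError("target_recall must be in (0, 1]")
    positives = sum(y_true)
    if positives == 0:
        raise ValueError("y_true contains no positive examples")
    # Predict positive when score >= threshold, so recall can only grow
    # as the threshold is lowered; stop at the first threshold that qualifies.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        recall = tp / positives
        if recall >= target_recall:
            precision = tp / (tp + fp)
            f1 = 2 * precision * recall / (precision + recall)
            return {"threshold": threshold, "recall": recall,
                    "precision": precision, "f1": f1}
    # Not expected: the lowest threshold always yields recall 1.0.
    raise RuntimeError("no threshold reached the target recall")

# Toy data: 5 churners (1) and 5 non-churners (0).
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.2, 0.75, 0.5, 0.4, 0.3, 0.1]
print(metrics_at_target_recall(y_true, scores, target_recall=0.80))
```

On the toy data the selected threshold is 0.6, where four of the five churners are caught alongside one false positive.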
