# SageMaker Experiments (Free-Tier, Notebooks optional)

This notebook is an **illustrative companion** to the CLI scripts in this mini-project. Its purpose is to make the workflow easy to read, reproduce, and present:

- Load a small tabular dataset
- Train a lightweight XGBoost model locally (inside the notebook kernel)
- Run a tiny, manual hyper-parameter search
- Save model and metrics as artefacts in Amazon S3
- Keep everything within the AWS Free Tier

You can run **either** the CLI scripts **or** this notebook. The notebook is included mainly for storytelling.

## How to use this notebook

1. **Set environment variables** in the first code cell:


In [2]:
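import os

# Region and project settings used by the cells below
os.environ["AWS_REGION"] = "eu-west-2"
os.environ["PROJECT"] = "mini-project2"
# Replace the placeholder with the name of a bucket you already own
os.environ["BUCKET"] = "<your-existing-bucket-name>"
os.environ["S3_PREFIX"] = "mini-project2"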

2. **(Optional) Install dependencies** in-notebook if your kernel doesn’t already have them:

In [None]:
# !pip install -r ../requirements.txt

3. **Run the training and mini-HPO sections**. They mirror the CLI scripts and upload the following files under `s3://$BUCKET/$S3_PREFIX/`:
   - `artifacts/model.joblib`
   - `metrics/metrics.json`
   - `metrics/hpo_summary.json`
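
To see how the environment variables from step 1 combine into the upload locations from step 3, here is a minimal sketch. The helper `expected_uris` and the bucket name `my-example-bucket` are illustrative assumptions and are not part of the project scripts.

```python
import os
from typing import List, Mapping

# Relative keys named in this notebook.
RELATIVE_KEYS = [
    "artifacts/model.joblib",
    "metrics/metrics.json",
    "metrics/hpo_summary.json",
]


def expected_uris(env: Mapping[str, str] = os.environ) -> List[str]:
    """Hypothetical helper: build the S3 URIs from BUCKET and S3_PREFIX."""
    bucket = env["BUCKET"]
    # Tolerate a missing prefix or stray slashes around it.
    prefix = env.get("S3_PREFIX", "").strip("/")
    base = f"s3://{bucket}/{prefix}" if prefix else f"s3://{bucket}"
    return [f"{base}/{key}" for key in RELATIVE_KEYS]


# Placeholder values for illustration only; substitute your own bucket.
example_env = {"BUCKET": "my-example-bucket", "S3_PREFIX": "mini-project2"}
for uri in expected_uris(example_env):
    print(uri)
```

Calling `expected_uris()` with no argument reads the real environment, so it only works after the first code cell has run.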

## Run order and what each step does

This notebook serves both to demonstrate the workflow and to produce results. We run a simple baseline first, then a tiny hyper-parameter search.

1) **Baseline training (`train_local.py`)**  
   - Trains one small XGBoost model and uploads:
     - `s3://$BUCKET/$S3_PREFIX/artifacts/model.joblib`
     - `s3://$BUCKET/$S3_PREFIX/metrics/metrics.json`  
   - Recommended first for a clean “before HPO” reference (but not strictly required).

2) **Mini HPO (`hpo_loop.py --trials 5`)**  
   - Runs a handful of short trials to explore hyper-parameters.  
   - Saves a summary with the best trial to:
     - `s3://$BUCKET/$S3_PREFIX/metrics/hpo_summary.json`  
   - You can run this on its own if you only want the HPO results.
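
The general shape of a small manual search like step 2 can be sketched as follows. This is not the code in `hpo_loop.py`: the random sampling strategy, the parameter ranges, the stand-in `evaluate` function, and the layout of the summary dictionary are all assumptions made for illustration. In a real run, `evaluate` would train an XGBoost model and return a validation metric.

```python
import json
import random


def evaluate(params):
    """Stand-in objective (assumption): replace with real training and validation.

    This toy score is highest near max_depth=4 and learning_rate=0.1.
    """
    return 1.0 - 0.05 * abs(params["max_depth"] - 4) - abs(params["learning_rate"] - 0.1)


def run_mini_hpo(n_trials=5, seed=0):
    """Sample n_trials random configurations and keep the best-scoring one."""
    rng = random.Random(seed)  # seeded so repeated runs give the same trials
    trials = []
    for i in range(n_trials):
        params = {
            "max_depth": rng.randint(2, 6),                      # hypothetical range
            "learning_rate": round(rng.uniform(0.05, 0.3), 3),   # hypothetical range
        }
        trials.append({"trial": i, "params": params, "score": evaluate(params)})
    best = max(trials, key=lambda t: t["score"])
    return {"n_trials": n_trials, "best": best, "trials": trials}


summary = run_mini_hpo(n_trials=5)
print(json.dumps(summary["best"], indent=2))
```

Keeping every trial in the summary, not only the winner, makes it easy to compare the search against the baseline afterwards.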

**Now run the two steps above, baseline first, then the mini HPO.**

## Inspect outputs in S3

The scripts upload artefacts and metrics under `s3://$BUCKET/$S3_PREFIX/...`.

In [None]:
# Quick peek at metrics:
!aws s3 cp "s3://$BUCKET/$S3_PREFIX/metrics/metrics.json" -
!aws s3 cp "s3://$BUCKET/$S3_PREFIX/metrics/hpo_summary.json" - || true
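
If you prefer to load the metrics into Python instead of printing them with the CLI, a minimal sketch could look like this. The helper `read_json_from_s3` is hypothetical; it relies only on the `get_object(Bucket=..., Key=...)` call of a boto3 S3 client, whose response carries a readable `Body`. The `FakeS3Client` class, the bucket name, and the `example_metric` value are offline placeholders so the sketch runs without AWS access; they do not reflect the real contents of `metrics.json`.

```python
import io
import json


def read_json_from_s3(s3_client, bucket, key):
    """Fetch one object through an S3 client and parse it as JSON."""
    response = s3_client.get_object(Bucket=bucket, Key=key)
    return json.loads(response["Body"].read().decode("utf-8"))


class FakeS3Client:
    """Offline stand-in for illustration; serves objects from a dict."""

    def __init__(self, objects):
        self._objects = objects

    def get_object(self, Bucket, Key):
        # Mimic the part of the boto3 response that the helper uses.
        return {"Body": io.BytesIO(self._objects[(Bucket, Key)])}


# Placeholder bucket, key, and payload; not real project output.
fake = FakeS3Client(
    {("my-example-bucket", "mini-project2/metrics/metrics.json"): b'{"example_metric": 0.5}'}
)
metrics = read_json_from_s3(fake, "my-example-bucket", "mini-project2/metrics/metrics.json")
print(metrics)
```

Against a real bucket, pass `boto3.client("s3")` in place of the fake client and build the key from `S3_PREFIX`.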


## Cost control
- Use a CPU kernel only; avoid creating endpoints.
- If you open this in SageMaker Studio, shut down the JupyterLab space when finished.
- Keep data small and trials modest; this notebook is designed to stay within the AWS Free Tier.

## Teardown (only this project’s data)
To remove just this project’s artefacts and metrics while keeping the bucket for future work:

In [None]:
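# Preview what would be deleted first (uncomment to run):
# !aws s3 rm "s3://$BUCKET/$S3_PREFIX/" --recursive --dryrun
# Delete everything under this project's prefix; the trailing slash
# keeps the match to this prefix only:
!aws s3 rm "s3://$BUCKET/$S3_PREFIX/" --recursive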

In [13]:
# Check what’s inside the bucket right now:
!aws s3 ls s3://$BUCKET/ --human-readable --summarize


Total Objects: 0
   Total Size: 0 Bytes
