
## Notebook content

This notebook lets you review the experiment leaderboard to compare the evaluation quality of the trained models, load a chosen AutoGluon model from S3, and run predictions.


💡 **Tips:**
- Ensure the S3 connection to pipeline run results is configured so the notebook can access run artifacts.
- The model name must match one of the models listed in the leaderboard (the **model** column).
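
The first tip can be checked before running anything else. The sketch below assumes the S3 connection exposes its settings through the `AWS_S3_ENDPOINT` and `AWS_S3_BUCKET` environment variables, which are the two names the leaderboard cell reads. `missing_s3_settings` is a hypothetical helper, not part of any library.

```python
import os

# Variable names taken from the cells later in this notebook; adjust them
# if your workbench connection uses different names (assumption).
REQUIRED_S3_VARS = ("AWS_S3_ENDPOINT", "AWS_S3_BUCKET")


def missing_s3_settings(env=None, required=REQUIRED_S3_VARS):
    """Return the required variable names that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


missing_settings = missing_s3_settings()
if missing_settings:
    print("S3 connection settings not found:", ", ".join(missing_settings))
else:
    print("S3 connection settings found.")
```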

### Contents
This notebook contains the following parts:

**[Setup](#setup)**  
**[Experiment run details](#experiment-run-details)**  
**[Experiment leaderboard](#experiment-leaderboard)**  
**[Download trained model](#download-trained-model)**  
**[Model insights](#model-insights)**  
**[Load the predictor](#load-the-predictor)**  
**[Predict the values](#predict-the-values)**  
**[Summary and next steps](#summary-and-next-steps)**

<a id="setup"></a>
## Setup

In [1]:
import warnings

warnings.filterwarnings("ignore")

In [2]:
%pip install "autogluon.tabular[all]==1.5" | tail -n 1

Note: you may need to restart the kernel to use updated packages.


<a id="experiment-run-details"></a>
## Experiment run details

Set the pipeline name and run ID that identify the training run whose artifacts you want to load. These values are typically available from the pipeline run details or the workbench.

In [3]:
pipeline_name = "autogluon-tabular-training-pipeline"
run_id = "f5eaed46-4458-4bee-9b98-56e8306c1b64"
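
The cells that follow combine these two values with an artifact folder name to form an S3 key prefix. A minimal sketch of that composition is shown here, using `posixpath.join` so the separator is always `/` regardless of the operating system. `artifact_prefix` is a hypothetical helper; the folder name is the one used by the leaderboard cell in this notebook.

```python
import posixpath


def artifact_prefix(pipeline_name, run_id, artifact_folder):
    """Build the S3 key prefix <pipeline_name>/<run_id>/<artifact_folder>."""
    return posixpath.join(pipeline_name, run_id, artifact_folder)


example_prefix = artifact_prefix(
    "autogluon-tabular-training-pipeline",
    "f5eaed46-4458-4bee-9b98-56e8306c1b64",
    "leaderboard-evaluation",
)
print(example_prefix)
```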

<a id="experiment-leaderboard"></a>
## Experiment leaderboard

📌 **Action:** Ensure the S3 connection is added to the workbench so the notebook can access the results.

In [4]:
import boto3
import os
from IPython.display import HTML

s3 = boto3.resource('s3', endpoint_url=os.environ['AWS_S3_ENDPOINT'])
bucket = s3.Bucket(os.environ['AWS_S3_BUCKET'])
leaderboard_prefix = os.path.join(pipeline_name, run_id, 'leaderboard-evaluation')
leaderboard_artifact_name = 'html_artifact'

for obj in bucket.objects.filter(Prefix=leaderboard_prefix):
    if leaderboard_artifact_name in obj.key:
        bucket.download_file(obj.key, leaderboard_artifact_name)

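# Render the downloaded file; filename= makes it explicit that this is a path, not raw HTML.
HTML(filename=leaderboard_artifact_name)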

| | model | accuracy | balanced_accuracy | mcc | roc_auc | f1 | precision | recall |
|---|---|---|---|---|---|---|---|---|
| 2 | NeuralNetFastAI_BAG_L2_FULL | 0.919283 | 0.932584 | 0.848707 | 0.996485 | 0.908629 | 0.832558 | 1.0 |
| 1 | LightGBMLarge_BAG_L2_FULL | 0.914798 | 0.906744 | 0.821958 | 0.953006 | 0.890805 | 0.91716 | 0.865922 |
| 0 | XGBoost_BAG_L1_FULL | 0.899103 | 0.881667 | 0.791963 | 0.971031 | 0.863222 | 0.946667 | 0.793296 |
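
Which model counts as "best" depends on the metric you care about. The sketch below picks a model name from the leaderboard rows above by an arbitrary metric. It is plain Python for illustration; `best_model_by` is a hypothetical helper and only a subset of the columns is copied.

```python
# Leaderboard rows copied from the output above (subset of columns).
leaderboard_rows = [
    {"model": "NeuralNetFastAI_BAG_L2_FULL", "accuracy": 0.919283,
     "roc_auc": 0.996485, "precision": 0.832558, "recall": 1.0},
    {"model": "LightGBMLarge_BAG_L2_FULL", "accuracy": 0.914798,
     "roc_auc": 0.953006, "precision": 0.91716, "recall": 0.865922},
    {"model": "XGBoost_BAG_L1_FULL", "accuracy": 0.899103,
     "roc_auc": 0.971031, "precision": 0.946667, "recall": 0.793296},
]


def best_model_by(rows, metric):
    """Return the name of the model with the highest value of `metric`."""
    return max(rows, key=lambda row: row[metric])["model"]


print("Best by roc_auc:  ", best_model_by(leaderboard_rows, "roc_auc"))
print("Best by precision:", best_model_by(leaderboard_rows, "precision"))
```

In this run the top model by `roc_auc`, `accuracy`, and `recall` is `NeuralNetFastAI_BAG_L2_FULL`, while `XGBoost_BAG_L1_FULL` has the highest `precision`.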


<a id="download-trained-model"></a>
## Download trained model

💡 **Tip:** To download a model other than the best one, set `model_name` accordingly (it must match a name from the leaderboard **model** column).

In [5]:
model_name = "NeuralNetFastAI_BAG_L2_FULL"

Download model binaries and metrics.

In [6]:
full_refit_prefix = os.path.join(pipeline_name, run_id, "autogluon-models-full-refit")
best_model_subpath = os.path.join("model_artifact", model_name)
best_model_path = None
local_dir = None  # optional local root folder; None mirrors the S3 key layout in the working directory

for obj in bucket.objects.filter(Prefix=full_refit_prefix):
    # Skip objects of other models and S3 "directory" placeholder keys.
    if best_model_subpath not in obj.key or obj.key.endswith("/"):
        continue
    target = obj.key if local_dir is None else os.path.join(local_dir, obj.key)
    os.makedirs(os.path.dirname(target), exist_ok=True)
    bucket.download_file(obj.key, target)
    # Local folder that holds the downloaded model, up to and including model_name.
    best_model_path = os.path.join(target.split(model_name)[0], model_name)

print("Model artifact stored under", best_model_path)

Model artifact stored under autogluon-tabular-training-pipeline/f5eaed46-4458-4bee-9b98-56e8306c1b64/autogluon-models-full-refit/86fb488c-9306-4a28-907e-fdea0021b14b/model_artifact/NeuralNetFastAI_BAG_L2_FULL


<a id="model-insights"></a>
## Model insights

Display the confusion matrix and feature importances for the selected model.

### Confusion matrix

In [7]:
import pandas as pd

confusion_matrix = pd.read_json(os.path.join(best_model_path, "metrics", "confusion_matrix.json"))
confusion_matrix.head()

| | 0 | 1 |
|---|---|---|
| 0 | 231 | 0 |
| 1 | 36 | 179 |
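
The headline metrics can be recomputed from these four counts. The sketch below assumes that rows are predicted classes and columns are actual classes. That orientation is inferred from the fact that it reproduces the accuracy, balanced accuracy, precision, and recall reported for `NeuralNetFastAI_BAG_L2_FULL` in the leaderboard; it is not stated by the pipeline output itself.

```python
# Counts copied from the confusion matrix above.
# Assumed orientation: matrix[predicted][actual].
matrix = [
    [231, 0],   # predicted 0: actual 0, actual 1
    [36, 179],  # predicted 1: actual 0, actual 1
]

true_negative, false_negative = matrix[0]
false_positive, true_positive = matrix[1]
total = true_negative + false_negative + false_positive + true_positive

accuracy = (true_positive + true_negative) / total
precision = true_positive / (true_positive + false_positive)
recall = true_positive / (true_positive + false_negative)
specificity = true_negative / (true_negative + false_positive)
balanced_accuracy = (recall + specificity) / 2

print(f"accuracy={accuracy:.6f} balanced_accuracy={balanced_accuracy:.6f}")
print(f"precision={precision:.6f} recall={recall:.6f}")
```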


### Feature importance
The ten most important features are displayed.

In [8]:
feature_importance = pd.read_json(os.path.join(best_model_path, "metrics", "feature_importance.json"))
feature_importance.head(10)

| | importance | stddev | p_value | n | p99_high | p99_low |
|---|---|---|---|---|---|---|
| Name | 0.251121 | 0.010396 | 3.517177e-07 | 5 | 0.272527 | 0.229715 |
| Sex | 0.117937 | 0.014839 | 2.944928e-05 | 5 | 0.148491 | 0.087384 |
| Pclass | 0.052018 | 0.016809 | 0.001144288 | 5 | 0.086627 | 0.017409 |
| Age | 0.040359 | 0.005932 | 5.443512e-05 | 5 | 0.052573 | 0.028144 |
| Ticket | 0.039013 | 0.008177 | 0.0002186093 | 5 | 0.05585 | 0.022177 |
| SibSp | 0.038565 | 0.003326 | 6.57081e-06 | 5 | 0.045413 | 0.031717 |
| Embarked | 0.031839 | 0.003684 | 2.11369e-05 | 5 | 0.039424 | 0.024253 |
| Fare | 0.030493 | 0.006058 | 0.0001774826 | 5 | 0.042967 | 0.01802 |
| PassengerId | 0.028251 | 0.008329 | 0.0008104531 | 5 | 0.045401 | 0.011101 |
| Parch | 0.027354 | 0.006613 | 0.0003798999 | 5 | 0.040971 | 0.013737 |
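
The `p99_low` and `p99_high` values in this table are numerically consistent with a two-sided 99% Student-t confidence interval around `importance`, built from `stddev` and the `n` repeated measurements. The sketch below reproduces the bounds for the `Name` row under that assumption. The critical value (0.995 quantile of the t distribution with 4 degrees of freedom) is hard-coded to avoid a SciPy dependency, and `p99_interval` is a hypothetical helper.

```python
import math

# 0.995 quantile of Student's t with n - 1 = 4 degrees of freedom.
T_CRITICAL_DF4 = 4.604095


def p99_interval(importance, stddev, n, t_critical=T_CRITICAL_DF4):
    """Return (low, high) bounds: importance -/+ t * stddev / sqrt(n)."""
    margin = t_critical * stddev / math.sqrt(n)
    return importance - margin, importance + margin


# Values from the "Name" row above.
low, high = p99_interval(importance=0.251121, stddev=0.010396, n=5)
print(f"p99_low={low:.6f} p99_high={high:.6f}")
```

A feature whose `p99_low` stays above zero, as all ten do here, has an importance that is unlikely to be explained by noise alone.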


<a id="load-the-predictor"></a>
## Load the predictor

Load the trained model as a TabularPredictor object.

In [9]:
from autogluon.tabular import TabularPredictor

predictor = TabularPredictor.load(best_model_path)

<a id="predict-the-values"></a>
## Predict the values

Use sample records to predict values. 

In [10]:
import pandas as pd

score_data = {
    "PassengerId": [1, 2], 
    "Pclass": [3, 1], 
    "Name": ["Braund, Mr. Owen Harris", "Heikkinen, Miss. Laina"], 
    "Sex": [0, 1],
    "Age": [22, 26],
    "SibSp": [1, 0],
    "Parch": [0, 0],
    "Ticket": ["A/5 21171", "STON/O2. 3101282"],
    "Fare": [7.25, 7.9],
    "Cabin": ["", ""],
    "Embarked": ["S", "C"]
}
score_df = pd.DataFrame(data=score_data)
score_df.head()

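| | PassengerId | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 3 | Braund, Mr. Owen Harris | 0 | 22 | 1 | 0 | A/5 21171 | 7.25 | | S |
| 1 | 2 | 1 | Heikkinen, Miss. Laina | 1 | 26 | 0 | 0 | STON/O2. 3101282 | 7.9 | | C |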


Predict the class probabilities using the `predict_proba` method.

In [11]:
predictor.predict_proba(score_df)

| | 0 | 1 |
|---|---|---|
| 0 | 0.492604 | 0.507396 |
| 1 | 0.379297 | 0.620703 |
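
Each row holds the predicted probability of class `0` and class `1`. A minimal sketch of turning such probabilities into labels with an adjustable decision threshold follows. `to_labels` and the 0.6 threshold are illustrative choices, not part of AutoGluon.

```python
# Probabilities copied from the output above: (P(class 0), P(class 1)) per row.
probabilities = [
    (0.492604, 0.507396),
    (0.379297, 0.620703),
]


def to_labels(rows, threshold=0.5):
    """Label a row 1 when P(class 1) reaches `threshold`, otherwise 0."""
    return [1 if p_one >= threshold else 0 for _, p_one in rows]


print(to_labels(probabilities))                 # default threshold of 0.5
print(to_labels(probabilities, threshold=0.6))  # stricter threshold
```

With the default threshold both records are labelled `1`. The first record sits close to the boundary, so raising the threshold to 0.6 flips it to `0`.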


<a id="summary-and-next-steps"></a>
## Summary and next steps

**Summary:** This notebook displayed the experiment leaderboard, downloaded a trained AutoGluon model from S3, reviewed its confusion matrix and feature importance, and ran predictions on sample data using `predict_proba`.

**Next steps:**
- Run predictions on your own data (ensure columns match the training schema).
- Try another model from the leaderboard by changing `model_name` and re-running the download and load cells.
- Optionally, deploy the predictor online using a KServe custom runtime.
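
For the first item, a quick column comparison can catch schema mismatches before calling the predictor. The sketch below is plain Python. `compare_columns` is a hypothetical helper, and the expected column list is taken from the sample records in this notebook rather than read from the trained model.

```python
# Feature columns used in the sample records of this notebook (assumed to
# match the training schema).
EXPECTED_COLUMNS = [
    "PassengerId", "Pclass", "Name", "Sex", "Age", "SibSp",
    "Parch", "Ticket", "Fare", "Cabin", "Embarked",
]


def compare_columns(actual, expected=EXPECTED_COLUMNS):
    """Return (missing, unexpected) column names, preserving input order."""
    missing = [name for name in expected if name not in actual]
    unexpected = [name for name in actual if name not in expected]
    return missing, unexpected


# Example: a dataset that lacks some features and carries an extra column.
missing_cols, unexpected_cols = compare_columns(
    ["PassengerId", "Pclass", "Name", "Sex", "Age", "Survived"]
)
print("Missing columns:   ", missing_cols)
print("Unexpected columns:", unexpected_cols)
```

With a pandas DataFrame, pass `list(df.columns)` as `actual`.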

---