
## Notebook content

This notebook lets you review the experiment leaderboard to compare the evaluation quality of the trained models, load a chosen AutoGluon model from S3, and run predictions.


💡 **Tips:**
- Ensure the S3 connection to pipeline run results is configured so the notebook can access run artifacts.
- The model name must match one of the models listed in the leaderboard (the **model** column).

### Contents
This notebook contains the following parts:

**[Setup](#setup)**  
**[Experiment run details](#experiment-run-details)**  
**[Experiment leaderboard](#experiment-leaderboard)**  
**[Download trained model](#download-trained-model)**  
**[Model insights](#model-insights)**  
**[Load the predictor](#load-the-predictor)**  
**[Predict the values](#predict-the-values)**  
**[Summary and next steps](#summary-and-next-steps)**

<a id="setup"></a>
## Setup

In [3]:
import warnings

warnings.filterwarnings("ignore")

In [4]:
%pip install autogluon.tabular[all]==1.5 | tail -n 1

Note: you may need to restart the kernel to use updated packages.


<a id="experiment-run-details"></a>
## Experiment run details

Set the pipeline name and run ID that identify the training run whose artifacts you want to load. These values are typically available from the pipeline run or workbench.

In [5]:
pipeline_name = "autogluon-tabular-training-pipeline"
run_id = "0eb6fae9-8819-43b0-9a2c-9f2f61ad5c47"
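The cells in this notebook locate artifacts under keys of the form `<pipeline_name>/<run_id>/<step name>`. A minimal sketch of that composition follows; `artifact_prefix` is a hypothetical helper (not part of the pipeline or AutoGluon), and it uses `posixpath` so the separator is always `/`, as S3 keys expect, regardless of the operating system.

```python
import posixpath


def artifact_prefix(pipeline_name, run_id, step_name):
    """Build the S3 key prefix for one pipeline step (hypothetical helper)."""
    # posixpath.join always uses "/", which matches S3 key conventions.
    return posixpath.join(pipeline_name, run_id, step_name)


leaderboard_prefix_example = artifact_prefix(
    "autogluon-tabular-training-pipeline",
    "0eb6fae9-8819-43b0-9a2c-9f2f61ad5c47",
    "leaderboard-evaluation",
)
print(leaderboard_prefix_example)
```

The same helper would produce the model prefix by passing `"autogluon-models-full-refit"` as the step name.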

<a id="experiment-leaderboard"></a>
## Experiment leaderboard

📌 **Action:** Ensure the S3 connection is added to the workbench so the notebook can access the results.
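Before running the next cell, it can help to confirm that the connection settings are visible to the kernel. The sketch below checks the two variables this notebook reads (`AWS_S3_ENDPOINT`, `AWS_S3_BUCKET`) plus the standard boto3 credential variables. That the workbench connection exposes credentials through `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` is an assumption, and `missing_s3_settings` is a hypothetical helper.

```python
import os

# AWS_S3_ENDPOINT and AWS_S3_BUCKET are read by this notebook's cells.
# The two credential names are the standard boto3 environment variables;
# assuming the workbench connection sets them is an assumption.
REQUIRED_VARS = (
    "AWS_S3_ENDPOINT",
    "AWS_S3_BUCKET",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
)


def missing_s3_settings(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_s3_settings()
if missing:
    print("Missing S3 connection settings:", ", ".join(missing))
else:
    print("S3 connection settings found.")
```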

In [6]:
import boto3
import os
from IPython.display import HTML

s3 = boto3.resource('s3', endpoint_url=os.environ['AWS_S3_ENDPOINT'])
bucket = s3.Bucket(os.environ['AWS_S3_BUCKET'])
leaderboard_prefix = os.path.join(pipeline_name, run_id, 'leaderboard-evaluation')
leaderboard_artifact_name = 'html_artifact'

for obj in bucket.objects.filter(Prefix=leaderboard_prefix):
    if leaderboard_artifact_name in obj.key:
        bucket.download_file(obj.key, leaderboard_artifact_name)

HTML(leaderboard_artifact_name)

| | model | r2 | root_mean_squared_error | mean_squared_error | mean_absolute_error | pearsonr | median_absolute_error |
|---|---|---|---|---|---|---|---|
| 0 | RandomForestMSE_BAG_L1_FULL | 0.990447 | -0.880368 | -0.775048 | -0.447642 | 0.995632 | -0.255999 |
| 1 | RandomForestMSE_BAG_L4_FULL | 0.946228 | -2.088665 | -4.362521 | -1.361132 | 0.972874 | -0.825 |
| 2 | NeuralNetTorch_BAG_L4_FULL | 0.941286 | -2.182532 | -4.763447 | -1.403966 | 0.971167 | -0.904951 |

<a id="download-trained-model"></a>
## Download trained model

💡 **Tip:** To download a model other than the best one, set `model_name` accordingly (it must match a name from the leaderboard **model** column).

In [7]:
model_name = "RandomForestMSE_BAG_L1_FULL"
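A typo in `model_name` would make the download loop match nothing, so a quick check against the leaderboard names can save a round trip. This is a minimal sketch: `check_model_name` is a hypothetical helper, and the list of names is copied by hand from the leaderboard **model** column rather than parsed from the artifact.

```python
import difflib


def check_model_name(model_name, available_models):
    """Return the name if it is on the leaderboard, else raise with suggestions."""
    if model_name in available_models:
        return model_name
    suggestions = difflib.get_close_matches(model_name, available_models, n=3)
    hint = f" Did you mean: {', '.join(suggestions)}?" if suggestions else ""
    raise ValueError(f"Unknown model {model_name!r}.{hint}")


# Names copied manually from the leaderboard output in this notebook.
leaderboard_models = [
    "RandomForestMSE_BAG_L1_FULL",
    "RandomForestMSE_BAG_L4_FULL",
    "NeuralNetTorch_BAG_L4_FULL",
]
check_model_name("RandomForestMSE_BAG_L1_FULL", leaderboard_models)
```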

Download model binaries and metrics.

In [8]:
full_refit_prefix = os.path.join(pipeline_name, run_id, "autogluon-models-full-refit")
best_model_subpath = os.path.join("model_artifact", model_name)
best_model_path = None
local_dir = None  # set to a directory to download there instead of mirroring the S3 key layout

for obj in bucket.objects.filter(Prefix=full_refit_prefix):
    # Skip objects that belong to other models and "directory" placeholder keys.
    if best_model_subpath not in obj.key or obj.key.endswith('/'):
        continue
    # Mirror the key locally, or place it under local_dir relative to the run prefix.
    target = obj.key if local_dir is None else os.path.join(local_dir, os.path.relpath(obj.key, full_refit_prefix))
    os.makedirs(os.path.dirname(target), exist_ok=True)
    bucket.download_file(obj.key, target)
    # Local root of the model: everything up to and including the model name.
    best_model_path = os.path.join(target.split(model_name)[0], model_name)

print("Model artifact stored under", best_model_path)

Model artifact stored under autogluon-tabular-training-pipeline/0eb6fae9-8819-43b0-9a2c-9f2f61ad5c47/autogluon-models-full-refit/021915de-a0d5-4d2e-9718-2eaa16515b35/model_artifact/RandomForestMSE_BAG_L1_FULL
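The download cell maps each S3 key to a local file path. The sketch below isolates that mapping so it can be reasoned about without an S3 connection; `local_path_for_key` is a hypothetical helper, not something the pipeline provides.

```python
import os
import posixpath


def local_path_for_key(key, s3_prefix, local_dir=None):
    """Map an S3 object key to a local file path (hypothetical helper).

    With local_dir=None the key itself is used as a relative path, which is
    what the download cell does by default. Otherwise the part of the key
    below s3_prefix is placed under local_dir.
    """
    if local_dir is None:
        return key
    relative = posixpath.relpath(key, s3_prefix)
    return os.path.join(local_dir, *relative.split("/"))


print(local_path_for_key("run/models/model.pkl", "run"))
print(local_path_for_key("run/models/model.pkl", "run", local_dir="downloads"))
```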


<a id="model-insights"></a>
## Model insights

Display the feature importances for the selected model.

### Feature importance
The ten most important features are displayed.

In [10]:
import pandas as pd

feature_importance = pd.read_json(os.path.join(best_model_path, "metrics", "feature_importance.json"))
feature_importance.head(10)

| | importance | stddev | p_value | n | p99_high | p99_low |
|---|---|---|---|---|---|---|
| lstat | 0.948519 | 0.101739 | 1.6e-05 | 5 | 1.158002 | 0.739037 |
| rm | 0.315929 | 0.042088 | 3.7e-05 | 5 | 0.402589 | 0.22927 |
| dis | 0.064579 | 0.01521 | 0.000343 | 5 | 0.095896 | 0.033262 |
| nox | 0.055011 | 0.007657 | 4.4e-05 | 5 | 0.070777 | 0.039246 |
| crim | 0.043319 | 0.005173 | 2.4e-05 | 5 | 0.05397 | 0.032669 |
| tax | 0.017811 | 0.003395 | 0.000151 | 5 | 0.024801 | 0.01082 |
| ptratio | 0.015932 | 0.001657 | 1.4e-05 | 5 | 0.019343 | 0.012521 |
| Unnamed: 0 | 0.00987 | 0.002091 | 0.000228 | 5 | 0.014176 | 0.005565 |
| age | 0.009368 | 0.001096 | 2.2e-05 | 5 | 0.011625 | 0.007111 |
| indus | 0.008282 | 0.001451 | 0.000109 | 5 | 0.01127 | 0.005295 |
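One way to read this output is to keep only features whose importance is statistically distinguishable from zero. The sketch below assumes that `p_value` tests the hypothesis of zero importance and that `p99_low` is the lower bound of a 99% confidence interval, which is how AutoGluon describes these columns. `reliable_features` is a hypothetical helper, and the rows are copied by hand from the first six lines of the output.

```python
# (feature, importance, p_value, p99_low), copied from the output in this notebook.
rows = [
    ("lstat", 0.948519, 1.6e-05, 0.739037),
    ("rm", 0.315929, 3.7e-05, 0.22927),
    ("dis", 0.064579, 0.000343, 0.033262),
    ("nox", 0.055011, 4.4e-05, 0.039246),
    ("crim", 0.043319, 2.4e-05, 0.032669),
    ("tax", 0.017811, 0.000151, 0.01082),
]


def reliable_features(rows, alpha=0.01):
    """Keep features with p_value below alpha and a positive lower bound."""
    return [
        name
        for name, _importance, p_value, p99_low in rows
        if p_value < alpha and p99_low > 0
    ]


print(reliable_features(rows))              # default threshold
print(reliable_features(rows, alpha=1e-4))  # stricter threshold
```

The threshold is a judgment call; with only `n = 5` shuffles per feature, the p-values are coarse estimates.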


<a id="load-the-predictor"></a>
## Load the predictor

Load the trained model as a TabularPredictor object.

In [11]:
from autogluon.tabular import TabularPredictor

predictor = TabularPredictor.load(best_model_path)

In [14]:
predictor.feature_metadata.to_dict()

{'Unnamed: 0': ('int', ()),
 'crim': ('float', ()),
 'zn': ('float', ()),
 'indus': ('float', ()),
 'chas': ('int', ('bool',)),
 'nox': ('float', ()),
 'rm': ('float', ()),
 'age': ('float', ()),
 'dis': ('float', ()),
 'rad': ('int', ()),
 'tax': ('int', ()),
 'ptratio': ('float', ()),
 'black': ('float', ()),
 'lstat': ('float', ())}

<a id="predict-the-values"></a>
## Predict the values

Use a sample record to predict a value.

In [15]:
import pandas as pd

score_data = {
    "Unnamed: 0": [0.00632], 
    "crim": [18.0], 
    "zn": [2.31], 
    "indus": [0.0],
    "chas": [0.538],
    "nox": [6.575],
    "rm": [65.2],
    "age": [4.09],
    "dis": [1.0],
    "rad": [296],
    "tax": [15.3],
    "ptratio": [396.9],
    "black": [1],
    "lstat": [4.98]
}
score_df = pd.DataFrame(data=score_data)
score_df.head()

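| | Unnamed: 0 | crim | zn | indus | chas | nox | rm | age | dis | rad | tax | ptratio | black | lstat |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.00632 | 18.0 | 2.31 | 0.0 | 0.538 | 6.575 | 65.2 | 4.09 | 1.0 | 296 | 15.3 | 396.9 | 1 | 4.98 |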


Predict the values using the `predict` method.

In [17]:
predictor.predict(score_df)

0    32.231335
Name: medv, dtype: float32

<a id="summary-and-next-steps"></a>
## Summary and next steps

**Summary:** This notebook displayed the experiment leaderboard, loaded a trained AutoGluon model from S3, and ran predictions on sample data using `predict`.

**Next steps:**
- Run predictions on your own data (ensure columns match the training schema).
- Try another model from the leaderboard by changing `model_name` and re-running the download and load cells.
- Optionally, create an online deployment of the predictor using a KServe custom runtime.
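For the first step, a quick schema check before calling `predict` can catch missing or unexpected columns. This is a minimal sketch: `compare_columns` is a hypothetical helper, and the feature list is copied from the `predictor.feature_metadata.to_dict()` output shown earlier.

```python
def compare_columns(expected_features, actual_columns):
    """Return (missing, unexpected) column names relative to the training schema."""
    expected, actual = list(expected_features), list(actual_columns)
    missing = [col for col in expected if col not in actual]
    unexpected = [col for col in actual if col not in expected]
    return missing, unexpected


# Feature names copied from the predictor's feature metadata in this notebook.
training_features = [
    "Unnamed: 0", "crim", "zn", "indus", "chas", "nox", "rm",
    "age", "dis", "rad", "tax", "ptratio", "black", "lstat",
]

# Example input that lacks "lstat" and carries the target column "medv".
incoming_columns = training_features[:-1] + ["medv"]
missing, unexpected = compare_columns(training_features, incoming_columns)
print("Missing:", missing)
print("Unexpected:", unexpected)
```

In the notebook itself, the same check could be run as `compare_columns(predictor.feature_metadata.to_dict(), score_df.columns)`.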

---