# Wine Type Prediction
- [The Wine Type Prediction dataset](https://archive.ics.uci.edu/ml/datasets/Wine) contains the chemical properties of various wines and assigns each wine to one of 3 possible classes. The columns in the dataset are as follows:

|col name|description|
|:--|:--|
|target| This is the target variable to be predicted. There are three possible classes, class 1, 2 and 3 |
|alcohol| continuous | 
|malic_acid| continuous | 
|ash| continuous | 
|alcalinity_of_ash| continuous |    
|magnesium| continuous | 
|total_phenols| continuous | 
|flavanoids| continuous | 
|nonflavanoid_phenols| continuous | 
|proanthocyanins| continuous | 
|color_intensity| continuous | 
|hue| continuous | 
|od280/od315_of_diluted_wines| continuous | 
|proline| continuous | 


- The goal of this project is to build and tune a model that predicts the `target` column using Amazon SageMaker, then deploy the model as a `Serverless Inference Endpoint`.

## Tips: 
- Use the code below to get the default S3 bucket for writing artifacts:
    ```
    import sagemaker
    session = sagemaker.Session()
    bucket = session.default_bucket()
    ```
- What ML task is this? Classification? Regression? Clustering?
- What are the data types of the columns? What pre-processing should you apply?
- How to determine the best hyperparameters for the model?
- How to test if the model is deployed successfully?

In [2]:
import pandas as pd

cols = [
    "target",
    "alcohol", 
    "malic_acid", 
    "ash", 
    "alcalinity_of_ash",    
    "magnesium", 
    "total_phenols", 
    "flavanoids", 
    "nonflavanoid_phenols", 
    "proanthocyanins", 
    "color_intensity", 
    "hue", 
    "od280/od315_of_diluted_wines", 
    "proline"
]

In [3]:
wine_df = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data", names=cols)

print(wine_df.shape)
wine_df.head()

(178, 14)


Unnamed: 0,target,alcohol,malic_acid,ash,alcalinity_of_ash,magnesium,total_phenols,flavanoids,nonflavanoid_phenols,proanthocyanins,color_intensity,hue,od280/od315_of_diluted_wines,proline
0,1,14.23,1.71,2.43,15.6,127,2.8,3.06,0.28,2.29,5.64,1.04,3.92,1065
1,1,13.2,1.78,2.14,11.2,100,2.65,2.76,0.26,1.28,4.38,1.05,3.4,1050
2,1,13.16,2.36,2.67,18.6,101,2.8,3.24,0.3,2.81,5.68,1.03,3.17,1185
3,1,14.37,1.95,2.5,16.8,113,3.85,3.49,0.24,2.18,7.8,0.86,3.45,1480
4,1,13.24,2.59,2.87,21.0,118,2.8,2.69,0.39,1.82,4.32,1.04,2.93,735


In [4]:
wine_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 178 entries, 0 to 177
Data columns (total 14 columns):
 #   Column                        Non-Null Count  Dtype  
---  ------                        --------------  -----  
 0   target                        178 non-null    int64  
 1   alcohol                       178 non-null    float64
 2   malic_acid                    178 non-null    float64
 3   ash                           178 non-null    float64
 4   alcalinity_of_ash             178 non-null    float64
 5   magnesium                     178 non-null    int64  
 6   total_phenols                 178 non-null    float64
 7   flavanoids                    178 non-null    float64
 8   nonflavanoid_phenols          178 non-null    float64
 9   proanthocyanins               178 non-null    float64
 10  color_intensity               178 non-null    float64
 11  hue                           178 non-null    float64
 12  od280/od315_of_diluted_wines  178 non-null    float64
 13  proli

In [5]:
wine_df['target']

0      1
1      1
2      1
3      1
4      1
      ..
173    3
174    3
175    3
176    3
177    3
Name: target, Length: 178, dtype: int64

In [6]:
# Re-index the target column to 0-based labels
# The original target column has classes 1, 2, 3.
# Map them to 0, 1, 2.
wine_df['target'] = wine_df['target'].apply(lambda x: x - 1)

# Check the new class distribution to confirm the change
print("New Target Class Distribution (0, 1, 2)")
print(wine_df['target'].value_counts().sort_index())

New Target Class Distribution (0, 1, 2)
target
0    59
1    71
2    48
Name: count, dtype: int64
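
Because the model is trained on the shifted labels, its predictions are also 0-based and need shifting back before they can be compared with the dataset's original classes. A minimal sketch of the round trip in plain Python; the helper names `to_zero_based` and `to_original` are illustrative and not part of the notebook:

```python
# Offset between the dataset's labels (1, 2, 3) and the 0-based training labels
LABEL_OFFSET = 1


def to_zero_based(labels):
    """Shift original class labels 1..3 down to 0..2."""
    return [label - LABEL_OFFSET for label in labels]


def to_original(labels):
    """Shift 0-based predictions back to the dataset's 1..3 labels."""
    return [label + LABEL_OFFSET for label in labels]


original = [1, 2, 3, 3]
shifted = to_zero_based(original)
print(shifted)               # [0, 1, 2, 2]
print(to_original(shifted))  # [1, 2, 3, 3]
```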


Split the train and test datasets

In [7]:
from sklearn.model_selection import train_test_split

train_df, test_df = train_test_split(wine_df, test_size=0.1, random_state=42)

print(train_df.shape, test_df.shape)
train_df.head()

(160, 14) (18, 14)


Unnamed: 0,target,alcohol,malic_acid,ash,alcalinity_of_ash,magnesium,total_phenols,flavanoids,nonflavanoid_phenols,proanthocyanins,color_intensity,hue,od280/od315_of_diluted_wines,proline
9,0,13.86,1.35,2.27,16.0,98,2.98,3.15,0.22,1.85,7.22,1.01,3.55,1045
114,1,12.08,1.39,2.5,22.5,84,2.56,2.29,0.43,1.04,2.9,0.93,3.19,385
18,0,14.19,1.59,2.48,16.5,108,3.3,3.93,0.32,1.86,8.7,1.23,2.82,1680
66,1,13.11,1.01,1.7,15.0,78,2.98,3.18,0.26,2.28,5.3,1.12,3.18,502
60,1,12.33,1.1,2.28,16.0,101,2.05,1.09,0.63,0.41,3.27,1.25,1.67,680
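
The `(160, 18)` split follows from `test_size=0.1` applied to 178 rows. The sketch below reproduces the arithmetic, assuming the test count is rounded up and the training set takes the remainder, which is consistent with the shapes printed above. `split_sizes` is a hypothetical helper, not a scikit-learn function.

```python
import math


def split_sizes(n_samples, test_size):
    """Return (n_train, n_test) for a fractional test_size.

    Assumes the test count is rounded up and the training set
    receives whatever is left.
    """
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test


print(split_sizes(178, 0.1))  # (160, 18)
```

With only 18 test rows, passing `stratify=wine_df['target']` to `train_test_split` may be worth considering so that each class keeps roughly the same share in both splits.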


Write the train and test datasets to S3

In [8]:
import sagemaker

session = sagemaker.Session()
bucket = session.default_bucket()
bucket

'sagemaker-ap-southeast-2-907808569037'

In [9]:
# Write the files locally
train_df.to_csv('../data/train.csv', index=False)
test_df.to_csv('../data/test.csv', index=False)

In [10]:
# Upload the files into S3
train_path = session.upload_data(path='../data/train.csv', bucket=bucket, key_prefix='sagemaker/wine_type')
test_path = session.upload_data(path='../data/test.csv', bucket=bucket, key_prefix='sagemaker/wine_type')

print(f"Train Path: {train_path}")
print(f"Test Path: {test_path}")

Train Path: s3://sagemaker-ap-southeast-2-907808569037/sagemaker/wine_type/train.csv
Test Path: s3://sagemaker-ap-southeast-2-907808569037/sagemaker/wine_type/test.csv


In [11]:
!pip install ydata_profiling



In [12]:
!pip install --upgrade scipy



In [13]:
# Exploratory Data Analysis
from ydata_profiling import ProfileReport

In [14]:
profile = ProfileReport(train_df)
profile.to_file("profile_report.html")

In [15]:
train_df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 160 entries, 9 to 102
Data columns (total 14 columns):
 #   Column                        Non-Null Count  Dtype  
---  ------                        --------------  -----  
 0   target                        160 non-null    int64  
 1   alcohol                       160 non-null    float64
 2   malic_acid                    160 non-null    float64
 3   ash                           160 non-null    float64
 4   alcalinity_of_ash             160 non-null    float64
 5   magnesium                     160 non-null    int64  
 6   total_phenols                 160 non-null    float64
 7   flavanoids                    160 non-null    float64
 8   nonflavanoid_phenols          160 non-null    float64
 9   proanthocyanins               160 non-null    float64
 10  color_intensity               160 non-null    float64
 11  hue                           160 non-null    float64
 12  od280/od315_of_diluted_wines  160 non-null    float64
 13  proline   

In [16]:
# Split features and target
X_train = train_df.drop("target", axis=1)
y_train = train_df['target']

X_test = test_df.drop("target", axis=1)
y_test = test_df['target']

In [17]:
num_cols = X_train.select_dtypes(include=['int64', 'float64']).columns.tolist()

In [18]:
num_cols

['alcohol',
 'malic_acid',
 'ash',
 'alcalinity_of_ash',
 'magnesium',
 'total_phenols',
 'flavanoids',
 'nonflavanoid_phenols',
 'proanthocyanins',
 'color_intensity',
 'hue',
 'od280/od315_of_diluted_wines',
 'proline']

In [19]:
from sklearn.preprocessing import StandardScaler
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier

In [20]:
# Scale the continuous columns
sc = StandardScaler()

# Column transformer to apply transformation on numerical columns
ct = ColumnTransformer([
    ("Scaling", sc, num_cols)
])

# Random Forest Model
rfc = RandomForestClassifier()

# Sklearn pipeline to combine feature engineering and ML model
# (Pipeline was imported in the previous cell)
pipeline_rfc_model = Pipeline([
    ("Data Transformations", ct),
    ("Random Forest Model", rfc)
])

In [21]:
# To view the Pipeline model as a diagram
from sklearn import set_config
set_config(display="diagram")

In [22]:
# Fit the model locally
pipeline_rfc_model.fit(X_train, y_train)

Pipeline

|parameter|value|
|:--|:--|
|steps|[('Data Transformations', ...), ('Random Forest Model', ...)]|
|transform_input| |
|memory| |
|verbose|False|

ColumnTransformer

|parameter|value|
|:--|:--|
|transformers|[('Scaling', ...)]|
|remainder|'drop'|
|sparse_threshold|0.3|
|n_jobs| |
|transformer_weights| |
|verbose|False|
|verbose_feature_names_out|True|
|force_int_remainder_cols|'deprecated'|

StandardScaler

|parameter|value|
|:--|:--|
|copy|True|
|with_mean|True|
|with_std|True|

RandomForestClassifier

|parameter|value|
|:--|:--|
|n_estimators|100|
|criterion|'gini'|
|max_depth| |
|min_samples_split|2|
|min_samples_leaf|1|
|min_weight_fraction_leaf|0.0|
|max_features|'sqrt'|
|max_leaf_nodes| |
|min_impurity_decrease|0.0|
|bootstrap|True|

In [23]:
y_pred_train = pipeline_rfc_model.predict(X_train)
y_pred_test = pipeline_rfc_model.predict(X_test)

In [24]:
from sklearn.metrics import accuracy_score

# Compute accuracy on training data 
train_acc = accuracy_score(y_train, y_pred_train)
print(f"Train Accuracy: {train_acc:.4f}")

# Compute accuracy on test data
test_acc = accuracy_score(y_test, y_pred_test)
print(f"Test Accuracy: {test_acc:.4f}")

Train Accuracy: 1.0000
Test Accuracy: 1.0000
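
A perfect score on 18 test rows is encouraging, but a sample that small leaves a lot of uncertainty. The sketch below is not part of the original notebook; it assumes a Wilson score interval is a reasonable way to put rough bounds on the true accuracy.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot
    return max(0.0, center - half), min(1.0, center + half)


low, high = wilson_interval(successes=18, n=18)
print(f"95% interval for 18/18 correct: [{low:.3f}, {high:.3f}]")
```

Under these assumptions the lower bound is roughly 0.82, so the test score alone does not show that the model is perfect on unseen wines.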


Fit the model on Sagemaker

In [25]:
%%writefile train.py

import argparse
import os
import pandas as pd
import joblib
from sklearn.preprocessing import StandardScaler
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

model_file_name = "pipeline_modelA.joblib"

# Main function
def main():
    # Arguments
    parser = argparse.ArgumentParser()

    # Inbuilt Arguments
    parser.add_argument("--model_dir", type=str, default=os.environ.get("SM_MODEL_DIR"))
    # Add arguments for data directories
    # SageMaker passes these automatically if you use inputs={...} in the estimator
    parser.add_argument("--train", type=str, default=os.environ.get("SM_CHANNEL_TRAIN"))
    parser.add_argument("--test", type=str, default=os.environ.get("SM_CHANNEL_TEST"))

    # Hyperparameters to Tune
    parser.add_argument("--n_estimators", type=int, default=100)
    parser.add_argument("--min_samples_split", type=float, default=0.05)
    parser.add_argument("--criterion", type=str, default="gini")
    
    args, _ = parser.parse_known_args()

    # Load data
    # Read from local container paths, not S3
    # We join the channel path with the filename
    print(f"Reading training data from: {args.train}")
    train_df = pd.read_csv(os.path.join(args.train, "train.csv"))
    
    print(f"Reading test data from: {args.test}")
    test_df = pd.read_csv(os.path.join(args.test, "test.csv"))

    # Split features and targets
    X_train = train_df.drop("target", axis=1)
    y_train = train_df['target']

    X_test = test_df.drop("target", axis=1)
    y_test = test_df['target']
    
    # Define columns
    num_cols = X_train.select_dtypes(include=['int64', 'float64']).columns.tolist()
    
    # Scale the numerical features
    sc = StandardScaler()

    # Column transformer to apply transformation on numerical columns
    ct = ColumnTransformer([
        ("Scaling", sc, num_cols)
    ])

    # Random Forest Model
    rfc = RandomForestClassifier(n_estimators=args.n_estimators, 
                                 min_samples_split=args.min_samples_split,
                                 criterion=args.criterion)

    # Sklearn pipeline to combine feature engineering and ML model
    pipeline_rfc_model = Pipeline([
        ("Data Transformations", ct),
        ("Random Forest Model", rfc)
    ])
    
    # Fit the model inside the training container
    pipeline_rfc_model.fit(X_train, y_train)
    
    y_pred_train = pipeline_rfc_model.predict(X_train)
    y_pred_test = pipeline_rfc_model.predict(X_test)
    
    # Compute accuracy on training data 
    train_acc = accuracy_score(y_train, y_pred_train)
    print(f"Train Accuracy: {train_acc:.4f}")

    # Compute accuracy on test data
    test_acc = accuracy_score(y_test, y_pred_test)
    print(f"Test Accuracy: {test_acc:.4f}")

    # Save the model
    model_save_path = os.path.join(args.model_dir, model_file_name)
    joblib.dump(pipeline_rfc_model, model_save_path)
    print(f"Model saved at {model_save_path}")

# Run the main function when the script runs
if __name__ == "__main__":
    main()

Writing train.py
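
In script mode, the entries of the estimator's `hyperparameters` dictionary reach `train.py` as command-line flags, which is why the script declares them with `argparse`. The sketch below shows the parsing step in isolation. The argument list is a hypothetical command line, and `--some_other_flag` stands in for any flag the script does not declare.

```python
import argparse


def parse_hyperparameters(argv):
    """Parse the same three hyperparameters that train.py declares."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--n_estimators", type=int, default=100)
    parser.add_argument("--min_samples_split", type=float, default=0.05)
    parser.add_argument("--criterion", type=str, default="gini")
    # parse_known_args returns undeclared flags instead of raising an error
    args, unknown = parser.parse_known_args(argv)
    return args, unknown


# Hypothetical command line for illustration
argv = ["--n_estimators", "50", "--min_samples_split", "0.05",
        "--criterion", "gini", "--some_other_flag", "x"]
args, unknown = parse_hyperparameters(argv)
print(args.n_estimators, args.min_samples_split, args.criterion)  # 50 0.05 gini
print(unknown)  # ['--some_other_flag', 'x']
```

Using `parse_known_args` rather than `parse_args` keeps the script from failing when it receives flags it does not declare.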


In [26]:
%%writefile requirements.txt
pandas
scikit-learn
fsspec
s3fs

Writing requirements.txt


In [27]:
# Organize files
# This creates a 'code' folder and moves your files there.
# This ensures requirements.txt is found and installed correctly.
!mkdir -p code
!mv train.py code/
!mv requirements.txt code/

In [28]:
# Train!
# Choose instance_type
# Choose framework_version
import sagemaker
from sagemaker.sklearn.estimator import SKLearn
from sagemaker import get_execution_role

# Define the S3 Paths for your data
train_path = "s3://sagemaker-ap-southeast-2-907808569037/sagemaker/wine_type/train.csv"
test_path = "s3://sagemaker-ap-southeast-2-907808569037/sagemaker/wine_type/test.csv"


sklearn_estimator = SKLearn(
    base_job_name="wine-type-training-job",
    framework_version="1.2-1",
    
    # source_dir points to the folder containing BOTH script and requirements
    source_dir="code", 
    entry_point="train.py",

    # Note: We removed 'dependencies' because source_dir handles requirements.txt automatically
    
    hyperparameters={
        "n_estimators": 50,
        "min_samples_split": 0.05,
        "criterion": "gini"
    },
    instance_count=1,
    instance_type="ml.m5.large",
    use_spot_instances=True,
    max_wait=600,
    max_run=600,
    role=get_execution_role(),
)

# Launch Training with Inputs
# The keys 'train' and 'test' match the arguments in your script!
sklearn_estimator.fit({
    'train': train_path,
    'test': test_path
})

INFO:sagemaker:Creating training-job with name: wine-type-training-job-2025-11-27-18-14-57-233


2025-11-27 18:14:58 Starting - Starting the training job...
2025-11-27 18:15:13 Starting - Preparing the instances for training...
  import pkg_resources
2025-11-27 18:17:11,198 sagemaker-containers INFO     Imported framework sagemaker_sklearn_container.training
2025-11-27 18:17:11,203 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2025-11-27 18:17:11,207 sagemaker-training-toolkit INFO     No Neurons detected (normal if no neurons installed)
2025-11-27 18:17:11,225 sagemaker_sklearn_container.training INFO     Invoking user training script.
2025-11-27 18:17:11,487 sagemaker-training-toolkit INFO     Installing dependencies from requirements.txt
Collecting fsspec (from -r requirements.txt (line 3))
  Downloading fsspec-2025.10.0-py3-none-any.whl.metadata (10 kB)
Collecting s3fs (from -r requirements.txt (line 4))
  Downloading s3fs-2025.10.0-py3-none-any.whl.metadata (1.4 kB)
Co

Check the training job name

In [29]:
import boto3
sm_client = boto3.client("sagemaker")

training_job_name = sklearn_estimator.latest_training_job.name

# Location of the model stored in S3
model_artifact = sm_client.describe_training_job(
    TrainingJobName=training_job_name
)["ModelArtifacts"]["S3ModelArtifacts"]

print(f"Training job name: {training_job_name}")
print(f"Model storage location: {model_artifact}")

Training job name: wine-type-training-job-2025-11-27-18-14-57-233
Model storage location: s3://sagemaker-ap-southeast-2-907808569037/wine-type-training-job-2025-11-27-18-14-57-233/output/model.tar.gz


Hyperparameter Tuning

In [30]:
import sagemaker
from sagemaker.tuner import HyperparameterTuner, IntegerParameter, ContinuousParameter, CategoricalParameter

# Define the ranges to search
hyperparameter_ranges = {
    "n_estimators": IntegerParameter(1, 20),
    "min_samples_split": ContinuousParameter(0.01, 0.5),
    "criterion": CategoricalParameter(["gini", "entropy"])
}

# Define the Metric to Optimize
# This regex matches the print statement in train.py: "Test Accuracy: 0.9557"
objective_metric_name = 'test-accuracy'
metric_definitions = [{'Name': 'test-accuracy', 'Regex': 'Test Accuracy: ([0-9\\.]+)'}]

# Create the Tuner
tuner = HyperparameterTuner(
    estimator=sklearn_estimator,
    objective_metric_name=objective_metric_name,
    hyperparameter_ranges=hyperparameter_ranges,
    metric_definitions=metric_definitions,
    max_jobs=10,           # Total number of training jobs to run (Budget)
    max_parallel_jobs=2,   # How many to run at the same time
    objective_type='Maximize'
)

# Launch the Tuning Job
# We pass the same data inputs as before
tuner.fit({
    'train': "s3://sagemaker-ap-southeast-2-907808569037/sagemaker/wine_type/train.csv",
    'test': "s3://sagemaker-ap-southeast-2-907808569037/sagemaker/wine_type/test.csv"
})

INFO:sagemaker:Creating hyperparameter tuning job with name: sagemaker-scikit-lea-251127-1818


....................................................................................................................................................................................!
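
The tuner reads its objective value from the training logs by applying the `Regex` in `metric_definitions` to each log line. The pattern can be checked locally before launching a tuning job. The sketch below uses the same pattern with Python's `re` module; `extract_metric` is an illustrative helper, and the log lines are examples in the format that `train.py` prints.

```python
import re

# Same pattern as in metric_definitions
METRIC_REGEX = r"Test Accuracy: ([0-9\.]+)"


def extract_metric(log_line):
    """Return the captured metric as a float, or None if the line does not match."""
    match = re.search(METRIC_REGEX, log_line)
    return float(match.group(1)) if match else None


print(extract_metric("Test Accuracy: 0.9444"))   # 0.9444
print(extract_metric("Train Accuracy: 1.0000"))  # None
```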


In [31]:
# Analyze tuning results
results = tuner.analytics().dataframe()
results.sort_values("FinalObjectiveValue", ascending=False).head()

Unnamed: 0,criterion,min_samples_split,n_estimators,TrainingJobName,TrainingJobStatus,FinalObjectiveValue,TrainingStartTime,TrainingEndTime,TrainingElapsedTimeSeconds
1,"""entropy""",0.123598,18.0,sagemaker-scikit-lea-251127-1818-009-35803546,Completed,1.0,2025-11-27 18:31:50+00:00,2025-11-27 18:34:04+00:00,134.0
2,"""entropy""",0.010036,8.0,sagemaker-scikit-lea-251127-1818-008-cf4cf203,Completed,1.0,2025-11-27 18:28:47+00:00,2025-11-27 18:30:51+00:00,124.0
6,"""entropy""",0.070847,7.0,sagemaker-scikit-lea-251127-1818-004-406a0554,Completed,1.0,2025-11-27 18:22:38+00:00,2025-11-27 18:24:52+00:00,134.0
3,"""entropy""",0.01,7.0,sagemaker-scikit-lea-251127-1818-007-e162290a,Completed,1.0,2025-11-27 18:28:41+00:00,2025-11-27 18:30:47+00:00,126.0
0,"""entropy""",0.043749,12.0,sagemaker-scikit-lea-251127-1818-010-0b0e961c,Completed,0.9444,2025-11-27 18:31:49+00:00,2025-11-27 18:33:58+00:00,129.0


In [32]:
best_job_name = tuner.best_training_job()
print(f"The best performing job was: {best_job_name}")

The best performing job was: sagemaker-scikit-lea-251127-1818-004-406a0554


Create the inference script (serve.py)

In [33]:
%%writefile serve.py

import os
import joblib
import pandas as pd

def model_fn(model_dir):
    """Load and return the model"""
    model_file_name = "pipeline_modelA.joblib"
    pipeline_model = joblib.load(os.path.join(model_dir, model_file_name))
    
    return pipeline_model

def input_fn(request_body, request_content_type):
    """Process the input json data and return the processed data.
    You can also add any input data pre-processing in this function
    """
    if request_content_type == "application/json":
        input_object = pd.read_json(request_body, lines=True)
        
        return input_object
    else:
        raise ValueError("Only application/json content type supported!")

def predict_fn(input_object, pipeline_model):
    """Make predictions on processed input data"""
    predictions = pipeline_model.predict(input_object)
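    pred_probs = pipeline_model.predict_proba(input_object)
    
    # Note: the model has three classes, so pred_probs has three columns.
    # Only the first two are returned here; the class 2 probability is
    # 1 minus the sum of the other two.
    prediction_object = pd.DataFrame(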
        {
            "prediction": predictions.tolist(),
            "pred_prob_class0": pred_probs[:, 0].tolist(),
            "pred_prob_class1": pred_probs[:, 1].tolist()
        }
    )
    
    return prediction_object

def output_fn(prediction_object, request_content_type):
    """Post process the predictions and return as json"""
    return_object = prediction_object.to_json(orient="records", lines=True)
    
    return return_object

Overwriting serve.py
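
`predict_fn` above reports probabilities for classes 0 and 1 only, although the model distinguishes three classes. One way to emit a column per class, whatever the class count, is sketched below in plain Python. `build_prediction_records` is a hypothetical helper, and the probabilities are made-up values for illustration, not model output.

```python
def build_prediction_records(predictions, pred_probs):
    """Return one dict per row: the predicted label plus a probability per class."""
    records = []
    for label, probs in zip(predictions, pred_probs):
        record = {"prediction": int(label)}
        for class_idx, prob in enumerate(probs):
            record[f"pred_prob_class{class_idx}"] = float(prob)
        records.append(record)
    return records


# Made-up probabilities for illustration only
example = build_prediction_records([0, 2], [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]])
print(example[0])
```

Inside `predict_fn`, the same loop could run over `predictions` and the rows of `pred_probs`, with the resulting list passed to `pd.DataFrame`.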


In [34]:
%%writefile requirements.txt
pandas
numpy
scikit-learn
joblib

Writing requirements.txt


Serverless Inference Endpoint

In [35]:
# Create the deployment
from sagemaker.sklearn.model import SKLearnModel
from sagemaker import Session, get_execution_role

session = Session()
bucket = session.default_bucket()

training_job_name = "sagemaker-scikit-lea-251127-1818-004-406a0554"
model_artifact = f"s3://{bucket}/{training_job_name}/output/model.tar.gz"
endpoint_name = "wine-type-prediction-pipeline-real-time"

model = SKLearnModel(
    name=endpoint_name,
    framework_version='1.2-1',
    entry_point='serve.py',
    source_dir='.',
    model_data=model_artifact,
    role=get_execution_role() 
)

In [36]:
# Create a config for serverless inference
from sagemaker.serverless import ServerlessInferenceConfig
serverless_config = ServerlessInferenceConfig(memory_size_in_mb=1024, max_concurrency=4)

In [37]:
# Deploy the model
predictor = model.deploy(serverless_inference_config=serverless_config)

INFO:sagemaker.image_uris:Defaulting to only supported image scope: cpu.
INFO:sagemaker:Creating model with name: wine-type-prediction-pipeline-real-time
INFO:sagemaker:Creating endpoint-config with name wine-type-prediction-pipeline-real-time-2025-11-27-18-38-56-640
INFO:sagemaker:Creating endpoint with name wine-type-prediction-pipeline-real-time-2025-11-27-18-38-56-640


----!

In [38]:
endpoint_name = predictor.endpoint_name
print("Endpoint_Name:")
print(f"{endpoint_name}")

Endpoint_Name:
wine-type-prediction-pipeline-real-time-2025-11-27-18-38-56-640


Invoke the model

In [39]:
# Load some data that we want to make predictions on
import pandas as pd
import json

test_df = pd.read_csv("s3://sagemaker-ap-southeast-2-907808569037/sagemaker/wine_type/test.csv")

X_test = test_df.drop("target", axis=1)
y_test = test_df["target"]

# Get 2 rows to make prediction on
x_pred = X_test.head(2).to_json(orient='records', lines=True)
x_pred

'{"alcohol":13.64,"malic_acid":3.1,"ash":2.56,"alcalinity_of_ash":15.2,"magnesium":116,"total_phenols":2.7,"flavanoids":3.03,"nonflavanoid_phenols":0.17,"proanthocyanins":1.66,"color_intensity":5.1,"hue":0.96,"od280\\/od315_of_diluted_wines":3.36,"proline":845}\n{"alcohol":14.21,"malic_acid":4.04,"ash":2.44,"alcalinity_of_ash":18.9,"magnesium":111,"total_phenols":2.85,"flavanoids":2.65,"nonflavanoid_phenols":0.3,"proanthocyanins":1.25,"color_intensity":5.24,"hue":0.87,"od280\\/od315_of_diluted_wines":3.33,"proline":1080}\n'
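
The payload above is in JSON Lines format: one JSON object per row, each ending in a newline. This is what `pd.read_json(..., lines=True)` in `serve.py` expects. The sketch below builds and parses the same format with only the standard library. The helper names are illustrative, and the rows keep just two of the thirteen features for brevity.

```python
import json


def to_json_lines(rows):
    """Serialize a list of dicts as JSON Lines (one object per line)."""
    return "".join(json.dumps(row) + "\n" for row in rows)


def from_json_lines(payload):
    """Parse JSON Lines back into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in payload.splitlines() if line.strip()]


# Two features only, taken from the first two test rows shown above
rows = [{"alcohol": 13.64, "proline": 845}, {"alcohol": 14.21, "proline": 1080}]
payload = to_json_lines(rows)
print(payload.count("\n"))  # 2
print(from_json_lines(payload) == rows)  # True
```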

In [40]:
# Submit to the Serverless endpoint
import boto3
import json

sm_runtime = boto3.client("sagemaker-runtime")

response = sm_runtime.invoke_endpoint(EndpointName=endpoint_name,
                                      Body=x_pred,
                                      ContentType="application/json",
                                      Accept="application/json") 

In [41]:
print(response)

{'ResponseMetadata': {'RequestId': '3fa32f86-a8c2-4499-a45a-97f95f8d5d69', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': '3fa32f86-a8c2-4499-a45a-97f95f8d5d69', 'x-amzn-invoked-production-variant': 'AllTraffic', 'date': 'Thu, 27 Nov 2025 18:43:04 GMT', 'content-type': 'text/html; charset=utf-8', 'content-length': '160', 'connection': 'keep-alive'}, 'RetryAttempts': 0}, 'ContentType': 'text/html; charset=utf-8', 'InvokedProductionVariant': 'AllTraffic', 'Body': <botocore.response.StreamingBody object at 0x7f7675586110>}


In [42]:
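# Decode the response from the endpoint
from io import StringIO

response_body = response["Body"]
response_str = response_body.read().decode('utf-8')
# Wrap the string in StringIO: newer pandas versions warn when
# read_json is given a literal JSON string
response_df = pd.read_json(StringIO(response_str), lines=True)

print(response_df)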

   prediction  pred_prob_class0  pred_prob_class1
0           0          0.737365          0.203812
1           0          0.979592          0.020408




In [43]:
import boto3

def cleanup(endpoint_name):
    sm_client = boto3.client("sagemaker")
    sm_client.delete_endpoint(EndpointName=endpoint_name)

In [44]:
cleanup(endpoint_name)
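
`cleanup` removes the endpoint only; the endpoint configuration and the model created by `deploy` remain in the account. The sketch below removes all three in one pass, assuming the caller supplies a `boto3` SageMaker client. `cleanup_all` is a hypothetical helper, and it has to run while the endpoint still exists because it looks up the related resource names first.

```python
def cleanup_all(sm_client, endpoint_name):
    """Delete an endpoint plus the endpoint config and model(s) behind it."""
    # Look up the related resource names before deleting anything
    endpoint = sm_client.describe_endpoint(EndpointName=endpoint_name)
    config_name = endpoint["EndpointConfigName"]
    config = sm_client.describe_endpoint_config(EndpointConfigName=config_name)
    model_names = [v["ModelName"] for v in config["ProductionVariants"]]

    sm_client.delete_endpoint(EndpointName=endpoint_name)
    sm_client.delete_endpoint_config(EndpointConfigName=config_name)
    for name in model_names:
        sm_client.delete_model(ModelName=name)
    return config_name, model_names


# Usage (assumes AWS credentials are configured):
#   import boto3
#   cleanup_all(boto3.client("sagemaker"), endpoint_name)
```

Taking the client as an argument keeps the function easy to exercise with a stub object before pointing it at a real account.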