## Deploy a Logistic Regression Model Using SageMaker
Boto3, the AWS SDK for Python, provides a Python API for AWS infrastructure services. With it you can build applications on top of Amazon S3, Amazon EC2, Amazon DynamoDB, and more. This notebook uses it to create a low-level SageMaker client alongside the higher-level SageMaker Python SDK.
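As a small illustration of the Boto3 API, the hypothetical helper below lists the object keys stored under an S3 prefix, which is handy for confirming that uploads landed where you expect. It uses the `list_objects_v2` paginator because a single ListObjectsV2 call returns at most 1,000 keys. The function name `list_s3_keys` is an assumption for this sketch, not part of Boto3.

```python
def list_s3_keys(bucket, prefix, s3_client=None):
    """Return every object key stored under `prefix` in `bucket`."""
    if s3_client is None:
        import boto3  # imported lazily so the helper can be exercised with a stub client
        s3_client = boto3.client("s3")
    paginator = s3_client.get_paginator("list_objects_v2")
    keys = []
    # Walk every page, since one ListObjectsV2 call returns at most 1,000 keys
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # Pages without matching objects have no "Contents" entry
        for obj in page.get("Contents", []):
            keys.append(obj["Key"])
    return keys
```

For example, `list_s3_keys('aws-sagemaker-fraud-detection-s3', 'sagemaker/fraud_detection/data')` should list the uploaded CSV files once they are in S3.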

In [1]:
import sagemaker
from sklearn.model_selection import train_test_split
import pandas as pd
import boto3

sagemaker.config INFO - Not applying SDK defaults from location: /Library/Application Support/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /Users/arifmoazy/Library/Application Support/sagemaker/config.yaml


In [2]:
sm_boto3 = boto3.client('sagemaker')
sess = sagemaker.Session()
region = sess.boto_session.region_name

# Name of an existing S3 bucket that holds the data and model artifacts
bucket = 'aws-sagemaker-fraud-detection-s3'


In [7]:
# Upload the train and test CSV files to S3
dataset_path_in_s3 = 'sagemaker/fraud_detection/data'

# upload_data returns the s3:// URI of the uploaded file
train_data_path = sess.upload_data(
    path='../data/train.csv', bucket=bucket, key_prefix=dataset_path_in_s3
)

test_data_path = sess.upload_data(
    path='../data/test.csv', bucket=bucket, key_prefix=dataset_path_in_s3
)
print(train_data_path)
print(test_data_path)

s3://aws-sagemaker-fraud-detection-s3/sagemaker/fraud_detection/data/train.csv
s3://aws-sagemaker-fraud-detection-s3/sagemaker/fraud_detection/data/test.csv


### Note: pandas needs s3fs to read from S3
Reading train_data_path with pandas in the next cell can fail with ModuleNotFoundError: No module named 's3fs'. pandas hands s3:// paths in read_csv off to the s3fs package, so s3fs must be installed in the notebook's Python environment.
Solution:
pip install s3fs
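If installing s3fs is not an option, a minimal alternative is to download the object with Boto3 and hand the bytes to pandas. The helper name `read_csv_from_s3` is hypothetical; this sketch loads the whole file into memory, so it suits files of modest size.

```python
import io
from urllib.parse import urlparse

import pandas as pd


def read_csv_from_s3(s3_uri, s3_client=None, **read_csv_kwargs):
    """Read a CSV from an s3:// URI using boto3 instead of s3fs."""
    parsed = urlparse(s3_uri)
    if parsed.scheme != "s3":
        raise ValueError(f"Expected an s3:// URI, got {s3_uri!r}")
    bucket, key = parsed.netloc, parsed.path.lstrip("/")
    if s3_client is None:
        import boto3  # imported lazily so the helper can be tested with a stub client
        s3_client = boto3.client("s3")
    # get_object returns the file contents as a streaming body
    body = s3_client.get_object(Bucket=bucket, Key=key)["Body"].read()
    return pd.read_csv(io.BytesIO(body), **read_csv_kwargs)
```

Usage would be `df_train = read_csv_from_s3(train_data_path)`.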

In [12]:
from sagemaker import get_execution_role
from sklearn.preprocessing import StandardScaler
import pandas as pd

# Read the train and test CSVs directly from S3 (requires s3fs)
df_train = pd.read_csv(train_data_path)
df_test = pd.read_csv(test_data_path)


X_train, y_train = df_train.drop(columns=['is_fraud']), df_train['is_fraud']
X_test, y_test = df_test.drop(columns=['is_fraud']), df_test['is_fraud']


### Create a Training Script for Logistic Regression
SageMaker's Script Mode lets you provide your own training script, which SageMaker runs inside a prebuilt scikit-learn container. Here the script is saved in the src directory as sagemaker_logistic_regression.py.
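The contents of sagemaker_logistic_regression.py are not shown in this notebook, so the following is only a hypothetical sketch of what a Script Mode training script for this data can look like. It relies on documented SageMaker conventions: the `train` channel is downloaded to the directory in `SM_CHANNEL_TRAIN`, whatever is saved to `SM_MODEL_DIR` is packaged as model.tar.gz in S3, hyperparameters arrive as `--name value` arguments, and the scikit-learn serving container calls `model_fn(model_dir)` to load the model. Dropping `Unnamed: N` columns is an assumption about the CSV layout; whatever columns the script drops must also be dropped from X_test before calling the endpoint.

```python
import argparse
import glob
import os

import joblib
import pandas as pd
from sklearn.linear_model import LogisticRegression

TARGET = "is_fraud"  # label column used elsewhere in this notebook


def load_training_frame(train_dir):
    """Read and concatenate every CSV file in the training channel directory."""
    files = sorted(glob.glob(os.path.join(train_dir, "*.csv")))
    if not files:
        raise FileNotFoundError(f"No CSV files found in {train_dir}")
    df = pd.concat((pd.read_csv(f) for f in files), ignore_index=True)
    # Assumption: columns named "Unnamed: N" are saved row indices, not features
    return df.loc[:, ~df.columns.str.startswith("Unnamed:")]


def train(train_dir, model_dir, C=1.0, max_iter=1000):
    """Fit a logistic regression and save it where SageMaker collects model files."""
    df = load_training_frame(train_dir)
    X, y = df.drop(columns=[TARGET]), df[TARGET]
    model = LogisticRegression(C=C, max_iter=max_iter)
    model.fit(X, y)
    os.makedirs(model_dir, exist_ok=True)
    # Everything written to model_dir ends up in model.tar.gz on S3
    joblib.dump(model, os.path.join(model_dir, "model.joblib"))
    return model


def model_fn(model_dir):
    """Called by the SageMaker scikit-learn serving container to load the model."""
    return joblib.load(os.path.join(model_dir, "model.joblib"))


def main():
    parser = argparse.ArgumentParser()
    # Estimator hyperparameters arrive as --name value command-line arguments
    parser.add_argument("--C", type=float, default=1.0)
    parser.add_argument("--max_iter", type=int, default=1000)
    # SageMaker exposes the channel and model directories through SM_* variables
    parser.add_argument("--train", default=os.environ.get("SM_CHANNEL_TRAIN"))
    parser.add_argument("--model-dir", default=os.environ.get("SM_MODEL_DIR", "/opt/ml/model"))
    args = parser.parse_args()
    train(args.train, args.model_dir, C=args.C, max_iter=args.max_iter)


if __name__ == "__main__":
    # SM_CHANNEL_TRAIN is set inside a SageMaker training job; skip training elsewhere
    if "SM_CHANNEL_TRAIN" in os.environ:
        main()
    else:
        print("SM_CHANNEL_TRAIN is not set; call train(train_dir, model_dir) to run locally.")
```

Calling `train()` on a local copy of train.csv is a cheap way to catch script errors before paying for a training job.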

### Create a SageMaker Estimator to Train the Model
Next, use the SKLearn estimator to launch a training job on SageMaker.

In [26]:
# Get the execution role
# get_execution_role() usually only works inside SageMaker (Studio or a notebook instance),
# where the notebook runs under an IAM role; with local IAM user credentials it raises an error
# role = get_execution_role()
# If get_execution_role() doesn't work in your environment, find the role manually:
# 1. Go to the IAM section of the AWS Console.
# 2. Click Roles in the left sidebar.
# 3. Find or search for the role that has the required SageMaker permissions.
# 4. Copy that role's ARN.
# Example role ARN (AWS account ID: 145023122385):
# role = 'arn:aws:iam::145023122385:role/aws-sagemaker-practitioner'
role = 'arn:aws:iam::145023122385:role/aws-sagemaker-practitioner'
# Make sure this role has the necessary permissions (SageMaker, S3, etc.) and that its
# trust policy allows sagemaker.amazonaws.com to assume it.
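Instead of pasting the ARN by hand, a minimal sketch can look it up by role name with the IAM GetRole API, which avoids hard-coding the account ID. The helper name `role_arn_from_name` is hypothetical; `aws-sagemaker-practitioner` is the role name from the cell above, and your credentials need `iam:GetRole` permission for the real call.

```python
def role_arn_from_name(role_name, iam_client=None):
    """Return the ARN of an IAM role given its name."""
    if iam_client is None:
        import boto3  # imported lazily so the helper can be tested with a stub client
        iam_client = boto3.client("iam")
    # GetRole returns the role's metadata, including its full ARN (with any path)
    return iam_client.get_role(RoleName=role_name)["Role"]["Arn"]
```

Usage: `role = role_arn_from_name('aws-sagemaker-practitioner')`.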

In [30]:
pd.read_csv(train_data_path).head()

| Unnamed: 0 | cc_num | merchant | category | amt | gender | zip | lat | long | city_pop | job | unix_time | merch_lat | merch_long | is_fraud |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 6011382886333463 | 644 | 0 | -0.326279 | 1 | -0.007694 | 1.19669 | 0.432759 | -0.290814 | 201 | 1378557464 | 43.72775 | -85.046376 | 0 |
| 1 | 370877495212014 | 87 | 4 | 0.785585 | 0 | -0.751571 | -0.462636 | 0.663378 | -0.223335 | 126 | 1384774507 | 36.851523 | -80.202303 | 0 |
| 2 | 3566373869538620 | 103 | 2 | -0.162638 | 1 | 1.440961 | -0.614888 | -1.144865 | -0.290618 | 366 | 1380454669 | 34.765582 | -106.874102 | 0 |
| 3 | 6517217825320610 | 475 | 8 | -0.434735 | 1 | -0.341744 | -1.362873 | 0.003733 | -0.289709 | 258 | 1380431862 | 30.818746 | -90.609324 | 0 |
| 4 | 213125815021702 | 75 | 13 | -0.378975 | 1 | -1.356629 | 0.697593 | 1.149876 | -0.292369 | 356 | 1388422195 | 42.311107 | -74.938746 | 0 |
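The first column, Unnamed: 0, counts 0, 1, 2, ... and looks like the row index written out when the CSV was saved rather than a real feature. Assuming that is the case, a small sketch that drops such columns before splitting features from the label is shown below; the helper name `split_features_target` is hypothetical. Reading with `pd.read_csv(path, index_col=0)` is an equivalent option.

```python
import pandas as pd


def split_features_target(df, target="is_fraud"):
    """Drop saved-index columns ("Unnamed: N") and split features from the label."""
    # Assumption: any column pandas named "Unnamed: N" is a leftover row index
    features = df.loc[:, ~df.columns.str.startswith("Unnamed:")]
    return features.drop(columns=[target]), features[target]
```

Usage: `X_test, y_test = split_features_target(df_test)`, so the features sent to the endpoint match the columns the model was trained on.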


In [28]:
from sagemaker.sklearn.estimator import SKLearn

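# Define the SKLearn estimator
sklearn_estimator = SKLearn(
    entry_point="sagemaker_logistic_regression.py",  # path to the training script
    role=role,
    instance_type="ml.m5.large",
    instance_count=1,
    framework_version="0.23-1",  # scikit-learn version in the prebuilt container
    py_version="py3",
    # Optional: custom hyperparameters are passed to the script as command-line
    # arguments (e.g. --C 1.0), so the script must parse them itself
    # hyperparameters={
    #     'C': 1.0,
    #     'max_iter': 1000
    # },
    output_path=f"s3://{bucket}/sagemaker/fraud_detection/output",
    use_spot_instances=True,  # train on cheaper, interruptible Spot capacity
    max_wait=7200,  # seconds to wait for Spot capacity plus training; must be >= max_run
    max_run=3600    # maximum training time in seconds
)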

# Start the training job
sklearn_estimator.fit({'train': train_data_path}, wait=True)


INFO:sagemaker:Creating training-job with name: sagemaker-scikit-learn-2024-10-23-22-11-29-228


2024-10-23 22:08:19 Starting - Starting the training job...
2024-10-23 22:08:41 Starting - Preparing the instances for training...
2024-10-23 22:09:04 Downloading - Downloading input data...
2024-10-23 22:09:30 Downloading - Downloading the training image..
2024-10-23 22:10:09,322 sagemaker-containers INFO     Imported framework sagemaker_sklearn_container.training
2024-10-23 22:10:09,325 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2024-10-23 22:10:09,360 sagemaker_sklearn_container.training INFO     Invoking user training script.
2024-10-23 22:10:09,512 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2024-10-23 22:10:09,522 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2024-10-23 22:10:09,533 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2024-10-23 22:10:09,541 sagemaker-training-toolkit INFO     Invoking user script
Training Env:
{
    "ad

UnexpectedStatusException: Error for Training job sagemaker-scikit-learn-2024-10-23-22-11-29-228: Failed. Reason: AlgorithmError: framework error: 
Traceback (most recent call last):
  File "/miniconda3/lib/python3.7/site-packages/sagemaker_containers/_trainer.py", line 84, in train
    entrypoint()
  File "/miniconda3/lib/python3.7/site-packages/sagemaker_sklearn_container/training.py", line 39, in main
    train(environment.Environment())
  File "/miniconda3/lib/python3.7/site-packages/sagemaker_sklearn_container/training.py", line 35, in train
    runner_type=runner.ProcessRunnerType)
  File "/miniconda3/lib/python3.7/site-packages/sagemaker_training/entry_point.py", line 100, in run
    wait, capture_error
  File "/miniconda3/lib/python3.7/site-packages/sagemaker_training/process.py", line 291, in run
    cwd=environment.code_dir,
  File "/miniconda3/lib/python3.7/site-packages/sagemaker_training/process.py", line 208, in check_error
    info=extra_info,
sagemaker_training.errors.ExecuteUserScriptError: ExecuteUserScriptError:
ExitCode 1
ErrorMessage ""
Command "/miniconda3/bin/python sagemaker_logistic_regression.py. Check troubleshooting guide for common errors: https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-python-sdk-troubleshooting.html
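The traceback only says that the script exited with code 1, and ErrorMessage is empty. The script's own output, including the Python exception that caused the exit, is written to Amazon CloudWatch Logs in the /aws/sagemaker/TrainingJobs log group, in streams whose names start with the training job name. The sketch below prints the job's FailureReason and the last log lines; the helper names are hypothetical, and it assumes a boto3 `sagemaker` client (such as `sm_boto3`) and a boto3 `logs` client in the same region.

```python
LOG_GROUP = "/aws/sagemaker/TrainingJobs"


def training_job_log_lines(job_name, logs_client):
    """Yield every CloudWatch log message written by a SageMaker training job."""
    # One stream per training instance; a single-instance job has one stream
    streams = logs_client.describe_log_streams(
        logGroupName=LOG_GROUP, logStreamNamePrefix=job_name
    )["logStreams"]
    for stream in streams:
        token = None
        while True:
            kwargs = {"logGroupName": LOG_GROUP,
                      "logStreamName": stream["logStreamName"],
                      "startFromHead": True}
            if token:
                kwargs["nextToken"] = token
            response = logs_client.get_log_events(**kwargs)
            for event in response["events"]:
                yield event["message"]
            # At the end of a stream, GetLogEvents returns the token that was passed in
            if response["nextForwardToken"] == token:
                break
            token = response["nextForwardToken"]


def explain_failure(job_name, sm_client, logs_client, tail=30):
    """Print the job status, its FailureReason, and the last `tail` log lines."""
    desc = sm_client.describe_training_job(TrainingJobName=job_name)
    print("Status:", desc["TrainingJobStatus"])
    print("FailureReason:", desc.get("FailureReason", "n/a"))
    lines = list(training_job_log_lines(job_name, logs_client))
    for line in lines[-tail:]:
        print(line)
    return lines
```

For this run, something like `explain_failure(sklearn_estimator.latest_training_job.name, sm_boto3, boto3.client('logs'))` should show the exception raised inside sagemaker_logistic_regression.py.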

### Deploy the Model
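Once the training job completes successfully, you can deploy the model to a real-time endpoint. Because the training job above failed, this cell will fail until the script error is fixed and training is rerun. For the scikit-learn serving container to load the model, the entry-point script must also define a model_fn(model_dir) function.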
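A quick status check before deploying surfaces the training failure reason up front instead of partway through endpoint creation. This is a minimal sketch with a hypothetical helper name; it assumes a boto3 `sagemaker` client such as `sm_boto3` and returns the S3 location of the model artifact that would be served.

```python
def ensure_training_completed(job_name, sm_client):
    """Raise if the training job did not complete; otherwise return its model artifact URI."""
    desc = sm_client.describe_training_job(TrainingJobName=job_name)
    status = desc["TrainingJobStatus"]  # InProgress, Completed, Failed, Stopping or Stopped
    if status != "Completed":
        reason = desc.get("FailureReason", "no failure reason reported")
        raise RuntimeError(f"Training job {job_name} is {status}: {reason}")
    # S3 URI of the model.tar.gz produced by the training script
    return desc["ModelArtifacts"]["S3ModelArtifacts"]
```

Usage: `ensure_training_completed(sklearn_estimator.latest_training_job.name, sm_boto3)` before calling `deploy()`.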

In [None]:
# Deploy the model to an endpoint
predictor = sklearn_estimator.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.large",
    endpoint_name="fraud-detection-endpoint"
)


### Make Predictions using the Deployed Model
After deploying the model, you can use the endpoint to make predictions on new data.

In [None]:
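import numpy as np

# The SKLearn predictor serializes the NumPy array for the request and deserializes
# the response back into a NumPy array. Real-time requests are capped at 6 MB,
# so a large X_test may need to be sent in batches.
predictions = predictor.predict(X_test.values)
print(predictions)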
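Because InvokeEndpoint requests to a real-time endpoint are limited to a 6 MB payload, sending all of X_test.values in one call may fail for a large test set. The sketch below splits the rows into batches and then compares predictions with the labels using precision and recall, which say more than accuracy when fraud cases are rare. The helper names and `batch_size=500` are assumptions, not SageMaker settings.

```python
import numpy as np


def predict_in_batches(predict_fn, X, batch_size=500):
    """Call predict_fn on consecutive row batches and concatenate the results."""
    X = np.asarray(X)
    outputs = [np.asarray(predict_fn(X[start:start + batch_size]))
               for start in range(0, len(X), batch_size)]
    return np.concatenate(outputs) if outputs else np.empty(0)


def binary_metrics(y_true, y_pred):
    """Confusion-matrix counts plus precision and recall for 0/1 labels."""
    y_true = np.asarray(y_true).astype(int)
    y_pred = np.asarray(y_pred).astype(int)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "precision": precision, "recall": recall}
```

Usage: `preds = predict_in_batches(predictor.predict, X_test.values)` followed by `binary_metrics(y_test, preds)`.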


### Clean up Resources
When you are finished, delete the endpoint: a real-time endpoint is billed for as long as it is running, whether or not it receives requests.

In [None]:
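# Delete the endpoint (and, by default, its endpoint configuration)
predictor.delete_endpoint()
# delete_endpoint() does not remove the SageMaker model object created by deploy()
predictor.delete_model()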
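If the kernel has restarted and the `predictor` object is gone, the same cleanup can be done with the low-level boto3 SageMaker client: look up the endpoint's configuration, find the models it serves, then delete all three resources. This is a sketch with a hypothetical helper name, assuming a client such as `sm_boto3` and the endpoint name `fraud-detection-endpoint` used above.

```python
def delete_endpoint_and_resources(endpoint_name, sm_client):
    """Delete an endpoint, its endpoint configuration, and the models it serves."""
    # Resolve the endpoint configuration and model names before deleting anything
    config_name = sm_client.describe_endpoint(EndpointName=endpoint_name)["EndpointConfigName"]
    config = sm_client.describe_endpoint_config(EndpointConfigName=config_name)
    model_names = [variant["ModelName"] for variant in config["ProductionVariants"]]
    sm_client.delete_endpoint(EndpointName=endpoint_name)
    sm_client.delete_endpoint_config(EndpointConfigName=config_name)
    for model_name in model_names:
        sm_client.delete_model(ModelName=model_name)
    return config_name, model_names
```

Usage: `delete_endpoint_and_resources("fraud-detection-endpoint", sm_boto3)`.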