## Project Description

Imagine you are developing a deep learning system for sentiment analysis of product reviews at a newly established online beauty product retailer. The goal is to help the company make informed inventory decisions: which products to keep and which to remove from stock. Keen to improve customer satisfaction, the company has been monitoring comments on its website and has paid annotators to label their sentiment. You receive a dataset of 80,000 customer reviews, each labeled 0 for negative sentiment or 1 for positive sentiment. After extensive effort and refinement, you train and deploy a classifier that predicts sentiment from online comments, and you excitedly report 86% accuracy on a held-out test set to your bosses. To your disappointment, management is dissatisfied and insists on at least 90% accuracy before rolling out the model more widely.
You suspect that some annotators made labeling errors that are limiting your model's performance. Empowered by a newfound "confidence," you turn to confident learning to pinpoint and correct label errors in the dataset before retraining.
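
To make the idea of finding label errors more concrete, here is a minimal pure-Python sketch of the thresholding rule behind confident learning. It is a simplified illustration, not the implementation used in this project or in any library: the function names and the toy probabilities are invented for the example, and in practice the predicted probabilities would typically come from out-of-sample (for instance, cross-validated) predictions.

```python
from typing import List, Sequence


def class_thresholds(labels: Sequence[int],
                     pred_probs: Sequence[Sequence[float]]) -> List[float]:
    """Per-class threshold: the mean predicted probability of class k
    over the examples whose given label is k (assumes every class occurs)."""
    n_classes = len(pred_probs[0])
    thresholds = []
    for k in range(n_classes):
        probs_k = [p[k] for y, p in zip(labels, pred_probs) if y == k]
        thresholds.append(sum(probs_k) / len(probs_k))
    return thresholds


def find_label_issues(labels: Sequence[int],
                      pred_probs: Sequence[Sequence[float]]) -> List[int]:
    """Return indices of examples whose confidently predicted class
    differs from the given label."""
    thresholds = class_thresholds(labels, pred_probs)
    issues = []
    for i, (y, p) in enumerate(zip(labels, pred_probs)):
        # Classes the model is "confident" about for this example
        confident = [k for k, pk in enumerate(p) if pk >= thresholds[k]]
        if confident:
            likely = max(confident, key=lambda k: p[k])
            if likely != y:
                issues.append(i)
    return issues


# Toy data (hypothetical): 0 = negative, 1 = positive
labels = [0, 0, 0, 1, 1, 1]
pred_probs = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.1, 0.9],  # labeled negative, but the model is confident it is positive
    [0.2, 0.8],
    [0.1, 0.9],
    [0.3, 0.7],
]
issues = find_label_issues(labels, pred_probs)
print(issues)  # [2]
```

Flagged examples are candidates for review or relabeling rather than confirmed errors.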

First, we prepare the environment for AWS SageMaker operations by setting up clients and retrieving essential configuration details like the default S3 bucket, execution role, and AWS region. 

In [1]:
import sagemaker
import logging
import boto3
import pandas as pd
import json
import botocore
from botocore.exceptions import ClientError

config = botocore.config.Config(user_agent_extra='dlai-pds/c2/w3')

# low-level service client of the boto3 session
sm = boto3.client(service_name='sagemaker', 
                  config=config)

sm_runtime = boto3.client('sagemaker-runtime',
                          config=config)

sess = sagemaker.Session(sagemaker_client=sm,
                         sagemaker_runtime_client=sm_runtime)

bucket = sess.default_bucket()
role = sagemaker.get_execution_role()
region = sess.boto_region_name

sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /home/sagemaker-user/.config/sagemaker/config.yaml


In [2]:
import os

In [3]:
bucket

'sagemaker-us-east-1-832397538555'

We then configure the data source for a training job in SageMaker, defining where the training data is located (in this case, an S3 bucket) and the nature of the data.

In [4]:
from sagemaker.inputs import TrainingInput

# TODO: set the path to the train data
train_data = TrainingInput(
    ..., 
    content_type='application/x-sagemaker-training-data'
)


In [5]:
from sagemaker.inputs import TrainingInput

# Set the path to your training data in S3
train_data = TrainingInput(
    s3_data="s3://myawsbucketdata1/data/",
    content_type='application/x-sagemaker-training-data'
)
output_path = "s3://myawsbucketdata1/output/"
train_path = '/opt/ml/input/data/train'


Next, we create a PyTorch estimator that configures the SageMaker training job. The job uses the provided entry point script, runs on the specified instance type, and writes the trained model to the specified S3 path. The entry point script (passed as entry_point, main2.py in this notebook) contains the main steps that need to be completed in this project. We start by recording the current working directory, which is used as the source directory.
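
The entry point script itself is not shown in this notebook. As a rough sketch of how such a script commonly receives its configuration in SageMaker script mode, the example below parses hyperparameters passed as command-line flags and reads the data and model locations from the `SM_CHANNEL_TRAIN` and `SM_MODEL_DIR` environment variables. The function name `parse_args` and the flag names `--train_dir` and `--model_dir` are assumptions made for illustration; the fallback paths are the standard container locations.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse hyperparameters and SageMaker-provided paths."""
    parser = argparse.ArgumentParser()
    # Hyperparameters arrive as flags, e.g. --batch_size 32 --lr 0.01
    parser.add_argument("--batch_size", type=int, default=32)
    parser.add_argument("--lr", type=float, default=0.01)
    # Inside the container, SageMaker sets these environment variables;
    # the defaults below only apply when the script runs elsewhere.
    parser.add_argument(
        "--train_dir",
        default=os.environ.get("SM_CHANNEL_TRAIN", "/opt/ml/input/data/train"),
    )
    parser.add_argument(
        "--model_dir",
        default=os.environ.get("SM_MODEL_DIR", "/opt/ml/model"),
    )
    # parse_known_args ignores any extra flags the platform may add
    args, _unknown = parser.parse_known_args(argv)
    return args


# Simulate the flags generated from the estimator's hyperparameters
args = parse_args(["--batch_size", "32", "--lr", "0.01"])
print(args.batch_size, args.lr)
```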

In [6]:
source_path = os.getcwd()

The following cell defines the settings for a ModelCheckpoint callback that automatically saves the best model (the one with the lowest development loss) during the SageMaker training job. The best model is stored in the specified directory inside the SageMaker training container.

In [8]:
# Settings for saving the best model during training
# (Note: /opt/ml/model/ is the directory inside the training container whose contents SageMaker packages into model.tar.gz and uploads to the S3 output path)
model_checkpoint = {
    'ModelCheckpoint': {
        'monitor': 'dev_loss',
        'dirpath': '/opt/ml/model/',
        'filename': 'best_model',
        'save_top_k': 1,
        'mode': 'min'
    }
}
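
To illustrate what the `monitor` and `mode` settings mean when only the single best model is kept (`save_top_k` of 1), here is a small standalone sketch. `BestCheckpointTracker` is a hypothetical helper written for this example, not the checkpoint callback of any library, and the loss values are made up.

```python
class BestCheckpointTracker:
    """Decides whether a new metric value is the best seen so far."""

    def __init__(self, monitor, mode="min"):
        if mode not in ("min", "max"):
            raise ValueError("mode must be 'min' or 'max'")
        self.monitor = monitor
        self.mode = mode
        self.best = None

    def is_improvement(self, metrics):
        """Return True (and remember the value) if metrics[monitor] improved."""
        value = metrics[self.monitor]
        if self.best is None:
            better = True
        elif self.mode == "min":
            better = value < self.best
        else:
            better = value > self.best
        if better:
            self.best = value
        return better


tracker = BestCheckpointTracker(monitor="dev_loss", mode="min")
history = [0.70, 0.55, 0.60, 0.50]  # hypothetical dev loss per epoch

# Epochs at which a checkpoint would overwrite the previous best model
saved_epochs = [
    epoch for epoch, loss in enumerate(history)
    if tracker.is_improvement({"dev_loss": loss})
]
print(saved_epochs)  # [0, 1, 3]
```

With `mode` set to `'min'`, the model from epoch 2 is skipped because its loss is higher than the best value seen so far.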



In [9]:
from sagemaker.pytorch import PyTorch
from sagemaker.inputs import TrainingInput


estimator = PyTorch(
    entry_point="main2.py",  # Path to your main training script
    source_dir=source_path,  # Path to the directory containing your training script and dependencies
    base_job_name="sagemaker-script-mode",  # Base name for your training job
    role=role,  # AWS role with permissions to run SageMaker, access to S3, etc.
    instance_count=1,  # Number of EC2 instances for training
    instance_type="ml.p3.2xlarge",  # EC2 instance type for training
    framework_version="2.1",  # PyTorch version; must be available for the chosen py_version
    py_version="py310",  # Python 3.10
    output_path= output_path,  # Path in S3 for model artifacts
    dependencies= ["requirements.txt"],
    hyperparameters={
        'batch_size': 32,  # Example hyperparameter
        'lr': 0.01  # Example learning rate
    },
    environment={
        'PYTHONPATH': 'src',  # Setting custom PYTHONPATH if needed
        'SM_CHANNEL_TRAIN': train_path ,  # Optionally, specify paths explicitly
    }
)

# Launch the training job; the 'train' channel name determines where train_data is mounted in the container
estimator.fit({'train': train_data})


INFO:sagemaker.image_uris:image_uri is not presented, retrieving image_uri based on instance_type, framework etc.
INFO:sagemaker:Creating training-job with name: sagemaker-script-mode-2024-06-06-05-35-32-600


2024-06-06 05:35:38 Starting - Starting the training job...
2024-06-06 05:35:39 Pending - Training job waiting for capacity......
2024-06-06 05:37:04 Pending - Preparing the instances for training......
2024-06-06 05:37:44 Downloading - Downloading input data...
2024-06-06 05:38:14 Downloading - Downloading the training image...............
2024-06-06 05:41:00 Training - Training image download completed. Training in progress...
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
2024-06-06 05:41:16,899 sagemaker-training-toolkit INFO     Imported framework sagemaker_pytorch_container.training
2024-06-06 05:41:16,917 sagemaker-training-toolkit INFO     No Neurons detected (normal if no neurons installed)
2024-06-06 05:41:16,929 sagemaker_pytorch_container.training INFO     Block until all host DNS lookups succeed.
2024-06-06 05:41:16,931 sagemaker_pytorch_container.training INFO

Invalid index False not found in DataFrame.
Invalid index True not found in DataFrame.

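After training completes, we retrieve the S3 location of the model artifact and set the container image URI used later for deployment: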

In [10]:
model_artifact = estimator.model_data
print(f"Model artifact located at: {model_artifact}")
pytorch_image_uri = "763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-training:2.1-gpu-py310"

Model artifact located at: s3://myawsbucketdata1/output/sagemaker-script-mode-2024-06-06-05-35-32-600/output/model.tar.gz


## Model Deployment

We need to copy the training artifacts, i.e., output.tar.gz, from the corresponding S3 bucket to the current working directory and extract them.

In [12]:
#TODO: copy the training artifacts from the S3 bucket to the current working directory
!aws s3 cp s3://myawsbucketdata1/output/sagemaker-script-mode-2024-05-06-11-23-56-209/output/output.tar.gz .

download: s3://myawsbucketdata1/output/sagemaker-script-mode-2024-05-06-11-23-56-209/output/output.tar.gz to ./output.tar.gz
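
The copy command above hard-codes the training job name, so it has to be edited for every new job. A small sketch of one way to derive the `output.tar.gz` location from the model artifact URI instead, assuming `output.tar.gz` is stored next to `model.tar.gz` as in the paths shown in this notebook. The helper name `sibling_artifact_uri` is hypothetical.

```python
import posixpath


def sibling_artifact_uri(model_artifact_uri, filename="output.tar.gz"):
    """Return the URI of `filename` in the same S3 prefix as the model artifact."""
    if not model_artifact_uri.startswith("s3://"):
        raise ValueError(f"Not an S3 URI: {model_artifact_uri}")
    return posixpath.join(posixpath.dirname(model_artifact_uri), filename)


# URI copied from the model artifact printed earlier in this notebook
model_uri = (
    "s3://myawsbucketdata1/output/"
    "sagemaker-script-mode-2024-06-06-05-35-32-600/output/model.tar.gz"
)
output_uri = sibling_artifact_uri(model_uri)
print(output_uri)
```

The resulting URI can then be passed to `aws s3 cp` in place of the hand-written path.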


In [13]:
!mkdir -p extracted_training_artifacts && tar -xzf output.tar.gz -C extracted_training_artifacts

tar: Ignoring unknown extended header keyword 'LIBARCHIVE.creationtime'
tar: Ignoring unknown extended header keyword 'LIBARCHIVE.creationtime'
tar: Ignoring unknown extended header keyword 'LIBARCHIVE.creationtime'
tar: Ignoring unknown extended header keyword 'LIBARCHIVE.creationtime'
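
The same extraction can be done from Python with the standard library, which also creates the target directory if it does not exist yet. This is a sketch rather than part of the original workflow: `extract_archive` is a hypothetical helper, and the demo builds a throwaway archive instead of using the real training artifacts.

```python
import tarfile
import tempfile
from pathlib import Path


def extract_archive(archive_path, target_dir):
    """Extract a .tar.gz archive into target_dir and list the extracted files."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as archive:
        archive.extractall(target)
    return sorted(
        str(p.relative_to(target)) for p in target.rglob("*") if p.is_file()
    )


# Demo on a small temporary archive (stand-in for output.tar.gz)
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    sample = tmp / "metrics.txt"
    sample.write_text("dev_loss=0.5\n")
    archive_path = tmp / "output.tar.gz"
    with tarfile.open(archive_path, "w:gz") as archive:
        archive.add(sample, arcname="metrics.txt")
    extracted = extract_archive(archive_path, tmp / "extracted_training_artifacts")

print(extracted)  # ['metrics.txt']
```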


In [14]:
from sagemaker.pytorch import PyTorchModel

pytorch_model = PyTorchModel(
    model_data=model_artifact,
    role=role,
    image_uri=pytorch_image_uri,  # Use the retrieved image URI
    entry_point='inference.py',  # Your inference script
    source_dir='src',  # Directory containing all necessary files
    sagemaker_session=sess
)
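
The contents of inference.py are not shown here. With the SageMaker PyTorch serving container, an inference script can define the handler functions `model_fn`, `input_fn`, `predict_fn`, and `output_fn`. The sketch below covers only the request and response handlers, since `model_fn` and `predict_fn` depend on PyTorch and on the trained architecture. The JSON payload shape (a `text` field in, a `label` field out) is an assumption made for illustration.

```python
import json

JSON_CONTENT_TYPE = "application/json"


def input_fn(request_body, request_content_type=JSON_CONTENT_TYPE):
    """Deserialize the request into the review text to classify."""
    if request_content_type != JSON_CONTENT_TYPE:
        raise ValueError(f"Unsupported content type: {request_content_type}")
    if isinstance(request_body, (bytes, bytearray)):
        request_body = request_body.decode("utf-8")
    payload = json.loads(request_body)
    return payload["text"]  # assumed request field


def output_fn(prediction, accept=JSON_CONTENT_TYPE):
    """Serialize the predicted class (0 = negative, 1 = positive)."""
    if accept != JSON_CONTENT_TYPE:
        raise ValueError(f"Unsupported accept type: {accept}")
    return json.dumps({"label": int(prediction)})  # assumed response field


# Round trip with a hypothetical request and a hard-coded prediction
review = input_fn(b'{"text": "Great lipstick, lasts all day"}')
response = output_fn(1)
print(review)
print(response)
```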

In [None]:
predictor = pytorch_model.deploy(
    initial_instance_count=1,
    instance_type='ml.m5.large',
    endpoint_name='sentiment-analysis-endpoint-7'
)

INFO:sagemaker:Repacking model artifact (s3://myawsbucketdata1/output/sagemaker-script-mode-2024-05-15-10-42-50-035/output/model.tar.gz), script artifact (src), and dependencies ([]) into single tar.gz file located at s3://sagemaker-us-east-1-832397538555/pytorch-training-2024-05-15-15-16-21-880/model.tar.gz. This may take some time depending on model size...
INFO:sagemaker:Creating model with name: pytorch-training-2024-05-15-15-16-22-661
INFO:sagemaker:Creating endpoint-config with name sentiment-analysis-endpoint-7
INFO:sagemaker:Creating endpoint with name sentiment-analysis-endpoint-7


-----------------------------------------------