# PyTorch training

***

## Prerequisites

In [None]:
! pip install -r ./scripts/requirements.txt --upgrade
! pip install -U sagemaker

***

## Set Up the Configuration File Path

In [None]:
import os
import sys

module_path = os.path.abspath(os.path.join("../.."))

if module_path not in sys.path:
    sys.path.append(module_path)

# Dataset

The dataset (The Social Dilemma Tweets - Text Classification 2020) was downloaded from [Kaggle](https://www.kaggle.com/datasets/kaushiksuresh147/the-social-dilemma-tweets).
It contains the Twitter responses posted with the #TheSocialDilemma hashtag after the documentary "The Social Dilemma" was released on an OTT platform (Netflix) on September 9th, 2020.
The dataset was extracted using the Twitter API and consists of roughly 10,526 tweets from Twitter users around the world.

We'd like to train a model based on the content of the text in order to determine the sentiment.

This is a multi-class classification problem:
* Negative - 0
* Neutral - 1
* Positive - 2
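
The label encoding above can be sketched as a small lookup table. The names `LABEL_TO_ID`, `ID_TO_LABEL`, and `encode_label` are illustrative and are not taken from the training script, and the sketch assumes the sentiment values arrive as strings such as `"Negative"`.

```python
# Illustrative mapping from sentiment name to class index (names are assumptions).
LABEL_TO_ID = {"negative": 0, "neutral": 1, "positive": 2}
ID_TO_LABEL = {idx: name for name, idx in LABEL_TO_ID.items()}


def encode_label(sentiment: str) -> int:
    """Return the class index for a sentiment name, ignoring case and whitespace."""
    return LABEL_TO_ID[sentiment.strip().lower()]


print([encode_label(s) for s in ["Negative", "Neutral", "Positive"]])
```

Keeping the inverse mapping next to the forward one makes it easy to turn predicted class indices back into readable labels.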

In [None]:
! rm -rf ./data && mkdir -p ./data
! curl https://sagemaker-sample-files.s3.amazonaws.com/datasets/tabular/tweets_dataset/TheSocialDilemma.csv -o ./data/data.csv

# Step 1 - Import Modules

Here we’ll import some libraries and define some variables.

In [None]:
import boto3
import sagemaker

In [None]:
sagemaker_client = boto3.client("sagemaker")
s3_client = boto3.client("s3")

Create a SageMaker session and store the default region and the execution role in Python variables.

In [None]:
sagemaker_session = sagemaker.Session()
region = boto3.session.Session().region_name
role = sagemaker.get_execution_role()

In [None]:
bucket_name = sagemaker_session.default_bucket()

## Upload the dataset in the default Amazon S3 Bucket

To make the data available to the SageMaker training job, copy the dataset to the default S3 bucket.

In [None]:
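# Remove any previous copy of the dataset before uploading a fresh one.
# delete_object expects the full object key, not only the prefix.
s3_client.delete_object(Bucket=bucket_name, Key="e2e-base/data/input/data.csv")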

input_data = sagemaker_session.upload_data('./data/data.csv', key_prefix="e2e-base/data/input")

input_data
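
As a rough guide to where the file lands: for a single local file, `upload_data` stores the object under `<key_prefix>/<file name>` in the target bucket. The helper below only reproduces that naming rule so the resulting URI can be predicted without calling AWS. `expected_s3_uri` and the bucket name are hypothetical, and the sketch assumes the session has no default bucket prefix configured.

```python
import posixpath


def expected_s3_uri(bucket: str, key_prefix: str, local_path: str) -> str:
    """Predict the S3 URI for a single uploaded file (hypothetical helper)."""
    file_name = posixpath.basename(local_path)
    key = posixpath.join(key_prefix, file_name)
    return f"s3://{bucket}/{key}"


# "example-bucket" is a placeholder, not a real bucket name.
uri = expected_s3_uri("example-bucket", "e2e-base/data/input", "./data/data.csv")
print(uri)
```

Comparing this value with the `input_data` string returned by `upload_data` is a quick way to confirm the upload went where you expected.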

***

# Step 2 - Create the Estimator

In [None]:
! pygmentize ./scripts/train.py

In [None]:
import sagemaker
from sagemaker.config import load_sagemaker_config

In [None]:
sagemaker_session = sagemaker.Session()

bucket_name = sagemaker_session.default_bucket()
default_prefix = sagemaker_session.default_bucket_prefix
configs = load_sagemaker_config()

In [None]:
instance_type = "ml.c5.xlarge"  # Override the instance type if you want to get a different container version
instance_count = 1

instance_type

In [None]:
image_uri = sagemaker.image_uris.retrieve(
    framework="pytorch",
    region=sagemaker_session.boto_session.region_name,
    version="2.6.0",
    instance_type=instance_type,
    image_scope="training"
)

image_uri

In [None]:
from sagemaker.pytorch.estimator import PyTorch

role = sagemaker.get_execution_role()

# Define the training job name
job_name = "train-pytorch-batch"

# define OutputDataConfig path
if default_prefix:
    output_path = f"s3://{bucket_name}/{default_prefix}/{job_name}"
else:
    output_path = f"s3://{bucket_name}/{job_name}"

# Define the Estimator
estimator = PyTorch(
    image_uri=image_uri,
    entry_point="train.py",
    source_dir="./scripts",
    instance_type=instance_type,
    instance_count=instance_count,
    role=role,
    sagemaker_session=sagemaker_session,
    base_job_name=job_name,
    max_run=7200,
    hyperparameters={"epochs": 100, "learning_rate": 0.001, "batch_size": 100},
    output_path=output_path,
)
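
In script mode, SageMaker passes the `hyperparameters` dictionary to the entry point as command-line arguments (for example `--epochs 100`). The contents of `train.py` are not reproduced here, so the following is only a sketch of how such a script might read them with `argparse`. The function name `parse_hyperparameters` and the default values are assumptions.

```python
import argparse


def parse_hyperparameters(argv=None):
    """Parse hyperparameters the way a script-mode entry point might (illustrative)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--epochs", type=int, default=10)
    parser.add_argument("--learning_rate", type=float, default=0.001)
    parser.add_argument("--batch_size", type=int, default=32)
    # parse_known_args tolerates extra arguments the script does not declare.
    args, _unknown = parser.parse_known_args(argv)
    return args


# Simulate the arguments that correspond to the hyperparameters in the Estimator.
args = parse_hyperparameters(
    ["--epochs", "100", "--learning_rate", "0.001", "--batch_size", "100"]
)
print(args.epochs, args.learning_rate, args.batch_size)
```

Declaring a type for each argument matters because the values arrive as strings on the command line.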

In [None]:
from sagemaker.inputs import TrainingInput

# Pass the input data
train_input = TrainingInput(
    s3_data=input_data,
    distribution="FullyReplicated",
)

TRAINING_INPUTS = {
    "train": train_input
}

TRAINING_INPUTS

***

## Queue Some Training Jobs
This section and the following are intended to be used interactively so that you can explore how to use the SageMaker Python SDK to submit jobs to your Batch queues. Let's start by selecting which queue to submit to.

### Select the Queue to Use

In [None]:
from sagemaker.aws_batch.training_queue import TrainingQueue
# Set the queue type to use for your job submission
SMTJ_BATCH_QUEUE = "ml-c5-xlarge-queue"

# Construct the queue object using the SageMaker Python SDK
queue = TrainingQueue(SMTJ_BATCH_QUEUE)
print(f"Using queue: {queue.queue_name}")

### Submit your jobs
In the next cell, we submit two training jobs to the queue.

Each job is submitted with the queue's `submit` method.

In [None]:
for i in range(1, 3):
    job_name_i = f"{job_name}-{i}"
    queued_job = queue.submit(estimator, TRAINING_INPUTS, job_name_i)
    print(f"Submitted job {job_name_i}: {queued_job}")

## Display the Status of Running and 'In Queue' Jobs
We can use the job queue list and job queue snapshot APIs to programmatically view a snapshot of the jobs that the queue will run next. Keep in mind that for fair-share queues this ordering is dynamic and occasionally needs to be refreshed as new jobs are submitted to the queue or as share usage changes over time.

In [None]:
from smtj_batch_utils.queue_utils import print_queue_state

print_queue_state(queue)
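
`print_queue_state` comes from the local `smtj_batch_utils` helper module, whose implementation is not shown here. As a rough illustration of the kind of summary such a helper could produce, the sketch below groups job records by status. The records are made-up dictionaries rather than SDK objects, and `group_by_status` is a hypothetical name; the status values follow the AWS Batch job lifecycle.

```python
from collections import defaultdict

# Made-up job records for illustration; real entries would come from the queue.
sample_jobs = [
    {"jobName": "train-pytorch-batch-1", "status": "RUNNING"},
    {"jobName": "train-pytorch-batch-2", "status": "RUNNABLE"},
]


def group_by_status(jobs):
    """Return {status: [job names]}, preserving the order the jobs were listed in."""
    grouped = defaultdict(list)
    for job in jobs:
        grouped[job["status"]].append(job["jobName"])
    return dict(grouped)


summary = group_by_status(sample_jobs)
for status, names in summary.items():
    print(f"{status}: {', '.join(names)}")
```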

### Submit an additional job
In the next cell, we submit one more job to the queue, again using the `submit` method.

In [None]:
job_name_3 = job_name + "-3"
queued_job_3 = queue.submit(
    estimator, TRAINING_INPUTS, job_name_3
)

## Display the Status of Running and 'In Queue' Jobs
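The queue state should now include the additional job. The `submit` calls in this notebook do not set an explicit priority, so the new job is not expected to be scheduled ahead of the jobs submitted earlier; in a queue that uses priorities, a higher-priority job would run before lower-priority ones.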

In [None]:
from smtj_batch_utils.queue_utils import print_queue_state

print_queue_state(queue)

## Cancel a Job in the Queue
The next cell cancels every job that is still waiting in the queue (status `RUNNABLE`).

In [None]:
runnable_jobs = queue.list_jobs(status="RUNNABLE")
if runnable_jobs:
    for job in runnable_jobs:
        job_to_cancel = job
        print(f"Cancelling job: {job_to_cancel.describe().get('jobName', '')}")
        job_to_cancel.terminate()