## Set up and run a hyperparameter tuning job with the SageMaker SDK and the HuggingFace container. This example uses the HuggingFace Trainer class.
1. Load data to S3 for training.
2. Configure SageMaker HuggingFace estimator.
3. Select hyperparameters to search.
4. Configure hyperparameter sweep with SageMaker Tuner.
5. Evaluate results.
6. Deploy the best model and use it for inference.

## The main difference between this notebook and hyperparameter_tuning.ipynb is that the training script I use here works with the HuggingFace Trainer class.

I am using a PyTorch 1.8, Python 3.6, GPU-optimized kernel. Different kernels may require different package versions.

In [4]:
# !pip install "sagemaker>=2.48.0" "transformers==4.12.3" "datasets[s3]==1.18.3" --upgrade
# !pip install aiobotocore
#!{sys.executable} -m pip install ...

Imports

In [126]:
import sagemaker
import boto3
import sagemaker.huggingface
from sagemaker.tuner import (
    IntegerParameter,
    CategoricalParameter,
    ContinuousParameter,
    HyperparameterTuner
)

from sagemaker.huggingface import HuggingFace

from datasets import load_dataset
from transformers import AutoTokenizer, get_constant_schedule, get_constant_schedule_with_warmup
import s3fs

from utils import summarize_hpo_results

In [6]:
session = sagemaker.Session()

role = sagemaker.get_execution_role()
role_name = role.split('/')[-1]

sagemaker_session_bucket = session.default_bucket()

In [7]:
session = sagemaker.Session(default_bucket=sagemaker_session_bucket)

print(f"sagemaker role arn: {role}")
print(f"sagemaker bucket: {session.default_bucket()}")
print(f"sagemaker session region: {session.boto_region_name}")

sagemaker role arn: arn:aws:iam::264639154954:role/aaca-ani-cogsci-sagemaker-studio-role
sagemaker bucket: sagemaker-us-east-1-264639154954
sagemaker session region: us-east-1


Download the data from the Datasets library and store it in S3

In [8]:
dataset_name = "imdb"

s3_prefix = "samples/datasets/imdb"

tokenizer_name = "distilbert-base-uncased"

In [9]:
dataset = load_dataset(dataset_name, ignore_verifications=True)

tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)

# Tokenizer helper function: pad every example to the tokenizer's maximum length and truncate longer ones
def tokenize(batch):
    return tokenizer(batch['text'], padding='max_length', truncation=True)

# Load the train and test splits, then keep a shuffled 10,000-example subset of the test split
train_dataset, test_dataset = load_dataset('imdb', split=['train', 'test'])
test_dataset = test_dataset.shuffle().select(range(10000))

Reusing dataset imdb (/root/.cache/huggingface/datasets/imdb/plain_text/1.0.0/2fdd8b9bcadd6e7055e742a706876ba43f19faee861df134affd7a3f60fc38a1)


  0%|          | 0/3 [00:00<?, ?it/s]

Reusing dataset imdb (/root/.cache/huggingface/datasets/imdb/plain_text/1.0.0/2fdd8b9bcadd6e7055e742a706876ba43f19faee861df134affd7a3f60fc38a1)


  0%|          | 0/2 [00:00<?, ?it/s]

Tokenize the data

In [10]:
train_dataset = train_dataset.map(tokenize, batched=True, batch_size=len(train_dataset))
test_dataset  = test_dataset.map(tokenize, batched=True, batch_size=len(test_dataset))

Loading cached processed dataset at /root/.cache/huggingface/datasets/imdb/plain_text/1.0.0/2fdd8b9bcadd6e7055e742a706876ba43f19faee861df134affd7a3f60fc38a1/cache-49cffa5c30057620.arrow


  0%|          | 0/1 [00:00<?, ?ba/s]

Convert to tensors

In [11]:
train_dataset = train_dataset.rename_column("label", "labels")
train_dataset.set_format(type="torch", columns=["input_ids", "attention_mask", "labels"])
test_dataset = test_dataset.rename_column("label", "labels")
test_dataset.set_format(type="torch", columns=["input_ids", "attention_mask", "labels"])

Upload data to S3 using the 'save_to_disk' Datasets method

In [12]:
s3 = s3fs.S3FileSystem()

training_input_path = f"s3://{session.default_bucket()}/{s3_prefix}/train"
train_dataset.save_to_disk(training_input_path, fs=s3)

test_input_path = f"s3://{session.default_bucket()}/{s3_prefix}/test"
test_dataset.save_to_disk(test_input_path, fs=s3)

print(f"train_path: {training_input_path}")
print(f"test_path: {test_input_path}")

train_path: s3://sagemaker-us-east-1-264639154954/samples/datasets/imdb/train
test_path: s3://sagemaker-us-east-1-264639154954/samples/datasets/imdb/test


## Hyperparameter tuning

Set the hyperparameter values that are passed to the estimator but that we are not seeking to tune.

In [13]:
hyperparameters = {"epochs": 2,
                   "train_batch_size": 16,
                   "model_name": "distilbert-base-uncased"
                  }

## Configure HuggingFace estimator

In [15]:
# Tags are very useful in multi-user accounts for tracking key information
TAGS = [{"Key": "Owner", "Value": "ccooney@aflac.com"},
        {"Key": "Environment", "Value": "Dev"}]

SageMaker has a dedicated estimator for use with HuggingFace (https://sagemaker.readthedocs.io/en/stable/frameworks/huggingface/sagemaker.huggingface.html).

It requires a Python script to be passed as the entry point (this is the main training script for your model). Other important parameters include the instance type and the framework versions.

In [34]:
huggingface_estimator = HuggingFace(entry_point='training_script.py',
                                    source_dir='./scripts',
                                    sagemaker_session=session,
                                    instance_type='ml.p3.2xlarge',
                                    instance_count=1,
                                    role=role,
                                    transformers_version='4.12',
                                    py_version='py38',
                                    pytorch_version='1.9',
                                    hyperparameters=hyperparameters,
                                    base_job_name='hpo-HF')
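The contents of training_script.py are not shown in this notebook. As a rough sketch only, a SageMaker entry point usually receives each hyperparameter as a command-line argument and finds the data channels through `SM_CHANNEL_*` environment variables. The argument defaults and the helper name `parse_training_args` below are assumptions for illustration, not the actual script.

```python
import argparse
import os


def parse_training_args(argv=None):
    """Parse hyperparameters and data locations as a SageMaker entry point typically would."""
    parser = argparse.ArgumentParser()

    # Fixed hyperparameters, mirroring the `hyperparameters` dict in this notebook.
    parser.add_argument("--epochs", type=int, default=2)
    parser.add_argument("--train_batch_size", type=int, default=16)
    parser.add_argument("--model_name", type=str, default="distilbert-base-uncased")

    # Tuned hyperparameters; the defaults here are hypothetical placeholders.
    parser.add_argument("--learning_rate", type=float, default=5e-5)
    parser.add_argument("--warmup_steps", type=int, default=500)
    parser.add_argument("--lr_scheduler_type", type=str, default="linear")

    # SageMaker sets these environment variables inside the training container.
    # The "train" and "test" channel names come from the keys passed to fit().
    parser.add_argument("--model_dir", default=os.environ.get("SM_MODEL_DIR", "/opt/ml/model"))
    parser.add_argument("--training_dir", default=os.environ.get("SM_CHANNEL_TRAIN", "/opt/ml/input/data/train"))
    parser.add_argument("--test_dir", default=os.environ.get("SM_CHANNEL_TEST", "/opt/ml/input/data/test"))

    # parse_known_args ignores any extra arguments the platform may add.
    args, _ = parser.parse_known_args(argv)
    return args


# Simulate the arguments one tuning trial might receive.
args = parse_training_args(
    ["--learning_rate", "1e-05", "--warmup_steps", "400", "--lr_scheduler_type", "cosine"]
)
print(args.learning_rate, args.warmup_steps, args.lr_scheduler_type)
```

Whatever the real script looks like, it also has to log the objective metric in a form that the tuner's metric regex can match.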

## Initialize hyperparameter ranges and the metric for evaluating model performance.

There are many hyperparameters that can be tuned; the selection will largely depend on the type of model you are working with.

The SageMaker SDK lets you set hyperparameter ranges through the ContinuousParameter, IntegerParameter, and CategoricalParameter classes.

Here, I am using all CategoricalParameters to demonstrate the 'Random' strategy for tuning. (See hyperparameter_tuning.ipynb for 'Bayesian')

Look at the arguments being parsed in huggingface_trainer.py to see some other hyperparameters that could be tuned.

In [120]:
hyperparameter_ranges = {"learning_rate": CategoricalParameter([0.001, 0.00001]),
                         "warmup_steps": CategoricalParameter([200, 400]),
                         "lr_scheduler_type": CategoricalParameter(["linear", "cosine"]),}

objective_metric = "loss"
objective_type = "Minimize"
metric_definitions = [{"Name": "loss", "Regex": "loss = ([0-9\\.]+)"}]
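SageMaker reads the objective metric by applying the regex in `metric_definitions` to the training job's log output. The sketch below applies the same pattern locally with Python's `re` module. The sample log lines are hypothetical, since the actual log format depends on what the training script prints.

```python
import re

# Same pattern as in metric_definitions, written as a raw string.
METRIC_REGEX = r"loss = ([0-9\.]+)"


def extract_loss(log_line):
    """Return the captured loss as a float, or None if the line does not match."""
    match = re.search(METRIC_REGEX, log_line)
    return float(match.group(1)) if match else None


# Hypothetical log lines, for illustration only.
sample_lines = [
    "eval_loss = 0.0018",
    "epoch = 2.0",
]
for line in sample_lines:
    print(repr(line), "->", extract_loss(line))
```

Because the pattern is not anchored, it matches any line containing `loss = `, including a prefixed name such as `eval_loss`. If the script logs more than one loss, a more specific regex may be needed.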

## Configure tuner
Pass the estimator, as well as the objective metric and the hyperparameter ranges we want to tune.
The optimization strategy can be set to 'Bayesian', 'Random', 'Hyperband', or 'Grid' (https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_HyperParameterTuningJobConfig.html). Here, we go with the Random approach.

In [122]:
tuner = HyperparameterTuner(
    huggingface_estimator,
    objective_metric,
    hyperparameter_ranges,
    metric_definitions,
    max_jobs=8,
    max_parallel_jobs=2,
    objective_type=objective_type,
    strategy="Random",
    tags=TAGS,
    base_tuning_job_name="hpo-HF",
)
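With three categorical hyperparameters of two values each, the search space has only eight distinct combinations, which is the same as `max_jobs=8`. The following sketch enumerates them with `itertools.product`. It only illustrates the size of the space and says nothing about how SageMaker samples from it.

```python
from itertools import product

# The candidate values from hyperparameter_ranges, as plain Python lists.
search_space = {
    "learning_rate": [0.001, 0.00001],
    "warmup_steps": [200, 400],
    "lr_scheduler_type": ["linear", "cosine"],
}

names = list(search_space)
# Every combination of one value per hyperparameter.
combinations = [dict(zip(names, values)) for values in product(*search_space.values())]

print(f"{len(combinations)} distinct combinations")
for combo in combinations:
    print(combo)
```

For a larger space, `max_jobs` would cover only a fraction of the combinations, which is where the choice of strategy starts to matter.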

In [123]:
tuner.fit(inputs={"train": training_input_path, "test": test_input_path})

No finished training job found associated with this estimator. Please make sure this estimator is only used for building workflow config


..................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................!


## Evaluate the results

tuner.describe() returns details of the tuning job we created; tuner.analytics().dataframe(), used below, returns one row per training job.

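Here, learning rate is clearly the most important hyperparameter to get right: every job with a learning rate of 1e-05 finished with a loss near 0.0018, while every job with 0.001 finished near 0.0069, regardless of scheduler or warmup steps.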

You may also notice the difference in how the tuning job is performed by 'Random' in comparison with 'Bayesian' (see hyperparameter_tuning.ipynb).

In [124]:
results_df = tuner.analytics().dataframe()
results_df.sort_values('FinalObjectiveValue')

| | learning_rate | lr_scheduler_type | warmup_steps | TrainingJobName | TrainingJobStatus | FinalObjectiveValue | TrainingStartTime | TrainingEndTime | TrainingElapsedTimeSeconds |
|---|---|---|---|---|---|---|---|---|---|
| 6 | "1e-05" | "cosine" | "400" | hpo-HF-230125-1147-002-793ad2dd | Completed | 0.001788 | 2023-01-25 11:49:06+00:00 | 2023-01-25 12:10:23+00:00 | 1277.0 |
| 0 | "1e-05" | "cosine" | "200" | hpo-HF-230125-1147-008-1d4e1504 | Completed | 0.001792 | 2023-01-25 12:47:14+00:00 | 2023-01-25 13:04:14+00:00 | 1020.0 |
| 2 | "1e-05" | "linear" | "200" | hpo-HF-230125-1147-006-ee5dcbe5 | Completed | 0.001797 | 2023-01-25 12:29:49+00:00 | 2023-01-25 12:46:49+00:00 | 1020.0 |
| 4 | "1e-05" | "linear" | "400" | hpo-HF-230125-1147-004-390b2a6b | Completed | 0.001804 | 2023-01-25 12:12:25+00:00 | 2023-01-25 12:29:25+00:00 | 1020.0 |
| 3 | "0.001" | "linear" | "200" | hpo-HF-230125-1147-005-bbad7cf6 | Completed | 0.006931 | 2023-01-25 12:29:47+00:00 | 2023-01-25 12:46:48+00:00 | 1021.0 |
| 1 | "0.001" | "cosine" | "200" | hpo-HF-230125-1147-007-db54aac5 | Completed | 0.006931 | 2023-01-25 12:47:13+00:00 | 2023-01-25 13:04:18+00:00 | 1025.0 |
| 5 | "0.001" | "linear" | "400" | hpo-HF-230125-1147-003-044cf748 | Completed | 0.006932 | 2023-01-25 12:12:23+00:00 | 2023-01-25 12:29:23+00:00 | 1020.0 |
| 7 | "0.001" | "cosine" | "400" | hpo-HF-230125-1147-001-087bdf05 | Completed | 0.006932 | 2023-01-25 11:49:21+00:00 | 2023-01-25 12:10:13+00:00 | 1252.0 |
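One simple way to check which hyperparameter matters most is to average the final loss per value of each hyperparameter and compare the gaps between those averages. The sketch below does this in plain Python, with the rows copied by hand from the results above. The helper name `spread_of_group_means` is made up for this illustration.

```python
from collections import defaultdict
from statistics import mean

# (learning_rate, lr_scheduler_type, warmup_steps, FinalObjectiveValue),
# copied from the tuning results of this run.
rows = [
    ("1e-05", "cosine", 400, 0.001788),
    ("1e-05", "cosine", 200, 0.001792),
    ("1e-05", "linear", 200, 0.001797),
    ("1e-05", "linear", 400, 0.001804),
    ("0.001", "linear", 200, 0.006931),
    ("0.001", "cosine", 200, 0.006931),
    ("0.001", "linear", 400, 0.006932),
    ("0.001", "cosine", 400, 0.006932),
]
columns = ["learning_rate", "lr_scheduler_type", "warmup_steps"]


def spread_of_group_means(rows, column_index):
    """Group losses by one hyperparameter and return max(mean) - min(mean)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[column_index]].append(row[-1])
    means = [mean(losses) for losses in groups.values()]
    return max(means) - min(means)


spreads = {name: spread_of_group_means(rows, i) for i, name in enumerate(columns)}
# Print the hyperparameters from largest to smallest effect on the mean loss.
for name, spread in sorted(spreads.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {spread:.6f}")
```

In this run the gap for learning rate is several orders of magnitude larger than the gaps for scheduler type or warmup steps.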


In [129]:
summarize_hpo_results(tuner.latest_tuning_job.job_name)

best score: 0.001787555986084044
best params: {'learning_rate': '"1e-05"', 'lr_scheduler_type': '"cosine"', 'warmup_steps': '"400"'}
best job-name: hpo-HF-230125-1147-002-793ad2dd
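Note that the tuned values come back as strings wrapped in an extra pair of double quotes (for example `'"1e-05"'`). Before reusing them, for instance to retrain with the best settings, they need to be unquoted and cast. A minimal sketch follows; `clean_param` is a hypothetical helper, not part of the SageMaker SDK or of utils.

```python
def clean_param(value):
    """Strip the surrounding double quotes and cast to int or float where possible."""
    text = value.strip('"')
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            continue
    return text  # Not numeric, keep it as a plain string.


# The best parameters exactly as reported above.
best_params = {'learning_rate': '"1e-05"', 'lr_scheduler_type': '"cosine"', 'warmup_steps': '"400"'}

cleaned = {name: clean_param(value) for name, value in best_params.items()}
print(cleaned)
```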


Select the best tuned model for deploying and making predictions

In [130]:
best_estimator = tuner.best_estimator()


2023-01-25 12:12:24 Starting - Preparing the instances for training
2023-01-25 12:12:24 Downloading - Downloading input data
2023-01-25 12:12:24 Training - Training image download completed. Training in progress.
2023-01-25 12:12:24 Uploading - Uploading generated training model
2023-01-25 12:12:24 Completed - Resource reused by training job: hpo-HF-230125-1147-004-390b2a6b


In [131]:
predictor = best_estimator.deploy(1, "ml.g4dn.xlarge")

-----------!

Use your deployed model to make predictions

In [132]:
result = predictor.predict({"inputs": "I watched this movie last night. At first I thought it was going to be truely awful, but in the end I have to admit I really enjoyed it and would recomment it to friends."})
result

[{'label': 'LABEL_1', 'score': 0.8847560286521912}]

In [133]:
classes = ["negative", "positive"]
id2label = {f"LABEL_{v}": k for v, k in enumerate(classes)}

print(f"Result: {id2label[result[0]['label']]}")

Result: positive


A neutral input is classified as negative, but with low confidence

In [135]:
result = predictor.predict({"inputs": "I have no opinion in this."})
result

[{'label': 'LABEL_0', 'score': 0.6395813822746277}]
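When scores this low matter, one option is to flag predictions that fall below a confidence threshold instead of reporting the label alone. The sketch below applies this to the two responses shown in this notebook. The 0.7 threshold and the `interpret` helper are assumptions chosen for illustration.

```python
CLASSES = ["negative", "positive"]
ID2LABEL = {f"LABEL_{index}": name for index, name in enumerate(CLASSES)}


def interpret(result, threshold=0.7):
    """Map the top prediction to a class name and flag it if the score is below threshold."""
    top = result[0]
    label = ID2LABEL[top["label"]]
    if top["score"] >= threshold:
        return label
    return f"{label} (low confidence)"


# The two endpoint responses recorded in this notebook.
print(interpret([{"label": "LABEL_1", "score": 0.8847560286521912}]))
print(interpret([{"label": "LABEL_0", "score": 0.6395813822746277}]))
```

A suitable threshold depends on the application and would normally be chosen from validation data rather than fixed by hand.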

In [136]:
predictor.delete_endpoint()