## Set up and run a hyperparameter tuning job with the SageMaker SDK and HuggingFace container
1. Load data to S3 for training.
2. Configure SageMaker HuggingFace estimator.
3. Select hyperparameters to search.
4. Configure hyperparameter sweep with SageMaker Tuner.
5. Evaluate results.
6. Deploy the best model and use it for inference.

## pip installs
sagemaker, transformers, and datasets - versions can vary depending on needs

In [3]:
# ! pip install "sagemaker>=2.48.0" "transformers==4.12.3" "datasets[s3]==1.18.3" --upgrade
# ! pip install aiobotocore==2.3.4
# ! pip install s3fs --upgrade

### imports

In [5]:
import sagemaker
import boto3
import sagemaker.huggingface
from sagemaker.tuner import (
    IntegerParameter,
    CategoricalParameter,
    ContinuousParameter,
    HyperparameterTuner
)

from sagemaker.huggingface import HuggingFace

from datasets import load_dataset
from transformers import AutoTokenizer
import s3fs

from utils import summarize_hpo_results

In [6]:
session = sagemaker.Session()

role = sagemaker.get_execution_role()
role_name = role.split('/')[-1]

sagemaker_session_bucket = session.default_bucket()

In [7]:
session = sagemaker.Session(default_bucket=sagemaker_session_bucket)

print(f"sagemaker role arn: {role}")
print(f"sagemaker bucket: {session.default_bucket()}")
print(f"sagemaker session region: {session.boto_region_name}")

sagemaker role arn: arn:aws:iam::264639154954:role/aaca-ani-cogsci-sagemaker-studio-role
sagemaker bucket: sagemaker-us-east-1-264639154954
sagemaker session region: us-east-1


Download the data from the Hugging Face Datasets library and store it in S3

In [8]:
dataset_name = "tweet_eval"
dataset_subgroup = "sentiment"

s3_prefix = "samples/datasets/tweet_eval"

tokenizer_name = "distilbert-base-uncased"

In [9]:
dataset = load_dataset(dataset_name, dataset_subgroup, ignore_verifications=True)

tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)

#tokenizer helper function
def tokenize(batch):
    return tokenizer(batch['text'], padding='max_length', truncation=True)

# load dataset
train_dataset, test_dataset = load_dataset(dataset_name, dataset_subgroup, split=['train', 'test'])
test_dataset = test_dataset.shuffle().select(range(10000))

Reusing dataset tweet_eval (/root/.cache/huggingface/datasets/tweet_eval/sentiment/1.1.0/12aee5282b8784f3e95459466db4cdf45c6bf49719c25cdb0743d71ed0410343)


  0%|          | 0/3 [00:00<?, ?it/s]

Reusing dataset tweet_eval (/root/.cache/huggingface/datasets/tweet_eval/sentiment/1.1.0/12aee5282b8784f3e95459466db4cdf45c6bf49719c25cdb0743d71ed0410343)


  0%|          | 0/2 [00:00<?, ?it/s]

Tokenize the data

In [10]:
train_dataset = train_dataset.map(tokenize, batched=True, batch_size=len(train_dataset))
test_dataset  = test_dataset.map(tokenize, batched=True, batch_size=len(test_dataset))

Loading cached processed dataset at /root/.cache/huggingface/datasets/tweet_eval/sentiment/1.1.0/12aee5282b8784f3e95459466db4cdf45c6bf49719c25cdb0743d71ed0410343/cache-a36b975b9ca349ef.arrow


  0%|          | 0/1 [00:00<?, ?ba/s]

Convert to tensors

In [11]:
train_dataset = train_dataset.rename_column("label", "labels")
train_dataset.set_format(type="torch", columns=["input_ids", "attention_mask", "labels"])
test_dataset = test_dataset.rename_column("label", "labels")
test_dataset.set_format(type="torch", columns=["input_ids", "attention_mask", "labels"])

Upload data to S3 using the 'save_to_disk' Datasets method

In [12]:
s3 = s3fs.S3FileSystem()

training_input_path = f"s3://{session.default_bucket()}/{s3_prefix}/train"
train_dataset.save_to_disk(training_input_path, fs=s3)

test_input_path = f"s3://{session.default_bucket()}/{s3_prefix}/test"
test_dataset.save_to_disk(test_input_path, fs=s3)

print(f"train_path: {training_input_path}")
print(f"test_path: {test_input_path}")

train_path: s3://sagemaker-us-east-1-264639154954/samples/datasets/tweet_eval/train
test_path: s3://sagemaker-us-east-1-264639154954/samples/datasets/tweet_eval/test


## Hyperparameter tuning

Set the hyperparameter values that are passed to the estimator but that we are not seeking to tune.

In [13]:
hyperparameters = {"epochs": 2,
                   "train_batch_size": 16,
                   "model_name": "distilbert-base-uncased",
                   "num_labels": len(set(dataset['train']['label'])),
                   "metric": "f1",
                  }

## Configure HuggingFace estimator

In [14]:
# tags are very useful in multi-user accounts for tracking key information
TAGS = [{"Key": "Owner", "Value": "ccooney@aflac.com"},
        {"Key": "Environment", "Value": "Dev"}]

SageMaker has a specific estimator for use with HuggingFace (https://sagemaker.readthedocs.io/en/stable/frameworks/huggingface/sagemaker.huggingface.html)

It requires a Python script to be passed as the entry point (this is the main training script for your model). Other important parameters include the instance type and the framework versions.
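
As a rough sketch of how such an entry-point script typically receives its inputs: SageMaker passes each entry of `hyperparameters` to the script as a `--name value` command-line argument, and exposes the data channels and the model output directory through the `SM_CHANNEL_<NAME>` and `SM_MODEL_DIR` environment variables. The real `training_script.py` is not shown in this notebook, so the argument names and defaults below are assumptions that simply mirror the hyperparameters used here.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse hyperparameters and SageMaker-provided paths (illustrative only)."""
    parser = argparse.ArgumentParser()

    # Static hyperparameters (assumed names, mirroring the `hyperparameters` dict)
    parser.add_argument("--epochs", type=int, default=2)
    parser.add_argument("--train_batch_size", type=int, default=16)
    parser.add_argument("--model_name", type=str, default="distilbert-base-uncased")
    parser.add_argument("--num_labels", type=int, default=3)

    # Tuned hyperparameters (assumed names, mirroring `hyperparameter_ranges`)
    parser.add_argument("--learning_rate", type=float, default=5e-5)
    parser.add_argument("--warmup_steps", type=int, default=500)
    parser.add_argument("--optimizer", type=str, default="AdamW")
    parser.add_argument("--weight_decay", type=float, default=0.0)

    # Paths that SageMaker exposes as environment variables inside the container
    parser.add_argument("--model_dir", type=str, default=os.environ.get("SM_MODEL_DIR"))
    parser.add_argument("--training_dir", type=str, default=os.environ.get("SM_CHANNEL_TRAIN"))
    parser.add_argument("--test_dir", type=str, default=os.environ.get("SM_CHANNEL_TEST"))

    # parse_known_args tolerates extra arguments the script does not declare
    args, _ = parser.parse_known_args(argv)
    return args
```

The channel variables end in `TRAIN` and `TEST` because the `fit` call in this notebook names its inputs `"train"` and `"test"`.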

In [15]:
huggingface_estimator = HuggingFace(entry_point='training_script.py',
                                    source_dir='./scripts',
                                    sagemaker_session=session,
                                    instance_type='ml.p3.2xlarge',
                                    instance_count=1,
                                    role=role,
                                    transformers_version='4.12',
                                    py_version='py38',
                                    pytorch_version='1.9',
                                    hyperparameters=hyperparameters,
                                    base_job_name='hpo-HF')

## Initialize hyperparameter ranges and the metric for evaluating model performance.

Many hyperparameters can be tuned; which ones you select will largely depend on the type of model you are working with.

The SageMaker SDK lets you define hyperparameter ranges with the ContinuousParameter, IntegerParameter, and CategoricalParameter classes.

Look at the arguments being parsed in training_script.py to see some other hyperparameters that could be tuned.
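
A learning-rate range such as 0.0001 to 0.1 spans three orders of magnitude, so how values are drawn from it matters. The sketch below uses only the standard library and is not how SageMaker samples internally; it just contrasts linear and logarithmic sampling over that range. In the SDK, `ContinuousParameter` accepts a `scaling_type` argument (for example `"Logarithmic"`); with the default `"Auto"`, SageMaker chooses the scale itself.

```python
import math
import random

LOW, HIGH = 1e-4, 1e-1  # same bounds as the learning_rate range


def sample_linear(rng):
    """Draw uniformly on the raw scale."""
    return rng.uniform(LOW, HIGH)


def sample_log(rng):
    """Draw uniformly on the log10 scale, then map back to the raw scale."""
    return 10 ** rng.uniform(math.log10(LOW), math.log10(HIGH))


def share_below(samples, threshold):
    """Fraction of samples strictly below the threshold."""
    return sum(s < threshold for s in samples) / len(samples)


rng = random.Random(0)
linear = [sample_linear(rng) for _ in range(10_000)]
logarithmic = [sample_log(rng) for _ in range(10_000)]

# On the linear scale only about 1% of draws fall below 1e-3; on the log
# scale roughly a third do, so small learning rates get far more coverage.
print(f"linear share below 1e-3: {share_below(linear, 1e-3):.3f}")
print(f"log share below 1e-3:    {share_below(logarithmic, 1e-3):.3f}")
```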

In [16]:
hyperparameter_ranges = {"learning_rate": ContinuousParameter(0.0001, 0.1),
                         "warmup_steps": IntegerParameter(100, 500),
                         "optimizer": CategoricalParameter(["AdamW", "Adafactor"]),
                         "weight_decay": ContinuousParameter(0.00, 0.001)}

objective_metric = "loss"
objective_type = "Minimize"
metric_definitions = [{"Name": "loss", "Regex": "loss = ([0-9\\.]+)"}]
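
SageMaker applies each `Regex` in `metric_definitions` to the training job's log output and uses the captured group as the metric value. A minimal sketch for checking the pattern locally with Python's `re` module is shown here; the sample log lines are made up for illustration, since the actual format depends on what `training_script.py` prints.

```python
import re

# Same pattern as in metric_definitions; the notebook string "loss = ([0-9\\.]+)"
# is this pattern with the backslash escaped.
LOSS_PATTERN = re.compile(r"loss = ([0-9\.]+)")


def extract_metric(log_line):
    """Return the captured value as a float, or None if the line does not match."""
    match = LOSS_PATTERN.search(log_line)
    return float(match.group(1)) if match else None


# Hypothetical log lines, for illustration only
print(extract_metric("***** eval: loss = 0.0117 *****"))  # 0.0117
print(extract_metric("eval_loss: 0.0117"))                # None
```

Because the pattern is searched anywhere in the line, a line containing `train_loss = ...` would also match; a stricter pattern may be needed if the script logs more than one loss.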

## Configure tuner
Pass the estimator, as well as the objective metric and the hyperparameter ranges we want to tune.
The optimization strategy can also be set to 'Bayesian', 'Random', 'Hyperband', or 'Grid' (https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_HyperParameterTuningJobConfig.html). Here, we use the default Bayesian approach.

In [17]:
tuner = HyperparameterTuner(huggingface_estimator,
                            objective_metric,
                            hyperparameter_ranges,
                            metric_definitions,
                            max_jobs=3,
                            max_parallel_jobs=2,
                            objective_type=objective_type,
                            tags=TAGS,
                            base_tuning_job_name="hpo-HF",)
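
`max_jobs` and `max_parallel_jobs` together determine how many sequential rounds the search gets. With a Bayesian strategy, jobs launched together cannot learn from each other's results, so fewer parallel jobs means more rounds in which earlier results can inform later choices, at the cost of a longer wall-clock time. A minimal sketch of that arithmetic follows; the helper name is hypothetical and the model is simplified, since in practice a new job can start as soon as any running job finishes.

```python
import math


def sequential_rounds(max_jobs, max_parallel_jobs):
    """Minimum number of back-to-back batches needed to run all jobs."""
    if max_jobs < 1 or max_parallel_jobs < 1:
        raise ValueError("both arguments must be positive")
    return math.ceil(max_jobs / max_parallel_jobs)


# Settings used in this notebook: 3 jobs, 2 at a time
print(sequential_rounds(3, 2))  # 2
# Fully sequential alternative: every job can use all earlier results
print(sequential_rounds(3, 1))  # 3
```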

Then call the fit method in the same way you would with a normal SageMaker estimator.

In [18]:
tuner.fit(inputs={"train": training_input_path, "test": test_input_path})

No finished training job found associated with this estimator. Please make sure this estimator is only used for building workflow config


.................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................!


## Evaluate the results

tuner.describe() returns the details of the tuning job we created

In [19]:
tuner.describe()

{'HyperParameterTuningJobName': 'hpo-HF-230208-1023',
 'HyperParameterTuningJobArn': 'arn:aws:sagemaker:us-east-1:264639154954:hyper-parameter-tuning-job/hpo-hf-230208-1023',
 'HyperParameterTuningJobConfig': {'Strategy': 'Bayesian',
  'HyperParameterTuningJobObjective': {'Type': 'Minimize',
   'MetricName': 'loss'},
  'ResourceLimits': {'MaxNumberOfTrainingJobs': 3,
   'MaxParallelTrainingJobs': 2},
  'ParameterRanges': {'IntegerParameterRanges': [{'Name': 'warmup_steps',
     'MinValue': '100',
     'MaxValue': '500',
     'ScalingType': 'Auto'}],
   'ContinuousParameterRanges': [{'Name': 'learning_rate',
     'MinValue': '0.0001',
     'MaxValue': '0.1',
     'ScalingType': 'Auto'},
    {'Name': 'weight_decay',
     'MinValue': '0.0',
     'MaxValue': '0.001',
     'ScalingType': 'Auto'}],
   'CategoricalParameterRanges': [{'Name': 'optimizer',
     'Values': ['"AdamW"', '"Adafactor"']}]},
  'TrainingJobEarlyStoppingType': 'Off'},
 'TrainingJobDefinition': {'StaticHyperParameters': 

Print the results as a pandas DataFrame for a comparison of all hyperparameter combinations.

In [20]:
results_df = tuner.analytics().dataframe()
results_df

| | learning_rate | optimizer | warmup_steps | weight_decay | TrainingJobName | TrainingJobStatus | FinalObjectiveValue | TrainingStartTime | TrainingEndTime | TrainingElapsedTimeSeconds |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.000175 | "Adafactor" | 192.0 | 0.000111 | hpo-HF-230208-1023-003-f10bbaed | Completed | 0.007823 | 2023-02-08 10:58:59+00:00 | 2023-02-08 11:26:07+00:00 | 1628.0 |
| 1 | 0.006286 | "AdamW" | 240.0 | 0.00091 | hpo-HF-230208-1023-002-867283a9 | Completed | 0.011643 | 2023-02-08 10:25:34+00:00 | 2023-02-08 10:57:15+00:00 | 1901.0 |
| 2 | 0.004381 | "Adafactor" | 358.0 | 7.2e-05 | hpo-HF-230208-1023-001-f7897507 | Completed | 0.011697 | 2023-02-08 10:25:36+00:00 | 2023-02-08 10:57:41+00:00 | 1925.0 |


In [21]:
results_df.sort_values('FinalObjectiveValue')

| | learning_rate | optimizer | warmup_steps | weight_decay | TrainingJobName | TrainingJobStatus | FinalObjectiveValue | TrainingStartTime | TrainingEndTime | TrainingElapsedTimeSeconds |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.000175 | "Adafactor" | 192.0 | 0.000111 | hpo-HF-230208-1023-003-f10bbaed | Completed | 0.007823 | 2023-02-08 10:58:59+00:00 | 2023-02-08 11:26:07+00:00 | 1628.0 |
| 1 | 0.006286 | "AdamW" | 240.0 | 0.00091 | hpo-HF-230208-1023-002-867283a9 | Completed | 0.011643 | 2023-02-08 10:25:34+00:00 | 2023-02-08 10:57:15+00:00 | 1901.0 |
| 2 | 0.004381 | "Adafactor" | 358.0 | 7.2e-05 | hpo-HF-230208-1023-001-f7897507 | Completed | 0.011697 | 2023-02-08 10:25:36+00:00 | 2023-02-08 10:57:41+00:00 | 1925.0 |
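
The `optimizer` column shows values such as `"Adafactor"` wrapped in an extra pair of double quotes, which appears to come from the SDK JSON-encoding categorical values before sending them to the tuning job (the same quoting is visible in the `tuner.describe()` output). A small sketch, using only the standard library, of decoding such strings before comparing or plotting them; the helper name is hypothetical.

```python
import json


def decode_hyperparameter(value):
    """Decode a JSON-encoded hyperparameter string, falling back to the raw value."""
    try:
        return json.loads(value)
    except (TypeError, ValueError):
        # Not valid JSON (or not a string at all): leave the value unchanged
        return value


print(decode_hyperparameter('"Adafactor"'))  # Adafactor
print(decode_hyperparameter("192"))          # 192
print(decode_hyperparameter("AdamW"))        # AdamW
```

On the DataFrame above, this could be applied with `results_df["optimizer"].map(decode_hyperparameter)`.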


Print the best result on its own for convenience

In [22]:
summarize_hpo_results(tuner.latest_tuning_job.job_name)

best score: 0.007822966203093529
best params: {'learning_rate': '0.00017545073264835974', 'optimizer': '"Adafactor"', 'warmup_steps': '192', 'weight_decay': '0.00011148831909139079'}
best job-name: hpo-HF-230208-1023-003-f10bbaed
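
`summarize_hpo_results` comes from a local `utils` module that is not shown in this notebook. As a hedged sketch of what such a helper could look like, the function below reads the `BestTrainingJob` entry returned by the SageMaker `describe_hyper_parameter_tuning_job` API. The `client` parameter is an addition for illustration, so the function can be exercised with a stand-in object; the real helper may differ.

```python
def summarize_hpo_results(tuning_job_name, client=None):
    """Print and return the best training job of a tuning job (illustrative sketch)."""
    if client is None:
        import boto3  # imported lazily so the function can be tried without AWS access

        client = boto3.client("sagemaker")

    response = client.describe_hyper_parameter_tuning_job(
        HyperParameterTuningJobName=tuning_job_name
    )
    best_job = response["BestTrainingJob"]
    summary = {
        "score": best_job["FinalHyperParameterTuningJobObjectiveMetric"]["Value"],
        "params": best_job["TunedHyperParameters"],
        "job_name": best_job["TrainingJobName"],
    }

    print(f"best score: {summary['score']}")
    print(f"best params: {summary['params']}")
    print(f"best job-name: {summary['job_name']}")
    return summary
```

This sketch assumes at least one training job has completed; otherwise the response may not contain a `BestTrainingJob` entry and the lookup would raise a `KeyError`.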


In [23]:
tuner.best_training_job()

'hpo-HF-230208-1023-003-f10bbaed'

Select the best tuned model for deploying and making predictions

In [24]:
best_estimator = tuner.best_estimator()


2023-02-08 11:26:09 Starting - Found matching resource for reuse
2023-02-08 11:26:09 Downloading - Downloading input data
2023-02-08 11:26:09 Training - Training image download completed. Training in progress.
2023-02-08 11:26:09 Uploading - Uploading generated training model
2023-02-08 11:26:09 Completed - Resource released due to keep alive period expiry


Deploy the model to a SageMaker Endpoint, choosing an instance type for inference

In [25]:
predictor = best_estimator.deploy(1, "ml.g4dn.xlarge")

---------!

Use your deployed model to make predictions

In [26]:
result = predictor.predict({"inputs": "Best thing ever!"})
result

[{'label': 'LABEL_2', 'score': 0.9897397756576538}]

Map the generic 'LABEL_n' outputs to the class names negative, neutral, and positive

In [27]:
classes = ["negative", "neutral", "positive"]
id2label = {f"LABEL_{i}": name for i, name in enumerate(classes)}

print(f"Result: {id2label[result[0]['label']]}")

Result: positive


Delete the endpoint when you are finished, so that it stops incurring charges

In [None]:
predictor.delete_endpoint()