# Fine-Tune a BERT Model and Create a Text Classifier

In the previous section, we performed feature engineering to create BERT embeddings from the `review_body` text using the pre-trained BERT model, and split the dataset into train, validation, and test files. To optimize for TensorFlow training, we saved the files in TFRecord format.

Now, let’s fine-tune the BERT model on our Customer Reviews Dataset and add a new classification layer to predict the `star_rating` for a given `review_body`.

![BERT Training](img/bert_training.png)

As mentioned earlier, BERT is built on the Transformer architecture, whose core building block is the attention mechanism. This is, not coincidentally, the name of the popular Python library, “Transformers,” maintained by a company called Hugging Face. We will use a variant of BERT called [DistilBERT](https://arxiv.org/pdf/1910.01108.pdf), which requires less memory and compute but maintains very good accuracy on our dataset.

In [1]:
import boto3
import sagemaker
import pandas as pd

sess   = sagemaker.Session()
bucket = sess.default_bucket()
role = sagemaker.get_execution_role()
region = boto3.Session().region_name

sm = boto3.Session().client(service_name='sagemaker', region_name=region)

In [2]:
!pip install -q smdebug==0.8.0
!pip install -q sagemaker-experiments==0.1.13

# Track the `Experiment`
We will track every step of this experiment throughout the `prepare`, `train`, `optimize`, and `deploy` phases.

# Concepts

**Experiment**: A collection of related Trials.  Add Trials to an Experiment that you wish to compare together.

**Trial**: A description of a multi-step machine learning workflow. Each step in the workflow is described by a Trial Component. Trial Components have no inherent relationship to one another, such as ordering.

**Trial Component**: A description of a single step in a machine learning workflow. For example: data cleaning, feature extraction, model training, or model evaluation.

**Tracker**: A logger of information about a single Trial Component.
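
The relationships among these concepts can be sketched with plain Python dataclasses. This is only an illustrative model: the class and field names below are hypothetical and are not the `smexperiments` API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TrialComponentSketch:
    """One step of a workflow, e.g. 'prepare' or 'train' (hypothetical model)."""
    display_name: str
    parameters: Dict[str, object] = field(default_factory=dict)


@dataclass
class TrialSketch:
    """A multi-step workflow made of trial components."""
    trial_name: str
    components: List[TrialComponentSketch] = field(default_factory=list)


@dataclass
class ExperimentSketch:
    """A collection of related trials that we want to compare."""
    experiment_name: str
    trials: List[TrialSketch] = field(default_factory=list)


prepare_step = TrialComponentSketch('prepare', {'max_seq_length': 128})
train_step = TrialComponentSketch('train', {'epochs': 1})

trial_sketch = TrialSketch('trial-example', [prepare_step, train_step])
experiment_sketch = ExperimentSketch('example-experiment', [trial_sketch])

for t in experiment_sketch.trials:
    print(t.trial_name, [c.display_name for c in t.components])
```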

![SageMaker Experiments](img/sagemaker-experiments.png)


# Create the `Experiment`

In [3]:
import time
from smexperiments.experiment import Experiment

timestamp = '{}'.format(int(time.time()))

experiment = Experiment.create(
                experiment_name='Amazon-Customer-Reviews-BERT-Experiment-{}'.format(timestamp),
                description='Amazon Customer Reviews BERT Experiment', 
                sagemaker_boto_client=sm)

experiment_name = experiment.experiment_name
print('Experiment name: {}'.format(experiment_name))

Experiment name: Amazon-Customer-Reviews-BERT-Experiment-1595706786


# Create the `Trial`

In [4]:
import time
from smexperiments.trial import Trial

timestamp = '{}'.format(int(time.time()))

trial = Trial.create(trial_name='trial-{}'.format(timestamp),
                     experiment_name=experiment_name,
                     sagemaker_boto_client=sm)

trial_name = trial.trial_name
print('Trial name: {}'.format(trial_name))

Trial name: trial-1595706786


# Create the `prepare` Trial Component and Tracker
Note: A Trial Component is actually created through a Tracker. We know this is a bit confusing.

In [5]:
from smexperiments.tracker import Tracker

tracker_prepare = Tracker.create(display_name='prepare', 
                                 sagemaker_boto_client=sm)

prepare_trial_component_name = tracker_prepare.trial_component.trial_component_name
print('Prepare trial component name {}'.format(prepare_trial_component_name))

Prepare trial component name TrialComponent-2020-07-25-195306-dvkt


# Attach the `prepare` Trial Component to the Trial

In [6]:
trial.add_trial_component(tracker_prepare.trial_component)

# Log All Parameters Used During `prepare` Phase

In [7]:
%store -r s3_raw_input_data

In [8]:
print(s3_raw_input_data)

s3://sagemaker-us-west-2-393371431575/amazon-reviews-pds/tsv/


In [9]:
tracker_prepare.log_input(name='raw_data_s3_uri', 
                          media_type='s3/uri', 
                          value=s3_raw_input_data)

# must save after logging
tracker_prepare.trial_component.save()

TrialComponent(sagemaker_boto_client=<botocore.client.SageMaker object at 0x7fe0c300fac8>,trial_component_name='TrialComponent-2020-07-25-195306-dvkt',display_name='prepare',trial_component_arn='arn:aws:sagemaker:us-west-2:393371431575:experiment-trial-component/trialcomponent-2020-07-25-195306-dvkt',response_metadata={'RequestId': '1989bbd3-dc82-4a6f-9814-03a4e88cab22', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': '1989bbd3-dc82-4a6f-9814-03a4e88cab22', 'content-type': 'application/x-amz-json-1.1', 'content-length': '129', 'date': 'Sat, 25 Jul 2020 19:53:06 GMT'}, 'RetryAttempts': 0},parameters={},input_artifacts={'raw_data_s3_uri': TrialComponentArtifact(value='s3://sagemaker-us-west-2-393371431575/amazon-reviews-pds/tsv/',media_type='s3/uri')},output_artifacts={})

In [10]:
%store -r train_split_percentage

In [11]:
print(train_split_percentage)

0.9


In [12]:
%store -r validation_split_percentage

In [13]:
print(validation_split_percentage)

0.05


In [14]:
%store -r test_split_percentage

In [15]:
print(test_split_percentage)

0.05
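
As a quick sanity check, the three split percentages printed above should sum to 1.0. The sketch below hard-codes those printed values so that it runs on its own; `split_counts` and the example size of 1,000 rows are hypothetical and for illustration only.

```python
import math

# Values printed above; hard-coded here so the sketch runs on its own.
splits = {'train': 0.9, 'validation': 0.05, 'test': 0.05}

# Compare with a tolerance because floating-point sums are rarely exact.
assert math.isclose(sum(splits.values()), 1.0)


def split_counts(num_rows, splits):
    """Return approximate row counts per split for a dataset of num_rows."""
    return {name: round(num_rows * pct) for name, pct in splits.items()}


counts = split_counts(1000, splits)
print(counts)
```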


In [16]:
%store -r max_seq_length

In [17]:
print(max_seq_length)

128


In [18]:
%store -r balance_dataset

In [19]:
print(balance_dataset)

False


In [20]:
tracker_prepare.log_parameters({
    'max_seq_length': max_seq_length,
    'train_split_percentage': train_split_percentage,
    'validation_split_percentage': validation_split_percentage,
    'test_split_percentage': test_split_percentage, 
    'balance_dataset': str(balance_dataset)
})

# must save after logging
tracker_prepare.trial_component.save()

TrialComponent(sagemaker_boto_client=<botocore.client.SageMaker object at 0x7fe0c300fac8>,trial_component_name='TrialComponent-2020-07-25-195306-dvkt',display_name='prepare',trial_component_arn='arn:aws:sagemaker:us-west-2:393371431575:experiment-trial-component/trialcomponent-2020-07-25-195306-dvkt',response_metadata={'RequestId': '0385407c-79ba-49b6-a492-902e4475068e', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': '0385407c-79ba-49b6-a492-902e4475068e', 'content-type': 'application/x-amz-json-1.1', 'content-length': '129', 'date': 'Sat, 25 Jul 2020 19:53:06 GMT'}, 'RetryAttempts': 0},parameters={'max_seq_length': 128, 'train_split_percentage': 0.9, 'validation_split_percentage': 0.05, 'test_split_percentage': 0.05, 'balance_dataset': 'False'},input_artifacts={'raw_data_s3_uri': TrialComponentArtifact(value='s3://sagemaker-us-west-2-393371431575/amazon-reviews-pds/tsv/',media_type='s3/uri')},output_artifacts={})
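
Trial component parameters are stored as either string or number values, which is why `balance_dataset` is converted with `str()` above. A minimal sketch of a helper that applies this conversion to a whole dictionary follows; `to_loggable_parameters` is a hypothetical name, not part of `smexperiments`.

```python
from numbers import Number


def to_loggable_parameters(params):
    """Keep numbers as-is; convert everything else (including bools) to str."""
    loggable = {}
    for name, value in params.items():
        # bool is a subclass of int in Python, so test for it explicitly.
        if isinstance(value, Number) and not isinstance(value, bool):
            loggable[name] = value
        else:
            loggable[name] = str(value)
    return loggable


example_parameters = to_loggable_parameters({
    'max_seq_length': 128,
    'train_split_percentage': 0.9,
    'balance_dataset': False,
})
print(example_parameters)
```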

In [21]:
%store -r processed_train_data_s3_uri

In [22]:
print(processed_train_data_s3_uri)

s3://sagemaker-us-west-2-393371431575/sagemaker-scikit-learn-2020-07-25-19-21-49-035/output/bert-train


In [23]:
%store -r processed_validation_data_s3_uri

In [24]:
print(processed_validation_data_s3_uri)

s3://sagemaker-us-west-2-393371431575/sagemaker-scikit-learn-2020-07-25-19-21-49-035/output/bert-validation


In [25]:
%store -r processed_test_data_s3_uri

In [26]:
print(processed_test_data_s3_uri)

s3://sagemaker-us-west-2-393371431575/sagemaker-scikit-learn-2020-07-25-19-21-49-035/output/bert-test


In [27]:
tracker_prepare.log_output(name='train_data_s3_uri', 
                           media_type='s3/uri', 
                           value=processed_train_data_s3_uri)

tracker_prepare.log_output(name='validation_data_s3_uri', 
                           media_type='s3/uri', 
                           value=processed_validation_data_s3_uri)

tracker_prepare.log_output(name='test_data_s3_uri', 
                           media_type='s3/uri', 
                           value=processed_test_data_s3_uri)

# must save after logging
tracker_prepare.trial_component.save()

TrialComponent(sagemaker_boto_client=<botocore.client.SageMaker object at 0x7fe0c300fac8>,trial_component_name='TrialComponent-2020-07-25-195306-dvkt',display_name='prepare',trial_component_arn='arn:aws:sagemaker:us-west-2:393371431575:experiment-trial-component/trialcomponent-2020-07-25-195306-dvkt',response_metadata={'RequestId': '93150c8a-c4d1-46f8-8534-82de6ff2bd4e', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': '93150c8a-c4d1-46f8-8534-82de6ff2bd4e', 'content-type': 'application/x-amz-json-1.1', 'content-length': '129', 'date': 'Sat, 25 Jul 2020 19:53:06 GMT'}, 'RetryAttempts': 0},parameters={'max_seq_length': 128, 'train_split_percentage': 0.9, 'validation_split_percentage': 0.05, 'test_split_percentage': 0.05, 'balance_dataset': 'False'},input_artifacts={'raw_data_s3_uri': TrialComponentArtifact(value='s3://sagemaker-us-west-2-393371431575/amazon-reviews-pds/tsv/',media_type='s3/uri')},output_artifacts={'train_data_s3_uri': TrialComponentArtifact(value='s3://sagemak

# Specify the Dataset in S3
We are using the train, validation, and test splits created in the previous section.

In [28]:
print(processed_train_data_s3_uri)

!aws s3 ls $processed_train_data_s3_uri/

s3://sagemaker-us-west-2-393371431575/sagemaker-scikit-learn-2020-07-25-19-21-49-035/output/bert-train
2020-07-25 19:26:30      50998 part-algo-1-amazon_reviews_us_Digital_Software_v1_00.tfrecord
2020-07-25 19:26:30      72302 part-algo-2-amazon_reviews_us_Digital_Video_Games_v1_00.tfrecord


In [29]:
print(processed_validation_data_s3_uri)

!aws s3 ls $processed_validation_data_s3_uri/

s3://sagemaker-us-west-2-393371431575/sagemaker-scikit-learn-2020-07-25-19-21-49-035/output/bert-validation
2020-07-25 19:26:30       3224 part-algo-1-amazon_reviews_us_Digital_Software_v1_00.tfrecord
2020-07-25 19:26:30       4298 part-algo-2-amazon_reviews_us_Digital_Video_Games_v1_00.tfrecord


In [30]:
print(processed_test_data_s3_uri)

!aws s3 ls $processed_test_data_s3_uri/

s3://sagemaker-us-west-2-393371431575/sagemaker-scikit-learn-2020-07-25-19-21-49-035/output/bert-test
2020-07-25 19:26:31       3255 part-algo-1-amazon_reviews_us_Digital_Software_v1_00.tfrecord
2020-07-25 19:26:31       4477 part-algo-2-amazon_reviews_us_Digital_Video_Games_v1_00.tfrecord


# Specify S3 `Distribution Strategy`

In [31]:
s3_input_train_data = sagemaker.s3_input(s3_data=processed_train_data_s3_uri, 
                                         distribution='ShardedByS3Key') 
s3_input_validation_data = sagemaker.s3_input(s3_data=processed_validation_data_s3_uri, 
                                              distribution='ShardedByS3Key')
s3_input_test_data = sagemaker.s3_input(s3_data=processed_test_data_s3_uri, 
                                        distribution='ShardedByS3Key')

print(s3_input_train_data.config)
print(s3_input_validation_data.config)
print(s3_input_test_data.config)



{'DataSource': {'S3DataSource': {'S3DataType': 'S3Prefix', 'S3Uri': 's3://sagemaker-us-west-2-393371431575/sagemaker-scikit-learn-2020-07-25-19-21-49-035/output/bert-train', 'S3DataDistributionType': 'ShardedByS3Key'}}}
{'DataSource': {'S3DataSource': {'S3DataType': 'S3Prefix', 'S3Uri': 's3://sagemaker-us-west-2-393371431575/sagemaker-scikit-learn-2020-07-25-19-21-49-035/output/bert-validation', 'S3DataDistributionType': 'ShardedByS3Key'}}}
{'DataSource': {'S3DataSource': {'S3DataType': 'S3Prefix', 'S3Uri': 's3://sagemaker-us-west-2-393371431575/sagemaker-scikit-learn-2020-07-25-19-21-49-035/output/bert-test', 'S3DataDistributionType': 'ShardedByS3Key'}}}
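
With `ShardedByS3Key`, each training instance receives only a subset of the S3 objects under the prefix, rather than a full copy as with the default `FullyReplicated`. The sketch below imitates this with a simple round-robin assignment. The exact assignment SageMaker uses is not shown in this notebook, so treat `shard_by_key` as a hypothetical illustration. It also shows why there should be at least as many input files as instances.

```python
def shard_by_key(s3_keys, instance_count):
    """Assign keys to instances round-robin (an assumed, simplified scheme)."""
    shards = {i: [] for i in range(instance_count)}
    for index, key in enumerate(sorted(s3_keys)):
        shards[index % instance_count].append(key)
    return shards


train_keys = [
    'part-algo-1-amazon_reviews_us_Digital_Software_v1_00.tfrecord',
    'part-algo-2-amazon_reviews_us_Digital_Video_Games_v1_00.tfrecord',
]

# Two files, two instances: each instance gets one file.
print(shard_by_key(train_keys, 2))

# Two files, three instances: one instance is left without data.
shards = shard_by_key(train_keys, 3)
instances_without_data = [i for i, keys in shards.items() if not keys]
print('Instances without data:', instances_without_data)
```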


# Show TensorFlow Training Code

In [32]:
!pygmentize src/tf_bert_reviews.py

import time
import random
import pandas as pd
from glob import glob
import pprint
import argparse
import json
import subprocess
import sys
import os
import tensorflow as tf
#subprocess.check_call([sys.executable, '-m', 'pip', 'install', 'tensorflow==2.1.0'])
subprocess.check_call([sys.executable, '-m', 'pip', 'install', '

                        default=False)    
    parser.add_argument('--output_data_dir', # This is unused
                        type=str,
                        default=os.environ['SM_OUTPUT_DATA_DIR'])
    
    # This points to the S3 location - this should not be used by our code
    # We should use /opt/ml/model/ instead
    # parser.add_argument('--model_dir', 
    #                     type=str, 
    #                     default=os.environ['SM_MODEL_DIR'])
     
    args, _ = parser.parse_known_args()
    print("Args:") 
    print(args)
    
    env_var = os.environ 
    print("Environment Variables:

# Setup Hyper-Parameters for Classification Layer

In [33]:
print(max_seq_length)

128


In [34]:
epochs=1
learning_rate=0.00001
epsilon=0.00000001
train_batch_size=128
validation_batch_size=128
test_batch_size=128
train_steps_per_epoch=50
validation_steps=50
test_steps=50
train_instance_count=1
train_instance_type='ml.c5.9xlarge'
train_volume_size=1024
use_xla=True
use_amp=True
freeze_bert_layer=True
enable_sagemaker_debugger=True
enable_checkpointing=False
enable_tensorboard=False
input_mode='Pipe'
run_validation=True
run_test=True
run_sample_predictions=True
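
Assuming the training script passes `train_steps_per_epoch` to Keras as the number of batches per epoch (the full script is not shown here), the number of training examples consumed per epoch is the batch size multiplied by the step count. A quick check with the values above, hard-coded so the snippet runs on its own:

```python
# Values copied from the hyper-parameter cell above.
train_batch_size = 128
train_steps_per_epoch = 50
epochs = 1

examples_per_epoch = train_batch_size * train_steps_per_epoch
total_examples = examples_per_epoch * epochs

print('Examples per epoch: {}'.format(examples_per_epoch))
print('Examples over all epochs: {}'.format(total_examples))
```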

# Setup Metrics To Track Model Performance

Sample log lines:
```
45/50 [========>.....................] - ETA: 3s - loss: 1.3920 - accuracy: 0.7210
...
50/50 [========>.....................] - ETA: 0s - val_loss: 0.0321 - val_accuracy: 0.7922
```
These lines would produce the following metrics in CloudWatch:

`loss` = 1.3920, `accuracy` = 0.7210

`val_loss` = 0.0321, `val_accuracy` = 0.7922

In [35]:
metrics_definitions = [
     # Leading space keeps these from also matching 'val_loss: ' and 'val_accuracy: '
     {'Name': 'train:loss', 'Regex': ' loss: ([0-9\\.]+)'},
     {'Name': 'train:accuracy', 'Regex': ' accuracy: ([0-9\\.]+)'},
     {'Name': 'validation:loss', 'Regex': 'val_loss: ([0-9\\.]+)'},
     {'Name': 'validation:accuracy', 'Regex': 'val_accuracy: ([0-9\\.]+)'},
]
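
SageMaker applies each `Regex` to the training log lines and publishes the captured group as the metric value. A minimal sketch of that extraction with Python's `re` module, run against the sample log lines above, is shown here. The patterns in `patterns` are this example's own; they include a leading space because a bare `loss: ` pattern also matches inside `val_loss: `.

```python
import re

log_lines = [
    '45/50 [========>.....................] - ETA: 3s - loss: 1.3920 - accuracy: 0.7210',
    '50/50 [========>.....................] - ETA: 0s - val_loss: 0.0321 - val_accuracy: 0.7922',
]

patterns = {
    'train:loss': r' loss: ([0-9\.]+)',
    'train:accuracy': r' accuracy: ([0-9\.]+)',
    'validation:loss': r'val_loss: ([0-9\.]+)',
    'validation:accuracy': r'val_accuracy: ([0-9\.]+)',
}

extracted = {}
for line in log_lines:
    for name, pattern in patterns.items():
        match = re.search(pattern, line)
        if match:
            extracted.setdefault(name, []).append(float(match.group(1)))

print(extracted)

# Without the leading space, the training pattern also fires on the validation line.
bare_match = re.search(r'loss: ([0-9\.]+)', log_lines[1])
print(bare_match.group(1))
```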

# Setup SageMaker Debugger
Define Debugger Rules

In [36]:
from sagemaker.debugger import Rule
from sagemaker.debugger import rule_configs
from sagemaker.debugger import CollectionConfig
from sagemaker.debugger import DebuggerHookConfig

rules=[
        Rule.sagemaker(
            rule_configs.loss_not_decreasing(),
            rule_parameters={
                'collection_names': 'losses,metrics',
                'use_losses_collection': 'true',
                'num_steps': '10',
                'diff_percent': '50'
            },
            collections_to_save=[
                CollectionConfig(name='losses',
                                 parameters={
                                     'save_interval': '10',
                                 }),
                CollectionConfig(name='metrics',
                                 parameters={
                                     'save_interval': '10',
                                 })
            ]
        ),
        Rule.sagemaker(
            rule_configs.overtraining(),
            rule_parameters={
                'collection_names': 'losses,metrics',
                'patience_train': '10',
                'patience_validation': '10',
                'delta': '0.5'
            },
            collections_to_save=[
                CollectionConfig(name='losses',
                                 parameters={
                                     'save_interval': '10',
                                 }),
                CollectionConfig(name='metrics',
                                 parameters={
                                     'save_interval': '10',
                                 })
            ]
        )
    ]

hook_config = DebuggerHookConfig(
    hook_parameters={
        'save_interval': '10', # number of steps
        'export_tensorboard': 'true',
        'tensorboard_dir': 'hook_tensorboard/',
    })
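
To give a feel for what the `loss_not_decreasing` rule looks for, the following is a simplified sketch: it flags a step when the loss has fallen by less than `diff_percent` percent relative to the value `num_steps` entries earlier. This is an illustration under that assumption only, not the Debugger implementation, and `loss_not_decreasing_sketch` is a hypothetical name.

```python
def loss_not_decreasing_sketch(losses, num_steps=10, diff_percent=50.0):
    """Return indices where loss fell by less than diff_percent over num_steps."""
    flagged = []
    for i in range(num_steps, len(losses)):
        previous, current = losses[i - num_steps], losses[i]
        if previous <= 0:
            continue  # avoid dividing by zero; skip degenerate values
        decrease_percent = 100.0 * (previous - current) / previous
        if decrease_percent < diff_percent:
            flagged.append(i)
    return flagged


# A loss that halves at every step clears a 50% threshold over 2 steps.
fast_losses = [8.0, 4.0, 2.0, 1.0, 0.5]
fast_flags = loss_not_decreasing_sketch(fast_losses, num_steps=2, diff_percent=50.0)
print(fast_flags)

# A loss that plateaus is flagged once it stops improving enough.
plateau_losses = [8.0, 4.0, 2.0, 1.9, 1.8]
plateau_flags = loss_not_decreasing_sketch(plateau_losses, num_steps=2, diff_percent=50.0)
print(plateau_flags)
```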

# Specify Checkpoint S3 Location
This is used for Spot Instance training. If an instance is replaced, the new instance resumes training from the latest checkpoint.

In [37]:
import uuid

checkpoint_s3_prefix = 'checkpoints/{}'.format(str(uuid.uuid4()))
checkpoint_s3_uri = 's3://{}/{}/'.format(bucket, checkpoint_s3_prefix)

print(checkpoint_s3_uri)

s3://sagemaker-us-west-2-393371431575/checkpoints/2315224a-3dff-46a0-8ce2-d7fdf8509bd9/
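
Resuming after an interruption amounts to finding the most recent checkpoint and loading it before training continues. The sketch below shows only the selection step on a list of file names. The `checkpoint-<step>.h5` naming scheme and the `latest_checkpoint` helper are assumptions for illustration, not necessarily what `tf_bert_reviews.py` does.

```python
import re


def latest_checkpoint(file_names):
    """Return the name with the highest step number, or None if there is none."""
    best_name, best_step = None, -1
    for name in file_names:
        match = re.fullmatch(r'checkpoint-(\d+)\.h5', name)
        if match and int(match.group(1)) > best_step:
            best_name, best_step = name, int(match.group(1))
    return best_name


# Hypothetical directory listing; steps are compared numerically, not alphabetically.
checkpoint_files = ['checkpoint-50.h5', 'checkpoint-100.h5', 'checkpoint-9.h5', 'notes.txt']
print(latest_checkpoint(checkpoint_files))

# With no checkpoints present, training would start from scratch.
print(latest_checkpoint([]))
```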


# Setup Our BERT + TensorFlow Script to Run on SageMaker
Prepare our TensorFlow model to run on the managed SageMaker service.

In [38]:
from sagemaker.tensorflow import TensorFlow

estimator = TensorFlow(entry_point='tf_bert_reviews.py', 
                       source_dir='src', # put requirements.txt in this directory and it gets picked up
                       role=role,
                       train_instance_count=train_instance_count, # Make sure you have at least this number of input files or the ShardedByS3Key distribution strategy will fail the job due to no data available
                       train_instance_type=train_instance_type,
                       train_volume_size=train_volume_size,
#                        train_use_spot_instances=True,
#                        train_max_wait=7200, # Seconds to wait for spot instances to become available
                       checkpoint_s3_uri=checkpoint_s3_uri,
                       py_version='py3',
                       framework_version='2.1.0',
                       hyperparameters={'epochs': epochs,
                                        'learning_rate': learning_rate,
                                        'epsilon': epsilon,
                                        'train_batch_size': train_batch_size,
                                        'validation_batch_size': validation_batch_size,
                                        'test_batch_size': test_batch_size,                                             
                                        'train_steps_per_epoch': train_steps_per_epoch,
                                        'validation_steps': validation_steps,
                                        'test_steps': test_steps,
                                        'use_xla': use_xla,
                                        'use_amp': use_amp,                                             
                                        'max_seq_length': max_seq_length,
                                        'freeze_bert_layer': freeze_bert_layer,
                                        'enable_sagemaker_debugger': enable_sagemaker_debugger,
                                        'enable_checkpointing': enable_checkpointing,
                                        'enable_tensorboard': enable_tensorboard,                                        
                                        'run_validation': run_validation,
                                        'run_test': run_test,
                                        'run_sample_predictions': run_sample_predictions},
                       input_mode=input_mode,
                       metric_definitions=metrics_definitions,
                       rules=rules,
                       debugger_hook_config=hook_config,                       
#                       train_max_run=7200, # max 2 hours * 60 minutes per hour * 60 seconds per minute
                      )
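
In script mode, SageMaker passes the `hyperparameters` dictionary to the entry point as command-line arguments, which the script reads with `argparse`. The sketch below simulates that hand-off for a few of the values above. Boolean values arrive as strings, so they need explicit parsing. `str_to_bool` and the simulated argument list are this example's assumptions, and the real `tf_bert_reviews.py` may handle them differently.

```python
import argparse


def str_to_bool(value):
    """Parse 'True'/'False' style strings; bool('False') would be True."""
    return str(value).lower() in ('true', '1', 'yes')


sketch_parser = argparse.ArgumentParser()
sketch_parser.add_argument('--epochs', type=int, default=1)
sketch_parser.add_argument('--learning_rate', type=float, default=0.00001)
sketch_parser.add_argument('--freeze_bert_layer', type=str_to_bool, default=False)

# Simulated command line, similar to what the training container would receive.
simulated_argv = ['--epochs', '1',
                  '--learning_rate', '1e-05',
                  '--freeze_bert_layer', 'True',
                  '--some_other_flag', 'ignored']

# parse_known_args tolerates arguments the parser does not declare.
sketch_args, unknown_args = sketch_parser.parse_known_args(simulated_argv)
print(sketch_args)
print(unknown_args)
```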

# Create the `Experiment Config`

In [39]:
experiment_config = {
    'ExperimentName': experiment_name,
    'TrialName': trial.trial_name,
    'TrialComponentDisplayName': 'train'
}

# Train the Model on SageMaker

In [40]:
estimator.fit(inputs={'train': s3_input_train_data, 
                      'validation': s3_input_validation_data,
                      'test': s3_input_test_data
              },              
              experiment_config=experiment_config,                   
              wait=False)

INFO:sagemaker:Creating training-job with name: tensorflow-training-2020-07-25-19-53-09-322


In [41]:
training_job_name = estimator.latest_training_job.name
print('Training Job Name:  {}'.format(training_job_name))

Training Job Name:  tensorflow-training-2020-07-25-19-53-09-322


In [42]:
from IPython.display import display, HTML

display(HTML('<b>Review <a target="_blank" href="https://console.aws.amazon.com/sagemaker/home?region={}#/jobs/{}">Training Job</a> After About 5 Minutes</b>'.format(region, training_job_name)))


In [43]:
from IPython.display import display, HTML

display(HTML('<b>Review <a target="_blank" href="https://console.aws.amazon.com/cloudwatch/home?region={}#logStream:group=/aws/sagemaker/TrainingJobs;prefix={};streamFilter=typeLogStreamPrefix">CloudWatch Logs</a> After About 5 Minutes</b>'.format(region, training_job_name)))


In [44]:
from IPython.display import display, HTML

display(HTML('<b>Review <a target="_blank" href="https://s3.console.aws.amazon.com/s3/buckets/{}/{}/?region={}&tab=overview">S3 Output Data</a> After The Training Job Has Completed</b>'.format(bucket, training_job_name, region)))


In [45]:
from IPython.display import display, HTML

display(HTML('<b>Review <a target="_blank" href="https://s3.console.aws.amazon.com/s3/buckets/{}/{}/?region={}&tab=overview">S3 Checkpoint Data</a> After The Training Job Has Completed</b>'.format(bucket, checkpoint_s3_prefix, region)))


# Wait Until the Training Job Above Completes

In [46]:
estimator.latest_training_job.wait(logs=False)


2020-07-25 19:53:18 Starting - Starting the training job
2020-07-25 19:53:20 Starting - Launching requested ML instances............
2020-07-25 19:54:27 Starting - Preparing the instances for training.........
2020-07-25 19:55:15 Downloading - Downloading input data
2020-07-25 19:55:24 Training - Downloading the training image..
2020-07-25 19:55:37 Training - Training image download completed. Training in progress................................................................
2020-07-25 20:01:00 Uploading - Uploading generated training model.............
2020-07-25 20:02:10 Completed - Training job completed
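
The status lines above carry enough information to estimate how long each phase took. A small sketch that parses four of those timestamps with the standard library (the values are copied from the output above):

```python
from datetime import datetime

timestamp_format = '%Y-%m-%d %H:%M:%S'

job_started = datetime.strptime('2020-07-25 19:53:18', timestamp_format)
training_started = datetime.strptime('2020-07-25 19:55:37', timestamp_format)
upload_started = datetime.strptime('2020-07-25 20:01:00', timestamp_format)
job_completed = datetime.strptime('2020-07-25 20:02:10', timestamp_format)

# Time between "Training in progress" and the start of the model upload.
training_seconds = (upload_started - training_started).total_seconds()
# Time between the first and last status lines.
total_seconds = (job_completed - job_started).total_seconds()

print('Training phase: {:.0f} seconds'.format(training_seconds))
print('Whole job:      {:.0f} seconds'.format(total_seconds))
```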


# Show the Experiment Tracking Lineage

In [47]:
from sagemaker.analytics import ExperimentAnalytics

lineage_table = ExperimentAnalytics(
    sagemaker_session=sess,
    experiment_name=experiment_name,
    metric_names=['validation:accuracy'],
    sort_by="CreationTime",
    sort_order="Ascending",
)

lineage_df = lineage_table.dataframe()
lineage_df.shape

(2, 43)

In [48]:
lineage_df

| | TrialComponentName | DisplayName | balance_dataset | max_seq_length | test_split_percentage | train_split_percentage | validation_split_percentage | SourceArn | SageMaker.ImageUri | SageMaker.InstanceCount | ... | use_amp | use_xla | validation_batch_size | validation_steps | validation:accuracy - Min | validation:accuracy - Max | validation:accuracy - Avg | validation:accuracy - StdDev | validation:accuracy - Last | validation:accuracy - Count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | TrialComponent-2020-07-25-195306-dvkt | prepare | False | 128.0 | 0.05 | 0.9 | 0.05 | | | | ... | | | | | | | | | | |
| 1 | tensorflow-training-2020-07-25-19-53-09-322-aw... | train | | 128.0 | | | | arn:aws:sagemaker:us-west-2:393371431575:train... | 763104351884.dkr.ecr.us-west-2.amazonaws.com/t... | 1.0 | ... | True | True | 128.0 | 50.0 | 0.6428 | 0.6428 | 0.6428 | 0.0 | 0.6428 | 1.0 |


In [49]:
sm.describe_trial_component(TrialComponentName=lineage_df.TrialComponentName[0])

{'TrialComponentName': 'TrialComponent-2020-07-25-195306-dvkt',
 'TrialComponentArn': 'arn:aws:sagemaker:us-west-2:393371431575:experiment-trial-component/trialcomponent-2020-07-25-195306-dvkt',
 'DisplayName': 'prepare',
 'CreationTime': datetime.datetime(2020, 7, 25, 19, 53, 6, 479000, tzinfo=tzlocal()),
 'CreatedBy': {},
 'LastModifiedTime': datetime.datetime(2020, 7, 25, 19, 53, 6, 699000, tzinfo=tzlocal()),
 'LastModifiedBy': {},
 'Parameters': {'balance_dataset': {'StringValue': 'False'},
  'max_seq_length': {'NumberValue': 128.0},
  'test_split_percentage': {'NumberValue': 0.05},
  'train_split_percentage': {'NumberValue': 0.9},
  'validation_split_percentage': {'NumberValue': 0.05}},
 'InputArtifacts': {'raw_data_s3_uri': {'MediaType': 's3/uri',
   'Value': 's3://sagemaker-us-west-2-393371431575/amazon-reviews-pds/tsv/'}},
 'OutputArtifacts': {'test_data_s3_uri': {'MediaType': 's3/uri',
   'Value': 's3://sagemaker-us-west-2-393371431575/sagemaker-scikit-learn-2020-07-25-19-21-4

# Analyze Debugger Rules

In [50]:
estimator.latest_training_job.rule_job_summary()

[{'RuleConfigurationName': 'LossNotDecreasing',
  'RuleEvaluationJobArn': 'arn:aws:sagemaker:us-west-2:393371431575:processing-job/tensorflow-training-2020-0-lossnotdecreasing-c3b67948',
  'RuleEvaluationStatus': 'InProgress',
  'LastModifiedTime': datetime.datetime(2020, 7, 25, 20, 2, 10, 282000, tzinfo=tzlocal())},
 {'RuleConfigurationName': 'Overtraining',
  'RuleEvaluationJobArn': 'arn:aws:sagemaker:us-west-2:393371431575:processing-job/tensorflow-training-2020-0-overtraining-6dd8f6ac',
  'RuleEvaluationStatus': 'InProgress',
  'LastModifiedTime': datetime.datetime(2020, 7, 25, 20, 2, 10, 282000, tzinfo=tzlocal())}]
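
Both rules are still `InProgress` in the output above, so the summary usually needs to be checked again later. A minimal sketch of a helper that inspects such a summary follows. It works on a hand-written list shaped like the output above, with the second status changed for illustration, and `pending_rules` is a hypothetical name.

```python
def pending_rules(rule_summaries):
    """Return the names of rules whose evaluation has not finished yet."""
    return [summary['RuleConfigurationName']
            for summary in rule_summaries
            if summary['RuleEvaluationStatus'] == 'InProgress']


# Shaped like the rule_job_summary() output above, trimmed to two fields.
sample_summary = [
    {'RuleConfigurationName': 'LossNotDecreasing',
     'RuleEvaluationStatus': 'InProgress'},
    {'RuleConfigurationName': 'Overtraining',
     'RuleEvaluationStatus': 'NoIssuesFound'},
]

still_pending = pending_rules(sample_summary)
print(still_pending)
```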

In [51]:
training_job_debugger_artifacts_path = estimator.latest_job_debugger_artifacts_path()
print(training_job_debugger_artifacts_path)

s3://sagemaker-us-west-2-393371431575/tensorflow-training-2020-07-25-19-53-09-322/debug-output


# Pass Variables to the Next Notebook(s)

In [52]:
print(training_job_name)

tensorflow-training-2020-07-25-19-53-09-322


In [53]:
%store training_job_name

Stored 'training_job_name' (str)


In [54]:
print(experiment_name)

Amazon-Customer-Reviews-BERT-Experiment-1595706786


In [55]:
%store experiment_name

Stored 'experiment_name' (str)


In [56]:
print(trial_name)

trial-1595706786


In [57]:
%store trial_name

Stored 'trial_name' (str)


In [58]:
print(prepare_trial_component_name)

TrialComponent-2020-07-25-195306-dvkt


In [59]:
%store prepare_trial_component_name

Stored 'prepare_trial_component_name' (str)


In [60]:
print(training_job_debugger_artifacts_path)

s3://sagemaker-us-west-2-393371431575/tensorflow-training-2020-07-25-19-53-09-322/debug-output


In [61]:
%store training_job_debugger_artifacts_path

Stored 'training_job_debugger_artifacts_path' (str)


In [62]:
%%javascript
Jupyter.notebook.save_checkpoint();
Jupyter.notebook.session.delete();