# V1 --- SageMaker Training Job for Base NER Model: Fine-Tuning BERT
In this notebook, I set up the training environment to run an AWS SageMaker fine-tuning job of BERT for Named Entity Recognition (NER) on the CoNLL-2003 dataset. The resulting model will be used to augment the Wikipedia toxic comments dataset into an NER dataset. I will also use the RAC (a binary classifier of toxic statements) that I trained to add a new named entity type to the modified dataset: toxic/hateful entities.

The code in this notebook for fine-tuning BERT on CoNLL-2003 was sourced from Mastering Transformers (chapter 6, starting at page 181) and adapted to run in a SageMaker training job.
The GitHub notebook for that section of the book is here: https://github.com/PacktPublishing/Mastering-Transformers/blob/main/CH06/CH06a_Fine_tuning_language_models_for_NER.ipynb

In [2]:
#!pip install datasets

In [None]:
import datasets
conll2003 = datasets.load_dataset('conll2003')

### A note about NER:
Named Entity Recognition is a token classification task: for an input sequence, the objective is to determine which words/tokens belong to which entity types in the sentence/text. The tagging scheme also distinguishes the token where an entity begins (B- tags) from the tokens that continue it (I- tags).

# Data Preparation

In [5]:
conll2003['train'][0]

{'id': '0',
 'tokens': ['EU',
  'rejects',
  'German',
  'call',
  'to',
  'boycott',
  'British',
  'lamb',
  '.'],
 'pos_tags': [22, 42, 16, 21, 35, 37, 16, 21, 7],
 'chunk_tags': [11, 21, 11, 12, 21, 22, 11, 12, 0],
 'ner_tags': [3, 0, 7, 0, 0, 0, 7, 0, 0]}

#### Here are the NER tag names with their associated meanings

O: Outside of any named entity  
B-PER: Start of a person's name  
I-PER: Inside a person's name  
B-LOC: Start of a location  
I-LOC: Inside a location  
B-ORG: Start of an organization  
I-ORG: Inside an organization  
B-MISC: Start of a miscellaneous entity  
I-MISC: Inside a miscellaneous entity  

In [6]:
example_0_tags = conll2003['train'][0]['ner_tags'] # integer encoded tags for example zero
ner_tag_names = conll2003['train'].features["ner_tags"].feature.names #get tag names 

decoded_target = [ner_tag_names[i] for i in example_0_tags] # human readable tags
original_sentence = ' '.join(conll2003['train'][0]['tokens']) # get original sentence

print(f'For the sentence: {original_sentence}\nThe tags are: {decoded_target}') # print the results

For the sentence: EU rejects German call to boycott British lamb .
The tags are: ['B-ORG', 'O', 'B-MISC', 'O', 'O', 'O', 'B-MISC', 'O', 'O']


#### The output above means that the organization 'EU' begins on the first token. 'German' and 'British' each begin a miscellaneous entity, and the remaining tokens are not part of any named entity.
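To make the B-/I- scheme concrete, here is a minimal sketch that groups a tag sequence into entity spans. The helper name `bio_to_spans` is my own illustration and is not part of the datasets or seqeval libraries; it assumes well-formed tags like the ones printed above and simply drops a stray I- tag that does not continue an open entity.

```python
def bio_to_spans(tokens, tags):
    """Group B-/I- tags into (entity_type, text) spans. Illustrative helper only."""
    spans = []
    current_type, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # a B- tag always opens a new entity, closing any open one first
            if current_tokens:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # an I- tag of the same type continues the open entity
            current_tokens.append(token)
        else:
            # 'O' (or a stray I- tag) closes any open entity
            if current_tokens:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
    if current_tokens:
        spans.append((current_type, " ".join(current_tokens)))
    return spans


tokens = ['EU', 'rejects', 'German', 'call', 'to', 'boycott', 'British', 'lamb', '.']
tags = ['B-ORG', 'O', 'B-MISC', 'O', 'O', 'O', 'B-MISC', 'O', 'O']
spans = bio_to_spans(tokens, tags)
print(spans)
```

For the example sentence this yields one ORG span and two MISC spans, each a single token long.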

# Data Preparation

##### Here I load the tokenizer, BertTokenizerFast, which will be used with DistilBERT.

In [9]:
#!pip install transformers

In [None]:
from transformers import BertTokenizerFast
tokenizer = BertTokenizerFast.from_pretrained('bert-base-cased')

##### Below I am tokenizing the sentence about the EU boycott from above. Notice that there are now 12 input_ids as opposed to the 9 original tokens. This is because the tokenizer adds two special tokens ([CLS] and [SEP]) and, being a subword tokenizer, can split a single word into several pieces.

In [11]:
# NER datasets like conll2003 are already split into words, so we pass the word list and tell the tokenizer so
tokenizer(conll2003['train'][0]['tokens'], is_split_into_words=True)

{'input_ids': [101, 7270, 22961, 1528, 1840, 1106, 21423, 1418, 2495, 12913, 119, 102], 'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}

##### Because we are using a subword tokenizer, the input_ids sequence can be longer than the original list of labels. The function below addresses this discrepancy.

In [12]:
label_all_tokens = True # if True, every subword piece of a word gets that word's label; if False, only the first piece does

def tokenize_and_align(examples):
    '''
    Aligns the NER labels with the tokens produced by a subword tokenizer.
    First, it tokenizes the text. Then it iterates through the word ids of each sentence and, through a
    conditional statement, assigns special tokens and subword pieces the appropriate label. This is
    accomplished by comparing the word index of each token to the previous word index.

    This is necessary data prep for this task given the use of a subword tokenizer.

    This function is meant to be mapped over the entire dataset.
    '''

    tokenized_inputs = tokenizer(examples["tokens"], truncation=True, is_split_into_words=True) # tokenize the batch of sentences
    labels = [] # initialize a list to store the label ids

    for i, label in enumerate(examples["ner_tags"]): # iterate over the word-level labels of each sentence
        word_ids = tokenized_inputs.word_ids(batch_index=i) # get the word ids for the current sentence
        previous_word_idx = None
        label_ids = []
        for word_idx in word_ids:
            if word_idx is None: # special tokens like [CLS] and [SEP] have a word id of None
                label_ids.append(-100) # so we set the label id to -100, which the loss ignores
            elif word_idx != previous_word_idx: # if the word id is different from the previous one, it is a new word
                label_ids.append(label[word_idx])  # so we append the label id of that word
            else:
                # subword pieces of a single word share the same word id, so we either
                # repeat the word's label or use -100 if we don't want to label subword pieces
                label_ids.append(label[word_idx] if label_all_tokens else -100)
            previous_word_idx = word_idx  # update the previous word id
        labels.append(label_ids) # append the list of label ids for the current sentence
    tokenized_inputs["labels"] = labels # store the aligned, token-level labels for the batch
    return tokenized_inputs
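The alignment rule can be checked without loading a tokenizer. The sketch below repeats the same conditional on a hard-coded list of word ids; those word ids are an assumption on my part (they presume that 'lamb' is the one word split into two pieces, which is consistent with the 12 input_ids shown earlier). In the real function they come from `tokenized_inputs.word_ids()`. The helper name `align_labels` is hypothetical.

```python
def align_labels(word_ids, word_labels, label_all_tokens=True):
    """Expand word-level labels to token level using the same rule as tokenize_and_align."""
    aligned = []
    previous_word_idx = None
    for word_idx in word_ids:
        if word_idx is None:
            aligned.append(-100)                     # special token
        elif word_idx != previous_word_idx:
            aligned.append(word_labels[word_idx])    # first piece of a word
        else:
            # later piece of the same word
            aligned.append(word_labels[word_idx] if label_all_tokens else -100)
        previous_word_idx = word_idx
    return aligned


# Assumed word ids for "EU rejects German call to boycott British lamb ."
word_ids = [None, 0, 1, 2, 3, 4, 5, 6, 7, 7, 8, None]
ner_tags = [3, 0, 7, 0, 0, 0, 7, 0, 0]

labels_all = align_labels(word_ids, ner_tags, label_all_tokens=True)
labels_first = align_labels(word_ids, ner_tags, label_all_tokens=False)
print(labels_all)
print(labels_first)
```

With `label_all_tokens=False`, only the first piece of each word keeps a label, so the second piece of the split word is masked with -100 and does not contribute to the loss.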

##### Here we map the tokenize_and_align function from above over the entire dataset. Note that .map() here is a method of the datasets library, not Python's built-in map() function.

In [None]:
tokenized_dataset = conll2003.map(tokenize_and_align, batched=True)

##### This is the structure of the tokenized_dataset

In [14]:
tokenized_dataset

DatasetDict({
    train: Dataset({
        features: ['id', 'tokens', 'pos_tags', 'chunk_tags', 'ner_tags', 'input_ids', 'token_type_ids', 'attention_mask', 'labels'],
        num_rows: 14041
    })
    validation: Dataset({
        features: ['id', 'tokens', 'pos_tags', 'chunk_tags', 'ner_tags', 'input_ids', 'token_type_ids', 'attention_mask', 'labels'],
        num_rows: 3250
    })
    test: Dataset({
        features: ['id', 'tokens', 'pos_tags', 'chunk_tags', 'ner_tags', 'input_ids', 'token_type_ids', 'attention_mask', 'labels'],
        num_rows: 3453
    })
})

##### Below is a single example from the train set. The last four key value pairs are new additions.

In [15]:
tokenized_dataset["train"][0]

{'id': '0',
 'tokens': ['EU',
  'rejects',
  'German',
  'call',
  'to',
  'boycott',
  'British',
  'lamb',
  '.'],
 'pos_tags': [22, 42, 16, 21, 35, 37, 16, 21, 7],
 'chunk_tags': [11, 21, 11, 12, 21, 22, 11, 12, 0],
 'ner_tags': [3, 0, 7, 0, 0, 0, 7, 0, 0],
 'input_ids': [101,
  7270,
  22961,
  1528,
  1840,
  1106,
  21423,
  1418,
  2495,
  12913,
  119,
  102],
 'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
 'labels': [-100, 3, 0, 7, 0, 0, 0, 7, 0, 0, 0, -100]}

##### Here I am saving the tokenized dataset locally.

In [None]:
tokenized_dataset.save_to_disk('tokenized_conll2003')

##### This is because I am going to upload the dataset to S3 so that the Hugging Face estimator can access it during training.
##### Below I am creating a new bucket to store the data (and model) in.

In [17]:
import boto3

s3 = boto3.client('s3', region_name='us-east-2')
bucket_name = 'conll2003-task'

# Create a new bucket in the us-east-2 region
s3.create_bucket(Bucket=bucket_name, CreateBucketConfiguration={'LocationConstraint': 'us-east-2'})

# Check the location of an existing bucket
response = s3.get_bucket_location(Bucket=bucket_name)
bucket_location = response['LocationConstraint']
print('Bucket location:', bucket_location)

Bucket location: us-east-2
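One detail worth knowing if this cell is reused in another region: S3 treats us-east-1 as the default location, and `create_bucket` rejects an explicit `LocationConstraint` of us-east-1. The sketch below builds the keyword arguments accordingly; the helper name `create_bucket_kwargs` is hypothetical and only assembles a dictionary, it does not call AWS.

```python
def create_bucket_kwargs(bucket_name, region):
    """Build keyword arguments for an S3 create_bucket call (illustrative helper)."""
    kwargs = {"Bucket": bucket_name}
    # us-east-1 is the default location and must not be passed as a LocationConstraint
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs


print(create_bucket_kwargs("conll2003-task", "us-east-2"))
print(create_bucket_kwargs("conll2003-task", "us-east-1"))
```

It could then be used as `s3.create_bucket(**create_bucket_kwargs(bucket_name, 'us-east-2'))` with the client created above.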


##### Here I upload the data to the bucket just created

In [22]:
import os

def upload_directory_to_s3(directory_path, s3_bucket, s3_key_prefix=''):
    '''
    Uploads a local directory (e.g. a saved dataset) to the S3 bucket
    so the estimator can access it. Files are stored under a prefix
    named after the source directory.
    '''
    directory_name = os.path.basename(directory_path)
    for root, dirs, files in os.walk(directory_path):
        for file in files:
            file_path = os.path.join(root, file)
            s3_key = os.path.join(s3_key_prefix, directory_name, os.path.relpath(file_path, directory_path))
            s3_bucket.Object(s3_key).upload_file(Filename=file_path)
            
#specify bucket for upload
s3 = boto3.resource('s3')
bucket_name = 'conll2003-task'
bucket = s3.Bucket(bucket_name)

#folder path to dataset
folder_path = "/home/ec2-user/SageMaker/NER_training_sagemaker/tokenized_conll2003"

upload_directory_to_s3(folder_path, bucket)
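The key layout produced by the upload function can be previewed without touching S3. Below is a small sketch of the key construction only; `build_s3_key` is a hypothetical helper, and the file name in the example is purely illustrative. It differs from the function above in two hedged ways: it normalizes the directory path first (with a trailing slash, `os.path.basename` would return an empty string), and it joins the parts with forward slashes, which is what S3 keys use regardless of the local operating system.

```python
import os
import posixpath


def build_s3_key(file_path, directory_path, s3_key_prefix=""):
    """Return the S3 key for a local file, rooted at the directory's own name."""
    # normpath drops a trailing slash so basename returns the folder name
    directory_name = os.path.basename(os.path.normpath(directory_path))
    relative = os.path.relpath(file_path, directory_path)
    parts = [p for p in (s3_key_prefix, directory_name) if p]
    parts.extend(relative.split(os.sep))
    # S3 keys always use forward slashes
    return posixpath.join(*parts)


key = build_s3_key(
    "/home/ec2-user/SageMaker/NER_training_sagemaker/tokenized_conll2003/train/example_file.json",
    "/home/ec2-user/SageMaker/NER_training_sagemaker/tokenized_conll2003/",
)
print(key)
```

This layout is why the training job later reads from `s3://conll2003-task/tokenized_conll2003/train/`.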

##### Now that the data is uploaded to S3, it can be accessed from within the training container.

# Model Preparation

In [25]:
#!pip install transformers

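##### I am downloading distilbert-base-uncased for this fine-tuning job. Note that the tokenizer loaded earlier comes from bert-base-cased, whose vocabulary differs from the one distilbert-base-uncased was pretrained with, so its token ids will not line up with this checkpoint's embeddings; a tokenizer and model taken from the same checkpoint would avoid this.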

In [35]:
from transformers import AutoModelForTokenClassification

#there are 9 labels for the token classification task because there are 9 ner-tags
model = AutoModelForTokenClassification.from_pretrained("distilbert-base-uncased", num_labels=9) 

##### We will also need a data collator. Its job is to assemble individual examples into batches in the format the model expects, for example by padding each sequence (and its labels) to the length of the longest sequence in the batch.

In [36]:
from transformers import DataCollatorForTokenClassification

data_collator = DataCollatorForTokenClassification(tokenizer) 
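To illustrate what the padding step amounts to, here is a simplified pure-Python sketch. It is not the library's implementation (the real collator returns tensors and handles more cases); `pad_batch` and the toy feature values are my own, and the pad token id of 0 is an assumption. Labels are padded with -100 so that padded positions are ignored by the loss, just like the special tokens earlier.

```python
def pad_batch(features, pad_token_id=0, label_pad_id=-100):
    """Pad input_ids, attention_mask and labels to the longest example in the batch."""
    max_len = max(len(f["input_ids"]) for f in features)
    batch = {"input_ids": [], "attention_mask": [], "labels": []}
    for f in features:
        length = len(f["input_ids"])
        n_pad = max_len - length
        batch["input_ids"].append(f["input_ids"] + [pad_token_id] * n_pad)
        # real tokens get mask 1, padding gets mask 0
        batch["attention_mask"].append([1] * length + [0] * n_pad)
        batch["labels"].append(f["labels"] + [label_pad_id] * n_pad)
    return batch


# toy examples of different lengths (values are illustrative)
features = [
    {"input_ids": [101, 7270, 119, 102], "labels": [-100, 3, 0, -100]},
    {"input_ids": [101, 102], "labels": [-100, -100]},
]
batch = pad_batch(features)
print(batch)
```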

##### I am saving the model and tokenizer (from above) to the disk in order to upload to s3 and access within the training docker.

In [None]:
model.save_pretrained("distilbert-base-uncased")
tokenizer.save_pretrained("bert-tokenizer-fast")

##### Below I upload them

In [32]:
folder_paths = [
    "/home/ec2-user/SageMaker/NER_training_sagemaker/bert-tokenizer-fast",
    "/home/ec2-user/SageMaker/NER_training_sagemaker/distilbert-base-uncased"
]

#upload both tokenizer and model
for folder_path in folder_paths:
    upload_directory_to_s3(folder_path, bucket)

# Overview of Training Metric

In [3]:
#!pip install seqeval

##### Within the training script, we must provide a compute_metrics() function to the Trainer object. Below is that function. It prepares model outputs so that precision, recall, f1, and accuracy can be computed with seqeval.

In [58]:
import numpy as np
metric = datasets.load_metric("seqeval") #load in seqeval metric
    

def compute_metrics(p): 
    '''
        This function unpacks the predictions and labels from p. It then applies argmax to the prediction logits, which converts
        them to indices into label_list. true_predictions is built with a list comprehension that converts those indices
        to their label names, skipping positions whose label is -100. true_labels is built by the analogous operation on the
        target label indices. Finally, true_predictions and true_labels are evaluated for precision, recall, f1, and accuracy using the seqeval package.
    '''
    #NER labels specific to the conll2003 task
    label_list = ['O', 'B-PER', 'I-PER', 'B-ORG', 'I-ORG', 'B-LOC', 'I-LOC', 'B-MISC', 'I-MISC'] 
        
    #unpack predictions
    predictions, labels = p 
        
    #get prediction indices for use in label_list by argmaxing logits
    predictions = np.argmax(predictions, axis=2) 
    
    #prediction indices ---> labels
    true_predictions = [ 
        [label_list[pred] for (pred, lab) in zip(prediction, label) if lab != -100] for prediction, label in zip(predictions, labels) 
    ] 
    
    #Ground truth indices ---> labels
    true_labels = [ 
        [label_list[lab] for (pred, lab) in zip(prediction, label) if lab != -100] for prediction, label in zip(predictions, labels) 
    ] 
    
    #get score
    results = metric.compute(predictions=true_predictions, references=true_labels) 
    
    return { 
        "precision": results["overall_precision"], 
        "recall": results["overall_recall"], 
        "f1": results["overall_f1"], 
        "accuracy": results["overall_accuracy"], 
    } 

#### Here I am running an example forward pass to test the compute_metrics() function

In [53]:
import torch

#get example from pre-processed dataset
tokenized_example = tokenized_dataset['train'][0]

# Convert lists to PyTorch tensors and add a batch dimension
input_ids = torch.tensor([tokenized_example['input_ids']])
#token_type_ids = torch.tensor([tokenized_example['token_type_ids']])
attention_mask = torch.tensor([tokenized_example['attention_mask']])

# Run a forward pass through the model
with torch.no_grad():
    outputs = model(input_ids, attention_mask=attention_mask)

# Extract the logits
logits = outputs.logits

# get labels 
labels = torch.tensor([tokenized_dataset['train'][0]['labels']])


##### The shapes output below suggest everything is in order. The labels are integer encoded. After argmaxing the logits over the last dimension, the predictions will have the same shape and integer encoding as the labels. The two can then be evaluated in the compute_metrics function.

In [56]:
print(f'logits shape {logits.shape}')
print(f'labels shape {labels.shape}')

logits shape torch.Size([1, 12, 9])
labels shape torch.Size([1, 12])


##### Evaluate forward pass against labels

In [59]:
p = [logits, labels]
result = compute_metrics(p)

  _warn_prf(average, modifier, msg_start, len(result))


##### This low performance is to be expected: the token classification head is randomly initialized and the pretrained model has not been fine-tuned yet.

In [60]:
print(result)

{'precision': 0.1111111111111111, 'recall': 0.3333333333333333, 'f1': 0.16666666666666666, 'accuracy': 0.1}
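The scores above are entity-level (except accuracy, which is token-level), and the arithmetic can be reproduced by hand. The example sentence has three gold entities (EU, German, British). The counts of predicted and correct entities below are assumptions inferred from the reported scores, not read from the model output: one correct entity out of nine predicted would give exactly these numbers.

```python
gold_entities = 3        # EU, German, British
predicted_entities = 9   # assumed: inferred from the reported precision
correct_entities = 1     # assumed: inferred from the reported recall

precision = correct_entities / predicted_entities
recall = correct_entities / gold_entities
f1 = 2 * precision * recall / (precision + recall)

# token-level accuracy: 12 tokens minus the 2 special tokens labelled -100
scored_tokens = 10
correct_tokens = 1       # assumed: inferred from the reported accuracy
accuracy = correct_tokens / scored_tokens

print(precision, recall, f1, accuracy)
```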


# Sagemaker Env


In [1]:
import sagemaker
sess = sagemaker.Session() # creates a SageMaker session
role = sagemaker.get_execution_role() # gets the IAM execution role from the environment where
                                      # this is running. I am running in a SageMaker notebook instance

# HuggingFace Estimator
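##### The Hugging Face estimator launches a training job on the specified instance type, using a prebuilt Hugging Face container that matches the requested transformers, PyTorch, and Python versions. It runs the train.py training script inside that container and passes the hyperparameters to it as command-line arguments.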

In [2]:
from sagemaker.huggingface import HuggingFace


# hyperparameters which are passed to the training job
hyperparameters={'epochs': 3,
                 'per_device_train_batch_size': 16,
                 'per_device_eval_batch_size': 16#,
                 #'model_name_or_path': 's3://conll2003-task/distilbert-base-uncased/',
                 #'tokenizer_name_or_path': 's3://conll2003-task/bert-tokenizer-fast/'
                 }

# create the Estimator
huggingface_estimator = HuggingFace(
        entry_point='train.py',
        source_dir='/home/ec2-user/SageMaker/NER_training_sagemaker', #change this to training script location
        instance_type='ml.g4dn.xlarge',
        instance_count=1,
        role=role,
        transformers_version='4.28.1',
        pytorch_version='2.0.0',
        py_version='py310',
        hyperparameters = hyperparameters
)
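Since train.py itself is not shown in this notebook, here is a hypothetical sketch of how such a script could receive its inputs. SageMaker passes each entry of `hyperparameters` as a `--key value` command-line argument, and it exposes the input channels and the model output directory through environment variables such as `SM_CHANNEL_TRAIN`, `SM_CHANNEL_TEST`, and `SM_MODEL_DIR` (the channel names match the keys passed to `fit()`). The argument names `--train_dir`, `--test_dir`, and `--model_dir` are my own choices for illustration.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse hyperparameters and SageMaker paths (hypothetical train.py fragment)."""
    parser = argparse.ArgumentParser()
    # hyperparameters from the estimator arrive as --key value pairs
    parser.add_argument("--epochs", type=int, default=3)
    parser.add_argument("--per_device_train_batch_size", type=int, default=16)
    parser.add_argument("--per_device_eval_batch_size", type=int, default=16)
    # input channels and the output directory are exposed via environment variables
    parser.add_argument("--train_dir", type=str,
                        default=os.environ.get("SM_CHANNEL_TRAIN", "/opt/ml/input/data/train"))
    parser.add_argument("--test_dir", type=str,
                        default=os.environ.get("SM_CHANNEL_TEST", "/opt/ml/input/data/test"))
    parser.add_argument("--model_dir", type=str,
                        default=os.environ.get("SM_MODEL_DIR", "/opt/ml/model"))
    # ignore any extra arguments the container may append
    args, _ = parser.parse_known_args(argv)
    return args


args = parse_args(["--epochs", "3", "--per_device_train_batch_size", "16"])
print(args.epochs, args.per_device_train_batch_size, args.train_dir)
```

Inside the script, the tokenized splits uploaded earlier could then be loaded from `args.train_dir` and `args.test_dir`, and the fine-tuned model saved to `args.model_dir` so that SageMaker uploads it to S3 when the job finishes.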

In [4]:
huggingface_estimator.fit({
    'train': "s3://conll2003-task/tokenized_conll2003/train/", 
    'test': "s3://conll2003-task/tokenized_conll2003/validation/"
    })

Using provided s3_resource


INFO:sagemaker.image_uris:image_uri is not presented, retrieving image_uri based on instance_type, framework etc.
INFO:sagemaker:Creating training-job with name: huggingface-pytorch-training-2023-08-28-22-06-28-271


2023-08-28 22:06:46 Starting - Starting the training job...
2023-08-28 22:07:02 Starting - Preparing the instances for training......
2023-08-28 22:07:54 Downloading - Downloading input data...
2023-08-28 22:08:19 Training - Downloading the training image................................................
2023-08-28 22:16:37 Training - Training image download completed. Training in progress..
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
2023-08-28 22:16:49,156 sagemaker-training-toolkit INFO     Imported framework sagemaker_pytorch_container.training
2023-08-28 22:16:49,175 sagemaker-training-toolkit INFO     No Neurons detected (normal if no neurons installed)
2023-08-28 22:16:49,184 sagemaker_pytorch_container.training INFO     Block until all host DNS lookups succeed.
2023-08-28 22:16:49,190 sagemaker_pytorch_container.training INFO     Invoking user training script.


Preparing metadata (setup.py): finished with status 'done'
Building wheels for collected packages: seqeval
Building wheel for seqeval (setup.py): started
Building wheel for seqeval (setup.py): finished with status 'done'
Created wheel for seqeval: filename=seqeval-1.2.2-py3-none-any.whl size=16165 sha256=c7e64cb2a1f2c84fdf7bef3c05bd303ca74695a29e1532622cbccab7bfd0fcbd
Stored in directory: /root/.cache/pip/wheels/1a/67/4a/ad4082dd7dfc30f2abfe4d80a2ed5926a506eb8a972b4767fa
Successfully built seqeval
Installing collected packages: seqeval
Successfully installed seqeval-1.2.2
[notice] A new release of pip is available: 23.1.2 -> 23.2.1
[notice] To update, run: pip install --upgrade pip
  metric = datasets.load_metric("seqeval") #load in seqeval metric after install
...
{'loss': 0.3056, 'learning_rate': 1.9400244798041617e-05, 'epoch': 2.28}
...

2023-08-28 22:20:53 Uploading - Uploading generated training model