# **PSST dataset spelling correction training script using machine translation**

### **Objective: Spelling correction training for PSST dataset speakers using machine translation**

### **Ensure that the GPU and RAM are set up: both are needed for training**

In [1]:
gpu_info = !nvidia-smi
gpu_info = '\n'.join(gpu_info)
if gpu_info.find('failed') >= 0:
  print('Not connected to a GPU')
else:
  print(gpu_info)

Thu Jul 13 16:30:06 2023       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02              Driver Version: 530.30.02    CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  Tesla V100-SXM2-32GB            Off| 00000000:AF:00.0 Off |                    0 |
| N/A   40C    P0               59W / 300W|      0MiB / 32768MiB |      1%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                    

In [2]:
# ensure enough memory present so that training does not stop
from psutil import virtual_memory
ram_gb = virtual_memory().total / 1e9
print('Your runtime has {:.1f} gigabytes of available RAM\n'.format(ram_gb))

if ram_gb < 20:
  print('Not using a high-RAM runtime')
else:
  print('You are using a high-RAM runtime!')

Your runtime has 201.2 gigabytes of available RAM

You are using a high-RAM runtime!


### **Install the libraries**

In [3]:
# Install required libraries
!pip install datasets
!pip install transformers==4.28.0
!pip install accelerate
!pip install jiwer
!pip install huggingface_hub


### **Import libraries**

In [4]:
# Import libraries
import torch
from transformers import BartTokenizerFast, BartForConditionalGeneration, Seq2SeqTrainingArguments, Seq2SeqTrainer
from datasets import load_dataset
from sklearn.model_selection import train_test_split
from jiwer import wer
from huggingface_hub import notebook_login

In [5]:
# Login to Hugging Face
notebook_login()


In [6]:
# Load the train, valid and test datasets from the JSON files
dataset = load_dataset('json', data_files={
    'train':'/work/van-speech-nlp/spelling_correction/json files/updated_train_data.json',
    'valid':'/work/van-speech-nlp/spelling_correction/json files/updated_valid_data.json',
    'test': '/work/van-speech-nlp/spelling_correction/json files/updated_test_data.json'
})

Found cached dataset json (/home/chakraborti.m/.cache/huggingface/datasets/json/default-e2c510fd398ee096/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)



In [7]:
print(dataset)

DatasetDict({
    train: Dataset({
        features: ['prompt', 'actual', 'prediction', 'id'],
        num_rows: 24790
    })
    valid: Dataset({
        features: ['prompt', 'actual', 'prediction', 'id'],
        num_rows: 3612
    })
    test: Dataset({
        features: ['prompt', 'actual', 'prediction', 'id'],
        num_rows: 7234
    })
})


In [8]:
print(dataset['train'][:5])
print(dataset['valid'][:5])
print(dataset['test'][:5])

{'prompt': ['house', 'house', 'comb', 'comb', 'toothbrush'], 'actual': ['HH AW TH', 'HH AW S', 'K AH UH M', 'K OW M', 'T UW TH B R AH SH'], 'prediction': ['HH AW S', 'HH AW S', 'K OW M', 'K OW M', 'T UW TH B  R AH SH'], 'id': ['ACWT02a-BNT01-house', 'ACWT02a-BNT01-house', 'ACWT02a-BNT02-comb', 'ACWT02a-BNT02-comb', 'ACWT02a-BNT03-toothbrush']}
{'prompt': ['house', 'house', 'comb', 'comb', 'toothbrush'], 'actual': ['HH AW TH', 'HH AW S', 'K AH UH M', 'K OW M', 'T UW TH B R AH SH'], 'prediction': ['HH AW S', 'HH AW S', 'K OW M', 'K OW M', 'T UW TH B  R AH SH'], 'id': ['BU01a-BNT01-house', 'BU01a-BNT01-house', 'BU01a-BNT02-comb', 'BU01a-BNT02-comb', 'BU01a-BNT03-toothbrush']}
{'prompt': ['house', 'house', 'comb', 'comb', 'toothbrush'], 'actual': ['HH AW TH', 'HH AW S', 'K AH UH M', 'K OW M', 'T UW TH B R AH SH'], 'prediction': ['HH AW S', 'HH AW S', 'K OW M', 'K OW M', 'B  R AH SH'], 'id': ['ACWT01a-BNT01-house', 'ACWT01a-BNT01-house', 'ACWT01a-BNT02-comb', 'ACWT01a-BNT02-comb', 'ACWT01a-

In [9]:
# assign the train and valid dataset
train_dataset = dataset['train']
val_dataset = dataset['valid']

In [10]:
# Check column names in train dataset
print(train_dataset.column_names)

# Check column names in validation dataset
print(val_dataset.column_names)

['prompt', 'actual', 'prediction', 'id']
['prompt', 'actual', 'prediction', 'id']


In [11]:
print(train_dataset['actual'][:5])
print(train_dataset['prediction'][:5])
#print(val_data[:5])

['HH AW TH', 'HH AW S', 'K AH UH M', 'K OW M', 'T UW TH B R AH SH']
['HH AW S', 'HH AW S', 'K OW M', 'K OW M', 'T UW TH B  R AH SH']


In [12]:
# Define the source and target language columns
source_lang = 'prediction'
target_lang = 'actual'

In [13]:
print(source_lang)

prediction


In [14]:
# Define the max_length for padding and truncation
max_length = 512

The preprocessing function prepares the data for training and evaluation. It prefixes each source string with the source column name ("prediction: "), tokenizes the prefixed inputs and the targets with the BART tokenizer, and returns a dictionary containing input_ids, attention_mask, and labels. This keeps the input data in the format the model expects.

In [15]:
# Initialize the tokenizer
tokenizer = BartTokenizerFast.from_pretrained('facebook/bart-base')

# Tokenize the data
# preprocess_function expects a batch (a dict of column lists), as passed by Dataset.map(batched=True)
def preprocess_function(examples):
    inputs = [f'{source_lang}: {text}' for text in examples[source_lang]]
    targets = examples[target_lang]
    encoding = tokenizer(inputs, padding=True, truncation=True, return_tensors='pt', max_length=max_length)
    model_inputs = {
        'input_ids': encoding['input_ids'].squeeze(),
        'attention_mask': encoding['attention_mask'].squeeze(),
        'labels': tokenizer(targets, padding=True, truncation=True, return_tensors='pt')['input_ids'].squeeze()
    }
    return model_inputs

In [16]:
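# Select the first data point from the train dataset
# Note: a single row holds plain strings rather than lists, so preprocess_function
# iterates over the characters of the 'prediction' string (one output row per character)
sample_data = train_dataset[0]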

# Call the preprocess function on the sample data
processed_data = preprocess_function(sample_data)

# Inspect the output
print(processed_data)

{'input_ids': tensor([[    0, 37466, 26579,    35,   289,     2,     1],
        [    0, 37466, 26579,    35,   289,     2,     1],
        [    0, 37466, 26579,    35,  1437,  1437,     2],
        [    0, 37466, 26579,    35,    83,     2,     1],
        [    0, 37466, 26579,    35,   305,     2,     1],
        [    0, 37466, 26579,    35,  1437,  1437,     2],
        [    0, 37466, 26579,    35,   208,     2,     1]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 0],
        [1, 1, 1, 1, 1, 1, 0],
        [1, 1, 1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1, 1, 0],
        [1, 1, 1, 1, 1, 1, 0],
        [1, 1, 1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1, 1, 0]]), 'labels': tensor([    0, 31901, 18463,  8640,     2])}
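The seven rows in the output above follow from how Python iterates over a string. A minimal sketch, using a hypothetical helper `build_inputs` that mirrors only the prefixing step of `preprocess_function` (no tokenizer is involved), shows the difference between passing a single row and passing a one-row batch such as `train_dataset[:1]`:

```python
# Hypothetical helper: reproduces only the input-prefixing line of preprocess_function.
def build_inputs(examples, source_lang="prediction"):
    return [f"{source_lang}: {text}" for text in examples[source_lang]]

single_row = {"prediction": "HH AW S"}       # same shape as train_dataset[0]
one_row_batch = {"prediction": ["HH AW S"]}  # same shape as train_dataset[:1]

per_char = build_inputs(single_row)        # iterates over the 7 characters of the string
per_example = build_inputs(one_row_batch)  # iterates over the list of examples

print(len(per_char))   # 7
print(per_example)     # ['prediction: HH AW S']
```

The batched `map` call in the next cell passes column lists, so it is not affected by this.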


In [17]:
# Apply preprocess_function to train_dataset and val_dataset in batches
train_dataset = train_dataset.map(preprocess_function, batched=True)
val_dataset = val_dataset.map(preprocess_function, batched=True)

Loading cached processed dataset at /home/chakraborti.m/.cache/huggingface/datasets/json/default-e2c510fd398ee096/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96/cache-d1b9ff4ce23a08ad.arrow
Loading cached processed dataset at /home/chakraborti.m/.cache/huggingface/datasets/json/default-e2c510fd398ee096/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96/cache-b6c23662b833f59b.arrow


In [18]:
# Access a few samples from train_dataset
for i in range(5):
    sample_input_ids = train_dataset['input_ids'][i]
    sample_attention_mask = train_dataset['attention_mask'][i]
    sample_labels = train_dataset['labels'][i]

    print(f"Sample {i+1}:")
    print("Input IDs:", sample_input_ids)
    print("Attention Mask:", sample_attention_mask)
    print("Labels:", sample_labels)
    print()

Sample 1:
Input IDs: [0, 37466, 26579, 35, 42339, 18463, 208, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
Attention Mask: [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Labels: [0, 31901, 18463, 8640, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

Sample 2:
Input IDs: [0, 37466, 26579, 35, 42339, 18463, 208, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
Attention Mask: [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Labels: [0, 31901, 18463, 208, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

Sample 3:
Input IDs: [0, 37466, 26579, 35, 229, 40395, 256, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
Attention Mask: [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0

A data loader, as provided by frameworks such as PyTorch, loads the dataset and groups samples into batches during training or evaluation, so that the model receives the data one batch at a time.
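As an illustration of the batching idea only, here is a simplified pure-Python stand-in. It is not PyTorch's DataLoader (no shuffling, workers, or tensors), and `iter_batches` is a hypothetical name:

```python
# Simplified stand-in for a data loader: yields consecutive batches from a list of samples.
def iter_batches(samples, batch_size):
    for start in range(0, len(samples), batch_size):
        # The last batch may be smaller than batch_size.
        yield samples[start:start + batch_size]

batches = list(iter_batches(list(range(5)), batch_size=2))
print(batches)  # [[0, 1], [2, 3], [4]]
```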

### **Train the model**

A data collator takes a list of samples from a dataset and collates them into a single batch that the model can process during training or evaluation. It works together with the data loader to produce batches in the input format the model requires.

In [19]:
# clear out cuda memory
import torch
torch.cuda.empty_cache()

In [20]:
# define a data_collator function for batch processing
def data_collator(features):
    batch = {}
    # Pad input_ids and attention_mask to the longest input_ids sequence within the batch
    # (a local name is used so the global max_length is not shadowed)
    batch_max_length = max(len(feature['input_ids']) for feature in features)
    batch['input_ids'] = torch.stack([torch.tensor(feature['input_ids'] + [tokenizer.pad_token_id] * (batch_max_length - len(feature['input_ids']))) for feature in features])
    batch['attention_mask'] = torch.stack([torch.tensor(feature['attention_mask'] + [0] * (batch_max_length - len(feature['attention_mask']))) for feature in features])
    # Pad labels to the same length with -100, the index ignored by the loss
    batch['labels'] = torch.stack([torch.tensor(feature['labels'] + [-100] * (batch_max_length - len(feature['labels']))) for feature in features])
    return batch

The data loader is responsible for loading and batching the data, while the data collator is responsible for formatting and aligning the data within each batch. They serve different functions in the training process.
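The collation step can be sketched without tensors. The version below is a variation on the notebook's collator, not a copy of it: it additionally replaces pad tokens that are already present in the labels with -100, on the assumption that padded label positions should not contribute to the loss. `collate`, `PAD_ID`, and `IGNORE_INDEX` are hypothetical names; the pad id of 1 is taken from the `padding_idx=1` shown in the model printout.

```python
PAD_ID = 1           # BART pad token id (assumed from padding_idx=1 in the model printout)
IGNORE_INDEX = -100  # label value skipped by the loss

def collate(features, pad_id=PAD_ID, ignore_index=IGNORE_INDEX):
    max_in = max(len(f["input_ids"]) for f in features)
    max_lab = max(len(f["labels"]) for f in features)
    batch = {"input_ids": [], "attention_mask": [], "labels": []}
    for f in features:
        n_pad = max_in - len(f["input_ids"])
        batch["input_ids"].append(f["input_ids"] + [pad_id] * n_pad)
        batch["attention_mask"].append(f["attention_mask"] + [0] * n_pad)
        # Mask pad tokens already in the labels, then pad to the longest label sequence.
        labels = [ignore_index if t == pad_id else t for t in f["labels"]]
        batch["labels"].append(labels + [ignore_index] * (max_lab - len(labels)))
    return batch

features = [
    {"input_ids": [0, 5, 2], "attention_mask": [1, 1, 1], "labels": [0, 7, 2, 1]},
    {"input_ids": [0, 5, 6, 2], "attention_mask": [1, 1, 1, 1], "labels": [0, 2]},
]
batch = collate(features)
print(batch["labels"])  # [[0, 7, 2, -100], [0, 2, -100, -100]]
```

Padding labels to their own maximum length, rather than to the input length, avoids a size mismatch when a target is longer than every input in the batch.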

In [21]:
# Define training arguments
training_args = Seq2SeqTrainingArguments(
    output_dir="PSST_spell_correction_V3",
    evaluation_strategy="epoch",
    learning_rate=1e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    weight_decay=0.01,
    save_total_limit=3,
    num_train_epochs=30,
    predict_with_generate=True,
    push_to_hub=True,
)

In [22]:
# verify passing the correct inputs to the trainer
print("Train Dataset:", train_dataset)
print("Validation Dataset:", val_dataset)
print("Tokenizer:", tokenizer)
print("Training Arguments:", training_args)

Train Dataset: Dataset({
    features: ['prompt', 'actual', 'prediction', 'id', 'input_ids', 'attention_mask', 'labels'],
    num_rows: 24790
})
Validation Dataset: Dataset({
    features: ['prompt', 'actual', 'prediction', 'id', 'input_ids', 'attention_mask', 'labels'],
    num_rows: 3612
})
Tokenizer: BartTokenizerFast(name_or_path='facebook/bart-base', vocab_size=50265, model_max_length=1024, is_fast=True, padding_side='right', truncation_side='right', special_tokens={'bos_token': '<s>', 'eos_token': '</s>', 'unk_token': '<unk>', 'sep_token': '</s>', 'pad_token': '<pad>', 'cls_token': '<s>', 'mask_token': AddedToken("<mask>", rstrip=False, lstrip=True, single_word=False, normalized=False)}, clean_up_tokenization_spaces=True)
Training Arguments: Seq2SeqTrainingArguments(
_n_gpu=1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
bf16=False,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dat

In [23]:
# The model is initialized with the BartForConditionalGeneration class and moved to the GPU if available.
model = BartForConditionalGeneration.from_pretrained('facebook/bart-base')
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)

BartForConditionalGeneration(
  (model): BartModel(
    (shared): Embedding(50265, 768, padding_idx=1)
    (encoder): BartEncoder(
      (embed_tokens): Embedding(50265, 768, padding_idx=1)
      (embed_positions): BartLearnedPositionalEmbedding(1026, 768)
      (layers): ModuleList(
        (0): BartEncoderLayer(
          (self_attn): BartAttention(
            (k_proj): Linear(in_features=768, out_features=768, bias=True)
            (v_proj): Linear(in_features=768, out_features=768, bias=True)
            (q_proj): Linear(in_features=768, out_features=768, bias=True)
            (out_proj): Linear(in_features=768, out_features=768, bias=True)
          )
          (self_attn_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
          (activation_fn): GELUActivation()
          (fc1): Linear(in_features=768, out_features=3072, bias=True)
          (fc2): Linear(in_features=3072, out_features=768, bias=True)
          (final_layer_norm): LayerNorm((768,), eps=1e-05,

In [24]:
# The Seq2SeqTrainer is created with the defined model, training arguments, datasets, tokenizer, and data_collator
trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    tokenizer=tokenizer,
    data_collator=data_collator
)

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
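The warning names its own remedy. A minimal sketch, assuming it is run in a cell before the tokenizer is first used:

```python
import os

# Disable the tokenizers library's internal parallelism so that forked
# processes no longer trigger the warning.
os.environ["TOKENIZERS_PARALLELISM"] = "false"
```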

/work/van-speech-nlp/spelling_correction/training and evaluation/PSST_spell_correction_V3 is already a clone of https://huggingface.co/monideep2255/PSST_spell_correction_V3. Make sure you pull the latest changes with `repo.git_pull()`.



In [25]:
trainer.train()



| Epoch | Training Loss | Validation Loss |
|---|---|---|
| 1 | 0.3418 | 0.320598 |
| 2 | 0.2821 | 0.321605 |
| 3 | 0.2467 | 0.35012 |
| 4 | 0.2301 | 0.374705 |
| 5 | 0.2152 | 0.36138 |
| 6 | 0.2071 | 0.383605 |
| 7 | 0.2002 | 0.392345 |
| 8 | 0.1964 | 0.405326 |
| 9 | 0.1953 | 0.415405 |
| 10 | 0.1935 | 0.426924 |
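
In the logged epochs, training loss falls steadily while validation loss is lowest at the first epoch and mostly rises afterwards, a pattern consistent with overfitting. A small sketch that picks the epoch with the lowest validation loss from the values logged above (only the ten epochs shown here are included):

```python
# Validation losses copied from the training log (epochs 1 to 10).
val_loss = {
    1: 0.320598, 2: 0.321605, 3: 0.35012, 4: 0.374705, 5: 0.36138,
    6: 0.383605, 7: 0.392345, 8: 0.405326, 9: 0.415405, 10: 0.426924,
}

# Epoch whose validation loss is smallest.
best_epoch = min(val_loss, key=val_loss.get)
print(best_epoch, val_loss[best_epoch])  # 1 0.320598
```

Options such as `load_best_model_at_end` in the training arguments, or an early-stopping callback, are one way to act on this; neither is used in this notebook.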



TrainOutput(global_step=92970, training_loss=0.2119820987766656, metrics={'train_runtime': 27644.8245, 'train_samples_per_second': 26.902, 'train_steps_per_second': 3.363, 'total_flos': 1.0500224983492608e+17, 'train_loss': 0.2119820987766656, 'epoch': 30.0})
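
The reported `global_step` can be checked against the configuration. This sketch assumes a single GPU, no gradient accumulation, and that the last partial batch of each epoch is kept (`dataloader_drop_last=False`, as printed in the training arguments):

```python
import math

num_train_rows = 24790  # rows in the train split
batch_size = 8          # per_device_train_batch_size
num_epochs = 30         # num_train_epochs

# Each epoch needs one optimizer step per batch, including the final partial batch.
steps_per_epoch = math.ceil(num_train_rows / batch_size)
total_steps = steps_per_epoch * num_epochs
print(steps_per_epoch, total_steps)  # 3099 92970
```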

In [26]:
# push the trained model to the Hugging Face Hub
trainer.push_to_hub()


'https://huggingface.co/monideep2255/PSST_spell_correction_V3/commit/79c1f3cbaabe81707e0aa9d3946eec756231667e'