<a href="https://colab.research.google.com/github/GooseJacket/COMP-331-Final/blob/Grammar/Angela_Merrill_Final_Grammar_Version.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#COMP 331 Final (Angela Merrill) - Grammar Corrector!

Guided by [this Bert Classification Tutorial](https://colab.research.google.com/drive/1pTuQhug6Dhl9XalKB0zUGf4FIdYFlpcX) and [the given PA3 Text Generation Tutorial](https://colab.research.google.com/drive/1csEkXfVMAtjVQ7FGmUthlcnkclgELaAp?usp=sharing)

Data from [This Kaggle](https://www.kaggle.com/datasets/satishgunjal/grammar-correction)

# 1. Setup

In [76]:
# !pip install transformers

In [77]:
import pandas as pd
import numpy as np

import re
import time
import datetime

import random
from random import shuffle

import torch
from torch.utils.data import TensorDataset, random_split
from torch.utils.data import DataLoader, RandomSampler, SequentialSampler

from transformers import BertTokenizer
from transformers import BertForSequenceClassification, BertConfig
from transformers import get_linear_schedule_with_warmup


In [78]:
# Connect to GPU - pulled from Bert Text Classification Tutorial

# If there's a GPU available...
if torch.cuda.is_available():

    # Tell PyTorch to use the GPU.
    device = torch.device("cuda")

    print('There are %d GPU(s) available.' % torch.cuda.device_count())

    print('We will use the GPU:', torch.cuda.get_device_name(0))

# If not...
else:
    print('No GPU available, using the CPU instead.')
    device = torch.device("cpu")

There are 1 GPU(s) available.
We will use the GPU: Tesla T4


In [79]:
# Load data from csv
df = pd.read_csv("/content/drive/MyDrive/Colab Notebooks/Grammar Correction.csv")

# Should be 2018 bad-good sentence pairs!
print("Data Size:", df.shape[0])

df.head()

# Shuffle because it is sorted by type of grammar mistake
df = df.sample(len(df))

# get the good and bad lists
bad = [i for i in df["Ungrammatical Statement"].values]
good = [i for i in df["Standard English"].values]

Data Size: 2018


# 2. Train Text Classification

Mainly aided by [this tutorial!](https://colab.research.google.com/drive/1pTuQhug6Dhl9XalKB0zUGf4FIdYFlpcX)

## Prepare Data

In [80]:
type_dict = {}

for i in df["Error Type"]:
  if i in type_dict:
    type_dict[i] += 1
  else:
    type_dict[i] = 1

type_dict

# 36 x 40

{'Mixed Conditionals': 49,
 'Clich√©s': 48,
 'Sentence Structure Errors': 103,
 'Conjunction Misuse': 49,
 'Inappropriate Register': 49,
 'Preposition Usage': 95,
 'Pronoun Errors': 47,
 'Article Usage': 100,
 'Run-on Sentences': 40,
 'Mixed Metaphors/Idioms': 50,
 'Sentence Fragments': 40,
 'Quantifier Errors': 48,
 'Gerund and Participle Errors': 50,
 'Faulty Comparisons': 49,
 'Capitalization Errors': 40,
 'Incorrect Auxiliaries': 50,
 'Relative Clause Errors': 51,
 'Verb Tense Errors': 100,
 'Punctuation Errors': 60,
 'Slang, Jargon, and Colloquialisms': 50,
 'Lack of Parallelism in Lists or Series': 50,
 'Infinitive Errors': 49,
 'Parallelism Errors': 49,
 'Contractions Errors': 49,
 'Subject-Verb Agreement': 100,
 'Agreement in Comparative and Superlative Forms': 49,
 'Redundancy/Repetition': 20,
 'Negation Errors': 50,
 'Word Choice/Usage': 40,
 'Abbreviation Errors': 50,
 'Ambiguity': 50,
 'Modifiers Misplacement': 46,
 'Passive Voice Overuse': 49,
 'Ellipsis Errors': 49,
 'Tau

In [81]:
# The tutorial had the text and labels as ndarrays, so I use np.hstack to keep that consistant
class_train_texts = np.hstack([bad, good])
class_train_lens = [len(str(i).split(" ")) for i in class_train_texts]
class_train_labels = np.hstack([[0 for i in range(len(bad))], [1 for i in range(len(good))]])

# make sure it's working
print(class_train_texts[0])  # should have grammar error
print(class_train_labels[0])  # should be 0
print(class_train_texts[-1])  # should be good grammar
print(class_train_labels[-1])  # should be 1

print("\nMax Length =", max(class_train_lens))  # get max length!

If I won the lottery, I could have traveled the world.
0
You should have been more careful while handling the equipment.
1

Max Length = 22


## Tokenize

In [82]:
# Torch got messed up!
# !pip uninstall torch
# !pip install torch

In [83]:
# Load the BERT tokenizer.
print('Loading BERT tokenizer...')
class_tokenizer = BertTokenizer.from_pretrained('bert-base-cased', do_lower_case=False)

Loading BERT tokenizer...


In [84]:
max_len = max(class_train_lens) * 2  # to account for punctuation and larger things later

def tokenizeTexts(texts, labels):
  # Pulled straight from the BERT Text Classification Tutorial
  # texts (list of sentences) --> ids, attention_masks, labels

  ids = []
  attention_masks = []

  for sent in texts:
    encoded_dict = class_tokenizer.encode_plus(
      sent,                          # sentence
      add_special_tokens = True,     # add [CLS] and [SEP]
      max_length = max_len,          # used to pad to same size
      padding = "max_length",        # Had to change this from the tutorial!
      return_attention_mask = True,  # Get attention masks to ignore padding
      return_tensors = 'pt',          # Use pytorch
      truncation = True
    )

    # Log tokenized sentence and its attention mask
    ids.append(encoded_dict['input_ids'])
    attention_masks.append(encoded_dict['attention_mask'])

  # Lists --> tensors.
  ids = torch.cat(ids, dim=0)
  attention_masks = torch.cat(attention_masks, dim=0)
  labels = torch.tensor(labels)

  return ids, attention_masks, labels

class_train_ids, class_train_attention_masks, class_train_labels = tokenizeTexts(class_train_texts, class_train_labels)

# Print sentence 0, now as a list of IDs.
print('Original: ', class_train_texts[0])
print('Token IDs:', class_train_ids[0])
print('Attentions:', class_train_attention_masks[0])

Original:  If I won the lottery, I could have traveled the world.
Token IDs: tensor([  101,  1409,   146,  1281,  1103, 23366,   117,   146,  1180,  1138,
         5505,  1103,  1362,   119,   102,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0])
Attentions: tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])


## Datasets/loaders

In [85]:
# Combine the training inputs into a TensorDataset.
class_train_dataset = TensorDataset(
    class_train_ids,
    class_train_attention_masks,
    class_train_labels)

class_train_batch_size = 16

# Create the DataLoaders for our training and validation sets.
# We'll take training samples in random order.
class_train_dataloader = DataLoader(
    class_train_dataset,  # The training samples.
    batch_size = class_train_batch_size, # Trains with this batch size.
    shuffle = True
)

## Set up Model

In [86]:
# Load BertForSequenceClassification
class_model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased",          # Use the 12-layer BERT model, with an uncased vocab.
    num_labels = 2,               # Grammatical or Ungrammatical
    output_attentions = False,    # Whether the model returns attentions weights.
    output_hidden_states = False, # Whether the model returns all hidden-states.
)

# Tell pytorch to run this model on the GPU.
class_model.cuda()

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


BertForSequenceClassification(
  (bert): BertModel(
    (embeddings): BertEmbeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (token_type_embeddings): Embedding(2, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): BertEncoder(
      (layer): ModuleList(
        (0-11): 12 x BertLayer(
          (attention): BertAttention(
            (self): BertSdpaSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e

In [87]:
# Huggingface's AdamW seems to be broken so I replaced it with torch's
class_train_opt = torch.optim.AdamW(class_model.parameters(),
                  lr = 5e-5, # Learning Rate
                  eps = 1e-8 # Epsilon?
                )

class_train_epochs = 2

# Create the learning rate scheduler.
class_train_scheduler = get_linear_schedule_with_warmup(class_train_opt,
                                            num_warmup_steps = 0, # Default value in run_glue.py
                                            num_training_steps = len(class_train_dataloader) * class_train_epochs)

In [88]:
# Function to calculate the accuracy of our predictions vs labels
def flat_accuracy(preds, labels):
    pred_flat = np.argmax(preds, axis=1).flatten()
    labels_flat = labels.flatten()
    return np.sum(pred_flat == labels_flat) / len(labels_flat)

In [89]:
def format_time(elapsed):
    '''
    Takes a time in seconds and returns a string hh:mm:ss
    '''
    # Round to the nearest second.
    elapsed_rounded = int(round((elapsed)))

    # Format as hh:mm:ss
    return str(datetime.timedelta(seconds=elapsed_rounded))


##Train!

In [90]:
# Pulled from Bert Text Classification Tutorial!
# I got rid of the validation to increase the training data.

# This training code is based on the `run_glue.py` script here:
# https://github.com/huggingface/transformers/blob/5bfcd0485ece086ebcbed2d008813037968a9e58/examples/run_glue.py#L128

# Set the seed value all over the place to make this reproducible.
seed_val = 42

random.seed(seed_val)
np.random.seed(seed_val)
torch.manual_seed(seed_val)
torch.cuda.manual_seed_all(seed_val)

# We'll store a number of quantities such as training and validation loss,
# validation accuracy, and timings.
class_train_training_stats = []

# Measure the total training time for the whole run.
class_train_total_t0 = time.time()

# For each epoch...
for epoch_i in range(0, class_train_epochs):

    # ========================================
    #               Training
    # ========================================

    # Perform one full pass over the training set.

    print("")
    print('======== Epoch {:} / {:} ========'.format(epoch_i + 1, class_train_epochs))
    print('Training...')

    # Measure how long the training epoch takes.
    t0 = time.time()

    # Reset the total loss for this epoch.
    total_train_loss = 0

    # Put the model into training mode. Don't be mislead--the call to
    # `train` just changes the *mode*, it doesn't *perform* the training.
    # `dropout` and `batchnorm` layers behave differently during training
    # vs. test (source: https://stackoverflow.com/questions/51433378/what-does-model-train-do-in-pytorch)
    class_model.train()

    # For each batch of training data...
    for step, batch in enumerate(class_train_dataloader):

        # Progress update every 40 batches.
        if step % 40 == 0 and not step == 0:
            # Calculate elapsed time in minutes.
            elapsed = format_time(time.time() - t0)

            # Report progress.
            print('  Batch {:>5,}  of  {:>5,}.    Elapsed: {:}.'.format(step, len(class_train_dataloader), elapsed))

        # Unpack this training batch from our dataloader.
        #
        # As we unpack the batch, we'll also copy each tensor to the GPU using the
        # `to` method.
        #
        # `batch` contains three pytorch tensors:
        #   [0]: input ids
        #   [1]: attention masks
        #   [2]: labels
        b_input_ids = batch[0].to(device)
        b_input_mask = batch[1].to(device)
        b_labels = batch[2].to(device)

        # Always clear any previously calculated gradients before performing a
        # backward pass. PyTorch doesn't do this automatically because
        # accumulating the gradients is "convenient while training RNNs".
        # (source: https://stackoverflow.com/questions/48001598/why-do-we-need-to-call-zero-grad-in-pytorch)
        class_model.zero_grad()

        # Perform a forward pass (evaluate the model on this training batch).
        # In PyTorch, calling `model` will in turn call the model's `forward`
        # function and pass down the arguments. The `forward` function is
        # documented here:
        # https://huggingface.co/transformers/model_doc/bert.html#bertforsequenceclassification
        # The results are returned in a results object, documented here:
        # https://huggingface.co/transformers/main_classes/output.html#transformers.modeling_outputs.SequenceClassifierOutput
        # Specifically, we'll get the loss (because we provided labels) and the
        # "logits"--the model outputs prior to activation.
        result = class_model(b_input_ids,
                       token_type_ids=None,
                       attention_mask=b_input_mask,
                       labels=b_labels,
                       return_dict=True)

        loss = result.loss
        logits = result.logits

        # Accumulate the training loss over all of the batches so that we can
        # calculate the average loss at the end. `loss` is a Tensor containing a
        # single value; the `.item()` function just returns the Python value
        # from the tensor.
        total_train_loss += loss.item()

        # Perform a backward pass to calculate the gradients.
        loss.backward()

        # Clip the norm of the gradients to 1.0.
        # This is to help prevent the "exploding gradients" problem.
        torch.nn.utils.clip_grad_norm_(class_model.parameters(), 1.0)

        # Update parameters and take a step using the computed gradient.
        # The optimizer dictates the "update rule"--how the parameters are
        # modified based on their gradients, the learning rate, etc.
        class_train_opt.step()

        # Update the learning rate.
        class_train_scheduler.step()

    # Calculate the average loss over all of the batches.
    avg_train_loss = total_train_loss / len(class_train_dataloader)

    # Measure how long this epoch took.
    training_time = format_time(time.time() - t0)

    print("")
    print("  Average training loss: {0:.2f}".format(avg_train_loss))
    print("  Training epoch took: {:}".format(training_time))

    # Record all statistics from this epoch.
    class_train_training_stats.append(
        {
            'epoch': epoch_i + 1,
            'Training Loss': avg_train_loss,
        }
    )

print("")
print("Training complete!")

print("Total training took {:} (h:mm:ss)".format(format_time(time.time()-class_train_total_t0)))


Training...
  Batch    40  of    253.    Elapsed: 0:00:06.
  Batch    80  of    253.    Elapsed: 0:00:13.
  Batch   120  of    253.    Elapsed: 0:00:20.
  Batch   160  of    253.    Elapsed: 0:00:27.
  Batch   200  of    253.    Elapsed: 0:00:33.
  Batch   240  of    253.    Elapsed: 0:00:40.

  Average training loss: 0.70
  Training epoch took: 0:00:43

Training...
  Batch    40  of    253.    Elapsed: 0:00:06.
  Batch    80  of    253.    Elapsed: 0:00:14.
  Batch   120  of    253.    Elapsed: 0:00:20.
  Batch   160  of    253.    Elapsed: 0:00:26.
  Batch   200  of    253.    Elapsed: 0:00:32.
  Batch   240  of    253.    Elapsed: 0:00:39.

  Average training loss: 0.69
  Training epoch took: 0:00:41

Training complete!
Total training took 0:01:24 (h:mm:ss)


# Classification Function
input: text paragragh

output: list of guesses as un/grammatical

In [106]:
def predict(text):  # text = paragrah
  # turn paragraph into list of sentences
  splits = ["‚Ä¶", ".", "?", "!", ";", "\n"]
  texts = text
  for s in splits:
    texts = re.sub(re.escape(s), s+"SPLITHERE", texts)
  texts = [i for i in texts.split("SPLITHERE") if i not in ["", "\n"]]

  # we need fake labels!
  labels = [0 for i in texts]

  # set up data into dataloader
  ids, attention_masks, labels = tokenizeTexts(texts, labels)
  predict_dataset = TensorDataset(ids, attention_masks, labels)
  predict_dataloader = DataLoader(
              predict_dataset,
              sampler = RandomSampler(predict_dataset), # Shuffle the data
              batch_size = len(texts) # Go over entire paragraph
          )

  for batch in predict_dataloader:
    class_model.eval()
    with torch.no_grad():
      b_input_ids = batch[0].to(device)
      b_input_mask = batch[1].to(device)
      b_labels = batch[2].to(device)

      # Run the model over the data
      result = class_model(b_input_ids,
                      token_type_ids=None,
                      attention_mask=b_input_mask,
                      labels=b_labels,
                      return_dict=True)

      # Get the predictions
      logits = result.logits.detach().cpu().numpy()

  # Argmax to get the actual predictions
  guesses = np.argmax(logits, axis=1).flatten()

  return texts, guesses


predict_texts = [
    "I has a nightmare last night.",                        # 0
    "I went to the store but I forgets my shopping lsit!",  # 0
    "It was unfortunately embarrassing!",                   # 1
    "But whatever I can deal."
]
predict(" ".join(predict_texts))

(['I has a nightmare last night.',
  ' I went to the store but I forgets my shopping lsit!',
  ' It was unfortunately embarrassing!',
  ' But whatever I can deal.'],
 array([0, 1, 1, 0]))

#Train Text Generation

Mainly aided by the PA3 Tutorial!

## Setup

In [92]:
import os
import time
import datetime
from google.colab import drive

import pandas as pd
import seaborn as sns
import numpy as np
import random

import matplotlib.pyplot as plt
%matplotlib inline

import torch
from torch.utils.data import Dataset, DataLoader, random_split, RandomSampler, SequentialSampler
torch.manual_seed(42)

from transformers import GPT2LMHeadModel,  GPT2Tokenizer, GPT2Config, GPT2LMHeadModel
from transformers import get_linear_schedule_with_warmup
import torch.optim as optim

import nltk
nltk.download('punkt_tab')

[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!


True

## Prepare Data

In [93]:
gen_train_texts = np.dstack([bad, good])[0]

gen_train_texts = [" <Bad> " + i[0] + " <Fixed> " + i[1] for i in gen_train_texts]
gen_train_lens = [len(str(i).split(" ")) for i in gen_train_texts]
gen_max_len = int(sum(gen_train_lens)/len(gen_train_lens))

print(gen_train_texts[0])
print(gen_train_texts[-1])
print("\Max Length =", gen_max_len)  # get average length!

 <Bad> If I won the lottery, I could have traveled the world. <Fixed> If I won the lottery, I could travel the world.
 <Bad> You shoulda been more careful while handling the equipment. <Fixed> You should have been more careful while handling the equipment.
\Max Length = 20


  print("\Max Length =", gen_max_len)  # get average length!


## Tokenize

In [94]:
special_tokens = {
    "bad": "<Bad>",
    "good": "<Fixed>"
}

# Load the GPT tokenizer.
gen_tokenizer = GPT2Tokenizer.from_pretrained('gpt2', bos_token='<|startoftext|>', eos_token='<|endoftext|>', pad_token='<|pad|>', padding_side='left', extra_special_tokens=special_tokens)
# padding from the left because it's a decoder-only architecture

In [95]:
class GPT2Dataset(Dataset):

  def __init__(self, txt_list, tokenizer, gpt2_type="gpt2", max_length=768):

    self.tokenizer = tokenizer
    self.input_ids = []
    self.attn_masks = []

    for txt in txt_list:

      encodings_dict = tokenizer('<|startoftext|>'+ txt + '<|endoftext|>', truncation=True, max_length=max_length, padding="max_length")

      self.input_ids.append(torch.tensor(encodings_dict['input_ids']))
      self.attn_masks.append(torch.tensor(encodings_dict['attention_mask']))

  def __len__(self):
    return len(self.input_ids)

  def __getitem__(self, idx):
    return self.input_ids[idx], self.attn_masks[idx]

## Datasets/loaders

In [96]:
gen_train_dataset = GPT2Dataset(gen_train_texts, gen_tokenizer, max_length=gen_max_len * 3)

gen_train_batch_size = 16

gen_train_dataloader = DataLoader(
            gen_train_dataset,  # The training samples.
            sampler = RandomSampler(gen_train_dataset), # Select batches randomly
            batch_size = gen_train_batch_size # Trains with this batch size.
        )

##Set up Model

In [97]:
configuration = GPT2Config.from_pretrained('gpt2', output_hidden_states=False)

# instantiate the model
gen_model = GPT2LMHeadModel.from_pretrained("gpt2", config=configuration)

# Fix model size to account for special tokens
gen_model.resize_token_embeddings(len(gen_tokenizer))

# Tell pytorch to run this model on the GPU.
device = torch.device("cuda")
gen_model.cuda()

# Set the seed value all over the place to make this reproducible.
seed_val = 42

random.seed(seed_val)
np.random.seed(seed_val)
torch.manual_seed(seed_val)
torch.cuda.manual_seed_all(seed_val)

gen_train_epochs = 6

learning_rate = 5e-4
warmup_steps = 1e2
epsilon = 1e-8

# this produces sample output every N steps
sample_every = 40

gen_train_opt = optim.AdamW(gen_model.parameters(),
                  lr = learning_rate,
                  eps = epsilon
                )

# Total number of training steps is [number of batches] x [number of epochs].
# (Note that this is not the same as the number of training samples).
total_steps = len(gen_train_dataloader) * gen_train_epochs

# Create the learning rate scheduler.
# This changes the learning rate as the training loop progresses
gen_train_scheduler = get_linear_schedule_with_warmup(gen_train_opt,
                                            num_warmup_steps = warmup_steps,
                                            num_training_steps = total_steps)

def format_time(elapsed):
    return str(datetime.timedelta(seconds=int(round((elapsed)))))

##Train!

In [98]:
gen_train_total_t0 = time.time()

gen_train_training_stats = []

# Push model to device (cuda)
gen_model = gen_model.to(device)

for epoch_i in range(0, gen_train_epochs):

    # ========================================
    #               Training
    # ========================================

    print("")
    print('======== Epoch {:} / {:} ========'.format(epoch_i + 1, gen_train_epochs))
    print('Training...')

    t0 = time.time()

    total_train_loss = 0

    gen_model.train()

    for step, batch in enumerate(gen_train_dataloader):

        # b_input_ids and b_labels are both set to the input sequences of the batch
        # when we pass them to the model, the model will predict one token at a time
        # till the end of the sequence
        b_input_ids = batch[0].to(device)
        b_labels = batch[0].to(device)
        b_masks = batch[1].to(device)

        gen_model.zero_grad()

        outputs = gen_model(  b_input_ids,          # batch input token IDs
                          labels=b_labels,      # batch input token IDs
                          attention_mask = b_masks,
                          token_type_ids=None
                        )

        loss = outputs[0]

        batch_loss = loss.item()
        total_train_loss += batch_loss

        # Get sample (for human evaluation) every x batches.
        if step % sample_every == 0 and not step == 0:

            elapsed = format_time(time.time() - t0)
            print('  Batch {:>5,}  of  {:>5,}. Loss: {:>5,}.   Elapsed: {:}.'.format(step, len(gen_train_dataloader), batch_loss, elapsed))

            gen_model.eval()

            sample_outputs = gen_model.generate(
                                    b_input_ids,    # Use a batch of input IDs for generation
                                    attention_mask=b_masks, # Pass the attention mask
                                    pad_token_id=gen_tokenizer.pad_token_id, # Set the padding token ID as its ID in our tokenizer, otherwise there will be a warning
                                    do_sample=True,
                                    top_k=100, #ANGELA: changed to 100 to possibly add randomness!
                                    max_new_tokens = 200,
                                    top_p=0.95,
                                    num_return_sequences=1
                                )
            for i, sample_output in enumerate(sample_outputs):
                  print("{}: {}".format(i, gen_tokenizer.decode(sample_output, skip_special_tokens=True)))

            gen_model.train()

        loss.backward()

        gen_train_opt.step()

        # Update learning rate based on completed optimizer steps
        gen_train_scheduler.step()

    # Calculate the average loss over all of the batches.
    avg_train_loss = total_train_loss / len(gen_train_dataloader)

    # Measure how long this epoch took.
    training_time = format_time(time.time() - t0)

    print("")
    print("  Average training loss: {0:.2f}".format(avg_train_loss))
    print("  Training epoch took: {:}".format(training_time))

    # Record all statistics from this epoch.
    gen_train_training_stats.append(
        {
            'epoch': epoch_i + 1,
            'Training Loss': avg_train_loss,
            'Training Time': training_time,
        }
    )

print("")
print("Training complete!")
print("Total training took {:} (h:mm:ss)".format(format_time(time.time()-gen_train_total_t0)))



Training...
  Batch    40  of    127. Loss: 4.121891498565674.   Elapsed: 0:00:12.
0:   If you eat too much, you might have felt sick.  If you eat too much, you might feel sick.The Internet Archive is helping kids solve puzzles over grammar and grammar as students help solve simple grammar puzzles.

For the teacher at an elementary school in New York that teaches students how to solve math equations using grammar by using computer training, there was no need to bring a new computer to solve an old problem.
1:   The company wants to ensure they're customers are satisfied.  The company wants to ensure their customers are satisfied.What if we knew how powerful computer technology does not fix our problem? If you have an old computer, or you couldn't repair it, how likely would you be to have it repaired?
2:   The job requires strong analytical skills, attention to detail, and being a good team player.  The job requires strong analytical skills, attention to detail, and good team player a

In [99]:
def getFixed(prompt):
  gen_model.eval()

  generated = torch.tensor(gen_tokenizer.encode("<|startoftext|>" + prompt)).unsqueeze(0)
  generated = generated.to(device)

  ##### TODO #4: MESS WITH PARAMETERS #####
  # a lot of parameters to tinker with for different generation strategies
  sample_outputs = gen_model.generate(
                                  generated,
                                  max_length = 300,
                                  top_k=30,   # sample from the top-k most probable tokens
                                  temperature=1.3,  # a parameter to control randomness on the final softmax function!
                                  top_p=0.955, # nucleus sampling, sample from the subset of most probable tokens with a cumulative probability of p
                                  num_return_sequences=1,
                                  pad_token_id=gen_tokenizer.pad_token_id # pass the padding token ID explicitly, otherwise you'd get a warning.
                                  )

  ret = ""
  for i, sample_output in enumerate(sample_outputs):
    ret = gen_tokenizer.decode(sample_output, skip_special_tokens=False)
  try:
    ret = ret.split("<Fixed> ")[1]
    ret = ret.split("<|endoftext|>")[0]
  except:
    pass
  return(ret)

In [100]:
getFixed(" <Bad> I has alot of homework to do before I eat lunch.")

'I have a lot of homework to do before I eat lunch, so I have a lot of homework to do before I eat.'

# Put them together!!!

In [116]:
def grammarCorrection(yap):
  texts, predicts = predict(yap)
  correctedTexts = []

  for i in range(len(texts)):
    if predicts[i] == 0:
      rep = 0
      j = 0
      while rep == 0 and j < 5:
        fix = getFixed(texts[i])
        _, rep = predict(fix)
        rep = rep[0]
        j += 1
      correctedTexts.append(fix)
    else:
      correctedTexts.append(texts[i])
  return texts, predicts, correctedTexts

yap = "Thank you so much for answering this question. Well, I loaded the whole data set which have 60000 data into the data loader with shuffle = True. But when I train this model, I only use like 6400 of those data. I trained the model in 100 epoch and each epoch have 64 images. So I am just wondering does those 6400 data have all numbers? I means does it possible I was very lucky that only got 0 to 5 and the model could not identify the rest of numbers when it was tested?:joy::joy::joy::joy: If this happened, the accuracy should be lower‚Ä¶ But interesting thing is all my test accuracy is pretty normal. Like I trained it with 6400 data and test it on test data-set which have 10000 data and I still have 80% ~ 70% accuracy."
t, p, c = grammarCorrection(yap)
print(np.dstack([t, p,c]))

[[['Thank you so much for answering this question.' '0'
   'Please answer this question as soon as possible, and as possible, as soon as possible.']
  [' Well, I loaded the whole data set which have 60000 data into the data loader with shuffle = True.'
   '0'
   'I loaded the whole data set into the data loader with 60000 data points.']
  [' But when I train this model, I only use like 6400 of those data.'
   '1'
   ' But when I train this model, I only use like 6400 of those data.']
  [' I trained the model in 100 epoch and each epoch have 64 images.'
   '0' 'I trained the model in 100 epoch and each epoch has 64 images.']
  [' So I am just wondering does those 6400 data have all numbers?' '0'
   '6400 data has all the numbers in one place?']
  [' I means does it possible I was very lucky that only got 0 to 5 and the model could not identify the rest of numbers when it was tested?'
   '1'
   ' I means does it possible I was very lucky that only got 0 to 5 and the model could not ident

#Let's grade some essays!

Data from [this kaggle](https://www.kaggle.com/datasets/lburleigh/asap-2-0/data)

In [110]:
import kagglehub

essay_path = kagglehub.dataset_download("lburleigh/asap-2-0")
print("Path to dataset files:", essay_path)

essay_df = pd.read_csv(essay_path + "/ASAP2_train_sourcetexts.csv")

Using Colab cache for faster access to the 'asap-2-0' dataset.
Path to dataset files: /kaggle/input/asap-2-0


array(['Technology today is crazy. It keeps expanding and getting bigger, better, and faster. Recenltly scientist have made a computer that has the power and technology to read emotions in students and in other people. I feel like this shouldnt be used in a classroom.\n\nI feel that a computer shouldnt be able to tell wether or not a human being is sad or mad, it shouldnt be able to tell our emotions. Our emotions is for us to handle. Yes, us humans can tell when people we know are sad or down, but computers should not. People would use the computer for somthing else, like to see others emotions. Peoples emotions should keep to their self. It isnt for a computer or other humans to now, because its your emotion.\n\nYes, this could be valuable for some things, like movies, and video games. THis is not something we could use everyday in a classroom. We could use the computer to see whats wrong with a student. Once we figure out whats wrong with the student, for instance lets say the child

In [112]:
essays = essay_df["full_text"].sample(1).values
essays

array(["Exploring Venus seems very interesting when thinking about it. All of the beneficial things that we can study and analyze. Studying other planets have always been helpful to mankind and helps people gain insight on our solar system around us. But at the same time exploring these planets are very dangerous too. Not because of animals or humans but because of the harsh living conditions on other planets. So I will tell you why I think exploring Venus is not worth the pursuit of exploring.\n\nMy first reason of why I think Venus is not worth the pursuit is because of the difficulty of exploring Venus. In paragrpah 6 it states that ships orbiting over Venus with lights can't see the ground because of the dense atmosphere. Which makes it hard for scientists because if you can't see anything you can't analyze any data that you get because of Venus's features. But it does state that NASA is trying to find new ways of exploring Venus but it's not as good as hands on exploration. Which 

In [109]:
# Get colored text
# !pip install colorama
import colorama
from colorama import Fore
print(Fore.RED + 'This text is red in color')
print(Fore.GREEN + 'This text is green in color')

[31mThis text is red in color
[32mThis text is green in color


In [104]:
print(essays[0])

Driverless cars are a good inovention for the future. Driverless cars can help effect safety,traffic, and the polution in the air. Driverless cars could help communities and the economy in many ways.

Driverless cars could change the way people commute. As driverless cars begin to come in play, less people will have to drive their own car. As many people know, tons people get into accidents everyday because of the people behind the wheel making harmful decisions, if driverless cars were put out on the roads there would be less traffic accidents because there would not be people behind the wheel who are reckless and causing these accidents to occur. Driverless cars alert the drive when needed or when about to come into contact with something therfore these cars will help reduce the effects of human error and give more reaction time to the human in the passenger side.

This could also shorten the amount of people taking the bus to the places they need to go, which would make the polution

In [117]:
def essayGradeFormatting(essay):
  t, p, c = grammarCorrection(essays[0])
  ret = ""
  for i in range(len(t)):
    printLine = ""
    if(p[i] == 0):
      printLine += Fore.RED + t[i] + " " + Fore.GREEN
    else:
      printLine += Fore.WHITE
    ret += printLine + c[i] + " "

  # Add text wrapping
  ret = ret.split(" ")
  skip = 20
  for i in range(0, len(ret), skip):
    print(" ".join(ret[i:min(len(ret), i+skip)]))

essayGradeFormatting(essays[0])

[31mExploring Venus seems very interesting when thinking about it. [32mThe scientific community is exploring the possibility of exploring the possibility
of extraterrestrial civilizations, and there are various ways to improve our understanding of them. [37m All of the beneficial things
that we can study and analyze. [31m Studying other planets have always been helpful to mankind and helps people gain
insight on our solar system around us. [32mI studied other planets, and I've always been helpful to mankind and helped
people gain insights on our solar system. [31m But at the same time exploring these planets are very dangerous too.
[32mThe risks of exploring these worlds are very dangerous and very dangerous, and they are very dangerous. [31m Not because
of animals or humans but because of the harsh living conditions on other planets. [32mThe United States is the largest
and the largest economy in the world, with a gross domestic product of about 10. [37m So I will tell
you w