# **Applied Natural Language Processing**

# **Project: Grammar Autocorrection**

Wajd Alrabiah

## **Preparation**
In this section, I prepare the environment by installing and importing the libraries and packages needed for the whole project.

In [None]:
# Installing necessary libraries

# Happy Transformer: a user-friendly wrapper for Transformer models, useful for
# text tasks like text correction, translation, and classification.
!pip install happytransformer

# Rich: enhances terminal output with rich text formatting, improving the
# display of logs, errors, or results.
!pip install rich

# scikit-learn: precision, recall, and F1 metrics.
!pip install scikit-learn

# NLTK: BLEU score.
!pip install nltk

# rouge-score: ROUGE metrics.
!pip install rouge-score



In [None]:
# Imports

# Module to handle CSV file reading and writing.
import csv

# Import functions from the datasets library to load and manage datasets.
from datasets import load_dataset

# Import from Happy Transformer for text-to-text tasks like text correction, generation, etc.
from happytransformer import TTTrainArgs, HappyTextToText, TTSettings

# Import console and text classes from Rich for enhanced terminal output.
from rich.console import Console
from rich.text import Text

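# Import precision, recall, and F1 metrics from scikit-learn.
from sklearn.metrics import precision_score, recall_score, f1_score

# Import sentence-level BLEU from NLTK.
from nltk.translate.bleu_score import sentence_bleu

# Import the ROUGE scorer.
from rouge_score import rouge_scorer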

In [None]:
# Create a Rich console, which allows styled output
# such as color, bold, and other text effects.
console = Console()

console.print("Hello, World!", style="bold green")

## **Dataset: JFLEG (JHU FLuency-Extended GUG)**

**Dataset Summary** from the author: *Center for Language and Speech Processing @ JHU*

"An English grammatical error correction (GEC) corpus. It is a gold standard benchmark for developing and evaluating GEC systems with respect to fluency (extent to which a text is native-sounding) as well as grammaticality. For each source document, there are four human-written corrections."

\

**I took the dataset *JFLEG* from the link below:**

https://huggingface.co/datasets/jhu-clsp/jfleg

\

**Citation: all rights for the dataset (and only the dataset) are reserved to:**

@InProceedings{napoles-sakaguchi-tetreault:2017:EACLshort,

  author    = {Napoles, Courtney  and  Sakaguchi, Keisuke  and  Tetreault, Joel},

  title     = {JFLEG: A Fluency Corpus and Benchmark for Grammatical Error Correction},

  booktitle = {Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers},

  month     = {April},

  year      = {2017},

  address   = {Valencia, Spain},

  publisher = {Association for Computational Linguistics},

  pages     = {229--234},

  url       = {http://www.aclweb.org/anthology/E17-2037}

}

@InProceedings{heilman-EtAl:2014:P14-2,

  author    = {Heilman, Michael  and  Cahill, Aoife  and  Madnani, Nitin  and  Lopez, Melissa  and  Mulholland, Matthew  and  Tetreault, Joel},

  title     = {Predicting Grammaticality on an Ordinal Scale},

  booktitle = {Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)},

  month     = {June},

  year      = {2014},

  address   = {Baltimore, Maryland},

  publisher = {Association for Computational Linguistics},

  pages     = {174--180},

  url       = {http://www.aclweb.org/anthology/P14-2029}
  
}

In [None]:
data = load_dataset("jfleg")


In [None]:
# Print the entire data structure
print(data)

DatasetDict({
    validation: Dataset({
        features: ['sentence', 'corrections'],
        num_rows: 755
    })
    test: Dataset({
        features: ['sentence', 'corrections'],
        num_rows: 748
    })
})


In [None]:
# Function to format and print the example
def format_example_output(example):
    print("Original Sentence:")
    print(f"- {example['sentence'].strip()}\n")
    print("Possible Corrections:")
    print("---------------------")
    for i, correction in enumerate(example['corrections'], start=1):
        print(f"{i}. {correction.strip()}")

# Call the function to print the formatted output
format_example_output(data['validation'][0])

Original Sentence:
- So I think we can not live if old people could not find siences and tecnologies and they did not developped .

Possible Corrections:
---------------------
1. So I think we would not be alive if our ancestors did not develop sciences and technologies .
2. So I think we could not live if older people did not develop science and technologies .
3. So I think we can not live if old people could not find science and technologies and they did not develop .
4. So I think we can not live if old people can not find the science and technology that has not been developed .


In [None]:
# Function to format and print dataset information
def print_dataset_info(data):
    dataset_info = data['validation'].info
    print("Features:")
    print("---------")
    for feature_name, feature_type in dataset_info.features.items():
        print(f"- {feature_name}: {feature_type}")
    print("---------")
    print(f"Dataset Size: {dataset_info.dataset_size} bytes")

# Call the function with your DatasetDict
print_dataset_info(data)

Features:
---------
- sentence: Value(dtype='string', id=None)
- corrections: Sequence(feature=Value(dtype='string', id=None), length=-1, id=None)
---------
Dataset Size: 759678 bytes


In [None]:
# Load the JFLEG dataset: 'validation' split for training and 'test' split for evaluation.
train_dataset = load_dataset("jfleg", split='validation[:]')
eval_dataset = load_dataset("jfleg", split='test[:]')

In [None]:
# Display the first two correction cases from the training dataset
for case in train_dataset["corrections"][:2]:
    print(case)
    print(case[0])
    print("--------------------------------------------------------")

['So I think we would not be alive if our ancestors did not develop sciences and technologies . ', 'So I think we could not live if older people did not develop science and technologies . ', 'So I think we can not live if old people could not find science and technologies and they did not develop . ', 'So I think we can not live if old people can not find the science and technology that has not been developed . ']
So I think we would not be alive if our ancestors did not develop sciences and technologies . 
--------------------------------------------------------
['Not for use with a car . ', 'Do not use in the car . ', 'Car not for use . ', 'Can not use the car . ']
Not for use with a car . 
--------------------------------------------------------


In [None]:
# Function to generate CSV files for training and evaluation
def generate_csv(csv_path, dataset):
    # Open the CSV file for writing
    with open(csv_path, 'w', newline='') as csvfile:
        writer = csv.writer(csvfile)  # Create a CSV writer object
        writer.writerow(["input", "target"])  # Write the header row

        for case in dataset:
            # Add the task prefix to the input
            input_text = "grammar: " + case["sentence"]

            # Write one row per reference correction
            for correction in case["corrections"]:
                # Skip empty corrections (input_text is never empty because of the prefix)
                if input_text and correction:
                    writer.writerow([input_text, correction])

In [None]:
# Generate CSV files for training and evaluation datasets
generate_csv("train.csv", train_dataset)
generate_csv("eval.csv", eval_dataset)
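To illustrate the layout that `generate_csv` produces, here is a minimal self-contained sketch that writes a toy dataset to an in-memory buffer instead of a file. The helper name `rows_for` and the toy sentences are assumptions made for this illustration; they are not part of the project code or of JFLEG.

```python
import csv
import io

def rows_for(dataset, prefix="grammar: "):
    """Yield one (input, target) pair per non-empty correction."""
    for case in dataset:
        input_text = prefix + case["sentence"]
        for correction in case["corrections"]:
            if correction:  # skip empty corrections
                yield input_text, correction

# Hypothetical stand-in for a JFLEG split (made-up data, for illustration only)
toy_dataset = [
    {
        "sentence": "She go to school .",
        "corrections": ["She goes to school .", "She went to school .", ""],
    }
]

# Write the header and the rows to an in-memory CSV
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["input", "target"])
writer.writerows(rows_for(toy_dataset))
csv_text = buffer.getvalue()
print(csv_text)
```

Each source sentence is repeated once per reference correction, so a sentence with several valid corrections contributes several training rows, while the empty correction is dropped.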

## **Model:  Text-To-Text Transfer Transformer (T5 Base) - google-t5**

**I took the model *t5-base* from the link below:**

https://huggingface.co/google-t5/t5-base

\

**Citation: all rights for the model (T5 Base) are reserved to:**

@article{2020t5,

  author  = {Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu},

  title   = {Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer},

  journal = {Journal of Machine Learning Research},

  year    = {2020},

  volume  = {21},

  number  = {140},

  pages   = {1-67},

  url     = {http://jmlr.org/papers/v21/20-074.html}

}


In [None]:
# Load the model
happy_tt = HappyTextToText("T5", "t5-base")

In [None]:
# Evaluation arguments
eval_args = TTTrainArgs(
    batch_size=8,              # Same batch size as training for consistency
    max_input_length=512,      # Maximum token length for input
    max_output_length=512,     # Maximum token length for output
)

# Evaluate the pretrained model on eval.csv to get a baseline loss before fine-tuning
before_loss = happy_tt.eval("eval.csv", args=eval_args)

# Print the loss before training
print("Before loss:", before_loss.loss)

Generating eval split: 0 examples [00:00, ? examples/s]

Map:   0%|          | 0/2988 [00:00<?, ? examples/s]

Before loss: 1.2721879482269287


In [None]:
# Training arguments
train_args = TTTrainArgs(
    batch_size=8,               # Adjust batch size based on available resources
    max_input_length=512,       # Maximum token length for input sequences
    max_output_length=512,      # Maximum token length for output sequences
    num_train_epochs=3          # Number of training epochs
)

# Train the model using the train.csv
happy_tt.train("train.csv", args=train_args)

Generating train split: 0 examples [00:00, ? examples/s]

Map:   0%|          | 0/2714 [00:00<?, ? examples/s]

Map:   0%|          | 0/302 [00:00<?, ? examples/s]



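| Step | Training Loss | Validation Loss |
|------|---------------|-----------------|
| 1 | 1.5583 | 1.124235 |
| 102 | 0.7232 | 0.548969 |
| 204 | 0.5913 | 0.499192 |
| 306 | 0.5863 | 0.466081 |
| 408 | 0.498 | 0.461695 |
| 510 | 0.509 | 0.448162 |
| 612 | 0.4422 | 0.451451 |
| 714 | 0.454 | 0.441036 |
| 816 | 0.4083 | 0.438324 |
| 918 | 0.4038 | 0.440903 |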
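The logged losses can also be inspected programmatically. As a small sketch (the variable names are my own, and the values are copied from the training log above), the step with the lowest validation loss can be found like this:

```python
# (step, training loss, validation loss) copied from the training log
log = [
    (1, 1.5583, 1.124235),
    (102, 0.7232, 0.548969),
    (204, 0.5913, 0.499192),
    (306, 0.5863, 0.466081),
    (408, 0.498, 0.461695),
    (510, 0.509, 0.448162),
    (612, 0.4422, 0.451451),
    (714, 0.454, 0.441036),
    (816, 0.4083, 0.438324),
    (918, 0.4038, 0.440903),
]

# Pick the row whose validation loss (third field) is smallest
best_step, _, best_val = min(log, key=lambda row: row[2])
print(f"Lowest validation loss {best_val} at step {best_step}")
```

The minimum falls at step 816, and the last logged step (918) is slightly higher, which hints that further epochs with the same settings may bring little additional benefit.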


In [None]:
# Evaluation arguments
eval_args = TTTrainArgs(
    batch_size=8,              # Same batch size as training for consistency
    max_input_length=512,      # Maximum token length for input
    max_output_length=512,     # Maximum token length for output
)

# Evaluate the model using the eval.csv
after_loss = happy_tt.eval("eval.csv", args=eval_args)

# Print the loss after training
print("After loss:", after_loss.loss)

Map:   0%|          | 0/2988 [00:00<?, ? examples/s]

After loss: 0.45216092467308044
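One way to read the two loss values: assuming the reported loss is a mean per-token cross-entropy in nats (an assumption about how the library reports it, not something the output states), its exponential can be interpreted as a perplexity. A minimal sketch using the two numbers printed in this notebook:

```python
import math

before_loss_value = 1.2721879482269287   # loss before fine-tuning (from the output above)
after_loss_value = 0.45216092467308044   # loss after fine-tuning (from the output above)

# Perplexity, under the assumption that the loss is mean cross-entropy in nats
before_ppl = math.exp(before_loss_value)
after_ppl = math.exp(after_loss_value)

# Relative reduction in loss achieved by fine-tuning
relative_drop = (before_loss_value - after_loss_value) / before_loss_value

print(f"Perplexity before: {before_ppl:.2f}")
print(f"Perplexity after:  {after_ppl:.2f}")
print(f"Relative loss reduction: {relative_drop:.1%}")
```

Under that assumption, fine-tuning lowers the evaluation loss by roughly 64%.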


In [None]:
# Set up beam search settings for inference
beam_settings = TTSettings(
    num_beams=5,        # Number of candidate sequences kept at each step of beam search
    min_length=1,       # Minimum length (in tokens) of the generated output
    max_length=20       # Maximum length (in tokens) of the generated output; longer corrections are cut off
)
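Because `max_length` caps the generated output, it is worth checking it against the sentence lengths in the data. The sketch below counts whitespace-separated tokens, which is only a rough proxy for the model's subword token count (the model's own tokenizer is not used here, and subword counts are usually higher). The helper names `whitespace_token_count` and `exceeds_cap` are hypothetical.

```python
def whitespace_token_count(sentence):
    """Count whitespace-separated tokens (a rough proxy for subword tokens)."""
    return len(sentence.split())

def exceeds_cap(sentence, cap=20):
    """Return True when the sentence has more whitespace tokens than the cap."""
    return whitespace_token_count(sentence) > cap

# First validation sentence and its first correction, as printed earlier
source = ("So I think we can not live if old people could not find siences "
          "and tecnologies and they did not developped .")
reference = ("So I think we would not be alive if our ancestors did not "
             "develop sciences and technologies .")

for label, text in [("source", source), ("reference", reference)]:
    print(label, whitespace_token_count(text), exceeds_cap(text))
```

If a noticeable share of the references comes close to or exceeds the cap, a larger `max_length` may be worth trying.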

In [None]:
# Evaluate on the eval dataset and collect predictions and targets
predictions = []
targets = []

In [None]:
# Make predictions for evaluation dataset
for case in eval_dataset:
    input_text = "grammar: " + case["sentence"]
    predicted_correction = happy_tt.generate_text(input_text, args=beam_settings).text
    predictions.append(predicted_correction)
    targets.append(case["corrections"][0])  # Use the first correction as the target

In [None]:
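# Calculate evaluation metrics.
# Note: scikit-learn treats each whole sentence as one class label here, so these
# scores only reward predictions that match the first reference exactly.
precision = precision_score(targets, predictions, average='weighted', zero_division=0)
recall = recall_score(targets, predictions, average='weighted', zero_division=0)
f1 = f1_score(targets, predictions, average='weighted', zero_division=0)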
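Since the scikit-learn calls above compare whole sentences as labels, a token-level overlap score can give a more graded view. The following is a minimal sketch of bag-of-words precision, recall, and F1 between a prediction and a single reference. It is an illustrative alternative (the function name `token_prf` is an assumption), not the metric used in this project and not a standard GEC metric.

```python
from collections import Counter

def token_prf(reference, prediction):
    """Bag-of-words precision, recall, and F1 over whitespace tokens."""
    ref_counts = Counter(reference.split())
    pred_counts = Counter(prediction.split())
    # Clipped overlap: a token counts at most as often as it appears in both texts
    overlap = sum((ref_counts & pred_counts).values())
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / sum(pred_counts.values())
    recall = overlap / sum(ref_counts.values())
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# A reference from the dataset output above and a hypothetical prediction
reference = "Do not use in the car ."
prediction = "Do not use the car ."
p, r, f = token_prf(reference, prediction)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

Here the prediction misses one reference token ("in"), so precision stays at 1 while recall drops to 6/7; an exact-match metric would score this pair as a plain miss.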

In [None]:
# Calculate BLEU score
# Smoothing avoids a score of 0 when a short prediction has no higher-order n-gram overlap.
from nltk.translate.bleu_score import SmoothingFunction

smoothing = SmoothingFunction().method1
bleu_scores = [
    sentence_bleu([target.split()], prediction.split(), smoothing_function=smoothing)
    for target, prediction in zip(targets, predictions)
]
average_bleu = sum(bleu_scores) / len(bleu_scores)


In [None]:
# Calculate ROUGE score
scorer = rouge_scorer.RougeScorer(['rouge1', 'rouge2', 'rougeL'], use_stemmer=True)

# Initialize lists to hold ROUGE scores
rouge1_scores = []
rouge2_scores = []
rougeL_scores = []

# Iterate over targets and predictions to calculate ROUGE scores
for target, prediction in zip(targets, predictions):
    scores = scorer.score(target, prediction)
    rouge1_scores.append(scores['rouge1'].fmeasure)  # Store F1 score for ROUGE-1
    rouge2_scores.append(scores['rouge2'].fmeasure)  # Store F1 score for ROUGE-2
    rougeL_scores.append(scores['rougeL'].fmeasure)  # Store F1 score for ROUGE-L

# Calculate average ROUGE scores
average_rouge1 = sum(rouge1_scores) / len(rouge1_scores)
average_rouge2 = sum(rouge2_scores) / len(rouge2_scores)
average_rougeL = sum(rougeL_scores) / len(rougeL_scores)
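To make the ROUGE-L number less of a black box: it is based on the longest common subsequence (LCS) between reference and prediction. The sketch below is a simplified pure-Python version using whitespace tokens and no stemming, so its values can differ from those of the `rouge_score` package. The names `lcs_length` and `rouge_l_f1` are illustrative.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    # prev[j] holds the LCS length of the processed prefix of a and b[:j]
    prev = [0] * (len(b) + 1)
    for token_a in a:
        curr = [0]
        for j, token_b in enumerate(b, start=1):
            if token_a == token_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def rouge_l_f1(reference, prediction):
    """F1 of LCS-based precision and recall over whitespace tokens."""
    ref_tokens = reference.split()
    pred_tokens = prediction.split()
    lcs = lcs_length(ref_tokens, pred_tokens)
    if lcs == 0:
        return 0.0
    precision = lcs / len(pred_tokens)
    recall = lcs / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

print(rouge_l_f1("Do not use in the car .", "Do not use the car ."))
```

Unlike ROUGE-1, the LCS respects word order, so a prediction that contains the right words in a scrambled order scores lower.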

In [None]:
# Display evaluation metrics
console.print("\nEvaluation Metrics:", style="bold blue")
console.print(f"Precision: {precision:.4f}", style="bold green")
console.print(f"Recall: {recall:.4f}", style="bold green")
console.print(f"F1 Score: {f1:.4f}", style="bold green")
console.print(f"Average BLEU Score: {average_bleu:.4f}", style="bold green")
console.print("\nROUGE Scores:", style="bold blue")
console.print(f"Average ROUGE-1 F1 Score: {average_rouge1:.4f}", style="bold green")
console.print(f"Average ROUGE-2 F1 Score: {average_rouge2:.4f}", style="bold green")
console.print(f"Average ROUGE-L F1 Score: {average_rougeL:.4f}", style="bold green")

## **Try out the program**


In [None]:
# User Instructions
console.print("\n" + "="*50)
console.print(Text("   Welcome to the Grammar Correction Program! 🔍", style="bold magenta"))
console.print("="*50)
console.print("\n\nThis program employs a transformer model to correct grammatical errors in sentences.")
console.print("Users can input sentences containing grammatical mistakes, and the model will provide corrections.")
console.print("\nHere are two examples illustrating how it works:\n")

In [None]:
# Example 1 for correction
example_1 = "This sentences, has bads grammar and spelling!"
# Add the same "grammar: " prefix that was used during training
result_1 = happy_tt.generate_text("grammar: " + example_1, args=beam_settings)
console.print("🔹 Example 1 Input: ", style="bold yellow")
console.print(f"   {example_1}\n")
console.print("🔹 Generated Correction for Example 1: ", style="bold green")
console.print(f"   {result_1.text}")

In [None]:
# Example 2 for correction
example_2 = "I am enjoys, writtings articles ons AI and I also enjoyed write articling on AI."
# Add the same "grammar: " prefix that was used during training
result_2 = happy_tt.generate_text("grammar: " + example_2, args=beam_settings)
console.print("\n🔹 Example 2 Input:", style="bold yellow")
console.print(f"   {example_2}\n")
console.print("🔹 Generated Correction for Example 2:", style="bold green")
console.print(f"   {result_2.text}")

In [None]:
# Prompt user to try the program
console.print("\n" + "="*70)
console.print("🔹 Now it's your turn! Enter a sentence with grammatical mistakes: ", style="bold yellow")
user_input = input("   ")
# Add the same "grammar: " prefix that was used during training
user_result = happy_tt.generate_text("grammar: " + user_input, args=beam_settings)
console.print("\n")
console.print("🔹 Generated Correction for Your Input: ", style="bold green")
console.print(f"   {user_result.text}")
console.print("="*70)

   hi i is likeingg you
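Because the model was fine-tuned on inputs that start with the `grammar: ` prefix, inference inputs should carry the same prefix. A small helper can make this harder to forget. `build_prompt` and `TASK_PREFIX` are hypothetical names, and the whitespace stripping is an optional extra that the training data preparation does not perform.

```python
TASK_PREFIX = "grammar: "  # same prefix used when building train.csv and eval.csv

def build_prompt(sentence, prefix=TASK_PREFIX):
    """Return the model input for a raw user sentence."""
    cleaned = sentence.strip()
    if not cleaned:
        raise ValueError("Please enter a non-empty sentence.")
    return prefix + cleaned

print(build_prompt("   hi i is likeingg you"))
```

The returned string would then be passed to `happy_tt.generate_text(..., args=beam_settings)` in the same way as in the cells above.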
