Aim / Objective

To design and evaluate a Sequence-to-Sequence (Seq2Seq) model that automatically detects and corrects grammatical errors in English sentences using a pre-trained transformer-based model.

Problem Statement

Given a grammatically incorrect sentence, the system should generate a corrected version of the sentence while preserving its original meaning.

Dataset Used


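*   JFLEG Dataset (conceptual reference only; not loaded in this notebook)

*   Pre-trained Model: prithivida/grammar_error_correcter_v1, a T5 model fine-tuned for grammar correction
(No training is performed in this experiment; the pre-trained model is used for inference only.)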



Algorithm / Model Used
T5 (Text-To-Text Transfer Transformer)

  Treats grammar correction as a Seq2Seq text-to-text task

  Input: Incorrect sentence

  Output: Corrected sentence
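In the text-to-text framing, both the input and the target are plain strings, and the task is signalled by a text prefix. The sketch below shows how one erroneous sentence and its reference corrections (JFLEG, for example, supplies several human corrections per sentence) would map onto (input, target) string pairs. It assumes the same "gec: " prefix that correct_grammar() uses later in this notebook; `to_text_pairs` and the example sentences are hypothetical and are not used elsewhere in the experiment.

```python
from typing import List, Sequence, Tuple

# Assumed task prefix: the same string correct_grammar() prepends in this notebook
TASK_PREFIX = "gec: "


def to_text_pairs(
    sentence: str, corrections: Sequence[str], prefix: str = TASK_PREFIX
) -> List[Tuple[str, str]]:
    """Map one source sentence and its reference corrections to (input_text, target_text) pairs."""
    source = prefix + sentence.strip()
    # One pair per reference correction; the input text is identical in each pair
    return [(source, ref.strip()) for ref in corrections]


# Hypothetical JFLEG-style example: one erroneous sentence, two reference corrections
for inp, tgt in to_text_pairs(
    "She go to school every day",
    ["She goes to school every day.", "She went to school every day."],
):
    print(repr(inp), "->", repr(tgt))
```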

Incorrect Sentence
        ↓
Tokenizer
        ↓
Encoder (T5)
        ↓
Decoder (T5)
        ↓
Corrected Sentence
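The Encoder → Decoder → Corrected Sentence flow can be made concrete with a toy decoding loop: the decoder emits one token at a time, each conditioned on the source and on the tokens produced so far, until it emits the end-of-sequence token. This is a minimal sketch, not T5 itself. `toy_next_token` is a hypothetical stand-in for the T5 encoder and decoder (it simply copies the source and fixes one agreement error), and the loop is greedy, whereas correct_grammar() in this notebook asks model.generate for beam search.

```python
from typing import Callable, Dict, List

EOS = "</s>"  # end-of-sequence marker (T5's EOS token is "</s>")

# Scores for the next token, given the source tokens and the output generated so far
NextTokenFn = Callable[[List[str], List[str]], Dict[str, float]]


def greedy_decode(next_token_scores: NextTokenFn, source: List[str], max_len: int = 32) -> List[str]:
    """Build the output autoregressively, always taking the top-scoring next token."""
    output: List[str] = []
    for _ in range(max_len):
        scores = next_token_scores(source, output)
        best = max(scores, key=scores.get)
        if best == EOS:
            break
        output.append(best)
    return output


# Hypothetical stand-in for encoder + decoder: copy the source word at the
# current position (fixing "go" -> "goes"), then emit EOS at the end
FIXES = {"go": "goes"}


def toy_next_token(source: List[str], output: List[str]) -> Dict[str, float]:
    pos = len(output)
    if pos >= len(source):
        return {EOS: 1.0}
    word = FIXES.get(source[pos], source[pos])
    return {word: 0.9, EOS: 0.1}


print(greedy_decode(toy_next_token, "she go to school every day".split()))
# ['she', 'goes', 'to', 'school', 'every', 'day']
```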


Implementation Steps

In [1]:
#Install Required Libraries
!pip install transformers sentencepiece torch



In [2]:
#Import Libraries
from transformers import T5Tokenizer, T5ForConditionalGeneration
import torch



In [3]:
#Load Pre-trained Model and Tokenizer
model_name = "prithivida/grammar_error_correcter_v1"

tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)


In [4]:
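#Grammar Correction Function
def correct_grammar(sentence):
    # Prepend the "gec: " task prefix the model expects
    input_text = "gec: " + sentence
    # Tokenize to PyTorch tensors (input_ids and attention_mask), truncating long inputs
    inputs = tokenizer(input_text, return_tensors="pt", max_length=128, truncation=True)

    # Beam search over 5 candidates; gradients are not needed for inference
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_length=128,
            num_beams=5,
            early_stopping=True
        )

    # Decode the best candidate back to text, dropping special tokens such as </s>
    corrected_sentence = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return corrected_sentence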
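Setting num_beams=5 makes model.generate keep the five highest-scoring partial outputs at every step instead of committing to the single most likely next token. The sketch below illustrates the idea on a hypothetical toy distribution; `beam_search`, `TOY_PROBS`, and `toy_log_probs` are illustrative names, not part of transformers. Stopping once num_beams hypotheses have finished is only roughly analogous to early_stopping=True, and the sketch omits the length penalty and other refinements of the library's implementation.

```python
import math
from typing import Callable, Dict, List, Tuple

EOS = "</s>"

# Log-probabilities of the next token given the tokens generated so far
NextLogProbs = Callable[[List[str]], Dict[str, float]]


def beam_search(next_log_probs: NextLogProbs, num_beams: int = 5, max_len: int = 20) -> List[str]:
    """Keep the num_beams best partial outputs (by cumulative log-prob) at each step."""
    beams: List[Tuple[List[str], float]] = [([], 0.0)]  # (tokens, cumulative log-prob)
    finished: List[Tuple[List[str], float]] = []
    for _ in range(max_len):
        # Expand every live beam by every possible next token
        candidates = [
            (tokens + [tok], score + lp)
            for tokens, score in beams
            for tok, lp in next_log_probs(tokens).items()
        ]
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates:
            if tokens[-1] == EOS:
                finished.append((tokens[:-1], score))  # completed hypothesis
            else:
                beams.append((tokens, score))
            if len(beams) == num_beams:
                break
        # Stop once enough hypotheses are complete, or nothing is left to expand
        if len(finished) >= num_beams or not beams:
            break
    pool = finished or beams  # fall back to live beams if nothing finished
    return max(pool, key=lambda c: c[1])[0]


# Hypothetical toy distribution where the greedy first choice ("a") is not
# the start of the most probable overall sequence
TOY_PROBS = {
    (): {"a": 0.6, "b": 0.4},
    ("a",): {"c": 0.5, "d": 0.5},
    ("b",): {"e": 0.9, "f": 0.1},
}


def toy_log_probs(tokens: List[str]) -> Dict[str, float]:
    probs = TOY_PROBS.get(tuple(tokens), {EOS: 1.0})  # unknown prefix: end the sequence
    return {tok: math.log(p) for tok, p in probs.items()}


print(beam_search(toy_log_probs, num_beams=1))  # ['a', 'c']  (0.6 * 0.5 = 0.30)
print(beam_search(toy_log_probs, num_beams=2))  # ['b', 'e']  (0.4 * 0.9 = 0.36)
```

With a single beam the search behaves greedily and settles on a sequence with probability 0.30, while two beams find the better 0.36 sequence; this is the kind of gain num_beams=5 buys at the cost of more computation.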

In [7]:
#Test the Model
while True:
    sentence = input("\nEnter a sentence (or type 'exit' to stop): ").strip()

    if sentence.lower() == "exit":
        print("Exiting grammar correction...")
        break
    if not sentence:
        continue  # ignore empty input

    print("Corrected:", correct_grammar(sentence))


Enter a sentence (or type 'exit' to stop): she go to school every day
Corrected: she goes to school every day.

Enter a sentence (or type 'exit' to stop): exit
Exiting grammar correction...


In [8]:
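#Interactive Widget Interface
from IPython.display import display
import ipywidgets as widgets

# Text box for the sentence, a button to trigger correction, and an output area
text_box = widgets.Text(
    description='Input:',
    placeholder='Enter sentence here'
)

button = widgets.Button(description="Correct Grammar")
output = widgets.Output()

def on_button_click(b):
    # Replace any previous result with the correction for the current input
    with output:
        output.clear_output()
        sentence = text_box.value.strip()
        if sentence:
            print(correct_grammar(sentence))
        else:
            print("Please enter a sentence.")

button.on_click(on_button_click)

display(text_box, button, output)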


In [None]:
# Sample outputs
# Input : She go to college everyday.
# Output: She goes to college every day.
#
# Input : He have completed the work yesterday.
# Output: He has completed the work yesterday.
#
# Input : I am agree with your opinion.
# Output: I agree with your opinion.
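For writing-assistant style feedback, it can help to list exactly which words the model changed. A minimal sketch using Python's standard difflib, assuming simple whitespace tokenization; the helper `word_edits` is hypothetical and not part of the notebook:

```python
import difflib
from typing import List, Tuple


def word_edits(source: str, corrected: str) -> List[Tuple[str, str, str]]:
    """Return (operation, original words, corrected words) for every changed span."""
    src, tgt = source.split(), corrected.split()
    matcher = difflib.SequenceMatcher(a=src, b=tgt, autojunk=False)
    edits = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # 'replace', 'delete', or 'insert'
            edits.append((tag, " ".join(src[i1:i2]), " ".join(tgt[j1:j2])))
    return edits


print(word_edits("She go to college everyday.", "She goes to college every day."))
# [('replace', 'go', 'goes'), ('replace', 'everyday.', 'every day.')]
print(word_edits("I am agree with your opinion.", "I agree with your opinion."))
# [('delete', 'am', '')]
```

Because punctuation stays attached to words under whitespace splitting, "everyday." → "every day." appears as a single replacement; a real tokenizer would separate it.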


Result / Observation

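On the test sentences, the Seq2Seq model corrected:
*   Subject–verb agreement errors (go → goes, have → has)
*   Redundant auxiliary verbs (am agree → agree)
*   Word-boundary errors (everyday → every day)
*   Missing final punctuation (a period was added in the interactive test)

It left one error uncorrected: "He has completed the work yesterday." still pairs the present perfect with the past-time adverb "yesterday"; the expected correction is "He completed the work yesterday."

The meaning of each sentence was preserved.

The model produced these corrections without any additional training, because the checkpoint is already fine-tuned for grammar correction. These observations come from a handful of sentences, not from a benchmark evaluation.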
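The observations above are qualitative. One way to quantify them, assuming reference corrections are available (JFLEG supplies several human corrections per sentence), is sketched below with two simple proxies: exact match, and word error rate (word-level Levenshtein distance divided by reference length) against the closest reference. These metrics are swapped in for simplicity; the standard metric reported for JFLEG is GLEU, which this code does not implement. The helper names and the reference corrections in the usage example are hypothetical.

```python
from typing import Dict, List, Sequence


def word_edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance over word tokens (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, word_a in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, word_b in enumerate(b, start=1):
            curr[j] = min(
                prev[j] + 1,                       # delete word_a
                curr[j - 1] + 1,                   # insert word_b
                prev[j - 1] + (word_a != word_b),  # substitute (free if equal)
            )
        prev = curr
    return prev[-1]


def evaluate(predictions: List[str], references: List[List[str]]) -> Dict[str, float]:
    """Exact-match rate and mean word error rate, each prediction scored against its closest reference."""
    if not predictions or len(predictions) != len(references) or any(not refs for refs in references):
        raise ValueError("need one non-empty list of references per prediction")
    exact, total_wer = 0, 0.0
    for pred, refs in zip(predictions, references):
        if any(pred.strip() == ref.strip() for ref in refs):
            exact += 1
        total_wer += min(
            word_edit_distance(pred.split(), ref.split()) / max(len(ref.split()), 1)
            for ref in refs
        )
    n = len(predictions)
    return {"exact_match": exact / n, "mean_wer": total_wer / n}


# Outputs from this notebook, scored against hypothetical reference corrections
preds = ["He has completed the work yesterday.", "I agree with your opinion."]
refs = [["He completed the work yesterday."], ["I agree with your opinion."]]
print(evaluate(preds, refs))  # {'exact_match': 0.5, 'mean_wer': 0.1}
```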




Conclusion

This experiment demonstrates that pre-trained Seq2Seq transformer models can effectively perform grammar error correction with minimal implementation effort. Such models are useful in applications like writing assistants, chatbots, and educational tools.