
# Automated Grammar and Language Error Correction System

Team Number: Team-8

Team Members:
1. Chandana Gangaraju
2. Akshitha Komatireddy
3. Sai Avinash Polina
4. Venkatesh Rakurthi

Course Details:
- Course Name: Natural Language Processing
- Course Number: AIT 526, Section 001
- Professor: Duoduo Liao

Project Description:

This project develops an automated system for detecting and correcting
grammatical errors in English sentences. The system fine-tunes two transformer
models, T5 (Text-to-Text Transfer Transformer) and BART (Bidirectional and
Auto-Regressive Transformers), for grammar correction. T5 was chosen as the
primary model for the final implementation because it achieved the higher GLEU
score on the JFLEG test set (0.8049 versus 0.7744 for BART).

Key Features:
1. Grammar Correction:
   - Detects and corrects grammatical errors using T5 and BART models.
   - Provides real-time corrections for user-input sentences.
2. Feedback Generation:
   - Generates detailed grammatical feedback using OpenAI's GPT API.
3. Evaluation:
   - Compares the performance of T5 and BART using GLEU scores.
4. Interactive Prompt:
   - Allows users to input incorrect sentences and receive corrections
     and optional feedback.

Datasets Used:
- JFLEG (The Johns Hopkins Fluency-Extended GEC Dataset) for fine-tuning
  the T5 and BART models.

Tools & Libraries:
- Hugging Face Transformers and Datasets
- PyTorch for model fine-tuning
- OpenAI GPT API for feedback generation
- NLTK for GLEU score evaluation
- Python for scripting and development

Model Details:
1. T5 (Base):
   - A pre-trained encoder-decoder transformer that casts every task as text-to-text.
   - Selected as the primary model because it scored higher than BART on GLEU
     in this project's evaluation.
2. BART (Base):
   - An encoder-decoder transformer with a bidirectional encoder and an
     autoregressive decoder, designed for text generation and comprehension.
   - Used for comparative analysis with T5.

Evaluation Metrics:
- GLEU score, computed with NLTK's sentence_gleu (the Google-BLEU variant), measures
  the n-gram overlap between each corrected sentence and its reference corrections.

Developed By:
Team-8



Step 1: Install Necessary Libraries

In [None]:
!pip install openai transformers datasets nltk wordcloud matplotlib spacy
!python -m spacy download en_core_web_sm



Collecting en-core-web-sm==3.7.1
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.7.1/en_core_web_sm-3.7.1-py3-none-any.whl (12.8 MB)
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')
⚠ Restart to reload dependencies
If you are in a Jupyter or Colab notebook, you may need to restart Python in
order to load all the package's dependencies. You can do this by selecting the
'Restart kernel' or 'Restart runtime' option.


Step 2: Import Libraries
Import the modules required for the NLP tasks and visualization.
The T5 and BART models and their tokenizers handle conditional text generation.
Trainer and TrainingArguments drive the fine-tuning.
DatasetDict organizes the dataset splits, while NLTK and the other modules support text processing and evaluation.

In [None]:
from transformers import T5ForConditionalGeneration, T5Tokenizer, BartForConditionalGeneration, BartTokenizer, Trainer, TrainingArguments
from datasets import load_dataset, DatasetDict
import nltk  # NLTK is used for natural language processing tasks, like tokenization and evaluation.
import random  # For random sampling or shuffling when required.
import matplotlib.pyplot as plt  # For visualizing data or results, e.g., creating plots.
from nltk.translate.gleu_score import sentence_gleu  # To compute GLEU scores for evaluating generated text quality.
from wordcloud import WordCloud  # To create word clouds for visual representation of text data.
from nltk.corpus import stopwords  # To remove commonly used words that may not add meaningful context.
import spacy  # For advanced NLP tasks like tokenization, lemmatization, and named entity recognition.

# Downloading necessary NLTK data files to ensure smooth execution of NLTK-related tasks.
nltk.download("punkt")  # Required for tokenizing sentences and words in text data.
nltk.download("stopwords")  # Provides a list of common stop words for filtering irrelevant words.

# Defining the set of English stop words, which can be used to preprocess text data by removing unimportant words.
stop_words = set(stopwords.words("english"))



[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


Step 3: Load and Split Dataset
Load the JFLEG dataset, which is designed for evaluating grammar and language error correction systems.

Each example pairs a source sentence with a list of corrected versions, which suits text-correction tasks.

In [None]:
dataset = load_dataset("jhu-clsp/jfleg")

# Splitting the 'validation' set into train and validation subsets for training purposes.
train_test_split = dataset['validation'].train_test_split(test_size=0.2)

# Reorganizing the dataset into a DatasetDict with 'train', 'validation', and 'test' subsets.
# This allows for structured access and usage of each subset during model training and evaluation.
dataset = DatasetDict({
    'train': train_test_split['train'],      # The new training set derived from the validation split.
    'validation': train_test_split['test'], # The remaining 20% serves as the validation set.
    'test': dataset['test']                 # The original 'test' set remains unchanged.
})

# Displaying the sizes of each dataset split for verification and debugging purposes.
# This confirms the splits were performed as intended.
print(f"Dataset sizes: { {split: len(dataset[split]) for split in dataset.keys()} }")


Dataset sizes: {'train': 604, 'validation': 151, 'test': 748}
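
The split above is not seeded, so each run assigns different sentences to the train and validation subsets. The Hugging Face `train_test_split` method accepts a `seed` argument for this purpose. The sketch below illustrates the same idea with the standard library only; `seeded_split` is a hypothetical helper, not part of the notebook, and the seed value 42 is an arbitrary assumption.

```python
import math
import random

def seeded_split(num_examples, test_size=0.2, seed=42):
    """Return (train_indices, test_indices) for a shuffled, reproducible split."""
    indices = list(range(num_examples))
    # A local Random instance keeps the global random state untouched.
    random.Random(seed).shuffle(indices)
    num_test = math.ceil(num_examples * test_size)
    return indices[num_test:], indices[:num_test]

# 755 is the size of the JFLEG 'validation' split implied by the output above (604 + 151).
train_idx, val_idx = seeded_split(755)
print(len(train_idx), len(val_idx))  # 604 151
```

Calling `seeded_split(755)` twice returns identical index lists, which makes validation losses comparable across runs.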


Step 4: Data Augmentation

Function to introduce errors into a sentence for data augmentation.

This simulates common grammatical errors to train a robust grammar correction model.


In [None]:
def augment_with_errors(sentence, correction):
    modified_sentence = sentence  # Start with the original sentence.

    # Introduce article misuse by swapping 'a' with 'the' or vice versa.
    if " a " in modified_sentence and random.random() > 0.5:
        modified_sentence = modified_sentence.replace(" a ", " the ", 1)  # Replace the first occurrence of 'a' with 'the'.
    elif " the " in modified_sentence and random.random() > 0.5:
        modified_sentence = modified_sentence.replace(" the ", " a ", 1)  # Replace the first occurrence of 'the' with 'a'.

    # Introduce subject-verb agreement errors.
    if " is " in modified_sentence and random.random() > 0.5:
        modified_sentence = modified_sentence.replace(" is ", " are ", 1)  # Swap singular with plural verb.
    elif " are " in modified_sentence and random.random() > 0.5:
        modified_sentence = modified_sentence.replace(" are ", " is ", 1)  # Swap plural with singular verb.

    # Simulate tense inconsistency with crude string substitutions.
    if " was " in modified_sentence and "ing" in modified_sentence and random.random() > 0.5:
        # Turn "was ...ing" into "is ...ed". The first "ing" anywhere in the sentence is replaced, even inside a word such as "thing".
        modified_sentence = modified_sentence.replace(" was ", " is ", 1).replace("ing", "ed", 1)
    elif " ate " in modified_sentence and random.random() > 0.5:
        modified_sentence = modified_sentence.replace(" ate ", " eat ", 1)  # Change past tense to present tense.

    # Introduce preposition misuse by swapping 'in' with 'on' or vice versa.
    if " in " in modified_sentence and random.random() > 0.5:
        modified_sentence = modified_sentence.replace(" in ", " on ", 1)  # Replace 'in' with 'on'.
    elif " on " in modified_sentence and random.random() > 0.5:
        modified_sentence = modified_sentence.replace(" on ", " in ", 1)  # Replace 'on' with 'in'.

    return modified_sentence, correction  # Return the modified sentence along with its original correction.

# Function to augment a batch of sentences with simulated errors.
def augment_dataset(batch):
    augmented_sentences = []  # List to store augmented sentences.
    augmented_corrections = []  # List to store corresponding corrections.

    # Iterate through each sentence and its corresponding correction.
    for sentence, correction in zip(batch["sentence"], batch["corrections"]):
        correction = correction[0] if correction else sentence  # Use the original sentence if no corrections are provided.
        augmented_sentence, correction = augment_with_errors(sentence, correction)  # Generate augmented data.
        augmented_sentences.append(augmented_sentence)  # Add augmented sentence to the list.
        augmented_corrections.append(correction)  # Add corresponding correction to the list.

    # Return the augmented batch as a dictionary.
    return {"sentence": augmented_sentences, "corrections": augmented_corrections}

# Augment the training dataset with simulated errors.
# The map function applies the augment_dataset function to each batch in the 'train' dataset.
augmented_train_dataset = dataset["train"].map(augment_dataset, batched=True)

# Combine the original training data with the augmented data.
# This increases the diversity of the training dataset for improved model generalization.
combined_train_dataset = {
    "sentence": dataset["train"]["sentence"] + augmented_train_dataset["sentence"],  # Combine original and augmented sentences.
    "corrections": dataset["train"]["corrections"] + augmented_train_dataset["corrections"],  # Combine original and augmented corrections.
}
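
The substitutions in `augment_with_errors` match on space-delimited strings such as `" a "`, so a target word at the very start of a sentence is never touched. As a possible refinement, a regular expression with word boundaries handles that case. This is a minimal sketch; `swap_first_word` is a hypothetical helper and is not used elsewhere in the notebook.

```python
import re

def swap_first_word(sentence, old, new):
    """Replace the first whole-word occurrence of `old` with `new`."""
    # \b anchors the match to word boundaries, so "a" never matches inside "park".
    pattern = rf"\b{re.escape(old)}\b"
    return re.sub(pattern, new, sentence, count=1)

sentence = "a dog is in a park"
print(swap_first_word(sentence, "a", "the"))   # the dog is in a park
print(sentence.replace(" a ", " the ", 1))     # a dog is in the park
```

The second print shows the behavior of the space-delimited approach for comparison: it skips the sentence-initial article and changes the later one instead.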


Step 5: Normalize Dataset

Function to normalize a dataset by ensuring all sentences and corrections are strings.

This is important to avoid type errors and maintain consistency during training and evaluation.

In [None]:
def normalize_combined_dataset(dataset):
    normalized_sentences = []  # List to store normalized sentences.
    normalized_corrections = []  # List to store normalized corrections.

    # Iterate through each sentence and its corresponding correction in the dataset.
    for sentence, correction in zip(dataset["sentence"], dataset["corrections"]):
        # Ensure the sentence is a string. If not, convert it to a string.
        if not isinstance(sentence, str):
            sentence = str(sentence)
        normalized_sentences.append(sentence)  # Add the normalized sentence to the list.

        # Ensure corrections are strings. If it's a list, use the first element if available.
        if isinstance(correction, list):
            correction = correction[0] if correction else ""  # Handle empty lists gracefully.
        if not isinstance(correction, str):
            correction = str(correction)  # Convert non-string corrections to strings.
        normalized_corrections.append(correction)  # Add the normalized correction to the list.

    # Return the normalized dataset as a dictionary.
    return {"sentence": normalized_sentences, "corrections": normalized_corrections}

# Normalize the combined training dataset to ensure uniform data types.
# This step avoids potential issues during model training due to type mismatches.
combined_train_dataset = normalize_combined_dataset(combined_train_dataset)

from datasets import Dataset  # Import Dataset class from Hugging Face's datasets library.
combined_train_dataset = Dataset.from_dict(combined_train_dataset)  # Create a Dataset object from the dictionary.


Step 6: Preprocessing

Function to preprocess the dataset for input into the model.

 It prepares the source sentences and target corrections for tokenization.

In [None]:
def preprocess_function(examples, tokenizer):
    # Prefixing each input sentence with "fix:" to specify the task for the model (grammar correction).
    # This is especially useful for T5-like models that rely on task-specific prefixes.
    inputs = ["fix: " + sentence for sentence in examples["sentence"]]

    # Ensure target corrections are strings. If corrections are lists, take the first correction.
    targets = [
        correction if isinstance(correction, str) else correction[0]
        for correction in examples["corrections"]
    ]

    # Tokenize the input sentences.
    tokenized_inputs = tokenizer(
        inputs, max_length=128, truncation=True, padding="max_length"
    )

    # Tokenize the target sentences (corrections) using the same settings as the inputs.
    tokenized_targets = tokenizer(
        targets, max_length=128, truncation=True, padding="max_length"
    )

    # Return tokenized inputs and targets in the format expected by Hugging Face models.
    return {
        "input_ids": tokenized_inputs["input_ids"],  # Encoded input tokens.
        "attention_mask": tokenized_inputs["attention_mask"],  # Attention mask for padding.
        "labels": tokenized_targets["input_ids"],  # Encoded target tokens (used as labels).
    }

# Load tokenizers for the T5 and BART models.
# These tokenizers convert text into token IDs suitable for input into their respective models.
t5_tokenizer = T5Tokenizer.from_pretrained("t5-base")  # T5 tokenizer.
bart_tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")  # BART tokenizer.

# Tokenize datasets for T5 model.
# The `map` function applies the preprocessing function to the entire dataset in a batched manner.
tokenized_train_t5 = combined_train_dataset.map(
    lambda batch: preprocess_function(batch, t5_tokenizer), batched=True
)
tokenized_validation_t5 = dataset["validation"].map(
    lambda batch: preprocess_function(batch, t5_tokenizer), batched=True
)
tokenized_test_t5 = dataset["test"].map(
    lambda batch: preprocess_function(batch, t5_tokenizer), batched=True
)

# Tokenize datasets for BART model.
tokenized_train_bart = combined_train_dataset.map(
    lambda batch: preprocess_function(batch, bart_tokenizer), batched=True
)
tokenized_validation_bart = dataset["validation"].map(
    lambda batch: preprocess_function(batch, bart_tokenizer), batched=True
)
tokenized_test_bart = dataset["test"].map(
    lambda batch: preprocess_function(batch, bart_tokenizer), batched=True
)

# Print tokenized datasets to verify successful preprocessing.
print("T5 Train Dataset:", tokenized_train_t5)  # Display tokenized training dataset for T5.
print("BART Train Dataset:", tokenized_train_bart)  # Display tokenized training dataset for BART.


Map:   0%|          | 0/1208 [00:00<?, ? examples/s]

Map:   0%|          | 0/1208 [00:00<?, ? examples/s]

Map:   0%|          | 0/151 [00:00<?, ? examples/s]

Map:   0%|          | 0/748 [00:00<?, ? examples/s]

T5 Train Dataset: Dataset({
    features: ['sentence', 'corrections', 'input_ids', 'attention_mask', 'labels'],
    num_rows: 1208
})
BART Train Dataset: Dataset({
    features: ['sentence', 'corrections', 'input_ids', 'attention_mask', 'labels'],
    num_rows: 1208
})
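
Because the targets are padded to `max_length`, the `labels` column contains pad token IDs, and the loss is then computed on those padding positions as well. Hugging Face sequence-to-sequence models skip label positions set to -100, so a common refinement is to mask the padding before training. The sketch below shows that masking step on plain lists; `mask_padding` is a hypothetical helper, and the token IDs are made up for illustration.

```python
IGNORE_INDEX = -100  # label value that the cross-entropy loss ignores

def mask_padding(label_ids, pad_token_id):
    """Replace pad token IDs in each label sequence with IGNORE_INDEX."""
    return [
        [IGNORE_INDEX if token == pad_token_id else token for token in sequence]
        for sequence in label_ids
    ]

# Illustrative IDs only: T5 uses 0 as its pad token ID and 1 as end-of-sequence.
labels = [[363, 19, 48, 1, 0, 0, 0]]
print(mask_padding(labels, pad_token_id=0))
# [[363, 19, 48, 1, -100, -100, -100]]
```

In `preprocess_function` the pad ID would come from `tokenizer.pad_token_id`, since it differs between tokenizers (BART uses 1).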


Step 7: Load Models

In [None]:
# Load the pre-trained T5 model and move it to the GPU for faster computation.
# The T5 model is designed for conditional generation tasks, including text-to-text tasks like grammar correction.
t5_model = T5ForConditionalGeneration.from_pretrained("t5-base").to("cuda")

# Load the pre-trained BART model and move it to the GPU for faster computation.
# The BART model is an encoder-decoder model also well-suited for conditional text generation tasks.
bart_model = BartForConditionalGeneration.from_pretrained("facebook/bart-base").to("cuda")


Step 8: Training
Define training arguments for the model fine-tuning process using the `TrainingArguments` class.

These arguments control the training and evaluation configurations.

In [None]:

training_args = TrainingArguments(
    output_dir="./finetuned_t5",  # Directory where the fine-tuned T5 model will be saved.
    evaluation_strategy="epoch",  # Evaluate after every epoch (newer Transformers releases name this argument eval_strategy).
    learning_rate=5e-5,  # Set the learning rate for the optimizer.
    per_device_train_batch_size=8,  # Batch size for training on each device.
    per_device_eval_batch_size=8,  # Batch size for evaluation on each device.
    num_train_epochs=5,  # Number of epochs to train the model.
    weight_decay=0.01,  # Apply weight decay for regularization.
    save_total_limit=2,  # Limit the number of saved checkpoints to the most recent two.
    report_to="none",  # Disable reporting to external tools (e.g., WandB or TensorBoard).
)

# Load the pre-trained T5 model and move it to the GPU for fine-tuning.
t5_model = T5ForConditionalGeneration.from_pretrained("t5-base").to("cuda")

# Create a `Trainer` instance for fine-tuning the T5 model.
# The trainer automates the training and evaluation loops.
t5_trainer = Trainer(
    model=t5_model,  # The T5 model to fine-tune.
    args=training_args,  # The training arguments defined above.
    train_dataset=tokenized_train_t5,  # The tokenized training dataset for T5.
    eval_dataset=tokenized_validation_t5,  # The tokenized validation dataset for T5.
)

# Train the T5 model using the specified training arguments and datasets.
t5_trainer.train()

# Load the pre-trained BART model and move it to the GPU for fine-tuning.
bart_model = BartForConditionalGeneration.from_pretrained("facebook/bart-base").to("cuda")

# Update the output directory for the BART model fine-tuning to avoid overwriting the T5 outputs.
training_args.output_dir = "./finetuned_bart"

# Create a `Trainer` instance for fine-tuning the BART model.
bart_trainer = Trainer(
    model=bart_model,  # The BART model to fine-tune.
    args=training_args,  # The same training arguments with an updated output directory.
    train_dataset=tokenized_train_bart,  # The tokenized training dataset for BART.
    eval_dataset=tokenized_validation_bart,  # The tokenized validation dataset for BART.
)

# Train the BART model using the specified training arguments and datasets.
bart_trainer.train()




T5:
Epoch,Training Loss,Validation Loss
1,No log,0.113926
2,No log,0.108792
3,No log,0.108878
4,0.424100,0.110803
5,0.424100,0.111324


BART:
Epoch,Training Loss,Validation Loss
1,No log,0.142441
2,No log,0.148328
3,No log,0.16403
4,0.616300,0.17399
5,0.616300,0.177907




TrainOutput(global_step=755, training_loss=0.4136684133517032, metrics={'train_runtime': 310.2176, 'train_samples_per_second': 19.47, 'train_steps_per_second': 2.434, 'total_flos': 460351025971200.0, 'train_loss': 0.4136684133517032, 'epoch': 5.0})
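
The step count and the "No log" entries in the training tables follow from the configuration. The arithmetic below is a sketch that assumes a single device, no gradient accumulation, and the `Trainer` default of logging the training loss every 500 steps (`logging_steps` is not set in `TrainingArguments` above).

```python
import math

num_examples = 1208   # rows in the tokenized training set (from the Map output)
batch_size = 8        # per_device_train_batch_size
num_epochs = 5        # num_train_epochs
logging_steps = 500   # Trainer default when not overridden

steps_per_epoch = math.ceil(num_examples / batch_size)
total_steps = steps_per_epoch * num_epochs
# The first epoch whose final step is at or beyond the first logged step.
first_logged_epoch = math.ceil(logging_steps / steps_per_epoch)

print(steps_per_epoch, total_steps, first_logged_epoch)  # 151 755 4
```

The total of 755 matches `global_step=755` in the `TrainOutput`, and the training loss first appears at epoch 4 because step 500 falls between the end of epoch 3 (step 453) and the end of epoch 4 (step 604).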

Step 9: Grammar Correction

In [None]:
def correct_grammar(sentence, model_name="t5"):

    # Select tokenizer and model based on the specified model_name.
    # If model_name is "t5", use the T5 tokenizer and model.
    # Otherwise, use the BART tokenizer and model.
    tokenizer, model = (t5_tokenizer, t5_model) if model_name == "t5" else (bart_tokenizer, bart_model)

    # Prepare the input sentence by prefixing it with "fix:" to match the model's task prompt.
    input_text = f"fix: {sentence}"

    # Tokenize the input sentence, converting it to model-ready tensor format.
    # The `return_tensors="pt"` ensures the output is in PyTorch tensor format.
    # `.to("cuda")` moves the tensor to the GPU for faster computation.
    input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")

    # Use the model to generate the corrected sentence.
    # - max_length: Sets the maximum token length of the generated text.
    # - num_beams: Uses beam search with 4 beams for higher-quality generation.
    # - early_stopping: Ends beam search as soon as num_beams complete candidates have been found.
    outputs = model.generate(input_ids, max_length=128, num_beams=4, early_stopping=True)

    # Decode the generated token IDs back into a human-readable sentence.
    # `skip_special_tokens=True` removes tokens like <pad> or <eos>.
    corrected_sentence = tokenizer.decode(outputs[0], skip_special_tokens=True)

    # Return the corrected sentence.
    return corrected_sentence


Step 10: GLEU Score for T5 and BART

In [None]:
from nltk.translate.gleu_score import sentence_gleu

def evaluate_gleu(corrected_sentences, reference_sentences):

    gleu_scores = []

    # Iterate through each corrected sentence and its corresponding reference
    for corrected_sentence, references in zip(corrected_sentences, reference_sentences):
        # Tokenize the corrected sentence and references into words (tokens)
        corrected_tokens = corrected_sentence.split()
        reference_tokens = [ref.split() for ref in references]  # Multiple references

        # Calculate the GLEU score for the current sentence
        gleu_score = sentence_gleu(reference_tokens, corrected_tokens)
        gleu_scores.append(gleu_score)

    # Calculate the average GLEU score by averaging across all sentences
    avg_gleu_score = sum(gleu_scores) / len(gleu_scores)
    return avg_gleu_score

def evaluate_model_gleu(dataset, model_name="t5"):

    corrected_sentences = []  # List to store the model-generated corrections
    reference_sentences = []  # List to store the ground truth corrections

    # Iterate through each example in the dataset
    for example in dataset:
        # Correct the sentence using the specified model (T5 or BART)
        corrected_sentence = correct_grammar(example["sentence"], model_name)
        corrected_sentences.append(corrected_sentence)
        reference_sentences.append(example["corrections"])

    # Calculate the average GLEU score for the batch of sentences
    avg_gleu_score = evaluate_gleu(corrected_sentences, reference_sentences)

    # Print and return the average GLEU score
    print(f"Average GLEU Score ({model_name.upper()}): {avg_gleu_score:.4f}")
    return avg_gleu_score


# Evaluate T5 Model on the Test Set
print("Evaluating T5 Model on Test Set:")
t5_gleu_score = evaluate_model_gleu(dataset["test"], model_name="t5")

# Evaluate BART Model on the Test Set
print("\nEvaluating BART Model on Test Set:")
bart_gleu_score = evaluate_model_gleu(dataset["test"], model_name="bart")


Evaluating T5 Model on Test Set:
Average GLEU Score (T5): 0.8049

Evaluating BART Model on Test Set:
Average GLEU Score (BART): 0.7744
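
To make the reported scores easier to interpret, the sketch below re-implements the sentence-level score in plain Python: matching n-grams (lengths 1 to 4) divided by the larger of the hypothesis and reference n-gram totals, taking the best value over the references. It is intended to mirror the calculation behind NLTK's `sentence_gleu`, but it is a simplified illustration, and `simple_gleu` and `ngram_counts` are hypothetical names.

```python
from collections import Counter

def ngram_counts(tokens, min_len=1, max_len=4):
    """Count every n-gram of length min_len..max_len in a token list."""
    counts = Counter()
    for n in range(min_len, max_len + 1):
        for start in range(len(tokens) - n + 1):
            counts[tuple(tokens[start:start + n])] += 1
    return counts

def simple_gleu(references, hypothesis, min_len=1, max_len=4):
    """Best ratio of matching n-grams to max(hypothesis, reference) n-grams."""
    hyp_counts = ngram_counts(hypothesis, min_len, max_len)
    hyp_total = sum(hyp_counts.values())
    best = 0.0
    for reference in references:
        ref_counts = ngram_counts(reference, min_len, max_len)
        denominator = max(hyp_total, sum(ref_counts.values()))
        if denominator == 0:
            continue  # both sides empty: nothing to score
        matches = sum((hyp_counts & ref_counts).values())
        best = max(best, matches / denominator)
    return best

references = ["He is an honest person .".split()]
hypothesis = "He is a honest person .".split()
print(simple_gleu(references, hypothesis))  # 0.5
```

A single wrong word removes every n-gram that spans it, which is why one article error halves the score of this six-token sentence.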


Step 11: Feedback Function

In [None]:
from openai import OpenAI

# Assumes openai>=1.0, which is what the unpinned `pip install openai` in Step 1 installs;
# the older `openai.ChatCompletion.create` call was removed in that release.
client = OpenAI(api_key="Insert the api key here")

def provide_feedback(sentence):
    # Construct the prompt for the GPT model to analyze the sentence
    prompt = f"""The sentence is: '{sentence}'
    Please analyze its grammatical correctness, focusing on:
    - Incorrect use of articles
    - Subject-verb agreement
    - Tense consistency
    - Preposition misuse
    Provide suggestions for improvement."""
    # Make the API call to OpenAI's GPT-4 model
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "You are a grammar expert."},
            {"role": "user", "content": prompt}
        ]
    )
    # Extract and return the feedback from the response
    return response.choices[0].message.content
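
The triple-quoted prompt in `provide_feedback` keeps the source-code indentation of its continuation lines, so every line after the first reaches the model with leading spaces. One way to avoid this, sketched here with the standard library, is to dedent a template before filling it in. `build_feedback_prompt` is a hypothetical helper, not part of the notebook.

```python
import textwrap

def build_feedback_prompt(sentence):
    """Return the feedback prompt with no stray indentation."""
    # Dedent the template first, then insert the sentence, so the
    # sentence's own whitespace cannot affect the dedent step.
    template = textwrap.dedent("""\
        The sentence is: '{sentence}'
        Please analyze its grammatical correctness, focusing on:
        - Incorrect use of articles
        - Subject-verb agreement
        - Tense consistency
        - Preposition misuse
        Provide suggestions for improvement.""")
    return template.format(sentence=sentence)

print(build_feedback_prompt("He is good at mathematics."))
```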



Step 12: Combined Grammar Correction and Feedback

In [None]:
def correct_and_provide_feedback(sentence, model_name="t5"):
    # Correct the sentence using the selected model (T5 or BART)
    corrected_sentence = correct_grammar(sentence, model_name)
    # Provide feedback on the corrected sentence
    feedback = provide_feedback(corrected_sentence)
    # Print the original sentence, the corrected sentence, and the feedback
    print(f"Original Sentence: {sentence}")
    print(f"Corrected Sentence ({model_name.upper()}): {corrected_sentence}")
    print(f"Feedback: {feedback}")
    print("=" * 80)
    # Return the corrected sentence and the feedback
    return corrected_sentence, feedback
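
Since the feedback is generated for the corrected sentence, it can help to also see exactly which words the model changed. The sketch below uses the standard library's `difflib` to list word-level differences between the input and the correction. `word_changes` is a hypothetical helper offered as an illustration, not part of the original pipeline.

```python
import difflib

def word_changes(original, corrected):
    """List (before, after) pairs for each word-level edit."""
    before, after = original.split(), corrected.split()
    matcher = difflib.SequenceMatcher(a=before, b=after, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # keep replacements, insertions, and deletions
            changes.append((" ".join(before[i1:i2]), " ".join(after[j1:j2])))
    return changes

print(word_changes("he is a honest person.", "He is an honest person."))
# [('he', 'He'), ('a', 'an')]
```

An insertion appears as a pair with an empty first element, and a deletion as a pair with an empty second element.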


Step 13: Example Usage

In [None]:
# Article
sentence = "i have an idea to visit the europe next summer."
# Call the function to correct grammar and provide feedback on the sentence
correct_and_provide_feedback(sentence, model_name="t5")


Original Sentence: i have an idea to visit the europe next summer.
Corrected Sentence (T5): I have an idea to visit Europe next summer.
Feedback: The sentence 'I have an idea to visit Europe next summer.' is grammatically correct. Here are the elements in focus:

- Incorrect use of articles: The sentence is accurately using the article 'an' before 'idea,' which is a singular, non-specific countable noun. Moreover, 'Europe' usually does not take an article, which is correct here.
   
- Subject-verb agreement: The subject 'I' correctly matches with the verb 'have.' It is correct for singular first-person use.

- Tense consistency: The sentence remains in the present tense throughout ('I have an idea'), and there is no change of tense that could cause inconsistency.

- Preposition misuse: There are no prepositions misused in this sentence. 'To' correctly indicates direction or intended result in the context it is used in the sentence.



In [None]:
# Article

sentence = "he is a honest person."
correct_and_provide_feedback(sentence, model_name="t5")

Original Sentence: he is a honest person.
Corrected Sentence (T5): He is an honest person.
Feedback: The sentence 'He is an honest person.' is grammatically correct considering all the factors you've listed.

- Regarding the use of articles: The indefinite article 'an' is correctly used before 'honest', which starts with a vowel sound.
- Regarding subject-verb agreement: The singular subject 'He' correctly matches the third person singular form 'is'. 
- Regarding tense consistency: The sentence is in the present tense all throughout, making it consistent.
- Regarding preposition misuse: There are no prepositions used in this sentence, therefore, there is no misuse. 

This sentence doesn't require any improvements as it is perfectly correct in its current form.



In [None]:
# Subject-Verb Agreement

sentence = "The players in the team is practicing."
correct_and_provide_feedback(sentence, model_name="t5")


Original Sentence: The players in the team is practicing.
Corrected Sentence (T5): The players in the team are practicing.
Feedback: The sentence, 'The players in the team are practicing,' is grammatically correct.

- Incorrect use of articles: There's no misuse of articles here. The definite article 'the' correctly refers to a specific group of players and a specific team.

- Subject-verb agreement: The subject 'The players' correctly corresponds to the plural verb 'are practicing'. The subject and the verb are in agreement here.

- Tense consistency: The sentence is consistent in presenting its idea in the present continuous tense.

- Preposition misuse: The preposition 'in' is used correctly to denote that the players belong to the team. 

No improvements are necessary as the sentence is grammatically sound as it is.



In [None]:
# Subject-Verb Agreement

sentence = "this apples are on the table ."
correct_and_provide_feedback(sentence, model_name="t5")

Original Sentence: this apples are on the table  .
Corrected Sentence (T5): This apple is on the table .
Feedback: The sentence "This apple is on the table." is grammatically correct.

- The use of articles: The sentence correctly uses the definite article "the" before "table" since it is specifically pointing out a unique table. The demonstrative article "this" is correctly used before "apple" to indicate a specific apple.

- Subject-verb agreement: The sentence shows agreement between the subject "apple" and the verb "is." This is the correct form of the verb to be for singular third person.

- Tense consistency: The sentence is consistent in its use of present tense.

- Preposition misuse: The preposition "on" has been correctly used to specify the location of the apple.

No improvements are required for this sentence, as it is grammatically sound.



In [None]:
# Preposition

sentence = "He is good on mathematics."
correct_and_provide_feedback(sentence, model_name="t5")

Original Sentence: He is good on mathematics.
Corrected Sentence (T5): He is good at mathematics.
Feedback: The sentence 'He is good at mathematics.' is grammatically correct. Here's why:

- Incorrect use of articles: The sentence does not require an article, so there is no issue here.
- Subject-verb agreement: The singular subject 'He' correctly corresponds with the singular verb 'is'.
- Tense consistency: The entire sentence is in the present tense, so there are no tense consistency errors.
- Preposition misuse: The preposition 'at' is correctly used to indicate proficiency in some area, in this case, mathematics.

No suggestions for improvement as the sentence is grammatically perfect.



In [None]:
# Preposition

sentence = "he depends in his parents for money."
correct_and_provide_feedback(sentence, model_name="t5")

Original Sentence: he depends in his parents for money.
Corrected Sentence (T5): He depends on his parents for money.
Feedback: The sentence 'He depends on his parents for money.' is perfectly correct in terms of grammar.

- There's no incorrect use of articles; the sentence doesn't require any.
- Subject-verb agreement is correct. 'He' matches with 'depends', both being in the third person singular.
- Tense consistency is maintained. The verb 'depends' is appropriately in present tense.
- There's no misuse of prepositions. The preposition 'on' is used correctly to show dependence relation between the subject and its parents.

Therefore, no improvements are needed for this sentence.


('He depends on his parents for money.',
 "The sentence 'He depends on his parents for money.' is perfectly correct in terms of grammar.\n\n- There's no incorrect use of articles; the sentence doesn't require any.\n- Subject-verb agreement is correct. 'He' matches with 'depends', both being in the third person singular.\n- Tense consistency is maintained. The verb 'depends' is appropriately in present tense.\n- There's no misuse of prepositions. The preposition 'on' is used correctly to show dependence relation between the subject and its parents.\n\nTherefore, no improvements are needed for this sentence.")

In [None]:
# Tense Consistency

sentence = "she was cooking and eat."
correct_and_provide_feedback(sentence, model_name="t5")

Original Sentence: she was cooking and eat.
Corrected Sentence (T5): She was cooking and eating.
Feedback: The sentence: "She was cooking and eating," does not contain any grammatical errors based on those areas mentioned.

- There's no incorrect use of articles. The sentence doesn't require any articles "a," "an," or "the."

- There's correct subject-verb agreement. The past continuous tense verbs "was cooking" and "was eating" align properly with the singular subject "she."

- Tense consistency is maintained. Both actions ("cooking" and "eating") are in the past continuous tense.

- There's no misuse of prepositions. The sentence doesn't include any prepositions.

No suggestions for improvement are needed as the sentence is grammatically correct.


('She was cooking and eating.',
 'The sentence: "She was cooking and eating," does not contain any grammatical errors based on those areas mentioned.\n\n- There\'s no incorrect use of articles. The sentence doesn\'t require any articles "a," "an," or "the."\n\n- There\'s correct subject-verb agreement. The past continuous tense verbs "was cooking" and "was eating" align properly with the singular subject "she."\n\n- Tense consistency is maintained. Both actions ("cooking" and "eating") are in the past continuous tense.\n\n- There\'s no misuse of prepositions. The sentence doesn\'t include any prepositions.\n\nNo suggestions for improvement are needed as the sentence is grammatically correct.')

In [None]:
# Tense Consistency

sentence = "he was walking to the store when he sees a dog ."
correct_and_provide_feedback(sentence, model_name="t5")

Original Sentence: he was walking to the store when he sees a dog .
Corrected Sentence (T5): He was walking to the store when he saw a dog .
Feedback: The sentence 'He was walking to the store when he saw a dog.' is grammatically correct. 

- The use of articles is correct. 'The' is correctly used before 'store', referring to a specific one, and 'a' before 'dog', referring to any dog he saw.
- In terms of subject-verb agreement, 'He was walking' and 'he saw' both correctly match the singular subject 'he'.
- There is tense consistency. The sentence combines the past continuous 'was walking' with the simple past 'saw', which is correct as one action ('saw a dog') interrupts the other action that was in progress ('was walking to the store').
- There is no misuse of prepositions. 'To' is correctly used to describe movement towards a place (the store).

No suggestions for improvement are necessary because the sentence is grammatically accurate.


('He was walking to the store when he saw a dog .',
 "The sentence 'He was walking to the store when he saw a dog.' is grammatically correct. \n\n- The use of articles is correct. 'The' is correctly used before 'store', referring to a specific one, and 'a' before 'dog', referring to any dog he saw.\n- In terms of subject-verb agreement, 'He was walking' and 'he saw' both correctly match the singular subject 'he'.\n- There is tense consistency. The sentence combines the past continuous 'was walking' with the simple past 'saw', which is correct as one action ('saw a dog') interrupts the other action that was in progress ('was walking to the store').\n- There is no misuse of prepositions. 'To' is correctly used to describe movement towards a place (the store).\n\nNo suggestions for improvement are necessary because the sentence is grammatically accurate.")
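
The demonstrations above print each original sentence next to its T5 correction. As a minimal sketch (not part of the project's pipeline), a word-level diff built on Python's standard `difflib` module can make the exact edit explicit. The helper name `word_diff` is hypothetical, and the sentence pairs are copied from the outputs above.

```python
import difflib


def word_diff(original, corrected):
    """Return (operation, original_words, corrected_words) for each changed span.

    Hypothetical helper for illustration; it compares whitespace-separated
    tokens, so punctuation attached to a word is treated as part of that word.
    """
    src = original.split()
    tgt = corrected.split()
    # autojunk=False disables difflib's popularity heuristic, which only
    # matters for long sequences but keeps the behaviour predictable.
    matcher = difflib.SequenceMatcher(a=src, b=tgt, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((tag, " ".join(src[i1:i2]), " ".join(tgt[j1:j2])))
    return changes


# Sentence pairs taken from the demonstration outputs above.
pairs = [
    ("He is good on mathematics.", "He is good at mathematics."),
    ("he depends in his parents for money.", "He depends on his parents for money."),
    ("she was cooking and eat.", "She was cooking and eating."),
]

for original, corrected in pairs:
    print(original)
    for tag, before, after in word_diff(original, corrected):
        print(f"  {tag}: {before!r} -> {after!r}")
```

For the first pair, the helper reports a single `replace` of `'on'` with `'at'`. For the other two pairs it also reports the capitalization fix on the first word, since tokens are compared case-sensitively.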

Step 15: Interactive Prompt

In [None]:
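while True:
    # Read a sentence from the user; surrounding whitespace is ignored
    sentence = input("Enter an incorrect sentence (or type 'exit' to quit): ").strip()

    # Exit condition (case-insensitive)
    if sentence.lower() == "exit":
        print("Exiting the interactive prompt.")
        break

    # Skip empty input instead of sending it to the model
    if not sentence:
        continue

    # Correct the sentence and generate feedback; the function prints both
    corrected_sentence, feedback = correct_and_provide_feedback(sentence, model_name="t5")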


Enter an incorrect sentence (or type 'exit' to quit): she was cooking and eat.
Original Sentence: she was cooking and eat.
Corrected Sentence (T5): She was cooking and eating.
Feedback: The sentence 'She was cooking and eating.' is grammatically correct. Here's the analysis based on your description:

- Incorrect use of articles: The sentence does not require the use of any articles (a, an, the). Therefore, there are no errors in this category.
- Subject-verb agreement: The subject 'She' properly agrees with the verb 'was cooking' and 'eating'. Both actions refer to the same subject which is singular.
- Tense consistency: The sentence is consistent in the past continuous tense; 'was cooking' and 'eating' both suggest ongoing actions in the past.
- Preposition misuse: There are no prepositions in this sentence, hence there's no misuse.

There are no suggestions for improvement because the sentence is grammatically accurate. It correctly presents two activities that were occurring at the

References


1. T5 Model Paper:  
   Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., Zhou, Y., Li, W., & Liu, P. J. (2020).  
   "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer."
   Journal of Machine Learning Research (JMLR).  
   [https://arxiv.org/abs/1910.10683](https://arxiv.org/abs/1910.10683)

2. Hugging Face Transformers Library:  
   Wolf, T., Debut, L., Sanh, V., Chaumond, J., Delangue, C., Moi, A., Cistac, P., Rault, T., Louf, R., Funtowicz, M., & Brew, J. (2020).  
   "Transformers: State-of-the-Art Natural Language Processing."
   EMNLP 2020: System Demonstrations.  
   [https://github.com/huggingface/transformers](https://github.com/huggingface/transformers)

3. OpenAI GPT Documentation:  
   OpenAI. (2023).  
   "API Reference: Chat Completions Endpoint."
   [https://platform.openai.com/docs/](https://platform.openai.com/docs/)

4. Evaluation Metric - GLEU Score:  
   Wu, Y., Schuster, M., Chen, Z., Le, Q. V., Norouzi, M., Macherey, W., Krikun, M., Cao, Y., Gao, Q., Macherey, K., et al. (2016).  
   "Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation."
   [https://arxiv.org/abs/1609.08144](https://arxiv.org/abs/1609.08144)

5. Grammar Correction with NLP:  
   Napoles, C., Sakaguchi, K., & Tetreault, J. (2017).  
   "JFLEG: A Fluency Corpus and Benchmark for Grammatical Error Correction."
   Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics.  
   [https://arxiv.org/abs/1702.04066](https://arxiv.org/abs/1702.04066)

6. Beam Search Decoding:  
   Och, F. J., & Ney, H. (2004).  
   "The Alignment Template Approach to Statistical Machine Translation."
   Computational Linguistics.  
   [https://dl.acm.org/doi/10.1162/089120104323093344](https://dl.acm.org/doi/10.1162/089120104323093344)

7. Fine-tuning NLP Models:  
   Howard, J., & Ruder, S. (2018).  
   "Universal Language Model Fine-tuning for Text Classification."
   ACL 2018.  
   [https://arxiv.org/abs/1801.06146](https://arxiv.org/abs/1801.06146)

8. Fluency and Accuracy Metrics for NLP:  
   Pavlick, E., & Tetreault, J. (2016).  
   "Analyzing Grammatical Errors in Learner Writing."  
   ACL Workshop on Innovative Use of NLP for Building Educational Applications.  
   [https://aclanthology.org/W16-0506/](https://aclanthology.org/W16-0506/)

9. Top-K Sampling and Nucleus Sampling:  
   Holtzman, A., Buys, J., Du, L., Forbes, M., & Choi, Y. (2020).  
   "The Curious Case of Neural Text Degeneration."  
   ICLR 2020.  
   [https://arxiv.org/abs/1904.09751](https://arxiv.org/abs/1904.09751)

10. Hugging Face Datasets Documentation:  
    Hugging Face. (2023).  
    "Datasets: A Community Library for NLP Datasets."  
    [https://huggingface.co/docs/datasets/](https://huggingface.co/docs/datasets/)

11. Python's Natural Language Toolkit (NLTK):  
    Bird, S., Klein, E., & Loper, E. (2009).  
    "Natural Language Processing with Python."
    O'Reilly Media.  
    [https://www.nltk.org/](https://www.nltk.org/)

12. JFLEG Dataset on Hugging Face:  
    Center for Language and Speech Processing @ JHU. (n.d.).  
    "jhu-clsp/jfleg: Datasets at Hugging Face."  
    [https://huggingface.co/datasets/jhu-clsp/jfleg](https://huggingface.co/datasets/jhu-clsp/jfleg)
