# **Grammar & Spell Fixer**

PES2UG23CS450

The goal of this notebook is to correct a grammatical error in a sentence by suggesting the most appropriate replacement word using a Masked Language Model (MLM).

Instead of rewriting the entire sentence, we mask the incorrect word and let the model predict the most likely replacement from the surrounding context.
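
The masking step can be sketched as a small helper. This is a minimal illustration, assuming the position of the incorrect word is already known and that words are separated by whitespace. The name `mask_word` and the input sentence are hypothetical and not part of the original notebook.

```python
def mask_word(sentence: str, index: int, mask_token: str = "[MASK]") -> str:
    """Replace the whitespace-separated word at `index` with the mask token."""
    words = sentence.split()
    if not 0 <= index < len(words):
        raise IndexError(f"word index {index} out of range for {len(words)} words")
    words[index] = mask_token
    return " ".join(words)


# Hypothetical input: the word at position 2 ("go") is the one to be corrected.
print(mask_word("I am go to school", 2))  # I am [MASK] to school
```

The resulting string has the same shape as the masked sentences passed to the pipeline in the cells below.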

# **Technologies and Model**

- Library: Transformers (Hugging Face)
- Model: BERT (`bert-base-uncased`)
- Task: `fill-mask`

In [80]:
from transformers import pipeline

In [81]:
grammar_fixer = pipeline(
    "fill-mask",
    model="bert-base-uncased"
)

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForMaskedLM: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Device set to use cpu


In [82]:
sentence = "I am [MASK] to school"

In [83]:
predictions = grammar_fixer(sentence)

In [84]:
for pred in predictions:
    print(pred["token_str"], "->", round(pred["score"], 4))

going -> 0.7805
off -> 0.1071
headed -> 0.0239
heading -> 0.0157
late -> 0.0115


In [85]:
best_word = predictions[0]["token_str"]
corrected_sentence = sentence.replace("[MASK]", best_word)
print(corrected_sentence)

I am going to school
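
The cell above always accepts the top prediction, however low its score. One possible refinement is to accept a correction only when the model is reasonably confident. The sketch below reuses the scores printed earlier for "I am [MASK] to school"; the function name `pick_correction` and the 0.5 threshold are assumptions chosen for illustration, not values from the original notebook.

```python
# Predictions copied from the notebook output for "I am [MASK] to school".
sample_predictions = [
    {"token_str": "going", "score": 0.7805},
    {"token_str": "off", "score": 0.1071},
    {"token_str": "headed", "score": 0.0239},
    {"token_str": "heading", "score": 0.0157},
    {"token_str": "late", "score": 0.0115},
]


def pick_correction(predictions, min_score=0.5):
    """Return the highest-scoring token, or None if its score is below min_score.

    min_score=0.5 is an arbitrary illustrative threshold, not a tuned value.
    """
    best = max(predictions, key=lambda p: p["score"])
    return best["token_str"] if best["score"] >= min_score else None


print(pick_correction(sample_predictions))                 # going
print(pick_correction(sample_predictions, min_score=0.9))  # None
```

Returning `None` leaves it to the caller to keep the original word or ask for review when no candidate clears the threshold.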


In [86]:
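def grammar_fix(sentence_with_mask):
    """Fill the single [MASK] token in the sentence with the model's top prediction."""
    # The pipeline returns candidates sorted by score, highest first.
    predictions = grammar_fixer(sentence_with_mask)
    best_word = predictions[0]["token_str"]
    # Assumes exactly one [MASK]; with several masks the pipeline returns nested lists.
    return sentence_with_mask.replace("[MASK]", best_word)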

Example 1:

In [89]:
grammar_fix("She is [MASK] a movie")

'She is in a movie'
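
The result above is grammatical, but the unrestricted top prediction may not be the word the writer intended. When a set of plausible replacements is known in advance, the `fill-mask` pipeline accepts a `targets` argument that restricts scoring to those words. The sketch below is one way to use it. The helper name `fix_with_candidates` is hypothetical, and the pipeline is passed in as a parameter so the function does not depend on a global.

```python
def fix_with_candidates(fill_mask, sentence_with_mask, candidates, mask_token="[MASK]"):
    """Fill the mask using only the given candidate words.

    `fill_mask` is expected to behave like a Hugging Face fill-mask pipeline:
    called with `targets=...`, it returns a list of dicts that each contain
    "token_str" and "score".
    """
    predictions = fill_mask(sentence_with_mask, targets=candidates)
    # Pick the best-scoring candidate explicitly instead of relying on order.
    best = max(predictions, key=lambda p: p["score"])
    return sentence_with_mask.replace(mask_token, best["token_str"].strip())
```

With the pipeline created earlier, a call would look like `fix_with_candidates(grammar_fixer, "She is [MASK] a movie", ["watching", "watch"])`. Which candidate wins depends on the model's scores.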

Example 2:

In [90]:
grammar_fix("He is [MASK] in the room")

'He is not in the room'

Example 3:

In [91]:
grammar_fix("They will [MASK] tomorrow")

'They will return tomorrow'

Example 4:

In [95]:
grammar_fix("I want to [MASK] today")

'I want to sleep today'

Example 5:

In [101]:
grammar_fix("The dog is [MASK] with the boy")

'The dog is reunited with the boy'
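
All the examples above rely on the `[MASK]` token being placed by hand. A rough way to locate a suspicious word automatically is to mask each word in turn and flag it when the original word is not among the model's top predictions. The sketch below illustrates that idea under simplifying assumptions: whitespace tokenization, no punctuation handling, and lowercase comparison because `bert-base-uncased` is uncased. The name `find_suspicious_words` and the default `top_k=5` are hypothetical choices, and a rare but correct word would also be flagged by this heuristic.

```python
def find_suspicious_words(sentence, fill_mask, top_k=5, mask_token="[MASK]"):
    """Return (index, original_word, top_suggestion) for words the model finds unlikely.

    `fill_mask` is expected to behave like a Hugging Face fill-mask pipeline:
    called with `top_k=...`, it returns a list of dicts that each contain
    "token_str" and "score", sorted by score.
    """
    words = sentence.split()
    flagged = []
    for i, word in enumerate(words):
        # Mask one word at a time and ask the model what fits there.
        masked = " ".join(words[:i] + [mask_token] + words[i + 1:])
        predictions = fill_mask(masked, top_k=top_k)
        candidates = [p["token_str"].strip() for p in predictions]
        # Flag the word if the model does not rank it among its top_k guesses.
        if word.lower() not in candidates:
            flagged.append((i, word, candidates[0]))
    return flagged
```

With the pipeline created earlier, the call would be `find_suspicious_words("I am go to school", grammar_fixer)`. Each flagged entry gives the word position, the original word, and the model's top suggestion for that position.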