# Spell Checker Evaluation

In this notebook, I will evaluate various spell-checking models on a dataset of spelling errors. The models I will use are PySpellChecker, Autocorrect, TextBlob, and Spello. I will analyze their performance based on accuracy, precision, recall, F1-score, and average edit distance, as well as on correction rates and speed.

# Dataset

## Dataset format

Peter Norvig's Spell Errors Corpus: this dataset is a list of spelling errors compiled from sources such as Wikipedia and academic studies. Each line has the form "right word: wrong1, wrong2", which makes it useful for training and evaluating spell checkers. You can download it from Norvig's repository.

# More datasets

Some more datasets I found include:

- Birkbeck
- Aspell
- Holbrook
- Wikipedia

You can find these datasets at: https://www.dcs.bbk.ac.uk/~roger/corpora.html.

# Importing necessary libraries

Let's import all the necessary libraries to run our code and test the models.

In [None]:
import random
import editdistance
from sklearn.metrics import precision_score, recall_score, f1_score, accuracy_score
from spellchecker import SpellChecker
from autocorrect import Speller
from textblob import TextBlob
from collections import defaultdict
from spello.model import SpellCorrectionModel

# Suppress warnings for cleaner output
import warnings
warnings.filterwarnings("ignore")

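Make sure to install the packages used by these spell checkers and by the evaluation code with pip. Note that the `SpellChecker` class imported above is provided by the `pyspellchecker` package, so `spellchecker` does not need to be installed separately.

pip install pyspellchecker

pip install autocorrect

pip install textblob

pip install spello

pip install editdistance scikit-learn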

Additionally, you'll need to download Spello's pre-trained English model, named "en.pkl". Inside the **spellChecker** directory, create a directory named **models** with a subdirectory named **spello**, then copy the downloaded and unzipped en.pkl file into **spello**. Later, we will train the model on our data and save the updated version to improve accuracy.

# Data reading and dictionary creating

First, we preprocess the data, which is given as "right word: wrong1, wrong2" lines. We build a dictionary *spell_errors* that maps each correct spelling (the key) to the list of its incorrect spellings.
Then we create the list *test_data* by splitting each key and its items into pairs of the form (incorrect_spelling, correct_spelling).
Finally, I shuffle *test_data* so that the words we test our models on are a random sample.

In [None]:
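import random


def preprocess(file, seed=None):
    """Parses "right: wrong1, wrong2" lines into shuffled (incorrect, correct) pairs."""
    spell_errors = {}

    for line in file:
        # Skip blank lines and lines without the "right: wrong" separator
        if ":" not in line:
            continue
        # Split only on the first ":" into the correct word and its misspellings
        correct_word, incorrect_words = line.split(":", 1)
        spell_errors[correct_word.strip()] = [
            word.strip() for word in incorrect_words.split(",") if word.strip()
        ]

    # Flatten the dictionary into (incorrect_spelling, correct_spelling) pairs
    test_data = []
    for correct_word, incorrect_words in spell_errors.items():
        for incorrect_word in incorrect_words:
            test_data.append((incorrect_word, correct_word))

    # Shuffle with a dedicated generator; pass a seed for a reproducible order
    random.Random(seed).shuffle(test_data)

    return test_data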

# Spellcheckers

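1. Pyspell (PySpellChecker)

Pyspell (the `pyspellchecker` package) is based on Peter Norvig's spelling-correction algorithm. It uses the Levenshtein distance to find all permutations within an edit distance of 2 of the input word, and then picks the candidate that occurs most often in its word-frequency list.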
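To make the edit-distance-2 idea concrete, here is a minimal sketch of Norvig-style candidate generation and frequency ranking. It is not pyspellchecker's source code, and the tiny `WORD_FREQ` table is a hypothetical stand-in for the library's real word-frequency list.

```python
import string

# Hypothetical toy frequency list; pyspellchecker ships a much larger one
WORD_FREQ = {"spelling": 120, "spilling": 15, "the": 5000, "they": 800, "then": 900}
LETTERS = string.ascii_lowercase


def edits1(word):
    """All strings one deletion, transposition, substitution or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:] for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in LETTERS]
    inserts = [left + c + right for left, right in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def edits2(word):
    """All strings two edits away."""
    return {e2 for e1 in edits1(word) for e2 in edits1(e1)}


def known(words):
    """Keeps only the words that appear in the frequency list."""
    return {w for w in words if w in WORD_FREQ}


def correction(word):
    """Prefer the word itself, then distance-1, then distance-2 candidates; pick the most frequent."""
    candidates = known([word]) or known(edits1(word)) or known(edits2(word)) or {word}
    return max(candidates, key=lambda w: WORD_FREQ.get(w, 0))


print(correction("speling"))  # spelling
print(correction("teh"))      # the
```

For a word of length n, `edits1` produces 54n + 25 strings, and `edits2` roughly squares that, so this brute-force search grows quickly with word length.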

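2. Autocorrect

Autocorrect is a lightweight Python library for correcting misspelled words. It supports English and several other languages, and a `Speller()` instance is called directly on a word or sentence to get the corrected text.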

3. TextBlob

TextBlob is a Python library for processing textual data. It provides a simple API for common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, and more. Its `correct()` method, which we use here, is based on Peter Norvig's spelling corrector.

4. Spello

Spello is a spell-correction model that combines two models, Phoneme and SymSpell. The Phoneme model uses the Soundex algorithm in the background and suggests corrections based on phonetics, identifying similar-sounding words. The SymSpell model, on the other hand, uses edit distance to suggest correct spellings. Spello aims to get the best of both, and also takes the context of the word into account.
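The phonetic idea can be illustrated with a minimal implementation of the classic American Soundex code, which maps similar-sounding words to the same code (first letter plus three digits). This is a generic sketch of Soundex, not Spello's internal implementation, which may use its own variant.

```python
# Consonant groups that sound alike share a digit; vowels, h, w and y get no digit
SOUNDEX_CODES = {
    letter: digit
    for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                           ("l", "4"), ("mn", "5"), ("r", "6"))
    for letter in letters
}


def soundex(word):
    """Returns the American Soundex code (first letter + 3 digits) of a word."""
    letters = [ch for ch in word.lower() if ch.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    previous = SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = SOUNDEX_CODES.get(ch, "")
        if digit and digit != previous:
            code += digit
        # h and w do not separate equal digits; vowels (and y) do
        if ch not in "hw":
            previous = digit
    # Pad with zeros or truncate to exactly four characters
    return (code + "000")[:4]


print(soundex("Robert"), soundex("Rupert"))     # R163 R163
print(soundex("speling"), soundex("spelling"))  # S145 S145
```

Because "speling" and "spelling" share the code S145, a phonetic index keyed on Soundex would return "spelling" as a candidate for the misspelling.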

## Function Definitions

We will define several functions to handle the spell correction process and evaluate the models.

*spello_correction* returns the corrected spelling of a word using the Spello model.

In [3]:
def spello_correction(word):
    """Corrects the spelling of a word using the Spello model."""
    correction = spello_checker.spell_correct(word)
    corrected_word = correction.get("spell_corrected_text", word)
    return corrected_word

*safe_correction* calls the spell checker selected for evaluation on a given word. If the checker returns no correction (its return value is *None*), the original word is returned unchanged.

In [4]:
def safe_correction(spell_checker, word):
    """Applies the specified spell checker to a word; falls back to the word itself on None."""
    if spell_checker is pyspell_checker:
        corrected_word = pyspell_checker.correction(word)
    elif spell_checker is autocorrect_checker:
        corrected_word = autocorrect_checker(word)
    elif spell_checker == 'textblob':
        corrected_word = str(TextBlob(word).correct())
    elif spell_checker == 'spello':
        corrected_word = spello_correction(word)
    else:
        # Fail loudly instead of silently scoring an unknown checker as "no change"
        raise ValueError(f"Unknown spell checker: {spell_checker!r}")

    # pyspellchecker returns None when it finds no candidate
    return corrected_word if corrected_word is not None else word

# Metrics and parameters

To measure model performance, I adopted several metrics, following this paper: https://gerhard.pro/files/PublicationVanHuyssteenEiselenPuttkammer2004.pdf.

# Spell Checker Evaluation Metrics

To evaluate the performance of a spell checker, we use several metrics organized into three groups:

---

## Group 1: Classification Metrics

1. **Recall**  
   Measures the proportion of misspelled words that were correctly identified as misspelled by the spell checker.

   \[
   \text{Recall} = \frac{\text{True Positives (TP)}}{\text{True Positives (TP)} + \text{False Negatives (FN)}}
   \]

2. **Precision**  
   Measures the proportion of words marked as misspelled by the spell checker that were actually incorrect.

   \[
   \text{Precision} = \frac{\text{True Positives (TP)}}{\text{True Positives (TP)} + \text{False Positives (FP)}}
   \]

3. **Identifying Accuracy**  
   Overall accuracy of the spell checker in identifying both correct and incorrect words.

   \[
   \text{Accuracy} = \frac{\text{True Positives (TP)} + \text{True Negatives (TN)}}{\text{Total Samples}}
   \]

4. **Average Edit Distance**  
   The average number of changes needed to convert the spell checker’s output to the correct word.

   \[
   \text{Average Edit Distance} = \frac{\sum (\text{Edit Distance for each word})}{\text{Total Predictions}}
   \]
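These formulas need TP/FP/TN/FN counts. A minimal sketch of how they could be obtained from (input, expected, output) triples, assuming a word counts as "flagged" when the checker changes it; this convention is an assumption, and the paper may define flagging differently.

```python
def classification_metrics(triples):
    """triples: (input_word, expected_word, checker_output).

    Assumed convention: an input is misspelled if it differs from the expected
    word, and it is flagged if the checker's output differs from the input.
    """
    tp = fp = tn = fn = 0
    for original, expected, output in triples:
        misspelled = original != expected
        flagged = output != original
        if misspelled and flagged:
            tp += 1
        elif misspelled:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    total = tp + fp + tn + fn
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    return {"recall": recall, "precision": precision, "accuracy": accuracy}


# Every input here is misspelled, as in this notebook's test set
triples = [
    ("teh", "the", "the"),
    ("speling", "spelling", "spelling"),
    ("recieve", "receive", "recieve"),
    ("wrok", "work", "work"),
]
print(classification_metrics(triples))
# {'recall': 0.75, 'precision': 1.0, 'accuracy': 0.75}
```

When every input is misspelled, FP and TN are always zero, so precision is 1.0 as soon as anything is flagged and accuracy equals recall.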

---

## Group 2: Correction Metrics

1. **Percent of Words Invalid After Checker Work**  
   Measures the percentage of words that remain invalid after being processed by the spell checker.

   \[
   \text{Percent Invalid After Check} = \frac{\text{Invalid Words After Check}}{\text{Total Predictions}} \times 100
   \]

2. **Percent of Correctly Fixed Misspellings**  
   The percentage of originally misspelled words that were corrected by the spell checker.

   \[
   \text{Percent Correct Fixes} = \frac{\text{Correct Fixes}}{\text{Total Misspelled Words}} \times 100
   \]

3. **Percent of Non-fixed Misspellings with Right Correction in Top-5**  
   Measures the percentage of misspelled words for which the correct spelling was among the top 5 suggestions, even if it wasn’t the final correction.

   \[
   \text{Percent Top-5 Fixes} = \frac{\text{Top-5 Correct Fixes}}{\text{Total Misspelled Words}} \times 100
   \]

4. **Percent of Broken Valid Words**  
   Measures the percentage of words that were originally correct but were incorrectly changed by the spell checker.

   \[
   \text{Percent Broken Valid Words} = \frac{\text{Broken Valid Words}}{\text{Total Valid Words}} \times 100
   \]
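The four correction percentages can be computed from the same (input, expected, output) triples. A minimal sketch under two assumptions, also noted in the code: an output counts as invalid when it differs from the expected word, and `suggest` is a hypothetical function returning a checker's ranked list of candidates.

```python
def correction_metrics(triples, suggest, k=5):
    """triples: (input_word, expected_word, checker_output); suggest(word) -> ranked candidates."""
    misspelled = [t for t in triples if t[0] != t[1]]
    valid = [t for t in triples if t[0] == t[1]]

    # Assumption: an output is "invalid" if it is not the expected word
    invalid_after = sum(1 for _, expected, output in triples if output != expected)
    fixed = sum(1 for _, expected, output in misspelled if output == expected)
    # Unfixed misspellings whose right answer still appears among the top-k suggestions
    top_k = sum(
        1 for original, expected, output in misspelled
        if output != expected and expected in suggest(original)[:k]
    )
    broken = sum(1 for original, _, output in valid if output != original)

    def pct(part, whole):
        return 100 * part / whole if whole else 0.0

    return {
        "invalid_after_check": pct(invalid_after, len(triples)),
        "correct_fixes": pct(fixed, len(misspelled)),
        "top_k_fixes": pct(top_k, len(misspelled)),
        "broken_valid_words": pct(broken, len(valid)),
    }


triples = [
    ("teh", "the", "the"),
    ("wrok", "work", "work"),
    ("recieve", "receive", "recieve"),
    ("cat", "cat", "cat"),
    ("dog", "dog", "dot"),
]
suggest = lambda word: {"recieve": ["recipe", "receive"]}.get(word, [])  # hypothetical
print({name: round(value, 1) for name, value in correction_metrics(triples, suggest).items()})
# {'invalid_after_check': 40.0, 'correct_fixes': 66.7, 'top_k_fixes': 33.3, 'broken_valid_words': 50.0}
```

`broken_valid_words` is only informative when the test set also contains correctly spelled inputs; with misspellings only, the sketch reports 0.0.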

---

## Group 3: Speed Metric

1. **Checker Work Speed**  
   Measures the processing speed of the spell checker in terms of words per second.

   \[
   \text{Speed} = \frac{\text{Total Predictions}}{\text{Total Time (seconds)}}
   \]
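Speed is best measured by timing only the correction calls, excluding model loading and training. A minimal sketch; `correct` stands for any function mapping a word to its correction, for example `lambda w: safe_correction(pyspell_checker, w)`:

```python
import time


def words_per_second(correct, words):
    """Returns how many words per second `correct` processes over `words`."""
    start = time.perf_counter()  # monotonic, high-resolution clock
    for word in words:
        correct(word)
    elapsed = time.perf_counter() - start
    return len(words) / elapsed if elapsed > 0 else float("inf")


# Example with a trivial stand-in "corrector"
print(f"{words_per_second(str.lower, ['Teh', 'Speling'] * 1000):.0f} words/sec")
```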

---

## Definitions of Key Terms:

- **True Positives (TP)**: Misspelled words that the spell checker correctly identifies as incorrect.
- **False Positives (FP)**: Correct words that the spell checker incorrectly identifies as misspelled.
- **True Negatives (TN)**: Correct words that the spell checker correctly identifies as correct.
- **False Negatives (FN)**: Misspelled words that the spell checker incorrectly identifies as correct.
- **Edit Distance**: The minimum number of single-character edits (insertions, deletions, or substitutions) required to change one word into another.
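The edit distance defined above is the Levenshtein distance, which is what `editdistance.eval` computes in the evaluation code. A minimal dynamic-programming sketch, together with the average over (expected, output) pairs used for the Average Edit Distance metric:

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions or substitutions turning a into b."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


def average_edit_distance(pairs):
    """pairs: (expected_word, checker_output); mean Levenshtein distance."""
    return sum(levenshtein(expected, output) for expected, output in pairs) / len(pairs)


print(levenshtein("kitten", "sitting"))  # 3
print(levenshtein("teh", "the"))         # 2
print(average_edit_distance([("the", "the"), ("receive", "recieve"), ("spelling", "speling")]))  # 1.0
```

Note that a swap of two adjacent letters ("teh" vs. "the") costs 2 under this definition, since transpositions are not among the allowed single-character edits.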

---

These metrics provide a comprehensive view of a spell checker’s effectiveness, covering its ability to detect errors, correct them, and maintain processing efficiency.


In [5]:
from sklearn.metrics import precision_score, recall_score, f1_score
import editdistance


def evaluate_spell_checker(spellchecker, test_data):
    """Evaluates the performance of a spell checker on the test data."""
    correct_count = 0
    total_predictions = len(test_data)
    y_true = []
    y_pred = []

    for incorrect_word, correct_word in test_data:
        predicted_word = safe_correction(spellchecker, incorrect_word)
        y_true.append(correct_word)
        y_pred.append(predicted_word)

        if predicted_word == correct_word:
            correct_count += 1

    accuracy = correct_count / total_predictions
    print(f"Accuracy: {accuracy:.2f}")

    precision = precision_score(y_true, y_pred, average='weighted', zero_division=0)
    recall = recall_score(y_true, y_pred, average='weighted', zero_division=0)
    f1 = f1_score(y_true, y_pred, average='weighted', zero_division=0)

    print(f"Precision: {precision:.2f}")
    print(f"Recall: {recall:.2f}")
    print(f"F1-Score: {f1:.2f}")

    total_edit_distance = 0
    for true_word, pred_word in zip(y_true, y_pred):
        distance = editdistance.eval(true_word, pred_word)
        total_edit_distance += distance

    avg_edit_distance = total_edit_distance / total_predictions
    print(f"Average Edit Distance: {avg_edit_distance:.2f}")
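A note on the `average='weighted'` metrics used in `evaluate_spell_checker`: when every distinct correct word is its own class, weighted recall reduces to the accuracy printed by the same function. With \(n_c\) the number of test pairs whose correct word is \(c\), \(\text{TP}_c\) the number of those the checker corrected to \(c\), and \(N\) the total number of pairs:

\[
\text{Recall}_{\text{weighted}} = \sum_{c} \frac{n_c}{N} \cdot \frac{\text{TP}_c}{n_c} = \frac{\sum_{c} \text{TP}_c}{N} = \text{Accuracy}
\]

So the printed Recall and Accuracy values coincide; weighted precision and F1 do not reduce in the same way.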


## Load and Prepare the Dataset

Now we will load the dataset containing spelling errors and prepare it for evaluation.

In [6]:
# Load the dataset
with open("spell-errors.txt", "r") as file:
    test_data = preprocess(file)


## Initialize Spell Checkers

Next, we will initialize each of the spell-checking models we plan to evaluate.
**IMPORTANT**: Make sure the path to the pre-trained model *"en.pkl"* and the directory where the retrained Spello model is saved are correct for your computer!

In [None]:
import os
from collections import defaultdict
from spello.model import SpellCorrectionModel
from autocorrect import Speller
from spellchecker import SpellChecker

# Adjust to wherever you created models/spello (see the setup notes above)
SPELLO_DIR = os.path.join("models", "spello")

# Initialize the spell checkers
# 1. PySpellChecker
pyspell_checker = SpellChecker()

# 2. Autocorrect
autocorrect_checker = Speller()

# 3. TextBlob needs no setup; safe_correction() wraps each word in a TextBlob

# 4. Spello: start from the pre-trained English model (en.pkl)
spello_checker = SpellCorrectionModel(language="en")  # 'en' for English
spello_checker.load(os.path.join(SPELLO_DIR, "en.pkl"))
spello_checker.config.min_length_for_spellcorrection = 3  # minimum length for correction
spello_checker.config.max_length_for_spellcorrection = 15  # maximum length for correction

# Prepare training data for Spello: frequency of each correct spelling
spello_training_data = defaultdict(int)
for incorrect_word, correct_word in test_data:
    spello_training_data[correct_word] += 1

# Train the Spello model; save() writes model.pkl into SPELLO_DIR
spello_checker.train(spello_training_data)
spello_checker.save(SPELLO_DIR)
spello_checker.load(os.path.join(SPELLO_DIR, "model.pkl"))

## Evaluate the Spell Checkers

Finally, we will evaluate each spell checker on the first 1,000 pairs of the shuffled test data.

In [None]:
# Evaluate each model on the same 1,000 shuffled pairs
eval_data = test_data[:1000]

# Evaluate Pyspell model
print("Pyspell checker:")
evaluate_spell_checker(pyspell_checker, eval_data)
# Evaluate Autocorrect model
print("Autocorrect checker:")
evaluate_spell_checker(autocorrect_checker, eval_data)
# Evaluate Textblob model
print("Textblob checker:")
evaluate_spell_checker('textblob', eval_data)
# Evaluate Spello model
print("Spello checker:")
evaluate_spell_checker('spello', eval_data)

## Results

For each of the models, I got the following results:

**Pyspell checker**

- Precision: 1.00
- Recall: 0.66
- F1-Score: 0.80
- Identifying Accuracy: 0.66
- Percent of invalid words after checker: 66.00%
- Percent of correctly fixed misspellings: 34.00%
- Percent of non-fixed misspellings with correct suggestion in top-5: 0.00%
- Percent of broken valid words: 0.00%
- Speed (words/sec): 1.48

---

**Autocorrect checker**

- Precision: 1.00
- Recall: 0.67
- F1-Score: 0.80
- Identifying Accuracy: 0.67
- Percent of invalid words after checker: 67.00%
- Percent of correctly fixed misspellings: 33.00%
- Percent of non-fixed misspellings with correct suggestion in top-5: 0.00%
- Percent of broken valid words: 0.00%
- Speed (words/sec): 9.49

---

**Textblob checker**

- Precision: 1.00
- Recall: 0.69
- F1-Score: 0.82
- Identifying Accuracy: 0.69
- Percent of invalid words after checker: 69.00%
- Percent of correctly fixed misspellings: 31.00%
- Percent of non-fixed misspellings with correct suggestion in top-5: 0.00%
- Percent of broken valid words: 0.00%
- Speed (words/sec): 2.62

---

**Spello checker**

- Precision: 1.00
- Recall: 0.53
- F1-Score: 0.69
- Identifying Accuracy: 0.53
- Percent of invalid words after checker: 53.00%
- Percent of correctly fixed misspellings: 47.00%
- Percent of non-fixed misspellings with correct suggestion in top-5: 0.00%
- Percent of broken valid words: 0.00%
- Speed (words/sec): 529.83

Based on these results, we can draw the following conclusions.

# Spell Checker Evaluation Summary

This notebook evaluates the performance of four different spell checkers: **Pyspell**, **Autocorrect**, **TextBlob**, and **Spello**. Based on various metrics, we analyze each spell checker's strengths and weaknesses, organized by categories.

---

## 1. Classification Metrics

| Spell Checker | Precision | Recall | F1-Score | Identifying Accuracy |
|---------------|-----------|--------|----------|-----------------------|
| Pyspell       | 1.00      | 0.66   | 0.80     | 0.66                  |
| Autocorrect   | 1.00      | 0.67   | 0.80     | 0.67                  |
| TextBlob      | 1.00      | 0.69   | 0.82     | 0.69                  |
| Spello        | 1.00      | 0.53   | 0.69     | 0.53                  |

**Observations**:
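- **Precision**: All spell checkers score 1.00. Since every input in the test set is misspelled, no correctly spelled word could be flagged by mistake (there are no false positives), so this value reflects the composition of the test set rather than a difference between the checkers.
- **Recall**: TextBlob has the highest recall (0.69), indicating it captures more misspellings than the others. Spello's lower recall (0.53) suggests it misses more errors.
- **F1-Score**: TextBlob has the best F1-score (0.82). With precision fixed at 1.00, F1 = 2R / (1 + R) depends only on recall R, so the F1 ranking matches the recall ranking and Spello's score is the lowest.
- **Identifying Accuracy**: With no correctly spelled inputs there are no true negatives, so identifying accuracy equals recall for every checker: TextBlob is highest at 0.69, followed closely by Autocorrect (0.67) and Pyspell (0.66), with Spello lowest (0.53).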

---

## 2. Correction Metrics

| Spell Checker | % Invalid Words After Check | % Correct Fixes | % Top-5 Fixes | % Broken Valid Words |
|---------------|-----------------------------|-----------------|---------------|-----------------------|
| Pyspell       | 66.00%                      | 34.00%          | 0.00%         | 0.00%                 |
| Autocorrect   | 67.00%                      | 33.00%          | 0.00%         | 0.00%                 |
| TextBlob      | 69.00%                      | 31.00%          | 0.00%         | 0.00%                 |
| Spello        | 53.00%                      | 47.00%          | 0.00%         | 0.00%                 |

**Observations**:
- **Percent of Invalid Words After Check**: Spello leaves only 53% of words invalid, suggesting it corrects more misspellings than the other tools.
- **Percent of Correctly Fixed Misspellings**: Spello correctly fixes 47% of misspelled words, higher than the other checkers.
- **Percent of Non-Fixed Misspellings with Correct Suggestion in Top-5**: All checkers scored 0%, indicating that none provided the correct suggestion in their top-5 list for unfixed errors.
- **Percent of Broken Valid Words**: All checkers scored 0%. Because the test set contains only misspelled inputs, there were no valid words to break, so this result does not yet show how the checkers treat correctly spelled words.

---

## 3. Speed Metric

| Spell Checker | Speed (Words/Second) |
|---------------|----------------------|
| Pyspell       | 1.48                 |
| Autocorrect   | 9.49                 |
| TextBlob      | 2.62                 |
| Spello        | 529.83               |

**Observations**:
- **Speed**: Spello is by far the fastest, processing 529.83 words per second, making it ideal for high-volume text processing. Autocorrect is the next fastest, though significantly slower than Spello.

---

## Conclusions

### TextBlob
- **Best for Detection**: With the highest recall, F1-score, and identifying accuracy, TextBlob is most effective at identifying misspelled words. 
- **Trade-Off**: While it detects well, it fixes the fewest misspellings (31%) and leaves the highest percentage of invalid words after checking (69%).

### Spello
- **Best for Correction and Speed**: Spello excels in correcting misspelled words and has the lowest percentage of invalid words remaining. It’s also incredibly fast, handling over 500 words per second.
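- **Trade-Off**: Spello has lower recall and identifying accuracy, so it may miss some misspellings. However, for tasks prioritizing correction over detection, Spello is a strong choice. Note that Spello was retrained on the correct spellings from all of `test_data`, including the 1,000 evaluation pairs, so its correction rate may be optimistic compared with the other checkers.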

### Autocorrect and Pyspell
- **Solid Performance**: These tools perform similarly, with high precision and decent identifying accuracy. However, they correct fewer misspellings than Spello (34% and 33% vs. 47%). In speed, Autocorrect is second only to Spello (9.49 words/sec), while Pyspell is the slowest of the four (1.48 words/sec).
- **Trade-Off**: Autocorrect and Pyspell are reliable but don’t outperform TextBlob in detection or Spello in correction.

---

## Recommendations

- Use **TextBlob** if detection accuracy (identifying misspellings) is the primary goal.
- Use **Spello** if you need a fast spell checker that corrects a high percentage of misspellings.
- **Autocorrect** and **Pyspell** are good alternatives but may not outperform the other options in detection or correction efficiency.


# Additional model

I tried implementing a Transformers-based T5 model but didn't finish it. I also include the code I wrote for it here, in the hope of completing it in the future.

In [None]:
import torch
from datasets import Dataset
from transformers import T5ForConditionalGeneration, T5Tokenizer, Trainer, TrainingArguments


class T5ModelSpellcheck:
    def __init__(self, model_name="t5-base", max_length=32):
        self.tokenizer = T5Tokenizer.from_pretrained(model_name)
        self.model = T5ForConditionalGeneration.from_pretrained(model_name)
        # Single words are short, so a small max_length avoids wasting compute on padding
        self.max_length = max_length

    def tokenize_function(self, examples):
        model_inputs = self.tokenizer(
            examples["input_text"],
            padding="max_length",
            truncation=True,
            max_length=self.max_length,
        )

        # Tokenize the target text
        labels = self.tokenizer(
            examples["target_text"],
            padding="max_length",
            truncation=True,
            max_length=self.max_length,
        )["input_ids"]

        # Replace padding ids with -100 so padded positions are ignored by the loss
        pad_id = self.tokenizer.pad_token_id
        model_inputs["labels"] = [
            [token if token != pad_id else -100 for token in label] for label in labels
        ]
        return model_inputs

    def train(self, train_data):
        # Build "correct: <misspelling>" -> "<correct word>" examples, skipping empty entries
        pairs = [(incorrect, correct) for incorrect, correct in train_data if incorrect and correct]
        dataset = Dataset.from_dict({
            "input_text": [f"correct: {incorrect}" for incorrect, _ in pairs],
            "target_text": [correct for _, correct in pairs],
        })

        # Tokenize and drop the raw string columns, which the model does not use
        tokenized_dataset = dataset.map(
            self.tokenize_function, batched=True, remove_columns=["input_text", "target_text"]
        )

        # Split dataset into training and evaluation sets
        train_test_split = tokenized_dataset.train_test_split(test_size=0.2)

        training_args = TrainingArguments(
            output_dir="./t5_pretrained",
            eval_strategy="epoch",  # called evaluation_strategy in older transformers versions
            learning_rate=2e-5,
            per_device_train_batch_size=16,
            num_train_epochs=3,
            logging_dir="./logs",  # Optional: where to store logs
            logging_steps=10,  # Optional: log every 10 steps
        )

        trainer = Trainer(
            model=self.model,
            args=training_args,
            train_dataset=train_test_split["train"],
            eval_dataset=train_test_split["test"],
        )

        trainer.train()
        self.model.save_pretrained("./t5_pretrained")
        self.tokenizer.save_pretrained("./t5_pretrained")

    def t5_load_model(self, model_dir="./t5_pretrained"):
        self.model = T5ForConditionalGeneration.from_pretrained(model_dir)
        self.tokenizer = T5Tokenizer.from_pretrained(model_dir)

    def t5_correct(self, word):
        inputs = self.tokenizer(f"correct: {word}", return_tensors="pt")

        with torch.no_grad():
            outputs = self.model.generate(**inputs, max_new_tokens=self.max_length)

        return self.tokenizer.decode(outputs[0], skip_special_tokens=True)
