# Spellcheckers

We evaluated five different Python spellchecking libraries, each with its own approach and characteristics:

### TextBlob
A simple NLP library that provides a straightforward API for common natural language processing tasks. Its spellchecking functionality is based on Peter Norvig's algorithm and uses NLTK internally. TextBlob considers both a word list approach and basic pattern matching, but primarily focuses on being an easy-to-use, general-purpose NLP tool rather than a specialized spellchecker.


### PySpellChecker
An implementation of Peter Norvig's spelling correction algorithm that uses word frequency lists to suggest corrections. It calculates edit distance between a misspelled word and potential corrections, suggesting the most probable correct spelling based on word frequency in its dictionary.
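
The candidate-generation idea behind Norvig's approach can be sketched in a few lines. This is a minimal illustration with a toy frequency list, not PySpellChecker's actual code; `WORD_COUNTS`, `edits1`, `known`, and `correction` are names chosen for this example.

```python
from collections import Counter

# Toy frequency list; a real checker would load counts from a large corpus.
WORD_COUNTS = Counter({"spelling": 50, "spewing": 2, "selling": 30, "the": 1000})
LETTERS = "abcdefghijklmnopqrstuvwxyz"


def edits1(word):
    """Return every string that is one edit away from `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [
        left + right[1] + right[0] + right[2:]
        for left, right in splits
        if len(right) > 1
    ]
    replaces = [left + c + right[1:] for left, right in splits if right for c in LETTERS]
    inserts = [left + c + right for left, right in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def known(words):
    """Keep only the candidates that appear in the frequency list."""
    return {w for w in words if w in WORD_COUNTS}


def correction(word):
    """Pick the most frequent known candidate, preferring smaller edit distances."""
    candidates = (
        known([word])
        or known(edits1(word))
        or known(e2 for e1 in edits1(word) for e2 in edits1(e1))
        or {word}  # nothing known within two edits: leave the word unchanged
    )
    return max(candidates, key=lambda w: WORD_COUNTS[w])
```

Generating and filtering all candidates within two edits for every word is what makes this family of checkers comparatively slow on long inputs.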

### SymSpell
A spelling correction algorithm based on the Symmetric Delete approach that offers extremely fast spell checking. Instead of generating every possible edit (deletes, transposes, replaces, and inserts) of an input term, it generates only deletions within an edit distance threshold and looks them up in a precomputed deletion dictionary, making it significantly faster than traditional approaches.
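
A rough sketch of the precomputed deletion dictionary might look as follows. It is a simplified illustration and not the SymSpell implementation: the function names are hypothetical, and the final step in which a real implementation verifies each candidate with a true edit distance and ranks by frequency is left out.

```python
from collections import defaultdict
from itertools import combinations


def deletes(word, max_distance):
    """Return `word` plus every string obtained by deleting up to `max_distance` characters."""
    variants = {word}
    for k in range(1, min(max_distance, len(word)) + 1):
        for keep in combinations(range(len(word)), len(word) - k):
            variants.add("".join(word[i] for i in keep))
    return variants


def build_index(vocabulary, max_distance=1):
    """Precompute a mapping from each delete variant to the dictionary words producing it."""
    index = defaultdict(set)
    for word in vocabulary:
        for variant in deletes(word, max_distance):
            index[variant].add(word)
    return index


def lookup(term, index, max_distance=1):
    """Collect dictionary words that share at least one delete variant with `term`."""
    candidates = set()
    for variant in deletes(term, max_distance):
        candidates |= index.get(variant, set())
    return candidates
```

The expensive work happens once in `build_index`; at query time only the deletes of the input term are generated, which is far fewer strings than the full set of inserts, replaces, and transposes.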

### Spello
A machine learning-based spell checker that can be trained on domain-specific data. It combines traditional spell checking techniques with modern ML approaches, allowing it to learn from context and adapt to specific use cases.

### Autocorrect
A simple, lightweight spell checker that focuses on common misspellings and typos. It uses a combination of word lists and simple rules to make corrections, primarily targeting common typing errors rather than complex spelling mistakes.

Each spellchecker represents a different approach to the spell-checking problem, from simple word list comparisons to sophisticated machine learning methods, offering various trade-offs between speed, accuracy, and flexibility.

Most libraries can be imported directly into Python. Spello, however, requires a pretrained model (or one you train yourself), which must be placed in [pythonModels](/data/pythonModels).
We used this [model](https://haptik-website-images.haptik.ai/spello_models/en.pkl.zip): unzip it and place it in [/data/pythonModels](/data/pythonModels).

# Dataset

### Dataset information
The dataset we used is the spelling dataset of [Peter Norvig](https://www.kaggle.com/datasets/bittlingmayer/spelling), which contains five sets of correctly spelled English words together with their misspellings, among them the Wikipedia and Birkbeck sets.

There are also interesting ways to generate misspelled words from correct ones, described in this [article](https://www.ijcaonline.org/archives/volume176/number27/yunus-2020-ijca-920288.pdf), which suggests swapping letters, inserting new letters, and exploiting the relative positions of characters on the keyboard.
We provide an implementation in the `generate_synthetic_errors()` function in [data_preparation.py](/src/data_preparation.py).
However, in this evaluation we focused on the existing datasets, as they already offer sufficient variability.
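
To make the three error types concrete, here is a small standalone sketch. It is not the code from `generate_synthetic_errors()`: the function names and the partial keyboard map are assumptions made for illustration.

```python
import random
import string

# Partial QWERTY neighbour map, for illustration only.
KEYBOARD_NEIGHBOURS = {
    "a": "qwsz", "e": "wrsd", "i": "uojk", "o": "ipkl",
    "n": "bmhj", "s": "awedxz", "t": "rygf",
}


def swap_adjacent(word, rng):
    """Transpose two neighbouring characters."""
    if len(word) < 2:
        return word
    i = rng.randrange(len(word) - 1)
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]


def insert_letter(word, rng):
    """Insert a random lowercase letter at a random position."""
    i = rng.randrange(len(word) + 1)
    return word[:i] + rng.choice(string.ascii_lowercase) + word[i:]


def keyboard_slip(word, rng):
    """Replace one character with a neighbouring key, if the map covers it."""
    positions = [i for i, c in enumerate(word) if c in KEYBOARD_NEIGHBOURS]
    if not positions:
        return word
    i = rng.choice(positions)
    return word[:i] + rng.choice(KEYBOARD_NEIGHBOURS[word[i]]) + word[i + 1:]


def make_errors(word, n=3, seed=0):
    """Apply one randomly chosen operation per generated variant."""
    rng = random.Random(seed)
    operations = [swap_adjacent, insert_letter, keyboard_slip]
    return [rng.choice(operations)(word, rng) for _ in range(n)]
```

Seeding the generator keeps the synthetic set reproducible between runs, which matters if the generated errors are ever used for evaluation.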

You can use any of the files mentioned above; for our evaluation, we used the wikipedia.txt set.

### Dataset preprocessing

Because the dataset lists a different number of misspelled forms for each correct word, we first preprocess the data into a CSV file
of correct-error pairs, one pair per row. This is done by calling the data_preparation.py script.

As mentioned, you can choose which datasets are processed (also several at a time). We are currently using wikipedia.txt.
The processed CSV file is stored at [spelling_errors.csv](/data/processed/spelling_errors.csv).


In [2]:
from src.data_preparation import main
"""
Select your data files here. Options:

"data/raw/aspell.txt", "data/raw/birkbeck.txt", "data/raw/spell-testset1.txt", "data/raw/spell-testset2.txt", "data/raw/wikipedia.txt"

"""
src_path = ["data/raw/wikipedia.txt"]
main(src_path)

Processed data/raw/wikipedia.txt: 2455 errors
Total spelling errors collected: 2455
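
For orientation, the pairing step can be sketched as below. This is not the code in data_preparation.py. It assumes that each raw line has the form `correct: wrong1 wrong2 ...`, and the CSV column names `correct` and `error` are likewise assumptions.

```python
import csv
import io


def lines_to_pairs(lines):
    """Turn 'correct: wrong1 wrong2' lines into (correct, error) tuples."""
    pairs = []
    for line in lines:
        if ":" not in line:
            continue  # skip blank or malformed lines
        correct, errors = line.split(":", 1)
        for error in errors.split():
            pairs.append((correct.strip(), error))
    return pairs


def pairs_to_csv(pairs):
    """Serialise the pairs as CSV text with a header row (assumed column names)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["correct", "error"])
    writer.writerows(pairs)
    return buffer.getvalue()


sample = ["spelling: speling spellling", "the: teh"]
print(pairs_to_csv(lines_to_pairs(sample)))
```

Each misspelling becomes its own row, so a word with many recorded misspellings contributes more test cases than a word with only one.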


# Metrics

For our evaluation, we distinguish the four classic cases:
* **True positives (tp)**: invalid words, recognized by spelling checker as misspelled and corrected properly.
* **False positives (fp)**: valid words, recognized by checker as misspelled and changed unnecessarily.
* **True negatives (tn)**: valid words, recognized by checker as correctly spelled and left unchanged.
* **False negatives (fn)**: invalid words, recognized by checker as correctly spelled or corrected incorrectly.

We used the classic data mining metrics for classification, which are the following:

### 1. Recall
Recall describes the proportion of misspelled words that were correctly identified and fixed by the spellchecker compared to all misspelled words in the text. A high recall means the spellchecker catches most spelling errors.

$\text{recall} = \frac{tp}{tp + fn}$

Optimal value: 1.0 (100% of misspellings detected and corrected)

### 2. Precision
Precision measures how many of the spellchecker's corrections were actually necessary and correct. It tells us how trustworthy the spellchecker's suggestions are and whether it tends to make unnecessary corrections.

$\text{precision} = \frac{tp}{tp + fp}$

Optimal value: 1.0 (all corrections made were necessary and correct)

### 3. Accuracy
Accuracy represents the overall correctness of the spellchecker's decisions, including both its ability to correct misspelled words and preserve correct ones. It gives us a general measure of how reliable the spellchecker is across all cases.

$\text{accuracy} = \frac{tp + tn}{tp + tn + fp + fn}$

Optimal value: 1.0 (perfect decisions for both corrections and preservation)

### 4. F1 Score
The F1 score provides a balanced measure between precision and recall. It's particularly useful when we need a single metric to compare spellcheckers, as it penalizes extreme imbalances between precision and recall.

$\text{F1} = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$

Optimal value: 1.0 (perfect balance between precision and recall)

Furthermore, we also evaluate the processing speed of each spellchecker, as real-world applications often require a balance between accuracy and performance.
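
The four formulas translate directly into code. The sketch below uses a hypothetical helper, `classification_metrics`, and plugs in the counts that the evaluation output reports for pyspellchecker, reading "True Corrections" as tp, "False Corrections" as fn, "False Alarms" as fp, and "True Negatives" as tn. That mapping is our reading of the definitions above; it reproduces the reported precision and recall.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute precision, recall, accuracy and F1 from confusion-matrix counts."""
    total = tp + tn + fp + fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "accuracy": accuracy, "f1": f1}


# Counts reported for pyspellchecker in the evaluation output.
metrics = classification_metrics(tp=1804, fp=11, tn=231, fn=651)
print({name: round(value, 4) for name, value in metrics.items()})
```

The guards against empty denominators only matter for degenerate inputs, such as a checker that never changes any word.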


# Evaluation

Our `SpellCheckerEvaluator` implements a small framework for testing different spellchecking libraries. The evaluation process works as follows:

1. **Initialization**: The evaluator loads a CSV dataset containing pairs of correct words and their misspelled versions. It then initializes the five spellcheckers.

2. **Evaluation Process**: For each spellchecker, the evaluator:
   - Tests error correction by checking if misspelled words are corrected to their proper form
   - Samples a subset (default 10%) of correct words to verify they aren't incorrectly modified
   - Records our four key outcomes in a confusion matrix

3. **Metrics Calculation**: The evaluator computes the standard classification metrics we defined earlier:
   - Precision: Accuracy of corrections made
   - Recall: Proportion of errors caught
   - Accuracy: Overall correctness
   - F1 Score: Balanced metric between precision and recall
   - Processing Time: Speed of corrections

4. **Results**: The evaluation results:
   - Are displayed in a formatted table
   - Are saved to a CSV file for further analysis
   - Include relative speed comparisons normalized to the fastest checker

The evaluator is designed to be extensible and can be easily modified to include additional spellcheckers or metrics.
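
The core of the process described above can be sketched as a single function. This is an illustrative reconstruction, not the `SpellCheckerEvaluator` source: `evaluate_checker`, its parameters, and the way the 10% sample of valid words is drawn are all assumptions.

```python
import random
import time


def evaluate_checker(correct_fn, pairs, valid_sample_ratio=0.1, seed=0):
    """Count tp/fp/tn/fn for one corrector over (correct, error) pairs."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    start = time.perf_counter()

    # Misspelled inputs: fixed properly -> tp, anything else -> fn.
    for correct, error in pairs:
        counts["tp" if correct_fn(error) == correct else "fn"] += 1

    # A sample of valid words: left unchanged -> tn, altered -> fp.
    valid_words = sorted({correct for correct, _ in pairs})
    sample_size = min(len(valid_words), max(1, int(len(valid_words) * valid_sample_ratio)))
    for word in random.Random(seed).sample(valid_words, sample_size):
        counts["tn" if correct_fn(word) == word else "fp"] += 1

    counts["seconds"] = time.perf_counter() - start
    return counts


# Toy corrector standing in for a real library.
FIXES = {"teh": "the", "speling": "spelling"}
toy_pairs = [("the", "teh"), ("spelling", "speling"), ("word", "wrod")]
result = evaluate_checker(lambda w: FIXES.get(w, w), toy_pairs)
print(result)
```

Passing the corrector as a plain callable is one way to keep such a loop extensible: adding another library only requires wrapping its correction call in a function that maps a word to a word.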




In [3]:
from src.spell_checker_setup import SpellCheckerEvaluator

evaluator = SpellCheckerEvaluator('data/processed/spelling_errors.csv')
evaluator.evaluate()
evaluator.print_results()



from spello.model import SpellCorrectionModel 
sp = SpellCorrectionModel(language='en')  
sp.load('/home/ubuntu/model.pkl')
sp.config.min_length_for_spellcorrection = 4 # default is 3
sp.config.max_length_for_spellcorrection = 12 # default is 15
sp.save(model_save_dir='/home/ubuntu/')





Evaluating textblob...


Processing textblob: 100%|██████████| 2455/2455 [02:52<00:00, 14.24it/s]



Detailed results for textblob:
True Corrections (correctly fixed errors): 1514
False Corrections (incorrectly fixed errors): 941
False Alarms (incorrectly changed valid words): 36
True Negatives (correctly preserved valid words): 227
Total actual errors in dataset: 2455
Total corrections attempted: 1550
Total valid words tested: 263

Evaluating pyspellchecker...


Processing pyspellchecker: 100%|██████████| 2455/2455 [02:31<00:00, 16.22it/s]



Detailed results for pyspellchecker:
True Corrections (correctly fixed errors): 1804
False Corrections (incorrectly fixed errors): 651
False Alarms (incorrectly changed valid words): 11
True Negatives (correctly preserved valid words): 231
Total actual errors in dataset: 2455
Total corrections attempted: 1815
Total valid words tested: 242

Evaluating symspell...


Processing symspell: 100%|██████████| 2455/2455 [00:00<00:00, 10487.50it/s]



Detailed results for symspell:
True Corrections (correctly fixed errors): 1803
False Corrections (incorrectly fixed errors): 652
False Alarms (incorrectly changed valid words): 16
True Negatives (correctly preserved valid words): 234
Total actual errors in dataset: 2455
Total corrections attempted: 1819
Total valid words tested: 250

Evaluating spello...


Processing spello: 100%|██████████| 2455/2455 [00:03<00:00, 662.98it/s] 



Detailed results for spello:
True Corrections (correctly fixed errors): 1724
False Corrections (incorrectly fixed errors): 731
False Alarms (incorrectly changed valid words): 23
True Negatives (correctly preserved valid words): 226
Total actual errors in dataset: 2455
Total corrections attempted: 1747
Total valid words tested: 249

Evaluating autocorrect...


Processing autocorrect: 100%|██████████| 2455/2455 [00:40<00:00, 60.00it/s] 


Detailed results for autocorrect:
True Corrections (correctly fixed errors): 1723
False Corrections (incorrectly fixed errors): 732
False Alarms (incorrectly changed valid words): 18
True Negatives (correctly preserved valid words): 238
Total actual errors in dataset: 2455
Total corrections attempted: 1741
Total valid words tested: 256

Final Results:
--------------------------------------------------
                    textblob  pyspellchecker   symspell     spello  autocorrect
True Corrections   1514.0000       1804.0000  1803.0000  1724.0000    1723.0000
False Corrections   941.0000        651.0000   652.0000   731.0000     732.0000
False Alarms         36.0000         11.0000    16.0000    23.0000      18.0000
True Negatives      227.0000        231.0000   234.0000   226.0000     238.0000
Precision             0.9768          0.9939     0.9912     0.9868       0.9897
Recall                0.6167          0.7348     0.7344     0.7022       0.7018
Accuracy              0.6405      




# Visualization and Results Analysis

## Visualization Approach
Our [visualizations.py](/src/visualizations.py) script provides an overview of the spellcheckers' performance through five distinct visualizations:

1. **Spider/Radar Plot**: Shows the four key metrics (Precision, Recall, Accuracy, F1 Score) for each spellchecker, allowing quick comparison of overall performance patterns and identifying balanced versus specialized tools.

2. **Stacked Percentage Bar Chart**: Displays the proportion of correct versus incorrect decisions, normalizing the results to percentages for fair comparison regardless of the number of words processed.

3. **Processing Time Comparison**: Uses a logarithmic scale bar chart to compare execution times, clearly showing the substantial performance differences between implementations.

4. **Detailed Error Analysis**: Breaks down the specific types of errors (True Corrections, False Corrections, False Alarms) made by each spellchecker, helping identify particular strengths and weaknesses.

5. **Error Pattern Heatmap**: Visualizes the distribution of different error types through a color-coded matrix, making it easy to spot systematic issues in each spellchecker's approach.


The visualizations can be found in the [results/graphs](/results/graphs) directory.



In [7]:
from src.visualizations import visualize_results

visualize_results()

# Results Analysis and Recommendations

## Performance Analysis

### Precision (Ability to make correct changes)
All spellcheckers show remarkably high precision (>97%):
- PySpellChecker leads with 99.39%
- SymSpell follows closely at 99.12%
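- TextBlob, despite lower overall performance, still achieves 97.68%

This indicates that the tools rarely alter words that are already spelled correctly. Note that, under the definitions in the Metrics section, a wrong correction of a misspelled word counts as a false negative, so it lowers recall rather than precision.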

### Recall (Ability to catch errors)
The recall scores show more variation:
- PySpellChecker leads at 73.48%
- SymSpell is very close at 73.44%
- TextBlob is significantly lower at 61.67%

This suggests that, while valid words are rarely altered, all spellcheckers fail to fix a significant portion of errors, either by leaving them unchanged or by correcting them to the wrong word.

### Processing Speed
Speed differences are dramatic:
- SymSpell: 0.24 seconds (fastest)
- Spello: 3.70 seconds (15x slower than SymSpell)
- Autocorrect: 40.92 seconds
- PySpellChecker: 151.40 seconds
- TextBlob: 172.45 seconds (730x slower than SymSpell)
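
The relative factors follow from dividing each time by the fastest one. The sketch below uses the rounded times listed above, so the ratios it prints can deviate slightly from the quoted factors, which were presumably computed from unrounded measurements.

```python
# Processing times in seconds, as listed above (rounded to two decimals).
seconds = {
    "symspell": 0.24,
    "spello": 3.70,
    "autocorrect": 40.92,
    "pyspellchecker": 151.40,
    "textblob": 172.45,
}

# Normalize every time to the fastest checker.
fastest = min(seconds.values())
relative = {name: value / fastest for name, value in seconds.items()}

for name, factor in sorted(relative.items(), key=lambda item: item[1]):
    print(f"{name:15s} {factor:8.1f}x")
```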

### Error Analysis
- False Alarms (incorrectly changing correct words):
  - PySpellChecker: Only 11 cases
  - SymSpell: 16 cases
  - TextBlob: Highest at 36 cases

- False Corrections:
  - PySpellChecker/SymSpell: ~650 cases
  - TextBlob: Highest at 941 cases

## Recommendations

### Best Overall Choice: SymSpell
- Nearly identical accuracy to the best performer (PySpellChecker)
- Dramatically faster than all alternatives (0.24 seconds vs next best 3.70 seconds)
- Excellent precision (99.12%) and recall (73.44%)
- Perfect for production environments where both speed and accuracy matter

### Best for Quality-Critical Applications: PySpellChecker
- Highest precision (99.39%)
- Best recall (73.48%)
- Lowest false alarm rate (11 cases)
- Best suited for applications where processing time isn't critical

### Best Compromise: Spello
- Good balance of performance (72.12% accuracy)
- Reasonable processing speed (3.70 seconds)
- Solid choice for medium-sized applications

### Use Case Specific Recommendations:
1. **High-Volume Processing**: SymSpell
   - Orders of magnitude faster
   - Minimal accuracy trade-off

2. **Critical Document Processing**: PySpellChecker
   - Highest accuracy
   - Lowest risk of introducing errors
   - Use when processing time isn't critical

3. **Real-time Applications**: SymSpell or Spello
   - Both offer good response times
   - Good accuracy balance

4. **Budget/Resource Constrained**: SymSpell
   - Fastest processing
   - Lowest computational resource requirements
   - Excellent accuracy

### Not Recommended: TextBlob for Spellchecking
- Lowest performance across most metrics
- Slowest processing time
- Highest false alarm rate
- Better suited for its primary purpose as a general NLP toolkit

The data shows that the evaluated spellcheckers are highly precise but still fail to correct roughly 26-38% of errors (recall between 61.67% and 73.48%). This suggests room for improvement, possibly through ensemble approaches or better context understanding.
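
As a purely illustrative sketch of the ensemble idea, a majority vote over several correctors could look like this. It was not part of our evaluation; `majority_vote` and the toy correctors are hypothetical.

```python
from collections import Counter


def majority_vote(word, correctors):
    """Return the suggestion most correctors agree on; keep the word on a tie."""
    votes = Counter(correct(word) for correct in correctors)
    ranked = votes.most_common(2)
    if not ranked:
        return word  # no correctors supplied
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return word  # no clear winner: leave the input unchanged
    return ranked[0][0]


# Toy correctors standing in for real libraries.
toy_correctors = [lambda w: "the", lambda w: "the", lambda w: "ten"]
print(majority_vote("teh", toy_correctors))
```

Whether such a vote actually improves recall without hurting precision would have to be measured with the same evaluation procedure.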
