# Dataset

### Dataset format

The analyzed dataset is a set of english correctly spelled words as well as variations of their incorrect spelling, collected by Wikipedia editors. Most of the spell checking tools support the ability to find errors and correct them in English texts, so English was selected for testing. File with the data can be downloaded from this [article](https://www.kaggle.com/datasets/bittlingmayer/spelling?resource=download&select=wikipedia.txt) (file ['wikipedia_misspells.txt'](https://github.com/diffitask/spell-checkers-comparison/blob/main/data/wikipedia_misspells.txt) has already been downloaded and put in the *data* folder)

Also, there are interesting ways to generate misspelled words from the correct ones described in this [article](https://www.ijcaonline.org/archives/volume176/number27/yunus-2020-ijca-920288.pdf), where it is suggested to swap letters, add new letters and use keyboard characters relative positions. However, in this work I have so far stopped at the Wikipedia dataset described above.

### Where to download more datasets for testing

For the dataset parsing method presented here, any files with data in the following format will be suitable:
*{'the correct word': 'its incorrect spelling 1', 'incorrect spelling 2', ..., 'incorrect spelling N'}*
For example, [this](https://www.kaggle.com/datasets/bittlingmayer/spelling?select=aspell.txt), [this](https://www.kaggle.com/datasets/bittlingmayer/spelling?select=birkbeck.txt) and [this](https://www.kaggle.com/datasets/bittlingmayer/spelling?select=spell-testset2.txt) datasets from the site mentioned above can also be used in testing.

### Dataset reading and dictionary creating

Because a different number of potential misspelled forms are presented for each correct word in the dataset, it's impossible to use the *read_csv* method from *pandas* library, where it's required the same number of columns-features for each row. So the dataset will be processed line by line.

In [38]:
def read_misspells_dataset(path_to_misspells_file: str) -> dict:
    """

    Parameters
    ----------
    path_to_misspells_file : str
        Path to the dataset file (may have any format).

    Returns
    -------
    dataset_dict : dict
        Dictionary from misspelled dataset word to list of its possible correct spellings.

    """
    # reading dataset file
    with open(path_to_misspells_file, 'r') as dataset_file:
        dataset_lines = dataset_file.readlines()

    # delete '\n' symbols
    dataset_lines = [line.strip() for line in dataset_lines]

    # filling dataset dictionary
    dataset_dict = {}
    for word_line in dataset_lines:
        line_words = word_line.split()
        correct_word = line_words[0][:-1]  # removing ':'

        misspellings = []
        if len(line_words) > 1:
            misspellings = line_words[1:]
        dataset_dict[correct_word] = misspellings

    return dataset_dict

### Test dataset reading result
Dataset is huge enough, so let's print first 1000 symbols of a string representation of the resulting dictionary, to see if everything was processed correctly.

In [39]:
# -- Testing --
def test_dataset_reading():
    path_to_misspells_dataset = "data/wikipedia_misspells.txt"
    misspells_dict = read_misspells_dataset(path_to_misspells_dataset)
    print(str(misspells_dict)[:1000] + '...')


test_dataset_reading()

{'Apennines': ['Apenines', 'Appenines'], 'Athenian': ['Athenean'], 'Athenians': ['Atheneans'], 'Bernoulli': ['Bernouilli'], 'Blitzkrieg': ['Blitzkreig'], 'Brazilian': ['Brasillian'], 'Britain': ['Britian'], 'British': ['Brittish'], 'Caesar': ['Ceasar'], 'Cambridge': ['Cambrige'], 'Caracas': ['carcas'], 'Caribbean': ['Carribean'], 'Carthaginian': ['Carthagian'], 'Catalina': ['Cataline'], 'Catiline': ['Cataline'], 'Celsius': ['Celcius'], 'Champagne': ['Champange'], 'Connecticut': ['Conneticut'], 'Cypriot': ['Cyprian'], 'Ellis': ['eles'], 'English': ['Enlish'], 'European': ['Europian', 'Eurpean', 'Eurpoean'], 'Europeans': ['Europians'], 'February': ['febuary'], 'Flemish': ['Flemmish'], 'Franciscan': ['Fransiscan'], 'Franciscans': ['Fransiscans'], 'Gael': ['gae'], 'Galatians': ['Galations'], 'Gandhi': ['Ghandi'], 'Gauguin': ['gogin'], 'Guatemala': ['Guatamala'], 'Guatemalan': ['Guatamalan'], 'Guinness': ['Guiness'], 'Israelis': ['Israelies'], 'Ithaca': ['Ihaca'], 'Jacques': ['Jaques'], 'Ja

### Build texts on which spell checkers will be tested

Now let's build a texts that we will give to the checkers.

In parallel, based on the dictionary obtained above, we will generate 2 text string: correct string and string with misspellings.

**Why 2 string?**
To make it easy to get the correct form of an incorrect word. We will assume that the checker worked correctly if it issued exactly such a correction that matches with the word of the correct string.

**What are the requirements for the line with misspellings?**
It is necessary that there are both right spelled and misspelled words in the string, in order not just to evaluate how well the checker can correct errors, but also to see if checker does not spoil the originally right spelled words.

**How will the strings be created?**
We will follow the keys-the correct words lying in the dictionary and do these steps:
1. In the misspelled string there will be put right spelled word and then all these incorrect forms (let there be N forms) that lie in the dictionary (values)
2. In the correct line, the correct word is also written first, but then follows a sequence of the same correct word repeated N times.

Thus, for each word in the misspelled line, its correct version is at the same position in the correct line.

All words are separated by spaces.

In [40]:
def build_correct_and_misspell_sentence(misspells_dict: dict):
    misspell_str = ""
    correct_str = ""
    invalids_in_misspell_str = 0

    for correct_word in misspells_dict:
        misspelled_forms = misspells_dict[correct_word]
        invalids_in_misspell_str += len(misspelled_forms)
        misspelled_forms_str = " ".join(misspelled_forms)
        misspell_str += correct_word + ' ' + misspelled_forms_str + ' '

        correct_forms = " ".join([correct_word] * len(misspelled_forms))
        correct_str += correct_word + ' ' + correct_forms + ' '

    return correct_str, misspell_str, invalids_in_misspell_str

### Test sentence building

Here we will also display only the first 2000 characters of each generated string on the screen, so as not to clutter the notebook.

In [53]:
# -- Testing --
def test_sentences_building():
    path_to_misspells_dataset = "data/wikipedia_misspells.txt"
    misspells_dict = read_misspells_dataset(path_to_misspells_dataset)

    correct_sentence, misspell_sentence, invalids_in_misspell_str = build_correct_and_misspell_sentence(misspells_dict)
    short_correct_sentence = correct_sentence[:2000]
    short_misspell_sentence = misspell_sentence[:2000]
    print('Correct sentence: \n' + short_correct_sentence + '...\n -----------------------\n')
    print('Misspell sentence: \n' + short_misspell_sentence + '...')


test_sentences_building()

Correct sentence: 
Apennines Apennines Apennines Athenian Athenian Athenians Athenians Bernoulli Bernoulli Blitzkrieg Blitzkrieg Brazilian Brazilian Britain Britain British British Caesar Caesar Cambridge Cambridge Caracas Caracas Caribbean Caribbean Carthaginian Carthaginian Catalina Catalina Catiline Catiline Celsius Celsius Champagne Champagne Connecticut Connecticut Cypriot Cypriot Ellis Ellis English English European European European European Europeans Europeans February February Flemish Flemish Franciscan Franciscan Franciscans Franciscans Gael Gael Galatians Galatians Gandhi Gandhi Gauguin Gauguin Guatemala Guatemala Guatemalan Guatemalan Guinness Guinness Israelis Israelis Ithaca Ithaca Jacques Jacques Japanese Japanese Joseph Joseph Judaism Judaism Judaism Libya Libya Malcolm Malcolm Maltese Maltese Mara_Liasson Mara_Liasson Massachusetts Massachusetts Massachusetts Mediterranean Mediterranean Michigan Michigan Miranda Miranda Mississippi Mississippi Mississippi Missouri Miss

# Metrics

## Metrics group
Metrics for spell checker evaluating can be divided into 3 groups:
1. Metrics that evaluate the tool's ability to **classify words as right spelled or misspelled**
2. Metrics that evaluate the tool's ability to **correct misspelled words**
3. Others

## Metrics of each group
1. To understand the 1st metrics group, this [article](https://gerhard.pro/files/PublicationVanHuyssteenEiselenPuttkammer2004.pdf) can be helpful. I relied on this paper, but changed the approach to putting positive and negative labels, based on the idea that we are still looking for misspelled words (as in the problem of detecting cancerous tumors, we usually label cancerous cases as positives). I chose standard metrics without the additional subdivision, described in the article.

    Thus,  there will be used the following metrics:
    * **Recall**
    * **Precision**
    * **Classifying Accuracy**

2. In the second group I will use these metrics:
    * **Percent of words** that are **invalid after checker work**
    * **Percent of the misspelled words**, that were **correctly fixed** by spellchecker
    * **Percent of non-fixed misspelled words**, but for which the right decision was **in top-5** spellchecker suggested word **candidates**
    * **Percent of** originally correct spelled words that were **broken** by the checker

3. In 3rd group there was put one metric:
    * Checker work **speed**

## Formulas and explanation of each metric

As we are looking for misspelled words, positive labels will be set for invalid words with the misspellings, negative labels -- for valid correct spelled words.

In that way:
* **True positives (tp)** -- invalid words, recognized by spelling checker as misspelled.
* **False positives (fp)** -- valid words, recognized by checker as misspelled.
* **True negatives (tn)** -- valid words, recognized by checker as right spelled.
* **False negatives (fn)** -- invalid words, recognized by checker as right spelled.

### 1. Recall

Recall -- number of invalid words that were recognized by the checker as misspelled (true positives), in relation to the total number of invalid words in the text (sum of true positives and false negatives)

$$ recall = \frac{tp}{tp + fn} $$

The ideal for the spelling checker -- to recognize all invalid words as misspelled $\Rightarrow$ to get as high recall as possible, the closer to 100% the better.
Recall indicates comprehensiveness of the lexicon of the spelling checker and whether the spelling checker lexicon contains any erroneous words.

### 2. Precision

Precision -- number of words, that were marked by checker as misspelled and which are really incorrect, in relation to the total number of words, that checker marked as misspelled (sum of true positives and false positives).

$$ precision = \frac{tp}{tp + fp} $$

The ideal for the checker -- to recognize all invalid and only invalid words as misspelled $\Rightarrow$ to get as high precision as possible, the closer to 100% the better.
Precision indicates how accurate is the spelling checker in assigning misspell flag.

### 3. Identifying Accuracy

Accuracy shows what the spelling checker does right (the sum of true positives and true negatives) in terms of everything the spelling checker does (all true and false positives and negatives).

$$ identifyingAccuracy = \frac{tp + tn}{tp + fp + tn + fn} $$

This gives a good overall view of the competence of a checker, since it determines how accurate a spelling checker is at performing the errand it was sent on. The better spell checking tool works, the higher accuracy $\Rightarrow$ the ideal for the checker -- to get 100% accuracy.

### 4. Percent of words that are invalid after checker work
$$ invalidsAfterCheckerPerc = \frac{invalidsAfterCheckerWork}{nWordsInText} $$
The ideal for spellchecker -- 0% -- when all words in text after checker work are valid. The lower, the better.

### 5. Percent of correctly fixed misspellings

$$ fixedMisspellingsPerc = \frac{fixedMisspellings}{allMisspellsInText} $$

The ideal for spellchecker -- 100%. The higher, the better.

### 6. Percent of non-fixed misspellings but with right correction in top-5 candidates

$$ notFixedButInTop5Perc = \frac{notFixedButCorrectionInTop5Candidates}{notFixedMisspells} $$

The ideal for spellchecker -- 100% -- when all not fixed misspells had right correction in top-5 candidates. The higher, the better.

### 7. Percent of broken valid words

$$ brokenValidsPerc = \frac{brokenValids}{allOriginallyValids} $$

The ideal for spellchecker -- 0% -- when it didn't break a single word. The lower, the better.

### 8. Speed

Calculates number of words per second and shows how fast the spell checking tool copes with data processing. Can be calculated as the total time spent by the spell checker for processing the dataset, divided by the number of words processed by it:

$$ speed = \frac{numberOfWordsInText}{totalCheckerTextProcessingTime} $$

The higher the speed of the checker, the less the user will have to wait $\Rightarrow$ the higher speed the better.

### Calculating 1st metrics group

In [42]:
def calculate_main_metrics(tp: int, fp: int, tn: int, fn: int):
    recall = float(tp) / (tp + fn)
    precision = float(tp) / (tp + fp)
    accuracy = float(tp + tn) / (tp + fp + tn + fn)
    return recall, precision, accuracy


# Spell checking tools

### Selected tools

We will consider 6 speech checking tools:
1. *Pyspellchecker* library
2. *Textblob* library
3. Pre-trained model from *Spello* library
4. *Hunspell* spelling corrector
5. *Jamspell* spelling corrector
6. *Autocorrect* library

### Base SpellCorrector class description

For the convenience of working with these tools, a base class *SpellCorrector* was created and each spell checker was wrapped in a class-inheritor of the SpellCorrector.

SpellCorrector has method *correct* that takes as input a list of words and the position of the analyzed word in it. Almost all tools could be given just an analyzed word as input, but the Jamspell interface requires that a list of words and a position be given, so for uniformity, all methods are built this way.

As the output method *correct* returns one of two:
* either single word (**str** type) that matches the original one -- this means that the checker considered the word given to him as right spelled and did not correct it
* either a **list** of all candidates that the spell checker suggested for the word -- in this case, we believe that the checker classifies the word submitted to him as incorrect and tries to correct it

## IMPORTANT! Models downloading

For some analyzed tools, it's necessary to download model files before code running.

I can't upload them to the GitHub due to their large size, so to run the code you need to follow these steps:
1. Create dir named 'models' in the project directory 'spell-checkers-comparison'
2. Download [file](http://downloads.sourceforge.net/wordlist/hunspell-en_US-2020.12.07.zip) for Hunspell
3. Download [file](https://haptik-website-images.haptik.ai/spello_models/en.pkl.zip) for Spello
4. Download [file](https://github.com/bakwc/JamSpell-models/raw/master/en.tar.gz) for Jamspell
5. Unzip downloaded files
6. Place model files in the created folder 'models' or you can specify the paths to your files in the classes.


In [43]:
import spellchecker
from spello.model import SpellCorrectionModel
import spello.settings as ss
import hunspell
import jamspell
import textblob
import autocorrect

# for 'Spello' logging and warnings
ss.logger.disabled = True
import warnings
warnings.filterwarnings("ignore")

Spell checker classes creation:

In [44]:
class SpellCorrector(object):
    def __init__(self, name):
        self.__name = name

    def correct(self, sentence: list, position: int):
        pass

    def get_name(self):
        return self.__name


class PyspellcheckerCorrector(SpellCorrector):
    def __init__(self):
        super(PyspellcheckerCorrector, self).__init__("Pyspellchecker")
        self.__spellchecker = spellchecker.SpellChecker()

    def correct(self, sentence: list, position: int):
        word = sentence[position]
        self.__spellchecker.unknown([word])
        correction_candidates = self.__spellchecker.candidates(word)
        if correction_candidates is None:
            return word
        return list(correction_candidates)


class TextblobCorrector(SpellCorrector):
    def __init__(self):
        super(TextblobCorrector, self).__init__("Textblob")

    def correct(self, sentence: list, position: int):
        word_str = sentence[position]
        blob_word = textblob.Word(word_str)
        correction_candidates = blob_word.spellcheck()
        if len(correction_candidates) == 0:
            return word_str
        return [word_prob[0] for word_prob in correction_candidates]


class SpelloCorrector(SpellCorrector):
    def __init__(self, spello_model_path: str = "models/spello-en.pkl"):
        super(SpelloCorrector, self).__init__("Spello")
        self.__spellchecker = SpellCorrectionModel(language='en')
        self.__spellchecker.load(spello_model_path)

    def correct(self, sentence: list, position: int):
        word = sentence[position]
        correction_candidates = self.__spellchecker.suggest(word)
        if len(correction_candidates) == 0:
            return word
        return [word_prob[0] for word_prob in correction_candidates]


class HunspellCorrector(SpellCorrector):
    def __init__(self, hunspell_model_path: str = "models/hunspell-en_US/en_US"):
        super(HunspellCorrector, self).__init__("Hunspell")
        self.__spellchecker = hunspell.HunSpell(hunspell_model_path + '.dic',
                                                hunspell_model_path + '.aff')

    def correct(self, sentence: list, position: int):
        word = sentence[position]
        # spellchecker thinks that this word is correct
        if self.__spellchecker.spell(word):
            return word
        return self.__spellchecker.suggest(word)


class JamspellCorrector(SpellCorrector):
    def __init__(self, jamspell_model_path: str = "models/jamspell-en.bin"):
        super(JamspellCorrector, self).__init__("Jamspell")
        self.__spellchecker = jamspell.TSpellCorrector()
        self.__spellchecker.LoadLangModel(jamspell_model_path)

    def correct(self, sentence: list, position: int):
        correction_candidates = list(self.__spellchecker.GetCandidates(sentence, position))
        # spellchecker thinks that this word is correct
        if len(correction_candidates) == 0:
            return sentence[position]
        return correction_candidates


class AutocorrectCorrector(SpellCorrector):
    def __init__(self):
        super(AutocorrectCorrector, self).__init__("Autocorrect")
        self.__spellchecker = autocorrect.Speller()

    def correct(self, sentence: list, position: int):
        word = sentence[position]
        correction_candidates = self.__spellchecker.get_candidates(word)
        if len(correction_candidates) == 0:
            return word
        return [word_prob[1] for word_prob in correction_candidates]


The checker will alternately receive words from the misspelled sentence, we will see how it copes with its task and count metrics in parallel:

In [45]:
import time


def evaluate_checker(spellchecker: SpellCorrector, correct_sentence: str, misspell_sentence: str,
                     invalids_in_misspell_sentence: int):
    # values for metrics calculation
    invalid_recognized_misspelled = 0
    valid_recognized_misspelled = 0
    invalid_recognized_correct = 0
    valid_recognized_correct = 0

    fixed_invalids = 0
    broken_valids = 0
    not_fixed_invalids = 0
    not_fixed_but_correction_in_top5 = 0

    checker_worktime = 0

    correct_sentence_words = correct_sentence.split()
    misspell_sentence_words = misspell_sentence.split()

    n_words = len(misspell_sentence_words)
    for pos in range(n_words):
        misspell_sentence_word = misspell_sentence_words[pos]
        correct_sentence_word = correct_sentence_words[pos]

        # mark the time of operation of the correct function
        correction_start_time = time.time()
        checker_correction_candidates = spellchecker.correct(misspell_sentence_words, pos)
        checker_worktime += time.time() - correction_start_time

        if isinstance(checker_correction_candidates, list):
            word_recognized_correct = False
            checker_word_correction = checker_correction_candidates[0]
            top5_correction_candidates = checker_correction_candidates[:5] if \
                len(checker_word_correction) >= 5 else checker_correction_candidates
        else:
            word_recognized_correct = True
            checker_word_correction = checker_correction_candidates
            top5_correction_candidates = [checker_word_correction]

        # the word was originally correct
        if misspell_sentence_word == correct_sentence_word:
            # checker recognized word as correct
            if word_recognized_correct:
                valid_recognized_correct += 1
            else:
                # checker recognized word as misspelled
                valid_recognized_misspelled += 1
                # if checker suggested the wrong replacement, it brakes out the word
                if checker_word_correction != correct_sentence_word:
                    broken_valids += 1
        else:
            # the word was originally misspelled
            # checker recognized word as correct
            if word_recognized_correct:
                invalid_recognized_correct += 1
            else:
                # checker recognized word as misspelled
                invalid_recognized_misspelled += 1
                # if checker suggested the correct replacement, it fixes the word
                if checker_word_correction == correct_sentence_word:
                    fixed_invalids += 1
                else:
                    not_fixed_invalids += 1
                    # if the word wasn't fixed, but right correction was in top5 candidates
                    if correct_sentence_word in top5_correction_candidates:
                        not_fixed_but_correction_in_top5 += 1

    misspells_after_checker = invalids_in_misspell_sentence - fixed_invalids + broken_valids

    # calculating metrics
    recall, precision, accuracy = calculate_main_metrics(invalid_recognized_misspelled,
                                                         valid_recognized_misspelled,
                                                         valid_recognized_correct,
                                                         invalid_recognized_correct)
    misspells_after_checker_percent = float(misspells_after_checker) / n_words
    fixed_invalids_percent = float(fixed_invalids) / invalids_in_misspell_sentence
    not_fixed_invalids_but_correction_in_top5_percent = float(not_fixed_but_correction_in_top5) / not_fixed_invalids
    broken_valids_percent = float(broken_valids) / (n_words - invalids_in_misspell_sentence)
    speed = float(n_words) / checker_worktime

    return recall * 100, \
           precision * 100, \
           accuracy * 100, \
           misspells_after_checker_percent * 100, \
           fixed_invalids_percent * 100, \
           not_fixed_invalids_but_correction_in_top5_percent * 100, \
           broken_valids_percent * 100, \
        speed

In [46]:
def calculate_metrics_and_print(spellchecker: SpellCorrector, misspells_dict: dict):
    correct_sentence, misspell_sentence, invalids_in_misspell_sentence = build_correct_and_misspell_sentence(
        misspells_dict)

    recall, \
        precision, \
        accuracy, \
        misspells_after_checker_percent, \
        fixed_invalids_percent, \
        not_fixed_invalids_but_correction_in_top5_percent, \
        broken_valids_percent, \
        speed = evaluate_checker(spellchecker, correct_sentence, misspell_sentence, invalids_in_misspell_sentence)

    print("Checker: {0}\n"
          "Classifying recall: {1:.2f} %\n"
          "Classifying precision: {2:.2f} %\n"
          "Classifying accuracy: {3:.2f} %\n"
          "Misspells after checker percent: {4:.2f} %\n"
          "Fixed misspellings percent: {5:.2f} %\n"
          "Not fixed but may be corrected by one in top-5 percent: {6:.2f} %\n"
          "Broken valids percent: {7:.2f} %\n"
          "Speed: {8:.7f} words/ sec\n"
          "----------------------------".format(spellchecker.get_name(),
                                                recall,
                                                precision,
                                                accuracy,
                                                misspells_after_checker_percent,
                                                fixed_invalids_percent,
                                                not_fixed_invalids_but_correction_in_top5_percent,
                                                broken_valids_percent,
                                                speed))

In [47]:
def compare_spellcheckers():
    path_to_misspells_dataset = "data/wikipedia_misspells.txt"
    misspells_dict = read_misspells_dataset(path_to_misspells_dataset)

    correctors = [PyspellcheckerCorrector(),
                  SpelloCorrector(),
                  HunspellCorrector(),
                  JamspellCorrector(),
                  TextblobCorrector(),
                  AutocorrectCorrector()
                  ]

    for corrector in correctors:
        calculate_metrics_and_print(corrector, misspells_dict)

In [48]:
compare_spellcheckers()

Checker: Pyspellchecker
Classifying recall: 97.59 %
Classifying precision: 55.87 %
Classifying accuracy: 55.45 %
Misspells after checker percent: 21.73 %
Fixed misspellings percent: 63.79 %
Not fixed but may be corrected by one in top-5 percent: 57.85 %
Broken valids percent: 3.23 %
Speed: 31.9817255 words/ sec
----------------------------
Checker: Spello
Classifying recall: 96.90 %
Classifying precision: 90.86 %
Classifying accuracy: 92.80 %
Misspells after checker percent: 22.53 %
Fixed misspellings percent: 69.57 %
Not fixed but may be corrected by one in top-5 percent: 47.68 %
Broken valids percent: 12.43 %
Speed: 929.9438380 words/ sec
----------------------------
Checker: Hunspell
Classifying recall: 98.78 %
Classifying precision: 97.31 %
Classifying accuracy: 97.78 %
Misspells after checker percent: 15.01 %
Fixed misspellings percent: 75.97 %
Not fixed but may be corrected by one in top-5 percent: 67.74 %
Broken valids percent: 3.49 %
Speed: 62.9287733 words/ sec
---------------

## 1. Pyspellchecker library

### Work principles description

Spell checking [library](https://github.com/barrust/pyspellchecker) that implements [Peter Norvig's](https://norvig.com/spell-correct.html) algorithm idea, which many subsequently turned to for comparison or improvement.

**How does it work?**
Clipping from the library description:

"It uses a [Levenshtein Distance](https://en.wikipedia.org/wiki/Levenshtein_distance) algorithm to find permutations within an edit distance of 2 from the original word. It then compares all permutations (insertions, deletions, replacements, and transpositions) to known words in a word frequency list. Those words that are found more often in the frequency list are more likely the correct results."

## 2. Pre-trained spell correction model 'Spello'

## 3. Hunspell library

## 4. JamSpell library

## 5. Textblob library

## 6. Autocorrect library