<a href="https://colab.research.google.com/github/AhmedSSoliman/Arabic-Spelling-correction-system/blob/master/Arabic_Spelling_correction_system.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Spelling correction system**

# **ar-corrector**
ar-corrector is a simple library for checking the spelling of Arabic sentences.

The library uses a vocabulary of more than 500K words and corrects misspelled words by searching for candidates at an edit distance of 1 and 2.

It also uses a 1-ngram language model to correct words based on the preceding context.
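
The two ideas above (edit-distance candidates plus frequency ranking) can be sketched in a few lines. This is a minimal illustration in the style of Norvig's spelling corrector, not ar-corrector's actual implementation: `ARABIC_LETTERS`, `TOY_VOCAB`, and the `suggest` helper are assumptions made for the example, and the counts are copied from the sample output in this notebook. Context from neighbouring words is ignored here.

```python
# Assumed alphabet: U+0621..U+063A and U+0641..U+064A (36 letters).
ARABIC_LETTERS = "".join(
    chr(c) for c in [*range(0x0621, 0x063B), *range(0x0641, 0x064B)]
)

# Toy vocabulary with word counts; a real vocabulary would be far larger.
TOY_VOCAB = {"عربية": 1900, "عربيا": 428, "عربيات": 9, "بحر": 613, "بحار": 82}


def edits1(word):
    """All strings that are one edit away from `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in ARABIC_LETTERS]
    inserts = [l + c + r for l, r in splits for c in ARABIC_LETTERS]
    return set(deletes + transposes + replaces + inserts)


def edits2(word):
    """All strings that are two edits away from `word`."""
    return {e2 for e1 in edits1(word) for e2 in edits1(e1)}


def suggest(word, vocab=TOY_VOCAB, top_n=5):
    """Return True for a known word, else known candidates ranked by count."""
    if word in vocab:
        return True
    candidates = {w for w in edits1(word) | edits2(word) if w in vocab}
    ranked = sorted(candidates, key=lambda w: vocab[w], reverse=True)
    return [(w, vocab[w]) for w in ranked[:top_n]]


print(suggest("عربيياة"))
# [('عربية', 1900), ('عربيا', 428), ('عربيات', 9)]
```

Generating every two-edit string is simple but expensive for long words, which is why a precomputed vocabulary lookup matters in practice.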

In [None]:
%%capture
!pip install ar-corrector pyarabic

In [None]:
from ar_corrector.corrector import Corrector
corrector = Corrector()

In [None]:
%%time
# Test the spelling correction system
input_word = "عربيياة"
corrections = corrector.spell_correct(input_word)
print(corrections)

[('عربية', 1900), ('عربيا', 428), ('عربيين', 20), ('عربيات', 9), ('عربيان', 5)]
CPU times: user 278 ms, sys: 890 µs, total: 279 ms
Wall time: 283 ms


In [None]:
from ar_corrector.corrector import Corrector
from pyarabic.araby import strip_tashkeel, strip_tatweel

class ArabicSpellingCorrector:
    def __init__(self):
        # Create the Corrector once and reuse it for every call
        self.corrector = Corrector()

    def arabic_spelling_correction(self, word):
        # Strip tashkeel (diacritics) and tatweel (elongation) before the lookup
        stripped_word = strip_tashkeel(strip_tatweel(word))

        # Request 5 suggestions; a known word yields True instead of a list
        corrections = self.corrector.spell_correct(stripped_word, 5)

        print(f"Original word: {word}")

        if corrections is True:
            print("No corrections needed. The word is correct.")
        elif corrections:
            # Non-empty list of suggestions
            print("Possible corrections:")
            for correction in corrections:
                print(correction)
        else:
            print("No corrections found. The input may not be a valid word.")

        return corrections
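
The normalization step used in the class above can be approximated with the standard library alone. This is a rough sketch, assuming tashkeel means the eight marks in U+064B to U+0652 and tatweel is U+0640; pyarabic's own functions remain the reference, and `normalize` is a hypothetical helper name.

```python
import re

# Assumed ranges: eight common diacritics (tashkeel) and the elongation mark (tatweel).
TASHKEEL = re.compile("[\u064B-\u0652]")
TATWEEL = "\u0640"


def normalize(word):
    """Remove tatweel first, then every tashkeel mark."""
    return TASHKEEL.sub("", word.replace(TATWEEL, ""))


# A vocalized word and an elongated word both reduce to their bare letters.
vocalized = "\u0639\u064E\u0631\u064E\u0628\u0650\u064A\u0651\u064E\u0629"
elongated = "\u0639\u0640\u0640\u0631\u0628\u064A"
print(normalize(vocalized), normalize(elongated))
```

Normalizing before the lookup keeps diacritics from being counted as spelling errors.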

In [None]:
# Create an instance of the ArabicSpellingCorrector class
spelling_corrector = ArabicSpellingCorrector()

In [None]:
%%time
# Test the spelling correction system
input_word = "عربيياة"
corrections = spelling_corrector.arabic_spelling_correction(input_word)


Original word: عربيياة
Possible corrections:
('عربية', 1900)
('عربيا', 428)
('عربيين', 20)
('عربيات', 9)
('عربيان', 5)
CPU times: user 289 ms, sys: 3.33 ms, total: 292 ms
Wall time: 295 ms


In [None]:
%%time
# Test the spelling correction system
input_word = "بح ر"
corrections = spelling_corrector.arabic_spelling_correction(input_word)

Original word: بح ر
Possible corrections:
('بحر', 613)
('بحجر', 144)
('بحذر', 132)
('بحار', 82)
('بحفر', 42)
CPU times: user 827 µs, sys: 0 ns, total: 827 µs
Wall time: 838 µs


In [None]:
%%time
# Test the spelling correction system
input_word = "أراج"
corrections = spelling_corrector.arabic_spelling_correction(input_word)

Original word: أراج
Possible corrections:
('أراد', 1460)
('أراب', 346)
('راج', 245)
('أراه', 239)
('أراض', 163)
CPU times: user 776 µs, sys: 0 ns, total: 776 µs
Wall time: 765 µs


In [None]:
%%time
# Test the spelling correction system
input_word = "بحر"
corrections = spelling_corrector.arabic_spelling_correction(input_word)

Original word: بحر
No corrections needed. The word is correct.
CPU times: user 2.15 ms, sys: 0 ns, total: 2.15 ms
Wall time: 2.11 ms


# **SpellChecker**
*It processes individual words quickly.*

It uses a Levenshtein Distance algorithm to find permutations within an edit distance of 2 from the original word. It then compares all permutations (insertions, deletions, replacements, and transpositions) to known words in a word frequency list. Those words that are found more often in the frequency list are more likely the correct results.
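
To make "edit distance" concrete, the sketch below computes the distance between two words while counting insertions, deletions, replacements, and adjacent transpositions. It is a textbook dynamic-programming version (optimal string alignment), offered as an illustration rather than as pyspellchecker's internal code; `osa_distance` is a name chosen for this example.

```python
def osa_distance(a, b):
    """Edit distance with insert, delete, replace, and adjacent transposition."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i letters of a
    for j in range(cols):
        d[0][j] = j  # insert all j letters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # replacement or match
            )
            # Swap of two adjacent letters counts as a single edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


print(osa_distance("عرباة", "عربة"))     # 1
print(osa_distance("عربيياة", "عربية"))  # 2
```

The second result shows that "عربية" is two edits away from "عربيياة", so a checker limited to a distance of 1 cannot reach it.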

pyspellchecker supports multiple languages, including English, Spanish, German, French, Portuguese, Arabic, and Basque. For information on how the dictionaries were created and how they can be updated and improved, see the Dictionary Creation and Updating section of the pyspellchecker readme.

pyspellchecker supports Python 3.

pyspellchecker allows for the setting of the Levenshtein Distance (up to two) to check. For longer words, it is highly recommended to use a distance of 1 and not the default 2. See the quickstart to find how one can change the distance parameter.
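
One way to see why a distance of 1 is preferable for long words is to bound how many candidate strings must be generated. The sketch below counts an upper bound only (duplicates are not removed), and the alphabet size of 36 is an assumption for illustration; the alphabet pyspellchecker actually uses for Arabic may differ.

```python
def edit1_upper_bound(n, alphabet_size=36):
    """Upper bound on strings one edit away from a word of n letters."""
    deletes = n
    transposes = max(n - 1, 0)
    replaces = n * alphabet_size
    inserts = (n + 1) * alphabet_size
    return deletes + transposes + replaces + inserts


def edit2_upper_bound(n, alphabet_size=36):
    """Upper bound on strings two edits away from a word of n letters."""
    # Every one-edit string has at most n + 1 letters and is edited once more.
    return edit1_upper_bound(n, alphabet_size) * edit1_upper_bound(n + 1, alphabet_size)


for length in (4, 10):
    print(length, edit1_upper_bound(length), edit2_upper_bound(length))
# 4 331 134055
# 10 775 657975
```

The one-edit count grows linearly with word length, while the two-edit bound grows roughly with its square.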

In [None]:
%%capture
!pip install pyspellchecker

In [None]:
from spellchecker import SpellChecker
# english = SpellChecker()  # the default is English (language='en')
# spanish = SpellChecker(language='es')  # use the Spanish Dictionary
# russian = SpellChecker(language='ru')  # use the Russian Dictionary
arabic_spelling_checker = SpellChecker(language='ar', distance=1)   # use the Arabic Dictionary

In [None]:
%%time
correction = arabic_spelling_checker.candidates("عربيياة")
print(correction)

None
CPU times: user 3.05 ms, sys: 0 ns, total: 3.05 ms
Wall time: 3.07 ms


In [None]:
%%time
correction = arabic_spelling_checker.candidates("عرباة")
print(correction)

{'رباة', 'عراة', 'عربدة', 'عرابة', 'عربية', 'عربات', 'عربة'}
CPU times: user 2.59 ms, sys: 40 µs, total: 2.63 ms
Wall time: 2.54 ms


In [None]:
%%time
correction = arabic_spelling_checker.candidates("بح ر")
print(correction)

{'بحظر', 'بحر', 'بحصر', 'بحار', 'بحبر', 'بحجر', 'بحشر', 'بحير', 'بحفر', 'بحذر'}
CPU times: user 2.1 ms, sys: 3 µs, total: 2.11 ms
Wall time: 2.1 ms


In [None]:
%%time
correction = arabic_spelling_checker.candidates("بحر")
print(correction)

{'بحر'}
CPU times: user 2.01 ms, sys: 0 ns, total: 2.01 ms
Wall time: 1.9 ms


# **AraSpell**

[AraSpell](https://github.com/msalhab96/AraSpell/) is a more recent, deep-learning-based approach to Arabic spelling correction.

### **CustomArabicSpellingCorrector**
Use this class as a template if you want to implement your own correction logic.

In [None]:
from pyarabic.araby import strip_tashkeel, strip_tatweel

class CustomArabicSpellingCorrector:
    # No Corrector instance is needed here: candidates come from the custom rules below.

    def arabic_spelling_correction(self, word):
        # Strip tashkeel (diacritics) and tatweel (elongation)
        stripped_word = strip_tashkeel(strip_tatweel(word))

        # Replace these sample rules with your own logic or dictionary lookup.
        # For demonstration purposes, they only rewrite the ending of the word.
        corrections = []
        corrections.append(stripped_word + "ة")
        corrections.append(stripped_word[:-2] + "ي")
        corrections.append(stripped_word[:-2])
        corrections.append("غ" + stripped_word[:-2] + "ة")

        print(f"Original word: {word}")

        # Print the output corrections
        if corrections:
            print("Possible corrections:")
            for correction in corrections:
                print(correction)
        else:
            print("No corrections found. The input may not be a valid word.")

        return corrections
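
As one possible form of "your own logic or dictionary", the sketch below looks up the closest entries in a small custom word list using the standard library's `difflib`. `CUSTOM_WORDS` and `custom_suggest` are hypothetical names for this example, and `difflib` ranks by a similarity ratio rather than by edit distance, so results can differ from the libraries shown earlier.

```python
from difflib import get_close_matches

# Hypothetical mini-dictionary; replace it with your own word list.
CUSTOM_WORDS = ["عربية", "عربي", "بحر", "بحار"]


def custom_suggest(word, words=CUSTOM_WORDS, top_n=3, cutoff=0.6):
    """Return up to top_n dictionary words whose similarity to `word` is at least cutoff."""
    return get_close_matches(word, words, n=top_n, cutoff=cutoff)


print(custom_suggest("عربيياة"))
```

Raising `cutoff` makes the lookup stricter, and lowering it returns more distant words.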

In [None]:
# Create an instance of the CustomArabicSpellingCorrector class
spelling_corrector = CustomArabicSpellingCorrector()

In [None]:
%%time
# Test the spelling correction system
input_word = "عربيياة"
corrections = spelling_corrector.arabic_spelling_correction(input_word)


Original word: عربيياة
Possible corrections:
عربيياةة
عربييي
عربيي
غعربيية
CPU times: user 1.78 ms, sys: 1.09 ms, total: 2.87 ms
Wall time: 3.31 ms
