# Implementing a Spell Checker after transcriptions
### Testing multiple functions to count spelling mistakes in UGC
- For now we decided not to use this. The Google transcriptions do not represent the children's writing accurately enough to give a reliable count of spelling errors. The transcription step seems to introduce extra errors where the handwriting is poor.

In [1]:
# Imports
import pandas as pd
import pkg_resources
from symspellpy import SymSpell, Verbosity
from spellchecker import SpellChecker
import string

In [2]:
# Load the transcribed stories DataFrame
df = pd.read_csv('transcribed_stories.csv')
df.head()

Unnamed: 0.1,Unnamed: 0,Submission ID,Transcribed Text
0,0,3132,Page. I 3132 Once there was a little cheatah a...
1,1,3104,"3106 D she was very, a berenang The pony that ..."
2,2,3103,3103 Rainbow the Unica unicom named some een P...
3,3,3117,3117 O gum drop land gumdrop. land is prace We...
4,4,3102,3102 The secret fifth grade E am Anella I am s...


In [7]:
# Use the first entry as the test input for the spell checkers
input_term = df['Transcribed Text'][0]

### Symspell package
- Slow

In [3]:
# Load dictionary
sym_spell = SymSpell(max_dictionary_edit_distance=6, prefix_length=7, count_threshold=15)
dictionary_path = pkg_resources.resource_filename(
    "symspellpy", "frequency_dictionary_en_82_765.txt")
bigram_path = pkg_resources.resource_filename(
    "symspellpy", "frequency_bigramdictionary_en_243_342.txt")

sym_spell.load_dictionary(dictionary_path, term_index=0, count_index=1)
sym_spell.load_bigram_dictionary(bigram_path, term_index=0, count_index=2)

True
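
One possible contributor to the slowness noted above is `max_dictionary_edit_distance=6`, which is well above the symspellpy default of 2. SymSpell precomputes delete variants of each dictionary term's prefix, so the index grows quickly with the edit distance. The sketch below is only a back-of-the-envelope upper bound for a single 7-character prefix, not a measurement of this notebook; `delete_variant_upper_bound` is a hypothetical helper, not part of symspellpy.

```python
from math import comb


def delete_variant_upper_bound(prefix_length, max_edit_distance):
    """Upper bound on the delete variants of one prefix, including the prefix itself.

    Hypothetical helper for illustration: it counts the ways to delete
    0..max_edit_distance characters from a prefix of the given length.
    Repeated letters would make the real number of distinct variants smaller.
    """
    max_deletes = min(prefix_length, max_edit_distance)
    return sum(comb(prefix_length, k) for k in range(max_deletes + 1))


# Compare the library default (2) with the setting used in this notebook (6)
for distance in (2, 6):
    print(distance, delete_variant_upper_bound(7, distance))
```

Under these assumptions the bound per long dictionary term rises from 29 to 127, so lowering the edit distance may be worth trying before ruling SymSpell out on speed alone.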

In [None]:
# Run transcribed text through Symspell to get spelling suggestions back
suggestions = sym_spell.lookup_compound(
    input_term, max_edit_distance=6,
    ignore_term_with_digits=True, ignore_non_words=True)

suggestions

In [9]:
for suggestion in suggestions:
    print('------')
    print(suggestion)

------
page i 3132 once there was a little cheetah and the cheetah had a best friend lion the cheetahs name was paws and the lions name was dylan they always played with each other after they went hunting dylan went to play with paws after hunting with pack and paws wont to play after hunting with his mon they always met up near the same rock art the same lake they would talk and have water fights here is what the talked about your parents and my pack might have a fight said dylan we both don't want that to happen we might have to fight against each other said paws after a few minutes it was time for them bale to their family next morning they saw there family having a fight dylan family won and plus mon was here at their us all he meeting time they talked about it my mon got here its not fair said paws i think the best way to fix this is to tell are family we are friend said dylan i think your right said paws so after there talk time and water fight they went back to there family and 

### Python Spellchecker

In [22]:
def spell_check(text):
    """Print each unrecognized word in `text` with its most likely correction."""
    # Initialize spellchecker
    spell = SpellChecker()

    # Strip the punctuation
    exclude = set(string.punctuation)
    text = ''.join(ch for ch in text if ch not in exclude)

    # Split text into a list of words
    words = text.split()

    # Find words that are not in the dictionary (returned lowercased)
    misspelled = spell.unknown(words)

    # Print the most likely correct spelling for each flagged word
    for word in misspelled:
        print(f'{word}: {spell.correction(word)}')

In [31]:
input_term = df['Transcribed Text'][4]

In [32]:
df['Transcribed Text'][4]

"3102 The secret fifth grade E am Anella I am starting fifth grade I have a little ster named Emma, Emma is goining into kindergarden withiber ter plete Ara and Isabella, but I don't care about theme when I was a 3rd grader the learned how to invent this year I will try to make an invention that arobot will go to school forme Got to go to sleep. Ezzzzz. I wake up get changed, brush teeth, and do hain I have long curly brown hair. I used to wear glasses and braces, and once broke my leg I am 10, I will turn il in November. The school is called i Los Angeles elementry school, I live in Nevada and will drive there. It is probably an hour When I walked in Mrs. Begula said Hello I Mrsklapen, my last name. I said hi I have to share a desk with Beltness Ay Schedule: Math writting reading vocab spelling typing, and social stadies. It had said No Lunch in Fifth grade! When I was in second grade Kayla called me a division Decimal I did not like it. we had to make recimals. Class dismissed said d

In [33]:
# Print each flagged word with its suggested correction
spell_check(input_term)

kayla: layla
recimals: decimals
hotice: notice
withiber: wither
goining: joining
3rd: ord
ezzzzz: ezzzzz
arobot: robot
suid: said
shes: she
soom: room
godess: goddess
asock: sock
mtlava: lava
wampice: vampire
ster: step
kindergarden: kindergarten
anella: nella
writting: writing
oht: out
cof: of
beltness: boldness
didnt: dint
begula: betula
elementry: elementary
vocab: vocal
stadies: studies
mrsklapen: mrsklapen
begular: regular
plete: plate
emania: mania
