# **Setup**
 
Reset the Python environment to clear it of any previously loaded variables, functions, or libraries. Then, import the libraries needed to complete the code Professor Melnikov presented in the video.

In [1]:
%reset -f
from IPython.core.interactiveshell import InteractiveShell as IS
IS.ast_node_interactivity = "all"    # allows multiple outputs from a cell

<span style="color:black">You will be using [TextBlob](https://textblob.readthedocs.io/en/dev/), a popular NLP library, to correct misspellings. Many of its functions overlap with those of `nltk`, `spaCy`, `Gensim`, and other NLP libraries. Staying within a single library where possible makes the pieces of your code easier to integrate.</span>

In [2]:
from textblob import Word

<hr style="border-top: 2px solid #606366; background: transparent;">

# **Review**

TextBlob's `Word` object behaves very much like a string when printed, concatenated with other strings, sliced, etc. However, it has additional complex methods that Python strings do not have. One method corrects misspellings using a popular [algorithm](https://norvig.com/spell-correct.html) created by [Peter Norvig](https://en.wikipedia.org/wiki/Peter_Norvig). 

Explore these functionalities by wrapping the misspelled word `'fianlly'` into a `Word` object.

In [3]:
print(Word('fianlly'))
print(Word('fianlly')+'!')
print(Word('fianlly')[:3])         # slice the Word object itself, not the raw string
print(Word('fianlly').correct())   # Peter Norvig's algorithm

fianlly
fianlly!
fia
finally
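
To make the idea behind the correction step more concrete, here is a minimal sketch of the candidate-generation part of Peter Norvig's approach, as described in his essay linked above. It only looks at words one edit away (a deletion, transposition, replacement, or insertion), whereas the full algorithm also considers two edits. The vocabulary `toy_counts` and the helper `best_guess()` are hypothetical and exist only for illustration; this is not TextBlob's actual implementation.

```python
import string

def edits1(word):
    """Return every string that is one simple edit away from `word`."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in letters]
    inserts = [left + c + right for left, right in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

# Hypothetical toy vocabulary with made-up word counts (an assumption for this sketch)
toy_counts = {'finally': 120, 'finale': 8, 'family': 300}

def best_guess(word):
    """Pick the most frequent known word within one edit, else return the input."""
    if word in toy_counts:
        return word
    known = [w for w in edits1(word) if w in toy_counts]
    return max(known, key=toy_counts.get) if known else word

print(best_guess('fianlly'))   # finally (swapping 'a' and 'n' is a single transposition)
```

Because `'fianlly'` differs from `'finally'` by one swapped pair of letters, the correct word shows up among the one-edit candidates.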


<span style="color:black">Peter Norvig's algorithm also assigns each identified candidate a normalized confidence score between 0 and 1. A higher score indicates a more likely replacement for the misspelled word. You can get these scores using the `spellcheck()` method.</span>

In [4]:
print(Word('fianlly').spellcheck())  # candidate & confidence score

[('finally', 1.0)]
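
The sketch below shows one plausible way such confidence scores can arise: divide each candidate's word count by the total count over all candidates, so the scores sum to 1. This is an assumption made for illustration and may not match how TextBlob computes its scores internally. The function `normalized_scores()` and the counts in `toy_counts` are hypothetical.

```python
def normalized_scores(counts):
    """Turn raw candidate counts into (word, score) pairs sorted by score."""
    total = sum(counts.values()) or 1          # guard against an empty or all-zero input
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return [(word, n / total) for word, n in ranked]

# Made-up counts for three hypothetical candidates
toy_counts = {'the': 80, 'ten': 15, 'tea': 5}
print(normalized_scores(toy_counts))
# [('the', 0.8), ('ten', 0.15), ('tea', 0.05)]
```

Under this scheme a single candidate always gets a score of 1.0, which is consistent with the `('finally', 1.0)` output above.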


<span style="color:black"> Shorter words tend to have more candidates. Here is an example with multiple candidates.

In [5]:
print(Word('teh').spellcheck())

[('the', 0.9941491410044596), ('ten', 0.0027204630998372693), ('tea', 0.0013291760350803096), ('eh', 0.0011055763282443697), ('th', 0.0006335325027018298), ('ted', 2.4844411870659992e-05), ('heh', 2.4844411870659992e-05), ('te', 1.2422205935329996e-05)]


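<span style="color:black"> To correct a sentence, you can split it into word tokens on whitespace, loop through the tokens to correct any misspellings, then join the corrected words back together into a single string. Note that punctuation attached to a token, such as the final period in the example below, may be dropped during correction.</span>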

In [6]:
sScrambled = '''Thea ordirng oof leeetters in a wrod ies noot imporant.'''
LsCorrected = [Word(s).correct() for s in sScrambled.split()]
print(' '.join(LsCorrected))

The ordering of letters in a word is not important
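
Since the final period is lost in the output above, here is a minimal sketch of one way to keep punctuation and spacing intact: run the correction only on alphabetic runs with `re.sub()` and leave everything else untouched. The names `correct_text()`, `toy_fix()`, and `toy_fixes` are hypothetical. The toy lookup stands in for a real corrector so the example runs without TextBlob.

```python
import re

# Hypothetical lookup table standing in for a real spelling corrector
toy_fixes = {'teh': 'the', 'wrod': 'word'}

def toy_fix(word):
    """Return the toy correction for `word`, or the word itself if unknown."""
    return toy_fixes.get(word.lower(), word)

def correct_text(text, fix_word):
    """Apply `fix_word` to each alphabetic run; punctuation and spaces are kept."""
    return re.sub(r"[A-Za-z]+", lambda m: fix_word(m.group()), text)

print(correct_text('Is teh wrod right?', toy_fix))
# Is the word right?
```

With TextBlob available, you could pass a function such as `lambda w: str(Word(w).correct())` in place of `toy_fix`.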


<hr style="border-top: 2px solid #606366; background: transparent;">

# **Optional Practice**

You will now practice using `Word` objects.
    
As you work through these tasks, check your answers by running your code in the *# check solution here* cell to see whether you get the correct result. If you get stuck on a task, expand the See **solution** drop-down menu to view the answer. You will need the following small helper function to find candidates, or bets, for the misspelled word.

In [7]:
def SpellBets(sScrambled='het'):
  'Prints a count and a list of candidates'
  LsCandidates = Word(sScrambled).spellcheck()  # find bets
  print(f'{len(LsCandidates)},', [w+f',{n:.3f}' for w,n in LsCandidates])
SpellBets('the')  # prints 1 candidate
SpellBets('eth')  # prints 6 candidates
SpellBets('het')  # prints 25 candidates

1, ['the,1.000']
6, ['eh,0.489', 'th,0.280', 'etc,0.115', 'et,0.104', 'ety,0.005', 'beth,0.005']
25, ['he,0.602', 'her,0.256', 'met,0.026', 'let,0.025', 'yet,0.024', 'get,0.023', 'set,0.016', 'hot,0.006', 'hat,0.005', 'heat,0.004', 'wet,0.003', 'hut,0.003', 'aet,0.002', 'net,0.001', 'hit,0.001', 'et,0.001', 'pet,0.001', 'bet,0.001', 'hey,0.000', 'hen,0.000', 'jet,0.000', 'hem,0.000', 'heh,0.000', 'hew,0.000', 'cet,0.000']


## Task 1

By permuting letter positions only (without dropping or introducing letters), scramble the spelling of `'junk'` so that it has at least twenty candidates. You can use the `SpellBets()` function for convenience.

<b>Hint:</b> Try permuting letters in some consistent manner to avoid confusion.
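
One consistent way to permute letters is to enumerate every ordering with `itertools.permutations` and test them one by one. The helper `scrambles()` below is a hypothetical name introduced for this sketch.

```python
from itertools import permutations

def scrambles(word):
    """Return all distinct reorderings of `word`, excluding the word itself."""
    return sorted({''.join(p) for p in permutations(word)} - {word})

candidates = scrambles('junk')
print(len(candidates))    # 23 (4! = 24 orderings minus the original spelling)
print(candidates[:5])     # first few scrambles in alphabetical order
```

You could then loop over `candidates` and call `SpellBets()` on each to see which scrambles produce the most bets.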

In [8]:
SpellBets('nujk')

20, ['neck,0.613', 'luck,0.085', 'sunk,0.082', 'null,0.033', 'nut,0.030', 'dusk,0.030', 'bulk,0.027', 'nuts,0.018', 'duck,0.012', 'turk,0.009', 'suck,0.009', 'nun,0.009', 'numb,0.009', 'tuck,0.006', 'nuns,0.006', 'nook,0.006', 'buck,0.006', 'nur,0.003', 'nick,0.003', 'fuck,0.003']



<font color=#606366>
    <details><summary><font color=#b31b1b>▶ </font>See <b>solution</b>.</summary>
            <pre>
SpellBets('nujk')
            </pre>
    </details> 
</font>
<hr>

## Task 2

By mixing the letter positions while keeping the letter `'t'` in place, scramble the spelling of `'trash'` so that it has more than thirty candidates. You can use the `SpellBets()` function for convenience.

<b>Hint:</b> To avoid confusion, try writing down your scrambled words and generate them by permuting letters in some consistent manner.
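
When some letters must stay in place, you can permute only the remaining positions. The sketch below does this with a hypothetical helper, `scrambles_fixed()`, which takes the set of indices to hold fixed. It assumes that "except letter `'t'`" means the `'t'` stays at index 0.

```python
from itertools import permutations

def scrambles_fixed(word, fixed):
    """Return distinct reorderings of `word` that keep the indices in `fixed` unchanged."""
    free = [i for i in range(len(word)) if i not in fixed]
    results = set()
    for perm in permutations(word[i] for i in free):
        letters = list(word)
        for i, ch in zip(free, perm):      # place permuted letters into the free slots
            letters[i] = ch
        results.add(''.join(letters))
    results.discard(word)                  # drop the unscrambled original
    return sorted(results)

options = scrambles_fixed('trash', fixed={0})
print(len(options))       # 23 (4! = 24 orderings of 'rash' minus the original)
```

The same helper also covers the case where both the first and last letters are fixed, for example `scrambles_fixed('garbage', fixed={0, 6})`.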

In [None]:
# check solution here


<font color=#606366>
    <details><summary><font color=#b31b1b>▶ </font>See <b>solution</b>.</summary>
            <pre>
SpellBets('thasr')
            </pre>
    </details> 
</font>
<hr>

## Task 3

By mixing the letter positions while keeping the first and last letters in place, scramble the spelling of `'garbage'` so that it has more than two candidates. You can use the `SpellBets()` function for convenience.

<b>Hint:</b> You might even find a scrambled version of the word with three candidates of corrected words: <code>['gargle,0.333', 'garage,0.333', 'barge,0.333']</code>. Now, can you guess the scrambled word?

In [None]:
# check solution here


<font color=#606366>
    <details><summary><font color=#b31b1b>▶ </font>See <b>solution</b>.</summary>
            <pre>
SpellBets('gbargae')
            </pre>
    </details> 
</font>
<hr>