# Spellchecking with NLU 

This notebook uses a spellchecker that implements the noisy channel model algorithm. Correction candidates are extracted by combining context information with word-level information.
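
To make the noisy channel idea concrete, here is a minimal toy sketch in plain Python. It is not the pretrained model that NLU downloads. The tiny vocabulary, the word counts, and the fixed per-edit error probability `PER_EDIT_PROB` are made-up assumptions for illustration only. Each candidate is scored as P(word) times P(typo | word), and the highest-scoring candidate wins.

```python
from collections import Counter

# Hypothetical toy language model: raw counts stand in for P(word).
WORD_COUNTS = Counter({"like": 50, "lie": 20, "peanut": 10, "butter": 15,
                       "and": 100, "ant": 5, "jelly": 8})
TOTAL = sum(WORD_COUNTS.values())
LETTERS = "abcdefghijklmnopqrstuvwxyz"
PER_EDIT_PROB = 0.01  # assumed channel probability of one character edit


def edits1(word):
    """All strings one delete, transpose, replace, or insert away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in LETTERS]
    inserts = [left + c + right for left, right in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def candidates(word):
    """Map each in-vocabulary candidate to its smallest edit distance (0, 1, or 2)."""
    found = {}
    if word in WORD_COUNTS:
        found[word] = 0
    first = edits1(word)
    for cand in first:
        if cand in WORD_COUNTS:
            found.setdefault(cand, 1)
    for intermediate in first:
        for cand in edits1(intermediate):
            if cand in WORD_COUNTS:
                found.setdefault(cand, 2)
    return found


def correct(word):
    """Pick the candidate that maximizes P(word) * P(typo | word)."""
    found = candidates(word)
    if not found:
        return word  # nothing in the vocabulary is close enough

    def score(cand):
        prior = WORD_COUNTS[cand] / TOTAL          # language model term
        channel = PER_EDIT_PROB ** found[cand]     # error model term
        return prior * channel

    return max(found, key=score)


print([correct(w) for w in "liek pentut buttr ant jely".split()])
```

Note that this word-level toy leaves `ant` unchanged, because `ant` is itself a valid word. Catching that kind of error is where context information comes in.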



# 1. Install Java and NLU

In [1]:

import os
! apt-get update -qq > /dev/null   
# Install java
! apt-get install -y openjdk-8-jdk-headless -qq > /dev/null
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk-amd64"
os.environ["PATH"] = os.environ["JAVA_HOME"] + "/bin:" + os.environ["PATH"]
! pip install nlu  > /dev/null    
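
If the install step ran quietly, it can be useful to confirm that a Java runtime is actually reachable before loading a model. The helper below is a small sketch using only the standard library; the function name `java_status` is made up for this example.

```python
import shutil
import subprocess


def java_status():
    """Return the first line of `java -version`, or None if java is not on PATH."""
    java_path = shutil.which("java")
    if java_path is None:
        return None
    proc = subprocess.run([java_path, "-version"], capture_output=True, text=True)
    # `java -version` normally writes its report to stderr, not stdout.
    lines = (proc.stderr or proc.stdout).strip().splitlines()
    return lines[0] if lines else ""


status = java_status()
print(status if status is not None else "java not found on PATH")
```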

# 2. Load the model and spellcheck a sample string

In [2]:
import nlu
pipe = nlu.load('spell')
pipe.predict('I liek pentut buttr ant jely')

check_spelling_dl download started this may take some time.
Approx size to download 112.1 MB
[OK!]


origin_index,checked,token
0,I,I
0,like,liek
0,peanut,pentut
0,butter,buttr
0,and,ant
0,jelly,jely
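
The output pairs each original `token` with its `checked` form, so the corrections can be listed by comparing the two columns. The sketch below works on plain tuples copied from the rows above, so it runs without the model. With the real DataFrame, the same comparison on the `checked` and `token` columns should presumably work, but that is an assumption about the returned frame's layout.

```python
# (checked, token) pairs copied from the output above.
rows = [
    ("I", "I"),
    ("like", "liek"),
    ("peanut", "pentut"),
    ("butter", "buttr"),
    ("and", "ant"),
    ("jelly", "jely"),
]


def changed_tokens(rows):
    """Return (original, corrected) pairs for tokens the spellchecker altered."""
    return [(token, checked) for checked, token in rows if checked != token]


corrections = changed_tokens(rows)
corrected_sentence = " ".join(checked for checked, _ in rows)

print(corrections)
print(corrected_sentence)
```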


# 4. Check out the possible configurations for the spellchecker

In [3]:
pipe.print_info()

The following parameters are configurable for this NLU pipeline (You can copy paste the examples) :
>>> pipe['document_assembler'] has settable params:
pipe['document_assembler'].setCleanupMode('disabled')  | Info: possible values: disabled, inplace, inplace_full, shrink, shrink_full, each, each_full, delete_full | Currently set to : disabled
>>> pipe['sentence_detector'] has settable params:
pipe['sentence_detector'].setCustomBounds([])   | Info: characters used to explicitly mark sentence bounds | Currently set to : []
pipe['sentence_detector'].setDetectLists(True)  | Info: whether detect lists during sentence detection | Currently set to : True
pipe['sentence_detector'].setExplodeSentences(False)  | Info: whether to explode each sentence into a different row, for better parallelization. Defaults to false. | Currently set to : False
pipe['sentence_detector'].setMaxLength(99999)   | Info: Set the maximum allowed length for each sentence | Currently set to : 99999
pipe['sentence_detect

# 4.1 Configure more candidates for correction
The spellchecker exposes many configurable parameters, which leaves a lot of room to experiment.  
For now, we can increase the number of candidates that are considered for every word.

In [4]:
pipe['context_spell'].setMaxCandidates(2) 
pipe.predict('Hello world!')

origin_index,checked,token
0,Hello,Hello
0,world,world
0,!,!


## Setting max candidates too high can cause wrong predictions, like Bello instead of Hello

In [5]:
pipe['context_spell'].setMaxCandidates(6) 
pipe.predict('Hello world!')

origin_index,checked,token
0,Bello,Hello
0,world,world
0,!,!
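
One way to find a safe setting is to run a sentence that is already spelled correctly through several `setMaxCandidates` values and flag the ones that change it. The sketch below uses only the two calls shown above, `pipe['context_spell'].setMaxCandidates(n)` and `pipe.predict(text)`. So that it runs without downloading the model, it uses `FakePipe`, a hypothetical stand-in that simply replays the two outputs shown in this notebook as plain lists; the real `predict` returns a DataFrame, so there you would compare its `checked` column instead.

```python
def sweep_max_candidates(pipe, text, values):
    """Run the same text through the pipeline once per maxCandidates value."""
    results = {}
    for n in values:
        pipe['context_spell'].setMaxCandidates(n)
        results[n] = pipe.predict(text)
    return results


class FakeSpell:
    """Hypothetical stand-in for the spellchecker component."""

    def __init__(self):
        self.max_candidates = None

    def setMaxCandidates(self, n):
        self.max_candidates = n


class FakePipe:
    """Hypothetical stand-in that replays the outputs shown in this notebook."""

    CANNED = {2: ["Hello", "world", "!"], 6: ["Bello", "world", "!"]}

    def __init__(self):
        self._spell = FakeSpell()

    def __getitem__(self, name):
        return self._spell

    def predict(self, text):
        return self.CANNED[self._spell.max_candidates]


expected = ["Hello", "world", "!"]  # the input was already correct
results = sweep_max_candidates(FakePipe(), "Hello world!", [2, 6])
regressions = [n for n, checked in results.items() if checked != expected]
print(regressions)
```

To use this with the real pipeline, pass the loaded `pipe` in place of `FakePipe()` and adapt the comparison to the DataFrame it returns.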
