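Assignment 4: Text Correction  
Task: You are given three files: the dictionary vocab.txt (any word that does not appear in it is treated as a spelling error), spell-errors.txt (correct words paired with their common misspellings), and the test data testdata.txt (1000 sentences containing spelling errors). Use a bigram language model and the Noisy Channel Model to correct the text.  
Steps (mark each step with a comment):  
1. Read the dictionary vocab.txt;  
2. For words not in the dictionary, generate the candidate set from edit distances 1 and 2;  
3. Read spell-errors.txt and compute the probability p(misspelled word | correct word);  
4. Build a bigram language model from the nltk corpus movie_reviews;  
5. Read testdata.txt, find the misspelled words, generate candidates, and compute the probability of each candidate;  
6. Choose the candidate with the highest probability as the correction.  
Note: to simplify the computation, use log-likelihoods so that products of probabilities become sums. Remember to apply smoothing.  
Output: for each sentence, print the sentence number, the misspelled word, the correction (or False if there is none), and the candidate list with probabilities. Finally, print the total number of misspelled words.  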
For example:  
Sentence 1:  
protectionst, protectionist  
{'protectionist': -30.484951869432177}  
Sentence 2:  
Tkyo's, False  
Sentence 3:  
retaliation, retaliation  
{'retaliation': -30.97301670912914}  
Japan's, Japan  
{'Japan': -30.748299246173307, 'Japanese': -30.748299246173307}  
……  
Total mistakes: 1351  
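
The scoring rule behind steps 3 to 6 can be written out as follows. This is a sketch of the usual noisy-channel formulation with a bigram prior, not a formula quoted from the assignment. The symbols are notation introduced here: $w$ is the observed misspelling, $c$ a candidate correction, $w_{\text{prev}}$ the word before it, $C(\cdot)$ a corpus count, and $V$ the vocabulary size used for add-one smoothing.

$$
\hat{c} = \arg\max_{c} \; P(w \mid c)\, P(c \mid w_{\text{prev}})
$$

$$
\log \text{score}(c) = \log P(w \mid c) + \log P(c \mid w_{\text{prev}}), \qquad
P(c \mid w_{\text{prev}}) \approx \frac{C(w_{\text{prev}}\, c) + 1}{C(w_{\text{prev}}) + V}
$$

Taking logarithms turns the product into a sum and avoids floating-point underflow; the $+1$ and $+V$ terms keep unseen bigrams from getting probability zero.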


**1. Read the dictionary vocab.txt**

In [1]:
import re
from collections import Counter

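#   Tokenize text: lowercase everything and keep only runs of English letters
def words(text):
    return re.findall(r'[a-zA-Z]+', text.lower())

#   Load vocab.txt into the set VOCABS; a set removes duplicates and gives fast membership tests
with open('vocab.txt') as f:
    VOCABS = set(words(f.read()))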
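
To make the effect of this tokenizer concrete, here is a small illustration on a made-up sentence (the sample text is an assumption, not taken from the data files). It shows that digits and punctuation are dropped and that an apostrophe splits a word in two.

```python
import re

def words(text):
    """Lowercase the text and return its runs of ASCII letters."""
    return re.findall(r'[a-zA-Z]+', text.lower())

# Hypothetical sample sentence, not from testdata.txt
sample = "Tkyo's exports rose 5 pct, Japan said."
tokens = words(sample)
print(tokens)
# ['tkyo', 's', 'exports', 'rose', 'pct', 'japan', 'said']
```

One consequence is that possessives such as "Tkyo's" reach the later steps as two separate tokens.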

**2. For words not in the dictionary, generate the candidate set from edit distances 1 and 2**

In [2]:
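#   Load the tokens of testdata.txt into the set WORDS
with open('testdata.txt') as f:
    WORDS = set(words(f.read()))

#   Set difference gives the tokens that are missing from the dictionary
#   (not used later: step 5 re-reads the file and processes it sentence by sentence)
UNKNOWNS = WORDS - VOCABS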

#   All strings at edit distance 1 from word
def edits1(word):
    letters    = 'abcdefghijklmnopqrstuvwxyz'   #   the 26 letters used for replacements and insertions
    splits     = [(word[:i], word[i:])    for i in range(len(word) + 1)]
    deletes    = [L + R[1:]               for L, R in splits if R]
    transposes = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R)>1]
    replaces   = [L + c + R[1:]           for L, R in splits if R for c in letters]
    inserts    = [L + c + R               for L, R in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

#   All strings at edit distance 2 from word (apply edits1 to every result of edits1)
def edits2(word):
    return (e2 for e1 in edits1(word) for e2 in edits1(e1))

#   Candidate lists are not built here; they are generated on demand in step 5
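
As a rough illustration of how these two functions are used, the sketch below filters the generated strings against a tiny dictionary. The names `toy_vocab` and `known` and the three dictionary words are assumptions made for the example; `edits1` is repeated so the snippet runs on its own.

```python
def edits1(word):
    letters = 'abcdefghijklmnopqrstuvwxyz'
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [L + R[1:] for L, R in splits if R]
    transposes = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1]
    replaces = [L + c + R[1:] for L, R in splits if R for c in letters]
    inserts = [L + c + R for L, R in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

def edits2(word):
    return (e2 for e1 in edits1(word) for e2 in edits1(e1))

# Hypothetical mini-dictionary standing in for VOCABS
toy_vocab = {'protectionist', 'protection', 'tokyo'}

def known(candidates, vocab):
    """Keep only the generated strings that are real dictionary words."""
    return {w for w in candidates if w in vocab}

word = 'protectionst'
near = known(edits1(word), toy_vocab)   # one insertion away
far = known(edits2(word), toy_vocab)    # up to two edits away

# Before de-duplication a word of length n yields
# n deletes + (n-1) transposes + 26n replaces + 26(n+1) inserts = 54n + 25 strings
upper_bound = 54 * len(word) + 25
print(near, far, len(edits1(word)) <= upper_bound)
```

Because the edit-distance-2 set grows roughly with the square of that bound, it makes sense to generate it lazily and only when edit distance 1 finds nothing.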

**3. Read spell-errors.txt and compute the probability p(misspelled word | correct word)**

In [3]:
#   dict_outer maps each correct word to {misspelling: relative frequency}
dict_outer = {}
with open("spell-errors.txt") as spell_errors:
    for line in spell_errors:
        #   The correct word precedes the colon
        head = line.split(":")[0]

        #   Misspellings follow the colon, comma-separated; strip spaces and newlines
        errors = [e.replace(" ", "").replace("\n", "") for e in line.split(":")[1].split(",")]

        #   dict_inner holds the misspelling counts for the current head
        dict_inner = {}
        for e in errors:
            star = e.find("*")
            if star != -1:      #   "misspelling*k" means it was observed k times
                dict_inner[e[:star]] = int(e[star + 1:])
            else:               #   no count means it was observed once
                dict_inner[e] = 1

        #   Convert counts to relative frequencies, i.e. p(misspelling | head)
        total = sum(dict_inner.values())
        for e in dict_inner:
            dict_inner[e] /= total

        #   Store the inner dictionary under its correct word
        dict_outer[head] = dict_inner
print(dict_outer)

{'raining': {'rainning': 0.5, 'raning': 0.5}, 'writings': {'writtings': 1.0}, 'disparagingly': {'disparingly': 1.0}, 'yellow': {'yello': 1.0}, 'four': {'forer': 0.08333333333333333, 'fours': 0.08333333333333333, 'fuore': 0.08333333333333333, 'fore': 0.4166666666666667, 'for': 0.3333333333333333}, 'woods': {'woodes': 1.0}, 'hanging': {'haing': 1.0}, 'aggression': {'agression': 1.0}, 'looking': {'loking': 0.09090909090909091, 'begining': 0.09090909090909091, 'luing': 0.09090909090909091, 'look': 0.18181818181818182, 'locking': 0.09090909090909091, 'lucking': 0.09090909090909091, 'louk': 0.09090909090909091, 'looing': 0.09090909090909091, 'lookin': 0.09090909090909091, 'liking': 0.09090909090909091}, 'eligible': {'eligble': 0.3333333333333333, 'elegable': 0.3333333333333333, 'eligable': 0.3333333333333333}, 'electricity': {'electrisity': 0.25, 'electricty': 0.5, 'electrizity': 0.25}, 'scold': {'schold': 0.5, 'skold': 0.5}, 'adaptable': {'adabtable': 1.0}, 'caned': {'canned': 0.5, 'cained'
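
The conversion from counts to probabilities can be checked on a single line. The helper name `parse_error_line` and the input line below are assumptions: the line was chosen so that it is consistent with the printed entry for 'four', but the actual contents of spell-errors.txt are not shown here. Unlike the cell above, this sketch trims whitespace only at the ends of each item.

```python
def parse_error_line(line):
    """Parse 'correct: wrong1, wrong2*3' into (correct, {wrong: probability})."""
    head, _, tail = line.partition(":")
    counts = {}
    for item in tail.split(","):
        item = item.strip()
        if not item:
            continue
        # 'wrong*k' carries an explicit count k; a bare 'wrong' counts once
        wrong, star, k = item.partition("*")
        counts[wrong] = int(k) if star else 1
    total = sum(counts.values())
    return head.strip(), {w: c / total for w, c in counts.items()}

# Assumed example line in the 'word: misspelling*count' format
head, probs = parse_error_line("four: forer, fours, fuore, fore*5, for*4\n")
print(head, probs)
```

With 12 observations in total, 'fore' gets 5/12 and 'for' gets 4/12, which agrees with the values 0.4167 and 0.3333 in the output above.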

**4. Build a bigram language model from the nltk corpus movie_reviews**

In [4]:
from nltk.corpus import movie_reviews

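#    All sentences of the corpus, each a list of tokens
sentences = movie_reviews.sents()

#    Unigram counts, with a start-of-sentence marker '<s>' prepended to every sentence
words_count = Counter(w for s in sentences for w in (['<s>'] + s))

#    Bigram counts, keyed as 'previous|current'
pairs = []
for s in sentences:
    s = ['<s>'] + s
    for i in range(len(s) - 1):
        pairs.append(s[i]+'|'+s[i + 1])
pairs_count = Counter(pairs)

#    Spot-check a couple of counts

print (words_count['<s>'])
print (pairs_count['<s>|the'])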

71532
8071
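
These counts become probabilities once they are smoothed. The following is a minimal sketch of add-one (Laplace) smoothing on two made-up sentences; `toy_sents`, `unigrams`, `bigrams`, and `log_bigram` are names assumed for the example, and tuples are used as bigram keys instead of the 'a|b' strings above.

```python
import math
from collections import Counter

# Two made-up sentences standing in for movie_reviews.sents()
toy_sents = [['the', 'film', 'is', 'good'], ['the', 'plot', 'is', 'thin']]

unigrams = Counter()
bigrams = Counter()
for s in toy_sents:
    s = ['<s>'] + s
    unigrams.update(s)
    bigrams.update(zip(s, s[1:]))   # consecutive (previous, current) pairs

V = len(unigrams)                   # vocabulary size, including '<s>'

def log_bigram(prev, word):
    """log P(word | prev) with add-one smoothing."""
    return math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + V))

print(log_bigram('<s>', 'the'))     # seen bigram
print(log_bigram('the', 'thin'))    # unseen bigram, still finite
```

Note that the denominator uses the count of the preceding word plus V, which is what makes the smoothed probabilities for a given context sum to one.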


**5. Read testdata.txt, find the misspelled words, generate candidates, and compute the probability of each candidate**


**6. Choose the candidate with the highest probability as the correction**

In [89]:
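import numpy as np

#    Vocabulary size V (number of distinct corpus tokens), used for smoothing
count = len(words_count.keys())  

testdata = open("testdata.txt")
false_num = 0

#   Keep only the words that appear in the dictionary
def known(words): 
    return set(w for w in words if w in VOCABS)

#   Candidates in priority order: the word itself, then edit distance 1, then edit distance 2;
#   fall back to the word itself when nothing is found in the dictionary
def candidates(word): 
    return (known([word]) or known(edits1(word)) or known(edits2(word)) or [word])

while True:
    line  = testdata.readline()
    
    #   End of input
    if line == "":
        break
        
    #   Split the tab-separated fields
    line = line.replace("\n","").split("\t")
    
    #   Sentence number
    order = int(line[0])
    
    #   Tokenize and drop punctuation (this also splits words at apostrophes)
    sentence = ['<s>'] + re.findall(r'[a-zA-Z]+', line[2])
    
    curr_false = {}
    #   Scan the sentence for tokens that are not in the dictionary
    #   (the last token is skipped because the bigram below reads sentence[i + 1])
    for i in range(1, len(sentence) - 1):
        if sentence[i].lower() not in VOCABS:
            false_num+=1
            curr = {}
            for j in candidates(sentence[i]):
                #    Log-probability score of candidate j
                prob = 0.0
                #   Channel model: log P(misspelled word | candidate)
                #   PS: very few lookups hit here; I am not sure whether I made a mistake.
                #   The token is looked up with its original casing.
                if j in dict_outer and sentence[i] in dict_outer[j]:
                    prob += np.log(dict_outer[j][sentence[i]])
                else:
                    #   Smoothing: small constant for unseen (misspelling, candidate) pairs
                    prob += np.log(0.00001)
                    
                #   Bigram term with add-one smoothing.
                #   As written, the key pairs the misspelled token with the following word
                #   and does not involve j, so this term is the same for every candidate
                bigram_w = sentence[i] + '|' + sentence[i + 1]
                if bigram_w in pairs_count:
                    prob += np.log((pairs_count[bigram_w] + 1) / (words_count[sentence[i+1]] + count))
                else:
                    #   Smoothing for unseen bigrams
                    prob += np.log(1.0 / count)
                curr[j] = prob
            curr_false[sentence[i]] = curr
    
    print("Sentence #%d:"%order)
    
    #   tag records whether the sentence had any misspelled word; Max is the score threshold
    tag = 0
    Max = -100
    best_candidate = set()
    
    #   Print a correction and the candidate scores for each misspelled word.
    #   Max is never updated, so best_candidate ends up as the last candidate
    #   iterated whose score exceeds -100, not necessarily the highest-scoring one
    for i in curr_false:
        tag = 1
        for j in curr_false[i]:
            if curr_false[i][j] > Max:
                best_candidate = j
        print(i,best_candidate)
        print(curr_false[i])
        
    if(tag==0):
        print("False")
print("Total mistakes:%d"%false_num)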

Sentence #1:
protectionst protectionist
{'protectionist': -22.103768458274494}
Sentence #2:
Tkyo kyo
{'kyo': -22.103768458274494}
Sentence #3:
retaiation retaliation
{'retaliation': -22.103768458274494}
Sentence #4:
False
Sentence #5:
wouldn would
{'woulda': -22.103768458274494, 'would': -22.103768458274494}
busines business
{'business': -22.103768458274494}
Sentence #6:
False
Sentence #7:
Taawin darwin
{'alwin': -22.103768458274494, 'darwin': -22.103768458274494}
Sentence #8:
seriousnyss seriousness
{'seriousness': -22.103768458274494}
Sentence #9:
aganst against
{'against': -12.382602462532319}
Sentence #10:
bililon billion
{'billion': -22.103768458274494}
Sentence #11:
sewll sell
{'swell': -22.103768458274494, 'sell': -22.103768458274494}
Sentence #12:
importsi imports
{'imports': -22.103768458274494}
Sentence #13:
Sheem helm
{'thee': -22.103768458274494, 'sheet': -22.103768458274494, 'thees': -22.103768458274494, 'theme': -22.103768458274494, 'sheer': -22.103768458274494, 'seem': -

Sentence #66:
Rtbusoa Rtbusoa
{'Rtbusoa': -22.103768458274494}
Sentence #67:
rubebr rubber
{'rubber': -22.103768458274494}
Sentence #68:
afte ante
{'arte': -22.103768458274494, 'aft': -22.103768458274494, 'fate': -22.103768458274494, 'after': -22.103768458274494, 'ate': -22.103768458274494, 'ante': -22.103768458274494}
Sentence #69:
Salh mali
{'galt': -22.103768458274494, 'hash': -22.103768458274494, 'yale': -22.103768458274494, 'gall': -22.103768458274494, 'pah': -22.103768458274494, 'palm': -22.103768458274494, 'hal': -22.103768458274494, 'hals': -22.103768458274494, 'ual': -22.103768458274494, 'call': -22.103768458274494, 'gale': -22.103768458274494, 'pall': -22.103768458274494, 'male': -22.103768458274494, 'ali': -22.103768458274494, 'ash': -22.103768458274494, 'talk': -22.103768458274494, 'ah': -22.103768458274494, 'sale': -22.103768458274494, 'bath': -22.103768458274494, 'ball': -22.103768458274494, 'path': -22.103768458274494, 'bach': -22.103768458274494, 'rash': -22.10376845827

Sentence #132:
ltBondr ltbond
{'ltbond': -22.103768458274494}
Sentence #133:
False
Sentence #134:
undergrduno underground
{'underground': -22.103768458274494}
Sentence #135:
whle while
{'whoe': -22.103768458274494, 'whee': -22.103768458274494, 'whole': -22.103768458274494, 'while': -22.103768458274494}
Sentence #136:
quarteir quarter
{'quarter': -22.103768458274494}
Sentence #137:
rurthef further
{'further': -22.103768458274494}
Sentence #138:
furter further
{'further': -22.103768458274494}
Sentence #139:
False
Sentence #140:
Importt imports
{'import': -22.103768458274494, 'imports': -22.103768458274494}
Sentence #141:
demend demand
{'depend': -22.103768458274494, 'defend': -22.103768458274494, 'demand': -22.103768458274494}
Sentence #142:
Mtsuki mutsuki
{'mutsuki': -22.103768458274494}
Sentence #143:
volum volume
{'volume': -11.28399017386421}
Sentence #144:
teir tear
{'their': -22.103768458274494, 'heir': -22.103768458274494, 'weir': -22.103768458274494, 'meir': -22.103768458274494, 

Sentence #219:
extorps extras
{'exports': -22.103768458274494, 'extras': -22.103768458274494}
Sentence #220:
Thede weede
{'hyde': -22.103768458274494, 'thee': -22.103768458274494, 'shed': -22.103768458274494, 'shewe': -22.103768458274494, 'sheds': -22.103768458274494, 'theme': -22.103768458274494, 'there': -22.103768458274494, 'bede': -22.103768458274494, 'these': -22.103768458274494, 'here': -22.103768458274494, 'whee': -22.103768458274494, 'shade': -22.103768458274494, 'chide': -22.103768458274494, 'heade': -22.103768458274494, 'hee': -22.103768458274494, 'hedge': -22.103768458274494, 'hide': -22.103768458274494, 'rhode': -22.103768458274494, 'where': -22.103768458274494, 'phedre': -22.103768458274494, 'heed': -22.103768458274494, 'weede': -22.103768458274494}
Sentence #221:
prttecoionist protectionist
{'protectionist': -22.103768458274494}
Sentence #222:
stlocks stocks
{'stocks': -22.103768458274494}
Sentence #223:
includd include
{'included': -22.103768458274494, 'include': -22.103

Sentence #313:
excesvise excessive
{'excessive': -22.103768458274494}
Sentence #314:
cxeess creeps
{'creeks': -22.103768458274494, 'cheers': -22.103768458274494, 'excess': -22.103768458274494, 'creeds': -22.103768458274494, 'caress': -22.103768458274494, 'cheeks': -22.103768458274494, 'chess': -22.103768458274494, 'cheese': -22.103768458274494, 'creeps': -22.103768458274494}
Sentence #315:
blema blest
{'cleva': -22.103768458274494, 'bella': -22.103768458274494, 'edema': -22.103768458274494, 'blum': -22.103768458274494, 'plea': -22.103768458274494, 'lena': -22.103768458274494, 'bleed': -22.103768458274494, 'bleak': -22.103768458274494, 'flem': -22.103768458274494, 'blame': -22.103768458274494, 'lem': -22.103768458274494, 'bleat': -22.103768458274494, 'blimp': -22.103768458274494, 'beman': -22.103768458274494, 'bled': -22.103768458274494, 'buena': -22.103768458274494, 'bea': -22.103768458274494, 'bela': -22.103768458274494, 'beam': -22.103768458274494, 'lemma': -22.103768458274494, 'elen

Sentence #358:
Cujuangco cojuangco
{'cojuangco': -22.103768458274494}
Sentence #359:
wich rich
{'which': -12.627724920565305, 'witch': -10.996308101412428, 'wish': -11.689455281972373, 'with': -22.103768458274494, 'ich': -22.103768458274494, 'bich': -22.103768458274494, 'wick': -22.103768458274494, 'rich': -22.103768458274494}
Sentence #360:
owted owned
{'owed': -22.103768458274494, 'opted': -22.103768458274494, 'owned': -22.103768458274494}
Sentence #361:
amgno amino
{'amino': -22.103768458274494}
Sentence #362:
Neptuna neptunia
{'neptune': -22.103768458274494, 'neptunia': -22.103768458274494}
Sentence #363:
buyeras buyers
{'buyers': -22.103768458274494}
Sentence #364:
sequestedred sequestered
{'sequestered': -22.103768458274494}
Sentence #365:
aojuCngco cojuangco
{'cojuangco': -22.103768458274494}
Sentence #366:
stokc stoic
{'stock': -22.103768458274494, 'stoic': -22.103768458274494}
Sentence #367:
False
Sentence #368:
everythirng everything
{'everything': -22.103768458274494}
Senten

Sentence #433:
Aprl mpl
{'april': -22.103768458274494, 'hurl': -22.103768458274494, 'nrl': -22.103768458274494, 'carl': -22.103768458274494, 'curl': -22.103768458274494, 'girl': -22.103768458274494, 'pal': -22.103768458274494, 'wwrl': -22.103768458274494, 'pry': -22.103768458274494, 'karl': -22.103768458274494, 'pre': -22.103768458274494, 'pro': -22.103768458274494, 'earl': -22.103768458274494, 'burl': -22.103768458274494, 'mpl': -22.103768458274494}
Sentence #434:
prevnted prevented
{'prevented': -22.103768458274494}
Sentence #435:
duspite despite
{'despite': -22.103768458274494}
Sentence #436:
False
Sentence #437:
calledi called
{'called': -22.103768458274494}
Sentence #438:
agenad agenda
{'agenda': -22.103768458274494}
Sentence #439:
proposas proposals
{'proposes': -22.103768458274494, 'proposal': -22.103768458274494, 'proposals': -22.103768458274494}
Sentence #440:
interom interim
{'interim': -22.103768458274494}
Sentence #441:
conplicated complicated
{'complicated': -22.1037684582

Sentence #499:
Europeatn european
{'european': -22.103768458274494}
Sentence #500:
INTERVNTION INTERVNTION
{'INTERVNTION': -22.103768458274494}
Sentence #501:
oBard molard
{'board': -22.103768458274494, 'toward': -22.103768458274494, 'guard': -22.103768458274494, 'ward': -22.103768458274494, 'bard': -22.103768458274494, 'yeard': -22.103768458274494, 'coward': -22.103768458274494, 'hard': -22.103768458274494, 'award': -22.103768458274494, 'oxnard': -22.103768458274494, 'card': -22.103768458274494, 'howard': -22.103768458274494, 'heard': -22.103768458274494, 'beard': -22.103768458274494, 'lard': -22.103768458274494, 'onward': -22.103768458274494, 'yard': -22.103768458274494, 'molard': -22.103768458274494}
Sentence #502:
traeders traders
{'traders': -22.103768458274494}
Sentence #503:
Denmarc denmark
{'denmark': -22.103768458274494}
Sentence #504:
False
Sentence #505:
tradeis trades
{'traders': -22.103768458274494, 'trades': -22.103768458274494}
Sentence #506:
authorisatcions authorisatio

Sentence #591:
Gonia ionic
{'monic': -22.103768458274494, 'conic': -22.103768458274494, 'donna': -22.103768458274494, 'fonta': -22.103768458274494, 'tonic': -22.103768458274494, 'bona': -22.103768458274494, 'konga': -22.103768458274494, 'goria': -22.103768458274494, 'tonio': -22.103768458274494, 'monica': -22.103768458274494, 'nomia': -22.103768458274494, 'mania': -22.103768458274494, 'toni': -22.103768458274494, 'doria': -22.103768458274494, 'xenia': -22.103768458274494, 'sonic': -22.103768458274494, 'ionic': -22.103768458274494}
Sentence #592:
billiono billions
{'billion': -22.103768458274494, 'billions': -22.103768458274494}
Sentence #593:
prrssuees pressures
{'pressures': -22.103768458274494}
Sentence #594:
yesterdy yesterday
{'yesterday': -22.103768458274494}
Sentence #595:
liscdosed disclosed
{'disclosed': -22.103768458274494}
Sentence #596:
profyt profit
{'profet': -22.103768458274494, 'profit': -22.103768458274494}
Sentence #597:
divdend dividend
{'dividend': -22.10376845827449

Sentence #669:
marnetikg marketing
{'marketing': -22.103768458274494}
Sentence #670:
Ivery very
{'avery': -22.103768458274494, 'every': -22.103768458274494, 'very': -22.103768458274494}
Sentence #671:
meetingn meetings
{'meeting': -22.103768458274494, 'meetings': -22.103768458274494}
Sentence #672:
sufply supply
{'supply': -22.103768458274494}
Sentence #673:
Honduas honduras
{'honduras': -22.103768458274494}
Sentence #674:
maximm maximum
{'maxim': -22.103768458274494, 'maximum': -22.103768458274494}
Sentence #675:
proten proton
{'protein': -22.103768458274494, 'proven': -22.103768458274494, 'proton': -22.103768458274494}
Sentence #676:
inclde include
{'include': -22.103768458274494}
Sentence #677:
laydsya laydays
{'laydays': -22.103768458274494}
Sentence #678:
Offsre Offsre
{'Offsre': -22.103768458274494}
Sentence #679:
decisuon decision
{'decision': -22.103768458274494}
Sentence #680:
repiorted reported
{'reported': -22.103768458274494}
Sentence #681:
becausel because
{'because': -22.

Sentence #748:
Canadivan canadian
{'canadian': -22.103768458274494}
Sentence #749:
tadoy toy
{'badly': -22.103768458274494, 'taffy': -22.103768458274494, 'maroy': -22.103768458274494, 'tahoe': -22.103768458274494, 'taboo': -22.103768458274494, 'ado': -22.103768458274494, 'sadly': -22.103768458274494, 'lady': -22.103768458274494, 'tawny': -22.103768458274494, 'tanny': -22.103768458274494, 'tansy': -22.103768458274494, 'today': -22.103768458274494, 'tangy': -22.103768458274494, 'teddy': -22.103768458274494, 'tudor': -22.103768458274494, 'tawdry': -22.103768458274494, 'cady': -22.103768458274494, 'tao': -22.103768458274494, 'tasty': -22.103768458274494, 'tarry': -22.103768458274494, 'daddy': -22.103768458274494, 'tardy': -22.103768458274494, 'troy': -22.103768458274494, 'taos': -22.103768458274494, 'savoy': -22.103768458274494, 'tidy': -22.103768458274494, 'madly': -22.103768458274494, 'tally': -22.103768458274494, 'caddy': -22.103768458274494, 'talky': -22.103768458274494, 'toy': -22.103

Sentence #836:
Insitute institute
{'institute': -22.103768458274494}
Sentence #837:
Natonal national
{'tonal': -22.103768458274494, 'rational': -22.103768458274494, 'national': -22.103768458274494}
Sentence #838:
statementa statement
{'statements': -22.103768458274494, 'statement': -22.103768458274494}
Sentence #839:
statemant statement
{'statement': -22.103768458274494}
Sentence #840:
repeytedla repeatedly
{'repeatedly': -22.103768458274494}
Sentence #841:
ananounced announced
{'announced': -22.103768458274494}
Sentence #842:
avirted averted
{'averted': -22.103768458274494}
Sentence #843:
histodric historic
{'historic': -22.103768458274494}
Sentence #844:
False
Sentence #845:
conetnts contents
{'contents': -22.103768458274494}
Sentence #846:
False
Sentence #847:
False
Sentence #848:
prirce price
{'prince': -22.103768458274494, 'price': -22.103768458274494}
Sentence #849:
cmmittee committee
{'committee': -22.103768458274494}
Sentence #850:
eliminati eliminate
{'eliminate': -22.10376845

Sentence #914:
advnance advance
{'advance': -22.103768458274494}
occupiede occupied
{'occupied': -22.103768458274494}
nraIians iranians
{'arabians': -22.103768458274494, 'iranians': -22.103768458274494}
Sentence #915:
wouded worded
{'wounded': -22.103768458274494, 'wooded': -22.103768458274494, 'worded': -22.103768458274494}
theirf their
{'theirs': -22.103768458274494, 'their': -22.103768458274494}
Sentence #916:
seuthwestorn southwestern
{'southwestern': -22.103768458274494}
druing drying
{'during': -22.103768458274494, 'drying': -22.103768458274494}
Sentence #917:
denmied denied
{'denied': -22.103768458274494}
reporn reborn
{'report': -22.103768458274494, 'reborn': -22.103768458274494}
Sentence #918:
vanal canal
{'banal': -22.103768458274494, 'canal': -22.103768458274494}
tornhern northern
{'northern': -22.103768458274494}
Sentence #919:
Iriqi iraqi
{'iraqi': -22.103768458274494}
unitsx units
{'units': -22.103768458274494}
attmpting attempting
{'attempting': -22.103768458274494}
Sent

Sentence #944:
Anlysts analysts
{'enlists': -22.103768458274494, 'analysts': -22.103768458274494}
vulnerabel vulnerable
{'vulnerable': -22.103768458274494}
atcatk attack
{'attack': -22.103768458274494}
Sentence #945:
adted added
{'acted': -22.103768458274494, 'dated': -22.103768458274494, 'added': -22.103768458274494}
servilce service
{'servile': -22.103768458274494, 'service': -22.103768458274494}
Sentence #946:
eoncerncd concerned
{'concerned': -22.103768458274494}
Sentence #947:
stootd stood
{'stood': -22.103768458274494}
styategr strategy
{'strategy': -22.103768458274494}
boginning beginning
{'beginning': -22.103768458274494}
Sentence #948:
emphasizin emphasizing
{'emphasizing': -22.103768458274494}
cganginh changing
{'changing': -22.103768458274494}
Allegits allegis
{'allegis': -22.103768458274494}
Sentence #949:
takeevor takeover
{'takeover': -22.103768458274494}
eslacated escalated
{'escalated': -22.103768458274494}
valies values
{'yalies': -22.103768458274494, 'varies': -22.103

Sentence #985:
satisfaed satisfied
{'satisfied': -22.103768458274494}
Japanesi japanese
{'japanese': -22.103768458274494}
Wilsonu wilson
{'wilson': -22.103768458274494}
Sentence #986:
Mrnistei Mrnistei
{'Mrnistei': -22.103768458274494}
Baelladur balladur
{'balladur': -22.103768458274494}
Sentence #987:
censral central
{'central': -22.103768458274494}
FRace pace
{'grace': -22.103768458274494, 'trace': -22.103768458274494, 'lace': -22.103768458274494, 'face': -22.103768458274494, 'ace': -22.103768458274494, 'brace': -22.103768458274494, 'space': -22.103768458274494, 'peace': -22.103768458274494, 'place': -22.103768458274494, 'race': -22.103768458274494, 'pace': -22.103768458274494}
returnin returning
{'returning': -22.103768458274494}
Sentence #988:
arounda around
{'around': -22.103768458274494}
Sentence #989:
Thare share
{'hare': -22.103768458274494, 'share': -22.103768458274494}
whwose whose
{'whose': -22.103768458274494}
Chsirtian Chsirtian
{'Chsirtian': -22.103768458274494}
Sentence 
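
Most scores in the output above are identical across candidates, so the printed correction depends on iteration order rather than on the model. The sketch below shows one way the scoring and selection could be arranged so that each candidate gets its own score: lowercase the token before lookup, condition the bigram on the preceding word and the candidate, and pick the maximum with `max`. All data here (`vocab`, `channel`, `unigrams`, `bigrams`) are toy stand-ins, the function names are assumptions, and edit distance 2 is left out to keep the example short.

```python
import math
from collections import Counter

def edits1(word):
    letters = 'abcdefghijklmnopqrstuvwxyz'
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [L + R[1:] for L, R in splits if R]
    transposes = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1]
    replaces = [L + c + R[1:] for L, R in splits if R for c in letters]
    inserts = [L + c + R for L, R in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

# Toy stand-ins (hypothetical) for VOCABS, dict_outer, words_count, pairs_count
vocab = {'which', 'witch', 'rich', 'the', 'one', 'is'}
channel = {'which': {'wich': 0.5}, 'witch': {'wich': 0.1}}
unigrams = Counter({'<s>': 10, 'the': 8, 'one': 5, 'which': 4,
                    'witch': 1, 'rich': 2, 'is': 6})
bigrams = Counter({('one', 'which'): 3, ('the', 'witch'): 1, ('the', 'rich'): 1})
V = len(unigrams)
UNSEEN_CHANNEL = 1e-5   # same fallback constant as in the cell above

def known(ws):
    return {w for w in ws if w in vocab}

def candidates(word):
    # An empty set means "no correction found", reported as False below
    return known([word]) or known(edits1(word)) or set()

def score(prev, wrong, cand):
    """log P(wrong | cand) + log P(cand | prev), add-one smoothed."""
    p_channel = channel.get(cand, {}).get(wrong, UNSEEN_CHANNEL)
    p_lm = (bigrams[(prev, cand)] + 1) / (unigrams[prev] + V)
    return math.log(p_channel) + math.log(p_lm)

def correct(prev, token):
    wrong = token.lower()
    scores = {c: score(prev, wrong, c) for c in candidates(wrong)}
    best = max(scores, key=scores.get) if scores else False
    return best, scores

best, scores = correct('one', 'Wich')
print(best, scores)
print(correct('one', 'xqzv'))
```

In this arrangement the candidate appears in both terms of the score, so the channel probability and the context can each change the ranking, and a token with no dictionary neighbour yields False as the task description asks.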