Using the spaCy English large model (`en_core_web_lg`):

python -m spacy download en_core_web_lg


In [9]:
import spacy
nlp = spacy.load("en_core_web_lg")

In [10]:
w1 = "red"
w2 = "blue"

w1 = nlp.vocab[w1]
w2 = nlp.vocab[w2]
w1.similarity(w2)

0.8438411951065063

In [11]:
w1 = "labor"
w2 = "lorem"

w1 = nlp.vocab[w1]
w2 = nlp.vocab[w2]
w1.similarity(w2)

-0.07253860682249069

In [12]:
w1 = "the"
w2 = "a"

w1 = nlp.vocab[w1]
w2 = nlp.vocab[w2]
w1.similarity(w2)

0.5925824642181396

In [13]:
w1 = "."
w2 = "?"

w1 = nlp.vocab[w1]
w2 = nlp.vocab[w2]
w1.similarity(w2)

0.5152676105499268

In [14]:
s1 = "packages,[3]["
s2 = "parcel"

s1 = nlp(s1)
s2 = nlp(s2)
s1.similarity(s2)

0.010776557959616184

In [15]:
s1 = "Sigmund Freud"
s2 = "Psychology"

s1 = nlp(s1)
s2 = nlp(s2)
s1.similarity(s2)

0.3366757333278656

In [121]:
s1 = "Lorem ipsum"
s2 = "Placeholder"

s1 = nlp(s1)
s2 = nlp(s2)
s1.similarity(s2)

0.3752337098121643
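
For context, spaCy's `similarity` methods default to the cosine similarity of the underlying word vectors (a `Doc` uses the average of its token vectors). Below is a minimal sketch of that calculation on made-up 3-dimensional vectors. The vectors and the `cosine_similarity` helper name are illustrative assumptions, not values taken from `en_core_web_lg`:

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a zero vector has no direction; treat as "no similarity"
    return dot / (norm_u * norm_v)

# Toy vectors (hypothetical, for illustration only)
red = [0.9, 0.1, 0.3]
blue = [0.8, 0.2, 0.4]
lorem = [-0.2, 0.9, -0.1]

print(cosine_similarity(red, blue))   # close to 1: vectors point the same way
print(cosine_similarity(red, lorem))  # negative: vectors point apart
```

This also explains why a score such as the one for "labor" vs. "lorem" can be negative: cosine similarity ranges from -1 to 1.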

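Some word pairs are too different for vector similarity to be useful (for example, "labor" and "lorem" score below zero). If the semantic similarity comes out negative, we will fall back to string similarity using the rapidfuzz library: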

pip install rapidfuzz



In [17]:
from rapidfuzz import fuzz

In [18]:
w1 = "labor"
w2 = "lorem"

print(fuzz.ratio(w1, w2) / 100)

0.6


In [19]:
w1 = "Sigmund Freud"
w2 = "Psychology"

print(fuzz.ratio(w1, w2) / 100)

0.08695652173913049
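
The fallback idea can be sketched as follows. `fuzz.ratio` is a normalized Indel similarity, which for two strings works out to `2 * LCS / (len(a) + len(b))`, where LCS is the length of their longest common subsequence. The pure-Python `indel_ratio` below is a stand-in so the sketch runs without extra dependencies; it agrees with the two values shown above up to floating-point rounding. The `similarity_with_fallback` helper, and the choice to fall back whenever the semantic score is not positive, are assumptions about the intended rule rather than a settled design:

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence, via row-by-row DP."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def indel_ratio(a: str, b: str) -> float:
    """Normalized Indel similarity in [0, 1] (stand-in for fuzz.ratio / 100)."""
    total = len(a) + len(b)
    if total == 0:
        return 1.0  # two empty strings are identical
    return 2 * lcs_length(a, b) / total

def similarity_with_fallback(semantic_score: float, a: str, b: str) -> float:
    """Assumed rule: keep a positive semantic score, else use string similarity."""
    if semantic_score > 0:
        return semantic_score
    return indel_ratio(a, b)

print(indel_ratio("labor", "lorem"))
print(similarity_with_fallback(-0.07253860682249069, "labor", "lorem"))
```

Note that the two scores live on different scales (cosine similarity vs. edit-based ratio), so any thresholds tuned for one may need adjusting for the other.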


## Designing a scrambling and unscrambling procedure

In [20]:
maintext = '''
Lorem ipsum (/ˌlɔː.rəm ˈɪp.səm/ LOR-əm IP-səm) is a dummy or placeholder text commonly used in graphic design, publishing, and web development. Its purpose is to permit a page layout to be designed, independently of the copy that will subsequently populate it, or to demonstrate various fonts of a typeface without meaningful text that could be distracting. \n

Lorem ipsum is typically a corrupted version of De finibus bonorum et malorum, a 1st-century BC text by the Roman statesman and philosopher Cicero, with words altered, added, and removed to make it nonsensical and improper Latin. The first two words themselves are a truncation of dolorem ipsum ("pain itself"). \n

Versions of the Lorem ipsum text have been used in typesetting at least since the 1960s, when it was popularized by advertisements for Letraset transfer sheets.[1] Lorem ipsum was introduced to the digital world in the mid-1980s, when Aldus employed it in graphic and word-processing templates for its desktop publishing program PageMaker. Other popular word processors, including Pages and Microsoft Word, have since adopted Lorem ipsum,[2] as have many LaTeX packages,[3][4][5] web content managers such as Joomla! and WordPress, and CSS libraries such as Semantic UI.
'''

header = '''
A common form of Lorem ipsum reads:
'''

headertext = '''
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
'''

captions = ["Using Lorem ipsum to focus attention on graphic elements in a webpage design proposal"]

In [21]:
# TOKENIZATION: breaking down text into individual words
# Process the text using spaCy
doc = nlp(maintext)

# Extract words (tokens)
words = [token.text for token in doc]

# Extract lemmas
lemmas = [token.lemma_ for token in doc]

# Extract part-of-speech tags: https://stackoverflow.com/questions/40288323/what-do-spacys-part-of-speech-and-dependency-tags-mean
tags = [token.pos_ for token in doc]

# Print the results
print("Words:", words)
print("Lemmas:", lemmas)
print("Tags:", tags)

Words: ['\n', 'Lorem', 'ipsum', '(', '/ˌlɔː.rəm', 'ˈɪp.səm/', 'LOR', '-', 'əm', 'IP', '-', 'səm', ')', 'is', 'a', 'dummy', 'or', 'placeholder', 'text', 'commonly', 'used', 'in', 'graphic', 'design', ',', 'publishing', ',', 'and', 'web', 'development', '.', 'Its', 'purpose', 'is', 'to', 'permit', 'a', 'page', 'layout', 'to', 'be', 'designed', ',', 'independently', 'of', 'the', 'copy', 'that', 'will', 'subsequently', 'populate', 'it', ',', 'or', 'to', 'demonstrate', 'various', 'fonts', 'of', 'a', 'typeface', 'without', 'meaningful', 'text', 'that', 'could', 'be', 'distracting', '.', '\n\n\n    ', 'Lorem', 'ipsum', 'is', 'typically', 'a', 'corrupted', 'version', 'of', 'De', 'finibus', 'bonorum', 'et', 'malorum', ',', 'a', '1st', '-', 'century', 'BC', 'text', 'by', 'the', 'Roman', 'statesman', 'and', 'philosopher', 'Cicero', ',', 'with', 'words', 'altered', ',', 'added', ',', 'and', 'removed', 'to', 'make', 'it', 'nonsensical', 'and', 'improper', 'Latin', '.', 'The', 'first', 'two', 'words

In [22]:
for token in doc:
    print(token.text + " | " + token.lemma_ + " | " + token.pos_)


 | 
 | SPACE
Lorem | Lorem | PROPN
ipsum | ipsum | NOUN
( | ( | PUNCT
/ˌlɔː.rəm | /ˌlɔː.rəm | PUNCT
ˈɪp.səm/ | ˈɪp.səm/ | DET
LOR | LOR | PROPN
- | - | PUNCT
əm | əm | PROPN
IP | IP | PROPN
- | - | PUNCT
səm | səm | NOUN
) | ) | PUNCT
is | be | AUX
a | a | DET
dummy | dummy | ADJ
or | or | CCONJ
placeholder | placeholder | NOUN
text | text | NOUN
commonly | commonly | ADV
used | use | VERB
in | in | ADP
graphic | graphic | ADJ
design | design | NOUN
, | , | PUNCT
publishing | publishing | NOUN
, | , | PUNCT
and | and | CCONJ
web | web | NOUN
development | development | NOUN
. | . | PUNCT
Its | its | PRON
purpose | purpose | NOUN
is | be | AUX
to | to | PART
permit | permit | VERB
a | a | DET
page | page | NOUN
layout | layout | NOUN
to | to | PART
be | be | AUX
designed | design | VERB
, | , | PUNCT
independently | independently | ADV
of | of | ADP
the | the | DET
copy | copy | NOUN
that | that | PRON
will | will | AUX
subsequently | subsequently | ADV
populate | populate | VERB
it | 

In [23]:
# Convert tokens back into text
temp = ""

for token in doc:
    if token.pos_ == "SPACE" or token.pos_ == "PUNCT":
        temp += token.text

    else:
        temp += " " + token.text

print(temp)


 Lorem ipsum(/ˌlɔː.rəm ˈɪp.səm/ LOR- əm IP- səm) is a dummy or placeholder text commonly used in graphic design, publishing, and web development. Its purpose is to permit a page layout to be designed, independently of the copy that will subsequently populate it, or to demonstrate various fonts of a typeface without meaningful text that could be distracting.


     Lorem ipsum is typically a corrupted version of De finibus bonorum et malorum, a 1st- century BC text by the Roman statesman and philosopher Cicero, with words altered, added, and removed to make it nonsensical and improper Latin. The first two words themselves are a truncation of dolorem ipsum(" pain itself").


     Versions of the Lorem ipsum text have been used in typesetting at least since the 1960s, when it was popularized by advertisements for Letraset transfer sheets.[1 ] Lorem ipsum was introduced to the digital world in the mid-1980s, when Aldus employed it in graphic and word- processing templates for its desktop 
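
The rebuilt text above has spacing artifacts such as `ipsum(` and `LOR- əm`, because the rule guesses spacing from the part-of-speech tag. spaCy tokens also carry their original trailing whitespace (`token.whitespace_`, `token.text_with_ws`), so joining `token.text_with_ws` over a `Doc` should give back the input text. The sketch below shows the same idea without loading a model: a regex stand-in (an assumption, and much cruder than spaCy's tokenizer) splits text into `(text, trailing_whitespace)` pairs, and a hypothetical `render` helper substitutes words while keeping the recorded whitespace:

```python
import re

# One token (a run of word characters, or a single symbol) plus the
# whitespace that follows it.
TOKEN_RE = re.compile(r"(\w+|[^\w\s])(\s*)")

def tokenize_with_ws(text):
    """Split text into (token, trailing_whitespace) pairs, losing nothing."""
    lead = re.match(r"\s*", text).group(0)
    pairs = [("", lead)] if lead else []  # keep whitespace before the first token
    pairs.extend((m.group(1), m.group(2)) for m in TOKEN_RE.finditer(text))
    return pairs

def render(pairs, replacements=None):
    """Rebuild the text, swapping in replacements where a token has one."""
    replacements = replacements or {}
    return "".join(replacements.get(tok, tok) + ws for tok, ws in pairs)

sample = '\nLorem ipsum ("pain itself") is a 1st-century text.[1]\n'
pairs = tokenize_with_ws(sample)

print(render(pairs) == sample)  # round trip preserves the spacing
print(render(pairs, {"Lorem": "I5wvX", "ipsum": "Rn0yf"}))
```

Because spacing is stored rather than inferred, the scrambled rendering keeps parentheses, hyphens, and citation brackets exactly where they were.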

Current Plan:
- Convert the tokenized text into a dictionary mapping each word to its scrambled version
- For each guess the user makes, unscramble the words that are semantically closest to it (and update the dictionary)
- Reconstruct the scrambled text by iterating over the tokens and looking each one up in the dictionary
- The dictionary then represents the user's "state" of unscrambling the text
- We can also precompute the scrambled dictionary so that all users get the same starting scrambled text

Small Details:
- We could use the set of all characters that appear in the article as the pool of random letters for scrambling the text
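
The plan above can be sketched in plain Python, independent of spaCy. The helper names `build_letter_bag` and `scramble_words`, the sample word list, and the seed value are hypothetical. Seeding a dedicated `random.Random` is one possible way to give every user the same starting scramble, not necessarily the approach the game will end up using:

```python
import random

def build_letter_bag(text, exclude):
    """Characters available for scrambling, in a stable order."""
    # Sorting matters: iteration order of a set of strings can differ
    # between interpreter runs, which would break reproducibility.
    return sorted(set(text) - set(exclude))

def scramble_words(words, letter_bag, seed):
    """Map each distinct word to random letters of the same length."""
    rng = random.Random(seed)  # private generator; global state is untouched
    state = {}
    for word in words:
        if word not in state:
            state[word] = "".join(rng.choice(letter_bag) for _ in word)
    return state

words = ["Lorem", "ipsum", "is", "a", "dummy", "text", "is"]
bag = build_letter_bag("Lorem ipsum is a dummy text", exclude=" ")

state_a = scramble_words(words, bag, seed=2024)
state_b = scramble_words(words, bag, seed=2024)
print(state_a == state_b)  # same seed, same starting scramble
```

Each user's session can then start from a copy of the shared dictionary and diverge as guesses unscramble entries.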

In [120]:
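game_state = {}
punct_thresh = 2  # PUNCT tokens longer than this are tracked like words

for token in doc:
    if token.pos_ == "SPACE":
        # Whitespace is never scrambled
        continue

    if token.pos_ == "PUNCT" and len(token.text) <= punct_thresh:
        # Short punctuation such as "," or "(" always stays visible
        continue

    # Every remaining token starts out mapped to itself
    game_state[token.text] = token.text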

game_state

{'Lorem': 'Lorem',
 'ipsum': 'ipsum',
 '/ˌlɔː.rəm': '/ˌlɔː.rəm',
 'ˈɪp.səm/': 'ˈɪp.səm/',
 'LOR': 'LOR',
 'əm': 'əm',
 'IP': 'IP',
 'səm': 'səm',
 'is': 'is',
 'a': 'a',
 'dummy': 'dummy',
 'or': 'or',
 'placeholder': 'placeholder',
 'text': 'text',
 'commonly': 'commonly',
 'used': 'used',
 'in': 'in',
 'graphic': 'graphic',
 'design': 'design',
 'publishing': 'publishing',
 'and': 'and',
 'web': 'web',
 'development': 'development',
 'Its': 'Its',
 'purpose': 'purpose',
 'to': 'to',
 'permit': 'permit',
 'page': 'page',
 'layout': 'layout',
 'be': 'be',
 'designed': 'designed',
 'independently': 'independently',
 'of': 'of',
 'the': 'the',
 'copy': 'copy',
 'that': 'that',
 'will': 'will',
 'subsequently': 'subsequently',
 'populate': 'populate',
 'it': 'it',
 'demonstrate': 'demonstrate',
 'various': 'various',
 'fonts': 'fonts',
 'typeface': 'typeface',
 'without': 'without',
 'meaningful': 'meaningful',
 'could': 'could',
 'distracting': 'distracting',
 'typically': 'typically',
 

In [60]:
letter_bag = set(maintext + header + headertext)
letter_exclude = {'\n', '\t', '\"', '\'', '.', ',', '(', ')', '[', ']', '{', '}', '\\', '/', ' ', '*', '!', '?', ':', ' '}
letter_bag = letter_bag.difference(letter_exclude)
letter_bag = list(letter_bag)
letter_bag

['v',
 'ə',
 'd',
 'x',
 'r',
 'm',
 'i',
 '3',
 'ˈ',
 'f',
 'n',
 'J',
 'e',
 '6',
 'y',
 'T',
 'P',
 'q',
 'B',
 '1',
 'D',
 '2',
 'o',
 'k',
 'A',
 'M',
 's',
 'p',
 'C',
 'ː',
 '9',
 '8',
 'l',
 'g',
 '0',
 'O',
 'R',
 'ɪ',
 'S',
 't',
 'X',
 '4',
 'W',
 'E',
 'c',
 'h',
 'ˌ',
 'ɔ',
 'u',
 'a',
 '5',
 'w',
 '-',
 'V',
 'I',
 'z',
 'b',
 'L',
 'U']

In [117]:
import random

# INITIAL RANDOMIZED STATE
def init_random(game_state: dict, letter_bag: list):
    """Overwrite every value with random letters of the same length as its key."""
    for key in game_state:
        game_state[key] = "".join(random.choice(letter_bag) for _ in key)

# RENDER GAME STATE AS ARTICLE
def render_state(doc: spacy.tokens.Doc, game_state: dict):
    """Rebuild the article text, substituting each token's current state."""
    text = ""

    for token in doc:
        # Only PUNCT tokens are appended without a leading space. SPACE tokens
        # are not keys in game_state, so they are emitted as " " + token.text.
        if token.pos_ == "PUNCT":
            text += token.text
        else:
            # Tokens that are not tracked in game_state are shown as-is
            text += " " + game_state.get(token.text, token.text)

    return text

# UPDATE A GAME STATE BASED ON GUESS AND RETURN TITLE SIMILARITY
def guess_update(game_state: dict, guess: str, title: str):
    """Unscramble words similar to the guess; return the guess/title similarity."""
    guess_spacy = nlp(guess)
    title_spacy = nlp(title)
    title_sim = guess_spacy.similarity(title_spacy)

    # The closer the guess is to the title, the lower both thresholds drop,
    # so a near-correct guess reveals most of the article.
    thresh_full = 0.55 - title_sim * 1.2
    thresh_partial = 0.40 - title_sim * 1.2

    print(f"thresh_full: {thresh_full}, thresh_partial: {thresh_partial}")

    for key in game_state:
        key_spacy = nlp(key)
        sim = guess_spacy.similarity(key_spacy)

        if sim >= thresh_full:
            print("Full Unscrambling " + key)
            game_state[key] = key

        elif sim >= thresh_partial:
            print("Partial Unscrambling " + key)
            # Reveal the true letter at a random half of the positions
            chars = list(game_state[key])
            for i in random.sample(range(len(key)), len(key) // 2):
                chars[i] = key[i]
            game_state[key] = "".join(chars)

    return title_sim
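
To see how the two pieces of `guess_update` behave in isolation, here is a small sketch that needs no language model. The helper names `thresholds` and `reveal_half` are hypothetical; the default constants are the ones used in `guess_update` above, and the scrambled string is just an example value:

```python
import random

def thresholds(title_sim, full_base=0.55, partial_base=0.40, slope=1.2):
    """Both cut-offs shift down by the same amount as title_sim grows."""
    shift = title_sim * slope
    return full_base - shift, partial_base - shift

def reveal_half(original, scrambled, rng):
    """Copy the true letter into a random half of the positions."""
    chars = list(scrambled)
    for i in rng.sample(range(len(original)), len(original) // 2):
        chars[i] = original[i]
    return "".join(chars)

for title_sim in (0.0, 0.5, 0.9482095837593079):
    full, partial = thresholds(title_sim)
    print(f"title_sim={title_sim:.2f} -> full={full:.3f}, partial={partial:.3f}")

rng = random.Random(0)  # seeded only so the demo is repeatable
print(reveal_half("publishing", "ElxnSrtX3D", rng))
```

With these constants, a guess whose title similarity is above roughly 0.46 pushes the full threshold below zero, so most words with a non-negative similarity to the guess are fully unscrambled. One-letter words are never partially revealed, since `len(key) // 2` is 0 for them.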
    

        
    


In [34]:
game_state

{'Lorem': 'Lorem',
 'ipsum': 'ipsum',
 'ˈɪp.səm/': 'ˈɪp.səm/',
 'LOR': 'LOR',
 'əm': 'əm',
 'IP': 'IP',
 'səm': 'səm',
 'is': 'is',
 'a': 'a',
 'dummy': 'dummy',
 'or': 'or',
 'placeholder': 'placeholder',
 'text': 'text',
 'commonly': 'commonly',
 'used': 'used',
 'in': 'in',
 'graphic': 'graphic',
 'design': 'design',
 'publishing': 'publishing',
 'and': 'and',
 'web': 'web',
 'development': 'development',
 'Its': 'Its',
 'purpose': 'purpose',
 'to': 'to',
 'permit': 'permit',
 'page': 'page',
 'layout': 'layout',
 'be': 'be',
 'designed': 'designed',
 'independently': 'independently',
 'of': 'of',
 'the': 'the',
 'copy': 'copy',
 'that': 'that',
 'will': 'will',
 'subsequently': 'subsequently',
 'populate': 'populate',
 'it': 'it',
 'demonstrate': 'demonstrate',
 'various': 'various',
 'fonts': 'fonts',
 'typeface': 'typeface',
 'without': 'without',
 'meaningful': 'meaningful',
 'could': 'could',
 'distracting': 'distracting',
 'typically': 'typically',
 'corrupted': 'corrupted',
 

In [105]:
init_random(game_state, letter_bag)
game_state

{'Lorem': 'I5wvX',
 'ipsum': 'Rn0yf',
 'ˈɪp.səm/': 'l6Ut6zxS',
 'LOR': 'hdM',
 'əm': 'R8',
 'IP': 'L3',
 'səm': 'L55',
 'is': 'Rq',
 'a': 'm',
 'dummy': 'IvJRə',
 'or': 'l8',
 'placeholder': 'tJnl3V50zA2',
 'text': 'OPz9',
 'commonly': 'pcRviBoo',
 'used': '9lTs',
 'in': 'nW',
 'graphic': 'hSw4-JX',
 'design': 'AcVvrU',
 'publishing': 'ElxnSrtX3D',
 'and': 'hWe',
 'web': 'k2B',
 'development': 'WAuvAˈ0ːfeu',
 'Its': 'Iqˌ',
 'purpose': 'ObˌIbr6',
 'to': '4D',
 'permit': 'k4rW02',
 'page': 'lS55',
 'layout': 'aVsTːO',
 'be': '9E',
 'designed': '4xsS9tiM',
 'independently': '0OvgbafksAmD-',
 'of': 'oP',
 'the': 'v8O',
 'copy': 'p4yd',
 'that': 'w3ˈf',
 'will': 'T-Ee',
 'subsequently': 'Omg12cCWfhO8',
 'populate': 'tvRcUɪWe',
 'it': 'A0',
 'demonstrate': '58MyWs48AXB',
 'various': 'VDhurPM',
 'fonts': '0VsmS',
 'typeface': 'woəCP61ɔ',
 'without': 'qPngˈːV',
 'meaningful': 'apJ-tkAɔMS',
 'could': 'WWAab',
 'distracting': 'W4ɔzetrRDvˈ',
 'typically': 'zXn5lakvk',
 'corrupted': '9yɪoh8ALq',
 

In [106]:
render_state(doc, game_state)

' \n I5wvX Rn0yf(/ˌlɔː.rəm l6Ut6zxS hdM- R8 L3- L55) Rq m IvJRə l8 tJnl3V50zA2 OPz9 pcRviBoo 9lTs nW hSw4-JX AcVvrU, ElxnSrtX3D, hWe k2B WAuvAˈ0ːfeu. Iqˌ ObˌIbr6 Rq 4D k4rW02 m lS55 aVsTːO 4D 9E 4xsS9tiM, 0OvgbafksAmD- oP v8O p4yd w3ˈf T-Ee Omg12cCWfhO8 tvRcUɪWe A0, l8 4D 58MyWs48AXB VDhurPM 0VsmS oP m woəCP61ɔ qPngˈːV apJ-tkAɔMS OPz9 w3ˈf WWAab 9E W4ɔzetrRDvˈ. \n\n\n     I5wvX Rn0yf Rq zXn5lakvk m 9yɪoh8ALq tIC63hm oP ɪi T84gˈuɪ ˌgMT3b9 əJ wOcWDTS, m OAm- T5MP0Lm SD OPz9 bh v8O psbEf XgyumWoɪg hWe yJoɪDiX9StV ti8o4n, 9MRq Jv1tc h4ə9ˌ9x, z8sEq, hWe tɔ-isai 4D cːoe A0 ɪAbˈtxˌAːd- hWe Tt3zRSoː -w3Ub. JuX niyf4 hTw Jv1tc -Byˌ-wms4n rts m Vfsdn8dvM6 oP 1ˌCdRg- Rn0yf(" BSUC Joc4S8"). \n\n\n     xio6tvrE oP v8O I5wvX Rn0yf OPz9 5Tl3 eSuP 9lTs nW Md352ɪyRtAz 9f duɪEB khLnˌ v8O 6Tmt0, lAgC A0 aoA VM69L-ːXtiɪ bh BSMBnb8ɪ69JRgV 8xE rT9g5WMy sTːSz5Bg VPyəy91gI m I5wvX Rn0yf aoA p08əɪpnJg5 4D v8O A5RvSTa ˈ8cx9 nW v8O 38geupoqˌ, lAgC ySUˈR 6Okskh3V A0 nW hSw4-JX hWe q2AW- vsːgˌːmhs4 -nɪVwW2R6 8xE v

In [109]:
guess_update(game_state, "internet", "Lorem Ipsum")

thresh_full: 0.5811834110878408, thresh_partial: 0.3811834110878408
Partial Unscrambling IP
Partial Unscrambling publishing
Full Unscrambling web
Partial Unscrambling without



Partial Unscrambling advertisements
Partial Unscrambling digital
Partial Unscrambling world
Partial Unscrambling popular
Partial Unscrambling Microsoft
Partial Unscrambling content
Partial Unscrambling WordPress


0.012544392608106136

In [110]:
render_state(doc, game_state)

' \n I5wvX Rn0yf(/ˌlɔː.rəm l6Ut6zxS hdM- R8 IP- L55) Rq m IvJRə l8 tJnl3V50zA2 OPz9 pcRviBoo 9lTs nW hSw4-JX AcVvrU, EublirtX3g, hWe web WAuvAˈ0ːfeu. Iqˌ ObˌIbr6 Rq 4D k4rW02 m lS55 aVsTːO 4D 9E 4xsS9tiM, 0OvgbafksAmD- oP v8O p4yd w3ˈf T-Ee Omg12cCWfhO8 tvRcUɪWe A0, l8 4D 58MyWs48AXB VDhurPM 0VsmS oP m woəCP61ɔ wPngouV apJ-tkAɔMS OPz9 w3ˈf WWAab 9E W4ɔzetrRDvˈ. \n\n\n     I5wvX Rn0yf Rq zXn5lakvk m 9yɪoh8ALq tIC63hm oP ɪi T84gˈuɪ ˌgMT3b9 əJ wOcWDTS, m OAm- T5MP0Lm SD OPz9 bh v8O psbEf XgyumWoɪg hWe yJoɪDiX9StV ti8o4n, 9MRq Jv1tc h4ə9ˌ9x, z8sEq, hWe tɔ-isai 4D cːoe A0 ɪAbˈtxˌAːd- hWe Tt3zRSoː -w3Ub. JuX niyf4 hTw Jv1tc -Byˌ-wms4n rts m Vfsdn8dvM6 oP 1ˌCdRg- Rn0yf(" BSUC Joc4S8"). \n\n\n     xio6tvrE oP v8O I5wvX Rn0yf OPz9 5Tl3 eSuP 9lTs nW Md352ɪyRtAz 9f duɪEB khLnˌ v8O 6Tmt0, lAgC A0 aoA VM69L-ːXtiɪ bh advBnt8s6mJngV 8xE rT9g5WMy sTːSz5Bg VPyəy91gI m I5wvX Rn0yf aoA p08əɪpnJg5 4D v8O d5gitTl ˈ8rl9 nW v8O 38geupoqˌ, lAgC ySUˈR 6Okskh3V A0 nW hSw4-JX hWe q2AW- vsːgˌːmhs4 -nɪVwW2R6 8xE v

In [114]:
guess_update(game_state, "Latin", "Lorem Ipsum")

thresh_full: 0.260327260196209, thresh_partial: 0.11032726019620898
Partial Unscrambling Lorem
Partial Unscrambling ipsum
Partial Unscrambling IP
Partial Unscrambling is
Partial Unscrambling a
Partial Unscrambling dummy
Partial Unscrambling or
Partial Unscrambling text
Partial Unscrambling commonly
Partial Unscrambling used
Partial Unscrambling in
Partial Unscrambling graphic
Partial Unscrambling publishing
Partial Unscrambling and
Partial Unscrambling web
Partial Unscrambling Its
Partial Unscrambling layout
Partial Unscrambling that
Partial Unscrambling populate
Partial Unscrambling it
Partial Unscrambling various
Partial Unscrambling fonts
Partial Unscrambling typeface
Partial Unscrambling without
Partial Unscrambling meaningful
Partial Unscrambling could
Partial Unscrambling corrupted
Partial Unscrambling version
Partial Unscrambling De
Partial Unscrambling century
Partial Unscrambling BC
Full Unscrambling Roman
Partial Unscrambling philosopher
Partial Unscrambling Cicero
Partial Un


Full Unscrambling Latin
Partial Unscrambling first
Partial Unscrambling two
Partial Unscrambling themselves
Partial Unscrambling are
Partial Unscrambling pain
Partial Unscrambling Versions
Partial Unscrambling have
Partial Unscrambling typesetting
Partial Unscrambling when
Partial Unscrambling popularized
Partial Unscrambling digital
Partial Unscrambling world
Full Unscrambling word
Partial Unscrambling templates
Partial Unscrambling its
Partial Unscrambling Other
Partial Unscrambling popular
Partial Unscrambling Pages
Partial Unscrambling Microsoft
Full Unscrambling Word
Partial Unscrambling as
Partial Unscrambling many
Full Unscrambling LaTeX
Partial Unscrambling content
Partial Unscrambling Joomla
Partial Unscrambling WordPress
Partial Unscrambling CSS


0.19311515986919403

In [115]:
render_state(doc, game_state)

' \n LoweX ip0ym(/ˌlɔː.rəm l6Ut6zxS hdM- R8 IP- L55) is m IumRy or tJnl3V50zA2 Oext coRvoBly uled iW gra4-Jc AcVvrU, EublistXng, ane web WAuvAˈ0ːfeu. Itˌ ObˌIbr6 is 4D k4rW02 m lS55 layout 4D 9E 4xsS9tiM, 0OvgbafksAmD- oP v8O p4yd t3ˈt T-Ee Omg12cCWfhO8 populɪte it, or 4D 58MyWs48AXB vDruous fonms oP m topef6ce winhouV aeaningɔMl Oext t3ˈt Wouad 9E W4ɔzetrRDvˈ. \n\n\n     LoweX ip0ym is zXn5lakvk m coɪrupALd vIrsion oP De T84gˈuɪ ˌgMT3b9 əJ wOcWDTS, m OAm- Tent0Ly SC Oext bh v8O Roman XgyumWoɪg ane yJoloiopher Cicoro, wiRh Jvrds h4ə9ˌ9x, z8sEq, ane tɔ-isai 4D cake it nAnsensicd- ane Tt3zRSoː Latin. JuX niyst tww Jvrds -heˌselv4s ats m Vfsdn8dvM6 oP 1ˌCdRg- ip0ym(" pSiC Joc4S8"). \n\n\n     ViosionE oP v8O LoweX ip0ym Oext hale eSuP uled iW MdpesettiAg 9f duɪEB khLnˌ v8O 6Tmt0, wAgn it aoA pMpul-rXzed bh advBnt8s6mJngV 8xE rT9g5WMy sTːSz5Bg VPyəy91gI m LoweX ip0ym aoA p08əɪpnJg5 4D v8O digitTl world iW v8O 38geupoqˌ, wAgn ySUˈR 6Okskh3V it iW gra4-Jc ane word- vsːgˌːmhs4 -emVlatR6 8xE v

In [119]:
guess_update(game_state, "Lorem", "Lorem Ipsum")

thresh_full: -0.5878515005111693, thresh_partial: -0.7378515005111693
Full Unscrambling Lorem
Full Unscrambling ipsum
Full Unscrambling ˈɪp.səm/
Full Unscrambling LOR
Full Unscrambling əm
Full Unscrambling IP
Full Unscrambling səm
Full Unscrambling is
Full Unscrambling a
Full Unscrambling dummy
Full Unscrambling or
Full Unscrambling placeholder
Full Unscrambling text
Full Unscrambling commonly
Full Unscrambling used
Full Unscrambling in
Full Unscrambling graphic
Full Unscrambling design
Full Unscrambling publishing
Full Unscrambling and
Full Unscrambling web
Full Unscrambling development
Full Unscrambling Its
Full Unscrambling purpose
Full Unscrambling to
Full Unscrambling permit
Full Unscrambling page
Full Unscrambling layout
Full Unscrambling be
Full Unscrambling designed
Full Unscrambling independently
Full Unscrambling of
Full Unscrambling the
Full Unscrambling copy
Full Unscrambling that
Full Unscrambling will
Full Unscrambling subsequently
Full Unscrambling populate
Full Unscramb


Full Unscrambling removed
Full Unscrambling make
Full Unscrambling nonsensical
Full Unscrambling improper
Full Unscrambling Latin
Full Unscrambling The
Full Unscrambling first
Full Unscrambling two
Full Unscrambling themselves
Full Unscrambling are
Full Unscrambling truncation
Full Unscrambling dolorem
Full Unscrambling pain
Full Unscrambling itself
Full Unscrambling Versions
Full Unscrambling have
Full Unscrambling been
Full Unscrambling typesetting
Full Unscrambling at
Full Unscrambling least
Full Unscrambling since
Full Unscrambling 1960s
Full Unscrambling when
Full Unscrambling was
Full Unscrambling popularized
Full Unscrambling advertisements
Full Unscrambling for
Full Unscrambling Letraset
Full Unscrambling transfer
Full Unscrambling sheets.[1
Full Unscrambling ]
Full Unscrambling introduced
Full Unscrambling digital
Full Unscrambling world
Full Unscrambling mid-1980s
Full Unscrambling Aldus
Full Unscrambling employed
Full Unscrambling word
Full Unscrambling processing
Full Unscr

0.9482095837593079

In [118]:
render_state(doc, game_state)

' \n Lorem ipsum(/ˌlɔː.rəm ˈɪp.səm/ LOR- əm IP- səm) is a dummy or placeholder text commonly used in graphic design, publishing, and web development. Its purpose is to permit a page layout to be designed, independently of the copy that will subsequently populate it, or to demonstrate various fonts of a typeface without meaningful text that could be distracting. \n\n\n     Lorem ipsum is typically a corrupted version of De finibus bonorum et malorum, a 1st- century BC text by the Roman statesman and philosopher Cicero, with words altered, added, and removed to make it nonsensical and improper Latin. The first two words themselves are a truncation of dolorem ipsum(" pain itself"). \n\n\n     Versions of the Lorem ipsum text have been used in typesetting at least since the 1960s, when it was popularized by advertisements for Letraset transfer sheets.[1 ] Lorem ipsum was introduced to the digital world in the mid-1980s, when Aldus employed it in graphic and word- processing templates for i