## Process the 2018 Scrabble dictionary:

- read in the 2016 and 2018 dictionaries
- build lists of the new 2's, 3's, and JXQZ words
- possible enhancements:
  - make into HTML tables
  - format the same as the [cheat sheet](https://www.cross-tables.com/download/CHEAT_PRO_RED_2014.pdf)
  - 2's to 3's with bold for new ones
  - JXQZ with bold for new ones
  - make into a little web app for doing challenges
  - scrape the pixiepit games for the most common words

Here's a nice little reference [source](http://sdsawtelle.github.io/blog/output/scrabble-cheatsheet-with-python.html) with code for some of the harder tasks.

In [1]:
# import libraries
import re
from collections import defaultdict

In [2]:
file18     = 'twl2018.txt'
file18_new = 'twl2018-new.txt'
with open(file18) as f:
    twl18 = f.read().splitlines()
with open(file18_new) as f:
    twl18_new = f.read().splitlines()
    

In [3]:
len(twl18_new)

3385

In [35]:
# Build a set of all 2018 words in which the new-for-2018 words are wrapped in
# HTML tags so that they render in red. (Cross-checking twl18_new against a
# 2016 list is left for the commented-out cell further down.)
set18 = set(twl18)
set18_new = set(twl18_new)

def remove_html_color(x):
    """Strip any HTML tags from x (re is imported in the first cell)."""
    return re.sub(r"<.*?>", "", x)

def add_html_color(x, col='red'):
    """Wrap x in a <font> tag of the given color."""
    return '<font color="' + col + '">' + x + '</font>'

# start from the words that are not new, then add the new ones with markup
set18_html = set18.difference(set18_new)
for item in set18_new:
    set18_html.add(add_html_color(item, col='red'))

set18_html
# to strip the HTML markup again:
# for item in set18_html:
#     print(remove_html_color(item))

{'hutzpas',
 '<font color="red">garburator</font>',
 'scandalizes',
 'clubheads',
 'potty',
 'contemporaneity',
 'monatomic',
 'glycosuria',
 'transpirations',
 'suiters',
 'vacations',
 'informatics',
 'profaneness',
 'brokings',
 'ranker',
 'overplot',
 'attornments',
 'momenta',
 'ovoidal',
 'whetstones',
 'aortal',
 'gladstone',
 'caddie',
 'letterings',
 'detailednesses',
 'lucubrate',
 'toyshop',
 'globulin',
 'trackless',
 'presterilizes',
 'cadaverines',
 'superscriptions',
 'unbosoming',
 'courses',
 'supersaturate',
 'eased',
 'wontednesses',
 'comorbidities',
 'acquiescing',
 'goodbye',
 'anthropophagus',
 'snufflier',
 'tercet',
 'perigean',
 'unpotted',
 'misadjust',
 'ceaselessnesses',
 'expounding',
 'wispy',
 'apartness',
 'misconnected',
 'berkeliums',
 'transmogrifying',
 'barkeepers',
 'willywaw',
 'adoringly',
 'unessential',
 'purposively',
 'naturally',
 'stompiest',
 'mittimus',
 'immovables',
 'imbalming',
 'narrater',
 'lampyrids',
 'misname',
 'dimidiated',


In [5]:
# These are our constructions to cross-check with the supplied list
# set18_or_16 = set18.union(set16)
# set18_and_16 = set18.intersection(set16)
# set18_xor_16 = set18.symmetric_difference(set16)
# set18_only = set18.difference(set16)
# set16_only = set16.difference(set18)
# #
# should_be_null = set18_only.difference(set18_new)
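
The commented-out cross-check above could be run roughly as follows once a 2016 list is available. This is a minimal sketch on toy data: `set16` and the sample words are assumptions for illustration, since the notebook never actually reads a 2016 file, and `missing_from_new` / `extra_in_new` are hypothetical names.

```python
# Toy stand-ins for the real word sets (assumed data, not read from the TWL files).
set16 = {"aa", "ab", "zax"}
set18 = {"aa", "ab", "zax", "ew", "ok", "zen"}
set18_new = {"ew", "ok", "zen"}

# Words present in 2018 but not in 2016, and words dropped between editions.
set18_only = set18 - set16
set16_only = set16 - set18

# Both of these should be empty if the supplied new-words file is accurate.
missing_from_new = set18_only - set18_new
extra_in_new = set18_new - set18_only

print(sorted(set18_only))
print(missing_from_new, extra_in_new)
```

Checking the difference in both directions catches words that the supplied list omits as well as words it lists that were already valid.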

### Now work on the word lists: 2's, 3's, JXQZ


In [6]:
new4s = set()
new3s = set()
new2s = set()
newJXQZs = set()
for item in set18_new:
    if (len(item)==4):
        new4s.add(item)
    elif (len(item)==3):
        new3s.add(item)
    elif (len(item)==2):
        new2s.add(item)
    if re.search(r'[jxqz]', item) is not None:
        newJXQZs.add(item)

# for item in set18_only:
#     if 'j' in item:
#         print(item)
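
The if/elif chain above handles only lengths 2 to 4. A possible generalization, sketched here on a small sample of the new words, buckets every word by its length with a `defaultdict`. The names `by_length` and `jxqz` are hypothetical and not used elsewhere in the notebook.

```python
import re
from collections import defaultdict

# Small sample standing in for set18_new (assumed data for illustration).
new_words = {'ew', 'ok', 'zen', 'vape', 'emoji'}

# Map each word length to the set of new words of that length.
by_length = defaultdict(set)
for word in new_words:
    by_length[len(word)].add(word)

# Words containing at least one of the letters J, X, Q, Z.
jxqz = {w for w in new_words if re.search(r'[jxqz]', w)}

print(sorted(by_length[2]))
# Sort by length first, then alphabetically within each length.
print(sorted(jxqz, key=lambda w: (len(w), w)))
```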

In [7]:
print('New 2s:',*sorted(new2s))
print('New 3s:',*sorted(new3s))
print('New 4s:',*sorted(new4s))
print('New JXQZs',*sorted(sorted(newJXQZs),key=len),sep='\n')

New 2s: ew ok
New 3s: zen
New 4s: noni owie vape yuke
New JXQZs
zen
ajies
emoji
exome
judgy
qapik
yowza
ansatz
bizjet
cotija
emojis
exomes
judgey
nutjob
qapiks
bizjets
grawlix
grizzes
judgier
nutjobs
quorate
zomboid
ansatzes
aquafaba
exoneree
judgiest
unglitzy
unsexily
anonymize
beatboxer
exonerees
gochujang
grawlixes
inquorate
jackalope
liquidise
nontoxics
reliquify
rezonings
therapize
uncrazier
volumizer
adjudgment
anonymized
anonymizer
anonymizes
asphyxiant
azeotropic
bamboozler
bartizaned
beatboxers
bequeather
bijuralism
casualized
cetirizine
chaptalize
craquelure
dazzlement
duplexings
expendably
expertised
explodable
frowziness
geotextile
gochujangs
hexahedral
hexametric
hexavalent
hypoxaemia
intersquad
interzones
iodization
jackalopes
jouissance
lansquenet
liquidised
liquidises
manzanilla
monologize
nominalize
objectival
omnisexual
overadjust
panegyrize
panzerotti
panzerotto
paroxytone
patronizer
pejoration
phytotoxin
pixelation
pizzicatos
quadruplex
quangocrat
quantitive
quarter

#### Page one of the cheat sheet (2's, 3's, short J Q X Z words)

In [8]:
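# scratch cell: str.replace with no count argument removes every 's'
'tisanes' in set18   # result is discarded; only the last expression is displayed
word1 = 'tisanes'
newstr = word1.replace('s', '')
newstr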

'tiane'

In [9]:
# find all anagrams of TISANER
test_word = 'tisaner'

def is_anagram(word1, word2):
    """Return True if word1 and word2 use exactly the same letters."""
    if len(word1) != len(word2):
        return False
    for item in word1:
        if item not in word2:
            return False
        # remove one copy of the matched letter so repeated letters are counted
        word2 = word2.replace(item, '', 1)
    return True

def find_anagrams(word, lexicon_list):
    """Scan the whole lexicon for anagrams of word."""
    return [item for item in lexicon_list if is_anagram(item, word)]

# quick sanity check (result is not displayed)
word1 = 'tisanen'
word2 = 'tisanen'
is_anagram(word1, word2)
find_anagrams(test_word, twl18)

# That's brute force: every lookup scans the whole lexicon. Much better is to
# preprocess each word into its alphagram and build a dict keyed by alphagram
# whose values are the lists of words sharing that alphagram.
# A defaultdict is really helpful here.

['anestri',
 'antsier',
 'nastier',
 'ratines',
 'retains',
 'retinas',
 'retsina',
 'stainer',
 'stearin']

In [10]:
# from 'http://oldtownscrabble.com/05/12/bingo-stems-study-grids/'
top_stems = ['TISANE',
 'SATIRE',
 'RETAIN',
 'ARSINE',
 'SENIOR',
 'STERNA–',
 'TOE-SIN',
 '–ORATES–',
 'REASON',
 'INSERT',
 'TONERS–',
 '–EASTER',
 'AIDERS–',
 'RAINED',
 'LESION',
 'TORIES',
 'TOILES–',
 'SERIAL',
 'NAILER',
 '–ALIENS–',
 'I-ON-SEA',
 'SALTIE',
 'RETAIL',
 'RETIES–',
 'ENTIRE',
 'SAD-TIE',
 '–ATONES–',
 'SOLATE',
 'OR-ALES',
 'I-SAT-ON',
 'SAINED',
 'DORIES',
 'GAINER',
 'LISTER',
 'ENTERS–',
 '–SALTER',
 '–TIRADE',
 'SILENT',
 'DIALER',
 'LADIES',
 'DATERS–',
 'SNIDER–',
 'ADORES–',
 'SERINE–',
 'SEEN-IT',
 'LATENS–',
 'STANED–',
 'TENAIL',
 'TEINDS–',
 'DESIRE',
 '–SENATE',
 '–LEARNS–',
 'SENITI',
 'ORIENT',
 '–TRAINS–',
 'STRIDE',
 'INTROS–',
 'SANDER',
 'OATIES*',
 'STORED–',
 'TODIES',
 'OILERS–',
 'ANI-TOE',
 'STEROL',
 'TENIAE–',
 'TEARIE*',
 'NEROLI–',
 'RANEES–',
 'SOIGNE',
 'ARIOSE',
 'DETAIN',
 'SINGER–',
 'EASIER',
 'EOLIAN',
 'RESALE',
 '–UNITER–',
 '–SOILED',
 'STEREO–',
 'LINERS–',
 'ALE-ROT',
 'ENTOIL',
 'GARNET',
 'EASING',
 'OIL-SAT',
 'TINIER',
 'ATE-SOD',
 'SNORED–',
 'ELITES–',
 'INMATE',
 '–NEATER',
 'LANOSE',
 'GREATS–',
 'ATTIRE',
 'TRONAS–',
 'TUNERS–',
 '–PRAISE',
 'DOE-RAN',
 'SUITER–',
 'OALIES*',
 'ALDERS–']



In [12]:
# pre-process lexicon to alphagram every word and map into a dict of all alphagrams

def to_alphagram(word):
    return ''.join(sorted(word))

alpha_lexicon = defaultdict(list)

for item in twl18:
    k = to_alphagram(item)
    alpha_lexicon[k].append(item)

    



In [13]:
word1 = 'abhorred'
print(sorted(alpha_lexicon[to_alphagram(word1)]))
word2 = 'aaa'
print(sorted(alpha_lexicon[to_alphagram(word2)]))


['abhorred', 'harbored']
[]
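
With an alphagram dict in place, the stems in `top_stems` could be put to work: strip the marker characters from a stem, add each letter of the alphabet in turn, and look up the resulting alphagram. The following is a rough sketch under assumptions: `clean_stem` and `stem_bingos` are hypothetical helpers, and the tiny lexicon is toy data standing in for `twl18`.

```python
import string
from collections import defaultdict

def to_alphagram(word):
    """Return the letters of word in sorted order."""
    return ''.join(sorted(word))

def clean_stem(stem):
    """Keep only the letters of a stem entry (drops dashes, hyphens, asterisks) and lowercase them."""
    return ''.join(ch for ch in stem if ch.isalpha()).lower()

def stem_bingos(stem, alpha_lexicon):
    """Map each added letter to the sorted words formed from stem plus that letter."""
    results = {}
    for letter in string.ascii_lowercase:
        # .get avoids creating empty entries in the defaultdict
        words = alpha_lexicon.get(to_alphagram(stem + letter), [])
        if words:
            results[letter] = sorted(words)
    return results

# Toy lexicon (assumed data); in the notebook this would be built from twl18.
toy_words = ['nastier', 'retains', 'retinas', 'tisanes', 'seitans', 'banties']
alpha_lexicon = defaultdict(list)
for w in toy_words:
    alpha_lexicon[to_alphagram(w)].append(w)

stem = clean_stem('–TISANE')
print(stem_bingos(stem, alpha_lexicon))
```

Each stem costs 26 dictionary lookups, instead of a scan of the full word list per letter.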


### The HTML basics of starting the Cheat Sheet

Basic plan:
- write out the HTML to a file
- attach the red font labels to the new 2018 words
- attach the red font labels to the letters that turn two-letter words into threes (how?)


<table style="font-family:Courier, monospace; width:100%">
  <tr>
    <th>FIRSTNAME</th>
    <th>LASTNAME</th>
    <th>AGE</th>
  </tr>
  <tr>
    <td>Jill</td>
    <td>Smith</td>
    <td>50</td>
  </tr>
  <tr>
    <td>Eve</td>
    <td><font color="red">Jackson</font></td>
    <td>94</td>
  </tr>
</table> 
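
One possible answer to the "how?" in the plan above: for each two-letter word, collect the letters that can go in front of it or behind it to make a valid three-letter word, and wrap a letter in the red font tag when the resulting three is a new word. This is only a sketch on toy data; `hooks` and `hook_row` are hypothetical helper names, and `add_html_color` is repeated here so the block runs on its own.

```python
import string

def add_html_color(x, col='red'):
    """Wrap x in a <font> tag of the given color."""
    return '<font color="' + col + '">' + x + '</font>'

def hooks(two, lexicon):
    """Return (front, back) lists of letters that extend two into a valid three."""
    front = [c for c in string.ascii_lowercase if c + two in lexicon]
    back = [c for c in string.ascii_lowercase if two + c in lexicon]
    return front, back

def hook_row(two, lexicon, new_words):
    """Build one table row: front hooks, the two-letter word, back hooks."""
    front, back = hooks(two, lexicon)
    front_html = ''.join(add_html_color(c) if c + two in new_words else c for c in front)
    back_html = ''.join(add_html_color(c) if two + c in new_words else c for c in back)
    return [front_html, two, back_html]

# Toy data (assumed); the real inputs would be set18 and set18_new.
lexicon = {'en', 'zen', 'ben', 'end', 'ens'}
new_words = {'zen'}
print(hook_row('en', lexicon, new_words))
```

Each row has three cells, which matches the three-column layout of the example table above.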
    


In [16]:
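# Try attaching the red markup directly to the new words inside the original list.
# HTML is assumed here to be the third-party HTML.py module; it is not part of
# the standard library.
import HTML
l = ['Jill', 'Smith', 'Eve', '<font color="red">Jackson</font>']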

In [19]:
table_data = [
        ['Smith',       'John',         30],
        ['Carpenter',   'Jack',         47],
        ['Johnson',     '<font color="red">Jackson</font>',         62],
    ]

htmlcode = HTML.table(table_data,
                      attribs={'style': 'font-family:Courier, monospace'},
                      border = '0',
                      col_align=['center', 'center', 'center'],
                      cellpadding = '1',
                      header_row=['Last name',   '',   ''])
print(htmlcode)

<TABLE style="border-collapse: collapse; font-family:Courier, monospace" border="0" cellpadding="1">
 <TR>
  <TH>Last name</TH>
  <TH>&nbsp;</TH>
  <TH>&nbsp;</TH>
 </TR>
 <TR>
  <TD align="center">Smith</TD>
  <TD align="center">John</TD>
  <TD align="center">30</TD>
 </TR>
 <TR>
  <TD align="center">Carpenter</TD>
  <TD align="center">Jack</TD>
  <TD align="center">47</TD>
 </TR>
 <TR>
  <TD align="center">Johnson</TD>
  <TD align="center"><font color="red">Jackson</font></TD>
  <TD align="center">62</TD>
 </TR>
</TABLE>


<TABLE style="border: 1px solid #000000; border-collapse: collapse; font-family:Courier, monospace" border="1" cellpadding="4">
 <TR>
  <TH>Last name</TH>
  <TH>First name</TH>
  <TH>Age</TH>
 </TR>
 <TR>
  <TD align="center">Smith</TD>
  <TD align="center">John</TD>
  <TD align="center">30</TD>
 </TR>
 <TR>
  <TD align="center">Carpenter</TD>
  <TD align="center">Jack</TD>
  <TD align="center">47</TD>
 </TR>
 <TR>
  <TD align="center">Johnson</TD>
  <TD align="center"><font color="red">Jackson</font></TD>
  <TD align="center">62</TD>
 </TR>
</TABLE>
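
If the `HTML` module is not available, a table like the ones above can be assembled with plain string formatting and written to a file, which covers the first step of the basic plan. This is a minimal sketch, not the notebook's method: `html_table` is a hypothetical helper, the rows are toy data, and the output file name is an arbitrary choice.

```python
import os
import tempfile

def html_table(rows, header=None, style='font-family:Courier, monospace'):
    """Build an HTML table string; cells are inserted as-is, so they may contain tags."""
    lines = ['<table style="%s">' % style]
    if header:
        lines.append(' <tr>' + ''.join('<th>%s</th>' % h for h in header) + '</tr>')
    for row in rows:
        lines.append(' <tr>' + ''.join('<td>%s</td>' % cell for cell in row) + '</tr>')
    lines.append('</table>')
    return '\n'.join(lines)

# Toy rows (assumed data); one cell already carries the red markup.
rows = [['ew', 'ok'], ['<font color="red">zen</font>', 'aa']]
htmlcode = html_table(rows, header=['A', 'B'])

# Write to a temporary location so the sketch does not clutter the working directory.
out_path = os.path.join(tempfile.gettempdir(), 'cheatsheet_sketch.html')
with open(out_path, 'w') as f:
    f.write(htmlcode)
print(htmlcode)
```

Because cells are not escaped, the red `<font>` markup passes through unchanged. The same property means any cell text containing `<` or `&` would need escaping first.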
