# **Bananagrams**
## [Riddler Classic, Mar 1, 2019](https://fivethirtyeight.com/features/youre-home-alone-you-can-buy-zillions-of-video-game-cigarettes-or-beat-yourself-at-bananagrams/)

### solution by [Laurent Lessard](https://laurentlessard.com)

---

## Some preliminaries

Here I tabulate the tile distribution and collect the list of admissible words. I used the [ENABLE word list](https://norvig.com/ngrams/enable1.txt), as instructed in the problem statement.

In [1]:
import numpy as np
import pandas as pd
from gurobipy import Model, GRB, quicksum

In [2]:
# list of all tiles in Bananagrams
tiles = 2*['J','K','Q','X','Z'] + 3*['B','C','F','H','M','P','V','W','Y'] + 4*['G'] + 5*['L'] + \
        6*['D','S','U'] + 8*['N'] + 9*['T','R'] + 11*['O'] + 12*['I'] + 13*['A'] + 18*['E']

# create array of letter frequency counts
alphabet = 'abcdefghijklmnopqrstuvwxyz'
L = len(alphabet)
def letterhist(word):
    return np.array([word.count(s) for s in alphabet])

lettercounts = letterhist(''.join(tiles).lower())

# assemble word list and sort it by word length. There are N words total
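# strip() removes the trailing newline without clipping a letter if the last line lacks one
with open("enable1.txt", "r") as f:
    wordlist = sorted( [ w.strip() for w in f if w.strip() ], key=len, reverse=True )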
N = len(wordlist)
print('The word list contains', N, 'words.')

# create Nx26 array of letter frequencies
A = np.array( [letterhist(w) for w in wordlist] )

The word list contains 172820 words.
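
As a quick sanity check on the tile distribution above, the short sketch below recounts the tiles with only the standard library. The names `tile_counts` and `total_tiles` are introduced here for illustration and are not part of the original notebook. A standard Bananagrams set has 144 tiles, so that is the total we expect to see.

```python
from collections import Counter

# Tile distribution copied from the cell above.
tiles = 2*['J','K','Q','X','Z'] + 3*['B','C','F','H','M','P','V','W','Y'] + 4*['G'] + 5*['L'] + \
        6*['D','S','U'] + 8*['N'] + 9*['T','R'] + 11*['O'] + 12*['I'] + 13*['A'] + 18*['E']

# Count how many tiles there are of each letter (illustrative helper names).
tile_counts = Counter(tiles)
total_tiles = sum(tile_counts.values())

print('Total tiles:', total_tiles)
print('Distinct letters:', len(tile_counts))
print('Three most common letters:', tile_counts.most_common(3))
```

If the total were anything other than 144, or if fewer than 26 distinct letters appeared, the quotas fed to the optimization model would be wrong from the start.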


## First problem: Use fewest possible words

I solved the problem by modeling it as a mixed-integer program (it's a variant of a knapsack problem). Each word contributes to the total tally of letters, and we seek a combination of words that meets every letter quota, so that all the tiles are used. The catch is that the words must cross, so each shared tile is counted once for each word it belongs to. Connecting the words requires at least (number of words minus one) shared tiles; the model below assumes a chain of words and sets the number of overlaps to exactly that value. Also, since a tile can belong to at most two crossing words, no letter may be counted more than twice its number of tiles. That last constraint is left commented out in the code.
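
To make the overlap accounting concrete, here is a minimal brute-force sketch of the same kind of model on a toy instance, with no solver required. The function `solve_toy`, its `max_copies` parameter, and the six-tile example are all hypothetical and chosen only for illustration; this is not the method used in the rest of the notebook, which relies on Gurobi.

```python
from itertools import product

ALPHABET = 'abcdefghijklmnopqrstuvwxyz'

def letterhist(word):
    # Letter counts of a word, in alphabetical order.
    return [word.count(s) for s in ALPHABET]

def solve_toy(tiles, words, max_copies=2):
    """Brute-force a tiny instance (illustrative only).

    Minimize the number of words used, subject to:
      * every letter is used at least as often as we have tiles of it;
      * the total number of double-counted letters equals (words used - 1),
        i.e. the words are assumed to form a chain.
    Each word may be used between 0 and max_copies times.
    """
    counts = letterhist(tiles)
    hists = [letterhist(w) for w in words]
    best = None
    for q in product(range(max_copies + 1), repeat=len(words)):
        n_words = sum(q)
        if n_words == 0:
            continue
        # Letters contributed by the chosen words, with multiplicity.
        used = [sum(k * h[j] for k, h in zip(q, hists)) for j in range(len(ALPHABET))]
        if any(u < c for u, c in zip(used, counts)):
            continue  # some tile would be left over
        if sum(used) - sum(counts) != n_words - 1:
            continue  # wrong number of overlaps for a chain
        if best is None or n_words < sum(best):
            best = q
    if best is None:
        return None
    return {w: k for w, k in zip(words, best) if k > 0}

solution = solve_toy('catsun', ['cats', 'sun', 'cat', 'as'])
print(solution)  # {'cats': 1, 'sun': 1}
```

Here "cats" and "sun" together contain seven letters but use only six tiles, because the shared S is counted twice: one overlap for two words. The pair "cat" and "sun" also covers all six tiles, but it has no shared letter, so the two words could not cross and the overlap constraint rejects it.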

In [4]:
# Solve integer program using Gurobi
# using a possibly smaller word list. Setting M = N will use all words.
# Setting M < N will use only the M longest words (faster if M is small!)
M = N

m = Model("bananagrams")

# Create variables (how many of each word to use). Reusing words is allowed.
# Only the M longest words get a variable, matching the sums below.
w = m.addVars(M, vtype=GRB.INTEGER, lb=0, name="w")

# total number of words used and number of each letter used
words_used = quicksum(w[i] for i in range(M))
letters_used = [ quicksum(A[i,j]*w[i] for i in range(M)) for j in range(L) ]

# Set objective (minimize number of words used)
m.setObjective( words_used, GRB.MINIMIZE)

# Constraint: must use as many letters as we have tiles of each sort
m.addConstrs(  (letters_used[j] >= lettercounts[j] for j in range(L) ) )

# Constraint (disabled): a tile can be shared by at most two crossing words,
# so no letter may be counted more than twice its number of tiles
# m.addConstrs(  (letters_used[j] <= 2*lettercounts[j] for j in range(L)) )

# Constraint: assume a chain of words (w-1 common letters assuming w words used)
m.addConstr(  quicksum(letters_used[j] - lettercounts[j] for j in range(L)) == words_used-1 )

m.optimize()


# print the optimal solution
# (solver values are floats, so threshold at 0.5 and round rather than truncate)
opt = pd.DataFrame( data = [[ix,int(round(v.x)),wordlist[ix]] for ix,v in enumerate(m.getVars()) if v.x > 0.5], columns = ['idx', 'quantity', 'word'] )
print('Optimal solution:\n\n',opt)
print('Total number of words:', opt.quantity.sum() )

# print overlapping letters
overlaps = sum( A[opt.idx[i],:] * opt.quantity[i] for i in range(len(opt)) ) - lettercounts
d = dict([ item for item in zip(alphabet,overlaps) if item[1] > 0 ] )
print('Number of overlaps', sum(d.values()))
print('Overlapping letters:', d )

Optimize a model with 27 rows, 172820 columns and 1424015 nonzeros
Variable types: 0 continuous, 172820 integer (0 binary)
Coefficient statistics:
  Matrix range     [1e+00, 3e+01]
  Objective range  [1e+00, 1e+00]
  Bounds range     [0e+00, 0e+00]
  RHS range        [2e+00, 1e+02]
Presolve removed 0 rows and 16348 columns
Presolve time: 3.08s
Presolved: 27 rows, 156472 columns, 1308797 nonzeros
Variable types: 0 continuous, 156472 integer (0 binary)

Starting sifting (using dual simplex for sub-problems)...

    Iter     Pivots    Primal Obj      Dual Obj        Time
       0          0     infinity      0.0000000e+00      4s
       1         77   1.1124519e+06   1.9651829e+00      4s
       2        148   9.1371561e+00   6.6733631e+00      4s
       3        232   8.7203149e+00   8.5027023e+00      4s

Sifting complete


Root relaxation: objective 8.693757e+00, 273 iterations, 0.30 seconds

    Nodes    |    Current Node    |     Objective Bounds      |     Work
 Expl Unexpl |  Obj  