# Wolfram|Alpha can't, so here's how

From [Wolfram|Alpha Can't](https://twitter.com/wacnt/status/737795874596622336):
>largest number of phonemes shared by a pair of English words that share no letters

Of course, this is a perfect [nerdsnipe](https://xkcd.com/356/) for me. English orthography is famously decoupled from its phonology for a number of historical reasons, principally that (1) the orthography was standardized just prior to major sound changes and (2) when English borrows, it tends to keep the spelling from the source language (which often has different sound-letter correspondences).

In [1]:
from __future__ import print_function
import sys
from itertools import combinations

In [4]:
# How many phonemes do a and b share?
# a and b are phoneme strings with one character per phoneme.
# NB: this is a multiset intersection, not a set intersection, so repeats of the same phoneme count.
def shared_phonemes(a, b):
    x = list(a)
    y = list(b)
    count = 0
    for s in x:
        if s in y:
            count += 1
            y.remove(s)
    return count
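
The same count can be sketched with `collections.Counter`, whose `&` operator keeps each element at the minimum of its two counts. This is an alternative formulation rather than the code used for the results below; `shared_phonemes_counter` is a name made up for this illustration, and the two test strings are pronunciations taken from the output further down.

```python
from collections import Counter

def shared_phonemes_counter(a, b):
    # Counter(a) & Counter(b) is the multiset intersection:
    # each phoneme is kept min(count in a, count in b) times.
    return sum((Counter(a) & Counter(b)).values())

# "askew" vs. "circumlocution", using the pronunciations printed below
print(shared_phonemes_counter("xskyu", "sRkxmlokyuS|n"))
```

Unlike the loop above, this avoids the repeated `list.remove` calls, although for strings this short the difference is unlikely to matter much.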

In [3]:
# Load the CMU Pronouncing Dictionary into a dict with form (letters -> phonemes)
# CPD has its own notation for phonemes, but they map onto IPA.
# CPD also has stress and POS, but we only care about phonemes right now.
# NB: only one pronunciation per spelling!
newdic = dict()

with open('newdic.txt') as f:
    for line in f:
        spl = line.split('\t')
        newdic[spl[3]] = spl[0]

In [5]:
sofar = 0
curr_num = 0
curr_pairs = []

# Compare each pair of words (without repeats)
for (key1,key2) in combinations(newdic.keys(), 2):
    # We're only interested in pairs with no letters in common
    if set(key1).isdisjoint(key2):
        # How many phonemes in common?
        so = shared_phonemes(newdic[key1], newdic[key2])
        # New record!
        if so > curr_num:
            curr_num = so
            curr_pairs = [(key1, key2)]
        # Ties for first place
        elif so == curr_num:
            curr_pairs.append((key1, key2))
    # Keep track of progress (190-some million comparisons total)
    sofar += 1
    if sofar % 1000000 == 0:
        # Floor division keeps the counter an integer on both Python 2 and 3
        sys.stdout.write("\r{} million comparisons...".format(sofar // 1000000))
        sys.stdout.flush()

190 million comparisons...
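
The loop above rebuilds the letter sets for every one of those comparisons. One possible variation, sketched here on a toy dictionary, builds each word's letter set once up front. The function names and the `toy` dict are assumptions made for this illustration; the six entries are copied from the results printed below, not loaded from `newdic.txt`.

```python
from collections import Counter
from itertools import combinations

def shared(a, b):
    # Multiset overlap of two phoneme strings (one character per phoneme)
    return sum((Counter(a) & Counter(b)).values())

def best_disjoint_pairs(dic):
    # Build each word's letter set once instead of once per comparison
    letters = {word: frozenset(word) for word in dic}
    best, pairs = 0, []
    for w1, w2 in combinations(dic, 2):
        # Skip pairs that have any letter in common
        if not letters[w1].isdisjoint(letters[w2]):
            continue
        n = shared(dic[w1], dic[w2])
        if n > best:        # new record
            best, pairs = n, [(w1, w2)]
        elif n == best:     # tie for first place
            pairs.append((w1, w2))
    return best, pairs

# Toy dictionary (spelling -> phonemes), entries taken from the output below
toy = {
    "askew": "xskyu",
    "circumlocution": "sRkxmlokyuS|n",
    "marzipan": "martsxpan",
    "ocelot": "asxlat",
    "fixed": "fIkst",
    "sycophant": "sIkxfxnt",
}

best, pairs = best_disjoint_pairs(toy)
print("{} phonemes shared by: {}".format(best, pairs))
```

This only trims the per-pair overhead; the search is still quadratic in the number of words.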

In [8]:
print("{} phonemes shared by:\n".format(curr_num))
for (a, b) in curr_pairs:
    print("{} /{}/ <=> {} /{}/".format(a, newdic[a], b, newdic[b]))

5 phonemes shared by:

askew /xskyu/ <=> circumlocution /sRkxmlokyuS|n/
heterodox /hEtxrxdaks/ <=> casaba /kxsabx/
commissary /kamxsEri/ <=> exegete /EksxJit/
inflexibility /InflEksxbIl|ti/ <=> sarcophagus /sarkafxg|s/
polyphony /pxlIfxni/ <=> cafeteria /k@fxtIrix/
philosophic /fIl|safIk/ <=> extravagant /Ikstr@vIg|nt/
extravagant /Ikstr@vIg|nt/ <=> umbilicus /^mbIlIk|s/
marzipan /martsxpan/ <=> ocelot /asxlat/
beneficence /bxnEfxs|ns/ <=> asparagus /xsp@rxg|s/
sycophant /sIkxfxnt/ <=> fixed /fIkst/
sarcophagus /sarkafxg|s/ <=> flexibility /flEksxbIl|ti/
quadrivium /kwadrIvixm/ <=> ecology /IkalxJi/
