# Algorithms

*Data Structures and Information Retrieval in Python*

Copyright 2021 Allen Downey

License: [Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International](https://creativecommons.org/licenses/by-nc-sa/4.0/)

**Exercise 1:** Write a function that takes two words and returns `True` if they are anagrams, that is, if you can rearrange the letters in one word to spell the other.

Test your function with the examples below.

In [1]:
def is_anagram(word1, word2):
    return False

In [2]:
# Solution

def is_anagram(word1, word2):
    t = list(word2)
    for letter in word1:
        if letter not in t:
            return False
        t.remove(letter)
    return len(t) == 0

In [3]:
# Solution

from collections import Counter

def is_anagram(word1, word2):
    return Counter(word1) == Counter(word2)

In [4]:
# Solution

def is_anagram(word1, word2):
    return sorted(word1) == sorted(word2)

In [5]:
is_anagram('tachymetric', 'mccarthyite') # True

True

In [6]:
is_anagram('post', 'top') # False, letter not present

False

In [7]:
is_anagram('pott', 'top') # False, letter present but not enough copies

False

In [8]:
is_anagram('top', 'post') # False, letters left over at the end

False

**Exercise 2:** Use `timeit` to see how fast your function is for this example:

In [9]:
%timeit is_anagram('tops', 'spot')

434 ns ± 10.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


In [10]:
%timeit is_anagram('tachymetric', 'mccarthyite')

766 ns ± 3.15 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
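`%timeit` is an IPython magic, so it is only available in a notebook or IPython shell. In a plain Python script, a rough equivalent can be built on the standard-library `timeit` module. The sketch below assumes the `sorted`-based version of `is_anagram` from above; the helper name `time_per_call` and its default arguments are hypothetical, and the numbers will differ from machine to machine.

```python
import timeit

def is_anagram(word1, word2):
    # Sorted-letters version from the solutions above
    return sorted(word1) == sorted(word2)

def time_per_call(word1, word2, number=10_000, repeat=5):
    """Estimate seconds per call to is_anagram (hypothetical helper)."""
    timer = timeit.Timer(lambda: is_anagram(word1, word2))
    # repeat() returns the total seconds for `number` calls, once per
    # repetition; the minimum is least disturbed by other activity.
    best = min(timer.repeat(repeat=repeat, number=number))
    return best / number

for pair in [('tops', 'spot'), ('tachymetric', 'mccarthyite')]:
    seconds = time_per_call(*pair)
    print(pair, f'{seconds * 1e9:.0f} ns per call')
```

The `lambda` adds a small overhead to every call, so these figures may come out slightly higher than the ones `%timeit` reports.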


## Searching for anagram pairs

**Exercise 3:** Write a function that takes a word list and returns a list of all anagram pairs.

In [16]:
short_word_list = ['proudest', 'stop', 'pots', 'tops', 'sprouted']

In [17]:
def all_anagram_pairs(word_list):
    return []

In [18]:
# Solution

def all_anagram_pairs(word_list):
    """Find all anagram pairs in a list of words."""
    res = []
    for i, word1 in enumerate(word_list):
        for word2 in word_list[i+1:]:
            if is_anagram(word1, word2):
                res.append((word1, word2))
    return res

In [19]:
all_anagram_pairs(short_word_list)

[('proudest', 'sprouted'),
 ('stop', 'pots'),
 ('stop', 'tops'),
 ('pots', 'tops')]

The following cell downloads a file containing a list of English words.

In [20]:
from os.path import basename, exists

def download(url):
    filename = basename(url)
    if not exists(filename):
        from urllib.request import urlretrieve
        local, _ = urlretrieve(url, filename)
        print('Downloaded ' + local)
    
download('https://github.com/AllenDowney/DSIRP/raw/main/american-english')

The following function reads a file and returns a list of words.

In [21]:
def read_words(filename):
    """Read lines from a file and split them into words."""
    res = []
    # Use a context manager so the file is closed when we are done
    with open(filename) as fp:
        for line in fp:
            for word in line.split():
                res.append(word.strip())
    return res

In [22]:
word_list = read_words('american-english')
len(word_list)

102401

**Exercise 4:** Loop through the word list and print all words that are anagrams of `stop`.

In [23]:
# Solution

for word in word_list:
    if is_anagram(word, 'stop'):
        print(word)

opts
post
pots
spot
stop
tops


Now run `all_anagram_pairs` with the full `word_list`:

In [24]:
%time pairs = all_anagram_pairs(word_list)

KeyboardInterrupt: 

**Exercise 5:** While that's running, let's estimate how long it's going to take.
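One way to make the estimate: `all_anagram_pairs` calls `is_anagram` once for every pair of distinct words, which is n(n-1)/2 calls for a list of n words. Multiplying that count by the time per call gives a rough figure. The sketch below uses the word count reported above (102401) and assumes about 500 ns per call, a round number between the two `%timeit` results; the function name `estimate_seconds` is hypothetical.

```python
def estimate_seconds(n, seconds_per_call):
    """Rough run time for comparing every pair among n words."""
    num_pairs = n * (n - 1) // 2   # number of unordered pairs
    return num_pairs * seconds_per_call

n = 102401                  # len(word_list) from the cell above
seconds_per_call = 500e-9   # assumed; measure on your own machine
total = estimate_seconds(n, seconds_per_call)
print(f'{n * (n - 1) // 2:,} pairs, about {total / 60:.0f} minutes')
```

Under these assumptions there are about 5.2 billion comparisons, which works out to roughly 45 minutes. The estimate ignores the cost of the loop itself and of copying `word_list[i+1:]` on every outer iteration, so the real run time is likely to be longer.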

## A better algorithm

**Exercise 6:** Write a better algorithm!

In [26]:
# Solution

def all_anagram_lists(word_list):
    """Finds all anagrams in a list of words.

    word_list: list of strings
    
    returns: a map from each sorted tuple of letters to the list of words spelled with those letters.
    """
    d = {}
    for word in word_list:
        t = tuple(sorted(word))

        if t not in d:
            d[t] = [word]
        else:
            d[t].append(word)
    return d

In [27]:
%time lists = all_anagram_lists(word_list)

CPU times: user 119 ms, sys: 12.1 ms, total: 131 ms
Wall time: 131 ms


**Exercise:** Use the result of `all_anagram_lists` to do the following.

Find the longest word with at least one anagram.

Find the longest list of anagrams.

Find the longest list for each word length.

Enumerate all anagram pairs explicitly.
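The tasks above can be sketched as follows. This is one possible approach, not necessarily the intended solution. To stay self-contained it rebuilds the dictionary from a small stand-in list (the name `words` and the extra word `'algorithm'` are assumptions for illustration) using a compact variant of `all_anagram_lists`; with the full `word_list`, the same steps apply to `lists`.

```python
from itertools import combinations

def all_anagram_lists(word_list):
    """Map each sorted-letter tuple to the words that share those letters."""
    d = {}
    for word in word_list:
        d.setdefault(tuple(sorted(word)), []).append(word)
    return d

# Small stand-in for the full word list (assumed for illustration)
words = ['proudest', 'stop', 'pots', 'tops', 'sprouted', 'algorithm']
lists = all_anagram_lists(words)

# Keep only groups of two or more words, i.e. words that have an anagram
groups = [group for group in lists.values() if len(group) > 1]

# Longest word with at least one anagram
longest_word = max((word for group in groups for word in group), key=len)

# Longest list of anagrams
longest_list = max(groups, key=len)

# Longest list for each word length
longest_by_length = {}
for group in groups:
    n = len(group[0])
    if len(group) > len(longest_by_length.get(n, [])):
        longest_by_length[n] = group

# Enumerate all pairs within each group
pairs = [pair for group in groups for pair in combinations(group, 2)]

print(longest_word)
print(longest_list)
print(longest_by_length)
print(pairs)
```

Enumerating pairs only within each group avoids comparing words that cannot be anagrams, so the work is proportional to the number of pairs that actually exist rather than to the square of the list length.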

In [29]:
for key, value in lists.items():
    if len(value) > 4:
        print(value)

['abets', 'baste', 'bates', 'beast', 'beats', 'betas']
['alerting', 'altering', 'integral', 'relating', 'triangle']
['angered', 'derange', 'enraged', 'grandee', 'grenade']
['angriest', 'gantries', 'ingrates', 'rangiest', 'tasering']
['arced', 'cadre', 'cared', 'cedar', 'raced']
['ares', 'ears', 'eras', 'sear', 'sera']
['arts', 'rats', 'star', 'tars', 'tsar']
['aster', 'rates', 'stare', 'tares', 'taser', 'tears']
['bares', 'baser', 'bears', 'saber', 'sabre']
['capers', 'crapes', 'parsec', 'recaps', 'scrape']
['caret', 'cater', 'crate', 'react', 'recta', 'trace']
['carets', 'caster', 'caters', 'crates', 'reacts', 'recast', 'traces']
['dearths', 'hardest', 'hatreds', 'threads', 'trashed']
['deltas', 'lasted', 'salted', 'slated', 'staled']
['drapes', 'padres', 'parsed', 'rasped', 'spared', 'spread']
['drawer', 'redraw', 'reward', 'warder', 'warred']
['east', 'eats', 'sate', 'seat', 'teas']
['emits', 'items', 'mites', 'smite', 'times']
['enlist', 'inlets', 'listen', 'silent', 'tinsel']
['es