# Chapter xx

*Data Structures and Information Retrieval in Python*

Copyright 2021 Allen Downey

License: [Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International](https://creativecommons.org/licenses/by-nc-sa/4.0/)

**Exercise 1:** Write a function that takes two words and returns `True` if they are anagrams, that is, if you can rearrange the letters in one word to spell the other.

Test your function with the examples below.

In [53]:
def is_anagram(word1, word2):
    t = list(word2)
    for letter in word1:
        if letter not in t:
            return False
        t.remove(letter)
    return len(t) == 0

In [54]:
from collections import Counter

def is_anagram(word1, word2):
    return Counter(word1) == Counter(word2)

In [55]:
def is_anagram(word1, word2):
    return sorted(word1) == sorted(word2)

In [56]:
is_anagram('tachymetric', 'mccarthyite') # True

True

In [57]:
is_anagram('post', 'top') # False, letter not present

False

In [58]:
is_anagram('pott', 'top') # False, letter present but not enough copies

False

In [59]:
is_anagram('top', 'post') # False, letters left over at the end

False

**Exercise 2:** Use `timeit` to see how fast your function is for this example:

In [78]:
%timeit is_anagram('tops', 'spot')

419 ns ± 3.77 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


In [79]:
%timeit is_anagram('tachymetric', 'mccarthyite')

771 ns ± 0.885 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
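
The `%timeit` magic only reports on whichever `is_anagram` was defined last. As a minimal sketch, the standard `timeit` module can time all three versions side by side. The names `is_anagram_remove`, `is_anagram_counter`, `is_anagram_sorted`, and `time_versions` are hypothetical, introduced here so the three definitions can coexist, and the results will vary from machine to machine.

```python
import timeit
from collections import Counter

def is_anagram_remove(word1, word2):
    # Version 1: remove each letter of word1 from a list copy of word2
    t = list(word2)
    for letter in word1:
        if letter not in t:
            return False
        t.remove(letter)
    return len(t) == 0

def is_anagram_counter(word1, word2):
    # Version 2: compare letter counts
    return Counter(word1) == Counter(word2)

def is_anagram_sorted(word1, word2):
    # Version 3: compare sorted letters
    return sorted(word1) == sorted(word2)

def time_versions(word1, word2, number=100_000):
    """Return a dict mapping each function name to seconds per call."""
    results = {}
    for func in (is_anagram_remove, is_anagram_counter, is_anagram_sorted):
        # timeit.timeit returns the total time for `number` calls
        total = timeit.timeit(lambda: func(word1, word2), number=number)
        results[func.__name__] = total / number
    return results

for name, seconds in time_versions('tachymetric', 'mccarthyite').items():
    print(f'{name}: {seconds * 1e9:.0f} ns per call')
```

Calling through a `lambda` adds a small fixed overhead to every measurement, so the numbers are better suited to comparing the versions with each other than to comparing against the `%timeit` output above.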


**Exercise 3:** Write a function that takes a word list and returns a list of all anagram pairs.

In [63]:
def all_anagram_pairs(word_list):
    """Finds all anagram pairs in a list of words.

    word_list: list of strings

    Returns: list of (word1, word2) tuples
    """
    res = []
    for i, word1 in enumerate(word_list):
        for word2 in word_list[i+1:]:
            if is_anagram(word1, word2):
                res.append((word1, word2))
    return res

In [64]:
words = ['proudest', 'stop', 'pots', 'tops', 'sprouted']

In [65]:
all_anagram_pairs(words)

[('proudest', 'sprouted'),
 ('stop', 'pots'),
 ('stop', 'tops'),
 ('pots', 'tops')]
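
The nested loop above visits every unordered pair of words exactly once. As an alternative sketch, `itertools.combinations` generates the same pairs in the same order without copying a slice of the list on each pass. The name `all_anagram_pairs_combinations` is hypothetical, and the sorted-letters version of `is_anagram` is repeated here only to keep the example self-contained.

```python
from itertools import combinations

def is_anagram(word1, word2):
    # Sorted-letters version of is_anagram from Exercise 1
    return sorted(word1) == sorted(word2)

def all_anagram_pairs_combinations(word_list):
    """Finds all anagram pairs using itertools.combinations."""
    # combinations(word_list, 2) yields each unordered pair once,
    # in the same order as the nested loops
    return [(word1, word2)
            for word1, word2 in combinations(word_list, 2)
            if is_anagram(word1, word2)]

words = ['proudest', 'stop', 'pots', 'tops', 'sprouted']
print(all_anagram_pairs_combinations(words))
```

This still makes one comparison per pair, so the number of comparisons grows quadratically with the length of the list, just as in the nested-loop version.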

## Searching for anagram pairs


In [37]:
from os.path import basename, exists

def download(url):
    filename = basename(url)
    if not exists(filename):
        from urllib.request import urlretrieve
        local, _ = urlretrieve(url, filename)
        print('Downloaded ' + local)
    
download('https://github.com/AllenDowney/DSIRP/raw/main/american-english')

In [72]:
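def read_words(filename):
    """Read lines from a file and split them into words."""
    res = []
    # Use a context manager so the file is closed when reading finishes
    with open(filename) as fp:
        for line in fp:
            # split() already strips surrounding whitespace from each word
            for word in line.split():
                res.append(word)
    return res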

In [73]:
word_list = read_words('american-english')
len(word_list)

102401

In [76]:
for word in word_list:
    if is_anagram(word, 'stop'):
        print(word)

opts
post
pots
spot
stop
tops


In [77]:
%time pairs = all_anagram_pairs(word_list)

KeyboardInterrupt: 

**Exercise:** While that's running, let's estimate how long it's going to take.
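
One way to make the estimate, sketched under the assumption that the `%timeit` results from Exercise 2 (419 ns and 771 ns per call) are representative of a typical comparison: a list of n words has n(n-1)/2 unordered pairs, and `all_anagram_pairs` calls `is_anagram` once per pair. The function name `estimate_runtime` is hypothetical, and the estimate ignores loop overhead and the cost of slicing the list.

```python
def estimate_runtime(n_words, seconds_per_call):
    """Estimate the cost of comparing every pair of words.

    Returns (number of pairs, estimated seconds).
    """
    n_pairs = n_words * (n_words - 1) // 2
    return n_pairs, n_pairs * seconds_per_call

n_words = 102401  # length of word_list reported above

# Timings taken from the %timeit output in Exercise 2
n_pairs, low = estimate_runtime(n_words, 419e-9)
_, high = estimate_runtime(n_words, 771e-9)

print(n_pairs)  # 5242931200
print(f'{low / 60:.0f} to {high / 60:.0f} minutes')
```

With more than five billion pairs to check, even a comparison that takes well under a microsecond adds up to something on the order of an hour.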

In [None]:
len(pairs)

In [None]:
for word1, word2 in pairs[:20]:
    print(word1, word2)

## A better algorithm


In [50]:
def all_anagram_lists(word_iterator):
    """Groups the words in an iterable by the letters they contain.

    word_iterator: iterable of strings

    Returns: a map from each sorted tuple of letters to the list of
    words spelled with those letters.
    """
    d = {}
    for word in word_iterator:
        t = tuple(sorted(word))

        if t not in d:
            d[t] = [word]
        else:
            d[t].append(word)
    return d

In [52]:
%time lists = all_anagram_lists(word_list)

CPU times: user 172 ms, sys: 4 ms, total: 176 ms
Wall time: 175 ms
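
The dictionary has an entry for every distinct set of letters, including words that have no anagrams at all. As a minimal sketch, the following keeps only the lists with at least two words and puts the largest groups first. The helper name `anagram_groups` and its `min_size` parameter are hypothetical, and `all_anagram_lists` is repeated in a compact form using `setdefault` so the example runs on its own.

```python
def all_anagram_lists(word_iterator):
    """Map each sorted tuple of letters to the words that use those letters."""
    d = {}
    for word in word_iterator:
        key = tuple(sorted(word))
        # setdefault creates the list the first time a key is seen
        d.setdefault(key, []).append(word)
    return d

def anagram_groups(lists, min_size=2):
    """Return the lists with at least min_size words, largest first."""
    groups = [words for words in lists.values() if len(words) >= min_size]
    groups.sort(key=len, reverse=True)
    return groups

words = ['proudest', 'stop', 'pots', 'tops', 'sprouted', 'python']
groups = anagram_groups(all_anagram_lists(words))
print(groups)
# [['stop', 'pots', 'tops'], ['proudest', 'sprouted']]
```

Because each word is sorted once and then looked up in a dictionary, this approach makes one pass over the word list instead of comparing every pair, which is consistent with the large gap between the timing above and the interrupted run of `all_anagram_pairs`.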
