# Think Python

## Chapter 9 Case study: word play

*The HTML version of this chapter in "Think Python 2e" can be found [here](http://greenteapress.com/thinkpython2/html/thinkpython2010.html "Chpt 9").*

### 9.1 Reading word lists

*Importing the word list from [this link](http://thinkpython2.com/code/words.txt "words.txt"), and using `open()` to open the file:*

In [1]:
fin = open('words.txt')

*`.readline()` reads characters until it gets to a newline, and then returns the result as a string:*

In [2]:
fin.readline()

'aa\n'

*`.readline()` will remember our place in the file:*


In [3]:
fin.readline()

'aah\n'

*Using `.strip()` to get rid of the newline character:*

In [4]:
fin.readline().strip()

'aahed'

*Here's a `for`-loop that will read each word in `words.txt` and print it, one word per line:*

```
fin = open('words.txt')
for line in fin:
    print(line.strip())
```

*It takes quite a while for the iPython interpreter at GitHub to iterate through all the words in the list, so for the sake of speed and space, the next cell uses a `for` loop to go through just the first 30.*

In [5]:
fin = open('words.txt')
i = 0
for line in fin:   
    if i < 30:
        print(line.strip())
    i += 1


aa
aah
aahed
aahing
aahs
aal
aalii
aaliis
aals
aardvark
aardvarks
aardwolf
aardwolves
aas
aasvogel
aasvogels
aba
abaca
abacas
abaci
aback
abacus
abacuses
abaft
abaka
abakas
abalone
abalones
abamp
abampere
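
*__If the goal is to stop reading as soon as 30 words have been shown, `itertools.islice` is one option. The following is only a minimal sketch, not something from the book: `first_words` is a hypothetical helper name, and it accepts any iterable of lines so that it can be tried without the file.__*

```python
from itertools import islice

def first_words(lines, n=30):
    """Return the first n lines of an iterable, stripped of whitespace."""
    # islice stops pulling from `lines` after n items, so the rest of
    # the input is never read.
    return [line.strip() for line in islice(lines, n)]

# With the real file (assumes words.txt is in the working directory):
# with open('words.txt') as fin:
#     for word in first_words(fin):
#         print(word)

print(first_words(['aa\n', 'aah\n', 'aahed\n'], 2))
```

*__Using `with` also closes the file automatically once the block ends.__*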


### 9.2 Exercises

*There are solutions to these exercises in the next section. You should at least attempt each one before you read the solutions.*

#### Exercise 1   

*Write a program that reads `words.txt` and prints only the words with more than 20 characters (not counting whitespace).*

In [6]:
fin = open('words.txt')
for line in fin:
    word = line.strip()
    if len(word) > 20:
        print(word)

counterdemonstrations
hyperaggressivenesses
microminiaturizations


#### Exercise 2  

*In 1939 Ernest Vincent Wright published a 50,000 word novel called Gadsby that does not contain the letter “e”. Since “e” is the most common letter in English, that’s not easy to do.*

*In fact, it is difficult to construct a solitary thought without using that most common symbol. It is slow going at first, but with caution and hours of training you can gradually gain facility.*

*All right, I’ll stop now.*

*Write a function called `has_no_e` that returns `True` if the given word doesn’t have the letter “e” in it.*

*Write a program that reads `words.txt` and prints only the words that have no “e”. Compute the percentage of words in the list that have no “e”.*


In [7]:
def has_no_e(word):
    """
    Returns True if word does not contain the character 'e'.
    """
    
    if "e" in word:
        return False
    return True

In [8]:
has_no_e("banana")

True

In [9]:
has_no_e("grape")

False

In [10]:
fin = open('words.txt')

# Tallying and printing words in words.txt without 'e'

no_e_tally = 0
i = 0

# As was the case above, this loop prints only the first
# 30 results
for line in fin:
    word = line.strip()
    if has_no_e(word):
        if i < 30:
            print(word)
            i += 1
        no_e_tally += 1
        
no_e_tally

aa
aah
aahing
aahs
aal
aalii
aaliis
aals
aardvark
aardvarks
aardwolf
aas
aba
abaca
abacas
abaci
aback
abacus
abaft
abaka
abakas
abamp
abamps
abandon
abandoning
abandons
abas
abash
abashing
abasing


37641

In [11]:
# Tallying number of words in words.txt

fin = open('words.txt')
tally = 0
for line in fin:
    tally += 1

# Calculating and printing the percentage of words without 'e'
    
print("{0:.6}% of the words in 'words.txt' do not have the letter 'e'.".format((no_e_tally/tally) * 100))

33.0738% of the words in 'words.txt' do not have the letter 'e'.
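
*__The two passes over the file (one for the words without "e", one for the total) could also be folded into a single pass. A minimal sketch, assuming a hypothetical helper named `percent_without` that takes any iterable of lines:__*

```python
def percent_without(lines, letter='e'):
    """Return the percentage of words in lines that do not contain letter."""
    total = 0
    without = 0
    for line in lines:
        word = line.strip()
        total += 1
        if letter not in word:
            without += 1
    return 100 * without / total

# With the real file (assumes words.txt is in the working directory):
# with open('words.txt') as fin:
#     print(percent_without(fin))

print(percent_without(['banana\n', 'grape\n', 'kiwi\n', 'pear\n']))
```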


#### Exercise 3  
*Write a function named `avoids` that takes a word and a string of forbidden letters, and that returns `True` if the word doesn’t use any of the forbidden letters.*

*Write a program that prompts the user to enter a string of forbidden letters and then prints the number of words that don’t contain any of them. Can you find a combination of 5 forbidden letters that excludes the smallest number of words?*


In [54]:
def avoids(forbidden, word):
    """
    Returns False if word contains one of 
    the characters in the forbidden string.
    """
    for character in forbidden:
        if character in word:
            return False
    return True

In [13]:
forbidden = "abcde"
word = "grape"
avoids(forbidden, word)

False

In [14]:
forbidden = "abcde"
word = "stuff"
avoids(forbidden, word)

True

*__There are $\binom{26}{5}$ (65,780) different ways of choosing five letters from the alphabet. A quick way of selecting the five letters that exclude the fewest words might be to look at a table of letter frequencies (e.g., [this one from Wikipedia](https://en.wikipedia.org/wiki/Letter_frequency "letter frequency")) and choose the five least common letters:__*

In [15]:
# Counting the words in words.txt that contain at least one of 'zqxjk'
# (tally is the total word count computed in the previous section)

fin = open('words.txt')
forbidden = "zqxjk"
avoids_tally = tally
for line in fin:
    if avoids(forbidden, line.strip()):
        avoids_tally -= 1
    
avoids_tally

17945

*__While this may seem a logical way to approach the problem, it unfortunately does not generate the correct answer.  For the sake of completeness, I created an algorithm that generated all 65,780 possible five-letter combinations, and tested all of them with the function `avoids`.  The algorithm that generated the combinations is fairly verbose, but it uses only functions that have been introduced in the book so far:__*

In [16]:
# create a string with all of the letters of the alphabet

alphabet = ""
for i in range(ord('a'), ord('z') + 1):
    alphabet += chr(i)
    
# Use that string to generate all the possible combinations
p1 = 0

# Again we're limiting results to the first 30
i = 0
tally = 0
while p1 < len(alphabet):
    p2 = p1 + 1
    while p2 < len(alphabet):
        p3 = p2 + 1
        while p3 < len(alphabet):
            p4 = p3 + 1
            while p4 < len(alphabet):
                p5 = p4 + 1
                while p5 < len(alphabet):
                    letter_seq = (alphabet[p1] + alphabet[p2] + alphabet[p3] +
                                 alphabet[p4] + alphabet[p5])
                    if i < 30:
                        print(letter_seq)
                        i += 1
                    tally += 1
                    p5 += 1
                p4 += 1
            p3 += 1
        p2 += 1
    p1 += 1
    
print("...\nThe total number of combinations would be:", tally)

abcde
abcdf
abcdg
abcdh
abcdi
abcdj
abcdk
abcdl
abcdm
abcdn
abcdo
abcdp
abcdq
abcdr
abcds
abcdt
abcdu
abcdv
abcdw
abcdx
abcdy
abcdz
abcef
abceg
abceh
abcei
abcej
abcek
abcel
abcem
...
The total number of combinations would be: 65780
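
*__For a cross-check, the standard library can produce the same combinations. This sketch uses `itertools.combinations` and `math.comb` (Python 3.8+), neither of which has been introduced in the book at this point; `letter_combos` is a hypothetical name.__*

```python
from itertools import combinations
from math import comb  # available in Python 3.8 and later
from string import ascii_lowercase

def letter_combos(letters=ascii_lowercase, k=5):
    """Return every k-letter combination of letters as a string, in order."""
    return [''.join(c) for c in combinations(letters, k)]

combos = letter_combos()
print(combos[:3])                # first few combinations
print(len(combos), comb(26, 5))  # both counts should agree
```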


*__There are recursive functions that generate the same combinations - cf. [here](http://amjith.blogspot.com/2011/10/picking-items-from-list-of-recursion.html "recursive function to generate combinations"); but these use lists, which have not yet been covered in "Think Python 2e". Recursive functions can also be slower than their iterative equivalents, although in this case the difference turned out to be small.__*

In [17]:
# from http://amjith.blogspot.com/2011/10/picking-items-from-list-of-recursion.html

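def combo(w, l):
    lst = []
    for i in range(len(w)):
        if l == 1:
            lst.append(w[i])
        else:
            # recurse only when more letters are needed; without this
            # else, the l == 1 case keeps recursing and adds nothing
            for c in combo(w[i+1:], l-1):
                lst.append(w[i] + c)
    return lst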



In [46]:
combo_list = combo(alphabet, 5)
combo_list[:30]

['abcde',
 'abcdf',
 'abcdg',
 'abcdh',
 'abcdi',
 'abcdj',
 'abcdk',
 'abcdl',
 'abcdm',
 'abcdn',
 'abcdo',
 'abcdp',
 'abcdq',
 'abcdr',
 'abcds',
 'abcdt',
 'abcdu',
 'abcdv',
 'abcdw',
 'abcdx',
 'abcdy',
 'abcdz',
 'abcef',
 'abceg',
 'abceh',
 'abcei',
 'abcej',
 'abcek',
 'abcel',
 'abcem']

*__I tested all the combinations on a fairly fast machine (Core i7 processor, 8GB RAM), but the following code was very slow and took nearly two hours to run. If you have a couple of hours to kill, the code can be found [here](https://github.com/Sturzgefahr/ThinkPython/blob/master/Think%20Python%20-%20Chapter%2009/least_frequent_combo.py "least_frequent_combo.py").__*


```
# For every five-letter combination, counting the words in words.txt that it excludes

fin = open('words.txt')

# Tallying number of words in words.txt

tally = 0
for line in fin:
    tally += 1
    
# initializing the tally and the least_freq_combo; starting from the
# total word count means any real combination can only lower it

least_freq_tally = tally
least_freq_combo = ""

# creating all possible combinations 

p1 = 0
while p1 < len(alphabet):
    p2 = p1 + 1
    while p2 < len(alphabet):
        p3 = p2 + 1
        while p3 < len(alphabet):
            p4 = p3 + 1
            while p4 < len(alphabet):
                p5 = p4 + 1
                while p5 < len(alphabet):
                    forbidden = (alphabet[p1] + alphabet[p2] + alphabet[p3] +
                                 alphabet[p4] + alphabet[p5])
                    avoids_tally = tally
                    fin = open('words.txt')
                    
                    # testing the new combination: subtract every word that
                    # avoids it, leaving the number of words it excludes
                    
                    for line in fin:
                        if avoids(forbidden, line.strip()):
                            avoids_tally -= 1
                    
                    # keeping the combination that excludes the fewest words
                    
                    if avoids_tally < least_freq_tally:
                        least_freq_tally = avoids_tally
                        least_freq_combo = forbidden
                    
                    p5 += 1
                p4 += 1
            p3 += 1
        p2 += 1
    p1 += 1
    
print(least_freq_combo)
print(least_freq_tally)
```

*__After about two hours, I got my result: `jqwxz` excluded the fewest words: 17384, or 561 fewer than just using the five least common letters in English:__*

In [56]:
fin = open('words.txt')
tally = 0
for line in fin:
    tally += 1

fin = open('words.txt')
forbidden = "jqwxz"
avoids_tally = tally
for line in fin:
    if avoids(forbidden, line.strip()):
        avoids_tally -= 1
    
avoids_tally

17384

*__For the sake of comparison I tried to run through the different combinations once more, this time using those produced by the recursive algorithm.  The differences in time to complete were negligible (85 mins. vs. 89 mins.), and the two programs produced the same result. The code is also available [here](https://github.com/Sturzgefahr/ThinkPython/blob/master/Think%20Python%20-%20Chapter%2009/least_frequent_combo_from_list.py "least_frequent_combo_from_list.py").__* 

```
# For every five-letter combination, counting the words in words.txt that it excludes

fin = open('words.txt')

# Tallying number of words in words.txt

tally = 0
for line in fin:
    tally += 1
    
# initializing the tally and setting the least_freq_combo

least_freq_tally = tally
least_freq_combo = ""

for c in combo_list:
    avoids_tally = tally
    fin = open('words.txt')
    for line in fin:
        if avoids(c, line.strip()):
            avoids_tally -=1

    if avoids_tally < least_freq_tally:
        least_freq_tally = avoids_tally
        least_freq_combo = c
```

```
print(least_freq_combo)
print(least_freq_tally)

jqwxz
17384
```
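
*__Much of the running time above probably goes into re-reading `words.txt` for every combination. One possible shortcut, sketched here under my own assumptions and not timed, is to read the words once, encode each word's letters as a bitmask, and count identical masks together. The names `letter_mask` and `least_excluding_combo` are hypothetical, the code assumes lowercase a-z words only, and it uses `collections.Counter` and `itertools.combinations`, which the book has not covered yet.__*

```python
from collections import Counter
from itertools import combinations
from string import ascii_lowercase

def letter_mask(letters):
    """Encode a collection of lowercase letters as an integer bitmask."""
    mask = 0
    for ch in letters:
        mask |= 1 << (ord(ch) - ord('a'))
    return mask

def least_excluding_combo(words, k=5, alphabet=ascii_lowercase):
    """Return (combo, excluded) for the k letters that exclude the fewest words."""
    # Words with the same set of letters behave identically, so count them once.
    mask_counts = Counter(letter_mask(w) for w in words)
    best_combo, best_excluded = None, None
    for combo in combinations(alphabet, k):
        forbidden = letter_mask(combo)
        # A word is excluded if it shares at least one letter with the combo.
        excluded = sum(n for m, n in mask_counts.items() if m & forbidden)
        if best_excluded is None or excluded < best_excluded:
            best_combo, best_excluded = ''.join(combo), excluded
    return best_combo, best_excluded

# With the real file (assumes words.txt is in the working directory):
# with open('words.txt') as fin:
#     print(least_excluding_combo([line.strip() for line in fin]))

print(least_excluding_combo(['ab', 'ac', 'ad', 'a', 'b'], k=2, alphabet='abcd'))
```

*__On ties, the sketch keeps the first combination it meets in alphabetical order.__*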

#### Exercise 4  

*Write a function named `uses_only` that takes a word and a string of letters, and that returns `True` if the word contains only letters in the list. Can you make a sentence using only the letters `acefhlo`? Other than “Hoe alfalfa”?*

*__Here are three ways we could write the function.  The first uses the index location to traverse `word`:__*

In [20]:
def uses_only(word, string):
    """
    Returns True if word contains only
    characters also found in string
    """
    i = 0
    while i < len(word):
        if word[i] in string:
            i += 1
        else:
            return False
    return True
        

*__The second uses a rather inelegant combination of `in` and `pass` to eliminate words with characters not in the string:__*

In [21]:
def uses_only(word, string):
    """
    Returns True if word contains only
    characters also found in string
    """
    for char in word:
        if char in string:
            pass
        else:
            return False
    return True

*__The third is more concise, but uses `not`, which I believe has not yet been introduced:__*

In [22]:
def uses_only(word, string):
    """
    Returns True if word contains only
    characters also found in string
    """
    for char in word:
        if char not in string:
            return False
    return True

In [23]:
uses_only('aahed', 'acefhlo')

False
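
*__A fourth possibility, shown only as a sketch because sets have not been introduced in the book at this point, is to compare the letters as sets. `uses_only_set` is a hypothetical name chosen so that it does not overwrite the versions above.__*

```python
def uses_only_set(word, letters):
    """Return True if every character of word also appears in letters."""
    # <= on sets tests "is a subset of"
    return set(word) <= set(letters)

print(uses_only_set('aahed', 'acefhlo'))
print(uses_only_set('alfalfa', 'acefhlo'))
```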

In [24]:
fin = open('words.txt')

i = 0
tally = 0 
for line in fin:
    word = line.strip()
    if uses_only(word, 'acefhlo'):
        
        # As before, only printing the first 30 results
        if i < 30:
            print(word)
            i += 1
        tally += 1
        
print("...\nThe final tally would be:", tally)

aa
aah
aal
ace
ache
achoo
ae
aff
ah
aha
ahchoo
ala
alae
alcohol
ale
alec
alee
alef
alfa
alfalfa
all
allele
allheal
aloe
aloha
aloof
cacao
cache
caeca
caecal
...
The final tally would be: 188


*__I'm not terribly interested in the second challenge of this exercise, but just for shits & giggles: "each cacao leaf fell, achoo!"__*

#### Exercise 5  

*Write a function named `uses_all` that takes a word and a string of required letters, and that returns `True` if the word uses all the required letters at least once. How many words are there that use all the vowels `aeiou`? How about `aeiouy`?*

*__As with `uses_only`, there are three ways we could write the function.  The first way:__*

In [25]:
def uses_all(word, string):
    """
    Returns True if word uses all
    characters in string
    """
    i = 0
    while i < len(string):
        if string[i] in word:
            i += 1
        else:
            return False
    return True
        

*__The second:__*

In [26]:
def uses_all(word, string):
    """
    Returns True if word uses all
    characters in string
    """
    for char in string:
        if char in word:
            pass
        else:
            return False
    return True

*__The third:__*

In [27]:
def uses_all(word, string):
    """
    Returns True if word uses all
    characters in string
    """
    for char in string:
        if char not in word:
            return False
    return True

In [28]:
uses_all('eulogia', 'aeiou')

True

In [29]:
uses_all('eulogy', 'aeiou')

False

In [30]:
# Words that use all vowels at least once:

fin = open('words.txt')

# As before, we'll print a partial list and keep a tally to show the size of 
# the full list
i = 0
tally = 0

for line in fin:
    word = line.strip()
    if uses_all(word, 'aeiou'):
        if i < 30:
            print(word)
            i += 1
        tally += 1
        
print("...\nIn total,", tally, 
      "words in the list use all five vowels.")

aboideau
aboideaus
aboideaux
aboiteau
aboiteaus
aboiteaux
abstemious
abstemiously
accentuation
accentuations
accountabilities
accountancies
accoutering
adulteration
adulterations
adventitious
adventitiously
adventitiousness
adventitiousnesses
aerobium
aeronautic
aeronautical
aeronautically
aeronautics
agouties
ambidextrous
ambidextrously
antibourgeois
anticonsumer
antievolution
...
In total, 598 words in the list use all five vowels.


In [31]:
# Words that use all vowels and 'y' at least once:

fin = open('words.txt')
for line in fin:
    word = line.strip()
    if uses_all(word, 'aeiouy'):
        print(word)

abstemiously
adventitiously
aeronautically
ambidextrously
antievolutionary
antirevolutionary
antiunemployment
authoritatively
autotypies
buoyancies
counterinflationary
evolutionary
extracommunity
facetiously
genitourinary
gregariously
hyperanxious
hypercautious
hyperfastidious
inconsequentially
instantaneously
intravenously
mendaciously
miscellaneously
nefariously
neurologically
neurotically
ostentatiously
outwearying
postrevolutionary
precariously
precautionary
prerevolutionary
revolutionary
sacrilegiously
simultaneously
tenaciously
uncomplimentary
unconventionally
unequivocally
unintentionally
unquestionably


#### Exercise 6  

*Write a function called `is_abecedarian` that returns `True` if the letters in a word appear in alphabetical order (double letters are ok). How many abecedarian words are there?*

In [32]:
def is_abecedarian(word):
    i = 0
    while i < len(word) - 1:
        if word[i] <= word[i + 1]:
            i += 1
        else:
            return False
    return True


In [33]:
is_abecedarian("about")

False

In [34]:
is_abecedarian("abbot")

True

*__Technically, looping with indices isn't introduced until the next section of this chapter; but the alternatives (which are presented in 9.4) seem clunky in comparison.__*

In [35]:
# find abecedarian words:

fin = open('words.txt')

abc_tally = 0
i = 0

for line in fin:
    word = line.strip()
    if is_abecedarian(word):
        # As above, printing only the first 30 words
        if i < 30:
            print(word)
            i += 1
        abc_tally += 1
        
print("...\nThere are", abc_tally, "abecedarian words in the list.")

aa
aah
aahs
aal
aals
aas
abbe
abbes
abbess
abbey
abbot
abet
abhor
abhors
ably
abo
abort
abos
abuzz
aby
accent
accept
access
accost
ace
acers
aces
achoo
achy
act
...
There are 596 abecedarian words in the list.
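
*__Another way to express the same test, sketched here with the built-in `sorted` (not yet covered in the book), is to compare the word's letters with a sorted copy of them. `is_abecedarian_sorted` is a hypothetical name.__*

```python
def is_abecedarian_sorted(word):
    """Return True if the letters of word are already in alphabetical order."""
    # sorted() returns a list, so compare against a list of the letters
    return list(word) == sorted(word)

print(is_abecedarian_sorted('about'))
print(is_abecedarian_sorted('abbot'))
```

*__Sorting does more work than the index loop needs, but it keeps the definition to a single comparison.__*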


### 9.3 Search

__Reduction to a previously solved problem:__ `uses_all` could have been written using `uses_only`:



In [36]:
def uses_all(word, required):
    return uses_only(required, word)

In [37]:
# Words that use all vowels and 'y' at least once:

fin = open('words.txt')
for line in fin:
    word = line.strip()
    if uses_all(word, 'aeiouy'):
        print(word)

abstemiously
adventitiously
aeronautically
ambidextrously
antievolutionary
antirevolutionary
antiunemployment
authoritatively
autotypies
buoyancies
counterinflationary
evolutionary
extracommunity
facetiously
genitourinary
gregariously
hyperanxious
hypercautious
hyperfastidious
inconsequentially
instantaneously
intravenously
mendaciously
miscellaneously
nefariously
neurologically
neurotically
ostentatiously
outwearying
postrevolutionary
precariously
precautionary
prerevolutionary
revolutionary
sacrilegiously
simultaneously
tenaciously
uncomplimentary
unconventionally
unequivocally
unintentionally
unquestionably
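
*__As a small sanity check on the reduction, the loop version and the reduced version can be compared on a handful of words. This is only a sketch: the functions are redefined here under the hypothetical names `uses_all_loop` and `uses_all_reduced` so that the block runs on its own.__*

```python
def uses_only(word, letters):
    """Return True if word contains only characters found in letters."""
    for char in word:
        if char not in letters:
            return False
    return True

def uses_all_loop(word, required):
    """Return True if word uses every character in required."""
    for char in required:
        if char not in word:
            return False
    return True

def uses_all_reduced(word, required):
    """Same question, answered by swapping the arguments of uses_only."""
    return uses_only(required, word)

samples = ['eulogia', 'eulogy', 'facetiously', 'aa', '']
for w in samples:
    # the two versions should always agree
    print(w, uses_all_loop(w, 'aeiou'), uses_all_reduced(w, 'aeiou'))
```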


### 9.4 Looping with indices

*__No notes.__*

### 9.5 Debugging

*__No notes.__*

### 9.6 Glossary

*__No notes.__*

### 9.7 Exercises

#### Exercise 7  

*This question is based on a Puzzler that was broadcast on the radio program Car Talk (http://www.cartalk.com/content/puzzlers):*

> *Give me a word with three consecutive double letters. I’ll give you a couple of words that almost qualify, but don’t. For example, the word committee, c-o-m-m-i-t-t-e-e. It would be great except for the ‘i’ that sneaks in there. Or Mississippi: M-i-s-s-i-s-s-i-p-p-i. If you could take out those i’s it would work. But there is a word that has three consecutive pairs of letters and to the best of my knowledge this may be the only word. Of course there are probably 500 more but I can only think of one. What is the word?*

*Write a program to find it.*

In [38]:
def has_three_consecutive_double_letters(w):
    i = 0
    while i < (len(w) - 5):
        if w[i] == w[i + 1] and w[i + 2] == w[i + 3] and w[i + 4] == w[i + 5]:
            return True
        i += 1
    return False
        
        

In [39]:
has_three_consecutive_double_letters('sesquipedalian')

False

In [40]:
has_three_consecutive_double_letters('abcdeffgghh')

True

In [41]:
# Words with three consecutive double letters:

fin = open('words.txt')
for line in fin:
    word = line.strip()
    if has_three_consecutive_double_letters(word):
        print(word)

bookkeeper
bookkeepers
bookkeeping
bookkeepings


#### Exercise 8

*Here’s another Car Talk Puzzler (http://www.cartalk.com/content/puzzlers):*

> *“I was driving on the highway the other day and I happened to notice my odometer. Like most odometers, it shows six digits, in whole miles only. So, if my car had 300,000 miles, for example, I’d see 3-0-0-0-0-0.*

> *“Now, what I saw that day was very interesting. I noticed that the last 4 digits were palindromic; that is, they read the same forward as backward. For example, 5-4-4-5 is a palindrome, so my odometer could have read 3-1-5-4-4-5.*

> *“One mile later, the last 5 numbers were palindromic. For example, it could have read 3-6-5-4-5-6. One mile after that, the middle 4 out of 6 numbers were palindromic. And you ready for this? One mile later, all 6 were palindromic!*

> *“The question is, what was on the odometer when I first looked?”*

*Write a Python program that tests all the six-digit numbers and prints any numbers that satisfy these requirements.* 

In [42]:
def is_palindrome(word):
    """
    Returns True if word is palindromic.
    """
    return word == word[::-1]

In [43]:
# i + 3 must still be a six-digit reading, so i stops at 999996
for i in range(100000, 999997):
    if is_palindrome(str(i)[2:]):
        if is_palindrome(str(i + 1)[1:]):
            if is_palindrome(str(i + 2)[1:5]):
                if is_palindrome(str(i + 3)):
                    print(i)

198888
199999


*__While both 198888 and 199999 satisfy the requirements of the puzzle, it seems that 198888 is the better answer. In 198888 the last four digits are palindromic; in 198889 the last five are; in 198890 the middle four are; and in 198891 all six are. However, in 199999 the last five digits, and not only the last four, are already palindromic, and it seems the creator of the puzzle would have said so if that were the reading he had in mind.__*
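
*__The search above starts at 100000, so readings with leading zeros (a car with fewer than 100,000 miles) are never tested. The sketch below pads every reading to six digits with `zfill` and covers the whole range; `odometer_candidates` is a hypothetical name. Under that assumption the search still turns up only the same two readings.__*

```python
def is_palindrome(text):
    """Return True if text reads the same forward and backward."""
    return text == text[::-1]

def odometer_candidates(start=0, stop=999997):
    """Return every reading i in [start, stop) that satisfies the puzzle."""
    found = []
    for i in range(start, stop):
        if (is_palindrome(str(i).zfill(6)[2:]) and           # last four digits
                is_palindrome(str(i + 1).zfill(6)[1:]) and   # last five digits
                is_palindrome(str(i + 2).zfill(6)[1:5]) and  # middle four digits
                is_palindrome(str(i + 3).zfill(6))):         # all six digits
            found.append(i)
    return found

result = odometer_candidates()
print(result)
```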

#### Exercise 9

*Here’s another Car Talk Puzzler you can solve with a search (http://www.cartalk.com/content/puzzlers):*

> *“Recently I had a visit with my mom and we realized that the two digits that make up my age when reversed resulted in her age. For example, if she’s 73, I’m 37. We wondered how often this has happened over the years but we got sidetracked with other topics and we never came up with an answer.*

> *“When I got home I figured out that the digits of our ages have been reversible six times so far. I also figured out that if we’re lucky it would happen again in a few years, and if we’re really lucky it would happen one more time after that. In other words, it would have happened 8 times over all. So the question is, how old am I now?”*

*Write a Python program that searches for solutions to this Puzzler. Hint: you might find the string method `zfill` useful.*

In [44]:
def are_palindromes(word1, word2):
    """
    Returns True if word1 is the reverse of word2.
    """
    return word1 == word2[::-1]

# Initialize counts at zero

num_reversed_ages = 0
age_difference_with_most_reversed_ages = 0

# Search for occurrences of reversed ages for a reasonable range
# of age differences, here 10 through 54 (i.e., it's crazy to think
# his mother was younger than 10 or older than 54 when she had him).

for age_difference in range(10, 55):
    reversed_ages = 0
    
    # tally the number of reversed ages for this age difference
    for my_age in range(0, 99):
        moms_age = my_age + age_difference
        
        # this riddle won't work for triple-digit ages
        if moms_age < 100:
            if are_palindromes(str(my_age).zfill(2), str(moms_age)):
                reversed_ages += 1
    
    # record this difference if it sets a new maximum
    if reversed_ages > num_reversed_ages:
        num_reversed_ages = reversed_ages
        age_difference_with_most_reversed_ages = age_difference
        
print("The age difference with the most reversed ages:",
      age_difference_with_most_reversed_ages)
print("Number of reversed ages:", num_reversed_ages)

The age difference with the most reversed ages: 18
Number of reversed ages: 8


In [45]:
# print out reversed ages for this difference

for my_age in range(0, 99):
    age_difference = 18
    moms_age = my_age + age_difference
    if moms_age < 100:
        if are_palindromes(str(my_age).zfill(2), str(moms_age)):
            print(str(my_age).zfill(2), " ", str(moms_age))

02   20
13   31
24   42
35   53
46   64
57   75
68   86
79   97
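
*__The puzzle asks for the current age, which the listing above leaves implicit. Reading "six times so far" as meaning that the sixth reversal is the current one, the answer would be the sixth pair in the list. A minimal sketch, with `reversed_age_pairs` as a hypothetical helper:__*

```python
def reversed_age_pairs(age_difference):
    """Return (my_age, moms_age) pairs whose two-digit forms are reverses."""
    pairs = []
    # keep the mother's age below 100 so both ages fit in two digits
    for my_age in range(100 - age_difference):
        moms_age = my_age + age_difference
        if str(my_age).zfill(2) == str(moms_age).zfill(2)[::-1]:
            pairs.append((my_age, moms_age))
    return pairs

pairs = reversed_age_pairs(18)
print(len(pairs))  # number of reversals for an 18-year difference
print(pairs[5])    # the sixth reversal
```

*__With an age difference of 18 the sixth pair is 57 and 75, so under this reading he is 57 now.__*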
