# Think Python 

## Chapter 12 Tuples

The HTML version of this chapter can be found [here](http://greenteapress.com/thinkpython2/html/thinkpython2013.html "Chpt 12").



### 12.4 Variable-length argument tuples

*As an exercise, write a function called `sum_all` that takes any number of arguments and returns their sum.*

In [1]:
def sum_all(*args):
    return sum(args)

In [2]:
sum_all(1, 2, 3)

6
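
A short sketch of how `*` works in both directions, assuming nothing beyond the `sum_all` definition above (repeated here so the block runs on its own): in a parameter list it gathers arguments into a tuple, and in a call it scatters a sequence into separate arguments.

```python
def sum_all(*args):
    # *args gathers any number of positional arguments into a tuple
    return sum(args)

t = (1, 2, 3)

# the built-in sum takes a single iterable, so sum(1, 2, 3) would raise a TypeError
print(sum(t))

# * in a call scatters the tuple into separate arguments
print(sum_all(*t))

# with no arguments, args is the empty tuple and the sum is 0
print(sum_all())
```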



### 12.10 Exercises

#### Exercise 1  

*Write a function called `most_frequent` that takes a string and prints the letters in decreasing order of frequency. Find text samples from several different languages and see how letter frequency varies between languages. Compare your results with the tables at http://en.wikipedia.org/wiki/Letter_frequencies.*

*__One immediate problem with this exercise is that the most frequent character in most texts will be the space character. We probably don't want non-alphabetic characters (e.g., punctuation, digits, and spaces) in the analysis, so it's a good idea to include code to filter them out. Instead of writing two functions, one for alphabetic characters only and another for all characters, I decided to handle both cases with a conditional. Additionally, I included an option to either print the formatted results or return them in a table. Although default arguments haven't been covered yet in the book, I used them for both of these options. And as in earlier exercises, I'm using `print("".format())` for the sake of aesthetics.__*

In [3]:
def most_frequent(sample, only_alpha = True, return_table = False):
    """
    Tabulates the number of characters in a text and 
    prints out the characters and their 
    percentage of the sample in order of frequency.  
    
    Arguments:
    
    sample: text to be analyzed
    
    only_alpha: determines if non-alphabetic
    characters (e.g., punctuation, numbers, and spaces) 
    are tabulated. Default is True.
    
    return_table: determines if results are to be 
    formatted and printed, or returned in a non-formatted
    table. Default is False.
    """
    
    d = {}

    # if we only want to consider alphabetic characters
    
    if only_alpha:
    
        for s in sample:
            if s.isalpha():
                s = s.lower()
                d[s] = 1 + d.get(s, 0)
            
    # if we want to analyze all characters in the text
    
    else:             
        
        for s in sample:
            s = s.lower()
            d[s] = 1 + d.get(s, 0) 
            
    # the next conditional will either return a table, or 
    # print formatted results
    
    if return_table:
        
        t = []
        for(y, z) in reversed(sorted(zip(d.values(), d.keys()))):
            t.append([z, y/len(sample) * 100])

        return t
    
    else:

        for(y, z) in reversed(sorted(zip(d.values(), d.keys()))):
            print("'{}': {:.3f}%".format(z, (y/len(sample)) * 100))

*__I used the Project Gutenberg text of "Through the Looking-Glass" by Lewis Carroll, available [here](https://www.gutenberg.org/files/12/12-0.txt "Through the Looking-Glass"). Note that Project Gutenberg texts start and end with boilerplate, which I removed manually before analysis. I also had to set the encoding when opening the file.__*

In [5]:
alice = open('alice.txt', encoding="utf8").read()
most_frequent(alice)

'e': 9.397%
't': 7.132%
'a': 5.880%
'o': 5.384%
'i': 5.131%
'h': 4.984%
'n': 4.959%
's': 4.415%
'r': 3.524%
'd': 3.463%
'l': 3.247%
'u': 2.514%
'w': 1.878%
'g': 1.785%
'y': 1.740%
'c': 1.522%
'm': 1.517%
'f': 1.328%
'p': 0.996%
'b': 0.962%
'k': 0.938%
'v': 0.606%
'q': 0.215%
'j': 0.097%
'x': 0.091%
'z': 0.031%


*__The same analysis, but with the results in a table.__*

In [6]:
most_frequent(alice, return_table = True)[0:5]

[['e', 9.396908392475074],
 ['t', 7.131538219643485],
 ['a', 5.879850290724561],
 ['o', 5.384107879468002],
 ['i', 5.131303913528712]]

*__A table of the same text, now with non-alphabetic characters.__*

In [7]:
most_frequent(alice, only_alpha = False, return_table = True)[0:5]

[[' ', 17.72340779746086],
 ['e', 9.396908392475074],
 ['t', 7.131538219643485],
 ['a', 5.879850290724561],
 ['o', 5.384107879468002]]

*__I tried a Hungarian text to see how the function would deal with diacritics. The text is "Az arany ember (2. rész)" by Mór Jókai, and was also found at [Project Gutenberg](https://www.gutenberg.org/files/56592/56592-0.txt "Hungarian Text").__*

In [8]:
hungarian = open('hungarian.txt', encoding="utf8").read()

In [9]:
most_frequent(hungarian)

'e': 8.122%
'a': 7.480%
't': 6.877%
'n': 4.897%
'l': 4.841%
's': 4.121%
'i': 3.604%
'o': 3.567%
'k': 3.433%
'z': 3.382%
'm': 3.351%
'r': 3.181%
'g': 2.976%
'á': 2.573%
'é': 2.320%
'y': 2.011%
'd': 1.801%
'h': 1.626%
'v': 1.570%
'b': 1.338%
'j': 0.931%
'ö': 0.866%
'f': 0.788%
'u': 0.786%
'ő': 0.654%
'c': 0.641%
'p': 0.627%
'ó': 0.613%
'ü': 0.397%
'í': 0.213%
'ú': 0.166%
'ű': 0.084%
'w': 0.060%
'x': 0.007%
'q': 0.003%
'æ': 0.002%
'è': 0.000%


*__Finally, I tried an Italian text: "La Gioconda" by Gabriele d'Annunzio, which is available [here](https://www.gutenberg.org/ebooks/23297.txt.utf-8 "Italian Text").__*

In [10]:
italian = open('italian.txt', encoding="utf8").read()

In [11]:
most_frequent(italian)

'a': 9.429%
'e': 8.504%
'i': 7.529%
'o': 7.014%
'l': 5.580%
'n': 5.104%
't': 4.761%
'r': 4.558%
's': 4.272%
'c': 3.464%
'd': 2.752%
'u': 2.655%
'm': 2.122%
'p': 1.777%
'v': 1.722%
'g': 1.283%
'h': 1.022%
'b': 0.811%
'f': 0.776%
'z': 0.468%
'q': 0.360%
'è': 0.323%
'à': 0.202%
'ù': 0.125%
'ò': 0.092%
'ì': 0.075%
'ó': 0.005%
'é': 0.005%
'k': 0.003%
'ú': 0.001%
'í': 0.001%
'æ': 0.001%
'º': 0.001%
'x': 0.001%


*__The rankings and frequencies are very close to what one can find at the [Wikipedia entry for letter frequencies](https://en.wikipedia.org/wiki/Letter_frequency "Letter frequency"), though we have to bear in mind that it would be quite difficult to get reliable results on the basis of only one sample.__*
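
One caveat when making this comparison: `most_frequent` divides each count by `len(sample)`, so the percentages are relative to the whole text, spaces and punctuation included. If the reference tables are computed over letters only, dividing by the number of alphabetic characters gives directly comparable figures. A minimal sketch of that variant, using `collections.Counter`; the name `letter_percentages` is hypothetical and not part of the solution above:

```python
from collections import Counter

def letter_percentages(sample):
    """
    Returns (letter, percentage) pairs in decreasing order of
    frequency, with percentages taken over alphabetic characters only.
    """
    counts = Counter(c.lower() for c in sample if c.isalpha())
    total = sum(counts.values())
    if total == 0:
        return []
    # most_common() orders by count, highest first
    return [(c, n / total * 100) for c, n in counts.most_common()]

print(letter_percentages("Hello, World!")[:3])
```

With this normalization the percentages of all letters sum to 100, which is not the case when the denominator includes spaces and punctuation.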

#### Exercise 2  

*More anagrams!*

*Write a program that reads a word list from a file (see Section 9.1) and prints all the sets of words that are anagrams.*
*Here is an example of what the output might look like:*

```
['deltas', 'desalt', 'lasted', 'salted', 'slated', 'staled']
['retainers', 'ternaries']
['generating', 'greatening']
['resmelts', 'smelters', 'termless']
```

*Hint: you might want to build a dictionary that maps from a collection of letters to a list of words that can be spelled with those letters. The question is, how can you represent the collection of letters in a way that can be used as a key?*

*Modify the previous program so that it prints the longest list of anagrams first, followed by the second longest, and so on.*

*In Scrabble a “bingo” is when you play all seven tiles in your rack, along with a letter on the board, to form an eight-letter word. What collection of 8 letters forms the most possible bingos?*


In [12]:
fin = open('words.txt')

In [13]:
def sort_letters(string):
    """
    Returns the letters in string as a new string
    whose letters are in alphabetical order.
    """
        
    return ''.join(sorted(list(string.lower())))

def find_anagrams(text):
    """
    Takes a text and returns a list of tuples of anagrams.
    First item in the tuple is the number of anagrams formed.
    Second item is the letters used.
    Third item is the anagrams.
    """

    sorted_dict = {}

    for line in text:
        orig_word = line.strip()
        sorted_word = sort_letters(orig_word)
        sorted_dict.setdefault(sorted_word, []).append(orig_word)
        
    anagrams = []

    for k, v in sorted_dict.items():
        l = len(v)
        if l > 1:
            anagrams.append((l, k, v))
            
    return anagrams          

def find_longest_list_anagrams(text):
    """
    Takes a text and returns a list of tuples of anagrams.
    First item in the tuple is the number of anagrams formed.
    Second item is the letters used.
    Third item is the anagrams.
    Tuples are listed in decreasing order of frequency.
    """
    
    anagrams = find_anagrams(text)
    longest_list_anagrams = []

    for l, k, v  in reversed(sorted(anagrams)):
        longest_list_anagrams.append((l, k, v))
        
    return longest_list_anagrams

In [14]:
# For the sake of brevity, only listing first five

longest_list_anagrams = find_longest_list_anagrams(fin)
longest_list_anagrams[:5]

[(11,
  'aeprs',
  ['apers',
   'asper',
   'pares',
   'parse',
   'pears',
   'prase',
   'presa',
   'rapes',
   'reaps',
   'spare',
   'spear']),
 (11,
  'aelrst',
  ['alerts',
   'alters',
   'artels',
   'estral',
   'laster',
   'ratels',
   'salter',
   'slater',
   'staler',
   'stelar',
   'talers']),
 (10,
  'aelst',
  ['least',
   'setal',
   'slate',
   'stale',
   'steal',
   'stela',
   'taels',
   'tales',
   'teals',
   'tesla']),
 (9,
  'einrst',
  ['estrin',
   'inerts',
   'insert',
   'inters',
   'niters',
   'nitres',
   'sinter',
   'triens',
   'trines']),
 (9,
  'aceprs',
  ['capers',
   'crapes',
   'escarp',
   'pacers',
   'parsec',
   'recaps',
   'scrape',
   'secpar',
   'spacer'])]
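
The key used in `find_anagrams` is the word's letters in sorted order, which is identical for every anagram of that word. A small self-contained sketch of the idea on an in-memory list follows. The name `group_anagrams` and the sample list are illustrative assumptions, and the groups are ordered with `sorted(..., key=len, reverse=True)` rather than by tuple comparison:

```python
def group_anagrams(words):
    """
    Maps each sorted-letter key to the words spelled with those
    letters, and returns the groups with more than one word,
    largest group first.
    """
    groups = {}
    for word in words:
        key = ''.join(sorted(word.lower()))
        groups.setdefault(key, []).append(word)
    multi = [g for g in groups.values() if len(g) > 1]
    return sorted(multi, key=len, reverse=True)

sample = ['retainers', 'deltas', 'ternaries', 'desalt', 'lasted', 'python']
for group in group_anagrams(sample):
    print(group)
```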

In [15]:
def find_most_scrabble_bingos(text):
    """
    Takes a text and returns a list with tuples of 
    eight-letter combinations that can be used to form anagrams.
    Tuples are listed in decreasing frequency of anagrams formed.
    First item in the tuple is the number of anagrams formed.
    Second item is the letters used.
    Third item is the anagrams.
    """
    lla = find_longest_list_anagrams(text)
    
    bingos = []
    for l, k, v in lla:
        if len(k) == 8:
            bingos.append((l, k, v))
        
    return bingos

In [16]:
# Need to reinitialize `fin`

fin = open('words.txt')

*__We actually only need the top result, but I'll show the first five, just as confirmation that the first result is actually the correct result.__*

In [17]:
find_most_scrabble_bingos(fin)[:5]

[(7,
  'aeginrst',
  ['angriest',
   'astringe',
   'ganister',
   'gantries',
   'granites',
   'ingrates',
   'rangiest']),
 (6,
  'aeinprst',
  ['painters', 'pantries', 'pertains', 'pinaster', 'pristane', 'repaints']),
 (6,
  'aegilnrt',
  ['alerting', 'altering', 'integral', 'relating', 'tanglier', 'triangle']),
 (6,
  'aegilnrs',
  ['aligners', 'engrails', 'nargiles', 'realigns', 'signaler', 'slangier']),
 (6,
  'aeegnrst',
  ['estrange', 'grantees', 'greatens', 'negaters', 'reagents', 'sergeant'])]

#### Exercise 3  

*Two words form a “metathesis pair” if you can transform one into the other by swapping two letters; for example, “converse” and “conserve”. Write a program that finds all of the metathesis pairs in the dictionary. Hint: don’t test all pairs of words, and don’t test all possible swaps.*

*__If two words form a "metathesis pair", that means they would also be anagrams of each other.  So we can save a tremendous amount of time by only considering words which are already anagrams, instead of all the words in the list.__*

*__I created `calculate_difference` to calculate by how many letters two words differ.  The function will only give a valid result if the words are the same length.  Although it hasn't been introduced yet in "Think Python 2e", I'm using `assert` to make sure this is the case.__*

In [18]:
# Need to reinitialize `fin`

fin = open('words.txt')

In [19]:
def calculate_difference(word1, word2):
    """
    Calculates the number of different characters
    in word1 and word2.  Words must be same length.
    """
    assert len(word1) == len(word2), "Words must be same length."
    
    diff = 0
    for x, y in zip(word1, word2):
        if x != y:
            diff += 1
            
    return diff

def find_metathesis_pairs(text):
    """
    Returns a list of 'metathesis pairs' - words that can
    be made into new words by exchanging two letters - 
    from a text.
    """

    # make anagrams
    anagrams = find_anagrams(text)

    mps = []

    # cycle through the tuples of anagrams and pull out words
    # which only differ by two letters
    for lng, let, ana in anagrams:
        for i in range(len(ana)):
            for j in range(i + 1, len(ana)):
                if calculate_difference(ana[i], ana[j]) == 2:
                    mps.append([ana[i], ana[j]])

    return mps

In [20]:
metathesis_pairs = find_metathesis_pairs(fin)

In [21]:
# ten random metathesis pairs:

from random import choice

# choice picks a valid element directly; randint(0, len(...)) includes
# the upper bound and could raise an IndexError
for i in range(10):
    print(choice(metathesis_pairs))

['maters', 'matres']
['decenter', 'decentre']
['boss', 'sobs']
['deucing', 'educing']
['certes', 'terces']
['keel', 'leek']
['arias', 'raias']
['whist', 'whits']
['realters', 'relaters']
['tarps', 'traps']
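
Why counting differing positions is enough: if two words are anagrams and differ in exactly two positions, the two mismatched letters must be each other's counterparts, so the words are one swap apart. A minimal sketch that checks a pair directly, without the anagram grouping (the function name `is_metathesis_pair` is a hypothetical helper, not part of the solution above):

```python
def is_metathesis_pair(word1, word2):
    """
    Returns True if word2 can be obtained from word1 by
    swapping exactly two letters.
    """
    if len(word1) != len(word2):
        return False
    # collect the letter pairs at positions where the words disagree
    diffs = [(x, y) for x, y in zip(word1, word2) if x != y]
    # exactly two mismatches, and they mirror each other
    return len(diffs) == 2 and diffs[0] == diffs[1][::-1]

print(is_metathesis_pair('converse', 'conserve'))
print(is_metathesis_pair('keel', 'leek'))
print(is_metathesis_pair('stale', 'least'))
```

Checking every pair of words this way would be far slower than grouping anagrams first, which is why the solution above only compares words within the same anagram group.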


#### Exercise 4  

*Here’s another Car Talk Puzzler (http://www.cartalk.com/content/puzzlers):*

> *What is the longest English word, that remains a valid English word, as you remove its letters one at a time?* 

> *Now, letters can be removed from either end, or the middle, but you can’t rearrange any of the letters. Every time you drop a letter, you wind up with another English word. If you do that, you’re eventually going to wind up with one letter and that too is going to be an English word—one that’s found in the dictionary. I want to know what’s the longest word and how many letters does it have?*

> *I’m going to give you a little modest example: Sprite. Ok? You start off with sprite, you take a letter off, one from the interior of the word, take the r away, and we’re left with the word spite, then we take the e off the end, we’re left with spit, we take the s off, we’re left with pit, it, and I.*


*Write a program to find all words that can be reduced in this way, and then find the longest one.*

*This exercise is a little more challenging than most, so here are some suggestions:*

<ol>
<li><i>You might want to write a function that takes a word and computes a list of all the words that can be formed by removing one letter. These are the “children” of the word.</i></li>
<li><i>Recursively, a word is reducible if any of its children are reducible. As a base case, you can consider the empty string reducible.</i></li>
<li><i>The wordlist I provided, `words.txt`, doesn’t contain single letter words. So you might want to add “I”, “a”, and the empty string.</i></li>
<li><i>To improve the performance of your program, you might want to memoize the words that are known to be reducible.</i></li>
</ol>

*__As was the case with all the problems in this chapter, I first came across this problem while I was working my way through ["Think Julia"](https://benlauwens.github.io/ThinkJulia.jl/latest/book.html#_exercises_14 "Think Julia, ex. 12.5"), and I remember finding the exercise extremely difficult.  Since there is no solution for this problem in "Think Julia", I had to consult [Allen Downey's Python code for this problem](http://thinkpython2.com/code/reducible.py "reducible.py"); but that was far from straightforward, as Python and Julia handle things such as dictionaries and conditional booleans quite differently, and I recall that I needed quite some time to translate the Python code into working Julia code.__*

*__While I remembered how to solve the rest of the problems in this chapter without consulting the code I had written while studying from "Think Julia", this exercise is much, much more sophisticated, and I didn't have the patience to solve it from scratch.  Therefore, most of this code is based on code I wrote for the Julia implementation of this problem, which is ultimately heavily based on Allen Downey's original solution.  I believe I found one error in Downey's solution.  The code will still run despite the error; but I still think it's an error, nonetheless.__*
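
The suggestions from the exercise can be illustrated on a toy word set. This is a compact sketch, not Downey's code or the solution in this notebook: the names `children` and `reducible` and the small `WORDS` set are assumptions for illustration, and memoization is handled by `functools.lru_cache` instead of an explicit dictionary.

```python
from functools import lru_cache

# toy dictionary built from the Car Talk example, plus the empty string
WORDS = {'sprite', 'spite', 'spit', 'pit', 'it', 'i', 'a', ''}

def children(word):
    """Returns the valid words formed by removing one letter from word."""
    candidates = {word[:i] + word[i + 1:] for i in range(len(word))}
    return sorted(c for c in candidates if c in WORDS)

@lru_cache(maxsize=None)
def reducible(word):
    """A word is reducible if it is empty or has a reducible child."""
    if word == '':
        return True
    return any(reducible(child) for child in children(word))

print(children('sprite'))
print(reducible('sprite'))
print(reducible('spot'))
```

The cache plays the role of the memo: each word's result is computed once, however many parents reach it.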

In [22]:
def make_word_dict(text):
    """
    Reads lines from text and 
    returns a dictionary.
    """
    d = {}
    for line in text:
        d[line.strip().lower()] = None
    to_add = ["i", "a", ""]
    for ta in to_add:
        d[ta] = None
    return d

In [23]:
def find_children(word, word_dict):
    """
    Returns a list of all valid words in word_dict 
    that can be formed by removing one letter from word.
    """
    
    children = []
    l = len(word)
    
    if l == 1:
        children.append("")
    
    else:
        for i in range(l):
            # drop the letter at index i; slicing covers both ends of the word
            child = word[:i] + word[i + 1:]
                
            if child in word_dict:
                children.append(child)
                
    return children



In [24]:
memo = {}
memo[""] = [""]

def is_reducible(word, word_dict):
    """
    Returns a list of its reducible children if a word 
    is reducible. A string is reducible if it has at 
    least one child that is also reducible. The empty 
    string is considered to be reducible. Adds an entry 
    to the memo dictionary.
    """
    
    if word in memo:
        return memo[word]
    
    results = []
    for child in find_children(word, word_dict):
        if is_reducible(child, word_dict):
            results.append(child)
            
    memo[word] = results
    return results

In [25]:
def all_reducible(word_dict):
    """
    Checks all words in word_dict and returns a list of reducible ones.
    """
    results = []
    for word in word_dict:
        t = is_reducible(word, word_dict)
        if t != []:
            results.append(word)
    return results

In [26]:
# Different in Downey's solution.  Error in original?

def print_trail(word, word_dict):
    """
    Prints the sequence of words that reduces this word 
    to the empty string.  Chooses the first if there is 
    more than one word in the array of reducible words.
    """
    
    if len(word) == 0:
        return
    
    print(word, end = " ")
    t = is_reducible(word, word_dict)
    print_trail(t[0], word_dict)

In [27]:
def print_longest_words(word_dict, n = 5):
    """
    Finds the longest reducible words in word_dict and 
    prints them and their trails. 
    """
    
    words = all_reducible(word_dict)
    
    t = []
    for word in words:
        t.append((len(word), word))
        
    t.sort(reverse = True)
    
    for x, word in t[0:n]:
        print_trail(word, word_dict)
        print("\n")
        

In [28]:
fin = open('words.txt')
my_dict = make_word_dict(fin)
print_longest_words(my_dict, 20)

complecting completing competing compting comping coping oping ping pig pi i 

twitchiest witchiest withiest withies withes wites wits its is i 

stranglers strangers stranger strange strang stang tang tag ta a 

staunchest stanchest stanches stances stanes sanes anes ane ae a 

restarting restating estating stating sating sting ting tin in i 

insolating isolating solating slating sating sting ting tin in i 

completing competing compting comping coping oping ping pig pi i 

carrousels carousels carouses arouses arouse arose arse are ae a 

wrappings wrapping rapping raping aping ping pig pi i 

wranglers wanglers anglers angers agers ages age ae a 

witchiest withiest withies withes wites wits its is i 

whittlers whitters whitter whiter white wite wit it i 

upreaches preaches peaches peaces paces aces ace ae a 

upreached preached peached peaced paced aced ace ae a 

upraisers praisers raisers rasers rases rase ras as a 

twitchier witchier withier wither withe wite wit it i 

twit