# Day 7 Reading Journal

This journal includes several required exercises, but it is meant to encourage active reading more generally.  You should use the journal to take detailed notes, catalog questions, and explore the content from Think Python deeply.

Reading: Think Python Chapters 11 and 12

**Due: Thursday, February 18 at 12 noon**



## [Chapter 11](http://www.greenteapress.com/thinkpython/html/thinkpython012.html)


Dictionaries

A dictionary is like a list, but more general. The indices in a dictionary can be almost any type, unlike a list which requires integers.

A dictionary can be thought of as a mapping between a set of indices (called keys) and a set of values. Each key maps to a value. The association of a key and a value is called a key-value pair or sometimes an item.

Ex. eng2sp = dict()
eng2sp['one'] = 'uno'
eng2sp['two'] = 'dos'
eng2sp['three'] = 'tres'
print eng2sp
{'one':'uno','three':'tres','two':'dos'}

The order of items in a dictionary is unpredictable. But that doesn't matter because elements of a dictionary are found using the keys to look up the corresponding values.

print eng2sp['two'] 
'dos'

The key 'two' always maps to 'dos', so the order of the items doesn't matter.

If the key isn't in the dictionary, you get an exception.

The len function works on dictionaries; it returns the number of key-value pairs.

The in operator works on dictionaries; it tells you whether something appears as a key in the dictionary (not as a value).

The .values() method returns the values of a dictionary, so you can combine it with the in operator to check whether something appears as a value.

The in operator works differently for dictionaries than for lists. For lists, it uses a search algorithm, so the search time grows with the length of the list. For dictionaries, Python uses a hashtable, so the in operator takes about the same amount of time no matter how many items the dictionary contains.
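A minimal sketch of the operations above, reusing the eng2sp dictionary from the earlier example. The variable names (`count`, `has_key`, and so on) are only for illustration, and the code is written so that it should run under both Python 2 and Python 3.

```python
# Rebuild the eng2sp dictionary from the example above.
eng2sp = {'one': 'uno', 'two': 'dos', 'three': 'tres'}

count = len(eng2sp)                    # number of key-value pairs
has_key = 'one' in eng2sp              # 'one' is a key
value_as_key = 'uno' in eng2sp         # 'uno' is a value, not a key
has_value = 'uno' in eng2sp.values()   # search the values instead

# Looking up a key that is not in the dictionary raises KeyError.
try:
    eng2sp['four']
    missing_key_raised = False
except KeyError:
    missing_key_raised = True
```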

Suppose you are given a string and you want to count how many times each letter appears. There are several ways to do this:
1. Create 26 variables, one for each letter of the alphabet, then traverse the string and increment the corresponding counter for each character.
2. Create a list with 26 elements, convert each character to a number, use the number as an index into the list, and increment the appropriate counter.
3. Create a dictionary with characters as keys and counters as the corresponding values. The first time you see a character, you add an item to the dictionary; after that, you increment the value of the existing item.

An implementation is a way of performing a computation; some implementations are better than others. An advantage of the dictionary implementation is that we don't have to know ahead of time which letters appear in the string, and we only have to make room for the letters that do appear.
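To compare the second and third options, here is a rough sketch of both. The function names `histogram_list` and `histogram_dict` are my own, and the list version assumes the string contains only lowercase letters a-z.

```python
def histogram_list(s):
    """Option 2: a list of 26 counters, indexed by letter position.

    Assumes s contains only lowercase ASCII letters a-z.
    """
    counts = [0] * 26
    for c in s:
        # ord converts the character to a number; 'a' maps to index 0.
        counts[ord(c) - ord('a')] += 1
    return counts


def histogram_dict(s):
    """Option 3: a dictionary that only stores letters that appear."""
    d = {}
    for c in s:
        if c not in d:
            d[c] = 1      # first time we see this character
        else:
            d[c] += 1     # increment the existing counter
    return d
```

The list version always holds 26 counters, while the dictionary version only holds as many items as there are distinct characters in the string.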

Using a for statement on a dictionary traverses the keys of the dictionary. Remember that the keys are in no particular order.

Lookup - given dictionary and key, find the corresponding value

Reverse Lookup - given value, find key, requires searching algorithms.

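Raise - a statement that causes an exception. It can take a detailed error message as an optional argument.

Reverse lookup is much slower than forward lookup, because it has to search through the items instead of using the hashtable.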
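A short sketch of reverse lookup with raise, assuming we only want the first matching key. The name `reverse_lookup_first` is chosen here to keep it separate from the exercise version further down.

```python
def reverse_lookup_first(d, v):
    """Return the first key found that maps to v.

    Raises ValueError if v does not appear as a value in d.
    """
    for k in d:
        if d[k] == v:
            return k
    # We only get here if the loop finished without finding v.
    raise ValueError('value does not appear in the dictionary')
```

Because the loop may have to visit every item, the running time grows with the size of the dictionary, unlike a forward lookup.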

Lists can appear as values in a dictionary. Ex. given a dictionary that maps letters to frequencies, you can invert it so that the new dictionary maps each frequency to a list of the letters that have that frequency.

Lists can be values, but not keys
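A sketch of inverting a dictionary, where each value in the result is a list of keys. The function name `invert_dict` is an assumption for illustration.

```python
def invert_dict(d):
    """Return a new dictionary that maps each value in d
    to a list of the keys that had that value."""
    inverse = {}
    for key in d:
        val = d[key]
        if val not in inverse:
            inverse[val] = [key]       # first key seen with this value
        else:
            inverse[val].append(key)   # add to the existing list
    return inverse
```

For a histogram of 'parrot', the frequency 2 would map to a list containing only 'r', and the frequency 1 would map to a list of the other letters (in no particular order).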

Keys must be hashable. A hash is a function that takes a value and returns an integer. Dictionaries use these integers, called hash values, to store and look up items.

Hashing works correctly only if the keys are immutable; that's why lists can't be used as keys.
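A quick check of this rule: a tuple works as a key, while a list raises a TypeError. The flag name `list_key_failed` is only for illustration.

```python
d = {}

# A tuple of immutable values is hashable, so it works as a key.
d[(1, 2)] = 'tuple key'

# A list is mutable and therefore unhashable; using it as a key fails.
try:
    d[[1, 2]] = 'list key'
    list_key_failed = False
except TypeError:
    list_key_failed = True
```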

Memo - a previously computed value that is stored for later use.

Create a dictionary that stores computed values.
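A sketch of memoization using Fibonacci, with a call counter added so the difference is visible without timing anything. The names `known`, `calls`, `fib_plain`, and `fib_memo` are assumptions for this example.

```python
known = {0: 0, 1: 1}               # memo: values computed so far
calls = {'plain': 0, 'memo': 0}    # how many times each version is called


def fib_plain(n):
    """Fibonacci without memoization; recomputes the same values often."""
    calls['plain'] += 1
    if n < 2:
        return n
    return fib_plain(n - 1) + fib_plain(n - 2)


def fib_memo(n):
    """Fibonacci with memoization; each value is computed only once."""
    calls['memo'] += 1
    if n in known:
        return known[n]
    res = fib_memo(n - 1) + fib_memo(n - 2)
    known[n] = res
    return res


plain_result = fib_plain(20)
memo_result = fib_memo(20)
```

Both versions return the same answer, but the memoized version should need far fewer calls, since each Fibonacci number is computed once and then looked up.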


Global variables - variables that are created outside any function and belong to the global "__main__" frame. They can be accessed from any function and persist through all function calls.

By default, assigning to a global variable inside a function does not change it, because the assignment creates a new local variable with the same name.

To reassign a global variable inside a function, you must declare it global before you use it, with a statement of the form `global variable_name`.

You can add, remove, and replace elements of a mutable global variable (such as a list or dictionary) without declaring it; the declaration is needed only if you want to reassign the variable.
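A sketch of the three cases for global variables: assigning without a declaration, assigning with a global declaration, and modifying a mutable global. All names here (`been_called`, `counts`, and the function names) are made up for illustration.

```python
been_called = False   # global flag
counts = {}           # mutable global dictionary


def set_flag_wrong():
    # This creates a new local variable; the global is not changed.
    been_called = True


def set_flag_right():
    # The declaration tells Python that been_called means the global.
    global been_called
    been_called = True


def record(key):
    # Modifying (not reassigning) a mutable global needs no declaration.
    counts[key] = counts.get(key, 0) + 1


set_flag_wrong()
after_wrong = been_called    # still the original value
set_flag_right()
after_right = been_called    # now reassigned
record('x')
record('x')
```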

Numbers with an L at the end are long integers. They can be arbitrarily large, but they consume more time and space as they get bigger.

As you work with bigger datasets, it can become unwieldy to debug by printing and checking data by hand. Here are some suggestions for debugging large datasets:
1. Scale down the input
2. Check summaries and types.
3. Write self checks
4. Pretty print the output

**Quick check:** In about one sentence using your own words, what is a dictionary?

A dictionary is a mutable collection of key-value pairs, where the keys can be of almost any type.

### Exercise 2  

Dictionaries have a [method called `get`](https://docs.python.org/2/library/stdtypes.html#mapping-types-dict) that takes a key and a default value. If the key appears in the dictionary, `get` returns the corresponding value; otherwise it returns the default value. For example:

```
>>> h = histogram('a')
>>> print h
{'a': 1}
>>> h.get('a', 0)
1
>>> h.get('b', 0)
0
```

Use `get` to write `histogram` more concisely. You should be able to eliminate the `if` statement. Add unit tests for your histogram implementation.

In [7]:
def histogram(s):
    """
    This method will create a dictionary that stores all the frequencies of the letters in a word.
    Using the get method, it is much more concise. I am testing to see if it works.
    
    However since the order of the dictionary is unpredictable, it's hard to really come up with good doctests. I just
    did one to show proof that it works.
    >>> histogram('chicken')
    {'c': 2, 'e': 1, 'i': 1, 'h': 1, 'k': 1, 'n': 1}
    """
    d = dict()
    for c in s:
        d[c] = d.get(c,0)+1
    return d
    
import doctest
doctest.run_docstring_examples(histogram, globals(),verbose=True)


Finding tests in NoName
Trying:
    histogram('chicken')
Expecting:
    {'c': 2, 'e': 1, 'i': 1, 'h': 1, 'k': 1, 'n': 1}
ok


### Exercise 4  

Modify `reverse_lookup` so that it builds and returns a list of all keys that map to `v`, or an empty list if there are none. Add unit tests for your implementation.

In [10]:
def reverse_lookup(d, v):
    """
    This method utilizes the dictionary and lists to make reverse lookup return a list instead of just one key.
    Doctesting to check that it works properly.
    
    Once again, it appears that dictionary is randomizing the order, so doctesting is hard.
    
    >>> reverse_lookup({'Im': 1, 'chicken': 1, 'and': 2, 'wut': 3},1)
    ['chicken', 'Im']
    >>> reverse_lookup({'Im': 2, 'chicken': 2, 'and': 2, 'wut': 3},1)
    []
    """
    result = []
    for k in d:
        if d[k] == v:
            result.append(k)
    return result

import doctest
doctest.run_docstring_examples(reverse_lookup,globals(),verbose=True)


Finding tests in NoName
Trying:
    reverse_lookup({'Im': 1, 'chicken': 1, 'and': 2, 'wut': 3},1)
Expecting:
    ['chicken', 'Im']
ok
Trying:
    reverse_lookup({'Im': 2, 'chicken': 2, 'and': 2, 'wut': 3},1)
Expecting:
    []
ok


If you'd like to learn more about errors and exceptions, you can check out the [Python tutorial](https://docs.python.org/2/tutorial/errors.html) or read ahead to [Appendix A](http://www.greenteapress.com/thinkpython/html/thinkpython021.html) of Think Python. If you choose to use doctest for your unit testing, it can also [deal with exceptions](https://docs.python.org/2/library/doctest.html#what-about-exceptions).

**Quick check** What type of objects can be used as keys to a dictionary, i.e. what property must they have?

Many types of objects can be used as keys to a dictionary, including tuples, strings, and integers. The objects must be hashable, which in practice means immutable. Mutable objects like lists would break the key-value pairing if they changed after being stored.

### Exercise 6 (modified)

Create a memoized version of your Levenshtein distance function from Day 7. What kind of performance change do you see?

Optional: If you'd like to get some quantitative results, you could check out the [timeit](https://docs.python.org/2/library/timeit.html) module

Note: You can also study Fibonacci here if you prefer.

In [2]:
dicts = {}

def levenshtein_distance(a, b):
    """
    This method uses a dictionary to keep track of all possible character set pairs, and
    their corresponding values representing how many more operations are required until the end of the string.
    
    I tested it for both long and small, obviously not using doctest because IDK what the number of operations
    required is.
    """
    # Check the memo first; testing membership on the dictionary itself
    # uses the hashtable instead of searching a list of keys.
    if (a, b) in dicts:
        return dicts[a, b]
    else:
        if len(a) == 0:
            return len(b)
        elif len(b) == 0:
            return len(a)
        elif a[0] == b[0]:
            option1 = levenshtein_distance(a[1:], b[1:])
        else:
            option1 = 1 + levenshtein_distance(a[1:], b[1:])
        option2 = 1 + levenshtein_distance(a[1:], b)
        option3 = 1 + levenshtein_distance(a, b[1:])

        minimum = min(option1, option2, option3)

        # Store the result so this pair is never recomputed.
        dicts[a, b] = minimum
        return minimum

print levenshtein_distance('a','b')
print levenshtein_distance('ab','bb')
print levenshtein_distance('imagianthickcenwholoveschickens','imagiantbuttwholovesbutts')
print levenshtein_distance('dfsndnvsdjscovhweofvrogihvwochohv','epfijcmasidcqwpdjcjwqAJLDASKSJKSDJSGERGVSOSI')

1
1
14
39


## [Chapter 12](http://www.greenteapress.com/thinkpython/html/thinkpython013.html)

**Quick check:** In about one sentence using your own words, what is a tuple?

A tuple is an immutable sequence of values, where the values can be any type and are indexed by integers.

A tuple is a sequence of values. The values can be any type, and they are indexed by integers, so in that respect they are a lot like lists. The important difference is that tuples are immutable.

t = ('a','b','c','d','e')
t1 = 'a',
t2 = tuple()

These are different ways to create a tuple. Also, if the argument to tuple() is a sequence (a string, list, or tuple), the result is a tuple with the elements of the sequence.

Most list operators also work on tuples:
1. The bracket operator indexes an element.
2. The slice operator selects a range of elements.
3. You can't modify the elements of a tuple, but you can replace one tuple with another.

Tuple assignment allows for easy swapping of variables.
a,b = b,a

The left side is a tuple of variables, and the right side is a tuple of expressions.

Each value is assigned to its respective variable. All the expressions on the right are evaluated before any of the assignments.

The number of variables on the left and the values on the right have to be the same.

More generally, the right side can be any kind of sequence.
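A short sketch of tuple assignment, including the case where the right side is a list returned by split and the case where the counts don't match. The email address is a made-up example.

```python
# Swap two variables in one statement.
a, b = 1, 2
a, b = b, a

# The right side can be any sequence, such as the list returned by split.
addr = 'monty@python.org'
uname, domain = addr.split('@')

# A mismatch between the number of variables and values raises ValueError.
try:
    x, y = 1, 2, 3
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
```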


Tuples as return values can allow for returning multiple values.

quot, rem = divmod(7,3)

or 

def min_max(t):
    return min(t), max(t)
    
Functions can take a variable number of arguments. A parameter name that begins with * gathers arguments into a tuple.

def printall(*args):
    print args

printall(1,2,3)
(1,2,3)

The complement of gather is scatter. If you have a sequence of values and you want to pass it to a function as multiple arguments, you can use the * operator.

t = (7,3)
divmod(t)  # raises TypeError, because divmod expects 2 arguments
divmod(*t)
(2,1)

The built-in function zip takes two or more sequences and zips them into a list of tuples, where each tuple contains one element from each sequence.

s = 'abc'
t = [0,1,2]
zip(s,t)
[('a',0),('b',1),('c',2)]

If the sequences are not the same length, the result has the length of the shorter one.

You can use tuple assignment in a for loop to traverse the list of tuples.

You can use tuple assignment, zip, and for to traverse two sequences at the same time.
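A sketch of traversing two sequences at once with zip, tuple assignment, and for. The function name `has_match` is an assumption for this example, and `list()` is wrapped around zip so the results look the same under Python 2 and Python 3.

```python
def has_match(t1, t2):
    """Return True if there is an index i such that t1[i] == t2[i]."""
    for x, y in zip(t1, t2):
        if x == y:
            return True
    return False


# zip pairs up elements from each sequence.
pairs = list(zip('abc', [0, 1, 2]))

# With unequal lengths, the result is as long as the shorter sequence.
shorter = list(zip('Anne', 'Elk'))
```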

Dictionaries have a method called items that returns a list of tuples, where each tuple is a key-value pair.

They are in no particular order.

You can also use a tuple list to initialize a new dictionary with the dict() function

Combining zip with dictionaries is a concise way to create a dictionary.

dict(zip(stuff, otherstuff))

The dictionary method update also takes a list of tuples and adds them, as key-value pairs, to an existing dictionary.

Combining items, tuple assignment, and for, you can traverse the keys and values of a dictionary.

You can use tuples as keys in dictionaries
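A sketch that puts these dictionary and tuple features together. The phone directory entry is hypothetical data made up for the example, and `sorted()` is used so the traversal order is predictable.

```python
# Build a dictionary from two sequences with zip.
d = dict(zip('abc', range(3)))

# update adds key-value pairs from a list of tuples.
d.update([('d', 3), ('e', 4)])

# Traverse keys and values together with items and tuple assignment.
pairs = []
for key, val in sorted(d.items()):
    pairs.append((key, val))

# Tuples as keys: a hypothetical directory keyed by (last, first).
directory = {}
directory['Cleese', 'John'] = '555-0100'
number = directory[('Cleese', 'John')]
```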

Relational operators work with tuples and other sequences. Python starts by comparing the first element from each sequence. If they are equal, it goes on to the next elements, and so on, until it finds elements that differ. Subsequent elements are not considered, no matter how big they are.

The sort function works the same way. It sorts primarily by the first element, and in the case of a tie, it moves on to the second element, and so on.
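A small check of how tuples compare and sort. The variable names are only for illustration.

```python
# The first elements tie, so the second elements decide: 1 < 3.
first = (0, 1, 2) < (0, 3, 4)

# The large third element is never considered.
second = (0, 1, 2000000) < (0, 3, 4)

# sort orders by the first element and breaks ties with the second.
pairs = [(2, 'b'), (1, 'z'), (2, 'a')]
pairs.sort()
```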

A pattern called DSU:
1. Decorate a sequence by building a list of tuples with one or more keys preceding the elements from the sequence
2. Sort the list of tuples
3. Undecorate by extracting the sorted elements of the sequence.
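The three DSU steps can be sketched as a function that sorts words from longest to shortest. The name `sort_by_length` is an assumption for this example.

```python
def sort_by_length(words):
    """Return a new list of words sorted from longest to shortest."""
    # Decorate: pair each word with its length as the sort key.
    t = []
    for word in words:
        t.append((len(word), word))

    # Sort: tuples compare by length first, then by the word itself.
    t.sort(reverse=True)

    # Undecorate: keep only the words, now in sorted order.
    res = []
    for length, word in t:
        res.append(word)
    return res
```

Because the sort is reversed, words with the same length come out in reverse alphabetical order, since ties are broken by the second element of each tuple.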

Sequences of sequences, why choose one over the other?

Strings are more limited than other sequences because the elements have to be characters. They are also immutable. If you want to change the characters in a string, you might want to use a list of characters instead.

Lists are more common than tuples, mostly because they are mutable, but there are a few cases where you might prefer tuples:
1. In some contexts, like a return statement, it is syntactically simpler to create a tuple than a list.
2. If you want to use a sequence as a dictionary key, you have to use an immutable type like a string or tuple
3. If you are passing a sequence as an argument to a function, using tuples reduces the potential for unexpected behavior due to aliasing.

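Because tuples are immutable, they don't provide methods like sort and reverse, which modify existing lists. But you can use the built-in functions sorted and reversed, which take any sequence: sorted returns a new list with the same elements in sorted order, and reversed returns an iterator over the elements in reverse order.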
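A quick sketch of sorted and reversed applied to a tuple. The variable names are only for illustration; `list()` is used to collect the elements that reversed produces.

```python
t = (3, 1, 2)

# Tuples have no sort method, so calling it raises AttributeError.
try:
    t.sort()
    sort_failed = False
except AttributeError:
    sort_failed = True

s = sorted(t)           # a new list; t itself is unchanged
r = list(reversed(t))   # reversed gives an iterator, so collect it
back = tuple(s)         # convert back if a tuple is needed
```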

Compound data structures are prone to shape errors, that is, errors caused when a data structure has the wrong type, size, or composition.

Ex. a function expects a list containing one integer, but you give it a plain integer instead.

The structshape module from the book is a nice debugging tool: it summarizes the shape of a data structure.


### Exercise 1  

Many of the built-in functions use variable-length argument tuples. For example, `max` and `min` can take any number of arguments:

```
>>> max(1,2,3)
3
```

But `sum` does not.

```
>>> sum(1,2,3)
TypeError: sum expected at most 2 arguments, got 3
```

Write a function called ```sumall``` that takes any number of arguments and returns their sum. 

Write unit tests for your function. Do I actually need to keep saying this? Let's assume it's always a good idea :)

In [11]:
def sumall(*args):
    """
    This method utilizes the * operator that gathers all arguments into a tuple, thus allowing for any number of arguments.
    It then sums all of them by iterating through the tuple. Here are some doctests to prove functionality.
    
    >>> sumall(1,2,3,4,5,6,7,8,9,10)
    55
    >>> sumall()
    0
    >>> sumall(1,1)
    2
    """
    total = 0
    for i in args:
        total +=i
    return total

doctest.run_docstring_examples(sumall, globals(),verbose=True)
    

Finding tests in NoName
Trying:
    sumall(1,2,3,4,5,6,7,8,9,10)
Expecting:
    55
ok
Trying:
    sumall()
Expecting:
    0
ok
Trying:
    sumall(1,1)
Expecting:
    2
ok


If you're interested in more flexible ways to pass arguments to functions, check out the [Python tutorial](https://docs.python.org/2/tutorial/controlflow.html#more-on-defining-functions). For instance, you can also use keyword arguments, which are collected into a dictionary just like `*` gathers variable numbers of positional arguments into a tuple.

This pattern is very common for defining functions with complex optional behaviors in Python, and you will often see definitions like:

```
def my_func(required_argument1, *arguments, **keywords):
    ...
```

### Exercise

Write a function `sort_by_last_letter` that takes a list of words and returns a new list with the words sorted alphabetically by the _last letter_ in the word. Hint: use the **Decorate, Sort, Undecorate** pattern. Write unit tests for your function.

In [12]:
def sort_by_last_letter(wordlist):
        
        """
        This method utilizes DSU to sort words by the last letter. Doctesting with the stuffs.
        
        >>> sort_by_last_letter(['chicken','butt','Im','a'])
        ['a', 'Im', 'chicken', 'butt']
        >>> sort_by_last_letter([])
        []
        >>> sort_by_last_letter(['chicken','butt','bun','a'])
        ['a', 'bun', 'chicken', 'butt']
        """
        t = []
        for word in wordlist:
            t.append((word[-1],word))
        
        t.sort()
        
        result = []
        for lastletter, word in t:
            result.append(word)
        return result
    
doctest.run_docstring_examples(sort_by_last_letter,globals(), verbose=True)    

Finding tests in NoName
Trying:
    sort_by_last_letter(['chicken','butt','Im','a'])
Expecting:
    ['a', 'Im', 'chicken', 'butt']
ok
Trying:
    sort_by_last_letter([])
Expecting:
    []
ok
Trying:
    sort_by_last_letter(['chicken','butt','bun','a'])
Expecting:
    ['a', 'bun', 'chicken', 'butt']
ok


**Quick check** Give an example of when you might use each sequence type:

- tuple

- list

- string

A tuple would be used in some contexts where perhaps a return statement is being used, or when you want to have a multiple valued key in a dictionary, or when you're passing arguments to a function.

A list would be used for when things need to be mutable. They can be changed and can be used to serve the needs of modification.

A string would be most useful when you're dealing mainly with characters and immutability is a thing.

### Exercise 3  

Write a function called `most_frequent` that takes a string and prints the letters in decreasing order of frequency. Find text samples from several different languages and see how letter frequency varies between languages. Compare your results with the tables at http://en.wikipedia.org/wiki/Letter_frequencies. 

Allen's solution (try it on your own first): http://thinkpython.com/code/most_frequent.py. 

In [36]:
def most_frequent(string):
    """
    This method sorts a string by decreasing order of frequency. It uses both DSU and the get method.
    One thing I found I couldn't stop is that if the frequencies are the same, then it will still display the letters in reverse order
    
    In any case, here are some doctests to show functionality, bigger strings got too complicated to try and figure out.
    
    >>> most_frequent('Im a chicken')
    i c n m k h e a
    >>> most_frequent('')
    
    >>> most_frequent('abcdefg')
    g f e d c b a
    """
    string1 = string.replace(" ", "").lower()
    d = {}
    for letter in string1:
        d[letter] = d.get(letter,0) + 1
    t = []
    for letter in d:
        t.append((d[letter],letter))
    t.sort(reverse=True)
    for freq, letter in t:
        print letter,
        
import doctest   
doctest.run_docstring_examples(most_frequent, globals(),verbose=True)

print ""

most_frequent("Im a giant chicken")
print ""
most_frequent('im ein RiesenHuhn') #German
print ""
most_frequent('im un pollo gigante') #Spanish
print ""
most_frequent("im un poulet géant") #French,  IDK what to do when letters get weird.

Finding tests in NoName
Trying:
    most_frequent('Im a chicken')
Expecting:
    i c n m k h e a
ok
Trying:
    most_frequent('')
Expecting nothing
ok
Trying:
    most_frequent('abcdefg')
Expecting:
    g f e d c b a
ok

i n c a t m k h g e 
n i e h u s r m 
o n l i g u t p m e a 
u t n � � p o m l i g e a


### Challenge: Exercise 6   (optional)

From a [Car Talk Puzzler](http://www.cartalk.com/content/puzzlers):

What is the longest English word, that remains a valid English word, as you remove its letters one at a time?

Now, letters can be removed from either end, or the middle, but you can’t rearrange any of the letters. Every time you drop a letter, you wind up with another English word. If you do that, you’re eventually going to wind up with one letter and that too is going to be an English word—one that’s found in the dictionary. I want to know what’s the longest word and how many letters does it have?

I’m going to give you a little modest example: Sprite. Ok? You start off with sprite, you take a letter off, one from the interior of the word, take the r away, and we’re left with the word spite, then we take the e off the end, we’re left with spit, we take the s off, we’re left with pit, it, and I. 

Write a program to find all words that can be reduced in this way, and then find the longest one.

This exercise is a little more challenging than most, so here are some suggestions:

- You might want to write a function that takes a word and computes a list of all the words that can be formed by removing one letter. These are the “children” of the word.
- Recursively, a word is reducible if any of its children are reducible. As a base case, you can consider the empty string reducible.
- The word list from [Chapter 9.1](http://www.greenteapress.com/thinkpython/html/thinkpython010.html) Exercise 1 doesn’t contain single letter words. So you might want to add “I”, “a”, and the empty string.
- To improve the performance of your program, you might want to memoize the words that are known to be reducible.

Allen's solution: http://thinkpython.com/code/reducible.py.

In [3]:
fin = open('words.txt')
wordlist = []
for line in fin:
    word = line.strip()
    wordlist.append(word)
    
    
     
def is_reducible(word):
    if word == '':
        return True
    elif word in wordlist:
        for i in range(len(word)):
            word.replace(word[i],"")
            return is_reducible(word)
    else:
        return False
    
def longest_word():
    longestword = ""
    max_length = 0
    for word in wordlist:
        if is_reducible(word) and len(word)>max_length:
            longestword = word
    return word  

print "I'm sad because I didn't have the time to complete this code :("
longest_word() # So after my code just completely broke jupyter notebook, and I lost the last 15 minutes of my work,
#I decided not to continue to bash my head against this problem, as I know how to do it, I just don't have
#the time to implement and debug it, and also risk losing my work in the process. I did read Allen's solution as well,
#so I understand how it works.
            

I'm sad because I didn't have the time to complete this code :(


In [1]:
def make_word_dict():
    """Reads the words in words.txt and returns a dictionary
    that contains the words as keys."""
    d = dict()
    fin = open('words.txt')
    for line in fin:
        word = line.strip().lower()
        d[word] = word

    # have to add single letter words to the word list;
    # also, the empty string is considered a word.
    for letter in ['a', 'i', '']:
        d[letter] = letter
    return d


"""memo is a dictionary that maps from each word that is known
to be reducible to a list of its reducible children.  It starts
with the empty string."""

memo = {}
memo[''] = ['']


def is_reducible(word, word_dict):
    """If word is reducible, returns a list of its reducible children.

    Also adds an entry to the memo dictionary.

    A string is reducible if it has at least one child that is 
    reducible.  The empty string is also reducible.

    word: string
    word_dict: dictionary with words as keys
    """
     # if have already checked this word, return the answer
    if word in memo:
        return memo[word]

    # check each of the children and make a list of the reducible ones
    res = []
    for child in children(word, word_dict):
        t = is_reducible(child, word_dict)
        if t:
            res.append(child)

    # memoize and return the result
    memo[word] = res
    return res


def children(word, word_dict):
    """Returns a list of all words that can be formed by removing one letter.

    word: string

    Returns: list of strings
    """
    res = []
    for i in range(len(word)):
        child = word[:i] + word[i+1:]
        if child in word_dict:
            res.append(child)
    return res


def all_reducible(word_dict):
    """Checks all words in the word_dict; returns a list reducible ones.

    word_dict: dictionary with words as keys
    """
    res = []
    for word in word_dict:
        t = is_reducible(word, word_dict)
        if t != []:
            res.append(word)
    return res


def print_trail(word):
    """Prints the sequence of words that reduces this word to the empty string.

    If there is more than one choice, it chooses the first.

    word: string
    """
    if len(word) == 0:
        return
    print word,
    t = is_reducible(word, word_dict)
    print_trail(t[0])


def print_longest_words(word_dict):
    words = all_reducible(word_dict)

    # use DSU to sort by word length
    t = []
    for word in words:
        t.append((len(word), word))
    t.sort(reverse=True)

    # print the longest 5 words
    for length, word in t[0:5]:
        print_trail(word)
        print '\n'


if __name__ == '__main__':
    word_dict = make_word_dict()
    print_longest_words(word_dict)

complecting completing competing compting comping coping oping ping pig pi i 

twitchiest witchiest withiest withies withes wites wits its is i 

stranglers strangers stranger strange strang stang tang tag ta a 

staunchest stanchest stanches stances stanes sanes anes ane ae a 

restarting restating estating stating sating sting ting tin in i 



## Reading Journal feedback

Have any comments on this Reading Journal? Feel free to leave them below and we'll read them when you submit your journal entry. This could include suggestions to improve the exercises, topics you'd like to see covered in class next time, or other feedback.

If you have Python questions or run into problems while completing the reading, you should post them to Piazza instead so you can get a quick response before your journal is submitted.

The last problem was ridiculous, and while I get how to do it, I just couldn't implement it because the complexity of the code is akin to a mini project's worth of code (as shown by Allen Downey's solution), which I don't have the time to do. Not sure if it's the best thing to put in an exercise.