# Day 9 Reading Journal

This journal includes several required exercises, but it is meant to encourage active reading more generally.  You should use the journal to take detailed notes, catalog questions, and explore the content from Think Python deeply.

Reading: Think Python Chapters 13 and 15

**Due: Monday, February 22 at 12 noon**



## [Chapter 13](http://www.greenteapress.com/thinkpython/html/thinkpython014.html)

The content in this chapter could be very helpful for the text mining mini project. The reading and all exercises within are optional.

 - Sections 13.3-13.4 give a good example of some techniques for working with files, processing text, and doing some simple analysis.
 - Section 13.8 and the Markov generation in Exercise 8 can be a lot of fun.
 - Now that you know a wide range of different data structures, Section 13.9 starts to give some guidance for choosing between them.
 - Section 13.10 explains Allen's "4 r's" of debugging strategy

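**Deterministic** programs generate the same output every time they run with the same input, so their behavior is predictable.
  * To make a program nondeterministic, use algorithms that generate **pseudorandom** numbers (they look random but come from a deterministic computation)
    * Python's `random` module provides these:
      * `random.random()` returns a float from 0.0 up to, but not including, 1.0
      * `random.randint(a, b)` returns an integer from `a` to `b`, including both ends
      * `random.choice(seq)` returns a randomly chosen element of a non-empty sequence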
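
A minimal sketch of the three `random` functions listed above. It also calls `random.seed`, which these notes don't mention, to show that resetting the generator to the same starting state reproduces the same "random" results; that repeatability is what makes the numbers pseudorandom.

```python
import random

random.seed(42)                     # fix the generator's starting state
x = random.random()                 # float with 0.0 <= x < 1.0
n = random.randint(1, 6)            # integer from 1 to 6, both ends included
c = random.choice(['a', 'b', 'c'])  # one element of a non-empty sequence

random.seed(42)                     # same seed -> same sequence of results
assert (x, n, c) == (random.random(), random.randint(1, 6),
                     random.choice(['a', 'b', 'c']))
print(x, n, c)
```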

```python
# program that reads a file and builds a histogram of the words in the file
import string

def process_file(filename):
    hist = dict()
    with open(filename) as fp:        # the file is closed automatically
        for line in fp:
            process_line(line, hist)
    return hist

def process_line(line, hist):
    line = line.replace('-', ' ')     # treat hyphens as word separators
    for word in line.split():
        word = word.strip(string.punctuation + string.whitespace)
        word = word.lower()
        if word:                      # skip tokens that were only punctuation
            hist[word] = hist.get(word, 0) + 1

# hist = process_file('emma.txt')    # emma.txt = Emma by Jane Austen

# total number of words in the file = sum of the frequencies in hist
def total_words(hist):
    return sum(hist.values())

# number of different words = number of items in the dictionary
def different_words(hist):
    return len(hist)

# print('Total number of words:', total_words(hist))
# print('Number of different words:', different_words(hist))
```
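
Since `emma.txt` may not be on hand, here is a sketch that runs the same word-counting logic on a tiny made-up string. `histogram_from_lines` is a hypothetical helper (not from the book) that accepts any iterable of lines, and `io.StringIO` stands in for an open file. This copy of `process_line` skips tokens that were pure punctuation, which would otherwise be counted as the empty string.

```python
import io
import string

def process_line(line, hist):
    line = line.replace('-', ' ')
    for word in line.split():
        word = word.strip(string.punctuation + string.whitespace).lower()
        if word:                       # ignore tokens that were only punctuation
            hist[word] = hist.get(word, 0) + 1

def histogram_from_lines(lines):
    """Hypothetical helper: builds a histogram from any iterable of lines
    (an open file, a list of strings, or an io.StringIO object)."""
    hist = {}
    for line in lines:
        process_line(line, hist)
    return hist

text = io.StringIO("The cat sat--the cat ran.\nA dog sat!\n")
hist = histogram_from_lines(text)
print(sum(hist.values()), len(hist))   # total words, different words: 9 6
```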




```python
# program to find the most common words using DSU (decorate-sort-undecorate)
def most_common(hist):
    t = []
    for key, value in hist.items():
        t.append((value, key))    # decorate: put the frequency first
    t.sort(reverse=True)          # sort by frequency, highest first
    return t

# loop that prints the ten most common words
# t = most_common(hist)
# print('The most common words are:')
# for freq, word in t[:10]:
#     print(word, '\t', freq)
```

```python
# optional parameters
def print_most_common(hist, num=10):
    t = most_common(hist)
    print('The most common words are:')
    for freq, word in t[:num]:
        print(word, '\t', freq)

# the second parameter, num, is optional; its default value is 10
# passing a second argument overrides the default, e.g. print_most_common(hist, 20)
```
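
A sketch of the default-argument behavior using a hypothetical `top_words` function, a variant of `print_most_common` that returns the (frequency, word) pairs instead of printing them so the result is easy to inspect. The histogram is invented.

```python
def top_words(hist, num=10):
    """Hypothetical variant of print_most_common that returns the
    (frequency, word) pairs instead of printing them."""
    pairs = sorted(((freq, word) for word, freq in hist.items()), reverse=True)
    return pairs[:num]

# 12 words with counts 1..12 ('a' -> 1, ..., 'l' -> 12)
hist = {word: i for i, word in enumerate('abcdefghijkl', start=1)}
print(len(top_words(hist)))    # 10: num takes its default value
print(top_words(hist, 3))      # [(12, 'l'), (11, 'k'), (10, 'j')]: 3 overrides it
```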

`subtract` takes dictionaries `d1` and `d2` and returns a new dictionary that contains all the keys from `d1` that are not in `d2`. Only the keys matter, so every value is set to `None`.

```python
# example of dictionary subtraction
def subtract(d1, d2):
    res = dict()
    for key in d1:
        if key not in d2:
            res[key] = None    # only the keys matter
    return res

# words = process_file('words.txt')
# diff = subtract(hist, words)
# print("The words in the book that aren't in the word list are:")
# for word in diff:
#     print(word, end=' ')
```
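
Because only the keys matter, the same subtraction can be expressed with sets. A sketch with made-up data: `set(d)` builds a set of a dictionary's keys, and `-` is set difference.

```python
def subtract(d1, d2):
    """Keys of d1 that are not in d2 (values are just placeholders)."""
    res = dict()
    for key in d1:
        if key not in d2:
            res[key] = None
    return res

book = {'emma': 5, 'xyzzy': 1, 'the': 9}
word_list = {'emma': None, 'the': None}
print(sorted(subtract(book, word_list)))   # ['xyzzy']
print(set(book) - set(word_list))          # {'xyzzy'}
```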

```python
# to choose a random word from the histogram, build a list
# with multiple copies of each word according to its observed frequency,
# then choose from that list
import random

def random_word(h):
    t = []
    for word, freq in h.items():
        t.extend([word] * freq)    # adds freq copies of word to t
    return random.choice(t)
```

Choosing from a list with one copy per occurrence works, but that list is as big as the book. A more efficient way to choose random words:

1) Use `keys` to get a list of the words in the book.

2) Build a list that contains the cumulative sum of the word frequencies.
   * The last item in this list is the total number of words in the book, n.

3) Choose a random number from 1 to n, and use a bisection search to find the index where the random number would be inserted in the cumulative sum.

4) Use the index to find the corresponding word in the word list.
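
A sketch of the four steps above, assuming Python 3: `itertools.accumulate` builds the cumulative sums and `bisect.bisect_left` does the bisection search. `random_word_cumulative` is a hypothetical name. `bisect_left` returns the first index whose cumulative sum is at least `r`, so each word is chosen with probability proportional to its frequency.

```python
import bisect
import itertools
import random

def random_word_cumulative(hist):
    """Hypothetical name: picks a word with probability proportional to its
    frequency, without a list entry for every occurrence."""
    words = list(hist.keys())                                        # step 1
    cumulative = list(itertools.accumulate(hist[w] for w in words))  # step 2
    n = cumulative[-1]                   # total number of words in the book
    r = random.randint(1, n)             # step 3: 1 <= r <= n
    index = bisect.bisect_left(cumulative, r)  # first i with cumulative[i] >= r
    return words[index]                  # step 4

hist = {'the': 2, 'cat': 3, 'sat': 1}
print(random_word_cumulative(hist))
```

These lists have one entry per *distinct* word rather than one per occurrence. For many samples, build `words` and `cumulative` once and reuse them. Python 3.6+ also provides `random.choices(words, weights=...)` for weighted selection.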

Random choices like these can drive Markov analysis, which characterizes, for a given sequence of words, the probability of the word that comes next.
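
A sketch of Markov analysis with a two-word prefix, loosely following the idea behind Exercise 8. The function names, the prefix length, and the sample sentence are illustrative assumptions. Each prefix maps to the list of words that followed it, keeping repeats, so `random.choice` picks a next word in proportion to how often it followed that prefix.

```python
import random

def markov_table(words, order=2):
    """Maps each tuple of `order` consecutive words to the list of words
    that follow it in the text (repeats kept)."""
    table = {}
    for i in range(len(words) - order):
        prefix = tuple(words[i:i + order])
        table.setdefault(prefix, []).append(words[i + order])
    return table

def generate(table, start, n):
    """Generates up to n more words, starting from the prefix `start`."""
    prefix = tuple(start)
    out = list(prefix)
    for _ in range(n):
        suffixes = table.get(prefix)
        if not suffixes:          # dead end: nothing ever followed this prefix
            break
        word = random.choice(suffixes)
        out.append(word)
        prefix = prefix[1:] + (word,)   # slide the window forward one word
    return out

words = 'the cat sat on the mat and the cat ran'.split()
table = markov_table(words)
print(table[('the', 'cat')])                       # ['sat', 'ran']
print(' '.join(generate(table, ('the', 'cat'), 5)))
```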

## [Chapter 15](http://www.greenteapress.com/thinkpython/html/thinkpython016.html)

This chapter has very few (and short) exercises, and is more focused on starting to think about classes and objects. If you haven't seen user defined types like classes before, you should read closely and try out some examples on your own. For example, you can write a [Python Tutor example like this one](http://pythontutor.com/visualize.html#code=%23+Example+for+visualizing+object+diagrams+by+stepping+through+the+code%0A%0Aclass+Point(object%29%3A%0A++++%22%22%22Represents+a+point+in+2-D+space.%22%22%22%0A++++pass%0A%0Aclass+Rectangle(object%29%3A%0A++++%22%22%22Represents+a+rectangle.+%0A%0A++++attributes%3A+width,+height,+corner.%0A++++%22%22%22%0A++++pass%0A%0A%0A%23+Create+a+point+to+serve+as+origin+for+our+rectangles%0Ap+%3D+Point(%29%0Ap.x+%3D+10%0Ap.y+%3D+15%0A%0A%23+Create+two+rectangles+with+different+size%0Ar1+%3D+Rectangle(%29%0Ar1.corner+%3D+p%0Ar1.width+%3D+100%0Ar1.height+%3D+100%0A%0Ar2+%3D+Rectangle(%29%0Ar2.corner+%3D+p%0Ar2.width+%3D+50%0Ar2.height+%3D+200%0A%0A%23+Change+the+width+of+r2+-+what+(if+any%29+is+the+effect+on+r1+and+why%3F%0Ar2.width+%3D+150%0Aprint+r1.width%0A%0A%23+Change+the+corner+position+of+r1+-+what+(if+any%29+is+the+effect+on+r2+and+why%3F%0Ar1.corner.x+%3D+20%0Aprint+r2.corner.x&mode=display&origin=opt-frontend.js&cumulative=false&heapPrimitives=false&textReferences=false&py=2&rawInputLstJSON=%5B%5D&curInstr=0) to explore object diagrams and aliasing.

**Note**: The sequence of operations we use in this chapter to create class instances and assign their attributes, e.g. 

```
box = Rectangle()
box.width = 100.0
box.height = 200.0
box.corner = Point()
box.corner.x = 0.0
box.corner.y = 0.0
```

is somewhat clumsy and error prone. Things will get better in the next couple chapters; feel free to look ahead if you'd like a sneak preview.


  * A user-defined type is called a **class**; defining a class creates a new type
  * Creating a new object is called **instantiation**
    * the new object is an **instance** of the class
    * when you print an instance, Python tells you what class it belongs to and where it is stored in memory

```python
# define a new type that represents a point in 2-D space
class Point(object):
    """Represents a point in 2-D space."""

# print(Point)  --> <class '__main__.Point'>
blank = Point()
# print(blank)  --> <__main__.Point object at 0x...> (the address varies)

# assign values to named elements of the object;
# these elements are called attributes
blank.x = 3.0
blank.y = 4.0
```

```python
class Rectangle(object):
    """Represents a rectangle.

    attributes: width, height, corner.
    """

box = Rectangle()
box.width = 100.0
box.height = 200.0
box.corner = Point()
box.corner.x = 0.0
box.corner.y = 0.0
```

**Quick check:** In about one sentence using your own words, what is a class?

### Exercise 1  

Write a function called `distance_between_points` that takes two `Points` as arguments and returns the distance between them.

```python
import math
import copy

p1 = Point()
p1.x = 3.0
p1.y = 4.0

p2 = copy.copy(p1)    # a new Point with the same x and y, reassigned below
p2.x = 5.0
p2.y = 6.0

def distance_between_points(point1, point2):
    """Returns the Euclidean distance between two Points."""
    dx = point1.x - point2.x
    dy = point1.y - point2.y
    return math.sqrt(dx**2 + dy**2)

distance_between_points(p1, p2)    # --> 2.8284271247461903
```
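
For comparison, `math.hypot` computes the same distance, `sqrt(dx**2 + dy**2)`, in one call. A self-contained sketch that redefines `Point` so it runs on its own:

```python
import math

class Point(object):
    """Represents a point in 2-D space."""

def distance_between_points(p1, p2):
    """Distance between two Points via math.hypot(dx, dy)."""
    return math.hypot(p1.x - p2.x, p1.y - p2.y)

p1, p2 = Point(), Point()
p1.x, p1.y = 3.0, 4.0
p2.x, p2.y = 5.0, 6.0
print(distance_between_points(p1, p2))   # approximately 2.8284, i.e. sqrt(8)
```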

### Exercise 2  

Write a function named `move_rectangle` that takes a `Rectangle` and two numbers named `dx` and `dy`. It should change the location of the rectangle by adding `dx` to the `x` coordinate of `corner` and adding `dy` to the `y` coordinate of `corner`.

```python
def move_rectangle(rect, dx, dy):
    """Moves rect in place by adding dx and dy to its corner coordinates."""
    rect.corner.x += dx
    rect.corner.y += dy

move_rectangle(box, 5, 7)    # box.corner is now at (5.0, 7.0)
```
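
`move_rectangle` modifies the object it is given, so any other rectangle that shares the same `corner` Point moves too. This is the aliasing effect the Python Tutor example above explores. A self-contained sketch:

```python
class Point(object):
    """Represents a point in 2-D space."""

class Rectangle(object):
    """Represents a rectangle. attributes: width, height, corner."""

def move_rectangle(rect, dx, dy):
    rect.corner.x += dx
    rect.corner.y += dy

p = Point()
p.x, p.y = 0.0, 0.0

r1 = Rectangle()
r1.corner = p
r2 = Rectangle()
r2.corner = p          # alias: r1 and r2 share the same Point object

move_rectangle(r1, 5, 7)
print(r2.corner.x, r2.corner.y)    # 5.0 7.0: r2 moved too
```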

### Exercise 3  

Write a version of `move_rectangle` that creates and returns a new `Rectangle` instead of modifying the old one.

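```python
import copy

def move_rectangle(rect, dx, dy):
    """Returns a new Rectangle moved by (dx, dy); rect is unchanged.

    copy.deepcopy also copies the embedded corner Point. A shallow
    copy.copy would share corner with rect, so moving the copy
    would move the original too.
    """
    new_rect = copy.deepcopy(rect)
    new_rect.corner.x += dx
    new_rect.corner.y += dy
    return new_rect
```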
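
Why `deepcopy` matters for Exercise 3: `copy.copy` copies the `Rectangle` but not the `Point` stored in `corner`, so the copy and the original share one corner. A self-contained sketch that checks this with `is`:

```python
import copy

class Point(object):
    """Represents a point in 2-D space."""

class Rectangle(object):
    """Represents a rectangle. attributes: width, height, corner."""

box = Rectangle()
box.width, box.height = 100.0, 200.0
box.corner = Point()
box.corner.x, box.corner.y = 0.0, 0.0

shallow = copy.copy(box)
deep = copy.deepcopy(box)

print(shallow.corner is box.corner)   # True: the shallow copy shares the Point
print(deep.corner is box.corner)      # False: deepcopy copied the Point too

shallow.corner.x += 5
print(box.corner.x)                   # 5.0: moving the shallow copy moved box
```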

## Quick poll
About how long did you spend working on this Reading Journal?
  * 50 minutes

## Reading Journal feedback

Have any comments on this Reading Journal? Feel free to leave them below and we'll read them when you submit your journal entry. This could include suggestions to improve the exercises, topics you'd like to see covered in class next time, or other feedback.

If you have Python questions or run into problems while completing the reading, you should post them to Piazza instead so you can get a quick response before your journal is submitted.