In [None]:
import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np

# Manipulating Text and File I/O

A cool thing about Python is that we can work with files in a very natural way.  For example, we can open a text file such as 'beyonce.txt' and print its lines, numbering each one, by writing the following

In [None]:
queen_bee = open('beyonce.txt')
count = 0
for line in queen_bee:
    count += 1
    print("The line number is %d." % count)
    print(line)
queen_bee.close()

So we see that yes, the text file is treated as a series of lines. Breaking each line into words is just as straightforward; here we print the first word of every line.

In [None]:
queen_bee = open('beyonce.txt')
for line in queen_bee:
    line = line.rstrip() # Remove trailing white space
    words = line.split() # Turns the line into a list of words
    if words: # Skip blank lines, where words is empty and words[0] would fail
        print(words[0])
queen_bee.close()
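
To see what `rstrip` and `split` do without needing a file, here is a minimal sketch on a single string. The sample line is made up for illustration and is not taken from 'beyonce.txt'.

```python
# A made-up line standing in for one line read from a file.
sample_line = "the fire is warm, the night is long \n"

stripped = sample_line.rstrip()  # Drops the trailing space and newline
words = stripped.split()         # Breaks the line apart at white space

print(repr(stripped))
print(words)

# A blank line splits into an empty list, so words[0] would fail there.
blank_words = "\n".rstrip().split()
print(blank_words)
```

Note that the comma stays attached to its word: the list contains `'warm,'`, not `'warm'`.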

We can then readily start answering various questions.  For instance, what if we want to print only those lines that contain the word 'fire'?  We might try the following

In [None]:
queen_bee = open('beyonce.txt')
for line in queen_bee:
    line = line.rstrip() # Remove trailing white space
    words = line.split() # Turns the line into a list of words by breaking line up across spaces.
    if "fire" in words:
        print(line)
queen_bee.close()

Okay, that didn't do anything.  Why not?  Write the code that will actually do something.

So maybe we want to not only split each line into words, but also remove punctuation marks. How would you do this for just commas?

In [None]:
queen_bee = open('beyonce.txt')

for line in queen_bee:
    line = line.rstrip() # Remove trailing white space
    words = line.split() # Turns the line into a list of words by breaking line up across spaces.
    cnt = 0
    for word in words:
        if word.endswith(","):
            words[cnt] = word[:-1] # Assign to the list element; changing word alone would leave the list untouched.
        cnt+=1
    if "fire" in words:
        print(line)

queen_bee.close()

Okay, your turn.  Write a program that does three things.  First, not only should it remove commas, but it should keep track of which words the commas were removed from.  Second, it should determine if a line contains the word "fire", and if it does, replace "fire" with "flames".  Lastly, it should print the modified line.  To do the word replacement easily, use the list method

`words.index("fire")`
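
As a small illustration of how `index` behaves, here is a sketch on a made-up list of words. It is not the full exercise, only the replacement step. Two things are worth knowing: `index` returns the position of the first match only, and it raises a `ValueError` when the item is not in the list, so it is safest to check with `in` first.

```python
words = ["the", "fire", "is", "warm"]  # Made-up example words

if "fire" in words:
    pos = words.index("fire")  # Position of the first "fire"
    words[pos] = "flames"      # Overwrite that list element

print(words)

# Calling index for a word that is absent raises ValueError.
try:
    ["no", "match", "here"].index("fire")
    raised = False
except ValueError:
    raised = True

print(raised)
```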

Now of course, the punctuation is all wacky.  Can you find a way to put the punctuation back where it belongs?  To do this, we need to use string concatenation.  For example, typing the following glues a comma onto the end of a word

In [None]:
print("flames" + ",")

So using this, and noting that you have kept track of where you removed commas, put the commas back by using string concatenation in just the right places and then use 

`delimiter = ' '`

`print(delimiter.join(words))`

to transform the list of words back into a line.  
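
Here is a minimal sketch of the concatenate-then-join idea. The word list and the name `comma_positions` are hypothetical; in your own program the positions would come from whatever record you kept while removing commas.

```python
words = ["the", "flames", "are", "warm", "tonight"]  # Made-up, commas already removed
comma_positions = [3]  # Hypothetical record: a comma was removed from words[3]

for pos in comma_positions:
    words[pos] = words[pos] + ","  # String concatenation puts the comma back

delimiter = " "
line = delimiter.join(words)  # Glue the words together with single spaces
print(line)
```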

Now that you can do that, write a program which counts and then removes all punctuation.  Further, your program should count how many times the letter 't' occurs in the document.  

## More Text Processing

Let us build a function whose job it is to provide a histogram of the letters in a given body of text.  Along the way, we need to build helper functions which the main function will call.  But let's get a template of an idea down first.  

We want to pass in a filename, say "juliet.txt" or "beyonce.txt", and then get a plot of the frequency with which different letters appear.  We should return the distribution as a list.

But okay, we want to do a frequency analysis of letters.  That means we want to get rid of punctuation.  So we need to develop a helper function which takes a list of words and strips the punctuation while keeping all the letters.  

Now note, because we have Shakespeare to analyze, we have to allow that things like apostrophes can come at almost any point in the word.  So be careful.  Your helper function should return a list of words with no punctuation.  You will need to use the string helper function

`word.isalnum()`

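which tests if a string consists only of alphanumeric characters.  Also, don't forget about good stuff like the list method

 `wordt.append(char)`

which adds `char` to the end of the list `wordt`.  (Strings themselves have no `append`.)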
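
To get a feel for these two tools before using them in the helper function, here is a short sketch on made-up strings. It only shows what `isalnum` reports and how a list of characters is turned back into a string; deciding where to use them is still up to you.

```python
samples = ["fire", "fire,", "o'er", "2nd", ""]  # Made-up test strings
results = {}
for text in samples:
    results[text] = text.isalnum()
    print("%r.isalnum() -> %s" % (text, results[text]))

# Building a string one character at a time: collect the characters
# in a list with append, then join them with an empty delimiter.
letters = []
letters.append("o")
letters.append("e")
letters.append("r")
rebuilt = "".join(letters)
print(rebuilt)
```

Note that digits count as alphanumeric, and that the empty string does not.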

In [None]:
def punc_remove(words):
    ind = 0
    for word in words:
        # Here is where you need to start introducing code.
        if :
            wordt = []
            for char in word:
                if :
                    wordt.append(char)
            delimiter = ""
            words[ind] = delimiter.join(wordt) # Note the absence of a return statement: the list
                                               # words is modified in place, so the caller sees
                                               # the modified form after the function finishes.
        ind+=1

We then need to think about lower and upper case.  So, we would like a helper function which takes a list of words and converts everything to upper case.  Note, you will need to make use of the string helper function

`word.upper()`

which makes every character in a string uppercase. 
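
One detail worth seeing in action: `upper` does not change the string it is called on. It returns a new string, so the result has to be stored somewhere. A small sketch with made-up names:

```python
word = "Juliet"
shouted = word.upper()  # A new, uppercase string
print(word)             # The original is unchanged
print(shouted)

# To change a word inside a list, assign the result back to the list element.
names = ["Romeo", "Juliet"]
names[0] = names[0].upper()
print(names)
```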

In [None]:
def make_upper(words):
    ind = 0
    for word in words:
        

Okay, now the hard part.  We need to build a helper function which takes a given string or word, along with a dictionary that keeps track of how often each letter occurs, and updates the counts.  

In [None]:
def letter_cnt(word,freq_d):
    for let in word:
        if let in freq_d: # Skip characters we are not counting, such as digits.
            freq_d[let] += 1
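
Here is a hedged sketch of how such a counter might be used on a couple of made-up, already-uppercased words. This copy of `letter_cnt` skips any character that is not a key of the dictionary; that guard is an assumption added here so that digits do not raise a `KeyError`. The last line shows one possible way to turn the dictionary into a list in alphabetical order.

```python
import string

def letter_cnt(word, freq_d):
    # Add one to the count of every letter of word that freq_d tracks.
    for let in word:
        if let in freq_d:  # Assumption: silently skip untracked characters
            freq_d[let] += 1

# Start every uppercase letter at zero.
freq_d = dict()
for let in string.ascii_uppercase:
    freq_d[let] = 0

for word in ["FIRE", "FLAMES", "2ND"]:  # Made-up words
    letter_cnt(word, freq_d)

# The counts as a list, ordered A through Z.
distribution = [freq_d[let] for let in string.ascii_uppercase]
print(distribution)
```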

In [None]:
import string
    
def letter_freq(fname):
    fhand = open(fname)
    alpha = list(string.ascii_uppercase) # string.uppercase does not exist in Python 3
    freq_d = dict()
    for let in alpha: freq_d[let] = 0
