# Text analysis
A lot of people's projects involve text. Whether it's extracting numerical data from text, or dealing with text directly, string manipulation is essential for any number of data analysis projects. 

Here, we're going to take a walk over to the humanities and see if we can learn anything new about everyone's favorite author... William Shakespeare. If we all had to learn it, so should a computer, right?

1. Should _Othello_ really be called _Iago_?

2. Can a computer learn the difference between a comedy and a tragedy?

3. Who is the most verbose Shakespearean character?

4. Who has the largest vocabulary? 

5. Did the complexity of Shakespeare's vocabulary change over time?

6. Which is Shakespeare's most feminist play?

We could think of any number of quantitative questions to pursue, each requiring slightly different skills and analytical depth. But let's start with something easy and build our way up. 

First things first... **ALWAYS LOOK AT YOUR DATA**. Now is the time to open `'../Data/Shakespeare.txt'` and get a sense for the file formatting. (You can do this quickly by clicking [Shakespeare.txt](../Data/Shakespeare.txt).)

Let's do a brief refresher on some text parsing basics which will give us an excuse to spoil Hamlet for you. 

In [145]:
spoiler_alert = ["O, I die, Horatio!\n",
    "The potent poison quite o'ercrows my spirit.\n",
    "I cannot live to hear the news from England,\n",
    "     But I do prophesy th' election lights\n",
    "On Fortinbras. He has my dying voice.\n",
    "So tell him, with th' occurrents, more and less,\n",
    "\n"             
    "Which have solicited- the rest is silence.             Dies.\n"]

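Of course, we can iterate through the lines to see what's in `spoiler_alert`. (Notice there's no comma after the `"\n"` entry, so Python glues it onto the string that follows: the list actually has 7 items, and the last one starts with `\n`.)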

In [146]:
for line in spoiler_alert:
    print(line)

O, I die, Horatio!

The potent poison quite o'ercrows my spirit.

I cannot live to hear the news from England,

     But I do prophesy th' election lights

On Fortinbras. He has my dying voice.

So tell him, with th' occurrents, more and less,


Which have solicited- the rest is silence.             Dies.



Remember, `\n` is a newline character, so it isn't shown on the screen as text. It's still there; `print` just interprets it as a line break. Since `print` also adds its own newline after everything it prints, each line ends with two line breaks, which is why we get double spacing here.
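If you only want to get rid of the blank lines when printing, `print` also accepts an `end` argument (its default is `'\n'`). Since each string here already ends in `\n`, passing `end=''` stops `print` from adding a second one. The sketch below sends the output into `io.StringIO` buffers through `print`'s `file` argument purely so the two behaviours can be compared side by side; the two-line `lines` list is a shortened stand-in for `spoiler_alert`.

```python
import io

# A shortened stand-in for spoiler_alert: every string already ends in "\n".
lines = ["O, I die, Horatio!\n",
         "The potent poison quite o'ercrows my spirit.\n"]

default_out = io.StringIO()   # collects print's default behaviour
no_extra_out = io.StringIO()  # collects print with end=''

for line in lines:
    # Default end='\n': the string's own "\n" plus print's "\n" gives a blank line.
    print(line, file=default_out)
    # end='': only the "\n" that is already in the string gets written.
    print(line, end='', file=no_extra_out)

print(repr(default_out.getvalue()))
print(repr(no_extra_out.getvalue()))
```

Stripping the newline (next) and passing `end=''` give the same screen output; which one you use is mostly a matter of taste.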

In [147]:
for line in spoiler_alert:
    print(line.strip('\n'))

O, I die, Horatio!
The potent poison quite o'ercrows my spirit.
I cannot live to hear the news from England,
     But I do prophesy th' election lights
On Fortinbras. He has my dying voice.
So tell him, with th' occurrents, more and less,
Which have solicited- the rest is silence.             Dies.


Now the `\n` is really gone, and our text isn't double spaced. But remember, we are only removing the `\n` for the sake of printing. `strip` returns a new string; it doesn't change the strings stored in `spoiler_alert`. If we print the original list again:

In [148]:
for line in spoiler_alert:
    print(line)

O, I die, Horatio!

The potent poison quite o'ercrows my spirit.

I cannot live to hear the news from England,

     But I do prophesy th' election lights

On Fortinbras. He has my dying voice.

So tell him, with th' occurrents, more and less,


Which have solicited- the rest is silence.             Dies.



The `\n` characters are still there! That's okay for now, I just wanted to remind you of this little fact. 
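Strings in Python are immutable, so `strip` always hands back a *new* string and leaves the original alone. If you actually want a cleaned-up version of the list, you have to build one yourself, for example with a list comprehension. A minimal sketch on a short stand-in list (the names `lines` and `cleaned` are just illustrative):

```python
lines = ["O, I die, Horatio!\n",
         "     But I do prophesy th' election lights\n"]

# strip('\n') returns new strings, so collect them in a new list.
cleaned = [line.strip('\n') for line in lines]

print(cleaned)  # the newlines are gone here...
print(lines)    # ...but the original list still has them
```

If you want to replace the original, assign the new list back to the same name.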

Another handy tool, which we may or may not have covered by now, is `enumerate`. Suppose I wanted to know the line number of every line. That's trivial with a list this small, but we won't always be working with small lists, and for those `enumerate` is a lifesaver. Let's see it in action:

In [149]:
for line in enumerate(spoiler_alert):
    print(line)

(0, 'O, I die, Horatio!\n')
(1, "The potent poison quite o'ercrows my spirit.\n")
(2, 'I cannot live to hear the news from England,\n')
(3, "     But I do prophesy th' election lights\n")
(4, 'On Fortinbras. He has my dying voice.\n')
(5, "So tell him, with th' occurrents, more and less,\n")
(6, '\nWhich have solicited- the rest is silence.             Dies.\n')


`enumerate` paired each item in our list with its position, handing us `(index, item)` tuples. The first element of each tuple is the index within the list (counting from 0), and the second is the item itself, exactly as it sits in the list. This may come in handy later, but for now treat it as a brief aside.
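Strictly speaking, `enumerate` doesn't build a list: it returns a lazy `enumerate` object that produces one `(index, item)` tuple at a time as you loop over it. Wrap it in `list()` if you want to see everything at once. Because each item is a tuple, you can also unpack it directly in the `for` statement, which usually reads better than indexing with `[0]` and `[1]`. A short sketch on a two-line stand-in list:

```python
lines = ["O, I die, Horatio!\n",
         "The potent poison quite o'ercrows my spirit.\n"]

pairs = enumerate(lines)
print(type(pairs).__name__)  # an enumerate object, not a list

# list() pulls every (index, item) tuple out of the enumerate object.
print(list(enumerate(lines)))

# Unpack each tuple into two names instead of using [0] and [1].
for index, line in enumerate(lines):
    print(index, line.strip())
```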

So `string.strip('\n')` removed the `\n` characters. But there is still that weird spacing before "But I do prophesy...". 

We can just say `string.strip()` with no argument, and by default it will remove _all_ leading and trailing whitespace, which includes `\t` (tab), `\n` (new line) and `' '` (space), from both the left _and_ right ends:

In [150]:
for line in spoiler_alert:
    print(line.strip())

O, I die, Horatio!
The potent poison quite o'ercrows my spirit.
I cannot live to hear the news from England,
But I do prophesy th' election lights
On Fortinbras. He has my dying voice.
So tell him, with th' occurrents, more and less,
Which have solicited- the rest is silence.             Dies.


Remember, `strip()` only removes characters from the ends, never from the middle: there is still a long run of spaces before 'Dies', for instance. That's not a problem for us, but it's something else to remember.
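If you only want to trim one side, or you do need to deal with that interior whitespace, a few related tools help. `lstrip` and `rstrip` trim just the left or right end; all three `strip` methods accept an optional string whose characters are treated as a *set* to remove; and `' '.join(text.split())` is a common idiom for collapsing runs of whitespace into single spaces. A small sketch on our own padded version of the last line:

```python
text = "     Which have solicited- the rest is silence.             Dies.\n"

print(repr(text.lstrip()))  # leading spaces removed; the "\n" at the end stays
print(repr(text.rstrip()))  # trailing "\n" removed; the leading spaces stay

# The argument is a set of characters: any mix of them is removed from the end.
print("Horatio!?!".rstrip('!?'))  # Horatio

# split() with no argument splits on any run of whitespace; join puts single spaces back.
squashed = ' '.join(text.split())
print(squashed)
```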

Let's do a little searching:

In [151]:
for line in spoiler_alert:
    if 'The' in line:
        print(line)

The potent poison quite o'ercrows my spirit.



Why aren't the lines "I cannot live..." and "Which have solicited..." printed to the screen? They both contain "the".

Capitalization counts! The `in` test is case-sensitive, so `'The'` won't match `'the'`.

In [152]:
for line in spoiler_alert:
    if 'the' in line:
        print(line)

I cannot live to hear the news from England,


Which have solicited- the rest is silence.             Dies.



Maybe we don't care about capitalization, and just want to know if 'the' appears anywhere within the line. We can convert the line to all lowercase!

In [153]:
for line in spoiler_alert:
    if 'the' in line.lower():
        print(line)

The potent poison quite o'ercrows my spirit.

I cannot live to hear the news from England,


Which have solicited- the rest is silence.             Dies.



The same works for uppercase:

In [154]:
for line in spoiler_alert:
    if 'the' in line.upper():
        print(line)

We didn't find anything here because we made the lines uppercase, therefore `the` never appears in any of them. We might have meant to say:

In [155]:
for line in spoiler_alert:
    if 'THE' in line.upper():
        print(line)

The potent poison quite o'ercrows my spirit.

I cannot live to hear the news from England,


Which have solicited- the rest is silence.             Dies.



Again we found all three lines!
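One caveat: `in` looks for a *substring*, not a word, so `'the' in line.lower()` is also true for lines containing "there", "other" or "Whether". If you need whole words, one option is a regular expression with word boundaries (`\b`) from the standard `re` module; the `re.IGNORECASE` flag handles capitalization at the same time. A small sketch on a few short test lines of our own choosing:

```python
import re

lines = ["The rest is silence.",
         "Whether 'tis nobler in the mind to suffer",
         "Then, venom, to thy work."]

# \b marks a word boundary, so 'the' inside 'Whether' or 'Then' won't match.
pattern = re.compile(r"\bthe\b", re.IGNORECASE)

substring_hits = [line for line in lines if 'the' in line.lower()]
word_hits = [line for line in lines if pattern.search(line)]

print(substring_hits)  # every line contains the letters t-h-e somewhere
print(word_hits)       # only lines where 'the' stands on its own
```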

Is that the only way to find something in a line? Of course not! Programming wouldn't be fun if there weren't 1000 ways to do the same thing. 

In [156]:
for line in spoiler_alert:
    if line.find('the') != -1:
        print(line.strip(), line.find('the'))

I cannot live to hear the news from England, 22
Which have solicited- the rest is silence.             Dies. 23


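`find` doesn't just tell us whether the text appears in the string; it tells us where: the index of the first character of the first match. If it doesn't find 'the' at all, it returns -1. (The 23 on the last line counts the leading `\n`, which `strip()` removed before printing.)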
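Note that `find` only reports the *first* match. It also accepts an optional start position, so you can keep searching from just past the previous hit to collect every position. A minimal sketch; `find_all` is our own helper, not a built-in string method:

```python
def find_all(text, sub):
    """Return the start index of every non-overlapping occurrence of sub in text."""
    if not sub:
        raise ValueError("sub must be a non-empty string")
    positions = []
    start = text.find(sub)          # -1 means "not found"
    while start != -1:
        positions.append(start)
        # Resume the search just past the match we found.
        start = text.find(sub, start + len(sub))
    return positions

line = "So tell him, with th' occurrents, more and less,"
print(line.find('th'))       # only the first match
print(find_all(line, 'th'))  # every match
```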

`index` works similarly:

In [157]:
for line in spoiler_alert:
    if line.index('the') != -1:
        print(line.strip(), line.find('the'))

ValueError: substring not found

The catch is that `index` raises a `ValueError` when the substring isn't found, instead of returning -1, so the very first line (which contains no 'the') stops the loop with an error. We need to do something slightly different. Remember our old friend try/except?

In [158]:
for line in spoiler_alert:
    try:
        print(line.strip(), line.index('the'))
    except ValueError:  # raised by index() when 'the' isn't in the line
        pass

I cannot live to hear the news from England, 22
Which have solicited- the rest is silence.             Dies. 23


Alright, so we know how to search for things inside of strings. Remember, it's pretty easy to find the line number where things occurred by using `enumerate`:

In [159]:
for line in enumerate(spoiler_alert):
    if 'the' in line[1].lower(): #We have to look in the string, which is index one in the enumerate tuple
        print(line[0])

1
2
6


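So 'the', in some form or another, appears at indices 1, 2, and 6. Keep in mind these are Python indices, which start at 0, so index 1 is the second line of the list. This might be useful information for you to have at some point.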
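If you'd rather report line numbers the way a human counts them, `enumerate` takes a `start` argument. A short sketch on the first three lines of the speech:

```python
lines = ["O, I die, Horatio!\n",
         "The potent poison quite o'ercrows my spirit.\n",
         "I cannot live to hear the news from England,\n"]

# start=1 makes the counter begin at 1 instead of 0.
hits = [number for number, line in enumerate(lines, start=1)
        if 'the' in line.lower()]
print(hits)
```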

Let's get a refresher on splitting strings up a bit. Maybe we actually just want to look at the words in each line as a list rather than working with the entire line as a string:

In [160]:
for line in spoiler_alert:
    print(line.split(','))

['O', ' I die', ' Horatio!\n']
["The potent poison quite o'ercrows my spirit.\n"]
['I cannot live to hear the news from England', '\n']
["     But I do prophesy th' election lights\n"]
['On Fortinbras. He has my dying voice.\n']
['So tell him', " with th' occurrents", ' more and less', '\n']
['\nWhich have solicited- the rest is silence.             Dies.\n']


Here we split up each line and made it a list according to where the commas occur. If there were no commas, then the line just became a list with a single element. If there were commas, the string was split into separate strings based on the position of those commas. Notice that the commas themselves are gone!

We could also split them up based on spaces to isolate single words:

In [161]:
for line in spoiler_alert:
    print(line.split(' '))

['O,', 'I', 'die,', 'Horatio!\n']
['The', 'potent', 'poison', 'quite', "o'ercrows", 'my', 'spirit.\n']
['I', 'cannot', 'live', 'to', 'hear', 'the', 'news', 'from', 'England,\n']
['', '', '', '', '', 'But', 'I', 'do', 'prophesy', "th'", 'election', 'lights\n']
['On', 'Fortinbras.', 'He', 'has', 'my', 'dying', 'voice.\n']
['So', 'tell', 'him,', 'with', "th'", 'occurrents,', 'more', 'and', 'less,\n']
['\nWhich', 'have', 'solicited-', 'the', 'rest', 'is', 'silence.', '', '', '', '', '', '', '', '', '', '', '', '', 'Dies.\n']


The last word in each line still has that pesky `\n`, and the indented line starts with a row of empty strings, so let's chain some commands together:

In [162]:
for line in spoiler_alert:
    print(line.strip().split(' '))

['O,', 'I', 'die,', 'Horatio!']
['The', 'potent', 'poison', 'quite', "o'ercrows", 'my', 'spirit.']
['I', 'cannot', 'live', 'to', 'hear', 'the', 'news', 'from', 'England,']
['But', 'I', 'do', 'prophesy', "th'", 'election', 'lights']
['On', 'Fortinbras.', 'He', 'has', 'my', 'dying', 'voice.']
['So', 'tell', 'him,', 'with', "th'", 'occurrents,', 'more', 'and', 'less,']
['Which', 'have', 'solicited-', 'the', 'rest', 'is', 'silence.', '', '', '', '', '', '', '', '', '', '', '', '', 'Dies.']


The operations are performed left to right. First we stripped whitespace off the left and right sides of the string; then the resulting string was split into a list on spaces. Now we have a list of words for each line, but what if we wanted a list of words for the entire text?

In [163]:
total_list = []
for line in spoiler_alert:
    line_as_list = line.strip().split(' ')
    for word in line_as_list:
        total_list.append(word)
print(total_list)

['O,', 'I', 'die,', 'Horatio!', 'The', 'potent', 'poison', 'quite', "o'ercrows", 'my', 'spirit.', 'I', 'cannot', 'live', 'to', 'hear', 'the', 'news', 'from', 'England,', 'But', 'I', 'do', 'prophesy', "th'", 'election', 'lights', 'On', 'Fortinbras.', 'He', 'has', 'my', 'dying', 'voice.', 'So', 'tell', 'him,', 'with', "th'", 'occurrents,', 'more', 'and', 'less,', 'Which', 'have', 'solicited-', 'the', 'rest', 'is', 'silence.', '', '', '', '', '', '', '', '', '', '', '', '', 'Dies.']


This isn't bad, but there are still some things in here we probably don't want: all of those empty strings before 'Dies', for instance, and maybe some of the punctuation.
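The empty strings come from splitting on a single space: every extra space in a row produces another empty piece. Calling `split()` with no argument instead splits on any run of whitespace (spaces, tabs, newlines) and never returns empty strings, which also makes the separate `strip()` unnecessary. A quick comparison on a shortened version of the last line:

```python
line = "\nWhich have solicited- the rest is silence.   Dies.\n"

print(line.strip().split(' '))  # one empty string per extra space
print(line.split())             # a run of whitespace counts as one separator
```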

In [164]:
total_list = []
for line in spoiler_alert:
    line_as_list = line.strip().split(' ')
    for word in line_as_list:
        total_list.append(word.rstrip('!'))
print(total_list)

['O,', 'I', 'die,', 'Horatio', 'The', 'potent', 'poison', 'quite', "o'ercrows", 'my', 'spirit.', 'I', 'cannot', 'live', 'to', 'hear', 'the', 'news', 'from', 'England,', 'But', 'I', 'do', 'prophesy', "th'", 'election', 'lights', 'On', 'Fortinbras.', 'He', 'has', 'my', 'dying', 'voice.', 'So', 'tell', 'him,', 'with', "th'", 'occurrents,', 'more', 'and', 'less,', 'Which', 'have', 'solicited-', 'the', 'rest', 'is', 'silence.', '', '', '', '', '', '', '', '', '', '', '', '', 'Dies.']


Here we're stripping exclamation marks from the right side only. But we might also want to strip periods, colons, semicolons, question marks and trailing hyphens. `rstrip` treats its argument as a set of characters (not a substring to match), so we can just lump them all together:

In [165]:
total_list = []
for line in spoiler_alert:
    line_as_list = line.strip().split(' ')
    for word in line_as_list:
        total_list.append(word.rstrip('!?.-;:'))
print(total_list)

['O,', 'I', 'die,', 'Horatio', 'The', 'potent', 'poison', 'quite', "o'ercrows", 'my', 'spirit', 'I', 'cannot', 'live', 'to', 'hear', 'the', 'news', 'from', 'England,', 'But', 'I', 'do', 'prophesy', "th'", 'election', 'lights', 'On', 'Fortinbras', 'He', 'has', 'my', 'dying', 'voice', 'So', 'tell', 'him,', 'with', "th'", 'occurrents,', 'more', 'and', 'less,', 'Which', 'have', 'solicited', 'the', 'rest', 'is', 'silence', '', '', '', '', '', '', '', '', '', '', '', '', 'Dies']


We'll still have apostrophes (`'`), but maybe we want to keep them? We also left commas out of that set, so words like `'die,'` still carry a trailing comma. And how do we feel about hyphenated words? Our current approach keeps `well-known` as one long word. Perhaps we are okay with that, and perhaps not. It's a limitation to be aware of.
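Rather than typing out every punctuation mark, the standard `string` module has a `string.punctuation` constant containing the ASCII punctuation characters. It includes the apostrophe, though, so stripping with it would turn `th'` into `th`. One way to keep apostrophes (a choice, not a rule) is to remove that character from the set first. This sketch uses `strip`, so punctuation comes off both ends of each word:

```python
import string

# Every ASCII punctuation character except the apostrophe.
to_strip = string.punctuation.replace("'", "")

words = ['O,', 'die,', 'Horatio!', "th'", 'solicited-', 'well-known', 'silence.']
# strip() removes these characters from both ends; interior hyphens survive.
cleaned = [word.strip(to_strip).lower() for word in words]
print(cleaned)
```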

Let's wrap this up with a couple of small changes: skip the empty strings and convert everything to lowercase.

In [166]:
total_list = []
for line in spoiler_alert:
    line_as_list = line.strip().split(' ')
    for word in line_as_list:
        if len(word) > 0:
            total_list.append(word.rstrip('!?.-;:').lower())
print(total_list)

['o,', 'i', 'die,', 'horatio', 'the', 'potent', 'poison', 'quite', "o'ercrows", 'my', 'spirit', 'i', 'cannot', 'live', 'to', 'hear', 'the', 'news', 'from', 'england,', 'but', 'i', 'do', 'prophesy', "th'", 'election', 'lights', 'on', 'fortinbras', 'he', 'has', 'my', 'dying', 'voice', 'so', 'tell', 'him,', 'with', "th'", 'occurrents,', 'more', 'and', 'less,', 'which', 'have', 'solicited', 'the', 'rest', 'is', 'silence', 'dies']


Now we can at least get word counts, for which we'll rely on `Counter` from the `collections` module:

In [167]:
from collections import Counter
Counter(total_list)

Counter({'i': 3, 'the': 3, 'my': 2, "th'": 2, 'with': 1, 'dying': 1, 'cannot': 1, 'spirit': 1, 'do': 1, 'hear': 1, 'and': 1, 'quite': 1, 'prophesy': 1, 'silence': 1, 'to': 1, 'die,': 1, 'but': 1, 'voice': 1, 'election': 1, 'tell': 1, 'live': 1, 'fortinbras': 1, 'have': 1, 'less,': 1, 'so': 1, 'more': 1, 'which': 1, 'horatio': 1, 'occurrents,': 1, 'poison': 1, 'potent': 1, 'rest': 1, 'o,': 1, 'him,': 1, 'lights': 1, 'on': 1, 'he': 1, 'england,': 1, 'dies': 1, 'from': 1, 'is': 1, 'solicited': 1, 'news': 1, "o'ercrows": 1, 'has': 1})
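A `Counter` behaves like a dictionary, so you can look up a single word, and its `most_common(n)` method returns the `n` highest counts as a list of `(word, count)` tuples. A small sketch on a handful of the words from above:

```python
from collections import Counter

words = ['i', 'the', 'my', 'i', 'the', 'my', 'i', 'the', 'horatio']
counts = Counter(words)

print(counts.most_common(2))  # the two most frequent words
print(counts['the'])          # look up one word
print(counts['iago'])         # a missing word counts as 0 instead of raising KeyError
```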

# Back to Shakespeare.txt

So we were able to get a dictionary-like `Counter` of all the word counts in our little sample text. Of course, our real text is gigantic and contains a lot of stuff we don't need. So let's get serious and move on to some big(-ger) data. We start by reading the file into Python:

In [168]:
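with open('../Data/Shakespeare.txt') as f:  # closes the file automatically when done
    complete_works = f.readlines()  # one string per line, each still ending in '\n'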

For the next few tasks, I'm only interested in 'Othello'. So look at where Othello begins in the file: how are we going to extract Othello, and _only_ Othello, from this large list of lines? How do _we_ know where Othello begins and ends?

**Exercise:** Iterate through `complete_works` and add only the lines relevant to 'Othello' to the new list `othello_lines`. (remember CAPITALIZATION counts!)

In [169]:
othello_lines = []
###Place your code here

###Answer
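othello = False  # True while we are inside the text of Othello
for line in complete_works:
    if 'THE TRAGEDY OF OTHELLO' in line:  # the title line marks the start
        othello = True
    if othello:
        othello_lines.append(line)
        if 'THE END' in line:  # keep the 'THE END' line, then stop collecting
            othello = False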

Before moving on, always make sure you really have what you _think_ you have. 

In [170]:
print(othello_lines[0:20])
print('**************')
print(othello_lines[-20:])

['THE TRAGEDY OF OTHELLO, MOOR OF VENICE\n', '\n', 'by William Shakespeare\n', '\n', '\n', '\n', 'Dramatis Personae\n', '\n', '  OTHELLO, the Moor, general of the Venetian forces\n', '  DESDEMONA, his wife\n', '  IAGO, ensign to Othello\n', '  EMILIA, his wife, lady-in-waiting to Desdemona\n', '  CASSIO, lieutenant to Othello\n', '  THE DUKE OF VENICE\n', '  BRABANTIO, Venetian Senator, father of Desdemona\n', '  GRATIANO, nobleman of Venice, brother of Brabantio\n', '  LODOVICO, nobleman of Venice, kinsman of Brabantio\n', '  RODERIGO, rejected suitor of Desdemona\n', '  BIANCA, mistress of Cassio\n', '  MONTANO, a Cypriot official\n']
**************
["  GRATIANO.                  All that's spoke is marr'd.\n", "  OTHELLO. I kiss'd thee ere I kill'd thee. No way but this,\n", '    Killing myself, to die upon a kiss.\n', '                                          Falls on the bed, and dies.\n', '  CASSIO. This did I fear, but thought he had no weapon;\n', '    For he was great of hear

# Dialogue
So we should all have a line-by-line reading of Othello. Now what? Well, your choice here is going to depend a lot on what you want to analyze! I'm particularly interested in knowing whether Othello or Iago is smarter. Specifically, what is the size of their vocabulary? And how does it compare to that of other characters?

First off, since we only care about OTHELLO and IAGO, maybe it's not worth our time extracting the full character list. If we want to compare the vocabulary of other characters we'll have to, but for now let's just worry about OTHELLO and IAGO. The fact that speaker names are capitalized makes things nice and easy for us. (Their names _probably_ aren't capitalized when other characters mention them in dialogue. Can we be sure? No. As a rough approximation? Sure.) Looking through the file, each piece of dialogue is laid out roughly like this:
```
"  CHARACTER. blahblahblah
     blahblahblah"
```
Two things jump out at me. How about you?

1. Before a character speaks there are always (usually?) *2* spaces, followed by the character's name in all capital letters, followed by a period.
2. Some speeches run longer than a single line, and those continuation lines are indented by 4 spaces(!). We'll make note of that for later.
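Both observations can be turned into a rough test for each line. The sketch below uses a regular expression for "two spaces, a name in capitals, then a period" and treats four leading spaces as a continuation. `SPEAKER` and `classify` are hypothetical names, and the pattern is an assumption based on the handful of lines we've looked at. Expect exceptions: stage directions such as "Falls on the bed, and dies." are also heavily indented and would be labelled as continuations.

```python
import re

# Two spaces, then an all-caps name (letters and spaces), then a period.
SPEAKER = re.compile(r"^  ([A-Z][A-Z ]*[A-Z])\.")

def classify(line):
    """Return ('speaker', NAME), ('continuation', None) or ('other', None)."""
    match = SPEAKER.match(line)
    if match:
        return ('speaker', match.group(1))
    if line.startswith('    '):
        return ('continuation', None)
    return ('other', None)

examples = ["  LODOVICO. Wrench his sword from him.\n",
            "    For, in my sense, 'tis happiness to die.\n",
            "  IAGO.                                I bleed, sir, but not kill'd.\n",
            "\n"]
for example in examples:
    print(classify(example))
```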

In [171]:
sample_text = othello_lines[3792:3810]
for line in sample_text:
    print(line.rstrip())

  LODOVICO. Where is this rash and most unfortunate man?
  OTHELLO. That's he that was Othello. Here I am.
  LODOVICO. Where is that viper? Bring the villain forth.
  OTHELLO. I look down towards his feet; but that's a fable.
    If that thou be'st a devil, I cannot kill thee.      Wounds Iago.
  LODOVICO. Wrench his sword from him.
  IAGO.                                I bleed, sir, but not kill'd.
  OTHELLO. I am not sorry neither. I'ld have thee live,
    For, in my sense, 'tis happiness to die.
  LODOVICO. O thou Othello, that wert once so good,
    Fall'n in the practice of a damned slave,
    What shall be said to thee?
  OTHELLO.                      Why, anything;
    An honorable murtherer, if you will,
    For nought did I in hate, but all in honor.
  LODOVICO. This wretch hath part confess'd his villainy.
    Did you and he consent in Cassio's death?
  OTHELLO. Ay.


**Exercise:** Given the `sample_text` above, make a list of all words spoken by Lodovico.

In [172]:
lodovico_list = []

###Place your code here


###Answer
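lodovico_speaking = False  # are we inside one of Lodovico's speeches?
for line in sample_text:
    if 'LODOVICO.' in line:
        # First line of a speech; the speaker tag 'LODOVICO.' is kept as a word too.
        lodovico_speaking = True
        for word in line.strip().split(' '):
            if len(word) > 0:  # skip empty strings left by runs of spaces
                lodovico_list.append(word)
    elif lodovico_speaking:
        if line[:4] == '    ':
            # Four leading spaces: a continuation line of the same speech.
            for word in line.strip().split(' '):
                if len(word) > 0:
                    lodovico_list.append(word)
        else:
            # Anything else (another speaker, a blank line) ends the speech.
            lodovico_speaking = False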

In [173]:
print(lodovico_list)

['LODOVICO.', 'Where', 'is', 'this', 'rash', 'and', 'most', 'unfortunate', 'man?', 'LODOVICO.', 'Where', 'is', 'that', 'viper?', 'Bring', 'the', 'villain', 'forth.', 'LODOVICO.', 'Wrench', 'his', 'sword', 'from', 'him.', 'LODOVICO.', 'O', 'thou', 'Othello,', 'that', 'wert', 'once', 'so', 'good,', "Fall'n", 'in', 'the', 'practice', 'of', 'a', 'damned', 'slave,', 'What', 'shall', 'be', 'said', 'to', 'thee?', 'LODOVICO.', 'This', 'wretch', 'hath', 'part', "confess'd", 'his', 'villainy.', 'Did', 'you', 'and', 'he', 'consent', 'in', "Cassio's", 'death?']


The code we wrote above should (might?) generalize to the whole play, but who knows? We'll have to check slowly, starting with where the speaker tags show up.

In [174]:
for line in othello_lines[:500]: #No reason to work with the full text while we're still learning
    if 'OTHELLO.' in line:
        print(line)
print('##########')

for line in othello_lines[-100:]: #No reason to work with the full text while we're still learning
    if 'OTHELLO.' in line:
        print(line)

  OTHELLO. 'Tis better as it is.

  OTHELLO.               Let him do his spite.

  OTHELLO.               Not I; I must be found.

  OTHELLO. The servants of the Duke? And my lieutenant?

  OTHELLO.               What is the matter, think you?

  OTHELLO.             'Tis well I am found by you.

  OTHELLO.                                     Have with you.

  OTHELLO.                  Holla! Stand there!

  OTHELLO. Keep up your bright swords, for the dew will rust them.

  OTHELLO.                   Hold your hands,

  OTHELLO.               What if I do obey?

  OTHELLO. Most potent, grave, and reverend signiors,

##########
  OTHELLO. I look down towards his feet; but that's a fable.

  OTHELLO. I am not sorry neither. I'ld have thee live,

  OTHELLO.                      Why, anything;

  OTHELLO. Ay.

  OTHELLO. I do believe it, and I ask your pardon.

  OTHELLO. Well, thou dost best.

  OTHELLO. O villain!

  OTHELLO.                     O the pernicious caitiff!

  OTHELLO.   

Looks good to me. Of course, this only finds the first line of each of Othello's speeches, not the 4-space continuation lines that follow it.

Let's go ahead and run our code on Othello and Iago to see who has a more complex vocabulary!

In [175]:
iago_list = []
othello_list = []
###Place your code here


###Answer
iago_speaking = False
othello_speaking = False

for line in othello_lines:
    # Same logic as the Lodovico answer, tracked separately for each character.
    if 'IAGO.' in line:
        iago_speaking = True
        for word in line.strip().split(' '):
            if len(word) > 0:
                iago_list.append(word)
    elif iago_speaking:
        if line[:4] == '    ':  # continuation of Iago's speech
            for word in line.strip().split(' '):
                if len(word) > 0:
                    iago_list.append(word)
        else:
            iago_speaking = False

    if 'OTHELLO.' in line:
        othello_speaking = True
        for word in line.strip().split(' '):
            if len(word) > 0:
                othello_list.append(word)
    elif othello_speaking:
        if line[:4] == '    ':  # continuation of Othello's speech
            for word in line.strip().split(' '):
                if len(word) > 0:
                    othello_list.append(word)
        else:
            othello_speaking = False
    
    
print(Counter(iago_list))
print('#######')
print(Counter(othello_list))



Counter({'IAGO.': 272, 'I': 232, 'the': 200, 'and': 184, 'to': 159, 'a': 128, 'of': 127, 'you': 118, 'in': 109, 'my': 85, 'that': 81, 'not': 78, 'be': 75, 'is': 74, 'his': 71, 'your': 68, 'with': 67, 'And': 62, 'he': 58, 'her': 57, 'for': 56, 'it': 55, 'this': 54, 'do': 52, 'have': 51, 'him': 47, 'as': 46, 'will': 44, 'are': 44, 'me': 40, 'but': 38, 'by': 36, 'shall': 34, 'she': 31, 'so': 31, 'To': 30, 'But': 30, 'That': 28, 'Cassio': 28, 'If': 25, 'am': 24, 'would': 24, 'what': 24, 'or': 23, 'The': 22, 'You': 22, 'love': 22, 'if': 22, 'may': 21, 'on': 21, 'thy': 21, 'hath': 21, 'As': 20, 'from': 20, 'at': 20, 'see': 20, 'It': 20, 'must': 19, "'tis": 19, 'such': 19, 'For': 19, 'What': 19, 'out': 19, 'their': 19, 'they': 19, 'an': 18, 'She': 18, 'thou': 18, 'He': 18, 'you.': 18, 'go': 18, 'some': 17, 'all': 17, 'more': 17, 'was': 17, 'had': 17, 'My': 16, 'yet': 16, 'no': 16, "I'll": 16, 'good': 15, 'sir,': 15, 'you,': 15, 'think': 15, 'which': 15, 'most': 15, 'Do': 15, 'him,': 15, 'than
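These raw counts mix in a few problems: the speaker tag (`'IAGO.'`) is counted as a word, and `'And'`/`'and'` or `'you'`/`'you.'` are counted separately. As a rough sketch of a vocabulary measure (our own definition, not the only reasonable one), we can drop the speaker tag, strip punctuation except apostrophes, lowercase, and count distinct words. Because a character who talks more will tend to use more distinct words, the share of distinct words out of all words (sometimes called the type-token ratio) is a useful second number, though it is itself biased toward shorter texts. `vocabulary_stats` is a hypothetical helper:

```python
import string

def vocabulary_stats(words, speaker_tag):
    """Return (total_words, distinct_words) after light normalization.

    Drops the speaker tag (e.g. 'IAGO.'), strips surrounding punctuation
    except apostrophes, and lowercases each word.
    """
    to_strip = string.punctuation.replace("'", "")
    cleaned = []
    for word in words:
        if word == speaker_tag:
            continue  # the tag marks who is speaking; it isn't part of the speech
        word = word.strip(to_strip).lower()
        if word:  # skip tokens that were nothing but punctuation
            cleaned.append(word)
    return len(cleaned), len(set(cleaned))

# A tiny hand-made list in the same shape as iago_list.
sample_words = ['IAGO.', 'I', 'bleed,', 'sir,', 'but', 'not', "kill'd.",
                'IAGO.', 'I', 'do', 'not', 'know.']
total, distinct = vocabulary_stats(sample_words, 'IAGO.')
print(total, distinct, round(distinct / total, 2))

# In the notebook you could then compare, for example:
# vocabulary_stats(iago_list, 'IAGO.') and vocabulary_stats(othello_list, 'OTHELLO.')
```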

**Exercise:** Our first step towards analyzing Othello was to extract the Othello-specific text from the complete works file, which we did in a pretty ad hoc way to make our lives easier. How would you do it in an automated fashion? Specifically, read the following file and return a _list_ containing the names of every play within the file.

In [176]:
with open('../Data/Shakespeare.txt') as f:  # closes the file automatically when done
    complete_works = f.readlines()
play_list = []
###Your code here


**Exercise:** How well does the code that we wrote above work on Hamlet? Romeo and Juliet? Who has the biggest vocabulary in all of Shakespeare?