# Hamming out  
Prince Hamlet has a skull for a coworker, a ghost for middle management, and a vocabulary so frilly it needs its own dressing room. In a play where daggers are mostly metaphors and everyone’s feelings arrive in iambic pentameter, the real drama might be the words themselves.  

Today’s goal: spy on Shakespeare’s word choices like literary detectives, see which characters hog the spotlight, and find out just how far into the show Hamlet finally launches the “To be, or not to be” speech. By the end, there will be fewer mysteries, more counts, and at least one triumphant “Aha!”—hopefully with fewer ghosts.

<div align="center"> <img src="https://media1.giphy.com/media/v1.Y2lkPTc5MGI3NjExYTlpbThmbHF0enAxYXFnY3U4YzZreGVkaDNjbm56bHNvNWh0Y205ayZlcD12MV9pbnRlcm5hbF9naWZfYnlfaWQmY3Q9Zw/RZQIIUO9qrTRC/giphy.gif" alt="ham"> </div>

# Goals  
1. Find the most common "words" used by Shakespeare in Hamlet. 
2. Find out which character is mentioned the most.
3. Figure out how far into the text the famous "To be, or not to be" speech is given.

<div align="center"> <img src="https://media0.giphy.com/media/v1.Y2lkPTc5MGI3NjExeWM0aXdmZG13ajU3NW5mc2E4MHlsMmJldHp6Ym4xN283aDVuYTVsdSZlcD12MV9pbnRlcm5hbF9naWZfYnlfaWQmY3Q9Zw/PpaTrf7NdB4sKXBNwl/giphy.gif" alt="ham"> </div>

# Getting started  
Reading files in Python is straightforward: use the built-in `open` with a context manager to read everything at once, or loop line by line to stream and parse as you go. Keeping the file in the same folder as the notebook simplifies paths; otherwise, use `pathlib` for robust, cross-platform paths.  
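
As a rough illustration of the `pathlib` suggestion, here is a minimal sketch. The helper `read_text_file`, the `data` folder, and `demo.txt` are hypothetical names made up for this example; the file is written to a temporary directory so the snippet runs anywhere.

```python
from pathlib import Path
import tempfile


def read_text_file(path: Path) -> str:
    """Read a whole text file into a single string."""
    return path.read_text(encoding="utf-8")


# Demo with a throwaway file so the example does not depend on hamlet.txt.
with tempfile.TemporaryDirectory() as tmp:
    # The "/" operator joins path parts in a platform-independent way.
    demo_path = Path(tmp) / "data" / "demo.txt"
    demo_path.parent.mkdir(parents=True)
    demo_path.write_text("To be, or not to be\n", encoding="utf-8")
    demo_text = read_text_file(demo_path)

print(repr(demo_text))
```

For the notebook itself, something like `read_text_file(Path("hamlet.txt"))` should behave the same as the `open` call used below, assuming the file sits next to the notebook.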

<div align="center"> <img src="https://media0.giphy.com/media/v1.Y2lkPTc5MGI3NjExZW1ocTF2c3Jva3pveXF2dG12Z2FxaGlxZ2xpenppdHBqemdsdTFkdSZlcD12MV9pbnRlcm5hbF9naWZfYnlfaWQmY3Q9Zw/NFA61GS9qKZ68/giphy.gif" alt="ham"> </div>

## Reading the whole file at once  
* Best when the file is small enough to fit in memory comfortably.  
* The with block automatically closes the file.


In [11]:
with open("hamlet.txt", "r", encoding="utf-8") as f:
    text = f.read()

print(type(text))
print(text[:300], "...\n")
print(f"Total characters: {len(text)}")

<class 'str'>
﻿The Project Gutenberg eBook of Hamlet
    
This ebook is for the use of anyone anywhere in the United States and
most other parts of the world at no cost and with almost no restrictions
whatsoever. You may copy it, give it away or re-use it under the terms
of the Project Gutenberg License included  ...

Total characters: 185760


## Reading line by line  
* Useful for very large files or when processing sequentially.  
* Each iteration returns the next line, including the newline at the end.


In [18]:
with open("hamlet.txt", "r", encoding="utf-8") as f:
    for line_number, line in enumerate(f, start=1):
        if line_number > 5:
            break  # stop after the first five lines
        print(repr(line))  # repr shows newline characters clearly

'\ufeffThe Project Gutenberg eBook of Hamlet\n'
'    \n'
'This ebook is for the use of anyone anywhere in the United States and\n'
'most other parts of the world at no cost and with almost no restrictions\n'
'whatsoever. You may copy it, give it away or re-use it under the terms\n'
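
The `'\ufeff'` at the start of the first line in the output above is a byte order mark (BOM) stored at the beginning of the file. One possible way to avoid it is Python's `utf-8-sig` encoding, which drops a leading BOM while decoding. The sketch below demonstrates this on a small temporary file (an assumption for the demo, not the real `hamlet.txt`).

```python
import os
import tempfile

# Build a small throwaway file that starts with a UTF-8 byte order mark.
fd, bom_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write(b"\xef\xbb\xbfThe Project Gutenberg eBook of Hamlet\n")

# Plain utf-8 keeps the BOM as the first character of the text.
with open(bom_path, "r", encoding="utf-8") as f:
    with_bom = f.readline()

# utf-8-sig strips a leading BOM if one is present.
with open(bom_path, "r", encoding="utf-8-sig") as f:
    without_bom = f.readline()

os.remove(bom_path)

print(repr(with_bom))
print(repr(without_bom))
```

Leaving the BOM in place is harmless for reading, but it would make the first token `'\ufeffThe'` rather than `'The'` once the text is split into words.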


## Reading into a list of lines  
* `f.readlines()` loads all lines into a list at once.  
* Equivalent to `list(f)`, but explicit and sometimes clearer.


In [10]:
with open("hamlet.txt", "r", encoding="utf-8") as f:  
    lines = f.readlines()

print(type(lines), "with", len(lines), "lines")
print("First 3 lines as raw strings:")
for ln in lines[:3]:
    print(repr(ln))

<class 'list'> with 5391 lines
First 3 lines as raw strings:
'\ufeffThe Project Gutenberg eBook of Hamlet\n'
'    \n'
'This ebook is for the use of anyone anywhere in the United States and\n'
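
To make the equivalence concrete, here is a small sketch comparing `readlines()` with `list(f)`. It uses `io.StringIO` as an in-memory stand-in for an open file, and the sample text is made up for the demo. `str.splitlines()` is included as a related option that drops the newline characters.

```python
import io

sample = "line one\nline two\nline three\n"

# io.StringIO behaves like an open text file, so both approaches can be compared.
via_readlines = io.StringIO(sample).readlines()
via_list = list(io.StringIO(sample))

# splitlines works on a string already in memory and removes the line endings.
via_splitlines = sample.splitlines()

print(via_readlines == via_list)
print(via_splitlines)
```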


## Stripping and parsing while streaming  
* `strip` removes leading and trailing whitespace, including the newline at the end of each line.  
* Split lines into fields; ignore blanks; accumulate parsed results.


In [15]:
words = []
with open("hamlet.txt", "r", encoding="utf-8") as f: 
    for line in f:
        cleaned = line.strip()
        if not cleaned:
            continue
        # simple tokenization on whitespace
        parts = cleaned.split()
        words.extend(parts)
words[0:100]

['\ufeffThe',
 'Project',
 'Gutenberg',
 'eBook',
 'of',
 'Hamlet',
 'This',
 'ebook',
 'is',
 'for',
 'the',
 'use',
 'of',
 'anyone',
 'anywhere',
 'in',
 'the',
 'United',
 'States',
 'and',
 'most',
 'other',
 'parts',
 'of',
 'the',
 'world',
 'at',
 'no',
 'cost',
 'and',
 'with',
 'almost',
 'no',
 'restrictions',
 'whatsoever.',
 'You',
 'may',
 'copy',
 'it,',
 'give',
 'it',
 'away',
 'or',
 're-use',
 'it',
 'under',
 'the',
 'terms',
 'of',
 'the',
 'Project',
 'Gutenberg',
 'License',
 'included',
 'with',
 'this',
 'ebook',
 'or',
 'online',
 'at',
 'www.gutenberg.org.',
 'If',
 'you',
 'are',
 'not',
 'located',
 'in',
 'the',
 'United',
 'States,',
 'you',
 'will',
 'have',
 'to',
 'check',
 'the',
 'laws',
 'of',
 'the',
 'country',
 'where',
 'you',
 'are',
 'located',
 'before',
 'using',
 'this',
 'eBook.',
 'Title:',
 'Hamlet',
 'Author:',
 'William',
 'Shakespeare',
 'Release',
 'date:',
 'July',
 '1,',
 '2000',
 '[eBook',
 '#2265]']

# Hamletting  
To begin, we should get rid of the header on our text. The header ends with a line that contains "======". Find out which line that is and cut off everything before it. You can use slicing, e.g., `my_list[100:]`, to get every line from index 100 onwards. 
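
One way this could look is sketched below on a made-up list of lines rather than the real file. The function name `drop_header` and the toy data are assumptions for illustration. This version also discards the marker line itself, which is a choice; slice from `index` instead of `index + 1` to keep it.

```python
def drop_header(lines, marker="======"):
    """Return the lines that follow the first line containing `marker`."""
    for index, line in enumerate(lines):
        if marker in line:
            return lines[index + 1:]
    return lines  # marker not found: leave the list untouched


# Toy stand-in for the list produced by f.readlines().
toy_lines = [
    "Some licence text\n",
    "More front matter\n",
    "======\n",
    "ACT I\n",
    "SCENE I. Elsinore.\n",
]

body = drop_header(toy_lines)
print(body)
```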

In [None]:
# 

## Which words are the most common?  
Outline of steps:
1. Make a function to remove any punctuation or problematic characters. *Hint: you've done something similar previously.*
2. Split the text into words. We'll consider something a word if it's surrounded by whitespace.
3. Clean up the words with your function. You'll also want to make everything lower or upper case.
4. Count each unique word. *Hint: take a look at [defaultdict](https://docs.python.org/3/library/collections.html#collections.defaultdict) in the `collections` module.*
5. Sort your word counts by frequency and show the top 10.
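
The steps above can be sketched on a single sentence, as shown below. The names `clean_word` and `count_words` are hypothetical, and the cleaning rule is only one possible choice: it strips ASCII punctuation from both ends of a word, so apostrophes inside a word and curly quotes are left alone.

```python
import string
from collections import defaultdict


def clean_word(word):
    """Lower-case a word and strip ASCII punctuation from both ends."""
    return word.strip(string.punctuation).lower()


def count_words(text):
    """Count cleaned, whitespace-separated words in `text`."""
    counts = defaultdict(int)  # missing keys start at 0
    for raw in text.split():
        word = clean_word(raw)
        if word:  # skip tokens that were pure punctuation
            counts[word] += 1
    return counts


sample = "To be, or not to be, that is the question."
counts = count_words(sample)

# Sort (word, count) pairs by count, largest first, and keep the top three.
top = sorted(counts.items(), key=lambda item: item[1], reverse=True)[:3]
print(top)
```

For the full play, the same idea should apply with the whole text as input and `[:10]` in place of `[:3]`.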

## Which character is mentioned the most? 
Look through your output from the last section and find the first character name to appear in the frequency-sorted word list; that character is the most mentioned. A list of characters can be found [here](https://www.sparknotes.com/shakespeare/hamlet/characters/). 
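
Instead of scanning the sorted list by eye, the lookup could also be done in code. The sketch below uses invented counts and a short, hypothetical list of names purely to show the shape of the approach; the real numbers come from your own word counts.

```python
# Made-up counts for illustration only; these are not results from the play.
toy_counts = {"the": 30, "and": 25, "hamlet": 12, "horatio": 5, "ophelia": 4}

# Hypothetical short list of lower-cased character names to look for.
character_names = ["hamlet", "horatio", "ophelia", "claudius"]

# Names that never occur get a count of 0 rather than raising a KeyError.
mentions = {name: toy_counts.get(name, 0) for name in character_names}

most_mentioned = max(mentions, key=mentions.get)
print(most_mentioned, mentions[most_mentioned])
```

Note that if the text labels each speech with the speaker's name, those labels would be counted as mentions too.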