<a href="https://colab.research.google.com/github/alerods-ds/python-for-everybody-colab/blob/main/notebooks/chapter_09.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 📘 Chapter 9: Dictionaries - Exercises

This notebook contains the solutions to the exercises from Chapter 9 of *Python for Everybody* by Charles Severance.

## 🧠 Exercise 1
### Download a copy of the file www.py4e.com/code3/words.txt
### Write a program that reads the words in *words.txt* and stores them as keys in a dictionary. It doesn’t matter what the values are. Then you can use the `in` operator as a fast way to check whether a string is in the dictionary.

✅ Answer:

```python
words_dict = {}

# Path assumes the data folder is on a mounted Google Drive (Colab)
with open('/content/drive/My Drive/python-for-everybody/data/words.txt', 'r') as fhand:
    for line in fhand:
        # Each whitespace-separated token becomes a key; the value is only a
        # placeholder, because membership is all that matters here
        for word in line.split():
            words_dict[word] = True

print(words_dict)
```

```
{'Writing': True, 'programs': True, 'or': True, 'programming': True, 'is': True, 'a': True, 'very': True, 'creative': True, 'and': True, 'rewarding': True, 'activity': True, 'You': True, 'can': True, 'write': True, 'for': True, 'many': True, 'reasons': True, 'ranging': True, 'from': True, 'making': True, 'your': True, 'living': True, 'to': True, 'solving': True, 'difficult': True, 'data': True, 'analysis': True, 'problem': True, 'having': True, 'fun': True, 'helping': True, 'someone': True, 'else': True, 'solve': True, 'This': True, 'book': True, 'assumes': True, 'that': True, '{\\em': True, 'everyone}': True, 'needs': True, 'know': True, 'how': True, 'program': True, 'once': True, 'you': True, 'program,': True, 'will': True, 'figure': True, 'out': True, 'what': True, 'want': True, 'do': True, 'with': True, 'newfound': True, 'skills': True, 'We': True, 'are': True, 'surrounded': True, 'in': True, 'our': True, 'daily': True, 'lives': True, 'computers': True, 'laptops': True, 'cell': Tru
```

💡 Explanation:

This program reads the contents of a text file and stores each word as a key in a dictionary. Because dictionary keys are unique, storing a word that is already present simply overwrites the existing entry, so each word appears only once. The values are arbitrary; here we assign `True`, because only the existence of the key matters. Dictionaries are implemented as hash tables, so checking `word in words_dict` takes roughly constant time on average no matter how many words are stored, whereas `in` on a list has to scan the elements one by one. Note that `split()` splits only on whitespace, so punctuation and markup stay attached to words (the output contains keys such as `'program,'` and `'{\\em'`). This demonstrates a common use case of dictionaries: fast membership testing.
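To try the membership test without the data file, the sketch below builds the same kind of dictionary from a few in-memory lines and then queries it with `in`. The sample text and the helper name `build_word_dict` are illustrative assumptions, not part of the exercise. `dict.fromkeys` is a standard shortcut for creating keys that all share one value.

```python
def build_word_dict(lines):
    """Return a dict whose keys are the whitespace-separated words in lines.

    build_word_dict is a hypothetical helper written for this example.
    """
    # dict.fromkeys maps every word to the same placeholder value (True);
    # duplicate words collapse into a single key
    return dict.fromkeys((word for line in lines for word in line.split()), True)


# Sample lines standing in for words.txt (illustrative data only)
sample = [
    "Writing programs or programming is a very creative",
    "and rewarding activity You can write programs for",
]

words_dict = build_word_dict(sample)

print('programs' in words_dict)   # True
print('Python' in words_dict)     # False
print(len(words_dict))            # 15 ('programs' is stored only once)
```

The lookups do not depend on the values at all, which is why any placeholder works.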

## 🧠 Exercise 2
### Write a program that categorizes each mail message by which day of the week the commit was done. To do this look for lines that start with “From”, then look for the third word and keep a running count of each of the days of the week. At the end of the program print out the contents of your dictionary (order does not matter).

### Sample Line:
```
From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008
```
### Sample Execution:
```
python dow.py
Enter a file name: mbox-short.txt
{'Fri': 20, 'Thu': 6, 'Sat': 1}
```

✅ Answer:

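```python
input_file = input('Enter a file name: ')

# Path assumes the data folder is on a mounted Google Drive (Colab)
file_name = f"/content/drive/My Drive/python-for-everybody/data/{input_file}"

days = {}

with open(file_name, 'r') as fhand:
    for line in fhand:
        words = line.split()
        # Skip short/blank lines and anything that is not a "From " envelope line
        if len(words) < 3 or words[0] != 'From':
            continue
        day = words[2]  # e.g. 'Sat' in "From ... Sat Jan 5 09:14:16 2008"
        # Start a new count at 1 or increment an existing one
        if day in days:
            days[day] += 1
        else:
            days[day] = 1

print(days)
```

```
Enter a file name: mbox-short.txt
{'Sat': 1, 'Fri': 20, 'Thu': 6}
```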


💡 Explanation:

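This program counts how many commits (mail messages) were made on each day of the week. It only considers lines whose first word is exactly `From`, so header lines beginning with `From:` are skipped, and the `len(words) < 3` guard avoids an `IndexError` on blank or short lines. From each matching line it takes the third word (`words[2]`), which is the day abbreviation. Each day is used as a key in a dictionary, with its value tracking the count. The result is a frequency table of messages by weekday, illustrating how to use dictionaries for simple text-based data analysis.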
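The `if`/`else` that either starts or increments a count can be collapsed with `dict.get`, which returns a default (here `0`) when the key is missing. The sketch below applies that pattern to a few hard-coded lines in mbox format; the sample lines and the function name `count_days` are assumptions made for illustration.

```python
def count_days(lines):
    """Count 'From ' envelope lines by weekday (hypothetical helper)."""
    days = {}
    for line in lines:
        words = line.split()
        # Only envelope lines whose first word is exactly 'From' qualify
        if len(words) < 3 or words[0] != 'From':
            continue
        # get() returns 0 for a day not seen yet, so no if/else is needed
        days[words[2]] = days.get(words[2], 0) + 1
    return days


# Illustrative mbox-style lines (not copied from mbox-short.txt)
sample = [
    "From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008",
    "From: stephen.marquard@uct.ac.za",   # header line: first word is 'From:'
    "",                                    # blank line: fewer than 3 words
    "From louis@media.berkeley.edu Fri Jan  4 18:10:48 2008",
    "From zqian@umich.edu Fri Jan  4 16:10:39 2008",
]

print(count_days(sample))  # {'Sat': 1, 'Fri': 2}
```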

## 🧠 Exercise 3
### Write a program to read through a mail log, build a histogram using a dictionary to count how many messages have come from each email address, and print the dictionary.

```
Enter file name: mbox-short.txt
{'gopal.ramasammycook@gmail.com': 1, 'louis@media.berkeley.edu': 3,
'cwen@iupui.edu': 5, 'antranig@caret.cam.ac.uk': 1,
'rjlowe@iupui.edu': 2, 'gsilver@umich.edu': 3,
'david.horwitz@uct.ac.za': 4, 'wagnermr@iupui.edu': 1,
'zqian@umich.edu': 4, 'stephen.marquard@uct.ac.za': 2,
'ray@media.berkeley.edu': 1}
```

✅ Answer:

```python
input_file = input('Enter a file name: ')

# Path assumes the data folder is on a mounted Google Drive (Colab)
file_name = f"/content/drive/My Drive/python-for-everybody/data/{input_file}"

email_addresses = {}

with open(file_name, 'r') as fhand:
    for line in fhand:
        words = line.split()
        # Only "From " envelope lines carry the sender address in words[1]
        if len(words) < 2 or words[0] != 'From':
            continue
        sender = words[1]
        # Start a new count at 1 or increment an existing one
        if sender in email_addresses:
            email_addresses[sender] += 1
        else:
            email_addresses[sender] = 1

print(email_addresses)
```

```
Enter a file name: mbox-short.txt
{'stephen.marquard@uct.ac.za': 2, 'louis@media.berkeley.edu': 3, 'zqian@umich.edu': 4, 'rjlowe@iupui.edu': 2, 'cwen@iupui.edu': 5, 'gsilver@umich.edu': 3, 'wagnermr@iupui.edu': 1, 'antranig@caret.cam.ac.uk': 1, 'gopal.ramasammycook@gmail.com': 1, 'david.horwitz@uct.ac.za': 4, 'ray@media.berkeley.edu': 1}
```


💡 Explanation:

This program reads an email log file and builds a histogram of how many messages were sent from each email address. It processes lines that start with `"From"` and extracts the second word, which is the sender’s email. These addresses are used as dictionary keys, with their corresponding values counting how many times each one appears. The result is a frequency dictionary showing the distribution of senders in the dataset.
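Python's standard library also provides `collections.Counter`, a `dict` subclass built for exactly this kind of histogram. The sketch below is an alternative to the hand-written loop, not the approach the exercise asks for, and the sample lines are made up for illustration.

```python
from collections import Counter

# Illustrative mbox-style lines (not taken from mbox-short.txt)
sample = [
    "From cwen@iupui.edu Thu Jan  3 16:23:48 2008",
    "From: cwen@iupui.edu",
    "From zqian@umich.edu Fri Jan  4 16:10:39 2008",
    "From cwen@iupui.edu Thu Jan  3 11:35:08 2008",
]

# Sender addresses from 'From ' envelope lines only ('From:' is skipped)
senders = (
    words[1]
    for words in (line.split() for line in sample)
    if len(words) >= 2 and words[0] == 'From'
)

counts = Counter(senders)
print(dict(counts))  # {'cwen@iupui.edu': 2, 'zqian@umich.edu': 1}
```

Unlike a plain dictionary, a `Counter` returns `0` for a missing key instead of raising `KeyError`.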

## 🧠 Exercise 4
### Add code to the above program to figure out who has the most messages in the file. After all the data has been read and the dictionary has been created, look through the dictionary using a maximum loop (see Chapter 5: Maximum and minimum loops) to find who has the most messages and print how many messages the person has.

```
Enter a file name: mbox-short.txt
cwen@iupui.edu 5

Enter a file name: mbox.txt
zqian@umich.edu 195
```

✅ Answer:

```python
input_file = input('Enter a file name: ')

# Path assumes the data folder is on a mounted Google Drive (Colab)
file_name = f"/content/drive/My Drive/python-for-everybody/data/{input_file}"

email_addresses = {}

with open(file_name, 'r') as fhand:
    for line in fhand:
        words = line.split()
        # Only "From " envelope lines carry the sender address in words[1]
        if len(words) < 2 or words[0] != 'From':
            continue
        sender = words[1]
        if sender in email_addresses:
            email_addresses[sender] += 1
        else:
            email_addresses[sender] = 1

# Maximum loop: remember the largest count seen so far and its address
max_count = None
max_sender = None

for email in email_addresses:
    if max_count is None or email_addresses[email] > max_count:
        max_count = email_addresses[email]
        max_sender = email

print(f'The email address that sent the most messages is {max_sender}, with a total of {max_count}.')
```

```
Enter a file name: mbox.txt
The email address that sent the most messages is zqian@umich.edu, with a total of 195.
```


💡 Explanation:

This program builds on the previous exercise: it not only counts how many messages each email address sent, but also determines which address sent the most. After the dictionary is built, a maximum loop iterates over its keys and compares each count with the largest one seen so far. `max_count` starts as `None` so that the first address always becomes the initial candidate; after that, whenever a larger count appears, the address and its count are recorded. Because the comparison uses a strict `>`, a tie keeps the address that was encountered first. At the end, the program prints the address with the highest message count. This exercise reinforces how to use dictionaries for aggregation and how to find the largest value in a key-value mapping.
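For comparison, the built-in `max` accepts a `key` function, so `max(d, key=d.get)` returns the key with the largest value in a single call. Like the strict `>` loop in the answer, `max` returns the first maximal item it encounters when there is a tie. The helper name `top_sender` and the counts below are made up for illustration; they are not results from `mbox.txt`.

```python
def top_sender(counts):
    """Return (address, count) for the largest count, or None if empty.

    Hypothetical helper; uses the built-in max() with a key function.
    """
    if not counts:
        return None
    # key=counts.get compares keys by their values; on ties, max() returns
    # the first maximal key in iteration (insertion) order
    best = max(counts, key=counts.get)
    return best, counts[best]


# Made-up counts for illustration
email_addresses = {'a@example.com': 3, 'b@example.com': 5, 'c@example.com': 5}

print(top_sender(email_addresses))  # ('b@example.com', 5)
print(top_sender({}))               # None
```

The explicit empty check matters because `max()` on an empty dictionary raises `ValueError`, while the loop version would just leave both variables as `None`.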

## 🧠 Exercise 5
### This program records the domain name where the message was sent from, instead of who the mail came from (i.e., the whole email address). At the end of the program, print out the contents of your dictionary.
```
python schoolcount.py
Enter a file name: mbox-short.txt
{'media.berkeley.edu': 4, 'uct.ac.za': 6, 'umich.edu': 7,
'gmail.com': 1, 'caret.cam.ac.uk': 1, 'iupui.edu': 8}
```

✅ Answer:

```python
input_file = input('Enter a file name: ')

# Path assumes the data folder is on a mounted Google Drive (Colab)
file_name = f"/content/drive/My Drive/python-for-everybody/data/{input_file}"

domains = {}

with open(file_name, 'r') as fhand:
    for line in fhand:
        words = line.split()
        if len(words) < 2 or words[0] != 'From':
            continue

        # 'stephen.marquard@uct.ac.za' -> ['stephen.marquard', 'uct.ac.za'] -> 'uct.ac.za'
        domain = words[1].split('@')[1]

        if domain in domains:
            domains[domain] += 1
        else:
            domains[domain] = 1

print(domains)
```

```
Enter a file name: mbox-short.txt
{'uct.ac.za': 6, 'media.berkeley.edu': 4, 'umich.edu': 7, 'iupui.edu': 8, 'caret.cam.ac.uk': 1, 'gmail.com': 1}
```


💡 Explanation:

This program analyzes an email log and counts how many messages came from each domain. It processes lines that start with `"From"` and extracts the email address (the second word). Then it uses `split('@')` to isolate the domain part of the address. Each domain is stored as a key in a dictionary, with its value representing the number of times it appears. This approach is useful for identifying which email providers or organizations are most active in the dataset.
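`words[1].split('@')[1]` raises an `IndexError` if a token contains no `@`. That should not happen on well-formed `From ` lines, but if the input might be messy, `str.partition('@')` offers a guarded variant: it always returns a 3-tuple, with an empty separator when `@` is absent. The helper name `extract_domain` and the sample addresses are hypothetical.

```python
def extract_domain(address):
    """Return the part after the first '@', or None if there is no '@'.

    extract_domain is a hypothetical helper for this example.
    """
    # partition never raises: ('user', '@', 'host') or ('text', '', '')
    _, sep, domain = address.partition('@')
    return domain if sep else None


print(extract_domain('stephen.marquard@uct.ac.za'))  # uct.ac.za
print(extract_domain('not-an-address'))              # None

# Counting domains while skipping malformed tokens instead of crashing
domains = {}
for address in ['cwen@iupui.edu', 'zqian@umich.edu', 'cwen@iupui.edu', 'bogus']:
    domain = extract_domain(address)
    if domain is None:
        continue
    domains[domain] = domains.get(domain, 0) + 1

print(domains)  # {'iupui.edu': 2, 'umich.edu': 1}
```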

# 📚 Summary – What I Learned from These Exercises

In this chapter, I deepened my understanding of Python dictionaries and how they can be used to count, categorize, and analyze data efficiently. I learned how to extract relevant parts of structured text, such as email addresses or domains, and store them as keys in a dictionary with associated counts. I also practiced using maximum loops to identify the most frequent entries and saw how dictionaries enable fast membership tests with the `in` operator.

These exercises showed how to process real-world datasets, such as email logs, by combining string manipulation, conditionals, and dictionary-based aggregation. This is a powerful pattern for building simple but effective data analysis tools.
