In [None]:
# Initialize Otter
import otter
grader = otter.Notebook("project1.ipynb")

<img src="data6.png" style="width: 15%; float: right; padding: 1%; margin-right: 2%;"/>

# Final Project – Dictionaries

## Data 6, Summer 2024

In this project, you will solve problems involving dictionaries, a key data structure you'll need to be familiar with moving forward. You'll also gain some experience with reading in real data.

This project is due on **Wednesday, August 7th at 11:00PM**. It is due one day earlier than usual so that you'll have time to focus on final exam studying. You must submit the assignment to Gradescope. Submission instructions can be found at the bottom of this notebook. See the [syllabus](https://data6.org/su24/syllabus/) for our late submission policy.

In [None]:
# Just run this cell to load in the relevant dependencies

from datascience import *
from data6_utils import *
import numpy as np
import matplotlib.pyplot as plt
from pathlib import Path
from ipywidgets import interact, widgets
from IPython.display import HTML, display

<hr style="border: 5px solid #003262;" />
<hr style="border: 1px solid #fdb515;" />

## Part 1: Encryption

<br></br>
<hr style="border: 1px solid #fdb515;" />


In this project, we'll be taking a deep dive into a new Python data structure: **dictionaries**. While other data types we've seen in class are quite useful in many ways, dictionaries have a special purpose.

<img src='dictionary.png' width=300>

**Dictionaries** can be very useful. They store key/value pairs that can be used to map one value to another. You can think of a dictionary as a list where the indexes (locations) of the values of the list are no longer their integer locations, but rather their keys.

>In an array, you access the first item with `my_array.item(0)`.

>In a dictionary, you access the "key-th" item with `my_dictionary[key]`.

If we think of list items as having their "address" be their location in the list, then a dictionary value's "address" is its key.

Some important properties of dictionaries to note:
- The key and value **do not** have to be of the same type
- We designate a new key/value entry in a dictionary in this format: *key* **:** *value*
- We store all these key/value entries in a dictionary with braces `{}` around the ends (like `[]` with a list) and commas separating the entries
- Keys in a dictionary are unique, but values don't have to be unique. 
    - For example, `{'a': 100, 'a': 200}` does not store two entries for `'a'`: Python keeps only the last value, so the result is `{'a': 200}`
    - but `{'a': 100, 'b': 100}` is a valid dictionary with two entries
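
To see these properties in action, here is a small sketch. The dictionaries `inventory`, `repeated`, and `shared` are made up for illustration and are not used anywhere else in the project.

```python
# Keys and values do not have to share a type (made-up example data)
inventory = {"apples": 3, 7: "lucky number", "on sale": True}

# Writing the same key twice keeps only the last value
repeated = {"a": 100, "a": 200}

# Two different keys are allowed to share the same value
shared = {"a": 100, "b": 100}

print(len(inventory))  # number of key/value entries
print(repeated)
print(shared)
```

Printing `repeated` is a quick way to confirm that keys stay unique even when a key is typed twice.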
    
Let's take a closer look at a dictionary in practice:

In [None]:
my_dictionary = {"a": 100, "b": 200, "c": 300}
print("The key 'a' maps to the value:", my_dictionary["a"])
print("The key 'b' maps to the value:", my_dictionary["b"])
print("The key 'c' maps to the value:", my_dictionary["c"])

### How to Access the Data

We can't access a dictionary's values like we can access a list's values. If we want the "first" item in a dictionary, we cannot ask for `my_dictionary[0]`, because this request is really asking "What does the key 0 map to in this dictionary?". If your dictionary does not have a value associated with the key 0, you will get an error.

In [None]:
my_dictionary[0]

A `KeyError` means that you asked for a key that is not in your dictionary. You may run into this error when writing a function that uses a dictionary, so if you see it, now you know what it means.
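
If you are not sure whether a key is present, you can check before asking for it. The sketch below uses a made-up `prices` dictionary. The `in` operator and the `.get` method are standard Python; `try`/`except` may not have come up in class and appears here only to show the error without stopping the cell.

```python
# A small made-up dictionary for illustration
prices = {"apple": 1.25, "banana": 0.5}

# `in` checks whether a key is present, without raising an error
has_cherry = "cherry" in prices

# .get returns the value if the key exists, otherwise the fallback we provide
cherry_price = prices.get("cherry", "not sold here")

# Square brackets with a missing key raise KeyError
try:
    prices["cherry"]
    outcome = "found"
except KeyError:
    outcome = "KeyError"

print(has_cherry, cherry_price, outcome)
```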

### Changing the Data

We can add the key/value pair `(key, value)` with the following syntax:

> `my_dictionary[key] = value`

If `key` is already in the dictionary, the same syntax replaces its old value. Run the cell below to add a `"d"` entry with the value 400:

In [None]:
my_dictionary["d"] = 400 # Add the key/value pair ("d", 400) to our dictionary
my_dictionary

We can use **any** data type we know as a value in a dictionary...

In [None]:
# Here, the value we add is an array!
my_dictionary["grocery list"] = make_array("apples", "bananas", "carrots")
my_dictionary

...including even having a **dictionary itself** as a value! 

In [None]:
my_dictionary["squares"] = {1: 1, 2: 4, 3: 9, 4: 16}
my_dictionary

### Dictionary Iteration

We can get a dictionary's keys with the `.keys()` method.

In [None]:
my_keys = my_dictionary.keys()
my_keys

In [None]:
# Note the type of this collection of keys: it is not a regular list
type(my_keys)

To iterate over the keys in a dictionary, we can use a `for` loop!

In [None]:
for key in my_dictionary:
    print("I am a key, and my name is:", key)

We can also get a dictionary's values with the `.values()` method.

In [None]:
my_values = my_dictionary.values()
my_values

We can iterate over the values of a dictionary like this:

In [None]:
for value in list(my_dictionary.values()):
    print("I am a value, and my name is:", value)
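
When you need the key and its value together, the standard `.items()` method hands you both on each pass through the loop. This is a minimal sketch with a made-up `menu` dictionary:

```python
# Made-up dictionary for illustration
menu = {"coffee": 3, "tea": 2, "bagel": 4}

lines = []
# Each pass through the loop gives one key together with its value
for item, price in menu.items():
    lines.append(item + " costs $" + str(price))

print(lines)
```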

We can use this to do cool things like change all the values in a dictionary!

In [None]:
def add_one_to_dictionary_values(dictionary):
    for key in dictionary:
        dictionary[key] = dictionary[key] + 1
    return dictionary

new_dictionary = {"data": 6, "cs": 61, "poli sci": 1}
modified_dictionary = add_one_to_dictionary_values(new_dictionary)
modified_dictionary
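
One detail worth noticing: a function written this way changes the dictionary it was given, so in the cell above `new_dictionary` and `modified_dictionary` are the same object. The sketch below contrasts that with a version that builds a new dictionary instead. The names `add_one_in_place` and `add_one_copy` are hypothetical and not part of the project.

```python
def add_one_in_place(dictionary):
    # Changes the values inside the dictionary that was passed in
    for key in dictionary:
        dictionary[key] = dictionary[key] + 1
    return dictionary

def add_one_copy(dictionary):
    # Builds and returns a brand new dictionary; the input is left alone
    return {key: value + 1 for key, value in dictionary.items()}

original = {"x": 1, "y": 2}
same = add_one_in_place(original)  # `same` and `original` are one object
fresh = add_one_copy(original)     # `fresh` is a separate dictionary

print(same is original, fresh is original)
print(original, fresh)
```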

---
## Question 1.1 - Fake-lish

Let's try writing a function that uses a dictionary that can help us make up a whole new language so we can communicate in secret with our friends! We want to convert all of our text messages to our new language, which we call *Fake-lish*.

*Fake-lish* converts all letters in a message to another letter. We make a dictionary that maps every letter to another letter, which makes our message impossible to read for anyone other than other people who have the *Fake-lish* dictionary!

Spaces should be preserved by this function, so leave spaces as spaces when we convert the message to *Fake-lish*.

Your function below should find each non-space character in `text` (that is, every character other than `" "`), use it as a key to find its corresponding value in `fake_lish_dictionary`, and then use those values to build the new message in Fake-lish.


In [None]:
def fake_lish(text, fake_lish_dictionary):
    output_text = ...
    for letter in text:
        if letter != " ":
            converted_letter = ...
            output_text = output_text + converted_letter
        else:
            output_text = output_text + " "
    return output_text

# This is the fake-lish dictionary we will use for this question
# You do not need to know how this works, and you do not need to touch it
fld = {}
for char in list(map(chr, range(97,123))):
    fld[char] = chr((ord(char) - 97 + 13) % 26 + 97)
fld

In [None]:
grader.check("q1.1")

### Using Our Function

Now we can use this function to send messages that nobody will understand (unless they crack our code...)!

In [None]:
fake_lish("hello world", fld)

In [None]:
fake_lish("i am speaking in secret hehe", fld)

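Here's a cool property of the dictionary we chose: it shifts every letter 13 places through the 26-letter alphabet, so applying it twice brings each letter back to where it started. That means we can use the same function to *decrypt* our messages too! Look what happens when we encrypt one of our messages and then run the result through the function again: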

In [None]:
fake_lish("hello can you hear me", fld)

In [None]:
fake_lish("uryyb pna lbh urne zr", fld)
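
To check the round trip for every letter rather than for one message, here is a small sketch. It rebuilds a shift-by-13 mapping under the hypothetical name `shift_13` (so `fld` is left untouched) and confirms that looking a letter up twice returns the original letter.

```python
import string

letters = string.ascii_lowercase  # 'abcdefghijklmnopqrstuvwxyz'

# Hypothetical mapping that follows the same shift-by-13 rule as fld
shift_13 = {}
for position in range(26):
    shift_13[letters[position]] = letters[(position + 13) % 26]

# Shifting by 13 twice is a shift by 26, which lands on the same letter
round_trip_ok = all(shift_13[shift_13[letter]] == letter for letter in letters)

print(shift_13["h"], shift_13["u"], round_trip_ok)
```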

Now we can talk in secret! See **Question 1.3** to see how this can be useful in a cool way!

---

## Question 1.2 - Login

In this problem, we will be using dictionaries to implement a login system. For **new accounts**, we **create** a new username with the password given, and for **existing accounts**, we **log in** if the password given matches the correct password for the given username. If a user tries to make an account with a username that **already exists**, we **do not** allow them to make that new account, and if the password **does not** match the username's password, login **fails**.

Here is what the function should return:
- It should return `"New account"` when a new account is successfully created
- It should return `"No new account"` when a new account is not successfully created
- It should return `"Successful login"` when login is successful
- It should return `"No successful login"` when login is not successful

You will write two parts of this function:
- You must add a new username/password pair to the `accounts` dictionary when a new account is being created
- You must check if the given password is correct for an existing account in `accounts`

You can see that the argument `new_account` appears as `new_account=False`. This makes `new_account` an *optional* argument: if no third argument is given to `login`, its value defaults to `False`. If you want the value of `new_account` to be `True`, you must pass `True` as the third argument (e.g. `login("data6student", "1234", True)`).
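
If optional arguments are new to you, this small sketch shows the same idea on an unrelated function. The function `greet` and its argument `excited` are made up for illustration.

```python
def greet(name, excited=False):
    # `excited` is optional: it is False unless the caller says otherwise
    if excited:
        return "Hello, " + name + "!"
    return "Hello, " + name

calm = greet("Oski")                  # no second argument, so excited is False
loud = greet("Oski", True)            # second argument given by position
also_loud = greet("Oski", excited=True)  # second argument given by name

print(calm, loud, also_loud)
```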

In [None]:
accounts = {}

In [None]:
def login(username, password, new_account=False):
    if new_account:
        if username not in accounts:
            ...
            print("Account with username:", username, "created with password:", password)
            result = ...
            return result
        else:
            print("Username already exists, please select another username")
            result = ...
            return result
    ...
        print("Successfully logged in as user:", username)
        result = ...
        return result
    else:
        print("Incorrect password, please try again")
        result = ...
        return result

In [None]:
grader.check("q1.2")

<details>
    <summary>Solution (for after you have tried yourself - click this Markdown cell)</summary>
    <code>def login(username, password, new_account=False):
    if new_account:
        if username not in accounts:
            accounts[username] = password
            print("Account with username:", username, "created with password:", password)
            result = "New account"
            return result
        else:
            print("Username already exists, please select another username")
            result = "No new account"
            return result
    elif password == accounts[username]:
        print("Successfully logged in as user:", username)
        result = "Successful login"
        return result
    else:
        print("Incorrect password, please try again")
        result = "No successful login"
        return result</code>
</details>

Let's look at this function at work:

In [None]:
login("ian", "ian12345", True)

In [None]:
login("isaac", "isaac9876", True)

In [None]:
login("kseniya", "kseniya4567", True)

In [None]:
login("ian", "ian12345")

In [None]:
login("isaac", "isaac9876")

In [None]:
login("kseniya", "kseniya4567")

In [None]:
login("ian", "password")

In [None]:
login("isaac", "password")

In [None]:
login("kseniya", "password")

Now if we take a look at our accounts dictionary, we can see that the username/password pairs we have for login are here!

In [None]:
accounts

In [None]:
# Use this cell to explore how the login function works!
# Try and make your own accounts to see how the dictionary helps us log in!


---

## Question 1.3 - Secured Login

Imagine our `accounts` dictionary has been obtained by some people who want to hack into our login system. They have access to all the passwords! We should figure out a way to make sure that even if people have access to the `accounts` dictionary, they still cannot steal people's passwords. We can do this using our `fake_lish` function from earlier! Modify the `login` function in `login_secure` so that it not only stores passwords in fake-lish, but also converts from fake-lish back to English while logging someone in!

*Remember*: you have to pass in `fld` as the second input to `fake_lish` for it to work properly.

In [None]:
accounts_secure = {}

In [None]:
def login_secure(username, password, new_account=False):
    if new_account:
        if username not in accounts_secure:
            password_fake_lish = ...
            ...
            print("Account with username:", username, "created with secure password:", password)
            result = ...
            return result
        else:
            print("Username already exists, please select another username")
            result = ...
            return result
    ...
        print("Successfully logged in as user:", username)
        result = ...
        return result
    else:
        print("Incorrect password, please try again")
        result = ...
        return result

In [None]:
grader.check("q1.3")

<details>
    <summary>Solution (for after you have tried yourself - click this Markdown cell)</summary>
    <code>def login_secure(username, password, new_account=False):
    if new_account:
        if username not in accounts_secure:
            password_fake_lish = fake_lish(password, fld) # SOLUTION
            # BEGIN SOLUTION
            accounts_secure[username] = password_fake_lish
            # END SOLUTION
            print("Account with username:", username, "created with secure password:", password)
            result = "New account" # SOLUTION
            return result
        else:
            print("Username already exists, please select another username")
            result = "No new account" # SOLUTION
            return result
    elif fake_lish(password, fld) == accounts_secure[username]: # SOLUTION
        print("Successfully logged in as user:", username)
        result = "Successful login" # SOLUTION
        return result
    else:
        print("Incorrect password, please try again")
        result = "No successful login" # SOLUTION
        return result</code>
</details>

### Put Your Function To the Test

In [None]:
login_secure("ian", "berkeley", True)

In [None]:
login_secure("isaac", "datascience", True)

In [None]:
login_secure("kseniya", "iscool", True)

In [None]:
login_secure("ian", "berkeley")

In [None]:
login_secure("isaac", "datascience")

In [None]:
login_secure("kseniya", "iscool")

In [None]:
login_secure("ian", "password")

In [None]:
login_secure("isaac", "password")

In [None]:
login_secure("kseniya", "password")

Now if we look at our accounts dictionary, it is useless to those hackers!

In [None]:
accounts_secure

If they try to use these passwords to log in, they won't work! Go cybersecurity!

In [None]:
login_secure("data6admin", "tbbqcnffjbeq")

In [None]:
login_secure("ian", "orexryrl")

In [None]:
login_secure("isaac", "qngnfpvrapr")

In [None]:
login_secure("kseniya", "vfpbby")

<hr style="border: 5px solid #003262;" />
<hr style="border: 1px solid #fdb515;" />

## Part 2: Dictionary Fundamentals

<br></br>
<hr style="border: 1px solid #fdb515;" />

## Question 1 – Syntax

In this question, you will solidify your understanding of the syntax necessary for working with dictionaries. You'll also learn how to read in data from external files.

---
## Question 2.1.1

Below, we create a dictionary that contains modern-day slang acronyms and their corresponding full forms.

In [None]:
# DO NOT EDIT THIS CELL – just run it!

more_slang = {
    'haha': 'that was not funny',
    'smh': 'shake my head',
    'lol': 'laugh out loud',
    'GOAT': 'greatest of all time'
}

In the cell below, add a new key-value pair to `more_slang`, corresponding to the abbreviation `'ofr'`. The value can be any string of three words whose first letters are `'o'`, `'f'`, and `'r'`, in that order. You should not change the cell above.


In [None]:
...

In [None]:
grader.check("q2.1.1")

---
## Question 2.1.2

In the cell below, we've created a new dictionary `even_more_slang`, which is a copy of your `more_slang` from Question 2.1.1. We did this in order to make the autograder work correctly.

**Task:** Your job is to add another key-value pair to `even_more_slang`. The key should be the string `'explicit'`, and the value should be another dictionary. In this nested dictionary, the two keys should be the strings `'lmao'` and `'fml'`, and the values should be four-word and three-word strings that abbreviate to `'lmao'` and `'fml'`, respectively. Don't use any swear words – we don't want to lose our jobs! 😅

That is, after running your cell, `even_more_slang['explicit']['fml']` should be a string consisting of three words.

*Reminder:* The keys of a dictionary can be strings, numbers, bools, or even `None` – just not a list or other dictionary. On the other hand, values in a dictionary can be anything!
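
The reminder above can be checked directly. This sketch uses made-up dictionaries (`valid_keys` and `nested` are hypothetical names) and a `try`/`except` only so that the error does not stop the cell.

```python
# Strings, numbers, bools, and None all work as keys
valid_keys = {"text": 1, 42: 2, 3.5: 3, True: 4, None: 5}

# A list cannot be a key: Python raises TypeError
try:
    bad = {["a", "list"]: 1}
    list_key_result = "allowed"
except TypeError:
    list_key_result = "TypeError"

# Values can be anything, including another dictionary
nested = {"outer": {"inner": "found me"}}
inner_value = nested["outer"]["inner"]

print(len(valid_keys), list_key_result, inner_value)
```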


In [None]:
even_more_slang = more_slang.copy() # Don't change this

explicit_dict = {
    ...
}

even_more_slang['explicit'] = ...

In [None]:
grader.check("q2.1.2")

---
## Question 2.1.3

We can also read and convert JSON files into Python dictionaries. That's what you'll do in this question.

Before following these instructions, make sure to save your notebook (which you should be doing frequently anyway)!

1. Right click the Jupyter logo in the top left of your screen, and click "Open Link in New Tab" (it may appear as Open...)
2. Navigate to the project1 folder
3. Identify the name of the `.json` file that contains Google Maps data. You may have to open both `.json` files to determine which one it is; you can open files by clicking on them.
4. Set the string `maps_path` below equal to the path to that file. `maps_path` should end with `'.json'`.
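
For intuition about what "converting JSON into a dictionary" produces, here is a sketch that uses Python's standard `json` module on a made-up string. It is not the project file, and the `read_json` helper used later in this notebook may work differently under the hood.

```python
import json

# Made-up JSON text for illustration (not the contents of the project files)
raw_text = '{"name": "Example Cafe", "rating": 4.5, "open": true, "hours": {"mon": "9-5"}}'

# json.loads turns JSON text into Python objects
parsed = json.loads(raw_text)

print(type(parsed))            # a JSON object becomes a dict
print(parsed["open"])          # JSON true becomes Python True
print(parsed["hours"]["mon"])  # nested JSON objects become nested dicts
```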


In [None]:
maps_path = ...

In [None]:
grader.check("q2.1.3")

If you answered the previous part correctly, you should be able to run the following cell:

In [None]:
maps_data = read_json(maps_path)
maps_data

---
## Question 2.1.4

The dictionary above is quite unwieldy, and contains many nested dictionaries! Let's try and extract some data from it programmatically (that is, using code).

**Task**: Assign `maps_data_keys` to the `dict_keys` object of all of `maps_data`'s keys. Don't just manually type in all of the keys.

_Hint_: `len(maps_data_keys)` will tell you that there are 6 keys. `'long_name'` is not a key of `maps_data`.


In [None]:
maps_data_keys = ...
maps_data_keys

In [None]:
grader.check("q2.1.4")

---
## Question 2.1.5

Finally, assign `key_1`, `key_2`, and `key_3` below so that `maps_data[key_1][key_2][key_3]` evaluates to the latitude of the location whose data is stored in `maps_data`. We've done `key_2` for you.

_Hint_: Work one step at a time. You know that `key_1` must be one of the six keys in `maps_data_keys`, which you found above. Then, given what we've set `key_2` to, what must `key_3` be?
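
Here is what "one step at a time" can look like on a made-up nested dictionary. The name `place` and all of its keys are hypothetical, and they are deliberately different from the keys in `maps_data`.

```python
# Made-up nested dictionary for illustration
place = {
    "name": "Example Library",
    "contact": {"address": {"city": "Springfield"}},
}

# Peel off one layer at a time, checking the available keys as you go
step_1 = place["contact"]
step_1_keys = list(step_1.keys())
step_2 = step_1["address"]
step_3 = step_2["city"]

# The chained lookup gives the same result as the three separate steps
chained = place["contact"]["address"]["city"]

print(step_1_keys, step_3, chained)
```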


In [None]:
key_1 = ...
key_2 = 'location'
key_3 = ...

maps_data[key_1][key_2][key_3]

In [None]:
grader.check("q2.1.5")

By the way, `maps_data` contains location information for 84 Viet, a Vietnamese restaurant in Downtown Berkeley. It's quite good, you should try it!

<br></br>
<hr style="border: 1px solid #fdb515;" />

## Question 2 – Emojify 

The default keyboard on iOS suggests emojis for you to use in place of boring, ordinary words.

<img src = "https://support.apple.com/library/content/dam/edam/applecare/images/en_US/iOS/ios12-iphone-x-messages-replace-words-with-emoji.jpg" width=200>

In this question, you will replicate some of that behavior using dictionaries!


### Emojis in Python
In Python, emojis can be included as part of a string. For example:

In [None]:
'🤤'

If you remove the quotes from the emoji above, you will see a `SyntaxError` (for example `SyntaxError: invalid character in identifier`; the exact wording depends on your Python version). **Make sure that throughout this question, your emojis are contained within strings!** Fun fact: emojis cannot currently be used as variable names. Try it and see what error you get!

---
## Question 2.2.1

In the cell below, define a dictionary `fav_emojis` that has the following **five** keys:
- `'happy'`
- `'annoyed'`
- `'tired'`
- `'love'`
- `'food'`

Each of the values corresponding to these five keys must be an emoji. [getemoji.com](https://getemoji.com) allows you to copy and paste emojis. To select an emoji, double click it to highlight it. You may choose any emojis you would like **as long as**:
>- it is copied from the site above
>- it is not in the "New Emojis" category at the bottom

Have fun with it! We've chosen an emoji for `'happy'` for you, but feel free to change it.

**Some troubleshooting tips:**
- After defining your dictionary, you may see some emojis displayed as `'\U001...'` instead of their actual graphic. **If this happens, pick different emojis**.
- If you fail the test case that says your emojis are invalid, and you're certain you correctly defined your dictionary, consider choosing more generic emojis, which are more likely to be recognized by our autograder. This most likely won't be a problem.


In [None]:
fav_emojis = {
    'happy': '😀',
    ...
}

fav_emojis

In [None]:
grader.check("q2.2.1")

---
## Question 2.2.2

Now, complete the implementation of the function `emojify`, which takes in a string `message` and returns a new string with all instances of any of the keys in `fav_emojis` replaced with their corresponding emoji value. Example behavior is shown below, though the emojis will be different, depending on what you put in `fav_emojis`. If you passed the previous question, you don't need to change your emojis!

```py
>>> emojify('Filing taxes makes me tired and want food.')
'filing taxes makes me 😵 and want 🌽.'

>>> emojify('I LOVE love life right now. I am so happy – why do you look so annoyed?!')
'i 💋 💋 life right now. i am so 😀 – why do you look so 💀?!'

>>> emojify("It's not you, it's me... I don't make you haPPy, I make you tired.")
"it's not you, it's me... i don't make you 😀, i make you 😵."
```

*Hint*: You may have seen a similar exercise in lecture.


In [None]:
def emojify(message):
    # This line ensures your code replaces correctly if any of
    # the keys in fav_emojis appears in uppercase in the message
    message = message.lower()

    ...
    
    # Don't change this
    return message

In [None]:
grader.check("q2.2.2")

### Fun Demo

Run the cell below to produce a text box (don't worry about the code itself). Type text in the text box and watch it get emojified live!

In [None]:
def emojify_live(type_here):
    display(HTML('<h2>' + emojify(type_here) + '</h2>'))
interact(emojify_live, type_here="I LOVE food");

<!-- BEGIN QUESTION -->

---
## Question 2.2.3

Nice and simple: What's your favorite emoji? Place it in the Markdown cell below.


_Type your answer here, replacing this text._

<!-- END QUESTION -->

<hr style="border: 5px solid #003262;" />
<hr style="border: 1px solid #fdb515;" />

## Part 3: The Bee Movie Word Counter

<hr style="border: 1px solid #fdb515;" />

In this section of the project, we will begin by loading a `txt` file that contains the complete script of the Bee Movie. The end goal is to create a dictionary of the word counts from the movie and do some analysis.

---
## Question 3.1 - The Text File

Navigate to your Jupyter directory and locate the folder for project1. Inside this folder, find the `txt` file containing the *Bee Movie* script. Assign `file_name` to the name of that file, as a string.

*Note*: A `txt` file is a file that stores unformatted text.
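
The cell below relies on `Path(...).read_text()` from Python's standard `pathlib` module. As a minimal sketch of how that call behaves, the example here writes a tiny made-up file into a temporary folder and reads it back. The file name `demo.txt` and its contents are hypothetical.

```python
from pathlib import Path
import tempfile

# Work inside a temporary folder so nothing in the project folder is touched
with tempfile.TemporaryDirectory() as folder:
    demo_path = Path(folder) / "demo.txt"
    demo_path.write_text("Hello from a text file")
    # read_text returns the whole file as one string
    demo_content = demo_path.read_text()

# Slicing a string keeps only the characters in the given range
first_five = demo_content[:5]

print(type(demo_content), first_five)
```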

In [None]:
# Replace with the name of the txt file
file_name = ...
file_path = Path(file_name)

# This code reads the content of the file
file_content = file_path.read_text()

# The first 1000 characters from the script
file_content[:1000]

In [None]:
grader.check("q3.1")

<!-- BEGIN QUESTION -->

---
## Question 3.2 - Analyzing the File Contents

Notice that the file contents look a bit odd. What in particular do you notice that's strange about the file contents, and what does it mean? You can use online resources to research its meaning.

_Type your answer here, replacing this text._

<!-- END QUESTION -->

---
## Question 3.3 - Cleaning the Data

We would like to get rid of a few characters in the script: `'-'`, `","`, `"?"`, `"."`, and `"!"`, since we do not consider these part of a word. Write a function `remove_punctuation` that takes in a string and returns it with these five punctuation marks removed.

_Hint_: `str.replace(...)` from lecture 14 may be helpful
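
As a refresher on how `str.replace` behaves, here is a sketch on made-up strings that are unrelated to the script. Two things are worth noticing: it replaces every occurrence, and it returns a new string instead of changing the original.

```python
# Made-up strings for illustration
date = "2024/08/07"

# Every "/" is replaced, and the result is a new string
dashed = date.replace("/", "-")

# Replacing with the empty string removes the character entirely;
# calls can be chained because each one returns a string
label = "snake_case name"
squashed = label.replace("_", "").replace(" ", "")

print(date, dashed, squashed)
```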

In [None]:
# For example, we wish to have the word "fly" and not "fly."
file_content.split()[17]

In [None]:
def remove_punctuation(script):
    ...

# The last line of code should return 'fly' if implemented correctly
no_punctuation = remove_punctuation(file_content)
no_punctuation.split()[17]

In [None]:
grader.check("q3.3")

---
## Question 3.4 - Tomato vs tomato

In Python, strings are case-sensitive, which means that `"Tomato"` and `"tomato"` are considered different words. To accurately get the word counts in a text file, we need to standardize the text to ensure consistency in word counting.

For this question, convert all of the text in `no_punctuation` to lowercase and assign the result to `cleaned_data`.

In [None]:
# 'Tomato' is not the same as 'tomato'
'Tomato' == 'tomato'

In [None]:
cleaned_data = ...
cleaned_data[:9]

In [None]:
grader.check("q3.4")

---
## Question 3.5 - Word Counter

Implement the function `word_counts` that takes a single parameter, `script`. The goal is to count the frequency of each word in the given text. The function should process the text to create a dictionary where each key is a unique word from the text, and the corresponding value is the number of times that word appears.

Your function should handle the text by splitting it into individual words and then tallying the occurrences of each word. The final output should be a dictionary where words are keys and their counts are integers.
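
The tallying idea can be sketched on a short made-up list before you apply it to the script. This is an analogous pattern under assumed data (`votes` and `tally` are hypothetical names), not the `word_counts` function itself.

```python
# Made-up list of items to tally
votes = ["pizza", "tacos", "pizza", "sushi", "pizza", "tacos"]

tally = {}
for choice in votes:
    if choice in tally:
        # Seen before: increase the existing count
        tally[choice] = tally[choice] + 1
    else:
        # First time seeing this item: start its count at 1
        tally[choice] = 1

print(tally)
```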

*Note*: `str.split()` with no arguments splits on any whitespace, including `\n`, so newline characters do not end up in the resulting words.

In [None]:
"This\n is an example\n".split()

In [None]:
def word_counts(script):
    ...

# Uncomment the last line to see the full dictionary           
bee_movie_dictionary = word_counts(cleaned_data)
# bee_movie_dictionary

In [None]:
grader.check("q3.5")

---
## Question 3.6 - Top 10 Most Popular Words in the Bee Movie

To analyze the word counts from the *Bee Movie* script, follow these tasks:

Create a table named `bee_movie` that has two columns: 
1. `Words`, containing each unique word from the script

2. `Counts`, the count of each word.

Sort this table by the `Counts` column in descending order and assign the sorted table to a variable called `bee_movie_sorted`.
What are the top 10 most popular words in the Bee Movie? Assign these words to `top_10`.

*Note:* Use the dictionary, `bee_movie_dictionary`, to create the table.

In [None]:
bee_movie = ...
bee_movie_sorted = ...
top_10 = ...
top_10

In [None]:
grader.check("q3.6")

## Done! 😇

Congratulations, you've finished the final project for Data 6!

The point breakdown for this assignment is given in the table below:

| **Category** | Points |
| --- | --- |
| Autograder | 40 |
| Written | 4 |
| **Total** | 44 |

## Pets of Data 6
Make sure to be well rested for the Final Exam, just like these two cats!

<img src="sophia.jpeg" width="50%" alt="Two cute sleepy cats on a floral cushion"/>

## Submission

Below, you will see two cells. Running the first cell will automatically generate a PDF of all questions that need to be manually graded, and running the second cell will automatically generate a zip with your autograded answers. You are responsible for submitting both the coding portion (the zip) and the written portion (the PDF) to their respective Gradescope portals. **Please save before exporting!**

> **Important: You must correctly assign the pages of your PDF after you submit to the correct Gradescope assignment. If your pages are not correctly assigned and/or not in the correct PDF format by the deadline, we reserve the right to award no points for your written work.**

If there are issues with automatically generating the PDF in the first cell, you can try downloading the notebook as a PDF by clicking on `File -> Save and Export Notebook As... -> PDF`. If that doesn't work either, you can manually take screenshots of your answers to the manually graded questions and submit those. Either way, **you are responsible for ensuring your submission follows our requirements. We will NOT be granting regrade requests for submissions that don't follow instructions.**

In [None]:
from otter.export import export_notebook
from os import path
from IPython.display import display, HTML
name = 'project1'
export_notebook(f"{name}.ipynb", filtering=True, pagebreaks=True)
if(path.exists(f'{name}.pdf')):
    display(HTML(f"Download your PDF <a href='{name}.pdf' download>here</a>."))
else:
    print("\n PDF generation failed, please try the other methods described above")

## Submission

Make sure you have run all cells in your notebook in order before running the cell below, so that all images/graphs appear in the output. The cell below will generate a zip file for you to submit. **Please save before exporting!**

In [None]:
# Save your notebook first, then run this cell to export your submission.
grader.export(pdf=False, run_tests=True)