In [None]:
# Initialize Otter
import otter
grader = otter.Notebook("lab02.ipynb")

# Lab 2: List and Dictionary Comprehension
Welcome to Lab 2 of Data 271! This document contains examples and small tasks ("appetizers") to help you check that you understand the examples. The culminating task ("main course") at the end of the document is more complex, and uses most of the topics you will have worked through. You should rarely remain stuck for more than a few minutes on questions in labs, so feel free to ask for help. Collaborating on labs is more than okay -- it's encouraged! Explaining things is beneficial -- the best way to solidify your knowledge of a subject is to explain it. Please don't just share answers, though.

For this lab and all future ones, please be sure not to reassign variables throughout the notebook! For example, if you use `my_list` in your answer to one question, do not reassign it later on. Otherwise, you will fail tests that you passed previously!

### In today's lab, we will
- Master the syntax for list and dictionary comprehension
- Rewrite for loops using comprehension as an alternative
- Implement comprehension to filter on a condition
- Reinforce familiarity with list and dictionary operations, such as slicing, accessing an element, and populating


## Overview
List comprehensions are one of the most loved Python language features.  They let us build a new list in a single expression by filtering the elements of a collection and transforming the elements that pass the filter.  This idea also extends to dictionaries.
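
As a rough sketch of the general pattern, a list comprehension has the form `[expression for item in iterable if condition]`, where the `if` clause is optional. The names `scores` and `doubled_large` below are made up for illustration and are not used anywhere else in this lab.

```python
# Hypothetical data, for illustration only
scores = [12, 25, 31, 8, 19]

# Filter: keep scores of at least 15; transform: double each kept score
doubled_large = [s * 2 for s in scores if s >= 15]
print(doubled_large)  # [50, 62, 38]
```

Reading the comprehension from the `for` onward ("for each `s` in `scores`, if `s >= 15`, produce `s * 2`") is often the easiest way to see what it does.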

## 1. For Loops and List Comprehension
Without list comprehension, we can write a for loop to extract the words from one list and put them in a new list.

In [None]:
fruits = ["apple", "banana", "cherry", "kiwi", "mango"]
newlist = []
for x in fruits:
    newlist.append(x)

print(newlist)

Alternatively, using list comprehension, we can write this much more succinctly.

In [None]:
newlist = [x for x in fruits]
newlist

**Question 1.1:** Let `lst1=[1,2,3,4,5]`. Create an identical list from `lst1` using list comprehension.

In [None]:
lst1 = ...
identical_lst1 = ...
identical_lst1

In [None]:
grader.check("q11")

**Question 1.2:** Create a list from the elements of a range from 1200 to 2000 with steps of 130, using list comprehension.  *Hint*: `range` can be constructed using three arguments - *start*, *stop* and *step size*. First assign `rng` to the appropriate `range`, then use list comprehension.
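
As a small illustration of the three-argument form, using numbers deliberately different from the ones in this question: `range(start, stop, step)` begins at `start`, increases by `step`, and stops before reaching `stop`. The name `demo_range` is hypothetical.

```python
# start=0, stop=10, step=3; the stop value itself is never included
demo_range = range(0, 10, 3)
print(list(demo_range))  # [0, 3, 6, 9]
```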

In [None]:
rng = ...
lst_from_rng = ...
lst_from_rng

In [None]:
grader.check("q12")

**Question 1.3:** Given `lst2` below, use list comprehension to construct a new list by adding 6 to each item.

In [None]:
lst2=[44,54,64,74,104]
new_lst2 = ...
new_lst2

In [None]:
grader.check("q13")

**Question 1.4:** Let `lst3 = [2,4,6,8,10,12,14]`. Use list comprehension to construct a new list from the squares of each element in `lst3`.

In [None]:
lst3 = ...
new_lst3 = ...
new_lst3

In [None]:
grader.check("q14")

## 2. Filtering With List Comprehension
In the previous questions, we transformed every element of the original list to create a new list.  (By "transformed," we mean that we added 6, squared the entries, etc.)  In some cases, we might have an additional constraint that filters which elements make it into the new list.

As an example, without list comprehension, we can write a for loop to extract the words that have the letter `a` and put them in a new list.

In [None]:
fruits = ["apple", "banana", "cherry", "kiwi", "mango"]
newlist = []
for x in fruits:
    if "a" in x:
        newlist.append(x)

print(newlist)

Alternatively, we can also filter using list comprehension.

In [None]:
newlist = [x for x in fruits if "a" in x]
newlist
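
Filtering and transforming can also be combined in one comprehension. The sketch below uses a made-up list (`animals`) so that it does not interfere with any variables in this lab: the `if` clause decides which elements are kept, and the expression at the front decides what is stored for each kept element.

```python
# Hypothetical data, for illustration only
animals = ["cat", "giraffe", "dog", "elephant"]

# Filter: keep names containing the letter "e"; transform: capitalize them
e_animals = [a.title() for a in animals if "e" in a]
print(e_animals)  # ['Giraffe', 'Elephant']
```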

**Question 2.1:** Given `lst3` defined above, use list comprehension to construct a new list from the squares of each element in `lst3` if the square is greater than 50.

In [None]:
new_filtered_lst3 = ...
new_filtered_lst3

In [None]:
grader.check("q21")

**Question 2.2:** Use list comprehension to construct a list containing the squares of all even numbers between 1 and 10 (inclusive).

In [None]:
even_squares = ...
even_squares

In [None]:
grader.check("q22")

## 3. Dictionary Comprehension

Consider a list of words  `words = ['data', 'science', 'machine', 'learning']`.  Our goal is to associate with each word its length.

Using list comprehension, we create a list that contains the length of each word.  In the dictionary comprehension, we need to specify both keys and values based on the iteration.  For the dictionary, the word is the key and the length of the word is the value.

In [None]:
words = ['data', 'science', 'machine', 'learning']

#list comprehension
length = [len(i) for i in words]
length

In [None]:
#dictionary comprehension
length_dict = {i:len(i) for i in words}
length_dict

The above example did not include any filtering.  We can add a condition to pull out only the words that are more than 4 letters long.

In [None]:
words = ['data', 'science', 'machine', 'learning']
#list comprehension
length = [len(i) for i in words if len(i) >= 5]
length

In [None]:
#dictionary comprehension
dict_length = {i:len(i) for i in words if len(i) >= 5}
dict_length

In this example, we want to keep all the words in the dictionary.  If the word length is greater than 5, we will record the length.  If it is less than or equal to 5, we will record "short" as the value.  Dictionary comprehension has a convenient one-line syntax for this: an `if`/`else` expression placed in the value position, before the `for`.

In [None]:
words = ['data', 'science', 'machine', 'learning']
words_dict = {i:len(i) if len(i) > 5 else 'short' for i in words}
words_dict
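
The position of the `if` matters, and the two forms are easy to mix up. The following sketch contrasts them using a hypothetical list `sizes`; it uses list comprehensions for brevity, and the same rule applies to dictionary comprehensions.

```python
# Hypothetical data, for illustration only
sizes = [3, 8, 5, 12]

# if/else BEFORE the `for`: a conditional expression, so every item is kept
labels = ["big" if s > 5 else "small" for s in sizes]

# `if` AFTER the `for`: a filter, so items that fail the test are dropped
big_only = [s for s in sizes if s > 5]

print(labels)    # ['small', 'big', 'small', 'big']
print(big_only)  # [8, 12]
```

Note that the output of the first comprehension has the same length as `sizes`, while the output of the second is shorter.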

It can also be convenient to iterate over two lists of the same length and put their information into a dictionary.

In [None]:
words = ['data', 'science', 'machine', 'learning']
values = [5, 3, 1, 8]
dict_a = {i:j for i, j in zip(words, values)}
print(dict_a)
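
To see what `zip()` hands to the comprehension, it can help to convert its result to a list. This sketch uses the hypothetical names `letters`, `digits`, and `pairs`: each element is a tuple holding one item from each input, and if the inputs have different lengths, `zip()` stops at the end of the shorter one.

```python
# Hypothetical data, for illustration only
letters = ["a", "b", "c"]
digits = [1, 2, 3]

# Each element of the zipped result is a (letter, digit) tuple
pairs = list(zip(letters, digits))
print(pairs)  # [('a', 1), ('b', 2), ('c', 3)]

# With inputs of unequal length, the extra items are silently ignored
short_pairs = list(zip(letters, [10, 20]))
print(short_pairs)  # [('a', 10), ('b', 20)]
```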

**Question 3.1:** Consider the list `country = ['US', 'France', 'Mexico']`, as well as the list of the capitals of each of these countries.  Put this information into a dictionary with countries as keys and capitals as values by iterating over two lists and using the `zip()` function.

In [None]:
country = ...
capital = ['Washington DC', 'Paris', 'Mexico City']
dict_cap = ...
dict_cap

In [None]:
grader.check("q31")

**Question 3.2:** Given the list of integers below, construct a dictionary where the keys are the even numbers from the list, and the values are the squares of those even numbers.

In [None]:
numbers_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
even_squares_dict = ...
even_squares_dict

In [None]:
grader.check("q32")

**Question 3.3:** The given dictionary consists of vehicles and their weights in kilograms. Construct a list containing the names (in upper case) of vehicles that have a weight below 5000 kilograms.

In [None]:
vehicle_dict = {"Sedan": 1500, "SUV": 2000, "Pickup": 2500,
                "Minivan": 1600, "Van": 2400, "Semi": 13600, 
                "Bicycle": 7, "Motorcycle": 110}
vehicle_list = ...
vehicle_list

In [None]:
grader.check("q33")

**Question 3.4:** Construct a dictionary where the keys are the numbers from `numbers_list` in question 3.2 and the values are the square of the number if it's even, and the negative square if it's odd.

In [None]:
pos_neg_squares = ...
pos_neg_squares

In [None]:
grader.check("q34")

## 4. Main Course
A data set from https://www.metacritic.com/ which contains a list of the top video games from 1995-2021 is in your directory.  The data set contains a lot of information, but we will focus on creating a dictionary with the name of a video game as a key and the platform (e.g., Nintendo 64) as the value.  

### Step 1: Import the data and put it in a list
The data for this task is in `.csv` format. We will learn more about `.csv` formats and how to import them into Python later in this course. For now, just run the cell below to import the data for all the games in the dataset. 

In [None]:
from csv import reader #Python's built-in csv module
# Open and read the dataset
all_games = open('all_games.csv')
all_games = reader(all_games) # type is csv.reader

In [None]:
type(all_games)

**Question 4.1:** Convert `all_games` to a list.

In [None]:
all_games = ...
all_games

In [None]:
grader.check("q41")

**Question 4.2:** What data type is the first entry of `all_games`?

1. int
2. str
3. list
4. tuple
5. dict

Assign `all_games_entry_type` to 1, 2, 3, 4, or 5.

In [None]:
all_games_entry_type = ...
all_games_entry_type

In [None]:
grader.check("q42")

**Question 4.3:** Create a dictionary `platform_dict` with the video game name as a key and the platform as the value. Do not include the header (column names) in your dictionary. 

In [None]:
# Dictionary without the column names
platform_dict = ...

# Look at five items of the dictionary (converting to a list first)
list(platform_dict.items())[:5]

In [None]:
grader.check("q43")

**Question 4.4:** Each platform has an extra space before its name starts.  Remove it using the `.strip()` string method and dictionary comprehension.  Recall that the method `.items()` returns a view object containing the key-value pairs of the dictionary as tuples.
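
As a reminder of how the two building blocks behave on their own, here is a minimal sketch with made-up values (`padded` and `demo_dict` are hypothetical and unrelated to the games data). Combining the two inside a dictionary comprehension is left to you.

```python
# .strip() removes leading and trailing whitespace from a string
padded = "  hello world  "
stripped = padded.strip()
print(repr(stripped))  # 'hello world'

# .items() yields (key, value) tuples; wrap it in list() to display them
demo_dict = {"x": " one", "y": " two"}
item_pairs = list(demo_dict.items())
print(item_pairs)  # [('x', ' one'), ('y', ' two')]
```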

In [None]:
platform_dict_clean = ...

# Look at five items of the dictionary (converting to a list first)
list(platform_dict_clean.items())[:5]

In [None]:
grader.check("q44")

**Question 4.5:** Create a list for each of the 5 variables in the data set (`name`, `platform`, `date`, `summary`, `meta_score`). We are not considering `user_score` because of the `tbd` entries, which can't be converted to a float.  Populate the lists with the values from the respective columns in the data set. Be sure to convert the Metacritic score from a string to a float, and remove the extra space from the platform.

In [None]:
    
print(name[:3])
print(platform[:3])
print(date[:3])
print(summary[:3])
print(meta_score[:3])

In [None]:
grader.check("q45")

**Question 4.6:** Create a dictionary with game names as keys and release years as values (Note: only the year, not the entire date, is needed, and it can be entered as a string).  The `zip()` function will be useful for this task.  You can do this with either a for loop or with dictionary comprehension. Your choice! You may use multiple lines if needed.

In [None]:
year_dict = ...
year_dict

In [None]:
grader.check("q46")

**Question 4.7:** Use `year_dict` to determine the year that *The Last of Us* was released. Make your answer an int.

In [None]:
last_of_us_year = ...
last_of_us_year

In [None]:
grader.check("q47")

**Question 4.8:** Video games have changed a lot in recent years.  Filter and create a dictionary of games released **after** 2018.  *Hint*: Use `year_dict.items()` as your iterator and make sure to convert the year from a string to an integer in your if statement.

In [None]:
after_2018 = ...

# Look at the first five items of the dictionary
list(after_2018.items())[:5]

In [None]:
grader.check("q48")

## 5. Dessert
In a previous assignment, we used a dictionary to represent word frequency in the first chapter of Jane Austen's Pride and Prejudice.  That code used a for loop to populate the dictionary by looping through the words.  Specifically, for each word in the text, we check whether the word is already in the dictionary.  If it is not, we add it to the dictionary (as the key) with a count of 1 (as the value).  If the word is already in the dictionary, we increase its count by 1, and then keep reading.  This task can also be accomplished using dictionary comprehension rather than a for loop.
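
The for-loop approach described above can be sketched as follows. The sentence in `sample_text` is made up for illustration (it is not the Austen text), and `word_counts` is a hypothetical name; this is a reminder of the loop version, not the comprehension you are asked to write below.

```python
# Hypothetical text, for illustration only
sample_text = "the cat and the hat and the bat"

word_counts = {}
for word in sample_text.split():
    if word not in word_counts:
        # First time we see this word: start its count at 1
        word_counts[word] = 1
    else:
        # Word already recorded: increase its count by 1
        word_counts[word] += 1

print(word_counts)  # {'the': 3, 'cat': 1, 'and': 2, 'hat': 1, 'bat': 1}
```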

**Question 5.1:** For the poem below (stored in `quote`), first clean it to remove special characters. Then use dictionary comprehension to create a dictionary where each key is a word from the quote and each value is the word's frequency in the quote. Sort the dictionary to determine the most frequent words in the quote. *Hint*: `top_words` should be a list of tuples.

In [None]:
# Poem by Gelett Burgess
quote = """I never saw a Purple Cow, I never hope to see one; But I can tell you, anyhow, I'd rather see than be one."""

# Clean the quote
cleaned_quote = ...

# Count frequency of words in the quote
frequency_dict = ...

# sort the dictionary based on descending frequency of count 
# (use frequency as key value, and return with highest frequency first)
top_words = ...
top_words

In [None]:
grader.check("q51")

### References
- Kaggle data set: https://www.kaggle.com/datasets/deepcontractor/top-video-games-19952021-metacritic
- Purple Cow poem by Gelett Burgess: https://poets.org/poem/purple-cow

### Congratulations!
Gus says congrats on finishing Lab 2! Run the cell below to generate a zip file, then submit it to Canvas.

<img src="gus_this_me.JPG" alt="drawing" width="300"/>

## Submission

Make sure you have run all cells in your notebook in order before running the cell below, so that all images/graphs appear in the output. The cell below will generate a zip file for you to submit. **Please save before exporting!**

In [None]:
# Save your notebook first, then run this cell to export your submission.
grader.export(pdf=False, run_tests=True)