# List and Dictionary Comprehension
List comprehensions are one of the most loved features of Python.  They let us build a new list in a single expression by filtering the elements of a collection and transforming the elements that pass the filter.  The same idea extends to dictionaries.

## Learning Goals
- Master the syntax for list and dictionary comprehension
- Rewrite for loops using comprehension as an alternative
- Implement comprehension to filter on a condition
- Reinforce familiarity with list and dictionary operations such as slicing, accessing an element, and populating


## Comments to the Instructor
- This document contains solutions.  Remove solutions before assigning to students.

## For Loops and List Comprehension
Without list comprehension, we can write a for loop that extracts the words containing the letter "a" and puts them in a new list.

In [19]:
fruits = ["apple", "banana", "cherry", "kiwi", "mango"]
newlist = []
for x in fruits:
  if "a" in x:
    newlist.append(x)

print(newlist)

['apple', 'banana', 'mango']


Alternatively, using list comprehension, we can write this much more succinctly.

In [2]:
fruits = ["apple", "banana", "cherry", "kiwi", "mango"]

newlist = [x for x in fruits if "a" in x]

print(newlist)



['apple', 'banana', 'mango']
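
The general shape can be read as "expression, for item in iterable, if condition". As a minimal sketch of that pattern, the comprehension below filters and transforms in one step; the names `fruit` and `shouting` are only illustrative.

```python
fruits = ["apple", "banana", "cherry", "kiwi", "mango"]

# [expression for item in iterable if condition]
# condition:  keep only the fruits that contain the letter "a"
# expression: convert each kept fruit to upper case
shouting = [fruit.upper() for fruit in fruits if "a" in fruit]

print(shouting)  # ['APPLE', 'BANANA', 'MANGO']
```

The filter runs first for each item, so the expression is only evaluated for the items that pass.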


### Task
Create an identical list from the first list using list comprehension.  The first list is lst1=[1,2,3,4,5].

### Solution


In [20]:
lst1=[1,2,3,4,5]
lst2 = [i for i in lst1] # no filter, so every element is copied
print(lst2)

[1, 2, 3, 4, 5]


### Task
Create a list from the elements of a range from 1200 to 2000 with steps of 130, using list comprehension.  Hint: range can be constructed using three parameters: start, stop, and step size.


### Solution

In [22]:
rng = range(1200, 2000, 130)
lst = [i for i in rng]
print(lst)

[1200, 1330, 1460, 1590, 1720, 1850, 1980]
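
For comparison, a comprehension that copies every element unchanged gives the same result as calling `list()` on the range. The short sketch below checks this and also shows that the stop value is never produced; the variable names are illustrative.

```python
rng = range(1200, 2000, 130)

from_comprehension = [i for i in rng]
from_constructor = list(rng)  # same elements, no comprehension needed

print(from_comprehension == from_constructor)  # True
print(2000 in rng)  # False: the stop value itself is excluded
```

A comprehension becomes the better tool as soon as the elements need to be transformed or filtered.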


### Task
Use list comprehension to construct a new list from an original list by adding 6 to each item.  The original list is lst1=[44,54,64,74,104].

### Solution

In [5]:
lst1=[44,54,64,74,104]
lst2 = [i+6 for i in lst1]
print(lst2)

[50, 60, 70, 80, 110]


### Task
Using list comprehension, construct a list from the squares of each element in the list.  The original list is lst1=[2, 4, 6, 8, 10, 12, 14].

### Solution

In [23]:
lst1=[2, 4, 6, 8, 10, 12, 14]
lst2 = [i**2 for i in lst1]
print(lst2)

[4, 16, 36, 64, 100, 144, 196]


## Filtering With List Comprehension
In the previous examples, we transformed every element of the original list (by adding 6, squaring, and so on) and collected the results in a new list.  In other cases, we might add a constraint that filters out some elements so that they never reach the new list.

### Task
Using list comprehension, construct a list from the squares of each element in the list, if the square is greater than 50.  The original list is lst1=[2, 4, 6, 8, 10, 12, 14].

### Solution

In [7]:
lst1=[2, 4, 6, 8, 10, 12, 14]
lst2 = [i**2 for i in lst1 if i**2>50] # only take squares which are greater than 50
print(lst2)

[64, 100, 144, 196]
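
The solution above computes each square twice: once in the filter and once in the output expression. One possible alternative, sketched below, feeds the comprehension from an inner generator expression so that each square is computed only once. The name `squares_over_50` is illustrative.

```python
lst1 = [2, 4, 6, 8, 10, 12, 14]

# The inner generator produces each square once;
# the outer comprehension only filters those squares.
squares_over_50 = [sq for sq in (i**2 for i in lst1) if sq > 50]

print(squares_over_50)  # [64, 100, 144, 196]
```

For a cheap operation like squaring the difference is negligible, but the pattern can help when the transformation is expensive.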


## Dictionary Comprehension

Consider a list of words  ['data', 'science', 'machine', 'learning'].  Our goal is to associate with each word its length.

This example has an iterable (a list) called "words."  Using list comprehension, we create a list that contains the length of each word.  In a dictionary comprehension, we need to specify both the key and the value for each item in the iteration.  Here, the word is the key and the length of the word is the value.

In [9]:
words = ['data', 'science', 'machine', 'learning']
#list comprehension
length = [len(i) for i in words]
print(length)


[4, 7, 7, 8]


In [10]:
words = ['data', 'science', 'machine', 'learning']
#dictionary comprehension
length_dict = {i:len(i) for i in words}
print(length_dict)

{'data': 4, 'science': 7, 'machine': 7, 'learning': 8}


The above example did not involve any filtering.  We can add a condition to keep only the words that are longer than 5 letters.

In [11]:
words = ['data', 'science', 'machine', 'learning']
#list comprehension
length = [len(i) for i in words if len(i) > 5]
print(length)

[7, 7, 8]


In [12]:
#dictionary comprehension
dict_length = {i:len(i) for i in words if len(i) > 5}
print(dict_length)

{'science': 7, 'machine': 7, 'learning': 8}


In this example, we want to keep all the words in the dictionary.  If the word length is greater than 5, we will record the length.  If it is less than or equal to 5, we will record "short" as the value.  Dictionary comprehension has a convenient one-line syntax for this.

In [16]:
words = ['data', 'science', 'machine', 'learning']
words_dict = {i:len(i) if len(i) > 5 else 'short' for i in words}
print(words_dict)

{'data': 'short', 'science': 7, 'machine': 7, 'learning': 8}
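
The position of the `if` matters. The sketch below contrasts the two forms used so far: a trailing `if` removes items, while `if ... else` in the value position keeps every item and only changes the value. The names `filtered` and `labelled` are illustrative.

```python
words = ['data', 'science', 'machine', 'learning']

# Trailing "if" acts as a filter: short words are dropped entirely
filtered = {w: len(w) for w in words if len(w) > 5}

# "if ... else" in the value position is a conditional expression:
# every word is kept and only its value changes
labelled = {w: (len(w) if len(w) > 5 else 'short') for w in words}

print(len(filtered))  # 3
print(len(labelled))  # 4
```

The parentheses around the conditional expression are optional; they are included here only to make the grouping visible.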


It can also be convenient to iterate over two lists of the same length and put their information into a dictionary.

In [17]:
words = ['data', 'science', 'machine', 'learning']
values = [5, 3, 1, 8]
dict_a = {i:j for i, j in zip(words, values)}
print(dict_a)

{'data': 5, 'science': 3, 'machine': 1, 'learning': 8}
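
Two details about `zip()` are worth a quick check. First, when the pairs need no transformation, `dict(zip(...))` builds the same dictionary. Second, `zip()` stops at the shorter input, so a length mismatch is silently ignored. The list `short_values` below is a hypothetical example.

```python
words = ['data', 'science', 'machine', 'learning']
values = [5, 3, 1, 8]

# dict() accepts an iterable of (key, value) pairs directly
dict_b = dict(zip(words, values))
print(dict_b == {i: j for i, j in zip(words, values)})  # True

# zip stops at the shorter input, so the extra words are dropped
short_values = [5, 3]  # hypothetical shorter list
dict_c = dict(zip(words, short_values))
print(dict_c)  # {'data': 5, 'science': 3}
```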


### Task
Consider the list country = [ 'US', 'France', 'Mexico'], as well as the list of the capitals of each of these countries.  Put this information into a dictionary with countries as keys and capitals as values by iterating over two lists and using the zip() function.

### Solution


In [18]:
country = [ 'US', 'France', 'Mexico']
capital = ['Washington DC', 'Paris', 'Mexico City']
dict_cap = {i:j for i, j in zip(country, capital)}
print(dict_cap)

{'US': 'Washington DC', 'France': 'Paris', 'Mexico': 'Mexico City'}


### Task
The given dictionary consists of vehicles and their weights in kilograms. Construct a list of the names of vehicles with weight below 5000 kilograms. In the same list comprehension, convert the names to upper case.
The dictionary is the following:
vehicles={"Sedan": 1500, "SUV": 2000, "Pickup": 2500, "Minivan": 1600, "Van": 2400, "Semi": 13600, "Bicycle": 7, "Motorcycle": 110}


### Solution

In [24]:
# named vehicles rather than dict so that the built-in dict type is not shadowed
vehicles={"Sedan": 1500, "SUV": 2000, "Pickup": 2500, "Minivan": 1600, "Van": 2400, "Semi": 13600, "Bicycle": 7, "Motorcycle": 110}
lst = [i.upper() for i in vehicles if vehicles[i]<5000] # iterating over a dictionary yields its keys
print(lst)

['SEDAN', 'SUV', 'PICKUP', 'MINIVAN', 'VAN', 'BICYCLE', 'MOTORCYCLE']
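
An equivalent formulation, sketched below, iterates over `items()` so that the key and the value are both available without a second lookup. The dictionary is called `vehicles` here to avoid shadowing the built-in `dict`; the names `name`, `weight`, and `light` are illustrative.

```python
vehicles = {"Sedan": 1500, "SUV": 2000, "Pickup": 2500, "Minivan": 1600,
            "Van": 2400, "Semi": 13600, "Bicycle": 7, "Motorcycle": 110}

# items() yields (key, value) tuples, unpacked here into name and weight
light = [name.upper() for name, weight in vehicles.items() if weight < 5000]

print(light)
# ['SEDAN', 'SUV', 'PICKUP', 'MINIVAN', 'VAN', 'BICYCLE', 'MOTORCYCLE']
```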


### Task (Main Course)
Kaggle provides a data set scraped from metacritic.com which contains a list of the top video games from 1995-2021.  The data set contains a lot of information, but we will focus on creating a dictionary with the name of the video game as a key and the platform (e.g., Nintendo 64) as the value.  
- Download the all_games.csv file provided by Kaggle here: https://www.kaggle.com/datasets/deepcontractor/top-video-games-19952021-metacritic.  Make sure this file is in the same directory as your Jupyter notebook or .py file or make sure you indicate the full file path when reading it.
- Import the data set and save it to a variable all_games.
- If you look at the CSV file, you might notice that some entries in the user_review column are tbd (for "to be determined").  Later in the course we will learn to deal with and potentially replace this type of entry.  For this assignment, we will avoid using this column when it is problematic.
- Create a dictionary with the video game name as a key and the platform as the value.  Print the first 5 values to verify the dictionary has been properly created.  (Note: it might be easiest to convert to a list and use slicing to print the first 5 values to verify.)
- Each platform has an extra space before its name starts.  Remove it using the strip() method.  You can either do this with a for loop or with dictionary comprehension.  Recall that the items() method returns a view object that yields the dictionary's key-value pairs as tuples.
- Create an empty list for each variable in the data set that we will use (name, platform, date, summary, and Metacritic score; skip user_review because of the tbd entries).  Populate the lists with the values from the respective column in the data set.  Hint: loop through the rows of all_games and use the append() method.  Remember to remove the extra space that comes with the platform variable and convert the Metacritic score from a string to a float.
- Now create a dictionary with game name and release year (note: the year, not the entire date).  The zip() function might be useful for this task.  You can do this with either a for loop or with dictionary comprehension.  Your choice!
- Use this dictionary to determine the year that The Last of Us was released.
- Video games have changed a lot in recent years.  Filter and create a dictionary of games released **after** 2018.  Hint: use an if conditional statement to apply a filter, and change the year from a string to an integer for your if statement.
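
As a reminder of how items() and strip() behave, here is a minimal sketch on a hypothetical two-entry dictionary. The game and console names are made up and are not taken from the data set.

```python
# Hypothetical dictionary, used only to illustrate the two methods
toy = {'Game A': ' Console X', 'Game B': ' Console Y'}

pairs = toy.items()          # a view object, not a list
print(type(pairs).__name__)  # dict_items
print(list(pairs)[:1])       # [('Game A', ' Console X')]

# strip() removes leading and trailing whitespace only,
# so the space inside the name is preserved
cleaned = ' Nintendo 64'.strip()
print(cleaned)               # Nintendo 64
```

A view cannot be sliced directly, which is why it is converted to a list before taking the first element.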


### Solution


In [29]:
# change working directory if necessary
# navigate to folder that contains CSV file
import os
os.getcwd( )

'/Users/kamilalarripa'

In [30]:
os.chdir('/Users/kamilalarripa/Desktop/Data 271/DictComprehension') 

In [35]:
from csv import reader
# Open and read the dataset; the with block closes the file automatically
with open('all_games.csv', newline='') as f:
    all_games = list(reader(f)) # read every row into a list of lists

In [39]:
print(all_games[0:5]) # look at the first few entries

[['name', 'platform', 'release_date', 'summary', 'meta_score', 'user_review'], ['The Legend of Zelda: Ocarina of Time', ' Nintendo 64', 'November 23, 1998', 'As a young boy, Link is tricked by Ganondorf, the King of the Gerudo Thieves. The evil human uses Link to gain access to the Sacred Realm, where he places his tainted hands on Triforce and transforms the beautiful Hyrulean landscape into a barren wasteland. Link is determined to fix the problems he helped to create, so with the help of Rauru he travels through time gathering the powers of the Seven Sages.', '99', '9.1'], ["Tony Hawk's Pro Skater 2", ' PlayStation', 'September 20, 2000', "As most major publishers' development efforts shift to any number of next-generation platforms, Tony Hawk 2 will likely stand as one of the last truly fantastic games to be released on the PlayStation.", '98', '7.4'], ['Grand Theft Auto IV', ' PlayStation 3', 'April 29, 2008', '[Metacritic\'s 2008 PS3 Game of the Year; Also known as "GTA IV"] What

Notice that this is a list of lists.  The first entry (itself a list) is the header row, which names the columns in this data set.

In [40]:
# Dictionary to store video game platforms
platform_dict = {}

# Create dictionary of platforms
for game in all_games[1:]:  # Do not include the header
    name = game[0]
    platform = game[1]
    platform_dict[name] = platform

# Print five items of the dictionary (convert the items view to a list first so it can be sliced)
print(list(platform_dict.items())[:5])

[('The Legend of Zelda: Ocarina of Time', ' Nintendo 64'), ("Tony Hawk's Pro Skater 2", ' Nintendo 64'), ('Grand Theft Auto IV', ' PC'), ('SoulCalibur', ' Xbox 360'), ('Super Mario Galaxy', ' Wii')]
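
Keep in mind that a dictionary stores one value per key. If the same game name appears in more than one row, each later row replaces the value stored by the earlier one. The sketch below shows the effect on hypothetical rows that are not taken from the data set.

```python
# Hypothetical rows: the same title listed on two platforms
rows = [
    ['Game A', ' Console X'],
    ['Game A', ' Console Y'],
    ['Game B', ' Console X'],
]

platforms = {}
for title, platform in rows:
    platforms[title] = platform  # a repeated title replaces the earlier value

print(platforms)  # {'Game A': ' Console Y', 'Game B': ' Console X'}
```

The key keeps its original position in the dictionary; only the value is updated.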


In [None]:
# using dictionary comprehension
platform_dict = {name: platform.strip() for name, platform in platform_dict.items()}

In [None]:
# alternatively, using a for loop
for name, platform in platform_dict.items():
    platform_dict[name] = platform.strip()

In [42]:
# Initialize empty lists to store column values
name = []
platform = []
date = []
summary = []
meta_score = []
# we are not considering user_review because of the tbd entries, which can't be converted to a float

# Iterate over columns and append values to lists
for game in all_games[1:]: # we start at 1 instead of 0 to skip the header
    name.append(game[0]) 
    platform.append(game[1].strip()) # strip() removes the leading space but keeps spaces inside names such as "Nintendo 64"
    date.append(game[2])
    summary.append(game[3])
    meta_score.append(float(game[4])) # convert the string to a float
   

In [44]:
# Dictionary to store dates
year_dict = {}

# Populate dictionary with game's names and release years
for key, value in zip(name, date): # zip lets us iterate over several variables at the same time
    year_dict[key] = value[-4:] # extract the year by taking the last 4 characters of the date

# Print release year of The Last of Us
print(f"The Last of Us was released in {year_dict['The Last of Us']}.")

The Last of Us was released in 2013.


In [45]:
# Alternatively, with dictionary comprehension create a dictionary for name and year

year_dict = {key: value[-4:] for key, value in zip(name, date)} # keep key the same, change value to year

# Print release year of The Last of Us
print(f"The Last of Us was released in {year_dict['The Last of Us']}.")

The Last of Us was released in 2013.
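
The year extraction relies on every date ending in a four-digit year, as in the rows printed earlier. The sketch below shows the slice on one of those dates, together with an alternative based on split() that assumes the same "Month day, year" layout. The variable names are illustrative.

```python
date = 'November 23, 1998'  # format seen in the release_date column

year_by_slice = date[-4:]             # last four characters
year_by_split = date.split(', ')[-1]  # text after the final comma

print(year_by_slice, year_by_split)   # 1998 1998

# The year is still a string, so convert it before comparing numbers
print(int(year_by_slice) > 2018)      # False
```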


In [46]:
# Video games released after 2018
after_2018 = {key: value[-4:] for key, value in zip(name, date) if int(value[-4:]) > 2018}

# Print the first five items of the dictionary
print(list(after_2018.items())[:5])

[('Disco Elysium: The Final Cut', '2021'), ('The House in Fata Morgana - Dreams of the Revenants Edition -', '2021'), ('Persona 5 Royal', '2020'), ('Tetris Effect: Connected', '2020'), ('The Last of Us Part II', '2020')]


### Task (Dessert)
In a previous assignment, we used a dictionary to represent word frequency in the first chapter of Jane Austen's Pride and Prejudice.  This code was likely written with a for loop to populate the values in the dictionary by looping through words.  Specifically, for each word in the text, we look to see if the word is already in the dictionary.  If it is not, we put it in the dictionary (as the key) with a count of 1 (as the value).  If the word is already in the dictionary, we increase its count by 1, and then keep reading.  This task can also be accomplished using dictionary comprehension rather than a for loop.  
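
As a reminder, the for-loop approach described above might look like the following sketch. The sample text is hypothetical and the variable names are illustrative.

```python
text = "the cat and the hat and the bat"  # hypothetical sample text

counts = {}
for word in text.split():
    if word not in counts:
        counts[word] = 1   # first time we see the word
    else:
        counts[word] += 1  # seen before: increase its count

print(counts)  # {'the': 3, 'cat': 1, 'and': 2, 'hat': 1, 'bat': 1}
```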

For the following poem, which word appears three times?  Which words appear two times?  You can check your code by counting yourself, but turn in a solution that uses dictionary comprehension.


### Solution

In [53]:
# Poem by Gelett Burgess
quote = """I never saw a Purple Cow, I never hope to see one; But I can tell you, anyhow, I'd rather see than be one."""

# Count frequency of words in the quote
words = quote.split(" ") # split once and reuse the list
frequency_dict = {word: words.count(word) for word in words}

print(frequency_dict)

{'I': 3, 'never': 2, 'saw': 1, 'a': 1, 'Purple': 1, 'Cow,': 1, 'hope': 1, 'to': 1, 'see': 2, 'one;': 1, 'But': 1, 'can': 1, 'tell': 1, 'you,': 1, 'anyhow,': 1, "I'd": 1, 'rather': 1, 'than': 1, 'be': 1, 'one.': 1}


In [55]:
# sort the dictionary based on descending frequency of count
top_words = sorted(frequency_dict.items(), key=lambda x: x[1], reverse=True) # sort words based on frequency as key value, and return with highest frequency first
print(top_words)

[('I', 3), ('never', 2), ('see', 2), ('saw', 1), ('a', 1), ('Purple', 1), ('Cow,', 1), ('hope', 1), ('to', 1), ('one;', 1), ('But', 1), ('can', 1), ('tell', 1), ('you,', 1), ('anyhow,', 1), ("I'd", 1), ('rather', 1), ('than', 1), ('be', 1), ('one.', 1)]
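
In the output above, 'one;' and 'one.' are counted as different words because the punctuation is still attached. As an optional extension, the sketch below strips leading and trailing punctuation before counting and uses `collections.Counter` from the standard library instead of a comprehension with `count()`. This is one possible approach, not the required solution, and it changes the counts for words that were followed by punctuation.

```python
from collections import Counter
import string

quote = """I never saw a Purple Cow, I never hope to see one; But I can tell you, anyhow, I'd rather see than be one."""

# Strip punctuation from both ends of each word;
# the apostrophe inside "I'd" is untouched
cleaned = [word.strip(string.punctuation) for word in quote.split()]

frequency = Counter(cleaned)
print(frequency['I'])    # 3
print(frequency['one'])  # 2
```

`Counter` makes a single pass over the words, whereas calling `count()` inside a comprehension rescans the whole list for every word.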


### References
- Dataquest Tutorial: https://www.dataquest.io/blog/python-dictionary-comprehension-tutorial/
- Datacamp List Comprehension Tutorial: https://www.datacamp.com/tutorial/python-list-comprehension
- Datacamp Dictionary Comprehension Tutorial: https://www.datacamp.com/tutorial/python-dictionary-comprehension
- Kaggle data set: https://www.kaggle.com/datasets/deepcontractor/top-video-games-19952021-metacritic
- Purple Cow poem by Gelett Burgess: https://poets.org/poem/purple-cow