<img src="http://imgur.com/1ZcRyrc.png" style="float: left; margin: 20px; height: 55px">

# Python List & Dictionary Comprehensions

---

### Learning Objectives
*After this lesson, you will be able to:*
- Create list comprehensions 
- Create dictionary comprehensions 
- Use conditional logic (`if`/`else`) within list & dictionary comprehensions
- Use `zip()` and `enumerate()` within list & dictionary comprehensions
- Use nested list & dictionary comprehensions 

---

### Lesson Guide

- [Warm-Up on Python Basics](#warm-up)
- [Basic List Comprehensions](#list_comprehensions)
- [Basic Dictionary Comprehensions](#dictionary_comprehensions)
- [Conditional Logic within Comprehensions](#conditional_comprehensions)
- [Zip and Enumerate within Comprehensions](#zip_enumerate)
- [Nested Comprehensions](#nested_comprehensions)
- [Application of Comprehensions with Pandas](#with_pandas)

<a id='warm-up'></a>

### Warm-Up on Python Basics

---

In the next 10-15 minutes, try to write the code for the questions below on the Python basics you reviewed yesterday.

#### Warm-Up A:  Remove the last element in `lstA` below, then sort it, insert the number `22` into the 5th position, and take a slice of the 7th through the 10th elements (inclusive). 

**Hint:** You can use the function `dir()` to find out which attributes and methods are available for any Python object.
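`dir()` also lists the double-underscore ("dunder") names, which can bury the methods you actually want. As a small preview of this lesson, here is one way to keep only the public names, assuming "public" simply means "does not start with an underscore":

```python
# dir() returns every attribute name of the object, including dunder names like __len__
lst = [3, 1, 2]

# keep only the names that don't start with an underscore (the public methods)
public_methods = [name for name in dir(lst) if not name.startswith('_')]
print(public_methods)
```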

In [192]:
lstA = [13,15,-4,8,23,25,17,44,-7,-10,0,1,5,0,2,8,45]

In [206]:
ls = sorted(lstA[:-1])

In [207]:
ls.insert(4, 22)

In [210]:
ls[6:10]

[1, 2, 5, 8]

#### Warm-Up B:  Remove Diesel from `dictB` below.  Add Teddy to the dictionary with a value of 5. Get a list of the key, value tuples now in the dictionary.

In [211]:
dictB = {'Mabel':10,
         'Wilbur':12,
         'Diesel':4,
         'Schatzie':9}

In [212]:
dictB.pop('Diesel')

4

In [214]:
dictB['Teddy'] = 5

In [217]:
list(dictB.items())

[('Mabel', 10), ('Wilbur', 12), ('Schatzie', 9), ('Teddy', 5)]

#### Warm-Up C:  For the string below, first strip the whitespace on both sides, then replace the '&' with 'and', then get rid of the exclamation marks, then convert it all to lowercase letters, and finally split the string into a list of individual words.

In [27]:
#FIRST: strip the whitespace on both sides
#SECOND: replace the '&' with 'and'
#THIRD: get rid of the exclamation marks
#FOURTH: convert the whole string to lowercase letters
#FIFTH: split the string into a list of individual words

stringC = ' Pizzas & BURRITOS!!! are indisputably the BEST foods!   '

In [222]:
stringC.strip().replace('&', 'and').replace('!', '').lower().split()

['pizzas', 'and', 'burritos', 'are', 'indisputably', 'the', 'best', 'foods']

#### Warm-Up D:  Write a for-loop that iterates through `exam_scores`. Create a new dictionary that keeps each student's name, but records the exam scores as letter grades. Each entry should look like `'Bradley': ['B','A','B','C']`  

In [224]:
#Use the following grade boundaries:
# A : 70 or above
# B : between 30 and 69
# C : between 10 and 29
# D : anything less than 10
# if one of the grades is missing, simply skip over it

exam_scores = {'Emily':[72,52,48,63],
               'Demi':[78,65,55,75],
               'Kush':[25,12,20,8],
               'Fortune':[45,58,62,73],
               'Sarah':[43,'?',38,52]}

In [223]:
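def grade_converter(g):
    # Convert a numeric score into a letter grade;
    # return None for anything that isn't an int (e.g. the '?' placeholder)
    if isinstance(g, int):
        if g < 10:
            return 'D'
        elif 10 <= g <= 29:
            return 'C'
        elif 30 <= g <= 69:
            return 'B'
        else:
            return 'A'
    else:
        return None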

In [226]:
exam_scores.keys()

dict_keys(['Emily', 'Demi', 'Kush', 'Fortune', 'Sarah'])

In [231]:
from collections import defaultdict

d = defaultdict(list)

for k in exam_scores.keys():
    l_ = []
    for s in exam_scores[k]:
        grade = grade_converter(s)
        # skip missing grades (grade_converter returns None for them)
        if grade is not None:
            l_.append(grade)
    d[k] = l_

In [235]:
d

defaultdict(list,
            {'Emily': ['A', 'B', 'B', 'B'],
             'Demi': ['A', 'B', 'B', 'A'],
             'Kush': ['C', 'C', 'C', 'D'],
             'Fortune': ['B', 'B', 'B', 'A'],
             'Sarah': ['B', 'B', 'B']})


<a id='list_comprehensions'></a>

### Basic List Comprehensions

---

List comprehensions are a concise syntax for building a new list by applying an expression to every item of an iterable (a list, tuple, string, `range`, etc.).

They are often a shorter, more readable replacement for a `for` loop that appends to a list!
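A comprehension is not limited to lists: the part after `in` can be any iterable. A short sketch with a few built-in iterables:

```python
# A list comprehension works on any iterable, not only lists
squares = [n**2 for n in range(5)]          # iterate over a range
letters = [ch.upper() for ch in 'abc']      # iterate over a string
lengths = [len(w) for w in ('to', 'be')]    # iterate over a tuple

print(squares)  # [0, 1, 4, 9, 16]
print(letters)  # ['A', 'B', 'C']
print(lengths)  # [2, 2]
```

In every case the result is a new list; the original iterable is left unchanged.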

In [239]:
#Let's write a for-loop to take the list below and return a list where each element has been squared:
numbers_A = [1,2,3,4,5,6,10,12]

ls_ = []
for n in numbers_A:
    ls_.append(n**2)


In [240]:
ls_

[1, 4, 9, 16, 25, 36, 100, 144]

In [243]:
#Now, let's do the same thing with a list comprehension:
numbers_B = [1,3,5,7,9,11,15]


[n**2 for n in numbers_B]

[1, 9, 25, 49, 81, 121, 225]

- Inside the brackets, the parts map onto the for loop above:
  1. The **expression for each element** (the outcome) comes first: `n**2`
  2. Next is the **loop variable**: `for n`
  3. Last comes the **iterable to loop over**: `in numbers_B`
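To check that mapping, here is a small sketch that builds the same list both ways and compares the results:

```python
numbers = [1, 3, 5, 7]

# for-loop version: start with an empty list, append one result per element
squares_loop = []
for n in numbers:
    squares_loop.append(n**2)

# comprehension version: expression first, then the for clause
squares_comp = [n**2 for n in numbers]

print(squares_comp)                  # [1, 9, 25, 49]
print(squares_loop == squares_comp)  # True
```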

#### Quick Practice: Try these basic list comprehensions!

In [None]:
#Multiply every element in this list by 10, and then subtract 4:
numbers = [6,10,8,5,3]

In [244]:
[n*10-4 for n in numbers]

[56, 96, 76, 46, 26]

In [245]:
#Use .capitalize() to get a list of the names with the first letters capitalized:
names = ['alex','TOM','kate','Emily','hilde']


In [247]:
[s.capitalize() for s in names]

['Alex', 'Tom', 'Kate', 'Emily', 'Hilde']

In [252]:
#Create a list of just the first two characters from the strings in the list below:
strings = ['SK1908','RK1905','SB1001','GM1406','EL3005']


In [253]:
[s[:2] for s in strings]

['SK', 'RK', 'SB', 'GM', 'EL']

<a id='dictionary_comprehensions'></a>

### Basic Dictionary Comprehensions

---

You can also use comprehensions to create dictionaries instead of lists!
Use curly braces `{}` instead of square brackets `[]`, and write the expression as a `key: value` pair, with a colon between the key and the value.
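A minimal sketch of the `key: value` form, with one detail worth knowing: dictionary keys are unique, so if two items produce the same key, the later one overwrites the earlier one.

```python
words = ['apple', 'avocado', 'banana']

# key expression : value expression, then the for clause
lengths = {w: len(w) for w in words}
print(lengths)  # {'apple': 5, 'avocado': 7, 'banana': 6}

# 'apple' and 'avocado' both produce the key 'a', so the later 'avocado' wins
by_first_letter = {w[0]: w for w in words}
print(by_first_letter)  # {'a': 'avocado', 'b': 'banana'}
```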

In [254]:
#let's write a for-loop to create a dictionary that stores how many 'e's there are in the words below:
words_A = ['exasperated','angry','elated','incredulous']


In [255]:
'exasperated'.count('e')

3

In [256]:
d = dict()
for w in words_A:
    d[w] = w.count('e')

In [257]:
d

{'exasperated': 3, 'angry': 0, 'elated': 2, 'incredulous': 1}

In [259]:
#now let's do the same thing with a dictionary comprehension:
words_B = ['embarrassed','exhausted','overjoyed','embittered']
{w:w.count('e') for w in words_B}

{'embarrassed': 2, 'exhausted': 2, 'overjoyed': 2, 'embittered': 3}

In [264]:
#now let's do the same thing again, but this time, let's count both the 'e's and the 'a's:
words_B = ['embarrassed','exhausted','overjoyed','embittered']
{w:w.count('e')+w.count('a') for w in words_B}

{'embarrassed': 4, 'exhausted': 3, 'overjoyed': 2, 'embittered': 3}

#### Quick Practice: Try these basic dictionary comprehensions!

In [266]:
#Create a dictionary storing the length of each word in the list below:
words = ['bus','train','airplane','tram','helicopter']
{w:len(w) for w in words}

{'bus': 3, 'train': 5, 'airplane': 8, 'tram': 4, 'helicopter': 10}

In [268]:
#Create a dictionary that stores the length of each of the surnames in the list below, with the names in uppercase as keys:
#i.e.: 'GRANT': 5, etc.
surnames = ['grant','Sketchley','REUSTLE','huse','Mellgard']
{n.upper():len(n) for n in surnames}

{'GRANT': 5, 'SKETCHLEY': 9, 'REUSTLE': 7, 'HUSE': 4, 'MELLGARD': 8}

In [269]:
#Create a dictionary that stores the square and the cube of each of the numbers below:
numbers = [1,2,3,4,5]

{n:(n**2, n**3) for n in numbers}


{1: (1, 1), 2: (4, 8), 3: (9, 27), 4: (16, 64), 5: (25, 125)}

<a id='conditional_comprehensions'></a>

### Conditional Logic within Comprehensions

---

You can use `if`/`else` logic within comprehensions, much as you would in a for loop, but where the condition goes depends on what it does.

A rule of thumb is:
- If the condition **changes the outcome** for each element, write it as a conditional expression (`value_if_true if condition else value_if_false`) at **the beginning of your comprehension**, as the expression for the outcome, before the `for`. The `else` is required here.
- If the condition is **filtering out some of the values** (for example, you ONLY want to find the square roots of the positive numbers in a list, and skip all the negatives), then it goes right **at the end of your comprehension**, after the `for ... in ...` part, and takes no `else`.
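A short sketch of both placements side by side, using a made-up list of numbers:

```python
numbers = [-3, 4, -1, 9]

# 1) conditional expression BEFORE the for: changes each value, keeps every element
#    (the else part is required)
clipped = [n if n > 0 else 0 for n in numbers]
print(clipped)    # [0, 4, 0, 9]

# 2) if AFTER the for: drops elements that fail the test (no else allowed)
positives = [n for n in numbers if n > 0]
print(positives)  # [4, 9]

# combined: the trailing if decides which elements are kept,
# and the expression at the front is applied only to those
roots = [n ** 0.5 for n in numbers if n > 0]
print(roots)      # [2.0, 3.0]
```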

In [271]:
#Let's write a for-loop to binarize the list of numbers below depending on whether they are below 10
#(If the number is below 10, we replace it with 0; if it's 10 or above, we replace it with 1)
numbers_A = [5,7,8,19,30]

binarized = []
for n in numbers_A:
    if n < 10:
        binarized.append(0)
    else:
        binarized.append(1)
binarized

[0, 0, 0, 1, 1]

In [273]:
#Now let's do the same thing with a list comprehension:
numbers_B = [34,2,8,13,20]
[0 if n < 10 else 1 for n in numbers_B]

[1, 0, 0, 1, 1]

In [275]:
#Let's write a dictionary comprehension to store whether each word is 'short' or 'long' in the list below
#If the length of the word is over six letters, then we'll say it's 'long'; otherwise it's 'short'
#We want to skip over any items that aren't words
lst_A = ['ostentatious','house','industrial', None,'dog',8,'eat']


In [276]:
{w: 'short' if len(w) <= 6 else 'long' for w in lst_A if isinstance(w, str)}

{'ostentatious': 'long', 'house': 'short', 'industrial': 'long', 'dog': 'short', 'eat': 'short'}

In [279]:
#Now let's try the same thing as above, but this time,
#if the word is between 4 and 6 letters (inclusive), classify it as 'medium'
lst_A = ['ostentatious','house','industrial',None,'dog',8,'eat']
{w: 'short' if len(w) < 4 else 'medium' if len(w) <= 6 else 'long'
 for w in lst_A if isinstance(w, str)}

{'ostentatious': 'long', 'house': 'medium', 'industrial': 'long', 'dog': 'short', 'eat': 'short'}
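Chained conditional expressions get hard to read quickly. One option is to move the logic into a small function and call it from the comprehension. A sketch, where `size_label` is a hypothetical helper name and the cut-offs follow the comments above (under 4 letters short, 4 to 6 medium, over 6 long):

```python
def size_label(word):
    """Hypothetical helper: classify a word by its length."""
    if len(word) < 4:
        return 'short'
    elif len(word) <= 6:
        return 'medium'
    return 'long'

lst_A = ['ostentatious', 'house', 'industrial', None, 'dog', 8, 'eat']

# the comprehension now only says "label each string"; the rules live in the function
labels = {w: size_label(w) for w in lst_A if isinstance(w, str)}
print(labels)
```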

#### Quick Practice: Try these comprehensions with conditionals!

In [281]:
#Write a dictionary comprehension to store the length of each of the words in the list below, 
#but only for the words that end in 't'!
words = ['cat','dog','elephant','rabbit','lizard']

{w:len(w) for w in words if w[-1] == 't'}


{'cat': 3, 'elephant': 8, 'rabbit': 6}

In [100]:
#Write a list comprehension to multiply all the even numbers by 2 and all the odd numbers by 3
#BUT only do this for the positive numbers!
#(remember, you can use % to find the remainder after division for two numbers, 
#so 10%5 would be 0 because 5 fits into 10 evenly with no remainder)
numbers = [4,5,3,10,-6,7]


In [None]:
[n*2 if n % 2 == 0 else n*3 for n in numbers if n > 0]

<a id='zip_enumerate'></a>

### Zip and Enumerate within Comprehensions

---

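The functions `zip()` and `enumerate()` are very useful inside list and dictionary comprehensions!

`zip()` pairs up items from two (or more) lists position by position, producing tuples like `(first_item_of_list_1, first_item_of_list_2)`.

`enumerate()` is helpful when you want both the items and their positions in the list: it produces `(index, item)` tuples, counting from 0 unless you pass a different `start` value.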
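A small sketch of what each function actually produces (with made-up names and ages), including two details that often matter: `zip()` stops at the shortest input, and `enumerate()` accepts a `start` argument.

```python
names = ['Ana', 'Ben', 'Cy']
ages = [31, 25]

# zip pairs items position by position and stops at the shortest input ('Cy' is dropped)
print(list(zip(names, ages)))           # [('Ana', 31), ('Ben', 25)]

# enumerate yields (index, item) pairs; counting starts at 0 unless you pass start
print(list(enumerate(names)))           # [(0, 'Ana'), (1, 'Ben'), (2, 'Cy')]
print(list(enumerate(names, start=1)))  # [(1, 'Ana'), (2, 'Ben'), (3, 'Cy')]

# in a comprehension, the tuples are usually unpacked directly in the for clause
ranked = {name: rank for rank, name in enumerate(names, start=1)}
print(ranked)                           # {'Ana': 1, 'Ben': 2, 'Cy': 3}
```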

In [285]:
#Let's write a for-loop to create a dictionary that stores the populations (in millions) of the cities below:
cities_A = ['Tokyo','Shanghai','Jakarta','Delhi','Seoul']
populations_A = [37.8,34.9,31.7,26.5,25.5]

d = dict()
for c, p in zip(cities_A, populations_A):
    d[c] = p

In [288]:
#Let's write a dictionary comprehension to store the population of the cities below to the nearest million, 
#but ONLY if they're more than 22 million
cities_B = ['Karachi','Guangzhou','Beijing','Shenzhen','Mexico City']
populations_B = [25.1,25.0,24.9,23.3,21.5]
{c: round(p) for c, p in zip(cities_B, populations_B) if p > 22}

{'Karachi': 25, 'Guangzhou': 25, 'Beijing': 25, 'Shenzhen': 23}

In [291]:
list(zip(cities_B, populations_B))

[('Karachi', 25.1),
 ('Guangzhou', 25.0),
 ('Beijing', 24.9),
 ('Shenzhen', 23.3),
 ('Mexico City', 21.5)]
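Going the other way, `zip(*pairs)` splits a list of pairs back into separate sequences (the `*` unpacks the list so each pair becomes a separate argument to `zip`). A small sketch with a few of the cities above, re-declared so the block runs on its own:

```python
pairs = [('Karachi', 25.1), ('Guangzhou', 25.0), ('Beijing', 24.9)]

# zip(*pairs) is the same as zip(('Karachi', 25.1), ('Guangzhou', 25.0), ('Beijing', 24.9)),
# so it groups all the first elements together and all the second elements together
cities, pops = zip(*pairs)
print(cities)  # ('Karachi', 'Guangzhou', 'Beijing')
print(pops)    # (25.1, 25.0, 24.9)
```

Note that the results are tuples; wrap them in `list()` if you need lists.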

In [300]:
#Let's combine the two lists of cities together and then write a list comprehension to get a list of strings 
#that looks like ['1 Tokyo','2 Shanghai',...]

[f"{i} {c}" for i, c in enumerate(cities_A + cities_B, 1)]

['1 Tokyo',
 '2 Shanghai',
 '3 Jakarta',
 '4 Delhi',
 '5 Seoul',
 '6 Karachi',
 '7 Guangzhou',
 '8 Beijing',
 '9 Shenzhen',
 '10 Mexico City']

In [313]:
#Let's create a dictionary that holds each city as the key, 
#and a tuple containing the ranking of the city and its population as the value
#but ONLY for the top 8 cities
#each entry should look like:  'Delhi': (4, 26.5)

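{c: (i, p) for i, (c, p) in enumerate(zip(cities_A + cities_B, populations_A + populations_B), 1)
 if i <= 8}

{'Tokyo': (1, 37.8),
 'Shanghai': (2, 34.9),
 'Jakarta': (3, 31.7),
 'Delhi': (4, 26.5),
 'Seoul': (5, 25.5),
 'Karachi': (6, 25.1),
 'Guangzhou': (7, 25.0),
 'Beijing': (8, 24.9)}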

#### Quick Practice: Try these comprehensions with zip and enumerate!

In [316]:
#create a dictionary that stores each person's name with the total number of hours they worked last week
#each entry should look like:   'Ollie': 25 
employees = ['Faye','Ollie','Roberto']
hours = [(5,8,10,10,8),(4,0,6,10,5),(8,8,7,9,10)]

{e:sum(h) for e, h in zip(employees, hours)}

{'Faye': 41, 'Ollie': 25, 'Roberto': 42}

In [324]:
#A player rolls two dice and adds their results together to get a total number of points
#EXCEPT if either of the dice is a 1, in which case the player gets no points at all
#OR if both of the dice are the same number (other than 1), in which case the player gets 20 points
#create a dictionary of scores for the player with the rolls below, keyed by roll number
die_1 = [3,4,2,5,6,4]
die_2 = [2,1,2,6,4,4]

d = {i: 0 if a == 1 or b == 1 else 20 if a == b else a + b
     for i, (a, b) in enumerate(zip(die_1, die_2), 1)}
d

{1: 5, 2: 0, 3: 20, 4: 11, 5: 10, 6: 20}

In [333]:
#the following is a list of 20 students in order of how well they did on an exam
#the top three students and the bottom three students will change sets
#create a list of only the top three students and the bottom three students, with their ranking
#each entry should look like:   ('Matt', 1)
students = ['Matt','Keri','Raushaun','CJ','Sean',
            'Abdullah','Chris','Mabel','Anna','Liza',
            'Sam','Alfie','Emma','Michael','Boris',
            'Fred','Demi','Renata','Kush','Precious']

[(s, i) for i, s in enumerate(students, 1) 
 if i <=3 or i >= len(students)-2 ]

[('Matt', 1),
 ('Keri', 2),
 ('Raushaun', 3),
 ('Renata', 18),
 ('Kush', 19),
 ('Precious', 20)]

In [132]:
#what if you want to use the same students as above, 
#but you want to examine the papers of the best student, the 5th best, the 10th best, and the 16th best?
#you also want to combine the names of the student with their actual score from the list below
#make a dictionary with each entry looking like:  'Matt': ('#1 out of 20', 94)
scores = [94,92,88,80,75,73,70,65,64,63,58,55,54,52,50,48,47,38,35,30]

In [328]:
{s: (f"#{i} out of {len(students)}", score)
 for i, (s, score) in enumerate(zip(students, scores), 1)
 if i in (1, 5, 10, 16)}

{'Matt': ('#1 out of 20', 94),
 'Sean': ('#5 out of 20', 75),
 'Liza': ('#10 out of 20', 63),
 'Fred': ('#16 out of 20', 48)}

<a id='nested_comprehensions'></a>

### Nested List & Dictionary Comprehensions

---

Sometimes you might have more than one `for ... in ...` clause within a single comprehension! This happens whenever you're iterating through more than one thing, for example every combination of two lists, or every item in a list of lists.

It may help to remember that the `for` clauses appear in the same order as they would in the equivalent nested for loops (outermost loop first); the only difference is that the expression for the outcome moves to the front.
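A sketch that writes the same combinations both ways, so the clause order can be compared line by line:

```python
numbers = [1, 2]
letters = ['A', 'B']

# nested for loops: outer loop first, inner loop second
loop_pairs = []
for n in numbers:          # outer
    for l in letters:      # inner
        loop_pairs.append((n, l))

# comprehension: same for clauses in the same order, outcome expression moved to the front
comp_pairs = [(n, l) for n in numbers for l in letters]

print(comp_pairs)                # [(1, 'A'), (1, 'B'), (2, 'A'), (2, 'B')]
print(loop_pairs == comp_pairs)  # True
```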

In [337]:
#Using a for loop, let's create all the combinations possible for choosing a number plus a letter from the lists below:
numbers = [1,2,3,4]
letters = ['A','B','C']

combos = []
for n in numbers:
    for l in letters:
        combos.append((n, l))

In [335]:
#Now let's do the same thing with a list comprehension:
numbers = [5,6,7,8,9]
letters = ['A','B']

[(n, l) for n in numbers for l in letters]


[(5, 'A'),
 (5, 'B'),
 (6, 'A'),
 (6, 'B'),
 (7, 'A'),
 (7, 'B'),
 (8, 'A'),
 (8, 'B'),
 (9, 'A'),
 (9, 'B')]

In [351]:
#Let's create a matrix to work with:
import numpy as np
#seed NumPy's pseudorandom number generator (so that we all get the same matrix)
np.random.seed(42)
matrix = np.random.randint(100,size=(3,3))
matrix

array([[51, 92, 14],
       [71, 60, 20],
       [82, 86, 74]])

In [370]:
matrix[0][1]

92

In [352]:
#Let's create a flat list of these numbers using a for loop:
l_ = []
for row in matrix:
    for col in row:
        l_.append(col)

In [353]:
l_

[51, 92, 14, 71, 60, 20, 82, 86, 74]

In [355]:
#Now let's do the same thing with a list comprehension, but only for those numbers larger than 20:
[col for row in matrix for col in row if col > 20]


[51, 92, 71, 60, 82, 86, 74]
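Putting a comprehension *inside* the outcome expression instead gives a list of lists, one per row, rather than one flat list. A sketch using a plain nested list with the same values as the matrix printed above (so it runs without NumPy):

```python
matrix = [[51, 92, 14],
          [71, 60, 20],
          [82, 86, 74]]

# flat: two for clauses in one comprehension -> a single list
flat = [col for row in matrix for col in row if col > 20]

# nested: the inner comprehension runs once per row -> a list of lists
per_row = [[col for col in row if col > 20] for row in matrix]

print(flat)     # [51, 92, 71, 60, 82, 86, 74]
print(per_row)  # [[51, 92], [71, 60], [82, 86, 74]]
```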

In [356]:
#Using a comprehension, let's create a dictionary recording the distances 
#that each runner completed during their weekly runs
#These are all in miles though, so let's convert those to kilometers first using 5 miles : 8 kilometers
#Let's ONLY keep the distances that are over 2 miles, though
runners = ['Katie','Aaron','Sheila','Edward']
distances_list = [[1.3,3.5,2.9,3.5,4.0],[6.3,7.0,7.5,5.8],[4.5,4.5,5.1,4.3,4.5],[2.6,2.5,2.9]]

In [373]:
distance_filt = [[d*8/5 for d in runs if d > 2]
                  for runs in distances_list]

In [374]:
dict(zip(runners, distance_filt))

{'Katie': [5.6, 4.64, 5.6, 6.4],
 'Aaron': [10.08, 11.2, 12.0, 9.28],
 'Sheila': [7.2, 7.2, 8.16, 6.88, 7.2],
 'Edward': [4.16, 4.0, 4.64]}

In [372]:
{runner:[distance/5*8 for distance in distances if distance >2]
        for runner,distances in zip(runners,distances_list)}

{'Katie': [5.6, 4.64, 5.6, 6.4],
 'Aaron': [10.08, 11.2, 12.0, 9.28],
 'Sheila': [7.2, 7.2, 8.16, 6.88, 7.2],
 'Edward': [4.16, 4.0, 4.64]}

#### Quick Practice: Try these nested comprehensions!

In [376]:
#Find all the different combinations of 
#a number from the first list, divided by a number from the second list, minus a number from the third list
#your answer should be a single flat list
first = [24,64,120]
second = [8,2,4]
third = [1,5]

[f/s-t for f in first for s in second for t in third]

In [381]:
#Create a dictionary holding the square, cube, and fourth power of each of the numbers below
#Your answer should be a dictionary of lists, where the last entry is 10: [100,1000,10000]
numbers = [1,2,3,4,5,6,10]
exponents = [2,3,4]
{n: [n**e for e in exponents] for n in numbers}

{1: [1, 1, 1],
 2: [4, 8, 16],
 3: [9, 27, 81],
 4: [16, 64, 256],
 5: [25, 125, 625],
 6: [36, 216, 1296],
 10: [100, 1000, 10000]}
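The Warm-Up D exercise from the start of the lesson can also be written as this kind of nested comprehension: a dictionary comprehension over the students, with a list comprehension (plus a filter) over each student's scores. A sketch that re-declares `exam_scores` and a simplified `grade_converter` (assuming it only ever receives ints), so it runs on its own:

```python
exam_scores = {'Emily': [72, 52, 48, 63],
               'Demi': [78, 65, 55, 75],
               'Kush': [25, 12, 20, 8],
               'Fortune': [45, 58, 62, 73],
               'Sarah': [43, '?', 38, 52]}

def grade_converter(g):
    # simplified version: assumes g is an int
    # boundaries from Warm-Up D: A >= 70, B 30-69, C 10-29, D < 10
    if g < 10:
        return 'D'
    elif g <= 29:
        return 'C'
    elif g <= 69:
        return 'B'
    return 'A'

# outer dict comprehension over students, inner list comprehension over scores;
# the trailing if skips anything that isn't an int (e.g. the '?' placeholder)
letter_grades = {name: [grade_converter(s) for s in scores if isinstance(s, int)]
                 for name, scores in exam_scores.items()}
print(letter_grades)
```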

<a id='with_pandas'></a>

### Application of Comprehensions with Pandas

---

`pd.DataFrame()` accepts a dictionary that maps column names to lists of values, so dictionary comprehensions in particular come in handy for building dataframes!

Here's an example below:

In [182]:
import pandas as pd

column_names = ['height','weight','age']
values = [[62, 54, 60, 50], [180, 120, 200, 100], [33, 40, 25, 28]]

In [183]:
{col:vals for col, vals in zip(column_names, values)}

{'height': [62, 54, 60, 50],
 'weight': [180, 120, 200, 100],
 'age': [33, 40, 25, 28]}

In [184]:
records = pd.DataFrame({col:vals for col, vals in zip(column_names, values)})
records

   height  weight  age
0      62     180   33
1      54     120   40
2      60     200   25
3      50     100   28
