# 0. Problem set 1 make-up: List comprehension

**Total points**: 9

This optional problem set will replace your score on problem 2 of pset 1, which included three questions on basic list comprehension (worth 3 points each). This make-up assignment also covers list comprehension and contains the same number of problems worth the same number of points. These problems are meant to be slightly more difficult applications of list comprehension than those in pset 1, and they also draw on your data manipulation skills in pandas.

As a reminder, basic list comprehension in Python with an if statement looks like this: 
```python
[transform(thing) for thing in somelist if condition] # if-else looks a little different
```
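
The comment above notes that the if-else form looks a little different. A minimal sketch of the contrast, using a hypothetical `wages` list that is not part of the problem set: a trailing `if` filters elements out, while a conditional expression placed before `for` keeps every element and chooses between two values.

```python
# Hypothetical example data, not part of the problem set
wages = [14.5, 19.2, 16.0, 20.0]

# Filter form: the trailing `if` drops elements that fail the condition
high_wages = [w for w in wages if w >= 16]

# If-else form: the conditional expression comes first and keeps every element
wage_labels = ["high" if w >= 16 else "low" for w in wages]

print(high_wages)   # [19.2, 16.0, 20.0]
print(wage_labels)  # ['low', 'high', 'high', 'high']
```

Note that the filter form can return a shorter list than its input, whereas the if-else form always returns one value per input element, which is what you need when creating a new DataFrame column.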

In contrast, standard for loops in Python (for reference here only) look like this:
```python
for thing in someiterator:
    if condition:
        transform(thing)
```

We practice list comprehension because it is typically faster than an equivalent loop that appends to a list and because it fits naturally into data manipulation in pandas, which will be especially useful when we get to text analysis and supervised machine learning.

Since this is a make-up, using standard for loops for the following problems will earn you zero points.

## 0.1 Import packages and simulate data

In [18]:
import numpy as np
import pandas as pd

np.random.seed(1129) # set seed for reproducibility

shop_names = ['Compass', 'Starbucks', 'Baked and Wired', 'Peets', 
              'Blue Bottle', 'Saxbys']
coffee_df = pd.concat([pd.DataFrame({'shop_name': shop_names,
                         'opening_time': np.random.choice(["8:00 AM", "9:00 AM", "10:00 AM"],
                                                       len(shop_names),
                                                       replace = True),
                         'closing_time': np.random.choice(["5:00 PM", "6:00 PM", 
                                                          "7:00 PM"],
                                                         len(shop_names),
                                                          replace = True),
                         'hourly_wage': np.random.uniform(14, 20,
                                                          len(shop_names)),
                        'year': 2019}),
                      pd.DataFrame({'shop_name': shop_names,
                         'opening_time': np.random.choice(["8:00 AM", "9:00 AM", "10:00 AM"],
                                                       len(shop_names),
                                                       replace = True),
                         'closing_time': np.random.choice(["3:00 PM", "4:00 PM",
                                                           "6:00 PM", 
                                                          "7:00 PM"],
                                                         len(shop_names),
                                                          replace = True),
                         'hourly_wage': np.random.uniform(14, 20,
                                                          len(shop_names)),
                        'year': 2021})]).sort_values(by = 'shop_name')
                      

coffee_df

Unnamed: 0,shop_name,opening_time,closing_time,hourly_wage,year
2,Baked and Wired,10:00 AM,5:00 PM,16.308214,2019
2,Baked and Wired,10:00 AM,3:00 PM,19.222221,2021
4,Blue Bottle,8:00 AM,7:00 PM,19.231655,2019
4,Blue Bottle,9:00 AM,7:00 PM,19.951942,2021
0,Compass,10:00 AM,6:00 PM,16.169551,2019
0,Compass,10:00 AM,7:00 PM,15.316723,2021
3,Peets,10:00 AM,7:00 PM,16.240656,2019
3,Peets,8:00 AM,7:00 PM,19.992475,2021
5,Saxbys,9:00 AM,6:00 PM,14.598841,2019
5,Saxbys,8:00 AM,3:00 PM,17.238755,2021
1,Starbucks,9:00 AM,7:00 PM,19.470102,2019
1,Starbucks,8:00 AM,7:00 PM,19.728897,2021


## 1. Use list comprehension with if-else to create binary indicator (3 points)

- Create a new column in `coffee_df` that takes the value of `True` if `opening_time` equals 10:00 AM and is `False` otherwise.
- Check that the binary indicator you just made works properly by displaying both the indicator and the `opening_time` column.

In [19]:
# your code here
coffee_df["10am_opening"] = [True if time == "10:00 AM" else False for time in coffee_df["opening_time"]]

coffee_df[['shop_name', 'opening_time', '10am_opening']]

Unnamed: 0,shop_name,opening_time,10am_opening
2,Baked and Wired,10:00 AM,True
2,Baked and Wired,10:00 AM,True
4,Blue Bottle,8:00 AM,False
4,Blue Bottle,9:00 AM,False
0,Compass,10:00 AM,True
0,Compass,10:00 AM,True
3,Peets,10:00 AM,True
3,Peets,8:00 AM,False
5,Saxbys,9:00 AM,False
5,Saxbys,8:00 AM,False
1,Starbucks,9:00 AM,False
1,Starbucks,8:00 AM,False
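
Beyond inspecting the two columns by eye, one possible way to check an indicator like this is to compare it against an independent computation. The sketch below uses a small hypothetical `demo_df` standing in for `coffee_df`; the vectorized comparison is used only as a check, not as a substitute for the list comprehension the problem asks for.

```python
import pandas as pd

# Small hypothetical stand-in for coffee_df
demo_df = pd.DataFrame(
    {"opening_time": ["10:00 AM", "8:00 AM", "9:00 AM", "10:00 AM"]}
)

# Indicator built with an if-else list comprehension, as in the problem
demo_df["10am_opening"] = [
    True if t == "10:00 AM" else False for t in demo_df["opening_time"]
]

# Vectorized comparison used only as an independent check
vectorized = demo_df["opening_time"] == "10:00 AM"
matches = (demo_df["10am_opening"] == vectorized).all()
print(matches)

# Cross-tabulation: each opening time should fall under a single indicator value
print(pd.crosstab(demo_df["opening_time"], demo_df["10am_opening"]))
```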


## 2. Use list comprehension to apply function over column (3 points)

- Apply the following function, `check_name_length`, to show whether each unique shop name in `coffee_df` is one word in length or longer.
    - Leave the function as-is, and display the result once for each distinct shop name.

In [20]:
def check_name_length(one_shop: str):
    '''Display whether the input shop name is one word in length or longer.
    Input:
        one_shop (str): name of a shop
    Output:
        None (print statement)'''
    
    sep_shop = one_shop.split(" ") 
    
    if len(sep_shop) > 1:
        print(one_shop + ": Shop name has >1 word")
    else:
        print(one_shop + ": Shop name has 1 word")

In [21]:
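# your code here
# Apply the function to each distinct shop name with a list comprehension.
# check_name_length prints and returns None, so the resulting list of None
# values is assigned to `_` to keep it out of the cell output.
_ = [check_name_length(shop) for shop in coffee_df["shop_name"].unique()]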

Baked and Wired: Shop name has >1 word
Blue Bottle: Shop name has >1 word
Compass: Shop name has 1 word
Peets: Shop name has 1 word
Saxbys: Shop name has 1 word
Starbucks: Shop name has 1 word
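
Because `check_name_length` prints its message and returns `None`, a list comprehension over it produces a list of `None` values alongside the printed lines. As an aside, the sketch below shows a hypothetical variant, `describe_name_length`, that returns the message instead, so the comprehension yields a usable list. It is an illustration only; the problem itself asks you to leave the original function as-is.

```python
def describe_name_length(one_shop: str) -> str:
    """Hypothetical variant of check_name_length that returns the message."""
    n_words = len(one_shop.split(" "))
    if n_words > 1:
        return one_shop + ": Shop name has >1 word"
    return one_shop + ": Shop name has 1 word"


# Hypothetical data containing a duplicate name
shop_names = ["Compass", "Blue Bottle", "Compass", "Peets"]

# dict.fromkeys keeps the first occurrence of each name, preserving order
unique_names = list(dict.fromkeys(shop_names))

messages = [describe_name_length(shop) for shop in unique_names]
print(messages)
```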


## 3. Use list comprehension to subset DataFrame (3 points)

The reshaping code below is provided for you to transform `coffee_df` from long to wide format. Use list comprehension to subset `coffee_df_wide` to the shop name column and the columns reflecting shop hours and wages in 2021 only (exclude anything from 2019).

In [22]:
# pivot data from long to wide format
coffee_df_wide = pd.pivot(coffee_df, 
                          index= 'shop_name',
                          columns= 'year', 
                          values = ['opening_time', 'closing_time', 'hourly_wage']).reset_index()

# clean up column names so single level (not MultiIndex)
coffee_df_wide.columns = [str(x) + "_" + str(y) for 
                          x, y in coffee_df_wide.columns]
coffee_df_wide.rename(columns = {'shop_name_': 'shop_name'}, inplace = True)
coffee_df_wide

Unnamed: 0,shop_name,opening_time_2019,opening_time_2021,closing_time_2019,closing_time_2021,hourly_wage_2019,hourly_wage_2021
0,Baked and Wired,10:00 AM,10:00 AM,5:00 PM,3:00 PM,16.308214,19.222221
1,Blue Bottle,8:00 AM,9:00 AM,7:00 PM,7:00 PM,19.231655,19.951942
2,Compass,10:00 AM,10:00 AM,6:00 PM,7:00 PM,16.169551,15.316723
3,Peets,10:00 AM,8:00 AM,7:00 PM,7:00 PM,16.240656,19.992475
4,Saxbys,9:00 AM,8:00 AM,6:00 PM,3:00 PM,14.598841,17.238755
5,Starbucks,9:00 AM,8:00 AM,7:00 PM,7:00 PM,19.470102,19.728897
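
The column clean-up step in the provided code relies on the fact that iterating over MultiIndex columns yields one tuple per column, which the comprehension unpacks into `x` and `y`. The sketch below reproduces this on a tiny hypothetical `long_df`; the second comprehension is an assumed variant (not the provided code) that uses if-else to avoid the trailing underscore, which would make the later `rename` unnecessary.

```python
import pandas as pd

# Tiny hypothetical long-format frame, not the problem set data
long_df = pd.DataFrame(
    {
        "shop_name": ["Compass", "Compass"],
        "year": [2019, 2021],
        "opening_time": ["10:00 AM", "9:00 AM"],
    }
)

# Passing a list to `values` produces two-level (MultiIndex) columns
demo_wide = long_df.pivot(
    index="shop_name", columns="year", values=["opening_time"]
).reset_index()

# Each column label is a tuple such as ('opening_time', 2019) or ('shop_name', '')
flat_names = [str(x) + "_" + str(y) for x, y in demo_wide.columns]
print(flat_names)  # ['shop_name_', 'opening_time_2019', 'opening_time_2021']

# Assumed variant: skip the underscore when the second level is empty
clean_names = [
    str(x) if y == "" else str(x) + "_" + str(y) for x, y in demo_wide.columns
]
print(clean_names)  # ['shop_name', 'opening_time_2019', 'opening_time_2021']
```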


In [23]:
# your code here
# Subset to shop_name plus the columns with 2021 in the name
coffee_df_wide_2021 = coffee_df_wide[[col for col in coffee_df_wide.columns if "2021" in col or "shop_name" in col]]
coffee_df_wide_2021

Unnamed: 0,shop_name,opening_time_2021,closing_time_2021,hourly_wage_2021
0,Baked and Wired,10:00 AM,3:00 PM,19.222221
1,Blue Bottle,9:00 AM,7:00 PM,19.951942
2,Compass,10:00 AM,7:00 PM,15.316723
3,Peets,8:00 AM,7:00 PM,19.992475
4,Saxbys,8:00 AM,3:00 PM,17.238755
5,Starbucks,8:00 AM,7:00 PM,19.728897
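
A substring test such as `"2021" in col` works here because no other column name happens to contain that text. If that could not be guaranteed, a stricter match on the suffix is one option. The sketch below wraps this in a hypothetical helper, `columns_for_year`, applied to a small made-up `demo_wide`; both names are assumptions for illustration.

```python
import pandas as pd

# Small hypothetical wide-format frame, not the problem set data
demo_wide = pd.DataFrame(
    {
        "shop_name": ["Compass"],
        "opening_time_2019": ["10:00 AM"],
        "opening_time_2021": ["10:00 AM"],
        "hourly_wage_2019": [16.17],
        "hourly_wage_2021": [15.32],
    }
)


def columns_for_year(df, year, id_col="shop_name"):
    """Hypothetical helper: keep id_col plus columns ending in _<year>."""
    suffix = "_" + str(year)
    return [col for col in df.columns if col == id_col or col.endswith(suffix)]


cols_2021 = columns_for_year(demo_wide, 2021)
print(cols_2021)  # ['shop_name', 'opening_time_2021', 'hourly_wage_2021']

demo_2021 = demo_wide[cols_2021]
```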
