In [249]:
import os

import numpy as np
import pandas as pd

data = pd.read_csv("thanksgiving-data/thanksgiving-2015-poll-data.csv", encoding="Latin-1")

# Question:
What percentage of people celebrate Thanksgiving from the supplied poll data?

In [250]:
# Boolean mask: True for respondents who answered "Yes".
celebrates = data["Do you celebrate Thanksgiving?"] == "Yes"
celebrate_ratio = celebrates.sum() / len(data)

# Later cells only look at respondents who celebrate Thanksgiving.
filtered_data = data[celebrates]

print(celebrate_ratio)

0.9262759924385633


# Conclusion:
About 93% of respondents celebrate Thanksgiving.
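
The same share can also be read off `value_counts(normalize=True)`. A minimal sketch on a made-up five-row frame (the toy answers are an assumption for illustration, not the poll data). Note that `value_counts` ignores missing answers by default, whereas dividing by `len(data)` counts them in the denominator.

```python
import pandas as pd

# Hypothetical toy answers standing in for the poll column.
toy = pd.DataFrame(
    {"Do you celebrate Thanksgiving?": ["Yes", "Yes", "No", "Yes", "Yes"]}
)

# normalize=True returns each answer's share of the non-null responses.
shares = toy["Do you celebrate Thanksgiving?"].value_counts(normalize=True)
print(shares["Yes"])
```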

# Question:
How many people ate each type of main dish for Thanksgiving, and how many people had gravy with their tofurkey?

In [251]:
print(data["What is typically the main dish at your Thanksgiving dinner?"].value_counts())

print(data["Do you typically have gravy?"][data["What is typically the main dish at your Thanksgiving dinner?"] == "Tofurkey"].value_counts())

Turkey                    859
Other (please specify)     35
Ham/Pork                   29
Tofurkey                   20
Chicken                    12
Roast beef                 11
I don't know                5
Turducken                   3
Name: What is typically the main dish at your Thanksgiving dinner?, dtype: int64
Yes    12
No      8
Name: Do you typically have gravy?, dtype: int64


# Conclusion:
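Turkey was by far the most common main dish (859 respondents); the full distribution is in the output above.

Of the 20 respondents whose main dish is typically tofurkey, 12 have gravy and 8 do not.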
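
For a view of gravy answers across every main dish at once, `pd.crosstab` is one option. A minimal sketch, assuming shortened column names and invented rows (both hypothetical, chosen only to keep the example small):

```python
import pandas as pd

# Hypothetical toy rows; "main_dish" and "gravy" are shortened stand-ins
# for the long poll column names.
toy = pd.DataFrame({
    "main_dish": ["Turkey", "Tofurkey", "Tofurkey", "Turkey", "Ham/Pork"],
    "gravy": ["Yes", "Yes", "No", "Yes", "No"],
})

# Rows are main dishes, columns are gravy answers, cells are counts.
table = pd.crosstab(toy["main_dish"], toy["gravy"])
print(table)
```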

# Question:
How many people didn't choose either pumpkin, pecan or apple pie for Thanksgiving?

In [252]:
apple_pie = filtered_data["Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Apple"]
pumpkin_pie = filtered_data["Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Pumpkin"]
pecan_pie = filtered_data["Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Pecan"]

print((pd.isnull(apple_pie) & pd.isnull(pumpkin_pie) & pd.isnull(pecan_pie)).value_counts())

False    876
True     104
dtype: int64


# Conclusion:
Of the 980 respondents who celebrate Thanksgiving, 104 selected none of apple, pumpkin or pecan pie.
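
The three-way null check can also be written as a row-wise `all` over the pie columns, which scales to more columns without chaining `&`. A minimal sketch with made-up column names and rows (assumptions for illustration only):

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: a cell is NaN when that pie was not selected.
toy = pd.DataFrame({
    "pie - Apple": ["Apple", np.nan, np.nan, "Apple"],
    "pie - Pumpkin": ["Pumpkin", np.nan, "Pumpkin", np.nan],
    "pie - Pecan": [np.nan, np.nan, np.nan, "Pecan"],
})

# True only for rows where every listed pie column is null.
no_listed_pie = toy.isnull().all(axis=1)
print(no_listed_pie.sum())
```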

# Question:
Find the mean age of poll respondents, using the lower bound of each age bracket.

In [253]:
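def get_lower_age(value):
    # Return the lower bound of an age bracket, or None when the answer is missing.
    if pd.isnull(value):
        return None
    elif "+" in value:
        # Open-ended bracket: keep the number before the "+".
        return int(value.replace('+', ''))
    else:
        # Otherwise the lower bound is the first space-separated token.
        return int(value.split(' ')[0])

ages = data["Age"].apply(get_lower_age)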

print(np.mean(ages))

39.38341463414634


# Conclusion:
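Using the lower bound of each age bracket, the mean age of respondents was ~39. Because every respondent is counted at the bottom of their bracket, this understates the true mean age.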
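
The lower bound can also be pulled out without a Python-level function by extracting the first run of digits. A minimal sketch, assuming bracket strings shaped like the toy values below (the exact labels are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical bracket labels, including one missing answer.
toy_ages = pd.Series(["18 - 29", "30 - 44", "60+", np.nan])

# The first run of digits in each label is the lower bound; NaN stays NaN.
lower = toy_ages.str.extract(r"(\d+)", expand=False).astype(float)
print(lower.mean())
```

`Series.mean` skips missing values by default, so the missing answer does not affect the result.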

# Question:
Find the mean household income of poll respondents, using the lower bound of each income bracket.

In [254]:
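def get_lower_income(value):
    # Return the lower bound of an income bracket, or None when there is no usable answer.
    if pd.isnull(value):
        return None
    elif "Prefer" in value:
        # Respondent declined to answer.
        return None
    else:
        # Lower bound is the first token, with "$" and "," removed.
        return int(value.split(' ')[0].replace('$', '').replace(',', ''))

income_column = data["How much total combined money did all members of your HOUSEHOLD earn last year?"]
incomes = income_column.apply(get_lower_income)
print(np.mean(incomes))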

74077.61529808774


# Conclusion:
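Using the lower bound of each income bracket, the mean household income of respondents was ~$74,078. Respondents who gave no answer or preferred not to answer are excluded, and the figure understates the true mean because each respondent is counted at the bottom of their bracket.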
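
A vectorised alternative is to clean the first token with string methods and let `pd.to_numeric` turn anything non-numeric into NaN. A minimal sketch, assuming bracket strings shaped like the toy values below (the labels are hypothetical, not copied from the poll):

```python
import numpy as np
import pandas as pd

# Hypothetical bracket labels, a declined answer, and a missing answer.
toy_incomes = pd.Series(
    ["$25,000 to $49,999", "$200,000 and up", "Prefer not to answer", np.nan]
)

# Take the first space-separated token, then strip "$" and ",".
first_token = toy_incomes.str.split(" ").str[0]
cleaned = first_token.str.replace(r"[$,]", "", regex=True)

# errors="coerce" maps non-numeric tokens such as "Prefer" to NaN.
lower = pd.to_numeric(cleaned, errors="coerce")
print(lower.mean())
```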

# Question:
Compare the distances people are willing to travel for Thanksgiving based on whether they are high or low earners.

In [258]:
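travel = filtered_data["How far will you travel for Thanksgiving?"]

# incomes is indexed like data; pandas aligns the boolean mask to the filtered rows.
# The comparisons are strict, so a bracket whose lower bound is exactly 150000 is not a high earner here.
high_earner_data = travel[incomes > 150000]
low_earner_data = travel[incomes < 50000]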

print(high_earner_data.value_counts())
print(low_earner_data.value_counts())

Thanksgiving is happening at my home--I won't travel at all                         49
Thanksgiving is local--it will take place in the town I live in                     25
Thanksgiving is out of town but not too far--it's a drive of a few hours or less    16
Thanksgiving is out of town and far away--I have to drive several hours or fly      12
Name: How far will you travel for Thanksgiving?, dtype: int64
Thanksgiving is happening at my home--I won't travel at all                         106
Thanksgiving is local--it will take place in the town I live in                      92
Thanksgiving is out of town but not too far--it's a drive of a few hours or less     64
Thanksgiving is out of town and far away--I have to drive several hours or fly       16
Name: How far will you travel for Thanksgiving?, dtype: int64


# Conclusion:
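High earners (lower income bound above $150,000) are more likely to stay at home for Thanksgiving: 49 of 102 (about 48%) host at home, compared with 106 of 278 (about 38%) of low earners (lower income bound below $50,000). Low earners are correspondingly more likely to celebrate locally or within a few hours' drive.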
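
Because the two groups differ in size, proportions are easier to compare than raw counts. A minimal sketch, assuming a hypothetical frame with a precomputed `lower_income` column and shortened travel answers:

```python
import pandas as pd

# Hypothetical toy rows; "lower_income" and "travel" are stand-in column names.
toy = pd.DataFrame({
    "lower_income": [200000, 175000, 25000, 10000, 0],
    "travel": ["home", "local", "home", "local", "local"],
})

# Select each group with .loc, then convert counts to within-group shares.
high = toy.loc[toy["lower_income"] > 150000, "travel"].value_counts(normalize=True)
low = toy.loc[toy["lower_income"] < 50000, "travel"].value_counts(normalize=True)

# Side-by-side table, aligned on the travel answer.
comparison = pd.DataFrame({"high": high, "low": low})
print(comparison)
```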

# Question:
Figure out the most common dessert people ate for Thanksgiving.

In [256]:
print(filtered_data[filtered_data.columns[39:51]].count())

Which of these desserts do you typically have at Thanksgiving dinner? Please select all that apply.   - Apple cobbler               110
Which of these desserts do you typically have at Thanksgiving dinner? Please select all that apply.   - Blondies                     16
Which of these desserts do you typically have at Thanksgiving dinner? Please select all that apply.   - Brownies                    128
Which of these desserts do you typically have at Thanksgiving dinner? Please select all that apply.   - Carrot cake                  72
Which of these desserts do you typically have at Thanksgiving dinner? Please select all that apply.   - Cheesecake                  191
Which of these desserts do you typically have at Thanksgiving dinner? Please select all that apply.   - Cookies                     204
Which of these desserts do you typically have at Thanksgiving dinner? Please select all that apply.   - Fudge                        43
Which of these desserts do you typically have at

# Conclusion:
The most common dessert people ate was ice cream.
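
Sorting the counts puts the most common dessert first instead of leaving it to be found by eye in a long listing. A minimal sketch on a made-up two-column frame (column names and rows are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: a cell is NaN when that dessert was not selected.
toy = pd.DataFrame({
    "dessert - Brownies": ["Brownies", np.nan, np.nan],
    "dessert - Cookies": ["Cookies", "Cookies", np.nan],
})

# count() tallies non-null cells per column; sort so the largest comes first.
counts = toy.count().sort_values(ascending=False)
print(counts.index[0], counts.iloc[0])
```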

# Question:
Figure out the most common main dish eaten with each dessert for Thanksgiving.

In [257]:
def get_most_common_main_dish(column_id):
    # Pair the main-dish column (index 2) with one dessert column.
    meal_choice_columns = [filtered_data.columns[2], filtered_data.columns[column_id]]

    # Count each (main dish, dessert) combination; groupby drops rows with a null key.
    complete_meal_counts = filtered_data.groupby(meal_choice_columns).size().reset_index(name='Count')

    # Keep the combination(s) with the highest count.
    return complete_meal_counts.loc[complete_meal_counts['Count'] == complete_meal_counts['Count'].max()]

dessert_column_ids = list(range(39, 51))

for dessert_column_id in dessert_column_ids:
    print(get_most_common_main_dish(dessert_column_id))

  What is typically the main dish at your Thanksgiving dinner?  \
5                                             Turkey             

  Which of these desserts do you typically have at Thanksgiving dinner? Please select all that apply.   - Apple cobbler  \
5                                      Apple cobbler                                                                      

   Count  
5     95  
  What is typically the main dish at your Thanksgiving dinner?  \
1                                             Turkey             

  Which of these desserts do you typically have at Thanksgiving dinner? Please select all that apply.   - Blondies  \
1                                           Blondies                                                                 

   Count  
1     15  
  What is typically the main dish at your Thanksgiving dinner?  \
5                                             Turkey             

  Which of these desserts do you typically have at Thanksgiving dinner? P

# Conclusion:
Unsurprisingly, turkey was the most common main dish served with every dessert. The most common combination overall was turkey with ice cream.
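
A more compact summary is one main dish per dessert, collected into a dictionary. A minimal sketch, assuming a hypothetical toy frame with shortened column names:

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows; NaN means the dessert was not selected.
toy = pd.DataFrame({
    "main_dish": ["Turkey", "Turkey", "Ham/Pork", "Turkey"],
    "dessert - Brownies": ["Brownies", np.nan, "Brownies", "Brownies"],
    "dessert - Cookies": [np.nan, np.nan, "Cookies", np.nan],
})

top_main_dish = {}
for dessert_column in ["dessert - Brownies", "dessert - Cookies"]:
    # Main dishes of the respondents who selected this dessert.
    chosen = toy.loc[toy[dessert_column].notnull(), "main_dish"]
    # idxmax() returns the label with the highest count.
    top_main_dish[dessert_column] = chosen.value_counts().idxmax()

print(top_main_dish)
```

When two main dishes tie for a dessert, `idxmax()` reports only one of them, whereas the filter on the maximum count used in the cell above keeps every tied row.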