# US Thanksgiving Dinner Data Exploration
In this project, we use a Jupyter notebook to analyze survey data on Thanksgiving dinner in the US.

In [None]:
import pandas as pd
data = pd.read_csv('data/thanksgiving.csv', encoding='Latin-1')
print(data.head(5))
print(data.columns)

In [None]:
# what answers are available for this question?
column = 'Do you celebrate Thanksgiving?'
x = data[column].value_counts()
print(x)

# keep only those who celebrate Thanksgiving (i.e., the answer is 'Yes');
# .copy() avoids a SettingWithCopyWarning when new columns are added later
data = data[data[column] == 'Yes'].copy()

In [None]:
# what main dishes do people typically have for dinner?
column = 'What is typically the main dish at your Thanksgiving dinner?'
x = data[column].value_counts()
print(x)

In [None]:
# do those who have 'Tofurkey' as the main dish typically have gravy?
column1 = 'What is typically the main dish at your Thanksgiving dinner?'
column2 = 'Do you typically have gravy?'
newdata = data[data[column1] == 'Tofurkey']
print(newdata[column2])

In [None]:
# how many people have Apple, Pecan or Pumpkin pie at Thanksgiving dinner?
apple_column   = 'Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Apple'
pecan_column   = 'Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Pecan'
pumpkin_column = 'Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Pumpkin'

apple_isnull   = data[apple_column].isnull()
pecan_isnull   = data[pecan_column].isnull()
pumpkin_isnull = data[pumpkin_column].isnull()

# True only when all three pie columns are empty, i.e. none of these pies was served
no_pies = apple_isnull & pecan_isnull & pumpkin_isnull
print(no_pies.value_counts())
# False means that a person had at least one of the three types of pie
# True means that they had none of the three types of pie
#
# Output:
# False    876
# True     104
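
The same count can be read more directly by asking whether any pie column is filled in, rather than whether all of them are empty. The snippet below is a minimal sketch on hypothetical toy data; the short column names and the four rows are assumptions made for illustration and are not taken from the survey.

```python
import pandas as pd

# Hypothetical toy data: a pie name means the box was ticked, None means it was not.
toy = pd.DataFrame({
    "Apple":   ["Apple", None, None, "Apple"],
    "Pecan":   [None, None, "Pecan", "Pecan"],
    "Pumpkin": [None, None, None, "Pumpkin"],
})

# True where at least one of the three pie columns is filled in.
ate_any_pie = toy.notnull().any(axis=1)

n_with_pie = int(ate_any_pie.sum())
n_without_pie = int((~ate_any_pie).sum())
print(n_with_pie, n_without_pie)
```

Naming the mask after what `True` means keeps the printed counts easy to read.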

In [None]:
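# Converting Age To Numeric
def str_to_int(value):
    # missing answers stay missing
    if pd.isnull(value):
        return None
    # replace a "+" with a space, then keep the first token (the lower bound of the bracket)
    value = value.replace("+", " ")
    value = value.split(" ")[0]
    return int(value)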

data['int_age'] = data['Age'].apply(str_to_int)
print(data['int_age'].describe())

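### Main findings:
- Mean age is 40.09 years.
- Standard deviation is about 15 years.
- Minimum age is 18.
- Maximum age is 60.
- Each value is the lower bound of the respondent's age bracket, so these statistics understate the true ages.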
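
To make the effect of the conversion concrete, here is a small sketch that applies the same logic to a few bracket labels. The labels in `samples` are assumed for illustration; the actual strings in the `Age` column may differ.

```python
import pandas as pd

def str_to_int(value):
    """Return the lower bound of an age bracket, or None if the answer is missing."""
    if pd.isnull(value):
        return None
    value = value.replace("+", " ")
    return int(value.split(" ")[0])

# Assumed bracket labels, for illustration only.
samples = ["18 - 29", "30 - 44", "45 - 59", "60+", None]
converted = [str_to_int(s) for s in samples]
print(converted)  # [18, 30, 45, 60, None]
```

Every respondent in a bracket is mapped to the same number, which is why the summary statistics above should be read as rough lower estimates.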

In [None]:
# Converting Income To Numeric
def inc_to_num(value):
    # missing or withheld answers stay missing
    if pd.isnull(value):
        return None
    elif value == "$200,000 and up":
        return 200000
    elif value == "Prefer not to answer":
        return None
    # strip "$" and "," and keep the first token (the lower bound of the bracket)
    value = value.replace("$", "").replace(",", "")
    value = value.split(" ")[0]
    return int(value)

column = 'How much total combined money did all members of your HOUSEHOLD earn last year?'
data['int_income'] = data[column].apply(inc_to_num)
print(data['int_income'].describe())

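### Main findings:
- Mean income is 75,965 US dollars.
- Standard deviation is 59,068 US dollars.
- Minimum income is 0.
- Maximum income is 200,000 US dollars.
- Each value is the lower bound of the respondent's income bracket, and the top bracket is capped at 200,000, so these statistics understate the true incomes.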
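
The sketch below runs the same conversion on a handful of income labels to show which answers become numbers and which become missing values. Apart from the two strings handled explicitly in the function, the labels in `samples` are assumptions about how the brackets are written.

```python
import pandas as pd

def inc_to_num(value):
    """Return the lower bound of an income bracket, or None if it is unknown."""
    if pd.isnull(value) or value == "Prefer not to answer":
        return None
    if value == "$200,000 and up":
        return 200000
    value = value.replace("$", "").replace(",", "")
    return int(value.split(" ")[0])

# Assumed bracket labels, for illustration only.
samples = ["$0 to $9,999", "$25,000 to $49,999", "$200,000 and up",
           "Prefer not to answer", None]
converted = [inc_to_num(s) for s in samples]
print(converted)  # [0, 25000, 200000, None, None]
```

Respondents who declined to answer are dropped from `describe()` because pandas skips missing values when computing summary statistics.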

In [None]:
# Correlating Travel Distance And Income
# note: rows with int_income exactly equal to 150000 fall into neither group
travel_column = 'How far will you travel for Thanksgiving?'

low_income = data[data['int_income'] < 150000]
print(low_income[travel_column].value_counts())

high_income = data[data['int_income'] > 150000]
print(high_income[travel_column].value_counts())

### Main findings:
- We compared the travel answers of two categories:
  - People with income < 150000 USD / year
  - People with income > 150000 USD / year
- The majority of people in both categories do not travel at all.
- The counts show no strong relationship between income and travel distance.
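
Raw counts are hard to compare when the two income groups differ in size, so it can help to look at shares within each group instead. The following is a minimal sketch using `pd.crosstab` with `normalize="index"`; the toy rows and the shortened travel answers (`home`, `local`, `far`) are hypothetical placeholders, not survey values.

```python
import pandas as pd

# Hypothetical toy data; no row sits exactly at the 150000 cut-off.
toy = pd.DataFrame({
    "int_income": [25000, 50000, 75000, 100000, 175000, 200000, 200000],
    "travel":     ["home", "home", "local", "home", "home", "far", "home"],
})

# Label each row by income group.
group = toy["int_income"].gt(150000).map(
    {False: "below 150000", True: "above 150000"}
)

# Row-normalised table: the shares in each income group sum to 1.
shares = pd.crosstab(group, toy["travel"], normalize="index")
print(shares)
```

If the rows of such a table look similar for both groups, that supports the reading that income and travel distance are not strongly related.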

In [None]:
# Linking Friendship And Age
column1 = 'Have you ever tried to meet up with hometown friends on Thanksgiving night?'
column2 = 'Have you ever attended a "Friendsgiving?"'

# pivot table showing the average age of respondents for each category
table = data.pivot_table(values='int_age', index=column1, columns=column2)
table

In [None]:
# pivot table showing the average income of respondents for each category
table = data.pivot_table(values='int_income', index=column1, columns=column2)
table

### Main findings:

According to the age and income pivot tables, respondents who have not attended a "Friendsgiving" have higher incomes on average. Those who have attended one and have also tried to meet up with hometown friends have lower incomes on average. This only suggests that lower-income respondents are more likely to answer "Yes" to both questions than "No" to either.
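
For readers unfamiliar with `pivot_table`, the sketch below shows what each cell contains: by default it is the mean of `values` for that combination of answers. The toy frame and its short column names (`met_friends`, `friendsgiving`) are hypothetical. A second table with `aggfunc="count"` is included as a suggestion, since an average built from very few respondents is not reliable.

```python
import pandas as pd

# Hypothetical toy data with shortened column names.
toy = pd.DataFrame({
    "met_friends":   ["Yes", "Yes", "No", "No", "No"],
    "friendsgiving": ["Yes", "Yes", "Yes", "No", "No"],
    "int_income":    [25000, 50000, 75000, 100000, 200000],
})

# Default aggregation is the mean of int_income in each cell.
table = toy.pivot_table(values="int_income",
                        index="met_friends", columns="friendsgiving")

# Number of respondents behind each average.
counts = toy.pivot_table(values="int_income",
                         index="met_friends", columns="friendsgiving",
                         aggfunc="count")
print(table)
print(counts)
```

Combinations with no respondents appear as `NaN` in both tables.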

### Some potential next steps:

- Figure out the most common dessert people eat.
- Figure out the most common complete meal people eat.
- Identify how many people work on Thanksgiving.
- Find regional patterns in the dinner menus.
- Find age, gender, and income based patterns in dinner menus.