# Exploring Thanksgiving

In [2]:
import pandas as pd
data = pd.read_csv("thanksgiving.csv",encoding="Latin-1")
data[0:5]



Unnamed: 0,RespondentID,Do you celebrate Thanksgiving?,What is typically the main dish at your Thanksgiving dinner?,What is typically the main dish at your Thanksgiving dinner? - Other (please specify),How is the main dish typically cooked?,How is the main dish typically cooked? - Other (please specify),What kind of stuffing/dressing do you typically have?,What kind of stuffing/dressing do you typically have? - Other (please specify),What type of cranberry saucedo you typically have?,What type of cranberry saucedo you typically have? - Other (please specify),...,Have you ever tried to meet up with hometown friends on Thanksgiving night?,"Have you ever attended a ""Friendsgiving?""",Will you shop any Black Friday sales on Thanksgiving Day?,Do you work in retail?,Will you employer make you work on Black Friday?,How would you describe where you live?,Age,What is your gender?,How much total combined money did all members of your HOUSEHOLD earn last year?,US Region
0,4337954960,Yes,Turkey,,Baked,,Bread-based,,,,...,Yes,No,No,No,,Suburban,18 - 29,Male,"$75,000 to $99,999",Middle Atlantic
1,4337951949,Yes,Turkey,,Baked,,Bread-based,,Other (please specify),Homemade cranberry gelatin ring,...,No,No,Yes,No,,Rural,18 - 29,Female,"$50,000 to $74,999",East South Central
2,4337935621,Yes,Turkey,,Roasted,,Rice-based,,Homemade,,...,Yes,Yes,Yes,No,,Suburban,18 - 29,Male,"$0 to $9,999",Mountain
3,4337933040,Yes,Turkey,,Baked,,Bread-based,,Homemade,,...,Yes,No,No,No,,Urban,30 - 44,Male,"$200,000 and up",Pacific
4,4337931983,Yes,Tofurkey,,Baked,,Bread-based,,Canned,,...,Yes,No,No,No,,Urban,30 - 44,Male,"$100,000 to $124,999",Pacific


In [3]:
data.columns

Index(['RespondentID', 'Do you celebrate Thanksgiving?',
       'What is typically the main dish at your Thanksgiving dinner?',
       'What is typically the main dish at your Thanksgiving dinner? - Other (please specify)',
       'How is the main dish typically cooked?',
       'How is the main dish typically cooked? - Other (please specify)',
       'What kind of stuffing/dressing do you typically have?',
       'What kind of stuffing/dressing do you typically have? - Other (please specify)',
       'What type of cranberry saucedo you typically have?',
       'What type of cranberry saucedo you typically have? - Other (please specify)',
       'Do you typically have gravy?',
       'Which of these side dishes aretypically served at your Thanksgiving dinner? Please select all that apply. - Brussel sprouts',
       'Which of these side dishes aretypically served at your Thanksgiving dinner? Please select all that apply. - Carrots',
       'Which of these side dishes aretypically served

## Filter out non-celebrants

In [4]:
data["Do you celebrate Thanksgiving?"].value_counts()

Yes    980
No      78
Name: Do you celebrate Thanksgiving?, dtype: int64

In [5]:
data = data[data["Do you celebrate Thanksgiving?"] == "Yes"]
data["Do you celebrate Thanksgiving?"].value_counts()

Yes    980
Name: Do you celebrate Thanksgiving?, dtype: int64

## Explore Main Dishes

In [6]:
data["What is typically the main dish at your Thanksgiving dinner?"].value_counts()

Turkey                    859
Other (please specify)     35
Ham/Pork                   29
Tofurkey                   20
Chicken                    12
Roast beef                 11
I don't know                5
Turducken                   3
Name: What is typically the main dish at your Thanksgiving dinner?, dtype: int64

In [7]:
data[data["What is typically the main dish at your Thanksgiving dinner?"] == "Tofurkey"]["Do you typically have gravy?"]

4      Yes
33     Yes
69      No
72      No
77     Yes
145    Yes
175    Yes
218     No
243    Yes
275     No
393    Yes
399    Yes
571    Yes
594    Yes
628     No
774     No
820     No
837    Yes
860     No
953    Yes
Name: Do you typically have gravy?, dtype: object

## Exploring Pies

In [8]:
data["Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Apple"].value_counts()

Apple    514
Name: Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Apple, dtype: int64

In [52]:
apple_isnull = data["Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Apple"].isnull()
pumpkin_isnull = data["Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Pumpkin"].isnull()
pecan_isnull = data["Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Pecan"].isnull()
no_pies = apple_isnull & pumpkin_isnull & pecan_isnull
ate_pies = ~no_pies
data['ate_pies'] = ate_pies
ate_pies.value_counts()

True     876
False    104
dtype: int64
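
The same flag can be built in one step instead of three separate null masks. This is a minimal sketch on made-up toy data; the column names `pie_apple`, `pie_pumpkin`, and `pie_pecan` are shortened, hypothetical stand-ins for the long survey questions.

```python
import pandas as pd

# Toy stand-in for the survey: each pie column holds the pie name if selected, else missing.
# Column names are shortened, hypothetical placeholders for the long survey questions.
toy = pd.DataFrame({
    "pie_apple":   ["Apple", None, None, "Apple"],
    "pie_pumpkin": [None, "Pumpkin", None, "Pumpkin"],
    "pie_pecan":   [None, None, None, "Pecan"],
})

pie_columns = ["pie_apple", "pie_pumpkin", "pie_pecan"]

# True when at least one of the pie columns is non-null for the row.
toy["ate_pies"] = toy[pie_columns].notna().any(axis=1)
ate_pies_flags = toy["ate_pies"].tolist()
print(ate_pies_flags)
```

Only the third toy row has no pie selected, so it is the only row flagged `False`.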

## Looking at Age

In [10]:
def convert_to_int(s):
    if pd.isnull(s):
        return None
    s = s.split(' ')[0]
    s = s.replace('+','')
    return int(s)

data["int_age"] = data["Age"].apply(convert_to_int)
data["int_age"].describe()

count    947.000000
mean      40.089757
std       15.352014
min       18.000000
25%       30.000000
50%       45.000000
75%       60.000000
max       60.000000
Name: int_age, dtype: float64

## Findings

The age groups of participants look fairly evenly distributed. Keep in mind that the summary statistics skew low, because we kept only the lower bound of each age group.
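
One way to reduce that downward skew would be to use the midpoint of each bracket rather than its lower bound. The sketch below is only an illustration: `bracket_midpoint` and its `open_ended_width` parameter are hypothetical, the `'45 - 59'` and `'60+'` labels are assumed from the summary statistics above, and the 10-year width for the open-ended top bracket is an arbitrary assumption.

```python
import re

def bracket_midpoint(label, open_ended_width=10):
    """Return the midpoint of an age bracket such as '18 - 29' or '60+'.

    open_ended_width is an assumed width for the open-ended top bracket.
    """
    if not isinstance(label, str):
        # Missing answers (None or NaN) stay missing.
        return None
    numbers = [int(n) for n in re.findall(r"\d+", label)]
    if len(numbers) == 2:
        low, high = numbers
        return (low + high) / 2
    if len(numbers) == 1:
        # Open-ended bracket such as '60+': assume it spans open_ended_width years.
        return numbers[0] + open_ended_width / 2
    return None

midpoints = [bracket_midpoint(s) for s in ["18 - 29", "30 - 44", "45 - 59", "60+", None]]
print(midpoints)
```

On the real data this could be applied with `data["Age"].apply(bracket_midpoint)`, in the same way `convert_to_int` is applied above.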

In [11]:
def to_income(s):
    if pd.isnull(s):
        return None
    s = s.split()[0]
    if s == "Prefer":
        return None
    s = s.replace(',','')
    s = s.replace('$','')
    return int(s)

data["int_income"] = data["How much total combined money did all members of your HOUSEHOLD earn last year?"].apply(to_income)
data["int_income"].describe()    

count       829.000000
mean      75965.018094
std       59068.636748
min           0.000000
25%       25000.000000
50%       75000.000000
75%      100000.000000
max      200000.000000
Name: int_income, dtype: float64

Because we took the lower bound of each income category a participant could select, these income figures skew lower than the real numbers. The column does not represent the true income of the participants, but it can be used to summarize by income bracket.

In [12]:
data[data["int_income"] < 150000]["How far will you travel for Thanksgiving?"].value_counts()

Thanksgiving is happening at my home--I won't travel at all                         281
Thanksgiving is local--it will take place in the town I live in                     203
Thanksgiving is out of town but not too far--it's a drive of a few hours or less    150
Thanksgiving is out of town and far away--I have to drive several hours or fly       55
Name: How far will you travel for Thanksgiving?, dtype: int64

In [13]:
data[data["int_income"] >= 150000 ]["How far will you travel for Thanksgiving?"].value_counts()

Thanksgiving is happening at my home--I won't travel at all                         66
Thanksgiving is local--it will take place in the town I live in                     34
Thanksgiving is out of town but not too far--it's a drive of a few hours or less    25
Thanksgiving is out of town and far away--I have to drive several hours or fly      15
Name: How far will you travel for Thanksgiving?, dtype: int64

## Findings
It looks like people above and below the $150,000 income mark travel for Thanksgiving at similar rates.
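
Because the two groups differ a lot in size, the comparison is easier to read as shares than as raw counts. The sketch below reuses the counts from the two outputs above; the short row labels and the `under_150k` / `150k_and_up` column names are shorthand introduced here, not names from the dataset.

```python
import pandas as pd

# Counts copied from the two value_counts() outputs above.
# Row labels are shortened versions of the survey answers.
labels = ["At home", "Local", "Out of town, few hours", "Out of town, far away"]
travel = pd.DataFrame(
    {
        "under_150k": [281, 203, 150, 55],
        "150k_and_up": [66, 34, 25, 15],
    },
    index=labels,
)

# Divide each column by its total so the two groups are directly comparable.
travel_share = (travel / travel.sum()).round(3)
print(travel_share)
```

On the full dataset, passing `normalize=True` to `value_counts()` should give the same shares directly.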

In [14]:
data.pivot_table(index="Have you ever tried to meet up with hometown friends on Thanksgiving night?",columns='Have you ever attended a "Friendsgiving?"',values="int_age")

"Have you ever attended a ""Friendsgiving?""",No,Yes
Have you ever tried to meet up with hometown friends on Thanksgiving night?,Unnamed: 1_level_1,Unnamed: 2_level_1
No,42.283702,37.010526
Yes,41.47541,33.976744


In [15]:
data.pivot_table(index="Have you ever tried to meet up with hometown friends on Thanksgiving night?",columns='Have you ever attended a "Friendsgiving?"',values="int_income")

"Have you ever attended a ""Friendsgiving?""",No,Yes
Have you ever tried to meet up with hometown friends on Thanksgiving night?,Unnamed: 1_level_1,Unnamed: 2_level_1
No,78914.549654,72894.736842
Yes,78750.0,66019.736842


## Findings
Meeting up with hometown friends on Thanksgiving night and attending a "Friendsgiving" both tend to be more common among younger and lower-income respondents.
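
Group means are easier to trust when the group sizes are visible next to them. The sketch below uses `groupby` with two aggregations instead of `pivot_table`, on made-up toy rows; `met_friends` and `friendsgiving` are hypothetical short names for the two survey questions.

```python
import pandas as pd

# Made-up toy rows; column names are shortened, hypothetical placeholders.
toy = pd.DataFrame({
    "met_friends":   ["Yes", "Yes", "No", "No", "No"],
    "friendsgiving": ["Yes", "Yes", "No", "No", "Yes"],
    "int_age":       [18, 30, 45, 60, 30],
})

# One row per combination of answers, with the mean age and the number of respondents.
summary = toy.groupby(["met_friends", "friendsgiving"])["int_age"].agg(["mean", "count"])
print(summary)
```

A cell with a small `count` is a hint that its mean should be read with caution.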

# Most Common Dessert

In [16]:
dessert_counts = {}   
for column in data.iloc[:,39:51]:
    dessert_name = column[104:]
    dessert_counts[dessert_name] = data[column].count()
dessert_counts

{'Apple cobbler': 110,
 'Blondies': 16,
 'Brownies': 128,
 'Carrot cake': 72,
 'Cheesecake': 191,
 'Cookies': 204,
 'Fudge': 43,
 'Ice cream': 266,
 'None': 295,
 'Other (please specify)': 134,
 'Other (please specify).1': 134,
 'Peach cobbler': 103}

## Findings
More respondents chose "None" than any single dessert on the list. Among the desserts actually eaten, ice cream is the most common.
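
The fixed offset `column[104:]` depends on the exact length of the question text. A less brittle sketch, assuming every option column follows a "question - option" pattern, splits on the last `" - "` instead. The toy column names below are shortened and hypothetical.

```python
import pandas as pd

# Hypothetical, shortened column names that follow a "question - option" pattern.
toy = pd.DataFrame({
    "Which of these desserts do you typically have? - Cookies": ["Cookies", None, "Cookies"],
    "Which of these desserts do you typically have? - Ice cream": ["Ice cream", "Ice cream", "Ice cream"],
    "Which of these desserts do you typically have? - None": [None, None, None],
})

dessert_counts = {}
for column in toy.columns:
    # Take the text after the last " - " instead of slicing at a fixed character offset.
    dessert_name = column.rsplit(" - ", 1)[-1]
    # count() ignores missing values, so it counts respondents who selected the option.
    dessert_counts[dessert_name] = int(toy[column].count())

print(dessert_counts)
```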

# Most Common Complete Meal

In [17]:
def get_complete_meal(row):
    
    meal = ''
    main = row['What is typically the main dish at your Thanksgiving dinner?']
    if main == 'Other (please specify)': main = row['What is typically the main dish at your Thanksgiving dinner? - Other (please specify)']
    cooked = row['How is the main dish typically cooked?']
    if cooked == 'Other (please specify)': cooked = row['How is the main dish typically cooked? - Other (please specify)']
    meal = str(cooked) + ' ' + str(main) 
    stuffing = row['What kind of stuffing/dressing do you typically have?']
    meal += ' with ' + str(stuffing) + ' stuffing'
    cranberry = row['What type of cranberry saucedo you typically have?']
    meal += ' and ' + str(cranberry) + ' cranberry sauce'
    if row['Do you typically have gravy?'] == 'Yes':
        meal += ' and gravy'
    return meal
meals = data.apply(get_complete_meal,axis=1)
meals.value_counts()


Baked Turkey with Bread-based stuffing and Canned cranberry sauce and gravy                                                                                    224
Roasted Turkey with Bread-based stuffing and Canned cranberry sauce and gravy                                                                                  161
Roasted Turkey with Bread-based stuffing and Homemade cranberry sauce and gravy                                                                                106
Baked Turkey with Bread-based stuffing and Homemade cranberry sauce and gravy                                                                                   93
Baked Turkey with Bread-based stuffing and None cranberry sauce and gravy                                                                                       43
Roasted Turkey with Bread-based stuffing and None cranberry sauce and gravy                                                                                     28
Fried Turkey with Brea

The most common meal, not counting side dishes, pies, or desserts, is baked turkey with bread-based stuffing, canned cranberry sauce, and gravy.

In [18]:
def get_desserts(row):
    dessert = []   
    for column in data.iloc[:,39:51]:
        dessert_name = column[104:]
        if not pd.isnull(row[column]):
            dessert.append(dessert_name)

    return dessert

def get_sides(row):
    sides = []   
    for column in data.iloc[:,11:25]:
        side_name = column[108:]
        if not pd.isnull(row[column]):
            sides.append(side_name)

    return sides

def get_pies(row):
    pies = []   
    for column in data.iloc[:,26:38]:
        pie_name = column[99:]
        if not pd.isnull(row[column]):
            pies.append(pie_name)

    return pies

def get_complete_meal_with_extras(row):
    
    meal = ''
    main = row['What is typically the main dish at your Thanksgiving dinner?']
    if main == 'Other (please specify)': main = row['What is typically the main dish at your Thanksgiving dinner? - Other (please specify)']
    cooked = row['How is the main dish typically cooked?']
    if cooked == 'Other (please specify)': cooked = row['How is the main dish typically cooked? - Other (please specify)']
    meal = str(cooked) + ' ' + str(main) 
    stuffing = row['What kind of stuffing/dressing do you typically have?']
    meal += ' with ' + str(stuffing) + ' stuffing'
    cranberry = row['What type of cranberry saucedo you typically have?']
    meal += ' and ' + str(cranberry) + ' cranberry sauce'
    if row['Do you typically have gravy?'] == 'Yes':
        meal += ' and gravy'
    sides = get_sides(row)
    meal += ' ' + 'Sides: ' + ' '.join(sides)
    pies = get_pies(row)
    meal += ' ' + 'Pies: ' + ' '.join(pies)
    desserts = get_desserts(row)
    meal += ' ' + 'Desserts: ' + ' '.join(desserts)
    return meal
meals = data.apply(get_complete_meal_with_extras,axis=1)
meals.value_counts()

nan nan with nan stuffing and nan cranberry sauce Sides:  Pies:  Desserts:                                                                                                                                                                                                                                                                                                                6
Roasted Turkey with Bread-based stuffing and Homemade cranberry sauce and gravy Sides:  Pies:  Desserts:                                                                                                                                                                                                                                                                                  3
Roasted Turkey with Bread-based stuffing and Homemade cranberry sauce and gravy Sides: Corn Green beans/green bean casserole Mashed potatoes Rolls/biscuits Yams/sweet potato casserole Pies: Pumpkin Desserts: None                            

Not a lot of people have the exact same meal when including sides, pies, and desserts.

## Main Dishes and Sides

In [19]:
def get_complete_dishes(row):
    
    meal = ''
    main = row['What is typically the main dish at your Thanksgiving dinner?']
    if main == 'Other (please specify)': main = row['What is typically the main dish at your Thanksgiving dinner? - Other (please specify)']
    meal = str(main)
    sides = get_sides(row)
    meal += ' ' + 'Sides: ' + ' '.join(sides)
    return meal
meals = data.apply(get_complete_dishes,axis=1)
meals.value_counts()

Turkey Sides: Green beans/green bean casserole Mashed potatoes Rolls/biscuits Yams/sweet potato casserole                                                                                                       37
Turkey Sides: Corn Green beans/green bean casserole Mashed potatoes Rolls/biscuits Yams/sweet potato casserole                                                                                                  31
Turkey Sides: Green beans/green bean casserole Mashed potatoes Rolls/biscuits                                                                                                                                   21
Turkey Sides: Corn Green beans/green bean casserole Mashed potatoes Rolls/biscuits                                                                                                                              20
Turkey Sides: Corn Cornbread Green beans/green bean casserole Macaroni and cheese Mashed potatoes Rolls/biscuits Yams/sweet potato casserole                

The most common combination of main dish and side dishes appeared only 37 times in the poll.

# How Many People Work on Black Friday?

In [20]:
print(data.shape)
data['Will you employer make you work on Black Friday?'].value_counts()


(980, 67)


Yes              43
No               20
Doesn't apply     7
Name: Will you employer make you work on Black Friday?, dtype: int64

Of the 980 respondents who celebrate Thanksgiving, only 70 answered this question, and 43 of them (about 61%) said their employer will make them work on Black Friday.

# Patterns in Dinner Menus
Columns of interest: 'How would you describe where you live?', 'Age', 'What is your gender?', 'US Region'

## Regional

In [21]:
list(data['US Region'].value_counts().axes[0])

['South Atlantic',
 'East North Central',
 'Middle Atlantic',
 'Pacific',
 'West South Central',
 'West North Central',
 'East South Central',
 'New England',
 'Mountain']

In [69]:
results = pd.DataFrame(index = list(data['US Region'].value_counts().axes[0]),columns = ["Baked(%)","Roasted(%)"])
for region in list(data['US Region'].value_counts().axes[0]):
    region_data = data[data["US Region"] == region]
    results.loc[region,'Baked(%)'] = region_data[region_data['How is the main dish typically cooked?'] == 'Baked'].shape[0]/region_data.shape[0]
    results.loc[region,'Roasted(%)'] = region_data[region_data['How is the main dish typically cooked?'] == 'Roasted'].shape[0]/region_data.shape[0] 
results    


US Region,Baked(%),Roasted(%)
South Atlantic,0.522167,0.334975
East North Central,0.537931,0.393103
Middle Atlantic,0.427586,0.489655
Pacific,0.546154,0.353846
West South Central,0.541176,0.247059
West North Central,0.619718,0.253521
East South Central,0.482143,0.375
New England,0.254545,0.672727
Mountain,0.512195,0.463415


## Findings
We can see from the table above that the only two regions where roasting the main dish is more popular than baking it are the Middle Atlantic and New England.
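
A per-region table like this can also be sketched with `pd.crosstab`, which avoids the explicit loop. The toy rows and the short column names `region` and `cooked` are made up for illustration. Note one difference from the loop above: `crosstab` leaves out rows where either value is missing, so the denominators would not include respondents who skipped the cooking question.

```python
import pandas as pd

# Made-up toy rows; "region" and "cooked" are shortened, hypothetical column names.
toy = pd.DataFrame({
    "region": ["New England", "New England", "New England",
               "Pacific", "Pacific", "Pacific", "Pacific"],
    "cooked": ["Roasted", "Roasted", "Baked",
               "Baked", "Baked", "Roasted", "Fried"],
})

# normalize="index" divides each row by its row total, giving per-region shares.
shares = pd.crosstab(toy["region"], toy["cooked"], normalize="index")
print(shares[["Baked", "Roasted"]])
```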



In [76]:
results_all = pd.DataFrame(index = list(data['US Region'].value_counts().axes[0]),columns = list(data['What is typically the main dish at your Thanksgiving dinner?'].value_counts().sort_values(ascending=False).axes[0]))
for region in list(data['US Region'].value_counts().axes[0]):
    region_data = data[data["US Region"] == region]
    for meal in list(results_all.columns)  :
        results_all.loc[region,meal] = region_data[region_data['What is typically the main dish at your Thanksgiving dinner?'] == meal].shape[0]/region_data.shape[0]
results_all.sort_values('Turkey',ascending=False)

US Region,Turkey,Other (please specify),Ham/Pork,Tofurkey,Chicken,Roast beef,I don't know,Turducken
East North Central,0.931034,0.0344828,0.0275862,0.00689655,0.0,0.0,0.0,0.0
New England,0.927273,0.0181818,0.0,0.0181818,0.0363636,0.0,0.0,0.0
West South Central,0.905882,0.0352941,0.0235294,0.0235294,0.0117647,0.0,0.0,0.0
Mountain,0.902439,0.0,0.0243902,0.0487805,0.0243902,0.0,0.0,0.0
Middle Atlantic,0.896552,0.0275862,0.0137931,0.0344828,0.00689655,0.0137931,0.0,0.00689655
East South Central,0.892857,0.0714286,0.0178571,0.0,0.0,0.0178571,0.0,0.0
South Atlantic,0.891626,0.0295567,0.0344828,0.0147783,0.0147783,0.0147783,0.0,0.0
West North Central,0.84507,0.0422535,0.056338,0.028169,0.0140845,0.0,0.0140845,0.0
Pacific,0.823077,0.0692308,0.0461538,0.0307692,0.0,0.00769231,0.00769231,0.0153846


Turkey is most common in the East North Central region and least common in the Pacific region.

## Income and Age

In [61]:
data.groupby('How is the main dish typically cooked?')['int_income'].mean().sort_values()

How is the main dish typically cooked?
Fried                     60512.820513
I don't know              66666.666667
Baked                     69306.220096
Roasted                   84645.061728
Other (please specify)    92820.512821
Name: int_income, dtype: float64

In [62]:
data.groupby('How is the main dish typically cooked?')['int_age'].mean().sort_values()

How is the main dish typically cooked?
I don't know              27.800000
Fried                     35.200000
Baked                     37.949045
Roasted                   43.414169
Other (please specify)    44.020408
Name: int_age, dtype: float64

Respondents who roast the main dish are, on average, older and wealthier than those who bake or fry it.

In [63]:
data.groupby('What kind of stuffing/dressing do you typically have?')['int_age'].mean().sort_values()

What kind of stuffing/dressing do you typically have?
Rice-based                31.950000
None                      35.947368
Bread-based               40.474265
Other (please specify)    47.382353
Name: int_age, dtype: float64

In [64]:
data.groupby('What kind of stuffing/dressing do you typically have?')['int_income'].mean().sort_values()

What kind of stuffing/dressing do you typically have?
Rice-based                39032.258065
None                      62209.302326
Bread-based               78135.359116
Other (please specify)    81290.322581
Name: int_income, dtype: float64

Respondents who have rice-based stuffing are, on average, the youngest and least wealthy of the stuffing groups.

In [65]:
data.groupby('What is typically the main dish at your Thanksgiving dinner?')['int_income'].mean().sort_values()

What is typically the main dish at your Thanksgiving dinner?
I don't know               16666.666667
Roast beef                 35625.000000
Chicken                    40500.000000
Ham/Pork                   65370.370370
Tofurkey                   73235.294118
Turkey                     77113.543092
Other (please specify)     79193.548387
Turducken                 200000.000000
Name: int_income, dtype: float64

Roast beef and chicken are more common among lower-income respondents. Turducken shows up only in the highest income bracket, though only 3 respondents chose it.

## Gender

In [99]:
totals = data['What is your gender?'].value_counts()
totals

Female    515
Male      432
Name: What is your gender?, dtype: int64

In [103]:
data[data['Do you typically have gravy?'] == 'Yes']['What is your gender?'].value_counts()/totals

Female    0.906796
Male      0.932870
Name: What is your gender?, dtype: float64

In [106]:
data[data['What is typically the main dish at your Thanksgiving dinner?'] == 'Tofurkey']['What is your gender?'].value_counts()/totals

Female    0.025243
Male      0.016204
Name: What is your gender?, dtype: float64
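
The share of each gender answering "Yes" can also be computed in a single step, without a separate `totals` series. This is a sketch on made-up toy data with hypothetical short column names `gender` and `gravy`; it relies on the mean of a boolean column being the fraction of `True` values.

```python
import pandas as pd

# Made-up toy rows; "gender" and "gravy" are shortened, hypothetical column names.
toy = pd.DataFrame({
    "gender": ["Female", "Female", "Male", "Male", "Male", None],
    "gravy":  ["Yes", "No", "Yes", "Yes", "No", "Yes"],
})

# The mean of a boolean column is the share of True values in each group.
# Rows with a missing gender are dropped by groupby by default.
gravy_share = (toy["gravy"] == "Yes").groupby(toy["gender"]).mean()
print(gravy_share)
```

As in the cells above, respondents who did not answer the gravy question count as not having gravy.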