# Analyzing what Americans eat for Thanksgiving dinner using Python's pandas package (Python 3 version)

Data is from a survey conducted by [fivethirtyeight.com](https://fivethirtyeight.com/features/heres-what-your-part-of-america-eats-on-thanksgiving/) and available [here](https://github.com/fivethirtyeight/data/tree/master/thanksgiving-2015).

# Importing the Data

In [18]:
#Import the pandas package as pd; the next cell reads the dataset "thanksgiving.csv" into a dataframe
import pandas as pd

In [19]:
data = pd.read_csv("thanksgiving.csv", encoding = "Latin-1")
data.head(3)

Unnamed: 0,RespondentID,Do you celebrate Thanksgiving?,What is typically the main dish at your Thanksgiving dinner?,What is typically the main dish at your Thanksgiving dinner? - Other (please specify),How is the main dish typically cooked?,How is the main dish typically cooked? - Other (please specify),What kind of stuffing/dressing do you typically have?,What kind of stuffing/dressing do you typically have? - Other (please specify),What type of cranberry saucedo you typically have?,What type of cranberry saucedo you typically have? - Other (please specify),...,Have you ever tried to meet up with hometown friends on Thanksgiving night?,"Have you ever attended a ""Friendsgiving?""",Will you shop any Black Friday sales on Thanksgiving Day?,Do you work in retail?,Will you employer make you work on Black Friday?,How would you describe where you live?,Age,What is your gender?,How much total combined money did all members of your HOUSEHOLD earn last year?,US Region
0,4337954960,Yes,Turkey,,Baked,,Bread-based,,,,...,Yes,No,No,No,,Suburban,18 - 29,Male,"$75,000 to $99,999",Middle Atlantic
1,4337951949,Yes,Turkey,,Baked,,Bread-based,,Other (please specify),Homemade cranberry gelatin ring,...,No,No,Yes,No,,Rural,18 - 29,Female,"$50,000 to $74,999",East South Central
2,4337935621,Yes,Turkey,,Roasted,,Rice-based,,Homemade,,...,Yes,Yes,Yes,No,,Suburban,18 - 29,Male,"$0 to $9,999",Mountain


In [20]:
#View all the columns in the data dataframe
print(data.columns)

Index(['RespondentID', 'Do you celebrate Thanksgiving?',
       'What is typically the main dish at your Thanksgiving dinner?',
       'What is typically the main dish at your Thanksgiving dinner? - Other (please specify)',
       'How is the main dish typically cooked?',
       'How is the main dish typically cooked? - Other (please specify)',
       'What kind of stuffing/dressing do you typically have?',
       'What kind of stuffing/dressing do you typically have? - Other (please specify)',
       'What type of cranberry saucedo you typically have?',
       'What type of cranberry saucedo you typically have? - Other (please specify)',
       'Do you typically have gravy?',
       'Which of these side dishes aretypically served at your Thanksgiving dinner? Please select all that apply. - Brussel sprouts',
       'Which of these side dishes aretypically served at your Thanksgiving dinner? Please select all that apply. - Carrots',
       'Which of these side dishes aretypically served

# Filter Rows from Data Frame

In [21]:
#Display the count of each category of response in the column "Do you celebrate Thanksgiving?"
data['Do you celebrate Thanksgiving?'].value_counts()

Yes    980
No      78
Name: Do you celebrate Thanksgiving?, dtype: int64

In [22]:
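#We only need responses from people who celebrate Thanksgiving, so we keep only the rows
#whose response to "Do you celebrate Thanksgiving?" is 'Yes'.
#.copy() makes the result an independent dataframe, so adding columns later does not raise SettingWithCopyWarning

is_yes = data['Do you celebrate Thanksgiving?'] == 'Yes'
data = data[is_yes].copy()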
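
The filtering step relies on a boolean mask. Below is a minimal sketch on a made-up three-row frame; the `toy` data and the names `celebrates` and `filtered` are illustrative and not part of the survey. Calling `.copy()` is an optional precaution so that later column assignments act on an independent frame.

```python
import pandas as pd

# Toy stand-in for the survey data (values are made up for illustration)
toy = pd.DataFrame({
    "RespondentID": [1, 2, 3],
    "Do you celebrate Thanksgiving?": ["Yes", "No", "Yes"],
})

# Boolean Series: True where the respondent answered 'Yes'
celebrates = toy["Do you celebrate Thanksgiving?"] == "Yes"

# Keep only the True rows; .copy() returns an independent DataFrame
filtered = toy[celebrates].copy()
print(filtered["RespondentID"].tolist())
```

The original index labels are kept after filtering, which is why the outputs later in this notebook skip some row numbers.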

# Examine Main Dishes

In [23]:
#Let's count the number of each category of main dish in the 
#'What is typically the main dish at your Thanksgiving dinner?' column.

data['What is typically the main dish at your Thanksgiving dinner?'].value_counts()

Turkey                    859
Other (please specify)     35
Ham/Pork                   29
Tofurkey                   20
Chicken                    12
Roast beef                 11
I don't know                5
Turducken                   3
Name: What is typically the main dish at your Thanksgiving dinner?, dtype: int64

In [24]:
#Show whether a respondent had gravy among people who had tofurkey for dinner

eat_tofurkey = data['What is typically the main dish at your Thanksgiving dinner?'] == 'Tofurkey'
gravy_tofurkey = data[eat_tofurkey]['Do you typically have gravy?']
gravy_tofurkey

4      Yes
33     Yes
69      No
72      No
77     Yes
145    Yes
175    Yes
218     No
243    Yes
275     No
393    Yes
399    Yes
571    Yes
594    Yes
628     No
774     No
820     No
837    Yes
860     No
953    Yes
Name: Do you typically have gravy?, dtype: object

In [25]:
gravy_tofurkey.value_counts()

Yes    12
No      8
Name: Do you typically have gravy?, dtype: int64
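
The same selection can be written as a single `.loc` call, and `value_counts(normalize=True)` turns the counts into shares (in the output above, 12 of 20 is 60%). This is a sketch on made-up data; the short column names `main_dish` and `gravy` are stand-ins for the long survey column names.

```python
import pandas as pd

# Toy data with shortened, hypothetical column names
toy = pd.DataFrame({
    "main_dish": ["Turkey", "Tofurkey", "Tofurkey", "Ham/Pork"],
    "gravy": ["Yes", "Yes", "No", "Yes"],
})

# Select rows with a boolean mask and one column by label in one step
tofurkey_gravy = toy.loc[toy["main_dish"] == "Tofurkey", "gravy"]

# Absolute counts and relative shares of each answer
counts = tofurkey_gravy.value_counts()
shares = tofurkey_gravy.value_counts(normalize=True)
print(counts)
print(shares)
```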

# What Type of Pies Do People Eat?

In [26]:


#Check if apple pie is served at Thanksgiving dinner. Select all rows where apple is null

apple_isnull = data['Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Apple'].isnull()

In [27]:
#Check if pumpkin pie is served at Thanksgiving dinner. 

pumpkin_isnull= data['Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Pumpkin'].isnull()

In [28]:
#Check if pecan pie is served at Thanksgiving dinner. 

pecan_isnull = data['Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Pecan'].isnull()

In [29]:
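#Combine apple_isnull & pumpkin_isnull & pecan_isnull. Each Series flags a missing answer, so the result is
#True for respondents who selected none of the three pies and False for those who selected at least one.
#Despite its name, ate_pies therefore marks the respondents who had none of these pies.
#Show the result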

ate_pies = apple_isnull & pumpkin_isnull & pecan_isnull
print(ate_pies)

0       False
1       False
2       False
3       False
4       False
5        True
6       False
7        True
8       False
9       False
11      False
12      False
13      False
14      False
15       True
16      False
17      False
18      False
19      False
20      False
21       True
23      False
24      False
25      False
26      False
27      False
28      False
29      False
30      False
32      False
        ...  
1024    False
1025    False
1026    False
1027    False
1029    False
1030    False
1031    False
1033    False
1034    False
1035    False
1037     True
1038    False
1039    False
1040    False
1041    False
1042    False
1043     True
1044    False
1045    False
1046    False
1047    False
1048    False
1049    False
1050    False
1051    False
1053    False
1054    False
1055    False
1056     True
1057     True
dtype: bool


In [30]:
#show the unique values of the Series ate_pies

ate_pies.unique()

array([False, True], dtype=object)

In [31]:
ate_pies.value_counts()

False    876
True     104
dtype: int64
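
Combining null masks with `&` answers "none of the pies"; combining non-null masks answers "all of the pies". A small sketch on made-up data shows the difference. It assumes, as the `isnull()` checks above imply, that a pie column is empty when the pie was not selected; the short column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy data: a column holds the pie name if selected, NaN otherwise
toy = pd.DataFrame({
    "apple":   ["Apple", np.nan, np.nan, "Apple"],
    "pumpkin": ["Pumpkin", np.nan, "Pumpkin", "Pumpkin"],
    "pecan":   ["Pecan", np.nan, np.nan, np.nan],
})

# AND of the null masks: True only when all three columns are empty
no_pies = toy["apple"].isnull() & toy["pumpkin"].isnull() & toy["pecan"].isnull()

# AND of the non-null masks: True only when all three pies were selected
all_pies = toy["apple"].notnull() & toy["pumpkin"].notnull() & toy["pecan"].notnull()

# Number of the three pies each respondent selected
n_pies = toy.notnull().sum(axis=1)
print(no_pies.tolist(), all_pies.tolist(), n_pies.tolist())
```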

# Convert Age Brackets to Numeric

In [32]:
#Data in the Age column are age brackets, e.g. '18 - 29'. We need to convert them to integers using the lower limit
#of each bracket, i.e. 18 for '18 - 29' and 60 for '60+', so we write a function convert to carry out this task.

import numpy as np

def convert(text):
    #Missing ages stay missing
    if pd.isnull(text):
        return None
    #'60+' has no upper limit, so use 60
    if text == '60+':
        return 60
    #Otherwise take the number before the first space, e.g. '18 - 29' -> 18
    return int(text.split()[0])

In [33]:
#Apply the convert function to the 'Age' column of data to convert the age brackets to integers and assign the result to a new column called 'int_age'
data['int_age'] = data['Age'].apply(convert)



In [34]:
data['int_age'].describe()




count    947.000000
mean      40.089757
std       15.352014
min       18.000000
25%             NaN
50%             NaN
75%             NaN
max       60.000000
Name: int_age, dtype: float64

In [35]:
data['int_age'].value_counts()

45.0    269
60.0    258
30.0    235
18.0    185
Name: int_age, dtype: int64

# Findings

Each of the three brackets from age 30 upward has more respondents (235, 269 and 258) than the 18 - 29 bracket (185), but the difference between those three older brackets is not very significant. Note that these values are not the actual ages of our survey participants; they are the lower limits of the age brackets the participants fall into.
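
The bracket-to-number conversion can also be done without a row-wise function. The sketch below uses pandas string methods; the labels '30 - 44' and '45 - 59' are assumed bracket names (only '18 - 29' and '60+' appear in this notebook), and the nullable `Int64` dtype is one possible choice for keeping whole numbers next to missing values.

```python
import numpy as np
import pandas as pd

# Example bracket labels; the two middle labels are assumptions
ages = pd.Series(["18 - 29", "30 - 44", "45 - 59", "60+", np.nan])

# Capture the leading run of digits; missing values stay missing
lower = ages.str.extract(r"^(\d+)", expand=False)

# Convert to numbers, then to the nullable integer dtype
int_age = pd.to_numeric(lower).astype("Int64")
print(int_age)
```

With a plain float column, the values display as 18.0, 30.0 and so on, as in the `value_counts()` output above; the nullable integer dtype avoids that.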

# Convert Income to Numeric

In [36]:
#Let's convert the 'How much total combined money did all members of your HOUSEHOLD earn last year?' column, which contains
#income brackets as strings, to numeric. We write a function convert_string that does the conversion using
#the lower limits of the income brackets.

def convert_string(string):
    #Missing incomes stay missing
    if pd.isnull(string):
        return None
    #First token of the bracket, e.g. '$75,000' from '$75,000 to $99,999'
    val = string.split()[0]
    #Responses starting with 'Prefer' (income not stated) carry no amount
    if val == 'Prefer':
        return None
    #Strip the dollar sign and thousands separators before converting
    return int(val.replace('$', '').replace(',', ''))
        

In [37]:
#Apply the convert_string function to 'How much total combined money did all members of your HOUSEHOLD earn last year?' and
#assign the result to a new column, int_income.

data['int_income'] = data['How much total combined money did all members of your HOUSEHOLD earn last year?'].apply(convert_string)

In [38]:
#Display the first five rows of the int_income column of the data dataframe
data['int_income'].head(5)

0     75000.0
1     50000.0
2         0.0
3    200000.0
4    100000.0
Name: int_income, dtype: float64

In [None]:
#Describe the int_income column
data['int_income'].describe()

# Findings

The mean income is approximately $76,000. This gives a rough idea of the income of respondents, since we do not have their exact income data, but it is not an accurate representation: some respondents preferred not to indicate their income, and for the others we utilised the lower limits of the income brackets, which understates what they actually earn.
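
To see how much the choice of lower limits can pull the mean down, the sketch below compares a lower-limit mean with a midpoint mean. The brackets and counts are hypothetical and chosen only for illustration; they are not the survey's figures, and `weighted_mean` is a helper defined here.

```python
import pandas as pd

# Hypothetical brackets and respondent counts, for illustration only
brackets = pd.DataFrame({
    "lower": [0, 50000, 100000],
    "upper": [49999, 99999, 149999],
    "count": [10, 20, 10],
})

def weighted_mean(values, weights):
    """Mean of `values` where each value is repeated `weights` times."""
    return (values * weights).sum() / weights.sum()

# Mean when every respondent is assigned the bottom of the bracket
mean_lower = weighted_mean(brackets["lower"], brackets["count"])

# Mean when every respondent is assigned the middle of the bracket
midpoints = (brackets["lower"] + brackets["upper"]) / 2
mean_mid = weighted_mean(midpoints, brackets["count"])

print(mean_lower, mean_mid)
```

In this toy case the lower-limit mean is about 25,000 below the midpoint mean, which is half the bracket width.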

# Explore the Correlation Between Travel Distance and Income

In [41]:
#Let's explore how far people earning less than 150,000 and people earning above 150,000 will travel for Thanksgiving
less_than_150k = data[data['int_income'] < 150000]

In [42]:
less_than_150k_travel = less_than_150k['How far will you travel for Thanksgiving?'].value_counts()

In [43]:
less_than_150k_travel

Thanksgiving is happening at my home--I won't travel at all                         281
Thanksgiving is local--it will take place in the town I live in                     203
Thanksgiving is out of town but not too far--it's a drive of a few hours or less    150
Thanksgiving is out of town and far away--I have to drive several hours or fly       55
Name: How far will you travel for Thanksgiving?, dtype: int64

In [44]:
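#Note: with strict < and > comparisons, respondents whose int_income is exactly 150000 fall into neither group
above_150k = data[data['int_income'] > 150000]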

In [45]:
above_150k_travel = above_150k['How far will you travel for Thanksgiving?'].value_counts()
above_150k_travel

Thanksgiving is happening at my home--I won't travel at all                         49
Thanksgiving is local--it will take place in the town I live in                     25
Thanksgiving is out of town but not too far--it's a drive of a few hours or less    16
Thanksgiving is out of town and far away--I have to drive several hours or fly      12
Name: How far will you travel for Thanksgiving?, dtype: int64

# Findings

Of the 102 people who earn above 150,000, 49 (approximately 48%) have their Thanksgiving dinner at home, compared to 281 of the 689 people (approximately 41%) who earn less than 150,000.
203 out of 689 people (approximately 29%) who earn less than 150,000 will have their dinner in the town where they live (but not at home), compared to 25 out of 102 people (approximately 25%) of those who earn above 150,000.
This suggests a negative association between income and travel for Thanksgiving dinner: people with lower income are somewhat more likely to travel to have dinner with friends and family, while people with higher income are more likely to have the dinner at home. The differences are only a few percentage points and the higher-income group is small (102 respondents), so the pattern is tentative.
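
The shares discussed in these findings can be recomputed from the two `value_counts()` outputs above. In this sketch the counts are copied from those outputs, while the shortened row labels and the column names are introduced here for readability.

```python
import pandas as pd

# Shortened labels for the four travel answers (hypothetical names)
labels = ["at home", "local", "out of town, near", "out of town, far"]

# Counts copied from the value_counts() outputs above
below = pd.Series([281, 203, 150, 55], index=labels)
above = pd.Series([49, 25, 16, 12], index=labels)

# Divide each count by its group total to get within-group shares
shares = pd.DataFrame({
    "below_150k": below / below.sum(),
    "above_150k": above / above.sum(),
}).round(3)
print(shares)
```

Comparing shares rather than raw counts matters here because the two groups differ greatly in size (689 versus 102 respondents).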

# Finding the Relationship Between Friendsgiving, Hometown Meetups, Age and Income


In [46]:
#Mean int_age for each combination of the hometown-meetup and "Friendsgiving" answers
pd.pivot_table(data, index='Have you ever tried to meet up with hometown friends on Thanksgiving night?', values='int_age', columns='Have you ever attended a "Friendsgiving?"', aggfunc='mean')

"Have you ever attended a ""Friendsgiving?""",No,Yes
Have you ever tried to meet up with hometown friends on Thanksgiving night?,Unnamed: 1_level_1,Unnamed: 2_level_1
No,42.283702,37.010526
Yes,41.47541,33.976744


In [47]:
#Mean int_income for each combination of the hometown-meetup and "Friendsgiving" answers
pd.pivot_table(data, index='Have you ever tried to meet up with hometown friends on Thanksgiving night?', values='int_income', columns='Have you ever attended a "Friendsgiving?"', aggfunc='mean')

"Have you ever attended a ""Friendsgiving?""",No,Yes
Have you ever tried to meet up with hometown friends on Thanksgiving night?,Unnamed: 1_level_1,Unnamed: 2_level_1
No,78914.549654,72894.736842
Yes,78750.0,66019.736842



# Findings 

Respondents who have both attended a "Friendsgiving" and tried to meet up with hometown friends on Thanksgiving night are the youngest group, with a mean age of about 34, and they also have the lowest mean income (about $66,000). This suggests that younger people are more likely to do both. On the other hand, respondents who have done neither are the oldest group, with a mean age of about 42, and they have the highest mean income (about $79,000). This might be a result of older people being more likely to have their own family to celebrate with at home.
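
A pivot table of means hides how many respondents sit in each cell, so it can help to build a second table of counts alongside it. The sketch below does this on made-up data; the column names `meetup` and `friendsgiving` are short stand-ins for the two survey questions.

```python
import pandas as pd

# Toy data with hypothetical short column names
toy = pd.DataFrame({
    "meetup":        ["Yes", "Yes", "No", "No", "No"],
    "friendsgiving": ["Yes", "No", "Yes", "No", "No"],
    "int_age":       [18, 45, 30, 60, 45],
})

# Mean age in each (meetup, friendsgiving) cell
mean_age = pd.pivot_table(toy, index="meetup", columns="friendsgiving",
                          values="int_age", aggfunc="mean")

# Number of respondents behind each mean
group_size = pd.pivot_table(toy, index="meetup", columns="friendsgiving",
                            values="int_age", aggfunc="count")
print(mean_age)
print(group_size)
```

A mean based on a handful of respondents deserves less weight than one based on hundreds.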

# Exploring How the Main Dish is Typically Cooked

In [48]:
#Lets explore how the main dish is typically cooked.
data['How is the main dish typically cooked?'].value_counts()

Baked                     481
Roasted                   378
Other (please specify)     51
Fried                      47
I don't know               17
Name: How is the main dish typically cooked?, dtype: int64

# Finding

Almost 50% of the main dishes at US Thanksgiving dinner tables are baked (481 of 974 responses).

# How Many People Work on Black Friday

In [49]:
#How many people work on the day after Thanksgiving? This information is available in the column
#'Will you employer make you work on Black Friday?'
data['Will you employer make you work on Black Friday?'].value_counts()

Yes              43
No               20
Doesn't apply     7
Name: Will you employer make you work on Black Friday?, dtype: int64

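About 61% of people who responded to this question (43 of 70) say their employer will make them work on Black Friday. Leaving out the 7 who answered "Doesn't apply", the share is about 68% (43 of 63).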
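
The share of "Yes" answers depends on whether "Doesn't apply" is counted in the denominator. This sketch recomputes both versions from the counts in the output above; the variable names are introduced here.

```python
import pandas as pd

# Counts copied from the value_counts() output above
counts = pd.Series({"Yes": 43, "No": 20, "Doesn't apply": 7})

# Share of "Yes" among everyone who answered the question (43 of 70)
share_all = counts["Yes"] / counts.sum()

# Share of "Yes" among those for whom the question applies (43 of 63)
share_applicable = counts["Yes"] / counts.drop("Doesn't apply").sum()

print(round(share_all, 3), round(share_applicable, 3))
```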

In [50]:
#Is there any difference in the dinner menu across the regions?

In [51]:
data.groupby(["US Region", "What is typically the main dish at your Thanksgiving dinner?"] )["What is typically the main dish at your Thanksgiving dinner?"].count()


US Region           What is typically the main dish at your Thanksgiving dinner?
East North Central  Ham/Pork                                                          4
                    Other (please specify)                                            5
                    Tofurkey                                                          1
                    Turkey                                                          135
East South Central  Ham/Pork                                                          1
                    Other (please specify)                                            4
                    Roast beef                                                        1
                    Turkey                                                           50
Middle Atlantic     Chicken                                                           1
                    Ham/Pork                                                          2
                    Other (please speci

## Findings

It's evident that in all the regions of the US, turkey is the overwhelming favourite on the Thanksgiving dinner table.


## Dinner Menu Based on Age

In [52]:
data.groupby(["int_age","What is typically the main dish at your Thanksgiving dinner?"] )["int_age"].count()


int_age  What is typically the main dish at your Thanksgiving dinner?
18.0     Chicken                                                           3
         Ham/Pork                                                          6
         I don't know                                                      3
         Other (please specify)                                            2
         Roast beef                                                        3
         Tofurkey                                                          6
         Turkey                                                          162
30.0     Chicken                                                           3
         Ham/Pork                                                         12
         I don't know                                                      1
         Other (please specify)                                           10
         Roast beef                                                        1
      

## Findings

There is no significant difference in menu preference across age groups: most people have turkey, while relatively few people have other main dishes.

## Gravy and gender pattern at the dinner table

In [54]:
data.groupby(["What is your gender?", "Do you typically have gravy?"])['Do you typically have gravy?'].count()

What is your gender?  Do you typically have gravy?
Female                No                               48
                      Yes                             467
Male                  No                               29
                      Yes                             403
Name: Do you typically have gravy?, dtype: int64

## Findings
Gravy is about equally popular with both genders: approximately 91% of women (467 of 515) and approximately 93% of men (403 of 432) typically have gravy.
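
The gravy shares per gender can be checked by dividing each row of the grouped counts by its row total. In this sketch the counts are copied from the `groupby` output above, and the frame layout and variable names are introduced here.

```python
import pandas as pd

# Counts copied from the groupby output above
counts = pd.DataFrame(
    {"No": [48, 29], "Yes": [467, 403]},
    index=["Female", "Male"],
)

# Divide each row by its own total to get within-gender shares
shares = counts.div(counts.sum(axis=1), axis=0)
print(shares.round(3))
```

On the full dataframe, `pd.crosstab` with `normalize='index'` would give the same row-wise shares directly from the two columns.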