<h1>Analyzing Thanksgiving Dinner</h1>

This is a guided project from Dataquest. The dataset is from FiveThirtyEight and can be found <a href="https://github.com/fivethirtyeight/data/tree/master/thanksgiving-2015">here</a>. It contains 1058 responses to an online survey about what Americans eat for Thanksgiving dinner. The first part of this notebook follows the guided project. But what do most people really want to know about? The food, of course, so there are some extra statistics at the end.

<h3>Sample of the Starting Data - first four rows</h3>

In [1]:

import pandas as pd
from pandas import Series
import re
data = pd.read_csv("thanksgiving.csv", encoding="Latin-1")
data.head(4)

,RespondentID,Do you celebrate Thanksgiving?,What is typically the main dish at your Thanksgiving dinner?,What is typically the main dish at your Thanksgiving dinner? - Other (please specify),How is the main dish typically cooked?,How is the main dish typically cooked? - Other (please specify),What kind of stuffing/dressing do you typically have?,What kind of stuffing/dressing do you typically have? - Other (please specify),What type of cranberry saucedo you typically have?,What type of cranberry saucedo you typically have? - Other (please specify),...,Have you ever tried to meet up with hometown friends on Thanksgiving night?,"Have you ever attended a ""Friendsgiving?""",Will you shop any Black Friday sales on Thanksgiving Day?,Do you work in retail?,Will you employer make you work on Black Friday?,How would you describe where you live?,Age,What is your gender?,How much total combined money did all members of your HOUSEHOLD earn last year?,US Region
0,4337954960,Yes,Turkey,,Baked,,Bread-based,,,,...,Yes,No,No,No,,Suburban,18 - 29,Male,"$75,000 to $99,999",Middle Atlantic
1,4337951949,Yes,Turkey,,Baked,,Bread-based,,Other (please specify),Homemade cranberry gelatin ring,...,No,No,Yes,No,,Rural,18 - 29,Female,"$50,000 to $74,999",East South Central
2,4337935621,Yes,Turkey,,Roasted,,Rice-based,,Homemade,,...,Yes,Yes,Yes,No,,Suburban,18 - 29,Male,"$0 to $9,999",Mountain
3,4337933040,Yes,Turkey,,Baked,,Bread-based,,Homemade,,...,Yes,No,No,No,,Urban,30 - 44,Male,"$200,000 and up",Pacific


<h3>List of Questions Asked in the Survey (aka column headers)</h3>

In [2]:
data.columns

Index(['RespondentID', 'Do you celebrate Thanksgiving?',
       'What is typically the main dish at your Thanksgiving dinner?',
       'What is typically the main dish at your Thanksgiving dinner? - Other (please specify)',
       'How is the main dish typically cooked?',
       'How is the main dish typically cooked? - Other (please specify)',
       'What kind of stuffing/dressing do you typically have?',
       'What kind of stuffing/dressing do you typically have? - Other (please specify)',
       'What type of cranberry saucedo you typically have?',
       'What type of cranberry saucedo you typically have? - Other (please specify)',
       'Do you typically have gravy?',
       'Which of these side dishes aretypically served at your Thanksgiving dinner? Please select all that apply. - Brussel sprouts',
       'Which of these side dishes aretypically served at your Thanksgiving dinner? Please select all that apply. - Carrots',
       'Which of these side dishes aretypically served

In [3]:
seriesCelebrate = data["Do you celebrate Thanksgiving?"]
celebrateCounts = seriesCelebrate.value_counts()
celebrateCounts

Yes    980
No      78
Name: Do you celebrate Thanksgiving?, dtype: int64

<h3>Exclude Participants Not Celebrating Thanksgiving</h3>

In [4]:
# Keep only respondents who answered "Yes"
data = data[data["Do you celebrate Thanksgiving?"] == "Yes"]

<h3>Counts of Main Dishes Served at Thanksgiving</h3>
Surprisingly, not everyone eats turkey.

In [5]:
seriesMainDish = data["What is typically the main dish at your Thanksgiving dinner?"]
mainDish = seriesMainDish.value_counts()
mainDish

Turkey                    859
Other (please specify)     35
Ham/Pork                   29
Tofurkey                   20
Chicken                    12
Roast beef                 11
I don't know                5
Turducken                   3
Name: What is typically the main dish at your Thanksgiving dinner?, dtype: int64
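Raw counts are easier to compare as shares. The sketch below rebuilds the counts printed above as a Series (values copied from the output) and divides by their total. On the full DataFrame, `seriesMainDish.value_counts(normalize=True)` returns proportions directly, since it also ignores missing answers. Note that the counts sum to 974 rather than 980, so a few celebrants left this question blank.

```python
import pandas as pd

# Counts copied from the value_counts() output above
mainDish = pd.Series({
    "Turkey": 859,
    "Other (please specify)": 35,
    "Ham/Pork": 29,
    "Tofurkey": 20,
    "Chicken": 12,
    "Roast beef": 11,
    "I don't know": 5,
    "Turducken": 3,
})

# Share of respondents who answered this question, as a percentage
mainDishShare = mainDish / mainDish.sum() * 100
print(mainDishShare.round(1))
```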

<h3>Tofurkey and Gravy, Anyone?</h3>
The gravy answer of every respondent whose main dish is Tofurkey.

In [6]:
tofurkey = data[data["What is typically the main dish at your Thanksgiving dinner?"] == "Tofurkey"]
tofurkeyGravySeries = tofurkey["Do you typically have gravy?"]
tofurkeyGravySeries

4      Yes
33     Yes
69      No
72      No
77     Yes
145    Yes
175    Yes
218     No
243    Yes
275     No
393    Yes
399    Yes
571    Yes
594    Yes
628     No
774     No
820     No
837    Yes
860     No
953    Yes
Name: Do you typically have gravy?, dtype: object
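Rather than reading the list row by row, `value_counts()` summarizes it. The sketch below rebuilds the twenty answers shown above (index labels and values copied from the output) and counts them; on the notebook's data, `tofurkeyGravySeries.value_counts()` does the same thing.

```python
import pandas as pd

# Gravy answers of the Tofurkey respondents, copied from the output above
tofurkeyGravySeries = pd.Series(
    ["Yes", "Yes", "No", "No", "Yes", "Yes", "Yes", "No", "Yes", "No",
     "Yes", "Yes", "Yes", "Yes", "No", "No", "No", "Yes", "No", "Yes"],
    index=[4, 33, 69, 72, 77, 145, 175, 218, 243, 275,
           393, 399, 571, 594, 628, 774, 820, 837, 860, 953],
    name="Do you typically have gravy?",
)

# Number of "Yes" and "No" answers
gravyCounts = tofurkeyGravySeries.value_counts()
print(gravyCounts)
```

So 12 of the 20 Tofurkey respondents typically have gravy.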

<h3>How many people eat apple, pumpkin or pecan pie?</h3>

In [7]:
apple_isnull = pd.isnull(data["Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Apple"])
pumpkin_isnull = pd.isnull(data["Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Pumpkin"])
pecan_isnull = pd.isnull(data["Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Pecan"])


In [8]:
# True means the respondent selected none of apple, pumpkin or pecan pie
no_pies = apple_isnull & pumpkin_isnull & pecan_isnull
no_pies.value_counts()

False    876
True     104
dtype: int64
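In the output above, `True` marks respondents for whom all three pie columns are null, so 876 respondents typically have at least one of apple, pumpkin or pecan pie and 104 have none of them. A more direct way to count the first group is to ask whether any of the columns is filled in. This sketch uses a tiny made-up DataFrame with hypothetical short column names (`Apple`, `Pumpkin`, `Pecan`) in place of the survey's long headers.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: a non-null value means the pie was selected
pies = pd.DataFrame({
    "Apple":   ["Apple", np.nan, np.nan, np.nan],
    "Pumpkin": ["Pumpkin", "Pumpkin", np.nan, np.nan],
    "Pecan":   [np.nan, np.nan, "Pecan", np.nan],
})

# True when at least one of the three pie columns is filled in
atLeastOnePie = pies.notna().any(axis=1)
print(atLeastOnePie.value_counts())
```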

<h3>New Column Introduced - int_age</h3>

In [9]:
def stringToNumber(value):
    # Convert an age category such as "18 - 29" or "60+" to its lower bound (18, 60)
    if pd.isnull(value):
        return None
    age = value.split(" ")[0]
    if age == "60+":
        age = "60"
    return int(age)

dataAge = data["Age"]

In [10]:
data["int_age"] = dataAge.apply(stringToNumber)
data.head(4)

,RespondentID,Do you celebrate Thanksgiving?,What is typically the main dish at your Thanksgiving dinner?,What is typically the main dish at your Thanksgiving dinner? - Other (please specify),How is the main dish typically cooked?,How is the main dish typically cooked? - Other (please specify),What kind of stuffing/dressing do you typically have?,What kind of stuffing/dressing do you typically have? - Other (please specify),What type of cranberry saucedo you typically have?,What type of cranberry saucedo you typically have? - Other (please specify),...,"Have you ever attended a ""Friendsgiving?""",Will you shop any Black Friday sales on Thanksgiving Day?,Do you work in retail?,Will you employer make you work on Black Friday?,How would you describe where you live?,Age,What is your gender?,How much total combined money did all members of your HOUSEHOLD earn last year?,US Region,int_age
0,4337954960,Yes,Turkey,,Baked,,Bread-based,,,,...,No,No,No,,Suburban,18 - 29,Male,"$75,000 to $99,999",Middle Atlantic,18.0
1,4337951949,Yes,Turkey,,Baked,,Bread-based,,Other (please specify),Homemade cranberry gelatin ring,...,No,Yes,No,,Rural,18 - 29,Female,"$50,000 to $74,999",East South Central,18.0
2,4337935621,Yes,Turkey,,Roasted,,Rice-based,,Homemade,,...,Yes,Yes,No,,Suburban,18 - 29,Male,"$0 to $9,999",Mountain,18.0
3,4337933040,Yes,Turkey,,Baked,,Bread-based,,Homemade,,...,No,No,No,,Urban,30 - 44,Male,"$200,000 and up",Pacific,30.0


In [11]:
data["int_age"].describe()



count    947.000000
mean      40.089757
std       15.352014
min       18.000000
25%             NaN
50%             NaN
75%             NaN
max       60.000000
Name: int_age, dtype: float64
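The `NaN` quartiles in this summary are most likely a quirk of the older pandas version the notebook was run with, which did not skip missing values when computing percentiles; current pandas releases skip them in `describe()`. If they show up, dropping the missing values first is a simple workaround. The toy Series below (made-up values) stands in for `data["int_age"]`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for data["int_age"], including missing values
int_age = pd.Series([18, 30, 45, 60, np.nan, 30, 18, np.nan], name="int_age")

# Quartiles computed only from the non-missing values
quartiles = int_age.dropna().quantile([0.25, 0.5, 0.75])
print(quartiles)
```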

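<h3>About the int_age Column</h3>
The column "int_age" holds the lower bound of each respondent's age category. The age categories are:<br>
18 - 29,<br>
30 - 44,<br>
45 - 59,<br>
60+,<br>
and null (no answer), which becomes NaN.
Because every respondent is assigned the youngest age in their category, this column is not the true age of survey participants. Statistics based on it, such as the mean, understate the real values and should not be used to obtain accurate statistics involving age.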
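The same lower-bound conversion can be written without a helper function by pulling the first run of digits out of each category with `Series.str.extract`. This is an alternative sketch, not the notebook's method; it assumes every non-missing `Age` value starts with its lower bound, as the four categories above do. Missing values stay `NaN`.

```python
import numpy as np
import pandas as pd

# Toy stand-in for data["Age"], using the survey's category labels
ages = pd.Series(["18 - 29", "30 - 44", "45 - 59", "60+", np.nan], name="Age")

# Take the first group of digits (the category's lower bound) and convert to float
int_age = ages.str.extract(r"(\d+)", expand=False).astype(float)
print(int_age)
```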

<h3>New Column Introduced - int_income</h3>

In [12]:
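def incomeToNumber(value):
    # Convert an income range such as "$75,000 to $99,999" to its lower bound (75000);
    # blank answers and answers starting with "Prefer" become None
    if pd.isnull(value):
        return None
    income = value.split(" ")[0]
    if income == "Prefer":
        return None
    # Raw string: "\$" in a normal string literal is an invalid escape sequence
    income = re.sub(r"\$", "", income)
    income = re.sub(",", "", income)
    return int(income)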

In [13]:
dataIncome = data["How much total combined money did all members of your HOUSEHOLD earn last year?"]
data["int_income"] = dataIncome.apply(incomeToNumber)
data.head(4)

,RespondentID,Do you celebrate Thanksgiving?,What is typically the main dish at your Thanksgiving dinner?,What is typically the main dish at your Thanksgiving dinner? - Other (please specify),How is the main dish typically cooked?,How is the main dish typically cooked? - Other (please specify),What kind of stuffing/dressing do you typically have?,What kind of stuffing/dressing do you typically have? - Other (please specify),What type of cranberry saucedo you typically have?,What type of cranberry saucedo you typically have? - Other (please specify),...,Will you shop any Black Friday sales on Thanksgiving Day?,Do you work in retail?,Will you employer make you work on Black Friday?,How would you describe where you live?,Age,What is your gender?,How much total combined money did all members of your HOUSEHOLD earn last year?,US Region,int_age,int_income
0,4337954960,Yes,Turkey,,Baked,,Bread-based,,,,...,No,No,,Suburban,18 - 29,Male,"$75,000 to $99,999",Middle Atlantic,18.0,75000.0
1,4337951949,Yes,Turkey,,Baked,,Bread-based,,Other (please specify),Homemade cranberry gelatin ring,...,Yes,No,,Rural,18 - 29,Female,"$50,000 to $74,999",East South Central,18.0,50000.0
2,4337935621,Yes,Turkey,,Roasted,,Rice-based,,Homemade,,...,Yes,No,,Suburban,18 - 29,Male,"$0 to $9,999",Mountain,18.0,0.0
3,4337933040,Yes,Turkey,,Baked,,Bread-based,,Homemade,,...,No,No,,Urban,30 - 44,Male,"$200,000 and up",Pacific,30.0,200000.0


In [14]:
data["int_income"].describe()



count       829.000000
mean      75965.018094
std       59068.636748
min           0.000000
25%                NaN
50%                NaN
75%                NaN
max      200000.000000
Name: int_income, dtype: float64

<h3>About the int_income Column</h3>
The column "int_income" is derived from the lower number of each range in the column "How much total combined money did all members of your HOUSEHOLD earn last year?" A NaN value means the participant either preferred not to answer or left the question blank. Like "int_age", this column is not the true income of survey participants; it understates actual incomes and should not be used to obtain accurate statistics involving income.
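A vectorized sketch of the same idea for income: strip the thousands separators, then extract the digits that follow the first `$`. An answer with no `$` becomes `NaN`, just like a blank one. The toy labels mirror those in the survey output, except "Prefer not to answer", which is an assumed wording (the notebook only checks that the answer starts with "Prefer"). This is an illustrative alternative, not the notebook's code.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the household-income column
income = pd.Series(
    ["$75,000 to $99,999", "$0 to $9,999", "$200,000 and up",
     "Prefer not to answer", np.nan]
)

# Remove commas, then capture the digits after the first "$" (the lower bound)
int_income = (
    income.str.replace(",", "", regex=False)
    .str.extract(r"\$(\d+)", expand=False)
    .astype(float)
)
print(int_income)
```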

<h3>Correlations between Income and Travel</h3>

In [15]:
incomeLess150k = data[data["int_income"] < 150000]

In [16]:
incomeLess150kTravel = incomeLess150k["How far will you travel for Thanksgiving?"]
incomeLess150kTravel.value_counts()

Thanksgiving is happening at my home--I won't travel at all                         281
Thanksgiving is local--it will take place in the town I live in                     203
Thanksgiving is out of town but not too far--it's a drive of a few hours or less    150
Thanksgiving is out of town and far away--I have to drive several hours or fly       55
Name: How far will you travel for Thanksgiving?, dtype: int64

In [17]:
incomeMore150k = data[data["int_income"] >= 150000]

In [18]:
incomeMore150kTravel = incomeMore150k["How far will you travel for Thanksgiving?"]
incomeMore150kTravel.value_counts()

Thanksgiving is happening at my home--I won't travel at all                         66
Thanksgiving is local--it will take place in the town I live in                     34
Thanksgiving is out of town but not too far--it's a drive of a few hours or less    25
Thanksgiving is out of town and far away--I have to drive several hours or fly      15
Name: How far will you travel for Thanksgiving?, dtype: int64

In [19]:
# Percentages of Travel data for income < 150k (counts from the output above; total 689)
lessHome = 281 / 689 * 100 # At home
lessLocal = 203 / 689 * 100 # Local
lessClose = 150 / 689 * 100 # Out of town but not far
lessFar = 55 / 689 * 100 # Drive for hours or fly
print(lessHome, lessLocal, lessClose, lessFar)


40.78374455732946 29.46298984034833 21.77068214804064 7.982583454281568


In [20]:
# Percentages of Travel data for income >= 150k (counts from the output above; total 140)
moreHome = 66 / 140 * 100 # At home
moreLocal = 34 / 140 * 100 # Local
moreClose = 25 / 140 * 100 # Out of town but not far
moreFar = 15 / 140 * 100 # Drive for hours or fly
print(round(moreHome, 2), round(moreLocal, 2), round(moreClose, 2), round(moreFar, 2))


47.14 24.29 17.86 10.71


<h3>About Income and Travel</h3>
Percentages were calculated so that the two income groups, which differ greatly in size, can be compared directly. Respondents with a household income of $150k or more were somewhat more likely to host Thanksgiving at home or to travel far.
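Typing counts and totals by hand is easy to get out of sync with the output, so it can help to compute the percentages directly. A sketch using `pd.crosstab(..., normalize="index")` on made-up rows; the short column names `income` and `travel` and their labels are hypothetical stand-ins for "int_income" and the travel question.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for int_income and the travel question
df = pd.DataFrame({
    "income": [25000, 75000, 150000, 200000, 50000, 175000, np.nan],
    "travel": ["Home", "Local", "Home", "Far", "Home", "Local", "Home"],
})

# Drop respondents without an income, then label each by income group
answered = df.dropna(subset=["income"])
answered = answered.assign(
    group=np.where(answered["income"] >= 150000, ">= 150k", "< 150k")
)

# Row percentages: each income group sums to 100
travelPct = pd.crosstab(answered["group"], answered["travel"], normalize="index") * 100
print(travelPct.round(1))
```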

<h3>Thanksgiving, Friends, Age and Income</h3>

In [21]:
meetUp = data["Have you ever tried to meet up with hometown friends on Thanksgiving night?"]
friendsgiving = data['Have you ever attended a "Friendsgiving?"']
ageFriends = data.pivot_table(index=meetUp, columns=friendsgiving, values=["int_age"])
ageFriends

,int_age,int_age
"Have you ever attended a ""Friendsgiving?""",No,Yes
Have you ever tried to meet up with hometown friends on Thanksgiving night?,,
No,42.283702,37.010526
Yes,41.47541,33.976744


In [22]:
incomeFriends = data.pivot_table(index=meetUp, columns=friendsgiving, values=["int_income"])
incomeFriends

,int_income,int_income
"Have you ever attended a ""Friendsgiving?""",No,Yes
Have you ever tried to meet up with hometown friends on Thanksgiving night?,,
No,78914.549654,72894.736842
Yes,78750.0,66019.736842


<h3>About Friends and Thanksgiving</h3>
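People who had attended a "Friendsgiving" and had tried to meet up with hometown friends on Thanksgiving night were, on average, the youngest group with the lowest income. People who had never attended a "Friendsgiving" were, on average, older with higher incomes. Both comparisons rely on the lower-bound columns "int_age" and "int_income", so they show relative differences rather than true averages.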
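Averages from small groups can be noisy, so it helps to see how many respondents sit behind each cell. `pivot_table` accepts a list of aggregation functions; the sketch below uses made-up rows and hypothetical short column names (`meet_up`, `friendsgiving`) in place of the two survey questions.

```python
import pandas as pd

# Hypothetical toy data standing in for the two survey questions and int_age
df = pd.DataFrame({
    "meet_up":       ["Yes", "Yes", "No", "No", "Yes", "No"],
    "friendsgiving": ["Yes", "No", "No", "Yes", "Yes", "No"],
    "int_age":       [18, 45, 60, 30, 30, 45],
})

# Mean age and number of respondents for every meet_up / friendsgiving pair
summary = df.pivot_table(index="meet_up", columns="friendsgiving",
                         values="int_age", aggfunc=["mean", "count"])
print(summary)
```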

<h2>Foods at Thanksgiving</h2>
On to what really matters at Thanksgiving: the food. Let's see which side dishes and desserts are most popular.

<h3>Most Popular Side Dishes</h3>

In [23]:
sidecols = ['Which of these side dishes aretypically served at your Thanksgiving dinner? Please select all that apply. - Brussel sprouts',
       'Which of these side dishes aretypically served at your Thanksgiving dinner? Please select all that apply. - Carrots',
       'Which of these side dishes aretypically served at your Thanksgiving dinner? Please select all that apply. - Cauliflower',
       'Which of these side dishes aretypically served at your Thanksgiving dinner? Please select all that apply. - Corn',
       'Which of these side dishes aretypically served at your Thanksgiving dinner? Please select all that apply. - Cornbread',
       'Which of these side dishes aretypically served at your Thanksgiving dinner? Please select all that apply. - Fruit salad',
       'Which of these side dishes aretypically served at your Thanksgiving dinner? Please select all that apply. - Green beans/green bean casserole',
       'Which of these side dishes aretypically served at your Thanksgiving dinner? Please select all that apply. - Macaroni and cheese',
       'Which of these side dishes aretypically served at your Thanksgiving dinner? Please select all that apply. - Mashed potatoes',
       'Which of these side dishes aretypically served at your Thanksgiving dinner? Please select all that apply. - Rolls/biscuits',
       'Which of these side dishes aretypically served at your Thanksgiving dinner? Please select all that apply. - Squash',
       'Which of these side dishes aretypically served at your Thanksgiving dinner? Please select all that apply. - Vegetable salad',
       'Which of these side dishes aretypically served at your Thanksgiving dinner? Please select all that apply. - Yams/sweet potato casserole']
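Typing out thirteen long headers invites mistakes, especially since the survey's own headers contain typos such as "aretypically". A sketch of selecting the columns by their shared prefix instead; the toy DataFrame below has a few hypothetical columns. On the notebook's `data`, the same comprehension over `data.columns` would also catch any free-text "Other (please specify)" columns sharing the prefix, which is why the sketch filters them out.

```python
import pandas as pd

prefix = ("Which of these side dishes aretypically served at your Thanksgiving dinner? "
          "Please select all that apply. - ")

# Hypothetical toy frame with a few columns shaped like the survey's headers
toy = pd.DataFrame(columns=[
    "RespondentID",
    prefix + "Carrots",
    prefix + "Corn",
    prefix + "Other (please specify)",
    "Do you typically have gravy?",
])

# Keep columns that share the side-dish prefix, skipping free-text "Other" answers
sidecols = [c for c in toy.columns
            if c.startswith(prefix) and "Other (please specify)" not in c]
print(sidecols)
```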

In [24]:
sides = data[sidecols]

In [25]:
# Count the non-null (selected) answers in each side-dish column
numberSides = sides.notnull().sum()
numberSides

Which of these side dishes aretypically served at your Thanksgiving dinner? Please select all that apply. - Brussel sprouts                     155
Which of these side dishes aretypically served at your Thanksgiving dinner? Please select all that apply. - Carrots                             242
Which of these side dishes aretypically served at your Thanksgiving dinner? Please select all that apply. - Cauliflower                          88
Which of these side dishes aretypically served at your Thanksgiving dinner? Please select all that apply. - Corn                                464
Which of these side dishes aretypically served at your Thanksgiving dinner? Please select all that apply. - Cornbread                           235
Which of these side dishes aretypically served at your Thanksgiving dinner? Please select all that apply. - Frui

In [26]:
data["What kind of stuffing/dressing do you typically have?"].value_counts()

Bread-based               836
None                       60
Rice-based                 42
Other (please specify)     36
Name: What kind of stuffing/dressing do you typically have?, dtype: int64

<h3>Results - Most Popular Side Dishes</h3>
Stuffing was (of course) the most popular side among respondents, and bread-based stuffing beat rice-based by a huge margin. Mashed potatoes came in second, followed by rolls/biscuits and green beans. Other vegetables did not fare so well: Brussels sprouts and cauliflower are the least popular sides at Thanksgiving dinner. Taken on its own, rice-based stuffing (42 respondents) is even less popular than cauliflower (88).
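The long question text makes the counts hard to scan. A sketch that keeps only the part after the final " - " and sorts from most to least common; the counts below are the side-dish values visible in the output above, and the rest of the list (truncated in this export) is left out rather than guessed.

```python
import pandas as pd

prefix = ("Which of these side dishes aretypically served at your Thanksgiving dinner? "
          "Please select all that apply. - ")

# Counts copied from the numberSides output above (only the untruncated entries)
numberSides = pd.Series({
    prefix + "Brussel sprouts": 155,
    prefix + "Carrots": 242,
    prefix + "Cauliflower": 88,
    prefix + "Corn": 464,
    prefix + "Cornbread": 235,
})

# Shorten each label to the dish name, then sort from most to least common
ranked = (numberSides
          .rename(lambda label: label.split(" - ")[-1])
          .sort_values(ascending=False))
print(ranked)
```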

<h3>Desserts</h3>

In [27]:
dessertcols = ['Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Apple',
       'Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Buttermilk',
       'Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Cherry',
       'Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Chocolate',
       'Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Coconut cream',
       'Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Key lime',
       'Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Peach',
       'Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Pecan',
       'Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Pumpkin',
       'Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Sweet Potato',
       'Which of these desserts do you typically have at Thanksgiving dinner? Please select all that apply.   - Apple cobbler',
       'Which of these desserts do you typically have at Thanksgiving dinner? Please select all that apply.   - Blondies',
       'Which of these desserts do you typically have at Thanksgiving dinner? Please select all that apply.   - Brownies',
       'Which of these desserts do you typically have at Thanksgiving dinner? Please select all that apply.   - Carrot cake',
       'Which of these desserts do you typically have at Thanksgiving dinner? Please select all that apply.   - Cheesecake',
       'Which of these desserts do you typically have at Thanksgiving dinner? Please select all that apply.   - Cookies',
       'Which of these desserts do you typically have at Thanksgiving dinner? Please select all that apply.   - Fudge',
       'Which of these desserts do you typically have at Thanksgiving dinner? Please select all that apply.   - Ice cream',
       'Which of these desserts do you typically have at Thanksgiving dinner? Please select all that apply.   - Peach cobbler']

In [28]:
desserts = data[dessertcols]

In [29]:
# Count the non-null (selected) answers in each dessert column
numberDesserts = desserts.notnull().sum()
numberDesserts

Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Apple                 514
Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Buttermilk             35
Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Cherry                113
Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Chocolate             133
Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Coconut cream          36
Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Key lime               39
Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Peach                  34


<h3>Dessert Popularity</h3>
Pumpkin and apple pie blew all the other desserts away in popularity. Surprisingly, ice cream, a natural partner for pie, is not served more often. Too bad the survey did not ask about whipped cream. After all, pumpkin pie with a little whipped cream is delicious.
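Counts are easier to interpret as the share of respondents who selected each dessert. Because every row is one respondent, the mean of a column's `notna()` flags is exactly that share. The sketch uses a made-up frame with hypothetical short column names in place of the survey's headers.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: a non-null cell means the dessert was selected
desserts = pd.DataFrame({
    "Apple":     ["Apple", "Apple", np.nan, "Apple"],
    "Pumpkin":   ["Pumpkin", "Pumpkin", "Pumpkin", "Pumpkin"],
    "Ice cream": [np.nan, "Ice cream", np.nan, np.nan],
})

# Percentage of respondents selecting each dessert, most popular first
dessertShare = (desserts.notna().mean() * 100).sort_values(ascending=False)
print(dessertShare)
```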

<h3>Food Reflection</h3>
It's interesting to see the different foods people eat at Thanksgiving. Perhaps these results can help expand or trim the menu for next year's Thanksgiving. Maybe rice-based stuffing is the way to go if a guest is gluten-free? Or maybe it's time to try a sweet potato pie? In all cases: experiment, eat and enjoy.
