<h1>Analyzing Thanksgiving Dinner</h1>

This is a guided project from Dataquest. The dataset is from FiveThirtyEight and can be found <a href="https://github.com/fivethirtyeight/data/tree/master/thanksgiving-2015">here</a>. It contains 1058 responses to an online survey about what Americans eat for Thanksgiving dinner. The first part of this notebook follows the guided project. But what do most people want to know about? The food, of course, so there are some extra statistics at the end.

<h3>Sample of the Starting Data - first four rows</h3>

In [None]:

import pandas as pd
from pandas import Series
import re
data = pd.read_csv("thanksgiving.csv", encoding="Latin-1")
data.head(4)

<h3>List of Questions Asked in the Survey (aka column headers)</h3>

In [None]:
data.columns

In [None]:
seriesCelebrate = data["Do you celebrate Thanksgiving?"]
celebrateCounts = seriesCelebrate.value_counts()
celebrateCounts

<h3>Exclude Participants Not Celebrating Thanksgiving</h3>

In [None]:
data = data[data["Do you celebrate Thanksgiving?"] != "No"]
data

<h3>Counts of Main Dishes Served at Thanksgiving</h3>
Surprisingly, not everyone eats turkey.

In [None]:
seriesMainDish = data["What is typically the main dish at your Thanksgiving dinner?"]
mainDish = seriesMainDish.value_counts()
mainDish

<h3>Tofurkey and Gravy, Anyone?</h3>
The gravy answers of respondents whose typical main dish is tofurkey.

In [None]:
tofurkey = data[data["What is typically the main dish at your Thanksgiving dinner?"] == "Tofurkey"]
tofurkeyGravySeries = tofurkey["Do you typically have gravy?"]
tofurkeyGravySeries

<h3>How many people eat apple, pumpkin or pecan pie?</h3>

In [None]:
apple_isnull = pd.isnull(data["Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Apple"])
pumpkin_isnull = pd.isnull(data["Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Pumpkin"])
pecan_isnull = pd.isnull(data["Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Pecan"])


In [None]:
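# True for respondents who selected at least one of apple, pumpkin, or pecan pie.
# The & of the three isnull masks marks people with none of them, so negate it.
ate_pies = ~(apple_isnull & pumpkin_isnull & pecan_isnull)
ate_pies.value_counts()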
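
To make the boolean logic concrete, here is a minimal sketch on made-up rows. Combining the three `isnull` masks with `&` flags respondents who selected none of the three pies, and negating that mask gives those who selected at least one. The `toy_pies` frame and its short column names are illustrative assumptions, not the survey's real columns.

```python
import pandas as pd

# Hypothetical rows: a selected pie holds its name, an unselected pie is null.
toy_pies = pd.DataFrame({
    "apple": ["Apple", None, None],
    "pumpkin": [None, "Pumpkin", None],
    "pecan": [None, None, None],
})

# True only when all three pie answers are missing.
no_pie = (
    toy_pies["apple"].isnull()
    & toy_pies["pumpkin"].isnull()
    & toy_pies["pecan"].isnull()
)

# Negate to get respondents who selected at least one of the three pies.
any_pie = ~no_pie
print(any_pie.value_counts())
```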

<h3>New Column Introduced - int_age</h3>

In [None]:
def stringToNumber(column):
    # Convert an age bracket such as "18 - 29" or "60+" to its lower bound.
    if pd.isnull(column):
        return None
    else:
        nums = column.split(" ")
        age = nums[0]
        if age == "60+":
            age = "60"
        return int(age)

dataAge = data["Age"]

In [None]:
data["int_age"] = dataAge.apply(stringToNumber)
data

In [None]:
data["int_age"].describe()

<h3>About the int_age Column</h3>
The column "int_age" was produced by taking the minimum age of each age category. The age categories are:<br>
18 - 29,<br>
30 - 44,<br>
45 - 59,<br>
60+,<br>
and null.<br>
This column is not the true age of survey participants and should not be used to obtain accurate statistics involving age.
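
As a quick check of the lower-bound rule described above, here is a minimal sketch that applies it to a few hand-written values. The helper name `age_lower_bound` and the `sample_ages` Series are illustrative assumptions that mirror the listed categories; they are not taken from the survey file.

```python
import pandas as pd

def age_lower_bound(value):
    """Return the lower bound of an age bracket such as '18 - 29' or '60+'."""
    if pd.isnull(value):
        return None
    first = value.split(" ")[0]          # "18", "30", "45" or "60+"
    return int(first.replace("+", ""))   # drop the trailing "+" on "60+"

# Hypothetical sample values, one per category plus a missing answer.
sample_ages = pd.Series(["18 - 29", "30 - 44", "45 - 59", "60+", None])
sample_int_age = sample_ages.apply(age_lower_bound)
print(sample_int_age)
```

Every respondent in a bracket gets the same number, which is why summary statistics on this column understate the real ages.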

<h3>New Column Introduced - int_income</h3>

In [None]:
def incomeToNumber(column):
    if pd.isnull(column):
        return None
    else:
        nums = column.split(" ")
        income = nums[0]
        if income == "Prefer":
            return None
        else:
            # Raw string avoids an invalid-escape warning for "\$".
            income = re.sub(r"\$", "", income)
            income = re.sub(",", "", income)
            return int(income)

In [None]:
dataIncome = data["How much total combined money did all members of your HOUSEHOLD earn last year?"]
data["int_income"] = dataIncome.apply(incomeToNumber)
data

In [None]:
data["int_income"].describe()

<h3>About the int_income Column</h3>
The column "int_income" is derived from the lower number of the range in the column "How much total combined money did all members of your HOUSEHOLD earn last year?" A NaN value indicates that the participant preferred not to answer or that the answer is missing. This column is not the true income of survey participants and should not be used to obtain accurate statistics involving income.
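
The same lower-bound idea can be sketched on a few hand-written strings. The helper name `income_lower_bound` and the sample answers are illustrative assumptions modeled on the formats the notebook's function handles (a leading dollar amount with commas, or an answer starting with "Prefer"); the exact wording of the survey's answer options may differ.

```python
import pandas as pd

def income_lower_bound(value):
    """Return the first dollar amount of an income range as an int, else None."""
    if pd.isnull(value) or value.startswith("Prefer"):
        return None
    first = value.split(" ")[0]                           # e.g. "$25,000"
    return int(first.replace("$", "").replace(",", ""))   # e.g. 25000

# Hypothetical sample answers, including a refusal and a missing value.
sample_incomes = pd.Series([
    "$25,000 to $49,999",
    "$200,000 and up",
    "Prefer not to answer",
    None,
])
sample_int_income = sample_incomes.apply(income_lower_bound)
print(sample_int_income)
```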

<h3>Correlations between Income and Travel</h3>

In [None]:
incomeLess150k = data[data["int_income"] < 150000]

In [None]:
incomeLess150kTravel = incomeLess150k["How far will you travel for Thanksgiving?"]
incomeLess150kTravel.value_counts()

In [None]:
incomeMore150k = data[data["int_income"] > 150000]

In [None]:
incomeMore150kTravel = incomeMore150k["How far will you travel for Thanksgiving?"]
incomeMore150kTravel.value_counts()

In [None]:
# Percentages of travel data for income < 150k
lessHome = 281 / 689 * 100   # At home
lessLocal = 203 / 689 * 100  # Local
lessClose = 150 / 689 * 100  # Out of town but not far
lessFar = 55 / 689 * 100     # Drive for hours or fly
print(lessHome, lessLocal, lessClose, lessFar)


In [None]:
# Percentages of travel data for income > 150k
moreHome = 49 / 102 * 100   # At home
moreLocal = 25 / 102 * 100  # Local
moreClose = 16 / 102 * 100  # Out of town but not far
moreFar = 12 / 102 * 100    # Drive for hours or fly
print(moreHome, moreLocal, moreClose, moreFar)


<h3>About Income and Travel</h3>
The original instructions for this project left out any data for those who made 150k. Comparative statistics were added by calculating percentages. Those who made more than 150k were a bit more likely to host Thanksgiving at home (about 48% versus 41%) or travel far (about 12% versus 8%).
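
The percentages above were typed in by hand from the `value_counts()` output, with the denominators 689 and 102 being the sums of the four counts in each group. A less error-prone option is to let pandas compute the shares with `value_counts(normalize=True)`. The sketch below shows the idea on a made-up frame; the helper name `travel_percentages`, the `toy` data, and the short column name `travel` are assumptions for illustration only.

```python
import pandas as pd

def travel_percentages(df, income_mask, travel_col):
    """Percentage of each travel answer among the rows selected by income_mask."""
    # normalize=True returns fractions of the non-missing answers.
    return df.loc[income_mask, travel_col].value_counts(normalize=True) * 100

# Hypothetical data standing in for int_income and the travel question.
toy = pd.DataFrame({
    "int_income": [25000, 50000, 75000, 200000, 175000, 100000],
    "travel": ["home", "local", "home", "home", "far", "home"],
})

less = travel_percentages(toy, toy["int_income"] < 150000, "travel")
more = travel_percentages(toy, toy["int_income"] > 150000, "travel")
print(less)
print(more)
```

On the real data the same call would take the notebook's `data` frame and the full travel question as `travel_col`, so the percentages update automatically if the filters change.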

<h3>Thanksgiving, Friends, Age and Income</h3>

In [None]:
meetUp = data["Have you ever tried to meet up with hometown friends on Thanksgiving night?"]
friendsgiving = data['Have you ever attended a "Friendsgiving?"']
ageFriends = data.pivot_table(index=meetUp, columns=friendsgiving, values=["int_age"])
ageFriends

In [None]:
incomeFriends = data.pivot_table(index=meetUp, columns=friendsgiving, values=["int_income"])
incomeFriends

<h3>About Friends and Thanksgiving</h3>
People who have attended a "Friendsgiving" and tried to meet up with hometown friends on Thanksgiving night were, on average, younger and had lower incomes. People who have never attended a "Friendsgiving" were, on average, older and had higher incomes.
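
For readers unfamiliar with `pivot_table`, here is a minimal sketch of what the tables above compute: with no `aggfunc` given, pandas averages the value column within each combination of the row and column answers. The `toy` frame and its short column names are made up for illustration.

```python
import pandas as pd

# Hypothetical answers to the two yes/no questions plus a lower-bound age.
toy = pd.DataFrame({
    "meet_up": ["Yes", "Yes", "No", "No", "Yes", "No"],
    "friendsgiving": ["Yes", "No", "Yes", "No", "Yes", "No"],
    "int_age": [18, 30, 45, 60, 30, 45],
})

# Default aggfunc is the mean, so each cell is the average int_age of a group.
age_table = toy.pivot_table(index="meet_up", columns="friendsgiving", values="int_age")
print(age_table)
```

Because `int_age` and `int_income` are lower bounds of brackets, the cell values are useful for comparing groups rather than as true averages.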

<h2>Foods at Thanksgiving</h2>
On to what really matters at Thanksgiving - the food. Let's see which foods and desserts are most popular at Thanksgiving.

<h3>Most Popular Side Dishes</h3>

In [None]:
sidecols = ['Which of these side dishes aretypically served at your Thanksgiving dinner? Please select all that apply. - Brussel sprouts',
       'Which of these side dishes aretypically served at your Thanksgiving dinner? Please select all that apply. - Carrots',
       'Which of these side dishes aretypically served at your Thanksgiving dinner? Please select all that apply. - Cauliflower',
       'Which of these side dishes aretypically served at your Thanksgiving dinner? Please select all that apply. - Corn',
       'Which of these side dishes aretypically served at your Thanksgiving dinner? Please select all that apply. - Cornbread',
       'Which of these side dishes aretypically served at your Thanksgiving dinner? Please select all that apply. - Fruit salad',
       'Which of these side dishes aretypically served at your Thanksgiving dinner? Please select all that apply. - Green beans/green bean casserole',
       'Which of these side dishes aretypically served at your Thanksgiving dinner? Please select all that apply. - Macaroni and cheese',
       'Which of these side dishes aretypically served at your Thanksgiving dinner? Please select all that apply. - Mashed potatoes',
       'Which of these side dishes aretypically served at your Thanksgiving dinner? Please select all that apply. - Rolls/biscuits',
       'Which of these side dishes aretypically served at your Thanksgiving dinner? Please select all that apply. - Squash',
       'Which of these side dishes aretypically served at your Thanksgiving dinner? Please select all that apply. - Vegetable salad',
       'Which of these side dishes aretypically served at your Thanksgiving dinner? Please select all that apply. - Yams/sweet potato casserole']

In [None]:
# drop=True keeps the old index from becoming an extra column in the counts.
sides = data[sidecols].reset_index(drop=True)

In [None]:
# Count the respondents who selected each side dish (non-null answers).
numberSides = sides.notnull().sum()
numberSides

In [None]:
data["What kind of stuffing/dressing do you typically have?"].value_counts()

<h3>Results - Most Popular Side Dishes</h3>
Stuffing was (of course) the most popular side among respondents. Bread-based stuffing beat rice-based by a huge margin. Mashed potatoes came in second, followed by rolls/biscuits and green beans. Other vegetables did not fare so well: Brussels sprouts and cauliflower are the least popular for Thanksgiving dinner. If rice-based stuffing is isolated, it turns out to be less popular than cauliflower.
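
The ranking above comes from counting non-null answers in each select-all column. A minimal sketch of that counting, sorted from most to least popular, is shown below. It assumes (as the null-based counts in this notebook imply) that a selected dish is stored as a non-null value and an unselected one is left empty; the `toy_sides` frame and its column names are made up.

```python
import pandas as pd

# Hypothetical select-all columns: a value when selected, None otherwise.
toy_sides = pd.DataFrame({
    "Side - Mashed potatoes": ["Mashed potatoes", "Mashed potatoes", None, "Mashed potatoes"],
    "Side - Cauliflower": [None, "Cauliflower", None, None],
    "Side - Corn": ["Corn", None, None, "Corn"],
})

# notnull().sum() counts selections per column; sort to rank the dishes.
side_counts = toy_sides.notnull().sum().sort_values(ascending=False)
print(side_counts)
```

Counting non-null values directly avoids hardcoding the number of respondents.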

<h3>Desserts</h3>

In [None]:
dessertcols = ['Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Apple',
       'Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Buttermilk',
       'Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Cherry',
       'Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Chocolate',
       'Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Coconut cream',
       'Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Key lime',
       'Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Peach',
       'Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Pecan',
       'Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Pumpkin',
       'Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Sweet Potato',
       'Which of these desserts do you typically have at Thanksgiving dinner? Please select all that apply.   - Apple cobbler',
       'Which of these desserts do you typically have at Thanksgiving dinner? Please select all that apply.   - Blondies',
       'Which of these desserts do you typically have at Thanksgiving dinner? Please select all that apply.   - Brownies',
       'Which of these desserts do you typically have at Thanksgiving dinner? Please select all that apply.   - Carrot cake',
       'Which of these desserts do you typically have at Thanksgiving dinner? Please select all that apply.   - Cheesecake',
       'Which of these desserts do you typically have at Thanksgiving dinner? Please select all that apply.   - Cookies',
       'Which of these desserts do you typically have at Thanksgiving dinner? Please select all that apply.   - Fudge',
       'Which of these desserts do you typically have at Thanksgiving dinner? Please select all that apply.   - Ice cream',
       'Which of these desserts do you typically have at Thanksgiving dinner? Please select all that apply.   - Peach cobbler',]

In [None]:
# drop=True keeps the old index from becoming an extra column in the counts.
desserts = data[dessertcols].reset_index(drop=True)

In [None]:
# Count the respondents who selected each dessert (non-null answers).
numberDesserts = desserts.notnull().sum()
numberDesserts

<h3>Dessert Popularity</h3>
Pumpkin and apple pie blew all the other desserts away in popularity. Surprisingly, ice cream is not served more often with pie. Too bad they did not ask about whipped cream. After all, pumpkin pie with a little whipped cream is delicious.  

<h3>Food Reflection</h3>
It's interesting to see the different foods people eat at Thanksgiving. Perhaps it helps expand or tighten the menu at the next Thanksgiving. Maybe rice-based stuffing is the way to go if a guest is gluten-free? Or maybe it's time to try a sweet potato pie? In all cases - experiment, eat and enjoy.
