## Analyzing Thanksgiving Dinner

In [2]:
import pandas as pd
data = pd.read_csv("thanksgiving.csv", encoding="Latin-1")
data.head()

Unnamed: 0,RespondentID,Do you celebrate Thanksgiving?,What is typically the main dish at your Thanksgiving dinner?,What is typically the main dish at your Thanksgiving dinner? - Other (please specify),How is the main dish typically cooked?,How is the main dish typically cooked? - Other (please specify),What kind of stuffing/dressing do you typically have?,What kind of stuffing/dressing do you typically have? - Other (please specify),What type of cranberry saucedo you typically have?,What type of cranberry saucedo you typically have? - Other (please specify),...,Have you ever tried to meet up with hometown friends on Thanksgiving night?,"Have you ever attended a ""Friendsgiving?""",Will you shop any Black Friday sales on Thanksgiving Day?,Do you work in retail?,Will you employer make you work on Black Friday?,How would you describe where you live?,Age,What is your gender?,How much total combined money did all members of your HOUSEHOLD earn last year?,US Region
0,4337954960,Yes,Turkey,,Baked,,Bread-based,,,,...,Yes,No,No,No,,Suburban,18 - 29,Male,"$75,000 to $99,999",Middle Atlantic
1,4337951949,Yes,Turkey,,Baked,,Bread-based,,Other (please specify),Homemade cranberry gelatin ring,...,No,No,Yes,No,,Rural,18 - 29,Female,"$50,000 to $74,999",East South Central
2,4337935621,Yes,Turkey,,Roasted,,Rice-based,,Homemade,,...,Yes,Yes,Yes,No,,Suburban,18 - 29,Male,"$0 to $9,999",Mountain
3,4337933040,Yes,Turkey,,Baked,,Bread-based,,Homemade,,...,Yes,No,No,No,,Urban,30 - 44,Male,"$200,000 and up",Pacific
4,4337931983,Yes,Tofurkey,,Baked,,Bread-based,,Canned,,...,Yes,No,No,No,,Urban,30 - 44,Male,"$100,000 to $124,999",Pacific


# Filtering the data to respondents who celebrate Thanksgiving

In [3]:
# Count each answer in the "Do you celebrate Thanksgiving?" column
data["Do you celebrate Thanksgiving?"].value_counts()

Yes    980
No      78
Name: Do you celebrate Thanksgiving?, dtype: int64

In [4]:
# Keep only rows where the answer to "Do you celebrate Thanksgiving?" is 'Yes'
data = data[data["Do you celebrate Thanksgiving?"] == "Yes"]
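Later cells add new columns (`int_age`, `int_income`) to the filtered `data`. A minimal sketch, assuming a small hypothetical `survey` frame and a hypothetical `age_lower` column, of filtering with a boolean mask through `.loc` and taking an explicit `.copy()`. The copy makes it clear that the filtered frame is independent, and it avoids pandas' SettingWithCopyWarning in cases where the original frame is still referenced.

```python
import pandas as pd

# Hypothetical toy survey using the same question text as the real data
survey = pd.DataFrame({
    "Do you celebrate Thanksgiving?": ["Yes", "No", "Yes", "Yes"],
    "Age": ["18 - 29", "30 - 44", "60+", "45 - 59"],
})

# Build a boolean mask, select the matching rows with .loc, and take an explicit copy
celebrates = survey["Do you celebrate Thanksgiving?"] == "Yes"
celebrants = survey.loc[celebrates].copy()

# Adding a column to the copy leaves `survey` untouched
celebrants["age_lower"] = celebrants["Age"].str.extract(r"(\d+)", expand=False).astype(int)

print(len(celebrants))
print(celebrants["age_lower"].tolist())
```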

# Exploring Main Dishes for Thanksgiving

In [5]:
# Type of Main Dish at Thanksgiving dinner
data["What is typically the main dish at your Thanksgiving dinner?"].value_counts()


Turkey                    859
Other (please specify)     35
Ham/Pork                   29
Tofurkey                   20
Chicken                    12
Roast beef                 11
I don't know                5
Turducken                   3
Name: What is typically the main dish at your Thanksgiving dinner?, dtype: int64

In [6]:
# Show the "Do you typically have gravy?" answers for respondents whose main dish is Tofurkey
data.loc[data["What is typically the main dish at your Thanksgiving dinner?"] == "Tofurkey", "Do you typically have gravy?"]


4      Yes
33     Yes
69      No
72      No
77     Yes
145    Yes
175    Yes
218     No
243    Yes
275     No
393    Yes
399    Yes
571    Yes
594    Yes
628     No
774     No
820     No
837    Yes
860     No
953    Yes
Name: Do you typically have gravy?, dtype: object
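Listing the individual answers works for a small group, but a count is easier to read. A minimal sketch, assuming hypothetical shortened column names (`main_dish`, `gravy`) on toy data, that selects rows and a column in one `.loc` call, summarises them with `value_counts()`, and uses `pd.crosstab` to tabulate gravy answers for every main dish at once.

```python
import pandas as pd

# Hypothetical toy data mirroring two survey columns with shortened names
survey = pd.DataFrame({
    "main_dish": ["Turkey", "Tofurkey", "Tofurkey", "Ham/Pork", "Tofurkey"],
    "gravy": ["Yes", "Yes", "No", "Yes", "Yes"],
})

# Select rows and a column in one .loc call instead of chaining two [] lookups
tofurkey_gravy = survey.loc[survey["main_dish"] == "Tofurkey", "gravy"]
print(tofurkey_gravy.value_counts())

# Cross-tabulate gravy answers against every main dish
gravy_table = pd.crosstab(survey["main_dish"], survey["gravy"])
print(gravy_table)
```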

# Exploring different pies people ate for Thanksgiving

In [7]:
# True where the respondent did not select apple pie
apple_isnull = pd.isnull(data["Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Apple"])
# True where the respondent did not select pumpkin pie
pumpkin_isnull = pd.isnull(data["Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Pumpkin"])
# True where the respondent did not select pecan pie
pecan_isnull = pd.isnull(data["Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Pecan"])

# True for respondents who selected none of the three pies
# (False means they selected at least one of them)
no_pies = apple_isnull & pumpkin_isnull & pecan_isnull
no_pies.value_counts()

False    876
True     104
dtype: int64
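The mask built from `apple_isnull & pumpkin_isnull & pecan_isnull` is True for respondents who selected none of the three pies, so its negation marks respondents who had at least one. A minimal sketch, assuming hypothetical toy columns named `Apple`, `Pumpkin`, and `Pecan` that hold the pie name when selected and NaN otherwise, of computing both masks row-wise with `notna().any(axis=1)` and `isna().all(axis=1)`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy answers: each column holds the pie name if selected, NaN otherwise
pies = pd.DataFrame({
    "Apple": ["Apple", np.nan, np.nan, "Apple"],
    "Pumpkin": ["Pumpkin", "Pumpkin", np.nan, np.nan],
    "Pecan": [np.nan, np.nan, np.nan, "Pecan"],
})

# True when a respondent selected at least one of the three pies
ate_any_pie = pies.notna().any(axis=1)
# True when none of the three pie columns were selected
ate_none = pies.isna().all(axis=1)

print(ate_any_pie.tolist())
print(ate_none.sum())
```

With the real data, the same pattern applies to the three full pie column names.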

# Evaluating Age of respondents

In [8]:
data["Age"].value_counts()

45 - 59    269
60+        258
30 - 44    235
18 - 29    185
Name: Age, dtype: int64

In [9]:
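# Convert each age range to an integer: the lower bound of the range
# (e.g. "18 - 29" -> 18, "60+" -> 60); missing answers become None

def age_converter(value):
    if pd.isnull(value):
        return None
    # Keep only the first token of the range, e.g. "18" or "60+"
    value = value.split(" ")[0]
    # Strip the "+" from the open-ended "60+" group
    value = value.replace("+", "")
    return int(value)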
data["int_age"] = data["Age"].apply(age_converter)
data["int_age"].describe()

count    947.000000
mean      40.089757
std       15.352014
min       18.000000
25%       30.000000
50%       45.000000
75%       60.000000
max       60.000000
Name: int_age, dtype: float64

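Analysis: 
* Respondents are spread fairly evenly across the four age groups (185 to 269 per group). Because each range is converted to its lower bound, `int_age` only runs from 18 to 60, and its mean (about 40) understates the respondents' actual average age.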
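The `apply`-based converter above can also be written as one vectorized string operation. A minimal sketch, assuming a small hypothetical sample of `Age` answers, that uses `Series.str.extract` to pull the first run of digits (the lower bound of each range). Missing answers stay NaN, matching the `None` returned by `age_converter`.

```python
import numpy as np
import pandas as pd

# Hypothetical sample of "Age" answers, including a missing value
ages = pd.Series(["18 - 29", "60+", np.nan, "45 - 59", "30 - 44"])

# Capture the first run of digits (the lower bound); NaN stays NaN
age_lower = ages.str.extract(r"(\d+)", expand=False).astype(float)

print(age_lower.tolist())
print(age_lower.describe()["count"])
```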

## Analyzing household income of the respondents

In [10]:
data["How much total combined money did all members of your HOUSEHOLD earn last year?"].value_counts()

$25,000 to $49,999      166
$75,000 to $99,999      127
$50,000 to $74,999      127
Prefer not to answer    118
$100,000 to $124,999    109
$200,000 and up          76
$10,000 to $24,999       60
$0 to $9,999             52
$125,000 to $149,999     48
$150,000 to $174,999     38
$175,000 to $199,999     26
Name: How much total combined money did all members of your HOUSEHOLD earn last year?, dtype: int64

In [11]:
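# Convert each income bracket to an integer: the lower bound of the bracket
# (e.g. "$75,000 to $99,999" -> 75000); "Prefer not to answer" becomes None

def income_converter(value):
    if pd.isnull(value):
        return None
    # The first token is the lower bound (e.g. "$75,000") or the word "Prefer"
    value = value.split(" ")[0]
    if value == "Prefer":
        return None
    # Remove the dollar sign and thousands separators before converting
    value = value.replace("$", "")
    value = value.replace(",", "")
    return int(value)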
data["int_income"] = data["How much total combined money did all members of your HOUSEHOLD earn last year?"].apply(income_converter)
data["int_income"].describe()

count       829.000000
mean      75965.018094
std       59068.636748
min           0.000000
25%       25000.000000
50%       75000.000000
75%      100000.000000
max      200000.000000
Name: int_income, dtype: float64
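As with age, the income conversion can be vectorized. A minimal sketch, assuming a small hypothetical sample of income answers, that removes thousands separators and then captures the number after the leading `$`. Answers without a leading `$`, such as "Prefer not to answer", do not match and become NaN, mirroring `income_converter`.

```python
import numpy as np
import pandas as pd

# Hypothetical sample of household income answers
income = pd.Series(["$75,000 to $99,999", "Prefer not to answer",
                    "$200,000 and up", np.nan, "$0 to $9,999"])

# Drop thousands separators, then capture the number after the first "$"
income_lower = (income.str.replace(",", "", regex=False)
                      .str.extract(r"^\$(\d+)", expand=False)
                      .astype(float))

print(income_lower.tolist())
```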

# Analyzing respondents' income and how far they travel for Thanksgiving

In [12]:
# Relation between how far people travel for thanksgiving and their income
# people below 150K
data[data["int_income"] < 150000]["How far will you travel for Thanksgiving?"].value_counts()


Thanksgiving is happening at my home--I won't travel at all                         281
Thanksgiving is local--it will take place in the town I live in                     203
Thanksgiving is out of town but not too far--it's a drive of a few hours or less    150
Thanksgiving is out of town and far away--I have to drive several hours or fly       55
Name: How far will you travel for Thanksgiving?, dtype: int64

In [13]:
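# people above 150K
# Note: int_income holds each bracket's lower bound, so "$150,000 to $174,999" maps to exactly 150000
# and is excluded by both the < 150000 and > 150000 filters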
data[data["int_income"] > 150000]["How far will you travel for Thanksgiving?"].value_counts()

Thanksgiving is happening at my home--I won't travel at all                         49
Thanksgiving is local--it will take place in the town I live in                     25
Thanksgiving is out of town but not too far--it's a drive of a few hours or less    16
Thanksgiving is out of town and far away--I have to drive several hours or fly      12
Name: How far will you travel for Thanksgiving?, dtype: int64

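Analysis: 
* Respondents with household income under 150K are somewhat more likely to travel for Thanksgiving: about 59% (408 of 689) are not hosting at home, compared with about 52% (53 of 102) of those above 150K. The gap is small for out-of-town travel (about 30% vs. 27%), and a larger share of the higher-income group travels far (about 12% vs. 8%). One possible explanation is that the lower-income group includes more students or single people who travel to visit family, although the survey does not test this.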
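The two groups differ greatly in size (689 vs. 102 respondents), so raw counts are hard to compare, and the `$150,000 to $174,999` bracket (exactly 150000 after conversion) falls into neither filter. A minimal sketch, assuming hypothetical toy data with simplified travel answers, that bins `int_income` with `pd.cut(..., right=False)` so 150000 lands in the upper group, then uses `pd.crosstab(..., normalize="index")` to get the share of each travel answer within each group.

```python
import pandas as pd

# Hypothetical toy data: lower-bound incomes and simplified travel answers
survey = pd.DataFrame({
    "int_income": [25000, 150000, 200000, 75000, 0, 150000],
    "travel": ["home", "local", "home", "far", "near", "home"],
})

# right=False gives the bins [0, 150000) and [150000, inf), so 150000 goes to the upper group
survey["income_group"] = pd.cut(survey["int_income"],
                                bins=[0, 150000, float("inf")],
                                right=False,
                                labels=["under 150K", "150K and up"])

# Share of each travel answer within each income group (each row sums to 1)
shares = pd.crosstab(survey["income_group"], survey["travel"], normalize="index")
print(shares.round(2))
```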

# Analyzing the relationship between Age and Friendship

In [15]:
# Mean age of people who meet hometown friends and/or attend Friendsgiving
data.pivot_table(
    index="Have you ever tried to meet up with hometown friends on Thanksgiving night?", 
    columns='Have you ever attended a "Friendsgiving?"', 
    values="int_age")

Have you ever attended a "Friendsgiving?"                                            No        Yes
Have you ever tried to meet up with hometown friends on Thanksgiving night?
No                                                                           42.283702  37.010526
Yes                                                                          41.475410  33.976744


# Analyzing the relationship between Income and Friendship

In [16]:
# people who meet friends or celebrate Friendsgiving - with Income
data.pivot_table(
    index="Have you ever tried to meet up with hometown friends on Thanksgiving night?", 
    columns='Have you ever attended a "Friendsgiving?"', 
    values="int_income")

Have you ever attended a "Friendsgiving?"                                               No           Yes
Have you ever tried to meet up with hometown friends on Thanksgiving night?
No                                                                           78914.549654  72894.736842
Yes                                                                          78750.000000  66019.736842


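Analysis: 
* Attending a Friendsgiving is associated with being younger: the mean (lower-bound) age is about 37.0 and 34.0 for attendees, versus about 42.3 and 41.5 for non-attendees. Meeting up with hometown friends on its own makes little difference to age, and the youngest group is people who do both (about 34.0).
* Friendsgiving attendees also report lower mean household income (about 72,895 and 66,020) than non-attendees (about 78,915 and 78,750).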
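`pivot_table` aggregates with the mean by default, and a mean is easier to interpret when you also know how many respondents fall into each cell. A minimal sketch, assuming hypothetical toy data with shortened column names (`meet_friends`, `friendsgiving`), that passes `aggfunc=["mean", "count"]` to report both side by side.

```python
import pandas as pd

# Hypothetical toy data with shortened column names
survey = pd.DataFrame({
    "meet_friends": ["Yes", "No", "Yes", "No", "Yes"],
    "friendsgiving": ["Yes", "No", "No", "Yes", "Yes"],
    "int_age": [18, 60, 45, 30, 18],
})

# Mean age per cell, plus the number of respondents behind each mean
table = survey.pivot_table(index="meet_friends", columns="friendsgiving",
                           values="int_age", aggfunc=["mean", "count"])
print(table)
```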

# Desserts respondents have for Thanksgiving

In [39]:
# Most common dessert people eat
import pandas as pd

# Each checkbox column holds the dessert name if it was selected and NaN otherwise,
# so value_counts() returns one count per dessert
dessert_prefix = "Which of these desserts do you typically have at Thanksgiving dinner? Please select all that apply.   - "
desserts = ["Apple cobbler", "Blondies", "Brownies", "Carrot cake", "Cheesecake",
            "Cookies", "Fudge", "Ice cream", "Peach cobbler"]

dessert_counts = []
for dessert in desserts:
    counts = data[dessert_prefix + dessert].value_counts()
    print(counts)
    dessert_counts.append(counts)

# Stack the per-dessert counts into one Series indexed by dessert name
all_desserts = pd.concat(dessert_counts, axis=0)
all_desserts

Apple cobbler    110
Name: Which of these desserts do you typically have at Thanksgiving dinner? Please select all that apply.   - Apple cobbler, dtype: int64
Blondies    16
Name: Which of these desserts do you typically have at Thanksgiving dinner? Please select all that apply.   - Blondies, dtype: int64
Brownies    128
Name: Which of these desserts do you typically have at Thanksgiving dinner? Please select all that apply.   - Brownies, dtype: int64
Carrot cake    72
Name: Which of these desserts do you typically have at Thanksgiving dinner? Please select all that apply.   - Carrot cake, dtype: int64
Cheesecake    191
Name: Which of these desserts do you typically have at Thanksgiving dinner? Please select all that apply.   - Cheesecake, dtype: int64
Cookies    204
Name: Which of these desserts do you typically have at Thanksgiving dinner? Please select all that apply.   - Cookies, dtype: int64
Fudge    43
Name: Which of these desserts do you typically have at Thanksgiving dinner? Pl

Apple cobbler    110
Blondies          16
Brownies         128
Carrot cake       72
Cheesecake       191
Cookies          204
Fudge             43
Ice cream        266
Peach cobbler    103
dtype: int64

Analysis:
* The most common dessert at Thanksgiving is 'Ice cream' (266).
* Next most common are 'Cookies' (204) and 'Cheesecake' (191).
* 'Brownies' (128), 'Apple cobbler' (110), and 'Peach cobbler' (103) are also popular.
* 'Blondies' are at the bottom of the list (16).
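Because every dessert column shares the same question prefix, the counts can be computed for all of them at once. A minimal sketch, assuming hypothetical toy columns with a shortened prefix (`"Dessert - "`), that selects the prefixed columns, counts non-missing answers with `notna().sum()`, strips the prefix, and ranks the result. Unlike `value_counts()`, this reports 0 for a dessert nobody selected instead of returning an empty Series.

```python
import numpy as np
import pandas as pd

# Hypothetical toy checkbox columns sharing a common prefix, as in the survey
prefix = "Dessert - "
desserts = pd.DataFrame({
    prefix + "Ice cream": ["Ice cream", "Ice cream", np.nan],
    prefix + "Cookies": ["Cookies", np.nan, "Cookies"],
    prefix + "Blondies": [np.nan, np.nan, np.nan],
    "Age": ["18 - 29", "60+", "30 - 44"],
})

# Keep only the dessert columns, count non-missing answers, and rank them
dessert_cols = [c for c in desserts.columns if c.startswith(prefix)]
counts = desserts[dessert_cols].notna().sum()
counts.index = counts.index.str.replace(prefix, "", regex=False)
counts = counts.sort_values(ascending=False)
print(counts)
```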

