# Analyzing Thanksgiving Dinner

In [2]:
import pandas
import numpy
data = pandas.read_csv('thanksgiving.csv', encoding='Latin-1')
data.head(3)


Unnamed: 0,RespondentID,Do you celebrate Thanksgiving?,What is typically the main dish at your Thanksgiving dinner?,What is typically the main dish at your Thanksgiving dinner? - Other (please specify),How is the main dish typically cooked?,How is the main dish typically cooked? - Other (please specify),What kind of stuffing/dressing do you typically have?,What kind of stuffing/dressing do you typically have? - Other (please specify),What type of cranberry saucedo you typically have?,What type of cranberry saucedo you typically have? - Other (please specify),...,Have you ever tried to meet up with hometown friends on Thanksgiving night?,"Have you ever attended a ""Friendsgiving?""",Will you shop any Black Friday sales on Thanksgiving Day?,Do you work in retail?,Will you employer make you work on Black Friday?,How would you describe where you live?,Age,What is your gender?,How much total combined money did all members of your HOUSEHOLD earn last year?,US Region
0,4337954960,Yes,Turkey,,Baked,,Bread-based,,,,...,Yes,No,No,No,,Suburban,18 - 29,Male,"$75,000 to $99,999",Middle Atlantic
1,4337951949,Yes,Turkey,,Baked,,Bread-based,,Other (please specify),Homemade cranberry gelatin ring,...,No,No,Yes,No,,Rural,18 - 29,Female,"$50,000 to $74,999",East South Central
2,4337935621,Yes,Turkey,,Roasted,,Rice-based,,Homemade,,...,Yes,Yes,Yes,No,,Suburban,18 - 29,Male,"$0 to $9,999",Mountain


The **pandas.Series.value_counts()** method counts how many times each category occurs in the column it is applied to.

In [3]:
data["Do you celebrate Thanksgiving?"].value_counts()

Yes    980
No      78
Name: Do you celebrate Thanksgiving?, dtype: int64
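
To make the behaviour of **value_counts()** concrete, here is a minimal sketch on a made-up Series. The values and the names `responses`, `counts` and `shares` are invented for illustration and are not taken from the survey.

```python
import pandas

# Toy responses, invented for illustration (not survey data).
responses = pandas.Series(["Yes", "No", "Yes", "Yes", None])

# value_counts() skips missing values by default and sorts by count.
counts = responses.value_counts()
print(counts)

# normalize=True returns proportions of the non-missing values instead.
shares = responses.value_counts(normalize=True)
print(shares)
```

The missing value is excluded from both results, so the proportions are computed over the four non-missing answers.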

Below, we keep only the rows in data where the response to *Do you celebrate Thanksgiving?* is *Yes*.

In [4]:
data = data[data["Do you celebrate Thanksgiving?"]=='Yes']
data["Do you celebrate Thanksgiving?"].value_counts()

Yes    980
Name: Do you celebrate Thanksgiving?, dtype: int64
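
The filter above relies on boolean indexing. The sketch below shows the same idea on a toy frame (column names and values are invented). It also takes a `.copy()` of the filtered rows, which is an optional addition not present in the notebook: later cells add columns to the filtered frame, and in some pandas versions that may raise a `SettingWithCopyWarning` without an explicit copy.

```python
import pandas

# Toy frame, invented for illustration.
toy = pandas.DataFrame({
    "celebrates": ["Yes", "No", "Yes"],
    "main_dish": ["Turkey", None, "Tofurkey"],
})

# The comparison yields a boolean Series; indexing with it keeps the True rows.
mask = toy["celebrates"] == "Yes"
celebrants = toy[mask].copy()  # explicit copy, so new columns go on an independent frame

# Adding a column now affects only the copy, not the original frame.
celebrants["is_turkey"] = celebrants["main_dish"] == "Turkey"
print(celebrants)
```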

# Main dishes that people eat at Thanksgiving Dinner:

In [5]:
data['What is typically the main dish at your Thanksgiving dinner?'].value_counts()

Turkey                    859
Other (please specify)     35
Ham/Pork                   29
Tofurkey                   20
Chicken                    12
Roast beef                 11
I don't know                5
Turducken                   3
Name: What is typically the main dish at your Thanksgiving dinner?, dtype: int64

We now filter for respondents whose main dish is Tofurkey to see whether they typically have gravy.

In [6]:
data_tofurkey = data[data['What is typically the main dish at your Thanksgiving dinner?']=='Tofurkey']
print(data_tofurkey['Do you typically have gravy?'].value_counts())

Yes    12
No      8
Name: Do you typically have gravy?, dtype: int64


# Exploring the dessert dishes

By combining the three boolean Series in *ate_no_pies*, we get a **False** count for respondents who **selected at least one of apple, pumpkin, or pecan pie**, and a **True** count for those who **selected none of these three pies**.

In [7]:
apple_isnull = data['Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Apple'].isnull()
pumpkin_isnull = data['Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Pumpkin'].isnull()
pecan_isnull = data['Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Pecan'].isnull()

ate_no_pies = apple_isnull & pumpkin_isnull & pecan_isnull
ate_no_pies.value_counts()

False    876
True     104
dtype: int64
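
As a small check of the logic, the sketch below builds a toy frame with short, invented column names and confirms that chaining the three null masks with `&` gives the same result as asking whether every pie column in a row is null.

```python
import pandas

# Toy pie columns, invented for illustration:
# a value means the pie was selected, None means it was not.
pies = pandas.DataFrame({
    "apple": ["Apple", None, None],
    "pumpkin": ["Pumpkin", "Pumpkin", None],
    "pecan": [None, None, None],
})

# Element-wise AND of the three null masks, as in the cell above.
no_pies_and = pies["apple"].isnull() & pies["pumpkin"].isnull() & pies["pecan"].isnull()

# Equivalent formulation: a row has no pies when every pie column is null.
no_pies_all = pies.isnull().all(axis=1)

print(no_pies_all.value_counts())
```

The second form scales more easily if more pie columns are added to the check.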

# Analyzing Age

In [8]:
def get_age(age):
    if pandas.isnull(age):
        return None
    age = age.split(' ')[0]
    age = age.strip('+')
    return int(age)
data['int_age'] = data['Age'].apply(get_age)
data['int_age'].describe()

count    947.000000
mean      40.089757
std       15.352014
min       18.000000
25%       30.000000
50%       45.000000
75%       60.000000
max       60.000000
Name: int_age, dtype: float64

## Findings
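Because we keep only the lower bound of each age bracket, every value understates the respondent's true age, so the mean of about 40 is biased downward. The quartiles (30, 45, 60) fall on successive bracket bounds, which suggests that the age groups are fairly evenly represented.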
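
One possible way to reduce the downward skew is to use the midpoint of each bracket rather than its lower bound. The helper below is a hypothetical sketch, not part of the notebook: `get_age_midpoint` and its `open_ended_width` parameter are invented names, the bracket labels other than `18 - 29` are assumed from the summary statistics, and the width used for the open-ended top bracket is an arbitrary choice.

```python
def get_age_midpoint(age, open_ended_width=15):
    """Hypothetical helper: midpoint of an age bracket such as '18 - 29'."""
    # Missing values (None or NaN) are not strings, so pass them through as None.
    if not isinstance(age, str):
        return None
    parts = age.split(' ')
    low = int(parts[0].strip('+'))
    if len(parts) == 3:
        # Closed bracket, e.g. '18 - 29': average the two bounds.
        return (low + int(parts[2])) / 2
    # Open-ended bracket, e.g. '60+': assume a fixed width (arbitrary).
    return low + open_ended_width / 2

# Assumed bracket labels, for illustration only.
for label in ['18 - 29', '30 - 44', '45 - 59', '60+']:
    print(label, get_age_midpoint(label))
```

It could be applied the same way as the original function, for example `data['Age'].apply(get_age_midpoint)`.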

# Analyzing Income

In [9]:
def get_income(income):
    if pandas.isnull(income):
        return None
    income = income.split(' ')[0]
    if income == 'Prefer':
        return None
    income = income.replace('$', '')
    income = income.replace(',', '')
    return(int(income))
data['int_income'] = data['How much total combined money did all members of your HOUSEHOLD earn last year?'].apply(get_income)
data['int_income'].describe()


count       829.000000
mean      75965.018094
std       59068.636748
min           0.000000
25%       25000.000000
50%       75000.000000
75%      100000.000000
max      200000.000000
Name: int_income, dtype: float64

## Findings
The standard deviation is large (about 59,000 against a mean of about 76,000). Although we keep only the lower bound of each income bracket, the average income is still fairly high.
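
The same lower-bound extraction can also be written with vectorized string methods instead of a row-by-row function. This is an alternative sketch, not the notebook's method. The sample strings are assumptions about what the income column contains: only the `$... to $...` form appears in the output above, and `Prefer not to answer` and `$200,000 and up` are guesses based on the parsing code and the maximum value.

```python
import pandas

# Sample strings, partly assumed (see the note above).
income = pandas.Series([
    "$75,000 to $99,999",
    "$0 to $9,999",
    "Prefer not to answer",
    None,
    "$200,000 and up",
])

# Capture the first dollar amount; rows without one become missing.
lower = income.str.extract(r"^\$([\d,]+)", expand=False)

# Drop the thousands separators and convert to numbers (missing stays NaN).
int_income = pandas.to_numeric(lower.str.replace(",", "", regex=False))
print(int_income)
```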

# Correlating Travel Distance And Income

In [10]:
# Count respondents with income below 150,000 and at or above 150,000

less_income_count = data[data['int_income'] < 150000]
print(len(less_income_count))
more_income_count = data[data['int_income'] >= 150000]
print(len(more_income_count))


689
140


## Breakdown of Travel Distance for income < 150,000

In [11]:
less_than_150000 = data[data['int_income'] < 150000] 
travelled = less_than_150000['How far will you travel for Thanksgiving?'] 
print(travelled.value_counts())
print('----------------------------------------')
print(travelled.value_counts(normalize = True))

Thanksgiving is happening at my home--I won't travel at all                         281
Thanksgiving is local--it will take place in the town I live in                     203
Thanksgiving is out of town but not too far--it's a drive of a few hours or less    150
Thanksgiving is out of town and far away--I have to drive several hours or fly       55
Name: How far will you travel for Thanksgiving?, dtype: int64
----------------------------------------
Thanksgiving is happening at my home--I won't travel at all                         0.407837
Thanksgiving is local--it will take place in the town I live in                     0.294630
Thanksgiving is out of town but not too far--it's a drive of a few hours or less    0.217707
Thanksgiving is out of town and far away--I have to drive several hours or fly      0.079826
Name: How far will you travel for Thanksgiving?, dtype: float64


## Breakdown of Travel Distance for income > 150,000

In [12]:
more_than_150000 = data[data['int_income'] > 150000] 
travelled = more_than_150000['How far will you travel for Thanksgiving?'] 
print(travelled.value_counts()) 
print('----------------------------------------')
print(travelled.value_counts(normalize = True))

Thanksgiving is happening at my home--I won't travel at all                         49
Thanksgiving is local--it will take place in the town I live in                     25
Thanksgiving is out of town but not too far--it's a drive of a few hours or less    16
Thanksgiving is out of town and far away--I have to drive several hours or fly      12
Name: How far will you travel for Thanksgiving?, dtype: int64
----------------------------------------
Thanksgiving is happening at my home--I won't travel at all                         0.480392
Thanksgiving is local--it will take place in the town I live in                     0.245098
Thanksgiving is out of town but not too far--it's a drive of a few hours or less    0.156863
Thanksgiving is out of town and far away--I have to drive several hours or fly      0.117647
Name: How far will you travel for Thanksgiving?, dtype: float64


## Findings
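Normalizing the two distributions shows that a somewhat larger share of the higher-income group stays at home (about 48% versus 41%). The higher-income group is also slightly more likely to travel far (about 12% versus 8%), so any relationship between income and travel distance is weak. The higher-income table uses a strict > 150,000 filter, so it covers 102 respondents rather than the 140 counted earlier with >= 150,000.

|Share of group|Income < 150,000|Income > 150,000|
|---|---|---|
|Staying at home|0.407837|0.480392|
|Out of town and far away|0.079826|0.117647|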
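
A side-by-side comparison like this can also be produced in one step with `pandas.crosstab`. The sketch below uses a toy frame with invented values and short, invented column names. It uses `>=` for the upper group so that the two groups partition the rows.

```python
import pandas

# Toy data, invented for illustration.
toy = pandas.DataFrame({
    "int_income": [50000, 75000, 175000, 200000, 25000, 200000],
    "travel": ["home", "local", "home", "far", "home", "home"],
})

# Label each row by income group. A missing income compares as False,
# so rows with missing income should be dropped first on real data.
toy["income_group"] = toy["int_income"].ge(150000).map(
    {True: ">= 150,000", False: "< 150,000"}
)

# normalize="index" makes each row sum to 1, so groups of
# different sizes can be compared directly.
shares = pandas.crosstab(toy["income_group"], toy["travel"], normalize="index")
print(shares)
```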


# Linking Friendship and Age

In [13]:
data.pivot_table(index='Have you ever tried to meet up with hometown friends on Thanksgiving night?', columns='Have you ever attended a "Friendsgiving?"', values='int_age')

"Have you ever attended a ""Friendsgiving?""",No,Yes
Have you ever tried to meet up with hometown friends on Thanksgiving night?,Unnamed: 1_level_1,Unnamed: 2_level_1
No,42.283702,37.010526
Yes,41.47541,33.976744


In [14]:
data.pivot_table(index='Have you ever tried to meet up with hometown friends on Thanksgiving night?', columns='Have you ever attended a "Friendsgiving?"', values='int_income')

"Have you ever attended a ""Friendsgiving?""",No,Yes
Have you ever tried to meet up with hometown friends on Thanksgiving night?,Unnamed: 1_level_1,Unnamed: 2_level_1
No,78914.549654,72894.736842
Yes,78750.0,66019.736842


## Findings
The average age is lowest (about 34) for people who have both met up with hometown friends and attended a Friendsgiving. At the other end of the spectrum, people who replied **No** to both questions have the highest average age (about 42).

Income shows the same pattern: average income is lowest (about 66,000) for the group that answered **Yes** to both questions and highest (about 78,900) for the group that answered **No** to both.
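
**pivot_table** aggregates with the mean by default, which is why each cell above is an average. Cell means can be noisy when a group is small, so it may help to look at group sizes as well. The sketch below shows both on a toy frame with invented values and short, invented column names.

```python
import pandas

# Toy data, invented for illustration.
toy = pandas.DataFrame({
    "met_friends": ["Yes", "Yes", "No", "No", "No"],
    "friendsgiving": ["Yes", "No", "Yes", "No", "No"],
    "int_age": [18, 45, 30, 60, 45],
})

# Default aggregation is the mean of the values column per cell.
means = toy.pivot_table(index="met_friends", columns="friendsgiving", values="int_age")

# aggfunc="count" shows how many rows stand behind each mean.
sizes = toy.pivot_table(index="met_friends", columns="friendsgiving",
                        values="int_age", aggfunc="count")

print(means)
print(sizes)
```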

# Most common dessert
We reuse the variables defined earlier while exploring dessert dishes.

In [15]:
print(len(data[~apple_isnull]))
print(len(data[~pumpkin_isnull]))
print(len(data[~pecan_isnull]))


514
729
342


## Findings
Pumpkin pie is the most common of the three pies (729 respondents), followed by apple pie (514) and pecan pie (342).
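
The three counts can also be obtained and ranked in one expression by counting the non-null entries per column. This is a sketch on a toy frame with invented values and short, invented column names.

```python
import pandas

# Toy pie columns, invented for illustration.
pies = pandas.DataFrame({
    "apple": ["Apple", None, "Apple", None],
    "pumpkin": ["Pumpkin", "Pumpkin", "Pumpkin", None],
    "pecan": [None, None, "Pecan", None],
})

# notnull() marks selected pies; summing the booleans counts them per column.
served = pies.notnull().sum().sort_values(ascending=False)
print(served)
```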

# People working on Black Friday


In [16]:
data['Will you employer make you work on Black Friday?'].value_counts()

Yes              43
No               20
Doesn't apply     7
Name: Will you employer make you work on Black Friday?, dtype: int64

## Findings
Among the 70 people who responded, about twice as many say their employer will make them work on Black Friday (43) as say it will not (20).

# Regional Patterns in Dinner Menus

In [17]:
grouped = data.groupby("US Region")
grouped.size()

US Region
East North Central    145
East South Central     56
Middle Atlantic       145
Mountain               41
New England            55
Pacific               130
South Atlantic        203
West North Central     71
West South Central     85
dtype: int64

## US Region wise dinner menu

In [25]:
data.groupby("US Region")["What is typically the main dish at your Thanksgiving dinner?"].value_counts()

US Region           What is typically the main dish at your Thanksgiving dinner?
East North Central  Turkey                                                          135
                    Other (please specify)                                            5
                    Ham/Pork                                                          4
                    Tofurkey                                                          1
East South Central  Turkey                                                           50
                    Other (please specify)                                            4
                    Ham/Pork                                                          1
                    Roast beef                                                        1
Middle Atlantic     Turkey                                                          130
                    Tofurkey                                                          5
                    Other (please speci

## Dinner menu as per place of living

In [19]:
data.groupby("How would you describe where you live?")["What is typically the main dish at your Thanksgiving dinner?"].value_counts(normalize=True)


How would you describe where you live?  What is typically the main dish at your Thanksgiving dinner?
Rural                                   Turkey                                                          0.875000
                                        Other (please specify)                                          0.041667
                                        Ham/Pork                                                        0.032407
                                        I don't know                                                    0.013889
                                        Tofurkey                                                        0.013889
                                        Chicken                                                         0.009259
                                        Turducken                                                       0.009259
                                        Roast beef                                                      0.00

## Findings
The share of respondents whose main dish is Tofurkey is about three times higher in urban areas than in suburban or rural areas.
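
To compare a single dish across areas, the grouped proportions can be reshaped so that each dish becomes a column. The sketch below uses a toy frame with invented values and short, invented column names, so the ratio it prints is not the survey result.

```python
import pandas

# Toy data, invented for illustration.
toy = pandas.DataFrame({
    "area": ["Urban"] * 4 + ["Rural"] * 4,
    "main_dish": ["Turkey", "Tofurkey", "Turkey", "Tofurkey",
                  "Turkey", "Turkey", "Turkey", "Tofurkey"],
})

# Proportion of each main dish within each area.
shares = toy.groupby("area")["main_dish"].value_counts(normalize=True)

# Move the dish level into columns; dishes absent from an area get 0.
table = shares.unstack(fill_value=0)

# Ratio of the Tofurkey share in urban areas to that in rural areas.
ratio = table.loc["Urban", "Tofurkey"] / table.loc["Rural", "Tofurkey"]
print(table)
print(ratio)
```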