# Analyzing Thanksgiving dinner data

In [1]:
import pandas as pd

data = pd.read_csv('thanksgiving.csv', encoding='Latin-1')
print(data[:5])

   RespondentID Do you celebrate Thanksgiving?  \
0    4337954960                            Yes   
1    4337951949                            Yes   
2    4337935621                            Yes   
3    4337933040                            Yes   
4    4337931983                            Yes   

  What is typically the main dish at your Thanksgiving dinner?  \
0                                             Turkey             
1                                             Turkey             
2                                             Turkey             
3                                             Turkey             
4                                           Tofurkey             

  What is typically the main dish at your Thanksgiving dinner? - Other (please specify)  \
0                                                NaN                                      
1                                                NaN                                      
2                            

In [2]:
print(data.columns)

Index(['RespondentID', 'Do you celebrate Thanksgiving?',
       'What is typically the main dish at your Thanksgiving dinner?',
       'What is typically the main dish at your Thanksgiving dinner? - Other (please specify)',
       'How is the main dish typically cooked?',
       'How is the main dish typically cooked? - Other (please specify)',
       'What kind of stuffing/dressing do you typically have?',
       'What kind of stuffing/dressing do you typically have? - Other (please specify)',
       'What type of cranberry saucedo you typically have?',
       'What type of cranberry saucedo you typically have? - Other (please specify)',
       'Do you typically have gravy?',
       'Which of these side dishes aretypically served at your Thanksgiving dinner? Please select all that apply. - Brussel sprouts',
       'Which of these side dishes aretypically served at your Thanksgiving dinner? Please select all that apply. - Carrots',
       'Which of these side dishes aretypically served

In [3]:
counts = (data['Do you celebrate Thanksgiving?'] == 'Yes').value_counts()

print(counts)

True     980
False     78
Name: Do you celebrate Thanksgiving?, dtype: int64


Reducing the dataset to respondents who actually celebrate Thanksgiving

In [4]:
data_yes = data[data['Do you celebrate Thanksgiving?'] == 'Yes']
print(data_yes.shape)
print(data_yes[:5])

(980, 65)
   RespondentID Do you celebrate Thanksgiving?  \
0    4337954960                            Yes   
1    4337951949                            Yes   
2    4337935621                            Yes   
3    4337933040                            Yes   
4    4337931983                            Yes   

  What is typically the main dish at your Thanksgiving dinner?  \
0                                             Turkey             
1                                             Turkey             
2                                             Turkey             
3                                             Turkey             
4                                           Tofurkey             

  What is typically the main dish at your Thanksgiving dinner? - Other (please specify)  \
0                                                NaN                                      
1                                                NaN                                      
2                  

Let's explore what main dishes people tend to eat during Thanksgiving dinner.

In [5]:
main_dishes = (data_yes['What is typically the main dish at your Thanksgiving dinner?']).value_counts()

print(main_dishes[:5])

Turkey                    859
Other (please specify)     35
Ham/Pork                   29
Tofurkey                   20
Chicken                    12
Name: What is typically the main dish at your Thanksgiving dinner?, dtype: int64


In [6]:
tofurkey_data = data_yes[data_yes['What is typically the main dish at your Thanksgiving dinner?'] == 'Tofurkey']

print(tofurkey_data['Do you typically have gravy?'])

4      Yes
33     Yes
69      No
72      No
77     Yes
145    Yes
175    Yes
218     No
243    Yes
275     No
393    Yes
399    Yes
571    Yes
594    Yes
628     No
774     No
820     No
837    Yes
860     No
953    Yes
Name: Do you typically have gravy?, dtype: object
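The raw listing above is hard to read at a glance. As a minimal sketch, the same answers can be tallied with `value_counts()`. The Series below is rebuilt by hand from the printed output so the snippet runs on its own; the name `gravy` is only illustrative.

```python
import pandas as pd

# Gravy answers copied from the printed output above, indexed by row label.
gravy = pd.Series(
    ['Yes', 'Yes', 'No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'No',
     'Yes', 'Yes', 'Yes', 'Yes', 'No', 'No', 'No', 'Yes', 'No', 'Yes'],
    index=[4, 33, 69, 72, 77, 145, 175, 218, 243, 275,
           393, 399, 571, 594, 628, 774, 820, 837, 860, 953],
    name='Do you typically have gravy?',
)

# Count how many Tofurkey eaters answered Yes and No.
gravy_counts = gravy.value_counts()
print(gravy_counts)
```

In the notebook itself, calling `.value_counts()` on `tofurkey_data['Do you typically have gravy?']` should give the same tally without rebuilding the data.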


Next, we'll look at how many people eat apple, pecan, or pumpkin pie during Thanksgiving dinner.

In [7]:
apple_isnull = pd.isnull(data_yes['Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Apple'])
pumpkin_isnull = pd.isnull(data_yes['Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Pumpkin'])
pecan_isnull = pd.isnull(data_yes['Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Pecan'])

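# True means all three pie columns are empty, i.e. the respondent
# selected none of apple, pumpkin, or pecan pie.
no_pies = apple_isnull & pumpkin_isnull & pecan_isnull

print(no_pies.value_counts())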

False    876
True     104
dtype: int64
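In the output above, `True` marks respondents who left all three pie columns empty, so the 876 `False` rows are the ones who eat at least one of the three pies. The sketch below shows the same null-mask logic on a tiny made-up frame. The short column names and the values are hypothetical, not taken from the survey.

```python
import numpy as np
import pandas as pd

# Made-up answers: a pie name means the box was ticked, NaN means it was not.
pies = pd.DataFrame({
    'apple':   ['Apple', np.nan, np.nan, 'Apple'],
    'pumpkin': ['Pumpkin', 'Pumpkin', np.nan, np.nan],
    'pecan':   [np.nan, np.nan, np.nan, 'Pecan'],
})

# True where all three columns are missing: none of these pies is served.
no_pie = pies.isnull().all(axis=1)
# The complement: at least one of the three pies is served.
any_pie = ~no_pie

print(no_pie.value_counts())
print(any_pie.sum())
```

Inverting the mask with `~` is one way to count pie eaters directly instead of reading the `False` row.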


Let's analyze the Age column in more depth. The answers are stored as text, so we first need to convert them to numeric values. This makes it simple to compute things like the average age of survey respondents.

In [8]:
def convert_age_to_int(value):
    # Missing answers stay missing.
    if pd.isnull(value):
        return None

    # Take the first token of the age label and drop any '+' sign,
    # which leaves the lower bound of the age bracket.
    age = value.split(' ')[0]
    return int(age.replace('+', ''))

In [9]:
data_yes['int_age'] = data_yes['Age'].apply(convert_age_to_int)
data_yes['int_age'].describe()

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  """Entry point for launching an IPython kernel.


count    947.000000
mean      40.089757
std       15.352014
min       18.000000
25%       30.000000
50%       45.000000
75%       60.000000
max       60.000000
Name: int_age, dtype: float64

These figures understate the respondents' actual ages, because we keep only the lower bound of each age bracket. For example, everyone in the oldest bracket is recorded as 60.
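One possible alternative to the lower bound is the midpoint of each bracket. The sketch below assumes the brackets are labelled `18 - 29`, `30 - 44`, `45 - 59` and `60+`. Only the lower bounds are confirmed by the output above, so the labels and upper bounds are assumptions to check against the real column. The names `AGE_MIDPOINTS` and `mid_age` are illustrative.

```python
import pandas as pd

# Assumed bracket labels; verify them against the unique values of the Age column.
AGE_MIDPOINTS = {
    '18 - 29': 23.5,
    '30 - 44': 37.0,
    '45 - 59': 52.0,
    '60+': 60.0,  # open-ended bracket: no midpoint exists, lower bound kept
}

# Small made-up sample, including one missing answer.
ages = pd.Series(['18 - 29', '60+', None, '45 - 59'])

# Labels not found in the mapping (including missing answers) become NaN.
mid_age = ages.map(AGE_MIDPOINTS)
print(mid_age)
```

Midpoints are still an approximation, and the open-ended top bracket remains a floor, so any mean computed this way is an estimate rather than a true average.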

Now let's convert the income to numeric.

In [10]:
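def convert_money_to_int(value):
    # Missing answers stay missing.
    if pd.isnull(value):
        return None

    # The first token is either the lower bound of the income bracket
    # or the word 'Prefer', which marks a non-answer.
    income = value.split(' ')[0]
    if income == 'Prefer':
        return None

    # Strip the currency formatting before converting.
    income = income.replace('$', '')
    income = income.replace(',', '')

    return int(income)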

In [11]:
data_yes['int_income'] = data_yes['How much total combined money did all members of your HOUSEHOLD earn last year?'].apply(convert_money_to_int)
data_yes['int_income'].describe()

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  """Entry point for launching an IPython kernel.


count       829.000000
mean      75965.018094
std       59068.636748
min           0.000000
25%       25000.000000
50%       75000.000000
75%      100000.000000
max      200000.000000
Name: int_income, dtype: float64

Again, these figures reflect only the lower bound of each income bracket, not actual earnings.
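Both column assignments above print a pandas warning about setting a value on a copy of a slice. It appears because `data_yes` was created by filtering `data`, so pandas cannot tell whether the new column is meant for the original frame. A common way to avoid it, sketched here on a made-up three-row frame, is to take an explicit copy when filtering.

```python
import pandas as pd

# Toy stand-in for the survey data (values are made up).
data = pd.DataFrame({
    'Do you celebrate Thanksgiving?': ['Yes', 'No', 'Yes'],
    'Age': ['18 - 29', '30 - 44', '60+'],
})

# .copy() makes the filtered frame independent of `data`, so adding a
# column to it is unambiguous and pandas has nothing to warn about.
data_yes = data[data['Do you celebrate Thanksgiving?'] == 'Yes'].copy()

# Same lower-bound conversion as in the notebook, written inline.
data_yes['int_age'] = data_yes['Age'].apply(
    lambda age: int(age.split(' ')[0].replace('+', ''))
)
print(data_yes)
```

The original frame is left untouched, which is usually what is intended when working on a filtered subset.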

We can now see how the distance someone travels for Thanksgiving dinner relates to their income level. A reasonable hypothesis is that people earning less tend to be younger and travel to their parents' houses for Thanksgiving, while people earning more are more likely to host Thanksgiving at their own home.

In [12]:
less_150000 = data_yes[data_yes['int_income'] < 150000]
far_less_150000 = less_150000['How far will you travel for Thanksgiving?']
print(far_less_150000.value_counts())
# print(far_less_150000.value_counts() / len(far_less_150000) * 100)

Thanksgiving is happening at my home--I won't travel at all                         281
Thanksgiving is local--it will take place in the town I live in                     203
Thanksgiving is out of town but not too far--it's a drive of a few hours or less    150
Thanksgiving is out of town and far away--I have to drive several hours or fly       55
Name: How far will you travel for Thanksgiving?, dtype: int64


Among respondents earning less than $150,000, staying home is the most common answer (281 of 689, about 41%), though not a majority.

In [16]:
over_150000 = data_yes[data_yes['int_income'] > 150000]
far_over_150000 = over_150000['How far will you travel for Thanksgiving?']
print(far_over_150000.value_counts())
# print(far_over_150000.value_counts() / len(far_over_150000) * 100)

Thanksgiving is happening at my home--I won't travel at all                         49
Thanksgiving is local--it will take place in the town I live in                     25
Thanksgiving is out of town but not too far--it's a drive of a few hours or less    16
Thanksgiving is out of town and far away--I have to drive several hours or fly      12
Name: How far will you travel for Thanksgiving?, dtype: int64


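The answers rank in the same order for respondents earning over $150,000: staying home is again the most common (49 of 102, about 48%). Note that the two filters use strict inequalities, so any respondent whose recorded income is exactly 150000 falls into neither group.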
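Because the two groups differ a lot in size, percentages are easier to compare than raw counts. The sketch below rebuilds both tallies from the printed outputs above and converts them to shares. The shortened answer labels and the column names are only illustrative.

```python
import pandas as pd

# Shortened labels for the four travel answers, in the order printed above.
labels = ['home', 'local', 'few hours', 'far away']

# Counts copied from the two value_counts() outputs.
under = pd.Series([281, 203, 150, 55], index=labels)
over = pd.Series([49, 25, 16, 12], index=labels)

# Share of each answer within its own income group, in percent.
shares = pd.DataFrame({
    'under_150k_pct': (under / under.sum() * 100).round(1),
    'over_150k_pct': (over / over.sum() * 100).round(1),
})
print(shares)
```

In the notebook, `value_counts(normalize=True) * 100` should produce the same shares directly. Unlike the commented-out lines, which divide by the length of the whole column, it divides by the number of non-missing answers only.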

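In the US, a "Friendsgiving" is when, instead of traveling home for the holiday, you celebrate it with friends who live in your area. The survey asks whether respondents have attended a Friendsgiving and whether they have tried to meet up with hometown friends on Thanksgiving night. Both activities seem likely to skew towards younger people. Let's see if this hypothesis holds up.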

In [14]:
data_yes.pivot_table(index='Have you ever tried to meet up with hometown friends on Thanksgiving night?', columns='Have you ever attended a "Friendsgiving?"', values='int_age')

Have you ever attended a "Friendsgiving?"                                           No        Yes
Have you ever tried to meet up with hometown friends on Thanksgiving night?
No                                                                           42.283702  37.010526
Yes                                                                          41.475410  33.976744


In [15]:
data_yes.pivot_table(index='Have you ever tried to meet up with hometown friends on Thanksgiving night?', columns='Have you ever attended a "Friendsgiving?"', values='int_income')

Have you ever attended a "Friendsgiving?"                                              No           Yes
Have you ever tried to meet up with hometown friends on Thanksgiving night?
No                                                                           78914.549654  72894.736842
Yes                                                                          78750.000000  66019.736842


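On average, respondents who have attended a "Friendsgiving" are younger (mean ages of about 34 to 37, versus about 41 to 42 for those who have not) and report lower household income, which is consistent with the hypothesis. As before, both averages are computed from bracket lower bounds.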
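By default `pivot_table` reports the mean of `values` for each index and column combination, so it says nothing about how many respondents sit behind each average. A minimal sketch of checking group sizes alongside the means follows. The frame, its short column names and its values are made up for illustration.

```python
import pandas as pd

# Made-up survey rows with shortened, hypothetical column names.
survey = pd.DataFrame({
    'meetup': ['Yes', 'Yes', 'No', 'No', 'No', 'Yes'],
    'friendsgiving': ['Yes', 'No', 'Yes', 'No', 'No', 'Yes'],
    'int_age': [18, 45, 30, 60, 45, 30],
})

# Mean age per combination (the default aggregation).
mean_age = survey.pivot_table(
    index='meetup', columns='friendsgiving', values='int_age'
)

# Number of respondents per combination.
group_size = survey.pivot_table(
    index='meetup', columns='friendsgiving', values='int_age', aggfunc='count'
)

print(mean_age)
print(group_size)
```

Small cells can move an average a lot, so looking at the counts helps judge how much weight each mean deserves.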