# US Thanksgiving Data
The dataset has 65 columns and 1058 rows. Most of the column names are survey questions, and most of the column values are string responses to those questions. Most of the columns are categorical, because a respondent had to select one of a few options. For example, one of the first column names is "What is typically the main dish at your Thanksgiving dinner?". The possible responses are:

- Turkey
- Other (please specify)
- Ham/Pork
- Tofurkey
- Chicken
- Roast beef
- I don't know
- Turducken

In [44]:
import pandas as pd
data = pd.read_csv("thanksgiving.csv",encoding="Latin-1")

First, we want to keep only the respondents who celebrate Thanksgiving, so we drop everyone who did not answer Yes to "Do you celebrate Thanksgiving?".

In [89]:
celebrate_count = data["Do you celebrate Thanksgiving?"].value_counts()
data = data[data["Do you celebrate Thanksgiving?"] == "Yes"]
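The filtering step above relies on boolean masks. Here is a minimal sketch on a small made-up frame; the rows and the names `toy`, `mask` and `celebrants` are assumptions for illustration and are not part of the survey data.

```python
import pandas as pd

# Hypothetical toy data standing in for the survey responses.
toy = pd.DataFrame({
    "Do you celebrate Thanksgiving?": ["Yes", "No", "Yes", "Yes"],
    "respondent": ["a", "b", "c", "d"],
})

# Comparing a column to a value gives a boolean Series (a mask).
mask = toy["Do you celebrate Thanksgiving?"] == "Yes"

# Indexing the frame with the mask keeps only the rows where it is True.
celebrants = toy[mask]
print(len(toy), len(celebrants))
```

The original index labels are kept after filtering, which is why the printed outputs later in this notebook show gaps in the row numbers.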

## Using value_counts

Now, let's see what people eat as their main dish. To do this, we can call the `value_counts()` method on the "What is typically the main dish at your Thanksgiving dinner?" column.

In [101]:
main_dish = data["What is typically the main dish at your Thanksgiving dinner?"]
main_dish_count = main_dish.value_counts()
print(main_dish_count)

Turkey                    859
Other (please specify)     35
Ham/Pork                   29
Tofurkey                   20
Chicken                    12
Roast beef                 11
I don't know                5
Turducken                   3
Name: What is typically the main dish at your Thanksgiving dinner?, dtype: int64
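Raw counts can be hard to compare at a glance, so it may help to look at shares instead. The sketch below uses a made-up Series (the name `dishes` and its values are assumptions, not the real column) to show the `normalize=True` option of `value_counts()`.

```python
import pandas as pd

# Hypothetical toy responses, not the real survey column.
dishes = pd.Series(["Turkey", "Turkey", "Ham/Pork", "Turkey", "Tofurkey"])

# normalize=True returns each category's share of the total instead of its raw count.
shares = dishes.value_counts(normalize=True)
print(shares)
```

Applied to the real `main_dish` Series, the same call would give the fraction of respondents per main dish.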


Hmm, 20 people have Tofurkey (yuck!). I wonder if they have gravy with that? We can find that out below by printing the responses for "Do you typically have gravy?" only for those who have Tofurkey.

In [90]:
# Keep only the rows of respondents whose main dish is Tofurkey
tofurkey_rows = data[main_dish == "Tofurkey"]
# Show their answers to the gravy question
print(tofurkey_rows["Do you typically have gravy?"])

4      Yes
33     Yes
69      No
72      No
77     Yes
145    Yes
175    Yes
218     No
243    Yes
275     No
393    Yes
399    Yes
571    Yes
594    Yes
628     No
774     No
820     No
837    Yes
860     No
953    Yes
Name: Do you typically have gravy?, dtype: object


As we can see, most people who actually have Tofurkey (12 of 20) have it with gravy... because without it, it'd obviously be bland.
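Counting Yes and No by eye in a printed list is error-prone. One possible alternative, sketched here on made-up data (the frame `toy` and its short column names are assumptions), is to select the subset with `.loc` and let `value_counts()` do the tally.

```python
import pandas as pd

# Hypothetical toy data with shortened column names.
toy = pd.DataFrame({
    "main": ["Tofurkey", "Turkey", "Tofurkey", "Tofurkey", "Turkey"],
    "gravy": ["Yes", "Yes", "No", "Yes", "No"],
})

# Select the gravy answers of Tofurkey eaters only, then count each answer.
gravy_counts = toy.loc[toy["main"] == "Tofurkey", "gravy"].value_counts()
print(gravy_counts)
```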

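## Using isnull to see whether most people have pie
Next, we find out whether or not people eat pie with their main dish. Each pie column holds the name of the pie if the respondent selected it and is null otherwise, so a respondent with nulls in all of the pie columns we check did not report having any of those pies.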

In [102]:
data["Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Apple"].value_counts()

Apple    514
Name: Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Apple, dtype: int64

In [91]:
apple_isnull = pd.isnull(data["Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Apple"])
pumpkin_isnull = pd.isnull(data["Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Pumpkin"])
pecan_isnull = pd.isnull(data["Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Pecan"])


# True only when all three pie columns are null, i.e. the respondent has none of these pies
no_pies = (apple_isnull) & (pumpkin_isnull) & (pecan_isnull)
print(no_pies.value_counts())


False    876
True     104
dtype: int64


Surprise, surprise, most people (876 of 980) have pie (apple, pumpkin, or pecan) with their main dish! Here False means that at least one of the three pie columns is not null.
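The same question can be asked the other way round: does a respondent have at least one pie? A minimal sketch on a made-up frame follows; the frame `toy`, its short column names and the name `has_pie` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical toy pie columns: the pie name if selected, None otherwise.
toy = pd.DataFrame({
    "Apple": ["Apple", None, None, "Apple"],
    "Pumpkin": [None, "Pumpkin", None, "Pumpkin"],
    "Pecan": [None, None, None, None],
})

# notnull() marks selected pies; any(axis=1) is True if a row has at least one.
has_pie = toy.notnull().any(axis=1)
print(has_pie.value_counts())
```

This is the logical complement of combining the three `isnull` results with `&`.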

## Creating a function to get age
The dataset stores age as strings that represent ranges:

- 18 - 29
- 30 - 44
- 45 - 59
- 60+
- null


We want to extract a single integer value from each range. This can be done by writing a function and passing it to the `pandas.Series.apply()` method, which applies our function to each element.

In [92]:
import re

def to_int_age(age_row):
    # Missing ages stay missing
    if pd.isnull(age_row):
        return None
    # Ranges such as "18 - 29": keep the lower bound
    elif re.match(r"\d+ - \d+", age_row):
        age_list = age_row.split(" ")
        return int(age_list[0])
    # Open-ended bracket "60+": drop the plus sign
    else:
        new_age = age_row.replace("+","")
        return int(new_age)

data["int_age"] = data["Age"].apply(to_int_age)
data["int_age"].describe()


count    947.000000
mean      40.089757
std       15.352014
min       18.000000
25%       30.000000
50%       45.000000
75%       60.000000
max       60.000000
Name: int_age, dtype: float64

This is only a rough approximation of age, and it skews downward because we took the first value in each string (the lower bound). Still, the quartiles (25%/50%/75%) fall at 30, 45 and 60, one per bracket, which suggests that respondents are spread fairly evenly across the age groups.
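One way to reduce the downward skew would be to use the midpoint of each range instead of its lower bound. This is a sketch of that alternative, not what the notebook does; the function name `age_midpoint` is hypothetical, and keeping 60 for the open-ended "60+" bracket is an assumption, since that bracket has no upper bound.

```python
import re
import pandas as pd

def age_midpoint(age_range):
    """Return the midpoint of a range like '18 - 29'; None for missing values."""
    if pd.isnull(age_range):
        return None
    match = re.fullmatch(r"(\d+) - (\d+)", age_range)
    if match:
        low, high = int(match.group(1)), int(match.group(2))
        return (low + high) / 2
    # Open-ended bracket such as '60+': no upper bound, so keep the lower one (assumption).
    return float(age_range.replace("+", ""))

# Hypothetical sample covering every category listed above.
ages = pd.Series(["18 - 29", "30 - 44", "45 - 59", "60+", None])
midpoints = ages.apply(age_midpoint)
print(midpoints)
```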

## Creating a function to get income
Similarly, income is also in ranges. We can apply the same tactic as before to this column.

In [93]:
print(data["How much total combined money did all members of your HOUSEHOLD earn last year?"].value_counts())

$25,000 to $49,999      166
$50,000 to $74,999      127
$75,000 to $99,999      127
Prefer not to answer    118
$100,000 to $124,999    109
$200,000 and up          76
$10,000 to $24,999       60
$0 to $9,999             52
$125,000 to $149,999     48
$150,000 to $174,999     38
$175,000 to $199,999     26
Name: How much total combined money did all members of your HOUSEHOLD earn last year?, dtype: int64


In [94]:
def to_int_inc(inc_row):
    if pd.isnull(inc_row):
        return None
    inc_list = inc_row.split(" ")
    if inc_list[0] == "Prefer":
        return None
    else:
        inc_list_new = inc_list[0].replace("$","").replace(",","")
        return int(inc_list_new)

data["int_income"] = data["How much total combined money did all members of your HOUSEHOLD earn last year?"].apply(to_int_inc)
data["int_income"].describe()
        

count       829.000000
mean      75965.018094
std       59068.636748
min           0.000000
25%       25000.000000
50%       75000.000000
75%      100000.000000
max      200000.000000
Name: int_income, dtype: float64
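For comparison, the same lower-bound extraction can be written with pandas string methods instead of a custom function. This is only a sketch on a made-up Series; the names `incomes` and `lower_bound` are assumptions.

```python
import pandas as pd

# Hypothetical sample of the income categories shown above.
incomes = pd.Series([
    "$25,000 to $49,999",
    "$200,000 and up",
    "$0 to $9,999",
    "Prefer not to answer",
])

# Remove dollar signs and thousands separators, then capture the leading digits.
cleaned = incomes.str.replace(r"[$,]", "", regex=True)
lower_bound = pd.to_numeric(cleaned.str.extract(r"^(\d+)")[0])
print(lower_bound)
```

Strings that do not start with a number, such as "Prefer not to answer", produce a missing value, which matches what `to_int_inc` returns for them.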

## Do lower income individuals go home for Thanksgiving?

We hypothesise that those with less income are younger and thus prefer to travel back home to celebrate, while those with higher income will have it at their own home.

In [96]:
data[data["int_income"] < 150000]["How far will you travel for Thanksgiving?"].value_counts()

Thanksgiving is happening at my home--I won't travel at all                         281
Thanksgiving is local--it will take place in the town I live in                     203
Thanksgiving is out of town but not too far--it's a drive of a few hours or less    150
Thanksgiving is out of town and far away--I have to drive several hours or fly       55
Name: How far will you travel for Thanksgiving?, dtype: int64

In [97]:
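# Note: "> 150000" leaves out the "$150,000 to $174,999" bracket, whose int_income is exactly 150000,
# so that bracket is in neither group. Using ">= 150000" here would include it.
data[data["int_income"] > 150000]["How far will you travel for Thanksgiving?"].value_counts()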

Thanksgiving is happening at my home--I won't travel at all                         49
Thanksgiving is local--it will take place in the town I live in                     25
Thanksgiving is out of town but not too far--it's a drive of a few hours or less    16
Thanksgiving is out of town and far away--I have to drive several hours or fly      12
Name: How far will you travel for Thanksgiving?, dtype: int64

It appears that the proportion of people with high income who celebrate Thanksgiving at their own home (49 of 102, about 48%) is higher than that of people with lower income (281 of 689, about 41%). This may be because younger students, who don't have a high income, tend to go home, whereas parents, who have higher incomes, don't.
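Because the two income groups differ a lot in size, shares are easier to compare than raw counts. The sketch below uses made-up data (the frame `toy`, the short `travel` labels and the `threshold` variable are assumptions) and splits with `<` and `>=` so that every row lands in exactly one group.

```python
import pandas as pd

# Hypothetical toy data; "travel" uses shortened labels.
toy = pd.DataFrame({
    "int_income": [25000, 50000, 150000, 175000, 200000, 75000],
    "travel": ["home", "far", "home", "home", "local", "local"],
})

threshold = 150000
lower = toy[toy["int_income"] < threshold]
higher = toy[toy["int_income"] >= threshold]  # >= keeps the boundary bracket

# normalize=True turns counts into shares within each group.
lower_share = lower["travel"].value_counts(normalize=True)
higher_share = higher["travel"].value_counts(normalize=True)
print(lower_share)
print(higher_share)
```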

## Is 'Friendsgiving' really a generational thing?

Next, we hypothesise that celebrating 'Friendsgiving' is a fairly new phenomenon, organised mainly by younger age groups. Below I will investigate the average age of individuals who answered Yes and No to the following two questions. We can easily show this information using a pivot table, which by default reports the mean of the values column for each combination of answers.

In [103]:
data.pivot_table(index="Have you ever tried to meet up with hometown friends on Thanksgiving night?", columns='Have you ever attended a "Friendsgiving?"',values="int_age")

| Have you ever tried to meet up with hometown friends on Thanksgiving night? | Have you ever attended a "Friendsgiving?": No | Have you ever attended a "Friendsgiving?": Yes |
|---|---|---|
| No | 42.283702 | 37.010526 |
| Yes | 41.47541 | 33.976744 |


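Huh, so it is for the young: in both rows, those who have attended a Friendsgiving are younger on average, and the lowest average age (about 34) belongs to those who answered Yes to both questions.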
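To see what the pivot table computes, here is a small sketch on made-up data. The frame `toy` and its short column names are assumptions; `aggfunc="mean"` is spelled out even though it is the default.

```python
import pandas as pd

# Hypothetical toy data with shortened column names.
toy = pd.DataFrame({
    "met_friends": ["No", "No", "Yes", "Yes", "Yes"],
    "friendsgiving": ["No", "Yes", "No", "Yes", "Yes"],
    "int_age": [45, 30, 60, 18, 30],
})

# One row per "met_friends" answer, one column per "friendsgiving" answer,
# each cell holding the mean int_age of the matching respondents.
table = toy.pivot_table(
    index="met_friends",
    columns="friendsgiving",
    values="int_age",
    aggfunc="mean",
)
print(table)
```

The Yes/Yes cell averages the two respondents aged 18 and 30, which is the kind of per-group mean the real table reports.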