# Project Description

This project explores some of the eating habits at Thanksgiving dinner in the US. It also looks at how likely people are to travel for Thanksgiving depending on their income, and how likely they are to celebrate with friends depending on their age and income.

The [data](https://github.com/fivethirtyeight/data/tree/master/thanksgiving-2015) used in this project comes from the [FiveThirtyEight](https://github.com/fivethirtyeight/data) repository and contains 1058 responses to a 2015 survey about Thanksgiving.

# Import Libraries

In [21]:
import os
import io
import urllib.request

from IPython.display import display

import pandas as pd
import numpy as np

# Set Global Variables

In [22]:
URL = 'https://raw.githubusercontent.com/fivethirtyeight/data/master/thanksgiving-2015/thanksgiving-2015-poll-data.csv'
PATH_DATASET = 'data/' + URL.split('/')[-1]

# Project Preparation

## Download the data

In [23]:
def download_github_csv_data(url):
    """Download a csv file and store it in the data folder of the project repository.

    Args:
        url: URL of the csv file

    Returns:
        None
    """
    # Create the data directory if it does not exist
    os.makedirs('data/', exist_ok=True)

    filename = url.split('/')[-1]
    path = 'data/' + filename

    # Save the raw bytes unchanged so the file keeps its original encoding
    # (it is read below with encoding='Latin-1'); decoding it with the platform
    # default encoding and re-writing it with pandas can fail or alter non-ASCII characters
    with urllib.request.urlopen(url) as response, open(path, 'wb') as file:
        file.write(response.read())

# Download the data to the data folder of the local repository.
# After the first run you can comment out the next line so that the
# data is not downloaded again every time the notebook runs.
download_github_csv_data(URL)
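Instead of commenting the call out by hand, the download can be skipped when the file already exists. The sketch below uses a hypothetical helper `download_if_missing(url, path)` (not part of the original project) that stores the raw bytes unchanged and reports whether it actually downloaded anything:

```python
import os
import shutil
import urllib.request


def download_if_missing(url, path):
    """Download `url` to `path` unless `path` already exists.

    Returns True if a download happened, False if the existing file was reused.
    """
    if os.path.exists(path):
        return False
    # Create the parent directory (if any) before writing the file
    parent = os.path.dirname(path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    # Copy the response bytes as-is, so the file keeps its original encoding
    with urllib.request.urlopen(url) as response, open(path, 'wb') as out:
        shutil.copyfileobj(response, out)
    return True
```

In this notebook it would be called as `download_if_missing(URL, PATH_DATASET)` in place of `download_github_csv_data(URL)`.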

## Load the dataset

In [24]:
# Load the dataset into the variable data
data = pd.read_csv(PATH_DATASET, encoding='Latin-1')
# Inspect the first 5 rows of the dataset
display(data.head())

| | RespondentID | Do you celebrate Thanksgiving? | What is typically the main dish at your Thanksgiving dinner? | What is typically the main dish at your Thanksgiving dinner? - Other (please specify) | How is the main dish typically cooked? | How is the main dish typically cooked? - Other (please specify) | What kind of stuffing/dressing do you typically have? | What kind of stuffing/dressing do you typically have? - Other (please specify) | What type of cranberry saucedo you typically have? | What type of cranberry saucedo you typically have? - Other (please specify) | ... | Have you ever tried to meet up with hometown friends on Thanksgiving night? | Have you ever attended a "Friendsgiving?" | Will you shop any Black Friday sales on Thanksgiving Day? | Do you work in retail? | Will you employer make you work on Black Friday? | How would you describe where you live? | Age | What is your gender? | How much total combined money did all members of your HOUSEHOLD earn last year? | US Region |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 4337954960 | Yes | Turkey | | Baked | | Bread-based | | | | ... | Yes | No | No | No | | Suburban | 18 - 29 | Male | \$75,000 to \$99,999 | Middle Atlantic |
| 1 | 4337951949 | Yes | Turkey | | Baked | | Bread-based | | Other (please specify) | Homemade cranberry gelatin ring | ... | No | No | Yes | No | | Rural | 18 - 29 | Female | \$50,000 to \$74,999 | East South Central |
| 2 | 4337935621 | Yes | Turkey | | Roasted | | Rice-based | | Homemade | | ... | Yes | Yes | Yes | No | | Suburban | 18 - 29 | Male | \$0 to \$9,999 | Mountain |
| 3 | 4337933040 | Yes | Turkey | | Baked | | Bread-based | | Homemade | | ... | Yes | No | No | No | | Urban | 30 - 44 | Male | \$200,000 and up | Pacific |
| 4 | 4337931983 | Yes | Tofurkey | | Baked | | Bread-based | | Canned | | ... | Yes | No | No | No | | Urban | 30 - 44 | Male | \$100,000 to \$124,999 | Pacific |


# Analysis

## Inspect the column titles

In [25]:
### Print the column names
print(data.columns)

Index(['RespondentID', 'Do you celebrate Thanksgiving?',
       'What is typically the main dish at your Thanksgiving dinner?',
       'What is typically the main dish at your Thanksgiving dinner? - Other (please specify)',
       'How is the main dish typically cooked?',
       'How is the main dish typically cooked? - Other (please specify)',
       'What kind of stuffing/dressing do you typically have?',
       'What kind of stuffing/dressing do you typically have? - Other (please specify)',
       'What type of cranberry saucedo you typically have?',
       'What type of cranberry saucedo you typically have? - Other (please specify)',
       'Do you typically have gravy?',
       'Which of these side dishes aretypically served at your Thanksgiving dinner? Please select all that apply. - Brussel sprouts',
       'Which of these side dishes aretypically served at your Thanksgiving dinner? Please select all that apply. - Carrots',
       'Which of these side dishes aretypically served
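The survey questions double as column names, which makes every selection long and easy to mistype. One option, not used in this project, is to rename the columns the analysis needs to short aliases with `DataFrame.rename`. A minimal sketch on a toy frame; the alias names are hypothetical:

```python
import pandas as pd

# Toy frame using two of the long survey column names (values are made up)
toy = pd.DataFrame({
    'Do you celebrate Thanksgiving?': ['Yes', 'No'],
    'Do you typically have gravy?': ['Yes', None],
})

# Hypothetical short aliases; any column not listed keeps its original name
ALIASES = {
    'Do you celebrate Thanksgiving?': 'celebrates',
    'Do you typically have gravy?': 'gravy',
}

# rename returns a new DataFrame and leaves `toy` unchanged
short = toy.rename(columns=ALIASES)
print(short.columns.tolist())
```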

## Inspect the number of entries that celebrate Thanksgiving

In [26]:
# Inspect the number of entries that celebrate Thanksgiving
print(data['Do you celebrate Thanksgiving?'].value_counts())

Yes    980
No      78
Name: Do you celebrate Thanksgiving?, dtype: int64
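Raw counts are easier to interpret as shares: from the counts above, 980 / 1058 ≈ 0.926, so about 93% of the respondents celebrate Thanksgiving. pandas can compute such shares directly with `value_counts(normalize=True)`; a minimal sketch on a made-up Series (not the survey data):

```python
import pandas as pd

# Toy answers standing in for the 'Do you celebrate Thanksgiving?' column
answers = pd.Series(['Yes', 'Yes', 'No', 'Yes'], name='Do you celebrate Thanksgiving?')

# normalize=True returns the share of each answer instead of the raw count
shares = answers.value_counts(normalize=True)
print(shares)
```

On the survey, `data['Do you celebrate Thanksgiving?'].value_counts(normalize=True)` should return the shares of both answers.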


## Remove the entries that don't celebrate Thanksgiving

In [27]:
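# Keep only the entries that celebrate Thanksgiving; .copy() makes the result an independent
# DataFrame, which avoids a possible SettingWithCopyWarning when columns are added later
data = data[data['Do you celebrate Thanksgiving?'] == 'Yes'].copy()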
print(data['Do you celebrate Thanksgiving?'].value_counts())

Yes    980
Name: Do you celebrate Thanksgiving?, dtype: int64


## Inspect Thanksgiving main dishes

In [28]:
# Inspect the dishes of the entries that celebrate Thanksgiving
print(data['What is typically the main dish at your Thanksgiving dinner?'].value_counts())

Turkey                    859
Other (please specify)     35
Ham/Pork                   29
Tofurkey                   20
Chicken                    12
Roast beef                 11
I don't know                5
Turducken                   3
Name: What is typically the main dish at your Thanksgiving dinner?, dtype: int64
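The counts above add up to 974, six fewer than the 980 respondents who celebrate Thanksgiving. `value_counts()` drops missing values by default, so those six entries have no answer to this question. Passing `dropna=False` keeps missing values as their own row; a sketch on toy data:

```python
import numpy as np
import pandas as pd

# Toy main-dish answers with one unanswered entry (NaN)
main_dish = pd.Series(['Turkey', 'Turkey', np.nan, 'Ham/Pork'])

# By default value_counts() silently drops the missing value ...
print(main_dish.value_counts())

# ... while dropna=False reports it as a separate row
counts_with_missing = main_dish.value_counts(dropna=False)
print(counts_with_missing)
```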


## Inspect how many Tofurkey eaters have gravy

In [29]:
# Inspect gravy for the entries that eat Tofurkey
print(data[data['What is typically the main dish at your Thanksgiving dinner?'] == 'Tofurkey']['Do you typically have gravy?'])

4      Yes
33     Yes
69      No
72      No
77     Yes
145    Yes
175    Yes
218     No
243    Yes
275     No
393    Yes
399    Yes
571    Yes
594    Yes
628     No
774     No
820     No
837    Yes
860     No
953    Yes
Name: Do you typically have gravy?, dtype: object
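The output above lists 12 `Yes` and 8 `No` answers among the 20 Tofurkey respondents. Rather than printing individual rows, `pd.crosstab` counts every combination of main dish and gravy answer in one table. A minimal sketch with toy data and shortened, hypothetical column names:

```python
import pandas as pd

# Toy rows standing in for the two survey columns (not the real data)
toy = pd.DataFrame({
    'main_dish': ['Tofurkey', 'Tofurkey', 'Tofurkey', 'Turkey', 'Turkey'],
    'gravy': ['Yes', 'No', 'Yes', 'Yes', 'Yes'],
})

# Count each (main dish, gravy) combination; combinations that never occur get 0
table = pd.crosstab(toy['main_dish'], toy['gravy'])
print(table)
```

On the survey, the equivalent call would be `pd.crosstab(data['What is typically the main dish at your Thanksgiving dinner?'], data['Do you typically have gravy?'])`.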


## Inspect the count of people eating classic pies

In [30]:
# Flag the entries that serve none of the 'classic' pies (apple, pumpkin, pecan):
# the mask is True only when all three pie columns are empty
apple_isnull = data['Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Apple'].isnull()
pumpkin_isnull = data['Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Pumpkin'].isnull()
pecan_isnull = data['Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Pecan'].isnull()
no_classic_pie = apple_isnull & pumpkin_isnull & pecan_isnull
print(no_classic_pie.value_counts())

False    876
True     104
dtype: int64
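The mask is `True` only when all three pie columns are empty, so the output means that 104 respondents serve none of the classic pies and 876 serve at least one. The positive version of the same question can be asked with `notna().any(axis=1)`, which also avoids spelling out each column. A sketch on toy data, assuming (as the `isnull()` checks above imply) that a pie column is empty when that pie is not served; `PIE_PREFIX` is a hypothetical helper constant:

```python
import numpy as np
import pandas as pd

# Shared prefix of the pie columns, copied from the column names used above
PIE_PREFIX = 'Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - '

# Toy pie columns: filled in when the pie is served, NaN otherwise (not the real data)
toy = pd.DataFrame({
    PIE_PREFIX + 'Apple': ['Apple', np.nan, np.nan],
    PIE_PREFIX + 'Pumpkin': ['Pumpkin', 'Pumpkin', np.nan],
    PIE_PREFIX + 'Pecan': [np.nan, np.nan, np.nan],
})

# Build the classic pie column names instead of typing each one out
classic = [PIE_PREFIX + pie for pie in ('Apple', 'Pumpkin', 'Pecan')]

# True if at least one classic pie column is filled in for that row
has_classic_pie = toy[classic].notna().any(axis=1)
print(has_classic_pie.value_counts())
```

On the survey data, `has_classic_pie` should be `True` for the 876 respondents counted as `False` above.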


## Convert the age column to integer

In [31]:
# Convert the Age column to an integer
def prepare_age(age):

    if isinstance(age, float) and np.isnan(age):
        return None
    
    # Use the lower boundary of the age bracket, e.g. '18 - 29' -> 18 and '60+' -> 60
    age = age.split(' ')[0]
    age = age.replace('+', '')
    
    age = int(age)

    return age
    
data['int_age'] = data['Age'].apply(prepare_age)
print(data['int_age'].describe())

count    947.000000
mean      40.089757
std       15.352014
min       18.000000
25%       30.000000
50%       45.000000
75%       60.000000
max       60.000000
Name: int_age, dtype: float64


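Using the lower bound of each age bracket, the mean age is about 40 and the median is 45. Since these are lower bounds, the actual ages are somewhat higher.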
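Taking the lower bound understates every age. One alternative, not used in this project, is the bracket midpoint. The sketch below is hypothetical (`age_bracket_midpoint` is not part of the original code) and assumes the open-ended bracket is written like `'60+'`, as the `replace('+', '')` above suggests; for that bracket the lower bound is kept, because there is no upper bound to average with:

```python
def age_bracket_midpoint(age):
    """Return the midpoint of an age bracket such as '18 - 29'.

    Open-ended brackets such as '60+' have no upper bound, so this sketch
    falls back to the lower bound for them. Missing values return None.
    """
    # Missing answers arrive as float NaN rather than strings
    if not isinstance(age, str):
        return None
    if age.endswith('+'):
        return float(age[:-1])
    # int() ignores the spaces around '-', e.g. '18 ' -> 18
    low, high = (int(part) for part in age.split('-'))
    return (low + high) / 2
```

It would be applied like `prepare_age`, for example `data['Age'].apply(age_bracket_midpoint)`.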

## Convert the income column to integer

In [32]:
# Prepare the income column
def prepare_income(income):
    
    if isinstance(income, float) and np.isnan(income):
        return None
    
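    # Use the lower boundary of the income bracket, e.g. '$75,000 to $99,999' -> 75000
    income = income.split(' ')[0]
    
    # Answers starting with 'Prefer' (declined to answer) carry no income value
    if income == 'Prefer':
        return None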
    
    income = income.replace('$', '')
    income = income.replace(',', '')
    
    income = int(income)
    
    return income

data['int_income'] = data['How much total combined money did all members of your HOUSEHOLD earn last year?'].apply(prepare_income)
print(data['int_income'].describe())

count       829.000000
mean      75965.018094
std       59068.636748
min           0.000000
25%       25000.000000
50%       75000.000000
75%      100000.000000
max      200000.000000
Name: int_income, dtype: float64


Using the lower bound of each income bracket, the median household income is 75K and the mean is about 76K.
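`prepare_income` keeps only the first dollar amount of each bracket. If the upper bound is also needed, for example to compute bracket midpoints, a regular expression can extract both. A sketch with a hypothetical helper `parse_income_bracket`, assuming the bracket labels look like the ones in the data preview (`'$75,000 to $99,999'`, `'$200,000 and up'`):

```python
import re


def parse_income_bracket(income):
    """Split an income bracket like '$75,000 to $99,999' into (low, high).

    '$200,000 and up' has no upper bound, so high is None. Answers without
    any dollar amount (e.g. declined answers) and missing values return None.
    """
    # Missing answers arrive as float NaN rather than strings
    if not isinstance(income, str):
        return None
    # Every '$' followed by digits and thousands separators
    amounts = [int(a.replace(',', '')) for a in re.findall(r'\$([\d,]+)', income)]
    if not amounts:
        return None
    low = amounts[0]
    high = amounts[1] if len(amounts) > 1 else None
    return low, high
```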

## Inspect the correlation between income and the willingness to travel

In [33]:
# Compare the travel distance of lower- and higher-income respondents
# Income below 150K
print('Income below 150,000')
print(data[data['int_income'] < 150000]['How far will you travel for Thanksgiving?'].value_counts())
# Income equal to or over 150K
print('Income 150,000 or over')
print(data[data['int_income'] >= 150000]['How far will you travel for Thanksgiving?'].value_counts())

Income below 150,000
Thanksgiving is happening at my home--I won't travel at all                         281
Thanksgiving is local--it will take place in the town I live in                     203
Thanksgiving is out of town but not too far--it's a drive of a few hours or less    150
Thanksgiving is out of town and far away--I have to drive several hours or fly       55
Name: How far will you travel for Thanksgiving?, dtype: int64
Income 150,000 or over
Thanksgiving is happening at my home--I won't travel at all                         66
Thanksgiving is local--it will take place in the town I live in                     34
Thanksgiving is out of town but not too far--it's a drive of a few hours or less    25
Thanksgiving is out of town and far away--I have to drive several hours or fly      15
Name: How far will you travel for Thanksgiving?, dtype: int64


The two groups differ in size (689 versus 140 respondents), so shares are more informative than raw counts. About 47% of respondents with an income of 150K or more host Thanksgiving at home, versus about 41% of those below 150K, while the share travelling out of town is similar in both groups (about 29% versus 30%). Higher earners are slightly more likely to travel far (about 11% versus 8%), but overall the data do not show that higher-income respondents travel more for Thanksgiving.
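The two income groups differ in size, so comparing raw counts is misleading. A quick check in plain Python that turns the printed counts (copied from the output above, in the printed order) into shares per group:

```python
# Travel answers in the order they are printed above
TRAVEL_ANSWERS = ['home', 'local', 'few hours', 'far away']
below_150k = [281, 203, 150, 55]  # int_income < 150000
from_150k = [66, 34, 25, 15]      # int_income >= 150000


def shares(counts):
    """Turn raw counts into fractions of the group total."""
    total = sum(counts)
    return [count / total for count in counts]


for name, counts in [('< 150K', below_150k), ('>= 150K', from_150k)]:
    summary = ', '.join(f'{answer}: {share:.1%}'
                        for answer, share in zip(TRAVEL_ANSWERS, shares(counts)))
    print(f'{name} (n={sum(counts)}): {summary}')
```

This prints shares of 40.8% versus 47.1% for hosting at home and 8.0% versus 10.7% for travelling far away.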

## Inspect the tendency to celebrate Thanksgiving with friends based on age and income

In [34]:
# Inspect the tendency to celebrate Thanksgiving with friends based on age and income
# Mean age for each combination of answers
display(pd.pivot_table(data,
                       index='Have you ever tried to meet up with hometown friends on Thanksgiving night?',
                       columns=['Have you ever attended a "Friendsgiving?"'],
                       values='int_age'))

# Mean income for each combination of answers
display(pd.pivot_table(data,
                       index='Have you ever tried to meet up with hometown friends on Thanksgiving night?',
                       columns=['Have you ever attended a "Friendsgiving?"'],
                       values='int_income'))

Mean `int_age`:

| Have you ever tried to meet up with hometown friends on Thanksgiving night? | Attended a "Friendsgiving": No | Attended a "Friendsgiving": Yes |
|---|---|---|
| No | 42.283702 | 37.010526 |
| Yes | 41.47541 | 33.976744 |


Mean `int_income`:

| Have you ever tried to meet up with hometown friends on Thanksgiving night? | Attended a "Friendsgiving": No | Attended a "Friendsgiving": Yes |
|---|---|---|
| No | 78914.549654 | 72894.736842 |
| Yes | 78750.0 | 66019.736842 |


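Respondents who have attended a Friendsgiving are younger on average than those who have not (mean lower-bound age of about 34 to 37 versus 41 to 42) and report lower mean household incomes (about 66K to 73K versus about 79K). This suggests that younger people are more likely to celebrate Thanksgiving with friends.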
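The pivot tables above show means only, so they hide how many respondents fall into each cell, and a small cell can produce an extreme mean. Passing a list to `aggfunc` reports the group size next to the mean. A minimal sketch on toy data with shortened, hypothetical column names:

```python
import pandas as pd

# Toy rows standing in for the survey (not the real data)
toy = pd.DataFrame({
    'hometown_friends': ['No', 'No', 'Yes', 'Yes', 'Yes'],
    'friendsgiving': ['No', 'Yes', 'No', 'Yes', 'Yes'],
    'int_age': [45, 30, 60, 18, 30],
})

# Mean age and number of respondents for each combination of answers;
# the columns become pairs like ('mean', 'Yes') and ('count', 'Yes')
summary = pd.pivot_table(toy, index='hometown_friends', columns='friendsgiving',
                         values='int_age', aggfunc=['mean', 'count'])
print(summary)
```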

Todo:

- Figure out the most common dessert people eat.
- Figure out the most common complete meal people eat.
- Identify how many people work on Thanksgiving.
- Find regional patterns in the dinner menus.
- Find age, gender, and income based patterns in dinner menus.