# Thanksgiving Dataset from FiveThirtyEight
### Challenge and questions from DataQuest

In [1]:
import pandas as pd
data = pd.read_csv("thanksgiving.csv",encoding="Latin-1")

In [2]:
print('NUMBER OF COLUMNS:', len(data.columns))
data.columns

NUMBER OF COLUMNS: 65


Index(['RespondentID', 'Do you celebrate Thanksgiving?',
       'What is typically the main dish at your Thanksgiving dinner?',
       'What is typically the main dish at your Thanksgiving dinner? - Other (please specify)',
       'How is the main dish typically cooked?',
       'How is the main dish typically cooked? - Other (please specify)',
       'What kind of stuffing/dressing do you typically have?',
       'What kind of stuffing/dressing do you typically have? - Other (please specify)',
       'What type of cranberry saucedo you typically have?',
       'What type of cranberry saucedo you typically have? - Other (please specify)',
       'Do you typically have gravy?',
       'Which of these side dishes aretypically served at your Thanksgiving dinner? Please select all that apply. - Brussel sprouts',
       'Which of these side dishes aretypically served at your Thanksgiving dinner? Please select all that apply. - Carrots',
       'Which of these side dishes aretypically served

#### Find out the main dish that people typically have during Thanksgiving.

To answer this question:
1. Filter out everyone who does not celebrate Thanksgiving.
2. For those who do celebrate, count how often each main dish appears.
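The two steps above can be sketched on a small, made-up DataFrame. The rows are hypothetical and only the column names match the survey. The sketch uses a positive mask (`== 'Yes'`), so blank answers are dropped along with the "No" answers. It also calls `value_counts(normalize=True)` to turn the counts into shares.

```python
import pandas as pd

# Hypothetical toy responses; only the column names match the survey
toy = pd.DataFrame({
    "Do you celebrate Thanksgiving?": ["Yes", "Yes", "No", "Yes", None],
    "What is typically the main dish at your Thanksgiving dinner?":
        ["Turkey", "Ham/Pork", None, "Turkey", None],
})

# Step 1: keep only respondents who answered "Yes" (None compares as False)
celebrates = toy["Do you celebrate Thanksgiving?"] == "Yes"
celebrants = toy[celebrates]

# Step 2: frequency count of each main dish, as raw counts and as shares
dish = celebrants["What is typically the main dish at your Thanksgiving dinner?"]
counts = dish.value_counts()
shares = dish.value_counts(normalize=True)
print(counts)
print(shares)
```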

In [3]:
data['Do you celebrate Thanksgiving?'].value_counts()

Yes    980
No      78
Name: Do you celebrate Thanksgiving?, dtype: int64

In [4]:
# True for respondents who celebrate Thanksgiving
celebrates_thanksgiving = data['Do you celebrate Thanksgiving?'] == 'Yes'

In [5]:
# keep only the celebrating respondents (overwrites the original DataFrame)
data = data[celebrates_thanksgiving]

In [6]:
# confirm that only 'Yes' answers remain
data['Do you celebrate Thanksgiving?'].value_counts()

Yes    980
Name: Do you celebrate Thanksgiving?, dtype: int64

In [7]:
data['What is typically the main dish at your Thanksgiving dinner?'].value_counts()

Turkey                    859
Other (please specify)     35
Ham/Pork                   29
Tofurkey                   20
Chicken                    12
Roast beef                 11
I don't know                5
Turducken                   3
Name: What is typically the main dish at your Thanksgiving dinner?, dtype: int64

#### Given the question 'What is typically the main dish at your Thanksgiving dinner?', find out how many of the participants who answered Tofurkey also have gravy.
To solve this question:
1. Filter for all those who responded with 'Tofurkey'
2. Look at how they answered 'Do you typically have gravy?'
3. Get a count for each unique value
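The three steps above work for one dish at a time. The sketch below repeats them for Tofurkey and then uses `pd.crosstab` to get the gravy counts for every main dish in one table. The short column names `main_dish` and `gravy` and the rows themselves are hypothetical stand-ins for the long survey columns.

```python
import pandas as pd

# Hypothetical toy rows, not actual survey answers
toy = pd.DataFrame({
    "main_dish": ["Tofurkey", "Tofurkey", "Turkey", "Tofurkey", "Turkey"],
    "gravy": ["Yes", "No", "Yes", "Yes", "No"],
})

# Steps 1-3 for a single dish: filter rows, select the gravy column, count values
tofurkey_gravy = toy.loc[toy["main_dish"] == "Tofurkey", "gravy"].value_counts()

# The same counts for every dish at once: rows are dishes, columns are gravy answers
table = pd.crosstab(toy["main_dish"], toy["gravy"])
print(tofurkey_gravy)
print(table)
```

On the real data, the Tofurkey row of such a crosstab should carry the same counts as the cell below.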

In [8]:
# Comparing the column with 'Tofurkey' returns a boolean Series;
# indexing the DataFrame with it keeps only the rows where the condition is True
main_dish_tofurkey = data[data['What is typically the main dish at your Thanksgiving dinner?'] == 'Tofurkey']
# Count each answer to the gravy question among the Tofurkey respondents
print(main_dish_tofurkey['Do you typically have gravy?'].value_counts())


Yes    12
No      8
Name: Do you typically have gravy?, dtype: int64


#### Find out how many people ate at least one of the following pies: Apple, Pumpkin or Pecan

In [9]:
apple_isnull = data['Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Apple'].isnull()
pumpkin_isnull = data['Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Pumpkin'].isnull()
pecan_isnull = data['Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Pecan'].isnull()
# True only where all three pie columns are empty, i.e. the respondent had none of these pies
no_pies = apple_isnull & pumpkin_isnull & pecan_isnull
no_pies.value_counts()

False    876
True     104
dtype: int64

The `False` count means that 876 respondents had at least one of these pies. The 104 `True` rows had none of them.
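The same question can be answered directly with `notna().any(axis=1)`. By De Morgan's law, "at least one column is filled in" is the negation of "all three columns are null", which is what combining the three `isnull()` masks with `&` computes. The toy rows below are hypothetical. They only mimic the survey's layout, where a ticked box holds the pie name and an unticked box is missing.

```python
import pandas as pd

prefix = ("Which type of pie is typically served at your Thanksgiving dinner? "
          "Please select all that apply. - ")

# Hypothetical toy answers: a ticked box holds the pie name, an unticked box is None
toy = pd.DataFrame({
    prefix + "Apple":   ["Apple", None, None, None],
    prefix + "Pumpkin": ["Pumpkin", "Pumpkin", None, None],
    prefix + "Pecan":   [None, None, None, "Pecan"],
})

pie_cols = [prefix + pie for pie in ("Apple", "Pumpkin", "Pecan")]
# True where the respondent has at least one of the three pies
ate_at_least_one = toy[pie_cols].notna().any(axis=1)
print(ate_at_least_one.sum())
```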

#### Create a function that converts the age group into an integer and provide a statistical summary of the participants' ages.

In [26]:
def convert_age(x):
    if pd.isnull(x):
        return None
    age = x.split(' ')[0]
    age = age.replace('+','')
    return int(age)
# .apply() calls convert_age on every value in the Series
int_age = data["Age"].apply(convert_age)
data["int_age"] = int_age
print(data["int_age"].describe())

count    947.000000
mean      40.089757
std       15.352014
min       18.000000
25%       30.000000
50%       45.000000
75%       60.000000
max       60.000000
Name: int_age, dtype: float64


Things to be aware of:
1. The age groups do not span the same number of years, and the top group (60+) is open-ended.
2. Representing each group by its youngest age never overstates a respondent's age, so the mean of 40.1 is most likely biased downward.
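One way to reduce the downward bias is to represent each group by its midpoint instead of its lower bound. The sketch below assumes labels of the form `"18 - 29"` and `"60+"`. This format is inferred from `convert_age` splitting on a space and stripping `+`, so check it against `data["Age"].unique()`. The constant `OPEN_ENDED_VALUE` is a hypothetical choice, because the open-ended top group has no midpoint.

```python
import pandas as pd

# Hypothetical representative age for the open-ended top group; any choice is arbitrary
OPEN_ENDED_VALUE = 70

def age_midpoint(label):
    """Return the midpoint of an assumed "<low> - <high>" or "<low>+" label."""
    if pd.isnull(label):
        return None
    if label.endswith("+"):
        return OPEN_ENDED_VALUE
    low, high = (int(part) for part in label.split(" - "))
    return (low + high) / 2

ages = pd.Series(["18 - 29", "30 - 44", "60+", None])
midpoints = ages.apply(age_midpoint)
print(midpoints)
```

Midpoints still hide the spread within each group, and the mean now depends on the value picked for the top group.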

#### Convert Income to Numeric

In [36]:
data["How much total combined money did all members of your HOUSEHOLD earn last year?"].value_counts()

$25,000 to $49,999      166
$50,000 to $74,999      127
$75,000 to $99,999      127
Prefer not to answer    118
$100,000 to $124,999    109
$200,000 and up          76
$10,000 to $24,999       60
$0 to $9,999             52
$125,000 to $149,999     48
$150,000 to $174,999     38
$175,000 to $199,999     26
Name: How much total combined money did all members of your HOUSEHOLD earn last year?, dtype: int64

In [37]:
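def convert_income(income):
    # Missing answers stay missing
    if pd.isnull(income):
        return None

    # Keep the first token, e.g. "$25,000" from "$25,000 to $49,999"
    income = income.split(" ")[0]

    # "Prefer not to answer" has no numeric value
    if income == 'Prefer':
        return None

    # Strip "$" and thousands separators; the result is the bracket's lower bound
    income = income.replace("$","")
    income = income.replace(",","")
    return int(income)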

int_income = data["How much total combined money did all members of your HOUSEHOLD earn last year?"].apply(convert_income)
data["int_income"] = int_income
print(data["int_income"].describe())

count       829.000000
mean      75965.018094
std       59068.636748
min           0.000000
25%       25000.000000
50%       75000.000000
75%      100000.000000
max      200000.000000
Name: int_income, dtype: float64
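`convert_income` can also be written as a vectorised string operation instead of a row-by-row `.apply()`. The sketch below keeps the same lower-bound convention. The regular expression is an assumption based on the labels in the value counts above: it captures the first dollar amount, so "Prefer not to answer" and missing values become `NaN`. The sample Series is hypothetical.

```python
import pandas as pd

# Hypothetical sample of the income labels shown above
income = pd.Series(["$25,000 to $49,999", "$200,000 and up",
                    "Prefer not to answer", None])

# Capture the first dollar amount; labels without one become NaN
lower = income.str.extract(r"^\$([\d,]+)", expand=False)
# Remove thousands separators and convert to numbers (NaN stays NaN)
lower = pd.to_numeric(lower.str.replace(",", "", regex=False))
print(lower)
```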


#### Correlating Travel Distance And Income

In [79]:
# strict <: the $150,000 to $174,999 bracket (stored as 150000) is not in this group
is_income_under_150000 = data["int_income"] < 150000
income_under_150000 = data[is_income_under_150000]["How far will you travel for Thanksgiving?"]
print(income_under_150000.value_counts())
print(len(income_under_150000))

Thanksgiving is happening at my home--I won't travel at all                         281
Thanksgiving is local--it will take place in the town I live in                     203
Thanksgiving is out of town but not too far--it's a drive of a few hours or less    150
Thanksgiving is out of town and far away--I have to drive several hours or fly       55
Name: How far will you travel for Thanksgiving?, dtype: int64
689


In [80]:
# strict >: the $150,000 to $174,999 bracket (stored as 150000) is not in this group
is_income_over_150000 = data["int_income"] > 150000
income_over_150000 = data[is_income_over_150000]["How far will you travel for Thanksgiving?"]
print(income_over_150000.value_counts())
print(len(income_over_150000))

Thanksgiving is happening at my home--I won't travel at all                         49
Thanksgiving is local--it will take place in the town I live in                     25
Thanksgiving is out of town but not too far--it's a drive of a few hours or less    16
Thanksgiving is out of town and far away--I have to drive several hours or fly      12
Name: How far will you travel for Thanksgiving?, dtype: int64
102


In [87]:
income_under_150000.value_counts() / len(income_under_150000)* 100

Thanksgiving is happening at my home--I won't travel at all                         40.783745
Thanksgiving is local--it will take place in the town I live in                     29.462990
Thanksgiving is out of town but not too far--it's a drive of a few hours or less    21.770682
Thanksgiving is out of town and far away--I have to drive several hours or fly       7.982583
Name: How far will you travel for Thanksgiving?, dtype: float64

In [88]:
income_over_150000.value_counts() / len(income_over_150000)* 100

Thanksgiving is happening at my home--I won't travel at all                         48.039216
Thanksgiving is local--it will take place in the town I live in                     24.509804
Thanksgiving is out of town but not too far--it's a drive of a few hours or less    15.686275
Thanksgiving is out of town and far away--I have to drive several hours or fly      11.764706
Name: How far will you travel for Thanksgiving?, dtype: float64
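The comparisons above use `<` and `>`, so respondents whose `int_income` is exactly 150000 (the $150,000 to $174,999 bracket) fall into neither group. The sketch below splits at 150,000 with `>=` so every known income lands in a group. It then builds both percentage tables at once with a row-normalised `pd.crosstab`. The toy rows, the short column name `travel`, the shortened travel labels and the group names are hypothetical.

```python
import pandas as pd

# Hypothetical toy rows; int_income mirrors the lower bounds produced by convert_income
toy = pd.DataFrame({
    "int_income": [25000, 150000, 175000, 200000, None, 50000],
    "travel": ["home", "local", "home", "far", "home", "local"],
})

# Drop missing incomes first (NaN >= 150000 would silently count as False)
known = toy.dropna(subset=["int_income"])
group = known["int_income"].ge(150000).map({True: "150k and up", False: "under 150k"})

# Row-normalised crosstab: each row gives the travel shares (%) within one income group
shares = pd.crosstab(group, known["travel"], normalize="index") * 100
print(shares.round(1))
```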

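Findings:
* People earning more than $150,000 host Thanksgiving at home more often than those earning less than $150,000 (48.0% vs 40.8%).
* People earning more than $150,000 also travel far (a drive of several hours or a flight) more often (11.8% vs 8.0%).
* People earning less than $150,000 are more likely to celebrate in their own town (29.5% vs 24.5%) or within a few hours' drive (21.8% vs 15.7%).
* The higher-income group has only 102 respondents, and the $150,000 to $174,999 bracket falls into neither group, so these differences are tentative.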