# Analyzing Thanksgiving Dinner

### Import Data

In [2]:
import pandas as pd

In [3]:
data = pd.read_csv("thanksgiving.csv", encoding = "Latin-1")

In [4]:
data.head()

Unnamed: 0,RespondentID,Do you celebrate Thanksgiving?,What is typically the main dish at your Thanksgiving dinner?,What is typically the main dish at your Thanksgiving dinner? - Other (please specify),How is the main dish typically cooked?,How is the main dish typically cooked? - Other (please specify),What kind of stuffing/dressing do you typically have?,What kind of stuffing/dressing do you typically have? - Other (please specify),What type of cranberry saucedo you typically have?,What type of cranberry saucedo you typically have? - Other (please specify),...,Have you ever tried to meet up with hometown friends on Thanksgiving night?,"Have you ever attended a ""Friendsgiving?""",Will you shop any Black Friday sales on Thanksgiving Day?,Do you work in retail?,Will you employer make you work on Black Friday?,How would you describe where you live?,Age,What is your gender?,How much total combined money did all members of your HOUSEHOLD earn last year?,US Region
0,4337954960,Yes,Turkey,,Baked,,Bread-based,,,,...,Yes,No,No,No,,Suburban,18 - 29,Male,"$75,000 to $99,999",Middle Atlantic
1,4337951949,Yes,Turkey,,Baked,,Bread-based,,Other (please specify),Homemade cranberry gelatin ring,...,No,No,Yes,No,,Rural,18 - 29,Female,"$50,000 to $74,999",East South Central
2,4337935621,Yes,Turkey,,Roasted,,Rice-based,,Homemade,,...,Yes,Yes,Yes,No,,Suburban,18 - 29,Male,"$0 to $9,999",Mountain
3,4337933040,Yes,Turkey,,Baked,,Bread-based,,Homemade,,...,Yes,No,No,No,,Urban,30 - 44,Male,"$200,000 and up",Pacific
4,4337931983,Yes,Tofurkey,,Baked,,Bread-based,,Canned,,...,Yes,No,No,No,,Urban,30 - 44,Male,"$100,000 to $124,999",Pacific


In [5]:
data.columns

Index(['RespondentID', 'Do you celebrate Thanksgiving?',
       'What is typically the main dish at your Thanksgiving dinner?',
       'What is typically the main dish at your Thanksgiving dinner? - Other (please specify)',
       'How is the main dish typically cooked?',
       'How is the main dish typically cooked? - Other (please specify)',
       'What kind of stuffing/dressing do you typically have?',
       'What kind of stuffing/dressing do you typically have? - Other (please specify)',
       'What type of cranberry saucedo you typically have?',
       'What type of cranberry saucedo you typically have? - Other (please specify)',
       'Do you typically have gravy?',
       'Which of these side dishes aretypically served at your Thanksgiving dinner? Please select all that apply. - Brussel sprouts',
       'Which of these side dishes aretypically served at your Thanksgiving dinner? Please select all that apply. - Carrots',
       'Which of these side dishes aretypically served

### Filtering out Rows From a DataFrame

We remove the responses of people who do not celebrate Thanksgiving.

Doing this up front means the rest of the analysis, and the final write-up, covers only respondents to whom the questions about the meal apply.

In [6]:
data["Do you celebrate Thanksgiving?"].value_counts()

Yes    980
No      78
Name: Do you celebrate Thanksgiving?, dtype: int64

In [7]:
data = data[data["Do you celebrate Thanksgiving?"] == "Yes"]

In [8]:
data["Do you celebrate Thanksgiving?"].value_counts()

Yes    980
Name: Do you celebrate Thanksgiving?, dtype: int64

In [9]:
data.shape[0]

980

We successfully **removed** the 78 "No" answers to the "Do you celebrate Thanksgiving?" question of the Thanksgiving data set, leaving 980 entries.
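
The filtering step can be reproduced on a small, made-up frame. This is only a minimal sketch: the `toy` DataFrame and its values are illustrative assumptions, not rows from the survey.

```python
import pandas as pd

# Hypothetical toy data standing in for the survey (not real responses).
toy = pd.DataFrame({
    "Do you celebrate Thanksgiving?": ["Yes", "No", "Yes", "Yes"],
    "RespondentID": [1, 2, 3, 4],
})

# Comparing a column to a value yields a boolean Series (the mask).
mask = toy["Do you celebrate Thanksgiving?"] == "Yes"

# Indexing with the mask keeps only the rows where it is True.
# .copy() makes the result an independent frame, which can avoid a
# SettingWithCopyWarning in some pandas versions when columns are added later.
celebrators = toy[mask].copy()

print(celebrators.shape[0])
```

The original row labels are preserved after filtering, which is why the index in later outputs has gaps.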

### Exploring Main Thanksgiving Dishes 

In [10]:
main_dishes_count = data["What is typically the main dish at your Thanksgiving dinner?"].value_counts()

In [11]:
main_dishes_count

Turkey                    859
Other (please specify)     35
Ham/Pork                   29
Tofurkey                   20
Chicken                    12
Roast beef                 11
I don't know                5
Turducken                   3
Name: What is typically the main dish at your Thanksgiving dinner?, dtype: int64

In [12]:
Tofurkey_gravy_answer = data[data["What is typically the main dish at your Thanksgiving dinner?"] == "Tofurkey"]["Do you typically have gravy?"]

In [13]:
Tofurkey_gravy_answer

4      Yes
33     Yes
69      No
72      No
77     Yes
145    Yes
175    Yes
218     No
243    Yes
275     No
393    Yes
399    Yes
571    Yes
594    Yes
628     No
774     No
820     No
837    Yes
860     No
953    Yes
Name: Do you typically have gravy?, dtype: object

In [14]:
len(Tofurkey_gravy_answer)

20

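We observe that the most common Thanksgiving main dish is turkey, named by 859 of the 980 respondents.

Furthermore, 20 respondents named "Tofurkey" as their main dish. Of these, 12 answered "Yes" and 8 answered "No" when asked whether they typically have gravy.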
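
To see how the gravy answers split between "Yes" and "No", rather than only how many answers there are, the filtered Series can be tallied with `value_counts()`. The sketch below uses a made-up frame with shortened, hypothetical column names (`main_dish`, `gravy`) in place of the full survey questions.

```python
import pandas as pd

# Hypothetical toy data; the column names are shortened stand-ins.
toy = pd.DataFrame({
    "main_dish": ["Tofurkey", "Turkey", "Tofurkey", "Tofurkey", "Turkey"],
    "gravy": ["Yes", "Yes", "No", "Yes", "No"],
})

# Select the gravy answers of the Tofurkey rows in one .loc call.
tofurkey_gravy = toy.loc[toy["main_dish"] == "Tofurkey", "gravy"]

# Tally each distinct answer instead of only counting rows with len().
gravy_counts = tofurkey_gravy.value_counts()
print(gravy_counts)
```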

### Main Thanksgiving Dessert Dishes

In [15]:
apple_isnull = data["Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Apple"].isnull()

In [16]:
pumpkin_isnull = data["Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Pumpkin"].isnull()

In [17]:
pecan_isnull = data["Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Pecan"].isnull()

In [18]:
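# Despite its name, ate_pies is True where all three pie answers are missing,
# i.e. the respondent selected none of apple, pumpkin, or pecan pie.
ate_pies = apple_isnull & pumpkin_isnull & pecan_isnull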

In [28]:
print(ate_pies.unique())
print("---------")
print(ate_pies.value_counts())

[False  True]
---------
False    876
True     104
dtype: int64


We see that 104 respondents left the apple, pumpkin, and pecan options all blank for the question **Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply.** That is, they selected none of these three pies.
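
The same "all three are missing" check can be written without naming each mask separately. This is a sketch on made-up data, with shortened, hypothetical column names standing in for the full survey questions.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: a cell holds the pie name if selected, NaN otherwise.
toy = pd.DataFrame({
    "pie - Apple": ["Apple", np.nan, np.nan],
    "pie - Pumpkin": [np.nan, np.nan, "Pumpkin"],
    "pie - Pecan": ["Pecan", np.nan, np.nan],
})

pie_columns = ["pie - Apple", "pie - Pumpkin", "pie - Pecan"]

# isnull() gives a boolean frame; all(axis=1) is True only for rows
# where every one of the selected columns is missing.
no_pie = toy[pie_columns].isnull().all(axis=1)

print(no_pie.value_counts())
```

This form scales to more pie columns by extending `pie_columns` rather than chaining more `&` operators.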

### Converting Age to Numeric

In [31]:
data["Age"].head()

0    18 - 29
1    18 - 29
2    18 - 29
3    30 - 44
4    30 - 44
Name: Age, dtype: object

We can see that the "Age" column of `data` does not hold exact ages. Each value is a string describing an age range.

In [34]:
def string_to_int(string):
    # Missing answers stay missing.
    if pd.isnull(string):
        return None
    # Keep the first token: "18 - 29" -> "18", "60+" -> "60+".
    first_token = string.split(" ")[0]
    # Drop the "+" from open-ended ranges before converting.
    return int(first_token.replace("+", ""))

data["int_age"] = data["Age"].apply(string_to_int)
data["int_age"].describe()

count    947.000000
mean      40.089757
std       15.352014
min       18.000000
25%       30.000000
50%       45.000000
75%       60.000000
max       60.000000
Name: int_age, dtype: float64

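The mean of the converted values is about 40, but this is not the respondents' true mean age. The survey records age in brackets, and `string_to_int` keeps only the lower bound of each bracket, so every respondent is assigned one of a few values (the quartiles above show 18, 30, 45, and 60).

Because each person is placed at the bottom of their bracket, the converted ages understate the real ones, and the distribution can look more lopsided than it is when one bracket holds many more respondents than another.

These numbers are therefore a rough approximation, not a true depiction of the survey participants' ages.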
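
One way to soften the lower-bound bias is to use the midpoint of each bracket instead. The sketch below is an alternative to the notebook's approach, not part of it; the helper name `age_midpoint` is hypothetical, and treating an open-ended bracket such as "60+" as its lower bound is an assumption.

```python
import pandas as pd

def age_midpoint(label):
    """Return the midpoint of an age bracket such as "18 - 29".

    Open-ended brackets such as "60+" have no upper bound, so the
    lower bound is returned unchanged (an assumption of this sketch).
    """
    if pd.isnull(label):
        return None
    # "18 - 29" -> ["18", "29"]; "60+" -> ["60"]
    parts = label.replace("+", "").split(" - ")
    bounds = [int(part) for part in parts]
    return sum(bounds) / len(bounds)

# Example labels in the same format as the "Age" column.
ages = pd.Series(["18 - 29", "30 - 44", "60+", None])
midpoints = ages.apply(age_midpoint)
print(midpoints)
```

Midpoints are still estimates, since respondents need not be spread evenly within a bracket.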

### Converting Income to Numeric Values

In [41]:
def to_dollars(string):
    # Missing answers stay missing.
    if pd.isnull(string):
        return None
    # Keep the first token, e.g. "$75,000 to $99,999" -> "$75,000".
    first_token = string.split(" ")[0]
    # Answers that start with "Prefer" carry no income information.
    if first_token == "Prefer":
        return None
    # Strip the dollar sign and thousands separators before converting.
    return int(first_token.replace("$", "").replace(",", ""))

In [42]:
data["int_income"] = data["How much total combined money did all members of your HOUSEHOLD earn last year?"].apply(to_dollars)
data["int_income"].describe()

count       829.000000
mean      75965.018094
std       59068.636748
min           0.000000
25%       25000.000000
50%       75000.000000
75%      100000.000000
max      200000.000000
Name: int_income, dtype: float64

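These values understate household income because we kept only the first dollar amount in each string, which is the lower bound of the income bracket. For example, "$75,000 to $99,999" becomes 75000.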
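
The same lower-bound extraction can also be done without a row-by-row `apply`, using pandas string methods. This is a sketch on example strings; the exact wording "Prefer not to answer" is an assumption, since the notebook only checks that the answer starts with "Prefer".

```python
import pandas as pd

# Example answers in the format of the income column.
income = pd.Series([
    "$75,000 to $99,999",
    "$0 to $9,999",
    "$200,000 and up",
    "Prefer not to answer",  # assumed wording
    None,
])

# Capture the first dollar amount in each string.
# Rows without a match (or missing rows) become NaN.
lower = income.str.extract(r"\$([\d,]+)", expand=False)

# Remove thousands separators, then convert the digits to numbers.
lower_bound = pd.to_numeric(lower.str.replace(",", "", regex=False))
print(lower_bound)
```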


### Correlation of Distance Traveled and Income

In [43]:
data[data["int_income"] < 50000]["How far will you travel for Thanksgiving?"].value_counts()


Thanksgiving is happening at my home--I won't travel at all                         106
Thanksgiving is local--it will take place in the town I live in                      92
Thanksgiving is out of town but not too far--it's a drive of a few hours or less     64
Thanksgiving is out of town and far away--I have to drive several hours or fly       16
Name: How far will you travel for Thanksgiving?, dtype: int64

In [44]:
data[data["int_income"] > 150000]["How far will you travel for Thanksgiving?"].value_counts()


Thanksgiving is happening at my home--I won't travel at all                         49
Thanksgiving is local--it will take place in the town I live in                     25
Thanksgiving is out of town but not too far--it's a drive of a few hours or less    16
Thanksgiving is out of town and far away--I have to drive several hours or fly      12
Name: How far will you travel for Thanksgiving?, dtype: int64
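
The two groups differ in size (278 answers below $50,000 and 102 above $150,000), so raw counts are hard to compare directly. As shares, 16 of 278 lower-income respondents (about 6%) travel far, against 12 of 102 higher-income respondents (about 12%). A sketch of computing such shares with `value_counts(normalize=True)` follows; the toy frame and the short `travel` column name are hypothetical.

```python
import pandas as pd

# Hypothetical toy data; "travel" stands in for the travel question.
toy = pd.DataFrame({
    "int_income": [20000, 30000, 40000, 175000, 200000],
    "travel": ["home", "local", "home", "far", "home"],
})

# normalize=True returns fractions of the group instead of raw counts.
low_share = toy.loc[toy["int_income"] < 50000, "travel"].value_counts(normalize=True)
high_share = toy.loc[toy["int_income"] > 150000, "travel"].value_counts(normalize=True)

print(low_share)
print(high_share)
```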

In [45]:
data.pivot_table(
    index="Have you ever tried to meet up with hometown friends on Thanksgiving night?", 
    columns='Have you ever attended a "Friendsgiving?"',
    values="int_age"
)

"Have you ever attended a ""Friendsgiving?""",No,Yes
Have you ever tried to meet up with hometown friends on Thanksgiving night?,Unnamed: 1_level_1,Unnamed: 2_level_1
No,42.283702,37.010526
Yes,41.47541,33.976744
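
Each cell in the table above is an average of `int_age`, because `pivot_table` aggregates with the mean when no `aggfunc` is given. The sketch below shows this on made-up data; the short column names `met_friends` and `friendsgiving` are hypothetical stand-ins for the two survey questions.

```python
import pandas as pd

# Hypothetical toy data with shortened column names.
toy = pd.DataFrame({
    "met_friends": ["No", "No", "Yes", "Yes", "Yes"],
    "friendsgiving": ["No", "Yes", "No", "Yes", "Yes"],
    "int_age": [45, 30, 60, 18, 30],
})

# With no aggfunc, pivot_table averages the values in each cell.
table = toy.pivot_table(
    index="met_friends",
    columns="friendsgiving",
    values="int_age",
)

# Spelling out the default gives the same table.
explicit = toy.pivot_table(
    index="met_friends",
    columns="friendsgiving",
    values="int_age",
    aggfunc="mean",
)

print(table)
```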
