# Thanksgiving Day Menu Exploration

This notebook explores the Thanksgiving Day Menu dataset to answer the following questions:

- Figure out the most common dessert people eat.
- Figure out the most common complete meal people eat.
- Identify how many people work on Thanksgiving.
- Find regional patterns in the dinner menus.
- Find age-, gender-, and income-based patterns in dinner menus.

In [6]:
import pandas as pd

data = pd.read_csv('thanksgiving.csv', encoding='Latin-1')
#data.head()

In [2]:
#data.columns.values

In [4]:
# Filter out to only those who celebrate Thanksgiving
data = data[data['Do you celebrate Thanksgiving?']=='Yes']

# Most common complete meal people eat

In [60]:
phrase_search = 'What is typically the main dish'
colnames_maindish = []
# Collect the names of all columns that contain the search phrase
for colname in data.columns.values:
    if phrase_search in colname:
        colnames_maindish.append(colname)

In [61]:
# Gather all unique values and their occurrence counts across all main dish columns.
# pd.concat replaces Series.append, which was removed in pandas 2.0.
maindish_type = pd.concat(
    [data[colname].value_counts() for colname in colnames_maindish]
).rename('Main Dish')

In [62]:
maindish_type_sorted = maindish_type.sort_values(ascending=False)
print(maindish_type_sorted)

Turkey                                                                                   859
Other (please specify)                                                                    35
Ham/Pork                                                                                  29
Tofurkey                                                                                  20
Chicken                                                                                   12
Roast beef                                                                                11
I don't know                                                                               5
Turducken                                                                                  3
seafood                                                                                    2
Turkey and Ham                                                                             2
Prime Rib                                                             

### Findings on most common Main Dish
Turkey is by far the most common main dish at Thanksgiving, with 859 responses; the next most common answer, "Other (please specify)", has 35.

# Most common dessert

In [51]:
# Get a list of columns whose name contains the word 'dessert'
phrase_search = 'dessert'
colnames_dessert = []
for colname in data.columns.values:
    if phrase_search in colname:
        colnames_dessert.append(colname)

In [52]:
# Gather all unique values and their occurrence counts across all dessert columns.
# pd.concat replaces Series.append, which was removed in pandas 2.0.
dessert_types = pd.concat(
    [data[colname].value_counts() for colname in colnames_dessert]
).rename('Dessert Types')

In [58]:
dessert_types_sorted = dessert_types.sort_values(ascending=False)
dessert_types_sorted.drop('None', inplace=True)
print(dessert_types_sorted)

Ice cream                                                      266
Cookies                                                        204
Cheesecake                                                     191
Other (please specify)                                         134
Brownies                                                       128
Apple cobbler                                                  110
Peach cobbler                                                  103
Carrot cake                                                     72
Fudge                                                           43
Blondies                                                        16
pie                                                             13
Pie                                                             12
pies                                                             6
Pumpkin pie                                                      4
pumpkin pie                                                   

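### Findings on most common Dessert
Ice cream (266), cookies (204), and cheesecake (191) are the most common desserts. Pie appears here only in free-text answers, and its count is split across spelling variants such as "pie", "Pie", and "pies".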
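
The spelling variants noted in the output can be merged before counting. The following is a minimal sketch, assuming a small hypothetical Series in place of `dessert_types_sorted`; the helper `normalize_label` and its rule (lowercase, trim, drop one trailing "s") are illustrative assumptions and too crude for real free-text data.

```python
import pandas as pd

# Hypothetical counts standing in for the real dessert tallies
counts = pd.Series(
    [13, 12, 6, 4, 2],
    index=['pie', 'Pie', 'pies', 'Pumpkin pie', 'pumpkin pie '],
)

def normalize_label(label):
    # Illustrative rule only: lowercase, trim whitespace, drop one trailing "s"
    label = label.strip().lower()
    return label[:-1] if label.endswith('s') else label

# Group by the normalized label and add up the counts of each variant
merged = (
    counts.groupby(counts.index.map(normalize_label))
    .sum()
    .sort_values(ascending=False)
)
print(merged)
```

With this toy input, the three spellings of "pie" collapse into a single entry, as do the two spellings of "pumpkin pie".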

# How many people work on Thanksgiving

In [74]:
# Count answers to the work question; note that the column asks about Black Friday, not Thanksgiving Day
data['Will you employer make you work on Black Friday?'].value_counts()

Yes              43
No               20
Doesn't apply     7
Name: Will you employer make you work on Black Friday?, dtype: int64

### Findings on work during Thanksgiving
Only 70 respondents answered this question, and it asks about Black Friday rather than Thanksgiving Day, so we cannot give a reliable answer.
It is also plausible, though a stretch, that people who have to work were more likely to answer the question, which would bias the counts toward "Yes".
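
How few respondents answered can be quantified as a response rate. This is a minimal sketch on a hypothetical column (the `answers` Series is made up for illustration); on the real data the same calls would be applied to the survey column used above.

```python
import numpy as np
import pandas as pd

# Hypothetical survey column: half of the respondents left the question blank
answers = pd.Series(['Yes', 'No', np.nan, np.nan, 'Yes', np.nan, "Doesn't apply", np.nan])

n_answered = answers.notna().sum()          # number of non-missing responses
response_rate = n_answered / len(answers)   # share of respondents who answered

# value_counts drops missing values by default; dropna=False keeps them visible
print(answers.value_counts(dropna=False))
print(f'{n_answered} of {len(answers)} answered ({response_rate:.0%})')
```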

# Regional patterns in the dinner menus

In [83]:
# Function to create crosstab pattern

def crosstab_pattern(colnames):

    # Returns the percentage of each group (colnames[0]) that chose each dish (colnames[1]),
    # sorted by the Turkey column in descending order.
    # The denominator is every respondent in the group, including those with
    # no dish answer, so a row can sum to less than 100.

    # Count respondents per (group, dish) pair
    df_counts = pd.crosstab(data[colnames[0]], data[colnames[1]])

    # Number of respondents in each group
    s_group_counts = data[colnames[0]].value_counts()

    # Convert counts to percentages of each group (row-wise division)
    df_per_dish = df_counts.div(s_group_counts.reindex(df_counts.index), axis=0) * 100

    # Return dishes sorted by Turkey share
    return df_per_dish.sort_values(by='Turkey', ascending=False)

In [100]:
# Define Colnames
colnames = ['US Region',
    'What is typically the main dish at your Thanksgiving dinner?']

# Get crosstab pattern
df_pattern = crosstab_pattern(colnames)

In [101]:
df_pattern

| US Region | Chicken | Ham/Pork | I don't know | Other (please specify) | Roast beef | Tofurkey | Turducken | Turkey |
|---|---|---|---|---|---|---|---|---|
| East North Central | 0.0 | 2.666667 | 0.0 | 3.333333 | 0.0 | 0.666667 | 0.0 | 90.0 |
| New England | 3.448276 | 0.0 | 0.0 | 1.724138 | 0.0 | 1.724138 | 0.0 | 87.931034 |
| West South Central | 1.098901 | 2.197802 | 0.0 | 3.296703 | 0.0 | 2.197802 | 0.0 | 84.615385 |
| South Atlantic | 1.401869 | 3.271028 | 0.0 | 2.803738 | 1.401869 | 1.401869 | 0.0 | 84.579439 |
| East South Central | 0.0 | 1.666667 | 0.0 | 6.666667 | 1.666667 | 0.0 | 0.0 | 83.333333 |
| Middle Atlantic | 0.628931 | 1.257862 | 0.0 | 2.515723 | 1.257862 | 3.144654 | 0.628931 | 81.761006 |
| West North Central | 1.351351 | 5.405405 | 1.351351 | 4.054054 | 0.0 | 2.702703 | 0.0 | 81.081081 |
| Mountain | 2.12766 | 2.12766 | 0.0 | 0.0 | 0.0 | 4.255319 | 0.0 | 78.723404 |
| Pacific | 0.0 | 4.109589 | 0.684932 | 6.164384 | 0.684932 | 2.739726 | 1.369863 | 73.287671 |


### Findings on Regional dish pattern
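Respondents in the East North Central region choose turkey more often (90.0%) than those in any other region.

Turkey is least dominant in the Mountain (78.7%) and Pacific (73.3%) regions, where more respondents pick alternatives (for example, Tofurkey at 4.3% in Mountain and "Other" at 6.2% in Pacific).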

# Age, gender and income based patterns in dinner menus

## Age to Dish pattern

In [102]:
# Define Colnames
colnames = ['Age',
    'What is typically the main dish at your Thanksgiving dinner?']

# Get crosstab pattern
df_pattern = crosstab_pattern(colnames)

In [103]:
df_pattern

| Age | Chicken | Ham/Pork | I don't know | Other (please specify) | Roast beef | Tofurkey | Turducken | Turkey |
|---|---|---|---|---|---|---|---|---|
| 60+ | 1.515152 | 0.757576 | 0.0 | 3.409091 | 1.136364 | 1.136364 | 0.378788 | 89.393939 |
| 45 - 59 | 0.699301 | 2.797203 | 0.0 | 4.895105 | 1.048951 | 0.699301 | 0.0 | 83.916084 |
| 30 - 44 | 1.158301 | 4.633205 | 0.3861 | 3.861004 | 0.3861 | 3.474903 | 0.772201 | 76.061776 |
| 18 - 29 | 1.388889 | 2.777778 | 1.388889 | 0.925926 | 1.388889 | 2.777778 | 0.0 | 75.0 |


### Findings on Age and Dish correlation
The turkey share rises steadily with age, from 75.0% in the 18 - 29 group to 89.4% in the 60+ group, so older respondents are more likely to have the traditional main dish.
Younger respondents show more variety. Notably, the vegetarian option "Tofurkey" is more common among respondents under 45 (2.8% to 3.5%) than among older ones (0.7% to 1.1%).

## Gender to Dish pattern

In [104]:
# Define Colnames
colnames = ['What is your gender?',
    'What is typically the main dish at your Thanksgiving dinner?']

# Get crosstab pattern
df_pattern = crosstab_pattern(colnames)

In [105]:
df_pattern

| What is your gender? | Chicken | Ham/Pork | I don't know | Other (please specify) | Roast beef | Tofurkey | Turducken | Turkey |
|---|---|---|---|---|---|---|---|---|
| Female | 1.102941 | 2.757353 | 0.367647 | 4.044118 | 0.735294 | 2.389706 | 0.183824 | 83.088235 |
| Male | 1.247401 | 2.702703 | 0.4158 | 2.702703 | 1.247401 | 1.455301 | 0.4158 | 79.62578 |


### Findings on Gender to Dish pattern
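The distribution is similar for males and females.
Each row sums to less than 100% (about 94.7% for females and 89.8% for males) because the percentages are computed over all respondents in the group, including those with no main-dish answer. Females answered the dish question more often, which explains their higher turkey percentage; among those who answered, the turkey shares are nearly equal (about 87.8% for females and 88.7% for males).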
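
The choice of denominator changes the percentages. The sketch below uses a made-up DataFrame (`toy`, with hypothetical `group` and `dish` columns) to contrast dividing by everyone in a group, as `crosstab_pattern` does, with dividing only by those who answered, which `pd.crosstab(..., normalize='index')` provides.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one respondent in group A skipped the dish question
toy = pd.DataFrame({
    'group': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'dish': ['Turkey', 'Turkey', 'Ham/Pork', np.nan,
             'Turkey', 'Turkey', 'Turkey', 'Ham/Pork'],
})

counts = pd.crosstab(toy['group'], toy['dish'])

# Denominator 1: everyone in the group, including non-answers
pct_all = counts.div(toy['group'].value_counts(), axis=0) * 100

# Denominator 2: only respondents who answered the dish question
pct_answered = pd.crosstab(toy['group'], toy['dish'], normalize='index') * 100

print(pct_all.round(1))
print(pct_answered.round(1))
```

In this toy data, group A's turkey share is 50% under the first denominator and about 66.7% under the second, while group B, with no missing answers, is 75% under both.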

## Income to Dish pattern

In [106]:
#Define colnames
colnames = ['How much total combined money did all members of your HOUSEHOLD earn last year?',
    'What is typically the main dish at your Thanksgiving dinner?']

# Get income to main dish patterns as a crosstab
df_pattern = crosstab_pattern(colnames)

In [107]:
df_pattern

| How much total combined money did all members of your HOUSEHOLD earn last year? | Chicken | Ham/Pork | I don't know | Other (please specify) | Roast beef | Tofurkey | Turducken | Turkey |
|---|---|---|---|---|---|---|---|---|
| $100,000 to $124,999 | 0.900901 | 1.801802 | 0.0 | 3.603604 | 0.0 | 2.702703 | 0.0 | 89.189189 |
| $125,000 to $149,999 | 0.0 | 2.040816 | 0.0 | 4.081633 | 0.0 | 4.081633 | 0.0 | 87.755102 |
| $50,000 to $74,999 | 0.0 | 1.481481 | 0.740741 | 2.222222 | 1.481481 | 0.740741 | 0.0 | 87.407407 |
| $175,000 to $199,999 | 0.0 | 0.0 | 0.0 | 7.407407 | 0.0 | 3.703704 | 0.0 | 85.185185 |
| $150,000 to $174,999 | 0.0 | 2.5 | 0.0 | 5.0 | 2.5 | 0.0 | 0.0 | 85.0 |
| $200,000 and up | 1.25 | 2.5 | 0.0 | 2.5 | 0.0 | 1.25 | 2.5 | 85.0 |
| $75,000 to $99,999 | 0.0 | 6.015038 | 0.0 | 3.759398 | 0.0 | 1.503759 | 0.0 | 84.210526 |
| $25,000 to $49,999 | 1.666667 | 3.333333 | 0.0 | 4.444444 | 0.555556 | 2.222222 | 0.0 | 80.0 |
| Prefer not to answer | 1.470588 | 0.735294 | 0.735294 | 2.941176 | 1.470588 | 2.205882 | 0.735294 | 76.470588 |
| $10,000 to $24,999 | 4.411765 | 5.882353 | 0.0 | 4.411765 | 1.470588 | 2.941176 | 0.0 | 69.117647 |


### Findings on Income to Dish pattern
Lower-income groups choose turkey less often: 69.1% in the $10,000 to $24,999 bracket and 80.0% in the $25,000 to $49,999 bracket.
Every bracket from $50,000 up has a turkey share of at least 84%.
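
Because the table is sorted by turkey share, the income brackets appear out of order, which makes the trend harder to read. As a rough sketch, the brackets can be re-sorted by their lower bound; the `turkey_share` Series below uses values rounded from the table for illustration, and the helper `lower_bound` is a hypothetical name, not part of the notebook.

```python
import re
import pandas as pd

# Turkey shares rounded from the table, keyed by income bracket label (illustrative subset)
turkey_share = pd.Series({
    '$100,000 to $124,999': 89.2,
    '$10,000 to $24,999': 69.1,
    'Prefer not to answer': 76.5,
    '$25,000 to $49,999': 80.0,
    '$200,000 and up': 85.0,
})

def lower_bound(label):
    # First dollar amount in the label, or None for non-numeric answers
    match = re.search(r'\$([\d,]+)', label)
    return int(match.group(1).replace(',', '')) if match else None

# Keep only brackets with a numeric lower bound, then order them by that bound
numeric_labels = [label for label in turkey_share.index if lower_bound(label) is not None]
ordered = turkey_share.loc[sorted(numeric_labels, key=lower_bound)]
print(ordered)
```

Non-numeric answers such as "Prefer not to answer" are left out of the ordered view, since they have no place on the income scale.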