# Introducing Thanksgiving Dinner Data
- Import the pandas package.
- Use the `pandas.read_csv()` function to read in the `thanksgiving.csv` file.
- Display the first few rows of data to see what the columns and rows look like.
- In a separate notebook cell, display all of the column names to get a sense of what the data consists of.
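The real `thanksgiving.csv` is not bundled here, so this minimal sketch reads a tiny inline CSV through `io.StringIO`. The rows are made up and only two real column names are used. `pandas.read_csv()` accepts any file-like object, and `head()` is the idiomatic equivalent of slicing `data[0:5]`.

```python
import io
import pandas

# Made-up stand-in for thanksgiving.csv, using two of the real column names
csv_text = (
    "RespondentID,Do you celebrate Thanksgiving?\n"
    "1,Yes\n"
    "2,No\n"
    "3,Yes\n"
)

# StringIO already holds decoded text, so no encoding argument is needed here
data = pandas.read_csv(io.StringIO(csv_text))

print(data.head())          # first rows, same as data[0:5]
print(list(data.columns))   # column names as a plain list
```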

In [2]:
import pandas
data = pandas.read_csv('thanksgiving.csv', encoding='Latin-1')

print("First 5 rows -")
print(data[0:5])

First 5 rows -
   RespondentID Do you celebrate Thanksgiving?  \
0    4337954960                            Yes   
1    4337951949                            Yes   
2    4337935621                            Yes   
3    4337933040                            Yes   
4    4337931983                            Yes   

  What is typically the main dish at your Thanksgiving dinner?  \
0                                             Turkey             
1                                             Turkey             
2                                             Turkey             
3                                             Turkey             
4                                           Tofurkey             

  What is typically the main dish at your Thanksgiving dinner? - Other (please specify)  \
0                                                NaN                                      
1                                                NaN                                      
2             

In [3]:
print("Columns -")
print(data.columns)

Columns -
Index(['RespondentID', 'Do you celebrate Thanksgiving?',
       'What is typically the main dish at your Thanksgiving dinner?',
       'What is typically the main dish at your Thanksgiving dinner? - Other (please specify)',
       'How is the main dish typically cooked?',
       'How is the main dish typically cooked? - Other (please specify)',
       'What kind of stuffing/dressing do you typically have?',
       'What kind of stuffing/dressing do you typically have? - Other (please specify)',
       'What type of cranberry saucedo you typically have?',
       'What type of cranberry saucedo you typically have? - Other (please specify)',
       'Do you typically have gravy?',
       'Which of these side dishes aretypically served at your Thanksgiving dinner? Please select all that apply. - Brussel sprouts',
       'Which of these side dishes aretypically served at your Thanksgiving dinner? Please select all that apply. - Carrots',
       'Which of these side dishes aretypica

# Filtering Out Rows From A DataFrame
- Use the `pandas.Series.value_counts()` method to display counts of how many times each category occurs in the `Do you celebrate Thanksgiving?` column.
- Filter out any rows in data where the response to `Do you celebrate Thanksgiving?` is not `Yes`. 
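A minimal sketch of these two steps on a made-up four-row frame: count the answers, then keep only the `Yes` rows with a Boolean mask. Calling `.copy()` on the filtered result is a suggested addition that gives an independent frame, so adding columns to it later does not trigger pandas' `SettingWithCopyWarning`.

```python
import pandas

# Made-up answers to the screening question
data = pandas.DataFrame({
    "RespondentID": [1, 2, 3, 4],
    "Do you celebrate Thanksgiving?": ["Yes", "No", "Yes", "Yes"],
})

col = "Do you celebrate Thanksgiving?"
print(data[col].value_counts())   # counts: Yes -> 3, No -> 1

# The Boolean mask keeps only rows answering "Yes"; .copy() detaches the result
data = data[data[col] == "Yes"].copy()
print(data)
```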

In [4]:
print("Values that the column takes - ")
print(data['Do you celebrate Thanksgiving?'].value_counts())

Values that the column takes - 
Yes    980
No      78
Name: Do you celebrate Thanksgiving?, dtype: int64


Keep only the rows whose answer is `Yes`, and assign the filtered result back to `data`.

In [5]:
# .copy() makes the filtered frame independent, so adding columns later does not warn
data = data[data['Do you celebrate Thanksgiving?'] == 'Yes'].copy()

# Using value_counts To Explore Main Dishes
- Use the `pandas.Series.value_counts()` method to display counts of how many times each category occurs in the `What is typically the main dish at your Thanksgiving dinner?` column.
- Display the `Do you typically have gravy?` column for any rows from `data` where the `What is typically the main dish at your Thanksgiving dinner?` column equals `Tofurkey`.
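The same selection can be written in one step with `.loc[row_mask, column]`. This sketch uses made-up rows; tallying the result with `value_counts()` is an optional extra that summarizes the answers instead of listing every row.

```python
import pandas

dish = "What is typically the main dish at your Thanksgiving dinner?"
gravy = "Do you typically have gravy?"

# Made-up rows for illustration only
data = pandas.DataFrame({
    dish: ["Turkey", "Tofurkey", "Tofurkey", "Ham/Pork"],
    gravy: ["Yes", "No", "Yes", "Yes"],
})

print(data[dish].value_counts())

# Rows where the main dish is Tofurkey, restricted to the gravy column
tofurkey_gravy = data.loc[data[dish] == "Tofurkey", gravy]
print(tofurkey_gravy.value_counts())
```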

In [6]:
main_dishes = data['What is typically the main dish at your Thanksgiving dinner?'].value_counts()
print(main_dishes)

Turkey                    859
Other (please specify)     35
Ham/Pork                   29
Tofurkey                   20
Chicken                    12
Roast beef                 11
I don't know                5
Turducken                   3
Name: What is typically the main dish at your Thanksgiving dinner?, dtype: int64


In [7]:
tofurkey_people = data[data['What is typically the main dish at your Thanksgiving dinner?'] == 'Tofurkey']
tofurkey_with_gravy = tofurkey_people['Do you typically have gravy?']
print(tofurkey_with_gravy)

4      Yes
33     Yes
69      No
72      No
77     Yes
145    Yes
175    Yes
218     No
243    Yes
275     No
393    Yes
399    Yes
571    Yes
594    Yes
628     No
774     No
820     No
837    Yes
860     No
953    Yes
Name: Do you typically have gravy?, dtype: object


# Figuring Out What Pies People Eat
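Build 3 Boolean Series, one each for the Apple, Pumpkin, and Pecan pie columns, that are `True` where the respondent left that column blank. Combine all three with the `&` operator and assign the result to `ate_pies`. Despite its name, `ate_pies` is `True` only for respondents who reported none of these three pies, so the `False` count is the number who had at least one of them.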

In [8]:
apple_isnull = pandas.isnull(data['Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Apple'])
pumpkin_isnull = pandas.isnull(data['Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Pumpkin'])
pecan_isnull = pandas.isnull(data['Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Pecan'])

# True only where all three pie columns are blank (none of these pies is served)
ate_pies = apple_isnull & pumpkin_isnull & pecan_isnull

print(ate_pies.value_counts())

False    876
True     104
dtype: int64
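To make the direction of this mask concrete, here is a toy frame with made-up pie columns. The `&` of the three `isnull()` results is `True` only when every column is blank, `~` flips it to "served at least one of these pies", and `DataFrame.isnull().all(axis=1)` is an equivalent row-wise shortcut.

```python
import pandas

# Made-up pie columns: the pie's name if it is served, None (missing) if left blank
pies = pandas.DataFrame({
    "Apple":   ["Apple", None, None],
    "Pumpkin": ["Pumpkin", None, "Pumpkin"],
    "Pecan":   [None, None, None],
})

# True only where all three columns are blank, i.e. none of these pies is served
no_listed_pie = pies["Apple"].isnull() & pies["Pumpkin"].isnull() & pies["Pecan"].isnull()

# Equivalent row-wise shortcut over every column of the frame
same_mask = pies.isnull().all(axis=1)

# ~ inverts the mask: True where at least one of the three pies is served
ate_listed_pie = ~no_listed_pie
print(ate_listed_pie.value_counts())
```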


# Converting Age To Numeric
- Write a function to convert a single string to an appropriate integer value. This will allow us to convert the values in the `Age` column to integers.
- Use the `pandas.Series.apply()` method to apply the function to each value in the `Age` column of `data`. Assign the result to `int_age`.
- Call the `pandas.Series.describe()` method on `int_age` and display the result. Missing ages become `NaN`, which is why the result has dtype `float64` rather than an integer type.

In [9]:
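def convert_age_col_numeric(val_string):
    # Map an age-range label to its lower bound; missing answers become None (NaN)
    if pandas.isnull(val_string):
        return None
    elif val_string[-1] == '+':
        # Open-ended top bracket such as "60+": drop the trailing '+'
        return int(val_string[:-1])
    else:
        # Ranges such as "18 - 29": keep the number before the first space
        return int(val_string.split(' ')[0])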

int_age = data['Age'].apply(convert_age_col_numeric)
print("int_age first few rows - ")
print(int_age[0:5])

int_age.describe()

int_age first few rows - 
0    18.0
1    18.0
2    18.0
3    30.0
4    30.0
Name: Age, dtype: float64


count    947.000000
mean      40.089757
std       15.352014
min       18.000000
25%       30.000000
50%       45.000000
75%       60.000000
max       60.000000
Name: Age, dtype: float64
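As a vectorized alternative to the Python-level function above, `Series.str.extract()` can pull the first run of digits (the lower bound of each range) from every label at once. The age labels below are hypothetical examples shaped like the survey's ranges; missing values stay `NaN`, and `astype(float)` gives the same `float64` dtype as the `apply()` version.

```python
import pandas

# Hypothetical age labels shaped like the survey's ranges (assumption)
ages = pandas.Series(["18 - 29", "30 - 44", "60+", None])

# Capture the first run of digits; rows without a match (or missing) become NaN
int_age = ages.str.extract(r"(\d+)", expand=False).astype(float)
print(int_age.describe())
```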

# Converting Income To Numeric
- Write a function to convert a single string to an appropriate integer income value.
- Use `apply()`, as in the age example above, to convert each value in the `How much total combined money did all members of your HOUSEHOLD earn last year?` column.
- Call `describe()` on the result and display it.

In [10]:
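def convert_income_col_numerical(val_string):
    # Map an income-bracket label to its lower bound; non-answers become None (NaN)
    if pandas.isnull(val_string):
        return None
    val_words = val_string.split(' ')
    if val_words[0] == 'Prefer':
        # Answers starting with 'Prefer' are non-answers
        return None
    # The first word is the bracket's lower bound: strip '$' and thousands separators
    first_val_string = val_words[0].replace('$', '').replace(',', '')
    return int(first_val_string)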

int_income = data['How much total combined money did all members of your HOUSEHOLD earn last year?'].apply(convert_income_col_numerical)

print('First few values of int_income - ')
print(int_income[0:5])

print(int_income.describe())

First few values of int_income - 
0     75000.0
1     50000.0
2         0.0
3    200000.0
4    100000.0
Name: How much total combined money did all members of your HOUSEHOLD earn last year?, dtype: float64
count       829.000000
mean      75965.018094
std       59068.636748
min           0.000000
25%       25000.000000
50%       75000.000000
75%      100000.000000
max      200000.000000
Name: How much total combined money did all members of your HOUSEHOLD earn last year?, dtype: float64
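The same lower-bound conversion can also be sketched with vectorized string methods. The labels below are hypothetical examples in a "`$low to $high`" style (an assumption about the survey's wording). Removing commas first lets the regex capture the whole number after the first `$`, and a label with no `$` yields `NaN`, just like the `None` returned by the function above.

```python
import pandas

# Hypothetical income labels (assumption about the survey's wording)
income = pandas.Series(["$0 to $9,999", "$75,000 to $99,999",
                        "$200,000 and up", "Prefer not to answer", None])

# Drop thousands separators, then capture the digits after the first "$"
int_income = (
    income.str.replace(",", "", regex=False)
          .str.extract(r"\$(\d+)", expand=False)
          .astype(float)
)
print(int_income)
```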


# Correlating Travel Distance And Income
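- See how far people earning under `150000` will travel.
- See how far people earning over `150000` will travel.
- Both comparisons are strict, so respondents whose income converts to exactly `150000` fall into neither group.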

In [11]:
income_under_150000 = data[int_income < 150000]
income_over_150000 = data[int_income > 150000]

print('Income under 150000 first few rows - ')
print(income_under_150000[0:5])

print('Income over 150000 first few rows - ')
print(income_over_150000[0:5])

Income under 150000 first few rows - 
   RespondentID Do you celebrate Thanksgiving?  \
0    4337954960                            Yes   
1    4337951949                            Yes   
2    4337935621                            Yes   
4    4337931983                            Yes   
5    4337929779                            Yes   

  What is typically the main dish at your Thanksgiving dinner?  \
0                                             Turkey             
1                                             Turkey             
2                                             Turkey             
4                                           Tofurkey             
5                                             Turkey             

  What is typically the main dish at your Thanksgiving dinner? - Other (please specify)  \
0                                                NaN                                      
1                                                NaN                              

In [12]:
income_under_150000['How far will you travel for Thanksgiving?'].value_counts()

Thanksgiving is happening at my home--I won't travel at all                         281
Thanksgiving is local--it will take place in the town I live in                     203
Thanksgiving is out of town but not too far--it's a drive of a few hours or less    150
Thanksgiving is out of town and far away--I have to drive several hours or fly       55
Name: How far will you travel for Thanksgiving?, dtype: int64

In [13]:
income_over_150000['How far will you travel for Thanksgiving?'].value_counts()

Thanksgiving is happening at my home--I won't travel at all                         49
Thanksgiving is local--it will take place in the town I live in                     25
Thanksgiving is out of town but not too far--it's a drive of a few hours or less    16
Thanksgiving is out of town and far away--I have to drive several hours or fly      12
Name: How far will you travel for Thanksgiving?, dtype: int64
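The two groups differ a lot in size, so raw counts are hard to compare directly. This toy sketch (made-up incomes and shortened, hypothetical travel labels) passes `normalize=True` to `value_counts()` to get shares instead of counts. It also shows that the strict `<` and `>` comparisons leave out a row sitting exactly at `150000`.

```python
import pandas

# Made-up respondents; the 150000 row sits exactly on the cut-off
df = pandas.DataFrame({
    "int_income": [25000, 150000, 175000, 200000, 75000],
    "travel": ["home", "local", "home", "far", "home"],
})

under = df[df["int_income"] < 150000]
over = df[df["int_income"] > 150000]
# Strict < and > both exclude exactly 150000, so that row lands in neither group
print(len(under) + len(over), "of", len(df), "rows kept")

# normalize=True turns counts into shares, so groups of different sizes compare directly
print(under["travel"].value_counts(normalize=True))
print(over["travel"].value_counts(normalize=True))
```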

# Linking Friendship And Age

- Generate a pivot table showing the average age of respondents for each combination of answers to `Have you ever tried to meet up with hometown friends on Thanksgiving night?` and `Have you ever attended a "Friendsgiving?"`.
- Generate a pivot table showing the average income of respondents for each combination of answers to `Have you ever tried to meet up with hometown friends on Thanksgiving night?` and `Have you ever attended a "Friendsgiving?"`.

In [17]:
data['int_age'] = int_age
data.pivot_table(index='Have you ever tried to meet up with hometown friends on Thanksgiving night?', 
                 columns='Have you ever attended a "Friendsgiving?"', 
                 values='int_age')

Have you ever attended a "Friendsgiving?"                                           No        Yes
Have you ever tried to meet up with hometown friends on Thanksgiving night?
No                                                                           42.283702  37.010526
Yes                                                                          41.475410  33.976744


In [19]:
data['int_income'] = int_income
data.pivot_table(index='Have you ever tried to meet up with hometown friends on Thanksgiving night?', 
                 columns='Have you ever attended a "Friendsgiving?"', 
                 values='int_income')

Have you ever attended a "Friendsgiving?"                                              No           Yes
Have you ever tried to meet up with hometown friends on Thanksgiving night?
No                                                                           78914.549654  72894.736842
Yes                                                                          78750.000000  66019.736842
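For reference, a small sketch of how `pivot_table` fills each cell, using made-up answers and shortened, hypothetical column names. Each cell is the mean of `values` over the rows sharing that (index, columns) pair; `aggfunc="mean"` is the default and is spelled out here only for clarity. A combination with no rows comes out as `NaN`.

```python
import pandas

# Made-up answers; short hypothetical names stand in for the long survey columns
df = pandas.DataFrame({
    "meetup":        ["No", "No", "Yes", "Yes", "No"],
    "friendsgiving": ["No", "Yes", "Yes", "Yes", "No"],
    "int_age":       [40.0, 30.0, 20.0, 40.0, 50.0],
})

# Average int_age for every (meetup, friendsgiving) combination
table = df.pivot_table(index="meetup", columns="friendsgiving",
                       values="int_age", aggfunc="mean")
print(table)
```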


-----
# Extra assignment
-----
- Figure out the most common dessert people eat.
- Figure out the most common complete meal people eat.
- Identify how many people work on Thanksgiving.
- Find regional patterns in the dinner menus.
- Find age, gender, and income based patterns in dinner menus.

In [39]:
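# Checkbox columns for the dessert question; the last three are excluded from this tally
dessert_columns = [s for s in data.columns if 'Which of these desserts' in s]
dessert_columns_trimmed = dessert_columns[:-3]

dessert_counts = []

for dessert_col in dessert_columns_trimmed:
    # The dessert name follows the last ' - ' in the column header;
    # .iloc[0] takes the first count by position (plain [0] is deprecated for this)
    dessert_counts.append({"column_name": dessert_col.split(' - ')[-1],
                           "column_values": data[dessert_col].value_counts().iloc[0]})
dessert_counts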

[{'column_name': 'Apple cobbler', 'column_values': 110},
 {'column_name': 'Blondies', 'column_values': 16},
 {'column_name': 'Brownies', 'column_values': 128},
 {'column_name': 'Carrot cake', 'column_values': 72},
 {'column_name': 'Cheesecake', 'column_values': 191},
 {'column_name': 'Cookies', 'column_values': 204},
 {'column_name': 'Fudge', 'column_values': 43},
 {'column_name': 'Ice cream', 'column_values': 266},
 {'column_name': 'Peach cobbler', 'column_values': 103}]
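The list above holds the counts but does not yet name the most common dessert. A minimal sketch that finishes that step, using the counts copied from the output above: `max()` with a `key` function picks the entry with the largest count, and `sorted()` gives a full ranking.

```python
# Counts copied from the dessert_counts output above
dessert_counts = [
    {"column_name": "Apple cobbler", "column_values": 110},
    {"column_name": "Blondies", "column_values": 16},
    {"column_name": "Brownies", "column_values": 128},
    {"column_name": "Carrot cake", "column_values": 72},
    {"column_name": "Cheesecake", "column_values": 191},
    {"column_name": "Cookies", "column_values": 204},
    {"column_name": "Fudge", "column_values": 43},
    {"column_name": "Ice cream", "column_values": 266},
    {"column_name": "Peach cobbler", "column_values": 103},
]

# Entry with the largest count
most_common = max(dessert_counts, key=lambda d: d["column_values"])
print(most_common["column_name"], most_common["column_values"])   # Ice cream 266

# Full ranking, most popular first
ranking = sorted(dessert_counts, key=lambda d: d["column_values"], reverse=True)
```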

In [41]:
data[dessert_columns[-1]].value_counts()

Other (please specify)    134
Name: Which of these desserts do you typically have at Thanksgiving dinner? Please select all that apply.   - Other (please specify), dtype: int64
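For the "most common complete meal" item, one possible approach is to count each distinct combination of answers with `groupby(...).size()`. This sketch uses made-up rows, and its short column names are hypothetical stand-ins for the survey's main dish, cooking method, stuffing, cranberry sauce, and gravy columns. On the real data, note that `groupby` drops rows where any key is `NaN` unless you pass `dropna=False` (pandas 1.1 or later).

```python
import pandas

# Made-up rows; short hypothetical names stand in for the long survey columns
meals = pandas.DataFrame({
    "main":      ["Turkey", "Turkey", "Ham/Pork", "Turkey"],
    "cooked":    ["Baked", "Baked", "Baked", "Fried"],
    "stuffing":  ["Bread-based", "Bread-based", "None", "Bread-based"],
    "cranberry": ["Canned", "Canned", "None", "Homemade"],
    "gravy":     ["Yes", "Yes", "No", "Yes"],
})

# Count each distinct combination of answers, largest group first
combo_counts = meals.groupby(list(meals.columns)).size().sort_values(ascending=False)
print(combo_counts.head(3))
print("Most common meal:", combo_counts.index[0], "x", combo_counts.iloc[0])
```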