# Thanksgiving dinner in the US
### The dataset came from [FiveThirtyEight](https://fivethirtyeight.com/), and can be found [here](https://github.com/fivethirtyeight/data/tree/master/thanksgiving-2015).

> * The dataset has 65 columns, and 1058 rows. 

> * Most of the column names are questions, and most of the column values are string responses to the questions.

### Here are descriptions of some of the most important columns:

* RespondentID -- a unique ID of the respondent to the survey.
* Do you celebrate Thanksgiving? -- a Yes/No response to the question.
* How would you describe where you live? -- responses are Suburban, Urban, and Rural.
* Age -- responses are one of several categories, such as 18-29 and 30-44.
* How much total combined money did all members of your HOUSEHOLD earn last year? -- one of several categories, such as \$75,000 to \$99,999.



## 1. Reading the CSV file

In [1]:
import pandas as pd
data = pd.read_csv("thanksgiving.csv", encoding="Latin-1")
print(data.head(5))

   RespondentID Do you celebrate Thanksgiving?  \
0    4337954960                            Yes   
1    4337951949                            Yes   
2    4337935621                            Yes   
3    4337933040                            Yes   
4    4337931983                            Yes   

  What is typically the main dish at your Thanksgiving dinner?  \
0                                             Turkey             
1                                             Turkey             
2                                             Turkey             
3                                             Turkey             
4                                           Tofurkey             

  What is typically the main dish at your Thanksgiving dinner? - Other (please specify)  \
0                                                NaN                                      
1                                                NaN                                      
2                            

In [2]:
data.columns

Index(['RespondentID', 'Do you celebrate Thanksgiving?',
       'What is typically the main dish at your Thanksgiving dinner?',
       'What is typically the main dish at your Thanksgiving dinner? - Other (please specify)',
       'How is the main dish typically cooked?',
       'How is the main dish typically cooked? - Other (please specify)',
       'What kind of stuffing/dressing do you typically have?',
       'What kind of stuffing/dressing do you typically have? - Other (please specify)',
       'What type of cranberry saucedo you typically have?',
       'What type of cranberry saucedo you typically have? - Other (please specify)',
       'Do you typically have gravy?',
       'Which of these side dishes aretypically served at your Thanksgiving dinner? Please select all that apply. - Brussel sprouts',
       'Which of these side dishes aretypically served at your Thanksgiving dinner? Please select all that apply. - Carrots',
       'Which of these side dishes aretypically served

## 2. Filtering out rows from a dataframe

In [3]:
celeb_column = data["Do you celebrate Thanksgiving?"]
print(celeb_column.value_counts())
yes_data = data[celeb_column == "Yes"]
print(yes_data["Do you celebrate Thanksgiving?"].value_counts())

Yes    980
No      78
Name: Do you celebrate Thanksgiving?, dtype: int64
Yes    980
Name: Do you celebrate Thanksgiving?, dtype: int64
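
The filtering step above relies on a boolean mask. The following is a minimal sketch of the same pattern on a small made-up DataFrame (the rows and the `toy` / `toy_yes` names are illustrative assumptions, not part of the survey data):

```python
import pandas as pd

# Made-up responses, used only to illustrate the filtering pattern.
toy = pd.DataFrame({
    "RespondentID": [1, 2, 3, 4],
    "Do you celebrate Thanksgiving?": ["Yes", "No", "Yes", "Yes"],
})

# The comparison yields one True/False value per row.
celebrates = toy["Do you celebrate Thanksgiving?"] == "Yes"

# Indexing with the mask keeps only the rows where it is True.
# .copy() gives an independent frame, so later column assignments
# do not trigger SettingWithCopyWarning.
toy_yes = toy[celebrates].copy()

print(len(toy_yes))
```

Taking a `.copy()` is optional here, but it can be a reasonable habit when the filtered frame is modified later.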


## 3. Using value_counts to explore main dishes

In [4]:
yes_data["What is typically the main dish at your Thanksgiving dinner?"].value_counts()

Turkey                    859
Other (please specify)     35
Ham/Pork                   29
Tofurkey                   20
Chicken                    12
Roast beef                 11
I don't know                5
Turducken                   3
Name: What is typically the main dish at your Thanksgiving dinner?, dtype: int64

In [5]:
# Gravy answers of respondents whose main dish is Tofurkey.
is_tofurkey = yes_data["What is typically the main dish at your Thanksgiving dinner?"] == "Tofurkey"
yes_data.loc[is_tofurkey, "Do you typically have gravy?"]

4      Yes
33     Yes
69      No
72      No
77     Yes
145    Yes
175    Yes
218     No
243    Yes
275     No
393    Yes
399    Yes
571    Yes
594    Yes
628     No
774     No
820     No
837    Yes
860     No
953    Yes
Name: Do you typically have gravy?, dtype: object
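
A row-by-row listing like the one above is easier to read as a tally. Here is a small sketch that combines `.loc` with `value_counts()`; the DataFrame is made up for illustration, and only the two column names come from the survey:

```python
import pandas as pd

main_dish = "What is typically the main dish at your Thanksgiving dinner?"
gravy = "Do you typically have gravy?"

# Made-up rows, used only to show the selection pattern.
toy = pd.DataFrame({
    main_dish: ["Turkey", "Tofurkey", "Tofurkey", "Ham/Pork", "Tofurkey"],
    gravy: ["Yes", "Yes", "No", "Yes", "Yes"],
})

# .loc[row_mask, column] selects the rows and the column in one step.
tofurkey_gravy = toy.loc[toy[main_dish] == "Tofurkey", gravy]

# Tally the answers instead of listing them one by one.
gravy_counts = tofurkey_gravy.value_counts()
print(gravy_counts)
```

Applied to the 20 Tofurkey rows listed above, such a tally would read 12 "Yes" and 8 "No".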

## 4. Figuring out what pies people eat

In [14]:
# A missing value in a pie column means the respondent did not select that pie.
apple_isnull = pd.isnull(data["Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Apple"])
pumpkin_isnull = pd.isnull(data["Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Pumpkin"])
pecan_isnull = pd.isnull(data["Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Pecan"])
# True means none of the three pies was selected; False means at least one was.
no_pies = (apple_isnull & pumpkin_isnull & pecan_isnull)
no_pies.value_counts()


False    876
True     182
dtype: int64
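
In the output above, `True` marks respondents with none of the three pies selected, so the 876 `False` rows are the ones with at least one of them. A minimal sketch of the same logic on made-up data (the short column names and the `toy` frame are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Made-up "select all that apply" columns: a value means selected, NaN means not.
toy = pd.DataFrame({
    "Apple": ["Apple", np.nan, np.nan],
    "Pumpkin": [np.nan, "Pumpkin", np.nan],
    "Pecan": [np.nan, np.nan, np.nan],
})

# True only when every pie column is missing for that respondent.
no_pies = toy.isnull().all(axis=1)

# The complement marks respondents who selected at least one pie.
any_pie = ~no_pies
print(any_pie.value_counts())
```

`isnull().all(axis=1)` is equivalent to chaining the per-column masks with `&`, and it scales more easily if more pie columns are included.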

## 5. Converting age to numeric

In [13]:
def convert_to_int(age_label):
    # Missing answers stay missing.
    if pd.isnull(age_label):
        return None
    # Keep the text before the first space (the lower bound) and drop a trailing "+".
    lower_bound = age_label.split(" ")[0].replace("+", "")
    return int(lower_bound)

data["int_age"] = data["Age"].apply(convert_to_int)
data["int_age"].describe()



count    1025.000000
mean       39.383415
std        15.398493
min        18.000000
25%              NaN
50%              NaN
75%              NaN
max        60.000000
Name: int_age, dtype: float64

### Conclusions:

* The age values are very rough and skewed downward, because we kept only the lower bound of each age bracket.

* Even so, at this stage the respondents appear to be spread fairly evenly across the age brackets.
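
One possible way to reduce the downward skew is to use the midpoint of each bracket rather than its lower bound. The sketch below assumes labels shaped like `"18 - 29"` and `"60+"`; the `age_midpoint` helper and the sample labels are illustrative, not part of the original analysis:

```python
import re
import pandas as pd

def age_midpoint(label):
    """Return the midpoint of an age bracket, or the lower bound if it is open-ended."""
    if pd.isnull(label):
        return float("nan")
    # Pull out every number in the label, e.g. "30 - 44" -> [30, 44].
    bounds = [int(n) for n in re.findall(r"\d+", label)]
    if len(bounds) == 2:
        return sum(bounds) / 2
    # Open-ended brackets such as "60+" have no upper bound to average with.
    return float(bounds[0])

# Sample labels in the assumed format, plus one missing answer.
ages = pd.Series(["18 - 29", "30 - 44", "45 - 59", "60+", None])
mid_age = ages.apply(age_midpoint)
print(mid_age.tolist())
```

The open-ended bracket is still underestimated, so midpoints only soften the bias; they do not remove it.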

## 6. Converting income to numeric

In [8]:
def convert_income_to_int(income_label):
    # Missing answers stay missing.
    if pd.isnull(income_label):
        return None
    first_token = income_label.split(" ")[0]
    # Responses starting with "Prefer" (declined to answer) carry no amount.
    if first_token == "Prefer":
        return None
    # Strip "$" and "," so that e.g. "$75,000" parses as 75000.
    return int(first_token.replace("$", "").replace(",", ""))

# convert_dtype expects a boolean, so the "int64" argument was dropped.
data["int_income"] = data["How much total combined money did all members of your HOUSEHOLD earn last year?"].apply(convert_income_to_int)
data["int_income"].describe()




count       889.000000
mean      74077.615298
std       59360.742902
min           0.000000
25%                NaN
50%                NaN
75%                NaN
max      200000.000000
Name: int_income, dtype: float64

### Conclusions:

* The income values are very rough and skewed downward, because we kept only the lower bound of each income bracket.

* The mean income looks fairly high, but note that the standard deviation is high as well.
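
Since the spread is large, the median may be a more robust summary than the mean. The sketch below also shows a vectorized alternative to `apply` for extracting the lower bound; the sample strings are modeled on the survey's categories and are assumptions, as are the variable names:

```python
import pandas as pd

# Sample responses in the assumed format, including a declined and a missing answer.
income = pd.Series([
    "$75,000 to $99,999",
    "$200,000 and up",
    "Prefer not to answer",
    None,
    "$0 to $9,999",
])

# Capture the first dollar amount; rows without one (or missing rows) become NaN.
lower = income.str.extract(r"\$([\d,]+)", expand=False)

# Drop the thousands separators and convert to numbers.
int_income = pd.to_numeric(lower.str.replace(",", "", regex=False))

# The median ignores NaN and is less sensitive to a few very high values.
print(int_income.median())
```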

## 7. Correlating travel distance and income

In [9]:
data[data["int_income"] < 150000]["How far will you travel for Thanksgiving?"].value_counts()

Thanksgiving is happening at my home--I won't travel at all                         281
Thanksgiving is local--it will take place in the town I live in                     203
Thanksgiving is out of town but not too far--it's a drive of a few hours or less    150
Thanksgiving is out of town and far away--I have to drive several hours or fly       55
Name: How far will you travel for Thanksgiving?, dtype: int64

In [10]:
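# Strict comparisons here and in the previous cell leave out rows where int_income is exactly 150000.
data[data["int_income"] > 150000]["How far will you travel for Thanksgiving?"].value_counts()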

Thanksgiving is happening at my home--I won't travel at all                         49
Thanksgiving is local--it will take place in the town I live in                     25
Thanksgiving is out of town but not too far--it's a drive of a few hours or less    16
Thanksgiving is out of town and far away--I have to drive several hours or fly      12
Name: How far will you travel for Thanksgiving?, dtype: int64

### Conclusions:

* The counts show that respondents with higher incomes celebrate Thanksgiving at their own home more often: 49 of 102 (about 48%) in the over-\$150,000 group, versus 281 of 689 (about 41%) in the under-\$150,000 group.

* One possible explanation is that students, who have low incomes, most often travel to celebrate the holiday with their parents, who in turn have higher incomes.
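
Because the two income groups differ a lot in size, shares are easier to compare than raw counts. The sketch below recomputes the shares from the counts printed above; the short row labels and variable names are assumptions made for readability:

```python
import pandas as pd

# Short labels standing in for the four survey answers, in the order printed above.
labels = ["home", "local", "few hours", "far away"]

# Counts copied from the two value_counts() outputs above.
under_150k = pd.Series([281, 203, 150, 55], index=labels)
over_150k = pd.Series([49, 25, 16, 12], index=labels)

# Divide each count by its group total to get the share of each answer.
shares = pd.DataFrame({
    "under_150k": under_150k / under_150k.sum(),
    "over_150k": over_150k / over_150k.sum(),
}).round(3)
print(shares)
```

On the full DataFrame, `value_counts(normalize=True)` should give the same shares directly. The "far away" share is also somewhat higher in the higher-income group, so the difference between the groups is not limited to staying at home.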

## 8. Linking friendship and age

In [11]:
data.pivot_table(index="Have you ever tried to meet up with hometown friends on Thanksgiving night?", columns='Have you ever attended a "Friendsgiving?"', values="int_age")

| Have you ever tried to meet up with hometown friends on Thanksgiving night? | Have you ever attended a "Friendsgiving?": No | Have you ever attended a "Friendsgiving?": Yes |
|---|---|---|
| No | 42.283702 | 37.010526 |
| Yes | 41.47541 | 33.976744 |


In [17]:
data.pivot_table(index="Have you ever tried to meet up with hometown friends on Thanksgiving night?", columns='Have you ever attended a "Friendsgiving?"', values="int_income")

| Have you ever tried to meet up with hometown friends on Thanksgiving night? | Have you ever attended a "Friendsgiving?": No | Have you ever attended a "Friendsgiving?": Yes |
|---|---|---|
| No | 78914.549654 | 72894.736842 |
| Yes | 78750.0 | 66019.736842 |


### Conclusions:

* The analysis shows that respondents who celebrate Thanksgiving with friends tend to be younger.
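
By default, `pivot_table` reports the mean of `values` in each cell, and a mean over a handful of respondents can be misleading. The sketch below requests the group sizes alongside the means; the data and the short column names `met_friends` and `friendsgiving` are made up for illustration:

```python
import pandas as pd

# Made-up respondents, used only to show the shape of the result.
toy = pd.DataFrame({
    "met_friends": ["No", "No", "Yes", "Yes", "Yes"],
    "friendsgiving": ["No", "Yes", "No", "Yes", "Yes"],
    "int_age": [45, 30, 45, 18, 30],
})

# Passing a list to aggfunc produces one block of columns per function.
summary = toy.pivot_table(
    index="met_friends",
    columns="friendsgiving",
    values="int_age",
    aggfunc=["mean", "count"],
)
print(summary)
```

The resulting columns form a two-level index, so a single cell is read as `summary.loc["Yes", ("mean", "Yes")]`.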
