## Who shops Black Friday sales on Thanksgiving Day?

### 1. Introduction

In the United States, Black Friday is the day after Thanksgiving, which falls on the fourth Thursday of November. It's a shopping extravaganza in which stores typically open extremely early and offer one-time-only deals. In years past, people have been known to line up outside stores starting at midnight on Friday to score the best deals, also known as doorbuster deals, on items that are expected to sell out quickly, such as TVs and video game consoles.

In recent years, many stores have added a twist: they have started Black Friday on [*Thursday*](https://en.wikipedia.org/wiki/Black_Friday_%28shopping%29#Black_Thursday), Thanksgiving Day itself. The practice began a few years back with stores opening late in the evening, but JCPenney opened its doors at 2 P.M. on Thanksgiving Day 2017.

A few days before Thanksgiving 2015, [FiveThirtyEight](https://www.fivethirtyeight.com/) surveyed 1,058 respondents on general Thanksgiving [questions](https://github.com/fivethirtyeight/data/tree/master/thanksgiving-2015), such as whether they celebrate Thanksgiving with friends, with family, or at all, and what types of desserts and turkey stuffing they prefer. They also asked respondents whether they would shop Black Friday sales on Thanksgiving Day itself. In this exploration and visualization, we use the data collected by FiveThirtyEight to gain a little more insight into who shops Black Friday sales on Thanksgiving Day.

### 2. Reading the data

The survey data is available at FiveThirtyEight's [GitHub repository](https://github.com/fivethirtyeight/data/tree/master/thanksgiving-2015). We start by loading the libraries we will be using and then proceed to read the data.

Loading main libraries

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
# Jupyter magic to make matplotlib plots appear inline
%matplotlib inline

Reading the data

In [None]:
tg_data = pd.read_csv("thanksgiving-2015-poll-data.csv", encoding = "Latin-1")
tg_data.shape
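
The `encoding` argument matters when a file contains non-ASCII characters that were not saved as UTF-8. The sketch below illustrates this with a made-up two-row CSV held in memory; the column names and values are hypothetical and are not taken from the survey file, and we are only assuming that the real file needs Latin-1 for a similar reason.

```python
import io

import pandas as pd

# Hypothetical two-row CSV encoded as Latin-1 (NOT the real survey file)
raw = "Respondent,Dessert\n1,Crème brûlée\n2,Apple pie\n".encode("latin-1")

# The same bytes are not valid UTF-8, so decoding them as UTF-8 fails
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Telling pandas the right encoding lets it parse the accented text correctly
demo = pd.read_csv(io.BytesIO(raw), encoding="latin-1")
print(utf8_ok, demo.shape)
```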

Those surveyed answered questions on a variety of [Thanksgiving topics](https://github.com/fivethirtyeight/data/tree/master/thanksgiving-2015). We are interested in finding out who shops Black Friday sales on Thanksgiving Day, broken down by four categories: gender, age, region, and household income.

### 3. Pivot tables

We can have pandas compute [pivot tables](https://en.wikipedia.org/wiki/Pivot_table#Mechanics) showing, for each category, the percentage of respondents who answered "Yes" to shopping Black Friday sales on Thanksgiving Day 2015.

#### 3.1 Gender

The survey asked respondents *"What is your gender?"* We will rename that column to *"Gender"*. Similarly, the column *"Will you shop any Black Friday sales on Thanksgiving Day?"*, which we will summarize across categories, will be renamed to *"Shopped_BF"*. Because the survey was conducted a few days before Thanksgiving, this column records whether respondents said they would shop.

In [None]:
tg_data.rename(columns = {"What is your gender?": "Gender",
                          "Will you shop any Black Friday sales on Thanksgiving Day?": "Shopped_BF"}, inplace = True)

We will also change *Shopped_BF* from "Yes" and "No" to 1s and 0s for ease of computation.

In [None]:
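def yes_no_to_int(yes_no_string):
    """Map "Yes" to 1 and "No" to 0; anything else (e.g. a skipped question) becomes NaN."""
    if yes_no_string == "Yes":
        return 1
    if yes_no_string == "No":
        return 0
    return np.nan

tg_data["Shopped_BF"] = tg_data["Shopped_BF"].apply(yes_no_to_int)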
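
As an aside, the same conversion can be sketched with `Series.map` and a dictionary, which is a common pandas idiom for recoding values. The answers below are made up for illustration and are not taken from the survey.

```python
import pandas as pd

# Hypothetical answers, including one skipped question (None)
answers = pd.Series(["Yes", "No", None, "Yes"])

# Values that are not keys of the dictionary (here, None) become NaN
shopped = answers.map({"Yes": 1, "No": 0})

# The mean skips NaN, so it is the share of "Yes" among those who answered
share_yes = shopped.mean()
print(shopped.tolist(), share_yes)
```

Because unmatched values turn into NaN rather than raising an error, a quick look at `shopped.isna().sum()` shows how many respondents did not answer.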

Now we are ready to create a [pivot table](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.pivot_table.html) showing the percentages of males and females that shopped Black Friday sales on Thanksgiving Day 2015.

In [None]:
# Since Shopped_BF is either 1 or 0, its mean within each gender is the fraction
# of that group that answered "Yes"; multiplying by 100 turns it into a percentage
# NaNs (unanswered questions) are skipped when the mean is computed
gender_pvt_tbl = tg_data.pivot_table(index="Gender", values = "Shopped_BF", aggfunc = (lambda x: 100 * np.mean(x)))
gender_pvt_tbl
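
To see why the mean of a 0/1 column gives a percentage, here is a minimal sketch on a made-up six-row frame (hypothetical data, not the survey). It also checks that the pivot table agrees with an equivalent `groupby`, which is an alternative we are swapping in for comparison only.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: three women (one "Yes") and three men (two "Yes", one unanswered)
toy = pd.DataFrame({
    "Gender": ["Female", "Female", "Female", "Male", "Male", "Male"],
    "Shopped_BF": [1, 0, 0, 1, 1, np.nan],
})

# Mean of the 0/1 column per group, scaled to a percentage
via_pivot = toy.pivot_table(index="Gender", values="Shopped_BF", aggfunc="mean") * 100

# The same computation written as a groupby
via_groupby = toy.groupby("Gender")["Shopped_BF"].mean() * 100

print(via_pivot)
```

In this toy frame the unanswered row is ignored, so the male percentage is computed over two respondents rather than three.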

#### 3.2 Age

Without further ado, we can have pandas compute the percentages of people that shopped Black Friday sales on Thanksgiving Day 2015 across age groups.

In [None]:
age_pvt_tbl = tg_data.pivot_table(index="Age", values = "Shopped_BF", aggfunc = (lambda x: 100 * np.mean(x)))
age_pvt_tbl

It seems that as one ages, Black Friday sales lose their allure, especially if they involve leaving the family on Thanksgiving Day to go elbow to elbow with strangers to buy material goods. On the other hand, one could see the appeal of doorbuster deals on toys for people with families.
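
A percentage is easier to interpret when we also know how many respondents stand behind it. The sketch below shows one way to report both, using named aggregation on a `groupby`; the age labels and answers are hypothetical, and the column names `pct_yes` and `answered` are our own.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the age labels are illustrative, not the survey's
toy = pd.DataFrame({
    "Age": ["18 - 29", "18 - 29", "30 - 44", "60+", "60+", "60+"],
    "Shopped_BF": [1, 0, 1, 0, 0, np.nan],
})

# pct_yes: percentage of "Yes" among those who answered
# answered: number of non-missing answers in each group
summary = toy.groupby("Age")["Shopped_BF"].agg(
    pct_yes=lambda s: 100 * s.mean(),
    answered="count",
)
print(summary)
```

A group with only a handful of answers can swing between 0% and 100% on a single response, so the count column is a useful sanity check before reading too much into differences between groups.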

#### 3.3 U.S. Region

Similarly, we can compute the percentages of "Black Thursday" shoppers across U.S. regions.

In [None]:
region_pvt_tbl = tg_data.pivot_table(index="US Region", values = ["Shopped_BF"], aggfunc = (lambda x: 100 * np.mean(x)))
region_pvt_tbl

It seems that in every region, a large majority (two thirds or more) choose not to shop on Thanksgiving.

#### 3.4 Household Income

We will rename *How much total combined money did all members of your HOUSEHOLD earn last year?* to *Household Income*.

In [None]:
tg_data.rename(columns = {"How much total combined money did all members of your HOUSEHOLD earn last year?": 
                          "Household Income"}, inplace = True)

We will also escape the dollar sign, "$", which is a special character