<img src="http://imgur.com/1ZcRyrc.png" style="float: left; margin: 20px; height: 55px">

# Fast Food Chains and Price Discrimination

_Authors: Kiefer Katovich (SF), Mario Carrillo (SF)_

---

This group lab uses a 1994 data set of detailed prices on items sold at more than 400 Burger King, Wendy's, KFC, and [Roy Rogers](https://en.wikipedia.org/wiki/Roy_Rogers_Restaurants) restaurants in New Jersey and Pennsylvania.

The data set is a restricted version of the data set used in this publication:

> [K. Graddy (1997), "Do Fast-Food Chains Price Discriminate on the Race and Income Characteristics of an Area?" Journal of Business and Economic Statistics 15, 391-401](http://people.brandeis.edu/~kgraddy/published%20papers/GraddyK_jbes1997.pdf).

**The goal of this exercise is to evaluate whether or not fast food restaurants are using discriminatory pricing.** This is a fairly open-ended prompt. It's up to you to determine how to quantify pricing discrimination using the following groups of variables:
- The price of fast food items, which can be a metric of discriminatory practices.
- The proportions of African American residents, low-income residents, and residents without a car.
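
One way to make "quantifying discrimination" concrete is a linear model of a single price on the area characteristics. The specification below is only an illustrative assumption, not the lab's prescribed approach; the choice of `psoda` as the target, the log transform, and the set of predictors are all up to your group. The variable names come from the data dictionary in this notebook.

$$
\log(\texttt{psoda}) = \beta_0 + \beta_1\,\texttt{prpblck} + \beta_2\,\texttt{prppov} + \beta_3\,\texttt{prpncar} + u
$$

Under this sketch, "no price discrimination by race" corresponds to the null hypothesis $H_0\colon \beta_1 = 0$. Because the target is in logs, a 0.10 increase in `prpblck` is associated with a change in the soda price of roughly $100 \cdot 0.10 \cdot \beta_1$ percent, holding the other predictors fixed.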

---

### In Groups, You Should:

1) **Load and examine the data.**

2) **Perform any necessary data cleaning.**

3) **Conduct an exploratory data analysis relevant to the goals of the project.** What variables are you interested in for your target(s) and predictors? What types of relationships do you see in the data that will inform your analysis?

4) **Formulate and formally define your hypotheses.** Based on the prompt and your EDA, come up with a plan for testing each one.

5) **Construct regression models to test each hypothesis.** What are your findings? Do they support the hypothesis? What are the limitations and assumptions of your approach? 

6) **[Bonus] Cross-validate the results of your regression.** If the results support your hypotheses, do they hold up during cross-validation or a train/test split?

7) **Prepare a brief (10-minute) presentation on the findings.** Each group's presentation should include your questions, models, and findings.
    - Be concise! Only include relevant information in your presentation.
    - Visuals are nice, but don't overdo it.
    - Don't just talk about your model's significance or metrics. Interpret the coefficients. What are the implications?
    - What future hypotheses could you test going forward?
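
Steps 5 and 6 could be sketched roughly as follows. This is a minimal starting point under stated assumptions, not the expected solution: the helper `fit_price_model` is hypothetical, the default target and predictors are just one choice, and the `toy` frame is synthetic stand-in data (not the real data set) so that the block runs on its own.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score


def fit_price_model(df, target="psoda",
                    predictors=("prpblck", "prppov", "prpncar")):
    """Fit a linear regression of one price on area characteristics.

    Rows with a missing value in any used column are dropped first.
    Returns the fitted model and the 5-fold cross-validated R^2 scores.
    """
    cols = [target, *predictors]
    clean = df[cols].dropna()
    X = clean[list(predictors)]
    y = clean[target]
    model = LinearRegression().fit(X, y)
    cv_scores = cross_val_score(LinearRegression(), X, y, cv=5, scoring="r2")
    return model, cv_scores


# Synthetic stand-in data: made-up values, used only to exercise the helper.
rng = np.random.default_rng(0)
n = 200
toy = pd.DataFrame({
    "prpblck": rng.uniform(0, 1, n),
    "prppov": rng.uniform(0, 0.4, n),
    "prpncar": rng.uniform(0, 0.5, n),
})
toy["psoda"] = 1.0 + 0.1 * toy["prpblck"] + rng.normal(0, 0.05, n)

model, cv_scores = fit_price_model(toy)
coefs = dict(zip(["prpblck", "prppov", "prpncar"], model.coef_))
print(coefs)             # one coefficient per predictor
print(cv_scores.mean())  # average out-of-fold R^2
```

With the real data you would call `fit_price_model(food)` instead. Comparing the in-sample fit with the cross-validated scores is one way to see whether a relationship holds up out of sample; interpreting the sign and size of each coefficient is still the main point of the exercise.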


> *Note*: If you are interested, the full data set is also available in the "datasets" folder under the name `discrim_full.csv`.

### Data Set Characteristics

    :Number of Instances: 410
    
    :Attribute Information:
    
    psoda         price of medium soda
    pfries        price of small fries
    pentree       price of entree (burger or chicken)
    wagest        starting wage
    nmgrs         number of managers
    nregs         number of registers
    hrsopen       hours open
    emp           number of employees
    compown       =1 if company owned
    chain         BK = 1, KFC = 2, Roy Rogers = 3, Wendy's = 4
    density       population density, town
    crmrte        crime rate, town
    state         NJ = 1, PA = 2
    prpblck       proportion black, zipcode
    prppov        proportion in poverty, zipcode
    prpncar       proportion no car, zipcode
    hseval        median housing value, zipcode
    nstores       number of stores, zipcode
    income        median family income, zipcode
    county        county label
    NJ            =1 for New Jersey
    BK            =1 if Burger King
    KFC           =1 if Kentucky Fried Chicken
    RR            =1 if Roy Rogers
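
The integer codes for `chain` and `state` are easier to read in plots and tables once they are mapped to labels. The sketch below follows the codes listed above; the helper `add_labels` and the new column names `chain_name` and `state_name` are hypothetical, and the `toy` frame is made-up data used only so the block runs on its own. Note that the data set appears to have no separate Wendy's indicator, so a row with `BK`, `KFC`, and `RR` all equal to 0 is presumably a Wendy's.

```python
import pandas as pd

# Code-to-label mappings taken from the attribute list above.
CHAIN_LABELS = {1: "Burger King", 2: "KFC", 3: "Roy Rogers", 4: "Wendy's"}
STATE_LABELS = {1: "NJ", 2: "PA"}


def add_labels(df):
    """Return a copy of df with readable chain and state label columns."""
    out = df.copy()
    out["chain_name"] = out["chain"].map(CHAIN_LABELS)
    out["state_name"] = out["state"].map(STATE_LABELS)
    return out


# Made-up rows, one per chain code, only to demonstrate the mapping.
toy = pd.DataFrame({"chain": [1, 2, 3, 4], "state": [1, 2, 1, 2]})
labeled = add_labels(toy)
print(labeled)
```

Any code that is not in the mapping becomes `NaN`, which can double as a quick check for unexpected values.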

```python
# Data modules
import numpy as np
import scipy.stats as stats
import pandas as pd

# Plotting modules
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('whitegrid')

# Stats/regression packages
from sklearn import linear_model
from sklearn.metrics import r2_score

# Make sure your charts appear in the notebook
%matplotlib inline
%config InlineBackend.figure_format = 'retina'
```

```python
# Load the restricted data set
food = pd.read_csv('./datasets/discrim.csv')
```
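
A first pass at steps 1 and 2 might look something like the sketch below. The helper `summarize_columns` is hypothetical, and the `toy` frame contains made-up values (not rows from the real file) so that the block runs without the CSV; in the notebook you would pass `food` instead.

```python
import numpy as np
import pandas as pd


def summarize_columns(df):
    """Return dtype, missing count, and missing share for every column."""
    summary = pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_missing": df.isna().sum(),
        "share_missing": df.isna().mean(),
    })
    # Columns with the most missing values come first.
    return summary.sort_values("n_missing", ascending=False)


# Made-up values, only to demonstrate the helper.
toy = pd.DataFrame({
    "psoda": [1.12, np.nan, 0.95],
    "pfries": [1.06, 0.91, 0.95],
    "prpblck": [0.17, 0.05, 0.04],
})
summary = summarize_columns(toy)
print(summary)
```

How to handle the missing values this reveals (dropping rows, imputing, or restricting to complete columns) is a judgment call for your group, and it is worth stating in your presentation.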