# Predicting Areas of Affluence using Yelp Pricing Data

#### Authors: 
- Eddie Yip
- Hadi Morrow: [LinkedIn](https://www.linkedin.com/in/hadi-morrow-4b94164b/) | [GitHub](https://github.com/HadiMorrow) | [Medium](https://medium.com/@hadi.a.morrow)
- Mahdi Shadkam-Farrokhi: [GitHub](https://github.com/Shaddyjr) | [Medium](https://medium.com/@mahdis.pw) | [http://mahdis.pw](http://mahdis.pw)

## Problem Statement [Hadi]

While affluence should never be a factor in deciding whether to provide disaster aid, we must consider the following:

- Assuming affluence plays a role, one might relate affluence to preparedness. Those who can afford to will look out for their families at any cost. Those who cannot may be unable to prepare as well, because preparation is simply not an option for them.

- Assuming the affluent are not the majority class, if we must be myopic in our search efforts, we might want to focus on saving the masses: those living in tight corridors and those with little to no income, in effect those most susceptible to losing their lives in a major disaster.

- Using tax data, we aim to show that Yelp dollar-sign data is enough to predict where we should quickly and accurately align our efforts.

With New Light Technologies as our audience, we hope to show that while expensive and hard-to-handle data such as tax data can be more precise, a quick and dirty approach could be to simply sort through the dollar-sign data on Yelp.


Research cases when affluent areas were better prepared for natural disasters.

We define affluence as any group making over $100,000 a year (AGI).
We test whether the Yelp price is a significant predictor of this.
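
As a rough illustration of the affluence definition above, the sketch below labels zip codes from a per-zip tax summary. The table, its column names (`n_returns`, `total_agi`), and its values are hypothetical, and using mean AGI per return is only one possible reading of "any group making over $100,000 a year".

```python
import pandas as pd

# Hypothetical per-zip tax summary; column names and values are illustrative only.
tax = pd.DataFrame({
    "zip_code": [10001, 10002, 10003],
    "n_returns": [1000, 2000, 1500],
    "total_agi": [150_000_000, 90_000_000, 180_000_000],
})

AFFLUENCE_THRESHOLD = 100_000  # dollars of AGI per year, per the definition above

# Mean AGI per return in each zip code (an assumed way to summarize a "group").
tax["mean_agi"] = tax["total_agi"] / tax["n_returns"]

# Binary target: 1 if the zip code's mean AGI exceeds the threshold, else 0.
tax["is_affluent"] = (tax["mean_agi"] > AFFLUENCE_THRESHOLD).astype(int)
```

A label like `is_affluent` could then serve as the target that Yelp price features are evaluated against.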


## Executive Summary [Mahdi]

- Difficulty gathering data
- Prompt confusing regarding "affluence"
- Other projects used outside data as metric
- We pulled from API and didn't use old data, which was challenging

## Table of Contents
- [Loading Data](#Loading-Data)
- [Preliminary Exploratory Data Analysis](#Preliminary-Exploratory-Data-Analysis)
- [Cleaning the Data](#Cleaning-the-Data)
- [Feature Engineering](#Feature-Engineering)
- [Exploratory Data Analysis](#Exploratory-Data-Analysis)
- [Model Preparation](#Model-Preparation)
- [Model Selection](#Model-Selection)
- [Model Evaluation](#Model-Evaluation)
- [Conclusions and Recommendations](#Conclusions-and-Recommendations)
- [Source Documentation](#Source-Documentation)

## Loading Data
- [All]

In [125]:
import time
import pandas as pd
import numpy as np
import columnExpander
from ast import literal_eval

In [126]:
data_file_path = "./data/total_merged.csv"
df = pd.read_csv(data_file_path, index_col = 0)
df.reset_index(drop=True, inplace = True)

In [127]:
df.head()

Unnamed: 0,alias,categories,coordinates,display_phone,distance,id,image_url,is_closed,location,name,phone,price,rating,review_count,transactions,url
0,mos-original-brooklyn,"[{'alias': 'ramen', 'title': 'Ramen'}, {'alias...","{'latitude': 40.66127, 'longitude': -73.95342}",(718) 513-0698,1542.617156,YwpP-mgXV5N35xhLibLw5g,https://s3-media2.fl.yelpcdn.com/bphoto/-L9roT...,False,"{'address1': '453 Rogers Ave', 'address2': Non...",Mo's Original,17185130000.0,,4.5,32,[],https://www.yelp.com/biz/mos-original-brooklyn...
1,peaches-hothouse-brooklyn,"[{'alias': 'southern', 'title': 'Southern'}, {...","{'latitude': 40.6833699737169, 'longitude': -7...",(718) 483-9111,3471.52542,GA5msU6NO9rQRctPfDJCBg,https://s3-media1.fl.yelpcdn.com/bphoto/KEAXgZ...,False,"{'address1': '415 Tompkins Ave', 'address2': '...",Peaches HotHouse,17184840000.0,$$,4.0,1082,"['pickup', 'delivery']",https://www.yelp.com/biz/peaches-hothouse-broo...
2,claw-daddys-brooklyn,"[{'alias': 'cajun', 'title': 'Cajun/Creole'}, ...","{'latitude': 40.68561, 'longitude': -73.98035}",(347) 318-8893,5062.337404,1x2hn3e9sCCZca1cnRTpEg,https://s3-media3.fl.yelpcdn.com/bphoto/ABHo2x...,False,"{'address1': '31 3rd Ave', 'address2': '', 'ad...",Claw Daddy's,13473190000.0,$$,4.0,282,['restaurant_reservation'],https://www.yelp.com/biz/claw-daddys-brooklyn?...
3,barons-brooklyn,"[{'alias': 'newamerican', 'title': 'American (...","{'latitude': 40.6908116, 'longitude': -73.953915}",(718) 230-7100,4451.492133,GxMhN2PEttvw7CRGIzB6Gg,https://s3-media3.fl.yelpcdn.com/bphoto/VmnsId...,False,"{'address1': '564 Dekalb Ave', 'address2': '',...",Baron's,17182310000.0,$$,4.5,258,"['pickup', 'restaurant_reservation']",https://www.yelp.com/biz/barons-brooklyn?adjus...
4,lowerline-brooklyn-2,"[{'alias': 'cajun', 'title': 'Cajun/Creole'}, ...","{'latitude': 40.67421, 'longitude': -73.96324}",(347) 533-7110,3158.955607,swKXaURwqdSrSTcpHsxdbA,https://s3-media4.fl.yelpcdn.com/bphoto/oJbAhL...,False,"{'address1': '794 Washington Ave', 'address2':...",Lowerline,13475340000.0,$$,4.5,118,[],https://www.yelp.com/biz/lowerline-brooklyn-2?...


In [106]:
sum_null = df.isnull().sum()
sum_null[sum_null > 0]

display_phone     697
image_url         417
phone             697
price            2013
dtype: int64

We have many missing values in the data; however, most of the affected columns (`display_phone`, `image_url`, `phone`) are not meaningful for our problem and can be safely dropped.

Also, `categories`, `location`, and `transactions` are packed columns (stringified lists and dictionaries) containing information we want, so they need to be unpacked.

### Dropping unnecessary columns

In [107]:
keepers = ['categories','id', 'location', 'price', 'rating', 'review_count', 'transactions']
df = df[keepers]

Convert price to binary

In [128]:
df["price"].isnull().sum()

2013

We decided to drop rows with null prices from the analysis; they may be brought back later, for example for clustering.

In [129]:
df.dropna(subset=["price"], inplace = True)

In [130]:
df.shape

(9212, 16)
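
Because the unpriced rows may be revisited later, one option is to set them aside instead of discarding them. This is a minimal sketch on a toy frame; the names `toy`, `priced`, and `unpriced` are illustrative and do not appear in the notebook.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the Yelp frame; values are illustrative only.
toy = pd.DataFrame({
    "id": ["a", "b", "c", "d"],
    "price": [np.nan, "$$", "$", np.nan],
})

has_price = toy["price"].notnull()

# Rows with no price, kept aside for possible later analysis.
unpriced = toy[~has_price].copy()

# Rows used for modeling; same rows as toy.dropna(subset=["price"]).
priced = toy[has_price].copy()
```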

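After dropping rows with a missing price, the columns we keep for analysis have no null values; the only remaining nulls are in `display_phone`, `image_url`, and `phone`.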
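
The note earlier in this section mentions converting price to binary, but the conversion itself is not shown. A sketch of one way to do it follows; the cutoff of three dollar signs is an assumption chosen for illustration, not a value taken from this notebook.

```python
import pandas as pd

# Example Yelp price strings.
prices = pd.Series(["$", "$$", "$$$", "$$$$", "$$"])

# Ordinal encoding: the number of dollar signs (1 to 4).
price_level = prices.str.len()

# Binary encoding; the cutoff is an assumed value, not from the notebook.
EXPENSIVE_CUTOFF = 3
is_expensive = (price_level >= EXPENSIVE_CUTOFF).astype(int)
```

Keeping the ordinal `price_level` alongside the binary flag leaves the choice of cutoff open for later modeling.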

### Parsing location

In [131]:
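def get_keys_from_string_dict(string, keys):
    """Parse a stringified dict and return only the requested keys."""
    if len(string) == 0:
        return None
    dic = literal_eval(string)
    out = {}
    for key in keys:
        out[key] = dic.get(key)  # None if the key is absent
    return out

In [132]:
keys = ["zip_code", "city", 'state']
zips_and_cities = df["location"].map(lambda string: get_keys_from_string_dict(string, keys))

# Add one column per key; rows with an empty location string get None
for key in keys:
    df[key] = [pair[key] if pair is not None else None for pair in zips_and_cities]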
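
To make the parsing step concrete, here is a small self-contained sketch of the same idea on two shortened location strings. The strings are abbreviated versions of the rows shown in this notebook (the full dictionaries have more fields), and `expanded` is an illustrative name.

```python
from ast import literal_eval

import pandas as pd

# Shortened location strings; the real column holds longer dictionaries.
locations = pd.Series([
    "{'address1': '415 Tompkins Ave', 'city': 'Brooklyn', 'zip_code': '11216', 'state': 'NY'}",
    "{'address1': '31 3rd Ave', 'city': 'Brooklyn', 'zip_code': '11217', 'state': 'NY'}",
])

keys = ["zip_code", "city", "state"]

# Turn each string back into a dict, then keep only the keys of interest.
parsed = locations.map(literal_eval)
expanded = pd.DataFrame(
    [{key: loc.get(key) for key in keys} for loc in parsed],
    index=locations.index,
)
```

`literal_eval` only evaluates Python literals, which makes it a safer choice than `eval` for strings that came from an API.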

In [133]:
df.head()

Unnamed: 0,alias,categories,coordinates,display_phone,distance,id,image_url,is_closed,location,name,phone,price,rating,review_count,transactions,url,zip_code,city,state
1,peaches-hothouse-brooklyn,"[{'alias': 'southern', 'title': 'Southern'}, {...","{'latitude': 40.6833699737169, 'longitude': -7...",(718) 483-9111,3471.52542,GA5msU6NO9rQRctPfDJCBg,https://s3-media1.fl.yelpcdn.com/bphoto/KEAXgZ...,False,"{'address1': '415 Tompkins Ave', 'address2': '...",Peaches HotHouse,17184840000.0,$$,4.0,1082,"['pickup', 'delivery']",https://www.yelp.com/biz/peaches-hothouse-broo...,11216,Brooklyn,NY
2,claw-daddys-brooklyn,"[{'alias': 'cajun', 'title': 'Cajun/Creole'}, ...","{'latitude': 40.68561, 'longitude': -73.98035}",(347) 318-8893,5062.337404,1x2hn3e9sCCZca1cnRTpEg,https://s3-media3.fl.yelpcdn.com/bphoto/ABHo2x...,False,"{'address1': '31 3rd Ave', 'address2': '', 'ad...",Claw Daddy's,13473190000.0,$$,4.0,282,['restaurant_reservation'],https://www.yelp.com/biz/claw-daddys-brooklyn?...,11217,Brooklyn,NY
3,barons-brooklyn,"[{'alias': 'newamerican', 'title': 'American (...","{'latitude': 40.6908116, 'longitude': -73.953915}",(718) 230-7100,4451.492133,GxMhN2PEttvw7CRGIzB6Gg,https://s3-media3.fl.yelpcdn.com/bphoto/VmnsId...,False,"{'address1': '564 Dekalb Ave', 'address2': '',...",Baron's,17182310000.0,$$,4.5,258,"['pickup', 'restaurant_reservation']",https://www.yelp.com/biz/barons-brooklyn?adjus...,11205,Brooklyn,NY
4,lowerline-brooklyn-2,"[{'alias': 'cajun', 'title': 'Cajun/Creole'}, ...","{'latitude': 40.67421, 'longitude': -73.96324}",(347) 533-7110,3158.955607,swKXaURwqdSrSTcpHsxdbA,https://s3-media4.fl.yelpcdn.com/bphoto/oJbAhL...,False,"{'address1': '794 Washington Ave', 'address2':...",Lowerline,13475340000.0,$$,4.5,118,[],https://www.yelp.com/biz/lowerline-brooklyn-2?...,11238,Brooklyn,NY
5,uglyduckling-brooklyn,"[{'alias': 'tradamerican', 'title': 'American ...","{'latitude': 40.686023, 'longitude': -73.991302}",(718) 451-3825,5764.199408,CwOAKJdX8AMz5iAoA-ZEuA,https://s3-media3.fl.yelpcdn.com/bphoto/sCDU8u...,False,"{'address1': '166 Smith St', 'address2': '', '...",Uglyduckling,17184510000.0,$$,4.0,453,"['pickup', 'delivery', 'restaurant_reservation']",https://www.yelp.com/biz/uglyduckling-brooklyn...,11201,Brooklyn,NY


### Parsing categories

In [134]:
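def convert_string_dict_to_string(string, key):
    # Parse a stringified list of dicts and join the values stored under `key`
    return ",".join([dic[key] for dic in literal_eval(string)])

# Keep only the category aliases, e.g. "cupcakes,bakeries,customcakes"
df["categories"] = df["categories"].map(lambda s: convert_string_dict_to_string(s,"alias"))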
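
Once categories are comma-joined strings, they can be expanded into indicator columns for modeling. The notebook imports a `columnExpander` module whose contents are not shown, so the sketch below swaps in a plain pandas technique, `Series.str.get_dummies`, as one possible approach.

```python
import pandas as pd

# Comma-joined category aliases in the format produced above.
categories = pd.Series([
    "cupcakes,bakeries,customcakes",
    "bakeries",
    "foodstands",
])

# One indicator column per distinct alias.
dummies = categories.str.get_dummies(sep=",")
```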

### Parsing transactions

In [135]:
def convert_string_list_to_string(string):
    # Parse a stringified list and join its items; an empty list becomes ""
    return ",".join(literal_eval(string))

df["transactions"] = df["transactions"].map(convert_string_list_to_string)

In [136]:
df.shape

(9212, 19)

### Filtering for NYC-only

#### Removing businesses outside NY state

In [137]:
df = df[df['state'] == "NY"]

#### Imputing missing zip codes

In [141]:
df[df['zip_code'] == ""]

Unnamed: 0,alias,categories,coordinates,display_phone,distance,id,image_url,is_closed,location,name,phone,price,rating,review_count,transactions,url,zip_code,city,state
6940,laceycakes-new-york,"cupcakes,bakeries,customcakes","{'latitude': 40.71455, 'longitude': -74.00714}",(917) 768-2163,15636.972881,ECY0sIYxPJio81dteqiMhg,https://s3-media2.fl.yelpcdn.com/bphoto/q_ch1rwZuRznb0W83yUckA/o.jpg,False,"{'address1': '', 'address2': None, 'address3': '', 'city': 'New York', 'zip_code': '', 'country': 'US', 'state': 'NY', 'display_address': ['New York, NY']}",Laceycakes,19177680000.0,$$,4.5,16,,https://www.yelp.com/biz/laceycakes-new-york?adjust_creative=_dOL2tQEZNA2cv5pnszrdw&utm_campaign=yelp_api_v3&utm_medium=api_v3_business_search&utm_source=_dOL2tQEZNA2cv5pnszrdw,,New York,NY
8092,petite-treat-cupcake-company-staten-island,bakeries,"{'latitude': 40.605271, 'longitude': -74.149243}",(646) 481-2187,2976.411467,6u5cnsN35mJz24HMQ9pfFw,https://s3-media2.fl.yelpcdn.com/bphoto/kIw4XTxYjWgChorNS_GhHA/o.jpg,False,"{'address1': '', 'address2': '', 'address3': '', 'city': 'Staten Island', 'zip_code': '', 'country': 'US', 'state': 'NY', 'display_address': ['Staten Island, NY']}",Petite Treat Cupcake Company,16464810000.0,$$,2.0,15,,https://www.yelp.com/biz/petite-treat-cupcake-company-staten-island?adjust_creative=_dOL2tQEZNA2cv5pnszrdw&utm_campaign=yelp_api_v3&utm_medium=api_v3_business_search&utm_source=_dOL2tQEZNA2cv5pnszrdw,,Staten Island,NY
10283,halal-cart-queens-2,foodstands,"{'latitude': 40.7488639529741, 'longitude': -73.8918964433597}",,85.745532,BilbRcNQXKmcBFvLm4gxAQ,https://s3-media2.fl.yelpcdn.com/bphoto/AG7ZvEYDINcEPLDHiAjBtg/o.jpg,False,"{'address1': '74 St 37th Ave', 'address2': '', 'address3': 'Across TD Bank', 'city': 'Queens', 'zip_code': '', 'country': 'US', 'state': 'NY', 'display_address': ['74 St 37th Ave', 'Across TD Bank...",Halal Cart,,$$,5.0,3,,https://www.yelp.com/biz/halal-cart-queens-2?adjust_creative=_dOL2tQEZNA2cv5pnszrdw&utm_campaign=yelp_api_v3&utm_medium=api_v3_business_search&utm_source=_dOL2tQEZNA2cv5pnszrdw,,Queens,NY
10401,elmhurst-green-market-queens,"farmersmarket,localflavor","{'latitude': 40.7454635299775, 'longitude': -73.8876832068039}",,438.013407,jZzbV6SRt9FXdCoziNv5xw,https://s3-media1.fl.yelpcdn.com/bphoto/yW3gl0ScLElrORXPYgbMrQ/o.jpg,False,"{'address1': '41st Ave between 80th & 81st St', 'address2': '', 'address3': '', 'city': 'Queens', 'zip_code': '', 'country': 'US', 'state': 'NY', 'display_address': ['41st Ave between 80th & 81st ...",Elmhurst Green Market,,$$,4.0,4,,https://www.yelp.com/biz/elmhurst-green-market-queens?adjust_creative=_dOL2tQEZNA2cv5pnszrdw&utm_campaign=yelp_api_v3&utm_medium=api_v3_business_search&utm_source=_dOL2tQEZNA2cv5pnszrdw,,Queens,NY


Locations were found using Google Maps and zip codes imputed manually

In [143]:
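# Assign only the zip_code column; df[mask] = value would overwrite every column in the row
df.loc[df["id"] == "ECY0sIYxPJio81dteqiMhg", "zip_code"] = "10007"
df.loc[df["id"] == "6u5cnsN35mJz24HMQ9pfFw", "zip_code"] = "10314"
df.loc[df["id"] == "BilbRcNQXKmcBFvLm4gxAQ", "zip_code"] = "11372"
df.loc[df["id"] == "jZzbV6SRt9FXdCoziNv5xw", "zip_code"] = "11373"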
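
The manual imputation can also be written as a lookup table applied with `.loc`, which touches only the `zip_code` column of the matching rows. This is a sketch on a toy frame; `toy` and `manual_zips` are illustrative names, and only two of the four business ids are included.

```python
import pandas as pd

# Toy frame with two rows missing a zip code and one complete row.
toy = pd.DataFrame({
    "id": ["ECY0sIYxPJio81dteqiMhg", "6u5cnsN35mJz24HMQ9pfFw", "other"],
    "zip_code": ["", "", "11216"],
    "city": ["New York", "Staten Island", "Brooklyn"],
})

# Manually researched zip codes, keyed by business id.
manual_zips = {
    "ECY0sIYxPJio81dteqiMhg": "10007",
    "6u5cnsN35mJz24HMQ9pfFw": "10314",
}

# Fill only the empty zip codes; all other columns are left untouched.
missing = toy["zip_code"] == ""
toy.loc[missing, "zip_code"] = toy.loc[missing, "id"].map(manual_zips)
```

Selecting with `.loc[rows, "zip_code"]` matters here: assigning to a boolean-masked frame without naming a column would overwrite every column in those rows.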

### Filtering by NYC zip code

In [145]:
min_zip = 10001
max_zip = 11104

df['zip_code'] = df['zip_code'].astype(int)

Kept only zip codes that belong to NYC
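
The filtering step itself is not shown, so here is a hedged sketch of a range filter using `Series.between`. The bounds below are rough placeholders rather than the notebook's values: an upper bound of 11104 would exclude the 112xx Brooklyn zip codes seen in the sample rows (for example 11216), and a simple numeric range can also admit some non-NYC zip codes.

```python
import pandas as pd

# Toy zip codes; 14201 is included as a value outside the range.
toy = pd.DataFrame({"zip_code": ["10007", "11216", "14201", "10314"]})
toy["zip_code"] = toy["zip_code"].astype(int)

# Placeholder bounds for illustration only.
lower, upper = 10001, 11697

# between() is inclusive on both ends by default.
nyc_toy = toy[toy["zip_code"].between(lower, upper)]
```

An explicit list of NYC zip codes combined with `isin` would be more precise than a range, if such a list is available.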

### **DataFrame by Eddie from below**

In [None]:
nyc.drop([8712], inplace = True)

## Preliminary Exploratory Data Analysis
- [All]

## Cleaning the Data
- [Mahdi] one person for consistency

## Feature Engineering
- [All]

## Exploratory Data Analysis
- [Mahdi] killer graphs and visuals

## Model Preparation

## Model Selection
- [Hadi] Exploring models
- [Eddie] Exploring models

Maybe split on which models you 2 want to try out

## Model Evaluation
- [Mahdi] killer graphs and visuals

## Conclusions and Recommendations
- [All]

## Source Documentation