# Predicting Areas of Affluence using Yelp Pricing Data

#### Authors: 
- Eddie Yip
- Hadi Morrow: [LinkedIn](https://www.linkedin.com/in/hadi-morrow-4b94164b/) | [GitHub](https://github.com/HadiMorrow) | [Medium](https://medium.com/@hadi.a.morrow)
- Mahdi Shadkam-Farrokhi: [GitHub](https://github.com/Shaddyjr) | [Medium](https://medium.com/@mahdis.pw) | [http://mahdis.pw](http://mahdis.pw)

## Problem Statement [Hadi]

While affluence should never be a factor when choosing to provide disaster aid or not, we must consider the following:

- On the assumption that affluence plays a role, one might relate affluence to preparedness. Those who can afford to will always look out for their families at any cost. Those who cannot may be less prepared, simply because preparing is not an option for them.

- On the assumption that the affluent are not the majority, if we must be myopic with our search efforts we might want to prioritize the masses: those living in tight corridors and those with little to no income. In effect, those most susceptible to losing their lives in a major disaster.

- Using tax data as the point of comparison, we aim to show that Yelp dollar-sign pricing is enough to predict where we might want to quickly and accurately direct our efforts.

With New Light Technologies as our audience, we hope to show that while expensive and hard-to-handle data such as tax data can be more precise, a quick-and-dirty approach could be to simply sort through the dollar-sign data on Yelp.


Research cases when affluent areas were better prepared for natural disasters.

We define affluence as any group making over $100,000 a year in adjusted gross income (AGI).
The question is whether the Yelp price is a significant predictor of this.


## Executive Summary [Mahdi]

- Gathering the data was difficult
- The prompt was unclear about how to define "affluence"
- Other projects used outside data as their metric
- We pulled fresh data from the API rather than using old data, which was challenging

## Table of Contents
- [Loading Data](#Loading-Data)
- [Preliminary Exploratory Data Analysis](#Preliminary-Exploratory-Data-Analysis)
- [Cleaning the Data](#Cleaning-the-Data)
- [Feature Engineering](#Feature-Engineering)
- [Exploratory Data Analysis](#Exploratory-Data-Analysis)
- [Model Preparation](#Model-Preparation)
- [Model Selection](#Model-Selection)
- [Model Evaluation](#Model-Evaluation)
- [Conclusions and Recommendations](#Conclusions-and-Recommendations)
- [Source Documentation](#Source-Documentation)

## Loading Data
- [All]

In [12]:
import time
import pandas as pd
import numpy as np
import columnExpander
from ast import literal_eval
pd.set_option("display.max_colwidth", 200)

In [67]:
data_file_path = "./data/total_merged.csv"
nyc = pd.read_csv(data_file_path, index_col = 0)

The `categories`, `location`, and `transactions` columns are stored as strings rather than as Python objects, so they must be parsed before we can use them.
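As a minimal sketch of what the parsing involves, the snippet below applies `ast.literal_eval` to two sample strings shaped like the values in the `location` and `transactions` columns. The sample strings are shortened from the first rows of the data and are for illustration only.

```python
from ast import literal_eval

# Sample cell values as they arrive from the CSV: plain strings, not dicts or lists.
raw_location = (
    "{'address1': '453 Rogers Ave', 'address2': None, 'address3': '', "
    "'city': 'Brooklyn', 'zip_code': '11225', 'country': 'US', 'state': 'NY'}"
)
raw_transactions = "['pickup', 'delivery']"

# literal_eval safely turns the string back into the Python object it describes.
location = literal_eval(raw_location)
transactions = literal_eval(raw_transactions)

print(type(raw_location).__name__, "->", type(location).__name__)
print(location["zip_code"], location["city"], transactions)
```

Unlike `eval`, `literal_eval` only accepts Python literals, which makes it a reasonable choice for data pulled from an external source.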

In [5]:
df["price"].isnull().sum()

2013

Convert price to binary

Removed businesses outside New York State

Removed area codes that do not belong to NYC
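The price-to-binary step noted above is not shown in this section. One possible sketch follows, assuming that "binary" means splitting businesses into higher-priced and lower-priced groups. The threshold of three dollar signs is our assumption for illustration and is not taken from the notebook.

```python
import numpy as np
import pandas as pd

# Toy price column using the same labels Yelp returns; NaN means no price listed.
prices = pd.Series(["$$", "$", np.nan, "$$$", "$$$$"], name="price")

# Count the dollar signs; missing prices stay missing.
price_level = prices.str.len()

# ASSUMPTION: "$$$" and above counts as high-priced (hypothetical threshold).
HIGH_PRICE_THRESHOLD = 3

is_high_price = (price_level >= HIGH_PRICE_THRESHOLD).astype(float)
# Keep unknown prices as NaN instead of silently treating them as low-priced.
is_high_price[price_level.isna()] = np.nan

print(pd.DataFrame({"price": prices, "is_high_price": is_high_price}))
```

Keeping missing prices as `NaN` leaves the decision about dropping or imputing them to a later cleaning step.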

Parsing categories

In [6]:
def convert_string_dict_to_string(string, key):
    return ",".join([dic[key] for dic in literal_eval(string)])

df["categories"] = df["categories"].map(lambda s: convert_string_dict_to_string(s,"alias"))

Parsing transactions

### **DataFrame by Eddie from below**

In [None]:
def convert_string_list_to_string(string):
    return ",".join(literal_eval(string))

df["transactions"] = df["transactions"].map(convert_string_list_to_string)

In [68]:
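# Parse each location string once, then pull out the fields we need.
parsed_locations = [literal_eval(i) for i in nyc['location']]
list_of_zip = [loc['zip_code'] for loc in parsed_locations]
list_of_state = [loc['state'] for loc in parsed_locations]
list_of_city = [loc['city'] for loc in parsed_locations]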

In [69]:
nyc['city'] = list_of_city
nyc['state'] = list_of_state
nyc['zip'] = list_of_zip
# display_phone looks like "(718) 513-0698", so characters 1-3 are the area code.
nyc["area_code"] = nyc["display_phone"].apply(lambda d: d[1:4] if isinstance(d, str) else np.nan)

In [70]:
# Phone area codes to drop (these values are area codes, not zip codes).
remove_area_codes = ['714', '516', '888', '914', '844', '866', '218', '518', '856', '631', '833', '956',
        '219', '607', '201', '781', '732', '908', '862', '973', '254',
        '716', '323', '585', '510', '469', '785', '877', '845', '202',
        '855', '936', '800', '520', '626', '802', '773', '904',
        '203', '312', '353', '717', '302', '374', '484', '708', '954',
        '415', '727', '407', '551', '609', '205', '702', '860']

In [71]:
# Drop every row whose area code is in the list; rows with a missing area code are handled in the next cell.
nyc = nyc[~nyc['area_code'].isin(remove_area_codes)]

In [72]:
nyc = nyc[nyc['area_code'].notnull()]
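The notes earlier mention removing non-NY rows, while the cells above filter only on area code, using a list of codes to drop. As an alternative sketch, not the approach used in this notebook, both filters could be expressed with an allow-list of the area codes assigned to New York City. The toy rows and the `NYC_AREA_CODES` name are ours, for illustration only.

```python
import numpy as np
import pandas as pd

# Toy rows shaped like the notebook's columns; the values are illustrative only.
sample = pd.DataFrame({
    "display_phone": ["(718) 513-0698", "(516) 555-0100", np.nan, "(347) 318-8893"],
    "state": ["NY", "NY", "NY", "NJ"],
})

# Same slicing as the notebook: characters 1-3 of "(718) 513-0698".
sample["area_code"] = sample["display_phone"].str[1:4]

# Area codes assigned to New York City (allow-list instead of a drop list).
NYC_AREA_CODES = {"212", "332", "347", "646", "718", "917", "929"}

in_ny_state = sample["state"] == "NY"
in_nyc_area = sample["area_code"].isin(NYC_AREA_CODES)

# Rows with a missing phone number fail the isin check and are dropped as well.
filtered = sample[in_ny_state & in_nyc_area].reset_index(drop=True)
print(filtered)
```

An allow-list does not need updating when a new out-of-town area code shows up in the data, at the cost of also dropping toll-free and mobile numbers registered elsewhere.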

In [73]:
nyc['is_closed'].value_counts(dropna = False)

False    10126
True         1
Name: is_closed, dtype: int64

In [74]:
nyc['price'].value_counts(dropna = False)

$$      4781
$       3080
NaN     1576
$$$      562
$$$$     128
Name: price, dtype: int64

In [75]:
nyc.reset_index(drop=True, inplace = True)

In [76]:
nyc.drop([8712], inplace = True)

In [77]:
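# Reuse nyc's index so the rows still line up with nyc after the drop above;
# a fresh 0..n-1 index would be shifted by one for every row after the dropped one.
category = pd.DataFrame([literal_eval(i) for i in nyc['categories']],
                        columns = ['first_category', 'second_category', 'third_category', 'fourth_category'],
                        index = nyc.index)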

In [78]:
category.fillna('None', inplace = True)

In [79]:
# Replace the 'None' placeholder with a dict so every cell has an 'alias' key.
none_category = {'alias': 'None', 'title': 'None'}
for col in ['second_category', 'third_category', 'fourth_category']:
    category[col] = category[col].apply(lambda x: none_category if x == 'None' else x)

In [80]:
# Extract the alias from each of the first three category slots.
for position in ['first', 'second', 'third']:
    category[position + '_category_alias'] = [i['alias'] for i in category[position + '_category']]

In [81]:
categories_to_merge = category.drop(columns = ['first_category', 'second_category', 'third_category', 'fourth_category'])

In [82]:
nyc = pd.merge(left = nyc, right = categories_to_merge, left_index = True, right_index = True)

In [83]:
nyc.drop(columns = ['alias', 'coordinates', 'distance', 'image_url', 'is_closed', 'name','url'], inplace = True)

In [84]:
nyc = pd.get_dummies(nyc, columns = ['first_category_alias', 'second_category_alias','third_category_alias'])
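Encoding the first, second, and third category slots separately gives the same alias a different column in each position (for example `first_category_alias_seafood` and `third_category_alias_seafood`). As a hedged alternative, not what this notebook does, the comma-joined alias strings produced by `convert_string_dict_to_string` could be expanded into one indicator column per alias with `Series.str.get_dummies`. The sample values are taken from the first rows of the data.

```python
import pandas as pd

# Comma-joined aliases, the format produced by convert_string_dict_to_string.
categories = pd.Series([
    "ramen,bbq,comfortfood",
    "southern,tradamerican,bars",
    "cajun,seafood",
    "newamerican,bars,seafood",
])

# One 0/1 column per alias, regardless of the slot the alias appeared in.
category_flags = categories.str.get_dummies(sep=",")

print(category_flags[["bars", "seafood"]])
```

This keeps the feature count lower and lets a model treat "seafood" the same whether it was listed first or third.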

In [85]:
nyc

Unnamed: 0,categories,display_phone,id,location,phone,price,rating,review_count,transactions,city,...,third_category_alias_venezuelan,third_category_alias_venues,third_category_alias_vietnamese,third_category_alias_waffles,third_category_alias_wedding_planning,third_category_alias_whiskeybars,third_category_alias_wholesale_stores,third_category_alias_wine_bars,third_category_alias_wraps,third_category_alias_yoga
0,"[{'alias': 'ramen', 'title': 'Ramen'}, {'alias': 'bbq', 'title': 'Barbeque'}, {'alias': 'comfortfood', 'title': 'Comfort Food'}]",(718) 513-0698,YwpP-mgXV5N35xhLibLw5g,"{'address1': '453 Rogers Ave', 'address2': None, 'address3': '', 'city': 'Brooklyn', 'zip_code': '11225', 'country': 'US', 'state': 'NY', 'display_address': ['453 Rogers Ave', 'Brooklyn, NY 11225']}",1.718513e+10,,4.5,32,[],Brooklyn,...,0,0,0,0,0,0,0,0,0,0
1,"[{'alias': 'southern', 'title': 'Southern'}, {'alias': 'tradamerican', 'title': 'American (Traditional)'}, {'alias': 'bars', 'title': 'Bars'}]",(718) 483-9111,GA5msU6NO9rQRctPfDJCBg,"{'address1': '415 Tompkins Ave', 'address2': '', 'address3': '', 'city': 'Brooklyn', 'zip_code': '11216', 'country': 'US', 'state': 'NY', 'display_address': ['415 Tompkins Ave', 'Brooklyn, NY 1121...",1.718484e+10,$$,4.0,1082,"['pickup', 'delivery']",Brooklyn,...,0,0,0,0,0,0,0,0,0,0
2,"[{'alias': 'cajun', 'title': 'Cajun/Creole'}, {'alias': 'seafood', 'title': 'Seafood'}]",(347) 318-8893,1x2hn3e9sCCZca1cnRTpEg,"{'address1': '31 3rd Ave', 'address2': '', 'address3': None, 'city': 'Brooklyn', 'zip_code': '11217', 'country': 'US', 'state': 'NY', 'display_address': ['31 3rd Ave', 'Brooklyn, NY 11217']}",1.347319e+10,$$,4.0,282,['restaurant_reservation'],Brooklyn,...,0,0,0,0,0,0,0,0,0,0
3,"[{'alias': 'newamerican', 'title': 'American (New)'}, {'alias': 'bars', 'title': 'Bars'}, {'alias': 'seafood', 'title': 'Seafood'}]",(718) 230-7100,GxMhN2PEttvw7CRGIzB6Gg,"{'address1': '564 Dekalb Ave', 'address2': '', 'address3': '', 'city': 'Brooklyn', 'zip_code': '11205', 'country': 'US', 'state': 'NY', 'display_address': ['564 Dekalb Ave', 'Brooklyn, NY 11205']}",1.718231e+10,$$,4.5,258,"['pickup', 'restaurant_reservation']",Brooklyn,...,0,0,0,0,0,0,0,0,0,0
4,"[{'alias': 'cajun', 'title': 'Cajun/Creole'}, {'alias': 'seafood', 'title': 'Seafood'}, {'alias': 'southern', 'title': 'Southern'}]",(347) 533-7110,swKXaURwqdSrSTcpHsxdbA,"{'address1': '794 Washington Ave', 'address2': '', 'address3': None, 'city': 'Brooklyn', 'zip_code': '11238', 'country': 'US', 'state': 'NY', 'display_address': ['794 Washington Ave', 'Brooklyn, N...",1.347534e+10,$$,4.5,118,[],Brooklyn,...,0,0,0,0,0,0,0,0,0,0
5,"[{'alias': 'tradamerican', 'title': 'American (Traditional)'}, {'alias': 'gastropubs', 'title': 'Gastropubs'}]",(718) 451-3825,CwOAKJdX8AMz5iAoA-ZEuA,"{'address1': '166 Smith St', 'address2': '', 'address3': '', 'city': 'Brooklyn', 'zip_code': '11201', 'country': 'US', 'state': 'NY', 'display_address': ['166 Smith St', 'Brooklyn, NY 11201']}",1.718451e+10,$$,4.0,453,"['pickup', 'delivery', 'restaurant_reservation']",Brooklyn,...,0,0,0,0,0,0,0,0,0,0
6,"[{'alias': 'cajun', 'title': 'Cajun/Creole'}, {'alias': 'southern', 'title': 'Southern'}]",(718) 928-7555,unePwYAUWb7oC8RBt84e3A,"{'address1': '198 Lewis Ave', 'address2': None, 'address3': '', 'city': 'Brooklyn', 'zip_code': '11221', 'country': 'US', 'state': 'NY', 'display_address': ['198 Lewis Ave', 'Brooklyn, NY 11221']}",1.718929e+10,$$,4.0,90,"['pickup', 'delivery']",Brooklyn,...,0,0,0,0,0,0,0,0,0,0
7,"[{'alias': 'caribbean', 'title': 'Caribbean'}]",(718) 484-7555,VDz9n7gwcq51wOAbTBdXxA,"{'address1': '355 Rogers Ave', 'address2': None, 'address3': '', 'city': 'Brooklyn', 'zip_code': '11225', 'country': 'US', 'state': 'NY', 'display_address': ['355 Rogers Ave', 'Brooklyn, NY 11225']}",1.718485e+10,$$,4.5,243,"['pickup', 'delivery']",Brooklyn,...,0,0,0,0,0,0,0,0,0,0
8,"[{'alias': 'tapas', 'title': 'Tapas Bars'}]",(347) 435-0920,tXWA5kUJnZY_NbED1-ST0g,"{'address1': '656 Nostrand Ave', 'address2': None, 'address3': '', 'city': 'Brooklyn', 'zip_code': '11216', 'country': 'US', 'state': 'NY', 'display_address': ['656 Nostrand Ave', 'Brooklyn, NY 11...",1.347435e+10,,5.0,12,"['pickup', 'delivery']",Brooklyn,...,0,0,0,0,0,0,0,0,0,0
9,"[{'alias': 'french', 'title': 'French'}, {'alias': 'cocktailbars', 'title': 'Cocktail Bars'}, {'alias': 'seafood', 'title': 'Seafood'}]",(929) 234-2941,49ST--X1jcIPzUIM1O3K6w,"{'address1': '221 Knickerbocker Ave', 'address2': '', 'address3': '', 'city': 'Brooklyn', 'zip_code': '11237', 'country': 'US', 'state': 'NY', 'display_address': ['221 Knickerbocker Ave', 'Brookly...",1.929234e+10,$$,4.0,628,"['pickup', 'restaurant_reservation']",Brooklyn,...,0,0,0,0,0,0,0,0,0,0


In [4]:
df.head()

Unnamed: 0,alias,categories,coordinates,display_phone,distance,id,image_url,is_closed,location,name,phone,price,rating,review_count,transactions,url
0,mos-original-brooklyn,"[{'alias': 'ramen', 'title': 'Ramen'}, {'alias...","{'latitude': 40.66127, 'longitude': -73.95342}",(718) 513-0698,1542.617156,YwpP-mgXV5N35xhLibLw5g,https://s3-media2.fl.yelpcdn.com/bphoto/-L9roT...,False,"{'address1': '453 Rogers Ave', 'address2': Non...",Mo's Original,17185130000.0,,4.5,32,[],https://www.yelp.com/biz/mos-original-brooklyn...
1,peaches-hothouse-brooklyn,"[{'alias': 'southern', 'title': 'Southern'}, {...","{'latitude': 40.6833699737169, 'longitude': -7...",(718) 483-9111,3471.52542,GA5msU6NO9rQRctPfDJCBg,https://s3-media1.fl.yelpcdn.com/bphoto/KEAXgZ...,False,"{'address1': '415 Tompkins Ave', 'address2': '...",Peaches HotHouse,17184840000.0,$$,4.0,1082,"['pickup', 'delivery']",https://www.yelp.com/biz/peaches-hothouse-broo...
2,claw-daddys-brooklyn,"[{'alias': 'cajun', 'title': 'Cajun/Creole'}, ...","{'latitude': 40.68561, 'longitude': -73.98035}",(347) 318-8893,5062.337404,1x2hn3e9sCCZca1cnRTpEg,https://s3-media3.fl.yelpcdn.com/bphoto/ABHo2x...,False,"{'address1': '31 3rd Ave', 'address2': '', 'ad...",Claw Daddy's,13473190000.0,$$,4.0,282,['restaurant_reservation'],https://www.yelp.com/biz/claw-daddys-brooklyn?...
3,barons-brooklyn,"[{'alias': 'newamerican', 'title': 'American (...","{'latitude': 40.6908116, 'longitude': -73.953915}",(718) 230-7100,4451.492133,GxMhN2PEttvw7CRGIzB6Gg,https://s3-media3.fl.yelpcdn.com/bphoto/VmnsId...,False,"{'address1': '564 Dekalb Ave', 'address2': '',...",Baron's,17182310000.0,$$,4.5,258,"['pickup', 'restaurant_reservation']",https://www.yelp.com/biz/barons-brooklyn?adjus...
4,lowerline-brooklyn-2,"[{'alias': 'cajun', 'title': 'Cajun/Creole'}, ...","{'latitude': 40.67421, 'longitude': -73.96324}",(347) 533-7110,3158.955607,swKXaURwqdSrSTcpHsxdbA,https://s3-media4.fl.yelpcdn.com/bphoto/oJbAhL...,False,"{'address1': '794 Washington Ave', 'address2':...",Lowerline,13475340000.0,$$,4.5,118,[],https://www.yelp.com/biz/lowerline-brooklyn-2?...


## Preliminary Exploratory Data Analysis
- [All]

## Cleaning the Data
- [Mahdi] one person for consistency

## Feature Engineering
- [All]

## Exploratory Data Analysis
- [Mahdi] killer graphs and visuals

## Model Preparation

## Model Selection
- [Hadi] Exploring models
- [Eddie] Exploring models

Maybe split on which models you 2 want to try out

## Model Evaluation
- [Mahdi] killer graphs and visuals

## Conclusions and Recommendations
- [All]

## Source Documentation