# Recategorize POIs

#### Primary Author
Chris Carey

#### Description:
This notebook reassigns the categories of points of interest (POIs) for use in experiments. It maps NAICS codes to a simplified set of categories, then pulls fast food restaurants (FFRs) and delis and convenience stores (DCS) into their own categories for nutrition-related analysis.

#### Limitations:
- Not all delis have "Deli" in their name
- Not all points of interest (POIs) with "Deli" in their name are delis
- Not all fast food restaurants (FFRs) are categorized as such
 
#### Inputs:
```
exports/poi.csv
```
 
#### Outputs:
```
exports/poi_health_recategorized.csv
```

In [1]:
import geopandas as gpd
import math
import matplotlib.pyplot as plt
import pandas as pd
from tqdm import tqdm

import warnings
warnings.filterwarnings('ignore')

In [2]:
def peek(df):
    """Show the first three rows of df and print its total row count."""
    display(df.iloc[0:3, :])
    print(len(df))

In [3]:
poi_df = pd.read_csv('./exports/poi.csv')
peek(poi_df)

Unnamed: 0,placekey,cbg,naics_code,category,sub_category,location_name,area_square_feet,latitude,longitude
0,222-222@627-s94-nwk,360470395002,445210,Supermarkets and Specialty Food Stores,Meat Markets,Broadway Meats,3177.0,40.691436,-73.924891
1,223-222@627-rw6-zfz,360050386008,445110,Supermarkets and Specialty Food Stores,Supermarkets and Other Grocery (except Conveni...,Foodtown,3401.0,40.87689,-73.847776
2,223-222@627-rwq-vcq,360050117001,445110,Supermarkets and Specialty Food Stores,Supermarkets and Other Grocery (except Conveni...,Kirsch Mushroom Company,10079.0,40.816779,-73.883401


36475


## Recategorize

In [4]:
poi_df.loc[poi_df['naics_code'].isin([445210, 445220, 445230, 445291, 445292, 445299]), 'category'] = 'Specialty Food Stores'
poi_df.loc[poi_df['naics_code'].isin([445110]), 'category'] = 'Supermarkets and Grocery Stores'
poi_df.loc[poi_df['naics_code'].isin([445120]), 'category'] = 'Convenience Stores'
poi_df.loc[poi_df['naics_code'].isin([452319, 453998]), 'category'] = 'General Merchandise Stores'

# We need to remove non-food locations, e.g. Macy's.
poi_df.loc[poi_df['naics_code'].isin([452210]), 'category'] = 'Big Box Grocers'
poi_df.loc[poi_df['naics_code'].isin([722511]), 'category'] = 'Full-Service Restaurants'
poi_df.loc[poi_df['naics_code'].isin([722513]), 'category'] = 'Limited-Service Restaurants'
poi_df.loc[poi_df['naics_code'].isin([722515, 311811]), 'category'] = 'Snacks and Bakeries'

# Why are caterers grouped with community food services?
poi_df.loc[poi_df['naics_code'].isin([624210, 722320]), 'category'] = 'Food Services'

poi_df.loc[poi_df['naics_code'].isin([446110, 446191]), 'category'] = 'Pharmacies and Drug Stores'
poi_df.loc[poi_df['naics_code'].isin([445310]), 'category'] = 'Beer, Wine, and Liquor Stores'
poi_df.loc[poi_df['naics_code'].isin([453991]), 'category'] = 'Tobacco Stores'
poi_df.loc[poi_df['naics_code'].isin([722410]), 'category'] = 'Drinking Places'
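
The chain of `.loc` assignments above can also be written as a single lookup. This is a minimal sketch, not the notebook's method: `NAICS_TO_CATEGORY` is a hypothetical dict holding only a few of the codes, and `demo_df` is a toy stand-in for `poi_df` with a made-up unmapped code.

```python
import pandas as pd

# Hypothetical lookup table; only a few of the codes from the cell above are shown.
NAICS_TO_CATEGORY = {
    445110: 'Supermarkets and Grocery Stores',
    445120: 'Convenience Stores',
    722511: 'Full-Service Restaurants',
    722513: 'Limited-Service Restaurants',
}

# Toy stand-in for poi_df; 999999 is a made-up code with no mapping.
demo_df = pd.DataFrame({
    'naics_code': [445110, 722513, 999999],
    'category': ['Supermarkets and Specialty Food Stores', 'Original B', 'Original C'],
})

# map() yields NaN for unmapped codes; fillna() keeps the original category for those rows.
demo_df['category'] = (
    demo_df['naics_code'].map(NAICS_TO_CATEGORY).fillna(demo_df['category'])
)
print(demo_df)
```

Keeping the codes in one dict makes it easier to see which NAICS codes share a category, and rows whose code is not listed keep their original category, as they do with the `.loc` version.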

## Pull out Fast Food Restaurants (FFRs)

In [5]:
FAST_FOOD_NAMES = {
    # https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4783380/
    "Burger King",
    "Chick-fil-A",
    "Dunkin'",
    "KFC",
    "McDonald's",
    "Pizza Hut",
    "Starbucks",
    "Subway",
    "Taco Bell",
    "Wendy's",
    # https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2446463/
    "Au Bon Pain",
    "Papa John's",
    "Popeyes Louisiana Kitchen",
    # https://dol.ny.gov/system/files/documents/2021/07/p716.pdf
    "Ben & Jerry's",
    "Chipotle Mexican Grill",
    "Golden Krust Caribbean Bakery and Grill",
    "Jamba",
    "Nathan's Famous",
    "Shake Shack",
    "Tim Hortons",
    "Uno Chicago Grill",
    "White Castle",
    # https://s27147.pcdn.co/wp-content/uploads/NELP-Fact-Sheet-Fast-Food-Employment-New-York.pdf
    "Auntie Anne's",
    "Baskin Robbins",
    "Carvel",
    "Domino's Pizza",
    "Little Caesars Pizza",
    "Panera Bread",
    # https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4967005/
    "Arby's",
}

In [6]:
condition_ffr = poi_df['location_name'].isin(FAST_FOOD_NAMES)
poi_df.loc[condition_ffr, 'category'] = 'Fast Food Restaurants'
poi_df.loc[condition_ffr, 'sub_category'] = 'Fast Food Restaurants'
peek(poi_df.loc[condition_ffr, :])

Unnamed: 0,placekey,cbg,naics_code,category,sub_category,location_name,area_square_feet,latitude,longitude
4787,225-222@627-s4m-75z,360610205003,722515,Fast Food Restaurants,Fast Food Restaurants,Starbucks,615.0,40.807108,-73.964925
4819,22p-222@627-s87-7qz,360810010001,722513,Fast Food Restaurants,Fast Food Restaurants,Subway,1197.0,40.692066,-73.861595
4820,22p-222@627-s9q-q2k,360810444001,722515,Fast Food Restaurants,Fast Food Restaurants,Baskin Robbins,1102.0,40.706206,-73.792172


2361
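
Because `isin` only matches names spelled exactly as in `FAST_FOOD_NAMES`, spelling variants go uncaught, which is one likely source of the limitation that not all FFRs are categorized as such. The sketch below compares exact matching with a looser, normalized match. The `normalize_name` helper and the variant spellings are hypothetical and are not part of the notebook or its data.

```python
import re

import pandas as pd

# Subset of the notebook's list, for illustration only.
FAST_FOOD_NAMES = {"McDonald's", "Dunkin'", "Subway"}


def normalize_name(name):
    """Lowercase a name and drop everything except letters and digits."""
    return re.sub(r'[^a-z0-9]', '', str(name).lower())


normalized_ffr = {normalize_name(n) for n in FAST_FOOD_NAMES}

# Hypothetical location names, including variant spellings.
names = pd.Series(["McDonald's", "MCDONALDS", "Subway ", "Dunkin' Donuts", "Sky Deli"])

exact = names.isin(FAST_FOOD_NAMES)
loose = names.map(normalize_name).isin(normalized_ffr)

print(pd.DataFrame({'name': names, 'exact': exact, 'loose': loose}))
```

Normalization recovers differences in case, punctuation, and whitespace, but not a different brand string such as "Dunkin' Donuts", so a looser match narrows the limitation without removing it.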


## Pull out Delis and combine with Convenience Stores

In [7]:
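# Treat any POI whose name contains the standalone word "Deli" (case-sensitive) as a deli.
# na=False makes rows with a missing name count as non-matches.
condition_dcs = (
    poi_df['location_name'].str.contains(r'\bDeli\b', na=False) |
    (poi_df['sub_category'] == 'Convenience Stores')
)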
poi_df.loc[condition_dcs, 'category'] = 'Delis and Convenience Stores'
poi_df.loc[condition_dcs, 'sub_category'] = 'Delis and Convenience Stores'
peek(poi_df.loc[condition_dcs, :])

Unnamed: 0,placekey,cbg,naics_code,category,sub_category,location_name,area_square_feet,latitude,longitude
15,22p-222@627-wgb-2x5,360470822001,445110,Delis and Convenience Stores,Delis and Convenience Stores,The Place Deli & Grocery,1823.0,40.655977,-73.953288
25,224-222@627-s7p-6hq,360810595004,445110,Delis and Convenience Stores,Delis and Convenience Stores,Ibrahim Deli,1479.0,40.714929,-73.911052
37,238-222@627-rzx-f75,360811129003,445110,Delis and Convenience Stores,Delis and Convenience Stores,Sky Deli,3908.0,40.762919,-73.77579


1886
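
To make the deli limitations concrete, the sketch below applies the same `\bDeli\b` word-boundary pattern to a handful of names. "Sky Deli" and "The Place Deli & Grocery" come from the output above; the other names are hypothetical examples chosen to show what the pattern does and does not catch.

```python
import pandas as pd

names = pd.Series([
    'Sky Deli',                  # from the notebook output
    'The Place Deli & Grocery',  # from the notebook output
    'Deli-Licious',              # hypothetical: hyphen counts as a word boundary
    'Delicious Bites',           # hypothetical: "Deli" is only a prefix here
    'Corner Delicatessen',       # hypothetical: a deli without the word "Deli"
    'sunrise deli',              # hypothetical: lowercase, and the match is case-sensitive
])

is_deli = names.str.contains(r'\bDeli\b')
print(pd.DataFrame({'name': names, 'is_deli': is_deli}))
```

Only the first three names match. The word boundary keeps "Delicious" out, but a delicatessen or a lowercase "deli" is missed.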


In [8]:
poi_df.to_csv('./exports/poi_health_recategorized.csv', index=False)
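
The `Unnamed: 0` column in the `peek` outputs is typically what pandas produces when a CSV that was written with its index is read back without `index_col`. In this notebook it appears to be carried through to the export as a regular column. A minimal sketch of that round trip, using an in-memory buffer and a toy `demo_df` (both are assumptions for illustration, not the notebook's files):

```python
import io

import pandas as pd

demo_df = pd.DataFrame({'location_name': ['Sky Deli', 'Foodtown']})

# Written with the default index=True, the index becomes an unnamed first column.
with_index = io.StringIO()
demo_df.to_csv(with_index)
with_index.seek(0)
cols_with_index = pd.read_csv(with_index).columns.tolist()

# Written without the index, the columns round-trip unchanged.
without_index = io.StringIO()
demo_df.to_csv(without_index, index=False)
without_index.seek(0)
cols_without_index = pd.read_csv(without_index).columns.tolist()

print(cols_with_index)
print(cols_without_index)
```

If the extra column is unwanted downstream, it could be avoided where `exports/poi.csv` is written, or dropped after reading it here.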