PREDICTION OF SOCIAL DISTANCING FRIENDLY RESTAURANTS IN NEW YORK CITY

The below project is to extract some of the most social distancing friendly restaurants in new york city. The criteria for segregation such as Outdoor Seating, Parking and Reservations are entirely based off of CDC's considerations for restaurants and bars. This system purely works on data segregation, listing and visualisation

DATA COLLECTION

Importing all necessary packages.
We use pandas for loading and cleaning
data.
We use json to extract data from json files on different websites .
We use requests to scrape datasets from various sources to contribute to better segregation of data.
We use folium to plot the locations of datapoints at the end.

In [70]:
import numpy as np
import pandas as pd
import json
import requests
from pandas.io.json import json_normalize
import matplotlib.cm as cm
import matplotlib.colors as colors
import sklearn.cluster as Kmeans
import folium
import lxml.html as lh


We extract a newyork json file with the list of all the boroughs and the neighbourhoods in them as well as their latitude and longitude. This json file is converted to a dataframe with the name of the boroughs, neighbourhoods, Latitude and Longitude

In [71]:
!wget -q -O 'newyork_data.json' https://cocl.us/new_york_dataset
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)
newyork_data

{'bbox': [-74.2492599487305,
  40.5033187866211,
  -73.7061614990234,
  40.9105606079102],
 'crs': {'properties': {'name': 'urn:ogc:def:crs:EPSG::4326'}, 'type': 'name'},
 'features': [{'geometry': {'coordinates': [-73.84720052054902,
     40.89470517661],
    'type': 'Point'},
   'geometry_name': 'geom',
   'id': 'nyu_2451_34572.1',
   'properties': {'annoangle': 0.0,
    'annoline1': 'Wakefield',
    'annoline2': None,
    'annoline3': None,
    'bbox': [-73.84720052054902,
     40.89470517661,
     -73.84720052054902,
     40.89470517661],
    'borough': 'Bronx',
    'name': 'Wakefield',
    'stacked': 1},
   'type': 'Feature'},
  {'geometry': {'coordinates': [-73.82993910812398, 40.87429419303012],
    'type': 'Point'},
   'geometry_name': 'geom',
   'id': 'nyu_2451_34572.2',
   'properties': {'annoangle': 0.0,
    'annoline1': 'Co-op',
    'annoline2': 'City',
    'annoline3': None,
    'bbox': [-73.82993910812398,
     40.87429419303012,
     -73.82993910812398,
     40.874294193

In [72]:
neighborhood_data=newyork_data["features"]
column_name = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 
neighborhoods = pd.DataFrame(columns=column_name)
for data in neighborhood_data:
    borough = neighborhood_name = data['properties']['borough'] 
    neighborhood_name = data['properties']['name']
        
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]
    
    neighborhoods = neighborhoods.append({'Borough': borough,
                                          'Neighborhood': neighborhood_name,
                                          'Latitude': neighborhood_lat,
                                          'Longitude': neighborhood_lon}, ignore_index=True)
neighborhoods

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Bronx,Wakefield,40.894705,-73.847201
1,Bronx,Co-op City,40.874294,-73.829939
2,Bronx,Eastchester,40.887556,-73.827806
3,Bronx,Fieldston,40.895437,-73.905643
4,Bronx,Riverdale,40.890834,-73.912585
...,...,...,...,...
301,Manhattan,Hudson Yards,40.756658,-74.000111
302,Queens,Hammels,40.587338,-73.805530
303,Queens,Bayswater,40.611322,-73.765968
304,Queens,Queensbridge,40.756091,-73.945631


We extract the current number of active covid cases per borough. This is done to ensure that if the the number of cases is sufficiently high to ensure further spread, it pushes the user to opt for delivery options and shows the delivery provider 

In [73]:
url='https://raw.githubusercontent.com/nychealth/coronavirus-data/master/boro.csv'
active_covid_cases = pd.read_csv(url,sep=",") 
active_covid_cases.drop(columns="COVID_CASE_RATE",inplace=True)
active_covid_cases

Unnamed: 0,BOROUGH_GROUP,COVID_CASE_COUNT
0,The Bronx,43252
1,Brooklyn,51931
2,Manhattan,23620
3,Queens,58574
4,Staten Island,12937
5,Citywide,190408


The area factor is included for normalization. The need for normalization owes to the fact that if the area is low but theres a low number of cases that could still imply the spread due to clustered spaces

In [74]:
url_area='https://raw.githubusercontent.com/aanchal-n/Coursera_Capstone/master/data/new_york_borough_area%20-%20Sheet1.csv'
borough_area = pd.read_csv(url_area,sep=",") 
borough_area

Unnamed: 0,Borough,Area
0,Manhattan,59.1
1,Staten Island,152.0
2,Queens,280.0
3,The Bronx,110.0
4,Brooklyn,180.0


USING FOURSQUARE API TO EXTRACT DETAILS OF A VENUE

In [75]:
CLIENT_ID = 'FA3TOUPAVF3KQ34PCWYHT3CP0HFHQC0VUXKVSZ2XP1JQDPUQ' 
CLIENT_SECRET = '5B1AFQPGBFKVEFAWX1SRK4VEMYABJ5F1NWN1IU5KOFDUC3TM' 
VERSION = '20180605' 

We prompt the user for an input to locate the neighbourhood. We use the inputted neighbourhood to extract the latitude and longitude of the neighbourhood as well as the borough

In [76]:
cur_neighbourhood=input("Enter the neighbourhood you're in:")
cur_lat=neighborhoods["Latitude"].loc[neighborhoods["Neighborhood"].str.lower()== cur_neighbourhood.lower()]
cur_lat=list(cur_lat)[0]
cur_long=neighborhoods["Longitude"].loc[neighborhoods["Neighborhood"].str.lower()== cur_neighbourhood.lower()]
cur_long=list(cur_long)[0]
borough_in=list(neighborhoods["Borough"].loc[neighborhoods["Neighborhood"].str.lower()== cur_neighbourhood.lower()])[0]
print("The latitude and longitudes of the neighborhood are:",cur_lat,cur_long)
print("The borough is:",borough_in)

Enter the neighbourhood you're in:Soho
The latitude and longitudes of the neighborhood are: 40.72218384131794 -74.00065666959759
The borough is: Manhattan


We make a foursquare API call to extract only the restaurants in the vicinity of the neighbourhood and not just random locations. The reason the limit is set at 25 is due to the fact that the details endpoint is a premium call and we can only make 50 calls per day 

In [77]:
limit=25
radius=500
restid="4d4b7105d754a06374d81259"
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&intent=browse&v={}&ll={},{}&radius={}&limit={}&categoryId={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    cur_lat, 
    cur_long, 
    radius, 
    limit,
    restid)
url

'https://api.foursquare.com/v2/venues/explore?client_id=FA3TOUPAVF3KQ34PCWYHT3CP0HFHQC0VUXKVSZ2XP1JQDPUQ&client_secret=5B1AFQPGBFKVEFAWX1SRK4VEMYABJ5F1NWN1IU5KOFDUC3TM&intent=browse&v=20180605&ll=40.72218384131794,-74.00065666959759&radius=500&limit=25&categoryId=4d4b7105d754a06374d81259'

We extract our results in the form of a json. The meta code being 200 implies that the data was successfully extracted. The json contains details of the top 15 restaurants in the vicinity. It contains details such as their name, address, contact no, delivery options if any, reasons for visiting and category of the restaurant. We are only concerned with the id, name, category, latitude and longitude, Address, and delivery options

In [78]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5f130ea1a764ea022499bc3c'},
 'response': {'groups': [{'items': [{'reasons': {'count': 0,
       'items': [{'reasonName': 'globalInteractionReason',
         'summary': 'This spot is popular',
         'type': 'general'}]},
      'referralId': 'e-0-59dfe90ed69ed038f9316118-0',
      'venue': {'categories': [{'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/vegetarian_',
          'suffix': '.png'},
         'id': '4bf58dd8d48988d1d3941735',
         'name': 'Vegetarian / Vegan Restaurant',
         'pluralName': 'Vegetarian / Vegan Restaurants',
         'primary': True,
         'shortName': 'Vegetarian / Vegan'}],
       'delivery': {'id': '1391556',
        'provider': {'icon': {'name': '/delivery_provider_seamless_20180129.png',
          'prefix': 'https://fastly.4sqi.net/img/general/cap/',
          'sizes': [40, 50]},
         'name': 'seamless'},
        'url': 'https://www.seamless.com/menu/le-botaniste-127-grand-st-new-york/1

We extract the necessary details such as id, name, address,coordinates, category and delivery provider. We append this to a dictionary of lists.

In [79]:
columns_rest=["id","name","category","address","Latitude","Longitude","delivery","Parking","Seating","reservations","Menu","Rating","Phone number","Price"]
restaurants_df=pd.DataFrame(columns=columns_rest)
response=results["response"]
dicttest=response["groups"][0]
itemslist=dicttest["items"]
dict_details={"id":[],"name":[],"category":[],"address":[],"Latitude":[],"Longitude":[],"delivery":[],"Parking":[],"Seating":[],"reservations":[],"Menu":[],"Rating":[],"Phone number":[],"Price":[]}
for entry in itemslist:
  venue_dict=entry["venue"]
  dict_details["name"].append(venue_dict["name"])
  dict_details["id"].append(venue_dict["id"])
  if "address" in venue_dict["location"].keys():
    dict_details["address"].append(venue_dict["location"]["address"])
  else:
    dict_details["address"].append(cur_neighbourhood+",New York")
  dict_details["category"].append(venue_dict["categories"][0]["name"])
  if "delivery" in venue_dict.keys():
    dict_details["delivery"].append(venue_dict["delivery"]["provider"]["name"])
  else:
    dict_details["delivery"].append("No option")
  if "location" in venue_dict.keys():
    if "lat" in venue_dict["location"].keys():
      dict_details["Latitude"].append(venue_dict["location"]["lat"])
    else:
      dict_details["Latitude"].append(cur_lat)
    if "lng" in venue_dict["location"].keys():
      dict_details["Longitude"].append(venue_dict["location"]["lng"])
    else:
      dict_details["Longitude"].append(venue_dict["location"]["lng"])
  

We extract all the id's of the collected restaurants and use it to make calls so as to extract all the details of the particular restaurant. Here we extract their contact details, the price range,parking option, seating option and their menu options

In [80]:
restaurant_id=dict_details["id"]
c=0
incomplete=[]
for i in restaurant_id:
  url_desc="https://api.foursquare.com/v2/venues/{}?client_id={}&client_secret={}&v={}".format(
    i,
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
  )
  desc_results = requests.get(url_desc).json()
  responses=desc_results["response"]["venue"]
  dict_details["Rating"].append(responses["rating"])
  if "formattedPhone" in responses["contact"].keys():
    dict_details["Phone number"].append(responses["contact"]["formattedPhone"])
  elif "phone" in responses["contact"].keys():
    dict_details["Phone number"].append(responses["contact"]["phone"])
  else:
    dict_details["Phone number"].append("No contact")
  grp=responses["attributes"]["groups"]
  price_value=""
  Serves="Same as category"
  seating="indoor"
  Parking="No"
  reservations="No"
  for entry in grp:
    if entry["type"]=="price":
      price_value=entry["items"][0]["displayValue"]
    elif entry["type"]=="serves":
      Serves=entry["summary"]
    elif entry["type"]==seating:
      if entry["items"][0]["displayValue"]=="Yes":
        seating="Outdoor"
      else:
        seating="Indoor"
    elif entry["type"]=="parking":
      Parking=entry["items"][0]["displayValue"]
    elif entry["type"]=="reservations":
      reservations=entry["items"][0]["displayValue"]
  dict_details["Price"].append(price_value)
  dict_details["Menu"].append(Serves)
  dict_details["Seating"].append(seating)
  dict_details["Parking"].append(Parking)
  dict_details["reservations"].append(reservations)


The social distancing rating is calculated based off of the CDC article about restaurants. If there are delivery options, the rating gets added 1. If there is street parking, the rating getd added 1. If the are reservation option, It's a plus one. If there are outdoor seating options it's a plus one. The rating is currently out of 4

In [82]:
dict_details["Social distancing rating"]=[]
for i in range(0,25):
  cur_del=dict_details["delivery"][i]
  cur_parking=dict_details["Parking"][i]
  cur_seating=dict_details["Seating"][i]
  cur_res=dict_details["reservations"][i]
  cur_rating=0
  if cur_del != "No option":
    cur_rating+=1
  if cur_parking=="Street":
    cur_rating+=1
  if cur_res == "Yes":
    cur_rating+=1
  if cur_seating != "indoor":
    cur_rating+=1
  dict_details["Social distancing rating"].append(cur_rating)

In [81]:
dict_details

{'Latitude': [40.720544695755194,
  40.72141826939049,
  40.72114442373172,
  40.723565299001635,
  40.72199,
  40.721637542248196,
  40.723515651165286,
  40.72160778167724,
  40.722022,
  40.72159769818963,
  40.72342550635,
  40.72160721760932,
  40.721989915270875,
  40.72253082306549,
  40.7211844,
  40.7248,
  40.71911370398213,
  40.72456567801656,
  40.72192791798716,
  40.72277,
  40.72021170000001,
  40.724703,
  40.721251,
  40.72468796609349,
  40.721936090419106],
 'Longitude': [-74.00013783329237,
  -74.001343345271,
  -73.99796266127038,
  -74.00277252633407,
  -73.99779,
  -73.99747002195814,
  -74.00344360363695,
  -73.99724900722504,
  -73.997528,
  -73.99747136388419,
  -74.00327713154016,
  -74.0012348067045,
  -73.99734249611029,
  -73.99704794955281,
  -73.9971105,
  -74.00222,
  -74.00020174355126,
  -74.00287302105048,
  -73.99651188343245,
  -73.9972,
  -74.00213080000002,
  -73.9983278,
  -73.997292,
  -73.99867503501252,
  -73.99627935019758],
 'Menu': ['Dinn

DATA DISPLAY AND VISUALISATION

We convert the aforementioned dictionary into a dataframe and use this to extract the most safe restaurants

In [97]:
restaurant_data=pd.DataFrame.from_dict(dict_details,orient="index").transpose()
restaurant_data=restaurant_data.head(25)
restaurant_data

Unnamed: 0,id,name,category,address,Latitude,Longitude,delivery,Parking,Seating,reservations,Menu,Rating,Phone number,Price,Social distancing rating
0,59dfe90ed69ed038f9316118,Le Botaniste,Vegetarian / Vegan Restaurant,127 Grand St,40.7205,-74.0001,seamless,No,indoor,No,"Dinner, Lunch & more",8.8,No contact,$$,1
1,4a05ea5cf964a5209b721fe3,Antique Garage,Mediterranean Restaurant,41 Mercer St,40.7214,-74.0013,No option,Yes,indoor,No,"Happy Hour, Brunch & more",8.5,(212) 219-1019,$$$,0
2,45697387f964a520e53d1fe3,Despaña,Spanish Restaurant,408 Broome St,40.7211,-73.998,No option,No,indoor,No,"Dinner, Lunch & more",8.9,(212) 219-5050,$$,0
3,40f5c900f964a520a30a1fe3,Cipriani Downtown,Italian Restaurant,376 W Broadway,40.7236,-74.0028,seamless,No,indoor,No,"Happy Hour, Dinner & more",8.6,(212) 343-0999,$$$,1
4,4c7d4f1b8da18cfa1afc9ece,Osteria Morini,Italian Restaurant,218 Lafayette St,40.722,-73.9978,No option,No,indoor,No,"Happy Hour, Brunch & more",8.7,(212) 965-8777,$$$,0
5,55ea9f4d498ed46db0383483,Champion Pizza,Pizza Place,17 Cleveland Pl,40.7216,-73.9975,seamless,No,indoor,No,Lunch & Dinner,9.0,No contact,$,1
6,53ab0af5498e13bffddb8d96,Pi Greek Bakerie,Bakery,512 Broome St,40.7235,-74.0034,seamless,Private Lot,indoor,Yes,"Dessert, Lunch & more",8.9,(212) 226-2701,$$,2
7,5b9ef2d9f8cbd4002c8c7c1e,19 Cleveland,Mediterranean Restaurant,19 Cleveland Pl,40.7216,-73.9972,seamless,No,indoor,No,"Dinner, Lunch & more",9.1,(646) 823-9227,$$,1
8,4f0f47650cd695a0e54cb438,Jack's Wife Freda,Mediterranean Restaurant,224 Lafayette St,40.722,-73.9975,No option,Street,indoor,No,"Dessert, Lunch & more",8.6,(212) 510-8550,$$,1
9,431e2d80f964a52079271fe3,La Esquina,Mexican Restaurant,114 Kenmare St,40.7216,-73.9975,No option,Street,indoor,No,"Dinner, Lunch & more",8.7,(646) 613-7100,$$$,1


We extract the number of active cases and area of the borough as a benchmarking option. If the value of cases/area is high enough to ensure further spread, it displays a menu of all delivery options and prompts the user to pick one of them. If it's safe enough to eat outside, It lists those restaurants which abide by the considerations for restaurants and bars as mentioned by CDC

In [84]:
borough_cases=list(active_covid_cases["COVID_CASE_COUNT"].loc[active_covid_cases["BOROUGH_GROUP"].str.lower()==borough_in.lower()])[0]
borough_area_cur=list(borough_area["Area"].loc[borough_area["Borough"].str.lower()==borough_in.lower()])[0]
cases_by_area=borough_cases/borough_area_cur
cases_by_area

399.6615905245347

In [98]:
if cases_by_area<10:
  print("Kindly opt for delivery options")
  restaurant_data=restaurant_data.loc[restaurant_data["delivery"]!="No option"]
else:
  restaurant_data=restaurant_data[(restaurant_data["Parking"]=="Street") | (restaurant_data["Seating"]!="indoor") | (restaurant_data["reservations"]=="Yes")]


We display the details of all those restaurants which have passed through the filter that the user can pick from and sort on basis of the social distancing rating

In [99]:
restaurant_data=restaurant_data.sort_values('Social distancing rating',ascending=False)
restaurant_data=restaurant_data.reset_index(drop=True)
restaurant_data

Unnamed: 0,id,name,category,address,Latitude,Longitude,delivery,Parking,Seating,reservations,Menu,Rating,Phone number,Price,Social distancing rating
0,53ab0af5498e13bffddb8d96,Pi Greek Bakerie,Bakery,512 Broome St,40.7235,-74.0034,seamless,Private Lot,indoor,Yes,"Dessert, Lunch & more",8.9,(212) 226-2701,$$,2
1,46ff98a7f964a520234b1fe3,Lure Fishbar,Seafood Restaurant,142 Mercer St,40.7247,-73.9983,seamless,Street,indoor,No,"Happy Hour, Brunch & more",8.8,(212) 431-7676,$$$,2
2,4f0f47650cd695a0e54cb438,Jack's Wife Freda,Mediterranean Restaurant,224 Lafayette St,40.722,-73.9975,No option,Street,indoor,No,"Dessert, Lunch & more",8.6,(212) 510-8550,$$,1
3,431e2d80f964a52079271fe3,La Esquina,Mexican Restaurant,114 Kenmare St,40.7216,-73.9975,No option,Street,indoor,No,"Dinner, Lunch & more",8.7,(646) 613-7100,$$$,1
4,51de06aa498e998d374ab4da,Hirohisa,Japanese Restaurant,73 Thompson St,40.7246,-74.0029,No option,No,indoor,Yes,"Dinner, Lunch & more",8.8,(212) 925-1613,$$$,1
5,5a382327dee7707dd4842ab8,La Mercerie,French Restaurant,53 Howard St,40.7202,-74.0021,No option,No,indoor,Yes,"Dinner, Lunch & more",8.0,(212) 852-9097,$$$,1
6,4f3046da7beb0cfa14dcac59,Taïm Falafel and Smoothie Bar,Falafel Restaurant,45 Spring St,40.7219,-73.9963,No option,Street,indoor,No,Lunch & Dinner,9.1,(212) 219-0600,$,1


In [95]:
first_best=restaurant_data.iloc[0]
if len(restaurant_data["name"])>1:
  second_best=restaurant_data.iloc[1]
  if len(restaurant_data["name"])>2:
    third_best=restaurant_data.iloc[2]
    if len(restaurant_data["name"])>3:
      restaurant_data_backup=restaurant_data[3:]

We plot a graph of all those locations which have passed through the filter on a map generated by folium so as to ensure the user can triangulate the best on according to them. The red circle shows the generic neighborhood position. The dark green circle is the best option. The light green circle is second best . The orange circle is the third best option. The pink circle denotes the rest of left over options

In [101]:
venues_map = folium.Map(location=[cur_lat, cur_long], zoom_start=16) 

folium.CircleMarker([cur_lat, cur_long],radius=10,color='red',fill = True,fill_color = 'red',fill_opacity = 0.6).add_to(venues_map)

folium.CircleMarker([first_best["Latitude"], first_best["Longitude"]],radius=10,color='green',popup=first_best["name"],fill = True,fill_color = 'green',fill_opacity = 0.6).add_to(venues_map)

folium.CircleMarker([second_best["Latitude"],second_best["Longitude"]],radius=10,color='lightgreen',popup=second_best["name"],fill = True,fill_color = 'lightgreen',fill_opacity = 0.6).add_to(venues_map)

folium.CircleMarker([third_best["Latitude"], third_best["Longitude"]],radius=10,color='orange',popup=third_best["name"],fill = True,fill_color = 'orange',fill_opacity = 0.6).add_to(venues_map)

for lat, lng, name in zip(restaurant_data_backup.Latitude, restaurant_data_backup.Longitude, restaurant_data_backup.name,):
    folium.CircleMarker([lat, lng],radius=7,color='pink',popup=name,fill = True,fill_color='pink',fill_opacity=0.6).add_to(venues_map)

venues_map