# Data:
The Foursquare data is vast and rich: it contains millions of reviews across different business types (e.g., restaurants and dry cleaners), and each business type has its own set of attributes. Each review consists, at minimum, of review text and a star rating. Because of this size and richness, our first challenge was deciding which data to include in our modeling and which features to engineer. In addition, not every user has written several reviews, or reviews with different ratings, so we also needed to decide which data to use for training and which for testing. To this end, we performed Exploratory Data Analysis (EDA), looking specifically at the relevant business attributes, user attributes, and reviews. We reasoned that these are the core information needed to link users and businesses with their preferences; hence, we disregarded data such as check-ins, photos, and tips.
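
As a rough illustration of the training/testing concern described above, the following sketch picks out users who have written several reviews with more than one distinct rating. The record layout (`user_id`, `stars`), the threshold `min_reviews`, and the toy data are assumptions made for this example, not fields or values confirmed by the data.

```python
from collections import defaultdict

def eligible_users(reviews, min_reviews=3):
    """Return users with at least `min_reviews` reviews and more than one distinct rating.

    `reviews` is an iterable of dicts with the hypothetical keys 'user_id' and 'stars'.
    """
    ratings_by_user = defaultdict(list)
    for review in reviews:
        ratings_by_user[review["user_id"]].append(review["stars"])

    return sorted(
        user
        for user, ratings in ratings_by_user.items()
        if len(ratings) >= min_reviews and len(set(ratings)) > 1
    )

# toy data for illustration only
sample_reviews = [
    {"user_id": "u1", "stars": 5},
    {"user_id": "u1", "stars": 3},
    {"user_id": "u1", "stars": 4},
    {"user_id": "u2", "stars": 5},
    {"user_id": "u3", "stars": 2},
    {"user_id": "u3", "stars": 2},
    {"user_id": "u3", "stars": 2},
]
print(eligible_users(sample_reviews))
```

Only `u1` qualifies here: `u2` has a single review, and `u3` always gives the same rating.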

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # installs geopy; comment out this line if it is already installed
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
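# transform a JSON file into a pandas dataframe
try:
    from pandas import json_normalize # pandas 1.0 and later
except ImportError:
    from pandas.io.json import json_normalize # older pandas versions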

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes # installs folium; comment out this line if it is already installed
import folium # map rendering library

print('Libraries imported.')

Fetching package metadata .............
Solving package specifications: .

# All requested packages already installed.
# packages in environment at /opt/conda/envs/DSX-Python35:
#
geopy                     1.18.1                     py_0    conda-forge
Fetching package metadata .............
Solving package specifications: .

# All requested packages already installed.
# packages in environment at /opt/conda/envs/DSX-Python35:
#
folium                    0.5.0                      py_0    conda-forge
Libraries imported.


## Search for a specific venue category
> `https://api.foursquare.com/v2/venues/`**search**`?client_id=`**CLIENT_ID**`&client_secret=`**CLIENT_SECRET**`&ll=`**LATITUDE**`,`**LONGITUDE**`&v=`**VERSION**`&query=`**QUERY**

In [3]:
address = 'New York City, NY'

# Nominatim expects each application to identify itself; the agent name below is an arbitrary placeholder
geolocator = Nominatim(user_agent='foursquare_explorer')
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of New York City are {}, {}.'.format(latitude, longitude))

The geographical coordinates of New York City are 40.7308619, -73.9871558.
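
`geocode` returns `None` when it cannot resolve an address, in which case reading `location.latitude` raises an `AttributeError`. A minimal sketch of a guard, using a stand-in object instead of a live geocoder call; the helper name `coordinates_or_default` is hypothetical.

```python
from types import SimpleNamespace

def coordinates_or_default(location, default):
    """Return (latitude, longitude) from a geocoding result, or `default` if nothing was found."""
    if location is None:
        return default
    return (location.latitude, location.longitude)

# stand-in for the object returned by geolocator.geocode(address)
found = SimpleNamespace(latitude=40.7308619, longitude=-73.9871558)
fallback = (40.7308619, -73.9871558)  # New York City coordinates reused as a fallback

print(coordinates_or_default(found, fallback))
print(coordinates_or_default(None, fallback))
```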


In [53]:
latitude = 40.7308619
longitude = -73.9871558
CLIENT_ID = 'QMAJMFICCDBVSWUHPSBBALBKYGVWJQOAEF11DA4BNOBZ23XO' # your Foursquare ID
CLIENT_SECRET = '3JN54DEWNJNVNIPYUV0K4FUSIK02OJMB0S4CFMZEYRIIFKMV' # your Foursquare Secret
VERSION = '20180604'
search_query = 'Restaurant'
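
Hard-coding the client ID and secret in a notebook exposes them to anyone who can read it. One common alternative, sketched here, is to read them from environment variables. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `load_credentials` are assumptions chosen for this example.

```python
import os

def load_credentials(env=None):
    """Read the Foursquare credentials from a mapping of environment variables."""
    env = os.environ if env is None else env
    try:
        return env["FOURSQUARE_CLIENT_ID"], env["FOURSQUARE_CLIENT_SECRET"]
    except KeyError as missing:
        raise RuntimeError("missing environment variable: {}".format(missing)) from None

# demonstration with an explicit mapping instead of the real environment
demo_env = {"FOURSQUARE_CLIENT_ID": "my-id", "FOURSQUARE_CLIENT_SECRET": "my-secret"}
client_id, client_secret = load_credentials(demo_env)
print(client_id)
```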

In [54]:
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION,search_query)
url

'https://api.foursquare.com/v2/venues/search?client_id=QMAJMFICCDBVSWUHPSBBALBKYGVWJQOAEF11DA4BNOBZ23XO&client_secret=3JN54DEWNJNVNIPYUV0K4FUSIK02OJMB0S4CFMZEYRIIFKMV&ll=40.7308619,-73.9871558&v=20180604&query=Restaurant'
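
String formatting works for a simple query such as `Restaurant`, but a query containing spaces or special characters has to be percent-encoded. The sketch below builds the same URL with the standard library's `urlencode`; the function name `build_search_url` is hypothetical, and the credentials in the example call are placeholders.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://api.foursquare.com/v2/venues/search"

def build_search_url(client_id, client_secret, lat, lng, version, query):
    """Assemble the venue search URL, percent-encoding every parameter value."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "ll": "{},{}".format(lat, lng),
        "v": version,
        "query": query,
    }
    # keep the comma in `ll` readable; everything else is encoded as needed
    return SEARCH_ENDPOINT + "?" + urlencode(params, safe=",")

print(build_search_url("YOUR_ID", "YOUR_SECRET", 40.7308619, -73.9871558,
                       "20180604", "Italian Restaurant"))
```

Passing the same dictionary to `requests.get` through its `params` argument is another way to let the library handle the encoding.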

In [63]:
results = requests.get(url).json()
# assign relevant part of JSON to venues
venues = results['response']['venues']

# transform venues into a dataframe
dataframe = json_normalize(venues)
dataframe.shape

(30, 25)
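
The cell above assumes the request succeeded. As a hedged sketch, the first helper below checks the `meta.code` field that Foursquare v2 responses carry before reading `response.venues`, and the second shows in plain Python the kind of dotted-key flattening that `json_normalize` applies to nested dictionaries. Both helper names and the sample payload are made up for illustration.

```python
def extract_venues(payload):
    """Return the venue list from a search response, raising if the API reported an error."""
    code = payload.get("meta", {}).get("code")
    if code != 200:
        raise ValueError("Foursquare request failed with code {}".format(code))
    return payload["response"]["venues"]

def flatten(record, prefix=""):
    """Flatten nested dicts into a single dict with dotted keys, e.g. 'location.lat'."""
    flat = {}
    for key, value in record.items():
        name = prefix + key
        if isinstance(value, dict):
            flat.update(flatten(value, name + "."))
        else:
            flat[name] = value
    return flat

# made-up payload shaped like a search response
sample = {
    "meta": {"code": 200},
    "response": {"venues": [{"id": "abc", "name": "Example Diner",
                             "location": {"lat": 40.73, "lng": -74.0}}]},
}
rows = [flatten(venue) for venue in extract_venues(sample)]
print(rows[0])
```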

In [56]:
# keep only columns that include venue name, and anything that is associated with location
filtered_columns = ['name', 'categories'] + [col for col in dataframe.columns if col.startswith('location.')] + ['id']
dataframe_filtered = dataframe.loc[:, filtered_columns]
dataframe_filtered

Unnamed: 0,name,categories,location.address,location.cc,location.city,location.country,location.crossStreet,location.distance,location.formattedAddress,location.labeledLatLngs,location.lat,location.lng,location.neighborhood,location.postalCode,location.state,id
0,Waverly Restaurant,"[{'name': 'Diner', 'pluralName': 'Diners', 'ic...",385 Avenue of the Americas,US,New York,United States,at Waverly Pl.,1119,"[385 Avenue of the Americas (at Waverly Pl.), ...","[{'lat': 40.73301202815689, 'lng': -74.0001244...",40.733012,-74.000124,,10014,NY,43bfd385f964a520232d1fe3
1,Rolf's German Restaurant,"[{'name': 'German Restaurant', 'pluralName': '...",281 3rd Ave,US,New York,United States,at E 22nd St,869,"[281 3rd Ave (at E 22nd St), New York, NY 1001...","[{'lat': 40.73821027147588, 'lng': -73.9836485...",40.73821,-73.983649,,10010,NY,3fd66200f964a5207ae51ee3
2,Clinton St. Baking Co. & Restaurant,"[{'name': 'Bakery', 'pluralName': 'Bakeries', ...",4 Clinton St,US,New York,United States,at E Houston St,1108,"[4 Clinton St (at E Houston St), New York, NY ...","[{'lat': 40.72122967701571, 'lng': -73.9838138...",40.72123,-73.983814,,10002,NY,40a55d80f964a52020f31ee3
3,Frank Restaurant,"[{'name': 'Italian Restaurant', 'pluralName': ...",88 2nd Ave,US,New York,United States,at E 5th St,460,"[88 2nd Ave (at E 5th St), New York, NY 10003,...","[{'lat': 40.7269388318875, 'lng': -73.98889878...",40.726939,-73.988899,,10003,NY,3fd66200f964a5204de41ee3
4,Ukrainian East Village Restaurant,"[{'name': 'Ukrainian Restaurant', 'pluralName'...",140 2nd Ave,US,New York,United States,btwn St. Marks Pl & E 9th St,210,"[140 2nd Ave (btwn St. Marks Pl & E 9th St), N...","[{'lat': 40.72896775386977, 'lng': -73.9870739...",40.728968,-73.987074,,10003,NY,3fd66200f964a520b7ea1ee3
5,Jing Fong Restaurant 金豐大酒樓,"[{'name': 'Dim Sum Restaurant', 'pluralName': ...",20 Elizabeth St,US,New York,United States,btwn Bayard & Canal St,1872,"[20 Elizabeth St (btwn Bayard & Canal St), New...",,40.715807,-73.997049,,10013,NY,3fd66200f964a520d5e31ee3
6,The NoMad Restaurant,"[{'name': 'American Restaurant', 'pluralName':...",1170 Broadway,US,New York,United States,inside NoMad Hotel,1586,"[1170 Broadway (inside NoMad Hotel), New York,...","[{'lat': 40.74507433709667, 'lng': -73.9885612...",40.745074,-73.988561,,10001,NY,4f6e6af3e4b0463c94b07375
7,Junior's Restaurant & Bakery,"[{'name': 'American Restaurant', 'pluralName':...",1515 Broadway,US,New York,United States,at W 45th St,3081,"[1515 Broadway (at W 45th St), New York, NY 10...","[{'lat': 40.758539, 'lng': -73.986477, 'label'...",40.758539,-73.986477,Theater District,10036,NY,462a6065f964a520d9451fe3
8,Panna II Garden Indian Restaurant,"[{'name': 'Indian Restaurant', 'pluralName': '...",93 1st Ave,US,New York,United States,btwn 5th & 6th St,516,"[93 1st Ave (btwn 5th & 6th St), New York, NY ...","[{'lat': 40.726272990045985, 'lng': -73.986273...",40.726273,-73.986273,,10003,NY,4116be80f964a520f90b1fe3
9,Sidewalk Bar & Restaurant,"[{'name': 'Café', 'pluralName': 'Cafés', 'icon...",94 Avenue A,US,New York,United States,at E 6th St,664,"[94 Avenue A (at E 6th St), New York, NY 10009...","[{'lat': 40.72547161917389, 'lng': -73.9837831...",40.725472,-73.983783,,10009,NY,3fd66200f964a5201fe51ee3


In [62]:
# keep only columns that include venue name, and anything that is associated with location
filtered_columns = ['name', 'categories'] + [col for col in dataframe.columns if col.startswith('location.')] + ['id']
dataframe_filtered = dataframe.loc[:, filtered_columns].copy() # explicit copy, so the column assignment below does not trigger a SettingWithCopyWarning

# function that extracts the category of the venue
def get_category_type(row):
    # fall back to 'venue.categories' when the row has no 'categories' key
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    # a venue may have no category at all
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)

# clean column names by keeping only last term
dataframe_filtered.columns = [column.split('.')[-1] for column in dataframe_filtered.columns]

dataframe_filtered.head()

Unnamed: 0,name,categories,address,cc,city,country,crossStreet,distance,formattedAddress,labeledLatLngs,lat,lng,neighborhood,postalCode,state,id
0,Waverly Restaurant,Diner,385 Avenue of the Americas,US,New York,United States,at Waverly Pl.,1119,"[385 Avenue of the Americas (at Waverly Pl.), ...","[{'lat': 40.73301202815689, 'lng': -74.0001244...",40.733012,-74.000124,,10014,NY,43bfd385f964a520232d1fe3
1,Rolf's German Restaurant,German Restaurant,281 3rd Ave,US,New York,United States,at E 22nd St,869,"[281 3rd Ave (at E 22nd St), New York, NY 1001...","[{'lat': 40.73821027147588, 'lng': -73.9836485...",40.73821,-73.983649,,10010,NY,3fd66200f964a5207ae51ee3
2,Clinton St. Baking Co. & Restaurant,Bakery,4 Clinton St,US,New York,United States,at E Houston St,1108,"[4 Clinton St (at E Houston St), New York, NY ...","[{'lat': 40.72122967701571, 'lng': -73.9838138...",40.72123,-73.983814,,10002,NY,40a55d80f964a52020f31ee3
3,Frank Restaurant,Italian Restaurant,88 2nd Ave,US,New York,United States,at E 5th St,460,"[88 2nd Ave (at E 5th St), New York, NY 10003,...","[{'lat': 40.7269388318875, 'lng': -73.98889878...",40.726939,-73.988899,,10003,NY,3fd66200f964a5204de41ee3
4,Ukrainian East Village Restaurant,Ukrainian Restaurant,140 2nd Ave,US,New York,United States,btwn St. Marks Pl & E 9th St,210,"[140 2nd Ave (btwn St. Marks Pl & E 9th St), N...","[{'lat': 40.72896775386977, 'lng': -73.9870739...",40.728968,-73.987074,,10003,NY,3fd66200f964a520b7ea1ee3
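
Keeping only the last term of each column name is convenient, but two different columns can end up with the same short name (for instance, a hypothetical `venue.id` next to `id`). A small sketch of a check that could run before renaming; `shorten_columns` is an illustrative helper, not part of pandas.

```python
from collections import Counter

def shorten_columns(columns):
    """Keep the last dotted term of each name, refusing to create duplicate names."""
    short = [column.split(".")[-1] for column in columns]
    duplicates = sorted(name for name, count in Counter(short).items() if count > 1)
    if duplicates:
        raise ValueError("renaming would create duplicate columns: {}".format(duplicates))
    return short

print(shorten_columns(["name", "categories", "location.lat", "location.lng", "id"]))
```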


In [58]:
dataframe_filtered.categories

0                       Diner
1           German Restaurant
2                      Bakery
3          Italian Restaurant
4        Ukrainian Restaurant
5          Dim Sum Restaurant
6         American Restaurant
7         American Restaurant
8           Indian Restaurant
9                        Café
10         Italian Restaurant
11         Italian Restaurant
12         Dim Sum Restaurant
13      Vietnamese Restaurant
14         Italian Restaurant
15         Italian Restaurant
16    New American Restaurant
17           Tapas Restaurant
18         Miscellaneous Shop
19                 Restaurant
20                      Diner
21       Kitchen Supply Store
22                        Pub
23          Paella Restaurant
24         Italian Restaurant
25                        Pub
26         Chinese Restaurant
27         Chinese Restaurant
28                  Irish Pub
29            Soba Restaurant
Name: categories, dtype: object
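
The listing shows that a query for `Restaurant` also returns venues such as a kitchen supply store. To summarize the result, the categories can be tallied. This sketch uses `collections.Counter` on the 30 values printed above; on the DataFrame itself, `dataframe_filtered.categories.value_counts()` should give an equivalent tally.

```python
from collections import Counter

# the 30 category labels from the output above, in order
categories = [
    "Diner", "German Restaurant", "Bakery", "Italian Restaurant",
    "Ukrainian Restaurant", "Dim Sum Restaurant", "American Restaurant",
    "American Restaurant", "Indian Restaurant", "Café",
    "Italian Restaurant", "Italian Restaurant", "Dim Sum Restaurant",
    "Vietnamese Restaurant", "Italian Restaurant", "Italian Restaurant",
    "New American Restaurant", "Tapas Restaurant", "Miscellaneous Shop",
    "Restaurant", "Diner", "Kitchen Supply Store", "Pub",
    "Paella Restaurant", "Italian Restaurant", "Pub",
    "Chinese Restaurant", "Chinese Restaurant", "Irish Pub",
    "Soba Restaurant",
]

counts = Counter(categories)

# list the categories from most to least frequent
for name, count in counts.most_common():
    print("{:<25} {}".format(name, count))
```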

In [61]:
venues_map = folium.Map(location=[latitude, longitude], zoom_start=13) # generate map centred on the New York City coordinates


# add every venue returned by the search as a blue circle marker, labelled with its category
for lat, lng, label in zip(dataframe_filtered.lat, dataframe_filtered.lng, dataframe_filtered.categories):
    folium.features.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=label,
        fill=True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(venues_map)

# display map
venues_map
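
`get_category_type` returns `None` for venues without a category, and a marker also needs valid coordinates. A hedged sketch of preparing the marker data before the loop above, skipping venues without coordinates and substituting a placeholder label; `marker_rows` and the `'Unknown'` label are assumptions for this example, and the sample values are illustrative.

```python
import math

def marker_rows(lats, lngs, labels, placeholder="Unknown"):
    """Yield (lat, lng, label) tuples that are safe to plot."""
    for lat, lng, label in zip(lats, lngs, labels):
        # skip venues whose coordinates are missing (None or NaN)
        if lat is None or lng is None or math.isnan(lat) or math.isnan(lng):
            continue
        yield (lat, lng, label if label is not None else placeholder)

rows = list(marker_rows(
    [40.733012, float("nan"), 40.72123],
    [-74.000124, -73.983649, -73.983814],
    ["Diner", "German Restaurant", None],
))
print(rows)
```

The finished map can also be written to a standalone file with `venues_map.save('venues_map.html')`.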