# Prepare Data for Analysis

In this section, I describe in detail how to call the Foursquare API to obtain the necessary information. I show how to construct a request URL to search for a specific type of venue, to explore a particular venue, to explore a Foursquare user, to explore a geographical location, and to get reviews and tips for venues around a location. I also use the Folium visualization library to display the results on a map.


### Import necessary Libraries


In [1]:
!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values
import requests # library to handle requests
import pandas as pd # library for data analysis
import numpy as np # library to handle data in a vectorized manner
import random # library for random number generation

# libraries for displaying images
from IPython.display import Image 
from IPython.core.display import HTML 
    
# library for transforming a JSON file into a pandas dataframe
from pandas.io.json import json_normalize


!conda install -c conda-forge folium=0.5.0 --yes
import folium # plotting library

print('Folium installed')
print('Libraries imported.')

Solving environment: done

## Package Plan ##

  environment location: /home/jupyterlab/conda

  added / updated specs: 
    - geopy


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    openssl-1.0.2p             |       h470a237_1         3.1 MB  conda-forge
    certifi-2018.10.15         |        py36_1000         138 KB  conda-forge
    geopy-1.17.0               |             py_0          49 KB  conda-forge
    ca-certificates-2018.10.15 |       ha4d7672_0         135 KB  conda-forge
    conda-4.5.11               |        py36_1000         651 KB  conda-forge
    geographiclib-1.49         |             py_0          32 KB  conda-forge
    ------------------------------------------------------------
                                           Total:         4.1 MB

The following NEW packages will be INSTALLED:

    geographiclib:   1.49-py_0            conda-forge
    geopy:           

### Define Foursquare Credentials and Version

In [2]:
CLIENT_ID = '5ZSRVH5PGXRONTBPVIOU4SNATKX33ZXLS01IPAOE51LCD35T' # your Foursquare ID
CLIENT_SECRET = 'RSCJ5YVSEC4XH1OAVVSIS023SLPDOBAKRIBKJRGSSRSNAZLT' # your Foursquare Secret
VERSION = '20181023'
LIMIT = 30
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: 5ZSRVH5PGXRONTBPVIOU4SNATKX33ZXLS01IPAOE51LCD35T
CLIENT_SECRET: RSCJ5YVSEC4XH1OAVVSIS023SLPDOBAKRIBKJRGSSRSNAZLT
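
Hard-coding and printing credentials makes them easy to leak when a notebook is shared. As a minimal sketch of an alternative, the helper below reads them from an environment-style mapping and masks them before printing. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET`, the helpers `load_credentials` and `mask`, and the demo values are my own assumptions for illustration, not something Foursquare prescribes.

```python
import os


def load_credentials(env=os.environ):
    """Return (client_id, client_secret) from an environment-style mapping.

    The variable names are hypothetical; use whatever names you export.
    """
    try:
        return env['FOURSQUARE_CLIENT_ID'], env['FOURSQUARE_CLIENT_SECRET']
    except KeyError as missing:
        raise RuntimeError('Missing credential variable: {}'.format(missing)) from None


def mask(secret, visible=4):
    """Keep the first few characters and replace the rest with asterisks."""
    return secret[:visible] + '*' * max(len(secret) - visible, 0)


# Demo mapping with made-up values, so the sketch runs without real keys.
demo_env = {
    'FOURSQUARE_CLIENT_ID': 'EXAMPLEID1234',
    'FOURSQUARE_CLIENT_SECRET': 'EXAMPLESECRET5678',
}
client_id, client_secret = load_credentials(demo_env)
print('CLIENT_ID: ' + mask(client_id))
print('CLIENT_SECRET: ' + mask(client_secret))
```

In a real session I would call `load_credentials()` with no argument so that it reads from `os.environ`.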


#### Let's use Manhattan, New York, as the address around which to explore venues. Next, I convert the address to its latitude and longitude coordinates.

In [3]:
address = 'Manhattan, New York'

geolocator = Nominatim(user_agent='foursquare_agent') # placeholder user_agent; recent geopy releases require one
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(latitude, longitude)



40.7900869 -73.9598295
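
`geolocator.geocode` returns `None` when it finds no match, in which case `location.latitude` fails with an `AttributeError`. A small guard makes that case explicit. This is only a sketch: `extract_coordinates` is a helper name I made up, and `FakeLocation` is a stand-in for the geopy result so the example runs without a network call.

```python
from collections import namedtuple


def extract_coordinates(location, address):
    """Return (latitude, longitude), or raise if geocoding found nothing."""
    if location is None:
        raise ValueError('No geocoding result for {!r}'.format(address))
    return location.latitude, location.longitude


# Stand-in for the object returned by geolocator.geocode(address).
FakeLocation = namedtuple('FakeLocation', ['latitude', 'longitude'])

latitude, longitude = extract_coordinates(
    FakeLocation(40.7900869, -73.9598295), 'Manhattan, New York')
print(latitude, longitude)
```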


### Next, search for a specific venue category

#### Let's define a query to search for restaurants within 500 metres of the Manhattan coordinates, send the GET request, and examine the results

In [4]:
search_query = 'Restaurant'
radius = 500

url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius, LIMIT)

results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5bcee9981ed2194284eb9ddb'},
 'response': {'venues': [{'id': '4a897cb1f964a5201f0820e3',
    'name': '3 Guys Restaurant',
    'location': {'address': '49 E 96th St',
     'crossStreet': 'Madison Ave',
     'lat': 40.787442622504265,
     'lng': -73.95403610873488,
     'labeledLatLngs': [{'label': 'display',
       'lat': 40.787442622504265,
       'lng': -73.95403610873488}],
     'distance': 570,
     'postalCode': '10128',
     'cc': 'US',
     'city': 'New York',
     'state': 'NY',
     'country': 'United States',
     'formattedAddress': ['49 E 96th St (Madison Ave)',
      'New York, NY 10128',
      'United States']},
    'categories': [{'id': '4bf58dd8d48988d147941735',
      'name': 'Diner',
      'pluralName': 'Diners',
      'shortName': 'Diner',
      'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/diner_',
       'suffix': '.png'},
      'primary': True}],
    'delivery': {'id': '278300',
     'url': 'https://www.seamles
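
Building the URL with `str.format` works for a one-word query, but a query with spaces or special characters needs escaping. One possible approach, sketched below, is to collect the same parameters in a dictionary and let `urllib.parse.urlencode` do the escaping. The function name `build_search_url` and the placeholder credentials are my own; the endpoint and parameter names are the ones used in the cell above.

```python
from urllib.parse import urlencode

BASE_URL = 'https://api.foursquare.com/v2/venues/search'


def build_search_url(client_id, client_secret, latitude, longitude,
                     version, query, radius, limit):
    """Assemble the search URL with every parameter percent-encoded."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'll': '{},{}'.format(latitude, longitude),
        'v': version,
        'query': query,
        'radius': radius,
        'limit': limit,
    }
    return BASE_URL + '?' + urlencode(params)


# Placeholder credentials; the remaining values mirror the cell above.
url = build_search_url('MY_ID', 'MY_SECRET', 40.7900869, -73.9598295,
                       '20181023', 'Thai Restaurant', 500, 30)
print(url)
```

Alternatively, `requests.get(BASE_URL, params=params)` accepts the same dictionary and performs the encoding itself.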

#### Get the relevant part of the JSON and transform it into a *pandas* dataframe

In [5]:
venues = results['response']['venues']

dataframe = json_normalize(venues)
dataframe.head()

Unnamed: 0,categories,delivery.id,delivery.provider.icon.name,delivery.provider.icon.prefix,delivery.provider.icon.sizes,delivery.provider.name,delivery.url,hasPerk,id,location.address,...,location.crossStreet,location.distance,location.formattedAddress,location.labeledLatLngs,location.lat,location.lng,location.postalCode,location.state,name,referralId
0,"[{'id': '4bf58dd8d48988d147941735', 'name': 'D...",278300.0,/delivery_provider_seamless_20180129.png,https://igx.4sqi.net/img/general/cap/,"[40, 50]",seamless,https://www.seamless.com/menu/3-guys-1381-madi...,False,4a897cb1f964a5201f0820e3,49 E 96th St,...,Madison Ave,570,"[49 E 96th St (Madison Ave), New York, NY 1012...","[{'label': 'display', 'lat': 40.78744262250426...",40.787443,-73.954036,10128,NY,3 Guys Restaurant,v-1540286872
1,"[{'id': '4d4b7105d754a06374d81259', 'name': 'F...",,,,,,,False,4f32afdc19836c91c7efe4af,1410 Madison Ave,...,,551,"[1410 Madison Ave, New York, NY 10029, United ...","[{'label': 'display', 'lat': 40.788427, 'lng':...",40.788427,-73.953659,10029,NY,Hanratty's Restaurant,v-1540286872
2,"[{'id': '4d4b7105d754a06374d81259', 'name': 'F...",,,,,,,False,4f32456319836c91c7c72940,1398 Madison Ave,...,,573,"[1398 Madison Ave, New York, NY 10029, United ...","[{'label': 'display', 'lat': 40.787901, 'lng':...",40.787901,-73.953672,10029,NY,Polonia Restaurant,v-1540286872
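
The dotted column names above, such as `location.address`, come from flattening nested dictionaries. To show the idea, here is a rough standard-library sketch of that flattening. It is not the pandas implementation: `flatten` is a helper I wrote for illustration, and `sample_venue` is a trimmed copy of the first venue in the response.

```python
def flatten(record, prefix=''):
    """Flatten nested dicts into a single dict with dotted keys.

    Lists are left untouched, which is why 'categories' stays a list.
    """
    flat = {}
    for key, value in record.items():
        name = prefix + key
        if isinstance(value, dict):
            flat.update(flatten(value, name + '.'))
        else:
            flat[name] = value
    return flat


# Trimmed copy of the first venue in the search response.
sample_venue = {
    'id': '4a897cb1f964a5201f0820e3',
    'name': '3 Guys Restaurant',
    'location': {'address': '49 E 96th St', 'distance': 570},
    'categories': [{'name': 'Diner', 'primary': True}],
}
print(sorted(flatten(sample_venue)))
```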


#### Define information of interest and filter Dataframe

In [6]:
# keep only columns that include venue name, and anything that is associated with location
filtered_columns = ['name', 'categories'] + [col for col in dataframe.columns if col.startswith('location.')] + ['id']
dataframe_filtered = dataframe.loc[:, filtered_columns].copy() # copy so the column assignment below does not modify a view

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)

# clean column names by keeping only last term
dataframe_filtered.columns = [column.split('.')[-1] for column in dataframe_filtered.columns]

dataframe_filtered

Unnamed: 0,name,categories,address,cc,city,country,crossStreet,distance,formattedAddress,labeledLatLngs,lat,lng,postalCode,state,id
0,3 Guys Restaurant,Diner,49 E 96th St,US,New York,United States,Madison Ave,570,"[49 E 96th St (Madison Ave), New York, NY 1012...","[{'label': 'display', 'lat': 40.78744262250426...",40.787443,-73.954036,10128,NY,4a897cb1f964a5201f0820e3
1,Hanratty's Restaurant,Food,1410 Madison Ave,US,New York,United States,,551,"[1410 Madison Ave, New York, NY 10029, United ...","[{'label': 'display', 'lat': 40.788427, 'lng':...",40.788427,-73.953659,10029,NY,4f32afdc19836c91c7efe4af
2,Polonia Restaurant,Food,1398 Madison Ave,US,New York,United States,,573,"[1398 Madison Ave, New York, NY 10029, United ...","[{'label': 'display', 'lat': 40.787901, 'lng':...",40.787901,-73.953672,10029,NY,4f32456319836c91c7c72940
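
The category extraction can also be applied to the raw venue dictionaries before they reach pandas. The sketch below is a variation on `get_category_type`, not a copy of it: it prefers the category flagged `'primary': True` and falls back to the first one. The name `primary_category` and the sample venues are assumptions for illustration.

```python
def primary_category(venue):
    """Return the name of the primary category, or None if there is none."""
    categories = venue.get('categories') or []
    for category in categories:
        if category.get('primary'):
            return category['name']
    # Fall back to the first listed category when none is flagged primary.
    return categories[0]['name'] if categories else None


# Sample venues shaped like the entries in results['response']['venues'].
sample_venues = [
    {'name': '3 Guys Restaurant',
     'categories': [{'name': 'Diner', 'primary': True}]},
    {'name': 'Uncategorized venue', 'categories': []},
]
for venue in sample_venues:
    print(venue['name'], '->', primary_category(venue))
```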


#### Visualize the Restaurants that are nearby

In [7]:
dataframe_filtered.name

0        3 Guys Restaurant
1    Hanratty's Restaurant
2       Polonia Restaurant
Name: name, dtype: object

In [8]:
venues_map = folium.Map(location=[latitude, longitude], zoom_start=13) # generate map centred on Manhattan

# add a red circle marker to represent the Manhattan search centre
folium.features.CircleMarker(
    [latitude, longitude],
    radius=10,
    color='red',
    popup='Manhattan',
    fill = True,
    fill_color = 'red',
    fill_opacity = 0.6
).add_to(venues_map)

# add the restaurants as blue circle markers
for lat, lng, label in zip(dataframe_filtered.lat, dataframe_filtered.lng, dataframe_filtered.categories):
    folium.features.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=label,
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(venues_map)

# display map
venues_map
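
As a sanity check on what the map shows, the `distance` column can be compared with a great-circle distance computed from the coordinates. The sketch below uses the haversine formula with a mean Earth radius of 6,371 km. I am assuming, not asserting, that Foursquare measures `distance` in metres from the `ll` point; `haversine_m` is my own helper name.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6371000  # mean Earth radius in metres


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two (lat, lng) points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


# Search centre (Manhattan) and the coordinates of 3 Guys Restaurant.
centre = (40.7900869, -73.9598295)
three_guys = (40.787442622504265, -73.95403610873488)
print(round(haversine_m(*centre, *three_guys)))
```

For this venue the result lands within a few metres of the 570 reported in the `distance` column, which is consistent with that assumption.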

### Let's explore the first restaurant returned: _3 Guys Restaurant_

In [9]:
venue_id = '4a897cb1f964a5201f0820e3' # ID of 3 Guys Restaurant
url = 'https://api.foursquare.com/v2/venues/{}?client_id={}&client_secret={}&v={}'.format(venue_id, CLIENT_ID, CLIENT_SECRET, VERSION)
url

'https://api.foursquare.com/v2/venues/4a897cb1f964a5201f0820e3?client_id=5ZSRVH5PGXRONTBPVIOU4SNATKX33ZXLS01IPAOE51LCD35T&client_secret=RSCJ5YVSEC4XH1OAVVSIS023SLPDOBAKRIBKJRGSSRSNAZLT&v=20181023'

### Send the _GET_ request and inspect the result

In [10]:
result = requests.get(url).json()
print(result['response']['venue'].keys())
result['response']['venue']

dict_keys(['id', 'name', 'contact', 'location', 'canonicalUrl', 'categories', 'verified', 'stats', 'price', 'hasMenu', 'likes', 'dislike', 'ok', 'rating', 'ratingColor', 'ratingSignals', 'delivery', 'menu', 'allowMenuUrlEdit', 'beenHere', 'specials', 'photos', 'reasons', 'hereNow', 'createdAt', 'tips', 'shortUrl', 'timeZone', 'listed', 'popular', 'pageUpdates', 'inbox', 'attributes', 'bestPhoto', 'colors'])


{'id': '4a897cb1f964a5201f0820e3',
 'name': '3 Guys Restaurant',
 'contact': {'phone': '2123483800', 'formattedPhone': '(212) 348-3800'},
 'location': {'address': '49 E 96th St',
  'crossStreet': 'Madison Ave',
  'lat': 40.787442622504265,
  'lng': -73.95403610873488,
  'labeledLatLngs': [{'label': 'display',
    'lat': 40.787442622504265,
    'lng': -73.95403610873488}],
  'postalCode': '10128',
  'cc': 'US',
  'city': 'New York',
  'state': 'NY',
  'country': 'United States',
  'formattedAddress': ['49 E 96th St (Madison Ave)',
   'New York, NY 10128',
   'United States']},
 'canonicalUrl': 'https://foursquare.com/v/3-guys-restaurant/4a897cb1f964a5201f0820e3',
 'categories': [{'id': '4bf58dd8d48988d147941735',
   'name': 'Diner',
   'pluralName': 'Diners',
   'shortName': 'Diner',
   'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/diner_',
    'suffix': '.png'},
   'primary': True}],
 'verified': False,
 'stats': {'tipCount': 16},
 'price': {'tier': 2, 'message': 'Mod

### Get the Restaurant's overall rating

In [11]:
try:
    print(result['response']['venue']['rating'])
except KeyError:
    print('This venue has not been rated yet.')

5.1


That is not a very good rating. Let's check the reviews of the restaurant.
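
The `try`/`except` used for the rating works, but the same pattern recurs for every optional field in the venue response. One possible alternative is a small lookup helper that walks the nested dictionaries and returns a default when a key is missing. `dig` is a hypothetical helper of my own, and the two sample dictionaries are trimmed stand-ins for `result`.

```python
def dig(data, *keys, default=None):
    """Follow keys through nested dicts; return default if any is missing."""
    current = data
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# Trimmed stand-ins for the venue details response.
rated = {'response': {'venue': {'name': '3 Guys Restaurant', 'rating': 5.1}}}
unrated = {'response': {'venue': {'name': 'Some new venue'}}}

for sample in (rated, unrated):
    rating = dig(sample, 'response', 'venue', 'rating')
    print(rating if rating is not None else 'This venue has not been rated yet.')
```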

### Get the number of _tips_

In [12]:
result['response']['venue']['tips']['count']

16

#### Create URL and send _GET_ request.

In [41]:
limit = 16 # set limit to be greater than or equal to the total number of tips
url = 'https://api.foursquare.com/v2/venues/{}/tips?client_id={}&client_secret={}&v={}&limit={}'.format(venue_id, CLIENT_ID, CLIENT_SECRET, VERSION, limit)
url

'https://api.foursquare.com/v2/venues/4a897cb1f964a5201f0820e3/tips?client_id=5ZSRVH5PGXRONTBPVIOU4SNATKX33ZXLS01IPAOE51LCD35T&client_secret=RSCJ5YVSEC4XH1OAVVSIS023SLPDOBAKRIBKJRGSSRSNAZLT&v=20181023&limit=16'

In [31]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5bcef7b1dd579707537901ad'},
 'response': {'tips': {'count': 16,
   'items': [{'id': '57105fab498e998577de6763',
     'createdAt': 1460690859,
     'text': "I've been here literally 100 times. It took 100 tries but I finally found something I like, something that's a bit better than mediocre, which most of their food is. The pasta primavera! Yum",
     'type': 'user',
     'canonicalUrl': 'https://foursquare.com/item/57105fab498e998577de6763',
     'lang': 'en',
     'likes': {'count': 0, 'groups': []},
     'logView': True,
     'agreeCount': 0,
     'disagreeCount': 0,
     'todo': {'count': 0},
     'user': {'id': '42432565',
      'firstName': 'Lane',
      'lastName': 'Rettig',
      'gender': 'male',
      'photo': {'prefix': 'https://igx.4sqi.net/img/user/',
       'suffix': '/42432565-OYCOOBEOD2XX4UDU.jpg'}},
     'authorInteractionType': 'meh'}]}}}

#### Get tips and list of associated features

In [46]:
tips = results['response']['tips']['items']

tip = results['response']['tips']['items'][0]
tip.keys()

dict_keys(['id', 'createdAt', 'text', 'type', 'canonicalUrl', 'lang', 'likes', 'logView', 'agreeCount', 'disagreeCount', 'todo', 'user', 'authorInteractionType'])

#### Format column width and display all tips

In [47]:
pd.set_option('display.max_colwidth', -1)

tips_df = json_normalize(tips) # json normalize tips

# columns to keep
filtered_columns = ['text', 'agreeCount', 'disagreeCount', 'id', 'user.firstName', 'user.lastName', 'user.gender', 'user.id']
tips_filtered = tips_df.loc[:, filtered_columns]

# display tips
tips_filtered

Unnamed: 0,text,agreeCount,disagreeCount,id,user.firstName,user.lastName,user.gender,user.id
0,"I've been here literally 100 times. It took 100 tries but I finally found something I like, something that's a bit better than mediocre, which most of their food is. The pasta primavera! Yum",0,0,57105fab498e998577de6763,Lane,Rettig,male,42432565


Limitation: with a personal developer account, I can access only 2 of the restaurant's 16 tips.
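
To make this limitation visible in code, the number of returned items can be compared with the `count` field of the same response. The sketch below does that on a stand-in payload shaped like the tips response, with the tip texts replaced by placeholders. `tips_coverage` is my own helper name.

```python
def tips_coverage(payload):
    """Return (tips returned, tips reported) for a venue tips response."""
    tips = payload['response']['tips']
    return len(tips['items']), tips['count']


# Stand-in payload: 16 tips reported, 2 returned (placeholder texts).
sample_payload = {
    'response': {
        'tips': {
            'count': 16,
            'items': [{'text': 'first tip'}, {'text': 'second tip'}],
        }
    }
}
returned, reported = tips_coverage(sample_payload)
print('Received {} of {} tips'.format(returned, reported))
```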