## Final Assignment Capstone Project

The objective of this project is to determine how many malls are located near Park Central Hotel New York and, where possible, to identify the kinds of shops in them, for the benefit of our stakeholders and hotel guests.
[Note: I changed the hotel location because Foursquare could not return coordinates for hotels in other countries.]

This notebook covers only the report's methodology section. We will use the Foursquare API together with Python to retrieve and visualize the data, and finally produce a report.

First, let us install and import the necessary libraries:

In [1]:
import requests # library to handle requests
import pandas as pd # library for data analysis
import numpy as np # library to handle data in a vectorized manner
import random # library for random number generation

!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values

# libraries for displaying images
from IPython.display import Image 
from IPython.core.display import HTML 
    
# function for transforming JSON into a pandas dataframe
# (on pandas >= 1.0, use pd.json_normalize instead of this import)
from pandas.io.json import json_normalize

!conda install -c conda-forge folium=0.5.0 --yes
import folium # plotting library

print('Folium installed')
print('Libraries imported.')

Collecting package metadata: ...working... done
Solving environment: ...working... 
  - anaconda::ca-certificates-2019.1.23-0, anaconda::openssl-1.1.1b-he774522_1
  - anaconda::openssl-1.1.1b-he774522_1, defaults::ca-certificates-2019.1.23-0
  - anaconda::ca-certificates-2019.1.23-0, defaults::openssl-1.1.1b-he774522_1
  - defaults::ca-certificates-2019.1.23-0, defaults::openssl-1.1.1b-he774522_1done

## Package Plan ##

  environment location: C:\AnacondaDir

  added / updated specs:
    - geopy


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    certifi-2019.3.9           |           py37_0         149 KB  conda-forge
    conda-4.6.14               |           py37_0         2.1 MB  conda-forge
    geographiclib-1.49         |             py_0          32 KB  conda-forge
    geopy-1.19.0               |             py_0          53 KB  conda-forge
    ------------------------------------

Great! Now that the necessary libraries are installed and imported, we define our Foursquare credentials and API version. [Note that I do not show my Client ID and Client Secret, for privacy reasons.]

In [2]:
# My Foursquare Client ID
CLIENT_ID = ''

# My Foursquare Client Secret
CLIENT_SECRET = ''

# Foursquare Version Tag
VERSION = '20180604'
LIMIT = 30

print('My credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

My credentials:
CLIENT_ID: 
CLIENT_SECRET: 

Now that we have specified our Foursquare credentials and version, we define the location we work in: Park Central Hotel New York. Let us start with the hotel's address and convert it to latitude and longitude coordinates.

In [6]:
# Get address from Google
address = '870 7th Ave, New York, NY 10019, USA'

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(latitude, longitude) # Show hotel's latitude and longitude coordinates

40.7646446 -73.9812106
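
One caveat with the cell above: geopy's `geocode` returns `None` when it cannot resolve an address, in which case `location.latitude` raises an `AttributeError`. Below is a minimal sketch of a guard. `extract_coordinates` is a hypothetical helper, and the `SimpleNamespace` object only stands in for the geopy result so that the snippet runs without network access.

```python
from types import SimpleNamespace

def extract_coordinates(location, address):
    """Return (latitude, longitude), or raise a clear error if geocoding failed."""
    if location is None:
        raise ValueError(f"Could not geocode address: {address!r}")
    return location.latitude, location.longitude

# Stand-in for the object returned by geolocator.geocode(address)
fake_location = SimpleNamespace(latitude=40.7646446, longitude=-73.9812106)
print(extract_coordinates(fake_location, '870 7th Ave, New York, NY 10019, USA'))
```

In the notebook itself, the call would presumably look like `latitude, longitude = extract_coordinates(geolocator.geocode(address), address)`.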


The coordinates shown above are the centre of our search area. Let us define a query that searches for malls within a 1 km radius of the hotel.

In [12]:
search_query = 'Mall'
radius = 1000
print(search_query + ' .... OK!')

Mall .... OK!
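
To make the 1 km radius concrete, the distance between the hotel and any venue can be checked independently of Foursquare. The sketch below uses the haversine formula with a mean Earth radius of 6,371 km (an assumption; Foursquare does not document here how it computes `distance`). `haversine_m` is a hypothetical helper name, and the second coordinate pair is Central Park Mall's, taken from the search response further down in this notebook.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    earth_radius_m = 6371000  # mean Earth radius (assumption)
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_m * math.asin(math.sqrt(a))

hotel = (40.7646446, -73.9812106)
central_park_mall = (40.77244937900301, -73.97154808044434)

distance = haversine_m(*hotel, *central_park_mall)
print(round(distance), 'm; within 1 km radius:', distance <= 1000)
```

The result comes out at roughly 1.2 km, in line with the `distance` of 1190 that Foursquare reports for this venue, so a returned venue is not guaranteed to lie inside the requested radius.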


Now we define the corresponding URL:

In [13]:
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius, LIMIT)
url

'https://api.foursquare.com/v2/venues/search?client_id=<CLIENT_ID>&client_secret=<CLIENT_SECRET>&ll=40.7646446,-73.9812106&v=20180604&query=Mall&radius=1000&limit=30'
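
Because the full URL contains the credentials, echoing it in a notebook exposes them. The following is a sketch of one way to build the same query string from a dictionary and to mask the credentials before displaying it. `build_search_url` and `redact_credentials` are hypothetical helpers, and `'abc'` / `'xyz'` are placeholder credentials.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit, parse_qsl

def build_search_url(client_id, client_secret, lat, lng, version, query, radius, limit):
    """Build the venues/search URL; urlencode takes care of escaping the values."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'll': f'{lat},{lng}',
        'v': version,
        'query': query,
        'radius': radius,
        'limit': limit,
    }
    # safe=',' keeps the comma in the ll parameter readable
    return 'https://api.foursquare.com/v2/venues/search?' + urlencode(params, safe=',')

def redact_credentials(url):
    """Return a copy of the URL that is safe to print or share."""
    parts = urlsplit(url)
    secret_keys = ('client_id', 'client_secret')
    query = [(key, 'REDACTED' if key in secret_keys else value)
             for key, value in parse_qsl(parts.query, keep_blank_values=True)]
    return urlunsplit(parts._replace(query=urlencode(query, safe=',')))

url = build_search_url('abc', 'xyz', 40.7646446, -73.9812106, '20180604', 'Mall', 1000, 30)
print(redact_credentials(url))
```

The unredacted `url` is still what gets passed to `requests.get`; only the displayed copy is masked.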

Send the GET Request and check the results

In [14]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5cd52ec26a60712128af443b'},
 'response': {'venues': [{'id': '49ebbf61f964a52020671fe3',
    'name': 'Central Park Mall',
    'location': {'address': 'Central Park',
     'lat': 40.77244937900301,
     'lng': -73.97154808044434,
     'labeledLatLngs': [{'label': 'display',
       'lat': 40.77244937900301,
       'lng': -73.97154808044434}],
     'distance': 1190,
     'postalCode': '10028',
     'cc': 'US',
     'city': 'New York',
     'state': 'NY',
     'country': 'United States',
     'formattedAddress': ['Central Park',
      'New York, NY 10028',
      'United States']},
    'categories': [{'id': '4bf58dd8d48988d163941735',
      'name': 'Park',
      'pluralName': 'Parks',
      'shortName': 'Park',
      'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/parks_outdoors/park_',
       'suffix': '.png'},
      'primary': True}],
    'referralId': 'v-1557475010',
    'hasPerk': False},
   {'id': '5716655d498e23619f894ef0',
    'name': 'T
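
Before using the payload, it can be worth confirming that `meta.code` is 200, as it is in the output above. This is a minimal sketch assuming the response layout shown in this notebook; `extract_venues` is a hypothetical helper, and `sample` is a shortened copy of the real response.

```python
def extract_venues(payload):
    """Return the venue list from a venues/search payload, or raise if the call failed."""
    code = payload.get('meta', {}).get('code')
    if code != 200:
        raise RuntimeError(f'Foursquare request failed with meta code {code}')
    return payload.get('response', {}).get('venues', [])

# Shortened copy of the response shown above
sample = {
    'meta': {'code': 200, 'requestId': '5cd52ec26a60712128af443b'},
    'response': {'venues': [{'id': '49ebbf61f964a52020671fe3', 'name': 'Central Park Mall'}]},
}
print([venue['name'] for venue in extract_venues(sample)])
```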

Get the relevant part of the JSON and transform it into a pandas dataframe:

In [15]:
# Assign relevant part of JSON to venues
venues = results['response']['venues']

# Transform venues into a dataframe
dataframe = json_normalize(venues)
dataframe.head()

Unnamed: 0,categories,delivery.id,delivery.provider.icon.name,delivery.provider.icon.prefix,delivery.provider.icon.sizes,delivery.provider.name,delivery.url,hasPerk,id,location.address,...,location.crossStreet,location.distance,location.formattedAddress,location.labeledLatLngs,location.lat,location.lng,location.postalCode,location.state,name,referralId
0,"[{'id': '4bf58dd8d48988d163941735', 'name': 'P...",,,,,,,False,49ebbf61f964a52020671fe3,Central Park,...,,1190,"[Central Park, New York, NY 10028, United States]","[{'label': 'display', 'lat': 40.77244937900301...",40.772449,-73.971548,10028.0,NY,Central Park Mall,v-1557475010
1,"[{'id': '4bf58dd8d48988d120951735', 'name': 'F...",1104490.0,/delivery_provider_seamless_20180129.png,https://fastly.4sqi.net/img/general/cap/,"[40, 50]",seamless,https://www.seamless.com/menu/taheni-grill-at-...,False,5716655d498e23619f894ef0,1000 S 8th Ave,...,btwn 57th & 58th St,294,"[1000 S 8th Ave (btwn 57th & 58th St), New Yor...","[{'label': 'display', 'lat': 40.767081, 'lng':...",40.767081,-73.982569,10019.0,NY,TurnStyle Underground Market,v-1557475010
2,"[{'id': '4bf58dd8d48988d162941735', 'name': 'O...",,,,,,,False,4c2c914957a9c9b65294f767,Broadway,...,btw Central Park South & 53rd St.,254,"[Broadway (btw Central Park South & 53rd St.),...","[{'label': 'display', 'lat': 40.76690669800086...",40.766907,-73.981657,,NY,Broadway Pedestrian Mall - 59th St to 53rd St,v-1557475010
3,"[{'id': '5744ccdfe4b0c0459246b4dc', 'name': 'S...",,,,,,,False,516d8ed1e4b0ffff587a8081,45 Rockefeller Plz,...,,678,"[45 Rockefeller Plz, New York, NY, United States]","[{'label': 'display', 'lat': 40.759151, 'lng':...",40.759151,-73.977728,,NY,Rockefeller Plaza Mall,v-1557475010
4,"[{'id': '4bf58dd8d48988d111951735', 'name': 'J...",,,,,,,False,4c717459b3ce224b48ba76c6,76 W 47th St,...,at 6th Ave.,807,"[76 W 47th St (at 6th Ave.), New York, NY 1003...","[{'label': 'display', 'lat': 40.75740741762733...",40.757407,-73.981892,10036.0,NY,Jewel Mall,v-1557475010
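
The dotted column names above (`location.address`, `location.lat`, and so on) come from flattening the nested venue dictionaries. The sketch below is a simplified pure-Python illustration of that idea, not pandas' actual implementation: nested dictionaries are expanded into dotted keys, while lists such as `categories` are left untouched. `flatten` is a hypothetical helper, and the sample venue is abbreviated from the response above.

```python
def flatten(record, parent_key='', sep='.'):
    """Expand nested dicts into a single-level dict with dotted keys."""
    flat = {}
    for key, value in record.items():
        full_key = f'{parent_key}{sep}{key}' if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value  # scalars and lists are kept as they are
    return flat

venue = {
    'id': '4c717459b3ce224b48ba76c6',
    'name': 'Jewel Mall',
    'location': {'address': '76 W 47th St', 'distance': 807, 'postalCode': '10036'},
    'categories': [{'name': 'Jewelry Store'}],
    'hasPerk': False,
}

for column, value in flatten(venue).items():
    print(column, '->', value)
```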


The dataframe above is cluttered and hard to read. Let us clean it up and keep only the information that is useful to our stakeholders.

In [16]:
# Keep only columns that include venue name, and anything that is associated with location
filtered_columns = ['name', 'categories'] + [col for col in dataframe.columns if col.startswith('location.')] + ['id']
dataframe_filtered = dataframe.loc[:, filtered_columns].copy() # copy so the column assignment below works on an independent dataframe

# Function that extracts the category of the venue
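def get_category_type(row):
    # Search results store categories under 'categories';
    # fall back to 'venue.categories' for responses that nest the venue
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    # A venue may have no category at all
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']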

# Filter the category for each row
dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)

# Clean column names by keeping only last term
dataframe_filtered.columns = [column.split('.')[-1] for column in dataframe_filtered.columns]

dataframe_filtered

Unnamed: 0,name,categories,address,cc,city,country,crossStreet,distance,formattedAddress,labeledLatLngs,lat,lng,postalCode,state,id
0,Central Park Mall,Park,Central Park,US,New York,United States,,1190,"[Central Park, New York, NY 10028, United States]","[{'label': 'display', 'lat': 40.77244937900301...",40.772449,-73.971548,10028.0,NY,49ebbf61f964a52020671fe3
1,TurnStyle Underground Market,Food Court,1000 S 8th Ave,US,New York,United States,btwn 57th & 58th St,294,"[1000 S 8th Ave (btwn 57th & 58th St), New Yor...","[{'label': 'display', 'lat': 40.767081, 'lng':...",40.767081,-73.982569,10019.0,NY,5716655d498e23619f894ef0
2,Broadway Pedestrian Mall - 59th St to 53rd St,Other Great Outdoors,Broadway,US,New York,United States,btw Central Park South & 53rd St.,254,"[Broadway (btw Central Park South & 53rd St.),...","[{'label': 'display', 'lat': 40.76690669800086...",40.766907,-73.981657,,NY,4c2c914957a9c9b65294f767
3,Rockefeller Plaza Mall,Shopping Plaza,45 Rockefeller Plz,US,New York,United States,,678,"[45 Rockefeller Plz, New York, NY, United States]","[{'label': 'display', 'lat': 40.759151, 'lng':...",40.759151,-73.977728,,NY,516d8ed1e4b0ffff587a8081
4,Jewel Mall,Jewelry Store,76 W 47th St,US,New York,United States,at 6th Ave.,807,"[76 W 47th St (at 6th Ave.), New York, NY 1003...","[{'label': 'display', 'lat': 40.75740741762733...",40.757407,-73.981892,10036.0,NY,4c717459b3ce224b48ba76c6
5,TGI Fridays,American Restaurant,761 7th Ave,US,New York,United States,,417,"[761 7th Ave, New York, NY 10019, United States]","[{'label': 'display', 'lat': 40.7611327, 'lng'...",40.761133,-73.982947,10019.0,NY,4a4971eef964a52058ab1fe3
6,LUSH,Cosmetics Shop,"1000 South Eighth Avenue, Suite #20",US,New York,United States,,486,"[1000 South Eighth Avenue, Suite #20, New York...","[{'label': 'display', 'lat': 40.76869837651813...",40.768698,-73.98336,10019.0,NY,571a5afa498ed30e8208cfc4
7,Literary Walk,Sculpture Garden,The Mall,US,New York,United States,at East Dr,936,"[The Mall (at East Dr), New York, NY 10022, Un...","[{'label': 'display', 'lat': 40.76997625932251...",40.769976,-73.972618,10022.0,NY,4ba12d0bf964a520169f37e3
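
The `distance` column makes it possible to check which venues actually fall inside the 1 km radius. Here is a small sketch using plain Python and the name/distance pairs copied from the table above. On the dataframe itself, an equivalent would be roughly `dataframe_filtered[dataframe_filtered['distance'] <= radius].sort_values('distance')`.

```python
# (name, distance in metres) pairs copied from the dataframe above
venues = [
    ('Central Park Mall', 1190),
    ('TurnStyle Underground Market', 294),
    ('Broadway Pedestrian Mall - 59th St to 53rd St', 254),
    ('Rockefeller Plaza Mall', 678),
    ('Jewel Mall', 807),
    ('TGI Fridays', 417),
    ('LUSH', 486),
    ('Literary Walk', 936),
]
radius = 1000

# Keep venues inside the radius, nearest first
within_radius = sorted((v for v in venues if v[1] <= radius), key=lambda v: v[1])
outside_radius = [v for v in venues if v[1] > radius]

for name, distance in within_radius:
    print(f'{distance:>5} m  {name}')
print('Outside the radius:', outside_radius)
```

Seven of the eight venues are within 1 km; Central Park Mall, at 1,190 m, is not.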


# Show Location in a Map

This section visualizes the venues returned by the search around our hotel.

In [17]:
dataframe_filtered.name

0                                Central Park Mall
1                     TurnStyle Underground Market
2    Broadway Pedestrian Mall - 59th St to 53rd St
3                           Rockefeller Plaza Mall
4                                       Jewel Mall
5                                      TGI Fridays
6                                             LUSH
7                                    Literary Walk
Name: name, dtype: object

In [21]:
venues_map = folium.Map(location=[latitude, longitude], zoom_start=13) # Generate map centred around the hotel

# Add a red circle marker to represent the Park Central Hotel New York
folium.features.CircleMarker(
    [latitude, longitude],
    radius=10,
    color='red',
    popup='Park Central Hotel New York',
    fill=True,
    fill_color='red',
    fill_opacity=0.6
).add_to(venues_map)

# Add the malls as blue circle markers
for lat, lng, label in zip(dataframe_filtered.lat, dataframe_filtered.lng, dataframe_filtered.categories):
    folium.features.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=label,
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(venues_map)

# Display map
venues_map

And there we have it! A map of New York showing our hotel as the red dot and the venues returned by the search as blue dots. You can zoom in to see the placement of the dots more precisely.

The search returned 8 venues. Seven of them are within 1 km of the hotel; Central Park Mall is listed at 1,190 m, slightly outside the requested radius. Note also that not every result is a shopping mall: the categories include a park, a food court, a restaurant and a sculpture garden, since `query` is a free-text search rather than a category filter. The venue names and categories are listed in the dataframe above.

# Explore The Malls!

Before we begin, please note that I am currently using a Foursquare sandbox account, so I have a very limited number of calls, photos and tips per venue.

Let us begin by exploring the first venue in the dataframe, Central Park Mall. (At 1,190 m it is actually the farthest result from the hotel, not the nearest.)

In [22]:
venue_id = '49ebbf61f964a52020671fe3' # ID of Central Park Mall from the dataframe
url = 'https://api.foursquare.com/v2/venues/{}?client_id={}&client_secret={}&v={}'.format(venue_id, CLIENT_ID, CLIENT_SECRET, VERSION)
url

'https://api.foursquare.com/v2/venues/49ebbf61f964a52020671fe3?client_id=<CLIENT_ID>&client_secret=<CLIENT_SECRET>&v=20180604'

Send GET request for result

In [23]:
result = requests.get(url).json()
print(result['response']['venue'].keys())
result['response']['venue']

dict_keys(['id', 'name', 'contact', 'location', 'canonicalUrl', 'categories', 'verified', 'stats', 'url', 'likes', 'dislike', 'ok', 'rating', 'ratingColor', 'ratingSignals', 'beenHere', 'specials', 'photos', 'reasons', 'page', 'hereNow', 'createdAt', 'tips', 'shortUrl', 'timeZone', 'listed', 'popular', 'pageUpdates', 'inbox', 'parent', 'hierarchy', 'attributes', 'bestPhoto', 'colors'])


{'id': '49ebbf61f964a52020671fe3',
 'name': 'Central Park Mall',
 'contact': {'twitter': 'centralparknyc',
  'facebook': '37965424481',
  'facebookUsername': 'centralparknyc',
  'facebookName': 'Central Park'},
 'location': {'address': 'Central Park',
  'lat': 40.77244937900301,
  'lng': -73.97154808044434,
  'labeledLatLngs': [{'label': 'display',
    'lat': 40.77244937900301,
    'lng': -73.97154808044434}],
  'postalCode': '10028',
  'cc': 'US',
  'city': 'New York',
  'state': 'NY',
  'country': 'United States',
  'formattedAddress': ['Central Park', 'New York, NY 10028', 'United States']},
 'canonicalUrl': 'https://foursquare.com/v/central-park-mall/49ebbf61f964a52020671fe3',
 'categories': [{'id': '4bf58dd8d48988d163941735',
   'name': 'Park',
   'pluralName': 'Parks',
   'shortName': 'Park',
   'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/parks_outdoors/park_',
    'suffix': '.png'},
   'primary': True},
  {'id': '4bf58dd8d48988d15a941735',
   'name': 'Garden',
   

Get the mall's overall rating

In [26]:
try:
    print(result['response']['venue']['rating'])
except KeyError: # the 'rating' key is absent for unrated venues
    print('This venue has not been rated yet.')

9.3
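
An alternative to `try`/`except` is to walk the nested dictionary with `dict.get`, which returns `None` instead of raising when a key is missing. This is a minimal sketch; `get_rating` is a hypothetical helper, and the sample payloads are abbreviated from the response above.

```python
def get_rating(payload):
    """Return the venue rating, or None if the venue has not been rated."""
    venue = payload.get('response', {}).get('venue', {})
    return venue.get('rating')

rated = {'response': {'venue': {'name': 'Central Park Mall', 'rating': 9.3}}}
unrated = {'response': {'venue': {'name': 'Central Park Mall'}}}

for payload in (rated, unrated):
    rating = get_rating(payload)
    print(rating if rating is not None else 'This venue has not been rated yet.')
```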


This is a great rating out of 10! Let's dig deeper and get the number of tips for this venue.

In [27]:
result['response']['venue']['tips']['count']

39

There are 39 tips for Central Park Mall, which suggests it is a popular place. (Note that Foursquare categorizes it as a park rather than a shopping venue.)

Let's get the tips:

In [33]:
## Central Park Mall Tips
limit = 15 # maximum number of tips to request (fewer than the 39 available; raise it to retrieve them all)
url = 'https://api.foursquare.com/v2/venues/{}/tips?client_id={}&client_secret={}&v={}&limit={}'.format(venue_id, CLIENT_ID, CLIENT_SECRET, VERSION, limit)

results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5cd53a956a607121273bfbe3'},
 'response': {'tips': {'count': 39,
   'items': [{'id': '5b0a9a189ba3e5002c434257',
     'createdAt': 1527421464,
     'text': "Another highly photographed part of Central Park, with beautiful trees lining the pathway. At the Southern end you'll find the LITERARY WALK, with statues of several literary figures",
     'type': 'user',
     'canonicalUrl': 'https://foursquare.com/item/5b0a9a189ba3e5002c434257',
     'lang': 'en',
     'likes': {'count': 0, 'groups': []},
     'logView': True,
     'agreeCount': 3,
     'disagreeCount': 0,
     'lastVoteText': 'Upvoted Feb 21',
     'lastUpvoteTimestamp': 1550774769,
     'todo': {'count': 0},
     'user': {'id': '2787312',
      'firstName': 'May',
      'lastName': '♍',
      'gender': 'female',
      'photo': {'prefix': 'https://fastly.4sqi.net/img/user/',
       'suffix': '/2787312_8D9r3Ylr_VFZy1eJqtnSoh791MVQf5RV7z9JLJ0TCW7xS8xFd6S3P8xwurGvQmvbSv0K3IhQX.jpg'}},
     'auth

The result is a bit messy... Let's clean it up a bit

In [34]:
tips = results['response']['tips']['items']

tip = results['response']['tips']['items'][0]
tip.keys()

dict_keys(['id', 'createdAt', 'text', 'type', 'canonicalUrl', 'lang', 'likes', 'logView', 'agreeCount', 'disagreeCount', 'lastVoteText', 'lastUpvoteTimestamp', 'todo', 'user', 'authorInteractionType'])

In [35]:
pd.set_option('display.max_colwidth', -1)

tips_df = json_normalize(tips) # json normalize tips

# columns to keep
filtered_columns = ['text', 'agreeCount', 'disagreeCount', 'id', 'user.firstName', 'user.lastName', 'user.gender', 'user.id']
tips_filtered = tips_df.loc[:, filtered_columns]

# display tips
tips_filtered

Unnamed: 0,text,agreeCount,disagreeCount,id,user.firstName,user.lastName,user.gender,user.id
0,"Another highly photographed part of Central Park, with beautiful trees lining the pathway. At the Southern end you'll find the LITERARY WALK, with statues of several literary figures",3,0,5b0a9a189ba3e5002c434257,May,♍,female,2787312


Unfortunately, since we are using a sandbox account, we get only 1 tip per venue. We could repeat this process for each of the venues identified around our hotel, but retrieving more tips would require upgrading the Foursquare account beyond the developer (sandbox) account.
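
Repeating the process for every venue amounts to looping over the `id` column. The sketch below only builds the venue-details URLs and sends no requests. `venue_details_url` is a hypothetical helper, the credentials are placeholders, and the IDs are copied from the filtered dataframe above.

```python
# Venue IDs copied from the filtered dataframe
venue_ids = {
    'Central Park Mall': '49ebbf61f964a52020671fe3',
    'TurnStyle Underground Market': '5716655d498e23619f894ef0',
    'Broadway Pedestrian Mall - 59th St to 53rd St': '4c2c914957a9c9b65294f767',
    'Rockefeller Plaza Mall': '516d8ed1e4b0ffff587a8081',
    'Jewel Mall': '4c717459b3ce224b48ba76c6',
    'TGI Fridays': '4a4971eef964a52058ab1fe3',
    'LUSH': '571a5afa498ed30e8208cfc4',
    'Literary Walk': '4ba12d0bf964a520169f37e3',
}

def venue_details_url(venue_id, client_id, client_secret, version='20180604'):
    """Build the venue-details URL in the same form used earlier in this notebook."""
    return 'https://api.foursquare.com/v2/venues/{}?client_id={}&client_secret={}&v={}'.format(
        venue_id, client_id, client_secret, version)

# Placeholder credentials; substitute real ones before sending any request
urls = {name: venue_details_url(venue_id, 'YOUR_CLIENT_ID', 'YOUR_CLIENT_SECRET')
        for name, venue_id in venue_ids.items()}

for name, url in urls.items():
    print(name, '->', url)
```

Each URL could then be passed to `requests.get(url).json()` as in the cells above, keeping the sandbox call limits in mind.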

# Explore Trending Malls

For the final part, let us explore the trending venues around our hotel.

In [36]:
# Define URL
url = 'https://api.foursquare.com/v2/venues/trending?client_id={}&client_secret={}&ll={},{}&v={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION)

# Send GET request and get trending venues
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5cd53d399fb6b756b16f7ad8'},
 'response': {'venues': []}}

Check if any venues are trending at this time:

In [37]:
if len(results['response']['venues']) == 0:
    trending_venues_df = 'No trending venues are available at the moment!'
    
else:
    trending_venues = results['response']['venues']
    trending_venues_df = json_normalize(trending_venues)

    # Filter columns
    columns_filtered = ['name', 'categories'] + ['location.distance', 'location.city', 'location.postalCode', 'location.state', 'location.country', 'location.lat', 'location.lng']
    trending_venues_df = trending_venues_df.loc[:, columns_filtered]

    # Filter the category for each row
    trending_venues_df['categories'] = trending_venues_df.apply(get_category_type, axis=1)

In [38]:
# Display trending venues
trending_venues_df

'No trending venues are available at the moment!'

Oops! There are none available at the moment. This happens because trending information is fetched live, and no venues were trending when the request was made. Without any results, we cannot visualize the trending venues, which brings us to the end of the methodology stage of the project report.