# Applied Data Science Capstone Final Project

#### Finding top rated venues around the neighborhoods of the city of São Paulo

Let's use geopy to get the latitude and longitude of the city of São Paulo.


In [6]:
from geopy.geocoders import Nominatim  # geocoding client used below

address = 'São Paulo'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of São Paulo City are {}, {}.'.format(latitude, longitude))

The geographical coordinates of São Paulo City are -23.5506507, -46.6333824.
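geopy's `geocode` returns `None` when the service finds no match, so reading `.latitude` from the result can fail with an `AttributeError`. The following is a minimal sketch of a guard, assuming a hypothetical helper named `get_coordinates` that accepts any geocoding callable (for example `geolocator.geocode`); it is not part of geopy.

```python
from typing import Callable, Optional, Tuple


def get_coordinates(geocode: Callable[[str], Optional[object]], address: str) -> Tuple[float, float]:
    """Return (latitude, longitude) for an address, or raise if nothing is found.

    `geocode` is assumed to behave like geopy's `Nominatim.geocode`: it returns
    an object with `latitude` and `longitude` attributes, or None.
    """
    location = geocode(address)
    if location is None:
        # No match: fail with a clear message instead of an AttributeError later
        raise ValueError('No geocoding result for {!r}'.format(address))
    return location.latitude, location.longitude
```

With the geolocator defined above, it could be called as `latitude, longitude = get_coordinates(geolocator.geocode, 'São Paulo')`.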


## Creating a map of São Paulo City:

In [7]:
# install and import the Folium library

!conda install -c conda-forge folium=0.5.0 --yes
import folium


In [15]:
map_saopaulo = folium.Map(location=[latitude, longitude], zoom_start=11.5)
map_saopaulo

#### As we can see, São Paulo is a huge metropolis with many neighborhoods where a visitor can stay. Each neighborhood has its own characteristics and different venues. How about creating a venue query to retrieve the top rated venues around each region?

We need the latitude and longitude of the address where the visitor is staying, which are easy to find with a Google search. We also need the number of venues the query should return and the search radius around the neighborhood, in meters.


In [28]:
#Input data:

neighborhood = 'Higienopolis'
lat = -23.5492784
long = -46.6627057

LIMIT = 5
radius = 500

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood, 
                                                               lat, 
                                                               long))

Latitude and longitude values of Higienopolis are -23.5492784, -46.6627057.


Now let's run a query using Foursquare to find the top 5 venues around this neighborhood.

First: define Foursquare credentials and version

In [29]:
CLIENT_ID = '0MMBNYFS4XH0KK3RPZAE41EL4AAIMFQ5XMKPRCJ5NA1PTH1T' 
CLIENT_SECRET = 'TDVQEIG0A2HFRI5ABOVYV4ULSQZNU1E0KNXTCZ5XRO4BF5GV' 
VERSION = '20180605' 

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: 0MMBNYFS4XH0KK3RPZAE41EL4AAIMFQ5XMKPRCJ5NA1PTH1T
CLIENT_SECRET: TDVQEIG0A2HFRI5ABOVYV4ULSQZNU1E0KNXTCZ5XRO4BF5GV
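Credentials written into a notebook and printed in its output are visible to anyone who reads it. As a hedged alternative, the sketch below reads them from environment variables and masks them before printing. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helper functions are assumptions made for this example, not names that Foursquare requires.

```python
import os


def load_foursquare_credentials(environ=os.environ):
    """Read the client id and secret from environment variables.

    The variable names are hypothetical; pick whatever suits your setup.
    """
    try:
        client_id = environ['FOURSQUARE_CLIENT_ID']
        client_secret = environ['FOURSQUARE_CLIENT_SECRET']
    except KeyError as missing:
        raise RuntimeError('Missing environment variable: {}'.format(missing))
    return client_id, client_secret


def mask(secret, visible=4):
    """Keep the first few characters and replace the rest with asterisks."""
    return secret[:visible] + '*' * max(len(secret) - visible, 0)
```

A cell could then use `CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials()` and print `mask(CLIENT_SECRET)` instead of the full value.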


In [30]:
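# import the libraries used below
import json  # library to handle JSON data
import numpy as np

import requests  # library to handle HTTP requests
try:
    from pandas import json_normalize  # pandas >= 1.0
except ImportError:
    from pandas.io.json import json_normalize  # older pandas versions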

#### Now, let's get the top 5 venues that are in Higienopolis within a radius of 500 meters.

First, create the GET request URL. Then send the request and examine the results.

In [32]:
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, lat, long, VERSION, radius, LIMIT)
url

'https://api.foursquare.com/v2/venues/explore?client_id=0MMBNYFS4XH0KK3RPZAE41EL4AAIMFQ5XMKPRCJ5NA1PTH1T&client_secret=TDVQEIG0A2HFRI5ABOVYV4ULSQZNU1E0KNXTCZ5XRO4BF5GV&ll=-23.5492784,-46.6627057&v=20180605&radius=500&limit=5'
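The same URL can also be assembled from a dictionary of parameters, which keeps each value next to its name and takes care of escaping. This is a minimal sketch using only the standard library; the helper name `build_explore_url` is an assumption for illustration, and the parameter names are the ones that appear in the URL above.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'


def build_explore_url(client_id, client_secret, lat, lng, version, radius, limit):
    """Build the explore URL from named query parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'll': '{},{}'.format(lat, lng),  # "latitude,longitude"
        'v': version,
        'radius': radius,
        'limit': limit,
    }
    # safe=',' keeps the comma inside `ll` unescaped, matching the URL above
    return '{}?{}'.format(EXPLORE_ENDPOINT, urlencode(params, safe=','))
```

`requests.get` also accepts a `params` dictionary directly, so building the string by hand is optional.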

In [33]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5eded3b93907e7001b18f344'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Consolação',
  'headerFullLocation': 'Consolação, São Paulo',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 68,
  'suggestedBounds': {'ne': {'lat': -23.544778395499993,
    'lng': -46.65780603790154},
   'sw': {'lat': -23.553778404500004, 'lng': -46.667605362098456}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4d3db20914aa8cfa33fcb55e',
       'name': 'Complexo Esportivo do Pacaembu',
       'location': {'address': 'Praça Charles Miller, s/nº',
        'lat': -23.55018700174617,
        'lng': -46.66489719340717,
        'labeledLatLngs': [{'label
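Before indexing into the response, it can help to confirm that the request succeeded and that at least one group came back. The sketch below assumes only the structure visible in the output above (`meta.code` and `response.groups[0].items`); the helper name `extract_venue_items` is hypothetical.

```python
def extract_venue_items(results):
    """Return the list of items from the first group of an explore response.

    Raises RuntimeError when the status code in `meta` is not 200 and
    returns an empty list when the response contains no groups.
    """
    code = results.get('meta', {}).get('code')
    if code != 200:
        raise RuntimeError('Foursquare request failed with code {}'.format(code))
    groups = results.get('response', {}).get('groups', [])
    if not groups:
        return []
    return groups[0].get('items', [])
```

With the `results` dictionary from the previous cell, `extract_venue_items(results)` would return the same list as `results['response']['groups'][0]['items']`.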

In [34]:
# function that extracts the category of the venue
def get_category_type(row):
    # the category list sits under 'categories' or, in the flattened frame, 'venue.categories'
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

Now we are ready to clean the json and structure it into a *pandas* dataframe.

In [35]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

| | name | categories | lat | lng |
|---|---|---|---|---|
| 0 | Complexo Esportivo do Pacaembu | Athletics & Sports | -23.550187 | -46.664897 |
| 1 | Black'n Load | Coffee Shop | -23.550314 | -46.660705 |
| 2 | Shiatsu Luiza Sato | Spa | -23.550139 | -46.661124 |
| 3 | Museu do Futebol | Museum | -23.547604 | -46.664853 |
| 4 | Vico - Gelato Artigianale | Ice Cream Shop | -23.550245 | -46.660473 |
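The dotted column names such as `venue.location.lat` come from flattening nested dictionaries: each level of nesting adds one segment to the key. The sketch below illustrates the idea with plain Python. It is a simplified stand-in, not pandas' implementation, and the `item` dictionary is a trimmed, illustrative record built from one row of the table above.

```python
def flatten(record, parent_key='', sep='.'):
    """Flatten nested dictionaries into a single dictionary with dotted keys."""
    flat = {}
    for key, value in record.items():
        full_key = parent_key + sep + key if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested dictionaries, extending the key path
            flat.update(flatten(value, full_key, sep))
        else:
            # Lists and scalars are kept as they are
            flat[full_key] = value
    return flat


# Illustrative record shaped like one entry of the venue list
item = {
    'venue': {
        'name': 'Museu do Futebol',
        'categories': [{'name': 'Museum'}],
        'location': {'lat': -23.547604, 'lng': -46.664853},
    }
}
row = flatten(item)
print(sorted(row))
```

The `categories` value stays a list after flattening, which is why `get_category_type` is still needed to pull out the first category name.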


In [36]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

5 venues were returned by Foursquare.


In [38]:
nearby_venues.rename(
    columns={'name': 'Venue', 'categories': 'Category', 'lat': 'Latitude', 'lng': 'Longitude'},
    inplace=True,
)
nearby_venues

| | Venue | Category | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Complexo Esportivo do Pacaembu | Athletics & Sports | -23.550187 | -46.664897 |
| 1 | Black'n Load | Coffee Shop | -23.550314 | -46.660705 |
| 2 | Shiatsu Luiza Sato | Spa | -23.550139 | -46.661124 |
| 3 | Museu do Futebol | Museum | -23.547604 | -46.664853 |
| 4 | Vico - Gelato Artigianale | Ice Cream Shop | -23.550245 | -46.660473 |
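As a sanity check on the 500 meter radius, the great-circle distance between the neighborhood coordinates and a venue can be computed with the haversine formula. This is a minimal sketch; the function name `haversine_m` and the mean Earth radius of 6,371,000 m are choices made for this example, and the result is not necessarily how Foursquare itself measures distance.

```python
import math

EARTH_RADIUS_M = 6371000  # mean Earth radius in meters (approximation)


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in meters between two (lat, lng) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Distance from the Higienopolis input coordinates to Museu do Futebol
distance = haversine_m(-23.5492784, -46.6627057, -23.547604, -46.664853)
print('within 500 m:', distance <= 500)
```

The same function could be applied to every row of `nearby_venues` to add a distance column.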


#### Now let's create a map with the neighborhood and the venues

In [44]:
# create map of Higienopolis using latitude and longitude values
map_higienopolis = folium.Map(location=[lat, long], zoom_start=14.5)

# add markers to map
for venue_lat, venue_lng, label in zip(nearby_venues['Latitude'], nearby_venues['Longitude'], nearby_venues['Venue']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [venue_lat, venue_lng],  # the venue's own coordinates, not the neighborhood's `lat`
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_higienopolis)
    
map_higienopolis

That's it. Now a visitor can get an idea of the top 5 venues around the neighborhood where he or she is staying.