# IBM Data Science - Professional Certificate
## Capstone Projects - Battle of Neighborhoods
### Author: Vitor Villas Boas

## Introduction/Business Problem

So your vacation is coming up and you have planned a trip to São Paulo, Brazil. You're a big fan of coffee, just like me! But you have never been to this city and want to know which coffee shop is the best.

## Data

The data for this analysis comes from the Foursquare location API, which gives access to the location and tips of the cafeterias in the city.

In [1]:
# -*- coding: utf-8 -*-

import numpy as np # library to handle data in a vectorized manner
import pandas as pd # library for data analysis
import json # library to handle JSON files
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import requests # library to handle requests
# JSON responses are flattened below with pd.json_normalize (pandas >= 1.0);
# the older `from pandas.io.json import json_normalize` import is deprecated and not needed
from datetime import datetime

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

from sklearn.cluster import KMeans
import folium # map rendering library
print('Libraries imported.')

Libraries imported.


In [2]:
CLIENT_ID = '' # your Foursquare ID
CLIENT_SECRET = '' # your Foursquare Secret
VERSION = datetime.today().strftime('%Y%m%d') # get today's version
ACCESS_TOKEN = '' # your access token to user.id
RADIUS = 20000
LIMIT = 100

In [3]:
address = 'Sao Paulo, SP'

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(latitude, longitude)

-23.5506507 -46.6333824
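
`geolocator.geocode` returns `None` when Nominatim finds no match, in which case the `location.latitude` lookup above would fail with an `AttributeError`. A minimal sketch of a guard, assuming a hypothetical helper named `geocode_or_fail` (not part of geopy) that accepts any geocode callable:

```python
def geocode_or_fail(geocode, address):
    """Return (latitude, longitude) for `address`, or raise if nothing was found.

    `geocode` is any callable shaped like Nominatim(...).geocode; the helper
    name is hypothetical and only used for illustration.
    """
    location = geocode(address)
    if location is None:  # geopy returns None when there is no match
        raise ValueError(f"No geocoding result for {address!r}")
    return location.latitude, location.longitude
```

In the notebook this could be called as `latitude, longitude = geocode_or_fail(geolocator.geocode, address)`.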


Studying the data

In [4]:
url_SP = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&oauth_token={}&v={}&ll={},{}&radius={}&limit={}".format(
    CLIENT_ID,
    CLIENT_SECRET,
    ACCESS_TOKEN,
    VERSION,
    latitude, longitude,
    RADIUS,
    LIMIT)
results = requests.get(url_SP).json()

In [5]:
# function that extracts the category of the venue
def get_category_type(row):
    # rows built from /venues/search use 'categories';
    # rows built from /venues/explore use 'venue.categories'
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']

In [6]:
results['response']['groups'][0]['items'][0]

{'reasons': {'count': 0,
  'items': [{'reasonName': 'globalInteractionReason',
    'summary': 'This spot is popular',
    'type': 'general'}]},
 'referralId': 'e-0-4b17eb00f964a520a1c923e3-0',
 'venue': {'categories': [{'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/building/default_',
     'suffix': '.png'},
    'id': '52e81612bcbc57f1066b7a32',
    'name': 'Cultural Center',
    'pluralName': 'Cultural Centers',
    'primary': True,
    'shortName': 'Cultural Center'}],
  'id': '4b17eb00f964a520a1c923e3',
  'location': {'address': 'R. Álvares Penteado, 112',
   'cc': 'BR',
   'city': 'São Paulo',
   'country': 'Brasil',
   'crossStreet': 'R. Quitanda',
   'distance': 365,
   'formattedAddress': ['R. Álvares Penteado, 112 (R. Quitanda)',
    'São Paulo, SP',
    '01012-000'],
   'labeledLatLngs': [{'label': 'display',
     'lat': -23.547588190396358,
     'lng': -46.6346831174672}],
   'lat': -23.547588190396358,
   'lng': -46.6346831174672,
   'postalCode': '01012-000',
 

In [7]:
venues = results['response']['groups'][0]['items']
nearby_venues = pd.json_normalize(venues) # flatten JSON
# nearby_venues.head()

# filter columns
filtered_columns = ['venue.id', 'venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns].copy() # copy so the assignment below does not trigger SettingWithCopyWarning

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)
# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

print(nearby_venues.shape)
nearby_venues.head(10)

(100, 5)


|   | id | name | categories | lat | lng |
|---|---|---|---|---|---|
| 0 | 4b17eb00f964a520a1c923e3 | Centro Cultural Banco do Brasil (CCBB) | Cultural Center | -23.547588 | -46.634683 |
| 1 | 573dab3f498e4e0d56c67f4e | Casa de Francisca | Music Venue | -23.548733 | -46.634763 |
| 2 | 4bdafa28c79cc928177a80e9 | Theatro Municipal de São Paulo | Theater | -23.545477 | -46.638812 |
| 3 | 4b09acf9f964a520701b23e3 | Teatro Renault | Theater | -23.55412 | -46.638695 |
| 4 | 4b05bc5ef964a520fee122e3 | Mercado Municipal Paulistano | Market | -23.541673 | -46.629742 |
| 5 | 58f278f8e65d0c4b18f857f0 | Kenzo Sushi | Sushi Restaurant | -23.557705 | -46.636036 |
| 6 | 4ec7c74261af9e14300f3b60 | Hachi Crepe e Café | Creperie | -23.560826 | -46.634951 |
| 7 | 54ac1385498e545686d23f88 | Por um Punhado de Dólares | Coffee Shop | -23.548488 | -46.645305 |
| 8 | 560ee563498e447aba686327 | A Casa do Porco | Brazilian Restaurant | -23.544887 | -46.644622 |
| 9 | 4b0588c7f964a520e8d922e3 | Pinacoteca do Estado de São Paulo | Art Museum | -23.534735 | -46.634001 |


## Filtering search for Cafeterias in São Paulo

Here I used the API to return only the venues that have 'Café' in their name.

In [8]:
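QUERY = 'Café' # search term; a plain string, since a list would be formatted into the URL as "['Café']"
CATEGORYID = '4bf58dd8d48988d1e0931735' # Coffee Shop category ID (not used in the request below)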

url_SP = "https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&oauth_token={}&v={}&ll={},{}&query={}&radius={}&limit={}".format(
    CLIENT_ID,
    CLIENT_SECRET,
    ACCESS_TOKEN,
    VERSION,
    latitude,
    longitude,
    QUERY,
    RADIUS,
    LIMIT)
Coffee_results = requests.get(url_SP).json()
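
Because the query contains a non-ASCII character, it can help to let a library URL-encode the parameters instead of formatting them into the string by hand. The sketch below uses only the standard library. `build_search_url` is a hypothetical helper, the version string is an example date, and the optional `categoryId` parameter is assumed to behave as described in Foursquare's legacy v2 documentation:

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://api.foursquare.com/v2/venues/search"

def build_search_url(lat, lng, query, radius, limit, version,
                     client_id="", client_secret="", category_id=None):
    """Build a v2 venues/search URL with percent-encoded parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",
        "query": query,
        "radius": radius,
        "limit": limit,
    }
    if category_id is not None:
        # assumption: restricts results to the given Foursquare category
        params["categoryId"] = category_id
    return SEARCH_ENDPOINT + "?" + urlencode(params)

# example call with the coordinates printed earlier and an example version date
url = build_search_url(-23.5506507, -46.6333824, "Café", 20000, 100, "20201129")
print(url)
```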

In [9]:
coffee_venues = Coffee_results['response']['venues']
coffee_df = pd.json_normalize(coffee_venues)
coffee_df.head()

|   | id | name | categories | referralId | hasPerk | location.address | location.lat | location.lng | location.labeledLatLngs | location.distance | location.postalCode | location.cc | location.city | location.state | location.country | location.formattedAddress | location.crossStreet | location.neighborhood | venuePage.id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 4cdeb1c8f8a4a14344e2d8bc | Café do Páteo | [{'id': '4bf58dd8d48988d16d941735', 'name': 'C... | v-1606656072 | False | Pateo do Collegio | -23.547905 | -46.632732 | [{'label': 'display', 'lat': -23.5479054807141... | 312 | 01016-030 | BR | São Paulo | SP | Brasil | [Pateo do Collegio, São Paulo, SP, 01016-030] |  |  |  |
| 1 | 4c068713cf8c76b0219b3a65 | Piero Pasta & Café | [{'id': '4bf58dd8d48988d110941735', 'name': 'I... | v-1606656072 | False | R. Roberto Simonsen, 98 | -23.548734 | -46.632186 | [{'label': 'display', 'lat': -23.5487343888834... | 245 | 01017-020 | BR | São Paulo | SP | Brasil | [R. Roberto Simonsen, 98 (R. Venceslau Brás), ... | R. Venceslau Brás |  |  |
| 2 | 5878fbd45a5869013dec9032 | Havanna Café | [{'id': '4bf58dd8d48988d16d941735', 'name': 'C... | v-1606656072 | False | Livraria Saraiva | -23.551947 | -46.634289 | [{'label': 'display', 'lat': -23.5519467396333... | 171 | 01501-001 | BR | São Paulo | SP | Brasil | [Livraria Saraiva (2º Piso), São Paulo, SP, 01... | 2º Piso | Centro |  |
| 3 | 5a5f56b165cdf81a23799d63 | Cafe by Suplicy (Café do Farol por Suplicy) | [{'id': '4bf58dd8d48988d1e0931735', 'name': 'C... | v-1606656072 | False | Farol Santander | -23.545827 | -46.634134 | [{'label': 'display', 'lat': -23.5458270732206... | 542 | 01014-010 | BR | São Paulo | SP | Brasil | [Farol Santander (26º Andar), São Paulo, SP, 0... | 26º Andar |  |  |
| 4 | 4e3b44e52271d21e86d9e832 | Café | [{'id': '4bf58dd8d48988d16d941735', 'name': 'C... | v-1606656072 | False | Teatro Abril | -23.554134 | -46.638632 | [{'label': 'display', 'lat': -23.5541341220620... | 661 | 01317-000 | BR | São Paulo | SP | Brasil | [Teatro Abril, São Paulo, SP, 01317-000] |  |  |  |


In [10]:
filtered_columns = ['name', 'categories', 'location.neighborhood', 'location.lat', 'location.lng', 'id'] # + [col for col in coffee_df.columns if col.startswith('location.')]
filtered_coffee = coffee_df.loc[:, filtered_columns].copy() # copy so the column assignments below do not trigger SettingWithCopyWarning
# filtered_coffee.head()

filtered_coffee['categories'] = filtered_coffee.apply(get_category_type, axis=1)
filtered_coffee.columns = [column.split('.')[-1] for column in filtered_coffee.columns]
print(filtered_coffee.shape)
filtered_coffee.head()

(50, 6)


|   | name | categories | neighborhood | lat | lng | id |
|---|---|---|---|---|---|---|
| 0 | Café do Páteo | Café |  | -23.547905 | -46.632732 | 4cdeb1c8f8a4a14344e2d8bc |
| 1 | Piero Pasta & Café | Italian Restaurant |  | -23.548734 | -46.632186 | 4c068713cf8c76b0219b3a65 |
| 2 | Havanna Café | Café | Centro | -23.551947 | -46.634289 | 5878fbd45a5869013dec9032 |
| 3 | Cafe by Suplicy (Café do Farol por Suplicy) | Coffee Shop |  | -23.545827 | -46.634134 | 5a5f56b165cdf81a23799d63 |
| 4 | Café | Café |  | -23.554134 | -46.638632 | 4e3b44e52271d21e86d9e832 |


In [11]:
print(filtered_coffee.loc[:,['name', 'categories']])

                                                 name            categories
0                                       Café do Páteo                  Café
1                                  Piero Pasta & Café    Italian Restaurant
2                                        Havanna Café                  Café
3         Cafe by Suplicy (Café do Farol por Suplicy)           Coffee Shop
4                                                Café                  Café
5                                       Café do Ponto                  Café
6                                       Cafe Bancario                 Diner
7                           Café Gourmet Santa Monica                  Café
8                           Café da Bolsa BM&FBovespa                  Café
9                                  Ki Salgados e Café  Brazilian Restaurant
10                       Santander - Work Café Centro                  Bank
11                               Café Martinelli Midi                  Café
12          

In [12]:
filtered_coffee.groupby('categories').count()

| categories | name | neighborhood | lat | lng | id |
|---|---|---|---|---|---|
| Bank | 1 | 0 | 1 | 1 | 1 |
| Beer Garden | 1 | 0 | 1 | 1 | 1 |
| Brazilian Restaurant | 8 | 0 | 8 | 8 | 8 |
| Breakfast Spot | 3 | 0 | 3 | 3 | 3 |
| Café | 23 | 1 | 23 | 23 | 23 |
| Coffee Shop | 8 | 0 | 8 | 8 | 8 |
| Diner | 2 | 0 | 2 | 2 | 2 |
| Italian Restaurant | 1 | 0 | 1 | 1 | 1 |
| Juice Bar | 1 | 0 | 1 | 1 | 1 |
| Restaurant | 1 | 0 | 1 | 1 | 1 |
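
The counts above show that a name search for 'Café' also returns banks, restaurants, and bars. One way to restrict the ranking to actual coffee venues is to filter the raw venue list on its primary category before building the DataFrame. This is only a sketch: the helper names and the choice of `{'Café', 'Coffee Shop'}` as the allowed set are assumptions, not something the analysis here does.

```python
ALLOWED_CATEGORIES = frozenset({'Café', 'Coffee Shop'})  # assumed definition of a coffee venue

def primary_category(venue):
    """Return the name of the first category of a venue dict, or None."""
    categories = venue.get('categories') or []
    return categories[0]['name'] if categories else None

def keep_coffee_venues(venues, allowed=ALLOWED_CATEGORIES):
    """Keep only venues whose primary category is in `allowed`."""
    return [v for v in venues if primary_category(v) in allowed]

# tiny sample shaped like the items of Coffee_results['response']['venues']
sample = [
    {'name': 'Café do Páteo', 'categories': [{'name': 'Café'}]},
    {'name': 'Santander - Work Café Centro', 'categories': [{'name': 'Bank'}]},
    {'name': 'Cafe by Suplicy (Café do Farol por Suplicy)', 'categories': [{'name': 'Coffee Shop'}]},
    {'name': 'Uncategorized venue (made-up entry)', 'categories': []},
]
kept = [v['name'] for v in keep_coffee_venues(sample)]
print(kept)
```

In the notebook, `keep_coffee_venues(coffee_venues)` could be passed to `pd.json_normalize` in place of `coffee_venues`.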


Let's see their locations on the map:

In [82]:
venues_map = folium.Map(location=[latitude, longitude], zoom_start=14)

for lat, lng, name in zip(filtered_coffee.lat, filtered_coffee.lng, filtered_coffee.name):
  folium.Marker(
      location = [lat, lng],
      popup = name,
      icon = folium.Icon(color='beige',
                       icon_color ='black',
                       prefix='fa', # https://fontawesome.com/icons
                       icon='coffee') # Coffee Icon HTML: <i class="fas fa-coffee"></i>
  ).add_to(venues_map)

# display map
venues_map

Now let's extract the number of likes each coffee shop received from users, to build our ranking:

In [30]:
rows = [] # one {'name', 'likes'} record per venue

for venue_id, name in zip(filtered_coffee.id, filtered_coffee.name): # venue_id avoids shadowing the built-in id()
  url = "https://api.foursquare.com/v2/venues/{}/likes?client_id={}&client_secret={}&oauth_token={}&v={}".format(
      venue_id,
      CLIENT_ID,
      CLIENT_SECRET,
      ACCESS_TOKEN,
      VERSION
  )
  venuelikes = requests.get(url).json()
  likes = venuelikes['response']['likes']['count']

  rows.append({'name': name, 'likes': likes})

# building the DataFrame once keeps 'likes' as an integer column
coffee_rank = pd.DataFrame(rows, columns=['name', 'likes'])
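
The loop above assumes every response contains `response.likes.count`. If a request fails (for example, when the API quota is exhausted), that lookup raises a `KeyError` and the loop stops. A defensive sketch, assuming failed responses come back with an empty or missing `response` object; `extract_like_count` is a hypothetical helper:

```python
def extract_like_count(payload):
    """Return the like count from a venues/<id>/likes payload, or None if absent."""
    likes = (payload.get('response') or {}).get('likes') or {}
    return likes.get('count')

ok = {'response': {'likes': {'count': 2190}}}
failed = {'meta': {'code': 429}, 'response': {}}  # assumed shape of an error payload

print(extract_like_count(ok))      # 2190
print(extract_like_count(failed))  # None
```

Venues for which the helper returns `None` could then be skipped or retried instead of aborting the whole ranking.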

In [42]:
top5 = coffee_rank.sort_values("likes", ascending=False).head(5).reset_index(drop=True)
top5

Unnamed: 0,name,likes
0,Café Girondino,2190
1,Café Piu Piu,1013
2,Café Martinelli Midi,252
3,Piero Pasta & Café,227
4,Café do Páteo,150


Now let's see where to find them:

In [81]:
# Add the coordinates of the Top 5 Cafeterias by matching on the venue name
# (drop_duplicates keeps one row per name so the merge cannot add extra rows)
coor = filtered_coffee.drop_duplicates('name')[['name', 'lat', 'lng']].rename(
    columns={'lat': 'latitude', 'lng': 'longitude'})
fulltop5 = top5.merge(coor, on='name', how='left')

# now plotting on the map centered on the means of the coordinates of all top 5
top5_map = folium.Map(location=[fulltop5.latitude.mean(), fulltop5.longitude.mean()], zoom_start=15)
for name, lat, lng in zip(fulltop5.name, fulltop5.latitude, fulltop5.longitude):
  folium.Marker(
      location = [lat, lng],
      popup = name,
      icon = folium.Icon(color='beige',
                       icon_color ='black',
                       prefix='fa', # https://fontawesome.com/icons
                       icon='star') # Star Icon HTML: <i class="fas fa-star"></i>
  ).add_to(top5_map)

title_html = '''
             <h3 align="center" style="font-size:20px"><b>Top 5 Cafeterias in Sao Paulo</b></h3>
             '''
top5_map.get_root().html.add_child(folium.Element(title_html))
# display map
top5_map

## Results and Discussions

Some data that would have made the ranking more accurate was not available, such as the population density of each neighborhood or even the neighborhood names, since the `neighborhood` field returned by the API was empty for nearly every venue.

So the analysis is based only on the number of likes each cafeteria received, regardless of whether it is located in a neighborhood with a larger population.

This disparity can be seen in the four cafeterias that sit close together on the map above. They are located in a central neighborhood of the city, so they naturally receive more visitors.
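
To put a number on how central a venue is, the great-circle distance from the geocoded city center can be computed with the haversine formula. This is a sketch using coordinates printed earlier in the notebook; the function name and the 6,371 km mean Earth radius are assumptions of the example.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (approximation)

def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in meters between two (lat, lng) points in degrees."""
    lat1, lng1, lat2, lng2 = map(radians, (lat1, lng1, lat2, lng2))
    dlat = lat2 - lat1
    dlng = lng2 - lng1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlng / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))

center = (-23.5506507, -46.6333824)       # Nominatim result for 'Sao Paulo, SP'
cafe_do_pateo = (-23.547905, -46.632732)  # from the venue search results
distance = haversine_m(*center, *cafe_do_pateo)
print(round(distance))
```

The result should land close to the `location.distance` of 312 that Foursquare reported for Café do Páteo, which gives a quick sanity check on the coordinates.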