## Part 1
##### A description of the problem and a discussion of the background. (15 marks)
Clearly define a problem or an idea of your choice, where you would need to leverage the Foursquare location data to solve or execute. Remember that data science problems always target an audience and are meant to help a group of stakeholders solve a problem, so make sure that you explicitly describe your audience and why they would care about your problem.

This submission will eventually become your Introduction/Business Problem section in your final report. So I recommend that you push the report (having your Introduction/Business Problem section only for now) to your Github repository and submit a link to it.

## Part 2
#### A description of the data and how it will be used to solve the problem. (15 marks)
Describe the data that you will be using to solve the problem or execute your idea. Remember that you will need to use the Foursquare location data to solve the problem or execute your idea. You can absolutely use other datasets in combination with the Foursquare location data. So make sure that you provide adequate explanation and discussion, with examples, of the data that you will be using, even if it is only Foursquare location data.

This submission will eventually become your Data section in your final report. So I recommend that you push the report (having your Data section) to your Github repository and submit a link to it.

# Part 1 (Problem):
Where in Rio de Janeiro is the best place to open a restaurant? Which venues have the most reviews, which locations are the most popular, and is one type of restaurant more popular than another?

The problem will be addressed with the Foursquare API, using each venue's user count and tip count as measures of popularity. The results are aimed at anyone who wants to open a restaurant in Rio de Janeiro and needs to know whether their intended location is viable.


# Part 2 (Data):
The data is Foursquare venue data for the city of Rio de Janeiro, restricted to restaurant-type venues. Venue popularity will serve as an indicator of how suitable an area is for a restaurant.
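
As a minimal sketch of how venue popularity could be rolled up into an area-level indicator, the snippet below sums tip counts per neighbourhood and ranks the areas. The column names `neighbourhood` and `tip_count`, the function name, and the three sample rows are hypothetical placeholders, not values returned by Foursquare; the real fields depend on which endpoint is queried.

```python
import pandas as pd


def rank_areas_by_tips(venues):
    """Sum a per-venue popularity measure per area, highest total first."""
    return (
        venues.groupby("neighbourhood")["tip_count"]
        .sum()
        .sort_values(ascending=False)
    )


# Hypothetical illustrative rows (not real Foursquare data).
sample = pd.DataFrame({
    "name": ["Venue A", "Venue B", "Venue C"],
    "neighbourhood": ["Area 1", "Area 1", "Area 2"],
    "tip_count": [10, 5, 40],
})

ranking = rank_areas_by_tips(sample)
print(ranking)
```

A plain sum favours areas with many venues; dividing by the venue count would be one alternative if the average venue matters more than the total.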

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes # installs folium; comment this line out if folium is already installed
import folium # map rendering library

print('Libraries imported.')


In [2]:
# Get latitude and longitude for Rio de Janeiro
address = 'Rio de Janeiro, Rio de Janeiro'

geolocator = Nominatim(user_agent="rio_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Rio de Janeiro are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Rio de Janeiro are -22.9110137, -43.2093727.


In [3]:
# Set up Foursquare
CLIENT_ID = 'DXUA5T4QUO3JEHJ3A1MLEXXHFR4VDXIRUF1V4D0BVE2YBTI3' # your Foursquare ID
CLIENT_SECRET = 'MFOUXRESHK35XRDDANTE2WQFP3WO3V0XJYTSU0ZJLJEFQXXK' # your Foursquare Secret
VERSION = '20180604' # Foursquare API version
LIMIT = 30
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: DXUA5T4QUO3JEHJ3A1MLEXXHFR4VDXIRUF1V4D0BVE2YBTI3
CLIENT_SECRET:MFOUXRESHK35XRDDANTE2WQFP3WO3V0XJYTSU0ZJLJEFQXXK
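
Because this notebook is meant to be pushed to a public repository, it may be safer to keep the credentials out of the source and the printed output. The following is a minimal sketch, assuming the values are stored in environment variables; the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helper functions are hypothetical choices for this example, and the demo values are placeholders.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Return (client_id, client_secret) read from environment variables.

    The variable names are assumptions made for this sketch.
    """
    try:
        return env["FOURSQUARE_CLIENT_ID"], env["FOURSQUARE_CLIENT_SECRET"]
    except KeyError as missing:
        raise RuntimeError("Missing environment variable: {}".format(missing)) from None


def mask(secret, visible=4):
    """Keep only the first few characters so a value can be checked without exposing it."""
    return secret[:visible] + "*" * max(len(secret) - visible, 0)


# Placeholder mapping standing in for os.environ in this demonstration.
demo_env = {
    "FOURSQUARE_CLIENT_ID": "example-id",
    "FOURSQUARE_CLIENT_SECRET": "example-secret",
}
demo_id, demo_secret = load_foursquare_credentials(demo_env)
masked_secret = mask(demo_secret)
print("CLIENT_SECRET: " + masked_secret)
```

Calling `load_foursquare_credentials()` with no argument reads from the real environment.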


In [4]:
search_query = 'Restaurant'
radius = 50000
print(search_query + ' .... OK!')

Restaurant .... OK!


In [5]:
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius, LIMIT)
url

'https://api.foursquare.com/v2/venues/search?client_id=DXUA5T4QUO3JEHJ3A1MLEXXHFR4VDXIRUF1V4D0BVE2YBTI3&client_secret=MFOUXRESHK35XRDDANTE2WQFP3WO3V0XJYTSU0ZJLJEFQXXK&ll=-22.9110137,-43.2093727&v=20180604&query=Restaurant&radius=50000&limit=30'
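
Building the URL with `str.format` works for a single-word query such as `Restaurant`, but a query containing spaces or special characters would need to be percent-encoded. A minimal sketch using the standard library's `urlencode` is shown below; the function name `build_search_url` is hypothetical, and `"ID"` and `"SECRET"` are placeholders rather than real credentials.

```python
from urllib.parse import urlencode


def build_search_url(client_id, client_secret, lat, lng, version, query, radius, limit):
    """Assemble a venues/search URL with every parameter percent-encoded."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "ll": "{},{}".format(lat, lng),
        "v": version,
        "query": query,
        "radius": radius,
        "limit": limit,
    }
    return "https://api.foursquare.com/v2/venues/search?" + urlencode(params)


# Placeholder credentials; the coordinates are the ones geocoded earlier.
example_url = build_search_url(
    "ID", "SECRET", -22.9110137, -43.2093727,
    "20180604", "Brazilian Restaurant", 50000, 30,
)
print(example_url)
```

Passing the same dictionary to `requests.get(base_url, params=params)` is an equivalent option that lets `requests` handle the encoding.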

In [6]:
results = requests.get(url).json()
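
The request above assumes the call succeeds. As a hedged sketch, the helper below checks the `meta.code` field of the parsed payload before reading the venues, so that a bad key or an exceeded quota surfaces as a clear error instead of a `KeyError` later on. The helper name `extract_venues` and both sample payloads are hypothetical and only mimic the general shape of a Foursquare v2 response.

```python
def extract_venues(payload):
    """Return the venue list from a parsed search payload, or raise on an API error."""
    meta = payload.get("meta", {})
    if meta.get("code") != 200:
        raise RuntimeError(
            "Foursquare request failed: {} {}".format(
                meta.get("code"), meta.get("errorDetail", "")
            )
        )
    return payload.get("response", {}).get("venues", [])


# Hypothetical payloads for illustration only.
ok_payload = {"meta": {"code": 200}, "response": {"venues": [{"name": "Example Venue"}]}}
bad_payload = {"meta": {"code": 400, "errorDetail": "example error"}, "response": {}}

venue_names = [venue["name"] for venue in extract_venues(ok_payload)]
print(venue_names)
```

In the notebook this would be used as `venues = extract_venues(results)`.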

In [7]:
# assign relevant part of JSON to venues
venues = results['response']['venues']

# transform venues into a dataframe
dataframe = json_normalize(venues)
dataframe.head()

Unnamed: 0,categories,hasPerk,id,location.address,location.cc,location.city,location.country,location.crossStreet,location.distance,location.formattedAddress,location.labeledLatLngs,location.lat,location.lng,location.postalCode,location.state,name,referralId
0,"[{'id': '52e81612bcbc57f1066b79f1', 'name': 'B...",False,56e04e1d38fa3ae11155b863,"Rua Riachuelo, 242",BR,Rio de Janeiro,Brasil,,2146,"[Rua Riachuelo, 242, Rio de Janeiro, RJ, Brasil]","[{'label': 'display', 'lat': -22.914733, 'lng'...",-22.914733,-43.188829,,RJ,Riá Restaurant,v-1592660656
1,"[{'id': '4bf58dd8d48988d16b941735', 'name': 'B...",False,4ece6782cc219860f521dd5d,,BR,Rio de Janeiro,Brasil,,1828,"[Rio de Janeiro, RJ, Brasil]","[{'label': 'display', 'lat': -22.9097338088607...",-22.909734,-43.191598,,RJ,Restaurante Pitada,v-1592660656
2,"[{'id': '4bf58dd8d48988d1c4941735', 'name': 'R...",False,50e0b634e4b0f94bd8a38f47,Navio Soberano,BR,Rio de Janeiro,Brasil,,2849,"[Navio Soberano, Rio de Janeiro, RJ, Brasil]","[{'label': 'display', 'lat': -22.8928508209919...",-22.892851,-43.189795,,RJ,Restaurant El Duero,v-1592660656
3,"[{'id': '4bf58dd8d48988d143941735', 'name': 'B...",False,5231f7b811d27690b0979df7,Hilton Copacabana,BR,Rio de Janeiro,Brasil,4º Andar,7046,"[Hilton Copacabana (4º Andar), Rio de Janeiro,...","[{'label': 'display', 'lat': -22.9648048398105...",-22.964805,-43.173135,22010-000,RJ,Restaurante The View,v-1592660656
4,"[{'id': '52e81612bcbc57f1066b7a00', 'name': 'C...",False,54fc90bc498e67112301efff,Travessa Angustura,BR,Rio de Janeiro,Brasil,Rua do Matoso,500,"[Travessa Angustura (Rua do Matoso), Rio de Ja...","[{'label': 'display', 'lat': -22.9138483642793...",-22.913848,-43.213167,,RJ,Restaurante Drink's,v-1592660656


In [8]:
# keep only columns that include venue name, and anything that is associated with location
filtered_columns = ['name', 'categories'] + [col for col in dataframe.columns if col.startswith('location.')] + ['id']
dataframe_filtered = dataframe.loc[:, filtered_columns].copy() # copy so the column assignment below does not write to a view

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError: # fall back to the nested key used by other endpoints
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)

# clean column names by keeping only last term
dataframe_filtered.columns = [column.split('.')[-1] for column in dataframe_filtered.columns]

dataframe_filtered.head()

Unnamed: 0,name,categories,address,cc,city,country,crossStreet,distance,formattedAddress,labeledLatLngs,lat,lng,postalCode,state,id
0,Riá Restaurant,Bistro,"Rua Riachuelo, 242",BR,Rio de Janeiro,Brasil,,2146,"[Rua Riachuelo, 242, Rio de Janeiro, RJ, Brasil]","[{'label': 'display', 'lat': -22.914733, 'lng'...",-22.914733,-43.188829,,RJ,56e04e1d38fa3ae11155b863
1,Restaurante Pitada,Brazilian Restaurant,,BR,Rio de Janeiro,Brasil,,1828,"[Rio de Janeiro, RJ, Brasil]","[{'label': 'display', 'lat': -22.9097338088607...",-22.909734,-43.191598,,RJ,4ece6782cc219860f521dd5d
2,Restaurant El Duero,Restaurant,Navio Soberano,BR,Rio de Janeiro,Brasil,,2849,"[Navio Soberano, Rio de Janeiro, RJ, Brasil]","[{'label': 'display', 'lat': -22.8928508209919...",-22.892851,-43.189795,,RJ,50e0b634e4b0f94bd8a38f47
3,Restaurante The View,Breakfast Spot,Hilton Copacabana,BR,Rio de Janeiro,Brasil,4º Andar,7046,"[Hilton Copacabana (4º Andar), Rio de Janeiro,...","[{'label': 'display', 'lat': -22.9648048398105...",-22.964805,-43.173135,22010-000,RJ,5231f7b811d27690b0979df7
4,Restaurante Drink's,Comfort Food Restaurant,Travessa Angustura,BR,Rio de Janeiro,Brasil,Rua do Matoso,500,"[Travessa Angustura (Rua do Matoso), Rio de Ja...","[{'label': 'display', 'lat': -22.9138483642793...",-22.913848,-43.213167,,RJ,54fc90bc498e67112301efff
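
The `distance` column appears to be the distance in metres from the query point to each venue. As a sanity check under that assumption, the sketch below computes the great-circle (haversine) distance between the geocoded centre and the first venue in the table; the helper `haversine_m` and the mean Earth radius of 6,371,000 m are choices made for this example, not part of the Foursquare API.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_m(lat1, lng1, lat2, lng2, earth_radius_m=6371000.0):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = radians(lng2 - lng1)
    a = sin(d_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(d_lambda / 2) ** 2
    return 2 * earth_radius_m * asin(sqrt(a))


# Query point from the geocoding step and the first row of the table (Riá Restaurant).
centre = (-22.9110137, -43.2093727)
first_venue = (-22.914733, -43.188829)

distance_m = haversine_m(centre[0], centre[1], first_venue[0], first_venue[1])
print(round(distance_m))
```

The result should come out close to the 2146 reported for that row, which would support reading the column as metres from the search centre.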


In [9]:

venues_map = folium.Map(location=[latitude, longitude], zoom_start=13) # generate map centred on Rio de Janeiro

# add a red circle marker to represent the geocoded centre of Rio de Janeiro
folium.features.CircleMarker(
    [latitude, longitude],
    radius=10,
    color='red',
    popup='Rio de Janeiro',
    fill = True,
    fill_color = 'red',
    fill_opacity = 0.6
).add_to(venues_map)

# add the restaurants as blue circle markers, labelled with their category
for lat, lng, label in zip(dataframe_filtered.lat, dataframe_filtered.lng, dataframe_filtered.categories):
    folium.features.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=label,
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(venues_map)

# display map
venues_map