# Capstone Project - The Battle of Neighborhoods (Week 1)

### Introduction

Let's suppose you run a travel company and want to sell trip packages to the city of Rio de Janeiro, Brazil. Along with each vacation package, you decide to offer tips on the best venues in the city, so that your customers can enjoy the main sights and taste the local food. But you have never been to Rio de Janeiro and do not know the best spots in the area, so you ask a data scientist friend to help you out.

The main idea here is to find the best venues, covering both food and tourist attractions, in the most famous neighbourhoods of the city of Rio de Janeiro.

So, let's get started! ;)

According to [Culture Trip](https://theculturetrip.com/south-america/brazil/articles/the-10-coolest-neighbourhoods-in-rio-de-janeiro/ "Culture Trip"), the 10 coolest neighbourhoods in Rio de Janeiro are:

<ol>
 <li>Copacabana</li>
 <li>Santa Teresa</li>
 <li>Ipanema</li>
 <li>Lapa</li>
 <li>Leblon</li>
 <li>Urca</li>
 <li>Lagoa</li>
 <li>Jardim Botanico</li>
 <li>Centro</li>
 <li>Botafogo</li>
</ol>

### Data

Using this list of the 10 coolest neighbourhoods in Rio, we will query the Foursquare API for the best venues in each of them.

After the request, the data set will contain a variety of venues, classified by category, along with their ratings.

But first, let's import some libraries that we will need later.

In [5]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (top-level import, available since pandas 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


Now, as a reminder, let's use the geolocator to find the geographical coordinates of Rio de Janeiro and see where it is located on the map.

In [6]:
address = 'Rio de Janeiro, Brazil'

geolocator = Nominatim(user_agent="brazil")
location = geolocator.geocode(address)
latituderj = location.latitude
longituderj = location.longitude
print('The geographical coordinates of Rio de Janeiro are: Latitude {}, and Longitude {}.'.format(latituderj, longituderj))
map_rj = folium.Map(location=[latituderj, longituderj], zoom_start=11.5)
map_rj

The geographical coordinates of Rio de Janeiro are: Latitude -22.9110137, and Longitude -43.2093727.


Now, let's create the data frame with the required neighbourhoods and their corresponding latitude and longitude.

In [68]:
neighbourhoods = {'Neighbourhood': ['Copacabana', 'Santa Teresa', 'Ipanema', 'Lapa', 'Leblon', 'Urca', 'Lagoa', 'Jardim Botanico', 'Centro', 'Botafogo']}
df = pd.DataFrame(neighbourhoods)

# create the geocoder once and reuse it for every lookup
geolocator = Nominatim(user_agent="Brazil")

latitude = []
longitude = []

# geocode each bare neighbourhood name
for name in df['Neighbourhood']:
    location = geolocator.geocode(name)
    latitude.append(location.latitude)
    longitude.append(location.longitude)

df['latitude'] = latitude
df['longitude'] = longitude
df

| | Neighbourhood | latitude | longitude |
|---|---|---|---|
| 0 | Copacabana | -22.971964 | -43.184343 |
| 1 | Santa Teresa | -22.931948 | -43.196995 |
| 2 | Ipanema | -22.983956 | -43.202216 |
| 3 | Lapa | -13.250571 | -43.410754 |
| 4 | Leblon | -22.983556 | -43.224938 |
| 5 | Urca | 46.548852 | 23.961872 |
| 6 | Lagoa | 37.132581 | -8.455051 |
| 7 | Jardim Botanico | -22.968385 | -43.228694 |
| 8 | Centro | 47.549025 | 1.732406 |
| 9 | Botafogo | -22.948845 | -43.179829 |
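
A quick way to spot bad geocoding results is to test each point against a rough bounding box around the city. The snippet below is a minimal sketch of that check: the box limits and the names `coords`, `inside` and `suspect` are assumptions made for illustration, and the coordinates are copied from the table above.

```python
import pandas as pd

# Geocoding results copied from the table above
coords = pd.DataFrame({
    "Neighbourhood": ["Copacabana", "Santa Teresa", "Ipanema", "Lapa", "Leblon",
                      "Urca", "Lagoa", "Jardim Botanico", "Centro", "Botafogo"],
    "latitude": [-22.971964, -22.931948, -22.983956, -13.250571, -22.983556,
                 46.548852, 37.132581, -22.968385, 47.549025, -22.948845],
    "longitude": [-43.184343, -43.196995, -43.202216, -43.410754, -43.224938,
                  23.961872, -8.455051, -43.228694, 1.732406, -43.179829],
})

# Rough bounding box around the city of Rio de Janeiro (assumed limits)
LAT_MIN, LAT_MAX = -23.1, -22.7
LON_MIN, LON_MAX = -43.8, -43.1

# True where a point falls inside the box on both axes
inside = (coords["latitude"].between(LAT_MIN, LAT_MAX)
          & coords["longitude"].between(LON_MIN, LON_MAX))

# Names whose coordinates fall outside the box
suspect = coords.loc[~inside, "Neighbourhood"].tolist()
print(suspect)  # ['Lapa', 'Urca', 'Lagoa', 'Centro']
```

Bare names such as 'Centro' are ambiguous, so passing a more specific query to `geolocator.geocode`, for example 'Lapa, Rio de Janeiro, Brazil', would likely avoid most of these mismatches.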


Notice that 'Lapa', 'Urca', 'Lagoa' and 'Centro' received the wrong geographical coordinates: the geocoder matched places with the same names outside Rio de Janeiro. So I tried to change them manually in the df, as shown below.

In [69]:
#Lapa
df['latitude'].replace({-13.250571: -22.9136},inplace=True)
df['longitude'].replace({-43.410754: -43.1817},inplace=True)
#Urca
df['latitude'].replace(to_replace = 46.548852, value= -22.9528578552)
df['longitude'].replace({23.961872: -43.1599343603})
#Lagoa
df['latitude'].replace({37.132581: -22.97633})
df['longitude'].replace({-8.455051: -43.20966}) 
#Centro
df['latitude'].replace({47.549025: -22.9035})
df['longitude'].replace({1.732406: -43.2096})
df

| | Neighbourhood | latitude | longitude |
|---|---|---|---|
| 0 | Copacabana | -22.971964 | -43.184343 |
| 1 | Santa Teresa | -22.931948 | -43.196995 |
| 2 | Ipanema | -22.983956 | -43.202216 |
| 3 | Lapa | -22.9136 | -43.1817 |
| 4 | Leblon | -22.983556 | -43.224938 |
| 5 | Urca | 46.548852 | 23.961872 |
| 6 | Lagoa | 37.132581 | -8.455051 |
| 7 | Jardim Botanico | -22.968385 | -43.228694 |
| 8 | Centro | 47.549025 | 1.732406 |
| 9 | Botafogo | -22.948845 | -43.179829 |


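The replace calls only took effect for Lapa because those are the only two that pass `inplace=True`; the others return a new Series that is never assigned back, so df stays unchanged. As a workaround, I drop the affected rows and manually create a new df with the correct values to append to df.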

In [75]:
df.drop([5,6,8],inplace=True)

df2 = pd.DataFrame({"Neighbourhood":['Urca', 'Lagoa', 'Centro'],
                    "latitude":[-22.9528578552, -22.97633, -22.9035],
                    "longitude": [-43.1599343603,-43.20966,-43.2096]})


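# the combined frame must be assigned back to df; pd.concat replaces DataFrame.append, which was removed in pandas 2.0
df = pd.concat([df, df2], ignore_index=True)
df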

| | Neighbourhood | latitude | longitude |
|---|---|---|---|
| 0 | Copacabana | -22.971964 | -43.184343 |
| 1 | Santa Teresa | -22.931948 | -43.196995 |
| 2 | Ipanema | -22.983956 | -43.202216 |
| 3 | Lapa | -22.9136 | -43.1817 |
| 4 | Leblon | -22.983556 | -43.224938 |
| 5 | Jardim Botanico | -22.968385 | -43.228694 |
| 6 | Botafogo | -22.948845 | -43.179829 |
| 7 | Urca | -22.952858 | -43.159934 |
| 8 | Lagoa | -22.97633 | -43.20966 |
| 9 | Centro | -22.9035 | -43.2096 |
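
An alternative to dropping and re-adding rows is to select each row by neighbourhood name and overwrite its coordinates with `.loc`. This keeps the original row order and does not depend on matching exact float values. The snippet below is a minimal sketch on a small stand-in frame; `df_demo` and `corrections` are hypothetical names, and the values are the ones used above.

```python
import pandas as pd

# Small stand-in for df, holding the wrongly geocoded rows plus one correct row
df_demo = pd.DataFrame({
    "Neighbourhood": ["Lapa", "Urca", "Lagoa", "Centro", "Botafogo"],
    "latitude": [-13.250571, 46.548852, 37.132581, 47.549025, -22.948845],
    "longitude": [-43.410754, 23.961872, -8.455051, 1.732406, -43.179829],
})

# Manually looked-up coordinates, keyed by neighbourhood name
corrections = {
    "Lapa": (-22.9136, -43.1817),
    "Urca": (-22.9528578552, -43.1599343603),
    "Lagoa": (-22.97633, -43.20966),
    "Centro": (-22.9035, -43.2096),
}

# Overwrite the coordinates of each matching row in place
for name, (lat, lon) in corrections.items():
    mask = df_demo["Neighbourhood"] == name
    df_demo.loc[mask, "latitude"] = lat
    df_demo.loc[mask, "longitude"] = lon

print(df_demo)
```

Rows that are not listed in `corrections`, such as Botafogo here, are left untouched.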


Now let's update the Rio de Janeiro map with the neighbourhoods.

In [79]:
map_rj = folium.Map(location=[latituderj, longituderj], zoom_start=11.5)

# add markers to map
for label, lat, lng in zip(df['Neighbourhood'], df['latitude'], df['longitude']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_rj)  
    
map_rj

Next week we will explore the venues using the Foursquare API.