# The Battle of Neighbourhoods

![Image](https://www.tupiniquimhostel.com.br/wp/wp-content/uploads/2016/11/Cristo-Redentor-800x250.jpg)

## A Study of Brazilian Restaurants in the City of Rio de Janeiro

## Table of Contents

* [1. Introduction](#chapter1)
    * [1.1 Purpose](#section_1_1)
    * [1.2 Geographical location](#section_1_2)
    * [1.3 Problem Description](#section_1_3)
    * [1.4 Stakeholders](#section_1_4)
* [2. Data](#chapter2)
    * [2.1 Data Sources](#section_2_1)
        * [2.1.1 Libraries and necessary dependencies](#section_2_1_1)
        * [2.1.2 Rio de Janeiro Neighbourhood Data](#section_2_1_2)
        * [2.1.3 Rio de Janeiro Population Data](#section_2_1_3)
        * [2.1.4 Rio de Janeiro Foursquare Data](#section_2_1_4)
        * [2.1.5 Methodology](#section_2_1_5)
        

## 1. Introduction <a class='anchor' id='chapter1'></a>

### 1.1 Purpose <a class = 'anchor' id = 'section_1_1'></a>
>This Jupyter notebook is part of the Course Assignment for the IBM Data Science Professional Certificate.

### 1.2 Geographical location <a class = 'anchor' id = 'section_1_2'></a>
>Rio de Janeiro is the second most populous municipality in Brazil, with 6.72 million inhabitants according to 2019 statistics. The city has 16 boroughs and 163 neighbourhoods. It is the capital of the state of Rio de Janeiro and hosts the headquarters of Brazilian oil, mining and telecommunication companies. Rio de Janeiro is one of the most visited cities in South America and is known for the "Christ the Redeemer" statue, the "Sugarloaf" Mountain, Carnaval, Samba, Bossa Nova and its beaches (https://en.wikipedia.org/wiki/Rio_de_Janeiro). Another point of interest is the number of bars and restaurants offering tastes from all over the world. We will focus our study on the Brazilian Restaurants in the following two boroughs:
>* The South Zone (Zona Sul), https://en.wikipedia.org/wiki/Rio_de_Janeiro#South_Zone
>* The Central Zone (Centro), https://en.wikipedia.org/wiki/Rio_de_Janeiro#Central_Zone
>
>Together, these two boroughs have 32 neighbourhoods that are common locations for the stakeholders involved in this study.

### 1.3 Problem Description <a class = 'anchor' id = 'section_1_3'></a>
>The big challenge for a restaurant investor is knowing which restaurant category to invest in and where to place the restaurant to get the best return on the investment. The investor must take several factors into account, such as:
>
>* Who are the competitors in each borough/neighbourhood?
>* What are their prices?
>* What is on their menu?
>* What types of clients (locals, tourists, etc.) are there in each location?
>* What kind of borough/neighbourhood is it: a residential or a business area?
>* How many competitors are there for each restaurant category?
>* etc.
>
>For a tourist with a taste for the restaurants of the visited city, the challenge is to find the best spot and location, so other factors are important, such as:
>* What restaurant categories are there in each borough/neighbourhood?
>* What is the distance to each restaurant of interest?
>* etc.
>
>We will focus our study on the Brazilian Restaurant categories in different boroughs and neighbourhoods. Besides Brazilian restaurants with regional dishes, we will see that there are several Brazilian restaurants with specialities such as Acai, Churrasco, Empada, Pastelaria and Tapioca.
>
>Using some Data Science techniques, we will study these restaurant categories and locations and, as a result, make some observations and recommendations according to our findings.

### 1.4 Stakeholders <a class = 'anchor' id = 'section_1_4'></a>
> The people interested in this study will be investors in the Brazilian Restaurant business, but also tourists who would like to know the best locations and which kinds of Brazilian restaurants are available in the two boroughs, "The South Zone" and the "Central Zone" of Rio de Janeiro.

## 2. Data <a class = 'anchor' id ='chapter2'></a>

### 2.1 Data Sources <a class = 'anchor' id = 'section_2_1'></a>
>To carry out the Brazilian Restaurant study we will need the following data for Rio de Janeiro:
>* Neighbourhood Data
>* Population Data
>* Brazilian Restaurant Categories  

#### 2.1.1 Libraries and necessary dependencies <a class = 'anchor' id = 'section_2_1_1'></a>

>The following libraries and dependencies are needed to explore the Neighbourhood Data, the Population Data and the Brazilian Restaurant Categories of Rio de Janeiro:

In [1]:
import warnings
warnings.filterwarnings('ignore')

# For Numpy and Pandas Handling
import numpy as np
import pandas as pd

# For Geopandas handling
#!conda install -c conda-forge geopandas=0.3.0 --yes # Uncomment if not yet installed
#!conda install -c conda-forge geoplot=0.2.3 --yes # Uncomment this if not yet installed
#!conda install -c conda-forge geopy --yes # Uncomment this line if not yet installed
import geopandas as gpd
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
from geopandas import GeoSeries
from shapely.geometry.polygon import Polygon
from shapely.geometry.multipolygon import MultiPolygon

# To handle JSON files and requests
import requests
import json

# For plot handling
import matplotlib.cm as cm
import matplotlib.colors as colors
import matplotlib.pyplot as plt

# For k-means clustering
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from yellowbrick.cluster import KElbowVisualizer

# For map handling
#!conda install -c conda-forge folium=0.7.0 --yes # Uncomment if not yet installed
import folium

print('Dependencies downloaded')

Dependencies downloaded


#### 2.1.2 Rio de Janeiro Neighbourhood Data <a class = 'anchor' id = 'section_2_1_2'></a>

> The Data for the 'Neighbourhood' will come from the open data portal: www.data.rio
>
> The Dataset will be downloaded from www.data.rio/datasets/limite-de-bairros/data?geometry=-44.357%2C-23.138%2C-42.533%2C-22.695 via the following GeoJSON file: https://opendata.arcgis.com/datasets/dc94b29fc3594a5bb4d297bee0c9a3f2_15.geojson

##### 2.1.2.1 The Dataset for the Neighbourhoods

In [2]:
RioNeighbourhoodUrl = 'https://opendata.arcgis.com/datasets/dc94b29fc3594a5bb4d297bee0c9a3f2_15.geojson'

In [3]:
RioNeighbourhoodFeatures = requests.get(RioNeighbourhoodUrl).json()

In [4]:
RioNeighbourhoodData = gpd.GeoDataFrame.from_file(RioNeighbourhoodUrl)
RioNeighbourhoodData.head()

| | OBJECTID | Área | NOME | REGIAO_ADM | AREA_PLANE | CODBAIRRO | CODRA | CODBNUM | LINK | RP | Cod_RP | CODBAIRRO_LONG | SHAPESTArea | SHAPESTLength | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 325 | 1705685.0 | Paquetá | PAQUETA | 1 | 13 | 21 | 13 | Paqueta&area=013 ... | Centro | 1.1 | 13 | 1705685.0 | 24841.426669 | MULTIPOLYGON (((-43.10567 -22.74888, -43.10568... |
| 1 | 326 | 4056403.0 | Freguesia (Ilha) | ILHA DO GOVERNADOR | 3 | 98 | 20 | 98 | Freguesia (Ilha) &area=98 ... | Ilha do Governador | 3.7 | 98 | 4056403.0 | 18303.595717 | MULTIPOLYGON (((-43.17170 -22.77661, -43.17170... |
| 2 | 327 | 978046.5 | Bancários | ILHA DO GOVERNADOR | 3 | 97 | 20 | 97 | Bancários &area=97 ... | Ilha do Governador | 3.7 | 97 | 978046.5 | 7758.781282 | MULTIPOLYGON (((-43.18915 -22.78318, -43.18915... |
| 3 | 328 | 18957420.0 | Galeão | ILHA DO GOVERNADOR | 3 | 104 | 20 | 104 | Galeão &area=104 ... | Ilha do Governador | 3.7 | 104 | 18957420.0 | 21510.05922 | MULTIPOLYGON (((-43.22804 -22.78374, -43.22811... |
| 4 | 329 | 1672546.0 | Tauá | ILHA DO GOVERNADOR | 3 | 101 | 20 | 101 | Tauá &area=101 ... | Ilha do Governador | 3.7 | 101 | 1672546.0 | 8246.109606 | POLYGON ((-43.18039 -22.79940, -43.18022 -22.7... |


>We can observe that the downloaded Neighbourhood Data has more columns than we need for the analysis, and that the columns we will use need to be renamed for clarity.
>The only columns needed are the following:
>* 'RP' which will be named 'Borough'
>* 'NOME' which will be named 'Neighbourhood'
>* 'geometry' which is the polygon data of the Neighbourhood

##### 2.1.2.2 Drop, rename and reorder Columns

In [5]:
RioNeighbourhoodDataReduced = RioNeighbourhoodData.copy()
RioNeighbourhoodDataReduced.drop(columns = ['OBJECTID', 'Área', 'REGIAO_ADM', 'AREA_PLANE', 'CODBAIRRO', 'CODRA', 'CODBNUM', 'LINK','Cod_RP','CODBAIRRO_LONG','SHAPESTArea','SHAPESTLength'], inplace = True)
RioNeighbourhoodDataReduced.rename(columns = {'NOME': 'Neighbourhood', 'RP': 'Borough'}, inplace = True)
RioNeighbourhoodDataReduced = RioNeighbourhoodDataReduced[['Borough','Neighbourhood','geometry']]
RioNeighbourhoodDataReduced.head()

| | Borough | Neighbourhood | geometry |
|---|---|---|---|
| 0 | Centro | Paquetá | MULTIPOLYGON (((-43.10567 -22.74888, -43.10568... |
| 1 | Ilha do Governador | Freguesia (Ilha) | MULTIPOLYGON (((-43.17170 -22.77661, -43.17170... |
| 2 | Ilha do Governador | Bancários | MULTIPOLYGON (((-43.18915 -22.78318, -43.18915... |
| 3 | Ilha do Governador | Galeão | MULTIPOLYGON (((-43.22804 -22.78374, -43.22811... |
| 4 | Ilha do Governador | Tauá | POLYGON ((-43.18039 -22.79940, -43.18022 -22.7... |


>This is the Neighbourhood Dataset on which we will perform data wrangling and further exploration.
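>
>One wrangling step implied by the scope of the study is restricting the table to the two boroughs of interest. The following is a minimal sketch on toy rows, not the notebook's actual code. It assumes the South Zone appears in the 'Borough' column as 'Zona Sul' (only 'Centro' is visible in the output above), and the names `neighbourhoods`, `study_boroughs` and `selected` are hypothetical.

```python
import pandas as pd

# Toy rows shaped like the reduced Neighbourhood table (geometry omitted for brevity).
# ASSUMPTION: 'Zona Sul' is the label used for the South Zone in the 'Borough' column.
neighbourhoods = pd.DataFrame({
    'Borough': ['Centro', 'Ilha do Governador', 'Zona Sul', 'Zona Sul'],
    'Neighbourhood': ['Paquetá', 'Galeão', 'Copacabana', 'Ipanema'],
})

# Keep only the two boroughs covered by the study.
study_boroughs = ['Centro', 'Zona Sul']
selected = (
    neighbourhoods[neighbourhoods['Borough'].isin(study_boroughs)]
    .sort_values(['Borough', 'Neighbourhood'])
    .reset_index(drop=True)
)
print(selected)
```

>The same `isin` filter should also work on a GeoDataFrame, since it is a boolean row selection and leaves the 'geometry' column untouched.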

#### 2.1.3 Rio de Janeiro Population Data <a class = 'anchor' id = 'section_2_1_3'></a>

>The Data for the neighbourhood 'Population' numbers will come from the governmental site of IBGE (Brazilian Institute of Geography and Statistics): https://www.ibge.gov.br
>
>The Dataset will be constructed and downloaded from https://sidra.ibge.gov.br/Tabela/3175 via the following csv file: https://sidra.ibge.gov.br/geratabela?format=us.csv&name=tabela3175.csv&terr=NS&rank=-&query=t/3175/n102/all/v/allxp/p/all/c86/0/c1/0/c2/0/c287/0/l/v,p%2Bc86%2Bc1,t%2Bc2%2Bc287&measurescol=true

##### 2.1.3.1 The Dataset for the population

In [6]:
BrazilPopulationPerNeighbourhoodUrl = 'https://sidra.ibge.gov.br/geratabela?format=us.csv&name=tabela3175.csv&terr=NS&rank=-&query=t/3175/n102/all/v/allxp/p/all/c86/0/c1/0/c2/0/c287/0/l/v,p%2Bc86%2Bc1,t%2Bc2%2Bc287&measurescol=true'

In [7]:
BrazilPopulationPerNeighbourhoodData = pd.read_csv(BrazilPopulationPerNeighbourhoodUrl,skiprows = 5)

In [8]:
BrazilPopulationPerNeighbourhoodData.head()

| | Nível | Bairro | Sexo | Idade | Total |
|---|---|---|---|---|---|
| 0 | BA | Centro - Alta Floresta D'Oeste (RO) | Total | Total | 1960 |
| 1 | BA | Liberdade - Alta Floresta D'Oeste (RO) | Total | Total | 1075 |
| 2 | BA | Cidade Alta - Alta Floresta D'Oeste (RO) | Total | Total | 1175 |
| 3 | BA | Santa Felicidade - Alta Floresta D'Oeste (RO) | Total | Total | 2833 |
| 4 | BA | Princesa Isabel - Alta Floresta D'Oeste (RO) | Total | Total | 3067 |


>We can observe that the downloaded Population Data has more columns than we need for the analysis, and that the columns we will use need to be renamed for clarity.
>
>The columns needed for exploration are the following:
>* 'Bairro' which will be named 'NeighbourhoodComposed'
>* 'Total' which will be named 'Population'

##### 2.1.3.2 Drop and rename Columns

In [9]:
BrazilPopulationPerNeighbourhoodDataReduced = BrazilPopulationPerNeighbourhoodData.copy()
BrazilPopulationPerNeighbourhoodDataReduced.drop(columns = ['Nível','Sexo','Idade'], inplace = True)
BrazilPopulationPerNeighbourhoodDataReduced.rename(columns = {'Bairro': 'NeighbourhoodComposed', 'Total': 'Population'}, inplace = True)
BrazilPopulationPerNeighbourhoodDataReduced.head()

| | NeighbourhoodComposed | Population |
|---|---|---|
| 0 | Centro - Alta Floresta D'Oeste (RO) | 1960 |
| 1 | Liberdade - Alta Floresta D'Oeste (RO) | 1075 |
| 2 | Cidade Alta - Alta Floresta D'Oeste (RO) | 1175 |
| 3 | Santa Felicidade - Alta Floresta D'Oeste (RO) | 2833 |
| 4 | Princesa Isabel - Alta Floresta D'Oeste (RO) | 3067 |


>This is the Population Dataset on which we will perform data wrangling and further exploration.
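>
>The 'NeighbourhoodComposed' values combine neighbourhood, municipality and state in one string, so they have to be split before the Rio de Janeiro rows can be selected. The following is a minimal sketch of one way to do that with a regular expression. It assumes every label follows the "Neighbourhood - Municipality (UF)" pattern seen above; the helper names and the second sample row (including its population value) are hypothetical placeholders, not real data.

```python
import re

# Pattern for labels such as "Centro - Alta Floresta D'Oeste (RO)".
# The greedy first group splits on the LAST " - ", so a neighbourhood
# name that itself contains " - " stays intact.
LABEL_PATTERN = re.compile(
    r'^(?P<neighbourhood>.+) - (?P<municipality>.+) \((?P<state>[A-Z]{2})\)$'
)

def split_label(label):
    """Return (neighbourhood, municipality, state), or None if the label does not match."""
    match = LABEL_PATTERN.match(label.strip())
    if match is None:
        return None
    return match.group('neighbourhood'), match.group('municipality'), match.group('state')

def rows_for_municipality(rows, municipality, state):
    """Return (neighbourhood, population) pairs for one municipality and state."""
    selected = []
    for label, population in rows:
        parts = split_label(label)
        if parts is not None and parts[1:] == (municipality, state):
            selected.append((parts[0], population))
    return selected

sample_rows = [
    ("Centro - Alta Floresta D'Oeste (RO)", 1960),  # row taken from the output above
    ("Copacabana - Rio de Janeiro (RJ)", 12345),    # HYPOTHETICAL label, placeholder value
]
print(rows_for_municipality(sample_rows, 'Rio de Janeiro', 'RJ'))
```

>Matching on both municipality and state avoids picking up neighbourhoods of a same-named municipality in another state, should one exist in the table.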

#### 2.1.4 Rio de Janeiro Foursquare Data <a class = 'anchor' id = 'section_2_1_4'></a>
>The Data on Brazilian Restaurants will be retrieved from the Foursquare location platform, https://developer.foursquare.com
>
>To limit the types of venues, we will use the "Brazilian Restaurant Category" that can be found at https://developer.foursquare.com/docs/build-with-foursquare/categories/

##### 2.1.4.1 The Foursquare Categories to be used

>These are the different restaurant categories that we will try to find and explore in each neighbourhood:
>
>* Brazilian Restaurant
>    * Acai House
>    * Baiano Restaurant
>    * Central Brazilian Restaurant
>    * Churrascaria
>    * Empada House
>    * Goiano Restaurant
>    * Mineiro Restaurant
>    * Northeastern Brazilian Restaurant
>    * Northern Brazilian Restaurant
>    * Pastelaria
>    * Southeastern Brazilian Restaurant
>    * Southern Brazilian Restaurant
>    * Tapiocaria

##### 2.1.4.2 The Foursquare Credentials and Version

In [10]:
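# @hidden_cell
# Read the credentials from environment variables instead of hard-coding them.
# The variable names below are an assumption; use whatever names your setup defines.
import os
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID']
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET']
VERSION = '20180605'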
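
>The credentials and version are sent as query parameters with each Foursquare request. The following is a minimal sketch of how such a request URL could be assembled. It assumes the legacy v2 `venues/search` endpoint; the helper name `build_search_url`, the default `radius` and `limit`, the approximate coordinates and the category ID placeholder are all hypothetical, and no request is actually sent.

```python
from urllib.parse import urlencode

def build_search_url(client_id, client_secret, version, lat, lng,
                     category_id, radius=500, limit=50):
    """Build a venue search URL restricted to one category around a point."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,               # API version date, e.g. '20180605'
        'll': f'{lat},{lng}',       # latitude,longitude of the search centre
        'categoryId': category_id,  # restricts results to one venue category
        'radius': radius,           # search radius in metres
        'limit': limit,             # maximum number of venues returned
    }
    return 'https://api.foursquare.com/v2/venues/search?' + urlencode(params)

# Placeholder values only: real credentials and the real category ID are not shown.
url = build_search_url(
    client_id='YOUR_CLIENT_ID',
    client_secret='YOUR_CLIENT_SECRET',
    version='20180605',
    lat=-22.9068, lng=-43.1729,  # approximate coordinates of Rio de Janeiro
    category_id='BRAZILIAN_RESTAURANT_CATEGORY_ID',
)
print(url)
```

>In the notebook, the resulting URL could be passed to `requests.get(url).json()`, in the same way the GeoJSON file is fetched earlier.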

#### 2.1.5 Methodology <a class = 'anchor' id = 'section_2_1_5'></a>
>To find the right neighbourhood(s) for the stakeholders, we will explore the demographics of the Rio de Janeiro neighbourhoods by segmenting the data and performing descriptive analysis with pandas. First, we will perform data wrangling on the Neighbourhood and Population Datasets, select the data we need, and merge them before any further analysis.
>
>Secondly, with the two Datasets cleaned and merged, we will use the merged Dataset to explore the Neighbourhoods and discover the Brazilian restaurants through the Foursquare location platform. With the Restaurant Data from Foursquare, we will then cluster the Neighbourhoods using k-means. To find the best K value we will use the Yellowbrick KElbowVisualizer. We will also visualize and examine these clusters.
>
>Thirdly, after some final exploration and analysis, we will draw conclusions from the data and give recommendations based on this Brazilian restaurant study.
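>
>The idea behind the elbow method can be sketched with plain scikit-learn: fit k-means for several values of K and look at where the inertia (within-cluster sum of squares) stops dropping sharply. This is an illustrative sketch on made-up data, not the notebook's analysis. It computes the inertia curve by hand rather than using the Yellowbrick KElbowVisualizer, and the feature matrix (one row per neighbourhood, one column per restaurant category) is an assumption about how the Foursquare data would be shaped.

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up feature matrix: 30 "neighbourhoods" x 3 "restaurant categories",
# generated around three distinct category profiles.
rng = np.random.default_rng(0)
centres = np.array([[0.8, 0.1, 0.1],
                    [0.1, 0.8, 0.1],
                    [0.1, 0.1, 0.8]])
features = np.vstack([c + rng.normal(scale=0.03, size=(10, 3)) for c in centres])

# Fit k-means for K = 1..6 and record the inertia of each fit.
inertias = {}
for k in range(1, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(features)
    inertias[k] = model.inertia_

for k, inertia in inertias.items():
    print(k, round(inertia, 3))
```

>On this toy data the inertia drops steeply up to K = 3 and flattens afterwards, which is the "elbow" the visualizer is meant to locate automatically.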