# THE BATTLE OF NEIGHBORHOODS FROM COURSERA CAPSTONE

By Gastón Costa

# Introduction

The problem:

Hello, my name is Gastón. I am from Argentina, but I have lived in Barcelona for 10 years. My wife runs a fashion design project and has a store in El Eixample, a very fashionable neighborhood that is home to many small design entrepreneurs.

We will be visiting Toronto, and we believe the trip is a good opportunity to contact new partners and expand the business. However, we have never been to the city and do not know where to find similar neighborhoods.

Business Model:

Through this work, I want to analyze the different neighborhoods of Toronto and look for potential partners and clients for our design business.

# Data

The following data will be used to solve the problem:

- List of boroughs and neighborhoods of Barcelona with their geodata;
- List of boroughs and neighborhoods of Toronto with their geodata;
- List of accessory stores in Barcelona with their geodata;
- List of accessory stores in Toronto with their geodata.

# Data Sources

- Boroughs and neighborhoods of Barcelona from Wikipedia (https://es.wikipedia.org/wiki/Distritos_de_Barcelona);
- Boroughs and neighborhoods of Toronto from Wikipedia (https://es.wikipedia.org/wiki/%C3%81rea_metropolitana_de_Toronto);
- Geocoding information from Geopy;
- Accessory stores in Barcelona and Toronto from Foursquare.

# Methodology


- The district where I live in Barcelona will be identified and located.
- Stores related to design and accessories in that district will be explored.
- Toronto's neighborhoods will be analyzed and compared to find the best one to explore on our visit.

# Importing libraries

In [1]:
import numpy as np # library to handle data in a vectorized manner
import time
import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files
import requests # library to handle requests
from pandas import json_normalize # transform JSON into a pandas dataframe (pandas >= 1.0; older versions use pandas.io.json)


!pip install geopy 
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

!pip install folium
import folium # map rendering library
from folium import plugins

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

import seaborn as sns

# import k-means for the clustering stage
from sklearn.cluster import KMeans



# First: getting the data of Barcelona and processing them

In [2]:
link_barcelona = 'https://es.wikipedia.org/wiki/Distritos_de_Barcelona'
Barna= pd.read_html(link_barcelona)[0]
Barna

Unnamed: 0,Nº,Distrito,Imagen,Superficie km²,Población (2016)[1]​,Densidad hab/km²,Barrios (nº),Regidor
0,1,Ciutat Vella,,449,100 070,"22 424,28","El Raval (1), Barrio Gótico (2), La Barcelonet...",Gala Pin Ferrando (Barcelona en Comú)
1,2,Eixample,,746,264 305,"35 330,43","El Fort Pienc (5), Sagrada Familia (6), Dreta ...",Agustí Colom Cabau (Barcelona en Comú)
2,3,Sants - Montjuïc,,2135,180 977,846951,"Pueblo Seco (11), La Marina del Prat Vermell (...",Jaume Asens Llodrà (Barcelona en Comú)
3,4,Les Corts,,608,81 642,"13 355,26","Les Corts (19), La Maternidad y San Ramón (20)...",Laura Pérez Castaño (Barcelona en Comú)
4,5,Sarrià - Sant Gervasi,,2009,148 026,725540,"Vallvidrera, Tibidabo i les Planes (22), Sarri...",Gerardo Pisarello Prados (Barcelona en Comú)
5,6,Gràcia,,419,120 918,"28 704,77","Vallcarca y los Penitentes (28), El Coll (29),...",Raimundo Viejo (Barcelona en Comú)
6,7,Horta - Guinardó,,1196,167 268,"13 959,03","Baix Guinardó (33), Can Baró (34), El Guinardó...",Mercedes Vidal Lago (Barcelona en Comú)
7,8,Nou Barris,,804,164 881,"20 462,19","Vilapicina y La Torre Llobeta (44), Porta (45)...",Janet Sanz Cid (Iniciativa per Catalunya - Verds)
8,9,Sant Andreu,,656,146 731,"22 253,51","La Trinitat Vella (57), Baró de Viver (58), El...",Laia Ortiz i Castellví (Iniciativa per Catalun...
9,10,Sant Martí,,1080,233 928,"21 539,72","El Campo del Arpa del Clot (64), El Clot (65),...",Josep Maria Montaner Martorell (Barcelona en C...


In [3]:
# Selecting the columns we need and renaming them in English.
# rename() returns a new dataframe here, which avoids the SettingWithCopyWarning
# raised by an in-place rename on a column selection.
Barna = Barna[['Distrito','Barrios (nº)']].rename(columns = {'Distrito': 'District', 'Barrios (nº)': 'Neighbourhood'})

Barna


Unnamed: 0,District,Neighbourhood
0,Ciutat Vella,"El Raval (1), Barrio Gótico (2), La Barcelonet..."
1,Eixample,"El Fort Pienc (5), Sagrada Familia (6), Dreta ..."
2,Sants - Montjuïc,"Pueblo Seco (11), La Marina del Prat Vermell (..."
3,Les Corts,"Les Corts (19), La Maternidad y San Ramón (20)..."
4,Sarrià - Sant Gervasi,"Vallvidrera, Tibidabo i les Planes (22), Sarri..."
5,Gràcia,"Vallcarca y los Penitentes (28), El Coll (29),..."
6,Horta - Guinardó,"Baix Guinardó (33), Can Baró (34), El Guinardó..."
7,Nou Barris,"Vilapicina y La Torre Llobeta (44), Porta (45)..."
8,Sant Andreu,"La Trinitat Vella (57), Baró de Viver (58), El..."
9,Sant Martí,"El Campo del Arpa del Clot (64), El Clot (65),..."


In Barcelona the neighborhoods are very small, so I will run the analysis at the level of the Eixample district.


In [4]:
# Selecting my district
My_district = Barna[Barna['District'].str.contains('Eixample', na = False)]

# Getting the latitude and longitude of my district
address = My_district['District'].values[0] + ', Barna'
geolocator = Nominatim(user_agent="ny_explorer") # recent geopy versions require an explicit user_agent
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of my district are {}, {}.'.format(latitude, longitude))

The geographical coordinates of my district are 41.3933942, 2.1660849840426866.
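
geopy's `geocode` returns `None` when an address cannot be resolved, in which case reading `location.latitude` raises an `AttributeError`. A minimal sketch of a guard follows. The helper name `coordinates_for` is hypothetical; it accepts any geocoding callable, for example `geolocator.geocode`.

```python
from typing import Callable, Optional, Tuple


def coordinates_for(geocode: Callable[[str], Optional[object]], address: str) -> Tuple[float, float]:
    """Return (latitude, longitude) for an address or raise a clear error.

    `geocode` is any callable shaped like geopy's `geolocator.geocode`:
    it takes an address string and returns an object with `.latitude` and
    `.longitude` attributes, or None when nothing is found.
    """
    location = geocode(address)
    if location is None:
        raise ValueError('No geocoding result for address: {!r}'.format(address))
    return location.latitude, location.longitude
```

In the notebook this could be called as `latitude, longitude = coordinates_for(geolocator.geocode, address)`.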




In [5]:
# Creating a map of Barcelona centred on my district.
map_my_district = folium.Map(location=[latitude, longitude], zoom_start= 15)

folium.CircleMarker(
    [latitude, longitude],
    radius=5,
    color='blue',
    fill=True,
    fill_color='#3186cc',
    fill_opacity=0.7,
    parse_html=False).add_to(map_my_district)

map_my_district

# Extracting the number of shops in the district area

In [5]:
# Data from Foursquare
CLIENT_ID = '50W2M4K3ZWEVPUIT5GFGACTGJW1L0MBD34ICNCDRRCAF1XOR' # your Foursquare ID
CLIENT_SECRET = '1T4YYBZPYOIMVGGB5EE0QE05IDF3JY2ZCSMGSQGZDCLMPRDQ' # your Foursquare Secret
VERSION = '20191206'
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: 50W2M4K3ZWEVPUIT5GFGACTGJW1L0MBD34ICNCDRRCAF1XOR
CLIENT_SECRET:1T4YYBZPYOIMVGGB5EE0QE05IDF3JY2ZCSMGSQGZDCLMPRDQ
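
Printing API keys in a shared notebook exposes them to every reader. A minimal sketch of an alternative follows, assuming the keys are stored in environment variables. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helper functions are hypothetical.

```python
import os
from typing import Mapping, Tuple


def load_foursquare_credentials(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Read the Foursquare keys from environment variables.

    The variable names are assumptions made for this sketch; use whatever
    names your own environment defines.
    """
    try:
        return env['FOURSQUARE_CLIENT_ID'], env['FOURSQUARE_CLIENT_SECRET']
    except KeyError as missing:
        raise RuntimeError('Missing environment variable: {}'.format(missing)) from None


def mask(secret: str, visible: int = 4) -> str:
    """Hide all but the first few characters so a value can be printed safely."""
    return secret[:visible] + '*' * max(len(secret) - visible, 0)
```

With these helpers the cell could run `CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials()` and print `mask(CLIENT_SECRET)` instead of the full value.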


In [6]:
# We choose to search by category with a 600m radius.
radius = 600
LIMIT = 100
category_id = '4bf58dd8d48988d102951735' #ID for Accessory stores

# Define the corresponding URL
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&categoryId={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, category_id, radius, LIMIT)

# Send the GET Request
results = requests.get(url).json()

# Get relevant part of JSON and transform it into a pandas dataframe
# assign relevant part of JSON to venues
venues = results['response']['venues']

# transform venues into a dataframe
dataframe = json_normalize(venues)

# keep only columns that include venue name, and anything that is associated with location
filtered_columns = ['name', 'categories'] + [col for col in dataframe.columns if col.startswith('location.')] + ['id']
dataframe_filtered = dataframe.loc[:, filtered_columns]

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        # fall back to the 'venue.categories' column when 'categories' is absent
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)

# clean column names by keeping only last term
dataframe_filtered.columns = [column.split('.')[-1] for column in dataframe_filtered.columns]

dataframe_filtered

Unnamed: 0,name,categories,address,cc,city,country,crossStreet,distance,formattedAddress,labeledLatLngs,lat,lng,postalCode,state,id
0,Loewe,Accessories Store,"Passeig de Gràcia, 35",ES,Barcelona,España,,231,"[Passeig de Gràcia, 35, 08007 Barcelona Catalu...","[{'label': 'display', 'lat': 41.39134354229618...",41.391344,2.165631,8007.0,Cataluña,4adcda5af964a5203f4421e3
1,Bimba & Lola,Women's Store,"Valencia, 272",ES,Barcelona,España,,193,"[Valencia, 272, 08007 Barcelona Cataluña, España]","[{'label': 'display', 'lat': 41.39278077769239...",41.392781,2.163918,8007.0,Cataluña,4cdecc8c3644a09375574e9f
2,Desigual,Clothing Store,"Passeig de Gràcia, 47",ES,Barcelona,España,C. Aragó,198,"[Passeig de Gràcia, 47 (C. Aragó), 08007 Barce...","[{'label': 'display', 'lat': 41.39193652224534...",41.391937,2.164725,8007.0,Cataluña,4c051cccf423a593c079d216
3,Louis Vuitton,Boutique,"Paseo de Gracia, 80",ES,Barcelona,España,,253,"[Paseo de Gracia, 80, 08008 Barcelona Cataluña...","[{'label': 'display', 'lat': 41.3941229, 'lng'...",41.394123,2.163205,8008.0,Cataluña,4bc87cd98b7c9c74c22d38cf
4,Uterqüe,Design Studio,"Passeig de Gràcia, 65",ES,Barcelona,España,València,230,"[Passeig de Gràcia, 65 (València), 08008 Barce...","[{'label': 'display', 'lat': 41.39331530006156...",41.393315,2.163331,8008.0,Cataluña,4bc5cd266c26b71326bdebf3
5,Hermès,Boutique,"Paseo de Gracia, 77",ES,Barcelona,España,,315,"[Paseo de Gracia, 77, 08008 Barcelona Cataluña...","[{'label': 'display', 'lat': 41.39396625948295...",41.393966,2.162387,8008.0,Cataluña,4c0148b8cf3aa593b8c4ccb0
6,GUESS,Clothing Store,Passeig De Gràcia,ES,Barcelona,España,Gran Vía de les Corts Catalanes,448,[Passeig De Gràcia (Gran Vía de les Corts Cata...,"[{'label': 'display', 'lat': 41.38950967297749...",41.38951,2.16751,8007.0,Cataluña,4eecb1820e01899efeca9083
7,Biba,Accessories Store,Consell de Cent,ES,,España,,144,"[Consell de Cent, España]","[{'label': 'display', 'lat': 41.39233586461486...",41.392336,2.165078,,,4ef1c5f56c253bf12305e95d
8,Natura Selection,Accessories Store,Consejo de Ciento 304,ES,Barcelona,España,,309,"[Consejo de Ciento 304, 08007 Barcelona Catalu...","[{'label': 'display', 'lat': 41.39068750234346...",41.390688,2.165236,8007.0,Cataluña,4adcda5bf964a520544421e3
9,Cartier,Jewelry Store,"Paseo de Gracia, 82",ES,Barcelona,España,,309,"[Paseo de Gracia, 82, 08008 Barcelona Cataluña...","[{'label': 'display', 'lat': 41.3944011, 'lng'...",41.394401,2.162628,8008.0,Cataluña,4adcda59f964a5201f4421e3
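
The search URL in the cell above is assembled with `str.format`, where it is easy to pass the arguments in the wrong order. As a sketch, the same query can be built from a dictionary and encoded with the standard library. The helper name `build_search_url` is hypothetical; the endpoint and parameter names are the ones used in the cell above.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = 'https://api.foursquare.com/v2/venues/search'


def build_search_url(client_id, client_secret, latitude, longitude,
                     version, category_id, radius, limit):
    """Build the venue-search URL with the same parameters as the cell above."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'll': '{},{}'.format(latitude, longitude),
        'v': version,
        'categoryId': category_id,
        'radius': radius,
        'limit': limit,
    }
    # safe=',' keeps the comma in "lat,lng" readable instead of encoding it as %2C
    return SEARCH_ENDPOINT + '?' + urlencode(params, safe=',')
```

The result can be passed to `requests.get(...)` as before. Alternatively, `requests.get(SEARCH_ENDPOINT, params=params)` accepts the dictionary directly and performs the encoding itself.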


In [7]:
print('The total number of shops within a {} m radius is: {}'.format(radius, dataframe_filtered.shape[0]))

The total number of shops within a 600 m radius is: 49


In [8]:
new_df = dataframe_filtered.drop(['labeledLatLngs','cc','formattedAddress','distance', 'postalCode','crossStreet',
                                  'country'], axis=1)
new_df.head()

Unnamed: 0,name,categories,address,city,lat,lng,state,id
0,Loewe,Accessories Store,"Passeig de Gràcia, 35",Barcelona,41.391344,2.165631,Cataluña,4adcda5af964a5203f4421e3
1,Bimba & Lola,Women's Store,"Valencia, 272",Barcelona,41.392781,2.163918,Cataluña,4cdecc8c3644a09375574e9f
2,Desigual,Clothing Store,"Passeig de Gràcia, 47",Barcelona,41.391937,2.164725,Cataluña,4c051cccf423a593c079d216
3,Louis Vuitton,Boutique,"Paseo de Gracia, 80",Barcelona,41.394123,2.163205,Cataluña,4bc87cd98b7c9c74c22d38cf
4,Uterqüe,Design Studio,"Passeig de Gràcia, 65",Barcelona,41.393315,2.163331,Cataluña,4bc5cd266c26b71326bdebf3


In [76]:
venues_map = folium.Map(location=[latitude, longitude], zoom_start=15) # generate a map centred on Eixample

# add a red circle marker to represent the centre of the district
folium.vector_layers.CircleMarker(
    [latitude,longitude],
    radius=10,
    color='red',
    popup='Eixample',
    fill = True,
    fill_color = 'red',
    fill_opacity = 0.6
).add_to(venues_map)

# add the shops as blue circle markers
for lat, lng, label in zip(new_df.lat, new_df.lng, new_df.categories):
    folium.vector_layers.CircleMarker(
        [lat,lng],
        radius=5,
        color='blue',
        popup=label,
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(venues_map)

# display map
venues_map


There are 49 accessory stores within 600 m of the centre of my district. Next, we search for the same kind of stores in Toronto.

# Gathering Toronto data

In [9]:
# Importing the neighbourhood data

link = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'

df = pd.read_html(link)[0]
df

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
5,M6A,North York,Lawrence Heights
6,M6A,North York,Lawrence Manor
7,M7A,Downtown Toronto,Queen's Park
8,M8A,Not assigned,Not assigned
9,M9A,Queen's Park,Not assigned


In [10]:
# Discarding the rows whose borough is 'Not assigned'
df = df[df['Borough'] != 'Not assigned']
df.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
5,M6A,North York,Lawrence Heights
6,M6A,North York,Lawrence Manor


In [11]:
# Joining the neighbourhoods of each postcode with commas and dropping duplicate postcodes.
# assign() and drop_duplicates() return new dataframes, which avoids the
# SettingWithCopyWarning raised by modifying a filtered slice in place.
df = df.assign(Neighbourhood = df.groupby("Postcode")['Neighbourhood'].transform(lambda neigh: ', '.join(neigh)))
df = df.drop_duplicates(subset = "Postcode")
df.head(20)


Unnamed: 0,Postcode,Borough,Neighbourhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
5,M6A,North York,"Lawrence Heights, Lawrence Manor"
7,M7A,Downtown Toronto,Queen's Park
9,M9A,Queen's Park,Not assigned
10,M1B,Scarborough,"Rouge, Malvern"
13,M3B,North York,Don Mills North
14,M4B,East York,"Woodbine Gardens, Parkview Hill"
16,M5B,Downtown Toronto,"Ryerson, Garden District"
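
The `groupby`/`transform` step merges all neighbourhoods that share a postcode into one comma-separated string, and `drop_duplicates` then keeps one row per postcode. To make the logic explicit, here is a plain-Python sketch of the same idea on a few rows taken from the table above. The function name `join_by_postcode` is hypothetical, and this is an illustration rather than the pandas implementation.

```python
def join_by_postcode(rows):
    """Collapse (postcode, borough, neighbourhood) rows to one row per postcode.

    Neighbourhoods sharing a postcode are joined with ', ' in their original
    order; the borough of the first occurrence is kept.
    """
    grouped = {}  # postcode -> [borough, list of neighbourhoods]
    for postcode, borough, neighbourhood in rows:
        if postcode not in grouped:
            grouped[postcode] = [borough, []]
        grouped[postcode][1].append(neighbourhood)
    return [(postcode, borough, ', '.join(names))
            for postcode, (borough, names) in grouped.items()]


rows = [
    ('M5A', 'Downtown Toronto', 'Harbourfront'),
    ('M6A', 'North York', 'Lawrence Heights'),
    ('M6A', 'North York', 'Lawrence Manor'),
]
print(join_by_postcode(rows))
```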


In [12]:
filename = 'https://cocl.us/Geospatial_data'

geodata = pd.read_csv(filename)
geodata.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [13]:
df.index = range(df.shape[0])
finaldf = df.merge(geodata, left_on = 'Postcode', right_on = 'Postal Code', how = 'left').drop (columns= 'Postal Code')
finaldf.head(10)

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,Harbourfront,43.65426,-79.360636
3,M6A,North York,"Lawrence Heights, Lawrence Manor",43.718518,-79.464763
4,M7A,Downtown Toronto,Queen's Park,43.662301,-79.389494
5,M9A,Queen's Park,Not assigned,43.667856,-79.532242
6,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353
7,M3B,North York,Don Mills North,43.745906,-79.352188
8,M4B,East York,"Woodbine Gardens, Parkview Hill",43.706397,-79.309937
9,M5B,Downtown Toronto,"Ryerson, Garden District",43.657162,-79.378937


In [14]:
# Merging in the coordinates and keeping only the boroughs whose name contains 'Toronto'.
finaldf = df.merge(geodata, left_on = 'Postcode', right_on = 'Postal Code', how = 'left').drop (columns= 'Postal Code')
finaldf = finaldf[finaldf['Borough'].str.contains('Toronto', na = False)]
print(finaldf.shape)
finaldf.dropna()

(39, 5)


Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
2,M5A,Downtown Toronto,Harbourfront,43.65426,-79.360636
4,M7A,Downtown Toronto,Queen's Park,43.662301,-79.389494
9,M5B,Downtown Toronto,"Ryerson, Garden District",43.657162,-79.378937
15,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418
19,M4E,East Toronto,The Beaches,43.676357,-79.293031
20,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306
24,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383
25,M6G,Downtown Toronto,Christie,43.669542,-79.422564
30,M5H,Downtown Toronto,"Adelaide, King, Richmond",43.650571,-79.384568
31,M6H,West Toronto,"Dovercourt Village, Dufferin",43.669005,-79.442259


In [15]:
# Creating a map of Toronto
address = 'Toronto'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.653963, -79.387207.


In [60]:
lat = latitude
long = longitude

map_TOR = folium.Map(location=[lat, long], zoom_start=11)

for lat, long, label in zip(finaldf['Latitude'], finaldf['Longitude'], finaldf['Neighbourhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, long],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_TOR)  
    
map_TOR

## Iterating over all the Toronto neighbourhoods to find the one with the most shops

In [17]:
# Initializing the number of shops
N_shop = []

In [38]:
# Search parameters: Accessory stores category within a 500 m radius.
radius = 500
LIMIT = 100
category_id = '4bf58dd8d48988d102951735' #ID for Accessory stores

# Start from an empty list so that re-running this cell does not duplicate counts
N_shop = []

# finaldf kept the row labels of the unfiltered table (2, 4, 9, ...),
# so iterate over its rows instead of indexing with finaldf['Latitude'][i]
for i, row in enumerate(finaldf.itertuples(index=False)):

    # Define the corresponding URL
    url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&categoryId={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, row.Latitude, row.Longitude, VERSION, category_id, radius, LIMIT)

    # Send the GET Request
    results = requests.get(url).json()

    # The number of venues returned is the number of shops found;
    # counting the list directly also works when no venue is returned
    venues = results['response']['venues']

    print(str(i) + ') The number of shops in ' + row.Neighbourhood + ' is ' + str(len(venues)) + '\n')
    N_shop.append(len(venues))


In [21]:
# N_shop now has exactly one entry per row of finaldf
finaldf = finaldf.assign(**{'Number of shops': N_shop})
finaldf.head()
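
Once every neighbourhood has a shop count, the comparison can be sketched as a simple ranking by number of shops. The counts below are made-up placeholders, not results from the Foursquare queries, and `rank_neighbourhoods` is a hypothetical helper.

```python
def rank_neighbourhoods(counts, top=3):
    """Return the `top` (neighbourhood, count) pairs with the most shops.

    Ties are broken alphabetically so the result is deterministic.
    """
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ordered[:top]


# Placeholder counts for illustration only
example_counts = {
    'Neighbourhood A': 4,
    'Neighbourhood B': 12,
    'Neighbourhood C': 12,
    'Neighbourhood D': 0,
}
print(rank_neighbourhoods(example_counts, top=2))
```

On the dataframe itself, `finaldf.sort_values('Number of shops', ascending=False).head()` would give the equivalent ranking once the `Number of shops` column exists.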