# THE BATTLE OF NEIGHBORHOODS FROM COURSERA CAPSTONE

By Gastón Costa

# Introduction

The problem:

Hello, my name is Gastón. I am from Argentina but I have been living in Barcelona for 10 years. My wife has a fashion design project. She has a store in El Eixample, a very fashionable neighborhood and where there are many small entrepreneurs related to design.

We will be visiting New York and we believe it is a good possibility to contact new partners and to expand the business. But we have never been to this city and we don't know where to find similar neighborhoods.

Business Model:

Through this work, I want to analyze the different neighborhoods of New York and look for potential partners / clients for design entrepreneurship.

# Data

Description of the data and how it will be used to solve the problem
The following data will be used:

- List of Boroughs and neighborhoods of Barcelona with their geodata;
- List of Boroughs and neighborhoods of New York with their geodata;
- List of place whit gluten free description of Barcelona with their geodata;
- List of place whit gluten free description of New York with their geodata.

# Data Sources

- Boroughs and neighborhoods of Barcelona from Wikipedia (https://es.wikipedia.org/wiki/Distritos_de_Barcelona);
- Boroughs and neighborhoods of New York from Wikipedia (https://en.wikipedia.org/wiki/Neighborhoods_in_New_York_City);
- Geocode information from Geopy;
- Accesories Shops in Barcelona and Toronto from Foursquare

# Methodology


- The neighborhood where I live in Barcelona will be individualized
- Stores related to design and accessories will be explored
- New York's various neighborhoods will be analyzed and a comparison will be made to find the best neighborhood to explore on our visit.

# Importing libraries

In [1]:
import numpy as np # library to handle data in a vectorized manner
import time
import pandas as pd # library for data analsysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files
import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe


!pip install geopy 
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

!pip install folium
import folium # map rendering library
from folium import plugins

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

import seaborn as sns

# import k-means from clustering stage
from sklearn.cluster import KMeans



# First: getting the data of Barcelona and processing them

In [2]:
link_barcelona = 'https://es.wikipedia.org/wiki/Distritos_de_Barcelona'
Barna= pd.read_html(link_barcelona)[0]
Barna

Unnamed: 0,Nº,Distrito,Imagen,Superficie km²[1]​,Población (2016)[2]​,Densidad hab/km²,Barrios (nº),Regidor
0,1,Ciudad Vieja,,411,100 070,"22 424,28","El Raval (1), Barrio Gótico (2), La Barcelonet...",Gala Pin Ferrando (Barcelona en Comú)
1,2,Ensanche,,746,264 305,"35 330,43","El Fort Pienc (5), Sagrada Familia (6), Dreta ...",Agustí Colom Cabau (Barcelona en Comú)
2,3,Sants-Montjuïc,,2268,180 977,846951,"Pueblo Seco (11), La Marina del Prat Vermell (...",Jaume Asens Llodrà (Barcelona en Comú)
3,4,Les Corts,,602,81 642,"13 355,26","Les Corts (19), La Maternidad y San Ramón (20)...",Laura Pérez Castaño (Barcelona en Comú)
4,5,Sarriá-San Gervasio,,1990,148 026,725540,"Vallvidrera, el Tibidabo i les Planes (22), Sa...",Albert Batlle (Units per Avançar)
5,6,Gracia,,419,120 918,"28 704,77","Vallcarca y los Penitentes (28), El Coll (29),...",Raimundo Viejo (Barcelona en Comú)
6,7,Horta - Guinardó,,1196,167 268,"13 959,03","Baix Guinardó (33), Can Baró (34), El Guinardó...",Mercedes Vidal Lago (Barcelona en Comú)
7,8,Nou Barris,,805,164 881,"20 462,19","Vilapicina y La Torre Llobeta (44), Porta (45)...",Janet Sanz Cid (Iniciativa per Catalunya - Verds)
8,9,San Andrés,,659,146 731,"22 253,51","La Trinitat Vella (57), Baró de Viver (58), El...",Laia Ortiz i Castellví (Iniciativa per Catalun...
9,10,San Martín,,1039,233 928,"21 539,72","El Campo del Arpa del Clot (64), El Clot (65),...",Josep Maria Montaner Martorell (Barcelona en C...


In [3]:
#Selecting and change the names of columns
Barna = Barna[['Distrito','Barrios (nº)']]
Barna.rename(columns = {'Distrito': 'District', 'Barrios (nº)': 'Neighbourhood'}, inplace = True)
Barna

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  return super(DataFrame, self).rename(**kwargs)


Unnamed: 0,District,Neighbourhood
0,Ciudad Vieja,"El Raval (1), Barrio Gótico (2), La Barcelonet..."
1,Ensanche,"El Fort Pienc (5), Sagrada Familia (6), Dreta ..."
2,Sants-Montjuïc,"Pueblo Seco (11), La Marina del Prat Vermell (..."
3,Les Corts,"Les Corts (19), La Maternidad y San Ramón (20)..."
4,Sarriá-San Gervasio,"Vallvidrera, el Tibidabo i les Planes (22), Sa..."
5,Gracia,"Vallcarca y los Penitentes (28), El Coll (29),..."
6,Horta - Guinardó,"Baix Guinardó (33), Can Baró (34), El Guinardó..."
7,Nou Barris,"Vilapicina y La Torre Llobeta (44), Porta (45)..."
8,San Andrés,"La Trinitat Vella (57), Baró de Viver (58), El..."
9,San Martín,"El Campo del Arpa del Clot (64), El Clot (65),..."


In Barcelona the neighborhoods are very small, so I will do the analysis on the district of Eixample


In [4]:
# Selecting my district
My_district = Barna[Barna['District'].str.contains('Ensanche', na = False)]
My_district

Unnamed: 0,District,Neighbourhood
1,Ensanche,"El Fort Pienc (5), Sagrada Familia (6), Dreta ..."


In [5]:
My_district.District = My_district.District.replace({"Ensanche": "Eixample"})
My_district

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self[name] = value


Unnamed: 0,District,Neighbourhood
1,Eixample,"El Fort Pienc (5), Sagrada Familia (6), Dreta ..."


In [6]:
 #Getting latitude and longitude of my district
address = My_district['District'].values[0] + ', Barna'
geolocator = Nominatim()
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of my district are {}, {}.'.format(latitude, longitude))

  app.launch_new_instance()


The geograpical coordinate of my district are 41.3933942, 2.1660849840426866.


In [7]:
# Creating a map of Milan and finding my district.
map_my_district = folium.Map(location=[latitude, longitude], zoom_start= 15)

folium.CircleMarker(
    [latitude, longitude],
    radius=5,
    color='blue',
    fill=True,
    fill_color='#3186cc',
    fill_opacity=0.7,
    parse_html=False).add_to(map_my_district)

map_my_district

# Extracting the number of shops in the district area

In [8]:
# Date from foursquare 
CLIENT_ID = '50W2M4K3ZWEVPUIT5GFGACTGJW1L0MBD34ICNCDRRCAF1XOR' # your Foursquare ID
CLIENT_SECRET = '1T4YYBZPYOIMVGGB5EE0QE05IDF3JY2ZCSMGSQGZDCLMPRDQ' # your Foursquare Secret
VERSION = '20191206'
print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: 50W2M4K3ZWEVPUIT5GFGACTGJW1L0MBD34ICNCDRRCAF1XOR
CLIENT_SECRET:1T4YYBZPYOIMVGGB5EE0QE05IDF3JY2ZCSMGSQGZDCLMPRDQ


In [9]:
# We choose to search by category with a 600m radius.
radius = 600
LIMIT = 100
category_id = '4bf58dd8d48988d102951735' #ID for Accessory stores

# Define the corresponding URL
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&categoryId={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, category_id, radius, LIMIT)

# Send the GET Request
results = requests.get(url).json()

# Get relevant part of JSON and transform it into a pandas dataframe
# assign relevant part of JSON to venues
venues = results['response']['venues']

# tranform venues into a dataframe
dataframe = json_normalize(venues)
dataframe.head()

# keep only columns that include venue name, and anything that is associated with location
filtered_columns = ['name', 'categories'] + [col for col in dataframe.columns if col.startswith('location.')] + ['id']
dataframe_filtered = dataframe.loc[:, filtered_columns]

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)

# clean column names by keeping only last term
dataframe_filtered.columns = [column.split('.')[-1] for column in dataframe_filtered.columns]

dataframe_filtered

Unnamed: 0,name,categories,address,cc,city,country,crossStreet,distance,formattedAddress,labeledLatLngs,lat,lng,postalCode,state,id
0,Bimba & Lola,Women's Store,"Valencia, 272",ES,Barcelona,España,,193,"[Valencia, 272, 08007 Barcelona Cataluña, España]","[{'label': 'display', 'lat': 41.39278077769239...",41.392781,2.163918,8007.0,Cataluña,4cdecc8c3644a09375574e9f
1,Desigual,Clothing Store,"Passeig de Gràcia, 47",ES,Barcelona,España,C. Aragó,198,"[Passeig de Gràcia, 47 (C. Aragó), 08007 Barce...","[{'label': 'display', 'lat': 41.39193652224534...",41.391937,2.164725,8007.0,Cataluña,4c051cccf423a593c079d216
2,Loewe,Accessories Store,"Passeig de Gràcia, 35",ES,Barcelona,España,,231,"[Passeig de Gràcia, 35, 08007 Barcelona Catalu...","[{'label': 'display', 'lat': 41.39134354229618...",41.391344,2.165631,8007.0,Cataluña,4adcda5af964a5203f4421e3
3,Louis Vuitton,Boutique,"Paseo de Gracia, 80",ES,Barcelona,España,,253,"[Paseo de Gracia, 80, 08008 Barcelona Cataluña...","[{'label': 'display', 'lat': 41.3941229, 'lng'...",41.394123,2.163205,8008.0,Cataluña,4bc87cd98b7c9c74c22d38cf
4,Chapó Bulevard,Accessories Store,"Passeig de Gracia, 55",ES,Barcelona,España,,418,"[Passeig de Gracia, 55, 08008 Barcelona Catalu...","[{'label': 'display', 'lat': 41.39106580589006...",41.391066,2.162146,8008.0,Cataluña,4fc8fdfce4b048cbd2823aed
5,Soloio,Men's Store,Rambla Catalunya 109,ES,Barcelona,España,Provença,556,"[Rambla Catalunya 109 (Provença), 08008 Barcel...","[{'label': 'display', 'lat': 41.393785, 'lng':...",41.393785,2.159437,8008.0,Cataluña,5a52816f2be4251b73924bed
6,Hermès,Boutique,"Paseo de Gracia, 77",ES,Barcelona,España,,315,"[Paseo de Gracia, 77, 08008 Barcelona Cataluña...","[{'label': 'display', 'lat': 41.39396625948295...",41.393966,2.162387,8008.0,Cataluña,4c0148b8cf3aa593b8c4ccb0
7,Biba,Accessories Store,Consell de Cent,ES,,España,,144,"[Consell de Cent, España]","[{'label': 'display', 'lat': 41.39233586461486...",41.392336,2.165078,,,4ef1c5f56c253bf12305e95d
8,Uterqüe,Design Studio,"Passeig de Gràcia, 65",ES,Barcelona,España,València,230,"[Passeig de Gràcia, 65 (València), 08008 Barce...","[{'label': 'display', 'lat': 41.39331530006156...",41.393315,2.163331,8008.0,Cataluña,4bc5cd266c26b71326bdebf3
9,Natura Selection,Accessories Store,Consejo de Ciento 304,ES,Barcelona,España,,309,"[Consejo de Ciento 304, 08007 Barcelona Catalu...","[{'label': 'display', 'lat': 41.39068750234346...",41.390688,2.165236,8007.0,Cataluña,4adcda5bf964a520544421e3


In [10]:
print('The total number of shops in an area of 1000 is: ' +str(dataframe_filtered.shape[0]))

The total number of shops in an area of 1000 is: 47


In [11]:
new_df = dataframe_filtered.drop(['labeledLatLngs','cc','formattedAddress','distance', 'postalCode','crossStreet',
                                  'country'], axis=1)
new_df.head()

Unnamed: 0,name,categories,address,city,lat,lng,state,id
0,Bimba & Lola,Women's Store,"Valencia, 272",Barcelona,41.392781,2.163918,Cataluña,4cdecc8c3644a09375574e9f
1,Desigual,Clothing Store,"Passeig de Gràcia, 47",Barcelona,41.391937,2.164725,Cataluña,4c051cccf423a593c079d216
2,Loewe,Accessories Store,"Passeig de Gràcia, 35",Barcelona,41.391344,2.165631,Cataluña,4adcda5af964a5203f4421e3
3,Louis Vuitton,Boutique,"Paseo de Gracia, 80",Barcelona,41.394123,2.163205,Cataluña,4bc87cd98b7c9c74c22d38cf
4,Chapó Bulevard,Accessories Store,"Passeig de Gracia, 55",Barcelona,41.391066,2.162146,Cataluña,4fc8fdfce4b048cbd2823aed


In [12]:
new_df.name
venues_map = folium.Map(location=[latitude, longitude], zoom_start=15) # generate map centred of Eixample

# add a red circle marker to represent the center of the neighborhoods 
folium.vector_layers.CircleMarker(
    [latitude,longitude],
    radius=10,
    color='red',
    popup='Eixample',
    fill = True,
    fill_color = 'red',
    fill_opacity = 0.6
).add_to(venues_map)

# add the shops as blue circle markers
for lat, lng, label in zip(new_df.lat, new_df.lng, new_df.categories):
    folium.vector_layers.CircleMarker(
        [lat,lng],
        radius=5,
        color='blue',
        popup=label,
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(venues_map)

# display map
venues_map


The number of Accessory stores in my district area is 49. Now we're going to search for this characteristic in Toronto

# Gathering New York data

In [13]:
# Importing the neighbourhood data

link_ny = 'https://en.wikipedia.org/wiki/Neighborhoods_in_New_York_City'

NY = pd.read_html(link_ny)[0]
NY.head()

Unnamed: 0,Community Board(CB),Areakm2,Pop.Census2010,Pop./km2,Neighborhoods
0,Bronx CB 1,7.17,91497,12761,"Melrose, Mott Haven, Port Morris"
1,Bronx CB 2,5.54,52246,9792,"Hunts Point, Longwood"
2,Bronx CB 3,4.07,79762,19598,"Claremont, Concourse Village, Crotona Park, Mo..."
3,Bronx CB 4,5.28,146441,27735,"Concourse, Highbridge"
4,Bronx CB 5,3.55,128200,36145,"Fordham, Morris Heights, Mount Hope, Universit..."


In [14]:
# Selecting the columns we need and selecting manhattan.
NY = NY[['Community Board(CB)','Neighborhoods']]

NY.rename(columns = {'Community Board(CB)' : 'Borough'}, inplace = True)

NY = NY[NY['Borough'].str.contains('Manhattan', na = False)]
NY.reset_index(drop = True, inplace = True)
NY.head()

Unnamed: 0,Borough,Neighborhoods
0,Manhattan CB 1,"Battery Park City, Financial District, Tribeca"
1,Manhattan CB 2,"Chinatown, Greenwich Village, Little Italy, Lo..."
2,Manhattan CB 3,"Alphabet City, Chinatown, East Village, Lower ..."
3,Manhattan CB 4,"Chelsea, Clinton, Hell's Kitchen, Hudson Yards"
4,Manhattan CB 5,Midtown


In [15]:
# Separating the districts in different rows
NY = pd.DataFrame(NY.Neighborhoods.str.split(',').tolist(), index=NY.Borough.astype('object')).stack()

NY = NY.reset_index([0, 'Borough'])

NY.columns = ['Borough', 'District']
print(NY.shape)
NY.head()

(48, 2)


Unnamed: 0,Borough,District
0,Manhattan CB 1,Battery Park City
1,Manhattan CB 1,Financial District
2,Manhattan CB 1,Tribeca
3,Manhattan CB 2,Chinatown
4,Manhattan CB 2,Greenwich Village


In [16]:
NY.drop([18,44],inplace=True)

In [17]:
# Getting latitude and longitude NY 
lat = []
long = []

In [18]:
from geopy.extra.rate_limiter import RateLimiter
geocode = RateLimiter(geolocator.geocode, min_delay_seconds= 1)

In [19]:
for i in np.arange(0,NY.shape[0]):
    address = NY['District'].values[i] + ', New York'
    geolocator = Nominatim(user_agent="specify_your_app_name_here",timeout=None)
    location = geolocator.geocode(address)
    latitude = location.latitude
    longitude = location.longitude
    print(str(i)+'The geograpical coordinate of ' +address +' are {}, {}.'.format(latitude, longitude))
    lat.append(latitude)
    long.append(longitude)

0The geograpical coordinate of Battery Park City, New York are 40.7110166, -74.0169369.
1The geograpical coordinate of  Financial District, New York are 40.7076124, -74.009378.
2The geograpical coordinate of  Tribeca, New York are 40.7153802, -74.0093063.
3The geograpical coordinate of Chinatown, New York are 40.7164913, -73.9962504.
4The geograpical coordinate of  Greenwich Village, New York are 40.7319802, -73.9965658.
5The geograpical coordinate of  Little Italy, New York are 40.7192728, -73.9982152.
6The geograpical coordinate of  Lower East Side, New York are 40.7159357, -73.9868057.
7The geograpical coordinate of  NoHo, New York are 40.7258746, -73.9939566.
8The geograpical coordinate of  SoHo, New York are 40.72288, -73.9987505.
9The geograpical coordinate of  West Village, New York are 40.7341857, -74.00558.
10The geograpical coordinate of Alphabet City, New York are 40.7251022, -73.9795833.
11The geograpical coordinate of  Chinatown, New York are 40.7164913, -73.9962504.
12The

In [20]:
# Inserting latitude and longitude in the DF
NY['latitude'] = lat
NY['longitude'] = long
NY.head()

Unnamed: 0,Borough,District,latitude,longitude
0,Manhattan CB 1,Battery Park City,40.711017,-74.016937
1,Manhattan CB 1,Financial District,40.707612,-74.009378
2,Manhattan CB 1,Tribeca,40.71538,-74.009306
3,Manhattan CB 2,Chinatown,40.716491,-73.99625
4,Manhattan CB 2,Greenwich Village,40.73198,-73.996566


In [21]:
# Creating a map of NY.

lat = 40.73
long = -73.98

map_NY = folium.Map(location=[lat, long], zoom_start=11)

for lat, long, label in zip(NY['latitude'], NY['longitude'], NY['District']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, long],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_NY)  
    
map_NY

## An iteration through all the district is performed to get the higher number of shops

In [22]:
# Initializing the number of shops
N_shop = []

44.7278943


In [27]:
for i in np.arange(0, NY.shape[0]):

    # We choose to search by category with a 500m radius.
    radius = 600
    LIMIT = 200
    category_id = '4bf58dd8d48988d102951735' #ID for Accessory stores

    latitude = NY['latitude'][i]
    longitude = NY['longitude'][i]
  # Define the corresponding URL
    url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&categoryId={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, category_id, radius, LIMIT)

    # Send the GET Request
    results = requests.get(url).json()

    # Get relevant part of JSON and transform it into a pandas dataframe
    # assign relevant part of JSON to venues
    venues = results['response']['venues']

    # tranform venues into a dataframe
    dataframe = json_normalize(venues)
    dataframe.head()
    
# keep only columns that include venue name, and anything that is associated with location
    filtered_columns = ['name', 'categories'] + [col for col in dataframe.columns if col.startswith('location.')] + ['id']
    dataframe_filtered = dataframe.loc[:, filtered_columns]
    

    # function that extracts the category of the venue
    def get_category_type(row):
        try:
            categories_list = row['categories']
        except:
            categories_list = row['venue.categories']

        if len(categories_list) == 0:
            return None
        
        else:
            return categories_list[0]['name']

    # filter the category for each row
    dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)

    # clean column names by keeping only last term
    dataframe_filtered.columns = [column.split('.')[-1] for column in dataframe_filtered.columns]
    
    print(str(i) + ') The number of shops in ' +NY['District'][i] + ' is ' +str(dataframe_filtered.shape[0]) + '\n')
    N_shop.append(dataframe_filtered.shape[0])


0) The number of shops in Battery Park City is 15

1) The number of shops in  Financial District is 18

2) The number of shops in  Tribeca is 20

3) The number of shops in Chinatown is 39

4) The number of shops in  Greenwich Village is 42

5) The number of shops in  Little Italy is 50

6) The number of shops in  Lower East Side is 18

7) The number of shops in  NoHo is 49

8) The number of shops in  SoHo is 50

9) The number of shops in  West Village is 34

10) The number of shops in Alphabet City is 7

11) The number of shops in  Chinatown is 39

12) The number of shops in  East Village is 20

13) The number of shops in  Lower East Side is 18

14) The number of shops in  Two Bridges is 3

15) The number of shops in Chelsea is 14



KeyError: "None of [Index(['name', 'categories', 'id'], dtype='object')] are in the [columns]"

In [None]:
NY['Number of Shops'] = N_shop
NY.head()

###### 