# Collecting the Neigbourhoods of Amsterdam

In this notebook we collect the neighbourhoods of Amsterdam from a wikepedia webpage. <br>
After cleaning the Neighbourhood data, it will be enriched with the geographical coordinates.

## Importing libraries

In [1]:
!pip install beautifulsoup4
!pip install lxml
import requests # library to handle requests
import pandas as pd # library for data analsysis
import numpy as np # library to handle data in a vectorized manner

from bs4 import BeautifulSoup # Library for scraping webpage
from IPython.display import display_html # Library for displaying HTML

!pip install geopy
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
from geopy.extra.rate_limiter import RateLimiter # ratelimiter for stopping if it takes to long to get the geocodes

# Library for saving en reading data from the project
#from project_lib import Project

!pip install folium 
import folium # plotting library

print('Importing ready!')

Importing ready!


## Retreive neighbourhoods of Amsterdam from Wikipedia webpage

In [2]:
# Get webpage
source = requests.get('https://en.wikipedia.org/wiki/Category:Neighbourhoods_of_Amsterdam').text
# Scrape webpage
soup = BeautifulSoup(source,'lxml')
# Check title of webpage
print(soup.title)
# Collect all list items from div class in a list
list = []
for item in soup.findAll('div',{'class':'mw-category-group'}):
    sub_items = item.findAll('li')
    for sub_item in sub_items:
        list.append(['Amsterdam',sub_item.text])
        

list

<title>Category:Neighbourhoods of Amsterdam - Wikipedia</title>


[['Amsterdam', 'Template:Neighborhoods of Amsterdam'],
 ['Amsterdam', 'Admiralenbuurt'],
 ['Amsterdam', 'Amsteldorp'],
 ['Amsterdam', 'Amsterdam Oud-West'],
 ['Amsterdam', 'Amsterdam Oud-Zuid'],
 ['Amsterdam', 'Amsterdam Science Park'],
 ['Amsterdam', 'Apollobuurt'],
 ['Amsterdam', 'Betondorp'],
 ['Amsterdam', 'Bijlmermeer'],
 ['Amsterdam', 'Binnenstad (Amsterdam)'],
 ['Amsterdam', 'Bos en Lommer'],
 ['Amsterdam', 'Buiksloot'],
 ['Amsterdam', 'Buikslotermeer'],
 ['Amsterdam', 'Buitenveldert'],
 ['Amsterdam', 'Bullewijk'],
 ['Amsterdam', 'Burgwallen Nieuwe Zijde'],
 ['Amsterdam', 'Burgwallen Oude Zijde'],
 ['Amsterdam', 'Chassébuurt'],
 ['Amsterdam', 'Cruquiuseiland'],
 ['Amsterdam', 'Czaar Peterbuurt'],
 ['Amsterdam', 'Dapperbuurt'],
 ['Amsterdam', 'De Aker'],
 ['Amsterdam', 'De Pijp'],
 ['Amsterdam', 'De Wallen'],
 ['Amsterdam', 'Diamantbuurt (Amsterdam)'],
 ['Amsterdam', 'Duivelseiland (Amsterdam)'],
 ['Amsterdam', 'Eastern Docklands'],
 ['Amsterdam', 'Eendracht (Amsterdam)'],
 ['Ams

In [3]:
# Create a dataframe with the Neighbourhoods of Amsterdam
df=pd.DataFrame(list,columns=['City','Neighbourhood'])
df["Neighbourhood"]  = df["Neighbourhood"].str.strip()
df

Unnamed: 0,City,Neighbourhood
0,Amsterdam,Template:Neighborhoods of Amsterdam
1,Amsterdam,Admiralenbuurt
2,Amsterdam,Amsteldorp
3,Amsterdam,Amsterdam Oud-West
4,Amsterdam,Amsterdam Oud-Zuid
...,...,...
102,Amsterdam,Westerpark (neighbourhood)
103,Amsterdam,Willemspark (Amsterdam)
104,Amsterdam,Zeeburgereiland
105,Amsterdam,Zeeheldenbuurt


In [4]:
df.shape

(107, 2)

### Cleaning en preparing the neighbourhoods

#### Remove rows with 'Template' in the Neighbourhood

In [5]:
# The first row is a title wich begins with 'Template', so we check how many rows also do
df.loc[df.Neighbourhood.str.startswith('Template'), 'Neighbourhood'].count() 

1

In [6]:
# Remove the row that begins with 'Template'
df1 = df[~df.Neighbourhood.str.contains('Template')]
df1.head()

Unnamed: 0,City,Neighbourhood
1,Amsterdam,Admiralenbuurt
2,Amsterdam,Amsteldorp
3,Amsterdam,Amsterdam Oud-West
4,Amsterdam,Amsterdam Oud-Zuid
5,Amsterdam,Amsterdam Science Park


In [7]:
df1.shape

(106, 2)

#### Remove text between brackets () in the column Neigbourhood

In [8]:
# Count the number of Neighbourhoods with a starting bracket (
df1.loc[df1.Neighbourhood.str.contains('\('), 'Neighbourhood'].count()

19

In [9]:
# Take the part of the Neighbourhood before the starting bracket (
df1['Neighbourhood'] = df1['Neighbourhood'].str.split('\(').str[0]
df1

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df1['Neighbourhood'] = df1['Neighbourhood'].str.split('\(').str[0]


Unnamed: 0,City,Neighbourhood
1,Amsterdam,Admiralenbuurt
2,Amsterdam,Amsteldorp
3,Amsterdam,Amsterdam Oud-West
4,Amsterdam,Amsterdam Oud-Zuid
5,Amsterdam,Amsterdam Science Park
...,...,...
102,Amsterdam,Westerpark
103,Amsterdam,Willemspark
104,Amsterdam,Zeeburgereiland
105,Amsterdam,Zeeheldenbuurt


In [10]:
# Check if there are any ending brackets ) in the Neighbourhood
df1.loc[df1.Neighbourhood.str.contains('\)'), 'Neighbourhood'].count()

0

In [11]:
# Create a column 'Address' for getting the geographical coordinates
df1["Address"] = df1["Neighbourhood"] + ', ' +  df1["City"]
df1

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df1["Address"] = df1["Neighbourhood"] + ', ' +  df1["City"]


Unnamed: 0,City,Neighbourhood,Address
1,Amsterdam,Admiralenbuurt,"Admiralenbuurt, Amsterdam"
2,Amsterdam,Amsteldorp,"Amsteldorp, Amsterdam"
3,Amsterdam,Amsterdam Oud-West,"Amsterdam Oud-West, Amsterdam"
4,Amsterdam,Amsterdam Oud-Zuid,"Amsterdam Oud-Zuid, Amsterdam"
5,Amsterdam,Amsterdam Science Park,"Amsterdam Science Park, Amsterdam"
...,...,...,...
102,Amsterdam,Westerpark,"Westerpark , Amsterdam"
103,Amsterdam,Willemspark,"Willemspark , Amsterdam"
104,Amsterdam,Zeeburgereiland,"Zeeburgereiland, Amsterdam"
105,Amsterdam,Zeeheldenbuurt,"Zeeheldenbuurt, Amsterdam"


In [12]:
# Create a new dataframe with Neigbourhood data
df2 = df1.copy()
df2

Unnamed: 0,City,Neighbourhood,Address
1,Amsterdam,Admiralenbuurt,"Admiralenbuurt, Amsterdam"
2,Amsterdam,Amsteldorp,"Amsteldorp, Amsterdam"
3,Amsterdam,Amsterdam Oud-West,"Amsterdam Oud-West, Amsterdam"
4,Amsterdam,Amsterdam Oud-Zuid,"Amsterdam Oud-Zuid, Amsterdam"
5,Amsterdam,Amsterdam Science Park,"Amsterdam Science Park, Amsterdam"
...,...,...,...
102,Amsterdam,Westerpark,"Westerpark , Amsterdam"
103,Amsterdam,Willemspark,"Willemspark , Amsterdam"
104,Amsterdam,Zeeburgereiland,"Zeeburgereiland, Amsterdam"
105,Amsterdam,Zeeheldenbuurt,"Zeeheldenbuurt, Amsterdam"


## Collecting the geographical coordinates for the neighbourhoods of Amsterdam

In [13]:
# Get the Geographical coordinates of 1 neighboorhood, to check if the geolocator works
address = 'Amsteldorp, Amsterdam'

geolocator = Nominatim(user_agent="neighbourhoud_explorer")

location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Amsterdam are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Amsterdam are 52.3443384, 4.9220313.


In [14]:
# 1 - convenient function to delay between geocoding calls
geocode = RateLimiter(geolocator.geocode, min_delay_seconds=1)

In [15]:
# 2- - create location column
df2['location'] = df2['Address'].apply(geocode)

In [16]:
# 3 - create longitude, latitude and altitude from location column (returns tuple)
df2['point'] = df2['location'].apply(lambda loc: tuple(loc.point) if loc else None)

In [17]:
# Check for Neighbourhoods without geogrophical coordinates
print(df2.loc[df2["location"].isnull()].count())
df2.loc[df2["location"].isnull()]

City             15
Neighbourhood    15
Address          15
location          0
point             0
dtype: int64


Unnamed: 0,City,Neighbourhood,Address,location,point
1,Amsterdam,Admiralenbuurt,"Admiralenbuurt, Amsterdam",,
17,Amsterdam,Chassébuurt,"Chassébuurt, Amsterdam",,
35,Amsterdam,Hoofddorppleinbuurt,"Hoofddorppleinbuurt, Amsterdam",,
40,Amsterdam,Jodenbuurt,"Jodenbuurt, Amsterdam",,
46,Amsterdam,Kolenkit District,"Kolenkit District, Amsterdam",,
50,Amsterdam,Middelveldsche Akerpolder,"Middelveldsche Akerpolder, Amsterdam",,
58,Amsterdam,Nieuwendammerdijk en Buiksloterdijk,"Nieuwendammerdijk en Buiksloterdijk, Amsterdam",,
72,Amsterdam,Overtoombuurt,"Overtoombuurt, Amsterdam",,
75,Amsterdam,Prinses Irenebuurt,"Prinses Irenebuurt, Amsterdam",,
78,Amsterdam,Rieteilanden,"Rieteilanden, Amsterdam",,


In [18]:
# Remove the rows without geographical coordinats
df2.dropna(inplace=True)
df2.reset_index(drop=True, inplace=True)
df2

Unnamed: 0,City,Neighbourhood,Address,location,point
0,Amsterdam,Amsteldorp,"Amsteldorp, Amsterdam","(Huisarts Amsteldorp, Middelhoffstraat, Franke...","(52.3443384, 4.9220313, 0.0)"
1,Amsterdam,Amsterdam Oud-West,"Amsterdam Oud-West, Amsterdam","(HEMA Amsterdam-Kinkerstraat, 313, Kinkerstraa...","(52.3647387, 4.8630105, 0.0)"
2,Amsterdam,Amsterdam Oud-Zuid,"Amsterdam Oud-Zuid, Amsterdam","(Amsterdam-Oud Zuid, Ringweg-Zuid, Zuidas, Zui...","(52.3391253, 4.8661853, 0.0)"
3,Amsterdam,Amsterdam Science Park,"Amsterdam Science Park, Amsterdam","(Amsterdam Science Park, Kruislaan, Watergraaf...","(52.352926, 4.948315, 0.0)"
4,Amsterdam,Apollobuurt,"Apollobuurt, Amsterdam","(Apollobuurt, Zuid, Amsterdam, Noord-Holland, ...","(52.348072599999995, 4.875559011765657, 0.0)"
...,...,...,...,...,...
86,Amsterdam,Westerpark,"Westerpark , Amsterdam","(Westerpark, West, Amsterdam, Noord-Holland, N...","(52.387236349999995, 4.871777328438663, 0.0)"
87,Amsterdam,Willemspark,"Willemspark , Amsterdam","(Café Willemspark, 223, Willemsparkweg, Museum...","(52.3552537, 4.8683772, 0.0)"
88,Amsterdam,Zeeburgereiland,"Zeeburgereiland, Amsterdam","(Zeeburgereiland, Schellingwoude, Amsterdam, N...","(52.372608299999996, 4.965545531374505, 0.0)"
89,Amsterdam,Zeeheldenbuurt,"Zeeheldenbuurt, Amsterdam","(Zeeheldenbuurt, Amsterdam, Noord-Holland, Ned...","(52.389329849999996, 4.888242227776295, 0.0)"


In [19]:
df2.shape

(91, 5)

In [20]:
# 4 - split point column into latitude, longitude and altitude columns
df2[['latitude', 'longitude', 'altitude']] = pd.DataFrame(df2['point'].tolist(), index=df2.index)
df2

Unnamed: 0,City,Neighbourhood,Address,location,point,latitude,longitude,altitude
0,Amsterdam,Amsteldorp,"Amsteldorp, Amsterdam","(Huisarts Amsteldorp, Middelhoffstraat, Franke...","(52.3443384, 4.9220313, 0.0)",52.344338,4.922031,0.0
1,Amsterdam,Amsterdam Oud-West,"Amsterdam Oud-West, Amsterdam","(HEMA Amsterdam-Kinkerstraat, 313, Kinkerstraa...","(52.3647387, 4.8630105, 0.0)",52.364739,4.863010,0.0
2,Amsterdam,Amsterdam Oud-Zuid,"Amsterdam Oud-Zuid, Amsterdam","(Amsterdam-Oud Zuid, Ringweg-Zuid, Zuidas, Zui...","(52.3391253, 4.8661853, 0.0)",52.339125,4.866185,0.0
3,Amsterdam,Amsterdam Science Park,"Amsterdam Science Park, Amsterdam","(Amsterdam Science Park, Kruislaan, Watergraaf...","(52.352926, 4.948315, 0.0)",52.352926,4.948315,0.0
4,Amsterdam,Apollobuurt,"Apollobuurt, Amsterdam","(Apollobuurt, Zuid, Amsterdam, Noord-Holland, ...","(52.348072599999995, 4.875559011765657, 0.0)",52.348073,4.875559,0.0
...,...,...,...,...,...,...,...,...
86,Amsterdam,Westerpark,"Westerpark , Amsterdam","(Westerpark, West, Amsterdam, Noord-Holland, N...","(52.387236349999995, 4.871777328438663, 0.0)",52.387236,4.871777,0.0
87,Amsterdam,Willemspark,"Willemspark , Amsterdam","(Café Willemspark, 223, Willemsparkweg, Museum...","(52.3552537, 4.8683772, 0.0)",52.355254,4.868377,0.0
88,Amsterdam,Zeeburgereiland,"Zeeburgereiland, Amsterdam","(Zeeburgereiland, Schellingwoude, Amsterdam, N...","(52.372608299999996, 4.965545531374505, 0.0)",52.372608,4.965546,0.0
89,Amsterdam,Zeeheldenbuurt,"Zeeheldenbuurt, Amsterdam","(Zeeheldenbuurt, Amsterdam, Noord-Holland, Ned...","(52.389329849999996, 4.888242227776295, 0.0)",52.389330,4.888242,0.0


## Create a map with neighbourhoods superimposed on top.

In [21]:
def getGeolocation(city):
    address = city
    geolocator = Nominatim(user_agent="city_explorer")
    location = geolocator.geocode(address)
    latitude = location.latitude
    longitude = location.longitude
    print('The geograpical coordinate of {} are {}, {}.'.format(city, latitude, longitude))
            
    return [latitude, longitude]    

In [22]:
def printMap(dta, city, zoom):
    print(city)
    map = folium.Map(location=getGeolocation(city), zoom_start=zoom)
    
    data = dta[dta["City"] == city]
    
    # add markers to map
    for lat, lng, city, neighbourhood in zip(data['latitude'], data['longitude'], data['City'], data['Neighbourhood']):
        label = '{}, {}'.format(neighbourhood, city)
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker(
            [lat, lng],
            radius=5,
            popup=label,
            color='blue',
            fill=True,
            fill_color='#3186cc',
            fill_opacity=0.7,
            parse_html=False).add_to(map)  
    
    return map    

In [23]:
printMap(df2, 'Amsterdam', 11)

Amsterdam
The geograpical coordinate of Amsterdam are 52.3727598, 4.8936041.


## Save Neighbourhood information in CSV

In [24]:
# @hidden_cell
token = 'p-7547ced92495ac0a4b7cff0670f4667f5c30ffb0'

In [25]:
# Run this cell only in IBM Watson in the cloud
# Create an access to this project
#project = Project.access(None,token,token)

# Save the collected Neighbourhoods and geographical data in project data bucket
#project.save_data(file_name="geo_amsterdam.csv", data=df2.to_csv(index=False))

In [26]:
# Save in same dir as Notebook
df2.to_csv('Neighbourhoods_of_Amsterdam.csv', index=False)
print('Geographical data are saved in Neighbourhoods_of_Amsterdam.csv')

Geographical data are saved in Neighbourhoods_of_Amsterdam.csv
