## Obtain data about Paris neighborhoods in Wikipedia

A quarter (*quartier*) is the administrative subdivision of an arrondissement. Each of the twenty arrondissements is divided
into four administrative quarters, so Paris has eighty administrative quarters in total.

### Import libraries

In [112]:
import pandas as pd
print('Pandas library imported.')
import requests
print('Requests library imported.')
from bs4 import BeautifulSoup
print('BeautifulSoup library imported.')

print('All libraries imported.')

Pandas library imported.
Requests library imported.
BeautifulSoup library imported.
All libraries imported.


### Scrape the Wikipedia page and transform it into a dataframe with *BeautifulSoup* and *Pandas*

Scrape the Wikipedia page about the neighborhoods of Paris: https://en.wikipedia.org/wiki/Quarters_of_Paris

In [113]:
# Download the page and parse the HTML
result = requests.get("https://en.wikipedia.org/wiki/Quarters_of_Paris")
soup = BeautifulSoup(result.content, 'lxml')
# The first table on the page lists the quarters
table = soup.find_all('table')[0]
paris_neighborhoods = pd.read_html(str(table))[0]

print('The dataframe shape is {}.'.format(paris_neighborhoods.shape))

print('The first five rows are:')
paris_neighborhoods.head()

The dataframe shape is (80, 6).
The first five rows are:


| | Arrondissement(Districts) | Quartiers(Quarters) | Quartiers(Quarters).1 | Population in1999[3] | Area(hectares)[3] | Map |
|---|---|---|---|---|---|---|
| 0 | 1st arrondissement(Called "du Louvre") | 1st | Saint-Germain-l'Auxerrois | 1672 | 86.9 | |
| 1 | 1st arrondissement(Called "du Louvre") | 2nd | Les Halles | 8984 | 41.2 | |
| 2 | 1st arrondissement(Called "du Louvre") | 3rd | Palais-Royal | 3195 | 27.4 | |
| 3 | 1st arrondissement(Called "du Louvre") | 4th | Place-Vendôme | 3044 | 26.9 | |
| 4 | 2nd arrondissement(Called "de la Bourse") | 5th | Gaillon | 1345 | 18.8 | |
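The scraping cell assumes that the request succeeds and that the first table on the page is the right one. A slightly more defensive variant could look like the sketch below. The helper names `first_table_from_html` and `fetch_quarters_table` are hypothetical, the 30-second timeout is an arbitrary choice, and `pd.read_html` still needs an HTML parser such as lxml installed.

```python
from io import StringIO

import pandas as pd
import requests

URL = "https://en.wikipedia.org/wiki/Quarters_of_Paris"


def first_table_from_html(html: str) -> pd.DataFrame:
    """Parse the first <table> found in an HTML string (hypothetical helper)."""
    # Wrapping the string in StringIO hands pandas a file-like object.
    tables = pd.read_html(StringIO(html))
    return tables[0]


def fetch_quarters_table(url: str = URL) -> pd.DataFrame:
    """Download the page and return its first table (hypothetical helper)."""
    response = requests.get(url, timeout=30)
    # Fail loudly on HTTP errors instead of parsing an error page.
    response.raise_for_status()
    return first_table_from_html(response.text)
```

Calling `fetch_quarters_table()` would replace the four scraping lines; checking `df.shape` afterwards remains a cheap way to notice that the page layout has changed.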


### Cleaning the dataframe

Drop the "Map" column, which is empty.

In [114]:
paris_neighborhoods.drop(['Map'], axis='columns', inplace=True)

paris_neighborhoods.head()

| | Arrondissement(Districts) | Quartiers(Quarters) | Quartiers(Quarters).1 | Population in1999[3] | Area(hectares)[3] |
|---|---|---|---|---|---|
| 0 | 1st arrondissement(Called "du Louvre") | 1st | Saint-Germain-l'Auxerrois | 1672 | 86.9 |
| 1 | 1st arrondissement(Called "du Louvre") | 2nd | Les Halles | 8984 | 41.2 |
| 2 | 1st arrondissement(Called "du Louvre") | 3rd | Palais-Royal | 3195 | 27.4 |
| 3 | 1st arrondissement(Called "du Louvre") | 4th | Place-Vendôme | 3044 | 26.9 |
| 4 | 2nd arrondissement(Called "de la Bourse") | 5th | Gaillon | 1345 | 18.8 |
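Dropping "Map" by name works as long as the column keeps that name. As an alternative sketch, `dropna` can remove every column that contains only missing values. The toy frame below stands in for the scraped table and is illustrative only.

```python
import pandas as pd

# Toy frame standing in for the scraped table (illustrative values only).
df = pd.DataFrame({
    "Neighborhood": ["Gaillon", "Vivienne"],
    "Map": [None, None],
})

# Drop every column whose values are all missing, whatever it is called.
cleaned = df.dropna(axis="columns", how="all")
print(list(cleaned.columns))  # ['Neighborhood']
```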


Rename the columns

In [115]:
paris_neighborhoods.columns = ['District', 'NeighborhoodCode', 'Neighborhood', 'Population (1999)', 'Area (ha)']

paris_neighborhoods.head()

| | District | NeighborhoodCode | Neighborhood | Population (1999) | Area (ha) |
|---|---|---|---|---|---|
| 0 | 1st arrondissement(Called "du Louvre") | 1st | Saint-Germain-l'Auxerrois | 1672 | 86.9 |
| 1 | 1st arrondissement(Called "du Louvre") | 2nd | Les Halles | 8984 | 41.2 |
| 2 | 1st arrondissement(Called "du Louvre") | 3rd | Palais-Royal | 3195 | 27.4 |
| 3 | 1st arrondissement(Called "du Louvre") | 4th | Place-Vendôme | 3044 | 26.9 |
| 4 | 2nd arrondissement(Called "de la Bourse") | 5th | Gaillon | 1345 | 18.8 |
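Assigning a list to `.columns` depends on the column order staying exactly as scraped. A mapping passed to `rename` ties each new name to its old one instead. The sketch below uses an empty frame with the scraped headers; the dictionary name `COLUMN_NAMES` is hypothetical.

```python
import pandas as pd

# Empty frame carrying the headers produced by the scrape.
df = pd.DataFrame(columns=[
    "Arrondissement(Districts)",
    "Quartiers(Quarters)",
    "Quartiers(Quarters).1",
    "Population in1999[3]",
    "Area(hectares)[3]",
])

# Old header -> new header; the order of the entries does not matter.
COLUMN_NAMES = {
    "Arrondissement(Districts)": "District",
    "Quartiers(Quarters)": "NeighborhoodCode",
    "Quartiers(Quarters).1": "Neighborhood",
    "Population in1999[3]": "Population (1999)",
    "Area(hectares)[3]": "Area (ha)",
}

renamed = df.rename(columns=COLUMN_NAMES)
print(list(renamed.columns))
```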


Split the district's official name (the arrondissement number) from its "also called" name, which is not commonly used in Paris.

In [116]:
# Split on the first "(" only, so each row yields exactly two parts
paris_neighborhoods[['District', 'DistrictName']] = paris_neighborhoods.District.str.split("(", n=1, expand=True)

paris_neighborhoods.head()

| | District | NeighborhoodCode | Neighborhood | Population (1999) | Area (ha) | DistrictName |
|---|---|---|---|---|---|---|
| 0 | 1st arrondissement | 1st | Saint-Germain-l'Auxerrois | 1672 | 86.9 | Called "du Louvre") |
| 1 | 1st arrondissement | 2nd | Les Halles | 8984 | 41.2 | Called "du Louvre") |
| 2 | 1st arrondissement | 3rd | Palais-Royal | 3195 | 27.4 | Called "du Louvre") |
| 3 | 1st arrondissement | 4th | Place-Vendôme | 3044 | 26.9 | Called "du Louvre") |
| 4 | 2nd arrondissement | 5th | Gaillon | 1345 | 18.8 | Called "de la Bourse") |


Tidy up the district name

In [117]:
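# Literal (non-regex) replacements: swap the 'Called "' prefix, then strip the trailing '")'
paris_neighborhoods['DistrictName'] = paris_neighborhoods.DistrictName.str.replace('Called "', 'Arrondissement ', regex=False)
paris_neighborhoods['DistrictName'] = paris_neighborhoods.DistrictName.str.replace('")', '', regex=False)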

paris_neighborhoods.head()

| | District | NeighborhoodCode | Neighborhood | Population (1999) | Area (ha) | DistrictName |
|---|---|---|---|---|---|---|
| 0 | 1st arrondissement | 1st | Saint-Germain-l'Auxerrois | 1672 | 86.9 | Arrondissement du Louvre |
| 1 | 1st arrondissement | 2nd | Les Halles | 8984 | 41.2 | Arrondissement du Louvre |
| 2 | 1st arrondissement | 3rd | Palais-Royal | 3195 | 27.4 | Arrondissement du Louvre |
| 3 | 1st arrondissement | 4th | Place-Vendôme | 3044 | 26.9 | Arrondissement du Louvre |
| 4 | 2nd arrondissement | 5th | Gaillon | 1345 | 18.8 | Arrondissement de la Bourse |


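Remove the ordinal suffix (st, nd, rd, th) from the neighborhood code

In [118]:
# Every suffix is two characters long, so slicing off the last two leaves the number
paris_neighborhoods['NeighborhoodCode'] = paris_neighborhoods.NeighborhoodCode.str[:-2]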

paris_neighborhoods.head()

| | District | NeighborhoodCode | Neighborhood | Population (1999) | Area (ha) | DistrictName |
|---|---|---|---|---|---|---|
| 0 | 1st arrondissement | 1 | Saint-Germain-l'Auxerrois | 1672 | 86.9 | Arrondissement du Louvre |
| 1 | 1st arrondissement | 2 | Les Halles | 8984 | 41.2 | Arrondissement du Louvre |
| 2 | 1st arrondissement | 3 | Palais-Royal | 3195 | 27.4 | Arrondissement du Louvre |
| 3 | 1st arrondissement | 4 | Place-Vendôme | 3044 | 26.9 | Arrondissement du Louvre |
| 4 | 2nd arrondissement | 5 | Gaillon | 1345 | 18.8 | Arrondissement de la Bourse |
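Slicing off the last two characters leaves the code as a string. If an integer is more convenient, for example when joining on a numeric key later, the digits can be extracted and converted in one step. This is a sketch on sample ordinals, not a change to the notebook's dataframe.

```python
import pandas as pd

# Sample ordinals in the same format as the scraped codes.
codes = pd.Series(["1st", "2nd", "3rd", "4th", "21st", "80th"])

# Capture the leading run of digits and convert it to an integer.
numeric_codes = codes.str.extract(r"(\d+)", expand=False).astype(int)
print(numeric_codes.tolist())  # [1, 2, 3, 4, 21, 80]
```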


Add the postal code

In [119]:
paris_neighborhoods['PostalCode'] = '750' + paris_neighborhoods.District.str.split(' ').str[0].str[:-2].str.rjust(2, '0')

paris_neighborhoods.head()

| | District | NeighborhoodCode | Neighborhood | Population (1999) | Area (ha) | DistrictName | PostalCode |
|---|---|---|---|---|---|---|---|
| 0 | 1st arrondissement | 1 | Saint-Germain-l'Auxerrois | 1672 | 86.9 | Arrondissement du Louvre | 75001 |
| 1 | 1st arrondissement | 2 | Les Halles | 8984 | 41.2 | Arrondissement du Louvre | 75001 |
| 2 | 1st arrondissement | 3 | Palais-Royal | 3195 | 27.4 | Arrondissement du Louvre | 75001 |
| 3 | 1st arrondissement | 4 | Place-Vendôme | 3044 | 26.9 | Arrondissement du Louvre | 75001 |
| 4 | 2nd arrondissement | 5 | Gaillon | 1345 | 18.8 | Arrondissement de la Bourse | 75002 |
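The postal code is "750" followed by the arrondissement number padded to two digits. The same rule can be written with `str.extract` and `str.zfill`, shown here on three sample districts as a sketch. Note that the 16th arrondissement also uses 75116, which this formula does not produce.

```python
import pandas as pd

# Sample district labels in the cleaned format.
districts = pd.Series([
    "1st arrondissement",
    "2nd arrondissement",
    "16th arrondissement",
])

# Leading digits of the label, left-padded with zeros to two characters.
numbers = districts.str.extract(r"^(\d+)", expand=False)
postal_codes = "750" + numbers.str.zfill(2)
print(postal_codes.tolist())  # ['75001', '75002', '75016']
```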


Reorganize columns

In [120]:
paris_neighborhoods = paris_neighborhoods[['PostalCode', 'District', 'DistrictName', 'NeighborhoodCode', 'Neighborhood', 'Population (1999)', 'Area (ha)']]

paris_neighborhoods.head()

| | PostalCode | District | DistrictName | NeighborhoodCode | Neighborhood | Population (1999) | Area (ha) |
|---|---|---|---|---|---|---|---|
| 0 | 75001 | 1st arrondissement | Arrondissement du Louvre | 1 | Saint-Germain-l'Auxerrois | 1672 | 86.9 |
| 1 | 75001 | 1st arrondissement | Arrondissement du Louvre | 2 | Les Halles | 8984 | 41.2 |
| 2 | 75001 | 1st arrondissement | Arrondissement du Louvre | 3 | Palais-Royal | 3195 | 27.4 |
| 3 | 75001 | 1st arrondissement | Arrondissement du Louvre | 4 | Place-Vendôme | 3044 | 26.9 |
| 4 | 75002 | 2nd arrondissement | Arrondissement de la Bourse | 5 | Gaillon | 1345 | 18.8 |


In [None]:
!conda install -c conda-forge folium=0.5.0 --yes
import folium

print('Folium installed and imported!')


In [None]:
# Quote the URL so the shell does not treat "&" as a command separator
!wget --quiet "https://opendata.paris.fr/explore/dataset/quartier_paris/download/?format=geojson&timezone=Europe/Berlin&lang=fr" -O paris_neighborhood_geo.json

print('GeoJSON file downloaded!')
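The same download can be done from Python, which avoids shell quoting issues with the `&` characters in the URL. The sketch below uses only the standard library; the helper names `build_geojson_url` and `download_geojson` are hypothetical, and the 60-second timeout is an arbitrary choice.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://opendata.paris.fr/explore/dataset/quartier_paris/download/"
PARAMS = {"format": "geojson", "timezone": "Europe/Berlin", "lang": "fr"}


def build_geojson_url() -> str:
    """Assemble the download URL from its query parameters (hypothetical helper)."""
    # safe="/" keeps the slash in "Europe/Berlin" unescaped, as in the wget command.
    return BASE_URL + "?" + urlencode(PARAMS, safe="/")


def download_geojson(path: str = "paris_neighborhood_geo.json") -> None:
    """Fetch the GeoJSON file and write it to disk (hypothetical helper)."""
    with urlopen(build_geojson_url(), timeout=60) as response, open(path, "wb") as out:
        out.write(response.read())
```

Calling `download_geojson()` writes the file under the same name that the mapping cells expect.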

In [None]:
paris_neighborhood_geo = r'paris_neighborhood_geo.json' # geojson file

# create a base map centered on Paris
paris_map = folium.Map(location=[48.866667, 2.333333], zoom_start=10, tiles='Mapbox Bright')

In [None]:
# generate a choropleth map of the 1999 population of each Paris neighborhood
paris_map.choropleth(
    geo_data=paris_neighborhood_geo,
    data=paris_neighborhoods,
    columns=['NeighborhoodCode', 'Population (1999)'],
    key_on='feature.properties.c_qu',
    fill_color='YlOrRd', 
    fill_opacity=0.7, 
    line_opacity=0.2,
    legend_name='Neighborhoods of Paris'
)

# display map
paris_map
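A choropleth only colors features whose `key_on` value matches a key in the dataframe, and a type mismatch (for example string codes against integer properties) produces no match at all. The sketch below shows one way to check this before drawing. The helper `unmatched_keys` is hypothetical, and the toy GeoJSON assumes that `c_qu` is stored as an integer, which should be verified against the real file.

```python
import pandas as pd


def unmatched_keys(df, column, geojson, prop):
    """Return dataframe keys with no GeoJSON feature carrying the same property value."""
    geo_keys = {feature["properties"][prop] for feature in geojson["features"]}
    return sorted(set(df[column].tolist()) - geo_keys)


# Toy GeoJSON with integer codes (an assumption about the real file).
toy_geo = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"c_qu": 1}, "geometry": None},
        {"type": "Feature", "properties": {"c_qu": 2}, "geometry": None},
    ],
}
toy_df = pd.DataFrame({"NeighborhoodCode": ["1", "2"]})

# String codes never equal integer properties, so nothing matches.
print(unmatched_keys(toy_df, "NeighborhoodCode", toy_geo, "c_qu"))  # ['1', '2']

# After converting the codes to integers, every key finds its feature.
toy_df["NeighborhoodCode"] = toy_df["NeighborhoodCode"].astype(int)
print(unmatched_keys(toy_df, "NeighborhoodCode", toy_geo, "c_qu"))  # []
```

With the real data, the GeoJSON would be loaded with `json.load` and passed in place of `toy_geo`; an empty result means every neighborhood can be colored.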