**"**Below are terms that guide the later description of resources, adapted from Zuk et al. (2015):
- **Neighborhood change**. Broad term used to capture the full spectrum of economic, racial or ethnic, and structural changes in a geographic area, both positive and negative. **Neighborhood revitalization** is a related term that implies change viewed as positive, usually accompanied by new public or private investment.
- **Gentrification**. Transformation of areas historically inhabited by marginalized groups, usually racial or ethnic or class groups, into areas used by the dominant class or racial or ethnic group. Usually characterized by increased investments in areas that have seen long-term disinvestment.
- **Displacement**. Forced or involuntary household movement from place of residence. Usually expanded beyond formal forced moves such as evictions to include unaffordable rents or poor living conditions. Displacement is distinct from residential mobility, which includes voluntary household movement.**"**


This study relays just a glimpse of the copious literacy available amounted by the word "gentrification" itself, therefore it should be considered as a mere Data Science student's exercise from a resident perspective rather than a profound one.
The objective aims to detect trends in historical data from Barcelona city. The choosen period spans from 2009 to 2019. This range was selected due to the next reasons:
1. Eleven years is a lapse long enough to reveal medium/long-term tendencies in a society


2. This period it's been chosen by its "inter-crisis" properties, begins a year after the 2008-SubPrime-crisis and lasts until a year before 2020-Covid crisis. 

With this in mind the chosen data consist of:
- Area in m<sup>2</sup> distributed by their uses in each neghbourhood.
- Price per neighbourhood (both €/m<sup>2</sup> and €/month).
- Airbnb data per neighbourhood.
- Internal Migration Rate per neighbourhood and per thousand people (rent contracts signed and expired contracts).
- Gini index showing the distribution of income inequality across each neighbourhood's population.


In [1]:
# file handling
import csv
import glob
import os
import requests
from PyPDF2 import PdfFileReader
from os import listdir
from os.path import isfile, join
import warnings
warnings.filterwarnings("ignore")

# text related tasks
from bs4 import BeautifulSoup
import regex as re

# dataframe related
import numpy as np
import pandas as pd 

# data visualization libraries
import matplotlib.pyplot as plt
import seaborn as sb
import squarify as sq



## DataSets

___
### A) m<sup>2</sup> by Usage + hood
- Historical Data (2009-2019) of Barcelona area (in m<sup>2</sup>) by it's types of usage over districts and neighbourhoods. (\* Own elaboration from Source: Ajuntament Barcelona <sup>[6]</sup>)

In [2]:
a = pd.read_csv("C:\\Users\\motxi\\Documents\\Data_Science_IT_Academy\\PROJECTE\\data\\locals_us_desti\\superficie_locals_us.csv")
display(a.info())
display(a)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 803 entries, 0 to 802
Data columns (total 15 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   year                  803 non-null    int64  
 1   Dto. Barrios          803 non-null    object 
 2   SUPERFICIETOTAL (m2)  803 non-null    int64  
 3   Vivienda              803 non-null    int64  
 4   Aparcamiento          803 non-null    int64  
 5   Comercio              803 non-null    int64  
 6   Industria             803 non-null    int64  
 7   Oficinas              803 non-null    int64  
 8   Educación             803 non-null    int64  
 9   Sanidad               803 non-null    int64  
 10  Hostelería            803 non-null    int64  
 11  Deportivo             802 non-null    float64
 12  Religioso             803 non-null    int64  
 13  Espectáculos          802 non-null    float64
 14  Otros Usos            803 non-null    int64  
dtypes: float64(2), int64(12

None

Unnamed: 0,year,Dto. Barrios,SUPERFICIETOTAL (m2),Vivienda,Aparcamiento,Comercio,Industria,Oficinas,Educación,Sanidad,Hostelería,Deportivo,Religioso,Espectáculos,Otros Usos
0,2009,1 1. el Raval,2705870,1546988,160328,209740,156295,139212,169324,14885,202307,11446.0,29450,54312.0,11583
1,2009,1 2. el Barri Gòtic,2114277,933865,65002,218969,175900,297734,73075,6897,174594,2752.0,36749,18458.0,110282
2,2009,1 3. la Barceloneta,851315,426500,113036,60800,6285,85832,28139,39271,73385,13774.0,3627,0.0,666
3,2009,"1 4. Sant Pere, Santa Caterina i la Ribera",1915220,1054211,85397,140573,144849,186125,75812,4397,48923,725.0,12003,8435.0,153770
4,2009,2 5. el Fort Pienc,2074293,1270613,223804,127493,167355,132426,45689,10545,31743,20939.0,5986,30715.0,6985
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
798,2019,10 69. Diagonal Mar i el Front Marítim del Pob...,1518167,546633,372922,123387,64401,171627,17292,3027,161899,33687.0,2478,8838.0,8976
799,2019,10 70. el Besòs i el Maresme,1318638,617332,74738,32272,406291,46690,37320,21830,29496,19831.0,2927,23302.0,5828
800,2019,10 71. Provençals del Poblenou,1323832,690749,113981,67431,302946,101174,24142,1106,14921,5409.0,1862,0.0,111
801,2019,10 72. Sant Martí de Provençals,1239199,877469,91069,67049,55810,24738,53932,9272,6055,38075.0,7720,0.0,873


In [3]:
# Minor adjustments
a['Dto. Barrios'] = [re.sub(r"['.',0-9]+", "", x) for x in a['Dto. Barrios']]
a.columns = ['year', 'neighbourhood' , 'total_surface(m2)', 'housing(m2)', 'parking(m2)', 'commerce(m2)', 'industry(m2)', 'offices(m2)', 'education(m2)', 'healthcare(m2)', 'hostelry(m2)', 'sports(m2)', 'religious(m2)', 'entertainment(m2)', 'other_uses(m2)']
a['year'] =[int(x) for x in a['year']]
df_surface_uses = a
df_surface_uses['neighbourhood'] = [re.sub(r"lArpa", "l'Arpa", x) for x in df_surface_uses['neighbourhood']]
df_surface_uses['neighbourhood'] = [re.sub(r"lAntiga", "l'Antiga", x) for x in df_surface_uses['neighbourhood']]
df_surface_uses['neighbourhood'] = [re.sub(r"lEixample", "l'Eixample", x) for x in df_surface_uses['neighbourhood']]
df_surface_uses['neighbourhood'] = [re.sub(r"dHebron", "d'Hebron", x) for x in df_surface_uses['neighbourhood']]
df_surface_uses['neighbourhood'] = [re.sub(r"den", "d'en", x) for x in df_surface_uses['neighbourhood']]
df_surface_uses['neighbourhood'] = [re.sub(r"\s-\s.+Franca", "", x) for x in df_surface_uses['neighbourhood']]

# solve blank spaces problem
df_surface_uses['neighbourhood'] = [x.strip() for x in df_surface_uses['neighbourhood']]

# reorder df
df_surface_uses = df_surface_uses.sort_values(by=['year', 'neighbourhood'], ignore_index=True)
display(df_surface_uses)

Unnamed: 0,year,neighbourhood,total_surface(m2),housing(m2),parking(m2),commerce(m2),industry(m2),offices(m2),education(m2),healthcare(m2),hostelry(m2),sports(m2),religious(m2),entertainment(m2),other_uses(m2)
0,2009,Baró de Viver,110971,80163,10269,11414,2004,707,5883,0,0,0.0,224,0.0,307
1,2009,Can Baró,410247,316139,28780,22886,25952,3677,8613,1256,1778,1112.0,0,0.0,54
2,2009,Can Peguera,67195,54156,3910,2907,533,0,5315,0,0,0.0,374,0.0,0
3,2009,Canyelles,303783,240025,8142,9562,22129,7034,7828,1768,628,6198.0,105,0.0,364
4,2009,Ciutat Meridiana,277044,233846,2009,10199,7813,2279,15732,456,2486,1232.0,992,0.0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
798,2019,la Vila Olímpica del Poblenou,1125546,421411,209949,79469,138461,85870,43615,10741,38681,7525.0,736,0.0,67582
799,2019,la Vila de Gràcia,3422349,2228680,258600,298087,154683,164726,58427,49120,129114,11245.0,45463,20326.0,11
800,2019,les Corts,3889778,1961694,650576,342885,217246,413194,57357,82776,112409,24703.0,13155,7837.0,117
801,2019,les Roquetes,569201,418422,43119,39777,27415,2395,17901,3449,966,15634.0,123,0.0,0



### B) AirBnB

- Historical data (2010-2022) of Airbnb including Host_Since, First_Review and Neighborhood amongst many

In [4]:
df1 = pd.concat(map(pd.read_csv, glob.glob("C:\\Users\\motxi\\Documents\\Data_Science_IT_Academy\\PROJECTE\data\\airbnb\\resta/*.csv")), ignore_index=True)

In [5]:
df1.info()
display(df1)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 64601 entries, 0 to 64600
Data columns (total 74 columns):
 #   Column                                        Non-Null Count  Dtype  
---  ------                                        --------------  -----  
 0   id                                            64601 non-null  int64  
 1   listing_url                                   64601 non-null  object 
 2   scrape_id                                     64601 non-null  int64  
 3   last_scraped                                  64601 non-null  object 
 4   name                                          64561 non-null  object 
 5   description                                   64215 non-null  object 
 6   neighborhood_overview                         38494 non-null  object 
 7   picture_url                                   64601 non-null  object 
 8   host_id                                       64601 non-null  int64  
 9   host_url                                      64601 non-null 

Unnamed: 0,id,listing_url,scrape_id,last_scraped,name,description,neighborhood_overview,picture_url,host_id,host_url,...,review_scores_communication,review_scores_location,review_scores_value,license,instant_bookable,calculated_host_listings_count,calculated_host_listings_count_entire_homes,calculated_host_listings_count_private_rooms,calculated_host_listings_count_shared_rooms,reviews_per_month
0,18674,https://www.airbnb.com/rooms/18674,20211207182400,2021-12-08,Huge flat for 8 people close to Sagrada Familia,110m2 apartment to rent in Barcelona. Located ...,Apartment in Barcelona located in the heart of...,https://a0.muscache.com/pictures/13031453/413c...,71615,https://www.airbnb.com/users/show/71615,...,4.90,4.71,4.29,HUTB-002062,t,19,19,0,0,0.21
1,23197,https://www.airbnb.com/rooms/23197,20211207182400,2021-12-08,Forum CCIB DeLuxe★Spacious &Elegant★Large Balcony,"Beautiful spacious apartment, large terrace, 5...","Strategically located in the Parc del Fòrum, a...",https://a0.muscache.com/pictures/miso/Hosting-...,90417,https://www.airbnb.com/users/show/90417,...,4.98,4.66,4.67,HUTB-005057,f,2,2,0,0,0.41
2,32711,https://www.airbnb.com/rooms/32711,20211207182400,2021-12-08,Sagrada Familia area - Còrsega 1,A lovely two bedroom apartment only 250 m from...,What's nearby <br />This apartment is located...,https://a0.muscache.com/pictures/357b25e4-f414...,135703,https://www.airbnb.com/users/show/135703,...,4.79,4.81,4.40,HUTB-001722,t,3,3,0,0,0.50
3,34981,https://www.airbnb.com/rooms/34981,20211207182400,2021-12-07,VIDRE HOME PLAZA REAL on LAS RAMBLAS,Spacious apartment for large families or group...,"Located in Ciutat Vella in the Gothic Quarter,...",https://a0.muscache.com/pictures/c4d1723c-e479...,73163,https://www.airbnb.com/users/show/73163,...,4.68,4.73,4.47,HUTB-001506,f,2,2,0,0,1.17
4,35318,https://www.airbnb.com/rooms/35318,20211207182400,2021-12-07,Luxury room with private bathroom and balcony,"Luxury Room with King Size bed, private bathro...",The Gothic Quarter of Barcelona. One of the ol...,https://a0.muscache.com/pictures/miso/Hosting-...,152070,https://www.airbnb.com/users/show/152070,...,4.83,4.81,4.68,,t,1,0,1,0,1.89
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
64596,52159926,https://www.airbnb.com/rooms/52159926,20210910193102,2021-09-11,Beautiful 2 Bed Apt in front of Montjuïc park,The spectacular 2-bedroom apartment which has ...,,https://a0.muscache.com/pictures/5cc238dd-f092...,130223809,https://www.airbnb.com/users/show/130223809,...,,,,,f,35,35,0,0,
64597,52179563,https://www.airbnb.com/rooms/52179563,20210910193102,2021-09-11,Suites GV Sant Antoni - 2 bedroom with balcony,This lovely apartment is located in one of the...,Typical barcelonian neighbourhood full of serv...,https://a0.muscache.com/pictures/57e8d75b-210d...,10704,https://www.airbnb.com/users/show/10704,...,,,,,f,16,14,0,0,
64598,52183684,https://www.airbnb.com/rooms/52183684,20210910193102,2021-09-11,Laforja 57 2º2º - Room 2,Esta bonita habitación Inèdit es una pequeña h...,,https://a0.muscache.com/pictures/f3d5ef26-0ebb...,1503151,https://www.airbnb.com/users/show/1503151,...,,,,,f,30,0,30,0,
64599,52185585,https://www.airbnb.com/rooms/52185585,20210910193102,2021-09-11,Stylish and centric apartment in Barcelona 5a,Elegant apartment located in the center of Bar...,,https://a0.muscache.com/pictures/aa452efd-df96...,212431311,https://www.airbnb.com/users/show/212431311,...,,,,HUTB-010157,f,14,14,0,0,


In [6]:
df2 = pd.read_csv(r"C:\\Users\\motxi\\Documents\\Data_Science_IT_Academy\\PROJECTE\\data\\airbnb\\fins2020\\airbnb-listings_until_October_2020.csv", sep=';')
df2.info()  
display(df2)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 17430 entries, 0 to 17429
Data columns (total 89 columns):
 #   Column                          Non-Null Count  Dtype  
---  ------                          --------------  -----  
 0   ID                              17430 non-null  int64  
 1   Listing Url                     17430 non-null  object 
 2   Scrape ID                       17430 non-null  int64  
 3   Last Scraped                    17430 non-null  object 
 4   Name                            17429 non-null  object 
 5   Summary                         16781 non-null  object 
 6   Space                           12493 non-null  object 
 7   Description                     17428 non-null  object 
 8   Experiences Offered             17430 non-null  object 
 9   Neighborhood Overview           10285 non-null  object 
 10  Notes                           7979 non-null   object 
 11  Transit                         10063 non-null  object 
 12  Access                          

Unnamed: 0,ID,Listing Url,Scrape ID,Last Scraped,Name,Summary,Space,Description,Experiences Offered,Neighborhood Overview,...,Review Scores Communication,Review Scores Location,Review Scores Value,License,Jurisdiction Names,Cancellation Policy,Calculated host listings count,Reviews per Month,Geolocation,Features
0,1822149,https://www.airbnb.com/rooms/1822149,20170407214050,2017-04-08,Nice & Quite room Raval Boquería center,"Habitación Privada cerca del Macba, en el Barr...",Ascensor y montacargas,"Habitación Privada cerca del Macba, en el Barr...",none,,...,9.0,10.0,9.0,,,strict,3,2.04,"41.38189293967732,2.172397386530445","Host Has Profile Pic,Is Location Exact,Require..."
1,6290341,https://www.airbnb.com/rooms/6290341,20170407214050,2017-04-08,GORGEOUS DESIGN GROUND FLOOR,Thanks for checking our apartment out. As exp...,"Welcome to Barcelona, the city where culture, ...",Thanks for checking our apartment out. As exp...,none,El Raval is a neighbourhood in the Ciutat Vell...,...,9.0,9.0,9.0,,,flexible,13,2.66,"41.37961797606787,2.1658124402265333","Host Has Profile Pic,Host Identity Verified,Re..."
2,11629529,https://www.airbnb.com/rooms/11629529,20170407214050,2017-04-08,Cute attic in the city centre.,Cute and sunny attic with terrace in the city ...,,Cute and sunny attic with terrace in the city ...,none,,...,10.0,9.0,9.0,,,strict,1,1.29,"41.37839492550796,2.1643348628845187","Host Has Profile Pic,Host Identity Verified,Is..."
3,3845305,https://www.airbnb.com/rooms/3845305,20170407214050,2017-04-08,LOVELY ROOM IN LICEU,Double room in the middle of the city. The ro...,Double room in the middle of the city. The ro...,Double room in the middle of the city. The ro...,none,Raval,...,,,,,,flexible,1,,"41.3790885238818,2.173287122755557","Host Has Profile Pic,Requires License"
4,11312258,https://www.airbnb.com/rooms/11312258,20170407214050,2017-04-08,ESTUDIO RAMBLAS A,Apart-Ramblas108 offers modern apartments situ...,The stylish Apart-Ramblas108 apartments have a...,Apart-Ramblas108 offers modern apartments situ...,none,Apart-Ramblas108 is set on the edge of the Got...,...,10.0,10.0,10.0,,,strict,8,1.06,"41.38208379449846,2.171824061369549","Host Has Profile Pic,Is Location Exact,Require..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
17425,16344584,https://www.airbnb.com/rooms/16344584,20170407214050,2017-04-08,Doble habitación con baño privado y terraza,Lugares de interés: Carrefour Market Mercat de...,"La habitación es preciosa, dentro un apartamen...",Lugares de interés: Carrefour Market Mercat de...,none,El barrio es muy agradable y tranquilo. Hay el...,...,,,,,,moderate,5,,"41.41974178260469,2.197252304352387","Host Has Profile Pic,Host Identity Verified,Re..."
17426,12247076,https://www.airbnb.com/rooms/12247076,20170407214050,2017-04-08,"Cozy room, 15 minutes to the beach!","Bright, cozy, big room with double bed in a sp...","Cozy, bright, big room with double bed. This i...","Bright, cozy, big room with double bed in a sp...",none,"The house is situated in a quiet, green, beaut...",...,8.0,9.0,8.0,,,moderate,4,3.53,"41.41384607397222,2.1993715441197725","Host Has Profile Pic,Host Identity Verified,Re..."
17427,12808754,https://www.airbnb.com/rooms/12808754,20170407214050,2017-04-08,"Habitación Calida, confortable y limpia",Habitación muy agradable en la que se puede es...,,Habitación muy agradable en la que se puede es...,none,,...,9.0,9.0,9.0,,,flexible,1,4.38,"41.41584716563923,2.1961851138955835","Host Has Profile Pic,Is Location Exact,Require..."
17428,15576078,https://www.airbnb.com/rooms/15576078,20170407214050,2017-04-08,"Cama doble,piso nuevo y exterior",Cama doble en habitacion exterior. Todo elpiso...,,Cama doble en habitacion exterior. Todo elpiso...,none,,...,,,,,,flexible,1,,"41.420750051185536,2.2073973246572947","Host Has Profile Pic,Host Identity Verified,Re..."


# aquí pilla el maxim de info util

In [7]:

a = df2.filter(items=['Neighbourhood Group Cleansed', 'Neighbourhood Cleansed','ID','Property Type', 'Room Type', 'Beds', 'License', 'Price', 'Number of Reviews', 'Reviews per Month', 'First Review', 'Last Review', 'Review Scores Location', 'Review Scores Communication','Host ID', 'Host Since','Host Listings Count'])
b = df1.filter(items=['neighbourhood_group_cleansed', 'neighbourhood_cleansed', 'id','property_type', 'room_type', 'beds', 'license', 'price', 'number_of_reviews', 'reviews_per_month', 'first_review', 'last_review', 'review_scores_location', 'review_scores_communication','host_id', 'host_since','host_listings_count'])
a.columns = b.columns
a.info()
b.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 17430 entries, 0 to 17429
Data columns (total 17 columns):
 #   Column                        Non-Null Count  Dtype  
---  ------                        --------------  -----  
 0   neighbourhood_group_cleansed  17430 non-null  object 
 1   neighbourhood_cleansed        17430 non-null  object 
 2   id                            17430 non-null  int64  
 3   property_type                 17430 non-null  object 
 4   room_type                     17430 non-null  object 
 5   beds                          17394 non-null  float64
 6   license                       3757 non-null   object 
 7   price                         17388 non-null  float64
 8   number_of_reviews             17430 non-null  int64  
 9   reviews_per_month             14028 non-null  float64
 10  first_review                  14028 non-null  object 
 11  last_review                   14021 non-null  object 
 12  review_scores_location        13823 non-null  float64
 13  r

In [8]:
# # correct simbols in price column
b['price'] = [x.replace("$", "").replace(",","") for x in b['price']]
b['price'] = [float(x) for x in b['price']]

# concatenate
frames = [a,b]
df_airbnb = pd.concat(frames,axis=0, ignore_index=True)

# convert host_since column to datetime 
df_airbnb['host_since'] = pd.to_datetime(df_airbnb['host_since'])
df_airbnb['first_review'] = pd.to_datetime(df_airbnb['first_review'])
df_airbnb['last_review'] = pd.to_datetime(df_airbnb['last_review'])

In [9]:
print('\nfinal dataframe airbnb:-----------------------------')
display(df_airbnb)


final dataframe airbnb:-----------------------------


Unnamed: 0,neighbourhood_group_cleansed,neighbourhood_cleansed,id,property_type,room_type,beds,license,price,number_of_reviews,reviews_per_month,first_review,last_review,review_scores_location,review_scores_communication,host_id,host_since,host_listings_count
0,Ciutat Vella,el Raval,1822149,Apartment,Private room,1.0,,36.0,50,2.04,2015-04-05,2016-12-12,10.0,9.0,9539548,2013-10-20,3.0
1,Ciutat Vella,el Raval,6290341,Apartment,Entire home/apt,3.0,,69.0,62,2.66,2015-05-12,2017-03-19,9.0,9.0,1853675,2012-03-04,14.0
2,Ciutat Vella,el Raval,11629529,Apartment,Entire home/apt,2.0,,79.0,16,1.29,2016-04-01,2017-03-27,9.0,10.0,61619668,2016-03-05,1.0
3,Ciutat Vella,el Raval,3845305,Apartment,Private room,1.0,,65.0,0,,NaT,NaT,,,19821352,2014-08-11,1.0
4,Ciutat Vella,el Raval,11312258,Apartment,Entire home/apt,2.0,,125.0,7,1.06,2016-09-22,2017-04-07,10.0,10.0,55366754,2016-01-20,8.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
82026,Sants-Montjuïc,el Poble Sec,52159926,Entire rental unit,Entire home/apt,2.0,,58.0,0,,NaT,NaT,,,130223809,2017-05-14,38.0
82027,Eixample,la Nova Esquerra de l'Eixample,52179563,Entire serviced apartment,Entire home/apt,3.0,,110.0,0,,NaT,NaT,,,10704,2009-03-19,15.0
82028,Sarrià-Sant Gervasi,Sant Gervasi - Galvany,52183684,Private room in rental unit,Private room,1.0,,18.0,0,,NaT,NaT,,,1503151,2011-12-14,82.0
82029,Eixample,l'Antiga Esquerra de l'Eixample,52185585,Entire rental unit,Entire home/apt,4.0,HUTB-010157,265.0,0,,NaT,NaT,,,212431311,2018-08-29,16.0



### C) Internal Migration Rate

- Historical data (2009-2022) of new/expired contracts every 1000 people

In [10]:
df3 = pd.concat(map(pd.read_csv, glob.glob("C:\\Users\\motxi\\Documents\\Data_Science_IT_Academy\\PROJECTE\data\\taxa_migracio_interna(altes_i_baixes)\\09to18/*.csv")), ignore_index=True)
df4 = pd.concat(map(pd.read_csv, glob.glob("C:\\Users\\motxi\\Documents\\Data_Science_IT_Academy\\PROJECTE\data\\taxa_migracio_interna(altes_i_baixes)\\resta/*.csv")), ignore_index=True)

display(df3.info())
display(df4.info())

display(df3)
display(df4)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1460 entries, 0 to 1459
Data columns (total 7 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Any             1460 non-null   int64  
 1   Codi_districte  1460 non-null   int64  
 2   Nom_districte   1460 non-null   object 
 3   Codi_barri      1460 non-null   int64  
 4   Nom_barri       1460 non-null   object 
 5   Taxa_mil_hab    1460 non-null   object 
 6   Nombre          1460 non-null   float64
dtypes: float64(1), int64(3), object(3)
memory usage: 80.0+ KB


None

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 292 entries, 0 to 291
Data columns (total 7 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Any             292 non-null    int64  
 1   Codi_Districte  292 non-null    int64  
 2   Nom_Districte   292 non-null    object 
 3   Codi_Barri      292 non-null    int64  
 4   Nom_Barri       292 non-null    object 
 5   Moviment        292 non-null    object 
 6   Taxa_mil_hab    292 non-null    float64
dtypes: float64(1), int64(3), object(3)
memory usage: 16.1+ KB


None

Unnamed: 0,Any,Codi_districte,Nom_districte,Codi_barri,Nom_barri,Taxa_mil_hab,Nombre
0,2009,1,Ciutat Vella,1,el Raval,Barri de baixa taxa per mil habitants,156.179661
1,2009,1,Ciutat Vella,2,el Barri Gòtic,Barri de baixa taxa per mil habitants,117.082442
2,2009,1,Ciutat Vella,3,la Barceloneta,Barri de baixa taxa per mil habitants,93.816892
3,2009,1,Ciutat Vella,4,"Sant Pere, Santa Caterina i la Ribera",Barri de baixa taxa per mil habitants,98.245159
4,2009,2,Eixample,5,el Fort Pienc,Barri de baixa taxa per mil habitants,64.002169
...,...,...,...,...,...,...,...
1455,2018,10,Sant Martí,69,Diagonal Mar i el Front Marítim del Poblenou,Barri d'alta taxa per mil habitants,45.159864
1456,2018,10,Sant Martí,70,el Besòs i el Maresme,Barri d'alta taxa per mil habitants,64.074294
1457,2018,10,Sant Martí,71,Provençals del Poblenou,Barri d'alta taxa per mil habitants,39.398071
1458,2018,10,Sant Martí,72,Sant Martí de Provençals,Barri d'alta taxa per mil habitants,41.399193


Unnamed: 0,Any,Codi_Districte,Nom_Districte,Codi_Barri,Nom_Barri,Moviment,Taxa_mil_hab
0,2019,1,Ciutat Vella,1,el Raval,Alta,104.4
1,2019,1,Ciutat Vella,2,el Barri Gòtic,Alta,121.2
2,2019,1,Ciutat Vella,3,la Barceloneta,Alta,85.1
3,2019,1,Ciutat Vella,4,"Sant Pere, Santa Caterina i la Ribera",Alta,81.0
4,2019,2,Eixample,5,el Fort Pienc,Alta,60.1
...,...,...,...,...,...,...,...
287,2020,10,Sant Martí,69,Diagonal Mar i el Front Marítim del Poblenou,Baixa,38.6
288,2020,10,Sant Martí,70,el Besòs i el Maresme,Baixa,51.0
289,2020,10,Sant Martí,71,Provençals del Poblenou,Baixa,39.1
290,2020,10,Sant Martí,72,Sant Martí de Provençals,Baixa,39.8


In [11]:
# columns don't match in name nor in value ...solving:

# create dictionary to map values over df3['Taxa_mil_hab'] column
new_str = {"Barri d'alta taxa per mil habitants":"Alta","Barri de baixa taxa per mil habitants":"Baixa"}
    
# map values over each cell in the column
df3['Taxa_mil_hab'] = df3['Taxa_mil_hab'].map(new_str)
    
# change columns so both dataframes match...solved:
df3.columns = ['Any', 'Codi_Districte', 'Nom_Districte', 'Codi_Barri', 'Nom_Barri', 'Moviment', 'Taxa_mil_hab']
df4.columns = ['Any', 'Codi_Districte', 'Nom_Districte', 'Codi_Barri', 'Nom_Barri', 'Moviment', 'Taxa_mil_hab']

# concatenate dataframes
frames = [df3,df4]
df_internal_migration = pd.concat(frames, ignore_index=True)

# check
display(df_internal_migration.info())
display(df_internal_migration)

# separate movements in terms of Alta(new rent contracts) and Baixa(expired rent contracts)
dummy = pd.get_dummies(df_internal_migration['Moviment'])
df_internal_migration = pd.concat([df_internal_migration, dummy], axis=1)
print('\nCheck the two additional rows ______________')
display(df_internal_migration)

# drop unwanted columns
df_internal_migration.drop(labels=['Codi_Districte', 'Nom_Districte','Codi_Barri', 'Moviment'],axis=1, inplace=True)
print('\nNow it looks much cleaner, but it could be better ______________')
display(df_internal_migration)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1752 entries, 0 to 1751
Data columns (total 7 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Any             1752 non-null   int64  
 1   Codi_Districte  1752 non-null   int64  
 2   Nom_Districte   1752 non-null   object 
 3   Codi_Barri      1752 non-null   int64  
 4   Nom_Barri       1752 non-null   object 
 5   Moviment        1752 non-null   object 
 6   Taxa_mil_hab    1752 non-null   float64
dtypes: float64(1), int64(3), object(3)
memory usage: 95.9+ KB


None

Unnamed: 0,Any,Codi_Districte,Nom_Districte,Codi_Barri,Nom_Barri,Moviment,Taxa_mil_hab
0,2009,1,Ciutat Vella,1,el Raval,Baixa,156.179661
1,2009,1,Ciutat Vella,2,el Barri Gòtic,Baixa,117.082442
2,2009,1,Ciutat Vella,3,la Barceloneta,Baixa,93.816892
3,2009,1,Ciutat Vella,4,"Sant Pere, Santa Caterina i la Ribera",Baixa,98.245159
4,2009,2,Eixample,5,el Fort Pienc,Baixa,64.002169
...,...,...,...,...,...,...,...
1747,2020,10,Sant Martí,69,Diagonal Mar i el Front Marítim del Poblenou,Baixa,38.600000
1748,2020,10,Sant Martí,70,el Besòs i el Maresme,Baixa,51.000000
1749,2020,10,Sant Martí,71,Provençals del Poblenou,Baixa,39.100000
1750,2020,10,Sant Martí,72,Sant Martí de Provençals,Baixa,39.800000



Check the two additional rows ______________


Unnamed: 0,Any,Codi_Districte,Nom_Districte,Codi_Barri,Nom_Barri,Moviment,Taxa_mil_hab,Alta,Baixa
0,2009,1,Ciutat Vella,1,el Raval,Baixa,156.179661,0,1
1,2009,1,Ciutat Vella,2,el Barri Gòtic,Baixa,117.082442,0,1
2,2009,1,Ciutat Vella,3,la Barceloneta,Baixa,93.816892,0,1
3,2009,1,Ciutat Vella,4,"Sant Pere, Santa Caterina i la Ribera",Baixa,98.245159,0,1
4,2009,2,Eixample,5,el Fort Pienc,Baixa,64.002169,0,1
...,...,...,...,...,...,...,...,...,...
1747,2020,10,Sant Martí,69,Diagonal Mar i el Front Marítim del Poblenou,Baixa,38.600000,0,1
1748,2020,10,Sant Martí,70,el Besòs i el Maresme,Baixa,51.000000,0,1
1749,2020,10,Sant Martí,71,Provençals del Poblenou,Baixa,39.100000,0,1
1750,2020,10,Sant Martí,72,Sant Martí de Provençals,Baixa,39.800000,0,1



Now it looks much cleaner, but it could be better ______________


Unnamed: 0,Any,Nom_Barri,Taxa_mil_hab,Alta,Baixa
0,2009,el Raval,156.179661,0,1
1,2009,el Barri Gòtic,117.082442,0,1
2,2009,la Barceloneta,93.816892,0,1
3,2009,"Sant Pere, Santa Caterina i la Ribera",98.245159,0,1
4,2009,el Fort Pienc,64.002169,0,1
...,...,...,...,...,...
1747,2020,Diagonal Mar i el Front Marítim del Poblenou,38.600000,0,1
1748,2020,el Besòs i el Maresme,51.000000,0,1
1749,2020,Provençals del Poblenou,39.100000,0,1
1750,2020,Sant Martí de Provençals,39.800000,0,1


Split the Dataframe in two, reset the indexes, delete redundant columns and concatenate both df's result will be half the rows containing the same information.

In [12]:
# split dataframe in two 
a = df_internal_migration.loc[df_internal_migration['Alta'] == 1]  
a = pd.DataFrame(a)
a = a.rename(columns={'Taxa_mil_hab':'new_contracts_1000_hab', 'Nom_Barri':'neighbourhood', 'Any':'year'})
a.drop(columns=['Alta','Baixa'], axis=1, inplace=True)
a = a.reset_index(drop=True)
display(a.info())
b = df_internal_migration.loc[df_internal_migration['Baixa'] == 1]       
b = pd.DataFrame(b)
b = b.rename(columns={'Taxa_mil_hab':'expired_contracts_1000_hab'})
b.drop(columns=['Any','Nom_Barri', 'Alta', 'Baixa'], axis=1, inplace=True)
b = b.reset_index(drop=True)
display(b.info())

# concatenate
df_internal_migration = pd.concat([a,b], axis=1)

# select until 2019, included
df_internal_migration = df_internal_migration.loc[df_internal_migration['year']< 2020]

display(df_internal_migration)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 876 entries, 0 to 875
Data columns (total 3 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   year                    876 non-null    int64  
 1   neighbourhood           876 non-null    object 
 2   new_contracts_1000_hab  876 non-null    float64
dtypes: float64(1), int64(1), object(1)
memory usage: 20.7+ KB


None

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 876 entries, 0 to 875
Data columns (total 1 columns):
 #   Column                      Non-Null Count  Dtype  
---  ------                      --------------  -----  
 0   expired_contracts_1000_hab  876 non-null    float64
dtypes: float64(1)
memory usage: 7.0 KB


None

Unnamed: 0,year,neighbourhood,new_contracts_1000_hab,expired_contracts_1000_hab
0,2009,el Raval,147.034371,156.179661
1,2009,el Barri Gòtic,75.980043,117.082442
2,2009,la Barceloneta,84.337349,93.816892
3,2009,"Sant Pere, Santa Caterina i la Ribera",89.600622,98.245159
4,2009,el Fort Pienc,54.846094,64.002169
...,...,...,...,...
798,2019,Diagonal Mar i el Front Marítim del Poblenou,42.700000,51.200000
799,2019,el Besòs i el Maresme,72.000000,69.800000
800,2019,Provençals del Poblenou,53.200000,46.500000
801,2019,Sant Martí de Provençals,45.700000,44.900000


In [13]:
# create new binary column (rent_growth_1000_hab) based on the difference of
# signed minus expired contracts. 
df_internal_migration['win_lost_rents_1000_hab'] = df_internal_migration['new_contracts_1000_hab'] - df_internal_migration['expired_contracts_1000_hab']

df_internal_migration['binary_rent_growth_1000_hab'] = df_internal_migration['new_contracts_1000_hab'] - df_internal_migration['expired_contracts_1000_hab']
df_internal_migration['binary_rent_growth_1000_hab'][df_internal_migration['binary_rent_growth_1000_hab']<0] = 0
df_internal_migration['binary_rent_growth_1000_hab'][df_internal_migration['binary_rent_growth_1000_hab']>0] = 1

# correct typos
df_internal_migration['neighbourhood'] = [x.replace(",","") for x in df_internal_migration['neighbourhood']]
df_internal_migration['neighbourhood'] = [re.sub(r"lArpa", "l'Arpa", x) for x in df_internal_migration['neighbourhood']]
df_internal_migration['neighbourhood'] = [re.sub(r"lAntiga", "l'Antiga", x) for x in df_internal_migration['neighbourhood']]
df_internal_migration['neighbourhood'] = [re.sub(r"lEixample", "l'Eixample", x) for x in df_internal_migration['neighbourhood']]
df_internal_migration['neighbourhood'] = [re.sub(r"dHebron", "d'Hebron", x) for x in df_internal_migration['neighbourhood']]
df_internal_migration['neighbourhood'] = [re.sub(r"\s-\s.+Franca", "", x) for x in df_internal_migration['neighbourhood']]
df_internal_migration['neighbourhood'] = [re.sub(r"den", "d'en", x) for x in df_internal_migration['neighbourhood']]
df_internal_migration['neighbourhood'] = [re.sub(r".+Sec\s*$", "el Poble Sec - Parc Montjuïc", x) for x in df_internal_migration['neighbourhood']]

# solve blank spaces problem
df_internal_migration['neighbourhood'] = [x.strip() for x in df_internal_migration['neighbourhood']]

# finally order by year and neighbourhood
df_internal_migration = df_internal_migration.sort_values(by=['year','neighbourhood'], ignore_index=True)
display(df_internal_migration)

Unnamed: 0,year,neighbourhood,new_contracts_1000_hab,expired_contracts_1000_hab,win_lost_rents_1000_hab,binary_rent_growth_1000_hab
0,2009,Baró de Viver,32.883642,40.050590,-7.166948,0.0
1,2009,Can Baró,56.119664,56.228846,-0.109182,0.0
2,2009,Can Peguera,65.610860,36.199095,29.411765,1.0
3,2009,Canyelles,26.226389,32.884903,-6.658513,0.0
4,2009,Ciutat Meridiana,93.879348,99.251431,-5.372083,0.0
...,...,...,...,...,...,...
798,2019,la Vila Olímpica del Poblenou,45.100000,46.100000,-1.000000,0.0
799,2019,la Vila de Gràcia,63.100000,72.000000,-8.900000,0.0
800,2019,les Corts,47.000000,47.800000,-0.800000,0.0
801,2019,les Roquetes,65.600000,66.200000,-0.600000,0.0


In [14]:
llista_barris_df_migration = sorted(df_internal_migration['neighbourhood'].unique())
if len(llista_barris_df_migration) == 73:
    print('''Nº unique neighb. names = 73 SOLVED!!!
Neighbourhood typos corrected in df_internal_migration''')
else: print('keep trying! There is still' , len(llista_barris_df_migration), 'names')

Nº unique neighb. names = 73 SOLVED!!!
Neighbourhood typos corrected in df_internal_migration



### D) Rent Prices (€/month & €/m<sup>2</sup>)

- Historical Data (2014-2022) of Barcelona area medium rent prices by zone.
(*Source: Barcelona.OpenData*)

- Historical Data (2009-2013) of Barcelona area medium rent prices by zone.
(*Source: Barcelona.OpenData*)

In [15]:
df_rent_prices = pd.concat(map(pd.read_csv, glob.glob("C:\\Users\\motxi\\Documents\\Data_Science_IT_Academy\\PROJECTE\\data\\lloguer_preumitj_mt2_pis\\2014-2022/*.csv")),  ignore_index=True)
display(df_rent_prices)

Unnamed: 0,Any,Trimestre,Codi_Districte,Nom_Districte,Codi_Barri,Nom_Barri,Lloguer_mitja,Preu
0,2014,1,1,Ciutat Vella,1,el Raval,Lloguer mitjà mensual (Euros/mes),589.55
1,2014,1,1,Ciutat Vella,2,el Barri Gòtic,Lloguer mitjà mensual (Euros/mes),712.79
2,2014,1,1,Ciutat Vella,3,la Barceloneta,Lloguer mitjà mensual (Euros/mes),540.71
3,2014,1,1,Ciutat Vella,4,"Sant Pere, Santa Caterina i la Ribera",Lloguer mitjà mensual (Euros/mes),673.44
4,2014,1,2,Eixample,5,el Fort Pienc,Lloguer mitjà mensual (Euros/mes),736.09
...,...,...,...,...,...,...,...,...
4813,2022,1,10,Sant Martí,69,Diagonal Mar i el Front Marítim del Poblenou,Lloguer mitjà per superfície (Euros/m2 mes),15.0
4814,2022,1,10,Sant Martí,70,el Besòs i el Maresme,Lloguer mitjà per superfície (Euros/m2 mes),12.0
4815,2022,1,10,Sant Martí,71,Provençals del Poblenou,Lloguer mitjà per superfície (Euros/m2 mes),13.5
4816,2022,1,10,Sant Martí,72,Sant Martí de Provençals,Lloguer mitjà per superfície (Euros/m2 mes),11.4


In [16]:
# separate prices in terms of €/m2, €/month
dummy = pd.get_dummies(df_rent_prices['Lloguer_mitja'])
df_rent_prices = pd.concat([df_rent_prices, dummy], axis=1)
df_rent_prices.drop(labels=['Codi_Districte', 'Nom_Districte','Codi_Barri', 'Lloguer_mitja' ],axis=1, inplace=True)
df_rent_prices

Unnamed: 0,Any,Trimestre,Nom_Barri,Preu,Lloguer mitjà mensual (Euros/mes),Lloguer mitjà per superfície (Euros/m2 mes)
0,2014,1,el Raval,589.55,1,0
1,2014,1,el Barri Gòtic,712.79,1,0
2,2014,1,la Barceloneta,540.71,1,0
3,2014,1,"Sant Pere, Santa Caterina i la Ribera",673.44,1,0
4,2014,1,el Fort Pienc,736.09,1,0
...,...,...,...,...,...,...
4813,2022,1,Diagonal Mar i el Front Marítim del Poblenou,15.0,0,1
4814,2022,1,el Besòs i el Maresme,12.0,0,1
4815,2022,1,Provençals del Poblenou,13.5,0,1
4816,2022,1,Sant Martí de Provençals,11.4,0,1


In [17]:
# €/month lines
a = df_rent_prices.loc[df_rent_prices['Lloguer mitjà mensual (Euros/mes)'] == 1]  
a = pd.DataFrame(a)
a.drop(columns=['Lloguer mitjà mensual (Euros/mes)','Lloguer mitjà per superfície (Euros/m2 mes)'], axis=1, inplace=True)
a = a.reset_index(drop=True)

# €/m2 
b = df_rent_prices.loc[df_rent_prices['Lloguer mitjà per superfície (Euros/m2 mes)'] == 1]       
b = pd.DataFrame(b)
b.drop(columns=['Any','Trimestre','Nom_Barri','Lloguer mitjà mensual (Euros/mes)','Lloguer mitjà per superfície (Euros/m2 mes)'], axis=1, inplace=True)
b = b.reset_index(drop=True)

print(a.shape)
print(b.shape)

df_rents = pd.concat([a,b], axis=1)
df_rents.columns = ['year','quarter', 'neighbourhood', 'avg_€/month', 'avg_€/m2']
df_rents['neighbourhood'] = [x.replace(',', '') for x in df_rents['neighbourhood']]
df_rents = df_rents[df_rents['year'] < 2020]
display(df_rents)

(2409, 4)
(2409, 1)


Unnamed: 0,year,quarter,neighbourhood,avg_€/month,avg_€/m2
0,2014,1,el Raval,589.55,10.76
1,2014,1,el Barri Gòtic,712.79,10.58
2,2014,1,la Barceloneta,540.71,14.4
3,2014,1,Sant Pere Santa Caterina i la Ribera,673.44,11.01
4,2014,1,el Fort Pienc,736.09,10.42
...,...,...,...,...,...
1747,2019,4,Diagonal Mar i el Front Marítim del Poblenou,1498.6,18.5
1748,2019,4,el Besòs i el Maresme,693.8,10.8
1749,2019,4,Provençals del Poblenou,1013.6,14.5
1750,2019,4,Sant Martí de Provençals,835.7,12.0


Makes sense, since we have 4 rows (quarters) per year, we'll compute the average

In [18]:
# check types of columns
df_rents.info()

# check nan values
display(df_rents.isna().sum())

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1752 entries, 0 to 1751
Data columns (total 5 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   year           1752 non-null   int64 
 1   quarter        1752 non-null   int64 
 2   neighbourhood  1752 non-null   object
 3   avg_€/month    1638 non-null   object
 4   avg_€/m2       1637 non-null   object
dtypes: int64(2), object(3)
memory usage: 82.1+ KB


year               0
quarter            0
neighbourhood      0
avg_€/month      114
avg_€/m2         115
dtype: int64

In [19]:
# convert numerical columns
df_rents['avg_€/month'] = pd.to_numeric(df_rents['avg_€/month'])
df_rents['avg_€/m2'] = pd.to_numeric(df_rents['avg_€/m2'])

# get the mean of all the trimesters grouping by year and neighbourhood 
df_rents = df_rents.groupby(['year','neighbourhood'])['avg_€/month','avg_€/m2'].mean()

# reset index and sort dataframe per year and neighbourhood
df_rents = df_rents.reset_index()
df_rents = df_rents.sort_values(by=['year', 'neighbourhood'], ignore_index=True)


In [20]:
df_rents.info()
df_rents.isna().sum()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 438 entries, 0 to 437
Data columns (total 4 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   year           438 non-null    int64  
 1   neighbourhood  438 non-null    object 
 2   avg_€/month    421 non-null    float64
 3   avg_€/m2       421 non-null    float64
dtypes: float64(2), int64(1), object(1)
memory usage: 13.8+ KB


year              0
neighbourhood     0
avg_€/month      17
avg_€/m2         17
dtype: int64

In [21]:
# get 2009-2013 data

c = pd.read_csv(r'C:\\Users\\motxi\\Documents\\Data_Science_IT_Academy\\PROJECTE\\data\\lloguer_preumitj_mt2_pis\\2009-2013\\2009-2013_lloguer_preu_trim.csv')

# correct typos
c['avg_€/m2'] =[x.replace(',', '.').replace('--', '').replace('"', '') for x in c['avg_€/m2']]
c['avg_€/month'] =[x.replace('--', '') for x in c['avg_€/month']]
c['neighbourhood'] = [re.sub(r"['.',0-9]+", "", x) for x in c['neighbourhood']]
c['avg_€/month'] = pd.to_numeric(c['avg_€/month'])
c['avg_€/m2'] = pd.to_numeric(c['avg_€/m2'])

# order by year and hood
c.sort_values(by=['year', 'neighbourhood'], ignore_index=True, inplace=True)

display(c.info())
display(df_rents.info())

# concatenate
df_price_rent = pd.concat([c,df_rents],axis=0, ignore_index=True, verify_integrity=True)

# solve blank spaces problem
df_price_rent['neighbourhood'] = [x.strip() for x in df_price_rent['neighbourhood']]

df_price_rent['neighbourhood'] = [re.sub(r"lArpa", "l'Arpa", x) for x in df_price_rent['neighbourhood']]
df_price_rent['neighbourhood'] = [re.sub(r"lAntiga", "l'Antiga", x) for x in df_price_rent['neighbourhood']]
df_price_rent['neighbourhood'] = [re.sub(r"lEixample", "l'Eixample", x) for x in df_price_rent['neighbourhood']]
df_price_rent['neighbourhood'] = [re.sub(r"dHebron", "d'Hebron", x) for x in df_price_rent['neighbourhood']]
df_price_rent['neighbourhood'] = [re.sub(r"den", "d'en", x) for x in df_price_rent['neighbourhood']]
df_price_rent['neighbourhood'] = [re.sub(r"\s-\s.+Franca", "", x) for x in df_price_rent['neighbourhood']]
df_price_rent['neighbourhood'] = [re.sub(r".+Sec\s*$", "el Poble Sec - Parc Montjuïc", x) for x in df_price_rent['neighbourhood']]


#create new column average housing (m2)
df_price_rent['avg_housing(m2)'] = df_price_rent['avg_€/month'] / df_price_rent['avg_€/m2']

# rearrange df
df_price_rent = df_price_rent.sort_values(by=['year', 'neighbourhood'], ignore_index=True)

# check
display(df_price_rent.info())
df_price_rent


# for debugging
f = df_price_rent.loc[df_price_rent['neighbourhood']== 'Pedralbes']
display(f)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 365 entries, 0 to 364
Data columns (total 4 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   year           365 non-null    int64  
 1   neighbourhood  365 non-null    object 
 2   avg_€/month    205 non-null    float64
 3   avg_€/m2       205 non-null    float64
dtypes: float64(2), int64(1), object(1)
memory usage: 11.5+ KB


None

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 438 entries, 0 to 437
Data columns (total 4 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   year           438 non-null    int64  
 1   neighbourhood  438 non-null    object 
 2   avg_€/month    421 non-null    float64
 3   avg_€/m2       421 non-null    float64
dtypes: float64(2), int64(1), object(1)
memory usage: 13.8+ KB


None

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 803 entries, 0 to 802
Data columns (total 5 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   year             803 non-null    int64  
 1   neighbourhood    803 non-null    object 
 2   avg_€/month      626 non-null    float64
 3   avg_€/m2         626 non-null    float64
 4   avg_housing(m2)  626 non-null    float64
dtypes: float64(3), int64(1), object(1)
memory usage: 31.5+ KB


None

Unnamed: 0,year,neighbourhood,avg_€/month,avg_€/m2,avg_housing(m2)
10,2009,Pedralbes,1.295,14.21,0.091133
83,2010,Pedralbes,1.318,15.01,0.087808
156,2011,Pedralbes,2.372,14.42,0.164494
229,2012,Pedralbes,,,
302,2013,Pedralbes,,,
375,2014,Pedralbes,1489.345,12.44,119.722267
448,2015,Pedralbes,1714.145,14.1875,120.820793
521,2016,Pedralbes,1653.6975,14.6725,112.707276
594,2017,Pedralbes,1785.885,16.1425,110.632492
667,2018,Pedralbes,1707.0125,15.6875,108.813546


In [22]:
f = df_price_rent.loc[df_price_rent['neighbourhood']=='Pedralbes']
display(f)

Unnamed: 0,year,neighbourhood,avg_€/month,avg_€/m2,avg_housing(m2)
10,2009,Pedralbes,1.295,14.21,0.091133
83,2010,Pedralbes,1.318,15.01,0.087808
156,2011,Pedralbes,2.372,14.42,0.164494
229,2012,Pedralbes,,,
302,2013,Pedralbes,,,
375,2014,Pedralbes,1489.345,12.44,119.722267
448,2015,Pedralbes,1714.145,14.1875,120.820793
521,2016,Pedralbes,1653.6975,14.6725,112.707276
594,2017,Pedralbes,1785.885,16.1425,110.632492
667,2018,Pedralbes,1707.0125,15.6875,108.813546


### E) Socioeconomic Indicators per neighbourhood

In [23]:
# each of the csv contains a URL with a PDF quickview at neighbourhood 
# stats. Either get url and scrap it or get the pdf's extract text and parse it with regular
# expresions.
    
    # open each csv, add a new column "year" to discern data and store it in a list of df's 
x = map(pd.read_csv, glob.glob("C:\\Users\\motxi\\Documents\\Data_Science_IT_Academy\\PROJECTE\\data\\fitxes_de_barri/*.csv"))
y = 2009
list_dfs = []
for i in x:
    i['YEAR'] = y
    list_dfs.append(pd.DataFrame(i))
    y += 1
    
    # concatenate them
df_barris = pd.concat(list_dfs)

    # select only rows relative to ['categoria barri'] (that's where the data URL is)
row_indexes_to_drop = df_barris[df_barris['CATEGORIA_DIVISIO'] == 'Districte'].index
df_barris.drop(row_indexes_to_drop, inplace=True)
    # remove all but necesary columns (year, neighbourhood and URL)
df_barris = df_barris.filter(['YEAR','NOM_DIVISIO_TERRITORIAL', 'URL_FITXA_DIVISIO_TERRITORIAL' ])
display(df_barris)

Unnamed: 0,YEAR,NOM_DIVISIO_TERRITORIAL,URL_FITXA_DIVISIO_TERRITORIAL
0,2009,el Raval,https://ajuntament.barcelona.cat/estadistica/c...
1,2009,el Barri Gòtic,https://ajuntament.barcelona.cat/estadistica/c...
2,2009,la Barceloneta,https://ajuntament.barcelona.cat/estadistica/c...
3,2009,"Sant Pere, Santa Caterina i la Ribera",https://ajuntament.barcelona.cat/estadistica/c...
4,2009,el Fort Pienc,https://ajuntament.barcelona.cat/estadistica/c...
...,...,...,...
68,2019,Diagonal Mar i el Front Marítim del Poblenou,https://ajuntament.barcelona.cat/estadistica/c...
69,2019,el Besòs i el Maresme,https://ajuntament.barcelona.cat/estadistica/c...
70,2019,Provençals del Poblenou,https://ajuntament.barcelona.cat/estadistica/c...
71,2019,Sant Martí de Provençals,https://ajuntament.barcelona.cat/estadistica/c...


get pdfs

output_dir = 'C:\\Users\\motxi\\Documents\\Data_Science_IT_Academy\\PROJECTE\\data\\fitxes_de_barri\\pdf_output'
i = 1
# get urls iterating over previous df
for x in df_barris['URL_FITXA_DIVISIO_TERRITORIAL']:
    
    response = requests.get(url, allow_redirects=True)
    if response.status_code == 200:
        file_path = os.path.join(output_dir, str(i))
        with open(file_path, 'wb') as f:
            f.write(response.content)
    else:
        print(f'Error {response.status_code} in file number {i}')
    
    i += 1

In [24]:
# define method that extracts text  from pdf
def extract_pdfText(path):
    with open(path, 'rb') as f:
        reader = PdfFileReader(f)
        page_obj = reader.getPage(1)
        text = page_obj.extractText()
        text_strip = text.strip()
        # get the page
        #page = pdf.getPage(1)
        #Extract text from the page
    return text




Since some of the data in the pdf changes a bit (i.e:some variables appeared in 2011) , we grouped the tables in 3 folders according to their similarities so the regex can work best.

In [25]:
# create empty lists to append the variables     
years = []
districtes = []
barris = []
pop = []
spanish = []
guiris = []
studies = []
nojob = []   

# get the path of every pdf in every folder   

# first folder (2009-10)

my_path = 'C:\\Users\\motxi\\Documents\\Data_Science_IT_Academy\\PROJECTE\\data\\fitxes_de_barri\\pdf_output\\2009_10\\'
pdf_paths = []
for filename in os.listdir(my_path):
    path = os.path.join(my_path, filename)
    pdf_paths.append(path)
    
# iterate over list of pdf_paths and use method to extract text of each
for file_path in pdf_paths:
    text = extract_pdfText(file_path)
    print(text,'\n')

    # save variables with values obtained using regex
    year = re.findall("MICS\s+(20\d{2})", text)
    barri = re.findall(r"\n(.+)\nIND", text)
    population = re.findall(r"Població\s+(['.',0-9]+)\s", text)
    spaniards = re.findall(r"Espanyols (['.',0-9]+)", text)
    strangers = re.findall(r"Estrangers (['.',0-9]+)", text)
    
    # Put nan or 0 since these variables started in 2011 and first folder contains pdf's from 2009-2010
    unemployment = [np.nan]
    tuition = [np.nan]
     
    
    # process variables to desired types
    year = [int(x) for x in year]
    
    population =[x.replace('.', '') for x in population]
    population = [float(x) for x in population]
   
    spaniards = [x.replace(',', '.') for x in spaniards]
    spaniards = [float(x) for x in spaniards]
    
    strangers = [x.replace(',', '.') for x in strangers]  
    strangers = [float(x) for x in strangers]
    
    # extending every list of values instead of appending it      
    years.extend(year)
    barris.extend(barri)
    pop.extend(population)
    spanish.extend(spaniards)
    guiris.extend(strangers)
    studies.extend(tuition)
    nojob.extend(unemployment)

Xref table not zero-indexed. ID numbers for objects will be corrected.


Barri el Raval
INDICADORSSOCIOECONÒMICS2009 BARRI DISTRICTE BARCELONADistricte de Ciutat Vella
INDICADORS  SOCIOECONÒMICS  2009 BARRI DISTRICTE BARCELONA
Població 49.315 109.897 1.638.103 Superfície (km2) 1,1 4,4 102,2 Densitat (hab/km2) 44.897 25.157 16.035 Població per sexeDones 22.340 51.385 856.591Homes 26.975 58.512 781.512
Pb l i ód(%)Població per e dat (en %)
0‐14 11,3 10,0 11,915‐24 10,5 9,8 9,325‐64 64,7 65,0 58,565 i mes 13,5 15,2 20,3Població per lloc de naixement (en %)Barcelona 28,0 32,2 51,0
Resta Catalunya 4,4 5,0 7,3Resta Espanya 13,5 14,9 20,0Estranger 54,1 48,0 21,8Població per nacionalitat (en %)Espanyols 52,7 59,5 82,4Estrangers 47,3 40,5 17,6Principals nacionalitats estrangeres
Pakistan Pakistan Itàlia4.388 5.594 7,7Filipines Filipines Equador3.988 4.521 7,5Bangla Desh Itàlia Pakistan1.668 3.737 6,0 Taxa natalitat 
/ 1.000 hab 10 ,48 ,98 ,7 / , , ,
Turismes (persones físiques) / 1.000 hab 151,6 186,2 328,8 Motos (persones físiques) / 1.000 hab 51,6 68,3 112,2Poblac

Xref table not zero-indexed. ID numbers for objects will be corrected.


Districte de l'Eixample
Barri Sant Antoni
INDICADORSSOCIOECONÒMICS2009 BARRI DISTRICTE BARCELONA INDICADORS  SOCIOECONÒMICS  2009 BARRI DISTRICTE BARCELONA
Població 38.742 269.188 1.638.103 Superfície (km2) 0,8 7,5 102,2 Densitat (hab/km2) 48.368 36.005 16.035 Població per sexeDones 20.673 145.019 856.591Homes 18.069 124.169 781.512
Poblacióperedat(en%)Població  per edat (en %)
0‐14 10,0 10,6 11,915‐24 8,1 8,8 9,325‐64 58,7 58,9 58,565 i mes 23,2 21,6 20,3Població per lloc de naixement (en %)Barcelona 51,6 51,5 51,0
RestaCatalunya 85 90 73 Resta Catalunya 8,5 9,0 7,3
Resta Espanya 16,8 16,6 20,0Estranger 23,1 22,8 21,8Població per nacionalitat (en %)Espanyols 81,6 82,3 82,4Estrangers 18,4 17,7 17,6Principals nacionalitats estrangeres
Itàlia Itàlia Itàlia Itàlia Itàlia Itàlia
811 5.524 7,7Xina Xina Equador503 4.075 7,5França França Pakistan414 3.109 6,0 Taxa natalitat / 1.000 hab 7,3 7,9 8,7
Turismes (personesfísiques)/1 000hab 291 3 316 2 328 8 Turismes  (persones  físiques)  / 1.000 h

Xref table not zero-indexed. ID numbers for objects will be corrected.


Districte de Sarrià‐Sant Gervasi
Barri el Putxet i el Farró
INDICADORSSOCIOECONÒMICS2010 BARRI DISTRICTE BARCELONA INDICADORS  SOCIOECONÒMICS  2010 BARRI DISTRICTE BARCELONA
Població 30.566 145.550 1.630.494 Superfície (km2) 0,8 20,1 102,2 Densitat (hab/km2) 36.117 7.243 15.960 Població per sexeDones 16.194 78.716 853.436Homes 14.372 66.834 777.058
Poblacióperedat(en%)Població  per edat (en %)
0‐14 13,2 15,5 12,115‐24 9,4 9,9 9,025‐64 58,5 53,8 58,365 i mes 19,0 20,9 20,5Població per lloc de naixement (en %)Barcelona 57,9 62,7 51,2
RestaCatalunya 89 91 73 Resta Catalunya 8,9 9,1 7,3
Resta Espanya 11,7 12,1 19,7Estranger 21,4 16,2 21,8Població per nacionalitat (en %)Espanyols 87,4 89,2 82,7Estrangers 12,6 10,8 17,3Principals nacionalitats estrangeres
Itàlia Itàlia Pakistan Itàlia Itàlia Pakistan
490 1.901 22.342França França Itàlia278 1.718 22.002Alemanya Alemanya Equador199 789 17.966 Taxa natalitat / 1.000 hab 9,2 9,1 8,8
Turismes (personesfísiques)/1 000hab 368 1 429 2 329 3 Turismes

Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.


Districte de Gràcia
Barri Vallcarca i els Penitents
INDICADORSSOCIOECONÒMICS2010 BARRI DISTRICTE BARCELONA INDICADORS  SOCIOECONÒMICS  2010 BARRI DISTRICTE BARCELONA
Població 15.436 123.253 1.630.494 Superfície (km2) 1,2 4,2 102,2 Densitat (hab/km2) 12.772 29.447 15.960 Població per sexeDones 8.294 66.259 853.436Homes 7.142 56.994 777.058
Poblacióperedat(en%)Població  per edat (en %)
0‐14 12,7 11,1 12,115‐24 8,6 8,1 9,025‐64 57,0 60,0 58,365 i mes 21,8 20,7 20,5Població per lloc de naixement (en %)Barcelona 56,1 54,9 51,2
RestaCatalunya 86 88 73 Resta Catalunya 8,6 8,8 7,3
Resta Espanya 17,4 15,8 19,7Estranger 17,9 20,5 21,8Població per nacionalitat (en %)Espanyols 87,0 85,1 82,7Estrangers 13,0 14,9 17,3Principals nacionalitats estrangeres
Itàlia Itàlia Pakistan Itàlia Itàlia Pakistan
196 2.413 22.342Bolívia França Itàlia145 1.183 22.002Colòmbia Colòmbia Equador135 831 17.966 Taxa natalitat / 1.000 hab 9,6 8,7 8,8
Turismes (personesfísiques)/1 000hab 371 9 319 2 329 3 Turismes  (person

Xref table not zero-indexed. ID numbers for objects will be corrected.


Districte de Horta‐Guinardó
Barri Can Baró
INDICADORSSOCIOECONÒMICS2010 BARRI DISTRICTE BARCELONA INDICADORS  SOCIOECONÒMICS  2010 BARRI DISTRICTE BARCELONA
Població 9.034 171.026 1.630.494 Superfície (km2) 0,4 11,9 102,2 Densitat (hab/km2) 23.542 14.315 15.960 Població per sexeDones 4.820 90.042 853.436Homes 4.214 80.984 777.058
Poblacióperedat(en%)Població  per edat (en %)
0‐14 11,9 12,0 12,115‐24 9,1 8,7 9,025‐64 57,3 56,8 58,365 i mes 21,7 22,5 20,5Població per lloc de naixement (en %)Barcelona 55,2 52,4 51,2
RestaCatalunya 58 59 73 Resta Catalunya 5,8 5,9 7,3
Resta Espanya 23,0 24,8 19,7Estranger 16,0 16,9 21,8Població per nacionalitat (en %)Espanyols 87,3 87,4 82,7Estrangers 12,7 12,6 17,3Principals nacionalitats estrangeres
Itàlia Equador Pakistan Itàlia Equador Pakistan
80 2.076 22.342Colòmbia Bolívia Itàlia77 1.726 22.002Equador Perú Equador72 1.606 17.966 Taxa natalitat / 1.000 hab 9,7 8,5 8,8
Turismes (personesfísiques)/1 000hab 341 6 349 9 329 3 Turismes  (persones  físique

Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.


Districte de Horta‐Guinardó
Barri el Guinardó
INDICADORSSOCIOECONÒMICS2010 BARRI DISTRICTE BARCELONA INDICADORS  SOCIOECONÒMICS  2010 BARRI DISTRICTE BARCELONA
Població 35.803 171.026 1.630.494 Superfície (km2) 1,3 11,9 102,2 Densitat (hab/km2) 27.367 14.315 15.960 Població per sexeDones 19.154 90.042 853.436Homes 16.649 80.984 777.058
Poblacióperedat(en%)Població  per edat (en %)
0‐14 11,8 12,0 12,115‐24 8,4 8,7 9,025‐64 57,8 56,8 58,365 i mes 22,0 22,5 20,5Població per lloc de naixement (en %)Barcelona 54,8 52,4 51,2
RestaCatalunya 72 59 73 Resta Catalunya 7,2 5,9 7,3
Resta Espanya 19,5 24,8 19,7Estranger 18,5 16,9 21,8Població per nacionalitat (en %)Espanyols 85,6 87,4 82,7Estrangers 14,4 12,6 17,3Principals nacionalitats estrangeres
Equador Equador Pakistan Equador Equador Pakistan
460 2.076 22.342Perú Bolívia Itàlia426 1.726 22.002Colòmbia Perú Equador395 1.606 17.966 Taxa natalitat / 1.000 hab 9,2 8,5 8,8
Turismes (personesfísiques)/1 000hab 342 3 349 9 329 3 Turismes  (persones 

Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.


Districte de Horta‐Guinardó
Barri Montbau
INDICADORSSOCIOECONÒMICS2010 BARRI DISTRICTE BARCELONA INDICADORS  SOCIOECONÒMICS  2010 BARRI DISTRICTE BARCELONA
Població 5.199 171.026 1.630.494 Superfície (km2) 2,0 11,9 102,2 Densitat (hab/km2) 2.540 14.315 15.960 Població per sexeDones 2.883 90.042 853.436Homes 2.316 80.984 777.058
Pb l i ódt(%)Població per e dat (en %)
0‐14 12,1 12,0 12,115‐24 5,9 8,7 9,025‐64 46,8 56,8 58,365 i mes 35,2 22,5 20,5Població per lloc de naixement (en %)Barcelona 53,3 52,4 51,2
RestaCatalunya 75 59 73 Resta Catalunya 7,5 5,9 7,3
Resta Espanya 25,4 24,8 19,7Estranger 13,9 16,9 21,8Població per nacionalitat (en %)Espanyols 88,9 87,4 82,7Estrangers 11,1 12,6 17,3Principals nacionalitats estrangeres
Colòmbia Equador Pakistan Colòmbia Equador Pakistan
64 2.076 22.342Perú Bolívia Itàlia49 1.726 22.002Xile Perú Equador47 1.606 17.966 Taxa natalitat / 1.000 hab 9,5 8,5 8,8
Turismes (personesfísiques)/1 000hab 361 4 349 9 329 3 Turismes  (persones  físiques)  / 1.000 

Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.


Districte de Horta‐Guinardó
Barri la Clota
INDICADORSSOCIOECONÒMICS2010 BARRI DISTRICTE BARCELONA INDICADORS  SOCIOECONÒMICS  2010 BARRI DISTRICTE BARCELONA
Població 451 171.026 1.630.494 Superfície (km2) 0,2 11,9 102,2 Densitat (hab/km2) 2.530 14.315 15.960 Població per sexeDones 216 90.042 853.436Homes 235 80.984 777.058
Pb l i ódt(%)Població per e dat (en %)
0‐14 12,6 12,0 12,115‐24 8,2 8,7 9,025‐64 53,4 56,8 58,365 i mes 25,7 22,5 20,5Població per lloc de naixement (en %)Barcelona 53,9 52,4 51,2
RestaCatalunya 29 59 73 Resta Catalunya 2,9 5,9 7,3
Resta Espanya 21,1 24,8 19,7Estranger 22,2 16,9 21,8Població per nacionalitat (en %)Espanyols 77,3 87,4 82,7Estrangers 22,7 12,6 17,3Principals nacionalitats estrangeres
Pakistan Equador Pakistan Pakistan Equador Pakistan
49 2.076 22.342Equador Bolívia Itàlia9 1.726 22.002Veneçuela Perú Equador8 1.606 17.966 Taxa natalitat / 1.000 hab 4,4 8,5 8,8
Turismes (personesfísiques)/1 000hab 303 8 349 9 329 3 Turismes  (persones  físiques)  / 1.000

Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.


Districte de Nou Barris
Barri Can Peguera
INDICADORSSOCIOECONÒMICS2010 BARRI DISTRICTE BARCELONA INDICADORS  SOCIOECONÒMICS  2010 BARRI DISTRICTE BARCELONA
Població 2.250 169.048 1.630.494 Superfície (km2) 0,1 8,0 102,2 Densitat (hab/km2) 18.826 21.022 15.960 Població per sexeDones 1.200 87.911 853.436Homes 1.050 81.137 777.058
Pb l i ódt(%)Població per e dat (en %)
0‐14 12,2 12,6 12,115‐24 11,3 9,3 9,025‐64 51,6 55,6 58,365 i mes 24,9 22,5 20,5Població per lloc de naixement (en %)Barcelona 66,7 46,3 51,2
RestaCatalunya 48 47 73 Resta Catalunya 4,8 4,7 7,3
Resta Espanya 21,1 28,2 19,7Estranger 7,4 20,8 21,8Població per nacionalitat (en %)Espanyols 95,6 83,7 82,7Estrangers 4,4 16,3 17,3Principals nacionalitats estrangeres
Marroc Equador Pakistan Marroc Equador Pakistan
15 4.360 22.342Itàlia Bolívia Itàlia10 2.694 22.002Xile Pakistan Equador9 1.791 17.966 Taxa natalitat / 1.000 hab 7,2 9,3 8,8
Turismes (personesfísiques)/1 000hab 295 1 328 5 329 3 Turismes  (persones  físiques)  / 1.000 

Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.


Districte de Nou Barris
Vallbona
INDICADORSSOCIOECONÒMICS2010 BARRI DISTRICTE BARCELONA INDICADORS  SOCIOECONÒMICS  2010 BARRI DISTRICTE BARCELONA
Població 1.342 169.048 1.630.494 Superfície (km2) 0,6 8,0 102,2 Densitat (hab/km2) 2.245 21.022 15.960 Població per sexeDones 678 87.911 853.436Homes 664 81.137 777.058
Poblacióperedat(en%)Població  per edat (en %)
0‐14 17,4 12,6 12,115‐24 9,8 9,3 9,025‐64 55,1 55,6 58,365 i mes 17,6 22,5 20,5Població per lloc de naixement (en %)Barcelona 53,1 46,3 51,2
RestaCatalunya 78 47 73 Resta Catalunya 7,8 4,7 7,3
Resta Espanya 26,1 28,2 19,7Estranger 13,0 20,8 21,8Població per nacionalitat (en %)Espanyols 88,2 83,7 82,7Estrangers 11,8 16,3 17,3Principals nacionalitats estrangeres
Pakistan Equador Pakistan Pakistan Equador Pakistan
37 4.360 22.342Marroc Bolívia Itàlia24 2.694 22.002Ucraïna Pakistan Equador12 1.791 17.966 Taxa natalitat / 1.000 hab 18,0 9,3 8,8
Turismes (personesfísiques)/1 000hab 349 5 328 5 329 3 Turismes  (persones  físiques)  / 1.0

Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.


Districte de Sant Andreu
Navas
INDICADORSSOCIOECONÒMICS2010 BARRI DISTRICTE BARCELONA INDICADORS  SOCIOECONÒMICS  2010 BARRI DISTRICTE BARCELONA
Població 21.699 147.374 1.630.494 Superfície (km2) 0,4 6,6 102,2 Densitat (hab/km2) 51.256 22.447 15.960 Població per sexeDones 11.575 76.737 853.436Homes 10.124 70.637 777.058
Pb l i ódt(%)Població per e dat (en %)
0‐14 11,7 12,8 12,115‐24 8,9 9,1 9,025‐64 57,2 58,7 58,365 i mes 22,1 19,4 20,5Població per lloc de naixement (en %)Barcelona 54,0 53,8 51,2
RestaCatalunya 72 67 73 Resta Catalunya 7,2 6,7 7,3
Resta Espanya 20,6 22,0 19,7Estranger 18,1 17,5 21,8Població per nacionalitat (en %)Espanyols 86,1 87,1 82,7Estrangers 13,9 12,9 17,3Principals nacionalitats estrangeres
Xina Equador Pakistan Xina Equador Pakistan
398 2.136 22.342Perú Perú Itàlia370 1.814 22.002Equador Pakistan Equador294 1.436 17.966 Taxa natalitat / 1.000 hab 8,7 9,8 8,8
Turismes (personesfísiques)/1 000hab 344 7 328 5 329 3 Turismes  (persones  físiques)  / 1.000 hab 344 ,

Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.


Districte de Sant Martí
Provençals del Poblenou
INDICADORSSOCIOECONÒMICS2010 SC CO INDICADORS  SOCIOECONÒMICS  2010 BARRI DI STRI CTE BAR CELONA Població 19.636 232.323 1.630.494 Superfície (km2) 1,1 10,5 102,2 Densitat (hab/km2) 17.771 22.076 15.960 Població per sexeDones 10.159 119.803 853.436Homes 9.477 112.520 777.058
bl ód() Població per e dat (en % )
0‐14 13,1 13,0 12,115‐24 10,1 8,8 9,025‐64 59,7 58,9 58,365 i mes 17,1 19,3 20,5Població per lloc de naixement (en %)Barcelona 55,1 51,9 51,2
l Resta Cata lunya 6,6 6,8 7,3Resta Espanya 21,7 21,8 19,7Estranger 16,6 19,6 21,8Població per nacionalitat (en %)Espanyols 86,8 84,8 82,7Estrangers 13,2 15,2 17,3Principals nacionalitats estrangeres
Ià l i Pk i Pk i Itàlia Pakistan Pakistan200 3.065 22.342Xina Itàlia Itàlia197 2.849 22.002Equador Xina Equador178 2.602 17.966 Taxa natalitat / 1.000 hab 10,1 9,5 8,8
Ti (fí i )/1 000hb 326 4 323 3 329 3 Turismes  (persones  físiques ) / 1.000 hab 326 ,4 323 ,3 329 ,3
Motos (persones físiques) / 1

Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.


Districte de Les Corts
Barri les Corts
INDICADORSSOCIOECONÒMICS2009 BARRI DISTRICTE BARCELONA INDICADORS  SOCIOECONÒMICS  2009 BARRI DISTRICTE BARCELONA
Població 47.664 83.296 1.638.103 Superfície (km2) 1,4 6,0 102,2 Densitat (hab/km2) 33.743 13.842 16.035 Població per sexeDones 25.061 44.068 856.591Homes 22.603 39.228 781.512
Poblacióperedat(en%)Població  per edat (en %)
0‐14 11,0 11,4 11,915‐24 9,9 10,1 9,325‐64 58,0 57,0 58,565 i mes 21,1 21,5 20,3Població per lloc de naixement (en %)Barcelona 55,1 54,0 51,0
RestaCatalunya 10 5 10 7 73 Resta Catalunya 10,5 10,7 7,3
Resta Espanya 17,8 18,7 20,0Estranger 16,6 16,7 21,8Població per nacionalitat (en %)Espanyols 89,6 88,6 82,4Estrangers 10,4 11,4 17,6Principals nacionalitats estrangeres
Itàlia França Itàlia Itàlia França Itàlia
462 901 22.946França Itàlia Equador307 892 20.459Xina Colòmbia Pakistan286 538 18.150 Taxa natalitat / 1.000 hab 7,3 7,1 8,7
Turismes (personesfísiques)/1 000hab 376 4 408 9 328 8 Turismes  (persones  físiques)  /

Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.


Districte de Sarrià‐Sant Gervasi
Barri el Putxet i el Farró
INDICADORSSOCIOECONÒMICS2009 BARRI DISTRICTE BARCELONA INDICADORS  SOCIOECONÒMICS  2009 BARRI DISTRICTE BARCELONA
Població 30.461 145.532 1.638.103 Superfície (km2) 0,8 20,1 102,2 Densitat (hab/km2) 35.993 7.242 16.035 Població per sexeDones 16.223 78.899 856.591Homes 14.238 66.633 781.512
Poblacióperedat(en%)Població  per edat (en %)
0‐14 13,1 15,3 11,915‐24 9,6 10,0 9,325‐64 58,6 54,1 58,565 i mes 18,7 20,6 20,3Població per lloc de naixement (en %)Barcelona 58,0 62,3 51,0
RestaCatalunya 89 91 73 Resta Catalunya 8,9 9,1 7,3
Resta Espanya 11,7 12,2 20,0Estranger 21,4 16,4 21,8Població per nacionalitat (en %)Espanyols 86,7 88,6 82,4Estrangers 13,3 11,4 17,6Principals nacionalitats estrangeres
Itàlia Itàlia Itàlia Itàlia Itàlia Itàlia
506 2.042 7,7França França Equador346 1.945 7,5Alemanya Alemanya Pakistan226 914 6,0 Taxa natalitat / 1.000 hab 9,4 8,9 8,7
Turismes (personesfísiques)/1 000hab 369 8 429 0 328 8 Turismes  (persone

Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.


Districte de Horta‐Guinardó
Barri el Guinardó
INDICADORSSOCIOECONÒMICS2009 BARRI DISTRICTE BARCELONA INDICADORS  SOCIOECONÒMICS  2009 BARRI DISTRICTE BARCELONA
Població 35.836 172.018 1.638.103 Superfície (km2) 1,3 11,9 102,2 Densitat (hab/km2) 27.392 14.398 16.035 Població per sexeDones 19.128 90.398 856.591Homes 16.708 81.620 781.512
Poblacióperedat(en%)Població  per edat (en %)
0‐14 11,4 11,8 11,915‐24 8,7 8,9 9,325‐64 58,0 57,1 58,565 i mes 21,9 22,2 20,3Població per lloc de naixement (en %)Barcelona 54,6 52,1 51,0
RestaCatalunya 72 59 73 Resta Catalunya 7,2 5,9 7,3
Resta Espanya 20,1 25,2 20,0Estranger 18,2 16,9 21,8Població per nacionalitat (en %)Espanyols 85,4 87,1 82,4Estrangers 14,6 12,9 17,6Principals nacionalitats estrangeres
Equador Equador Itàlia Equador Equador Itàlia
492 2.304 7,7Perú Bolívia Equador443 1.808 7,5colombia Perú Pakistan407 1.698 6,0 Taxa natalitat / 1.000 hab 8,8 8,5 8,7
Turismes (personesfísiques)/1 000hab 344 1 351 3 328 8 Turismes  (persones  físiques) 

Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.


Districte de Horta‐Guinardó
Barri la Clota
INDICADORSSOCIOECONÒMICS2009 BARRI DISTRICTE BARCELONA INDICADORS  SOCIOECONÒMICS  2009 BARRI DISTRICTE BARCELONA
Població 480 172.018 1.638.103 Superfície (km2) 0,2 11,9 102,2 Densitat (hab/km2) 2.693 14.398 16.035 Població per sexeDones 222 90.398 856.591Homes 258 81.620 781.512
Pb l i ódt(%)Població per e dat (en %)
0‐14 12,9 11,8 11,915‐24 7,7 8,9 9,325‐64 55,4 57,1 58,565 i mes 24,0 22,2 20,3Població per lloc de naixement (en %)Barcelona 52,5 52,1 51,0
RestaCatalunya 31 59 73 Resta Catalunya 3,1 5,9 7,3
Resta Espanya 21,3 25,2 20,0Estranger 23,1 16,9 21,8Població per nacionalitat (en %)Espanyols 80,8 87,1 82,4Estrangers 19,2 12,9 17,6Principals nacionalitats estrangeres
Pakistan Equador Itàlia Pakistan Equador Itàlia
36 2.304 7,7Equador Bolívia Equador9 1.808 7,5Romania Perú Pakistan6 1.698 6,0 Taxa natalitat / 1.000 hab 12,5 8,5 8,7
Turismes (personesfísiques)/1 000hab 270 8 351 3 328 8 Turismes  (persones  físiques)  / 1.000 hab 270 ,8 

Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.


Districte de Nou Barris
Barri les Roquetes
INDICADORSSOCIOECONÒMICS2009 SC CO INDICADORS  SOCIOECONÒMICS  2009 BARRI DI STRI CTE BAR CELONA Població 16.233 170.092 1.638.103 Superfície (km2) 0,6 8,0 102,2 Densitat (hab/km2) 25.296 21.152 16.035 Població per sexeDones 8.158 88.342 856.591Homes 8.075 81.750 781.512
bl ód() Població per e dat (en % )
0‐14 14,0 12,3 11,915‐24 11,2 9,6 9,325‐64 58,4 55,9 58,565 i mes 16,4 22,1 20,3Població per lloc de naixement (en %)Barcelona 43,3 46,1 51,0
l Resta Cata lunya 3,1 4,7 7,3Resta Espanya 28,7 28,7 20,0Estranger 25,0 20,5 21,8Població per nacionalitat (en %)Espanyols 78,5 83,7 82,4Estrangers 21,5 16,3 17,6Principals nacionalitats estrangeres
Ed Ed Ià l i Equa dor Equa dor Itàlia691 4.891 7,7Hondures Bolívia Equador387 2.818 7,5Pakistan Perú Pakistan277 1.804 6,0 Taxa natalitat / 1.000 hab 9,8 9,0 8,7
Ti (fí i )/1 000hb 321 6 330 0 328 8 Turismes  (persones  físiques ) / 1.000 hab 321 ,6 330 ,0 328 ,8
Motos (persones físiques) / 1.000 hab 58,2 6

Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.


Districte de Sant Andreu
La Trinitat Vella
INDICADORSSOCIOECONÒMICS2009 SC CO INDICADORS  SOCIOECONÒMICS  2009 BARRI DI STRI CTE BAR CELONA Població 10.574 147.573 1.638.103 Superfície (km2) 0,8 6,6 102,2 Densitat (hab/km2) 13.059 22.478 16.035 Població per sexeDones 4.899 76.671 856.591Homes 5.675 70.902 781.512
bl ód() Població per e dat (en % )
0‐14 14,9 12,5 11,915‐24 11,6 9,5 9,325‐64 60,6 59,0 58,565 i mes 12,9 19,1 20,3Població per lloc de naixement (en %)Barcelona 36,2 53,5 51,0
l Resta Cata lunya 3,1 6,7 7,3Resta Espanya 22,0 22,5 20,0Estranger 38,7 17,4 21,8Població per nacionalitat (en %)Espanyols 66,0 86,8 82,4Estrangers 34,0 13,2 17,6Principals nacionalitats estrangeres
Pk i Ed Ià l i Pakistan Equa dor Itàlia768 2.374 7,7Marroc Perú Equador586 1.946 7,5Equador Marroc Pakistan523 1.374 6,0 Taxa natalitat / 1.000 hab 13,7 9,5 8,7
Ti (fí i )/1 000hb 280 2 327 5 328 8 Turismes  (persones  físiques ) / 1.000 hab 280 ,2 327 ,5 328 ,8
Motos (persones físiques) / 1.000 hab 50,2 80

Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.


Districte de Sant Martí
El Clot
INDICADORSSOCIOECONÒMICS2009 SC CO INDICADORS  SOCIOECONÒMICS  2009 BARRI DI STRI CTE BAR CELONA Població 27.562 231.928 1.638.103 Superfície (km2) 0,7 10,5 102,2 Densitat (hab/km2) 39.603 22.039 16.035 Població per sexeDones 14.269 119.411 856.591Homes 13.293 112.517 781.512
bl ód() Població per e dat (en % )
0‐14 13,2 12,8 11,915‐24 9,8 9,1 9,325‐64 60,5 59,0 58,565 i mes 16,4 19,1 20,3Població per lloc de naixement (en %)Barcelona 54,5 51,7 51,0
l Resta Cata lunya 6,8 6,7 7,3Resta Espanya 20,6 22,2 20,0Estranger 18,1 19,3 21,8Població per nacionalitat (en %)Espanyols 85,5 84,7 82,4Estrangers 14,5 15,3 17,6Principals nacionalitats estrangeres
Xi Ià l i Ià l i Xina Itàlia Itàlia346 2.891 7,7Equador Equador Equador325 2.830 7,5Itàlia Xina Pakistan310 2.518 6,0 Taxa natalitat / 1.000 hab 8,7 9,3 8,7
Ti (fí i )/1 000hb 316 5 321 9 328 8 Turismes  (persones  físiques ) / 1.000 hab 316 ,5 321 ,9 328 ,8
Motos (persones físiques) / 1.000 hab 83,8 81,2 112,2Pob

Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.


Districte de Sant Martí
Sant Martí de Provençals
INDICADORSSOCIOECONÒMICS2009 SC CO INDICADORS  SOCIOECONÒMICS  2009 BARRI DI STRI CTE BAR CELONA Població 26.457 231.928 1.638.103 Superfície (km2) 0,7 10,5 102,2 Densitat (hab/km2) 35.493 22.039 16.035 Població per sexeDones 13.991 119.411 856.591Homes 12.466 112.517 781.512
bl ód() Població per e dat (en % )
0‐14 11,6 12,8 11,915‐24 8,2 9,1 9,325‐64 55,3 59,0 58,565 i mes 24,8 19,1 20,3Població per lloc de naixement (en %)Barcelona 51,7 51,7 51,0
l Resta Cata lunya 6,6 6,7 7,3Resta Espanya 27,6 22,2 20,0Estranger 14,1 19,3 21,8Població per nacionalitat (en %)Espanyols 89,3 84,7 82,4Estrangers 10,7 15,3 17,6Principals nacionalitats estrangeres
Ed Ià l i Ià l i Equa dor Itàlia Itàlia346 2.891 7,7Perú Equador Equador282 2.830 7,5Xina Xina Pakistan243 2.518 6,0 Taxa natalitat / 1.000 hab 8,1 9,3 8,7
Ti (fí i )/1 000hb 356 6 321 9 328 8 Turismes  (persones  físiques ) / 1.000 hab 356 ,6 321 ,9 328 ,8
Motos (persones físiques) / 1.000 hab 86

Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.


Districte de l'Eixample
Barri l'Antiga Esquerra de l'Eixample
INDICADORSSOCIOECONÒMICS2010 BARRI DISTRICTE BARCELONA INDICADORS  SOCIOECONÒMICS  2010 BARRI DISTRICTE BARCELONA
Població 42.076 267.534 1.630.494 Superfície (km2) 1,2 7,5 102,2 Densitat (hab/km2) 34.085 35.784 15.960 Població per sexeDones 22.790 143.964 853.436Homes 19.286 123.570 777.058
Poblacióperedat(en%)Població  per edat (en %)
0‐14 10,8 10,8 12,115‐24 8,3 8,5 9,025‐64 58,8 58,9 58,365 i mes 22,1 21,8 20,5Població per lloc de naixement (en %)Barcelona 51,2 51,5 51,2
RestaCatalunya 97 91 73 Resta Catalunya 9,7 9,1 7,3
Resta Espanya 14,8 16,5 19,7Estranger 24,3 22,9 21,8Població per nacionalitat (en %)Espanyols 81,4 82,6 82,7Estrangers 18,6 17,4 17,3Principals nacionalitats estrangeres
Itàlia Itàlia Pakistan Itàlia Itàlia Pakistan
915 5.340 22.342Xina Xina Itàlia666 4.388 22.002França França Equador508 2.705 17.966 Taxa natalitat / 1.000 hab 7,7 7,8 8,8
Turismes (personesfísiques)/1 000hab 314 6 316 4 329 3 Turismes  

Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.


Districte de Sants‐Montjuïc
Barri Hostafrancs
INDICADORSSOCIOECONÒMICS2010 BARRI DISTRICTE BARCELONA INDICADORS  SOCIOECONÒMICS  2010 BARRI DISTRICTE BARCELONA
Població 16.062 184.288 1.630.494 Superfície (km2) 0,4 22,9 102,2 Densitat (hab/km2) 39.156 8.033 15.960 Població per sexeDones 8.317 95.609 853.436Homes 7.745 88.679 777.058
Poblacióperedat(en%)Població  per edat (en %)
0‐14 10,6 11,5 12,115‐24 9,2 9,3 9,025‐64 61,5 59,9 58,365 i mes 18,6 19,3 20,5Població per lloc de naixement (en %)Barcelona 49,0 49,0 51,2
RestaCatalunya 80 72 73 Resta Catalunya 8,0 7,2 7,3
Resta Espanya 17,5 19,2 19,7Estranger 25,5 24,6 21,8Població per nacionalitat (en %)Espanyols 79,1 80,8 82,7Estrangers 20,9 19,2 17,3Principals nacionalitats estrangeres
Pakistan Pakistan Pakistan Pakistan Pakistan Pakistan
280 2.955 22.342Equador Equador Itàlia279 2.670 22.002Xina Marroc Equador278 2.250 17.966 Taxa natalitat / 1.000 hab 9,3 8,8 8,8
Turismes (personesfísiques)/1 000hab 303 2 306 9 329 3 Turismes  (persone

Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.


Districte de Sarrià‐Sant Gervasi
Barri Sarrià
INDICADORSSOCIOECONÒMICS2010 BARRI DISTRICTE BARCELONA INDICADORS  SOCIOECONÒMICS  2010 BARRI DISTRICTE BARCELONA
Població 24.028 145.550 1.630.494 Superfície (km2) 3,0 20,1 102,2 Densitat (hab/km2) 7.900 7.243 15.960 Població per sexeDones 12.942 78.716 853.436Homes 11.086 66.834 777.058
Poblacióperedat(en%)Població  per edat (en %)
0‐14 17,6 15,5 12,115‐24 9,9 9,9 9,025‐64 52,0 53,8 58,365 i mes 20,5 20,9 20,5Població per lloc de naixement (en %)Barcelona 63,9 62,7 51,2
RestaCatalunya 92 91 73 Resta Catalunya 9,2 9,1 7,3
Resta Espanya 11,6 12,1 19,7Estranger 15,3 16,2 21,8Població per nacionalitat (en %)Espanyols 88,8 89,2 82,7Estrangers 11,2 10,8 17,3Principals nacionalitats estrangeres
França Itàlia Pakistan França Itàlia Pakistan
548 1.901 22.342Itàlia França Itàlia343 1.718 22.002Japó Alemanya Equador167 789 17.966 Taxa natalitat / 1.000 hab 10,0 9,1 8,8
Turismes (personesfísiques)/1 000hab 406 7 429 2 329 3 Turismes  (persones  físiq

In [26]:
# look length of list obtained to spot errors
print(len(years))
print(len(barris))
print(len(pop))
print(len(spanish))
print(len(guiris))
print(len(studies))
print(len(nojob))

# get rid of unwanted strings ('\xa0') replacing them by blankspaces (' ')
n = 0
for x in barris:
    barris[n] = x.replace('\xa0', ' ')
    n += 1
    
# Hardcode with real data  since, so far...its only missing cell :d
barris[0] = 'Barri el Raval'    

146
146
146
146
146
146
146


In [27]:
# create lists to append values 
years2 = []
districtes2 = []
barris2 = []
pop2 = []
spanish2 = []
guiris2 = []
studies2 = []
nojob2 = []   

# second folder
my_path = 'C:\\Users\\motxi\\Documents\\Data_Science_IT_Academy\\PROJECTE\\data\\fitxes_de_barri\\pdf_output\\2011_12_13_18_19\\'
pdf_paths = []
for filename in os.listdir(my_path):
    path = os.path.join(my_path, filename)
    pdf_paths.append(path)
    
    # iterate over list of pdf_paths and use method to extract text of each
for file_path in pdf_paths:
    text = extract_pdfText(file_path)
    print(text,'\n')

    # save variables with values obtained using regex
    year = re.findall(r"MICS\s+(20\d{2})", text)
    barri = re.findall(r"\n(.+)\nIND", text)
    population = re.findall(r"Població\s(['.',0-9]+)\s", text)
    spaniards = re.findall(r"Espanyols (['.',0-9]+)", text)
    strangers = re.findall(r"Estrangers (['.',0-9]+)", text)
    unemployment = re.findall(r"\sregistrats\s+\(\d+\)\s*(['.',0-9]+)\s+", text) 
    tuition = re.findall(r"CFGS\s+\(\d\)\s+(['.',0-9]+)", text)
    
    # process variables to desired types
    year = [int(x) for x in year]
    
    if len(population) == 0:
        population = [np.nan]
    else:
        population =[x.replace('.', '') for x in population]
        population = [float(x) for x in population]
    
    spaniards = [x.replace(',', '.') for x in spaniards]
    spaniards = [float(x) for x in spaniards]
    
    strangers = [x.replace(',', '.') for x in strangers]  
    strangers = [float(x) for x in strangers]
    
    unemployment =[x.replace('.', '').replace('975,7','975') for x in unemployment]
    unemployment = [float(x) for x in unemployment]
    
    tuition = [x.replace(',', '.') for x in tuition]
    if len(tuition) == 0:
        tuition = [np.nan]
    else:
        tuition = [float(x) for x in tuition]
    
    
    # extending every list of values instead of appending it      
    years2.extend(year)
    barris2.extend(barri)
    pop2.extend(population)
    spanish2.extend(spaniards)
    guiris2.extend(strangers)
    studies2.extend(tuition)
    nojob2.extend(unemployment)


Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.


Districte de Ciutat Vella
Barri el Raval
INDICADORSSOCIOECONÒMICS2011 BARRI DISTRICTE BARCELONA INDICADORS  SOCIOECONÒMICS  2011 BARRI DISTRICTE BARCELONA
Població 48.485 104.056 1.615.985 Superfície (km2) 1,1 4,4 102,2 Densitat (hab/km2) 44.141 23.819 15.818 Població per sexeDones 22.068 49.089 847.636Homes 26.417 54.967 768.349
Poblacióperedat(en%)Població  per edat (en %)
0‐14 12,2 10,7 12,315‐24 10,3 9,3 8,925‐64 64,3 64,7 58,065 i mes 13,1 15,3 20,8Població per lloc de naixement (en %)Barcelona 27,6 32,7 51,5
RestaCatalunya 45 52 74 Resta Catalunya 4,5 5,2 7,4
Resta Espanya 13,0 15,0 19,5Estranger 54,9 47,1 21,5Població per nacionalitat (en %)Espanyols 50,8 58,2 82,6Estrangers 49,2 41,8 17,4Principals nacionalitats estrangeres
Pakistan Pakistan Pakistan Pakistan Pakistan Pakistan
5.682 7.215 23.281Filipines Filipines Itàlia4.307 4.836 22.909Bangla Desh Itàlia Xina2.212 3.791 15.875 % Titulats superiors i CFGS (1) 18,8 23,7 24,9
Taxanatalitat /1 000hab 10 0 89 86 Taxa natalitat  / 

Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.


Districte de Ciutat Vella
Barri Sant Pere, Santa Caterina i la Ribera
INDICADORSSOCIOECONÒMICS2011 BARRI DISTRICTE BARCELONA INDICADORS  SOCIOECONÒMICS  2011 BARRI DISTRICTE BARCELONA
Població 22.632 104.056 1.615.985 Superfície (km2) 1,1 4,4 102,2 Densitat (hab/km2) 20.310 23.819 15.818 Població per sexeDones 11.337 49.089 847.636Homes 11.295 54.967 768.349
Poblacióperedat(en%)Població  per edat (en %)
0‐14 10,0 10,7 12,315‐24 8,5 9,3 8,925‐64 65,0 64,7 58,065 i mes 16,4 15,3 20,8Població per lloc de naixement (en %)Barcelona 35,4 32,7 51,5
RestaCatalunya 61 52 74 Resta Catalunya 6,1 5,2 7,4
Resta Espanya 16,9 15,0 19,5Estranger 41,6 47,1 21,5Població per nacionalitat (en %)Espanyols 62,9 58,2 82,6Estrangers 37,1 41,8 17,4Principals nacionalitats estrangeres
Itàlia Pakistan Pakistan Itàlia Pakistan Pakistan
1.113 7.215 23.281Marroc Filipines Itàlia734 4.836 22.909França Itàlia Xina637 3.791 15.875 % Titulats superiors i CFGS (1) 31,7 23,7 24,9
Taxanatalitat /1 000hab 95 89 86 Taxa nat

Xref table not zero-indexed. ID numbers for objects will be corrected.


Districte de l'Eixample
Barri la Dreta de l'Eixample
INDICADORSSOCIOECONÒMICS2011 BARRI DISTRICTE BARCELONA INDICADORS  SOCIOECONÒMICS  2011 BARRI DISTRICTE BARCELONA
Població 43.206 264.997 1.615.985 Superfície (km2) 2,1 7,5 102,2 Densitat (hab/km2) 20.347 35.443 15.818 Població per sexeDones 23.504 142.690 847.636Homes 19.702 122.307 768.349
Poblacióperedat(en%)Població  per edat (en %)
0‐14 11,9 10,8 12,315‐24 8,1 8,4 8,925‐64 57,9 58,8 58,065 i mes 22,0 22,0 20,8Població per lloc de naixement (en %)Barcelona 53,4 51,7 51,5
RestaCatalunya 98 91 74 Resta Catalunya 9,8 9,1 7,4
Resta Espanya 14,7 16,4 19,5Estranger 22,1 22,8 21,5Població per nacionalitat (en %)Espanyols 82,1 82,0 82,6Estrangers 17,9 18,0 17,4Principals nacionalitats estrangeres
Itàlia Itàlia Pakistan Itàlia Itàlia Pakistan
1.138 5.535 23.281França Xina Itàlia739 4.694 22.909Xina França Xina555 2.791 15.875 % Titulats superiors i CFGS (1) 43,0 33,9 24,9
Taxanatalitat /1 000hab 86 77 86 Taxa natalitat  / 1.000 hab 8,6 7,

Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.


Districte de l'Eixample
Barri l'Antiga Esquerra de l'Eixample
INDICADORSSOCIOECONÒMICS2011 BARRI DISTRICTE BARCELONA INDICADORS  SOCIOECONÒMICS  2011 BARRI DISTRICTE BARCELONA
Població 41.653 264.997 1.615.985 Superfície (km2) 1,2 7,5 102,2 Densitat (hab/km2) 33.741 35.443 15.818 Població per sexeDones 22.566 142.690 847.636Homes 19.087 122.307 768.349
Poblacióperedat(en%)Població  per edat (en %)
0‐14 10,9 10,8 12,315‐24 8,2 8,4 8,925‐64 58,9 58,8 58,065 i mes 22,1 22,0 20,8Població per lloc de naixement (en %)Barcelona 51,5 51,7 51,5
RestaCatalunya 98 91 74 Resta Catalunya 9,8 9,1 7,4
Resta Espanya 14,8 16,4 19,5Estranger 23,9 22,8 21,5Població per nacionalitat (en %)Espanyols 80,9 82,0 82,6Estrangers 19,1 18,0 17,4Principals nacionalitats estrangeres
Itàlia Itàlia Pakistan Itàlia Itàlia Pakistan
973 5.535 23.281Xina Xina Itàlia676 4.694 22.909França França Xina507 2.791 15.875 % Titulats superiors i CFGS (1) 39,4 33,9 24,9
Taxanatalitat /1 000hab 76 77 86 Taxa natalitat  / 1.000 hab

Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.


Districte de Sants‐Montjuïc
Barri la Marina del Prat Vermell
INDICADORSSOCIOECONÒMICS2011 BARRI DISTRICTE BARCELONA INDICADORS  SOCIOECONÒMICS  2011 BARRI DISTRICTE BARCELONA
Població 1.065 182.771 1.615.985 Superfície (km2) 14,3 22,9 102,2 Densitat (hab/km2) 75 7.967 15.818 Població per sexeDones 526 94.797 847.636Homes 539 87.974 768.349
Poblacióperedat(en%)Població  per edat (en %)
0‐14 13,0 11,6 12,315‐24 10,7 9,1 8,925‐64 53,7 59,7 58,065 i mes 22,6 19,5 20,8Població per lloc de naixement (en %)Barcelona 71,3 49,2 51,5
RestaCatalunya 38 73 74 Resta Catalunya 3,8 7,3 7,4
Resta Espanya 13,9 19,1 19,5Estranger 11,0 24,5 21,5Població per nacionalitat (en %)Espanyols 91,7 80,1 82,6Estrangers 8,3 19,9 17,4Principals nacionalitats estrangeres
Marroc Pakistan Pakistan Marroc Pakistan Pakistan
23 3.550 23.281Xina Equador Itàlia18 2.292 22.909Paraguai Xina Xina13 2.270 15.875 % Titulats superiors i CFGS (1) 2,6 18,9 24,9
Taxanatalitat /1 000hab 44 86 86 Taxa natalitat  / 1.000 hab 4,4 8,6 8

Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.


Districte de Sants‐Montjuïc
Barri Sants ‐ Badal
INDICADORSSOCIOECONÒMICS2011 BARRI DISTRICTE BARCELONA INDICADORS  SOCIOECONÒMICS  2011 BARRI DISTRICTE BARCELONA
Població 24.431 182.771 1.615.985 Superfície (km2) 0,4 22,9 102,2 Densitat (hab/km2) 59.504 7.967 15.818 Població per sexeDones 12.930 94.797 847.636Homes 11.501 87.974 768.349
Poblacióperedat(en%)Població  per edat (en %)
0‐14 10,6 11,6 12,315‐24 9,3 9,1 8,925‐64 59,9 59,7 58,065 i mes 20,2 19,5 20,8Població per lloc de naixement (en %)Barcelona 47,9 49,2 51,5
RestaCatalunya 89 73 74 Resta Catalunya 8,9 7,3 7,4
Resta Espanya 21,4 19,1 19,5Estranger 21,7 24,5 21,5Població per nacionalitat (en %)Espanyols 81,8 80,1 82,6Estrangers 18,2 19,9 17,4Principals nacionalitats estrangeres
Bolívia Pakistan Pakistan Bolívia Pakistan Pakistan
406 3.550 23.281Equador Equador Itàlia361 2.292 22.909Xina Xina Xina334 2.270 15.875 % Titulats superiors i CFGS (1) 18,6 18,9 24,9
Taxanatalitat /1 000hab 79 86 86 Taxa natalitat  / 1.000 hab 7,9 8,6

Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.


Districte de Les Corts
Barri la Maternitat i Sant Ramon
INDICADORSSOCIOECONÒMICS2011 BARRI DISTRICTE BARCELONA INDICADORS  SOCIOECONÒMICS  2011 BARRI DISTRICTE BARCELONA
Població 23.758 82.340 1.615.985 Superfície (km2) 1,9 6,0 102,2 Densitat (hab/km2) 12.487 13.683 15.818 Població per sexeDones 12.645 43.724 847.636Homes 11.113 38.616 768.349
Poblacióperedat(en%)Població  per edat (en %)
0‐14 11,3 11,7 12,315‐24 9,2 9,6 8,925‐64 56,5 55,9 58,065 i mes 23,1 22,8 20,8Població per lloc de naixement (en %)Barcelona 51,8 54,5 51,5
RestaCatalunya 11 3 10 7 74 Resta Catalunya 11,3 10,7 7,4
Resta Espanya 22,0 18,3 19,5Estranger 14,9 16,4 21,5Població per nacionalitat (en %)Espanyols 88,8 88,4 82,6Estrangers 11,2 11,6 17,4Principals nacionalitats estrangeres
Itàlia Itàlia Pakistan Itàlia Itàlia Pakistan
222 863 23.281Colòmbia França Itàlia207 795 22.909Perú Colòmbia Xina181 569 15.875 % Titulats superiors i CFGS (1) 31,5 35,0 24,9
Taxanatalitat /1 000hab 77 77 86 Taxa natalitat  / 1.000 hab 7,

Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.


Districte de Sarrià‐Sant Gervasi
Barri Sant Gervasi ‐ Galvany
INDICADORSSOCIOECONÒMICS2011 BARRI DISTRICTE BARCELONA INDICADORS  SOCIOECONÒMICS  2011 BARRI DISTRICTE BARCELONA
Població 46.207 144.791 1.615.985 Superfície (km2) 1,7 20,1 102,2 Densitat (hab/km2) 27.853 7.206 15.818 Població per sexeDones 25.420 78.506 847.636Homes 20.787 66.285 768.349
Poblacióperedat(en%)Població  per edat (en %)
0‐14 14,7 15,6 12,315‐24 10,0 10,0 8,925‐64 52,5 53,3 58,065 i mes 22,8 21,2 20,8Població per lloc de naixement (en %)Barcelona 63,4 63,3 51,5
RestaCatalunya 92 92 74 Resta Catalunya 9,2 9,2 7,4
Resta Espanya 12,1 12,0 19,5Estranger 15,4 15,6 21,5Població per nacionalitat (en %)Espanyols 89,2 89,0 82,6Estrangers 10,8 11,0 17,4Principals nacionalitats estrangeres
Itàlia Itàlia Pakistan Itàlia Itàlia Pakistan
597 2.000 23.281França França Itàlia486 1.750 22.909Bolívia Alemanya Xina237 812 15.875 % Titulats superiors i CFGS (1) 45,1 44,3 24,9
Taxanatalitat /1 000hab 84 90 86 Taxa natalitat  / 1.00

Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.


Districte de Gràcia
Barri la Salut
INDICADORSSOCIOECONÒMICS2011 BARRI DISTRICTE BARCELONA INDICADORS  SOCIOECONÒMICS  2011 BARRI DISTRICTE BARCELONA
Població 13.199 121.430 1.615.985 Superfície (km2) 0,6 4,2 102,2 Densitat (hab/km2) 20.511 29.011 15.818 Població per sexeDones 7.142 65.458 847.636Homes 6.057 55.972 768.349
Poblacióperedat(en%)Població  per edat (en %)
0‐14 11,5 11,3 12,315‐24 9,0 8,0 8,925‐64 56,9 59,5 58,065 i mes 22,6 21,1 20,8Població per lloc de naixement (en %)Barcelona 58,3 55,5 51,5
RestaCatalunya 93 90 74 Resta Catalunya 9,3 9,0 7,4
Resta Espanya 16,4 15,9 19,5Estranger 16,0 19,6 21,5Població per nacionalitat (en %)Espanyols 87,6 84,7 82,6Estrangers 12,4 15,3 17,4Principals nacionalitats estrangeres
Itàlia Itàlia Pakistan Itàlia Itàlia Pakistan
243 2.572 23.281Colòmbia França Itàlia98 1.257 22.909França Colòmbia Xina84 812 15.875 % Titulats superiors i CFGS (1) 33,5 34,0 24,9
Taxanatalitat /1 000hab 88 91 86 Taxa natalitat  / 1.000 hab 8,8 9,1 8,6
Població de mé

Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.


Districte de Horta‐Guinardó
Barri Can Baró
INDICADORSSOCIOECONÒMICS2011 BARRI DISTRICTE BARCELONA INDICADORS  SOCIOECONÒMICS  2011 BARRI DISTRICTE BARCELONA
Població 8.984 169.512 1.615.985 Superfície (km2) 0,4 11,9 102,2 Densitat (hab/km2) 23.411 14.188 15.818 Població per sexeDones 4.758 89.546 847.636Homes 4.226 79.966 768.349
Poblacióperedat(en%)Població  per edat (en %)
0‐14 11,7 12,1 12,315‐24 8,6 8,6 8,925‐64 57,6 56,2 58,065 i mes 22,2 23,0 20,8Població per lloc de naixement (en %)Barcelona 55,1 52,8 51,5
RestaCatalunya 59 60 74 Resta Catalunya 5,9 6,0 7,4
Resta Espanya 22,7 24,4 19,5Estranger 16,3 16,8 21,5Població per nacionalitat (en %)Espanyols 86,3 87,4 82,6Estrangers 13,7 12,6 17,4Principals nacionalitats estrangeres
Itàlia Equador Pakistan Itàlia Equador Pakistan
76 1.801 23.281Colòmbia Bolívia Itàlia65 1.656 22.909Equador Perú Xina61 1.556 15.875 % Titulats superiors i CFGS (1) 19,6 18,5 24,9
Taxanatalitat /1 000hab 60 77 86 Taxa natalitat  / 1.000 hab 6,0 7,7 8,6
Pobla

Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.


Districte de Horta‐Guinardó
Barri la Teixonera
INDICADORSSOCIOECONÒMICS2011 BARRI DISTRICTE BARCELONA INDICADORS  SOCIOECONÒMICS  2011 BARRI DISTRICTE BARCELONA
Població 11.420 169.512 1.615.985 Superfície (km2) 0,3 11,9 102,2 Densitat (hab/km2) 33.882 14.188 15.818 Població per sexeDones 5.906 89.546 847.636Homes 5.514 79.966 768.349
Poblacióperedat(en%)Població  per edat (en %)
0‐14 13,0 12,1 12,315‐24 8,8 8,6 8,925‐64 57,9 56,2 58,065 i mes 20,3 23,0 20,8Població per lloc de naixement (en %)Barcelona 47,8 52,8 51,5
RestaCatalunya 39 60 74 Resta Catalunya 3,9 6,0 7,4
Resta Espanya 29,1 24,4 19,5Estranger 19,3 16,8 21,5Població per nacionalitat (en %)Espanyols 84,9 87,4 82,6Estrangers 15,1 12,6 17,4Principals nacionalitats estrangeres
Perú Equador Pakistan Perú Equador Pakistan
189 1.801 23.281Equador Bolívia Itàlia164 1.656 22.909Bolívia Perú Xina124 1.556 15.875 % Titulats superiors i CFGS (1) 11,8 18,5 24,9
Taxanatalitat /1 000hab 86 77 86 Taxa natalitat  / 1.000 hab 8,6 7,7 8,6
Po

Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.


Districte de Horta‐Guinardó
Barri Horta
INDICADORSSOCIOECONÒMICS2011 BARRI DISTRICTE BARCELONA INDICADORS  SOCIOECONÒMICS  2011 BARRI DISTRICTE BARCELONA
Població 27.312 169.512 1.615.985 Superfície (km2) 3,1 11,9 102,2 Densitat (hab/km2) 8.861 14.188 15.818 Població per sexeDones 14.185 89.546 847.636Homes 13.127 79.966 768.349
Poblacióperedat(en%)Població  per edat (en %)
0‐14 11,5 12,1 12,315‐24 8,8 8,6 8,925‐64 55,9 56,2 58,065 i mes 23,8 23,0 20,8Població per lloc de naixement (en %)Barcelona 55,0 52,8 51,5
RestaCatalunya 58 60 74 Resta Catalunya 5,8 6,0 7,4
Resta Espanya 23,9 24,4 19,5Estranger 15,3 16,8 21,5Població per nacionalitat (en %)Espanyols 90,8 87,4 82,6Estrangers 9,2 12,6 17,4Principals nacionalitats estrangeres
Bolívia Equador Pakistan Bolívia Equador Pakistan
200 1.801 23.281Equador Bolívia Itàlia189 1.656 22.909Dominicana, República Perú Xina177 1.556 15.875 % Titulats superiors i CFGS (1) 17,0 18,5 24,9
Taxanatalitat /1 000hab 72 77 86 Taxa natalitat  / 1.000 hab 7

Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.


Districte de Nou Barris
Barri el Turó de la Peira
INDICADORSSOCIOECONÒMICS2011 BARRI DISTRICTE BARCELONA INDICADORS  SOCIOECONÒMICS  2011 BARRI DISTRICTE BARCELONA
Població 15.270 167.548 1.615.985 Superfície (km2) 0,4 8,0 102,2 Densitat (hab/km2) 43.121 20.835 15.818 Població per sexeDones 8.164 87.425 847.636Homes 7.106 80.123 768.349
Poblacióperedat(en%)Població  per edat (en %)
0‐14 12,6 12,8 12,315‐24 9,3 9,2 8,925‐64 51,6 55,1 58,065 i mes 26,5 23,0 20,8Població per lloc de naixement (en %)Barcelona 41,7 46,6 51,5
RestaCatalunya 45 47 74 Resta Catalunya 4,5 4,7 7,4
Resta Espanya 24,1 27,7 19,5Estranger 29,6 20,9 21,5Població per nacionalitat (en %)Espanyols 76,0 83,4 82,6Estrangers 24,0 16,6 17,4Principals nacionalitats estrangeres
Bolívia Equador Pakistan Bolívia Equador Pakistan
839 3.832 23.281Equador Bolívia Itàlia469 2.650 22.909Dominicana, República Pakistan Xina351 2.057 15.875 % Titulats superiors i CFGS (1) 10,3 10,1 24,9
Taxanatalitat /1 000hab 81 89 86 Taxa natalitat  

Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.


Districte de Nou Barris
Barri la Prosperitat
INDICADORSSOCIOECONÒMICS2011 BARRI DISTRICTE BARCELONA INDICADORS  SOCIOECONÒMICS  2011 BARRI DISTRICTE BARCELONA
Població 26.594 167.548 1.615.985 Superfície (km2) 0,6 8,0 102,2 Densitat (hab/km2) 44.708 20.835 15.818 Població per sexeDones 13.797 87.425 847.636Homes 12.797 80.123 768.349
Poblacióperedat(en%)Població  per edat (en %)
0‐14 12,8 12,8 12,315‐24 8,4 9,2 8,925‐64 54,6 55,1 58,065 i mes 24,1 23,0 20,8Població per lloc de naixement (en %)Barcelona 44,9 46,6 51,5
RestaCatalunya 43 47 74 Resta Catalunya 4,3 4,7 7,4
Resta Espanya 32,0 27,7 19,5Estranger 18,8 20,9 21,5Població per nacionalitat (en %)Espanyols 84,8 83,4 82,6Estrangers 15,2 16,6 17,4Principals nacionalitats estrangeres
Equador Equador Pakistan Equador Equador Pakistan
581 3.832 23.281Bolívia Bolívia Itàlia381 2.650 22.909Xina Pakistan Xina305 2.057 15.875 % Titulats superiors i CFGS (1) 8,5 10,1 24,9
Taxanatalitat /1 000hab 87 89 86 Taxa natalitat  / 1.000 hab 8,7 8,9 8

Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.


Districte de Nou Barris
Ciutat Meridiana
INDICADORSSOCIOECONÒMICS2011 BARRI DISTRICTE BARCELONA INDICADORS  SOCIOECONÒMICS  2011 BARRI DISTRICTE BARCELONA
Població 10.832 167.548 1.615.985 Superfície (km2) 0,4 8,0 102,2 Densitat (hab/km2) 30.477 20.835 15.818 Població per sexeDones 5.255 87.425 847.636Homes 5.577 80.123 768.349
Poblacióperedat(en%)Població  per edat (en %)
0‐14 15,6 12,8 12,315‐24 11,4 9,2 8,925‐64 57,1 55,1 58,065 i mes 15,9 23,0 20,8Població per lloc de naixement (en %)Barcelona 33,0 46,6 51,5
RestaCatalunya 30 47 74 Resta Catalunya 3,0 4,7 7,4
Resta Espanya 23,9 27,7 19,5Estranger 40,1 20,9 21,5Població per nacionalitat (en %)Espanyols 64,7 83,4 82,6Estrangers 35,3 16,6 17,4Principals nacionalitats estrangeres
Equador Equador Pakistan Equador Equador Pakistan
681 3.832 23.281Pakistan Bolívia Itàlia580 2.650 22.909Marroc Pakistan Xina415 2.057 15.875 % Titulats superiors i CFGS (1) 4,4 10,1 24,9
Taxanatalitat /1 000hab 10 5 89 86 Taxa natalitat  / 1.000 hab 10,5 8,9 

Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.


Districte de Sant Andreu
La Sagrera
INDICADORSSOCIOECONÒMICS2011 BARRI DISTRICTE BARCELONA INDICADORS  SOCIOECONÒMICS  2011 BARRI DISTRICTE BARCELONA
Població 29.136 146.956 1.615.985 Superfície (km2) 1,0 6,6 102,2 Densitat (hab/km2) 29.963 22.383 15.818 Població per sexeDones 15.491 76.767 847.636Homes 13.645 70.189 768.349
Poblacióperedat(en%)Població  per edat (en %)
0‐14 11,4 12,9 12,315‐24 9,1 9,0 8,925‐64 58,8 58,3 58,065 i mes 20,7 19,8 20,8Població per lloc de naixement (en %)Barcelona 50,5 54,1 51,5
RestaCatalunya 64 68 74 Resta Catalunya 6,4 6,8 7,4
Resta Espanya 24,6 21,7 19,5Estranger 18,5 17,4 21,5Població per nacionalitat (en %)Espanyols 86,4 87,0 82,6Estrangers 13,6 13,0 17,4Principals nacionalitats estrangeres
Perú Equador Pakistan Perú Equador Pakistan
519 1.832 23.281Equador Perú Itàlia406 1.735 22.909Bolívia Pakistan Xina328 1.561 15.875 % Titulats superiors i CFGS (1) 17,7 17,0 24,9
Taxanatalitat /1 000hab 88 92 86 Taxa natalitat  / 1.000 hab 8,8 9,2 8,6
Població de

Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.


Districte de Sant Andreu
Navas
INDICADORSSOCIOECONÒMICS2011 BARRI DISTRICTE BARCELONA INDICADORS  SOCIOECONÒMICS  2011 BARRI DISTRICTE BARCELONA
Població 21.758 146.956 1.615.985 Superfície (km2) 0,4 6,6 102,2 Densitat (hab/km2) 51.393 22.383 15.818 Població per sexeDones 11.576 76.767 847.636Homes 10.182 70.189 768.349
Poblacióperedat(en%)Població  per edat (en %)
0‐14 11,9 12,9 12,315‐24 9,0 9,0 8,925‐64 56,9 58,3 58,065 i mes 22,3 19,8 20,8Població per lloc de naixement (en %)Barcelona 54,1 54,1 51,5
RestaCatalunya 72 68 74 Resta Catalunya 7,2 6,8 7,4
Resta Espanya 20,1 21,7 19,5Estranger 18,6 17,4 21,5Població per nacionalitat (en %)Espanyols 85,6 87,0 82,6Estrangers 14,4 13,0 17,4Principals nacionalitats estrangeres
Xina Equador Pakistan Xina Equador Pakistan
457 1.832 23.281Perú Perú Itàlia368 1.735 22.909Equador Pakistan Xina257 1.561 15.875 % Titulats superiors i CFGS (1) 20,0 17,0 24,9
Taxanatalitat /1 000hab 82 92 86 Taxa natalitat  / 1.000 hab 8,2 9,2 8,6
Població de més de 

Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.


Districte de Sant Martí
El Besòs i el Maresme
INDICADORSSOCIOECONÒMICS2011 BARRI DISTRICTE BARCELONA INDICADORS  SOCIOECONÒMICS  2011 BARRI DISTRICTE BARCELONA
Població 23.998 231.584 1.615.985 Superfície (km2) 1,3 10,5 102,2 Densitat (hab/km2) 18.830 22.005 15.818 Població per sexeDones 11.475 119.634 847.636Homes 12.523 111.950 768.349
Poblacióperedat(en%)Població  per edat (en %)
0‐14 13,0 13,1 12,315‐24 10,7 8,7 8,925‐64 57,8 58,6 58,065 i mes 18,4 19,6 20,8Població per lloc de naixement (en %)Barcelona 41,9 52,2 51,5
RestaCatalunya 47 69 74 Resta Catalunya 4,7 6,9 7,4
Resta Espanya 22,7 21,5 19,5Estranger 30,7 19,5 21,5Població per nacionalitat (en %)Espanyols 75,0 84,4 82,6Estrangers 25,0 15,6 17,4Principals nacionalitats estrangeres
Pakistan Pakistan Pakistan Pakistan Pakistan Pakistan
2.176 3.538 23.281Marroc Itàlia Itàlia419 2.957 22.909Equador Xina Xina397 2.796 15.875 % Titulats superiors i CFGS (1) 7,7 20,1 24,9
Taxanatalitat /1 000hab 97 94 86 Taxa natalitat  / 1.000 hab 9

Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.


Districte de Sant Martí
Sant Martí de Provençals
INDICADORSSOCIOECONÒMICS2011 BARRI DISTRICTE BARCELONA INDICADORS  SOCIOECONÒMICS  2011 BARRI DISTRICTE BARCELONA
Població 26.178 231.584 1.615.985 Superfície (km2) 0,7 10,5 102,2 Densitat (hab/km2) 35.118 22.005 15.818 Població per sexeDones 13.904 119.634 847.636Homes 12.274 111.950 768.349
Poblacióperedat(en%)Població  per edat (en %)
0‐14 11,8 13,1 12,315‐24 7,9 8,7 8,925‐64 54,6 58,6 58,065 i mes 25,7 19,6 20,8Població per lloc de naixement (en %)Barcelona 52,2 52,2 51,5
RestaCatalunya 67 69 74 Resta Catalunya 6,7 6,9 7,4
Resta Espanya 26,7 21,5 19,5Estranger 14,4 19,5 21,5Població per nacionalitat (en %)Espanyols 88,9 84,4 82,6Estrangers 11,1 15,6 17,4Principals nacionalitats estrangeres
Xina Pakistan Pakistan Xina Pakistan Pakistan
352 3.538 23.281Equador Itàlia Itàlia256 2.957 22.909Perú Xina Xina246 2.796 15.875 % Titulats superiors i CFGS (1) 13,9 20,1 24,9
Taxanatalitat /1 000hab 83 94 86 Taxa natalitat  / 1.000 hab 8,3 9,4 8,

In [28]:
# look length of list obtained to spot errors
print(len(years2))
print(len(barris2))
print(len(pop2))
print(len(spanish2))
print(len(guiris2))
print(len(studies2))
print(len(nojob2))

# since this folder contains diferent years mixed (2011_12_13_18_19) we'll separate the last two to keep the order
years2b = years2[219:]
years2 = years2[:219]

barris2b = barris2[219:]
barris2 = barris2[:219]

pop2b = pop2[219:]
pop2 = pop2[:219]

spanish2b = spanish2[219:]
spanish2 = spanish2[:219]

guiris2b = guiris2[219:]
guiris2 = guiris2[:219]

studies2b = studies2[219:]
studies2 = studies2[:219]

nojob2b = nojob2[219:]
nojob2 = nojob2[:219]



365
365
365
365
365
365
365


In [29]:
# define another method that extracts the text of all pages from pdf
def extract_pdfText2(path):
    with open(path, 'rb') as f:
        reader = PdfFileReader(f)
        num_pages = reader.getNumPages()
        text = ""                # define variable for using in loop.
        for page_number in range(num_pages):
            page = reader.getPage(page_number)
            text += page.extractText()  
    return text


years3 = []
districtes3 = []
barris3 = []
pop3 = []
spanish3 = []
guiris3= []
studies3 = []
nojob3 = []   

# third folder
my_path = 'C:\\Users\\motxi\\Documents\\Data_Science_IT_Academy\\PROJECTE\\data\\fitxes_de_barri\\pdf_output\\2014_15_16_17\\'
pdf_paths = []
for filename in os.listdir(my_path):
    path = os.path.join(my_path, filename)
    pdf_paths.append(path)
    
# iterate over list of pdf_paths and use method to extract text of each
for file_path in pdf_paths:
    text = extract_pdfText2(file_path)
    print(text,'\n')

    # save variables with values obtained using regex 
    year = re.findall(r"MICS\s+(20\d{2})", text)
    barri = re.findall(r"\n(.+)\nIND", text)
    population = re.findall(r"\sPoblació\.\s*.+\d\d\d\d\s*(['.',0-9]+)", text)
    spaniards = re.findall(r"Espanyols (['.',0-9]+)", text)
    strangers = re.findall(r"Estrangers (['.',0-9]+)", text)
    unemployment = re.findall(r"\sregistrats\s+\(\d+\)\s*(['.',0-9]+)\s+", text) 
    tuition = re.findall(r"CFGS\s+\(\d\)\s+(['.',0-9]+)", text)
    
    # process variables to desired types
    year = [int(x) for x in year]
    
    population =[x.replace(".", "") for x in population]
    population = [int(x) for x in population]
   
    spaniards = [x.replace(',', '.') for x in spaniards]
    spaniards = [float(x) for x in spaniards]
    
    strangers = [x.replace(',', '.') for x in strangers]  
    strangers = [float(x) for x in strangers]
    
    unemployment =[x.replace('.','') for x in unemployment]
    unemployment = [re.sub(r"\,\d*","",x) for x in unemployment]
    unemployment = [float(x) for x in unemployment]
    
    tuition = [x.replace(',', '.') for x in tuition]        
    tuition = [float(x) for x in tuition]
        
    # extending every list of values instead of appending it      
    years3.extend(year)
    barris3.extend(barri)
    pop3.extend(population)
    spanish3.extend(spaniards)
    guiris3.extend(strangers)
    studies3.extend(tuition)
    nojob3.extend(unemployment)

Barri el Raval
Districte de Ciutat VellaDistricte de Ciutat Vella
Barri el Raval
INDICADORS SOCIOECONÒMICS 2014 BARRI DISTRICTE BARCELONA Població. Juny 2014 48.471 102.237 1.613.393Superfície (km2) 1,1 4,4 102,2 Densitat (hab/km2) 44.145 23.406 15.793 Població per sexeDones 21.859 48.267 848.743Homes 26.612 53.970 764.650Població per edat (en %)0‐14 12,5 10,7 12,515‐24 10,3 9,3 8,725‐64 64,8 65,6 57,365 i mes 12,3 14,4 21,4Població per lloc de naixement (en %)Barcelona 27,3 31,7 51,9Resta Catalunya 4,4 5,0 7,5Resta Espanya 11,6 13,2 18,4Estranger 56,7 50,0 22,2Població per nacionalitat (en %). Gener 2015Espanyols 52,1 57,6 83,7Estrangers 47,9 42,4 16,3Principals nacionalitats estrangeresPakistan Pakistan Itàlia5.082 6.594 25.707Filipines Filipines Pakistan4.034 4.542 19.414Bangla Desh Itàlia Xina2.431 4.459 17.487 % Titulats superiors i CFGS (1) 21,1 27,3 28,4
Taxa natalitat / 1.000 hab 9,0 7,7 8,4Població de més de 65 anys que viu sola (%)(2) 32,6 32,3 25,6 Índex de sobreenvelliment 

In [30]:
# look length of list obtained to spot errors
print(len(years3))
print(len(barris3))
print(len(pop3))
print(len(spanish3))
print(len(guiris3))
print(len(studies3))
print(len(nojob3))

292
292
292
292
292
292
292


In [31]:
# once every list is correct, join all lists
years = years + years2 + years3 + years2b
barris = barris + barris2 + barris3 + barris2b
pop = pop + pop2 + pop3 + pop2b
spaniards = spanish + spanish2 + spanish3 + spanish2b
strangers = guiris + guiris2 + guiris3 + guiris2b
college = studies + studies2 + studies3 + studies2b
unemployed = nojob + nojob2 + nojob3 + nojob2b

# look length of list, last entry and type to spot errors
print(len(years), years[-1], type(years[-1]))
print(len(barris), barris[-1], type(barris[-1]))
print(len(pop), pop[-1], type(pop[-1]))
print(len(spaniards), spaniards[-1], type(spaniards[-1]))
print(len(strangers), strangers[-1], type(strangers[-1]))
print(len(college), college[-1], type(college[-1]))
print(len(unemployed), unemployed[-1], type(unemployed[-1]))

803 2019 <class 'int'>
803 La Verneda i la Pau <class 'str'>
803 28878.0 <class 'float'>
803 85.3 <class 'float'>
803 14.7 <class 'float'>
803 15.1 <class 'float'>
803 1461.0 <class 'float'>


In [32]:
# put lists obtained in a dictionary 
d = {'year' : years,
     'neighbourhood' : barris,
     'population' : pop,
     '% spaniards' : spaniards,
     '% strangers' : strangers,
     '% w/ higher education' : college,
     'unemployed' : unemployed }

df_socioeconomics = pd.DataFrame.from_dict(d)

# print length and the list of names
print(len(df_socioeconomics['neighbourhood'].unique()))
list1 = list(set(df_socioeconomics['neighbourhood'].unique()))
list2 = list()

148


In [33]:
# solve blank spaces problem
df_socioeconomics['neighbourhood'] = [x.strip() for x in df_socioeconomics['neighbourhood']]

# solve typos spotted
df_socioeconomics['neighbourhood'] = [x.replace("\xa0", " ").replace(',', '') for x in df_socioeconomics['neighbourhood']]
df_socioeconomics['neighbourhood'] = [re.sub(r"Barri\s", "", x) for x in df_socioeconomics['neighbourhood']]
df_socioeconomics['neighbourhood'] = [re.sub(r"lArpa", "l'Arpa", x) for x in df_socioeconomics['neighbourhood']]
df_socioeconomics['neighbourhood'] = [re.sub(r"lAntiga", "l'Antiga", x) for x in df_socioeconomics['neighbourhood']]
df_socioeconomics['neighbourhood'] = [re.sub(r"lEixample", "l'Eixample", x) for x in df_socioeconomics['neighbourhood']]
df_socioeconomics['neighbourhood'] = [re.sub(r"dHebron", "d'Hebron", x) for x in df_socioeconomics['neighbourhood']]
df_socioeconomics['neighbourhood'] = [re.sub(r"\s-\s.+Franca", "", x) for x in df_socioeconomics['neighbourhood']]
df_socioeconomics['neighbourhood'] = [re.sub(r"den", "d'en", x) for x in df_socioeconomics['neighbourhood']]
df_socioeconomics['neighbourhood'] = [re.sub(r"\s-\s.+uïc", "", x) for x in df_socioeconomics['neighbourhood']]
df_socioeconomics['neighbourhood'] = [re.sub(r"El", "el", x) for x in df_socioeconomics['neighbourhood']]
df_socioeconomics['neighbourhood'] = [re.sub(r"La", "la", x) for x in df_socioeconomics['neighbourhood']]


print(len(df_socioeconomics['neighbourhood'].unique()))

77


So...number of rows is correct yet number unique name for neighbourhoods is not. 
It must be 4 typos. Zero-sum game style...

In [34]:
# make dict to count each neighbourhhod name ocurrence. Counted beans baby!
lst = list(df_socioeconomics['neighbourhood'])
my_dict = {x : lst.count(x) for x in lst}

# other than 11 is WRONG since our database spans 11 years!!!
for k,v in my_dict.items():
    if v != 11:
        print(k, v)

el Parc i la Llacuna del Poblenou 10
Sants ‐ Badal 9
Sant Gervasi ‐ la Bonanova 9
Sant Gervasi ‐ Galvany 9
Districte Sant MartíDistricte de Sant Martíel Parc i la Llacuna del Poblenou 1
Sants - Badal 2
Sant Gervasi - la Bonanova 2
Sant Gervasi - Galvany 2


In [35]:
# correct typos
df_socioeconomics['neighbourhood'] = [ re.sub(r"^\s*San.+vany\s*$","Sant Gervasi - Galvany", x)for x in df_socioeconomics['neighbourhood']]
df_socioeconomics['neighbourhood'] = [ re.sub(r"^\s*Sant.+anova\s*$","Sant Gervasi ‐ la Bonanova", x)for x in df_socioeconomics['neighbourhood']]
df_socioeconomics['neighbourhood'] = [ re.sub(r"^\s*Sant.+adal\s*$","Sants ‐ Badal", x)for x in df_socioeconomics['neighbourhood']]
df_socioeconomics['neighbourhood'] = [ re.sub(r"^\s*Distric.+tíDis.+tí","",x)for x in df_socioeconomics['neighbourhood']]
df_socioeconomics['neighbourhood'] = [re.sub(r".+Sec\s*$","el Poble Sec - Parc Montjuïc",x) for x in df_socioeconomics['neighbourhood']]
df_socioeconomics['neighbourhood'] = [x.replace(",", "") for x in df_socioeconomics['neighbourhood']]
df_socioeconomics['neighbourhood'] = [ re.sub(r".+Gòtic\s*$","el Barri Gòtic", x)for x in df_socioeconomics['neighbourhood']]
df_socioeconomics['neighbourhood'] = [ re.sub(r".+Badal\s*$","Sants - Badal",x)for x in df_socioeconomics['neighbourhood']]
df_socioeconomics['neighbourhood'] = [re.sub(r".+nanova\s*$","Sant Gervasi - la Bonanova",x) for x in df_socioeconomics['neighbourhood']]



b = len(df_socioeconomics['neighbourhood'].unique())
if b == 73:
    print('solved!')
else: 
    print('keep on')

# order by year and neighbourhood 
df_socioeconomics_copy = df_socioeconomics.sort_values(by=['year', 'neighbourhood'], ignore_index=True)
display(df_socioeconomics_copy)


solved!


Unnamed: 0,year,neighbourhood,population,% spaniards,% strangers,% w/ higher education,unemployed
0,2009,Baró de Viver,2372.0,92.7,7.3,,
1,2009,Can Baró,9159.0,86.2,13.8,,
2,2009,Can Peguera,2210.0,96.3,3.7,,
3,2009,Canyelles,7359.0,95.3,4.7,,
4,2009,Ciutat Meridiana,11355.0,64.5,35.5,,
...,...,...,...,...,...,...,...
798,2019,la Vila Olímpica del Poblenou,9385.0,80.6,19.4,54.1,391.0
799,2019,la Vila de Gràcia,50926.0,76.4,23.6,47.5,1939.0
800,2019,les Corts,46731.0,86.2,13.8,45.0,1665.0
801,2019,les Roquetes,16417.0,75.3,24.7,9.8,906.0


### f) Gini index(%)

In [36]:
df_gini = pd.concat(map(pd.read_csv, glob.glob("C:\\Users\\motxi\\Documents\\Data_Science_IT_Academy\\PROJECTE\\data\\index_gini/*.csv")), ignore_index=True)
display(df_gini)
display(df_gini.info())

Unnamed: 0,Any,Codi_Districte,Nom_Districte,Codi_Barri,Nom_Barri,Seccio_Censal,Index_Gini
0,2015,1,Ciutat Vella,1,el Raval,1,36.5
1,2015,1,Ciutat Vella,1,el Raval,2,40.4
2,2015,1,Ciutat Vella,1,el Raval,3,38.4
3,2015,1,Ciutat Vella,1,el Raval,4,40.4
4,2015,1,Ciutat Vella,1,el Raval,5,38.6
...,...,...,...,...,...,...,...
5335,2019,10,Sant Martí,73,la Verneda i la Pau,143,27.1
5336,2019,10,Sant Martí,65,el Clot,234,26.3
5337,2019,10,Sant Martí,69,Diagonal Mar i el Front Marítim del Poblenou,235,31.3
5338,2019,10,Sant Martí,69,Diagonal Mar i el Front Marítim del Poblenou,236,26.6


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5340 entries, 0 to 5339
Data columns (total 7 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   Any             5340 non-null   int64 
 1   Codi_Districte  5340 non-null   int64 
 2   Nom_Districte   5340 non-null   object
 3   Codi_Barri      5340 non-null   int64 
 4   Nom_Barri       5340 non-null   object
 5   Seccio_Censal   5340 non-null   int64 
 6   Index_Gini      5340 non-null   object
dtypes: int64(4), object(3)
memory usage: 292.2+ KB


None

In [37]:
df_gini.drop(columns=['Codi_Districte','Nom_Districte','Codi_Barri','Seccio_Censal'], axis=1, inplace=True)

# make sure it's a string because although appears as a object in df.info() regex says it's a float (¿?)
df_gini['Index_Gini'] = df_gini['Index_Gini'].apply(str)
df_gini['Index_Gini']= [x.replace("-", "")for x in df_gini['Index_Gini']]
df_gini['Index_Gini'] = [re.sub(r"\.$","",x) for x in df_gini['Index_Gini']]  

# solve blank spaces problem replacing them by nans and converting to float type
df_gini['Index_Gini'] = [float(x.strip() or np.nan) for x in df_gini['Index_Gini']]

# grouping by year and neighbourhood computing the mean
df_gini = df_gini.groupby(['Any','Nom_Barri'])['Index_Gini'].mean()      
df_gini = df_gini.reset_index()       

# change columns names
df_gini.columns = ['year', 'neighbourhood', 'gini_index(%)']

# change typos to fit other dfs
df_gini['neighbourhood'] = [re.sub(r".+Sec\s*$","el Poble Sec - Parc Montjuïc",x) for x in df_gini['neighbourhood']]
df_gini['neighbourhood'] = [re.sub(r".+òtic\s*$","el Barri Gòtic",x) for x in df_gini['neighbourhood']]
df_gini['neighbourhood'] = [re.sub(r".+Farró\s*$","el Putxet i el Farró",x) for x in df_gini['neighbourhood']]
df_gini['neighbourhood'] = [re.sub(r".+Arpa.+$","el Camp de l'Arpa del Clot",x) for x in df_gini['neighbourhood']]
df_gini['neighbourhood'] = [re.sub(r".+Gràcia\s*$","la Vila de Gràcia",x) for x in df_gini['neighbourhood']]

df_gini['neighbourhood'] = [x.replace(",", "") for x in df_gini['neighbourhood']]

# order by year and neighbourhood 
df_gini = (df_gini.sort_values(by=['year', 'neighbourhood'], ignore_index=True))

#check
display(df_gini)

Unnamed: 0,year,neighbourhood,gini_index(%)
0,2015,Baró de Viver,33.450000
1,2015,Can Baró,32.800000
2,2015,Can Peguera,34.450000
3,2015,Canyelles,26.140000
4,2015,Ciutat Meridiana,34.566667
...,...,...,...
360,2019,la Vila Olímpica del Poblenou,33.820000
361,2019,la Vila de Gràcia,33.850000
362,2019,les Corts,33.605714
363,2019,les Roquetes,30.040000


### g) Disposable income 

In [38]:
df_disp_income = pd.concat(map(pd.read_csv, glob.glob("C:\\Users\\motxi\\Documents\\Data_Science_IT_Academy\\PROJECTE\\data\\renda_disponible/*.csv")), ignore_index=True)
display(df_disp_income)
display(df_disp_income.info())

Unnamed: 0,Any,Codi_Districte,Nom_Districte,Codi_Barri,Nom_Barri,Import_€_Any,Euros_Any
0,2015,1,Ciutat Vella,1,el Raval,10896.0,
1,2015,1,Ciutat Vella,2,el Barri Gòtic,14456.0,
2,2015,1,Ciutat Vella,3,la Barceloneta,14714.0,
3,2015,1,Ciutat Vella,4,"Sant Pere, Santa Caterina i la Ribera",15154.0,
4,2015,2,Eixample,5,el Fort Pienc,20817.0,
...,...,...,...,...,...,...,...
360,2019,10,Sant Martí,69,Diagonal Mar i el Front Marítim del Poblenou,,25589.0
361,2019,10,Sant Martí,70,el Besòs i el Maresme,,12787.0
362,2019,10,Sant Martí,71,Provençals del Poblenou,,20080.0
363,2019,10,Sant Martí,72,Sant Martí de Provençals,,18637.0


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 365 entries, 0 to 364
Data columns (total 7 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Any             365 non-null    int64  
 1   Codi_Districte  365 non-null    int64  
 2   Nom_Districte   365 non-null    object 
 3   Codi_Barri      365 non-null    int64  
 4   Nom_Barri       365 non-null    object 
 5   Import_€_Any    219 non-null    float64
 6   Euros_Any       146 non-null    float64
dtypes: float64(2), int64(3), object(2)
memory usage: 20.1+ KB


None

In [39]:
#Sum columns with different name
df_disp_income['disp_income(€/year)'] = df_disp_income['Import_€_Any'].add(df_disp_income['Euros_Any'],fill_value=0)

# drop unwanted columns
df_disp_income.drop(columns=['Codi_Districte','Nom_Districte','Codi_Barri','Import_€_Any','Euros_Any'], axis=1, inplace=True)

#change columns name
df_disp_income.columns = ['year','neighbourhood','disp_income(€/year)']

# change typos to fit other dfs
df_disp_income['neighbourhood'] = [re.sub(r".+Sec\s*$","el Poble Sec - Parc Montjuïc",x) for x in df_disp_income['neighbourhood']]
df_disp_income['neighbourhood'] = [x.replace(",", "") for x in df_disp_income['neighbourhood']]

# order by year and neighbourhood 
df_disp_income = (df_disp_income.sort_values(by=['year', 'neighbourhood'], ignore_index=True))

display(df_disp_income.info())
display(df_disp_income)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 365 entries, 0 to 364
Data columns (total 3 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   year                 365 non-null    int64  
 1   neighbourhood        365 non-null    object 
 2   disp_income(€/year)  365 non-null    float64
dtypes: float64(1), int64(1), object(1)
memory usage: 8.7+ KB


None

Unnamed: 0,year,neighbourhood,disp_income(€/year)
0,2015,Baró de Viver,11217.0
1,2015,Can Baró,18883.0
2,2015,Can Peguera,12002.0
3,2015,Canyelles,17003.0
4,2015,Ciutat Meridiana,10203.0
...,...,...,...
360,2019,la Vila Olímpica del Poblenou,32436.0
361,2019,la Vila de Gràcia,23749.0
362,2019,les Corts,28898.0
363,2019,les Roquetes,13014.0


### h) Number of registered  real state Purchases

In [40]:
df = pd.concat(map(pd.read_csv, glob.glob("C:\\Users\\motxi\\Documents\\Data_Science_IT_Academy\\PROJECTE\\data\\venda_pisos\\nombre_compraventes/*.csv")),  ignore_index=True)
display(df.info())
display(df)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4440 entries, 0 to 4439
Data columns (total 8 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Any             4440 non-null   int64  
 1   Trimestre       4440 non-null   int64  
 2   Codi_Districte  4440 non-null   int64  
 3   Nom_Districte   4440 non-null   object 
 4   Codi_Barri      4440 non-null   int64  
 5   Nom_Barri       4440 non-null   object 
 6   Compravendes    4440 non-null   object 
 7   Nombre          4439 non-null   float64
dtypes: float64(1), int64(4), object(3)
memory usage: 277.6+ KB


None

Unnamed: 0,Any,Trimestre,Codi_Districte,Nom_Districte,Codi_Barri,Nom_Barri,Compravendes,Nombre
0,2015,1,1,Ciutat Vella,1,el Raval,Habitatge nou lliure,0.0
1,2015,1,1,Ciutat Vella,2,el Barri Gòtic,Habitatge nou lliure,3.0
2,2015,1,1,Ciutat Vella,3,la Barceloneta,Habitatge nou lliure,14.0
3,2015,1,1,Ciutat Vella,4,"Sant Pere, Santa Caterina i la Ribera",Habitatge nou lliure,2.0
4,2015,1,2,Eixample,5,el Fort Pienc,Habitatge nou lliure,5.0
...,...,...,...,...,...,...,...,...
4435,2019,4,10,Sant Martí,70,el Besòs i el Maresme,Habitatge usat,73.0
4436,2019,4,10,Sant Martí,71,Provençals del Poblenou,Habitatge usat,3.0
4437,2019,4,10,Sant Martí,72,Sant Martí de Provençals,Habitatge usat,34.0
4438,2019,4,10,Sant Martí,73,la Verneda i la Pau,Habitatge usat,67.0


In [41]:
# create dummy from Type of purchase
dummy = pd.get_dummies(df['Compravendes'])

# concatenate
df_num_buys = pd.concat([df, dummy], axis=1)

# drop unwanted columns
df_num_buys.drop(columns=['Codi_Districte', 'Nom_Districte', 'Codi_Barri', 'Compravendes'], axis=1, inplace=True)

# check
df_num_buys

Unnamed: 0,Any,Trimestre,Nom_Barri,Nombre,Habitatge nou lliure,Habitatge nou protegit,Habitatge usat
0,2015,1,el Raval,0.0,1,0,0
1,2015,1,el Barri Gòtic,3.0,1,0,0
2,2015,1,la Barceloneta,14.0,1,0,0
3,2015,1,"Sant Pere, Santa Caterina i la Ribera",2.0,1,0,0
4,2015,1,el Fort Pienc,5.0,1,0,0
...,...,...,...,...,...,...,...
4435,2019,4,el Besòs i el Maresme,73.0,0,0,1
4436,2019,4,Provençals del Poblenou,3.0,0,0,1
4437,2019,4,Sant Martí de Provençals,34.0,0,0,1
4438,2019,4,la Verneda i la Pau,67.0,0,0,1


In [42]:
# Get dummy-based household condition (new/protected/used), these gets us 3 df's one for each condition
a = df_num_buys.loc[df_num_buys['Habitatge nou lliure']==1]
a = pd.DataFrame(a)
a.reset_index(drop=True, inplace=True)
a.rename(columns={'Any':'year', 'Nom_Barri':'neighbourhood','Nombre':'new_household_purchases'}, inplace=True)
a.drop(columns=['Habitatge nou lliure','Habitatge nou protegit', 'Habitatge usat'], axis=1, inplace=True)
display(a)

b = df_num_buys.loc[df_num_buys['Habitatge nou protegit']==1]
b = pd.DataFrame(b)
b.reset_index(drop=True, inplace=True)
b.rename(columns={'Any':'year', 'Nom_Barri':'neighbourhood','Nombre':'protected_household_purchases'}, inplace=True)
b.drop(columns=['Trimestre', 'Habitatge nou lliure','Habitatge nou protegit', 'Habitatge usat'], axis=1, inplace=True)
display(b)

c = df_num_buys.loc[df_num_buys['Habitatge usat']==1]
c = pd.DataFrame(c)
c.reset_index(drop=True, inplace=True)
c.rename(columns={'Any':'year', 'Nom_Barri':'neighbourhood','Nombre':'used_household_purchases'}, inplace=True)
c.drop(columns=['Trimestre', 'Habitatge nou lliure','Habitatge nou protegit', 'Habitatge usat'], axis=1, inplace=True)
display(c)

Unnamed: 0,year,Trimestre,neighbourhood,new_household_purchases
0,2015,1,el Raval,0.0
1,2015,1,el Barri Gòtic,3.0
2,2015,1,la Barceloneta,14.0
3,2015,1,"Sant Pere, Santa Caterina i la Ribera",2.0
4,2015,1,el Fort Pienc,5.0
...,...,...,...,...
1475,2019,4,el Besòs i el Maresme,11.0
1476,2019,4,Provençals del Poblenou,2.0
1477,2019,4,Sant Martí de Provençals,0.0
1478,2019,4,la Verneda i la Pau,3.0


Unnamed: 0,year,neighbourhood,protected_household_purchases
0,2015,el Raval,0.0
1,2015,el Barri Gòtic,0.0
2,2015,la Barceloneta,0.0
3,2015,"Sant Pere, Santa Caterina i la Ribera",0.0
4,2015,el Fort Pienc,0.0
...,...,...,...
1475,2019,el Besòs i el Maresme,0.0
1476,2019,Provençals del Poblenou,0.0
1477,2019,Sant Martí de Provençals,0.0
1478,2019,la Verneda i la Pau,0.0


Unnamed: 0,year,neighbourhood,used_household_purchases
0,2015,el Raval,126.0
1,2015,el Barri Gòtic,43.0
2,2015,la Barceloneta,47.0
3,2015,"Sant Pere, Santa Caterina i la Ribera",57.0
4,2015,el Fort Pienc,56.0
...,...,...,...
1475,2019,el Besòs i el Maresme,73.0
1476,2019,Provençals del Poblenou,3.0
1477,2019,Sant Martí de Provençals,34.0
1478,2019,la Verneda i la Pau,67.0


In [43]:
# control differences between list of neighbourhood names
list1 = list(set(a['neighbourhood'].unique()).symmetric_difference(b['neighbourhood'].unique()))
list2 = list(set(b['neighbourhood'].unique()).symmetric_difference(c['neighbourhood'].unique()))
if (len(list1) == 0) and (len(list2) == 0):
    print('Its OK to join')

Its OK to join


In [44]:
# drop redundant columns and join
b.drop(columns=['year','neighbourhood'], axis=1, inplace=True)
c.drop(columns=['year','neighbourhood'], axis=1, inplace=True)
df_num_buys = pd.concat([a,b,c], axis=1)

# remove the "No consta" rows since neighbourhood count is 73
df_num_buys = df_num_buys[df_num_buys['neighbourhood'].str.contains("No consta")==False]

# sum all quarters of  each year/hood
df_num_buys = df_num_buys.groupby(['year', 'neighbourhood'])['new_household_purchases', 'protected_household_purchases', 'used_household_purchases'].agg(sum)

# sort df by year and neighbourhood
df_num_buys.sort_values(by=['year', 'neighbourhood'], ignore_index=True)
df_num_buys.reset_index(inplace=True)
display(df_num_buys)

Unnamed: 0,year,neighbourhood,new_household_purchases,protected_household_purchases,used_household_purchases
0,2015,Baró de Viver,0.0,0.0,8.0
1,2015,Can Baró,0.0,0.0,59.0
2,2015,Can Peguera,0.0,0.0,5.0
3,2015,Canyelles,0.0,0.0,31.0
4,2015,Ciutat Meridiana,0.0,0.0,68.0
...,...,...,...,...,...
360,2019,la Vila Olímpica del Poblenou,1.0,0.0,15.0
361,2019,la Vila de Gràcia,27.0,0.0,281.0
362,2019,les Corts,35.0,2.0,396.0
363,2019,les Roquetes,0.0,0.0,185.0


### i) Price of registered  real state Purchases

In [45]:
df = pd.concat(map(pd.read_csv, glob.glob("C:\\Users\\motxi\\Documents\\Data_Science_IT_Academy\\PROJECTE\\data\\venda_pisos\\preu_compraventes/*.csv")),  ignore_index=True)
display(df.info())
display(df)



<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8760 entries, 0 to 8759
Data columns (total 8 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   Any                   8760 non-null   int64  
 1   Trimestre             8760 non-null   int64  
 2   Codi_Districte        8760 non-null   int64  
 3   Nom_Districte         8760 non-null   object 
 4   Codi_Barri            8760 non-null   int64  
 5   Nom_Barri             8760 non-null   object 
 6   Preu_mitja_habitatge  8760 non-null   object 
 7   Valor                 6363 non-null   float64
dtypes: float64(1), int64(4), object(3)
memory usage: 547.6+ KB


None

Unnamed: 0,Any,Trimestre,Codi_Districte,Nom_Districte,Codi_Barri,Nom_Barri,Preu_mitja_habitatge,Valor
0,2015,1,1,Ciutat Vella,1,el Raval,Total. Milers d'euros,142.8
1,2015,1,1,Ciutat Vella,2,el Barri Gòtic,Total. Milers d'euros,380.9
2,2015,1,1,Ciutat Vella,3,la Barceloneta,Total. Milers d'euros,169.6
3,2015,1,1,Ciutat Vella,4,"Sant Pere, Santa Caterina i la Ribera",Total. Milers d'euros,341.4
4,2015,1,2,Eixample,5,el Fort Pienc,Total. Milers d'euros,309.3
...,...,...,...,...,...,...,...,...
8755,2019,4,10,Sant Martí,69,Diagonal Mar i el Front Marítim del Poblenou,Usat. Euros/m2 construït,5778.6
8756,2019,4,10,Sant Martí,70,el Besòs i el Maresme,Usat. Euros/m2 construït,3030.9
8757,2019,4,10,Sant Martí,71,Provençals del Poblenou,Usat. Euros/m2 construït,
8758,2019,4,10,Sant Martí,72,Sant Martí de Provençals,Usat. Euros/m2 construït,3555.6


In [46]:
# create dummy from Type of purchase
dummy = pd.get_dummies(df['Preu_mitja_habitatge'])

# concatenate
df_price_buys = pd.concat([df, dummy], axis=1)

# drop unwanted columns
df_price_buys.drop(columns=['Codi_Districte', 'Nom_Districte', 'Codi_Barri', 'Preu_mitja_habitatge'], axis=1, inplace=True)

# check
df_price_buys.loc[df_price_buys['Nom_Barri']=='el Raval'][:50]

Unnamed: 0,Any,Trimestre,Nom_Barri,Valor,Nou. Euros/m2 construït,Nou. Milers d'euros,Total. Euros/m2 construït,Total. Milers d'euros,Usat. Euros/m2 construït,Usat. Milers d'euros
0,2015,1,el Raval,142.8,0,0,0,1,0,0
73,2015,1,el Raval,,0,1,0,0,0,0
146,2015,1,el Raval,142.8,0,0,0,0,0,1
219,2015,1,el Raval,2813.0,0,0,1,0,0,0
292,2015,1,el Raval,,1,0,0,0,0,0
365,2015,1,el Raval,2813.0,0,0,0,0,1,0
438,2015,2,el Raval,195.2,0,0,0,1,0,0
511,2015,2,el Raval,,0,1,0,0,0,0
584,2015,2,el Raval,195.6,0,0,0,0,0,1
657,2015,2,el Raval,2948.8,0,0,1,0,0,0


In [47]:
# Get dummy-based household conditions (new/total/used), these gets us 6 df's one for each condition
a = df_price_buys.loc[df_price_buys['Nou. Euros/m2 construït']==1]
a = pd.DataFrame(a)
a.reset_index(drop=True, inplace=True)
a.rename(columns={'Any':'year', 'Nom_Barri':'neighbourhood','Valor':'Nou_Euros/m2_construït'}, inplace=True)
a.sort_values(by=['year', 'neighbourhood'], ignore_index=True, inplace=True)

a.drop(columns=['Nou. Euros/m2 construït', "Nou. Milers d'euros", "Total. Euros/m2 construït", "Total. Milers d'euros", "Usat. Euros/m2 construït", "Usat. Milers d'euros"], axis=1, inplace=True)
display(a)
display(a.isna().sum())

b = df_price_buys.loc[df_price_buys["Nou. Milers d'euros"]==1]
b = pd.DataFrame(b)
b.reset_index(drop=True, inplace=True)
b.rename(columns={'Any':'year', 'Nom_Barri':'neighbourhood','Valor':"new_household_purchases"}, inplace=True)
b.sort_values(by=['year', 'neighbourhood'], ignore_index=True, inplace=True)
b.drop(columns=['Nou. Euros/m2 construït', "Nou. Milers d'euros", "Total. Euros/m2 construït", "Total. Milers d'euros", "Usat. Euros/m2 construït", "Usat. Milers d'euros"], axis=1, inplace=True)
display(b)
display(b.isna().sum())

c = df_price_buys.loc[df_price_buys["Total. Euros/m2 construït"]==1]
c = pd.DataFrame(c)
c.reset_index(drop=True, inplace=True)
c.rename(columns={'Any':'year', 'Nom_Barri':'neighbourhood','Valor':"total_Euros/m2 construït"}, inplace=True)
c.sort_values(by=['year', 'neighbourhood'], ignore_index=True, inplace=True)
c.drop(columns=['Nou. Euros/m2 construït', "Nou. Milers d'euros", "Total. Euros/m2 construït", "Total. Milers d'euros", "Usat. Euros/m2 construït", "Usat. Milers d'euros"], axis=1, inplace=True)
display(c)
display(c.isna().sum())

d = df_price_buys.loc[df_price_buys["Total. Milers d'euros"]==1]
d = pd.DataFrame(d)
d.reset_index(drop=True, inplace=True)
d.rename(columns={'Any':'year', 'Nom_Barri':'neighbourhood','Valor':"Total_household_purchases"}, inplace=True)
d.sort_values(by=['year', 'neighbourhood'], ignore_index=True, inplace=True)
d.drop(columns=['Nou. Euros/m2 construït', "Nou. Milers d'euros", "Total. Euros/m2 construït", "Total. Milers d'euros", "Usat. Euros/m2 construït", "Usat. Milers d'euros"], axis=1, inplace=True)
display(d)
display(d.isna().sum())

e = df_price_buys.loc[df_price_buys["Usat. Euros/m2 construït"]==1]
e = pd.DataFrame(e)
e.reset_index(drop=True, inplace=True)
e.rename(columns={'Any':'year', 'Nom_Barri':'neighbourhood','Valor':"Usat_Euros/m2 construït"}, inplace=True)
e.sort_values(by=['year', 'neighbourhood'], ignore_index=True, inplace=True)
e.drop(columns=['Nou. Euros/m2 construït', "Nou. Milers d'euros", "Total. Euros/m2 construït", "Total. Milers d'euros", "Usat. Euros/m2 construït", "Usat. Milers d'euros"], axis=1, inplace=True)
display(e)
display(e.isna().sum())

f = df_price_buys.loc[df_price_buys["Usat. Milers d'euros"]==1]
f = pd.DataFrame(f)
f.reset_index(drop=True, inplace=True)
f.rename(columns={'Any':'year', 'Nom_Barri':'neighbourhood','Valor':"Used_household_purchases"}, inplace=True)
f.sort_values(by=['year', 'neighbourhood'], ignore_index=True, inplace=True)
f.drop(columns=['Nou. Euros/m2 construït', "Nou. Milers d'euros", "Total. Euros/m2 construït", "Total. Milers d'euros", "Usat. Euros/m2 construït", "Usat. Milers d'euros"], axis=1, inplace=True)
display(f)
display(f.isna().sum())

Unnamed: 0,year,Trimestre,neighbourhood,Nou_Euros/m2_construït
0,2015,1,Baró de Viver,
1,2015,2,Baró de Viver,
2,2015,3,Baró de Viver,
3,2015,4,Baró de Viver,
4,2015,1,Can Baró,
...,...,...,...,...
1455,2019,4,les Roquetes,
1456,2019,1,les Tres Torres,
1457,2019,2,les Tres Torres,
1458,2019,3,les Tres Torres,


year                        0
Trimestre                   0
neighbourhood               0
Nou_Euros/m2_construït    992
dtype: int64

Unnamed: 0,year,Trimestre,neighbourhood,new_household_purchases(x1000€)
0,2015,1,Baró de Viver,
1,2015,2,Baró de Viver,
2,2015,3,Baró de Viver,
3,2015,4,Baró de Viver,
4,2015,1,Can Baró,
...,...,...,...,...
1455,2019,4,les Roquetes,
1456,2019,1,les Tres Torres,
1457,2019,2,les Tres Torres,
1458,2019,3,les Tres Torres,


year                                 0
Trimestre                            0
neighbourhood                        0
new_household_purchases(x1000€)    992
dtype: int64

Unnamed: 0,year,Trimestre,neighbourhood,total_Euros/m2 construït
0,2015,1,Baró de Viver,
1,2015,2,Baró de Viver,3273.0
2,2015,3,Baró de Viver,
3,2015,4,Baró de Viver,
4,2015,1,Can Baró,1724.4
...,...,...,...,...
1455,2019,4,les Roquetes,3265.9
1456,2019,1,les Tres Torres,6191.4
1457,2019,2,les Tres Torres,5203.3
1458,2019,3,les Tres Torres,5700.2


year                         0
Trimestre                    0
neighbourhood                0
total_Euros/m2 construït    99
dtype: int64

Unnamed: 0,year,Trimestre,neighbourhood,Total_household_purchases(x1000€)
0,2015,1,Baró de Viver,
1,2015,2,Baró de Viver,284.8
2,2015,3,Baró de Viver,
3,2015,4,Baró de Viver,
4,2015,1,Can Baró,161.0
...,...,...,...,...
1455,2019,4,les Roquetes,193.2
1456,2019,1,les Tres Torres,853.2
1457,2019,2,les Tres Torres,743.2
1458,2019,3,les Tres Torres,722.9


year                                   0
Trimestre                              0
neighbourhood                          0
Total_household_purchases(x1000€)    100
dtype: int64

Unnamed: 0,year,Trimestre,neighbourhood,Usat_Euros/m2 construït
0,2015,1,Baró de Viver,
1,2015,2,Baró de Viver,3273.0
2,2015,3,Baró de Viver,
3,2015,4,Baró de Viver,
4,2015,1,Can Baró,1724.4
...,...,...,...,...
1455,2019,4,les Roquetes,3265.9
1456,2019,1,les Tres Torres,6191.4
1457,2019,2,les Tres Torres,5203.3
1458,2019,3,les Tres Torres,5700.2


year                         0
Trimestre                    0
neighbourhood                0
Usat_Euros/m2 construït    106
dtype: int64

Unnamed: 0,year,Trimestre,neighbourhood,Used_household_purchases(x1000€)
0,2015,1,Baró de Viver,
1,2015,2,Baró de Viver,284.8
2,2015,3,Baró de Viver,
3,2015,4,Baró de Viver,
4,2015,1,Can Baró,161.0
...,...,...,...,...
1455,2019,4,les Roquetes,193.2
1456,2019,1,les Tres Torres,853.2
1457,2019,2,les Tres Torres,743.2
1458,2019,3,les Tres Torres,722.9


year                                  0
Trimestre                             0
neighbourhood                         0
Used_household_purchases(x1000€)    108
dtype: int64

We'll get just the dataframes containing the price in thousands of euros

In [48]:
# control differences between list of neighbourhood names
list1 = list(set(b['neighbourhood'].unique()).symmetric_difference(d['neighbourhood'].unique()))
list2 = list(set(b['neighbourhood'].unique()).symmetric_difference(f['neighbourhood'].unique()))
list3 = list(set(d['neighbourhood'].unique()).symmetric_difference(f['neighbourhood'].unique()))

if (len(list1) == 0) and (len(list2) == 0) and (len(list3) == 0):
    print('Its OK to join')

Its OK to join


In [49]:
# drop redundant columns and join
d.drop(columns=['year','neighbourhood'], axis=1, inplace=True)
f.drop(columns=['year','neighbourhood'], axis=1, inplace=True)
df_price_buys = pd.concat([b,d,f], axis=1)

# remove the "No consta" rows since neighbourhood count is 73
df_price_buys = df_price_buys[df_price_buys['neighbourhood'].str.contains("No consta")==False]
df_price_buys.rename

# get yearly mean from all quarters of  each year/hood
df_price_buys = df_price_buys.groupby(['year', 'neighbourhood'])['new_household_purchases(x1000€)', 'Used_household_purchases(x1000€)', 'Total_household_purchases(x1000€)'].mean()
df_price_buys.reset_index(inplace=True)
display(df_price_buys)

Unnamed: 0,year,neighbourhood,new_household_purchases(x1000€),Used_household_purchases(x1000€),Total_household_purchases(x1000€)
0,2015,Baró de Viver,,284.800000,284.800000
1,2015,Can Baró,,171.400000,171.400000
2,2015,Can Peguera,,,
3,2015,Canyelles,,135.600000,135.600000
4,2015,Ciutat Meridiana,,63.400000,63.400000
...,...,...,...,...,...
360,2019,la Vila Olímpica del Poblenou,,544.333333,544.333333
361,2019,la Vila de Gràcia,389.100,437.800000,435.750000
362,2019,les Corts,685.025,381.725000,395.025000
363,2019,les Roquetes,,173.575000,173.575000


In [50]:
# change typos to fit other dfs
df_num_buys['neighbourhood'] = [re.sub(r".+Sec\s*$","el Poble Sec - Parc Montjuïc",x) for x in df_num_buys['neighbourhood']]
df_num_buys['neighbourhood'] = [re.sub(r".+Tibi.+$","Vallvidrera el Tibidabo i les Planes",x) for x in df_num_buys['neighbourhood']]
df_num_buys['neighbourhood'] = [x.replace(",", "") for x in df_num_buys['neighbourhood']]
df_price_buys['neighbourhood'] = [re.sub(r".+Sec\s*$","el Poble Sec - Parc Montjuïc",x) for x in df_price_buys['neighbourhood']]
df_price_buys['neighbourhood'] = [re.sub(r".+Tibi.+$","Vallvidrera el Tibidabo i les Planes",x) for x in df_price_buys['neighbourhood']]
df_price_buys['neighbourhood'] = [x.replace(",", "") for x in df_price_buys['neighbourhood']]


In [51]:
# control differences between list of neighbourhood names
list1 = list(set(df_surface_uses['neighbourhood'].unique()).symmetric_difference(df_price_buys['neighbourhood'].unique()))
list2 = list(set(df_surface_uses['neighbourhood'].unique()).symmetric_difference(df_num_buys['neighbourhood'].unique()))
list3 = list(set(df_num_buys['neighbourhood'].unique()).symmetric_difference(df_price_buys['neighbourhood'].unique()))

if (len(list1) == 0) and (len(list2) == 0) and (len(list3) == 0):
    print('dfs surface_uses, num_buys and prices_buys have the same neighbourhood names')

else: print('Houston...we have a problem')

dfs surface_uses, num_buys and prices_buys have the same neighbourhood names


- ## *Solving issues found in Debug Section*

After checking the official documents it turns out data is duplicated because the same pdf is posted for different years, twice.
However we can manually imputate the info from the official census webpage <sup>[12]</sup>

In [52]:
# get index of duplicate rows
df = df_socioeconomics_copy
filtered2015 = df[ (df['neighbourhood']=='el Parc i la Llacuna del Poblenou') & (df['year']==2015) ]
filtered2017 = df[ (df['neighbourhood']=="la Dreta de l'Eixample") & (df['year']==2017) ]

df = pd.concat([filtered2015,filtered2017])
display(df)



Unnamed: 0,year,neighbourhood,population,% spaniards,% strangers,% w/ higher education,unemployed
479,2015,el Parc i la Llacuna del Poblenou,14764.0,80.3,19.7,32.2,751.0
480,2015,el Parc i la Llacuna del Poblenou,14764.0,80.3,19.7,32.2,751.0
635,2017,la Dreta de l'Eixample,43880.0,77.8,22.2,51.2,1338.0
636,2017,la Dreta de l'Eixample,,77.8,22.2,,1338.0


In [53]:
# Fill with real data available checking the four rows
# row 479 is all correct
df_socioeconomics_copy.iloc[480] = [2016,
                                    "el Parc i la Llacuna del Poblenou", 
                                    14861, 
                                    79.8, 
                                    20.2, 
                                    32.7, 
                                    664.0 ]

df_socioeconomics_copy.iloc[635] = [2017,
                                    "la Dreta de l'Eixample", 
                                    44246, 
                                    78.5, 
                                    21.5, 
                                    50.4, 
                                    1338.0 ]

df_socioeconomics_copy.iloc[636] = [2018,
                                    "la Dreta de l'Eixample",
                                    43880,
                                    77.8,
                                    22.2,
                                    51.2,
                                    1217.0 ]

# check results to confirm
display(df_socioeconomics_copy.iloc[[480,635,636]])

df_socioeconomics_copy.sort_values(by=['year', 'neighbourhood'], ignore_index=True, inplace=True)

Unnamed: 0,year,neighbourhood,population,% spaniards,% strangers,% w/ higher education,unemployed
480,2016,el Parc i la Llacuna del Poblenou,14861.0,79.8,20.2,32.7,664.0
635,2017,la Dreta de l'Eixample,44246.0,78.5,21.5,50.4,1338.0
636,2018,la Dreta de l'Eixample,43880.0,77.8,22.2,51.2,1217.0


# df_bcn2009

In [54]:
# drop repeated columns!!!
df_price_rent_copy = df_price_rent.drop(columns=['year', 'neighbourhood'], axis=1)
df_internal_migration.drop(columns=['year', 'neighbourhood'], axis=1, inplace=True)
df_surface_uses.drop(columns=['year', 'neighbourhood'], axis=1, inplace=True)


# concatenate everything and save dataframe as a csv 2009-2019   
df_bcn2009 = pd.concat([df_socioeconomics_copy, df_surface_uses, df_price_rent_copy, df_internal_migration], axis=1)   
display(df_bcn2009)
df_bcn2009.info()

# minor checkings


# save it to csv
path = r"C:\\Users\\motxi\\Documents\\Data_Science_IT_Academy\\PROJECTE\\data\\bcn_dataset2009_2019.csv"
df_bcn2009.to_csv(path)

Unnamed: 0,year,neighbourhood,population,% spaniards,% strangers,% w/ higher education,unemployed,total_surface(m2),housing(m2),parking(m2),...,religious(m2),entertainment(m2),other_uses(m2),avg_€/month,avg_€/m2,avg_housing(m2),new_contracts_1000_hab,expired_contracts_1000_hab,win_lost_rents_1000_hab,binary_rent_growth_1000_hab
0,2009,Baró de Viver,2372.0,92.7,7.3,,,110971,80163,10269,...,224,0.0,307,,,,32.883642,40.050590,-7.166948,0.0
1,2009,Can Baró,9159.0,86.2,13.8,,,410247,316139,28780,...,0,0.0,54,787.00,11.930,65.968148,56.119664,56.228846,-0.109182,0.0
2,2009,Can Peguera,2210.0,96.3,3.7,,,67195,54156,3910,...,374,0.0,0,,,,65.610860,36.199095,29.411765,1.0
3,2009,Canyelles,7359.0,95.3,4.7,,,303783,240025,8142,...,105,0.0,364,776.00,10.200,76.078431,26.226389,32.884903,-6.658513,0.0
4,2009,Ciutat Meridiana,11355.0,64.5,35.5,,,277044,233846,2009,...,992,0.0,0,704.00,10.720,65.671642,93.879348,99.251431,-5.372083,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
798,2019,la Vila Olímpica del Poblenou,9385.0,80.6,19.4,54.1,391.0,1125546,421411,209949,...,736,0.0,67582,1359.70,17.525,77.586305,45.100000,46.100000,-1.000000,0.0
799,2019,la Vila de Gràcia,50926.0,76.4,23.6,47.5,1939.0,3422349,2228680,258600,...,45463,20326.0,11,962.95,15.100,63.771523,63.100000,72.000000,-8.900000,0.0
800,2019,les Corts,46731.0,86.2,13.8,45.0,1665.0,3889778,1961694,650576,...,13155,7837.0,117,1096.30,15.025,72.965058,47.000000,47.800000,-0.800000,0.0
801,2019,les Roquetes,16417.0,75.3,24.7,9.8,906.0,569201,418422,43119,...,123,0.0,0,621.05,11.275,55.082040,65.600000,66.200000,-0.600000,0.0


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 803 entries, 0 to 802
Data columns (total 27 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   year                         803 non-null    int64  
 1   neighbourhood                803 non-null    object 
 2   population                   803 non-null    float64
 3   % spaniards                  803 non-null    float64
 4   % strangers                  803 non-null    float64
 5   % w/ higher education        657 non-null    float64
 6   unemployed                   657 non-null    float64
 7   total_surface(m2)            803 non-null    int64  
 8   housing(m2)                  803 non-null    int64  
 9   parking(m2)                  803 non-null    int64  
 10  commerce(m2)                 803 non-null    int64  
 11  industry(m2)                 803 non-null    int64  
 12  offices(m2)                  803 non-null    int64  
 13  education(m2)       

- ## *Debug*

Prices are correct before concatenating RENT PRICES , but after that, order of rows gets displaced by one

In [55]:
# for debbugging
print('rents 14-19\n',df_rents.loc[df_rents['neighbourhood']=='Pedralbes'], '\n')
print('rents 09-19\n',df_price_rent.loc[df_price_rent['neighbourhood']== 'Pedralbes'])
f = df_bcn2009.loc[df_bcn2009['neighbourhood']== 'Pedralbes']
print('\nwhen in df_bcn2009 Pedralbes shows\n',f['avg_€/month'])

rents 14-19
      year neighbourhood  avg_€/month  avg_€/m2
10   2014     Pedralbes    1489.3450   12.4400
83   2015     Pedralbes    1714.1450   14.1875
156  2016     Pedralbes    1653.6975   14.6725
229  2017     Pedralbes    1785.8850   16.1425
302  2018     Pedralbes    1707.0125   15.6875
375  2019     Pedralbes    1863.1750   15.6750 

rents 09-19
      year neighbourhood  avg_€/month  avg_€/m2  avg_housing(m2)
10   2009     Pedralbes       1.2950   14.2100         0.091133
83   2010     Pedralbes       1.3180   15.0100         0.087808
156  2011     Pedralbes       2.3720   14.4200         0.164494
229  2012     Pedralbes          NaN       NaN              NaN
302  2013     Pedralbes          NaN       NaN              NaN
375  2014     Pedralbes    1489.3450   12.4400       119.722267
448  2015     Pedralbes    1714.1450   14.1875       120.820793
521  2016     Pedralbes    1653.6975   14.6725       112.707276
594  2017     Pedralbes    1785.8850   16.1425       110.632492
667

Problem is in duplicated row in df_socioeconomics since it's the one that keeps the year and neighbourhood columns when concatenating!! Hence when its concatenated the other dataframes are incorrectly labelled.
So far we know:
- number of lines of df_socioeconomics is correct(803 lines) 



In [56]:
list1 = list(set(df_socioeconomics['neighbourhood'].unique()).symmetric_difference(df_price_rent['neighbourhood'].unique()))
list2 = list(set(df_price_rent['neighbourhood'].unique()).symmetric_difference(df_gini['neighbourhood'].unique()))
list3 = list(set(df_num_buys['neighbourhood'].unique()).symmetric_difference(df_price_rent['neighbourhood'].unique()))

if (len(list1) == 0) and (len(list2) == 0) and (len(list3) == 0):
    print('ok, same names')

else: print('Houston...we have a problem')

ok, same names


- maybe problem is in the number of year column

In [57]:
# get dict to count ocurrence of years
lst = list(df_socioeconomics['year'])
d = {x:lst.count(x) for x in lst}
d

{2009: 73,
 2010: 73,
 2011: 73,
 2012: 73,
 2013: 73,
 2014: 73,
 2015: 74,
 2016: 72,
 2017: 74,
 2018: 72,
 2019: 73}

Yes! 2015 and 2017 have duplicated years belonging to 2016 and 2018

In [58]:
df_dup15 = df_socioeconomics.loc[df_socioeconomics['year']== 2015]
df_dup17 = df_socioeconomics.loc[df_socioeconomics['year']== 2017]

df_missing2016 = df_socioeconomics.loc[df_socioeconomics['year']== 2016]
df_missing2018 = df_socioeconomics.loc[df_socioeconomics['year']== 2018]

   


Find the neighbourhoods that have duplicated years 2015 & 2017

In [59]:
# 2015   
lst = list(df_dup15['neighbourhood'])
d ={x:lst.count(x) for x in lst}
for k,v in d.items():
    if v != 1:
        print('2015 dupl: ', k)
        
# 2017   
lst = list(df_dup17['neighbourhood'])
d ={x:lst.count(x) for x in lst}
for k,v in d.items():
    if v != 1:
        print('2017 dupl: ', k)

2015 dupl:  el Parc i la Llacuna del Poblenou
2017 dupl:  la Dreta de l'Eixample


Same but with missing years 2016 & 2018

In [60]:
#Get real neighbourhood list as control pattern to compare with
lst_good = list(df_socioeconomics['neighbourhood'].unique())

# 2016   
lst = list(df_missing2016['neighbourhood'])
missing = list(set(lst_good).difference(set(lst)))
print('missing 2016: ', missing )

# 2018   
lst = list(df_missing2018['neighbourhood'])
missing = list(set(lst_good).difference(set(lst)))
print('missing 2018: ', missing )


missing 2016:  ['el Parc i la Llacuna del Poblenou']
missing 2018:  ["la Dreta de l'Eixample"]


Now we know for certain the relationship and which year misses what neighbourhood, so we can modify df_socionomics_copy (see fixing bug)

### df_bcn2015

In [61]:
# select from 2015, included
df_bcn2015 = df_bcn2009.loc[df_bcn2009['year'] >= 2015].reset_index(drop=True)

# drop repeated columns!!!
df_gini.drop(['year', 'neighbourhood'], axis=1, inplace=True)
df_disp_income.drop(['year', 'neighbourhood'], axis=1, inplace=True)
df_num_buys.drop(['year', 'neighbourhood'], axis=1, inplace=True)
df_price_buys.drop(['year', 'neighbourhood'], axis=1, inplace=True)
 
#concatenate with the rest of dataframes that started in 2015 
df_bcn2015 = pd.concat([df_bcn2015, df_gini, df_disp_income, df_num_buys, df_price_buys], axis=1)
display(df_bcn2015)
# save it to csv
path = r"C:\\Users\\motxi\\Documents\\Data_Science_IT_Academy\\PROJECTE\\data\\bcn_dataset2015_2019.csv"
df_bcn2015.to_csv(path)

Unnamed: 0,year,neighbourhood,population,% spaniards,% strangers,% w/ higher education,unemployed,total_surface(m2),housing(m2),parking(m2),...,win_lost_rents_1000_hab,binary_rent_growth_1000_hab,gini_index(%),disp_income(€/year),new_household_purchases,protected_household_purchases,used_household_purchases,new_household_purchases(x1000€),Used_household_purchases(x1000€),Total_household_purchases(x1000€)
0,2015,Baró de Viver,2482.0,89.7,10.3,5.7,150.0,110971,80219,10269,...,-20.708881,0.0,33.450000,11217.0,0.0,0.0,8.0,,284.800000,284.800000
1,2015,Can Baró,8938.0,86.9,13.1,24.2,442.0,413313,318343,30196,...,-16.278292,0.0,32.800000,18883.0,0.0,0.0,59.0,,171.400000,171.400000
2,2015,Can Peguera,2267.0,92.9,7.1,7.0,141.0,69811,56595,4100,...,-8.574007,0.0,34.450000,12002.0,0.0,0.0,5.0,,,
3,2015,Canyelles,6946.0,95.9,4.1,10.4,554.0,309727,240025,12396,...,-0.577617,0.0,26.140000,17003.0,0.0,0.0,31.0,,135.600000,135.600000
4,2015,Ciutat Meridiana,10156.0,72.2,27.8,5.7,1146.0,271520,228636,2838,...,0.497265,1.0,34.566667,10203.0,0.0,0.0,68.0,,63.400000,63.400000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
360,2019,la Vila Olímpica del Poblenou,9385.0,80.6,19.4,54.1,391.0,1125546,421411,209949,...,-1.000000,0.0,33.820000,32436.0,1.0,0.0,15.0,,544.333333,544.333333
361,2019,la Vila de Gràcia,50926.0,76.4,23.6,47.5,1939.0,3422349,2228680,258600,...,-8.900000,0.0,33.850000,23749.0,27.0,0.0,281.0,389.100,437.800000,435.750000
362,2019,les Corts,46731.0,86.2,13.8,45.0,1665.0,3889778,1961694,650576,...,-0.800000,0.0,33.605714,28898.0,35.0,2.0,396.0,685.025,381.725000,395.025000
363,2019,les Roquetes,16417.0,75.3,24.7,9.8,906.0,569201,418422,43119,...,-0.600000,0.0,30.040000,13014.0,0.0,0.0,185.0,,173.575000,173.575000


# 2. NOW TO AIRBNB 
## Assuming that airbnb has almost a complete target niche (short term flat rents in a foreign city) it's really tempting too choo-choo-chooose it as an indicator of tourist undercover economy as most of them dont have license. EDA needed for further conclusions



In [62]:
# save it to csv
path = r"C:\\Users\\motxi\\Documents\\Data_Science_IT_Academy\\PROJECTE\\data\\airbnb_dataset.csv"
df_airbnb.to_csv(path)

___
- 3r ultimes tesis i enfoques


## cites per aprofitar

**"** ...The first groups to gentrify an area are thought to often be young, childless,
well-educated adults. With the influx of households that do not have children, neighborhood
culture can change. Amenities directed toward families may be replaced with bars, retail, and
nightlife that appeal to young adults. An influx of people with higher levels of formal education
can both signal that increasing investments in an area as well as changes to neighborhood
culture... **"**

**"** ...Tenure in combination with low incomes and high housing burden indicates
the specific risk of displacement. Renters with low incomes are less able to afford rent increases.
Homeowners with low incomes may struggle to pay increased property taxes as home values
rise. In addition, changes in tenure can be used to identify housing market changes. Increases in
the share of homeowners may reflect wealthier residents with higher levels of formal education
moving into the neighborhood and purchasing homes. This may contribute to displacement risk
by reducing the amount of available rental housing... **"**


**"** ...An increase in the number and price of home sales can indicate an accelerating housing market. This can increase pressure on homeowners with low-incomes and reduce the amount of housing available for renters... **"**



### en cuant als lloguers
**"** ...Many analyses of neighborhood change include tracking rent prices in the focus area. Rises in rental prices are early indicators that there is increasing demand from households with higher incomes. Because renters tend to be less financially stable than homeowners and are subject to landlord decisionmaking, renters with low-income are particularly vulnerable to displacement from gentrification... **"**

**"** ...the growth of Airbnb short-term rentals may also be affecting the supply and costs in the rental market, contributing to displacement pressures... **"**




___

- 4rt al lio!!

- 5è Machine mon amour

- 6è kaizen again and again

___

- 7º Results

- 8º Acknoledgements & Lessons Learned

___



- 8º Documentation


[1] Zuk, Miriam, Ariel H. Bierbaum, Karen Chapple, Karolina Gorska, Anastasia, Loukaitou-Sideris, Paul Ong, and Trevor Thomas. 2015. ***Gentrification,Displacement, and the Role of Public Investment: A Literature Review.***

[2] Miquel Angel Garcia-López, Jordi Jofre-Monseny, R.Martínez-Mazza, M.Segú. 2020.
***Do short-term rental platforms affect housing markets? Evidence from Airbnb in Barcelona***. https://doi.org/10.1016/j.jue.2020.103278
 
[3] Zuk, Miriam, Ariel H. Bierbaum, Karen Chapple, Karolina Gorska, Anastasia, Loukaitou-Sideris, Paul Ong, and Trevor Thomas. 2015. ***Guide_to_measuring_neighborhood_change_to_understand_and_prevent_displacement***. https://www.urban.org/sites/default/files/publication/100135/guide_to_measuring_neighborhood_change_to_understand_and_prevent_displacement.pdf

 
[4] Inside AirBnB Project Web. ***Historical data from Airbnb***. http://insideairbnb.com/barcelona

[5] Inside AirBnB Project Web. ***Data Dictionary for Airbnb listings databases***. https://docs.google.com/spreadsheets/d/1iWCNJcSutYqpULSQHlNyGInUvHg2BoUGoNRIGa6Szc4/edit#gid=1322284596

[6] Opendatasoft. ***Historical data from Airbnb***. https://public.opendatasoft.com/explore/dataset/airbnb-listings/information/?disjunctive.host_verifications&disjunctive.amenities&disjunctive.features&refine.country=Spain&refine.city=Barcelona

[7] Ajuntament de Barcelona. ***Historical Census of area per types of activity 2014-2022 (m<sup>2</sup>)***. https://ajuntament.barcelona.cat/estadistica/castella/Estadistiques_per_temes/Habitatge_i_mercat_immobiliari/Edificis_i_habitatges/Dades_cadastrals/locals/sup/

[8] Ajuntament de Barcelona. ***Average rent prices 2014-2022 (€ / m<sup>2</sup> and € / month)***. https://ajuntament.barcelona.cat/estadistica/catala/Estadistiques_per_territori/Barris/Habitatge_i_mercat_immobiliari/Mercat_immobiliari/Habitatges_lloguer/index.htm

[9] Ajuntament de Barcelona. ***Average rent prices 2009-2011 (€ / m<sup>2</sup> and € / month)***. https://ajuntament.barcelona.cat/estadistica/catala/Estadistiques_per_temes/Habitatge_i_mercat_immobiliari/Mercat_immobiliari/Taules_metodologia_anterior/h2mallo/index.htm

[7] Ajuntament de Barcelona. ***Registered household purchase prices  2014-2019***. https://opendata-ajuntament.barcelona.cat/data/ca/dataset/est-mercat-immobiliari-compravenda-preu-total

[7] Ajuntament de Barcelona. ***Number of registered household purchases  2014-2019***.

[12] Ajuntament de Barcelona. ***official padron***. https://ajuntament.barcelona.cat/estadistica/catala/Estadistiques_per_territori/Barris/Poblacio_i_demografia/Poblacio/Padro_municipal_habitants/index.htm

[10] Ajuntament de Barcelona. ***Gini index 2014-2019***. https://opendata-ajuntament.barcelona.cat/data/ca/dataset/atles-renda-index-gini

[11] Ajuntament de Barcelona. ***Disposable income of the households per capita(€/year) 2014-2019***. https://opendata-ajuntament.barcelona.cat/data/ca/dataset/renda-disponible-llars-bcn

- 9º Versions and software blablabla