# Catalunya's Amenities import

This **jupyter notebook** ([source](https://github.com/mapcolabora/osm_imports_preparations/blob/master/imports/2020_03_equipaments_catalunya/equipaments_cat.ipynb)) contains the script for importing different types of amenities in Catalunya into OSM, as well as the documentation of the whole process in a single file, making it easier to review both the process and the results as well as the decisions taken.

The goal is to manually merge and import all the amenities information provided by Generalitat de Catalunya, while testing the scripts for data preparation.

## Data Sources

* [https://analisi.transparenciacatalunya.cat/Urbanisme-infraestructures/Equipaments-de-Catalunya/8gmd-gz7i](https://analisi.transparenciacatalunya.cat/Urbanisme-infraestructures/Equipaments-de-Catalunya/8gmd-gz7ia)

## License
Data is released under CC0 (Public domain)

## Import type

This import will be done manually, using JOSM to edit the data. Consider using Task Manager.

## Data preparations

All data preparations will be made automatically in this notebook.



In [1]:
import numpy as np
import pandas as pd
import geopandas as gpd
import geopy
from osmi_helpers import data_gathering as osmi_dg

# Define Data Sources
DATA_RAW = 'data/raw/Equipaments_de_Catalunya.geojson'
CSV_PARSER = 'fields_mapping.csv'

## Data gathering and exploration.

Run the code below to download original datasources and convert them into a dataframe and explore its contents.

In [2]:
# Download a file and convert it into a dataframe.
gdf_raw = gpd.read_file(DATA_RAW)

gdf_raw

Unnamed: 0,sufix_via,comarca,utmx,telefon1,email,poblacio,cpostal,categoria,alies,longitud,...,nom,fax,data_modificacio,propietats,via,telefon2,utmy,tipus_via,idequipament,geometry
0,,Segrià,0.0,973032744,ot.lleida@gencat.cat,Alguaire,25125,Turisme|Oficines de Turisme de la Xarxa|Altres...,OFICINA DE TURISME DE CATALUNYA A LLEIDA-AEROP...,0.0,...,OFICINA DE TURISME DE CATALUNYA A LLEIDA-AEROP...,,2020-03-20T07:54:39,Marca_Turistica|TERRES DE LLEIDA,"Ctra. N-230 qm. 14,5",,0.0,,11443807,
1,,Barcelonès,427996.7316643046,93 400 69 00,,Barcelona,08021,,"Direcció General d'Innovació, Recerca i Cultur...",2.139648279,...,"Direcció General d'Innovació, Recerca i Cultur...",,2020-03-20T07:37:02,,"Via Augusta, 202-226",,4583094.806658751,,10242296,POINT (2.13965 41.39798)
2,,Barcelonès,427996.7316643046,93 400 69 00,,Barcelona,08021,,Sub-direcció General de Centres Privats,2.139648279,...,Sub-direcció General de Centres Privats,,2020-03-20T07:36:10,,"Via Augusta, 202-226",932 415 342,4583094.806658751,,3041655,POINT (2.13965 41.39798)
3,,Baix Llobregat,410754.0,93 683 27 38,cultura@vallirana.cat,Vallirana,08759,,Centre d'Interpretació del Patrimoni Masia Mol...,1.93261908797568,...,Centre d'Interpretació del Patrimoni Masia Mol...,93 683 28 97,2020-03-20T07:15:37,,"C. del Molí, 2 - 4",,4581937.0,,28561,POINT (1.93262 41.38401)
4,,Tarragonès,352259.786,977 24 70 36,,Tarragona,43005,,Servei Territorial de l'Agència de l'Habitatge...,1.240220699,...,Servei Territorial de l'Agència de l'Habitatge...,,2020-03-20T07:36:17,Horari|<br /><b></b><br />de dilluns a divendr...,"Carrer del Cardenal Vidal i Barraquer, 12-14",,4553291.179,,3041757,POINT (1.24022 41.11748)
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
33033,,Baix Camp,340788.637,977331806,pba.imac@reus.cat,Reus,43202,"Cultura|Teatres, auditoris i espais escènics e...",la Palma,1.1023539633548318,...,la Palma,,2020-03-20T07:27:04,Any de construcció|1902|Any del cens|1998|Supe...,C/ Ample 75,,4558298.564,,10629895,POINT (1.10235 41.16040)
33034,,Baix Camp,0.0,977834353,lateneu@tinet.cat,Duesaigües,43773,"Cultura|Centres culturals: ateneus, centres cí...",Ateneu de Duesaigües,-1.4887438843851004,...,Ateneu de Duesaigües,,2020-03-20T07:27:05,Any de construcció|1974|Any del cens|2005|Supe...,Pl. 15 d'agost 15,,0.0,,10630412,POINT (-1.48874 0.00000)
33035,,Segrià,308327.075,973190117,ajuntament@corbins.cat,Corbins,25137,"Cultura|Centres culturals: ateneus, centres cí...",Patronat Sant Jaume de Corbins,0.696756024266889,...,Patronat Sant Jaume de Corbins,,2020-03-20T07:26:53,Any de construcció|1940|Any del cens|2000|Supe...,Pl. Carnisseria 6,,4618120.126,,10630164,POINT (0.69676 41.69179)
33036,,Segrià,301511.775,973266303,,Lleida,25006,"Cultura|Centres culturals: ateneus, centres cí...",Centre Cultural Vallcalent,0.6175989338249559,...,Centre Cultural Vallcalent,,2020-03-20T07:26:53,Any de construcció|1968|Any del cens|1996|Supe...,C/ Vallcalent 28,,4610071.677,,10630158,POINT (0.61760 41.61769)


## Data cleanup

### Fields' mapping.

In [3]:
# Create a copy
gdf = gdf_raw

Run the cell below to convert raw data into a suitable OSM-friendly structure, according to the provided CSV fields with fields' mappings stated in `CSV_PARSER` variable.

In [4]:
# Read CSV file with fields' mapping and description.
fields_mapping = pd.read_csv(CSV_PARSER)

# Display table.
fields_mapping



Unnamed: 0,Original field,Description,OSM tagging,Comments
0,idequipament,Identificador intern de l'equipament a BDE,source:pkey,Not imported.
1,alies,Àlies de l'equipament,,Not imported. Same values as `nom`
2,nom,Nom de l'eqiupament,name,
3,categoria,Categories / subcategories de l'equipament,tmp_category,Not imported. Only used for filtering. Will be...
4,tipus_via,Tipus de via (adreça),,
5,via,Nom de la via (adreça),addr:full,The geojson has all the information stored in ...
6,sufix_via,Sufix (adreça),,Empty. Not imported.
7,num,Número de portal (adreça),addr:housenumber,
8,cpostal,Codi postal,addr:postcode,
9,poblacio,Població,addr:city,


In [5]:
# Selects and renames fields according to CSV parser.
gdf = osmi_dg.csv_parser(gdf, CSV_PARSER)


gdf.head(10)

Unnamed: 0,source:pkey,name,tmp_category,addr:full,addr:housenumber,addr:postcode,addr:city,phone,fax,email,website,source:date
0,11443807,OFICINA DE TURISME DE CATALUNYA A LLEIDA-AEROP...,Turisme|Oficines de Turisme de la Xarxa|Altres...,"Ctra. N-230 qm. 14,5",,25125,Alguaire,973032744,,ot.lleida@gencat.cat,http://www.catalunya.com,2020-03-20T07:54:39
1,10242296,"Direcció General d'Innovació, Recerca i Cultur...",,"Via Augusta, 202-226",,8021,Barcelona,93 400 69 00,,,,2020-03-20T07:37:02
2,3041655,Sub-direcció General de Centres Privats,,"Via Augusta, 202-226",,8021,Barcelona,93 400 69 00,,,,2020-03-20T07:36:10
3,28561,Centre d'Interpretació del Patrimoni Masia Mol...,,"C. del Molí, 2 - 4",,8759,Vallirana,93 683 27 38,93 683 28 97,cultura@vallirana.cat,http://www.vallirana.cat,2020-03-20T07:15:37
4,3041757,Servei Territorial de l'Agència de l'Habitatge...,,"Carrer del Cardenal Vidal i Barraquer, 12-14",,43005,Tarragona,977 24 70 36,,,http://agenciahabitatge.gencat.cat,2020-03-20T07:36:17
5,3040033,CCMA - Catalunya Ràdio,,"Avinguda Diagonal, 614-616",,8021,Barcelona,93 306 92 00,93 306 92 01,,http://www.ccma.cat/catradio,2020-03-20T07:37:15
6,6016896,Sub-direcció General de Serveis,,"Carrer de la Diputació, 355",,8009,Barcelona,93 567 40 00,93 567 40 02,,,2020-03-20T07:36:07
7,3039714,Assessoria Jurídica,,"Via Laietana, 26",,8003,Barcelona,93 567 17 00,93 567 17 51,,http://politiquesdigitals.gencat.cat,2020-03-20T07:37:18
8,6015310,Gabinet de Relacions Externes i Protocol,,"Rambla de Catalunya, 19-21",,8007,Barcelona,93 316 20 00,93 316 21 60,,,2020-03-20T07:36:57
9,3040392,Coordinació Territorial de Joventut a Lleida,,"Rambla d'Aragó, 8",,25002,Lleida,973 27 92 17,973 27 92 01,joventut.lleida.tsf@gencat.cat,,2020-03-20T07:37:04


### Calculate some fields

The following code calculates some fields that are needed in OSM.

In [6]:
# Fix uppercase.
gdf['name'] = gdf['name'].str.title()

# Addresses' cleanup.
gdf['addr:full'] = gdf['addr:full'].str.title()
# Split address.
gdf['addr:street'], gdf['addr:housenumber'], gdf['addr:unit'] = gdf['addr:full'].str.split(',', 2).str
gdf['addr:street'].replace({'C/': 'Carrer'}, inplace=True, regex=True)
gdf['addr:street'].replace({'Ctra.': 'Carretera'}, inplace=True, regex=True)
gdf['addr:street'].replace({'Pl.': 'Plaça'}, inplace=True, regex=True)
gdf['addr:housenumber'] = gdf['addr:housenumber'].replace(regex = 'S/N', value = '')

# Filter out entries without category
gdf = gdf.dropna(subset=['tmp_category'])

# Remove pharmacies, because they have already been imported
gdf = gdf[gdf.tmp_category != 'Salut|Farmàcies||']

# Create amenity column according to `CATEGORIA`
# Health
gdf.loc[gdf.tmp_category.str.contains("Centres d'atenció primària"), 'amenity' ] = 'clinic'
gdf.loc[gdf.tmp_category.str.contains("Centres amb atenció continuada"), 'amenity' ] = 'clinic'
gdf.loc[gdf.tmp_category.str.contains("Centres amb atenció continuada"), 'emergency' ] = 'yes'
#gdf.loc[gdf.tmp_category.str.contains('Centres de salut mental'), 'amenity' ] = 'social_facility'
#gdf.loc[gdf.tmp_category.str.contains('Centres de salut mental'), 'social_facility:for' ] = 'social_facility'
gdf.loc[gdf.tmp_category.str.contains('Hospital'), 'amenity' ] = 'hospital'

# Other
gdf.loc[gdf.tmp_category.str.contains('Museus'), 'amenity' ] = 'museum'
gdf.loc[gdf.tmp_category.str.contains('Teatres'), 'amenity' ] = 'theatre'

gdf

Unnamed: 0,source:pkey,name,tmp_category,addr:full,addr:housenumber,addr:postcode,addr:city,phone,fax,email,website,source:date,addr:street,addr:unit,amenity,emergency
0,11443807,Oficina De Turisme De Catalunya A Lleida-Aerop...,Turisme|Oficines de Turisme de la Xarxa|Altres...,"Ctra. N-230 Qm. 14,5",5,25125,Alguaire,973032744,,ot.lleida@gencat.cat,http://www.catalunya.com,2020-03-20T07:54:39,Carretera N-230 Qm. 14,,,
24,14174,Deixalleria De Sitges,Medi ambient|Deixalleries||,-,,08870,Sitges,938109100,,vilafjs@sitges.cat,http://www.sitges.cat/jsp/directori/detall.jsp...,2020-03-12T16:06:01,-,,,
25,23577,Oficina De Turisme De Peratallada,Turisme|Oficines de Turisme de la Xarxa|Altres...,"Pl. Del Castell, Nº 3",Nº 3,17113,Forallac,972645522,,turisme@forallac.com,http://www.forallac.cat,2020-03-20T07:54:35,Plaça Del Castell,,,
26,49680,Servei D'Informació I Atenció A Les Dones (Sia...,Societat. Ciutadania. Famílies|Oficines d'info...,Fanalets De Sant Jaume,,25002,Lleida,973700461,,politiquesigualtat@paeria.cat,,2018-03-08T15:46:02,Fanalets De Sant Jaume,,,
27,49701,Servei D'Informació I Atenció A Les Dones (Sia...,Societat. Ciutadania. Famílies|Oficines d'info...,"Muralla Del Carme, 24, Baixos",24,43800,Valls,977608225,,pad@valls.cat,,2018-03-08T15:45:58,Muralla Del Carme,Baixos,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
33033,10629895,La Palma,"Cultura|Teatres, auditoris i espais escènics e...",C/ Ample 75,,43202,Reus,977331806,,pba.imac@reus.cat,,2020-03-20T07:27:04,Carrer Ample 75,,theatre,
33034,10630412,Ateneu De Duesaigües,"Cultura|Centres culturals: ateneus, centres cí...",Pl. 15 D'Agost 15,,43773,Duesaigües,977834353,,lateneu@tinet.cat,,2020-03-20T07:27:05,Plaça 15 D'Agost 15,,,
33035,10630164,Patronat Sant Jaume De Corbins,"Cultura|Centres culturals: ateneus, centres cí...",Pl. Carnisseria 6,,25137,Corbins,973190117,,ajuntament@corbins.cat,,2020-03-20T07:26:53,Plaça Carnisseria 6,,,
33036,10630158,Centre Cultural Vallcalent,"Cultura|Centres culturals: ateneus, centres cí...",C/ Vallcalent 28,,25006,Lleida,973266303,,,,2020-03-20T07:26:53,Carrer Vallcalent 28,,,


In [7]:
type(gdf)

pandas.core.frame.DataFrame

## Export clean data

If the attributes above are correct, we have to proceed to export them into a `CSV` and `geojson` files that can be used in the Task Manager's project.

In [8]:
# Drop unnecessary fields.
gdf = gdf.drop(columns=['tmp_category'])

# Split dataframe into different dataframes
health_amenities = ['clinic', 'hospital']
gdf_health = gdf.loc[gdf['amenity'].isin(health_amenities)]


gdf_health

# Generate  a CSV File.
gdf_health.to_csv('data/processed/health.csv', index = False)

# Export to geojson.
#gdf_health.to_file('data/processed/health.geojson', driver='GeoJSON')

As a result of this script, we get the following files (all of them stored in `data/processed` folder:

* `data/processed/health.geojson`: file containing hospitals, and clinics.
* `data/processed/health.csv`: CSV file containing hospitals, and clinics.


