Code to repeat the experiments of "The economic value of neighborhoods: Predicting real estate prices from the urban environment"

The economic value of neighborhoods: Predicting real estate prices from the urban environment

This repository contains all the code required to reproduce the results presented in the following paper:

  • M. De Nadai, B. Lepri. The economic value of neighborhoods: Predicting real estate prices from the urban environment, 2018.

Input, intermediary and source data can be downloaded from figshare.

Dependencies

Dependencies are listed in the requirements.txt file at the root of the repository. With Python 3.6 and pip, all required dependencies can be installed with:

pip3 install -r requirements.txt

Data

Due to storage constraints, input data are not included in this repository. However, the input and intermediary files required to run the analysis can be downloaded from figshare. To run the code, the input and/or intermediary files must be downloaded and placed in the folder. Then restore the database backup:

createdb dsaa
gunzip < intermediate_db_backup.sql.gz | psql dsaa
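
For anyone who prefers to run the restore step from Python, the following is a minimal sketch of the same pipeline. The function name `restore_dump` and its `command` parameter are hypothetical and not part of this repository. The sketch assumes `psql` is on the PATH and that the `dsaa` database has already been created.

```python
import gzip
import shutil
import subprocess


def restore_dump(dump_path, command=("psql", "dsaa")):
    """Stream a gzip-compressed SQL dump into the stdin of `command`.

    Hypothetical helper: mirrors `gunzip < dump.sql.gz | psql dsaa`.
    Returns the exit code of the child process.
    """
    proc = subprocess.Popen(list(command), stdin=subprocess.PIPE)
    try:
        # Decompress on the fly so the full dump is never held in memory.
        with gzip.open(dump_path, "rb") as dump:
            shutil.copyfileobj(dump, proc.stdin)
    finally:
        proc.stdin.close()
    return proc.wait()


# Example call (assumes the backup file is in the current directory):
# restore_dump("intermediate_db_backup.sql.gz")
```

A non-zero return value indicates that the child process reported an error.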

To produce the intermediary files, go to the section "DIY Instructions".

Code

The code of the analysis is divided into two parts: the Python scripts and modules used to support the analysis, and the notebooks where the outputs of the analysis have been produced.

Scripts

  • data_processing_houses.ipynb : notebook used for the pre-processing of Immobiliare.it data.
  • compute_walkability.py : script used to generate the walkability scores for each census area.
  • data_processing_neighborhood.py : script used to create all the datasets.
  • predict.py : script used to predict the housing value from the intermediary files.
  • plots.ipynb : notebook used to produce the images of the manuscript.

License

This code is licensed under the MIT license.

DIY Instructions

Here we generate the entire database from scratch. To do so, first create the minimal setup with these commands:

psql dsaa < data/SQL/minimal.sql
psql dsaa < data/SQL/minimal_materialize.sql

Additional dependencies

Census data

Census data have to comply with the format of the census_areas_onfocus table. You can proceed with the remaining steps only after importing data into this table. Once the data are imported, generate the spatial matrix with:

psql dsaa < data/SQL/first-DIY-step.sql

Walkability

An OpenStreetMap file has to be downloaded (preferably from here) and placed in data/OSM. It is then imported into PostGIS with:

osm2pgsql -c -d dsaa --create --style "config/osm2pgsql.style" --multi-geometry --number-processes 5 --latlong -C 30000 [FILENAME].osm.pbf

The same OSM file can then be used to produce the OSRM database:

osrm-extract -p config/profiles/foot.lua [FILENAME].osm.pbf
osrm-contract [FILENAME].osrm

To run the server, use the command

osrm-routed [FILENAME].osrm
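
To check that the routing server answers before running the walkability script, a request can be built against the OSRM HTTP route service. This is a minimal sketch, not the code used in compute_walkability.py. The helper names are hypothetical, and the host assumes osrm-routed is listening on its default port 5000 on the local machine.

```python
import json


def route_url(origin, destination, host="http://localhost:5000", profile="foot"):
    """Build an OSRM route-service URL.

    `origin` and `destination` are (longitude, latitude) tuples, which is
    the coordinate order expected by OSRM. `host` is an assumption.
    """
    coords = ";".join(f"{lon},{lat}" for lon, lat in (origin, destination))
    return f"{host}/route/v1/{profile}/{coords}?overview=false"


def walking_distance_m(response_text):
    """Return the distance in meters of the first route, or None if no route."""
    payload = json.loads(response_text)
    if payload.get("code") != "Ok" or not payload.get("routes"):
        return None
    return payload["routes"][0]["distance"]


# Example (requires a running server):
# from urllib.request import urlopen
# with urlopen(route_url((11.12, 46.07), (11.13, 46.08))) as resp:
#     print(walking_distance_m(resp.read().decode("utf-8")))
```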

After this, everything is set up to create the intermediate data in the database. Import all the materialized views, then run the script. Before running it, adjust lines 13 and 35 of compute_walkability.py to match your setup.

psql dsaa < data/SQL/walkability.sql
python3 compute_walkability.py

Security perception

To create the security perception scores, we use the code and weights of the following paper:

  • De Nadai, M., Vieriu, R. L., Zen, G., Dragicevic, S., Naik, N., Caraviello, M., ... & Lepri, B. Are safer looking neighborhoods more lively?: A multimodal investigation into urban life. In ACM MM 2016.

Everything is available here. All the predictions should be placed inside the placepulse table in PostgreSQL. Then, you can import/refresh the materialized view with:

psql dsaa < data/SQL/security.sql

Companies

You can insert a dataset with the census areas (geoid) and a proxy of company earnings (fatturato) in data/companies.csv. Note that this is included only in the non-open model version.

Land value

You can insert a dataset with the census areas (geoid) and a proxy of land value (assessed_land_value) in data/land_value.csv. Note that this is included only in the non-open model version.
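
Before running the pipeline, it may help to check that a custom CSV file exposes the expected columns. The sketch below assumes a comma-separated file with a header row, which this README does not specify. The helper `missing_columns` is hypothetical and not part of the repository.

```python
import csv


def missing_columns(csv_file, required):
    """Return the required column names that are absent from the CSV header.

    `csv_file` is an open text file (or any iterable of lines).
    Assumes a comma-separated file whose first row is the header.
    """
    reader = csv.DictReader(csv_file)
    header = reader.fieldnames or []
    return [name for name in required if name not in header]


# Example for the land value file:
# with open("data/land_value.csv", newline="") as f:
#     print(missing_columns(f, ["geoid", "assessed_land_value"]))
```

The same check applies to data/companies.csv with the columns geoid and fatturato.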

Census

Census data have to be inserted in the same format as the files placed in data/census and data/census/industry. To use a different format, change the corresponding code in data_processing_housing.py.

Land use

Download satellite shapefiles from https://land.copernicus.eu/local/urban-atlas/urban-atlas-2012/view. Import them into the urban_atlas PostgreSQL table. Then run:

psql dsaa < data/SQL/urban_atlas.sql
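
Since the DIY procedure relies on several SQL files, a quick check that they are all present can save a failed run halfway through. This is a minimal sketch: the file list is taken from the commands in this README, while the helper `missing_sql_files` and its `repo_root` parameter are hypothetical.

```python
from pathlib import Path

# SQL files referenced by the psql commands in the DIY instructions.
SQL_FILES = [
    "data/SQL/minimal.sql",
    "data/SQL/minimal_materialize.sql",
    "data/SQL/first-DIY-step.sql",
    "data/SQL/walkability.sql",
    "data/SQL/security.sql",
    "data/SQL/urban_atlas.sql",
]


def missing_sql_files(repo_root="."):
    """Return the SQL files from SQL_FILES that do not exist under repo_root."""
    root = Path(repo_root)
    return [name for name in SQL_FILES if not (root / name).is_file()]


# Example, run from the root of the repository:
# print(missing_sql_files())
```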