DavidJKTofan/ba-master

Final Project

Master's thesis for our Master's in Business Analytics. The project focuses on the city of Madrid, Spain.

Title in Spanish: Predicción del precio de la vivienda mediante el uso de métodos estadísticos espaciales.

Title in English: Prediction of house prices through the use of spatial statistical methods.


DISCLAIMER: educational purposes only. Some data files were excluded from the repository in order to respect certain privacy laws. This repository does not represent the Contributors' or any other person's or institution's political or personal views or opinions. The information contained in this repository is for general information and educational purposes only. While we – the Contributors – endeavour to keep the information up to date and correct, we make no representations or warranties of any kind, express or implied, about the completeness, accuracy, reliability, suitability or availability with respect to the content or the websites, services, or related graphics contained on this repository for any purpose. Any reliance you place on such information is therefore strictly at your own risk. In no event will we be liable for any loss or damage including without limitation, indirect or consequential loss or damage, or any loss or damage whatsoever arising from loss of data or profits arising out of, or in connection with, the use of the information stated in this repository.

Abstract

The housing market offers great opportunities to those who know how to take advantage of them. In this document we propose a spatial statistical study of housing prices with two aims: obtaining an accurate prediction, and ensuring that the prediction rests on a statistical model that is sound and stable across space, so that results remain acceptable even when predicting the price of a new house that was not part of the modeling data. With these techniques we intend to offer a different point of view, basing our study on the influence that the values of neighboring homes have on the price of a house, and thus proposing a framework for work and research that has been little explored compared with the more traditional statistical models used for this topic.

Statistical Models used

  • Generalized Linear Model (GLM)
  • Spatial Generalized Linear Model (spatial GLM)
  • Spatial AutoRegressive (SAR)
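
To make the neighbor effect concrete: in its usual spatial-lag form a SAR model is written y = rho * W y + X beta + e, where W is a spatial weights matrix and W y is the weighted average price of each home's neighbors. The sketch below computes W y in plain Python. It is an illustration only and not the thesis code: the k-nearest-neighbour weighting, the function names `knn_weights` and `spatial_lag`, and the toy coordinates and prices are all assumptions, and the coordinates are treated as planar (real latitude/longitude would need projecting first).

```python
import math

def knn_weights(coords, k):
    """Row-standardized k-nearest-neighbour weights: each row sums to 1."""
    n = len(coords)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Rank every other point by Euclidean distance to point i
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(coords[i], coords[j]),
        )
        for j in others[:k]:
            weights[i][j] = 1.0 / k
    return weights

def spatial_lag(weights, values):
    """Compute W y: the weighted average of each point's neighbours."""
    return [sum(w * v for w, v in zip(row, values)) for row in weights]

# Hypothetical toy data: four homes on a plane with made-up prices
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
prices = [100.0, 200.0, 300.0, 400.0]

W = knn_weights(coords, k=2)
lagged = spatial_lag(W, prices)
print(lagged)
```

Here each home's lagged value is the mean price of its two nearest neighbours, which is the term a SAR model scales by rho.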

Data Sources

Data Cleaning & Transformation

Data was downloaded over time through API requests as JSON files with Idealista_API.R. All records were then unified, duplicates were removed (by propertyCode), and the result was converted to a DataFrame with JSON2DF.ipynb on Google Colab.

We made sure that all data types are appropriate (e.g. size = number, district = character).
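
The unify, de-duplicate and type-check steps can be sketched as follows. This is a minimal standard-library illustration, not the contents of JSON2DF.ipynb: it assumes each download is a JSON array of listing objects, the helper names `merge_listings` and `coerce_types` are hypothetical, and the sample records are invented. Only the field names propertyCode, size and district come from the description above.

```python
import json

def merge_listings(json_texts):
    """Parse several JSON arrays and keep one record per propertyCode."""
    unique = {}
    for text in json_texts:
        for record in json.loads(text):
            # setdefault keeps the first record seen for each propertyCode
            unique.setdefault(record["propertyCode"], record)
    return list(unique.values())

def coerce_types(record):
    """Return a copy with size as a number and district as text."""
    cleaned = dict(record)
    cleaned["size"] = float(cleaned["size"])
    cleaned["district"] = str(cleaned["district"])
    return cleaned

# Two hypothetical downloads that overlap on propertyCode "A1"
batch_1 = '[{"propertyCode": "A1", "size": "80", "district": "Centro"}]'
batch_2 = (
    '[{"propertyCode": "A1", "size": "80", "district": "Centro"},'
    ' {"propertyCode": "B2", "size": 65, "district": "Retiro"}]'
)

listings = [coerce_types(r) for r in merge_listings([batch_1, batch_2])]
print(len(listings))
```

A list of dictionaries like `listings` is the shape that a tabular library would typically accept when building a DataFrame.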

Dashboard Data Sources

Data sources used specifically for the Shiny Dashboard.

  • final_df.rds: DataFrame of consolidated data.

  • final_df_scaled.rds: scaled/normalized DataFrame of consolidated data.

  • final_model.rds: Final statistical model.

  • leaflet_data.Rda: DataFrame with all visual data for the interactive Leaflet map.

  • cod_postal_analysis.Rda: DataFrame of Madrid postal data with average property price per area.
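
Two of the files above hold derived data: a scaled copy of the consolidated data and an average price per postal area. The sketch below shows one way such values could be computed. It is an assumption-laden illustration in Python rather than the project's R code: the README does not say which scaling method was used (z-scores are assumed here), the column names postal_code, price and size are hypothetical, and the rows are made up.

```python
from collections import defaultdict
from statistics import mean, pstdev

def zscore(values):
    """Standardize values to mean 0 and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # zero for a constant column, which would fail below
    return [(v - mu) / sigma for v in values]

def mean_price_by_postal_code(rows):
    """Group rows by postal code and average the price within each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["postal_code"]].append(row["price"])
    return {code: mean(prices) for code, prices in groups.items()}

# Hypothetical rows standing in for the consolidated data
rows = [
    {"postal_code": "28001", "price": 500000.0, "size": 100.0},
    {"postal_code": "28001", "price": 300000.0, "size": 60.0},
    {"postal_code": "28045", "price": 200000.0, "size": 80.0},
]

scaled_size = zscore([r["size"] for r in rows])
avg_price = mean_price_by_postal_code(rows)
print(avg_price)
```

Note that R's `scale()` divides by the sample standard deviation (n - 1), so its output would differ slightly from the population version used in this sketch.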

Contributors

  • Andrés David DELGADO MOSQUERA

  • Iván Carlos BARRIO HERREROS

  • David Jo Konstantin TOFAN