We explore the relationship between socioeconomic factors and property valuations across Denmark's municipalities. Our research addresses two main questions:
- How can we identify high-demand areas and estimate property values, both historically and at present?
- How can we model and forecast property demand trends over the next 5 years?
Our analysis will leverage data from various sources:
- Property Listing Data
  - Sites such as Home.dk, Boliga.dk, and Boligsiden.dk
  - Acquisition method: Web scraping
- Historical Property Data
  - Public sources like Vurderingsstyrelsen, BBR, and DAWA
  - Acquisition method: API
- Social/Economic Data
  - Historical data from sources such as Danmarks Statistik (DST) and DinGeo
The project's data analysis will adhere to a typical analytical pipeline:
- Data Collection: Data will be gathered by web scraping and by fetching from APIs.
- Data Preparation & Processing: Conducted using Python.
- Exploratory Data Analysis (EDA): To understand the distributions and relationships in the data before choosing a regression model.
- Machine Learning (ML) Analysis: Our primary tool for predicting property values from aggregated socioeconomic data. We'll pick a regression model suited to the data distribution and use k-fold cross-validation to estimate out-of-sample performance and tune hyperparameters. If the features turn out to be strongly multicollinear, we'll also explore L1 (lasso), L2 (ridge), or elastic net regularization.
- Sharing & Visualization: Our findings will be presented through intuitive visualizations and concise data tables or dashboards.
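
To make the ML step more concrete, here is a minimal sketch of k-fold cross-validation used to choose the penalty strength of an L2 (ridge) regression. It is an illustration under stated assumptions, not the project's actual model: the feature names (`income`, `education`, `density`), the synthetic data, the penalty grid, and the helper functions `ridge_fit` and `kfold_rmse` are all hypothetical, and only NumPy is assumed. Two of the synthetic features are deliberately almost collinear, which is the situation where regularization is expected to help.

```python
import numpy as np


def ridge_fit(X, y, alpha):
    """Closed-form ridge (L2) fit; the intercept is left unpenalised."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    coef = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    intercept = y_mean - x_mean @ coef
    return coef, intercept


def kfold_rmse(X, y, alpha, k=5, seed=0):
    """Mean out-of-fold RMSE for one penalty strength."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    errors = []
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        coef, intercept = ridge_fit(X[train_idx], y[train_idx], alpha)
        pred = X[test_idx] @ coef + intercept
        errors.append(np.sqrt(np.mean((y[test_idx] - pred) ** 2)))
    return float(np.mean(errors))


# Synthetic stand-in for aggregated socioeconomic features (NOT real data).
rng = np.random.default_rng(42)
n = 100  # arbitrary sample size for the illustration
income = rng.normal(size=n)
education = income + 0.05 * rng.normal(size=n)  # nearly collinear with income
density = rng.normal(size=n)
X = np.column_stack([income, education, density])
y = 2.0 * income + 1.0 * density + rng.normal(scale=0.5, size=n)

# Hypothetical penalty grid; pick the value with the lowest mean RMSE.
alphas = [0.01, 0.1, 1.0, 10.0, 100.0]
scores = {a: kfold_rmse(X, y, a) for a in alphas}
best_alpha = min(scores, key=scores.get)
print("mean RMSE per alpha:", scores)
print("selected alpha:", best_alpha)
```

Every observation is used for testing exactly once, so the averaged RMSE is an estimate of out-of-sample error rather than a fit to one particular split. Ridge is used here only because it has a closed-form solution; L1 and elastic net need an iterative solver, and in practice a library such as scikit-learn (`KFold`, `Ridge`, `Lasso`, `ElasticNet`) would likely replace these hand-written helpers.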