A comprehensive analysis and prediction system for property prices in the Helsinki metropolitan area, featuring data collection, exploratory analysis, and an interactive multi-page Streamlit application.
I've built an interactive demo to explore the results: 🔗 Live Application
This project includes comprehensive Jupyter notebooks covering the entire data science pipeline:
- 📊 Web Scraping - Data collection from Etuovi.com
- 🧹 Data Cleaning - Raw data preprocessing and validation
- 🔍 Exploratory Analysis - Statistical analysis and geographical visualization
- Interactive property map with multiple color schemes (percentile-based, logarithmic, price tiers)
- 3D density visualization showing property distribution
- Geographic analysis across Helsinki, Vantaa, and Espoo
- Real-time property statistics and insights
- Machine learning-powered price predictions using Random Forest
- 25+ location presets covering the Helsinki metropolitan area
- Market comparison and contextual analysis
- Property characteristic impact visualization
- Comprehensive performance metrics (R², MAE, RMSE, MAPE)
- Interactive feature importance analysis with error bars
- Partial dependence plots showing individual feature effects
- Model reliability assessment and insights
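As a rough illustration of what a partial dependence plot computes, the sketch below averages model predictions while one feature is forced to each value on a grid. It is a manual computation written for clarity, not necessarily how the app produces its plots; `partial_dependence_1d` and the synthetic data are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def partial_dependence_1d(model, X, feature_idx, grid_points=20):
    """Average prediction as one feature sweeps its observed range."""
    X = np.asarray(X, dtype=float)
    column = X[:, feature_idx]
    grid = np.linspace(column.min(), column.max(), grid_points)
    averaged = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature_idx] = value  # force every row to this value
        averaged.append(model.predict(X_mod).mean())
    return grid, np.array(averaged)


if __name__ == "__main__":
    # Synthetic placeholder data (assumption): columns are Size and Year.
    rng = np.random.default_rng(0)
    X = np.column_stack([rng.uniform(20, 200, 200), rng.uniform(1900, 2023, 200)])
    y = 4000 * X[:, 0] + rng.normal(0, 10_000, 200)
    model = RandomForestRegressor(n_estimators=30, random_state=0).fit(X, y)
    grid, avg = partial_dependence_1d(model, X, feature_idx=0, grid_points=5)
    for size, price in zip(grid, avg):
        print(f"Size {size:6.1f} m2 -> average predicted price {price:,.0f}")
```

Plotting `grid` against the averaged predictions gives the curve for that feature; a flat curve suggests the model barely uses it.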
helsinki-house-price/
├── app/
│   ├── 🏠_Home.py                  # Main landing page
│   └── pages/
│       ├── 1_📊_Data_Explorer.py   # Data visualization and exploration
│       ├── 2_🔮_Price_Predictor.py # Price prediction interface
│       └── 3_🧠_Model_Analysis.py  # Model performance and analysis
├── notebooks/                      # Jupyter analysis notebooks
├── data/
│   └── cleaned/
│       └── helsinki_house_price_cleaned.xls
├── requirements.txt
└── README.md
- Clone the repository
    git clone https://github.com/albertonietos/helsinki-house-price.git
    cd helsinki-house-price
- Create virtual environment
    python -m venv venv
    source venv/bin/activate # On Windows: venv\Scripts\activate
- Install dependencies
    pip install -r requirements.txt
Run the application:
    streamlit run app/🏠_Home.py
The application will open in your browser at http://localhost:8501.
The dataset contains property listings scraped from Etuovi.com covering:
- Helsinki - City center and surrounding areas
- Vantaa - Airport region and suburbs
- Espoo - Tech hub and residential areas
- Size: Property area in square meters
- Year: Year the property was built
- Total_rooms: Number of rooms
- Latitude/Longitude: Geographic coordinates
- Price: Target variable (in euros)
- Algorithm: Random Forest Regressor
- Features: Size, Year, Total number of rooms, Latitude, Longitude
- Performance: R² ≈ 0.80
- Validation: Train/test split
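The setup listed above could be sketched roughly as follows. This is a minimal sketch, not the project's actual training code: the hyperparameters, the 80/20 split, and the synthetic stand-in data are assumptions, and the helper names `make_synthetic_listings` and `train_and_evaluate` are hypothetical. Only the column names and the Random Forest choice come from this README.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

FEATURES = ["Size", "Year", "Total_rooms", "Latitude", "Longitude"]
TARGET = "Price"


def make_synthetic_listings(n=500, seed=0):
    """Generate placeholder listings; NOT the real Etuovi.com data."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "Size": rng.uniform(20, 200, n),
        "Year": rng.integers(1900, 2024, n),
        "Total_rooms": rng.integers(1, 7, n),
        "Latitude": rng.uniform(60.15, 60.35, n),
        "Longitude": rng.uniform(24.6, 25.2, n),
    })
    # Assumed toy price rule so the example has something to learn.
    df[TARGET] = (4000 * df["Size"] + 500 * (df["Year"] - 1900)
                  + rng.normal(0, 20_000, n))
    return df


def train_and_evaluate(df, test_size=0.2, seed=42):
    """Fit a Random Forest on a train split and score it on the test split."""
    X_train, X_test, y_train, y_test = train_test_split(
        df[FEATURES], df[TARGET], test_size=test_size, random_state=seed
    )
    model = RandomForestRegressor(n_estimators=100, random_state=seed)
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    metrics = {
        "r2": r2_score(y_test, pred),
        "mae": mean_absolute_error(y_test, pred),
        "rmse": float(np.sqrt(mean_squared_error(y_test, pred))),
        "mape": float(np.mean(np.abs((y_test - pred) / y_test)) * 100),
    }
    return model, metrics


if __name__ == "__main__":
    _, metrics = train_and_evaluate(make_synthetic_listings())
    for name, value in metrics.items():
        print(f"{name}: {value:.3f}")
```

Scores on the synthetic data say nothing about the real model; swap in the cleaned dataset to reproduce the reported figure.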
- Percentile-based: Balanced color distribution
- Logarithmic: Good for wide price ranges
- Price tiers: Quartile-based categories
- Linear: Traditional min-max scaling
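To make the difference between these schemes concrete, here is a small sketch that maps prices to a value in [0, 1] (or to a tier index) before a color ramp is applied. The function names are hypothetical and the app's own implementation may differ.

```python
import numpy as np
import pandas as pd


def percentile_scale(prices):
    """Rank-based scaling: each price maps to its percentile rank in (0, 1]."""
    return pd.Series(prices).rank(pct=True).to_numpy()


def log_scale(prices):
    """Min-max scaling applied to log10(price); compresses expensive outliers."""
    logged = np.log10(np.asarray(prices, dtype=float))
    return (logged - logged.min()) / (logged.max() - logged.min())


def linear_scale(prices):
    """Plain min-max scaling on the raw prices."""
    p = np.asarray(prices, dtype=float)
    return (p - p.min()) / (p.max() - p.min())


def price_tiers(prices):
    """Quartile-based tier index, from 0 (cheapest) upward."""
    return pd.qcut(pd.Series(prices), q=4, labels=False,
                   duplicates="drop").to_numpy()


if __name__ == "__main__":
    prices = [100_000, 200_000, 400_000, 800_000, 5_000_000]
    print("percentile:", percentile_scale(prices))
    print("log:       ", log_scale(prices).round(3))
    print("linear:    ", linear_scale(prices).round(3))
    print("tiers:     ", price_tiers(prices))
```

With one very expensive listing in the sample, linear scaling pushes almost every other property to the bottom of the ramp, which is why the percentile and logarithmic options tend to give more readable maps.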
- Hover tooltips with property details
- Zoom and pan capabilities
- Real-time parameter updates
- Responsive design
The analysis reveals key factors affecting Helsinki property prices:
- Size - Most important factor (~60% importance)
- Location - Latitude/Longitude combined (~25% importance)
- Year Built - Age and condition factor (~10% importance)
- Room Count - Layout and functionality (~5% importance)
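One way to obtain importances like these, together with error bars, is to read the impurity-based importance of every tree in the forest and report the mean and standard deviation. The sketch below assumes that approach (the project may compute importances differently, for example by permutation) and uses synthetic data where price depends only on size; `fit_demo_model` and `importance_with_error` are hypothetical names.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

FEATURES = ["Size", "Year", "Total_rooms", "Latitude", "Longitude"]


def fit_demo_model(n=300, seed=0):
    """Fit a forest on synthetic placeholder data (not the real listings)."""
    rng = np.random.default_rng(seed)
    X = np.column_stack([
        rng.uniform(20, 200, n),       # Size
        rng.integers(1900, 2024, n),   # Year
        rng.integers(1, 7, n),         # Total_rooms
        rng.uniform(60.15, 60.35, n),  # Latitude
        rng.uniform(24.6, 25.2, n),    # Longitude
    ])
    y = 4000 * X[:, 0] + rng.normal(0, 20_000, n)  # assumed toy price rule
    return RandomForestRegressor(n_estimators=50, random_state=seed).fit(X, y)


def importance_with_error(model, feature_names):
    """Mean and standard deviation of per-tree impurity importances."""
    per_tree = np.array([t.feature_importances_ for t in model.estimators_])
    means = per_tree.mean(axis=0)
    stds = per_tree.std(axis=0)
    return {name: (float(m), float(s))
            for name, m, s in zip(feature_names, means, stds)}


if __name__ == "__main__":
    importances = importance_with_error(fit_demo_model(), FEATURES)
    for name, (mean, std) in importances.items():
        print(f"{name:12s} {mean:.3f} ± {std:.3f}")
    # Location is reported as latitude and longitude combined.
    location = importances["Latitude"][0] + importances["Longitude"][0]
    print(f"{'Location':12s} {location:.3f}")
```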
- Model based on limited feature set
- Market conditions change over time
- External factors not captured (schools, transport, etc.)
- Remember: All models are wrong, some models are useful
- Streamlit: Web application framework
- Pandas: Data manipulation and analysis
- NumPy: Numerical computing
- Scikit-learn: Machine learning algorithms
- Plotly: Interactive visualizations
- PyDeck: Advanced mapping capabilities
- @st.cache_data for data loading
- @st.cache_resource for model training
- Efficient data processing pipelines
- Responsive UI design
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
This project is open source and available under the MIT License.
Alberto Nieto
- LinkedIn: albertonietosandino
- GitHub: albertonietos
- Data sourced from Etuovi.com
- Built with Streamlit
- Visualizations powered by Plotly and PyDeck
For questions, suggestions, or issues, please open a GitHub issue or contact me.