# Real Estate Data Analysis – ImmoEliza

## Project Overview

The goal of this challenge is to support the real estate company *ImmoEliza* in its ambition to become the leading real estate player in Belgium. To do so, the company needs a strong pricing strategy based on data.

Before building a machine learning model, we will perform a thorough data analysis to:

- Understand the structure and content of the dataset
- Clean and prepare the data
- Extract key insights for business decision-making
- Visualize patterns and trends in the Belgian real estate market

This project is carried out as part of the `challenge-data-analysis` challenge.

## Team Members
- Evi
- Moussa
- Yves

## Notebook Structure
1. Data loading and exploration  
2. Data cleaning  
3. Exploratory data analysis (EDA)  
4. Guided analysis and visual questions  
5. Interpretation and business insights  
6. Optional bonus visualizations  
7. Export and documentation  

# 1. Data loading and exploration  
- 1.1. Import Required Libraries
- 1.2. Load the Dataset
- 1.3. First Glance at the Data
  - Dataset shape
  - Column names
  - Data types
  - First rows (`.head()`)

## 1.1. Import Required Libraries

In [1]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

## 1.2. Load the Dataset


In [None]:
csv_path = "../data/zimmo_real_estate_jgchoti.csv"
df = pd.read_csv(csv_path)

## 1.3. First Glance at the Data
- Dataset shape
- Column names
- Data types
- First rows (`.head()`)

In [None]:
# Display dataset shape: number of rows and columns
print("Dataset shape:", df.shape)

# Display column names
print("\nColumn names:")
print(df.columns.tolist())

# Display data types
print("\nData types:")
print(df.dtypes)

# Display the first 5 rows
df.head()

# 2. Data cleaning 
- 2.1. Remove Duplicates
- 2.2. Handle Missing Values
- 2.3. Clean Whitespace and Fix Formatting
- 2.4. Save Cleaned Dataset (Optional)
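The cleaning steps listed above could be sketched roughly as follows. This is a minimal illustration, not the project's actual cleaning code: the function name `clean_listings`, the `target` parameter, and the `city` and `price` columns in the demo frame are all assumptions, since the real column names of the dataset are not shown here.

```python
import pandas as pd


def clean_listings(df: pd.DataFrame, target: str = "price") -> pd.DataFrame:
    """Return a cleaned copy of df (hypothetical helper, assumed column names)."""
    cleaned = df.copy()

    # 2.3: trim whitespace in column names and in every text cell.
    cleaned.columns = cleaned.columns.str.strip()
    for col in cleaned.columns:
        if not pd.api.types.is_numeric_dtype(cleaned[col]):
            cleaned[col] = cleaned[col].map(
                lambda v: v.strip() if isinstance(v, str) else v
            )

    # 2.1: drop exact duplicate rows (after trimming, so "Gent " == "Gent").
    cleaned = cleaned.drop_duplicates()

    # 2.2: one possible policy, assumed here: drop rows without a target value.
    cleaned = cleaned.dropna(subset=[target])

    return cleaned.reset_index(drop=True)


# Tiny invented frame, for illustration only.
raw = pd.DataFrame(
    {
        "city ": ["Gent ", "Gent", "Namur", None],
        "price": [250000, 250000, None, 180000],
    }
)
cleaned = clean_listings(raw)
print(cleaned)
```

Trimming before deduplicating matters: rows that differ only by stray whitespace would otherwise survive `drop_duplicates()`. Whether to drop or impute the remaining missing values is a per-column decision that depends on the actual data.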

# 3. Exploratory data analysis (EDA) 
- 3.1. Variable Types: Quantitative vs Qualitative
- 3.2. Missing Values Overview
- 3.3. Descriptive Statistics (Mean, Median, etc.)
- 3.4. Distribution Visualizations
  - Histograms
  - Boxplots
- 3.5. Correlation Matrix & Heatmap
- 3.6. Outlier Detection
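As a rough sketch of the EDA steps above, the snippet below separates quantitative from qualitative columns, reports the share of missing values, and flags outliers with the common 1.5 × IQR rule. The helper names and the demo columns (`price`, `living_area`, `type`) are assumptions made for illustration; the IQR rule is one possible outlier definition, not necessarily the one this project uses.

```python
import pandas as pd


def split_variable_types(df: pd.DataFrame):
    """Return (quantitative, qualitative) column name lists."""
    numeric_cols = df.select_dtypes(include="number").columns.tolist()
    categorical_cols = [c for c in df.columns if c not in numeric_cols]
    return numeric_cols, categorical_cols


def iqr_outliers(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Boolean mask: True where a value lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)


# Tiny invented frame, for illustration only.
demo = pd.DataFrame(
    {
        "price": [200000, 220000, 240000, 260000, 2000000],
        "living_area": [80, 95, None, 120, 400],
        "type": ["house", "apartment", "house", "house", "villa"],
    }
)

numeric_cols, categorical_cols = split_variable_types(demo)

# 3.2: percentage of missing values per column.
missing_pct = demo.isna().mean().mul(100).round(1)

# 3.3 and 3.5: descriptive statistics and pairwise correlations.
stats = demo[numeric_cols].describe()
corr = demo[numeric_cols].corr()

# 3.6: flag price outliers.
price_outliers = iqr_outliers(demo["price"])
print(demo[price_outliers])
```

The `corr` frame can be passed to `sns.heatmap(corr, annot=True)` to draw the heatmap from section 3.5.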

# 4. Guided analysis and visual questions  
- 4.1. Most & Least Expensive Municipalities
  - Belgium, Wallonia, Flanders
  - Avg / Median / Price per m<sup>2</sup>
- 4.2. Most Influential Variables on Price
- 4.3. Variables with Low or No Impact
- 4.4. Histogram: Properties by Surface
- 4.5. Encoding Strategy for Categorical Variables
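For question 4.1, one way to rank municipalities is a grouped aggregation like the sketch below. The column names `municipality`, `price`, and `living_area` are assumptions, as is the helper `price_by_municipality`; they would need to be mapped to the real dataset columns.

```python
import pandas as pd


def price_by_municipality(
    df: pd.DataFrame,
    group_col: str = "municipality",  # assumed column name
    price_col: str = "price",         # assumed column name
    area_col: str = "living_area",    # assumed column name
) -> pd.DataFrame:
    """Average, median, and median price per m2 for each municipality."""
    data = df.dropna(subset=[group_col, price_col, area_col])
    # Guard against division by zero for listings with no recorded surface.
    data = data[data[area_col] > 0].copy()
    data["price_per_m2"] = data[price_col] / data[area_col]

    summary = data.groupby(group_col).agg(
        listings=(price_col, "size"),
        avg_price=(price_col, "mean"),
        median_price=(price_col, "median"),
        median_price_per_m2=("price_per_m2", "median"),
    )
    return summary.sort_values("median_price", ascending=False)


# Tiny invented frame, for illustration only.
demo = pd.DataFrame(
    {
        "municipality": ["Gent", "Gent", "Namur"],
        "price": [300000, 400000, 200000],
        "living_area": [100, 200, 100],
    }
)
summary = price_by_municipality(demo)
print(summary.head())  # most expensive first
print(summary.tail())  # least expensive last
```

To compare Belgium, Wallonia, and Flanders, the same function could be called on the full frame and on subsets filtered by region, assuming a region column exists or can be derived from the postal code. Requiring a minimum number of listings per municipality before ranking helps avoid conclusions drawn from one or two properties.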

# 5. Interpretation and business insights
- 5.1. Summary of Key Findings
- 5.2. Business Recommendations for ImmoEliza
- 5.3. Data Limitations

# 6. Optional bonus visualizations 
- 6.1. Geo Mapping (price per region/municipality)
- 6.2. Trendlines or Regression Analysis
- 6.3. Clustering or Time Evolution (if available)
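For the trendline idea in 6.2, a simple starting point is a straight-line least-squares fit of price against surface using `numpy.polyfit`. This is only a sketch: the helper `fit_trendline` and the columns `living_area` and `price` are assumed names, and a linear fit is just one of several reasonable choices.

```python
import numpy as np
import pandas as pd


def fit_trendline(df: pd.DataFrame, x_col: str, y_col: str):
    """Fit y = slope * x + intercept by ordinary least squares."""
    data = df[[x_col, y_col]].dropna()
    slope, intercept = np.polyfit(data[x_col], data[y_col], deg=1)
    return slope, intercept


# Tiny invented frame, for illustration only (perfectly linear on purpose).
demo = pd.DataFrame(
    {
        "living_area": [50, 100, 150],
        "price": [150000, 250000, 350000],
    }
)
slope, intercept = fit_trendline(demo, "living_area", "price")
print(f"price = {slope:.0f} * living_area + {intercept:.0f}")
```

For a quick visual version, `sns.regplot(data=df, x="living_area", y="price")` draws the scatter plot and fitted line in one call, with the same caveat about column names.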

# 7. Export and documentation  
- 7.1. Export Final Clean Dataset
- 7.2. Save Visuals and Aggregated Tables
- 7.3. Final README Content
  - Project description
  - Installation
  - Usage
  - Visual examples
  - Team & timeline
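The export step in 7.1 could look like the sketch below. The helper `export_clean` and the file name `cleaned_dataset.csv` are hypothetical, and the demo writes to a temporary directory so that it does not touch the project's `data` folder.

```python
import tempfile
from pathlib import Path

import pandas as pd


def export_clean(df: pd.DataFrame, out_dir, name: str = "cleaned_dataset.csv") -> Path:
    """Write df to out_dir/name as CSV (without the index) and return the path."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / name
    df.to_csv(out_path, index=False)
    return out_path


# Tiny invented frame, for illustration only.
demo = pd.DataFrame({"municipality": ["Gent"], "price": [300000]})

with tempfile.TemporaryDirectory() as tmp:
    path = export_clean(demo, tmp)
    reloaded = pd.read_csv(path)

print(reloaded)
```

Passing `index=False` keeps the row index out of the file, so reloading the CSV does not create an extra unnamed column. Figures for 7.2 can be saved in a similar way with `plt.savefig(...)` before the plot is shown.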