# Correlation heatmaps between price and other variables for every region in Belgium
We analyze Belgian property data and generate correlation heatmaps between price and other variables for each region and property type.
## Step 1: Import Required Libraries
We begin by importing the necessary libraries:

In [1]:
import pandas as pd
import plotly.express as px
import numpy as np


## Step 2: Load the Data
Next, we load the property data from a CSV file:

In [2]:
df = pd.read_csv("property_data.csv")
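
Because the later steps refer to columns by name, it can help to check right after loading that the expected columns are present. The following is a minimal sketch, not part of the original notebook: `missing_columns` is a hypothetical helper, and the toy frame stands in for the real CSV, whose contents are not shown here.

```python
import pandas as pd

# Hypothetical helper (an assumption, not from the notebook): list the
# expected column names that are absent from a DataFrame.
def missing_columns(frame, expected):
    return [col for col in expected if col not in frame.columns]

# Toy frame standing in for property_data.csv; the values are invented.
toy = pd.DataFrame({
    "Zip code": [1000, 9000],
    "Price of property in euro": [250000, 310000],
})

missing = missing_columns(toy, ["Zip code", "Region", "Price of property in euro"])
print(missing)
```

On the real data, the same check would be called with `df` and the column names used in the later steps.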

## Step 3: Data Cleaning
3.1. Removing Duplicates

To ensure each data point is unique, we remove duplicate entries:

In [3]:
df = df.drop_duplicates()
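
As an illustration of what this step does, the sketch below counts how many rows `drop_duplicates` removes from a small invented frame. The data is made up for the example; only the column names come from the notebook.

```python
import pandas as pd

# Invented rows: the first two are exact duplicates of each other.
toy = pd.DataFrame({
    "Zip code": [1000, 1000, 9000],
    "Price of property in euro": [250000, 250000, 310000],
})

n_before = len(toy)
deduped = toy.drop_duplicates()  # keeps the first occurrence of each row
n_removed = n_before - len(deduped)
print(f"Removed {n_removed} duplicate row(s)")
```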

3.2. Removing Leading and Trailing Spaces
To maintain data consistency, we strip leading and trailing spaces:

In [4]:
df = df.applymap(lambda x: x.strip() if isinstance(x, str) else x)
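
Note that `DataFrame.applymap` is deprecated since pandas 2.1 in favor of `DataFrame.map`. One possible alternative that behaves the same on older and newer pandas versions is to apply `Series.map` column by column. This is a sketch on invented data, not the notebook's own code:

```python
import pandas as pd

# Invented values with stray spaces around the text entries.
toy = pd.DataFrame({
    "Type of property": ["  house", "apartment "],
    "Living area": [120, 80],
})

# Strip whitespace from string values only; other values pass through unchanged.
stripped = toy.apply(
    lambda col: col.map(lambda x: x.strip() if isinstance(x, str) else x)
)
print(stripped["Type of property"].tolist())
```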


3.3. Filling Missing Values
We replace missing values in numeric columns with 0 and in non-numeric columns with 'unknown':

In [5]:
df.loc[:, df.dtypes == np.float64] = df.loc[:, df.dtypes == np.float64].fillna(0)
df.loc[:, df.dtypes == np.int64] = df.loc[:, df.dtypes == np.int64].fillna(0)
df.loc[:, df.dtypes == object] = df.loc[:, df.dtypes == object].fillna('unknown')
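
A NumPy `int64` column cannot hold missing values, so the second line above has nothing to fill. A more compact variant, sketched here on invented data, selects all numeric columns at once with `select_dtypes` and treats every remaining column as non-numeric:

```python
import numpy as np
import pandas as pd

# Invented frame with one missing value in each column.
toy = pd.DataFrame({
    "Living area": [120.0, np.nan],
    "State of the building": ["good", None],
})

numeric_cols = toy.select_dtypes(include="number").columns
other_cols = toy.columns.difference(numeric_cols)

toy[numeric_cols] = toy[numeric_cols].fillna(0)        # numeric gaps become 0
toy[other_cols] = toy[other_cols].fillna("unknown")    # other gaps become 'unknown'
print(toy)
```

Keep in mind that zeros inserted this way take part in the correlations computed later, as if they were real measurements.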

## Step 4: Data Processing
4.1. Ensure Correct Data Types
We convert specific columns into appropriate data types:

In [6]:
df['Zip code'] = df['Zip code'].astype(int)
df['Subtype of property'] = df['Subtype of property'].astype(str)
df['Type of Sale'] = df['Type of Sale'].astype(str)
df['State of the building'] = df['State of the building'].astype(str)
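
If 'Zip code' was read as text, the previous step may have left 'unknown' entries in it, and `astype(int)` raises a `ValueError` on such values. A more tolerant conversion, sketched below on invented data, uses `pd.to_numeric` with `errors="coerce"`. Mapping unparseable entries to 0 is an assumption made for this example, chosen to match the fill value used for numeric columns.

```python
import pandas as pd

# Invented column mixing numeric strings with a placeholder.
toy = pd.DataFrame({"Zip code": ["1000", "9000", "unknown"]})

# Unparseable entries become NaN, then 0 (assumed placeholder), then int.
zip_numeric = pd.to_numeric(toy["Zip code"], errors="coerce")
toy["Zip code"] = zip_numeric.fillna(0).astype(int)
print(toy["Zip code"].tolist())
```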


4.2. One-Hot Encoding

We perform one-hot encoding on categorical variables, resulting in binary columns for each category:

In [7]:
df_encoded = pd.get_dummies(df, columns=['Subtype of property', 'Type of Sale', 'State of the building'])
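
To make the effect concrete, here is a small sketch of `pd.get_dummies` on an invented frame. The category values are made up and may not match those in the real dataset.

```python
import pandas as pd

# Invented categories for illustration only.
toy = pd.DataFrame({
    "Type of Sale": ["residential_sale", "annuity", "residential_sale"],
    "Living area": [120, 80, 95],
})

# The encoded column is replaced by one indicator column per category,
# named '<original column>_<category>'.
encoded = pd.get_dummies(toy, columns=["Type of Sale"])
print(encoded.columns.tolist())
```

The untouched 'Living area' column is kept, and each row has exactly one indicator set among the new columns.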

## Step 5: Data Analysis
We group the data by 'Region' and 'Type of property' and analyze each group separately, skipping groups where the type of property is '0' or 'unknown'. Apartments use one set of numeric columns; all other property types use a larger set that also includes the land surface and the swimming pool. For each group we compute the correlation matrix of the selected columns and visualize it as a heatmap:

In [8]:
# Calculate the correlation matrix of the numeric columns within each region and type of property
for (region, type_of_property), group_df in df.groupby(['Region', 'Type of property']):
    # Skip placeholder groups created when missing values were filled in Step 3.3
    if str(type_of_property) in ('0', 'unknown'):
        continue

    print(f"Region: {region}, Type of property: {type_of_property}")

    # Use a different set of numeric columns for apartments and houses
    if type_of_property == 'apartment':
        numeric_cols = ['Price of property in euro', 'Kitchen', 'Number of bedrooms', 'Living area', 'Terrace area', 'Garden', 'Garden area', 'Number of facades']
    else:
        numeric_cols = ['Price of property in euro', 'Kitchen', 'Number of bedrooms', 'Living area', 'Terrace area', 'Garden', 'Garden area', 'Surface of the land(or plot of land)', 'Number of facades', 'Swimming pool']

    # Pairwise Pearson correlation (the pandas default) between the selected columns
    corr = group_df[numeric_cols].corr()
    fig = px.imshow(corr, title=f"Correlation matrix of variables and price in {region} for {type_of_property}", zmin=-1, zmax=1)
    fig.show()


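KeyError: 'Region'

The cell stops with this error, which most likely means that the loaded DataFrame has no column named exactly 'Region', so the groupby call cannot find it. For the loop to run, a 'Region' column has to be present in the CSV or be derived beforehand, for example from 'Zip code'.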
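
If the dataset has no 'Region' column, one could be derived from 'Zip code'. The following is a minimal sketch under stated assumptions: `zip_to_region` is a hypothetical helper that is not part of the notebook, it relies on the general Belgian postal code ranges (1000 to 1299 for Brussels, 1300 to 1499 and 4000 to 7999 for Wallonia, 1500 to 3999 and 8000 to 9999 for Flanders), and it ignores any special-purpose codes. The region labels are chosen for the example.

```python
import pandas as pd

# Hypothetical helper (an assumption, not from the notebook): map a Belgian
# postal code to its region using the general postal code ranges.
def zip_to_region(zip_code):
    if 1000 <= zip_code <= 1299:
        return "Brussels"
    if 1300 <= zip_code <= 1499 or 4000 <= zip_code <= 7999:
        return "Wallonia"
    if 1500 <= zip_code <= 3999 or 8000 <= zip_code <= 9999:
        return "Flanders"
    # Anything outside the known ranges, including the 0 fill value
    return "unknown"

# Invented zip codes covering each branch.
toy = pd.DataFrame({"Zip code": [1000, 2000, 4000, 9000, 0]})
toy["Region"] = toy["Zip code"].map(zip_to_region)
print(toy)
```

On the real data, the equivalent step would be `df['Region'] = df['Zip code'].map(zip_to_region)` before the groupby loop, with the 'unknown' region skipped in the same way as unknown property types.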