# Correlation heatmaps between price and other variables for every region in Belgium
We are analyzing property data in Belgium and generating correlation heatmaps for different property types.
## Step 1: Import Required Libraries
We begin by importing the necessary libraries:

In [1]:
import pandas as pd
import plotly.express as px
import numpy as np


## Step 2: Load the Data
Next, we load the property data from a CSV file:

In [2]:
df = pd.read_csv("../property_data.csv")

## Step 3: Data Cleaning
3.1. Removing Duplicates

To ensure each data point is unique, we remove duplicate entries:

In [3]:
df = df.drop_duplicates()

3.2. Removing Leading and Trailing Spaces
To maintain data consistency, we strip leading and trailing spaces:

In [4]:
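# Strip whitespace from string cells; other values pass through unchanged.
# Note: applymap is deprecated since pandas 2.1, where DataFrame.map replaces it.
df = df.applymap(lambda x: x.strip() if isinstance(x, str) else x)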
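
The same stripping step can also be written column by column with `Series.map`, which avoids `applymap` altogether. The following is a minimal sketch on a made-up frame; the `toy` frame and its `city` and `price` columns are illustrative assumptions, not part of the property dataset:

```python
import pandas as pd

# Toy frame with padded strings; the column names are illustrative only
toy = pd.DataFrame({
    "city": ["  Gent ", "Namur  ", " Liege"],
    "price": [250000, 180000, 320000],
})

# Apply the same "strip if string" rule to every column, one Series at a time
for col in toy.columns:
    toy[col] = toy[col].map(lambda x: x.strip() if isinstance(x, str) else x)

print(toy)
```

Non-string cells pass through the lambda unchanged, so numeric columns keep their values.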


3.3. Filling Missing Values
We replace missing values in numeric columns with 0 and in non-numeric columns with 'unknown':

In [5]:
df.loc[:, df.dtypes == np.float64] = df.loc[:, df.dtypes == np.float64].fillna(0)
df.loc[:, df.dtypes == np.int64] = df.loc[:, df.dtypes == np.int64].fillna(0)
df.loc[:, df.dtypes == object] = df.loc[:, df.dtypes == object].fillna('unknown')
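
As an alternative sketch of this filling rule, `select_dtypes` can pick the numeric columns in one call instead of comparing against individual dtypes. The `toy` frame below is made up for illustration; only the column names are borrowed from the dataset:

```python
import numpy as np
import pandas as pd

# Toy frame with one missing value per column (values are made up)
toy = pd.DataFrame({
    "Living area": [85.0, np.nan, 120.0],
    "Type of property": ["apartment", None, "house"],
})

# Split the columns into numeric and everything else
numeric_cols = toy.select_dtypes(include="number").columns
other_cols = toy.columns.difference(numeric_cols)

# Fill numeric gaps with 0 and all remaining gaps with 'unknown'
toy[numeric_cols] = toy[numeric_cols].fillna(0)
toy[other_cols] = toy[other_cols].fillna("unknown")

print(toy)
```

This variant treats every numeric dtype the same way, so it does not depend on columns being exactly `float64` or `int64`.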

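## Step 4: Getting Regions
We define a helper that maps a Belgian zip code to its region or province. Zip codes outside every listed range fall through all branches, so the function returns None for them: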

In [6]:
def get_region(zip_code):
    if 1000 <= zip_code <= 1299:
        return 'Brussels-Capital Region'
    elif 1300 <= zip_code <= 1499:
        return 'Province of Walloon Brabant'
    elif (1500 <= zip_code <= 1999) or (3000 <= zip_code <= 3499):
        return 'Province of Flemish Brabant'
    elif 2000 <= zip_code <= 2999:
        return 'Province of Antwerp'
    elif 3500 <= zip_code <= 3999:
        return 'Province of Limburg'
    elif 4000 <= zip_code <= 4999:
        return 'Province of Liege'
    elif 5000 <= zip_code <= 5999:
        return 'Province of Namur'
    elif (6000 <= zip_code <= 6599) or (7000 <= zip_code <= 7999):
        return 'Province of Hainaut'
    elif 6600 <= zip_code <= 6999:
        return 'Province of Luxembourg'
    elif 8000 <= zip_code <= 8999:
        return 'Province of West Flanders'
    elif 9000 <= zip_code <= 9999:
        return 'Province of East Flanders'
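
The same mapping can be expressed as a lookup table, which keeps the ranges in one place and makes the fall-through case explicit. This is only a sketch: the names `REGION_RANGES` and `lookup_region` are hypothetical, and the ranges are copied from `get_region` above:

```python
# (low, high, name) ranges copied from get_region; the names below are hypothetical
REGION_RANGES = [
    (1000, 1299, 'Brussels-Capital Region'),
    (1300, 1499, 'Province of Walloon Brabant'),
    (1500, 1999, 'Province of Flemish Brabant'),
    (2000, 2999, 'Province of Antwerp'),
    (3000, 3499, 'Province of Flemish Brabant'),
    (3500, 3999, 'Province of Limburg'),
    (4000, 4999, 'Province of Liege'),
    (5000, 5999, 'Province of Namur'),
    (6000, 6599, 'Province of Hainaut'),
    (6600, 6999, 'Province of Luxembourg'),
    (7000, 7999, 'Province of Hainaut'),
    (8000, 8999, 'Province of West Flanders'),
    (9000, 9999, 'Province of East Flanders'),
]


def lookup_region(zip_code):
    """Return the region name for zip_code, or None if no range matches."""
    for low, high, name in REGION_RANGES:
        if low <= zip_code <= high:
            return name
    return None


# Spot-check a few zip codes, including one outside every range
for z in (1000, 3000, 6700, 999):
    print(z, lookup_region(z))
```

Rows whose zip code maps to None end up with a missing region, and `groupby` drops missing keys by default, so such rows would not appear in any per-region heatmap.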

4.1. Ensure Correct Data Types and Assign Regions
We convert the 'Zip code' column to integers and then map each zip code to its region with get_region:

In [7]:
# Ensure that the 'Zip code' column is in integer format
df['Zip code'] = df['Zip code'].astype(int)

# Map the 'Zip code' column to regions
df['Region'] = df['Zip code'].apply(get_region)


## Step 5: Data Analysis
We group the data by 'Region' and 'Type of property' and analyze each group separately. Depending on the type of property ('apartment' or other), we select a different set of numeric columns: houses additionally include the land surface and the swimming pool. We then calculate a correlation matrix for these columns and visualize it as a heatmap with a color scale fixed to the range -1 to 1:

In [8]:
# Calculate the correlation matrix of the numeric columns within each region and type of property
for (region, type_of_property), group_df in df.groupby(['Region', 'Type of property']):
    print(f"Region: {region}, Type of property: {type_of_property}")

    # Use a different set of numeric columns for apartments and houses
    if type_of_property == 'apartment':
        numeric_cols = ['Price of property in euro', 'Kitchen', 'Number of bedrooms', 'Living area', 'Terrace area', 'Garden', 'Garden area', 'Number of facades']
    else:
        numeric_cols = ['Price of property in euro', 'Kitchen', 'Number of bedrooms', 'Living area', 'Terrace area', 'Garden', 'Garden area', 'Surface of the land(or plot of land)', 'Number of facades', 'Swimming pool']
    
    corr = group_df[numeric_cols].corr()
    fig = px.imshow(corr, title=f"Correlation matrix of variables and price in {region} for {type_of_property}", zmin=-1, zmax=1)
    fig.show()


[Output: one correlation heatmap per region and property type (apartment and house) for the Brussels-Capital Region and the provinces of Antwerp, East Flanders, Flemish Brabant, Hainaut, Liege, Limburg, Luxembourg, Namur, Walloon Brabant, and West Flanders. Each figure is titled "Correlation matrix of variables and price in {region} for {type_of_property}".]

Price correlates most strongly with living area and number of bedrooms, because both indicate the size and capacity of the property: larger and more spacious properties tend to have higher prices than smaller and more cramped ones.
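
One way to read such a ranking off a correlation matrix is to take the price column and sort it by absolute value. The sketch below uses synthetic data generated only for illustration: the numbers are made up and do not reproduce the results above, although the column names are borrowed from the dataset:

```python
import numpy as np
import pandas as pd

# Synthetic data: price depends on living area, facades are independent of price
rng = np.random.default_rng(0)
n = 200
living_area = rng.uniform(40, 250, n)
bedrooms = np.round(living_area / 40 + rng.normal(0, 0.5, n)).clip(1, None)
facades = rng.integers(2, 5, n)
price = 2500 * living_area + rng.normal(0, 30000, n)

toy = pd.DataFrame({
    "Price of property in euro": price,
    "Living area": living_area,
    "Number of bedrooms": bedrooms,
    "Number of facades": facades,
})

# Correlation of every variable with price, strongest (by magnitude) first
corr = toy.corr()
ranking = (
    corr["Price of property in euro"]
    .drop("Price of property in euro")
    .sort_values(key=abs, ascending=False)
)
print(ranking)
```

Sorting by absolute value keeps strong negative correlations near the top, which a plain descending sort would push to the bottom.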
Price correlates least with the property type apartment and the sale type "under option", which are not very relevant or distinctive factors for determining the price. Apartments are a common and diverse type of property that can have a wide range of prices depending on other features. A sale under option is a temporary and conditional agreement that does not reflect the final or actual price of the property. These variables therefore have little influence on the price and can be ignored or excluded from our analysis.

Properties that need renovation tend to have lower prices than properties that are in good or new condition, because they require more work and investment from the buyer. Ground floor properties may have less privacy, security, or view than properties on higher floors, which may reduce their appeal and value.