# Feature Distribution Analysis
## House Prices in Grand Tunis - Data Mining Project

This notebook explores the distributions of individual features: property size, room count, bathroom count, and geographic location (city and region).

## Import Libraries and Load Cleaned Data

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

# Load cleaned data from previous notebook
df = pd.read_csv('../data/processed/source_1/apartments_cleaned.csv')
print(f"Loaded dataset shape: {df.shape}")
display(df.head())

## Data Types of Features

In [None]:
# df.info() prints its summary directly and returns None, so it is not wrapped in display()
df.info()

## Understanding Each Feature

Based on the dataset's columns, each feature represents the following:

- **`room_count`**: A numerical feature representing the number of rooms in the property.
- **`bathroom_count`**: A numerical feature representing the number of bathrooms in the property.
- **`size`**: A numerical feature representing the size of the property, likely in square meters.
- **`price`**: A numerical feature representing the sale price of the property in thousands of Tunisian dinars (kTND).
- **`city`**: A categorical feature indicating the city where the property is located.
- **`region`**: A categorical feature providing a more specific geographical region or neighborhood within the city.
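
The numerical/categorical split described above can be checked against the actual dtypes. The following is a minimal sketch, assuming the column names listed above; the helper `classify_features` and the two sample rows are hypothetical and are not taken from the real dataset.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def classify_features(frame: pd.DataFrame) -> dict:
    """Label each column 'numerical' or 'categorical' based on its dtype."""
    return {
        col: "numerical" if is_numeric_dtype(frame[col]) else "categorical"
        for col in frame.columns
    }


# Hypothetical sample rows (made-up values, for illustration only).
sample = pd.DataFrame(
    {
        "room_count": [2, 3],
        "bathroom_count": [1, 2],
        "size": [85.0, 120.0],
        "price": [150.0, 260.0],
        "city": ["Tunis", "Ariana"],
        "region": ["La Marsa", "Ennasr"],
    }
)

feature_kinds = classify_features(sample)
print(feature_kinds)
```

In the notebook itself, calling `classify_features(df)` on the loaded frame would flag any column whose dtype disagrees with the list above, for example a `price` column read as text.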

## Distribution of `size`

In [None]:
print("Descriptive statistics for 'size' column:")
display(df['size'].describe())

plt.figure(figsize=(10, 6))
sns.histplot(data=df, x='size', bins=30, kde=True)
plt.title('Distribution of Property Sizes')
plt.xlabel('Size (square meters)')
plt.ylabel('Frequency')
plt.tight_layout()
plt.show()

## Distribution of `room_count`

In [None]:
plt.figure(figsize=(10, 6))
sns.countplot(data=df, x='room_count', palette='viridis')
plt.title('Distribution of Properties by Room Count')
plt.xlabel('Number of Rooms')
plt.ylabel('Number of Properties')
plt.tight_layout()
plt.show()
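
A count plot shows the shape of a distribution, but exact counts and shares are easier to read from a table. The sketch below is one possible way to tabulate them; the helper `count_table` and the sample values are hypothetical and only illustrate the idea.

```python
import pandas as pd


def count_table(values: pd.Series) -> pd.DataFrame:
    """Return counts and percentage shares per distinct value, sorted by value."""
    counts = values.value_counts().sort_index()
    shares = (counts / counts.sum() * 100).round(1)
    return pd.DataFrame({"count": counts, "share_pct": shares})


# Hypothetical room counts (made-up values, for illustration only).
rooms = pd.Series([1, 2, 2, 3, 3, 3, 3, 4], name="room_count")
room_table = count_table(rooms)
print(room_table)
```

In the notebook, `count_table(df['room_count'])` would produce the same kind of table for the real data. Missing values are excluded, because `value_counts()` drops them by default.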

## Distribution of `bathroom_count`

In [None]:
print("Descriptive statistics for 'bathroom_count' column:")
display(df['bathroom_count'].describe())

plt.figure(figsize=(10, 6))
sns.countplot(data=df, x='bathroom_count', palette='viridis')
plt.title('Distribution of Properties by Bathroom Count')
plt.xlabel('Number of Bathrooms')
plt.ylabel('Number of Properties')
plt.tight_layout()
plt.show()

## Distribution of Properties by City

In [None]:
plt.figure(figsize=(10, 6))
sns.countplot(data=df, x='city', order=df['city'].value_counts().index, palette='viridis')
plt.title('Number of Properties per City')
plt.xlabel('City')
plt.ylabel('Number of Properties')
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.show()

## Distribution of Properties by Region

In [None]:
plt.figure(figsize=(12, 10)) # Increased figure height for better spacing
sns.countplot(data=df, y='region', order=df['region'].value_counts().index, palette='viridis')
plt.title('Number of Properties per Region')
plt.xlabel('Number of Properties')
plt.ylabel('Region')
plt.tight_layout()
plt.show()
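
If the dataset contains many regions, the chart above can become crowded. One option, sketched below under that assumption, is to keep only the most frequent regions and group the rest into a single bucket. The helper `top_n_with_other`, the `"Other"` label, and the sample values are hypothetical.

```python
import pandas as pd


def top_n_with_other(values: pd.Series, n: int, other_label: str = "Other") -> pd.Series:
    """Count categories, keep the n most frequent, and sum the rest into one bucket."""
    counts = values.value_counts()  # sorted from most to least frequent
    top = counts.iloc[:n]
    remainder = counts.iloc[n:].sum()
    if remainder > 0:
        top = pd.concat([top, pd.Series({other_label: remainder})])
    return top


# Hypothetical region labels (made-up values, for illustration only).
regions = pd.Series(["A", "A", "A", "B", "B", "C", "D"])
region_counts = top_n_with_other(regions, n=2)
print(region_counts)
```

The resulting series could then be plotted with `sns.barplot(x=region_counts.values, y=region_counts.index)` in place of the full count plot.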