# Feature Engineering: data generation from existing datasets
In this notebook we derive Key Performance Indicators (KPIs) from the existing dataset. The derived features make the relationships between the original columns explicit and give the machine learning model additional inputs for predicting the target variable. They include price per square meter, price per room, and the bathroom-to-bedroom ratio, along with features that capture location, property characteristics, and price adjusted for area. The goal is to improve the model's accuracy and interpretability.


## Step 1: import dataset and libraries

In [1]:
import pandas as pd
import numpy as np

# Read the generated file containing the augmented data
file_path = '/home/mike/Escritorio/codes/projects/PropNet/PropNet-project/2_Data_Processing/correlation_analysis/generated_data.csv'
df = pd.read_csv(file_path)

# Show the first rows to confirm that the data loaded correctly
df.head()

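| Unnamed: 0 | mainroad | parking | guestroom | airconditioning | price | semi-furnished | bathrooms | prefarea | bedrooms | stories | area | furnished | unfurnished | basement |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2 | 0 | 1 | 13300000 | 0 | 2 | 1 | 4 | 3 | 7420 | 1 | 0 | 0 |
| 1 | 1 | 3 | 0 | 1 | 12250000 | 0 | 4 | 0 | 4 | 4 | 8960 | 1 | 0 | 0 |
| 2 | 1 | 2 | 0 | 0 | 12250000 | 1 | 2 | 1 | 3 | 2 | 9960 | 0 | 0 | 1 |
| 3 | 1 | 3 | 0 | 1 | 12215000 | 0 | 2 | 1 | 4 | 2 | 7500 | 1 | 0 | 1 |
| 4 | 1 | 2 | 1 | 1 | 11410000 | 0 | 1 | 0 | 4 | 2 | 7420 | 1 | 0 | 1 |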
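
The `Unnamed: 0` column in the output above is typically a row index that an earlier `to_csv` call wrote into the file. If it stays in the frame as a regular column, it could later be picked up as a feature by mistake. The sketch below shows one way to avoid that with the `index_col` argument of `pd.read_csv`. It assumes the CSV stores the index in its first column with an empty header cell, which is not confirmed by the notebook, and it uses two inlined rows instead of the real file.

```python
import io

import pandas as pd

# Two rows from the output above, laid out as a CSV with the index in the
# first column and an empty header cell (assumed layout of the real file).
csv_text = (
    ",mainroad,parking,guestroom,airconditioning,price,semi-furnished,bathrooms,"
    "prefarea,bedrooms,stories,area,furnished,unfurnished,basement\n"
    "0,1,2,0,1,13300000,0,2,1,4,3,7420,1,0,0\n"
    "1,1,3,0,1,12250000,0,4,0,4,4,8960,1,0,0\n"
)

# Default read: the stored index shows up as a data column named "Unnamed: 0".
raw = pd.read_csv(io.StringIO(csv_text))

# Reading with index_col=0 restores it as the index instead.
clean = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(raw.columns)[:2])
print(list(clean.columns)[:2])
```

An alternative with the same effect is to drop the column after loading, for example `df = df.drop(columns="Unnamed: 0")`.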


## Step 2: Create KPIs from existing data


1. `Price per square meter`:
This KPI provides insight into how much the property costs per unit of area. It's calculated by dividing the price by the area. This is helpful to normalize the price with respect to property size.

2. `Price per room`:
This KPI reflects how much the property costs per room, factoring in both bedrooms and bathrooms. It's calculated by dividing the price by the total number of rooms (bedrooms + bathrooms).

3. `Bathroom to Bedroom Ratio`:
This ratio helps to evaluate the balance between the number of bathrooms and bedrooms. It's calculated by dividing the number of bathrooms by the number of bedrooms. A higher ratio may indicate a property with more bathrooms relative to bedrooms, potentially affecting comfort and price.

4. `Total features count`:
This KPI aggregates the number of additional features in the property, such as a guestroom, basement, air conditioning, and whether the property is located in a preferred area or has a furnishing status. This can provide a quick overview of the property’s overall feature set, which may influence the price.

5. `Price adjustment per area`:
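This ratio adjusts the price for the area of the property. As defined here (price divided by area), it is the same calculation as the price per square meter in item 1, so it adds no new information as a separate column. Its use is in comparing the ratio across property sizes, to see whether larger properties are priced proportionally or whether the price per unit area changes with size.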

6. `Location and features`:
A composite KPI that combines location-based features such as whether the property is on a main road or in a preferred area. This can help assess if properties in prime locations, with additional features, have a higher price.

7. `Furnishing status impact`:
This KPI evaluates the impact of different furnishing statuses (e.g., furnished, semi-furnished, unfurnished) on the property price. It can be used to analyze whether furnished properties tend to have higher prices compared to unfurnished ones.
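
The KPIs listed above can be sketched in pandas as follows. This is a minimal illustration, not the notebook's actual implementation: the new column names (`price_per_sqm`, `price_per_room`, `bath_bed_ratio`, `total_features`, `location_score`, `furnishing_status`) are hypothetical, the choice of flags that count as "features" is an assumption, and the composite location KPI is assumed to be a simple sum of `mainroad` and `prefarea`. The five rows shown in Step 1 are inlined so the block runs on its own.

```python
import io

import numpy as np
import pandas as pd

# First five rows from Step 1, inlined so the sketch is self-contained.
csv_text = """mainroad,parking,guestroom,airconditioning,price,semi-furnished,bathrooms,prefarea,bedrooms,stories,area,furnished,unfurnished,basement
1,2,0,1,13300000,0,2,1,4,3,7420,1,0,0
1,3,0,1,12250000,0,4,0,4,4,8960,1,0,0
1,2,0,0,12250000,1,2,1,3,2,9960,0,0,1
1,3,0,1,12215000,0,2,1,4,2,7500,1,0,1
1,2,1,1,11410000,0,1,0,4,2,7420,1,0,1
"""
kpi_df = pd.read_csv(io.StringIO(csv_text))

# Turn zero denominators into NaN so a ratio becomes NaN rather than inf.
area = kpi_df["area"].replace(0, np.nan)
bedrooms = kpi_df["bedrooms"].replace(0, np.nan)
rooms = (kpi_df["bedrooms"] + kpi_df["bathrooms"]).replace(0, np.nan)

# 1. Price per square meter (unit of `area` taken from the text above).
kpi_df["price_per_sqm"] = kpi_df["price"] / area

# 2. Price per room, where rooms = bedrooms + bathrooms.
kpi_df["price_per_room"] = kpi_df["price"] / rooms

# 3. Bathroom-to-bedroom ratio.
kpi_df["bath_bed_ratio"] = kpi_df["bathrooms"] / bedrooms

# 4. Total features count: row-wise sum of 0/1 flags.
#    Assumption: furnished and semi-furnished count as a feature, unfurnished does not.
feature_cols = ["guestroom", "basement", "airconditioning",
                "prefarea", "furnished", "semi-furnished"]
kpi_df["total_features"] = kpi_df[feature_cols].sum(axis=1)

# 5. Price adjustment per area: described as price / area, which is the same
#    formula as KPI 1, so no separate column is created in this sketch.

# 6. Location composite. Assumption: a plain sum of the two location flags.
kpi_df["location_score"] = kpi_df["mainroad"] + kpi_df["prefarea"]

# 7. Furnishing status impact: collapse the one-hot columns into a label,
#    then compare the mean price per status.
#    Note: idxmax picks the first column if a row has no flag set.
status_cols = ["furnished", "semi-furnished", "unfurnished"]
kpi_df["furnishing_status"] = kpi_df[status_cols].idxmax(axis=1)
mean_price_by_status = kpi_df.groupby("furnishing_status")["price"].mean()

print(kpi_df[["price_per_sqm", "price_per_room", "bath_bed_ratio",
              "total_features", "location_score", "furnishing_status"]])
print(mean_price_by_status)
```

Several of these KPIs are computed directly from `price`. If `price` is also the prediction target, feeding them to the model as inputs would leak the target, so in that case they are better suited to exploratory analysis than to training.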