# Housing prices in Hyderabad, India

## Project Objective 🎯

The objective of this project is to develop a regression model to predict housing prices in Hyderabad, India. Using features such as the property's area, location, number of bedrooms, and available amenities, the model will aim to estimate the market value of a property as accurately as possible.

This predictive model is intended to be useful for:

- Home buyers and sellers: to obtain an objective price estimate for a property.
- Real estate agents: to assist with property valuation and client advisory.
- Investors: to identify potentially undervalued or overvalued properties in the market.

## 1.1 Loading the training, validation, and test datasets

In [13]:
import sys

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Make the project's utility modules importable
sys.path.append('../../src/utils')

# Project utilities
from regresion_metrics import evaluate_model_metrics


training_features = pd.read_parquet('../../datasets/processed/housing_prices/hyderabad_house_price_training_features.parquet')
training_labels = pd.read_parquet('../../datasets/processed/housing_prices/hyderabad_house_price_training_labels.parquet')

validation_features = pd.read_parquet('../../datasets/processed/housing_prices/hyderabad_house_price_validation_features.parquet')
validation_labels = pd.read_parquet('../../datasets/processed/housing_prices/hyderabad_house_price_validation_labels.parquet')

test_features = pd.read_parquet('../../datasets/processed/housing_prices/hyderabad_house_price_test_features.parquet')
test_labels = pd.read_parquet('../../datasets/processed/housing_prices/hyderabad_house_price_test_labels.parquet')

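Before fitting, it can help to confirm that each feature table lines up with its label table and that every split has the same feature columns. The sketch below is a minimal check under stated assumptions: `check_splits` is a hypothetical helper (not one of the project's utilities), the `Price` column name is invented, and the tiny DataFrames stand in for the parquet files loaded above.

```python
import pandas as pd


def check_splits(splits):
    """Validate a dict of {name: (features, labels)} pairs.

    Hypothetical helper for illustration. Raises ValueError on the first
    problem found and returns a summary of row and column counts per split.
    """
    reference_columns = None
    summary = {}
    for name, (features, labels) in splits.items():
        # Every feature row needs exactly one label row
        if len(features) != len(labels):
            raise ValueError(
                f"{name}: {len(features)} feature rows vs {len(labels)} label rows"
            )
        # All splits must expose the same columns in the same order
        if reference_columns is None:
            reference_columns = list(features.columns)
        elif list(features.columns) != reference_columns:
            raise ValueError(f"{name}: feature columns differ from the first split")
        summary[name] = {"rows": len(features), "feature_columns": features.shape[1]}
    return pd.DataFrame(summary).T


# Toy stand-ins for the real splits (values and the "Price" name are made up)
toy_train = (
    pd.DataFrame({"Area": [0.1, 0.5, 0.9], "Resale": [0, 1, 0]}),
    pd.DataFrame({"Price": [5.1, 5.6, 6.0]}),
)
toy_valid = (
    pd.DataFrame({"Area": [0.3], "Resale": [1]}),
    pd.DataFrame({"Price": [5.4]}),
)

summary = check_splits({"training": toy_train, "validation": toy_valid})
print(summary)
```

With the notebook's variables, the same call would take `(training_features, training_labels)` and the corresponding validation and test pairs.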

## 1.2 Training and prediction with default hyperparameters

In [15]:
# Training
linealRegresionModel = LinearRegression()
linealRegresionModel.fit(training_features, training_labels)

# Predict on the validation and test sets
validation_predictions = linealRegresionModel.predict(validation_features)
test_predictions = linealRegresionModel.predict(test_features)

# Metrics
validation_metrics = {
    'MAE': mean_absolute_error(validation_labels, validation_predictions),
    'MSE': mean_squared_error(validation_labels, validation_predictions),
    'RMSE': np.sqrt(mean_squared_error(validation_labels, validation_predictions)),
    'R²': r2_score(validation_labels, validation_predictions)
}

test_metrics = {
    'MAE': mean_absolute_error(test_labels, test_predictions),
    'MSE': mean_squared_error(test_labels, test_predictions),
    'RMSE': np.sqrt(mean_squared_error(test_labels, test_predictions)),
    'R²': r2_score(test_labels, test_predictions)
}

comparison_df = pd.DataFrame({
    'Validation Set': validation_metrics,
    'Test Set': test_metrics
})

comparison_df = comparison_df.round(4)

print("--- Regression Metrics ---")
print(comparison_df)

# Extract the fitted intercept and coefficients
intercepto = linealRegresionModel.intercept_
coeficientes = linealRegresionModel.coef_

# One row per feature; coef_ has shape (1, n_features) because the labels
# are a single-column DataFrame, hence the transpose
coeficientes_df = pd.DataFrame(
    data=coeficientes.T,
    index=training_features.columns,
    columns=['Coefficient (m)']
)

print(f"\nIntercept (b): {intercepto[0]:.4f}\n")
print("--- Coefficients for each feature ---")
coeficientes_df

--- Regression Metrics ---
      Validation Set  Test Set
MAE           0.1501    0.1605
MSE           0.0393    0.0516
RMSE          0.1982    0.2272
R²            0.8859    0.8750

Intercept (b): 5.3275

--- Coefficients for each feature ---

| Feature | Coefficient (m) |
|---|---|
| Area | 1.375493 |
| No. of Bedrooms | -0.063752 |
| Resale | 0.035315 |
| MaintenanceStaff | -0.064707 |
| Gymnasium | -0.052377 |
| ... | ... |
| Location_Tarnaka | 0.628071 |
| Location_Tellapur | 0.546346 |
| Location_TellapurOsman Nagar Road | 0.651408 |
| Location_Toli Chowki | 0.531638 |
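
In a linear regression, each coefficient m is the change in the predicted target for a one-unit change in that feature with the others held fixed, and a prediction is the intercept b plus the weighted sum of the feature values. The sketch below checks that relationship on synthetic data. The feature values, the `Price` column name, and the generating formula are made up for illustration; only the `LinearRegression` attributes `intercept_` and `coef_` are taken from the notebook.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic stand-in data (not the Hyderabad dataset)
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.random((50, 3)), columns=["Area", "No. of Bedrooms", "Resale"])
y = pd.DataFrame(
    {"Price": 5.0 + 1.5 * X["Area"] - 0.1 * X["No. of Bedrooms"] + rng.normal(0, 0.05, 50)}
)

model = LinearRegression().fit(X, y)

# With a single-column DataFrame as the target, coef_ has shape (1, n_features)
# and intercept_ has shape (1,)
manual = model.intercept_ + X.to_numpy() @ model.coef_.T  # b + sum(m_i * x_i)
library = model.predict(X)

# True when the manual formula reproduces the library predictions
print(np.allclose(manual, library))
```

The same decomposition applies to the fitted `linealRegresionModel`, which is why the coefficient table can be read feature by feature.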

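The validation and test metric dictionaries in the training cell repeat the same four calls. One way to remove that duplication is a small function, sketched here. `regression_metrics` is a hypothetical name chosen for illustration; it is not the project's `evaluate_model_metrics` utility, whose signature does not appear in this notebook. The labels and predictions are toy values.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE and R² for one set of predictions."""
    mse = mean_squared_error(y_true, y_pred)
    return {
        "MAE": mean_absolute_error(y_true, y_pred),
        "MSE": mse,
        "RMSE": np.sqrt(mse),  # RMSE is the square root of MSE
        "R²": r2_score(y_true, y_pred),
    }


# Toy values for illustration only
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.0, 2.0, 3.0, 6.0])

metrics = regression_metrics(y_true, y_pred)
comparison = pd.DataFrame({"Toy Set": metrics}).round(4)
print(comparison)
```

Applied to the notebook, the function would be called once with `validation_labels` and `validation_predictions` and once with `test_labels` and `test_predictions`, and the two results passed to `pd.DataFrame` as before.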