# Housing prices in Hyderabad, India

## Project Objective 🎯

The objective of this project is to develop a regression model to predict housing prices in Hyderabad, India. Using features such as the property's area, location, number of bedrooms, and available amenities, the model will aim to estimate the market value of a property as accurately as possible.

This predictive model will be a valuable tool for:

- Home Buyers and Sellers: To obtain an objective price estimate for a property.
- Real Estate Agents: To assist with property valuation and client advisory.
- Investors: To identify potentially undervalued or overvalued properties in the market.

## 5. Evaluate the models

### 5.1 Loading the dataset

In [16]:
import pandas as pd

import sys
sys.path.append('../../src/utils')

# Utilities
from regresion_metrics_column_definition import Metric


metrics = '../../datasets/processed/housing_prices/hyderabad_house_price_metrics.csv'

metrics_dataset = pd.read_csv(metrics)

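# Split by whether the pipeline includes a PCA step. Take explicit copies so
# that adding a score column later does not raise SettingWithCopyWarning.
has_pca = metrics_dataset['model_name'].str.contains('PCA')
models_without_pca = metrics_dataset[~has_pca].copy()
models_with_pca = metrics_dataset[has_pca].copy()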
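
The `Metric` helper is imported from a utility module that is not shown here. The evaluation cells index the same DataFrame with both `Metric.MEAN_TEST_NEG_RMSE.name` and the literal `'mean_test_neg_rmse'`, so `.name` presumably resolves to the CSV column name. The following is only a hypothetical sketch of how such a helper could be laid out; the `Column` class and the `pca_components` and `score` column names are assumptions, not the project's actual definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    """Hypothetical column descriptor; the notebook only reads `name`."""
    name: str


class Metric:
    # Assumed mapping from attribute to CSV column name.
    MEAN_TEST_NEG_RMSE = Column("mean_test_neg_rmse")
    MEAN_TEST_R2 = Column("mean_test_r2")
    PCA_COMPONENTS = Column("pca_components")  # column name is a guess
    SCORE = Column("score")                    # column name is a guess


# Each attribute exposes the column name used to index the metrics DataFrame.
print(Metric.MEAN_TEST_NEG_RMSE.name)
```

Keeping the column names in one place means a renamed CSV column only has to be updated once, rather than in every cell.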


### 5.2 Evaluate Models without PCA
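
The next cell ranks the models without PCA by a composite cost. As a reading of that code (the indices $m$ and $j$ over the models in the group are notation introduced here), each metric is min-max normalized so that the best value in the group costs 0 and the worst costs 1:

$$
c_{\mathrm{RMSE}}(m) = \frac{\mathrm{RMSE}_m - \min_j \mathrm{RMSE}_j}{\max_j \mathrm{RMSE}_j - \min_j \mathrm{RMSE}_j},
\qquad
c_{R^2}(m) = \frac{\max_j R^2_j - R^2_m}{\max_j R^2_j - \min_j R^2_j}
$$

$$
\mathrm{score}(m) = w_{\mathrm{RMSE}}\, c_{\mathrm{RMSE}}(m) + w_{R^2}\, c_{R^2}(m)
$$

With both weights set to 1.0 the score lies in $[0, 2]$, and a model that is best on both metrics scores exactly 0. Because the normalization is relative to the group, scores are only comparable among models that were scored together.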

In [17]:
# Weights for the RMSE and R2 cost terms
w_rmse = 1.0
w_r2 = 1.0

# Min-max normalize RMSE (stored negated): the lowest RMSE gets cost 0
rmse = -models_without_pca[Metric.MEAN_TEST_NEG_RMSE.name]
delta_rmse = rmse.max() - rmse.min()
cost_rmse = (rmse - rmse.min()) / delta_rmse if delta_rmse > 0 else 0

# Min-max normalize R2, inverted: the highest R2 gets cost 0
r2 = models_without_pca[Metric.MEAN_TEST_R2.name]
delta_r2 = r2.max() - r2.min()
cost_r2 = (r2.max() - r2) / delta_r2 if delta_r2 > 0 else 0

# Getting the custom score
models_without_pca[Metric.SCORE.name] = (w_rmse * cost_rmse) + (w_r2 * cost_r2)

# Finding the best model
best_model_idx = models_without_pca[Metric.SCORE.name].idxmin()
best_model_info = models_without_pca.loc[best_model_idx]

# --- Show Results ---
print("--- Best found Model without PCA ---")
print(f"Index: {best_model_idx}")
print(f"Model Name: {best_model_info['model_name']}")
print(f"Score: {best_model_info[Metric.SCORE.name]:.8f} (smaller is better)")
print("\nModel Metrics:")
print(f"  -> RMSE: {-best_model_info['mean_test_neg_rmse']:.4f}")
print(f"  -> R2 Score: {best_model_info['mean_test_r2']:.4f}")
print("\nHyperparameters:")
print(best_model_info['params'])

--- Best found Model without PCA ---
Index: 212
Model Name: L2
Score: 0.00000098 (smaller is better)

Model Metrics:
  -> RMSE: 0.2313
  -> R2 Score: 0.8660

Hyperparameters:
{'regresion__alpha': np.float64(0.004520353656360241)}




### 5.3 Evaluate Models with PCA
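
For the PCA pipelines, the cell below adds a third term that penalizes the number of retained components, so that among models of similar accuracy the one with fewer components is preferred. Writing $k_m$ for the component count of model $m$ (notation introduced here), the code amounts to:

$$
c_k(m) = \frac{k_m - \min_j k_j}{\max_j k_j - \min_j k_j},
\qquad
\mathrm{score}(m) = w_{\mathrm{RMSE}}\, c_{\mathrm{RMSE}}(m) + w_{R^2}\, c_{R^2}(m) + w_{\mathrm{PCA}}\, c_k(m)
$$

Here $c_{\mathrm{RMSE}}$ and $c_{R^2}$ are the min-max normalized RMSE and $R^2$ costs (0 for the best model in the group, 1 for the worst). With $w_{\mathrm{PCA}} = 0.5$ and the other weights at 1.0, the score ranges over $[0, 2.5]$.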

In [18]:
# Weights for the RMSE, R2, and PCA component-count cost terms
w_rmse = 1.0
w_r2 = 1.0
w_pca = 0.5

# Min-max normalize RMSE (stored negated): the lowest RMSE gets cost 0
rmse = -models_with_pca[Metric.MEAN_TEST_NEG_RMSE.name]
delta_rmse = rmse.max() - rmse.min()
cost_rmse = (rmse - rmse.min()) / delta_rmse if delta_rmse > 0 else 0

# Min-max normalize R2, inverted: the highest R2 gets cost 0
r2 = models_with_pca[Metric.MEAN_TEST_R2.name]
delta_r2 = r2.max() - r2.min()
cost_r2 = (r2.max() - r2) / delta_r2 if delta_r2 > 0 else 0

# Min-max normalize the component count: the fewest components get cost 0
components = models_with_pca[Metric.PCA_COMPONENTS.name]
delta_components = components.max() - components.min()
cost_components = (components - components.min()) / delta_components if delta_components > 0 else 0

# Getting the custom score
models_with_pca[Metric.SCORE.name] = (w_rmse * cost_rmse) + (w_r2 * cost_r2) + (w_pca * cost_components)

# Finding the best model
best_model_idx = models_with_pca[Metric.SCORE.name].idxmin()
best_model_info = models_with_pca.loc[best_model_idx]

# --- Show Results ---
print("--- Best found Model with PCA ---")
print(f"Index: {best_model_idx}")
print(f"Model Name: {best_model_info['model_name']}")
print(f"Score: {best_model_info[Metric.SCORE.name]:.8f} (smaller is better)")
print("\nModel Metrics:")
print(f"  -> RMSE: {-best_model_info['mean_test_neg_rmse']:.4f}")
print(f"  -> R2 Score: {best_model_info['mean_test_r2']:.4f}")
print("\nHyperparameters:")
print(best_model_info['params'])

--- Best found Model with PCA ---
Index: 1527
Model Name: PCA+L1
Score: 0.30556237 (smaller is better)

Model Metrics:
  -> RMSE: 0.2595
  -> R2 Score: 0.8325

Hyperparameters:
{'pca__n_components': 35, 'regresion__alpha': np.float64(0.00014873521072935117)}


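
The two evaluation cells repeat the same normalization logic, so it could be factored into one helper. The following is a minimal sketch under stated assumptions, not the project's code: the function names `min_max_cost` and `add_score` are hypothetical, the `pca_components` and `score` column names are guesses, and the toy DataFrame is invented purely for illustration.

```python
import pandas as pd


def min_max_cost(values: pd.Series, lower_is_better: bool = True) -> pd.Series:
    """Scale a metric to [0, 1], where 0 marks the best model in the group."""
    span = values.max() - values.min()
    if span == 0:
        # All models tie on this metric, so it contributes no cost.
        return pd.Series(0.0, index=values.index)
    if lower_is_better:
        return (values - values.min()) / span
    return (values.max() - values) / span


def add_score(df: pd.DataFrame, w_rmse: float = 1.0, w_r2: float = 1.0,
              w_pca: float = 0.0) -> pd.DataFrame:
    """Return a copy of df with a 'score' column (smaller is better)."""
    score = (
        w_rmse * min_max_cost(-df["mean_test_neg_rmse"])
        + w_r2 * min_max_cost(df["mean_test_r2"], lower_is_better=False)
    )
    if w_pca:
        # Assumed column name for the number of PCA components.
        score = score + w_pca * min_max_cost(df["pca_components"])
    # assign() builds a new DataFrame, so the input is never mutated.
    return df.assign(score=score)


# Toy data for illustration only; these are not results from the project.
toy = pd.DataFrame({
    "model_name": ["A", "B", "C"],
    "mean_test_neg_rmse": [-0.30, -0.20, -0.25],
    "mean_test_r2": [0.80, 0.90, 0.85],
    "pca_components": [10, 30, 20],
})
scored = add_score(toy, w_pca=0.5)
best = scored.loc[scored["score"].idxmin()]
print(best["model_name"], round(best["score"], 4))
```

Calling `add_score(models_without_pca)` and `add_score(models_with_pca, w_pca=0.5)` would then mirror the two cells above. Because `assign` returns a new DataFrame instead of writing into a slice, this pattern also sidesteps pandas' `SettingWithCopyWarning`.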
