Predicting house prices in Bengaluru using machine learning.
This project predicts house prices in Bengaluru, India, from features such as location, size, number of bathrooms, and number of balconies. It trains a regression model to estimate prices for residential properties.
The dataset is loaded from 'Bengaluru_House_Data.csv'. It contains information about properties in Bengaluru, including their attributes and corresponding prices.
- Handling Missing Values: Rows with missing values in the 'location', 'size', 'bath', and 'balcony' columns are removed.
- Handling 'total_sqft' column: The 'total_sqft' column is converted to numeric, handling different formats.
- Encoding Categorical Variables: Location data is label-encoded for model compatibility.
- Feature Selection: The features 'location_encoded', 'total_sqft', 'bath', and 'balcony' are selected as model inputs.
- Data Splitting: The dataset is split into training and testing sets for model evaluation.
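
The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual script. Several details are assumptions: the target column is taken to be named 'price', range entries such as '1133 - 1384' are replaced by their midpoint, entries that still cannot be parsed are dropped, and the 80/20 split with a fixed seed is arbitrary. The small in-memory table is made up and only stands in for 'Bengaluru_House_Data.csv'.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder


def parse_sqft(value):
    """Convert a 'total_sqft' entry to float.

    Assumption: ranges like '1133 - 1384' become their midpoint,
    and anything else that is not a plain number becomes NaN.
    """
    text = str(value).strip()
    if " - " in text:
        low, high = text.split(" - ", 1)
        try:
            return (float(low) + float(high)) / 2
        except ValueError:
            return np.nan
    try:
        return float(text)
    except ValueError:
        return np.nan


def preprocess(df):
    """Drop incomplete rows, clean 'total_sqft', encode location, select features."""
    df = df.dropna(subset=["location", "size", "bath", "balcony"]).copy()
    df["total_sqft"] = df["total_sqft"].apply(parse_sqft)
    df = df.dropna(subset=["total_sqft"]).copy()
    df["location_encoded"] = LabelEncoder().fit_transform(df["location"])
    X = df[["location_encoded", "total_sqft", "bath", "balcony"]]
    y = df["price"]  # assumed name of the target column
    return X, y


# Made-up rows standing in for the real CSV.
sample = pd.DataFrame({
    "location": ["Whitefield", "Hebbal", None, "Whitefield", "Yelahanka", "Hebbal"],
    "size": ["2 BHK", "3 BHK", "2 BHK", "4 Bedroom", "2 BHK", "3 BHK"],
    "total_sqft": ["1056", "1133 - 1384", "1200", "2600", "34.46Sq. Meter", "1521"],
    "bath": [2.0, 3.0, 2.0, 5.0, 2.0, 3.0],
    "balcony": [1.0, 2.0, 1.0, 3.0, 1.0, np.nan],
    "price": [39.07, 62.0, 51.0, 120.0, 18.5, 95.0],
})

X, y = preprocess(sample)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```

With the real data, `sample` would be replaced by `pd.read_csv("Bengaluru_House_Data.csv")`.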
- Polynomial Features: Polynomial features of degree 2 are added to capture complex relationships.
- Ridge Regression: A Ridge Regression model (linear regression with L2 regularization) is trained to predict house prices.
- Pipeline: A Scikit-Learn pipeline is used to streamline data preprocessing and model training.
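
A pipeline of this shape might look like the sketch below. The step name "ridge" is chosen to match the 'ridge__alpha' key reported in the results. The step name "poly", the default alpha, and the synthetic data are assumptions made only so the example runs on its own. Whether the original pipeline includes other steps (for example, scaling) is not stated.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures


def build_model(alpha=1.0):
    """Degree-2 polynomial expansion followed by Ridge regression."""
    return Pipeline([
        ("poly", PolynomialFeatures(degree=2)),
        ("ridge", Ridge(alpha=alpha)),
    ])


# Synthetic stand-in for the four selected features and the price target.
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 5, size=(200, 4))
y_demo = (
    3.0 * X_demo[:, 1]
    + 0.5 * X_demo[:, 1] * X_demo[:, 2]  # interaction term a degree-2 expansion can capture
    + rng.normal(0, 0.1, 200)
)

model = build_model()
model.fit(X_demo, y_demo)

# Four inputs expand to 15 terms: bias, 4 linear, 4 squared, 6 pairwise products.
n_terms = model.named_steps["poly"].n_output_features_
```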
Evaluating the trained model gives the following results:
- Mean Absolute Error (MAE): 42.83188201062303
- Mean Squared Error (MSE): 9688.112980255195
- Root Mean Squared Error (RMSE): 98.42821231870055
- R-squared (R2): 0.48247268430488044
- Cross-Validation RMSE: 97.99593594870805
- Best Hyperparameters: {'ridge__alpha': 0.001}
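
As a rough guide to how such figures are obtained, the helper below computes MAE, MSE, RMSE, and R2 with scikit-learn. The function name `regression_metrics` and the three-point example are hypothetical. The final lines only check that the reported RMSE is the square root of the reported MSE.

```python
import math

from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def regression_metrics(y_true, y_pred):
    """Return the four regression metrics listed above as a dict."""
    mse = mean_squared_error(y_true, y_pred)
    return {
        "mae": mean_absolute_error(y_true, y_pred),
        "mse": mse,
        "rmse": math.sqrt(mse),  # RMSE is the square root of MSE
        "r2": r2_score(y_true, y_pred),
    }


# Made-up values, not taken from the dataset.
demo = regression_metrics([50.0, 100.0, 150.0], [60.0, 90.0, 150.0])

# Consistency check on the figures reported in this README.
reported_mse = 9688.112980255195
reported_rmse = 98.42821231870055
rmse_matches = math.isclose(math.sqrt(reported_mse), reported_rmse, rel_tol=1e-9)
```

An R2 of about 0.48 means the model explains roughly half of the variance in price on the test set.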
Cross-validation is performed to assess model performance across multiple folds. The root mean squared error (RMSE) is used as the evaluation metric.
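
One way to compute a cross-validated RMSE with scikit-learn is sketched below. The number of folds (5 here), the synthetic data, and averaging the per-fold RMSE values are assumptions, since the README does not say how the reported figure was aggregated.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic stand-in data with four features.
rng = np.random.default_rng(42)
X_cv = rng.uniform(0, 5, size=(300, 4))
y_cv = 20 * X_cv[:, 1] + 5 * X_cv[:, 2] ** 2 + rng.normal(0, 1.0, 300)

pipeline = Pipeline([
    ("poly", PolynomialFeatures(degree=2)),
    ("ridge", Ridge(alpha=1.0)),
])

# scikit-learn maximizes scores, so RMSE is exposed as a negated value.
neg_rmse = cross_val_score(
    pipeline, X_cv, y_cv, cv=5, scoring="neg_root_mean_squared_error"
)
fold_rmse = -neg_rmse
cv_rmse = fold_rmse.mean()
```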
RandomizedSearchCV is used to tune the hyperparameters of the Ridge Regression model, here the regularization strength alpha.
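
A randomized search over alpha could be set up along these lines. The candidate values, `n_iter`, fold count, and synthetic data are all illustrative assumptions. Only the parameter name 'ridge__alpha' comes from the reported results.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic stand-in data with four features.
rng = np.random.default_rng(7)
X_hp = rng.uniform(0, 5, size=(300, 4))
y_hp = 20 * X_hp[:, 1] + 5 * X_hp[:, 2] ** 2 + rng.normal(0, 1.0, 300)

pipeline = Pipeline([
    ("poly", PolynomialFeatures(degree=2)),
    ("ridge", Ridge()),
])

# Hypothetical candidate values; "<step>__<param>" targets a pipeline step.
alpha_candidates = [0.001, 0.01, 0.1, 1.0, 10.0, 100.0]

search = RandomizedSearchCV(
    pipeline,
    param_distributions={"ridge__alpha": alpha_candidates},
    n_iter=4,  # try 4 of the 6 candidates, sampled at random
    cv=5,
    scoring="neg_root_mean_squared_error",
    random_state=42,
)
search.fit(X_hp, y_hp)

best_alpha = search.best_params_["ridge__alpha"]
best_cv_rmse = -search.best_score_
```

After fitting, `search.best_estimator_` holds the pipeline refit on all the data passed to `fit` with the best alpha found.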
- Clone this repository.
- Ensure you have the required libraries installed (pip install -r requirements.txt).
- Run the Python script to predict house prices.
For any questions, feedback, or clarifications, please feel free to reach out via GitHub or LinkedIn.
This project is licensed under the MIT License - see the LICENSE file for details.