Skip to content

Predict home prices using regression, optimizing for R2 and RMSE metrics to create a robust model with minimal prediction deviation.

Notifications You must be signed in to change notification settings

Cintia0528/Data_Science-Supervised_Machine_Learning_Regression

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

9 Commits
 
 
 
 

Repository files navigation

Predicting Home Prices with Regression

Goal:

The project aims to predict the prices of homes using regression techniques, evaluating the model's performance based on R2 and RMSE metrics. The goal is to create a reliable model that minimizes the deviation between predicted and actual home prices.

Overview:

In this machine learning journey, I explored regression to predict home prices, moving beyond a binary classification whether a home is expensive or not. The focus shifted to evaluating the model's ability to approximate actual home values. I experimented with different models, hyperparameter tuning, feature selection, and preprocessing techniques to enhance predictive accuracy.

Approach:

  1. Baseline Model: Implemented a DecisionTreeRegressor with RandomSearch hyperparameter tuning as a baseline, achieving an initial R2 of 0.79.
  2. Troubleshooting: Explored options for improvement, including revisiting preprocessing choices and experimenting with more robust scalers, imputers, and models.
  3. Feature Importances: Utilized LazyPredict to identify top-performing models and aggregated their feature importances. Explored the impact of removing features with less than 1% predictive power.
  4. Data Exploration: Analyzed missing values, data types, scale, skewness, and correlation of features. Explored the significance of remaining features.
  5. Transforming, Scaling, and Imputing: Implemented a LogTransformer, MinMaxScaler, and KNN imputer to enhance the model's ability to handle diverse numerical features.
  6. Final Model: Employed a VotingRegressor on the top predictive features identified by the ensemble of models, achieving a final R2 of close to 0.95.

Deliverables:

The project includes a Google Colab notebook containing the code for the regression analysis here. The findings and methodology are further explained in a detailed Medium article here.

Skills and Tools:

  • Regression modeling
  • Hyperparameter tuning
  • Feature Engineering
  • Data Exploration
  • Preprocessing
  • Python
  • scikit-learn
  • XGBoost
  • LightGBM
  • LazyPredict

Further Analysis:

To enhance the model's performance:

  • Feature Engineering: Continue exploring feature engineering techniques to create new meaningful features.
  • Ensemble Techniques: Experiment with different ensemble models or stacking approaches to leverage the strengths of multiple models.
  • Advanced Hyperparameter Tuning: Explore more advanced hyperparameter tuning techniques to fine-tune model parameters for better performance.
  • External Data: Consider incorporating external datasets to provide additional information and improve predictive accuracy.

By focusing on these areas, the model's R2 score and RMSE could potentially be further improved, providing more accurate predictions of home prices.

About

Predict home prices using regression, optimizing for R2 and RMSE metrics to create a robust model with minimal prediction deviation.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published