
JinalShah2002/House-Prices-Challenge-Solution


Predicting Housing Prices


What is this repository?

As an aspiring ML engineer, I challenged myself to put my machine learning skills to the test by tackling the Housing Prices Challenge on Kaggle. The goal of this challenge is to predict the prices of houses in Ames, Iowa from a given set of 79 features. The project lets the engineer (in this case, me) practice critical Data Science & Machine Learning techniques.

This repository is organized into self-explanatory folders.

The model is evaluated using the Root Mean Square Error (RMSE), which is the metric we are trying to minimize. My best model has an RMSE of 0.13757, which currently ranks in the top 43%. In reality, my solution would rank higher, for two reasons:

  • Some solutions have an infeasible RMSE of 0.0. No Machine Learning model can predict with such accuracy. I suspect cheating occurred here.
  • Some solutions have an RMSE of 0.00044. After inspecting these solutions, I found them to be invalid: the competitors simply submit the answers to a similar challenge (Boston Housing Prices). Once again, I believe this is cheating, since no real Machine Learning methodology is being applied.
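For readers unfamiliar with the metric, here is a minimal sketch of how an RMSE score can be computed. It assumes the error is taken on the natural logarithm of the prices, which is my understanding of how this Kaggle competition scores submissions and would explain why scores sit near 0.1 rather than in the thousands of dollars. The function names and sample prices are made up for illustration and are not taken from this repository.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one observation is required")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def rmse_log(y_true, y_pred):
    """RMSE computed on log prices (assumed competition convention)."""
    log_true = [math.log(t) for t in y_true]
    log_pred = [math.log(p) for p in y_pred]
    return rmse(log_true, log_pred)


# Hypothetical sale prices, for illustration only.
actual = [200000.0, 150000.0, 320000.0]
predicted = [210000.0, 140000.0, 300000.0]
print(round(rmse_log(actual, predicted), 5))
```

Working on the log scale means that a 10% miss on a cheap house costs about as much as a 10% miss on an expensive one.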

Final Model: My best model is a tuned CatBoost Model.

Note: you may use my solution as a reference; however, I would strongly advise you to tackle this challenge on your own. The only way you will get better at machine learning is to practice it on your own. I do not condone nor am I responsible for any cheating that may occur as a result of this repository.

Machine Learning Project Checklist:

This checklist is what I use for every ML project. This goes through every major step & ensures that I have done everything correctly.

  1. Framing the Problem - Complete
  2. Getting the Data - Complete
  3. Exploring the Data - Complete
  4. Data Preprocessing - Complete
  5. Model Development - Complete
  6. Model Tuning/Ensemble Learning - Complete
  7. Deploying Model on Test Set & Presentation of Solution - Complete

What tools are used in this project?

References

Future Adjustments

In reality, there are countless adjustments I could make to improve my score; however, here are a couple of promising ones:

  • Ensembling: Combine the tuned CatBoost model with some other models (Linear Regression & Support Vector Machines seem promising).
  • Feature Engineering: Reduce the number of categories for certain features.
  • Feature Importance: Do further feature selection, using my model's feature importances to guide the choices.
  • Outside Data: Incorporate outside data, as many credible top-ranked solutions do.
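Two of the ideas above can be sketched in a few lines of plain Python. This is only an illustration under my own assumptions, not code from this repository: `collapse_rare_categories` merges infrequent category values into a single label, and `blend_predictions` takes a weighted average of predictions from several models. The function names, the `min_count` threshold, the `"Other"` label, and the sample values are all hypothetical.

```python
from collections import Counter


def collapse_rare_categories(values, min_count=10, other_label="Other"):
    """Replace categories seen fewer than min_count times with other_label."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other_label for v in values]


def blend_predictions(predictions_by_model, weights):
    """Weighted average of per-model predictions, row by row."""
    names = list(predictions_by_model)
    if not names:
        raise ValueError("at least one model is required")
    if set(names) != set(weights):
        raise ValueError("weights must cover exactly the same models")
    total_weight = sum(weights[name] for name in names)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    n_rows = len(predictions_by_model[names[0]])
    if any(len(predictions_by_model[name]) != n_rows for name in names):
        raise ValueError("all models must predict the same number of rows")
    return [
        sum(weights[name] * predictions_by_model[name][i] for name in names)
        / total_weight
        for i in range(n_rows)
    ]


# Hypothetical category values and predictions, for illustration only.
neighborhoods = ["NAmes", "NAmes", "NAmes", "Blueste", "CollgCr", "CollgCr"]
print(collapse_rare_categories(neighborhoods, min_count=2))

blended = blend_predictions(
    {"catboost": [12.1, 11.8], "linear": [12.3, 11.6]},
    {"catboost": 3.0, "linear": 1.0},
)
print(blended)
```

In practice, the blend weights and the rarity threshold would be chosen by cross-validation rather than fixed by hand.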

Closing Remarks

This project was very enjoyable, and I learned a lot along the way! I would recommend this challenge to anyone looking to dive into Machine Learning & Data Science: it is quite simple, and the dataset is relatively small & not overwhelming.

About the author

I am an undergraduate student at Rutgers University New Brunswick, pursuing bachelor's degrees in Computer Science and Cognitive Science as well as a certificate in Data Science. I have a passion for AI, and I am always intrigued by its power. Feel free to contact me via LinkedIn.
Enjoy!
Jinal Shah