This project uses machine learning algorithms to predict house prices from a range of characteristics,
from the house's square footage and number of bathrooms to the grade assigned to it by the relevant
authorities. It was originally completed as the final project for my course 'Data Analysis with
Python', offered online by IBM; however, I expanded it to showcase more of the skills and techniques learnt
through the course and to broaden its scope.
To build a machine learning model that can predict house prices, the house attributes most strongly associated
with price are first identified, then prepared and preprocessed, and finally used to train the model. More specifically,
several models are developed, trained, evaluated, and improved before selecting the one that best accounts for the
data and is therefore expected to produce the most valid and reliable price predictions.
Each model is evaluated in three ways: in-sample metrics, which measure how well the model fits the data it was
trained on; out-of-sample evaluation, which estimates how the model is likely to perform in the real world on data
it has not seen; and visualizations comparing the distribution of predicted prices with that of the actual prices in
the dataset. Finally, the best model is selected and used to generate predictions.
Overall, the project is broken down into five parts:
1) Loading, Inspecting, and Cleaning the Data
2) Data Preparation and Preprocessing
3) Model Development and Evaluation
4) Hyperparameter Tuning
5) Model Prediction
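Hyperparameter tuning usually means trying a grid of candidate values and keeping the one with the best cross-validated score. As a hedged illustration, the sketch below assumes the hyperparameter is a ridge penalty `alpha` on a one-feature linear model, uses synthetic data and 5-fold cross-validation, and sticks to the standard library; every helper name is hypothetical. scikit-learn's `GridSearchCV` automates the same loop for arbitrary estimators and parameter grids.

```python
import random
from statistics import fmean


def fit_ridge_1d(xs, ys, alpha):
    """Closed-form ridge fit of y = b0 + b1*x; only the slope b1 is penalised."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / (sxx + alpha)  # alpha = 0 gives ordinary least squares
    return y_bar - slope * x_bar, slope


def mse(xs, ys, intercept, slope):
    """Mean squared error of the fitted line on (xs, ys)."""
    return fmean([(y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)])


def k_fold_cv_mse(xs, ys, alpha, k=5, seed=0):
    """Average validation MSE over k folds for one value of alpha."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)  # same shuffle for every alpha
    scores = []
    for f in range(k):
        val = idx[f::k]  # every k-th shuffled index forms one fold
        held_out = set(val)
        train = [i for i in idx if i not in held_out]
        b0, b1 = fit_ridge_1d([xs[i] for i in train], [ys[i] for i in train], alpha)
        scores.append(mse([xs[i] for i in val], [ys[i] for i in val], b0, b1))
    return fmean(scores)


def grid_search(xs, ys, alphas, k=5):
    """Return the alpha with the lowest cross-validated MSE, plus all scores."""
    scores = {alpha: k_fold_cv_mse(xs, ys, alpha, k) for alpha in alphas}
    return min(scores, key=scores.get), scores


# Synthetic data: living area in thousands of sq ft, price with Gaussian noise.
rng = random.Random(7)
sqft_k = [rng.uniform(0.6, 4.0) for _ in range(200)]
price = [50_000 + 250_000 * s + rng.gauss(0, 60_000) for s in sqft_k]

alphas = [0.0, 0.1, 1.0, 10.0, 100.0]
best_alpha, cv_scores = grid_search(sqft_k, price, alphas)
for alpha, score in cv_scores.items():
    print(f"alpha={alpha:<6} CV MSE={score:,.0f}")
print("best alpha:", best_alpha)
```

Because these synthetic prices come from a straight line, a small penalty will usually win here; with many correlated features, as in the real dataset, a larger `alpha` can improve out-of-sample error.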
The dataset used here was taken from Kaggle.com, a popular website for finding and publishing datasets.
It covers house sales in Seattle-King County made between May 2014 and May 2015, and it includes a range of
house characteristics along with the corresponding sale price of each house.
You can view each column and its description in the table below:
Variable | Description |
---|---|
id | Unique ID for each house sold |
date | Date of the house sale |
price | Price of each house sold |
bedrooms | Number of bedrooms |
bathrooms | Number of bathrooms |
sqft_living | Square footage of the house interior living space |
sqft_lot | Square footage of the lot (land space) |
floors | Number of house floors |
waterfront | Whether the house overlooks a waterfront (1) or not (0) |
view | Rating of how good the house view is |
condition | Rating of the overall house condition |
grade | Overall grade given to the housing unit, based on King County grading system |
sqft_above | Square footage of the interior housing space that is above ground level |
sqft_basement | Square footage of the interior housing space that is below ground level |
yr_built | Year the house was built |
yr_renovated | Year the house was last renovated |
zipcode | Zip code |
lat | Latitude coordinate |
long | Longitude coordinate |
sqft_living15 | Square footage of the interior housing living space for the closest 15 houses |
sqft_lot15 | Square footage of the lot (land space) for the closest 15 houses |
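Step 1 of the outline (loading, inspecting, and cleaning the data) can be sketched with the standard library's `csv` module. This is an illustration under stated assumptions, not the notebook's code: it assumes a CSV whose header uses the column names in the table above, keeps `zipcode` as a text label, and simply skips rows with missing or non-numeric values. `clean_rows` and `NUMERIC_COLUMNS` are hypothetical names; with pandas, `read_csv` followed by `dropna` is the more usual route.

```python
import csv

# Columns from the table above that hold numbers; zipcode is kept as a label
# because it identifies an area rather than measuring a quantity.
NUMERIC_COLUMNS = [
    "price", "bedrooms", "bathrooms", "sqft_living", "sqft_lot", "floors",
    "waterfront", "view", "condition", "grade", "sqft_above", "sqft_basement",
    "yr_built", "yr_renovated", "lat", "long", "sqft_living15", "sqft_lot15",
]


def clean_rows(lines):
    """Parse CSV lines, keeping numeric columns plus date and zipcode (id is dropped).

    Returns (rows, dropped), where dropped counts rows skipped for bad values.
    """
    reader = csv.DictReader(lines)
    missing = set(NUMERIC_COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    rows, dropped = [], 0
    for record in reader:
        try:
            row = {col: float(record[col]) for col in NUMERIC_COLUMNS}
        except (TypeError, ValueError):  # empty cell, text, or short row (None)
            dropped += 1
            continue
        row["date"] = record.get("date")
        row["zipcode"] = record.get("zipcode")
        rows.append(row)
    return rows, dropped


# Example usage (the file name is a hypothetical local path):
# with open("kc_house_data.csv", newline="") as f:
#     houses, dropped = clean_rows(f)
```

Reporting how many rows were dropped, rather than discarding them silently, makes it easy to check that cleaning did not remove a large share of the data.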
Here's a screenshot with a sample of the dataset:
The aim of this project is to demonstrate my ability to build, evaluate, and deploy machine learning models
for tasks such as predictive pricing.
For quick access to the program, click on either of the links below. The first renders the project for viewing
only, whilst the second lets you both view the code and interact with it. Both links open a Jupyter notebook
containing the code and its output, organized into separate blocks, each accompanied by in-depth explanations or
conclusions. If you wish to execute the code or reproduce the analysis results from the second link, make sure to
run the first two cells, which install and import all the Python packages used across the project. To run any
given block of code, simply select the cell and click the 'Run' icon on the notebook toolbar.
To view the project only, click on the following link:
https://nbviewer.org/github/Mo-Khalifa96/Machine-Learning-for-Predictive-Pricing/blob/main/ML%20for%20Predictive%20Pricing%20%28Predictive%20House%20Prices%29%20-%20Jupyter%20version.ipynb
Alternatively, to view the project and interact with its code, click on the following link:
https://mybinder.org/v2/gh/Mo-Khalifa96/Machine-Learning-for-Predictive-Pricing/main?labpath=ML%20for%20Predictive%20Pricing%20(Predictive%20House%20Prices)%20-%20Jupyter%20version.ipynb