House Price Prediction Model

Built and optimized a regression model that predicts housing prices from historical California (CA) housing data.

1 | Background

This is the dataset used in the second chapter of Aurélien Géron's recent book 'Hands-On Machine Learning with Scikit-Learn and TensorFlow'. It serves as an excellent introduction to implementing machine learning algorithms because it requires only rudimentary data cleaning, has an easily understandable list of variables, and sits at a convenient size between being too toyish and too cumbersome.

The data contains information from the 1990 California census. Although it may not help you predict current housing prices the way the Zillow Zestimate dataset does, it is an accessible introductory dataset for teaching the basics of machine learning.

2 | Project Overview

Many factors affect the value of a house (e.g., location, size, condition, and age), and the sheer number of them makes predicting property prices challenging. In addition, the limited availability of data means that feature engineering is required to extract meaningful signal from what is available. As a result, it is generally difficult to build a model that accounts for every factor influencing price.

This project consists of 3 key parts:

Identification of outliers in the dataset
Feature engineering: creating and modifying features so that the feature matrix fed to the model carries the information most relevant to predicting the target variable
Developing the model itself to predict median_house_value, our target variable
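
As an illustration of the first part, the sketch below flags outliers with the interquartile range (IQR) rule. This is one common choice and an assumption here, not necessarily the method used in the notebook; the helper name flag_iqr_outliers, the multiplier k, and the sample values are all hypothetical.

```python
import pandas as pd


def flag_iqr_outliers(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Return a boolean mask that is True where a value lies outside the IQR fences."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return (values < lower) | (values > upper)


# Toy stand-in for a numeric column such as median_income (values are made up).
incomes = pd.Series([2.5, 3.1, 3.4, 3.8, 4.0, 4.2, 4.6, 15.0], name="median_income")
outlier_mask = flag_iqr_outliers(incomes)

# Rows flagged as outliers can then be inspected or dropped.
print(incomes[outlier_mask])
```

A larger k makes the rule more permissive; whether flagged rows should be removed or kept depends on the column and the model.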

3 | Exploring the Dataset

The columns are as follows; their names are largely self-explanatory:

longitude
latitude
housing_median_age
total_rooms
total_bedrooms
population
households
median_income
median_house_value
ocean_proximity

Each row pertains to a group of houses (a census block), so its values are aggregates over houses in close proximity: medians for age, income, and house value, and totals for rooms, bedrooms, population, and households.

A number of features have some missing data, but the affected instances are few (257 of 20,640); to save time, we simply dropped those rows, since the missing data is small compared to the total.
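
Dropping the incomplete rows might look like the following minimal sketch. The DataFrame here is a four-row toy stand-in with made-up values, and the variable names are hypothetical; the real table would be loaded from the files in data_raw.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the housing table (values are made up for illustration).
housing = pd.DataFrame({
    "total_rooms": [880.0, 7099.0, 1467.0, 1274.0],
    "total_bedrooms": [129.0, np.nan, 190.0, 235.0],
    "median_house_value": [452600.0, 358500.0, 352100.0, 341300.0],
})

# Count missing entries per column before deciding how to handle them.
missing_per_column = housing.isna().sum()
print(missing_per_column)

# Drop every row that has at least one missing value.
housing_clean = housing.dropna().reset_index(drop=True)
```

Dropping rows is the simplest option; imputing the missing values (for example with the column median) would be an alternative when the missing share is larger.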

Observations & Patterns

Some housing blocks contain notably old homes (a high housing_median_age).
We will need to split ocean_proximity into binary columns.
We should standardize the scale of the data for any non-tree-based methods, since some variables range from 0 to 10 while others go up to 500,000.
The median_house_value feature has a peak at its maximum value (around 500k), which could indicate outliers. This is probably due to "posh" high-end houses on the coast or on various islands.
population, total_bedrooms, and total_rooms describe related attributes, as their similar distributions in histogram plots suggest.
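
Two of the observations above, splitting ocean_proximity into binary columns and standardizing feature scales, can be sketched as follows. The toy values and the choice of feature columns are assumptions for illustration; the z-score is computed by hand with pandas and uses the population standard deviation (ddof=0), which is the same convention as scikit-learn's StandardScaler.

```python
import pandas as pd

# Toy stand-in for the housing table (values are made up for illustration).
housing = pd.DataFrame({
    "median_income": [8.3, 7.2, 3.8, 2.1],
    "population": [322.0, 2401.0, 496.0, 1200.0],
    "median_house_value": [452600.0, 358500.0, 226700.0, 140000.0],
    "ocean_proximity": ["NEAR BAY", "<1H OCEAN", "INLAND", "INLAND"],
})

# One binary (0/1) column per ocean_proximity category.
encoded = pd.get_dummies(housing, columns=["ocean_proximity"], dtype=int)

# Standardize the numeric input features to zero mean and unit variance.
# The target column is left on its original scale.
feature_cols = ["median_income", "population"]
scaled = encoded.copy()
means = encoded[feature_cols].mean()
stds = encoded[feature_cols].std(ddof=0)
scaled[feature_cols] = (encoded[feature_cols] - means) / stds

print(scaled.head())
```

In a real train/test workflow the means and standard deviations would be computed on the training split only and then reused on the test split.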

4 | Conclusions

As expected, the variables most positively correlated with house value are median_income, <1H Ocean, Near Bay, and household_rooms. Median income was the strongest indicator of housing value. Nearness to the ocean was the second strongest, showing that house value increases as one approaches the coast. Number of rooms in the household was the third strongest, which suggests that even a larger inland multi-room house tends to have a lower value than a smaller house near the coast.
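
A ranking like this can be produced by correlating each feature with the target. The sketch below assumes household_rooms is defined as total_rooms divided by households, which is a guess at the engineered feature rather than a confirmed definition, and it runs on made-up toy values, so its output says nothing about the real data.

```python
import pandas as pd

# Toy stand-in for the housing table (values are made up for illustration).
housing = pd.DataFrame({
    "total_rooms": [1000.0, 1500.0, 1200.0, 2000.0],
    "households": [400.0, 500.0, 300.0, 400.0],
    "median_income": [2.0, 3.0, 5.0, 8.0],
    "median_house_value": [100000.0, 150000.0, 250000.0, 400000.0],
})

# Assumed definition of the engineered feature: average rooms per household.
housing["household_rooms"] = housing["total_rooms"] / housing["households"]

# Pearson correlation of every feature with the target, strongest first.
correlations = (
    housing.corr()["median_house_value"]
    .drop("median_house_value")
    .sort_values(ascending=False)
)
print(correlations)
```

Pearson correlation only captures linear relationships, so a low value here does not rule out a feature being useful to a non-linear model.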

5 | Dependencies

A list of dependencies is included in the pyproject.toml file in this repository.

python = "^3.10"
numpy = "^1.25"
pandas = "^2.0"
matplotlib = "^3.8.0"
seaborn = "^0.13.0"
scikit-learn = "^1.3.1"


ahmedlrashed/housing-prediction-model
