DLaury/Housing_Price_Analysis

Housing_Price_Analysis

Our team set out to create a machine learning algorithm that can predict the sale prices of homes in Ames, Iowa by using datasets from Kaggle to analyze similar features of houses. The datasets can be found on our site and here: https://www.kaggle.com/c/house-prices-advanced-regression-techniques.

By applying a machine learning regression algorithm, we were able to train our program to see the effect of individual house features (size, quality, etc.) on the final sale price of the house. To showcase this, we built a website where you can input the parameters of a new house and the algorithm will return its predicted value.

Overview

Below, we check the distribution of the sale price so that we know its properties before conducting regression analysis. From the histogram we can see that our variable is reasonably normally distributed.

[Figure: histogram of sale price]
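
A histogram check like the one described above can be complemented with a numeric skewness measure. The following is a minimal sketch on synthetic data, not the Kaggle dataset; the `skewness` helper and the sample parameters are assumptions made for illustration.

```python
import numpy as np

def skewness(values):
    """Sample skewness: near 0 for a symmetric sample, positive for a long right tail."""
    x = np.asarray(values, dtype=float)
    centered = x - x.mean()
    return float(np.mean(centered ** 3) / np.mean(centered ** 2) ** 1.5)

# Synthetic stand-ins for a price column (hypothetical values, not the real data).
rng = np.random.default_rng(0)
symmetric_prices = rng.normal(loc=180_000, scale=40_000, size=5_000)
skewed_prices = rng.lognormal(mean=12.0, sigma=0.4, size=5_000)

# np.histogram returns the bin counts and edges that a histogram plot would draw.
counts, bin_edges = np.histogram(symmetric_prices, bins=30)

symmetric_skew = skewness(symmetric_prices)
skewed_skew = skewness(skewed_prices)
print(f"skewness of the symmetric sample:    {symmetric_skew:.2f}")
print(f"skewness of the right-skewed sample: {skewed_skew:.2f}")
```

A skewness close to zero is consistent with a roughly normal shape, while a clearly positive value points to a long right tail.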

Each entry in the correlation matrix is a correlation coefficient, which ranges from -1 to +1. This heat map gives us a better understanding of the correlation between the different variables. A perfect correlation (+1 or -1) between a variable and the sale price would mean that the variable alone explains all of the linear movement of the sale price.

[Figure: correlation heat map]
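
As a rough illustration of how such a correlation matrix can be computed, here is a small sketch using pandas. The column names (`living_area`, `garage_cars`, `sale_price`) and the synthetic values are assumptions for the example and do not come from the project's data.

```python
import numpy as np
import pandas as pd

# Synthetic housing-like data (hypothetical columns and coefficients).
rng = np.random.default_rng(1)
n = 500
living_area = rng.normal(1500, 400, n)
garage_cars = rng.integers(0, 4, n)
noise = rng.normal(0, 20_000, n)

df = pd.DataFrame({
    "living_area": living_area,
    "garage_cars": garage_cars,
    "sale_price": 100 * living_area + 15_000 * garage_cars + noise,
})

# DataFrame.corr() computes pairwise Pearson coefficients by default,
# each between -1 and +1, with 1.0 on the diagonal.
corr = df.corr()

# Rank the features by their correlation with the sale price.
price_corr = corr["sale_price"].drop("sale_price").sort_values(ascending=False)
print(corr.round(2))
print(price_corr.round(2))
```

The resulting `corr` frame is what a heat map would visualize, with one colored cell per pair of variables.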

Dealing with missing data: [Figure: missing-data handling]

Home Features Used: [Figure: the home features used]

Principal Component Analysis revealed that the majority of the variance in home prices could be explained with these six features.
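
To make the PCA step more concrete, the sketch below shows how explained variance is typically inspected with scikit-learn. The six synthetic features and the two underlying factors are assumptions for illustration only. Note that `explained_variance_ratio_` describes variation within the input features themselves.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
n = 400

# Two hypothetical underlying factors drive six observed features.
size_factor = rng.normal(size=n)
quality_factor = rng.normal(size=n)
X = np.column_stack(
    [size_factor + 0.1 * rng.normal(size=n) for _ in range(3)]
    + [quality_factor + 0.1 * rng.normal(size=n) for _ in range(3)]
)

# Standardize first so that no feature dominates because of its units.
X_scaled = StandardScaler().fit_transform(X)

pca = PCA().fit(X_scaled)
cumulative = np.cumsum(pca.explained_variance_ratio_)
for k, share in enumerate(cumulative, start=1):
    print(f"first {k} component(s): {share:.1%} of feature variance")
```

In this synthetic setup the first two components capture almost all of the variance, because the six columns are built from only two factors.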

Model:

Elastic Net Regression was used because it works well with datasets that have correlated parameters (e.g., number of cars and garage size).
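
The behavior with correlated features can be sketched with scikit-learn's `ElasticNet`. The data here is synthetic, and the `alpha` and `l1_ratio` values are illustrative assumptions, not the settings used in this project.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
n = 300

# Two strongly correlated hypothetical features: garage capacity and garage area.
garage_cars = rng.integers(0, 4, n).astype(float)
garage_area = 250 * garage_cars + rng.normal(0, 20, n)

X = StandardScaler().fit_transform(np.column_stack([garage_cars, garage_area]))

# Synthetic target that depends equally on both standardized features.
y = 10.0 * X[:, 0] + 10.0 * X[:, 1] + rng.normal(0, 1.0, n)

# l1_ratio mixes the L1 (lasso) and L2 (ridge) penalties; 0.5 weights them equally.
model = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)

print("coefficients:", model.coef_.round(2))
print("R^2 on the training data:", round(model.score(X, y), 3))
```

The L2 part of the penalty tends to spread weight across correlated features instead of picking one arbitrarily, which is why the two coefficients come out close to each other here.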

Mean Squared Error (MSE): Measures the average squared difference between the predicted values and the actual values. The closer the MSE is to zero, the better the model is predicting values.

R-Squared (R2): Measures how close the data are to the fitted regression line, that is, the percentage of the response variable variation that is explained by a linear model. A score of 100% indicates that the model explains all of the variability of the response data around its mean.
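
Both metrics can be written out directly from their definitions. The sketch below uses a tiny made-up example, not results from the project, and the helper names are chosen for illustration.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

def r_squared(y_true, y_pred):
    """1 - (residual sum of squares) / (total sum of squares around the mean)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Made-up actual and predicted values.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 7.0, 9.0]

mse = mean_squared_error(y_true, y_pred)
r2 = r_squared(y_true, y_pred)
print(f"MSE = {mse:.3f}")  # MSE = 0.125
print(f"R2  = {r2:.3f}")   # R2  = 0.975
```

Note that MSE is expressed in squared units of the target, so its size depends on how the target is scaled, while R2 is unitless.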

Results:

Our scores indicate that our model explains 79.8% of the response variable variation and has an average error of 0.233. Put more simply, our model accounts for about 80% of the variation in the selling prices of houses in Ames, Iowa.

New Home Price Predictor:

[Figure: screenshot of the price predictor website]

About

Detailed analysis of housing prices by county in Iowa. Using data from Kaggle to analyze features and predict pricing of houses.