JRRRRRRR/Multiple-Regression-on-House-Data

Multiple Regression on House Data

Introduction

For this project, I'll be working with the King County House Sales dataset.

  • A home buyer has asked me to analyze houses in King County. He wants to buy a house in this area but is unfamiliar with the local housing market. He has some preferred features in mind and wants a predicted price so that he can prepare his budget.
  • To address this business problem, I will analyze the King County House Sales dataset. I plan to give him an overview of the housing market and show him the important features he should pay attention to. Next, I will dig into how the living area of the home (sqft_living) affects the price. Based on those features, I will suggest which neighborhoods to invest in. Finally, I will use a model to predict the price of a house with his preferred features. I therefore divide the problem into four parts.
  1. Which features does he need to pay attention to?
    • Find the features most strongly related to price.
  2. How does the living area of the home (sqft_living) affect the price?
    • Find the correlation between them and fit a regression model.
  3. Which neighborhoods are better to invest in, and which should be avoided?
    • Rank neighborhoods by grade, price, and year built.
  4. How much should he budget for his dream house?
    • Use the model to predict the price.

Techniques

  • Data collection, data cleaning, exploratory data analysis, and visualization are all in the Jupyter Notebook.
  • A PowerPoint and a PDF for the presentation
  • A 5-minute recording for review: https://youtu.be/FNqwf82lGvs

Recommendation

  • For buyers: When buying a house, pay the most attention to the grade ranking and to house size, including living area, number of bathrooms, and so on, because these features are the most strongly related to house price. If the budget allows, Bellevue and Newcastle are good neighborhoods to invest in because of their high grades. Otherwise, Federal Way and Kent are alternative choices. Be careful when buying houses located in Seattle, because many old houses are located there.

  • For analysts: Transforming variables is an effective way to improve R-squared and reduce the condition number

Future Work

  • Interactions: Look for interaction terms in the model to see whether they help improve R-squared
  • Kurtosis: Find ways to reduce kurtosis so the distribution is closer to normal
  • Detailed Prediction: Give the buyer a price prediction based on his preferred features using the model
  • More analyses: Try another business case, such as helping a house seller
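
For the kurtosis item, one possible starting point is to measure excess kurtosis before and after a candidate transformation. The sketch below is only an assumption about how that could look: the prices are made up to be right-skewed, the log transform is one option among several, and `excess_kurtosis` is a hypothetical helper using the population (biased) formula.

```python
import math
from statistics import fmean


def excess_kurtosis(values):
    """Population excess kurtosis: m4 / m2**2 - 3 (0 for a normal distribution)."""
    mean = fmean(values)
    m2 = fmean([(v - mean) ** 2 for v in values])
    m4 = fmean([(v - mean) ** 4 for v in values])
    return m4 / (m2 * m2) - 3.0


# Made-up, right-skewed prices in thousands of dollars; NOT real data.
prices = [250, 280, 300, 320, 350, 380, 400, 450, 500, 600, 900, 2500]
log_prices = [math.log(p) for p in prices]

kurt_raw = excess_kurtosis(prices)
kurt_log = excess_kurtosis(log_prices)

if __name__ == "__main__":
    print(f"excess kurtosis raw: {kurt_raw:.2f}")
    print(f"excess kurtosis log: {kurt_log:.2f}")
```

Removing or capping extreme outliers is another common option; which approach suits this dataset would need to be checked against the model residuals.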

Summary

This project gave me experience applying multiple regression to a real-world problem. It gave me a chance to review my knowledge and skills. I identified my weaknesses and made a plan to improve on them.
