🏠 House Sales Analysis in King County, USA: EDA + Regression Project

📌 Project Overview

This project involves:

  • Performing Exploratory Data Analysis (EDA) to identify trends, patterns, and correlations.

  • Handling missing data and preparing the dataset.

  • Building and evaluating regression models (Linear Regression, Polynomial Regression, Ridge Regression).

  • Comparing model performance to find the best approach for predicting house prices.


🎯 Objectives

The objective of this project is to analyze residential housing data from King County, USA (including Seattle) and predict house sale prices using regression models. The project simulates a real-world scenario where a Real Estate Investment Trust wants to estimate property values based on features like square footage, number of bedrooms, bathrooms, grade, and location.


🗂️ Dataset

  • Source: King County Housing dataset (modified for learning purposes)

  • Rows: 21,613

  • Columns: 22

  • Target Variable: price


Features:

  • bedrooms, bathrooms

  • sqft_living, sqft_lot

  • floors, waterfront, view, condition, grade

  • sqft_above, sqft_basement

  • yr_built, yr_renovated, zipcode, lat, long

  • sqft_living15, sqft_lot15


🔧 Tools & Libraries

  • Python 🐍
  • Pandas – Data manipulation & analysis
  • Matplotlib, Seaborn – Data visualization
  • Scikit-learn (Linear Regression, Ridge Regression, Polynomial Features, Pipelines) – Model development
  • Jupyter Notebook – Development & exploration
  • GitHub – Version control

🔎 Exploratory Data Analysis (EDA)

  • Checked for missing values (found in bedrooms and bathrooms) and replaced them with the column mean.

  • Dropped irrelevant columns (id, Unnamed: 0).

  • Distribution of houses by number of floors.

  • Boxplot: waterfront vs price → waterfront houses tend to be more expensive.

  • Regression plot: sqft_above vs price → strong positive correlation.

  • Correlation heatmap → sqft_living, grade, and sqft_above most correlated with price.
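The cleaning steps above can be sketched with pandas. This is a minimal illustration, not the notebook's actual code: the helper names `clean_housing` and `price_correlations` are hypothetical, and the sample rows are made up so the snippet runs without the CSV.

```python
import numpy as np
import pandas as pd


def clean_housing(df: pd.DataFrame) -> pd.DataFrame:
    """Drop identifier columns and mean-impute bedrooms and bathrooms."""
    # errors="ignore" keeps this safe if a column is already absent.
    out = df.drop(columns=["id", "Unnamed: 0"], errors="ignore")
    for col in ("bedrooms", "bathrooms"):
        out[col] = out[col].fillna(out[col].mean())
    return out


def price_correlations(df: pd.DataFrame) -> pd.Series:
    """Pearson correlation of each numeric column with price, strongest first."""
    corr = df.corr(numeric_only=True)["price"].drop("price")
    return corr.sort_values(ascending=False)


# Made-up rows for illustration only; not taken from the real dataset.
sample = pd.DataFrame({
    "Unnamed: 0": [0, 1, 2, 3],
    "id": [101, 102, 103, 104],
    "price": [300000.0, 450000.0, 500000.0, 650000.0],
    "bedrooms": [3.0, np.nan, 4.0, 5.0],
    "bathrooms": [1.0, 2.0, np.nan, 3.0],
    "floors": [1.0, 1.0, 2.0, 2.0],
    "sqft_living": [1200, 1800, 2100, 2600],
})

cleaned = clean_housing(sample)
floor_counts = cleaned["floors"].value_counts().to_frame()  # houses per floor count
ranking = price_correlations(cleaned)
print(ranking)
```

On the real data, `clean_housing(pd.read_csv("data/kc_house_data_NaN.csv"))` would be the equivalent call, assuming the file layout shown under Project Structure.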


📈 Model Development

  1. Simple Linear Regression
  • Feature: long → very weak predictor (R² ≈ 0.00046).

  • Feature: sqft_living → moderate predictor (R² ≈ 0.49).

  2. Multiple Linear Regression
  • Features: floors, waterfront, lat, bedrooms, sqft_basement, view, bathrooms, sqft_living15, sqft_above, grade, sqft_living.

  • R² ≈ 0.65

  3. Polynomial Regression with Pipeline
  • Degree = 2

  • R² ≈ 0.75
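The three models above could be set up roughly as follows with scikit-learn. This is a sketch under assumptions: the pipeline steps (scaling, then degree-2 polynomial features, then linear regression) are one plausible arrangement rather than the notebook's confirmed code, the helper names are hypothetical, and the data is synthetic, so the printed R² values will not match the figures reported above.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Feature list taken from the Multiple Linear Regression step.
FEATURES = ["floors", "waterfront", "lat", "bedrooms", "sqft_basement", "view",
            "bathrooms", "sqft_living15", "sqft_above", "grade", "sqft_living"]


def in_sample_r2(model, X, y) -> float:
    """Fit the model and return R² on the same data it was fitted on."""
    return model.fit(X, y).score(X, y)


def make_poly_pipeline(degree: int = 2) -> Pipeline:
    """Scale -> polynomial expansion -> linear regression (assumed step order)."""
    return Pipeline([
        ("scale", StandardScaler()),
        ("polynomial", PolynomialFeatures(degree=degree, include_bias=False)),
        ("model", LinearRegression()),
    ])


# Synthetic stand-in for the housing data, with one interaction term
# so that the polynomial model has something extra to capture.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(300, len(FEATURES))), columns=FEATURES)
df["price"] = (3 * df["sqft_living"] + 2 * df["grade"]
               + 1.5 * df["sqft_living"] * df["grade"]
               + rng.normal(scale=0.5, size=300))

r2_simple = in_sample_r2(LinearRegression(), df[["sqft_living"]], df["price"])
r2_multi = in_sample_r2(LinearRegression(), df[FEATURES], df["price"])
r2_poly = in_sample_r2(make_poly_pipeline(2), df[FEATURES], df["price"])
print(f"simple={r2_simple:.3f}  multiple={r2_multi:.3f}  polynomial={r2_poly:.3f}")
```

Because each model's feature space contains the previous one, in-sample R² can only go up from simple to multiple to polynomial; a held-out test set is what shows whether the extra terms generalize.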


Model Evaluation & Refinement

  • Train-Test Split (85% train / 15% test).

  • Ridge Regression (α=0.1): R² ≈ 0.64 on test data.

  • Polynomial Features + Ridge Regression: R² ≈ 0.70 on test data.
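A minimal sketch of this evaluation step, assuming scikit-learn's `train_test_split` with `test_size=0.15` and `Ridge(alpha=0.1)`. The function name `evaluate_ridge`, the random seed, and the synthetic data are illustrative assumptions, not values from the notebook.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures


def evaluate_ridge(X, y, alpha=0.1, degree=None, seed=1):
    """Return (test R², n_train, n_test) for Ridge on an 85/15 split.

    If degree is given, inputs are expanded with PolynomialFeatures first.
    """
    x_train, x_test, y_train, y_test = train_test_split(
        X, y, test_size=0.15, random_state=seed
    )
    if degree is None:
        model = Ridge(alpha=alpha)
    else:
        model = make_pipeline(PolynomialFeatures(degree=degree), Ridge(alpha=alpha))
    model.fit(x_train, y_train)
    return model.score(x_test, y_test), len(x_train), len(x_test)


# Synthetic data with an interaction term; stands in for the housing features.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = X[:, 0] + 2 * X[:, 1] * X[:, 2] + rng.normal(scale=0.1, size=200)

linear_r2, n_train, n_test = evaluate_ridge(X, y, alpha=0.1)
poly_r2, _, _ = evaluate_ridge(X, y, alpha=0.1, degree=2)
print(f"Ridge test R²: {linear_r2:.3f}, Polynomial + Ridge test R²: {poly_r2:.3f}")
```

Scoring on the held-out 15% rather than the training rows is what makes these numbers comparable as estimates of performance on unseen houses.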


📌 Key Insights

  • Most important predictors: sqft_living, grade, sqft_above, sqft_living15.

  • Waterfront houses and homes with higher grade have significantly higher prices.

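  • Polynomial regression (degree 2) improves on the linear models: R² ≈ 0.75 vs. ≈ 0.65 for multiple linear regression, and ≈ 0.70 vs. ≈ 0.64 on test data when combined with Ridge.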

📂 Project Structure


📦 House-Sale-Analysis-EDA-Regression-Python
│
├── README.md
├── .gitignore
├── notebooks/                  # Jupyter notebooks
│   └── regression_exploratory_data_analysis.ipynb
└── data/                       # Dataset
    └── kc_house_data_NaN.csv


🚀 How to Run the Project

  1. Clone the repository:
    git clone https://github.com/codewithchirag18/House-Sale-Analysis-EDA-Regression-Python.git
  2. Navigate to the folder:
    cd House-Sale-Analysis-EDA-Regression-Python
  3. Install required libraries (if requirements.txt is missing from your checkout, install the packages listed under Tools & Libraries instead):
    pip install -r requirements.txt
  4. Open the Jupyter Notebook:
    jupyter notebook notebooks/regression_exploratory_data_analysis.ipynb

📌 Future Work

  • Try Random Forest or Gradient Boosting for better accuracy.

  • Feature engineering (e.g., combine year built + renovation into “house age”).

  • Deploy the model using Flask / Streamlit for predictions.
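The "house age" idea could look something like the sketch below. It rests on assumptions not stated in this README: that `yr_renovated` is 0 for houses that were never renovated, and that a reference year (for example the sale year) is supplied by the caller. The function name `add_house_age` and the sample rows are hypothetical.

```python
import pandas as pd


def add_house_age(df: pd.DataFrame, reference_year: int) -> pd.DataFrame:
    """Add house_age and years_since_update columns.

    Assumption: yr_renovated == 0 means the house was never renovated,
    in which case the build year counts as the last update.
    """
    out = df.copy()
    last_update = out["yr_renovated"].where(out["yr_renovated"] > 0, out["yr_built"])
    out["house_age"] = reference_year - out["yr_built"]
    out["years_since_update"] = reference_year - last_update
    return out


# Made-up rows for illustration only.
houses = pd.DataFrame({
    "yr_built": [1955, 1990, 2010],
    "yr_renovated": [1991, 0, 0],
})
aged = add_house_age(houses, reference_year=2015)
print(aged)
```

Folding the two year columns into ages gives the model quantities on a comparable scale and avoids treating the 0 placeholder as a real year.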


Author & Contact

Chirag Tomar

Data Analyst

📧 Email: tomarchirag431@gmail.com

🔗 LinkedIn

🔗 LeetCode
