🏠 House Sales Analysis in King County, USA: EDA + Regression Project

📑 Table of Contents

Project Overview
Objectives
Dataset
Features
Tools & Libraries
Exploratory Data Analysis (EDA)
Model Development
Model Evaluation & Refinement
Key Insights
Project Structure
How to Run the Project
Future Work
Author & Contact

📌 Project Overview

This project involves:

Performing Exploratory Data Analysis (EDA) to identify trends, patterns, and correlations.
Handling missing data and preparing the dataset.
Building and evaluating regression models (Linear Regression, Polynomial Regression, Ridge Regression).
Comparing model performance to find the best approach for predicting house prices.

🎯 Objectives

The objective of this project is to analyze residential housing data from King County, USA (including Seattle) and predict house sale prices using regression models. The project simulates a real-world scenario where a Real Estate Investment Trust wants to estimate property values based on features like square footage, number of bedrooms, bathrooms, grade, and location.

🗂️ Dataset

Source: King County Housing dataset (modified for learning purposes)
Rows: 21,613
Columns: 22
Target Variable: price

--

Features:

bedrooms, bathrooms
sqft_living, sqft_lot
floors, waterfront, view, condition, grade
sqft_above, sqft_basement
yr_built, yr_renovated, zipcode, lat, long
sqft_living15, sqft_lot15

🔧 Tools & Libraries

Python 🐍
Pandas – Data manipulation & analysis
Matplotlib, Seaborn – Data visualization
Scikit-learn (Linear Regression, Ridge Regression, Polynomial Features, Pipelines) – Model development
Jupyter Notebook – Development & exploration
GitHub

🔎 Exploratory Data Analysis (EDA)

Checked for missing values in bedrooms and bathrooms and replaced them with each column's mean.
Dropped irrelevant columns (id, Unnamed: 0).
Examined the distribution of houses by number of floors.
Boxplot: waterfront vs price → waterfront houses tend to be more expensive.
Regression plot: sqft_above vs price → strong positive correlation.
Correlation heatmap → sqft_living, grade, and sqft_above most correlated with price.
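The cleaning and correlation steps above can be sketched with pandas. This is a minimal sketch on a small made-up DataFrame, not the project's notebook code; in the project the data would come from data/kc_house_data_NaN.csv, and the helper name `clean_and_summarize` is hypothetical.

```python
import numpy as np
import pandas as pd


def clean_and_summarize(df: pd.DataFrame):
    """Hypothetical helper mirroring the EDA steps listed above."""
    # Drop identifier-like columns that carry no predictive signal (skip if absent).
    df = df.drop(columns=["id", "Unnamed: 0"], errors="ignore")

    # Replace missing bedrooms/bathrooms with the column mean.
    for col in ["bedrooms", "bathrooms"]:
        df[col] = df[col].fillna(df[col].mean())

    # Number of houses per distinct floor count.
    floor_counts = df["floors"].value_counts().sort_index()

    # Correlation of every numeric column with price, strongest first.
    price_corr = (
        df.select_dtypes(include="number")
        .corr()["price"]
        .drop("price")
        .sort_values(ascending=False)
    )
    return df, floor_counts, price_corr


# Made-up rows for illustration only (not real King County listings).
demo = pd.DataFrame({
    "Unnamed: 0": [0, 1, 2, 3],
    "id": [101, 102, 103, 104],
    "price": [300000.0, 450000.0, 620000.0, 800000.0],
    "bedrooms": [2.0, np.nan, 3.0, 4.0],
    "bathrooms": [1.0, 2.0, np.nan, 3.0],
    "sqft_living": [1000, 1500, 2100, 2800],
    "floors": [1.0, 1.0, 2.0, 2.0],
})
clean, floors, corr = clean_and_summarize(demo)
print(floors)
print(corr)
```

On the full dataset, the sorted `price_corr` Series is the numeric counterpart of the correlation heatmap.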

📈 Model Development

Simple Linear Regression

Feature: long → very weak predictor (R² ≈ 0.00046).
Feature: sqft_living → moderate predictor (R² ≈ 0.49).
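A single-feature fit and its R² can be sketched with scikit-learn's `LinearRegression`. This is a minimal sketch on synthetic data (the values are invented, so the printed R² will not match the project's); it also computes R² = 1 - SS_res / SS_tot by hand to show what `score` returns. The helper `single_feature_r2` is hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Synthetic stand-ins: price depends on living area plus noise; longitude is unrelated.
sqft_living = rng.uniform(500, 4000, size=200)
longitude = rng.uniform(-122.5, -121.3, size=200)
price = 250.0 * sqft_living + rng.normal(0, 150_000, size=200)


def single_feature_r2(x: np.ndarray, y: np.ndarray) -> float:
    """Fit y ~ x by ordinary least squares and return the in-sample R²."""
    X = x.reshape(-1, 1)                # scikit-learn expects a 2-D feature matrix
    model = LinearRegression().fit(X, y)
    r2 = model.score(X, y)              # R² on the same data the model was fit on

    # Same quantity computed by hand: 1 - SS_res / SS_tot.
    residuals = y - model.predict(X)
    manual = 1 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)
    assert np.isclose(r2, manual)
    return r2


r2_long = single_feature_r2(longitude, price)
r2_sqft = single_feature_r2(sqft_living, price)
print(f"long:        R² = {r2_long:.5f}")
print(f"sqft_living: R² = {r2_sqft:.3f}")
```

An R² near 0 means the line explains almost none of the variation in price, which is the situation reported for long.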

Multiple Linear Regression

Features: floors, waterfront, lat, bedrooms, sqft_basement, view, bathrooms, sqft_living15, sqft_above, grade, sqft_living.
R² ≈ 0.65
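With several predictors the same estimator takes a DataFrame of columns. A minimal sketch, assuming a DataFrame that already has the cleaned columns listed above; the synthetic values and the helper `fit_multiple` are hypothetical stand-ins, not the project's data or code.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

FEATURES = ["floors", "waterfront", "lat", "bedrooms", "sqft_basement", "view",
            "bathrooms", "sqft_living15", "sqft_above", "grade", "sqft_living"]


def fit_multiple(df: pd.DataFrame, features, target: str = "price"):
    """Fit one linear model on all listed features; return it with its in-sample R²."""
    X = df[features]
    y = df[target]
    model = LinearRegression().fit(X, y)
    return model, model.score(X, y)


# Synthetic demo frame with the same column names (random values, not real listings).
rng = np.random.default_rng(42)
n = 300
df = pd.DataFrame({col: rng.normal(size=n) for col in FEATURES})
df["price"] = 3 * df["sqft_living"] + 2 * df["grade"] + rng.normal(scale=1.0, size=n)

model, r2 = fit_multiple(df, FEATURES)
print(f"R² with {len(FEATURES)} features: {r2:.3f}")
```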

Polynomial Regression with Pipeline

Degree = 2
R² ≈ 0.75
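A degree-2 polynomial model can be wrapped in a scikit-learn `Pipeline` so that the feature expansion and the regression are fitted together. This sketch assumes a `StandardScaler` step before the expansion (the README does not say whether scaling was used) and runs on synthetic data with a curved relationship, so its R² values are illustrative only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic data: the target has a squared term and an interaction,
# which a purely linear model cannot capture.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
y = X[:, 0] + 0.8 * X[:, 1] ** 2 + 0.5 * X[:, 0] * X[:, 2] + rng.normal(scale=0.3, size=500)

linear = LinearRegression().fit(X, y)

poly = Pipeline([
    ("scale", StandardScaler()),                                 # assumption: scaling step
    ("poly", PolynomialFeatures(degree=2, include_bias=False)),  # squares + pairwise products
    ("model", LinearRegression()),
]).fit(X, y)

r2_linear = linear.score(X, y)
r2_poly = poly.score(X, y)
print(f"linear R²:   {r2_linear:.3f}")
print(f"degree-2 R²: {r2_poly:.3f}")
```

The pipeline applies the same scaling and expansion at prediction time, so `poly.predict` can be called directly on raw feature rows.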

Model Evaluation & Refinement

Train-Test Split (85% train / 15% test).
Ridge Regression (α=0.1): R² ≈ 0.64 on test data.
Polynomial Features + Ridge Regression: R² ≈ 0.70 on test data.
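The refinement steps above (85/15 split, Ridge with α = 0.1, then polynomial features plus Ridge) can be sketched as follows. The synthetic data, the `random_state` value, and the degree-2 expansion in the second model are assumptions for illustration; the printed R² values will not match the project's.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(7)
X = rng.normal(size=(1000, 4))
y = 2 * X[:, 0] + X[:, 1] ** 2 - X[:, 2] * X[:, 3] + rng.normal(scale=0.5, size=1000)

# Hold out 15% of rows for testing, as in the project.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.15, random_state=1)

# Ridge on the raw features (alpha is the L2 penalty strength).
ridge = Ridge(alpha=0.1).fit(X_train, y_train)

# Degree-2 polynomial expansion followed by the same Ridge penalty (degree assumed).
poly_ridge = make_pipeline(PolynomialFeatures(degree=2), Ridge(alpha=0.1)).fit(X_train, y_train)

# Both models are scored only on the held-out rows.
r2_ridge = ridge.score(X_test, y_test)
r2_poly_ridge = poly_ridge.score(X_test, y_test)
print(f"Ridge test R²:              {r2_ridge:.3f}")
print(f"Polynomial + Ridge test R²: {r2_poly_ridge:.3f}")
```

Scoring on the held-out 15% gives a more honest estimate than the in-sample R² values reported under Model Development.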

--

📌 Key Insights

Most important predictors: sqft_living, grade, sqft_above, sqft_living15.
Waterfront houses and homes with higher grade have significantly higher prices.
Polynomial regression improves performance over linear models: R² rises from ≈ 0.65 to ≈ 0.75 on the full data, and from ≈ 0.64 to ≈ 0.70 on the test set when combined with Ridge.

📂 Project Structure


📦 House-Sale-Analysis-EDA-Regression-Python
│
├── README.md
├── .gitignore
├── requirements.txt
├── notebooks/                  # Jupyter notebooks
│   └── regression_exploratory_data_analysis.ipynb
└── data/                       # Dataset
    └── kc_house_data_NaN.csv

🚀 How to Run the Project

Clone the repository:

git clone https://github.com/codewithchirag18/House-Sale-Analysis-EDA-Regression-Python.git

Navigate to the folder:

cd House-Sale-Analysis-EDA-Regression-Python

Install required libraries:

pip install -r requirements.txt

Open Jupyter Notebook:

jupyter notebook notebooks/regression_exploratory_data_analysis.ipynb

📌 Future Work

Try Random Forest or Gradient Boosting models, which may improve accuracy.
Feature engineering (e.g., combine yr_built and yr_renovated into a “house age” feature).
Deploy the model using Flask / Streamlit for predictions.
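One possible way to build the “house age” feature is sketched below. The rule (years since the later of construction or renovation) and the default reference year of 2015 are assumptions, and `add_house_age` is a hypothetical helper. It relies on yr_renovated being 0 for houses that were never renovated, as in the public King County data.

```python
import pandas as pd


def add_house_age(df: pd.DataFrame, reference_year: int = 2015) -> pd.DataFrame:
    """Return a copy of df with a hypothetical 'house_age' column.

    Age is counted from the later of yr_built and yr_renovated; a yr_renovated
    of 0 (never renovated) is ignored automatically because it is smaller than yr_built.
    """
    out = df.copy()
    last_update = out[["yr_built", "yr_renovated"]].max(axis=1)
    out["house_age"] = reference_year - last_update
    return out


homes = pd.DataFrame({"yr_built": [1955, 1990, 2005], "yr_renovated": [0, 2010, 0]})
print(add_house_age(homes)["house_age"].tolist())  # [60, 5, 10]
```

Using the sale date instead of a fixed reference year would give each house its age at the time of sale.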

Author & Contact

Chirag Tomar

Data Analyst

📧 Email: tomarchirag431@gmail.com

🔗 LinkedIn

🔗 LeetCode
