Linear Regression modeling on House Prices dataset using Python (scikit-learn).
This project applies simple and multiple linear regression techniques using Python and scikit-learn to predict house prices based on numerical features. It is part of my AI & ML internship focused on building predictive modeling skills.
- Name: House Prices - Advanced Regression Techniques
- Source: Kaggle
- File: `train.csv`
- Target Variable: `SalePrice`
The objective is to predict house sale prices using linear regression models built on relevant features such as `GrLivArea`, `OverallQual`, and `GarageCars`.
- Import & explore dataset
- Handle missing values (initial check only)
- Build a Simple Linear Regression model (1 feature)
- Build a Multiple Linear Regression model (3 features)
- Evaluate models using:
  - Mean Absolute Error (MAE)
  - Mean Squared Error (MSE)
  - R² Score
- Visualize the simple regression line
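
The steps above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the notebook's actual code: to stay self-contained it builds a small synthetic DataFrame in place of `train.csv` (the column names match the dataset, but the values and the price formula are invented), and the 80/20 split, `random_state=42`, and the helper name `fit_and_evaluate` are illustrative choices.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for train.csv (assumption: all values are invented).
# With the real data you would instead use: df = pd.read_csv("train.csv")
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "GrLivArea": rng.integers(600, 3500, size=n),
    "OverallQual": rng.integers(1, 11, size=n),
    "GarageCars": rng.integers(0, 4, size=n),
})
df["SalePrice"] = (
    100 * df["GrLivArea"]
    + 20_000 * df["OverallQual"]
    + 10_000 * df["GarageCars"]
    + rng.normal(0, 20_000, size=n)
)

# Initial missing-value check only (no imputation here)
print(df.isnull().sum())


def fit_and_evaluate(features):
    """Fit a linear regression on the given columns and return test-set metrics."""
    X = df[features]
    y = df["SalePrice"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )
    model = LinearRegression().fit(X_train, y_train)
    y_pred = model.predict(X_test)
    return {
        "MAE": mean_absolute_error(y_test, y_pred),
        "MSE": mean_squared_error(y_test, y_pred),
        "R2": r2_score(y_test, y_pred),
    }


# Simple regression: one feature
simple_metrics = fit_and_evaluate(["GrLivArea"])
# Multiple regression: three features
multiple_metrics = fit_and_evaluate(["GrLivArea", "OverallQual", "GarageCars"])

print("Simple:  ", simple_metrics)
print("Multiple:", multiple_metrics)
```

Using the same split for both models keeps the comparison fair, since each model is scored on the same held-out rows.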
- Python (Google Colab)
- Pandas, NumPy
- Scikit-learn
- Seaborn & Matplotlib
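
As an illustration of how these libraries combine for the regression-line plot, here is a hedged sketch using Matplotlib and scikit-learn. The data is synthetic (an invented linear relationship between `GrLivArea` and `SalePrice` plus noise), and the figure size and colors are arbitrary choices rather than the notebook's settings.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs as a script; omit in Colab
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in data (assumption: values are invented for illustration)
rng = np.random.default_rng(1)
area = rng.uniform(600, 3500, size=100)
price = 100 * area + rng.normal(0, 30_000, size=100)

# scikit-learn expects a 2D feature array, hence the reshape
model = LinearRegression().fit(area.reshape(-1, 1), price)

# Evaluate the fitted line on an evenly spaced grid across the feature range
x_line = np.linspace(area.min(), area.max(), 100).reshape(-1, 1)
y_line = model.predict(x_line)

fig, ax = plt.subplots(figsize=(8, 5))
ax.scatter(area, price, alpha=0.5, label="Observed")
ax.plot(x_line.ravel(), y_line, color="red", label="Fitted line")
ax.set_xlabel("GrLivArea")
ax.set_ylabel("SalePrice")
ax.set_title("Simple linear regression: SalePrice vs. GrLivArea")
ax.legend()
# fig.savefig("regression_line.png")  # or plt.show() in a notebook
```

If the points fan out or curve away from the red line, that is a visual hint that a single straight line does not capture the relationship well.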
| File | Description |
|---|---|
| `train.csv` | Dataset used for modeling |
| `linear_regression_house_prices.ipynb` | Complete notebook with EDA & regression models |
| `README.md` | This project overview and documentation |
Both simple and multiple regression models were implemented and evaluated.
The multiple regression model achieved higher accuracy than the simple model, and its coefficients indicate how each feature relates to the predicted price.
- Linear regression workflow (fit → predict → evaluate)
- How to interpret model coefficients
- Regression evaluation metrics
- Using visualization to validate predictions
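
To make the evaluation metrics concrete, the sketch below computes MAE, MSE, and R² directly from their definitions with NumPy. The four true and predicted prices are invented for illustration and are not results from this project.

```python
import numpy as np

# Hypothetical true and predicted sale prices (invented values)
y_true = np.array([200_000.0, 150_000.0, 300_000.0, 250_000.0])
y_pred = np.array([210_000.0, 140_000.0, 280_000.0, 260_000.0])

errors = y_true - y_pred

# MAE: average size of the error, in the same units as the target
mae = np.mean(np.abs(errors))

# MSE: average squared error, which penalizes large misses more heavily
mse = np.mean(errors ** 2)

# R²: share of the target's variance explained by the predictions
ss_res = np.sum(errors ** 2)                      # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)    # total sum of squares
r2 = 1 - ss_res / ss_tot

print(f"MAE: {mae:.0f}")   # MAE: 12500
print(f"MSE: {mse:.0f}")   # MSE: 175000000
print(f"R2:  {r2:.3f}")    # R2:  0.944
```

These should agree with `mean_absolute_error`, `mean_squared_error`, and `r2_score` from `sklearn.metrics` on the same arrays.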
Part of a 45-day AI & ML Internship (2025)