# House Price Prediction using Linear Regression

`Author:` [Sagar Kanekar](https://github.com/TheShade1551)\
`Date:` 10.December.2024\
`Dataset:` [AMES Housing Dataset](https://www.kaggle.com/c/house-prices-advanced-regression-techniques/data)


### About Dataset(Meta Data)

#### Context:-
- The Following is a Multivariate Dataset: a Collection of Different Houses & their Detailed Descriptions, along with the Price at which they were Sold.
- This Dataset Contains 79 Attributes (Categorical & Numerical) which provide Diverse Information to Predict the Target Variable i.e. Selling Price of the Respective House.

#### Content:-
- The 79 Attributes in this dataset can be classified as:-
  1. **Zone Attributes**:- Focusing on the Location of Property in context of Surroundings.
  2. **Lot Configurations**:- Regarding Properties of the Lot Specifications.
  3. **House Floors**:- Floor-wise Information (Basement, 1st & 2nd Floor).
  4. **Amenities**:- About Different Aspects of House other than Living Area viz. Fireplaces, Garage, Pool, Porches, etc.
  5. **Rooms**:- Information about Living Rooms, Bathrooms & Kitchens within House.
  6. **Selling Data**:- Selling Details, e.g. Month, Year, Type of Deal, etc.

### Acknowledgements:-
- The Ames Housing dataset was compiled by Dean De Cock for use in data science education.
- It Provides a Modernized and Expanded version of the often cited Boston Housing dataset.
- The Dataset is Made Available by Kaggle through Open Competition.

### Citation:-
- [Anna Montoya and DataCanary. House Prices - Advanced Regression Techniques.](www.kaggle.com/competitions/house-prices-advanced-regression-techniques/overview/$citation)

---
### Problem Statement:-
- To Implement a Linear Regression Model to Predict the Prices of Houses based on the following:-
  1. Square Footage of House
  2. Number of Bedrooms & Bathrooms

### Procedure:-
- To Get the Designated Features within a Dataframe
- Exploratory Data Analysis
- Handling Missing Values
- Scaling Data (Wherever Necessary)
- Combining Data into 3 Features
  1. Total_Living_Area:- Footage of All Living Areas Combined (All the Floors).
  2. Total_Square_Footage:- Total Area = Living Area + Amenities Area
  3. Rooms:- Total Number of Bedrooms & Bathrooms (considering appropriate weights)
- Multivariate Linear Regression
  1. With Total_Living_Area & Rooms Features
  2. With Total_Square_Footage & Rooms Features
  3. With Plot Area & Rooms Features
- Measuring Accuracy using Evaluation Metrics
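
The last two steps of the procedure can be sketched as below. This is only a minimal illustration on synthetic data, not the Ames dataset: the column names `LivingArea`, `Rooms` and `SalePrice` follow this notebook, but the values, the coefficients (60 and 8000) and the noise level are arbitrary assumptions made up for the example.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (NOT the Ames dataset): the price is generated as a
# noisy linear function of living area and a weighted room count.
rng = np.random.default_rng(0)
n = 500
toy = pd.DataFrame({
    "LivingArea": rng.uniform(800, 4000, n),
    "Rooms": rng.integers(2, 9, n).astype(float),
})
toy["SalePrice"] = (
    60.0 * toy["LivingArea"] + 8000.0 * toy["Rooms"] + rng.normal(0, 15000, n)
)

# Hold out 20% of the rows for evaluation.
X = toy[["LivingArea", "Rooms"]]
y = toy["SalePrice"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit an ordinary least squares model on two features and predict on the test split.
model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# Evaluation metrics: root mean squared error and coefficient of determination.
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
r2 = r2_score(y_test, y_pred)
print(f"RMSE: {rmse:,.0f}   R^2: {r2:.3f}")
```

The same pattern would apply to each of the three feature pairs listed above by changing the columns selected into `X`.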






# `Import Libraries`

In [None]:
# Importing Libraries

# Data Manipulation & Analysis.
import pandas as pd
import numpy as np

# Data Visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Machine Learning Models & Utilities
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.preprocessing import StandardScaler

# `Load Dataset`

In [None]:
#Loading Dataset
df = pd.read_csv('AMES Housing Dataset.csv')

In [None]:
#Understanding Features:- Data types & Non-Null Counts
df.info()

In [None]:
df.shape

#### Shortlisting Features
As this is a Regression Task using a Linear Model, We Consider Only the Numerical Features within the Dataset.
- **Living Area Features**:- `TotalBsmtSF`,`1stFlrSF`,`2ndFlrSF`
  - `GrLivArea` is not considered, as it only gives the living area above ground, which is already covered by `1stFlrSF` & `2ndFlrSF`.
- **Amenities**:- `GarageArea`, `WoodDeckSF`, `OpenPorchSF`, `EnclosedPorch`, `3SsnPorch`, `ScreenPorch`, `PoolArea`
- **Rooms**:- `BsmtFullBath`, `BsmtHalfBath`, `FullBath`, `HalfBath`, `BedroomAbvGr`
- **Lot-Area**:- `LotArea` gives the total Lot Area of the House-Plot
- **Selling-Price**:- `SalePrice`, the Target Variable.

Hence, these 17 Columns (16 Features plus the Target) are Shortlisted into a new Dataframe.
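
As a quick aid when shortlisting, the numerical columns of a dataframe can be listed programmatically. The sketch below uses a tiny made-up dataframe (the values are illustrative, not taken from the dataset) and pandas' `select_dtypes`.

```python
import pandas as pd

# Toy dataframe with one categorical and three numerical columns.
# Column names mirror the dataset; the values are made up for illustration.
toy = pd.DataFrame({
    "MSZoning": ["RL", "RM", "RL"],
    "LotArea": [8000, 9500, 11000],
    "1stFlrSF": [850, 1200, 900],
    "SalePrice": [200000, 180000, 220000],
})

# Keep only the numerical columns (integer and float dtypes).
numeric_cols = toy.select_dtypes(include="number").columns.tolist()
print(numeric_cols)  # ['LotArea', '1stFlrSF', 'SalePrice']
```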


In [None]:
# Creating New Dataframe with Selected Features
# .copy() makes Data independent of df, avoiding SettingWithCopyWarning when columns are added later
Data = df[['TotalBsmtSF','1stFlrSF','2ndFlrSF','GarageArea','WoodDeckSF','OpenPorchSF','EnclosedPorch','3SsnPorch','ScreenPorch','PoolArea','BsmtFullBath','BsmtHalfBath','FullBath','HalfBath','BedroomAbvGr','LotArea','SalePrice']].copy()

In [None]:
Data.sample(10)

#### Concatenating Features
- Concatenating Living Area Features:- `TotalBsmtSF`,`1stFlrSF`,`2ndFlrSF`
- Concatenating All Porch-Area Features into one (as Most Houses have only 1 type of Porch out of given 4)

In [None]:
#Creating Feature:- LivingArea
Data['LivingArea'] = Data[['TotalBsmtSF','1stFlrSF','2ndFlrSF']].sum(axis=1)
#Creating Feature:- PorchArea
Data['PorchArea'] = Data[['OpenPorchSF','EnclosedPorch','3SsnPorch','ScreenPorch']].sum(axis=1)

#Dropping Already Concatenated Features
Data.drop(['TotalBsmtSF','1stFlrSF','2ndFlrSF','OpenPorchSF','EnclosedPorch','3SsnPorch','ScreenPorch'],axis=1, inplace=True)

In [None]:
Data.head()

In [None]:
Data.shape

In [None]:
#Counting Non-Zero Values per Column (NaN also counts as non-zero, so this is not a null check)
[np.count_nonzero(Data[x]) for x in Data.columns]

Analysis of Non-Zero Counts:- Very Few Non-Zero Cells in Features `PoolArea` & `BsmtHalfBath`.
But We'll Keep them, because they will be Merged into Combined Features in the next step.
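
Counting non-zero cells and counting missing cells are different checks. The sketch below, on a made-up two-column dataframe, shows how `np.count_nonzero` treats `NaN` as non-zero, and how `isna()` can be used for an explicit missing-value count.

```python
import numpy as np
import pandas as pd

# Toy data: PoolArea is mostly zero and has one missing value.
toy = pd.DataFrame({
    "PoolArea": [0.0, 0.0, 512.0, np.nan],
    "GarageArea": [480.0, 0.0, 600.0, 550.0],
})

# Non-zero counts: NaN is not equal to 0, so it is counted as non-zero.
nonzero_counts = {col: int(np.count_nonzero(toy[col])) for col in toy.columns}

# Explicit missing-value and zero-value counts per column.
missing_counts = toy.isna().sum().to_dict()
zero_counts = (toy == 0).sum().to_dict()

print(nonzero_counts)  # {'PoolArea': 2, 'GarageArea': 3}
print(missing_counts)  # {'PoolArea': 1, 'GarageArea': 0}
print(zero_counts)     # {'PoolArea': 2, 'GarageArea': 1}
```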

#### Concatenating Features - Part-2
- Concatenating Rooms Feature:- `BsmtFullBath`,`BsmtHalfBath`,`FullBath`,`HalfBath`,`BedroomAbvGr`, with Half Baths given Half Weight (0.5).
- Concatenating Amenities Feature:- `GarageArea`,`WoodDeckSF`,`PoolArea`,`PorchArea`
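
The weighted room count can also be expressed as a single dot product between the room columns and a weight vector. This is an alternative sketch on made-up rows, not the approach used in the cells that follow; the name `room_weights` is hypothetical.

```python
import pandas as pd

# Weight per column: full baths and bedrooms count as 1, half baths as 0.5.
room_weights = pd.Series({
    "BsmtFullBath": 1.0,
    "BsmtHalfBath": 0.5,
    "FullBath": 1.0,
    "HalfBath": 0.5,
    "BedroomAbvGr": 1.0,
})

# Two made-up houses for illustration.
toy = pd.DataFrame({
    "BsmtFullBath": [1, 0],
    "BsmtHalfBath": [0, 1],
    "FullBath": [2, 1],
    "HalfBath": [1, 0],
    "BedroomAbvGr": [3, 2],
})

# Row-wise weighted sum; the original columns are left unmodified.
rooms = toy[room_weights.index].dot(room_weights)
print(rooms.tolist())  # [6.5, 3.5]
```

Keeping the weights in one place leaves the source columns untouched, so the cell can be re-run without halving the values twice.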


In [None]:
#Changing Column Datatypes to Float for Weight-wise Room addition
Data = Data.astype({'BsmtFullBath':'float','BsmtHalfBath':'float','FullBath':'float','HalfBath':'float','BedroomAbvGr':'float'})
print(Data.dtypes)

In [None]:
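#Weighting Half Baths as 0.5 before Creating Feature:- Rooms
#(run this cell only once; re-running would halve the values again)
Data['BsmtHalfBath'] = Data['BsmtHalfBath']*0.5
Data['HalfBath'] = Data['HalfBath']*0.5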
Data.head()

In [None]:
Data['Rooms']= Data[['BsmtFullBath','BsmtHalfBath','FullBath','HalfBath','BedroomAbvGr']].sum(axis=1)
Data.drop(['BsmtFullBath','BsmtHalfBath','FullBath','HalfBath','BedroomAbvGr'],axis=1, inplace=True)
Data.head()

In [None]:
#Creating Feature:- Amenities
Data['Amenities']= Data[['GarageArea','WoodDeckSF','PoolArea','PorchArea']].sum(axis=1)
Data.drop(['GarageArea','WoodDeckSF','PoolArea','PorchArea'],axis=1, inplace=True)

In [None]:
Data.head()

In [None]:
#Reordering the Columns to: LotArea, LivingArea, Amenities, Rooms, SalePrice (target last)
Data = Data.iloc[:, [0,2,4,3,1]]
Data.head()

In [None]:
#Counting Non-Zero Values per Column Again
[np.count_nonzero(Data[x]) for x in Data.columns]

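Analysis of Non-Zero Counts:- The Sparse Columns (`PoolArea` & `BsmtHalfBath`) have been Absorbed into the Combined Features `Amenities` & `Rooms`. Note that `np.count_nonzero` does not detect `NaN`, so Missing Values need a Separate Check.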