# House Price Prediction with Linear Regression

## Introduction

Welcome to the House Price Prediction project, where we develop a machine learning model using Linear Regression. The project predicts house prices from key features such as square footage, the number of bedrooms, and the number of bathrooms.

## Objectives

The primary goal is to build a predictive model that estimates property sale prices. We fit a linear regression with scikit-learn on a small set of numeric features and evaluate it with the R² score.


<h1>Table of contents</h1>

<div class="alert alert-block alert-info" style="margin-top: 20px">
    <ol>
        <li><a href="#">About the dataset</a></li>
        <li><a href="#">Data Visualization and Analysis</a></li>
        <li><a href="#">Simple Linear Regression</a></li>
    </ol>
</div>
<br>
<hr>


In [7]:
# pip install -r requirements.txt


Load Required Libraries

In [8]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import preprocessing
%matplotlib inline

# About The Dataset


## Overview

This dataset contains information related to housing properties with various features. The primary objective is to predict the property's sale price in dollars, which serves as the target variable for the prediction task.

## Data Description

- **SalePrice:** The property's sale price in dollars (target variable).
- **Bedroom:** Number of bedrooms above basement level.
- **TotalBsmtSF:** Total square feet of basement area.
- **BsmtFullBath:** Basement full bathrooms.
- **Neighborhood:** Physical locations within Ames city limits.
- **Street:** Type of road access.
- **HouseStyle:** Style of dwelling.
- **GarageArea:** Size of the garage in square feet.
- **Kitchen:** Number of kitchens.
- **MoSold:** Month Sold.
- **SaleType:** Type of sale.
- **SaleCondition:** Condition of sale.


<details>
<summary><strong>Data Description</strong></summary>

- **SalePrice:** The property's sale price in dollars (target variable).
- **MSSubClass:** The building class.
- **MSZoning:** The general zoning classification.
- **LotFrontage:** Linear feet of street connected to the property.
- **LotArea:** Lot size in square feet.
- **Street:** Type of road access.
- **Alley:** Type of alley access.
- **LotShape:** General shape of the property.
- **LandContour:** Flatness of the property.
- **Utilities:** Type of utilities available.
- **LotConfig:** Lot configuration.
- **LandSlope:** Slope of the property.
- **Neighborhood:** Physical locations within Ames city limits.
- **Condition1:** Proximity to the main road or railroad.
- **Condition2:** Proximity to the main road or railroad (if a second is present).
- **BldgType:** Type of dwelling.
- **HouseStyle:** Style of dwelling.
- **OverallQual:** Overall material and finish quality.
- **OverallCond:** Overall condition rating.
- **YearBuilt:** Original construction date.
- **YearRemodAdd:** Remodel date.
- **RoofStyle:** Type of roof.
- **RoofMatl:** Roof material.
- **Exterior1st:** Exterior covering on the house.
- **Exterior2nd:** Exterior covering on the house (if more than one material).
- **MasVnrType:** Masonry veneer type.
- **MasVnrArea:** Masonry veneer area in square feet.
- **ExterQual:** Exterior material quality.
- **ExterCond:** Present condition of the material on the exterior.
- **Foundation:** Type of foundation.
- **BsmtQual:** Height of the basement.
- **BsmtCond:** General condition of the basement.
- **BsmtExposure:** Walkout or garden level basement walls.
- **BsmtFinType1:** Quality of basement finished area.
- **BsmtFinSF1:** Type 1 finished square feet.
- **BsmtFinType2:** Quality of the second finished area (if present).
- **BsmtFinSF2:** Type 2 finished square feet.
- **BsmtUnfSF:** Unfinished square feet of basement area.
- **TotalBsmtSF:** Total square feet of basement area.
- **Heating:** Type of heating.
- **HeatingQC:** Heating quality and condition.
- **CentralAir:** Central air conditioning.
- **Electrical:** Electrical system.
- **1stFlrSF:** First Floor square feet.
- **2ndFlrSF:** Second floor square feet.
- **LowQualFinSF:** Low-quality finished square feet (all floors).
- **GrLivArea:** Above grade (ground) living area square feet.
- **BsmtFullBath:** Basement full bathrooms.
- **BsmtHalfBath:** Basement half bathrooms.
- **FullBath:** Full bathrooms above grade.
- **HalfBath:** Half baths above grade.
- **Bedroom:** Number of bedrooms above basement level.
- **Kitchen:** Number of kitchens.
- **KitchenQual:** Kitchen quality.
- **TotRmsAbvGrd:** Total rooms above grade (does not include bathrooms).
- **Functional:** Home functionality rating.
- **Fireplaces:** Number of fireplaces.
- **FireplaceQu:** Fireplace quality.
- **GarageType:** Garage location.
- **GarageYrBlt:** Year the garage was built.
- **GarageFinish:** Interior finish of the garage.
- **GarageCars:** Size of the garage in car capacity.
- **GarageArea:** Size of the garage in square feet.
- **GarageQual:** Garage quality.
- **GarageCond:** Garage condition.
- **PavedDrive:** Paved driveway.
- **WoodDeckSF:** Wood deck area in square feet.
- **OpenPorchSF:** Open porch area in square feet.
- **EnclosedPorch:** Enclosed porch area in square feet.
- **3SsnPorch:** Three-season porch area in square feet.
- **ScreenPorch:** Screen porch area in square feet.
- **PoolArea:** Pool area in square feet.
- **PoolQC:** Pool quality.
- **Fence:** Fence quality.
- **MiscFeature:** Miscellaneous feature not covered in other categories.
- **MiscVal:** Dollar value of miscellaneous features.
- **MoSold:** Month Sold.
- **YrSold:** Year Sold.
- **SaleType:** Type of sale.
- **SaleCondition:** Condition of sale.

</details>


In [22]:
df = pd.read_csv("data/train.csv")
cdf = df[['Bedroom','TotalBsmtSF','GarageArea','Kitchen','MoSold','SalePrice']]
cdf.head()

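|   | Bedroom | TotalBsmtSF | GarageArea | Kitchen | MoSold | SalePrice |
|---:|---:|---:|---:|---:|---:|---:|
| 0 | 3 | 856 | 548 | 1 | 2 | 208500 |
| 1 | 3 | 1262 | 460 | 1 | 5 | 181500 |
| 2 | 3 | 920 | 608 | 1 | 9 | 223500 |
| 3 | 3 | 756 | 642 | 1 | 2 | 140000 |
| 4 | 4 | 1145 | 836 | 1 | 12 | 250000 |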
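
Before fitting, it can be worth checking the selected columns for missing values, since scikit-learn's `LinearRegression` rejects NaN inputs. The sketch below shows one way to do that check. It runs on a small hypothetical frame: the `sample` name and its values are illustrative assumptions, not rows from `train.csv` or `test.csv`.

```python
import numpy as np
import pandas as pd

features = ['Bedroom', 'TotalBsmtSF', 'GarageArea', 'Kitchen', 'MoSold']

# Hypothetical rows for illustration only; two cells are deliberately missing.
sample = pd.DataFrame({
    'Bedroom': [3, 2, 4],
    'TotalBsmtSF': [856.0, np.nan, 1145.0],
    'GarageArea': [548.0, 460.0, np.nan],
    'Kitchen': [1, 1, 1],
    'MoSold': [2, 5, 12],
})

# Count missing values per feature column.
missing_per_column = sample[features].isna().sum()
print(missing_per_column)

# Index labels of rows that contain at least one missing feature value.
rows_with_nan = sample[sample[features].isna().any(axis=1)]
print(rows_with_nan.index.tolist())
```

The same two calls can be applied to `df[features]` or to the test frame to see which columns need imputing or filtering.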


In [45]:
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

features = ['Bedroom', 'TotalBsmtSF', 'GarageArea', 'Kitchen', 'MoSold']

# Fit a multiple linear regression on the selected training features.
x = np.asanyarray(cdf[features])
y = np.asanyarray(cdf[['SalePrice']])
regr = LinearRegression()
regr.fit(x, y)

print('Coefficients: ', regr.coef_)
print('Intercept: ', regr.intercept_)
# This R2 is computed on the training data, so it is an optimistic estimate.
print("R2-score: %.2f" % r2_score(y, regr.predict(x)))

# test.csv contains missing values in the selected columns, and
# LinearRegression.predict raises "ValueError: Input X contains NaN"
# if they are passed through unchanged.
# Assumption: fill each gap with the training-set median of that column.
test = pd.read_csv("data/test.csv")
test = test[features].fillna(cdf[features].median())
test_x = np.asanyarray(test)
test_y = regr.predict(test_x)


Coefficients:  [[ 13830.01241858     71.9853757     153.39571612 -39953.11382728
     763.21692321]]
Intercept:  [29594.68374904]
R2-score: 0.54
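
Because `LinearRegression` does not accept NaN inputs, one option is to place an imputer in front of it in a pipeline, as the scikit-learn documentation suggests, and to score the model on held-out rows rather than on the rows it was trained on. The sketch below is illustrative only: it uses synthetic data instead of `train.csv`, and the median strategy, the 25% split, and all variable names are assumptions.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Synthetic stand-in data (assumption): 200 rows, 5 features, linear target plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([3.0, 1.5, -2.0, 0.5, 1.0]) + rng.normal(scale=0.5, size=200)

# Blank out roughly 5% of the feature values to mimic missing entries.
X[rng.random(X.shape) < 0.05] = np.nan

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# The imputer learns column medians from the training split only,
# then the regression is fit on the imputed training features.
model = make_pipeline(SimpleImputer(strategy='median'), LinearRegression())
model.fit(X_train, y_train)

# NaNs in X_test are filled inside the pipeline before prediction.
y_pred = model.predict(X_test)
holdout_r2 = r2_score(y_test, y_pred)
print("Hold-out R2: %.2f" % holdout_r2)
```

Keeping the imputer inside the pipeline means the same fill values are reused at prediction time, so the test features never need a separate preprocessing step.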