![House](https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcQYe3sqKwz1VPqm3fO_r1LRrBN8tXsYlWY5ww&usqp=CAU)

# **Introduction**

**Ask a home buyer to describe their dream house, and they probably won't begin with the height of the basement ceiling or the proximity to an east-west railroad. But this playground competition's dataset proves that much more influences price negotiations than the number of bedrooms or a white-picket fence.**

**With 79 explanatory variables describing (almost) every aspect of residential homes in Ames, Iowa, this competition challenges you to predict the final price of each home.**

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here are several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

In [None]:
# plotting libraries
import seaborn as sns   
from matplotlib import pyplot as plt

In [None]:
# importing data
train=pd.read_csv("../input/home-data-for-ml-course/train.csv")
test=pd.read_csv("../input/home-data-for-ml-course/test.csv")
submission=pd.read_csv("../input/home-data-for-ml-course/sample_submission.csv")

In [None]:
# preview the first five rows
train.head(5)

In [None]:
# column dtypes and non-null counts
train.info()

In [None]:
#total null values
train.isnull().sum().sum()
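A single total does not say which columns the missing values are in. The sketch below is one way to get a per-column breakdown, assuming you only care about columns with at least one gap. `missing_summary` is a hypothetical helper and the demo frame is made-up data; in the notebook you would call `missing_summary(train)`, and running it on `test` too is worthwhile because a column can be complete in one file and not the other.

```python
import numpy as np
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values, only for columns that have any."""
    counts = df.isnull().sum()
    counts = counts[counts > 0].sort_values(ascending=False)
    return pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(df) * 100).round(1),
    })


# Made-up stand-in frame; in the notebook: missing_summary(train)
demo = pd.DataFrame({
    "LotArea": [8450, 9600, 11250, 9550],
    "GarageYrBlt": [2003.0, np.nan, 2001.0, np.nan],
    "PoolQC": [np.nan, np.nan, np.nan, "Gd"],
})
print(missing_summary(demo))
```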

# **Correlation Of Different Features**

In [None]:
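# correlation between the numeric columns, drawn as a heat map
# (numeric_only=True skips text columns; recent pandas versions raise an error without it)
train_corr = train.corr(numeric_only=True)
plt.figure(figsize=(22,22))
sns.heatmap(train_corr, vmin=-1, vmax=1, cmap="viridis", annot=True, linewidths=0.1)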
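With dozens of numeric columns the heatmap is hard to read at a glance. A possible alternative, sketched below, ranks every numeric column by its correlation with the target and keeps those whose absolute value exceeds a cutoff (0.45 here, matching the threshold mentioned in the next cell). `rank_correlations` is a hypothetical helper and the demo numbers are invented; in the notebook you would call `rank_correlations(train, "SalePrice")` to see exactly which features clear the cutoff.

```python
import pandas as pd


def rank_correlations(df: pd.DataFrame, target: str, threshold: float = 0.45) -> pd.Series:
    """Correlation of each numeric column with `target`, strongest (by absolute
    value) first, keeping only columns whose |correlation| exceeds `threshold`."""
    corr = df.corr(numeric_only=True)[target].drop(target)
    corr = corr[corr.abs() > threshold]
    return corr.reindex(corr.abs().sort_values(ascending=False).index)


# Invented numbers; in the notebook: rank_correlations(train, "SalePrice")
demo = pd.DataFrame({
    "SalePrice": [100, 200, 300, 400, 500],
    "GrLivArea": [1000, 1500, 2100, 2400, 3100],  # rises with price
    "Age": [50, 40, 35, 20, 5],                   # falls with price
    "MoSold": [3, 7, 1, 6, 2],                    # roughly unrelated, filtered out
    "Street": ["Pave"] * 5,                       # non-numeric, ignored
})
print(rank_correlations(demo, "SalePrice"))
```

Strong negative correlations are kept too, since the filter uses the absolute value.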

In [None]:
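# main features picked from the heatmap; most have a correlation with SalePrice above 0.45
# (print train_corr['SalePrice'] to check the exact value for each one)
main_features = ['LotArea','OverallQual', 'YearBuilt', 'YearRemodAdd', '1stFlrSF', 'GrLivArea', 'FullBath', 'Fireplaces', 'TotRmsAbvGrd', 'OpenPorchSF']
# also strongly correlated but not used: 'GarageYrBlt', 'TotalBsmtSF', 'GarageCars', 'GarageArea'
# (some of these contain missing values that would need imputing first)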
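The garage and basement columns listed in the comment are left out above. Some of them contain missing values (GarageYrBlt, for example, is empty for houses without a garage), and many scikit-learn estimators reject NaN input. If you want to include them, one option (a sketch, not the approach this notebook takes) is median imputation inside a `Pipeline`, so the medians learned on the training rows are reused unchanged at prediction time. The toy data below is invented and only stands in for columns such as GarageCars.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline

# Invented toy data with gaps, standing in for a column such as GarageCars.
X = pd.DataFrame({
    "OverallQual": [5, 6, 7, 8, 5, 9, 4, 7],
    "GarageCars": [1, 2, np.nan, 3, 1, 3, np.nan, 2],
})
y = pd.Series([120_000, 150_000, 190_000, 250_000, 125_000, 320_000, 95_000, 200_000])

# The imputer learns one median per column from the training rows, then fills
# NaNs with those same medians whenever the pipeline predicts on new rows.
model = make_pipeline(
    SimpleImputer(strategy="median"),
    RandomForestRegressor(n_estimators=50, random_state=0),
)
model.fit(X, y)

X_new = pd.DataFrame({"OverallQual": [6], "GarageCars": [np.nan]})
print(model.predict(X_new))
```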

# **Scatter plots**

In [None]:
# new variables for the ten main features, named feature1, feature2, ..., feature10
saleprice = train['SalePrice']
#LotArea: Lot size in square feet
feature1= train['LotArea']
#OverallQual: Rates the overall material and finish of the house
feature2= train['OverallQual']
#YearBuilt: Original construction date
feature3= train['YearBuilt']
#YearRemodAdd: Remodel date (same as construction date if no remodeling or additions)
feature4= train['YearRemodAdd']
#1stFlrSF: First Floor square feet
feature5= train['1stFlrSF']
#GrLivArea: Above grade (ground) living area square feet
feature6= train['GrLivArea']
#FullBath: Full bathrooms above grade
feature7= train['FullBath']
#Fireplaces: Number of fireplaces
feature8= train['Fireplaces']
#TotRmsAbvGrd: Total rooms above grade (does not include bathrooms)
feature9= train['TotRmsAbvGrd']
#OpenPorchSF: Open porch area in square feet
feature10= train['OpenPorchSF']

In [None]:
#LotArea: Lot size in square feet
plt.figure(figsize=(10,7))
sns.scatterplot(x=saleprice, y=feature1, color='#808080')
plt.title('Scatterplot for SalePrice and LotArea')
plt.show()

In [None]:
#OverallQual: Rates the overall material and finish of the house
plt.figure(figsize=(10,7))
sns.scatterplot(x=saleprice, y=feature2, color='#808080')
plt.title('Scatterplot for SalePrice and OverallQual')
plt.show()

In [None]:
#YearBuilt: Original construction date
plt.figure(figsize=(10,7))
sns.scatterplot(x=saleprice, y=feature3, color='#808080')
plt.title('Scatterplot for SalePrice and YearBuilt')
plt.show()

In [None]:
#YearRemodAdd: Remodel date (same as construction date if no remodeling or additions)
plt.figure(figsize=(10,7))
sns.scatterplot(x=saleprice, y=feature4, color='#808080')
plt.title('Scatterplot for SalePrice and YearRemodAdd')
plt.show()

In [None]:
#1stFlrSF: First Floor square feet
plt.figure(figsize=(10,7))
sns.scatterplot(x=saleprice, y=feature5, color='#808080')
plt.title('Scatterplot for SalePrice and 1stFlrSF')
plt.show()

In [None]:
#GrLivArea: Above grade (ground) living area square feet
plt.figure(figsize=(10,7))
sns.scatterplot(x=saleprice, y=feature6, color='#808080')
plt.title('Scatterplot for SalePrice and GrLivArea')
plt.show()

In [None]:
#FullBath: Full bathrooms above grade
plt.figure(figsize=(10,7))
sns.scatterplot(x=saleprice, y=feature7, color='#808080')
plt.title('Scatterplot for SalePrice and FullBath')
plt.show()

In [None]:
#Fireplaces: Number of fireplaces
plt.figure(figsize=(10,7))
sns.scatterplot(x=saleprice, y=feature8, color='#808080')
plt.title('Scatterplot for SalePrice and Fireplaces')
plt.show()

In [None]:
#TotRmsAbvGrd: Total rooms above grade (does not include bathrooms)
plt.figure(figsize=(10,7))
sns.scatterplot(x=saleprice, y=feature9, color='#808080')
plt.title('Scatterplot for SalePrice and TotRmsAbvGrd')
plt.show()

In [None]:
#OpenPorchSF: Open porch area in square feet
plt.figure(figsize=(10,7))
sns.scatterplot(x=saleprice, y=feature10, color='#808080')
plt.title('Scatterplot for SalePrice and OpenPorchSF')
plt.show()
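The ten cells above differ only in the column name, so a loop can draw all of them in one figure. The sketch below assumes a DataFrame holding the features and the target; `scatter_grid` is a hypothetical helper and the demo data is randomly generated. Unlike the cells above, it puts SalePrice on the y-axis, the more common orientation for a target. In the notebook you would call `scatter_grid(train, main_features)`.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs outside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def scatter_grid(df, features, target="SalePrice", ncols=5):
    """Draw one scatter plot per feature against `target` in a single figure."""
    nrows = -(-len(features) // ncols)  # ceiling division
    fig, axes = plt.subplots(nrows, ncols, figsize=(4 * ncols, 3.5 * nrows), squeeze=False)
    flat = axes.ravel()
    for ax, feature in zip(flat, features):
        ax.scatter(df[feature], df[target], s=8, color="#808080")
        ax.set_xlabel(feature)
        ax.set_ylabel(target)
    for ax in flat[len(features):]:
        ax.set_visible(False)  # hide panels left over when the grid is not full
    fig.tight_layout()
    return fig


# Randomly generated demo data; in the notebook: scatter_grid(train, main_features)
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "GrLivArea": rng.uniform(500, 3000, 50),
    "OverallQual": rng.integers(1, 11, 50),
    "LotArea": rng.uniform(2000, 20000, 50),
})
demo["SalePrice"] = 80 * demo["GrLivArea"] + 10_000 * demo["OverallQual"]
fig = scatter_grid(demo, ["GrLivArea", "OverallQual", "LotArea"], ncols=2)
```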

# **Plotting Graphs**

**Bar graphs of the value counts for the same ten main features. For features with many distinct values, only the ten most frequent values are shown.**

In [None]:
plt.figure(figsize=(10,7))
train['LotArea'].value_counts().head(10).plot.bar(color='#808080')
plt.xlabel("LotArea")
plt.ylabel("Count")
plt.title('Value Counts Graph For LotArea')
plt.show()

In [None]:
# OverallQual scale:
# 10  Very Excellent
# 9   Excellent
# 8   Very Good
# 7   Good
# 6   Above Average
# 5   Average
# 4   Below Average
# 3   Fair
# 2   Poor
# 1   Very Poor
plt.figure(figsize=(10,7))
# sort_index() orders the bars by quality level instead of by frequency
train['OverallQual'].value_counts().sort_index().plot.bar(color='#808080')
plt.xlabel("OverallQual")
plt.ylabel("Count")
plt.title('Value Counts Graph For OverallQual')
plt.show()

In [None]:
plt.figure(figsize=(10,7))
train['YearBuilt'].value_counts().head(10).plot.bar(color='#808080')
plt.xlabel("YearBuilt")
plt.ylabel("Count")
plt.title('Value Counts Graph For YearBuilt')
plt.show()

In [None]:
plt.figure(figsize=(10,7))
train['YearRemodAdd'].value_counts().head(10).plot.bar(color='#808080')
plt.xlabel("YearRemodAdd")
plt.ylabel("Count")
plt.title('Value Counts Graph For YearRemodAdd')
plt.show()

In [None]:
plt.figure(figsize=(10,7))
train['1stFlrSF'].value_counts().head(10).plot.bar(color='#808080')
plt.xlabel("1stFlrSF")
plt.ylabel("Count")
plt.title('Value Counts Graph For 1stFlrSF')
plt.show()

In [None]:
plt.figure(figsize=(10,7))
train['GrLivArea'].value_counts().head(10).plot.bar(color='#808080')
plt.xlabel("GrLivArea")
plt.ylabel("Count")
plt.title('Value Counts Graph For GrLivArea')
plt.show()

In [None]:
plt.figure(figsize=(10,7))
train['FullBath'].value_counts().plot.bar(color='#808080')
plt.xlabel("FullBath")
plt.ylabel("Count")
plt.title('Value Counts Graph For FullBath')
plt.show()

In [None]:
plt.figure(figsize=(10,7))
train['Fireplaces'].value_counts().plot.bar(color='#808080')
plt.xlabel("Fireplaces")
plt.ylabel("Count")
plt.title('Value Counts Graph For Fireplaces')
plt.show()

In [None]:
plt.figure(figsize=(10,7))
train['TotRmsAbvGrd'].value_counts().head(10).plot.bar(color='#808080')
plt.xlabel("TotRmsAbvGrd")
plt.ylabel("Count")
plt.title('Value Counts Graph For TotRmsAbvGrd')
plt.show()

In [None]:
plt.figure(figsize=(10,7))
train['OpenPorchSF'].value_counts().head(10).plot.bar(color='#808080')
plt.xlabel("OpenPorchSF")
plt.ylabel("Count")
plt.title('Value Counts Graph For OpenPorchSF')
plt.show()
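These bar-graph cells repeat the same five lines with a different column name. A small helper that takes the label and title from the Series name keeps them from drifting out of sync with the plotted column when a cell is copied. `plot_value_counts` and its `top` / `sort_by_value` parameters are hypothetical names for this sketch, and the demo Series is invented; in the notebook you might call `plot_value_counts(train['GrLivArea'], top=10)`.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs outside a notebook
import matplotlib.pyplot as plt
import pandas as pd


def plot_value_counts(series: pd.Series, top=None, sort_by_value=False):
    """Bar chart of how often each value occurs, labelled from the Series name."""
    counts = series.value_counts()
    if top is not None:
        counts = counts.head(top)     # keep only the `top` most frequent values
    if sort_by_value:
        counts = counts.sort_index()  # order bars by the value itself, e.g. quality 1..10
    fig, ax = plt.subplots(figsize=(10, 7))
    counts.plot.bar(ax=ax, color="#808080")
    ax.set_xlabel(series.name)
    ax.set_ylabel("Count")
    ax.set_title(f"Value Counts Graph For {series.name}")
    return ax


# Invented demo values; in the notebook: plot_value_counts(train['OverallQual'], sort_by_value=True)
quality = pd.Series([5, 6, 5, 7, 5, 6, 8, 4], name="OverallQual")
ax = plot_value_counts(quality, sort_by_value=True)
```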

# **Model building**

In [None]:
# main features for model building
train_new= train[main_features].copy()
test_new = test[main_features].copy()
y= train.SalePrice

In [None]:
# hold out 20% of the training data to compare the models
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test= train_test_split(train_new,y, train_size=0.8, test_size=0.2, random_state=0)

**LinearRegression**

In [None]:
from sklearn.linear_model import LinearRegression
regr = LinearRegression()
regr.fit(X_train, y_train)
pred=regr.predict(X_test)
print(regr.score(X_test, y_test))  # R² on the held-out validation rows
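For regressors, scikit-learn's `score` returns the coefficient of determination R² rather than an accuracy: R² = 1 − SS_res / SS_tot, where SS_res is the sum of squared prediction errors and SS_tot is the sum of squared deviations of the true values from their mean. A score of 1 is a perfect fit, 0 means no better than always predicting the mean, and it can go negative. The hand-rolled version below is only an illustration of the formula; `r2_score_manual` is a hypothetical name and the prices are invented.

```python
def r2_score_manual(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


prices = [200_000, 150_000, 300_000, 250_000]                        # invented values
perfect = r2_score_manual(prices, prices)                            # 1.0
mean_only = r2_score_manual(prices, [225_000] * 4)                   # 0.0, always predicts the mean
close = r2_score_manual(prices, [210_000, 140_000, 290_000, 260_000])  # ≈ 0.968
```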

**RandomForestRegressor**

In [None]:
from sklearn.ensemble import RandomForestRegressor
regressor = RandomForestRegressor(n_estimators = 150, max_depth=12, n_jobs=4, random_state = 0)
regressor.fit(X_train, y_train)  
pred=regressor.predict(X_test)
print(regressor.score(X_test, y_test))  # R² on the held-out validation rows

**DecisionTreeRegressor**

In [None]:
from sklearn.tree import DecisionTreeRegressor
regressorTree = DecisionTreeRegressor(random_state = 0)
regressorTree.fit(X_train, y_train)  
pred=regressorTree.predict(X_test)
print(regressorTree.score(X_test, y_test))  # R² on the held-out validation rows
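The three models above are compared on a single 80/20 split, so the ranking can depend on which rows happen to land in the validation set. K-fold cross-validation averages the score over several splits. The sketch below uses the same three models and the notebook's random-forest settings, but runs on synthetic data from `make_regression` so it is self-contained. Because that toy data is purely linear, LinearRegression will do best here, which says nothing about the house-price data; in the notebook you would pass `train_new` and `y` instead.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for train_new / y so the sketch runs on its own.
X, y = make_regression(n_samples=300, n_features=10, noise=10.0, random_state=0)

models = {
    "LinearRegression": LinearRegression(),
    "RandomForestRegressor": RandomForestRegressor(n_estimators=150, max_depth=12, random_state=0),
    "DecisionTreeRegressor": DecisionTreeRegressor(random_state=0),
}

# Five shuffled folds: every row is used for validation exactly once.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
results = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="r2")
    results[name] = (scores.mean(), scores.std())
    print(f"{name}: mean R² = {scores.mean():.3f} (± {scores.std():.3f})")
```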

# **Submission**

**RandomForestRegressor gives the highest R² on the validation split, so it is used for the submission**

In [None]:
# refit the chosen model on the full training data before predicting the test set
regressor.fit(train_new,y)
pred_test=regressor.predict(test_new)

In [None]:
# final submission 
my_submission = pd.DataFrame({'Id':test.Id, 'SalePrice': pred_test})
my_submission.to_csv('submission.csv', index=False)
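The notebook loads `sample_submission.csv` into `submission` but never uses it; it is handy as a template to check the file before uploading. The sketch below compares column names, row count, Ids, and prediction values. `check_submission` is a hypothetical helper and the two demo frames are invented; in the notebook you would call `check_submission(my_submission, submission)`.

```python
import pandas as pd


def check_submission(sub: pd.DataFrame, sample: pd.DataFrame) -> None:
    """Raise ValueError if `sub` does not match the layout of `sample`."""
    if list(sub.columns) != list(sample.columns):
        raise ValueError(f"columns {list(sub.columns)} != {list(sample.columns)}")
    if len(sub) != len(sample):
        raise ValueError(f"{len(sub)} rows, expected {len(sample)}")
    if set(sub["Id"]) != set(sample["Id"]):
        raise ValueError("Id values differ from the sample submission")
    if sub["SalePrice"].isna().any():
        raise ValueError("some predictions are missing")
    if (sub["SalePrice"] <= 0).any():
        raise ValueError("some predicted prices are not positive")


# Invented demo frames; in the notebook: check_submission(my_submission, submission)
sample = pd.DataFrame({"Id": [1461, 1462], "SalePrice": [180000.0, 180000.0]})
mine = pd.DataFrame({"Id": [1461, 1462], "SalePrice": [125000.0, 154000.0]})
check_submission(mine, sample)  # passes silently
```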

In [None]:
print(my_submission)