## **Kolkata Rental Price using Linear Regression**

Hello friends,
In this kernel, I will look at rental prices in Kolkata and the key factors behind them, using a well-known supervised machine learning algorithm: linear regression.

**If this kernel helped your learning, please UPVOTE. Upvotes are my source of motivation!**

### Content
**Key independent features:**

1. Number of bedrooms
2. Number of bathrooms
3. Layout
4. Area
5. Furnished Status
6. Type of Property
7. Type of Seller

**By using the above independent features we will build our Linear Regression Model for rental price prediction.**

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')

In [None]:
# for Linear Regression
from statsmodels.api import OLS

In [None]:
kol= pd.read_csv('../input/house-rent-prices-of-metropolitan-cities-in-india/Kolkata_rent.csv')
kol.head()

In [None]:
kol.info()

We have to convert price from a comma-separated string (object dtype) to float dtype.

In [None]:
kol.price= kol.price.str.replace(',','').astype(float)
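
As a small illustration of this conversion, here is a sketch on a made-up Series (the values are invented for the example, not taken from the dataset). Passing `regex=False` makes it explicit that the comma is replaced as a literal string.

```python
import pandas as pd

# Invented sample values in the same comma-separated style as the price column
sample = pd.Series(['12,000', '8,500', '1,25,000'])

# Remove the separators, then cast the strings to float
cleaned = sample.str.replace(',', '', regex=False).astype(float)
print(cleaned.tolist())
```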

In [None]:
# null values percentage
(kol.isnull().sum()/len(kol.index))*100

In [None]:
# dropping null values
kol= kol.dropna()

In [None]:
kol.shape

In [None]:
from sklearn import preprocessing

In [None]:
# Label Encoding the necessary columns
le= preprocessing.LabelEncoder()

In [None]:
kol.seller_type.value_counts()

In [None]:
#AGENT    0
#BUILDER  1
#OWNER    2 
kol.seller_type= le.fit_transform(kol.seller_type)

In [None]:
kol.layout_type.value_counts()

In [None]:
#BHK 0    
#RK  1
kol.layout_type= le.fit_transform(kol.layout_type)

In [None]:
kol.property_type.value_counts()

In [None]:
#Apartment            0
#Independent Floor    1
#Independent House    2
#Studio Apartment     3
#Villa                4 
kol.property_type= le.fit_transform(kol.property_type)

In [None]:
kol.locality.unique()

There are a lot of unique values of locality, so it is better to drop this feature.

In [None]:
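# passing the axis positionally is no longer supported in recent pandas versions, so name the column explicitly
kol.drop(columns='locality', inplace=True)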

In [None]:
kol.furnish_type.value_counts()

In [None]:
#Unfurnished       2
#Semi-Furnished    1
#Furnished         0
kol.furnish_type= le.fit_transform(kol.furnish_type)
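
`LabelEncoder` assigns integer codes in sorted order of the class labels, which is where the mappings in the comments above come from. For nominal categories such as seller type, a linear model will read those codes as an ordered numeric scale. One common alternative is one-hot encoding. Below is a minimal sketch with `pd.get_dummies` on an invented frame (the category names mirror the ones above, but the rows are made up); it is not what this kernel uses.

```python
import pandas as pd

# Invented rows; only the category names come from the notebook
demo = pd.DataFrame({'seller_type': ['AGENT', 'OWNER', 'BUILDER', 'OWNER']})

# One indicator column per category; drop_first removes one redundant indicator
dummies = pd.get_dummies(demo, columns=['seller_type'], drop_first=True, dtype=int)
print(dummies)
```

With `drop_first=True`, the first category in sorted order (here `AGENT`) becomes the baseline that the remaining indicators are compared against.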

In [None]:
kol.bathroom.value_counts()

In [None]:
# grouping the low-frequency bathroom counts and the stray direction values,
# then replacing all of them with NaN
kol.bathroom= kol.bathroom.replace(
    ['5 bathrooms','6 bathrooms','7 bathrooms','8 bathrooms','9 bathrooms','16 bathrooms',
     'East facing','SouthEast facing','NorthEast facing','SouthWest facing','South facing','NorthWest facing'],
    np.nan)

In [None]:
kol.bathroom.value_counts(dropna=False)
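
The list of rare values can also be derived from the counts instead of being typed out by hand. The following is only a sketch on an invented Series, and the threshold of 2 is a hypothetical choice for the example; a real cut-off would depend on the actual counts shown above.

```python
import pandas as pd

# Invented values in the style of the bathroom column
col = pd.Series(['1 bathrooms', '2 bathrooms', '1 bathrooms',
                 'East facing', '2 bathrooms', '9 bathrooms'])

counts = col.value_counts()
rare = counts[counts < 2].index       # hypothetical threshold: seen fewer than 2 times
masked = col.where(~col.isin(rare))   # rare values become NaN, the rest are kept

print(masked.isna().sum())
```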

We can drop the NaN values because there are very few of them.

In [None]:
kol= kol.dropna()

In [None]:
kol.bathroom.value_counts(dropna=False)

In [None]:
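# keeping only the number from values such as '2 bathrooms'
# (str.strip removes a set of characters rather than a suffix, so the digits are extracted instead)
kol.bathroom= kol.bathroom.str.extract(r'(\d+)', expand=False)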

In [None]:
kol.bathroom= kol.bathroom.astype('int64')
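
When removing a suffix like this, keep in mind that `str.strip(chars)` treats its argument as a set of characters to remove from both ends, not as a literal suffix. The sketch below (plain Python, with an invented value) shows what that leaves behind, next to a regular expression that pulls out the digits directly.

```python
import re

value = '16 bathrooms'  # invented example in the style of the column

# strip() removes any of the characters b, a, t, h, r, o, m, s from both ends
stripped = value.strip('bathrooms')
print(repr(stripped))  # the space before the suffix is still there

# Matching the leading digits states the intent directly
digits = re.match(r'\d+', value).group()
print(int(stripped), int(digits))
```

The integer conversion still succeeds on the stripped value because `int()` ignores surrounding whitespace, but the regular expression does not depend on that.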

In [None]:
kol.head()

In [None]:
from sklearn.preprocessing import MinMaxScaler

In [None]:
scaler= MinMaxScaler()

In [None]:
var= ['price','area','seller_type','bedroom','layout_type','property_type','furnish_type','bathroom']

In [None]:
# scaling the features
kol[var]= scaler.fit_transform(kol[var])
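
Min-max scaling maps each column to the range 0 to 1 via (x - min) / (max - min). Because `price` is scaled as well, any prediction from the model is on that 0 to 1 scale and has to be mapped back to get a rent in the original units. A minimal sketch with NumPy on invented prices:

```python
import numpy as np

prices = np.array([8000.0, 12000.0, 20000.0])  # invented values

lo, hi = prices.min(), prices.max()
scaled = (prices - lo) / (hi - lo)   # the default min-max transform for one column
restored = scaled * (hi - lo) + lo   # inverse transform back to the original units

print(scaled)
print(restored)
```

With the fitted `scaler`, `scaler.inverse_transform` applies the same reverse mapping to all columns in `var` at once.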

In [None]:
kol.head()

In [None]:
kol.describe(percentiles=[0.25,0.50,0.75,0.90,0.99]).T

### Model Building

In [None]:
y= kol.pop('price')
X= kol

In [None]:
# train_test_split
from sklearn.model_selection import train_test_split

In [None]:
X_train, X_test, y_train, y_test= train_test_split(X,y, test_size=0.3, random_state=100)

In [None]:
X_train.head()

In [None]:
y_train.head()

In [None]:
# model
model= OLS(y_train, X_train).fit()

In [None]:
print(model.summary())
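
One thing to keep in mind when reading the summary: statsmodels' `OLS` does not add an intercept on its own, so the model above is fit through the origin unless a constant column is added (for example with `statsmodels.api.add_constant`). The sketch below uses plain NumPy least squares on invented data to show the effect of that constant column; it is an illustration, not this kernel's model.

```python
import numpy as np

# Invented data generated exactly from y = 5 + 2x
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 5.0 + 2.0 * x

# Without a constant column the fit is forced through the origin
X_no_const = x.reshape(-1, 1)
slope_only, *_ = np.linalg.lstsq(X_no_const, y, rcond=None)

# With a column of ones the intercept is estimated as well
X_const = np.column_stack([np.ones_like(x), x])
coef, *_ = np.linalg.lstsq(X_const, y, rcond=None)

print(slope_only)  # a single slope that absorbs the missing intercept
print(coef)        # intercept followed by slope
```

In the through-origin fit, the slope is inflated to compensate for the missing intercept, which is why coefficients from the two setups should not be compared directly.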