# Simple Linear Regression

##  Introduction

Simple Linear Regression is a supervised machine learning algorithm used to model the relationship between one independent variable (feature) and one dependent variable (target).

It attempts to fit a straight line that best represents the relationship between the two variables by minimizing the sum of squared errors (Ordinary Least Squares method).

---

##  Mathematical Equation

y = β₀ + β₁x

Where:
- y → Dependent variable (target)
- x → Independent variable (feature)
- β₀ → Intercept
- β₁ → Slope (coefficient)

---

##  Objective

The goal is to find the values of β₀ and β₁ that minimize the sum of squared residuals, Σ(yᵢ − (β₀ + β₁xᵢ))², over the training data.
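For a single feature, this minimization has a well-known closed-form solution: β₁ = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)² and β₀ = ȳ − β₁x̄. The sketch below implements those two formulas in plain Python. The function name `fit_simple_ols` and the toy data are assumptions made for illustration and are not part of the notebook.

```python
from statistics import mean

def fit_simple_ols(x, y):
    """Return (beta0, beta1) that minimize the sum of squared residuals."""
    x_bar, y_bar = mean(x), mean(y)
    # Slope: sum of co-deviations divided by the sum of squared x-deviations
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    beta1 = s_xy / s_xx
    # Intercept: forces the fitted line through the point (x_bar, y_bar)
    beta0 = y_bar - beta1 * x_bar
    return beta0, beta1

# Hypothetical toy data lying exactly on y = 2 + 3x
x = [1.0, 2.0, 3.0, 4.0]
y = [5.0, 8.0, 11.0, 14.0]
beta0, beta1 = fit_simple_ols(x, y)
print(beta0, beta1)
```

Because the toy points lie exactly on a line, the recovered intercept and slope should match the values used to generate them.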

---

##  Assumptions

- Linear relationship between x and y
- Independence of errors
- Constant variance of errors (Homoscedasticity)
- Errors are normally distributed
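
These assumptions are usually examined through the residuals of a fitted model. The following is a rough sketch on synthetic data (the data, the seed, and the half-split spread comparison are all illustrative assumptions, not a formal test). It also shows two properties that hold automatically for any OLS fit with an intercept: the residuals average to zero and are uncorrelated with x.

```python
import numpy as np

# Synthetic data that satisfies the assumptions by construction
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
y = 2.0 + 3.0 * x + rng.normal(0, 1.0, size=200)

# np.polyfit returns coefficients from highest degree down: [slope, intercept]
beta1, beta0 = np.polyfit(x, y, deg=1)
residuals = y - (beta0 + beta1 * x)

# Algebraic properties of OLS with an intercept (not evidence for the assumptions)
print("mean residual:", residuals.mean())
print("corr(residuals, x):", np.corrcoef(residuals, x)[0, 1])

# Rough homoscedasticity check: compare residual spread for small vs. large x
order = np.argsort(x)
low, high = residuals[order[:100]], residuals[order[100:]]
spread_ratio = high.std() / low.std()
print("spread ratio (high x / low x):", spread_ratio)
```

A spread ratio far from 1 would hint at non-constant variance; a residual-versus-fitted plot is the more common way to inspect this.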

---

##  Applications

- House price prediction (based on area)
- Salary prediction (based on experience)
- Sales forecasting
- Trend analysis

### Import Requirements 

In [2]:
import pandas as pd
import numpy as np
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

### Load Dataset 

In [3]:
df=pd.read_csv("bangalore house price prediction OHE-data.csv")

In [4]:
df.sample(5)

| | bath | balcony | price | total_sqft_int | bhk | price_per_sqft | area_typeSuper built-up Area | area_typeBuilt-up Area | area_typePlot Area | availability_Ready To Move | ... | location_Kalena Agrahara | location_Horamavu Agara | location_Vidyaranyapura | location_BTM 2nd Stage | location_Hebbal Kempapura | location_Hosur Road | location_Horamavu Banaswadi | location_Domlur | location_Mahadevpura | location_Tumkur Road |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2791 | 3.0 | 2.0 | 115.0 | 2087.01 | 3 | 5510.275466 | 1 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 5953 | 3.0 | 0.0 | 180.0 | 2250.0 | 3 | 8000.0 | 1 | 0 | 0 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1405 | 4.0 | 2.0 | 100.0 | 1991.0 | 3 | 5022.601708 | 1 | 0 | 0 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3627 | 2.0 | 3.0 | 43.0 | 1215.0 | 2 | 3539.09465 | 0 | 0 | 0 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 936 | 2.0 | 1.0 | 68.0 | 1215.0 | 2 | 5596.707819 | 1 | 0 | 0 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |


### Extract Important Features from Dataset

In [5]:
df2=df[['total_sqft_int','bhk','price_per_sqft','price']]
df2.head()

| | total_sqft_int | bhk | price_per_sqft | price |
|---|---|---|---|---|
| 0 | 1672.0 | 3 | 8971.291866 | 150.0 |
| 1 | 1750.0 | 3 | 8514.285714 | 149.0 |
| 2 | 1750.0 | 3 | 8571.428571 | 150.0 |
| 3 | 1250.0 | 2 | 3200.0 | 40.0 |
| 4 | 1200.0 | 2 | 6916.666667 | 83.0 |


In [6]:
X=df2.drop('price',axis=1)
y=df2['price']
print("X shape",X.shape)
print("Y shape",y.shape)

X shape (7120, 3)
Y shape (7120,)
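
The `X` built above has three columns, so the model fitted later is technically a multiple linear regression. In addition, `price_per_sqft` appears to be computed from `price` itself (for row 0, 150.0 × 100000 / 1672.0 ≈ 8971.29), which can inflate the score. A strictly simple linear regression would keep a single feature such as `total_sqft_int`. The sketch below shows that variant on synthetic stand-in data, since the CSV is not available here; the generated values and the names `demo`, `X_one`, `y_one`, and `simple_lr` are illustrative assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the housing data (values are made up for illustration)
rng = np.random.default_rng(42)
area = rng.uniform(500, 3000, size=500)
price = 10 + 0.05 * area + rng.normal(0, 10, size=500)
demo = pd.DataFrame({"total_sqft_int": area, "price": price})

# Double brackets keep X two-dimensional, as scikit-learn expects
X_one = demo[["total_sqft_int"]]
y_one = demo["price"]

X_tr, X_te, y_tr, y_te = train_test_split(
    X_one, y_one, test_size=0.2, random_state=42
)
simple_lr = LinearRegression().fit(X_tr, y_tr)

print("slope (beta1):", simple_lr.coef_[0])
print("intercept (beta0):", simple_lr.intercept_)
print("R^2 on test split:", simple_lr.score(X_te, y_te))
```

With one feature, `coef_` holds exactly one slope, which corresponds directly to β₁ in the equation at the top of the notebook.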


### Train Test Split 

In [7]:
# Train test split
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.2,random_state=42)

In [8]:
# Check the shape of split data 
print("shape of X_train",X_train.shape)
print("shape of X_test",X_test.shape)
print("shape of y_train",y_train.shape)
print("shape of y_test",y_test.shape)

shape of X_train (5696, 3)
shape of X_test (1424, 3)
shape of y_train (5696,)
shape of y_test (1424,)


### Apply Standard Scaler 

In [9]:
sc=StandardScaler()

In [10]:
X_train=sc.fit_transform(X_train) 
X_train

array([[ 0.07232582,  0.62913924, -0.24561231],
       [ 0.17729421,  0.62913924,  2.91350758],
       [-0.50500027, -0.55044507, -1.34159041],
       ...,
       [-0.35279612, -0.55044507, -0.75276603],
       [ 0.98030232,  0.62913924,  4.16584411],
       [-0.12186568, -0.55044507, -0.22296363]])

In [11]:
X_test=sc.transform(X_test)
X_test

array([[-0.34334897, -0.55044507, -0.72383255],
       [-0.27826857, -0.55044507, -0.38548539],
       [-0.29716288, -0.55044507, -0.19243527],
       ...,
       [-0.35909422, -0.55044507, -0.37516274],
       [-0.69604273, -0.55044507, -1.07509659],
       [-0.40842936, -0.55044507, -0.56106947]])
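
The two cells above fit the scaler on the training split only and then reuse its statistics for the test split, which keeps test information out of the preprocessing step. As a small sketch of what `StandardScaler` computes, the example below compares its output with the manual formula z = (x − mean) / std. The arrays and the names `train`, `test`, and `scaler` are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical raw feature values (one column), already split
train = np.array([[1000.0], [1500.0], [2000.0], [2500.0]])
test = np.array([[1750.0], [3000.0]])

scaler = StandardScaler()
train_scaled = scaler.fit_transform(train)  # learns mean_ and scale_ from train only
test_scaled = scaler.transform(test)        # reuses the training statistics

# Manual equivalent: population standard deviation (ddof=0) of the training data
manual = (test - train.mean(axis=0)) / train.std(axis=0)

print(test_scaled.ravel())
print("matches manual formula:", np.allclose(test_scaled, manual))
```

Scaled test values need not have mean 0 or standard deviation 1, because the statistics come from the training data.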

### Make DataFrame of X_train

In [12]:
X_train_df=pd.DataFrame(X_train,columns=['total_sqft_int','bhk','price_per_sqft'])

In [13]:
X_train_df.head()

| | total_sqft_int | bhk | price_per_sqft |
|---|---|---|---|
| 0 | 0.072326 | 0.629139 | -0.245612 |
| 1 | 0.177294 | 0.629139 | 2.913508 |
| 2 | -0.505 | -0.550445 | -1.34159 |
| 3 | -0.609969 | -0.550445 | 0.294986 |
| 4 | -0.20584 | -0.550445 | -0.489552 |


### Linear Regression 

In [14]:
lr=LinearRegression()

In [15]:
lr.fit(X_train,y_train)

LinearRegression(fit_intercept=True, copy_X=True, tol=1e-06, n_jobs=None, positive=False)


In [16]:
lr.coef_

array([ 80.58173653, -11.19291962,  57.91326273])

In [17]:
lr.intercept_

96.10553809691012
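
A fitted linear model predicts with intercept + Σ coefⱼ · xⱼ. The short sketch below redoes that arithmetic by hand, using the coefficients and intercept printed above together with the first row of the scaled `X_train` array. The variable names are illustrative; the numbers are copied from the notebook outputs, so the result is only as precise as the printed digits.

```python
# Values copied from the lr.coef_ and lr.intercept_ outputs
coef = [80.58173653, -11.19291962, 57.91326273]
intercept = 96.10553809691012

# First row of the scaled X_train: total_sqft_int, bhk, price_per_sqft
first_row = [0.07232582, 0.62913924, -0.24561231]

# Prediction = intercept + dot product of coefficients and feature values
prediction = intercept + sum(c * v for c, v in zip(coef, first_row))
print(round(prediction, 4))
```

The result should land very close to 80.6676, the first entry that `lr.predict(X_train)` returns for the same row.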

### Check Prediction 

In [34]:
y_pred=lr.predict(X_train)
y_pred

array([ 80.66756312, 272.08103816, -16.12305126, ...,  30.2425645 ,
       409.31572142,  79.53392584])

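In [26]:
# Predict for one sample that is already standardized
# (column order: total_sqft_int, bhk, price_per_sqft).
# Stored under a separate name so y_pred keeps the training-set predictions.
single_pred=lr.predict([[-1.554608,-2.167491,-2.323633]])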
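
The single-sample call above passes values that are already standardized. For a new house described in raw units, the input first has to go through the same fitted scaler. The sketch below shows two equivalent ways to do that on synthetic data: calling `transform` manually, or wrapping the scaler and model in a pipeline. The data and the names `sc_demo`, `lr_demo`, `pipe`, and `new_house` are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic raw features: area (sqft), bedrooms, price per sqft (made-up values)
rng = np.random.default_rng(0)
X_raw = np.column_stack([
    rng.uniform(500, 3000, 300),
    rng.integers(1, 5, 300),
    rng.uniform(3000, 9000, 300),
])
y_raw = (0.05 * X_raw[:, 0] + 5 * X_raw[:, 1] + 0.01 * X_raw[:, 2]
         + rng.normal(0, 5, 300))

new_house = np.array([[1200.0, 2, 5000.0]])  # raw, unscaled input

# Option 1: scale with the fitted scaler, then predict
sc_demo = StandardScaler().fit(X_raw)
lr_demo = LinearRegression().fit(sc_demo.transform(X_raw), y_raw)
pred_manual = lr_demo.predict(sc_demo.transform(new_house))

# Option 2: a pipeline applies the same scaling automatically
pipe = make_pipeline(StandardScaler(), LinearRegression()).fit(X_raw, y_raw)
pred_pipe = pipe.predict(new_house)

print(pred_manual, pred_pipe)
```

Both options should give the same prediction; the pipeline simply makes it harder to forget the scaling step.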

In [39]:
y_pred

array([ 80.66756312, 272.08103816, -16.12305126, ...,  30.2425645 ,
       409.31572142,  79.53392584])

In [41]:
# R² (coefficient of determination) on the held-out test set
lr.score(X_test,y_test)

0.9046168660684996
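
For a regressor, `score` returns the coefficient of determination, R² = 1 − SS_res / SS_tot. The sketch below computes it by hand and compares it with `sklearn.metrics.r2_score`, alongside RMSE and MAE, which report the error in the units of the target. The arrays `y_true` and `y_hat` are hypothetical values chosen for illustration, not results from the notebook.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, r2_score

# Hypothetical true and predicted prices, for illustration only
y_true = np.array([150.0, 149.0, 40.0, 83.0, 68.0])
y_hat = np.array([140.0, 155.0, 52.0, 80.0, 70.0])

ss_res = np.sum((y_true - y_hat) ** 2)          # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
r2_manual = 1 - ss_res / ss_tot

rmse = np.sqrt(np.mean((y_true - y_hat) ** 2))  # root mean squared error
mae = mean_absolute_error(y_true, y_hat)        # mean absolute error

print("R^2 manual:", r2_manual)
print("R^2 sklearn:", r2_score(y_true, y_hat))
print("RMSE:", rmse, "MAE:", mae)
```

Reporting RMSE or MAE next to R² gives a sense of how far predictions typically are from the true price.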