# R-Squared Regression Analysis:

![](https://miro.medium.com/max/710/0*M75Q957hPBZm57-_)

## Outline:
* What is Residual?
* What is R-Square?
* Formula
* What is goodness of fit?
* Best R-Square score?
* Practical

## What is Residual?
```
Residual = Observed Value - Fitted Value (on the regression line) = (Y - Ŷ)

```
* The smaller the residuals (in absolute value), the better the regression line fits the data.
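
As a small illustration of the definition above, the following sketch computes residuals with NumPy. The values in `y` and `y_hat` are made-up numbers chosen for readability; they are not taken from the housing data.

```python
import numpy as np

# Hypothetical observed values and fitted values (illustrative only)
y = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.5, 5.5, 7.0, 9.5])

# Residual = observed value - fitted value
residuals = y - y_hat
print(residuals)

# Sum of squared residuals: one common way to summarise them in a single number
ss_res = np.sum(residuals ** 2)
print(ss_res)  # 0.75
```

A residual of zero (the third point here) means the fitted line passes exactly through that observation.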

## What is R-Square?
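* It is a statistical measure of how close the data are to the fitted regression line: the proportion of the variance in the target that the model explains. It is also known as the `coefficient of determination`, or `coefficient of multiple determination` for Multiple Regression.
* For a least-squares fit with an intercept, evaluated on its training data, 0 ≤ R-Square ≤ 1.
* 1 means 100%: the model explains all of the variance (the best score).
* 0 means 0%: the model does no better than always predicting the mean of the target. On unseen data the score can even be negative, which means the model does worse than that.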
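
The 0-to-1 range describes the usual case. The sketch below, using the standard definition `1 - SS_res / SS_tot` and made-up numbers, shows the two boundary cases and also what happens when predictions are worse than simply predicting the mean. The helper name `r_squared` is hypothetical and exists only for this illustration.

```python
import numpy as np

def r_squared(y, y_hat):
    """R^2 = 1 - SS_res / SS_tot (standard definition)."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)      # unexplained variation
    ss_tot = np.sum((y - y.mean()) ** 2)   # total variation around the mean
    return 1.0 - ss_res / ss_tot

y = [1, 2, 3, 4, 5]                         # illustrative target values

perfect = r_squared(y, y)                   # predictions equal the observations
baseline = r_squared(y, [3, 3, 3, 3, 3])    # always predict the mean of y
worse = r_squared(y, [5, 4, 3, 2, 1])       # predictions in reversed order

print(perfect, baseline, worse)  # 1.0 0.0 -3.0
```

A negative value typically shows up only on held-out data or with a badly misspecified model, which is worth keeping in mind when reading test-set scores.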

## Formula:
<img src='https://cdn-images-1.medium.com/max/1600/0*nMzUDuKtVtzKgESx.png' width='400' height='400'>

---

<img src='https://th.bing.com/th/id/OIP.FuWG6941DytF8jJtCALxFAAAAA?pid=ImgDet&rs=1' width='400' height='400'>
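
For readers who cannot load the images, the standard textbook definition is written out below. This assumes the images show the usual form; the symbols $SS_{res}$, $SS_{tot}$ and $\bar{y}$ are the conventional names and are not taken from the images themselves.

$$
R^2 = 1 - \frac{SS_{res}}{SS_{tot}}
    = 1 - \frac{\sum_{i}(y_i - \hat{y}_i)^2}{\sum_{i}(y_i - \bar{y})^2}
$$

A worked example with made-up numbers: for observations $y = (3, 5, 7, 9)$ and fitted values $\hat{y} = (2.5, 5.5, 7, 9.5)$, the mean is $\bar{y} = 6$, so

$$
SS_{res} = 0.5^2 + 0.5^2 + 0^2 + 0.5^2 = 0.75, \qquad
SS_{tot} = 3^2 + 1^2 + 1^2 + 3^2 = 20, \qquad
R^2 = 1 - \frac{0.75}{20} = 0.9625
$$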

## Practical: We use the cleaned Bangalore House Price Prediction data

### Step 1: Loading the libraries:

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

### Step 2: Loading The Dataset:

In [2]:
df=pd.read_csv('Cleaned Banglore House Price Prediction Data.csv')

In [3]:
df.head()

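|   | area_type | availability | location | bath | balcony | price | new_size | new_total_sqft | price_per_sqft |
|---|-----------|--------------|----------|------|---------|-------|----------|----------------|----------------|
| 0 | 3 | 38 | 341 | 2.0 | 1.0 | 39.07 | 2 | 1056.0 | 3699.810606 |
| 1 | 2 | 77 | 251 | 5.0 | 3.0 | 120.0 | 4 | 2600.0 | 4615.384615 |
| 2 | 0 | 77 | 964 | 2.0 | 3.0 | 62.0 | 3 | 1440.0 | 4305.555556 |
| 3 | 3 | 77 | 629 | 3.0 | 1.0 | 95.0 | 3 | 1521.0 | 6245.890861 |
| 4 | 3 | 77 | 592 | 2.0 | 1.0 | 51.0 | 2 | 1200.0 | 4250.0 |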


#### Note:
* This is our cleaned data; we cleaned it while building the Linear Regression model.
* Now we again use `Linear Regression` to perform the `Ridge & Lasso` techniques.

### Step 3: Performing The Linear Regression Method:

In [4]:
# Splitting the dataset into target (y) and feature (X) variables:
X=df.drop('price',axis=1)
y=df['price']

print('Shape of X:', X.shape)
print('Shape of y:',y.shape)

Shape of X: (11018, 8)
Shape of y: (11018,)


In [5]:
from sklearn.model_selection import train_test_split

In [6]:
# Splitting the data into train and test parts:
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=.2, random_state=6)

print('Shape of X_train = ', X_train.shape)
print('Shape of y_train = ', y_train.shape)
print('Shape of X_test = ', X_test.shape)
print('Shape of y_test = ', y_test.shape)

Shape of X_train =  (8814, 8)
Shape of y_train =  (8814,)
Shape of X_test =  (2204, 8)
Shape of y_test =  (2204,)
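
The shapes above follow directly from `test_size=.2`: scikit-learn rounds the test share up, so 20% of 11018 rows (2203.6) becomes 2204 test rows and the remaining 8814 go to training. The sketch below reproduces this with placeholder arrays of zeros (`X_dummy`, `y_dummy` are hypothetical stand-ins, since only the row count matters here).

```python
import math

import numpy as np
from sklearn.model_selection import train_test_split

n_samples = 11018                      # number of rows reported above
X_dummy = np.zeros((n_samples, 8))     # placeholder features, same shape as X
y_dummy = np.zeros(n_samples)          # placeholder target

X_tr, X_te, y_tr, y_te = train_test_split(
    X_dummy, y_dummy, test_size=0.2, random_state=6
)

# The test set size is the 20% share rounded up
expected_test = math.ceil(0.2 * n_samples)
print(expected_test)                   # 2204
print(X_tr.shape, X_te.shape)          # (8814, 8) (2204, 8)
```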


In [7]:
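'''
Feature scaling: standardize each numeric feature to zero mean and unit variance
so that all features are on a comparable scale. The scaler is fitted on the
training set only and then applied to both the training and test sets.
'''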
from sklearn.preprocessing import StandardScaler
sc=StandardScaler()
sc.fit(X_train)
X_train=sc.transform(X_train)
X_test=sc.transform(X_test)
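
To make the effect of this step concrete, here is a minimal sketch on synthetic data (the `_demo` arrays are random numbers generated for illustration, not the housing features). It shows that the training columns end up with mean 0 and standard deviation 1, while the test set is transformed with the training statistics and so is only approximately standardized.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic features on very different scales (illustrative only)
rng = np.random.default_rng(0)
X_train_demo = rng.normal(loc=[1000.0, 3.0], scale=[400.0, 1.0], size=(200, 2))
X_test_demo = rng.normal(loc=[1000.0, 3.0], scale=[400.0, 1.0], size=(50, 2))

scaler = StandardScaler()
scaler.fit(X_train_demo)                        # learn mean and std from the training set only
X_train_scaled = scaler.transform(X_train_demo)
X_test_scaled = scaler.transform(X_test_demo)   # reuse the training statistics

print(X_train_scaled.mean(axis=0))   # approximately 0 for every column
print(X_train_scaled.std(axis=0))    # approximately 1 for every column
print(X_test_scaled.mean(axis=0))    # close to 0, but not exactly
```

Fitting the scaler on the training set alone keeps information about the test set out of the model, which is why `fit` and `transform` are called separately.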

In [8]:
# Linear Regression ML Model Training:
from sklearn.linear_model import LinearRegression
lr=LinearRegression()

lr.fit(X_train,y_train)

LinearRegression()

In [11]:
'''Exploring the trained model
'''
# Seeing the coefficient values:
lr.coef_

array([ 0.15192576, -0.42192927, -0.22437487, -0.17351644,  0.15535644,
       -4.40445112, 55.26726976, 25.18508851])

In [12]:
# Seeing the intercept value:
lr.intercept_

82.08727989562063
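
The coefficients and the intercept fully describe the fitted model: a prediction is the dot product of the (scaled) features with `coef_`, plus `intercept_`. The sketch below checks this on synthetic data; the `_demo` arrays and the coefficients used to generate them are invented for illustration. It also shows that when the training features are standardized, as in this walkthrough, the intercept equals the mean of the training target.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data, standardized so every column has mean 0 and std 1
rng = np.random.default_rng(42)
X_raw = rng.normal(loc=5.0, scale=2.0, size=(100, 3))
X_demo = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)
noise = rng.normal(scale=0.1, size=100)
y_demo = 2.0 * X_demo[:, 0] - 1.0 * X_demo[:, 1] + 0.5 * X_demo[:, 2] + 10.0 + noise

model = LinearRegression().fit(X_demo, y_demo)

# A prediction is features @ coefficients + intercept
manual_pred = X_demo @ model.coef_ + model.intercept_
print(np.allclose(manual_pred, model.predict(X_demo)))   # True

# With centered features, the intercept is the mean of the training target
print(np.isclose(model.intercept_, y_demo.mean()))       # True
```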

In [13]:
# Predicting prices for the test set:
lr.predict(X_test)

array([107.6745738 ,  34.500194  ,  60.94472328, ...,  82.4907132 ,
       147.67319159, 163.35794178])

In [14]:
# Comparing with the actual values:
y_test

4064     104.00
7855      42.81
10856     61.11
4491      47.00
9184      33.50
          ...  
3991      75.00
26        57.39
9239      80.00
8944     150.00
10194    170.00
Name: price, Length: 2204, dtype: float64

In [15]:
# Checking the R-Square score of the model on the test set:
lr.score(X_test,y_test)

0.94471512931249

### Conclusion:
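* Our model gives a score of `0.9447` on the test set. For a regressor, `score` returns R-Square rather than a classification accuracy, so the model explains about `94.47%` of the variance in price, which is impressive.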
* Now we need to evaluate the model.

### Step 4: Model Evaluation:

In [16]:
# Predict the price (y^)
y_pred = lr.predict(X_test)
y_pred

array([107.6745738 ,  34.500194  ,  60.94472328, ...,  82.4907132 ,
       147.67319159, 163.35794178])

## R-Square:

In [17]:
# Importing `r2_score` from `sklearn.metrics`:
from sklearn.metrics import r2_score

In [18]:
r2_score(y_test,y_pred)

0.94471512931249

### Conclusion:
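* Our model's `r2_score` is `0.9447`, which is close to 1 and indicates a good fit. It is identical to the value returned by `lr.score` in Step 3, because both compute R-Square.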
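
The outputs of `lr.score(X_test, y_test)` and `r2_score(y_test, y_pred)` agree to the last digit because, for scikit-learn regressors, `score` is defined as R-Square. The following sketch confirms this on synthetic data and also compares both against a manual `1 - SS_res / SS_tot` calculation. The `_demo` data and its generating coefficients are invented for illustration, so the resulting score differs from the housing result.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic regression problem (illustrative only)
rng = np.random.default_rng(6)
X_demo = rng.normal(size=(500, 4))
y_demo = X_demo @ np.array([3.0, -2.0, 0.5, 1.0]) + rng.normal(scale=1.0, size=500)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=6
)
model = LinearRegression().fit(X_tr, y_tr)
y_hat = model.predict(X_te)

score_value = model.score(X_te, y_te)      # R-Square via the estimator
r2_value = r2_score(y_te, y_hat)           # R-Square via sklearn.metrics
manual_value = 1 - np.sum((y_te - y_hat) ** 2) / np.sum((y_te - y_te.mean()) ** 2)

print(np.isclose(score_value, r2_value))   # True
print(np.isclose(r2_value, manual_value))  # True
```

Note the argument order of `r2_score`: the true values come first, the predictions second. Swapping them generally gives a different number.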

```
Thanking You!
```