# Linear Regression

![](https://thumbs.gfycat.com/GlisteningUntriedIberianchiffchaff-size_restricted.gif)

#### Linear regression is a supervised machine learning algorithm used for performing regression tasks.
- **Regression**: Predicting a quantitative (continuous) variable.
- **Loss Function**: 
    - ### Mean Squared Error: It measures the average of the squares of the errors, that is, the average squared difference between the estimated values and the actual values.
- For optimisation of the loss:
    - Gradient Descent: An iterative optimization algorithm for finding a local minimum of a differentiable function. In machine learning it is used to find the parameter values (coefficients) that minimize a cost function.
- For checking the measure of fit:
    - R2 Statistics: R-squared (R2) is a statistical measure that represents the proportion of the variance of the dependent variable that is explained by the independent variables.
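
As a quick illustration of the loss function described above, here is a minimal sketch that computes the plain mean squared error on a few made-up values. The arrays `y_true` and `y_pred` are hypothetical and are not taken from the insurance dataset.

```python
import numpy as np

# Toy values, made up purely for illustration.
y_true = np.array([3.0, 5.0, 7.0])
y_pred = np.array([2.0, 5.0, 9.0])

errors = y_pred - y_true           # per-sample errors: -1, 0, 2
mse_value = np.mean(errors ** 2)   # average of the squared errors: (1 + 0 + 4) / 3
print(mse_value)
```

Squaring makes every error positive and penalizes large errors more heavily than small ones.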
    

In [None]:
import numpy as np
import scipy.stats as s
import pandas as pd
import matplotlib.pyplot as plt
import plotly.express as px
import seaborn as sns
import plotly as py
sns.set_style('darkgrid')
from tqdm.notebook import tqdm_notebook


In [None]:
raw_data = pd.read_csv("../input/insurance/insurance.csv")

In [None]:
raw_data.head(10)

In [None]:
raw_data.describe()

In [None]:
raw_data.info()

In [None]:
raw_data.isnull().sum()

In [None]:
# Encode the binary categories as -1 / 1; assign the result back instead of using inplace=True on a column.
raw_data['sex'] = raw_data['sex'].map({'male': -1, 'female': 1})
raw_data['smoker'] = raw_data['smoker'].map({'yes': -1, 'no': 1})

In [None]:
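# Map the sorted region names to the integer codes 0-3.
region_codes = {name: code for code, name in enumerate(np.unique(raw_data.region))}
raw_data['region'] = raw_data['region'].map(region_codes)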


# Data Visualisation

## Age

In [None]:
fig,axes = plt.subplots(1,2,figsize=(15,6))

sns.kdeplot(raw_data.age,shade=True,ax=axes[0])
sns.distplot(raw_data.age,label="Age",hist=True,ax=axes[1])
fig.show()

In [None]:
fig,axes = plt.subplots(1,2,figsize=(15,6))

sns.kdeplot(raw_data.bmi,shade=True,ax=axes[0])
sns.distplot(raw_data.bmi,label="BMI",hist=True,ax=axes[1])
fig.show()

In [None]:
fig,axes = plt.subplots(1,3,figsize=(19,5))
sns.distplot(raw_data[(raw_data.smoker == -1)]["charges"],ax=axes[0])
axes[0].set_title('Distribution of charges for smokers')

sns.distplot(raw_data[(raw_data.smoker == 1)]['charges'],ax=axes[1])
axes[1].set_title('Distribution of charges for non-smokers')

sns.countplot(x="smoker",data=raw_data,ax=axes[2])
axes[2].set_title("Countplot of Smokers v/s Non Smokers")
fig.text(0.35,1,"Smokers v/s Non Smokers",{'fontname':'Serif', 'weight':'bold','color': 'black', 'size':35})
fig.show()

In [None]:
fig,axes = plt.subplots(1,3,figsize=(19,5))
sns.distplot(raw_data[(raw_data.sex == -1)]["charges"],ax=axes[0])
axes[0].set_title('Distribution of charges for Males')

sns.distplot(raw_data[(raw_data.sex == 1)]['charges'],ax=axes[1])
axes[1].set_title('Distribution of charges for Females')

sns.countplot(x="sex",data=raw_data,ax=axes[2])
axes[2].set_title("Countplot of Males v/s Females")
fig.text(0.35,1,"Males v/s Females",{'fontname':'Serif', 'weight':'bold','color': 'black', 'size':35})
fig.show()

In [None]:
fig,axes = plt.subplots(1,5,figsize=(26,9))
for i in range(0,4):
    sns.distplot(raw_data[(raw_data.region == raw_data.region.unique()[i])]["charges"],ax=axes[i])
    axes[i].set_title(f'Distribution of charges for {raw_data.region.unique()[i]}')

sns.countplot(x="region",data=raw_data,ax=axes[4])
axes[4].set_title("Countplot of People Regionwise")
fig.text(0.35,1,'Regionwise Analysis of Charges',{'fontname':'Serif', 'weight':'bold','color': 'black', 'size':35})
fig.show()

In [None]:
c = ['age','bmi','children','charges']
fig,axes = plt.subplots(1,4,figsize=(26,7))
for i in range(0,4):
    sns.boxplot(x = c[i],data = raw_data,ax=axes[i])

In [None]:
corr = raw_data.corr()
sns.heatmap(corr,annot=True)

In [None]:
raw_data['bmi'] = (raw_data['bmi'] - raw_data['bmi'].mean()) / raw_data['bmi'].std()
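
The cell above standardizes `bmi` with the mean and standard deviation of the whole dataset. A common variation, sketched below on a hypothetical toy frame, computes those statistics on the training rows only and reuses them for the test rows, so that no information from the test set leaks into preprocessing. The names `toy`, `mu` and `sigma` are assumptions made for this example, not part of the notebook.

```python
import pandas as pd

# Hypothetical toy frame standing in for the insurance data.
toy = pd.DataFrame({"bmi": [22.0, 27.5, 31.0, 35.5, 24.0, 29.0]})

# Split first, then compute the scaling statistics on the training rows only.
train = toy.sample(frac=0.5, random_state=0)
test = toy.drop(train.index)

mu = train["bmi"].mean()
sigma = train["bmi"].std()

# Apply the same training statistics to both splits.
train_scaled = (train["bmi"] - mu) / sigma
test_scaled = (test["bmi"] - mu) / sigma
```

With this approach the training column has mean 0 and standard deviation 1, while the test column is only approximately standardized.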


In [None]:
def train_test_split(df, split_ratio = 0.8,seed = 42):
    """
    Split the dataset into train and test datasets.
    Input:
    df -> dataset to be split.
    split_ratio -> fraction of rows used for training.
    seed -> random state to use for the random shuffling of the dataset.
    Output:
    trainX,trainY -> train dataset containing features and train labels.
    testX,testY -> test dataset containing features and test labels.
    """
    
    train = df.sample(frac=split_ratio,random_state = seed)
    test = df.drop(train.index)
    # Labels as column vectors of shape (n, 1).
    trainY = train.charges.values.reshape(-1,1)
    testY = test.charges.values.reshape(-1,1)
    # Keep only age, bmi and smoker as features; drop the target and the unused columns.
    dropped = ['charges','children','sex','region']
    trainX = train.drop(columns=dropped).reset_index(drop=True)
    testX = test.drop(columns=dropped).reset_index(drop=True)
    return trainX,trainY,testX,testY


In [None]:
trainX,trainY,testX,testY = train_test_split(raw_data)

In [None]:
def data_preparation(train,test):
    '''
    Prepends a column of 1s to both train and test datasets so that theta_0 (the intercept) and the remaining coefficients can be combined into a single vector theta.
    train -> train dataset
    test -> test dataset
    '''
    train = np.array(train)
    test = np.array(test)
    # Stack a bias column in front of the features in one vectorized call.
    trainX = np.hstack([np.ones((train.shape[0],1)),train])
    testX = np.hstack([np.ones((test.shape[0],1)),test])
    return trainX,testX

In [None]:
trainX,testX = data_preparation(trainX,testX)

Line:
$Y = \theta_0 + \theta_1 X$
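
With several features the same line can be written in matrix form. Assuming each row of $X$ starts with a constant 1 (which is what `data_preparation` arranges), and writing $k$ for the number of features and $n$ for the number of rows (symbols introduced here for illustration), the prediction becomes:

$$\hat{Y} = X\theta, \qquad X = \begin{bmatrix} 1 & x_{11} & \cdots & x_{1k} \\ \vdots & \vdots & & \vdots \\ 1 & x_{n1} & \cdots & x_{nk} \end{bmatrix}, \qquad \theta = \begin{bmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_k \end{bmatrix}$$

The leading column of 1s multiplies $\theta_0$, so the intercept needs no special handling.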

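#### MSE (Mean Squared Error):
It is the average of the squared errors between the predicted values and the actual values. The version used here carries an extra factor of $\frac{1}{2}$, which simplifies the derivative and does not change where the minimum lies. It can be written as:
$$ MSE = \frac{1}{2n}\sum_{i=1}^{n} (Y_{true,i} - Y_{pred,i})^2$$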
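
For reference, here is a short derivation of the gradient that gradient descent needs. It assumes predictions of the form $Y_{pred} = X\theta$ and uses $\alpha$ for the learning rate (a symbol introduced here; it corresponds to `lr` in the code):

$$\frac{\partial MSE}{\partial \theta} = \frac{1}{2n} \cdot 2 \cdot X^T (X\theta - Y_{true}) = \frac{1}{n} X^T (X\theta - Y_{true})$$

$$\theta \leftarrow \theta - \alpha \frac{\partial MSE}{\partial \theta}$$

The factor 2 from differentiating the square cancels the $\frac{1}{2}$ in the loss, which is the reason for including it.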

In [None]:
def mse(train,theta_final,train_labels):
    '''
    Calculates the (halved) mean squared error, (1/(2m)) * sum of squared residuals.
    Input:
    train-> train dataset
    theta_final -> parameter vector theta
    train_labels-> training labels(correct answers)
    Return:
    Value of the mean squared error. 
    '''
    m = train.shape[0]
    Y_hat = np.matmul(train,theta_final)
    return (1/(2*m)) * np.sum(np.square(Y_hat - train_labels))


In [None]:

def delta(train,train_labels,p):
    '''
    Calculates the derivative of the loss function required to perform gradient descent.
    Inputs:
    train-> train_dataset
    train_labels-> ground truth labels for train dataset.
    p -> theta (the point at which the gradient is evaluated)
    Return: (2/m) * X^T (X.p - y), i.e. twice the gradient of the 1/(2m) loss computed by mse.
    '''
    m = train.shape[0]
    y_hat = np.dot(train, p)
    a = np.dot(train.T,(y_hat - train_labels))
    return  (2/m)*a
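
One way to gain confidence in an analytic gradient is to compare it with a finite-difference estimate. The sketch below does this on random synthetic data for the halved loss $\frac{1}{2m}\sum (X\theta - y)^2$ and its exact gradient $\frac{1}{m}X^T(X\theta - y)$; `delta` above returns twice this value. The helper names `half_mse` and `half_mse_grad` are hypothetical and exist only in this example.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic design matrix: a bias column followed by two random features.
X = np.hstack([np.ones((20, 1)), rng.normal(size=(20, 2))])
y = rng.normal(size=(20, 1))
theta = rng.normal(size=(3, 1))

def half_mse(X, y, theta):
    """Halved mean squared error: sum of squared residuals / (2m)."""
    m = X.shape[0]
    return np.sum((X @ theta - y) ** 2) / (2 * m)

def half_mse_grad(X, y, theta):
    """Analytic gradient of half_mse with respect to theta."""
    m = X.shape[0]
    return X.T @ (X @ theta - y) / m

# Central finite differences, one coordinate of theta at a time.
eps = 1e-6
numeric = np.zeros_like(theta)
for j in range(theta.shape[0]):
    step = np.zeros_like(theta)
    step[j] = eps
    numeric[j] = (half_mse(X, y, theta + step) - half_mse(X, y, theta - step)) / (2 * eps)

max_diff = np.max(np.abs(numeric - half_mse_grad(X, y, theta)))
print(max_diff)
```

If the analytic gradient were wrong by a constant factor, `max_diff` would be of the same order as the gradient itself instead of close to zero.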

In [None]:
theta_initial = np.zeros((trainX.shape[1],1))
theta_final =  np.zeros((trainX.shape[1],1))
m = trainX.shape[0]
iterations = []
residual_points = [0]
lr = 10 ** (-1)

# Gradient Descent 
![](https://miro.medium.com/max/1400/1*OG1d4edy5BFYeQ0yHjBOJA.gif)

### Gradient descent is an iterative optimization algorithm for finding a local minimum of a differentiable function. In machine learning it is used to find the parameter values (coefficients) that minimize a cost function: at every step the parameters move a small distance, set by the learning rate, in the direction opposite to the gradient.
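
To make the idea concrete, here is a minimal sketch of gradient descent on synthetic one-feature data whose true intercept and slope are known. The data, the learning rate `alpha` and the iteration count are assumptions chosen for this example and are unrelated to the insurance dataset.

```python
import numpy as np

rng = np.random.default_rng(42)
# Synthetic data generated with intercept 4 and slope 3, plus a little noise.
x = rng.uniform(-1, 1, size=(200, 1))
y = 4.0 + 3.0 * x + rng.normal(scale=0.1, size=(200, 1))

X = np.hstack([np.ones_like(x), x])   # bias column + feature
theta = np.zeros((2, 1))              # start from all zeros
alpha = 0.1                           # learning rate (assumed value)
m = X.shape[0]

for _ in range(2000):
    grad = X.T @ (X @ theta - y) / m  # gradient of the 1/(2m) squared-error loss
    theta = theta - alpha * grad      # step against the gradient

print(theta.ravel())
```

Because the feature here lies in [-1, 1], a learning rate of 0.1 is stable; unscaled features usually need a much smaller step.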

In [None]:
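for i in tqdm_notebook(range(140478)):
    # delta already divides by m, so the extra (1/m) shrinks the effective step size to lr/m.
    theta_final = theta_initial - (lr) * delta(trainX,trainY,theta_initial) * (1/ m)
    E = int(mse(trainX,theta_final,trainY))
    iterations.append(i)
    residual_points.append(E)
    # Stop once the integer-truncated loss stops changing.
    # residual_points[0] is a placeholder, so only compare from the second iteration on.
    if i > 0 and residual_points[-1] == residual_points[-2]:
        break
    theta_initial = theta_final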
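
The result of a gradient descent loop like the one above can be cross-checked against the closed-form least-squares solution. The sketch below uses `np.linalg.lstsq` on synthetic, noiseless data; `true_theta` and the data are made up for this example, and in the notebook one would pass `trainX` and `trainY` instead.

```python
import numpy as np

rng = np.random.default_rng(1)
features = rng.normal(size=(50, 2))
X = np.hstack([np.ones((50, 1)), features])    # bias column + two features
true_theta = np.array([[1.0], [2.0], [-3.0]])  # hypothetical coefficients
y = X @ true_theta                             # noiseless targets

# Least-squares solution of X @ theta = y.
theta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)
print(theta_ls.ravel())
```

If gradient descent has converged, its parameters should be close to the least-squares ones; a large gap suggests the learning rate or the stopping rule needs attention.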

### The R2 statistic measures how well the model fits the data.
$$R2 = 1 - \frac{RSS}{TSS}$$
Here,
- **RSS : Residual sum of squares =**   $\sum_{i=1}^{N} (y_i - predictedy_i)^2$
 
- **TSS : Total sum of squares =**  $\sum_{i=1}^{N} (y_i - mean_y)^2 $

In [None]:
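def R2_Statistics(theta_final):
    '''
    Computes R2 on the test set (uses the global testX and testY).
    '''
    sst = np.sum((testY-testY.mean())**2)                   # total sum of squares
    ssr = np.sum((np.matmul(testX,theta_final)-testY)**2)   # residual sum of squares
    r2 = 1-(ssr/sst)
    return r2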

### If R2 is close to 1, the model explains most of the variability in the data; if it is close to 0, the model is not a good fit. On held-out data R2 can even be negative, which happens when the model predicts worse than the mean of the targets.
$$ R2 \leq 1 $$
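
The behaviour of R2 at its reference points can be checked on a few made-up numbers. In the sketch below, `r2_score_sketch` is a hypothetical helper written for this example: a perfect prediction gives 1, always predicting the mean gives 0, and a prediction worse than the mean gives a value below 0.

```python
import numpy as np

def r2_score_sketch(y_true, y_pred):
    """R2 = 1 - RSS / TSS for two 1-D arrays."""
    rss = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    tss = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return 1 - rss / tss

y_true = np.array([1.0, 2.0, 3.0, 4.0])

perfect = r2_score_sketch(y_true, y_true)                        # RSS = 0
mean_only = r2_score_sketch(y_true, np.full(4, y_true.mean()))   # RSS = TSS
worse = r2_score_sketch(y_true, y_true[::-1])                    # reversed predictions
print(perfect, mean_only, worse)
```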

In [None]:
R2_Statistics(theta_final)

In [None]:
plt.scatter(x = iterations,y=residual_points[1:])

Here it is clearly visible that the residual error decreases with each iteration.

If you like my work please upvote and please also check out my other notebooks:
- [SurvivingTheTitanic](https://www.kaggle.com/govindsrathore/survivingthetitanic)
- [HeartAttackAnalysis](https://www.kaggle.com/govindsrathore/heart-attack-analysis-prediction-91-acc)
- [PneumoniaChestXray](https://www.kaggle.com/govindsrathore/vgg-transfer-learning-data-augmentation-94-acc)