# Linear Regression.
---
Here we will implement Linear Regression from scratch, we will use a dataset consists of 1300+ records containing a person's medical data and the target is the "charge" column.  
The goal is to predict the charge for the new persons using the Linear Regression model.  
This dataset can be downloaded from [here](https://www.kaggle.com/mirichoi0218/insurance).   


# The imports.

In [5]:
import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split

# Load the data.

In [6]:
dataset = pd.read_csv("insurance.csv")
dataset

Unnamed: 0,age,sex,bmi,children,smoker,region,charges
0,19,female,27.900,0,yes,southwest,16884.92400
1,18,male,33.770,1,no,southeast,1725.55230
2,28,male,33.000,3,no,southeast,4449.46200
3,33,male,22.705,0,no,northwest,21984.47061
4,32,male,28.880,0,no,northwest,3866.85520
...,...,...,...,...,...,...,...
1333,50,male,30.970,3,no,northwest,10600.54830
1334,18,female,31.920,0,no,northeast,2205.98080
1335,18,female,36.850,0,no,southeast,1629.83350
1336,21,female,25.800,0,no,southwest,2007.94500


# Divide the data into input/feature and output/target.

In [7]:
input_data = dataset.iloc[:, : -1].values
output_data = dataset.iloc[:, 6].values

# Preprocessing on data.
In this dataset, we don't have null values but (sex, smoker, region) columns need to be encoded as we will work with equations.

In [8]:
encoder = LabelEncoder()
input_data[:, 1] = encoder.fit_transform(input_data[:, 1])
input_data[:, 4] = encoder.fit_transform(input_data[:, 4])
input_data[:, 5] = encoder.fit_transform(input_data[:, 5])

# Split the data using train_test_split().
We will split the data into train data to fit & train the model, and test data to test the model.  
The test data will take 20% of the whole data.

In [9]:
x_train, x_test, y_train, y_test = train_test_split(input_data, output_data, test_size= 0.2)

#  Write the hypothesis
Our hypothesis for this problem will be something like that:  
h(x) = $\theta$<sub>0</sub> + $\theta$<sub>1</sub> x<sub>1</sub> + $\theta$<sub>2</sub> x<sub>2</sub> + $\theta$<sub>3</sub> x<sub>3</sub> + $\theta$<sub>4</sub> x<sub>4</sub> + $\theta$<sub>5</sub> x<sub>5</sub> + $\theta$<sub>6</sub> x<sub>6</sub>  
Note: this hypothesis may be written like this:  
h(x) = $\theta$<sup>T</sup> X 

In [10]:
def hypothesis(theta, features):
    hyp = theta[:, 0] + theta[:, 1] * features[:, 0] + theta[:, 2] * features[:, 1] + theta[:, 3] * features[:, 2] + theta[:, 4] * features[:, 3] + theta[:, 5] * features[:, 4] + theta[:, 6] * features[:, 5]
    return hyp

# Mean Squared Error
This is a loss function, corresponding to the expected value of the squared error loss.   
![Alt text](quicklatex.com-50d568506216f6ab6402504298c570e2_l3.svg)

In [11]:
def meanSquaredError(theta, features, y_actual):
    n = len(features)
    y_predicted = hypothesis(theta, features)  # Make prediction.
    cost = (1 / n) * sum([val**2 for val in (y_actual - y_predicted)])
    print('The accuracy is:', cost)

# Gradient Descent
It is an algorithm to find the minimum of our loss function( Mean Squared Error ), and get the best values of $\theta$ to our hypothesis.  
Note: you can change $\alpha$ ( Learning rate ) and the number of iterations to get the best result in the loss function, which is the minmum value you can get. 

In [12]:
def gradientDescent(theta, feature):
    learning_rate = 0.0001
    length = len(feature)
    y_predicted = hypothesis(theta, feature)

    for i in range(2):
        theta[:, 0] = theta[:, 0] - learning_rate * (-(2 / length) * sum(y_train - y_predicted))
        theta[:, 1] = theta[:, 1] - learning_rate * (-(2 / length) * sum(x_train[:, 0] * (y_train - y_predicted)))
        theta[:, 2] = theta[:, 2] - learning_rate * (-(2 / length) * sum(x_train[:, 1] * (y_train - y_predicted)))
        theta[:, 3] = theta[:, 3] - learning_rate * (-(2 / length) * sum(x_train[:, 2] * (y_train - y_predicted)))
        theta[:, 4] = theta[:, 4] - learning_rate * (-(2 / length) * sum(x_train[:, 3] * (y_train - y_predicted)))
        theta[:, 5] = theta[:, 5] - learning_rate * (-(2 / length) * sum(x_train[:, 4] * (y_train - y_predicted)))
        theta[:, 6] = theta[:, 6] - learning_rate * (-(2 / length) * sum(x_train[:, 5] * (y_train - y_predicted)))
        meanSquaredError(theta, feature, y_train)
    return theta

# Prediction
Make an actual prediction.  
Here we will initialize some parameters, and then run our gradient descent to get our best parameters.  
At last, we will run meanSquaredError method to get the accuracy of the predicted data.

In [13]:
ini_parameters = np.array([[0, 0, 0, 0, 0, 0, 0]])

parameters = gradientDescent(ini_parameters, x_train)

meanSquaredError(parameters, x_test, y_test)


The accuracy is: 175552283.02094334
The accuracy is: 132752145.50033832
The accuracy is: 125183056.21600938
