## Linear Regression with stochastic gradient descent

Reference: https://machinelearningmastery.com/implement-linear-regression-stochastic-gradient-descent-scratch-python/
This notebook mainly follows the tutorial linked above.


This notebook covers:

How to estimate linear regression coefficients using stochastic gradient descent.

How to make predictions for multivariate linear regression.

How to implement linear regression with stochastic gradient descent to make predictions on new data.

### Multivariate Linear Regression

Linear regression is a technique for predicting a real value by modeling the relationship between the input and output values as a straight line. In more than two dimensions, this straight line may be thought of as a plane or hyperplane.

A prediction is a weighted sum of the input values.

Each input attribute (x) is weighted using a coefficient (b), and the goal of the learning algorithm is to discover a set of coefficients that results in good predictions (y).

    y = b0 + b1 * x1 + b2 * x2 + ...

Coefficients can be found using stochastic gradient descent.
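
The equation above can be sketched in a few lines of Python. This is a minimal illustration rather than the notebook's own `predict` function: the name `predict_row` and the coefficient and input values are made up for the example.

```python
def predict_row(inputs, coefficients):
    # coefficients[0] is the intercept b0; the remaining ones pair up with the inputs
    yhat = coefficients[0]
    for b, x in zip(coefficients[1:], inputs):
        yhat += b * x
    return yhat

# Hypothetical coefficients b0=1.0, b1=2.0, b2=-0.5 and inputs x1=3.0, x2=4.0
print(predict_row([3.0, 4.0], [1.0, 2.0, -0.5]))  # 1.0 + 6.0 - 2.0 = 5.0
```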



### Stochastic gradient descent

Gradient descent is the process of minimizing a function by repeatedly stepping against the gradient of the cost function.

This requires knowing the form of the cost as well as its derivative, so that from a given point you know the gradient and can move in the opposite direction, i.e. downhill towards the minimum value.
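
As a small illustration of the idea, the sketch below runs plain gradient descent on a one-dimensional function whose minimum is known. The function `f(x) = (x - 3)^2`, the starting point, the learning rate and the step count are all assumptions chosen for the example.

```python
def gradient_descent(derivative, start, learning_rate, n_steps):
    # Repeatedly step against the derivative, i.e. downhill
    x = start
    for _ in range(n_steps):
        x = x - learning_rate * derivative(x)
    return x

# f(x) = (x - 3)^2 has derivative 2 * (x - 3) and its minimum at x = 3
minimum = gradient_descent(lambda x: 2 * (x - 3), start=0.0, learning_rate=0.1, n_steps=100)
print(round(minimum, 4))  # 3.0
```

With this learning rate each step shrinks the distance to the minimum by a factor of 0.8, so 100 steps are more than enough here. A learning rate that is too large would overshoot instead.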

In machine learning, stochastic gradient descent is a variant that evaluates and updates the coefficients at every iteration in order to minimize the error of a model on the training data.

The way this optimization algorithm works is that each training instance is shown to the model one at a time. The model makes a prediction for a training instance, the error is calculated and the model is updated in order to reduce the error for the next prediction. This process is repeated for a fixed number of iterations.

This procedure can be used to find the set of coefficients in a model that results in the smallest error for the model on the training data. At each iteration, the coefficients (b), called weights in machine learning terminology, are updated using the equation:

      b = b - learning_rate * error * x
      
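Where b is the coefficient or weight being optimized, learning_rate is a learning rate that you must configure (e.g. 0.01), error is the prediction error for the model on the training data attributed to the weight (the predicted value minus the expected value), and x is the input value. The intercept b0 has no input value, so its update is b0 = b0 - learning_rate * error.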
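
The update rule can be read as one gradient descent step on the squared error of a single training instance: differentiating `0.5 * (yhat - y)^2` with respect to a coefficient gives `error * x` (or just `error` for the intercept). The sketch below applies one such step by hand. The helper name `sgd_step` and all numbers are hypothetical, and the learning rate is deliberately large so the arithmetic is easy to follow.

```python
def sgd_step(coefficients, inputs, target, learning_rate):
    # Predict, measure the error, then move every coefficient against its gradient
    yhat = coefficients[0] + sum(b * x for b, x in zip(coefficients[1:], inputs))
    error = yhat - target
    updated = [coefficients[0] - learning_rate * error]
    for b, x in zip(coefficients[1:], inputs):
        updated.append(b - learning_rate * error * x)
    return updated, error

# Start from all-zero coefficients with one training instance: x = 2.0, y = 4.0
coef, error = sgd_step([0.0, 0.0], inputs=[2.0], target=4.0, learning_rate=0.125)
print(coef, error)  # [0.5, 1.0] -4.0

# A second step on the same instance starts from a smaller error
coef, error = sgd_step(coef, inputs=[2.0], target=4.0, learning_rate=0.125)
print(error)  # -1.5
```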



### Implementation using California Housing Dataset
Here we use the California Housing dataset. It ships with Google Colab by default under `sample_data/`, which is where this notebook was run.

In [1]:
import pandas as pd
df=pd.read_csv("sample_data/california_housing_train.csv")
df_test=pd.read_csv("sample_data/california_housing_test.csv")
df.head()

Unnamed: 0,longitude,latitude,housing_median_age,total_rooms,total_bedrooms,population,households,median_income,median_house_value
0,-114.31,34.19,15.0,5612.0,1283.0,1015.0,472.0,1.4936,66900.0
1,-114.47,34.4,19.0,7650.0,1901.0,1129.0,463.0,1.82,80100.0
2,-114.56,33.69,17.0,720.0,174.0,333.0,117.0,1.6509,85700.0
3,-114.57,33.64,14.0,1501.0,337.0,515.0,226.0,3.1917,73400.0
4,-114.57,33.57,20.0,1454.0,326.0,624.0,262.0,1.925,65500.0


In [2]:
# Normalize the data to the range [0, 1] using min-max scaling

def dataset_minmax(dataset):
    # Collect [min, max] for every column, in column order
    minmax = list()
    for col in dataset:
        col_values = dataset[col]
        minmax.append([col_values.min(), col_values.max()])
    return minmax

def normalize_dataset(dataset, minmax):
    # Scale each column with its precomputed [min, max].
    # The DataFrame passed in is modified in place and also returned.
    for i, col in enumerate(dataset.columns):
        value_min, value_max = minmax[i]
        dataset[col] = (dataset[col] - value_min) / (value_max - value_min)
    return dataset
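
To make the scaling formula concrete, here is a small pure-Python sketch on a single made-up column. The function name `minmax_scale` and the values are assumptions for illustration only.

```python
def minmax_scale(values):
    # Map each value to (v - min) / (max - min), so min -> 0.0 and max -> 1.0
    value_min, value_max = min(values), max(values)
    return [(v - value_min) / (value_max - value_min) for v in values]

# Hypothetical column of raw values
print(minmax_scale([2.0, 4.0, 10.0]))  # [0.0, 0.25, 1.0]
```

A column whose minimum equals its maximum would divide by zero with this formula, so constant columns need separate handling.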




In [3]:
minmax=dataset_minmax(df)
dataset=normalize_dataset(df, minmax)
dataset

Unnamed: 0,longitude,latitude,housing_median_age,total_rooms,total_bedrooms,population,households,median_income,median_house_value
0,1.000000,0.175345,0.274510,0.147885,0.198945,0.028364,0.077454,0.068530,0.107012
1,0.984064,0.197662,0.352941,0.201608,0.294848,0.031559,0.075974,0.091040,0.134228
2,0.975100,0.122210,0.313725,0.018927,0.026847,0.009249,0.019076,0.079378,0.145775
3,0.974104,0.116897,0.254902,0.039515,0.052142,0.014350,0.037000,0.185639,0.120414
4,0.974104,0.109458,0.372549,0.038276,0.050435,0.017405,0.042921,0.098281,0.104125
...,...,...,...,...,...,...,...,...,...
16995,0.008964,0.854410,1.000000,0.058389,0.060987,0.025337,0.060516,0.128081,0.198764
16996,0.007968,0.866100,0.686275,0.061869,0.081782,0.033381,0.076303,0.139170,0.131960
16997,0.004980,0.988310,0.313725,0.070515,0.082247,0.034782,0.074823,0.174577,0.182682
16998,0.004980,0.984060,0.352941,0.070384,0.085506,0.036296,0.078441,0.102054,0.145981


In [4]:
test_dataset=normalize_dataset(df_test,minmax)
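
The test set is scaled with the minimum and maximum taken from the training data, so both sets share one scale. One consequence is that test values can fall outside [0, 1]. The sketch below shows this with made-up numbers. `scale_with` is a hypothetical helper, not part of the notebook.

```python
def scale_with(values, value_min, value_max):
    # Scale using externally supplied statistics instead of the values' own min/max
    return [(v - value_min) / (value_max - value_min) for v in values]

train = [2.0, 4.0, 10.0]
test = [1.0, 6.0, 12.0]
train_min, train_max = min(train), max(train)

# 1.0 is below the training minimum and 12.0 is above the training maximum
print(scale_with(test, train_min, train_max))  # [-0.125, 0.5, 1.25]
```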

In [5]:
from math import sqrt

# Make a prediction for one row; the last value in the row is the target and is skipped
def predict(row, coefficients):
  yhat = coefficients[0]
  for i in range(len(row)-1):
    # .iloc gives positional access; plain row[i] relies on a deprecated pandas fallback
    yhat += coefficients[i + 1] * row.iloc[i]
  return yhat

# Estimate linear regression coefficients using stochastic gradient descent
def coefficients_sgd(train, l_rate, n_epoch):
  # Initialize every coefficient to 0.0: one intercept plus one weight per input column
  coef = [0.0 for i in range(len(train.columns))]

  for epoch in range(n_epoch):
    for index,row in train.iterrows():
      yhat = predict(row, coef)
      error = yhat - row.iloc[-1]

      # Update the intercept, then the weight of each input
      coef[0] = coef[0] - l_rate * error
      for i in range(len(row)-1):
        coef[i + 1] = coef[i + 1] - l_rate * error * row.iloc[i]
    # Note: error here is the error on the last row of the epoch, not an epoch total
    print(l_rate, epoch, error)
    print(coef)
  return coef

# Linear Regression Algorithm With Stochastic Gradient Descent
def linear_regression_sgd(train, test, l_rate, n_epoch):
  predictions = list()
  coef = coefficients_sgd(train, l_rate, n_epoch)
  for index,row in test.iterrows():
    yhat = predict(row, coef)
    predictions.append(yhat)
  return predictions

# Root mean squared error between actual and predicted values
def rmse_metric(actual, predicted):
  sum_error = 0.0
  for i in range(len(actual)):
    prediction_error = predicted[i] - actual[i]
    sum_error += (prediction_error ** 2)
  mean_error = sum_error / float(len(actual))
  return sqrt(mean_error)
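
One way to gain confidence in this kind of update loop is to run it on toy data where the true coefficients are known. The sketch below is a pure-Python variant working on lists instead of a DataFrame. The name `fit_sgd`, the data and the hyperparameters are assumptions for the example, not the notebook's settings.

```python
def fit_sgd(rows, learning_rate, n_epoch):
    # Each row is [x1, ..., xn, y]; coef[0] is the intercept
    coef = [0.0] * len(rows[0])
    for _ in range(n_epoch):
        for row in rows:
            inputs, target = row[:-1], row[-1]
            yhat = coef[0] + sum(b * x for b, x in zip(coef[1:], inputs))
            error = yhat - target
            coef[0] -= learning_rate * error
            for i, x in enumerate(inputs):
                coef[i + 1] -= learning_rate * error * x
    return coef

# Noise-free toy data generated from y = 1 + 2 * x
rows = [[x, 1.0 + 2.0 * x] for x in (0.0, 0.25, 0.5, 0.75, 1.0)]
coef = fit_sgd(rows, learning_rate=0.1, n_epoch=1000)
print([round(b, 3) for b in coef])  # close to [1.0, 2.0]
```

Because the toy data has no noise, the loop should recover the intercept 1 and the slope 2 almost exactly. On real data such as the housing set, the coefficients only approach a best fit.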

In [6]:
predicted=linear_regression_sgd(dataset, test_dataset,0.001, 10)

0.001 0 0.2033200495759856
[0.26676956767365056, 0.17792962071966956, -0.05304555939744157, 0.0838110596513244, 0.04768197003324251, 0.03691216939401818, 0.00809944446226513, 0.03550528517873965, 0.32087400623610884]
0.001 1 0.19334730137726336
[0.2644636936742789, 0.14227632156945547, -0.1094043049744996, 0.08732681272249478, 0.06472178524767334, 0.04416124730338679, 0.003289912504612612, 0.04404209077197698, 0.5246220486115853]
0.001 2 0.19066269150711573
[0.2599767767080246, 0.10237532759532664, -0.15302394712538256, 0.09838180830203042, 0.07637266629185016, 0.050314190145388016, -0.002171108303550121, 0.051053479008492615, 0.6791912911051863]
0.001 3 0.19116242639091074
[0.2562755434996066, 0.06212088314605453, -0.18740694605185804, 0.11052839068985579, 0.0846113681494265, 0.056423099950538534, -0.007794910740043701, 0.057595840677207, 0.7970640218316298]
0.001 4 0.19292267838816463
[0.2544222468797674, 0.023366042469684142, -0.21518388994650317, 0.12138727841469524, 0.090485682630

In [10]:
print("RMSE error",rmse_metric(test_dataset.iloc[:,-1], predicted))

RMSE error 0.17679553395593556
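
This RMSE is measured on the min-max scaled target, not in the original units of `median_house_value`. Since scaling divides every target value by `(max - min)`, the error can be mapped back by multiplying with that range. The sketch below uses a hypothetical helper name and made-up numbers.

```python
def rmse_to_original_units(rmse_scaled, value_min, value_max):
    # Min-max scaling divides the target by (max - min), so errors shrink by the same factor
    return rmse_scaled * (value_max - value_min)

# Hypothetical numbers: a scaled RMSE of 0.25 on a target that ranged from 100.0 to 1100.0
print(rmse_to_original_units(0.25, 100.0, 1100.0))  # 250.0
```

In this notebook the target's range would come from `minmax[-1]`, since `median_house_value` is the last column.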
