### Linear Regression
1. Ordinary Least Squares Method: 
In this method, we find the regression coefficient weights that minimize the sum of the squared residuals.

Formula:  $$ weights = (X^T \cdot X)^{-1} \cdot X^T \cdot y$$

To find the predicted values, we multiply the feature matrix X with the weights vector.
Formula: $$ y_{pred} = X \cdot weights $$

Mean Squared Error: $$ MSE = \frac{1}{n} \sum_{i=1}^{n} (y_{pred,i} - y_{true,i})^2 $$
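
The three formulas above can be sketched in plain NumPy as follows. This is a minimal illustration on synthetic data, not the implementation inside the `LinearRegression` module imported in this notebook; the function names `ols_fit`, `ols_predict`, and `mean_squared_error` are hypothetical. It solves the normal equations with `np.linalg.solve` instead of forming the inverse explicitly, which yields the same weights whenever $X^T X$ is invertible.

```python
import numpy as np


def ols_fit(X, y):
    """Return the weights that solve the normal equations (X^T X) w = X^T y."""
    # Solving the linear system avoids computing an explicit matrix inverse.
    return np.linalg.solve(X.T @ X, X.T @ y)


def ols_predict(X, weights):
    """Predicted values: the feature matrix times the weights vector."""
    return X @ weights


def mean_squared_error(y_pred, y_true):
    """Average of the squared residuals."""
    return float(np.mean((y_pred - y_true) ** 2))


# Synthetic, noise-free data (an assumption made only for this illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_weights = np.array([[2.0], [-1.0], [0.5]])
y = X @ true_weights

weights = ols_fit(X, y)
mse = mean_squared_error(ols_predict(X, weights), y)
print(weights.ravel(), mse)
```

Because the synthetic labels are an exact linear function of the features, the recovered weights should match `true_weights` up to floating-point error.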

In [86]:
# importing the libraries
import numpy as np
import mxnet as mx
from mxnet import ndarray as nd
from LinearRegression import LinearRegression

In [87]:
# read the tab-separated DAT file into a NumPy array (converted to NDArray in the next cell)
data_ctx = mx.cpu()
model_ctx = mx.cpu()
with open("airfoil_self_noise.dat", "r") as data_file:
    data = np.loadtxt(data_file, delimiter="\t")

In [88]:
# split data into features and labels
features = nd.array(data[:, 0:-1], ctx=data_ctx)
labels = nd.array(data[:, -1], ctx=data_ctx)

In [89]:
# splitting the data into training and testing sets (80% training, 20% testing)
split_index = int(len(features) * 0.8)
X_train, X_test = features[:split_index], features[split_index:]
# reshape the labels into column vectors of shape (n, 1)
y_train = labels[:split_index].reshape((-1, 1))
y_test = labels[split_index:].reshape((-1, 1))

In [90]:
# printing the shapes of the training and testing sets to check dimensions
print(X_train.shape)
print(X_test.shape)
print(y_train.shape)
print(y_test.shape)

(1202, 5)
(301, 5)
(1202, 1)
(301, 1)


In [91]:
# implementing linear regression using OLS Method
linear_regression = LinearRegression()
weights = linear_regression.OLS_fit(X_train, y_train)
y_pred = linear_regression.OLS_predict(X_test, weights)
mse = nd.mean(nd.square(y_test - y_pred))

In [92]:
# printing the result of the OLS Method
print("MSE: ", mse.asscalar())

MSE:  26.421574


### Linear Regression
#### Linear Regression with Gradient Descent 
In this method, we start from randomly initialized weights and update them iteratively to minimize the mean of the squared residuals.
The loss function is:
Formula: $$ L = \frac{1}{2n} \sum_{i=1}^{n} (y_{pred,i} - y_{true,i})^2 $$
The gradient of the loss with respect to the weights is:
Formula: $$ \frac{\partial L}{\partial w} = \frac{1}{n} \sum_{i=1}^{n} (y_{pred,i} - y_{true,i}) \cdot x_i $$
The weights are updated as:
Formula: $$ w = w - \alpha \cdot \frac{\partial L}{\partial w} $$
where $\alpha$ is the learning rate.
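
The update rule above can be sketched in plain NumPy as follows. This is a minimal illustration on synthetic data with the bias term omitted for brevity; the function name `gradient_descent`, the learning rate, and the epoch count are assumptions chosen for this example, not values used elsewhere in the notebook.

```python
import numpy as np


def gradient_descent(X, y, learning_rate=0.1, num_epochs=500):
    """Minimize L = 1/(2n) * sum((X w - y)^2) with batch gradient descent."""
    n, num_features = X.shape
    w = np.zeros((num_features, 1))
    for _ in range(num_epochs):
        y_pred = X @ w
        # Gradient of the loss: (1/n) * X^T (y_pred - y_true)
        gradient = X.T @ (y_pred - y) / n
        # Update step: w = w - alpha * dL/dw
        w = w - learning_rate * gradient
    return w


# Synthetic, noise-free data (an assumption made only for this illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
true_w = np.array([[3.0], [-2.0]])
y = X @ true_w

w = gradient_descent(X, y)
loss = float(np.sum((X @ w - y) ** 2) / (2 * len(y)))
print(w.ravel(), loss)
```

The features here are drawn from a standard normal distribution, so a learning rate of 0.1 converges quickly; data on larger scales generally needs a smaller step.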

In [101]:
# implementing linear regression using Gradient Descent Method with MXNet
from mxnet import nd
from mxnet.gluon import nn
from mxnet import autograd
from mxnet import gluon
from mxnet import np, npx
import mxnet as mx
npx.set_np()
data_ctx = mx.cpu()
model_ctx = mx.cpu()

In [102]:
# reading data from DAT file in NDArray format
data = np.genfromtxt('airfoil_self_noise.dat', delimiter='\t')

In [103]:
# splitting the data into features and labels

features = data[:, :-1]
labels = data[:, -1]
labels = np.reshape(labels, (labels.shape[0], 1))
# convert the data into float32 format
features = features.astype(np.float32)
labels = labels.astype(np.float32)

In [104]:
# splitting the data into training and testing data set

X_train = features[:features.shape[0]*8//10, :]
X_test = features[features.shape[0]*8//10:, :]
y_train = labels[:features.shape[0]*8//10, :]
y_test = labels[features.shape[0]*8//10:, :]

In [105]:
# implementation of the gradient descent 
# declaring the parameters and important variables
no_of_data_points = X_train.shape[0]
no_of_features = X_train.shape[1]
no_of_epochs = 100000

In [106]:
# weights initialisation
weights = np.random.normal(0, 1, (no_of_features, 1))
# bias initialisation
bias = np.random.normal(0, 1, (1, 1))

In [107]:
# setting the parameters for the gradient descent method
learning_rate = 1e-11
num_of_epochs = 100000
derivative_weights = np.zeros((no_of_features, 1))
derivative_intercept = 0

In [110]:
for epoch in range(num_of_epochs):
    # forward pass: predictions and residuals on the training set
    y_pred = np.dot(X_train, weights) + bias
    difference = y_pred - y_train
    # gradients of the squared-error loss; the 1/n factor from the formula
    # is omitted here, so it is effectively absorbed into the learning rate
    derivative_weights = np.dot(X_train.T, difference)
    derivative_intercept = np.sum(difference)
    # updating the weights and intercept
    weights = weights - learning_rate * derivative_weights
    bias = bias - learning_rate * derivative_intercept
    # report progress every 10000 epochs and stop early once the MSE is small enough
    if epoch % 10000 == 0:
        MSE = np.mean(np.square(y_train - y_pred))
        print(f"Epoch No.: {epoch}, MSE: {MSE}, Absolute Error: {np.mean(np.abs(y_train - y_pred))}")
        if MSE < 0.0001:
            break

Epoch No.: 0, MSE: 3392.2109375, Absolute Error: 52.189208984375
Epoch No.: 10000, MSE: 2714.1455078125, Absolute Error: 45.9943962097168
Epoch No.: 20000, MSE: 2248.6962890625, Absolute Error: 41.103965759277344
Epoch No.: 30000, MSE: 1929.163330078125, Absolute Error: 37.34485626220703
Epoch No.: 40000, MSE: 1709.7784423828125, Absolute Error: 34.66575241088867
Epoch No.: 50000, MSE: 1559.184326171875, Absolute Error: 32.86613845825195
Epoch No.: 60000, MSE: 1455.788818359375, Absolute Error: 31.83800506591797
Epoch No.: 70000, MSE: 1384.7685546875, Absolute Error: 31.391223907470703
Epoch No.: 80000, MSE: 1335.9691162109375, Absolute Error: 31.247987747192383
Epoch No.: 90000, MSE: 1302.425537109375, Absolute Error: 31.164520263671875


In [111]:
# calculating the MSE for the testing data
y_pred = np.dot(X_test, weights) + bias
MSE = np.mean(np.square(y_test - y_pred))
print("MSE: ", MSE)

MSE:  1195.8896
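
The test MSE here (1195.89) is far above the OLS result (26.42), and the training MSE was still falling at the last logged epoch, so the run has most likely not converged. One common remedy, which this notebook does not apply, is to standardize the features so that a much larger learning rate becomes stable. The sketch below illustrates the idea in plain NumPy on synthetic data whose two columns span very different ranges; the data, learning rate, and epoch count are all assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
# Two synthetic features on very different scales (illustrative only).
X = np.column_stack([rng.uniform(200, 20000, n), rng.uniform(0.0, 0.05, n)])
y = (0.001 * X[:, 0] + 100.0 * X[:, 1] + 120.0).reshape(-1, 1)

# Standardize each column to zero mean and unit variance.
mean = X.mean(axis=0)
std = X.std(axis=0)
X_scaled = (X - mean) / std

weights = np.zeros((2, 1))
bias = 0.0
learning_rate = 0.1
for epoch in range(1000):
    difference = X_scaled @ weights + bias - y
    # Gradients include the 1/n factor, matching the formula for dL/dw.
    weights = weights - learning_rate * (X_scaled.T @ difference) / n
    bias = bias - learning_rate * float(np.mean(difference))

mse = float(np.mean((X_scaled @ weights + bias - y) ** 2))
print("MSE on standardized synthetic data:", mse)
```

When applying this to a train/test split, the mean and standard deviation would be computed on the training set and reused to transform the test set.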
