### Linear Regression
#### Ordinary Least Squares Method
In this method, we find the regression coefficient weights that minimize the sum of the squared residuals.

Formula:  $$ weights = (X^T \cdot X)^{-1} \cdot X^T \cdot y$$
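
As a brief justification, here is the standard derivation of this closed form (a textbook argument, not something taken from the `LinearRegression` class used in this notebook). Write the sum of squared residuals as a function of the weights, then set its gradient to zero:

$$ S(weights) = (y - X \cdot weights)^T \cdot (y - X \cdot weights) $$

$$ \frac{\partial S}{\partial weights} = -2 \cdot X^T \cdot (y - X \cdot weights) = 0 \;\Rightarrow\; X^T \cdot X \cdot weights = X^T \cdot y $$

Multiplying both sides by $(X^T \cdot X)^{-1}$ gives the formula above. This assumes $X^T \cdot X$ is invertible, that is, the feature columns are linearly independent.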

To find the predicted values, we multiply the feature matrix X with the weights vector.
Formula: $$ y_{pred} = X \cdot weights $$

Mean Squared Error: $$ MSE = \frac{1}{n} \sum_{i=1}^{n} (y_{pred,i} - y_{true,i})^2 $$
where $n$ is the number of samples and $i$ indexes the samples.
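
The three formulas above can be sketched in plain NumPy. This is a minimal illustration, not the `LinearRegression` class imported in this notebook. The function names (`ols_fit`, `ols_predict`, `mean_squared_error`) and the synthetic data are assumptions made up for the example. It also solves the normal equations with `np.linalg.solve` instead of forming the explicit inverse, which gives the same weights when $X^T \cdot X$ is invertible.

```python
import numpy as np


def ols_fit(X, y):
    """Return the weights solving the normal equations (X^T X) w = X^T y."""
    # Solving the linear system is equivalent to multiplying by the inverse
    # when X^T X is invertible, without computing the inverse explicitly.
    return np.linalg.solve(X.T @ X, X.T @ y)


def ols_predict(X, weights):
    """Predicted values: feature matrix times the weights vector."""
    return X @ weights


def mean_squared_error(y_pred, y_true):
    """Average of the squared residuals."""
    return float(np.mean((y_pred - y_true) ** 2))


# Synthetic, noise-free data (hypothetical): y depends linearly on two features
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 2))
true_w = np.array([[2.0], [-3.0]])
y_demo = X_demo @ true_w

w_demo = ols_fit(X_demo, y_demo)
mse_demo = mean_squared_error(ols_predict(X_demo, w_demo), y_demo)
```

Because the synthetic labels contain no noise, `w_demo` should recover `true_w` up to floating-point error and `mse_demo` should be close to zero.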

In [1]:
# importing the libraries
import numpy as np
import mxnet as mx
from mxnet import ndarray as nd
from LinearRegression import LinearRegression

In [2]:
# read the tab-separated DAT file into a NumPy array (converted to NDArray in the next cell)
data_ctx = mx.cpu()
model_ctx = mx.cpu()
# the with-block closes the file even if loading fails
with open("airfoil_self_noise.dat", "r") as data_file:
    data = np.loadtxt(data_file, delimiter="\t")

In [3]:
# split data into features and labels
features = nd.array(data[:, 0:-1], ctx=data_ctx)
labels = nd.array(data[:, -1], ctx=data_ctx)

In [4]:
# splitting the data into training and testing sets (first 80% training, last 20% testing, no shuffling)
split = int(len(features) * 0.8)
X_train = features[:split]
X_test = features[split:]
y_train = labels[:split]
y_test = labels[split:]
# reshape the labels into column vectors
y_train = y_train.reshape((len(y_train), 1))
y_test = y_test.reshape((len(y_test), 1))

In [5]:
# printing the shapes of the training and testing sets to check dimensions
print(X_train.shape)
print(X_test.shape)
print(y_train.shape)
print(y_test.shape)

(1202, 5)
(301, 5)
(1202, 1)
(301, 1)


In [6]:
# implementing linear regression using OLS Method
linear_regression = LinearRegression()
weights = linear_regression.OLS_fit(X_train, y_train)
y_pred = linear_regression.OLS_predict(X_test, weights)
mse = nd.mean(nd.square(y_test - y_pred))

In [7]:
# printing the result of the OLS Method
print("MSE: ", mse.asscalar())

MSE:  26.421574


#### Linear Regression with Gradient Descent
In this method, we start from initial weights and update them iteratively in the direction that reduces the mean of the squared residuals, instead of solving for them in closed form.

The loss function is:
Formula: $$ L = \frac{1}{2n} \sum_{i=1}^{n} (y_{pred,i} - y_{true,i})^2 $$

The gradient of the loss function with respect to the weights is:
Formula: $$ \frac{\partial L}{\partial w} = \frac{1}{n} \sum_{i=1}^{n} (y_{pred,i} - y_{true,i}) \cdot x_i $$
where $x_i$ is the feature vector of sample $i$.

The weights are updated as:
Formula: $$ w = w - \alpha \cdot \frac{\partial L}{\partial w} $$
where $\alpha$ is the learning rate.
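
The loss, gradient, and update rule above can be sketched as a short NumPy loop. This is an illustrative full-batch version under stated assumptions: the function name `gradient_descent_fit`, the zero initialization, the hyperparameter values, and the synthetic data are all hypothetical, and the notebook itself uses MXNet with mini-batches rather than this loop.

```python
import numpy as np


def gradient_descent_fit(X, y, learning_rate=0.1, epochs=500):
    """Full-batch gradient descent on L = 1/(2n) * sum of squared residuals."""
    n, d = X.shape
    w = np.zeros((d, 1))  # assumed starting point for the weights
    for _ in range(epochs):
        residual = X @ w - y            # y_pred - y_true, shape (n, 1)
        grad = X.T @ residual / n       # dL/dw = 1/n * sum(residual_i * x_i)
        w = w - learning_rate * grad    # w = w - alpha * dL/dw
    return w


# Synthetic, noise-free data (hypothetical) with roughly unit-variance features
rng = np.random.default_rng(0)
X_gd = rng.normal(size=(200, 2))
w_true = np.array([[1.5], [-0.5]])
y_gd = X_gd @ w_true

w_gd = gradient_descent_fit(X_gd, y_gd)
```

On features with very different scales, as raw measurement data often has, a learning rate this large may diverge, which is one reason a much smaller value or feature standardization is commonly used.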

In [None]:
# implementing linear regression using Gradient Descent Method with MXNet
import mxnet as mx
from mxnet import autograd, nd

# defining the hyperparameters
epochs = 1000
learning_rate = 0.0001
batch_size = 10

# defining the model
linear_regression = LinearRegression()
linear_regression.initialize(mx.init.Normal(sigma=0.01), ctx=model_ctx)

# defining the loss function
square_loss = mx.gluon.loss.L2Loss()
