

*   Write a Python code to implement Linear Regression for multi-dimensional input and one-dimensional output using Matrix Inverse. You can use NumPy to do matrix inverse, but you are encouraged to write your own code for this task also.

*   Verify your results using the scikit-learn Linear Regression package.

*   Write a code to minimise the squared error function using Gradient Descent, and compare the results with the above methods.
*   Find the best fit hyperplane for the four synthetic datasets attached. Two of them will directly give good results with the usual Linear Regression algo, one of them will require a non-linear transformation of the input features, and for one of them the standard Linear Regression algo is not suitable. You need to figure out which of the 4 datasets belongs to which of these categories, with proper reasoning.




Submission Deadline : Jan 27, 2024 (Saturday)

Submission Form : https://forms.gle/bLDU3WY1P82Uer6H6





**Data Set - 4**

In [1]:
# importing required module and packages
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error,mean_squared_error,r2_score
import plotly.express as px

In [2]:
# reading the csv file
df = pd.read_csv("Data4.csv")
# adding column xo with values 1 at position of 2nd column to adjust the constant
df.insert(1, "x0", 1)
# initializing the value for X and Y
Y = np.array(df["y"])
X = np.array(df[["x0", "x1", "x2", "x3"]])
df.head(5)

Unnamed: 0,Sl.,x0,x1,x2,x3,y
0,1,1,0.1,0.311089,0.390541,14.705876
1,2,1,0.2,0.639066,1.389918,26.5851
2,3,1,0.3,0.840228,1.939903,36.791509
3,4,1,0.4,1.29971,2.153009,34.090806
4,5,1,0.5,0.941784,0.945136,31.118089


In [3]:
# explicit values of features x1,x2,x3
x1 = df["x1"]
x2 = df["x2"]
x3 = df["x3"]

# New Section

In [4]:
def md_linear_regression(X, Y):
    sum_xiyi = np.zeros(X.shape[1])
    # Evaluating the sum of dot products of xi and yi
    for i in range(len(Y)):
        sum_xiyi += Y[i] * X[i]

    # Evaluating the sum of outer products of xi
    sum_xixit = 0
    for i in range(len(Y)):
        xi = X[i].reshape((1, -1))
        transposed_xi = xi.T
        sum_xixit += np.dot(transposed_xi, xi) # Taking the transpose

    print("Sum of dot products of xi and yi:")
    print(sum_xiyi)

    print("Sum of dot products of xi and xi transpose:")
    print(sum_xixit)

    # Assuming A is the sum of dot products of xi
    A = sum_xixit
    print("Shape of A:", A.shape)

    # Evaluating  the inverse using numpy
    inverse_A = np.linalg.inv(A)
    print("Inverse of A:")
    print(inverse_A)

    s = np.dot(inverse_A, sum_xiyi)
    return s

In [5]:
# applying the linear regression model
output = md_linear_regression(X, Y)
print("\n")
print("The Coefficiants corresponding to x_i : ", output)


Sum of dot products of xi and yi:
[10441.80045016 66074.55450872 71379.27277894 76343.57296322]
Sum of dot products of xi and xi transpose:
[[ 100.          505.          554.04577271  606.12649879]
 [ 505.         3383.5        3641.01643904 3863.4271195 ]
 [ 554.04577271 3641.01643904 3929.68278392 4171.37422047]
 [ 606.12649879 3863.4271195  4171.37422047 4472.48767615]]
Shape of A: (4, 4)
Inverse of A:
[[ 0.11668684  0.0911748  -0.05419394 -0.04402717]
 [ 0.0911748   0.17232455 -0.13967998 -0.03093789]
 [-0.05419394 -0.13967998  0.1444378  -0.00671043]
 [-0.04402717 -0.03093789 -0.00671043  0.03917373]]


The Coefficiants corresponding to x_i :  [13.23947782  6.13243763  2.39226554  7.74681038]


In [6]:
# actual values
actual_values = np.array(df["y"])
#predicted values
predicted_values = np.dot(X, output)

In [7]:
#Actual value and the predicted value
reg_model_diff = pd.DataFrame({'Actual value': actual_values, 'Predicted value': predicted_values})
reg_model_diff

Unnamed: 0,Actual value,Predicted value
0,14.705876,17.622378
1,26.585100,26.762216
2,36.791509,32.117316
3,34.090806,35.480661
4,31.118089,25.880484
...,...,...
95,183.744594,174.148177
96,175.907274,179.536429
97,183.992845,176.255167
98,173.920425,179.586624


In [8]:
# setting the values of X and Y
X = df[['x1', 'x2', 'x3']]
Y = df['y']

In [9]:
# applying the sk learn model
regr = LinearRegression()
regr.fit(X, Y)
print("Slopes: ",regr.coef_)
print("Intercept: ",regr.intercept_)

Slopes:  [6.13243763 2.39226554 7.74681038]
Intercept:  13.239477824445359


In [10]:
# predicted values by sk learn regression model
predicted_Y_SK = []
for i in range(100):
  r1 = 13.239477824445359 + 6.13243763 * x1[i] + 2.39226554 * x2[i] + 7.74681038 * x3[i]
  predicted_Y_SK.append(r1)

In [11]:
print(predicted_values)
print(predicted_Y_SK)

[ 17.62237823  26.76221602  32.11731638  35.48066075  25.880484
  34.59029681  33.78031148  36.42275747  40.56514627  42.68978999
  42.24372491  37.83247816  42.31963654  46.01688725  52.82945638
  46.67350452  55.67055561  58.32541674  45.2429466   56.10802161
  57.93205744  64.7319409   61.96644942  60.85882881  64.12205602
  64.52541358  66.7441677   73.38687232  71.45581653  68.09593516
  72.34131982  77.609325    73.1817339   81.69309361  80.05936156
  76.08451525  86.25302341  83.95440248  90.88977097  88.28551595
  93.70954052  86.84553065  83.67370282  86.4181877  102.47512509
 103.02855475  95.7290018   95.20841065  96.48888031  95.88655689
 104.47470489 109.00980514 109.63881763 101.70570521 114.31434827
 111.76728317 119.00026129 111.68783713 118.46428629 123.95349877
 119.69384212 126.28372658 122.06820051 119.24719724 128.6341511
 129.48464845 126.22720209 129.74093708 134.87150721 134.22378473
 140.09632654 133.6523858  149.15560245 138.4341003  147.6819663
 139.90163235 

**Verification Of My own LinearRegession Model Relative To SK  Learn Model**

In [12]:
# Error between "prediction by My sk learn Regression model" and "given data."
mae = mean_absolute_error(y_true = actual_values,y_pred = predicted_Y_SK )
#squared True returns MSE value, False returns RMSE value.
mse = mean_squared_error(y_true=actual_values,y_pred = predicted_Y_SK) #default=True
rmse = mean_squared_error(y_true=actual_values,y_pred = predicted_Y_SK,squared=False)
r_square = r2_score(actual_values,predicted_Y_SK)

print("MAE:",mae)
print("MSE:",mse)
print("RMSE:",rmse)
print(r_square)

MAE: 5.15550562646378
MSE: 34.62048082924356
RMSE: 5.883917133104745
0.9841749058943147


In [13]:
# Error between "prediction by My Own Regression model" and "given data."
mae = mean_absolute_error(y_true = actual_values,y_pred = predicted_values )
#squared True returns MSE value, False returns RMSE value.
mse = mean_squared_error(y_true=actual_values,y_pred = predicted_values) #default=True
rmse = mean_squared_error(y_true=actual_values,y_pred = predicted_values,squared=False)
r_square = r2_score(actual_values,predicted_values)

print("MAE:",mae)
print("MSE:",mse)
print("RMSE:",rmse)
print(r_square)

MAE: 5.155505630377769
MSE: 34.62048082924355
RMSE: 5.8839171331047435
0.9841749058943147


**Method-2(Using Gradient Descent)**

In [14]:
def gradient_descent(x1, x2, x3, y, learning_rate, epoch):

    total_data_point = len(y)
    const = 0 #initialising with 0.
    weight1 = 0
    weight2 = 0
    weight3 = 0

    for _ in range(epoch):

        sum = 0
        for i in range(total_data_point):
            r = const + weight1*x1[i] +  weight2*x2[i] +  weight3*x3[i] - y[i]
            sum = sum + r
        const_new = const - learning_rate * ((2/total_data_point) * sum)
        const = const_new

        sum = 0
        for i in range(total_data_point):
            r = const*x1[i] + weight1*x1[i]*x1[i] + weight2*x2[i]*x1[i] + weight3*x3[i]*x1[i] - y[i]*x1[i]
            sum = sum + r
        weight1_new = weight1 - learning_rate * ((2/total_data_point) * sum)
        weight1 = weight1_new

        sum = 0
        for i in range(total_data_point):
            r = const*x1[i] + weight1*x1[i]*x2[i] + weight2*x2[i]*x2[i] + weight3*x3[i]*x2[i] - y[i]*x2[i]
            sum = sum + r
        weight2_new = weight2 - learning_rate * ((2/total_data_point) * sum)
        weight2 = weight1_new

        sum = 0
        for i in range(total_data_point):
            r = const*x3[i] + weight1*x1[i]*x3[i] + weight2*x2[i]*x3[i] + weight3*x3[i]*x3[i] - y[i]*x3[i]
            sum = sum + r
        weight3_new = weight3 - learning_rate * ((2/total_data_point) * sum)
        weight3 = weight3_new

    return (weight1, weight2, weight3, const)

result = gradient_descent(x1, x2, x3, Y, 0.015, 5000)
print(result)

# Slopes:  [6.13243763 2.39226554 7.74681038]
# Intercept:  13.239477824445359


(4.293667771213319, 4.293667771213319, 7.658473605875606, 12.526059486410091)


**Stochastic Gradient Descent**

In [15]:
def stochastic_gradient_descent(x1, x2, x3, y, learning_rate):

    n = len(Y) #total number of datasets.
    c = 0 #initialising with 0.
    weight1 = 0
    weight2 = 0
    weight3 = 0
    k = 1

    for _ in range(1000):  #epoch

        j = len(y)
        while j > k:
            sum = 0
            for i in range(k):
                i=j-i-1
                q = c + weight1*x1[i] +  weight2*x2[i] +  weight3*x3[i] - y[i]
                sum = sum + q
            c_new = c - learning_rate * ((2/n) * sum)
            c = c_new
            # print(c)
            j = j-k

        j = len(y)
        while j > k:
            sum = 0
            for i in range(k):
                i=j-i-1
                q = c*x1[i] + weight1*x1[i]*x1[i] + weight2*x2[i]*x1[i] + weight3*x3[i]*x1[i] - y[i]*x1[i]
                sum = sum + q
            weight1_new = weight1 - learning_rate * ((2/n) * sum)
            weight1 = weight1_new
            # print(m)
            j = j-k


        j = len(y)
        while j > k:
            sum = 0
            for i in range(k):
                i=j-i-1
                q = c*x2[i] + weight1*x1[i]*x2[i] + weight2*x2[i]*x2[i] + weight3*x3[i]*x2[i] - y[i]*x2[i]
                sum = sum + q
            weight2_new = weight2 - learning_rate * ((2/n) * sum)
            weight2 = weight2_new
            # print(m)
            j = j-k

        j = len(y)
        while j > k:
            sum = 0
            for i in range(k):
                i=j-i-1
                q = c*x3[i] + weight1*x1[i]*x3[i] + weight2*x2[i]*x3[i] + weight3*x3[i]*x3[i] - y[i]*x3[i]
                sum = sum + q
            weight3_new = weight3 - learning_rate * ((2/n) * sum)
            weight3 = weight3_new
            # print(m)
            j = j-k

    return (weight1, weight2, weight3, c)

shuffle = np.random.permutation(len(Y))
v = np.array(x1)
s_x1=v[shuffle]

v = np.array(x2)
s_x2=v[shuffle]

v = np.array(x3)
s_x3=v[shuffle]

l = np.array(Y)
s_y=l[shuffle]

d=stochastic_gradient_descent(s_x1, s_x2, s_x3, s_y, 0.01)
print(d)

# (6.3412695431008865, 2.724677446942416, 7.164331162929843, 13.783425782256382) one time I got this

(4.3741729576716395, 3.4622628316990345, 8.462603487739324, 11.868427924014178)


In [16]:
predicted_Y_SGD = []
for i in range(100):
  r1 = 13.783425782256382+ 6.3412695431008865 * x1[i] + 2.724677446942416 * x2[i] + 7.164331162929843 * x3[i]
  predicted_Y_SGD.append(r1)

In [17]:
# Error between "prediction by My Own Regression model" and "given data."
mae = mean_absolute_error(y_true = Y,y_pred = predicted_Y_SGD )
#squared True returns MSE value, False returns RMSE value.
mse = mean_squared_error(y_true= Y,y_pred = predicted_Y_SGD) #default=True
rmse = mean_squared_error(y_true= Y,y_pred = predicted_Y_SGD,squared=False)
r_square = r2_score(actual_values,predicted_values)
print(r_square)
print("MAE:",mae)
print("MSE:",mse)
print("RMSE:",rmse)

0.9841749058943147
MAE: 5.167976423609323
MSE: 34.72110319659666
RMSE: 5.892461556649875
