**Analytic Method**

### 1. Implement Linear Regression and calculate sum of residual error on the following Datasets.

 *x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]*

 *y = [1, 3, 2, 5, 7, 8, 8, 9, 10, 12]*
### Compute the regression coefficients using analytic formulation and calculate Sum Squared Error (SSE) and R2 value.
### Implement gradient descent (both Full-batch and Stochastic with stopping criteria) on Least Mean Square loss formulation to compute the coefficients of regression matrix and compare the results using performance measures such as R2 SSE etc.



In [None]:
X = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Y = [1, 3, 2, 5, 7, 8, 8, 9, 10, 12]

mean_X = sum(X) / len(X)
mean_Y = sum(Y) / len(Y)

num=0
den = 0

for i in range(len(X)):
    num += (X[i] - mean_X) * (Y[i] - mean_Y)
    den += (X[i] - mean_X) ** 2

beta1 = num / den
beta0 = mean_Y - beta1 * mean_X

print("Slope (beta1):", beta1)
print("Intercept (beta0):", beta0)

#predicted Y and SSE
predicted_Y = [beta1 * x + beta0 for x in X]
SSE = sum((Y[i] - predicted_Y[i]) ** 2 for i in range(len(Y)))

print("Sum of Squared Errors (SSE):", SSE)

#r2
SS_total = sum((Y[i] - mean_Y) ** 2 for i in range(len(Y)))
R_squared = 1 - (SSE / SS_total)

print("R-square:", R_squared)
print("Linear Regression Equation is y = ",beta1,"X + ", beta0)

Slope (beta1): 1.1696969696969697
Intercept (beta0): 1.2363636363636363
Sum of Squared Errors (SSE): 5.624242424242421
R-square: 0.952538038613988
Linear Regression Equation is y =  1.1696969696969697 X +  1.2363636363636363


**Full Batch Gradient Descent**

In [None]:
learning_rate = 0.01
iterations = 1000
beta0 = 0
beta1 = 0

# Full batch gradient descent
for _ in range(iterations):
    gradient_beta0 = 0
    gradient_beta1 = 0
    for i in range(len(X)):
        gradient_beta0 += (beta0 + beta1 * X[i] - Y[i])
        gradient_beta1 += (beta0 + beta1 * X[i] - Y[i]) * X[i]
    beta0 = beta0 - (learning_rate * gradient_beta0 / len(X))
    beta1 = beta1 - (learning_rate * gradient_beta1 / len(X))

print("Full Batch Gradient Descent:")
print("Intercept (beta0):", beta0)
print("Slope (beta1):", beta1)

predicted_Y = [beta0 + beta1 * x for x in X]

#SSE
SSE = sum((Y[i] - predicted_Y[i]) ** 2 for i in range(len(Y)))

print("Sum of Squared Errors (SSE):", SSE)

#r2
mean_Y = sum(Y) / len(Y)
SS_total = sum((Y[i] - mean_Y) ** 2 for i in range(len(Y)))
R_squared = 1 - (SSE / SS_total)

print("R-square:", R_squared)

Full Batch Gradient Descent:
Intercept (beta0): 1.175803611388339
Slope (beta1): 1.1793547634798334
Sum of Squared Errors (SSE): 5.634861529064237
R-square: 0.9524484259150697


**Stochiastic Gradient Descent**

In [None]:
learning_rate = 0.01
iterations = 1000
beta0 = 0
beta1 = 0

# Stochastic gradient descent
for _ in range(iterations):
    for i in range(len(X)):
        gradient_beta0 = beta0 + beta1 * X[i] - Y[i]
        gradient_beta1 = (beta0 + beta1 * X[i] - Y[i]) * X[i]
        beta0 = beta0 - learning_rate * gradient_beta0
        beta1 = beta1 - learning_rate * gradient_beta1

print("\nStochastic Gradient Descent:")
print("Intercept (beta0):", beta0)
print("Slope (beta1):", beta1)

predicted_Y = [beta0 + beta1 * x for x in X]

#SSE
SSE = sum((Y[i] - predicted_Y[i]) ** 2 for i in range(len(Y)))

print("Sum of Squared Errors (SSE):", SSE)

#r2
mean_Y = sum(Y) / len(Y)
SS_total = sum((Y[i] - mean_Y) ** 2 for i in range(len(Y)))
R_squared = 1 - (SSE / SS_total)

print("R-square:", R_squared)


Stochastic Gradient Descent:
Intercept (beta0): 1.1434234183192076
Slope (beta1): 1.1924414227954474
Sum of Squared Errors (SSE): 5.667805958642662
R-square: 0.9521704138511168


### 2.Download Boston Housing Rate Dataset. Analyse the input attributes and find out the attribute that best follow the linear relationship with the output price. Implement both the analytic formulation and gradient descent (Full-batch, stochastic) on LMS lossformulation to compute the coefficients of regression matrix and compare the results.

In [1]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import requests
from io import StringIO

url = "https://archive.ics.uci.edu/ml/machine-learning-databases/housing/housing.data"
response = requests.get(url)
data = response.text

names = ["CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE", "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT", "MEDV"]
df = pd.read_csv(StringIO(data), delim_whitespace=True, names=names)

X = df.drop(columns=['MEDV']).values
y = df['MEDV'].values

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

correlation_coefficients = np.abs(np.corrcoef(X_train_scaled.T, y_train)[0, 1:])

best_attribute_index = np.argmax(correlation_coefficients)
best_attribute_name = names[best_attribute_index]
print("Attribute with the highest correlation with price:", best_attribute_name)

X_train_with_bias = np.c_[np.ones((X_train_scaled.shape[0], 1)), X_train_scaled]
analytic_solution = np.linalg.inv(X_train_with_bias.T.dot(X_train_with_bias)).dot(X_train_with_bias.T).dot(y_train)

def full_batch_gradient_descent(X, y, learning_rate=0.01, n_iterations=1000):
    m = len(y)
    theta = np.random.randn(X.shape[1])
    for iteration in range(n_iterations):
        gradients = 2/m * X.T.dot(X.dot(theta) - y)
        theta -= learning_rate * gradients
    return theta

def stochastic_gradient_descent(X, y, learning_rate=0.01, n_iterations=1000):
    m = len(y)
    theta = np.random.randn(X.shape[1])
    for iteration in range(n_iterations):
        for i in range(m):
            random_index = np.random.randint(m)
            xi = X[random_index:random_index+1]
            yi = y[random_index:random_index+1]
            gradients = 2 * xi.T.dot(xi.dot(theta) - yi)
            theta -= learning_rate * gradients
    return theta

theta_full_batch = full_batch_gradient_descent(X_train_with_bias, y_train)

theta_stochastic = stochastic_gradient_descent(X_train_with_bias, y_train)

print("Analytic solution coefficients:", analytic_solution)

Attribute with the highest correlation with price: DIS
Analytic solution coefficients: [22.79653465 -1.00213533  0.69626862  0.27806485  0.7187384  -2.0223194
  3.14523956 -0.17604788 -3.0819076   2.25140666 -1.76701378 -2.03775151
  1.12956831 -3.61165842]


In [2]:
print("Full-batch gradient descent coefficients:", theta_full_batch)

Full-batch gradient descent coefficients: [22.79653462 -0.95188403  0.56191963  0.10717462  0.74568816 -1.95105073
  3.20563282 -0.17932266 -2.96328203  1.69982491 -1.15978267 -2.02592308
  1.1305053  -3.58399282]


In [3]:
print("Stochastic gradient descent coefficients:", theta_stochastic)

Stochastic gradient descent coefficients: [22.65500657 -0.42834202  1.34458471 -0.25347829 -0.63088908 -1.81602241
  2.75898688 -0.16921792 -2.29956408  1.65546777 -2.50559512 -2.31705352
  0.88137425 -2.82715737]
