Imports

In [19]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


1. Load & Observe Dataset

In [7]:
data = pd.read_csv("student.csv")

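# Only info() prints directly. head(), tail() and describe() return DataFrames,
# and a notebook displays a returned value only when it is the cell's last
# expression, so wrap them in print() (or display() in Jupyter) to inspect them.
data.head()
data.tail()
data.info()
data.describe()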

# Features (X) and Label/Target (Y)
X = data[["Math", "Reading"]].values.astype(float)
Y = data["Writing"].values.astype(float)

print("X shape:", X.shape)  # (n, 2)
print("Y shape:", Y.shape)  # (n,)




<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 3 columns):
 #   Column   Non-Null Count  Dtype
---  ------   --------------  -----
 0   Math     1000 non-null   int64
 1   Reading  1000 non-null   int64
 2   Writing  1000 non-null   int64
dtypes: int64(3)
memory usage: 23.6 KB
X shape: (1000, 2)
Y shape: (1000,)


2. Feature Matrix (X), Target (Y) and Weight Vector (W)

In [8]:
# W (weights) for 2 features: Math, Reading
W = np.zeros(X.shape[1])  # shape = (2,)

# Prediction rule (no bias): y_pred = X dot W
Y_pred_demo = X @ W

print("W shape:", W.shape)
print("Example predictions shape:", Y_pred_demo.shape)



W shape: (2,)
Example predictions shape: (1000,)
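The prediction rule X @ W computes, for each row, the dot product of that student's Math and Reading scores with the two weights. A small sketch with made-up numbers (the arrays X_demo and W_demo are hypothetical, not taken from student.csv) shows that the matrix product matches an explicit row-by-row weighted sum:

```python
import numpy as np

# Hypothetical scores for three students: columns are Math, Reading
X_demo = np.array([[70.0, 80.0],
                   [50.0, 60.0],
                   [90.0, 85.0]])
# Hypothetical weights for Math and Reading (no bias term)
W_demo = np.array([0.25, 0.75])

# Matrix-vector product: one prediction per row
pred_matmul = X_demo @ W_demo
# The same computation written out: w_math * Math + w_reading * Reading
pred_manual = W_demo[0] * X_demo[:, 0] + W_demo[1] * X_demo[:, 1]

print(pred_matmul)                            # predictions 77.5, 57.5 and 86.25
print(np.allclose(pred_matmul, pred_manual))  # True
```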


3. Train–Test Split (from scratch)

In [9]:
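def train_test_split_np(X, Y, test_size=0.2, random_state=42):
    # Shuffle the row indices reproducibly with a seeded generator
    rng = np.random.default_rng(random_state)
    idx = np.arange(len(X))
    rng.shuffle(idx)

    # The first test_n shuffled indices form the test set, the rest the training set
    test_n = int(len(X) * test_size)
    test_idx = idx[:test_n]
    train_idx = idx[test_n:]

    return X[train_idx], X[test_idx], Y[train_idx], Y[test_idx]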

X_train, X_test, Y_train, Y_test = train_test_split_np(X, Y, test_size=0.2, random_state=42)

print("Train:", X_train.shape, Y_train.shape)
print("Test :", X_test.shape, Y_test.shape)



Train: (800, 2) (800,)
Test : (200, 2) (200,)
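The split is only valid if every row lands in exactly one of the two sets. A quick sketch that reproduces the index logic of train_test_split_np (the helper name split_indices is hypothetical) checks that the sets are disjoint, that together they cover all 1000 rows, and that the same seed gives the same split:

```python
import numpy as np

def split_indices(n, test_size=0.2, random_state=42):
    # Hypothetical helper mirroring the index logic of train_test_split_np
    rng = np.random.default_rng(random_state)
    idx = np.arange(n)
    rng.shuffle(idx)
    test_n = int(n * test_size)
    return idx[test_n:], idx[:test_n]  # (train_idx, test_idx)

train_idx, test_idx = split_indices(1000)

# Disjoint: no row appears in both sets
overlap = np.intersect1d(train_idx, test_idx)
# Complete: the two sets together contain every row exactly once
covered = np.array_equal(np.sort(np.concatenate([train_idx, test_idx])), np.arange(1000))
# Reproducible: the same seed yields the same test rows
same_again = np.array_equal(split_indices(1000)[1], test_idx)

print(len(train_idx), len(test_idx))      # 800 200
print(overlap.size, covered, same_again)  # 0 True True
```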


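4. Cost Function (Mean Squared Error, halved)

In [10]:
def cost_function(X, Y, W):
    # J(W) = (1 / 2n) * sum((X @ W - Y)^2), i.e. half the mean squared error.
    # The 1/2 cancels the factor 2 produced when differentiating.
    n = len(Y)
    Y_pred = X @ W
    cost = (1/(2*n)) * np.sum((Y_pred - Y)**2)
    return cost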
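In formula form, with x_i the i-th row of X (a student's Math and Reading scores) and y_i the matching Writing score, the cost is half the MSE, which ties it directly to the RMSE metric computed later:

```latex
J(W) = \frac{1}{2n} \sum_{i=1}^{n} \left( x_i^\top W - y_i \right)^2
     = \frac{1}{2}\,\mathrm{MSE}(W),
\qquad
\mathrm{RMSE}(W) = \sqrt{2\,J(W)}
```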



5. Pass the given test case (cost must be 0)

In [11]:
X_test_case = np.array([[1, 2], [3, 4], [5, 6]])
Y_test_case = np.array([3, 7, 11])
W_test_case = np.array([1, 1])

cost = cost_function(X_test_case, Y_test_case, W_test_case)

if np.isclose(cost, 0):
    print("Proceed Further")
else:
    print("Something went wrong: reimplement the cost function")
    print("Cost function output:", cost)



Proceed Further
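The given test case only exercises the zero-cost path. A second hand-computed case (an extra check, not part of the assignment) also tests the 1/(2n) scaling: with W = [0, 0] every prediction is 0, so the cost is (3² + 7² + 11²) / (2·3) = 179/6 ≈ 29.83.

```python
import numpy as np

def cost_function(X, Y, W):
    # Halved mean squared error, as in the notebook
    n = len(Y)
    Y_pred = X @ W
    return (1 / (2 * n)) * np.sum((Y_pred - Y) ** 2)

X_case = np.array([[1, 2], [3, 4], [5, 6]])
Y_case = np.array([3, 7, 11])
W_zero = np.array([0, 0])

cost = cost_function(X_case, Y_case, W_zero)
print(cost)                        # about 29.8333
print(np.isclose(cost, 179 / 6))  # True
```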


6. Gradient Descent (No Bias)

In [12]:
def gradient_descent(X, Y, W, alpha, iterations):
    n = len(Y)
    cost_history = []

    for _ in range(iterations):
        Y_pred = X @ W
        gradient = (1/n) * (X.T @ (Y_pred - Y))
        W = W - alpha * gradient
        cost_history.append(cost_function(X, Y, W))

    return W, np.array(cost_history)
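Each update uses the analytic gradient of the halved MSE, (1/n) Xᵀ(XW − Y). A common sanity check, sketched here on small random data (the helper names analytic_gradient and numerical_gradient are hypothetical), compares it with a central finite-difference estimate. Because the cost is quadratic, the two should agree up to rounding error:

```python
import numpy as np

def cost_function(X, Y, W):
    # Halved mean squared error, as in the notebook
    n = len(Y)
    return (1 / (2 * n)) * np.sum((X @ W - Y) ** 2)

def analytic_gradient(X, Y, W):
    # Gradient used inside gradient_descent: (1/n) * X^T (XW - Y)
    n = len(Y)
    return (1 / n) * (X.T @ (X @ W - Y))

def numerical_gradient(X, Y, W, eps=1e-6):
    # Central differences: (J(W + eps*e_j) - J(W - eps*e_j)) / (2*eps)
    grad = np.zeros_like(W)
    for j in range(len(W)):
        step = np.zeros_like(W)
        step[j] = eps
        grad[j] = (cost_function(X, Y, W + step) - cost_function(X, Y, W - step)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
Y = rng.normal(size=50)
W = rng.normal(size=2)

g_analytic = analytic_gradient(X, Y, W)
g_numeric = numerical_gradient(X, Y, W)
print(np.allclose(g_analytic, g_numeric, atol=1e-6))  # True
```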



7. Train the Model (Run Gradient Descent)

In [13]:
W_init = np.zeros(X_train.shape[1])
alpha = 0.00001
iterations = 1000

W_optimal, cost_history = gradient_descent(X_train, Y_train, W_init, alpha, iterations)

print("Final W:", W_optimal)
print("First 10 costs:", cost_history[:10])



Final W: [0.35107246 0.64360849]
First 10 costs: [1992.85006903 1628.24690693 1330.93045861 1088.48291521  890.77818311
  729.55894631  598.0917822   490.88592601  403.4639374   332.17469078]


8. RMSE

In [14]:
def rmse(Y, Y_pred):
    return np.sqrt(np.mean((Y_pred - Y)**2))
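As a hand check with illustrative values (not from the dataset): for targets [1, 2, 3] and predictions [1, 2, 5], the squared errors are [0, 0, 4], their mean is 4/3, and the RMSE is √(4/3) ≈ 1.155. RMSE is in the same units as the target, which is why a test RMSE can be read directly in Writing-score points.

```python
import numpy as np

def rmse(Y, Y_pred):
    # Square root of the mean squared error, in the same units as Y
    return np.sqrt(np.mean((Y_pred - Y) ** 2))

Y_true = np.array([1.0, 2.0, 3.0])
Y_hat = np.array([1.0, 2.0, 5.0])

print(rmse(Y_true, Y_hat))                              # about 1.155
print(np.isclose(rmse(Y_true, Y_hat), np.sqrt(4 / 3)))  # True
```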



9. R² Score

In [16]:
def r2(Y, Y_pred):
    mean_y = np.mean(Y)
    ss_tot = np.sum((Y - mean_y)**2)
    ss_res = np.sum((Y - Y_pred)**2)
    return 1 - (ss_res / ss_tot)
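R² compares the model's squared error with that of always predicting the mean of Y. Three illustrative cases (hypothetical values) make the scale concrete: perfect predictions give 1, predicting the mean gives 0, and predictions worse than the mean give a negative score. Because r2 divides by ss_tot, it is undefined when every target value is identical.

```python
import numpy as np

def r2(Y, Y_pred):
    # 1 - (residual sum of squares / total sum of squares)
    mean_y = np.mean(Y)
    ss_tot = np.sum((Y - mean_y) ** 2)
    ss_res = np.sum((Y - Y_pred) ** 2)
    return 1 - (ss_res / ss_tot)

Y_true = np.array([1.0, 2.0, 3.0])  # mean 2, ss_tot = 2

perfect = r2(Y_true, Y_true)                       # ss_res = 0 -> 1.0
mean_only = r2(Y_true, np.full(3, Y_true.mean()))  # ss_res = 2 -> 0.0
worse = r2(Y_true, np.array([1.0, 2.0, 5.0]))      # ss_res = 4 -> -1.0

print(perfect, mean_only, worse)  # 1.0 0.0 -1.0
```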



10. Main Function

In [17]:
def main():
    data = pd.read_csv("student.csv")

    X = data[["Math", "Reading"]].values.astype(float)
    Y = data["Writing"].values.astype(float)

    X_train, X_test, Y_train, Y_test = train_test_split_np(X, Y, test_size=0.2, random_state=42)

    W = np.zeros(X_train.shape[1])
    alpha = 0.00001
    iterations = 1000

    W_optimal, cost_history = gradient_descent(X_train, Y_train, W, alpha, iterations)

    Y_pred = X_test @ W_optimal

    model_rmse = rmse(Y_test, Y_pred)
    model_r2 = r2(Y_test, Y_pred)

    print("Final Weights:", W_optimal)
    print("Cost History (First 10):", cost_history[:10])
    print("RMSE on Test Set:", model_rmse)
    print("R-Squared on Test Set:", model_r2)

if __name__ == "__main__":
    main()


Final Weights: [0.35107246 0.64360849]
Cost History (First 10): [1992.85006903 1628.24690693 1330.93045861 1088.48291521  890.77818311
  729.55894631  598.0917822   490.88592601  403.4639374   332.17469078]
RMSE on Test Set: 5.131334826568035
R-Squared on Test Set: 0.8889734095560186


11. Findings (Overfitting/Underfitting and Learning-Rate Experiment)

1. Model Performance Analysis

The performance of the linear regression model is acceptable.

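The model is unlikely to overfit: it has only two weights and no bias term, which leaves little capacity to memorize the 800 training samples. The notebook reports only test-set metrics, however, so the claim that training and test errors are close should be confirmed by computing the training RMSE, rmse(Y_train, X_train @ W_optimal), and comparing it with the test RMSE.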

The model does not underfit because it is able to learn a strong relationship between the features (Math and Reading) and the target (Writing).

The test RMSE of about 5.13 means predictions are typically off by roughly 5 Writing-score points, and the test R² of about 0.889 indicates that the model explains roughly 89% of the variance in the test-set Writing scores.

Conclusion:
The model generalizes well and its performance is acceptable.
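A minimal sketch of the training-versus-test comparison that would back up the overfitting claim. Because student.csv is not reproduced here, it generates hypothetical score-like data (the generator and its coefficients are assumptions chosen only to resemble the Math/Reading/Writing setup) and reuses the notebook's split and RMSE logic with a trimmed gradient_descent that returns only the weights. On the real data, use the X and Y loaded in section 1 instead:

```python
import numpy as np

def train_test_split_np(X, Y, test_size=0.2, random_state=42):
    rng = np.random.default_rng(random_state)
    idx = np.arange(len(X))
    rng.shuffle(idx)
    test_n = int(len(X) * test_size)
    return X[idx[test_n:]], X[idx[:test_n]], Y[idx[test_n:]], Y[idx[:test_n]]

def gradient_descent(X, Y, W, alpha, iterations):
    # Trimmed version of the notebook's function: returns the weights only
    n = len(Y)
    for _ in range(iterations):
        W = W - alpha * (1 / n) * (X.T @ (X @ W - Y))
    return W

def rmse(Y, Y_pred):
    return np.sqrt(np.mean((Y_pred - Y) ** 2))

# Hypothetical data: 1000 students with correlated Math and Reading scores
rng = np.random.default_rng(0)
math = rng.uniform(30, 100, size=1000)
reading = np.clip(math + rng.normal(0, 8, size=1000), 0, 100)
writing = 0.3 * math + 0.7 * reading + rng.normal(0, 4, size=1000)
X = np.column_stack([math, reading])
Y = writing

X_train, X_test, Y_train, Y_test = train_test_split_np(X, Y)
W = gradient_descent(X_train, Y_train, np.zeros(2), alpha=1e-5, iterations=1000)

train_rmse = rmse(Y_train, X_train @ W)
test_rmse = rmse(Y_test, X_test @ W)
# Overfitting would show up as a training RMSE far below the test RMSE
print(f"train RMSE: {train_rmse:.2f}, test RMSE: {test_rmse:.2f}")
```

If the two values are close, as the findings expect, the model is not overfitting.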

2. Effect of Learning Rate

Different learning rates were tested to observe their effect on model convergence and performance.

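Very small learning rate (e.g., 0.000001):
The cost decreases very slowly, so within the fixed iteration budget the weights stop well short of their optimum. The model underfits because training ends before convergence, not because a linear model lacks capacity.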

Moderate learning rate (e.g., 0.00001):
The model converges smoothly and provides the best performance with stable cost reduction.

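High learning rate (e.g., 0.001):
Each update overshoots the minimum, so the cost oscillates or grows instead of decreasing and can diverge, resulting in poor model performance. For this quadratic cost, gradient descent is stable only when the learning rate is below 2/λ_max, where λ_max is the largest eigenvalue of (1/n)XᵀX; with unscaled scores λ_max is large, so this limit is small.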

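Observation:
Choosing an appropriate learning rate is crucial. It must be large enough for the cost to converge within the iteration budget yet below the stability limit; among the rates tried, 0.00001 met both conditions and gave stable convergence and the best performance.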
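Why 0.001 fails while 0.00001 works can be checked without trial and error. The sketch below computes the stability limit 2/λ_max for hypothetical score-like data (an assumption standing in for student.csv) and then runs a short sweep over the three learning rates discussed above; the helper final_cost is hypothetical:

```python
import numpy as np

def cost_function(X, Y, W):
    n = len(Y)
    return (1 / (2 * n)) * np.sum((X @ W - Y) ** 2)

def final_cost(X, Y, alpha, iterations=1000):
    # Hypothetical helper: plain gradient descent from W = 0, returns the last cost
    n = len(Y)
    W = np.zeros(X.shape[1])
    for _ in range(iterations):
        W = W - alpha * (1 / n) * (X.T @ (X @ W - Y))
    return cost_function(X, Y, W)

# Hypothetical score-like data in the 30-100 range
rng = np.random.default_rng(0)
math = rng.uniform(30, 100, size=800)
reading = np.clip(math + rng.normal(0, 8, size=800), 0, 100)
X = np.column_stack([math, reading])
Y = 0.3 * math + 0.7 * reading + rng.normal(0, 4, size=800)

# Largest eigenvalue of the symmetric matrix (1/n) X^T X (the cost's Hessian)
H = (X.T @ X) / len(Y)
lam_max = np.linalg.eigvalsh(H).max()
alpha_limit = 2 / lam_max
print(f"stable only for alpha < {alpha_limit:.2e}")

costs = {}
for alpha in [1e-6, 1e-5, 1e-3]:
    # Silence overflow warnings from the diverging run
    with np.errstate(over="ignore", invalid="ignore"):
        costs[alpha] = final_cost(X, Y, alpha)
    print(f"alpha={alpha:g}: final cost {costs[alpha]:.4g}")
```

On this data the smallest rate ends with a higher cost than 0.00001, and 0.001 exceeds the limit, so its cost blows up to inf or nan.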

Final Conclusion

The linear regression model shows acceptable performance. Proper selection of the learning rate significantly affects the convergence behavior and accuracy of the model.