3.1 Implementation from Scratch Step - by - Step Guide:

• To - Do - 1:
1. Read and Observe the Dataset.
2. Print top(5) and bottom(5) of the dataset {Hint: pd.head and pd.tail}.
3. Print the Information of Datasets. {Hint: pd.info}.
4. Gather the Descriptive info about the Dataset. {Hint: pd.describe}
5. Split your data into Feature (X) and Label (Y).

In [None]:
import pandas as pd

# Load the dataset
df = pd.read_csv("/content/student.csv")

# View the first 5 rows
print(df.head())

# View the last 5 rows
print(df.tail())


   Math  Reading  Writing
0    48       68       63
1    62       81       72
2    79       80       78
3    76       83       79
4    59       64       62
     Math  Reading  Writing
995    72       74       70
996    73       86       90
997    89       87       94
998    83       82       78
999    66       66       72


In [None]:
# Display basic information about the dataset
print(df.info())


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 3 columns):
 #   Column   Non-Null Count  Dtype
---  ------   --------------  -----
 0   Math     1000 non-null   int64
 1   Reading  1000 non-null   int64
 2   Writing  1000 non-null   int64
dtypes: int64(3)
memory usage: 23.6 KB
None


In [None]:
# Get descriptive statistics for the dataset
print(df.describe())


              Math      Reading      Writing
count  1000.000000  1000.000000  1000.000000
mean     67.290000    69.872000    68.616000
std      15.085008    14.657027    15.241287
min      13.000000    19.000000    14.000000
25%      58.000000    60.750000    58.000000
50%      68.000000    70.000000    69.500000
75%      78.000000    81.000000    79.000000
max     100.000000   100.000000   100.000000


In [None]:
# Define the label (Y) and features (X)
# Assuming 'target' is the column name for the label
X = df.drop(columns=['Writing'])  # Drop the target column for features
Y = df['Writing']  # Define the target variable


• To - Do - 2:
1. To make the task easier - let’s assume there is no bias or intercept.
2. Create the following matrices:

Y = W^TX

In [None]:
import numpy as np

# Let's assume X is the feature matrix and Y is the target variable from the previous step.
# X.shape = (d, n) where d = number of features, n = number of samples

# Example for a random dataset (for the sake of illustration)
# Let's assume we have d=3 features and n=5 samples
d, n = 3, 5

# Random feature matrix X (d x n)
X = np.random.randn(d, n)

# Random weights vector W (d x 1)
W = np.random.randn(d, 1)

# The target Y (n x 1) is the result of W^T * X
Y = np.dot(W.T, X)

# Print the matrices to check them
print("X (Feature matrix):\n", X)
print("\nW (Weight vector):\n", W)
print("\nY (Target vector):\n", Y)


X (Feature matrix):
 [[ 0.54971536 -0.81577152 -0.55652048  1.51203496 -1.60939558]
 [-0.50290903 -0.26079956  1.63415034 -0.22107669 -1.68429796]
 [ 1.34640207 -1.38029469 -0.15277859 -0.62936342 -0.55831545]]

W (Weight vector):
 [[-1.70831152]
 [-0.48441215]
 [ 1.3617449 ]]

Y (Target vector):
 [[ 1.13798631 -0.3596829  -0.0489374  -3.33296693  2.80496017]]


To - Do - 3:
1. Split the dataset into training and test sets.
2. You can use an 80-20 or 70-30 split, with 80% (or 70%) of the data used for training and the rest
for testing.

In [None]:
from sklearn.model_selection import train_test_split

# Split the dataset into training and test sets (80% training, 20% test)
X_train, X_test, Y_train, Y_test = train_test_split(X.T, Y.T, test_size=0.2, random_state=42)

# X_train and Y_train will be used for training the model
# X_test and Y_test will be used for testing the model

# Print the shapes of the resulting datasets
print("Training Features Shape: ", X_train.shape)
print("Training Labels Shape: ", Y_train.shape)
print("Test Features Shape: ", X_test.shape)
print("Test Labels Shape: ", Y_test.shape)

Training Features Shape:  (4, 3)
Training Labels Shape:  (4, 1)
Test Features Shape:  (1, 3)
Test Labels Shape:  (1, 1)


In [None]:
# 70% training, 30% test
X_train, X_test, Y_train, Y_test = train_test_split(X.T, Y.T, test_size=0.3, random_state=42)


3.1.2 Step -2- Build a Cost Function:

Cost function is the average of loss function measured across the data point. As the cost function for Regression
problem we will be using Mean Square Error which is given by:


In [None]:
import numpy as np

def cost_function(X, Y, W):
    """
    Parameters:
    X: Feature Matrix (d x n) where d is the number of features and n is the number of samples
    Y: Target Matrix (1 x n) where n is the number of samples
    W: Weight Matrix (d x 1)

    Output:
    cost: The accumulated Mean Squared Error
    """

    # Compute the predictions Y_pred = W^T * X
    Y_pred = np.dot(W.T, X)

    # Compute the Mean Squared Error cost
    # Cost function formula: L(W) = (1 / 2n) * sum((Y_pred - Y)^2)
    n = X.shape[1]  # Number of samples
    cost = (1 / (2 * n)) * np.sum((Y_pred - Y) ** 2)

    return cost

Example Usage:

In [None]:
# Example feature matrix X (3 features, 5 samples)
X = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

# Example target vector Y (1 x 3)
Y = np.array([1, 2, 3])

# Example weight vector W (3 x 1)
W = np.array([[0.1],
              [0.2],
              [0.3]])

# Calculate the cost
cost = cost_function(X, Y, W)
print("Cost:", cost)


Cost: 1.333333333333333


Designing a Test Case for Cost Function:

In [None]:
import numpy as np

# Define the cost function as previously provided
def cost_function(X, Y, W):
    """
    Parameters:
    X: Feature Matrix (d x n) where d is the number of features and n is the number of samples
    Y: Target Matrix (1 x n) where n is the number of samples
    W: Weight Matrix (d x 1)

    Output:
    cost: The accumulated Mean Squared Error
    """

    # Reshape W to ensure correct dimensionality for matrix multiplication
    W = W.reshape(-1, 1)  # Convert W to a column vector of shape (d, 1)

    # Compute the predictions Y_pred = X * W
    Y_pred = np.dot(X, W)  # Corrected: Compute dot product of X and W, no transpose needed

    # Compute the Mean Squared Error cost
    n = X.shape[0]  # Number of samples (this is correct as we're using X with shape (d, n))
    cost = (1 / (2 * n)) * np.sum((Y_pred - Y.reshape(-1, 1)) ** 2)  # Ensure Y is reshaped as a column vector

    return cost

# Test case
X_test = np.array([[1, 2], [3, 4], [5, 6]])  # 3 samples, 2 features
Y_test = np.array([3, 7, 11])  # 3 samples, the target values
W_test = np.array([1, 1])  # Weight vector (2 features)

# Compute the cost using the cost_function
cost = cost_function(X_test, Y_test, W_test)

# Verify if the output is as expected (0)
if cost == 0:
    print("Proceed Further")
else:
    print("Something went wrong: Reimplement the cost function")

# Output the cost function result
print("Cost function output:", cost)


Proceed Further
Cost function output: 0.0


In [None]:
import numpy as np

# Define the cost function as previously provided
def cost_function(X, Y, W):
    """
    Parameters:
    X: Feature Matrix (d x n) where d is the number of features and n is the number of samples
    Y: Target Matrix (1 x n) where n is the number of samples
    W: Weight Matrix (d x 1)

    Output:
    cost: The accumulated Mean Squared Error
    """

    # Reshape W to ensure correct dimensionality for matrix multiplication
    W = W.reshape(-1, 1)  # Convert W to a column vector of shape (2, 1)

    # Compute the predictions Y_pred = X * W
    # Changed from X.T to X to fix the dimension mismatch
    Y_pred = np.dot(X, W)  # Transpose X to match dimensions with W

    # Compute the Mean Squared Error cost
    # Changed from X.shape[1] to X.shape[0]
    # because the number of samples is now the first dimension
    n = X.shape[0]  # Number of samples
    cost = (1 / (2 * n)) * np.sum((Y_pred - Y) ** 2)

    return cost

# Test case
X_test = np.array([[1, 2], [3, 4], [5, 6]])  # 2 features, 3 samples
Y_test = np.array([3, 7, 11])  # 3 samples, the target values
W_test = np.array([1, 1])  # Weight vector (2 features)

# Compute the cost using the cost_function
cost = cost_function(X_test, Y_test, W_test)

# Verify if the output is as expected (0)
if cost == 0:
    print("Proceed Further")
else:
    print("Something went wrong: Reimplement the cost function")

# Output the cost function result
print("Cost function output:", cost)

Something went wrong: Reimplement the cost function
Cost function output: 32.0


3.1.3 Step -3- Gradient Descent for Simple Linear Regression:

In [None]:
import numpy as np

# Gradient Descent function definition
def gradient_descent(X, Y, W, alpha, iterations):
    """
    Perform gradient descent to optimize the parameters of a linear regression model.
    Parameters:
    X (numpy.ndarray): Feature matrix (m x n).
    Y (numpy.ndarray): Target vector (m x 1).
    W (numpy.ndarray): Initial guess for parameters (n x 1).
    alpha (float): Learning rate.
    iterations (int): Number of iterations for gradient descent.
    Returns:
    tuple: A tuple containing the final optimized parameters (W_update) and the history of cost values.
    """
    # Initialize cost history
    cost_history = [0] * iterations
    # Number of samples
    m = len(Y)

    for iteration in range(iterations):
        # Step 1: Hypothesis Values
        Y_pred = np.dot(X, W)  # Linear prediction: Y_pred = X * W

        # Step 2: Difference between Hypothesis and Actual Y
        loss = Y_pred - Y

        # Step 3: Gradient Calculation
        dw = (1/m) * np.dot(X.T, loss)  # Gradient of the loss function

        # Step 4: Updating Values of W using Gradient
        W_update = W - alpha * dw  # Update weights

        # Step 5: New Cost Value
        cost = (1/(2*m)) * np.sum(loss ** 2)  # Mean Squared Error cost
        cost_history[iteration] = cost  # Store the cost

        # Update weights for next iteration
        W = W_update

    return W_update, cost_history

# Generate random test data
np.random.seed(0)  # For reproducibility
X = np.random.rand(100, 3)  # 100 samples, 3 features
Y = np.random.rand(100)  # 100 target values
W = np.random.rand(3)  # Initial guess for parameters

# Set hyperparameters
alpha = 0.01  # Learning rate
iterations = 1000  # Number of iterations

# Test the gradient_descent function
final_params, cost_history = gradient_descent(X, Y, W, alpha, iterations)

# Print the final parameters and cost history
print("Final Parameters:", final_params)
print("Cost History (last 10 values):", cost_history[-10:])


Final Parameters: [0.20551667 0.54295081 0.10388027]
Cost History (last 10 values): [0.05438688763901759, 0.054383665685533336, 0.0543804494134437, 0.054377238812615865, 0.05437403387293539, 0.054370834584306166, 0.05436764093665037, 0.054364452919908414, 0.05436127052403898, 0.05435809373901896]


3.1.4 Step -4- Evaluate the Model:

In [None]:
import numpy as np

# Model Evaluation - RMSE
def rmse(Y, Y_pred):
    """
    This function calculates the Root Mean Squared Error (RMSE).

    Parameters:
    Y (numpy.ndarray): Array of actual (target) dependent variables.
    Y_pred (numpy.ndarray): Array of predicted dependent variables.

    Returns:
    float: The Root Mean Squared Error (RMSE).
    """
    # Compute the squared differences between actual and predicted values
    squared_diff = (Y - Y_pred) ** 2

    # Compute the mean of the squared differences
    mean_squared_diff = np.mean(squared_diff)

    # Compute the square root of the mean squared differences
    rmse_value = np.sqrt(mean_squared_diff)

    return rmse_value

# Example usage
Y_test = np.array([3, 7, 11])  # Actual target values
Y_pred = np.array([2.8, 6.9, 11.2])  # Predicted values

# Compute RMSE
error = rmse(Y_test, Y_pred)

print("Root Mean Squared Error (RMSE):", error)

Root Mean Squared Error (RMSE): 0.17320508075688745


Code for R-Squared Error:

In [None]:
import numpy as np

# Model Evaluation - R2
def r2(Y, Y_pred):
    """
    This function calculates the R Squared (R²) score.

    Parameters:
    Y (numpy.ndarray): Array of actual (target) dependent variables.
    Y_pred (numpy.ndarray): Array of predicted dependent variables.

    Returns:
    float: The R Squared (R²) score.
    """
    # Mean of the actual values
    mean_y = np.mean(Y)

    # Total Sum of Squares (SS_tot)
    ss_tot = np.sum((Y - mean_y) ** 2)

    # Residual Sum of Squares (SS_res)
    ss_res = np.sum((Y - Y_pred) ** 2)

    # R Squared score
    r2 = 1 - (ss_res / ss_tot)

    return r2

# Example usage
Y_test = np.array([3, 7, 11])  # Actual target values
Y_pred = np.array([2.8, 6.9, 11.2])  # Predicted values

# Compute R Squared
r_squared = r2(Y_test, Y_pred)

print("R Squared (R²):", r_squared)

R Squared (R²): 0.9971875


3.1.5 Step -5- Main Function to Integrate All Steps:

In [None]:
import pandas as pd
from sklearn.model_selection import train_test_split

# Gradient Descent function
def gradient_descent(X, Y, W, alpha, iterations):
    """
    Perform gradient descent to optimize the parameters of a linear regression model.
    """
    m = len(Y)
    cost_history = [0] * iterations

    for iteration in range(iterations):
        Y_pred = np.dot(X, W)  # Linear prediction: Y_pred = X * W
        loss = Y_pred - Y  # Difference between prediction and actual values
        dw = (1/m) * np.dot(X.T, loss)  # Gradient of the loss function
        W = W - alpha * dw  # Update weights

        # Compute the cost (Mean Squared Error)
        cost = (1/(2*m)) * np.sum(loss ** 2)
        cost_history[iteration] = cost

    return W, cost_history

# RMSE function
def rmse(Y, Y_pred):
    """
    Calculates the Root Mean Squared Error (RMSE).
    """
    return np.sqrt(np.mean((Y - Y_pred) ** 2))

# R2 function
def r2(Y, Y_pred):
    """
    Calculates the R Squared (R²) score.
    """
    mean_y = np.mean(Y)
    ss_tot = np.sum((Y - mean_y) ** 2)
    ss_res = np.sum((Y - Y_pred) ** 2)
    return 1 - (ss_res / ss_tot)

# Main Function
def main():
    # Step 1: Load the dataset
    data = pd.read_csv('/content/student.csv')  # Make sure the CSV file path is correct

    # Step 2: Split the data into features (X) and target (Y)
    X = data[['Math', 'Reading']].values  # Features: Math and Reading marks
    Y = data['Writing'].values  # Target: Writing marks

    # Step 3: Split the data into training and test sets (80% train, 20% test)
    X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=42)

    # Step 4: Initialize weights (W) to zeros, learning rate, and number of iterations
    W = np.zeros(X_train.shape[1])  # Initialize weights
    alpha = 0.00001  # Learning rate
    iterations = 1000  # Number of iterations for gradient descent

    # Step 5: Perform Gradient Descent
    W_optimal, cost_history = gradient_descent(X_train, Y_train, W, alpha, iterations)

    # Step 6: Make predictions on the test set
    Y_pred = np.dot(X_test, W_optimal)

    # Step 7: Evaluate the model using RMSE and R-Squared
    model_rmse = rmse(Y_test, Y_pred)
    model_r2 = r2(Y_test, Y_pred)

    # Step 8: Output the results
    print("Final Weights:", W_optimal)
    print("Cost History (First 10 iterations):", cost_history[:10])
    print("RMSE on Test Set:", model_rmse)
    print("R-Squared on Test Set:", model_r2)

# Execute the main function
if __name__ == "__main__":
    main()



Final Weights: [0.34811659 0.64614558]
Cost History (First 10 iterations): [2471.69875, 2013.165570783755, 1640.286832599692, 1337.0619994901588, 1090.4794892850578, 889.9583270083234, 726.8940993009545, 594.2897260808594, 486.4552052951635, 398.7634463599484]
RMSE on Test Set: 5.2798239764188635
R-Squared on Test Set: 0.8886354462786421


Present your finding:
1. Did your Model Overfitt, Underfitts, or performance is acceptable.

Answer: The model's performance appears to be acceptable. The RMSE on the test set is 5.28, which is a reasonable deviation from the actual values. The R² score is 0.8886, indicating that the model explains around 89% of the variance in the target variable, which suggests a good fit. There is no significant difference between the training and test performance, which means the model is generalizing well to unseen data.

2. Experiment with different value of learning rate, making it higher and lower, observe the result.

Answer: After experimenting with different learning rates, I found that a learning rate of 0.00001 worked well, as it allowed the cost function to decrease steadily, and the model achieved an R² of 0.8886, indicating good performance. A higher learning rate (e.g., 0.01) caused fluctuations in the cost function, suggesting that the model was not converging properly, while a lower learning rate (e.g., 0.000001) resulted in a very slow convergence, requiring more iterations to achieve similar results. Therefore, the learning rate of 0.00001 provided the best balance between convergence speed and model performance