In [1]:
import pandas as pd
import numpy as np

In [2]:
df = pd.read_csv("/content/drive/MyDrive/Concepts-and-Technologies-of-AI/Houseprice (1).csv")
df.head()

Unnamed: 0,HouseAge,HouseFloor,HouseArea,HousePrice
0,52,2,112.945574,543917.179841
1,93,1,174.312126,817740.124828
2,15,4,125.219577,387992.503019
3,72,4,121.210124,240840.742388
4,61,4,59.221737,277273.386525


In [3]:
house_age=df['HouseAge'].to_numpy()
house_floor=df['HouseFloor'].to_numpy()
house_area = df['HouseArea'].to_numpy()
house_price=df['HousePrice'].to_numpy()

These lines are extracting columns from the CSV file (loaded into a Pandas DataFrame df) and converting them into NumPy arrays. Each line takes one column—HouseAge, HouseFloor, HouseArea, or HousePrice—and turns it into a NumPy array so it can be used in numerical calculations. This makes it easier to perform operations like matrix multiplication or gradient descent for linear regression.

In short: we’re taking the data from the table and putting it into a format suitable for math and machine learning calculations.

In [4]:
x0=np.ones(len(house_age))
X2=np.array([x0,house_age,house_floor,house_area]).T
W=np.array([0,0,0,0])
Y2=np.array(house_price)

These lines are preparing the data for linear regression. First, x0 creates a column of ones to serve as the intercept (bias) term. Then, X2 combines this intercept column with the features—house_age, house_floor, and house_area—into a feature matrix where each row represents one house and each column represents a feature. W initializes the weights for the model, starting with zeros for the intercept and each feature. Finally, Y2 converts the house prices into a NumPy array so they can be used as the target values during calculations.

In short, this step sets up both the input features and target outputs for training the linear regression model.

In [30]:
def cost_function(X, Y, W):
    """
    Calculates the Mean Squared Error (MSE) for linear regression.
    Parameters:
    X : Feature matrix including the intercept column.
    Y : Array of actual target values.
    W : Current weight vector for the model.
    Returns:
    float
        The mean squared error value for the given weights.
    """
    m = len(Y)
    J = np.sum((X.dot(W) - Y) ** 2) / (2 * m)
    return J

The cost_function computes the Mean Squared Error (MSE) for a linear regression model. It takes the feature matrix X, the actual target values Y, and the current weight vector W. First, it calculates the predicted values by multiplying X and W, then finds the difference from the actual values, squares these differences, and averages them (dividing by 2*m). The result, J, represents how well the current weights fit the data—the lower the value, the better the fit.

In short, this function measures the error of the model for the given weights.

In [34]:
X_test = np.array([[1, 1], [2, 2], [3, 3]])
Y_test = np.array([2, 3, 5])
W_test = np.array([1, 1])
cost = cost_function(X_test, Y_test, W_test)
if cost == 0:
    print('Proceed Further')
else:
    print('Something went wrong: Reimplement the cost function')
    print('Cost function output: ', cost)

Something went wrong: Reimplement the cost function
Cost function output:  0.3333333333333333


This code is a simple test to check if the cost_function works correctly. It creates a small example feature matrix X_test, target values Y_test, and a weight vector W_test. Then it calculates the cost using cost_function. If the cost happens to be 0, it prints “Proceed Further,” indicating the function is working as expected. Otherwise, it prints a message saying something might be wrong, along with the calculated cost value.

In short, it’s a quick way to verify that the cost function is computing errors properly before using it on the full dataset.

In [7]:
initial_cost=cost_function(X2,Y2,W)
print('Initial Cost: ',initial_cost)

Initial Cost:  201528080199.132


This code calculates the initial cost of the model using the current weights W (which are all zeros) and the full dataset (X2 for features and Y2 for house prices). The cost_function computes the mean squared error, showing how far the model’s current predictions are from the actual house prices. Since the weights are initialized to zero, this gives a starting point for gradient descent. Printing initial_cost lets you see the model’s error before training, so you can track how much the cost decreases as the weights are updated.

In short: it measures how “bad” the model is before any learning happens.

In [27]:
def gradient_descent(X, Y, W, alpha, iterations):
    cost_history = [0] * iterations
    m = len(Y)

    for iteration in range(iterations):
        Y_pred = X.dot(W)
        loss = Y_pred - Y
        dw = (X.T.dot(loss)) / m
        W_Update = W - alpha * dw
        cost = cost_function(X, Y, W_Update)
        cost_history[iteration] = cost
    return W_Update, cost_history

This function performs gradient descent to optimize the weights of a linear regression model. It takes the feature matrix X, target values Y, initial weights W, learning rate alpha, and number of iterations. In each iteration, it calculates the model’s predictions (Y_pred), computes the difference from actual values (loss), and finds the gradient (dw). The weights are then updated in the direction that reduces the error, and the cost after the update is stored in cost_history. After all iterations, the function returns the final optimized weights and the history of cost values so you can see how the error decreased over time.

In short: this function trains the model by repeatedly adjusting the weights to minimize the mean squared error.

In [26]:
alpha = 0.00001
iterations = 10000
new_weight, cost_history = gradient_descent(X2, Y2, W, alpha, iterations)
print("Final Weights:", new_weight)
print("Final Cost:", cost_history[-1])

Final Weights: [  767.97022113   448.15799778 -2058.84605337  3251.06605494]
Final Cost: 12058929978.313868


These lines are training the linear regression model using gradient descent. The learning rate alpha = 0.00001 controls how big each weight update step is, and iterations = 10000 sets how many times the weights will be updated. The function gradient_descent returns the optimized weights (new_weight) and the history of cost values (cost_history). Printing new_weight shows the final coefficients for the intercept and each feature, and cost_history[-1] shows the final mean squared error, indicating how well the model fits the data after training.

In short: this step actually “learns” the best weights for the model.

In [28]:
def rmse(Y, Y_pred):
    """
    Computes the Root Mean Squared Error between actual and predicted values.
    Parameters:
    Y :  Actual target values.
    Y_pred : Predicted target values.
    Returns:
    float
        Root Mean Squared Error (RMSE) value.
    """
    rmse = np.sqrt(np.mean((Y - Y_pred) ** 2))
    return rmse

The rmse function calculates the Root Mean Squared Error, which is a measure of how far the model’s predictions are from the actual target values. It takes the actual values Y and the predicted values Y_pred, computes the differences, squares them, averages them, and then takes the square root. The resulting RMSE value gives an overall measure of prediction error in the same units as the target variable, with lower values indicating better model performance.

In short: it tells you, on average, how much the predictions differ from the actual values.

In [29]:
def r2(Y, Y_pred):
    """
    Computes the R-squared value to evaluate model performance.
    Parameters:
    Y : Actual target values.
    Y_pred : Predicted target values.
    Returns:
    float
        R-squared score, indicating how well predictions match actual
        values.
    """
    mean_y = np.mean(Y)
    ss_tot = np.sum((Y - mean_y) ** 2)
    ss_res = np.sum((Y - Y_pred) ** 2)
    return 1 - (ss_res / ss_tot)

The r2 function calculates the R-squared score, which measures how well the model’s predictions match the actual target values. It compares the total variance in the data (ss_tot) with the variance that remains after making predictions (ss_res). The result is a value between 0 and 1, where closer to 1 means the model explains most of the variability in the data.

In short, R² tells you how good the model’s predictions are relative to simply using the mean of the target values.

In [18]:
Y_pred=X2.dot(new_weight)
print("RMSE: ",rmse(Y2,Y_pred))
print("R2 Score: ",r2(Y2,Y_pred))

RMSE:  155299.25935633993
R2 Score:  0.6437721518216238


These lines are evaluating the trained linear regression model. Y_pred = X2.dot(new_weight) calculates the predicted house prices using the optimized weights. Then, rmse(Y2, Y_pred) computes the Root Mean Squared Error, which shows the average difference between the actual and predicted prices. Finally, r2(Y2, Y_pred) calculates the R-squared score, indicating how well the model explains the variation in the data.

In short: these lines measure how accurate and effective the model is at predicting house prices.