# Linear Regression Using the Gradient Descent Algorithm

### Importing Libraries

- Import NumPy for numerical operations.
- Import the California housing dataset and metrics from scikit-learn.
- Import the **StandardScaler** class for feature scaling.

In [1]:
import numpy as np
from sklearn import datasets, metrics
from sklearn.preprocessing import StandardScaler

### Loading the California Housing Dataset

- Load the California housing dataset. **X** contains the feature data, and **y** contains the target values.

In [2]:
X, y = datasets.fetch_california_housing(return_X_y=True)

### Preparing Training Data

- Slice the first 16,000 rows of the feature data for training (**X_train_temp1**).
- Create a new array **X_train** with one extra leading column, fill that column with ones (the bias term), and copy the feature data into the remaining columns.
- With this column in place, the intercept is learned as one more entry of **theta**.

In [3]:
X_train_temp1 = X[0:16000, :]
X_train = np.zeros((X_train_temp1.shape[0], X_train_temp1.shape[1] + 1))
X_train[:, 0] = np.ones((X_train_temp1.shape[0]))
X_train[:, 1:] = X_train_temp1
print("Type of X_train: ", type(X_train), "Shape of X_train: ", X_train.shape)

Type of X_train:  <class 'numpy.ndarray'> Shape of X_train:  (16000, 9)
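The same bias column can be built in one step by stacking a column of ones next to the features. This is a minimal sketch on a made-up 3x2 array; the names `features` and `features_with_bias` are illustrative and not part of the notebook.

```python
import numpy as np

# Hypothetical toy feature matrix: 3 samples, 2 features.
features = np.array([[1.0, 2.0],
                     [3.0, 4.0],
                     [5.0, 6.0]])

# Stack a column of ones to the left of the features for the bias term.
ones = np.ones((features.shape[0], 1))
features_with_bias = np.hstack([ones, features])

print(features_with_bias.shape)  # (3, 3)
```

Either approach should give the same array; the stacked form just avoids allocating zeros and filling them in two steps.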


### Preparing Testing Data

- Slice rows 16,000 through 20,603 for testing (**X_test_temp1**). The dataset has 20,640 rows, so this slice leaves the last 36 rows unused.
- Create a new array **X_test** with an additional column (bias term) filled with ones, and then copy the feature data.

In [4]:
X_test_temp1 = X[16000:20604, :]
X_test = np.zeros((X_test_temp1.shape[0], X_test_temp1.shape[1] + 1))
X_test[:, 0] = np.ones((X_test_temp1.shape[0]))
X_test[:, 1:] = X_test_temp1
print("Type of X_test: ", type(X_test), "Shape of X_test: ", X_test.shape)

Type of X_test:  <class 'numpy.ndarray'> Shape of X_test:  (4604, 9)


### Setting Training and Testing Target Data

- Slice the target values with the same row ranges used for the training and testing features.

In [5]:
y_train = y[0:16000]
y_test = y[16000:20604]
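The slices above take rows in their stored order. If the stored order is not random, the test rows may differ systematically from the training rows; whether that matters for this dataset is not checked here. As an alternative, a shuffled split can be sketched with NumPy alone. The arrays `X_demo` and `y_demo`, the seed, and the 8/2 split size are all made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the split is repeatable

# Hypothetical stand-in data: 10 samples, 3 features.
X_demo = np.arange(30, dtype=float).reshape(10, 3)
y_demo = np.arange(10, dtype=float)

# Shuffle the row indices, then cut them into train and test parts.
indices = rng.permutation(X_demo.shape[0])
n_train = 8
train_idx, test_idx = indices[:n_train], indices[n_train:]

# Index features and targets with the same indices so rows stay paired.
X_demo_train, y_demo_train = X_demo[train_idx], y_demo[train_idx]
X_demo_test, y_demo_test = X_demo[test_idx], y_demo[test_idx]

print(X_demo_train.shape, X_demo_test.shape)  # (8, 3) (2, 3)
```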

### Feature Scaling

- Create a **StandardScaler** object (**scalar**) and fit it to the training feature data (excluding the bias term).
- Standardize both the training and testing feature data using the mean and standard deviation learned from the training set.

In [6]:
scalar = StandardScaler()
scalar.fit(X_train[:, 1:])

X_train[:, 1:] = scalar.transform(X_train[:, 1:])
X_test[:, 1:] = scalar.transform(X_test[:, 1:])
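With its default settings, **StandardScaler** subtracts each column's mean and divides by each column's population standard deviation. The sketch below reproduces that arithmetic by hand on made-up arrays (`train` and `test` are illustrative names), to show why the test data is scaled with the training statistics rather than its own.

```python
import numpy as np

# Hypothetical toy data: 3 training rows and 1 test row, 2 features each.
train = np.array([[1.0, 10.0],
                  [2.0, 20.0],
                  [3.0, 60.0]])
test = np.array([[4.0, 30.0]])

# Statistics come from the training rows only.
mean = train.mean(axis=0)
std = train.std(axis=0)  # population standard deviation (ddof=0)

# Apply the same shift and scale to both sets.
train_scaled = (train - mean) / std
test_scaled = (test - mean) / std

print(train_scaled.mean(axis=0))  # approximately zero per column
print(train_scaled.std(axis=0))   # approximately one per column
```

The scaled test row is generally not centered at zero, which is expected: it is measured relative to the training distribution.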

### Initializing Theta (Model Parameters)

- Initialize the model parameters (**theta**) with random values drawn uniformly from [0, 1). **theta** has one entry per column of **X_train**: the bias term plus the eight features.

In [7]:
theta = np.random.uniform(0, 1, size = (X_train.shape[1]))
print("Type of theta: ", type(theta), "Shape of theta: ", theta.shape)

Type of theta:  <class 'numpy.ndarray'> Shape of theta:  (9,)
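Because the initial values are random, the trained parameters and the reported errors can vary slightly from run to run. One way to make a run repeatable is a seeded generator. This is a sketch only; the name `theta_demo` and the seed value are arbitrary choices, not part of the notebook.

```python
import numpy as np

# The seed value 42 is an arbitrary choice for illustration.
rng = np.random.default_rng(42)

n_params = 9  # bias term + 8 features, matching the shape printed above
theta_demo = rng.uniform(0, 1, size=n_params)

print(theta_demo.shape)  # (9,)
```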


### Gradient Descent

- Define the number of iterations and the learning rate (**alpha**), and read the number of data points (**m**) and the number of columns (**n**, the bias term plus the features) from the training data.
- Perform gradient descent to update the model parameters (**theta**) over a specified number of iterations.
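For reference, the update can be written out explicitly. The notebook does not state a cost function; the half mean squared error below is an assumption, chosen because its gradient matches the update the code applies. Here $x^{(i)}$ is the $i$-th row of **X_train** (including the leading 1) and $y^{(i)}$ is the matching target.

$$
J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( \theta^{T} x^{(i)} - y^{(i)} \right)^{2}
$$

$$
\theta_j \leftarrow \theta_j - \frac{\alpha}{m} \sum_{i=1}^{m} \left( \theta^{T} x^{(i)} - y^{(i)} \right) x_j^{(i)}, \qquad j = 0, 1, \dots, n-1
$$

All $n$ parameters are updated from the same error vector, so each step uses the predictions made before any parameter changed.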

### Model Training with Gradient Descent

- Inside a loop for **n_iterations**, the code computes the predictions (**y_pred**) using the current **theta**.
- It calculates the error (**error**) as the difference between predictions and actual target values.
- Then, for each parameter, it sums the error multiplied by the corresponding column of **X_train**, and updates **theta** by subtracting **alpha/m** times these sums.
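The per-parameter sums described above can also be computed as a single matrix product. The sketch below runs that vectorized form on a small synthetic problem; the data, the names `X_demo`, `y_demo`, `theta_vec`, and the settings `alpha_demo = 0.1` and 2000 iterations are assumptions for illustration, not the notebook's values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic problem: 200 samples, bias column + 3 features.
m_demo, n_feat = 200, 3
X_demo = np.hstack([np.ones((m_demo, 1)), rng.normal(size=(m_demo, n_feat))])
true_theta = np.array([1.0, 2.0, -3.0, 0.5])
y_demo = X_demo @ true_theta  # noise-free targets, so recovery can be checked

alpha_demo = 0.1
theta_vec = np.zeros(X_demo.shape[1])

for _ in range(2000):
    error = X_demo @ theta_vec - y_demo   # residuals, shape (m_demo,)
    gradient = X_demo.T @ error / m_demo  # all parameter sums in one product
    theta_vec = theta_vec - alpha_demo * gradient

print(np.round(theta_vec, 3))
```

On this noise-free data the learned parameters should approach `true_theta`. The matrix product computes the same sums as an explicit loop over columns, usually faster for wide feature matrices.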

### Model Prediction and Evaluation

- Use the trained model (represented by **theta**) to make predictions on the testing data.
- Print the Mean Absolute Error (MAE) and Mean Squared Error (MSE) as evaluation metrics for the regression model.

In [8]:
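n_iterations = 1000      # number of gradient descent steps
alpha = 0.01             # learning rate
m = X_train.shape[0]     # number of training samples
n = X_train.shape[1]     # number of parameters (bias + features)

for i in range(n_iterations):
    # Predictions and residuals for the current parameters
    y_pred = np.dot(X_train, theta)
    error = y_pred - y_train
    # Accumulate the gradient component for each parameter
    update = np.zeros(n)
    for j in range(n):
        update[j] = np.sum(error * X_train[:, j])
    # Step against the gradient, averaged over the m samples
    theta = theta - (1/m)*(alpha)*update

print("Shape of theta: ", theta.shape)

# Evaluate the trained parameters on the held-out test rows
predictions = np.dot(X_test, theta)

print("MAE: ", metrics.mean_absolute_error(y_true = y_test, y_pred = predictions))
print("MSE: ", metrics.mean_squared_error(y_true = y_test, y_pred = predictions))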

Shape of theta:  (9,)
MAE:  0.592880052130006
MSE:  0.6641376097739029
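To make the two metrics concrete, the sketch below computes them by hand on a made-up pair of arrays (`y_true_demo` and `y_pred_demo` are illustrative, not the notebook's data). It also takes the square root of the MSE, which is an addition here: the result is in the same units as the target and is often easier to compare with the MAE.

```python
import numpy as np

# Hypothetical targets and predictions for four samples.
y_true_demo = np.array([3.0, -0.5, 2.0, 7.0])
y_pred_demo = np.array([2.5, 0.0, 2.0, 8.0])

residuals = y_true_demo - y_pred_demo

mae = np.mean(np.abs(residuals))  # mean absolute error: 0.5
mse = np.mean(residuals ** 2)     # mean squared error: 0.375
rmse = np.sqrt(mse)               # root of the MSE, in target units

print(mae, mse, rmse)
```

Because the errors are squared before averaging, a few large misses raise the MSE more than they raise the MAE.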
