# Regularization using Gradient Descent

### Importing necessary libraries

- NumPy for numerical operations
- scikit-learn for the dataset loader and the evaluation metrics
- StandardScaler (from scikit-learn) for feature scaling

In [7]:
import numpy as np
from sklearn import datasets, metrics
from sklearn.preprocessing import StandardScaler

### Data Loading

- Load the California housing dataset using **datasets.fetch_california_housing(return_X_y=True)**. The data is split into **features** (X) and **target values** (y).

In [8]:
X, y = datasets.fetch_california_housing(return_X_y = True)

### Preparing Training Data

- Create training and testing datasets:
    - **X_train_temp1** contains the first 16,000 data points.
    - **X_test_temp1** contains the next 4,604 data points (indices 16,000 to 20,603). The dataset has 20,640 rows in total, so the last 36 rows are not used.
- For both **X_train_temp1** and **X_test_temp1**, a column of ones is added at the beginning to represent the bias term. The results are stored in **X_train** and **X_test**.
- The corresponding target values (**y_train** and **y_test**) are sliced from **y** with the same index ranges.

In [9]:
X_train_temp1 = X[0:16000, :]
X_train = np.zeros((X_train_temp1.shape[0], X_train_temp1.shape[1] + 1))
X_train[:, 0] = np.ones((X_train_temp1.shape[0]))
X_train[:, 1:] = X_train_temp1
print("Type of X_train: ", type(X_train), "Shape of X_train: ", X_train.shape)

y_train = y[0:16000]

Type of X_train:  <class 'numpy.ndarray'> Shape of X_train:  (16000, 9)


In [10]:
X_test_temp1 = X[16000:20604]
X_test = np.zeros((X_test_temp1.shape[0], X_test_temp1.shape[1] + 1))
X_test[:, 0] = np.ones((X_test_temp1.shape[0]))
X_test[:, 1:] = X_test_temp1
print("Type of X_test: ", type(X_test), "Shape of X_test: ", X_test.shape)

y_test = y[16000:20604]

Type of X_test:  <class 'numpy.ndarray'> Shape of X_test:  (4604, 9)
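
The two cells above build the bias column by allocating a zero matrix and filling it in. As a minimal sketch of a more compact alternative, the same result can be obtained with `np.hstack`. The helper name `add_bias_column` and the toy array are hypothetical and are not part of the notebook.

```python
import numpy as np

def add_bias_column(features):
    """Return a copy of `features` with a leading column of ones (hypothetical helper)."""
    ones = np.ones((features.shape[0], 1))
    return np.hstack([ones, features])

# Toy data standing in for X_train_temp1 / X_test_temp1
toy = np.array([[1.0, 2.0],
                [3.0, 4.0],
                [5.0, 6.0]])
toy_with_bias = add_bias_column(toy)
print(toy_with_bias.shape)  # (3, 3)
```

Applied to the notebook's arrays, this would be `add_bias_column(X[0:16000, :])` and `add_bias_column(X[16000:20604])`.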


### Feature Scaling

- Create a **StandardScaler** object (**scaler**) and fit it on the training data's features (excluding the bias term, which is always 1).
- Scale the features in both the training and testing sets using the fitted scaler. Feature scaling standardizes the data to have zero mean and unit variance, which can improve gradient descent convergence.
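
To make the scaling step concrete, here is a small NumPy-only sketch of what standardization does: the mean and standard deviation are computed on the training features only and then reused for the test features. The synthetic arrays `train` and `test` are assumptions for illustration; StandardScaler uses the population standard deviation, which corresponds to NumPy's default `ddof=0`.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-ins for the training and testing features (no bias column)
train = rng.normal(loc=5.0, scale=3.0, size=(100, 2))
test = rng.normal(loc=5.0, scale=3.0, size=(20, 2))

# Statistics come from the training data only
mean = train.mean(axis=0)
std = train.std(axis=0)  # population standard deviation (ddof=0)

train_scaled = (train - mean) / std
test_scaled = (test - mean) / std  # reuses the training statistics

print(train_scaled.mean(axis=0))  # close to 0 for every column
print(train_scaled.std(axis=0))   # close to 1 for every column
```

The scaled test set will generally not have exactly zero mean and unit variance, because it is transformed with the training statistics.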


### Model Initialization

- Initialize the model's parameters, represented by the vector **theta**, with random values drawn uniformly from [0, 1). **theta** has one entry per column of **X_train**: the bias term plus the 8 features, so its shape is (9,).
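
Because the initial values are random, the reported metrics can vary slightly from run to run. A minimal sketch of a reproducible initialization, assuming a seeded NumPy generator is acceptable here; the seed value and the names `n_params` and `theta_demo` are arbitrary choices for illustration.

```python
import numpy as np

n_params = 9  # bias term + 8 features, matching X_train.shape[1] in this notebook
rng = np.random.default_rng(seed=0)  # the seed value is an arbitrary assumption
theta_demo = rng.uniform(0, 1, size=n_params)
print(theta_demo.shape)  # (9,)
```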

In [11]:
scaler = StandardScaler()
scaler.fit(X_train[:, 1:])

X_train[:, 1:] = scaler.transform(X_train[:, 1:])
X_test[:, 1:] = scaler.transform(X_test[:, 1:])

theta = np.random.uniform(0, 1, size = (X_train.shape[1]))
print("Type of theta: ", type(theta), "Shape of theta: ", theta.shape)

Type of theta:  <class 'numpy.ndarray'> Shape of theta:  (9,)


### Training the Model

- Set hyperparameters for training:
    - **n_iterations** defines the number of gradient descent iterations.
    - **alpha** is the learning rate.
    - **lambda_reg** is the L2 regularization strength (ridge regression). It helps prevent overfitting by adding a penalty on large parameter values to the cost function.
    - **m** is the number of training examples.
    - **n** is the number of columns in **X_train**, that is, the 8 features plus the bias column.


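### Loop over the specified number of iterations

- Initialize an array **update** to store the gradient sum for each parameter.
- Make predictions (**y_pred**) by multiplying **X_train** with the current parameters.
- Calculate the error by subtracting the actual target values from the predicted values (**y_pred - y_train**).
- Loop over each parameter index **j** and store in **update[j]** the sum of the error multiplied by column **j** of **X_train**.
- Update **theta** by applying gradient descent with L2 regularization: shrink it by the factor **1 - alpha * lambda_reg / m**, then subtract **(alpha / m) * update**. Note that the code applies the shrinkage to every entry of **theta**, including the bias term **theta[0]**; many formulations leave the bias unregularized.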
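
The notebook does not write out the cost function. The derivation below is a sketch that assumes the standard ridge cost, which reproduces the update used in the code. Here $x^{(i)}$ is row $i$ of **X_train**, $\alpha$ is **alpha**, $\lambda$ is **lambda_reg**, and the penalty sum includes the bias term $\theta_0$ because the code shrinks every entry of **theta**.

```latex
J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( x^{(i)} \cdot \theta - y^{(i)} \right)^2
          + \frac{\lambda}{2m} \sum_{j=0}^{n-1} \theta_j^2

\frac{\partial J}{\partial \theta_j}
  = \frac{1}{m} \sum_{i=1}^{m} \left( x^{(i)} \cdot \theta - y^{(i)} \right) x^{(i)}_j
  + \frac{\lambda}{m} \theta_j

\theta_j \leftarrow \theta_j - \alpha \frac{\partial J}{\partial \theta_j}
  = \theta_j \left( 1 - \frac{\alpha \lambda}{m} \right)
  - \frac{\alpha}{m} \sum_{i=1}^{m} \left( x^{(i)} \cdot \theta - y^{(i)} \right) x^{(i)}_j
```

The sum in the last line is what the code stores in **update[j]**.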

### Testing and Evaluation

- Use the trained model to make predictions on the testing set (**X_test**) and store the predictions in **predictions**.
- Calculate and print the **Mean Absolute Error (MAE)** and **Mean Squared Error (MSE)** as evaluation metrics for the regression model.
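
For reference, both metrics are simple averages over the prediction errors. The sketch below computes them directly with NumPy on a small made-up example; the function names `mae_manual` and `mse_manual` and the demo arrays are hypothetical and are not used elsewhere in the notebook.

```python
import numpy as np

def mae_manual(y_true, y_pred):
    """Mean Absolute Error: average of |y_true - y_pred|."""
    return np.mean(np.abs(y_true - y_pred))

def mse_manual(y_true, y_pred):
    """Mean Squared Error: average of (y_true - y_pred)^2."""
    return np.mean((y_true - y_pred) ** 2)

y_true_demo = np.array([3.0, -0.5, 2.0, 7.0])
y_pred_demo = np.array([2.5, 0.0, 2.0, 8.0])

print(mae_manual(y_true_demo, y_pred_demo))  # 0.5
print(mse_manual(y_true_demo, y_pred_demo))  # 0.375
```

MSE penalizes large errors more heavily than MAE because the errors are squared before averaging.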

In [12]:
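n_iterations = 1000   # number of gradient descent steps
alpha = 0.01          # learning rate
lambda_reg = 0.1      # L2 regularization strength
m = X_train.shape[0]  # number of training examples
n = X_train.shape[1]  # number of parameters (bias + features)

for i in range(n_iterations):
    update = np.zeros(X_train.shape[1])
    y_pred = np.dot(X_train, theta)  # predictions with the current parameters
    error = y_pred - y_train         # prediction minus target
    for j in range(n):
        # gradient sum for parameter j: error times column j of X_train
        update[j] = np.sum(error * X_train[:, j])
    # shrink theta (L2 penalty, bias included), then take the gradient step
    theta = theta * (1 - (alpha * lambda_reg) / m) - (1 / m) * alpha * update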

print("Shape of theta: ", theta.shape)

predictions = np.dot(X_test, theta)

print("MAE: ", metrics.mean_absolute_error(y_true = y_test, y_pred = predictions))
print("MSE: ", metrics.mean_squared_error(y_true = y_test, y_pred = predictions))

Shape of theta:  (9,)
MAE:  0.5986529608385885
MSE:  0.6732467179651381
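
The inner loop over **j** can be replaced by a single matrix product, since the vector of gradient sums equals `X.T @ error`. The following is a minimal, self-contained sketch of that vectorized form on synthetic data. The function name `ridge_gradient_descent`, its `seed` parameter, and the demo arrays are assumptions for illustration; it is not the notebook's code and was not run on the California housing data.

```python
import numpy as np

def ridge_gradient_descent(X, y, alpha=0.01, lambda_reg=0.1,
                           n_iterations=1000, seed=0):
    """Gradient descent with an L2 penalty on every parameter (bias included),
    mirroring the update rule used in the cell above."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    theta = rng.uniform(0, 1, size=n)
    for _ in range(n_iterations):
        error = X @ theta - y   # prediction minus target
        gradient = X.T @ error  # same values as the per-column loop
        theta = theta * (1 - alpha * lambda_reg / m) - (alpha / m) * gradient
    return theta

# Synthetic, noise-free linear data with a leading bias column
rng = np.random.default_rng(42)
features = rng.normal(size=(500, 3))
X_demo = np.hstack([np.ones((500, 1)), features])
true_theta = np.array([2.0, 1.0, -0.5, 0.25])
y_demo = X_demo @ true_theta

# With lambda_reg=0 the penalty vanishes and the true parameters are recovered
theta_hat = ridge_gradient_descent(X_demo, y_demo, alpha=0.1,
                                   lambda_reg=0.0, n_iterations=2000)
print(np.round(theta_hat, 4))
```

With a positive **lambda_reg**, the recovered parameters would be pulled slightly toward zero, which is the intended effect of the penalty.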
