# Linear Regression

## 1- Creating Linear Regression Class

In [None]:
class LinearRegression():
    def __init__(self, fit_method='ols', learning_rate=0.01, epochs=1000, min_step_size=0.001):
        """
        Initialize the LinearRegression model with a specified fitting method.

        Parameters:
        - fit_method: The fitting method to use: 'ols' for Ordinary Least Squares, 'gd' for Gradient Descent.
        - learning_rate: Learning rate for Gradient Descent.
        - epochs: Number of epochs for Gradient Descent.
        - min_step_size: Minimum step size for Gradient Descent.
        """
        self.fit_method = fit_method
        self.learning_rate = learning_rate
        self.epochs = epochs
        self.min_step_size = min_step_size

        self.weights = None
        self.biases = None

    def fit(self, X, y):
        """
        Fit the model to the data based on selected fit method.

        Parameters:
        - X: Input value array for training data. Should be numpy array with shape (n_samples, n_features).
        - y: Target value array for training data. Should be numpy array with shape (n_samples, ).
        """

        pass

    def predict(self, X):
        """
        Predict the target values for given inputs.

        Parameters:
        - X: Input value array for prediction. Should be numpy array with shape (n_samples, n_features).

        Returns:
        - y: Predicted values for input array X. Numpy array with shape (n_samples, ).
        """
        y = X.dot(self.weights) + self.biases
        return y
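
To make the shapes in `predict` concrete, here is a minimal standalone sketch of the same expression. The array values, and the names `weights` and `bias`, are hypothetical and chosen only for illustration; they stand in for `self.weights` and `self.biases` after fitting.

```python
import numpy as np

# Hypothetical inputs: 3 samples, 2 features.
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])        # shape (n_samples, n_features)
weights = np.array([0.5, -1.0])   # shape (n_features,)
bias = 2.0                        # scalar intercept

# Same expression as in predict(): matrix-vector product plus intercept.
y_pred = X.dot(weights) + bias

print(y_pred.shape)  # one prediction per sample
print(y_pred)
```

The scalar intercept is broadcast across all samples, so the result has shape `(n_samples, )`, as the docstring states.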

### A- Normal Equation

This derivation follows Ian Goodfellow, Yoshua Bengio, and Aaron Courville, *Deep Learning*, MIT Press (2016); the equation numbers follow the book.\
To minimize the training Mean Squared Error (MSE), we look for the weights at which its gradient is zero:

$$
\nabla_w \text{MSE}_{\text{train}} = 0 \tag{5.6}
$$

Writing out the MSE as the mean of the squared errors over the $m$ training examples:

$$
\nabla_w \left( \frac{1}{m} \| \hat{y}^{(\text{train})} - y^{(\text{train})} \|^2_2 \right) = 0 \tag{5.7}
$$

Substituting the model's predictions $\hat{y}^{(\text{train})} = X^{(\text{train})} w$:

$$
\frac{1}{m} \nabla_w \| X^{(\text{train})} w - y^{(\text{train})} \|^2_2 = 0 \tag{5.8}
$$

Writing the squared norm as an inner product (the constant factor $\frac{1}{m}$ does not change where the gradient vanishes):

$$
\nabla_w \left( X^{(\text{train})} w - y^{(\text{train})} \right)^{\top} \left( X^{(\text{train})} w - y^{(\text{train})} \right) = 0 \tag{5.9}
$$

Expanding the product:

$$
\nabla_w \left( w^{\top} X^{(\text{train})^{\top}} X^{(\text{train})} w - 2 w^{\top} X^{(\text{train})^{\top}} y^{(\text{train})} + y^{(\text{train})^{\top}} y^{(\text{train})} \right) = 0 \tag{5.10}
$$

Evaluating the gradient with respect to $w$:

$$
2 X^{(\text{train})^{\top}} X^{(\text{train})} w - 2 X^{(\text{train})^{\top}} y^{(\text{train})} = 0 \tag{5.11}
$$

Solving for $w$, assuming $X^{(\text{train})^{\top}} X^{(\text{train})}$ is invertible:

$$
w = \left( X^{(\text{train})^{\top}} X^{(\text{train})} \right)^{-1} X^{(\text{train})^{\top}} y^{(\text{train})} \tag{5.12}
$$
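
As a quick sanity check of the derivation, the sketch below evaluates the closed-form solution on synthetic data and confirms that the gradient from equation 5.11 vanishes there. The data, the seed, and the variable names are assumptions made for illustration. Like the derivation, it has no separate intercept term, and it solves the linear system rather than forming the inverse explicitly, which is usually the more numerically stable choice.

```python
import numpy as np

# Synthetic data (hypothetical): m samples, n features, small noise.
rng = np.random.default_rng(0)
m, n = 50, 3
X = rng.normal(size=(m, n))
w_true = np.array([2.0, -1.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=m)

# Equation 5.12, written as the linear system (X^T X) w = X^T y.
w = np.linalg.solve(X.T @ X, X.T @ y)

# Gradient from equation 5.11, evaluated at the solution.
grad = 2 * X.T @ X @ w - 2 * X.T @ y

# Cross-check against NumPy's least-squares solver.
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print("w from normal equation:", w)
print("max |gradient| at w:", np.abs(grad).max())
print("matches lstsq:", np.allclose(w, w_lstsq))
```

The gradient is zero up to floating-point error, and the weights agree with `np.linalg.lstsq`, which remains usable when the matrix in equation 5.12 is close to singular.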


In [None]:
def linear_regression_normal_eq(X_train, y_train):
    """
    Computes the linear regression weights in closed form using the normal equation (5.12).

    Parameters:
    - X_train: Input value array for training data. Should be numpy array with shape (n_samples, n_features).
    - y_train: Target value array for training data. Should be numpy array with shape (n_samples, ).
    """