## Build Linear Regression Class on weights

- The **weights** vector holds both the intercept and the slopes. Exposing each through a dedicated attribute is more convenient than asking the user to index into the array.
- Recall that the `property` decorator lets a method be accessed like an attribute of the object.
- Each accessor, as well as `predict`, checks the weights first so that nothing is used before the model has been fitted.

In [None]:
import numpy as np

class LinearRegression(object):
    """ Base regression model. Models the relationship between a scalar dependent variable y and the independent 
    variables X. 
    """
    def __init__(self):
      # Initialize the weights
      self.w = None

    def fit(self, X, y):
      # Insert constant ones for bias weights
      X = np.insert(X, 0, 1, axis=1)
      self.w = np.linalg.pinv(np.dot(X.T, X)).dot(X.T).dot(y)

    # define a "coef" getter
    @property
    def coef(self):
      if self.w is None:
        raise AttributeError('The coefficients do not exist')
      return self.w[1:]

    # define an "intercept" getter
    @property
    def intercept(self):
      if self.w is None:
          raise AttributeError('The intercept does not exist')
      return self.w[0]


    def predict(self, X):
      # Check if the model has been fitted yet
      if self.w is None:
        raise AttributeError('You need to fit the model first before running the predictions')
      # Insert constant ones for bias weights
      X = np.insert(X, 0, 1, axis=1)
      y_pred = X @ self.w

      return y_pred

In [None]:
model = LinearRegression()
model.fit(X, y)

In [None]:
model.coef

array([ 2.49905523e-02, -1.08359026e+00, -1.82563948e-01,  1.63312696e-02,
       -1.87422516e+00,  4.36133331e-03, -3.26457970e-03, -1.78811634e+01,
       -4.13653146e-01,  9.16334412e-01,  2.76197700e-01])

In [None]:
model.intercept

21.96520806282637

In [None]:
model.predict(X)

array([5.03285045, 5.13787975, 5.20989474, ..., 5.94304255, 5.47075621,
       6.00819633])
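
As a sanity check, the normal-equation solution used in `fit` above can be compared with NumPy's least-squares solver. This is a minimal sketch on synthetic data; `X_demo`, `y_demo` and the chosen coefficients are made up for illustration and are not the dataset used in this notebook.

```python
import numpy as np

# Synthetic data (hypothetical): 100 samples, 3 features, known coefficients
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 3))
y_demo = 4.0 + X_demo @ np.array([1.5, -2.0, 0.5]) + 0.1 * rng.normal(size=100)

# Prepend a column of ones so the first weight acts as the intercept
X_aug = np.insert(X_demo, 0, 1, axis=1)

# Normal equation with a pseudo-inverse, mirroring fit()
w_normal = np.linalg.pinv(X_aug.T @ X_aug) @ X_aug.T @ y_demo

# Reference solution from NumPy's least-squares solver
w_lstsq, *_ = np.linalg.lstsq(X_aug, y_demo, rcond=None)

print("intercept:", w_normal[0])
print("slopes:   ", w_normal[1:])
print("matches lstsq:", np.allclose(w_normal, w_lstsq))
```

The first entry of the weight vector plays the role of `intercept` and the remaining entries play the role of `coef`.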

### Evaluate model performance

In [None]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

In [None]:
l_regression.fit(X_train, y_train)

LinearRegression()

In [None]:
l_regression.score(X_train, y_train)

0.36119824413213164

In [None]:
l_regression.score(X_test, y_test)

0.3513885332505233

=> The model shows a slightly higher score on the **training set** than the testing set
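
Assuming `l_regression` is a scikit-learn regressor (the echoed `LinearRegression()` suggests so), its `score` method returns the coefficient of determination $R^2$. The sketch below computes the same quantity by hand; `r2_score_manual` is a hypothetical helper name, not part of the notebook.

```python
import numpy as np

def r2_score_manual(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot

y_true = np.array([3.0, 5.0, 7.0, 9.0])

# Perfect predictions explain all of the variance
perfect = r2_score_manual(y_true, y_true)
# Always predicting the mean explains none of it
baseline = r2_score_manual(y_true, np.full(4, y_true.mean()))

print(perfect, baseline)
```

Read this way, scores around 0.35 mean the model explains roughly a third of the variance in the target on both splits.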

## Regularized Regression: Ridge and Lasso

- The high model variance is rooted in multicollinearity among the data features.
- Intuitively, such hidden linear relationships make the model over-confident about what it learns from the training data, and what it learns can **NOT** be generalized to unseen test sets.
- This suggests that whenever multicollinearity occurs, the linear coefficients estimated by the normal equation can be larger (in magnitude) than the unobserved true model's coefficients.
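
One way to see why multicollinearity is a problem for the normal equation is to look at the condition number of $X^TX$. The sketch below uses made-up synthetic features (not the notebook's data) and compares two independent columns with two nearly identical ones.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

x1 = rng.normal(size=n)
x2_independent = rng.normal(size=n)
x2_collinear = x1 + 1e-3 * rng.normal(size=n)  # almost an exact copy of x1

X_ind = np.column_stack([x1, x2_independent])
X_col = np.column_stack([x1, x2_collinear])

# The normal equation inverts X^T X; a large condition number means that
# small noise in y can be amplified into large changes in the coefficients.
cond_ind = np.linalg.cond(X_ind.T @ X_ind)
cond_col = np.linalg.cond(X_col.T @ X_col)

print("independent features:", cond_ind)
print("collinear features:  ", cond_col)
```

The nearly collinear design yields a condition number that is orders of magnitude larger, which is the instability that regularization is meant to dampen.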

#### **Gradient Descent**
  - We first initialize the weights of the linear model to small values, say $\beta^0$
  - for `i` in `1:n_iterations`:
    - Calculate the cost: $RSS(\beta^i) = (y-X\beta^i)^T(y-X\beta^i)$
    - Calculate the gradient: $\nabla=\frac{\partial RSS}{\partial\beta^i} = -2X^T(y-X\beta^i)$
    - Update $\beta$ by stepping against the gradient, scaled by a learning rate $\eta$: $\beta^{i+1} = \beta^i - \eta\nabla$
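
The steps above can be sketched directly in NumPy. This is a minimal illustration on synthetic data, assuming zero initial weights and a hand-picked learning rate; `gradient_descent_rss`, `X_gd` and `y_gd` are hypothetical names. It uses the unscaled RSS gradient from the formulas above, so the learning rate has to be small relative to the size of $X^TX$.

```python
import numpy as np

def gradient_descent_rss(X, y, eta=1e-3, n_iterations=500):
    """Minimize RSS(beta) = (y - X beta)^T (y - X beta) by gradient descent."""
    beta = np.zeros(X.shape[1])           # beta^0: small (here zero) start
    for _ in range(n_iterations):
        residual = y - X @ beta
        grad = -2 * X.T @ residual        # gradient of RSS w.r.t. beta
        beta = beta - eta * grad          # beta^{i+1} = beta^i - eta * grad
    return beta

# Synthetic data (hypothetical): intercept column plus two features
rng = np.random.default_rng(0)
X_gd = np.column_stack([np.ones(200), rng.normal(size=(200, 2))])
y_gd = X_gd @ np.array([1.0, 2.0, -3.0]) + 0.1 * rng.normal(size=200)

beta_gd = gradient_descent_rss(X_gd, y_gd)
beta_ls = np.linalg.lstsq(X_gd, y_gd, rcond=None)[0]

print("gradient descent:", beta_gd)
print("least squares:   ", beta_ls)
```

With a suitable learning rate the iterates approach the least-squares solution; a rate that is too large makes the updates diverge instead.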

### 1) Lasso Regression

- The difference between regularized regression and ordinary linear regression is the cost function.
- Since the penalty term is **NOT** a smooth function, there is no closed-form solution for the Lasso model's coefficients. We can use gradient descent to find the best weights, taking the sign of each weight as the (sub)gradient of its absolute value.

$$Cost = (y-X\beta)^T(y-X\beta)+\lambda\|\beta\|_1$$

- The $\lambda$ here controls how hard we want to penalize the weights. The higher its value, the more the weights are shrunk.

In [None]:
class l1_regularization():
  """ Regularization for Lasso Regression """
  def __init__(self, alpha):
    self.alpha = alpha
  
  def __call__(self, w):
    return self.alpha * np.linalg.norm(w, 1)

  def grad(self, w):
    return self.alpha * np.sign(w)
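
To make the two methods concrete, the sketch below evaluates the same expressions on a small hand-picked weight vector (the values of `alpha` and `w` are arbitrary examples).

```python
import numpy as np

alpha = 0.5
w = np.array([1.0, -2.0, 0.0])

# L1 penalty: alpha * (|1| + |-2| + |0|)
penalty = alpha * np.linalg.norm(w, 1)

# Subgradient: alpha * sign(w); np.sign returns 0 for a zero weight
subgrad = alpha * np.sign(w)

print(penalty)   # 1.5
print(subgrad)   # [ 0.5 -0.5  0. ]
```

Every non-zero weight is pushed toward zero by the same amount `alpha`, regardless of its magnitude.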

### 2) Ridge Regression

- The only difference between ridge and lasso is the regularization term.
  - $Cost = (y-X\beta)^T(y-X\beta)+\lambda\beta^T\beta$
  - Ridge uses the squared L2 norm, while Lasso uses the L1 norm.
- It is a good idea to create a general regression class and make ridge and lasso its subclasses.


In [None]:
class l2_regularization():
  """ Regularization for Ridge Regression """
  def __init__(self, alpha):
    self.alpha = alpha
  
  def __call__(self, w):
    # The 0.5 factor makes the gradient of this penalty exactly alpha * w
    return self.alpha * 0.5 * w.T @ w

  def grad(self, w):
    return self.alpha * w
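
A finite-difference check is a quick way to confirm that a penalty and its `grad` agree. The sketch below assumes the penalty is written as $\frac{1}{2}\alpha w^Tw$, whose gradient is $\alpha w$; without the $\frac{1}{2}$ the gradient would be $2\alpha w$. The helper names are hypothetical.

```python
import numpy as np

def l2_penalty(w, alpha):
    return 0.5 * alpha * (w @ w)

def l2_penalty_grad(w, alpha):
    return alpha * w

def numerical_grad(f, w, eps=1e-6):
    """Central finite differences, one coordinate at a time."""
    grad = np.zeros_like(w)
    for j in range(w.size):
        step = np.zeros_like(w)
        step[j] = eps
        grad[j] = (f(w + step) - f(w - step)) / (2 * eps)
    return grad

alpha = 0.3
w = np.array([1.0, -2.0, 0.5])

analytic = l2_penalty_grad(w, alpha)
numeric = numerical_grad(lambda v: l2_penalty(v, alpha), w)

print(analytic)
print(numeric)
```

Unlike the L1 case, the pull toward zero is proportional to the weight itself, so large weights are shrunk more strongly than small ones.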

### Creating the general regression class

In [None]:
import math
import numpy as np

class Regression(object):
    """ Base regression model. Models the relationship between a scalar dependent variable y and the independent 
    variables X.
    Parameters:
    -----------
    n_iterations: int
        The number of training iterations the algorithm will tune the weights for.
    learning_rate: float
        The step length that will be used when updating the weights.
    """
    def __init__(self, n_iterations, learning_rate):
      self.n_iterations = n_iterations
      self.learning_rate = learning_rate
      # Weights stay None until fit() is called, so the checks below work
      self.w = None

    def initialize_weights(self, n_features):
      """ Initialize weights uniformly in [-1/sqrt(n_features), 1/sqrt(n_features)] """
      limit = 1/math.sqrt(n_features)
      self.w = np.random.uniform(-limit, limit, (n_features, ))

    def fit(self, X, y):
      # Insert constant ones for bias weights
      X = np.insert(X, 0, 1, axis=1)
      # Initialize the weights
      self.initialize_weights(n_features=X.shape[1])

      # Perform gradient descent for n_iterations
      for i in range(self.n_iterations):
        y_pred = X.dot(self.w)
        # Calculate the regularized loss (kept for monitoring; not used in the update)
        mse = np.mean(0.5 * (y - y_pred)**2 + self.regularization(self.w))
        # Calculate the gradient of the loss with respect to the weights
        grad_w = (-(y - y_pred) @ X  + self.regularization.grad(self.w))/ X.shape[0]
        self.w -= self.learning_rate * grad_w


    # define a "coef" getter
    @property
    def coef(self):
      if self.w is None:
        raise AttributeError('The coefficients do not exist')
      return self.w[1:]

    # define an "intercept" getter
    @property
    def intercept(self):
      if self.w is None:
          raise AttributeError('The intercept does not exist')
      return self.w[0]


    def predict(self, X):
      # Check if the model has been fitted yet
      if self.w is None:
        raise AttributeError('You need to fit the model first before running the predictions')
      # Insert constant ones for bias weights
      X = np.insert(X, 0, 1, axis=1)
      y_pred = X @ self.w

      return y_pred

### Ridge Regression Class

In [None]:
class RidgeRegression(Regression):
    """Also referred to as Tikhonov regularization. Linear regression model with a regularization factor.
    Model that tries to balance the fit of the model with respect to the training data and the complexity
    of the model. A larger regularization factor decreases the variance of the model.
    Parameters:
    -----------
    reg_factor: float
        The factor that will determine the amount of regularization and feature shrinkage.
    n_iterations: int
        The number of training iterations the algorithm will tune the weights for.
    learning_rate: float
        The step length that will be used when updating the weights.
    """
    def __init__(self, reg_factor, n_iterations=1000, learning_rate=0.001):
      super().__init__(n_iterations, learning_rate)
      # Adding the regularization term
      self.regularization = l2_regularization(alpha=reg_factor)

### Lasso Regression Class

In [None]:
class LassoRegression(Regression):
    """Linear regression model with a regularization factor which does both variable selection 
    and regularization. Model that tries to balance the fit of the model with respect to the training 
    data and the complexity of the model. A larger regularization factor decreases the variance of
    the model.
    Parameters:
    -----------
    reg_factor: float
        The factor that will determine the amount of regularization and feature shrinkage.
    n_iterations: int
        The number of training iterations the algorithm will tune the weights for.
    learning_rate: float
        The step length that will be used when updating the weights.
    """
    def __init__(self, reg_factor, n_iterations=1000, learning_rate=0.001):
      super().__init__(n_iterations, learning_rate)
      # Adding the regularization term
      self.regularization = l1_regularization(alpha=reg_factor)

We need to fit the scaler using only the training set, not the test set, to prevent data leakage. This means the mean and standard deviation are calculated from the training set alone and then reused to transform the test set.

We will be using the **StandardScaler class** from the scikit-learn package.

In [None]:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
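
What the scaler does here can be sketched with plain NumPy: the statistics come from the training split only and are then applied unchanged to the test split. The arrays `X_tr` and `X_te` below are synthetic stand-ins, not the notebook's data. `StandardScaler` uses the population standard deviation, which matches NumPy's default `ddof=0`.

```python
import numpy as np

# Synthetic stand-ins for a train/test split (hypothetical data)
rng = np.random.default_rng(0)
X_tr = rng.normal(loc=10.0, scale=3.0, size=(80, 2))
X_te = rng.normal(loc=10.0, scale=3.0, size=(20, 2))

# "fit": statistics are computed from the training set only
mean = X_tr.mean(axis=0)
std = X_tr.std(axis=0)

# "transform": the same statistics are applied to both splits
X_tr_scaled = (X_tr - mean) / std
X_te_scaled = (X_te - mean) / std

print(X_tr_scaled.mean(axis=0), X_tr_scaled.std(axis=0))
print(X_te_scaled.mean(axis=0), X_te_scaled.std(axis=0))
```

The scaled training features have zero mean and unit variance, while the test features are only approximately standardized, because none of their statistics were used.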

In [None]:
lasso = LassoRegression(reg_factor = 1)
lasso.fit(X_train, y_train)

In [None]:
lasso.coef

array([ 0.23913027, -0.07444481,  0.1355236 , -0.0750558 ,  0.09462986,
        0.03374044,  0.01253292, -0.12177713,  0.20133235, -0.01269698,
        0.31258495])