# Linear Regression From Scratch

* `Linear regression` is an algorithm that models a dependent (target) variable as a linear function of one or more independent (predictor) variables, so that the fitted relationship can be used to predict outcomes for new data.
* Because it learns from labelled examples and predicts a continuous numeric value (for example sales, salary, age, or product price), `linear regression` is a supervised learning algorithm for regression tasks.


In [1]:
import numpy as np

# Black code formatter (Optional)
%load_ext lab_black

## Equations

### Predicted (Estimated) Value

$$ 
\hat{y} = wX + b
$$

$$ 
\hat{y}_{i} = wx_{i} + b
$$


where
* `y_hat` ($\hat{y}$) is the **estimated value**,
* `w` is the **weight**,
* `X` is the **predictor** ($x_{i}$ is its value for the $i$-th training example),
* and `b` is the **bias**.
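As a minimal sketch of the prediction equation (the numbers are hypothetical toy values, not from this notebook's data), NumPy can evaluate $\hat{y}_{i} = wx_{i} + b$ for every example at once. With `X` stored as an `(n_samples, n_features)` matrix, the product is written `X @ w`, which is the matrix form of $wX$.

```python
import numpy as np

# Hypothetical toy values: 4 samples, 1 feature (the predictor X).
X = np.array([[1.0], [2.0], [3.0], [4.0]])  # shape (n_samples, n_features)
w = np.array([2.0])                          # one weight per feature
b = 0.5                                      # scalar bias

# y_hat = Xw + b, evaluated for every row at once.
y_hat = X @ w + b
print(y_hat)  # [2.5 4.5 6.5 8.5]
```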

### Loss

* Mean Square Error (MSE): This is the average (mean) of the squared differences between the actual and estimated values. Training aims to reduce it on every iteration.

$$ 
MSE = \frac{1}{N} \sum_{i=1}^{N}({y_{i} - \hat{y}_{i}})^2
$$

* **Loss Function**: This computes the error for a **single training example**.
* **Cost Function**: This computes the **average** of the per-example losses over the **entire training set**. The MSE above is the cost obtained from the squared-error loss.
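To make the loss/cost distinction concrete, here is a small sketch with hypothetical values: the squared-error loss is computed for each example separately, and the MSE cost is simply their average.

```python
import numpy as np

def squared_loss(y_i: float, y_hat_i: float) -> float:
    """Loss for a single training example."""
    return float((y_i - y_hat_i) ** 2)

def mse_cost(y: np.ndarray, y_hat: np.ndarray) -> float:
    """Cost: the average of the per-example losses over the training set."""
    return float(np.mean((y - y_hat) ** 2))

# Hypothetical actual and estimated values.
y = np.array([3.0, 5.0, 7.0])
y_hat = np.array([2.5, 5.0, 8.0])

losses = [squared_loss(a, p) for a, p in zip(y, y_hat)]
print(losses)              # [0.25, 0.0, 1.0]
print(mse_cost(y, y_hat))  # 1.25 / 3, roughly 0.4167
```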

### Weight and Bias

* Weight: This is the coefficient of the predictor variable, i.e. the value that each predictor value is multiplied by.

* Bias: This is the intercept term. It allows the model to `learn` a shifted function, i.e. it moves the straight line up or down.


### Training with Gradient Descent

`Gradient Descent` is an optimization algorithm which tries to **iteratively tweak** the **parameters** of a model (weights and biases) in order to find the set of parameter values that **minimizes** the model's **prediction error** (cost function).

* After the parameters of the model have been initialised (for example, randomly), each iteration of gradient descent goes as follows:
  * using the current parameter values, we make a prediction for every instance of the training data and compare that prediction to the actual target value.

  * Once we have computed this aggregated error (the cost function), we measure the local gradient of the cost with respect to the model parameters and update the parameters by moving them in the direction of the descending gradient, which makes the cost function decrease.

Source: [Here](https://towardsdatascience.com/linear-regression-explained-d0a1068accb9)
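The iteration described above can be sketched in a few lines of NumPy. This is an illustrative sketch on hypothetical noiseless data (`y = 3x + 1`), with the learning rate and number of iterations chosen by assumption; it uses the MSE cost and the `dw`/`db` gradient formulas from the cost-function section of this notebook.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical noiseless training data on the line y = 3x + 1.
x = np.linspace(-1.0, 1.0, 20)
y = 3.0 * x + 1.0

# Random initialisation (the class later in this notebook starts from zeros instead).
w, b = rng.normal(size=2)
learning_rate = 0.1  # assumed value for this toy problem
costs = []

for _ in range(500):
    y_hat = w * x + b                         # predict every training instance
    costs.append(np.mean((y - y_hat) ** 2))   # aggregated error: the MSE cost
    dw = np.mean(2 * x * (y_hat - y))         # local gradient with respect to w
    db = np.mean(2 * (y_hat - y))             # local gradient with respect to b
    w -= learning_rate * dw                   # step in the direction of descending gradient
    b -= learning_rate * db

print(w, b)  # approximately 3.0 and 1.0
```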

<hr>

<br>

### Learning Rate

* For gradient descent to work, the `learning rate` must be set to an appropriate value; choosing a good one is crucial.

* The `learning rate` determines how large a step we take towards the optimal weights on each iteration.
  * If it is **too large**, the updates overshoot the minimum, and the cost can oscillate or even diverge.
  * If it is **too small**, we need too many iterations to converge to the best values.
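A toy illustration of this trade-off (a sketch, not part of the original notebook): minimize $f(w) = w^2$, whose gradient is $2w$, starting from $w = 1$ for 50 steps with three different learning rates. Each step multiplies $w$ by $(1 - 2 \times \text{learning rate})$, so a tiny rate barely moves, a moderate rate converges towards 0, and a rate above 1 makes that factor larger than 1 in magnitude, so the iterates grow.

```python
def minimize_square(lr: float, w0: float = 1.0, n_iters: int = 50) -> float:
    """Run gradient descent on f(w) = w**2 (minimum at w = 0); f'(w) = 2w."""
    w = w0
    for _ in range(n_iters):
        w -= lr * 2 * w
    return w

for lr in (0.001, 0.1, 1.1):
    print(lr, minimize_square(lr))
# 0.001 -> about 0.905   (too small: barely moved)
# 0.1   -> about 1.4e-05 (close to the minimum at 0)
# 1.1   -> about 9100    (too large: diverges)
```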

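### Gradients of the Cost Function

With $\hat{y}_{i} = wx_{i} + b$, we differentiate the MSE cost with respect to each parameter, using the chain rule with $\partial \hat{y}_{i} / \partial w = x_{i}$ and $\partial \hat{y}_{i} / \partial b = 1$.

* **Derivative with respect to the weight (dw)**

$$
dw = \frac{\partial\, MSE}{\partial w} = \frac{1}{N}\sum_{i=1}^{N} -2x_{i}(y_{i} - \hat{y}_{i}) = \frac{1}{N}\sum_{i=1}^{N} 2x_{i}(\hat{y}_{i} - y_{i})
$$

* **Derivative with respect to the bias (db)**

$$
db = \frac{\partial\, MSE}{\partial b} = \frac{1}{N}\sum_{i=1}^{N} -2(y_{i} - \hat{y}_{i}) = \frac{1}{N}\sum_{i=1}^{N} 2(\hat{y}_{i} - y_{i})
$$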
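One way to check these derivatives (a sketch on hypothetical data) is to compare them with central finite differences of the MSE. Because the MSE is quadratic in `w` and `b`, the analytic and numerical values should agree up to floating-point rounding.

```python
import numpy as np

def mse(w: float, b: float, x: np.ndarray, y: np.ndarray) -> float:
    """MSE cost of the line w * x + b."""
    return float(np.mean((y - (w * x + b)) ** 2))

def analytic_grads(w: float, b: float, x: np.ndarray, y: np.ndarray):
    """dw and db from the formulas above."""
    y_hat = w * x + b
    dw = np.mean(2 * x * (y_hat - y))
    db = np.mean(2 * (y_hat - y))
    return float(dw), float(db)

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])  # hypothetical data on the line y = 2x + 1
w, b = 0.5, 0.0                      # arbitrary point at which to evaluate the gradient

dw, db = analytic_grads(w, b, x, y)
print(dw, db)  # -13.5 -6.5

# Central finite differences as an independent estimate.
eps = 1e-6
num_dw = (mse(w + eps, b, x, y) - mse(w - eps, b, x, y)) / (2 * eps)
num_db = (mse(w, b + eps, x, y) - mse(w, b - eps, x, y)) / (2 * eps)
```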

### Update Rules

* Update the weight and bias, where $\alpha$ is the learning rate:

$$
w = w - \alpha \cdot dw
$$
$$
b = b - \alpha \cdot db
$$
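With several features, the same update rules become vector operations: `dw` holds one partial derivative per feature, computed as $\frac{2}{N}X^{T}(\hat{y} - y)$. The sketch below (hypothetical data and step size) performs a single update and checks that the cost goes down, which is what should happen when the learning rate is small enough.

```python
import numpy as np

def mse(X: np.ndarray, y: np.ndarray, w: np.ndarray, b: float) -> float:
    return float(np.mean((y - (X @ w + b)) ** 2))

def gradient_step(X, y, w, b, alpha):
    """Apply w = w - alpha * dw and b = b - alpha * db once, for all weights at once."""
    n = len(y)
    error = X @ w + b - y             # (y_hat - y), shape (n_samples,)
    dw = (2 / n) * (X.T @ error)      # one partial derivative per feature
    db = (2 / n) * np.sum(error)
    return w - alpha * dw, b - alpha * db

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 2))
y = X @ np.array([1.5, -2.0]) + 0.7   # hypothetical noiseless targets

w, b = np.zeros(2), 0.0
before = mse(X, y, w, b)
w, b = gradient_step(X, y, w, b, alpha=0.1)
after = mse(X, y, w, b)
print(before > after)  # True
```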

## Steps For Building Linear Regression (From Scratch)

1. **Initialize** the weight and bias to **zero** (0).
2. Given the training data points, **predict** (estimate) the target using $\hat{y} = wX + b$.
3. Calculate the **error** (cost) and its **gradients** $dw$ and $db$.
4. **Update** the parameters (weight and bias) using the gradient descent update rules.
5. **Repeat** steps 2 to 4 for `n` iterations.
6. Use the updated parameters to make predictions.

In [2]:
class LinearRegression:
    """Linear regression trained with batch gradient descent on the MSE cost."""

    def __init__(self, learning_rate: float = 0.001, n_iters: int = 1_000) -> None:
        self.learning_rate = learning_rate
        self.n_iters = n_iters
        self.weight = None
        self.bias = None

    def __repr__(self) -> str:
        return (
            f"{type(self).__name__}(learning_rate={self.learning_rate}, "
            f"n_iters={self.n_iters:,})"
        )

    def fit(self, X: np.ndarray, y: np.ndarray) -> "LinearRegression":
        """Train on X of shape (n_samples, n_features) and y of shape (n_samples,)."""
        n_samples, n_features = X.shape

        # Step 1: Initialize the weight vector and the bias to zero.
        self.weight = np.zeros(n_features)  # shape: (n_features,)
        self.bias = 0.0  # scalar

        for _ in range(self.n_iters):
            # Step 2: Estimate y for every data point.
            # X: (n_samples, n_features) . weight: (n_features,) -> (n_samples,)
            # The inner dimensions of a dot product MUST be equal.
            # For more info check: https://numpy.org/doc/stable/reference/generated/numpy.dot.html
            y_pred = np.dot(X, self.weight) + self.bias

            # Step 3: Compute the gradients of the MSE cost.
            # X.T: (n_features, n_samples) . (y_pred - y): (n_samples,) -> (n_features,)
            # np.dot also performs the summation over the training examples.
            dw = (1 / n_samples) * 2 * np.dot(X.T, y_pred - y)
            db = (1 / n_samples) * 2 * np.sum(y_pred - y)

            # Step 4: Update the parameters against the gradient.
            self.weight -= self.learning_rate * dw
            self.bias -= self.learning_rate * db
        return self

    def predict(self, X: np.ndarray) -> np.ndarray:
        """Return the predicted target value for each row of X."""
        # Step 5: Use the updated parameters to make predictions.
        return np.dot(X, self.weight) + self.bias

    @staticmethod
    def calculate_MSE(y_true: np.ndarray, y_pred: np.ndarray) -> float:
        """Return the mean squared error, rounded to 2 decimal places."""
        mse = np.mean(np.square(y_true - y_pred))
        return round(float(mse), 2)

In [7]:
from sklearn.datasets import make_regression, make_classification
from sklearn.model_selection import train_test_split

In [4]:
def cal_mse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Return the mean squared error, rounded to 2 decimal places."""
    mse = np.mean(np.square(y_true - y_pred))
    return round(mse, 2)


# # visualize the relationship between the predictor and the target
# plt.figure(figsize=(10, 6))
# plt.scatter(X[:, 0], y, s=20)
# plt.show()


def run():
    """Main program"""
    # create synthetic data
    data = make_regression(n_samples=300, n_features=1, noise=27, random_state=4)
    X, y = data

    # split the data
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=4
    )

    # instantiate model
    linear_reg = LinearRegression(learning_rate=0.001, n_iters=1_000)

    # train model
    linear_reg.fit(X_train, y_train)

    # make predictions
    y_pred = linear_reg.predict(X_test)

    mse = cal_mse(y_test, y_pred)
    return mse

In [5]:
run()

464.83
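The output above is a single test-set MSE. One way to sanity-check a gradient-descent fit, sketched here on hypothetical synthetic data rather than this notebook's dataset, is to compare the learned parameters with the closed-form least-squares solution from `np.linalg.lstsq`. The helper `fit_gradient_descent` and its default learning rate and iteration count are assumptions for this sketch; if the learning rate is small or the number of iterations low, the two results may not agree yet, which signals that training stopped before converging.

```python
import numpy as np

def fit_gradient_descent(X, y, learning_rate=0.1, n_iters=2_000):
    """Hypothetical helper: batch gradient descent on the MSE cost, starting from zeros."""
    n_samples, n_features = X.shape
    w, b = np.zeros(n_features), 0.0
    for _ in range(n_iters):
        error = X @ w + b - y
        w -= learning_rate * (2 / n_samples) * (X.T @ error)
        b -= learning_rate * (2 / n_samples) * np.sum(error)
    return w, b

# Hypothetical synthetic data: y = 4x + 2 plus Gaussian noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))
y = 4.0 * X[:, 0] + 2.0 + rng.normal(scale=0.5, size=200)

w_gd, b_gd = fit_gradient_descent(X, y)

# Closed-form least squares: a column of ones makes the last coefficient the bias.
A = np.column_stack([X, np.ones(len(X))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

print(w_gd, b_gd)
print(coef)  # [weight, bias]; should match the gradient-descent values closely
```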

## Logistic Regression

* Logistic Regression is used for solving (binary) classification problems.
* It passes the linear prediction through the `sigmoid` function, which turns it into a probability, a number between 0 and 1. A threshold (0.5 in the code below) then converts that probability into a binary class label.

* The `sigmoid` function is also called a `squashing` function: its domain is the set of all real numbers, while its range is (0, 1). However large a negative or positive number the input is, the output always lies strictly between 0 and 1.


$$
S(x) = \frac{1}{1 + e^{-x}}
$$

* For our use case, this becomes:

$$
y_{pred}(x) = \frac{1}{1 + e^{-y_{linear}}}
$$

where:

$$
y_{linear} = wX + b
$$
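A minimal sketch of these two equations in NumPy, with hypothetical parameters for a one-feature model: the linear part can produce any real number, the sigmoid squashes it into (0, 1), and thresholding at 0.5 yields class labels. For very large negative inputs `np.exp` overflows and NumPy emits a warning (the result still rounds to 0); `scipy.special.expit` is a numerically stable alternative.

```python
import numpy as np

def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z)), applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical parameters for a one-feature model.
w = np.array([1.5])
b = -0.5
X = np.array([[-10.0], [0.0], [1.0], [10.0]])

y_linear = X @ w + b                 # any real number: [-15.5, -0.5, 1.0, 14.5]
y_pred = sigmoid(y_linear)           # squashed into (0, 1)
labels = (y_pred > 0.5).astype(int)  # threshold at 0.5 -> class labels

print(sigmoid(0.0))  # 0.5
print(labels)        # [0 0 1 1]
```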

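### Cost Function

* Logistic regression is usually trained by minimizing the **binary cross-entropy** (log loss) rather than the MSE, with $\hat{y}_{i} = S(wx_{i} + b)$:

$$
J = -\frac{1}{N}\sum_{i=1}^{N}\left[y_{i}\log(\hat{y}_{i}) + (1 - y_{i})\log(1 - \hat{y}_{i})\right]
$$

* **Derivative with respect to the weight (dw)**

$$
dw = \frac{\partial J}{\partial w} = \frac{1}{N}\sum_{i=1}^{N}x_{i}(\hat{y}_{i} - y_{i})
$$

* **Derivative with respect to the bias (db)**

$$
db = \frac{\partial J}{\partial b} = \frac{1}{N}\sum_{i=1}^{N}(\hat{y}_{i} - y_{i})
$$

* These have the same form as the linear regression gradients apart from the constant factor 2. The code below keeps that factor of 2, which has the same effect as doubling the learning rate.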
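A sketch, assuming the cost is binary cross-entropy (the usual choice for logistic regression): its analytic gradient is $\frac{1}{N}X^{T}(\hat{y} - y)$, the same form as the MSE gradient for linear regression up to a constant factor. The code compares it with a central finite-difference estimate on hypothetical random data.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce_cost(w, b, X, y):
    """Mean binary cross-entropy of a logistic model."""
    p = sigmoid(X @ w + b)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

def bce_gradients(w, b, X, y):
    """Analytic gradients of the mean BCE with respect to w and b."""
    error = sigmoid(X @ w + b) - y  # (y_hat - y)
    return X.T @ error / len(y), float(np.mean(error))

# Hypothetical data and parameters.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = (rng.random(50) < 0.5).astype(float)
w = rng.normal(size=3)
b = 0.1

dw, db = bce_gradients(w, b, X, y)

# Central finite differences as an independent check.
eps = 1e-6
num_dw = np.array([
    (bce_cost(w + eps * e, b, X, y) - bce_cost(w - eps * e, b, X, y)) / (2 * eps)
    for e in np.eye(3)
])
num_db = (bce_cost(w, b + eps, X, y) - bce_cost(w, b - eps, X, y)) / (2 * eps)
print(np.max(np.abs(dw - num_dw)), abs(db - num_db))  # both tiny
```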

### Update Rules

* Update the weight and bias, where $\alpha$ is the learning rate:

$$
w = w - \alpha \cdot dw
$$
$$
b = b - \alpha \cdot db
$$

In [38]:
class LogisticRegression:
    """Binary logistic regression trained with batch gradient descent."""

    def __init__(self, n_iters: int = 2_000, learning_rate: float = 0.001) -> None:
        self.n_iters = n_iters
        self.learning_rate = learning_rate
        self.weight = None
        self.bias = None
        self.THRESH = 0.5  # probability threshold for predicting class 1

    def __repr__(self) -> str:
        return (
            f"{type(self).__name__}(learning_rate={self.learning_rate}, "
            f"n_iters={self.n_iters:,})"
        )

    def fit(self, X: np.ndarray, y: np.ndarray) -> "LogisticRegression":
        """Train on X of shape (n_samples, n_features) and 0/1 labels y."""
        # Initialize the parameters
        n_samples, n_features = X.shape
        self.weight = np.zeros(n_features)  # shape: (n_features,)
        self.bias = 0.0  # scalar

        for _ in range(self.n_iters):
            # Linear prediction, then squash it into a probability in (0, 1).
            y_linear = np.dot(X, self.weight) + self.bias
            y_pred = self._sigmoid(y_linear)

            # Gradients of the binary cross-entropy cost, scaled by a constant
            # factor of 2 (equivalent to doubling the learning rate).
            dw = (1 / n_samples) * 2 * np.dot(X.T, y_pred - y)
            db = 2 * np.mean(y_pred - y)

            # Update the parameters against the gradient.
            self.weight -= self.learning_rate * dw
            self.bias -= self.learning_rate * db
        return self

    @staticmethod
    def _sigmoid(y_linear: np.ndarray) -> np.ndarray:
        """Map each value to a number between 0 and 1."""
        return 1 / (1 + np.exp(-y_linear))

    def predict(self, X: np.ndarray) -> np.ndarray:
        """Return the predicted class label (0 or 1) for each row of X."""
        y_linear = np.dot(X, self.weight) + self.bias
        probabilities = self._sigmoid(y_linear)
        return (probabilities > self.THRESH).astype(int)

In [35]:
# Config
RANDOM_STATE = 123
TEST_SIZE = 0.1
N_SAMPLES = 2_000
N_FEATURES = 1
N_CLASSES = 2
NOISE = 10


def generate_mock_data(*, type_: str) -> tuple[np.ndarray, np.ndarray]:
    """This generates the synthetic data required for classification
    or regression.

    Params:
        type_ (str): 'classification' or 'regression'

    Returns:
        (X, y) (tuple): It returns the predictor and the target variable.
    """
    type_value = ["classification", "regression"]
    if type_ not in type_value:
        raise ValueError(f"{type_!r} should be {type_value[0]!r} or {type_value[1]!r}.")

    # Generate only the requested dataset.
    if type_ == "regression":
        X, y = make_regression(
            n_samples=N_SAMPLES,
            n_features=N_FEATURES,
            noise=NOISE,
            random_state=RANDOM_STATE,
        )
    else:
        X, y = make_classification(
            n_samples=N_SAMPLES,
            n_features=N_FEATURES + 10,
            n_classes=N_CLASSES,
            random_state=RANDOM_STATE,
        )
    return (X, y)

In [36]:
X, y = generate_mock_data(type_="classification")
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=TEST_SIZE, random_state=RANDOM_STATE
)

In [39]:
log_model = LogisticRegression(n_iters=10_000)
log_model.fit(X_train, y_train)
y_pred = log_model.predict(X_test)

# Accuracy: the fraction of test labels predicted correctly
print(np.mean(y_pred == y_test))

0.935
