### Implement Linear Regression and Logistic Regression
<img src="pics/bridge-2.jpg" width="800" height="400">
In this article you'll learn about Linear Regression, Logistic Regression, and the Gradient Descent algorithm, which is an essential component of deep learning.

### Agenda
1. How does it work?
2. Gradient Descent
3. Learning Rate
4. Implement Linear Regression on quantitative data
5. Implement Logistic Regression on qualitative data

### 1. How does it work?
Unlike KNN, which must store the entire training set in order to predict the output for unseen samples, Linear Regression and Logistic Regression do not store any training samples. Instead, they fit a linear function that approximates the training set. Below is a brief procedure for training both of them with **Gradient Descent** (there is another method called **Least Square Error**, but we're not going to cover it in this course).

<img src="pics/linear_regression-2.png" width="1000">

```
1. Initialize the model with random parameters
2. for _ in range(epochs):
3.    Predict output for training samples from the linear function
4.    Calculate cost from prediction outputs and training labels
5.    Update parameters by Gradient descent
```
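
The procedure above can be sketched in plain NumPy for Simple Linear Regression with an MSE cost. This is a minimal illustration, not the implementation used later in this article: the synthetic data, the learning rate, and the number of epochs are all assumptions chosen so the loop converges quickly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (assumption for illustration): y = 3 + 2x plus small noise
x = rng.uniform(0, 1, size=100)
y = 3.0 + 2.0 * x + rng.normal(0, 0.01, size=100)

# Initialize the model with random parameters
w0, w = rng.normal(size=2)
learning_rate = 0.5  # illustrative value
epochs = 2000        # illustrative value

for _ in range(epochs):
    # Predict output for training samples from the linear function
    y_hat = w0 + w * x
    # Calculate cost (MSE) from prediction outputs and training labels
    cost = np.mean((y - y_hat) ** 2)
    # Update parameters by gradient descent
    grad_w0 = -2.0 * np.mean(y - y_hat)
    grad_w = -2.0 * np.mean((y - y_hat) * x)
    w0 -= learning_rate * grad_w0
    w -= learning_rate * grad_w

print(f"w0={w0:.2f}, w={w:.2f}, cost={cost:.5f}")
```

After training, `w0` and `w` should end up close to the values 3 and 2 used to generate the data.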

<img src="pics/linear_regression_animation.gif">

There are several differences between Linear Regression and Logistic Regression.
1. Linear Regression outputs a **continuous quantity**; Logistic Regression outputs a **probability**.
2. Linear Regression uses **Mean Squared Error** or **Mean Absolute Error** as its cost function; Logistic Regression uses **Binary Cross Entropy**.

In short, the obvious difference between the two is that Linear Regression is for **quantitative data** and Logistic Regression is for **qualitative data**. The following sections show how both of them work in greater detail.

### 1.1. Linear Regression
Linear Regression's equation is a simple linear function that maps N independent variables to a dependent variable.
- Simple Linear Regression
#### $$\hat{y} = w_0 + wx$$
- Multiple Linear Regression
#### $$\hat{y} = w_0 + w_1x_1 + w_2x_2 + \dots + w_nx_n$$
- Polynomial Linear Regression
#### $$\hat{y} = w_0 + w_1x_1 + w_2x_2 + w_3x_1x_2 + w_4x_1^2 + w_5x_2^2 + \dots$$

<img src="pics/linear_regression-3.png" width="800" height="200">
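
As a small worked example, the multiple and polynomial forms can both be computed as a matrix product once the features are arranged in columns. The sample values and weights below are made up for illustration.

```python
import numpy as np

# Two samples with two features each (illustrative values)
X = np.array([[1.0, 2.0],
              [3.0, 4.0]])

# Multiple Linear Regression: y_hat = w0 + w1*x1 + w2*x2
w0 = 0.5
w = np.array([2.0, -1.0])
y_hat_multiple = w0 + X @ w

# Polynomial Linear Regression (degree 2): add x1*x2, x1^2 and x2^2
# as extra feature columns, then apply the same linear function
x1, x2 = X[:, 0], X[:, 1]
X_poly = np.column_stack([x1, x2, x1 * x2, x1 ** 2, x2 ** 2])
w_poly = np.array([2.0, -1.0, 0.1, 0.0, 0.3])
y_hat_poly = w0 + X_poly @ w_poly

print(y_hat_multiple)  # [0.5 2.5]
print(y_hat_poly)      # [1.9 8.5]
```

The polynomial model is still linear in its weights, which is why the same training procedure applies to it.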

For the Gradient Descent algorithm to train the model, we need a cost function for the optimizer to minimize.
- Mean Squared Error (MSE)
#### $$MSE = \frac{1}{n}\sum_{i=1}^n{(y_i - \hat{y}_i)^2}$$

<img src="pics/MSE.png">

- Mean Absolute Error (MAE)
#### $$MAE = \frac{1}{n}\sum_{i=1}^n{|y_i - \hat{y}_i|}$$

<img src="pics/MAE.png">
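
Both cost functions take only a line of NumPy each. The helper names and the sample values below are hypothetical and chosen so the results are easy to check by hand.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Average of the squared differences."""
    return np.mean((y_true - y_pred) ** 2)

def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences."""
    return np.mean(np.abs(y_true - y_pred))

y_true = np.array([3.0, 5.0, 7.0])
y_pred = np.array([2.0, 5.0, 10.0])

mse = mean_squared_error(y_true, y_pred)   # (1 + 0 + 9) / 3
mae = mean_absolute_error(y_true, y_pred)  # (1 + 0 + 3) / 3
print(mse, mae)
```

Note how the single large error (3) dominates MSE much more than MAE, because MSE squares each difference.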

### 1.2. Logistic Regression
Logistic Regression passes a linear equation through the sigmoid function, mapping N independent variables to one dependent variable that is **qualitative data**.
- Simple Logistic Regression
#### $$\hat{y} = \frac{1}{1 + e^{-z}},\quad z = w_0 + wx$$
- Multiple Logistic Regression
#### $$\hat{y} = \frac{1}{1 + e^{-z}},\quad z = w_0 + w_1x_1 + w_2x_2 + \dots + w_nx_n$$
- Polynomial Logistic Regression
#### $$\hat{y} = \frac{1}{1 + e^{-z}},\quad z = w_0 + w_1x_1 + w_2x_2 + w_3x_1x_2 + w_4x_1^2 + w_5x_2^2 + \dots$$

<img src="pics/logistic_regression-3.png" width="1400" height="200">

Since the output of Logistic Regression is a probability, it needs a different type of cost function.
- Binary Cross Entropy (BCE)
#### $$BCE = \frac{1}{n}\sum_{i=1}^n{\left(-y_i\log(\hat{y}_i) - (1 - y_i)\log(1 - \hat{y}_i)\right)}$$

<img src="pics/BCE.png">
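
The sigmoid output and the BCE cost can be sketched in NumPy as follows. The function names and sample values are hypothetical, and clipping the probabilities with a small `eps` is an added safeguard against `log(0)` that is not part of the formula above.

```python
import numpy as np

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean BCE; probabilities are clipped to avoid log(0)."""
    y_prob = np.clip(y_prob, eps, 1.0 - eps)
    return np.mean(-y_true * np.log(y_prob)
                   - (1.0 - y_true) * np.log(1.0 - y_prob))

z = np.array([-2.0, 0.0, 3.0])       # outputs of the linear part
y_prob = sigmoid(z)                  # predicted probabilities
y_true = np.array([0.0, 1.0, 1.0])   # labels that mostly agree with y_prob

bce_good = binary_cross_entropy(y_true, y_prob)
bce_bad = binary_cross_entropy(1.0 - y_true, y_prob)  # flipped labels
print(y_prob, bce_good, bce_bad)
```

The cost is small when the predicted probabilities agree with the labels and grows quickly when a confident prediction is wrong.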

### 2. Gradient Descent
Gradient descent is an optimization algorithm used in machine learning to move a model's parameters toward the optimal point. It minimizes a function by moving iteratively in the direction of steepest descent, as defined by the negative of the gradient.
- Gradient Descent Equation
#### $$W_{new} = W_{old} - \alpha\frac{\partial J(W)}{\partial W}$$
Where  
$J(W)$: Cost function  
$W$: Model parameters  
$\alpha$: Learning rate

<img src="pics/GradientDescent.png" width="500">

In the figure above, the model with initialized parameters starts at the **initial point**. In each epoch, the model finds the direction in which to update each parameter to move closer to the **optimal point** by calculating the partial derivative of the cost function with respect to that parameter. It then updates each parameter by subtracting the derivative times the learning rate from the parameter's current value.
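
To make one update step concrete, here is a sketch for Simple Linear Regression with the MSE cost, where the partial derivatives are $\frac{\partial J}{\partial w_0} = -\frac{2}{n}\sum(y_i - \hat{y}_i)$ and $\frac{\partial J}{\partial w} = -\frac{2}{n}\sum(y_i - \hat{y}_i)x_i$. The data, the starting point, and the learning rate are illustrative assumptions.

```python
import numpy as np

def mse_cost(params, x, y):
    w0, w = params
    return np.mean((y - (w0 + w * x)) ** 2)

def mse_gradient(params, x, y):
    """Partial derivatives of the MSE cost with respect to w0 and w."""
    w0, w = params
    residual = y - (w0 + w * x)
    return np.array([-2.0 * np.mean(residual),
                     -2.0 * np.mean(residual * x)])

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])
params = np.array([0.0, 0.0])  # initial point
alpha = 0.05                   # learning rate (illustrative)

grad = mse_gradient(params, x, y)
new_params = params - alpha * grad  # W_new = W_old - alpha * dJ/dW

print(grad, new_params)
print(mse_cost(params, x, y), mse_cost(new_params, x, y))
```

A single step already lowers the cost, because the parameters move against the gradient.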

### 3. Learning Rate
The learning rate is a hyperparameter that determines how much the parameters are updated at each step. A learning rate that is too small leads to slow training, and one that is too big leads to oscillation or divergence.

<img src="pics/LearningRate.png" width="1000">
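
The effect is easy to reproduce on a toy cost function. The sketch below minimizes $J(w) = w^2$, whose gradient is $2w$, with three learning rates picked only to show the three behaviours.

```python
def run_gradient_descent(learning_rate, steps=20, w_start=1.0):
    """Minimize J(w) = w^2 and return the distance from the optimum w = 0."""
    w = w_start
    for _ in range(steps):
        w = w - learning_rate * 2.0 * w  # gradient of w^2 is 2w
    return abs(w)

too_small = run_gradient_descent(0.001)  # barely moves from the start
reasonable = run_gradient_descent(0.1)   # gets close to the optimum
too_big = run_gradient_descent(1.1)      # overshoots further on every step

print(too_small, reasonable, too_big)
```

With the largest rate each step jumps past the minimum to a point further away than before, which is the divergence described above.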

### 4. Implement Linear Regression on quantitative data
-

In [5]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

### 4.1. Load dataset

In [None]:
# Load dataset
house_data = pd.read_csv("./datasets/housedata/data.csv")

### 4.2. Prepare data

In [None]:
# Remove columns
data = house_data.drop(columns=["date", "street", "country", "house_age", "statezip"])
data.head()

In [None]:
# One-hot encoding categorical columns
categorical_cols = ["view", "condition", "city"]

for col in categorical_cols:
    encoded = pd.get_dummies(data[col])
    encoded.columns = [col + "_" + str(_col) for _col in encoded.columns]
    data = pd.concat([data.drop(columns=col), encoded], axis=1)
data.head()

In [None]:
# Separate features (x) from the prediction target (y)
data_x = data.drop(columns="price")
data_y = data.price
print(f"data_x: {data_x.shape}")
print(f"data_y: {data_y.shape}")

In [None]:
from sklearn.model_selection import train_test_split

In [None]:
# Split train/test
# Group y into bins
bins = np.linspace(0, 1500000, 10)
y_binned = np.digitize(data_y, bins)
plt.hist(y_binned)

# Split with stratify
train_x, test_x, train_y, test_y = train_test_split(data_x, data_y, test_size=0.2, random_state=42, shuffle=True, stratify=y_binned)
print(f"train_x: {train_x.shape}")
print(f"test_x: {test_x.shape}")
print(f"train_y: {train_y.shape}")
print(f"test_y: {test_y.shape}")

In [None]:
from sklearn.preprocessing import StandardScaler

In [None]:
# Standardize all features to zero mean and unit variance, fitting the scaler on the training set only
scaler = StandardScaler()
scaler.fit(train_x)

train_x_scaled = scaler.transform(train_x)
train_x_scaled = pd.DataFrame(train_x_scaled, columns=train_x.columns)
train_x_scaled.describe()

In [None]:
# Standardize the test set with the statistics learned from the training set
test_x_scaled = scaler.transform(test_x)
test_x_scaled = pd.DataFrame(test_x_scaled, columns=test_x.columns)
test_x_scaled.describe()
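
Note that `StandardScaler` standardizes each feature to zero mean and unit variance rather than rescaling it to a fixed range. The sketch below reproduces that calculation by hand on made-up numbers, reusing the training statistics for the test set as the cells above do.

```python
import numpy as np

# Toy data (illustrative): two features on very different scales
train = np.array([[1.0, 100.0],
                  [2.0, 200.0],
                  [3.0, 300.0]])
test = np.array([[4.0, 400.0]])

# Statistics come from the training set only
mean = train.mean(axis=0)
std = train.std(axis=0)  # population standard deviation (ddof=0)

train_scaled = (train - mean) / std
test_scaled = (test - mean) / std  # reuse training statistics

print(train_scaled.mean(axis=0), train_scaled.std(axis=0))
print(test_scaled)
```

The scaled test values fall outside the range seen in training, which is expected: standardized features are not bounded.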

In [None]:
# Convert type to numpy array
train_x_scaled = train_x_scaled.to_numpy()
train_x, train_y = train_x.to_numpy(), train_y.to_numpy()
test_x_scaled, test_y = test_x_scaled.to_numpy(), test_y.to_numpy()

### 4.3. Prepare model

In [1]:
import tensorflow as tf
from tensorflow.keras import layers, Model
from sklearn.linear_model import LinearRegression as ExampleLinearRegression

In [16]:
class LinearLayer(layers.Layer):
    def build(self, input_shape):
        # w shape: (feature_size, 1)
        self.w = self.add_weight(name="w",
                                 shape=(input_shape[-1], 1),
                                 initializer=tf.random_normal_initializer(),
                                 trainable=True,
                                 dtype="float32")
        # b shape: (1, )
        self.b = self.add_weight(name="b",
                                 shape=(1, ),
                                 initializer=tf.random_normal_initializer(),
                                 trainable=True,
                                 dtype="float32")
    
    def call(self, inp):
        """
        inp shape: (batch_size, feature_size)
        out shape: (batch_size, 1)
        """
        out = tf.matmul(inp, self.w) + self.b
        return out
    
class LinearRegression(Model):
    def __init__(self):
        super().__init__()
        self.linear_layer = LinearLayer()
        
    def call(self, inp):
        out = self.linear_layer(inp)
        return out

In [21]:
regression_model = LinearRegression()

x = np.random.uniform(low=0, high=5, size=(100, 50)).astype(dtype=np.float32)
y = np.random.uniform(low=0, high=1000, size=(100, )).astype(dtype=np.float32)

y_pred = regression_model.predict(x)

assert y_pred.shape == (x.shape[0], 1)
print("Pass")

Pass


In [6]:
# Test LinearRegression
regression_model = LinearRegression()
ex_regression_model = ExampleLinearRegression()

x = np.random.uniform(low=0, high=5, size=(10, 5))
y = np.random.uniform(low=0, high=1000, size=(10, ))

# A Keras model must be compiled with an optimizer and a loss before fit()
regression_model.compile(optimizer="sgd", loss="mse")
regression_model.fit(x, y, verbose=0)
ex_regression_model.fit(x, y)

pred = regression_model.predict(x)
ex_pred = ex_regression_model.predict(x)

# Gradient descent only approximates the closed-form scikit-learn fit,
# so compare output shapes rather than exact values
assert pred.reshape(-1).shape == ex_pred.shape
print("Pass")

### 4.4. Fit and evaluate model

In [None]:
from tensorflow.keras.losses import MeanAbsoluteError
from utilities import CrossValidation

In [None]:
model = LinearRegression()
metrics = MeanAbsoluteError()
scaler = StandardScaler()
evaluator = CrossValidation(metrics, k_folds=10, scaler=scaler)

score = evaluator.eval(model, train_x, train_y, verbose=0)
print(f"Validation errors: {evaluator.scores}")
print(f"Validation mean error: {score}")
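
The `CrossValidation` helper comes from the local `utilities` module, which is not shown here. As a rough idea of what k-fold evaluation with a scaler involves, here is a NumPy-only sketch. It is an assumption about the general technique, not the helper's actual code: the function name `k_fold_mae` is hypothetical, and a closed-form least-squares fit stands in for the gradient-descent model to keep the example short.

```python
import numpy as np

def k_fold_mae(x, y, k_folds=5, seed=0):
    """Return the validation MAE of each fold for a least-squares linear model."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(x)), k_folds)
    scores = []
    for i in range(k_folds):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k_folds) if j != i])
        # Fit scaling statistics on the training folds only to avoid leakage
        mean = x[train_idx].mean(axis=0)
        std = x[train_idx].std(axis=0) + 1e-8
        x_train = (x[train_idx] - mean) / std
        x_val = (x[val_idx] - mean) / std
        # Prepend a column of ones so the first weight acts as the bias w0
        a_train = np.column_stack([np.ones(len(x_train)), x_train])
        a_val = np.column_stack([np.ones(len(x_val)), x_val])
        weights, *_ = np.linalg.lstsq(a_train, y[train_idx], rcond=None)
        scores.append(np.mean(np.abs(y[val_idx] - a_val @ weights)))
    return np.array(scores)

# Synthetic data (illustrative): a linear target with small noise
rng = np.random.default_rng(1)
x = rng.normal(size=(200, 3))
y = x @ np.array([1.0, -2.0, 0.5]) + 4.0 + rng.normal(0, 0.1, size=200)

scores = k_fold_mae(x, y, k_folds=5)
print(f"Validation errors: {scores}")
print(f"Validation mean error: {scores.mean()}")
```

Each sample is used for validation exactly once, so the mean of the fold scores is a steadier estimate than a single train/validation split.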

### 4.5. Search for best hyperparameters

### 4.6. Evaluate on test set

### 5. Implement Logistic Regression on qualitative data
-