### Implement Linear Regression and Logistic Regression
<img src="pics/bridge-2.jpg" width="800" height="400">
In this article you'll learn about Linear Regression, Logistic Regression, and the Gradient Descent algorithm, which is an essential component of deep learning.

### Agenda
1. How does it work?
2. Gradient Descent
3. Learning Rate
4. Implement Linear Regression on quantitative data
5. Implement Logistic Regression on qualitative data

### 1. How does it work?
Unlike KNN, which has to store the whole training set in order to predict the output of unseen samples, Linear Regression and Logistic Regression do not keep any training samples. Instead, they learn the parameters of a linear function that approximates the training set. Below is a brief outline of how to train both of them using **Gradient Descent** (Linear Regression can also be solved directly with the **least squares** method, but we're not going to cover it in this course).

<img src="pics/linear_regression-2.png" width="1000">

```
Initialize the model with random parameters
for _ in range(epochs):
    Predict outputs for the training samples with the linear function
    Calculate the cost from the predicted outputs and the training labels
    Update the parameters by Gradient Descent
```
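The outline above can be sketched in plain NumPy for Simple Linear Regression with an MSE cost. This is an illustrative sketch, not the TensorFlow model built later in this notebook: the toy data ($y = 3x + 2$), the learning rate, and the function name `train_linear_regression` are assumptions chosen for the example.

```python
import numpy as np

def train_linear_regression(x, y, lr=0.1, epochs=500, seed=0):
    """Fit y ≈ w0 + w1 * x with full-batch gradient descent on MSE (illustrative)."""
    rng = np.random.default_rng(seed)
    # Initialize the parameters randomly
    w0, w1 = rng.normal(size=2)
    for _ in range(epochs):
        # Predict outputs with the linear function
        y_hat = w0 + w1 * x
        # Cost: mean squared error
        error = y_hat - y
        cost = np.mean(error ** 2)
        # Partial derivatives of MSE with respect to each parameter
        grad_w0 = 2 * np.mean(error)
        grad_w1 = 2 * np.mean(error * x)
        # Gradient descent update
        w0 -= lr * grad_w0
        w1 -= lr * grad_w1
    return w0, w1, cost

x = np.linspace(-1, 1, 50)
y = 3 * x + 2
w0, w1, cost = train_linear_regression(x, y)
print(f"w0={w0:.3f}, w1={w1:.3f}, final MSE={cost:.6f}")
```

Because the toy data is noise-free, the learned parameters should approach the true intercept 2 and slope 3.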

<img src="pics/linear_regression_animation.gif">

There are two main differences between Linear Regression and Logistic Regression:
1. Linear Regression outputs a **continuous quantity**, while Logistic Regression outputs a **probability**.
2. Linear Regression uses **Mean Squared Error** or **Mean Absolute Error** as its cost function, while Logistic Regression uses **Binary Cross Entropy** (or Categorical Cross Entropy for more than two classes).

In short, Linear Regression is for **quantitative data** and Logistic Regression is for **qualitative data**. The following sections show how each of them works in greater detail.

### 1.1. Linear Regression
Linear Regression's equation is a simple linear function that maps N independent variables to one dependent variable.
- Simple Linear Regression
#### $$\hat{y} = w_0 + wx$$
- Multiple Linear Regression
#### $$\hat{y} = w_0 + w_1x_1 + w_2x_2 + \dots + w_nx_n$$
- Polynomial Linear Regression
#### $$\hat{y} = w_0 + w_1x_1 + w_2x_2 + w_3x_1x_2 + w_4x_1^2 + w_5x_2^2 + \dots$$

<img src="pics/linear_regression-3.png" width="800" height="200">
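All three variants are linear in the weights $w$: a polynomial model is just a multiple linear regression on expanded features. The NumPy sketch below assumes two inputs and uses made-up weights purely for illustration; `polynomial_features` is a hypothetical helper that builds the degree-2 terms from the polynomial equation above.

```python
import numpy as np

def polynomial_features(x1, x2):
    """Expand (x1, x2) into [1, x1, x2, x1*x2, x1^2, x2^2] (degree-2 terms)."""
    return np.array([1.0, x1, x2, x1 * x2, x1 ** 2, x2 ** 2])

# Hypothetical weights w0..w5, chosen only for illustration
w = np.array([1.0, 2.0, -1.0, 0.5, 3.0, 0.0])

features = polynomial_features(2.0, 4.0)
# w0 + w1*x1 + w2*x2 + w3*x1*x2 + w4*x1^2 + w5*x2^2 as a single dot product
y_hat = features @ w
print(y_hat)
```

Since the prediction is a dot product, the same training procedure works whether the features are raw inputs or polynomial terms.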

To train the model with the Gradient Descent algorithm, we need a cost function for the optimizer to minimize.
- Mean Squared Error (MSE)
#### $$MSE = \frac{1}{n}\sum_{i=1}^n{(y_i - \hat{y}_i)^2}$$

<img src="pics/MSE.png">

- Mean Absolute Error (MAE)
#### $$MAE = \frac{1}{n}\sum_{i=1}^n{|y_i - \hat{y}_i|}$$

<img src="pics/MAE.png">
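Both cost functions translate directly into NumPy. The sketch below uses three made-up samples, one of which has a large error, to show how squaring makes MSE penalize big mistakes much more heavily than MAE does.

```python
import numpy as np

def mse(y, y_hat):
    # Mean of the squared differences
    return np.mean((y - y_hat) ** 2)

def mae(y, y_hat):
    # Mean of the absolute differences
    return np.mean(np.abs(y - y_hat))

y = np.array([3.0, 5.0, 2.0])
y_hat = np.array([2.0, 5.0, 6.0])
print(mse(y, y_hat))  # (1 + 0 + 16) / 3
print(mae(y, y_hat))  # (1 + 0 + 4) / 3
```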

### 1.2. Logistic Regression
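Logistic Regression passes a linear function of N independent variables through a sigmoid function (or a softmax function when there are more than two classes), mapping them to one **qualitative** dependent variable expressed as a probability.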
- Simple Logistic Regression
#### $$\hat{y} = \frac{1}{1 + e^{-z}},\quad z = w_0 + wx$$
- Multiple Logistic Regression
#### $$\hat{y} = \frac{1}{1 + e^{-z}},\quad z = w_0 + w_1x_1 + w_2x_2 + \dots + w_nx_n$$
- Polynomial Logistic Regression
#### $$\hat{y} = \frac{1}{1 + e^{-z}},\quad z = w_0 + w_1x_1 + w_2x_2 + w_3x_1x_2 + w_4x_1^2 + w_5x_2^2 + \dots$$
- Multi-classes Logistic Regression
#### $$\hat{y}_j = \frac{e^{z_j}}{\sum_{c=1}^{K} e^{z_c}},\quad z_j = w_{0j} + w_jx$$

<img src="pics/logistic_regression-3.png" width="1400" height="200">
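The sigmoid and softmax functions above can be written in a few lines of NumPy. This is a sketch for illustration; subtracting the maximum inside softmax is a standard trick that leaves the result unchanged while avoiding overflow in `exp`.

```python
import numpy as np

def sigmoid(z):
    # 1 / (1 + e^{-z}): squashes any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # Subtracting the max does not change the result but avoids overflow in exp
    shifted = z - np.max(z)
    exp_z = np.exp(shifted)
    return exp_z / np.sum(exp_z)

print(sigmoid(0.0))  # 0.5
probs = softmax(np.array([1.0, 2.0, 3.0]))
print(probs, probs.sum())
```

With two classes, softmax reduces to the sigmoid: `softmax([z, 0])[0]` equals `sigmoid(z)`, which is why the binary model only needs one output.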

Since Logistic Regression outputs a probability, it needs a different kind of cost function.
- Binary Cross Entropy (BCE)
#### $$BCE = -\frac{1}{n}\sum_{i=1}^n{\left[y_i\log(\hat{y}_i) + (1 - y_i)\log(1 - \hat{y}_i)\right]}$$
- Categorical Cross Entropy (CCE), where $K$ is the number of classes and $y_{ij}$ is the one-hot label
#### $$CCE = -\frac{1}{n}\sum_{i=1}^n\sum_{j=1}^{K}{y_{ij}\log(\hat{y}_{ij})}$$

<img src="pics/BCE.png">
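A NumPy sketch of both losses, with made-up labels and predictions. Clipping the predictions by a small `eps` is an assumption added here to avoid `log(0)`; the example also shows that CCE on two one-hot classes gives the same value as BCE.

```python
import numpy as np

def binary_cross_entropy(y, y_hat, eps=1e-12):
    y_hat = np.clip(y_hat, eps, 1 - eps)  # avoid log(0)
    return np.mean(-y * np.log(y_hat) - (1 - y) * np.log(1 - y_hat))

def categorical_cross_entropy(y, y_hat, eps=1e-12):
    """y and y_hat have shape (n_samples, n_classes); y is one-hot."""
    y_hat = np.clip(y_hat, eps, 1.0)  # avoid log(0)
    # Sum over classes, then average over samples
    return np.mean(-np.sum(y * np.log(y_hat), axis=1))

y = np.array([1.0, 0.0])
y_hat = np.array([0.9, 0.2])
bce = binary_cross_entropy(y, y_hat)

# The same two samples written as two one-hot classes: [P(class 1), P(class 0)]
y_onehot = np.array([[1.0, 0.0], [0.0, 1.0]])
y_hat_2 = np.array([[0.9, 0.1], [0.2, 0.8]])
cce = categorical_cross_entropy(y_onehot, y_hat_2)
print(bce, cce)
```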

### 2. Gradient Descent
Gradient descent is an optimization algorithm that minimizes a function by iteratively moving in the direction of steepest descent, which is the negative of the gradient. In machine learning, it is used to update a model's parameters toward the values that minimize the cost function.
- Gradient Descent Equation
#### $$W_{new} = W_{old} - \alpha*\frac{\partial J(W)}{\partial W}$$
Where  
$J(W)$: Cost function  
$W$: Model parameters  
$\alpha$: Learning rate

<img src="pics/GradientDescent.png" width="500">

As the figure above shows, the model starts with its initialized parameters at the **initial point**. In each epoch, it calculates the partial derivative of the cost function with respect to each parameter to find the direction that moves it closer to the **optimal point**, then updates each parameter by subtracting the derivative multiplied by the learning rate from the parameter's current value.
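To make the update rule concrete, here is a NumPy sketch for Multiple Linear Regression with the MSE cost $J(W) = \frac{1}{n}\lVert XW - y\rVert^2$, whose gradient is $\frac{\partial J}{\partial W} = \frac{2}{n}X^\top(XW - y)$ (no bias term, to keep it short). The random data and $\alpha = 0.1$ are assumptions for illustration. It checks the analytical gradient against a central finite-difference estimate, then applies one update step.

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(20, 3))
y = X @ np.array([1.0, -2.0, 0.5])
W = np.zeros(3)

def cost(W):
    return np.mean((X @ W - y) ** 2)

def gradient(W):
    # dJ/dW = (2/n) * X^T (XW - y)
    return 2.0 / len(y) * X.T @ (X @ W - y)

def numerical_gradient(W, h=1e-6):
    # Central difference for each parameter: (J(W + h) - J(W - h)) / 2h
    grad = np.zeros_like(W)
    for j in range(len(W)):
        step = np.zeros_like(W)
        step[j] = h
        grad[j] = (cost(W + step) - cost(W - step)) / (2 * h)
    return grad

print(np.allclose(gradient(W), numerical_gradient(W), atol=1e-5))  # True

# One gradient descent step: W_new = W_old - alpha * dJ/dW
alpha = 0.1
W_new = W - alpha * gradient(W)
print(cost(W), "->", cost(W_new))
```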

### 3. Learning Rate
The learning rate is a hyperparameter that determines how large each parameter update is. A learning rate that is too small makes training slow, while one that is too large can make the cost oscillate or even diverge.

<img src="pics/LearningRate.png" width="1000">
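The three behaviors in the figure can be reproduced on a toy cost function. This sketch assumes $J(w) = w^2$ (gradient $2w$) and a starting point $w = 5$; the three learning rates are chosen only to illustrate each case.

```python
def gradient_descent(lr, w=5.0, steps=20):
    """Minimize J(w) = w**2 (gradient 2w) and return w after each step."""
    path = [w]
    for _ in range(steps):
        w = w - lr * 2 * w
        path.append(w)
    return path

for lr in (0.01, 0.4, 1.1):
    path = gradient_descent(lr)
    print(f"lr={lr}: w after 20 steps = {path[-1]:.4g}")
```

Each step multiplies $w$ by $(1 - 2\alpha)$, so this toy problem converges for $0 < \alpha < 1$ and diverges for $\alpha > 1$: $\alpha = 0.01$ barely moves, $\alpha = 0.4$ reaches the minimum quickly, and $\alpha = 1.1$ flips sign every step while growing.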

### 4. Implement Linear Regression on quantitative data
This time we're going to use Linear Regression instead of KNN to predict house prices, and see whether it does any better than KNN.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

### 4.1. Load dataset

In [None]:
# Load dataset
house_data = pd.read_csv("./datasets/housedata/data.csv")

### 4.2. Prepare data

In [None]:
# Remove columns
data = house_data.drop(columns=["date", "yr_built", "yr_renovated", "street", "statezip", "country"])
data.head()

In [None]:
# One-hot encoding categorical columns
categorical_cols = ["view", "condition", "city"]

for col in categorical_cols:
    # One numeric 0/1 column per category value, e.g. "city_Seattle"
    encoded = pd.get_dummies(data[col], dtype=float)
    encoded.columns = [col + "_" + str(_col) for _col in encoded.columns]
    data = pd.concat([data.drop(columns=col), encoded], axis=1)
data.head()

In [None]:
# Separate features (x) and target (y)
data_x = data.drop(columns="price")
data_y = data.price
print(f"data_x: {data_x.shape}")
print(f"data_y: {data_y.shape}")

In [None]:
from sklearn.model_selection import train_test_split

In [None]:
# Split train/test
# Group y into bins
bins = np.linspace(0, 1500000, 10)
y_binned = np.digitize(data_y, bins)
plt.hist(y_binned)

# Split with stratify
train_x, test_x, train_y, test_y = train_test_split(data_x, data_y, test_size=0.2, random_state=42, shuffle=True, stratify=y_binned)
print(f"train_x: {train_x.shape}")
print(f"test_x: {test_x.shape}")
print(f"train_y: {train_y.shape}")
print(f"test_y: {test_y.shape}")

In [None]:
from sklearn.preprocessing import StandardScaler

In [None]:
# Normalize all features to have mean of 0 and standard deviation of 1
scaler_x = StandardScaler()

scaler_x.fit(train_x)

train_x_scaled = scaler_x.transform(train_x)
train_x_scaled = pd.DataFrame(train_x_scaled, columns=train_x.columns)
train_x_scaled.describe()

In [None]:
# Scale the test set with the mean/std fitted on the training set (no refitting)
test_x_scaled = scaler_x.transform(test_x)
test_x_scaled = pd.DataFrame(test_x_scaled, columns=test_x.columns)
test_x_scaled.describe()

In [None]:
# Convert type to numpy array
train_x_scaled = train_x_scaled.to_numpy()
train_x, train_y = train_x.to_numpy(), train_y.to_numpy()

test_x_scaled = test_x_scaled.to_numpy()
test_x, test_y = test_x.to_numpy(), test_y.to_numpy()

### 4.3. Prepare model
- Implement Linear Regression in matrix form
#### $$\hat{Y} = XW + B$$

#### $$\hat{Y} =
\begin{bmatrix}
x_{11} & x_{12} & x_{13} & \cdots & x_{1m} \\
x_{21} & x_{22} & x_{23} & \cdots & x_{2m} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
x_{n1} & x_{n2} & x_{n3} & \cdots & x_{nm} \\
\end{bmatrix}
\begin{bmatrix}
w_{11} \\
w_{21} \\
w_{31} \\
\vdots \\
w_{m1} \\
\end{bmatrix} + b
$$

#### $$\hat{Y} =
\begin{bmatrix}
x_{11}w_{11} + x_{12}w_{21} + x_{13}w_{31} + \cdots + x_{1m}w_{m1} \\
x_{21}w_{11} + x_{22}w_{21} + x_{23}w_{31} + \cdots + x_{2m}w_{m1} \\
\vdots \\
x_{n1}w_{11} + x_{n2}w_{21} + x_{n3}w_{31} + \cdots + x_{nm}w_{m1} \\
\end{bmatrix} + b
$$
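To check the matrix form numerically, here is a small NumPy sketch with made-up values ($n = 2$ samples, $m = 3$ features). It compares `X @ W + b` against the explicit row-by-row sums from the expansion above; the scalar `b` is broadcast to every row.

```python
import numpy as np

X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])        # shape (n, m) = (2, 3)
W = np.array([[0.5], [-1.0], [2.0]])   # shape (m, 1)
b = 0.1

# Matrix form: shape (n, 1); b is broadcast to every row
Y_hat = X @ W + b

# Same result row by row: x_i1*w_11 + x_i2*w_21 + x_i3*w_31 + b
loop = np.array([[sum(X[i, j] * W[j, 0] for j in range(3)) + b] for i in range(2)])
print(Y_hat)
```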

In [None]:
import tensorflow as tf
from tensorflow.keras import layers, Model

In [None]:
class LinearLayer(layers.Layer):
    def build(self, input_shape):
        # w shape: (feature_size, 1)
        self.w = self.add_weight(name="W",
                                 shape=(input_shape[-1], 1),
                                 initializer=tf.random_normal_initializer(),
                                 trainable=True,
                                 dtype="float32")
        # b shape: (1, )
        self.b = self.add_weight(name="B",
                                 shape=(1, ),
                                 initializer=tf.random_normal_initializer(),
                                 trainable=True,
                                 dtype="float32")
    
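    def call(self, inp):
        """
        inp shape: (batch_size, feature_size)
        out shape: (batch_size, 1)
        """
        # Y_hat = XW + b, where b is broadcast over the batch
        inp = tf.cast(inp, self.w.dtype)
        return tf.matmul(inp, self.w) + self.b
    
class LinearRegression(Model):
    def __init__(self):
        super().__init__()
        self.linear_layer = LinearLayer()
        
    def call(self, inp):
        # The whole model is a single linear layer
        return self.linear_layer(inp)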

In [None]:
# Test fit model on dummy data
target_slope = 230
target_bias = 1
dummy_x = np.arange(1000).astype(np.float32).reshape(-1, 1)
dummy_y = (dummy_x * target_slope + target_bias).astype(np.float32).reshape(-1)

# Define model
regression_model = LinearRegression()
# Define loss function
loss = tf.keras.losses.MeanSquaredError()
# Define metrics function
metrics = tf.keras.metrics.MeanAbsoluteError()
# Define optimizer
# optimizer = tf.keras.optimizers.Adam(learning_rate=100)
optimizer = tf.keras.optimizers.SGD(learning_rate=1e-7)

# Compile model
regression_model.compile(optimizer=optimizer, loss=loss, metrics=[metrics])

# Train model
regression_model.fit(dummy_x, dummy_y, batch_size=16, epochs=10)

# Check trained parameters
slope, bias = regression_model.trainable_variables
print(f"slope: {slope.numpy().squeeze()} ({target_slope})")
print(f"bias: {bias.numpy().squeeze()} ({target_bias})")

# Plot
plt.figure(figsize=(10, 5))
plt.scatter(dummy_x, dummy_y, s=10)
plt.plot(dummy_x, regression_model.predict(dummy_x), "r", linewidth=4)
plt.show()

### 4.4. Fit and evaluate model
### 4.4.1. Split validation

<img src="pics/split_validation.png" width="900">

In [None]:
# Test fit model on training data and evaluate using split validation

# Define model
regression_model = LinearRegression()
# Define loss function
loss = tf.keras.losses.MeanSquaredError()
# Define metrics function
metrics = tf.keras.metrics.MeanAbsoluteError()
# Define optimizer
# optimizer = tf.keras.optimizers.Adam(learning_rate=600)
optimizer = tf.keras.optimizers.SGD(learning_rate=0.001)

# Compile model
regression_model.compile(optimizer=optimizer, loss=loss, metrics=[metrics])

# Train model
history = regression_model.fit(train_x_scaled, train_y, batch_size=16, epochs=10, validation_split=0.2, verbose=1)
train_loss = history.history["loss"]
train_mae = history.history["mean_absolute_error"]
val_loss = history.history["val_loss"]
val_mae = history.history["val_mean_absolute_error"]
print(f"train_mae: {train_mae[-1]}")
print(f"val_mae: {val_mae[-1]}")

# Plot
plt.figure(figsize=(15, 5))
plt.subplot(1, 2, 1)
plt.plot(train_loss)
plt.plot(val_loss)
plt.title("Loss")
plt.subplot(1, 2, 2)
plt.plot(train_mae)
plt.plot(val_mae)
plt.title("MAE")
plt.show()

### 4.4.2. K-fold cross validation

<img src="pics/k-fold_cross_validation.png" width="1100">
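The idea behind the figure can be sketched without scikit-learn. The hypothetical helper `k_fold_indices` below splits indices into $k$ consecutive folds without shuffling (the first `n % k` folds get one extra sample, as `np.array_split` does) and uses each fold once for validation; the `CrossValidation` class below relies on scikit-learn's `KFold` for the same job.

```python
import numpy as np

def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs; every sample is validated exactly once."""
    # Consecutive folds; the first n % k folds get one extra sample
    folds = np.array_split(np.arange(n_samples), k)
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, val_idx

for train_idx, val_idx in k_fold_indices(10, 3):
    print(val_idx, len(train_idx))
```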

In [None]:
from sklearn.model_selection import KFold

In [None]:
# Test fit model on training data and evaluate using cross validation
class CrossValidation:
    def __init__(self, k_folds=10, scaler=None):
        # Initial properties
        self.k_folds = k_folds
        self.scaler = scaler
        self.scores = []
        
    def eval(self, model, x, y, **kwargs):
        # Initialize model params by running the model once
        model(x.astype(np.float32))
        # Keep a copy of the initial weights so every fold starts from the same point
        init_weights = model.get_weights()
        
        # Divide training set into k folds
        kf = KFold(n_splits=self.k_folds)
        self.scores = []
        for i, (train_index, val_index) in enumerate(kf.split(x)):
            # Reset to the initial weights
            model.set_weights(init_weights)
            
            # Get validation fold
            val_x, val_y = x[val_index], y[val_index]
            
            # Get training fold
            train_x, train_y = x[train_index], y[train_index]
            
            # Normalization: fit the scaler on the training fold only
            if self.scaler is not None:
                train_x = self.scaler.fit_transform(train_x)
                val_x = self.scaler.transform(val_x)
                
            # Train model on training set
            model.fit(train_x, train_y, **kwargs)
            
            # Evaluate model on validation set (returns [loss, metric])
            val_loss, val_metric = model.evaluate(val_x, val_y, verbose=0)
            
            # Save evaluation result
            self.scores.append(val_metric)
        # Average all evaluation results
        mean_score = np.mean(self.scores)
        return mean_score

In [None]:
# Define model
regression_model = LinearRegression()
# Define loss function
loss = tf.keras.losses.MeanSquaredError()
# Define metrics function
metrics = tf.keras.metrics.MeanAbsoluteError()
# Define optimizer
# optimizer = tf.keras.optimizers.Adam(learning_rate=600)
optimizer = tf.keras.optimizers.SGD(learning_rate=0.001)

# Compile model
regression_model.compile(optimizer=optimizer, loss=loss, metrics=[metrics])

scaler = StandardScaler()
evaluator = CrossValidation(k_folds=10, scaler=scaler)

score = evaluator.eval(regression_model, train_x, train_y, batch_size=16, epochs=10, verbose=0)
print(f"Validation errors: {evaluator.scores}")
print(f"Validation mean error: {score}")

### 4.5. Evaluate on test set

In [None]:
# Define model
regression_model = LinearRegression()
# Define loss function
loss = tf.keras.losses.MeanSquaredError()
# Define metrics function
metrics = tf.keras.metrics.MeanAbsoluteError()
# Define optimizer
optimizer = tf.keras.optimizers.SGD(learning_rate=0.001)

# Compile model
regression_model.compile(optimizer=optimizer, loss=loss, metrics=[metrics])

history = regression_model.fit(train_x_scaled, train_y, batch_size=16, epochs=10, verbose=0)

test_loss, test_mae = regression_model.evaluate(test_x_scaled, test_y, verbose=0)
print(f"Test error: {test_mae}")

### 5. Implement Logistic Regression on qualitative data
Now let's use Logistic Regression to predict the species of iris flowers.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

### 5.1. Load data

In [None]:
# Load dataset
iris_data = pd.read_csv("./datasets/Iris/Iris.csv")

### 5.2. Prepare data

In [None]:
# Remove Id column
data = iris_data.drop(columns=["Id"])
data.head()

In [None]:
# One-hot encoding categorical columns
categorical_cols = ["Species"]

for col in categorical_cols:
    # One numeric 0/1 column per species, e.g. "Species_Iris-setosa"
    encoded = pd.get_dummies(data[col], dtype=float)
    encoded.columns = [col + "_" + str(_col) for _col in encoded.columns]
    data = pd.concat([data.drop(columns=col), encoded], axis=1)
data.head()

In [None]:
# Separate features (x) and one-hot target (y)
data_x = data.iloc[:, :4]
data_y = data.iloc[:, 4:]
print(f"data_x: {data_x.shape}")
print(f"data_y: {data_y.shape}")

In [None]:
from sklearn.model_selection import train_test_split

In [None]:
# Split train/test
train_x, test_x, train_y, test_y = train_test_split(data_x, data_y, test_size=0.2, random_state=42, shuffle=True, stratify=data_y)
print(f"train_x: {train_x.shape}")
print(f"test_x: {test_x.shape}")
print(f"train_y: {train_y.shape}")
print(f"test_y: {test_y.shape}")

In [None]:
from sklearn.preprocessing import StandardScaler

In [None]:
# Normalize all features to have mean of 0 and standard deviation of 1
scaler_x = StandardScaler()

scaler_x.fit(train_x)

train_x_scaled = scaler_x.transform(train_x)
train_x_scaled = pd.DataFrame(train_x_scaled, columns=train_x.columns)
train_x_scaled.describe()

In [None]:
# Scale the test set with the mean/std fitted on the training set (no refitting)
test_x_scaled = scaler_x.transform(test_x)
test_x_scaled = pd.DataFrame(test_x_scaled, columns=test_x.columns)
test_x_scaled.describe()

In [None]:
# Convert type to numpy array
train_x_scaled = train_x_scaled.to_numpy()
train_x, train_y = train_x.to_numpy(), train_y.to_numpy()

test_x_scaled = test_x_scaled.to_numpy()
test_x, test_y = test_x.to_numpy(), test_y.to_numpy()

### 5.3. Prepare model
- Implement Logistic Regression in matrix form
#### $$\hat{Y} = \frac{1}{1 + e^{-Z}},\quad Z = XW + B$$
- Implement Multi-classes Logistic Regression in matrix form
#### $$\hat{Y}_j = \frac{e^{Z_j}}{\sum_{c=1}^{K} e^{Z_c}},\quad Z = XW + B$$
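Before implementing this in TensorFlow, the batch forward pass can be sketched in NumPy. The shapes mirror the iris setup (4 features, $K = 3$ classes), but the inputs and weights here are random placeholders, not trained values. Each row of $XW + B$ is turned into class probabilities with a row-wise softmax, and the predicted class is the column with the highest probability.

```python
import numpy as np

def softmax_rows(Z):
    # Row-wise softmax; subtracting each row's max keeps exp() from overflowing
    Z = Z - Z.max(axis=1, keepdims=True)
    exp_Z = np.exp(Z)
    return exp_Z / exp_Z.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4))   # 5 samples, 4 features (placeholder values)
W = rng.normal(size=(4, 3))   # one weight column per class (K = 3)
B = np.zeros(3)               # one bias per class

Y_hat = softmax_rows(X @ W + B)        # shape (5, 3): class probabilities per sample
predicted_class = Y_hat.argmax(axis=1)
print(Y_hat.sum(axis=1))               # each row sums to 1
print(predicted_class)
```

Because softmax preserves the ordering within each row, the predicted class is the same as the argmax of the raw scores $Z$.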

In [None]:
import tensorflow as tf
from tensorflow.keras import layers, Model

In [None]:
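class LinearLayer(layers.Layer):
    def __init__(self, class_nums, **kwargs):
        super().__init__(**kwargs)
        self.class_nums = class_nums
    
    def build(self, input_shape):
        # w shape: (feature_size, class_nums)
        self.w = self.add_weight(name="W",
                                 shape=(input_shape[-1], self.class_nums),
                                 initializer=tf.random_normal_initializer(),
                                 trainable=True,
                                 dtype="float32")
        # b shape: (class_nums, )
        self.b = self.add_weight(name="B",
                                 shape=(self.class_nums, ),
                                 initializer=tf.random_normal_initializer(),
                                 trainable=True,
                                 dtype="float32")
    
    def call(self, inp):
        """
        inp shape: (batch_size, feature_size)
        out shape: (batch_size, class_nums)
        """
        # Z = XW + B, where B is broadcast over the batch
        inp = tf.cast(inp, self.w.dtype)
        return tf.matmul(inp, self.w) + self.b
    
class LogisticRegression(Model):
    def __init__(self, class_nums):
        super().__init__()
        self.class_nums = class_nums
        self.linear_layer = LinearLayer(class_nums)
        
    def _sigmoid(self, z):
        # 1 / (1 + exp(-z)), computed in a numerically stable way by TensorFlow
        return tf.math.sigmoid(z)
    
    def _softmax(self, z):
        # exp(z_j) / sum_c exp(z_c) over the class axis, computed stably by TensorFlow
        return tf.nn.softmax(z, axis=-1)
        
    def call(self, inp):
        z = self.linear_layer(inp)
        # One output unit: binary case (sigmoid); several: multi-class case (softmax)
        if self.class_nums == 1:
            return self._sigmoid(z)
        return self._softmax(z)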

In [None]:
# Test fit model on dummy data
dummy_x = np.arange(-500, 500).astype(np.float32).reshape(-1, 1)
dummy_y = (dummy_x > 0).astype(np.float32).reshape(-1)

# Define model
classifier_model = LogisticRegression(class_nums=1)
# Define loss function
loss = tf.keras.losses.BinaryCrossentropy(from_logits=False)
# Define metrics function
metrics = tf.keras.metrics.BinaryAccuracy(threshold=0.5)
# Define optimizer
# optimizer = tf.keras.optimizers.Adam(learning_rate=0.05)
optimizer = tf.keras.optimizers.SGD(learning_rate=1e-4)

# Compile model
classifier_model.compile(optimizer=optimizer, loss=loss, metrics=[metrics])

# Train model
classifier_model.fit(dummy_x, dummy_y, batch_size=16, epochs=10)

# Check trained parameters
w, b = classifier_model.trainable_variables
print(f"w: {w.numpy()}")
print(f"b: {b.numpy()}")

# Plot
plt.figure(figsize=(10, 5))
plt.scatter(dummy_x, dummy_y, s=10)
plt.plot(dummy_x, classifier_model.predict(dummy_x), "r", linewidth=4)
plt.show()

### 5.4. Fit and evaluate model

In [None]:
# Define model
classifier_model = LogisticRegression(class_nums=3)
# Define loss function
loss = tf.keras.losses.CategoricalCrossentropy(from_logits=False)
# Define metrics function
metrics = tf.keras.metrics.CategoricalAccuracy()
# Define optimizer
# optimizer = tf.keras.optimizers.Adam(learning_rate=1)
optimizer = tf.keras.optimizers.SGD(learning_rate=1)

# Compile model
classifier_model.compile(optimizer=optimizer, loss=loss, metrics=[metrics])

scaler = StandardScaler()
evaluator = CrossValidation(k_folds=10, scaler=scaler)

score = evaluator.eval(classifier_model, train_x, train_y, batch_size=16, epochs=10, verbose=0)
print(f"Validation accuracies: {evaluator.scores}")
print(f"Validation mean accuracy: {score}")

### 5.5. Evaluate on test set

In [None]:
# Define model
classifier_model = LogisticRegression(class_nums=3)
# Define loss function
loss = tf.keras.losses.CategoricalCrossentropy(from_logits=False)
# Define metrics function
metrics = tf.keras.metrics.CategoricalAccuracy()
# Define optimizer
optimizer = tf.keras.optimizers.SGD(learning_rate=1)

# Compile model
classifier_model.compile(optimizer=optimizer, loss=loss, metrics=[metrics])

history = classifier_model.fit(train_x_scaled, train_y, batch_size=16, epochs=10, verbose=0)

test_loss, test_acc = classifier_model.evaluate(test_x_scaled, test_y, verbose=0)
print(f"Test accuracy: {test_acc}")