# 1. Regression

## 1.1 Generate Synthetic Data

We generate a synthetic data sampled from a sparse linear regression model. To be specific, the input $x = (x_1, \ldots, x_6)$ has six variables but the output $y$ is the linear combination of only the first two variables $x_1$ and $x_2$. Mathematically speaking, the sparse linear regression model is given by
\begin{align*}
y = x_1 + 3x_2 + 2 + \varepsilon,
\end{align*}
where $\varepsilon$ is the noise from the normal distribution $\mathcal{N}(0,0.1)$. To represent it in the matrix form, we denote as $w = (1,3,0,0,0,0)^{\top}$ and $b = 2$. Then we get
\begin{align*}
y = w^{\top} x + b + \varepsilon,
\end{align*}

In [10]:
import numpy as np
from sklearn.model_selection import train_test_split

def simulation(m):
    """
    Generate a specified number of samples according to the sparse linear model.

    Parameters
    -----
    m : num_samples

    Returns
    -----
    x (matrix, m*num_variables) : Input or features 
    y (matrix, m*1): Output or labels
    """

    # Generate independent and identically distributed samples as inputs.
    x1 = np.random.normal(3,1,[m,1])
    x2 = np.random.uniform(0,1,[m,1])
    x3 = np.random.normal(1,4,[m,1])
    x4 = np.random.normal(-1,1,[m,1])
    x5 = np.random.normal(0,1,[m,1])
    x6 = np.random.uniform(-1,1,[m,1])
    # Generate the true outputs according to the sparse linear model.
    y = x1 + 3*x2 + 2 + np.random.normal(0,0.1,[m,1])
    return np.hstack([x1,x2,x3,x4,x5,x6]), y

# Generate 5000 samples and split them into training and test dataset.
X, y = simulation(5000)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=2022)

## 1.2 Simulation Experiments

Next we will fit the ridge regression and lasso to the simulation data.
To be specific, we will follow the paradiam as belows:

1. Implement the ridge regression and lasso algorithm in the form of ``class``. 
2. Fit the models with training samples, (i.e. ``X_train`` and ``y_train``)
3. Prediect the outputs of test data ``X_test``.
4. Evaluate the perfomance of the fitted model by calculating the MSE between the predictions and true outputs ``y_test``.


### 1.2.1 Ridge Regression 

In the following ``RidgeRegression`` class, we aim to implement the ridge regression algorithm including the functions to fit the model with training data and predict the output for test data.

Please type in the codes in the specified space to complete the construction of the class ``Lasso``.

In [36]:
class RidgeRegression():
    '''
    This is a class for ridge regression algorithm.
    
    The class contains the hyper parameters of the ridge regression algorithm as attributes, such as the regurization 
    parameter(Lambda).
    It also contains the functions for initializing the class, calculating the loss, fitting the ridge regression 
    model and use the fitted model to predict test samples.
    
    Attributes:
        lr:        learning rate of gradient descent
        Lambda:    regularization parameter for L_2 penalty
        max_itr:   maximum number of iteration for gradient descent
        tol:       if the change in loss is smaller than tol, then we stop iteration
        W:         concatenation of weight w and bias b
        
    '''
    def __init__(self, lr, Lambda, max_itr, tol):
        '''
        Initialize the RidgeRegression class
        '''
        self.lr = lr
        self.Lambda = Lambda
        self.max_itr = max_itr
        self.tol = tol
        
    def _loss_ridge(self, X, y, W):
        '''
        Calculating the regularized empirical loss
        '''
        return ((y-X@W).T@(y-X@W))[0,0] + self.Lambda * np.sum(W[:X.shape[1]-1,0]**2)
    
    
    def fit(self,x,y):  
        '''
        estimate the weight and bias in the ridge regression model by gradient descent
        
        Args: 
            x (matrix, num_train*num_variables): input of training samples
            y (matrix, num_test*1): output of training samples
            
        Returns:
            self.W (matrix, (num_variables+1)*1): estimation of weight w and bias b
        ''' 
        m = x.shape[0]
        ### Add the all-one vector to the last column 
        X = np.concatenate((x,np.ones((m,1))),axis=1)
        d = X.shape[1]
        self.W = np.ones((d,1))
        
        ### Use the gradient descent to update W
        previous_loss = self._loss_ridge(X, y, self.W)
        for i in range(self.max_itr):
            L_2_der_W = np.zeros((d,1))
            L_2_der_W[:d,0] = self.W[:d,0]
            gradient = X.T@(X@self.W-y)/m + self.Lambda * L_2_der_W
            self.W = self.W - self.lr * gradient
            current_loss = self._loss_ridge(X, y, self.W)
            if previous_loss - current_loss < self.tol:
                print(f'Converged after {i} iterations')
                break
            else:
                previous_loss = current_loss
        return self.W
    
    def predict(self,x): 
        '''
        predict the output of the test samples
        
        Args: 
            x (matrix, num_test*num_variables): input of test samples
            
        Returns:
            y (matrix, num_test*1): predicted outputs of test samples
        ''' 
        m = x.shape[0]
        X = np.concatenate((x, np.ones((m,1))),axis=1)
        return np.dot(X, self.W)

Next we will use the class ``RidgeRegression`` to fit, predict and evaluate the ridge regression model. 

In [37]:
from sklearn.metrics import mean_squared_error
### Initial the class RidgeRegression by assigning values to the parameters.
model = RidgeRegression(lr=0.01, Lambda=0.002, max_itr = 20000, tol = 1e-5)
### Fit model with training data
W = model.fit(X_train, y_train)
### Predict the output of test samples
y_pred = model.predict(X_test)
### Evaluate the model by calculating the MSE of test samples.
mse = mean_squared_error(y_pred, y_test)
### Print MSE 
print("MSE of Ridge Regression is {}".format(mse))
### Print the estimated w and b
print("The weight w of Ridge Regression is \n {}.".format(W[:X_test.shape[1],0].T))
print("The bias b of Ridge Regression is {}.".format(W[X_test.shape[1],0]))

Converged after 9928 iterations
MSE of Ridge Regression is 0.010401064104879443
The weight w of Ridge Regression is 
 [ 1.00703274e+00  2.95993559e+00  9.41954206e-04 -5.41866994e-03
  1.90221845e-03  4.70667448e-04].
The bias b of Ridge Regression is 1.9876498728411585.


### 1.2.2 Lasso
Similar to the ``RidgeRegression`` class, we will implement the Lasso algorithm.

Please type in the codes in the specified space to complete the construction of the class ``Lasso``.

In [17]:
class Lasso():
    '''
    This is a class for Lasso algorithm.
    
    The class contains the hyper parameters of the lasso algorithm as attributes, such as the regurization 
    parameter(Lambda) of L_1 penality.
    It also contains the functions for initializing the class, fitting the lasso model and use the fitted 
    model to predict test samples.
    
    Attributes:
        Lambda:    regularization parameter for L_1 penalty
        max_itr:   maximum number of iteration for gradient descent
        tol:       if the change in loss is smaller than tol, then we stop iteration
        W:         concatenation of weight w and bias b
        
    '''
    def __init__(self, Lambda=0.5, max_itr=100, tol=0.0001):
        '''
        Initialize the RidgeRegression class
        '''
        self.Lambda = Lambda
        self.max_itr = max_itr
        self.tol = tol  
    
    def _loss_lasso(self, X, y, W):
        '''
        Calculating the regularized empirical loss
        '''
        return ((y-X@W).T@(y-X@W))[0,0] + self.Lambda * np.sum(np.abs(W[:X.shape[1]-1,0]))
    
    def fit(self, x, y):
        '''
        estimate the weight and bias in the lasso model by coordinate gradient descent
        
        Args: 
            x (matrix, num_train*num_variables): input of training samples
            y (matrix, num_test*1): output of training samples
            
        Returns:
            self.W (matrix, (num_variables+1)*1): estimation of weight w and bias b
        '''
        m = x.shape[0]
        ### Add the all-one vector to the last column 
        X = np.concatenate((x,np.ones((m,1))),axis=1)
        # weight and bias initialization
        d = X.shape[1]
        self.W = np.zeros((d,1))
        
        ### Use the cooridinate gradient descent to update W
        previous_loss = self._loss_lasso(X, y, self.W)
        for i in range(self.max_itr):
            ### Update bias
            self.W[-1,0] = np.mean(y.T-x@self.W[:-1,0])
            ### Update W_j, j=0,...,d-2
            for j in range(d-1):
                # Calculate r_j = Y - X@W, with W[j,0]=0 and other elements in W unchanged
                copy_W = self.W.copy()
                copy_W[j,0] = 0
                rj = y - X@copy_W
                # Calculate X[:,j]@r_j and X[:,j].T@X[:,j]
                aj = X[:,j].T@X[:,j]
                bj = 2 * X[:,j]@rj / m 
                if bj <= -self.Lambda:
                    self.W[j,0] = (bj + self.Lambda)/(2*aj)*m
                elif bj >= self.Lambda:
                    self.W[j,0] = (bj - self.Lambda)/(2*aj)*m
                else:
                    self.W[j,0] = 0
            current_loss = self._loss_lasso(X, y, self.W)
            if previous_loss - current_loss < self.tol:
                print(f'Converged after {i} iterations')
                break
            else:
                previous_loss = current_loss
        return self.W
    
    def predict(self, x):
        '''
        predict the output of the test samples
        
        Args: 
            x (matrix, num_test*num_variables): input of test samples
            
        Returns:
            y (matrix, num_test*1): predicted outputs of test samples
        ''' 
        m = x.shape[0]
        X = np.concatenate((x,np.ones((m,1))),axis=1)
        return  X@self.W

Next we will use the class ``Lasso`` to fit, predict and evaluate the lasso model. 

In [19]:
from sklearn.metrics import mean_squared_error
### Initial the class Lasso by assigning values to the parameters.
model = Lasso(Lambda = 0.015, max_itr=10000, tol=1e-5)
### Fit model with training data
W = model.fit(X_train, y_train)
### Predict the output of test samples
y_pred = model.predict(X_test)
### Evaluate the model by calculating the MSE of test samples.
mse = mean_squared_error(y_pred, y_test)
### Print MSE 
print("MSE of Lasso is {}".format(mse))
### Print the estimated w and b
print("The weight w of Lasso is \n {}.".format(W[:X_test.shape[1],0].T))
print("The bias b of Lasso is {}.".format(W[X_test.shape[1],0]))

Converged after 137 iterations
MSE of Lasso is 0.010946914858139166
The weight w of Lasso is 
 [0.99483344 2.90448037 0.         0.         0.         0.        ].
The bias b of Lasso is 2.0631135396663183.


## 1.3 Real-world Experiments -- UCI dataset
In order to apply the estimator `class` in real-world problems, we first learn about the sources of machine learning datasets for evaluating algorithms. For example, the `uci machine learning repository` is a well-known online sources with thousands of real-world datasets. 

https://archive.ics.uci.edu/ml/index.php

### 1.3.1 Energy dataset 
In this tutorial, we select ``Energy`` (https://archive.ics.uci.edu/ml/datasets/Appliances+energy+prediction) as our target regression dataset. Usually, before we use the dataset downloaded from the website, we are supposed to trim them first, e.g.~Make sure that the features and labels are in their positions.

In [9]:
# Load data for ridge regression and lasso
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
df = pd.read_csv("energydata_rv1.csv")
X = np.array(df.iloc[:,:-1])
y = np.array(df.iloc[:,-1])
# normalize
scaler = MinMaxScaler()
scaler.fit(X)
X = scaler.transform(X)
y = y.reshape(-1,1)

# split the train and test
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=2022)

### 1.3.2 Ridge Regression

In [37]:
from sklearn.metrics import mean_squared_error
### Initial the class RidgeRegression by assigning values to the parameters.
model = RidgeRegression(lr=0.01, Lambda=0.02, max_itr = 20000, tol = 1e-5)
### Fit model with training data
W = model.fit(X_train, y_train)
### Predict the output of test samples
y_pred = model.predict(X_test)
### Evaluate the model by calculating the MSE of test samples.
mse = mean_squared_error(y_pred, y_test)
### Print MSE 
print("MSE of Ridge Regression is {}".format(mse))
### Print the estimated w and b
print("The weight w of Ridge Regression is \n {}.".format(W[:X_test.shape[1],0].T))
print("The bias b of Ridge Regression is {}.".format(W[X_test.shape[1],0]))

MSE of Ridge Regression is 211.40297345443952
The weight w of Ridge Regression is 
 [ 0.16910642 -0.09258337  1.08079126  0.75970807  0.23066261  2.57910311
  0.52520182  1.24506884  1.76542622 -0.47429644 -0.33042693  0.40706676
  1.72303559  3.26209764  1.17317072 -1.27751672  1.97334509  0.0124819
  1.03872677 -0.558929    1.59767475  4.16813434  4.00899188  2.14984311
  1.44358676 -0.99561108].
The bias b of Ridge Regression is 10.163365369387115.


### 1.3.3 Lasso

In [38]:
### Initial the class Lasso by assigning values to the parameters.
model = Lasso(Lambda = 0.015, max_itr=100, tol=1e-5)
### Fit model with training data
W = model.fit(X_train, y_train)
### Predict the output of test samples
y_pred = model.predict(X_test)
### Evaluate the model by calculating the MSE of test samples.
mse = mean_squared_error(y_pred, y_test)
### Print MSE 
print("MSE of Lasso is {}".format(mse))
### Print the estimated w and b
print("The weight w of Lasso is \n {}.".format(W[:X_test.shape[1],0].T))
print("The bias b of Lasso is {}.".format(W[X_test.shape[1],0]))

MSE of Lasso is 209.8119576560742
The weight w of Lasso is 
 [-0.75791663  0.          0.          0.91385972  0.          0.25992675
  0.          0.          0.2034837   0.          0.         -1.41598724
 -0.36242924  0.54596499  0.         -0.03448298  0.          0.73379924
  0.         -1.57685701  0.          0.          0.98427781  0.
 -1.07493677  0.18704327].
The bias b of Lasso is 25.133698679123533.


# 2. Classification

## 2.1 Logistic Regression

In [1]:
class LogisticRegression:
    '''
    This is a class for Logistic Regression algorithm.
    
    The class contains the hyper parameters of the logistic regression algorithm as attributes.
    It also contains the functions for initializing the class, fitting the ridge regression model and use the fitted 
    model to predict test samples.
    
    Attributes:
        lr:        learning rate of gradient descent
        max_itr:   maximum number of iteration for gradient descent
        tol:       if the change in loss is smaller than tol, then we stop iteration
        W:         concatenation of weight w and bias b
        verbose:   whether or not print the value of logitic loss every 1000 iterations
        
    '''
    def __init__(self, lr=0.01, max_itr=100000, tol = 1e-5, verbose = False):
        self.lr = lr
        self.max_itr = max_itr
        self.tol = tol
        self.verbose = verbose
 
    def __sigmoid(self, z):
        '''
        Define the Sigmoid function to convert from real value to [0,1]
        
        Args: 
            z (matrix, num_samples*1): scores or real value
            
        Returns:
            A matrix (num_variables+1)*1: a value in the interval [0,1]
        '''
        return 1 / (1 + np.exp(-z))
    
    def __logistic_loss(self, h, y):
        '''
        Calculate the logistic loss
        '''
        return (-y * np.log(h) - (1 - y) * np.log(1 - h)).mean()
    
    def fit(self, x, y):
        '''
        estimate the weight and bias in the logistic regression model by gradient descent
        
        Args: 
            x (matrix, num_train*num_variables): input of training samples
            y (matrix, num_test*1): labels of training samples, 0 or 1
            
        Returns:
            self.W (matrix, (num_variables+1)*1): estimation of weight and bias, i.e (w,b)
        '''
        ### Add the all-one vector to the last column 
        m = x.shape[0]
        X = np.concatenate((x, np.ones((m, 1))), axis=1)
        y = y.reshape(-1,1)
        # weight and bias initialization
        d = X.shape[1]
        self.W = np.zeros((d,1))
        
        z = np.dot(X, self.W)
        h = self.__sigmoid(z)
        previous_loss = self.__logistic_loss(h, y)
        for i in range(self.max_itr):
            #Calculate the gradient and update w and b
            z = np.dot(X, self.W)
            h = self.__sigmoid(z)
            gradient = np.dot(X.T, (h - y)) / m
            self.W -= self.lr * gradient
            
            #Calculate the new logistic loss
            z = np.dot(X, self.W)
            h = self.__sigmoid(z)
            current_loss = self.__logistic_loss(h, y)
            if previous_loss - current_loss < self.tol:
                print('Converged after {} iterations'.format(i+1))
                print('Logistic loss after {} iterations is {}'.format(i+1,current_loss))
                break
            else:
                previous_loss = current_loss
            if(self.verbose == True and i % 10000 == 0):
                print('Logistic loss after {} iterations is {}'.format(i+1,current_loss))
        return self.W
    
    def predict_prob(self, x):
        '''
        predict the posterior probability p_1(x; W) of the test samples
        
        Args: 
            x (matrix, num_test*num_variables): input of test samples
            
        Returns:
            y (matrix, num_test*1): predicted posterior probability p_1(x; W) of test samples
        ''' 
        m = x.shape[0]
        X = np.concatenate((x, np.ones((m, 1))), axis=1)
        return self.__sigmoid(np.dot(X, self.W))
    
    def predict(self, x):
        '''
        predict the label of the test samples
        
        Args: 
            x (matrix, num_test*num_variables): input of test samples
            
        Returns:
            y (matrix, num_test*1): predicted labels of test samples, 0 or 1
        ''' 
        return self.predict_prob(x).round()

In [2]:
def loadDataSet(dataset_path, file_type="txt"):
    if file_type == "txt":
        X = []                                                        #create feature matrix
        y = []                                                       # create label matrix
        fr = open(dataset_path)                                            #open file
        for line in fr.readlines():                                         #read datum
            lineArr = line.strip().split()                                  #remove the `\n` and obtain the data from string
            X.append([float(x) for x in lineArr[:-1]])     # add to the feature matrix
            y.append(float(lineArr[-1]))                                # add to the label matrix
        fr.close()                                                          # close file
        return X, y    

## 2.2 Load Dataset

In [3]:
# read the data
import numpy as np
X_train, y_train = loadDataSet("horseColicTraining.txt")
X_test, y_test = loadDataSet("horseColicTest.txt")

# transform the data from list to np.array
X_train = np.array(X_train)
y_train = np.array(y_train)
X_test = np.array(X_test)
y_test = np.array(y_test)

# normalize
X = np.vstack([X_train, X_test])
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
scaler.fit(X)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

## 2.3 Model Fitting

In [4]:
### initiate the logistic regressor
model = LogisticRegression(lr=0.1, max_itr=100000, tol = 1e-8, verbose=True)
### fit the model with training data and get the estimation of parameters (w & b)
W = model.fit(X_train, y_train)
### Print the estimated w and b
print(W.T)
### Print the estimated w and b
print("The weight w of LR is \n {}.".format(W[:X_test.shape[1],0].T))
print("The bias b of LR is {}.".format(W[X_test.shape[1],0]))

Logistic loss after 1 iterations is 0.6898762822641373
Logistic loss after 10001 iterations is 0.5217484321994517
Converged after 11802 iterations
Logistic loss after 11802 iterations is 0.52171924670628
[[ 0.7611359  -0.2002035   1.00541571 -2.51425541  0.82184505 -0.60528775
  -0.36426046 -1.38660266 -0.16129892 -1.17278023  1.47758935 -0.60870321
   1.39102436 -0.31464443 -0.98595971  0.58698705 -0.70142135 -0.4994333
   1.04153942  0.06140696 -1.07582939  0.95021878]]
The weight w of LR is 
 [ 0.7611359  -0.2002035   1.00541571 -2.51425541  0.82184505 -0.60528775
 -0.36426046 -1.38660266 -0.16129892 -1.17278023  1.47758935 -0.60870321
  1.39102436 -0.31464443 -0.98595971  0.58698705 -0.70142135 -0.4994333
  1.04153942  0.06140696 -1.07582939].
The bias b of LR is 0.9502187772809589.


## 2.4 Prediction and Evaluation

In [6]:
y_pred = model.predict(X_test)
accuracy = np.sum(y_pred[:,0] == y_test)/len(y_test)
print("Accuracy of LR on the test dataset is {}.".format(accuracy))

Accuracy of LR on the test dataset is 0.7164179104477612.
