# Logistic Regression Lab

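## Preparation
### Environment Setup

Make sure the dependencies below are installed, then run the following code to import and verify them. If it runs successfully, a new window opens showing a blank figure.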

In [1]:
import numpy as np
import matplotlib.pyplot as plt
from typing import Tuple, List

# display the plot in a separate window
%matplotlib tk

np.random.seed(12)

# create a figure and axis
plt.ion()
fig = plt.figure(figsize=(12, 5))

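### Dataset Preparation

You will use the following two-dimensional dataset to train a logistic classifier and observe how the linear decision boundary changes as training proceeds.

The dataset contains two features and one label, where the label $ y \in \{-1,1\} $.

Run the code below to load the dataset and visualize it.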

In [2]:
from data_generator import gen_2D_dataset

x_train, y_train = gen_2D_dataset(100, 100, noise=0)
x_test, y_test = gen_2D_dataset(50, 50, noise=0.7)

In [3]:
from vis_util import visualize_2D_dataset, visualize_2D_border

visualize_2D_dataset(x_train, y_train)
visualize_2D_dataset(x_test, y_test)

![figure1](img/Figure_1.jpeg)

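## Logistic Regression (10 pts)

In this part, you will study logistic regression, complete the related code, and train the model.

Before running the code in this part, make sure you have completed the code in `logistic.py`.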
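
`logistic.py` is not shown here, so as a reference for the quantities the lab tracks, the following is a minimal sketch of the mean logistic loss for labels in $\{-1,1\}$ and its gradient. The function name `logistic_loss_and_grad`, the demo arrays, and the L2 penalty form `0.5 * reg * ||w||^2` are assumptions for illustration, not the lab's actual implementation.

```python
import numpy as np


def logistic_loss_and_grad(w, X, y, reg=0.0):
    """Mean logistic loss for labels in {-1, +1}, with an assumed L2 penalty.

    X has shape (n, d) with a bias column already appended; y has shape (n,).
    """
    margins = y * (X @ w)
    # log(1 + exp(-m)), computed stably via logaddexp
    loss = np.mean(np.logaddexp(0.0, -margins)) + 0.5 * reg * np.dot(w, w)
    # sigmoid(-m) = exp(-log(1 + exp(m)))
    sig_neg = np.exp(-np.logaddexp(0.0, margins))
    grad = -(X.T @ (y * sig_neg)) / len(y) + reg * w
    return loss, grad


# Tiny hypothetical dataset: two features plus a bias column of ones.
X_demo = np.array([
    [1.0, 2.0, 1.0],
    [-1.5, -0.5, 1.0],
    [2.0, 0.5, 1.0],
    [-2.0, -1.0, 1.0],
])
y_demo = np.array([1.0, -1.0, 1.0, -1.0])

loss0, grad0 = logistic_loss_and_grad(np.zeros(3), X_demo, y_demo)
print(loss0)  # log(2) at w = 0, because every margin is zero
```

A gradient like this can be sanity-checked against finite differences before it is used in a training loop.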

Once that is done, run the following code. A figure will show how $||w||$, the loss, and the decision boundary change during training.

In [4]:
from logistic import LogisticRegression

# create a LogisticRegression object 
LR = LogisticRegression()

# fit the model to the training data without regularization (reg = 0)
LR.fit(x_train, y_train, lr=0.1, n_iter=1000, reg=0)


iter: 0, loss: 0.6807266354549091, w_module: 1.86503565781866
iter: 10, loss: 0.6047988783795631, w_module: 1.8481489888845297
iter: 20, loss: 0.5660832091387538, w_module: 1.8416649857462604
iter: 30, loss: 0.5309314279730247, w_module: 1.852628762958914
iter: 40, loss: 0.49873990315881916, w_module: 1.8798975391623565
iter: 50, loss: 0.4692664495478231, w_module: 1.9209134420153
iter: 60, loss: 0.4422816596337917, w_module: 1.9730364389416215
iter: 70, loss: 0.4175668558824793, w_module: 2.033845587213826
iter: 80, loss: 0.39491642826729456, w_module: 2.101235926921896
iter: 90, loss: 0.3741392267477511, w_module: 2.173440566023107
iter: 100, loss: 0.3550591551622385, w_module: 2.249013982747482
iter: 110, loss: 0.33751516416978994, w_module: 2.326796185724719
iter: 120, loss: 0.32136081353797313, w_module: 2.4058701532487836
iter: 130, loss: 0.30646354486334976, w_module: 2.4855197670708282
iter: 140, loss: 0.29270377665622754, w_module: 2.5651918359489785
iter: 150, loss: 0.2799739

![figure2](img/Figure_2.jpeg)

Running the code above shows that, without regularization, $||w||$ keeps growing as the number of training iterations increases.
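
One way to see why this happens, assuming the unregularized objective is the mean logistic loss over labels in $\{-1,1\}$ (the exact form used in `logistic.py` is not shown), with $x_i$ including the bias entry:

$$
L(w) = \frac{1}{n}\sum_{i=1}^{n} \log\bigl(1 + e^{-y_i w^\top x_i}\bigr)
$$

If the training set is linearly separable, as the noise-free training data here appears to be, there is some $w$ with $y_i w^\top x_i > 0$ for all $i$. For any $c > 1$,

$$
\log\bigl(1 + e^{-c\, y_i w^\top x_i}\bigr) < \log\bigl(1 + e^{-y_i w^\top x_i}\bigr) \quad \text{for every } i,
$$

so $L(cw) < L(w)$. The loss therefore has no finite minimizer: it can always be lowered further by scaling $w$ up, which is consistent with the steadily growing $||w||$ in the log above.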

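After training, you can use the trained classifier to make predictions. Write code to compute the prediction accuracy on the training set and on the test set.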

In [5]:
# Implement the code to compute the accuracy of logistic regression (LR) on the training and test sets. Note that LR is already trained if you have run the code above.

# training accuracy

# TODO: compute the y_pred using LR.predict() function

x_train_bias = np.concatenate((x_train, np.ones((x_train.shape[0], 1))), axis=1)

_, y_train_pred = LR.predict(x_train_bias)

# TODO: compute the accuracy

correct_train = np.sum(y_train == y_train_pred)
train_acc = correct_train / len(y_train)


print("Train accuracy: {}".format(train_acc))


# TODO: test accuracy, proceed similarly as above

x_test_bias = np.concatenate((x_test, np.ones((x_test.shape[0], 1))), axis=1)

_, y_test_pred = LR.predict(x_test_bias)

correct_test = np.sum(y_test == y_test_pred)
test_acc = correct_test / len(y_test)


print("Test accuracy: {}".format(test_acc))


Train accuracy: 1.0
Test accuracy: 0.98
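
The accuracy computation above can also be wrapped in small helpers so it is not repeated for each model. This is a minimal sketch. `add_bias_column` and `accuracy` are hypothetical names, not part of the lab's files, and the demo arrays are made up. The `ravel()` calls guard against a common pitfall: comparing an `(n,)` array with an `(n, 1)` array broadcasts to an `(n, n)` matrix and silently gives a wrong count.

```python
import numpy as np


def add_bias_column(x):
    """Append a column of ones so the bias is learned as an extra weight."""
    return np.concatenate((x, np.ones((x.shape[0], 1))), axis=1)


def accuracy(y_true, y_pred):
    """Fraction of matching labels, after flattening both inputs to 1-D."""
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    return float(np.mean(y_true == y_pred))


y_true_demo = np.array([1, -1, 1, 1])
y_pred_demo = np.array([[1], [-1], [-1], [1]])  # column vector on purpose
print(accuracy(y_true_demo, y_pred_demo))  # 0.75
```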


In [11]:
# create a LogisticRegression object and train it when using regularization
LR = LogisticRegression()
LR.fit(x_train, y_train, lr=0.1, n_iter=1000, reg=0.1)

iter: 0, loss: 1.0359734045322084, w_module: 0.9099088363124627
iter: 10, loss: 0.748761299151733, w_module: 0.7208561735308843
iter: 20, loss: 0.6958635342262046, w_module: 0.530628627712227
iter: 30, loss: 0.6482742878571777, w_module: 0.3756706864340863
iter: 40, loss: 0.6048333779052433, w_module: 0.30766481420772807
iter: 50, loss: 0.5652507493929028, w_module: 0.36197731850752757
iter: 60, loss: 0.5292284278469921, w_module: 0.4891588337932561
iter: 70, loss: 0.49646584517558784, w_module: 0.6398554720594865
iter: 80, loss: 0.4666691974963134, w_module: 0.7953060040964518
iter: 90, loss: 0.4395581224185239, w_module: 0.9490342729607233
iter: 100, loss: 0.4148700146911712, w_module: 1.0986122236605878
iter: 110, loss: 0.3923624203183465, w_module: 1.2431099063019668
iter: 120, loss: 0.3718139643525305, w_module: 1.3822276226309793
iter: 130, loss: 0.3530242275023076, w_module: 1.5159566408570258
iter: 140, loss: 0.33581291837085325, w_module: 1.644429799208971
iter: 150, loss: 0.3

![figure3](img/Figure_3.jpeg)

In [7]:
# TODO: Implement the code to compute the accuracy of the regularized logistic regression (LR) on the training and test sets. Note that LR is already trained if you have run the code above.

x_train_bias = np.concatenate((x_train, np.ones((x_train.shape[0], 1))), axis=1)

_, y_train_pred = LR.predict(x_train_bias)

# TODO: compute the accuracy

correct_train = np.sum(y_train == y_train_pred)
train_acc = correct_train / len(y_train)


print("Train accuracy: {}".format(train_acc))


# TODO: test accuracy, proceed similarly as above

x_test_bias = np.concatenate((x_test, np.ones((x_test.shape[0], 1))), axis=1)

_, y_test_pred = LR.predict(x_test_bias)

correct_test = np.sum(y_test == y_test_pred)
test_acc = correct_test / len(y_test)


print("Test accuracy: {}".format(test_acc))


Train accuracy: 1.0
Test accuracy: 0.97


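After running the code above with regularization, observe how $||w||$ changes and discuss the practical significance of regularization. (Write your answer below.)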

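The growth of $w$ slows down somewhat. By limiting the growth of $w$, regularization makes the model somewhat more robust and helps limit overfitting.  
If the ratio lr/reg is increased, $w$ converges somewhat faster; for example, with lr=1 and reg=0.1 the convergence is clearly visible.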
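
The effect of the penalty on $||w||$ can be reproduced on toy data. The sketch below runs plain gradient descent on the mean logistic loss with an assumed L2 penalty `0.5 * reg * ||w||^2`. The function `train_norm_history` and the synthetic, well-separated dataset are hypothetical stand-ins, not the lab's `LogisticRegression` class or its `gen_2D_dataset` output.

```python
import numpy as np


def train_norm_history(X, y, lr=0.1, n_iter=1000, reg=0.0):
    """Gradient descent on mean logistic loss plus 0.5 * reg * ||w||^2.

    Returns the final weights and ||w|| recorded after every iteration.
    """
    w = np.zeros(X.shape[1])
    norms = []
    for _ in range(n_iter):
        margins = y * (X @ w)
        sig_neg = np.exp(-np.logaddexp(0.0, margins))  # sigmoid(-margin)
        grad = -(X.T @ (y * sig_neg)) / len(y) + reg * w
        w = w - lr * grad
        norms.append(np.linalg.norm(w))
    return w, np.array(norms)


# Hypothetical well-separated 2D data with a bias column of ones.
rng = np.random.default_rng(0)
pos = rng.normal(loc=[2.0, 2.0], scale=0.5, size=(50, 2))
neg = rng.normal(loc=[-2.0, -2.0], scale=0.5, size=(50, 2))
X = np.hstack([np.vstack([pos, neg]), np.ones((100, 1))])
y = np.concatenate([np.ones(50), -np.ones(50)])

_, norms_plain = train_norm_history(X, y, reg=0.0)
_, norms_reg = train_norm_history(X, y, reg=0.1)
print("final ||w|| without reg:", norms_plain[-1])
print("final ||w|| with reg=0.1:", norms_reg[-1])
```

On data like this, the unregularized norm should still be growing at the end of training, while the regularized norm should settle at a smaller value, because the penalty term `reg * w` in the gradient pulls the weights back toward zero.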