# Logistic Regression Lab

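## Preparation
### Environment Setup

Make sure the dependencies below are installed, then run the following code to import and verify them. If it succeeds, a new window opens showing a blank figure.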

In [1]:
import numpy as np
import matplotlib.pyplot as plt
from typing import Tuple, List

# display the plot in a separate window
%matplotlib tk

np.random.seed(12)

# turn on interactive mode and create an empty figure
plt.ion()
fig = plt.figure(figsize=(12, 5))

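### Dataset Preparation

You will use the following 2D dataset to train a logistic regression classifier and watch how the linear decision boundary changes as training proceeds.

The dataset contains two features and one label, where the label $y \in \{-1, 1\}$.

Run the code below to load and visualize the dataset.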

In [2]:
from data_generator import gen_2D_dataset

x_train, y_train = gen_2D_dataset(100, 100, noise=0)
x_test, y_test = gen_2D_dataset(50, 50, noise=0.7)

In [3]:
from vis_util import visualize_2D_dataset, visualize_2D_border

visualize_2D_dataset(x_train, y_train)
visualize_2D_dataset(x_test, y_test)
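
If `data_generator.py` is not at hand, a dataset of the same shape can be mocked up for experimentation. The sketch below is a hypothetical stand-in, not the lab's `gen_2D_dataset`: the function name `make_toy_2d_dataset`, the blob centers, and the meaning given to `noise` are all assumptions made for illustration.

```python
import numpy as np


def make_toy_2d_dataset(n_pos, n_neg, noise=0.0, seed=0):
    """Hypothetical stand-in for the lab's generator: two Gaussian blobs."""
    rng = np.random.default_rng(seed)
    # Assumed semantics: `noise` widens both blobs around assumed centers.
    spread = 0.5 + noise
    x_pos = rng.normal(loc=[2.0, 2.0], scale=spread, size=(n_pos, 2))
    x_neg = rng.normal(loc=[-2.0, -2.0], scale=spread, size=(n_neg, 2))
    x = np.vstack([x_pos, x_neg])
    # Labels follow the lab's convention: +1 for one class, -1 for the other.
    y = np.concatenate([np.ones(n_pos), -np.ones(n_neg)]).astype(int)
    return x, y


x_demo, y_demo = make_toy_2d_dataset(100, 100, noise=0)
print(x_demo.shape, y_demo.shape, np.unique(y_demo))
```

The returned arrays have one row per sample and two feature columns, with labels in $\{-1, 1\}$ as described above.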

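## Logistic Regression (10 pts)

In this part, you will learn about logistic regression, then complete and train the corresponding code.

Before running the code in this part, make sure you have completed the code in `logistic.py`.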
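
For reference, with labels in $\{-1, 1\}$ a common form of the objective is the mean logistic loss plus an L2 penalty. The exact form expected in the lab's code is not shown here, so treat the following as an assumption, with $\lambda$ standing for the `reg` argument, $n$ for the number of samples, and any bias term folded into $w$:

$$
L(w) = \frac{1}{n}\sum_{i=1}^{n} \log\bigl(1 + \exp(-y_i\, w^\top x_i)\bigr) + \frac{\lambda}{2}\,\|w\|_2^2,
\qquad
\nabla L(w) = -\frac{1}{n}\sum_{i=1}^{n} \frac{y_i\, x_i}{1 + \exp(y_i\, w^\top x_i)} + \lambda w .
$$

Under this form, setting $\lambda = 0$ recovers the plain logistic loss, and each gradient descent step is $w \leftarrow w - \eta\, \nabla L(w)$ with learning rate $\eta$ (the `lr` argument).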

After completing the file, run the code below. A figure will show how $\|w\|$, the loss, and the decision boundary change during training.

In [4]:
from logistic import LogisticRegression

# create a LogisticRegression object
LR = LogisticRegression()

# fit the model to the training data without regularization (reg = 0)
LR.fit(x_train, y_train, lr=0.1, n_iter=1000, reg=0)


iter: 0, loss: 0.6807266354549091, w_module: 1.86503565781866
iter: 10, loss: 0.6047988783795631, w_module: 1.8481489888845297
iter: 20, loss: 0.5660832091387538, w_module: 1.8416649857462604
iter: 30, loss: 0.5309314279730247, w_module: 1.852628762958914
iter: 40, loss: 0.49873990315881916, w_module: 1.8798975391623567
iter: 50, loss: 0.4692664495478231, w_module: 1.9209134420153002
iter: 60, loss: 0.4422816596337917, w_module: 1.973036438941622
iter: 70, loss: 0.4175668558824795, w_module: 2.0338455872138264
iter: 80, loss: 0.3949164282672946, w_module: 2.101235926921896
iter: 90, loss: 0.3741392267477511, w_module: 2.1734405660231073
iter: 100, loss: 0.3550591551622385, w_module: 2.249013982747483
iter: 110, loss: 0.33751516416979, w_module: 2.3267961857247195
iter: 120, loss: 0.32136081353797313, w_module: 2.405870153248784
iter: 130, loss: 0.30646354486334987, w_module: 2.4855197670708287
iter: 140, loss: 0.29270377665622754, w_module: 2.5651918359489785
iter: 150, loss: 0.2799739

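Running the code above, you will see that without regularization, $\|w\|$ dips slightly over the first few dozen iterations and then keeps growing as the number of training iterations increases.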
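
One way to see why the norm keeps growing: when the data are linearly separable, scaling up any separating $w$ always lowers the logistic loss, so the loss has no finite minimizer. The sketch below illustrates this on a made-up four-point dataset; `X_toy`, `y_toy`, `w_dir`, and the bias-free loss are assumptions for illustration, not the lab's data or implementation.

```python
import numpy as np


def logistic_loss(w, X, y):
    """Mean logistic loss for labels in {-1, +1} (no bias term, no penalty)."""
    margins = y * (X @ w)
    # log(1 + exp(-m)) computed stably via logaddexp(0, -m)
    return np.mean(np.logaddexp(0.0, -margins))


# Made-up, linearly separable toy data.
X_toy = np.array([[1.0, 1.0], [2.0, 1.0], [-1.0, -1.0], [-2.0, -1.0]])
y_toy = np.array([1, 1, -1, -1])

# A direction that separates the toy data: every margin y * (x @ w) is positive.
w_dir = np.array([1.0, 1.0])

scales = [1, 2, 4, 8]
losses = [logistic_loss(c * w_dir, X_toy, y_toy) for c in scales]
for c, loss in zip(scales, losses):
    print(f"scale={c}: ||w||={np.linalg.norm(c * w_dir):.3f}, loss={loss:.6f}")
```

Each doubling of the scale leaves the decision boundary unchanged but reduces the loss, which is the direction gradient descent keeps following when `reg=0`.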

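After training, you can use the trained classifier to make predictions. Write code to compute the prediction accuracy on the training set and on the test set.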

In [5]:
# Compute the accuracy of logistic regression (LR) on the training and test sets.
# LR is already trained if you have run the cells above.

# training accuracy

# TODO: compute y_pred using the LR.predict() function
# (the second return value is used as the predicted labels; the first is ignored here)
_, y_pred = LR.predict(x_train)
# TODO: compute the accuracy
train_acc = np.sum(y_train == y_pred) / y_train.shape[0]

print("Train accuracy: {}".format(train_acc))


# TODO: test accuracy, proceed similarly as above
_, y_pred_test = LR.predict(x_test)
test_acc = np.sum(y_test == y_pred_test) / y_test.shape[0]

print("Test accuracy: {}".format(test_acc))


Train accuracy: 1.0
Test accuracy: 0.98
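
The accuracy expression above relies on `y_pred` having the same shape as the labels. If a `predict` implementation returned a column vector of shape `(n, 1)` instead, the comparison would broadcast to an `(n, n)` matrix and the result would no longer be a valid accuracy. A small helper, sketched here with the hypothetical name `accuracy` and made-up demo arrays, guards against that by flattening both inputs first.

```python
import numpy as np


def accuracy(y_true, y_pred):
    """Fraction of matching labels, robust to (n,) versus (n, 1) shapes."""
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must contain the same number of labels")
    return float(np.mean(y_true == y_pred))


# Made-up labels: three of the four predictions are correct.
y_true_demo = np.array([1, -1, 1, -1])
y_pred_demo = np.array([[1], [-1], [-1], [-1]])  # column vector on purpose

acc = accuracy(y_true_demo, y_pred_demo)

# The unguarded expression broadcasts (4,) against (4, 1) into a (4, 4) matrix.
naive = np.sum(y_true_demo == y_pred_demo) / y_true_demo.shape[0]

print("helper:", acc)
print("naive: ", naive)
```

With the lab's own arrays this would be called as `accuracy(y_train, y_pred)`, assuming `LR.predict` returns one label per sample.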


In [6]:
# create a LogisticRegression object and train it when using regularization
LR = LogisticRegression()
LR.fit(x_train, y_train, lr=0.1, n_iter=1000, reg=0.1)

iter: 0, loss: 0.7626338792157056, w_module: 1.1666479112787131
iter: 10, loss: 0.5512105917124799, w_module: 1.0059428772396901
iter: 20, loss: 0.5032670058998098, w_module: 1.043507909594384
iter: 30, loss: 0.4728857424911759, w_module: 1.1300133740395049
iter: 40, loss: 0.44556921164099994, w_module: 1.23224679432182
iter: 50, loss: 0.42068862324445283, w_module: 1.3417235335754876
iter: 60, loss: 0.3979983804556591, w_module: 1.4544396438519054
iter: 70, loss: 0.3772804938548322, w_module: 1.567975460189859
iter: 80, loss: 0.35833700909365235, w_module: 1.6807962148313718
iter: 90, loss: 0.3409891405975292, w_module: 1.7919296223532153
iter: 100, loss: 0.3250761861747587, w_module: 1.9007712502048357
iter: 110, loss: 0.31045421008900465, w_module: 2.006959637602104
iter: 120, loss: 0.2969946166293214, w_module: 2.110294839991969
iter: 130, loss: 0.2845827049256945, w_module: 2.210684709626059
iter: 140, loss: 0.2731162653454001, w_module: 2.3081090719969226
iter: 150, loss: 0.26250

In [7]:
# TODO: Compute the accuracy of the regularized logistic regression (LR) on the training and test sets.
# LR is already trained if you have run the cell above.

_, y_pred = LR.predict(x_train)

train_acc = np.sum(y_train == y_pred) / y_train.shape[0]

print("Train accuracy: {}".format(train_acc))

_, y_pred_test = LR.predict(x_test)
test_acc = np.sum(y_test == y_pred_test) / y_test.shape[0]

print("Test accuracy: {}".format(test_acc))

Train accuracy: 1.0
Test accuracy: 0.97


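After running the regularized code above, observe how $\|w\|$ changes and discuss the practical significance of regularization. (Write your answer below.)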

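Regularization penalizes large $w$, which keeps $\|w\|$ from growing without bound. On linearly separable data such as this training set (training accuracy is 1.0), the unregularized loss can always be lowered by scaling $w$ up, so it is the penalty term that gives the optimization a finite solution.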
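
This effect can be checked on a tiny example. The sketch below runs plain gradient descent with and without an L2 penalty on a made-up separable dataset and compares the final weight norms. It is not the lab's `LogisticRegression`: the function `train_toy`, the penalty form $(\text{reg}/2)\,\|w\|^2$, the zero initialization, and the absence of a bias term are all assumptions chosen to keep the example short.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_toy(X, y, lr=0.1, n_iter=1000, reg=0.0):
    """Gradient descent on mean logistic loss + (reg / 2) * ||w||^2 (assumed form)."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        margins = y * (X @ w)
        # Gradient of the mean logistic loss for labels in {-1, +1}.
        grad = -(X * (y * sigmoid(-margins))[:, None]).mean(axis=0)
        # The L2 penalty adds reg * w to the gradient.
        w -= lr * (grad + reg * w)
    return w


# Made-up, linearly separable toy data.
X_toy = np.array([[1.0, 1.0], [2.0, 1.0], [-1.0, -1.0], [-2.0, -1.0]])
y_toy = np.array([1, 1, -1, -1])

w_noreg = train_toy(X_toy, y_toy, reg=0.0)
w_reg = train_toy(X_toy, y_toy, reg=0.1)

norm_noreg = np.linalg.norm(w_noreg)
norm_reg = np.linalg.norm(w_reg)
print(f"reg=0.0: ||w|| = {norm_noreg:.4f}")
print(f"reg=0.1: ||w|| = {norm_reg:.4f}")
```

Both weight vectors classify the toy points correctly, but the regularized run ends with the smaller norm, since the penalty pulls $w$ back toward zero at every step.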