### Logistic Regression (Exercise)
---
This section uses logistic regression to classify whether an image from the MNIST dataset shows the digit 0.

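### Data Preprocessing
First, preprocess the data: read each image, flatten the 2-D image data of shape (28, 28) into a vector of shape (784,), and generate labels. Images of the digit 0 get label 1 and all other images get label 0; in the code these are stored as the one-hot vectors [1, 0] and [0, 1].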

In [48]:
import tensorflow as tf
import os
import numpy as np
import matplotlib.image as mpimg  # mpimg is used to read image files

data_path = '../data'

# Image paths: directory '1' contains the digits 1-9, directory '0' contains only zeros
path_1 = os.listdir(os.path.join(data_path, '1'))
np.random.shuffle(path_1)
path_1 = list(map(lambda x: os.path.join(data_path, '1', x), path_1))
path_0 = os.listdir(os.path.join(data_path, '0'))
path_0 = list(map(lambda x: os.path.join(data_path, '0', x), path_0))


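def parse_image(image_path):
    """Load an image, flatten it to a 1-D vector, and divide the pixel values by 255.
    Args:
        image_path: path to the image file
    Returns:
        img: the processed image as a (784,) array
    """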
    t = mpimg.imread(image_path)
    return np.reshape(t, (28 * 28)) / 255.


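def get_label(paths, labels):
    """Load the images at the given paths and attach a label to each.
    Args:
        paths: image paths
        labels: 1 if the images show the digit 0, any other value otherwise
    Returns:
        x: list of processed images
        y: list of one-hot labels, one per image ([1, 0] for the digit 0, [0, 1] otherwise)
    """
    x = list(map(parse_image, paths))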
    if labels == 1:
        y = [[1,0] for _ in range(len(paths))]
    else:
        y = [[0,1] for _ in range(len(paths))]
    return x, y


data_0 = get_label(path_0, 1)
print(data_0[1][:10])


data_1 = get_label(path_1, 0)

[[1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0]]
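
As a quick check of the preprocessing above, the following sketch applies the same flatten-and-scale step to a synthetic image instead of a file on disk, and builds the one-hot labels that `get_label` returns. The helper names `flatten_image` and `one_hot_labels` and the random image are assumptions made for illustration; they are not part of the notebook.

```python
import numpy as np


def flatten_image(img):
    """Flatten a (28, 28) uint8 image to a (784,) float vector in [0, 1]."""
    return np.reshape(img, (28 * 28)) / 255.


def one_hot_labels(n, is_zero):
    """Return n one-hot labels: [1, 0] for the digit 0, [0, 1] otherwise."""
    return [[1, 0] if is_zero else [0, 1] for _ in range(n)]


rng = np.random.default_rng(0)
# Synthetic stand-in for the array that mpimg.imread would return
fake_img = rng.integers(0, 256, size=(28, 28), dtype=np.uint8)
vec = flatten_image(fake_img)
print(vec.shape, vec.min() >= 0.0, vec.max() <= 1.0)
print(one_hot_labels(2, True), one_hot_labels(2, False))
```

Dividing by 255 only maps pixels into [0, 1] when the loaded image holds 8-bit integer values, which is what this sketch assumes.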


### Exercise:
Split the dataset into two parts: one for training and the other for validating the model.

In [49]:
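# Training images: everything except the last 500 images of each class
data_X_train = np.concatenate((data_0[0][:-500], data_1[0][:-500]))
# Training labels
data_Y_train = np.concatenate((data_0[1][:-500], data_1[1][:-500]))
# Append the 2 label columns to the 784 pixel columns, giving rows of length 786
training_set = np.concatenate([data_X_train, data_Y_train], axis=1)
print(training_set[0])
# Validation set: the last 500 images of each class
data_X_test = np.concatenate((data_0[0][-500:], data_1[0][-500:]))
data_Y_test = np.concatenate((data_0[1][-500:], data_1[1][-500:]))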

[0.01176471 0.         0.         0.01176471 0.02745098 0.01176471
 0.         0.01176471 0.         0.04313725 0.         0.
 0.01176471 0.         0.         0.01176471 0.03137255 0.
 0.         0.01176471 0.         0.         0.         0.00784314
 0.         0.         0.         0.         0.         0.
 0.         0.         0.         0.         0.         0.00392157
 0.01960784 0.         0.04705882 0.         0.0627451  0.
 0.         0.01568627 0.         0.00784314 0.03137255 0.01176471
 0.         0.01568627 0.03137255 0.         0.         0.
 0.         0.         0.         0.         0.00784314 0.
 0.         0.         0.00392157 0.00784314 0.00392157 0.04705882
 0.         0.03137255 0.         0.         0.02352941 0.
 0.04313725 0.         0.         0.02352941 0.02745098 0.00784314
 0.         0.         0.         0.         0.         0.
 0.         0.00392157 0.01176471 0.         0.         0.00784314
 0.01176471 0.         0.         0.         0.04705882 0.
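
The split above holds out the last 500 images of each class. A common alternative is to shuffle the indices before holding out a fixed number of samples, so that the validation set does not depend on file order. The sketch below shows that idea on synthetic arrays; `split_holdout` and its parameters are hypothetical names, not part of the notebook.

```python
import numpy as np


def split_holdout(x, y, n_holdout, seed=0):
    """Shuffle (x, y) together and hold out n_holdout rows (must be >= 1)."""
    x, y = np.asarray(x), np.asarray(y)
    idx = np.random.default_rng(seed).permutation(len(x))
    train_idx, test_idx = idx[:-n_holdout], idx[-n_holdout:]
    return (x[train_idx], y[train_idx]), (x[test_idx], y[test_idx])


# Synthetic stand-in: 10 "images" of 784 pixels with one-hot labels
x = np.arange(10 * 784, dtype=np.float64).reshape(10, 784)
y = np.array([[1, 0]] * 5 + [[0, 1]] * 5)
(train_x, train_y), (test_x, test_y) = split_holdout(x, y, n_holdout=3)
print(train_x.shape, test_x.shape)
```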


### Exercise:
Define a generator that produces batches.

In [50]:
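def gen_batch(dataset, batchsize):
    """Yield mini-batches of the given batch size; the final batch may be smaller.
    Args:
        dataset: 2-D array whose last two columns hold the one-hot label
        batchsize: number of rows per batch
    Yields:
        x: inputs
        y: labels
    """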
    for i in range(np.shape(dataset)[0] // batchsize):
        pos = i * batchsize
        x = dataset[pos:pos + batchsize, 0:-2]
        y = dataset[pos:pos + batchsize, -2:]
        yield x, y
    remain = np.shape(dataset)[0] % batchsize
    if remain != 0:
        x, y = dataset[-remain:, 0:-2], dataset[-remain:, -2:]
        yield x, y
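
Because `training_set` is built by concatenating all images of the digit 0 followed by all other images, `gen_batch` yields batches in that same order on every epoch. A common refinement is to reshuffle the rows at the start of each epoch. The sketch below is one way to do that; `gen_shuffled_batch` and its `rng` parameter are hypothetical and not part of the notebook.

```python
import numpy as np


def gen_shuffled_batch(dataset, batchsize, rng=None):
    """Yield (x, y) mini-batches from a shuffled copy of dataset.

    Assumes, as in gen_batch, that the last two columns hold the one-hot label.
    """
    rng = np.random.default_rng() if rng is None else rng
    shuffled = dataset[rng.permutation(len(dataset))]
    # Stepping by batchsize also covers a smaller final batch
    for pos in range(0, len(shuffled), batchsize):
        batch = shuffled[pos:pos + batchsize]
        yield batch[:, :-2], batch[:, -2:]


# Toy dataset: 10 rows, 2 feature columns, 2 label columns
toy = np.hstack([np.arange(20, dtype=float).reshape(10, 2),
                 np.tile([1.0, 0.0], (10, 1))])
sizes = [x.shape[0] for x, _ in gen_shuffled_batch(toy, 4, np.random.default_rng(0))]
print(sizes)  # [4, 4, 2]
```

Calling the generator once per epoch gives a fresh ordering each time, at the cost of copying the dataset on every call.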

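### Define the Computation Graph
This step does the following:
* Define placeholders for feeding in data later
* Define the weight variable W
* Define the loss function
* Define the optimization algorithm, here gradient descent
* Compute the error rate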
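
In symbols, the graph built in this section can be written as follows, where $x \in \mathbb{R}^{784}$ is a flattened image (as a row vector), $y \in \{0, 1\}^2$ its one-hot label, $W \in \mathbb{R}^{784 \times 2}$ the weight matrix, $\eta$ the learning rate, and $n$ the batch size. The notation is chosen here for illustration; there is no bias term because the code defines only W.

$$
z = xW, \qquad p_k = \frac{e^{z_k}}{e^{z_1} + e^{z_2}}, \quad k \in \{1, 2\}
$$

$$
L(W) = -\frac{1}{n} \sum_{i=1}^{n} \sum_{k=1}^{2} y^{(i)}_k \log p^{(i)}_k, \qquad W \leftarrow W - \eta \, \nabla_W L(W)
$$

With two classes, softmax reduces to the logistic sigmoid of the logit difference, which is why this model counts as logistic regression:

$$
p_1 = \frac{e^{z_1}}{e^{z_1} + e^{z_2}} = \frac{1}{1 + e^{-(z_1 - z_2)}} = \sigma(z_1 - z_2)
$$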

In [58]:
batchsize = 64
lr = 0.01
epoch = 100

### Exercise:
Define the computation graph.

In [59]:
# Define the computation graph
graph = tf.Graph()
with graph.as_default():
    # Define placeholders
    X = tf.placeholder(shape=(None, 28*28), dtype=tf.float32, name="X")
    Y = tf.placeholder(shape=(None, 2), dtype=tf.float32, name="Y")

    # Define the weight matrix
    W = tf.Variable(tf.truncated_normal(shape=[784, 2]), name="WeightMatrix")
    lgt = tf.matmul(X, W)
    # Note: despite the op name, this applies softmax over the two classes
    output = tf.nn.softmax(lgt, name="Apply_Sigmoid")
    # Define the loss
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=Y, logits=lgt), name="calculate_loss")

    with tf.name_scope("SGD"):
        # Optimize with gradient descent
        opt = tf.train.GradientDescentOptimizer(lr).minimize(loss, var_list=[W])

    # Compute the error rate
    with tf.name_scope("calculate_error_rate"):
        # A prediction is correct when the argmax of the output matches the argmax of the one-hot label
        prediction_result = tf.cast(tf.equal(tf.argmax(output,axis=1),tf.argmax(Y,axis=1)), dtype=tf.float32)

        error_rate = 1 - tf.reduce_mean(prediction_result)
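
For readers who want to check the arithmetic outside TensorFlow, the sketch below reproduces the same forward pass, loss, error rate, and gradient-descent update in plain NumPy. It is an illustrative re-implementation rather than the notebook's code: the function names and the synthetic data are assumptions, and the gradient `X.T @ (P - Y) / n` is the standard one for mean softmax cross-entropy.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax, shifted by the row max for numerical stability."""
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)


def loss_and_error(X, Y, W):
    """Mean cross-entropy and error rate for one-hot labels Y."""
    P = softmax(X @ W)
    loss = -np.mean(np.sum(Y * np.log(P + 1e-12), axis=1))
    error = 1.0 - np.mean(P.argmax(axis=1) == Y.argmax(axis=1))
    return loss, error


def sgd_step(X, Y, W, lr):
    """One gradient-descent update of W on the batch (X, Y)."""
    grad = X.T @ (softmax(X @ W) - Y) / len(X)
    return W - lr * grad


rng = np.random.default_rng(0)
X = rng.random((64, 784))                    # synthetic stand-in for a batch of images
Y = np.eye(2)[rng.integers(0, 2, size=64)]   # synthetic one-hot labels
W = rng.normal(scale=0.01, size=(784, 2))
before, _ = loss_and_error(X, Y, W)
for _ in range(20):
    W = sgd_step(X, Y, W, lr=0.01)
after, err = loss_and_error(X, Y, W)
print(before > after)
```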

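### Exercise:
### Define the Session and Run It
This step feeds the prepared data into the graph, initializes the model's variables, and runs the computation and optimization. Training makes `epoch` (100) passes over the training set; once training finishes, the model is evaluated on the test set.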
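
The number of optimization steps follows from the batch generator: each epoch yields one batch per full group of `batchsize` rows, plus one smaller batch for any remainder. The sketch below works that out; `n_train = 1000` is a made-up size for illustration, not the size of the notebook's training set.

```python
import math

batchsize = 64
epoch = 100
n_train = 1000  # hypothetical number of training rows

steps_per_epoch = math.ceil(n_train / batchsize)    # 15 full batches + 1 batch of 40 rows
last_batch_size = n_train % batchsize or batchsize  # size of the final batch in each epoch
total_steps = epoch * steps_per_epoch
print(steps_per_epoch, last_batch_size, total_steps)  # 16 40 1600
```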

In [60]:
with tf.Session(graph=graph) as sess:
    # Initialize variables
    init = tf.global_variables_initializer()
    sess.run(init)
    step = 0
    for epc in range(epoch):
        for x, y in gen_batch(training_set, batchsize):
            l, error, _ = sess.run([loss, error_rate, opt], feed_dict={X: np.reshape(x, (-1, 784)), Y: np.reshape(y, (-1, 2))})
            if step % 50 == 0:
                print("Step: {:>4}, Loss: {:.4f}, Error Rate: {:.4%}".format(step, l, error))
            step += 1
    print("Training finished.")
    l, error, weight_matrix = sess.run([loss, error_rate, W],
                                       {X: data_X_test, Y: data_Y_test})
    print("Testing Loss: {:.4f}, Testing Error Rate: {:.4%}".format(l, error))
    W_value = sess.run(W)

Step:    0, Loss: 4.4679, Error Rate: 57.8125%
Step:   50, Loss: 0.2033, Error Rate: 6.2500%
Step:  100, Loss: 0.9343, Error Rate: 21.8750%
Step:  150, Loss: 0.4969, Error Rate: 18.7500%
Step:  200, Loss: 1.2791, Error Rate: 18.7500%
Step:  250, Loss: 0.9955, Error Rate: 28.1250%
Step:  300, Loss: 1.9551, Error Rate: 35.9375%
Step:  350, Loss: 0.2824, Error Rate: 6.2500%
Step:  400, Loss: 0.4004, Error Rate: 9.3750%
Step:  450, Loss: 0.3505, Error Rate: 7.8125%
Step:  500, Loss: 0.4162, Error Rate: 9.3750%
Step:  550, Loss: 0.9963, Error Rate: 14.0625%
Step:  600, Loss: 0.9964, Error Rate: 20.3125%
Step:  650, Loss: 0.2347, Error Rate: 4.6875%
Step:  700, Loss: 0.3939, Error Rate: 7.8125%
Step:  750, Loss: 0.1811, Error Rate: 6.2500%
Step:  800, Loss: 0.6759, Error Rate: 14.0625%
Step:  850, Loss: 0.3129, Error Rate: 7.8125%
Step:  900, Loss: 0.8652, Error Rate: 21.8750%
Step:  950, Loss: 0.3943, Error Rate: 9.3750%
Step: 1000, Loss: 0.2777, Error Rate: 4.6875%
Step: 1050, Loss: 0.1969

Step: 9050, Loss: 0.1658, Error Rate: 3.1250%
Step: 9100, Loss: 0.0907, Error Rate: 1.5625%
Step: 9150, Loss: 0.0665, Error Rate: 3.1250%
Step: 9200, Loss: 0.1702, Error Rate: 1.5625%
Step: 9250, Loss: 0.0731, Error Rate: 3.1250%
Step: 9300, Loss: 0.0520, Error Rate: 1.5625%
Step: 9350, Loss: 0.4551, Error Rate: 6.2500%
Step: 9400, Loss: 0.1428, Error Rate: 6.8182%
Step: 9450, Loss: 0.5842, Error Rate: 9.3750%
Step: 9500, Loss: 0.0361, Error Rate: 1.5625%
Step: 9550, Loss: 0.0056, Error Rate: 0.0000%
Step: 9600, Loss: 0.0649, Error Rate: 4.6875%
Step: 9650, Loss: 0.0639, Error Rate: 6.2500%
Step: 9700, Loss: 0.2775, Error Rate: 7.8125%
Step: 9750, Loss: 0.0641, Error Rate: 1.5625%
Step: 9800, Loss: 0.0408, Error Rate: 3.1250%
Step: 9850, Loss: 0.0407, Error Rate: 3.1250%
Step: 9900, Loss: 0.1611, Error Rate: 3.1250%
Step: 9950, Loss: 0.2061, Error Rate: 3.1250%
Step: 10000, Loss: 0.2646, Error Rate: 6.2500%
Step: 10050, Loss: 0.0415, Error Rate: 1.5625%
Step: 10100, Loss: 0.0381, Error