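## Checking the Python version
- This notebook was run with Python 3.6 (the cell output below reports 3.6.4).
- All libraries used later are imported here in one place, to avoid shadowed names and namespace pollution.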

In [1]:
import sys
import os
import shutil
import keras
import pandas as pd

sys.version

Using TensorFlow backend.


'3.6.4 |Anaconda, Inc.| (default, Jan 16 2018, 18:10:19) \n[GCC 7.2.0]'
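Since the results depend on the interpreter and library versions, it can help to fail fast when the interpreter is older than expected. A minimal sketch, assuming a hypothetical helper `check_python` and a minimum version of 3.6 (neither is part of the original notebook):

```python
import sys


def check_python(min_version=(3, 6), version_info=None):
    """Return True if the interpreter is at least min_version.

    version_info can be passed in for testing; by default the running
    interpreter's sys.version_info is used.
    """
    info = sys.version_info if version_info is None else version_info
    return tuple(info[:2]) >= tuple(min_version)


# Stop early instead of failing later with a less obvious error.
if not check_python():
    raise RuntimeError(
        "This notebook expects Python 3.6 or newer, found %s" % sys.version.split()[0]
    )
```

Comparing only the major and minor numbers keeps the check tolerant of patch releases such as 3.6.4 and 3.6.5.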

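## Data preprocessing
- The downloaded dataset is in the data directory.
- Unsuitable images in the training set are filtered out; this is explained and implemented in pick_bad_pics.ipynb.
- After the dataset is extracted, the images are sorted into the target directories (the directory layout is described in readme.md).
- This notebook does not repeat the description of the preprocessing.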

In [2]:
# Data preprocessing
def pretreat_data():

    # Collect the cat and dog file names
    all_cats = [file for file in os.listdir("data/train") if 'cat' in file]
    all_dogs = [file for file in os.listdir("data/train") if 'dog' in file]
    print(len(all_cats), len(all_dogs))

    # Hold out 20% of each class as the validation set
    val_num = int(len(all_dogs) * 0.2)
    val_cat = all_cats[-val_num:]
    val_dog = all_dogs[-val_num:]
    train_cat = all_cats[:-val_num]
    train_dog = all_dogs[:-val_num]
    print(len(val_cat), len(val_dog), len(train_cat), len(train_dog))

    # Create the required directories and populate them
    if not os.path.isdir("data/gen_train"):
        os.mkdir("data/gen_train")
        os.mkdir("data/gen_train/dog")
        os.mkdir("data/gen_train/cat")
        for file in train_dog:
            shutil.copyfile('data/train/%s'%file, "data/gen_train/dog/%s"%file)
        for file in train_cat:
            shutil.copyfile('data/train/%s'%file, "data/gen_train/cat/%s"%file)
        print("done gen_train")

    if not os.path.isdir("data/gen_val"):
        os.mkdir("data/gen_val")
        os.mkdir("data/gen_val/dog")
        os.mkdir("data/gen_val/cat")
        for file in val_dog:
            shutil.copyfile('data/train/%s'%file, "data/gen_val/dog/%s"%file)
        for file in val_cat:
            shutil.copyfile('data/train/%s'%file, "data/gen_val/cat/%s"%file)
        print("done gen_val")
        
    if not os.path.isdir("data/gen_test"):
        os.mkdir("data/gen_test")
        os.mkdir("data/gen_test/mixed")
        for file in os.listdir("data/test"):
            shutil.copyfile('data/test/%s'%file, "data/gen_test/mixed/%s"%file)
        print("done gen_test")


# Run the preprocessing
pretreat_data()

12500 12500
2500 2500 10000 10000
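`os.listdir` returns names in arbitrary order, so the 80/20 split above can differ between machines. A minimal sketch of a reproducible split, assuming a hypothetical helper `split_train_val` and made-up file names in the style of data/train (the notebook itself slices the unsorted lists directly):

```python
import random


def split_train_val(files, val_fraction=0.2, seed=0):
    """Return (train, val) lists from a seeded, order-independent shuffle."""
    ordered = sorted(files)               # remove dependence on os.listdir order
    random.Random(seed).shuffle(ordered)  # seeded, so reruns give the same split
    val_num = int(len(ordered) * val_fraction)
    if val_num == 0:
        # Guard: ordered[-0:] would return the whole list, not an empty one.
        return ordered, []
    return ordered[:-val_num], ordered[-val_num:]


# Hypothetical file names, for illustration only.
cats = ["cat.%d.jpg" % i for i in range(10)]
train_cat, val_cat = split_train_val(cats)
print(len(train_cat), len(val_cat))
```

Sorting before the seeded shuffle means the same files land in the validation set regardless of how the filesystem lists them.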


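## Model comparison, selection, and validation
- We write functions that train each candidate pretrained model and run prediction with it, one model at a time. Based on how each performs, we pick the pretrained model to use in the end.
- The models compared and validated are InceptionV3, Xception, and Inception ResnetV2.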

In [3]:
# Pretrained model selection

# Build the model
def buid_model(pre_model, image_size, pre_input, name):

    # Load the base model without its top fully connected layers
    input_tensor = keras.Input(shape=(image_size[0], image_size[1], 3))
    if pre_input:
        input_tensor = keras.layers.Lambda(pre_input)(input_tensor)
    base_model = pre_model(input_tensor=input_tensor, include_top=False)

    # Freeze the base model so its pretrained weights are not updated
    for layer in base_model.layers:
        layer.trainable = False

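    # Apply global average pooling over the spatial dimensions, add dropout against overfitting, and rebuild the fully connected output layer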
    tmp = keras.layers.GlobalAveragePooling2D()(base_model.output)
    tmp = keras.layers.Dropout(0.5)(tmp)
    tmp = keras.layers.Dense(1, activation='sigmoid', kernel_initializer='he_normal')(tmp)

    # Assemble the model
    model_obj = keras.models.Model(inputs=base_model.input, outputs=tmp)

    # Report the layer count
    print('%s has %d layers.' % (name, len(model_obj.layers)))

    # Fine-tuning: unfreeze the last five layers
    for layer in model_obj.layers[-5:]:
        print("layer name = %s" % layer.name)
        layer.trainable = True

    # Compile once, after the trainable flags are final, so that they take effect
    #model_obj.compile(optimizer='adadelta', loss='binary_crossentropy', metrics=['accuracy'])
    model_obj.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    return model_obj


# Train, predict on the test set, and save the results
def do_predict(model_obj, image_size, name):

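    # Define the image generators.
    # Note: shuffle=False makes the training generator yield images in class order;
    # shuffle=True is the usual choice for the training set.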
    gen = keras.preprocessing.image.ImageDataGenerator()
    train_generator = gen.flow_from_directory("./data/gen_train", image_size, shuffle=False, class_mode="binary")
    val_generator = gen.flow_from_directory("./data/gen_val", image_size, shuffle=False, class_mode="binary")
    test_generator = gen.flow_from_directory("./data/gen_test", image_size, shuffle=False, class_mode=None)
    print(len(train_generator), len(val_generator), len(test_generator))

    # Train
    check_pt = keras.callbacks.ModelCheckpoint('%s_{epoch:02d}_{val_loss:.4f}.hdf5' % name
        , monitor='val_loss', verbose=1, save_best_only=False, save_weights_only=False, period=1)
    early_stop = keras.callbacks.EarlyStopping(monitor='val_loss', min_delta=0.002
        , patience=5, verbose=1,mode='auto')  
    model_obj.fit_generator(train_generator, len(train_generator), epochs=10, verbose=1
        , validation_data=val_generator, validation_steps=len(val_generator), callbacks=[check_pt, early_stop])
 
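    # Predict on the test set and clip the probabilities
    pred = model_obj.predict_generator(test_generator, verbose=1)
    pred = pred.clip(min=0.005, max=0.995)

    # Map each test file name (for example mixed/12.jpg) to its row in the submission
    df = pd.read_csv("sample_submission.csv")
    for i, fname in enumerate(test_generator.filenames):
        index = int(os.path.splitext(os.path.basename(fname))[0])
        # DataFrame.set_value is deprecated; .at sets a single cell by label
        df.at[index-1, 'label'] = pred[i, 0]

    # Save the results and show the first rows
    df.to_csv('submission_%s.csv' % name, index=False)
    print(df.head(20))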
    
# Build, train, and evaluate one candidate model
def select_best_model(pre_model, image_size, pre_input, name):

    # Build the model
    model_obj = buid_model(pre_model, image_size, pre_input, name)

    # Train and predict
    do_predict(model_obj, image_size, name)
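`do_predict` clips the predictions to [0.005, 0.995]. The notebook does not say why; a likely reason (an assumption here) is that the submission is scored with log loss, which penalizes confident wrong answers very heavily. A minimal sketch with hypothetical helper names and a made-up prediction value:

```python
import math


def log_loss_single(y_true, p):
    """Binary cross-entropy for one example; p is the predicted P(label == 1)."""
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))


def clip(p, low=0.005, high=0.995):
    """Same bounds as pred.clip(min=0.005, max=0.995) in do_predict."""
    return min(max(p, low), high)


raw = 0.9999999  # hypothetical over-confident prediction

# Wrong answer (true label 0): clipping caps the penalty.
print(log_loss_single(0, raw), log_loss_single(0, clip(raw)))

# Right answer (true label 1): clipping costs only a small fixed amount.
print(log_loss_single(1, raw), log_loss_single(1, clip(raw)))
```

The trade-off is a small, bounded loss on every correct confident prediction in exchange for a bounded loss on every wrong one.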

## Model: InceptionV3
Pretrained model: InceptionV3

In [4]:
select_best_model(keras.applications.inception_v3.InceptionV3, (299, 299)
                  , keras.applications.inception_v3.preprocess_input, "InceptionV3")

InceptionV3 has 315 layers.
Found 20000 images belonging to 2 classes.
Found 5000 images belonging to 2 classes.
Found 12500 images belonging to 1 classes.
625 157 391
Epoch 1/10

Epoch 00001: saving model to InceptionV3_01_0.6262.hdf5
Epoch 2/10

Epoch 00002: saving model to InceptionV3_02_0.6687.hdf5
Epoch 3/10

Epoch 00003: saving model to InceptionV3_03_0.6350.hdf5
Epoch 4/10

Epoch 00004: saving model to InceptionV3_04_0.6929.hdf5
Epoch 5/10

Epoch 00005: saving model to InceptionV3_05_0.7025.hdf5
Epoch 6/10

Epoch 00006: saving model to InceptionV3_06_0.5832.hdf5
Epoch 7/10

Epoch 00007: saving model to InceptionV3_07_0.9229.hdf5
Epoch 8/10

Epoch 00008: saving model to InceptionV3_08_1.0480.hdf5
Epoch 9/10

Epoch 00009: saving model to InceptionV3_09_0.8218.hdf5
Epoch 10/10

Epoch 00010: saving model to InceptionV3_10_0.6537.hdf5
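The checkpoint file names embed the epoch and the validation loss, so the best epoch can be read straight from the log above. A minimal sketch, assuming a hypothetical helper `best_checkpoint` and the `name_epoch_valloss.hdf5` naming template used by the `ModelCheckpoint` callback in `do_predict`:

```python
import re

# Matches names produced by the '%s_{epoch:02d}_{val_loss:.4f}.hdf5' template.
CKPT_RE = re.compile(r"^(?P<name>.+)_(?P<epoch>\d+)_(?P<val_loss>\d+\.\d+)\.hdf5$")


def best_checkpoint(filenames):
    """Return (filename, epoch, val_loss) for the lowest val_loss, or None."""
    best = None
    for fname in filenames:
        match = CKPT_RE.match(fname)
        if not match:
            continue  # skip files that are not checkpoints
        val_loss = float(match.group("val_loss"))
        if best is None or val_loss < best[2]:
            best = (fname, int(match.group("epoch")), val_loss)
    return best


# File names copied from the training log above.
logged = [
    "InceptionV3_01_0.6262.hdf5", "InceptionV3_02_0.6687.hdf5",
    "InceptionV3_03_0.6350.hdf5", "InceptionV3_04_0.6929.hdf5",
    "InceptionV3_05_0.7025.hdf5", "InceptionV3_06_0.5832.hdf5",
    "InceptionV3_07_0.9229.hdf5", "InceptionV3_08_1.0480.hdf5",
    "InceptionV3_09_0.8218.hdf5", "InceptionV3_10_0.6537.hdf5",
]
print(best_checkpoint(logged))  # ('InceptionV3_06_0.5832.hdf5', 6, 0.5832)
```

In this run the validation loss fluctuates between 0.5832 and 1.0480 without a steady downward trend, and epoch 6 has the lowest value.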




## Model: Xception
Pretrained model: Xception

In [5]:
select_best_model(keras.applications.xception.Xception, (299, 299)
                  , keras.applications.xception.preprocess_input, "Xception")

Xception has 136 layers.
Found 20000 images belonging to 2 classes.
Found 5000 images belonging to 2 classes.
Found 12500 images belonging to 1 classes.
625 157 391
Epoch 1/10

KeyboardInterrupt: 
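The line `625 157 391` printed before training in the runs above is the number of batches each generator yields per pass. A minimal sketch of the arithmetic, assuming the Keras default `batch_size` of 32 for `flow_from_directory` (the notebook does not set it) and a hypothetical helper `num_batches`:

```python
import math


def num_batches(num_images, batch_size=32):
    """Batches needed to cover num_images; the last batch may be partial."""
    return math.ceil(num_images / batch_size)


# Image counts reported by flow_from_directory: train, validation, test.
counts = [num_batches(n) for n in (20000, 5000, 12500)]
print(counts)  # [625, 157, 391]
```

These are the values passed as the steps per epoch and `validation_steps` through `len(train_generator)` and `len(val_generator)`.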

## Model: Inception ResnetV2
Pretrained model: Inception ResnetV2

In [None]:
select_best_model(keras.applications.inception_resnet_v2.InceptionResNetV2, (299, 299)
                  , keras.applications.inception_resnet_v2.preprocess_input, "Inception ResnetV2")