# Artificial Intelligence Nanodegree

## Convolutional Neural Networks: Transfer Learning with Keras (source code adapted from Udacity)

---

In this notebook, we use transfer learning to train a CNN to classify pigs.

### 1. Load Pig Dataset

Before running the code cell below, download the dataset of pig images and place it in the repository.

In [1]:
## import the required libraries
import tensorflow as tf
from sklearn.datasets import load_files
from keras.utils import np_utils
import numpy as np
from glob import glob

Using TensorFlow backend.


In [2]:
## GPU memory is misconfigured on my machine, so I have to run this line; use it or skip it depending on your setup
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.7)
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))

In [3]:
# define function to load train, test, and validation datasets
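def load_dataset(path):
    # shuffle=False keeps the files grouped by class folder in sorted order, which
    # matches flow_from_directory(shuffle=False) used for the bottleneck features;
    # the default (shuffle=True) would misalign the labels with those features
    data = load_files(path, shuffle=False)
    pig_files = np.array(data['filenames'])
    pig_targets = np_utils.to_categorical(np.array(data['target']), 30)
    return pig_files, pig_targets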

# load train, test, and validation datasets
train_files, train_targets = load_dataset('pigImages/train')
valid_files, valid_targets = load_dataset('pigImages/valid')
test_files, test_targets = load_dataset('pigImages/test')

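# load list of pig names, sorted to match the class indices assigned by load_files
# (the slice offset 25 depends on the folder naming; adjust it for your dataset)
pig_names = [item[25:-1] for item in sorted(glob('pigImages/train/*/'))]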

# print statistics about the dataset
print('There are %d total pig categories.' % len(pig_names))
print('There are %s total pig images.\n' % str(len(train_files) + len(valid_files) + len(test_files)))
print('There are %d training pig images.' % len(train_files))
print('There are %d validation pig images.' % len(valid_files))
print('There are %d test pig images.'% len(test_files))


There are 30 total pig categories.
There are 2056 total pig images.

There are 1694 training pig images.
There are 362 validation pig images.
There are 0 test pig images.
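
The `np_utils.to_categorical` call in `load_dataset` turns integer class indices into one-hot rows. A minimal NumPy sketch of the same idea, assuming integer labels in `[0, num_classes)`; `one_hot` is a hypothetical helper for illustration, not part of Keras:

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) array with a single 1 per row."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.size, num_classes), dtype=np.float32)
    # Put a 1 in the column given by each label
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

targets = one_hot([0, 2, 1], num_classes=3)
print(targets)
# np.argmax along axis 1 recovers the original integer labels
print(np.argmax(targets, axis=1))
```

The same `np.argmax(..., axis=1)` step is what the test-accuracy cell uses to turn one-hot targets back into class indices.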


### 2. Visualize the First 12 Training Images and Save the Dataset

In [None]:
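# Xception constructor with its default arguments, shown for reference
import keras.applications.xception
keras.applications.xception.Xception(include_top=True, weights='imagenet',
                                    input_tensor=None, input_shape=None,
                                    pooling=None, classes=1000)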

### 3. Obtain the Pre-trained Model Bottleneck Features (download the model pre-trained with TensorFlow)
Before running the code cell below, download the pretrained npz file and place it in the `bottleneck_features/` folder. See GitHub for reference.

In [None]:
# Xception replaces the VGG16 network used in the original exercise
from keras.applications.xception import Xception
# build the convolutional base without the classification head
model = Xception(include_top=False, weights='imagenet')

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D
from keras.layers import Activation, Dropout, Flatten, Dense
from keras.preprocessing.image import ImageDataGenerator
import numpy as np

batch_size = 16
datagen = ImageDataGenerator(rescale=1./255)

# training-set image generator
generator1 = datagen.flow_from_directory(
        "E:/DeepLearning/PigRecog/TransferLearn/pigImages/train",
        target_size=(299, 299),
        batch_size=batch_size,
        class_mode=None,
        shuffle=False)
# validation-set image generator
generator2 = datagen.flow_from_directory(
        "E:/DeepLearning/PigRecog/TransferLearn/pigImages/valid",
        target_size=(299, 299),
        batch_size=batch_size,
        class_mode=None,
        shuffle=False)
# test-set image generator
generator3 = datagen.flow_from_directory(
        "E:/DeepLearning/PigRecog/TransferLearn/pigImages/test",
        target_size=(299, 299),
        batch_size=batch_size,
        class_mode=None,
        shuffle=False)

# (2) load the pre-trained weights
# (weights='imagenet' above already loads them; this reloads them from the local cache)
model.load_weights("C:/Users/Administrator/.keras/models/xception_weights_tf_dim_ordering_tf_kernels_notop.h5")

# (3) compute the bottleneck features
# Key point: steps is the number of batches drawn from the generator, not the number
# of images. It must cover every image exactly once, i.e. ceil(samples / batch_size);
# a larger value makes the generator wrap around and return more rows than labels.
steps_train = int(np.ceil(generator1.samples / float(batch_size)))
bottleneck_features_train = model.predict_generator(generator1, steps_train)
np.save('bottleneck_features_train.npy', bottleneck_features_train)

# validation set
steps_valid = int(np.ceil(generator2.samples / float(batch_size)))
bottleneck_features_validation = model.predict_generator(generator2, steps_valid)
np.save('bottleneck_features_validation.npy', bottleneck_features_validation)

# test set (the dataset summary above reports 0 test images; add images to
# pigImages/test before running this part)
steps_test = int(np.ceil(generator3.samples / float(batch_size)))
bottleneck_features_test = model.predict_generator(generator3, steps_test)
np.save('bottleneck_features_test.npy', bottleneck_features_test)
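
`predict_generator` takes a number of batches (`steps`), not a number of images, so the value has to be derived from the dataset size and the batch size. A small sketch of that arithmetic, using the image counts printed in section 1 and the batch size of 16 from the generators; `steps_for` is a hypothetical helper name:

```python
import math

def steps_for(num_samples, batch_size):
    """Number of batches needed to see every sample exactly once."""
    return int(math.ceil(num_samples / float(batch_size)))

train_steps = steps_for(1694, 16)  # training images reported in section 1
valid_steps = steps_for(362, 16)   # validation images reported in section 1
print(train_steps, valid_steps)    # 106 23
```

With more steps than this, the generator starts over from the first image and the feature array ends up with more rows than there are labels.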

In [None]:
# Change the file names as needed; the bottleneck features must be recomputed and saved first.
# The *_vgg16 variable names are kept from the original exercise; the features come from Xception.
train_vgg16 = np.load('E:/DeepLearning/PigRecog/TransferLearn/bottleneck_features_train.npy')
valid_vgg16 = np.load('E:/DeepLearning/PigRecog/TransferLearn/bottleneck_features_validation.npy')
test_vgg16 = np.load('E:/DeepLearning/PigRecog/TransferLearn/bottleneck_features_test.npy')
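
Before training on loaded features, it can help to confirm that the saved array round-trips intact and has one row per label. A minimal sketch with random stand-in data; the array contents, the temporary file name, and the four example labels are all assumptions made for illustration:

```python
import os
import tempfile
import numpy as np

# Stand-in for real bottleneck features: 4 samples of shape (10, 10, 2048)
features = np.random.rand(4, 10, 10, 2048).astype(np.float32)
# Stand-in one-hot labels for 4 samples and 30 classes
targets = np.eye(30, dtype=np.float32)[[0, 5, 5, 29]]

path = os.path.join(tempfile.mkdtemp(), 'bottleneck_features_demo.npy')
np.save(path, features)  # np.save accepts a path and closes the file itself
loaded = np.load(path)

# Features and labels must describe the same samples, in the same order
assert loaded.shape[0] == targets.shape[0], 'feature/label count mismatch'
print(loaded.shape, targets.shape)
```

A row-count check like this catches the case where the features were generated with too many `steps`; it cannot detect a wrong ordering, which depends on how the files were listed.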

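### 5. Define the Network Architecture and Inspect the Model
First import the layers you need with `from keras.layers import ...`, then stack them with `model.add`.
See the Keras documentation for details: https://keras.io/getting-started/sequential-model-guide/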

In [None]:
from keras.layers import Dense, Flatten
from keras.models import Sequential
from keras.layers import GlobalAveragePooling2D
from keras.layers import Conv2D, MaxPooling2D

model = Sequential()
#model.add(Conv2D(512, (3, 3), activation='relu', input_shape=(10, 10, 2048)))
#model.add(MaxPooling2D(pool_size=(2, 2)))
#model.add(GlobalAveragePooling2D())
model.add(Flatten(input_shape=(10, 10, 2048)))
#model.add(Conv2D(64, (3, 3), activation='relu'))
#model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dense(2048, activation='relu'))
model.add(Dense(1024, activation='relu'))
model.add(Dense(30, activation='softmax'))
model.summary()
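
The size of this head is dominated by the first `Dense` layer, because `Flatten` turns each `(10, 10, 2048)` feature map into 204800 inputs. The following sketch counts trainable parameters by hand and compares `Flatten` with the commented-out `GlobalAveragePooling2D` alternative, which would feed 2048 inputs instead; `dense_params` and `head_params` are hypothetical helpers:

```python
def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out

def head_params(n_features, hidden=(2048, 1024), n_classes=30):
    """Total trainable parameters of a Dense stack on top of n_features inputs."""
    total, n_in = 0, n_features
    for n_out in hidden + (n_classes,):
        total += dense_params(n_in, n_out)
        n_in = n_out
    return total

flatten_head = head_params(10 * 10 * 2048)  # Flatten: 204800 inputs
gap_head = head_params(2048)                # GlobalAveragePooling2D: 2048 inputs
print(flatten_head)  # 421561374
print(gap_head)      # 6325278
```

The roughly 420 million parameters of the `Flatten` variant are worth weighing against the 1694 training images; the pooled variant is about 66 times smaller.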

### 6. Compile the Model

In [None]:
model.compile(loss='categorical_crossentropy', optimizer='rmsprop', 
                  metrics=['accuracy'])
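
To make the `categorical_crossentropy` loss and the `accuracy` metric concrete, here is a NumPy sketch of both on a two-sample toy batch. It is an illustration of the formulas under the assumption of one-hot targets and softmax outputs, not the Keras implementation; the clipping constant `eps` is likewise an assumption:

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over samples of -sum(y_true * log(y_pred))."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

logits = np.array([[2.0, 0.5, 0.1], [0.1, 0.2, 3.0]])
y_true = np.array([[1, 0, 0], [0, 0, 1]], dtype=float)

probs = softmax(logits)
loss = categorical_crossentropy(y_true, probs)
accuracy = float(np.mean(np.argmax(probs, axis=1) == np.argmax(y_true, axis=1)))
print(loss, accuracy)
```

A model that predicts a uniform distribution over 30 classes would score a loss of `log(30)`, about 3.4, which is a useful baseline when reading the first epochs of the training log.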

### 7. Train the Model (checkpoints are saved automatically; a later run can resume from the saved weights)

In [None]:
from keras.callbacks import ModelCheckpoint 

# train the model
checkpointer = ModelCheckpoint(filepath='dogvgg16.weights.best.hdf5', verbose=1, 
                               save_best_only=True)
model.fit(train_vgg16, train_targets, epochs=100, validation_data=(valid_vgg16, valid_targets), 
          callbacks=[checkpointer], verbose=1, shuffle=True)

### 8. Load the Best Saved Model

In [None]:
# load the weights with the lowest validation loss (ModelCheckpoint monitors val_loss by default)
model.load_weights('dogvgg16.weights.best.hdf5')

### 9. Calculate Classification Accuracy on Test Set

In [None]:
# get index of predicted pig class for each image in test set
vgg16_predictions = [np.argmax(model.predict(np.expand_dims(feature, axis=0))) 
                     for feature in test_vgg16]

# report test accuracy
test_accuracy = 100*np.sum(np.array(vgg16_predictions)==
                           np.argmax(test_targets, axis=1))/len(vgg16_predictions)
print('\nTest accuracy: %.4f%%' % test_accuracy)
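
The accuracy computation divides by the number of test predictions, which fails when the test set is empty, as in the dataset summary of section 1 (0 test images). A vectorized sketch with a guard for that case, using made-up probabilities and labels; `accuracy_percent` is a hypothetical helper:

```python
import numpy as np

def accuracy_percent(probabilities, one_hot_targets):
    """Percentage of rows whose arg-max prediction matches the one-hot label."""
    if len(probabilities) == 0:
        return float('nan')  # no test samples: accuracy is undefined
    predicted = np.argmax(probabilities, axis=1)
    expected = np.argmax(one_hot_targets, axis=1)
    return 100.0 * float(np.mean(predicted == expected))

# Four made-up predictions over three classes; the last one is wrong
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.1, 0.8],
                  [0.3, 0.6, 0.1],
                  [0.5, 0.4, 0.1]])
labels = np.eye(3)[[0, 2, 1, 1]]
print('Test accuracy: %.4f%%' % accuracy_percent(probs, labels))  # 75.0000%
```

With a Keras model, `probabilities` could come from a single `model.predict(test_vgg16)` call on the whole array instead of one call per feature.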