## CIFAR-10数据集介绍

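The CIFAR-10 dataset contains 60,000 32×32 color images in 10 classes (6,000 per class). It was collected by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton, and is split into 50,000 training images and 10,000 test images.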

[CIFAR-10 dataset](http://www.cs.toronto.edu/~kriz/cifar.html)

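In the Python version of the dataset, each batch file stores its data as a 10000 × 3072 numpy array of uint8 values. Each row of 3072 values holds one 32 × 32 color image (3072 = 1024 × 3): the first 1024 entries are the red channel values, the next 1024 the green, and the last 1024 the blue, with each channel stored in row-major order.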
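To make this channel-major layout concrete, here is a small numpy sketch that turns one 3072-value row into a 32 × 32 × 3 image. The row is a synthetic stand-in (np.arange) rather than real pixel data, so each position is easy to trace; the (row, column, channel) result matches the per-image shape that X_train has after cifar10.load_data() below.

```python
import numpy as np

# Synthetic stand-in for one dataset row: values 0..3071 so positions are traceable.
# (Real rows hold uint8 pixel intensities in [0, 255].)
row = np.arange(3072)

# 1024 red values, then 1024 green, then 1024 blue; each channel is a
# 32x32 image stored row by row.
img_chw = row.reshape(3, 32, 32)       # (channel, row, col)
img_hwc = img_chw.transpose(1, 2, 0)   # (row, col, channel), "channels last"

# Pixel (r, c) of channel ch comes from row[ch * 1024 + r * 32 + c].
print(img_hwc.shape)  # (32, 32, 3)
```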

![](http://img.blog.csdn.net/20150312153659274?watermark/2/text/aHR0cDovL2Jsb2cuY3Nkbi5uZXQvbHlubmFuZHdlaQ==/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70/gravity/Center)

## Program notes
Date: 2018-01-25  
Description: a program that classifies CIFAR-10 images with a convolutional network, using data augmentation.  
Dataset: CIFAR-10

## Import Keras modules

In [1]:
from __future__ import print_function
from keras.datasets import cifar10
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Convolution2D, MaxPooling2D
from keras.optimizers import SGD
from keras.utils import np_utils

Using TensorFlow backend.


## Initialize variables

In [2]:
batch_size = 32
nb_classes = 10
epochs = 20
data_augmentation = True

# input image dimensions
img_rows, img_cols = 32, 32
# CIFAR-10 images are RGB
img_channels = 3

## Prepare the data

In [3]:
# the data, shuffled and split between train and test sets
(X_train, y_train), (X_test, y_test) = cifar10.load_data()

Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz


## Scale pixel values to the range 0-1

In [13]:
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255
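The cast to float32 comes before the division for a reason: an in-place /= on the original uint8 arrays fails, because numpy will not write a float result back into an integer array. A minimal sketch on a hypothetical 3-pixel array standing in for X_train:

```python
import numpy as np

# Hypothetical uint8 "image" standing in for X_train.
x = np.array([[0, 128, 255]], dtype=np.uint8)

# In-place true division on uint8 raises: the float64 result cannot be
# cast back into the uint8 array (numpy's UFuncTypeError subclasses TypeError).
try:
    x /= 255
except TypeError as err:
    print("uint8 in-place division failed:", type(err).__name__)

# Cast first, then divide: values now lie in [0, 1].
x = x.astype('float32')
x /= 255
print(x.dtype)  # float32
```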

In [4]:
X_train.shape

(50000, 32, 32, 3)

## Convert class labels to one-hot vectors

In [5]:
# convert class vectors to binary class matrices
Y_train = np_utils.to_categorical(y_train, nb_classes)
Y_test = np_utils.to_categorical(y_test, nb_classes)
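to_categorical turns each integer label into a one-hot row of length nb_classes. As a sketch of what that encoding means (not Keras's own code), the same result can be built with numpy by indexing an identity matrix. cifar10.load_data() returns labels as an (N, 1) column, hence the ravel(); the three labels below are hypothetical.

```python
import numpy as np

nb_classes = 10
# Labels shaped like y_train: an (N, 1) column of class indices (hypothetical values).
y = np.array([[6], [9], [0]])

# Row i of the identity matrix has a single 1 in column i,
# so indexing with the labels yields one one-hot row per sample.
Y = np.eye(nb_classes, dtype='float32')[y.ravel()]

print(Y.shape)           # (3, 10)
print(Y.argmax(axis=1))  # [6 9 0]
```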

## Build the model

## Using Sequential()

In [10]:
model = Sequential()
model.add(Convolution2D(32, (3, 3), padding='same', input_shape=X_train.shape[1:]))
model.add(Activation('relu'))
model.add(Convolution2D(32, (3, 3)))  # 'valid' padding (default): 32x32 -> 30x30
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.2))

model.add(Convolution2D(64, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(Convolution2D(64, (3, 3)))  # 'valid' padding (default): 15x15 -> 13x13
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.25))

model.add(Flatten())
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(nb_classes))
model.add(Activation('softmax'))
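The last two layers, Dense(nb_classes) followed by softmax, turn 10 raw scores into a probability distribution, which the categorical_crossentropy loss chosen at compile time compares with the one-hot labels. A numpy sketch of both functions with hypothetical 3-class logits; the clipping constant eps mimics the small epsilon Keras uses, but its exact value here is an assumption.

```python
import numpy as np

def softmax(z):
    # Subtracting the row max avoids overflow in exp and does not change the result.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # Batch mean of -sum_k y_k * log(p_k); clipping keeps log() finite.
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=-1)))

logits = np.array([[2.0, 1.0, 0.1]])   # hypothetical Dense outputs, 3 classes
probs = softmax(logits)                # sums to 1 along the last axis
y_true = np.array([[1.0, 0.0, 0.0]])   # one-hot label: class 0
loss = categorical_crossentropy(y_true, probs)  # equals -log(probs[0, 0])
print(probs, loss)
```

With a one-hot label, the loss reduces to minus the log-probability the model assigns to the correct class.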



## Print the model summary

In [11]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_7 (Conv2D)            (None, 32, 32, 32)        896       
_________________________________________________________________
activation_7 (Activation)    (None, 32, 32, 32)        0         
_________________________________________________________________
conv2d_8 (Conv2D)            (None, 30, 30, 32)        9248      
_________________________________________________________________
activation_8 (Activation)    (None, 30, 30, 32)        0         
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 15, 15, 32)        0         
_________________________________________________________________
dropout_4 (Dropout)          (None, 15, 15, 32)        0         
_________________________________________________________________
conv2d_9 (Conv2D)            (None, 15, 15, 64)        18496     
__________
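The shapes and parameter counts in the summary follow from two rules: a stride-1 k × k convolution with 'valid' padding shrinks each spatial side by k - 1 (while 'same' keeps it), and it has k·k·C_in·filters weights plus one bias per filter. The pure-Python sketch below replays the model's layer list (Activation and Dropout are omitted, since they add no parameters and keep the shape) and reproduces the rows shown above (896, 9248, 18496). The remaining rows it prints are computed from the code, not copied from the truncated output.

```python
def conv2d(shape, filters, k=3, padding='same'):
    """Output shape and parameter count of a stride-1 k x k Conv2D."""
    h, w, c = shape
    if padding == 'valid':               # no padding: each side shrinks by k - 1
        h, w = h - k + 1, w - k + 1
    params = k * k * c * filters + filters   # kernel weights + one bias per filter
    return (h, w, filters), params

def max_pool(shape, p=2):
    """p x p max pooling with stride p: floor-divides height and width."""
    h, w, c = shape
    return (h // p, w // p, c), 0

def dense(n_in, n_out):
    """Parameter count of a Dense layer: weight matrix + bias vector."""
    return n_in * n_out + n_out

shape, report = (32, 32, 3), []
for name, fn, kwargs in [
    ('conv2d (same)',  conv2d,   {'filters': 32, 'padding': 'same'}),
    ('conv2d (valid)', conv2d,   {'filters': 32, 'padding': 'valid'}),
    ('max_pooling2d',  max_pool, {}),
    ('conv2d (same)',  conv2d,   {'filters': 64, 'padding': 'same'}),
    ('conv2d (valid)', conv2d,   {'filters': 64, 'padding': 'valid'}),
    ('max_pooling2d',  max_pool, {}),
]:
    shape, params = fn(shape, **kwargs)
    report.append((name, shape, params))

flat = shape[0] * shape[1] * shape[2]    # size after Flatten
report.append(('dense', (512,), dense(flat, 512)))
report.append(('dense', (10,), dense(512, 10)))

for name, shp, params in report:
    print(f'{name:16s} {str(shp):16s} {params}')
print('total params:', sum(p for _, _, p in report))
```

Almost all parameters sit in the first Dense layer, whose input is the flattened feature map.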

## Training and evaluation

## Compile the model

In [12]:
# let's train the model using SGD + momentum
sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy',
             optimizer=sgd,
             metrics=['accuracy'])
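SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True) combines time-based learning-rate decay with Nesterov momentum. The sketch below applies that update rule, as Keras 2's SGD is understood to implement it, to the toy function f(x) = x², whose minimum is at 0. Treat it as an assumption-based illustration: the exact indexing of the step counter t in Keras may differ by one, and sgd_nesterov is a hypothetical helper.

```python
def sgd_nesterov(grad, x0, lr=0.01, decay=1e-6, momentum=0.9, steps=500):
    """Minimize a 1-D function with Nesterov-momentum SGD and time-based decay.

    Per step t (assumed to mirror Keras 2's SGD):
      lr_t = lr / (1 + decay * t)
      v    = momentum * v - lr_t * g
      x    = x + momentum * v - lr_t * g      # Nesterov look-ahead form
    """
    x, v = x0, 0.0
    for t in range(steps):
        lr_t = lr / (1.0 + decay * t)   # learning rate shrinks slowly with t
        g = grad(x)
        v = momentum * v - lr_t * g     # velocity update
        x = x + momentum * v - lr_t * g
    return x

# f(x) = x**2 has gradient 2x and its minimum at x = 0.
x_min = sgd_nesterov(lambda x: 2 * x, x0=5.0)
print(x_min)
```

With decay=1e-6 the learning rate barely changes over a few hundred steps; in a real run the counter grows once per batch, so the decay only matters over many epochs.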

## Data augmentation

Data augmentation is done with the ImageDataGenerator class.
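What width_shift_range=0.1, height_shift_range=0.1 and horizontal_flip=True do to a single image can be sketched in numpy. This is a simplified stand-in, not Keras's implementation: ImageDataGenerator applies shifts through an affine transform, while this hypothetical augment() rounds shifts to whole pixels. Both fill uncovered border pixels from the nearest edge (Keras's default fill_mode is 'nearest').

```python
import numpy as np

def augment(img, rng, max_shift=0.1):
    """Random horizontal flip + random translation of one H x W x C image.

    Hypothetical simplification of ImageDataGenerator(width_shift_range=0.1,
    height_shift_range=0.1, horizontal_flip=True).
    """
    h, w = img.shape[:2]
    if rng.random() < 0.5:                     # flip left-right with probability 0.5
        img = img[:, ::-1]
    dy = int(round(rng.uniform(-max_shift, max_shift) * h))  # vertical shift in pixels
    dx = int(round(rng.uniform(-max_shift, max_shift) * w))  # horizontal shift in pixels
    # Pad by repeating edge pixels, then crop an H x W window offset by (dy, dx).
    pad = ((abs(dy), abs(dy)), (abs(dx), abs(dx)), (0, 0))
    padded = np.pad(img, pad, mode='edge')
    top, left = abs(dy) - dy, abs(dx) - dx
    return padded[top:top + h, left:left + w]

rng = np.random.default_rng(0)
img = np.random.default_rng(1).random((32, 32, 3)).astype('float32')
out = augment(img, rng)
print(out.shape)  # (32, 32, 3)
```

Because the transform is drawn fresh for every batch, the network rarely sees exactly the same image twice, which is the point of "real-time" augmentation.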

In [14]:
if not data_augmentation:
    print('Not using data augmentation.')
    model.fit(X_train, Y_train,
             batch_size=batch_size, 
             epochs=epochs,
             validation_data=(X_test, Y_test),
             shuffle=True)
else:
    print("Using real-time data augmentation")
    
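    # Preprocessing and real-time data augmentation
    datagen = ImageDataGenerator(
        featurewise_center=False,  # set the input mean to 0 over the dataset
        samplewise_center=False,  # set each sample's mean to 0
        featurewise_std_normalization=False,  # divide inputs by the dataset std
        samplewise_std_normalization=False,  # divide each input by its own std
        zca_whitening=False,  # apply ZCA whitening
        rotation_range=0,  # no random rotation
        width_shift_range=0.1,  # random horizontal shift, up to 10% of the width
        height_shift_range=0.1,  # random vertical shift, up to 10% of the height
        horizontal_flip=True,  # random left-right flips
        vertical_flip=False  # no up-down flips
    )

    # Computes dataset statistics; only used by the featurewise/ZCA options above
    datagen.fit(X_train)
    model.fit_generator(datagen.flow(X_train, Y_train,
                                     batch_size=batch_size),
                        steps_per_epoch=X_train.shape[0] // batch_size,  # batches per epoch
                        epochs=epochs,
                        validation_data=(X_test, Y_test))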

Using real-time data augmentation




Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20
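In Keras 2, fit_generator's steps_per_epoch counts batches drawn from the generator, whereas the Keras 1 argument samples_per_epoch counted images. The arithmetic for this notebook's settings (50,000 training images, batch_size = 32):

```python
import math

n_train, batch_size = 50000, 32

# steps_per_epoch counts batches, not images.
steps_floor = n_train // batch_size           # full batches only: 1562 * 32 = 49984 images
steps_ceil = math.ceil(n_train / batch_size)  # also counts the final partial batch

# If 50000 were passed as a number of steps, one "epoch" would draw
# 50000 batches of 32 images, i.e. 32 passes over the training set.
images_if_misused = n_train * batch_size

print(steps_floor, steps_ceil, images_if_misused)  # 1562 1563 1600000
```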
