Dataset download: https://www.kesci.com/home/dataset/5bfe734a954d6e0010683839/project

# [A First Try] Image Classification and Image Matching on the TibetanMNIST Dataset

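## Main Contents
- Image classification experiments on the TibetanMNIST dataset

  > Logistic regression, multi-layer MLP, convolutional neural network

- Image similarity experiments on the TibetanMNIST dataset

  > Siamese network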

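## Demo
I originally followed a TensorFlow tutorial on imooc (慕课网); I have forgotten the URL, but it should be easy to find by searching. The final project there was a web app with a node.js back end, which I could not get working, so I switched to jQuery + Flask. Since I have been learning Keras recently, I wrote a Keras version based on TibetanMNIST. The complete project is on my GitHub: [【TensorFlow-MNIST-WEBAPP】](https://github.com/yanqiangmiffy/tibetan-mnist/tree/master/webapp/tibetan-keras)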

- Tibetan digits and their Arabic-numeral equivalents

![](https://upload-images.jianshu.io/upload_images/1531909-1dd822293b968574.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)

- Sample data
![](https://cdn.kesci.com/upload/image/pixx4l3j4p.jpg?imageView2/0/w/320/h/320)

- Recognition results
The CNN works fairly well and generalizes better.
![](https://upload-images.jianshu.io/upload_images/1531909-b41b31a9e46aa5d6.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)

## Loading the Data

In [4]:
import numpy as np
from keras.utils import np_utils
from sklearn.model_selection import train_test_split

# Parameter settings
batch_size=128
n_classes=10
n_epochs=20

# img_rows=28
# img_cols=28

# Load the data
data = np.load('/home/kesci/input/TibetanMNIST5610/TibetanMNIST.npz')
X, y = data['image'], data['label']  # (17768, 28, 28)
# Flatten each 28x28 image into a 784-vector and scale pixels to [0, 1]
X = X.reshape(X.shape[0], 784).astype('float32') / 255
y = np_utils.to_categorical(y, num_classes=10)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
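
As a rough illustration of what the loading cell does, the sketch below reimplements one-hot encoding in plain NumPy and reproduces the split sizes seen in the training logs. The helper name `one_hot` is hypothetical. It assumes `to_categorical` behaves like indexing an identity matrix, and that `train_test_split` uses its default `test_size` of 0.25 with the test count rounded up.

```python
import math
import numpy as np

def one_hot(labels, num_classes):
    """Hypothetical stand-in for to_categorical: one row per label."""
    labels = np.asarray(labels, dtype=int)
    return np.eye(num_classes, dtype='float32')[labels]

encoded = one_hot([0, 3, 9], num_classes=10)
print(encoded.shape)        # (3, 10)
print(encoded[1].argmax())  # 3

# With the default test_size=0.25 the test set gets ceil(0.25 * n) samples.
n_samples = 17768
n_test = math.ceil(0.25 * n_samples)
n_train = n_samples - n_test
print(n_train, n_test)      # 13326 4442
```

These counts agree with the "Train on 13326 samples, validate on 4442 samples" line printed by the classification runs.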

## Function Definitions

In [3]:
import matplotlib.pyplot as plt
def draw_train(history):
    '''
    Plot the training curves (accuracy and loss).
    :param history: the History object returned by model.fit
    :return: None
    '''

    plt.plot(history.history['acc'])
    plt.plot(history.history['val_acc'])
    plt.title('Model accuracy')
    plt.ylabel('Accuracy')
    plt.xlabel('Epoch')
    plt.legend(['Train', 'Test'], loc='upper left')
    plt.show()

    # Plot training & validation loss values
    plt.plot(history.history['loss'])
    plt.plot(history.history['val_loss'])
    plt.title('Model loss')
    plt.ylabel('Loss')
    plt.xlabel('Epoch')
    plt.legend(['Train', 'Test'], loc='upper left')
    plt.show()
    
def draw_img(X):
    """
    Display a single image.
    :param X: 2-D image array
    :return: None
    """
    plt.imshow(X, cmap='gray')
    plt.show()
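
`draw_train` reads `history.history['acc']` and `history.history['val_acc']`, which is how the Keras version used here names the accuracy metric. As far as I know, newer Keras releases log the same values under `'accuracy'` and `'val_accuracy'` when the model is compiled with `metrics=['accuracy']`. The sketch below shows one way to tolerate both spellings; `pick_key` is a hypothetical helper and the two dictionaries are made-up stand-ins for `History.history`.

```python
def pick_key(history_dict, *candidates):
    """Return the first candidate key present in a History.history-style dict."""
    for key in candidates:
        if key in history_dict:
            return key
    raise KeyError('none of %r found' % (candidates,))

# Hypothetical logs shaped like History.history from an older and a newer Keras.
old_style = {'acc': [0.90, 0.95], 'val_acc': [0.88, 0.93], 'loss': [0.4, 0.2]}
new_style = {'accuracy': [0.90, 0.95], 'val_accuracy': [0.88, 0.93], 'loss': [0.4, 0.2]}

print(pick_key(old_style, 'acc', 'accuracy'))          # acc
print(pick_key(new_style, 'acc', 'accuracy'))          # accuracy
print(pick_key(new_style, 'val_acc', 'val_accuracy'))  # val_accuracy
```

Inside a plotting function, the returned key would simply replace the hard-coded `'acc'` string.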


## Image Classification Experiments on the TibetanMNIST Dataset

### 1 Logistic Regression

In [6]:
from keras.models import  Sequential
from keras.layers import Dense
from keras.utils import np_utils
from keras.optimizers import RMSprop
from sklearn.model_selection import train_test_split

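model=Sequential()
# A single softmax layer on the raw pixels, i.e. multinomial logistic regression
model.add(Dense(input_shape=(784,),units=10,activation='softmax'))
# Note: this run pairs the softmax output with binary_crossentropy. In older Keras
# releases 'accuracy' then resolves to element-wise binary accuracy, so the figure
# reported for this model is not directly comparable with categorical accuracy.
model.compile(optimizer=RMSprop(lr=0.001,rho=0.9),loss='binary_crossentropy',metrics=['accuracy'])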
model.summary()
history=model.fit(X_train,y_train,
                  batch_size=batch_size,
                  epochs=n_epochs,
                  verbose=1,
                  validation_data=(X_test,y_test))

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_1 (Dense)              (None, 10)                7850      
Total params: 7,850
Trainable params: 7,850
Non-trainable params: 0
_________________________________________________________________
Train on 13326 samples, validate on 4442 samples
Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


In [7]:
evaluation=model.evaluate(X_test,y_test)
print('Summary: Loss over the test dataset: %.4f, Accuracy: %.4f' % (evaluation[0], evaluation[1]))
draw_train(history)

Summary: Loss over the test dataset: 0.0430, Accuracy: 0.9853
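
The logistic regression cell compiles a 10-way softmax model with `loss='binary_crossentropy'`. To my understanding, older Keras releases then interpret `metrics=['accuracy']` as binary accuracy, which scores each of the 10 outputs separately rather than scoring the predicted class. The NumPy sketch below contrasts the two definitions on made-up data; both function names are illustrative and not Keras APIs.

```python
import numpy as np

def categorical_accuracy(y_true, y_prob):
    """Fraction of samples whose arg-max class matches the one-hot label."""
    return float(np.mean(y_true.argmax(axis=1) == y_prob.argmax(axis=1)))

def binary_accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of individual outputs that match after thresholding."""
    return float(np.mean(y_true == (y_prob > threshold)))

# Made-up example: 4 samples, 10 classes, true classes 0..3.
y_true = np.eye(10)[[0, 1, 2, 3]]
# A useless "model" that always outputs a uniform distribution.
y_prob = np.full((4, 10), 0.1)

print(categorical_accuracy(y_true, y_prob))  # 0.25
print(binary_accuracy(y_true, y_prob))       # 0.9
```

Because nine of every ten targets are zero, binary accuracy starts near 90% even for an uninformative model, so a figure measured this way reads higher than the class-level accuracy would.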


### 2 Multi-layer MLP

In [8]:
from keras.models import Sequential
from keras.layers import Dense, Dropout
model = Sequential()
model.add(Dense(input_shape=(784,), units=512, activation='relu',name="Dense1"))
model.add(Dropout(0.2,name='Dropout1'))
model.add(Dense(input_shape=(512,), units=256, activation='relu',name='Dense2'))
model.add(Dropout(0.2,name='Dropout2'))
model.add(Dense(input_shape=(256,), units=10, activation='softmax',name='Dense3'))
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()

history=model.fit(x=X_train, y=y_train,
          batch_size=128,
          epochs=20,
          verbose=1,
          validation_data=(X_test, y_test))

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
Dense1 (Dense)               (None, 512)               401920    
_________________________________________________________________
Dropout1 (Dropout)           (None, 512)               0         
_________________________________________________________________
Dense2 (Dense)               (None, 256)               131328    
_________________________________________________________________
Dropout2 (Dropout)           (None, 256)               0         
_________________________________________________________________
Dense3 (Dense)               (None, 10)                2570      
Total params: 535,818
Trainable params: 535,818
Non-trainable params: 0
_________________________________________________________________
Train on 13326 samples, validate on 4442 samples
Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20

In [9]:
evaluation = model.evaluate(X_test, y_test, verbose=1)
print('Summary: Loss over the test dataset: %.4f, Accuracy: %.4f' % (evaluation[0], evaluation[1]))
draw_train(history)

Summary: Loss over the test dataset: 0.0226, Accuracy: 0.9944
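
The parameter counts in the two `model.summary()` outputs can be checked by hand: a Dense layer holds one weight per input-output pair plus one bias per output, and Dropout adds nothing. A small sketch of that arithmetic (`dense_params` is a hypothetical helper):

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

layer_sizes = [784, 512, 256, 10]
counts = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
print(counts)                 # [401920, 131328, 2570]
print(sum(counts))            # 535818
print(dense_params(784, 10))  # 7850, the logistic regression model
```

The MLP therefore has roughly 68 times as many trainable parameters as the single softmax layer.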


In [10]:
from IPython.display import SVG
from keras.utils.vis_utils import model_to_dot
SVG(model_to_dot(model,show_shapes=True).create(prog='dot', format='svg'))

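### 3 CNN (Convolutional Neural Network)

Here we try a classic convolutional neural network for image classification. Note that the data is loaded slightly differently from before: the integer test labels are kept in `y_test_` so that they can be compared with the predicted classes later on.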

In [4]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.optimizers import Adam
from keras.layers.normalization import BatchNormalization
from keras.utils import np_utils
from keras.layers import Conv2D, MaxPooling2D
from keras.preprocessing.image import ImageDataGenerator
from sklearn.model_selection import train_test_split
# Parameter settings
batch_size=128
n_classes=10
n_epochs=20


# Load the data
data = np.load('/home/kesci/input/TibetanMNIST5610/TibetanMNIST.npz')
X, y = data['image'], data['label']  # (17768, 28, 28)
X = X.reshape(X.shape[0], 28,28,1).astype('float32') / 255
X_train, X_test, y_train, y_test_ = train_test_split(X, y, random_state=42)
y_train = np_utils.to_categorical(y_train, num_classes=10)
y_test = np_utils.to_categorical(y_test_, num_classes=10)

Using TensorFlow backend.


In [6]:
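# Build the model
input_shape=(28,28,1)
# Note: the bare BatchNormalization(...) calls below only construct a layer object;
# without model.add(...) they are not part of the model, as the summary shows.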
model = Sequential()
model.add(Conv2D(32, (3, 3), input_shape=input_shape))
model.add(Activation('relu'))
BatchNormalization(axis=-1)
model.add(Conv2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2)))

BatchNormalization(axis=-1)
model.add(Conv2D(64,(3, 3)))
model.add(Activation('relu'))
BatchNormalization(axis=-1)
model.add(Conv2D(64, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2)))

model.add(Flatten())

BatchNormalization()
model.add(Dense(512))
model.add(Activation('relu'))
BatchNormalization()
model.add(Dropout(0.2))
model.add(Dense(10))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer=Adam(), metrics=['accuracy'])
model.summary()

# Train the model, augmenting the training images on the fly
gen = ImageDataGenerator(rotation_range=8, width_shift_range=0.08, shear_range=0.3,
                         height_shift_range=0.08, zoom_range=0.08)
test_gen = ImageDataGenerator()
train_generator = gen.flow(X_train, y_train, batch_size=64)
test_generator = test_gen.flow(X_test, y_test, batch_size=64)
# One pass over each split per epoch
history=model.fit_generator(train_generator, steps_per_epoch=len(X_train)//64, epochs=20,
                    validation_data=test_generator, validation_steps=len(X_test)//64)

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_5 (Conv2D)            (None, 26, 26, 32)        320       
_________________________________________________________________
activation_7 (Activation)    (None, 26, 26, 32)        0         
_________________________________________________________________
conv2d_6 (Conv2D)            (None, 24, 24, 32)        9248      
_________________________________________________________________
activation_8 (Activation)    (None, 24, 24, 32)        0         
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 12, 12, 32)        0         
_________________________________________________________________
conv2d_7 (Conv2D)            (None, 10, 10, 64)        18496     
_________________________________________________________________
activation_9 (Activation)    (None, 10, 10, 64)        0         
__________
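
The shapes and parameter counts in the summary follow from two rules: a 3x3 convolution without padding shrinks each spatial side by 2, and a 2x2 max-pool halves it. The sketch below replays that arithmetic for the four Conv2D layers defined above. The rows beyond what the truncated summary shows are what the arithmetic implies, not values copied from the output, and the helper names are hypothetical.

```python
def conv_out(size, kernel=3):
    """Spatial size after a stride-1 convolution without padding."""
    return size - kernel + 1

def conv_params(kernel, c_in, c_out):
    """Kernel weights plus one bias per output channel."""
    return kernel * kernel * c_in * c_out + c_out

rows = []
size, channels = 28, 1
for c_out, pool_after in [(32, False), (32, True), (64, False), (64, True)]:
    params = conv_params(3, channels, c_out)
    size, channels = conv_out(size), c_out
    rows.append(('conv', size, channels, params))
    if pool_after:
        size //= 2
        rows.append(('pool', size, channels, 0))

for name, side, depth, params in rows:
    print('%s -> (%d, %d, %d), params %d' % (name, side, side, depth, params))

flat = size * size * channels
print('flatten ->', flat)  # 1024
```

The first three convolution rows match the 320, 9248 and 18496 parameters listed in the summary.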

In [7]:
from IPython.display import SVG
from keras.utils.vis_utils import model_to_dot
SVG(model_to_dot(model,show_shapes=True).create(prog='dot', format='svg'))

In [8]:
evaluation = model.evaluate(X_test, y_test, verbose=1)
print('Summary: Loss over the test dataset: %.4f, Accuracy: %.4f' % (evaluation[0], evaluation[1]))
draw_train(history)

Summary: Loss over the test dataset: 0.0629, Accuracy: 0.9910


**Examples of correctly and incorrectly predicted labels**

In [9]:
predicted_classes = model.predict_classes(X_test)
correct_indices = np.nonzero(predicted_classes == y_test_)[0]
incorrect_indices = np.nonzero(predicted_classes != y_test_)[0]

plt.figure(figsize=(9,10))
for i, correct in enumerate(correct_indices[:9]):
    plt.subplot(3, 3, i + 1)
    plt.imshow(X_test[correct].reshape(28, 28), cmap='gray', interpolation='none')
    plt.title("Predicted {}, True_Class {}".format(predicted_classes[correct], y_test_[correct]))
plt.show()
print(type(predicted_classes))
print(type(y_test_))
print(predicted_classes)

<class 'numpy.ndarray'>
<class 'numpy.ndarray'>
[6 9 1 ... 3 5 1]
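
`predict_classes` returns the index of the most probable class for each image. In newer Keras releases where `Sequential.predict_classes` is no longer available, taking `np.argmax` over the output of `model.predict` is, as far as I know, the usual replacement. The sketch below shows that step and the correct/incorrect index split on made-up probabilities, with no model involved.

```python
import numpy as np

# Hypothetical softmax outputs for 4 test images and 3 classes.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.3, 0.3, 0.4],
                  [0.6, 0.3, 0.1]])
true_labels = np.array([0, 1, 1, 0])

# Class index with the highest probability in each row.
predicted = probs.argmax(axis=-1)
correct_idx = np.nonzero(predicted == true_labels)[0]
incorrect_idx = np.nonzero(predicted != true_labels)[0]

print(predicted)      # [0 1 2 0]
print(correct_idx)    # [0 1 3]
print(incorrect_idx)  # [2]
```

The comparison only works against integer labels, which is why the CNN cell keeps `y_test_` alongside the one-hot `y_test`.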


In [10]:
plt.figure(figsize=(9,10))
for i, incorrect in enumerate(incorrect_indices[:9]):
    plt.subplot(3, 3, i + 1)
    plt.imshow(X_test[incorrect].reshape(28, 28), cmap='gray', interpolation='none')
    plt.title("Predicted {}, Class {}".format(predicted_classes[incorrect], y_test_[incorrect]))
plt.show()

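**The experiments show that all three models perform well on this dataset. The multi-layer MLP reports the highest test accuracy (99.44%), followed by the CNN (99.10%) and logistic regression (98.53%).**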

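## Image Similarity Experiments on the TibetanMNIST Dataset
I happened to work on a semantic-similarity project a few days ago. Siamese networks were first used for image matching; for more background on them, see [sentence-similarity](https://github.com/yanqiangmiffy/sentence-similarity)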

![](https://upload-images.jianshu.io/upload_images/10313103-32ad7ac7475506b6.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/600/format/webp)

In [1]:
import numpy as np
import random
from sklearn.model_selection import train_test_split
from keras.models import Model
from keras.utils import np_utils
from keras.layers import Dense, Input, Flatten
from keras.layers import Dropout, Activation, Lambda
from keras.layers import Conv2D, MaxPooling2D
from keras.optimizers import Adadelta
from keras.callbacks import ModelCheckpoint
import keras.backend as K

Using TensorFlow backend.


## Preparing the Data

The main goals of this part are to build image pairs and to prepare positive and negative samples.

In [2]:
# Prepare the data
data = np.load('/home/kesci/input/TibetanMNIST5610/TibetanMNIST.npz')
X, y = data['image'], data['label']  # (17768, 28, 28)
X = X.astype('float32') / 255
# y = np_utils.to_categorical(y, num_classes=10)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

In [3]:
# Build positive and negative pairs: two images of the same digit get label 1,
# two images of different digits get label 0
def create_pairs(x, digit_indices, num_classes):
    pairs = []
    labels = []
    n = min([len(digit_indices[j]) for j in range(num_classes)]) - 1
    for j in range(num_classes):
        for i in range(n):
            p1, p2 = digit_indices[j][i], digit_indices[j][i + 1]
            pairs += [[x[p1], x[p2]]]
            inc = random.randrange(1, num_classes)
            jn = (j + inc) % num_classes
            p1, p2 = digit_indices[j][i], digit_indices[jn][i]
            pairs += [[x[p1], x[p2]]]
            labels += [1, 0]
    return np.array(pairs), np.array(labels)
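
To make the pairing scheme concrete, here is a self-contained run of the same logic on toy data. Each class contributes `n` positive and `n` negative pairs, where `n` is the size of the smallest class minus one, so the output has `2 * n * num_classes` pairs. The toy arrays and their names are made up for illustration.

```python
import random
import numpy as np

def create_pairs(x, digit_indices, num_classes):
    # Same logic as the cell above, repeated so the sketch runs on its own.
    pairs, labels = [], []
    n = min(len(digit_indices[j]) for j in range(num_classes)) - 1
    for j in range(num_classes):
        for i in range(n):
            p1, p2 = digit_indices[j][i], digit_indices[j][i + 1]
            pairs.append([x[p1], x[p2]])   # same class -> label 1
            jn = (j + random.randrange(1, num_classes)) % num_classes
            p1, p2 = digit_indices[j][i], digit_indices[jn][i]
            pairs.append([x[p1], x[p2]])   # different class -> label 0
            labels += [1, 0]
    return np.array(pairs), np.array(labels)

# Toy data: 3 classes, 4 "images" each; every pixel stores the class id.
toy_y = np.repeat(np.arange(3), 4)
toy_x = np.stack([np.full((2, 2), c, dtype='float32') for c in toy_y])
toy_indices = [np.where(toy_y == c)[0] for c in range(3)]

toy_pairs, toy_labels = create_pairs(toy_x, toy_indices, 3)
print(toy_pairs.shape)  # (18, 2, 2, 2)
print(toy_labels[:4])   # [1 0 1 0]
```

By the same formula, the 19720 training pairs and 6280 test pairs reported during training would correspond to `n = 986` and `n = 314` with 10 classes.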

In [4]:
num_classes = len(np.unique(y_train))
# Training set: group sample indices by digit, then build the pairs
digit_indices = [np.where(y_train == i)[0] for i in range(num_classes)]
train_pairs, train_y = create_pairs(X_train, digit_indices, num_classes)
# Test set
digit_indices = [np.where(y_test == i)[0] for i in range(num_classes)]
test_pairs, test_y = create_pairs(X_test, digit_indices, num_classes)

## Function Definitions

In [5]:
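# Loss function: contrastive loss, where y_true = 1 marks a similar pair
# and y_pred is the predicted distance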
def contrastive_loss(y_true, y_pred):
    margin = 1
    sq_pred = K.square(y_pred)
    margin_sq = K.square(K.maximum(margin - y_pred, 0))
    loss = K.mean(y_true * sq_pred + (1 - y_true) * margin_sq)
    return loss
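
A NumPy mirror of this loss makes the two terms easy to inspect: similar pairs (label 1) are penalised by their squared distance, while dissimilar pairs (label 0) are penalised only when they sit closer than the margin. The function name `contrastive_loss_np` and the numbers are made up for illustration.

```python
import numpy as np

def contrastive_loss_np(y_true, dist, margin=1.0):
    """NumPy mirror of the Keras loss above (label 1 = similar pair)."""
    y_true = np.asarray(y_true, dtype=float)
    dist = np.asarray(dist, dtype=float)
    similar_term = y_true * np.square(dist)
    dissimilar_term = (1 - y_true) * np.square(np.maximum(margin - dist, 0))
    return float(np.mean(similar_term + dissimilar_term))

# Made-up distances for four pairs.
labels = [1, 1, 0, 0]
dists = [0.0, 1.0, 1.5, 0.5]
print(contrastive_loss_np(labels, dists))  # 0.3125
```

The four pairs contribute 0, 1, 0 and 0.25 respectively: the dissimilar pair at distance 1.5 is already beyond the margin and costs nothing.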

In [6]:
# Euclidean distance between the two embeddings
def euclidean_distance(vects):
    x, y = vects
    sum_sq = K.sum(K.square(x - y), axis=1, keepdims=True)
    distance = K.sqrt(K.maximum(sum_sq, K.epsilon()))
    return distance

def eucl_shape(shape):
    shape1, shape2 = shape
    return (shape1[0], 1)
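
The same distance can be written in NumPy to see what the `K.maximum(sum_sq, K.epsilon())` clamp does. The sketch assumes `K.epsilon()` is at its default of `1e-7`; the clamp keeps the argument of the square root away from zero, where its gradient would be unbounded. `euclidean_distance_np` is a hypothetical name.

```python
import numpy as np

EPSILON = 1e-7  # assumed to match the default value of K.epsilon()

def euclidean_distance_np(a, b):
    """Row-wise distance with the same clamp as the Keras version."""
    sum_sq = np.sum(np.square(a - b), axis=1, keepdims=True)
    return np.sqrt(np.maximum(sum_sq, EPSILON))

a = np.array([[3.0, 4.0], [1.0, 1.0]])
b = np.array([[0.0, 0.0], [1.0, 1.0]])
d = euclidean_distance_np(a, b)
print(d.shape)  # (2, 1)
print(d[0, 0])  # 5.0
print(d[1, 0])  # about 3.16e-4, i.e. sqrt(EPSILON), not exactly 0
```

`keepdims=True` is what gives the `(batch, 1)` output shape declared by `eucl_shape`.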

## Building the Siamese Network

In [7]:
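# Build the Siamese model
def shared_network(input_shape):
    """
    Shared sub-network applied to both inputs.
    :param input_shape: shape of a single input image
    :return: a Keras Model mapping an image to a 128-dimensional embedding
    """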
    input = Input(shape=input_shape)
    layer = Flatten()(input)
    layer = Dense(128, activation='relu')(layer)
    layer = Dropout(0.1)(layer)
    layer = Dense(128, activation='relu')(layer)
    layer = Dropout(0.1)(layer)
    layer = Dense(128, activation='relu')(layer)
    return Model(input, layer)

In [8]:
input_shape = X_train.shape[1:]
siamese = shared_network(input_shape)
input_left = Input(shape=input_shape)
input_right = Input(shape=input_shape)
output_left = siamese(input_left)
output_right = siamese(input_right)
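
Calling the same `siamese` model on both inputs means the two branches share one set of weights. The NumPy sketch below imitates that with a single hypothetical Dense+ReLU layer (`W`, `b` and `embed` are stand-ins, not the trained network) and checks two consequences of sharing: the distance is symmetric in its inputs, and an image is at distance zero from itself.

```python
import numpy as np

rng = np.random.default_rng(0)
# One hypothetical set of weights, standing in for the shared sub-network.
W = rng.normal(size=(784, 128))
b = np.zeros(128)

def embed(images):
    """Flatten 28x28 inputs and apply a single shared Dense+ReLU layer."""
    flat = images.reshape(len(images), -1)
    return np.maximum(flat @ W + b, 0)

def pair_distance(left, right):
    """Euclidean distance between the embeddings of each pair."""
    return np.sqrt(np.sum((embed(left) - embed(right)) ** 2, axis=1))

left = rng.random((5, 28, 28))
right = rng.random((5, 28, 28))

print(np.allclose(pair_distance(left, right), pair_distance(right, left)))  # True
print(np.allclose(pair_distance(left, left), 0.0))                          # True
```

With two separately parameterised branches neither property would hold in general.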


In [9]:
distance = Lambda(euclidean_distance,
                  output_shape=eucl_shape)([output_left, output_right])
model = Model([input_left, input_right], distance)
ada = Adadelta()

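Note that the model's final output is a distance (the Euclidean distance between the two embeddings), not a probability, so we need to define our own accuracy metric: if the distance between two images is below 0.5, we treat them as similar or identical.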

In [10]:
threshold = 0.5
def acc(y_true, y_pred):
    return K.mean(K.equal(y_true, K.cast(y_pred < threshold, y_true.dtype)))
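
Outside the Keras graph, the same rule is a one-line comparison. The sketch below applies it to made-up distances and tries a few thresholds, which is one simple way to see how sensitive the metric is to the choice of 0.5. `threshold_accuracy` and the data are hypothetical.

```python
import numpy as np

def threshold_accuracy(y_true, distances, threshold=0.5):
    """Predict 'similar' (1) when the distance is below the threshold."""
    predicted = (np.asarray(distances).ravel() < threshold).astype(int)
    return float(np.mean(predicted == np.asarray(y_true)))

# Made-up distances for six pairs; label 1 means "same digit".
y_true = [1, 1, 1, 0, 0, 0]
distances = [0.1, 0.3, 0.7, 0.9, 0.4, 1.2]

# Compare a few candidate thresholds on the same pairs.
for t in (0.25, 0.5, 0.75):
    print(t, threshold_accuracy(y_true, distances, t))
```

On real data a sweep like this would normally be run on a validation split rather than on the test pairs.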

In [11]:
model.compile(loss=contrastive_loss, optimizer=ada, metrics=[acc])
filepath = "model.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='val_acc',
                             verbose=1,
                             save_best_only=True)
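
With `save_best_only=True`, the checkpoint file is overwritten only when the monitored value beats the best one seen so far. A plain-Python sketch of that decision rule, using made-up validation accuracies (`best_only` is a hypothetical helper, and the strict "greater than" comparison is my understanding of how an accuracy-type metric is handled):

```python
def best_only(val_scores):
    """Yield (epoch, score, saved) the way save_best_only would decide."""
    best = float('-inf')
    for epoch, score in enumerate(val_scores, start=1):
        saved = score > best
        if saved:
            best = score
        yield epoch, score, saved

# Hypothetical validation accuracies, not taken from the training log.
scores = [0.85, 0.91, 0.90, 0.93, 0.93, 0.95]
decisions = list(best_only(scores))
for epoch, score, saved in decisions:
    print(epoch, score, 'saved' if saved else 'skipped')

saved_epochs = [epoch for epoch, _, saved in decisions if saved]
print(saved_epochs)  # [1, 2, 4, 6]
```

This is why loading `model.hdf5` afterwards restores the weights from the best validation epoch rather than from the last one.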

In [12]:
history = model.fit([train_pairs[:, 0], train_pairs[:, 1]], train_y,
                    batch_size=128,
                    epochs=20,
                    callbacks=[checkpoint],
                    validation_data=([test_pairs[:, 0], test_pairs[:, 1]], test_y))

Train on 19720 samples, validate on 6280 samples
Epoch 1/20

Epoch 00001: val_acc improved from -inf to 0.84602, saving model to model.hdf5
Epoch 2/20

Epoch 00002: val_acc improved from 0.84602 to 0.91131, saving model to model.hdf5
Epoch 3/20

Epoch 00003: val_acc improved from 0.91131 to 0.92850, saving model to model.hdf5
Epoch 4/20

Epoch 00004: val_acc improved from 0.92850 to 0.93010, saving model to model.hdf5
Epoch 5/20

Epoch 00005: val_acc improved from 0.93010 to 0.93678, saving model to model.hdf5
Epoch 6/20

Epoch 00006: val_acc did not improve
Epoch 7/20

Epoch 00007: val_acc improved from 0.93678 to 0.93726, saving model to model.hdf5
Epoch 8/20

Epoch 00008: val_acc improved from 0.93726 to 0.94092, saving model to model.hdf5
Epoch 9/20

Epoch 00009: val_acc improved from 0.94092 to 0.94873, saving model to model.hdf5
Epoch 10/20

Epoch 00010: val_acc did not improve
Epoch 11/20

Epoch 00011: val_acc did not improve
Epoch 12/20

Epoch 00012: val_acc improved from 0.948

In [15]:
draw_train(history)

## Model Evaluation
Let's look at the accuracy on the training and test sets.

In [16]:
# Evaluate the model
# Load the checkpoint with the best validation accuracy
model.load_weights("model.hdf5")
def compute_accuracy(y_true, y_pred):
    pred = y_pred.ravel() < threshold
    return np.mean(pred == y_true)


y_pred = model.predict([train_pairs[:, 0], train_pairs[:, 1]])
train_acc = compute_accuracy(train_y, y_pred)
y_pred = model.predict([test_pairs[:, 0], test_pairs[:, 1]])
test_acc = compute_accuracy(test_y, y_pred)
print('Accuracy on training set: %.4f' % train_acc)
print('Accuracy on test set: %.4f' % test_acc)

Accuracy on training set: 0.9916
Accuracy on test set: 0.9591


**Pairs whose similarity was predicted correctly**

In [31]:
pred = y_pred.ravel() < threshold
pred = pred.astype(int)  # booleans to 0/1 so they compare cleanly with test_y
correct_indices = np.nonzero(pred==test_y)[0]
incorrect_indices = np.nonzero(pred!=test_y)[0]

# y_pred holds the test-set predictions, so the images must come from test_pairs
for i, correct in enumerate(correct_indices[:4]):
    print("Group:{},Predicted {}, Label {}".format(i+1,pred[correct], test_y[correct]))
    plt.figure()
    plt.subplot(221)
    plt.imshow(test_pairs[:, 0][correct].reshape(28, 28), cmap='gray', interpolation='none')
    plt.subplot(222)
    plt.imshow(test_pairs[:, 1][correct].reshape(28, 28), cmap='gray', interpolation='none')
    plt.show()


Group:1,Predicted 1, Label 1


Group:2,Predicted 0, Label 0


Group:3,Predicted 1, Label 1


Group:4,Predicted 0, Label 0


**Pairs whose similarity was predicted incorrectly**

In [36]:
# The indices refer to test-set predictions, so the images come from test_pairs
for i, incorrect in enumerate(incorrect_indices[:4]):
    print("Group:{},Predicted {}, Label {}".format(i+1,pred[incorrect], test_y[incorrect]))
    plt.figure()
    plt.subplot(221)
    plt.imshow(test_pairs[:, 0][incorrect].reshape(28, 28), cmap='gray', interpolation='none')
    plt.subplot(222)
    plt.imshow(test_pairs[:, 1][incorrect].reshape(28, 28), cmap='gray', interpolation='none')
    plt.show()


Group:1,Predicted 0, Label 1


Group:2,Predicted 0, Label 1


Group:3,Predicted 0, Label 1


Group:4,Predicted 0, Label 1
