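### Competition Overview

Electrocardiography (ECG) is one of the most common ways to check for heart disease. This competition uses the PTB ECG dataset (input voltage: ±16 mV; input resistance: 100 Ω (DC); resolution: 16 bits; bandwidth: 0-1 kHz). The dataset contains two types of heartbeat, normal and abnormal, and participants must classify each sample: normal = 0, abnormal = 1.

The recordings have 16 input channels: 14 for ECG signals, 1 for respiration, and 1 for voltage.

Sampling frequency: 1000 samples per second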

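### Evaluation Criteria

We compare the CSV file submitted by each participant against the ground truth and count the correctly classified samples. The score is the classification accuracy:

Accuracy = True / Total

True: number of samples the model classifies correctly
Total: total number of samples in the test set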
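
As a rough illustration of this metric, the sketch below computes accuracy from two label lists. The function name `accuracy` and the sample labels are made up for the example; the organisers' actual scoring script is not shown in the source.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    correct = sum(int(t == p) for t, p in zip(y_true, y_pred))  # "True" in the metric
    return correct / len(y_true)                                # divided by "Total"


# Hypothetical labels: 0 = normal, 1 = abnormal
y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]
score = accuracy(y_true, y_pred)
print(score)  # 4 of the 5 samples are classified correctly
```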

### Inspecting the Data Format

In [1]:
import os
import pandas as pd

train_df = pd.read_csv("./ptbdb_train.csv", header=None)
train_df.head(5)

| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 178 | 179 | 180 | 181 | 182 | 183 | 184 | 185 | 186 | 187 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.0 | 0.88806 | 0.448561 | 0.166045 | 0.125533 | 0.118603 | 0.083955 | 0.063966 | 0.055437 | 0.05677 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| 1 | 0.991337 | 0.594059 | 0.29146 | 0.0 | 0.045173 | 0.123762 | 0.174505 | 0.195545 | 0.17698 | 0.225866 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| 2 | 0.950226 | 0.975113 | 0.967195 | 0.993213 | 0.997738 | 1.0 | 0.919683 | 0.809955 | 0.813348 | 0.745475 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| 3 | 1.0 | 0.55701 | 0.290867 | 0.178592 | 0.165212 | 0.121873 | 0.111111 | 0.095695 | 0.087551 | 0.084933 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | 1.0 | 0.474798 | 0.191612 | 0.040015 | 0.09696 | 0.137745 | 0.143901 | 0.127741 | 0.123894 | 0.128511 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |


There are 188 columns in total: the first 187 are the inputs and the last one is the label, where 0 means normal and 1 means abnormal.

In [2]:
train_df[187].value_counts()

1.0    3504
0.0    3496
Name: 187, dtype: int64

The normal and abnormal labels are almost evenly balanced.
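
To put a number on that balance, the short sketch below works out the class shares from the two counts printed by `value_counts()`. The variable names are illustrative only. The majority-class share is the accuracy a trivial classifier would get by always predicting the more frequent label, which gives a floor to compare the model against.

```python
# Counts copied from the value_counts() output above
counts = {1.0: 3504, 0.0: 3496}

total = sum(counts.values())
shares = {label: n / total for label, n in counts.items()}

# Accuracy of always predicting the most frequent class
majority_baseline = max(shares.values())

print(total)
print(shares)
print(majority_baseline)
```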

### Code

**1. Import the required modules**

In [3]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from tensorflow import keras
import tensorflow.keras.backend as K
from tensorflow.keras import Input
from tensorflow.keras.layers import Layer, Conv1D, Conv2D, MaxPool1D, MaxPool2D, Dense, Concatenate, RepeatVector, Reshape
from tensorflow.keras.layers import GlobalAveragePooling1D, GlobalAveragePooling2D, Multiply, Flatten, Dropout
from tensorflow.keras.models import Model, load_model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.callbacks import EarlyStopping

from sklearn.model_selection import StratifiedKFold, KFold
from sklearn.preprocessing import MinMaxScaler
from scipy.stats import pearsonr

import os

**2. Load the training set and the data to be predicted**

In [4]:
# Load the data
train_file = np.array(pd.read_csv("./ptbdb_train.csv", header=None))
test_file = np.array(pd.read_csv("./ptbdb_test.csv", header=None))

# Prepare the training and test sets
train_x = train_file[:, :-1].reshape(-1, 1, 187, 1)
train_y = train_file[:, -1]

test_x = test_file.reshape(-1, 1, 187, 1)

del train_file, test_file
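
The reshape turns each 187-value row into a 1 x 187 "image" with a single channel, which is the layout `Conv2D` expects. The sketch below repeats the same slicing on a tiny made-up array (4 rows of zeros with hypothetical labels) so the resulting shapes can be checked without the competition files.

```python
import numpy as np

# Hypothetical miniature of the training file: 187 signal values plus 1 label per row
train_file = np.zeros((4, 188))
train_file[:, -1] = [0, 1, 0, 1]

# Same slicing and reshape as the cell above:
# (samples, height=1, width=187, channels=1)
train_x = train_file[:, :-1].reshape(-1, 1, 187, 1)
train_y = train_file[:, -1]

print(train_x.shape, train_y.shape)
```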

**3. Define an inception_block module**

In [5]:
def inception_block(inputs, filters):
    """
    Inception block
    :param inputs: input tensor
    :param filters: number of filters
    :return: output tensor
    """
    # Branch 1
    conv_1 = Conv2D(filters=filters, kernel_size=[1, 1], strides=[1, 1], padding="same", activation="relu")(inputs)

    # Branch 2
    conv_2 = Conv2D(filters=filters, kernel_size=[1, 3], strides=[1, 1], padding="same", activation="relu")(inputs)

    # Branch 3
    conv_3 = Conv2D(filters=filters, kernel_size=[1, 3], strides=[1, 1], padding="same", activation="relu")(inputs)
    conv_3 = Conv2D(filters=filters, kernel_size=[1, 3], strides=[1, 1], padding="same", activation="relu")(conv_3)

    # Merge the branches
    outputs = Concatenate(axis=3)([conv_1, conv_2, conv_3])
    outputs = Conv2D(filters=filters, kernel_size=[1, 1], strides=[1, 1], padding="same", activation="relu")(outputs)

    return outputs
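
One way to sanity-check the block is to count its trainable parameters by hand. The sketch below uses the standard formula for a 2-D convolution with bias (kernel height x kernel width x input channels x filters, plus one bias per filter). The helper names are hypothetical. For the first block (`in_channels=1`, `filters=64`) the first two values agree with the 128 and 256 parameters reported for `conv2d` and `conv2d_2` in the model summary further down.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights plus one bias per filter for a standard 2-D convolution."""
    return kernel_h * kernel_w * in_channels * filters + filters


def inception_block_params(in_channels, filters):
    """Parameter counts of the three branches and the merging 1x1 convolution."""
    branch_1 = conv2d_params(1, 1, in_channels, filters)
    branch_2 = conv2d_params(1, 3, in_channels, filters)
    # Branch 3 stacks two [1, 3] convolutions; the second one sees `filters` channels
    branch_3 = (conv2d_params(1, 3, in_channels, filters)
                + conv2d_params(1, 3, filters, filters))
    # Concatenating three branches yields 3 * filters channels
    merge = conv2d_params(1, 1, 3 * filters, filters)
    return branch_1, branch_2, branch_3, merge


parts = inception_block_params(in_channels=1, filters=64)
print(parts, sum(parts))
```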

**4. Define an se_block module**

In [6]:
def se_block(inputs, k):
    """
    SE (squeeze-and-excitation) block
    :param inputs: input tensor
    :param k: reduction ratio of the bottleneck used to re-weight the channels
    :return: output tensor
    """
    # Input shape
    input_shape = K.int_shape(inputs)

    # Global average pooling
    outputs = GlobalAveragePooling2D()(inputs)

    # Compute the importance of each channel
    outputs = Dense(units=int(input_shape[-1] / k), activation="relu")(outputs)
    outputs = Dense(units=input_shape[-1], activation="sigmoid")(outputs)

    # Rescale each channel
    outputs = RepeatVector(input_shape[1] * input_shape[2])(outputs)
    outputs = Reshape([input_shape[1], input_shape[2], input_shape[3]])(outputs)
    outputs = Multiply()([inputs, outputs])
    return outputs
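
To make the data flow of the SE block concrete, here is a minimal NumPy sketch of the same forward pass with random weights standing in for the two `Dense` layers. The function name `se_forward` and all shapes are assumptions chosen for the example. It relies on broadcasting instead of `RepeatVector` plus `Reshape`, which gives the same per-channel scaling.

```python
import numpy as np


def se_forward(x, w1, b1, w2, b2):
    """Squeeze-and-excitation on a batch of feature maps shaped (N, H, W, C)."""
    squeeze = x.mean(axis=(1, 2))                      # global average pooling -> (N, C)
    hidden = np.maximum(squeeze @ w1 + b1, 0.0)        # ReLU bottleneck -> (N, C // k)
    scale = 1.0 / (1.0 + np.exp(-(hidden @ w2 + b2)))  # sigmoid gate -> (N, C)
    return x * scale[:, None, None, :]                 # broadcast the gate over H and W


rng = np.random.default_rng(0)
n, h, w, c, k = 2, 1, 94, 8, 4
x = rng.random((n, h, w, c))
w1, b1 = rng.normal(size=(c, c // k)), np.zeros(c // k)
w2, b2 = rng.normal(size=(c // k, c)), np.zeros(c)

y = se_forward(x, w1, b1, w2, b2)
print(y.shape)
```

Because the gate comes from a sigmoid, every channel is multiplied by a value between 0 and 1, so the block can only attenuate channels relative to each other.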

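**5. Define a feature extraction layer**

In [7]:
def ims_layer(inputs, filters, pool_size):
    """
    Feature extraction layer: inception block, max pooling, then SE block
    :param inputs: input tensor
    :param filters: number of filters
    :param pool_size: pooling window, also used as the stride
    :return: output tensor
    """
    inception = inception_block(inputs, filters)
    pool = MaxPool2D(pool_size=pool_size, strides=pool_size, padding="same")(inception)
    se = se_block(pool, 4)

    return se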
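
With `padding="same"` and a stride equal to the pool size, the pooled width is the input width divided by the pool size, rounded up. The sketch below applies that rule to the pool sizes and filter count used in `base_model` (step 7) to estimate the size of the flattened feature vector. The helper name is hypothetical.

```python
import math


def same_pool_width(width, pool):
    """Output width of a pooling layer with padding="same" and stride == pool size."""
    return math.ceil(width / pool)


width = 187
after_ims_1 = same_pool_width(width, 2)        # pool_size [1, 2]
after_ims_2 = same_pool_width(after_ims_1, 4)  # pool_size [1, 4]

# height * width * channels of the second feature extraction layer
flatten_units = 1 * after_ims_2 * 128

print(after_ims_1, after_ims_2, flatten_units)
```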

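**6. Define the fully connected layer**

In [8]:
def fc_layer(inputs, units):
    """
    Fully connected layer followed by dropout
    :param inputs: input tensor
    :param units: number of units in the dense layer
    :return: output tensor
    """
    outputs = Dense(units=units, activation="relu")(inputs)
    outputs = Dropout(0.5)(outputs)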
    
    return outputs
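
For readers unfamiliar with dropout, the following NumPy sketch shows the usual "inverted dropout" behaviour: during training a random share of activations is zeroed and the survivors are scaled up so the expected value is unchanged, and at inference the input passes through untouched. This is an illustration under that assumption, not the Keras implementation itself, and the function name is made up.

```python
import numpy as np


def inverted_dropout(x, rate, rng, training=True):
    """Zero a fraction `rate` of the entries and rescale the rest by 1 / (1 - rate)."""
    if not training:
        return x
    keep = rng.random(x.shape) >= rate
    return np.where(keep, x / (1.0 - rate), 0.0)


rng = np.random.default_rng(0)
x = np.ones((1000, 128))

out_train = inverted_dropout(x, 0.5, rng)
out_eval = inverted_dropout(x, 0.5, rng, training=False)

print(out_train.mean(), out_eval.mean())
```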

**7. Build the model**

In [9]:
def base_model():
    """
    Build the model
    :return: the compiled model
    """
    # Raw input signal
    x_input = Input(train_x.shape[1:])

    # Extract features from the raw signal
    # ims_1
    ims_1 = ims_layer(x_input, 64, [1, 2])

    # ims_2
    ims_2 = ims_layer(ims_1, 128, [1, 4])

    # Flatten
    flatten = Flatten()(ims_2)

    # fc
    # fc_1
    fc_1 = fc_layer(flatten, 512)
    # fc_2
    fc_2 = fc_layer(fc_1, 128)

    # y_output
    y_output = Dense(units=1, activation="sigmoid")(fc_2)

    # Build the model
    model = Model(inputs=x_input, outputs=y_output)
    # Compile the model
    model.compile(Adam(1e-3), loss='binary_crossentropy', metrics=['binary_accuracy'])
    
    return model
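
The model ends in a single sigmoid unit and is compiled with binary cross-entropy and binary accuracy. The pure-Python sketch below shows what those two quantities compute for a handful of made-up probabilities. The clipping constant `eps` and the 0.5 threshold are assumptions chosen to mirror common defaults, and the function names are illustrative.

```python
import math


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean negative log-likelihood of the true labels under the predicted probabilities."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)


def binary_accuracy(y_true, y_prob, threshold=0.5):
    """Share of samples whose thresholded probability matches the label."""
    hits = sum(int((p > threshold) == bool(t)) for t, p in zip(y_true, y_prob))
    return hits / len(y_true)


# Hypothetical labels and sigmoid outputs
y_true = [0, 1, 1, 0]
y_prob = [0.1, 0.9, 0.4, 0.2]

loss = binary_crossentropy(y_true, y_prob)
acc = binary_accuracy(y_true, y_prob)
print(loss, acc)
```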

In [10]:
base_model().summary()

Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_1 (InputLayer)            [(None, 1, 187, 1)]  0                                            
__________________________________________________________________________________________________
conv2d_2 (Conv2D)               (None, 1, 187, 64)   256         input_1[0][0]                    
__________________________________________________________________________________________________
conv2d (Conv2D)                 (None, 1, 187, 64)   128         input_1[0][0]                    
________________________________________________________________________________

**8. Train the models with K-fold cross-validation**

In [10]:
k_fold = StratifiedKFold(n_splits=5, shuffle=True, random_state=7)

model_list = []

accuracy_list = []

n_fold = 1

# Make sure the directory for the saved weights exists
os.makedirs("./model", exist_ok=True)

for train_index, val_index in k_fold.split(train_x, train_y):
    print("n_fold:{}.".format(n_fold))
    # Training data
    X_train = train_x[train_index]
    Y_train = train_y[train_index]
    # Validation data
    X_val = train_x[val_index]
    Y_val = train_y[val_index]

    # Build the model
    m = base_model()

    if os.path.exists("./model/cnn1_{}.h5".format(n_fold)):
        # Reuse previously trained weights for this fold
        m.load_weights("./model/cnn1_{}.h5".format(n_fold))
    else:
        # Early stopping callback
        early_stop = EarlyStopping(monitor="val_loss", min_delta=0, patience=50, mode="auto", restore_best_weights=True)
        # Train
        m.fit(X_train, Y_train, batch_size=128, validation_data=(X_val, Y_val), epochs=5000, callbacks=[early_stop])
        # Save the weights
        m.save_weights("./model/cnn1_{}.h5".format(n_fold))

    n_fold += 1

    # Evaluate on the validation fold
    _, accuracy = m.evaluate(X_val, Y_val, verbose=0)
    print("Accuracy of the current model: {}.".format(accuracy))

    # Keep the model
    model_list.append(m)

    # Keep the accuracy of this model
    accuracy_list.append(accuracy)

print("Mean accuracy: {}.".format(np.mean(accuracy_list)))

n_fold:1.
Train on 5599 samples, validate on 1401 samples
Epoch 1/5000
...
Epoch 141/5000
Accuracy of the current model: 0.9942898154258728.
n_fold:2.
Train on 5600 samples, validate on 1400 samples
Epoch 1/5000
...
Epoch 98/5000
Accuracy of the current model: 0.9864285588264465.
n_fold:3.
Train on 5600 samples, validate on 1400 samples
Epoch 1/5000
...
Epoch 70/5000
Accuracy of the current model: 0.9885714054107666.
n_fold:4.
Train on 5600 samples, validate on 1400 samples
Epoch 1/5000
...
[output truncated]
...
Epoch 207/5000
Accuracy of the current model: 0.9942816495895386.
Mean accuracy: 0.9895714521408081.
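
Each fold stops at a different epoch because of the `EarlyStopping` callback. The sketch below simulates the patience rule on a made-up list of validation losses: training stops once the loss has failed to improve for `patience` consecutive epochs, and the best epoch is remembered, which is the one whose weights `restore_best_weights=True` would bring back. The function is a simplified stand-in written for this example, not the Keras callback.

```python
def simulate_early_stopping(val_losses, patience, min_delta=0.0):
    """Return (stopped_epoch, best_epoch, best_loss) using 1-based epoch numbers."""
    best_loss = float("inf")
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss - min_delta:
            # Improvement: remember it and reset the patience counter
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch, best_loss
    return len(val_losses), best_epoch, best_loss


# Hypothetical validation losses; the last value is never reached
val_losses = [0.9, 0.7, 0.6, 0.65, 0.61, 0.62, 0.5]
result = simulate_early_stopping(val_losses, patience=3)
print(result)
```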


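**9. Predict on the test data**

In [11]:
# Collect the predictions of each fold's model
res_list = []

for m in model_list:
    res_list.append(m.predict(test_x))

# Average the predicted probabilities and round to a 0/1 label
res = np.round(np.mean(res_list, axis=0), 0).astype("float")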
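
The ensembling step averages the sigmoid outputs of the fold models and rounds the mean to a label. The sketch below replays it on made-up probabilities from three hypothetical models, so the arithmetic can be followed by hand. Note that `np.round` rounds exact halves to the nearest even value, so a mean of exactly 0.5 becomes 0.

```python
import numpy as np

# Hypothetical sigmoid outputs of three models for four test samples, each shaped (4, 1)
res_list = [
    np.array([[0.9], [0.2], [0.6], [0.4]]),
    np.array([[0.8], [0.1], [0.3], [0.7]]),
    np.array([[0.7], [0.3], [0.9], [0.1]]),
]

mean_prob = np.mean(res_list, axis=0)         # average over models -> (4, 1)
res = np.round(mean_prob, 0).astype("float")  # same rounding as the cell above

print(mean_prob.ravel(), res.ravel())
```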

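**10. Generate the submission file**

In [12]:
# Write the submission file for the current model: one "id,label" row per test sample, no header
submission = pd.DataFrame({"id": np.arange(res.shape[0]), "label": res.reshape(-1)})
submission.to_csv("./sub_cnn.csv", header=False, index=False)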
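
To see what the resulting file looks like, the sketch below writes the same two-column layout (row id, then label, no header) to an in-memory buffer using only the standard library. The helper name `write_submission` is hypothetical, and whether the organisers expect exactly this layout is an assumption based on the cell above.

```python
import csv
import io


def write_submission(labels, stream):
    """Write one `id,label` row per prediction, without a header row."""
    writer = csv.writer(stream, lineterminator="\n")
    for i, label in enumerate(labels):
        writer.writerow([i, float(label)])


buffer = io.StringIO()
write_submission([1.0, 0.0, 1.0], buffer)
content = buffer.getvalue()
print(content)
```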