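Gesture Recognition with Deep Learning
=========

Overview
===
- Convert videos to images

- Global variable settings
- Image resizing
- One-hot encoding
- DataLoader definition
- Network architecture definition
- Standard training, testing, and prediction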

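Quick Start
====
- __Step 1: Set up the environment__

    Python 3 (the code calls `os.makedirs(..., exist_ok=True)`, which requires Python 3.2+), numpy  
    torch, torchvision, and the CUDA/GPU setup: this varies from machine to machine and can be tedious  
    If a package will not install with pip, try conda, and vice versa  
    Installing the three packages below is enough to generate the dataset  
    ```bash
    pip install opencv-python
    pip install opencv-contrib-python
    conda install pillow
    ```

- __Step 2: Convert videos to images__

    Run the video-to-image code to convert the videos in video_src/ into images in picture_src/
    
- __Step 3: Resize the images__

    The images in picture_src/ are resized and saved to dataset/train/; after that, the images in picture_src/ can be deleted (by hand)  
    (Inside lmc2.py you can choose whether to save to train, test, or predict)  
    A lazy alternative: save everything to train, then manually move part of the images in dataset/train/ into dataset/test/  
    dataset/predict/ is not used during training  

- __Step 4: Train the model__

    The training progress is printed to the console

- __Step 5: Test the model__

    The results are printed to the console (if the accuracy is low, consider adding more training data or modifying the model)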


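Disclaimer
===
This framework was pieced together from several other authors on GitHub, and I no longer remember exactly which ones. It is intended for my own use only.

# Required: global variable settings
Just run this cell once.
* All possible labels are listed in ALL_CHAR_SET
* The label length of each image is given by MAX_CAPTCHA
* The image size is determined by IMAGE_HEIGHT and IMAGE_WIDTH
* The paths of the three datasets (train, test, predict) are defined here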

In [1]:
# -*- coding: UTF-8 -*-
import os


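# As written, this is binary classification. Change ALL_CHAR_SET to change the number of classes; digits, uppercase letters, and lowercase letters are supported
ALL_CHAR_SET = ['0', '1']
ALL_CHAR_SET_LEN = len(ALL_CHAR_SET)
MAX_CAPTCHA = 1  # this code was adapted from a CAPTCHA-recognition framework; with a value of 1 it is an ordinary classification problem

# Image size
IMAGE_HEIGHT = 64
IMAGE_WIDTH = 64

TRAIN_DATASET_PATH = 'dataset' + os.path.sep + 'train'  # location of the final training set
TEST_DATASET_PATH = 'dataset' + os.path.sep + 'test'  # location of the final test set
PREDICT_DATASET_PATH = 'dataset' + os.path.sep + 'predict'  # put a few images here to inspect the model's output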

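# Required: one-hot encoding

In [2]:
# -*- coding: UTF-8 -*-
import numpy as np

# One-hot encoding is done here

# Helper function: uses ord() to turn each character into a number
# Digits map to themselves (0-9), then uppercase letters (10-35), then lowercase letters (36-61); the underscore maps to 62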
'''
def char2pos(c):
    if c == '_':
        k = 62
        return k
    k = ord(c)-48
    if k > 9:
        k = ord(c) - 65 + 10
        if k > 35:
            k = ord(c) - 97 + 26 + 10
            if k > 61:
                raise ValueError('error')
    return k
'''
def char2pos(c):
    k = ord(c)
    
    if 48<= k <=57:  # map digits to 0~9
        k = k - 48
    elif 65<= k <=90:  # map uppercase letters to 10~35
        k = k - 65 + 10
    elif 97 <= k <= 122: # map lowercase letters to 36~61
        k = k - 97 + 10 + 26
    elif k == 95:  # map the underscore '_' to 62
        k = 62
    else:
        raise ValueError('error')
    return k
    

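def encode(text):  # string to one-hot vector
    # The framework supports multi-character labels, but single-character labels work too
    vector = np.zeros(ALL_CHAR_SET_LEN * MAX_CAPTCHA, dtype=float)
    # Loop over every character in the string; with a single-character label the loop runs only once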
    for i, c in enumerate(text):
        idx = i * ALL_CHAR_SET_LEN + char2pos(c)
        vector[idx] = 1.0
    return vector


def decode(vec):  # one-hot vector back to string
    char_pos = vec.nonzero()[0]
    text = []
    for i, c in enumerate(char_pos):
        char_at_pos = i  # c/63
        char_idx = c % ALL_CHAR_SET_LEN
        if char_idx < 10:
            char_code = char_idx + ord('0')
        elif char_idx <36:
            char_code = char_idx - 10 + ord('A')
        elif char_idx < 62:
            char_code = char_idx - 36 + ord('a')
        elif char_idx == 62:
            char_code = ord('_')
        else:
            raise ValueError('error')
        text.append(chr(char_code))
    return "".join(text)


if __name__ == '__main__':  # quick test
    e = encode("1")
    print(e)
    print(decode(e))


[0. 1.]
1
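
The `char2pos` mapping above assumes a fixed 63-symbol alphabet (digits, uppercase, lowercase, underscore), so its indices fit the one-hot vector only when `ALL_CHAR_SET` is a prefix of that alphabet, such as `['0', '1']`. A minimal sketch of an alternative that looks positions up in the label list itself is given here. `EXAMPLE_LABELS`, `EXAMPLE_LENGTH`, `encode_label` and `decode_label` are hypothetical names, not part of the original code, and plain lists stand in for the NumPy vector.

```python
# Hypothetical alternative to char2pos/encode/decode: positions come from
# the label list itself, so any set of single-character labels works.
EXAMPLE_LABELS = ['a', 'b', 'c']   # assumption: example labels, not the original ['0', '1']
EXAMPLE_LENGTH = 2                 # assumption: two characters per label in this example


def encode_label(text, labels=EXAMPLE_LABELS, length=EXAMPLE_LENGTH):
    """Return a flat one-hot list of size length * len(labels)."""
    if len(text) != length:
        raise ValueError('label must have exactly %d characters' % length)
    vector = [0.0] * (len(labels) * length)
    for i, c in enumerate(text):
        # list.index raises ValueError if c is not a known label
        vector[i * len(labels) + labels.index(c)] = 1.0
    return vector


def decode_label(vector, labels=EXAMPLE_LABELS, length=EXAMPLE_LENGTH):
    """Invert encode_label by reading the hot position in each block."""
    chars = []
    for i in range(length):
        block = vector[i * len(labels):(i + 1) * len(labels)]
        chars.append(labels[block.index(max(block))])
    return ''.join(chars)


encoded = encode_label('ca')
print(encoded)                 # [0.0, 0.0, 1.0, 1.0, 0.0, 0.0]
print(decode_label(encoded))   # ca
```

Each character owns one block of `len(labels)` slots, which is the same `i * ALL_CHAR_SET_LEN + position` layout that `encode` uses.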


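# Convert videos to images
* Run the code in this section to convert the videos into images stored in the "picture_src" folder. Do not process too many videos at once (high-resolution frames take up a lot of space)
* If the picture_src folder does not exist, it is created

Requirements:
* All videos to be converted are stored in the ./video_src folder
* Every video is named 'number_anything', where the number is used as the label. Currently 0: left, 1: right
* To avoid overwriting files with the same name, do not convert videos of the same gesture in separate batches (many videos of the same gesture may, however, be placed in video_src/ and converted in a single run)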


In [3]:
import cv2
import os


def save_img():
    c = 0
    video_path = './video_src'  # input videos go in the ./video_src folder
    folder_name = "picture_src"  # output images are written to the ./picture_src folder
    os.makedirs(folder_name, exist_ok=True)
    pic_path = folder_name + '/'

    videos = os.listdir(video_path)
    for video_name in videos:  # loop over all video file names
        sign_name = video_name.split('_')[0]  # part of the file name before the first underscore (the gesture label)

        vc = cv2.VideoCapture(video_path+'/'+video_name)  # open the video
        have_next = vc.isOpened()
        while have_next:
            c = c + 1  # running counter keeps file names unique within one run
            have_next, frame = vc.read()  # read one frame
            if have_next:
                cv2.imwrite(pic_path + sign_name + '_' + str(c) + '.png', frame)
                cv2.waitKey(1)
            else:
                break
        vc.release()
        print(video_name + 'save_success in:')
        print(folder_name)


save_img()


0_left0.mp4save_success in:
picture_src
0_left1.mp4save_success in:
picture_src
1_right0.mp4save_success in:
picture_src
1_right1.mp4save_success in:
picture_src
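
The frame counter `c` in `save_img` restarts on every run, which is why converting the same gesture in separate batches can overwrite earlier images. One possible way around this, sketched here under the assumption that video file names are unique, is to include the video's own name in each output file name. `frame_filename` is a hypothetical helper, not part of the original code.

```python
import os


def frame_filename(video_name, frame_index):
    """Build '<video stem>_<frame index>.png' for one extracted frame.

    The stem already starts with the label (the text before the first
    underscore), so split('_')[0] on the result still returns the label.
    Dots inside the stem are replaced because later steps cut the name
    at the first dot.
    """
    stem = os.path.splitext(video_name)[0]    # '0_left0.mp4' -> '0_left0'
    stem = stem.replace('.', '-')
    return '%s_%06d.png' % (stem, frame_index)


print(frame_filename('0_left0.mp4', 1))    # 0_left0_000001.png
print(frame_filename('1_right1.mp4', 42))  # 1_right1_000042.png
```

With names built this way, the counter could restart for every video without two frames ever sharing a file name.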


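# Resize the images and place them in the dataset folders
Running this code resizes the images in ./picture_src

and places each one at random in dataset/train/ (about 80%) or dataset/test/ (about 20%)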

In [4]:
from PIL import Image
import os
import random
# Resize each image and save it to the chosen folder.
# Each image goes at random to TRAIN (about 80%) or TEST (about 20%);
# set folder_name to PREDICT_DATASET_PATH to fill dataset/predict/ instead.

def get_picture_size(inpath):
    im = Image.open(inpath)
    x, y = im.size
    return x, y

def resize_picture(inpath):
    im = Image.open(inpath)
    # Image.LANCZOS is the same filter as the older Image.ANTIALIAS alias, which Pillow 10 removed
    im = im.resize((IMAGE_WIDTH, IMAGE_HEIGHT), Image.LANCZOS)  # the output size can be adjusted in lmc3 (64*64 was chosen arbitrarily)
    filename = inpath.split('/')[-1].split('.')[0]
    if random.random() < 0.8:
        folder_name = TRAIN_DATASET_PATH  # output folder for the resized images; the training set dataset/train/ by default
    else:
        folder_name = TEST_DATASET_PATH
    os.makedirs(folder_name, exist_ok=True)
    outpath = folder_name + '/' + filename + '.png'
    im.save(outpath)


os.makedirs("dataset", exist_ok=True)
folder = './picture_src'  # folder containing the full-size images
pictures = os.listdir(folder)
for picture_name in pictures:
    picpath = folder + '/' + picture_name
    resize_picture(picpath)

# Define the DataLoader

In [5]:
# -*- coding: UTF-8 -*-
import os
from torch.utils.data import DataLoader, Dataset
import torchvision.transforms as transforms
from PIL import Image





# Preprocessing: convert to grayscale, then to a tensor
transform = transforms.Compose([
    transforms.Grayscale(),
    transforms.ToTensor(),
])

# Subclass of torch.utils.data.Dataset
class mydataset(Dataset):

    def __init__(self, folder, transform=None):
        self.train_image_file_paths = [os.path.join(folder, image_file) for image_file in os.listdir(folder)]
        self.transform = transform

    def __len__(self):
        return len(self.train_image_file_paths)

    def __getitem__(self, idx):
        image_root = self.train_image_file_paths[idx]
        image_name = image_root.split(os.path.sep)[-1]
        image = Image.open(image_root)
        if self.transform is not None:
            image = self.transform(image)
        label = encode(image_name.split('_')[0])  # images are named "label_suffix.png"; one-hot encode the label
        return image, label  # return the (tensor, one-hot label) pair


def get_train_data_loader():
    dataset = mydataset(TRAIN_DATASET_PATH, transform=transform)
    return DataLoader(dataset, batch_size=64, shuffle=True)  # the first argument of DataLoader is a Dataset instance


def get_test_data_loader():
    dataset = mydataset(TEST_DATASET_PATH, transform=transform)
    return DataLoader(dataset, batch_size=1, shuffle=True)


def get_predict_data_loader():
    dataset = mydataset(PREDICT_DATASET_PATH, transform=transform)
    return DataLoader(dataset, batch_size=1, shuffle=False)  # not shuffled here, so the outputs can be compared in order
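
`mydataset` lists every entry in the folder and parses the label from the text before the first underscore, so a stray file such as `.DS_Store`, or an image with an unexpected name, would likely fail at load time. A small sketch of a pre-check that could be run on `os.listdir(folder)` first is given here. `split_valid_files` and `IMAGE_EXTENSIONS` are hypothetical names, and the list of extensions is an assumption.

```python
import os

IMAGE_EXTENSIONS = ('.png', '.jpg', '.jpeg')   # assumption: formats worth keeping


def split_valid_files(filenames, allowed_labels):
    """Separate names of the form '<label>_<suffix>.<ext>' from everything else."""
    valid, rejected = [], []
    for name in filenames:
        stem, ext = os.path.splitext(name)
        label = stem.split('_')[0]
        if ext.lower() in IMAGE_EXTENSIONS and '_' in stem and label in allowed_labels:
            valid.append(name)
        else:
            rejected.append(name)
    return valid, rejected


valid, rejected = split_valid_files(
    ['0_17.png', '1_203.PNG', '.DS_Store', 'notes.txt', '7_5.png'],
    allowed_labels=['0', '1'],
)
print(valid)     # ['0_17.png', '1_203.PNG']
print(rejected)  # ['.DS_Store', 'notes.txt', '7_5.png']
```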

# Build the neural network

In [6]:
# -*- coding: UTF-8 -*-
import torch.nn as nn


# CNN model (modeled on VGG, but not as deep because of limited compute)
class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.layer1 = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1, stride=2),
            # The core feature of MobileNet: a depthwise convolution immediately followed by a pointwise convolution
            nn.Conv2d(32,32, kernel_size=3, padding=1, stride=1, groups=32),
            nn.Conv2d(32, 128, kernel_size=1, padding=0, stride=2),
            nn.BatchNorm2d(128)
        )
        self.layer2 = nn.Sequential(
            nn.Conv2d(128, 128, kernel_size=3, padding=1, stride=1, groups=128),
            nn.Conv2d(128, 256, kernel_size=1, padding=0, stride=1),
            nn.BatchNorm2d(256),
        )
        self.layer3 = nn.Sequential(
            nn.Conv2d(256, 256, kernel_size=3, padding=1, stride=1, groups=256),
            nn.BatchNorm2d(256),
            nn.Dropout(0.5),  # drop 50% of the neurons; my own addition... this is a cut-down version anyway, so it only needs to convey the idea
            nn.ReLU()
        )
        
        self.avgpool = nn.AdaptiveAvgPool2d(1)
        
        self.fc = nn.Sequential(
            nn.Linear(256, MAX_CAPTCHA * ALL_CHAR_SET_LEN),  # final output is a vector of length MAX_CAPTCHA * ALL_CHAR_SET_LEN
        )

    def forward(self, x):
        out = self.layer1(x)
        out = self.layer2(out)
        out = self.layer3(out)
        out = self.avgpool(out)
        out = out.view(out.size(0), -1)
        out = self.fc(out)
        return out
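
To see why the fully connected layer takes 256 inputs, the feature-map sizes can be traced by hand. The sketch below applies the standard convolution output-size formula, floor((size + 2 * padding - kernel) / stride) + 1, to the layer settings copied from the `CNN` class above, assuming a 64x64 grayscale input. `conv_out` is a hypothetical helper; BatchNorm, Dropout and ReLU are left out because they do not change the shape.

```python
def conv_out(size, kernel, stride, padding):
    """Output length of a convolution along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


# (name, kernel_size, stride, padding, out_channels), copied from the CNN class
layers = [
    ('layer1 conv 3x3',      3, 2, 1, 32),
    ('layer1 depthwise 3x3', 3, 1, 1, 32),
    ('layer1 pointwise 1x1', 1, 2, 0, 128),
    ('layer2 depthwise 3x3', 3, 1, 1, 128),
    ('layer2 pointwise 1x1', 1, 1, 0, 256),
    ('layer3 depthwise 3x3', 3, 1, 1, 256),
]

size = 64  # IMAGE_HEIGHT == IMAGE_WIDTH == 64
for name, kernel, stride, padding, channels in layers:
    size = conv_out(size, kernel, stride, padding)
    print('%-22s -> %3d x %2d x %2d' % (name, channels, size, size))

# The last feature map is 256 x 16 x 16. AdaptiveAvgPool2d(1) reduces each
# of the 256 maps to 1 x 1, so the flattened vector has 256 entries.
```

The two stride-2 convolutions in `layer1` account for all of the downsampling, from 64 to 32 to 16.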



# Training loop

In [7]:
# -*- coding: UTF-8 -*-
import torch
import torch.nn as nn

# Tunable hyperparameters
num_epochs = 30
batch_size = 64  # informational only: get_train_data_loader() hard-codes batch_size=64
learning_rate = 0.001


def main():
    cnn = CNN()  # build the model
    cnn = cnn.to("cuda")  # move it to the GPU
    cnn.train()  # switch to training mode
    print('init net')
    criterion = nn.MultiLabelSoftMarginLoss()  # loss function
    optimizer = torch.optim.Adam(cnn.parameters(), lr=learning_rate)  # optimizer

    # Train the Model
    train_dataloader = get_train_data_loader()  # data loader
    for epoch in range(num_epochs):
        for i, (images, labels) in enumerate(train_dataloader):  # training loop
            images = images.to("cuda")  # move the input batch to the GPU
            labels = labels.to("cuda").float()  # move the labels to the GPU as float32
            predict_labels = cnn(images)  # forward pass
            loss = criterion(predict_labels, labels)  # compute the loss with the criterion prepared above
            optimizer.zero_grad()  # clear the old gradients
            loss.backward()  # compute the new gradients
            optimizer.step()  # update the model with those gradients
            if (i+1) % 10 == 0:
                print("epoch:", epoch, "step:", i, "loss:", loss.item())  # log every 10 mini-batches
            if (i+1) % 100 == 0:
                torch.save(cnn.state_dict(), "./model.pkl")   # save the model every 100 mini-batches
                print("save model")
        print("epoch:", epoch, "step:", i, "loss:", loss.item())  # log the loss at the end of every epoch
    torch.save(cnn.state_dict(), "./model.pkl")   # save the final model as model.pkl
    print("save last model")

if __name__ == '__main__':
    main()




init net
epoch: 0 step: 9 loss: 0.6209810972213745
epoch: 0 step: 19 loss: 0.5489240884780884
epoch: 0 step: 29 loss: 0.4909036159515381
epoch: 0 step: 39 loss: 0.40799814462661743
epoch: 0 step: 49 loss: 0.33559703826904297
epoch: 0 step: 59 loss: 0.24925178289413452
epoch: 0 step: 69 loss: 0.20021595060825348
epoch: 0 step: 79 loss: 0.14480940997600555
epoch: 0 step: 89 loss: 0.13387715816497803
epoch: 0 step: 89 loss: 0.13387715816497803
epoch: 1 step: 9 loss: 0.10757225006818771
epoch: 1 step: 19 loss: 0.10288936644792557
epoch: 1 step: 29 loss: 0.06859497725963593
epoch: 1 step: 39 loss: 0.08062350749969482
epoch: 1 step: 49 loss: 0.05666393041610718
epoch: 1 step: 59 loss: 0.08754513412714005
epoch: 1 step: 69 loss: 0.07993736863136292
epoch: 1 step: 79 loss: 0.043985359370708466
epoch: 1 step: 89 loss: 0.05890776589512825
epoch: 1 step: 89 loss: 0.05890776589512825
epoch: 2 step: 9 loss: 0.0261266827583313
epoch: 2 step: 19 loss: 0.05119902640581131
epoch: 2 step: 29 loss: 0.039

epoch: 18 step: 19 loss: 0.0025314721278846264
epoch: 18 step: 29 loss: 0.0077795591205358505
epoch: 18 step: 39 loss: 0.00398209085687995
epoch: 18 step: 49 loss: 0.00116478162817657
epoch: 18 step: 59 loss: 0.0006718744407407939
epoch: 18 step: 69 loss: 0.00034754647640511394
epoch: 18 step: 79 loss: 0.0034520013723522425
epoch: 18 step: 89 loss: 0.0007643852732144296
epoch: 18 step: 89 loss: 0.0007643852732144296
epoch: 19 step: 9 loss: 0.000986177008599043
epoch: 19 step: 19 loss: 0.0006843197625130415
epoch: 19 step: 29 loss: 0.004930164664983749
epoch: 19 step: 39 loss: 0.0011308719404041767
epoch: 19 step: 49 loss: 0.004874060396105051
epoch: 19 step: 59 loss: 0.0004722331650555134
epoch: 19 step: 69 loss: 0.0002715354785323143
epoch: 19 step: 79 loss: 0.001906530000269413
epoch: 19 step: 89 loss: 0.006409444846212864
epoch: 19 step: 89 loss: 0.006409444846212864
epoch: 20 step: 9 loss: 0.0032792892307043076
epoch: 20 step: 19 loss: 0.0015005844179540873
epoch: 20 step: 29 loss:
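
The loss values in this log can be sanity-checked by hand. As far as I understand the documented formula, `nn.MultiLabelSoftMarginLoss` averages a sigmoid cross-entropy over the classes of each sample (and then over the batch). The sketch below reimplements the per-sample part in plain Python; `multilabel_soft_margin` is a hypothetical name and the logits are made up.

```python
import math


def multilabel_soft_margin(logits, targets):
    """Per-sample loss: mean over classes of the sigmoid cross-entropy."""
    total = 0.0
    for x, y in zip(logits, targets):
        p = 1.0 / (1.0 + math.exp(-x))          # sigmoid of the logit
        total += -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
    return total / len(logits)


target = [0.0, 1.0]   # one-hot label for class '1'
print(multilabel_soft_margin([0.0, 0.0], target))    # undecided model: log(2), about 0.693
print(multilabel_soft_margin([-6.0, 6.0], target))   # confident and correct: about 0.0025
```

An untrained network that outputs logits near zero therefore starts near 0.69, which is in line with the first logged value of about 0.62, and values around 0.001 correspond to confident, correct predictions.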

# Evaluate on the test set

In [8]:
# -*- coding: UTF-8 -*-
import numpy as np
import torch


def main():
    cnn = CNN()
    cnn.eval()
    # map_location lets a checkpoint saved from the GPU load onto the CPU
    cnn.load_state_dict(torch.load('model.pkl', map_location='cpu'))
    print("load cnn net.")

    test_dataloader = get_test_data_loader()

    correct = 0
    total = 0
    with torch.no_grad():  # no gradients are needed for evaluation
        for i, (images, labels) in enumerate(test_dataloader):  # iterating the loader calls the overridden __getitem__
            predict_label = cnn(images)

            c0 = ALL_CHAR_SET[np.argmax(predict_label[0, 0:ALL_CHAR_SET_LEN].data.numpy())]

            predict_label = '%s' % (c0)
            true_label = decode(labels.numpy()[0])
            total += labels.size(0)
            if predict_label == true_label:
                correct += 1
            if total % 200 == 0:
                print('Test Accuracy of the model on the %d test images: %f %%' % (total, 100 * correct / total))
    print('Test Accuracy of the model on the %d test images: %f %%' % (total, 100 * correct / total))

if __name__ == '__main__':
    main()




load cnn net.
Test Accuracy of the model on the 200 test images: 100.000000 %
Test Accuracy of the model on the 400 test images: 100.000000 %
Test Accuracy of the model on the 600 test images: 100.000000 %
Test Accuracy of the model on the 800 test images: 100.000000 %
Test Accuracy of the model on the 1000 test images: 100.000000 %
Test Accuracy of the model on the 1200 test images: 100.000000 %
Test Accuracy of the model on the 1400 test images: 100.000000 %
Test Accuracy of the model on the 1529 test images: 100.000000 %


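# Use the model in practice
First, manually place some 64x64 images in the dataset/predict/ folder. Because the dataset class still parses a label from every file name, these files must also follow the `label_suffix` naming pattern.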

In [10]:
# -*- coding: UTF-8 -*-
import numpy as np
import torch

def main():
    cnn = CNN()
    cnn.eval()
    # map_location lets a checkpoint saved from the GPU load onto the CPU
    cnn.load_state_dict(torch.load('model.pkl', map_location='cpu'))
    print("load cnn net.")

    predict_dataloader = get_predict_data_loader()

    with torch.no_grad():  # no gradients are needed for prediction
        for i, (images, labels) in enumerate(predict_dataloader):
            predict_label = cnn(images)

            c0 = ALL_CHAR_SET[np.argmax(predict_label[0, 0:ALL_CHAR_SET_LEN].data.numpy())]
            # c1 = ALL_CHAR_SET[np.argmax(predict_label[0, ALL_CHAR_SET_LEN:2 * ALL_CHAR_SET_LEN].data.numpy())]
            # c2 = ALL_CHAR_SET[np.argmax(predict_label[0, 2 * ALL_CHAR_SET_LEN:3 * ALL_CHAR_SET_LEN].data.numpy())]
            # c3 = ALL_CHAR_SET[np.argmax(predict_label[0, 3 * ALL_CHAR_SET_LEN:4 * ALL_CHAR_SET_LEN].data.numpy())]

            c = '%s' % (c0)  # this was originally a four-character framework; with one character left, this line looks redundant...
            print('No',i,'  sign=',c)


if __name__ == '__main__':
    main()




load cnn net.
No 0   sign= 0
No 1   sign= 0
No 2   sign= 0
No 3   sign= 0
No 4   sign= 0
No 5   sign= 1
No 6   sign= 1
No 7   sign= 1
No 8   sign= 1
No 9   sign= 1
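
The commented-out `c1` to `c3` lines show how the original four-character framework read its output: each character owns one block of `ALL_CHAR_SET_LEN` logits, and the largest value in a block picks the character. A minimal sketch that generalizes this to any label length is given here. `decode_logits` is a hypothetical helper, plain lists stand in for tensors, and the logits are made up.

```python
def decode_logits(logits, char_set, max_captcha):
    """Pick the highest-scoring character in each block of len(char_set) logits."""
    n = len(char_set)
    if len(logits) != n * max_captcha:
        raise ValueError('expected %d logits, got %d' % (n * max_captcha, len(logits)))
    chars = []
    for pos in range(max_captcha):
        block = logits[pos * n:(pos + 1) * n]
        best = max(range(n), key=lambda k: block[k])   # argmax without NumPy
        chars.append(char_set[best])
    return ''.join(chars)


# One character over ['0', '1'], as in this README
print(decode_logits([-2.3, 4.1], ['0', '1'], 1))             # 1
# Two characters over the same set
print(decode_logits([3.0, -1.0, 0.2, 0.9], ['0', '1'], 2))   # 01
```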


# An easy problem, and the results are very good~