Multi-label license plate recognition

Some references:
https://discuss.pytorch.org/t/multi-label-classification-in-pytorch/905/16
Multi Label Classification in pytorch

https://discuss.pytorch.org/t/calculating-accuracy-for-a-multi-label-classification-problem/2303
Calculating accuracy for a multi-label classification problem

https://discuss.pytorch.org/t/equivalent-of-tensorflows-sigmoid-cross-entropy-with-logits-in-pytorch/1985
Equivalent of TensorFlow’s Sigmoid Cross Entropy With Logits in Pytorch

https://www.kaggle.com/mratsim/starting-kit-for-pytorch-deep-learning
Starting Kit for PyTorch Deep Learning

http://stackoverflow.com/questions/34240703/difference-between-tensorflow-tf-nn-softmax-and-tf-nn-softmax-cross-entropy-with
difference between tensorflow tf.nn.softmax and tf.nn.softmax_cross_entropy_with_logits



In [11]:
import os
import os.path
import random
import cv2
import math
from scipy import ndimage
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image

import torch
import torch.nn as nn
import torch.nn.parallel
import torch.backends.cudnn as cudnn
import torch.nn.functional as F
import torch.optim as optim
from torch.autograd import Variable
import torchvision
import torchvision.transforms as transforms
import torchvision.datasets as datasets
import torch.utils.data as torch_utils_data

In [12]:
DIGITS = "0123456789"
LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
#PROVINCE="黑吉辽京津内冀鲁豫徽苏沪浙赣闽粤鄂湘云贵川渝藏青宁新陕甘宁晋" #30
CHARS = LETTERS + DIGITS
NPLEN=7
NUM_CLASSES=len(CHARS)*NPLEN

In [None]:
conv=nn.Sequential(
            nn.Conv2d(1,64,kernel_size=3,padding=1), #layer1, inputs single channel,224*224
            nn.ReLU(inplace=True),
            nn.Conv2d(64,64,kernel_size=3,padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2,stride=2),
            nn.Conv2d(64,128,kernel_size=3,padding=1), #layer2 inputs 64 channel,112*112
            nn.ReLU(inplace=True),
            nn.Conv2d(128,128,kernel_size=3,padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2,stride=2),
            nn.Conv2d(128,256,kernel_size=3,padding=1), #layer3 inputs 128 channel,56*56
            nn.ReLU(inplace=True),
            nn.Conv2d(256,256,kernel_size=3,padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256,256,kernel_size=3,padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2,stride=2),
            nn.Conv2d(256,512,kernel_size=3,padding=1), #layer4 inputs 256 channel,28*28
            nn.ReLU(inplace=True),
            nn.Conv2d(512,512,kernel_size=3,padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(512,512,kernel_size=3,padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2,stride=2),
            nn.Conv2d(512,512,kernel_size=3,padding=1), #layer5 inputs 512 channel,14*14
            nn.ReLU(inplace=True),
            nn.Conv2d(512,512,kernel_size=3,padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(512,512,kernel_size=3,padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2,stride=2)
    )

class vgg16train(nn.Module):
    def __init__(self): #36*7+1=253   36*6+1=217
        super(vgg16train,self).__init__()
        self.features=conv
        self.classifier=nn.Sequential(
            nn.Linear(512 * 7 * 7, 4096),
            nn.ReLU(True),
            nn.Dropout(),
            nn.Linear(4096, 2048),
            nn.ReLU(True),
            nn.Dropout(),
            nn.Linear(2048, NUM_CLASSES)
        )
        #initialize_weights
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
                m.weight.data.normal_(0, math.sqrt(2. / n))
                if m.bias is not None:
                    m.bias.data.zero_()
            elif isinstance(m, nn.BatchNorm2d):
                m.weight.data.fill_(1)
                m.bias.data.zero_()
            elif isinstance(m, nn.Linear):
                n = m.weight.size(1)
                m.weight.data.normal_(0, 0.01)
                m.bias.data.zero_()
    def forward(self,x):
        x=self.features(x)
        x=x.view(x.size(0),-1)
        x=self.classifier(x)
        return x
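The 512 * 7 * 7 input size of the first Linear layer follows from the pooling arithmetic: each 3x3 convolution with padding=1 keeps the spatial size, and each of the five 2x2/stride-2 max-pools halves it, so 224 becomes 7. A small pure-Python sketch of that arithmetic (the helpers `conv_out` and `vgg_feature_size` are illustrative names, not from the original) also shows that the 64x128 plate images used later would give a 2x4 map, so this classifier as written presumably expects 224x224 input.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length along one axis for Conv2d / MaxPool2d (dilation 1, floor rounding)."""
    return (size + 2 * padding - kernel) // stride + 1

def vgg_feature_size(size, n_blocks=5):
    """Spatial size after the five VGG16 blocks (illustrative helper)."""
    for _ in range(n_blocks):
        size = conv_out(size, kernel=3, padding=1)  # 3x3 conv, padding=1: size unchanged
        size = conv_out(size, kernel=2, stride=2)   # 2x2 max-pool, stride 2: size halved
    return size

side = vgg_feature_size(224)
print(side, 512 * side * side)                        # 7 25088  (224x224 input)
print(vgg_feature_size(64), vgg_feature_size(128))    # 2 4      (64x128 plate: height, width)
```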
    
class vgg16detect(nn.Module):
    def __init__(self):
        super(vgg16detect,self).__init__()
        self.features=conv
        self.classifier=nn.Sequential(
            nn.Conv2d(512,4096,kernel_size=7,padding=0),  #7*7 map -> 1*1, acts like Linear(512*7*7,4096); padding=1 would give 3*3
            nn.ReLU(inplace=True),
            nn.Conv2d(4096,2048,kernel_size=3,padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(2048,NUM_CLASSES,kernel_size=1),    #in_channels must match the 2048 outputs above; no padding
            #nn.ReLU(inplace=True),
        )
    def forward(self,x):
        x=self.features(x)
        x=self.classifier(x)      #Conv2d needs the 4-D (N,C,H,W) map, so do not flatten before it
        x=x.view(x.size(0),-1)    #flatten afterwards: (N,NUM_CLASSES) for a 224*224 input
        return x
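vgg16detect replaces the fully connected layers with convolutions. This works because a Conv2d whose kernel covers the whole feature map computes the same thing as a Linear layer applied to the flattened map. A minimal sketch with small stand-in sizes (C=4, H=W=3 and 5 outputs instead of 512, 7, 7 and NUM_CLASSES; `linear`, `conv_head` and `padded_head` are illustrative names) copies the Linear weights into the convolution and checks that both give the same result:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
C, H, W, OUT = 4, 3, 3, 5   # small stand-ins for 512, 7, 7, NUM_CLASSES

linear = nn.Linear(C * H * W, OUT)
conv_head = nn.Conv2d(C, OUT, kernel_size=(H, W), padding=0)  # kernel covers the whole map

# Flattening (C, H, W) row-major matches Conv2d's (out, in, kH, kW) weight layout,
# so the Linear weights can be reshaped and copied over directly.
with torch.no_grad():
    conv_head.weight.copy_(linear.weight.view(OUT, C, H, W))
    conv_head.bias.copy_(linear.bias)

x = torch.randn(2, C, H, W)
y_fc = linear(x.view(x.size(0), -1))   # (2, OUT)
y_conv = conv_head(x)                  # (2, OUT, 1, 1)
print(y_conv.shape, torch.allclose(y_fc, y_conv.view(2, -1), atol=1e-5))

# The same kernel with padding=1 slides over a padded map and yields 3x3 instead of 1x1.
padded_head = nn.Conv2d(C, OUT, kernel_size=(H, W), padding=1)
print(padded_head(x).shape)
```

With padding=1 the output is a 3x3 map rather than a single score per class, which is why a whole-map kernel is normally used without padding.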

In [None]:
class anprmodel(nn.Module):
    def __init__(self):
        super(anprmodel,self).__init__()
        self.num_classes=NUM_CLASSES
        self.conv1=nn.Conv2d(1,48,kernel_size=5,padding=2)
        self.pool1=nn.MaxPool2d(kernel_size=2,stride=2)
        self.conv2=nn.Conv2d(48,64,kernel_size=5,padding=2)
        self.pool2=nn.MaxPool2d(kernel_size=(2,1),stride=(2,1))      #(kH,kW)
        self.conv3=nn.Conv2d(64,128,kernel_size=5,padding=2)
        self.pool3=nn.MaxPool2d(kernel_size=(2,2),stride=(2,2))        
        self.fc1=nn.Linear(32*8*128,2048)
        self.fc2=nn.Linear(2048,NUM_CLASSES)

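    def forward(self,x):                     # x: (N,1,64,128) = (batch,channel,height,width)
        x=F.relu(self.pool1(self.conv1(x)))  # -> (N,48,32,64)
        x=F.relu(self.pool2(self.conv2(x)))  # -> (N,64,16,64), pool2 halves the height only
        x=F.relu(self.pool3(self.conv3(x)))  # -> (N,128,8,32)
        x=x.view(x.size(0),-1)               # -> (N,128*8*32) = (N,32768)
        #x=x.view(-1,28*28*128)              # unused variant for a 28*28 feature map
        x=F.relu(self.fc1(x))
        x=self.fc2(x)                        # raw logits; MultiLabelSoftMarginLoss applies the sigmoid
        return x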
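The 32*8*128 in fc1 can be checked by pushing a dummy batch through the same convolution and pooling layers. This sketch rebuilds only the conv/pool stack of anprmodel (ReLU is left out because it does not change shapes; `backbone` is an illustrative name) and assumes 64x128 grayscale input, as produced by the resize script further down. The (2,1) pooling in the middle halves only the height, presumably to keep more horizontal resolution for the seven characters.

```python
import torch
import torch.nn as nn

# Same conv/pool stack as anprmodel; padding=2 with 5x5 kernels keeps the size.
backbone = nn.Sequential(
    nn.Conv2d(1, 48, kernel_size=5, padding=2),
    nn.MaxPool2d(kernel_size=2, stride=2),            # 64x128 -> 32x64
    nn.Conv2d(48, 64, kernel_size=5, padding=2),
    nn.MaxPool2d(kernel_size=(2, 1), stride=(2, 1)),  # 32x64 -> 16x64 (height only)
    nn.Conv2d(64, 128, kernel_size=5, padding=2),
    nn.MaxPool2d(kernel_size=2, stride=2),            # 16x64 -> 8x32
)

with torch.no_grad():
    feats = backbone(torch.zeros(2, 1, 64, 128))      # (batch, channel, height, width)
print(feats.shape, feats[0].numel())                  # torch.Size([2, 128, 8, 32]) 32768
```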
        

In [None]:
class NPSET(torch_utils_data.Dataset):
    """Plate images named like xxxxxxxx_xxxxxxx_x.png; the middle field is the 7-character plate code."""

    def __init__(self,root,data_transform=None):
        self.picroot=root
        self.data_transform=data_transform
        if not os.path.exists(self.picroot):
            raise RuntimeError('{} does not exist'.format(self.picroot))
        filenames=sorted(f for f in os.listdir(self.picroot)
                         if os.path.isfile(os.path.join(self.picroot,f)))
        # uint8 (N,height,width): ToTensor turns each 2-D image into a (1,64,128) tensor scaled to [0,1].
        # (A float (1,64,128) array would be treated as H*W*C by ToTensor and come out as (128,1,64).)
        imgs=np.zeros((len(filenames),64,128),dtype=np.uint8)
        labels=[]
        for i,filename in enumerate(filenames):
            im=cv2.imread(os.path.join(self.picroot,filename),cv2.IMREAD_GRAYSCALE)
            if im is None:
                raise RuntimeError('cannot read {}'.format(filename))
            imgs[i]=im
            labels.append(filename.split('_')[1])  #filename style: xxxxxxxx_xxxxxxx_x.png
        self.dataset=imgs
        self.labels=labels
        self.len=len(filenames)

    def code_to_vec(self,code):
        # one one-hot block of len(CHARS) per character, concatenated: length NUM_CLASSES
        def char_to_vec(c):
            y=np.zeros((len(CHARS),),dtype=np.float32)  # np.float is removed in recent NumPy
            y[CHARS.index(c)]=1.0
            return y
        return np.vstack([char_to_vec(c) for c in code]).flatten()

    def __getitem__(self,index):
        label,img=self.labels[index],self.dataset[index]
        if self.data_transform is not None:
            img=self.data_transform(img)
        return img,torch.from_numpy(self.code_to_vec(label))

    def __len__(self):
        return self.len
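The label comes from the file name. A slightly more defensive version of that parsing step might look like the sketch below; `plate_from_filename` and the sample name `00000001_0AV253C_1.png` are hypothetical, and the check assumes every plate has exactly NPLEN characters from CHARS (both redefined here so the block runs on its own).

```python
import os

DIGITS = "0123456789"
LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
CHARS = LETTERS + DIGITS
NPLEN = 7

def plate_from_filename(filename):
    """Hypothetical helper: plate code from a name like 'xxxxxxxx_xxxxxxx_x.png' (middle '_' field)."""
    stem = os.path.splitext(os.path.basename(filename))[0]
    parts = stem.split('_')
    if len(parts) != 3:
        raise ValueError('unexpected file name: {}'.format(filename))
    code = parts[1]
    if len(code) != NPLEN or any(c not in CHARS for c in code):
        raise ValueError('bad plate code {!r} in {}'.format(code, filename))
    return code

print(plate_from_filename('00000001_0AV253C_1.png'))  # hypothetical name -> 0AV253C
```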

In [None]:
model=anprmodel()
#model.features=torch.nn.DataParallel(model.features)
#model.cuda()
#cudnn.benchmark=True
batch_size=4
data_transform=transforms.Compose([transforms.ToTensor(),
                              transforms.Normalize((0.5,), (0.5,)),
                             ])
npset = NPSET(root='/home/wang/git/nppic/su_np-128x64/train', data_transform=data_transform)
nploader = torch.utils.data.DataLoader(npset, batch_size=batch_size, shuffle=True, num_workers=1)  #train
npvalset=NPSET(root='/home/wang/git/nppic/su_np-128x64/val', data_transform=data_transform)
npvalloader=torch.utils.data.DataLoader(npvalset, batch_size=batch_size, shuffle=False, num_workers=1) #validate
criterion=nn.MultiLabelSoftMarginLoss()  #MultiLabelMarginLoss()
#optimizer=torch.optim.SGD(model.parameters(),0.1,momentum=0.9)
optimizer=torch.optim.Adam(model.parameters())
cudnn.benchmark=True

In [None]:
res_sum=0
res_cnt=0
res_avg=0

In [None]:
for epoch in range(0,4):
    #Sets the learning rate to the initial LR decayed by 10 every 30 epochs
    #lr=0.1*(0.1**(epoch//30))
    #for param_group in optimizer.param_groups:
    #    param_group['lr']=lr
    #train
    model.train()
    for i,(inputs,targets) in enumerate(nploader):
        #inputs,targets=inputs.cuda(),targets.cuda()
        output=model(inputs)          #(batch,NUM_CLASSES) logits; Variable wrappers are no longer needed
        #compute loss
        loss=criterion(output,targets)
        
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
        if i% 12 == 0:
             print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                  epoch, i * len(inputs), len(nploader.dataset),
                  100. * i / len(nploader), loss.item()))

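    #validate
    model.eval()
    res_sum=0
    res_cnt=0
    with torch.no_grad():                  #no autograd graph needed (replaces volatile=True)
        for i,(inputs,targets) in enumerate(npvalloader):
            #inputs,targets=inputs.cuda(),targets.cuda()
            output=model(inputs)
            loss=criterion(output,targets)
            #per-character accuracy: split the NUM_CLASSES outputs into NPLEN blocks of len(CHARS)
            #and compare the argmax of each block (each target block is one-hot)
            pred=output.view(-1,NPLEN,len(CHARS)).argmax(dim=2)
            truth=targets.view(-1,NPLEN,len(CHARS)).argmax(dim=2)
            res_sum+=(pred==truth).sum().item()
            res_cnt+=truth.numel()
            if i% 12 == 0:
                 print('Test Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                      epoch, i * len(inputs), len(npvalloader.dataset),
                      100. * i / len(npvalloader), loss.item()))
    res_avg=res_sum/float(res_cnt)
    prec1=res_avg                          #per-character accuracy on the validation set
    print('Val Epoch: {}\tcharacter accuracy: {:.4f}'.format(epoch, prec1))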
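The commented-out lines at the top of the training loop implement a step decay by hand (start at 0.1 and divide the learning rate by 10 every 30 epochs). PyTorch's `torch.optim.lr_scheduler.StepLR` expresses the same schedule. A minimal sketch on a dummy parameter, assuming the commented-out SGD optimizer with lr=0.1 rather than the Adam optimizer actually used; `demo_param`, `demo_opt` and `demo_sched` are illustrative names chosen so they do not overwrite `optimizer`:

```python
import math
import torch

# Dummy parameter so the optimizer has something to manage (illustrative only).
demo_param = torch.nn.Parameter(torch.zeros(1))
demo_opt = torch.optim.SGD([demo_param], lr=0.1, momentum=0.9)
# Multiply the LR by gamma=0.1 every step_size=30 epochs: lr = 0.1 * 0.1 ** (epoch // 30)
demo_sched = torch.optim.lr_scheduler.StepLR(demo_opt, step_size=30, gamma=0.1)

lrs = []
for ep in range(61):
    lrs.append(demo_opt.param_groups[0]['lr'])  # LR in effect during this epoch
    # ... one epoch of training and validation would go here ...
    demo_opt.step()
    demo_sched.step()                           # once per epoch, after the optimizer steps

# Same values as the hand-written formula in the commented-out code.
assert all(math.isclose(lr, 0.1 * 0.1 ** (ep // 30)) for ep, lr in enumerate(lrs))
print(lrs[0], lrs[30], lrs[60])
```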

In [None]:
#Resize the 541 plate-only grayscale images in su_np-256x128 to 128x64 and write them to su_np-128x64
SOURCE='/home/wang/git/nppic/su_np-256x128'
DEST='/home/wang/git/nppic/su_np-128x64'
for parent,dirnames,filenames in os.walk(SOURCE):
    for fname in filenames:
        im=cv2.imread(os.path.join(parent,fname))
        imr=cv2.resize(im,(128,64))   #dsize is (width,height): 128 wide, 64 high
        img=cv2.cvtColor(imr,cv2.COLOR_BGR2GRAY)
        #cv2.imwrite(os.path.join(DEST,fname),img)
print('over')

In [13]:
def code_to_vec(p, code):
    def char_to_vec(c):
        y = np.zeros((len(CHARS),))
        y[CHARS.index(c)] = 1.0
        return y
    c = np.vstack([char_to_vec(c) for c in code])
    return c,c.flatten()
a,b=code_to_vec('1','0AV253C')
print('NUMCLASSES={}'.format(NUM_CLASSES))
print(a.shape,a)
print('-----------------')
print(b.shape,b)
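Decoding goes the other way: split a NUM_CLASSES-long vector (a label, or the model's raw outputs) into NPLEN blocks of len(CHARS) and take the argmax of each block. A minimal NumPy sketch; `plate_to_vec` and `vec_to_plate` are hypothetical helpers (named so they do not shadow `code_to_vec`), and CHARS is redefined as in the cell at the top so the block is self-contained.

```python
import numpy as np

DIGITS = "0123456789"
LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
CHARS = LETTERS + DIGITS
NPLEN = 7

def plate_to_vec(code):
    """One-hot encode each character and concatenate: length NPLEN * len(CHARS)."""
    y = np.zeros((NPLEN, len(CHARS)), dtype=np.float32)
    for pos, c in enumerate(code):
        y[pos, CHARS.index(c)] = 1.0
    return y.flatten()

def vec_to_plate(scores):
    """Decode labels, logits or probabilities: argmax inside each of the NPLEN blocks."""
    blocks = np.asarray(scores).reshape(NPLEN, len(CHARS))
    return ''.join(CHARS[k] for k in blocks.argmax(axis=1))

vec = plate_to_vec('0AV253C')
print(vec.shape, vec_to_plate(vec))
```

Because only the position of the maximum in each block matters, the same decoder works on raw logits; no sigmoid or threshold is needed when every position holds exactly one character.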

NUMCLASSES=238
(7, 34) [[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
   0.  0.  0.  0.  0.  0.  1.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
 [ 1.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
   0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
   0.  1.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
   0.  0.  0.  0.  0.  0.  0.  0.  1.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
   0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  1.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
   0.  0.  0.  0.  0.  0.  0.  0.  0.  1.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  1.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
   0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.

In [18]:
#multi label classification example
import torch
import torch.nn as nn
import numpy as np
import torch.optim as optim
from  torch.autograd import Variable

train=[]
labels=[]
for i in range(10000):
    category=(np.random.choice([0,1]),np.random.choice([0,1]))
    if category==(1,0):
        train.append([np.random.uniform(0.1,1),0])
        labels.append([1,0,1])
    if category==(0,1):
        train.append([0,np.random.uniform(0.1,1)])
        labels.append([0,1,0])
    if category==(0,0):
        train.append([np.random.uniform(0.1,1),np.random.uniform(0.1,1)])
        labels.append([0,0,1])
        
class _classifier(nn.Module):
    def __init__(self,nlabel):
        super(_classifier,self).__init__()
        self.main=nn.Sequential(
            nn.Linear(2,64),
            nn.ReLU(),
            nn.Linear(64,nlabel),
        )
    def forward(self,input):
        return self.main(input)
    
nlabel=len(labels[0])    
classifier=_classifier(nlabel)
optimizer=optim.Adam(classifier.parameters())
criterion=nn.MultiLabelSoftMarginLoss()

epochs=1
for epoch in range(epochs):
    losses=[]
    for i,sample in enumerate(train):
        inputv=Variable(torch.FloatTensor(sample)).view(1,-1)
        labelsv=Variable(torch.FloatTensor(labels[i])).view(1,-1)
        output=classifier(inputv)
        loss=criterion(output,labelsv)
        print(loss)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
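The loop above trains on one sample at a time and prints every loss, which is slow and noisy. A batched sketch of the same toy problem is shown below; the mini-batch size of 64, the learning rate of 0.01, the 5 epochs and the names `toy_net`, `toy_opt` and `toy_loss` are arbitrary choices for illustration, not from the original.

```python
import numpy as np
import torch
import torch.nn as nn

torch.manual_seed(0)
rng = np.random.RandomState(0)

# Same toy data as above ((1, 1) is skipped, as in the original), built as arrays.
xs, ys = [], []
for _ in range(10000):
    category = (rng.randint(2), rng.randint(2))
    if category == (1, 0):
        xs.append([rng.uniform(0.1, 1), 0]); ys.append([1, 0, 1])
    elif category == (0, 1):
        xs.append([0, rng.uniform(0.1, 1)]); ys.append([0, 1, 0])
    elif category == (0, 0):
        xs.append([rng.uniform(0.1, 1), rng.uniform(0.1, 1)]); ys.append([0, 0, 1])
X = torch.from_numpy(np.array(xs, dtype=np.float32))
Y = torch.from_numpy(np.array(ys, dtype=np.float32))

toy_net = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, Y.size(1)))
toy_opt = torch.optim.Adam(toy_net.parameters(), lr=0.01)
toy_loss = nn.MultiLabelSoftMarginLoss()

epoch_losses = []
for ep in range(5):
    perm = torch.randperm(X.size(0))           # reshuffle every epoch
    total = 0.0
    for start in range(0, X.size(0), 64):      # mini-batches of 64 samples
        idx = perm[start:start + 64]
        batch_loss = toy_loss(toy_net(X[idx]), Y[idx])
        toy_opt.zero_grad()
        batch_loss.backward()
        toy_opt.step()
        total += batch_loss.item() * idx.numel()
    epoch_losses.append(total / X.size(0))     # mean loss over the epoch
    print('epoch {}: mean loss {:.4f}'.format(ep, epoch_losses[-1]))
```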

Variable containing:
 0.6687
[torch.FloatTensor of size 1]

Variable containing:
 0.6655
[torch.FloatTensor of size 1]

Variable containing:
 0.7180
[torch.FloatTensor of size 1]

Variable containing:
 0.6769
[torch.FloatTensor of size 1]

Variable containing:
 0.6873
[torch.FloatTensor of size 1]

Variable containing:
 0.6955
[torch.FloatTensor of size 1]

Variable containing:
 0.6590
[torch.FloatTensor of size 1]

Variable containing:
 0.6578
[torch.FloatTensor of size 1]

Variable containing:
 0.6542
[torch.FloatTensor of size 1]

Variable containing:
 0.6639
[torch.FloatTensor of size 1]

Variable containing:
 0.6500
[torch.FloatTensor of size 1]

Variable containing:
 0.6524
[torch.FloatTensor of size 1]

Variable containing:
 0.6486
[torch.FloatTensor of size 1]

Variable containing:
 0.6657
[torch.FloatTensor of size 1]

Variable containing:
 0.6919
[torch.FloatTensor of size 1]

Variable containing:
 0.6755
[torch.FloatTensor of size 1]

Variable containing:
 0.7026
[torch.Floa

https://www.kaggle.com/mratsim/starting-kit-for-pytorch-deep-learning


import pandas as pd
import numpy as np

import os
from PIL import Image

import torch
from torch.utils.data.dataset import Dataset
from torch.utils.data import DataLoader
from torchvision import transforms
from torch import nn
import torch.nn.functional as F
import torch.optim as optim
from torch.autograd import Variable

from sklearn.preprocessing import MultiLabelBinarizer

IMG_PATH = '../input/train-jpg/'
IMG_EXT = '.jpg'
TRAIN_DATA = '../input/train.csv'

class KaggleAmazonDataset(Dataset):
    """Dataset wrapping images and target labels for Kaggle - Planet Amazon from Space competition.

    Arguments:
        A CSV file path
        Path to image folder
        Extension of images
        PIL transforms
    """

    def __init__(self, csv_path, img_path, img_ext, transform=None):
    
        tmp_df = pd.read_csv(csv_path)
        assert tmp_df['image_name'].apply(lambda x: os.path.isfile(img_path + x + img_ext)).all(), \
"Some images referenced in the CSV file were not found"
        
        self.mlb = MultiLabelBinarizer()
        self.img_path = img_path
        self.img_ext = img_ext
        self.transform = transform

        self.X_train = tmp_df['image_name']
        self.y_train = self.mlb.fit_transform(tmp_df['tags'].str.split()).astype(np.float32)

    def __getitem__(self, index):
        img = Image.open(self.img_path + self.X_train[index] + self.img_ext)
        img = img.convert('RGB')
        if self.transform is not None:
            img = self.transform(img)
        
        label = torch.from_numpy(self.y_train[index])
        return img, label

    def __len__(self):
        return len(self.X_train.index)
        
transformations = transforms.Compose([transforms.Resize(32),transforms.ToTensor()])  # Resize replaces the deprecated Scale

dset_train = KaggleAmazonDataset(TRAIN_DATA,IMG_PATH,IMG_EXT,transformations)
        


train_loader = DataLoader(dset_train,
                          batch_size=256,
                          shuffle=True,
                          num_workers=4 # 1 for CUDA
                         # pin_memory=True # CUDA only
                         )



class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3)
        self.conv2_drop = nn.Dropout2d()
        self.fc1 = nn.Linear(2304, 256)
        self.fc2 = nn.Linear(256, 17)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        x = x.view(x.size(0), -1) # Flatten layer
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return torch.sigmoid(x)

model = Net() # On CPU
#model = Net().cuda() # On GPU
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.5)


def train(epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        # data, target = data.cuda(non_blocking=True), target.cuda(non_blocking=True) # On GPU
        data, target = Variable(data), Variable(target)
        optimizer.zero_grad()
        output = model(data)
        loss = F.binary_cross_entropy(output, target)
        loss.backward()
        optimizer.step()
        if batch_idx % 10 == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()))

for epoch in range(1, 2):
    train(epoch)
    


Train Epoch: 1 [0/40479 (0%)]	Loss: 0.690799
Train Epoch: 1 [2560/40479 (6%)]	Loss: 0.686433
Train Epoch: 1 [5120/40479 (13%)]	Loss: 0.681382
Train Epoch: 1 [7680/40479 (19%)]	Loss: 0.673910
Train Epoch: 1 [10240/40479 (25%)]	Loss: 0.667018
Train Epoch: 1 [12800/40479 (31%)]	Loss: 0.656438
Train Epoch: 1 [15360/40479 (38%)]	Loss: 0.645444
Train Epoch: 1 [17920/40479 (44%)]	Loss: 0.628610
Train Epoch: 1 [20480/40479 (50%)]	Loss: 0.600967
Train Epoch: 1 [23040/40479 (57%)]	Loss: 0.570082
Train Epoch: 1 [25600/40479 (63%)]	Loss: 0.520596
Train Epoch: 1 [28160/40479 (69%)]	Loss: 0.465080
Train Epoch: 1 [30720/40479 (75%)]	Loss: 0.412709
Train Epoch: 1 [33280/40479 (82%)]	Loss: 0.365693
Train Epoch: 1 [35840/40479 (88%)]	Loss: 0.357215
Train Epoch: 1 [38400/40479 (94%)]	Loss: 0.340456
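`KaggleAmazonDataset` turns the space-separated `tags` column into multi-hot targets with scikit-learn's `MultiLabelBinarizer`, which (when no `classes` argument is given) uses the sorted set of tags as its columns. The plain-Python sketch below mimics that encoding to show what `y_train` contains; `multi_hot` and the sample tag strings are hypothetical.

```python
import numpy as np

def multi_hot(tag_lists):
    """Hypothetical helper: float32 multi-hot matrix; columns are the sorted tag names."""
    classes = sorted({tag for tags in tag_lists for tag in tags})
    col = {tag: j for j, tag in enumerate(classes)}
    y = np.zeros((len(tag_lists), len(classes)), dtype=np.float32)
    for i, tags in enumerate(tag_lists):
        for tag in tags:
            y[i, col[tag]] = 1.0
    return classes, y

# Hypothetical 'tags' entries, split on whitespace like tmp_df['tags'].str.split()
tags = ['clear primary', 'haze primary water', 'cloudy']
classes, y = multi_hot([t.split() for t in tags])
print(classes)   # ['clear', 'cloudy', 'haze', 'primary', 'water']
print(y)
```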

    

https://discuss.pytorch.org/t/feedback-on-pytorch-for-kaggle-competitions/2252

https://discuss.pytorch.org/t/multi-label-classification-in-pytorch/905/13
@AjayTalati

Either you apply a sigmoid after your last fc layer and then use BCELoss or F.binary_cross_entropy as your criterion/loss function

Or you use MultiLabelSoftMarginLoss directly as your loss function (it applies the sigmoid internally)
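Both options compute the same loss value: `MultiLabelSoftMarginLoss` and `BCEWithLogitsLoss` take raw logits, while `BCELoss` / `F.binary_cross_entropy` need the sigmoid applied first. A quick numerical check on random logits and multi-hot targets (the 4x6 shapes are arbitrary):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 6)                      # raw outputs of the last fc layer
targets = torch.randint(0, 2, (4, 6)).float()   # multi-hot labels

a = nn.MultiLabelSoftMarginLoss()(logits, targets)          # sigmoid applied inside
b = nn.BCEWithLogitsLoss()(logits, targets)                 # sigmoid + BCE in one step
c = nn.BCELoss()(torch.sigmoid(logits), targets)            # explicit sigmoid, then BCE
d = F.binary_cross_entropy(torch.sigmoid(logits), targets)  # functional form of c
print(a.item(), b.item(), c.item(), d.item())
```

`BCEWithLogitsLoss` fuses the sigmoid and the log into one numerically stable operation, so it is usually preferred over an explicit sigmoid followed by `BCELoss` when logits can be large.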

Now once you have your prediction, you need to threshold. 0.5 is the default naive way, but it's probably not optimal. In any case, once you get there, great!
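Thresholding at 0.5 is a one-liner. The sketch below applies it to random stand-in logits and targets (hypothetical data, not model output) and computes two simple multi-label metrics: the Hamming loss (fraction of wrong label bits) and the exact-match ratio (fraction of samples with every label right).

```python
import torch

torch.manual_seed(0)
logits = torch.randn(5, 4)                      # hypothetical model outputs (before sigmoid)
targets = torch.randint(0, 2, (5, 4)).float()   # hypothetical multi-hot ground truth

probs = torch.sigmoid(logits)
preds = (probs > 0.5).float()                   # naive global threshold
# Mathematically sigmoid(x) > 0.5 exactly when x > 0, so the logits can be thresholded at 0.

hamming = (preds != targets).float().mean().item()            # wrong bits / all bits
exact = (preds == targets).all(dim=1).float().mean().item()   # rows with every label right
print('Hamming loss {:.3f}, exact-match ratio {:.3f}'.format(hamming, exact))
```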

Next part is technical optimization, you can do Multilabel classification without

Regarding the threshold, you might want to optimize either a common threshold for all your outputs (it can be 0.2, 0.5, 0.123456, who knows) or a threshold per label class, especially if your classes are unbalanced.
You will need a solid validation set and a multi-label evaluation metric (Hamming loss, F1-score, F-beta score).
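For the per-class strategy, one simple approach is a grid search: for each label, try a set of thresholds on the validation set and keep the one with the best F1, where F1 = 2TP / (2TP + FP + FN). This is an illustration under assumptions, not the method from the quoted post: the 0.05-step grid, the helpers `f1` and `best_thresholds`, and the synthetic probabilities (label 2 made deliberately under-confident) are all made up. Thresholds tuned this way can overfit the validation set, as the post warns.

```python
import numpy as np

def f1(y_true, y_pred):
    """F1 = 2TP / (2TP + FP + FN) for one binary label column."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    return 0.0 if tp == 0 else 2.0 * tp / (2 * tp + fp + fn)

def best_thresholds(probs, targets, grid=np.linspace(0.05, 0.95, 19)):
    """For each label column, pick the grid threshold with the highest validation F1."""
    thresholds = np.empty(probs.shape[1])
    for j in range(probs.shape[1]):
        scores = [f1(targets[:, j], (probs[:, j] > t).astype(int)) for t in grid]
        thresholds[j] = grid[int(np.argmax(scores))]   # first threshold reaching the best F1
    return thresholds

rng = np.random.RandomState(0)
targets = rng.randint(0, 2, size=(200, 3))               # synthetic validation labels
probs = np.clip(0.35 * targets + 0.3 * rng.rand(200, 3), 0, 1)
probs[:, 2] *= 0.5                                       # label 2 is under-confident
th = best_thresholds(probs, targets)
print(th)
```

The under-confident label ends up with a lower threshold than the others, which a single global 0.5 cutoff could not capture.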

Example code for the first strategy is here on Kaggle.

For the second strategy, I'm deep into various papers myself so I can't help yet.
One thing to keep in mind is your "best threshold" will probably overfit the validation set, so use regularization, cross-validation or other anti-overfitting strategy.

https://discuss.pytorch.org/t/equivalent-of-tensorflows-sigmoid-cross-entropy-with-logits-in-pytorch/1985/11
@AjayTalati I managed to use BCELoss, binary_cross_entropy and MultiLabelSoftMarginLoss on a multi-label problem.

Here is the basic code

def train(epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        # data, target = data.cuda(non_blocking=True), target.cuda(non_blocking=True) # On GPU
        data, target = Variable(data), Variable(target)
        optimizer.zero_grad()
        output = model(data)
        loss = F.binary_cross_entropy(output, target)
        loss.backward()
        optimizer.step()
        if batch_idx % 10 == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()))
And the source is here.

For BCELoss you can use criterion = BCELoss() and then loss = criterion(output, target) but as @Misha_E said, the NN must return a sigmoid activation.