Let's download the training dataset of the German Traffic Sign Detection Benchmark (GTSDB) from here: http://benchmark.ini.rub.de/?section=gtsdb&subsection=dataset

I will use OpenCV's cascade classifier, so we also need to preprocess the training dataset and collect some images without traffic signs to serve as negative samples.

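I changed the dataset folder structure to this:

Data/
|-- neg/            <-- images without traffic signs
|-- pos/            <-- original images with traffic signs
|-- traffic_signs/  <-- folders 00-42 with sign examples
|-- data/           <-- output directory, its usage is explained later
|-- gt.txt          <-- ground-truth annotations shipped with the dataset
|-- signs.info      <-- description of the positive samples
|-- bg.txt          <-- list of the negative images
|-- signs.vec       <-- samples created by opencv_createsamples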

In [1]:
import numpy as np
import os

In [2]:
def num_to_filename(num):
    # GTSDB images are named with a zero-padded 5-digit number, e.g. 00042.ppm
    return '{:05d}.ppm'.format(num)

In [35]:
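# Each line of gt.txt has the form filename;leftCol;topRow;rightCol;bottomRow;ClassID
gt = np.loadtxt('Data/gt.txt', dtype=bytes).astype(str)
data = np.empty((len(gt), 6), dtype=int)
for i, row in enumerate(gt):
    # Drop the extension so the file name parses as an integer image number
    data[i] = np.array(row.replace('.ppm', '').split(';')).astype(int)
# Turn the right/bottom coordinates into width/height
data[:, [3, 4]] -= data[:, [1, 2]]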
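To see what the cell above does to a single annotation, here is a small sketch that parses one line in the same gt.txt format. The helper name parse_gt_line is hypothetical, and the input line below is only illustrative, not quoted from the dataset.

```python
def parse_gt_line(line):
    """Parse one gt.txt line into (image number, x, y, w, h, class id).

    Assumes the layout filename;leftCol;topRow;rightCol;bottomRow;ClassID.
    """
    name, left, top, right, bottom, class_id = line.strip().split(';')
    img = int(name.replace('.ppm', ''))
    left, top, right, bottom = int(left), int(top), int(right), int(bottom)
    # Same convention as the cell above: width = right - left, height = bottom - top
    return img, left, top, right - left, bottom - top, int(class_id)


print(parse_gt_line('00000.ppm;774;411;815;446;11'))  # (0, 774, 411, 41, 35, 11)
```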

Now the first column of data holds the numbers of the images that contain traffic signs (one row per sign). Let's move all the other images into the neg folder:

In [57]:
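# The dataset has 900 images (00000.ppm to 00899.ppm); every image number that
# never appears in gt.txt contains no traffic sign, so it becomes a negative
negative_images = np.setdiff1d(np.arange(900), data[:, 0])
for img in negative_images:
    # Move the image from pos/ to neg/
    os.rename('Data/pos/' + num_to_filename(img), 'Data/neg/' + num_to_filename(img))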
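Since os.rename really moves the files, it can help to preview the moves first. The sketch below (hypothetical helper plan_negative_moves) computes the same np.setdiff1d result but only returns the (source, destination) pairs; it assumes the Data/pos and Data/neg layout above and that every image starts out in pos/.

```python
import os

import numpy as np


def num_to_filename(num):
    # Same naming scheme as the notebook cell: zero-padded 5-digit number
    return '{:05d}.ppm'.format(num)


def plan_negative_moves(total_images, annotated, root='Data'):
    """Return the (source, destination) pairs the rename loop would perform.

    Nothing is moved. `annotated` may contain repeats (one entry per sign);
    np.setdiff1d removes duplicates and returns the image numbers sorted.
    """
    negatives = np.setdiff1d(np.arange(total_images), annotated)
    return [(os.path.join(root, 'pos', num_to_filename(int(i))),
             os.path.join(root, 'neg', num_to_filename(int(i))))
            for i in negatives]
```

For example, plan_negative_moves(900, data[:, 0]) lists the moves for the real dataset, and its length should equal len(negative_images).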

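Now let's write the sign annotations to signs.info in the following format:

pos/{image name} {number of signs in the image} ({x} {y} {w} {h})*

where ({x} {y} {w} {h})* stands for one bounding box per sign: its top-left corner followed by its width and height in pixels.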
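For a single image, a line in this format can be assembled as follows; format_info_line is a hypothetical helper, shown only to make the format concrete.

```python
def format_info_line(name, boxes):
    """Build one signs.info line: pos/<name>, the box count, then x y w h per box."""
    parts = ['pos/' + name, str(len(boxes))]
    for x, y, w, h in boxes:
        parts.extend(str(v) for v in (x, y, w, h))
    return ' '.join(parts)


print(format_info_line('00000.ppm', [(774, 411, 41, 35)]))  # pos/00000.ppm 1 774 411 41 35
```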

In [58]:
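with open('Data/signs.info', 'w') as f:
    # Rows of data are assumed to be sorted by image number (as in gt.txt), so all
    # signs of one image are consecutive. Start from the first annotated image so
    # that no empty "0 signs" line is written if image 00000 has no signs.
    prev_img = data[0, 0]
    counter = 0
    location = ""
    for img, x, y, w, h, _ in data:
        if img != prev_img:
            # A new image starts: write the finished line for the previous one
            f.write('pos/{name} {num} {loc}\n'.format(name=num_to_filename(prev_img), num=counter, loc=location))
            prev_img = img
            counter = 1
            location = " {x} {y} {w} {h}".format(x=x, y=y, w=w, h=h)
        else:
            # Another sign in the same image: append its bounding box
            counter += 1
            location += " {x} {y} {w} {h}".format(x=x, y=y, w=w, h=h)
    # Write the line for the last image
    f.write('pos/{name} {num} {loc}\n'.format(name=num_to_filename(prev_img), num=counter, loc=location))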
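The same grouping can also be written with itertools.groupby, which avoids tracking prev_img by hand. This is an alternative sketch, not the approach used above, and like the loop it assumes that rows for the same image are consecutive in data; info_lines is a hypothetical name.

```python
from itertools import groupby


def info_lines(rows):
    """Yield one signs.info line per image from rows of (img, x, y, w, h, class_id)."""
    for img, group in groupby(rows, key=lambda r: r[0]):
        # Each box becomes "x y w h"; the class id (last column) is not needed here
        boxes = [' '.join(str(v) for v in r[1:5]) for r in group]
        yield 'pos/{:05d}.ppm {} {}'.format(int(img), len(boxes), ' '.join(boxes))


rows = [(0, 1, 2, 3, 4, 11), (0, 5, 6, 7, 8, 12), (3, 9, 9, 9, 9, 1)]
for line in info_lines(rows):
    print(line)
```

To produce the real file, you would iterate over info_lines(data) and write each line followed by '\n' to Data/signs.info.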

So far we have described the positive samples. The negative part is simply a list of file names, one per line, in bg.txt:

In [62]:
with open('Data/bg.txt', 'w') as f:
    for img in negative_images:
        f.write('neg/{name}\n'.format(name=num_to_filename(img)))

Now we're ready to create the samples with the following command in a terminal (run it inside the Data folder):

$ opencv_createsamples -info signs.info -num 741 -w 43 -h 43 -vec signs.vec

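Here 741 is the number of lines in signs.info, i.e. the number of images that contain at least one sign (the total number of images minus the negative ones), and -w 43 -h 43 is the size of the generated samples in pixels. Strictly speaking, -num counts samples (bounding boxes), not lines, so when some images contain several signs, len(data) is the value that includes every annotated sign.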
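To pick -num, you can count both quantities directly from signs.info. The sketch below (hypothetical helpers count_info and createsamples_cmd) returns the number of lines and the number of bounding boxes, and assembles the command line with the flags used above; it only builds the string and does not run OpenCV.

```python
import shlex


def count_info(text):
    """Return (number of lines, number of bounding boxes) for signs.info content."""
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    # The second field of each line is the number of boxes in that image
    return len(lines), sum(int(parts[1]) for parts in lines)


def createsamples_cmd(num, w=43, h=43, info='signs.info', vec='signs.vec'):
    """Assemble the opencv_createsamples call as a shell-quoted string."""
    args = ['opencv_createsamples', '-info', info, '-num', str(num),
            '-w', str(w), '-h', str(h), '-vec', vec]
    return shlex.join(args)  # requires Python 3.8+
```

For example, count_info(open('Data/signs.info').read()) returns (lines, boxes), and createsamples_cmd(boxes) gives a command that uses every annotated sign.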

When this is done, you can inspect the created samples with the following command:

$ opencv_createsamples -vec signs.vec -w 43 -h 43
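You can also check signs.vec without a GUI by reading its header. This sketch assumes the layout used by OpenCV's .vec reader as I understand it: a 12-byte header with a 32-bit sample count, a 32-bit vector size (w * h) and two unused 16-bit fields, followed by one zero byte plus w * h 16-bit values per sample, all little-endian. The helper names are hypothetical.

```python
import struct

# Assumed header: int32 sample count, int32 vector size (w * h), two unused int16
# fields; '<' means little-endian byte order with no padding (12 bytes in total)
VEC_HEADER = struct.Struct('<iihh')


def read_vec_header(raw):
    """Return (sample count, vector size) from the first 12 bytes of a .vec file."""
    count, vec_size, _, _ = VEC_HEADER.unpack_from(raw)
    return count, vec_size


def expected_vec_bytes(count, w, h):
    """Expected file size if each sample is one zero byte plus w * h int16 values."""
    return VEC_HEADER.size + count * (1 + 2 * w * h)
```

For instance, reading the first 12 bytes of Data/signs.vec in binary mode and passing them to read_vec_header should give the -num value and 43 * 43 = 1849.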

The next step is to train the cascade on signs.vec and bg.txt with the following command:

$ opencv_traincascade -data data -vec signs.vec -bg bg.txt -numPos 741 -numNeg 159 -numStages 10 -w 43 -h 43 -featureType LBP

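Here data is an empty directory where opencv_traincascade writes its output: the intermediate stage files and, once training finishes, cascade.xml. -featureType LBP selects Local Binary Pattern features, which train considerably faster than the default Haar features, and -numNeg 159 matches the number of negative images listed in bg.txt. Note that later stages can draw more positives from signs.vec than -numPos; if training stops with a "Can not get new positive sample" error, lower -numPos somewhat below the number of samples in the vec file.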
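A rule of thumb often quoted for choosing -numPos is that the vec file should hold at least numPos + (numStages - 1) * (1 - minHitRate) * numPos + S samples, where S is the total number of vec samples skipped during training. The sketch below (hypothetical helper max_num_pos) solves that inequality for numPos; 0.995 is opencv_traincascade's default -minHitRate. Treat the result as a starting point, not a guarantee.

```python
import math


def max_num_pos(vec_count, num_stages, min_hit_rate=0.995, skipped=0):
    """Largest -numPos satisfying
    vec_count >= numPos + (num_stages - 1) * (1 - min_hit_rate) * numPos + skipped.
    """
    # Samples consumed per requested positive, averaged over all stages
    per_pos = 1 + (num_stages - 1) * (1 - min_hit_rate)
    return math.floor((vec_count - skipped) / per_pos)


print(max_num_pos(741, 10))  # 709
```

With 741 samples and 10 stages this suggests about 709, slightly below the 741 used in the command above.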