Deep Learning Assisted Computer Vision System For Traffic Sign Classification and Detection
Synopsis (Get full documentation here)
Computers, as we have them today, are GIGO (Garbage-In, Garbage-Out) devices: they can only produce results based on what is input into them and how they were originally programmed to respond to such inputs. Challenges therefore exist with problem categories that cannot be formulated as algorithms, especially problems that depend on many subtle factors such as knowledge and understanding of previous scenes and the corresponding reactions to them. As an example, to recognize the Queen of England’s image among a cluster of 100 other images, the human brain can provide an informed guess, probably based on past knowledge and various other experiences combined; a computer, however, cannot do this accurately without a pre-written algorithm. In light of this, there has been growing interest in research geared toward developing Artificial Intelligence (AI) models that are capable of learning and carrying out classification tasks without reference to any pre-written algorithm. One such research area is the field of Neural Networks (NNs), a biologically inspired family of computation architectures built as extremely simplified models of the human brain.
Despite the ever-increasing popularity of transferring (human) tasks to computers for simplification purposes, there are still many tasks that computers do poorly, such as those involving visual perception and intelligence. This is because the largest part of the human brain works continuously on data analysis and interpretation, while the largest part of a computer is only available for passive data storage; the brain therefore performs much closer to its theoretical maximum. Although the computer is fast, reliable, unbiased, never tired, consistent, and can sometimes carry out far more complex computational combinations than the human brain can manage, it is still unable to synthesize new rules and, it is safe to say, has no common sense. It is rather a group of arithmetic processing units and storage carefully interconnected to perform complex numerical calculations in a very short time, but it is not adaptive.
The human brain, on the other hand, possesses what we know as common sense, a bigger knowledge base, and the ability to synthesize new rules and spontaneously detect trends in data without being pre-taught, even though on raw capacity the computer should be the more powerful of the two: it comprises on average over 10^9 transistors with a switching time of 10^-9 seconds, while the brain consists of over 10^11 neurons but with a switching time of only about 10^-3 seconds. On closer analysis, we note that although the human brain is easily tired, bored, biased, inconsistent and cannot be fully trusted, it still outperforms the computer in some application areas due to its perceptive mode of operation (interpretation of sensory information in order to understand the environment). This explains why there is still major reliance on the human brain for classification tasks.
Juxtaposing the computer’s strengths and weaknesses against the human brain’s makes us realize that, as much as the human brain is better at perceptive tasks, it has endurance, bias and consistency issues. Researchers are therefore working to develop systems capable of fusing the advantages of both the brain and the computer into one near-perfect whole: a system which can take on the perceptive learning, out-of-the-box synthesis, self-organizing and self-learning characteristics of the human brain, while maintaining the massive computational capability, speed and endurance of the computer. This motive has led to increased research on neural networks, a biologically inspired family of computation architectures built as extremely simplified models of the human brain.
In summary, this project seeks to explore the science behind Neural Networks (NNs), their various flavours (especially CNNs) and application areas, and then finally to narrow down by applying them in the design and development of a computer vision system for traffic sign recognition and detection in autonomous vehicles.
Project Implementation and Results
IMDB Creation (Dataset)
A dataset of traffic sign images from the German Traffic Sign Recognition Benchmark (GTSRB) website is compiled and used to create the image database (IMDB) used to train and test the deep neural network. The German Traffic Sign Benchmark is a multi-class, single-image classification challenge held at the International Joint Conference on Neural Networks (IJCNN) 2011. The dataset consists of over 39,000 images in total, grouped into 43 different classes of road traffic signs.
Through a written IMDB creation script, the dataset is split into 70% training, 20% validation and 10% test images, which are used in the holistic training, validation and testing of the created AlexNet model. Validation images are used to test the performance of the network during the training process, while the test images are reserved in this project for independent tests on the network after training. The validation set can be regarded as part of the training set, but it is usually used for parameter selection and to avoid overfitting: a model trained on a training set only is very likely to approach 100% training accuracy and overfit, and thus perform very poorly on a test set the network has never seen before. The test set is only used to evaluate the performance of a trained model and is the best means of detecting overfitting in the network.
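The 70/20/10 split described above can be sketched as follows. This is an illustrative Python version rather than the project's actual MATLAB IMDB script; the sample file names and labels are made up:

```python
import random

def split_dataset(samples, train_frac=0.7, val_frac=0.2, seed=42):
    """Shuffle and split a list of (image_path, label) samples into
    training, validation and test subsets (default 70/20/10)."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)  # shuffle so every class appears in every split
    n = len(shuffled)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remainder (~10%) held out for post-training tests
    return train, val, test

# Example: 100 dummy samples spread over 4 classes
samples = [(f"img_{i}.ppm", i % 4) for i in range(100)]
train, val, test = split_dataset(samples)
print(len(train), len(val), len(test))  # 70 20 10
```

Shuffling before splitting matters because the GTSRB images are stored grouped by class; a sequential split would leave some classes entirely out of the validation or test sets.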
For this project, the AlexNet CNN model is used. The MatConvNet toolbox is used to create, modify and train the AlexNet CNN model. MatConvNet includes a variety of layer models in its MATLAB library directory, such as convolution, deconvolution, max and average pooling, ReLU activation, sigmoid activation and many other pre-written functions. There are enough elements available to implement many interesting state-of-the-art networks out of the box, or even to import them from other toolboxes such as Caffe. After the creation of the AlexNet model, the network undergoes training and is then tested to ensure performance and accuracy. Considering that two levels of operation are cascaded, one for traffic sign (object) detection and the other for the actual classification of the traffic sign, it is important to note that this section only explains the development, training and testing of the image classifier. Detailed testing and verification is carried out to ensure optimal performance of the system. In order to develop a Convolutional Neural Network able to classify images fed into it, the network has to be trained over multiple epochs in a specialized manner. Batches of training images fed into the CNN first have to be pre-processed to the network’s standard input size and, in most cases, normalized to have zero mean; this greatly affects the rate of convergence of the network during training. It is important to remember that the convolutional layers of the network serve as the feature extractors, while the fully connected layers and the softmax serve as the processing and classifier elements.
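The zero-mean normalization step mentioned above can be sketched as below. This is an illustrative Python version, not the MatConvNet code itself, and it uses plain nested lists of floats in place of real image arrays:

```python
def zero_mean(images):
    """Subtract the per-pixel mean image from every image in the batch.

    `images` is a list of equally-sized 2-D grayscale images
    (lists of lists of floats). Returns the normalized batch,
    whose per-pixel sum over the batch is zero.
    """
    h, w = len(images[0]), len(images[0][0])
    n = len(images)
    # Compute the mean image over the whole batch
    mean = [[sum(img[r][c] for img in images) / n for c in range(w)]
            for r in range(h)]
    # Subtract the mean image from each image in the batch
    return [[[img[r][c] - mean[r][c] for c in range(w)] for r in range(h)]
            for img in images]

batch = [[[10.0, 20.0], [30.0, 40.0]],
         [[20.0, 40.0], [10.0, 60.0]]]
normalized = zero_mean(batch)
print(normalized[0][0][0])  # -5.0 (10 minus the pixel mean of 15)
```

Centering the inputs this way is what the text refers to when it says the pre-processing "greatly affects the rate of convergence": gradients behave better when input activations are not all positive.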
The diagrams below show the results of the performance analysis and testing carried out, and the descent in the error rate of the classifier/CNN over the course of the 58 epochs (rounds) of training. Training took about 47 hours using over 39,000 training images on a quad-core processor with 16 GB of RAM. The images below show the descent in the error rate as training proceeded, a classification example, and a bar graph showing performance improvement as the training progressed.
Figure Showing Error Descent During 58 Epochs of Training (left) and Sample Classification Result After Training (right)
Figure Showing Improvement Bar Graph Per Epoch During Training
As at Epoch 58, the achieved accuracy level was 98.464%. Continuing the training for a few more epochs would likely push accuracy above 99%.
Sign Detection & Integration
The images below show the integration plan for the system (detection/classification), as well as the different schemes used for traffic sign detection (Harris corner detection, Sobel edge filtering, Hough line/circular transforms, connected component analysis, etc.).
Figure Showing Detection and Classification Integration Methodology
For more details, read Project Documentation (Chapter 5.0)
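As a rough illustration of one of the detection building blocks listed above, a minimal Sobel gradient-magnitude filter can be sketched as below. The project's own implementation lives in `detectEdge.m`; this is just the textbook 3x3 operator applied to a plain 2-D list, with border pixels left at zero for simplicity:

```python
def sobel_magnitude(img):
    """Apply the 3x3 Sobel kernels and return the gradient magnitude
    image. Border pixels are left at zero for simplicity."""
    gx_k = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # horizontal gradient kernel
    gy_k = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # vertical gradient kernel
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            gx = sum(gx_k[i][j] * img[r - 1 + i][c - 1 + j]
                     for i in range(3) for j in range(3))
            gy = sum(gy_k[i][j] * img[r - 1 + i][c - 1 + j]
                     for i in range(3) for j in range(3))
            out[r][c] = (gx * gx + gy * gy) ** 0.5
    return out

# A vertical step edge: the filter responds strongly around the step
img = [[0, 0, 255, 255]] * 4
edges = sobel_magnitude(img)
```

In the full pipeline, the resulting edge map would be thresholded and passed on to the shape-analysis and connected-component stages to propose candidate sign regions.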
The key optimization schemes used in the improvement of the performance of this system include:
- MATLAB Vectorization
- Use of C/C++/FORTRAN for some subroutines
- MATLAB Parallel Computing
- Heterogeneous Computing
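The parallel-computing idea above (running the independent detection schemes concurrently, as `parallelDetection.m` does with MATLAB workers) can be sketched in Python with a thread pool. The three detector functions here are hypothetical stand-ins, not the project's real detectors:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the independent detection stages
def detect_edges(frame):
    return ("edges", frame)

def detect_corners(frame):
    return ("corners", frame)

def detect_circles(frame):
    return ("circles", frame)

def parallel_detection(frame):
    """Run all detectors concurrently on the same frame and collect
    their results in submission order (mirroring parallelDetection.m)."""
    detectors = [detect_edges, detect_corners, detect_circles]
    with ThreadPoolExecutor(max_workers=len(detectors)) as pool:
        futures = [pool.submit(d, frame) for d in detectors]
        return [f.result() for f in futures]

results = parallel_detection("frame_001")
print([name for name, _ in results])  # ['edges', 'corners', 'circles']
```

Because the detectors are independent, dispatching them to separate workers trades a little scheduling overhead for running the slowest stage in parallel with the others, which is where the multi-core speed-up in this project comes from.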
Figure Showing Speed Improvement After Some Form of Optimization
By engaging different methods of optimization (software/hardware), we can improve the speed of computationally expensive Neural Network operations. In this project, I was able to push the computer vision system’s performance from about 1.5 seconds per frame (i.e. about half a frame per second) to between 25 and 30 frames per second using a C/C++ solution running across multiple cores. The pure MATLAB solution, optimized through vectorization and multi-core dispatch, gives a maximum of just about 8 frames per second, which is still well ahead of the un-optimized MATLAB version of the vision system. In a nutshell, the results achieved by this project are listed below:
- Design and implementation of a deep learning model (AlexNet) for traffic sign classification, which attained a classification accuracy of over 98%.
- Design and implementation of multiple layers of traffic sign post detection mechanisms, which attained a detection accuracy of over 99%.
- Optimization of the detection algorithm’s performance from about half a frame per second to around 25 to 30 frames per second (classification operation included). This amounts to an over 50-fold improvement in speed (C/C++ version).
- Practical demonstration that further improvement in speed can be achieved through heterogeneous computing, i.e. by dedicating some of the computationally expensive detection functions to suitable devices such as GPUs and FPGAs.
Figure Showing Vision System in Operation
This repository contains only the project documentation and code. Large supplementary files such as:
- The IMDB file
- Trained ConvNet
- Test Images
- Test Videos
...can be downloaded from the links given below
Remember to set the project folder location in the file 'setGlobalVariables.m' (src > Others); the current setting is ProjectFolder = ('C:\Temp\final_project'), which may differ from where you have this project located on your computer.
Also remember to unzip the 'TestImages' and 'TrainingImages' folders.
Once all code files and supplementary files/folders have been downloaded, add the project folder to the MATLAB path.
Paste the commands below into MATLAB and run them to compile and install the MatConvNet library:

```matlab
cd matconvnet-1.0-beta23
run matlab/vl_compilenn;  % Compile MatConvNet
run matlab/vl_setupnn;    % Set up MatConvNet paths
```
Note once again: different function files interact with various supporting resources such as video clips, images, the webcam, etc. Make sure that all of these resources are well referenced/linked before you run any of the functions, so as to avoid errors.
The list below gives a short explanation of each function:
- AlexNetNN.m: Used to create the AlexNet model, link it to the IMDB and train
- callAlex.m: Used to call AlexNetNN and also set some post training parameters
- classifyImg.m: Used to run classification on the detected ROI
- createIMDB.m: Used to create an image database for the NN training
- sceneClassifier.m: Used to validate input image to system, set detection algorithm parameters and call parallel/nonparallelDetection
- CCA.m: Used to perform connected component analysis on input image
- detectCircle.m: For circle detection
- detectCorner.m: For corner detection (Harris method)
- detectEdge.m: For edge detection (Sobel’s method)
- getROI.m: Used to call CCA and also set some pre-CCA call conditions
- myDetectCircle.m: Customized circle detection algorithm (Circular Hough Transform)
- networkTest.m: Used to test individual CNN epoch outputs
- testDetection.m: Used to test individual detection functions
- getGrayScale.m: To get image grayscale
- MapRegion.m: Region class used for CCA
- viewCircularHough: Used to inspect how the circular Hough transform operates
- shapeAnalyser.m: Analyses and searches for triangles, rectangles, octagon and diamond shapes
- cornerTestBench.m: Testbench for corner detection function
- setGlobalVariables.m: Used to set global variables
- getGlobalVariables.m: Used to get global variables
- drawDetectedCircle.m: Takes in an image, circle radius and centre co-ordinates and draws the circle on top of the received image
- passVideoToVision.m: To feed a video stream to the vision system, so as to evaluate it
- parallelDetection.m: Used to execute all detection algorithms in a parallel way
- multiEpochAnalyser.m: Used to test a wide range of CNN epoch outputs so as to detect overfitting
- detectCircleTestBench.m: Test bench for circle detection function performance
- analyseCriticalAreas.m: Used to analyse critical areas as explained in 5.2.2
- nonParallelDetection.m: Used to execute all detection algorithms in a non-parallel way
- evaluateDetectionSystem.m: Used to run evaluation on all the individual detection functions
- evaluateOpenCLPerformance.m: Used to evaluate the performance of the OpenCL based parts of the vision system application
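For reference, the connected-component analysis that `CCA.m` performs on a binarized image can be sketched as a simple flood-fill labelling pass. This is a generic 4-connectivity version in Python; the project's MapRegion-based MATLAB implementation is more elaborate:

```python
from collections import deque

def label_components(binary):
    """Label the 4-connected foreground regions of a binary image.
    Returns (labels, count): labels[r][c] is 0 for background pixels
    or a positive region id, and count is the number of regions."""
    h, w = len(binary), len(binary[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for r in range(h):
        for c in range(w):
            if binary[r][c] and labels[r][c] == 0:
                count += 1  # start a new region and flood-fill it
                queue = deque([(r, c)])
                labels[r][c] = count
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and binary[ny][nx] and labels[ny][nx] == 0):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count

img = [[1, 1, 0, 0],
       [0, 1, 0, 1],
       [0, 0, 0, 1]]
labels, n = label_components(img)
print(n)  # 2 separate regions
```

In the detection pipeline, each labelled region would then be filtered by size and aspect ratio to produce the candidate ROIs that `getROI.m` hands to the classifier.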
- Oluwole Oyetoke - Project work - LinkedIn Profile, Website
- Dr. David Cowell - Initial work - University Profile
This project is free to use and open to contributions.
- Well wishers
- All the nice people who published helpful and easy to grasp journal articles in this area of study
- Most importantly, future contributors to this project