vehicle signal lights states classification

TUD master thesis project, vehicle signal lights states classification based on video clips, pytorch.

Intro:

This thesis aims to solve the state classification problem of daytime vehicle signal rearlights based on images, which is an important module in an autonomous driving system. Based on previous methods, we propose and implement a novel model called "ResNet-LSTM network" to complete this classification task. Further, we try to merge the light region's position information into the current model and propose the "YOLO-ResNet-LSTM network", which however decreases the performance and still has space for improvement. Besides the performance, we try to implement the model with TensorRT to achieve higher inference speed and less model size to ensure its real-time capability and efficiency. Finally, a model with outstanding results in both performance and efficiency is obtained, the accuracy of the classification can reach 94.9% on 8-classes rearlight state classification tasks, and the average inference speed on GPU can reach 36.1 ms per video clip of 10 frames.

Proposed model

ResNet-LSTM network

The ResNet-LSTM got a impressive result, the architecture is shown:

YOLO-ResNet-LSTM network

The YOLO-ResNet-LSTM however got a worse result, the architecture is shown:

Usage:

Config enviroment:

python3 -m venv /path/to/your/virtual/env
source /path/to/your/virtual/env
pip install requirements.txt

Train model:

ResNet-LSTM model:

The most important code file are the dataloader/VSLdataset.py, models/CNN_LSTM_model_resnet50.py and models/CNN_LSTM_model_infer_RT.py. And you have to make sure your dataset structure looks like this:

train/
    - BOO/
        -clip1/
            -frame00.png
            -frame01.png
            - ...
        -clip2
            - frame00.png
            - frame01.png
            - ...
    - BLO
        -clip3/
            -frame00.png
            -frame01.png
            - ...
        -clip4
            - frame00.png
            - frame01.png
    - ... 
  valid/
      - BOO/
      -...
  test/
      - BOO/
      -...

Then you can start training by run script:

python models/CNN_LSTM_model_resnet50.py

where mainly includes 2 step: 1. create model 2. training model

model = CLSTM(lstm_hidden_dim = 512, lstm_num_layers = 3, class_num=8)        
train(model_in = model, num_epochs = 100, load_model = False, freeze_extractor = False)

For comparing different encoder network, decoder network and video clips of different length you can check the code in detail and modify the corresponding code. The model with best performance is the ResNet50-LSTM3 model, the trained model path is here: link

To infer the model with data without labels, you can put all your images into a dir called "inference_data" as structrue:

inference_data/
  - clip1
      -frame00.png
      -frame01.png
      - ...
  - clip2
      -frame00.png
      -frame01.png
      - ...
  - ...

Then you can start inference by comment the traning code and uncomment the inference code:

model = CLSTM(lstm_hidden_dim = 512, lstm_num_layers = 3, class_num=8)      
infer(model)

or if your want to run inference with tensorrt (you have to install torch2trt before. Besides, the original torch2trt dosen't have LSTM implementation on TRT, so you have to write the corresponding converter as link)

python models/CNN_LSTM_model_infer_RT.py

YOLO-ResNet-LSTM model:

To run YOLO network to detect the light before feeding into ResNet-LSTM: You have to install YOLO as instruction: link Then you can use the code /YOLO_models/cut_rearlight_box.py to get the bbox of lights, then either cut out the ROI or feed as masks. The pre-trained YOLOv4 model: link. And you can create a new dataset folder named "YOLO_mask_dataset" which has the same structure of train/valid/test Then you can start training by run script:

python models/CNN_LSTM_model_mask.py

However, the result of this model is worse, which means the YOLO network misleads the classifier.

Result:

Statics:

We choose a model with best performance: ResNet50-LSTM3 and find that when a longer sequence is applied only for the test phase while in the training phase the short sequence is applied, the classification result could be even better.

Visualized result:

Confusion matrix:

left figure is the confusion matrix of model inferring with 10 frames while right figure is with 20 frames.

ROC curve:

left figure is the roc curve of model inferring with 10 frames while right figure is with 20 frames.

Dataset:

To train a neural network, a significant role is the dataset. For vehicle signal lights states classification, we have two types of datasets to use, which we call the "end-to-end method dataset" for directly classifying states from frames and the "detection-based method dataset" for detecting the lights' bounding box first.

The end-to-end method dataset

We used a public dataset: link, some examples of each class is shown:

The detection-based method dataset

We labeled 715 images with bbox of lights: link, some examples of each class is shown:

Name		Name	Last commit message	Last commit date
Latest commit History 51 Commits
YOLO_model		YOLO_model
__pycache__		__pycache__
dataloader		dataloader
models		models
readme		readme
README.md		README.md
evaluate.py		evaluate.py
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

vehicle signal lights states classification

Intro:

Proposed model

ResNet-LSTM network

YOLO-ResNet-LSTM network

Usage:

Config enviroment:

Train model:

ResNet-LSTM model:

YOLO-ResNet-LSTM model:

Result:

Statics:

Visualized result:

Confusion matrix:

ROC curve:

Dataset:

The end-to-end method dataset

The detection-based method dataset

About

Releases

Packages

Languages

chrisHuxi/vehicle-signal-lights-states-classification

Folders and files

Latest commit

History

Repository files navigation

vehicle signal lights states classification

Intro:

Proposed model

ResNet-LSTM network

YOLO-ResNet-LSTM network

Usage:

Config enviroment:

Train model:

ResNet-LSTM model:

YOLO-ResNet-LSTM model:

Result:

Statics:

Visualized result:

Confusion matrix:

ROC curve:

Dataset:

The end-to-end method dataset

The detection-based method dataset

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages