Master in Computer Vision - M5 Visual recognition
=== WEEK 1 ===

Title of the project

M5 Project: Scene Understanding for Autonomous Vehicles

Name of the group

Team 9 (also known as למלם, pronounced "Lam Lam")

Name and contact email of all the team members

Lorenzo Betto: smemo23@gmail.com
Noa Mor: noamor87@gmail.com
Ana Caballero Cano: ana.caballero.cano@gmail.com
Ivan Caminal Colell: ivancaminal72@gmail.com

Short abstract about the project (2 lines max)

The goal of the project is to perform object detection, segmentation and recognition using Deep Learning in the context of scene understanding for autonomous vehicles. The network will perform these tasks on classes such as pedestrians, vehicles, road, etc.

Link to the overleaf

Link to the Overleaf article, i.e. the report of the project.

Summary of the two papers

VGG, SqueezeNet.

Tasks:

Task A Understand the project

We read and understood the slides and the Overleaf document.

Task B Form teams

Team members are those named above.

Task C Install and setup the development framework

We installed the necessary software.

=== WEEK 2 ===

Short abstract about what you implemented (5 lines max)

Task A: Bash script that outputs the number of samples in each folder.
Task B: We ran the code on the KITTI dataset for training and validation.
Task Cii: We implemented a new CNN (LamLam) with two parallel branches of sequential convolutional layers.
Task E: We wrote the report.

Short explanation of the code in the repository

Task A: We created a bash script that outputs three text files (train, test, val), each containing a list of "subfolder_name; number_of_images" entries.
Task B: We ran the code on the KITTI dataset for training and validation, but not for testing.
Task Cii: Our own CNN implementation, named LamLam (after our team). It has two parallel branches of sequential convolutional layers with different kernel sizes, which allows it to capture two different types of information.
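
For reference, a rough Python equivalent of the counting the bash script performs; the directory layout and file extensions are assumptions for illustration, not taken from the actual script.

    import os

    def count_images(split_dir):
        """Return {subfolder_name: number_of_images} for one split (train/test/val)."""
        counts = {}
        for class_dir in sorted(os.listdir(split_dir)):
            full = os.path.join(split_dir, class_dir)
            if os.path.isdir(full):
                counts[class_dir] = sum(
                    1 for f in os.listdir(full)
                    if f.lower().endswith(('.png', '.jpg', '.jpeg', '.ppm')))
        return counts

    # Write "subfolder_name; number_of_images" lines for e.g. the train split:
    # for name, n in count_images('path/to/dataset/train').items():
    #     print('%s; %d' % (name, n))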

Results of the different experiments

All experiments results

Instructions for using the code

CUDA_VISIBLE_DEVICES=0 python train.py -c config/dataset.py -e expName

Indicate the level of completeness of the goals of this week

100%

Link to the Google Slide presentation

Slides Week 2

Link to a Google Drive with the weights of the model

Weights of the trained models

Tasks:

Task A Run the provided code

  • Analyze the dataset: The images are 64x64 pixels and differ in point of view, background, illumination and hue. Furthermore, some images are slightly blurred.

  • Count the number of samples per class: 16527 for training, 644 for validation and 8190 for testing.

To know the number of samples per class follow the link:
Google Sheets

  • Accuracy of train/test: training accuracy 97.7%, test accuracy 95.2%. The training accuracy is higher than the test accuracy, as expected.

  • For this case, which one provides better results, crop or resize? On this dataset cropping is not useful because the images are already cropped, so resizing gives better results.

  • Where does the mean subtraction take place? It takes place in the ImageDataGenerator, by setting norm_featurewise_center to ‘True’ (see the sketch after this list).

  • Fine-tune the classification for the Belgium traffic signs dataset. Custom accuracy:
    Accuracy with Belgium traffic signs dataset:
    Custom loss:
    Loss with Belgium traffic signs dataset:
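
A minimal Keras sketch of where the mean subtraction happens, assuming the image arrays x_train and y_train are already loaded; the project framework exposes this behaviour through the norm_featurewise_center config flag.

    from keras.preprocessing.image import ImageDataGenerator

    # featurewise_center=True subtracts the dataset mean from every image.
    datagen = ImageDataGenerator(featurewise_center=True)
    datagen.fit(x_train)                 # computes the mean over the training set
    batches = datagen.flow(x_train, y_train, batch_size=32)  # mean subtracted on the fly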

Task B Train the network on a different dataset

We trained on the KITTI training and validation sets, since the test set is private and we cannot access it.

Task Cii Implement a new network

We used a CNN that was previously tested in the Machine Learning course of the same Master's programme, where it performed well on a classification problem involving scenery images.

The idea behind a network with two parallel branches of convolutional layers of different kernel sizes was to capture two different types of information: the first branch captures small details and texture, while the second captures the composition and details of the bigger picture.

The model's parameters were optimized using a random search when the model was first used, i.e. in the Machine Learning course.
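
A minimal Keras sketch of the two-branch idea; the filter counts, kernel sizes and layer depth are illustrative only, not the exact LamLam configuration.

    from keras.layers import Input, Conv2D, MaxPooling2D, Flatten, Dense, concatenate
    from keras.models import Model

    def build_two_branch_cnn(n_classes, input_shape=(64, 64, 3)):
        inp = Input(shape=input_shape)

        # Branch 1: small kernels for fine details and texture.
        a = Conv2D(32, (3, 3), activation='relu', padding='same')(inp)
        a = MaxPooling2D((2, 2))(a)
        a = Conv2D(64, (3, 3), activation='relu', padding='same')(a)
        a = MaxPooling2D((2, 2))(a)

        # Branch 2: larger kernels for the overall composition.
        b = Conv2D(32, (7, 7), activation='relu', padding='same')(inp)
        b = MaxPooling2D((2, 2))(b)
        b = Conv2D(64, (7, 7), activation='relu', padding='same')(b)
        b = MaxPooling2D((2, 2))(b)

        # Merge the two towers and classify.
        merged = concatenate([Flatten()(a), Flatten()(b)])
        out = Dense(n_classes, activation='softmax')(merged)
        return Model(inp, out)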

Task D Boost the performance of your network

We boosted the performance of the network by using an SPP (Spatial Pyramid Pooling) layer instead of a custom pooling layer at the end of each tower (to concatenate the two towers, their shapes must agree).
In addition, this layer makes the model independent of the input image size.
Training is done on the TT100K dataset and testing on the Belgium dataset, as a step towards a more generic model.
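
For reference, a NumPy sketch of what an SPP layer computes, assuming max pooling and pyramid levels (1, 2, 4); the output length depends only on the number of channels and levels, not on the input size.

    import numpy as np

    def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
        """Pool an (H, W, C) feature map into a fixed-length vector."""
        h, w, c = feature_map.shape
        pooled = []
        for n in levels:
            # Split the map into an n x n grid and max-pool each cell per channel.
            ys = np.linspace(0, h, n + 1).astype(int)
            xs = np.linspace(0, w, n + 1).astype(int)
            for i in range(n):
                for j in range(n):
                    cell = feature_map[ys[i]:ys[i + 1], xs[j]:xs[j + 1], :]
                    pooled.append(cell.max(axis=(0, 1)))
        return np.concatenate(pooled)   # length = C * sum(n * n for n in levels)

    # Two different input sizes give the same output length (8 * 21 = 168):
    print(spatial_pyramid_pool(np.random.rand(13, 17, 8)).shape)   # (168,)
    print(spatial_pyramid_pool(np.random.rand(32, 32, 8)).shape)   # (168,)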

=== WEEK 3-4 ===

Short abstract about what you implemented (5 lines max)

Task A: We fixed some errors to be able to run the code.
Task B: We read two articles and wrote summaries.
Task C: SSD object detector, using Keras.
Task D: Evaluation on the Udacity dataset.
Task E: Boosted the performance through data augmentation.

Short explanation of the code in the repository

Task A: YOLO object detector
We fixed some errors to be able to run the code. We obtained these results.

The dataset was also analyzed:
The annotation files do not always include all of the traffic signs present in the image.
The images differ in the number of traffic signs, orientation and illumination.

Task B: -
Task C: SSD object detector, Keras implementation
Task D: Ran the Udacity dataset for 40 epochs and tuned the colors (saturation) to address the challenges of the dataset.
Task E: Data augmentation

Results of the different experiments

All experiments results

Instructions for using the code

CUDA_VISIBLE_DEVICES=0 python train.py -c config/dataset.py -e expName

Indicate the level of completeness of the goals of this week

100%

Link to a Google Drive with the weights of the trained models

Weights of the trained models

Tasks:

Task A Run the provided code

We fixed some errors to be able to run the initial code.
From the dataset analysis we can conclude that:

TT100k dataset: the annotation files do not always include all of the traffic signs that appear in the image, and the selection criteria are not clear. The images differ in the number of traffic signs, orientation and illumination.
Udacity dataset: consists of urban images in which the camera always faces the road, with the dashboard visible. There is a big difference between the training and test images:
training images are taken in strong mid-day light, with mostly saturated colors, shadows and reflections (e.g. reflections from the windshield), while test images have more vivid colors at different times of day.

There is a large, unbalanced variance in luminance across the photos (for example, reflections from the windshield).
Propose (and implement) solutions:
One solution is to preprocess the images by tuning the colors (saturation).
Another is to train with data augmentation on the color channels, creating more variance in the color spectrum.
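
A minimal sketch of the saturation tuning / color augmentation idea, assuming scikit-image is available for the RGB/HSV conversion; in practice the framework's own preprocessing hooks would replace this.

    import numpy as np
    from skimage import color

    def jitter_saturation(rgb_img, low=0.8, high=1.2, rng=np.random):
        """Randomly scale the saturation channel of an RGB image."""
        hsv = color.rgb2hsv(rgb_img)
        hsv[..., 1] = np.clip(hsv[..., 1] * rng.uniform(low, high), 0.0, 1.0)
        return color.hsv2rgb(hsv)       # back to RGB, float values in [0, 1]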

Task B Read two papers

Summaries:
YOLO: You Only Look Once
SSD: Single Shot MultiBox Detector

Task Ci Implement a new network

Implementation found in this Github repo.

Task D Train the networks on a different dataset

We set up new experiment files to detect cars, pedestrians and trucks on the Udacity dataset, then trained and evaluated the models.
We increased the number of epochs to 40 and ran YOLO on plain Udacity (for comparison with TT100K).

Task E Boost the performance of your network

We boosted the performance of the network by implementing the solution proposed above (preprocessing the images by changing the color saturation) and applying data augmentation.

=== WEEK 5-6 ===

Short abstract about what you implemented (5 lines max)

Task A: Ran the provided code (FCN8 semantic segmentation).
Task B: Read two papers: FCN for Semantic Segmentation and U-Net.
Task C: Implemented a new network (U-Net), adapting an existing Keras implementation.
Task D: Trained the network on different datasets.

Short explanation of the code in the repository

Task A: We used the preconfigured experiment file (camvid_segmentation.py) to segment objects with the FCN8 model. Then we analysed the dataset and evaluated on the train, val and test sets.
Task B: -
Task C: We implemented a new network, U-Net, by adapting an existing Keras implementation.
Task D: We trained the network on different datasets: Kitti, Camvid, Cityscapes and Synthia_rand_cityscapes.

Results of the different experiments

All experiments results

Instructions for using the code

CUDA_VISIBLE_DEVICES=0 python train.py -c config/dataset.py -e expName

Indicate the level of completeness of the goals of this week

100%

Link to a Google Drive with the weights of the model

Google Slides

Tasks:

Task A Run the provided code

We fixed some errors to be able to run the initial code. From the dataset analysis we can conclude that:

Kitti and Camvid have the same classes.
Kitti dataset: the images are taken at the same time of day and there is a strong contrast between sun and shadow, which makes segmentation difficult.
Camvid dataset: the streets are more similar, but the images differ in the time of day, which we can observe in the lighting.

Cityscapes and Synthia_rand_cityscapes have the same classes.
Cityscapes dataset: it is similar in terms of illumination and color even though the images come from different German cities. The images differ more in content (pedestrians, bicycles, vehicles, etc.).
Synthia_rand_cityscapes differs in scene content. There are groups of the same scene captured at different moments of the day, which we can observe in the illumination, shadows or even rain.

Task B Read two papers

FCN and U-Net

Task C Implement a new network

A U-Net network was implemented, inspired by zhixuhao's GitHub repository, adding an image mirror-padding layer to obtain a full-image segmentation network.
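
A minimal sketch of the mirror-padding idea as a Keras layer, assuming the TensorFlow backend; the pad size is illustrative, not the value used in the project.

    import tensorflow as tf
    from keras.layers import Input, Lambda

    def mirror_pad(pad=16):
        """Reflect-pad the height and width of an image tensor."""
        paddings = [[0, 0], [pad, pad], [pad, pad], [0, 0]]
        return Lambda(lambda x: tf.pad(x, paddings, mode='REFLECT'))

    # Usage sketch: place it at the input of the U-Net.
    # inp = Input((None, None, 3))
    # x = mirror_pad()(inp)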

Task D Train the networks on a different dataset

We trained the network with four datasets: Kitti, Camvid, Cityscapes and Synthia Cityscapes.
As Kitti and Camvid contain the same number of classes, we tried to use the fine-tuned weights of FCN8 on Camvid to test the performance on Kitti, but we did not obtain good results.
Synthia and Cityscapes also have the same number of classes. The former contains synthetic images, the latter contains real ones.

Task E Boost the performance of your network

We tried data augmentation on the color channels by adding to each pixel value, in every channel, a random intensity drawn from a uniform distribution within a ±20% margin, but unfortunately this did not improve the Jaccard metric. Additionally, we implemented a weighted cross-entropy loss function that takes class frequencies into account for the unbalanced datasets.
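
A minimal Keras-backend sketch of a per-pixel weighted cross-entropy, assuming one-hot ground truth of shape (batch, H, W, n_classes) and median-frequency balancing; the exact weighting scheme used in the project is not stated here.

    import numpy as np
    from keras import backend as K

    def make_weighted_categorical_crossentropy(class_freqs):
        """Build a loss where rare classes get larger per-pixel weights."""
        weights = K.constant(np.median(class_freqs) / np.asarray(class_freqs, dtype='float32'))

        def loss(y_true, y_pred):
            y_pred = K.clip(y_pred, K.epsilon(), 1.0 - K.epsilon())
            pixel_w = K.sum(y_true * weights, axis=-1)        # weight of the true class
            ce = -K.sum(y_true * K.log(y_pred), axis=-1)      # per-pixel cross-entropy
            return K.mean(pixel_w * ce)

        return loss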

=== WEEK 7 ===

Final Slides

Report
