An adventure through inpainting the hidden parts of vehicles with Deep Learning!
Explore the report » · View Mid-semester results · View End-semester results
In urban traffic analysis, the accurate detection of vehicles plays a crucial role in generating reliable statistics for various city management applications. However, occlusions occurring in densely populated city environments pose significant challenges to vehicle detection algorithms, leading to reduced detection rates and compromised data accuracy. To address this issue, we compared two recent models that use machine learning to inpaint occluded vehicles, with the goal of improving the overall detection rate and the reliability of city statistics.
De-occlusion involves a two-step process:
- Occlusion detection (segmentation)
- Inpainting (image completion)
First, an occlusion detection algorithm identifies the regions of the traffic scene that contain occluded vehicles.
Second, an inpainting model takes the occluded region as input and completes the hidden part of the image.
For this project, we focused exclusively on the inpainting step.
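To make the inpainting step concrete, here is a minimal NumPy sketch of how an inpainting result is commonly composited: visible pixels are kept from the input, and only the occluded pixels are taken from the model output. The function name `composite` and the mask convention (1 = keep, 0 = occluded) are assumptions for illustration, not code from this repository.

```python
import numpy as np


def composite(image, generated, keep_mask):
    """Blend a model output into the occluded region only.

    image, generated: float arrays of shape (H, W, 3) with values in [0, 1].
    keep_mask: array of shape (H, W); 1 where the pixel is visible, 0 where occluded.
    """
    # Add a trailing axis so the mask broadcasts over the colour channels.
    keep = keep_mask.astype(image.dtype)[..., None]
    return keep * image + (1.0 - keep) * generated


# Tiny example: a 2x2 black image whose right column is occluded,
# filled in by an all-white "generated" image.
image = np.zeros((2, 2, 3), dtype=np.float32)
generated = np.ones((2, 2, 3), dtype=np.float32)
keep_mask = np.array([[1, 0], [1, 0]])
result = composite(image, generated, keep_mask)
```

In this example the left column of `result` stays black and the right column becomes white, which is the behaviour we expect from any inpainting model: untouched context, completed occlusion.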
The first model, Repaint, is trained using only a dataset of non-occluded vehicle images; the second, AOT-GAN, also needs masks of the occluded parts. Both models learn to inpaint the occluded regions from the available visual information. By leveraging contextual cues and vehicle appearance patterns, the models should generate plausible completions of the occluded regions, restoring the missing vehicle details.
We wanted to evaluate the proposed method on a comprehensive dataset of single vehicles seen from a UAV point of view in urban traffic scenes, fine-tune it with another dataset from the LUTS lab, and then compare the ground-truth images with the inpainted images. An interesting future evaluation would be to compare detection performance before and after applying the inpainting technique.
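As an illustration of what comparing a ground-truth image with an inpainted image can look like, here is a small NumPy sketch of two standard full-reference measures, mean absolute error and PSNR. The function names are ours and the sketch is independent of the evaluation scripts in this repository; it assumes 8-bit images of identical shape.

```python
import numpy as np


def mae(gt, pred):
    """Mean absolute error between two images with values in [0, 255]."""
    diff = gt.astype(np.float64) - pred.astype(np.float64)
    return float(np.mean(np.abs(diff)))


def psnr(gt, pred, max_value=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    diff = gt.astype(np.float64) - pred.astype(np.float64)
    mse = float(np.mean(diff ** 2))
    if mse == 0.0:
        return float("inf")
    return float(10.0 * np.log10(max_value ** 2 / mse))


# Synthetic example: every pixel of the prediction is off by 10 grey levels.
gt = np.full((4, 4, 3), 100, dtype=np.uint8)
pred = np.full((4, 4, 3), 110, dtype=np.uint8)
print(mae(gt, pred), psnr(gt, pred))
```

Lower MAE and higher PSNR indicate an inpainted image closer to the ground truth. Both are pixel-wise measures, so they are usually read alongside perceptual metrics.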
Here are the major frameworks/libraries used in our project.
This project was based on the following GitHub repositories.
Main Machine Learning Models used:
- RePaint: Inpainting using Denoising Diffusion Probabilistic Models
- Guided Diffusion (Repaint training pipeline)
- AOT-GAN for High-Resolution Image Inpainting
Evaluation Metrics:
2D Shape Generator:
To get a local copy up and running follow these simple steps.
For both models, we need to set up the environment, download the datasets and the models, and then run the code.
Get the latest version of Python, PyTorch and pip installed on your machine.
We also highly suggest the use of conda to manage the virtual environments.
We recommend using a virtual environment to run the code of each model individually.
To fetch the code from GitHub, you need to have git installed on your machine.
You can download this repo with the following command:
git clone https://github.com/2Tricky4u/Bachelor-Thesis-De-occlusion-of-occluded-vehicle-images-from-drone-video
- Go to the Repaint folder
cd Models/Repaint
- When in the appropriate virtual environment, install the requirements:
pip install numpy torch blobfile tqdm pyYaml pillow
You should be ready to use Repaint if your machine has a GPU with CUDA support.
Go to the Repaint Usage section to see how to use it.
- Go to the Guided Diffusion folder
cd Models/Guided-Diffusion
- When in the appropriate virtual environment, install the requirements:
pip install -e .
- You also need the following Message Passing Interface (MPI) library:
pip install mpi4py
You should be ready to use Guided Diffusion and train a model for Repaint (if your machine has a GPU with CUDA support).
Go to the Guided Diffusion Usage section to see how to use it and launch training.
- Go to the AOT-GAN folder
cd Models/AOT-GAN
- With conda, create a new virtual environment and install the requirements:
conda env create -f environment.yml
- Activate the environment
conda activate inpainting
You should be ready to use AOT-GAN for inference and training (if your machine has a GPU with CUDA support).
Go to the AOT-GAN Usage section to see how to use it.
- Go to the Evaluation Metrics folder
cd Metrics
- When in the appropriate virtual environment, install the requirements:
pip install piq
You should be ready to use the Evaluation Metrics.
Go to the Evaluation Metrics Usage section to see how to use it.
If any of the above steps fail, please refer to the official documentation of the respective model for more information.
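Before moving on, it can help to confirm that the installed packages are importable and that PyTorch can see a CUDA device. The following is a minimal sketch and not part of this repository: the function name `check_environment` is hypothetical, and the package list is simply the import names that correspond to the `pip install` line for Repaint.

```python
import importlib.util
import sys


def check_environment(packages=("numpy", "torch", "blobfile", "tqdm", "yaml", "PIL")):
    """Report which packages are importable and whether CUDA is visible."""
    report = {name: importlib.util.find_spec(name) is not None for name in packages}
    cuda = False
    if report.get("torch"):
        try:
            import torch
            cuda = bool(torch.cuda.is_available())
        except ImportError:
            # torch is present on disk but cannot be imported; treat as no CUDA.
            cuda = False
    report["cuda"] = cuda
    return report


if __name__ == "__main__":
    print(f"Python {sys.version.split()[0]}")
    for name, ok in check_environment().items():
        print(f"{name:>8}: {'ok' if ok else 'missing'}")
```

Run it inside each virtual environment. A `missing` entry points to the install step to repeat, and `cuda: missing` means the models would fall back to (or fail on) CPU.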
Our generated datasets for the occlusion problem are available here.
To create a dataset, run the script main.py in the Dataset_creation\scripts folder:
- Go to the Dataset_creation\scripts folder
cd Dataset_creation\scripts
- Run the script
python main.py [options -> see below]
You need to specify the input, a folder containing the images of your future dataset, and the output, a folder where the dataset will be created.
In the output folder, the script creates two folders: train for the training part and test for the test part of the dataset.
Each of these folders contains a folder named gt for the ground-truth images and a folder named mask for the masks.
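The layout described above can be verified with a few lines of standard-library Python. This is a sketch for convenience, not a script shipped with the project; the helper name `missing_folders` is hypothetical.

```python
from pathlib import Path
import tempfile

# The four sub-folders the dataset creation script is described as producing.
EXPECTED = [("train", "gt"), ("train", "mask"), ("test", "gt"), ("test", "mask")]


def missing_folders(dataset_root):
    """Return the expected sub-folders that are absent under dataset_root."""
    root = Path(dataset_root)
    return [f"{split}/{kind}" for split, kind in EXPECTED
            if not (root / split / kind).is_dir()]


# Demo on a throwaway directory that deliberately lacks test/mask.
with tempfile.TemporaryDirectory() as tmp:
    for split, kind in EXPECTED[:3]:
        (Path(tmp) / split / kind).mkdir(parents=True)
    demo_missing = missing_folders(tmp)

print(demo_missing)
```

Calling `missing_folders` on your real output folder should return an empty list once the dataset has been created correctly.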
For more information about the options, see the dataset creation section of our report.
We highly suggest setting the resize option to false: resizing the images on the fly takes a lot of RAM and can cause unexpected results.
We provide a standalone script to address this problem.
To resize the images of a folder with the standalone script, copy resizer.py from the Dataset_creation\scripts\standalone_script folder and paste it into the folder containing the images you want to resize. The script expects that folder to contain two sub-folders named gt and mask.
You can edit the beginning of the script to change the size of the images and the background color.
import cv2
import numpy as np
import pathlib
import os
bg_color = [0, 0, 0]  # Background color
square_dim = 128  # Size of the images (square)
def get_files_from_folder(path):
    files = os.listdir(path)
    return np.asarray(files)
...
Then run resizer.py with the following command:
python resizer.py
This will create a folder named new, containing the resized images, in the same folder.
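For readers who want to see what resizing onto a square canvas with a background color involves, here is a sketch of the geometry only. It assumes the image is scaled to fit while keeping its aspect ratio and then centred on the canvas; that is our assumption about a typical approach, and the standalone script may proceed differently. The function name `letterbox_geometry` is hypothetical.

```python
def letterbox_geometry(width, height, square_dim=128):
    """Compute the scaled size and top-left offset that centre an image
    on a square_dim x square_dim canvas while keeping its aspect ratio."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    # Scale so that the longer side exactly fills the canvas.
    scale = square_dim / max(width, height)
    new_w = max(1, round(width * scale))
    new_h = max(1, round(height * scale))
    # Remaining space is split evenly; it would be filled with bg_color.
    x_off = (square_dim - new_w) // 2
    y_off = (square_dim - new_h) // 2
    return new_w, new_h, x_off, y_off


geometry = letterbox_geometry(200, 100, square_dim=128)
print(geometry)
```

For a 200x100 patch and a 128-pixel canvas, the image is scaled to 128x64 and placed 32 pixels from the top, leaving two background bands of equal height.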
This script separates out images that contain too much green.
To use it, copy green_splitter.py from the Dataset_creation\scripts\standalone_script folder and paste it into the folder containing the images you want to split. The script expects that folder to contain two sub-folders named gt and mask.
You need to edit the beginning of the script to change the input and output paths.
import math
import os
import cv2
import numpy as np
original_path = "./v_patches/"  # Path to the folder containing the images
path = "./resized/"  # Path for the outputs
Then run green_splitter.py with the following command:
python green_splitter.py
The script creates a folder named green containing the images with too much green, and a folder with the name you set in path containing the other images.
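One possible way to decide whether an image has "too much green" is to measure the fraction of pixels whose green channel dominates the other two. The sketch below illustrates that idea only; the criterion, the 0.5 threshold and the function names are assumptions, not the logic of green_splitter.py.

```python
import numpy as np


def green_fraction(image_bgr):
    """Fraction of pixels whose green channel is strictly larger than
    both blue and red. image_bgr: uint8 array (H, W, 3) in BGR order,
    the channel order OpenCV uses when reading images."""
    b = image_bgr[..., 0]
    g = image_bgr[..., 1]
    r = image_bgr[..., 2]
    return float(np.mean((g > r) & (g > b)))


def is_too_green(image_bgr, threshold=0.5):
    """Hypothetical rule: flag the image when more than `threshold`
    of its pixels are green-dominant."""
    return green_fraction(image_bgr) > threshold


# Demo: a 2x4 grey image whose left half is painted pure green.
demo = np.full((2, 4, 3), 128, dtype=np.uint8)
demo[:, :2] = (0, 255, 0)
print(green_fraction(demo))
```

With this rule, vehicle patches dominated by vegetation would be routed to the green folder, while the threshold controls how strict the split is.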
Here we show how to use each model and how to train them.
To launch inference with the model, run the script test.py in the Models\RePaint folder:
python test.py --conf_path confs/face_train.yml
You can change the configuration file to use another model or to change the parameters.
Here is the configuration file, with comments explaining how to configure it:
# Copyright (c) 2022 Huawei Technologies Co., Ltd.
# Licensed under CC BY-NC-SA 4.0 (Attribution-NonCommercial-ShareAlike 4.0 International) (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode
#
# The code is released for academic research use only. For commercial use, please contact Huawei Technologies Co., Ltd.
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# This repository was forked from https://github.com/openai/guided-diffusion, which is under the MIT license
attention_resolutions: 16,8
class_cond: false # Use conditioner
diffusion_steps: 4000 # To Not Adapt
learn_sigma: true
noise_schedule: linear #Choose the noise schedule
num_channels: 128
num_head_channels: -1
num_heads: 4
num_res_blocks: 2
resblock_updown: false
use_fp16: false
use_scale_shift_norm: true
classifier_scale: 4.0
lr_kernel_n_std: 2
num_samples: 100
show_progress: true
timestep_respacing: '250'
use_kl: false
predict_xstart: false
rescale_timesteps: false
rescale_learned_sigmas: false
classifier_use_fp16: false
classifier_width: 128
classifier_depth: 2
classifier_attention_resolutions: 16,8
classifier_use_scale_shift_norm: true
classifier_resblock_updown: false
classifier_pool: attention
num_heads_upsample: -1
channel_mult: ''
dropout: 0.0
use_checkpoint: false
use_new_attention_order: false
clip_denoised: true
use_ddim: false
latex_name: RePaint
method_name: Repaint
image_size: 128
model_path: ~/occlusion_removal/guided-diffusion/model000000.pt #~/occlusion_removal/guided-diffusion/openai-2023-06-07-22-46-09-696639/ema_0.9999_000000.pt #model000000.pt #./data/pretrained/256x256_diffusion.pt #./data/pretrained/256x256_diffusion_uncond.pt #./data/pretrained/celeba256_250000.pt
name: face_example
inpa_inj_sched_prev: true
n_jobs: 1
print_estimated_vars: true
inpa_inj_sched_prev_cumnoise: false
schedule_jump_params:
  t_T: 200 # 250 # YT : Reduce the total number of steps (w/o resampling) to remove more noise per step
  # when t_T = 50 -> 385 iterations / t_T = 200 -> 3710
  n_sample: 1
  jump_length: 5 # 10 # YT : Reduce it to resample fewer times
  jump_n_sample: 10 # YT : Apply resampling not from the beginning but only after a specific time
data:
  eval:
    paper_face_mask:
      mask_loader: true
      gt_path: ./data/datasets/gts/face
      mask_path: ./data/datasets/gt_keep_masks/face
      image_size: 128
      class_cond: false
      deterministic: true
      random_crop: false
      random_flip: false
      return_dict: true
      drop_last: false
      batch_size: 16
      return_dataloader: true
      offset: 0
      max_len: 1 # 8 # YT : They iterate over paths and sample <max_len> images
      # as for the face_example, only 1 image is available -> set max_len to 1 otherwise they sample 8 times the same gt and the mask
      paths:
        srs: ./log/face_example/inpainted
        lrs: ./log/face_example/gt_masked
        gts: ./log/face_example/gt
        gt_keep_masks: ./log/face_example/gt_keep_mask
This configuration file must be kept identical for training and inference, even though inference runs with the Repaint model and training runs with the Guided Diffusion model.
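The schedule_jump_params values largely determine how long inference takes. The sketch below counts the sampling iterations for a given schedule using a simplified model of the resampling jumps: at every multiple of jump_length below t_T - jump_length, the sampler goes back up jump_length steps, jump_n_sample - 1 times. The function is our own illustration, not the scheduler code of the Repaint repository. With the values in the configuration file (t_T: 200, jump_length: 5, jump_n_sample: 10) it reproduces the 3710 iterations noted in the comment.

```python
def count_repaint_steps(t_T, jump_length, jump_n_sample):
    """Count sampling transitions for a resampling (jump) schedule.

    Simplified model: each timestep that is a multiple of jump_length
    (and below t_T - jump_length) triggers (jump_n_sample - 1) jumps
    back up by jump_length steps.
    """
    jumps = {t: jump_n_sample - 1
             for t in range(0, t_T - jump_length, jump_length)}
    t, steps = t_T, 0
    while t >= 1:
        t -= 1            # one denoising step
        steps += 1
        if jumps.get(t, 0) > 0:
            jumps[t] -= 1
            t += jump_length      # re-noise: go back up jump_length steps
            steps += jump_length
    return steps


print(count_repaint_steps(t_T=200, jump_length=5, jump_n_sample=10))
```

This makes the trade-off visible: lowering jump_n_sample or t_T shortens inference, at the cost of fewer resampling passes over the inpainted region.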
To train the model, you must go through the next model, i.e. Guided Diffusion.
To train a model for Repaint, launch the script images_train.py in the Models\Guided-DIffusion-Fixed-for-Repaint folder:
python images_train.py --conf_path confs/face_train.yml
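Because the training and inference configurations have to agree, a small comparison helper can catch accidental differences before a long training run. This is a sketch under assumptions: the helper name `conf_mismatches` and the selection of keys to compare are ours, and the dictionaries would in practice come from loading the two YAML files (for example with `yaml.safe_load` from the pyYaml package installed earlier).

```python
# Keys taken from the configuration file above that describe the network
# and the diffusion process; the selection itself is an assumption.
ARCHITECTURE_KEYS = (
    "image_size", "num_channels", "num_res_blocks", "num_heads",
    "attention_resolutions", "learn_sigma", "diffusion_steps", "noise_schedule",
)


def conf_mismatches(train_conf, infer_conf, keys=ARCHITECTURE_KEYS):
    """Return {key: (train_value, infer_value)} for every key that differs."""
    return {
        key: (train_conf.get(key), infer_conf.get(key))
        for key in keys
        if train_conf.get(key) != infer_conf.get(key)
    }


# Demo with in-memory dictionaries standing in for two loaded YAML files.
train_conf = {"image_size": 128, "num_channels": 128, "num_res_blocks": 2,
              "num_heads": 4, "attention_resolutions": "16,8",
              "learn_sigma": True, "diffusion_steps": 4000,
              "noise_schedule": "linear"}
infer_conf = dict(train_conf, image_size=256)
mismatches = conf_mismatches(train_conf, infer_conf)
print(mismatches)
```

An empty result means the two files agree on the compared keys; any entry shows the training value first and the inference value second.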
To train AOT-GAN, launch the script train.py in the Models\AOT-GAN folder:
python train.py --dir_image "./data/VRAI_128_WonBL/new/gt" --dir_mask "./data/VRAI_128_WonBL/new/mask" --data_train "" --data_test "" --mask_type "" --image_size 128 --batch_size 16 --print_every 1000 --save_every 1000 --pre_train "./experiments/aotgan_128/" --resume
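Long commands with empty-string arguments such as `--data_train ""` are easy to mistype in a shell. As a convenience, the sketch below builds the same argument list in Python so it can be reused across experiments. The helper `build_command` is hypothetical; the option names and values are copied from the command above, and nothing is executed unless you uncomment the last line.

```python
import shlex
import subprocess


def build_command(script, options, flags=()):
    """Build an argument list: python <script> --name value ... --flag ..."""
    cmd = ["python", script]
    for name, value in options.items():
        cmd += [f"--{name}", str(value)]
    cmd += [f"--{flag}" for flag in flags]
    return cmd


train_cmd = build_command(
    "train.py",
    {
        "dir_image": "./data/VRAI_128_WonBL/new/gt",
        "dir_mask": "./data/VRAI_128_WonBL/new/mask",
        "data_train": "",
        "data_test": "",
        "mask_type": "",
        "image_size": 128,
        "batch_size": 16,
        "print_every": 1000,
        "save_every": 1000,
        "pre_train": "./experiments/aotgan_128/",
    },
    flags=("resume",),
)

# Show the shell-quoted command for inspection.
print(shlex.join(train_cmd))
# subprocess.run(train_cmd, check=True)  # uncomment to actually launch training
```

Passing the list to `subprocess.run` keeps the empty strings as real arguments, which avoids quoting differences between shells.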
To infer with AOT-GAN, launch the script test.py in the Models\AOT-GAN folder:
python test.py --pre_train "../experiments/G0000000.pt" --dir_mask "data/mask2" --dir_image "data/gt"
To evaluate AOT-GAN, launch the script eval.py in the Models\AOT-GAN folder:
python eval.py --real_dir "data/GT/1" --fake_dir "data/AOT_big/1" --metric mae psnr ssim fid
To evaluate some images, launch the script main.py in the Metrics folder:
python main.py
You can configure the script by changing the config.yaml file in the Metrics\src\config folder.
# data parameters
dataset_name: car256 # used for saving csv. Easy to remember which dataset you're evaluating, especially if you're using multiple datasets.
dataset_with_subfolders: False # True
dataset_format: image # file_list # file_list is not implemented. I will implement it later. Easier to implement.
multiple_evaluation: False
generated_image_path: ./data/AOT_big/patch/1
ground_truth_image_path: ./data/GT/patch_big/1
return_dataset_name: False # Currently, no use. In future, it will be used for multi-testing.
# experiment
exp_type: ablation
model_name: difnet
# processing parameters
batch_size: 1 # set according to your GPU/CPU
image_shape: [ 256, 256, 3 ] # set according to your need.
random_crop: False # currently, not implemented. In future, it will be used to evaluate patches.
threads: 4 # set according to your CPU.
# print option
print_interval_frequency: 1
show_config: True
# save options
save_results: True
save_results_path: ./logs
save_file_name: metrics
save_type: csv
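Full-reference metrics compare each generated image with its ground truth, so the folders given in generated_image_path and ground_truth_image_path have to line up. Here is a minimal sketch for checking that before an evaluation run. It assumes that matching images share the same file name in both folders, which is our assumption and not something the configuration states; the helper name `pair_by_name` is hypothetical.

```python
from pathlib import Path
import tempfile

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def list_images(folder):
    """Map file name -> path for every image file directly inside folder."""
    return {p.name: p for p in Path(folder).iterdir()
            if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES}


def pair_by_name(generated_dir, ground_truth_dir):
    """Return (pairs, unmatched): pairs of (generated, ground truth) paths
    sharing a file name, and the sorted names present in only one folder."""
    gen = list_images(generated_dir)
    gt = list_images(ground_truth_dir)
    common = sorted(gen.keys() & gt.keys())
    unmatched = sorted(gen.keys() ^ gt.keys())
    return [(gen[name], gt[name]) for name in common], unmatched


# Demo with empty placeholder files in two temporary folders.
with tempfile.TemporaryDirectory() as gen_dir, tempfile.TemporaryDirectory() as gt_dir:
    (Path(gen_dir) / "a.png").touch()
    (Path(gt_dir) / "a.png").touch()
    (Path(gt_dir) / "b.png").touch()
    pairs, unmatched = pair_by_name(gen_dir, gt_dir)
    n_pairs = len(pairs)

print(n_pairs, unmatched)
```

A non-empty `unmatched` list usually means an inference run was interrupted or the two paths point at different dataset subsets.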
- Provide a script to create occlusion datasets
- Generate multiple datasets for vehicles (UAV POV)
- Select and study two of the newest ML models for inpainting
- Set up two different pipelines to run/train those models
- Set up a pipeline to evaluate the models
- Evaluate the mid-semester results of the models
- Train the models successfully, with the expected results, on the created datasets
  - Repaint
  - AOT-GAN (doesn't converge)
Xavier Ogay - website - xavier.ogay@epfl.ch
Mahmoud Dokmak - mahmoud.dokmak@epfl.ch
Project Link: https://github.com/2Tricky4u/Bachelor-Thesis-De-occlusion-of-occluded-vehicle-images-from-drone-video
We would like to thank our supervisors, Yura Tak, Robert Fonod and Prof. Geroliminis, for their guidance and support throughout the project.
https://drive.google.com/drive/folders/1OmV5iyZMOzC-QUi7ynDh4HsonT9wst51?usp=sharing