
3D pedestrian detection in farmland by monocular RGB image and infrared sensing


dfqytcom/3D-Pedestrian-Detection-in-Farmland

 
 


3D-Pedestrian-Detection-in-Farmland

Paper: 3D Pedestrian Detection in Farmland by Monocular RGB Image and Far-Infrared Sensing. Remote Sens. 2021, 13(15).
Wei Tian, Zhenwen Deng, Dong Yin, Zehan Zheng, Yuyao Huang and Xin Bi. [PDF]

This repository contains the PyTorch implementation of a framework for 3D pedestrian detection in farmland, including D5F, the Dual-Input Depth-Guided Dynamic-Depthwise-Dilated Fusion network.

Abstract

The automated driving of agricultural machinery is of great significance for agricultural production efficiency, yet it remains challenging because environmental conditions vary significantly between day and night. To address operation safety for pedestrians in farmland, this paper proposes a 3D person sensing approach based on monocular RGB and Far-Infrared (FIR) images. Since publicly available datasets for agricultural 3D pedestrian detection are scarce, a new dataset named “FieldSafePedestrian” is proposed, which includes field images in both day and night. The data augmentation of night images and the semi-automatic labeling approach are also described; they facilitate the 3D annotation of pedestrians. To fuse heterogeneous images from sensors with non-parallel optical axes, the Dual-Input Depth-Guided Dynamic-Depthwise-Dilated Fusion network (D5F) is proposed, which uses estimated depth information to assist pixel alignment between FIR and RGB images and deploys dynamic filtering to guide the fusion of heterogeneous information. Experiments on field images in both daytime and nighttime demonstrate that, compared with the state of the art, the dynamically aligned image fusion achieves an accuracy gain of 3.9% and 4.5% in terms of center distance and BEV-IOU, respectively, without affecting run-time efficiency.

Framework

Architecture

The framework of 3D pedestrian detection in farmland, covering both dataset generation and pedestrian detection. The infrared image pixels are aligned with the RGB image pixels using the estimated depth map. The concatenated image channels and the derived pseudo point clouds are fed to separate convolutional networks to extract deep semantic features, which are then fused to estimate the 3D cylinder label of each pedestrian in the agricultural field.
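The depth-guided alignment described above can be illustrated with standard pinhole geometry. The following is a minimal sketch, not the repository's code: the function names, the intrinsic matrices `K_rgb` and `K_fir`, and the extrinsics `R` and `t` (RGB camera frame to FIR camera frame) are all assumptions introduced for illustration. It lifts a depth map to a pseudo point cloud and then projects each point into the FIR image to find the matching pixel.

```python
import numpy as np


def backproject_depth(depth, K):
    """Lift an (H, W) depth map to an (H, W, 3) pseudo point cloud.

    K is an assumed 3x3 pinhole intrinsic matrix; depth must be positive.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel grid, shape (H, W)
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1)


def project_to_fir(points_rgb, R, t, K_fir):
    """Map points from the RGB camera frame to FIR pixel coordinates.

    R (3x3) and t (3,) are assumed extrinsics from the RGB to the FIR frame.
    Returns an (H, W, 2) array of (u, v) positions in the FIR image.
    """
    pts = points_rgb.reshape(-1, 3) @ R.T + t   # rigid transform into FIR frame
    uvw = pts @ K_fir.T                         # perspective projection
    uv = uvw[:, :2] / uvw[:, 2:3]               # divide by depth in FIR frame
    return uv.reshape(points_rgb.shape[:2] + (2,))


# Hypothetical calibration values, for illustration only.
K_rgb = np.array([[100.0, 0.0, 2.0], [0.0, 100.0, 1.5], [0.0, 0.0, 1.0]])
depth = np.full((4, 5), 2.0)                    # constant 2 m depth map
cloud = backproject_depth(depth, K_rgb)
fir_uv = project_to_fir(cloud, np.eye(3), np.array([0.1, 0.0, 0.0]), K_rgb)
```

The resulting `fir_uv` map could then be used to resample the FIR image onto the RGB pixel grid (for example with bilinear interpolation) before the channels are concatenated. Because the pixel shift depends on depth, a single fixed homography would not align two cameras with non-parallel optical axes across the whole scene.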

FieldSafePedestrian Dataset

Google Drive | Baidu Pan (password: j12t)

  • We have uploaded some examples of the FieldSafePedestrian dataset, containing 500 paired monocular RGB images and Far-IR images.

  • The RGB daytime images have corresponding nighttime images generated by the CycleGAN network as data augmentation.

  • Annotation Format: score classification x1_2d y1_2d x2_2d y2_2d r h x_3d y_3d z_3d
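To make the annotation format listed above concrete, here is a minimal parsing sketch. It assumes one whitespace-separated label per line with the eleven fields in the listed order, and it interprets `r` and `h` as the radius and height of the 3D cylinder label mentioned in the Framework section; the class name, function name, and sample line are hypothetical and not taken from the dataset. The classification field is kept as a string because the README does not say whether it is numeric or textual.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class PedestrianLabel:
    score: float
    classification: str                         # kept as text; encoding is not documented
    box2d: Tuple[float, float, float, float]    # assumed (x1, y1, x2, y2) in image pixels
    radius: float                               # assumed cylinder radius r
    height: float                               # assumed cylinder height h
    center3d: Tuple[float, float, float]        # assumed (x, y, z) of the 3D label


def parse_label_line(line: str) -> PedestrianLabel:
    """Parse one annotation line with the 11 fields in the listed order."""
    parts = line.split()
    if len(parts) != 11:
        raise ValueError(f"expected 11 fields, got {len(parts)}")
    nums = [float(p) for p in parts[2:]]        # the nine numeric geometry fields
    return PedestrianLabel(
        score=float(parts[0]),
        classification=parts[1],
        box2d=tuple(nums[0:4]),
        radius=nums[4],
        height=nums[5],
        center3d=tuple(nums[6:9]),
    )


# Hypothetical sample line, for illustration only.
label = parse_label_line("1.0 pedestrian 100 50 140 170 0.3 1.75 2.0 0.9 12.5")
```

Units and coordinate conventions are not stated in this README, so they should be checked against the paper before the parsed values are used for evaluation.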

Citation

If you find this project useful in your research, please consider citing us.

@Article{rs13152896,
AUTHOR = {Tian, Wei and Deng, Zhenwen and Yin, Dong and Zheng, Zehan and Huang, Yuyao and Bi, Xin},
TITLE = {3D Pedestrian Detection in Farmland by Monocular RGB Image and Far-Infrared Sensing},
JOURNAL = {Remote Sensing},
VOLUME = {13},
YEAR = {2021},
NUMBER = {15},
ARTICLE-NUMBER = {2896},
URL = {https://www.mdpi.com/2072-4292/13/15/2896},
ISSN = {2072-4292},
DOI = {10.3390/rs13152896}
}
