Bissmella/Small-object-detection-transformers

Multimodal transformer using cross-channel attention for object detection in remote sensing images

This repo contains the official PyTorch implementation for the ICIP 2024 paper: 'Multimodal transformer using cross-channel attention for object detection in remote sensing images' (paper).

Brief Introduction

  • Cross-channel attention fuses multimodal sensor data (RGB, IR) using cross-attention, taking two channels into account at a time. The fused output of the cross-channel attention is then used for object detection.
  • A Swin Transformer backbone is used, enhanced with a convolutional layer in the non-shifting block, which acts as additional support for Swin's shifting mechanism.
  • The proposed model consists of cross-channel attention, an enhanced Swin-like backbone, and a YOLOv5-based detection head.
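To make the first point above more concrete, here is a minimal PyTorch sketch of cross-attention applied to two channels at a time. It is an illustration under stated assumptions, not the code from this repository: the class name `PairwiseCrossChannelAttention`, the helper `fuse_all_pairs`, the patch size, the embedding width, the R, G, B, IR channel ordering, and the averaging over all ordered channel pairs are all hypothetical choices made for the example.

```python
from itertools import permutations

import torch
import torch.nn as nn


class PairwiseCrossChannelAttention(nn.Module):
    """Hypothetical sketch: one channel attends to another channel."""

    def __init__(self, patch_size=8, embed_dim=32, num_heads=4):
        super().__init__()
        # Non-overlapping patch embedding for a single-channel image.
        self.embed = nn.Conv2d(1, embed_dim, kernel_size=patch_size, stride=patch_size)
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(embed_dim)

    def tokens(self, x):
        # (B, 1, H, W) -> (B, N, D), where N is the number of patches.
        return self.embed(x).flatten(2).transpose(1, 2)

    def forward(self, chan_a, chan_b):
        q = self.tokens(chan_a)   # queries come from the first channel
        kv = self.tokens(chan_b)  # keys and values come from the second channel
        fused, _ = self.attn(q, kv, kv)
        return self.norm(q + fused)  # residual connection, then layer norm


def fuse_all_pairs(module, image):
    """Average the cross-attention output over every ordered channel pair.

    `image` is assumed to be (B, 4, H, W) with channels R, G, B, IR.
    """
    channels = image.split(1, dim=1)  # tuple of (B, 1, H, W) tensors
    outputs = [
        module(channels[i], channels[j])
        for i, j in permutations(range(len(channels)), 2)
    ]
    return torch.stack(outputs).mean(dim=0)


if __name__ == "__main__":
    model = PairwiseCrossChannelAttention()
    rgb_ir = torch.randn(2, 4, 64, 64)
    print(fuse_all_pairs(model, rgb_ir).shape)
```

In this sketch the fused token sequence would then be reshaped and passed on to the backbone. How the actual model combines the pairwise outputs is described in the paper and may differ from the simple average used here.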

Data Preparation

  • We train and evaluate our model on the VEDAI dataset, which contains aerial images in two modalities: RGB and IR. The VEDAI dataset can be downloaded from (here).
  • Please prepare the original VEDAI dataset using the 'data_transform.py' script.
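Before running the preparation script, it can help to check that every RGB image has a matching IR image. The sketch below is not part of 'data_transform.py'. It assumes the usual VEDAI naming, where the colour and infrared versions of an image share a numeric stem and end in `_co.png` and `_ir.png`. The function name `pair_vedai_images` and the directory path in the usage line are hypothetical.

```python
from pathlib import Path


def pair_vedai_images(image_dir, rgb_suffix="_co.png", ir_suffix="_ir.png"):
    """Return (rgb_path, ir_path) tuples for images that exist in both modalities.

    The suffixes are assumptions about the VEDAI file layout; adjust them if
    your copy of the dataset is named differently.
    """
    image_dir = Path(image_dir)
    pairs = []
    for rgb_path in sorted(image_dir.glob(f"*{rgb_suffix}")):
        stem = rgb_path.name[: -len(rgb_suffix)]
        ir_path = image_dir / f"{stem}{ir_suffix}"
        if ir_path.exists():  # skip RGB images that have no IR counterpart
            pairs.append((rgb_path, ir_path))
    return pairs


if __name__ == "__main__":
    # Hypothetical location of the extracted dataset.
    for rgb, ir in pair_vedai_images("VEDAI/images")[:5]:
        print(rgb.name, ir.name)
```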

Citation

If you find the idea useful or inspiring, please consider citing:

@article{bahaduri2023multimodal,
  title={Multimodal Transformer Using Cross-Channel Attention for Object Detection in Remote Sensing Images},
  author={Bahaduri, Bissmella and Ming, Zuheng and Feng, Fangchen and Mokraoui, Anissa},
  journal={arXiv preprint arXiv:2310.13876},
  year={2023}
}

Acknowledgement

Our code is heavily based on previous works, including SuperYOLO and YOLOv5. We thank their authors for open-sourcing their implementations!
