Awesome Multi-modal Object Tracking (MMOT)

A continuously updated project to track the latest progress in multi-modal object tracking.

We hope this repository brings you some inspiration for your own work.

If you find this project useful, please give it a star ⭐ on GitHub.

If you have any suggestions, please feel free to contact: andyzhangchunhui@gmail.com.

💥 Highlights

  • 2024.05.30: The WebUOT-1M paper is now available on arXiv.
  • 2024.05.24: The report of the Awesome MMOT project is now available on arXiv and Zhihu (知乎).
  • 2024.05.20: The Awesome MMOT project was launched.

Contents

Citation

If you find our work useful in your research, please consider citing:

@article{zhang2024awesome,
  title={Awesome Multi-modal Object Tracking},
  author={Zhang, Chunhui and Liu, Li and Wen, Hao and Zhou, Xi and Wang, Yanfeng},
  journal={arXiv preprint arXiv:2405.14200},
  year={2024}
}

Survey

  • Pengyu Zhang, Dong Wang, Huchuan Lu.
    "Multi-modal Visual Tracking: Review and Experimental Comparison." ArXiv (2022). [paper]

  • Zhangyong Tang, Tianyang Xu, Xiao-Jun Wu.
    "A Survey for Deep RGBT Tracking." ArXiv (2022). [paper]

  • Jinyu Yang, Zhe Li, Song Yan, Feng Zheng, Aleš Leonardis, Joni-Kristian Kämäräinen, Ling Shao.
    "RGBD Object Tracking: An In-depth Review." ArXiv (2022). [paper]

  • Chenglong Li, Andong Lu, Lei Liu, Jin Tang.
    "Multi-modal Visual Tracking: A Survey." Journal of Image and Graphics (2023). [paper]

  • Zhou Ou, Ge Ying, Zhang Dawei, Zheng Zhonglong.
    "A Survey of RGB-Depth Object Tracking." Journal of Computer-Aided Design & Computer Graphics (2024). [paper]

  • Zhang, ZhiHao and Wang, Jun and Zang, Zhuli and Jin, Lei and Li, Shengjie and Wu, Hao and Zhao, Jian and Bo, Zhang.
    "Review and Analysis of RGBT Single Object Tracking Methods: A Fusion Perspective." ACM Transactions on Multimedia Computing, Communications and Applications (2024). [paper]

  • MV-RGBT & MoETrack: Zhangyong Tang, Tianyang Xu, Zhenhua Feng, Xuefeng Zhu, He Wang, Pengcheng Shao, Chunyang Cheng, Xiao-Jun Wu, Muhammad Awais, Sara Atito, Josef Kittler.
    "Revisiting RGBT Tracking Benchmarks from the Perspective of Modality Validity: A New Benchmark, Problem, and Method." ArXiv (2024). [paper] [code]

  • Xingchen Zhang and Ping Ye and Henry Leung and Ke Gong and Gang Xiao.
    "Object fusion tracking based on visible and infrared images: A comprehensive review." Information Fusion (2024). [paper]

  • Mingzheng Feng and Jianbo Su.
    "RGBT tracking: A comprehensive review." Information Fusion (2024). [paper]

Vision-Language Tracking

Datasets

| Dataset | Pub. & Date | WebSite | Introduction |
|---|---|---|---|
| OTB99-L | CVPR-2017 | OTB99-L | 99 videos |
| LaSOT | CVPR-2019 | LaSOT | 1,400 videos |
| LaSOT_EXT | IJCV-2021 | LaSOT_EXT | 150 videos |
| TNL2K | CVPR-2021 | TNL2K | 2,000 videos |
| WebUAV-3M | TPAMI-2023 | WebUAV-3M | 4,500 videos, 3.3 million frames, UAV tracking, vision-language-audio |
| MGIT | NeurIPS-2023 | MGIT | 150 long video sequences, 2.03 million frames, three semantic grains (action, activity, and story) |
| VastTrack | arXiv-2024 | VastTrack | 50,610 video sequences, 4.2 million frames, 2,115 classes |
| WebUOT-1M | arXiv-2024 | WebUOT-1M | The first million-scale underwater object tracking dataset: 1,500 video sequences, 1.1 million frames |

Papers

2024

  • Tapall.ai: Mingqi Gao, Jingnan Luo, Jinyu Yang, Jungong Han, Feng Zheng.
    "1st Place Solution for MeViS Track in CVPR 2024 PVUW Workshop: Motion Expression guided Video Segmentation." ArXiv (2024). [paper] [code]

  • DTLLM-VLT: Xuchen Li, Xiaokun Feng, Shiyu Hu, Meiqi Wu, Dailing Zhang, Jing Zhang, Kaiqi Huang.
    "DTLLM-VLT: Diverse Text Generation for Visual Language Tracking Based on LLM." CVPRW (2024). [paper]

  • UVLTrack: Yinchao Ma, Yuyang Tang, Wenfei Yang, Tianzhu Zhang, Jinpeng Zhang, Mengxue Kang.
    "Unifying Visual and Vision-Language Tracking via Contrastive Learning." AAAI (2024). [paper] [code]

  • QueryNLT: Yanyan Shao, Shuting He, Qi Ye, Yuchao Feng, Wenhan Luo, Jiming Chen.
    "Context-Aware Integration of Language and Visual References for Natural Language Tracking." CVPR (2024). [paper] [code]

  • OSDT: Guangtong Zhang, Bineng Zhong, Qihua Liang, Zhiyi Mo, Ning Li, Shuxiang Song.
    "One-Stream Stepwise Decreasing for Vision-Language Tracking." TCSVT (2024). [paper]

  • TTCTrack: Zhongjie Mao, Yucheng Wang, Xi Chen, Jia Yan.
    "Textual Tokens Classification for Multi-Modal Alignment in Vision-Language Tracking." ICASSP (2024). [paper]

  • MMTrack: Zheng, Yaozong and Zhong, Bineng and Liang, Qihua and Li, Guorong and Ji, Rongrong and Li, Xianxian.
    "Toward Unified Token Learning for Vision-Language Tracking." TCSVT (2024). [paper]

  • Ping Ye, Gang Xiao, Jun Liu.
    "Multimodal Features Alignment for Vision–Language Object Tracking." Remote Sensing (2024). [paper]

2023

  • All in One: Chunhui Zhang, Xin Sun, Li Liu, Yiqian Yang, Qiong Liu, Xi Zhou, Yanfeng Wang.
    "All in One: Exploring Unified Vision-Language Tracking with Multi-Modal Alignment." ACM MM (2023). [paper] [code]

  • CiteTracker: Xin Li, Yuqing Huang, Zhenyu He, Yaowei Wang, Huchuan Lu, Ming-Hsuan Yang.
    "CiteTracker: Correlating Image and Text for Visual Tracking." ICCV (2023). [paper] [code]

  • JointNLT: Li Zhou, Zikun Zhou, Kaige Mao, Zhenyu He.
    "Joint Visual Grounding and Tracking with Natural Language Specification." CVPR (2023). [paper] [code]

  • DecoupleTNL: Ma, Ding and Wu, Xiangqian.
    "Tracking by Natural Language Specification with Long Short-term Context Decoupling." ICCV (2023). [paper]

  • Haojie Zhao, Xiao Wang, Dong Wang, Huchuan Lu, Xiang Ruan.
    "Transformer vision-language tracking via proxy token guided cross-modal fusion." PRL (2023). [paper]

  • OVLM: Zhang, Huanlong and Wang, Jingchao and Zhang, Jianwei and Zhang, Tianzhu and Zhong, Bineng.
    "One-Stream Vision-Language Memory Network for Object Tracking." TMM (2023). [paper]

  • SATracker: Jiawei Ge, Xiangmei Chen, Jiuxin Cao, Xuelin Zhu, Bo Liu.
    "Beyond Visual Cues: Synchronously Exploring Target-Centric Semantics for Vision-Language Tracking." ArXiv (2023). [paper]

  • VLATrack: Zuo, Jixiang and Wu, Tao and Shi, Meiping and Liu, Xueyan and Zhao, Xijun.
    "Multi-Modal Object Tracking with Vision-Language Adaptive Fusion and Alignment." RICAI (2023). [paper]

  • VLT_TT: Mingzhe Guo, Zhipeng Zhang, Liping Jing, Haibin Ling, Heng Fan.
    "Divert More Attention to Vision-Language Object Tracking." ArXiv (2023). [paper] [code]

2022

  • VLT_TT: Mingzhe Guo, Zhipeng Zhang, Heng Fan, Liping Jing.
    "Divert More Attention to Vision-Language Tracking." NeurIPS (2022). [paper] [code]

  • AdaRS: Li, Yihao and Yu, Jun and Cai, Zhongpeng and Pan, Yuwen.
    "Cross-modal Target Retrieval for Tracking by Natural Language." CVPR Workshops (2022). [paper]

2021

  • SNLT: Qi Feng, Vitaly Ablavsky, Qinxun Bai, Stan Sclaroff.
    "Siamese Natural Language Tracker: Tracking by Natural Language Descriptions with Siamese Trackers." CVPR (2021). [paper] [code]

RGBE Tracking

Datasets

| Dataset | Pub. & Date | WebSite | Introduction |
|---|---|---|---|
| FE108 | ICCV-2021 | FE108 | 108 event videos |
| COESOT | arXiv-2022 | COESOT | 1,354 RGB-event video pairs |
| VisEvent | TC-2023 | VisEvent | 820 RGB-event video pairs |
| EventVOT | CVPR-2024 | EventVOT | 1,141 event videos |
| CRSOT | arXiv-2024 | CRSOT | 1,030 RGB-event video pairs |
| FELT | arXiv-2024 | FELT | 742 RGB-event video pairs |

Papers

2024

  • Mamba-FETrack: Ju Huang, Shiao Wang, Shuai Wang, Zhe Wu, Xiao Wang, Bo Jiang.
    "Mamba-FETrack: Frame-Event Tracking via State Space Model." ArXiv (2024). [paper] [code]

  • AMTTrack: Xiao Wang, Ju Huang, Shiao Wang, Chuanming Tang, Bo Jiang, Yonghong Tian, Jin Tang, Bin Luo.
    "Long-term Frame-Event Visual Tracking: Benchmark Dataset and Baseline." ArXiv (2024). [paper] [code]

  • TENet: Pengcheng Shao, Tianyang Xu, Zhangyong Tang, Linze Li, Xiao-Jun Wu, Josef Kittler.
    "TENet: Targetness Entanglement Incorporating with Multi-Scale Pooling and Mutually-Guided Fusion for RGB-E Object Tracking." ArXiv (2024). [paper] [code]

  • HDETrack: Xiao Wang, Shiao Wang, Chuanming Tang, Lin Zhu, Bo Jiang, Yonghong Tian, Jin Tang.
    "Event Stream-based Visual Object Tracking: A High-Resolution Benchmark Dataset and A Novel Baseline." CVPR (2024). [paper] [code]

  • Yabin Zhu, Xiao Wang, Chenglong Li, Bo Jiang, Lin Zhu, Zhixiang Huang, Yonghong Tian, Jin Tang.
    "CRSOT: Cross-Resolution Object Tracking using Unaligned Frame and Event Cameras." ArXiv (2024). [paper] [code]

  • CDFI: Jiqing Zhang, Xin Yang, Yingkai Fu, Xiaopeng Wei, Baocai Yin, Bo Dong.
    "Object Tracking by Jointly Exploiting Frame and Event Domain." ArXiv (2024). [paper]

  • MMHT: Hongze Sun, Rui Liu, Wuque Cai, Jun Wang, Yue Wang, Huajin Tang, Yan Cui, Dezhong Yao, Daqing Guo.
    "Reliable Object Tracking by Multimodal Hybrid Feature Extraction and Transformer-Based Fusion." ArXiv (2024). [paper]

2023

  • Zhiyu Zhu, Junhui Hou, Dapeng Oliver Wu.
    "Cross-modal Orthogonal High-rank Augmentation for RGB-Event Transformer-trackers." ICCV (2023). [paper] [code]

  • AFNet: Jiqing Zhang, Yuanchen Wang, Wenxi Liu, Meng Li, Jinpeng Bai, Baocai Yin, Xin Yang.
    "Frame-Event Alignment and Fusion Network for High Frame Rate Tracking." CVPR (2023). [paper] [code]

  • RT-MDNet: Xiao Wang, Jianing Li, Lin Zhu, Zhipeng Zhang, Zhe Chen, Xin Li, Yaowei Wang, Yonghong Tian, Feng Wu.
    "VisEvent: Reliable Object Tracking via Collaboration of Frame and Event Flows." TC (2023). [paper] [code]

2022

  • Event-tracking: Zhiyu Zhu, Junhui Hou, Xianqiang Lyu.
    "Learning Graph-embedded Key-event Back-tracing for Object Tracking in Event Clouds." NeurIPS (2022). [paper] [code]

  • STNet: Jiqing Zhang, Bo Dong, Haiwei Zhang, Jianchuan Ding, Felix Heide, Baocai Yin, Xin Yang.
    "Spiking Transformers for Event-based Single Object Tracking." CVPR (2022). [paper] [code]

  • CEUTrack: Chuanming Tang, Xiao Wang, Ju Huang, Bo Jiang, Lin Zhu, Jianlin Zhang, Yaowei Wang, Yonghong Tian.
    "Revisiting Color-Event based Tracking: A Unified Network, Dataset, and Metric." ArXiv (2022). [paper] [code]

2021

  • CFE: Jiqing Zhang, Kai Zhao, Bo Dong, Yingkai Fu, Yuxin Wang, Xin Yang, Baocai Yin.
    "Multi-domain Collaborative Feature Representation for Robust Visual Object Tracking." The Visual Computer (2021). [paper]

RGBD Tracking

Datasets

| Dataset | Pub. & Date | WebSite | Introduction |
|---|---|---|---|
| PTB | ICCV-2013 | PTB | 100 sequences |
| STC | TC-2018 | STC | 36 sequences |
| CDTB | ICCV-2019 | CDTB | 80 sequences |
| DepthTrack | ICCV-2021 | DepthTrack | 200 sequences |
| RGBD1K | AAAI-2023 | RGBD1K | 1,050 sequences, 2.5 million frames |
| DTTD | CVPR Workshops-2023 | DTTD | 103 scenes, 55,691 frames |
| ARKitTrack | CVPR-2023 | ARKitTrack | 300 RGB-D sequences, 455 targets, 229.7K video frames |

Papers

2024

  • SSLTrack: Xue-Feng Zhu, Tianyang Xu, Sara Atito, Muhammad Awais, Xiao-Jun Wu, Zhenhua Feng, Josef Kittler.
    "Self-supervised learning for RGB-D object tracking." PR (2024). [paper]

  • VADT: Zhang, Guangtong and Liang, Qihua and Mo, Zhiyi and Li, Ning and Zhong, Bineng.
    "Visual Adapt for RGBD Tracking." ICASSP (2024). [paper]

  • FECD: Xue-Feng Zhu, Tianyang Xu, Xiao-Jun Wu, Josef Kittler.
    "Feature enhancement and coarse-to-fine detection for RGB-D tracking." PRL (2024). [paper]

  • CDAAT: Xue-Feng Zhu, Tianyang Xu, Xiao-Jun Wu, Zhenhua Feng, Josef Kittler.
    "Adaptive Colour-Depth Aware Attention for RGB-D Object Tracking." SPL (2024). [paper] [code]

2023

  • SPT: Xue-Feng Zhu, Tianyang Xu, Zhangyong Tang, Zucheng Wu, Haodong Liu, Xiao Yang, Xiao-Jun Wu, Josef Kittler.
    "RGBD1K: A Large-scale Dataset and Benchmark for RGB-D Object Tracking." AAAI (2023). [paper] [code]

  • EMT: Jinyu Yang, Shang Gao, Zhe Li, Feng Zheng, Aleš Leonardis.
    "Resource-Efficient RGBD Aerial Tracking." CVPR (2023). [paper] [code]

2022

  • Track-it-in-3D: Jinyu Yang, Zhongqun Zhang, Zhe Li, Hyung Jin Chang, Aleš Leonardis, Feng Zheng.
    "Towards Generic 3D Tracking in RGBD Videos: Benchmark and Baseline." ECCV (2022). [paper] [code]

  • DMTracker: Shang Gao, Jinyu Yang, Zhe Li, Feng Zheng, Aleš Leonardis, Jingkuan Song.
    "Learning Dual-Fused Modality-Aware Representations for RGBD Tracking." ECCVW (2022). [paper]

2021

  • DeT: Song Yan, Jinyu Yang, Jani Käpylä, Feng Zheng, Aleš Leonardis, Joni-Kristian Kämäräinen.
    "DepthTrack: Unveiling the Power of RGBD Tracking." ICCV (2021). [paper] [code]

  • TSDM: Pengyao Zhao, Quanli Liu, Wei Wang, Qiang Guo.
    "TSDM: Tracking by SiamRPN++ with a Depth-refiner and a Mask-generator." ICPR (2021). [paper] [code]

  • 3s-RGBD: Feng Xiao, Qiuxia Wu, Han Huang.
    "Single-scale siamese network based RGB-D object tracking with adaptive bounding boxes." Neurocomputing (2021). [paper]

2020

  • DAL: Yanlin Qian, Alan Lukezic, Matej Kristan, Joni-Kristian Kämäräinen, Jiri Matas.
    "DAL: A Deep Depth-Aware Long-Term Tracker." ICPR (2020). [paper] [code]

  • RF-CFF: Yong Wang, Xian Wei, Hao Shen, Lu Ding, Jiuqing Wan.
    "Robust fusion for RGB-D tracking using CNN features." Applied Soft Computing Journal (2020). [paper]

  • SiamOC: Wenli Zhang, Kun Yang, Yitao Xin, Rui Meng.
    "An Occlusion-Aware RGB-D Visual Object Tracking Method Based on Siamese Network." ICSP (2020). [paper]

  • WCO: Weichun Liu, Xiaoan Tang, Chengling Zhao.
    "Robust RGBD Tracking via Weighted Convolution Operators." Sensors (2020). [paper]

2019

  • OTR: Ugur Kart, Alan Lukezic, Matej Kristan, Joni-Kristian Kamarainen, Jiri Matas.
    "Object Tracking by Reconstruction with View-Specific Discriminative Correlation Filters." CVPR (2019). [paper] [code]

  • H-FCN: Ming-xin Jiang, Chao Deng, Jing-song Shan, Yuan-yuan Wang, Yin-jie Jia, Xing Sun.
    "Hierarchical multi-modal fusion FCN with attention model for RGB-D tracking." Information Fusion (2019). [paper]

  • Kuai, Yangliu and Wen, Gongjian and Li, Dongdong and Xiao, Jingjing.
    "Target-Aware Correlation Filter Tracking in RGBD Videos." IEEE Sensors Journal (2019). [paper]

  • RGBD-OD: Yujun Xie, Yao Lu, Shuang Gu.
    "RGB-D Object Tracking with Occlusion Detection." CIS (2019). [paper]

  • 3DMS: Alexander Gutev, Carl James Debono.
    "Exploiting Depth Information to Increase Object Tracking Robustness." ICST (2019). [paper]

  • CA3DMS: Ye Liu, Xiao-Yuan Jing, Jianhui Nie, Hao Gao, Jun Liu, Guo-Ping Jiang.
    "Context-Aware Three-Dimensional Mean-Shift With Occlusion Handling for Robust Object Tracking in RGB-D Videos." TMM (2019). [paper] [code]

  • Depth-CCF: Guanqun Li, Lei Huang, Peichang Zhang, Qiang Li, YongKai Huo.
    "Depth Information Aided Constrained correlation Filter for Visual Tracking." GSKI (2019). [paper]

2018

  • STC: Jingjing Xiao, Rustam Stolkin, Yuqing Gao, Aleš Leonardis.
    "Robust Fusion of Color and Depth Data for RGB-D Target Tracking Using Adaptive Range-Invariant Depth Models and Spatio-Temporal Consistency Constraints." TC (2018). [paper] [code]

  • Kart, Uğur and Kämäräinen, Joni-Kristian and Matas, Jiří.
    "How to Make an RGBD Tracker?" ECCVW (2018). [paper] [code]

  • Jiaxu Leng, Ying Liu.
    "Real-Time RGB-D Visual Tracking With Scale Estimation and Occlusion Handling." IEEE Access (2018). [paper]

  • DM-DCF: Uğur Kart, Joni-Kristian Kämäräinen, Jiří Matas, Lixin Fan, Francesco Cricri.
    "Depth Masked Discriminative Correlation Filter." ICPR (2018). [paper]

  • OACPF: Yayu Zhai, Ping Song, Zonglei Mou, Xiaoxiao Chen, Xiongjun Liu.
    "Occlusion-Aware Correlation Particle Filter Target Tracking Based on RGBD Data." Access (2018). [paper]

  • RT-KCF: Han Zhang, Meng Cai, Jianxun Li.
    "A Real-time RGB-D tracker based on KCF." CCDC (2018). [paper]

2017

  • ODIOT: Wei-Long Zheng, Shan-Chun Shen, Bao-Liang Lu.
    "Online Depth Image-Based Object Tracking with Sparse Representation and Object Detection." Neural Process Letters (2017). [paper]

  • ROTSL: Zi-ang Ma, Zhi-yu Xiang.
    "Robust Object Tracking with RGBD-based Sparse Learning." ITEE (2017). [paper]

2016

  • DLS: Ning An, Xiao-Guang Zhao, Zeng-Guang Hou.
    "Online RGB-D Tracking via Detection-Learning-Segmentation." ICPR (2016). [paper]

  • DS-KCF_shape: Sion Hannuna, Massimo Camplani, Jake Hall, Majid Mirmehdi, Dima Damen, Tilo Burghardt, Adeline Paiement, Lili Tao.
    "DS-KCF: A Real-time Tracker for RGB-D Data." RTIP (2016). [paper] [code]

  • 3D-T: Adel Bibi, Tianzhu Zhang, Bernard Ghanem.
    "3D Part-Based Sparse Tracker with Automatic Synchronization and Registration." CVPR (2016). [paper] [code]

  • OAPF: Kourosh Meshgi, Shin-ichi Maeda, Shigeyuki Oba, Henrik Skibbe, Yu-zhe Li, Shin Ishii.
    "Occlusion Aware Particle Filter Tracker to Handle Complex and Persistent Occlusions." CVIU (2016). [paper]

2015

  • CDG: Huizhang Shi, Changxin Gao, Nong Sang.
    "Using Consistency of Depth Gradient to Improve Visual Tracking in RGB-D sequences." CAC (2015). [paper]

  • DS-KCF: Massimo Camplani, Sion Hannuna, Majid Mirmehdi, Dima Damen, Adeline Paiement, Lili Tao, Tilo Burghardt.
    "Real-time RGB-D Tracking with Depth Scaling Kernelised Correlation Filters and Occlusion Handling." BMVC (2015). [paper] [code]

  • DOHR: Ping Ding, Yan Song.
    "Robust Object Tracking Using Color and Depth Images with a Depth Based Occlusion Handling and Recovery." FSKD (2015). [paper]

  • ISOD: Yan Chen, Yingju Shen, Xin Liu, Bineng Zhong.
    "3D Object Tracking via Image Sets and Depth-Based Occlusion Detection." SP (2015). [paper]

  • OL3DC: Bineng Zhong, Yingju Shen, Yan Chen, Weibo Xie, Zhen Cui, Hongbo Zhang, Duansheng Chen, Tian Wang, Xin Liu, Shujuan Peng, Jin Gou, Jixiang Du, Jing Wang, Wenming Zheng.
    "Online Learning 3D Context for Robust Visual Tracking." Neurocomputing (2015). [paper]

2014

  • MCBT: Qi Wang, Jianwu Fang, Yuan Yuan.
    "Multi-Cue Based Tracking." Neurocomputing (2014). [paper]

2013

  • PT: Shuran Song, Jianxiong Xiao.
    "Tracking Revisited using RGBD Camera: Unified Benchmark and Baselines." ICCV (2013). [paper] [code]

2012

  • Matteo Munaro, Filippo Basso, Emanuele Menegatti.
    "Tracking people within groups with RGB-D data." IROS (2012). [paper]

  • AMCT: Germán Martín García, Dominik Alexander Klein, Jörg Stückler, Simone Frintrop, Armin B. Cremers.
    "Adaptive Multi-cue 3D Tracking of Arbitrary Objects." JDOS (2012). [paper]

RGBT Tracking

Datasets

| Dataset | Pub. & Date | WebSite | Introduction |
|---|---|---|---|
| GTOT | TIP-2016 | GTOT | 50 video pairs, 15K frames |
| RGBT210 | ACM MM-2017 | RGBT210 | 210 video pairs |
| RGBT234 | PR-2018 | RGBT234 | 234 video pairs, an extension of RGBT210 |
| LasHeR | TIP-2021 | LasHeR | 1,224 video pairs, 730K frames |
| VTUAV | CVPR-2022 | VTUAV | Visible-thermal UAV tracking, 500 sequences, 1.7 million high-resolution frame pairs |
| MV-RGBT | arXiv-2024 | MV-RGBT | 122 video pairs, 89.9K frames |

Papers

2024

  • MIGTD: Yujue Cai, Xiubao Sui, Guohua Gu, Qian Chen.
    "Multi-modal interaction with token division strategy for RGB-T tracking." PR (2024). [paper]

  • GMMT: Zhangyong Tang, Tianyang Xu, Xuefeng Zhu, Xiao-Jun Wu, Josef Kittler.
    "Generative-based Fusion Mechanism for Multi-Modal Tracking." AAAI (2024). [paper] [code]

  • BAT: Bing Cao, Junliang Guo, Pengfei Zhu, Qinghua Hu.
    "Bi-directional Adapter for Multi-modal Tracking." AAAI (2024). [paper] [code]

  • ProFormer: Yabin Zhu, Chenglong Li, Xiao Wang, Jin Tang, Zhixiang Huang.
    "RGBT Tracking via Progressive Fusion Transformer with Dynamically Guided Learning." TCSVT (2024). [paper]

  • QueryTrack: Fan, Huijie and Yu, Zhencheng and Wang, Qiang and Fan, Baojie and Tang, Yandong.
    "QueryTrack: Joint-Modality Query Fusion Network for RGBT Tracking." TIP (2024). [paper]

  • CAT++: Liu, Lei and Li, Chenglong and Xiao, Yun and Ruan, Rui and Fan, Minghao.
    "RGBT Tracking via Challenge-Based Appearance Disentanglement and Interaction." TIP (2024). [paper]

  • TATrack: Hongyu Wang, Xiaotao Liu, Yifan Li, Meng Sun, Dian Yuan, Jing Liu.
    "Temporal Adaptive RGBT Tracking with Modality Prompt." ArXiv (2024). [paper]

  • MArMOT: Chenglong Li, Tianhao Zhu, Lei Liu, Xiaonan Si, Zilin Fan, Sulan Zhai.
    "Cross-Modal Object Tracking: Modality-Aware Representations and A Unified Benchmark." ArXiv (2024). [paper]

  • AMNet: Zhang, Tianlu and He, Xiaoyi and Jiao, Qiang and Zhang, Qiang and Han, Jungong.
    "AMNet: Learning to Align Multi-modality for RGB-T Tracking." TCSVT (2024). [paper]

  • MCTrack: Hu, Xiantao and Zhong, Bineng and Liang, Qihua and Zhang, Shengping and Li, Ning and Li, Xianxian.
    "Towards Modalities Correlation for RGB-T Tracking." TCSVT (2024). [paper]

  • AFter: Andong Lu, Wanyu Wang, Chenglong Li, Jin Tang, Bin Luo.
    "AFter: Attention-based Fusion Router for RGBT Tracking." ArXiv (2024). [paper] [code]

  • CSTNet: Yunfeng Li, Bo Wang, Ye Li, Zhiwen Yu, Liang Wang.
    "Transformer-based RGB-T Tracking with Channel and Spatial Feature Fusion." ArXiv (2024). [paper] [code]

2023

  • TBSI: Hui, Tianrui and Xun, Zizheng and Peng, Fengguang and Huang, Junshi and Wei, Xiaoming and Wei, Xiaolin and Dai, Jiao and Han, Jizhong and Liu, Si.
    "Bridging Search Region Interaction with Template for RGB-T Tracking." CVPR (2023). [paper] [code]

  • DFNet: Jingchao Peng, Haitao Zhao, Zhengwei Hu.
    "Dynamic Fusion Network for RGBT Tracking." TITS (2023). [paper] [code]

  • CMD: Zhang, Tianlu and Guo, Hongyuan and Jiao, Qiang and Zhang, Qiang and Han, Jungong.
    "Efficient RGB-T Tracking via Cross-Modality Distillation." CVPR (2023). [paper]

  • DFAT: Zhangyong Tang, Tianyang Xu, Hui Li, Xiao-Jun Wu, XueFeng Zhu, Josef Kittler.
    "Exploring fusion strategies for accurate RGBT visual object tracking." Information Fusion (2023). [paper] [code]

  • QAT: Lei Liu, Chenglong Li, Yun Xiao, Jin Tang.
    "Quality-Aware RGBT Tracking via Supervised Reliability Learning and Weighted Residual Guidance." ACM MM (2023). [paper]

  • GuideFuse: Zhang, Zeyang and Li, Hui and Xu, Tianyang and Wu, Xiao-Jun and Fu, Yu.
    "GuideFuse: A Novel Guided Auto-Encoder Fusion Network for Infrared and Visible Images." TIM (2023). [paper]

  • MPLT: Yang Luo, Xiqing Guo, Hui Feng, Lei Ao.
    "RGB-T Tracking via Multi-Modal Mutual Prompt Learning." ArXiv (2023). [paper] [code]

2022

  • HMFT: Pengyu Zhang, Jie Zhao, Dong Wang, Huchuan Lu, Xiang Ruan.
    "Visible-Thermal UAV Tracking: A Large-Scale Benchmark and New Baseline." CVPR (2022). [paper] [code]

  • MFGNet: Xiao Wang, Xiujun Shu, Shiliang Zhang, Bo Jiang, Yaowei Wang, Yonghong Tian, Feng Wu.
    "MFGNet: Dynamic Modality-Aware Filter Generation for RGB-T Tracking." TMM (2022). [paper] [code]

  • MBAFNet: Li, Yadong and Lai, Huicheng and Wang, Liejun and Jia, Zhenhong.
    "Multibranch Adaptive Fusion Network for RGBT Tracking." IEEE Sensors Journal (2022). [paper]

  • AGMINet: Mei, Jiatian and Liu, Yanyu and Wang, Changcheng and Zhou, Dongming and Nie, Rencan and Cao, Jinde.
    "Asymmetric Global–Local Mutual Integration Network for RGBT Tracking." TIM (2022). [paper]

  • APFNet: Yun Xiao, Mengmeng Yang, Chenglong Li, Lei Liu, Jin Tang.
    "Attribute-Based Progressive Fusion Network for RGBT Tracking." AAAI (2022). [paper] [code]

  • DMCNet: Lu, Andong and Qian, Cun and Li, Chenglong and Tang, Jin and Wang, Liang.
    "Duality-Gated Mutual Condition Network for RGBT Tracking." TNNLS (2022). [paper]

  • TFNet: Zhu, Yabin and Li, Chenglong and Tang, Jin and Luo, Bin and Wang, Liang.
    "RGBT Tracking by Trident Fusion Network." TCSVT (2022). [paper]

  • Mingzheng Feng, Jianbo Su.
    "Learning reliable modal weight with transformer for robust RGBT tracking." KBS (2022). [paper]

2021

  • JMMAC: Zhang, Pengyu and Zhao, Jie and Bo, Chunjuan and Wang, Dong and Lu, Huchuan and Yang, Xiaoyun.
    "Jointly Modeling Motion and Appearance Cues for Robust RGB-T Tracking." TIP (2021). [paper] [code]

  • ADRNet: Pengyu Zhang, Dong Wang, Huchuan Lu, Xiaoyun Yang.
    "Learning Adaptive Attribute-Driven Representation for Real-Time RGB-T Tracking." IJCV (2021). [paper] [code]

  • SiamCDA: Zhang, Tianlu and Liu, Xueru and Zhang, Qiang and Han, Jungong.
    "SiamCDA: Complementarity-and distractor-aware RGB-T tracking based on Siamese network." TCSVT (2021). [paper] [code]

  • Wang, Yong and Wei, Xian and Tang, Xuan and Shen, Hao and Zhang, Huanlong.
    "Adaptive Fusion CNN Features for RGBT Object Tracking." TITS (2021). [paper]

  • M5L: Zhengzheng Tu, Chun Lin, Chenglong Li, Jin Tang, Bin Luo.
    "M5L: Multi-Modal Multi-Margin Metric Learning for RGBT Tracking." TIP (2021). [paper]

  • CBPNet: Qin Xu, Yiming Mei, Jinpei Liu, and Chenglong Li.
    "Multimodal Cross-Layer Bilinear Pooling for RGBT Tracking." TMM (2021). [paper]

  • MANet++: Andong Lu, Chenglong Li, Yuqing Yan, Jin Tang, Bin Luo.
    "RGBT Tracking via Multi-Adapter Network with Hierarchical Divergence Loss." TIP (2021). [paper]

  • CMR: Li, Chenglong and Xiang, Zhiqiang and Tang, Jin and Luo, Bin and Wang, Futian.
    "RGBT Tracking via Noise-Robust Cross-Modal Ranking." TNNLS (2021). [paper]

  • GCMP: Rui Yang, Xiao Wang, Chenglong Li, Jinmin Hu, Jin Tang.
    "RGBT tracking via cross-modality message passing." Neurocomputing (2021). [paper]

  • HDINet: Mei, Jiatian and Zhou, Dongming and Cao, Jinde and Nie, Rencan and Guo, Yanbu.
    "HDINet: Hierarchical Dual-Sensor Interaction Network for RGBT Tracking." IEEE Sensors Journal (2021). [paper]

2020

  • CMPP: Chaoqun Wang, Chunyan Xu, Zhen Cui, Ling Zhou, Tong Zhang, Xiaoya Zhang, Jian Yang.
    "Cross-Modal Pattern-Propagation for RGB-T Tracking." CVPR (2020). [paper]

  • CAT: Chenglong Li, Lei Liu, Andong Lu, Qing Ji, Jin Tang.
    "Challenge-Aware RGBT Tracking." ECCV (2020). [paper]

  • FANet: Yabin Zhu, Chenglong Li, Bin Luo, Jin Tang.
    "FANet: Quality-Aware Feature Aggregation Network for Robust RGB-T Tracking." TIV (2020). [paper]

2019

  • mfDiMP: Lichao Zhang, Martin Danelljan, Abel Gonzalez-Garcia, Joost van de Weijer, Fahad Shahbaz Khan.
    "Multi-Modal Fusion for End-to-End RGB-T Tracking." ICCVW (2019). [paper] [code]

  • DAPNet: Yabin Zhu, Chenglong Li, Bin Luo, Jin Tang, Xiao Wang.
    "Dense Feature Aggregation and Pruning for RGBT Tracking." ACM MM (2019). [paper]

  • DAFNet: Yuan Gao, Chenglong Li, Yabin Zhu, Jin Tang, Tao He, Futian Wang.
    "Deep Adaptive Fusion Network for High Performance RGBT Tracking." ICCVW (2019). [paper] [code]

  • MANet: Chenglong Li, Andong Lu, Aihua Zheng, Zhengzheng Tu, Jin Tang.
    "Multi-Adapter RGBT Tracking." ICCV (2019). [paper] [code]

Miscellaneous

Datasets

| Dataset | Pub. & Date | WebSite | Introduction |
|---|---|---|---|
| WebUAV-3M | TPAMI-2023 | WebUAV-3M | 4,500 videos, 3.3 million frames, UAV tracking, vision-language-audio |
| UniMod1K | IJCV-2024 | UniMod1K | 1,050 video pairs, 2.5 million frames, vision-depth-language |

Papers

2024

  • XTrack: Yuedong Tan, Zongwei Wu, Yuqian Fu, Zhuyun Zhou, Guolei Sun, Chao Ma, Danda Pani Paudel, Luc Van Gool, Radu Timofte.
    "Towards a Generalist and Blind RGB-X Tracker." ArXiv (2024). [paper] [code]

  • OneTracker: Lingyi Hong, Shilin Yan, Renrui Zhang, Wanyun Li, Xinyu Zhou, Pinxue Guo, Kaixun Jiang, Yiting Chen, Jinglun Li, Zhaoyu Chen, Wenqiang Zhang.
    "OneTracker: Unifying Visual Object Tracking with Foundation Models and Efficient Tuning." CVPR (2024). [paper]

  • SDSTrack: Xiaojun Hou, Jiazheng Xing, Yijie Qian, Yaowei Guo, Shuo Xin, Junhao Chen, Kai Tang, Mengmeng Wang, Zhengkai Jiang, Liang Liu, Yong Liu.
    "SDSTrack: Self-Distillation Symmetric Adapter Learning for Multi-Modal Visual Object Tracking." CVPR (2024). [paper] [code]

  • Un-Track: Zongwei Wu, Jilai Zheng, Xiangxuan Ren, Florin-Alexandru Vasluianu, Chao Ma, Danda Pani Paudel, Luc Van Gool, Radu Timofte.
    "Single-Model and Any-Modality for Video Object Tracking." CVPR (2024). [paper] [code]

  • ELTrack: Alansari, Mohamad and Alnuaimi, Khaled and Alansari, Sara and Werghi, Naoufel and Javed, Sajid.
    "ELTrack: Correlating Events and Language for Visual Tracking." ArXiv (2024). [paper] [code]

  • KSTrack: He, Yuhang and Ma, Zhiheng and Wei, Xing and Gong, Yihong.
    "Knowledge Synergy Learning for Multi-Modal Tracking." TCSVT (2024). [paper]

  • SeqTrackv2: Xin Chen, Ben Kang, Jiawen Zhu, Dong Wang, Houwen Peng, Huchuan Lu.
    "Unified Sequence-to-Sequence Learning for Single- and Multi-Modal Visual Object Tracking." ArXiv (2024). [paper] [code]

2023

  • ViPT: Jiawen Zhu, Simiao Lai, Xin Chen, Dong Wang, Huchuan Lu.
    "Visual Prompt Multi-Modal Tracking." CVPR (2023). [paper] [code]

2022

  • ProTrack: Jinyu Yang, Zhe Li, Feng Zheng, Aleš Leonardis, Jingkuan Song.
    "Prompting for Multi-Modal Tracking." ACM MM (2022). [paper]

Others

2024

  • SCANet: Yunfeng Li, Bo Wang, Jiuran Sun, Xueyi Wu, Ye Li.
    "RGB-Sonar Tracking Benchmark and Spatial Cross-Attention Transformer Tracker." ArXiv (2024). [paper] [code]

Awesome Repositories for MMOT

License

This project is released under the MIT license. Please see the LICENSE file for more information.
