| R2Former: Unified Retrieval and Reranking Transformer for Place Recognition | | | |
| Mask-Free OVIS: Open-Vocabulary Instance Segmentation Without Manual Mask Annotations | | | |
| StructVPR: Distill Structural Knowledge With Weighting Samples for Visual Place Recognition | ➖ | | |
| MaskCLIP: Masked Self-Distillation Advances Contrastive Language-Image Pretraining | | | |
| One-to-Few Label Assignment for End-to-End Dense Detection | | | |
| Where Is My Wallet? Modeling Object Proposal Sets for Egocentric Visual Query Localization | | | |
| Semi-DETR: Semi-Supervised Object Detection With Detection Transformers | | | ➖ |
| Universal Instance Perception As Object Discovery and Retrieval | | | ➖ |
| CAT: LoCalization and IdentificAtion Cascade Detection Transformer for Open-World Object Detection | ➖ | | |
| Phase-Shifting Coder: Predicting Accurate Orientation in Oriented Object Detection | | | ➖ |
| FrustumFormer: Adaptive Instance-Aware Resampling for Multi-View 3D Detection | | | |
| Box-Level Active Detection | | | |
| Learning With Noisy Labels via Self-Supervised Adversarial Noisy Masking | | | ➖ |
| Ambiguity-Resistant Semi-Supervised Learning for Dense Object Detection | | | |
| Aligning Bag of Regions for Open-Vocabulary Object Detection | | | |
| Asymmetric Feature Fusion for Image Retrieval | ➖ | | ➖ |
| 3D Video Object Detection With Learnable Object-Centric Global Optimization | | | |
| Enhanced Training of Query-Based Object Detection via Selective Query Recollection | | | |
| Dense Distinct Query for End-to-End Object Detection | | | |
| On-the-Fly Category Discovery | | | |
| ProD: Prompting-To-Disentangle Domain Knowledge for Cross-Domain Few-Shot Image Classification | ➖ | | ➖ |
| Q-DETR: An Efficient Low-Bit Quantized Detection Transformer | | | ➖ |
| SAP-DETR: Bridging the Gap Between Salient Points and Queries-Based Transformer Detector for Fast Model Convergency | | | |
| An Erudite Fine-Grained Visual Classification Model | | | |
| Self-Supervised Implicit Glyph Attention for Text Recognition | | | ➖ |
| Multi-View Adversarial Discriminator: Mine the Non-Causal Factors for Object Detection in Unseen Domains | | | |
| HIER: Metric Learning Beyond Class Labels via Hierarchical Regularization | | | |
| DSVT: Dynamic Sparse Voxel Transformer With Rotated Sets | | | |
| Progressive Semantic-Visual Mutual Adaption for Generalized Zero-Shot Learning | | | ➖ |
| Fake It Till You Make It: Learning Transferable Representations From Synthetic ImageNet Clones | | | |
| FFF: Fragment-Guided Flexible Fitting for Building Complete Protein Structures | ➖ | | ➖ |
| Revisiting Self-Similarity: Structural Embedding for Image Retrieval | | | |
| Neural Koopman Pooling: Control-Inspired Temporal Dynamics Encoding for Skeleton-Based Action Recognition | | | ➖ |
| MixTeacher: Mining Promising Labels With Mixed Scale Teacher for Semi-Supervised Object Detection | | | |
| Learning Attention As Disentangler for Compositional Zero-Shot Learning | | | |
| Towards Building Self-Aware Object Detectors via Reliable Uncertainty Quantification and Calibration | | | ➖ |
| Object-Aware Distillation Pyramid for Open-Vocabulary Object Detection | | | |
| SOOD: Towards Semi-Supervised Oriented Object Detection | | | |
| Bias-Eliminating Augmentation Learning for Debiased Federated Learning | ➖ | | ➖ |
| Towards Efficient Use of Multi-Scale Features in Transformer-Based Object Detectors | | | |
| AsyFOD: An Asymmetric Adaptation Paradigm for Few-Shot Domain Adaptive Object Detection | | | |
| CORA: Adapting CLIP for Open-Vocabulary Detection With Region Prompting and Anchor Pre-Matching | | | |
| Explicit Boundary Guided Semi-Push-Pull Contrastive Learning for Supervised Anomaly Detection | | | |
| Disentangled Representation Learning for Unsupervised Neural Quantization | ➖ | | ➖ |
| YOLOv7: Trainable Bag-of-Freebies Sets New State-of-the-Art for Real-Time Object Detectors | | | ➖ |
| Virtual Sparse Convolution for Multimodal 3D Object Detection | | | |
| TranSG: Transformer-Based Skeleton Graph Prototype Contrastive Learning With Structure-Trajectory Prompted Reconstruction for Person Re-Identification | | | |
| Adaptive Sparse Pairwise Loss for Object Re-Identification | | | ➖ |
| Multi-Granularity Archaeological Dating of Chinese Bronze Dings Based on a Knowledge-Guided Relation Graph | | | |
| Event-Guided Person Re-Identification via Sparse-Dense Complementary Learning | | | ➖ |
| Vector Quantization With Self-Attention for Quality-Independent Representation Learning | | | |
| Siamese Image Modeling for Self-Supervised Vision Representation Learning | | | |
| FCC: Feature Clusters Compression for Long-Tailed Visual Recognition | | | |
| Towards All-in-One Pre-Training via Maximizing Multi-Modal Mutual Information | | | |
| Soft Augmentation for Image Classification | | | ➖ |
| Correspondence Transformers With Asymmetric Feature Learning and Matching Flow Super-Resolution | | | ➖ |
| Multimodality Helps Unimodality: Cross-Modal Few-Shot Learning With Multimodal Models | | | ➖ |
| Out-of-Distributed Semantic Pruning for Robust Semi-Supervised Learning | | | |
| Glocal Energy-Based Learning for Few-Shot Open-Set Recognition | | | |
| Improving Image Recognition by Retrieving From Web-Scale Image-Text Data | ➖ | | ➖ |
| Deep Factorized Metric Learning | | | |
| Learning To Detect and Segment for Open Vocabulary Object Detection | ➖ | | ➖ |
| ConQueR: Query Contrast Voxel-DETR for 3D Object Detection | | | |
| Photo Pre-Training, but for Sketch | | | |
| InternImage: Exploring Large-Scale Vision Foundation Models With Deformable Convolutions | | | |
| Detecting Everything in the Open World: Towards Universal Object Detection | | | |
| Twin Contrastive Learning With Noisy Labels | | | ➖ |
| Feature Aggregated Queries for Transformer-Based Video Object Detectors | | | |
| Learning on Gradients: Generalized Artifacts Representation for GAN-Generated Images Detection | | | |
| Deep Hashing With Minimal-Distance-Separated Hash Centers | ➖ | | |
| Knowledge Combination To Learn Rotated Detection Without Rotated Annotation | | | |
| Good Is Bad: Causality Inspired Cloth-Debiasing for Cloth-Changing Person Re-Identification | | | |
| Discriminating Known From Unknown Objects via Structure-Enhanced Recurrent Variational AutoEncoder | ➖ | | |
| 2PCNet: Two-Phase Consistency Training for Day-to-Night Unsupervised Domain Adaptive Object Detection | | | ➖ |
| LINe: Out-of-Distribution Detection by Leveraging Important Neurons | | | |
| Progressive Transformation Learning for Leveraging Virtual Images in Training | | | |
| Instance Relation Graph Guided Source-Free Domain Adaptive Object Detection | | | |
| Decoupling MaxLogit for Out-of-Distribution Detection | | | |
| Pixels, Regions, and Objects: Multiple Enhancement for Salient Object Detection | | | ➖ |
| Detection Hub: Unifying Object Detection Datasets via Query Adaptation on Language Embedding | ➖ | | |
| BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision | ➖ | | |
| D2Former: Jointly Learning Hierarchical Detectors and Contextual Descriptors via Agent-Based Transformers | ➖ | | ➖ |
| CapDet: Unifying Dense Captioning and Open-World Detection Pretraining | ➖ | | |
| Mapping Degeneration Meets Label Evolution: Learning Infrared Small Target Detection With Single Point Supervision | | | |
| Generalized UAV Object Detection via Frequency Domain Disentanglement | ➖ | | |
| Deep Frequency Filtering for Domain Generalization | ➖ | | |
| Adaptive Sparse Convolutional Networks With Global Context Enhancement for Faster Object Detection on Drone Images | | | |
| Improved Test-Time Adaptation for Domain Generalization | | | ➖ |
| Matching Is Not Enough: A Two-Stage Framework for Category-Agnostic Pose Estimation | | | |
| Recurrence Without Recurrence: Stable Video Landmark Detection With Deep Equilibrium Models | | | |
| VLPD: Context-Aware Pedestrian Detection via Vision-Language Semantic Self-Supervision | | | |
| DETRs With Hybrid Matching | | | |
| Query-Dependent Video Representation for Moment Retrieval and Highlight Detection | | | |
| Clothing-Change Feature Augmentation for Person Re-Identification | | | |
| Learning Attribute and Class-Specific Representation Duet for Fine-Grained Fashion Analysis | | | |
| Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks | | | |
| Optimal Proposal Learning for Deployable End-to-End Pedestrian Detection | ➖ | | ➖ |
| DynamicDet: A Unified Dynamic Architecture for Object Detection | | | |
| Switchable Representation Learning Framework With Self-Compatibility | ➖ | | |
| DATE: Domain Adaptive Product Seeker for E-Commerce | | | ➖ |
| PromptCAL: Contrastive Affinity Learning via Auxiliary Prompts for Generalized Novel Category Discovery | | | |
| Dynamic Neural Network for Multi-Task Learning Searching Across Diverse Network Topologies | ➖ | | ➖ |
| OvarNet: Towards Open-Vocabulary Object Attribute Recognition | | | |
| HOICLIP: Efficient Knowledge Transfer for HOI Detection With Vision-Language Models | | | |
| Learning From Noisy Labels With Decoupled Meta Label Purifier | | | ➖ |
| A Light Touch Approach to Teaching Transformers Multi-View Geometry | ➖ | | |
| OpenMix: Exploring Outlier Samples for Misclassification Detection | | | |
| Revisiting Reverse Distillation for Anomaly Detection | | | |
| PROB: Probabilistic Objectness for Open World Object Detection | | | |
| Equiangular Basis Vectors | | | ➖ |
| Weakly Supervised Posture Mining for Fine-Grained Classification | | | ➖ |
| An Actor-Centric Causality Graph for Asynchronous Temporal Inference in Group Activity | ➖ | | ➖ |
| Weak-Shot Object Detection Through Mutual Knowledge Transfer | ➖ | | ➖ |
| Zero-Shot Everything Sketch-Based Image Retrieval, and in Explainable Style | | | |
| Exploring Structured Semantic Prior for Multi Label Recognition With Incomplete Labels | | | |
| Learning Partial Correlation Based Deep Visual Representation for Image Classification | | | |
| Boundary-Aware Backward-Compatible Representation via Adversarial Learning in Image Retrieval | | | |
| PHA: Patch-Wise High-Frequency Augmentation for Transformer-Based Person Re-Identification | | | ➖ |
| Unknown Sniffer for Object Detection: Don't Turn a Blind Eye to Unknown Objects | | | |
| BoxTeacher: Exploring High-Quality Pseudo Labels for Weakly Supervised Instance Segmentation | | | |
| Annealing-Based Label-Transfer Learning for Open World Object Detection | | | |
| Diversity-Measurable Anomaly Detection | | | |
| Recurrent Vision Transformers for Object Detection With Event Cameras | | | |
| AShapeFormer: Semantics-Guided Object-Level Active Shape Encoding for 3D Object Detection via Transformers | | | |
| Ranking Regularization for Critical Rare Classes: Minimizing False Positives at a High True Positive Rate | ➖ | | |
| Contrastive Mean Teacher for Domain Adaptive Object Detectors | | | |
| Bridging the Gap Between Model Explanations in Partially Annotated Multi-Label Classification | | | |
| PartMix: Regularization Strategy To Learn Part Discovery for Visible-Infrared Person Re-Identification | ➖ | | ➖ |
| BiasAdv: Bias-Adversarial Augmentation for Model Debiasing | ➖ | | |
| ViPLO: Vision Transformer Based Pose-Conditioned Self-Loop Graph for Human-Object Interaction Detection | | | ➖ |
| Robust 3D Shape Classification via Non-Local Graph Attention Network | ➖ | | |
| Two-Way Multi-Label Loss | | | |
| Normalizing Flow Based Feature Synthesis for Outlier-Aware Object Detection | | | |
| Object Detection With Self-Supervised Scene Adaptation | | | |
| Data-Efficient Large Scale Place Recognition With Graded Similarity Supervision | | | |
| Generating Features With Increased Crop-Related Diversity for Few-Shot Object Detection | ➖ | | ➖ |
| Recognizing Rigid Patterns of Unlabeled Point Clouds by Complete and Continuous Isometry Invariants With No False Negatives and No False Positives | ➖ | | ➖ |
| Deep Semi-Supervised Metric Learning With Mixed Label Propagation | ➖ | | ➖ |
| Fine-Grained Classification With Noisy Labels | ➖ | | |