| You Only Segment Once: Towards Real-Time Panoptic Segmentation | | | |
| IS-GGT: Iterative Scene Graph Generation with Generative Transformers | | | |
| Disentangling Orthogonal Planes for Indoor Panoramic Room Layout Estimation with Cross-Scale Distortion Awareness | | | ➖ |
| Panoptic Video Scene Graph Generation | | | |
| 3D Spatial Multimodal Knowledge Accumulation for Scene Graph Prediction in Point Cloud | ➖ | | |
| JacobiNeRF: NeRF Shaping with Mutual Information Gradients | | | |
| Learning Geometric-Aware Properties in 2D Representation using Lightweight CAD Models, or Zero Real 3D Pairs | | | |
| Learning and Aggregating Lane Graphs for Urban Automated Driving | | | |
| MIME: Human-Aware 3D Scene Generation | | | ➖ |
| Connecting the Dots: Floorplan Reconstruction using Two-Level Queries | | | |
| NeRF-RPN: A General Framework for Object Detection in NeRFs | | | |
| Relational Context Learning for Human-Object Interaction Detection | | | |
| Symmetric Shape-Preserving Autoencoder for Unsupervised Real Scene Point Cloud Completion | | | |
| Token Contrast for Weakly-Supervised Semantic Segmentation | | | ➖ |
| MM-3DScene: 3D Scene Understanding by Customizing Masked Modeling with Informative-Preserved Reconstruction and Self-Distilled Consistency | | | |
| Primitive Generation and Semantic-related Alignment for Universal Zero-Shot Segmentation | | | |
| CLIP2Scene: Towards Label-Efficient 3D Scene Understanding by CLIP | | | |
| Multispectral Video Semantic Segmentation: A Benchmark Dataset and Baseline | | | ➖ |
| Optimal Transport Minimization: Crowd Localization on Density Maps for Semi-Supervised Counting | | | |
| Indiscernible Object Counting in Underwater Scenes | | | |
| Long Range Pooling for 3D Large-Scale Scene Understanding | | | ➖ |
| Delivering Arbitrary-Modal Semantic Segmentation | | | |
| Images Speak in Images: A Generalist Painter for In-Context Visual Learning | | | ➖ |
| SCPNet: Semantic Scene Completion on Point Cloud | | | |
| Content-Aware Token Sharing for Efficient Semantic Segmentation with Vision Transformers | | | |
| OpenScene: 3D Scene Understanding with Open Vocabularies | | | |
| Devil's on the Edges: Selective Quad Attention for Scene Graph Generation | | | |
| Delving into Shape-Aware Zero-Shot Semantic Segmentation | | | |
| Category Query Learning for Human-Object Interaction Classification | | | |
| Nerflets: Local Radiance Fields for Efficient Structure-Aware 3D Scene Representation from 2D Supervision | | | |
| DejaVu: Conditional Regenerative Learning to Enhance Dense Prediction | ➖ | | ➖ |
| SCOOP: Self-Supervised Correspondence and Optimization-based Scene Flow | | | |
| Incremental 3D Semantic Scene Graph Prediction from RGB Sequences | | | |
| PanelNet: Understanding 360 Indoor Environment via Panel Representation | ➖ | | |
| Perspective Fields for Single Image Camera Calibration | | | |
| Open-Category Human-Object Interaction Pre-Training via Language Modeling Framework | ➖ | | ➖ |
| Fast Contextual Scene Graph Generation with Unbiased Context Augmentation | | | ➖ |
| Diffusion-based Generation, Optimization, and Planning in 3D Scenes | | | |
| TopNet: Transformer-based Object Placement Network for Image Compositing | ➖ | | |
| Computational Flash Photography through Intrinsics | | | |
| Probing Neural Representations of Scene Perception in a Hippocampally Dependent Task using Artificial Neural Networks | ➖ | | ➖ |
| DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting | | | ➖ |
| LEGO-Net: Learning Regular Rearrangements of Objects in Rooms | | | |
| Open-Vocabulary Point-Cloud Object Detection without 3D Annotation | | | |
| Weakly-Supervised Domain Adaptive Semantic Segmentation with Prototypical Contrastive Learning | | | |
| ScanDMM: A Deep Markov Model of Scanpath Prediction for 360° Images | | | |
| Canonical Fields: Self-Supervised Learning of Pose-Canonicalized Neural Fields | | | |
| TempSAL - Uncovering Temporal Information for Deep Saliency Prediction | | | |
| Probabilistic Debiasing of Scene Graphs | | | |
| Towards Unified Scene Text Spotting based on Sequence Generation | | | ➖ |
| Learning to Generate Language-Supervised and Open-Vocabulary Scene Graph using Pre-trained Visual-Semantic Space | | | |
| Modular Memorability: Tiered Representations for Video Memorability Prediction | | | |
| Where we are and what we're Looking at: Query based Worldwide Image Geo-Localization using Hierarchies and Scenes | | | |
| HRDFuse: Monocular 360° Depth Estimation by Collaboratively Learning Holistic-with-Regional Depth Distributions | | | |