This repository contains my paper reading notes on machine learning and deep learning.
This repository contains my paper reading notes on AI and computer vision. It is inspired by Denny Britz and Daniel Takeshi. A minimalistic webpage generated with GitHub io can be found here.
If you are new to deep learning in computer vision and don't know where to start, I suggest you spend your first month or so diving deep into this list of papers. I did so (see my notes) and it served me well.
Here is a list of trustworthy sources of papers in case I run out of papers to read.
I regularly update my blog on Towards Data Science.
- BEV Perception in Mass Production Autonomous Driving
- Challenges of Mass Production Autonomous Driving in China
- Vision-centric Semantic Occupancy Prediction for Autonomous Driving (related paper notes)
- Drivable Space in Autonomous Driving — The Industry
- Drivable Space in Autonomous Driving — The Academia
- Drivable Space in Autonomous Driving - The Concept
- Monocular BEV Perception with Transformers in Autonomous Driving (related paper notes)
- Illustrated Differences between MLP and Transformers for Tensor Reshaping in Deep Learning
- Monocular 3D Lane Line Detection in Autonomous Driving (related paper notes)
- Deep-Learning based Object detection in Crowded Scenes (related paper notes)
- Monocular Bird’s-Eye-View Semantic Segmentation for Autonomous Driving (related paper notes)
- Deep Learning in Mapping for Autonomous Driving
- Monocular Dynamic Object SLAM in Autonomous Driving
- Monocular 3D Object Detection in Autonomous Driving — A Review
- Self-supervised Keypoint Learning — A Review
- Single Stage Instance Segmentation — A Review
- Self-paced Multitask Learning — A Review
- Convolutional Neural Networks with Heterogeneous Metadata
- Lifting 2D object detection to 3D in autonomous driving
- Multimodal Regression
- Paper Reading in 2019
- ChatGPT for Robotics: Design Principles and Model Abilities [Notes] [Microsoft, LLM for robotics]
- RoboVQA: Multimodal Long-Horizon Reasoning for Robotics [Notes] [Google DeepMind, LLM for robotics]
- ChatGPT Empowered Long-Step Robot Control in Various Environments: A Case Application [Microsoft Robotics]
- GPT-4V(ision) for Robotics: Multimodal Task Planning from Human Demonstration [Notes] [LLM for robotics, Microsoft Robotics]
- LLM-Brain: LLM as A Robotic Brain: Unifying Egocentric Memory and Control [Notes]
- Language to Rewards for Robotic Skill Synthesis
- Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware
- LLM-Grounder: Open-Vocabulary 3D Visual Grounding with Large Language Model as an Agent [UM]
- LM-Nav: Robotic Navigation with Large Pre-Trained Models of Language, Vision, and Action [Sergey Levine]
- A Survey of Embodied AI: From Simulators to Research Tasks IEEE TETCI 2021
- Habitat Challenge 2021
- Video ChatCaptioner: Towards Enriched Spatiotemporal Descriptions
- DoReMi: Grounding Language Model by Detecting and Recovering from Plan-Execution Misalignment [Jianyu Chen]
- The Power of Scale for Parameter-Efficient Prompt Tuning EMNLP 2021
- Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents ICML 2022
- ProgPrompt: Generating Situated Robot Task Plans using Large Language Models ICRA 2023
- CLIPort: What and Where Pathways for Robotic Manipulation CoRL 2021
- Perceiver-Actor: A Multi-Task Transformer for Robotic Manipulation CoRL 2022
- LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale NeurIPS 2022 [LLM Quant]
- AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration [Song Han, LLM Quant]
- RoFormer: Enhanced Transformer with Rotary Position Embedding
- CoDi: Any-to-Any Generation via Composable Diffusion NeurIPS 2023
- What if a Vacuum Robot has an Arm? UR 2023
- FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
- GPT in 60 Lines of NumPy
- Speeding up the GPT - KV cache
- LLM Parameter Counting
- Transformer Inference Arithmetic
- ALBEF: Align before Fuse: Vision and Language Representation Learning with Momentum Distillation NeurIPS 2021 [Junnan Li]
- BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation ICML 2022 [Junnan Li]
- BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models [Junnan Li]
- MOO: Open-World Object Manipulation using Pre-trained Vision-Language Models [Google Robotics, end-to-end visuomotor]
- VC-1: Where are we in the search for an Artificial Visual Cortex for Embodied Intelligence?
- CLIPort: What and Where Pathways for Robotic Manipulation CoRL 2021 [Nvidia, end-to-end visuomotor]
- GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers ICLR 2023
- SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models ICML 2023 [Song Han, LLM Quant]
- SAPIEN: A SimulAted Part-based Interactive ENvironment CVPR 2020
- FiLM: Visual Reasoning with a General Conditioning Layer AAAI 2018
- TokenLearner: What Can 8 Learned Tokens Do for Images and Videos? NeurIPS 2021
- MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge NeurIPS 2022 [Outstanding paper award]
- QLoRA: Efficient Finetuning of Quantized LLMs
- OVO: Open-Vocabulary Occupancy
- Code Llama: Open Foundation Models for Code
- Chinchilla: Training Compute-Optimal Large Language Models [DeepMind]
- GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
- RH20T: A Robotic Dataset for Learning Diverse Skills in One-Shot
- Perceiver-Actor: A Multi-Task Transformer for Robotic Manipulation
- VIMA: General Robot Manipulation with Multimodal Prompts
- An Attention Free Transformer [Apple]
- PDDL Planning with Pretrained Large Language Models [MIT, Leslie Kaelbling]
- Task and Motion Planning with Large Language Models for Object Rearrangement IROS 2023
- RetNet: Retentive Network: A Successor to Transformer for Large Language Models [Notes] [MSRA]
- Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention [Notes] ICML 2020 [Linear attention]
- AFT: An Attention Free Transformer [Notes] [Apple]
- RT-1: Robotics Transformer for Real-World Control at Scale [Notes] [DeepMind]
- RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control [Notes] [DeepMind, end-to-end visuomotor]
- RWKV: Reinventing RNNs for the Transformer Era [Notes]
- MILE: Model-Based Imitation Learning for Urban Driving [Notes] NeurIPS 2022 [Alex Kendall]
- PaLM-E: An embodied multimodal language model [Notes] [Google Robotics]
- VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models [Notes] [Feifei Li]
- CaP: Code as Policies: Language Model Programs for Embodied Control [Notes] [Project]
- TidyBot: Personalized Robot Assistance with Large Language Models [Notes] [Project]
- SayCan: Do As I Can, Not As I Say: Grounding Language in Robotic Affordances [Notes] [Project]
- End-to-end review by Shanghai AI Labs
- Pix2seq v2: A Unified Sequence Interface for Vision Tasks [Notes] NeurIPS 2022 [Geoffrey Hinton]
- 🦩 Flamingo: a Visual Language Model for Few-Shot Learning [Notes] NeurIPS 2022 [DeepMind]
- 😼 Gato: A Generalist Agent [Notes] TMLR 2022 [DeepMind]
- BC-SAC: Imitation Is Not Enough: Robustifying Imitation with Reinforcement Learning for Challenging Driving Scenarios [Notes] NeurIPS 2022 [Waymo]
- MGAIL-AD: Hierarchical Model-Based Imitation Learning for Planning in Autonomous Driving [Notes] IROS 2022 [Waymo]
- SurroundOcc: Multi-Camera 3D Occupancy Prediction for Autonomous Driving [Notes] [Occupancy Network, Wei Yi, Jiwen Lu]
- Occ3D: A Large-Scale 3D Occupancy Prediction Benchmark for Autonomous Driving [Notes] [Occupancy Network, Hang Zhao]
- Occupancy Networks: Learning 3D Reconstruction in Function Space CVPR 2019 [Notes] [Andreas Geiger]
- OccFormer: Dual-path Transformer for Vision-based 3D Semantic Occupancy Prediction [Occupancy Network, PhiGent]
- Pix2seq: A Language Modeling Framework for Object Detection [Notes] ICLR 2022 [Geoffrey Hinton]
- VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks [Notes] [Jifeng Dai]
- HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face [Notes]
- UniAD: Planning-oriented Autonomous Driving [Notes] [BEV, e2e, Hongyang Li]
- GPT-4 Technical Report [Notes] [OpenAI, GPT]
- OpenOccupancy: A Large Scale Benchmark for Surrounding Semantic Occupancy Perception [Notes] [Occupancy Network, Jiwen Lu]
- VoxFormer: Sparse Voxel Transformer for Camera-based 3D Semantic Scene Completion [Notes] CVPR 2023 highlight [Occupancy Network, Nvidia]
- MonoScene: Monocular 3D Semantic Scene Completion CVPR 2022 [Notes] [Occupancy Network, single cam]
- CoReNet: Coherent 3D scene reconstruction from a single RGB image [Notes] ECCV 2020 oral
- Will we run out of data? An analysis of the limits of scaling datasets in Machine Learning [Notes] [Epoch.ai industry report]
- Codex: Evaluating Large Language Models Trained on Code [Notes] [GPT, OpenAI]
- InstructGPT: Training language models to follow instructions with human feedback [Notes] [GPT, OpenAI]
- TPVFormer: Tri-Perspective View for Vision-Based 3D Semantic Occupancy Prediction [Notes] CVPR 2023 [Occupancy Network, Jiwen Lu]
- PPGeo: Policy Pre-training for End-to-end Autonomous Driving via Self-supervised Geometric Modeling [Notes] ICLR 2023
- nuPlan: A closed-loop ML-based planning benchmark for autonomous vehicles [Notes]
- Delving into the Devils of Bird's-eye-view Perception: A Review, Evaluation and Recipe [Notes] [PJLab]
- ViP3D: End-to-end Visual Trajectory Prediction via 3D Agent Queries [Notes] [BEV, perception + prediction, Hang Zhao]
- MapTR: Structured Modeling and Learning for Online Vectorized HD Map Construction [Notes] [Horizon, BEVNet]
- StopNet: Scalable Trajectory and Occupancy Prediction for Urban Autonomous Driving ICRA 2022
- MOTR: End-to-End Multiple-Object Tracking with Transformer ECCV 2022 [Megvii, MOT]
- Anchor DETR: Query Design for Transformer-Based Object Detection [Notes] AAAI 2022 [Megvii]
- HOME: Heatmap Output for future Motion Estimation [Notes] ITSC 2021 [behavior prediction, Huawei Paris]
- PersFormer: 3D Lane Detection via Perspective Transformer and the OpenLane Benchmark [Notes] [BEVNet, lane line]
- VectorMapNet: End-to-end Vectorized HD Map Learning [Notes] [BEVNet, LLD, Hang Zhao]
- PETR: Position Embedding Transformation for Multi-View 3D Object Detection [Notes] ECCV 2022 [BEVNet]
- PETRv2: A Unified Framework for 3D Perception from Multi-Camera Images [Notes] [BEVNet, Megvii]
- M^2BEV: Multi-Camera Joint 3D Detection and Segmentation with Unified Birds-Eye View Representation [Notes] [BEVNet, Nvidia]
- BEVDepth: Acquisition of Reliable Depth for Multi-view 3D Object Detection [Notes] [BEVNet, NuScenes SOTA, Megvii]
- CVT: Cross-view Transformers for real-time Map-view Semantic Segmentation [Notes] CVPR 2022 oral [UTAustin, Philipp]
- Wayformer: Motion Forecasting via Simple & Efficient Attention Networks [Notes] [Behavior prediction, Waymo]
- BEVDet4D: Exploit Temporal Cues in Multi-camera 3D Object Detection [Notes] [BEVNet]
- BEVerse: Unified Perception and Prediction in Birds-Eye-View for Vision-Centric Autonomous Driving [Notes] [Jiwen Lu, BEVNet, perception + prediction]
- BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation [Notes] [BEVNet, Song Han]
- BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers [Notes] ECCV 2022 [BEVNet, Hongyang Li, Jifeng Dai]
- TNT: Target-driveN Trajectory Prediction [Notes] CoRL 2020 [prediction, Waymo, Hang Zhao]
- DenseTNT: End-to-end Trajectory Prediction from Dense Goal Sets [Notes] ICCV 2021 [prediction, Waymo, 1st place winner WOMD]
- Manydepth: The Temporal Opportunist: Self-Supervised Multi-Frame Monocular Depth [Notes] CVPR 2021 [monodepth, Niantic]
- DEKR: Bottom-Up Human Pose Estimation Via Disentangled Keypoint Regression [Notes] CVPR 2021
- BN-FFN-BN: Leveraging Batch Normalization for Vision Transformers [Notes] ICCVW 2021 [BN, transformers]
- PowerNorm: Rethinking Batch Normalization in Transformers [Notes] ICML 2020 [BN, transformers]
- MultiPath++: Efficient Information Fusion and Trajectory Aggregation for Behavior Prediction [Notes] ICRA 2022 [Waymo, behavior prediction]
- BEVDet: High-performance Multi-camera 3D Object Detection in Bird-Eye-View [Notes]
- Translating Images into Maps [Notes] ICRA 2022 [BEVNet, transformers]
- DETR3D: 3D Object Detection from Multi-view Images via 3D-to-2D Queries [Notes] CoRL 2021 [BEVNet, transformers]
- Robust-CVD: Robust Consistent Video Depth Estimation CVPR 2021 oral [website]
- MAE: Masked Autoencoders Are Scalable Vision Learners [Notes] [Kaiming He, unsupervised learning]
- SimMIM: A Simple Framework for Masked Image Modeling [Notes] [MSRA, unsupervised learning, MAE]
- iBOT: Image BERT Pre-Training with Online Tokenizer
- STSU: Structured Bird's-Eye-View Traffic Scene Understanding from Onboard Images [Notes] ICCV 2021 [BEV feat stitching, Luc Van Gool]
- PanopticBEV: Bird's-Eye-View Panoptic Segmentation Using Monocular Frontal View Images [Notes] RAL 2022 [BEVNet, vertical/horizontal features]
- NEAT: Neural Attention Fields for End-to-End Autonomous Driving [Notes] ICCV 2021 [supplementary] [BEVNet]
- DD3D: Is Pseudo-Lidar needed for Monocular 3D Object detection? [Notes] ICCV 2021 [mono3D, Toyota]
- EfficientDet: Scalable and Efficient Object Detection [Notes] CVPR 2020 [BiFPN, Tesla AI day]
- PnPNet: End-to-End Perception and Prediction with Tracking in the Loop [Notes] CVPR 2020 [Uber ATG]
- MP3: A Unified Model to Map, Perceive, Predict and Plan [Notes] CVPR 2021 [Uber, planning]
- BEV-Net: Assessing Social Distancing Compliance by Joint People Localization and Geometric Reasoning [Notes] ICCV 2021 [BEVNet, surveillance]
- LiDAR R-CNN: An Efficient and Universal 3D Object Detector [Notes] CVPR 2021 [TuSimple, Naiyan Wang]
- Corner Cases for Visual Perception in Automated Driving: Some Guidance on Detection Approaches [Notes] [corner cases]
- Systematization of Corner Cases for Visual Perception in Automated Driving [Notes] IV 2020 [corner cases]
- An Application-Driven Conceptualization of Corner Cases for Perception in Highly Automated Driving [Notes] IV 2021 [corner cases]
- PYVA: Projecting Your View Attentively: Monocular Road Scene Layout Estimation via Cross-view Transformation [Notes] CVPR 2021 [Supplementary] [BEVNet]
- YOLOF: You Only Look One-level Feature [Notes] CVPR 2021 [Megvii]
- Perceiving Humans: from Monocular 3D Localization to Social Distancing [Notes] TITS 2021 [monoloco++]
- PifPaf: Composite Fields for Human Pose Estimation CVPR 2019
- Bird's-Eye-View Panoptic Segmentation Using Monocular Frontal View Images [BEVNet]
- TransformerFusion: Monocular RGB Scene Reconstruction using Transformers
- Projecting Your View Attentively: Monocular Road Scene Layout Estimation via Cross-view Transformation CVPR 2021
- Multi-Modal Fusion Transformer for End-to-End Autonomous Driving CVPR 2021
- Conditional DETR for Fast Training Convergence
- Probabilistic and Geometric Depth: Detecting Objects in Perspective CoRL 2021
- EgoNet: Exploring Intermediate Representation for Monocular Vehicle Pose Estimation [Notes] CVPR 2021 [mono3D]
- MonoEF: Monocular 3D Object Detection: An Extrinsic Parameter Free Approach [Notes] CVPR 2021 [mono3D]
- GAC: Ground-aware Monocular 3D Object Detection for Autonomous Driving [Notes] RAL 2021 [mono3D]
- FCOS3D: Fully Convolutional One-Stage Monocular 3D Object Detection [Notes] NeurIPS 2020 [mono3D, SenseTime]
- GUPNet: Geometry Uncertainty Projection Network for Monocular 3D Object Detection [Notes] ICCV 2021 [mono3D, Wanli Ouyang]
- DARTS: Differentiable Architecture Search [Notes] ICLR 2019 [VGG author]
- FBNet: Hardware-Aware Efficient ConvNet Design via Differentiable Neural Architecture Search [Notes] CVPR 2019 [DARTS]
- FBNetV2: Differentiable Neural Architecture Search for Spatial and Channel Dimensions CVPR 2020
- FBNetV3: Joint Architecture-Recipe Search using Predictor Pretraining CVPR 2021
- Perceiver: General Perception with Iterative Attention [Notes] ICML 2021 [transformers, multimodal]
- Perceiver IO: A General Architecture for Structured Inputs & Outputs [Notes]
- PillarMotion: Self-Supervised Pillar Motion Learning for Autonomous Driving [Notes] CVPR 2021 [QCraft, Alan Yuille]
- SimTrack: Exploring Simple 3D Multi-Object Tracking for Autonomous Driving [Notes] ICCV 2021 [QCraft, Alan Yuille]
- HDMapNet: An Online HD Map Construction and Evaluation Framework [Notes] CVPR 2021 workshop [youtube video only, Li Auto]
- FIERY: Future Instance Prediction in Bird's-Eye View from Surround Monocular Cameras [Notes] ICCV 2021 [BEVNet, perception + prediction]
- Baidu's CNN seg [Notes]
- Rethinking the Heatmap Regression for Bottom-up Human Pose Estimation [Notes] CVPR 2021 [Megvii]
- CrowdPose: Efficient Crowded Scenes Pose Estimation and A New Benchmark CVPR 2019
- The Overlooked Elephant of Object Detection: Open Set WACV 2021
- Class-Agnostic Object Detection WACV 2021
- OWOD: Towards Open World Object Detection [Notes] CVPR 2021 oral
- FsDet: Frustratingly Simple Few-Shot Object Detection ICML 2020
- MonoFlex: Objects are Different: Flexible Monocular 3D Object Detection [Notes] CVPR 2021 [mono3D, Jiwen Lu, cropped]
- monoDLE: Delving into Localization Errors for Monocular 3D Object Detection [Notes] CVPR 2021 [mono3D]
- Exploring 2D Data Augmentation for 3D Monocular Object Detection
- OCM3D: Object-Centric Monocular 3D Object Detection [mono3D]
- FSM: Full Surround Monodepth from Multiple Cameras [Notes] ICRA 2021 [monodepth, Xnet]
- CaDDN: Categorical Depth Distribution Network for Monocular 3D Object Detection [Notes] CVPR 2021 oral [mono3D, BEVNet]
- DSNT: Numerical Coordinate Regression with Convolutional Neural Networks [Notes] [differentiable spatial to numerical transform]
- Soft-Argmax: Human pose regression by combining indirect part detection and contextual information
- INSTA-YOLO: Real-Time Instance Segmentation [Notes] ICML workshop 2020 [single stage instance segmentation]
- CenterNet2: Probabilistic two-stage detection [Notes] [CenterNet, two-stage]
- Confluence: A Robust Non-IoU Alternative to Non-Maxima Suppression in Object Detection [Notes] [NMS]
- BoxInst: High-Performance Instance Segmentation with Box Annotations [Notes] CVPR 2021 [Chunhua Shen, Tian Zhi]
- 3DSSD: Point-based 3D Single Stage Object Detector [Notes] CVPR 2020
- RepVGG: Making VGG-style ConvNets Great Again [Notes] [Megvii, Xiangyu Zhang, ACNet]
- ACNet: Strengthening the Kernel Skeletons for Powerful CNN via Asymmetric Convolution Blocks [Notes] ICCV 2019
- BEV-Feat-Stitching: Understanding Bird's-Eye View Semantic HD-Maps Using an Onboard Monocular Camera [Notes] [BEVNet, mono3D, Luc Van Gool]
- PSS: Object Detection Made Simpler by Eliminating Heuristic NMS [Notes] [Transformer, DETR]
- DeFCN: End-to-End Object Detection with Fully Convolutional Network [Notes] [Transformer, DETR]
- OneNet: End-to-End One-Stage Object Detection by Classification Cost [Notes] [Transformer, DETR]
- Traffic Light Mapping, Localization, and State Detection for Autonomous Vehicles [Notes] ICRA 2011 [traffic light, Sebastian Thrun]
- Towards lifelong feature-based mapping in semi-static environments [Notes] ICRA 2016
- How to Keep HD Maps for Automated Driving Up To Date [Notes] ICRA 2020 [BMW]
- Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection [Notes] CVPR 2021 [focal loss]
- Visual SLAM for Automated Driving: Exploring the Applications of Deep Learning [Notes] CVPR 2018 workshop
- Centroid Voting: Object-Aware Centroid Voting for Monocular 3D Object Detection [Notes] IROS 2020 [mono3D, geometry + appearance = distance]
- Monocular 3D Object Detection in Cylindrical Images from Fisheye Cameras [Notes] [GM Israel, mono3D]
- DeepPS: Vision-Based Parking-Slot Detection: A DCNN-Based Approach and a Large-Scale Benchmark Dataset TIP 2018 [Parking slot detection, PS2.0 dataset]
- PSDet: Efficient and Universal Parking Slot Detection [Notes] IV 2020 [Zongmu, Parking slot detection]
- PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-based Weight Pruning [Notes] ASPLOS 2020 [pruning]
- Scaled-YOLOv4: Scaling Cross Stage Partial Network [Notes] [yolo]
- Yolov5 by Ultralytics [Notes] [yolo, spatial2channel]
- PP-YOLO: An Effective and Efficient Implementation of Object Detector [Notes] [yolo, paddle-paddle, baidu]
- PointPainting: Sequential Fusion for 3D Object Detection [Notes] [nuScenes]
- MotionNet: Joint Perception and Motion Prediction for Autonomous Driving Based on Bird's Eye View Maps [Notes] CVPR 2020 [Unseen moving objects, BEV]
- Locating Objects Without Bounding Boxes [Notes] CVPR 2019 [weighted Hausdorff distance, NMS-free]
- TSP: Rethinking Transformer-based Set Prediction for Object Detection [Notes] ICCV 2021 [DETR, transformers, Kris Kitani]
- Sparse R-CNN: End-to-End Object Detection with Learnable Proposals [Notes] CVPR 2021 [DETR, Transformer]
- Unsupervised Monocular Depth Learning in Dynamic Scenes [Notes] CoRL 2020 [LearnK improved ver, Google]
- MoNet3D: Towards Accurate Monocular 3D Object Localization in Real Time [Notes] ICML 2020 [Mono3D, pairwise relationship]
- Argoverse: 3D Tracking and Forecasting with Rich Maps [Notes] CVPR 2019 [HD maps, dataset, CV lidar]
- The H3D Dataset for Full-Surround 3D Multi-Object Detection and Tracking in Crowded Urban Scenes [Notes] ICRA 2019
- Cityscapes 3D: Dataset and Benchmark for 9 DoF Vehicle Detection CVPRW 2020 [dataset, Daimler, mono3D]
- NYC3DCars: A Dataset of 3D Vehicles in Geographic Context ICCV 2013
- Towards Fully Autonomous Driving: Systems and Algorithms IV 2011
- Center3D: Center-based Monocular 3D Object Detection with Joint Depth Understanding [Notes] [mono3D, LID+DepJoint]
- ZoomNet: Part-Aware Adaptive Zooming Neural Network for 3D Object Detection AAAI 2020 oral [mono3D]
- CenterFusion: Center-based Radar and Camera Fusion for 3D Object Detection [Notes] WACV 2021 [early fusion, camera, radar]
- 3D-LaneNet+: Anchor Free Lane Detection using a Semi-Local Representation [Notes] NeurIPS 2020 workshop [GM Israel, 3D LLD]
- LSTR: End-to-end Lane Shape Prediction with Transformers [Notes] WACV 2021 [LLD, transformers]
- PIXOR: Real-time 3D Object Detection from Point Clouds [Notes] CVPR 2018 (birds eye view)
- HDNET/PIXOR++: Exploiting HD Maps for 3D Object Detection [Notes] CoRL 2018
- CPNDet: Corner Proposal Network for Anchor-free, Two-stage Object Detection ECCV 2020 [anchor free, two stage]
- MVF: End-to-End Multi-View Fusion for 3D Object Detection in LiDAR Point Clouds [Notes] CoRL 2019 [Waymo, VoxelNet 1st author]
- Pillar-based Object Detection for Autonomous Driving [Notes] ECCV 2020
- Training-Time-Friendly Network for Real-Time Object Detection AAAI 2020 [anchor-free, fast training]
- Autonomous Driving with Deep Learning: A Survey of State-of-Art Technologies [Review of autonomous stack, Yu Huang]
- Dense Monocular Depth Estimation in Complex Dynamic Scenes CVPR 2016
- Probabilistic Future Prediction for Video Scene Understanding
- AB3D: A Baseline for 3D Multi-Object Tracking IROS 2020 [3D MOT]
- Spatial-Temporal Relation Networks for Multi-Object Tracking ICCV 2019 [MOT, feature location over time]
- Beyond Pixels: Leveraging Geometry and Shape Cues for Online Multi-Object Tracking ICRA 2018 [MOT, IIT, 3D shape]
- ST-3D: Joint Spatial-Temporal Optimization for Stereo 3D Object Tracking CVPR 2020 [Peiliang Li, author of VINS and S3DOT]
- Augment Your Batch: Improving Generalization Through Instance Repetition CVPR 2020
- RetinaTrack: Online Single Stage Joint Detection and Tracking CVPR 2020 [MOT]
- Object as Hotspots: An Anchor-Free 3D Object Detection Approach via Firing of Hotspots
- Gradient Centralization: A New Optimization Technique for Deep Neural Networks ECCV 2020 oral
- Depth Completion via Deep Basis Fitting WACV 2020
- BTS: From Big to Small: Multi-Scale Local Planar Guidance for Monocular Depth Estimation [monodepth, supervised]
- The Edge of Depth: Explicit Constraints between Segmentation and Depth CVPR 2020 [monodepth, Xiaoming Liu]
- On the Continuity of Rotation Representations in Neural Networks CVPR 2019 [rotational representation]
- VDO-SLAM: A Visual Dynamic Object-aware SLAM System IJRR 2020
- Dynamic SLAM: The Need For Speed
- Pseudo RGB-D for Self-Improving Monocular SLAM and Depth Prediction ECCV 2020
- Traffic Light Mapping and Detection [Notes] ICRA 2011 [traffic light, Google, Chris Urmson]
- Traffic light recognition exploiting map and localization at every stage [Notes] Expert Systems 2017 [traffic light, 鲜于明镐,徐在圭,郑浩奇]
- Traffic Light Recognition Using Deep Learning and Prior Maps for Autonomous Cars [Notes] IJCNN 2019 [traffic light, Espirito Santo Brazil]
- TSM: Temporal Shift Module for Efficient Video Understanding [Notes] ICCV 2019 [Song Han, video, object detection]
- WOD: Waymo Dataset: Scalability in Perception for Autonomous Driving: Waymo Open Dataset [Notes] CVPR 2020
- Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection [Notes] NeurIPS 2020 [classification as regression]
- A Ranking-based, Balanced Loss Function Unifying Classification and Localisation in Object Detection NeurIPS 2020 spotlight
- Rethinking the Value of Labels for Improving Class-Imbalanced Learning NeurIPS 2020
- RepLoss: Repulsion Loss: Detecting Pedestrians in a Crowd [Notes] CVPR 2018 [crowd detection, Megvii]
- Adaptive NMS: Refining Pedestrian Detection in a Crowd [Notes] CVPR 2019 oral [crowd detection, NMS]
- AggLoss: Occlusion-aware R-CNN: Detecting Pedestrians in a Crowd [Notes] ECCV 2018 [crowd detection]
- CrowdDet: Detection in Crowded Scenes: One Proposal, Multiple Predictions [Notes] CVPR 2020 oral [crowd detection, Megvii, Earth mover's distance]
- R2-NMS: NMS by Representative Region: Towards Crowded Pedestrian Detection by Proposal Pairing [Notes] CVPR 2020
- Double Anchor R-CNN for Human Detection in a Crowd [Notes] [head-body bundle]
- Review: AP vs MR
- SKU110K: Precise Detection in Densely Packed Scenes [Notes] CVPR 2019 [crowd detection, no occlusion]
- GossipNet: Learning non-maximum suppression CVPR 2017
- TLL: Small-scale Pedestrian Detection Based on Somatic Topology Localization and Temporal Feature Aggregation ECCV 2018
- Learning Monocular 3D Vehicle Detection without 3D Bounding Box Labels GCPR 2020 [mono3D, Daniel Cremers, TUM]
- CubifAE-3D: Monocular Camera Space Cubification on Autonomous Vehicles for Auto-Encoder based 3D Object Detection [Notes] [mono3D, depth AE pretraining]
- Deformable DETR: Deformable Transformers for End-to-End Object Detection [Notes] ICLR 2021 [Jifeng Dai, DETR]
- ViT: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale [Notes] ICLR 2021
- BYOL: Bootstrap your own latent: A new approach to self-supervised Learning [self-supervised]
- SDFLabel: Autolabeling 3D Objects With Differentiable Rendering of SDF Shape Priors [Notes] CVPR 2020 oral [TRI, differentiable rendering]
- DensePose: Dense Human Pose Estimation In The Wild [Notes] CVPR 2018 oral [FAIR]
- NOCS: Normalized Object Coordinate Space for Category-Level 6D Object Pose and Size Estimation CVPR 2019
- monoDR: Monocular Differentiable Rendering for Self-Supervised 3D Object Detection [Notes] ECCV 2020 [TRI, mono3D]
- Lift, Splat, Shoot: Encoding Images From Arbitrary Camera Rigs by Implicitly Unprojecting to 3D [Notes] ECCV 2020 [BEV-Net, Utoronto, Sanja Fidler]
- Implicit Latent Variable Model for Scene-Consistent Motion Forecasting ECCV 2020 [Uber ATG, Rachel Urtasun]
- FISHING Net: Future Inference of Semantic Heatmaps In Grids [Notes] CVPRW 2020 [BEV-Net, Mapping, Zoox]
- VPN: Cross-view Semantic Segmentation for Sensing Surroundings [Notes] RAL 2020 [Bolei Zhou, BEV-Net]
- VED: Monocular Semantic Occupancy Grid Mapping with Convolutional Variational Encoder-Decoder Networks [Notes] ICRA 2019 [BEV-Net]
- Cam2BEV: A Sim2Real Deep Learning Approach for the Transformation of Images from Multiple Vehicle-Mounted Cameras to a Semantically Segmented Image in Bird's Eye View [Notes] ITSC 2020 [BEV-Net]
- Learning to Look around Objects for Top-View Representations of Outdoor Scenes [Notes] ECCV 2018 [BEV-Net, UCSD, Manmohan Chandraker]
- A Parametric Top-View Representation of Complex Road Scenes CVPR 2019 [BEV-Net, UCSD, Manmohan Chandraker]
- FTM: Understanding Road Layout from Videos as a Whole CVPR 2020 [BEV-Net, UCSD, Manmohan Chandraker]
- KM3D-Net: Monocular 3D Detection with Geometric Constraints Embedding and Semi-supervised Training [Notes] RAL 2021 [RTM3D, Peixuan Li]
- InstanceMotSeg: Real-time Instance Motion Segmentation for Autonomous Driving [Notes] IROS 2020 [motion segmentation]
- MPV-Nets: Monocular Plan View Networks for Autonomous Driving [Notes] IROS 2019 [BEV-Net]
- Class-Balanced Loss Based on Effective Number of Samples [Notes] CVPR 2019 [Focal loss authors]
- Geometric Pretraining for Monocular Depth Estimation [Notes] ICRA 2020
- Robust Traffic Light and Arrow Detection Using Digital Map with Spatial Prior Information for Automated Driving [Notes] Sensors 2020 [traffic light, 金沢]
- Feature-metric Loss for Self-supervised Learning of Depth and Egomotion [Notes] ECCV 2020 [feature-metric, local minima, monodepth]
- Depth-VO-Feat: Unsupervised Learning of Monocular Depth Estimation and Visual Odometry with Deep Feature Reconstruction CVPR 2018 [feature-metric, monodepth]
- MonoResMatch: Learning monocular depth estimation infusing traditional stereo knowledge [Notes] CVPR 2019 [monodepth, local minima, cheap stereo GT]
- SGDepth: Self-Supervised Monocular Depth Estimation: Solving the Dynamic Object Problem by Semantic Guidance [Notes] ECCV 2020 [Moving objects]
- Every Pixel Counts: Unsupervised Geometry Learning with Holistic 3D Motion Understanding ECCV 2018 [dynamic objects, rigid and dynamic motion]
- Every Pixel Counts ++: Joint Learning of Geometry and Motion with 3D Holistic Understanding TPAMI 2018
- CC: Competitive Collaboration: Joint Unsupervised Learning of Depth, Camera Motion, Optical Flow and Motion Segmentation [Notes] CVPR 2019
- ObjMotionNet: Self-supervised Object Motion and Depth Estimation from Video [Notes] CVPRW 2020 [object motion prediction, velocity prediction]
- Instance-wise Depth and Motion Learning from Monocular Videos
- Semantics-Driven Unsupervised Learning for Monocular Depth and Ego-Motion Estimation
- Self-Supervised Joint Learning Framework of Depth Estimation via Implicit Cues
- DF-Net: Unsupervised Joint Learning of Depth and Flow using Cross-Task Consistency ECCV 2018
- LineNet: a Zoomable CNN for Crowdsourced High Definition Maps Modeling in Urban Environments [mapping]
- Road-SLAM: Road Marking based SLAM with Lane-level Accuracy [Notes] [HD mapping]
- AVP-SLAM: Semantic Visual Mapping and Localization for Autonomous Vehicles in the Parking Lot [Notes] IROS 2020 [Huawei, HD mapping, Tong Qin, VINS author, autonomous valet parking]
- AVP-SLAM-Late-Fusion: Mapping and Localization using Semantic Road Marking with Centimeter-level Accuracy in Indoor Parking Lots [Notes] ITSC 2019
- Lane markings-based relocalization on highway ITSC 2019
- DeepRoadMapper: Extracting Road Topology from Aerial Images [Notes] ICCV 2017 [Uber ATG, NOT HD maps]
- RoadTracer: Automatic Extraction of Road Networks from Aerial Images CVPR 2018 [NOT HD maps]
- PolyMapper: Topological Map Extraction From Overhead Images [Notes] ICCV 2019 [mapping, polygon, NOT HD maps]
- HRAN: Hierarchical Recurrent Attention Networks for Structured Online Maps [Notes] CVPR 2018 [HD mapping, highway, polyline loss, Chamfer distance]
- Deep Structured Crosswalk: End-to-End Deep Structured Models for Drawing Crosswalks [Notes] ECCV 2018
- DeepBoundaryExtractor: Convolutional Recurrent Network for Road Boundary Extraction [Notes] CVPR 2019 [HD mapping, boundary, polyline loss]
- DAGMapper: Learning to Map by Discovering Lane Topology [Notes] ICCV 2019 [HD mapping, highway, forks and merges, polyline loss]
- Sparse-HD-Maps: Exploiting Sparse Semantic HD Maps for Self-Driving Vehicle Localization [Notes] IROS 2019 oral [Uber ATG, metadata, mapping, localization]
- Aerial LaneNet: Lane Marking Semantic Segmentation in Aerial Imagery using Wavelet-Enhanced Cost-sensitive Symmetric Fully Convolutional Neural Networks IEEE TGRS 2018
- Monocular Localization with Vector HD Map (MLVHM): A Low-Cost Method for Commercial IVs Sensors 2020 [Tsinghua, 3D HD maps]
- PatchNet: Rethinking Pseudo-LiDAR Representation [Notes] ECCV 2020 [SenseTime, Wanli Ouyang]
- D4LCN: Learning Depth-Guided Convolutions for Monocular 3D Object Detection [Notes] CVPR 2020 [mono3D]
- MfS: Learning Stereo from Single Images [Notes] ECCV 2020 [mono for stereo, learn stereo matching with mono]
- BorderDet: Border Feature for Dense Object Detection ECCV 2020 oral [Megvii]
- Scale-Aware Trident Networks for Object Detection ICCV 2019 [different heads for different scales]
- Learning Depth from Monocular Videos using Direct Methods
- Vid2Depth: Unsupervised Learning of Depth and Ego-Motion from Monocular Video Using 3D Geometric Constraints CVPR 2018 [Google]
- NeRF in the Wild: Neural Radiance Fields for Unconstrained Photo Collections
- Supervising the new with the old: learning SFM from SFM [Notes] ECCV 2018
- Neural RGB->D Sensing: Depth and Uncertainty from a Video Camera CVPR 2019 [multi-frame monodepth]
- Don't Forget The Past: Recurrent Depth Estimation from Monocular Video [multi-frame monodepth, RNN]
- Recurrent Neural Network for (Un-)supervised Learning of Monocular Video Visual Odometry and Depth [multi-frame monodepth, RNN]
- Exploiting temporal consistency for real-time video depth estimation ICCV 2019 [multi-frame monodepth, RNN, indoor]
- SfM-Net: Learning of Structure and Motion from Video [dynamic object, SfM]
- MB-Net: MergeBoxes for Real-Time 3D Vehicles Detection [Notes] IV 2018 [mono3D, Daimler]
- BS3D: Beyond Bounding Boxes: Using Bounding Shapes for Real-Time 3D Vehicle Detection from Monocular RGB Images [Notes] IV 2019 [mono3D, Daimler]
- 3D-GCK: Single-Shot 3D Detection of Vehicles from Monocular RGB Images via Geometrically Constrained Keypoints in Real-Time [Notes] IV 2020 [mono3D, Daimler]
- UR3D: Distance-Normalized Unified Representation for Monocular 3D Object Detection [Notes] ECCV 2020 [mono3D]
- DA-3Det: Monocular 3D Object Detection via Feature Domain Adaptation [Notes] ECCV 2020 [mono3D]
- RAR-Net: Reinforced Axial Refinement Network for Monocular 3D Object Detection [Notes] ECCV 2020 [mono3D]
- CenterTrack: Tracking Objects as Points [Notes] ECCV 2020 spotlight [camera based 3D MOD, MOT SOTA, CenterNet, video based object detection, Philipp Krähenbühl]
- CenterPoint: Center-based 3D Object Detection and Tracking [Notes] CVPR 2021 [lidar based 3D MOD, CenterNet]
- Tracktor: Tracking without bells and whistles [Notes] ICCV 2019 [Tracktor/Tracktor++, Laura Leal-Taixe@TUM]
- FairMOT: A Simple Baseline for Multi-Object Tracking [Notes]
- DeepMOT: A Differentiable Framework for Training Multiple Object Trackers [Notes] CVPR 2020 [trainable Hungarian, Laura Leal-Taixe@TUM]
- MPNTracker: Learning a Neural Solver for Multiple Object Tracking CVPR 2020 oral [trainable Hungarian, Laura Leal-Taixe@TUM]
- nuScenes: A multimodal dataset for autonomous driving [Notes] CVPR 2020 [dataset, point cloud, radar]
- CBGS: Class-balanced Grouping and Sampling for Point Cloud 3D Object Detection [Notes] CVPRW 2019 [Megvii, lidar, WAD challenge winner]
- AFDet: Anchor Free One Stage 3D Object Detection and Competition solution [Notes] CVPRW 2020 [Horizon Robotics, lidar, Waymo challenge winner]
- Review of MOT and SOT [Notes]
- CrowdHuman: A Benchmark for Detecting Human in a Crowd [Notes] [Megvii, pedestrian, dataset]
- WiderPerson: A Diverse Dataset for Dense Pedestrian Detection in the Wild [Notes] TMM 2019 [dataset, pedestrian]
- Tsinghua-Daimler Cyclists: A New Benchmark for Vision-Based Cyclist Detection [Notes] IV 2016 [dataset, cyclist detection]
- Specialized Cyclist Detection Dataset: Challenging Real-World Computer Vision Dataset for Cyclist Detection Using a Monocular RGB Camera [Notes] IV 2019 [Extension to KITTI]
- PointTrack: Segment as Points for Efficient Online Multi-Object Tracking and Segmentation [Notes] ECCV 2020 oral [MOTS]
- PointTrack++ for Effective Online Multi-Object Tracking and Segmentation [Notes] CVPR 2020 workshop [CVPR2020 MOTS Challenge Winner. PointTrack++ ranks first on KITTI MOTS]
- SpatialEmbedding: Instance Segmentation by Jointly Optimizing Spatial Embeddings and Clustering Bandwidth [Notes] ICCV 2019 [one-stage, instance segmentation]
- BA-Net: Dense Bundle Adjustment Networks [Notes] ICLR 2019 [Bundle adjustment, multi-frame monodepth, feature-metric]
- DeepSFM: Structure From Motion Via Deep Bundle Adjustment ECCV 2020 oral [multi-frame monodepth, indoor scene]
- CVD: Consistent Video Depth Estimation [Notes] SIGGRAPH 2020 [multi-frame monodepth, online finetune]
- DeepV2D: Video to Depth with Differentiable Structure from Motion [Notes] ICLR 2020 [multi-frame monodepth, Jia Deng]
- GeoNet: Unsupervised Learning of Dense Depth, Optical Flow and Camera Pose [Notes] CVPR 2018 [residual optical flow, monodepth, rigid and dynamic motion]
- GLNet: Self-supervised Learning with Geometric Constraints in Monocular Video: Connecting Flow, Depth, and Camera [Notes] ICCV 2019 [online finetune, rigid and dynamic motion]
- Depth Hints: Self-Supervised Monocular Depth Hints [Notes] ICCV 2019 [monodepth, local minima, cheap stereo GT]
- MonoUncertainty: On the uncertainty of self-supervised monocular depth estimation [Notes] CVPR 2020 [depth uncertainty]
- Self-Supervised Learning of Depth and Ego-motion with Differentiable Bundle Adjustment [Notes] [Bundle adjustment, xmotors.ai, multi-frame monodepth]
- Kinematic 3D Object Detection in Monocular Video [Notes] ECCV 2020 [multi-frame mono3D, Xiaoming Liu]
- VelocityNet: Camera-based vehicle velocity estimation from monocular video [Notes] CVPR 2017 workshop [monocular velocity estimation, CVPR 2017 challenge winner]
- Vehicle Centric VelocityNet: End-to-end Learning for Inter-Vehicle Distance and Relative Velocity Estimation in ADAS with a Monocular Camera [Notes] [monocular velocity estimation, monocular distance, SOTA]
- LeGO-LOAM: Lightweight and Ground-Optimized Lidar Odometry and Mapping on Variable Terrain [Notes] IROS 2018 [lidar, mapping]
- PIE: A Large-Scale Dataset and Models for Pedestrian Intention Estimation and Trajectory Prediction [Notes] ICCV 2019
- JAAD: Are They Going to Cross? A Benchmark Dataset and Baseline for Pedestrian Crosswalk Behavior ICCV 2017
- Pedestrian Action Anticipation using Contextual Feature Fusion in Stacked RNNs BMVC 2019
- Is the Pedestrian going to Cross? Answering by 2D Pose Estimation IV 2018
- Intention Recognition of Pedestrians and Cyclists by 2D Pose Estimation ITSC 2019 [skeleton, pedestrian, cyclist intention]
- Attentive Single-Tasking of Multiple Tasks CVPR 2019
- DETR: End-to-End Object Detection with Transformers [Notes] ECCV 2020 oral [FAIR]
- Transformer: Attention Is All You Need [Notes] NIPS 2017
- SpeedNet: Learning the Speediness in Videos [Notes] CVPR 2020 oral
- MonoPair: Monocular 3D Object Detection Using Pairwise Spatial Relationships [Notes] CVPR 2020 [Mono3D, pairwise relationship]
- SMOKE: Single-Stage Monocular 3D Object Detection via Keypoint Estimation [Notes] CVPRW 2020 [Mono3D, Zongmu]
- Vehicle Re-ID for Surround-view Camera System [Notes] CVPRW 2020 [tireline, vehicle ReID, Zongmu]
- End-to-End Lane Marker Detection via Row-wise Classification [Notes] [Qualcomm Korea, LLD as cls]
- Reliable multilane detection and classification by utilizing CNN as a regression network ECCV 2018 [LLD as reg]
- SUPER: A Novel Lane Detection System [Notes]
- Learning Lightweight Lane Detection CNNs by Self Attention Distillation ICCV 2019
- StixelNet: A Deep Convolutional Network for Obstacle Detection and Road Segmentation BMVC 2015
- StixelNetV2: Real-time category-based and general obstacle detection for autonomous driving [Notes] ICCV 2017 [DS]
- Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network [Notes] CVPR 2016 [channel-to-pixel]
- Car Pose in Context: Accurate Pose Estimation with Ground Plane Constraints [mono3D]
- Self-Mono-SF: Self-Supervised Monocular Scene Flow Estimation [Notes] CVPR 2020 oral [scene-flow, Stereo input]
- MEBOW: Monocular Estimation of Body Orientation In the Wild [Notes] CVPR 2020
- VG-NMS: Visibility Guided NMS: Efficient Boosting of Amodal Object Detection in Crowded Traffic Scenes [Notes] NeurIPS 2019 workshop [Crowded scene, NMS, Daimler]
- WYSIWYG: What You See is What You Get: Exploiting Visibility for 3D Object Detection [Notes] CVPR 2020 oral [occupancy grid]
- Real-Time Panoptic Segmentation From Dense Detections [Notes] CVPR 2020 oral [bbox + semantic segmentation = panoptic segmentation, Toyota]
- Human-Centric Efficiency Improvements in Image Annotation for Autonomous Driving [Notes] CVPRW 2020 [efficient annotation]
- SurfelGAN: Synthesizing Realistic Sensor Data for Autonomous Driving [Notes] CVPR 2020 oral [Waymo, auto data generation, surfel]
- LiDARsim: Realistic LiDAR Simulation by Leveraging the Real World [Notes] CVPR 2020 oral [Uber ATG, auto data generation, surfel]
- SuMa++: Efficient LiDAR-based Semantic SLAM IROS 2019 [semantic segmentation, lidar, SLAM]
- PON/PyrOccNet: Predicting Semantic Map Representations from Images using Pyramid Occupancy Networks [Notes] CVPR 2020 oral [BEV-Net, OFT]
- MonoLayout: Amodal scene layout from a single image [Notes] WACV 2020 [BEV-Net]
- BEV-Seg: Bird’s Eye View Semantic Segmentation Using Geometry and Semantic Point Cloud [Notes] CVPR 2020 workshop [BEV-Net, Mapping]
- A Geometric Approach to Obtain a Bird's Eye View from an Image ICCVW 2019 [mapping, geometry, Andrew Zisserman]
- FrozenDepth: Learning the Depths of Moving People by Watching Frozen People [Notes] CVPR 2019 oral
- ORB-SLAM: a Versatile and Accurate Monocular SLAM System TRO 2015
- ORB-SLAM2: an Open-Source SLAM System for Monocular, Stereo and RGB-D Cameras TRO 2016
- CubeSLAM: Monocular 3D Object SLAM [Notes] TRO 2019 [dynamic SLAM, orb slam + mono3D]
- ClusterVO: Clustering Moving Instances and Estimating Visual Odometry for Self and Surroundings [Notes] CVPR 2020 [general dynamic SLAM]
- S3DOT: Stereo Vision-based Semantic 3D Object and Ego-motion Tracking for Autonomous Driving [Notes] ECCV 2018 [Peiliang Li]
- Multi-object Monocular SLAM for Dynamic Environments [Notes] IV 2020 [monolayout authors]
- PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume [Notes] CVPR 2018 oral [Optical flow]
- LiteFlowNet: A Lightweight Convolutional Neural Network for Optical Flow Estimation CVPR 2018 [Optical flow]
- FlowNet: Learning Optical Flow With Convolutional Networks ICCV 2015 [Optical flow]
- FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks CVPR 2017 [Optical flow]
- ESPNetv2: A Light-weight, Power Efficient, and General Purpose Convolutional Neural Network CVPR 2019 [semantic segmentation, lightweight]
- Mono-SF: Multi-View Geometry Meets Single-View Depth for Monocular Scene Flow Estimation of Dynamic Traffic Scenes ICCV 2019 [depth uncertainty]
- Egocentric Vision-based Future Vehicle Localization for Intelligent Driving Assistance Systems [Notes] [Honda] ICRA 2019
- PackNet: 3D Packing for Self-Supervised Monocular Depth Estimation [Notes] CVPR 2020 oral [Scale aware depth]
- PackNet-SG: Semantically-Guided Representation Learning for Self-Supervised Monocular Depth [Notes] ICLR 2020 [TRI, infinite-depth problem]
- TrianFlow: Towards Better Generalization: Joint Depth-Pose Learning without PoseNet [Notes] CVPR 2020 [Scale aware]
- Understanding the Limitations of CNN-based Absolute Camera Pose Regression [Notes] CVPR 2019 [Drawbacks of PoseNet, MapNet, Laura Leal-Taixe@TUM]
- To Learn or Not to Learn: Visual Localization from Essential Matrices [Notes] ICRA 2020 [SIFT + 5 pt solver >> others for VO, Laura Leal-Taixe@TUM]
- DF-VO: Visual Odometry Revisited: What Should Be Learnt? [Notes] ICRA 2020 [Depth and Flow for accurate VO]
- D3VO: Deep Depth, Deep Pose and Deep Uncertainty for Monocular Visual Odometry [Notes] CVPR 2020 oral [Daniel Cremers, TUM, depth uncertainty]
- Network Slimming: Learning Efficient Convolutional Networks through Network Slimming [Notes] ICCV 2017
- BatchNorm Pruning: Rethinking the Smaller-Norm-Less-Informative Assumption in Channel Pruning of Convolution Layers [Notes] ICLR 2018
- Direct Sparse Odometry PAMI 2018
- Train in Germany, Test in The USA: Making 3D Object Detectors Generalize [Notes] CVPR 2020
- PseudoLidarV3: End-to-End Pseudo-LiDAR for Image-Based 3D Object Detection [Notes] CVPR 2020
- ATSS: Bridging the Gap Between Anchor-based and Anchor-free Detection via Adaptive Training Sample Selection [Notes] CVPR 2020 oral
- Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression AAAI 2020
- Enhancing Geometric Factors in Model Learning and Inference for Object Detection and Instance Segmentation [Journal version]
- YOLOv4: Optimal Speed and Accuracy of Object Detection [Notes]
- CBN: Cross-Iteration Batch Normalization [Notes]
- Stitcher: Feedback-driven Data Provider for Object Detection [Notes]
- SKNet: Selective Kernel Networks [Notes] CVPR 2019
- CBAM: Convolutional Block Attention Module [Notes] ECCV 2018
- ResNeSt: Split-Attention Networks [Notes]
- ChauffeurNet: Learning to Drive by Imitating the Best and Synthesizing the Worst [Notes] RSS 2019 [Waymo]
- IntentNet: Learning to Predict Intention from Raw Sensor Data [Notes] CoRL 2018 [Uber ATG, perception and prediction, Lidar+Map]
- RoR: Rules of the Road: Predicting Driving Behavior with a Convolutional Model of Semantic Interactions [Notes] CVPR 2019 [Zoox]
- MultiPath: Multiple Probabilistic Anchor Trajectory Hypotheses for Behavior Prediction [Notes] CoRL 2019 [Waymo, authors from RoR and ChauffeurNet]
- NMP: End-to-end Interpretable Neural Motion Planner [Notes] CVPR 2019 oral [Uber ATG]
- Multimodal Trajectory Predictions for Autonomous Driving using Deep Convolutional Networks [Notes] ICRA 2019 [Henggang Cui, Multimodal, Uber ATG Pittsburgh]
- Uncertainty-aware Short-term Motion Prediction of Traffic Actors for Autonomous Driving WACV 2020 [Uber ATG Pittsburgh]
- Jointly Learnable Behavior and Trajectory Planning for Self-Driving Vehicles IROS 2019 Oral [Uber ATG, behavioral planning, motion planning]
- TensorMask: A Foundation for Dense Object Segmentation [Notes] ICCV 2019 [single-stage instance seg]
- BlendMask: Top-Down Meets Bottom-Up for Instance Segmentation [Notes] CVPR 2020 oral
- Mask Encoding for Single Shot Instance Segmentation [Notes] CVPR 2020 oral [single-stage instance seg, Chunhua Shen]
- PolarMask: Single Shot Instance Segmentation with Polar Representation [Notes] CVPR 2020 oral [single-stage instance seg]
- SOLO: Segmenting Objects by Locations [Notes] ECCV 2020 [single-stage instance seg, Chunhua Shen]
- SOLOv2: Dynamic, Faster and Stronger [Notes] [single-stage instance seg, Chunhua Shen]
- CondInst: Conditional Convolutions for Instance Segmentation [Notes] ECCV 2020 oral [single-stage instance seg, Chunhua Shen]
- CenterMask: Single Shot Instance Segmentation With Point Representation [Notes] CVPR 2020
- VPGNet: Vanishing Point Guided Network for Lane and Road Marking Detection and Recognition [Notes] ICCV 2017
- Which Tasks Should Be Learned Together in Multi-task Learning? [Notes] [Stanford, MTL] ICML 2020
- MGDA: Multi-Task Learning as Multi-Objective Optimization NeurIPS 2018
- Taskonomy: Disentangling Task Transfer Learning [Notes] CVPR 2018
- Rethinking ImageNet Pre-training [Notes] ICCV 2019 [Kaiming He]
- UnsuperPoint: End-to-end Unsupervised Interest Point Detector and Descriptor [Notes] [superpoint]
- KP2D: Neural Outlier Rejection for Self-Supervised Keypoint Learning [Notes] ICLR 2020 (pointNet)
- KP3D: Self-Supervised 3D Keypoint Learning for Ego-motion Estimation [Notes] CoRL 2020 [Toyota, superpoint]
- NG-RANSAC: Neural-Guided RANSAC: Learning Where to Sample Model Hypotheses [Notes] ICCV 2019 [pointNet]
- Learning to Find Good Correspondences [Notes] CVPR 2018 Oral (pointNet)
- RefinedMPL: Refined Monocular PseudoLiDAR for 3D Object Detection in Autonomous Driving [Notes] [Huawei, Mono3D]
- DSP: Monocular 3D Object Detection with Decoupled Structured Polygon Estimation and Height-Guided Depth Estimation [Notes] AAAI 2020 (SenseTime, Mono3D)
- Robust Lane Detection from Continuous Driving Scenes Using Deep Neural Networks (LLD, LSTM)
- LaneNet: Towards End-to-End Lane Detection: an Instance Segmentation Approach [Notes] IV 2018 (LaneNet)
- 3D-LaneNet: End-to-End 3D Multiple Lane Detection [Notes] ICCV 2019
- Semi-Local 3D Lane Detection and Uncertainty Estimation [Notes] [GM Israel, 3D LLD]
- Gen-LaneNet: A Generalized and Scalable Approach for 3D Lane Detection [Notes] ECCV 2020 [Apollo, 3D LLD]
- Long-Term On-Board Prediction of People in Traffic Scenes under Uncertainty CVPR 2018 [Egocentric prediction]
- It’s Not All About Size: On the Role of Data Properties in Pedestrian Detection ECCV 2018 [pedestrian]
- Associative Embedding: End-to-End Learning for Joint Detection and Grouping [Notes] NIPS 2017
- Pixels to Graphs by Associative Embedding [Notes] NIPS 2017
- Social LSTM: Human Trajectory Prediction in Crowded Spaces [Notes] CVPR 2017
- Online Video Object Detection using Association LSTM [Notes] [single stage, recurrent]
- SuperPoint: Self-Supervised Interest Point Detection and Description [Notes] CVPR 2018 (channel-to-pixel, deep SLAM, Magic Leap)
- PointRend: Image Segmentation as Rendering [Notes] CVPR 2020 Oral [Kaiming He, FAIR]
- Multigrid: A Multigrid Method for Efficiently Training Video Models [Notes] CVPR 2020 Oral [Kaiming He, FAIR]
- GhostNet: More Features from Cheap Operations [Notes] CVPR 2020
- FixRes: Fixing the train-test resolution discrepancy [Notes] NIPS 2019 [FAIR]
- MoVi-3D: Towards Generalization Across Depth for Monocular 3D Object Detection [Notes] ECCV 2020 [Virtual Cam, viewport, Mapillary/Facebook, Mono3D]
- Amodal Completion and Size Constancy in Natural Scenes [Notes] ICCV 2015 (Amodal completion)
- MoCo: Momentum Contrast for Unsupervised Visual Representation Learning [Notes] CVPR 2020 Oral [FAIR, Kaiming He]
- Double Descent: Reconciling modern machine learning practice and the bias-variance trade-off [Notes] PNAS 2019
- Deep Double Descent: Where Bigger Models and More Data Hurt [Notes]
- Visualizing the Loss Landscape of Neural Nets NIPS 2018
- The ApolloScape Open Dataset for Autonomous Driving and its Application CVPR 2018 (dataset)
- ApolloCar3D: A Large 3D Car Instance Understanding Benchmark for Autonomous Driving [Notes] CVPR 2019
- Part-level Car Parsing and Reconstruction from a Single Street View [Notes] [Baidu]
- 6D-VNet: End-to-end 6DoF Vehicle Pose Estimation from Monocular RGB Images [Notes] CVPR 2019
- RTM3D: Real-time Monocular 3D Detection from Object Keypoints for Autonomous Driving [Notes] ECCV 2020 spotlight
- DORN: Deep Ordinal Regression Network for Monocular Depth Estimation [Notes] CVPR 2018 [monodepth, supervised]
- D&T: Detect to Track and Track to Detect [Notes] ICCV 2017 (from Feichtenhofer)
- CRF-Net: A Deep Learning-based Radar and Camera Sensor Fusion Architecture for Object Detection [Notes] SDF 2019 (radar detection)
- RVNet: Deep Sensor Fusion of Monocular Camera and Radar for Image-based Obstacle Detection in Challenging Environments [Notes] PSIVT 2019
- RRPN: Radar Region Proposal Network for Object Detection in Autonomous Vehicles [Notes] ICIP 2019
- ROLO: Spatially Supervised Recurrent Convolutional Neural Networks for Visual Object Tracking [Notes] ISCAS 2016
- Recurrent SSD: Recurrent Multi-frame Single Shot Detector for Video Object Detection [Notes] BMVC 2018 (Mitsubishi)
- Recurrent RetinaNet: A Video Object Detection Model Based on Focal Loss [Notes] ICONIP 2018 (single stage, recurrent)
- Actions as Moving Points [Notes] [not suitable for online]
- The PREVENTION dataset: a novel benchmark for PREdiction of VEhicles iNTentIONs [Notes] ITSC 2019 [dataset, cut-in]
- Semi-Automatic High-Accuracy Labelling Tool for Multi-Modal Long-Range Sensor Dataset [Notes] IV 2018
- Astyx dataset: Automotive Radar Dataset for Deep Learning Based 3D Object Detection [Notes] EuRAD 2019 (Astyx)
- Astyx camera radar: Deep Learning Based 3D Object Detection for Automotive Radar and Camera [Notes] EuRAD 2019 (Astyx)
- How Do Neural Networks See Depth in Single Images? [Notes] ICCV 2019
- Self-supervised Sparse-to-Dense: Self-supervised Depth Completion from LiDAR and Monocular Camera ICRA 2019 (depth completion)
- DC: Depth Coefficients for Depth Completion [Notes] CVPR 2019 [Xiaoming Liu, Multimodal]
- Parse Geometry from a Line: Monocular Depth Estimation with Partial Laser Observation [Notes] ICRA 2017
- VO-Monodepth: Enhancing self-supervised monocular depth estimation with traditional visual odometry [Notes] 3DV 2019 (sparse to dense)
- Probabilistic Object Detection: Definition and Evaluation [Notes]
- The Fishyscapes Benchmark: Measuring Blind Spots in Semantic Segmentation [Notes] ICCV 2019
- On Calibration of Modern Neural Networks [Notes] ICML 2017 (Weinberger)
- Extreme clicking for efficient object annotation [Notes] ICCV 2017
- Radar and Camera Early Fusion for Vehicle Detection in Advanced Driver Assistance Systems [Notes] NeurIPS 2019 (radar)
- Deep Active Learning for Efficient Training of a LiDAR 3D Object Detector [Notes] IV 2019
- C3DPO: Canonical 3D Pose Networks for Non-Rigid Structure From Motion [Notes] ICCV 2019
- YOLACT: Real-time Instance Segmentation [Notes] ICCV 2019 [single-stage instance seg]
- YOLACT++: Better Real-time Instance Segmentation [single-stage instance seg]
- Review of Image and Feature Descriptors
- Vehicle Detection With Automotive Radar Using Deep Learning on Range-Azimuth-Doppler Tensors [Notes] ICCV 2019
- GPP: Ground Plane Polling for 6DoF Pose Estimation of Objects on the Road [Notes] IV 2020 [UCSD, Trivedi, mono 3DOD]
- MVRA: Multi-View Reprojection Architecture for Orientation Estimation [Notes] ICCV 2019
- YOLOv3: An Incremental Improvement
- Gaussian YOLOv3: An Accurate and Fast Object Detector Using Localization Uncertainty for Autonomous Driving [Notes] ICCV 2019 (Detection with Uncertainty)
- Bayesian YOLOv3: Uncertainty Estimation in One-Stage Object Detection [Notes] [DriveU]
- Towards Safe Autonomous Driving: Capture Uncertainty in the Deep Neural Network For Lidar 3D Vehicle Detection [Notes] ITSC 2018 (DriveU)
- Leveraging Heteroscedastic Aleatoric Uncertainties for Robust Real-Time LiDAR 3D Object Detection [Notes] IV 2019 (DriveU)
- Can We Trust You? On Calibration of a Probabilistic Object Detector for Autonomous Driving [Notes] IROS 2019 (DriveU)
- LaserNet: An Efficient Probabilistic 3D Object Detector for Autonomous Driving [Notes] CVPR 2019 (uncertainty)
- LaserNet KL: Learning an Uncertainty-Aware Object Detector for Autonomous Driving [Notes] [LaserNet with KL divergence]
- IoUNet: Acquisition of Localization Confidence for Accurate Object Detection [Notes] ECCV 2018
- gIoU: Generalized Intersection over Union: A Metric and A Loss for Bounding Box Regression [Notes] CVPR 2019
- The Lovász-Softmax loss: A tractable surrogate for the optimization of the intersection-over-union measure in neural networks CVPR 2018 [IoU as loss]
- KL Loss: Bounding Box Regression with Uncertainty for Accurate Object Detection [Notes] CVPR 2019
- CAM-Convs: Camera-Aware Multi-Scale Convolutions for Single-View Depth [Notes] CVPR 2019
- BayesOD: A Bayesian Approach for Uncertainty Estimation in Deep Object Detectors [Notes]
- TW-SMNet: Deep Multitask Learning of Tele-Wide Stereo Matching [Notes] ICIP 2019
- Accurate Uncertainties for Deep Learning Using Calibrated Regression [Notes] ICML 2018
- Calibrating Uncertainties in Object Localization Task [Notes] NIPS 2018
- SMWA: On the Over-Smoothing Problem of CNN Based Disparity Estimation [Notes] ICCV 2019 [Multimodal, depth estimation]
- Sparse-to-Dense: Depth Prediction from Sparse Depth Samples and a Single Image [Notes] ICRA 2018 (depth completion)
- Review of monocular object detection
- Review of 2D 3D constraints in Mono 3DOD
- MonoGRNet 2: Monocular 3D Object Detection via Geometric Reasoning on Keypoints [Notes] [estimates depth from keypoints]
- Deep MANTA: A Coarse-to-fine Many-Task Network for joint 2D and 3D vehicle analysis from monocular image [Notes] CVPR 2017
- SS3D: Monocular 3D Object Detection and Box Fitting Trained End-to-End Using Intersection-over-Union Loss [Notes] [regress distance from images, centernet like]
- GS3D: An Efficient 3D Object Detection Framework for Autonomous Driving [Notes] CVPR 2019
- M3D-RPN: Monocular 3D Region Proposal Network for Object Detection [Notes] ICCV 2019 oral [3D anchors, cyclists, Xiaoming Liu]
- TLNet: Triangulation Learning Network: from Monocular to Stereo 3D Object Detection [Notes] CVPR 2019
- A Survey on 3D Object Detection Methods for Autonomous Driving Applications [Notes] TITS 2019 [Review]
- BEV-IPM: Deep Learning based Vehicle Position and Orientation Estimation via Inverse Perspective Mapping Image [Notes] IV 2019
- ForeSeE: Task-Aware Monocular Depth Estimation for 3D Object Detection [Notes] AAAI 2020 oral [successor to pseudo-lidar, mono 3DOD SOTA]
- Obj-dist: Learning Object-specific Distance from a Monocular Image [Notes] ICCV 2019 (xmotors.ai + NYU) [monocular distance]
- DisNet: A novel method for distance estimation from monocular camera [Notes] IROS 2018 [monocular distance]
- BirdGAN: Learning 2D to 3D Lifting for Object Detection in 3D for Autonomous Vehicles [Notes] IROS 2019
- Shift R-CNN: Deep Monocular 3D Object Detection with Closed-Form Geometric Constraints [Notes] ICIP 2019
- 3D-RCNN: Instance-level 3D Object Reconstruction via Render-and-Compare [Notes] CVPR 2018
- Deep Optics for Monocular Depth Estimation and 3D Object Detection [Notes] ICCV 2019
- MonoLoco: Monocular 3D Pedestrian Localization and Uncertainty Estimation [Notes] ICCV 2019
- Joint Monocular 3D Vehicle Detection and Tracking [Notes] ICCV 2019 (Berkeley DeepDrive)
- CasGeo: 3D Bounding Box Estimation for Autonomous Vehicles by Cascaded Geometric Constraints and Depurated 2D Detections Using 3D Results [Notes]
- Slimmable Neural Networks [Notes] ICLR 2019
- Universally Slimmable Networks and Improved Training Techniques [Notes] ICCV 2019
- AutoSlim: Towards One-Shot Architecture Search for Channel Numbers
- Once for All: Train One Network and Specialize it for Efficient Deployment
- DOTA: A Large-scale Dataset for Object Detection in Aerial Images [Notes] CVPR 2018 (rotated bbox)
- RoiTransformer: Learning RoI Transformer for Oriented Object Detection in Aerial Images [Notes] CVPR 2019 (rotated bbox)
- RRPN: Arbitrary-Oriented Scene Text Detection via Rotation Proposals TMM 2018
- R2CNN: Rotational Region CNN for Orientation Robust Scene Text Detection (rotated bbox)
- TI white paper: Webinar: mmWave Radar for Automotive and Industrial applications [Notes] [TI, radar]
- Federated Learning: Strategies for Improving Communication Efficiency [Notes] NIPS 2016
- sort: Simple Online and Realtime Tracking [Notes] ICIP 2016
- deep-sort: Simple Online and Realtime Tracking with a Deep Association Metric [Notes]
- MT-CNN: Joint Face Detection and Alignment using Multi-task Cascaded Convolutional Networks [Notes] SPL 2016 (real time, facial landmark)
- RetinaFace: Single-stage Dense Face Localisation in the Wild [Notes] CVPR 2020 [joint object and landmark detection]
- SC-SfM-Learner: Unsupervised Scale-consistent Depth and Ego-motion Learning from Monocular Video [Notes] NIPS 2019
- SiamMask: Fast Online Object Tracking and Segmentation: A Unifying Approach CVPR 2019 (tracking, segmentation, label propagation)
- Review of Kálmán Filter (from Tim Babb, Pixar Animation) [Notes]
- R-FCN: Object Detection via Region-based Fully Convolutional Networks [Notes] NIPS 2016
- Guided backprop: Striving for Simplicity: The All Convolutional Net [Notes] ICLR 2015
- Occlusion-Net: 2D/3D Occluded Keypoint Localization Using Graph Networks [Notes] CVPR 2019
- Boxy Vehicle Detection in Large Images [Notes] ICCV 2019
- FQNet: Deep Fitting Degree Scoring Network for Monocular 3D Object Detection [Notes] CVPR 2019 [Mono 3DOD, Jiwen Lu]
- Mono3D: Monocular 3D Object Detection for Autonomous Driving [Notes] CVPR 2016
- MonoDIS: Disentangling Monocular 3D Object Detection [Notes] ICCV 2019
- Pseudo lidar-e2e: Monocular 3D Object Detection with Pseudo-LiDAR Point Cloud [Notes] ICCV 2019 (pseudo-lidar with 2d and 3d consistency loss, better than PL and worse than PL++, SOTA for pure mono3D)
- MonoGRNet: A Geometric Reasoning Network for Monocular 3D Object Localization [Notes] AAAI 2019 (SOTA of Mono3DOD, MLF < MonoGRNet < Pseudo-lidar)
- MLF: Multi-Level Fusion based 3D Object Detection from Monocular Images [Notes] CVPR 2018 (precursor to pseudo-lidar)
- ROI-10D: Monocular Lifting of 2D Detection to 6D Pose and Metric Shape [Notes] CVPR 2019
- AM3D: Accurate Monocular 3D Object Detection via Color-Embedded 3D Reconstruction for Autonomous Driving [Notes] ICCV 2019 [similar to pseudo-lidar, color-enhanced]
- Mono3D++: Monocular 3D Vehicle Detection with Two-Scale 3D Hypotheses and Task Priors [Notes] (from Stefano Soatto) AAAI 2019
- Deep Metadata Fusion for Traffic Light to Lane Assignment [Notes] IEEE RA-L 2019 (traffic lights association)
- Automatic Traffic Light to Ego Vehicle Lane Association at Complex Intersections ITSC 2019 (traffic lights association)
- Distant Vehicle Detection Using Radar and Vision [Notes] ICRA 2019 [radar, vision, radar tracklets fusion]
- Distance Estimation of Monocular Based on Vehicle Pose Information [Notes]
- Multi-Task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics [Notes] CVPR 2018 (Alex Kendall)
- GradNorm: Gradient Normalization for Adaptive Loss Balancing in Deep Multitask Networks [Notes] ICML 2018 (multitask)
- DTP: Dynamic Task Prioritization for Multitask Learning [Notes] ECCV 2018 [multitask, Stanford]
- Will this car change the lane? - Turn signal recognition in the frequency domain [Notes] IV 2014
- Complex-YOLO: Real-time 3D Object Detection on Point Clouds [Notes] (BEV detection only)
- Complexer-YOLO: Real-Time 3D Object Detection and Tracking on Semantic Point Clouds CVPR 2019 (sensor fusion and tracking)
- An intriguing failing of convolutional neural networks and the CoordConv solution [Notes] NIPS 2018
- Deep Parametric Continuous Convolutional Neural Networks [Notes] CVPR 2018 (@Uber, sensor fusion)
- ContFuse: Deep Continuous Fusion for Multi-Sensor 3D Object Detection [Notes] ECCV 2018 [Uber ATG, sensor fusion, BEV]
- Fast and Furious: Real Time End-to-End 3D Detection, Tracking and Motion Forecasting with a Single Convolutional Net [Notes] CVPR 2018 oral [lidar only, perception and prediction]
- LearnK: Depth from Videos in the Wild: Unsupervised Monocular Depth Learning from Unknown Cameras [Notes] ICCV 2019 [monocular depth estimation, intrinsic estimation, SOTA]
- monodepth: Unsupervised Monocular Depth Estimation with Left-Right Consistency [Notes] CVPR 2017 oral (monocular depth estimation, stereo for training)
- Struct2depth: Depth Prediction Without the Sensors: Leveraging Structure for Unsupervised Learning from Monocular Videos [Notes] AAAI 2019 [monocular depth estimation, estimating movement of dynamic object, infinite depth problem, online finetune]
- Unsupervised Learning of Geometry with Edge-aware Depth-Normal Consistency [Notes] AAAI 2018 (monocular depth estimation, static assumption, surface normal)
- LEGO: Learning Edge with Geometry all at Once by Watching Videos [Notes] CVPR 2018 spotlight (monocular depth estimation, static assumption, surface normal)
- Object Detection and 3D Estimation via an FMCW Radar Using a Fully Convolutional Network [Notes] (radar, RD map, OD, Arxiv 201902)
- A study on Radar Target Detection Based on Deep Neural Networks [Notes] (radar, RD map, OD)
- 2D Car Detection in Radar Data with PointNets [Notes] (from Ulm Univ, radar, point cloud, OD, Arxiv 201904)
- Learning Confidence for Out-of-Distribution Detection in Neural Networks [Notes] (budget to cheat)
- A Deep Learning Approach to Traffic Lights: Detection, Tracking, and Classification [Notes] ICRA 2017 (Bosch, traffic lights)
- How hard can it be? Estimating the difficulty of visual search in an image [Notes] CVPR 2016
- Deep Multi-modal Object Detection and Semantic Segmentation for Autonomous Driving: Datasets, Methods, and Challenges [Notes] (review from Bosch)
- Review of monocular 3d object detection (blog from 知乎)
- Deep3dBox: 3D Bounding Box Estimation Using Deep Learning and Geometry [Notes] CVPR 2017 [Zoox]
- MonoPSR: Monocular 3D Object Detection Leveraging Accurate Proposals and Shape Reconstruction [Notes] CVPR 2019
- OFT: Orthographic Feature Transform for Monocular 3D Object Detection [Notes] BMVC 2019 [Convert camera to BEV, Alex Kendall]
- MixMatch: A Holistic Approach to Semi-Supervised Learning [Notes]
- EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks [Notes] ICML 2019
- What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision? [Notes] NIPS 2017
- Bayesian SegNet: Model Uncertainty in Deep Convolutional Encoder-Decoder Architectures for Scene Understanding [Notes] BMVC 2017
- TrafficPredict: Trajectory Prediction for Heterogeneous Traffic-Agents [Notes] AAAI 2019 oral
- Deep Depth Completion of a Single RGB-D Image [Notes] CVPR 2018 (indoor)
- DeepLiDAR: Deep Surface Normal Guided Depth Prediction for Outdoor Scene from Sparse LiDAR Data and Single Color Image [Notes] CVPR 2019 (outdoor)
- SfMLearner: Unsupervised Learning of Depth and Ego-Motion from Video [Notes] CVPR 2017
- Monodepth2: Digging Into Self-Supervised Monocular Depth Estimation [Notes] ICCV 2019 [Niantic]
- DeepSignals: Predicting Intent of Drivers Through Visual Signals [Notes] ICRA 2019 (@Uber, turn signal detection)
- FCOS: Fully Convolutional One-Stage Object Detection [Notes] ICCV 2019 [Chunhua Shen]
- Pseudo-LiDAR++: Accurate Depth for 3D Object Detection in Autonomous Driving [Notes] ICLR 2020
- MMF: Multi-Task Multi-Sensor Fusion for 3D Object Detection [Notes] CVPR 2019 (@Uber, sensor fusion)
- CenterNet: Objects as points (from ExtremeNet authors) [Notes]
- CenterNet: Object Detection with Keypoint Triplets [Notes]
- Object Detection based on Region Decomposition and Assembly [Notes] AAAI 2019
- The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks [Notes] ICLR 2019
- M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network [Notes] AAAI 2019
- Deep Radar Detector [Notes] RadarCon 2019
- Semantic Segmentation on Radar Point Clouds [Notes] (from Daimler AG) FUSION 2018
- Pruning Filters for Efficient ConvNets [Notes] ICLR 2017
- Layer-compensated Pruning for Resource-constrained Convolutional Neural Networks [Notes] NIPS 2018 talk
- LeGR: Filter Pruning via Learned Global Ranking [Notes] CVPR 2020 oral
- NAS-FPN: Learning Scalable Feature Pyramid Architecture for Object Detection [Notes] CVPR 2019
- AutoAugment: Learning Augmentation Policies from Data [Notes] CVPR 2019
- Path Aggregation Network for Instance Segmentation [Notes] CVPR 2018
- Channel Pruning for Accelerating Very Deep Neural Networks ICCV 2017 (Face++, Yihui He) [Notes]
- AMC: AutoML for Model Compression and Acceleration on Mobile Devices ECCV 2018 (Song Han, Yihui He)
- MobileNetV3: Searching for MobileNetV3 [Notes] ICCV 2019
- MnasNet: Platform-Aware Neural Architecture Search for Mobile [Notes] CVPR 2019
- Rethinking the Value of Network Pruning ICLR 2019
- MobileNetV2: Inverted Residuals and Linear Bottlenecks (MobileNets v2) [Notes] CVPR 2018
- A New Performance Measure and Evaluation Benchmark for Road Detection Algorithms [Notes] ITSC 2013
- MultiNet: Real-time Joint Semantic Reasoning for Autonomous Driving [Notes]
- Optimizing the Trade-off between Single-Stage and Two-Stage Object Detectors using Image Difficulty Prediction (Very nice illustration of 1 and 2 stage object detection)
- Light-Head R-CNN: In Defense of Two-Stage Object Detector [Notes] (from Megvii)
- CSP: High-level Semantic Feature Detection: A New Perspective for Pedestrian Detection [Notes] CVPR 2019 [center and scale prediction, anchor-free, near SOTA pedestrian]
- Review of Anchor-free methods (知乎 blog): Object Detection: The Anchor-Free Era; Anchor-free Deep Learning Object Detection Methods; My Slides on CSP
- DenseBox: Unifying Landmark Localization with End to End Object Detection
- CornerNet: Detecting Objects as Paired Keypoints [Notes] ECCV 2018
- ExtremeNet: Bottom-up Object Detection by Grouping Extreme and Center Points [Notes] CVPR 2019
- FSAF: Feature Selective Anchor-Free Module for Single-Shot Object Detection [Notes] CVPR 2019
- FoveaBox: Beyond Anchor-based Object Detector (anchor-free) [Notes]
- Bag of Freebies for Training Object Detection Neural Networks [Notes]
- mixup: Beyond Empirical Risk Minimization [Notes] ICLR 2018
- Multi-view Convolutional Neural Networks for 3D Shape Recognition (MVCNN) [Notes] ICCV 2015
- 3D ShapeNets: A Deep Representation for Volumetric Shapes [Notes] CVPR 2015
- Volumetric and Multi-View CNNs for Object Classification on 3D Data [Notes] CVPR 2016
- Group Normalization [Notes] ECCV 2018
- Spatial Transformer Networks [Notes] NIPS 2015
- Frustum PointNets for 3D Object Detection from RGB-D Data (F-PointNet) [Notes] CVPR 2018
- Dynamic Graph CNN for Learning on Point Clouds [Notes]
- PointRCNN: 3D Object Proposal Generation and Detection from Point Cloud (SOTA for 3D object detection) [Notes] CVPR 2019
- MV3D: Multi-View 3D Object Detection Network for Autonomous Driving [Notes] CVPR 2017 (Baidu, sensor fusion, BV proposal)
- AVOD: Joint 3D Proposal Generation and Object Detection from View Aggregation [Notes] IROS 2018 (sensor fusion, multiview proposal)
- MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications [Notes]
- Pseudo-LiDAR from Visual Depth Estimation: Bridging the Gap in 3D Object Detection for Autonomous Driving [Notes] CVPR 2019
- VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection CVPR 2018 (Apple, first end-to-end point cloud encoding to grid)
- SECOND: Sparsely Embedded Convolutional Detection Sensors 2018 (builds on VoxelNet)
- PointPillars: Fast Encoders for Object Detection from Point Clouds [Notes] CVPR 2019 (builds on SECOND)
- Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite [Notes] CVPR 2012
- Vision meets Robotics: The KITTI Dataset [Notes] IJRR 2013
- Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset (I3D) [Notes] Video CVPR 2017
- Initialization Strategies of Spatio-Temporal Convolutional Neural Networks [Notes] Video
- Detect-and-Track: Efficient Pose Estimation in Videos [Notes] ICCV 2017 Video
- Deep Learning Based Rib Centerline Extraction and Labeling [Notes] MI MICCAI 2018
- SlowFast Networks for Video Recognition [Notes] ICCV 2019 Oral
- Aggregated Residual Transformations for Deep Neural Networks (ResNeXt) [Notes] CVPR 2017
- Beyond the pixel plane: sensing and learning in 3D (blog, Chinese version)
- VoxNet: A 3D Convolutional Neural Network for Real-Time Object Recognition (VoxNet) [Notes]
- PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation CVPR 2017 [Notes]
- PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space NIPS 2017 [Notes]
- Review of Geometric deep learning: Frontiers of Geometric Deep Learning (from 知乎) (Up to CVPR 2018)
- DQN: Human-level control through deep reinforcement learning (Nature DQN paper) [Notes] DRL
- Retina U-Net: Embarrassingly Simple Exploitation of Segmentation Supervision for Medical Object Detection [Notes] MI
- Panoptic Segmentation [Notes] PanSeg
- Panoptic Feature Pyramid Networks [Notes] PanSeg
- Attention-guided Unified Network for Panoptic Segmentation [Notes] PanSeg
- Bag of Tricks for Image Classification with Convolutional Neural Networks [Notes] CLS
- Deep Reinforcement Learning for Vessel Centerline Tracing in Multi-modality 3D Volumes [Notes] DRL MI
- Deep Reinforcement Learning for Flappy Bird [Notes] DRL
- Long-Term Feature Banks for Detailed Video Understanding [Notes] Video
- Non-local Neural Networks [Notes] Video CVPR 2018
- Mask R-CNN
- Cascade R-CNN: Delving into High Quality Object Detection
- Focal Loss for Dense Object Detection (RetinaNet) [Notes]
- Squeeze-and-Excitation Networks (SENet)
- Progressive Growing of GANs for Improved Quality, Stability, and Variation
- Deformable Convolutional Networks ICCV 2017 [builds on R-FCN]
- Learning Region Features for Object Detection
- Learning notes on Deep Learning
- List of Papers on Machine Learning
- Notes of Literature Review on CNN in CV (these are the notes for all the papers in the recommended list here)
- Notes of Literature Review (Others)
- Notes on how to set up DL/ML environment
- Useful setup notes
Here is the list of papers waiting to be read.
- SqueezeDet: Unified, Small, Low Power Fully Convolutional Neural Networks for Real-Time Object Detection for Autonomous Driving
- Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
- ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness ICLR 2019
- Approximating CNNs with Bag-of-local-Features models works surprisingly well on ImageNet (BagNet) blog ICLR 2019
- A disciplined approach to neural network hyper-parameters: Part 1 -- learning rate, batch size, momentum, and weight decay
- Gradient Reversal: Unsupervised Domain Adaptation by Backpropagation ICML 2015
- Rethinking Pre-training and Self-training NeurIPS 2020 [Quoc Le]
- Mask Scoring R-CNN CVPR 2019
- Training Region-based Object Detectors with Online Hard Example Mining
- Gliding vertex on the horizontal bounding box for multi-oriented object detection
- ONCE: Incremental Few-Shot Object Detection CVPR 2020
- Domain Adaptive Faster R-CNN for Object Detection in the Wild CVPR 2018
- Foggy Cityscapes: Semantic Foggy Scene Understanding with Synthetic Data IJCV 2018
- Foggy Cityscapes ECCV: Model Adaptation with Synthetic and Real Data for Semantic Dense Foggy Scene Understanding ECCV 2018
- Dropout Sampling for Robust Object Detection in Open-Set Conditions ICRA 2018 (Niko Sünderhauf)
- Hybrid Task Cascade for Instance Segmentation CVPR 2019 (cascaded mask RCNN)
- Evaluating Merging Strategies for Sampling-based Uncertainty Techniques in Object Detection ICRA 2019 (Niko Sünderhauf)
- A Unified Panoptic Segmentation Network CVPR 2019 PanSeg
- Model Vulnerability to Distributional Shifts over Image Transformation Sets (CVPR workshop) tl;dr
- Automatic adaptation of object detectors to new domains using self-training CVPR 2019 (find corner case and boost)
- Missing Labels in Object Detection CVPR 2019
- Circular Object Detection in Polar Coordinates for 2D LIDAR Data CCPR 2016
- LFFD: A Light and Fast Face Detector for Edge Devices [Lightweight, face detection, car detection]
- UnitBox: An Advanced Object Detection Network ACM MM 2016 [Ln IoU loss, Thomas Huang]
- Learning Spatiotemporal Features with 3D Convolutional Networks (C3D) Video ICCV 2015
- AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions
- Spatiotemporal Residual Networks for Video Action Recognition (decouple spatiotemporal) NIPS 2016
- Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks (P3D, decouple spatiotemporal) ICCV 2017
- A Closer Look at Spatiotemporal Convolutions for Action Recognition (decouple spatiotemporal) CVPR 2018
- Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification (decouple spatiotemporal) ECCV 2018
- Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet? CVPR 2018
- AGSS-VOS: Attention Guided Single-Shot Video Object Segmentation ICCV 2019
- One-Shot Video Object Segmentation CVPR 2017
- Looking Fast and Slow: Memory-Guided Mobile Video Object Detection CVPR 2018
- Towards High Performance Video Object Detection [Notes] CVPR 2018
- Towards High Performance Video Object Detection for Mobiles [Notes]
- Temporally Distributed Networks for Fast Video Semantic Segmentation CVPR 2020 [efficient video segmentation]
- Memory Enhanced Global-Local Aggregation for Video Object Detection CVPR 2020 [efficient video object detection]
- Co-occurrence Feature Learning from Skeleton Data for Action Recognition and Detection with Hierarchical Aggregation IJCAI 2018 oral [video skeleton]
- RST-MODNet: Real-time Spatio-temporal Moving Object Detection for Autonomous Driving NeurIPS 2019 workshop
- Long-term Recurrent Convolutional Networks for Visual Recognition and Description CVPR 2015 oral
- Temporal Segment Networks: Towards Good Practices for Deep Action Recognition ECCV 2016
- TRN: Temporal Relational Reasoning in Videos ECCV 2018
- X3D: Expanding Architectures for Efficient Video Recognition CVPR 2020 oral [FAIR]
- Temporal-Context Enhanced Detection of Heavily Occluded Pedestrians CVPR 2020 oral [pedestrian, video]
- 3D human pose estimation in video with temporal convolutions and semi-supervised training CVPR 2019 [mono3D pose estimation from video]
- OmegaNet: Distilled Semantics for Comprehensive Scene Understanding from Videos CVPR 2020
- Object Detection in Videos with Tubelet Proposal Networks CVPR 2017 [video object detection]
- T-CNN: Tubelets with Convolutional Neural Networks for Object Detection from Videos [video object detection]
- Flow-Guided Feature Aggregation for Video Object Detection ICCV 2017 [Jifeng Dai]
- Efficient Deep Learning Inference based on Model Compression (Model Compression)
- Neural Network Distiller [Intel]
- Concurrent Spatial and Channel Squeeze & Excitation in Fully Convolutional Networks
- CBAM: Convolutional Block Attention Module
- Playing Atari with Deep Reinforcement Learning NIPS 2013
- Multi-Scale Deep Reinforcement Learning for Real-Time 3D-Landmark Detection in CT Scan
- An Artificial Agent for Robust Image Registration
- 3D-CNN: 3D Convolutional Neural Networks for Landing Zone Detection from LiDAR
- Generative and Discriminative Voxel Modeling with Convolutional Neural Networks
- Orientation-boosted Voxel Nets for 3D Object Recognition (ORION) BMVC 2017
- GIFT: A Real-time and Scalable 3D Shape Search Engine CVPR 2016
- 3D Shape Segmentation with Projective Convolutional Networks (ShapePFCN) CVPR 2017
- Learning Local Shape Descriptors from Part Correspondences With Multi-view Convolutional Networks
- Open3D: A Modern Library for 3D Data Processing
- Multimodal Deep Learning for Robust RGB-D Object Recognition IROS 2015
- FlowNet3D: Learning Scene Flow in 3D Point Clouds CVPR 2019
- Mining Point Cloud Local Structures by Kernel Correlation and Graph Pooling CVPR 2018 (Neighbors Do Help: Deeply Exploiting Local Structures of Point Clouds)
- PU-Net: Point Cloud Upsampling Network CVPR 2018
- Recurrent Slice Networks for 3D Segmentation of Point Clouds CVPR 2018
- SPLATNet: Sparse Lattice Networks for Point Cloud Processing CVPR 2018
- Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering NIPS 2016
- Semi-Supervised Classification with Graph Convolutional Networks ICLR 2017
- Geometric Matrix Completion with Recurrent Multi-Graph Neural Networks NIPS 2017
- Graph Attention Networks ICLR 2018
- 3D-SSD: Learning Hierarchical Features from RGB-D Images for Amodal 3D Object Detection (3D SSD)
- Escape from Cells: Deep Kd-Networks for the Recognition of 3D Point Cloud Models ICCV 2017
- Shape Completion using 3D-Encoder-Predictor CNNs and Shape Synthesis CVPR 2017
- IPOD: Intensive Point-based Object Detector for Point Cloud
- Amodal Detection of 3D Objects: Inferring 3D Bounding Boxes from 2D Ones in RGB-Depth Images CVPR 2017
- 2D-Driven 3D Object Detection in RGB-D Images
- Associate-3Ddet: Perceptual-to-Conceptual Association for 3D Point Cloud Object Detection [classify occluded object]
- PSMNet: Pyramid Stereo Matching Network CVPR 2018
- Stereo R-CNN based 3D Object Detection for Autonomous Driving CVPR 2019
- Deep Rigid Instance Scene Flow CVPR 2019
- Upgrading Optical Flow to 3D Scene Flow through Optical Expansion CVPR 2020
- Learning Multi-Object Tracking and Segmentation from Automatic Annotations CVPR 2020 [automatic MOTS annotation]
- Traffic-Sign Detection and Classification in the Wild CVPR 2016 [Tsinghua, Tencent, traffic signs]
- A Hierarchical Deep Architecture and Mini-Batch Selection Method For Joint Traffic Sign and Light Detection IEEE CRV 2018 [UToronto]
- Detecting Traffic Lights by Single Shot Detection ITSC 2018
- DeepTLR: A single Deep Convolutional Network for Detection and Classification of Traffic Lights IV 2016
- Evaluating State-of-the-art Object Detector on Challenging Traffic Light Data CVPR 2017 workshop
- Traffic light recognition in varying illumination using deep learning and saliency map ITSC 2014 [traffic light]
- Traffic light recognition using high-definition map features RAS 2019
- Vision for Looking at Traffic Lights: Issues, Survey, and Perspectives TITS 2015 (traffic light survey, UCSD LISA)
- The DriveU Traffic Light Dataset: Introduction and Comparison with Existing Datasets ICRA 2018
- The Oxford Radar RobotCar Dataset: A Radar Extension to the Oxford RobotCar Dataset
- Review of Graph Spectrum Theory (WIP)
- 3D Deep Learning Tutorial at CVPR 2017 [Notes] - (WIP)
- A Survey on Neural Architecture Search
- Network pruning tutorial (blog)
- GNN tutorial at CVPR 2019
- Large Scale Interactive Motion Forecasting for Autonomous Driving: The Waymo Open Motion Dataset [Waymo, prediction dataset]
- PANDA: A Gigapixel-level Human-centric Video Dataset CVPR 2020
- WoodScape: A multi-task, multi-camera fisheye dataset for autonomous driving ICCV 2019 [Valeo]
- Sparse and Dense Data with CNNs: Depth Completion and Semantic Segmentation 3DV 2018
- Depth Map Prediction from a Single Image using a Multi-Scale Deep Network NIPS 2014 (Eigen et al)
- Learning Depth from Monocular Videos using Direct Methods CVPR 2018 (monocular depth estimation)
- Virtual-Normal: Enforcing geometric constraints of virtual normal for depth prediction [Notes] ICCV 2019 (better generation of PL)
- Spatial Correspondence with Generative Adversarial Network: Learning Depth from Monocular Videos ICCV 2019
- Unsupervised Collaborative Learning of Keyframe Detection and Visual Odometry Towards Monocular Deep SLAM ICCV 2019
- Visualization of Convolutional Neural Networks for Monocular Depth Estimation ICCV 2019
- Fast and Accurate Recovery of Occluding Contours in Monocular Depth Estimation ICCV 2019 workshop [indoor]
- Multi-Loss Rebalancing Algorithm for Monocular Depth Estimation ECCV 2020 [indoor depth]
- Disambiguating Monocular Depth Estimation with a Single Transient ECCV 2020 [additional laser sensor, indoor depth]
- Guiding Monocular Depth Estimation Using Depth-Attention Volume ECCV 2020 [indoor depth]
- Improving Monocular Depth Estimation by Leveraging Structural Awareness and Complementary Datasets ECCV 2020 [indoor depth]
- CLIFFNet for Monocular Depth Estimation with Hierarchical Embedding Loss ECCV 2020 [indoor depth]
- PointSIFT: A SIFT-like Network Module for 3D Point Cloud Semantic Segmentation (pointnet alternative, backbone)
- Vehicle Detection from 3D Lidar Using Fully Convolutional Network (VeloFCN) RSS 2016
- KPConv: Flexible and Deformable Convolution for Point Clouds (from the authors of PointNet)
- PointCNN: Convolution On X-Transformed Points NIPS 2018
- L3-Net: Towards Learning based LiDAR Localization for Autonomous Driving CVPR 2019
- RoarNet: A Robust 3D Object Detection based on RegiOn Approximation Refinement (sensor fusion, 3D mono proposal, refined in point cloud)
- DeLS-3D: Deep Localization and Segmentation with a 3D Semantic Map CVPR 2018
- Frustum ConvNet: Sliding Frustums to Aggregate Local Point-Wise Features for Amodal 3D Object Detection IROS 2019
- PointRNN: Point Recurrent Neural Network for Moving Point Cloud Processing
- Gated2Depth: Real-time Dense Lidar from Gated Images ICCV 2019 oral
- A Multi-Sensor Fusion System for Moving Object Detection and Tracking in Urban Driving Environments ICRA 2014
- PointFusion: Deep Sensor Fusion for 3D Bounding Box Estimation CVPR 2018 [sensor fusion, Zoox]
- Deep Hough Voting for 3D Object Detection in Point Clouds ICCV 2019 [Charles Qi]
- StixelNet: A Deep Convolutional Network for Obstacle Detection and Road Segmentation
- Depth Sensing Beyond LiDAR Range CVPR 2020 [wide baseline stereo with trifocal]
- Probabilistic Semantic Mapping for Urban Autonomous Driving Applications IROS 2020 [lidar mapping]
- RandLA-Net: Efficient Semantic Segmentation of Large-Scale Point Clouds CVPR 2020 oral [lidar segmentation]
- PolarNet: An Improved Grid Representation for Online LiDAR Point Clouds Semantic Segmentation CVPR 2020 [lidar segmentation]
- OctSqueeze: Octree-Structured Entropy Model for LiDAR Compression CVPR 2020 oral [lidar compression]
- MuSCLE: Multi Sweep Compression of LiDAR using Deep Entropy Models NeurIPS 2020 oral [lidar compression]
- Long-Term On-Board Prediction of People in Traffic Scenes under Uncertainty CVPR 2018 [on-board bbox prediction]
- Unsupervised Traffic Accident Detection in First-Person Videos IROS 2019 (Honda)
- NEMO: Future Object Localization Using Noisy Ego Priors (Honda)
- Robust Aleatoric Modeling for Future Vehicle Localization (perspective)
- Multiple Object Forecasting: Predicting Future Object Locations in Diverse Environments WACV 2020 (perspective bbox, pedestrian)
- Using panoramic videos for multi-person localization and tracking in a 3D panoramic coordinate
- End-to-end Lane Detection through Differentiable Least-Squares Fitting ICCV 2019
- Line-CNN: End-to-End Traffic Line Detection With Line Proposal Unit TITS 2019 [object-like proposals]
- Detecting Lane and Road Markings at A Distance with Perspective Transformer Layers [3D LLD]
- Ultra Fast Structure-aware Deep Lane Detection ECCV 2020 [lane detection]
- A Novel Approach for Detecting Road Based on Two-Stream Fusion Fully Convolutional Network (convert camera to BEV)
- FastDraw: Addressing the Long Tail of Lane Detection by Adapting a Sequential Prediction Network
- RetinaTrack: Online Single Stage Joint Detection and Tracking CVPR 2020
- Computer Vision for Autonomous Vehicles: Problems, Datasets and State of the Art (latest update in Dec 2019)
- Simultaneous Identification and Tracking of Multiple People Using Video and IMUs CVPR 2019
- TrackNet: Simultaneous Object Detection and Tracking and Its Application in Traffic Video Analysis
- Video Action Transformer Network CVPR 2019 oral
- Online Real-time Multiple Spatiotemporal Action Localisation and Prediction ICCV 2017
- Multi-object tracking: a collection of recent papers and open-source code
- GNN3DMOT: Graph Neural Network for 3D Multi-Object Tracking with Multi-Feature Learning CVPR 2020 oral [3DMOT, CMU, Kris Kitani]
- Chained-Tracker: Chaining Paired Attentive Regression Results for End-to-End Joint Multiple-Object Detection and Tracking ECCV 2020 spotlight [MOT, Tencent]
- Towards Real-Time Multi-Object Tracking ECCV 2020 [MOT]
- Probabilistic 3D Multi-Object Tracking for Autonomous Driving [TRI]
- Probabilistic Face Embeddings ICCV 2019
- Data Uncertainty Learning in Face Recognition CVPR 2020
- Self-Supervised Learning of Interpretable Keypoints From Unlabelled Videos CVPR 2020 oral [VGG, self-supervised, interpretable, discriminator]
- Revisiting Small Batch Training for Deep Neural Networks
- ICML 2019 workshop: Adaptive and Multitask Learning: Algorithms & Systems
- Adaptive Scheduling for Multi-Task Learning NIPS 2018 (NMT)
- Polar Transformer Networks ICLR 2018
- Measuring Calibration in Deep Learning CVPR 2019
- Sampling-free Epistemic Uncertainty Estimation Using Approximated Variance Propagation ICCV 2019 (epistemic uncertainty)
- Making Convolutional Networks Shift-Invariant Again ICML 2019
- Using Self-Supervised Learning Can Improve Model Robustness and Uncertainty NeurIPS 2019
- Understanding deep learning requires rethinking generalization ICLR 2017 [ICLR best paper]
- A Baseline for Detecting Misclassified and Out-of-Distribution Examples in Neural Networks ICLR 2017 (NLL score as anomaly score)
- Unsupervised Feature Learning via Non-Parametric Instance-level Discrimination CVPR 2018 spotlight (Stella Yu)
- Theoretical insights into the optimization landscape of over-parameterized shallow neural networks TIP 2018
- The Power of Interpolation: Understanding the Effectiveness of SGD in Modern Over-parametrized Learning ICML 2018
- Designing Network Design Spaces CVPR 2020
- Moco2: Improved Baselines with Momentum Contrastive Learning
- SGD on Neural Networks Learns Functions of Increasing Complexity NIPS 2019 (SGD learns a linear classifier first)
- Pay attention to the activations: a modular attention mechanism for fine-grained image recognition
- A Mixed Classification-Regression Framework for 3D Pose Estimation from 2D Images BMVC 2018 (multi-bin, what's new?)
- In-Place Activated BatchNorm for Memory-Optimized Training of DNNs CVPR 2018 (optimized BatchNorm + ReLU)
- FCNN: Fourier Convolutional Neural Networks (FFT as CNN)
- Visualizing the Loss Landscape of Neural Nets NIPS 2018
- Xception: Deep Learning with Depthwise Separable Convolutions (Xception)
- Learning to Drive from Simulation without Real World Labels ICRA 2019 (domain adaptation, sim2real)
- Filter Response Normalization Layer: Eliminating Batch Dependence in the Training of Deep Neural Networks CVPR 2020 oral
- Switchable Whitening for Deep Representation Learning ICCV 2019 [domain adaptation]
- Visual Chirality CVPR 2020 oral [best paper nominee]
- Generalized ODIN: Detecting Out-of-Distribution Image Without Learning From Out-of-Distribution Data CVPR 2020
- Self-training with Noisy Student improves ImageNet classification CVPR 2020 [distillation]
- Keep it Simple: Image Statistics Matching for Domain Adaptation CVPRW 2020 [Domain adaptation for 2D mod bbox]
- Epipolar Transformers CVPR 2020 [Yihui He]
- Scalable Uncertainty for Computer Vision With Functional Variational Inference CVPR 2020 [epistemic uncertainty with one fwd pass]
- 3DOP: 3D Object Proposals for Accurate Object Class Detection NIPS 2015
- DirectShape: Photometric Alignment of Shape Priors for Visual Vehicle Pose and Shape Estimation
- Eliminating the Blind Spot: Adapting 3D Object Detection and Monocular Depth Estimation to 360° Panoramic Imagery ECCV 2018 (Monocular 3D object detection and depth estimation)
- Towards Scene Understanding: Unsupervised Monocular Depth Estimation with Semantic-aware Representation CVPR 2019 [unified conditional decoder]
- DDP: Dense Depth Posterior from Single Image and Sparse Range CVPR 2019
- Augmented Reality Meets Computer Vision: Efficient Data Generation for Urban Driving Scenes IJCV 2018 (data augmentation with AR, Toyota)
- Exploring the Capabilities and Limits of 3D Monocular Object Detection -- A Study on Simulation and Real World Data IITS
- Towards Scene Understanding with Detailed 3D Object Representations IJCV 2014 (keypoint, 3D bbox annotation)
- Deep Cuboid Detection: Beyond 2D Bounding Boxes (Magic Leap)
- Viewpoints and Keypoints (Malik)
- Lifting Object Detection Datasets into 3D (PASCAL)
- 3D Object Class Detection in the Wild (keypoint based)
- Fast Single Shot Detection and Pose Estimation 3DV 2016 (SSD + pose, Wei Liu)
- Virtual KITTI 2
- Deep Supervision with Shape Concepts for Occlusion-Aware 3D Object Parsing CVPR 2017
- Render for CNN: Viewpoint Estimation in Images Using CNNs Trained with Rendered 3D Model Views ICCV 2015 Oral
- Real-Time Seamless Single Shot 6D Object Pose Prediction CVPR 2018
- Practical Deep Stereo (PDS): Toward applications-friendly deep stereo matching NIPS 2018 [disparity estimation]
- Self-supervised Sparse-to-Dense: Self-supervised Depth Completion from LiDAR and Monocular Camera ICRA 2019
- Learning Depth with Convolutional Spatial Propagation Network (Baidu, depth from SPN) ECCV 2018
- Just Go with the Flow: Self-Supervised Scene Flow Estimation CVPR 2020 oral [Scene flow, Lidar]
- Self-Supervised Deep Visual Odometry with Online Adaptation CVPR 2020 oral [DF-VO, TrianFlow, meta-learning]
- Self-supervised Monocular Trained Depth Estimation using Self-attention and Discrete Disparity Volume CVPR 2020
- Online Depth Learning against Forgetting in Monocular Videos CVPR 2020 [monodepth, online learning]
- SDC-Depth: Semantic Divide-and-Conquer Network for Monocular Depth Estimation CVPR 2020 [monodepth, semantic]
- Inferring Distributions Over Depth from a Single Image TRO [Depth confidence, stitching them together]
- Novel View Synthesis of Dynamic Scenes with Globally Coherent Depths CVPR 2020
- The Edge of Depth: Explicit Constraints between Segmentation and Depth CVPR 2020 [Xiaoming Liu, multimodal, depth bleeding]
- MV-RSS: Multi-View Radar Semantic Segmentation ICCV 2021
- Classification of Objects in Polarimetric Radar Images Using CNNs at 77 GHz (Radar, polar)
- CNNs for Interference Mitigation and Denoising in Automotive Radar Using Real-World Data NeurIPS 2019 (radar)
- Road Scene Understanding by Occupancy Grid Learning from Sparse Radar Clusters using Semantic Segmentation ICCV 2019 (radar)
- RadarNet: Exploiting Radar for Robust Perception of Dynamic Objects ECCV 2020 [Uber ATG]
- Depth Estimation from Monocular Images and Sparse Radar Data IROS 2020 [Camera + Radar for monodepth, nuscenes]
- RPR: Radar-Camera Sensor Fusion for Joint Object Detection and Distance Estimation in Autonomous Vehicles IROS 2020 [radar proposal refinement]
- Warping of Radar Data into Camera Image for Cross-Modal Supervision in Automotive Applications
- PoseNet: A Convolutional Network for Real-Time 6-DOF Camera Relocalization [Notes] ICCV 2015
- PoseNet2: Modelling Uncertainty in Deep Learning for Camera Relocalization ICRA 2016
- PoseNet3: Geometric Loss Functions for Camera Pose Regression with Deep Learning CVPR 2017
- EssNet: Convolutional neural network architecture for geometric matching CVPR 2017
- NC-EssNet: Neighbourhood Consensus Networks NeurIPS 2018
- Reinforced Feature Points: Optimizing Feature Detection and Description for a High-Level Task CVPR 2020 oral [Eric Brachmann, ngransac]
- Unsupervised Learning of Depth and Ego-Motion from Monocular Video Using 3D Geometric Constraints CVPR 2018
- DynSLAM: Robust Dense Mapping for Large-Scale Dynamic Environments [dynamic SLAM, Andreas Geiger] ICRA 2018
- GCNv2: Efficient Correspondence Prediction for Real-Time SLAM LRA 2019 [Superpoint + orb slam]
- Real-time Scalable Dense Surfel Mapping ICRA 2019 [dense reconstruction, monodepth]
- Dynamic SLAM: The Need For Speed
- GSLAM: A General SLAM Framework and Benchmark ICCV 2019
- Seeing Around Street Corners: Non-Line-of-Sight Detection and Tracking In-the-Wild Using Doppler Radar CVPR 2020 [Daimler]
- Radar+RGB Attentive Fusion for Robust Object Detection in Autonomous Vehicles ICIP 2020
- Spatial Attention Fusion for Obstacle Detection Using MmWave Radar and Vision Sensor Sensors 2020 [radar, camera, early fusion]
- A Survey on Deep Learning for Localization and Mapping: Towards the Age of Spatial Machine Intelligence
- Monocular Depth Estimation Based On Deep Learning: An Overview
- Uncertainty Guided Multi-Scale Residual Learning-using a Cycle Spinning CNN for Single Image De-Raining CVPR 2019
- Learn to Combine Modalities in Multimodal Deep Learning (sensor fusion, general DL)
- Safe Trajectory Generation For Complex Urban Environments Using Spatio-temporal Semantic Corridor LRA 2019 [Motion planning]
- DAgger: Driving Policy Transfer via Modularity and Abstraction CoRL 2018 [DAgger, Imitation Learning]
- Efficient Uncertainty-aware Decision-making for Automated Driving Using Guided Branching ICRA 2020 [Motion planning]
- Baidu Apollo EM Motion Planner
- Calibration of Heterogeneous Sensor Systems
- Intro: Sensor Fusion for ADAS (Data Fusion in Autonomous Driving) (from Zhihu) (Up to CVPR 2018)
- YUVMultiNet: Real-time YUV multi-task CNN for autonomous driving CVPR 2019 (Real Time, Low Power)
- Deep Fusion of Heterogeneous Sensor Modalities for the Advancements of ADAS to Autonomous Vehicles
- Temporal Coherence for Active Learning in Videos ICCVW 2019 [active learning, temporal coherence]
- R-TOD: Real-Time Object Detector with Minimized End-to-End Delay for Autonomous Driving RTSS 2020 [perception system design]
- Learning Lane Graph Representations for Motion Forecasting ECCV 2020 [Uber ATG]
- DSDNet: Deep Structured self-Driving Network ECCV 2020 [Uber ATG]
- Leveraging Pre-Trained 3D Object Detection Models For Fast Ground Truth Generation ITSC 2018 [UToronto, autolabeling]
- Learning Multi-Object Tracking and Segmentation From Automatic Annotations CVPR 2020 [Autolabeling]
- Canonical Surface Mapping via Geometric Cycle Consistency ICCV 2019
- TIDE: A General Toolbox for Identifying Object Detection Errors ECCV 2020 [tools]
- Self-Supervised Camera Self-Calibration from Video [TRI, intrinsic calibration, fisheye/pinhole]
- A Convolutional Neural Network for Modelling Sentences ACL 2014
- FastText: Bag of Tricks for Efficient Text Classification ACL 2017
- Siamese recurrent architectures for learning sentence similarity AAAI 2016
- Efficient Estimation of Word Representations in Vector Space ICLR 2013
- Neural Machine Translation by Jointly Learning to Align and Translate ICLR 2015
- Transformers: Attention Is All You Need NIPS 2017
- Collection of Articles on Ad Recommendation Systems
- UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction [Notes] (dimension reduction, better than t-SNE)
- Review Notes of Classical Key Points and Descriptors
- CRF
- Visual SLAM and Visual Odometry
- ORB SLAM
- Bundle Adjustment
- 3D vision
- SLAM/VIO Study Summary
- Design Patterns
- Capturing Omni-Range Context for Omnidirectional Segmentation CVPR 2021
- UP-DETR: Unsupervised Pre-training for Object Detection with Transformers CVPR 2021 [transformers]
- DCL: Dense Label Encoding for Boundary Discontinuity Free Rotation Detection CVPR 2021
- 4D Panoptic LiDAR Segmentation CVPR 2021 [TUM]
- CanonPose: Self-Supervised Monocular 3D Human Pose Estimation in the Wild CVPR 2021
- Fast and Accurate Model Scaling CVPR 2021 [FAIR]
- Cylinder3D: Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR Segmentation CVPR 2021 [lidar semantic segmentation]
- LiDAR R-CNN: An Efficient and Universal 3D Object Detector CVPR 2021 [TuSimple, Lidar]
- PREDATOR: Registration of 3D Point Clouds with Low Overlap CVPR 2021 oral
- DBB: Diverse Branch Block: Building a Convolution as an Inception-like Unit CVPR 2021 [RepVGG, ACNet, Xiaohan Ding, Megvii]
- GrooMeD-NMS: Grouped Mathematically Differentiable NMS for Monocular 3D Object Detection CVPR 2021 [mono3D]
- DDMP: Depth-conditioned Dynamic Message Propagation for Monocular 3D Object Detection CVPR 2021 [mono3D]
- M3DSSD: Monocular 3D Single Stage Object Detector CVPR 2021 [mono3D]
- MonoRUn: Monocular 3D Object Detection by Reconstruction and Uncertainty Propagation CVPR 2021 [mono3D]
- HVPR: Hybrid Voxel-Point Representation for Single-stage 3D Object Detection CVPR 2021 [Lidar]
- PLUME: Efficient 3D Object Detection from Stereo Images [Yan Wang, Uber ATG]
- V2F-Net: Explicit Decomposition of Occluded Pedestrian Detection [crowded, pedestrian, megvii]
- IP-basic: In Defense of Classical Image Processing: Fast Depth Completion on the CPU CRV 2018
- Revisiting Feature Alignment for One-stage Object Detection [cls+reg]
- Per-frame mAP Prediction for Continuous Performance Monitoring of Object Detection During Deployment WACV 2021 [SafetyNet]
- TSD: Revisiting the Sibling Head in Object Detector CVPR 2020 [sensetime, cls+reg]
- 1st Place Solutions for OpenImage2019 -- Object Detection and Instance Segmentation [sensetime, cls+reg, 1st place OpenImage2019]
- Enabling spatio-temporal aggregation in Birds-Eye-View Vehicle Estimation ICRA 2021
- End-to-end Lane Detection through Differentiable Least-Squares Fitting ICCV workshop 2019
- Revisiting ResNets: Improved Training and Scaling Strategies
- Multi-Modality Cut and Paste for 3D Object Detection
- LD: Localization Distillation for Object Detection
- PolyTransform: Deep Polygon Transformer for Instance Segmentation CVPR 2020 [single stage instance segmentation]
- ROAD: The ROad event Awareness Dataset for Autonomous Driving
- LidarMTL: A Simple and Efficient Multi-task Network for 3D Object Detection and Road Understanding [lidar MTL]
- High-Performance Large-Scale Image Recognition Without Normalization ICLR 2021
- Ground-aware Monocular 3D Object Detection for Autonomous Driving RA-L [mono3D]
- Demystifying Pseudo-LiDAR for Monocular 3D Object Detection [mono3D]
- Pseudo-labeling for Scalable 3D Object Detection [Waymo]
- LLA: Loss-aware Label Assignment for Dense Pedestrian Detection [Megvii]
- VectorNet: Encoding HD Maps and Agent Dynamics from Vectorized Representation CVPR 2020 [Waymo]
- CoverNet: Multimodal Behavior Prediction using Trajectory Sets CVPR 2020 [prediction, nuScenes]
- SplitNet: Divide and Co-training
- VoVNet: An Energy and GPU-Computation Efficient Backbone Network for Real-Time Object Detection CVPR 2019 workshop
- Isometric Neural Networks: Non-discriminative data or weak model? On the relative importance of data and model resolution ICCV 2019 workshop [spatial2channel]
- TResNet WACV 2021 [spatial2channel]
- Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression AAAI 2020 [DIOU, NMS]
- RegNet: Designing Network Design Spaces CVPR 2020 [FAIR]
- On Network Design Spaces for Visual Recognition [FAIR]
- Lane Endpoint Detection and Position Accuracy Evaluation for Sensor Fusion-Based Vehicle Localization on Highways Sensors 2018 [lane endpoints]
- Map-Matching-Based Cascade Landmark Detection and Vehicle Localization IEEE Access 2019 [lane endpoints]
- GCNet: End-to-End Learning of Geometry and Context for Deep Stereo Regression ICCV 2017 [disparity estimation, Alex Kendall, cost volume]
- Traffic Control Gesture Recognition for Autonomous Vehicles IROS 2020 [Daimler]
- Perceiving 3D Human-Object Spatial Arrangements from a Single Image in the Wild ECCV 2020
- OrcVIO: Object residual constrained Visual-Inertial Odometry [dynamic SLAM, very mathematical]
- InfoFocus: 3D Object Detection for Autonomous Driving with Dynamic Information Modeling ECCV 2020
- DA4AD: End-to-End Deep Attention-based Visual Localization for Autonomous Driving ECCV 2020
- Towards Lightweight Lane Detection by Optimizing Spatial Embedding ECCV 2020 workshop [LLD]
- Multi-Frame to Single-Frame: Knowledge Distillation for 3D Object Detection ECCV 2020 workshop [lidar]
- DeepIM: Deep iterative matching for 6d pose estimation ECCV 2018 [pose estimation]
- Monocular Depth Prediction through Continuous 3D Loss IROS 2020
- Multi-Task Learning for Dense Prediction Tasks: A Survey [MTL, Luc Van Gool]
- Dynamic Task Weighting Methods for Multi-task Networks in Autonomous Driving Systems ITSC 2020 oral [MTL]
- NeurAll: Towards a Unified Model for Visual Perception in Automated Driving ITSC 2019 oral [MTL]
- Deep Evidential Regression NeurIPS 2020 [one-pass aleatoric/epistemic uncertainty]
- Estimating Drivable Collision-Free Space from Monocular Video WACV 2015 [Drivable space]
- Visualization of Convolutional Neural Networks for Monocular Depth Estimation ICCV 2019 [monodepth]
- Differentiable Rendering: A Survey [differentiable rendering, TRI]
- SAFENet: Self-Supervised Monocular Depth Estimation with Semantic-Aware Feature Extraction [monodepth, semantics, Naver labs]
- Toward Interactive Self-Annotation For Video Object Bounding Box: Recurrent Self-Learning And Hierarchical Annotation Based Framework WACV 2020
- Towards Good Practice for CNN-Based Monocular Depth Estimation WACV 2020
- Self-Supervised Scene De-occlusion CVPR 2020 oral
- TP-LSD: Tri-Points Based Line Segment Detector
- Data Distillation: Towards Omni-Supervised Learning CVPR 2018 [Kaiming He, FAIR]
- MiDaS: Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer [monodepth, dynamic object, synthetic dataset]
- Semantics-Driven Unsupervised Learning for Monocular Depth and Ego-Motion Estimation [monodepth]
- Synthetic-to-Real Domain Adaptation for Lane Detection [GM Israel, LLD]
- PolyLaneNet: Lane Estimation via Deep Polynomial Regression ICPR 2020 [polynomial, LLD]
- Learning Universal Shape Dictionary for Realtime Instance Segmentation
- End-to-End Video Instance Segmentation with Transformers [DETR, transformers]
- Score-CAM: Score-Weighted Visual Explanations for Convolutional Neural Networks CVPR 2020 workshop
- When and Why Test-Time Augmentation Works
- Footprints and Free Space from a Single Color Image CVPR 2020 oral [Parking use, footprint]
- Driving among Flatmobiles: Bird-Eye-View occupancy grids from a monocular camera for holistic trajectory planning [BEV, only predict footprint]
- Rethinking Classification and Localization for Object Detection CVPR 2020
- Monocular 3D Object Detection with Sequential Feature Association and Depth Hint Augmentation [mono3D]
- Simple Copy-Paste is a Strong Data Augmentation Method for Instance Segmentation
- ViP-DeepLab: Learning Visual Perception with Depth-aware Video Panoptic Segmentation
- MVSNet: Depth Inference for Unstructured Multi-view Stereo ECCV 2018
- Recurrent MVSNet for High-resolution Multi-view Stereo Depth Inference CVPR 2019 [Deep learning + MVS, Vidar, same author MVSNet]
- Artificial Dummies for Urban Dataset Augmentation AAAI 2021
- DETR for Pedestrian Detection [transformer, pedestrian detection]
- Multi-Modality Cut and Paste for 3D Object Detection [SenseTime]
- Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers [transformer, semantic segmentation]
- TransPose: Towards Explainable Human Pose Estimation by Transformer [transformer, pose estimation]
- Seesaw Loss for Long-Tailed Instance Segmentation
- SWA Object Detection [Stochastic Weights Averaging (SWA)]
- 3D Object Detection with Pointformer
- Toward Transformer-Based Object Detection [DETR-like]
- Boosting Monocular Depth Estimation with Lightweight 3D Point Fusion [dense SfM]
- Vision Global Localization with Semantic Segmentation and Interest Feature Points
- Transformer Interpretability Beyond Attention Visualization [transformers]
- Scaling Semantic Segmentation Beyond 1K Classes on a Single GPU
- DetectoRS: Detecting Objects with Recursive Feature Pyramid and Switchable Atrous Convolution
- Empirical Upper Bound in Object Detection and More
- Generalized Object Detection on Fisheye Cameras for Autonomous Driving: Dataset, Representations and Baseline [Fisheye, Senthil Yogamani]
- SOSD-Net: Joint Semantic Object Segmentation and Depth Estimation from Monocular images [Jiwen Lu, monodepth]
- Sparse Auxiliary Networks for Unified Monocular Depth Prediction and Completion [TRI]
- Linformer: Self-Attention with Linear Complexity
- Set Transformer: A Framework for Attention-based Permutation-Invariant Neural Networks ICML 2019
- PCT: Point cloud transformer Computational Visual Media 2021
- DDT: Unsupervised Object Discovery and Co-Localization by Deep Descriptor Transforming IJCAI 2017
- Hierarchical Road Topology Learning for Urban Map-less Driving [Mercedes]
- Probabilistic Future Prediction for Video Scene Understanding ECCV 2020 [Alex Kendall]
- Detecting 32 Pedestrian Attributes for Autonomous Vehicles [VRU, MTL]
- Cascaded deep monocular 3D human pose estimation with evolutionary training data CVPR 2020 oral
- MonoGeo: Learning Geometry-Guided Depth via Projective Modeling for Monocular 3D Object Detection [mono3D]
- Aug3D-RPN: Improving Monocular 3D Object Detection by Synthetic Images with Virtual Depth [mono3D]
- Neighbor-Vote: Improving Monocular 3D Object Detection through Neighbor Distance Voting [mono3D]
- Lite-FPN for Keypoint-based Monocular 3D Object Detection [mono3D]
- Lidar Point Cloud Guided Monocular 3D Object Detection
- Vision Transformers for Dense Prediction [Vladlen Koltun, Intel]
- Efficient Transformers: A Survey
- Do Vision Transformers See Like Convolutional Neural Networks?
- Progressive Coordinate Transforms for Monocular 3D Object Detection [mono3D]
- AutoShape: Real-Time Shape-Aware Monocular 3D Object Detection ICCV 2021 [mono3D]
- BlazePose: On-device Real-time Body Pose tracking
- Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language [Andy Zeng]
- Large Language Models as General Pattern Machines [Embodied AI]
- RetinaGAN: An Object-aware Approach to Sim-to-Real Transfer
- PlaNet: Learning Latent Dynamics for Planning from Pixels ICML 2019
- Dreamer: Dream to Control: Learning Behaviors by Latent Imagination ICLR 2020 oral
- DreamerV2: Mastering Atari with Discrete World Models ICLR 2021 [World models]
- DreamerV3: Mastering Diverse Domains through World Models
- DayDreamer: World Models for Physical Robot Learning CoRL 2022
- JEPA: A Path Towards Autonomous Machine Intelligence
- I-JEPA: Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture CVPR 2023
- MAGVIT: Masked Generative Video Transformer CVPR 2023 highlight [Video prediction]
- Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models CVPR 2023 [Video prediction]
- Runway Gen-1: Structure and Content-Guided Video Synthesis with Diffusion Models
- Learning to drive from a world on rails ICCV 2021 oral [Philipp Krähenbühl]
- Learning from All Vehicles CVPR 2022 [Philipp Krähenbühl]
- End-to-End Urban Driving by Imitating a Reinforcement Learning Coach ICCV 2021
- End-to-end Autonomous Driving: Challenges and Frontiers
- IL Difficulty Model: Embedding Synthetic Off-Policy Experience for Autonomous Driving via Zero-Shot Curricula CoRL 2022 [Waymo]
- Decision Transformer: Reinforcement Learning via Sequence Modeling NeurIPS 2021 [LLM for planning]
- LID: Pre-Trained Language Models for Interactive Decision-Making NeurIPS 2022 [LLM for planning]
- Planning with Large Language Models via Corrective Re-prompting NeurIPS 2022 Workshop
- Object as Query: Equipping Any 2D Object Detector with 3D Detection Ability ICCV 2023 [TuSimple]
- Speculative Sampling: Accelerating Large Language Model Decoding with Speculative Sampling [Accelerated LLM, DeepMind]
- Inference with Reference: Lossless Acceleration of Large Language Models [Accelerated LLM, Microsoft]
- EPSILON: An Efficient Planning System for Automated Vehicles in Highly Interactive Environments T-RO 2021
- StreamPETR: Exploring Object-Centric Temporal Modeling for Efficient Multi-View 3D Object Detection
- SSCNet: Semantic Scene Completion from a Single Depth Image CVPR 2017
- SemanticKITTI: A Dataset for Semantic Scene Understanding of LiDAR Sequences ICCV 2019
- PixPro: Propagate Yourself: Exploring Pixel-Level Consistency for Unsupervised Visual Representation Learning [self-supervised]
- Pixel-Wise Contrastive Distillation [self-supervised]
- VICRegL: Self-Supervised Learning of Local Visual Features NeurIPS 2022
- ImageBind: One Embedding Space To Bind Them All CVPR 2023
- KEMP: Keyframe-Based Hierarchical End-to-End Deep Model for Long-Term Trajectory Prediction ICRA 2022 [Planning]
- Deep Interactive Motion Prediction and Planning: Playing Games with Motion Prediction Models L4DC [Planning]
- GameFormer: Game-theoretic Modeling and Learning of Transformer-based Interactive Prediction and Planning for Autonomous Driving [Planning]
- LookOut: Diverse Multi-Future Prediction and Planning for Self-Driving [Planning, Raquel]
- DIPP: Differentiable Integrated Motion Prediction and Planning with Learnable Cost Function for Autonomous Driving [Planning]
- Imitation Is Not Enough: Robustifying Imitation with Reinforcement Learning for Challenging Driving Scenarios [Planning, Waymo]
- Hierarchical Model-Based Imitation Learning for Planning in Autonomous Driving IROS 2022 [Planning, Waymo]
- Symphony: Learning Realistic and Diverse Agents for Autonomous Driving Simulation ICRA 2022 [Planning, Waymo]
- JFP: Joint Future Prediction with Interactive Multi-Agent Modeling for Autonomous Driving [Planning, Waymo]
- MaskFormer: Per-Pixel Classification is Not All You Need for Semantic Segmentation NeurIPS 2021
- 3D Semantic Scene Completion: a Survey IJCV 2022
- DETIC: Detecting Twenty-thousand Classes using Image-level Supervision ECCV 2022
- Atlas: End-to-End 3D Scene Reconstruction from Posed Images ECCV 2020
- TransformerFusion: Monocular RGB Scene Reconstruction using Transformers NeurIPS 2021
- SimpleOccupancy: A Simple Attempt for 3D Occupancy Estimation in Autonomous Driving [Occupancy Network]
- OccDepth: A Depth-Aware Method for 3D Semantic Scene Completion [Occupancy Network, stereo]
- Fast-BEV: Towards Real-time On-vehicle Bird's-Eye View Perception NeurIPS 2022
- Fast-BEV: A Fast and Strong Bird's-Eye View Perception Baseline
- ProphNet: Efficient Agent-Centric Motion Forecasting with Anchor-Informed Proposals CVPR 2023 [Qcraft, prediction]
- Motion Transformer with Global Intention Localization and Local Movement Refinement NeurIPS 2022 Oral
- P4P: Conflict-Aware Motion Prediction for Planning in Autonomous Driving
- MultiPath++: Efficient Information Fusion and Trajectory Aggregation for Behavior Prediction
- ViP3D: End-to-end Visual Trajectory Prediction via 3D Agent Queries
- SAM: Segment Anything [FAIR]
- GeoMIM: Towards Better 3D Knowledge Transfer via Masked Image Modeling for Multi-view 3D Understanding
- Motion Prediction using Trajectory Sets and Self-Driving Domain Knowledge [Encode Road requirement to prediction]
- HiVT: Hierarchical Vector Transformer for Multi-Agent Motion Prediction
- Transformer Feed-Forward Layers Are Key-Value Memories EMNLP 2021
- BEV-LaneDet: a Simple and Effective 3D Lane Detection Baseline CVPR 2023 [BEVNet]
- Exploring Recurrent Long-term Temporal Fusion for Multi-view 3D Perception [BEVNet, megvii]
- VAD: Vectorized Scene Representation for Efficient Autonomous Driving [Horizon]
- BEVPoolv2: A Cutting-edge Implementation of BEVDet Toward Deployment [BEVDet, PhiGent]
- NVRadarNet: Real-Time Radar Obstacle and Free Space Detection for Autonomous Driving
- GraspNet-1Billion: A Large-Scale Benchmark for General Object Grasping CVPR 2020 [Cewu Lu]
- AnyGrasp: Robust and Efficient Grasp Perception in Spatial and Temporal Domains [Cewu Lu]
- Point Cloud Forecasting as a Proxy for 4D Occupancy Forecasting
- HDGT: Heterogeneous Driving Graph Transformer for Multi-Agent Trajectory Prediction via Scene Encoding
- MTR: Motion Transformer with Global Intention Localization and Local Movement Refinement NeurIPS 2022
- UVTR: Unifying Voxel-based Representation with Transformer for 3D Object Detection [BEVFusion, Megvii, BEVNet, camera + lidar]
- Don't Use Large Mini-Batches, Use Local SGD ICLR 2020
- Grokking: Generalization beyond Overfitting on small algorithmic datasets
- Progress measures for grokking via mechanistic interpretability
- Understanding deep learning requires rethinking generalization ICLR 2017
- Unifying Grokking and Double Descent
- Deep Interactive Motion Prediction and Planning: Playing Games with Motion Prediction Models L4DC 2022
- Interactive Prediction and Planning for Autonomous Driving: from Algorithms to Fundamental Aspects [PhD thesis of Wei Zhan, 2019]
- Lyft1001: One Thousand and One Hours: Self-driving Motion Prediction Dataset [Lyft Level 5, prediction dataset]
- PCAccumulation: Dynamic 3D Scene Analysis by Point Cloud Accumulation ECCV 2022
- Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos NeurIPS 2022
- UniSim: A Neural Closed-Loop Sensor Simulator CVPR 2023 [simulation, Raquel]
- GeoSim: Realistic Video Simulation via Geometry-Aware Composition for Self-Driving CVPR 2021
- Accelerating Reinforcement Learning for Autonomous Driving using Task-Agnostic and Ego-Centric Motion Skills [Driving Skill]
- Efficient Reinforcement Learning for Autonomous Driving with Parameterized Skills and Priors RSS 2023 [Driving Skill]
- Neural Map Prior for Autonomous Driving CVPR 2023
- Track Anything: Segment Anything Meets Videos
- Self-Supervised Camera Self-Calibration from Video ICRA 2022 [TRI, calibration]
- Real-time Online Video Detection with Temporal Smoothing Transformers ECCV 2022 [ConvLSTM-style cross-attention]
- NeRF-Supervised Deep Stereo CVPR 2023
- GET3D: A Generative Model of High Quality 3D Textured Shapes Learned from Images NeurIPS 2022
- OmniObject3D: Large-Vocabulary 3D Object Dataset for Realistic Perception, Reconstruction and Generation CVPR 2023
- Ego-Body Pose Estimation via Ego-Head Pose Estimation CVPR 2023
- FB-OCC: 3D Occupancy Prediction based on Forward-Backward View Transformation
- PanoOcc: Unified Occupancy Representation for Camera-based 3D Panoptic Segmentation
- BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
- Visual Instruction Tuning
- VideoChat: Chat-Centric Video Understanding
- ToT: Tree of Thoughts: Deliberate Problem Solving with Large Language Models [Notes]
- CoT: Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
- CoBEVT: Cooperative Bird's Eye View Semantic Segmentation with Sparse Transformers CoRL 2022
- BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision [BEVNet, Jifeng Dai]
- Traj++: Human Trajectory Forecasting in Crowds: A Deep Learning Perspective TITS 2021
- Data Driven Prediction Architecture for Autonomous Driving and its Application on Apollo Platform IV 2020 [Baidu]
- THOMAS: Trajectory Heatmap Output with learned Multi-Agent Sampling ICLR 2022
- Learning Lane Graph Representations for Motion Forecasting ECCV 2020 oral
- Identifying Driver Interactions via Conditional Behavior Prediction ICRA 2021 [Waymo]
- Trajectron++: Dynamically-Feasible Trajectory Forecasting With Heterogeneous Data ECCV 2020
- TPNet: Trajectory Proposal Network for Motion Prediction CVPR 2020
- GOHOME: Graph-Oriented Heatmap Output for future Motion Estimation
- PECNet: It Is Not the Journey but the Destination: Endpoint Conditioned Trajectory Prediction ECCV 2020 oral
- From Goals, Waypoints & Paths To Long Term Human Trajectory Forecasting ICCV 2021
- PRECOG: PREdiction Conditioned On Goals in Visual Multi-Agent Settings ICCV 2019
- PiP: Planning-informed Trajectory Prediction for Autonomous Driving ECCV 2020
- MultiPath: Multiple Probabilistic Anchor Trajectory Hypotheses for Behavior Prediction CoRL 2019
- LaPred: Lane-Aware Prediction of Multi-Modal Future Trajectories of Dynamic Agents CVPR 2021
- PRIME: Learning to Predict Vehicle Trajectories with Model-based Planning CoRL 2021
- A Flexible and Explainable Vehicle Motion Prediction and Inference Framework Combining Semi-Supervised AOG and ST-LSTM TITS 2020
- Multi-Modal Trajectory Prediction of Surrounding Vehicles with Maneuver based LSTMs IV 2018 [Trivedi]
- HYPER: Learned Hybrid Trajectory Prediction via Factored Inference and Adaptive Sampling ICRA 2022
- Trajectory Prediction with Linguistic Representations ICRA 2022
- What-If Motion Prediction for Autonomous Driving
- End-to-end Contextual Perception and Prediction with Interaction Transformer IROS 2020 [Auxiliary collision loss, scene compliant pred]
- SafeCritic: Collision-Aware Trajectory Prediction BMVC 2019 [IRL, scene compliant pred]
- Large Scale Interactive Motion Forecasting for Autonomous Driving: The Waymo Open Motion Dataset ICCV 2021 [Waymo]
- Interaction-Based Trajectory Prediction Over a Hybrid Traffic Graph IROS 2020
- Joint Interaction and Trajectory Prediction for Autonomous Driving using Graph Neural Networks NeurIPS 2019 workshop
- Fast Risk Assessment for Autonomous Vehicles Using Learned Models of Agent Futures RSS 2020
- Monocular 3D Object Detection: An Extrinsic Parameter Free Approach CVPR 2021 [PJLab]
- UniFormer: Unified Multi-view Fusion Transformer for Spatial-Temporal Representation in Bird's-Eye-View [BEVFormer, BEVNet, Temporal]
- GitNet: Geometric Prior-based Transformation for Birds-Eye-View Segmentation
- WBF: Weighted Boxes Fusion: Ensembling Boxes from Different Object Detection Models
- NNI: auto parameter finding algorithm
- BEVFormer++: Improving BEVFormer for 3D Camera-only Object Detection [Waymo open dataset challenge 1st place in mono3d]
- LET-3D-AP: Longitudinal Error Tolerant 3D Average Precision for Camera-Only 3D Detection [Waymo open dataset challenge official metric]
- High-Level Interpretation of Urban Road Maps Fusing Deep Learning-Based Pixelwise Scene Segmentation and Digital Navigation Maps Journal of Advanced Transportation 2018
- A Hybrid Vision-Map Method for Urban Road Detection Journal of Advanced Transportation 2017
- Terminology and Analysis of Map Deviations in Urban Domains: Towards Dependability for HD Maps in Automated Vehicles IV 2020
- Time Will Tell: New Outlooks and a Baseline for Temporal Multi-View 3D Object Detection
- Conditional DETR for Fast Training Convergence ICCV 2021
- DAB-DETR: Dynamic Anchor Boxes are Better Queries for DETR ICLR 2022
- DN-DETR: Accelerate DETR Training by Introducing Query DeNoising CVPR 2022
- DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection
- Trajectory Forecasting from Detection with Uncertainty-Aware Motion Encoding [Ouyang Wanli]
- Vision-based Uneven BEV Representation Learning with Polar Rasterization and Surface Estimation [BEVNet, polar]
- MUTR3D: A Multi-camera Tracking Framework via 3D-to-2D Queries [BEVNet, tracking] CVPR 2022 workshop [Hang Zhao]
- ST-P3: End-to-end Vision-based Autonomous Driving via Spatial-Temporal Feature Learning ECCV 2022 [Hongyang Li]
- GKT: Efficient and Robust 2D-to-BEV Representation Learning via Geometry-guided Kernel Transformer [BEVNet, Horizon]
- SiamRPN: High Performance Visual Tracking with Siamese Region Proposal Network CVPR 2018
- TPLR: Topology Preserving Local Road Network Estimation from Single Onboard Camera Image CVPR 2022 [STSU, Luc Van Gool]
- LaRa: Latents and Rays for Multi-Camera Bird's-Eye-View Semantic Segmentation [Valeo, BEVNet, polar]
- PolarDETR: Polar Parametrization for Vision-based Surround-View 3D Detection [BEVNet]
- Exploring Geometric Consistency for Monocular 3D Object Detection CVPR 2022
- ImVoxelNet: Image to Voxels Projection for Monocular and Multi-View General-Purpose 3D Object Detection WACV 2022 [mono3D]
- Learning to Predict 3D Lane Shape and Camera Pose from a Single Image via Geometry Constraints AAAI 2022
- Detecting Lane and Road Markings at A Distance with Perspective Transformer Layers ICICN 2021 [BEVNet, lane line]
- Unsupervised Labeled Lane Markers Using Maps ICCV 2019 workshop [Bosch, 2D lane line]
- M3DeTR: Multi-representation, Multi-scale, Mutual-relation 3D Object Detection with Transformers [Lidar detection, Waymo open dataset] WACV 2022
- K-Lane: Lidar Lane Dataset and Benchmark for Urban Roads and Highways [lane line dataset]
- Robust Monocular 3D Lane Detection With Dual Attention ICIP 2021
- OcclusionFusion: Occlusion-aware Motion Estimation for Real-time Dynamic 3D Reconstruction CVPR 2022
- MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer ICLR 2022 [lightweight Transformers]
- XFormer: Lightweight Vision Transformer with Cross Feature Attention [Samsung]
- CenterFormer: Center-based Transformer for 3D Object Detection ECCV 2022 oral [TuSimple]
- LidarMultiNet: Towards a Unified Multi-task Network for LiDAR Perception [2022 Waymo Open Dataset, TuSimple]
- MTRA: 1st Place Solution for 2022 Waymo Open Dataset Challenge - Motion Prediction [Waymo open dataset challenge 1st place in motion prediction]
- BEVSegFormer: Bird's Eye View Semantic Segmentation From Arbitrary Camera Rigs [BEVNet]
- Panoptic SegFormer: Delving Deeper into Panoptic Segmentation with Transformers CVPR 2022 [nVidia]
- Efficiently Identifying Task Groupings for Multi-Task Learning NeurIPS 2021 spotlight [MTL]
- Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time [Google, Golden Backbone]
- "The Pedestrian next to the Lamppost" Adaptive Object Graphs for Better Instantaneous Mapping CVPR 2022
- GitNet: Geometric Prior-based Transformation for Birds-Eye-View Segmentation [BEVNet, Baidu]
- FUTR3D: A Unified Sensor Fusion Framework for 3D Detection [Hang Zhao]
- MonoFormer: Towards Generalization of self-supervised monocular depth estimation with Transformers [monodepth]
- Time3D: End-to-End Joint Monocular 3D Object Detection and Tracking for Autonomous Driving
- cosFormer: Rethinking Softmax in Attention ICLR 2022
- StretchBEV: Stretching Future Instance Prediction Spatially and Temporally [BEVNet, prediction]
- Scene Representation in Bird’s-Eye View from Surrounding Cameras with Transformers [BEVNet, LLD] CVPR 2022 workshop
- Multi-Frame Self-Supervised Depth with Transformers CVPR 2022
- It's About Time: Analog Clock Reading in the Wild CVPR 2022 [Andrew Zisserman]
- SurroundDepth: Entangling Surrounding Views for Self-Supervised Multi-Camera Depth Estimation CoRL 2022 [Jiwen Lu]
- ONCE-3DLanes: Building Monocular 3D Lane Detection CVPR 2022
- K-Lane: Lidar Lane Dataset and Benchmark for Urban Roads and Highways CVPR 2022 workshop [3D LLD]
- Multi-modal 3D Human Pose Estimation with 2D Weak Supervision in Autonomous Driving CVPR 2022 workshop
- A Simple Baseline for BEV Perception Without LiDAR [TRI, BEVNet, vision+radar]
- Reconstruct from Top View: A 3D Lane Detection Approach based on Geometry Structure Prior CVPR 2022 workshop
- RIDDLE: Lidar Data Compression with Range Image Deep Delta Encoding CVPR 2022 [Waymo, Charles Qi]
- Occupancy Flow Fields for Motion Forecasting in Autonomous Driving RAL 2022 [Waymo occupancy flow challenge]
- Safe Local Motion Planning with Self-Supervised Freespace Forecasting CVPR 2021
- The Core of the Data Closed Loop - Sharing an Auto-labeling Solution
- LETR: Line Segment Detection Using Transformers without Edges CVPR 2021 oral
- HDMapGen: A Hierarchical Graph Generative Model of High Definition Maps CVPR 2021 [HD mapping]
- SketchRNN: A Neural Representation of Sketch Drawings [David Ha]
- PolyGen: An Autoregressive Generative Model of 3D Meshes ICML 2020
- SOLQ: Segmenting Objects by Learning Queries NeurIPS 2021 [Megvii, end-to-end, instance segmentation]
- MonoViT: Self-Supervised Monocular Depth Estimation with a Vision Transformer 3DV 2022
- MVSTER: Epipolar Transformer for Efficient Multi-View Stereo ECCV 2022
- MOVEDepth: Crafting Monocular Cues and Velocity Guidance for Self-Supervised Multi-Frame Depth Learning [MVS + monodepth]
- Scene Transformer: A unified architecture for predicting multiple agent trajectories [prediction, Waymo] ICLR 2022
- SSIA: Monocular Depth Estimation with Self-supervised Instance Adaptation [VGG team, TTR, test time refinement, CVD]
- CoMoDA: Continuous Monocular Depth Adaptation Using Past Experiences WACV 2021
- MonoRec: Semi-supervised dense reconstruction in dynamic environments from a single moving camera CVPR 2021 [Daniel Cremers]
- Plenoxels: Radiance Fields without Neural Networks
- Lidar with Velocity: Motion Distortion Correction of Point Clouds from Oscillating Scanning Lidars [Livox, ISEE]
- NWD: A Normalized Gaussian Wasserstein Distance for Tiny Object Detection
- Towards Optimal Strategies for Training Self-Driving Perception Models in Simulation NeurIPS 2021 [Sanja Fidler]
- Insta-DM: Learning Monocular Depth in Dynamic Scenes via Instance-Aware Projection Consistency AAAI 2021
- Instance-wise Depth and Motion Learning from Monocular Videos NeurIPS 2020 workshop [website]
- NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis ECCV 2020 oral
- BARF: Bundle-Adjusting Neural Radiance Fields ICCV 2021 oral
- NerfingMVS: Guided Optimization of Neural Radiance Fields for Indoor Multi-view Stereo ICCV 2021 oral
- Transfuser: Multi-Modal Fusion Transformer for End-to-End Autonomous Driving CVPR 2021
- YOLinO: Generic Single Shot Polyline Detection in Real Time ICCV 2021 workshop [LLD]
- MonoRCNN: Geometry-based Distance Decomposition for Monocular 3D Object Detection ICCV 2021
- MonoCInIS: Camera Independent Monocular 3D Object Detection using Instance Segmentation ICCV 2021 workshop
- PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection CVPR 2020 [Waymo challenge 2nd place]
- Geometry-based Distance Decomposition for Monocular 3D Object Detection ICCV 2021 [mono3D]
- Offboard 3D Object Detection from Point Cloud Sequences CVPR 2021 [Charles Qi]
- FreeAnchor: Learning to Match Anchors for Visual Object Detection NeurIPS 2019
- AutoAssign: Differentiable Label Assignment for Dense Object Detection
- Probabilistic Anchor Assignment with IoU Prediction for Object Detection ECCV 2020
- FOVEA: Foveated Image Magnification for Autonomous Navigation ICCV 2021 [Argo]
- PifPaf: Composite Fields for Human Pose Estimation CVPR 2019
- Monocular 3D Localization of Vehicles in Road Scenes ICCV 2021 workshop [mono3D, tracking]
- TransformerFusion: Monocular RGB Scene Reconstruction using Transformers
- Conditional DETR for Fast Training Convergence
- Anchor DETR: Query Design for Transformer-Based Detector [Megvii]
- PGD: Probabilistic and Geometric Depth: Detecting Objects in Perspective CoRL 2021
- Adaptive Wing Loss for Robust Face Alignment via Heatmap Regression
- What Makes for End-to-End Object Detection? ICML 2021
- Instances as Queries ICCV 2021 [instance segmentation]
- One Million Scenes for Autonomous Driving: ONCE Dataset [Huawei]
- NVS-MonoDepth: Improving Monocular Depth Prediction with Novel View Synthesis 3DV 2021
- Is 2D Heatmap Representation Even Necessary for Human Pose Estimation?
- Topology Preserving Local Road Network Estimation from Single Onboard Camera Image [BEVNet, Luc Van Gool]