DWCTOD/ICCV2021-Papers-with-Code-Demo

ICCV2021-Papers-with-Code-Demo

☪️ Paper download:

ICCV 2021 paper download collection:

Link: https://pan.baidu.com/s/1vmOQzLG1QaBCgQD1ijtYuw

Extraction code: bp9j (for the unzip password, contact WeChat nvshenj125)

CVPR 2021 collection: https://github.com/DWCTOD/CVPR2021-Papers-with-Code-Demo

Paper download: https://pan.baidu.com/share/init?surl=gjfUQlPf73MCk4vM8VbzoA

Password: aicv

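🌟 ICCV 2021: continuously updated with the latest papers and their open-source code!

🚗 ICCV 2021 accepted paper list

🚂 ICCV 2021 talks and demo videos: https://space.bilibili.com/288489574

🚗 Official website: http://iccv2021.thecvf.com/home

⏲️ Timeline ⌚ Paper acceptance announcement: July 23, 2021

✋ Note: issues are welcome. Share ICCV 2021 papers and open-source projects to help improve this project together.

✈️ Paper download: for convenience, papers are stored in a folder; ✔️ marks a paper that has already been downloaded.

🎆 Join the group | Welcome

An ICCV 2021 paper discussion group has been set up! If your paper has been accepted, add WeChat nvshenj125 with the note: ICCV + name + school/company name. Requests must follow this format to be added to the group.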

🔨 Table of Contents (click to jump)

Backbone

✔️Conformer: Local Features Coupling Global Representations for Visual Recognition

Contextual Convolutional Neural Networks

Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions

Reg-IBP: Efficient and Scalable Neural Network Robustness Training via Interval Bound Propagation

Why Approximate Matrix Square Root Outperforms Accurate SVD in Global Covariance Pooling?

Back to contents

Dataset

Beyond Road Extraction: A Dataset for Map Update using Aerial Images

✔️FineAction: A Fined Video Dataset for Temporal Action Localization

KoDF: A Large-scale Korean DeepFake Detection Dataset

LLVIP: A Visible-infrared Paired Dataset for Low-light Vision

Matching in the Dark: A Dataset for Matching Image Pairs of Low-light Scenes

Meta Self-Learning for Multi-Source Domain Adaptation: A Benchmark

✔️MultiSports: A Multi-Person Video Dataset of Spatio-Temporally Localized Sports Actions

Semantically Coherent Out-of-Distribution Detection

StereOBJ-1M: Large-scale Stereo Image Dataset for 6D Object Pose Estimation

STRIVE: Scene Text Replacement In Videos

The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization

Webly Supervised Fine-Grained Recognition: Benchmark Datasets and An Approach

Who's Waldo? Linking People Across Text and Images (Oral)

Back to contents

Loss

Asymmetric Loss For Multi-Label Classification

Bias Loss for Mobile Neural Networks

Focal Frequency Loss for Image Reconstruction and Synthesis

Orthogonal Projection Loss

Rank & Sort Loss for Object Detection and Instance Segmentation (Oral)

Back to contents

NAS

BN-NAS: Neural Architecture Search with Batch Normalization

BossNAS: Exploring Hybrid CNN-transformers with Block-wisely Self-supervised Neural Architecture Search

CONet: Channel Optimization for Convolutional Neural Networks

FOX-NAS: Fast, On-device and Explainable Neural Architecture Search

Pi-NAS: Improving Neural Architecture Search by Reducing Supernet Training Consistency Shift

RANK-NOSH: Efficient Predictor-Based Architecture Search via Non-Uniform Successive Halving

Single-DARTS: Towards Stable Architecture Search

Back to contents

Image Classification

Influence-Balanced Loss for Imbalanced Visual Classification

Low-Shot Validation: Active Importance Sampling for Estimating Classifier Performance on Rare Categories

Tune It or Don't Use It: Benchmarking Data-Efficient Image Classification

Back to contents

Vision Transformer

An End-to-End Transformer Model for 3D Object Detection

AutoFormer: Searching Transformers for Visual Recognition

BossNAS: Exploring Hybrid CNN-transformers with Block-wisely Self-supervised Neural Architecture Search

Conditional DETR for Fast Training Convergence

Dyadformer: A Multi-modal Transformer for Long-Range Modeling of Dyadic Interactions

Eformer: Edge Enhancement based Transformer for Medical Image Denoising

Fast Convergence of DETR with Spatially Modulated Co-Attention

FuseFormer: Fusing Fine-Grained Information in Transformers for Video Inpainting

Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers (Oral)

GroupFormer: Group Activity Recognition with Clustered Spatial-Temporal Transformer

HiFT: Hierarchical Feature Transformer for Aerial Tracking

High-Fidelity Pluralistic Image Completion with Transformers

Improving 3D Object Detection with Channel-wise Transformer

Is it Time to Replace CNNs with Transformers for Medical Images?

Learning Spatio-Temporal Transformer for Visual Tracking

MUSIQ: Multi-scale Image Quality Transformer

Paint Transformer: Feed Forward Neural Painting with Stroke Prediction (Oral)

PlaneTR: Structure-Guided Transformers for 3D Plane Recovery

PnP-DETR: Towards Efficient Visual Analysis with Transformers

Pose Transformers (POTR): Human Motion Prediction with Non-Autoregressive Transformers

PoinTr: Diverse Point Cloud Completion with Geometry-Aware Transformers (Oral)

Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions

Rethinking and Improving Relative Position Encoding for Vision Transformer

Rethinking Spatial Dimensions of Vision Transformers

Simpler is Better: Few-shot Semantic Segmentation with Classifier Weight Transformer

SnowflakeNet: Point Cloud Completion by Snowflake Point Deconvolution with Skip-Transformer

Spatial-Temporal Transformer for Dynamic Scene Graph Generation

SOTR: Segmenting Objects with Transformers

Swin Transformer: Hierarchical Vision Transformer using Shifted Windows

Revisiting Stereo Depth Estimation From a Sequence-to-Sequence Perspective with Transformers

The Animation Transformer: Visual Correspondence via Segment Matching

The Right to Talk: An Audio-Visual Transformer Approach

TPH-YOLOv5: Improved YOLOv5 Based on Transformer Prediction Head for Object Detection on Drone-captured Scenarios

TransFER: Learning Relation-aware Facial Expression Representations with Transformers

TransPose: Keypoint Localization via Transformer

✔️Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet

✔️Visual Transformer with Statistical Test for COVID-19 Classification

Vision Transformer with Progressive Sampling

Visual Saliency Transformer

Vision-Language Transformer and Query Generation for Referring Segmentation

Voxel Transformer for 3D Object Detection

Back to contents

Object Detection

Active Learning for Deep Object Detection via Probabilistic Modeling

Boosting Weakly Supervised Object Detection via Learning Bounding Box Adjusters

Change is Everywhere: Single-Temporal Supervised Object Change Detection in Remote Sensing Imagery

Conditional Variational Capsule Network for Open Set Recognition

DetCo: Unsupervised Contrastive Learning for Object Detection

DeFRCN: Decoupled Faster R-CNN for Few-Shot Object Detection

Deployment of Deep Neural Networks for Object Detection on Edge AI Devices with Runtime Optimization

Detecting Invisible People

FMODetect: Robust Detection and Trajectory Estimation of Fast Moving Objects

GraphFPN: Graph Feature Pyramid Network for Object Detection

Human Detection and Segmentation via Multi-view Consensus

MDETR: Modulated Detection for End-to-End Multi-Modal Understanding

Mutual Supervision for Dense Object Detection

Morphable Detector for Object Detection on Demand

Moving Object Detection for Event-based vision using Graph Spectral Clustering

Oriented R-CNN for Object Detection

Rank & Sort Loss for Object Detection and Instance Segmentation (Oral)

Reconcile Prediction Consistency for Balanced Object Detection

Seeking Similarities over Differences: Similarity-based Domain Alignment for Adaptive Object Detection

Towards Rotation Invariance in Object Detection

TOOD: Task-aligned One-stage Object Detection (Oral)

Vector-Decomposed Disentanglement for Domain-Invariant Object Detection

Back to contents

Salient Object Detection

Disentangled High Quality Salient Object Detection

Light Field Saliency Detection with Dual Local Graph Learning and Reciprocative Guidance

RGB-D Saliency Detection via Cascaded Mutual Information Minimization

Specificity-preserving RGB-D Saliency Detection

Summarize and Search: Learning Consensus-aware Dynamic Convolution for Co-Saliency Detection

Back to contents

3D Object Detection

An End-to-End Transformer Model for 3D Object Detection

Fog Simulation on Real LiDAR Point Clouds for 3D Object Detection in Adverse Weather

LIGA-Stereo: Learning LiDAR Geometry Aware Representations for Stereo-based 3D Detector

MonoCInIS: Camera Independent Monocular 3D Object Detection using Instance Segmentation

Improving 3D Object Detection with Channel-wise Transformer

Is Pseudo-Lidar needed for Monocular 3D Object detection?

ODAM: Object Detection, Association, and Mapping using Posed RGB Video (Oral)

Pyramid R-CNN: Towards Better Performance and Adaptability for 3D Object Detection

RandomRooms: Unsupervised Pre-training from Synthetic Shapes and Randomized Layouts for 3D Object Detection

Voxel Transformer for 3D Object Detection

Unsupervised Domain Adaptive 3D Detection with Multi-Level Consistency

Back to contents

Object Tracking

DepthTrack: Unveiling the Power of RGBD Tracking

Exploring Simple 3D Multi-Object Tracking for Autonomous Driving

Is First Person Vision Challenging for Object Tracking?

Learning to Track Objects from Unlabeled Videos

Learn to Match: Automatic Matching Network Design for Visual Tracking

Making Higher Order MOT Scalable: An Efficient Approximate Solver for Lifted Disjoint Paths

Saliency-Associated Object Tracking

Video Annotation for Visual Tracking via Selection and Refinement

Back to contents

Image Semantic Segmentation

Complementary Patch for Weakly Supervised Semantic Segmentation

Calibrated Adversarial Refinement for Stochastic Semantic Segmentation

Deep Metric Learning for Open World Semantic Segmentation

Dual Path Learning for Domain Adaptation of Semantic Segmentation

EdgeFlow: Achieving Practical Interactive Segmentation with Edge-Guided Flow

Exploiting Spatial-Temporal Semantic Consistency for Video Scene Parsing

Exploiting a Joint Embedding Space for Generalized Zero-Shot Semantic Segmentation

Exploring Cross-Image Pixel Contrast for Semantic Segmentation (Oral)

Enhanced Boundary Learning for Glass-like Object Segmentation

From Contexts to Locality: Ultra-high Resolution Image Segmentation via Locality-aware Contextual Correlation

ISNet: Integrate Image-Level and Semantic-Level Context for Semantic Segmentation

Generalize then Adapt: Source-Free Domain Adaptive Semantic Segmentation

Labels4Free: Unsupervised Segmentation using StyleGAN

LabOR: Labeling Only if Required for Domain Adaptive Semantic Segmentation

Learning Meta-class Memory for Few-Shot Semantic Segmentation

Leveraging Auxiliary Tasks with Affinity Learning for Weakly Supervised Semantic Segmentation

Mining Contextual Information Beyond Image for Semantic Segmentation

Mining Latent Classes for Few-shot Segmentation (Oral)

Multi-Target Adversarial Frameworks for Domain Adaptation in Semantic Segmentation

Multi-Anchor Active Domain Adaptation for Semantic Segmentation (Oral)

Personalized Image Semantic Segmentation

Pixel Contrastive-Consistent Semi-Supervised Semantic Segmentation

Pseudo-mask Matters in Weakly-supervised Semantic Segmentation

RECALL: Replay-based Continual Learning in Semantic Segmentation

Re-distributing Biased Pseudo Labels for Semi-supervised Semantic Segmentation: A Baseline Investigation (Oral)

Semantic Segmentation on VSPW Dataset through Aggregation of Transformer Models

Self-Regulation for Semantic Segmentation

Semantic Concentration for Domain Adaptation

ShapeConv: Shape-aware Convolutional Layer for Indoor RGB-D Semantic Segmentation

Simpler is Better: Few-shot Semantic Segmentation with Classifier Weight Transformer

SOTR: Segmenting Objects with Transformers

Standardized Max Logits: A Simple yet Effective Approach for Identifying Unexpected Road Obstacles in Urban-Scene Segmentation

The Marine Debris Dataset for Forward-Looking Sonar Semantic Segmentation

Unsupervised Semantic Segmentation by Contrasting Object Mask Proposals

Weakly Supervised Temporal Anomaly Segmentation with Dynamic Time Warping

Back to contents

Semantic Scene Segmentation

BiMaL: Bijective Maximum Likelihood Approach to Domain Adaptation in Semantic Scene Segmentation

Back to contents

3D Semantic Segmentation

VMNet: Voxel-Mesh Network for Geodesic-aware 3D Semantic Segmentation

Back to contents

3D Instance Segmentation

Hierarchical Aggregation for 3D Instance Segmentation

Instance Segmentation in 3D Scenes using Semantic Superpoint Tree Networks

Back to contents

Instance Segmentation

CDNet: Centripetal Direction Network for Nuclear Instance Segmentation

✔️Crossover Learning for Fast Online Video Instance Segmentation

✔️Instances as Queries

Instance Segmentation Challenge Track Technical Report, VIPriors Workshop at ICCV 2021: Task-Specific Copy-Paste Data Augmentation Method for Instance Segmentation

Rank & Sort Loss for Object Detection and Instance Segmentation (Oral)

Scaling up instance annotation via label propagation

Back to contents

Video Semantic Segmentation

Domain Adaptive Video Segmentation via Temporal Consistency Regularization

Full-Duplex Strategy for Video Object Segmentation

Hierarchical Memory Matching Network for Video Object Segmentation

Joint Inductive and Transductive Learning for Video Object Segmentation

Back to contents

Medical Image Segmentation

Recurrent Mask Refinement for Few-Shot Medical Image Segmentation

Uncertainty-aware GAN with Adaptive Loss for Robust MRI Image Enhancement

Back to contents

Medical Image Analysis

Eformer: Edge Enhancement based Transformer for Medical Image Denoising

Improving Tuberculosis (TB) Prediction using Synthetically Generated Computed Tomography (CT) Images

Preservational Learning Improves Self-supervised Medical Image Models by Reconstructing Diverse Contexts

Studying the Effects of Self-Attention for Medical Image Analysis

Back to contents

GAN

3DStyleNet: Creating 3D Shapes with Geometric and Texture Style Variations (Oral)

AdaAttN: Revisit Attention Mechanism in Arbitrary Neural Style Transfer

Click to Move: Controlling Video Generation with Sparse Motion

Collaging Class-specific GANs for Semantic Image Synthesis

Disentangled Lifespan Face Synthesis

Dual Projection Generative Adversarial Networks for Conditional Image Generation

EigenGAN: Layer-Wise Eigen-Learning for GANs

GAN Inversion for Out-of-Range Images with Geometric Transformations

Generative Models for Multi-Illumination Color Constancy

Gradient Normalization for Generative Adversarial Networks

Graph-to-3D: End-to-End Generation and Manipulation of 3D Scenes Using Scene Graphs

Image Synthesis via Semantic Composition

InSeGAN: A Generative Approach to Segmenting Identical Instances in Depth Images

Learning to Diversify for Single Domain Generalization

Manifold Matching via Deep Metric Learning for Generative Modeling

Meta Gradient Adversarial Attack

Online Multi-Granularity Distillation for GAN Compression

Orthogonal Jacobian Regularization for Unsupervised Disentanglement in Image Generation

PixelSynth: Generating a 3D-Consistent Experience from a Single Image

Robustness and Generalization via Generative Adversarial Training

SemIE: Semantically-Aware Image Extrapolation

SketchLattice: Latticed Representation for Sketch Manipulation

Sketch Your Own GAN

Target Adaptive Context Aggregation for Video Scene Graph Generation

Toward a Visual Concept Vocabulary for GAN Latent Space

Toward Spatially Unbiased Generative Models

Towards Vivid and Diverse Image Colorization with Generative Color Prior

Bridging the Gap between Label- and Reference-based Synthesis in Multi-attribute Image-to-Image Translation

Unaligned Image-to-Image Translation by Learning to Reweight

Unconditional Scene Graph Generation

Unsupervised Geodesic-preserved Generative Adversarial Networks for Unconstrained 3D Pose Transfer

Back to contents

Style Transfer

Domain-Aware Universal Style Transfer

Back to contents

Fine-Grained Visual Categorization

Benchmark Platform for Ultra-Fine-Grained Visual Categorization Beyond Human Performance

Webly Supervised Fine-Grained Recognition: Benchmark Datasets and An Approach

Back to contents

Multi-Label Recognition

Residual Attention: A Simple but Effective Method for Multi-Label Recognition

Back to contents

Long-Tailed Recognition

ACE: Ally Complementary Experts for Solving Long-Tailed Recognition in One-Shot (Oral)

Back to contents

Geometric deep learning

Manifold Matching via Deep Metric Learning for Generative Modeling

Orthogonal Jacobian Regularization for Unsupervised Disentanglement in Image Generation

Back to contents

Zero/Few Shot

Binocular Mutual Learning for Improving Few-shot Classification

Boosting the Generalization Capability in Cross-Domain Few-shot Learning via Noise-enhanced Supervised Autoencoder

Discriminative Region-based Multi-Label Zero-Shot Learning

Domain Generalization via Gradient Surgery

Exploiting a Joint Embedding Space for Generalized Zero-Shot Semantic Segmentation

Few-Shot Batch Incremental Road Object Detection via Detector Fusion

Field-Guide-Inspired Zero-Shot Learning

Few-shot Visual Relationship Co-localization

Generalized Source-free Domain Adaptation

Generalized and Incremental Few-Shot Learning by Explicit Learning and Calibration without Forgetting

Meta-Learning with Task-Adaptive Loss Function for Few-Shot Learning

Meta Navigator: Search for a Good Adaptation Policy for Few-shot Learning

On the Importance of Distractors for Few-Shot Classification

Relational Embedding for Few-Shot Classification

SIGN: Spatial-information Incorporated Generative Network for Generalized Zero-shot Semantic Segmentation

Transductive Few-Shot Classification on the Oblique Manifold

Visual Domain Adaptation for Monocular Depth Estimation on Resource-Constrained Hardware

Back to contents

Unsupervised

Adversarial Robustness for Unsupervised Domain Adaptation

Collaborative Unsupervised Visual Representation Learning from Decentralized Data

Instance Similarity Learning for Unsupervised Feature Representation

Skeleton Cloud Colorization for Unsupervised 3D Action Representation Learning

Unsupervised Dense Deformation Embedding Network for Template-Free Shape Correspondence

Tune it the Right Way: Unsupervised Validation of Domain Adaptation via Soft Neighborhood Density

Back to contents

Self-supervised

Digging into Uncertainty in Self-supervised Multi-view Stereo

Enhancing Self-supervised Video Representation Learning via Multi-level Feature Optimization

Focus on the Positives: Self-Supervised Learning for Biodiversity Monitoring

Improving Self-supervised Learning with Hardness-aware Dynamic Curriculum Learning: An Application to Digital Pathology

Meta Self-Learning for Multi-Source Domain Adaptation: A Benchmark

Reducing Label Effort: Self-Supervised meets Active Learning

Self-supervised Neural Networks for Spectral Snapshot Compressive Imaging

Self-Supervised Visual Representations Learning by Contrastive Mask Prediction

Self-Supervised Video Representation Learning with Meta-Contrastive Network

SSH: A Self-Supervised Framework for Image Harmonization

Back to contents

Semi Supervised

Trash to Treasure: Harvesting OOD Data with Cross-Modal Matching for Open-Set Semi-Supervised Learning

Warp-Refine Propagation: Semi-Supervised Auto-labeling via Cycle-consistency

Back to contents

Weakly Supervised

A Weakly Supervised Amodal Segmenter with Boundary Uncertainty Estimation

Foreground-Action Consistency Network for Weakly Supervised Temporal Action Localization

Online Refinement of Low-level Feature Based Activation Map for Weakly Supervised Object Localization

Back to contents

Active Learning

Influence Selection for Active Learning

Back to contents

Action Detection

Class Semantics-based Attention for Action Detection

Back to contents

Action Recognition

"Knights": First Place Submission for VIPriors21 Action Recognition Challenge at ICCV 2021

A Baseline Framework for Part-level Action Parsing and Action Recognition

Channel-wise Topology Refinement Graph Convolution for Skeleton-Based Action Recognition

Elaborative Rehearsal for Zero-shot Action Recognition

✔️FineAction: A Fined Video Dataset for Temporal Action Localization

✔️MultiSports: A Multi-Person Video Dataset of Spatio-Temporally Localized Sports Actions

Spatio-Temporal Dynamic Inference Network for Group Activity Recognition

Unsupervised Few-Shot Action Recognition via Action-Appearance Aligned Meta-Adaptation (Oral)

Video Pose Distillation for Few-Shot, Fine-Grained Sports Action Recognition

Back to contents

Temporal Action Localization

Enriching Local and Global Contexts for Temporal Action Localization

Boundary-sensitive Pre-training for Temporal Localization in Videos

Back to contents

Sign Language Recognition

SignBERT: Pre-Training of Hand-Model-Aware Representation for Sign Language Recognition

Visual Alignment Constraint for Continuous Sign Language Recognition

Back to contents

Hand Pose Estimation

HandFoldingNet: A 3D Hand Pose Estimation Network Using Multiscale-Feature Guided Folding of a 2D Hand Skeleton

Back to contents

Pose Estimation

2D Pose Estimation

Hand-Object Contact Consistency Reasoning for Human Grasps Generation

Human Pose Regression with Residual Log-likelihood Estimation (Oral)

Online Knowledge Distillation for Efficient Pose Estimation

The Center of Attention: Center-Keypoint Grouping via Attention for Multi-Person Pose Estimation

TransPose: Keypoint Localization via Transformer

3D Pose Estimation

EventHPE: Event-based 3D Human Pose and Shape Estimation

DECA: Deep viewpoint-Equivariant human pose estimation using Capsule Autoencoders (Oral)

FrankMocap: A Monocular 3D Whole-Body Pose Estimation System via Regression and Integration

Hierarchical Kinematic Probability Distributions for 3D Human Shape and Pose Estimation from Images in the Wild

Learning Skeletal Graph Neural Networks for Hard 3D Pose Estimation

Probabilistic Monocular 3D Human Pose Estimation with Normalizing Flows

PyMAF: 3D Human Pose and Shape Regression with Pyramidal Mesh Alignment Feedback Loop

Shape-aware Multi-Person Pose Estimation from Multi-View Images

Unsupervised 3D Pose Estimation for Hierarchical Dance Video Recognition

Back to contents

6D Object Pose Estimation

RePOSE: Real-Time Iterative Rendering and Refinement for 6D Object Pose Estimation

SO-Pose: Exploiting Self-Occlusion for Direct 6D Pose Estimation

StereOBJ-1M: Large-scale Stereo Image Dataset for 6D Object Pose Estimation

Back to contents

Human Reconstruction

ARCH++: Animation-Ready Clothed Human Reconstruction Revisited

imGHUM: Implicit Generative Models of 3D Human Shape and Articulated Pose

Learning to Regress Bodies from Images using Differentiable Semantic Rendering

Learning Motion Priors for 4D Human Body Capture in 3D Scenes (Oral)

Physics-based Human Motion Estimation and Synthesis from Videos

Probabilistic Modeling for Human Mesh Recovery

Back to contents

3D Scene Understanding

DeepPanoContext: Panoramic 3D Scene Understanding with Holistic Scene Context Graph and Relation-based Optimization (Oral)

Estimating and Exploiting the Aleatoric Uncertainty in Surface Normal Estimation (Oral)

Back to contents

Face Recognition

Masked Face Recognition Challenge: The InsightFace Track Report

Masked Face Recognition Challenge: The WebFace260M Track Report

PASS: Protected Attribute Suppression System for Mitigating Bias in Face Recognition

Rethinking Common Assumptions to Mitigate Racial Bias in Face Recognition Datasets

SynFace: Face Recognition with Synthetic Data

Unravelling the Effect of Image Distortions for Biased Prediction of Pre-trained Face Recognition Models

Back to contents

Face Alignment

ADNet: Leveraging Error-Bias Towards Normal Direction in Face Alignment

Back to contents

Facial Editing

Talk-to-Edit: Fine-Grained Facial Editing via Dialog

Back to contents

Face Reconstruction

Self-Supervised 3D Face Reconstruction via Conditional Estimation

Towards High Fidelity Monocular Face Reconstruction with Rich Reflectance using Self-supervised Learning and Ray Tracing

Back to contents

Facial Expression Recognition

TransFER: Learning Relation-aware Facial Expression Representations with Transformers

Understanding and Mitigating Annotation Bias in Facial Expression Recognition

Back to contents

Person Re-Identification

A Technical Report for ICCV 2021 VIPriors Re-identification Challenge

ASMR: Learning Attribute-Based Person Search with Adaptive Semantic Margin Regularizer

Counterfactual Attention Learning for Fine-Grained Visual Categorization and Re-identification

IDM: An Intermediate Domain Module for Domain Adaptive Person Re-ID (Oral)

Learning by Aligning: Visible-Infrared Person Re-identification using Cross-Modal Correspondences

Learning Instance-level Spatial-Temporal Patterns for Person Re-identification

Learning Compatible Embeddings

Multi-Expert Adversarial Attack Detection in Person Re-identification Using Context Inconsistency

Towards Discriminative Representation Learning for Unsupervised Person Re-identification

TransReID: Transformer-based Object Re-Identification

Video-based Person Re-identification with Spatial and Temporal Memory Networks

Weakly Supervised Person Search with Region Siamese Networks

Back to contents

Vehicle Re-identification

Heterogeneous Relational Complement for Vehicle Re-identification

Back to contents

Pedestrian Detection

MOTSynth: How Can Synthetic Data Help Pedestrian Detection and Tracking?

Spatial and Semantic Consistency Regularizations for Pedestrian Attribute Recognition

Back to contents

Crowd Counting

Rethinking Counting and Localization in Crowds: A Purely Point-Based Framework (Oral)

Uniformity in Heterogeneity: Diving Deep into Count Interval Partition for Crowd Counting

Variational Attention: Propagating Domain-Specific Knowledge for Multi-Domain Learning in Crowd Counting

Back to contents

Motion Forecasting

Generating Smooth Pose Sequences for Diverse Human Motion Prediction

MSR-GCN: Multi-Scale Residual Graph Convolution Networks for Human Motion Prediction

RAIN: Reinforced Hybrid Attention Inference Network for Motion Forecasting

Skeleton-Graph: Long-Term 3D Motion Prediction From 2D Observations Using Deep Spatio-Temporal Graph CNNs

Back to contents

Pedestrian Trajectory Prediction

DenseTNT: End-to-end Trajectory Prediction from Dense Goal Sets

MG-GAN: A Multi-Generator Model Preventing Out-of-Distribution Samples in Pedestrian Trajectory Prediction

Back to contents

Face-Anti-spoofing

CL-Face-Anti-spoofing

3D High-Fidelity Mask Face Presentation Attack Detection Challenge

Exploring Temporal Coherence for More General Video Face Forgery Detection

Back to contents

Deepfake

OpenForensics: Large-Scale Challenging Dataset For Multi-Face Forgery Detection And Segmentation In-The-Wild

Fake It Till You Make It: Face analysis in the wild using synthetic data alone

Back to contents

Adversarial Attacks

A Hierarchical Assessment of Adversarial Severity

AdvDrop: Adversarial Attack to DNNs by Dropping Information

AGKD-BML: Defense Against Adversarial Attack by Attention Guided Knowledge Distillation and Bi-directional Metric Learning

Optical Adversarial Attack

Sample Efficient Detection and Classification of Adversarial Attacks via Self-Supervised Embeddings

TkML-AP: Adversarial Attacks to Top-k Multi-Label Learning

Back to contents

Cross-Modal Retrieval

Wasserstein Coupled Graph Learning for Cross-Modal Retrieval

  • Paper: None
  • Code: None

Back to contents

Depth Estimation

AA-RMVSNet: Adaptive Aggregation Recurrent Multi-view Stereo Network

Augmenting Depth Estimation with Geospatial Context

Excavating the Potential Capacity of Self-Supervised Monocular Depth Estimation

Fine-grained Semantics-aware Representation Enhancement for Self-supervised Monocular Depth Estimation (Oral)

Motion Basis Learning for Unsupervised Deep Homography Estimation with Subspace Projection

Regularizing Nighttime Weirdness: Efficient Self-supervised Monocular Depth Estimation in the Dark

Revisiting Stereo Depth Estimation From a Sequence-to-Sequence Perspective with Transformers

Self-supervised Monocular Depth Estimation for All Day Images using Domain Separation

SLIDE: Single Image 3D Photography with Soft Layering and Depth-aware Inpainting (Oral)

StructDepth: Leveraging the structural regularities for self-supervised indoor depth estimation

Back to contents

Video Frame Interpolation

Asymmetric Bilateral Motion Estimation for Video Frame Interpolation

✔️XVFI: eXtreme Video Frame Interpolation (Oral)

Back to contents

Video Reasoning

The Multi-Modal Video Reasoning and Analyzing Competition

Back to contents

NeRF

CodeNeRF: Disentangled Neural Radiance Fields for Object Categories

GNeRF: GAN-based Neural Radiance Field without Posed Camera

In-Place Scene Labelling and Understanding with Implicit Scene Representation (Oral)

KiloNeRF: Speeding up Neural Radiance Fields with Thousands of Tiny MLPs

Learning Object-Compositional Neural Radiance Field for Editable Scene Rendering

NerfingMVS: Guided Optimization of Neural Radiance Fields for Indoor Multi-view Stereo (Oral)

Putting NeRF on a Diet: Semantically Consistent Few-Shot View Synthesis

Self-Calibrating Neural Radiance Fields

UNISURF: Unifying Neural Implicit Surfaces and Radiance Fields for Multi-View Reconstruction (Oral)

Back to contents

Shadow Removal

CANet: A Context-Aware Network for Shadow Removal

Back to contents

Image Retrieval

DOLG: Single-Stage Image Retrieval with Deep Orthogonal Fusion of Local and Global Features

Image Retrieval on Real-life Images with Pre-trained Vision-and-Language Models

Self-supervised Product Quantization for Deep Unsupervised Image Retrieval

Back to contents

Super-Resolution

Designing a Practical Degradation Model for Deep Blind Image Super-Resolution

Dual-Camera Super-Resolution with Aligned Attention Modules

Generalized Real-World Super-Resolution through Adversarial Robustness

Learning for Scale-Arbitrary Super-Resolution from Scale-Specific Networks

Overfitting the Data: Compact Neural Video Delivery via Content-aware Feature Modulation

Back to contents

Image Reconstruction

Equivariant Imaging: Learning Beyond the Range Space (Oral)

Spatially-Adaptive Image Restoration using Distortion-Guided Networks

Back to contents

Image Deblurring

Defocus Map Estimation and Deblurring from a Single Dual-Pixel Image (Oral)

SDWNet: A Straight Dilated Network with Wavelet Transformation for Image Deblurring

Single Image Defocus Deblurring Using Kernel-Sharing Parallel Atrous Convolutions

Back to contents

Image Denoising

Deep Reparametrization of Multi-Frame Super-Resolution and Denoising (Oral)

Eformer: Edge Enhancement based Transformer for Medical Image Denoising

ILVR: Conditioning Method for Denoising Diffusion Probabilistic Models (Oral)

Rethinking Deep Image Prior for Denoising

Rethinking Noise Synthesis and Modeling in Raw Denoising

Back to contents

Image Desnowing

ALL Snow Removed: Single Image Desnowing Algorithm Using Hierarchical Dual-tree Complex Wavelet Representation and Contradict Channel Loss

Back to contents

Image Enhancement

Gap-closing Matters: Perceptual Quality Assessment and Optimization of Low-Light Image Enhancement

Real-time Image Enhancer via Learnable Spatial-aware 3D Lookup Tables

Back to contents

Image Matching

Effect of Parameter Optimization on Classical and Learning-based Image Matching Methods

Viewpoint Invariant Dense Matching for Visual Geolocalization

Back to contents

Image Quality

MUSIQ: Multi-scale Image Quality Transformer

Back to contents

Image Compression

Dense Deep Unfolding Network with 3D-CNN Prior for Snapshot Compressive Imaging

Variable-Rate Deep Image Compression through Spatially-Adaptive Feature Transform

Back to contents

Image Restoration

Dynamic Attentive Graph Learning for Image Restoration

Towards Flexible Blind JPEG Artifacts Removal

Back to contents

Image Inpainting

Image Inpainting via Conditional Texture and Structure Dual Generation

Back to contents

Video Inpainting

FuseFormer: Fusing Fine-Grained Information in Transformers for Video Inpainting

Internal Video Inpainting by Implicit Long-range Propagation

Occlusion-Aware Video Object Inpainting

Back to Table of Contents

Video Recognition

Searching for Two-Stream Models in Multivariate Space for Video Recognition

Back to Table of Contents

Visual Question Answering

Weakly Supervised Relative Spatial Reasoning for Visual Question Answering

Back to Table of Contents

Matching

Multi-scale Matching Networks for Semantic Correspondence

Back to Table of Contents

Hand-object Interaction

✔️CPF: Learning a Contact Potential Field to Model the Hand-object Interaction

Exploiting Scene Graphs for Human-Object Interaction Detection

Spatially Conditioned Graphs for Detecting Human–Object Interactions

Virtual Multi-Modality Self-Supervised Foreground Matting for Human-Object Interaction

Back to Table of Contents

Gaze Estimation

Generalizing Gaze Estimation with Outlier-guided Collaborative Adaptation

Back to Table of Contents

Contrastive-Learning

Attentive and Contrastive Learning for Joint Depth and Motion Field Estimation

Improving Contrastive Learning by Visualizing Feature Transformation

Social NCE: Contrastive Learning of Socially-aware Motion Representations

Parametric Contrastive Learning

Back to Table of Contents

Graph Convolution Networks

MSR-GCN: Multi-Scale Residual Graph Convolution Networks for Human Motion Prediction

Back to Table of Contents

Model Compression

GDP: Stabilized Neural Network Pruning via Gates with Differentiable Polarization

Sub-bit Neural Networks: Learning to Compress and Accelerate Binary Neural Networks

Back to Table of Contents

Quantization

Cluster-Promoting Quantization with Bit-Drop for Minimizing Network Quantization Loss

Distance-aware Quantization

Dynamic Network Quantization for Efficient Video Inference

Generalizable Mixed-Precision Quantization via Attribution Rank Preservation

Towards Mixed-Precision Quantization of Neural Networks via Constrained Optimization

Back to Table of Contents

Knowledge Distillation

Deep Structured Instance Graph for Distilling Object Detectors

Distilling Holistic Knowledge with Graph Neural Networks

Lipschitz Continuity Guided Knowledge Distillation

G-DetKD: Towards General Distillation Framework for Object Detectors via Contrastive and Semantic-guided Feature Imitation

Self Supervision to Distillation for Long-Tailed Visual Recognition

Back to Table of Contents

Point Cloud

A Robust Loss for Point Cloud Registration

A Technical Survey and Evaluation of Traditional Point Cloud Clustering Methods for LiDAR Panoptic Segmentation

(Just) A Spoonful of Refinements Helps the Registration Error Go Down (Oral)

ABD-Net: Attention Based Decomposition Network for 3D Point Cloud Decomposition

AdaFit: Rethinking Learning-based Normal Estimation on Point Clouds

Box-Aware Feature Enhancement for Single Object Tracking on Point Clouds

CPFN: Cascaded Primitive Fitting Networks for High-Resolution Point Clouds

Deep Models with Fusion Strategies for MVP Point Cloud Registration

DRINet: A Dual-Representation Iterative Learning Network for Point Cloud Segmentation

Guided Point Contrastive Learning for Semi-supervised Point Cloud Semantic Segmentation

Learning Inner-Group Relations on Point Clouds

InstanceRefer: Cooperative Holistic Understanding for Visual Grounding on Point Clouds through Instance Multi-level Contextual Referring

ME-PCN: Point Completion Conditioned on Mask Emptiness

MVP Benchmark: Multi-View Partial Point Clouds for Completion and Registration

Out-of-Core Surface Reconstruction via Global TGV Minimization

PCAM: Product of Cross-Attention Matrices for Rigid Registration of Point Clouds

PICCOLO: Point Cloud-Centric Omnidirectional Localization

Point Cloud Augmentation with Weighted Local Transformations

PoinTr: Diverse Point Cloud Completion with Geometry-Aware Transformers (Oral)

ReDAL: Region-based and Diversity-aware Active Learning for Point Cloud Semantic Segmentation

Sampling Network Guided Cross-Entropy Method for Unsupervised Point Cloud Registration

SnowflakeNet: Point Cloud Completion by Snowflake Point Deconvolution with Skip-Transformer

Spatio-temporal Self-Supervised Representation Learning for 3D Point Clouds

Towards Efficient Point Cloud Graph Neural Networks Through Architectural Simplification

Unsupervised Learning of Fine Structure Generation for 3D Point Clouds by 2D Projection Matching

Unsupervised Point Cloud Pre-Training via View-Point Occlusion, Completion

Vis2Mesh: Efficient Mesh Reconstruction from Unstructured Point Clouds of Large Scenes with Learned Virtual View Visibility

Voxel-based Network for Shape Completion by Leveraging Edge Generation

Walk in the Cloud: Learning Curves for Point Clouds Shape Analysis

Back to Table of Contents

3D reconstruction

3D Shapes Local Geometry Codes Learning with SDF

3DIAS: 3D Shape Reconstruction with Implicit Algebraic Surfaces

DensePose 3D: Lifting Canonical Surface Maps of Articulated Objects to the Third Dimension

Learning Anchored Unsigned Distance Functions with Gradient Direction Alignment for Single-view Garment Reconstruction

Pixel-Perfect Structure-from-Motion with Featuremetric Refinement (Oral)

VolumeFusion: Deep Depth Fusion for 3D Scene Reconstruction

Back to Table of Contents

Font Generation

✔️Multiple Heads are Better than One: Few-shot Font Generation with Multiple Localized Experts

Back to Table of Contents

Text Detection

Adaptive Boundary Proposal Network for Arbitrary Shape Text Detection

Back to Table of Contents

Text Recognition

From Two to One: A New Scene Text Recognizer with Visual Language Modeling Network

Joint Visual Semantic Reasoning: Multi-Stage Decoder for Text Recognition

Back to Table of Contents

Scene Text Recognizer

Data Augmentation for Scene Text Recognition

From Two to One: A New Scene Text Recognizer with Visual Language Modeling Network

Back to Table of Contents

Autonomous-Driving

End-to-End Urban Driving by Imitating a Reinforcement Learning Coach

FOVEA: Foveated Image Magnification for Autonomous Navigation

Learning to drive from a world on rails

MAAD: A Model and Dataset for "Attended Awareness" in Driving

MultiSiam: Self-supervised Multi-instance Siamese Representation Learning for Autonomous Driving

NEAT: Neural Attention Fields for End-to-End Autonomous Driving

Road-Challenge-Event-Detection-for-Situation-Awareness-in-Autonomous-Driving

Safety-aware Motion Prediction with Unseen Vehicles for Autonomous Driving

Back to Table of Contents

Visdrone_detection

ICCV2021_Visdrone_detection

Back to Table of Contents

Anomaly Detection

DRÆM -- A discriminatively trained reconstruction embedding for surface anomaly detection

Weakly-supervised Video Anomaly Detection with Robust Temporal Feature Magnitude Learning

Others

Cross-Camera Convolutional Color Constancy

Learnable Boundary Guided Adversarial Training

Prior-Enhanced network with Meta-Prototypes (PEMP)

MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding

Generalized Shuffled Linear Regression (Oral)

VLGrammar: Grounded Grammar Induction of Vision and Language

A New Journey from SDRTV to HDRTV

IICNet: A Generic Framework for Reversible Image Conversion

Structure-Preserving Deraining with Residue Channel Prior Guidance

Learning with Noisy Labels via Sparse Regularization

Neural Strokes: Stylized Line Drawing of 3D Shapes

COOKIE: Contrastive Cross-Modal Knowledge Sharing Pre-training for Vision-Language Representation

RINDNet: Edge Detection for Discontinuity in Reflectance, Illumination, Normal and Depth

ELLIPSDF: Joint Object Pose and Shape Optimization with a Bi-level Ellipsoid and Signed Distance Function Description

Unlimited Neighborhood Interaction for Heterogeneous Trajectory Prediction

CanvasVAE: Learning to Generate Vector Graphic Documents

Refining activation downsampling with SoftPool

Aligning Latent and Image Spaces to Connect the Unconnectable

Unifying Nonlocal Blocks for Neural Networks

SLAMP: Stochastic Latent Appearance and Motion Prediction

TransForensics: Image Forgery Localization with Dense Self-Attention

Learning Facial Representations from the Cycle-consistency of Face

NASOA: Towards Faster Task-oriented Online Fine-tuning with a Zoo of Models

Impact of Aliasing on Generalization in Deep Convolutional Networks

Learning Canonical 3D Object Representation for Fine-Grained Recognition

UniNet: A Unified Scene Understanding Network and Exploring Multi-Task Relationships through the Lens of Adversarial Attacks

SUNet: Symmetric Undistortion Network for Rolling Shutter Correction

Learning to Cut by Watching Movies

Continual Neural Mapping: Learning An Implicit Scene Representation from Sequential Observations

Towers of Babel: Combining Images, Language, and 3D Geometry for Learning Multimodal Vision

Towards Interpretable Deep Metric Learning with Structural Matching

m-RevNet: Deep Reversible Neural Networks with Momentum

DiagViB-6: A Diagnostic Benchmark Suite for Vision Models in the Presence of Shortcut and Generalization Opportunities

perf4sight: A toolflow to model CNN training performance on Edge GPUs

MT-ORL: Multi-Task Occlusion Relationship Learning

ProAI: An Efficient Embedded AI Hardware for Automotive Applications - a Benchmark Study

SPACE: A Simulator for Physical Interactions and Causal Learning in 3D Environments

CODEs: Chamfer Out-of-Distribution Examples against Overconfidence Issue

Towards Real-World Prohibited Item Detection: A Large-Scale X-ray Benchmark

Pixel Difference Networks for Efficient Edge Detection

Online Continual Learning For Visual Food Classification

DICOM Imaging Router: An Open Deep Learning Framework for Classification of Body Parts from DICOM X-ray Scans

PIT: Position-Invariant Transform for Cross-FoV Domain Adaptation

Learning to Automatically Diagnose Multiple Diseases in Pediatric Chest Radiographs Using Deep Convolutional Neural Networks

FaPN: Feature-aligned Pyramid Network for Dense Image Prediction

Finding Representative Interpretations on Convolutional Neural Networks

Investigating transformers in the decomposition of polygonal shapes as point collections

Self-Supervised Pretraining and Controlled Augmentation Improve Rare Wildlife Recognition in UAV Images

Group-aware Contrastive Regression for Action Quality Assessment

End-to-End Dense Video Captioning with Parallel Decoding

PR-RRN: Pairwise-Regularized Residual-Recursive Networks for Non-rigid Structure-from-Motion

Scene Designer: a Unified Model for Scene Search and Synthesis from Sketch

Structured Outdoor Architecture Reconstruction by Exploration and Classification

Learning RAW-to-sRGB Mappings with Inaccurately Aligned Supervision

Overfitting the Data: Compact Neural Video Delivery via Content-aware Feature Modulation

Deep Hybrid Self-Prior for Full 3D Mesh Generation

FACIAL: Synthesizing Dynamic Talking Face with Implicit Attribute Learning

Thermal Image Processing via Physics-Inspired Deep Networks

Global Pooling, More than Meets the Eye: Position Information is Encoded Channel-Wise in CNNs

Speech Drives Templates: Co-Speech Gesture Synthesis with Learned Templates

LOKI: Long Term and Key Intentions for Trajectory Prediction

Stochastic Scene-Aware Motion Prediction

Exploiting Multi-Object Relationships for Detecting Adversarial Attacks in Complex Scenes

Social Fabric: Tubelet Compositions for Video Relation Detection

Causal Attention for Unbiased Visual Recognition

Universal Cross-Domain Retrieval: Generalizing Across Classes and Domains

Amplitude-Phase Recombination: Rethinking Robustness of Convolutional Neural Networks in Frequency Domain

Learning to Match Features with Seeded Graph Matching Network

A Unified Objective for Novel Class Discovery

How to cheat with metrics in single-image HDR reconstruction

Towards Understanding the Generative Capability of Adversarially Robust Classifiers (Oral)

Airbert: In-domain Pretraining for Vision-and-Language Navigation

Out-of-boundary View Synthesis Towards Full-Frame Video Stabilization

PatchMatch-RL: Deep MVS with Pixelwise Depth, Normal, and Visibility

Continual Learning for Image-Based Camera Localization

Online Continual Learning with Natural Distribution Shifts: An Empirical Study with Visual Data

Detecting and Segmenting Adversarial Graphics Patterns from Images

TACo: Token-aware Cascade Contrastive Learning for Video-Text Alignment

BlockCopy: High-Resolution Video Processing with Block-Sparse Feature Propagation and Online Policies

Learning Signed Distance Field for Multi-view Surface Reconstruction (Oral)

Deep Relational Metric Learning

Ranking Models in Unlabeled New Environments

Patch2CAD: Patchwise Embedding Learning for In-the-Wild Shape Retrieval from a Single Image

LSD-StructureNet: Modeling Levels of Structural Detail in 3D Part Hierarchies

BiaSwap: Removing dataset bias with bias-tailored swapping augmentation

LoOp: Looking for Optimal Hard Negative Embeddings for Deep Metric Learning

Learning of Visual Relations: The Devil is in the Tails

Bridging Unsupervised and Supervised Depth from Focus via All-in-Focus Supervision

Support-Set Based Cross-Supervision for Video Grounding

Fast Robust Tensor Principal Component Analysis via Fiber CUR Decomposition

Improving Generalization of Batch Whitening by Convolutional Unit Optimization

CSG-Stump: A Learning Friendly CSG-Like Representation for Interpretable Shape Parsing

NGC: A Unified Framework for Learning with Open-World Noisy Data

LocTex: Learning Data-Efficient Visual Representations from Localized Textual Supervision

The Surprising Effectiveness of Visual Odometry Techniques for Embodied PointGoal Navigation

Learning Cross-modal Contrastive Features for Video Domain Adaptation

Lifelong Infinite Mixture Model Based on Knowledge-Driven Dirichlet Process

A Dual Adversarial Calibration Framework for Automatic Fetal Brain Biometry

LUAI Challenge 2021 on Learning to Understand Aerial Images

Embedding Novel Views in a Single JPEG Image

Learning to Discover Reflection Symmetry via Polar Matching Convolution

Deep 3D Mask Volume for View Synthesis of Dynamic Scenes

Cross-category Video Highlight Detection via Set-based Learning

Sparse to Dense Motion Transfer for Face Image Animation

SlowFast Rolling-Unrolling LSTMs for Action Anticipation in Egocentric Videos

4D-Net for Learned Multi-Modal Alignment

The Power of Points for Modeling Humans in Clothing

The Functional Correspondence Problem

On the Limits of Pseudo Ground Truth in Visual Camera Re-localisation

Towards Learning Spatially Discriminative Feature Representations

Learning Fast Sample Re-weighting Without Reward Data

CTRL-C: Camera calibration TRansformer with Line-Classification

PR-Net: Preference Reasoning for Personalized Video Highlight Detection

Dual Transfer Learning for Event-based End-task Prediction via Pluggable Event to Image Translation

Learning to Generate Scene Graph from Natural Language Supervision

Parsing Table Structures in the Wild

Hierarchical Object-to-Zone Graph for Object Navigation

Square Root Marginalization for Sliding-Window Bundle Adjustment

YouRefIt: Embodied Reference Understanding with Language and Gesture

Deep Hough Voting for Robust Global Registration

Estimating Leaf Water Content using Remotely Sensed Hyperspectral Data

What Matters for Ad-hoc Video Search? A Large-scale Evaluation on TRECVID

Shape-Biased Domain Generalization via Shock Graph Embeddings

Explain Me the Painting: Multi-Topic Knowledgeable Art Description Generation

Learning Indoor Inverse Rendering with 3D Spatially-Varying Lighting (Oral)

Multiresolution Deep Implicit Functions for 3D Shape Representation

Image Shape Manipulation from a Single Augmented Training Sample (Oral)

ZFlow: Gated Appearance Flow-based Virtual Try-on with 3D Priors

Contact-Aware Retargeting of Skinned Motion

DisUnknown: Distilling Unknown Factors for Disentanglement Learning

FSER: Deep Convolutional Neural Networks for Speech Emotion Recognition

A Pathology Deep Learning System Capable of Triage of Melanoma Specimens Utilizing Dermatopathologist Consensus as Ground Truth

PIRenderer: Controllable Portrait Image Generation via Semantic Neural Rendering

The First Vision For Vitals (V4V) Challenge for Non-Contact Video-Based Physiological Estimation

FaceEraser: Removing Facial Parts for Augmented Reality

S3VAADA: Submodular Subset Selection for Virtual Adversarial Active Domain Adaptation

JEM++: Improved Techniques for Training JEM

Rational Polynomial Camera Model Warping for Deep Learning Based Satellite Multi-View Stereo Matching

Long Short View Feature Decomposition via Contrastive Video Representation Learning

Visual Scene Graphs for Audio Source Separation

Meta-Aggregator: Learning to Aggregate for 1-bit Graph Neural Networks

Modelling Neighbor Relation in Joint Space-Time Graph for Video Correspondence Learning

Meta Learning on a Sequence of Imbalanced Domains with Difficulty Awareness

Sensor-Guided Optical Flow

CrossCLR: Cross-modal Contrastive Learning For Multi-modal Video Representations

Video Autoencoder: self-supervised disentanglement of static 3D structure and motion

Topologically Consistent Multi-View Face Inference Using Volumetric Sampling

Extensions of Karger's Algorithm: Why They Fail in Theory and How They Are Useful in Practice (Oral)

HighlightMe: Detecting Highlights from Human-Centric Videos

How You Move Your Head Tells What You Do: Self-supervised Video Representation Learning with Egocentric Cameras and IMU Sensors

Structured Bird's-Eye-View Traffic Scene Understanding from Onboard Images

Waypoint Models for Instruction-guided Navigation in Continuous Environments

Procedure Planning in Instructional Videos via Contextual Modeling and Model-based Policy Learning (Oral)

De-rendering Stylized Texts

Spatio-Temporal Video Representation Learning for AI Based Video Playback Style Prediction

Keypoint Communities

Calibrating Concepts and Operations: Towards Symbolic Reasoning on Real Images

A Hierarchical Variational Neural Uncertainty Model for Stochastic Video Prediction (Oral)

2nd Place Solution to Google Landmark Retrieval 2021

Learning Realistic Human Reposing using Cyclic Self-Supervision with 3D Shape, Pose, and Appearance Consistency

Pano-AVQA: Grounded Audio-Visual Question Answering on 360° Videos

Omnidata: A Scalable Pipeline for Making Multi-Task Mid-Level Vision Datasets from 3D Scans

BuildingNet: Learning to Label 3D Buildings (Oral)

SOMA: Solving Optical Marker-Based MoCap Automatically

Topic Scene Graph Generation by Attention Distillation from Caption

Winning the ICCV'2021 VALUE Challenge: Task-aware Ensemble and Transfer Learning with Visual Concepts

Understanding of Emotion Perception from Art

Nuisance-Label Supervision: Robustness Improvement by Free Labels

Simple Baseline for Single Human Motion Forecasting

PixelPyramids: Exact Inference Models from Lossless Image Pyramids

Back to Table of Contents