DWCTOD/ECCV2022-Papers-with-Code-Demo

ECCV2022-Papers-with-Code-Demo

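A collection of the latest ECCV results, including papers, code, and demo videos. Recommendations are welcome!

Follow our WeChat official account: AI算法与图像处理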

☪️ Bonus: register to receive 200 yuan of computing resources: https://www.bkunyun.com/wap/console?source=aistudy (usage instructions)

🌟 ECCV 2022: continuously updated with the latest papers and their corresponding open-source code!

🚗 ECCV 2022 accepted paper ID list: https://ailb-web.ing.unimore.it/releases/eccv2022/accepted_papers.txt

🚗 Official website: https://eccv2022.ecva.net

Bilibili demos: https://space.bilibili.com/288489574

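✋ Note: You are welcome to submit issues sharing ECCV 2022 papers and open-source projects, and to help us improve this project together.

Paper collections from previous top conferences: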

CVPR2022

CVPR2021

ICCV2021

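🎆 Join the group | Welcome

The ECCV 2022 paper discussion group has been set up! If your paper has been accepted, you can add WeChat: nvshenj125, with the note: ECCV + name + school/company name. Be sure to follow this format when applying so that you can be added to the group.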

🔨 Table of Contents (click to jump)

Contents (click on the right to collapse)

Dataset

COO: Comic Onomatopoeia Dataset for Recognizing Arbitrary or Truncated Texts

Exploring Fine-Grained Audiovisual Categorization with the SSW60 Dataset

BRACE: The Breakdancing Competition Dataset for Dance Motion Synthesis

CelebV-HQ: A Large-Scale Video Facial Attributes Dataset

Ithaca365: Dataset and Driving Perception under Repeated and Challenging Weather Conditions

Fine-Grained Egocentric Hand-Object Segmentation: Dataset, Model, and Applications

TRoVE: Transforming Road Scene Datasets into Photorealistic Virtual Environments

Back to contents

Image Classification

Tree Structure-Aware Few-Shot Image Classification via Hierarchical Aggregation

Bagging Regional Classification Activation Maps for Weakly Supervised Object Localization

Tip-Adapter: Training-free Adaption of CLIP for Few-shot Classification

Invariant Feature Learning for Generalized Long-Tailed Classification

RealFlow: EM-based Realistic Optical Flow Dataset Generation from Videos

PLMCL: Partial-Label Momentum Curriculum Learning for Multi-Label Image Classification

Back to contents

GAN

Ultra-high-resolution unpaired stain transformation via Kernelized Instance Normalization

Accelerating Score-based Generative Models with Preconditioned Diffusion Sampling

CCPL: Contrastive Coherence Preserving Loss for Versatile Style Transfer

Fast-Vid2Vid: Spatial-Temporal Compression for Video-to-Video Synthesis

RepMix: Representation Mixing for Robust Attribution of Synthesized Images

VecGAN: Image-to-Image Translation with Interpretable Latent Directions

Context-Consistent Semantic Image Editing with Style-Preserved Modulation

DynaST: Dynamic Sparse Transformer for Exemplar-Guided Image Generation

Supervised Attribute Information Removal and Reconstruction for Image Manipulation

Adaptive Feature Interpolation for Low-Shot Image Generation

WaveGAN: Frequency-aware GAN for High-Fidelity Few-shot Image Generation

FakeCLR: Exploring Contrastive Learning for Solving Latent Discontinuity in Data-Efficient GANs

Outpainting by Queries

Single Stage Virtual Try-on via Deformable Attention Flows

Structure-aware Editable Morphable Model for 3D Facial Detail Animation and Manipulation

Monocular 3D Object Reconstruction with GAN Inversion

Generative Multiplane Images: Making a 2D GAN 3D-Aware

DeltaGAN: Towards Diverse Few-shot Image Generation with Sample-Specific Delta

Injecting 3D Perception of Controllable NeRF-GAN into StyleGAN for Editable Portrait Image Synthesis

SGBANet: Semantic GAN and Balanced Attention Network for Arbitrarily Oriented Scene Text Recognition

2D GANs Meet Unsupervised Single-view 3D Reconstruction

InfiniteNature-Zero: Learning Perpetual View Generation of Natural Scenes from Single Images

Auto-regressive Image Synthesis with Integrated Quantization

Compositional Human-Scene Interaction Synthesis with Semantic Control

Generator Knows What Discriminator Should Learn in Unconditional GANs

StyleLight: HDR Panorama Generation for Lighting Estimation and Editing

Cross Attention Based Style Distribution for Controllable Person Image Synthesis

SKDCGN: Source-free Knowledge Distillation of Counterfactual Generative Networks using cGANs

Hierarchical Semantic Regularization of Latent Spaces in StyleGANs

Style Your Hair: Latent Optimization for Pose-Invariant Hairstyle Transfer via Local-Style-Aware Hair Alignment

Paint2Pix: Interactive Painting based Progressive Image Synthesis and Editing

Mind the Gap in Distilling StyleGANs

ModSelect: Automatic Modality Selection for Synthetic-to-Real Domain Generalization

FurryGAN: High Quality Foreground-aware Image Synthesis

Improving GANs for Long-Tailed Data through Group Spectral Regularization

Unrestricted Black-box Adversarial Attack Using GAN with Limited Queries

3D-FM GAN: Towards 3D-Controllable Face Manipulation

High-Fidelity Image Inpainting with GAN Inversion

Bokeh-Loss GAN: Multi-Stage Adversarial Training for Realistic Edge-Aware Bokeh

Exploring Gradient-based Multi-directional Controls in GANs

Studying Bias in GANs through the Lens of Race

Improved Masked Image Generation with Token-Critic

Weakly-Supervised Stitching Network for Real-World Panoramic Image Generation

Back to contents

NeRF

Streamable Neural Fields

Injecting 3D Perception of Controllable NeRF-GAN into StyleGAN for Editable Portrait Image Synthesis

AdaNeRF: Adaptive Sampling for Real-time Rendering of Neural Radiance Fields

PS-NeRF: Neural Inverse Rendering for Multi-view Photometric Stereo

Neural-Sim: Learning to Generate Training Data with NeRF

Neural Density-Distance Fields

HDR-Plenoxels: Self-Calibrating High Dynamic Range Radiance Fields

Back to contents

Visual Transformer

k-means Mask Transformer

Weakly Supervised Grounding for VQA in Vision-Language Transformers

Wave-ViT: Unifying Wavelet and Transformers for Visual Representation Learning

CoMER: Modeling Coverage for Transformer-based Handwritten Mathematical Expression Recognition

Towards Hard-Positive Query Mining for DETR-based Human-Object Interaction Detection

Hunting Group Clues with Transformers for Social Group Activity Recognition

Entry-Flipped Transformer for Inference and Prediction of Participant Behavior

DynaST: Dynamic Sparse Transformer for Exemplar-Guided Image Generation

Global-local Motion Transformer for Unsupervised Skeleton-based Action Learning

TokenMix: Rethinking Image Mixing for Data Augmentation in Vision Transformers

TS2-Net: Token Shift and Selection Transformer for Text-Video Retrieval

Action Quality Assessment with Temporal Parsing Transformer

GRIT: Faster and Better Image captioning Transformer Using Dual Visual Features

Hierarchically Self-Supervised Transformer for Human Skeleton Representation Learning

AiATrack: Attention in Attention for Transformer Visual Tracking

Single Frame Atmospheric Turbulence Mitigation: A Benchmark Study and A New Physics-Inspired Transformer Model

TinyViT: Fast Pretraining Distillation for Small Vision Transformers

An Efficient Spatio-Temporal Pyramid Transformer for Action Detection

Weakly Supervised Object Localization via Transformer with Implicit Spatial Calibration

SeedFormer: Patch Seeds based Point Cloud Completion with Upsample Transformer

Cost Aggregation with 4D Convolutional Swin Transformer for Few-Shot Segmentation

IGFormer: Interaction Graph Transformer for Skeleton-based Human Interaction Recognition

3D Siamese Transformer Network for Single Object Tracking on Point Clouds

Reference-based Image Super-Resolution with Deformable Attention Transformer

SiRi: A Simple Selective Retraining Mechanism for Transformer-based Visual Grounding

Online Continual Learning with Contrastive Vision Transformer

Cross-Attention of Disentangled Modalities for 3D Human Mesh Recovery with Transformers

Toward Understanding WordArt: Corner-Guided Transformer for Scene Text Recognition

TransMatting: Enhancing Transparent Objects Matting with Transformers

Ghost-free High Dynamic Range Imaging with Context-aware Transformer

Back to contents

Multimodal

Audio-Visual Segmentation

Cross-modal Prototype Driven Network for Radiology Report Generation

Hierarchical Latent Structure for Multi-Modal Vehicle Trajectory Forecasting

UniNet: Unified Architecture Search with Convolution, Transformer, and MLP

Video Graph Transformer for Video Question Answering

Bootstrapped Masked Autoencoders for Vision BERT Pretraining

Learning Mutual Modulation for Self-Supervised Cross-Modal Super-Resolution

Exploiting Unlabeled Data with Vision and Language Models for Object Detection

LocVTP: Video-Text Pre-training for Temporal Localization

Inductive and Transductive Few-Shot Video Classification via Appearance and Temporal Alignments

Cross-Modal 3D Shape Generation and Manipulation

Learning Visual Representation from Modality-Shared Contrastive Language-Image Pre-training

Frozen CLIP Models are Efficient Video Learners

Consistency-based Self-supervised Learning for Temporal Anomaly Localization

Motion Sensitive Contrastive Learning for Self-supervised Video Representation

TL;DW? Summarizing Instructional Videos with Task Relevance & Cross-Modal Saliency

See Finer, See More: Implicit Modality Alignment for Text-based Person Retrieval

Learning an Efficient Multimodal Depth Completion Model

Learning from Unlabeled 3D Environments for Vision-and-Language Navigation

CMD: Self-supervised 3D Action Representation Learning with Cross-modal Mutual Distillation

StoryDALL-E: Adapting Pretrained Text-to-Image Transformers for Story Continuation

MUST-VQA: MUltilingual Scene-text VQA

Back to contents

Vision-Language

Vision-Language Adaptive Mutual Decoder for OOV-STR

Back to contents

Domain Adaptation

Concurrent Subsidiary Supervision for Unsupervised Source-Free Domain Adaptation

Back to contents

Contrastive Learning

Network Binarization via Contrastive Learning

Contrastive Deep Supervision

ConCL: Concept Contrastive Learning for Dense Prediction Pre-training in Pathology Images

Action-based Contrastive Learning for Trajectory Prediction

FakeCLR: Exploring Contrastive Learning for Solving Latent Discontinuity in Data-Efficient GANs

Adversarial Contrastive Learning via Asymmetric InfoNCE

Fast-MoCo: Boost Momentum-based Contrastive Learning with Combinatorial Patches

Decoupled Adversarial Contrastive Learning for Self-supervised Adversarial Robustness

Bi-directional Contrastive Learning for Domain Adaptive Semantic Segmentation

Patient-level Microsatellite Stability Assessment from Whole Slide Images By Combining Momentum Contrast Learning and Group Patch Embeddings

FairDisCo: Fairer AI in Dermatology via Disentanglement Contrastive Learning

CODER: Coupled Diversity-Sensitive Momentum Contrastive Learning for Image-Text Retrieval

Back to contents

Object Detection

Dense Teacher: Dense Pseudo-Labels for Semi-supervised Object Detection

Should All Proposals be Treated Equally in Object Detection?

HEAD: HEtero-Assists Distillation for Heterogeneous Object Detectors

Adversarially-Aware Robust Object Detector

ObjectBox: From Centers to Boxes for Anchor-Free Object Detection

Point-to-Box Network for Accurate Object Detection via Single Point Supervision

DID-M3D: Decoupling Instance Depth for Monocular 3D Object Detection

SPSN: Superpixel Prototype Sampling Network for RGB-D Salient Object Detection

Rethinking IoU-based Optimization for Single-stage 3D Object Detection

Densely Constrained Depth Estimator for Monocular 3D Object Detection

Robust Object Detection With Inaccurate Bounding Boxes

Unsupervised Domain Adaptation for One-stage Object Detector using Offsets to Bounding Box

AutoAlignV2: Deformable Feature Aggregation for Dynamic Multi-Modal 3D Object Detection

Rethinking Few-Shot Object Detection on a Multi-Domain Benchmark

DEVIANT: Depth EquiVarIAnt NeTwork for Monocular 3D Object Detection

Active Learning Strategies for Weakly-supervised Object Detection

W2N: Switching From Weak Supervision to Noisy Supervision for Object Detection

Salient Object Detection for Point Clouds

UC-OWOD: Unknown-Classified Open World Object Detection

Monocular 3D Object Detection with Depth from Motion

Exploring Resolution and Degradation Clues as Self-supervised Signal for Low Quality Object Detection

Graph R-CNN: Towards Accurate 3D Object Detection with Semantic-Decorated Local Graph

Object Discovery via Contrastive Learning for Weakly Supervised Object Detection

RFLA: Gaussian Receptive Field based Label Assignment for Tiny Object Detection

Object Detection in Aerial Images with Uncertainty-Aware Graph Network

Adversarial Vulnerability of Temporal Feature Networks for Object Detection

Identifying Out-of-Distribution Samples in Real-Time for Safety-Critical 2D Object Detection with Margin Entropy Loss

CenterFormer: Center-based Transformer for 3D Object Detection

Back to contents

Object Tracking

Tracking Objects as Pixel-wise Distributions

Towards Grand Unification of Object Tracking

The Caltech Fish Counting Dataset: A Benchmark for Multiple-Object Tracking and Counting

MOTCOM: The Multi-Object Tracking Dataset Complexity Metric

Robust Landmark-based Stent Tracking in X-ray Fluoroscopy

AiATrack: Attention in Attention for Transformer Visual Tracking

3D Siamese Transformer Network for Single Object Tracking on Point Clouds

Tracking Every Thing in the Wild

AvatarPoser: Articulated Full-Body Pose Tracking from Sparse Motion Sensing

Robust Multi-Object Tracking by Marginal Inference

Towards Sequence-Level Training for Visual Tracking

Back to contents

Segmentation

Domain Adaptive Video Segmentation via Temporal Pseudo Supervision

OSFormer: One-Stage Camouflaged Instance Segmentation with Transformers

PseudoClick: Interactive Image Segmentation with Click Imitation

XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model

Tackling Background Distraction in Video Object Segmentation

Dense Cross-Query-and-Support Attention Weighted Mask Aggregation for Few-Shot Segmentation

Hierarchical Feature Alignment Network for Unsupervised Video Object Segmentation

Open-world Semantic Segmentation via Contrasting and Clustering Vision-Language Embedding

Learning Quality-aware Dynamic Memory for Video Object Segmentation

Box-supervised Instance Segmentation with Level Set Evolution

ML-BPM: Multi-teacher Learning with Bidirectional Photometric Mixing for Open Compound Domain Adaptation in Semantic Segmentation

Self-Supervised Interactive Object Segmentation Through a Singulation-and-Grasping Approach

DecoupleNet: Decoupled Network for Domain Adaptive Semantic Segmentation

CoSMix: Compositional Semantic Mix for Domain Adaptation in 3D LiDAR Segmentation

GIPSO: Geometrically Informed Propagation for Online Adaptation in 3D LiDAR Segmentation

Online Domain Adaptation for Semantic Segmentation in Ever-Changing Conditions

In Defense of Online Models for Video Instance Segmentation

Mining Relations among Cross-Frame Affinities for Video Semantic Segmentation

Long-tailed Instance Segmentation using Gumbel Optimized Loss

Bi-directional Contrastive Learning for Domain Adaptive Semantic Segmentation

Cost Aggregation with 4D Convolutional Swin Transformer for Few-Shot Segmentation

Self-Support Few-Shot Semantic Segmentation

Active Pointly-Supervised Instance Segmentation

Video Mask Transfiner for High-Quality Video Instance Segmentation

Doubly Deformable Aggregation of Covariance Matrices for Few-shot Segmentation

Per-Clip Video Object Segmentation

Cluster-to-adapt: Few Shot Domain Adaptation for Semantic Segmentation across Disjoint Labels

Generalizable Medical Image Segmentation via Random Amplitude Mixup and Domain-Specific Image Restoration

Fine-Grained Egocentric Hand-Object Segmentation: Dataset, Model, and Applications

Multi-Granularity Distillation Scheme Towards Lightweight Semi-Supervised Semantic Segmentation

Occlusion-Aware Instance Segmentation via BiLayer Network Architectures

Back to contents

Video Segmentation

Video Mask Transfiner for High-Quality Video Instance Segmentation

Back to contents

Medical Image Segmentation

Personalizing Federated Medical Image Segmentation via Local Calibration

Learning Topological Interactions for Multi-Class Medical Image Segmentation

qDWI-Morph: Motion-compensated quantitative Diffusion-Weighted MRI analysis for fetal lung maturity assessment

Self-Supervised Pretraining for 2D Medical Image Segmentation

Back to contents

Knowledge Distillation

Knowledge Condensation Distillation

FedX: Unsupervised Federated Learning with Cross Knowledge Distillation

Back to contents

Action Detection

ReAct: Temporal Action Detection with Relational Queries

Semi-Supervised Temporal Action Detection with Proposal-Free Masking

Temporal Action Detection with Global Segmentation Mask Learning

Weakly-Supervised Temporal Action Detection for Fine-Grained Videos with Hierarchical Atomic Actions

HaloAE: An HaloNet based Local Transformer Auto-Encoder for Anomaly Detection and Localization

Back to contents

Action Recognition

Compound Prototype Matching for Few-shot Action Recognition

Collaborating Domain-shared and Target-specific Feature Clustering for Cross-domain 3D Action Recognition

Combined CNN Transformer Encoder for Enhanced Fine-grained Human Action Recognition

PSUMNet: Unified Modality Part Streams are All You Need for Efficient Pose-based Action Recognition

Lane Change Classification and Prediction with Action Recognition Networks

Dynamic Spatio-Temporal Specialization Learning for Fine-Grained Action Recognition

Back to contents

Anomaly Detection

Registration based Few-Shot Anomaly Detection

Look at Adjacent Frames: Video Anomaly Detection without Offline Training

Towards Open Set Video Anomaly Detection

Back to contents

Face Recognition

Controllable and Guided Face Synthesis for Unconstrained Face Recognition

Towards Robust Face Recognition with Comprehensive Search

Back to contents

Human Pose Estimation

Self-Constrained Inference Optimization on Structural Groups for Human Pose Estimation

Category-Level 6D Object Pose and Size Estimation using Self-Supervised Deep Prior Deformation Networks

Global-local Motion Transformer for Unsupervised Skeleton-based Action Learning

TransGrasp: Grasp Pose Estimation of a Category of Objects by Transferring Grasps from Only One Labeled Instance

Pose for Everything: Towards Category-Agnostic Pose Estimation

C3P: Cross-domain Pose Prior Propagation for Weakly Supervised 3D Human Pose Estimation

3D Interacting Hand Pose Estimation by Hand De-occlusion and Removal

Faster VoxelPose: Real-time 3D Human Pose Estimation by Orthographic Projection

ShAPO: Implicit Representations for Multi-Object Shape, Appearance, and Pose Optimization

RBP-Pose: Residual Bounding Box Projection for Category-Level Pose Estimation

Neural Correspondence Field for Object Pose Estimation

Explicit Occlusion Reasoning for Multi-person 3D Human Pose Estimation

CLIFF: Carrying Location Information in Full Frames into Human Pose and Shape Estimation

PoseTrans: A Simple Yet Effective Pose Transformation Augmentation for Human Pose Estimation

Towards Unbiased Label Distribution Learning for Facial Pose Estimation Using Anisotropic Spherical Gaussian

Learning Visibility for Robust Dense Human Body Estimation

Back to contents

Face Anti-Spoofing

Generative Domain Adaptation for Face Anti-Spoofing

Multi-domain Learning for Updating Face Anti-spoofing Models

Back to contents

Facial Attribute Recognition

FairGRAPE: Fairness-aware GRAdient Pruning mEthod for Face Attribute Classification

Back to contents

Face

On Mitigating Hard Clusters for Face Clustering

Learning Dynamic Facial Radiance Fields for Few-Shot Talking Head Synthesis

Perspective Reconstruction of Human Faces by Joint Mesh and Landmark Regression

Back to contents

3D reconstruction

Latent Partition Implicit with Surface Codes for 3D Representation

LWA-HAND: Lightweight Attention Hand for Interacting Hand Reconstruction

SimpleRecon: 3D Reconstruction Without 3D Convolutions

Back to contents

Human Reconstruction

3D Clothed Human Reconstruction in the Wild

UNIF: United Neural Implicit Functions for Clothed Human Reconstruction and Animation

The One Where They Reconstructed 3D Humans and Environments in TV Shows

BCom-Net: Coarse-to-Fine 3D Textured Body Shape Completion Network

Neural Capture of Animatable 3D Human from Monocular Video

Back to contents

Relighting

Geometry-aware Single-image Full-body Human Relighting

Relighting4D: Neural Relightable Human from Videos

Back to contents

DeepFake

Detecting and Recovering Sequential DeepFake Manipulation

An Efficient Method for Face Quality Assessment on the Edge

Back to contents

OCR

Character decomposition to resolve class imbalance problem in Hangul OCR

Shift Variance in Scene Text Detection

1st Place Solution to ECCV 2022 Challenge on Out of Vocabulary Scene Text Understanding: End-to-End Recognition of Out of Vocabulary Words

Levenshtein OCR

Back to contents

Text Recognition

Scene Text Recognition with Permuted Autoregressive Sequence Models

Dynamic Low-Resolution Distillation for Cost-Efficient End-to-End Text Spotting

Contextual Text Block Detection towards Scene Text Understanding

GLASS: Global to Local Attention for Scene-Text Spotting

Multi-Granularity Prediction for Scene Text Recognition

Back to contents

Point Cloud

Open-world Semantic Segmentation for LIDAR Point Clouds

2DPASS: 2D Priors Assisted Semantic Segmentation on LiDAR Point Clouds

CPO: Change Robust Panorama to Point Cloud Localization

diffConv: Analyzing Irregular Point Clouds with an Irregular View

CATRE: Iterative Point Clouds Alignment for Category-level Object Pose Refinement

Dual Adaptive Transformations for Weakly Supervised Point Cloud Segmentation

SeedFormer: Patch Seeds based Point Cloud Completion with Upsample Transformer

Dynamic 3D Scene Analysis by Point Cloud Accumulation

3D Siamese Transformer Network for Single Object Tracking on Point Clouds

Salient Object Detection for Point Clouds

MonteBoxFinder: Detecting and Filtering Primitives to Fit a Noisy Point Cloud

Improving RGB-D Point Cloud Registration by Learning Multi-scale Local Linear Transformation

Learning to Generate Realistic LiDAR Point Clouds

Back to contents

Flow Estimation

Bi-PointFlowNet: Bidirectional Learning for Point Cloud Based Scene Flow Estimation

What Matters for 3D Scene Flow Network

Deep 360° Optical Flow Estimation Based on Multi-Projection Fusion

Back to contents

Depth Estimation

Physical Attack on Monocular Depth Estimation with Optimal Adversarial Patches

Towards Scale-Aware, Robust, and Generalizable Unsupervised Monocular Depth Estimation by Integrating IMU Motion Dynamics

RA-Depth: Resolution Adaptive Self-Supervised Monocular Depth Estimation

Self-distilled Feature Aggregation for Self-supervised Monocular Depth Estimation

Back to contents

Lane Detection

RCLane: Relay Chain Prediction for Lane Detection

Back to contents

Trajectory Prediction

Action-based Contrastive Learning for Trajectory Prediction

Learning Pedestrian Group Representations for Multi-modal Trajectory Prediction

Aware of the History: Trajectory Forecasting with the Local Behavior Data

Human Trajectory Prediction via Neural Social Physics

D2-TPred: Discontinuous Dependency for Trajectory Prediction under Traffic Lights

Back to contents

Super-Resolution

Image Super-Resolution with Deep Dictionary

Learning Mutual Modulation for Self-Supervised Cross-Modal Super-Resolution

CADyQ: Content-Aware Dynamic Quantization for Image Super-Resolution

Towards Interpretable Video Super-Resolution via Alternating Optimization

Reference-based Image Super-Resolution with Deformable Attention Transformer

Learning Spatiotemporal Frequency-Transformer for Compressed Video Super-Resolution

HST: Hierarchical Swin Transformer for Compressed Image Super-resolution

DSR: Towards Drone Image Super-Resolution

Back to contents

Image Denoising

Optimizing Image Compression via Joint Learning with Denoising

Back to contents

Image Deblurring

Spatio-Temporal Deformable Attention Network for Video Deblurring

Efficient Video Deblurring Guided by Motion Magnitude

Learning Degradation Representations for Image Deblurring

Towards Real-World Video Deblurring by Exploring Blur Formation Process

Back to contents

Image Restoration

D2HNet: Joint Denoising and Deblurring with Hierarchical Network for Robust Night Image Restoration

Back to contents

Image Inpainting

Flow-Guided Transformer for Video Inpainting

Unbiased Multi-Modality Guidance for Image Inpainting

Back to contents

Image Enhancement

Unsupervised Night Image Enhancement: When Layer Decomposition Meets Light-Effects Suppression

Back to contents

Video Interpolation

Video Interpolation by Event-driven Anisotropic Adjustment of Optical Flow

Back to contents

Temporal Action Segmentation

Unified Fully and Timestamp Supervised Temporal Action Segmentation via Sequence to Sequence Translation

Back to contents

Image Retrieval

Feature Representation Learning for Unsupervised Cross-domain Image Retrieval

A Sketch Is Worth a Thousand Words: Image Retrieval with Text and Sketch

CODER: Coupled Diversity-Sensitive Momentum Contrastive Learning for Image-Text Retrieval

Back to contents

Lossy Image Compression with Conditional Diffusion Models

Back to contents

Other

Embedding contrastive unsupervised features to cluster in- and out-of-distribution noise in corrupted image datasets

GraphVid: It Only Takes a Few Nodes to Understand a Video

Target-absent Human Attention

Lottery Ticket Hypothesis for Spiking Neural Networks

Improving Covariance Conditioning of the SVD Meta-layer by Orthogonality

AvatarCap: Animatable Avatar Conditioned Monocular Human Volumetric Capture

DeepPS2: Revisiting Photometric Stereo Using Two Differently Illuminated Images

Learning Local Implicit Fourier Representation for Image Warping

SESS: Saliency Enhancing with Scaling and Sliding

TM2T: Stochastic and Tokenized Modeling for the Reciprocal Generation of 3D Human Motions and Texts

DenseHybrid: Hybrid Anomaly Detection for Dense Open-set Recognition

FAST-VQA: Efficient End-to-end Video Quality Assessment with Fragment Sampling

Towards Realistic Semi-Supervised Learning

OpenLDN: Learning to Discover Novel Classes for Open-World Semi-Supervised Learning

Predicting is not Understanding: Recognizing and Addressing Underspecification in Machine Learning

Factorizing Knowledge in Neural Networks

SuperTickets: Drawing Task-Agnostic Lottery Tickets from Supernets via Jointly Architecture Searching and Parameter Pruning

Video Dialog as Conversation about Objects Living in Space-Time

Demystifying Unsupervised Semantic Correspondence Estimation

A Closer Look at Invariances in Self-supervised Pre-training for 3D Vision

DCCF: Deep Comprehensible Color Filter Learning Framework for High-Resolution Image Harmonization

Batch-efficient EigenDecomposition for Small and Medium Matrices

Few 'Zero Level Set'-Shot Learning of Shape Signed Distance Functions in Feature Space

Camera Pose Auto-Encoders for Improving Pose Regression

Synergistic Self-supervised and Quantization Learning

Frequency Domain Model Augmentation for Adversarial Attack

Organic Priors in Non-Rigid Structure from Motion

Unsupervised Visual Representation Learning by Synchronous Momentum Grouping

Learning Implicit Templates for Point-Based Clothed Human Modeling

BayesCap: Bayesian Identity Cap for Calibrated Uncertainty in Frozen Neural Networks

Lipschitz Continuity Retained Binary Neural Network

3D Instances as 1D Kernels

ScaleNet: Searching for the Model to Scale

Rethinking Data Augmentation for Robust Visual Question Answering

Semantic Novelty Detection via Relational Reasoning

Label2Label: A Language Modeling Framework for Multi-Attribute Learning

Towards High-Fidelity Single-view Holistic Reconstruction of Indoor Scenes

Class-incremental Novel Class Discovery

MPIB: An MPI-Based Bokeh Rendering Framework for Realistic Partial Occlusion Effects

SepLUT: Separable Image-adaptive Lookup Tables for Real-time Image Enhancement

Learning with Recoverable Forgetting

Zero-Shot Temporal Action Detection via Vision-Language Prompting

Watermark Vaccine: Adversarial Attacks to Prevent Watermark Removal

FashionViL: Fashion-Focused Vision-and-Language Representation Learning

E-NeRV: Expedite Neural Video Representation with Disentangled Spatial-Temporal Context

Neural Color Operators for Sequential Image Retouching

Semi-Supervised Keypoint Detector and Descriptor for Retinal Image Matching

JPerceiver: Joint Perception Network for Depth, Pose and Layout Estimation in Driving Scenes

You Should Look at All Objects

NeFSAC: Neurally Filtered Minimal Samples

CLOSE: Curriculum Learning On the Sharing Extent Towards Better One-shot NAS

Cross-Domain Cross-Set Few-Shot Learning via Learning Compact and Aligned Representations

Self-calibrating Photometric Stereo by Neural Inverse Rendering

Learning Long-Term Spatial-Temporal Graphs for Active Speaker Detection

Towards Understanding The Semidefinite Relaxations of Truncated Least-Squares in Robust Rotation Search

PoserNet: Refining Relative Camera Poses Exploiting Object Detections

Geometric Features Informed Multi-person Human-object Interaction Recognition in Videos

Deep Semantic Statistics Matching (D2SM) Denoising Network

3D Room Layout Estimation from a Cubemap of Panorama Image via Deep Manhattan Hough Transform

NDF: Neural Deformable Fields for Dynamic Human Modelling

Self-Supervision Can Be a Good Few-Shot Learner

ParticleSfM: Exploiting Dense Point Trajectories for Localizing Moving Cameras in the Wild

MHR-Net: Multiple-Hypothesis Reconstruction of Non-Rigid Shapes from 2D Views

SelectionConv: Convolutional Neural Networks for Non-rectilinear Image Data

Prior-Guided Adversarial Initialization for Fast Adversarial Training

Prior Knowledge Guided Unsupervised Domain Adaptation

Discover and Mitigate Unknown Biases with Debiasing Alternate Networks

Difficulty-Aware Simulator for Open Set Recognition

Tailoring Self-Supervision for Supervised Learning

Overcoming Shortcut Learning in a Target Domain by Generalizing Basic Visual Factors from a Source Domain

Temporal and cross-modal attention for audio-visual zero-shot learning

Telepresence Video Quality Assessment

Towards Efficient and Scale-Robust Ultra-High-Definition Image Demoireing

Negative Samples are at Large: Leveraging Hard-distance Elastic Loss for Re-identification

Discrete-Constrained Regression for Local Counting Models

Resolving Copycat Problems in Visual Imitation Learning via Residual Action Prediction

Efficient Meta-Tuning for Content-aware Neural Video Delivery

Object-Compositional Neural Implicit Surfaces

Explaining Deepfake Detection by Analysing Image Matching

ERA: Expert Retrieval and Assembly for Early Action Prediction

Perspective Phase Angle Model for Polarimetric 3D Reconstruction

Explicit Image Caption Editing

Unsupervised Deep Multi-Shape Matching

Contributions of Shape, Texture, and Color in Visual Recognition

Novel Class Discovery without Forgetting

Approximate Differentiable Rendering with Algebraic Surfaces

FADE: Fusing the Assets of Decoder and Encoder for Task-Agnostic Upsampling

Error Compensation Framework for Flow-Guided Video Inpainting

NSNet: Non-saliency Suppression Sampler for Efficient Video Recognition

Temporal Saliency Query Network for Efficient Video Recognition

UFO: Unified Feature Optimization

OIMNet++: Prototypical Normalization and Localization-aware Learning for Person Search

Towards Accurate Open-Set Recognition via Background-Class Regularization

Grounding Visual Representations with Texts for Domain Generalization

SPIN: An Empirical Evaluation on Sharing Parameters of Isotropic Networks

MeshMAE: Masked Autoencoders for 3D Mesh Data Analysis

On Label Granularity and Object Localization

Spotting Temporally Precise, Fine-Grained Events in Video

Video Anomaly Detection by Solving Decoupled Spatio-Temporal Jigsaw Puzzles

GOCA: Guided Online Cluster Assignment for Self-Supervised Video Representation Learning

Visual Knowledge Tracing

Tackling Long-Tailed Category Distribution Under Domain Shifts

Latent Discriminant deterministic Uncertainty

Animation from Blur: Multi-modal Blur Decomposition with Motion Guidance

Bitwidth-Adaptive Quantization-Aware Neural Network Training: A Meta-Learning Approach

Structural Causal 3D Reconstruction

AudioScopeV2: Audio-Visual Attention Architectures for Calibrated Open-Domain On-Screen Sound Separation

Continual Variational Autoencoder Learning via Online Cooperative Memorization

Panoptic Scene Graph Generation

Few-Shot Class-Incremental Learning via Entropy-Regularized Data-Free Replay

POP: Mining POtential Performance of new fashion products via webly cross-modal query expansion

Few-shot Object Counting and Detection

Dynamic Local Aggregation Network with Adaptive Clusterer for Anomaly Detection

My View is the Best View: Procedure Learning from Egocentric Videos

Prototype-Guided Continual Adaptation for Class-Incremental Unsupervised Domain Adaptation

MeshLoc: Mesh-Based Visual Localization

MemSAC: Memory Augmented Sample Consistency for Large Scale Domain Adaptation

Deforming Radiance Fields with Cages

Equivariance and Invariance Inductive Bias for Learning from Insufficient Data

Black-box Few-shot Knowledge Distillation

Balancing Stability and Plasticity through Advanced Null Space in Continual Learning

Optimal Boxes: Boosting End-to-End Scene Text Recognition by Adjusting Annotated Bounding Boxes via Reinforcement Learning

NeuMesh: Learning Disentangled Neural Mesh-based Implicit Field for Geometry and Texture Editing

Domain Adaptive Person Search

VizWiz-FewShot: Locating Objects in Images Taken by People With Visual Impairments

Label-Guided Auxiliary Training Improves 3D Object Detector

Combining Internal and External Constraints for Unrolling Shutter in Videos

TIPS: Text-Induced Pose Synthesis

Improving Test-Time Adaptation via Shift-agnostic Weight Regularization and Nearest Source Prototypes

Learning Graph Neural Networks for Image Style Transfer

Contrastive Monotonic Pixel-Level Modulation

CompNVS: Novel View Synthesis with Scene Completion

When Counting Meets HMER: Counting-Aware Network for Handwritten Mathematical Expression Recognition

Meta Spatio-Temporal Debiasing for Video Scene Graph Generation

3D Shape Sequence of Human Comparison and Classification using Current and Varifolds

NewsStories: Illustrating articles with visual summaries

Efficient One Pass Self-distillation with Zipf's Label Smoothing

AlignSDF: Pose-Aligned Signed Distance Fields for Hand-Object Reconstruction

Static and Dynamic Concepts for Self-supervised Video Representation Learning

Learning Hierarchy Aware Features for Reducing Mistake Severity

Translating a Visual LEGO Manual to a Machine-Executable Plan

Semi-Leak: Membership Inference Attacks Against Semi-supervised Learning

Trainability Preserving Neural Structured Pruning

Shift-tolerant Perceptual Similarity Metric

Abstracting Sketches through Simple Primitives

AutoTransition: Learning to Recommend Video Transition Effects

Hardly Perceptible Trojan Attack against Neural Networks with Bit Flips

Identifying Hard Noise in Long-Tailed Sample Distribution

One-Trimap Video Matting

PointFix: Learning to Fix Domain Bias for Robust Online Stereo Adaptation

End-to-end Graph-constrained Vectorized Floorplan Generation with Panoptic Refinement

Spatiotemporal Self-attention Modeling with Temporal Patch Shift for Action Recognition

Concurrent Subsidiary Supervision for Unsupervised Source-Free Domain Adaptation

LGV: Boosting Adversarial Example Transferability from Large Geometric Vicinity

Initialization and Alignment for Adversarial Texture Optimization

Depth Field Networks for Generalizable Multi-view Scene Representation

Mining Cross-Person Cues for Body-Part Interactiveness Learning in HOI Detection

Neural Strands: Learning Hair Geometry and Appearance from Multi-View Images

Break and Make: Interactive Structural Understanding Using LEGO Bricks

A Repulsive Force Unit for Garment Collision Handling in Neural Networks

Minimal Neural Atlas: Parameterizing Complex Surfaces with Minimal Charts and Distortion

Can Shuffling Video Benefit Temporal Bias Problem: A Novel Training Framework for Temporal Grounding

AlphaVC: High-Performance and Efficient Learned Video Compression

WISE: Whitebox Image Stylization by Example-based Learning

Centrality and Consistency: Two-Stage Clean Samples Identification for Learning with Instance-Dependent Noisy Labels

Video Question Answering with Iterative Video-Text Co-Tokenization

S$^2$Contact: Graph-based Network for 3D Hand-Object Contact Estimation with Semi-Supervised Learning

Skeleton-free Pose Transfer for Stylized 3D Characters

Improving Fine-Grained Visual Recognition in Low Data Regimes via Self-Boosting Attention Mechanism

SdAE: Self-distillated Masked Autoencoder

Out-of-Distribution Detection with Semantic Mismatch under Masking

Skeleton-Parted Graph Scattering Networks for 3D Human Motion Prediction

Revisiting the Critical Factors of Augmentation-Invariant Representation Learning

Few-shot Single-view 3D Reconstruction with Memory Prior Contrastive Network

Few-Shot Class-Incremental Learning from an Open-Set Perspective

DAS: Densely-Anchored Sampling for Deep Metric Learning

Fast Two-step Blind Optical Aberration Correction

Negative Frames Matter in Egocentric Visual Query 2D Localization

Neighborhood Collective Estimation for Noisy Label Identification and Correction

PlaneFormers: From Sparse View Planes to 3D Reconstruction

SLiDE: Self-supervised LiDAR De-snowing through Reconstruction Difficulty

Domain Randomization-Enhanced Depth Simulation and Restoration for Perceiving and Grasping Specular and Transparent Objects

Class-Incremental Learning with Cross-Space Clustering and Controlled Transfer

Learning Omnidirectional Flow in 360-degree Video via Siamese Representation

Inpainting at Modern Camera Resolution by Guided PatchMatch with Auto-Curation

Contrastive Positive Mining for Unsupervised 3D Action Representation Learning

Speaker-adaptive Lip Reading with User-dependent Padding

Contrast-Phys: Unsupervised Video-based Remote Physiological Measurement via Spatiotemporal Contrast

Rethinking Robust Representation Learning Under Fine-grained Noisy Faces

RDA: Reciprocal Distribution Alignment for Robust SSL

RelPose: Predicting Probabilistic Relative Rotation for Single Objects in the Wild

PointTree: Transformation-Robust Point Cloud Encoder with Relaxed K-D Trees

MixSKD: Self-Knowledge Distillation from Mixup for Image Recognition

PRIF: Primary Ray-based Implicit Function

Learning Semantic Correspondence with Sparse Annotations

CCRL: Contrastive Cell Representation Learning

Pose Forecasting in Industrial Human-Robot Collaboration

Combating Label Distribution Shift for Active Domain Adaptation

Matching Multiple Perspectives for Efficient Representation Learning

Uncertainty-guided Source-free Domain Adaptation

Context-Aware Streaming Perception in Dynamic Environments

Towards an Error-free Deep Occupancy Detector for Smart Camera Parking System

AdaBin: Improving Binary Neural Networks with Adaptive Binary Sets

DLCFT: Deep Linear Continual Fine-Tuning for General Incremental Learning

L3: Accelerator-Friendly Lossless Image Format for High-Resolution, High-Throughput DNN Training

ConMatch: Semi-Supervised Learning with Confidence-Guided Consistency Regularization

Unifying Visual Perception by Dispersible Points Learning

Visual Cross-View Metric Localization with Dense Uncertainty Estimates

GCISG: Guided Causal Invariant Learning for Improved Syn-to-real Generalization

SIM2E: Benchmarking the Group Equivariant Capability of Correspondence Matching Algorithms

Artifact-Based Domain Generalization of Skin Lesion Models

Fuse and Attend: Generalized Embedding Learning for Art and Sketches

Effectiveness of Function Matching in Driving Scene Recognition

Consistency Regularization for Domain Adaptation

IMPaSh: A Novel Domain-shift Resistant Representation for Colorectal Cancer Tissue Classification

Deep Structural Causal Shape Models

Learning from Noisy Labels with Coarse-to-Fine Sample Credibility Modeling

Anatomy-Aware Contrastive Representation Learning for Fetal Ultrasound

The Value of Out-of-Distribution Data

Ultra-high-resolution unpaired stain transformation via Kernelized Instance Normalization

RIBAC: Towards Robust and Imperceptible Backdoor Attack against Compact DNN

Cross-Camera View-Overlap Recognition

On the Design of Privacy-Aware Cameras: a Study on Deep Neural Networks

Discovering Transferable Forensic Features for CNN-generated Images Detection

Doc2Graph: a Task Agnostic Document Understanding Framework based on Graph Neural Networks

Learning Continuous Implicit Representation for Near-Periodic Patterns

NeuralSI: Structural Parameter Identification in Nonlinear Dynamical Systems

Take One Gram of Neural Features, Get Enhanced Group Robustness

CIRCLe: Color Invariant Representation Learning for Unbiased Classification of Skin Lesions

ASpanFormer: Detector-Free Image Matching with Adaptive Span Transformer

Probing Contextual Diversity for Dense Out-of-Distribution Detection

CAIR: Fast and Lightweight Multi-Scale Color Attention Network for Instagram Filter Removal

FUSION: Fully Unsupervised Test-Time Stain Adaptation via Fused Normalization Statistics

Style-Agnostic Reinforcement Learning

LiteDepth: Digging into Fast and Accurate Depth Estimation on Mobile Devices

Unpaired Image Translation via Vector Symbolic Architectures

CNSNet: A Cleanness-Navigated-Shadow Network for Shadow Removal

Semi-Supervised Domain Adaptation by Similarity based Pseudo-label Injection

Recurrent Bilinear Optimization for Binary Neural Networks

Meta-Learning with Less Forgetting on Large-Scale Non-Stationary Task Distributions

Towards Accurate Binary Neural Networks via Modeling Contextual Dependencies

Interpretations Steered Network Pruning via Amortized Inferred Saliency Maps

Exploring Anchor-based Detection for Ego4D Natural Language Query

Detecting Driver Drowsiness as an Anomaly Using LSTM Autoencoders

Switchable Online Knowledge Distillation

Self-supervised Human Mesh Recovery with Cross-Representation Alignment

Check and Link: Pairwise Lesion Correspondence Guides Mammogram Mass Detection

PointScatter: Point Set Representation for Tubular Structure Extraction

Adversarial Coreset Selection for Efficient Robust Training

Out-of-Vocabulary Challenge Report

DevNet: Self-supervised Monocular Depth Learning via Density Volume Construction

MIPI 2022 Challenge on RGB+ToF Depth Completion: Dataset and Report

MIPI 2022 Challenge on Quad-Bayer Re-mosaic: Dataset and Report

MIPI 2022 Challenge on Under-Display Camera Image Restoration: Methods and Results

Hydra Attention: Efficient Attention with Many Heads

Back to table of contents