CVPR-2023-Papers

Humans: Face, Body, Pose, Gesture, Movement

Title	Repo	Video
Micron-BERT: BERT-Based Facial Micro-Expression Recognition
NIKI: Neural Inverse Kinematics With Invertible Neural Networks for 3D Human Pose and Shape Estimation
A Characteristic Function-Based Method for Bottom-Up Human Pose Estimation	➖	➖
Executing Your Commands via Motion Diffusion in Latent Space
MSINet: Twins Contrastive Search of Multi-Scale Interaction for Object ReID		➖
Taming Diffusion Models for Audio-Driven Co-Speech Gesture Generation		➖
Global-to-Local Modeling for Video-Based 3D Human Pose and Shape Estimation		➖
Dynamic Aggregated Network for Gait Recognition
Object Pop-Up: Can We Infer 3D Objects and Their Poses From Human Interactions Alone?
Unsupervised Sampling Promoting for Stochastic Human Trajectory Prediction
ECON: Explicit Clothed Humans Optimized via Normal Integration
Neuron Structure Modeling for Generalizable Remote Physiological Measurement
Continuous Sign Language Recognition With Correlation Network
Parametric Implicit Face Representation for Audio-Driven Facial Reenactment
CrowdCLIP: Unsupervised Crowd Counting via Vision-Language Model		➖
PoseExaminer: Automated Testing of Out-of-Distribution Robustness in Human Pose and Shape Estimation
3D Human Mesh Estimation From Virtual Markers
3D Human Pose Estimation via Intuitive Physics
ARCTIC: A Dataset for Dexterous Bimanual Hand-Object Manipulation
Generating Holistic 3D Human Motion From Speech		➖
HARP: Personalized Hand Reconstruction From a Monocular RGB Video		➖
Learning Locally Editable Virtual Humans
Reconstructing Signing Avatars From Video Using Linguistic Priors
DrapeNet: Garment Generation and Self-Supervised Draping
X-Avatar: Expressive Human Avatars
Hi4D: 4D Instance Segmentation of Close Human Interaction
Vid2Avatar: 3D Avatar Reconstruction From Videos in the Wild via Self-Supervised Scene Decomposition
CloSET: Modeling Clothed Humans on Continuous Surface With Explicit Template Decomposition
Graphics Capsule: Learning Hierarchical 3D Face Representations From 2D Images	➖	➖
Rethinking the Learning Paradigm for Dynamic Facial Expression Recognition	➖
HandNeRF: Neural Radiance Fields for Animatable Interacting Hands	➖
Relightable Neural Human Assets From Multi-View Gradient Illuminations
Being Comes From Not-Being: Open-Vocabulary Text-to-Motion Generation With Wordless Training
DeFeeNet: Consecutive 3D Human Motion Prediction With Deviation Feedback	➖	➖
BioNet: A Biologically-Inspired Network for Face Recognition
Boosting Detection in Crowd Analysis via Underutilized Output Features
Learning Analytical Posterior Probability for Human Mesh Recovery
Listening Human Behavior: 3D Human Pose Estimation With Acoustic Signals
Detecting and Grounding Multi-Modal Media Manipulation
RelightableHands: Efficient Neural Relighting of Articulated Hand Models		➖
MEGANE: Morphable Eyeglass and Avatar Network
SunStage: Portrait Reconstruction and Relighting Using the Sun as a Light Stage
TryOnDiffusion: A Tale of Two UNets
Semi-Supervised Hand Appearance Recovery via Structure Disentanglement and Dual Adversarial Discrimination	➖
POTTER: Pooling Attention Transformer for Efficient Human Mesh Recovery
Scene-Aware Egocentric 3D Human Pose Estimation
PSVT: End-to-End Multi-Person 3D Pose and Shape Estimation With Progressive Video Transformers	➖
Trajectory-Aware Body Interaction Transformer for Multi-Person Pose Forecasting
A2J-Transformer: Anchor-to-Joint Transformer Network for 3D Interacting Hand Pose Estimation From a Single RGB Image
TRACE: 5D Temporal Regression of Avatars With Dynamic Cameras in 3D Environments
Skinned Motion Retargeting With Residual Perception of Motion Semantics & Geometry
Generating Human Motion From Textual Descriptions With Discrete Representations
Learning Human Mesh Recovery in 3D Scenes		➖
AVFace: Towards Detailed Audio-Visual 4D Face Reconstruction
3D-Aware Face Swapping
Neural Residual Radiance Fields for Streamably Free-Viewpoint Videos
GFPose: Learning 3D Human Pose Prior With Gradient Fields
Rethinking Feature-Based Knowledge Distillation for Face Recognition	➖	➖
One-Stage 3D Whole-Body Mesh Recovery With Component Aware Transformer
Towards Stable Human Pose Estimation via Cross-View Fusion and Foot Stabilization	➖
Ego-Body Pose Estimation via Ego-Head Pose Estimation
TOPLight: Lightweight Neural Networks With Task-Oriented Pretraining for Visible-Infrared Recognition	➖	➖
StyleIPSB: Identity-Preserving Semantic Basis of StyleGAN for High Fidelity Face Swapping
Improving Fairness in Facial Albedo Estimation via Visual-Textual Cues	➖
FLEX: Full-Body Grasping Without Full-Body Grasps
EDGE: Editable Dance Generation From Music
Complete 3D Human Reconstruction From a Single Incomplete Image	➖
Zero-Shot Pose Transfer for Unrigged Stylized 3D Characters
Hand Avatar: Free-Pose Hand Animation and Rendering From Monocular Video
Human-Art: A Versatile Human-Centric Dataset Bridging Natural and Artificial Scenes
Learning Neural Proto-Face Field for Disentangled 3D Face Modeling in the Wild	➖	➖
CLAMP: Prompt-Based Contrastive Learning for Connecting Language and Animal Pose
Invertible Neural Skinning
DiffusionRig: Learning Personalized Priors for Facial Appearance Editing
Harmonious Feature Learning for Interactive Hand-Object Pose Estimation
Leapfrog Diffusion Model for Stochastic Trajectory Prediction
NeuFace: Realistic 3D Neural Face Rendering From Multi-View Images
DiffSwap: High-Fidelity and Controllable Face Swapping via 3D-Aware Masked Diffusion		➖
GFIE: A Dataset and Baseline for Gaze-Following From 2D to 3D in Indoor Environments
Hierarchical Temporal Transformer for 3D Hand Pose Estimation and Action Recognition From Egocentric RGB Videos
Decompose More and Aggregate Better: Two Closer Looks at Frequency Representation Learning for Human Motion Prediction	➖
Human Pose As Compositional Tokens
Normal-Guided Garment UV Prediction for Human Re-Texturing	➖
Dynamic Graph Learning With Content-Guided Spatial-Frequency Relation Reasoning for Deepfake Detection	➖
VGFlow: Visibility Guided Flow Network for Human Reposing	➖	➖
Mutual Information-Based Temporal Difference Learning for Human Pose Estimation in Video	➖
PREIM3D: 3D Consistent Precise Image Attribute Editing From a Single Image
HuManiFlow: Ancestor-Conditioned Normalising Flows on SO(3) Manifolds for Human Pose and Shape Distribution Estimation
Implicit Identity Driven Deepfake Face Swapping Detection
Trace and Pace: Controllable Pedestrian Animation via Guided Trajectory Diffusion
3D-Aware Facial Landmark Detection via Multi-View Consistent Training on Synthetic Data	➖
SLOPER4D: A Scene-Aware Dataset for Global 4D Human Pose Estimation in Urban Environments
Zero-Shot Text-to-Parameter Translation for Game Character Auto-Creation	➖
AssemblyHands: Towards Egocentric Activity Understanding via 3D Hand Pose Estimation
UDE: A Unified Driving Engine for Human Motion Generation
CodeTalker: Speech-Driven 3D Facial Animation With Discrete Motion Prior
Semi-Supervised 2D Human Pose Estimation Driven by Position Inconsistency Pseudo Label Correction Module
Learning Personalized High Quality Volumetric Head Avatars From Monocular RGB Videos
HOOD: Hierarchical Graphs for Generalized Modelling of Clothing Dynamics
ACR: Attention Collaboration-Based Regressor for Arbitrary Two-Hand Reconstruction
HumanBench: Towards General Human-Centric Perception With Projector Assisted Pretraining
CIMI4D: A Large Multimodal Climbing Motion Dataset Under Human-Scene Interactions
Human Pose Estimation in Extremely Low-Light Conditions		➖
DistilPose: Tokenized Pose Regression With Heatmap Distillation
Human Body Shape Completion With Implicit Shape and Flow Learning	➖
Source-Free Adaptive Gaze Estimation by Uncertainty Reduction
Music-Driven Group Choreography
Robust Model-Based Face Reconstruction Through Weakly-Supervised Outlier Segmentation
MARLIN: Masked Autoencoder for Facial Video Representation LearnINg
Transformer-Based Unified Recognition of Two Hands Manipulating Objects
Implicit Identity Leakage: The Stumbling Block to Improving Deepfake Detection Generalization
ScarceNet: Animal Pose Estimation With Scarce Annotations
FFHQ-UV: Normalized Facial UV-Texture Dataset for 3D Face Reconstruction
MoDi: Unconditional Motion Synthesis From Diverse Data
Feature Representation Learning With Adaptive Displacement Generation and Transformer Fusion for Micro-Expression Recognition	➖
MeMaHand: Exploiting Mesh-Mano Interaction for Single Image Two-Hand Reconstruction	➖
Stimulus Verification Is a Universal and Effective Sampler in Multi-Modal Human Trajectory Prediction
TokenHPE: Learning Orientation Tokens for Efficient Head Pose Estimation via Transformers
Handy: Towards a High Fidelity 3D Hand Shape and Appearance Model
CIRCLE: Capture in Rich Contextual Environments
Gazeformer: Scalable, Effective and Fast Prediction of Goal-Directed Human Attention
Implicit Neural Head Synthesis via Controllable Local Deformation Fields
Continuous Intermediate Token Learning With Implicit Motion Manifold for Keyframe Based Motion Interpolation		➖
JRDB-Pose: A Large-Scale Dataset for Multi-Person Pose Estimation and Tracking		➖
STAR Loss: Reducing Semantic Ambiguity in Facial Landmark Detection		➖
GM-NeRF: Learning Generalizable Model-Based Neural Radiance Fields From Multi-View Images
Decoupled Multimodal Distilling for Emotion Recognition
HaLP: Hallucinating Latent Positives for Skeleton-Based Self-Supervised Learning of Actions
ReDirTrans: Latent-to-Latent Translation for Gaze and Head Redirection	➖
QPGesture: Quantization-Based and Phase-Guided Motion Matching for Natural Speech-Driven Gesture Generation
Multi-Modal Gait Recognition via Effective Spatial-Temporal Feature Fusion	➖
Probabilistic Knowledge Distillation of Face Ensembles	➖
Learning Semantic-Aware Disentangled Representation for Flexible 3D Human Body Editing
Parameter Efficient Local Implicit Image Function Network for Face Segmentation	➖	➖
HumanGen: Generating Human Radiance Fields With Explicit Priors	➖
Biomechanics-Guided Facial Action Unit Detection Through Force Modeling	➖	➖
Decoupling Human and Camera Motion From Videos in the Wild
Overcoming the Trade-Off Between Accuracy and Plausibility in 3D Hand Shape Reconstruction	➖
Instant-NVR: Instant Neural Volumetric Rendering for Human-Object Interactions From Monocular RGBD Stream
PoseFormerV2: Exploring Frequency Domain for Efficient and Robust 3D Human Pose Estimation
Analyzing and Diagnosing Pose Estimation With Attributions	➖
Unsupervised Visible-Infrared Person Re-Identification via Progressive Graph Matching and Alternate Learning		➖
Shape-Erased Feature Learning for Visible-Infrared Person Re-Identification		➖
Distilling Cross-Temporal Contexts for Continuous Sign Language Recognition	➖	➖
Avatars Grow Legs: Generating Smooth Human Motion From Sparse Tracking Inputs With Diffusion Model		➖
Local Connectivity-Based Density Estimation for Face Clustering
SelfME: Self-Supervised Motion Learning for Micro-Expression Recognition	➖
Detecting Human-Object Contact in Images
Controllable Light Diffusion for Portraits
InstantAvatar: Learning Avatars From Monocular Video in 60 Seconds
NeMo: Learning 3D Neural Motion Fields From Multiple Video Instances of the Same Action
Privacy-Preserving Adversarial Facial Features
Self-Correctable and Adaptable Inference for Generalizable Human Pose Estimation	➖
DSFNet: Dual Space Fusion Network for Occlusion-Robust 3D Dense Face Alignment
Clothed Human Performance Capture With a Double-Layer Neural Radiance Fields
Continuous Landmark Detection With 3D Queries	➖
Learning a 3D Morphable Face Reflectance Model From Low-Cost Data
AUNet: Learning Relations Between Action Units for Face Forgery Detection
3D Human Pose Estimation With Spatio-Temporal Criss-Cross Attention
Implicit 3D Human Mesh Recovery Using Consistency With Pose and Shape From Unseen-View	➖
3D Human Keypoints Estimation From Point Clouds in the Wild Without Human Labels	➖
Multi-Label Compound Expression Recognition: C-EXPR Database & Network	➖
FlexNeRF: Photorealistic Free-Viewpoint Rendering of Moving Humans From Sparse Views
Two-Stage Co-Segmentation Network Based on Discriminative Representation for Recovering Human Mesh From Videos	➖	➖
Co-Speech Gesture Synthesis by Reinforcement Learning With Contrastive Pre-Trained Rewards
FeatER: An Efficient Network for Human Reconstruction via Feature Map-based TransformER
