Micron-BERT: BERT-Based Facial Micro-Expression Recognition

NIKI: Neural Inverse Kinematics With Invertible Neural Networks for 3D Human Pose and Shape Estimation

A Characteristic Function-Based Method for Bottom-Up Human Pose Estimation

Executing Your Commands via Motion Diffusion in Latent Space

MSINet: Twins Contrastive Search of Multi-Scale Interaction for Object ReID

Taming Diffusion Models for Audio-Driven Co-Speech Gesture Generation

Global-to-Local Modeling for Video-Based 3D Human Pose and Shape Estimation

Dynamic Aggregated Network for Gait Recognition

Object Pop-Up: Can We Infer 3D Objects and Their Poses From Human Interactions Alone?

Unsupervised Sampling Promoting for Stochastic Human Trajectory Prediction

ECON: Explicit Clothed Humans Optimized via Normal Integration

Neuron Structure Modeling for Generalizable Remote Physiological Measurement

Continuous Sign Language Recognition With Correlation Network

Parametric Implicit Face Representation for Audio-Driven Facial Reenactment

CrowdCLIP: Unsupervised Crowd Counting via Vision-Language Model

PoseExaminer: Automated Testing of Out-of-Distribution Robustness in Human Pose and Shape Estimation

3D Human Mesh Estimation From Virtual Markers

3D Human Pose Estimation via Intuitive Physics

ARCTIC: A Dataset for Dexterous Bimanual Hand-Object Manipulation

Generating Holistic 3D Human Motion From Speech

HARP: Personalized Hand Reconstruction From a Monocular RGB Video

Learning Locally Editable Virtual Humans

Reconstructing Signing Avatars From Video Using Linguistic Priors

DrapeNet: Garment Generation and Self-Supervised Draping

X-Avatar: Expressive Human Avatars

Hi4D: 4D Instance Segmentation of Close Human Interaction

Vid2Avatar: 3D Avatar Reconstruction From Videos in the Wild via Self-Supervised Scene Decomposition

CloSET: Modeling Clothed Humans on Continuous Surface With Explicit Template Decomposition

Graphics Capsule: Learning Hierarchical 3D Face Representations From 2D Images

Rethinking the Learning Paradigm for Dynamic Facial Expression Recognition

HandNeRF: Neural Radiance Fields for Animatable Interacting Hands

Relightable Neural Human Assets From Multi-View Gradient Illuminations

Being Comes From Not-Being: Open-Vocabulary Text-to-Motion Generation With Wordless Training

DeFeeNet: Consecutive 3D Human Motion Prediction With Deviation Feedback

BioNet: A Biologically-Inspired Network for Face Recognition

Boosting Detection in Crowd Analysis via Underutilized Output Features

Learning Analytical Posterior Probability for Human Mesh Recovery

Listening Human Behavior: 3D Human Pose Estimation With Acoustic Signals

Detecting and Grounding Multi-Modal Media Manipulation

RelightableHands: Efficient Neural Relighting of Articulated Hand Models

MEGANE: Morphable Eyeglass and Avatar Network

SunStage: Portrait Reconstruction and Relighting Using the Sun as a Light Stage

TryOnDiffusion: A Tale of Two UNets

Semi-Supervised Hand Appearance Recovery via Structure Disentanglement and Dual Adversarial Discrimination

POTTER: Pooling Attention Transformer for Efficient Human Mesh Recovery

Scene-Aware Egocentric 3D Human Pose Estimation

PSVT: End-to-End Multi-Person 3D Pose and Shape Estimation With Progressive Video Transformers

Trajectory-Aware Body Interaction Transformer for Multi-Person Pose Forecasting

A2J-Transformer: Anchor-to-Joint Transformer Network for 3D Interacting Hand Pose Estimation From a Single RGB Image

TRACE: 5D Temporal Regression of Avatars With Dynamic Cameras in 3D Environments

Skinned Motion Retargeting With Residual Perception of Motion Semantics & Geometry

Generating Human Motion From Textual Descriptions With Discrete Representations

Learning Human Mesh Recovery in 3D Scenes

AVFace: Towards Detailed Audio-Visual 4D Face Reconstruction

3D-Aware Face Swapping

Neural Residual Radiance Fields for Streamably Free-Viewpoint Videos

GFPose: Learning 3D Human Pose Prior With Gradient Fields

Rethinking Feature-Based Knowledge Distillation for Face Recognition

One-Stage 3D Whole-Body Mesh Recovery With Component Aware Transformer

Towards Stable Human Pose Estimation via Cross-View Fusion and Foot Stabilization

Ego-Body Pose Estimation via Ego-Head Pose Estimation

TOPLight: Lightweight Neural Networks With Task-Oriented Pretraining for Visible-Infrared Recognition

StyleIPSB: Identity-Preserving Semantic Basis of StyleGAN for High Fidelity Face Swapping

Improving Fairness in Facial Albedo Estimation via Visual-Textual Cues

FLEX: Full-Body Grasping Without Full-Body Grasps

EDGE: Editable Dance Generation From Music

Complete 3D Human Reconstruction From a Single Incomplete Image

Zero-Shot Pose Transfer for Unrigged Stylized 3D Characters

Hand Avatar: Free-Pose Hand Animation and Rendering From Monocular Video

Human-Art: A Versatile Human-Centric Dataset Bridging Natural and Artificial Scenes

Learning Neural Proto-Face Field for Disentangled 3D Face Modeling in the Wild

CLAMP: Prompt-Based Contrastive Learning for Connecting Language and Animal Pose

Invertible Neural Skinning

DiffusionRig: Learning Personalized Priors for Facial Appearance Editing

Harmonious Feature Learning for Interactive Hand-Object Pose Estimation

Leapfrog Diffusion Model for Stochastic Trajectory Prediction

NeuFace: Realistic 3D Neural Face Rendering From Multi-View Images

DiffSwap: High-Fidelity and Controllable Face Swapping via 3D-Aware Masked Diffusion

GFIE: A Dataset and Baseline for Gaze-Following From 2D to 3D in Indoor Environments

Hierarchical Temporal Transformer for 3D Hand Pose Estimation and Action Recognition From Egocentric RGB Videos

Decompose More and Aggregate Better: Two Closer Looks at Frequency Representation Learning for Human Motion Prediction

Human Pose As Compositional Tokens

Normal-Guided Garment UV Prediction for Human Re-Texturing

Dynamic Graph Learning With Content-Guided Spatial-Frequency Relation Reasoning for Deepfake Detection

VGFlow: Visibility Guided Flow Network for Human Reposing

Mutual Information-Based Temporal Difference Learning for Human Pose Estimation in Video

PREIM3D: 3D Consistent Precise Image Attribute Editing From a Single Image

HuManiFlow: Ancestor-Conditioned Normalising Flows on SO(3) Manifolds for Human Pose and Shape Distribution Estimation

Implicit Identity Driven Deepfake Face Swapping Detection

Trace and Pace: Controllable Pedestrian Animation via Guided Trajectory Diffusion

3D-Aware Facial Landmark Detection via Multi-View Consistent Training on Synthetic Data

SLOPER4D: A Scene-Aware Dataset for Global 4D Human Pose Estimation in Urban Environments

Zero-Shot Text-to-Parameter Translation for Game Character Auto-Creation

AssemblyHands: Towards Egocentric Activity Understanding via 3D Hand Pose Estimation

UDE: A Unified Driving Engine for Human Motion Generation

CodeTalker: Speech-Driven 3D Facial Animation With Discrete Motion Prior

Semi-Supervised 2D Human Pose Estimation Driven by Position Inconsistency Pseudo Label Correction Module

Learning Personalized High Quality Volumetric Head Avatars From Monocular RGB Videos

HOOD: Hierarchical Graphs for Generalized Modelling of Clothing Dynamics

ACR: Attention Collaboration-Based Regressor for Arbitrary Two-Hand Reconstruction

HumanBench: Towards General Human-Centric Perception With Projector Assisted Pretraining

CIMI4D: A Large Multimodal Climbing Motion Dataset Under Human-Scene Interactions

Human Pose Estimation in Extremely Low-Light Conditions

DistilPose: Tokenized Pose Regression With Heatmap Distillation

Human Body Shape Completion With Implicit Shape and Flow Learning

Source-Free Adaptive Gaze Estimation by Uncertainty Reduction

Music-Driven Group Choreography

Robust Model-Based Face Reconstruction Through Weakly-Supervised Outlier Segmentation

MARLIN: Masked Autoencoder for Facial Video Representation LearnINg

Transformer-Based Unified Recognition of Two Hands Manipulating Objects

Implicit Identity Leakage: The Stumbling Block to Improving Deepfake Detection Generalization

ScarceNet: Animal Pose Estimation With Scarce Annotations

FFHQ-UV: Normalized Facial UV-Texture Dataset for 3D Face Reconstruction

MoDi: Unconditional Motion Synthesis From Diverse Data

Feature Representation Learning With Adaptive Displacement Generation and Transformer Fusion for Micro-Expression Recognition

MeMaHand: Exploiting Mesh-Mano Interaction for Single Image Two-Hand Reconstruction

Stimulus Verification Is a Universal and Effective Sampler in Multi-Modal Human Trajectory Prediction

TokenHPE: Learning Orientation Tokens for Efficient Head Pose Estimation via Transformers

Handy: Towards a High Fidelity 3D Hand Shape and Appearance Model

CIRCLE: Capture in Rich Contextual Environments

Gazeformer: Scalable, Effective and Fast Prediction of Goal-Directed Human Attention

Implicit Neural Head Synthesis via Controllable Local Deformation Fields

Continuous Intermediate Token Learning With Implicit Motion Manifold for Keyframe Based Motion Interpolation

JRDB-Pose: A Large-Scale Dataset for Multi-Person Pose Estimation and Tracking

STAR Loss: Reducing Semantic Ambiguity in Facial Landmark Detection

GM-NeRF: Learning Generalizable Model-Based Neural Radiance Fields From Multi-View Images

Decoupled Multimodal Distilling for Emotion Recognition

HaLP: Hallucinating Latent Positives for Skeleton-Based Self-Supervised Learning of Actions

ReDirTrans: Latent-to-Latent Translation for Gaze and Head Redirection

QPGesture: Quantization-Based and Phase-Guided Motion Matching for Natural Speech-Driven Gesture Generation

Multi-Modal Gait Recognition via Effective Spatial-Temporal Feature Fusion

Probabilistic Knowledge Distillation of Face Ensembles

Learning Semantic-Aware Disentangled Representation for Flexible 3D Human Body Editing

Parameter Efficient Local Implicit Image Function Network for Face Segmentation

HumanGen: Generating Human Radiance Fields With Explicit Priors

Biomechanics-Guided Facial Action Unit Detection Through Force Modeling

Decoupling Human and Camera Motion From Videos in the Wild

Overcoming the Trade-Off Between Accuracy and Plausibility in 3D Hand Shape Reconstruction

Instant-NVR: Instant Neural Volumetric Rendering for Human-Object Interactions From Monocular RGBD Stream

PoseFormerV2: Exploring Frequency Domain for Efficient and Robust 3D Human Pose Estimation

Analyzing and Diagnosing Pose Estimation With Attributions

Unsupervised Visible-Infrared Person Re-Identification via Progressive Graph Matching and Alternate Learning

Shape-Erased Feature Learning for Visible-Infrared Person Re-Identification

Distilling Cross-Temporal Contexts for Continuous Sign Language Recognition

Avatars Grow Legs: Generating Smooth Human Motion From Sparse Tracking Inputs With Diffusion Model

Local Connectivity-Based Density Estimation for Face Clustering

SelfME: Self-Supervised Motion Learning for Micro-Expression Recognition

Detecting Human-Object Contact in Images

Controllable Light Diffusion for Portraits

InstantAvatar: Learning Avatars From Monocular Video in 60 Seconds

NeMo: Learning 3D Neural Motion Fields From Multiple Video Instances of the Same Action

Privacy-Preserving Adversarial Facial Features

Self-Correctable and Adaptable Inference for Generalizable Human Pose Estimation

DSFNet: Dual Space Fusion Network for Occlusion-Robust 3D Dense Face Alignment

Clothed Human Performance Capture With a Double-Layer Neural Radiance Fields

Continuous Landmark Detection With 3D Queries

Learning a 3D Morphable Face Reflectance Model From Low-Cost Data

AUNet: Learning Relations Between Action Units for Face Forgery Detection

3D Human Pose Estimation With Spatio-Temporal Criss-Cross Attention

Implicit 3D Human Mesh Recovery Using Consistency With Pose and Shape From Unseen-View

3D Human Keypoints Estimation From Point Clouds in the Wild Without Human Labels

Multi-Label Compound Expression Recognition: C-EXPR Database & Network

FlexNeRF: Photorealistic Free-Viewpoint Rendering of Moving Humans From Sparse Views

Two-Stage Co-Segmentation Network Based on Discriminative Representation for Recovering Human Mesh From Videos

Co-Speech Gesture Synthesis by Reinforcement Learning With Contrastive Pre-Trained Rewards

FeatER: An Efficient Network for Human Reconstruction via Feature Map-based TransformER