Towards Universal Fake Image Detectors That Generalize Across Generative Models
Implicit Diffusion Models for Continuous Super-Resolution
High-Fidelity Guided Image Synthesis With Latent Diffusion Models
DBARF: Deep Bundle-Adjusting Generalizable Neural Radiance Fields
Deep Arbitrary-Scale Image Super-Resolution via Scale-Equivariance Pursuit
Balanced Spherical Grid for Egocentric View Synthesis
SDFusion: Multimodal 3D Shape Completion, Reconstruction, and Generation
DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation
Self-Guided Diffusion Models
Multi-Concept Customization of Text-to-Image Diffusion
3D-Aware Conditional Image Synthesis
QuantArt: Quantizing Image Style Transfer Towards High Visual Fidelity
SceneComposer: Any-Level Semantic Image Synthesis
DiffCollage: Parallel Generation of Large Content With Diffusion Models
Putting People in Their Place: Affordance-Aware Human Insertion Into Scenes
Hybrid Neural Rendering for Large-Scale Scenes With Motion Blur
Binary Latent Diffusion
StyleRes: Transforming the Residuals for Real Image Editing With StyleGAN
KD-DLGAN: Data Limited Image Generation via Knowledge Distillation
SeaThru-NeRF: Neural Radiance Fields in Scattering Media
PointAvatar: Deformable Point-Based Head Avatars From Videos
3DAvatarGAN: Bridging Domains for Personalized Editable Avatars
Neural Preset for Color Style Transfer
Zero-Shot Generative Model Adaptation via Image-Specific Prompt Learning
DyNCA: Real-Time Dynamic Texture Synthesis Using Neural Cellular Automata
Exploring Incompatible Knowledge Transfer in Few-Shot Image Generation
HouseDiffusion: Vector Floorplan Generation via a Diffusion Model With Discrete and Continuous Denoising
Towards Accurate Image Coding: Improved Autoregressive Image Generation With Dynamic Vector Quantization
RiDDLE: Reversible and Diversified De-Identification With Latent Encryptor
LayoutDiffusion: Controllable Diffusion Model for Layout-to-Image Generation
LipFormer: High-Fidelity and Generalizable Talking Face Generation With a Pre-Learned Facial Codebook
Not All Image Regions Matter: Masked Vector Quantization for Autoregressive Image Generation
GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis
High-Fidelity Generalized Emotional Talking Face Generation With Multi-Modal Emotion Space Learning
Consistent View Synthesis With Pose-Guided Diffusion Models
StyleSync: High-Fidelity Generalized and Personalized Lip Sync in Style-Based Generator
Imagic: Text-Based Real Image Editing With Diffusion Models
Large-Capacity and Flexible Video Steganography via Invertible Neural Network
Quantitative Manipulation of Custom Attributes on 3D-Aware Image Synthesis
Learning Detailed Radiance Manifolds for High-Fidelity and 3D-Consistent Portrait Synthesis From Monocular Image
CF-Font: Content Fusion for Few-Shot Font Generation
One-Shot High-Fidelity Talking-Head Synthesis With Deformable Neural Radiance Field
Unsupervised Domain Adaption With Pixel-Level Discriminator for Image-Aware Layout Generation
Diffusion Probabilistic Model Made Slim
Collaborative Diffusion for Multi-Modal Face Generation and Editing
High-Fidelity Facial Avatar Reconstruction From Monocular Video With Generative Priors
Network-Free, Unsupervised Semantic Segmentation With Synthetic Images
Visual Prompt Tuning for Generative Transfer Learning
Specialist Diffusion: Plug-and-Play Sample-Efficient Fine-Tuning of Text-to-Image Diffusion Models To Learn Any Unseen Style
Catch Missing Details: Image Reconstruction With Frequency Augmented Variational Autoencoder
Towards Bridging the Performance Gaps of Joint Energy-Based Models
GLeaD: Improving GANs With a Generator-Leading Task
Structural Multiplane Image: Bridging Neural View Synthesis and 3D Reconstruction
SPARF: Neural Radiance Fields From Sparse and Noisy Poses
DeltaEdit: Exploring Text-Free Training for Text-Driven Image Manipulation
Inferring and Leveraging Parts From Object Shape for Improving Semantic Image Synthesis
VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation
MaskSketch: Unpaired Structure-Guided Masked Image Generation
Affordance Diffusion: Synthesizing Hand-Object Interactions
Interactive Cartoonization With Controllable Perceptual Factors
MetaPortrait: Identity-Preserving Talking Head Generation With Fast Personalized Adaptation
Paint by Example: Exemplar-Based Image Editing With Diffusion Models
GLIGEN: Open-Set Grounded Text-to-Image Generation
L-CoIns: Language-Based Colorization With Instance Awareness
DiffTalk: Crafting Diffusion Models for Generalized Audio-Driven Portraits Animation
Evading DeepFake Detectors via Adversarial Statistical Consistency
GlassesGAN: Eyewear Personalization Using Synthetic Appearance Discovery and Targeted Subspace Modeling
GP-VTON: Towards General Purpose Virtual Try-On via Collaborative Local-Flow Global-Parsing Learning
Where Is My Spot? Few-Shot Image Generation via Latent Subspace Optimization
Regularized Vector Quantization for Tokenized Image Synthesis
EDICT: Exact Diffusion Inversion via Coupled Transformations
Scaling Up GANs for Text-to-Image Synthesis
Shape-Aware Text-Driven Layered Video Editing
A Unified Pyramid Recurrent Network for Video Frame Interpolation
TAPS3D: Text-Guided 3D Textured Shape Generation From Pseudo Supervision
Fine-Grained Face Swapping via Regional GAN Inversion
OTAvatar: One-Shot Talking Face Avatar With Controllable Tri-Plane Rendering
Deep Stereo Video Inpainting
StyleGAN Salon: Multi-View Latent Optimization for Pose-Invariant Hairstyle Transfer
Cross-GAN Auditing: Unsupervised Identification of Attribute Level Similarities and Differences Between Pretrained Generative Models
Unsupervised Volumetric Animation
SINE: SINgle Image Editing With Text-to-Image Diffusion Models
Progressive Disentangled Representation Learning for Fine-Grained Controllable Talking Head Synthesis
CAP-VSTNet: Content Affinity Preserved Versatile Style Transfer
DeepVecFont-v2: Exploiting Transformers To Synthesize Vector Fonts With Higher Quality
LEMaRT: Label-Efficient Masked Region Transform for Image Harmonization
SINE: Semantic-Driven Image-Based NeRF Editing With Prior-Guided Editing Field
Exploring Intra-Class Variation Factors With Learnable Cluster Prompts for Semi-Supervised Image Synthesis
Image Cropping With Spatial-Aware Feature and Rank Consistency
Picture That Sketch: Photorealistic Image Generation From Abstract Sketches
MonoHuman: Animatable Human Neural Field From Monocular Video
PixHt-Lab: Pixel Height Based Light Effect Generation for Image Compositing
Neural Pixel Composition for 3D-4D View Synthesis From Multi-Views
SpaText: Spatio-Textual Representation for Controllable Image Generation
Exploring Motion Ambiguity and Alignment for High-Quality Video Frame Interpolation
MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation
Synthesizing Photorealistic Virtual Humans Through Cross-Modal Disentanglement
Video Probabilistic Diffusion Models in Projected Latent Space
Variational Distribution Learning for Unsupervised Text-to-Image Generation
Linking Garment With Person via Semantically Associated Landmarks for Virtual Try-On
UV Volumes for Real-Time Rendering of Editable Free-View Human Performance
NULL-Text Inversion for Editing Real Images Using Guided Diffusion Models
Polynomial Implicit Neural Representations for Large Diverse Datasets
Plug-and-Play Diffusion Features for Text-Driven Image-to-Image Translation
Conditional Image-to-Video Generation With Latent Flow Diffusion Models
Local 3D Editing via 3D Distillation of CLIP Knowledge
Private Image Generation With Dual-Purpose Auxiliary Classifier
MAGVIT: Masked Generative Video Transformer
Dimensionality-Varying Diffusion Process
VIVE3D: Viewpoint-Independent Video Editing Using 3D-Aware GANs
LANIT: Language-Driven Image-to-Image Translation for Unlabeled Data
DATID-3D: Diversity-Preserved Domain Adaptation Using Text-to-Image Diffusion for 3D Generative Model
Delving StyleGAN Inversion for Image Editing: A Foundation Latent Space Viewpoint
High-Fidelity and Freely Controllable Talking Head Video Generation
SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation
StyleRF: Zero-Shot 3D Style Transfer of Neural Radiance Fields
MOSO: Decomposing MOtion, Scene and Object for Video Prediction
Multi Domain Learning for Motion Magnification
GazeNeRF: 3D-Aware Gaze Redirection With Neural Radiance Fields
Hierarchical B-Frame Video Coding Using Two-Layer CANF Without Motion Coding
Blemish-Aware and Progressive Face Retouching With Limited Paired Data
Text-Guided Unsupervised Latent Transformation for Multi-Attribute Image Manipulation
NeuralField-LDM: Scene Generation With Hierarchical Latent Diffusion Models
Fix the Noise: Disentangling Source Feature for Controllable Domain Translation
Class-Balancing Diffusion Models
DPE: Disentanglement of Pose and Expression for General Video Portrait Editing
Inversion-Based Style Transfer With Diffusion Models
Deep Curvilinear Editing: Commutative and Nonlinear Image Manipulation for Pretrained Deep Generative Model
FlowGrad: Controlling the Output of Generative ODEs With Gradients
Graph Transformer GANs for Graph-Constrained House Generation
Master: Meta Style Transformer for Controllable Zero-Shot and Few-Shot Artistic Style Transfer
Next3D: Generative Neural Texture Rasterization for 3D-Aware Head Avatars
Ham2Pose: Animating Sign Language Notation Into Pose Sequences
Neural Transformation Fields for Arbitrary-Styled Font Generation
LayoutDM: Transformer-Based Diffusion Model for Layout Generation
Removing Objects From Neural Radiance Fields
Person Image Synthesis via Denoising Diffusion Model
AdaptiveMix: Improving GAN Training via Feature Space Shrinkage
Learning Joint Latent Space EBM Prior Model for Multi-Layer Generator
3D Neural Field Generation Using Triplane Diffusion
OmniAvatar: Geometry-Guided Controllable 3D Head Synthesis
RWSC-Fusion: Region-Wise Style-Controlled Fusion Network for the Prohibited X-Ray Security Image Synthesis
ObjectStitch: Object Compositing With Diffusion Model
Persistent Nature: A Generative Model of Unbounded 3D Worlds
Masked and Adaptive Transformer for Exemplar Based Image Translation
Spider GAN: Leveraging Friendly Neighbors To Accelerate GAN Training
Re-IQA: Unsupervised Learning for Image Quality Assessment in the Wild
Align Your Latents: High-Resolution Video Synthesis With Latent Diffusion Models
All Are Worth Words: A ViT Backbone for Diffusion Models
Few-Shot Semantic Image Synthesis With Class Affinity Transfer
Blowing in the Wind: CycleNet for Human Cinemagraphs From Still Images
StyleGene: Crossover and Mutation of Region-Level Facial Genes for Kinship Face Synthesis
MixNeRF: Modeling a Ray With Mixture Density for Novel View Synthesis From Sparse Inputs
MoStGAN-V: Video Generation With Temporal Motion Styles
Frame Interpolation Transformer and Uncertainty Guidance
Towards End-to-End Generative Modeling of Long Videos With Memory-Efficient Bidirectional Transformers
HOLODIFFUSION: Training a 3D Diffusion Model Using 2D Images
Neural Texture Synthesis With Guided Correspondence
PanoHead: Geometry-Aware 3D Full-Head Synthesis in 360°
InstructPix2Pix: Learning To Follow Image Editing Instructions
Unpaired Image-to-Image Translation With Shortest Path Regularization
Freestyle Layout-to-Image Synthesis
On Distillation of Guided Diffusion Models
Single Image Backdoor Inversion via Robust Smoothed Classifiers
Make-a-Story: Visual Memory Conditioned Consistent Story Generation
Towards Practical Plug-and-Play Diffusion Models
Efficient Scale-Invariant Generator With Column-Row Entangled Pixel Synthesis
Wavelet Diffusion Models Are Fast and Scalable Image Generators
3D GAN Inversion With Facial Symmetry Prior
Seeing What You Said: Talking Face Generation Guided by a Lip Reading Expert
PCT-Net: Full Resolution Image Harmonization Using Pixel-Wise Color Transformations
ERNIE-ViLG 2.0: Improving Text-to-Image Diffusion Model With Knowledge-Enhanced Mixture-of-Denoising-Experts
Video Compression With Entropy-Constrained Neural Representations
Uncovering the Disentanglement Capability in Text-to-Image Diffusion Models
CoralStyleCLIP: Co-Optimized Region and Layer Selection for Image Editing
Diffusion Video Autoencoders: Toward Temporally Consistent Face Video Editing via Disentangled Video Encoding
Sequential Training of GANs Against GAN-Classifiers Reveals Correlated Knowledge Gaps Present Among Independently Trained GAN Instances
Attribute-Preserving Face Dataset Anonymization via Latent Code Optimization
Shifted Diffusion for Text-to-Image Generation
HandsOff: Labeled Dataset Generation With No Additional Human Annotations
Lookahead Diffusion Probabilistic Models for Refining Mean Estimation
Imagen Editor and EditBench: Advancing and Evaluating Text-Guided Image Inpainting
Re-GAN: Data-Efficient GANs Training via Architectural Reconfiguration
BBDM: Image-to-Image Translation With Brownian Bridge Diffusion Models
VectorFusion: Text-to-SVG by Abstracting Pixel-Based Diffusion Models