Towards Universal Fake Image Detectors That Generalize Across Generative Models
Implicit Diffusion Models for Continuous Super-Resolution
High-Fidelity Guided Image Synthesis With Latent Diffusion Models
DBARF: Deep Bundle-Adjusting Generalizable Neural Radiance Fields
Deep Arbitrary-Scale Image Super-Resolution via Scale-Equivariance Pursuit
Balanced Spherical Grid for Egocentric View Synthesis
SDFusion: Multimodal 3D Shape Completion, Reconstruction, and Generation
DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation
Self-Guided Diffusion Models
Multi-Concept Customization of Text-to-Image Diffusion
3D-Aware Conditional Image Synthesis
QuantArt: Quantizing Image Style Transfer Towards High Visual Fidelity
SceneComposer: Any-Level Semantic Image Synthesis
DiffCollage: Parallel Generation of Large Content With Diffusion Models
Putting People in Their Place: Affordance-Aware Human Insertion Into Scenes
Hybrid Neural Rendering for Large-Scale Scenes With Motion Blur
Binary Latent Diffusion
StyleRes: Transforming the Residuals for Real Image Editing With StyleGAN
KD-DLGAN: Data Limited Image Generation via Knowledge Distillation
SeaThru-NeRF: Neural Radiance Fields in Scattering Media
PointAvatar: Deformable Point-Based Head Avatars From Videos
3DAvatarGAN: Bridging Domains for Personalized Editable Avatars
Neural Preset for Color Style Transfer
Zero-Shot Generative Model Adaptation via Image-Specific Prompt Learning
DyNCA: Real-Time Dynamic Texture Synthesis Using Neural Cellular Automata
Exploring Incompatible Knowledge Transfer in Few-Shot Image Generation
HouseDiffusion: Vector Floorplan Generation via a Diffusion Model With Discrete and Continuous Denoising
Towards Accurate Image Coding: Improved Autoregressive Image Generation With Dynamic Vector Quantization
RiDDLE: Reversible and Diversified De-Identification With Latent Encryptor
LayoutDiffusion: Controllable Diffusion Model for Layout-to-Image Generation
LipFormer: High-Fidelity and Generalizable Talking Face Generation With a Pre-Learned Facial Codebook
Not All Image Regions Matter: Masked Vector Quantization for Autoregressive Image Generation
GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis
High-Fidelity Generalized Emotional Talking Face Generation With Multi-Modal Emotion Space Learning
Consistent View Synthesis With Pose-Guided Diffusion Models
StyleSync: High-Fidelity Generalized and Personalized Lip Sync in Style-Based Generator
Imagic: Text-Based Real Image Editing With Diffusion Models
Large-Capacity and Flexible Video Steganography via Invertible Neural Network
Quantitative Manipulation of Custom Attributes on 3D-Aware Image Synthesis
Learning Detailed Radiance Manifolds for High-Fidelity and 3D-Consistent Portrait Synthesis From Monocular Image
CF-Font: Content Fusion for Few-Shot Font Generation
One-Shot High-Fidelity Talking-Head Synthesis With Deformable Neural Radiance Field
Unsupervised Domain Adaption With Pixel-Level Discriminator for Image-Aware Layout Generation
Diffusion Probabilistic Model Made Slim
Collaborative Diffusion for Multi-Modal Face Generation and Editing
High-Fidelity Facial Avatar Reconstruction From Monocular Video With Generative Priors
Network-Free, Unsupervised Semantic Segmentation With Synthetic Images
Visual Prompt Tuning for Generative Transfer Learning
Specialist Diffusion: Plug-and-Play Sample-Efficient Fine-Tuning of Text-to-Image Diffusion Models To Learn Any Unseen Style
Catch Missing Details: Image Reconstruction With Frequency Augmented Variational Autoencoder
Towards Bridging the Performance Gaps of Joint Energy-Based Models
GLeaD: Improving GANs With a Generator-Leading Task
Structural Multiplane Image: Bridging Neural View Synthesis and 3D Reconstruction
SPARF: Neural Radiance Fields From Sparse and Noisy Poses
DeltaEdit: Exploring Text-Free Training for Text-Driven Image Manipulation
Inferring and Leveraging Parts From Object Shape for Improving Semantic Image Synthesis
VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation
MaskSketch: Unpaired Structure-Guided Masked Image Generation
Affordance Diffusion: Synthesizing Hand-Object Interactions
Interactive Cartoonization With Controllable Perceptual Factors
MetaPortrait: Identity-Preserving Talking Head Generation With Fast Personalized Adaptation
Paint by Example: Exemplar-Based Image Editing With Diffusion Models
GLIGEN: Open-Set Grounded Text-to-Image Generation
L-CoIns: Language-Based Colorization With Instance Awareness
DiffTalk: Crafting Diffusion Models for Generalized Audio-Driven Portraits Animation
Evading DeepFake Detectors via Adversarial Statistical Consistency
GlassesGAN: Eyewear Personalization Using Synthetic Appearance Discovery and Targeted Subspace Modeling
GP-VTON: Towards General Purpose Virtual Try-On via Collaborative Local-Flow Global-Parsing Learning
Where Is My Spot? Few-Shot Image Generation via Latent Subspace Optimization
Regularized Vector Quantization for Tokenized Image Synthesis
EDICT: Exact Diffusion Inversion via Coupled Transformations
Scaling Up GANs for Text-to-Image Synthesis
Shape-Aware Text-Driven Layered Video Editing
A Unified Pyramid Recurrent Network for Video Frame Interpolation
TAPS3D: Text-Guided 3D Textured Shape Generation From Pseudo Supervision
Fine-Grained Face Swapping via Regional GAN Inversion
OTAvatar: One-Shot Talking Face Avatar With Controllable Tri-Plane Rendering
Deep Stereo Video Inpainting
StyleGAN Salon: Multi-View Latent Optimization for Pose-Invariant Hairstyle Transfer
Cross-GAN Auditing: Unsupervised Identification of Attribute Level Similarities and Differences Between Pretrained Generative Models
Unsupervised Volumetric Animation
SINE: SINgle Image Editing With Text-to-Image Diffusion Models
Progressive Disentangled Representation Learning for Fine-Grained Controllable Talking Head Synthesis
CAP-VSTNet: Content Affinity Preserved Versatile Style Transfer
DeepVecFont-v2: Exploiting Transformers To Synthesize Vector Fonts With Higher Quality
LEMaRT: Label-Efficient Masked Region Transform for Image Harmonization
SINE: Semantic-Driven Image-Based NeRF Editing With Prior-Guided Editing Field
Exploring Intra-Class Variation Factors With Learnable Cluster Prompts for Semi-Supervised Image Synthesis
Image Cropping With Spatial-Aware Feature and Rank Consistency
Picture That Sketch: Photorealistic Image Generation From Abstract Sketches
MonoHuman: Animatable Human Neural Field From Monocular Video
PixHt-Lab: Pixel Height Based Light Effect Generation for Image Compositing
Neural Pixel Composition for 3D-4D View Synthesis From Multi-Views
SpaText: Spatio-Textual Representation for Controllable Image Generation
Exploring Motion Ambiguity and Alignment for High-Quality Video Frame Interpolation
MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation
Synthesizing Photorealistic Virtual Humans Through Cross-Modal Disentanglement
Video Probabilistic Diffusion Models in Projected Latent Space
Variational Distribution Learning for Unsupervised Text-to-Image Generation
Linking Garment With Person via Semantically Associated Landmarks for Virtual Try-On
UV Volumes for Real-Time Rendering of Editable Free-View Human Performance
NULL-Text Inversion for Editing Real Images Using Guided Diffusion Models
Polynomial Implicit Neural Representations for Large Diverse Datasets
Plug-and-Play Diffusion Features for Text-Driven Image-to-Image Translation
Conditional Image-to-Video Generation With Latent Flow Diffusion Models
Local 3D Editing via 3D Distillation of CLIP Knowledge
Private Image Generation With Dual-Purpose Auxiliary Classifier
MAGVIT: Masked Generative Video Transformer
Dimensionality-Varying Diffusion Process
VIVE3D: Viewpoint-Independent Video Editing Using 3D-Aware GANs
LANIT: Language-Driven Image-to-Image Translation for Unlabeled Data
DATID-3D: Diversity-Preserved Domain Adaptation Using Text-to-Image Diffusion for 3D Generative Model
Delving StyleGAN Inversion for Image Editing: A Foundation Latent Space Viewpoint
High-Fidelity and Freely Controllable Talking Head Video Generation
SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation
StyleRF: Zero-Shot 3D Style Transfer of Neural Radiance Fields
MOSO: Decomposing MOtion, Scene and Object for Video Prediction
Multi Domain Learning for Motion Magnification
GazeNeRF: 3D-Aware Gaze Redirection With Neural Radiance Fields
Hierarchical B-Frame Video Coding Using Two-Layer CANF Without Motion Coding
Blemish-Aware and Progressive Face Retouching With Limited Paired Data
Text-Guided Unsupervised Latent Transformation for Multi-Attribute Image Manipulation
NeuralField-LDM: Scene Generation With Hierarchical Latent Diffusion Models
Fix the Noise: Disentangling Source Feature for Controllable Domain Translation
Class-Balancing Diffusion Models
DPE: Disentanglement of Pose and Expression for General Video Portrait Editing
Inversion-Based Style Transfer With Diffusion Models
Deep Curvilinear Editing: Commutative and Nonlinear Image Manipulation for Pretrained Deep Generative Model
FlowGrad: Controlling the Output of Generative ODEs With Gradients
Graph Transformer GANs for Graph-Constrained House Generation
Master: Meta Style Transformer for Controllable Zero-Shot and Few-Shot Artistic Style Transfer
Next3D: Generative Neural Texture Rasterization for 3D-Aware Head Avatars
Ham2Pose: Animating Sign Language Notation Into Pose Sequences
Neural Transformation Fields for Arbitrary-Styled Font Generation
LayoutDM: Transformer-Based Diffusion Model for Layout Generation
Removing Objects From Neural Radiance Fields
Person Image Synthesis via Denoising Diffusion Model
AdaptiveMix: Improving GAN Training via Feature Space Shrinkage
Learning Joint Latent Space EBM Prior Model for Multi-Layer Generator
3D Neural Field Generation Using Triplane Diffusion
OmniAvatar: Geometry-Guided Controllable 3D Head Synthesis
RWSC-Fusion: Region-Wise Style-Controlled Fusion Network for the Prohibited X-Ray Security Image Synthesis
ObjectStitch: Object Compositing With Diffusion Model
Persistent Nature: A Generative Model of Unbounded 3D Worlds
Masked and Adaptive Transformer for Exemplar Based Image Translation
Spider GAN: Leveraging Friendly Neighbors To Accelerate GAN Training
Re-IQA: Unsupervised Learning for Image Quality Assessment in the Wild
Align Your Latents: High-Resolution Video Synthesis With Latent Diffusion Models
All Are Worth Words: A ViT Backbone for Diffusion Models
Few-Shot Semantic Image Synthesis With Class Affinity Transfer
Blowing in the Wind: CycleNet for Human Cinemagraphs From Still Images
StyleGene: Crossover and Mutation of Region-Level Facial Genes for Kinship Face Synthesis
MixNeRF: Modeling a Ray With Mixture Density for Novel View Synthesis From Sparse Inputs
MoStGAN-V: Video Generation With Temporal Motion Styles
Frame Interpolation Transformer and Uncertainty Guidance
Towards End-to-End Generative Modeling of Long Videos With Memory-Efficient Bidirectional Transformers
HOLODIFFUSION: Training a 3D Diffusion Model Using 2D Images
Neural Texture Synthesis With Guided Correspondence
PanoHead: Geometry-Aware 3D Full-Head Synthesis in 360°
InstructPix2Pix: Learning To Follow Image Editing Instructions
Unpaired Image-to-Image Translation With Shortest Path Regularization
Freestyle Layout-to-Image Synthesis
On Distillation of Guided Diffusion Models
Single Image Backdoor Inversion via Robust Smoothed Classifiers
Make-a-Story: Visual Memory Conditioned Consistent Story Generation
Towards Practical Plug-and-Play Diffusion Models
Efficient Scale-Invariant Generator With Column-Row Entangled Pixel Synthesis
Wavelet Diffusion Models Are Fast and Scalable Image Generators
3D GAN Inversion With Facial Symmetry Prior
Seeing What You Said: Talking Face Generation Guided by a Lip Reading Expert
PCT-Net: Full Resolution Image Harmonization Using Pixel-Wise Color Transformations
ERNIE-ViLG 2.0: Improving Text-to-Image Diffusion Model With Knowledge-Enhanced Mixture-of-Denoising-Experts
Video Compression With Entropy-Constrained Neural Representations
Uncovering the Disentanglement Capability in Text-to-Image Diffusion Models
CoralStyleCLIP: Co-Optimized Region and Layer Selection for Image Editing
Diffusion Video Autoencoders: Toward Temporally Consistent Face Video Editing via Disentangled Video Encoding
Sequential Training of GANs Against GAN-Classifiers Reveals Correlated Knowledge Gaps Present Among Independently Trained GAN Instances
Attribute-Preserving Face Dataset Anonymization via Latent Code Optimization
Shifted Diffusion for Text-to-Image Generation
HandsOff: Labeled Dataset Generation With No Additional Human Annotations
Lookahead Diffusion Probabilistic Models for Refining Mean Estimation
Imagen Editor and EditBench: Advancing and Evaluating Text-Guided Image Inpainting
Re-GAN: Data-Efficient GANs Training via Architectural Reconfiguration
BBDM: Image-to-Image Translation With Brownian Bridge Diffusion Models
VectorFusion: Text-to-SVG by Abstracting Pixel-Based Diffusion Models