Personalized text-to-image generation using diffusion models has recently been proposed and attracted lots of attention. Given a handful of images containing a novel concept (e.g., a unique toy), we aim to tune the generative model to capture fine visual details of the novel concept and generate photorealistic images following a text condition. We present a plug-in method, named ViCo, for fast and lightweight personalized generation. Specifically, we propose an image attention module to condition the diffusion process on the patch-wise visual semantics. We introduce an attention-based object mask that comes almost at no cost from the attention module. In addition, we design a simple regularization based on the intrinsic properties of text-image attention maps to alleviate the common overfitting degradation. Unlike many existing models, our method does not finetune any parameters of the original diffusion model. This allows more flexible and transferable model deployment. With only light parameter training (~6% of the diffusion U-Net), our method achieves comparable or even better performance than all state-of-the-art models both qualitatively and quantitatively.
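As a rough illustration of the two ideas in the abstract — conditioning on patch-wise visual semantics via cross-attention, and deriving an object mask from the attention weights — here is a minimal NumPy sketch. This is not ViCo's actual implementation: the function names, shapes, and the mean-thresholding rule for the mask are all illustrative assumptions, and the real method operates inside a diffusion U-Net with learned projections.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def image_cross_attention(latent, ref):
    """Cross-attention from latent patches (queries) to reference-image
    patches (keys/values). latent: (Nq, d), ref: (Nk, d).
    Learned Q/K/V projections are omitted for brevity."""
    d = latent.shape[-1]
    scores = latent @ ref.T / np.sqrt(d)      # (Nq, Nk) similarity
    attn = softmax(scores, axis=-1)           # each row sums to 1
    out = attn @ ref                          # (Nq, d) attended features
    # Crude object mask over reference patches: average attention each
    # patch receives, binarized at the mean (a stand-in for the paper's
    # attention-based mask, which it obtains "almost at no cost").
    recv = attn.mean(axis=0)                  # (Nk,)
    mask = (recv > recv.mean()).astype(float)
    return out, attn, mask

rng = np.random.default_rng(0)
latent = rng.standard_normal((16, 8))         # 16 latent patches, dim 8
ref = rng.standard_normal((16, 8))            # 16 reference patches
out, attn, mask = image_cross_attention(latent, ref)
```

In the plug-in setting the abstract describes, only modules like this would be trained (~6% of the U-Net's parameters) while the original diffusion weights stay frozen.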
AkihikoWatanabe changed the title from 「あ」 to "ViCo: Detail-Preserving Visual Condition for Personalized Text-to-Image Generation, Shaozhe Hao+, N/A, arXiv'23" on Jun 16, 2023
URL
Affiliations
Abstract
Translation (by gpt-3.5-turbo)
Summary (by gpt-3.5-turbo)