Personalized text-to-image generation using diffusion models has recently been proposed and attracted lots of attention. Given a handful of images containing a novel concept (e.g., a unique toy), we aim to tune the generative model to capture fine visual details of the novel concept and generate photorealistic images following a text condition. We present a plug-in method, named ViCo, for fast and lightweight personalized generation. Specifically, we propose an image attention module to condition the diffusion process on the patch-wise visual semantics. We introduce an attention-based object mask that comes almost at no cost from the attention module. In addition, we design a simple regularization based on the intrinsic properties of text-image attention maps to alleviate the common overfitting degradation. Unlike many existing models, our method does not finetune any parameters of the original diffusion model. This allows more flexible and transferable model deployment. With only light parameter training (~6% of the diffusion U-Net), our method achieves comparable or even better performance than all state-of-the-art models both qualitatively and quantitatively.
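As a rough illustration of the two ideas in the abstract — conditioning on patch-wise visual semantics via cross-attention, and deriving an object mask from the attention weights — here is a minimal NumPy sketch. This is not ViCo's actual implementation: the function names, shapes, and the mean-thresholding rule for the mask are all illustrative assumptions, and the real method operates inside a diffusion U-Net with learned projections.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def image_cross_attention(latent, ref):
    """Cross-attention from latent patches (queries) to reference-image
    patches (keys/values). latent: (Nq, d), ref: (Nk, d).
    Learned Q/K/V projections are omitted for brevity."""
    d = latent.shape[-1]
    scores = latent @ ref.T / np.sqrt(d)      # (Nq, Nk) similarity
    attn = softmax(scores, axis=-1)           # each row sums to 1
    out = attn @ ref                          # (Nq, d) attended features
    # Crude object mask over reference patches: average attention each
    # patch receives, binarized at the mean (a stand-in for the paper's
    # attention-based mask, which it obtains "almost at no cost").
    recv = attn.mean(axis=0)                  # (Nk,)
    mask = (recv > recv.mean()).astype(float)
    return out, attn, mask

rng = np.random.default_rng(0)
latent = rng.standard_normal((16, 8))         # 16 latent patches, dim 8
ref = rng.standard_normal((16, 8))            # 16 reference patches
out, attn, mask = image_cross_attention(latent, ref)
```

In the plug-in setting the abstract describes, only modules like this would be trained (~6% of the U-Net's parameters) while the original diffusion weights stay frozen.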
AkihikoWatanabe changed the title from 「あ」 to "ViCo: Detail-Preserving Visual Condition for Personalized Text-to-Image Generation, Shaozhe Hao+, N/A, arXiv'23" on Jun 16, 2023
URL
Affiliations
Abstract
Translation (by gpt-3.5-turbo)
Summary (by gpt-3.5-turbo)