Text-to-image (T2I) models offer a new level of flexibility by allowing users to guide the creative process through natural language. However, personalizing these models to align with user-provided visual concepts remains a challenging problem. T2I personalization poses multiple hard challenges, such as maintaining high visual fidelity while allowing creative control, combining multiple personalized concepts in a single image, and keeping a small model size. We present Perfusion, a T2I personalization method that addresses these challenges using dynamic rank-1 updates to the underlying T2I model. Perfusion avoids overfitting by introducing a new mechanism that "locks" new concepts' cross-attention Keys to their superordinate category. Additionally, we develop a gated rank-1 approach that lets us control the influence of a learned concept at inference time and combine multiple concepts. This allows runtime-efficient balancing of visual fidelity and textual alignment with a single 100KB trained model, five orders of magnitude smaller than the current state of the art. Moreover, it can span different operating points across the Pareto front without additional training. Finally, we show that Perfusion outperforms strong baselines both qualitatively and quantitatively. Importantly, key-locking leads to novel results compared to traditional approaches, making it possible to portray personalized object interactions in unprecedented ways, even in one-shot settings.
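The abstract's core mechanism is a gated rank-1 update to a weight matrix. As a hedged illustration only (not the paper's actual implementation; all variable names and dimensions here are hypothetical), the idea can be sketched as adding a gated outer product to a base projection matrix, which also shows why the trained artifact can be so small:

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in = 4, 3

W = rng.standard_normal((d_out, d_in))  # base cross-attention projection (pretrained)
u = rng.standard_normal((d_out, 1))     # learned output direction for the new concept
v = rng.standard_normal((d_in, 1))      # learned input direction for the new concept
gate = 0.7                              # inference-time gate controlling concept strength

# Gated rank-1 update: the base weights are perturbed only along one direction.
W_edited = W + gate * (u @ v.T)

# Storing (u, v, gate) takes d_out + d_in + 1 numbers instead of d_out * d_in,
# which is why such a personalization artifact can be tiny relative to the model.
assert np.linalg.matrix_rank(W_edited - W) == 1
```

Because the update is additive and gated, setting the gate to 0 recovers the original weights, and several concepts' rank-1 terms could in principle be summed with separate gates, consistent with the abstract's claim of combining multiple concepts at inference time.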