StableRep: Synthetic Images from Text-to-Image Models Make Strong Visual Representation Learners, Yonglong Tian+, N/A, arXiv'23

Jun 16, 2023

Abstract

We investigate the potential of learning visual representations using synthetic images generated by text-to-image models. This is a natural question in light of the excellent performance of such models in generating high-quality images. We consider specifically Stable Diffusion, one of the leading open-source text-to-image models. We show that (1) when the generative model is configured with a proper classifier-free guidance scale, training self-supervised methods on synthetic images can match or beat the real-image counterpart; (2) by treating the multiple images generated from the same text prompt as positives for each other, we develop a multi-positive contrastive learning method, which we call StableRep. With solely synthetic images, the representations learned by StableRep surpass those learned by SimCLR and CLIP using the same set of text prompts and corresponding real images on large-scale datasets. When we further add language supervision, StableRep trained with 20M synthetic images achieves better accuracy than CLIP trained with 50M real images.
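For point (1), the classifier-free guidance scale is exposed directly in common Stable Diffusion tooling. Below is a minimal sketch using Hugging Face diffusers; the checkpoint id and the guidance scale of 8.0 are illustrative assumptions, not the paper's exact configuration. `num_images_per_prompt` produces the multiple samples per caption that StableRep later treats as mutual positives.

```python
import torch
from diffusers import StableDiffusionPipeline

# Illustrative checkpoint and hyperparameters; the paper's tuned values may differ.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompt = "a photo of a golden retriever playing in the snow"
# Multiple images per prompt become positives for each other; the
# classifier-free guidance scale is the knob the abstract says must be tuned.
images = pipe(prompt, num_images_per_prompt=4, guidance_scale=8.0).images
```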
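For point (2), here is a minimal PyTorch sketch of a multi-positive contrastive objective in the spirit the abstract describes: images generated from the same prompt define the positive set, and the loss is the cross-entropy between a uniform distribution over those positives and the softmax over pairwise similarities. The function name, temperature, and batching scheme are assumptions, not the authors' reference implementation.

```python
import torch
import torch.nn.functional as F

def multi_positive_contrastive_loss(embeddings: torch.Tensor,
                                    prompt_ids: torch.Tensor,
                                    temperature: float = 0.1) -> torch.Tensor:
    """Contrastive loss where images sharing a text prompt are positives.

    embeddings: (N, D) encoder features; prompt_ids: (N,) prompt index per image.
    Hypothetical sketch of the multi-positive idea, not the paper's exact recipe.
    """
    z = F.normalize(embeddings, dim=1)            # (N, D) unit-norm features
    logits = z @ z.t() / temperature              # (N, N) pairwise similarities
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    logits = logits.masked_fill(self_mask, float("-inf"))  # exclude self-pairs
    # Target distribution: uniform over the other images from the same prompt.
    pos = (prompt_ids[:, None] == prompt_ids[None, :]) & ~self_mask
    target = pos.float()
    target = target / target.sum(dim=1, keepdim=True).clamp(min=1)
    # Cross-entropy between the target distribution and the softmax over logits.
    loss = -(target * F.log_softmax(logits, dim=1)).sum(dim=1).mean()
    return loss
```

For a batch built from B prompts with m generated images each, `prompt_ids` would repeat each prompt index m times, giving every image m-1 positives.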