# Conditional image generation

Conditional image generation allows you to generate images from a text prompt. The text is converted into embeddings which are used to condition the model to generate an image from noise.

The [DiffusionPipeline](https://huggingface.co/docs/diffusers/main/en/api/pipelines/overview#diffusers.DiffusionPipeline) is the easiest way to use a pre-trained diffusion system for inference.

Start by creating an instance of [DiffusionPipeline](https://huggingface.co/docs/diffusers/main/en/api/pipelines/overview#diffusers.DiffusionPipeline) and specify which pipeline [checkpoint](https://huggingface.co/models?library=diffusers&sort=downloads) you would like to download.

In this guide, you'll use [DiffusionPipeline](https://huggingface.co/docs/diffusers/main/en/api/pipelines/overview#diffusers.DiffusionPipeline) for text-to-image generation with [`runwayml/stable-diffusion-v1-5`](https://huggingface.co/runwayml/stable-diffusion-v1-5):

In [8]:
from datetime import datetime

# current_datetime = datetime.now()
# print(current_datetime)
# datetime.now().strftime('%Y%m%d_%H%M%S')

def now2str():
  return datetime.now().strftime('%Y%m%d_%H%M%S')

now2str()

'20250308_042054'
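The timestamp helper above is later used to keep reruns from overwriting earlier images. As a minimal sketch of that idea, the hypothetical helper `timestamped_path` below (not part of the original notebook) builds a full output path with the same `%Y%m%d_%H%M%S` format and accepts an explicit datetime so the result is reproducible:

```python
from datetime import datetime
from pathlib import Path


def timestamped_path(out_dir, stem, when=None, ext=".png"):
    """Build `<out_dir>/<stem>_<YYYYmmdd_HHMMSS><ext>` (hypothetical helper)."""
    when = when or datetime.now()  # default to the current time, like now2str()
    return Path(out_dir) / f"{stem}_{when.strftime('%Y%m%d_%H%M%S')}{ext}"


# Fixed datetime chosen to match the sample output shown above.
p = timestamped_path("stable-diffusion-v1-5", "celeba_0", datetime(2025, 3, 8, 4, 20, 54))
print(p.name)  # celeba_0_20250308_042054.png
```

Passing `when` explicitly is only for illustration; in the notebook the default (current time) is what matters.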

In [26]:
!pip install ipdb

from ipdb import set_trace as breakpoint

Collecting ipdb
  Downloading ipdb-0.13.13-py3-none-any.whl.metadata (14 kB)
Collecting jedi>=0.16 (from ipython>=7.31.1->ipdb)
  Downloading jedi-0.19.2-py2.py3-none-any.whl.metadata (22 kB)
Downloading ipdb-0.13.13-py3-none-any.whl (12 kB)
Downloading jedi-0.19.2-py2.py3-none-any.whl (1.6 MB)
Installing collected packages: jedi, ipdb
Successfully installed ipdb-0.13.13 jedi-0.19.2


In [1]:
from diffusers import DiffusionPipeline
from diffusers import AutoPipelineForText2Image
import torch

model_name = "stable-diffusion-v1-5"
model_path = "runwayml/stable-diffusion-v1-5"



pipeline = DiffusionPipeline.from_pretrained(model_path)
# pipeline = AutoPipelineForText2Image.from_pretrained(model_path)
# -- https://huggingface.co/docs/diffusers/en/using-diffusers/conditional_image_generation

pipeline.to("cuda")


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Loading pipeline components...:   0%|          | 0/7 [00:00<?, ?it/s]

StableDiffusionPipeline {
  "_class_name": "StableDiffusionPipeline",
  "_diffusers_version": "0.32.2",
  "_name_or_path": "runwayml/stable-diffusion-v1-5",
  "feature_extractor": [
    "transformers",
    "CLIPImageProcessor"
  ],
  "image_encoder": [
    null,
    null
  ],
  "requires_safety_checker": true,
  "safety_checker": [
    "stable_diffusion",
    "StableDiffusionSafetyChecker"
  ],
  "scheduler": [
    "diffusers",
    "PNDMScheduler"
  ],
  "text_encoder": [
    "transformers",
    "CLIPTextModel"
  ],
  "tokenizer": [
    "transformers",
    "CLIPTokenizer"
  ],
  "unet": [
    "diffusers",
    "UNet2DConditionModel"
  ],
  "vae": [
    "diffusers",
    "AutoencoderKL"
  ]
}

The [DiffusionPipeline](https://huggingface.co/docs/diffusers/main/en/api/pipelines/overview#diffusers.DiffusionPipeline) downloads and caches all modeling, tokenization, and scheduling components.
Because the model consists of roughly 1.4 billion parameters, we strongly recommend running it on a GPU.
You can move the pipeline to a GPU with `pipeline.to("cuda")`, just like you would move any PyTorch module, as done in the cell above.
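When a GPU may not be present (for example on a CPU-only Colab runtime), hard-coding `"cuda"` fails. The sketch below shows one way to choose the device defensively; `pick_device` is a hypothetical helper name, and the fallback to `"cpu"` when PyTorch is missing is an assumption made so the snippet runs anywhere:

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch can see a GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # Assumption for illustration: without PyTorch, report "cpu".
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(device)
# pipeline.to(device)  # `pipeline` is the object loaded earlier in this notebook
```

Running the pipeline on `"cpu"` works but is expected to be much slower than on a GPU.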

## 1. Generate CelebA-like images from this model


In [3]:

prompt = "a face of a celebrity"
image_domain = 'celeba'
n_samples = 16
for i in range(n_samples):
  image = pipeline(prompt).images[0]
  image.save(f"{model_name}_{image_domain}_{i}.png")
  print(f'done: {i}')

done: 0
done: 1
done: 2
done: 3
done: 4
done: 5
done: 6
done: 7
done: 8
done: 9
done: 10
done: 11
done: 12
done: 13
done: 14
done: 15



In [None]:
# def generate_images(model_name:str, prompt:str, n_samples:int, image_domain:str,
#                     verbose:bool=False):
#   for i in range(n_samples):
#     image = pipeline(prompt).images[0]
#     image.save(f"{model_name}_{image_domain}_{i}.png")
#     if verbose:
#       print(f'done: {model_name} -- {image_domain}_{i}')

# def generate_celeb_faces(model_name:str, n_samples:int=16):
#   prompt = "a face of a celebrity"
#   generate_images(model_name, prompt, n_samples, image_domain='celeba')

# def generate_cifar10_class(model_name:str, cifar_label:str, n_samples:int=16):
#   prompt = f"an image of a {cifar_label}"
#   generate_images(model_name, prompt, n_samples, image_domain=f'cifar10-{cifar_label}')

## 2. Generate CIFAR-10-like images from this model
- Generate images for each label by setting an appropriate prompt.
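Because some CIFAR-10 labels start with a vowel, a fixed template such as "an image of a {label}" produces prompts like "an image of a airplane". A minimal sketch of a prompt builder that picks the article per label follows; `cifar10_prompt` is a hypothetical helper, and whether the article noticeably changes the generated images is not something this notebook measures:

```python
CIFAR10_LABELS = [
    "airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck",
]


def cifar10_prompt(label: str) -> str:
    """Build a prompt with "a" or "an" chosen from the label's first letter."""
    article = "an" if label[0].lower() in "aeiou" else "a"
    return f"an image of {article} {label}"


prompts = {label: cifar10_prompt(label) for label in CIFAR10_LABELS}
for label, prompt in prompts.items():
    print(f"{label}: {prompt}")
```

The first-letter rule is enough for these ten labels; it is not a general English article rule.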

In [4]:

image_domain = 'cifar10'
cifar10_labels = ['car'] #todo
# cifar10_labels = [
#   "airplane",
#   "automobile",
#   "bird",
#   "cat",
#   "deer",
#   "dog",
#   "frog",
#   "horse",
#   "ship",
#   "truck"
# ]
n_samples = 16

for label in cifar10_labels:
  prompt = f'an image of a {label}'
  for i in range(n_samples):
    image = pipeline(prompt).images[0]
    image.save(f"{model_name}_{image_domain}-{label}_{i}.png")
    print(f'done: {i}')



done: 0
done: 1
done: 2
done: 3
done: 4
done: 5
done: 6
done: 7
done: 8
done: 9
done: 10
done: 11
done: 12
done: 13
done: 14
done: 15


# Run it for all models to generate both celeb faces and cifar10-{label} images
- for each class: get 16 samples for dataset visualization
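For the dataset visualization, the 16 samples of a class can be tiled into a 4x4 contact sheet. The sketch below is one possible way to do that, assuming all images share the same size; `grid_layout` and `make_grid` are hypothetical helper names, not part of the original notebook. Recent diffusers releases also provide `diffusers.utils.make_image_grid`, which may be preferable when available.

```python
import math


def grid_layout(n_images, cols, tile_w, tile_h):
    """Return the canvas size and the top-left paste offset of every tile."""
    rows = math.ceil(n_images / cols)
    offsets = [((i % cols) * tile_w, (i // cols) * tile_h) for i in range(n_images)]
    return (cols * tile_w, rows * tile_h), offsets


def make_grid(images, cols=4):
    """Paste equally sized PIL images row by row onto one canvas."""
    from PIL import Image  # Pillow is imported lazily, only when a grid is built

    tile_w, tile_h = images[0].size
    size, offsets = grid_layout(len(images), cols, tile_w, tile_h)
    grid = Image.new("RGB", size)
    for img, xy in zip(images, offsets):
        grid.paste(img, box=xy)
    return grid


# Layout for 16 tiles of 512x512 pixels in 4 columns.
size, offsets = grid_layout(16, 4, 512, 512)
print(size, offsets[5])
```

With the images from one class loaded into a list, `make_grid(images).save("grid.png")` would write the contact sheet.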


In [15]:
def generate_images(model_name:str, prompt:str, n_samples:int, image_domain:str,
                    out_dir:str="",
                    add_timestamp:bool=True,
                    verbose:bool=False):
  # NOTE: uses the global `pipeline` loaded in an earlier cell
  for i in range(n_samples):
    image = pipeline(prompt).images[0]

    # save; the optional timestamp keeps reruns from overwriting earlier files
    suffix = f"_{now2str()}" if add_timestamp else ""
    fname = f"{model_name}_{image_domain}_{i}{suffix}.png"
    image.save(f"{out_dir}/{fname}" if out_dir else fname)
    if verbose:
      print(f'done: {model_name} -- {image_domain}_{i}')

def generate_celeb_faces(model_name:str, n_samples:int=16, verbose:bool=False):
  prompt = "a face of a celebrity"
  generate_images(model_name, prompt, n_samples, image_domain='celeba',
                  out_dir=model_name,
                  verbose=verbose)

def generate_cifar10_class(model_name:str, cifar_label:str, n_samples:int=16, verbose:bool=False):
  prompt = f"an image of a {cifar_label}"
  generate_images(model_name, prompt, n_samples, image_domain=f'cifar10-{cifar_label}',
                  out_dir=model_name,
                  verbose=verbose)

In [51]:
###################
# model_name = "flux1-dev"
# model_path = "black-forest-labs/FLUX.1-dev"

##################
# create a model dict
dict_models = {
    # -- stable diffusions
    # "stable-diffusion-v1-5": "runwayml/stable-diffusion-v1-5",  #https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5
    ##### "stable-diffusion-3.5-large": "stabilityai/stable-diffusion-3.5-large", #https://huggingface.co/stabilityai/stable-diffusion-3.5-large
    #
    # -- flux models
    # "flux1-dev": "black-forest-labs/FLUX.1-dev", #https://huggingface.co/black-forest-labs/FLUX.1-dev
    "flux1-schneller": "black-forest-labs/FLUX.1-schnell", # https://huggingface.co/black-forest-labs/FLUX.1-schnell
    ####"flux.1-faster": "black-forest-labs/FLUX.1-faster",
    #
    # -- midjourney-like
    #### "midjourney-v4-diffusion": "prompthero/midjourney-v4-diffusion",
    "openjourney": "prompthero/openjourney",  #https://huggingface.co/prompthero/openjourney
    #
    # -- dalle-like
    # -->need extra step so do it separately
    # "dalle-3-xl-lora-v2": "fluently/Fluently-XL-v2", #https://huggingface.co/ehristoforu/dalle-3-xl-v2
}



In [48]:
# create folders for each model
from pathlib import Path
for model_name in dict_models.keys():
  Path(model_name).mkdir(parents=True, exist_ok=True)

In [41]:
!huggingface-cli login


    _|    _|  _|    _|    _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|_|_|_|    _|_|      _|_|_|  _|_|_|_|
    _|    _|  _|    _|  _|        _|          _|    _|_|    _|  _|            _|        _|    _|  _|        _|
    _|_|_|_|  _|    _|  _|  _|_|  _|  _|_|    _|    _|  _|  _|  _|  _|_|      _|_|_|    _|_|_|_|  _|        _|_|_|
    _|    _|  _|    _|  _|    _|  _|    _|    _|    _|    _|_|  _|    _|      _|        _|    _|  _|        _|
    _|    _|    _|_|      _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|        _|    _|    _|_|_|  _|_|_|_|

    To log in, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Enter your token (input will not be visible): 
Add token as git credential? (Y/n) Y
Token is valid (permission: fineGrained).
The token `test` has been saved to /root/.cache/huggingface/stored_tokens
Cannot authenticate through git-credential as no helper is defined on your machine.
You might have to re-authent

In [None]:


import gc  # gc.collect() is called after each model below

# test: generate_images
n_samples = 1
cifar10_labels = [
  "airplane",
  "automobile",
  "bird",
  "cat",
  "deer",
  "dog",
  "frog",
  "horse",
  "ship",
  "truck"
]


##### start sampling:
for model_name, model_path in dict_models.items():
  print('='*10)
  print(f'model_name: {model_name}')

  # load pretrained model
  # pipeline = AutoPipelineForText2Image.from_pretrained(model_path)
  pipeline = DiffusionPipeline.from_pretrained(model_path)
  pipeline.to("cuda")
  print('Loaded: ',  model_name)


  # generate celeb faces
  generate_celeb_faces(model_name, n_samples=n_samples, verbose=True)
  print('Done: celeb faces')


  # generate cifar10 - each label:
  for label in cifar10_labels:
    generate_cifar10_class(model_name, label, n_samples=n_samples, verbose=True)
    print(f'Done: cifar10-{label}')

  # breakpoint()

  # delete model from gpu memory
  pipeline.to('cpu')
  del pipeline

  gc.collect()
  torch.cuda.empty_cache()


  # break


model_name: flux1-schneller


Fetching 23 files:   0%|          | 0/23 [00:00<?, ?it/s]



---



# For garbage collection:
https://stackoverflow.com/a/78704570



In [29]:
import gc
import torch

pipeline.to('cpu')
del pipeline

gc.collect()
torch.cuda.empty_cache()
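The cleanup above only frees memory if no other reference to the pipeline survives the `del`. The sketch below illustrates that point with a plain Python object and `weakref`; `FakePipeline` and `holder` are hypothetical stand-ins, and no GPU or diffusers install is involved:

```python
import gc
import weakref


class FakePipeline:
    """Stand-in for a loaded pipeline; hypothetical, for illustration only."""


pipe = FakePipeline()
holder = {"last": pipe}          # a second reference, e.g. kept by other code
ref = weakref.ref(pipe)          # lets us observe the object without owning it

del pipe
gc.collect()
alive_with_holder = ref() is not None   # still alive: `holder` references it

holder.clear()                   # drop the last remaining reference
gc.collect()
freed = ref() is None            # now the object has been collected

print(alive_with_holder, freed)
```

In the notebook, the same reasoning suggests checking for leftover references (other variables, or cell outputs that hold the pipeline) when `torch.cuda.empty_cache()` does not seem to release memory.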

In [50]:
pipe = DiffusionPipeline.from_pretrained("black-forest-labs/FLUX.1-schnell")

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = pipe(prompt).images[0]

model_index.json:   0%|          | 0.00/536 [00:00<?, ?B/s]

Fetching 23 files:   0%|          | 0/23 [00:00<?, ?it/s]

model.safetensors:   0%|          | 0.00/246M [00:00<?, ?B/s]

config.json:   0%|          | 0.00/782 [00:00<?, ?B/s]

model-00002-of-00002.safetensors:   0%|          | 0.00/4.53G [00:00<?, ?B/s]

config.json:   0%|          | 0.00/613 [00:00<?, ?B/s]

model-00001-of-00002.safetensors:   0%|          | 0.00/4.99G [00:00<?, ?B/s]

model.safetensors.index.json:   0%|          | 0.00/19.9k [00:00<?, ?B/s]

scheduler_config.json:   0%|          | 0.00/274 [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/525k [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/588 [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/1.06M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/2.54k [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/705 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/20.8k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/2.42M [00:00<?, ?B/s]

spiece.model:   0%|          | 0.00/792k [00:00<?, ?B/s]

config.json:   0%|          | 0.00/321 [00:00<?, ?B/s]

(…)pytorch_model-00001-of-00003.safetensors:   0%|          | 0.00/9.96G [00:00<?, ?B/s]

(…)pytorch_model-00002-of-00003.safetensors:   0%|          | 0.00/9.95G [00:00<?, ?B/s]

(…)pytorch_model-00003-of-00003.safetensors:   0%|          | 0.00/3.87G [00:00<?, ?B/s]

(…)ion_pytorch_model.safetensors.index.json:   0%|          | 0.00/121k [00:00<?, ?B/s]

config.json:   0%|          | 0.00/774 [00:00<?, ?B/s]

diffusion_pytorch_model.safetensors:   0%|          | 0.00/168M [00:00<?, ?B/s]

KeyboardInterrupt: 