## Image Generation using Diffusion Models: **Stable Diffusion, FLUX ...**

> This notebook is part of the [AI Image Editing and Manipulation](https://github.com/afondiel/computer-vision-challenge/blob/main/L2_06_AI_Assisted_Image_Editing_and_Manipulation/) pipeline from the [**Computer Vision Challenge**](https://github.com/afondiel/computer-vision-challenge).

<img width="400" height="400" src="https://github.com/afondiel/computer-vision-challenge/blob/main/L2_06_AI_Assisted_Image_Editing_and_Manipulation/docs/pipeline-last.png?raw=true">


|Notebook|Demo (Gradio)|
|--|--|
|[![Open notebook in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/afondiel/computer-vision-challenge/blob/main/L2_06_AI_Assisted_Image_Editing_and_Manipulation/notebooks/Image_Generator_Diffusion_Models.ipynb)|[HF Space (Ongoing)](https://huggingface.co/spaces)|

### Install Dependencies

In [1]:
# Hugging Face ecosystem
!pip install diffusers transformers accelerate scipy safetensors -U


In [2]:
# torch ecosystem
!pip install torch torchvision torchaudio fastai timm -U



In [3]:
# gradio modules
!pip install gradio gradio_imageslider -U



In [5]:
import os
import numpy as np
import torch
from diffusers import DiffusionPipeline, EulerDiscreteScheduler, LMSDiscreteScheduler
from PIL import Image
import logging
import gradio as gr
import matplotlib.pyplot as plt
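Because every install cell upgrades with `-U`, the package versions you end up with depend on when the notebook is run, and both the diffusers and Gradio APIs change between releases. A small sketch, using only the standard library, that reports what is actually installed before running the pipelines; `report_versions` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version


def report_versions(packages):
    """Return {package: installed version string, or None if it is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Print the versions that matter for this notebook.
for pkg, ver in report_versions(
    ["diffusers", "transformers", "accelerate", "torch", "gradio"]
).items():
    print(f"{pkg:>12}: {ver or 'not installed'}")
```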

### Visualization

In [6]:
# display image function
def display_image(image, model_id):
    plt.figure(figsize=(10, 10))
    plt.imshow(image)
    plt.title(f"Generated Image: {model_id} Model")
    plt.axis("off")
    plt.show()

### Image Generator

In [7]:
# ----------------------------------------------
# Stable Diffusion: v2, v2-base ...
# ----------------------------------------------

def image_generator_sd(
    model_id: str,
    prompt: str,
    negative_prompt: str = None,
    # num_inference_steps: int = 25, # Lighter on cpu
    num_inference_steps: int = 50,
    guidance_scale: float = 7.5,
    height: int = 512,
    width: int = 512,
    seed: int = None,
    strength: float = 0.75,  # Only used for inpainting or image-to-image tasks
    eta: float = 0.0,
    output_path: str = "generated_image_sd.png",
    output_format: str = "PNG",
    callback=None,
    callback_steps: int = 1
) -> Image.Image:
    """
    Generate an image from a text prompt using a Stable Diffusion model.

    Args:
        model_id (str): The identifier of the model to use (e.g., "stabilityai/stable-diffusion-2").
        prompt (str): The text prompt to guide the image generation.
        negative_prompt (str, optional): A negative prompt to suppress certain features (default is None).
        num_inference_steps (int, optional): Number of inference steps (default is 50).
        guidance_scale (float, optional): The guidance scale (default is 7.5).
        height (int, optional): The height of the generated image (default is 512).
        width (int, optional): The width of the generated image (default is 512).
        seed (int, optional): Seed for reproducibility (default is None).
        strength (float, optional): How much to transform an init image in image-to-image/inpainting pipelines (default is 0.75). Ignored by the text-to-image pipeline used here.
        eta (float, optional): DDIM noise parameter (default is 0.0). Only used by DDIM-style schedulers; ignored by EulerDiscreteScheduler.
        output_path (str, optional): Path to save the generated image (default is "generated_image_sd.png").
        output_format (str, optional): Format to save the generated image (default is "PNG").
        callback (function, optional): A callback function to monitor the progress (default is None). Deprecated in recent diffusers releases in favor of `callback_on_step_end`.
        callback_steps (int, optional): Number of steps between callbacks (default is 1).

    Returns:
        Image.Image: The generated image as a PIL Image.
    """
    try:
        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

        # Set up the scheduler and pipeline
        scheduler = EulerDiscreteScheduler.from_pretrained(model_id, subfolder="scheduler")
        sd_pipe = DiffusionPipeline.from_pretrained(
            model_id,
            scheduler=scheduler,
            torch_dtype=torch.float16 if torch.cuda.is_available() else torch.bfloat16,
            low_cpu_mem_usage=True,
        ).to(device)

        # Set the random seed for reproducibility
        generator = torch.Generator(device).manual_seed(seed) if seed is not None else None

        # Generate the image
        output = sd_pipe(
            prompt=prompt,
            negative_prompt=negative_prompt,
            num_inference_steps=num_inference_steps,
            guidance_scale=guidance_scale,
            height=height,
            width=width,
            eta=eta,
            generator=generator,
            callback=callback,
            callback_steps=callback_steps,
        )

        generated_image = output.images[0]
        generated_image.save(output_path, format=output_format)  # Save the generated image

        return generated_image

    except Exception as e:
        logging.error(f"Failed to generate image: {str(e)}")
        return None
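`guidance_scale` controls classifier-free guidance (CFG): at each denoising step the pipeline predicts the noise twice (in one batched UNet call), once conditioned on the prompt and once on an unconditional embedding (the empty prompt, or `negative_prompt` when given), then extrapolates from the unconditional prediction toward the conditioned one. diffusers only takes this path when `guidance_scale > 1`. A minimal NumPy sketch of the combination step alone, assuming both predictions are already computed; `apply_cfg` is a hypothetical helper, not a diffusers function.

```python
import numpy as np


def apply_cfg(noise_uncond: np.ndarray, noise_text: np.ndarray, guidance_scale: float) -> np.ndarray:
    """Classifier-free guidance: move the prediction away from the unconditional branch.

    guidance_scale == 1.0 returns the prompt-conditioned prediction unchanged;
    larger values follow the prompt more strongly (often at the cost of diversity).
    """
    return noise_uncond + guidance_scale * (noise_text - noise_uncond)


# Toy noise predictions with the SD latent layout (batch, 4, H/8, W/8) for a 512x512 image.
rng = np.random.default_rng(0)
eps_uncond = rng.standard_normal((1, 4, 64, 64)).astype(np.float32)
eps_text = rng.standard_normal((1, 4, 64, 64)).astype(np.float32)

guided = apply_cfg(eps_uncond, eps_text, guidance_scale=7.5)
print(guided.shape)  # (1, 4, 64, 64)
```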


> **WARNING**

- I ran out of memory trying to run FLUX on CPU, even on a high-RAM runtime (~51 GB).
- Make sure you have at least a T4 GPU (free on Colab); with CPU offloading, GPUs with 4 to 32 GB of VRAM can run it. I managed to generate an image with 15 GB of VRAM.
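A back-of-the-envelope check of why offloading is needed: weight memory is roughly parameter count times bytes per parameter. The FLUX.1 [schnell] model card describes it as a 12-billion-parameter transformer, so its weights alone take about 24 GB in float16 (more than a T4's 16 GB) and about 48 GB in float32, the dtype this notebook uses on CPU, which leaves little of a ~51 GB runtime for the text encoders and activations. The helper below is a hypothetical sketch of that arithmetic only; it does not measure real memory usage.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Approximate memory needed just to hold the weights, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}

flux_transformer_params = 12e9  # parameter count stated on the FLUX.1 [schnell] model card
for dtype, nbytes in BYTES_PER_PARAM.items():
    print(f"{dtype:>8}: ~{weight_memory_gb(flux_transformer_params, nbytes):.0f} GB")
# float32 -> ~48 GB, float16 / bfloat16 -> ~24 GB (transformer weights only)
```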


In [8]:
# ----------------------------------------------
# FLUX 1: Schnell, Dev ...
# ----------------------------------------------
import torch
from diffusers import FluxPipeline

def image_generator_flux(
    model_id: str,
    prompt: str,
    num_inference_steps: int = 4,  # FLUX.1-schnell is distilled for 1 to 4 steps (SD uses ~50)
    guidance_scale: float = 0.0,  # schnell does not use guidance, so 0.0 is the usual value
    height: int = 512,
    width: int = 512,
    seed: int = 0,
    output_type="pil",
    output_path: str = "generated_image_flux.png",
    output_format: str = "PNG",
) -> Image.Image:
    """
    Generate an image from a text prompt using a FLUX.1 model (e.g. schnell or dev).

    Args:
        model_id (str): The identifier of the model to use.
        prompt (str): The text prompt to guide the image generation.
        num_inference_steps (int, optional): Number of inference steps (default is 4).
        guidance_scale (float, optional): The guidance scale (default is 0.0).
        height (int, optional): The height of the generated image (default is 512).
        width (int, optional): The width of the generated image (default is 512).
        seed (int, optional): Seed for reproducibility (default is 0).
        output_path (str, optional): Path to save the generated image (default is "generated_image_flux.png").
        output_format (str, optional): Format to save the generated image (default is "PNG").

    Returns:
        Image.Image: The generated image as a PIL Image.
    """
    try:
        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

        # Load the pipeline (FLUX ships with its own flow-matching scheduler)
        flux_pipe = FluxPipeline.from_pretrained(
            model_id,
            torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
        )
        if device.type == "cuda":
            # Low-VRAM setup (roughly 4 to 32 GB of VRAM): move submodules to the GPU
            # only while they run, and decode the latents in slices/tiles.
            # If you have enough VRAM, replace these calls with `flux_pipe.to(device)`.
            flux_pipe.enable_sequential_cpu_offload()
            flux_pipe.vae.enable_slicing()
            flux_pipe.vae.enable_tiling()
        # On CPU the pipeline already sits in host memory; CPU offloading requires an
        # accelerator, so there is nothing to enable (expect very high RAM usage).

        # Generate the image
        output = flux_pipe(
            prompt=prompt,
            num_inference_steps=num_inference_steps,
            guidance_scale=guidance_scale,
            height=height,
            width=width,
            generator=torch.Generator("cpu").manual_seed(seed),  # seeded for reproducibility; a CPU generator also works with offloading
            max_sequence_length=256,
        )

        generated_image = output.images[0]
        generated_image.save(output_path, format=output_format)  # Save the generated image

        return generated_image

    except Exception as e:
        logging.error(f"Failed to generate image: {str(e)}")
        return None
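Both pipelines denoise a compressed latent rather than the pixels themselves, which is why `height` and `width` must be multiples of 8 for Stable Diffusion and effectively multiples of 16 for FLUX. A sketch of the shape arithmetic, assuming the standard VAEs (4 latent channels with 8x downsampling for SD 2, 16 channels with 8x downsampling for FLUX) and FLUX's packing of 2x2 latent patches into transformer tokens; `sd_latent_shape` and `flux_token_shape` are hypothetical helpers.

```python
def sd_latent_shape(height: int, width: int, channels: int = 4, factor: int = 8):
    """Latent tensor shape (batch=1) that the Stable Diffusion UNet denoises."""
    if height % factor or width % factor:
        raise ValueError(f"height and width must be multiples of {factor}")
    return (1, channels, height // factor, width // factor)


def flux_token_shape(height: int, width: int, channels: int = 16, factor: int = 8, patch: int = 2):
    """(batch, num_image_tokens, token_dim) after packing 2x2 latent patches into tokens."""
    step = factor * patch  # each token covers a 16x16 pixel area
    if height % step or width % step:
        raise ValueError(f"height and width should be multiples of {step}")
    num_tokens = (height // step) * (width // step)
    return (1, num_tokens, channels * patch * patch)


print(sd_latent_shape(512, 512))   # (1, 4, 64, 64)
print(flux_token_shape(512, 512))  # (1, 1024, 64)
```

At 512x512 the FLUX transformer therefore attends over 1024 image tokens, alongside up to `max_sequence_length` (256 here) text tokens, which is one reason its cost grows quickly with resolution.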


### App: Gradio Demo (Ongoing ...)

In [9]:
# ------------------------------------
# Gradio App
# ------------------------------------
def launch_gradio_demo():
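    """
    Launches a Gradio demo for generating images from text prompts.
    The sketch input is collected by the UI but not yet used by the generator (work in progress).
    """
    def image_generator_sd_gr(prompt, negative_prompt, sketch):  # `sketch` is currently unused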
        model_id = "stabilityai/stable-diffusion-2-base"
        generated_image = image_generator_sd(
            model_id=model_id,
            prompt=prompt,
            negative_prompt=negative_prompt,
            num_inference_steps=50,
            guidance_scale=7.5,
            height=512,
            width=512,
            seed=None,
            strength=0.75,
            eta=0.0,
            output_path="generated_image.png",
            output_format="PNG"
        )
        return generated_image

    # Gradio Interface
    inputs = [
        gr.Textbox(label="Text Prompt", placeholder="Enter your prompt here"),
        gr.Textbox(label="Negative Prompt", placeholder="Enter negative prompt (optional)"),
        # gr.Sketchpad(label="Sketch", shape=(512, 512))
        gr.Sketchpad(label="Sketch")
    ]
    outputs = gr.Image(label="Generated Image")

    demo = gr.Interface(
        fn=image_generator_sd_gr,
        inputs=inputs,
        outputs=outputs,
        title="AI Image Generator",
        description="Generate images from text prompts using Stable Diffusion (sketch conditioning is a work in progress)."
    )

    demo.launch()
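As written, `image_generator_sd` calls `from_pretrained` inside the function, so every click in the Gradio demo reloads all model weights (from the local Hugging Face cache) into memory. One common remedy is to build each pipeline once and reuse it. A minimal sketch using `functools.lru_cache`; `make_cached_loader` and the stand-in factory are hypothetical, and in the notebook the factory would wrap the `DiffusionPipeline.from_pretrained(...).to(device)` call.

```python
from functools import lru_cache
from typing import Any, Callable


def make_cached_loader(factory: Callable[[str], Any], maxsize: int = 2) -> Callable[[str], Any]:
    """Wrap an expensive `factory(model_id)` so each model_id is normally built only once."""

    @lru_cache(maxsize=maxsize)
    def load(model_id: str) -> Any:
        return factory(model_id)

    return load


# Stand-in factory so the sketch runs without downloading weights. In the notebook it
# would be something like:
#   lambda mid: DiffusionPipeline.from_pretrained(mid, torch_dtype=...).to(device)
build_calls = []


def fake_factory(model_id: str) -> dict:
    build_calls.append(model_id)  # record every real (expensive) build
    return {"model_id": model_id}


load_pipeline = make_cached_loader(fake_factory)
first = load_pipeline("stabilityai/stable-diffusion-2-base")
second = load_pipeline("stabilityai/stable-diffusion-2-base")
print(first is second, build_calls)  # True ['stabilityai/stable-diffusion-2-base']
```

`lru_cache` does not lock around the wrapped call, so two simultaneous first requests could still build the same pipeline twice; after that, every call reuses the cached object.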

In [None]:
# ----------------------------------------------
# Main: running the app
# ----------------------------------------------
if __name__ == "__main__":
  model_types = {
      "SD": "stabilityai/stable-diffusion-2-base",
      "FLUX": "black-forest-labs/FLUX.1-schnell"
  }

  # selected_model = "SD"
  selected_model = "FLUX"
  prompt = "a photograph of an astronaut riding a horse"

  output_image = None  # stays None if the model key is invalid
  if selected_model == "SD":
    output_image = image_generator_sd(model_id=model_types[selected_model], prompt=prompt)
  elif selected_model == "FLUX":
    output_image = image_generator_flux(model_id=model_types[selected_model], prompt=prompt)
  else:
    print("Invalid model selected")

  # Visualization
  if output_image is not None:
    display_image(output_image, selected_model)
  else:
    print("Failed to generate image. Please check the logs for more details.")
    print(f'output_image:{output_image}, Obj-type:{type(output_image)}')


  # Launch the Gradio demo
  # launch_gradio_demo()
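The `if`/`elif` chain in the main cell can also be written as a lookup table, which keeps model keys, model ids, and generator functions in one place and fails loudly on an unknown key. A sketch with stand-in generators so it runs without downloading weights; `generate`, `fake_sd`, and `fake_flux` are hypothetical names, and in the notebook the table values would be `image_generator_sd` and `image_generator_flux`.

```python
from typing import Any, Callable, Dict


def generate(selected_model: str, prompt: str,
             model_types: Dict[str, str],
             generators: Dict[str, Callable[..., Any]]) -> Any:
    """Run the generator registered under `selected_model` with its model id."""
    if selected_model not in generators or selected_model not in model_types:
        raise KeyError(
            f"Invalid model selected: {selected_model!r}; choose from {sorted(generators)}"
        )
    return generators[selected_model](model_id=model_types[selected_model], prompt=prompt)


# Stand-ins for image_generator_sd / image_generator_flux so the sketch runs offline.
def fake_sd(model_id: str, prompt: str) -> str:
    return f"SD image from {model_id}"


def fake_flux(model_id: str, prompt: str) -> str:
    return f"FLUX image from {model_id}"


model_types = {
    "SD": "stabilityai/stable-diffusion-2-base",
    "FLUX": "black-forest-labs/FLUX.1-schnell",
}
generators = {"SD": fake_sd, "FLUX": fake_flux}

print(generate("FLUX", "a photograph of an astronaut riding a horse", model_types, generators))
# FLUX image from black-forest-labs/FLUX.1-schnell
```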


## References

**Documentation:**

GRADIO:
- https://www.gradio.app/changelog
- https://www.gradio.app/docs/gradio/imageeditor

STABLE DIFFUSION:
- https://huggingface.co/blog/stable_diffusion

FLUX:
- https://huggingface.co/docs/diffusers/main/api/pipelines/flux





