# Generating Images from Text with Stable Diffusion

In this tutorial, we will use the Stable Diffusion model to generate images from text, and explore how to use GPUs with Daft to accelerate the computation.

To run this tutorial:

1. You will need to create a Huggingface account and an access token so that you can access the Stable Diffusion model: https://huggingface.co/docs/hub/security-tokens

2. You will need access to a GPU. If you are on Google Colab, you may switch to a GPU runtime by going to the menu `Runtime -> Change runtime type -> Hardware accelerator -> GPU -> Save`.

Let's get started!

In [None]:
!pip install getdaft
!pip install Pillow torch diffusers transformers

In [None]:
import os

# Replace with your auth token as a string
# See: https://huggingface.co/docs/hub/security-tokens
HUGGINGFACE_AUTH_TOKEN = os.getenv("HUGGINGFACE_AUTH_TOKEN", "")
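
The cell above falls back to an empty string when the environment variable is unset, so a missing token would only surface later, when the model is requested. A minimal sketch of failing early instead is shown below; `require_token` is a hypothetical helper written for this tutorial, not part of Daft or Huggingface.

```python
import os


def require_token(env_var: str = "HUGGINGFACE_AUTH_TOKEN") -> str:
    """Return the token stored in env_var, or raise if it is missing or blank.

    Hypothetical helper for this tutorial; not part of Daft or Huggingface.
    """
    token = os.getenv(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"{env_var} is not set. Create an access token and export it first."
        )
    return token
```

Calling `require_token()` in place of the bare `os.getenv(...)` would stop the notebook at this cell rather than several cells later.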

## Setting Up

First, let's download a Parquet file containing some of the data that was used to train the Stable Diffusion model. This data is available on Huggingface as well, and we simply download the file to disk.

In [None]:
import os
import urllib.request

PARQUET_URL = "https://huggingface.co/datasets/ChristophSchuhmann/improved_aesthetics_6.5plus/resolve/main/data/train-00000-of-00001-6f24a7497df494ae.parquet"
PARQUET_PATH = "laion_improved_aesthetics_6_5.parquet"

if not os.path.exists(PARQUET_PATH):
    # Fetch first, so that a failed request does not leave an empty file behind
    with urllib.request.urlopen(PARQUET_URL) as response:
        data = response.read()
    with open(PARQUET_PATH, "wb") as f:
        f.write(data)
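
The cell above reads the whole response into memory before writing it out. For larger files, a streaming variant may be preferable. The sketch below uses only the standard library; `download_file` and the `.part` suffix are illustrative choices, not something the dataset or Daft requires.

```python
import os
import shutil
import urllib.request


def download_file(url: str, dest_path: str, chunk_size: int = 1 << 20) -> str:
    """Stream url to dest_path unless dest_path already exists; return dest_path.

    Hypothetical helper for this tutorial.
    """
    if os.path.exists(dest_path):
        return dest_path
    tmp_path = dest_path + ".part"
    # The connection is opened first, so a failed request creates no file at all.
    with urllib.request.urlopen(url) as response, open(tmp_path, "wb") as f:
        # Copy in chunks instead of holding the whole body in memory.
        shutil.copyfileobj(response, f, chunk_size)
    # Rename only once the full body has been written.
    os.replace(tmp_path, dest_path)
    return dest_path
```

With this helper, the download step would reduce to `download_file(PARQUET_URL, PARQUET_PATH)`.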

Now we can load this Parquet file into Daft and peek at the data like so:

In [None]:
from daft import DataFrame, col, udf

parquet_df = DataFrame.from_parquet(PARQUET_PATH)

In [None]:
parquet_df.show(10)

In [None]:
parquet_df = parquet_df.select(col("URL"), col("TEXT"), col("AESTHETIC_SCORE"))

## Downloading Images

Like many datasets, this one does not store the actual images in its files. Instead, the dataset authors have opted to store a URL to each image.

Let's use Daft's built-in functionality to download the images and open them as PIL Images, all in just a few lines of code!

In [None]:
import io
import PIL.Image


parquet_df_with_long_strings = parquet_df.where(col("TEXT").str.length() > 50)
images_df = parquet_df_with_long_strings.with_column(
    "image",
    # Download the images, then load them as PIL.Images if the download was successful
    col("URL").url.download().apply(lambda data: PIL.Image.open(io.BytesIO(data)) if data is not None else None),
)
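
The lambda above skips downloads that failed (`None`), but a URL can also return bytes that are not an image at all, such as an HTML error page, in which case `PIL.Image.open` raises an exception. One cheap guard is to inspect the leading bytes before decoding. The sketch below checks a few well-known file signatures using only the standard library; `sniff_image_format` is a hypothetical helper, and the list of formats is deliberately incomplete.

```python
from typing import Optional


def sniff_image_format(data: Optional[bytes]) -> Optional[str]:
    """Guess an image format from its leading bytes, or return None.

    Hypothetical helper; covers only PNG, JPEG, GIF and WebP signatures.
    """
    if not data:
        return None
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if data.startswith((b"GIF87a", b"GIF89a")):
        return "gif"
    # WebP is a RIFF container with "WEBP" at byte offset 8.
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    return None
```

A check such as `sniff_image_format(data) is not None` could replace `data is not None` in the lambda, at the cost of rejecting formats that are not listed here.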

In [None]:
%%time

images_df.show(5)

## Running a model (without a GPU)

We can run the Huggingface model without a GPU. Note that the next cell will take a while to run - almost 5 minutes! As such, we have commented out the line of code that runs the image generation, but you may run the code simply by uncommenting the last line of the cell.

In [None]:
import torch
from diffusers import DiffusionPipeline

@udf(return_type=PIL.Image.Image)
class GenerateImageFromText:
    
    def __init__(self):
        self.pipeline = DiffusionPipeline.from_pretrained(
            "CompVis/stable-diffusion-v1-4",
            use_auth_token=HUGGINGFACE_AUTH_TOKEN,
        )

    def __call__(self, text_col, num_steps=5):
        return [self.pipeline(t, num_inference_steps=num_steps)["sample"][0] for t in text_col]

# Uncomment the following line to run the cell which will take about 5 minutes.
# %time images_df.with_column("generated_image", GenerateImageFromText(col("TEXT"), num_steps=5)).show(1)

That took a long time, even though we ran only 5 steps of the model on a single image. CompVis recommends running 50 steps, and with so few steps the generated image is not very good. If you are on the default Google Colab runtime, this would have taken almost 5 minutes! Running more steps on more images would take far too long.

Let's see how we can tell Daft that this UDF requires a GPU, and include a step to load our model on the GPU so that it runs much faster. Note that **the following cell will throw an error if you are not running on a machine with a GPU**.
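
To find out up front whether PyTorch can see a CUDA device, a small check like the one below can help. It is a sketch only: `pick_device` is a hypothetical helper, and it reports what PyTorch sees, which is separate from how Daft schedules GPU resources.

```python
def pick_device() -> str:
    """Return "cuda:0" if PyTorch reports a CUDA device, otherwise "cpu".

    Hypothetical helper; falls back to "cpu" when PyTorch is not installed.
    """
    try:
        import torch
    except ImportError:
        return "cpu"
    return "cuda:0" if torch.cuda.is_available() else "cpu"


print(f"PyTorch device: {pick_device()}")
```

If this prints `cpu`, the GPU version of the UDF is expected to fail on this machine.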

In [None]:
import torch
from diffusers import DiffusionPipeline

# Tell Daft that this UDF requires N GPUs with num_gpus=N
@udf(return_type=PIL.Image.Image, num_gpus=1)
class GenerateImageFromTextGPU:

    def __init__(self):
        self.pipeline = DiffusionPipeline.from_pretrained(
            "CompVis/stable-diffusion-v1-4",
            use_auth_token=HUGGINGFACE_AUTH_TOKEN,
        )
        # 1 GPU is now available to your code and can be used as usual from libraries such as PyTorch
        self.pipeline = self.pipeline.to("cuda:0")

    def __call__(self, text_col, num_steps=5):
        return [self.pipeline(t, num_inference_steps=num_steps)["sample"][0] for t in text_col]

%time images_df.with_column("generated_image", GenerateImageFromTextGPU(col("TEXT"), num_steps=30)).show(1)

Running the model on a GPU lets us run 30 steps in about a minute. The generated image now looks much better, and we get a ~30x speedup over running on CPUs alone.
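
The speedup figure follows from the timings quoted in this tutorial: roughly 5 steps in almost 5 minutes on CPU, versus 30 steps in about a minute on GPU. A back-of-the-envelope check is sketched below; the numbers are the approximate ones reported above, not new measurements.

```python
def steps_per_minute(steps: int, minutes: float) -> float:
    """Throughput of the diffusion loop in inference steps per minute."""
    return steps / minutes


# Approximate timings quoted in the tutorial text, for a single image.
cpu_rate = steps_per_minute(steps=5, minutes=5)
gpu_rate = steps_per_minute(steps=30, minutes=1)

speedup = gpu_rate / cpu_rate
# Time the recommended 50 steps would take on CPU at that rate.
cpu_minutes_for_50_steps = 50 / cpu_rate

print(f"speedup: ~{speedup:.0f}x")
print(f"50 steps on CPU: ~{cpu_minutes_for_50_steps:.0f} minutes per image")
```

At these rates, the recommended 50 steps would take on the order of 50 minutes per image on CPU, which is why the GPU path matters for anything beyond a quick demo.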