# Generating Images from Text with DALL-E

In this tutorial, we use the DALL-E model to generate images from text, and explore how to use GPUs with Daft to accelerate the computation.

To run this tutorial:

1. You will need access to a GPU. If you are on Google Colab, you may switch to a GPU runtime by going to the menu `Runtime -> Change runtime type -> Hardware accelerator -> GPU -> Save`.

Let's get started!

In [None]:
!pip install getdaft --pre --extra-index-url https://pypi.anaconda.org/daft-nightly/simple
!pip install min-dalle torch Pillow

## Setting Up

First, let's download a Parquet file containing some example data from the laion_improved_aesthetics 6.5 dataset.

In [None]:
import os
import urllib.request

PARQUET_URL = "https://huggingface.co/datasets/ChristophSchuhmann/improved_aesthetics_6.5plus/resolve/main/data/train-00000-of-00001-6f24a7497df494ae.parquet"
PARQUET_PATH = "laion_improved_aesthetics_6_5.parquet"

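if not os.path.exists(PARQUET_PATH):
    # Download fully before writing, so a failed request does not leave an empty file behind
    with urllib.request.urlopen(PARQUET_URL) as response:
        data = response.read()
    with open(PARQUET_PATH, "wb") as f:
        f.write(data)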

Now we can load this Parquet file into Daft and peek at the data like so:

In [None]:
from daft import DataFrame, col, udf

parquet_df = DataFrame.read_parquet(PARQUET_PATH)

In [None]:
parquet_df.show(10)

In [None]:
parquet_df = parquet_df.select(col("URL"), col("TEXT"), col("AESTHETIC_SCORE"))

## Downloading Images

Like many datasets, this one does not store the actual images in its files. Instead, the dataset authors have opted to store a URL to each image.

Let's use Daft's built-in functionality to download the images and open them as PIL Images - all in just a few lines of code!

In [None]:
import io
import PIL.Image


parquet_df_with_long_strings = parquet_df.where(col("TEXT").str.length() > 50)
images_df = parquet_df_with_long_strings.with_column(
    "image",
    # Download the images, then load them as PIL.Images if the download was successful
    col("URL").url.download().apply(lambda data: PIL.Image.open(io.BytesIO(data)) if data is not None else None),
)
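
The `if data is not None else None` guard above suggests that a failed download yields `None` rather than raising an error. A minimal plain-Python sketch of that pattern, using only the standard library, might look like the following. Note that `download_or_none` and `open_if_present` are hypothetical helper names, not part of Daft, and the real `url.download()` may behave differently.

```python
import io
import urllib.error
import urllib.request
from typing import Optional


def download_or_none(url: str, timeout: float = 10.0) -> Optional[bytes]:
    """Return the response body, or None if the download fails (hypothetical helper)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read()
    except (urllib.error.URLError, ValueError, OSError):
        # Unreachable hosts, malformed URLs and missing files all map to None
        return None


def open_if_present(data: Optional[bytes]) -> Optional[io.BytesIO]:
    """Wrap downloaded bytes in a file-like object, passing None through unchanged."""
    return io.BytesIO(data) if data is not None else None


# A URL that cannot be fetched produces None instead of an exception
missing = download_or_none("file:///definitely/not/a/real/path.png")
print(open_if_present(missing))
```

Passing `None` through in this way keeps one bad URL from aborting the whole column; the rows that failed can be filtered out afterwards.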

In [None]:
%%time

images_df.show(5)

## Downloading the Model

Let's download the model's weights. The `min-dalle` library that we are using here lets us cache the downloaded model weights on disk by calling its `.download_*` methods. Since this tutorial is run entirely on the local machine, all subsequent steps will be faster because they reuse the downloaded weights!

In [None]:
%%time

import torch
from min_dalle import MinDalle

model = MinDalle(
    models_root='./pretrained',
    dtype=torch.float32,
    device="cpu",
    is_mega=False, 
    is_reusable=False,
)
model.download_encoder()
model.download_decoder()
model.download_detokenizer()
del model

## Running the Model (without a GPU)

Let's run the model on our data without a GPU. Running it takes a while (almost 2 minutes), so the line that triggers it in the next cell is commented out by default.

In [None]:
import torch
from min_dalle import MinDalle


@udf(return_type=PIL.Image.Image)
class GenerateImageFromText:
    
    def __init__(self):
        self.model = MinDalle(
            models_root='./pretrained',
            dtype=torch.float32,
            device="cpu",
            is_mega=False, 
            is_reusable=True
        )

    def __call__(self, text_col):
        return [
            self.model.generate_image(
                t,
                seed=-1,
                grid_size=1,
                is_seamless=False,
                temperature=1,
                top_k=256,
                supercondition_factor=32,
            ) for t in text_col
        ]

# Uncomment the following line to run the cell which will take about 2 minutes.
# %time images_df.with_column("generated_image", GenerateImageFromText(col("TEXT"))).show(1)

Running the model only on the CPU is slow: on the default Google Colab runtime, the cell above takes almost 2 minutes once uncommented. Running it on more images or more steps would take too long.

Let's see how we can tell Daft that this UDF requires a GPU, and load the model to run on a GPU instead. Note that **the following cell will throw an error if you are not running on a machine with a GPU**.
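
One way to check ahead of time whether the GPU cell can run is to ask PyTorch whether CUDA is available. The sketch below assumes that `torch.cuda.is_available()` reflects what `min-dalle` will see when loading the model; `pick_device` is a hypothetical helper and is not part of Daft or `min-dalle`.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch  # installed earlier in this tutorial
    except ImportError:
        # Without PyTorch there is nothing to run on a GPU
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Selected device: {device}")
```

If this prints `cpu`, switch to a GPU runtime before running the GPU cell.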

In [None]:
import torch
from min_dalle import MinDalle

# Tell Daft that this UDF needs N GPUs with num_gpus=N
@udf(return_type=PIL.Image.Image, num_gpus=1)
class GenerateImageFromTextGPU:
    
    def __init__(self):
        self.model = MinDalle(
            models_root='./pretrained',
            dtype=torch.float32,
            # Tell the min-dalle library to load model on GPU
            device="cuda",
            is_mega=False, 
            is_reusable=True
        )

    def __call__(self, text_col):
        return [
            self.model.generate_image(
                t,
                seed=-1,
                grid_size=1,
                is_seamless=False,
                temperature=1,
                top_k=256,
                supercondition_factor=32,
            ) for t in text_col
        ]

%time images_df.with_column("generated_image", GenerateImageFromTextGPU(col("TEXT"))).show(1)

Much better! On Google Colab, this runs in just under 15 seconds, a speedup of about 8x over the CPU run, just from running the model on a GPU instead.
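
The 8x figure follows directly from the two timings quoted in this tutorial. As a rough worked check, taking both timings at their quoted round values (actual timings will vary between runs and runtimes):

```python
cpu_seconds = 120.0  # "almost 2 minutes" on the CPU, taken as 120 s
gpu_seconds = 15.0   # "just under 15 seconds" on the GPU, taken as 15 s

# Speedup is the ratio of the slower time to the faster time
speedup = cpu_seconds / gpu_seconds
print(f"Speedup: about {speedup:.0f}x")
```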