# Project: "Text-to-Image Generator with Stable Diffusion"

This project uses a pre-trained generative model, such as DALL-E mini or Stable Diffusion, to create images from text prompts.

The example below uses Stable Diffusion through the Hugging Face `diffusers` library, which wraps model loading and inference in a single pipeline object.

### Project Overview

#### Set up Environment

In [1]:
!pip install torch diffusers transformers pillow

Collecting diffusers
Downloading diffusers-0.30.3-py3-none-any.whl (2.7 MB)
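
Once the install finishes, it can help to confirm which versions are actually present before importing anything heavy. The following is a minimal sketch that uses only the standard library; the helper name `installed_versions` is hypothetical and not part of the original notebook.

```python
from importlib import metadata


def installed_versions(packages=("torch", "diffusers", "transformers", "pillow")):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # Not installed in this environment
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

A `None` entry suggests that the `pip install` step did not complete for that package.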

#### Import Libraries

In [2]:
from diffusers import StableDiffusionPipeline
import torch
from PIL import Image






#### Load the Stable Diffusion Model

In [3]:
# Load the pre-trained Stable Diffusion model from Hugging Face
model_id = "CompVis/stable-diffusion-v1-4"
device = "cuda" if torch.cuda.is_available() else "cpu"

# Initialize the pipeline
pipe = StableDiffusionPipeline.from_pretrained(model_id)
pipe = pipe.to(device)
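
Loading the weights in full precision can be demanding on GPU memory. As a hedged sketch, the variant below requests half-precision weights when CUDA is available and turns on attention slicing to lower peak memory use. `choose_load_options` and `load_pipeline` are hypothetical helper names, and the heavy imports are deferred so the option logic can be checked without downloading the model.

```python
def choose_load_options(cuda_available):
    """Pick a device and weight precision (illustrative defaults, not tuned)."""
    if cuda_available:
        return {"device": "cuda", "torch_dtype": "float16"}
    # Half precision is generally a poor fit for CPU inference, so stay in float32
    return {"device": "cpu", "torch_dtype": "float32"}


def load_pipeline(model_id="CompVis/stable-diffusion-v1-4"):
    """Load the pipeline with options chosen for the available hardware."""
    import torch
    from diffusers import StableDiffusionPipeline

    opts = choose_load_options(torch.cuda.is_available())
    dtype = getattr(torch, opts["torch_dtype"])  # e.g. torch.float16
    pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=dtype)
    pipe = pipe.to(opts["device"])
    pipe.enable_attention_slicing()  # Trades some speed for lower peak memory
    return pipe
```

Calling `pipe = load_pipeline()` would stand in for the two initialization lines above; whether it helps depends on the GPU at hand.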



#### Generate Image from Text

In [4]:
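def generate_image(prompt, image_size=(512, 512)):
    # Generate an image of the requested (width, height) from the text prompt
    width, height = image_size
    # Autocast (mixed precision) is only enabled when running on a GPU
    with torch.autocast(device_type=device, enabled=(device == "cuda")):
        image = pipe(prompt, height=height, width=width).images[0]  # First generated image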
        return image
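
The pipeline call also accepts sampling parameters. The sketch below assumes a loaded pipeline is passed in as `pipe`; `validate_size` and `generate_image_seeded` are hypothetical names. The size check reflects the pipeline's requirement that height and width be divisible by 8, and a fixed seed is meant to make runs repeatable on the same hardware and library versions.

```python
def validate_size(image_size):
    """Return (width, height) if both are positive multiples of 8, else raise."""
    width, height = image_size
    if width <= 0 or height <= 0 or width % 8 or height % 8:
        raise ValueError(
            f"width and height must be positive multiples of 8, got {image_size}"
        )
    return width, height


def generate_image_seeded(pipe, prompt, device="cpu", image_size=(512, 512),
                          seed=0, steps=50, guidance_scale=7.5):
    """Generate one image with explicit sampling settings and a fixed seed."""
    import torch  # Deferred so validate_size can be used without torch installed

    width, height = validate_size(image_size)
    generator = torch.Generator(device=device).manual_seed(seed)
    result = pipe(
        prompt,
        height=height,
        width=width,
        num_inference_steps=steps,      # More steps take longer to run
        guidance_scale=guidance_scale,  # Higher values follow the prompt more closely
        generator=generator,
    )
    return result.images[0]
```

For example, `generate_image_seeded(pipe, "a watercolor fox", device=device, seed=42)` would be one way to call it from this notebook.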

#### Save or Display the Generated Image

In [5]:
def save_image(image, filename="generated_image.png"):
    # Save the image to disk
    image.save(filename)
    print(f"Image saved as {filename}")

def show_image(image):
    # Display the image
    image.show()
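
Because `save_image` defaults to a fixed file name, each run overwrites the previous result. One possible workaround, sketched here with a hypothetical helper `prompt_to_filename`, is to derive the file name from the prompt.

```python
import re


def prompt_to_filename(prompt, extension="png", max_length=60):
    """Turn a prompt into a filesystem-friendly file name."""
    # Lowercase, then collapse every run of non-alphanumeric characters to "_"
    slug = re.sub(r"[^a-z0-9]+", "_", prompt.lower()).strip("_")
    # Truncate long prompts and fall back to a default name if nothing is left
    slug = slug[:max_length].rstrip("_") or "generated_image"
    return f"{slug}.{extension}"


if __name__ == "__main__":
    print(prompt_to_filename("CAT AND DOG FIGHTING EACH OTHER"))
    # prints: cat_and_dog_fighting_each_other.png
```

It could then be combined with the function above as `save_image(image, prompt_to_filename(prompt))`.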

#### Main Function to Run the Project

In [6]:
def main():
    # Get text prompt from user
    prompt = input("Enter your text prompt: ")

    # Generate image based on prompt
    generated_image = generate_image(prompt)

    # Save the image first, then open it in the default viewer
    save_image(generated_image)
    show_image(generated_image)

if __name__ == "__main__":
    main()

Enter your text prompt: CAT AND DOG FIGHTING EACH OTHER



Image saved as generated_image.png
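
When the code runs as a script rather than in a notebook, `input()` can be replaced with command-line arguments. This is only a sketch: `build_parser` and `run` are hypothetical names, and the generation, saving, and display functions are passed in as parameters so the argument handling can be exercised without loading the model.

```python
import argparse


def build_parser():
    """Describe the command-line interface for the generator."""
    parser = argparse.ArgumentParser(description="Generate an image from a text prompt.")
    parser.add_argument("prompt", help="Text prompt describing the image")
    parser.add_argument("--output", default="generated_image.png",
                        help="File name for the saved image")
    parser.add_argument("--no-show", action="store_true",
                        help="Skip opening the image viewer")
    return parser


def run(argv, generate, save, show):
    """Parse argv, then generate, save, and optionally display the image."""
    args = build_parser().parse_args(argv)
    if not args.prompt.strip():
        raise SystemExit("The prompt must not be empty.")
    image = generate(args.prompt)
    save(image, args.output)
    if not args.no_show:
        show(image)
    return args.output
```

In a script, this could be wired up as `run(sys.argv[1:], generate_image, save_image, show_image)` after importing `sys`.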
