# Generate Images with Stable Diffusion
Use this notebook to create images of various scenes for any subject you prefer. We use Azure OpenAI to create scene descriptions based on your subject.

Example subject: __"animal"__ leads to scene descriptions like the following:  
   
"Full shot raw photo of a majestic white tiger lounging on a rock ledge in a lush green forest setting"  
"Wide angle photo of a playful group of dolphins leaping in the ocean waves with a sunlit horizon"  
"Wide shot raw photo of a flock of birds taking flight over a serene beach during sunrise"  
...

We use these scene descriptions as text prompts to generate images with the Stable Diffusion and Realistic Vision models. This way, you can easily generate hundreds of images for specific subject clusters (e.g., "animal", "man", "woman") as input for the image search demo scenarios in this repository.

Use the `image_classes` list below to specify the subject classes, number of images, model, and other parameters based on your preferences. Each entry in the list is a dictionary that describes one class.

## Setup

In [8]:
import os
import openai
import sys
sys.path.insert(0, '..')

import shutil
import matplotlib.pyplot as plt
from PIL import Image

from diffusers import StableDiffusionPipeline, AutoencoderKL, DPMSolverMultistepScheduler
import torch
from IPython.display import display

from utils import show_images

In [9]:
openai.api_type = "azure"
openai.api_base = os.getenv("AOAI_ENDPOINT")
openai.api_version = "2023-03-15-preview"
openai.api_key = os.getenv("AOAI_API_KEY")
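
The endpoint and key above are read from environment variables, and `os.getenv` silently returns `None` when a variable is not set. A minimal sketch of an early check, assuming the variable names used above (`AOAI_ENDPOINT`, `AOAI_API_KEY`); the helper name `missing_env_vars` is hypothetical and not part of the notebook:

```python
import os

# Variable names taken from the configuration cell above.
REQUIRED_VARS = ("AOAI_ENDPOINT", "AOAI_API_KEY")

def missing_env_vars(names=REQUIRED_VARS, environ=None):
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]

missing = missing_env_vars()
if missing:
    print("Set these environment variables before calling Azure OpenAI: "
          + ", ".join(missing))
```

Running a check like this before the first API call turns a confusing authentication error into a clear message.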

In [10]:
def generate_prompt(subject):

    """Generate a photo scene description based on a subject"""

    # Few-shot conversation: a system instruction followed by example
    # subject -> scene description pairs
    messages = [
        {"role": "system", "content": "You create a concise text prompt of one to two sentences to describe a scene of a photography based on a subject that the user provides. Be very creative and very diverse in selecting examples. Expand the given examples to add more aspects to the scene. "},
        {"role": "user", "content": "man"},
        {"role": "assistant", "content": "Medium shot raw photo of middle aged casually dressed English male tourist eating in a restaurant"},
        {"role": "user", "content": "man"},
        {"role": "assistant", "content": "Close up raw photo of young business formally dressed Asian businessman in a business meeting"},
        {"role": "user", "content": "woman"},
        {"role": "assistant", "content": "Medium shot raw photo of senior casually dressed African woman bicycling"},
        {"role": "user", "content": "woman"},
        {"role": "assistant", "content": "Full shot raw photo of young casually dressed German female tourist driving a convertible"},
        {"role": "user", "content": "animal"},
        {"role": "assistant", "content": "Medium shot photo of a herd of elephants marching through the savannah with sunset in the background"},
        {"role": "user", "content": "animal"},
        {"role": "assistant", "content": "Closeup raw photo of a butterfly fluttering in a blooming garden with lake in the background"},
    ]
    messages.append({"role": "user", "content": subject})

    response = openai.ChatCompletion.create(
        engine="gpt-4",  # name of your Azure OpenAI deployment
        messages=messages,
        temperature=2.0,
        max_tokens=800,
        top_p=0.95,
        frequency_penalty=0,
        presence_penalty=0,
        stop=None,
    )

    return response['choices'][0]['message']['content']

def generate_images(pipe, prompt, suffix, negative_prompt, num_images=1):

    """Generate images from a Diffusers pipeline, scene prompt, suffix, and negative prompt"""
  
    images = pipe(prompt=prompt + suffix,
                negative_prompt=negative_prompt,
                height=512,
                width=768,
                guidance_scale=7.5,
                num_inference_steps=100,
                num_images_per_prompt=num_images,
                ).images
    
    return images
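
The few-shot conversation in `generate_prompt` alternates `user` turns (a subject) with `assistant` turns (a scene description). As a minimal sketch, the same structure can be built from a list of pairs, which may make it easier to add examples for new subjects. `build_messages` and `FEW_SHOT_EXAMPLES` are hypothetical names that do not exist in the notebook, and the system prompt is replaced by a placeholder string here:

```python
def build_messages(system_prompt, examples, subject):
    """Build a chat message list: system prompt, few-shot pairs, then the subject."""
    messages = [{"role": "system", "content": system_prompt}]
    for example_subject, description in examples:
        messages.append({"role": "user", "content": example_subject})
        messages.append({"role": "assistant", "content": description})
    # The final user turn is the subject we want a new scene for.
    messages.append({"role": "user", "content": subject})
    return messages

# Two of the example pairs used in generate_prompt above.
FEW_SHOT_EXAMPLES = [
    ("man", "Medium shot raw photo of middle aged casually dressed English male tourist eating in a restaurant"),
    ("animal", "Medium shot photo of a herd of elephants marching through the savannah with sunset in the background"),
]

messages = build_messages("<system prompt placeholder>", FEW_SHOT_EXAMPLES, "animal")
print(len(messages))
```

The resulting list could be passed as the `messages` argument in the same way as the hard-coded list.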

## Initialize Stable Diffusion Pipelines

In [11]:
# Realistic Vision 2.0 for images of people
rv20_pipe = StableDiffusionPipeline.from_pretrained(
    "SG161222/Realistic_Vision_V2.0",
    # custom_pipeline= 'lpw_stable_diffusion',
    torch_dtype=torch.float16,
    ).to('cuda')

# Stable Diffusion 2.1 for images of other subjects (e.g., animals)
sd21_pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1",
    custom_pipeline='lpw_stable_diffusion',
    torch_dtype=torch.float16,
    )

sd21_pipe.scheduler = DPMSolverMultistepScheduler.from_config(sd21_pipe.scheduler.config)
sd21_pipe.to('cuda')

`text_config_dict` is provided which will be used to initialize `CLIPTextConfig`. The value `text_config["id2label"]` will be overriden.


StableDiffusionLongPromptWeightingPipeline {
  "_class_name": "StableDiffusionLongPromptWeightingPipeline",
  "_diffusers_version": "0.16.1",
  "feature_extractor": [
    "transformers",
    "CLIPImageProcessor"
  ],
  "requires_safety_checker": false,
  "safety_checker": [
    null,
    null
  ],
  "scheduler": [
    "diffusers",
    "DPMSolverMultistepScheduler"
  ],
  "text_encoder": [
    "transformers",
    "CLIPTextModel"
  ],
  "tokenizer": [
    "transformers",
    "CLIPTokenizer"
  ],
  "unet": [
    "diffusers",
    "UNet2DConditionModel"
  ],
  "vae": [
    "diffusers",
    "AutoencoderKL"
  ]
}
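
The cell above moves both pipelines to `'cuda'` unconditionally, which fails on a machine without a CUDA-capable GPU. A minimal sketch of a guard that picks the device at runtime; `pick_device` is a hypothetical helper, not part of the notebook. Note that the pipelines are loaded in `float16`, which generally assumes a GPU, so running on CPU would likely also require a different `torch_dtype` and would be very slow:

```python
def pick_device():
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed; nothing can run on a GPU.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"

device = pick_device()
print(f"Pipelines would be moved to: {device}")
```

The returned string could replace the hard-coded `'cuda'` in the `.to(...)` calls.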

## Specify classes based on desired subjects
For each desired image cluster (e.g., "man", "woman", "children"), specify the text-to-image model, the number of samples, the suffix appended to the scene description, and the negative prompt.

In [12]:
root = './testimages' # root folder for generated images

rv20_suffix = ", (high detailed skin:1.2), 8k uhd, dslr, high quality, film grain, real-world, unedited, photorealistic, Fujifilm XT3, natural, authentic"
rv20_negative_prompt = '(semi-realistic, cgi, 3d, render, sketch, cartoon, drawing, anime:1.4), text, cropped, out of frame, worst quality, low quality, jpeg artifacts, ugly, duplicate, morbid, mutilated, extra fingers, mutated hands, poorly drawn hands, poorly drawn face, mutation, deformed, blurry, dehydrated, bad anatomy, bad proportions, extra limbs, cloned face, disfigured, gross proportions, malformed limbs, missing arms, missing legs, extra arms, extra legs, fused fingers, too many fingers, long neck'

sd21_suffix = ", National Geographic Wildlife photo of the year, dslr, 8K UHD"
sd21_negative_prompt = 'deformed, disfigured, underexposed, overexposed'

image_classes = [
    {'classname': 'man', 'pipe' : rv20_pipe, 'samples' : 3, 'suffix' : rv20_suffix, 'negative_prompt' : rv20_negative_prompt },
    {'classname': 'woman', 'pipe' : rv20_pipe, 'samples' : 3, 'suffix' : rv20_suffix, 'negative_prompt' : rv20_negative_prompt },
    {'classname': 'children', 'pipe' : rv20_pipe, 'samples' : 3, 'suffix' : rv20_suffix, 'negative_prompt' : rv20_negative_prompt },
    {'classname': 'animal', 'pipe' : sd21_pipe, 'samples' : 3, 'suffix' : sd21_suffix, 'negative_prompt' : sd21_negative_prompt },
]
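
Each entry in `image_classes` is expected to carry the same five keys, and a typo in one of them only surfaces as a `KeyError` once generation is under way. A minimal sketch of a check that could run before the generation loop; `validate_image_classes` and `REQUIRED_KEYS` are hypothetical names, and the example entries use `None` as a stand-in for a real pipeline:

```python
REQUIRED_KEYS = {"classname", "pipe", "samples", "suffix", "negative_prompt"}

def validate_image_classes(entries):
    """Return a list of human-readable problems; an empty list means all entries look fine."""
    problems = []
    for index, entry in enumerate(entries):
        missing = REQUIRED_KEYS - entry.keys()
        if missing:
            problems.append(f"entry {index}: missing keys {sorted(missing)}")
        elif not isinstance(entry["samples"], int) or entry["samples"] < 1:
            problems.append(f"entry {index}: 'samples' must be a positive integer")
    return problems

# Illustrative entries only; 'pipe' would normally be a loaded pipeline.
example_entries = [
    {"classname": "animal", "pipe": None, "samples": 3, "suffix": "", "negative_prompt": ""},
    {"classname": "man", "samples": 0},
]
print(validate_image_classes(example_entries))
```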

## Generate images

In [13]:
for image_class in image_classes:

    image_path = os.path.join(root, image_class['classname'])

    # Delete the folder if it exists, then create an empty one
    if os.path.exists(image_path):
        shutil.rmtree(image_path)
    os.makedirs(image_path)

    for image_idx in range(image_class['samples']):
        # Ask Azure OpenAI for a new scene description for this class
        prompt = generate_prompt(image_class['classname'])
        print(prompt)

        image = generate_images(
            pipe=image_class['pipe'],
            prompt=prompt,
            suffix=image_class['suffix'],
            negative_prompt=image_class['negative_prompt'],
            )[0]

        # The scene description doubles as the file name
        image.save(os.path.join(image_path, f"{prompt}.png"))

Aerial shot raw photo of male urban hiphop street dancer performing in an outdoor event

Full shot candid photo of a Latin-American street musician performing acoustic guitar in the midst of a bustling marketplace.

Wide shot raw photo of a professional Hispanic skateboarder performing a stunt in an urban skate park.

Wide shot photo of a latin American professional female dancer rehearsing gracefully in a rustic studio

Aerial view raw photo of a Latina woman hiking on a narrow mountain trail, surrounded by lush forest foliage.

Close-up photo of Latin female scientist examining chemical substances in a brightly lit laboratory

Medium shot photo of a group of diverse children happily running on a vibrant playground in the park

Full shot candid photo of a group of ethnically diverse children laughing and playing tag in a park

Full shot raw photo of a diverse group of children joyfully playing in a colorful park with water splashing in a nearby fountain

Wide-angle photo of an Australian kangaroo mid-hop in a rocky outback with native grasses blowing in the wind

Aerial shot raw photo of a flock of colorful parrots flying in unison over a lush, tropical rainforest canopy.

Full shot raw photo of a golden retriever frolicking on the shoreline during sunset at a bustling beach
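
The generation loop uses the scene description verbatim as the file name. Model output can contain characters that are not valid in file names on some systems (for example `/` or `:`), and two identical descriptions would overwrite each other. A minimal sketch of one way to derive a safer name, assuming the demo does not depend on the file name matching the prompt exactly; `safe_filename` is a hypothetical helper, not part of the notebook:

```python
import re

def safe_filename(prompt, index, max_length=100):
    """Turn a scene description into a file name with an index prefix."""
    # Collapse every run of non-alphanumeric characters into one underscore.
    slug = re.sub(r"[^A-Za-z0-9]+", "_", prompt).strip("_")
    # The zero-padded index keeps names unique even for identical prompts.
    return f"{index:03d}_{slug[:max_length]}.png"

print(safe_filename("Close-up photo of a cat/dog: playing.", 2))
# prints: 002_Close_up_photo_of_a_cat_dog_playing.png
```

In the loop, `image_idx` could serve as the `index` argument.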

In [None]:
samples_per_class = 3

for image_class in image_classes:
    print(f"{image_class['classname']}:")
    image_path = os.path.join(root, image_class['classname'])
    image_list = [os.path.join(image_path, image) for image in os.listdir(image_path) if image.endswith('.png')]

    show_images(images=image_list[:samples_per_class], cols=3, source='local')