# Chapter 1: Basic Configuration & Structure

---

Stable Diffusion is a latent diffusion model that produces photorealistic images from text and image prompts; its capabilities include text-to-image, image-to-image, graphic artwork, image editing, and video creation. Compared to other diffusion-based models, Stable Diffusion significantly reduces processing requirements, allowing users to run the model on lower-cost machines. We will be running Stable Diffusion on Amazon Bedrock (i.e., via API on AWS-managed servers) for text-to-image generation. More specifically, we will be using SDXL, an enhanced version of the Stable Diffusion model that can capture more intricate features and produce higher-quality images.

**Lesson:**

To generate an image, Stable Diffusion starts from a completely random image (or sample). The model then iterates through a specified number of "denoising" steps to produce a final sample of the desired quality. Prompts are composed of a term, phrase, or group of words and phrases, separated by commas. The more detail the prompt provides, the more likely the final image will be of the desired quality. As a text-to-image model, Stable Diffusion forgoes several of the parameters familiar from text-to-text models, namely temperature, top_p, and top_k. We will explore Stable Diffusion's parameters below.
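
As a small illustration of the comma-separated prompt format described above, the sketch below joins individual terms and phrases into a single prompt string. The helper name `compose_prompt` and the example phrases are hypothetical, chosen only for illustration; they are not part of the workshop utilities.

```python
def compose_prompt(*phrases: str) -> str:
    """Join terms and phrases into one comma-separated prompt string."""
    # Strip whitespace and drop empty entries so the result has no stray commas.
    cleaned = [p.strip() for p in phrases if p and p.strip()]
    if not cleaned:
        raise ValueError("at least one non-empty term or phrase is required")
    return ", ".join(cleaned)


# Hypothetical example phrases: a subject followed by two descriptive details.
example_prompt = compose_prompt(
    "beautiful mountain", "snow-capped peak", " golden hour lighting "
)
print(example_prompt)
```

Keeping each detail as a separate phrase makes it easy to add, remove, or reorder details while experimenting.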

First, we will set up our dependencies.

In [None]:
%%capture
# Install dependencies
%pip install --no-build-isolation --force-reinstall \
    "boto3>=1.28.57" \
    "awscli>=1.29.57" \
    "botocore>=1.31.57"

%pip install --quiet "pillow>=9.5,<10"

# Python Built-Ins:
import base64
import io
import json
import os
import sys

# External Dependencies:
import boto3
from PIL import Image

module_path = ".."
sys.path.append(os.path.abspath(module_path))
from utils import bedrock, print_ww


# ---- ⚠️ Un-comment and edit the below lines as needed for your AWS setup ⚠️ ----

# os.environ["AWS_DEFAULT_REGION"] = "<REGION_NAME>"  # E.g. "us-east-1"
# os.environ["AWS_PROFILE"] = "<YOUR_PROFILE>"
# os.environ["BEDROCK_ASSUME_ROLE"] = "<YOUR_ROLE_ARN>"  # E.g. "arn:aws:..."

boto3_bedrock = bedrock.get_bedrock_client(
    assumed_role=os.environ.get("BEDROCK_ASSUME_ROLE", None),
    region=os.environ.get("AWS_DEFAULT_REGION", None)
)

modelId = "stability.stable-diffusion-xl"

## Basic Prompt Components and Parameters

*prompt*: the basic unit of prompt engineering and the focus of most of our discussion. We will cover techniques in depth in later chapters. For now, feel free to experiment with this field.

*weight* (optional): the weight that the model should apply to the prompt; the default is 1, and a value less than zero declares a negative prompt

*negative prompts* (optional): similar to the prompt, but here you input what you **don't** want, providing an additional way to control image generation. Uses include removing or tweaking unwanted elements within images (e.g., trees on a field, waves in the ocean, etc.) or adjusting the image's style (e.g., less blurry or amateur)

*cfg_scale* (optional): the classifier-free guidance scale controls how much the image generation process follows the text prompt; a higher value generates images closer to the prompt, but with possibly less diversity and quality. The minimum is 1 (prompt is ignored and generation is allowed creative freedom), while the maximum is 20 (prompt is followed strictly with possible image degradation); default is 7

*seed* (optional): a number used to initialize generation (i.e., it determines the initial noise setting); if left at the default of 0, the seed is randomly generated, but specifying the seed allows reproduction of images (i.e., if the seed, parameters, and prompt are left the same, the same image will be generated every time)

*steps* (optional): number of denoising iterations; more steps generally produce higher-quality images. Note: as you approach 50 steps (the max), the quality increases only marginally and/or the model begins to generate unwanted artifacts in the image; default is 30

*style_preset* (optional): guides the image generation model towards a particular style. Available style_preset values include enhance, anime, photographic, digital-art, comic-book, fantasy-art, line-art, analog-film, neon-punk, isometric, low-poly, origami, modeling-compound, cinematic, 3d-model, pixel-art, and tile-texture

*clip_guidance_preset* (optional): checks how closely the final image matches the given prompt. Possible values include NONE, FAST_BLUE, FAST_GREEN, SIMPLE, SLOW, SLOWER, SLOWEST (e.g., SIMPLE provides more realism and subtle color gradients, while the FAST presets provide deep contrast and more saturated colors); default is NONE

*sampler* (optional): the method used in the sampling / denoising process to converge on an image with the desired quality. Possible sampling methods include DDIM, DDPM, K_DPMPP_2M, K_DPMPP_2S_ANCESTRAL, K_DPM_2, K_DPM_2_ANCESTRAL, K_EULER, K_EULER_ANCESTRAL, K_HEUN, and K_LMS (if no value is indicated, the appropriate sampler is automatically selected)

*width*: width of the image to generate, in pixels, in an increment divisible by 64. Image dimensions must be 1024x1024, 1152x896, 1216x832, 1344x768, 1536x640, 640x1536, 768x1344, 832x1216, or 896x1152
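
To tie the parameters above together, here is a minimal sketch of a helper that assembles a request body and rejects values outside the ranges listed in this section. The helper name `build_request` is hypothetical, the limits are copied from the descriptions above rather than checked against the live API, and the `height` field is an assumption added so that non-square dimensions can be expressed as a width and height pair.

```python
import json

# Limits as stated in the parameter descriptions above (not verified against the API).
CFG_SCALE_MIN, CFG_SCALE_MAX = 1, 20
STEPS_MAX = 50
ALLOWED_DIMENSIONS = {
    (1024, 1024), (1152, 896), (1216, 832), (1344, 768), (1536, 640),
    (640, 1536), (768, 1344), (832, 1216), (896, 1152),
}


def build_request(prompt, negative_prompt=None, cfg_scale=7, seed=0, steps=30,
                  width=1024, height=1024, **optional):
    """Return a JSON request body (hypothetical helper, for illustration only)."""
    if not CFG_SCALE_MIN <= cfg_scale <= CFG_SCALE_MAX:
        raise ValueError(f"cfg_scale must be between {CFG_SCALE_MIN} and {CFG_SCALE_MAX}")
    if not 0 < steps <= STEPS_MAX:
        raise ValueError(f"steps must be between 1 and {STEPS_MAX}")
    if (width, height) not in ALLOWED_DIMENSIONS:
        raise ValueError(f"unsupported image dimensions: {width}x{height}")

    # The main prompt uses the default weight of 1.
    text_prompts = [{"text": prompt, "weight": 1.0}]
    if negative_prompt:
        # A weight below zero declares a negative prompt.
        text_prompts.append({"text": negative_prompt, "weight": -1.0})

    body = {
        "text_prompts": text_prompts,
        "cfg_scale": cfg_scale,
        "seed": seed,
        "steps": steps,
        "width": width,
        "height": height,  # assumption: height is sent alongside width
    }
    # Pass through style_preset, clip_guidance_preset, sampler, etc. only when set.
    body.update({key: value for key, value in optional.items() if value is not None})
    return json.dumps(body)


request = build_request(
    "beautiful mountain, snow-capped peak",
    negative_prompt="blurry, amateur",
    seed=42,
    style_preset="cinematic",
)
print(request)
```

The resulting string can be passed as the `body` argument of `boto3_bedrock.invoke_model`, in the same way as the requests used in this chapter.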

## Example:

Let's start with a simple prompt to see what image is generated with the default parameter values.

In [None]:
prompt = "beautiful mountain"

While the above code provides the minimum input necessary for the request to Amazon Bedrock, the code below finalizes the request formatting, invokes the model, and parses the response. The response is decoded and saved as a PNG file in the "data" directory.

In [None]:
request = json.dumps({
    "text_prompts": (
        [{"text": prompt}]
    )
})

response = boto3_bedrock.invoke_model(body=request, modelId=modelId)
response_body = json.loads(response.get("body").read())

print(response_body["result"])
base_64_img_str = response_body["artifacts"][0].get("base64")
print(f"{base_64_img_str[0:80]}...")

os.makedirs("data", exist_ok=True)
eg1_1 = Image.open(io.BytesIO(base64.decodebytes(bytes(base_64_img_str, "utf-8"))))
eg1_1.save("data/eg1_1.png")
eg1_1
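
The decoding step above is repeated in each exercise, so it can be convenient to wrap it in a small function. The sketch below is one way to do that, assuming the response body has the `result` and `artifacts` fields used in this chapter; the helper name `extract_image_bytes` and the stand-in response are hypothetical, and the stand-in lets the code run without calling Bedrock.

```python
import base64


def extract_image_bytes(response_body: dict) -> bytes:
    """Decode the first generated image from a parsed response body."""
    artifacts = response_body.get("artifacts") or []
    if not artifacts:
        raise ValueError(f"no image returned (result: {response_body.get('result')!r})")
    base_64_img_str = artifacts[0].get("base64")
    if not base_64_img_str:
        raise ValueError("first artifact has no 'base64' field")
    return base64.b64decode(base_64_img_str)


# Stand-in response body (not real image data) so the sketch runs offline.
fake_body = {
    "result": "success",
    "artifacts": [{"base64": base64.b64encode(b"not a real image").decode("utf-8")}],
}
image_bytes = extract_image_bytes(fake_body)
print(len(image_bytes))
```

With a real response, `Image.open(io.BytesIO(extract_image_bytes(response_body)))` should yield the same PIL image as the cell above.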

## Exercises:

With most of the parameters now included in our request to Bedrock, let's experiment.

**Exercise 1.1 - A more beautiful mountain**

Using proper formatting and the included parameters, generate an image of a mountain with the following requirements:
 - use 3 descriptive values / phrases
 - cinematic style
 - ensure the image is relatively the same every time you invoke the model

In [None]:
prompt = "INSERT PROMPT"

cfg_scale = 
seed = 
steps = 
style_preset =   # (e.g. photographic, digital-art, cinematic, ...)
clip_guidance_preset =  # (e.g. FAST_BLUE FAST_GREEN NONE SIMPLE SLOW SLOWER SLOWEST)
sampler =  # (e.g. DDIM, DDPM, K_DPMPP_SDE, K_DPMPP_2M, K_DPMPP_2S_ANCESTRAL, K_DPM_2, K_DPM_2_ANCESTRAL, K_EULER, K_EULER_ANCESTRAL, K_HEUN, K_LMS)
width = 

In [None]:
request = json.dumps({
    "text_prompts": (
        [{"text": prompt}]
    ),
    "cfg_scale": cfg_scale,
    "seed": seed,
    "steps": steps,
    "style_preset": style_preset,
    "clip_guidance_preset": clip_guidance_preset,
    "sampler": sampler,
    "width": width
})

response = boto3_bedrock.invoke_model(body=request, modelId=modelId)
response_body = json.loads(response.get("body").read())

print(response_body["result"])
base_64_img_str = response_body["artifacts"][0].get("base64")
print(f"{base_64_img_str[0:80]}...")

os.makedirs("data", exist_ok=True)
ex1_1 = Image.open(io.BytesIO(base64.decodebytes(bytes(base_64_img_str, "utf-8"))))
ex1_1.save("data/ex1_1.png")
ex1_1

**Exercise 1.2 - The most beautiful mountain**

Keep tweaking the parameters and prompt to see how you can create an even higher-quality, more beautiful mountain with the following requirements:

 - add 3 elements to the image (e.g., a river, house, etc.)
 - randomize the model more so the image isn't the same every time and the prompt isn't followed as strictly

In [None]:
prompt = "INSERT PROMPT"

cfg_scale = 
seed = 
steps = 
style_preset =   # (e.g. photographic, digital-art, cinematic, ...)
clip_guidance_preset =  # (e.g. FAST_BLUE FAST_GREEN NONE SIMPLE SLOW SLOWER SLOWEST)
sampler =  # (e.g. DDIM, DDPM, K_DPMPP_SDE, K_DPMPP_2M, K_DPMPP_2S_ANCESTRAL, K_DPM_2, K_DPM_2_ANCESTRAL, K_EULER, K_EULER_ANCESTRAL, K_HEUN, K_LMS)
width = 

In [None]:
request = json.dumps({
    "text_prompts": (
        [{"text": prompt}]
    ),
    "cfg_scale": cfg_scale,
    "seed": seed,
    "steps": steps,
    "style_preset": style_preset,
    "clip_guidance_preset": clip_guidance_preset,
    "sampler": sampler,
    "width": width,
})

response = boto3_bedrock.invoke_model(body=request, modelId=modelId)
response_body = json.loads(response.get("body").read())

print(response_body["result"])
base_64_img_str = response_body["artifacts"][0].get("base64")
print(f"{base_64_img_str[0:80]}...")

os.makedirs("data", exist_ok=True)
ex1_2 = Image.open(io.BytesIO(base64.decodebytes(bytes(base_64_img_str, "utf-8"))))
ex1_2.save("data/ex1_2.png")
ex1_2