<a href="https://colab.research.google.com/github/DavidSenseman/BIO1173/blob/main/Class_04_3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

---------------------------
**COPYRIGHT NOTICE:** This Jupyterlab Notebook is a Derivative work of [Jeff Heaton](https://github.com/jeffheaton) licensed under the Apache License, Version 2.0 (the "License"); You may not use this file except in compliance with the License. You may obtain a copy of the License at

> [http://www.apache.org/licenses/LICENSE-2.0](http://www.apache.org/licenses/LICENSE-2.0)

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

------------------------

# **BIO 1173: Intro Computational Biology**

##### **Module 4: ChatGPT and Large Language Models**

* Instructor: [David Senseman](mailto:David.Senseman@utsa.edu), [Department of Biology, Health and the Environment](https://sciences.utsa.edu/bhe/), [UTSA](https://www.utsa.edu/)

### Module 4 Material

* Part 4.1: Introduction to Large Language Models (LLMs)
* Part 4.2: Chatbots
* **Part 4.3: Image Generation with StableDiffusion**
* Part 4.4: Image Generation with DALL-E


## Google CoLab Instructions

You MUST run the following code cell to get credit for this class lesson. By running this code cell, you will map your GDrive to /content/drive and print out your Google GMAIL address. Your Instructor will use your GMAIL address to verify the author of this class lesson.

In [None]:
# You must run this cell first
try:
    from google.colab import drive
    drive.mount('/content/drive', force_remount=True)
    from google.colab import auth
    auth.authenticate_user()
    Colab = True
    print("Note: Using Google CoLab")
    import requests
    gcloud_token = !gcloud auth print-access-token
    gcloud_tokeninfo = requests.get('https://www.googleapis.com/oauth2/v3/tokeninfo?access_token=' + gcloud_token[0]).json()
    print(gcloud_tokeninfo['email'])
except Exception:
    print("**WARNING**: Your GMAIL address was **not** printed in the output below.")
    print("**WARNING**: You will NOT receive credit for this lesson.")
    Colab = False

You should see the following output except your GMAIL address should appear on the last line.

![__](https://biologicslab.co/BIO1173/images/class_04/class_04_1_image01B.png)

If your GMAIL address does not appear your lesson will **not** be graded.


## **Accelerated Run-time Check**

You MUST run the following code cell to get credit for this class lesson. The code in this cell checks what hardware acceleration you are using. To run this lesson, you must be running a Graphics Processing Unit (GPU).

In [None]:
# You must run this cell second

import tensorflow as tf

def check_device():
    # Check for available devices
    devices = tf.config.list_physical_devices()

    # Initialize device flags
    cpu = False
    gpu = False
    tpu = False

    # Check device types
    for device in devices:
        if device.device_type == 'CPU':
            cpu = True
        elif device.device_type == 'GPU':
            gpu = True
        elif device.device_type == 'TPU':
            tpu = True

    # Output device status
    if tpu:
        print("Running on TPU")
        print("WARNING: You must run this assignment using a GPU to earn credit")
        print("Change your RUNTIME now!")
    elif gpu:
        print("Running on GPU")
        print("You are using a GPU hardware accelerator--You're good to go!")
    elif cpu:
        print("Running on CPU")
        print("WARNING: You must run this assignment using a GPU to earn credit")
        print("Change your RUNTIME now!")
    else:
        print("No compatible device found")
        print("WARNING: You must run this assignment using a GPU to earn credit")
        print("Change your RUNTIME now!")

# Call the function
check_device()

If your current `Runtime` is correct you should see the following output

![__](https://biologicslab.co/BIO1173/images/class_03/class_03_1_image03B.png)

However, if you received a warning message, you must go back and change your `Runtime` now before you continue.

### **YouTube Introduction to Stable Diffusion**

Run the next cell to see a short introduction to Stable Diffusion. This is a suggested, but optional, part of the lesson.

In [1]:
from IPython.display import HTML
video_id = "QdRP9pO89MY"
HTML(f"""
<iframe width="560" height="315"
  src="https://www.youtube.com/embed/{video_id}"
  title="YouTube video player"
  frameborder="0"
  allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture"
  allowfullscreen>
</iframe>
""")

# **Stable Diffusion**

**Stable Diffusion** is a deep learning model for **generative image synthesis**. It belongs to the class of **latent diffusion models (LDMs)**, which generate high-quality images from text prompts by operating in a compressed latent space rather than pixel space. This approach significantly reduces computational cost while maintaining image fidelity.

Stable Diffusion is trained on large-scale image-text datasets and uses a combination of:

- **Variational Autoencoders (VAEs)**: To encode images into a latent space.
- **U-Net architecture**: For denoising latent representations.
- **Text encoders (e.g., CLIP or BERT)**: To condition image generation on natural language prompts.

The model works by iteratively denoising a random latent vector, guided by a text prompt, until a coherent image emerges.
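
The iterative idea can be illustrated with a deliberately simplified toy written in plain Python. This is not how Stable Diffusion is implemented: the real model uses a trained U-Net to predict the noise in a latent tensor, whereas this sketch simply assumes a known `target` list that stands in for "what the prompt describes". The names `toy_denoise`, `target`, and `step_size` are hypothetical.

```python
import random

def toy_denoise(seed, target, steps=20, step_size=0.3):
    """Toy illustration only: start from random noise and repeatedly
    remove part of the gap between the current values and a target."""
    rng = random.Random(seed)                        # the seed fixes the starting noise
    latent = [rng.gauss(0.0, 1.0) for _ in target]   # random starting "latent"
    for _ in range(steps):
        # Each pass closes a fraction (step_size) of the remaining gap
        latent = [x + step_size * (t - x) for x, t in zip(latent, target)]
    return latent

# Hypothetical stand-in for the guidance a text prompt would provide
target = [0.5, -1.0, 2.0]
result = toy_denoise(seed=42, target=target)
print([round(x, 2) for x in result])
```

After enough passes the values settle close to the target, in the same way that the real model's output settles into a coherent image.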

### **Key Features**

- **Text-to-image synthesis**: Generate images from descriptive text.
- **Image-to-image translation**: Modify existing images using prompts.
- **Inpainting and outpainting**: Fill in missing regions or expand images.
- **Custom fine-tuning**: Adapt the model to domain-specific data.

### **Applications in Computational Biology**

Stable Diffusion can be a powerful tool for computational biologists in several ways:

#### **1. Scientific Visualization**
Generate illustrative figures for:
- Molecular structures
- Cellular processes
- Pathways and interactions
- Anatomical diagrams

This can enhance presentations, publications, and educational materials.

#### **2. Data Augmentation**
Use synthetic biological images to:
- Augment training datasets for machine learning models
- Improve robustness in image classification tasks (e.g., histopathology, microscopy)

#### **3. Hypothesis Communication**
Translate complex biological hypotheses into visual representations for:
- Grant proposals
- Interdisciplinary collaboration
- Public outreach

#### **4. Custom Model Training**
Fine-tune Stable Diffusion on domain-specific datasets (e.g., microscopy images, protein structures) to:
- Generate realistic biological imagery
- Explore latent space representations of biological phenomena

#### **5. Interactive Exploration**
Use prompt-based generation to explore:
- Morphological variations
- Evolutionary traits
- Synthetic biology designs

### **Optional YouTube Video**

If you are interested in knowing how diffusion models can be used to generate complex images, run the next cell to watch this YouTube video.

In [None]:
from IPython.display import HTML

# Extracted video ID from the original URL
video_id = "iv-5mZ_9CPY"

# Construct the proper embed URL
embed_url = f"https://www.youtube.com/embed/{video_id}"

# Display the embedded video
HTML(f"""
<iframe width="560" height="315" src="{embed_url}"
frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture"
allowfullscreen></iframe>
""")


# **Text to Images with StableDiffusion**

We will now see how to use Stable Diffusion to create various images from textual prompts. There will be four settings that we will deal with as we generate these images.

* **model**: The trained (or fine-tuned) model used to generate the image. Different models are optimized for different types of images.
* **prompt**: Text that you provide to describe what sort of image you would like created.
* **negative prompt**: Text that describes elements that should not be present in your image.
* **seed**: The same image for the prompt/negative prompt will always be produced for the same seed. To get a different image for the same prompts, change the seed.


### **Importance of "Setting the Seed"**

In image generation models like **Stable Diffusion**, a **random seed** determines the initial noise used to create an image. This noise is gradually transformed into a coherent image based on the prompt you provide.

A **random seed** is a number used to initialize a random number generator. In Stable Diffusion:

- The model begins with a field of random noise.
- It uses your prompt to guide the transformation of this noise into an image.
- The seed controls the exact pattern of that starting noise.

Even with the **same prompt**, changing the seed changes the initial noise pattern, which leads to a **different final image**.

**Analogy**:  
Imagine sculpting clay using the same instructions. If each lump of clay starts with a different shape (seed), the final sculptures will be similar but **not identical**.

### Use Cases for Seeds

- **Reproducibility**: Using the same seed and prompt will always generate the same image.
- **Exploration**: Trying different seeds lets you explore variations of the same concept.
- **Fine-tuning**: You can find a seed that gives you the best result and reuse it.
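
The effect of a seed can be demonstrated without a GPU. The sketch below uses Python's standard `random` module in place of the PyTorch generator used in this lesson; the helper name `starting_noise` is hypothetical.

```python
import random

def starting_noise(seed, n=5):
    """Return n pseudo-random values from a generator initialized with `seed`."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

same_a = starting_noise(100)
same_b = starting_noise(100)
different = starting_noise(1604)

print(same_a == same_b)      # True: same seed, same noise
print(same_a == different)   # False: different seed, different noise
```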



### Setup Basic Pipeline

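To use Stable Diffusion we will use the `DiffusionPipeline` class from Hugging Face's `diffusers` library. When setting up the pipeline we specify the `CompVis/stable-diffusion-v1-4` model, an early general-purpose Stable Diffusion model. We also load the `lpw_stable_diffusion` (long prompt weighting) custom pipeline, which supports long prompts and weighted prompt terms.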

The following code sets up this model and downloads it from `HuggingFace`.

In [None]:
# Setup basic pipeline

from diffusers import DiffusionPipeline
import torch

# Configuration
MODEL_ID = "CompVis/stable-diffusion-v1-4"
CUSTOM_PIPELINE = "lpw_stable_diffusion"
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
DTYPE = torch.float16 if DEVICE == "cuda" else torch.float32

# Optional: seed PyTorch's global random number generator
torch.manual_seed(42)

# Load and configure the pipeline
# (the seeded generator is passed when an image is generated, not when the pipeline is loaded)
try:
    pipe = DiffusionPipeline.from_pretrained(
        MODEL_ID,
        custom_pipeline=CUSTOM_PIPELINE,
        torch_dtype=DTYPE
    )
    pipe = pipe.to(DEVICE)
    print(f"Pipeline loaded successfully on {DEVICE}.")
except Exception as e:
    print(f"Failed to load pipeline: {e}")


If the code is correct you should see the following output

![__](https://biologicslab.co/BIO1173/images/class_04/class_04_3_image01B.png)

### Example 1: Generate Basic Image

We will begin by using Stable Diffusion (`stable-diffusion-v1-4`) to create a simple picture of a Monarch butterfly. As you will see, the image we generate depends to a great extent on the value of the random seed. For Example 1 we will set the seed = `100`.

In **Stable Diffusion**, the terms `prompt` and `negative prompt` refer to two different types of input that guide the image generation process:

**Prompt**

* This is the **main description** of what you want the model to generate.
* It includes **keywords, phrases, or detailed descriptions** of the desired scene, style, objects, characters, lighting, mood, etc.

Here is the `prompt` for Example 1:
```text
# Set prompt
prompt= """
a Monarch butterfly"""
```
**Negative Prompt**

* This is used to specify **what you _don’t_ want** in the image.
* It helps the model avoid unwanted elements, styles, or artifacts.

Here is the negative prompt for Example 1:
```text
# Set negative prompt
neg_prompt = """
signature, watermark, incomplete image
"""
```

**NSFW**

As you might imagine, it is quite possible to generate images that are considered "not safe for work" (NSFW). By default, the pipeline runs a safety checker that protects you from inadvertently generating a pornographic and/or extremely violent image. The code in the cell below contains the following commented-out line, which removes the safety checker if you uncomment it:

```text
# Uncomment the next line to remove the safety checker
# pipe.safety_checker = lambda images, clip_input: (images, False)
```
If such an image is generated, you will see the following message.
```text
Potential NSFW content was detected in one or more images. A black image will be returned instead.
Try again with a different prompt and/or seed.
```

You may wish to disable this feature. To do so, uncomment the `pipe.safety_checker` line. Be careful if you do disable it, as the generated images may contain NSFW themes such as violence, nudity, or sexual content.

In [None]:
# Example 1: Generate basic image

import random

# Set the seed
seed = 100
seed = random.randint(0, 2**32) if seed == -1 else seed
print(f"The seed =", seed)

# Use seed to create the generator
generator = torch.Generator(device='cuda').manual_seed(int(seed))

# Set prompt
prompt= """
a Monarch butterfly"""

# Set negative prompt
neg_prompt = """
signature, watermark, incomplete image
"""

# Uncomment the next line to remove the safety checker
# pipe.safety_checker = lambda images, clip_input: (images, False)

# Generate image
pipe.text2img(prompt, negative_prompt=neg_prompt, width=512,height=512,
              max_embeddings_multiples=3,generator=generator).images[0]

If the code is correct you should see the following output

![__](https://biologicslab.co/BIO1173/images/class_04/class_04_3_image09B.png)

### **Exercise 1A: Generate Basic Image**

In the cell below write the code to generate the same image of a Monarch butterfly generated in Example 1, but set the `seed` to the number `1604`.

In [None]:
# Insert your code for Exercise 1A here


If the code is correct you should see the following output

![__](https://biologicslab.co/BIO1173/images/class_04/class_04_3_image10B.png)

### **Exercise 1B: Generate Basic Image**

In the cell below write the code to generate the same image of a Monarch butterfly, but this time set the seed to a random number (i.e., use a seed value of `-1`).

In [None]:
# Insert your code for Exercise 1B here


If the code is correct you should see something similar to the following output

![__](https://biologicslab.co/BIO1173/images/class_04/class_04_3_image11B.png)

but it's unlikely you will see the same image since you used a random seed.
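
The examples in this lesson treat a seed of `-1` as a request for a random seed. A minimal sketch of that convention as a reusable function follows; the name `resolve_seed` is hypothetical and is not part of any library.

```python
import random

def resolve_seed(seed):
    """Return `seed` unchanged, or a random seed when `seed` is -1."""
    if seed == -1:
        return random.randint(0, 2**32)   # same range as the lesson's examples
    return seed

fixed = resolve_seed(100)    # a fixed seed is passed through unchanged
chosen = resolve_seed(-1)    # a random seed that varies from run to run
print("fixed seed:", fixed)
print("random seed:", chosen)
```

Printing the chosen seed, as the examples do, lets you reuse a random seed later if you like the image it produced.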

### **Exercise 1C: Generate Basic Image**

In the cell below write the code to generate an image of your choice. It can be any image that you like, and the value you pick for your `seed` is also up to you. Show your imagination!

In [None]:
# Insert your code for Exercise 1C here


The output will depend upon your `prompt` and your seed value.

### Example 2: Generate Reference Image

The diffusion model that we have been using so far is adequate for simple images, but is less suitable for high-resolution images. This is especially true when it comes to generating images with human faces.

The code in the next cell generates the face of a young Japanese woman. We will use this image as an example of a basic image that can be generated with the `stable-diffusion-v1-4` model.


In [None]:
# Example 2: Importance of model selection

import random

# Set the seed / Use -1 for random seed
seed = 100
seed = random.randint(0, 2**32) if seed == -1 else seed
print(f"The seed =", seed)

# Use seed to create generator
generator = torch.Generator(device='cuda').manual_seed(int(seed))

# Prompts to generate image
prompt= """
the face of a young Japanese woman"""

neg_prompt = """
signature, watermark, incomplete image
"""

# Uncomment the next line to remove the safety checker
#pipe.safety_checker = lambda images, clip_input: (images, False)

# Generate image
pipe.text2img(prompt, negative_prompt=neg_prompt, width=512,height=512,
              max_embeddings_multiples=3,generator=generator).images[0]

If the code is correct you should see the following output

![__](https://biologicslab.co/BIO1173/images/class_04/class_04_3_image12B.png)

### **Exercise 2: Generate Reference Image**

In the next cell write the code that will generate a human face. You are free to specify the kind of face you want, male or female.  


In [None]:
# Insert your code for Exercise 2 here


The output will depend upon your `prompt`.

# **Stable Diffusion Model Comparison: v1.4 vs v2.1**

In the previous section we used the `stable-diffusion-v1-4` model to generate basic images. In this section we will use a more advanced version of this model, `stable-diffusion-2-1`, which can generate more realistic images.

Here are the main differences between these two model generations.


**1. Architecture and Text Encoder**

| Feature | v1.4 (CompVis) | v2.1 (StabilityAI) |
|--------|----------------|--------------------|
| Text Encoder | CLIP ViT-L/14 | OpenCLIP ViT-H/14 |
| Architecture | Stable Diffusion 1.x | Updated architecture |
| Native Resolution | 512x512 | 768x768 |


**2. Training Data**
- **v1.4**:
  - Trained on **LAION-Aesthetics v2 5+**.
  - Fine-tuned for 225k steps.
  - Focused on aesthetic images.

- **v2.1**:
  - Trained on a **filtered subset of LAION-5B**.
  - Improved safety and quality filtering.
  - Better prompt alignment and reduced bias.


**3. Output Style and Quality**

- **v1.4**:
  - Good for general-purpose image generation.
  - May struggle with complex compositions or high-resolution needs.

- **v2.1**:
  - Better at handling **complex prompts**, **realism**, and **fine details**.
  - Improved **anatomical accuracy**, lighting, and texture rendering.

**4. Use Cases**

| Model | Best For | Limitations |
|-------|----------|-------------|
| **v1.4** | Quick, general image generation | Lower resolution, less prompt nuance |
| **v2.1** | High-quality, detailed images; better realism | May interpret prompts differently due to OpenCLIP |


### Setup Advanced Pipeline

The code in the cell below sets up the more advanced pipeline `stabilityai/stable-diffusion-2-1`.


In [None]:
# Setup advanced pipeline

from diffusers import DiffusionPipeline
import torch

pipe = DiffusionPipeline.from_pretrained(
    #'hakurei/waifu-diffusion',
    #"SG161222/Realistic_Vision_V2.0",
    'stabilityai/stable-diffusion-2-1',
    custom_pipeline="lpw_stable_diffusion",
    torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

If the code is correct you should see the following output

![__](https://biologicslab.co/BIO1173/images/class_04/class_04_3_image07B.png)

### Example 3: Realistic Image

We now generate an image with a much more complex prompt. The positive and negative prompts describe how to generate an image of a young woman. Stable Diffusion prompts are usually comma-separated lists of attributes to draw.

You will notice that some attributes are enclosed in parentheses, which marks them as more important. A number at the end, separated from the attribute by a colon, specifies how much weight that attribute receives.
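
To make the weighting syntax concrete, here is a simplified sketch that splits a prompt into (text, weight) pairs. It is an illustration only and is not the parser used by the `lpw_stable_diffusion` pipeline: it ignores nested parentheses and square brackets, and it assumes that parentheses without a number mean a weight of 1.1, a convention commonly used for this syntax. The names `parse_weights` and `DEFAULT_EMPHASIS` are hypothetical.

```python
import re

# Assumption: plain parentheses with no number boost the weight to 1.1
DEFAULT_EMPHASIS = 1.1

# Matches "(some text)" or "(some text:1.2)"; nested parentheses are not handled
WEIGHTED = re.compile(r"\(([^()]*?)(?::\s*([0-9]*\.?[0-9]+))?\)")

def parse_weights(prompt):
    """Split a prompt into (text, weight) pairs; unweighted text gets 1.0."""
    pairs = []
    pos = 0
    for match in WEIGHTED.finditer(prompt):
        # Text between weighted groups keeps the neutral weight of 1.0
        plain = prompt[pos:match.start()].strip(" ,\n")
        if plain:
            pairs.append((plain, 1.0))
        weight = float(match.group(2)) if match.group(2) else DEFAULT_EMPHASIS
        pairs.append((match.group(1).strip(), weight))
        pos = match.end()
    tail = prompt[pos:].strip(" ,\n")
    if tail:
        pairs.append((tail, 1.0))
    return pairs

example = "(woman age 26 standing by tree), (long blonde hair:1.2), film grain"
for text, weight in parse_weights(example):
    print(f"{weight:4.2f}  {text}")
```

A weight above 1.0 asks for more emphasis on an attribute and a weight below 1.0 asks for less, which is why the prompts in this lesson use values such as `1.55` and `0.55`.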

In [None]:
# Example 3: Realistic Image

# Set the seed
seed = 14
seed = random.randint(0, 2**32) if seed == -1 else seed
print(f"The seed =", seed)

# Create generator
generator = torch.Generator(device='cuda').manual_seed(int(seed))

# Prompt
prompt= """
(woman age 26 standing by tree), (long blonde hair:1.2), ray traced shadows,
RAW, 8k, (eczema:0.7), (sub-surface scattering:1.55), (sweat:1.22), (freckles:0.55),
highly detailed skin, (Acne:0.7), (FACE1:0.5), (FACE2:1.2), (FACE3:0.85),
perfect eyes, no makeup. (skin spores:1.05), (skin spores:1.05),
ultra detailed face, ultra detailed skin, film grain, ray tracing, studio lighting"""

# Negative prompt
neg_prompt = """
signature, watermark, airbrush, photoshop, plastic doll,
(ugly eyes, deformed iris, deformed pupils, fused lips and teeth:1.2),
(un-detailed skin, semi-realistic, cgi, 3d, render, sketch, cartoon,
drawing, anime:1.2), text, close up, cropped, out of frame, worst quality,
low quality, jpeg artifacts, ugly, duplicate, morbid, mutilated, extra fingers,
mutated hands, poorly drawn hands, poorly drawn face, mutation, deformed, blurry,
dehydrated, bad anatomy, bad proportions, extra limbs, cloned face, disfigured,
gross proportions, malformed limbs, missing arms, missing legs, extra arms,
extra legs, fused fingers, too many fingers, long neck, head wear, masculine,
obese, fat, out of frame"""

# Safety checker
#pipe.safety_checker = lambda images, clip_input: (images, False)

# Generate text-to-image
pipe.text2img(prompt, negative_prompt=neg_prompt, width=512,height=512,
              max_embeddings_multiples=3,generator=generator).images[0]

If the code is correct you should see the following output

![__](https://biologicslab.co/BIO1173/images/class_04/class_04_3_image13B.png)

### **Exercise 3: Realistic Image**

Generate an image of an old Japanese woman in front of Mount Fuji. Set the seed to `2`.

Define your prompt as:
```text
# Prompt
prompt= """
(Japanese woman age 90+, standing before Mount Fuji), (gray thinning hair:1.2),
ray traced shadows, RAW, 8k, (deep wrinkles:1.5), (sub-surface scattering:1.55),
(age spots:1.2), highly detailed aged skin, (sagging skin:1.3), (FACE1:0.5),
(FACE2:1.2), (FACE3:0.85), cloudy eyes, no makeup, (skin texture:1.05),
ultra detailed face, ultra detailed skin, film grain, ray tracing, studio lighting
"""
```
and your negative prompt as:
```text
# Negative prompt
neg_prompt = """
signature, watermark, airbrush, photoshop, plastic doll,
(ugly eyes, deformed iris, deformed pupils, fused lips and teeth:1.2),
(un-detailed skin, semi-realistic, cgi, 3d, render, sketch, cartoon,
drawing, anime:1.2), text, close up, cropped, out of frame, worst quality,
low quality, jpeg artifacts, ugly, duplicate, morbid, mutilated, extra fingers,
mutated hands, poorly drawn hands, poorly drawn face, mutation, deformed, blurry,
dehydrated, bad anatomy, bad proportions, extra limbs, cloned face, disfigured,
gross proportions, malformed limbs, missing arms, missing legs, extra arms,
extra legs, fused fingers, too many fingers, long neck, head wear, masculine,
obese, fat, out of frame"""
```

The rest of the code is the same as Example 3.

In [None]:
# Insert your code for Exercise 3 here


If the code is correct you should see the following output

![__](https://biologicslab.co/BIO1173/images/class_04/class_04_3_image14B.png)

## **Anime**

**Anime** is a form of animation that originated in **Japan** and has grown into a globally recognized medium. It includes a wide variety of genres, themes, and artistic styles, and is known for its unique storytelling and visual aesthetics.

#### **Key Characteristics**
- **Origin**: Japan
- **Mediums**: TV series, films, web series, OVAs (Original Video Animations)
- **Genres**: Action, Romance, Fantasy, Horror, Sci-Fi, Slice of Life, and more
- **Audience**: All age groups, from children to adults

#### **Notable Examples**
- *Spirited Away* (Studio Ghibli)
- *Naruto*
- *Attack on Titan*
- *One Piece*
- *Demon Slayer*

### **What Is Anime Style?**

**Anime style** refers to the distinctive visual and artistic elements commonly found in anime. It is characterized by stylized character designs, expressive features, and dynamic visuals.

#### **Visual Features**

| Feature              | Description                                                                 |
|----------------------|-----------------------------------------------------------------------------|
| **Eyes**             | Large, expressive, often detailed to convey emotion                         |
| **Facial Features**  | Small noses and mouths, exaggerated expressions                             |
| **Hair**             | Stylized, often colorful, with unique shapes and movement                   |
| **Body Proportions** | Can range from realistic to highly exaggerated depending on the genre       |
| **Line Work**        | Clean and sharp outlines                                                     |
| **Coloring**         | Flat or cel-shaded, with emphasis on contrast and mood                      |
| **Backgrounds**      | Often highly detailed, especially in fantasy or sci-fi settings             |

#### **Storytelling Elements**
- **Emotional depth** and character development
- **Symbolism** and metaphorical themes
- **Cultural references** to Japanese traditions, language, and society
- **Genre blending**, such as mixing romance with supernatural or comedy with horror

#### **Anime vs. Western Animation**

| Aspect              | Anime                                   | Western Animation                      |
|---------------------|------------------------------------------|----------------------------------------|
| **Art Style**       | Stylized, detailed, expressive           | Varies widely, often more cartoonish   |
| **Themes**          | Often complex and mature                 | Often geared toward children/family    |
| **Production**      | Typically serialized with long arcs      | Often episodic                         |
| **Cultural Influence** | Strong Japanese cultural elements     | Western cultural norms and humor       |


## **Anime Models**

If you would like to generate cartoon or anime-style images, the `waifu-diffusion` model will work nicely. Waifu Diffusion is a specialized AI image generation model designed to create anime-style artwork, particularly characters that resemble the "waifu" archetype: stylized, idealized female characters popular in anime and manga culture.

#### **Key Features:**
* **Anime-focused training data:** Trained on datasets like Danbooru (a large anime image repository).
* **Text-to-image generation:** You input a prompt like "a cute anime girl with blue hair in a school uniform" and it generates an image.
* **Customizable outputs:** You can adjust style, pose, background, and more using prompt engineering.
* **Checkpoint versions:** Includes models like Waifu Diffusion 1.3, 1.4, and 1.5, each improving quality and prompt responsiveness.

The code below loads the pipeline for the Waifu Diffusion model and generates a test image from placeholder prompts.

In [None]:
# Setup Waifu Diffusion model pipeline

from diffusers import DiffusionPipeline
import torch
import random

# Set seed
seed = 102  # or -1 for random
seed = random.randint(0, 2**32) if seed == -1 else seed
print(f"The seed =", seed)

# Create generator
generator = torch.Generator(device='cuda').manual_seed(seed)

# Load pipeline without generator
pipe = DiffusionPipeline.from_pretrained(
    "hakurei/waifu-diffusion",
    custom_pipeline="lpw_stable_diffusion",
    torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

# Use generator when calling the pipeline
image = pipe(
    prompt="your prompt here",
    negative_prompt="your negative prompt here",
    width=512,
    height=512,
    max_embeddings_multiples=3,
    generator=generator
).images[0]


If the code is correct you should see the following output

![__](https://biologicslab.co/BIO1173/images/class_04/class_04_3_image20B.png)

### Example 4: Create Anime Person

The code in the cell below creates an Anime image of a young girl.

In [None]:
# Example 4: Create anime image

# Random number seed, -1 for random seed
seed = 105
seed = random.randint(0, 2**32) if seed == -1 else seed
print(f"The seed =", seed)

# Create generator
generator = torch.Generator(device='cuda').manual_seed(int(seed))

# Prompt
prompt = (
    "best_quality (1girl:1.3) bride brown_hair closed_mouth frills (full_body:1.3) "
    "fox_ear hair_bow happy hood kimono long_sleeves red_bow smile solo tabi "
    "white_kimono wide_sleeves cherry_blossoms"
)

# Negative prompt
neg_prompt = (
    "lowres, bad_anatomy, error_body, error_hands, bad_hands, error_fingers, "
    "missing_fingers, error_legs, bad_legs, error_lighting, error_shadow, "
    "extra_digit, cropped, worst_quality, jpeg_artifacts, watermark, blurry"
)

# Safety check
#pipe.safety_checker = lambda images, clip_input: (images, False)

# Generate the image
pipe.text2img(prompt, negative_prompt=neg_prompt, width=512,height=512,
              max_embeddings_multiples=3,generator=generator).images[0]

If the code is correct you should see the following output

![__](https://biologicslab.co/BIO1173/images/class_04/class_04_3_image16B.png)

### **Exercise 4: Create Anime Person**

In the cell below write the code to generate an Anime image of a young boy. Make sure to set the seed = `100`.

You can reuse the code in Example 4 but change the prompt to read:

~~~text
# Prompt
prompt = (
    "best_quality, (1boy:1.3), heroic_pose, spiky_hair, intense_eyes, anime_uniform, "
    "dramatic_lighting, cinematic_background, serious_expression, solo, full_body, "
    "anime_style, masterpiece, high_detail, vibrant_colors, wind_effects, glowing_aura, "
    "dynamic_composition, epic_scale, battle_ready, stormy_sky, energy_particles, depth_of_field"
)

~~~

In [None]:
# Insert your code for Exercise 4 here


If the code is correct you should see the following output

![__](https://biologicslab.co/BIO1173/images/class_04/class_04_3_image23B.png)


### Example 5: Create Anime Scene

The Anime world is full of magical as well as real creatures. One popular Anime TV show that is populated with a variety of creatures is called `Shirokuma Café`.

**Shirokuma Café** (also known as Polar Bear Café) is a Japanese anime and manga series that blends slice-of-life comedy with a whimsical, anthropomorphic twist.
![__](https://biologicslab.co/BIO1173/images/class_04/class_04_3_image24B.png)

The code in the cell below generates an image in the style of *Shirokuma Café*. This particular prompt was generated by uploading the image above to Microsoft 365 Copilot and asking it to create a prompt that would duplicate the image.

In [None]:
# Example 5: Create Anime Scene

# Set the seed
seed = 8
seed = random.randint(0, 2**32) if seed == -1 else seed
print(f"The seed =", seed)

# Create generator
generator = torch.Generator(device='cuda').manual_seed(int(seed))

# Prompt generated by Microsoft 365 Copilot from Shirokuma Café image
prompt = (
    "best_quality, anime_style, high_detail, vibrant_colors, cute_animal_characters, "
    "panda_sitting_with_pink_swim_ring_and_yellow_shorts, waving_paw, penguin_standing_next_to_panda, "
    "lush_green_foliage_background, stone_wall, summer_theme, friendly_expression, "
    "casual_swimwear, relaxed_atmosphere, slice_of_life_scene, soft_shading"
)

# Negative prompt
neg_prompt = (
    "lowres, bad_anatomy, blurry, distorted_faces, error_limbs, bad_hands, missing_fingers, "
    "extra_limbs, worst_quality, jpeg_artifacts, watermark, cropped, poor_lighting, "
    "oversaturated_colors, unnatural_pose, messy_background, out_of_focus, bad_proportions"
)

# Generate the image
pipe.text2img(prompt, negative_prompt=neg_prompt, width=512,height=512,
              max_embeddings_multiples=3,generator=generator).images[0]

If the code is correct you should see the following output

![__](https://biologicslab.co/BIO1173/images/class_04/class_04_3_image25B.png)

Here is the prompt that was used to generate this image:

~~~text
prompt = (
    "best_quality, anime_style, high_detail, vibrant_colors, cute_animal_characters, "
    "panda_sitting_with_pink_swim_ring_and_yellow_shorts, waving_paw, penguin_standing_next_to_panda, "
    "lush_green_foliage_background, stone_wall, summer_theme, friendly_expression, "
    "casual_swimwear, relaxed_atmosphere, slice_of_life_scene, soft_shading"
)
~~~

As you can see, the generated image is rather different than one might have expected from the prompt. This kind of _mismatch_ is very common with `text-to-image` generators, even when rather detailed prompts are used.

### **Exercise 5: Create Anime Scene**

In the cell below use exactly the same code that was shown in Example 5.

![__](https://biologicslab.co/BIO1173/images/class_04/class_04_3_image24B.png)

See whether changing the value of the seed lets you generate an image that more closely resembles the Shirokuma Café image on which the prompt was based.

In [None]:
# Insert your code for Exercise 5 here



What your output looks like will depend upon the value of your seed.

## **Image-to-Image Pipelines in Stable Diffusion**

#### **Background**

Stable Diffusion is a latent text-to-image diffusion model first released in **August 2022** by **Stability AI** together with the CompVis group at LMU Munich and Runway. It quickly became popular due to its open-source nature, high-quality outputs, and ability to run on consumer GPUs.

While the original model focused on **text-to-image generation**, the community and developers soon expanded its capabilities to include **image-to-image** transformations, enabling users to guide generation using existing images.

#### **What Is Image-to-Image in Stable Diffusion?**

Image-to-image (img2img) generation allows users to provide:
- A **source image** (e.g., a sketch, photo, or concept art)
- A **text prompt** describing the desired transformation
- A **strength parameter** controlling how much the output deviates from the input

This technique uses the same latent diffusion process but starts from a **noised version of the input image**, allowing for creative reinterpretation while preserving structure.

### **Evolution of Image-to-Image Pipelines**

##### Stable Diffusion v1.5 (2022)
- First widely used version for img2img tasks.
- Introduced via the `StableDiffusionImg2ImgPipeline` in the `diffusers` library.
- Supported basic transformations with prompt guidance and strength control.

##### Stable Diffusion v2.x (Late 2022–2023)
- Improved resolution and semantic understanding.
- Introduced **depth-to-image** and **inpainting** pipelines.
- Better handling of complex prompts and image conditioning.

##### ControlNet Integration (2023)
- Added fine-grained control using **edge maps**, **pose estimation**, **depth maps**, etc.
- Enabled highly structured transformations while preserving artistic freedom.

##### SDXL (2023–2024)
- Major upgrade with richer visual fidelity and prompt comprehension.
- Image-to-image support extended to `StableDiffusionXLImg2ImgPipeline` in the `diffusers` library.
- Better performance on high-resolution inputs and nuanced prompts.

### **Parameters in Image-to-Image Pipelines**

| Parameter             | Description                                                                      |
|-----------------------|----------------------------------------------------------------------------------|
| `prompt`              | Text description guiding the transformation                                      |
| `image`               | Input image to be transformed                                                    |
| `strength`            | Controls deviation from the input (near 0.0 = close to input, 1.0 = input largely ignored) |
| `guidance_scale`      | Controls adherence to the prompt (higher = more prompt-driven)                   |
| `num_inference_steps` | Number of denoising steps (more steps are slower and often improve quality)      |
| `generator`           | Seeded random number generator for reproducibility                               |
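
To make the `guidance_scale` row more concrete, the sketch below shows the classifier-free guidance rule that Stable Diffusion pipelines commonly apply at each denoising step. The function name `apply_guidance` and the three-element toy "noise predictions" are illustrative assumptions, not part of the `diffusers` API; in a real pipeline these values would be tensors produced by the U-Net.

```python
def apply_guidance(noise_uncond, noise_text, guidance_scale):
    """Blend unconditional and prompt-conditioned noise predictions."""
    # Move from the unconditional prediction toward (and past) the text-conditioned one
    return [u + guidance_scale * (t - u) for u, t in zip(noise_uncond, noise_text)]

# Toy values standing in for U-Net outputs (hypothetical)
noise_uncond = [0.10, -0.20, 0.05]
noise_text = [0.30, -0.10, 0.00]

for scale in (1.0, 7.5):
    print(scale, apply_guidance(noise_uncond, noise_text, scale))
```

With a scale of 1.0 the result equals the prompt-conditioned prediction; larger values push the prediction further in the direction the prompt suggests.
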

#### **Use Cases**

- **Artistic reinterpretation** of student sketches
- **Scientific visualization** from concept diagrams
- **Creative storytelling** using visual prompts
- **Biological image transformation** for simulations or hypothetical scenarios


### Set Up the Image-to-Image Pipeline

The code in the cell below sets up the image-to-image pipeline.

In [None]:
# Set up the image-to-image pipeline


# Import libraries
from diffusers import StableDiffusionImg2ImgPipeline
import torch
from PIL import Image
import requests

# Load the pipeline
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16
).to("cuda")


If the code is correct you should see something like the following output

![__](https://biologicslab.co/BIO1173/images/class_04/class_04_3_image27B.png)

## **Understanding the `strength` Parameter in Image-to-Image Generation**

##### **What Is `strength`?**

In Stable Diffusion's image-to-image pipeline, the `strength` parameter controls **how much the generated image deviates from the input image**. It determines the level of noise added to the input image before the diffusion process begins.

##### **How It Works**

- The image-to-image pipeline works by **encoding the input image into latent space**, adding noise, and then **denoising it guided by the text prompt**.
- The `strength` parameter sets the **starting point** in the denoising process:
  - **Low strength** → less noise → output is **closer to the input image**
  - **High strength** → more noise → output is **more influenced by the prompt**
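
The "starting point" idea can be sketched numerically. The snippet below assumes the convention commonly used by `diffusers` image-to-image pipelines, in which roughly `int(num_inference_steps * strength)` of the scheduled denoising steps are actually run and the rest are skipped. The helper name `denoising_steps_run` is hypothetical.

```python
def denoising_steps_run(num_inference_steps, strength):
    """Approximate number of denoising steps run for a given strength."""
    return min(int(num_inference_steps * strength), num_inference_steps)

for strength in (0.01, 0.5, 0.75, 1.0):
    steps = denoising_steps_run(50, strength)
    print(f"strength={strength}: about {steps} of 50 steps are run")
```

Under this assumption, a very small strength with 50 steps rounds down to zero denoising steps, which is consistent with an output that looks nearly identical to the input.
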


#### **Typical Values**

| Strength | Behavior                          | Use Case Example                          |
|----------|-----------------------------------|-------------------------------------------|
| 0.1–0.3  | Very close to input image         | Subtle edits, style transfer              |
| 0.4–0.6  | Balanced between input and prompt | Concept transformation                    |
| 0.7–0.9  | Highly creative, prompt-driven    | Abstract reinterpretation, surreal edits  |

> ⚠️ Values above `0.9` may ignore the input image almost entirely.

### Example 6: Image-to-Image Strength Parameter

The code in the cell below uses this rather famous image of an astronaut riding a horse on the moon as the initial image.

![__](https://biologicslab.co/BIO1173/images/class_04/AstronautMoon.jpg)


The **initial image** (also called the **input image**) is the starting point for the image-to-image generation process in Stable Diffusion. It provides the **visual structure** or **content** that the model will transform based on a given **text prompt**.

#### **How It Works**

1. The initial image is **encoded into latent space** using a Variational Autoencoder (VAE).
2. A certain amount of **noise is added**, controlled by the `strength` parameter.
3. The model then **denoises** the image while being guided by the **text prompt**, producing a new image that blends the original content with the prompt's intent.
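
Step 2 can be illustrated with the standard forward-diffusion mix, in which the noised latent is a weighted sum of the clean latent and Gaussian noise. This is a minimal sketch, not the pipeline's actual code: the linear beta schedule, its default values, and the simple mapping from `strength` to a timestep are all assumptions chosen for illustration, and both function names are hypothetical.

```python
import math

def alpha_bar_schedule(num_train_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for an assumed linear beta schedule."""
    alpha_bars = []
    running = 1.0
    for i in range(num_train_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_train_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def signal_and_noise_weights(strength, alpha_bars):
    """Weights on the clean latent and on the noise at the starting timestep."""
    t = int(strength * len(alpha_bars)) - 1  # simplified strength -> timestep mapping
    if t < 0:
        return 1.0, 0.0  # no noise is added
    t = min(t, len(alpha_bars) - 1)
    return math.sqrt(alpha_bars[t]), math.sqrt(1.0 - alpha_bars[t])

alpha_bars = alpha_bar_schedule()
for strength in (0.1, 0.5, 0.9):
    signal, noise = signal_and_noise_weights(strength, alpha_bars)
    print(f"strength={strength}: signal weight={signal:.3f}, noise weight={noise:.3f}")
```

As `strength` grows, the weight on the original latent shrinks and the weight on the noise grows, which is why the prompt has more room to reshape the image.
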

### **Prompt**

Here is the text prompt for Example 6:

```text
# Prompt
prompt = "a evil clown riding a tiger on the moon"
```

### **Strength Parameter**

In this example we are going to look at how the `strength` parameter affects image creation.

The code in the cell below sets `strength` to `0.01`. This is an extremely low value, so we would expect the output image to be relatively unchanged.

In [None]:
# Example 6: Image-to-Image Strength Parameter

import torch
from IPython.display import display
from PIL import Image
import requests
from diffusers import StableDiffusionImg2ImgPipeline

# Set strength parameter
strength = 0.01  # Creativity level

# Set the seed
seed = 1
print(f"The seed = {seed}")

# Prompt
prompt = "a evil clown riding a tiger on the moon"

# Set additional parameters
guidance_scale = 7.5
num_inference_steps = 50

# Validate strength (this example deliberately allows values as low as 0.01)
if not (0.01 <= strength <= 0.9):
    raise ValueError("Strength should be between 0.01 and 0.9 for best results.")

# Load pipeline
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16
).to("cuda")

# Override safety checker
pipe.safety_checker = lambda images, clip_input: (images, [False] * len(images))

# Create generator
generator = torch.Generator("cuda").manual_seed(seed)

# Load and preprocess initial image
url = "https://biologicslab.co/BIO1173/images/class_04/AstronautMoon.jpg"
init_image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

# Resize to 512x512 (no cropping, so non-square images are stretched)
init_image = init_image.resize((512, 512))

# Generate image
try:
    result = pipe(
        prompt=prompt,
        image=init_image,
        strength=strength,
        guidance_scale=guidance_scale,
        num_inference_steps=num_inference_steps,
        generator=generator
    ).images[0]

    # Display result
    display(result)

except Exception as e:
    print("Error during image generation:", e)


If the code is correct you should see the following output

![__](https://biologicslab.co/BIO1173/images/class_04/class_04_3_image29B.png)

As expected the output image is relatively unchanged.
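
Example 6 resizes the initial image directly to 512x512, which stretches any image that is not square. If you would rather preserve the aspect ratio, one option is to crop the largest centered square first. The helper `center_crop_box` below is a hypothetical sketch of that calculation, not part of the lesson's required code.

```python
def center_crop_box(width, height):
    """Return (left, upper, right, lower) for the largest centered square."""
    side = min(width, height)
    left = (width - side) // 2
    upper = (height - side) // 2
    return (left, upper, left + side, upper + side)

print(center_crop_box(1024, 768))  # (128, 0, 896, 768)
```

With Pillow, the box could be applied as `init_image.crop(center_crop_box(*init_image.size)).resize((512, 512))`.
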

### **Exercise 6A: Image-to-Image Strength Parameter**

In the cell below, write the code to reproduce Example 6, but set the `strength` parameter to `0.5` while keeping `seed` = `1`.

Let's see how a "medium level" of creativity affects the output image.

In [None]:
# Insert your code for Exercise 6A here


If the code is correct you should see the following output

![__](https://biologicslab.co/BIO1173/images/class_04/class_04_3_image31B.png)

The horse in the output image seems to be morphing into a more `tiger-like` creature.

### **Exercise 6B: Image-to-Image Strength Parameter**

In the cell below, write the code to reproduce Example 6, but now set the `strength` parameter to `0.75` while keeping `seed` = `1`.

Let's see how a "high level" of creativity affects the output image.

In [None]:
# Insert your code for Exercise 6B here


If the code is correct you should see the following output

![__](https://biologicslab.co/BIO1173/images/class_04/class_04_3_image30B.png)

The output image is now relatively bizarre. The horse has been changed into a tiger, albeit a misshapen one. The astronaut is also quite different.

## **Lesson Turn-in**

When you have completed and run all of the code cells, use the **File --> Print.. --> Save to PDF** to generate a PDF of your Colab notebook. Save your PDF as `Copy of Class_04_3.lastname.pdf` where _lastname_ is your last name, and upload the file to Canvas.

# **Lizard Tail**

### **BACKPROPAGATION**


![__](https://upload.wikimedia.org/wikipedia/commons/6/60/ArtificialNeuronModel_english.png)


In machine learning, **backpropagation** is a gradient estimation method commonly used for training a neural network to compute its parameter updates.

It is an efficient application of the chain rule to neural networks. Backpropagation computes the gradient of a loss function with respect to the weights of the network for a single input–output example, and does so efficiently, computing the gradient one layer at a time, iterating backward from the last layer to avoid redundant calculations of intermediate terms in the chain rule; this can be derived through dynamic programming.

Strictly speaking, the term backpropagation refers only to an algorithm for efficiently computing the gradient, not to how the gradient is used. The term is often used loosely, however, to refer to the entire learning algorithm, including how the gradient is used, such as by stochastic gradient descent or as an intermediate step in a more complicated optimizer such as Adaptive Moment Estimation (Adam).[5] The main disadvantages of these optimization algorithms are convergence to local minima, exploding gradients, vanishing gradients, and weak control of the learning rate. Hessian and quasi-Hessian optimizers address only the local-minimum convergence problem, and they make training take longer. These problems led researchers to develop hybrid and fractional optimization algorithms.

Backpropagation had multiple discoveries and partial discoveries, with a tangled history and terminology. See the history section for details. Some other names for the technique include "reverse mode of automatic differentiation" or "reverse accumulation".

## **Intuition**

**Motivation**

The goal of any supervised learning algorithm is to find a function that best maps a set of inputs to their correct output. The motivation for backpropagation is to train a multi-layered neural network such that it can learn the appropriate internal representations to allow it to learn any arbitrary mapping of input to output.

**Learning as an optimization problem**

To understand the mathematical derivation of the backpropagation algorithm, it helps to first develop some intuition about the relationship between the actual output of a neuron and the correct output for a particular training example. Consider a simple neural network with two input units, one output unit and no hidden units, and in which each neuron uses a linear output (unlike most work on neural networks, in which mapping from inputs to outputs is non-linear) that is the weighted sum of its input.
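
As a minimal sketch of that setup, the snippet below trains a single linear neuron with two inputs by gradient descent on the squared error. The training examples, the learning rate, and the function name `train_linear_neuron` are illustrative assumptions; the targets were generated from the weights 2 and -1 so that the result is easy to check.

```python
def train_linear_neuron(examples, learning_rate=0.1, epochs=200):
    """Fit output = w1*x1 + w2*x2 by gradient descent on squared error."""
    w1, w2 = 0.0, 0.0
    for _ in range(epochs):
        for (x1, x2), target in examples:
            output = w1 * x1 + w2 * x2
            error = output - target
            # The derivative of 0.5 * error**2 with respect to each weight is error * input
            w1 -= learning_rate * error * x1
            w2 -= learning_rate * error * x2
    return w1, w2

# Hypothetical training data consistent with w1 = 2 and w2 = -1
examples = [((1.0, 0.0), 2.0), ((0.0, 1.0), -1.0), ((1.0, 1.0), 1.0)]
w1, w2 = train_linear_neuron(examples)
print(f"w1={w1:.3f}, w2={w2:.3f}")
```
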

**History**

Backpropagation had been derived repeatedly, as it is essentially an efficient application of the chain rule (first written down by Gottfried Wilhelm Leibniz in 1676) to neural networks.

The terminology "back-propagating error correction" was introduced in 1962 by Frank Rosenblatt, but he did not know how to implement this. In any case, he only studied neurons whose outputs were discrete levels, which only had zero derivatives, making backpropagation impossible.

Precursors to backpropagation have appeared in optimal control theory since the 1950s. Yann LeCun et al. credit 1950s work by Pontryagin and others in optimal control theory, especially the adjoint state method, as a continuous-time version of backpropagation. Hecht-Nielsen credits the Robbins–Monro algorithm (1951)[23] and Arthur Bryson and Yu-Chi Ho's Applied Optimal Control (1969) as presages of backpropagation. Other precursors were Henry J. Kelley (1960) and Arthur E. Bryson (1961). In 1962, Stuart Dreyfus published a simpler derivation based only on the chain rule. In 1973, he adapted parameters of controllers in proportion to error gradients. Unlike modern backpropagation, these precursors used standard Jacobian matrix calculations from one stage to the previous one, neither addressing direct links across several stages nor potential additional efficiency gains due to network sparsity.

The ADALINE (1960) learning algorithm was gradient descent with a squared error loss for a single layer. The first multilayer perceptron (MLP) with more than one layer trained by stochastic gradient descent was published in 1967 by Shun'ichi Amari.[29] The MLP had 5 layers, with 2 learnable layers, and it learned to classify patterns not linearly separable.

**Modern backpropagation**

Modern backpropagation was first published by Seppo Linnainmaa as "reverse mode of automatic differentiation" (1970) for discrete connected networks of nested differentiable functions.

In 1982, Paul Werbos applied backpropagation to MLPs in the way that has become standard. Werbos described how he developed backpropagation in an interview. In 1971, during his PhD work, he developed backpropagation to mathematicize Freud's "flow of psychic energy". He faced repeated difficulty in publishing the work, only managing in 1981. He also claimed that "the first practical application of back-propagation was for estimating a dynamic model to predict nationalism and social communications in 1974" by him.

Around 1982, David E. Rumelhart independently developed backpropagation and taught the algorithm to others in his research circle. He did not cite previous work because he was unaware of it. He first published the algorithm in a 1985 paper, then gave an experimental analysis of the technique in a 1986 Nature paper. These papers became highly cited, contributed to the popularization of backpropagation, and coincided with the resurging research interest in neural networks during the 1980s.

In 1985, the method was also described by David Parker. Yann LeCun proposed an alternative form of backpropagation for neural networks in his PhD thesis in 1987.

Gradient descent took a considerable amount of time to reach acceptance. Some early objections were that there were no guarantees that gradient descent could reach a global minimum, only a local minimum, and that neurons were "known" by physiologists to make discrete signals (0/1), not continuous ones, and with discrete signals there is no gradient to take. See the interview with Geoffrey Hinton,[36] who was awarded the 2024 Nobel Prize in Physics for his contributions to the field.

**Early successes**

Contributing to the acceptance were several applications in training neural networks via backpropagation, sometimes achieving popularity outside the research circles.

In 1987, NETtalk learned to convert English text into pronunciation. Sejnowski tried training it with both backpropagation and a Boltzmann machine, but found backpropagation significantly faster, so he used it for the final NETtalk. The NETtalk program became a popular success, appearing on the Today show.

In 1989, Dean A. Pomerleau published ALVINN, a neural network trained to drive autonomously using backpropagation.

The LeNet was published in 1989 to recognize handwritten zip codes.

In 1992, TD-Gammon achieved top human level play in backgammon. It was a reinforcement learning agent with a neural network with two layers, trained by backpropagation.

In 1993, Eric Wan won an international pattern recognition contest through backpropagation.

**After backpropagation**

During the 2000s backpropagation fell out of favour, but it returned in the 2010s, benefiting from cheap, powerful GPU-based computing systems. This has been especially so in speech recognition, machine vision, natural language processing, and language structure learning research (in which it has been used to explain a variety of phenomena related to first and second language learning).

Error backpropagation has been suggested to explain human brain event-related potential (ERP) components like the N400 and P600.

In 2023, a backpropagation algorithm was implemented on a photonic processor by a team at Stanford University.

## **Backpropagation in Deep Neural Networks**

Backpropagation is a fundamental algorithm used to train deep neural networks. It efficiently computes the **gradient of the loss function** with respect to each weight in the network, enabling the use of **gradient descent** to update the weights and minimize the loss.

![__](https://biologicslab.co/BIO1173/images/class_04/BackProp.jpg)

### Overview of the Process

1. **Forward Pass**: Input data is passed through the network to compute the output (prediction).
2. **Loss Calculation**: The output is compared to the true label using a loss function (e.g., MSE, cross-entropy).
3. **Backward Pass (Backpropagation)**:
   - Gradients of the loss with respect to each parameter are computed using the **chain rule**.
   - These gradients are used to update the weights via an optimization algorithm (e.g., SGD, Adam).

## Mathematical Foundations

Let’s consider a simple feedforward neural network with:
- Input layer: $x$
- Hidden layer: $h = f(Wx + b)$
- Output layer: $\hat{y} = g(Vh + c)$
- Loss function: $L(\hat{y}, y)$

### Step 1: Compute Gradients

Using the chain rule:

- Gradient w.r.t. output weights $V$:
  $$
  \frac{\partial L}{\partial V} = \frac{\partial L}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial V}
  $$

- Gradient w.r.t. hidden weights $W$:
  $$
  \frac{\partial L}{\partial W} = \frac{\partial L}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial h} \cdot \frac{\partial h}{\partial W}
  $$
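
The chain-rule products above can be written out more explicitly. The expansion below assumes that $f$ and $g$ act elementwise and introduces the pre-activations $z_1 = Wx + b$ and $z_2 = Vh + c$ and the error terms $\delta_1$, $\delta_2$; these symbols are added here for illustration, and $\odot$ denotes elementwise multiplication.

$$
\delta_2 = \frac{\partial L}{\partial \hat{y}} \odot g'(z_2), \qquad
\frac{\partial L}{\partial V} = \delta_2 \, h^{\top}
$$

$$
\delta_1 = \left( V^{\top} \delta_2 \right) \odot f'(z_1), \qquad
\frac{\partial L}{\partial W} = \delta_1 \, x^{\top}
$$

Each error term reuses the one from the layer after it, which is the sense in which the gradient is propagated backward.
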

### Step 2: Update Weights

Using gradient descent:
$$
V := V - \eta \cdot \frac{\partial L}{\partial V}
$$

$$
W := W - \eta \cdot \frac{\partial L}{\partial W}
$$

Where $\eta$ is the learning rate.
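
To see the two steps together, here is a minimal sketch on a scalar network with $f = \tanh$, an identity output, no biases, and the loss $L = \frac{1}{2}(\hat{y} - y)^2$. These choices, the numeric values, and the function names `forward` and `gradients` are all assumptions made for illustration. The snippet checks the chain-rule gradients against finite differences and then applies one gradient descent update.

```python
import math

def forward(x, y, W, V):
    """Scalar network: h = tanh(W*x), y_hat = V*h, loss = 0.5*(y_hat - y)**2."""
    h = math.tanh(W * x)
    y_hat = V * h
    return h, y_hat, 0.5 * (y_hat - y) ** 2

def gradients(x, y, W, V):
    """Chain-rule gradients of the loss with respect to V and W."""
    h, y_hat, _ = forward(x, y, W, V)
    dL_dyhat = y_hat - y
    dL_dV = dL_dyhat * h
    dL_dW = dL_dyhat * V * (1.0 - h ** 2) * x  # tanh'(z) = 1 - tanh(z)**2
    return dL_dV, dL_dW

x, y, W, V, eta = 0.5, 1.0, 0.8, -0.3, 0.1
dL_dV, dL_dW = gradients(x, y, W, V)

# Compare with central finite-difference estimates of the same gradients
eps = 1e-6
num_dV = (forward(x, y, W, V + eps)[2] - forward(x, y, W, V - eps)[2]) / (2 * eps)
num_dW = (forward(x, y, W + eps, V)[2] - forward(x, y, W - eps, V)[2]) / (2 * eps)
print(abs(dL_dV - num_dV) < 1e-6, abs(dL_dW - num_dW) < 1e-6)

# One gradient descent step should lower the loss for this small learning rate
V_new = V - eta * dL_dV
W_new = W - eta * dL_dW
print(forward(x, y, W_new, V_new)[2] < forward(x, y, W, V)[2])
```
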


## Backpropagation Algorithm (Pseudocode)

```python
# Assume a simple 2-layer neural network
for epoch in range(num_epochs):
    for x, y in data:
        # Forward pass
        h = f(W @ x + b)
        y_hat = g(V @ h + c)
        loss = compute_loss(y_hat, y)

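        # Backward pass (assumes g is elementwise; g_prime is a hypothetical
        # helper for its derivative, analogous to f_prime)
        dL_dyhat = compute_loss_gradient(y_hat, y)
        delta_out = dL_dyhat * g_prime(V @ h + c)  # gradient w.r.t. V @ h + c
        dL_dV = delta_out @ h.T
        dL_dh = V.T @ delta_out
        dL_dW = (dL_dh * f_prime(W @ x + b)) @ x.T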

        # Update weights
        V -= learning_rate * dL_dV
        W -= learning_rate * dL_dW
