🧠 What is Stable Diffusion? (Simple Explanation)
Stable Diffusion is a text → image generator.
It rests on three big ideas:
1. Forward process (noise)
    Gradually add Gaussian noise to an image until only random noise remains.

2. Reverse process (denoising)
    Learn how to remove noise step-by-step using a neural network.

3. Text conditioning
    The denoising process is guided by text (like “a cat wearing shoes”).
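
The forward process in idea 1 has a simple closed form in DDPM-style diffusion models: x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps, where eps is Gaussian noise. The sketch below is a toy illustration in plain Python, assuming a linear beta schedule with the commonly quoted DDPM values (1e-4 to 0.02 over 1000 steps). The function names are made up for this example, and Stable Diffusion itself applies the noise to latents with its own schedule.

```python
import math
import random


def make_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear beta schedule (assumed values)."""
    alpha_bars = []
    running = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(x0, t, alpha_bars, rng):
    """Sample x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps for a flat list of values."""
    signal = math.sqrt(alpha_bars[t])
    noise = math.sqrt(1.0 - alpha_bars[t])
    return [signal * v + noise * rng.gauss(0.0, 1.0) for v in x0]


alpha_bars = make_alpha_bars()
rng = random.Random(0)
x0 = [1.0, -1.0, 0.5, 0.0]  # toy "image" of four pixel values

# Early steps barely change the values; the last step is almost pure noise.
for t in (0, 250, 999):
    noisy = add_noise(x0, t, alpha_bars, rng)
    print(t, round(alpha_bars[t], 4), [round(v, 2) for v in noisy])
```

The reverse process (idea 2) trains a network to predict `eps` from the noisy values, which is what lets it undo the noise one step at a time.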

🧩 Core Components (Big Picture)
| Component        | What it does                  |
| ---------------- | ----------------------------- |
| **Text Encoder** | Converts text → numbers       |
| **UNet**         | Removes noise step by step    |
| **VAE**          | Converts image ↔ latent space |
| **Scheduler**    | Controls noise removal steps  |
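
To make the VAE row concrete, here is a small sketch of the shape arithmetic behind "latent space". It assumes the usual Stable Diffusion v1 settings (8x spatial downsampling and 4 latent channels); the helper names are hypothetical and not part of any library.

```python
def latent_shape(height, width, vae_scale_factor=8, latent_channels=4):
    """Shape (channels, height, width) of the latent for a given image size."""
    if height % vae_scale_factor or width % vae_scale_factor:
        raise ValueError("height and width must be multiples of the VAE scale factor")
    return (latent_channels, height // vae_scale_factor, width // vae_scale_factor)


def compression_ratio(height, width, image_channels=3):
    """How many times fewer values the latent holds than the RGB image."""
    c, h, w = latent_shape(height, width)
    return (image_channels * height * width) / (c * h * w)


print(latent_shape(512, 512))       # latent shape for a 512x512 image
print(compression_ratio(512, 512))  # RGB values per latent value
```

Under these assumptions the UNet denoises a 4x64x64 tensor instead of a 3x512x512 image, which is a large part of why the model can run on modest hardware.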


In [1]:
# Step 1: Import
import torch
from diffusers import StableDiffusionPipeline

print("Torch version:", torch.__version__)

Torch version: 2.9.1+cpu




Explanation:

diffusers → Hugging Face library for diffusion models
StableDiffusionPipeline → pre-built pipeline combining:
    text encoder
    UNet
    VAE
    scheduler

In [2]:
# Step 2: Load Pretrained Model (CPU-friendly)
#model_id = "stabilityai/stable-diffusion-2-base"
model_id = "runwayml/stable-diffusion-v1-5"

pipe = StableDiffusionPipeline.from_pretrained(
    model_id,
    torch_dtype=torch.float32  # CPU uses float32; this diffusers version expects torch_dtype, not dtype
)
#pipe = StableDiffusionPipeline.from_pretrained(
#    "hf-internal-testing/tiny-stable-diffusion-pipe"
#)


Loading pipeline components...:   0%|          | 0/7 [00:00<?, ?it/s]

What’s happening?

    Downloads pretrained weights (~4GB) on the first run; later runs reuse the local cache
    Loads model architecture + weights
    Uses float32 because float16 is slow or unsupported for many CPU operations
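
A rough back-of-the-envelope check of the "~4GB" figure and the float32 choice. The parameter counts below are approximate, rounded figures assumed for this sketch (they are not read from the checkpoint), and the real download also contains other files such as the safety checker, so the size on disk will differ.

```python
# Approximate parameter counts in millions (assumed, rounded figures)
PARAMS_MILLIONS = {
    "unet": 860,
    "text_encoder": 123,
    "vae": 84,
}

BYTES_PER_VALUE = {"float32": 4, "float16": 2}


def weights_gb(dtype):
    """Approximate size of the weights in gigabytes (10^9 bytes) for a dtype."""
    total_params = sum(PARAMS_MILLIONS.values()) * 1_000_000
    return total_params * BYTES_PER_VALUE[dtype] / 1e9


for dtype in BYTES_PER_VALUE:
    print(f"{dtype}: about {weights_gb(dtype):.1f} GB")
```

Half precision would roughly halve the memory, but on a CPU the float32 path is the safer default, as noted above.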

In [3]:
# Step 3: Move the model to the CPU (it is already there; this just makes it explicit)
pipe = pipe.to("cpu")
# 🧪 Optional: Reduce Memory Usage
# pipe.enable_attention_slicing()

In [8]:
# Step 4: Disable Safety Checker (Optional but good for learning)
pipe.safety_checker = None
# Avoids the NSFW-filter warnings and the black images it returns on false positives.

In [11]:
# Step 5: Generate an Image (Main part)

#image = pipe("cat with skateboard").images[0]

prompt = "seamless knitting pattern, textile design, flat surface, top view"
negative_prompt = "person, human, model, body, mannequin, arms, face"


image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_inference_steps=30,
    guidance_scale=7.5
).images[0]

  0%|          | 0/30 [00:00<?, ?it/s]

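🔍 Line-by-Line Explanation
prompt
Text description → converted into embeddings

negative_prompt
Text describing what the image should avoid; generation is steered away from it.

num_inference_steps = 30
Number of denoising steps.
More steps usually give better quality but take longer, with diminishing returns.
Think:
    Noise → less noise → clearer → image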
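
To illustrate what "30 steps" means, here is a simplified sketch of how a scheduler might pick 30 timesteps out of the 1000 used in training. It assumes evenly strided ("leading") spacing and 1000 training timesteps; `leading_timesteps` is a hypothetical helper, and real schedulers differ in spacing and offsets.

```python
def leading_timesteps(num_inference_steps, num_train_timesteps=1000):
    """Evenly strided timesteps, ordered from noisiest to cleanest."""
    if not 1 <= num_inference_steps <= num_train_timesteps:
        raise ValueError("num_inference_steps must be between 1 and num_train_timesteps")
    stride = num_train_timesteps // num_inference_steps
    return [i * stride for i in range(num_inference_steps)][::-1]


timesteps = leading_timesteps(30)
# Number of steps, the three noisiest timesteps, the three cleanest timesteps
print(len(timesteps), timesteps[:3], timesteps[-3:])
```

Each entry costs one UNet call, so runtime grows roughly linearly with the number of steps.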
    
guidance_scale = 7.5
Controls how strongly the model follows the text.
| Value | Effect                                  |
| ----- | --------------------------------------- |
| 1–3   | Very creative, loose                    |
| 7–8   | Balanced (recommended)                  |
| 12+   | Follows text rigidly; may add artifacts |
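
The guidance scale corresponds to classifier-free guidance: at each step the model predicts the noise twice, once with the text and once without, and the two predictions are combined. The sketch below shows that combination on toy lists; `apply_guidance` is a name invented for this example, and the numbers are made up. In the diffusers pipeline, a negative prompt is used in place of the empty prompt for the "without text" prediction.

```python
def apply_guidance(noise_uncond, noise_text, guidance_scale):
    """Classifier-free guidance: start from the unconditional prediction and
    move toward (and past) the text-conditioned one by guidance_scale."""
    return [u + guidance_scale * (t - u) for u, t in zip(noise_uncond, noise_text)]


noise_uncond = [0.0, 1.0, -0.5]  # toy prediction without the prompt
noise_text = [1.0, 1.0, 0.5]     # toy prediction with the prompt

for scale in (1.0, 7.5, 12.0):
    print(scale, apply_guidance(noise_uncond, noise_text, scale))
```

With a scale of 1 the result is just the text-conditioned prediction; larger values exaggerate the difference between the two predictions, which is why very high scales tend to look harsh.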


.images[0]
The pipeline returns an output object whose .images attribute is a list of PIL images.
We take the first one.

In [12]:
# Step 6: Show or Save Image
image.show()

# or
# image.save("fashion_output.png")