📌 Project Description
This project is an interactive AI-powered text-to-image generator built with Stable Diffusion v1.5 and served through a Gradio interface in Google Colab. Users enter text prompts and generate high-quality, customizable AI art, with fine-grained control over parameters such as image size, number of denoising steps, and guidance scale.

The system leverages Hugging Face’s Diffusers library for Stable Diffusion inference and Gradio for an intuitive web interface, making AI art generation accessible without requiring local GPU resources.



✨ Key Features
✅ User-Friendly Web Interface – Gradio-based UI for easy interaction
✅ Customizable Image Generation – Control over:

Prompt & Negative Prompt (to refine output)

Steps (20-100 for quality/speed trade-off)

Guidance Scale (1-20; lower values give the model more creative freedom, higher values follow the prompt more closely)

Image Dimensions (width and height each from 256 to 1024 px, in steps of 64)

Seed Control (for reproducible results)

✅ Optimized for Colab – Uses FP16 precision & attention slicing to reduce VRAM usage
✅ Instant Preview – Displays each generated image as soon as generation finishes
✅ Seed Tracking – Outputs the generation seed for reproducibility
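The Guidance Scale and Negative Prompt controls both act through classifier-free guidance. The sketch below uses plain Python lists as toy stand-ins for the UNet's noise-prediction tensors; `apply_guidance` is a hypothetical name. At each denoising step the pipeline combines an unconditional prediction, which comes from the negative prompt when one is given, with a prompt-conditioned prediction. In recent diffusers versions this combination is only applied when `guidance_scale > 1`, so at the slider's minimum of 1.0 the negative prompt has no effect.

```python
def apply_guidance(uncond, cond, guidance_scale):
    """Classifier-free guidance for one denoising step.

    uncond: noise prediction for the empty prompt (or the negative prompt)
    cond:   noise prediction for the user's prompt
    Returns uncond + guidance_scale * (cond - uncond), element-wise.
    """
    if len(uncond) != len(cond):
        raise ValueError("predictions must have the same length")
    return [u + guidance_scale * (c - u) for u, c in zip(uncond, cond)]


uncond = [0.0, 1.0, 2.0]  # toy stand-ins for UNet outputs
cond = [1.0, 1.0, 0.0]

print(apply_guidance(uncond, cond, 1.0))  # [1.0, 1.0, 0.0] (just the prompt branch)
print(apply_guidance(uncond, cond, 7.5))  # [7.5, 1.0, -13.0] (pushed away from uncond)
```

Higher scales push the result further along the direction from "unconditional" toward "prompt". That is why they increase prompt adherence, and why very high values can oversaturate images.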


⚙️ Implementation Details
1. Model Loading & Optimization
Model: runwayml/stable-diffusion-v1-5 (FP16 precision)

GPU Acceleration: Runs on Colab’s T4/A100 GPU

Memory and Speed Optimizations:
pipe.enable_attention_slicing()  # Reduces VRAM usage
torch.backends.cudnn.benchmark = True  # Speeds up convolutions
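A back-of-the-envelope calculation shows why FP16 matters on a T4. It assumes the downloaded checkpoint files are stored in FP32 (4 bytes per parameter). That assumption is consistent with the file sizes Colab reports for this model: 3.44 GB for the UNet, 1.22 GB for the safety checker, 492 MB for the text encoder, and 335 MB for the VAE. Loading with `torch_dtype=torch.float16` halves each of them. The numbers cover weights only; activations during sampling come on top.

```python
# Approximate weight memory for SD v1.5 components.
# Assumption: the checkpoint files are FP32 (4 bytes per parameter);
# sizes are the file sizes reported when downloading the model.
FILE_SIZES_GB = {
    "unet": 3.44,
    "safety_checker": 1.22,
    "text_encoder": 0.492,
    "vae": 0.335,
}

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2}


def weight_memory_gb(sizes_gb, dtype):
    """Estimate total weight memory (GB) after casting FP32 files to dtype."""
    total_fp32 = sum(sizes_gb.values())
    return total_fp32 * BYTES_PER_PARAM[dtype] / BYTES_PER_PARAM["fp32"]


print(f"fp32: {weight_memory_gb(FILE_SIZES_GB, 'fp32'):.2f} GB")  # fp32: 5.49 GB
print(f"fp16: {weight_memory_gb(FILE_SIZES_GB, 'fp16'):.2f} GB")  # fp16: 2.74 GB
```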
2. Image Generation Pipeline
Takes a text prompt and an optional negative prompt

Calls the loaded StableDiffusionPipeline (pipe(...)) with a seeded torch.Generator for inference

Saves output as generated_image.png
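The seed handling in `generate_image` can be pulled out into a small helper that is easy to test. In this sketch `resolve_seed` and `sample_noise` are hypothetical names. The helper mirrors the notebook's rules: empty or non-numeric input falls back to a random seed in 0..9999999999. `random.Random` stands in for `torch.Generator("cuda").manual_seed(...)` to show why reusing a seed reproduces the same starting noise.

```python
import random

MAX_SEED = 9_999_999_999  # same upper bound as generate_image()


def resolve_seed(seed, rng=random):
    """Return an int seed from user input, or a random one.

    Mirrors generate_image(): None, "" or non-numeric text -> random seed.
    """
    try:
        return int(seed)
    except (TypeError, ValueError):
        return rng.randint(0, MAX_SEED)


def sample_noise(seed, n=4):
    """Toy stand-in for torch.Generator: same seed -> same 'noise' values."""
    gen = random.Random(seed)
    return [gen.gauss(0.0, 1.0) for _ in range(n)]


print(resolve_seed("42"))  # 42
print(resolve_seed(""))    # a random seed
print(sample_noise(resolve_seed("42")) == sample_noise(42))  # True
```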

3. Gradio UI Components
Inputs:

Text boxes for prompts

Sliders for steps, guidance scale, width, height

Seed input for reproducibility

Outputs:

Rendered image

Used seed for regeneration
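The width and height sliders step by 64, so every value is a multiple of 8. Stable Diffusion's pipeline requires this because the VAE downsamples images by a factor of 8 into a 4-channel latent. The sketch below is an illustration, not code from the notebook, and its helper names are hypothetical. It snaps arbitrary input onto the slider grid and shows the shape of the latent tensor the UNet actually denoises.

```python
MIN_SIZE, MAX_SIZE, STEP = 256, 1024, 64  # matches the Gradio sliders
VAE_SCALE_FACTOR = 8  # SD v1.5's VAE downsamples by 8 in each dimension
LATENT_CHANNELS = 4   # SD v1.5 latents have 4 channels


def snap_dimension(value):
    """Clamp to [MIN_SIZE, MAX_SIZE] and round to the nearest slider step."""
    clamped = max(MIN_SIZE, min(MAX_SIZE, int(value)))
    return MIN_SIZE + round((clamped - MIN_SIZE) / STEP) * STEP


def latent_shape(width, height, batch_size=1):
    """Shape (batch, channels, h, w) of the latent tensor for an image size."""
    if width % VAE_SCALE_FACTOR or height % VAE_SCALE_FACTOR:
        raise ValueError("width and height must be divisible by 8")
    return (batch_size, LATENT_CHANNELS,
            height // VAE_SCALE_FACTOR, width // VAE_SCALE_FACTOR)


print(snap_dimension(500))     # 512
print(latent_shape(512, 512))  # (1, 4, 64, 64)
```

At the default 512×512 the UNet works on a 64×64 latent. A 1024×1024 image means four times as many latent positions, which is why larger sizes are slower and use more memory.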

4. Deployment in Colab
Uses gr.Blocks() for a structured layout

Launched via demo.launch(share=True) for a public link


In [1]:
!pip install diffusers transformers scipy ftfy accelerate gradio


In [1]:
from diffusers import StableDiffusionPipeline
import torch
import gradio as gr
from IPython.display import Image, display
import random

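# Load the model (downloads the weights on first run, then reuses the local cache)
model_id = "runwayml/stable-diffusion-v1-5"
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
pipe = pipe.to("cuda")

# Memory/speed settings described under Implementation Details
pipe.enable_attention_slicing()  # Computes attention in chunks to lower peak VRAM
torch.backends.cudnn.benchmark = True  # Lets cuDNN pick the fastest convolution algorithms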

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



In [2]:
def generate_image(
    prompt,
    negative_prompt="",
    steps=50,
    guidance_scale=7.5,
    width=512,
    height=512,
    seed=None
):
    # Set seed if provided or generate a random one
    if seed == "" or seed is None:
        seed = random.randint(0, 9999999999)
    else:
        try:
            seed = int(seed)
        except ValueError:  # Non-numeric text: fall back to a random seed
            seed = random.randint(0, 9999999999)

    generator = torch.Generator("cuda").manual_seed(seed)

    # Generate image
    with torch.autocast("cuda"):
        image = pipe(
            prompt,
            negative_prompt=negative_prompt if negative_prompt else None,
            num_inference_steps=int(steps),
            guidance_scale=guidance_scale,
            width=int(width),  # Gradio sliders may return floats
            height=int(height),
            generator=generator
        ).images[0]

    # Save and return
    filename = "generated_image.png"
    image.save(filename)
    return image, seed

In [3]:
def generate_and_display(
    prompt,
    negative_prompt,
    steps,
    guidance_scale,
    width,
    height,
    seed
):
    # Generate the image
    image, used_seed = generate_image(
        prompt,
        negative_prompt,
        steps,
        guidance_scale,
        width,
        height,
        seed
    )

    # Return the image and used seed
    return image, f"Used seed: {used_seed}"

# Create the interface
with gr.Blocks(title="Stable Diffusion Image Generator") as demo:
    gr.Markdown("# 🎨 Stable Diffusion Image Generator")
    gr.Markdown("Create amazing images with AI!")

    with gr.Row():
        with gr.Column():
            prompt = gr.Textbox(label="Prompt", placeholder="Describe the image you want to generate...")
            negative_prompt = gr.Textbox(label="Negative Prompt", placeholder="What you don't want in the image...")

            with gr.Row():
                steps = gr.Slider(20, 100, value=50, step=1, label="Steps")
                guidance_scale = gr.Slider(1.0, 20.0, value=7.5, step=0.5, label="Guidance Scale")

            with gr.Row():
                width = gr.Slider(256, 1024, value=512, step=64, label="Width")
                height = gr.Slider(256, 1024, value=512, step=64, label="Height")

            seed = gr.Textbox(label="Seed (leave empty for random)", placeholder="Enter a number for reproducible results")
            generate_btn = gr.Button("Generate Image", variant="primary")

        with gr.Column():
            output_image = gr.Image(label="Generated Image")
            seed_info = gr.Textbox(label="Seed Info", interactive=False)

    generate_btn.click(
        fn=generate_and_display,
        inputs=[prompt, negative_prompt, steps, guidance_scale, width, height, seed],
        outputs=[output_image, seed_info]
    )

    gr.Markdown("### Tips:")
    gr.Markdown("- Use detailed prompts for better results")
    gr.Markdown("- Higher steps (50-100) = better quality but slower")
    gr.Markdown("- Guidance scale (7-10) works well for most cases")
    gr.Markdown("- Save your seed if you want to recreate the same image")

In [4]:
# Launch the Gradio interface
demo.launch(share=True)

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://88c5fd8f38cf504d58.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)




⚠️ Challenges & Solutions
1. GPU Memory Limitations
Problem: Colab’s free-tier GPUs (T4) have limited VRAM (~15GB).

Solution:

Enabled FP16 mode (torch.float16)

Added attention slicing to reduce peak memory

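Optional: Fall back to a lower resolution or a smaller checkpoint (note that stabilityai/stable-diffusion-2-1 is not smaller than v1.5: it uses a larger text encoder and defaults to 768x768)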
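Attention slicing helps because self-attention builds score matrices whose size grows with the square of the number of latent positions. The arithmetic below is a rough sketch under stated assumptions:

- A 512×512 image gives a 64×64 = 4,096-token latent at the UNet's highest-resolution attention layers.
- SD v1.5 uses 8 attention heads.
- Classifier-free guidance doubles the batch to 2.
- Scores are stored in FP16 (2 bytes per value).

Slicing is modeled as materializing only `slice_size` of the batch×heads matrices at a time. The exact strategy diffusers uses depends on its version and the argument passed. On PyTorch 2.x, diffusers typically uses `scaled_dot_product_attention`, which can avoid building these matrices at all, so the real saving may be smaller than this estimate.

```python
def attention_scores_bytes(tokens, heads, batch, bytes_per_value=2, slice_size=None):
    """Peak bytes for attention score matrices (tokens x tokens each).

    Without slicing, all batch*heads matrices exist at once; with slicing,
    only slice_size of them are materialized at a time.
    """
    matrices = batch * heads
    if slice_size is not None:
        matrices = min(matrices, slice_size)
    return matrices * tokens * tokens * bytes_per_value


MIB = 1024 ** 2
tokens = (512 // 8) * (512 // 8)  # 64 x 64 latent -> 4096 tokens

full = attention_scores_bytes(tokens, heads=8, batch=2)
sliced = attention_scores_bytes(tokens, heads=8, batch=2, slice_size=4)
print(full // MIB, "MiB without slicing")      # 512 MiB without slicing
print(sliced // MIB, "MiB with slice_size=4")  # 128 MiB with slice_size=4
```

At 1024×1024 the same formula gives 16,384 tokens and 8 GiB for the unsliced scores. Under these assumptions, that is the first thing to exhaust a T4's memory.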

2. Slow Model Loading
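Problem: The first from_pretrained() call downloads about 5.5 GB of FP32 weights (the UNet file alone is 3.44 GB).

Solution:

Reuse the Hugging Face cache: downloaded files stay on disk for the lifetime of the Colab runtime, so later from_pretrained() calls in the same session load locally instead of re-downloading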

Use accelerate for faster loading
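Whether the weights have to be downloaded again depends on whether they are still in the Hugging Face cache. The following is a minimal sketch for checking this. It assumes huggingface_hub's default cache layout, simplified:

- the cache directory is `HF_HUB_CACHE` if set, otherwise `HF_HOME/hub`, otherwise `$XDG_CACHE_HOME/huggingface/hub` or `~/.cache/huggingface/hub`;
- each repo is stored in a folder named `models--{org}--{name}`.

The helper names are hypothetical. huggingface_hub itself offers richer tooling such as `scan_cache_dir()`.

```python
import os
from pathlib import Path


def hub_cache_dir(env=None):
    """Resolve the Hugging Face Hub cache directory (assumed default layout)."""
    env = os.environ if env is None else env
    if env.get("HF_HUB_CACHE"):
        return Path(env["HF_HUB_CACHE"])
    if env.get("HF_HOME"):
        return Path(env["HF_HOME"]) / "hub"
    if env.get("XDG_CACHE_HOME"):
        return Path(env["XDG_CACHE_HOME"]) / "huggingface" / "hub"
    return Path.home() / ".cache" / "huggingface" / "hub"


def cached_repo_dir(repo_id, env=None):
    """Folder where a model repo such as 'org/name' would be cached."""
    return hub_cache_dir(env) / ("models--" + repo_id.replace("/", "--"))


def is_cached(repo_id, env=None):
    """True if at least one snapshot of the repo is present locally."""
    snapshots = cached_repo_dir(repo_id, env) / "snapshots"
    return snapshots.is_dir() and any(snapshots.iterdir())


print(cached_repo_dir("runwayml/stable-diffusion-v1-5"))
print(is_cached("runwayml/stable-diffusion-v1-5"))
```

Colab wipes this directory when the runtime is recycled. To persist the cache across sessions, one option is to point `HF_HOME` at mounted Google Drive storage before loading the model.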

3. Unwanted Artifacts in Images
Problem: Strange limbs, distorted faces, or blurry details.

Solution:

Strong negative prompts (e.g., "ugly, deformed, extra limbs")

Higher steps (50-70) for better refinement

Guidance scale tuning (7-10) for balanced results
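Strong negative prompts are easiest to apply consistently when a reusable list of unwanted traits is merged with whatever the user types. This is a small sketch with a hypothetical `build_negative_prompt` helper. The default terms are taken from the examples above ("blurry" reflects the blurry-details problem) and are not a tuned recommendation. The helper merges both sources in order, dropping case-insensitive duplicates and empty entries.

```python
DEFAULT_NEGATIVE_TERMS = ("ugly", "deformed", "extra limbs", "blurry")


def build_negative_prompt(user_text="", defaults=DEFAULT_NEGATIVE_TERMS):
    """Merge default and user-supplied negative terms without duplicates."""
    terms = list(defaults) + [t.strip() for t in user_text.split(",")]
    seen, merged = set(), []
    for term in terms:
        key = term.lower()
        if term and key not in seen:  # skip empty entries and repeats
            seen.add(key)
            merged.append(term)
    return ", ".join(merged)


print(build_negative_prompt("Blurry, watermark"))
# ugly, deformed, extra limbs, blurry, watermark
```

The result could be passed as the `negative_prompt` argument of `generate_image`, so users get a sensible baseline even when they leave the field empty.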

4. Gradio Interface Freezing
Problem: Long generations cause timeouts.

Solution:

Progress indicators (gr.Progress() in future updates)

Timeout handling (retry mechanism)
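The retry mechanism could look like the following stdlib-only sketch, with these assumptions:

- `with_retries` and `flaky_generate` are hypothetical names.
- Treating `TimeoutError` and `ConnectionError` as retryable is an assumption; a real handler would list whatever transient errors it actually sees.
- The exponential backoff schedule is one common choice, not something taken from the notebook.

The `sleep` parameter can be swapped out, so the example runs instantly.

```python
import time


def with_retries(fn, *args, attempts=3, base_delay=1.0,
                 retry_on=(TimeoutError, ConnectionError), sleep=time.sleep,
                 **kwargs):
    """Call fn(*args, **kwargs), retrying when it raises one of retry_on.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... between attempts
    and re-raises the last error once all attempts are used up.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn(*args, **kwargs)
        except retry_on:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)


# Usage with a simulated flaky call; sleep is replaced so nothing actually waits.
calls = {"n": 0}


def flaky_generate():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("simulated timeout")
    return "image"


delays = []
print(with_retries(flaky_generate, sleep=delays.append))  # image
print(delays)  # [1.0, 2.0]
```

Exceptions outside `retry_on` propagate immediately, so genuine bugs are not hidden behind repeated attempts.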