# SDXL ControlNet - Architecture-Guided Image Generation

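This notebook demonstrates how to use ControlNet with Stable Diffusion XL (SDXL) for precise control over image generation. It loads Canny-edge and depth ControlNets, generates buildings and interiors from an architectural sketch and a depth map, and then covers multi-ControlNet conditioning, conditioning strength, refinement with the SDXL refiner, and a simple adherence evaluation.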

## What is ControlNet?
ControlNet is a neural network that attaches to a pretrained diffusion model and steers generation with an extra conditioning input, typically an image such as an edge map or a depth map. Common conditioning types include:
- Pose control using OpenPose
- Edge control using Canny edge detection
- Depth control using depth maps
- Scribble control for sketch-to-image
- Soft-edge (HED) and line-art control
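
Conceptually, the ControlNet branch reads the conditioning image and produces residuals that are scaled and added to the features of the base model. The toy below illustrates only that "scale and add" idea with plain Python lists in place of feature tensors. The function name `apply_control` is hypothetical, and this is a simplified sketch rather than the actual diffusers implementation.

```python
def apply_control(base_features, control_residuals, conditioning_scale=1.0):
    """Add scaled ControlNet residuals to base features (toy illustration)."""
    if len(base_features) != len(control_residuals):
        raise ValueError("feature and residual lists must have the same length")
    return [b + conditioning_scale * r
            for b, r in zip(base_features, control_residuals)]

base = [1.0, 2.0]
residual = [2.0, 4.0]

# A scale of 0 leaves the base features untouched; larger scales push harder
# toward the structure encoded in the conditioning image.
print(apply_control(base, residual, conditioning_scale=0.0))
print(apply_control(base, residual, conditioning_scale=0.5))
```

This is the role played by `controlnet_conditioning_scale` in the generation calls later in the notebook.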

## 1. Setup and Installation

In [None]:
# Install required packages (scikit-image is needed by the evaluation section)
!pip install -q diffusers transformers accelerate controlnet-aux opencv-python pillow matplotlib scikit-image torch torchvision
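
Several pip package names differ from the module names imported later (for example, `opencv-python` is imported as `cv2`). Below is a minimal sketch of a pre-flight check that uses only the standard library. The helper name `missing_packages` and the mapping are illustrative assumptions, not part of the original notebook.

```python
import importlib.util

# pip name -> import name for the packages this notebook relies on
REQUIRED = {
    "diffusers": "diffusers",
    "transformers": "transformers",
    "accelerate": "accelerate",
    "controlnet-aux": "controlnet_aux",
    "opencv-python": "cv2",
    "pillow": "PIL",
    "matplotlib": "matplotlib",
    "torch": "torch",
    "scikit-image": "skimage",  # imported by the evaluation section
}

def missing_packages(required=REQUIRED):
    """Return the pip names whose import module cannot be found."""
    return [pip_name for pip_name, module in required.items()
            if importlib.util.find_spec(module) is None]

print("Missing packages:", missing_packages() or "none")
```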

In [None]:
import torch
from diffusers import (
    StableDiffusionXLControlNetPipeline, 
    ControlNetModel,
    AutoencoderKL,
    StableDiffusionXLImg2ImgPipeline
)
from diffusers.utils import load_image
import cv2
import numpy as np
from PIL import Image, ImageDraw
import matplotlib.pyplot as plt
from controlnet_aux import (
    CannyDetector, 
    OpenposeDetector, 
    MidasDetector,
    HEDdetector,
    LineartDetector
)
import warnings
warnings.filterwarnings('ignore')

print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA device: {torch.cuda.get_device_name()}")

## 2. Setup ControlNet Models and Pipeline

In [None]:
# Initialize different ControlNet models
def setup_controlnet_models():
    """Setup various ControlNet models for different conditioning types."""
    
    models = {}
    
    # Canny ControlNet for edge control
    print("Loading Canny ControlNet...")
    models['canny'] = ControlNetModel.from_pretrained(
        "diffusers/controlnet-canny-sdxl-1.0",
        torch_dtype=torch.float16,
        use_safetensors=True
    )
    
    # Depth ControlNet for depth control
    print("Loading Depth ControlNet...")
    models['depth'] = ControlNetModel.from_pretrained(
        "diffusers/controlnet-depth-sdxl-1.0", 
        torch_dtype=torch.float16,
        use_safetensors=True
    )
    
    return models

controlnet_models = setup_controlnet_models()
print("ControlNet models loaded successfully!")

In [None]:
# Setup SDXL pipeline with ControlNet
def create_controlnet_pipeline(controlnet_model):
    """Create SDXL ControlNet pipeline."""
    
    # Load VAE for better quality
    vae = AutoencoderKL.from_pretrained(
        "madebyollin/sdxl-vae-fp16-fix", 
        torch_dtype=torch.float16
    )
    
    # Create pipeline
    pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        controlnet=controlnet_model,
        vae=vae,
        torch_dtype=torch.float16,
        use_safetensors=True,
        variant="fp16"
    )
    
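    # Offload idle submodules to the CPU to reduce VRAM usage
    pipe.enable_model_cpu_offload()
    
    # xformers is optional; fall back to the default attention if it is not installed
    try:
        pipe.enable_xformers_memory_efficient_attention()
    except Exception as exc:
        print(f"xformers not enabled ({exc}); using default attention")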
    
    return pipe

print("Pipeline creation function ready")
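
Each generation helper later in the notebook calls `create_controlnet_pipeline`, which reloads the SDXL weights every time. One way to avoid that is a small cache keyed by ControlNet type. The sketch below uses a stand-in factory so that it runs without downloading anything; `get_cached_pipeline` and `fake_factory` are hypothetical names.

```python
from typing import Any, Callable, Dict

_pipeline_cache: Dict[str, Any] = {}

def get_cached_pipeline(kind: str, factory: Callable[[str], Any]) -> Any:
    """Build the pipeline for `kind` once and reuse it on later calls."""
    if kind not in _pipeline_cache:
        _pipeline_cache[kind] = factory(kind)
    return _pipeline_cache[kind]

# Stand-in factory so the sketch runs without loading any model.
# In this notebook the factory could be:
#   lambda kind: create_controlnet_pipeline(controlnet_models[kind])
build_calls = []

def fake_factory(kind):
    build_calls.append(kind)
    return {"kind": kind}

first = get_cached_pipeline("canny", fake_factory)
second = get_cached_pipeline("canny", fake_factory)
print(first is second, build_calls)  # True ['canny']
```

Cached pipelines stay in memory, so on small GPUs it may be preferable to cache only one pipeline at a time.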

## 3. Preprocessing Functions

In [None]:
# Initialize preprocessors
canny_detector = CannyDetector()
depth_detector = MidasDetector.from_pretrained('valhalla/t2iadapter-aux-models')
openpose_detector = OpenposeDetector.from_pretrained('lllyasviel/Annotators')
hed_detector = HEDdetector.from_pretrained('lllyasviel/Annotators')
lineart_detector = LineartDetector.from_pretrained('lllyasviel/Annotators')

print("Preprocessors initialized successfully!")

In [None]:
def preprocess_image(image, processor_type, **kwargs):
    """
    Preprocess image based on the ControlNet type.
    
    Args:
        image: Input PIL Image
        processor_type: Type of preprocessing ('canny', 'depth', 'openpose', etc.)
        **kwargs: Additional parameters for processors
    
    Returns:
        Processed PIL Image
    """
    
    if processor_type == 'canny':
        low_threshold = kwargs.get('low_threshold', 100)
        high_threshold = kwargs.get('high_threshold', 200)
        return canny_detector(image, low_threshold=low_threshold, high_threshold=high_threshold)
    
    elif processor_type == 'depth':
        return depth_detector(image)
    
    elif processor_type == 'openpose':
        return openpose_detector(image)
    
    elif processor_type == 'hed':
        return hed_detector(image)
    
    elif processor_type == 'lineart':
        return lineart_detector(image)
    
    else:
        raise ValueError(f"Unknown processor type: {processor_type}")

print("Preprocessing function ready")
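
The detectors above return PIL images that can be passed to the pipeline directly. If edges are computed by hand instead (for example with `cv2.Canny`, which returns a single-channel array), the map is usually expanded to three channels before it is wrapped in a PIL image. Below is a minimal NumPy sketch of that step; the helper name `to_three_channel` is an assumption for illustration.

```python
import numpy as np

def to_three_channel(edges: np.ndarray) -> np.ndarray:
    """Stack a single-channel (H, W) map into an (H, W, 3) array."""
    if edges.ndim != 2:
        raise ValueError(f"expected a 2-D array, got shape {edges.shape}")
    return np.repeat(edges[:, :, None], 3, axis=2)

# Stand-in for the output of an edge detector such as cv2.Canny
edges = np.zeros((4, 4), dtype=np.uint8)
edges[1, :] = 255

control = to_three_channel(edges)
print(control.shape, control.dtype)
```

The resulting array can then be converted with `Image.fromarray(control)`.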

## 4. Utility Functions for Visualization

In [None]:
def display_images(images, titles, figsize=(15, 5)):
    """
    Display multiple images side by side.
    """
    fig, axes = plt.subplots(1, len(images), figsize=figsize)
    
    if len(images) == 1:
        axes = [axes]
    
    for img, title, ax in zip(images, titles, axes):
        ax.imshow(img)
        ax.set_title(title, fontsize=12)
        ax.axis('off')
    
    plt.tight_layout()
    plt.show()

def create_sample_architectural_sketch():
    """
    Create a simple architectural sketch for demonstration.
    """
    # Create a simple house sketch
    img = Image.new('RGB', (512, 512), color='white')
    draw = ImageDraw.Draw(img)
    
    # House outline
    draw.rectangle([100, 300, 400, 450], outline='black', width=3)
    
    # Roof
    draw.polygon([80, 300, 250, 200, 420, 300], outline='black', width=3)
    
    # Door
    draw.rectangle([220, 350, 280, 450], outline='black', width=2)
    
    # Windows
    draw.rectangle([130, 320, 180, 360], outline='black', width=2)
    draw.rectangle([320, 320, 370, 360], outline='black', width=2)
    
    # Window crosses
    draw.line([155, 320, 155, 360], fill='black', width=1)
    draw.line([130, 340, 180, 340], fill='black', width=1)
    draw.line([345, 320, 345, 360], fill='black', width=1)
    draw.line([320, 340, 370, 340], fill='black', width=1)
    
    return img

print("Utility functions ready")

## 5. Canny Edge ControlNet Example

In [None]:
# Create or load an architectural sketch
sketch_image = create_sample_architectural_sketch()

# Process with Canny edge detection
canny_image = preprocess_image(sketch_image, 'canny', low_threshold=50, high_threshold=150)

# Display original and processed images
display_images(
    [sketch_image, canny_image], 
    ['Original Sketch', 'Canny Edges'],
    figsize=(10, 5)
)

In [None]:
# Generate image using Canny ControlNet
def generate_with_canny_control(canny_image, prompt, negative_prompt="", num_images=1):
    """
    Generate images using Canny ControlNet.
    """
    # Create pipeline
    pipe = create_controlnet_pipeline(controlnet_models['canny'])
    
    # Generate images
    images = pipe(
        prompt=prompt,
        negative_prompt=negative_prompt,
        image=canny_image,
        num_inference_steps=30,
        guidance_scale=7.5,
        controlnet_conditioning_scale=1.0,
        num_images_per_prompt=num_images,
        height=512,
        width=512
    ).images
    
    return images

# Test prompts for architectural generation
prompts = [
    "modern luxury house, architectural photography, beautiful lighting, detailed, 4k",
    "victorian style house, ornate details, vintage architecture, professional photo",
    "futuristic house, glass and steel, minimalist design, contemporary architecture"
]

print("Canny ControlNet generation function ready")
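
The generation call above uses 512×512 to match the sketch and keep runs fast. SDXL base was trained at around 1024×1024, and the pipeline expects dimensions divisible by 8, so a helper that picks a size close to that pixel budget can be useful for final renders. This is a sketch under those assumptions; `sdxl_dimensions` is a hypothetical helper.

```python
def sdxl_dimensions(width, height, target_pixels=1024 * 1024, multiple=8):
    """Scale (width, height) to roughly `target_pixels`, keeping the aspect
    ratio and rounding each side to a multiple of `multiple`."""
    scale = (target_pixels / (width * height)) ** 0.5

    def snap(value):
        return max(multiple, int(round(value * scale / multiple)) * multiple)

    return snap(width), snap(height)

print(sdxl_dimensions(512, 512))    # (1024, 1024)
print(sdxl_dimensions(1920, 1080))  # both sides are multiples of 8
```

The control image should be resized to the same dimensions before it is passed to the pipeline.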

In [None]:
# Generate images with different prompts
negative_prompt = "blurry, low quality, distorted, ugly, bad architecture"

print("Generating images with Canny ControlNet...")
for i, prompt in enumerate(prompts[:2]):  # Generate for first 2 prompts
    print(f"\nGenerating: {prompt}")
    
    generated_images = generate_with_canny_control(
        canny_image, 
        prompt, 
        negative_prompt
    )
    
    # Display results
    display_images(
        [canny_image, generated_images[0]], 
        ['Canny Control', f'Generated: Style {i+1}'],
        figsize=(10, 5)
    )

print("Canny ControlNet examples completed!")

## 6. Depth ControlNet Example

In [None]:
# For depth control, let's create a simple depth map or use an existing image
def create_depth_map_example():
    """Create a simple depth map for demonstration."""
    # Create a gradient depth map (closer = lighter, farther = darker)
    img = Image.new('L', (512, 512), color=128)
    
    # Convert to numpy for easier manipulation
    depth_array = np.array(img)
    
    # Create depth gradient (simulate a room with perspective)
    for y in range(512):
        for x in range(512):
            # Distance from center bottom
            dist = np.sqrt((x - 256)**2 + (y - 400)**2)
            depth_value = min(255, max(0, int(255 - (dist / 3))))
            depth_array[y, x] = depth_value
    
    return Image.fromarray(depth_array).convert('RGB')

# Create depth map
depth_map = create_depth_map_example()

# Display depth map
display_images([depth_map], ['Depth Map Example'], figsize=(5, 5))
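
The nested Python loop above visits all 262,144 pixels one by one. The same radial gradient can be computed in a vectorized way with NumPy. The sketch below is intended to produce the same values as the loop; the function name and its parameters are illustrative.

```python
import numpy as np

def radial_depth_array(size=512, center=(256, 400), falloff=3.0):
    """Return a (size, size) uint8 array that is brightest at `center`."""
    ys, xs = np.mgrid[0:size, 0:size]
    dist = np.sqrt((xs - center[0]) ** 2 + (ys - center[1]) ** 2)
    # Clip to the valid 8-bit range, then truncate like int() in the loop
    return np.clip(255 - dist / falloff, 0, 255).astype(np.uint8)

depth_array = radial_depth_array()
print(depth_array.shape, depth_array.dtype, depth_array.max())
```

`Image.fromarray(depth_array).convert('RGB')` then yields the same kind of control image as `create_depth_map_example()`.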

In [None]:
# Generate with depth control
def generate_with_depth_control(depth_image, prompt, negative_prompt=""):
    """
    Generate images using Depth ControlNet.
    """
    # Create pipeline
    pipe = create_controlnet_pipeline(controlnet_models['depth'])
    
    # Generate image
    images = pipe(
        prompt=prompt,
        negative_prompt=negative_prompt,
        image=depth_image,
        num_inference_steps=30,
        guidance_scale=7.5,
        controlnet_conditioning_scale=0.8,
        height=512,
        width=512
    ).images
    
    return images[0]

# Test depth-controlled generation
depth_prompt = "luxurious interior room, modern furniture, beautiful lighting, architectural photography"
depth_negative = "cluttered, dark, low quality, distorted"

print("Generating with depth control...")
depth_generated = generate_with_depth_control(depth_map, depth_prompt, depth_negative)

# Display results
display_images(
    [depth_map, depth_generated], 
    ['Depth Control', 'Generated Interior'],
    figsize=(10, 5)
)

## 7. Advanced ControlNet Techniques

In [None]:
def multi_controlnet_generation(control_image1, control_image2, prompt, controlnet_types, conditioning_scales):
    """
    Generate images using multiple ControlNets simultaneously.
    
    Args:
        control_image1, control_image2: Control images, in the same order as controlnet_types
        prompt: Text prompt
        controlnet_types: List of ControlNet types to use
        conditioning_scales: List of conditioning scales for each ControlNet
    """
    # Collect the requested ControlNets. Passing a list to the pipeline makes it
    # combine them internally, so no explicit MultiControlNetModel import is needed.
    controlnets = [controlnet_models[ct] for ct in controlnet_types]
    
    # Create pipeline with multiple ControlNets
    pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        controlnet=controlnets,
        torch_dtype=torch.float16,
        use_safetensors=True,
        variant="fp16"
    )
    
    pipe.enable_model_cpu_offload()
    
    # Generate with multiple controls
    images = pipe(
        prompt=prompt,
        image=[control_image1, control_image2],
        num_inference_steps=25,
        guidance_scale=7.5,
        controlnet_conditioning_scale=conditioning_scales,
        height=512,
        width=512
    ).images
    
    return images[0]

print("Multi-ControlNet function ready (advanced technique)")
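
With several ControlNets, the control images, the ControlNet types, and the conditioning scales must line up one-to-one. A small check before the expensive pipeline call can catch mismatches early. The helper below is a hypothetical sketch; the `available` default mirrors the two models loaded in this notebook.

```python
def validate_multi_control_inputs(control_images, controlnet_types,
                                  conditioning_scales,
                                  available=("canny", "depth")):
    """Check that images, types, and scales match up; return them as triples."""
    n = len(controlnet_types)
    if len(control_images) != n or len(conditioning_scales) != n:
        raise ValueError(
            f"expected {n} control images and scales, got "
            f"{len(control_images)} images and {len(conditioning_scales)} scales"
        )
    unknown = [ct for ct in controlnet_types if ct not in available]
    if unknown:
        raise ValueError(f"unknown ControlNet types: {unknown}")
    if any(scale < 0 for scale in conditioning_scales):
        raise ValueError("conditioning scales must be non-negative")
    return list(zip(controlnet_types, control_images, conditioning_scales))

# Strings stand in for PIL images here
print(validate_multi_control_inputs(["edges.png", "depth.png"],
                                    ["canny", "depth"], [1.0, 0.5]))
```

A call in this notebook might then look like `multi_controlnet_generation(canny_image, depth_map, prompt, ['canny', 'depth'], [1.0, 0.5])`, where the scales are example values, not tuned recommendations.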

## 8. Adjusting ControlNet Conditioning Strength

In [None]:
def adjust_controlnet_strength(control_image, prompt, strength_values):
    """
    Demonstrate how ControlNet conditioning strength affects generation.
    """
    pipe = create_controlnet_pipeline(controlnet_models['canny'])
    
    results = []
    
    for strength in strength_values:
        image = pipe(
            prompt=prompt,
            image=control_image,
            num_inference_steps=25,
            guidance_scale=7.5,
            controlnet_conditioning_scale=strength,
            height=512,
            width=512
        ).images[0]
        
        results.append(image)
    
    return results

# Test different conditioning strengths
strength_values = [0.5, 1.0, 1.5]
test_prompt = "beautiful modern house, architectural photography, professional"

print("Testing different ControlNet conditioning strengths...")
strength_results = adjust_controlnet_strength(canny_image, test_prompt, strength_values)

# Display results
titles = [f'Strength: {s}' for s in strength_values]
display_images(strength_results, titles, figsize=(15, 5))

print("ControlNet strength comparison completed!")

## 9. Image-to-Image with ControlNet

In [None]:
def controlnet_img2img(base_image, control_image, prompt, strength=0.8):
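    """
    Generate with ControlNet, then refine the result with the SDXL refiner (img2img).
    
    `base_image` is currently unused. `strength` is the fraction of the denoising
    schedule that the refiner re-runs; lower values stay closer to the first pass.
    """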
    # Create ControlNet pipeline
    controlnet_pipe = create_controlnet_pipeline(controlnet_models['canny'])
    
    # First pass: Generate with ControlNet
    controlnet_result = controlnet_pipe(
        prompt=prompt,
        image=control_image,
        num_inference_steps=30,
        guidance_scale=7.5,
        controlnet_conditioning_scale=1.0,
    ).images[0]
    
    # Second pass: Refine with img2img
    img2img_pipe = StableDiffusionXLImg2ImgPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-refiner-1.0",
        torch_dtype=torch.float16,
        use_safetensors=True,
        variant="fp16"  # download the half-precision weights only
    )
    img2img_pipe.enable_model_cpu_offload()
    
    refined_result = img2img_pipe(
        prompt=prompt,
        image=controlnet_result,
        strength=strength,
        num_inference_steps=20,
        guidance_scale=7.5
    ).images[0]
    
    return controlnet_result, refined_result

print("ControlNet + Img2Img function ready")
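
In diffusers img2img pipelines, `strength` controls how much of the noise schedule is re-run on the input image, so the refiner typically executes fewer steps than `num_inference_steps`. The sketch below assumes the commonly used rule `min(int(num_inference_steps * strength), num_inference_steps)`. The helper name is hypothetical, and the exact count may differ by diffusers version or scheduler.

```python
def effective_img2img_steps(num_inference_steps: int, strength: float) -> int:
    """Estimate how many denoising steps an img2img call actually runs."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    return min(int(num_inference_steps * strength), num_inference_steps)

# Defaults used by controlnet_img2img above: 20 steps at strength 0.8
print(effective_img2img_steps(20, 0.8))  # 16 under the assumed rule
print(effective_img2img_steps(20, 0.5))  # 10 under the assumed rule
```

Lower strengths therefore run faster and also preserve more of the ControlNet result.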

## 10. Batch Processing and Comparison

In [None]:
def batch_style_transfer(control_image, style_prompts):
    """
    Generate multiple architectural styles from the same control image.
    """
    pipe = create_controlnet_pipeline(controlnet_models['canny'])
    
    results = []
    
    for prompt in style_prompts:
        print(f"Generating: {prompt[:50]}...")
        
        image = pipe(
            prompt=prompt,
            negative_prompt="low quality, blurry, distorted",
            image=control_image,
            num_inference_steps=25,
            guidance_scale=7.5,
            controlnet_conditioning_scale=1.0,
        ).images[0]
        
        results.append(image)
    
    return results

# Define architectural styles
architectural_styles = [
    "modern minimalist house, clean lines, glass and concrete, contemporary architecture",
    "traditional japanese house, wooden structure, zen garden, cultural architecture",
    "art deco mansion, ornate details, 1920s style, luxury architecture",
    "sustainable eco house, green roof, solar panels, environmental architecture"
]

print("Generating multiple architectural styles...")
style_results = batch_style_transfer(canny_image, architectural_styles[:2])  # Generate 2 styles

# Display results
display_images(
    [canny_image] + style_results, 
    ['Control Image', 'Modern Style', 'Japanese Style'],
    figsize=(15, 5)
)

print("Batch style transfer completed!")
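
`batch_style_transfer` runs one prompt per pipeline call. When VRAM allows, several prompts can be grouped into a single call, since the pipeline's `prompt` argument also accepts a list. Below is a minimal sketch of the grouping step only; `chunked` is a hypothetical helper.

```python
def chunked(items, batch_size):
    """Split `items` into consecutive lists of at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

styles = ["modern minimalist house", "traditional japanese house",
          "art deco mansion", "sustainable eco house", "brutalist house"]

for batch in chunked(styles, 2):
    print(len(batch), batch)
```

Each batch could then be passed as `prompt=batch` together with `negative_prompt=[negative] * len(batch)`. Larger batches need more GPU memory, so the batch size here is an assumption to tune per device.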

## 11. Quality Evaluation and Metrics

In [None]:
def evaluate_controlnet_adherence(control_image, generated_image, control_type='canny'):
    """
    Evaluate how well the generated image adheres to the ControlNet conditioning.
    """
    # Run the same preprocessor on both images so they are compared in the same feature space
    if control_type not in ('canny', 'depth'):
        raise ValueError(f"Unsupported control type: {control_type}")
    control_processed = preprocess_image(control_image, control_type)
    generated_processed = preprocess_image(generated_image, control_type)
    
    # Convert to numpy for comparison
    control_array = np.array(control_processed.convert('L'))
    generated_array = np.array(generated_processed.convert('L'))
    
    # Calculate similarity metrics
    # Structural similarity
    from skimage.metrics import structural_similarity as ssim
    similarity_score = ssim(control_array, generated_array)
    
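    # Edge preservation (for canny)
    if control_type == 'canny':
        # Cast to float first: subtracting uint8 arrays wraps around instead of going negative
        diff = np.abs(control_array.astype(np.float32) - generated_array.astype(np.float32))
        edge_preservation = 1.0 - float(np.mean(diff)) / 255.0  # Convert to similarity score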
    else:
        edge_preservation = similarity_score
    
    return {
        'structural_similarity': similarity_score,
        'edge_preservation': edge_preservation,
        'overall_score': (similarity_score + edge_preservation) / 2
    }

# Evaluate our generated images
if 'generated_images' in locals() and len(generated_images) > 0:
    evaluation = evaluate_controlnet_adherence(canny_image, generated_images[0], 'canny')
    
    print("ControlNet Adherence Evaluation:")
    print(f"  Structural Similarity: {evaluation['structural_similarity']:.3f}")
    print(f"  Edge Preservation: {evaluation['edge_preservation']:.3f}")
    print(f"  Overall Score: {evaluation['overall_score']:.3f}")
else:
    print("No generated images available for evaluation")

print("\nEvaluation function ready for use")
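
A mean absolute difference over edge maps is dominated by the black background, so the score can look high even when few edges actually coincide. One alternative, not used in the original notebook, is the intersection-over-union of edge pixels. The sketch below assumes 8-bit edge maps of equal shape; `edge_iou` and its threshold are illustrative choices.

```python
import numpy as np

def edge_iou(control_edges, generated_edges, threshold=127):
    """Intersection-over-union of pixels brighter than `threshold`."""
    a = np.asarray(control_edges) > threshold
    b = np.asarray(generated_edges) > threshold
    if a.shape != b.shape:
        raise ValueError(f"shape mismatch: {a.shape} vs {b.shape}")
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 1.0  # neither map contains edges, treat as a perfect match
    return float(np.logical_and(a, b).sum() / union)

control = np.array([[255, 0], [255, 0]], dtype=np.uint8)
generated = np.array([[255, 255], [0, 0]], dtype=np.uint8)
print(f"Edge IoU: {edge_iou(control, generated):.3f}")
```

Exact pixel overlap is strict for thin edges, so dilating both maps slightly before the comparison may give a more forgiving score.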

## 12. Optimization and Performance Tips

In [None]:
# Performance optimization tips
optimization_tips = """
=== SDXL CONTROLNET OPTIMIZATION TIPS ===

1. MEMORY OPTIMIZATION:
   - Use enable_model_cpu_offload() to move models between CPU/GPU
   - Call enable_xformers_memory_efficient_attention() for lower VRAM usage (requires the xformers package)
   - Use torch.float16 precision to reduce memory footprint
   - Clear CUDA cache between generations: torch.cuda.empty_cache()

2. SPEED OPTIMIZATION:
   - Reduce num_inference_steps (20-30 is usually sufficient)
   - Use smaller image dimensions during experimentation
   - Batch multiple generations when possible
   - Consider a DPM++ scheduler such as DPMSolverMultistepScheduler for faster sampling

3. QUALITY OPTIMIZATION:
   - Fine-tune guidance_scale (7.0-9.0 works well for SDXL)
   - Adjust controlnet_conditioning_scale (0.8-1.2 range)
   - Use high-quality control images with clear features
   - Experiment with different negative prompts

4. CONTROLNET SPECIFIC:
   - Canny: Adjust thresholds for edge detection sensitivity
   - Depth: Ensure proper depth map contrast
   - OpenPose: Use clear, unambiguous poses
   - Multiple ControlNets: Balance conditioning scales

5. PRODUCTION CONSIDERATIONS:
   - Implement proper error handling and retries
   - Monitor GPU memory usage and implement limits
   - Cache models to avoid repeated loading
   - Use proper async/await for web applications
"""

print(optimization_tips)

# Memory cleanup function
def cleanup_memory():
    """Clean up GPU memory after generation."""
    import gc
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
        torch.cuda.synchronize()

# Clean up
cleanup_memory()
print("\nMemory cleaned up")
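
The tip about error handling and retries can be sketched as a small wrapper that cleans up memory between attempts. CUDA out-of-memory errors in PyTorch are `RuntimeError` subclasses, so the sketch catches that type. `run_with_retries` and the flaky stand-in function are hypothetical, and the default cleanup is plain `gc.collect` so that the example runs without a GPU.

```python
import gc

def run_with_retries(generate, max_attempts=3, cleanup=gc.collect):
    """Call `generate()` until it succeeds or `max_attempts` is reached."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return generate()
        except RuntimeError as exc:  # includes CUDA out-of-memory errors
            last_error = exc
            print(f"Attempt {attempt} failed: {exc}")
            cleanup()  # free memory before trying again
    raise RuntimeError(f"generation failed after {max_attempts} attempts") from last_error

# Stand-in for a pipeline call that fails twice before succeeding
attempts = {"count": 0}

def flaky_generate():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("simulated CUDA out of memory")
    return "image"

result = run_with_retries(flaky_generate)
print(result, attempts["count"])
```

In this notebook, `cleanup=cleanup_memory` would be the natural choice, with `generate` wrapping a call such as `lambda: pipe(prompt=prompt, image=canny_image).images[0]`.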

## 13. Save and Export Functions

In [None]:
def save_generation_results(images, prompts, control_images, output_dir="./controlnet_outputs"):
    """
    Save generation results with metadata.
    """
    import os
    import json
    from datetime import datetime
    
    # Create output directory
    os.makedirs(output_dir, exist_ok=True)
    
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    
    results = []
    
    for i, (image, prompt, control_img) in enumerate(zip(images, prompts, control_images)):
        # Save generated image
        gen_filename = f"{timestamp}_generated_{i:03d}.png"
        image.save(os.path.join(output_dir, gen_filename))
        
        # Save control image
        ctrl_filename = f"{timestamp}_control_{i:03d}.png"
        control_img.save(os.path.join(output_dir, ctrl_filename))
        
        # Record metadata
        results.append({
            "generated_image": gen_filename,
            "control_image": ctrl_filename,
            "prompt": prompt,
            "timestamp": timestamp
        })
    
    # Save metadata
    metadata_filename = f"{timestamp}_metadata.json"
    with open(os.path.join(output_dir, metadata_filename), 'w') as f:
        json.dump(results, f, indent=2)
    
    print(f"Results saved to {output_dir}")
    return results

print("Save function ready")

# Example of saving results (uncomment to use)
# if 'style_results' in locals():
#     save_generation_results(
#         style_results[:2], 
#         architectural_styles[:2], 
#         [canny_image] * 2
#     )
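
Saved runs can be read back by loading the `*_metadata.json` files that `save_generation_results` writes. Below is a minimal sketch using only the standard library. `load_generation_metadata` is a hypothetical helper, and the demo writes a tiny metadata file into a temporary directory rather than touching real outputs.

```python
import glob
import json
import os
import tempfile

def load_generation_metadata(output_dir="./controlnet_outputs"):
    """Collect the records from every *_metadata.json file in `output_dir`."""
    records = []
    for path in sorted(glob.glob(os.path.join(output_dir, "*_metadata.json"))):
        with open(path) as f:
            records.extend(json.load(f))
    return records

# Demo with a throwaway directory and one made-up record
with tempfile.TemporaryDirectory() as tmp_dir:
    sample = [{"generated_image": "demo_generated_000.png",
               "control_image": "demo_control_000.png",
               "prompt": "modern minimalist house",
               "timestamp": "demo"}]
    with open(os.path.join(tmp_dir, "demo_metadata.json"), "w") as f:
        json.dump(sample, f)
    records = load_generation_metadata(tmp_dir)

print(len(records), records[0]["prompt"])
```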

## Conclusion

This notebook demonstrated comprehensive usage of SDXL ControlNet for architecture-guided image generation:

### Key Achievements:
1. **ControlNet Setup** - Configured multiple ControlNet models (Canny, Depth)
2. **Preprocessing Pipeline** - Implemented various image preprocessing techniques
3. **Architectural Generation** - Generated buildings from sketches and control inputs
4. **Style Transfer** - Applied different architectural styles to the same structure
5. **Quality Evaluation** - Measured how closely generated images adhere to the control image
6. **Optimization** - Implemented memory and performance optimizations

### ControlNet Applications:
- **Architecture Visualization**: Convert sketches to photorealistic buildings
- **Interior Design**: Control room layouts with depth maps
- **Concept Art**: Generate variations while maintaining structural elements
- **Urban Planning**: Visualize building designs in context

### Best Practices Learned:
1. **Control Image Quality**: High-quality, clear control images produce better results
2. **Conditioning Scale**: Balance between prompt following and control adherence
3. **Prompt Engineering**: Specific, detailed prompts yield better architectural features
4. **Memory Management**: Essential for running SDXL on consumer hardware

### Next Steps:
1. **Custom ControlNet Training**: Train ControlNets on architectural datasets
2. **Multi-Modal Control**: Combine different ControlNet types
3. **Real-time Applications**: Build interactive architectural design tools
4. **3D Integration**: Combine with 3D rendering pipelines
5. **Production Deployment**: Scale for architectural firms and design studios

**Note**: This notebook provides a foundation for architectural AI generation. For production use, consider fine-tuning models on architectural datasets and implementing proper user interfaces.