<p align="center">
    <img src="https://raw.githubusercontent.com/doguilmak/Facades-ControlNet-SD15/refs/heads/main/assets/cover.png" alt="Facades GAN">
</p>

# **Introduction**

This notebook presents a complete and practical pipeline for controlled image generation using **Stable Diffusion** enhanced with **ControlNet**, specifically tailored for architectural **facade synthesis**. Our workflow enables precise generation of building exteriors guided by **semantic segmentation maps**.

We cover the following key components:

* **Data Preparation**: Loading and preprocessing paired images and segmentation masks from the [Facades dataset](https://www.kaggle.com/datasets/balraj98/facades-dataset), with consistent 512×512 resolution for training.

* **Model Configuration**: Leveraging the pretrained `lllyasviel/sd-controlnet-seg` model, we embed ControlNet into a Stable Diffusion pipeline to condition the generation process on structural layouts of facades.

* **Training Loop**: Fine-tuning only the ControlNet adapter layers while freezing the base Stable Diffusion weights. The model is optimized with Mean Squared Error (MSE) loss over 30 epochs using cosine learning rate scheduling.

* **Sampling & Inference**: Producing high-quality, photorealistic building facades from segmentation maps using 50 inference steps, a guidance scale of 9, and consistent architectural detailing.
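
To make the cosine learning rate scheduling mentioned above more concrete, here is a minimal sketch of a plain cosine decay over 30 epochs. The base rate of `1e-5`, the floor of `0.0`, and the lack of a warmup phase are assumptions chosen for illustration; they are not values taken from the actual training run.

```python
import math

def cosine_lr(epoch, total_epochs=30, base_lr=1e-5, min_lr=0.0):
    """Learning rate for a 0-based `epoch` under plain cosine decay.

    base_lr and min_lr are hypothetical values, not the ones used in training.
    """
    progress = epoch / total_epochs                       # fraction of training completed
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))   # goes from 1 down to 0
    return min_lr + (base_lr - min_lr) * cosine

# The rate starts at base_lr, is halved at the midpoint, and reaches min_lr at the end.
for epoch in (0, 15, 30):
    print(f"epoch {epoch:2d}: lr = {cosine_lr(epoch):.2e}")
```

The decay is slow at the start and end and fastest in the middle, which is the usual motivation for this schedule when fine-tuning.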


By the end of this notebook, you’ll have a fully trained ControlNet pipeline capable of transforming facade segmentation maps into coherent, detailed architectural visuals — ideal for generative design, urban planning, and architectural visualization. Let’s dive in!


Make sure your runtime is **GPU** (_not_ CPU or TPU). And if it is an option, make sure you are using _Python 3_. You can select these settings by going to `Runtime -> Change runtime type -> Select the above mentioned settings and then press SAVE`.


In [None]:
!nvidia-smi

## **0. Initial Steps**

Cloning the model repository may take more than a minute.

In [None]:
!git clone https://huggingface.co/doguilmak/facade-controlnet-sd15

In [None]:
from PIL import Image
import torch
from diffusers import StableDiffusionControlNetPipeline
import matplotlib.pyplot as plt

## **1. Loading the Pipeline**

The cell below loads a **Stable Diffusion pipeline with ControlNet** from the **local directory** created by the clone step (`/content/facade-controlnet-sd15/full_pipeline`). The model is conditioned on an input segmentation mask, which guides image generation alongside the text prompt.

* Setting `safety_checker=None` disables the built-in NSFW content filter, which is common in domain-specific or custom deployments.
* `torch_dtype=torch.float32` runs inference in full precision, which improves compatibility across devices at the cost of speed and memory.

Next, the following device assignment line:

```python
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
```

dynamically chooses GPU or CPU depending on hardware availability. The final call:

```python
pipeline.to(device)
```

moves the pipeline's neural network modules (ControlNet, UNet, VAE, and text encoder) to the selected device. The tokenizer is not a PyTorch module, so it is unaffected.

With these steps in place, the pipeline is ready to generate building facades from segmentation inputs.
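
To put the precision trade-off in numbers, the sketch below estimates the memory taken by model weights alone at different precisions. The parameter count is a hypothetical round figure, not a measurement of this pipeline, and activations and other runtime buffers are ignored.

```python
def weight_memory_gib(num_params, bytes_per_param):
    """Approximate memory needed just to store the weights, in GiB."""
    return num_params * bytes_per_param / 2**30

# Hypothetical round number for illustration; not the real size of this pipeline.
ILLUSTRATIVE_PARAMS = 1_000_000_000

fp32 = weight_memory_gib(ILLUSTRATIVE_PARAMS, 4)  # float32 stores 4 bytes per value
fp16 = weight_memory_gib(ILLUSTRATIVE_PARAMS, 2)  # float16 stores 2 bytes per value

print(f"float32: {fp32:.2f} GiB, float16: {fp16:.2f} GiB")
```

Halving the precision halves the weight footprint. Passing `torch_dtype=torch.float16` to `from_pretrained` is one way to try this on a GPU, although output quality at half precision has not been checked for this model.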

In [None]:
pipeline = StableDiffusionControlNetPipeline.from_pretrained(
    "/content/facade-controlnet-sd15/full_pipeline",
    safety_checker=None,
    torch_dtype=torch.float32,
)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
pipeline.to(device)

In PyTorch, calling `.eval()` on a model disables behaviors that are only needed during training, such as:
- **Dropout layers**: These are turned off, so they don't randomly zero out activations.
- **BatchNorm layers**: They use the running statistics (mean/variance) accumulated during training instead of computing them from the current batch.

Setting the model to evaluation mode removes this training-time randomness, which keeps inference stable and avoids subtle bugs that appear when a model is left in training mode. It does not fix the sampling noise, so two runs can still produce different images unless a random seed is set. The `from_pretrained` loaders in `diffusers` already return models in evaluation mode, so the calls below act as an explicit safeguard.

<br>

**Note: If you use other submodules (such as a `text_encoder` or `safety_checker`), you may also want to call `.eval()` on them.**
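
The effect of `.eval()` is easiest to see on a single layer. The following sketch uses a standalone `nn.Dropout` layer, which is not part of this pipeline, to show the difference between the two modes.

```python
import torch
from torch import nn

torch.manual_seed(0)           # only so the training-mode output is repeatable
layer = nn.Dropout(p=0.5)      # toy layer for illustration, not taken from the pipeline
x = torch.ones(1, 8)

layer.train()
train_out = layer(x)           # elements are zeroed at random; survivors are scaled by 1 / (1 - p)

layer.eval()
eval_out = layer(x)            # dropout is a no-op in evaluation mode

print("training flag:", layer.training)
print("train mode   :", train_out)
print("eval mode    :", eval_out)
```

In evaluation mode the input passes through unchanged, which is the behavior wanted during image generation.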



In [None]:
pipeline.controlnet.eval()
pipeline.unet.eval()
pipeline.vae.eval()

The generation settings are collected in a `config` dictionary, defined in the next section, with the following keys:

- `prompt`: The **positive text prompt** describing what you want to generate (e.g., a high-quality facade).
- `neg_prompt`: The **negative prompt**, which steers the model **away** from undesired traits such as blurriness or poor detail.
- `num_steps`: The number of diffusion steps. More steps can improve image quality but take more time.
- `guidance`: How strongly the model follows your text prompt. Values between **7 and 12** are common.

The generation functions below read these values from `config` when calling the pipeline.
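
The guidance value corresponds to the scale in classifier-free guidance, where the noise prediction for the prompt is pushed away from the prediction for the negative (or empty) prompt. The sketch below applies that combination rule to short lists of made-up numbers standing in for the two noise predictions; the function name `apply_guidance` is hypothetical.

```python
def apply_guidance(noise_uncond, noise_text, guidance_scale):
    """Classifier-free guidance: start from the unconditional prediction and
    move along the direction of the text-conditioned one."""
    return [u + guidance_scale * (t - u) for u, t in zip(noise_uncond, noise_text)]

# Toy values for illustration only; real predictions are latent-sized tensors.
uncond = [0.0, 1.0, -1.0]
text = [1.0, 1.0, 0.0]

for scale in (1.0, 9.0):
    print(scale, apply_guidance(uncond, text, scale))
```

A scale of 1 returns the text-conditioned prediction unchanged, while larger values amplify whatever differs between the two predictions. This is why high guidance follows the prompt more closely but can also produce harsher, oversaturated images.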

## **2. Generate Images Using Fine-tuned Model**

In [None]:
PROMPT = 'A flat, front-facing image of an urban building facade, with clearly defined windows, doors, balconies, and architectural elements, minimal background, vector-style rendering, highly detailed, clean lines, HQ, HQ, 4K' # @param
NEG_PROMPT = 'blurry, low quality, hazy' # @param
NUM_STEPS = 50 # @param {type: "number"}
GUIDANCE = 9 # @param {type: "number"}

config = {
    'prompt': PROMPT,
    'neg_prompt': NEG_PROMPT,
    'num_steps': NUM_STEPS,
    'guidance': GUIDANCE
}

def load_control_image(path, size=(512, 512)):
    """
    Loads and resizes a control image (e.g., a segmentation map) from the specified path.

    Args:
        path (str): Path to the control image file.
        size (tuple): Desired output size (width, height). Default is (512, 512),
            matching the resolution used by the generation functions below.

    Returns:
        PIL.Image: The resized RGB image.
    """
    img = Image.open(path).convert("RGB")
    return img.resize(size)

@torch.no_grad()
def generate_from_path(image_path):
    """
    Loads an image and generates an output image using the ControlNet pipeline.

    Args:
        image_path (str): Path to the input image.

    Returns:
        PIL.Image: The generated image based on the input and prompt.
    """
    img = Image.open(image_path).convert("RGB").resize((512, 512))
    print(f"DEBUG: image type={type(img)}, size={img.size}")

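    # StableDiffusionControlNetPipeline receives its conditioning image through
    # `image`; `control_image` belongs to the img2img ControlNet pipeline, so it
    # is not passed here.
    out = pipeline(
        prompt=[config["prompt"]],
        negative_prompt=[config["neg_prompt"]],
        image=img,
        num_inference_steps=config["num_steps"],
        guidance_scale=config["guidance"],
        output_type="pil"
    )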

    return out.images[0]

def display_triplet(input_path, target_path):
    """
    Loads an input image and a corresponding control image (annotation),
    generates an output image using a predefined pipeline, and displays
    all three side by side.

    Args:
        input_path (str): Full path to the input image.
        target_path (str): Full path to the target/control image.
    """
    input_img = Image.open(input_path).convert("RGB").resize((512, 512))
    target_img = Image.open(target_path).convert("RGB").resize((512, 512))

    @torch.no_grad()
    def gen(img, control):
        """
        Generates an image from input and control using a pipeline.

        Args:
            img (PIL.Image): Input image.
            control (PIL.Image): Control image (e.g., segmentation map).

        Returns:
            PIL.Image: Generated image.
        """
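        # Note: StableDiffusionControlNetPipeline takes its conditioning image via
        # `image`; `control_image` is an argument of the img2img ControlNet
        # pipeline. With recent diffusers versions, `img` is therefore what
        # conditions the generation here, and `control` is likely ignored.
        return pipeline(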
            prompt=[config["prompt"]],
            negative_prompt=[config["neg_prompt"]],
            image=img,
            control_image=control,
            num_inference_steps=config["num_steps"],
            guidance_scale=config["guidance"],
            output_type="pil"
        ).images[0]

    generated_img = gen(input_img, target_img)

    fig, axes = plt.subplots(1, 3, figsize=(15, 5), dpi=200)
    for ax, im, title in zip(axes,
                             [input_img, target_img, generated_img],
                             ["Input", "Target", "Generated"]):
        ax.imshow(im)
        ax.set_title(title)
        ax.axis("off")
    plt.tight_layout()
    plt.show()
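
The helpers above resize images with Pillow's default filter, which in recent Pillow releases interpolates between neighboring pixels for RGB images. For segmentation maps this can create blended colors along class boundaries. The sketch below compares the default filter with nearest-neighbor resampling on a tiny synthetic mask. The mask and the `distinct_colors` helper are made up for illustration, and whether the choice matters in practice depends on how the masks were resized during training.

```python
from PIL import Image

# Synthetic 4x4 "segmentation map": left half red, right half blue.
mask = Image.new("RGB", (4, 4), (255, 0, 0))
mask.paste((0, 0, 255), (2, 0, 4, 4))

def distinct_colors(img):
    """Return the set of RGB values present in the image."""
    return {img.getpixel((x, y)) for x in range(img.width) for y in range(img.height)}

smooth = mask.resize((3, 3))                  # default filter, may blend colors at the boundary
crisp = mask.resize((3, 3), Image.NEAREST)    # nearest neighbor keeps only the original colors

print("default filter:", sorted(distinct_colors(smooth)))
print("nearest       :", sorted(distinct_colors(crisp)))
```

If label colors need to stay exact, passing `Image.NEAREST` to `resize` in `load_control_image` is one option to consider.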

In [None]:
result = generate_from_path("/content/60_input.jpg")
# display(result)
plt.figure(figsize=(6, 6), dpi=200)
plt.imshow(result)
plt.axis("off")
plt.show()

In [None]:
display_triplet("/content/60_input.jpg", "/content/60_annot.jpg")
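
Each call above starts from freshly sampled noise, so repeated runs give different facades for the same segmentation map. A seeded `torch.Generator` is the usual way to make results repeatable. The sketch below shows the idea on a random tensor shaped like the latent of a 512×512 image; the helper name `make_generator` and the seed values are arbitrary choices for illustration.

```python
import torch

def make_generator(seed, device="cpu"):
    """Create a random generator with a fixed seed (hypothetical helper)."""
    return torch.Generator(device=device).manual_seed(seed)

# Latent-shaped noise: 4 channels at 1/8 of the 512x512 image resolution.
noise_a = torch.randn(1, 4, 64, 64, generator=make_generator(42))
noise_b = torch.randn(1, 4, 64, 64, generator=make_generator(42))
noise_c = torch.randn(1, 4, 64, 64, generator=make_generator(43))

print("same seed, same noise:     ", torch.equal(noise_a, noise_b))
print("different seed, same noise:", torch.equal(noise_a, noise_c))
```

The pipeline call accepts such an object through its `generator` argument, for example `generator=make_generator(42, device)`, which should make the output repeatable on the same hardware and library versions.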