AtteConDA: Attention-Based Conflict Suppression in Multi-Condition Diffusion Models and Synthetic Data Augmentation

AtteConDA is a reproducible multi-condition diffusion framework for autonomous-driving synthetic data augmentation. It preserves scene structure with semantic segmentation + depth + edge conditions while explicitly suppressing inter-condition conflicts through PAM (Patch-wise Adaptation Module).

This repository is designed to be useful in two directions at the same time:

Public research release for an English-first GitHub portfolio with clean documentation, model cards, and a project page.
Research hand-off package for future lab members so that the full flow
train -> generate -> evaluate can be reproduced from one repository.

Language note

The repository is now organized in English, but some original figure labels, comments, and filenames were created from the Japanese thesis draft and are still being translated. The public release keeps those assets for reproducibility first, then improves wording incrementally.

👨‍💻 ToDo

✅ Release the training code
✅ Release the inference code
✅ Release the model
□ Release the arXiv paper

Why AtteConDA?

Conventional image augmentation changes pixel statistics or geometry, but it usually cannot create semantic weather/time-of-day changes while keeping high-level road-scene structure intact.

AtteConDA is built around three ideas:

Use multiple structural conditions at once
semantic segmentation, depth, and edge are all injected into the generator.
Reuse strong pretrained controllable diffusion representations
the local control branch is initialized from Uni-ControlNet-compatible weights.
Suppress condition conflicts explicitly
PAM performs patch-wise condition selection so that low-frequency geometry and high-frequency contour constraints do not destroy each other.

Project page

GitHub repository: https://github.com/ShogoNoguchi/AtteConDA
Project page: https://shogonoguchi.github.io/AtteConDA/
Hugging Face collection: https://huggingface.co/collections/Shogo-Noguchi/atteconda

The Pages source is provided in docs/index.html, and the deployment workflow is provided in .github/workflows/pages.yml.

Method overview

1) Why generative augmentation instead of only geometric augmentation?

2) Full pipeline

3) Model structure

4) PAM: Patch-wise Adaptation Module

Result snapshot

Qualitative examples

Quantitative comparison against prior work

Category	Metric	PAM60K	Tune60K	DGInStyle	Best among ours
Semantic Segmentation	mIoU ↑	0.3310	0.3115	0.3722	0.3310
Depth	RMSE ↓	27.77	33.02	36.71	27.77
Edge	L1 Error ↓	0.04493	0.04561	0.09176	0.04493
Object Preservation	F1 ↑	0.1071	0.0889	0.0790	0.1071
Reality	CLIP-CMMD ↓	0.1794	0.1738	0.2710	0.1738
Diversity	1-MS-SSIM ↑	0.8480	0.8497	0.9240	0.8497
Text Alignment	R-Precision@1 ↑	0.3258	0.3563	0.3606	0.3563

What this means

AtteConDA is not trying to win only the easiest semantic-only score. Its core target is structure-preserving augmentation for high-level driving tasks. That is why depth consistency, edge consistency, object preservation, and realism are treated as first-class objectives.

Released model zoo

All released checkpoints are listed in the public Hugging Face collection.

Paper name	Hugging Face model	Link
FullScratch30K	AtteConDA-SDE-Scratch-30K	`https://huggingface.co/Shogo-Noguchi/AtteConDA-SDE-Scratch-30K`
Tune30K	AtteConDA-SDE-UniCon-30K	`https://huggingface.co/Shogo-Noguchi/AtteConDA-SDE-UniCon-30K`
Tune60K	AtteConDA-SDE-UniCon-60K	`https://huggingface.co/Shogo-Noguchi/AtteConDA-SDE-UniCon-60K`
Tune90K	AtteConDA-SDE-UniCon-90K	`https://huggingface.co/Shogo-Noguchi/AtteConDA-SDE-UniCon-90K`
PAM60K	AtteConDA-SDE-UniCon-60K-PAM	`https://huggingface.co/Shogo-Noguchi/AtteConDA-SDE-UniCon-60K-PAM`
Uni-ControlNet initialization checkpoint	AtteConDA-SDE-UniCon-Init	`https://huggingface.co/Shogo-Noguchi/AtteConDA-SDE-UniCon-Init`

Model card templates for all six releases are provided under huggingface_model_cards/.

Repository layout

AtteConDA/
├── README.md
├── LICENSE
├── THIRD_PARTY_NOTICES.md
├── CITATION.cff
├── docs/
│   ├── index.html
│   └── RESEARCHER_README.md
├── environment/
│   └── README.md
├── scripts/
│   └── verify_env.sh
├── prep/
│   ├── ucn_build_conditions.py
│   └── ucn_build_prompts.py
├── Uni-ControlNet/
│   ├── configs/
│   ├── src/train/train.py
│   └── src/tools/build_anno_syndiff_multi.py
├── DGInStyle/
├── eval/
│   └── ucn_eval/
└── figs/

For a researcher-oriented, dependency-aware explanation of the whole repository, see:

docs/RESEARCHER_README.md

Environment setup

This repository ships a reproducible environment description under environment/.

Quick start

git clone https://github.com/ShogoNoguchi/AtteConDA.git
cd AtteConDA
conda env create -f environment/environment.yaml
conda activate atteconda_env

Verification

bash scripts/verify_env.sh atteconda_env

Important note about CUDA 12.8 and RTX 5090

This repository is intentionally aligned with CUDA 12.8 / cu128 because the main training machine is an RTX 5090. The currently shipped environment/environment.yaml is the author’s validated environment and therefore preserves the exact cu128 stack that was already used successfully on the lab machine. See environment/README.md before replacing the PyTorch stack.

Datasets

Training datasets

Dataset	Split / count
BDD10K (semantic subset)	train 7K / val 1K
Cityscapes	train 2975 / val 500
GTA5	24,966
nuImages (front)	18,368
BDD100K (excluding BDD10K overlap)	train 67K / val 10K / test 20K

Evaluation dataset

Dataset	Split / count
Waymo front (first / mid10s / last extracted per segment)	train 2394 / val 606 / test 48

The evaluation pipeline uses Waymo RGB as the source image and compares original vs generated images in shared evaluation spaces.

Step 1. Build condition maps

This stage generates depth / edge / semantic segmentation condition maps from RGB.

1-1. Build condition maps for training datasets

docker run --rm --gpus all \
  -e MPLBACKEND=Agg \
  -e MAIN_PY=/app/ucn_build_conditions.py \
  -e PIP_OVERLAY_DIR=/data/ucn_prep_cache/pip-overlay \
  -e REQS_OVERLAY_PATH=/data/ucn_prep_cache/requirements.overlay.txt \
  -e 'PIP_INSTALL=timm yacs huggingface_hub>=0.34,<1.0 einops' \
  -e PYTORCH_CUDA_ALLOC_CONF='expandable_segments:True,max_split_size_mb:128' \
  -v /home/shogo/coding/eval/ucn_eval/docker/entrypoint.sh:/app/entrypoint.sh:ro \
  -v /home/shogo/coding/prep/ucn_build_conditions.py:/app/ucn_build_conditions.py:ro \
  -v /home/shogo/coding/datasets:/home/shogo/coding/datasets:ro \
  -v /home/shogo/coding/Metric3D:/home/shogo/coding/Metric3D:ro \
  -v /home/shogo/.cache/huggingface:/root/.cache/huggingface \
  -v /data:/data \
  ucn-eval \
  --datasets all --tasks all --semseg-batch-size 1

1-2. Optional subset consistency check

docker run --rm --gpus all \
  -e MPLBACKEND=Agg \
  -e MAIN_PY=/app/ucn_build_conditions.py \
  -e PIP_OVERLAY_DIR=/data/ucn_prep_cache/pip-overlay \
  -e REQS_OVERLAY_PATH=/data/ucn_prep_cache/requirements.overlay.txt \
  -e 'PIP_INSTALL=timm yacs huggingface_hub>=0.34,<1.0 einops' \
  -e PYTORCH_CUDA_ALLOC_CONF='expandable_segments:True,max_split_size_mb:128' \
  -v /home/shogo/coding/eval/ucn_eval/docker/entrypoint.sh:/app/entrypoint.sh:ro \
  -v /home/shogo/coding/prep/ucn_build_conditions.py:/app/ucn_build_conditions.py:ro \
  -v /home/shogo/coding/datasets:/home/shogo/coding/datasets:ro \
  -v /home/shogo/coding/Metric3D:/home/shogo/coding/Metric3D:ro \
  -v /home/shogo/.cache/huggingface:/root/.cache/huggingface \
  -v /data:/data \
  ucn-eval \
  --datasets all --tasks all --subset-check --verbose

Step 2. Build text prompts

This stage generates:

training prompts for the source-domain training datasets
evaluation prompts for Waymo

docker run --rm --gpus all \
  --entrypoint /app/entrypoint_prompts_v2.sh \
  -e MPLBACKEND=Agg \
  -e MAIN_PY=/app/ucn_build_prompts.py \
  -e PIP_OVERLAY_DIR=/data/ucn_prep_cache/pip-overlay \
  -e HF_HOME=/root/.cache/huggingface \
  -e OPENCLIP_CACHE_DIR=/root/.cache/huggingface/hub \
  -e VLM_LOCAL_DIR=/data/hf_models/Qwen/Qwen3-VL-32B-Instruct \
  -e QWEN_QUANT=4bit \
  -e TRANSFORMERS_OFFLINE=0 -e HF_HUB_OFFLINE=0 \
  -v /home/shogo/coding/eval/ucn_eval/docker/entrypoint_prompts_v2.sh:/app/entrypoint_prompts_v2.sh:ro \
  -v /home/shogo/coding/prep/ucn_build_prompts.py:/app/ucn_build_prompts.py:ro \
  -v /home/shogo/coding/datasets:/home/shogo/coding/datasets:ro \
  -v /home/shogo/.cache/huggingface:/root/.cache/huggingface \
  -v /data:/data \
  ucn-eval \
  --jobs train eval_waymo \
  --datasets all \
  --cityscapes-use-splits train \
  --bdd10k10k-use-splits train \
  --bdd100k-use-splits train \
  --vlm-local-dir /data/hf_models/Qwen/Qwen3-VL-32B-Instruct \
  --openclip-cache-dir /root/.cache/huggingface/hub \
  --quant 4bit \
  --clip-batch 8 \
  --cap-max-new 96 \
  --cap-words 45 \
  --hidethinking \
  --verbose

Step 3. Build Uni-ControlNet annotation CSV files

After condition maps and prompts are ready, build the CSV files used by training and Waymo inference/evaluation.

cd /data/coding/B_thesis_Repo/Uni-ControlNet

python -u src/tools/build_anno_syndiff_multi.py \
  --make-train \
  --make-waymo-val \
  --limit -1 \
  --verbose

Expected outputs:

Uni-ControlNet/data/anno_syndiff_train.csv
Uni-ControlNet/data/anno_syndiff_waymo_val.csv

Step 4. Training

4-1. Train without PAM

cd /data/coding/B_thesis_Repo/Uni-ControlNet

python -u src/train/train.py \
  --config-path ./configs/local_v15_syndiff.yaml \
  --learning-rate 1e-5 \
  --batch-size 4 \
  --training-steps 30084 \
  --resume-path /data/coding/Uni-ControlNet/ckpt/init_uni_EX7_fromSD15.ckpt \
  --logdir ./logs/finetune_uni_syndiff_EX7_FromNOPretrain_30Ksteps \
  --logger-version 7 \
  --log-freq 1000 \
  --ckpt-every-n-steps 30000 \
  --sd-locked True \
  --gpus 1

If you start from the released Uni-ControlNet-compatible initialization checkpoint, replace --resume-path with the path to AtteConDA-SDE-UniCon-Init.ckpt. If you train fully from scratch, omit --resume-path.

4-2. Train with PAM

cd /data/coding/B_thesis_Repo/Uni-ControlNet

python -u src/train/train.py \
  --config-path ./configs/local_v15_syndiff_pam2.yaml \
  --learning-rate 1e-5 \
  --batch-size 4 \
  --training-steps 90168 \
  --resume-path /data/coding/ckpts/version_4/checkpoints/epoch=1-step=60000.ckpt \
  --logdir ./logs/finetune_uni_syndiff_pam2_From60KpurePretrain \
  --logger-version 0 \
  --log-freq 1000 \
  --ckpt-every-n-steps 30000 \
  --sd-locked True \
  --gpus 1

Step 5. Waymo inference

5-1. Our model

python -u /data/coding/Uni-ControlNet/src/tools/infer_waymo_unicontrol_offline.py \
  --splits training validation testing \
  --camera front \
  --uni-config /data/coding/Uni-ControlNet/configs/uni_v15.yaml \
  --uni-ckpt /data/coding/Uni-ControlNet/logs/finetune_uni_syndiff_EX7_FromNOPretrain_30Ksteps/lightning_logs/version_7/checkpoints/periodic-stepstep=000030000.ckpt \
  --image-resolution 512 \
  --num-samples 1 \
  --ddim-steps 50 \
  --scale 7.5 \
  --strength 1.0 \
  --global-strength 0.0 \
  --prompt-root /data/syndiff_prompts/prompts_eval_waymo \
  --out-root /data/coding/datasets/WaymoV2/Ucn_byPure_Finetune30K_FromNOPretrain \
  --experiments-root /data/ucn_infer_cache_ex7 \
  --experiment-id EX7 \
  --experiment-note "Perform finetune 30Ksteps without using uni.ckpt, which is officially distributed by UniControlNet. Abration is used to measure the power of prior learning." \
  --overwrite \
  --verbose

Use the PAM config when running a PAM-trained checkpoint. Use the non-PAM config when running a non-PAM checkpoint.

5-2. DGInStyle baseline

cd /data/coding/DGInStyle

python -u /data/coding/DGInStyle/src/tools/infer_waymo_unicontrol_offline.py \
  --splits training validation testing \
  --camera front \
  --DG-config /data/coding/DGInStyle/configs/dginstyle_sd15_semseg.yaml \
  --DG-ckpt yurujaja/DGInStyle \
  --image-resolution 512 \
  --num-samples 1 \
  --ddim-steps 50 \
  --scale 7.5 \
  --strength 1.0 \
  --eta 0.0 \
  --prompt-root /data/syndiff_prompts/prompts_eval_waymo \
  --out-root /data/coding/datasets/WaymoV2/DGInstyle_Pure \
  --experiments-root /data/ucn_infer_cache \
  --experiment-id EX9 \
  --experiment-note "DGInStyle + Qwen3-VL prompts" \
  --verbose

Step 6. Evaluation

6-1. Build the evaluation Docker image

cd /home/shogo/coding/eval/ucn_eval

docker build \
  -t ucn-eval \
  -f /home/shogo/coding/eval/ucn_eval/docker/Dockerfile \
  /home/shogo/coding/eval/ucn_eval

6-2. Run evaluation

docker run --rm --gpus all \
  -e MPLBACKEND=Agg \
  -e PIP_OVERLAY_DIR=/data/ucn_eval_cache_ex9/pip-overlay \
  -e REQS_OVERLAY_PATH=/data/ucn_eval_cache_ex9/requirements.overlay.txt \
  -e 'PIP_INSTALL=timm yacs prefetch_generator easydict thop scikit-image pytesseract huggingface_hub>=0.34,<1.0 einops matplotlib lpips pytorch-msssim' \
  -v /home/shogo/coding/eval/ucn_eval/docker/entrypoint.sh:/app/entrypoint.sh:ro \
  -v /home/shogo/coding/eval/ucn_eval/eval_unicontrol_waymo.py:/app/eval_unicontrol_waymo.py:ro \
  -v /home/shogo/coding/datasets/WaymoV2:/home/shogo/coding/datasets/WaymoV2:ro \
  -v /home/shogo/coding/Metric3D:/home/shogo/coding/Metric3D:ro \
  -v /data:/data \
  -v /home/shogo/.cache/huggingface:/root/.cache/huggingface \
  -v /data/ucn_eval_cache_ex9/torch_hub:/root/.cache/torch/hub \
  ucn-eval \
  --cache-root /data/ucn_eval_cache_ex9 \
  --splits training validation testing \
  --camera front \
  --tasks all \
  --reality-metric clip-cmmd \
  --gdinomodel IDEA-Research/grounding-dino-base \
  --det-prompts car truck bus motorcycle bicycle person pedestrian "traffic light" "traffic sign" "stop sign" "speed limit sign" "crosswalk sign" "construction sign" "traffic cone" \
  --ocr-engine tesseract \
  --iou-thr 0.5 \
  --orig-root /home/shogo/coding/datasets/WaymoV2/extracted \
  --gen-root /data/coding/datasets/WaymoV2/DGInstyle_Pure \
  --prompt-root /data/syndiff_prompts/prompts_eval_waymo \
  --annotation-mode all \
  --annotate-limit 24 \
  --annotate-out /data/ucn_eval_cache_ex9/viz_ex9 \
  --tb --tb-dir /data/ucn_eval_cache_ex9/tensorboard_ex9 \
  --experiment-id EX9 \
  --experiment-note "DGInStyle + Qwen3-VL prompts" \
  --drivable-methods onefroad \
  --no-auto-batch \
  --verbose

TensorBoard

Training and evaluation both support TensorBoard logging.

Typical example:

tensorboard --logdir /data/ucn_eval_cache_ex9/tensorboard_ex9 --bind_all

Researcher hand-off document

For the full lab-facing explanation, including:

dataset trees
dependency trees
exact absolute paths
public-vs-vendored provenance
external runtime model download sources
cleanup recommendations
release checklist

see:

docs/RESEARCHER_README.md

Acknowledgements and upstream dependencies

AtteConDA stands on top of several important upstream projects.

Directly relevant upstream codebases

Uni-ControlNet
all-in-one controllable diffusion backbone used as the main architectural starting point
repository: https://github.com/ShihaoZhaoZSH/Uni-ControlNet
DGInStyle
important prior work and baseline for autonomous-driving diffusion-based augmentation
repository: https://github.com/prs-eth/DGInStyle

Academic context

This repository was implemented and organized by Shogo Noguchi.

This work was conducted as bachelor’s thesis research in Yuminaka Lab, Gunma University, under the academic supervision of Prof. Yasushi Yuminaka.

Paper-level inspiration

PixelPonder
cited as a research idea reference for dynamic multi-condition conflict handling
paper: https://arxiv.org/abs/2503.06684

This repository acknowledges the paper as prior art, but does not claim code provenance from an unlicensed source tree.

Runtime-downloaded models and tools

Stable Diffusion v1.5 family
OneFormer (Cityscapes)
Metric3Dv2
Grounding DINO
CLIP / open_clip
Qwen3-VL
LPIPS / AlexNet
Tesseract OCR

See THIRD_PARTY_NOTICES.md and docs/RESEARCHER_README.md for detailed attribution and release guidance.

License

AtteConDA-original top-level documentation and release assets in this bundle: Apache-2.0
Vendored / modified upstream code: keep each upstream license inside its own subtree
Model weights: follow the model card and upstream base-model terms

Please read:

before redistributing code or checkpoints.

Citation

A software citation file is provided as CITATION.cff.

BibTeX-style software citation:

@misc{noguchi2026atteconda,
  title        = {AtteConDA: Attention-Based Conflict Suppression in Multi-Condition Diffusion Models and Synthetic Data Augmentation},
  author       = {Shogo Noguchi},
  year         = {2026},
  howpublished = {GitHub repository},
  note         = {Gunma University}
}

Name		Name	Last commit message	Last commit date
Latest commit History 14 Commits
.github/workflows		.github/workflows
DGInStyle		DGInStyle
Uni-ControlNet		Uni-ControlNet
docs		docs
environment		environment
eval/ucn_eval		eval/ucn_eval
figs		figs
huggingface_model_cards		huggingface_model_cards
prep		prep
scripts		scripts
tools		tools
.gitignore		.gitignore
APPLY_ORDER.md		APPLY_ORDER.md
CITATION.cff		CITATION.cff
LICENSE		LICENSE
README.md		README.md
RELEASE_REPORT.md		RELEASE_REPORT.md
REPO_CLEANUP_COMMANDS.sh		REPO_CLEANUP_COMMANDS.sh
THIRD_PARTY_NOTICES.md		THIRD_PARTY_NOTICES.md

Folders and files

Latest commit

History

Repository files navigation

AtteConDA: Attention-Based Conflict Suppression in Multi-Condition Diffusion Models and Synthetic Data Augmentation

👨‍💻 ToDo

Why AtteConDA?

Project page

Method overview

1) Why generative augmentation instead of only geometric augmentation?

2) Full pipeline

3) Model structure

4) PAM: Patch-wise Adaptation Module

Result snapshot

Qualitative examples

Quantitative comparison against prior work

What this means

Released model zoo

Repository layout

Environment setup

Quick start

Verification

Important note about CUDA 12.8 and RTX 5090

Datasets

Training datasets

Evaluation dataset

Step 1. Build condition maps

1-1. Build condition maps for training datasets

1-2. Optional subset consistency check

Step 2. Build text prompts

Step 3. Build Uni-ControlNet annotation CSV files

Step 4. Training

4-1. Train without PAM

4-2. Train with PAM

Step 5. Waymo inference

5-1. Our model

5-2. DGInStyle baseline

Step 6. Evaluation

6-1. Build the evaluation Docker image

6-2. Run evaluation

TensorBoard

Researcher hand-off document

Acknowledgements and upstream dependencies

Directly relevant upstream codebases

Academic context

Paper-level inspiration

Runtime-downloaded models and tools

License

Citation

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 1

Languages

Packages