Exhibited at IT Studio Academis, Berlin (March 2026) as part of the Pixels2GenAI project.
This installation builds on two modules I authored for the Pixels2GenAI thesis project:
- Module 12.3.2 — ControlNet Guided Generation — the diffusion + ControlNet pipeline that drives the anime generation here.
- Module 11.2.3 — Face Detection — the MediaPipe face-landmark work that feeds the hybrid edge map into ControlNet.
The wider curriculum is available at Pixels2GenAI.
## Contents

- How it works
- Requirements
- Installation
- Usage
- Controls
- Flags
- Repository layout
- Offline renderer
- Known quirks
- Privacy
- License
## How it works

The piece alternates two cycles:
- Cycle A — self → anime. A webcam snapshot is progressively noised toward pure Gaussian noise, then the reverse diffusion is driven by Stable Diffusion + ControlNet (conditioned on a hybrid edge map: Canny + MediaPipe face mesh + pose skeleton + hand landmarks) into an anime illustration of the visitor.
- Cycle B — anime → self. The anime image dissolves back into noise and crystallizes into the live webcam view, closing the loop.
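The hybrid edge map that conditions Cycle A is a union of several single-channel maps. A minimal sketch of that combination step, assuming the individual maps are already rasterized (the real script derives them with OpenCV and MediaPipe; `combine_conditioning` is an illustrative name, not the script's API):

```python
import numpy as np

def combine_conditioning(*edge_maps: np.ndarray) -> np.ndarray:
    """Merge single-channel edge/landmark maps (H x W, uint8, 0 or 255)
    into one ControlNet conditioning image via a per-pixel max, then
    replicate to three channels as diffusion pipelines expect RGB input."""
    combined = np.maximum.reduce(list(edge_maps))
    return np.repeat(combined[..., None], 3, axis=2)  # H x W x 3
```

The per-pixel max keeps every stroke from every detector without double-brightening overlaps, which is why it is a common way to stack Canny edges with landmark drawings.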
Forward diffusion uses the DDPM formula
x_t = sqrt(ᾱ_t) · x_0 + sqrt(1 - ᾱ_t) · ε
with selectable linear, cosine, or quadratic noise schedules.
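The formula and the three schedules can be sketched in NumPy. The linear/quadratic beta ranges below use the common DDPM defaults (1e-4 to 0.02) and the cosine variant follows the Nichol & Dhariwal parameterization; the exact constants in `dissolution.py` may differ:

```python
import numpy as np

def alpha_bar(T: int, schedule: str = "linear") -> np.ndarray:
    """Cumulative product alpha-bar_t for t = 1..T under a noise schedule."""
    t = np.arange(1, T + 1)
    if schedule == "linear":
        betas = np.linspace(1e-4, 0.02, T)
    elif schedule == "quadratic":
        betas = np.linspace(1e-4 ** 0.5, 0.02 ** 0.5, T) ** 2
    elif schedule == "cosine":
        f = np.cos((t / T + 0.008) / 1.008 * np.pi / 2) ** 2
        f0 = np.cos((0.008 / 1.008) * np.pi / 2) ** 2
        return f / f0
    else:
        raise ValueError(f"unknown schedule: {schedule}")
    return np.cumprod(1.0 - betas)

def forward_diffuse(x0: np.ndarray, t: int, abar: np.ndarray,
                    rng=None) -> np.ndarray:
    """x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps  (DDPM forward)."""
    rng = rng or np.random.default_rng(0)
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(abar[t - 1]) * x0 + np.sqrt(1.0 - abar[t - 1]) * eps
```

At small t the output stays close to `x0`; as t approaches T, alpha-bar approaches 0 and the image dissolves into pure Gaussian noise, which is exactly the visual arc of Cycle A.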
Generation runs on a background thread (~3 s per image on a modern GPU) while the display loop stays at 30 FPS on the main thread. A side panel shows the live webcam, detection overlays, the combined-edges ControlNet input, the noise schedule curve, and the current state, so viewers can see the math driving the output.
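The worker/display split can be sketched with the standard library: a background thread owns the slow generation while the main loop keeps rendering and adopts finished images when they appear. Function and queue names below are illustrative, not the script's actual API:

```python
import queue
import threading

def generator_worker(jobs: queue.Queue, results: queue.Queue, generate):
    """Background thread: pull a snapshot, run the slow pipeline, push the result."""
    while True:
        snapshot = jobs.get()
        if snapshot is None:          # sentinel -> shut down
            break
        results.put(generate(snapshot))

def run_display_loop(frames, generate):
    jobs, results = queue.Queue(), queue.Queue()
    worker = threading.Thread(target=generator_worker,
                              args=(jobs, results, generate), daemon=True)
    worker.start()
    jobs.put(frames[0])               # kick off the first generation
    latest = None
    for frame in frames:
        try:
            latest = results.get_nowait()   # adopt a finished image, if any
        except queue.Empty:
            pass                            # GPU still busy: keep the old one
        # ... render `frame` (and `latest`, if set) at 30 FPS here ...
    jobs.put(None)                    # stop the worker
    worker.join()
    while not results.empty():        # drain anything finished late
        latest = results.get()
    return latest
```

Because `get_nowait` never blocks, the render loop's frame time is independent of how long the GPU takes; that is what keeps the display at 30 FPS during a ~3 s generation.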
## Requirements

- Python 3.11 or newer
- CUDA-capable GPU (for real-time generation); CPU-only machines can run the `--no-controlnet` test mode
- A webcam
- ~5 GB of free disk space for the Stable Diffusion + ControlNet weights (auto-downloaded from Hugging Face on first run)
## Installation

```shell
git clone https://github.com/<your-user>/dissolution.git
cd dissolution
pip install -r requirements.txt
```

MediaPipe task files and the Stable Diffusion / ControlNet checkpoints are fetched automatically on first launch. No manual downloads are required.
## Usage

```shell
# Full interactive mode
python dissolution_live.py

# Test mode (no GPU, fallback images)
python dissolution_live.py --no-controlnet

# Pick a different webcam
python dissolution_live.py --camera 1

# Record the main display
python dissolution_live.py --record out.mp4
```

Run `python dissolution_live.py --help` for the full list of options.
## Controls

| Key | Action |
|---|---|
| `SPACE` | Force-trigger a dissolve during MIRROR / IDLE |
| `1` / `2` / `3` | Switch edge-detection preview (Canny / face mesh / pose) |
| `V` | Cycle through prompt presets |
| `0` | Toggle sound reactivity |
| `A` | Toggle ambient audio |
| `F` | Toggle fullscreen |
| `P` | Toggle process panel |
| `Q` / `ESC` | Quit |
## Flags

| Flag | Purpose |
|---|---|
| `--no-controlnet` | Disable GPU generation; use fallback images |
| `--no-webcam` | Use bundled images instead of the webcam |
| `--camera N` | Webcam device index |
| `--resolution WxH` | Override the display resolution |
| `--fullscreen` | Start fullscreen |
| `--record FILE.mp4` | Record the main display to MP4 |
| `--prompt "..."` | Override the generation prompt |
| `--conditioning F` | ControlNet conditioning scale (0.0–1.0) |
| `--guidance F` | Classifier-free guidance scale |
| `--steps N` | Number of inference steps |
| `--sound-reactive` | Microphone-driven speed / brightness modulation |
| `--ambient-audio` | Ambient audio synthesis tracking the diffusion state |
| `--exhibition` | Left-half layout for a 3440×1440 ultrawide |
| `--watchdog` | Auto-restart on crash |
| `--max-runtime H` | Graceful shutdown after H hours |
## Repository layout

```
.
├── dissolution_live.py   # Interactive exhibition piece
├── dissolution.py        # Offline diffusion-loop renderer
├── requirements.txt
├── LICENSE
└── README.md
```
At runtime the script creates two directories automatically:
- `mediapipe_models/` — auto-downloaded face/pose/hand task files (~42 MB)
- `dissolution_gallery/` — local archive of generated images and source snapshots
Both are gitignored.
## Offline renderer

`dissolution.py` is a standalone, non-interactive script that renders a seamless palindrome loop between two still images (source → noise → target → noise → source). It is the mathematical reference for the diffusion process that `dissolution_live.py` animates in real time.
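The palindrome structure can be sketched with the forward-diffusion mix and a simple linear alpha-bar ramp, using one fixed noise field so the two halves meet at the same pure-noise frame. A sketch only; the script's real schedules and frame counts differ:

```python
import numpy as np

def palindrome_frames(source, target, steps=30, seed=7):
    """source -> noise -> target -> noise -> source, as a closed loop."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(source.shape)   # one shared noise field
    down = np.linspace(1.0, 0.0, steps)       # alpha-bar: clean -> pure noise
    mix = lambda x0, a: np.sqrt(a) * x0 + np.sqrt(1 - a) * eps
    frames  = [mix(source, a) for a in down]        # source -> noise
    frames += [mix(target, a) for a in down[::-1]]  # noise -> target
    frames += [mix(target, a) for a in down]        # target -> noise
    frames += [mix(source, a) for a in down[::-1]]  # noise -> source
    return frames
```

Sharing `eps` across all four legs is what makes the loop seamless: each dissolve bottoms out at the identical noise image before climbing back out.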
```shell
python dissolution.py                                  # full render
python dissolution.py --preview                        # quick low-res preview
python dissolution.py --schedule quadratic --seed 99   # experiment with schedules
```

## Known quirks

- First launch is slow. Stable Diffusion, the ControlNet checkpoint, and the MediaPipe task files all download on the first run (~5 GB). Subsequent launches start in seconds.
- CUDA is effectively required. On CPU, generation takes minutes per image, not seconds — the piece does not work as intended. Use `--no-controlnet` for a UI-only test mode.
- `--exhibition` is hardcoded for a specific monitor. It targets the left half of a 3440×1440 ultrawide (a 1720×1440 window at x=0). On any other display, use `--resolution WxH` and `--fullscreen` instead.
- Webcam index is platform-dependent. If `--camera 0` fails, try `1` or `2`. On Linux/macOS, OpenCV sometimes needs `cv2.CAP_V4L2` or `cv2.CAP_AVFOUNDATION` — the script uses the default backend.
- `DISSOLUTION_LEGACY_ASSETS` env var. The code contains optional hooks for LoRA weights and a fabric-image fallback directory from an earlier iteration of the project. They are disabled by default; set the env var to an absolute path to re-enable them. Leave it unset for the anime pipeline.
- Image ID counter persists across runs. `dissolution_gallery/counter.txt` tracks the last-issued `DISS-XXXX` id so restarts don't collide with previous captures. Delete the file to reset.
- Generation is asynchronous. The display keeps running at 30 FPS even while the GPU is busy; a new anime image appears at the end of each cycle, not in lock-step with the dissolve animation.
- Watchdog preserves the pipeline across crashes. `--watchdog` re-launches the display loop but keeps the already-loaded ControlNet model in memory to avoid the ~30 s reload cost.
- No Apple Silicon path. The code targets CUDA. MPS may work with diffusers but has not been tested.
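The persistent ID counter behaves roughly like the sketch below. The file format and the zero-padding are assumptions based on the `DISS-XXXX` pattern, not the script's exact implementation:

```python
from pathlib import Path

def next_image_id(counter_file: Path) -> str:
    """Issue the next DISS-XXXX id, surviving restarts via a counter file."""
    counter_file.parent.mkdir(parents=True, exist_ok=True)
    n = int(counter_file.read_text()) if counter_file.exists() else 0
    n += 1
    counter_file.write_text(str(n))       # persist for the next run
    return f"DISS-{n:04d}"
```

Deleting the file resets the sequence, matching the behavior described above.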
## Privacy

During the exhibition, generated anime images and the corresponding source webcam captures were saved to `dissolution_gallery/` for archival purposes. That folder is not included in this repository because the source captures contain identifiable faces of visitors who did not consent to publication. Anyone running the piece locally will regenerate the folder as they use it.
## License

MIT.