
Dissolution


Exhibited at IT Studio Academis, Berlin (March 2026) as part of the Pixels2GenAI project.

Acknowledgments

This installation applies the learnings from two modules I authored for the Pixels2GenAI thesis project.

The wider curriculum is available at Pixels2GenAI.


How it works

The piece alternates two cycles:

  • Cycle A — self → anime. A webcam snapshot is progressively noised toward pure Gaussian noise, then the reverse diffusion is driven by Stable Diffusion + ControlNet (conditioned on a hybrid edge map: Canny + MediaPipe face mesh + pose skeleton + hand landmarks) into an anime illustration of the visitor.
  • Cycle B — anime → self. The anime image dissolves back into noise and crystallizes into the live webcam view, closing the loop.

Forward diffusion uses the DDPM formula

x_t = sqrt(ᾱ_t) · x_0 + sqrt(1 - ᾱ_t) · ε

with selectable linear, cosine, or quadratic noise schedules.
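The forward process and the three schedules can be sketched in a few lines of NumPy. This is a minimal illustration, not the installation's actual code; the function names `make_schedule` and `noise_image` are assumptions:

```python
import numpy as np

def make_schedule(T, kind="linear", beta_min=1e-4, beta_max=0.02):
    """Return ᾱ_t (cumulative product of 1 - β_t) for t = 1..T."""
    t = np.linspace(0, 1, T)
    if kind == "linear":
        betas = beta_min + t * (beta_max - beta_min)
    elif kind == "quadratic":
        betas = (np.sqrt(beta_min) + t * (np.sqrt(beta_max) - np.sqrt(beta_min))) ** 2
    elif kind == "cosine":
        # Cosine schedule defined directly on ᾱ_t (Nichol & Dhariwal style).
        s = 0.008
        f = np.cos((np.linspace(0, 1, T + 1) + s) / (1 + s) * np.pi / 2) ** 2
        return np.clip(f[1:] / f[0], 1e-5, 1.0)
    else:
        raise ValueError(f"unknown schedule: {kind}")
    return np.cumprod(1.0 - betas)

def noise_image(x0, alpha_bar_t, rng=None):
    """Forward diffusion: x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 - ᾱ_t)·ε, ε ~ N(0, I)."""
    rng = rng or np.random.default_rng()
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps
```

In all three cases ᾱ_t falls monotonically from near 1 (almost clean) to near 0 (almost pure noise), which is what makes the dissolve read as a continuous transition.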

Generation runs on a background thread (~3 s per image on a modern GPU) while the display loop stays at 30 FPS on the main thread. A side panel shows the live webcam, detection overlays, the combined-edges ControlNet input, the noise schedule curve, and the current state, so viewers can see the math driving the output.
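That producer/consumer split can be sketched with a worker thread and a queue. This is illustrative only; the real script's structure may differ, and the `generate` stub stands in for the Stable Diffusion + ControlNet call:

```python
import queue
import threading
import time

def generate(snapshot):
    # Stand-in for the Stable Diffusion + ControlNet call (~3 s on a GPU).
    time.sleep(0.05)
    return f"anime({snapshot})"

results: "queue.Queue[str]" = queue.Queue()

def worker(snapshot):
    results.put(generate(snapshot))

# Kick off generation without blocking the display loop.
threading.Thread(target=worker, args=("frame-0001",), daemon=True).start()

latest = None
for _ in range(30):                      # display loop, ~30 FPS
    try:
        latest = results.get_nowait()    # swap in the new image when ready
    except queue.Empty:
        pass                             # keep drawing the previous frame
    time.sleep(1 / 30)                   # frame draw would happen here
```

The display loop never blocks on the GPU: it polls the queue once per frame and simply keeps showing the previous image until a new one arrives.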

Requirements

  • Python 3.11 or newer
  • CUDA-capable GPU (for real-time generation); CPU-only machines can run --no-controlnet test mode
  • A webcam
  • ~5 GB free disk for Stable Diffusion + ControlNet weights (auto-downloaded from Hugging Face on first run)

Installation

```bash
git clone https://github.com/<your-user>/dissolution.git
cd dissolution
pip install -r requirements.txt
```

MediaPipe task files and the Stable Diffusion / ControlNet checkpoints are fetched automatically on first launch. No manual downloads required.

Usage

```bash
# Full interactive mode
python dissolution_live.py

# Test mode (no GPU, fallback images)
python dissolution_live.py --no-controlnet

# Pick a different webcam
python dissolution_live.py --camera 1

# Record the main display
python dissolution_live.py --record out.mp4
```

Run python dissolution_live.py --help for the full list.

Controls

| Key | Action |
| --- | --- |
| `SPACE` | Force-trigger dissolve during MIRROR / IDLE |
| `1` / `2` / `3` | Switch edge-detection preview (Canny / face mesh / pose) |
| `V` | Cycle through prompt presets |
| `0` | Toggle sound reactivity |
| `A` | Toggle ambient audio |
| `F` | Toggle fullscreen |
| `P` | Toggle process panel |
| `Q` / `ESC` | Quit |

Flags

| Flag | Purpose |
| --- | --- |
| `--no-controlnet` | Disable GPU generation; use fallback images |
| `--no-webcam` | Use bundled images instead of the webcam |
| `--camera N` | Webcam device index |
| `--resolution WxH` | Override display resolution |
| `--fullscreen` | Start fullscreen |
| `--record FILE.mp4` | Record the main display to MP4 |
| `--prompt "..."` | Override the generation prompt |
| `--conditioning F` | ControlNet conditioning scale (0.0–1.0) |
| `--guidance F` | Classifier-free guidance scale |
| `--steps N` | Inference steps |
| `--sound-reactive` | Microphone-driven speed / brightness modulation |
| `--ambient-audio` | Ambient audio synthesis tracking diffusion state |
| `--exhibition` | Left-half layout for a 3440×1440 ultrawide |
| `--watchdog` | Auto-restart on crash |
| `--max-runtime H` | Graceful shutdown after H hours |

Repository layout

```text
.
├── dissolution_live.py   # Interactive exhibition piece
├── dissolution.py        # Offline diffusion-loop renderer
├── requirements.txt
├── LICENSE
└── README.md
```

At runtime the script creates two directories automatically:

  • mediapipe_models/ — auto-downloaded face/pose/hand task files (~42 MB)
  • dissolution_gallery/ — local archive of generated images and source snapshots

Both are gitignored.

Offline renderer

dissolution.py is a standalone, non-interactive script that renders a seamless palindrome loop between two still images (source → noise → target → noise → source). It is the mathematical reference for the diffusion process that dissolution_live.py animates in real time.

```bash
python dissolution.py                                  # full render
python dissolution.py --preview                        # quick low-res preview
python dissolution.py --schedule quadratic --seed 99   # experiment with schedules
```
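The palindrome structure boils down to a symmetric noise-level curve over the loop. Here is one way to parametrise it (a sketch; the actual renderer's parametrisation may differ):

```python
import numpy as np

def palindrome_noise_levels(n_frames):
    """Noise level per frame for source → noise → target → noise → source.

    0.0 = clean image, 1.0 = pure Gaussian noise.  The curve rises to 1,
    falls back to 0 at the midpoint (target fully crystallized), rises to 1
    again, and returns to 0, so the last frame matches the first and the
    loop is seamless.
    """
    t = np.linspace(0, 2 * np.pi, n_frames)
    return 0.5 * (1 - np.cos(2 * t))  # two full noise peaks over the loop
```

Feeding each frame's noise level into the forward-diffusion formula, with the midpoint frames rendered from the target image instead of the source, produces the source → noise → target → noise → source cycle.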

Known quirks

  • First launch is slow. Stable Diffusion, the ControlNet checkpoint, and the MediaPipe task files all download on the first run (~5 GB). Subsequent launches start in seconds.
  • CUDA is effectively required. On CPU, generation takes minutes per image, not seconds — the piece does not work as intended. Use --no-controlnet for a UI-only test mode.
  • --exhibition is hardcoded for a specific monitor. It targets the left half of a 3440×1440 ultrawide (1720×1440 window at x=0). On any other display, use --resolution WxH and --fullscreen instead.
  • Webcam index is platform-dependent. If --camera 0 fails, try 1 or 2. On Linux/macOS, OpenCV sometimes needs cv2.CAP_V4L2 or cv2.CAP_AVFOUNDATION — the script uses the default backend.
  • DISSOLUTION_LEGACY_ASSETS env var. The code contains optional hooks for LoRA weights and a fabric-image fallback directory from an earlier iteration of the project. They are disabled by default; set the env var to an absolute path to re-enable. Leave unset for the anime pipeline.
  • Image ID counter persists across runs. dissolution_gallery/counter.txt tracks the last-issued DISS-XXXX id so restarts don't collide with previous captures. Delete the file to reset.
  • Generation is asynchronous. The display keeps running at 30 FPS even while the GPU is busy; a new anime image appears at the end of each cycle, not in lock-step with the dissolve animation.
  • Watchdog preserves the pipeline across crashes. --watchdog re-launches the display loop but keeps the already-loaded ControlNet model in memory to avoid the ~30 s reload cost.
  • No Apple Silicon path. The code targets CUDA. MPS may work with diffusers but has not been tested.
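The persistent image-ID counter described above amounts to a few lines of file I/O. This is a sketch of the mechanism; the script's actual helper names are unknown:

```python
from pathlib import Path

COUNTER_FILE = Path("dissolution_gallery") / "counter.txt"

def next_image_id() -> str:
    """Return the next DISS-XXXX id, persisting the counter across runs."""
    COUNTER_FILE.parent.mkdir(parents=True, exist_ok=True)
    last = int(COUNTER_FILE.read_text()) if COUNTER_FILE.exists() else 0
    current = last + 1
    COUNTER_FILE.write_text(str(current))
    return f"DISS-{current:04d}"
```

Because the file is rewritten on every issue, a crash or restart picks up exactly where the previous run left off; deleting `counter.txt` restarts the sequence at `DISS-0001`.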

Privacy

During the exhibition, generated anime images and the corresponding source webcam captures were saved to dissolution_gallery/ for archival purposes. That folder is not included in this repository, because the source captures contain identifiable faces of visitors who did not consent to publication. Anyone running the piece locally will regenerate the folder as they use it.

License

MIT.
