Deterministic image-to-music generator.
Turn any picture into a reproducible, professional-sounding piece of music — sheet, MIDI, and lossless 24-bit FLAC.
- Features
- Demo
- Quick start
- Installation
- Dependencies
- Usage
- How it works
- SoundFonts
- Project layout
- Troubleshooting
- Developer commands
- Roadmap
- Contributing
- Developer
- License
- Deterministic by design. The same image always produces a bit-identical FLAC, MIDI, and MusicXML. Every random choice is seeded by the SHA-256 of the source pixels.
- Two professional audio backends.
- SoundFont (recommended): renders the generated MIDI through FluidSynth using a real instrument SoundFont (MuseScore General SF3, GeneralUser GS, FluidR3, …).
- In-house Python synthesizer (fallback): instrument-aware waveforms (additive strings with detune, FM brass, additive woodwinds, Karplus-Strong-inspired harp/guitar, multi-oscillator piano), velocity-sensitive timbre, and a full master bus (air EQ → stereo chorus → multi-tap diffuse reverb → soft-knee compressor → tanh limiter).
- Lossless 24-bit / 48 kHz FLAC output. No psychoacoustic compression — bit-perfect mastering-grade audio.
- MusicXML 4.0 + Type-1 MIDI so you can open the score in MuseScore / Finale / Sibelius / Dorico, or re-render the MIDI in any DAW or sample library.
- Live progress display. Step-by-step Rich console output with timings; a friendly banner is shown when the app is launched without an image.
- Cross-platform installers. `instal.sh` (Linux / macOS / WSL) and `install.ps1` (Windows) bootstrap Python, FluidSynth, and SoundFonts.
- Deterministic humanization. Micro-timing and velocity jitter are seeded from the image hash, so every render still feels alive but stays reproducible.
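The master bus described in the feature list ends in a tanh limiter. As a rough illustration of that idea only (this is not Sirenum's actual code), the sketch below soft-limits a list of float samples. The function name `tanh_limit` and the `ceiling` and `drive` parameters are assumptions made for this example.

```python
import math


def tanh_limit(samples, ceiling=0.98, drive=1.0):
    """Soft-limit float samples so no output exceeds +/- ceiling (hypothetical helper)."""
    if ceiling <= 0:
        raise ValueError("ceiling must be positive")
    # tanh is nearly linear around zero and flattens toward +/-1 for large
    # inputs, so quiet material passes almost unchanged while peaks are rounded off.
    return [ceiling * math.tanh(drive * s / ceiling) for s in samples]


quiet = tanh_limit([0.0, 0.1, -0.1])  # stays close to the input
loud = tanh_limit([2.0, -5.0])        # squeezed to just under the ceiling
print(quiet, loud)
```

Unlike a hard clip, this curve has no corner at the threshold, which is the usual reason to pick tanh for a final safety stage.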
$ python sirenum.py path/to/image.jpg --path renderings
[OK] Image analysed: 840x430, 8 colours, brightness 0.31
[OK] Tonality: A lydian, 96 BPM, 4/4
[OK] Composition built: 5 parts, 64.0 beats
[OK] MusicXML written: image.musicxml
[OK] MIDI written: image.mid
[OK] FluidSynth produced WAV, encoding to FLAC
[OK] FluidSynth render (MuseScore_General.sf3) -> image.flac
8/8 Done ──────────────── 100% 0:00:09

Three files are produced next to each other:
| File | Use |
|---|---|
| `image.musicxml` | Open in MuseScore / Finale / Sibelius / Dorico to read the score. |
| `image.mid` | Type-1 MIDI: re-render with any DAW or sample library. |
| `image.flac` | Lossless 24-bit / 48 kHz hi-res audio, ready to publish or master. |
# 1. Clone
git clone https://github.com/cumakurt/sirenum.git
cd sirenum
# 2. Install (Linux / macOS)
chmod +x instal.sh
./instal.sh --install-system-deps --with-soundfont --yes
source .venv/bin/activate
# 3. Run
python sirenum.py path/to/image.jpg --path renderings

Windows equivalent:
.\install.ps1 -InstallFluidSynth -WithSoundfont
.\.venv\Scripts\Activate.ps1
python sirenum.py path\to\image.jpg --path renderings

Running `python sirenum.py` without arguments prints a friendly help banner explaining that an image file is required.
chmod +x instal.sh
# Full setup: install system packages, download SoundFont, run tests
./instal.sh --install-system-deps --with-soundfont --yes
# Python packages only (you manage system deps yourself)
./instal.sh
# Activate the venv afterwards
source .venv/bin/activate

`instal.sh` auto-detects one of these package managers on Linux: apt (Debian/Ubuntu), dnf (Fedora/RHEL), pacman (Arch), zypper (openSUSE). On macOS it uses Homebrew.
| Flag | Description |
|---|---|
| `--prod` | Runtime dependencies only. |
| `--dev` | Runtime + development packages (default). |
| `--all` | `[dev,all]`: every optional extra. |
| `--no-tests` | Skip the post-install pytest run. |
| `--install-system-deps` | Install ffmpeg and fluidsynth via the detected package manager (Linux requires sudo). |
| `--with-soundfont` | Download MuseScore General SF3 to `assets/soundfonts/` (~38 MB). |
| `--yes`, `-y` | Accept all interactive prompts. |
| `--python PATH` | Use a specific Python interpreter. |
| `-h`, `--help` | Show help. |
Environment variables: VENV_DIR, PYTHON_BIN, SOUNDFONT_URL.
# Full setup: download FluidSynth Windows binaries, download SoundFont, run tests
.\install.ps1 -InstallFluidSynth -WithSoundfont
# Python packages only
.\install.ps1
# Activate the venv afterwards
.\.venv\Scripts\Activate.ps1

`install.ps1`:
- verifies Python 3.11+,
- creates a `.venv` virtual environment,
- installs Sirenum with all dependencies through pip,
- with `-InstallFluidSynth`, downloads `fluidsynth-vX.Y-win10-x64-cpp11.zip` from FluidSynth Releases into `%LOCALAPPDATA%\sirenum-tools\` and adds the `bin\` folder to your user `PATH`,
- with `-WithSoundfont`, downloads MuseScore General SF3 to `assets\soundfonts\`.
If PowerShell complains about execution policy:
Set-ExecutionPolicy -Scope Process -ExecutionPolicy Bypass
.\install.ps1 -InstallFluidSynth -WithSoundfont

| Parameter | Description |
|---|---|
| `-Profile <prod\|dev\|all>` | Pip install profile. Defaults to `dev`. |
| `-InstallFluidSynth` | Download FluidSynth Windows binaries and add them to `PATH`. |
| `-WithSoundfont` | Download MuseScore General SoundFont. |
| `-NoTests` | Skip the pytest step. |
| `-PythonPath <path>` | Use a specific Python interpreter. |
| `-FluidSynthVersion <vX.Y.Z>` | FluidSynth version to download (default `v2.5.4`). |
Ubuntu / Debian
sudo apt update
sudo apt install -y python3 python3-venv python3-pip ffmpeg fluidsynth
git clone https://github.com/your-user/sirenum.git && cd sirenum
./instal.sh --with-soundfont

Fedora / RHEL
sudo dnf install -y python3 python3-pip ffmpeg fluidsynth
./instal.sh --with-soundfont

Arch Linux
sudo pacman -S python python-pip ffmpeg fluidsynth
./instal.sh --with-soundfont

openSUSE
sudo zypper install python3 python3-pip ffmpeg fluidsynth
./instal.sh --with-soundfont

macOS (Homebrew)
brew install python ffmpeg fluid-synth
./instal.sh --with-soundfont

Windows 10 / 11 (manual)
- Install Python 3.11+ (tick Add to PATH).
- FFmpeg: `winget install Gyan.FFmpeg` (optional; `imageio-ffmpeg` already provides a fallback).
- FluidSynth: download `fluidsynth-vX.Y-win10-x64-cpp11.zip` from FluidSynth Releases, extract anywhere, and add the `bin\` folder to your `PATH` (System Properties → Environment Variables → Path).
- SoundFont: download MuseScore General and save it as `assets\soundfonts\MuseScore_General.sf3`.
- Python environment:

      python -m venv .venv
      .\.venv\Scripts\Activate.ps1
      pip install -e .[dev]

| Component | Version | Purpose | How to install |
|---|---|---|---|
| Python | 3.11+ | Runs the entire application | python.org, winget install Python.Python.3.13, apt install python3, brew install python |
| pip + venv | bundled | Package management & isolated env | Ships with Python |
| Component | Version | Purpose | How to install |
|---|---|---|---|
| FluidSynth | 2.0+ | Renders MIDI through SoundFont samples to WAV | See Per-distribution commands above. |
| SoundFont (.sf2/.sf3) | — | The actual instrument samples FluidSynth plays back | instal.sh --with-soundfont, install.ps1 -WithSoundfont, or drop a file into assets/soundfonts/. |
| Component | Version | Purpose | How to install |
|---|---|---|---|
| ffmpeg | 5+ | WAV → FLAC encoding (and other audio formats) | System package manager. imageio-ffmpeg ships a fallback binary so things still work without a system ffmpeg. |
| OpenCV | 4.9+ | Faster Sobel + Laplacian texture analysis | pip install -e .[image] — falls back to a NumPy implementation if missing. |
| Package | Version | Purpose |
|---|---|---|
| `Pillow` | ≥ 10.0 | Image loading (40+ formats) |
| `numpy` | ≥ 1.26 | Pixel matrix math, synthesis |
| `scipy` | ≥ 1.11 | Fast biquad / one-pole filters (master bus EQ, compressor smoothing) |
| `scikit-learn` | ≥ 1.4 | KMeans colour quantisation |
| `rich` | ≥ 13 | Coloured terminal output and live progress |
| `imageio-ffmpeg` | ≥ 0.5.1 | Fallback ffmpeg binary when the system ffmpeg is absent |
| `mido` | ≥ 1.3 | MIDI file generation |
| Package | Profile | Purpose |
|---|---|---|
| `pytest` | `[dev]` | Test runner |
| `pytest-cov` | `[dev]` | Coverage |
| `ruff` | `[dev]` | Linter |
| `mypy` | `[dev]` | Static type checker |
| `imageio` | `[image]` | Extra formats (EXR, HDR, AVIF, …) |
| `opencv-python` | `[image]` | Faster texture analysis |
| `colorthief`, `colormath` | `[image]` | Colour-space helpers |
| `music21` | `[music]` | Advanced music theory integration |
| `pyfluidsynth` | `[music]` | Python bindings (alternative to subprocess) |
| `pydub` | `[music]` | Extra audio conversions |
From the project root:
python sirenum.py path/to/image.png
python sirenum.py path/to/image.png --dry-run
python sirenum.py path/to/image.png --dry-run --json
python sirenum.py path/to/image.png --path renderings
python sirenum.py path/to/image.png --instrument violin guitar
python sirenum.py --list-instruments
python sirenum.py path/to/image.png --soundfont path/to/orchestra.sf2
python sirenum.py path/to/image.png --no-soundfont # force the Python synthesizer
python sirenum.py path/to/image.png --export-midi # optional MIDI export
python sirenum.py path/to/image.png --no-progress # CI-friendly output

Equivalent package invocations:
python -m sirenum path/to/image.png
sirenum path/to/image.png

Running with no arguments prints a banner that explains an image file is required, lists supported formats, and shows example commands.
| Argument | Description |
|---|---|
| `image` | Path to the source image (optional; the banner is shown if it is missing). |
| `--path PATH`, `--output PATH` | Output directory (default: `./sirenum_out`). |
| `--instrument NAME [NAME ...]` | Restrict generation to chosen instruments (repeatable). |
| `--list-instruments` | Print all supported instrument keys and exit. |
| `--colors N` | Maximum number of colour clusters to extract (default: 8). |
| `--soundfont PATH` | Use a specific `.sf2` / `.sf3` SoundFont. |
| `--no-soundfont` | Skip FluidSynth and force the in-house Python synthesizer. |
| `--export-midi` | Also write a Type-1 MIDI file (`.mid`) alongside FLAC and MusicXML. |
| `--dry-run` | Analyse only; don't write notation or audio files. |
| `--json` | Emit a machine-readable JSON report. |
| `--no-progress` | Disable the live progress display (useful in CI / log output). |
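When wrapping the tool in scripts, it can help to see the documented flags written out as an argparse definition. The following is a minimal sketch that mirrors the table above. It is not copied from `sirenum/cli.py`; the function name `build_parser` and the `dest` names are assumptions.

```python
import argparse


def build_parser():
    """Hypothetical parser reproducing the documented command-line surface."""
    p = argparse.ArgumentParser(prog="sirenum")
    # The image is optional so that a bare invocation can show the banner.
    p.add_argument("image", nargs="?", help="path to the source image")
    p.add_argument("--path", "--output", dest="output", default="./sirenum_out")
    # action="extend" lets the flag be repeated and also take several names at once.
    p.add_argument("--instrument", nargs="+", action="extend", metavar="NAME")
    p.add_argument("--list-instruments", action="store_true")
    p.add_argument("--colors", type=int, default=8, metavar="N")
    p.add_argument("--soundfont", metavar="PATH")
    p.add_argument("--no-soundfont", action="store_true")
    p.add_argument("--export-midi", action="store_true")
    p.add_argument("--dry-run", action="store_true")
    p.add_argument("--json", action="store_true")
    p.add_argument("--no-progress", action="store_true")
    return p


args = build_parser().parse_args(
    ["image.png", "--instrument", "violin", "guitar", "--dry-run", "--json"]
)
print(args.image, args.instrument, args.output, args.colors)
```

Whether the real CLI rejects combinations such as `--soundfont` together with `--no-soundfont` is not stated in this README, so the sketch leaves that unvalidated.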
Two files are always produced (flower.jpg → flower.musicxml, flower.flac), and MIDI is optional with --export-midi:
| Extension | Contents |
|---|---|
| `.musicxml` | MuseScore / Finale / Sibelius / Dorico-compatible score (MusicXML 4.0). |
| `.mid` | Optional Type-1 MIDI; written only when `--export-midi` is provided. |
| `.flac` | Lossless 24-bit / 48 kHz hi-res audio. No psychoacoustic compression: original sample fidelity is preserved bit-for-bit. |
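The naming rule (`flower.jpg` → `flower.musicxml`, `flower.flac`, plus `flower.mid` on request) can be sketched with `pathlib`. This is only an illustration of the convention described above; the helper name `output_paths` and its parameters are hypothetical.

```python
from pathlib import Path


def output_paths(image, out_dir="./sirenum_out", export_midi=False):
    """Predict the output files for an image (illustrative helper, not part of Sirenum)."""
    stem = Path(image).stem  # "photos/flower.jpg" -> "flower"
    out = Path(out_dir)
    paths = {
        "musicxml": out / f"{stem}.musicxml",
        "flac": out / f"{stem}.flac",
    }
    if export_midi:
        # MIDI is only written when --export-midi is given.
        paths["midi"] = out / f"{stem}.mid"
    return paths


for kind, path in output_paths("photos/flower.jpg", "renderings", export_midi=True).items():
    print(kind, path)
```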
┌────────────┐   ┌────────────────────┐   ┌──────────────────┐   ┌──────────────────┐
│   Image    │──▶│   Image analyzer   │──▶│   Music params   │──▶│   Composition    │
│ (.jpg/...) │   │ (Pillow + KMeans + │   │  (key, tempo,    │   │ (5 parts, beats) │
│            │   │  OpenCV/NumPy)     │   │   instruments)   │   │                  │
└────────────┘   └────────────────────┘   └──────────────────┘   └────────┬─────────┘
                                                                          │
                 ┌────────────────────────────────────────────────────────┤
                 ▼                                                        ▼
          ┌──────────────┐                                        ┌────────────────┐
          │   MusicXML   │                                        │ MIDI (Type-1)  │
          │  (notation)  │                                        │                │
          └──────────────┘                                        └───────┬────────┘
                                                                          │
                                                              ┌───────────┴───────────┐
                                                              ▼                       ▼
                                                    ┌──────────────────┐   ┌────────────────────┐
                                                    │  FluidSynth +    │   │ Python synthesizer │
                                                    │  SoundFont       │   │     (fallback)     │
                                                    └─────────┬────────┘   └──────────┬─────────┘
                                                              ▼                       ▼
                                                    ┌───────────────────────────────────────────┐
                                                    │     Master bus → 24-bit / 48 kHz FLAC     │
                                                    └───────────────────────────────────────────┘
- Image analysis. Pillow loads the image, scikit-learn's KMeans extracts a deterministic colour palette, and OpenCV (or a NumPy fallback) computes brightness, contrast, edge density, and sharpness.
- Music parameter derivation. Hue, saturation, lightness, and texture metrics are mapped to a key signature, mode, tempo, time signature, and instrument list. Piano and guitars are always prioritized first; strings, woodwinds, brass, and harp are then added based on colour.
- Composition. Five parts (harmony, bass, lead, counter-melody, texture) are written into a deterministic `Composition` object.
- Notation & MIDI. MusicXML 4.0 and Type-1 MIDI files are written from the composition.
- Audio synthesis. Either FluidSynth + a SoundFont, or the in-house Python synthesizer (instrument-aware waveforms + master bus), produces a 24-bit WAV that is then encoded to FLAC by `ffmpeg`.
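For the SoundFont path, the final step amounts to two external commands: FluidSynth renders the MIDI to WAV, then ffmpeg encodes the WAV to FLAC. The sketch below only builds those command lines. The flags Sirenum actually passes are not documented here, so the option choices and the helper names `fluidsynth_cmd` and `ffmpeg_flac_cmd` are assumptions based on the tools' standard command-line options.

```python
def fluidsynth_cmd(soundfont, midi, wav, sample_rate=48000):
    """Build a FluidSynth fast-render command (assumed flags, not Sirenum's own)."""
    return [
        "fluidsynth",
        "-ni",                    # no MIDI input driver, no interactive shell
        "-F", str(wav),           # render to this file instead of the sound card
        "-O", "s24",              # 24-bit samples (depends on libsndfile support)
        "-T", "wav",              # container type
        "-r", str(sample_rate),   # output sample rate in Hz
        str(soundfont),
        str(midi),
    ]


def ffmpeg_flac_cmd(wav, flac, ffmpeg="ffmpeg"):
    """Build an ffmpeg command that encodes a WAV file to FLAC."""
    # The FLAC encoder accepts s16 or s32 input; s32 keeps the 24-bit payload intact.
    return [ffmpeg, "-y", "-i", str(wav), "-c:a", "flac", "-sample_fmt", "s32", str(flac)]


render = fluidsynth_cmd("assets/soundfonts/MuseScore_General.sf3", "image.mid", "image.wav")
encode = ffmpeg_flac_cmd("image.wav", "image.flac")
print(" ".join(render))
print(" ".join(encode))
```

Each list can be handed to `subprocess.run(cmd, check=True)` once both tools are installed; passing a list rather than a shell string avoids quoting problems with paths that contain spaces.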
Every random decision (humanization jitter, voicing details) is seeded by the SHA-256 of the image bytes, so the same picture always produces the same outputs.
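A minimal sketch of this seeding idea, using only the standard library: hash the image bytes, seed a private random generator from the digest, and draw all jitter from that generator. The function names `rng_from_bytes` and `humanize`, the note representation, and the jitter ranges are assumptions for illustration, not Sirenum's internals.

```python
import hashlib
import random


def rng_from_bytes(data: bytes) -> random.Random:
    """Return a private RNG seeded from the SHA-256 digest of the input bytes."""
    digest = hashlib.sha256(data).digest()
    return random.Random(int.from_bytes(digest, "big"))


def humanize(notes, rng, max_shift=0.02, max_vel=6):
    """Apply small timing and velocity jitter to (start_beat, velocity) pairs."""
    out = []
    for start, velocity in notes:
        shift = rng.uniform(-max_shift, max_shift)
        vel = velocity + rng.randint(-max_vel, max_vel)
        # Keep start times non-negative and velocities in the MIDI range 1..127.
        out.append((max(0.0, start + shift), max(1, min(127, vel))))
    return out


notes = [(0.0, 80), (1.0, 72), (2.0, 90)]
first = humanize(notes, rng_from_bytes(b"example image bytes"))
second = humanize(notes, rng_from_bytes(b"example image bytes"))
print(first == second)
```

Using a dedicated `random.Random` instance instead of the module-level functions keeps the sequence independent of any other code that touches the global generator.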
Sirenum looks for a SoundFont in this order:
- The `--soundfont PATH` CLI argument
- The `SIRENUM_SOUNDFONT` environment variable
- The first `.sf2` or `.sf3` file (alphabetical) in `assets/soundfonts/`
If none is found, or if fluidsynth is not on PATH, the in-house Python synthesizer takes over automatically.
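The lookup order and the fallback rule can be sketched as follows. This is an illustration of the documented behaviour rather than the project's code: the names `find_soundfont` and `choose_backend` are hypothetical, and the case-insensitive sort is an assumption about what "alphabetical" means here.

```python
import os
import shutil
from pathlib import Path


def find_soundfont(cli_path=None, env=None, assets_dir="assets/soundfonts"):
    """Resolve a SoundFont: CLI argument, then environment variable, then assets folder."""
    env = os.environ if env is None else env
    if cli_path:
        return Path(cli_path)
    if env.get("SIRENUM_SOUNDFONT"):
        return Path(env["SIRENUM_SOUNDFONT"])
    folder = Path(assets_dir)
    if folder.is_dir():
        fonts = sorted(
            (p for p in folder.iterdir() if p.suffix.lower() in {".sf2", ".sf3"}),
            key=lambda p: p.name.lower(),  # assumed: case-insensitive ordering
        )
        if fonts:
            return fonts[0]
    return None


def choose_backend(soundfont):
    """Use FluidSynth only when a SoundFont exists and the executable is on PATH."""
    if soundfont is not None and shutil.which("fluidsynth"):
        return "fluidsynth"
    return "python-synth"


font = find_soundfont()
print(font, choose_backend(font))
```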
| Name | Size | Licence | Download |
|---|---|---|---|
| MuseScore General SF3 | 38 MB | MIT | osuosl.org |
| GeneralUser GS | 30 MB | "Free for any use" | schristiancollins.com |
| FluidR3 GM | 142 MB | MIT | archive.org |
sirenum/
├── sirenum.py # Convenience launcher (python sirenum.py …)
├── sirenum/ # Application package
│ ├── analyzer/ # Image loading + KMeans + texture analysis
│ ├── music/ # Parameter derivation, instrument mapping
│ ├── renderer/ # Composition, MusicXML, MIDI, audio backends
│ ├── utils/ # Hashing, exceptions, helpers
│ ├── cli.py # Argparse + Rich progress + banner
│ └── __main__.py # `python -m sirenum` entrypoint
├── assets/soundfonts/ # Drop SoundFonts here (gitignored)
├── tests/ # Unit + integration tests
├── instal.sh # Linux / macOS / WSL installer
├── install.ps1 # Windows installer
├── pyproject.toml # Packaging + ruff + pytest config
└── requirements*.txt # Pip lockfiles
| Issue | Fix |
|---|---|
| `Could not encode FLAC: ffmpeg not found` | Make sure `imageio-ffmpeg` is installed (`pip install imageio-ffmpeg`), or install a system ffmpeg. |
| `FluidSynth executable not found` | Only relevant for the SoundFont backend. The Python synthesizer fallback runs automatically. Install FluidSynth via your platform instructions if you want SoundFont output. |
| `python sirenum.py: command not found` | The virtual environment is not active. Run `source .venv/bin/activate` (Linux/macOS) or `.\.venv\Scripts\Activate.ps1` (Windows). |
| Windows: running scripts is disabled on this system | Set-ExecutionPolicy -Scope Process -ExecutionPolicy Bypass |
| The same image produces different output | This should never happen — please open an issue. Likely a Python / NumPy / SciPy version mismatch. |
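Before reporting a determinism problem, it may help to confirm it by rendering the same image into two directories (for example with `--path run_a` and `--path run_b`) and comparing the files byte for byte. The script below is a small sketch for that check; the helper names and the directory names are assumptions.

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in chunks so large FLAC files are not loaded into memory at once."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def differing_outputs(dir_a, dir_b, suffixes=(".flac", ".musicxml", ".mid")):
    """Return names of output files that differ or exist in only one directory."""
    a = {p.name: p for p in Path(dir_a).iterdir() if p.suffix in suffixes}
    b = {p.name: p for p in Path(dir_b).iterdir() if p.suffix in suffixes}
    diff = []
    for name in sorted(a.keys() | b.keys()):
        if name not in a or name not in b or file_sha256(a[name]) != file_sha256(b[name]):
            diff.append(name)
    return diff


if __name__ == "__main__" and Path("run_a").is_dir() and Path("run_b").is_dir():
    print(differing_outputs("run_a", "run_b") or "all outputs identical")
```

An empty result means the two runs match; any listed file name is worth attaching to the issue together with the Python, NumPy, and SciPy versions in use.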
# Linter
ruff check sirenum.py sirenum tests
# Static type checker
mypy sirenum
# Tests
pytest -q
# Specific test
pytest -q tests/integration/test_render_outputs.py -k soundfont

- Optional `pyfluidsynth` backend (skip the subprocess hop).
- Stem export (separate FLACs per part).
- Web UI for batch processing.
- Ableton Live / FL Studio project export.
- More instrument families (mallet percussion, choir).
Contributions are very welcome.
- Fork the repository and create a feature branch (`git checkout -b feature/my-change`).
- Run the linter and tests before pushing:

      ruff check sirenum.py sirenum tests
      pytest -q

- Open a pull request describing your change. If your change affects audio output, please attach a short FLAC sample and the source image.

By contributing, you agree that your contributions will be licensed under the project's GNU AGPL v3.0 licence.
- Cuma KURT
- Email: cumakurt@gmail.com
- LinkedIn: cuma-kurt-34414917
- GitHub: cumakurt/sirenum
Sirenum is released under the GNU AGPL v3.0 License. Third-party SoundFonts and audio dependencies retain their own licences — see their respective project pages.