Sirenum

Deterministic image-to-music generator.
Turn any picture into a reproducible, professional-sounding piece of music — sheet, MIDI, and lossless 24-bit FLAC.

Features · Quick start · Usage · How it works · Türkçe

Features

Deterministic by design. The same image always produces a bit-identical FLAC, MIDI, and MusicXML. Every random choice is seeded by the SHA-256 of the source pixels.
Two professional audio backends.
- SoundFont (recommended): renders the generated MIDI through FluidSynth using a real instrument SoundFont (MuseScore General SF3, GeneralUser GS, FluidR3, …).
- In-house Python synthesizer (fallback): instrument-aware waveforms (additive strings with detune, FM brass, additive woodwinds, Karplus-Strong-inspired harp/guitar, multi-oscillator piano), velocity-sensitive timbre, and a full master bus (air EQ → stereo chorus → multi-tap diffuse reverb → soft-knee compressor → tanh limiter).
Lossless 24-bit / 48 kHz FLAC output. No psychoacoustic compression — bit-perfect mastering-grade audio.
MusicXML 4.0 + Type-1 MIDI so you can open the score in MuseScore / Finale / Sibelius / Dorico, or re-render the MIDI in any DAW or sample library.
Live progress display. Step-by-step Rich console output with timings; a friendly banner is shown when the app is launched without an image.
Cross-platform installers. instal.sh (Linux / macOS / WSL) and install.ps1 (Windows) bootstrap Python, FluidSynth, and SoundFonts.
Deterministic humanization. Micro-timing and velocity jitter are seeded from the image hash, so every render still feels alive — but stays reproducible.

Demo

$ python sirenum.py path/to/image.jpg --path renderings
  [OK] Image analysed: 840x430, 8 colours, brightness 0.31
  [OK] Tonality: A lydian, 96 BPM, 4/4
  [OK] Composition built: 5 parts, 64.0 beats
  [OK] MusicXML written: image.musicxml
  [OK] MIDI written: image.mid
  [OK] FluidSynth produced WAV, encoding to FLAC
  [OK] FluidSynth render (MuseScore_General.sf3) -> image.flac
   8/8 Done ──────────────── 100% 0:00:09

Up to three files are produced next to each other (per the CLI reference, `image.mid` is kept only when `--export-midi` is passed):

File	Use
`image.musicxml`	Open in MuseScore / Finale / Sibelius / Dorico to read the score.
`image.mid`	Type-1 MIDI — re-render with any DAW or sample library.
`image.flac`	Lossless 24-bit / 48 kHz hi-res audio, ready to publish or master.

Quick start

# 1. Clone
git clone https://github.com/cumakurt/sirenum.git
cd sirenum

# 2. Install (Linux / macOS)
chmod +x instal.sh
./instal.sh --install-system-deps --with-soundfont --yes
source .venv/bin/activate

# 3. Run
python sirenum.py path/to/image.jpg --path renderings

Windows equivalent:

.\install.ps1 -InstallFluidSynth -WithSoundfont
.\.venv\Scripts\Activate.ps1
python sirenum.py path\to\image.jpg --path renderings

Running python sirenum.py without arguments prints a friendly help banner explaining that an image file is required.

Installation

Linux / macOS / WSL / Git Bash

chmod +x instal.sh

# Full setup: install system packages, download SoundFont, run tests
./instal.sh --install-system-deps --with-soundfont --yes

# Python packages only (you manage system deps yourself)
./instal.sh

# Activate the venv afterwards
source .venv/bin/activate

instal.sh auto-detects one of these package managers on Linux: apt (Debian/Ubuntu), dnf (Fedora/RHEL), pacman (Arch), zypper (openSUSE). On macOS it uses Homebrew.

`instal.sh` flags

Flag	Description
`--prod`	Runtime dependencies only.
`--dev`	Runtime + development packages (default).
`--all`	`[dev,all]` — every optional extra.
`--no-tests`	Skip the post-install `pytest` run.
`--install-system-deps`	Install `ffmpeg` and `fluidsynth` via the detected package manager (Linux requires `sudo`).
`--with-soundfont`	Download MuseScore General SF3 to `assets/soundfonts/` (~38 MB).
`--yes`, `-y`	Accept all interactive prompts.
`--python PATH`	Use a specific Python interpreter.
`-h`, `--help`	Show help.

Environment variables: VENV_DIR, PYTHON_BIN, SOUNDFONT_URL.

Windows (PowerShell)

# Full setup: download FluidSynth Windows binaries, download SoundFont, run tests
.\install.ps1 -InstallFluidSynth -WithSoundfont

# Python packages only
.\install.ps1

# Activate the venv afterwards
.\.venv\Scripts\Activate.ps1

install.ps1:

- verifies Python 3.11+,
- creates a .venv virtual environment,
- installs Sirenum with all dependencies through pip,
- with -InstallFluidSynth, downloads fluidsynth-vX.Y-win10-x64-cpp11.zip from FluidSynth Releases into %LOCALAPPDATA%\sirenum-tools\ and adds the bin\ folder to your user PATH,
- with -WithSoundfont, downloads MuseScore General SF3 to assets\soundfonts\.

If PowerShell complains about execution policy:

Set-ExecutionPolicy -Scope Process -ExecutionPolicy Bypass
.\install.ps1 -InstallFluidSynth -WithSoundfont

`install.ps1` parameters

Parameter	Description
`-Profile <prod|dev|all>`	Pip install profile. Defaults to `dev`.
`-InstallFluidSynth`	Download FluidSynth Windows binaries and add them to `PATH`.
`-WithSoundfont`	Download MuseScore General SoundFont.
`-NoTests`	Skip the `pytest` step.
`-PythonPath <path>`	Use a specific Python interpreter.
`-FluidSynthVersion <vX.Y.Z>`	FluidSynth version to download (default `v2.5.4`).

Per-distribution commands

Ubuntu / Debian

sudo apt update
sudo apt install -y python3 python3-venv python3-pip ffmpeg fluidsynth
git clone https://github.com/cumakurt/sirenum.git && cd sirenum
./instal.sh --with-soundfont

Fedora / RHEL

sudo dnf install -y python3 python3-pip ffmpeg fluidsynth
./instal.sh --with-soundfont

Arch Linux

sudo pacman -S python python-pip ffmpeg fluidsynth
./instal.sh --with-soundfont

openSUSE

sudo zypper install python3 python3-pip ffmpeg fluidsynth
./instal.sh --with-soundfont

macOS (Homebrew)

brew install python ffmpeg fluid-synth
./instal.sh --with-soundfont

Windows 10 / 11 (manual)

Install Python 3.11+ (tick Add to PATH).
FFmpeg: winget install Gyan.FFmpeg (optional — imageio-ffmpeg already provides a fallback).
FluidSynth: download fluidsynth-vX.Y-win10-x64-cpp11.zip from FluidSynth Releases, extract anywhere, and add the bin\ folder to your PATH (System Properties → Environment Variables → Path).
SoundFont: download MuseScore General and save it as assets\soundfonts\MuseScore_General.sf3.

Python environment:

python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -e .[dev]

Dependencies

Required system tools

Component	Version	Purpose	How to install
Python	3.11+	Runs the entire application	python.org, `winget install Python.Python.3.13`, `apt install python3`, `brew install python`
pip + venv	bundled	Package management & isolated env	Ships with Python

Recommended (sample-based playback)

Component	Version	Purpose	How to install
FluidSynth	2.0+	Renders MIDI through SoundFont samples to WAV	See Per-distribution commands above.
SoundFont (.sf2/.sf3)	—	The actual instrument samples FluidSynth plays back	`instal.sh --with-soundfont`, `install.ps1 -WithSoundfont`, or drop a file into `assets/soundfonts/`.

Optional

Component	Version	Purpose	How to install
ffmpeg	5+	WAV → FLAC encoding (and other audio formats)	System package manager. `imageio-ffmpeg` ships a fallback binary so things still work without a system ffmpeg.
OpenCV	4.9+	Faster Sobel + Laplacian texture analysis	`pip install -e .[image]` — falls back to a NumPy implementation if missing.

Required Python packages (installed by `pip install -e .`)

Package	Version	Purpose
`Pillow`	≥ 10.0	Image loading (40+ formats)
`numpy`	≥ 1.26	Pixel matrix math, synthesis
`scipy`	≥ 1.11	Fast biquad / one-pole filters (master bus EQ, compressor smoothing)
`scikit-learn`	≥ 1.4	KMeans colour quantisation
`rich`	≥ 13	Coloured terminal output and live progress
`imageio-ffmpeg`	≥ 0.5.1	Fallback ffmpeg binary when the system ffmpeg is absent
`mido`	≥ 1.3	MIDI file generation

Optional / development packages

Package	Profile	Purpose
`pytest`	`[dev]`	Test runner
`pytest-cov`	`[dev]`	Coverage
`ruff`	`[dev]`	Linter
`mypy`	`[dev]`	Static type checker
`imageio`	`[image]`	Extra formats (EXR, HDR, AVIF, …)
`opencv-python`	`[image]`	Faster texture analysis
`colorthief`, `colormath`	`[image]`	Colour-space helpers
`music21`	`[music]`	Advanced music theory integration
`pyfluidsynth`	`[music]`	Python bindings (alternative to subprocess)
`pydub`	`[music]`	Extra audio conversions

Usage

From the project root:

python sirenum.py path/to/image.png
python sirenum.py path/to/image.png --dry-run
python sirenum.py path/to/image.png --dry-run --json
python sirenum.py path/to/image.png --path renderings
python sirenum.py path/to/image.png --instrument violin guitar
python sirenum.py --list-instruments
python sirenum.py path/to/image.png --soundfont path/to/orchestra.sf2
python sirenum.py path/to/image.png --no-soundfont        # force the Python synthesizer
python sirenum.py path/to/image.png --export-midi         # optional MIDI export
python sirenum.py path/to/image.png --no-progress         # CI-friendly output

Equivalent package invocations:

python -m sirenum path/to/image.png
sirenum path/to/image.png

Running with no arguments prints a banner that explains an image file is required, lists supported formats, and shows example commands.

CLI reference

Argument	Description
`image`	Path to the source image (optional — banner is shown if missing).
`--path PATH`, `--output PATH`	Output directory (default: `./sirenum_out`).
`--instrument NAME [NAME ...]`	Restrict generation to chosen instruments (repeatable).
`--list-instruments`	Print all supported instrument keys and exit.
`--colors N`	Maximum number of colour clusters to extract (default: 8).
`--soundfont PATH`	Use a specific `.sf2` / `.sf3` SoundFont.
`--no-soundfont`	Skip FluidSynth and force the in-house Python synthesizer.
`--export-midi`	Also write a Type-1 MIDI file (`.mid`) alongside FLAC and MusicXML.
`--dry-run`	Analyse only — don't write notation or audio files.
`--json`	Emit a machine-readable JSON report.
`--no-progress`	Disable the live progress display (useful in CI / log output).
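
The `--dry-run --json` combination shown under Usage lends itself to scripting. The sketch below is one way to call it from Python; the helper names `dry_run_command` and `analyse` are hypothetical, and it assumes the JSON report is the only thing written to stdout. The fields of the report are not documented here, so the code does not rely on any of them.

```python
import json
import subprocess
import sys


def dry_run_command(image: str) -> list[str]:
    """Build the documented dry-run invocation with JSON output."""
    return [sys.executable, "sirenum.py", image, "--dry-run", "--json"]


def analyse(image: str) -> dict:
    """Run Sirenum in dry-run mode and parse stdout as JSON.

    Assumption: stdout contains only the JSON report.
    Raises subprocess.CalledProcessError if the command fails.
    """
    result = subprocess.run(
        dry_run_command(image), capture_output=True, text=True, check=True
    )
    return json.loads(result.stdout)
```

Run it from the project root with the virtual environment active, for example `report = analyse("path/to/image.png")`.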

Output files

Two files are always produced (flower.jpg → flower.musicxml, flower.flac); a MIDI file is added only when --export-midi is given:

Extension	Contents
`.musicxml`	MuseScore / Finale / Sibelius / Dorico-compatible score (MusicXML 4.0).
`.mid`	Optional Type-1 MIDI; written only when `--export-midi` is provided.
`.flac`	Lossless 24-bit / 48 kHz hi-res audio. No psychoacoustic compression — original sample fidelity is preserved bit-for-bit.
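
The naming rule above (flower.jpg → flower.musicxml, flower.flac) can be sketched with `pathlib`. The helper `output_paths` is hypothetical and only illustrates the documented behaviour; the default folder `sirenum_out` is taken from the CLI reference.

```python
from pathlib import Path


def output_paths(image: str, out_dir: str = "sirenum_out", export_midi: bool = False) -> list[Path]:
    """Return the files a render would write for `image` (hypothetical helper)."""
    stem = Path(image).stem  # "photos/flower.jpg" -> "flower"
    extensions = [".musicxml", ".flac"]
    if export_midi:
        extensions.insert(1, ".mid")  # MIDI only when --export-midi is given
    return [Path(out_dir) / f"{stem}{ext}" for ext in extensions]


for path in output_paths("photos/flower.jpg", "renderings", export_midi=True):
    print(path.as_posix())
```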

How it works

  ┌────────────┐   ┌────────────────────┐   ┌──────────────────┐   ┌──────────────────┐
  │  Image     │──▶│  Image analyzer    │──▶│  Music params    │──▶│  Composition     │
  │  (.jpg/...)│   │  (Pillow + KMeans  │   │  (key, tempo,    │   │  (5 parts, beats)│
  │            │   │   + OpenCV/NumPy)  │   │   instruments)   │   │                  │
  └────────────┘   └────────────────────┘   └──────────────────┘   └──────────────────┘
                                                                          │
              ┌───────────────────────────────────────────────────────────┤
              ▼                                                           ▼
        ┌──────────────┐                                          ┌────────────────┐
        │  MusicXML    │                                          │   MIDI (Type-1)│
        │  (notation)  │                                          │                │
        └──────────────┘                                          └───────┬────────┘
                                                                          │
                                                          ┌───────────────┴───────────────┐
                                                          ▼                               ▼
                                                ┌──────────────────┐           ┌────────────────────┐
                                                │ FluidSynth +     │           │ Python synthesizer │
                                                │ SoundFont        │           │ (fallback)         │
                                                └────────┬─────────┘           └─────────┬──────────┘
                                                         ▼                               ▼
                                                  ┌────────────────────────────────────────────┐
                                                  │  Master bus → 24-bit / 48 kHz FLAC         │
                                                  └────────────────────────────────────────────┘

Image analysis. Pillow loads the image, scikit-learn's KMeans extracts a deterministic colour palette, and OpenCV (or a NumPy fallback) computes brightness, contrast, edge density, and sharpness.
Music parameter derivation. Hue, saturation, lightness, and texture metrics are mapped to a key signature, mode, tempo, time signature, and instrument list. Piano and guitars always take priority; strings, woodwinds, brass, and harp are then added based on colour.
Composition. Five parts (harmony, bass, lead, counter-melody, texture) are written into a deterministic Composition object.
Notation & MIDI. MusicXML 4.0 and Type-1 MIDI files are written from the composition.
Audio synthesis. Either FluidSynth + a SoundFont, or the in-house Python synthesizer (instrument-aware waveforms + master bus), produces a 24-bit WAV that is then encoded to FLAC by ffmpeg.
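
To make the parameter-derivation step more concrete, here is a deliberately simple sketch of how a colour and a brightness value could be turned into a tonic and a tempo. The mapping rules, the function names, and the 60 to 140 BPM range are assumptions made for illustration; the exact rules Sirenum uses are not described in this README.

```python
import colorsys

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def tonic_from_rgb(r: int, g: int, b: int) -> str:
    """Map a colour's hue onto one of 12 pitch classes (illustrative rule only)."""
    hue, _lightness, _saturation = colorsys.rgb_to_hls(r / 255, g / 255, b / 255)
    # hue is in [0, 1); rounding to the nearest twelfth picks a pitch class
    return PITCH_CLASSES[round(hue * 12) % 12]


def tempo_from_brightness(brightness: float, low: int = 60, high: int = 140) -> int:
    """Scale brightness in [0, 1] linearly onto an assumed BPM range."""
    clamped = min(max(brightness, 0.0), 1.0)
    return round(low + clamped * (high - low))


print(tonic_from_rgb(255, 0, 0))   # pure red has hue 0, so this prints "C"
print(tempo_from_brightness(0.5))  # midpoint of the assumed range: 100
```

Because both functions are pure, the same input colour always yields the same musical parameters, which is the property the pipeline depends on.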

Every random decision (humanization jitter, voicing details) is seeded by the SHA-256 of the image bytes, so the same picture always produces the same outputs.
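
The seeding idea can be sketched with the standard library alone. This is a minimal illustration, not Sirenum's own code: the helper names, the use of `random.Random`, and the jitter spread of 6 are assumptions.

```python
import hashlib
import random


def rng_from_image_bytes(data: bytes) -> random.Random:
    """Build a PRNG whose seed is the SHA-256 digest of the input bytes."""
    digest = hashlib.sha256(data).digest()
    return random.Random(int.from_bytes(digest, "big"))


def humanize(velocities: list[int], rng: random.Random, spread: int = 6) -> list[int]:
    """Add bounded jitter to MIDI velocities, clamped to the valid 1..127 range."""
    return [min(127, max(1, v + rng.randint(-spread, spread))) for v in velocities]


image_bytes = b"stand-in for the bytes of an image file"
first = humanize([80, 80, 80, 80], rng_from_image_bytes(image_bytes))
second = humanize([80, 80, 80, 80], rng_from_image_bytes(image_bytes))
print(first == second)  # True: same bytes, same seed, same jitter
```

A fresh generator is created per render rather than seeding the global `random` module, so unrelated code cannot disturb the sequence.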

SoundFonts

Sirenum looks for a SoundFont in this order:

1. The --soundfont PATH CLI argument
2. The SIRENUM_SOUNDFONT environment variable
3. The first .sf2 or .sf3 file (alphabetical) in assets/soundfonts/

If none is found, or if fluidsynth is not on PATH, the in-house Python synthesizer takes over automatically.

Recommended free SoundFonts

Name	Size	Licence	Download
MuseScore General SF3	38 MB	MIT	osuosl.org
GeneralUser GS	30 MB	"Free for any use"	schristiancollins.com
FluidR3 GM	142 MB	MIT	archive.org

Project layout

sirenum/
├── sirenum.py                 # Convenience launcher (python sirenum.py …)
├── sirenum/                   # Application package
│   ├── analyzer/              # Image loading + KMeans + texture analysis
│   ├── music/                 # Parameter derivation, instrument mapping
│   ├── renderer/              # Composition, MusicXML, MIDI, audio backends
│   ├── utils/                 # Hashing, exceptions, helpers
│   ├── cli.py                 # Argparse + Rich progress + banner
│   └── __main__.py            # `python -m sirenum` entrypoint
├── assets/soundfonts/         # Drop SoundFonts here (gitignored)
├── tests/                     # Unit + integration tests
├── instal.sh                  # Linux / macOS / WSL installer
├── install.ps1                # Windows installer
├── pyproject.toml             # Packaging + ruff + pytest config
└── requirements*.txt          # Pip lockfiles

Troubleshooting

Issue	Fix
`Could not encode FLAC: ffmpeg not found`	Make sure `imageio-ffmpeg` is installed (`pip install imageio-ffmpeg`), or install a system `ffmpeg`.
`FluidSynth executable not found`	Only relevant for the SoundFont backend. The Python synthesizer fallback runs automatically. Install FluidSynth via your platform instructions if you want SoundFont output.
`python sirenum.py: command not found`	The virtual environment is not active. Run `source .venv/bin/activate` (Linux/macOS) or `.\.venv\Scripts\Activate.ps1` (Windows).
Windows: running scripts is disabled on this system	`Set-ExecutionPolicy -Scope Process -ExecutionPolicy Bypass`
The same image produces different output	This should never happen — please open an issue. Likely a Python / NumPy / SciPy version mismatch.
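
Before opening an issue about non-deterministic output, it helps to confirm which files actually differ. The sketch below compares two output folders by SHA-256; the helper names and the folder names `run_a` and `run_b` are placeholders chosen for this example.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    sha = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            sha.update(chunk)
    return sha.hexdigest()


def differing_outputs(run_a: Path, run_b: Path) -> list[str]:
    """List file names that differ, or are missing, between two output folders."""
    names = {p.name for p in run_a.iterdir()} | {p.name for p in run_b.iterdir()}
    return sorted(
        name
        for name in names
        if not (run_a / name).is_file()
        or not (run_b / name).is_file()
        or file_digest(run_a / name) != file_digest(run_b / name)
    )
```

Render the same image twice, once with `--path run_a` and once with `--path run_b`, then call `differing_outputs(Path("run_a"), Path("run_b"))`. An empty list means the two renders are byte-identical.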

Developer commands

# Linter
ruff check sirenum.py sirenum tests

# Static type checker
mypy sirenum

# Tests
pytest -q

# Specific test
pytest -q tests/integration/test_render_outputs.py -k soundfont

Roadmap

Optional pyfluidsynth backend (skip the subprocess hop).
Stem export (separate FLACs per part).
Web UI for batch processing.
Ableton Live / FL Studio project export.
More instrument families (mallet percussion, choir).

Contributing

Contributions are very welcome.

Fork the repository and create a feature branch (git checkout -b feature/my-change).

Run the linter and tests before pushing:

ruff check sirenum.py sirenum tests
pytest -q

Open a pull request describing your change. If your change affects audio output, please attach a short FLAC sample and the source image.

By contributing, you agree that your contributions will be licensed under the project's GNU AGPL v3.0 licence.

Developer

Cuma KURT
Email: cumakurt@gmail.com
LinkedIn: cuma-kurt-34414917
GitHub: cumakurt/sirenum

License

Sirenum is released under the GNU AGPL v3.0 License. Third-party SoundFonts and audio dependencies retain their own licences — see their respective project pages.
