Sirenum

Deterministic image-to-music generator.
Turn any picture into a reproducible, professional-sounding piece of music — sheet, MIDI, and lossless 24-bit FLAC.

Features · Quick start · Usage · How it works · Türkçe

Features

  • Deterministic by design. The same image always produces a bit-identical FLAC, MIDI, and MusicXML. Every random choice is seeded by the SHA-256 of the source pixels.
  • Two professional audio backends.
    • SoundFont (recommended): renders the generated MIDI through FluidSynth using a real instrument SoundFont (MuseScore General SF3, GeneralUser GS, FluidR3, …).
    • In-house Python synthesizer (fallback): instrument-aware waveforms (additive strings with detune, FM brass, additive woodwinds, Karplus-Strong-inspired harp/guitar, multi-oscillator piano), velocity-sensitive timbre, and a full master bus (air EQ → stereo chorus → multi-tap diffuse reverb → soft-knee compressor → tanh limiter).
  • Lossless 24-bit / 48 kHz FLAC output. FLAC applies no psychoacoustic (lossy) compression, so the decoded audio is bit-identical to the rendered PCM.
  • MusicXML 4.0 + Type-1 MIDI so you can open the score in MuseScore / Finale / Sibelius / Dorico, or re-render the MIDI in any DAW or sample library.
  • Live progress display. Step-by-step Rich console output with timings; a friendly banner is shown when the app is launched without an image.
  • Cross-platform installers. instal.sh (Linux / macOS / WSL) and install.ps1 (Windows) bootstrap Python, FluidSynth, and SoundFonts.
  • Deterministic humanization. Micro-timing and velocity jitter are seeded from the image hash, so every render still feels alive — but stays reproducible.
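
The humanization bullet can be pictured with a minimal sketch. It assumes an integer seed has already been derived from the image hash; the function name `humanize`, the note tuple layout, and the jitter ranges are hypothetical and are not taken from Sirenum's source.

```python
import random


def humanize(notes, seed, max_shift=0.02, max_vel_delta=6):
    """Return notes with small, reproducible timing and velocity offsets.

    notes: list of (start_beat, midi_pitch, velocity) tuples (hypothetical layout).
    seed:  integer derived from the image hash (assumed to be computed elsewhere).
    """
    rng = random.Random(seed)  # private generator, independent of global state
    result = []
    for start, pitch, velocity in notes:
        shift = rng.uniform(-max_shift, max_shift)           # micro-timing jitter in beats
        delta = rng.randint(-max_vel_delta, max_vel_delta)   # velocity jitter
        new_start = max(0.0, start + shift)                  # never move before beat 0
        new_velocity = min(127, max(1, velocity + delta))    # stay in MIDI range
        result.append((new_start, pitch, new_velocity))
    return result


notes = [(0.0, 69, 80), (1.0, 73, 76), (2.0, 76, 84)]
first = humanize(notes, seed=1234)
second = humanize(notes, seed=1234)  # same seed, same jitter
```

Because the generator is created from the seed on every call, two renders of the same image receive identical offsets, while a different image (and therefore a different seed) gets a different feel.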

Demo

$ python sirenum.py path/to/image.jpg --path renderings
  [OK] Image analysed: 840x430, 8 colours, brightness 0.31
  [OK] Tonality: A lydian, 96 BPM, 4/4
  [OK] Composition built: 5 parts, 64.0 beats
  [OK] MusicXML written: image.musicxml
  [OK] MIDI written: image.mid
  [OK] FluidSynth produced WAV, encoding to FLAC
  [OK] FluidSynth render (MuseScore_General.sf3) -> image.flac
   8/8 Done ──────────────── 100% 0:00:09

Up to three files are produced next to each other (per the CLI reference, the .mid file is written only when --export-midi is given):

| File | Use |
| --- | --- |
| image.musicxml | Open in MuseScore / Finale / Sibelius / Dorico to read the score. |
| image.mid | Type-1 MIDI; re-render with any DAW or sample library. |
| image.flac | Lossless 24-bit / 48 kHz hi-res audio, ready to publish or master. |
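
For scripting around a render, the output names can be predicted from the image name. This is a small sketch that assumes, as in the demo, that every output shares the image's stem and lands in the chosen output directory; `output_paths` is a hypothetical helper, not part of Sirenum.

```python
from pathlib import Path


def output_paths(image, out_dir):
    """Predict output file locations for a given image (hypothetical helper)."""
    stem = Path(image).stem          # "path/to/image.jpg" -> "image"
    out = Path(out_dir)
    return {
        "musicxml": out / f"{stem}.musicxml",
        "mid": out / f"{stem}.mid",  # only present if a MIDI file was written
        "flac": out / f"{stem}.flac",
    }


paths = output_paths("path/to/image.jpg", "renderings")
```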

Quick start

# 1. Clone
git clone https://github.com/cumakurt/sirenum.git
cd sirenum

# 2. Install (Linux / macOS)
chmod +x instal.sh
./instal.sh --install-system-deps --with-soundfont --yes
source .venv/bin/activate

# 3. Run
python sirenum.py path/to/image.jpg --path renderings

Windows equivalent:

.\install.ps1 -InstallFluidSynth -WithSoundfont
.\.venv\Scripts\Activate.ps1
python sirenum.py path\to\image.jpg --path renderings

Running python sirenum.py without arguments prints a friendly help banner explaining that an image file is required.

Installation

Linux / macOS / WSL / Git Bash

chmod +x instal.sh

# Full setup: install system packages, download SoundFont, run tests
./instal.sh --install-system-deps --with-soundfont --yes

# Python packages only (you manage system deps yourself)
./instal.sh

# Activate the venv afterwards
source .venv/bin/activate

instal.sh auto-detects one of these package managers on Linux: apt (Debian/Ubuntu), dnf (Fedora/RHEL), pacman (Arch), zypper (openSUSE). On macOS it uses Homebrew.
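
The detection logic can be pictured in Python, even though the installer itself is a shell script. This is only an illustration: the probing order and the function name `detect_package_manager` are assumptions, and the real script may behave differently.

```python
import shutil

# Package managers named in this README for Linux.
LINUX_MANAGERS = ("apt", "dnf", "pacman", "zypper")


def detect_package_manager(system="Linux", which=shutil.which):
    """Return the first known package manager found on PATH, or None."""
    if system == "Darwin":                 # macOS uses Homebrew
        return "brew" if which("brew") else None
    for name in LINUX_MANAGERS:            # assumed probing order
        if which(name):
            return name
    return None
```

Passing `which` as a parameter keeps the function easy to test without touching the real PATH.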

instal.sh flags

| Flag | Description |
| --- | --- |
| --prod | Runtime dependencies only. |
| --dev | Runtime + development packages (default). |
| --all | Installs the [dev,all] extras, i.e. every optional extra. |
| --no-tests | Skip the post-install pytest run. |
| --install-system-deps | Install ffmpeg and fluidsynth via the detected package manager (Linux requires sudo). |
| --with-soundfont | Download MuseScore General SF3 to assets/soundfonts/ (~38 MB). |
| --yes, -y | Accept all interactive prompts. |
| --python PATH | Use a specific Python interpreter. |
| -h, --help | Show help. |

Environment variables: VENV_DIR, PYTHON_BIN, SOUNDFONT_URL.

Windows (PowerShell)

# Full setup: download FluidSynth Windows binaries, download SoundFont, run tests
.\install.ps1 -InstallFluidSynth -WithSoundfont

# Python packages only
.\install.ps1

# Activate the venv afterwards
.\.venv\Scripts\Activate.ps1

install.ps1:

  • verifies Python 3.11+,
  • creates a .venv virtual environment,
  • installs Sirenum with all dependencies through pip,
  • with -InstallFluidSynth, downloads fluidsynth-vX.Y-win10-x64-cpp11.zip from FluidSynth Releases into %LOCALAPPDATA%\sirenum-tools\ and adds the bin\ folder to your user PATH,
  • with -WithSoundfont, downloads MuseScore General SF3 to assets\soundfonts\.

If PowerShell complains about execution policy:

Set-ExecutionPolicy -Scope Process -ExecutionPolicy Bypass
.\install.ps1 -InstallFluidSynth -WithSoundfont

install.ps1 parameters

| Parameter | Description |
| --- | --- |
| -Profile <prod\|dev\|all> | Pip install profile. Defaults to dev. |
| -InstallFluidSynth | Download FluidSynth Windows binaries and add them to PATH. |
| -WithSoundfont | Download MuseScore General SoundFont. |
| -NoTests | Skip the pytest step. |
| -PythonPath <path> | Use a specific Python interpreter. |
| -FluidSynthVersion <vX.Y.Z> | FluidSynth version to download (default v2.5.4). |

Per-distribution commands

Ubuntu / Debian

sudo apt update
sudo apt install -y python3 python3-venv python3-pip ffmpeg fluidsynth
git clone https://github.com/cumakurt/sirenum.git && cd sirenum
./instal.sh --with-soundfont

Fedora / RHEL

sudo dnf install -y python3 python3-pip ffmpeg fluidsynth
./instal.sh --with-soundfont

Arch Linux

sudo pacman -S python python-pip ffmpeg fluidsynth
./instal.sh --with-soundfont

openSUSE

sudo zypper install python3 python3-pip ffmpeg fluidsynth
./instal.sh --with-soundfont

macOS (Homebrew)

brew install python ffmpeg fluid-synth
./instal.sh --with-soundfont

Windows 10 / 11 (manual)
  1. Install Python 3.11+ (tick Add to PATH).

  2. FFmpeg: winget install Gyan.FFmpeg (optional — imageio-ffmpeg already provides a fallback).

  3. FluidSynth: download fluidsynth-vX.Y-win10-x64-cpp11.zip from FluidSynth Releases, extract anywhere, and add the bin\ folder to your PATH (System Properties → Environment Variables → Path).

  4. SoundFont: download MuseScore General and save it as assets\soundfonts\MuseScore_General.sf3.

  5. Python environment:

    python -m venv .venv
    .\.venv\Scripts\Activate.ps1
    pip install -e .[dev]

Dependencies

Required system tools

| Component | Version | Purpose | How to install |
| --- | --- | --- | --- |
| Python | 3.11+ | Runs the entire application | python.org, winget install Python.Python.3.13, apt install python3, brew install python |
| pip + venv | bundled | Package management & isolated env | Ships with Python |

Recommended (sample-based playback)

| Component | Version | Purpose | How to install |
| --- | --- | --- | --- |
| FluidSynth | 2.0+ | Renders MIDI through SoundFont samples to WAV | See Per-distribution commands above. |
| SoundFont (.sf2/.sf3) | | The actual instrument samples FluidSynth plays back | instal.sh --with-soundfont, install.ps1 -WithSoundfont, or drop a file into assets/soundfonts/. |

Optional

| Component | Version | Purpose | How to install |
| --- | --- | --- | --- |
| ffmpeg | 5+ | WAV → FLAC encoding (and other audio formats) | System package manager. imageio-ffmpeg ships a fallback binary so things still work without a system ffmpeg. |
| OpenCV | 4.9+ | Faster Sobel + Laplacian texture analysis | pip install -e .[image]; falls back to a NumPy implementation if missing. |

Required Python packages (installed by pip install -e .)

| Package | Version | Purpose |
| --- | --- | --- |
| Pillow | ≥ 10.0 | Image loading (40+ formats) |
| numpy | ≥ 1.26 | Pixel matrix math, synthesis |
| scipy | ≥ 1.11 | Fast biquad / one-pole filters (master bus EQ, compressor smoothing) |
| scikit-learn | ≥ 1.4 | KMeans colour quantisation |
| rich | ≥ 13 | Coloured terminal output and live progress |
| imageio-ffmpeg | ≥ 0.5.1 | Fallback ffmpeg binary when the system ffmpeg is absent |
| mido | ≥ 1.3 | MIDI file generation |

Optional / development packages

| Package | Profile | Purpose |
| --- | --- | --- |
| pytest | [dev] | Test runner |
| pytest-cov | [dev] | Coverage |
| ruff | [dev] | Linter |
| mypy | [dev] | Static type checker |
| imageio | [image] | Extra formats (EXR, HDR, AVIF, …) |
| opencv-python | [image] | Faster texture analysis |
| colorthief, colormath | [image] | Colour-space helpers |
| music21 | [music] | Advanced music theory integration |
| pyfluidsynth | [music] | Python bindings (alternative to subprocess) |
| pydub | [music] | Extra audio conversions |

Usage

From the project root:

python sirenum.py path/to/image.png
python sirenum.py path/to/image.png --dry-run
python sirenum.py path/to/image.png --dry-run --json
python sirenum.py path/to/image.png --path renderings
python sirenum.py path/to/image.png --instrument violin guitar
python sirenum.py --list-instruments
python sirenum.py path/to/image.png --soundfont path/to/orchestra.sf2
python sirenum.py path/to/image.png --no-soundfont        # force the Python synthesizer
python sirenum.py path/to/image.png --export-midi         # optional MIDI export
python sirenum.py path/to/image.png --no-progress         # CI-friendly output

Equivalent package invocations:

python -m sirenum path/to/image.png
sirenum path/to/image.png

Running with no arguments prints a banner that explains an image file is required, lists supported formats, and shows example commands.
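
The "optional image, banner when missing" behaviour maps naturally onto an argparse positional with nargs="?". The sketch below is an assumption about how such a CLI could be wired, not Sirenum's actual cli.py; the banner text, the return codes, and the function names are hypothetical.

```python
import argparse

# Hypothetical banner text; the real banner also lists formats and examples.
BANNER = "Sirenum needs an image file, e.g.: python sirenum.py path/to/image.jpg"


def build_parser():
    parser = argparse.ArgumentParser(prog="sirenum")
    # nargs="?" makes the positional optional, so a missing image is not an error.
    parser.add_argument("image", nargs="?", help="path to the source image")
    parser.add_argument("--path", "--output", dest="path", default="./sirenum_out")
    parser.add_argument("--dry-run", action="store_true")
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    if args.image is None:
        print(BANNER)      # friendly banner instead of an argparse usage error
        return 0
    print(f"would render {args.image} into {args.path}")
    return 0
```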

CLI reference

| Argument | Description |
| --- | --- |
| image | Path to the source image (optional; the banner is shown if missing). |
| --path PATH, --output PATH | Output directory (default: ./sirenum_out). |
| --instrument NAME [NAME ...] | Restrict generation to chosen instruments (repeatable). |
| --list-instruments | Print all supported instrument keys and exit. |
| --colors N | Maximum number of colour clusters to extract (default: 8). |
| --soundfont PATH | Use a specific .sf2 / .sf3 SoundFont. |
| --no-soundfont | Skip FluidSynth and force the in-house Python synthesizer. |
| --export-midi | Also write a Type-1 MIDI file (.mid) alongside FLAC and MusicXML. |
| --dry-run | Analyse only; don't write notation or audio files. |
| --json | Emit a machine-readable JSON report. |
| --no-progress | Disable the live progress display (useful in CI / log output). |
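
When driving Sirenum from another script, the flags above can be assembled programmatically. The sketch below only uses flags listed in the CLI reference; the helper names are hypothetical, and it assumes that the --json report is printed to standard output (its structure is not documented here, so the result is returned unparsed beyond json.loads).

```python
import json
import subprocess
import sys


def sirenum_command(image, out_dir=None, instruments=(), export_midi=False,
                    dry_run=False, as_json=False, progress=True):
    """Build the argv list for `python sirenum.py ...` (hypothetical helper)."""
    cmd = [sys.executable, "sirenum.py", str(image)]  # image first, before multi-value flags
    if out_dir is not None:
        cmd += ["--path", str(out_dir)]
    if instruments:
        cmd += ["--instrument", *instruments]
    if export_midi:
        cmd.append("--export-midi")
    if dry_run:
        cmd.append("--dry-run")
    if as_json:
        cmd.append("--json")
    if not progress:
        cmd.append("--no-progress")
    return cmd


def analyse(image):
    """Run a dry-run analysis; assumes the JSON report goes to stdout."""
    cmd = sirenum_command(image, dry_run=True, as_json=True, progress=False)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)
```

Run it from the project root with the virtual environment active, for example `analyse("path/to/image.png")`.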

Output files

Two files are always produced (flower.jpg → flower.musicxml, flower.flac); MIDI is optional and is written only with --export-midi:

| Extension | Contents |
| --- | --- |
| .musicxml | MuseScore / Finale / Sibelius / Dorico-compatible score (MusicXML 4.0). |
| .mid | Optional Type-1 MIDI; written only when --export-midi is provided. |
| .flac | Lossless 24-bit / 48 kHz hi-res audio. No psychoacoustic compression; original sample fidelity is preserved bit-for-bit. |
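
To give a feel for what a .musicxml file contains, here is a minimal score-partwise document with one part, one measure, and one whole note, built with the standard library. It is an illustrative sketch only: `minimal_score` is a hypothetical function and does not reflect how Sirenum's own writer is implemented.

```python
import xml.etree.ElementTree as ET


def minimal_score(step="A", octave=4, part_name="Lead"):
    """Build a one-note MusicXML 4.0 score (illustrative only)."""
    score = ET.Element("score-partwise", version="4.0")

    # The part list declares every part that appears later in the file.
    part_list = ET.SubElement(score, "part-list")
    score_part = ET.SubElement(part_list, "score-part", id="P1")
    ET.SubElement(score_part, "part-name").text = part_name

    part = ET.SubElement(score, "part", id="P1")
    measure = ET.SubElement(part, "measure", number="1")

    attributes = ET.SubElement(measure, "attributes")
    ET.SubElement(attributes, "divisions").text = "1"   # 1 division per quarter note
    time = ET.SubElement(attributes, "time")
    ET.SubElement(time, "beats").text = "4"
    ET.SubElement(time, "beat-type").text = "4"
    clef = ET.SubElement(attributes, "clef")
    ET.SubElement(clef, "sign").text = "G"
    ET.SubElement(clef, "line").text = "2"

    note = ET.SubElement(measure, "note")
    pitch = ET.SubElement(note, "pitch")
    ET.SubElement(pitch, "step").text = step
    ET.SubElement(pitch, "octave").text = str(octave)
    ET.SubElement(note, "duration").text = "4"          # 4 divisions = a whole note in 4/4
    ET.SubElement(note, "type").text = "whole"
    return ET.ElementTree(score)


xml_text = ET.tostring(minimal_score().getroot(), encoding="unicode")
```

Calling `minimal_score().write("demo.musicxml", encoding="utf-8", xml_declaration=True)` saves the score to a file that notation software can open.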

How it works

  ┌────────────┐   ┌───────────────────┐   ┌──────────────────┐   ┌──────────────────┐
  │  Image     │──▶│  Image analyzer   │──▶│  Music params    │──▶│  Composition     │
  │  (.jpg/...)│   │  (Pillow + KMeans │   │  (key, tempo,    │   │  (5 parts, beats)│
  │            │   │   + OpenCV/NumPy) │   │   instruments)   │   │                  │
  └────────────┘   └───────────────────┘   └──────────────────┘   └──────────────────┘
                                                                          │
              ┌───────────────────────────────────────────────────────────┤
              ▼                                                           ▼
        ┌──────────────┐                                          ┌────────────────┐
        │  MusicXML    │                                          │   MIDI (Type-1)│
        │  (notation)  │                                          │                │
        └──────────────┘                                          └───────┬────────┘
                                                                          │
                                                          ┌───────────────┴───────────────┐
                                                          ▼                               ▼
                                                ┌──────────────────┐           ┌────────────────────┐
                                                │ FluidSynth +     │           │ Python synthesizer │
                                                │ SoundFont        │           │ (fallback)         │
                                                └────────┬─────────┘           └─────────┬──────────┘
                                                         ▼                               ▼
                                                  ┌────────────────────────────────────────────┐
                                                  │  Master bus → 24-bit / 48 kHz FLAC         │
                                                  └────────────────────────────────────────────┘
  1. Image analysis. Pillow loads the image, scikit-learn's KMeans extracts a deterministic colour palette, and OpenCV (or a NumPy fallback) computes brightness, contrast, edge density, and sharpness.
  2. Music parameter derivation. Hue, saturation, lightness, and texture metrics are mapped to a key signature, mode, tempo, time signature, and instrument list. Piano and guitars always take priority; strings, woodwinds, brass, and harp are then added based on colour.
  3. Composition. Five parts (harmony, bass, lead, counter-melody, texture) are written into a deterministic Composition object.
  4. Notation & MIDI. MusicXML 4.0 and Type-1 MIDI files are written from the composition.
  5. Audio synthesis. Either FluidSynth + a SoundFont, or the in-house Python synthesizer (instrument-aware waveforms + master bus), produces a 24-bit WAV that is then encoded to FLAC by ffmpeg.
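
Step 2 above (music parameter derivation) can be illustrated with a toy mapping. The rules below are invented for the example: the hue-to-tonic table, the lightness threshold, and the tempo range are all assumptions, and Sirenum's real mapping is certainly richer (it also picks modes such as lydian, a time signature, and instruments).

```python
import colorsys

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def derive_params(rgb, edge_density):
    """Map a dominant colour and a texture metric to musical parameters.

    rgb:          (r, g, b) with components in 0..255.
    edge_density: texture metric in 0..1 (assumed to come from image analysis).
    """
    r, g, b = (c / 255.0 for c in rgb)
    hue, lightness, _saturation = colorsys.rgb_to_hls(r, g, b)

    tonic = PITCH_CLASSES[round(hue * 12) % 12]      # nearest of 12 hue sectors
    mode = "major" if lightness >= 0.5 else "minor"  # brighter colours map to major
    tempo = round(60 + 80 * edge_density)            # busier images are faster, 60..140 BPM
    return {"tonic": tonic, "mode": mode, "tempo": tempo}


params = derive_params((255, 0, 0), edge_density=0.5)
```

The important property is that the function is pure: the same colour and texture values always yield the same parameters, which is what keeps the whole pipeline reproducible.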

Every random decision (humanization jitter, voicing details) is seeded by the SHA-256 of the image bytes, so the same picture always produces the same outputs.
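
A minimal sketch of hash-based seeding, assuming the raw file bytes are hashed (hashing the decoded pixels instead would work the same way). The function names are hypothetical, and using the first 8 bytes of the digest as the seed is a choice made for this example only.

```python
import hashlib
import random


def seed_from_bytes(data: bytes) -> int:
    """Derive a 64-bit integer seed from the SHA-256 digest of the input."""
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest[:8], "big")  # first 8 bytes are plenty for a seed


def rng_for_image(path):
    """Return a random generator whose sequence depends only on the file contents."""
    with open(path, "rb") as fh:
        return random.Random(seed_from_bytes(fh.read()))


seed = seed_from_bytes(b"abc")
```

Any component that needs randomness then draws from this generator (or from one derived from the same seed) rather than from the global random state, so nothing outside the image can influence the result.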

SoundFonts

Sirenum looks for a SoundFont in this order:

  1. The --soundfont PATH CLI argument
  2. The SIRENUM_SOUNDFONT environment variable
  3. The first .sf2 or .sf3 file (alphabetical) in assets/soundfonts/

If none is found, or if fluidsynth is not on PATH, the in-house Python synthesizer takes over automatically.
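
The lookup order and the fallback rule can be sketched as follows. This mirrors the description above but is not Sirenum's code: the function names are hypothetical, the sketch does not check that an explicitly given path exists, and "alphabetical" is implemented as a plain sort of the paths.

```python
import os
import shutil
from pathlib import Path


def resolve_soundfont(cli_path=None, env=None, assets_dir="assets/soundfonts"):
    """Return the SoundFont to use, or None if no candidate is found."""
    env = os.environ if env is None else env
    if cli_path:                                   # 1. --soundfont PATH
        return Path(cli_path)
    if env.get("SIRENUM_SOUNDFONT"):               # 2. environment variable
        return Path(env["SIRENUM_SOUNDFONT"])
    folder = Path(assets_dir)                      # 3. first .sf2/.sf3 in the assets folder
    if folder.is_dir():
        candidates = sorted(
            p for p in folder.iterdir() if p.suffix.lower() in {".sf2", ".sf3"}
        )
        if candidates:
            return candidates[0]
    return None


def choose_backend(soundfont, which=shutil.which):
    """Pick "fluidsynth" only when both a SoundFont and the executable exist."""
    if soundfont is not None and which("fluidsynth"):
        return "fluidsynth"
    return "python"                                # in-house synthesizer fallback
```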

Recommended free SoundFonts

| Name | Size | Licence | Download |
| --- | --- | --- | --- |
| MuseScore General SF3 | 38 MB | MIT | osuosl.org |
| GeneralUser GS | 30 MB | "Free for any use" | schristiancollins.com |
| FluidR3 GM | 142 MB | MIT | archive.org |

Project layout

sirenum/
├── sirenum.py                 # Convenience launcher (python sirenum.py …)
├── sirenum/                   # Application package
│   ├── analyzer/              # Image loading + KMeans + texture analysis
│   ├── music/                 # Parameter derivation, instrument mapping
│   ├── renderer/              # Composition, MusicXML, MIDI, audio backends
│   ├── utils/                 # Hashing, exceptions, helpers
│   ├── cli.py                 # Argparse + Rich progress + banner
│   └── __main__.py            # `python -m sirenum` entrypoint
├── assets/soundfonts/         # Drop SoundFonts here (gitignored)
├── tests/                     # Unit + integration tests
├── instal.sh                  # Linux / macOS / WSL installer
├── install.ps1                # Windows installer
├── pyproject.toml             # Packaging + ruff + pytest config
└── requirements*.txt          # Pip lockfiles

Troubleshooting

| Issue | Fix |
| --- | --- |
| Could not encode FLAC: ffmpeg not found | Make sure imageio-ffmpeg is installed (pip install imageio-ffmpeg), or install a system ffmpeg. |
| FluidSynth executable not found | Only relevant for the SoundFont backend. The Python synthesizer fallback runs automatically. Install FluidSynth via your platform instructions if you want SoundFont output. |
| python sirenum.py: command not found | The virtual environment is not active. Run source .venv/bin/activate (Linux/macOS) or .\.venv\Scripts\Activate.ps1 (Windows). |
| Windows: running scripts is disabled on this system | Set-ExecutionPolicy -Scope Process -ExecutionPolicy Bypass |
| The same image produces different output | This should never happen; please open an issue. Likely a Python / NumPy / SciPy version mismatch. |
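
Before reporting a determinism problem, it helps to confirm which files actually differ. The sketch below compares two output directories by SHA-256; it assumes you rendered the same image twice into separate folders with --path, and the helper names are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def compare_renders(dir_a, dir_b):
    """Return the names of files that are missing on one side or differ in content."""
    a, b = Path(dir_a), Path(dir_b)
    names = sorted({p.name for p in a.iterdir()} | {p.name for p in b.iterdir()})
    differing = []
    for name in names:
        fa, fb = a / name, b / name
        if not (fa.is_file() and fb.is_file()) or file_digest(fa) != file_digest(fb):
            differing.append(name)
    return differing
```

An empty list from `compare_renders("run1", "run2")` means the two renders are byte-identical; otherwise, include the returned names and your Python / NumPy / SciPy versions in the issue.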

Developer commands

# Linter
ruff check sirenum.py sirenum tests

# Static type checker
mypy sirenum

# Tests
pytest -q

# Specific test
pytest -q tests/integration/test_render_outputs.py -k soundfont

Roadmap

  • Optional pyfluidsynth backend (skip the subprocess hop).
  • Stem export (separate FLACs per part).
  • Web UI for batch processing.
  • Ableton Live / FL Studio project export.
  • More instrument families (mallet percussion, choir).

Contributing

Contributions are very welcome.

  1. Fork the repository and create a feature branch (git checkout -b feature/my-change).

  2. Run the linter and tests before pushing:

    ruff check sirenum.py sirenum tests
    pytest -q
  3. Open a pull request describing your change. If your change affects audio output, please attach a short FLAC sample and the source image.

By contributing, you agree that your contributions will be licensed under the project's GNU AGPL v3.0 licence.

License

Sirenum is released under the GNU AGPL v3.0 License. Third-party SoundFonts and audio dependencies retain their own licences — see their respective project pages.
