Sonic Phoenix

A reproducible, resumable, language-agnostic pipeline for turning a chaotic folder of audio files into a clean Language/Artist/Album/Artist - Title.ext library with correct ID3 tags, embedded cover art, and synchronised lyrics — then optionally pushing the result into Spotify playlists and exposing it to AI agents for on-demand playlist curation.

The pipeline is broken into seven phases: six script phases numbered 01–06, plus a Phase 7 AI skill. Each script phase is a series of small, independent scripts that write their state to disk so you can stop, inspect, edit, and resume at any point without redoing work.

How it works at a glance

Raw audio files (any state of disorganisation)
  │
  ▼
Phase 1 — Acoustic fingerprint every file via Shazam (20-way concurrent, resumable)
  │
  ▼
Phase 2 — SHA-256 deduplication + language classification via langdetect + hint files
  │
  ▼
Phase 3 — Fuzzy artist consolidation + structural enforcement into Language/Artist/Album/
  │
  ▼
Phase 4 — Metadata enrichment: iTunes tags, LrcLib synchronised lyrics, HD cover art
  │
  ▼
Phase 5 — Catalog finalisation: merge all data sources into a single read-only JSON
  │
  ▼
Phase 6 — (Optional) Spotify sync: mirror playlists + auto-generate genre "Essentials"
  │
  ▼
Phase 7 — (Optional) AI skill: on-demand playlist curation via any OpenClaw-compatible agent

What this does

Given a pile of audio files scattered across an arbitrary folder tree — with broken, missing, or misleading metadata — the pipeline will:

Identify every track by acoustic fingerprint via ShazamIO, cross-validating against existing tags rather than blindly trusting them.
Deduplicate bit-for-bit via SHA-256 (never deletes — stages duplicates for your review).
Classify each track's language using langdetect plus optional per-language hint files you control.
Organise everything into <SORTED_ROOT>/<Language>/<Artist>/<Album>/<Artist> - <Title>.<ext>.
Enrich the library by fetching canonical metadata from the iTunes Search API, synchronised lyrics from LrcLib, and embedding 1000x1000 cover art.
Sync the finalised local library up to Spotify as either a full mirror playlist or a curated set of per-genre "Essentials" playlists cross-referenced against your actual listening history.
Expose the structured catalog to AI coding agents via a ClawHub skill for natural-language playlist curation on demand.
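
The target layout named in the list above can be sketched in a few lines. This is a minimal illustration, not the project's own code: `safe_component` and `target_path` are hypothetical helper names, and the set of characters replaced is an assumption about what is unsafe across operating systems.

```python
import re
from pathlib import Path

# Characters that are unsafe in file names on at least one major OS (assumed set).
_UNSAFE = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def safe_component(text: str, fallback: str = "Unknown") -> str:
    """Make one path component safe; fall back when nothing usable is left."""
    cleaned = _UNSAFE.sub("_", text).strip(" .")
    return cleaned or fallback


def target_path(sorted_root: Path, language: str, artist: str,
                album: str, title: str, ext: str) -> Path:
    """Build <SORTED_ROOT>/<Language>/<Artist>/<Album>/<Artist> - <Title>.<ext>."""
    artist_c = safe_component(artist)
    filename = f"{artist_c} - {safe_component(title)}.{ext.lstrip('.').lower()}"
    return (sorted_root / safe_component(language) / artist_c
            / safe_component(album) / filename)
```

Sanitising each component separately keeps a slash inside an artist name (for example "AC/DC") from silently creating an extra folder level.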

No part of the pipeline assumes any particular language. Drop a JSON file per language you care about into config/language_hints/ and the pipeline routes into those buckets automatically.

Supported formats

Format	Extension	Shazam ID	ID3 enrichment	Notes
MP3	`.mp3`	Direct	Full	Primary target format. No FFmpeg needed.
FLAC	`.flac`	Via FFmpeg	Vorbis tags	Requires FFmpeg on PATH or in `<MUSIC_ROOT>/ffmpeg/`.
AAC/M4A	`.m4a`, `.aac`	Via FFmpeg	MP4 tags	Requires FFmpeg.
WAV	`.wav`	Via FFmpeg	Minimal	Lossless but no native tag support.
OGG Vorbis	`.ogg`	Via FFmpeg	Vorbis tags	Requires FFmpeg.
WMA	`.wma`	Via FFmpeg	ASF tags	Legacy format. Requires FFmpeg.
Opus	`.opus`	Via FFmpeg	Vorbis tags	Requires FFmpeg.

Requirements

Python 3.12. Python 3.13+ will not work today: shazamio-core does not yet publish wheels for 3.13, and building it from source needs a Rust toolchain. Python 3.10 and 3.11 will work if you downgrade langdetect, but 3.12 is the supported target.
~2 GB of free disk for the Shazam cache, the metadata catalog, cover art thumbnails, and Duplicates_Staging.
(Optional) FFmpeg on PATH. Shazam only needs it to decode the non-MP3 formats listed above (FLAC, M4A/AAC, WAV, OGG, WMA, Opus). If you don't have it globally, drop a portable build at <MUSIC_ROOT>/ffmpeg/ and the pipeline will pick it up automatically.
(Optional) A Spotify developer app if you want Phase 6. See Phase 6: Spotify sync below.

Install

1. Clone the repo into your music folder

The zero-config layout the defaults expect is <MUSIC_ROOT>/sonic-phoenix/. You don't have to follow it — every path is overridable via environment variables — but it's the shortest path to running.

cd /path/to/your/music      # whatever you want MUSIC_ROOT to be
git clone https://github.com/drajb/sonic-phoenix.git
cd sonic-phoenix

2. Create a Python 3.12 virtual environment

# macOS / Linux
python3.12 -m venv .venv
source .venv/bin/activate

# Windows (PowerShell)
py -3.12 -m venv .venv
.\.venv\Scripts\Activate.ps1

# Windows (cmd)
py -3.12 -m venv .venv
.venv\Scripts\activate.bat

Verify:

python --version     # should print Python 3.12.x

3. Install Python dependencies

pip install --upgrade pip
pip install -r requirements.txt

The heavy dependency is shazamio-core (a Rust binary wheel). If pip tries to build it from source you are not on Python 3.12 — stop and fix the interpreter.

Configure

All configuration lives in environment variables. The only one you are required to set is MUSIC_ROOT.

1. Copy the env template

# macOS / Linux
cp .env.example .env

# Windows (PowerShell)
Copy-Item .env.example .env

2. Edit `.env`

Minimum:

MUSIC_ROOT=/absolute/path/to/your/music

On Windows you can use forward OR back slashes: MUSIC_ROOT=C:/Users/you/Music and MUSIC_ROOT=C:\Users\you\Music both work.

Everything else is optional. See .env.example for the full list with inline documentation.

3. Verify the config

python config.py

This prints a summary of every resolved path and tells you whether Spotify credentials were picked up. If MUSIC_ROOT points somewhere that doesn't exist, the individual scripts fail loudly via config.require_music_root() rather than silently.

4. `.env` is gitignored

The repo ships with a .gitignore that excludes .env, .venv/, .data/, Sorted/, Duplicates_Staging/, and the Spotify token cache. You cannot accidentally commit credentials or your music.

Environment variables reference

Variable	Required	Default	Purpose
`MUSIC_ROOT`	Yes	—	Absolute path to the root of your music collection
`SORTED_ROOT`	No	`<MUSIC_ROOT>/Sorted`	Where the organised library lives
`DATA_DIR`	No	`<MUSIC_ROOT>/.data`	Where pipeline state files are written
`DUPLICATES_STAGING`	No	`<MUSIC_ROOT>/sonic-phoenix/Duplicates_Staging`	Where bit-for-bit duplicates are staged
`UNIDENTIFIED_DIR`	No	`<SORTED_ROOT>/Unidentified`	Where tracks that can't be classified land
`FFMPEG_BIN`	No	`<MUSIC_ROOT>/ffmpeg/bin`	Path to FFmpeg binary (only for non-MP3 formats)
`SHAZAM_CONCURRENCY`	No	`20`	Parallel Shazam lookups. Lower if rate-limited.
`ITUNES_COUNTRIES`	No	`US,GB`	iTunes country codes to rotate through for enrichment
`SPOTIFY_CLIENT_ID`	Phase 6 only	—	Spotify developer app Client ID
`SPOTIFY_CLIENT_SECRET`	Phase 6 only	—	Spotify developer app Client Secret
`SPOTIFY_REDIRECT_URI`	No	`http://127.0.0.1:8888/callback`	Spotify OAuth redirect URI
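
How the defaults in the table chain off MUSIC_ROOT and SORTED_ROOT can be sketched as follows. This is an illustration under stated assumptions, not the contents of config.py: `resolve_config` is a hypothetical function name, and only a subset of the variables is shown.

```python
import os
from pathlib import Path
from typing import Mapping


def resolve_config(env: Mapping[str, str] = os.environ) -> dict:
    """Resolve paths and knobs from environment variables, applying the documented defaults."""
    music_root_raw = env.get("MUSIC_ROOT")
    if not music_root_raw:
        raise RuntimeError("MUSIC_ROOT is required but not set")
    music_root = Path(music_root_raw)
    # SORTED_ROOT is resolved first because UNIDENTIFIED_DIR defaults relative to it.
    sorted_root = Path(env.get("SORTED_ROOT") or music_root / "Sorted")
    countries = env.get("ITUNES_COUNTRIES") or "US,GB"
    return {
        "MUSIC_ROOT": music_root,
        "SORTED_ROOT": sorted_root,
        "DATA_DIR": Path(env.get("DATA_DIR") or music_root / ".data"),
        "DUPLICATES_STAGING": Path(env.get("DUPLICATES_STAGING")
                                   or music_root / "sonic-phoenix" / "Duplicates_Staging"),
        "UNIDENTIFIED_DIR": Path(env.get("UNIDENTIFIED_DIR") or sorted_root / "Unidentified"),
        "SHAZAM_CONCURRENCY": int(env.get("SHAZAM_CONCURRENCY") or 20),
        "ITUNES_COUNTRIES": [c.strip().upper() for c in countries.split(",") if c.strip()],
    }
```

Passing the environment in as a mapping (rather than reading os.environ inside the function body) makes the resolution easy to check with a plain dict.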

Optional: language hint files

The pipeline's language classifier uses langdetect by default, which does fine on obvious cases (English titles -> English, Spanish titles -> Spanish) but has a well-known failure mode: Latin-script transliterations of non-Latin-script languages (Hindi/Urdu/Punjabi written with English letters) get classified as English.

The fix is an explicit per-language hint file. Every file at config/language_hints/<Language>.json is loaded automatically. The filename (without .json) is the target folder name under SORTED_ROOT.

This is how you add a new language. It's the single user-facing extension point.

Starter templates

Example files ship under config/language_hints/examples/:

English.json — template for any Latin-script language
Hindi.json — the transliterated-Hindi use case the system was built for
Spanish.json — a second Latin-script example to show the pattern
merge_groups.json — optional, consumed by 03F --merge-languages
genres.json — optional, consumed by 05C_confidence_auditor

To activate any of them, copy them one directory up (out of examples/):

cp config/language_hints/examples/English.json config/language_hints/English.json
cp config/language_hints/examples/Hindi.json   config/language_hints/Hindi.json

Then edit your copies. The full field reference lives in config/language_hints/examples/README.md.

Language-agnostic by construction. There is nothing Hindi- or English-specific in the Python code. If you only care about French and Japanese, ship only French.json and Japanese.json — the pipeline will produce Sorted/French/ and Sorted/Japanese/ folders and nothing else.

Running the pipeline

The scripts are designed to be run in order. Each one is an independent Python file that imports config and picks up its inputs from disk, so you can absolutely stop after any phase, inspect the intermediate state under <MUSIC_ROOT>/.data/, fix anything by hand, and resume.

The happy path (minimal run)

For a first-time run against a fresh pile of audio files, the minimum sequence that gets you from chaos to a clean sorted library is:

# Phase 1 — identify everything via acoustic fingerprint
python 01A_extract_metadata.py       # dumps existing ID3 tags to metadata_catalog.json
python 01D_shazam_all_files.py       # long-running; resumable

# Phase 2 — classify by language and build the hash catalog
python 02A_catalog_music.py
python 02D_organize_music.py          # physically moves files into Sorted/<Lang>/<Artist>/

# Phase 3 — audit and re-sort
python 03A_consolidate_by_artist.py
python 03D_titanium_resort.py         # requires config/language_hints/*.json

# Phase 4 — enrich with tags, lyrics, cover art
python 04I_polish_and_enrich_v6.py

# Phase 5 — finalise the master catalog
python 05I_finalize_catalog.py

That's it. You now have a clean library under <SORTED_ROOT>/ and a read-only catalog at <DATA_DIR>/final_catalog.json.

Optional extras

If you want to purge junk or residue files before enrichment: 05D_force_delete_residue.py, 05F_final_scrub.py. Both are DESTRUCTIVE UTILITY scripts, so read their docstrings first.
If you want empty-folder cleanup mid-pipeline: 05E_final_cleanup.py, 05H_final_vacuum.py.
If you want to push everything to Spotify: see Phase 6: Spotify sync below.
If you want deep art fetch (1000x1000 HD): 04F_deep_art_sync.py.
If you want AI-driven playlist curation: see Phase 7: ClawHub AI skill below.

The historical / kitchen-sink path

Phases 1-5 each have multiple script versions that represent the evolution of the project. The 01A-01E, 04A-04I, and 05A-05I scripts are a chronological record: within each phase, the letters run from the earliest iteration to the latest, ending for example at 04I (the canonical enrichment script). Reading them in order is by far the fastest way to understand why the project does what it does, but you do not need to run every one. The "happy path" above is the canonical sequence.

Every script has a docstring at the top marked with one of these statuses:

Status	Meaning
CANONICAL	The recommended version. Run this.
HISTORICAL	An earlier iteration kept for reference. Safe to skip.
UTILITY	Standalone tool, not part of the main flow.
DESTRUCTIVE UTILITY	Removes files. Read the docstring before running.
LIBRARY	Imported by other scripts. Not directly runnable.

Phase-by-phase reference

Phase 1 — Discovery & identification

Script	Status	What it does
`01A_extract_metadata.py`	HISTORICAL	Pulls ID3 tags via mutagen. Useful as a quick sanity scan of what your files claim to contain, but existing tags are often unreliable.
`01B_shazam_identify.py`	HISTORICAL	Smoke test for shazamio — identifies a single file.
`01C_shazam_by_hash.py`	HISTORICAL	Hash-keyed Shazam cache. Predecessor to `01D`.
`01D_shazam_all_files.py`	CANONICAL	20-way concurrent Shazam over the whole library. Resumable. Writes to `.data/shazam_final_results.json`. This is the script you actually run.
`01E_test_matching.py`	UTILITY	Diagnostics for fuzzy string matching. Handy when debugging why an artist didn't match.
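
The "concurrent and resumable" behaviour described for `01D` follows a common pattern: cap in-flight lookups with a semaphore and persist results after each unit of work. The sketch below shows that pattern only. `identify_all` is a hypothetical name, `identify` is an injected stand-in for the real ShazamIO call, and the state-file handling is an assumption rather than a description of the actual script.

```python
import asyncio
import json
from pathlib import Path
from typing import Awaitable, Callable

# Stand-in type for whatever coroutine performs the acoustic lookup (hypothetical).
Identify = Callable[[Path], Awaitable[dict]]


async def identify_all(files: list[Path], state_file: Path,
                       identify: Identify, concurrency: int = 20) -> dict:
    """Identify files concurrently, persisting results so reruns skip finished work."""
    results = (json.loads(state_file.read_text(encoding="utf-8"))
               if state_file.exists() else {})
    gate = asyncio.Semaphore(concurrency)      # caps the number of in-flight lookups
    write_lock = asyncio.Lock()                # serialises writes to the state file

    async def worker(path: Path) -> None:
        key = str(path)
        if key in results:                     # already identified on a previous run
            return
        async with gate:
            outcome = await identify(path)
        async with write_lock:
            results[key] = outcome
            tmp = state_file.with_suffix(".tmp")
            tmp.write_text(json.dumps(results, ensure_ascii=False), encoding="utf-8")
            tmp.replace(state_file)            # swap in the complete file in one step

    await asyncio.gather(*(worker(p) for p in files))
    return results
```

Writing to a temporary file and then replacing the target is one way to avoid a half-written JSON file if the process is killed mid-write.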

Phase 2 — Consolidation & initial sort

Script	Status	What it does
`02A_catalog_music.py`	CANONICAL	Builds `catalog.json` with one entry per file: `{hash, tags, language, source}`. Uses langdetect for language classification.
`02B_analyze_catalog.py`	UTILITY	Prints human-readable stats over the catalog (tracks per language, duplicates, etc). Read-only.
`02C_organize_files.py`	HISTORICAL	Early prototype mover. Superseded by `02D`.
`02D_organize_music.py`	CANONICAL	Physically moves every file into `<SORTED_ROOT>/<Lang>/<Artist>/`. Stages duplicates into `Duplicates_Staging/` instead of deleting.
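
Bit-for-bit deduplication that stages rather than deletes can be sketched as below. This is a minimal illustration of the idea, assuming a "first file seen is the keeper" policy; `sha256_of` and `stage_duplicates` are hypothetical names and the staging filename scheme is invented for the example.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large audio files never load fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def stage_duplicates(files: list[Path], staging_dir: Path) -> dict[str, Path]:
    """Keep the first file per digest; move later bit-identical copies to staging_dir."""
    keepers: dict[str, Path] = {}
    staging_dir.mkdir(parents=True, exist_ok=True)
    for path in files:
        digest = sha256_of(path)
        if digest not in keepers:
            keepers[digest] = path
            continue
        # Pick a destination name that does not exist yet, so nothing is overwritten.
        dest = staging_dir / f"{digest[:12]}_{path.name}"
        counter = 1
        while dest.exists():
            dest = staging_dir / f"{digest[:12]}_{counter}_{path.name}"
            counter += 1
        shutil.move(str(path), str(dest))
    return keepers
```

Because duplicates are moved and never removed, the staging folder can be reviewed and emptied by hand afterwards.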

Phase 3 — Hierarchical audit & re-sort

Script	Status	What it does
`03A_consolidate_by_artist.py`	CANONICAL	Merges feature-credited artist folders into their canonical parent (e.g. "Akon feat Eminem" -> "Akon"). Reads overrides from `config/language_hints/artist_map.json`.
`03B_master_audit_sort.py`	CANONICAL	Hint-driven audit pass. Flags orphans, empty folders, and residue files. Requires `config/language_hints/*.json`.
`03C_high_confidence_resort.py`	HISTORICAL	Re-evaluates low-confidence classifications against the hash catalog.
`03D_titanium_resort.py`	CANONICAL	Final structural enforcement. Uses hint files' `artists`, `dna`, `keywords`, and `lang_codes` to hard-route every remaining ambiguous artist to a language.
`03E_scan_remnants.py`	UTILITY	Sweeps `MUSIC_ROOT` for unprocessed leftovers not under `Sorted/`.
`03F_reorganize_binary.py`	UTILITY	Resolves byte-level hash mismatches. With `--merge-languages` it unifies adjacent language buckets per `merge_groups.json`.
`03G_diagnose_shankar.py`	HISTORICAL	Targeted debugger for feature-credit parsing edge cases. Named after the Bollywood trio Shankar-Ehsaan-Loy, which was the original test case.

Phase 4 — Enrichment (tags, lyrics, cover art)

Script	Status	What it does
`04A_enrich_library.py`	HISTORICAL	First iteration of the enricher.
`04B_enrich_library_v2.py`	HISTORICAL	v2 with structured API error handling.
`04C_polish_library.py`	HISTORICAL	Tightens enrichment bounds and artwork dimension minimums.
`04D_fetch_lyrics.py`	HISTORICAL	Standalone Lyrics.ovh wrapper. Superseded by LrcLib in `04I`.
`04E_art_decorator.py`	HISTORICAL	Standalone ID3 APIC image embedder.
`04F_deep_art_sync.py`	UTILITY	Deep art rescue — forces 1000x1000 fetch when the normal pass failed. Run this if your library has sporadic missing cover art after `04I`.
`04G_polish_and_enrich.py`	HISTORICAL	Master enricher, v3.
`04H_polish_and_enrich_v5.py`	HISTORICAL	v5 with strict subset matcher.
`04I_polish_and_enrich_v6.py`	CANONICAL / CROWN JEWEL	The one you run. iTunes country rotation, HTTP 429/403 backoff, synchronised lyrics via LrcLib, optional Pillow-based APIC embedding, strict subset matcher.

Phase 5 — Cleaning & finalisation

Script	Status	What it does
`05A_repair_json.py`	UTILITY	Fixes trailing-comma / truncation corruption in the JSON data stores. Run if a prior script was killed mid-write.
`05B_sanitize_results_json.py`	UTILITY	Strips junk tags before the final migration.
`05C_confidence_auditor.py`	UTILITY	Confidence-scored audit over the classified library. Uses `config/language_hints/genres.json` to rank suspicious classifications. Read-only.
`05D_force_delete_residue.py`	DESTRUCTIVE UTILITY	Hard-purges residue files matching a denoise pattern. Read the docstring first.
`05E_final_cleanup.py`	UTILITY	Bottom-up empty-folder vacuum. Never touches the root or per-language tops.
`05F_final_scrub.py`	DESTRUCTIVE UTILITY	Nuclear scrub of garbage extensions and fragment files. Skips the `Sorted/` subtree entirely.
`05G_final_migration.py`	HISTORICAL	Final migration engine that writes ID3 tags before moving. Superseded by `04I`'s in-place enrichment.
`05H_final_vacuum.py`	UTILITY	Zero-remnant vacuum for a specific language bucket (defaults to "Rescued").
`05I_finalize_catalog.py`	CANONICAL	Writes `.data/final_catalog.json`, the read-only master catalog. Three-tier classification (ID3 -> Shazam -> filename fallback). Run this as the last step of the main pipeline.

Phase 6 — Spotify sync

Entirely optional. Skip the whole phase if you just want a clean local library.

Script	Status	What it does
`06B_spotify_setup.py`	CANONICAL	First Phase 6 script — completes the Spotify OAuth handshake and caches the token. Run once.
`06C_spotify_backup.py`	CANONICAL	Snapshots every playlist you own into `.data/spotify_backups/`. Run before 06D/06E so you have a rollback path.
`06D_spotify_sync_engine.py`	CANONICAL	Mirrors `<SORTED_ROOT>/<Lang>/` into a "Local Library -- Lang" playlist per language. Resumable.
`06E_spotify_discovery_sync.py`	CANONICAL / CROWN JEWEL	Cross-references your local artists with your actual Spotify listening history and auto-generates per-genre "Essentials -- Lang -- Genre" playlists for the intersection.

Setting up Spotify

1. Go to the Spotify Developer Dashboard and create an app.
2. In the app settings, add this exact Redirect URI:
```
http://127.0.0.1:8888/callback
```

3. Copy the Client ID and Client Secret into your .env:

SPOTIFY_CLIENT_ID=your_client_id
SPOTIFY_CLIENT_SECRET=your_client_secret

4. Run python 06B_spotify_setup.py. Your browser opens, you approve the scopes, the script confirms and caches the token at .data/.spotify_token_cache.

From then on every other 06* script will pick up the cached token silently.

If 06D or 06E return a 403 on playlist creation, your app is in Spotify's "Development Mode" and needs the user explicitly added under "Users and Access" in the Dashboard. The scripts print the exact fix message.

Utility scripts

Tools that live at the repo root, not part of a numbered phase.

Script	Status	Purpose
`absolute_zero_sort.py`	UTILITY	Final-pass classifier for `Sorted/Unidentified/Audit_Needed/`. Hint-driven. Deletes the Unidentified tree after moving what it can.
`common_sense_sort.py`	UTILITY	Non-destructive sibling of `absolute_zero_sort`. Moves what it can but leaves `Unidentified/` intact so you can inspect what's left.
`total_scrub.py`	DESTRUCTIVE UTILITY	Deletes every top-level folder under `MUSIC_ROOT` that is not in the protected set. Only run when you are 100% done and only want the finalised library to remain.
`format_and_rename_project.py`	HISTORICAL	One-shot bootstrap that renamed the original scripts to their phase-prefixed form. Running it today just prints the phase table.
`spotify_auth.py`	LIBRARY	Shared Spotipy OAuth helper imported by every `06*` script. Not runnable on its own.
`config.py`	LIBRARY	Single source of truth for every path, tuning knob, and environment variable. Normally imported by the other scripts; the one exception is `python config.py`, which prints a config summary.

ClawHub AI skill (Phase 7)

Sonic Phoenix is also published as an AI agent skill on ClawHub under the name ultimate-music-manager. The skill teaches an AI coding assistant (Claude Code, Codex, Copilot, or any OpenClaw-compatible agent) how to operate the full pipeline on your behalf and curate playlists from your catalog using natural language.

Phase 7 is what turns the pipeline's output from a static library into a living, queryable music system. Once the catalog is built (Phases 1-5), the AI skill lets you say things like "build me a 90s Bollywood nostalgia playlist" or "create a road trip mix, heavy on rock" and the agent has the structured metadata — artist, title, album, language, genre — to execute it.

What's in the skill

The skill lives at ultimate-music-manager/ in this repo and includes:

File	Purpose
`SKILL.md`	Main agent instruction document — setup, config, phase-by-phase reference, troubleshooting
`_meta.json`	ClawHub registry metadata (slug + version)
`scripts/preflight.sh`	7-point environment validator (Python 3.12, venv, deps, `.env`, `MUSIC_ROOT`, FFmpeg)
`scripts/run-pipeline.sh`	Single-command pipeline runner with `--skip-shazam`, `--spotify`, `--dry-run` flags
`scripts/status.sh`	Dashboard: file counts, language breakdown, data file sizes, pipeline progress
`hooks/safety-guard.sh`	PreToolUse hook that intercepts destructive scripts and requires confirmation
`hooks/HOOK.md`	Hook metadata (OpenClaw format)
`references/data-files.md`	Schema and lineage for every JSON artifact the pipeline produces
`references/language-hints-guide.md`	Full guide to creating language hint files with examples

Installing from ClawHub

npx clawhub@latest install ultimate-music-manager

Using the helper scripts directly

You don't need ClawHub to use the scripts — they work standalone from the repo:

# Check your environment is ready
bash ultimate-music-manager/scripts/preflight.sh

# Preview what the pipeline will do
bash ultimate-music-manager/scripts/run-pipeline.sh --dry-run

# Run the full pipeline
bash ultimate-music-manager/scripts/run-pipeline.sh

# Run including Spotify sync
bash ultimate-music-manager/scripts/run-pipeline.sh --spotify

# Check status at any time
bash ultimate-music-manager/scripts/status.sh

Enabling the safety hook (Claude Code)

Add to .claude/settings.json:

{
  "hooks": {
    "PreToolUse": [{
      "matcher": "Bash",
      "hooks": [{
        "type": "command",
        "command": "./ultimate-music-manager/hooks/safety-guard.sh"
      }]
    }]
  }
}

This intercepts attempts to run destructive scripts (05D, 05F, total_scrub, absolute_zero_sort) and injects a confirmation warning. Zero overhead on all other commands.

Data files produced

All working state lives under <DATA_DIR> (default <MUSIC_ROOT>/.data/). Every file here is regenerable — safe to delete if you want to start over.

File	Written by	Purpose
`metadata_catalog.json`	`01A`	ID3 tag dump per file (pre-Shazam)
`shazam_final_results.json`	`01D`	Acoustic identification for every file
`shazam_hash_results.json`	`01C`	Hash-keyed Shazam cache
`catalog.json`	`02A`	Master SHA-256 catalog with language classification
`enrichment_report.json`	`04I`	Per-file enrichment status (art + lyrics booleans)
`mismatch_report.json`	`04I`	Files where iTunes returned a mismatched track
`final_catalog.json`	`05I`	Read-only master catalog — the single source of truth for Phase 6 and Phase 7
`confidence_report.json`	`05C`	Audit confidence scores
`.spotify_token_cache`	`06B`	Cached OAuth token (gitignored)
`spotify_sync_state.json`	`06D`	Resumable per-file sync state
`discovery_sync_state.json`	`06E`	Resumable per-artist discovery state
`spotify_backups/`	`06C`	Rollback snapshot of every Spotify playlist
`spotify_sync.log`	`06D`	Full sync log
`discovery_sync.log`	`06E`	Discovery sync log

Data flow

01A -> metadata_catalog.json
01D -> shazam_final_results.json, shazam_hash_results.json
02A -> catalog.json (merges metadata + shazam + hashes)
04I -> enrichment_report.json, mismatch_report.json (reads catalog, writes enriched ID3 tags to files)
05I -> final_catalog.json (merges all sources into canonical read-only output)
06D -> spotify_sync_state.json (reads final_catalog)
06E -> discovery_sync_state.json (reads final_catalog + Spotify listening history)

Troubleshooting

shazamio-core fails to install / "no matching wheel". You are not on Python 3.12. python --version inside your activated venv should say 3.12.x. Recreate the venv with python3.12 -m venv .venv (macOS / Linux) or py -3.12 -m venv .venv (Windows).

Shazam returns HTTP 429. Drop SHAZAM_CONCURRENCY in your .env to 5 or 10 and rerun 01D. The script is resumable — it skips anything already identified.

iTunes returns HTTP 403. Add more countries to rotate through, for example ITUNES_COUNTRIES=US,GB,AU,CA in your .env. 04I already retries with backoff.

[config] MUSIC_ROOT does not exist. Your .env is not being read, or MUSIC_ROOT is set to a path that doesn't exist. python config.py will tell you exactly which path it tried.

Every artist ends up under "English". You need config/language_hints/*.json files for the languages you care about. Out of the box langdetect judges only the text it sees, so anything written in Latin letters (transliterated Hindi, romanised Japanese, and similar cases) tends to be classified as English and needs hints. Copy the templates from config/language_hints/examples/ and edit them.

Spotify 403 when creating a playlist. Your developer app is in Development Mode. Go to the Spotify Dashboard -> your app -> Users and Access -> add your Spotify account email. Then rerun.

Files are "missing" after a run. Nothing is ever deleted by the canonical pipeline. Check Duplicates_Staging/ first — that is where duplicates go to wait for your review.

JSON data files are corrupted (trailing commas, truncated). A prior script was killed mid-write. Run python 05A_repair_json.py to sanitize the data stores, then resume the pipeline from wherever you left off.

Enrichment seems stuck or slow. 04I respects iTunes rate limits with exponential backoff. If it's cycling through 429 retries, add more country codes to ITUNES_COUNTRIES in your .env to spread the load. The script is resumable — safe to kill and restart.

Design notes

Resumable by construction. Every long-running script writes state to disk after each unit of work and picks up where it left off on the next run. You can kill 01D, 04I, 06D, 06E at any point with Ctrl-C without losing progress.
Nothing is deleted without permission. The canonical pipeline (01-04, 05I, 06) only moves files. Deletions happen only when you explicitly run 05D, 05F, or total_scrub (each marked DESTRUCTIVE UTILITY in its docstring) or absolute_zero_sort (which deletes the Unidentified tree after moving what it can).
Cross-validation over blind trust. Existing ID3 tags are not thrown away — they are cross-validated against Shazam's acoustic fingerprint. Where Shazam confirms the tags, they stay. Where it disagrees, the acoustic result wins. Where Shazam can't identify a track, the tags are sanitised and used as a fallback.
Single source of truth for config. Everything that's user-tunable is in config.py, which reads from environment variables (optionally via .env). No magic constants hidden in individual scripts.
No credentials in code. Spotify keys come from env vars only. A grep for client IDs across the repo returns zero hits — the only place they can possibly live is your local .env which is gitignored.
Language-agnostic. Zero hardcoded language names anywhere in the Python. All language knowledge is loaded at runtime from config/language_hints/*.json.
Script status is documented inline. Every .py file's top-of-file docstring begins with a Status: line (CANONICAL / HISTORICAL / UTILITY / DESTRUCTIVE UTILITY / LIBRARY) and a Run if: line. You do not need to read the code to know whether to run a script.
Historical scripts are preserved. The 04A-04I evolution is kept in the repo as a chronological record of every edge case the enrichment pipeline encountered. Reading them in order is the fastest way to understand the problem space.

License

MIT License

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
