A reproducible, resumable, language-agnostic pipeline for turning a chaotic folder of audio files into a clean Language/Artist/Album/Artist - Title.ext library with correct ID3 tags, embedded cover art, and synchronised lyrics — then optionally pushing the result into Spotify playlists and exposing it to AI agents for on-demand playlist curation.
The pipeline is broken into seven phases: six script phases numbered 01–06, plus a Phase 7 AI skill. Each phase is a series of small, independent scripts that write their state to disk so you can stop, inspect, edit, and resume at any point without redoing work.
- How it works at a glance
- What this does
- Supported formats
- Requirements
- Install
- Configure
- Optional: language hint files
- Running the pipeline
- Phase-by-phase reference
- Utility scripts
- ClawHub AI skill (Phase 7)
- Data files produced
- Troubleshooting
- Design notes
- License
Raw audio files (any state of disorganisation)
│
▼
Phase 1 — Acoustic fingerprint every file via Shazam (20-way concurrent, resumable)
│
▼
Phase 2 — SHA-256 deduplication + language classification via langdetect + hint files
│
▼
Phase 3 — Fuzzy artist consolidation + structural enforcement into Language/Artist/Album/
│
▼
Phase 4 — Metadata enrichment: iTunes tags, LrcLib synchronised lyrics, HD cover art
│
▼
Phase 5 — Catalog finalisation: merge all data sources into a single read-only JSON
│
▼
Phase 6 — (Optional) Spotify sync: mirror playlists + auto-generate genre "Essentials"
│
▼
Phase 7 — (Optional) AI skill: on-demand playlist curation via any OpenClaw-compatible agent
Given a pile of audio files scattered across an arbitrary folder tree — with broken, missing, or misleading metadata — the pipeline will:
- Identify every track by acoustic fingerprint via ShazamIO, cross-validating against existing tags rather than blindly trusting them.
- Deduplicate bit-for-bit via SHA-256 (never deletes — stages duplicates for your review).
- Classify each track's language using langdetect plus optional per-language hint files you control.
- Organise everything into <SORTED_ROOT>/<Language>/<Artist>/<Album>/<Artist> - <Title>.<ext>.
- Enrich the library by fetching canonical metadata from the iTunes Search API, synchronised lyrics from LrcLib, and embedding 1000x1000 cover art.
- Sync the finalised local library up to Spotify as either a full mirror playlist or a curated set of per-genre "Essentials" playlists cross-referenced against your actual listening history.
- Expose the structured catalog to AI coding agents via a ClawHub skill for natural-language playlist curation on demand.
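The bit-for-bit deduplication step in the list above can be pictured with a short sketch. This is not the project's code: `sha256_of` and `find_duplicates` are hypothetical names, and the only detail taken from the README is that duplicates are detected by SHA-256 and reported for review rather than deleted.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so large audio files never load fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root: Path) -> dict[str, list[Path]]:
    """Group files under root by digest and keep only digests seen more than once."""
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        groups[sha256_of(path)].append(path)
    # Nothing is removed here; the caller decides what to stage for review.
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}
```

Two files land in the same group only when every byte matches, so a re-encoded copy of the same song is not treated as a duplicate by this approach.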
No part of the pipeline assumes any particular language. Drop a JSON file per language you care about into config/language_hints/ and the pipeline routes into those buckets automatically.
| Format | Extension | Shazam ID | ID3 enrichment | Notes |
|---|---|---|---|---|
| MP3 | .mp3 | Direct | Full | Primary target format. No FFmpeg needed. |
| FLAC | .flac | Via FFmpeg | Vorbis tags | Requires FFmpeg on PATH or in <MUSIC_ROOT>/ffmpeg/. |
| AAC/M4A | .m4a, .aac | Via FFmpeg | MP4 tags | Requires FFmpeg. |
| WAV | .wav | Via FFmpeg | Minimal | Lossless but no native tag support. |
| OGG Vorbis | .ogg | Via FFmpeg | Vorbis tags | Requires FFmpeg. |
| WMA | .wma | Via FFmpeg | ASF tags | Legacy format. Requires FFmpeg. |
| Opus | .opus | Via FFmpeg | Vorbis tags | Requires FFmpeg. |
- Python 3.12. Python 3.13+ will not work today: the shazamio-core wheel does not yet publish binaries for 3.13, and the source build needs Rust. Python 3.10 and 3.11 will work if you downgrade langdetect, but 3.12 is the supported target.
- ~2 GB of free disk for the Shazam cache, the metadata catalog, cover art thumbnails, and Duplicates_Staging.
- (Optional) FFmpeg on PATH. Shazam only needs it to decode non-MP3 containers (FLAC/M4A/OPUS). If you don't have it globally, drop a portable build at <MUSIC_ROOT>/ffmpeg/ and the pipeline will pick it up automatically.
- (Optional) A Spotify developer app if you want Phase 6. See Phase 6: Spotify sync below.
The zero-config layout the defaults expect is <MUSIC_ROOT>/sonic-phoenix/. You don't have to follow it — every path is overridable via environment variables — but it's the shortest path to running.
cd /path/to/your/music # whatever you want MUSIC_ROOT to be
git clone https://github.com/drajb/sonic-phoenix.git
cd sonic-phoenix
# macOS / Linux
python3.12 -m venv .venv
source .venv/bin/activate
# Windows (PowerShell)
py -3.12 -m venv .venv
.\.venv\Scripts\Activate.ps1
# Windows (cmd)
py -3.12 -m venv .venv
.venv\Scripts\activate.bat
Verify:
python --version # should print Python 3.12.x
pip install --upgrade pip
pip install -r requirements.txt
The heavy dependency is shazamio-core (a Rust binary wheel). If pip tries to build it from source, you are not on Python 3.12: stop and fix the interpreter.
All configuration lives in environment variables. The only one you are required to set is MUSIC_ROOT.
# macOS / Linux
cp .env.example .env
# Windows (PowerShell)
Copy-Item .env.example .env
Minimum:
MUSIC_ROOT=/absolute/path/to/your/music
On Windows you can use forward OR back slashes: MUSIC_ROOT=C:/Users/you/Music and MUSIC_ROOT=C:\Users\you\Music both work.
Everything else is optional. See .env.example for the full list with inline documentation.
python config.py
Prints a summary of every resolved path and tells you whether Spotify credentials were picked up. If MUSIC_ROOT points somewhere that doesn't exist, the individual scripts will fail loudly via config.require_music_root(), not silently.
The repo ships with a .gitignore that excludes .env, .venv/, .data/, Sorted/, Duplicates_Staging/, and the Spotify token cache. You cannot accidentally commit credentials or your music.
| Variable | Required | Default | Purpose |
|---|---|---|---|
| MUSIC_ROOT | Yes | (none) | Absolute path to the root of your music collection |
| SORTED_ROOT | No | <MUSIC_ROOT>/Sorted | Where the organised library lives |
| DATA_DIR | No | <MUSIC_ROOT>/.data | Where pipeline state files are written |
| DUPLICATES_STAGING | No | <MUSIC_ROOT>/sonic-phoenix/Duplicates_Staging | Where bit-for-bit duplicates are staged |
| UNIDENTIFIED_DIR | No | <SORTED_ROOT>/Unidentified | Where tracks that can't be classified land |
| FFMPEG_BIN | No | <MUSIC_ROOT>/ffmpeg/bin | Path to FFmpeg binary (only for non-MP3 formats) |
| SHAZAM_CONCURRENCY | No | 20 | Parallel Shazam lookups. Lower if rate-limited. |
| ITUNES_COUNTRIES | No | US,GB | iTunes country codes to rotate through for enrichment |
| SPOTIFY_CLIENT_ID | Phase 6 only | (none) | Spotify developer app Client ID |
| SPOTIFY_CLIENT_SECRET | Phase 6 only | (none) | Spotify developer app Client Secret |
| SPOTIFY_REDIRECT_URI | No | http://127.0.0.1:8888/callback | Spotify OAuth redirect URI |
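To make the defaults in the table concrete, here is a minimal sketch of how the path variables could be resolved from the environment. It is an illustration only: `resolve_paths` is a hypothetical helper, not the contents of config.py, and it covers just the path-valued variables.

```python
import os
from pathlib import Path


def resolve_paths(env: dict[str, str]) -> dict[str, Path]:
    """Resolve path settings, applying the documented defaults when a variable is unset."""
    music_root_raw = env.get("MUSIC_ROOT")
    if not music_root_raw:
        # MUSIC_ROOT is the only required variable, so fail loudly without it.
        raise RuntimeError("MUSIC_ROOT is required")
    music_root = Path(music_root_raw)
    sorted_root = Path(env.get("SORTED_ROOT") or music_root / "Sorted")
    return {
        "MUSIC_ROOT": music_root,
        "SORTED_ROOT": sorted_root,
        "DATA_DIR": Path(env.get("DATA_DIR") or music_root / ".data"),
        "DUPLICATES_STAGING": Path(
            env.get("DUPLICATES_STAGING")
            or music_root / "sonic-phoenix" / "Duplicates_Staging"
        ),
        # This default hangs off SORTED_ROOT, so overriding SORTED_ROOT moves it too.
        "UNIDENTIFIED_DIR": Path(env.get("UNIDENTIFIED_DIR") or sorted_root / "Unidentified"),
    }
```

Calling `resolve_paths(dict(os.environ))` would apply it to the current process environment. Passing the environment in as a plain dict keeps the function easy to test.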
The pipeline's language classifier uses langdetect by default, which does fine on obvious cases (English titles -> English, Spanish titles -> Spanish) but has a well-known failure mode: Latin-script transliterations of non-Latin-script languages (Hindi/Urdu/Punjabi written with English letters) get classified as English.
The fix is an explicit per-language hint file. Every file at config/language_hints/<Language>.json is loaded automatically. The filename (without .json) is the target folder name under SORTED_ROOT.
This is how you add a new language. It's the single user-facing extension point.
Example files ship under config/language_hints/examples/:
- English.json: template for any Latin-script language
- Hindi.json: the transliterated-Hindi use case the system was built for
- Spanish.json: a second Latin-script example to show the pattern
- merge_groups.json: optional, consumed by 03F --merge-languages
- genres.json: optional, consumed by 05C_confidence_auditor
To activate any of them, copy them one directory up (out of examples/):
cp config/language_hints/examples/English.json config/language_hints/English.json
cp config/language_hints/examples/Hindi.json config/language_hints/Hindi.json
Then edit your copies. The full field reference lives in config/language_hints/examples/README.md.
Language-agnostic by construction. There is nothing Hindi- or English-specific in the Python code. If you only care about French and Japanese, ship only French.json and Japanese.json; the pipeline will produce Sorted/French/ and Sorted/Japanese/ folders and nothing else.
The scripts are designed to be run in order. Each one is an independent Python file that imports config and picks up its inputs from disk, so you can absolutely stop after any phase, inspect the intermediate state under <MUSIC_ROOT>/.data/, fix anything by hand, and resume.
For a first-time run against a fresh pile of audio files, the minimum sequence that gets you from chaos to a clean sorted library is:
# Phase 1 — identify everything via acoustic fingerprint
python 01A_extract_metadata.py
python 01D_shazam_all_files.py # long-running; resumable
# Phase 2 — classify by language and build the hash catalog
python 02A_catalog_music.py
python 02D_organize_music.py # physically moves files into Sorted/<Lang>/<Artist>/
# Phase 3 — audit and re-sort
python 03A_consolidate_by_artist.py
python 03D_titanium_resort.py # requires config/language_hints/*.json
# Phase 4 — enrich with tags, lyrics, cover art
python 04I_polish_and_enrich_v6.py
# Phase 5 — finalise the master catalog
python 05I_finalize_catalog.py
That's it. You now have a clean library under <SORTED_ROOT>/ and a read-only catalog at <DATA_DIR>/final_catalog.json.
- If you want to purge junk or residue files before enrichment: 05D_force_delete_residue.py, 05F_final_scrub.py. Both are DESTRUCTIVE UTILITY scripts, so read their docstrings first.
- If you want empty-folder cleanup mid-pipeline: 05E_final_cleanup.py, 05H_final_vacuum.py.
- If you want to push everything to Spotify: see Phase 6: Spotify sync below.
- If you want deep art fetch (1000x1000 HD): 04F_deep_art_sync.py.
- If you want AI-driven playlist curation: see Phase 7: ClawHub AI skill below.
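For readers curious what an empty-folder cleanup involves, the following is a minimal sketch of a bottom-up pass. It is not the code of 05E_final_cleanup.py: `remove_empty_dirs` and its `protected` parameter are hypothetical, chosen to mirror the README's statement that the root and per-language top folders are never touched.

```python
import os
from pathlib import Path


def remove_empty_dirs(root: Path, protected: set[Path]) -> list[Path]:
    """Delete empty folders beneath root, deepest first, and return what was removed."""
    removed = []
    # topdown=False visits children before parents, so a parent that becomes
    # empty after its children are removed is caught in the same pass.
    for current, _dirs, _files in os.walk(root, topdown=False):
        folder = Path(current)
        if folder == root or folder in protected:
            continue
        if not any(folder.iterdir()):  # re-check on disk, not the stale walk listing
            folder.rmdir()
            removed.append(folder)
    return removed
```

`Path.rmdir()` refuses to delete a non-empty directory, which gives a second safety net if a file appears between the check and the removal.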
Phases 1-5 each contain multiple script versions that reflect the evolution of the project. The 01A-01E, 04A-04I, and 05A-05I scripts are a chronological record: running them in letter order reproduces the full history that led to 04I (the canonical enrichment script). Reading them in order is the fastest way to understand why the project does what it does, but you do not need to run every one. The "happy path" above is the canonical sequence.
Every script has a docstring at the top marked with one of these statuses:
| Status | Meaning |
|---|---|
| CANONICAL | The recommended version. Run this. |
| HISTORICAL | An earlier iteration kept for reference. Safe to skip. |
| UTILITY | Standalone tool, not part of the main flow. |
| DESTRUCTIVE UTILITY | Removes files. Read the docstring before running. |
| LIBRARY | Imported by other scripts. Not directly runnable. |
| Script | Status | What it does |
|---|---|---|
| 01A_extract_metadata.py | HISTORICAL | Pulls ID3 tags via mutagen. Useful as a quick sanity scan of what your files claim to contain, but existing tags are often unreliable. |
| 01B_shazam_identify.py | HISTORICAL | Smoke test for shazamio: identifies a single file. |
| 01C_shazam_by_hash.py | HISTORICAL | Hash-keyed Shazam cache. Predecessor to 01D. |
| 01D_shazam_all_files.py | CANONICAL | 20-way concurrent Shazam over the whole library. Resumable. Writes to .data/shazam_final_results.json. This is the script you actually run. |
| 01E_test_matching.py | UTILITY | Diagnostics for fuzzy string matching. Handy when debugging why an artist didn't match. |
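The "20-way concurrent, resumable" pattern described for 01D can be sketched without touching Shazam at all. In the sketch below, `identify` is a caller-supplied coroutine standing in for the real lookup, and `identify_all`, `save_state`, and the state-file layout (a flat JSON object keyed by file name) are all assumptions made for illustration.

```python
import asyncio
import json
import os
from pathlib import Path
from typing import Awaitable, Callable


def save_state(state_file: Path, results: dict) -> None:
    """Write to a temp file, then swap it in, so a kill mid-write cannot truncate the state."""
    tmp = state_file.with_suffix(".tmp")
    tmp.write_text(json.dumps(results, indent=2), encoding="utf-8")
    os.replace(tmp, state_file)


async def identify_all(
    files: list[str],
    identify: Callable[[str], Awaitable[dict]],
    state_file: Path,
    concurrency: int = 20,
) -> dict:
    """Run identify() over files with bounded concurrency, skipping finished work."""
    results = json.loads(state_file.read_text(encoding="utf-8")) if state_file.exists() else {}
    pending = [name for name in files if name not in results]
    gate = asyncio.Semaphore(concurrency)  # at most `concurrency` lookups in flight

    async def worker(name: str) -> None:
        async with gate:
            results[name] = await identify(name)
        save_state(state_file, results)  # persist after every unit of work

    await asyncio.gather(*(worker(name) for name in pending))
    return results
```

Rewriting the whole state file after each result is simple but grows costly on very large libraries; batching the writes would be a reasonable refinement.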
| Script | Status | What it does |
|---|---|---|
| 02A_catalog_music.py | CANONICAL | Builds catalog.json with one entry per file: {hash, tags, language, source}. Uses langdetect for language classification. |
| 02B_analyze_catalog.py | UTILITY | Prints human-readable stats over the catalog (tracks per language, duplicates, etc). Read-only. |
| 02C_organize_files.py | HISTORICAL | Early prototype mover. Superseded by 02D. |
| 02D_organize_music.py | CANONICAL | Physically moves every file into <SORTED_ROOT>/<Lang>/<Artist>/. Stages duplicates into Duplicates_Staging/ instead of deleting. |
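As an illustration of the target layout this README describes (<SORTED_ROOT>/<Language>/<Artist>/<Album>/<Artist> - <Title>.<ext>), here is a small sketch that builds such a path from metadata. The helper names, the set of characters replaced, and the "Unknown" fallback are assumptions for the example, not the rules 02D actually applies.

```python
import re
from pathlib import PurePosixPath

# Characters that are unsafe in file names on at least one major OS.
_UNSAFE = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def safe_component(text: str, fallback: str = "Unknown") -> str:
    """Replace unsafe characters and trim trailing dots/spaces (which Windows rejects)."""
    cleaned = _UNSAFE.sub("_", text).strip(" .")
    return cleaned or fallback


def target_path(sorted_root: str, language: str, artist: str,
                album: str, title: str, ext: str) -> PurePosixPath:
    """Build Language/Artist/Album/Artist - Title.ext under sorted_root."""
    artist_dir = safe_component(artist)
    filename = f"{artist_dir} - {safe_component(title)}.{ext.lstrip('.').lower()}"
    return (PurePosixPath(sorted_root) / safe_component(language)
            / artist_dir / safe_component(album) / filename)
```

For example, `target_path("/music/Sorted", "English", "AC/DC", "Back in Black", "Hells Bells", ".MP3")` yields `/music/Sorted/English/AC_DC/Back in Black/AC_DC - Hells Bells.mp3`.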
| Script | Status | What it does |
|---|---|---|
| 03A_consolidate_by_artist.py | CANONICAL | Merges feature-credited artist folders into their canonical parent (e.g. "Akon feat Eminem" -> "Akon"). Reads overrides from config/language_hints/artist_map.json. |
| 03B_master_audit_sort.py | CANONICAL | Hint-driven audit pass. Flags orphans, empty folders, and residue files. Requires config/language_hints/*.json. |
| 03C_high_confidence_resort.py | HISTORICAL | Re-evaluates low-confidence classifications against the hash catalog. |
| 03D_titanium_resort.py | CANONICAL | Final structural enforcement. Uses hint files' artists, dna, keywords, and lang_codes to hard-route every remaining ambiguous artist to a language. |
| 03E_scan_remnants.py | UTILITY | Sweeps MUSIC_ROOT for unprocessed leftovers not under Sorted/. |
| 03F_reorganize_binary.py | UTILITY | Resolves byte-level hash mismatches. With --merge-languages it unifies adjacent language buckets per merge_groups.json. |
| 03G_diagnose_shankar.py | HISTORICAL | Targeted debugger for feature-credit parsing edge cases. Named after the Bollywood trio Shankar-Ehsaan-Loy, which was the original test case. |
| Script | Status | What it does |
|---|---|---|
| 04A_enrich_library.py | HISTORICAL | First iteration of the enricher. |
| 04B_enrich_library_v2.py | HISTORICAL | v2 with structured API error handling. |
| 04C_polish_library.py | HISTORICAL | Tightens enrichment bounds and artwork dimension minimums. |
| 04D_fetch_lyrics.py | HISTORICAL | Standalone Lyrics.ovh wrapper. Superseded by LrcLib in 04I. |
| 04E_art_decorator.py | HISTORICAL | Standalone ID3 APIC image embedder. |
| 04F_deep_art_sync.py | UTILITY | Deep art rescue: forces 1000x1000 fetch when the normal pass failed. Run this if your library has sporadic missing cover art after 04I. |
| 04G_polish_and_enrich.py | HISTORICAL | Master enricher, v3. |
| 04H_polish_and_enrich_v5.py | HISTORICAL | v5 with strict subset matcher. |
| 04I_polish_and_enrich_v6.py | CANONICAL / CROWN JEWEL | The one you run. iTunes country rotation, HTTP 429/403 backoff, synchronised lyrics via LrcLib, optional Pillow-based APIC embedding, strict subset matcher. |
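Country rotation combined with backoff on HTTP 429/403 is easier to follow in code. The sketch below is one plausible shape for it, not a description of what 04I does internally: `fetch_with_rotation` is hypothetical, the HTTP call is abstracted into a caller-supplied `fetch(country)` returning `(status, payload)`, and the retry counts and delays are placeholder values.

```python
import time
from typing import Callable, Optional

RETRYABLE = {403, 429}  # statuses treated as "rate limited, try elsewhere or later"


def fetch_with_rotation(
    fetch: Callable[[str], tuple[int, Optional[dict]]],
    countries: list[str],
    max_rounds: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """Try each country in turn; after a full round of rate limits, wait and go again."""
    for round_index in range(max_rounds):
        for country in countries:
            status, payload = fetch(country)
            if status == 200:
                return payload
            if status not in RETRYABLE:
                return None  # a hard failure; retrying would not help
        if round_index < max_rounds - 1:
            sleep(base_delay * (2 ** round_index))  # exponential backoff: 1s, 2s, 4s, ...
    return None
```

Injecting `sleep` as a parameter lets tests run instantly and makes the backoff schedule observable.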
| Script | Status | What it does |
|---|---|---|
| 05A_repair_json.py | UTILITY | Fixes trailing-comma / truncation corruption in the JSON data stores. Run if a prior script was killed mid-write. |
| 05B_sanitize_results_json.py | UTILITY | Strips junk tags before the final migration. |
| 05C_confidence_auditor.py | UTILITY | Confidence-scored audit over the classified library. Uses config/language_hints/genres.json to rank suspicious classifications. Read-only. |
| 05D_force_delete_residue.py | DESTRUCTIVE UTILITY | Hard-purges residue files matching a denoise pattern. Read the docstring first. |
| 05E_final_cleanup.py | UTILITY | Bottom-up empty-folder vacuum. Never touches the root or per-language tops. |
| 05F_final_scrub.py | DESTRUCTIVE UTILITY | Nuclear scrub of garbage extensions and fragment files. Skips the Sorted/ subtree entirely. |
| 05G_final_migration.py | HISTORICAL | Final migration engine that writes ID3 tags before moving. Superseded by 04I's in-place enrichment. |
| 05H_final_vacuum.py | UTILITY | Zero-remnant vacuum for a specific language bucket (defaults to "Rescued"). |
| 05I_finalize_catalog.py | CANONICAL | Writes .data/final_catalog.json, the read-only master catalog. Three-tier classification (ID3 -> Shazam -> filename fallback). Run this as the last step of the main pipeline. |
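The three-tier classification mentioned for 05I (ID3 -> Shazam -> filename fallback) amounts to taking the first source that yields both an artist and a title. A minimal sketch follows, assuming each source is a dict with `artist` and `title` keys and that the filename follows the "Artist - Title" layout; `classify` and the record shape are hypothetical.

```python
from pathlib import Path
from typing import Optional


def _complete(record: Optional[dict]) -> bool:
    """A source is usable only if it supplies both artist and title."""
    return bool(record and record.get("artist") and record.get("title"))


def classify(path: str, id3: Optional[dict], shazam: Optional[dict]) -> dict:
    """Pick artist/title from ID3, then Shazam, then the filename, recording the source."""
    for source, record in (("id3", id3), ("shazam", shazam)):
        if _complete(record):
            return {"artist": record["artist"], "title": record["title"], "source": source}
    # Last resort: parse "Artist - Title.ext" from the file name itself.
    stem = Path(path).stem
    artist, separator, title = stem.partition(" - ")
    if separator:
        return {"artist": artist.strip(), "title": title.strip(), "source": "filename"}
    return {"artist": "Unknown", "title": stem.strip(), "source": "filename"}
```

Recording the winning source alongside the result makes later audits simpler, because low-trust filename matches can be filtered out in one pass.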
Entirely optional. Skip the whole phase if you just want a clean local library.
| Script | Status | What it does |
|---|---|---|
| 06B_spotify_setup.py | CANONICAL | First Phase 6 script: completes the Spotify OAuth handshake and caches the token. Run once. |
| 06C_spotify_backup.py | CANONICAL | Snapshots every playlist you own into .data/spotify_backups/. Run before 06D/06E so you have a rollback path. |
| 06D_spotify_sync_engine.py | CANONICAL | Mirrors <SORTED_ROOT>/<Lang>/ into a "Local Library -- Lang" playlist per language. Resumable. |
| 06E_spotify_discovery_sync.py | CANONICAL / CROWN JEWEL | Cross-references your local artists with your actual Spotify listening history and auto-generates per-genre "Essentials -- Lang -- Genre" playlists for the intersection. |
- Go to the Spotify Developer Dashboard and create an app.
- In the app settings, add this exact Redirect URI: http://127.0.0.1:8888/callback
- Copy the Client ID and Client Secret into your .env:
  SPOTIFY_CLIENT_ID=your_client_id
  SPOTIFY_CLIENT_SECRET=your_client_secret
- Run python 06B_spotify_setup.py. Your browser opens, you approve the scopes, the script confirms and caches the token at .data/.spotify_token_cache.
- From then on every other 06* script will pick up the cached token silently.
If 06D or 06E return a 403 on playlist creation, your app is in Spotify's "Development Mode" and needs the user explicitly added under "Users and Access" in the Dashboard. The scripts print the exact fix message.
Tools that live at the repo root, not part of a numbered phase.
| Script | Status | Purpose |
|---|---|---|
| absolute_zero_sort.py | UTILITY | Final-pass classifier for Sorted/Unidentified/Audit_Needed/. Hint-driven. Deletes the Unidentified tree after moving what it can. |
| common_sense_sort.py | UTILITY | Non-destructive sibling of absolute_zero_sort. Moves what it can but leaves Unidentified/ intact so you can inspect what's left. |
| total_scrub.py | DESTRUCTIVE UTILITY | Deletes every top-level folder under MUSIC_ROOT that is not in the protected set. Only run when you are 100% done and only want the finalised library to remain. |
| format_and_rename_project.py | HISTORICAL | One-shot bootstrap that renamed the original scripts to their phase-prefixed form. Running it today just prints the phase table. |
| spotify_auth.py | LIBRARY | Shared Spotipy OAuth helper imported by every 06* script. Not runnable on its own. |
| config.py | LIBRARY | Single source of truth for every path, tuning knob, and environment variable. Import only. Running python config.py prints a config summary. |
Sonic Phoenix is also published as an AI agent skill on ClawHub under the name ultimate-music-manager. The skill teaches an AI coding assistant (Claude Code, Codex, Copilot, or any OpenClaw-compatible agent) how to operate the full pipeline on your behalf and curate playlists from your catalog using natural language.
Phase 7 is what turns the pipeline's output from a static library into a living, queryable music system. Once the catalog is built (Phases 1-5), the AI skill lets you say things like "build me a 90s Bollywood nostalgia playlist" or "create a road trip mix, heavy on rock" and the agent has the structured metadata — artist, title, album, language, genre — to execute it.
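To show what "structured metadata to execute it" could mean in practice, here is a sketch of filtering catalog entries for a playlist request. The schema is an assumption: the README lists the fields (artist, title, album, language, genre) but not the JSON layout, so the example treats the catalog as a list of dicts and uses a hypothetical `select_tracks` helper.

```python
from typing import Optional


def select_tracks(
    catalog: list[dict],
    language: Optional[str] = None,
    genre: Optional[str] = None,
    limit: Optional[int] = None,
) -> list[dict]:
    """Return catalog entries matching the given language and genre, case-insensitively."""

    def matches(track: dict) -> bool:
        if language and (track.get("language") or "").casefold() != language.casefold():
            return False
        # Genre is matched as a substring so "rock" also finds "Hard Rock".
        if genre and genre.casefold() not in (track.get("genre") or "").casefold():
            return False
        return True

    picked = [track for track in catalog if matches(track)]
    return picked[:limit] if limit is not None else picked
```

A request such as "a road trip mix, heavy on rock" would then reduce to something like `select_tracks(catalog, genre="rock", limit=40)`, with the agent handling ordering and variety on top.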
The skill lives at ultimate-music-manager/ in this repo and includes:
| File | Purpose |
|---|---|
| SKILL.md | Main agent instruction document: setup, config, phase-by-phase reference, troubleshooting |
| _meta.json | ClawHub registry metadata (slug + version) |
| scripts/preflight.sh | 7-point environment validator (Python 3.12, venv, deps, .env, MUSIC_ROOT, FFmpeg) |
| scripts/run-pipeline.sh | Single-command pipeline runner with --skip-shazam, --spotify, --dry-run flags |
| scripts/status.sh | Dashboard: file counts, language breakdown, data file sizes, pipeline progress |
| hooks/safety-guard.sh | PreToolUse hook that intercepts destructive scripts and requires confirmation |
| hooks/HOOK.md | Hook metadata (OpenClaw format) |
| references/data-files.md | Schema and lineage for every JSON artifact the pipeline produces |
| references/language-hints-guide.md | Full guide to creating language hint files with examples |
npx clawhub@latest install ultimate-music-manager
You don't need ClawHub to use the scripts; they work standalone from the repo:
# Check your environment is ready
bash ultimate-music-manager/scripts/preflight.sh
# Preview what the pipeline will do
bash ultimate-music-manager/scripts/run-pipeline.sh --dry-run
# Run the full pipeline
bash ultimate-music-manager/scripts/run-pipeline.sh
# Run including Spotify sync
bash ultimate-music-manager/scripts/run-pipeline.sh --spotify
# Check status at any time
bash ultimate-music-manager/scripts/status.sh
Add to .claude/settings.json:
{
"hooks": {
"PreToolUse": [{
"matcher": "Bash",
"hooks": [{
"type": "command",
"command": "./ultimate-music-manager/hooks/safety-guard.sh"
}]
}]
}
}
This intercepts attempts to run destructive scripts (05D, 05F, total_scrub, absolute_zero_sort) and injects a confirmation warning. Zero overhead on all other commands.
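The check such a guard performs can be illustrated in a few lines. The real hook is a shell script (hooks/safety-guard.sh) whose contents are not shown here, so the Python below is only a sketch of the idea: `destructive_scripts_in` is a hypothetical function, and the script list is taken from the four names mentioned above.

```python
import shlex

DESTRUCTIVE = (
    "05D_force_delete_residue.py",
    "05F_final_scrub.py",
    "total_scrub.py",
    "absolute_zero_sort.py",
)


def destructive_scripts_in(command: str) -> list[str]:
    """Return the destructive scripts referenced by a shell command, if any."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        tokens = command.split()  # unbalanced quotes: fall back to a plain split
    # Compare on the file name only, so ./total_scrub.py and a/b/total_scrub.py both match.
    names = {token.rsplit("/", 1)[-1] for token in tokens}
    return [script for script in DESTRUCTIVE if script in names]
```

An empty result means the command passes through untouched, which is how a guard like this can add no overhead to ordinary commands.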
All working state lives under <DATA_DIR> (default <MUSIC_ROOT>/.data/). Every file here is regenerable — safe to delete if you want to start over.
| File | Written by | Purpose |
|---|---|---|
| metadata_catalog.json | 01A | ID3 tag dump per file (pre-Shazam) |
| shazam_final_results.json | 01D | Acoustic identification for every file |
| shazam_hash_results.json | 01C | Hash-keyed Shazam cache |
| catalog.json | 02A | Master SHA-256 catalog with language classification |
| enrichment_report.json | 04I | Per-file enrichment status (art + lyrics booleans) |
| mismatch_report.json | 04I | Files where iTunes returned a mismatched track |
| final_catalog.json | 05I | Read-only master catalog: the single source of truth for Phase 6 and Phase 7 |
| confidence_report.json | 05C | Audit confidence scores |
| .spotify_token_cache | 06B | Cached OAuth token (gitignored) |
| spotify_sync_state.json | 06D | Resumable per-file sync state |
| discovery_sync_state.json | 06E | Resumable per-artist discovery state |
| spotify_backups/ | 06C | Rollback snapshot of every Spotify playlist |
| spotify_sync.log | 06D | Full sync log |
| discovery_sync.log | 06E | Discovery sync log |
01A -> metadata_catalog.json
01D -> shazam_final_results.json, shazam_hash_results.json
02A -> catalog.json (merges metadata + shazam + hashes)
04I -> enrichment_report.json, mismatch_report.json (reads catalog, writes enriched ID3 tags to files)
05I -> final_catalog.json (merges all sources into canonical read-only output)
06D -> spotify_sync_state.json (reads final_catalog)
06E -> discovery_sync_state.json (reads final_catalog + Spotify listening history)
shazamio-core fails to install / "no matching wheel".
You are not on Python 3.12. python --version inside your activated venv should say 3.12.x. Recreate the venv with py -3.12 -m venv .venv on Windows, or python3.12 -m venv .venv on macOS / Linux.
Shazam returns HTTP 429.
Drop SHAZAM_CONCURRENCY in your .env to 5 or 10 and rerun 01D. The script is resumable — it skips anything already identified.
iTunes returns HTTP 403.
Add more countries for the enricher to rotate through, for example ITUNES_COUNTRIES=US,GB,AU,CA in your .env. 04I already retries with backoff.
[config] MUSIC_ROOT does not exist.
Your .env is not being read, or MUSIC_ROOT is set to a path that doesn't exist. python config.py will tell you exactly which path it tried.
Every artist ends up under "English".
You need config/language_hints/*.json files for the languages you care about. Out of the box langdetect can only tell apart actual script families — transliterated Hindi, romanised Japanese, and similar cases need hints. Copy config/language_hints/examples/ templates and edit.
Spotify 403 when creating a playlist.
Your developer app is in Development Mode. Go to the Spotify Dashboard -> your app -> Users and Access -> add your Spotify account email. Then rerun.
Files are "missing" after a run.
Nothing is ever deleted by the canonical pipeline. Check Duplicates_Staging/ first — that is where duplicates go to wait for your review.
JSON data files are corrupted (trailing commas, truncated).
A prior script was killed mid-write. Run python 05A_repair_json.py to sanitize the data stores, then resume the pipeline from wherever you left off.
Enrichment seems stuck or slow.
04I respects iTunes rate limits with exponential backoff. If it's cycling through 429 retries, add more country codes to ITUNES_COUNTRIES in your .env to spread the load. The script is resumable — safe to kill and restart.
- Resumable by construction. Every long-running script writes state to disk after each unit of work and picks up where it left off on the next run. You can kill 01D, 04I, 06D, 06E at any point with Ctrl-C without losing progress.
- Nothing is deleted without permission. The canonical pipeline (01-04, 05I, 06) only moves files. Deletions are gated behind explicit 05D, 05F, absolute_zero_sort, total_scrub runs; the README flags each of these as destructive, and their docstrings should be read before running.
- Cross-validation over blind trust. Existing ID3 tags are not thrown away; they are cross-validated against Shazam's acoustic fingerprint. Where Shazam confirms the tags, they stay. Where it disagrees, the acoustic result wins. Where Shazam can't identify a track, the tags are sanitised and used as a fallback.
- Single source of truth for config. Everything that's user-tunable is in config.py, which reads from environment variables (optionally via .env). No magic constants hidden in individual scripts.
- No credentials in code. Spotify keys come from env vars only. A grep for client IDs across the repo returns zero hits; the only place they can possibly live is your local .env, which is gitignored.
- Language-agnostic. Zero hardcoded language names anywhere in the Python. All language knowledge is loaded at runtime from config/language_hints/*.json.
- Script status is documented inline. Every .py file's top-of-file docstring begins with a Status: line (CANONICAL / HISTORICAL / UTILITY / DESTRUCTIVE UTILITY / LIBRARY) and a Run if: line. You do not need to read the code to know whether to run a script.
- Historical scripts are preserved. The 04A-04I evolution is kept in the repo as a chronological record of every edge case the enrichment pipeline encountered. Reading them in order is the fastest way to understand the problem space.
MIT License
Copyright (c) 2026 Rohit Burani
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.