Two music discovery scripts that scan your local collection and generate HTML reports from Last.fm data. Run them from the command line or let a Docker-hosted web dashboard trigger and schedule them for you.
- Missing Popular Albums: for every artist you own, finds the single highest-playcount album or EP you don't have yet.
- Discover Similar Artists: queries Last.fm for artists similar to those in your collection, filters out anything you already own, and surfaces the top recommendations, each shown with the artist's most popular album.
- Docker (recommended) or Python 3.12+
- A Last.fm API key (free)
- A Navidrome instance or a local music directory
Docker gives you a web dashboard to trigger runs, watch live logs, view reports, and schedule automatic scans — no Python setup required.
```bash
git clone https://github.com/cdeschenes/cratedigger.git
cd cratedigger
cp .env.example .env
# Edit .env. At minimum, set:
#   LASTFM_API_KEY   your Last.fm API key
#   AUTH_PASS        password for the web UI
#   SECRET_KEY       any long random string
#   NAVIDROME_*      Navidrome connection (or set MUSIC_ROOT to your local music path)
docker compose up -d --build
```

Open http://localhost:8080 and sign in with `admin` and your `AUTH_PASS`.
Music folder: if you're not using Navidrome, uncomment the volume line in `docker-compose.yaml` and point it at your music directory.
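`SECRET_KEY` only needs to be a long random string. If you don't have a generator handy, Python's standard `secrets` module works. The helper name and the 48-byte length below are illustrative choices, not values the app requires.

```python
import secrets


def make_secret_key(nbytes: int = 48) -> str:
    """Return a URL-safe random string to paste into .env as SECRET_KEY.

    make_secret_key is a hypothetical helper, not part of cratedigger.
    """
    # token_urlsafe base64url-encodes nbytes of OS randomness and drops '=' padding
    return secrets.token_urlsafe(nbytes)


if __name__ == "__main__":
    print(f"SECRET_KEY={make_secret_key()}")
```

Run it once and paste the printed line into `.env`.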
Run the scripts directly without Docker. Reports are written as standalone HTML files.
```bash
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env
# Edit .env: set LASTFM_API_KEY and NAVIDROME_* (or MUSIC_ROOT)
python missing_popular_albums.py
python discover_similar_artists.py
```

Reports are written to `missing_popular_albums.html` and `discover_similar_artists.html` in the current directory (overridable via `.env`).
All settings live in .env. Copy .env.example to get started — required fields are clearly marked at the top. When using Docker, output paths are pinned to the /data volume automatically and should not be set manually.
| Variable | Default | Required | Purpose |
|---|---|---|---|
| `LASTFM_API_KEY` | | Yes | Last.fm API key. Get one at last.fm/api. |
| `MUSIC_ROOT` | `/Volumes/NAS/Media/Music/Music_Server` | Fallback | Filesystem path to scan when Navidrome is not configured. |
| `NAVIDROME_URL` | | No | Base URL of your Navidrome instance, e.g. `https://navidrome.example.com` |
| `NAVIDROME_USER` | | No | Navidrome username |
| `NAVIDROME_PASS` | | No | Navidrome password |
| `NAVIDROME_MUSIC_FOLDER` | | No | Navidrome library name to restrict the scan to. If set and the name doesn't match, the script aborts and lists the available names. Leave empty to scan all libraries. |
All three of NAVIDROME_URL, NAVIDROME_USER, and NAVIDROME_PASS must be set to use Navidrome. If any are missing, the script falls back to the filesystem scan.
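The fallback rule amounts to an all-or-nothing check. A minimal sketch of that decision, assuming blank values count as missing; `choose_library_source` and `NAVIDROME_KEYS` are hypothetical names, not the scripts' own code.

```python
import os
from collections.abc import Mapping

NAVIDROME_KEYS = ("NAVIDROME_URL", "NAVIDROME_USER", "NAVIDROME_PASS")


def choose_library_source(env: Mapping[str, str]) -> str:
    """Return 'navidrome' only when all three connection settings are present."""
    # Assumption: a blank or whitespace-only value counts as missing
    if all(env.get(key, "").strip() for key in NAVIDROME_KEYS):
        return "navidrome"
    return "filesystem"  # fall back to scanning MUSIC_ROOT


if __name__ == "__main__":
    print(choose_library_source(os.environ))
```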
| Variable | Default | Purpose |
|---|---|---|
| `FUZZ_THRESHOLD` | `90` | Fuzzy-match sensitivity (0–100). Lower = more permissive matching. Rarely needs changing. |
| `DEFAULT_WORKERS` | `4` | Concurrent Last.fm requests per run. |
| `MAX_WORKERS` | `8` | Upper bound enforced by `--workers`. |
| `TOP_ALBUM_LIMIT` | `25` | How many of an artist's top albums to fetch from Last.fm. |
| `REQUEST_TIMEOUT` | `15` | HTTP timeout in seconds. |
| `REQUEST_DELAY_MIN` | `0.15` | Minimum random delay between Last.fm requests, in seconds. |
| `REQUEST_DELAY_MAX` | `0.3` | Maximum random delay between Last.fm requests, in seconds. |
| `MAX_RETRIES` | `3` | Retry attempts on Last.fm API errors. |
| `CACHE_VERSION` | `2` | Internal. Increment to force a full cache refresh after structural changes. |
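To get a feel for what `FUZZ_THRESHOLD` controls, here is a sketch of threshold-based title matching on a 0–100 scale. It uses the standard library's `difflib.SequenceMatcher` as a stand-in scorer; the scripts may use a different fuzzy-matching algorithm and normalization, so real scores will not be identical. `normalize`, `similarity`, and `already_owned` are hypothetical names.

```python
import re
import unicodedata
from difflib import SequenceMatcher

FUZZ_THRESHOLD = 90  # documented default


def normalize(title: str) -> str:
    """Casefold, strip accents and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", title)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    no_punct = re.sub(r"[^\w\s]", " ", no_accents.casefold())
    return " ".join(no_punct.split())


def similarity(a: str, b: str) -> float:
    """Score two titles on the same 0-100 scale as FUZZ_THRESHOLD."""
    return 100 * SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def already_owned(candidate: str, owned: list[str], threshold: int = FUZZ_THRESHOLD) -> bool:
    """Treat a Last.fm album as owned if any local title scores at or above the threshold."""
    return any(similarity(candidate, title) >= threshold for title in owned)
```

Lowering the threshold makes more Last.fm titles count as owned, so fewer albums are reported as missing.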
| Variable | Default | Purpose |
|---|---|---|
| `HTML_OUT` | `missing_popular_albums.html` | Output path for the missing albums report. |
| `CACHE_FILE` | `.cache/lastfm_top_albums.json` | Cache file for Last.fm top-album data. |
| `LOG_FILE` | `missing_popular_albums.log` | Log file for `missing_popular_albums.py`. |
| `DISCOVER_HTML_OUT` | `discover_similar_artists.html` | Output path for the similar artists report. |
| `DISCOVER_CACHE_FILE` | `.cache/similar_artists.json` | Cache file for similar-artist and tag data. |
| `DISCOVER_LOG_FILE` | `discover_similar_artists.log` | Log file for `discover_similar_artists.py`. |
| Variable | Default | Purpose |
|---|---|---|
| `SUGGESTIONS_PER_ARTIST` | `2` | Maximum number of candidate artists to collect per local artist from Last.fm's similar-artist list. |
| `SIMILAR_ARTIST_LIMIT` | `30` | How many similar artists Last.fm returns per query before filtering. |
| `DISCOVER_TAG_OVERLAP` | `1` | Minimum number of Last.fm genre tags a candidate must share with at least one source artist. Set to `0` to disable genre filtering entirely. |
| `DISCOVER_SIMILARITY_MODE` | `lastfm` | `lastfm`: sort by Last.fm shared-listener score (default). `tags`: re-score candidates by genre-tag Jaccard similarity and drop zero-overlap matches, to fix cross-genre mismatches. |
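Jaccard similarity is the number of shared tags divided by the number of distinct tags across both artists, so it runs from 0 (nothing in common) to 1 (identical tag sets). The sketch below shows one way the `DISCOVER_TAG_OVERLAP` filter and `tags` mode could combine. The data shape, the function names, and scoring each candidate against its best-matching source artist are assumptions, not the script's actual internals.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """len(A & B) / len(A | B), or 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_by_tags(
    candidates: dict[str, tuple[set[str], list[set[str]]]],
    min_overlap: int = 1,
) -> list[tuple[str, float]]:
    """Filter and rank candidates, best score first.

    candidates maps a candidate artist to (its tags, tag sets of the local
    artists that suggested it). This data shape is a hypothetical example.
    """
    ranked: list[tuple[str, float]] = []
    for name, (tags, source_tag_sets) in candidates.items():
        # DISCOVER_TAG_OVERLAP: need >= min_overlap shared tags with at least one source
        if min_overlap > 0 and not any(len(tags & src) >= min_overlap for src in source_tag_sets):
            continue
        # tags mode: score against the best-matching source, drop zero-overlap matches
        score = max((jaccard(tags, src) for src in source_tag_sets), default=0.0)
        if score > 0:
            ranked.append((name, score))
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked
```

With a source artist tagged `shoegaze` and `dream pop`, a candidate tagged `shoegaze`, `dream pop`, `alternative` scores 2/3, while one tagged only `dubstep` and `electronic` is dropped.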
| Variable | Default | Purpose |
|---|---|---|
| `AUTH_USER` | `admin` | HTTP Basic Auth username. |
| `AUTH_PASS` | | HTTP Basic Auth password. Required. An empty string means every login attempt fails. |
| `SCHEDULE_MISSING` | | 5-field cron expression for automatic runs of `missing_popular_albums.py`. Empty = disabled. Example: `0 3 * * 0` (Sunday 3 AM). |
| `SCHEDULE_DISCOVER` | | 5-field cron expression for automatic runs of `discover_similar_artists.py`. Empty = disabled. |
| `DATA_DIR` | `/data` | Directory where the web app looks for reports. Set automatically in Docker. |
| `SPOTIFY_CLIENT_ID` | | Spotify app client ID. Enables Spotify embeds in the viewer. Registering a developer app requires a Spotify Premium account (as of Feb 2026). |
| `SPOTIFY_CLIENT_SECRET` | | Spotify app client secret. |
| `YOUTUBE_API_KEY` | | YouTube Data API v3 key. Enables YouTube embeds in the viewer. The key must have the Data API v3 enabled in Google Cloud Console (not the IFrame Player API). |
| `SLSKD_URL` | | Base URL of your SLSKD instance, e.g. `https://slskd.yourdomain.com`. Enables the SLSKD search button on every card. |
| `SLSKD_API_KEY` | | API key for SLSKD (set in `appsettings.yml` under `web.authentication.api_keys`). Preferred over username/password. |
| `SLSKD_USER` / `SLSKD_PASS` | | Fallback credentials if not using an API key. |
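The `SCHEDULE_*` values use 5-field cron syntax (minute, hour, day of month, month, day of week). The sketch below matches such an expression against a timestamp, which can help sanity-check a schedule before setting it. It is a simplified, hypothetical matcher, not the app's scheduler: it supports `*`, numbers, ranges, steps, and comma lists but not names such as `SUN`, and it requires day-of-month and day-of-week to both match, whereas classic cron accepts either one when both are restricted.

```python
from datetime import datetime

# (low, high) bounds for minute, hour, day-of-month, month, day-of-week.
# Day-of-week accepts 0-7 so both 0 and 7 can mean Sunday.
FIELD_BOUNDS = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 7)]


def expand_field(spec: str, low: int, high: int) -> set[int]:
    """Expand '*', 'N', 'A-B', '*/S', 'A-B/S' and comma lists into a set of values."""
    values: set[int] = set()
    for part in spec.split(","):
        base, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if base == "*":
            start, end = low, high
        elif "-" in base:
            start_text, end_text = base.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = int(base)
            end = high if step_text else start  # 'N/S' means N, N+S, ... up to high
        if step < 1 or start < low or end > high or start > end:
            raise ValueError(f"invalid cron field {spec!r}")
        values.update(range(start, end + 1, step))
    return values


def cron_matches(expr: str, when: datetime) -> bool:
    """True if `when` (to the minute) is a scheduled time; an empty expression means disabled."""
    if not expr.strip():
        return False
    fields = expr.split()
    if len(fields) != 5:
        raise ValueError("expected a 5-field cron expression")
    minutes, hours, days, months, weekdays = (
        expand_field(field, low, high) for field, (low, high) in zip(fields, FIELD_BOUNDS)
    )
    cron_weekday = (when.weekday() + 1) % 7  # Python: Monday=0; cron: Sunday=0
    return (
        when.minute in minutes
        and when.hour in hours
        and when.day in days
        and when.month in months
        and cron_weekday in {d % 7 for d in weekdays}
    )
```

For example, `cron_matches("0 3 * * 0", datetime(2026, 3, 1, 3, 0))` is true because 1 March 2026 is a Sunday.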
Both scripts share these flags:
| Flag | Purpose |
|---|---|
| `--no-cache` | Ignore cached Last.fm data and re-fetch everything. Fresh cache is still written after the run. |
| `--limit-artists N` | Process only the first N artists alphabetically. Useful for testing without a full run. |
| `--workers N` | Number of concurrent Last.fm requests (1 to `MAX_WORKERS`). |
missing_popular_albums.py also accepts:
| Flag | Purpose |
|---|---|
| `--trace-artist "Name"` | Print the Navidrome filesystem paths for every album by that artist, then exit. Requires Navidrome to be configured. |
The Docker container runs a FastAPI app on port 8080. The dashboard shows job status (idle / running / succeeded / failed), lets you trigger runs on demand, and streams live log output to the browser.
The Report Viewer (/) shows both reports as paginated card grids with AJAX navigation (Prev / Next updates the cards without a full page reload, and browser back/forward works). Each card includes:
- Streaming preview: hover the album art to reveal service icons (Apple Music, Spotify, YouTube). Click one to open an embedded player directly inside the card. No credentials are needed for Apple Music; Spotify and YouTube require `SPOTIFY_*` / `YOUTUBE_API_KEY` in your `.env`.
- Copy: copies the artist and album title to the clipboard.
- Dismiss: hides the card permanently. Dismissed items are stored in `/data/dismissed.json` and are also excluded from future script runs.
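Because dismissals also apply to future script runs, candidates have to be filtered against `/data/dismissed.json`. A minimal sketch, assuming the file holds a JSON list of objects with `artist` and `album` keys and that matching ignores case (neither detail is documented here); `parse_dismissed`, `load_dismissed`, and `drop_dismissed` are hypothetical names.

```python
import json
from pathlib import Path

Key = tuple[str, str]


def _key(item: dict) -> Key:
    # Compare case-insensitively so "Radiohead" and "radiohead" count as the same artist
    return (item["artist"].casefold(), item.get("album", "").casefold())


def parse_dismissed(text: str) -> set[Key]:
    """Turn the JSON list of dismissed items into a set of lookup keys."""
    return {_key(entry) for entry in json.loads(text)}


def load_dismissed(path: Path) -> set[Key]:
    """Read the dismissed list; a missing file simply means nothing is dismissed."""
    if not path.exists():
        return set()
    return parse_dismissed(path.read_text(encoding="utf-8"))


def drop_dismissed(items: list[dict], dismissed: set[Key]) -> list[dict]:
    """Filter out anything the user has dismissed before it reaches a report."""
    return [item for item in items if _key(item) not in dismissed]
```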
| Method | Path | Auth | Purpose |
|---|---|---|---|
| GET | `/healthz` | None | Docker health check |
| GET | `/` | Basic | Combined report viewer (home) |
| GET | `/dashboard` | Basic | Script run dashboard |
| GET | `/report/{missing\|discover}` | Basic | Serve the generated HTML report |
| POST | `/run/{missing\|discover}` | Basic | Trigger a script run. Returns 409 if already running. |
| GET | `/status/{missing\|discover}` | Basic | JSON job status snapshot |
| GET | `/logs/{missing\|discover}` | Basic | SSE live log stream |
| GET | `/api/section/{section}` | Basic | AJAX partial: card grid + pager for one section |
| GET | `/api/stream-info` | Basic | Look up the streaming embed URL for an album |
| POST | `/api/slskd-search` | Basic | Queue an album search on a running SLSKD instance |
| POST | `/dismiss` | Basic | Add an item to the dismissed list |
| DELETE | `/dismiss` | Basic | Remove an item from the dismissed list |
| GET | `/dismissed` | Basic | Return the full dismissed list |
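The run and status endpoints can also be scripted. The sketch below uses only the standard library's `urllib` to send Basic-authenticated requests. `BASE_URL`, `make_request`, `trigger_run`, and `job_status` are hypothetical; the base URL assumes the default port from the Docker setup, and since the JSON shape returned by `/status/{job}` isn't documented here, it is returned as-is.

```python
import base64
import json
import urllib.error
import urllib.request

BASE_URL = "http://localhost:8080"  # assumption: default Docker port mapping


def make_request(path: str, user: str, password: str, method: str = "GET") -> urllib.request.Request:
    """Build a request carrying an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    body = b"" if method == "POST" else None  # send an explicit empty body on POST
    return urllib.request.Request(
        BASE_URL + path,
        data=body,
        method=method,
        headers={"Authorization": f"Basic {token}"},
    )


def trigger_run(job: str, user: str, password: str) -> bool:
    """POST /run/{job}; returns False when the job is already running (HTTP 409)."""
    try:
        with urllib.request.urlopen(make_request(f"/run/{job}", user, password, "POST"), timeout=10):
            return True
    except urllib.error.HTTPError as err:
        if err.code == 409:
            return False
        raise


def job_status(job: str, user: str, password: str) -> dict:
    """GET /status/{job} and decode the JSON snapshot as-is."""
    with urllib.request.urlopen(make_request(f"/status/{job}", user, password), timeout=10) as resp:
        return json.load(resp)
```

For example, `trigger_run("missing", "admin", os.environ["AUTH_PASS"])` starts a run, or returns `False` if one is already in progress.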
- HTTP Basic Auth uses `secrets.compare_digest()`, a constant-time comparison, so response timing doesn't reveal how much of a guessed credential was correct.
- `AUTH_PASS` defaults to an empty string, which causes every login to fail until you set it. This is intentional.
- HTTPS is handled by Traefik. The app itself speaks plain HTTP on port 8080.
- OpenAPI docs are disabled (`/docs` and `/redoc` return 404).
- The container runs as UID 1002 / GID 990 (non-root).
- The NAS music volume is mounted read-only.
- Scripts are launched via the `subprocess` list form, so there is no shell interpolation.
Last.fm responses are cached in .cache/ as JSON files. Typical size is 17–36 MB for a large library. In Docker, the cache directory lives at /data/.cache/ and survives container restarts via the volume mount.
--no-cache skips reading the cache but still writes fresh data at the end. Cache version is embedded in each file — a version mismatch causes a full re-fetch on the next run, which also overwrites the old cache.
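A sketch of the versioned-cache behaviour: a version mismatch, like `--no-cache`, yields an empty cache so everything is re-fetched, and fresh data is written back with the current version. The `{"version": ..., "entries": ...}` layout and the function names are assumptions; the real cache files may be structured differently.

```python
import json
import os
import tempfile
from pathlib import Path

CACHE_VERSION = 2


def load_cache(path: Path, no_cache: bool = False) -> dict:
    """Return cached entries, or {} when skipping, missing, corrupt, or version-mismatched."""
    if no_cache or not path.exists():
        return {}
    try:
        payload = json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        return {}
    if not isinstance(payload, dict) or payload.get("version") != CACHE_VERSION:
        return {}  # structural change: force a full re-fetch
    return payload.get("entries", {})


def save_cache(path: Path, entries: dict) -> None:
    """Write to a temp file and swap it in, so an interrupted run can't leave a half-written cache."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump({"version": CACHE_VERSION, "entries": entries}, fh)
        os.replace(tmp_name, path)
    except BaseException:
        os.unlink(tmp_name)
        raise
```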
NAVIDROME_MUSIC_FOLDER not found error on startup
The name you set must match a Navidrome library name exactly, although the comparison ignores case. The error message lists the available names. Run `--trace-artist "Name"` with an artist you own to verify the Navidrome connection is working, or leave `NAVIDROME_MUSIC_FOLDER` empty to scan all libraries.
Report is empty or has far fewer entries than expected
Check the log file: artists with no Last.fm data log the message `No albums found on Last.fm`. If the library scan returned zero artists, verify that `MUSIC_ROOT` exists and contains audio files, or that the Navidrome credentials are correct. Run with `--limit-artists 5` first to confirm the pipeline works end-to-end.
Cache seems stale after a library change
Run with --no-cache to force fresh Last.fm data. The cache doesn't auto-expire — it only updates when an artist is looked up and the cache misses.
Auth not working in the web app
Confirm AUTH_PASS is set in .env and the container was restarted after the change. An empty AUTH_PASS rejects every login attempt by design.
SSE log stream stops immediately or never connects
The app sets the `X-Accel-Buffering: no` response header to prevent Traefik and nginx from buffering the stream. If an intermediate proxy doesn't honor this header, disable response buffering in its configuration. Refreshing the dashboard mid-run replays the buffered log output (up to 2000 lines) before live streaming resumes.
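The replay-then-stream behaviour can be pictured with a bounded buffer. A minimal sketch, assuming a `deque` capped at the documented 2000 lines and the standard Server-Sent Events wire format (`data:` lines terminated by a blank line); `LogReplayBuffer`, `sse_event`, and `SSE_HEADERS` are hypothetical names, not the app's code.

```python
from collections import deque

# Headers commonly sent with an SSE response; X-Accel-Buffering: no asks
# nginx-style proxies not to buffer the stream.
SSE_HEADERS = {
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache",
    "X-Accel-Buffering": "no",
}


class LogReplayBuffer:
    """Keep the most recent log lines so a reconnecting client can catch up."""

    def __init__(self, max_lines: int = 2000) -> None:
        # deque with maxlen silently discards the oldest line once full
        self._lines: deque[str] = deque(maxlen=max_lines)

    def append(self, line: str) -> None:
        self._lines.append(line.rstrip("\n"))

    def replay(self) -> list[str]:
        return list(self._lines)


def sse_event(text: str) -> str:
    """Format text as one SSE event: a 'data:' field per line, ended by a blank line."""
    return "".join(f"data: {part}\n" for part in text.split("\n")) + "\n"
```

On reconnect, a server would send `sse_event(line)` for each line in `replay()` before forwarding new output.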
- docs/USERGUIDE.md — detailed setup, configuration reference, viewer features, and troubleshooting
- docs/ROADMAP.md — planned features and known behaviors
- CHANGELOG.md — version history


