Add (library, library_version) index on latest #74
Merged
The PRIMARY KEY on `latest` orders columns as (library, compiler,
library_version, ...), with `compiler` between `library` and
`library_version`. Lookups by (library, library_version) -- the new
/failedbuilds/:lib/:ver endpoint -- can only use the leading `library`
prefix and degrade to a partial scan filtered by library_version. On the
177K-row production DB on EBS gp2, that's ~11s on cold cache for popular
libraries.
Live timing of the existing endpoint:
GET /whathasfailedbefore (PK lookup) 86ms
GET /failedbuilds/boost_bin/1.85.0 (cold) 11.4s
GET /failedbuilds/boost_bin/1.85.0 (warm) 130ms
GET /failedbuilds/fmt/10.0.0 (cold lib) 987ms
EXPLAIN QUERY PLAN with the new index, verified locally:
SEARCH latest USING INDEX idx_latest_lib_ver (library=? AND library_version=?)
This was eating most of the speedup the infra-side bulk-fetch change was
meant to deliver (compiler-explorer/infra#2100): a single 11s cold fetch
roughly cancels out the ~80s saved across 1740 cached per-iteration
lookups in a boost_bin run.
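The prefix behaviour described above can be reproduced with a small local sketch. The table below is a simplified stand-in: the primary-key column order and the index name come from this PR, while the column types and the `commithash` column's placement are assumptions made only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Simplified stand-in for `latest`; column types are assumptions.
conn.execute("""
    CREATE TABLE latest (
        library TEXT, compiler TEXT, library_version TEXT,
        compiler_version TEXT, arch TEXT, libcxx TEXT, compiler_flags TEXT,
        commithash TEXT,
        PRIMARY KEY (library, compiler, library_version, compiler_version,
                     arch, libcxx, compiler_flags)
    )
""")

QUERY = ("SELECT compiler, commithash FROM latest "
         "WHERE library = ? AND library_version = ?")

def plan(sql):
    """Return the detail column(s) of EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql,
                        ("boost_bin", "1.85.0")).fetchall()
    return " | ".join(row[-1] for row in rows)

# Without the new index only the leading `library` column of the PK is usable.
before = plan(QUERY)

conn.execute("CREATE INDEX IF NOT EXISTS idx_latest_lib_ver "
             "ON latest (library, library_version)")

# With the index, both columns appear in the search constraint.
after = plan(QUERY)

print("before:", before)
print("after: ", after)
```

The exact wording of the plan text varies between SQLite versions, so the sketch only prints it rather than comparing against a fixed string.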
Pull request overview
Adds a targeted SQLite index to speed up (library, library_version) lookups against the latest table, addressing slow cold-cache performance for the /failedbuilds/:lib/:ver endpoint caused by the current composite primary key column order.
Changes:
- Add a new migration that creates an index on `latest(library, library_version)` (idempotent via `IF NOT EXISTS`).
- Add a corresponding down migration to drop the index (also guarded by `IF EXISTS`).
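As a rough illustration of the up/down pair listed above, the sketch below runs each statement twice to show that the guards make them safe to repeat. The index name is taken from the EXPLAIN output quoted in this PR; the exact contents of the migration files are an assumption, and the table is a cut-down placeholder.

```python
import sqlite3

# Assumed migration statements; the real files may differ in wording.
UP = ("CREATE INDEX IF NOT EXISTS idx_latest_lib_ver "
      "ON latest (library, library_version)")
DOWN = "DROP INDEX IF EXISTS idx_latest_lib_ver"

def index_names(conn):
    """Names of all indexes currently attached to the `latest` table."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'index' AND tbl_name = 'latest'"
    ).fetchall()
    return {name for (name,) in rows}

conn = sqlite3.connect(":memory:")
# Placeholder table with no primary key, so no auto-generated index exists.
conn.execute("CREATE TABLE latest (library TEXT, compiler TEXT, library_version TEXT)")

conn.execute(UP)
conn.execute(UP)      # repeated run is a no-op thanks to IF NOT EXISTS
after_up = index_names(conn)

conn.execute(DOWN)
conn.execute(DOWN)    # repeated run is a no-op thanks to IF EXISTS
after_down = index_names(conn)

print(after_up, after_down)
```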
mattgodbolt added a commit that referenced this pull request on May 6, 2026
…75)

#72 added /failedbuilds and #74 added a (library, library_version) index. The index made finding rows fast, but SQLite still has to read each matched row to extract (compiler, compiler_version, arch, libcxx, compiler_flags, commithash). The `latest` table has a `logging` column holding tens-of-KB build-log text per row, so reading 100+ rows means ~10MB of mostly-discarded payload. On the production DB (17GB on EBS gp2) the cold-cache cost is 8-15s per fetch -- which dominates the library-builder hot path on a multi-version library run.

A covering index containing all SELECT columns turns the query index-only -- SQLite never touches the table. Index size ~26MB on the production data; fits in OS page cache, immune to gunicorn's I/O churn against the rest of the conan_server tree.

The previous narrower index is dropped because the new one's leading prefix (library, library_version) serves the same lookups.

Verified locally: EXPLAIN QUERY PLAN ->
SEARCH latest USING COVERING INDEX idx_latest_failedbuilds (library=? AND library_version=? AND success=?)

The PK-based hasFailedBefore lookup is unchanged (still uses the auto-generated PK index).

Refs compiler-explorer/infra#1342.
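A minimal sketch of the covering-index effect described in this commit message follows. The index name, the WHERE columns, and the SELECT columns are taken from the text above; the column types and the order of the trailing columns inside the index are assumptions, since the actual migration is not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Simplified stand-in for `latest`; column types are assumptions.
conn.execute("""
    CREATE TABLE latest (
        library TEXT, compiler TEXT, library_version TEXT,
        compiler_version TEXT, arch TEXT, libcxx TEXT, compiler_flags TEXT,
        success INTEGER, commithash TEXT, logging TEXT,
        PRIMARY KEY (library, compiler, library_version, compiler_version,
                     arch, libcxx, compiler_flags)
    )
""")

# Lookup columns first, then every column the query selects (assumed order).
conn.execute("""
    CREATE INDEX idx_latest_failedbuilds ON latest (
        library, library_version, success,
        compiler, compiler_version, arch, libcxx, compiler_flags, commithash
    )
""")

WHERE = "FROM latest WHERE library = ? AND library_version = ? AND success = ?"

def plan(columns):
    """Query plan text for selecting `columns` with the failedbuilds filter."""
    rows = conn.execute(f"EXPLAIN QUERY PLAN SELECT {columns} {WHERE}",
                        ("boost_bin", "1.85.0", 0)).fetchall()
    return " | ".join(row[-1] for row in rows)

# Every selected column lives in the index, so the table is never read.
narrow = plan("compiler, compiler_version, arch, libcxx, compiler_flags, commithash")

# `logging` is not in the index, so SQLite must fetch each table row.
wide = plan("logging")

print("narrow:", narrow)
print("wide:  ", wide)
```

Selecting `logging` is included only as a contrast: the same index still locates the rows, but the plan is no longer index-only because the large column has to be read from the table.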
The `/failedbuilds/:lib/:ver` endpoint added in #72 takes ~11s on cold cache for popular libraries on the production DB, despite returning only a few KB of JSON. Cause: the `latest` table's primary key is `(library, compiler, library_version, compiler_version, arch, libcxx, compiler_flags)` -- with `compiler` between `library` and `library_version`. Lookups by `(library, library_version)` can only use the leading `library` prefix, so SQLite scans every row for that library and filters by `library_version`. On a 177K-row DB on EBS gp2 that's seconds.

Live timing (current production)

Verification (local)
Applied the migration to a fresh DB, populated rows, ran `EXPLAIN QUERY PLAN`:
vs. (without index):
Existing `hasFailedBefore` PK lookup unchanged (still uses the autoindex).

Why this matters
This is the missing piece for compiler-explorer/infra#2100 -- the bulk-fetch was supposed to drop ~80s off a boost_bin × popular-compilers-only run, but on the live system the saved per-iteration time is approximately cancelled out by the single 11s cold fetch. Adding the index should bring cold-cache fetches into the milliseconds range.

Deployment
The migration framework (`sqlite/migrations`) tracks applied migrations in a `migrations` table, so this only runs once. `CREATE INDEX IF NOT EXISTS` is idempotent defence in depth.

After deploy, re-time:
Expect: well under 1s on cold cache.
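
The run-once behaviour mentioned under Deployment can be sketched roughly as follows. This is not the project's migration framework: the `migrations` table schema, the migration name, and the `apply_pending` helper are all hypothetical and only illustrate the idea of recording applied migrations.

```python
import sqlite3

# Hypothetical migration list: (name, SQL). The name is invented for this sketch.
MIGRATIONS = [
    ("add_latest_lib_ver_index",
     "CREATE INDEX IF NOT EXISTS idx_latest_lib_ver "
     "ON latest (library, library_version)"),
]

def apply_pending(conn, migrations):
    """Run migrations not yet recorded in `migrations`; return the names run."""
    # Assumed bookkeeping schema: one row per applied migration.
    conn.execute("CREATE TABLE IF NOT EXISTS migrations (name TEXT PRIMARY KEY)")
    applied = {name for (name,) in conn.execute("SELECT name FROM migrations")}
    ran = []
    for name, sql in migrations:
        if name in applied:
            continue  # already recorded, so skip it
        conn.execute(sql)
        with conn:  # commits the bookkeeping row once the SQL has succeeded
            conn.execute("INSERT INTO migrations (name) VALUES (?)", (name,))
        ran.append(name)
    return ran

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE latest (library TEXT, compiler TEXT, library_version TEXT)")

first = apply_pending(conn, MIGRATIONS)    # runs the migration
second = apply_pending(conn, MIGRATIONS)   # nothing left to run
print(first, second)
```

If the bookkeeping row were ever lost, re-running the statement would still be harmless because of `IF NOT EXISTS`, which is the defence-in-depth point made above.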