
Always build policy_data.db from source #700

Open
baogorek wants to merge 1 commit into main from fix/build-database-from-source


baogorek (Collaborator) commented Apr 7, 2026

Summary

  • Remove the policy_data.db download from Hugging Face in download_private_prerequisites.py; the DB is now always built from source via make database
  • Add database as a dependency of make data so every data build produces a fresh DB matching the current ETL code
  • Add make database step to Modal data_build.py (and install make in the Modal image)
  • Remove upload-database and promote-database Makefile targets since HF is no longer the DB source of truth
  • Remove database entry from download_calibration_inputs() in huggingface.py
  • Update DATABASE_GUIDE.md to reflect the new provenance
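The intended ordering (build the DB from source first, then run the rest of the data build) can be sketched in Python as below. This is a minimal sketch, not the contents of data_build.py: the names `run_data_build`, `run_checked`, and the `later_steps` parameter are hypothetical, and the only thing taken from this PR is that `make database` runs before the remaining steps.

```python
import subprocess
from typing import Callable, Sequence

# Hypothetical runner type: takes an argv list and raises on failure.
Runner = Callable[[Sequence[str]], None]


def run_checked(argv: Sequence[str]) -> None:
    """Run a command, raising CalledProcessError on a non-zero exit."""
    subprocess.run(list(argv), check=True)


def run_data_build(
    later_steps: Sequence[Sequence[str]], run: Runner = run_checked
) -> list[list[str]]:
    """Build policy_data.db from source, then run the remaining steps.

    `later_steps` stands in for whatever the real build does next; it is an
    assumption of this sketch. If any step raises, later steps do not run.
    """
    executed: list[list[str]] = []
    # Always rebuild the DB first so it matches the current ETL code,
    # instead of downloading a possibly stale copy from Hugging Face.
    for argv in [["make", "database"], *later_steps]:
        run(argv)
        executed.append(list(argv))
    return executed
```

Passing a fake runner (for example `run=lambda argv: None`) lets the ordering be checked without invoking make; because `run_checked` uses `check=True`, a failed `make database` stops the build before anything consumes a missing or outdated DB.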

Test plan

  • ruff format --check and ruff check pass on all modified Python files
  • No remaining policy_data.db download references from HF (verified via grep)
  • All 546 unit tests pass
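The grep check in the test plan could also be kept as a small Python helper, for example to run in CI. This is a hedged sketch, not the check that was actually run: the function name `find_db_download_refs` is hypothetical, and the heuristic of flagging only `.py` lines that mention policy_data.db together with the word "download" is an assumption, since the DB file name is still legitimately referenced for the locally built copy.

```python
from pathlib import Path


def find_db_download_refs(
    root: Path, needle: str = "policy_data.db"
) -> list[tuple[str, int, str]]:
    """Return (relative path, line number, stripped text) for every .py line
    under `root` that mentions `needle` together with "download".

    The "download" heuristic is an assumption of this sketch; a plain grep for
    the file name would also list legitimate local uses of the DB.
    """
    hits: list[tuple[str, int, str]] = []
    for path in sorted(root.rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if needle in line and "download" in line.lower():
                hits.append((path.relative_to(root).as_posix(), lineno, line.strip()))
    return hits
```

An empty result from `find_db_download_refs(Path("."))` corresponds to the "no remaining download references" item above; any hits still need a human look, because the heuristic can miss downloads split across lines.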

🤖 Generated with Claude Code

The HF copy can go stale if ETL scripts change but nobody re-uploads.
Building from source via `make database` guarantees the DB matches the
current code. Removes promote-database and upload-database targets since
HF is no longer the DB source of truth.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>