A mini-benchmark that compares large language model (LLM) estimates of US tax and benefit program amounts against PolicyEngine calculations.
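As a rough illustration of what such a comparison involves, the sketch below scores a set of LLM estimates against ground-truth values using absolute error. It is not the project's actual scoring code: the function name `score_estimates`, the case IDs, and the dollar amounts are all hypothetical placeholders, and it assumes the ground truth has already been computed elsewhere (for example with PolicyEngine).

```python
from statistics import mean


def score_estimates(llm_estimates, ground_truth):
    """Return per-case absolute errors and their mean.

    Both arguments map a case ID to a dollar amount. Only cases that
    appear in both mappings are scored. (Hypothetical helper.)
    """
    shared = sorted(set(llm_estimates) & set(ground_truth))
    if not shared:
        raise ValueError("no overlapping cases to score")
    errors = {
        case: abs(llm_estimates[case] - ground_truth[case]) for case in shared
    }
    return errors, mean(list(errors.values()))


# Hypothetical placeholder values, not real benchmark results.
ground_truth = {"household_1": 1200.0, "household_2": 0.0}
llm_estimates = {"household_1": 1000.0, "household_2": 150.0}

errors, mae = score_estimates(llm_estimates, ground_truth)
print(errors)  # per-case absolute error in dollars
print(mae)     # mean absolute error across scored cases
```

Absolute error is only one possible metric; a real benchmark might also track signed error or relative error, depending on the program being estimated.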
- Python 3.9+
- policyengine-us for ground truth
- edsl for LLM queries
- (Optional) pytest, etc. for tests
- Clone the repo:
    git clone https://github.com/YOUR_USERNAME/policybench.git
    cd policybench