cosilico-validators

External validation framework for Cosilico DSL encodings.

Tests vs Validation

This tool is for external validation, not testing.

| Aspect | Tests (in cosilico-us) | Validation (this repo) |
|---|---|---|
| Purpose | Verify encoding matches statute | Compare against external tools |
| Authority | Authoritative: our truth | Informational: tools may have bugs |
| Approach | TDD (test-first development) | Audit (report consistency) |
| Output | Pass/Fail | Comparison report with disagreements |
| Location | `cosilico-us/26/32/tests/` | `cosilico-validators` |

Example validation output:

EITC Validation Report vs TAXSIM-35 (TY 2023)
═════════════════════════════════════════════
Agreement: 12/13 (92%)

Disagreements:
┌─────────────────────┬──────────┬────────┬─────────────────────────────┐
│ Case                │ Cosilico │ TAXSIM │ Explanation                 │
├─────────────────────┼──────────┼────────┼─────────────────────────────┤
│ Childless, age 23   │ $0       │ $600   │ TAXSIM bug: ignores age req │
│                     │          │        │ See: 26 USC § 32(c)(1)(A)   │
│                     │          │        │ Issue: PE/taxsim#662        │
└─────────────────────┴──────────┴────────┴─────────────────────────────┘
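
The agreement line in a report like this one comes down to a count of cases whose two values fall within a tolerance. A minimal sketch of that arithmetic follows; the `Comparison` type and `summarize` function are hypothetical and not part of the package, and the tolerance is assumed to be an absolute dollar amount.

```python
from dataclasses import dataclass


@dataclass
class Comparison:
    """One case compared between Cosilico and an external tool (hypothetical type)."""
    case: str
    cosilico: float
    external: float


def summarize(comparisons: list[Comparison], tolerance: float = 15.0):
    """Return (agreed, total, disagreements) using an absolute-dollar tolerance."""
    disagreements = [
        c for c in comparisons if abs(c.cosilico - c.external) > tolerance
    ]
    total = len(comparisons)
    return total - len(disagreements), total, disagreements


# Twelve illustrative matching cases plus the one disagreement from the report.
comparisons = [Comparison(f"case {i}", 1000.0, 1000.0) for i in range(12)]
comparisons.append(Comparison("Childless, age 23", 0.0, 600.0))

agreed, total, diffs = summarize(comparisons)
print(f"Agreement: {agreed}/{total} ({agreed / total:.0%})")
for d in diffs:
    print(f"{d.case}: Cosilico ${d.cosilico:,.0f} vs external ${d.external:,.0f}")
```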

Overview

cosilico-validators compares Cosilico calculations against external systems (TAXSIM, PolicyEngine, TaxAct) to generate validation reports. These reports document both agreements and disagreements, and cite the statute wherever we believe an external tool is incorrect.

┌─────────────────────────────────────────────────────────────────────┐
│                     Cosilico DSL Test Cases                          │
│                    (with expected values from statute)               │
└──────────────────────────────┬──────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────────────┐
│                        Consensus Engine                              │
├──────────────────┬──────────────────┬──────────────────────────────┤
│    PRIMARY       │    REFERENCE     │        SUPPLEMENTARY         │
│    TaxAct        │   PolicyEngine   │      PSL Tax-Calculator      │
│  (ground truth)  │     TAXSIM       │      Atlanta Fed PRD         │
└──────────────────┴──────────────────┴──────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────────────┐
│                        Outputs                                       │
│  • Reward signal (-1.0 to +1.0) for training                        │
│  • Consensus level (FULL_AGREEMENT, PRIMARY_CONFIRMED, etc.)        │
│  • Confidence score (0.0 to 1.0)                                    │
│  • Potential upstream bugs (auto-filed to GitHub)                   │
└─────────────────────────────────────────────────────────────────────┘

Installation

# Basic installation (TAXSIM only)
pip install cosilico-validators

# With PolicyEngine support (quote the extras so shells such as zsh do not expand the brackets)
pip install "cosilico-validators[policyengine]"

# All validators
pip install "cosilico-validators[all]"

# Development
pip install "cosilico-validators[dev]"

Quick Start

Python API

from cosilico_validators import ConsensusEngine, TestCase
from cosilico_validators.validators.policyengine import PolicyEngineValidator
from cosilico_validators.validators.taxsim import TaxsimValidator

# Create validators
validators = [
    PolicyEngineValidator(),
    TaxsimValidator(),
]

# Create consensus engine
engine = ConsensusEngine(validators, tolerance=15.0)

# Define test case
test_case = TestCase(
    name="EITC single no children",
    inputs={
        "earned_income": 15000,
        "filing_status": "SINGLE",
        "eitc_qualifying_children_count": 0,
    },
    expected={"eitc": 600},
    citation="26 USC § 32",
)

# Validate with Claude's confidence
result = engine.validate(
    test_case,
    variable="eitc",
    year=2024,
    claude_confidence=0.95,
)

print(f"Consensus: {result.consensus_level.value}")
print(f"Reward signal: {result.reward_signal:+.2f}")
print(f"Matches expected: {result.matches_expected}")

# Check for potential upstream bugs
if result.potential_bugs:
    for bug in result.potential_bugs:
        print(f"Potential bug in {bug['validator']}: "
              f"expected ${bug['expected']}, got ${bug['actual']}")

CLI

# Validate test cases
cosilico-validators validate tests.yaml --variable eitc --year 2024

# With Claude confidence for bug detection
cosilico-validators validate tests.yaml -v eitc --claude-confidence 0.95

# Save results to JSON
cosilico-validators validate tests.yaml -v eitc -o results.json

# List available validators
cosilico-validators validators

# File issues for potential bugs (dry run)
cosilico-validators file-issues results.json --dry-run

# Actually file issues
export GITHUB_TOKEN=your_token
cosilico-validators file-issues results.json --repo PolicyEngine/policyengine-us

Consensus Levels

| Level | Description | Reward Bonus |
|---|---|---|
| `FULL_AGREEMENT` | All validators agree within tolerance | +0.5 |
| `PRIMARY_CONFIRMED` | Primary (TaxAct) + majority agree | +0.4 |
| `MAJORITY_AGREEMENT` | >50% of validators agree | +0.2 |
| `DISAGREEMENT` | No consensus reached | -0.2 |
| `POTENTIAL_UPSTREAM_BUG` | Claude confident, validators disagree | +0.1 |
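
One way to read the table as a decision procedure is sketched below. This is an illustration, not the engine's actual code: the `Level` enum, the `classify` function, its parameters, and the order of the checks are all assumptions, and "agree" is taken to mean "within `tolerance` of the expected value".

```python
from enum import Enum


class Level(Enum):
    """Hypothetical stand-in mirroring the level names in the table."""
    FULL_AGREEMENT = "full_agreement"
    PRIMARY_CONFIRMED = "primary_confirmed"
    MAJORITY_AGREEMENT = "majority_agreement"
    DISAGREEMENT = "disagreement"
    POTENTIAL_UPSTREAM_BUG = "potential_upstream_bug"


def classify(expected, results, primary=None, tolerance=15.0,
             claude_confidence=None, threshold=0.9):
    """Classify consensus; `results` maps validator name -> calculated value."""
    if not results:
        raise ValueError("at least one validator result is required")
    # Assumption: a validator "agrees" when it is within tolerance of expected.
    agrees = {name: abs(value - expected) <= tolerance
              for name, value in results.items()}
    share = sum(agrees.values()) / len(agrees)
    if share == 1.0:
        return Level.FULL_AGREEMENT
    if agrees.get(primary, False) and share > 0.5:
        return Level.PRIMARY_CONFIRMED
    if share > 0.5:
        return Level.MAJORITY_AGREEMENT
    # No majority: flag a possible upstream bug only when confidence is high.
    if claude_confidence is not None and claude_confidence > threshold:
        return Level.POTENTIAL_UPSTREAM_BUG
    return Level.DISAGREEMENT


print(classify(600, {"taxact": 600, "taxsim": 598, "policyengine": 0},
               primary="taxact").name)
```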

Reward Signal

The reward signal ranges from -1.0 to +1.0:

reward = consensus_bonus + match_bonus
# consensus_bonus: Based on consensus level (-0.2 to +0.5)
# match_bonus: Weighted by validator type (PRIMARY = 2x weight)
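
The formula above can be made concrete with a small sketch. The consensus bonuses come from the Consensus Levels table and the 2x PRIMARY weight from the comment above; everything else is an assumption made for illustration, in particular the scaling of `match_bonus` to the range -0.5 to +0.5 and the final clamp to [-1.0, +1.0]. The function and constant names are hypothetical.

```python
CONSENSUS_BONUS = {
    "FULL_AGREEMENT": 0.5,
    "PRIMARY_CONFIRMED": 0.4,
    "MAJORITY_AGREEMENT": 0.2,
    "DISAGREEMENT": -0.2,
    "POTENTIAL_UPSTREAM_BUG": 0.1,
}

# PRIMARY validators count double; every other type defaults to 1.0.
TYPE_WEIGHT = {"PRIMARY": 2.0}


def reward_signal(level, matches):
    """Combine both bonuses; `matches` is a list of (validator_type, matched) pairs."""
    if not matches:
        raise ValueError("at least one validator result is required")
    weights = [TYPE_WEIGHT.get(vtype, 1.0) for vtype, _ in matches]
    matched = sum(w for w, (_, ok) in zip(weights, matches) if ok)
    # Assumed scaling: weighted share of matches, shifted to [-0.5, +0.5].
    match_bonus = matched / sum(weights) - 0.5
    # Assumed clamp so the result stays inside the documented range.
    return max(-1.0, min(1.0, CONSENSUS_BONUS[level] + match_bonus))


reward = reward_signal(
    "PRIMARY_CONFIRMED",
    [("PRIMARY", True), ("REFERENCE", True), ("REFERENCE", False)],
)
print(f"Reward signal: {reward:+.2f}")
```

Under these assumptions the primary validator's match moves the weighted share by twice as much as a reference validator's, which is the behavior the `PRIMARY = 2x weight` comment describes.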

Higher rewards indicate:

- More validators agree with expected value
- Primary validator confirms
- Higher consensus level

Upstream Bug Detection

When Claude is highly confident (>90%) but validators disagree with the expected value:

1. The system flags a POTENTIAL_UPSTREAM_BUG
2. Details are captured with citation and inputs
3. Issues can be auto-filed to GitHub

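import os

from cosilico_validators.upstream import GitHubIssueManager

# Read the token from the environment instead of hard-coding it
manager = GitHubIssueManager(token=os.environ["GITHUB_TOKEN"])

# `result` is the object returned by engine.validate(...) in the Quick Start
results = manager.file_all_bugs(
    result.potential_bugs,
    dry_run=False,
    confidence_threshold=0.9,
)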
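
The flagging step that comes before filing can be sketched as a plain function. This is a hypothetical illustration rather than the package's implementation: the `validator`, `expected`, and `actual` keys match the Quick Start output, while the `citation` and `inputs` keys, the function name, and the strict `>` comparison against the threshold are assumptions.

```python
def find_potential_bugs(expected, results, claude_confidence, citation,
                        inputs, tolerance=15.0, threshold=0.9):
    """Return one record per validator that disagrees with `expected`.

    Nothing is flagged unless confidence exceeds the threshold.
    """
    if claude_confidence <= threshold:
        return []
    return [
        {
            "validator": name,
            "expected": expected,
            "actual": actual,
            "citation": citation,  # assumed key name
            "inputs": inputs,      # assumed key name
        }
        for name, actual in results.items()
        if abs(actual - expected) > tolerance
    ]


bugs = find_potential_bugs(
    expected=0,
    results={"taxsim": 600, "policyengine": 0},
    claude_confidence=0.95,
    citation="26 USC § 32(c)(1)(A)",
    inputs={"age": 23, "eitc_qualifying_children_count": 0},
)
for bug in bugs:
    print(f"Potential bug in {bug['validator']}: "
          f"expected ${bug['expected']}, got ${bug['actual']}")
```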

Supported Variables

TAXSIM

federal_income_tax, state_income_tax
eitc, ctc, actc, cdctc
agi, taxable_income, amt
fica, state_eitc

PolicyEngine

eitc, ctc, income_tax
snap, medicaid, tanf
And all 2000+ variables in policyengine-us

Test Case Format

# tests.yaml
test_cases:
  - name: "EITC single filer, one child"
    inputs:
      earned_income: 25000
      filing_status: SINGLE
      eitc_qualifying_children_count: 1
    expected:
      eitc: 3995
    citation: "26 USC § 32(b)(1)(A)"
    notes: "Phase-in complete, before plateau ends"

  - name: "CTC married filing jointly, two children"
    inputs:
      earned_income: 150000
      filing_status: JOINT
      num_children: 2
    expected:
      ctc: 4000
    citation: "26 USC § 24"
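
Before running a file like this through the validators, it can help to check that each entry has the expected shape. The sketch below works on already-parsed entries (plain dictionaries) and is not part of the package; treating `name`, `inputs`, and `expected` as required and `citation` and `notes` as optional is an assumption inferred from the two examples above.

```python
REQUIRED_KEYS = {"name", "inputs", "expected"}
OPTIONAL_KEYS = {"citation", "notes"}


def check_test_case(case: dict) -> list[str]:
    """Return a list of problems; an empty list means the case looks well-formed."""
    problems = []
    keys = set(case)
    missing = REQUIRED_KEYS - keys
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    unknown = keys - REQUIRED_KEYS - OPTIONAL_KEYS
    if unknown:
        problems.append(f"unknown keys: {sorted(unknown)}")
    for var, value in case.get("expected", {}).items():
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"expected[{var!r}] is not numeric: {value!r}")
    return problems


# Mirrors the first entry of tests.yaml after parsing.
case = {
    "name": "EITC single filer, one child",
    "inputs": {
        "earned_income": 25000,
        "filing_status": "SINGLE",
        "eitc_qualifying_children_count": 1,
    },
    "expected": {"eitc": 3995},
    "citation": "26 USC § 32(b)(1)(A)",
    "notes": "Phase-in complete, before plateau ends",
}
print(check_test_case(case))
```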

Development

# Clone the repo
git clone https://github.com/CosilicoAI/cosilico-validators.git
cd cosilico-validators

# Install with dev dependencies
pip install -e ".[dev,policyengine]"

# Run tests
pytest

# Type check
mypy src/

Architecture

cosilico-validators/
├── src/cosilico_validators/
│   ├── validators/           # Validator implementations
│   │   ├── base.py          # BaseValidator, TestCase, ValidatorResult
│   │   ├── policyengine.py  # PolicyEngine US integration
│   │   ├── taxsim.py        # NBER TAXSIM web service
│   │   └── taxact.py        # TaxAct (manual/primary)
│   ├── consensus/
│   │   └── engine.py        # Multi-system consensus & reward
│   ├── upstream/
│   │   └── github.py        # GitHub issue filing
│   └── cli.py               # Command-line interface
└── tests/

Related Projects

cosilico-lawarchive - Statute encoding pipeline
policyengine-us - US tax-benefit microsimulation
TAXSIM - NBER tax calculator
PSL Tax-Calculator - Policy Simulation Library

License

MIT
