Merged
33 changes: 33 additions & 0 deletions AGENTS.md
@@ -0,0 +1,33 @@
# Agent Instructions

These instructions apply repository-wide.

## Skills system

Canonical AI-facing engineering skills live under `docs/engineering/skills/`.
Use those files as the source of truth across Codex, Claude, Copilot, and other
AI tools.

When adding, moving, or reviewing tests, read
`docs/engineering/skills/testing.md`.

When reviewing changes to public APIs, architecture, documentation, or generated
artifacts, read `docs/engineering/skills/documentation_review.md`.

## GitHub PRs

Read `docs/engineering/skills/github-prs.md` before opening, replacing, or
sharing any pull request.

Before creating or sharing any PR, all developers and agents must:

1. Confirm the target remote is the canonical repository:
`gh repo view PolicyEngine/policyengine-core --json nameWithOwner`.
2. Add a towncrier changelog fragment in `changelog.d/` using the format
documented in `docs/engineering/skills/github-prs.md`.
3. Push the branch to `PolicyEngine/policyengine-core`.
4. Create the PR against `master`.
5. Verify the PR head repository before reporting it.

If you cannot push to the canonical repository, stop and ask for access. Do not
open a fork PR as a fallback unless the user explicitly asks for one.
24 changes: 24 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,24 @@
# Claude Instructions

These instructions apply repository-wide.

## Canonical guidance

Repository-wide AI-facing engineering guidance lives in `AGENTS.md`.
Canonical skills live under `docs/engineering/skills/`.

Use those files as the source of truth. This file is a Claude adapter and should
stay thin; do not duplicate detailed testing, CI, formatting, or architecture
rules here.

## Required skill lookup

Before opening, replacing, or sharing a PR, read
`docs/engineering/skills/github-prs.md`. Add the required towncrier changelog
fragment before creating the PR.

When adding, moving, or reviewing tests, read
`docs/engineering/skills/testing.md` before editing.

When reviewing changes to public APIs, architecture, documentation, or generated
artifacts, read `docs/engineering/skills/documentation_review.md`.
1 change: 1 addition & 0 deletions changelog.d/add-ai-engineering-skills.changed.md
@@ -0,0 +1 @@
Added provider-neutral AI engineering guidance for tests, documentation review, pull requests, and changelog fragments.
1 change: 1 addition & 0 deletions changelog.d/fix-utf8-byte-enum-encoding.fixed.md
@@ -0,0 +1 @@
Fixed enum encoding for HDF5-style UTF-8 byte-string arrays containing non-ASCII enum member names.
17 changes: 17 additions & 0 deletions docs/engineering/skills/README.md
@@ -0,0 +1,17 @@
# Engineering Skills

This directory is the canonical source for AI-facing engineering rules.

Tool-specific instruction files such as `AGENTS.md`, `CLAUDE.md`, and
`.github/copilot-instructions.md` should point here instead of duplicating
implementation-specific guidance. When a rule changes, update the skill here
first, then keep adapters thin.

Current skills:

- `documentation_review.md`: model-neutral review checklist for public API,
architecture, documentation, and generated artifact changes.
- `github-prs.md`: canonical PR workflow, required changelog fragments, PR head
verification, and title conventions.
- `testing.md`: test placement, fixture scope, command, and environment
expectations.
51 changes: 51 additions & 0 deletions docs/engineering/skills/documentation_review.md
@@ -0,0 +1,51 @@
# Documentation Review

Use this skill when reviewing a pull request that changes public APIs,
architecture, documentation, generated artifacts, or developer-facing workflows.

## Review goal

Documentation review is a harness check, not copyediting. Confirm that durable
documentation still describes the code paths a maintainer or AI agent would use
to understand, validate, or modify the system.

Do not put PR-specific confidence, impact, or reviewer notes into durable API
docs. Keep durable facts in source documentation and keep review judgment in the
PR description or review comment.

## Trigger files

Run documentation review when a PR changes any of these paths:

- `policyengine_core/`
- `docs/`
- `.github/`
- `README.md`
- `CONTRIBUTING.md`
- `pyproject.toml`
- generated documentation or package metadata

Also run it when a PR changes public import surfaces, command-line behavior,
test/development workflows, changelog tooling, or release behavior even if the
changed path is not listed above.
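
As a rough illustration of the path-based part of this trigger, the check could be sketched in Python as below. The function name `needs_documentation_review` and the constants are hypothetical and not part of this repository. The sketch covers only the literal paths listed above; generated documentation, package metadata, and the behavior-based triggers still need human judgment.

```python
from pathlib import PurePosixPath

# Trigger paths copied from the list above. Directory entries end with "/".
# These constant names are illustrative assumptions, not repository code.
TRIGGER_DIRS = ("policyengine_core/", "docs/", ".github/")
TRIGGER_FILES = {"README.md", "CONTRIBUTING.md", "pyproject.toml"}


def needs_documentation_review(changed_paths):
    """Return True if any changed path falls under a listed trigger path."""
    for raw in changed_paths:
        # Normalize forms such as "./docs/x.md" to "docs/x.md".
        path = PurePosixPath(raw).as_posix()
        if path in TRIGGER_FILES or path.startswith(TRIGGER_DIRS):
            return True
    return False
```

A caller would pass the repository-relative paths changed by the PR, for example the output of `git diff --name-only` split into lines.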

## Checks

- Public API changes are reflected in relevant docs or API reference pages.
- Developer workflow changes are reflected in `README.md`, contributing docs, or
AI-facing skills when needed.
- Generated artifacts are not hand-edited unless the repo expects them to be
checked in.
- Changelog fragment guidance still matches the towncrier config in
`pyproject.toml`.
- Review notes distinguish facts proven by code or tests from assumptions.

## Review output

Use a concise PR-facing note. Include:

- Documentation changes observed, or `not required` with a reason.
- Commands run, or commands not run with a reason.
- Impact: `low`, `medium`, or `high`.
- Confidence: `low`, `medium`, or `high`.
- Known gaps that should be handled in this PR or a follow-up.
71 changes: 71 additions & 0 deletions docs/engineering/skills/github-prs.md
@@ -0,0 +1,71 @@
# GitHub PRs

These rules apply to every developer and AI agent opening pull requests in this
repository.

## Repository and branch

Open PRs against `master` in `PolicyEngine/policyengine-core`.

Before creating or sharing a PR:

1. Confirm the canonical repository is reachable:
`gh repo view PolicyEngine/policyengine-core --json nameWithOwner`.
2. Add the required towncrier changelog fragment under `changelog.d/`.
3. Push the current branch to `PolicyEngine/policyengine-core`.
4. Create the PR against `master`.
5. Verify the PR head repository before reporting it:
`gh pr view <PR> --repo PolicyEngine/policyengine-core --json headRepositoryOwner,headRepository`.

If you cannot push to the canonical repository, stop and ask for access. Do not
open a fork PR as a fallback unless the user explicitly asks for one.
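
The head-repository check in step 5 can be automated along these lines. This is a minimal sketch that assumes the JSON printed by `gh pr view` nests the owner under `headRepositoryOwner.login` and the repository under `headRepository.name`; confirm that shape against real output before relying on it. The helper name `head_is_canonical` is hypothetical.

```python
import json

CANONICAL_OWNER = "PolicyEngine"
CANONICAL_REPO = "policyengine-core"


def head_is_canonical(gh_json: str) -> bool:
    """Check the JSON text printed by
    `gh pr view <PR> --json headRepositoryOwner,headRepository`.

    Assumes the keys `login` and `name` inside the two objects.
    """
    data = json.loads(gh_json)
    owner = (data.get("headRepositoryOwner") or {}).get("login")
    repo = (data.get("headRepository") or {}).get("name")
    return owner == CANONICAL_OWNER and repo == CANONICAL_REPO
```

If the function returns `False`, the PR head is a fork or another repository, and the PR should not be reported as ready.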

## Changelog fragment

A changelog entry is required before opening, replacing, or sharing a PR. When a
user asks an AI agent to open a PR, the agent must check for an appropriate
fragment and add one if it is missing before running `gh pr create`.

Use towncrier fragments in this format:

```text
changelog.d/<short-slug>.<type>.md
```

Allowed `<type>` values are configured in `pyproject.toml`:

- `breaking`
- `added`
- `changed`
- `fixed`
- `removed`

Examples:

```text
changelog.d/fix-enum-utf8-bytes.fixed.md
changelog.d/add-agent-pr-guidance.changed.md
```

Write one concise Markdown sentence in the fragment. Use `fixed` for bug fixes,
`added` for new user-facing capabilities, `changed` for behavior, documentation,
tooling, or refactors, `removed` for removals, and `breaking` only for changes
that intentionally break compatibility. Prefer updating an existing branch
fragment over adding duplicate fragments for the same PR.
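
The naming rule above can be checked mechanically. The following is a sketch only: `fragment_type` is a hypothetical helper, and it assumes the slug contains no dots or slashes, which this document does not state explicitly.

```python
import re

# Allowed fragment types, as listed above.
ALLOWED_TYPES = ("breaking", "added", "changed", "fixed", "removed")

# changelog.d/<short-slug>.<type>.md, with a dot-free slug (an assumption).
FRAGMENT_RE = re.compile(
    r"^changelog\.d/(?P<slug>[^/.]+)\.(?P<type>"
    + "|".join(ALLOWED_TYPES)
    + r")\.md$"
)


def fragment_type(path):
    """Return the fragment type for a valid fragment path, else None."""
    match = FRAGMENT_RE.match(path)
    return match.group("type") if match else None
```

Run against the two example paths above, the helper returns `fixed` and `changed` respectively; a path with an unlisted type or no type segment returns `None`.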

Do not run `make changelog` for a PR. The release workflow builds
`CHANGELOG.md` from fragments after merge.

## PR title

Do not add `[codex]`, `[claude]`, `[copilot]`, or other agent labels to PR
titles. Use a plain descriptive title.

## PR body

Keep the description concrete:

- Link the issue the PR fixes.
- Summarize the user-visible behavior change.
- List the tests or checks run.
- Call out any commands that were not run and why.
38 changes: 38 additions & 0 deletions docs/engineering/skills/testing.md
@@ -0,0 +1,38 @@
# Testing Skill

Use this skill whenever adding, moving, or reviewing tests.

## Commands

Use `uv run` for Python commands so the repo environment is selected
consistently.

Common checks:

```bash
uv run pytest tests/core/enums/test_enum.py -v
uv run pytest tests/core/test_file.py::test_name -v
make format
make test
make documentation
```

Run the narrowest test that proves the change while working. Before handing off a
broader behavioral change, run the relevant focused tests and formatting check.

## Placement

- Put core package tests under `tests/core/`.
- Put smoke tests under `tests/smoke/`.
- Keep fixtures under `tests/fixtures/` unless pytest fixture discovery requires
a local `conftest.py`.
- Do not add tests inside `policyengine_core/`.

## Fixture and dependency boundaries

- Keep root `tests/conftest.py` lightweight.
- Avoid network access, cloud credentials, or country package imports in ordinary
unit tests unless the test is explicitly a smoke/integration check.
- Prefer small synthetic fixtures for regression tests.
- When fixing a bug, add a regression test that fails without the fix and passes
with it unless the change is documentation-only.
7 changes: 5 additions & 2 deletions policyengine_core/enums/enum.py
@@ -77,8 +77,11 @@ def encode(cls, array: Union[EnumArray, np.ndarray]) -> EnumArray:
             indices = np.array([item.index for item in array], dtype=ENUM_ARRAY_DTYPE)
             return EnumArray(indices, cls)

-        # Convert byte-strings or object arrays to Unicode strings
-        if array.dtype.kind == "S" or array.dtype == object:
+        # Convert fixed-width byte strings, as returned by h5py for string
+        # datasets, to Unicode before matching enum names.
+        if array.dtype.kind == "S":
+            array = np.char.decode(array, "utf-8")
+        elif array.dtype == object:
             array = array.astype(str)

         if isinstance(array, np.ndarray) and array.dtype.kind in {"U", "S"}:
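
To illustrate why the byte-string branch in this hunk names UTF-8 explicitly, here is a minimal standard-library sketch. It is not the NumPy code path itself; it only shows the byte-level behavior that the accompanying test docstring describes, where a name containing `Ñ` cannot be decoded as ASCII. The helper `try_decode` is hypothetical.

```python
# The enum member name used in the regression test for this change.
name = "DOÑA_ANA"

# Ñ occupies two bytes in UTF-8, so 8 characters become 9 bytes,
# which still fits the test's fixed-width "S10" dtype.
raw = name.encode("utf-8")


def try_decode(data: bytes, codec: str):
    """Return the decoded string, or None if the codec rejects the bytes."""
    try:
        return data.decode(codec)
    except UnicodeDecodeError:
        return None


ascii_result = try_decode(raw, "ascii")  # ASCII cannot represent Ñ
utf8_result = try_decode(raw, "utf-8")   # round-trips to the original name
```

Only the UTF-8 result can match an enum member name, which is why the hunk decodes byte strings as UTF-8 instead of relying on a default conversion.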
24 changes: 24 additions & 0 deletions tests/core/enums/test_enum.py
@@ -74,6 +74,30 @@ class Sample(Enum):
     assert "FOO" in error_message or "BAR" in error_message or "BAZ" in error_message


+def test_enum_encode_utf8_byte_string_array():
+    """
+    Test that HDF5-style UTF-8 byte strings encode as enum names.
+
+    The ñ mirrors values like DOÑA_ANA_COUNTY_NM. It is a non-ASCII
+    character, so NumPy's default byte-string conversion cannot decode it
+    as ASCII.
+    """
+
+    class Sample(Enum):
+        DOÑA_ANA = "Doña Ana"
+        DWORKIN = "dworkin"
+
+    byte_string_array = np.array(
+        ["DOÑA_ANA".encode("utf-8"), b"DWORKIN"],
+        dtype="S10",
+    )
+
+    encoded_array = Sample.encode(byte_string_array)
+
+    assert isinstance(encoded_array, EnumArray)
+    assert list(encoded_array) == [0, 1]
+
+
 def test_enum_encode_empty_string_raises_error():
     """Test that encoding empty strings raises ValueError."""
