diff --git a/AGENTS.md b/AGENTS.md
new file mode 100644
index 00000000..f8efd06b
--- /dev/null
+++ b/AGENTS.md
@@ -0,0 +1,33 @@
+# Agent Instructions
+
+These instructions apply repository-wide.
+
+## Skills system
+
+Canonical AI-facing engineering skills live under `docs/engineering/skills/`.
+Use those files as the source of truth across Codex, Claude, Copilot, and other
+AI tools.
+
+When adding, moving, or reviewing tests, read
+`docs/engineering/skills/testing.md`.
+
+When reviewing changes to public APIs, architecture, documentation, or generated
+artifacts, read `docs/engineering/skills/documentation_review.md`.
+
+## GitHub PRs
+
+Read `docs/engineering/skills/github-prs.md` before opening, replacing, or
+sharing any pull request.
+
+Before creating or sharing any PR, all developers and agents must:
+
+1. Confirm the target remote is the canonical repository:
+   `gh repo view PolicyEngine/policyengine-core --json nameWithOwner`.
+2. Add a towncrier changelog fragment in `changelog.d/` using the format
+   documented in `docs/engineering/skills/github-prs.md`.
+3. Push the branch to `PolicyEngine/policyengine-core`.
+4. Create the PR against `master`.
+5. Verify the PR head repository before reporting it.
+
+If you cannot push to the canonical repository, stop and ask for access. Do not
+open a fork PR as a fallback unless the user explicitly asks for one.
diff --git a/CLAUDE.md b/CLAUDE.md
new file mode 100644
index 00000000..2194fc1c
--- /dev/null
+++ b/CLAUDE.md
@@ -0,0 +1,24 @@
+# Claude Instructions
+
+These instructions apply repository-wide.
+
+## Canonical guidance
+
+Repository-wide AI-facing engineering guidance lives in `AGENTS.md`.
+Canonical skills live under `docs/engineering/skills/`.
+
+Use those files as the source of truth. This file is a Claude adapter and should
+stay thin; do not duplicate detailed testing, CI, formatting, or architecture
+rules here.
+
+## Required skill lookup
+
+Before opening, replacing, or sharing a PR, read
+`docs/engineering/skills/github-prs.md`. Add the required towncrier changelog
+fragment before creating the PR.
+
+When adding, moving, or reviewing tests, read
+`docs/engineering/skills/testing.md` before editing.
+
+When reviewing changes to public APIs, architecture, documentation, or generated
+artifacts, read `docs/engineering/skills/documentation_review.md`.
diff --git a/changelog.d/add-ai-engineering-skills.changed.md b/changelog.d/add-ai-engineering-skills.changed.md
new file mode 100644
index 00000000..36a337e1
--- /dev/null
+++ b/changelog.d/add-ai-engineering-skills.changed.md
@@ -0,0 +1 @@
+Added provider-neutral AI engineering guidance for tests, documentation review, pull requests, and changelog fragments.
diff --git a/changelog.d/fix-utf8-byte-enum-encoding.fixed.md b/changelog.d/fix-utf8-byte-enum-encoding.fixed.md
new file mode 100644
index 00000000..c7cfa0f2
--- /dev/null
+++ b/changelog.d/fix-utf8-byte-enum-encoding.fixed.md
@@ -0,0 +1 @@
+Fixed enum encoding for HDF5-style UTF-8 byte-string arrays containing non-ASCII enum member names.
diff --git a/docs/engineering/skills/README.md b/docs/engineering/skills/README.md
new file mode 100644
index 00000000..5f68e80c
--- /dev/null
+++ b/docs/engineering/skills/README.md
@@ -0,0 +1,17 @@
+# Engineering Skills
+
+This directory is the canonical source for AI-facing engineering rules.
+
+Tool-specific instruction files such as `AGENTS.md`, `CLAUDE.md`, and
+`.github/copilot-instructions.md` should point here instead of duplicating
+implementation-specific guidance. When a rule changes, update the skill here
+first, then keep adapters thin.
+
+Current skills:
+
+- `documentation_review.md`: model-neutral review checklist for public API,
+  architecture, documentation, and generated artifact changes.
+- `github-prs.md`: canonical PR workflow, required changelog fragments, PR head
+  verification, and title conventions.
+- `testing.md`: test placement, fixture scope, command, and environment
+  expectations.
diff --git a/docs/engineering/skills/documentation_review.md b/docs/engineering/skills/documentation_review.md
new file mode 100644
index 00000000..8d084da0
--- /dev/null
+++ b/docs/engineering/skills/documentation_review.md
@@ -0,0 +1,51 @@
+# Documentation Review
+
+Use this skill when reviewing a pull request that changes public APIs,
+architecture, documentation, generated artifacts, or developer-facing workflows.
+
+## Review goal
+
+Documentation review is a harness check, not copyediting. Confirm that durable
+documentation still describes the code paths a maintainer or AI agent would use
+to understand, validate, or modify the system.
+
+Do not put PR-specific confidence, impact, or reviewer notes into durable API
+docs. Keep durable facts in source documentation and keep review judgment in the
+PR description or review comment.
+
+## Trigger files
+
+Run documentation review when a PR changes any of these paths:
+
+- `policyengine_core/`
+- `docs/`
+- `.github/`
+- `README.md`
+- `CONTRIBUTING.md`
+- `pyproject.toml`
+- generated documentation or package metadata
+
+Also run it when a PR changes public import surfaces, command-line behavior,
+test/development workflows, changelog tooling, or release behavior even if the
+changed path is not listed above.
+
+## Checks
+
+- Public API changes are reflected in relevant docs or API reference pages.
+- Developer workflow changes are reflected in `README.md`, contributing docs, or
+  AI-facing skills when needed.
+- Generated artifacts are not hand-edited unless the repo expects them to be
+  checked in.
+- Changelog fragment guidance still matches the towncrier config in
+  `pyproject.toml`.
+- Review notes distinguish facts proven by code or tests from assumptions.
+
+## Review output
+
+Use a concise PR-facing note. Include:
+
+- Documentation changes observed, or `not required` with a reason.
+- Commands run, or commands not run with a reason.
+- Impact: `low`, `medium`, or `high`.
+- Confidence: `low`, `medium`, or `high`.
+- Known gaps that should be handled in this PR or a follow-up.
diff --git a/docs/engineering/skills/github-prs.md b/docs/engineering/skills/github-prs.md
new file mode 100644
index 00000000..88673c82
--- /dev/null
+++ b/docs/engineering/skills/github-prs.md
@@ -0,0 +1,71 @@
+# GitHub PRs
+
+These rules apply to every developer and AI agent opening pull requests in this
+repository.
+
+## Repository and branch
+
+Open PRs against `master` in `PolicyEngine/policyengine-core`.
+
+Before creating or sharing a PR:
+
+1. Confirm the canonical repository is reachable:
+   `gh repo view PolicyEngine/policyengine-core --json nameWithOwner`.
+2. Add the required towncrier changelog fragment under `changelog.d/`.
+3. Push the current branch to `PolicyEngine/policyengine-core`.
+4. Create the PR against `master`.
+5. Verify the PR head repository before reporting it:
+   `gh pr view --repo PolicyEngine/policyengine-core --json headRepositoryOwner,headRepository`.
+
+If you cannot push to the canonical repository, stop and ask for access. Do not
+open a fork PR as a fallback unless the user explicitly asks for one.
+
+## Changelog fragment
+
+A changelog entry is required before opening, replacing, or sharing a PR. When a
+user asks an AI agent to open a PR, the agent must check for an appropriate
+fragment and add one if it is missing before running `gh pr create`.
+
+Use towncrier fragments in this format:
+
+```text
+changelog.d/..md
+```
+
+Allowed `` values are configured in `pyproject.toml`:
+
+- `breaking`
+- `added`
+- `changed`
+- `fixed`
+- `removed`
+
+Examples:
+
+```text
+changelog.d/fix-enum-utf8-bytes.fixed.md
+changelog.d/add-agent-pr-guidance.changed.md
+```
+
+Write one concise Markdown sentence in the fragment.
Use `fixed` for bug fixes,
+`added` for new user-facing capabilities, `changed` for behavior, documentation,
+tooling, or refactors, `removed` for removals, and `breaking` only for changes
+that intentionally break compatibility. Prefer updating an existing branch
+fragment over adding duplicate fragments for the same PR.
+
+Do not run `make changelog` for a PR. The release workflow builds
+`CHANGELOG.md` from fragments after merge.
+
+## PR title
+
+Do not add `[codex]`, `[claude]`, `[copilot]`, or other agent labels to PR
+titles. Use a plain descriptive title.
+
+## PR body
+
+Keep the description concrete:
+
+- Link the issue the PR fixes.
+- Summarize the user-visible behavior change.
+- List the tests or checks run.
+- Call out any commands that were not run and why.
diff --git a/docs/engineering/skills/testing.md b/docs/engineering/skills/testing.md
new file mode 100644
index 00000000..86e21448
--- /dev/null
+++ b/docs/engineering/skills/testing.md
@@ -0,0 +1,38 @@
+# Testing Skill
+
+Use this skill whenever adding, moving, or reviewing tests.
+
+## Commands
+
+Use `uv run` for Python commands so the repo environment is selected
+consistently.
+
+Common checks:
+
+```bash
+uv run pytest tests/core/enums/test_enum.py -v
+uv run pytest tests/core/test_file.py::test_name -v
+make format
+make test
+make documentation
+```
+
+Run the narrowest test that proves the change while working. Before handing off a
+broader behavioral change, run the relevant focused tests and formatting check.
+
+## Placement
+
+- Put core package tests under `tests/core/`.
+- Put smoke tests under `tests/smoke/`.
+- Keep fixtures under `tests/fixtures/` unless pytest fixture discovery requires
+  a local `conftest.py`.
+- Do not add tests inside `policyengine_core/`.
+
+## Fixture and dependency boundaries
+
+- Keep root `tests/conftest.py` lightweight.
+- Avoid network access, cloud credentials, or country package imports in ordinary
+  unit tests unless the test is explicitly a smoke/integration check.
+- Prefer small synthetic fixtures for regression tests.
+- When fixing a bug, add a regression test that fails without the fix and passes
+  with it unless the change is documentation-only.
diff --git a/policyengine_core/enums/enum.py b/policyengine_core/enums/enum.py
index 0522cdaf..8d91e691 100644
--- a/policyengine_core/enums/enum.py
+++ b/policyengine_core/enums/enum.py
@@ -77,8 +77,11 @@ def encode(cls, array: Union[EnumArray, np.ndarray]) -> EnumArray:
             indices = np.array([item.index for item in array], dtype=ENUM_ARRAY_DTYPE)
             return EnumArray(indices, cls)
 
-        # Convert byte-strings or object arrays to Unicode strings
-        if array.dtype.kind == "S" or array.dtype == object:
+        # Convert fixed-width byte strings, as returned by h5py for string
+        # datasets, to Unicode before matching enum names.
+        if array.dtype.kind == "S":
+            array = np.char.decode(array, "utf-8")
+        elif array.dtype == object:
             array = array.astype(str)
 
         if isinstance(array, np.ndarray) and array.dtype.kind in {"U", "S"}:
diff --git a/tests/core/enums/test_enum.py b/tests/core/enums/test_enum.py
index 19983f83..1618a9f2 100644
--- a/tests/core/enums/test_enum.py
+++ b/tests/core/enums/test_enum.py
@@ -74,6 +74,30 @@ class Sample(Enum):
     assert "FOO" in error_message or "BAR" in error_message or "BAZ" in error_message
 
 
+def test_enum_encode_utf8_byte_string_array():
+    """
+    Test that HDF5-style UTF-8 byte strings encode as enum names.
+
+    The ñ mirrors values like DOÑA_ANA_COUNTY_NM. It is a non-ASCII
+    character, so NumPy's default byte-string conversion cannot decode it
+    as ASCII.
+    """
+
+    class Sample(Enum):
+        DOÑA_ANA = "Doña Ana"
+        DWORKIN = "dworkin"
+
+    byte_string_array = np.array(
+        ["DOÑA_ANA".encode("utf-8"), b"DWORKIN"],
+        dtype="S10",
+    )
+
+    encoded_array = Sample.encode(byte_string_array)
+
+    assert isinstance(encoded_array, EnumArray)
+    assert list(encoded_array) == [0, 1]
+
+
 def test_enum_encode_empty_string_raises_error():
     """Test that encoding empty strings raises ValueError."""