PolicyEngine · anth-volk · May 7, 2026
diff --git a/AGENTS.md b/AGENTS.md
@@ -0,0 +1,33 @@
+# Agent Instructions
+
+These instructions apply repository-wide.
+
+## Skills system
+
+Canonical AI-facing engineering skills live under `docs/engineering/skills/`.
+Use those files as the source of truth across Codex, Claude, Copilot, and other
+AI tools.
+
+When adding, moving, or reviewing tests, read
+`docs/engineering/skills/testing.md`.
+
+When reviewing changes to public APIs, architecture, documentation, or generated
+artifacts, read `docs/engineering/skills/documentation_review.md`.
+
+## GitHub PRs
+
+Read `docs/engineering/skills/github-prs.md` before opening, replacing, or
+sharing any pull request.
+
+Before creating or sharing any PR, all developers and agents must:
+
+1. Confirm the canonical repository is reachable:
+   `gh repo view PolicyEngine/policyengine-core --json nameWithOwner`.
+2. Add a towncrier changelog fragment in `changelog.d/` using the format
+   documented in `docs/engineering/skills/github-prs.md`.
+3. Push the branch to `PolicyEngine/policyengine-core`.
+4. Create the PR against `master`.
+5. Verify the PR head repository before reporting it.
+
+If you cannot push to the canonical repository, stop and ask for access. Do not
+open a fork PR as a fallback unless the user explicitly asks for one.
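
Reviewer sketch for step 5 above: one way an agent might check the PR head, assuming it has already captured the JSON printed by the `gh pr view <PR> --json headRepositoryOwner,headRepository` command documented in `docs/engineering/skills/github-prs.md`. The helper name `head_is_canonical` is hypothetical, and the field names `login` and `name` are assumptions about the shape of the `gh` output rather than something this PR specifies.

```python
import json

CANONICAL_OWNER = "PolicyEngine"
CANONICAL_REPO = "policyengine-core"


def head_is_canonical(gh_json: str) -> bool:
    """Return True when the PR head lives in the canonical repository.

    Assumes the owner is exposed as `login` and the repository as `name`
    in the JSON payload (an assumption, not confirmed by this PR).
    """
    payload = json.loads(gh_json)
    owner = (payload.get("headRepositoryOwner") or {}).get("login")
    repo = (payload.get("headRepository") or {}).get("name")
    return owner == CANONICAL_OWNER and repo == CANONICAL_REPO


# Illustrative payloads only; real output may carry extra fields such as ids.
canonical = (
    '{"headRepositoryOwner": {"login": "PolicyEngine"},'
    ' "headRepository": {"name": "policyengine-core"}}'
)
fork = (
    '{"headRepositoryOwner": {"login": "someone"},'
    ' "headRepository": {"name": "policyengine-core"}}'
)

print(head_is_canonical(canonical))  # True
print(head_is_canonical(fork))  # False
```

A `False` result would correspond to the fork-PR case that the instructions above say to stop on.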
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -0,0 +1,24 @@
+# Claude Instructions
+
+These instructions apply repository-wide.
+
+## Canonical guidance
+
+Repository-wide AI-facing engineering guidance lives in `AGENTS.md`.
+Canonical skills live under `docs/engineering/skills/`.
+
+Use those files as the source of truth. This file is a Claude adapter and should
+stay thin; do not duplicate detailed testing, CI, formatting, or architecture
+rules here.
+
+## Required skill lookup
+
+Before opening, replacing, or sharing a PR, read
+`docs/engineering/skills/github-prs.md`. Add the required towncrier changelog
+fragment before creating the PR.
+
+When adding, moving, or reviewing tests, read
+`docs/engineering/skills/testing.md` before editing.
+
+When reviewing changes to public APIs, architecture, documentation, or generated
+artifacts, read `docs/engineering/skills/documentation_review.md`.
diff --git a/changelog.d/add-ai-engineering-skills.changed.md b/changelog.d/add-ai-engineering-skills.changed.md
@@ -0,0 +1 @@
+Added provider-neutral AI engineering guidance for tests, documentation review, pull requests, and changelog fragments.
diff --git a/changelog.d/fix-utf8-byte-enum-encoding.fixed.md b/changelog.d/fix-utf8-byte-enum-encoding.fixed.md
@@ -0,0 +1 @@
+Fixed enum encoding for HDF5-style UTF-8 byte-string arrays containing non-ASCII enum member names.
diff --git a/docs/engineering/skills/README.md b/docs/engineering/skills/README.md
@@ -0,0 +1,17 @@
+# Engineering Skills
+
+This directory is the canonical source for AI-facing engineering rules.
+
+Tool-specific instruction files such as `AGENTS.md`, `CLAUDE.md`, and
+`.github/copilot-instructions.md` should point here instead of duplicating
+implementation-specific guidance. When a rule changes, update the skill here
+first, then keep adapters thin.
+
+Current skills:
+
+- `documentation_review.md`: model-neutral review checklist for public API,
+  architecture, documentation, and generated artifact changes.
+- `github-prs.md`: canonical PR workflow, required changelog fragments, PR head
+  verification, and title conventions.
+- `testing.md`: test placement, fixture scope, command, and environment
+  expectations.
diff --git a/docs/engineering/skills/documentation_review.md b/docs/engineering/skills/documentation_review.md
@@ -0,0 +1,51 @@
+# Documentation Review
+
+Use this skill when reviewing a pull request that changes public APIs,
+architecture, documentation, generated artifacts, or developer-facing workflows.
+
+## Review goal
+
+Documentation review is a harness check, not copyediting. Confirm that durable
+documentation still describes the code paths a maintainer or AI agent would use
+to understand, validate, or modify the system.
+
+Do not put PR-specific confidence, impact, or reviewer notes into durable API
+docs. Keep durable facts in source documentation and keep review judgment in the
+PR description or review comment.
+
+## Trigger files
+
+Run documentation review when a PR changes any of these paths:
+
+- `policyengine_core/`
+- `docs/`
+- `.github/`
+- `README.md`
+- `CONTRIBUTING.md`
+- `pyproject.toml`
+- generated documentation or package metadata
+
+Also run it when a PR changes public import surfaces, command-line behavior,
+test/development workflows, changelog tooling, or release behavior even if the
+changed path is not listed above.
+
+## Checks
+
+- Public API changes are reflected in relevant docs or API reference pages.
+- Developer workflow changes are reflected in `README.md`, contributing docs, or
+  AI-facing skills when needed.
+- Generated artifacts are not hand-edited unless the repo expects them to be
+  checked in.
+- Changelog fragment guidance still matches the towncrier config in
+  `pyproject.toml`.
+- Review notes distinguish facts proven by code or tests from assumptions.
+
+## Review output
+
+Use a concise PR-facing note. Include:
+
+- Documentation changes observed, or `not required` with a reason.
+- Commands run, or commands not run with a reason.
+- Impact: `low`, `medium`, or `high`.
+- Confidence: `low`, `medium`, or `high`.
+- Known gaps that should be handled in this PR or a follow-up.
diff --git a/docs/engineering/skills/github-prs.md b/docs/engineering/skills/github-prs.md
@@ -0,0 +1,71 @@
+# GitHub PRs
+
+These rules apply to every developer and AI agent opening pull requests in this
+repository.
+
+## Repository and branch
+
+Open PRs against `master` in `PolicyEngine/policyengine-core`.
+
+Before creating or sharing a PR:
+
+1. Confirm the canonical repository is reachable:
+   `gh repo view PolicyEngine/policyengine-core --json nameWithOwner`.
+2. Add the required towncrier changelog fragment under `changelog.d/`.
+3. Push the current branch to `PolicyEngine/policyengine-core`.
+4. Create the PR against `master`.
+5. Verify the PR head repository before reporting it:
+   `gh pr view <PR> --repo PolicyEngine/policyengine-core --json headRepositoryOwner,headRepository`.
+
+If you cannot push to the canonical repository, stop and ask for access. Do not
+open a fork PR as a fallback unless the user explicitly asks for one.
+
+## Changelog fragment
+
+A changelog entry is required before opening, replacing, or sharing a PR. When a
+user asks an AI agent to open a PR, the agent must check for an appropriate
+fragment and add one if it is missing before running `gh pr create`.
+
+Use towncrier fragments in this format:
+
+```text
+changelog.d/<short-slug>.<type>.md
+```
+
+Allowed `<type>` values are configured in `pyproject.toml`:
+
+- `breaking`
+- `added`
+- `changed`
+- `fixed`
+- `removed`
+
+Examples:
+
+```text
+changelog.d/fix-enum-utf8-bytes.fixed.md
+changelog.d/add-agent-pr-guidance.changed.md
+```
+
+Write one concise Markdown sentence in the fragment. Use `fixed` for bug fixes,
+`added` for new user-facing capabilities, `changed` for behavior, documentation,
+tooling, or refactors, `removed` for removals, and `breaking` only for changes
+that intentionally break compatibility. Prefer updating an existing branch
+fragment over adding duplicate fragments for the same PR.
+
+Do not run `make changelog` for a PR. The release workflow builds
+`CHANGELOG.md` from fragments after merge.
+
+## PR title
+
+Do not add `[codex]`, `[claude]`, `[copilot]`, or other agent labels to PR
+titles. Use a plain descriptive title.
+
+## PR body
+
+Keep the description concrete:
+
+- Link the issue the PR fixes.
+- Summarize the user-visible behavior change.
+- List the tests or checks run.
+- Call out any commands that were not run and why.
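
Reviewer sketch for the changelog fragment rule in `github-prs.md`: a minimal path check, assuming fragment paths follow `changelog.d/<short-slug>.<type>.md` with the five types listed in that file. The function name `is_valid_fragment_path` is hypothetical, and the restriction that a slug contains no dots or slashes is an assumption made for this sketch; the towncrier config in `pyproject.toml` remains the source of truth.

```python
import re

# Types copied from the list in github-prs.md.
ALLOWED_TYPES = {"breaking", "added", "changed", "fixed", "removed"}

# Assumption: the slug has no "." or "/" characters.
FRAGMENT_PATTERN = re.compile(
    r"^changelog\.d/(?P<slug>[^/.]+)\.(?P<type>[a-z]+)\.md$"
)


def is_valid_fragment_path(path: str) -> bool:
    """Return True when `path` looks like an allowed changelog fragment."""
    match = FRAGMENT_PATTERN.match(path)
    return bool(match) and match.group("type") in ALLOWED_TYPES


print(is_valid_fragment_path("changelog.d/fix-enum-utf8-bytes.fixed.md"))  # True
print(is_valid_fragment_path("changelog.d/add-agent-pr-guidance.changed.md"))  # True
print(is_valid_fragment_path("changelog.d/notes.docs.md"))  # False
```

An agent could run a check like this on the branch's fragment before `gh pr create`, though it does not replace reading the fragment's content.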
diff --git a/docs/engineering/skills/testing.md b/docs/engineering/skills/testing.md
@@ -0,0 +1,38 @@
+# Testing Skill
+
+Use this skill whenever adding, moving, or reviewing tests.
+
+## Commands
+
+Use `uv run` for Python commands so the repo environment is selected
+consistently.
+
+Common checks:
+
+```bash
+uv run pytest tests/core/enums/test_enum.py -v
+uv run pytest tests/core/test_file.py::test_name -v
+make format
+make test
+make documentation
+```
+
+Run the narrowest test that proves the change while working. Before handing off a
+broader behavioral change, run the relevant focused tests and formatting check.
+
+## Placement
+
+- Put core package tests under `tests/core/`.
+- Put smoke tests under `tests/smoke/`.
+- Keep fixtures under `tests/fixtures/` unless pytest fixture discovery requires
+  a local `conftest.py`.
+- Do not add tests inside `policyengine_core/`.
+
+## Fixture and dependency boundaries
+
+- Keep root `tests/conftest.py` lightweight.
+- Avoid network access, cloud credentials, or country package imports in ordinary
+  unit tests unless the test is explicitly a smoke/integration check.
+- Prefer small synthetic fixtures for regression tests.
+- When fixing a bug, add a regression test that fails without the fix and passes
+  with it unless the change is documentation-only.
diff --git a/policyengine_core/enums/enum.py b/policyengine_core/enums/enum.py
@@ -77,8 +77,11 @@ def encode(cls, array: Union[EnumArray, np.ndarray]) -> EnumArray:
             indices = np.array([item.index for item in array], dtype=ENUM_ARRAY_DTYPE)
             return EnumArray(indices, cls)
 
-        # Convert byte-strings or object arrays to Unicode strings
-        if array.dtype.kind == "S" or array.dtype == object:
+        # Convert fixed-width byte strings, as returned by h5py for string
+        # datasets, to Unicode before matching enum names.
+        if array.dtype.kind == "S":
+            array = np.char.decode(array, "utf-8")
+        elif array.dtype == object:
             array = array.astype(str)
 
         if isinstance(array, np.ndarray) and array.dtype.kind in {"U", "S"}:

diff --git a/tests/core/enums/test_enum.py b/tests/core/enums/test_enum.py
@@ -74,6 +74,30 @@ class Sample(Enum):
     assert "FOO" in error_message or "BAR" in error_message or "BAZ" in error_message
 
 
+def test_enum_encode_utf8_byte_string_array():
+    """
+    Test that HDF5-style UTF-8 byte strings encode as enum names.
+
+    The ñ mirrors values like DOÑA_ANA_COUNTY_NM. It is a non-ASCII
+    character, so NumPy's default byte-string conversion cannot decode it
+    as ASCII.
+    """
+
+    class Sample(Enum):
+        DOÑA_ANA = "Doña Ana"
+        DWORKIN = "dworkin"
+
+    byte_string_array = np.array(
+        ["DOÑA_ANA".encode("utf-8"), b"DWORKIN"],
+        dtype="S10",
+    )
+
+    encoded_array = Sample.encode(byte_string_array)
+
+    assert isinstance(encoded_array, EnumArray)
+    assert list(encoded_array) == [0, 1]
+
+
 def test_enum_encode_empty_string_raises_error():
     """Test that encoding empty strings raises ValueError."""
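
Reviewer sketch of why the `enum.py` change decodes as UTF-8: a NumPy-free illustration, assuming fixed-width byte strings padded with trailing NUL bytes (as NumPy's `S` dtype stores them). The stdlib `Enum` stands in for the `policyengine_core` class, and `encode_names` is a hypothetical helper, not the library's `encode` method.

```python
from enum import Enum


class Sample(Enum):
    # Stand-in for a policyengine_core Enum; the stdlib Enum is used only
    # so the sketch runs without the package installed.
    DOÑA_ANA = "Doña Ana"
    DWORKIN = "dworkin"


def encode_names(raw_values: list[bytes]) -> list[int]:
    """Map UTF-8 byte strings to the positional index of each enum member."""
    index_by_name = {member.name: i for i, member in enumerate(Sample)}
    # Strip the NUL padding of fixed-width byte strings, then decode as UTF-8.
    names = [raw.rstrip(b"\x00").decode("utf-8") for raw in raw_values]
    return [index_by_name[name] for name in names]


# Ten-byte, NUL-padded values, mirroring dtype="S10" in the test above.
raw = [
    "DOÑA_ANA".encode("utf-8").ljust(10, b"\x00"),
    b"DWORKIN".ljust(10, b"\x00"),
]

# An ASCII decode fails on the two-byte encoding of "Ñ".
try:
    raw[0].decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

print(ascii_ok)  # False
print(encode_names(raw))  # [0, 1]
```

The indices match the `[0, 1]` asserted in `test_enum_encode_utf8_byte_string_array`, under the assumption that member order defines the index.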