Skip to content
Draft
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
285 changes: 285 additions & 0 deletions CLAUDE.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,285 @@
# CLAUDE.md — XDM Contributor Guide for AI Coding Assistants

This file supplements [CONTRIBUTING.md](CONTRIBUTING.md) and [README.md](README.md) with the operational knowledge that repeatedly trips up both first-time contributors and AI coding assistants (Claude Code, Copilot, Cursor, etc.) when opening pull requests against `adobe/xdm`.

**Read [CONTRIBUTING.md](CONTRIBUTING.md) first.** This file does not replace it — it captures the reviewer feedback patterns, pre-submit checklist, and schema-editing gotchas that live outside the written spec and are learned by doing. If CONTRIBUTING.md and this file disagree, CONTRIBUTING.md wins.

---

## Audience

- **AI coding assistants** working on an XDM contribution on a contributor's behalf.
- **Human contributors** who want the same pre-flight checklist reviewers apply.

Both audiences should treat the [Non-negotiables](#non-negotiables) section as hard requirements: PRs that miss any of them are routinely sent back.

---

## Quick start

```bash
git clone git@github.com:<your-fork>/xdm.git
cd xdm
npm install
npm test # mocha — validates every *.example.*.json against its schema
npm run lint # prettier --write && git diff --exit-code
npm run validate # node bin/validate-schemas.js
npm run incompatibility-check # guards against backward-incompatible schema changes
```

All four commands must pass locally before you open a PR. The first three run in CI ([.circleci/config.yml](.circleci/config.yml)).

---

## Non-negotiables

These are the reviewer feedback themes that come up the most. Every PR must satisfy all of them.

1. **Link a GitHub issue.** Every PR body must reference an open issue on `adobe/xdm`. Use `Closes #<n>` so merging the PR auto-closes the issue. Open a new issue first if none fits. (Per [CONTRIBUTING.md](CONTRIBUTING.md) "How to Contribute".)
2. **Reuse before you invent.** Before adding a new class, mixin/fieldgroup, or datatype, search the repo for something that already fits and extend it instead. See [Reuse-over-reinvent](#reuse-over-reinvent).
3. **Every `enum` value is documented in `meta:enum`.** Hard enums (JSON Schema `enum`) and soft/open enums (plain `string` properties with known values) both require a sibling `meta:enum` object mapping each value to a human-readable label. This is the single most common omission. (CONTRIBUTING.md: "When using `enum` in JSON schema, document all values using `meta:enum`.")
4. **Deprecate, don't remove or rename.** Even on `meta:status: experimental` schemas, additive/deprecation-only PRs are reviewed and merged faster. Removing or renaming a field is a breaking change and requires explicit justification.
5. **Every example must validate.** Every field in `*.example.*.json` must be defined in the schema it validates against. No invented fields, no extra wrapper objects, no hallucinated properties. `npm test` enforces this.
6. **Every new schema ships with at least one `*.example.*.json`.** If you add or materially change a schema, you also add or update an example.
7. **Every new schema sets `meta:status`.** One of `stable`, `experimental`, or `deprecated`. New schemas almost always start at `experimental`.
8. **Never hand-edit files under `docs/reference/`.** That directory is regenerated by `npm run markdown` and committed by CircleCI on merges to `master`. Don't include regenerated docs in a schema PR.

---

## Contribution workflow

A single end-to-end path that satisfies [CONTRIBUTING.md](CONTRIBUTING.md) and matches what reviewers expect. Skip steps only when they genuinely don't apply.

1. **Sync upstream.** `git fetch upstream && git checkout master && git reset --hard upstream/master` (or rebase) so your fork's `master` matches `adobe/xdm:master`.
2. **Open or pick a GitHub issue** on `adobe/xdm` describing the change. If the issue already exists, note its number.
3. **Branch from `master`** with a name that references the issue: `feature-<issue>-short-description` or `bug-<issue>-short-description`.
4. **Make the smallest change that satisfies the issue.** Favor additive/deprecation-only edits. See [Schema design rules](#schema-design-rules).
5. **Update or add `*.example.*.json`** for every schema you touched. Verify the example uses only fields defined in the schema and matches the schema's types, enums, and `xdm:` prefixing.
6. **Run the full validation suite** (below). Zero new failures required.
7. **Commit.** Conventional style `<subject>` or `<TICKET-KEY> <subject>` (internal Adobe contributors typically prefix with a Jira key). Keep commits focused; squash noise before pushing.
8. **Push to your fork** and open a **draft** PR against `adobe/xdm:master`. Fill in the PR body per [PR requirements](#pr-requirements).
9. **Mark the PR ready for review** once validation is green.
10. **Respond to review comments** by pushing new commits to the same branch. Don't force-push to a shared PR branch unless explicitly asked — reviewers use incremental diffs.

---

## Validation commands

Run all of these locally before opening the PR. Report the numbers in the PR body under a `## Validation` section so reviewers don't have to re-run them.

| Command | What it does | Notes |
| ------------------------------- | ------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `npm install` | Installs dependencies | Run once per fresh clone or worktree. |
| `npm test` | Mocha — validates every `*.example.*.json` against its schema | Currently ~2300+ passing. A failure here almost always means an example has a field not in the schema, or a value that violates `enum`/`type`/`pattern`. |
| `npm run lint` | `prettier --write` then `git diff --exit-code` | If prettier changes files, **lint fails**. Always commit the prettified output. |
| `npm run validate` | `node bin/validate-schemas.js` — validates every example file via the local `$id` registry, fetching unresolved refs over HTTPS | There are pre-existing failures in `master`. Verify your changes produce **zero new** failures, not zero total failures. |
| `npm run incompatibility-check` | Flags backward-incompatible edits to existing schemas | Must be clean for the files you touched. |

CI runs `lint` and `test` on every PR. `validate` currently runs locally only — see the companion CI-proposal issue for ongoing work to close that gap.

---

## Schema design rules

### Reuse-over-reinvent

**The first design step for any new property, field, or concept is to look for an existing XDM definition that already fits.** This is explicit in [CONTRIBUTING.md](CONTRIBUTING.md)'s "Schema design general guidelines" and in the XDM team's internal design-review guidance:

- "Always start introducing new properties by mapping them to the existing set of Classes, Mixins and Data types."
- "See if you can enhance existing Mixins and Data types by adding the un-mapped properties from the above step."
- "To introduce a new entity in XDM, only add a new class if the new business concept could not be added by existing set of XDM classes."

Concretely, before you create anything new:

1. **Grep/search `components/datatypes/`** for similar concepts. Names worth checking: geographic → `demographic/geo.schema.json`; identity → `identitymap`; time → existing `dateTime` conventions; measurement units → existing datatypes.
2. **Grep/search `components/fieldgroups/`** (or `components/mixins/` in older paths) for a fieldgroup that already groups related properties.
3. **Check `schemas/`** for a composite schema that already combines the class and fieldgroups you need.
4. **Look at related industry schemas** — XDM deliberately borrows from schema.org and Microsoft CDM. See [CONTRIBUTING.md "Design for Continuity"](CONTRIBUTING.md) and [docs/standards.md](docs/standards.md).
5. **Only if nothing fits**, create a new datatype (if the shape will be reused in 2+ places) or inline the properties into the relevant fieldgroup (if used in exactly one place).

When extending an existing definition, prefer adding properties or enum values to the existing schema over creating a parallel one. Parallel schemas that model the same concept create cross-network query pain downstream in AEP.

### Documenting enums (`meta:enum`)

XDM distinguishes **hard enums** (enforced by JSON Schema `enum`) and **soft enums** (open `string` fields with documented known values). **Both require `meta:enum`.**

```jsonc
// Hard enum — closed list
"xdm:adType": {
"title": "Ad type",
"type": "string",
"enum": ["video", "image", "text", "native"],
"meta:enum": {
"video": "Video ad",
"image": "Static image ad",
"text": "Text-only ad",
"native": "Native ad unit"
}
}

// Soft enum — open list, but documented known values
"xdm:adNetwork": {
"title": "Ad network",
"type": "string",
"meta:enum": {
"google_ads": "Google Ads",
"meta": "Meta (Facebook/Instagram)",
"tiktok_ads": "TikTok Ads"
}
}
```

Rules:

- **Every value in `enum` must appear as a key in `meta:enum`.** CI does not yet enforce this; reviewers do.
- **Don't convert an existing `string` field into a hard `enum`** unless the domain is genuinely closed and you've confirmed no known consumer emits other values. Soft enums (string + `meta:enum`) are the default for producer-specific vocabularies.
- **When adding enum values, preserve existing values** — this is additive. When deprecating, keep the value and note deprecation in the `meta:enum` label or field description.
- **Match the repo's casing convention** for enum values — typically `snake_case` (`video_bumper`, `app_install`), not the source API's `SCREAMING_SNAKE_CASE`.
- **When collapsing multiple source values onto one XDM value** (e.g., merging two Google Ads enum rows into one XDM enum), note the mapping explicitly in the PR description so reviewers can audit.

### Deprecate, don't remove

For any schema, including `experimental` ones:

- **To retire a field:** keep it, add/extend the `description` to say `(DEPRECATED — use <X> instead)`, and where applicable set `"meta:status": "deprecated"` on the property.
- **To rename a field:** add the new field, deprecate the old one, and wait for a future stabilization pass to remove.
- **To retire an enum value:** keep it in `enum`, note the deprecation in its `meta:enum` label.

This keeps existing producers from breaking and gets the PR reviewed/merged faster.

### File placement

From [CONTRIBUTING.md "Json file location guidelines"](CONTRIBUTING.md):

| Concept | Location |
| ----------------------------------------- | ----------------------------------------------------------------------------------------------------------- |
| Class | `components/classes/<class>.schema.json` |
| Mixin / Fieldgroup scoped to one class | `components/mixins/<ClassName>/<name>.schema.json` (newer content lives under `components/fieldgroups/`) |
| Mixin / Fieldgroup shared across classes | `components/mixins/shared/<name>.schema.json` |
| Datatype | `components/datatypes/<name>.schema.json` (often nested by domain, e.g. `components/datatypes/paid-media/`) |
| Global, non-extensible schema | `schemas/<name>.schema.json` |
| Adobe/solution-specific | `extensions/<vendor>/<product>/...` |
| Example file | Same directory as its schema, named `<schema>.example.<N>.json` |
| Invalid example (should fail to validate) | Same directory, named `<schema>.invalid.<N>.json` |

### Conventions

- **JSON Schema:** `draft-6`.
- **`$ref`:** always absolute to the target schema's `$id` (e.g., `https://ns.adobe.com/xdm/datatypes/geo`). Never relative (`../datatypes/geo.schema.json`).
- **`$id`:** unique, immutable per schema, matches the file path but **without** the `.schema.json` suffix.
- **Property keys:** `xdm:`-namespaced (`xdm:campaignName`), `camelCase` after the prefix.
- **Acronyms in `camelCase`:** `documentID`, `dmaID`, `xdmAPI` — first acronym lowercased when leading, second acronym uppercased when concatenated.
- **Booleans:** prefix `is` or `has` (`isRemovable`, `hasExpired`) where it reads naturally.
- **Every schema:** includes `meta:license`, `$id`, `$schema`, `title`, `type`, `description`, and `meta:status`.
- **Every property:** has `title`, `description`, and a concrete `type` (no `"type": ["string", "number"]` mixtures).
- **Every array property:** has an `items` definition with a concrete type.
- **Nested objects:** extract to a separate `*.schema.json` datatype if they themselves contain object-typed properties. Don't deep-nest inline.
- **No non-semantic limits:** don't set `"maxLength": 255` just because your database uses `VARCHAR(255)`. Bounds should reflect the business domain.

---

## Example file rules

Examples are the single biggest source of validation failures in PRs. Before pushing:

1. **Every field in the example is defined in the schema.** No invented fields, no extra wrapper objects, no hallucinated nested properties.
2. **Every `xdm:`-prefixed property in the schema is `xdm:`-prefixed in the example.** Missing prefix on one key breaks validation.
3. **Every enum value in the example is listed in the schema's `enum` array** (for hard enums).
4. **Types match** — if the schema says `"type": "number"`, don't put an object.
5. **Sweep neighbors.** When you change a field, update every `*.example.*.json` that references that field, not just the one next to the schema file.
6. **New schema ⇒ at least one example.** [CONTRIBUTING.md](CONTRIBUTING.md) requires it.

If `npm test` fails with `AssertionError [ERR_ASSERTION]: Example is not valid`, read the full mocha output — it names the example file and the specific property that failed.

---

## PR requirements

### PR title

- Short, present tense, describes the "what."
- If your organization uses a ticket tracker, a leading ticket key is fine: `<TICKET-KEY> <subject>`.

### PR body

Template that matches reviewer expectations:

```markdown
## Summary

Closes #<issue-number>

<1–3 sentences on what changed and why. Call out "non-breaking" or "breaking"
explicitly. If you added or deprecated enum values, say exactly how many.>

## Motivation

<The problem being solved. If you're aligning with an industry spec or an
upstream ad-network/data-source schema, cite the primary source (link).>

## Changes

- `path/to/schema.json` — <what changed>
- `path/to/schema.example.1.json` — <what changed>

## Validation

- `npm test` — <N> passing
- `npm run lint` — clean
- `npm run validate` — zero new failures vs `master` baseline
- `npm run incompatibility-check` — clean

## Breaking changes

None. <Or: list them and confirm they are also recorded in CHANGELOG.md.>
```

### Other PR rules

- **Draft first.** Open as a draft until validation passes and you've self-reviewed the diff. Mark ready for review afterward.
- **Base branch is `master`**, not `main`.
- **Fork PRs only.** Branches on `adobe/xdm` are for maintainers.
- **List breaking changes in `CHANGELOG.md`** if any. Additive/deprecation-only PRs don't need `CHANGELOG` edits.
- **Don't commit regenerated files under `docs/reference/`.** CircleCI handles that on merge to `master`.

---

## Common pitfalls

A running list of things PRs get sent back for. Add to this list as new patterns emerge.

- **Forgot `meta:enum`** for one or more enum values. The single most common omission.
- **Converted a soft enum (plain string) into a hard `enum`** without confirming the domain is closed. Prefer additive `meta:enum` entries on the existing `string`.
- **Removed or renamed a field** instead of deprecating it.
- **Example references a field the schema doesn't define**, or an `xdm:`-prefixed property without the prefix.
- **Missed `xdm:` prefix** on a new property key.
- **Relative `$ref`** (`../datatypes/...`) instead of absolute (`https://ns.adobe.com/...`).
- **Deep-nested inline object types** that should have been extracted to a datatype.
- **PR has no `Closes #` reference**, or references a non-existent issue.
- **Prettier reformatted files** but the formatting changes weren't committed, so CI lint fails.
- **Hand-edited files under `docs/reference/`** — these are regenerated artifacts.
- **Created a parallel datatype** for something that already exists in `components/datatypes/`. See [Reuse-over-reinvent](#reuse-over-reinvent).
- **Invented a rationale** in the PR description (e.g., "the Google Ads API requires X") without linking the primary source. Reviewers will ask for citations.

---

## Repository layout (quick reference)

- `components/classes/` — XDM classes (behaviors: record, timeseries)
- `components/datatypes/` — reusable data types
- `components/fieldgroups/` — fieldgroup/mixin definitions
- `components/mixins/` — legacy mixin path (newer work lands in `fieldgroups/`)
- `schemas/` — top-level, non-extensible schemas
- `extensions/` — vendor/solution-specific extensions (Adobe products, etc.)
- `docs/` — hand-written documentation
- `docs/reference/` — **generated** documentation; never hand-edit
- `bin/` — validation scripts, stabilization tooling
- `test/` — mocha tests that drive `npm test`

---

## When in doubt

- **Read [CONTRIBUTING.md](CONTRIBUTING.md) again.** It's long but authoritative.
- **Read a recent merged PR** in the same area (e.g., `gh pr list --state merged --search <keyword>`) to see the exact shape reviewers approved.
- **Ask in the GitHub issue before opening the PR** if the design choice is non-obvious — one round of async design feedback beats three rounds of review comments.