Task 129 — v4 normalization block carrier (parse_methods digest + field_maps scaffold) (INE-61) by e-fu · Pull Request #10 · ZenHive/ccxt_extract

e-fu · 2026-05-08T10:19:25Z

Summary

Lands the Phase 12 prerequisite: scaffolds the v4 normalization section under --schema-target=4 only. v3 emission stays byte-identical (verified on binance/deribit/okx — MD5-identical between development tip and this branch).

What ships

New CcxtExtract.Normalization module (lib/ccxt_extract/normalization.ex)

build/2 projects the per-exchange priv/discoveries/parse_methods.json entry into a compact, AST-free digest ({params, return_type, async, statement_count}) and scaffolds field_maps + response_envelopes keyed by parser type (ticker/trade/ohlcv/order/position/balance/market/transaction/deposit_address).
No AST body. Re-emitting full ESTree bodies regresses the 91.6% Hex-cap reduction Task 117 shipped — binance v4 spec is only ~5 KB larger than v3 (carrier ~7.4 KB out of ~2 MB).
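
For orientation, the carrier shape described above can be sketched roughly as follows. This is an illustration only, not the module's code: `CarrierSketch`, the `parseTicker` entry and its values are hypothetical, and the assumption that each scaffold holds one `nil` entry per parser type plus an `_unresolved_reason` tag is inferred from the schema notes rather than confirmed by the diff.

```elixir
defmodule CarrierSketch do
  # Parser-type vocabulary as listed in the PR description.
  @parser_types ~w(ticker trade ohlcv order position balance market transaction deposit_address)

  # Assumption: each scaffold is a map with one nil entry per parser type
  # plus an "_unresolved_reason" tag. The real layout may differ.
  def stub do
    @parser_types
    |> Map.new(fn type -> {type, nil} end)
    |> Map.put("_unresolved_reason", "not_yet_derived")
  end

  # Wraps a ready-made digest in the three required top-level keys.
  def carrier(digest) when is_map(digest) do
    %{
      "parse_methods_digest" => digest,
      "field_maps" => stub(),
      "response_envelopes" => stub()
    }
  end
end

# Hypothetical digest entry: method name -> compact, AST-free record.
digest = %{
  "parseTicker" => %{
    "params" => [%{"name" => "ticker", "type" => nil}],
    "return_type" => "Ticker",
    "async" => false,
    "statement_count" => 12
  }
}

digest |> CarrierSketch.carrier() |> IO.inspect()
```

Because only four small fields per method survive, the digest stays a few KB even for exchanges with many parse methods, which is consistent with the ~7.4 KB carrier reported for binance.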

Pipeline wiring (lib/ccxt_extract/pipeline.ex)

Pipeline.build_exchange_data/3 calls Normalization.build/2 only on the v4 path; v3 path is untouched.
Schema.build_exchange_v4/4 reads :normalization from opts and emits the new block; the old empty placeholder %{} is gone.

Schema (priv/schema/exchange_v4.json)

New Normalization, NormalizationDigestRecord, NormalizationStubRecord, NormalizationStubValue defs.
Strict on top-level keys; permissive on stub values (oneOf [null, object]) so Phase 12 sub-bundles can populate without a schema break.

Provenance (lib/ccxt_extract/provenance.ex)

Three new derived pointers in v4: /normalization/parse_methods_digest, /normalization/field_maps, /normalization/response_envelopes.

Two new contract invariants (lib/ccxt_extract/contract_test.ex)

normalization_shape_valid — top-level + per-record shape gate. Mirrors testnet_urls_shape_valid. Skipped on v3-shaped output.
parse_methods_digest_covers_inventory — every method in the per-exchange parse_methods.json entry must surface in the emitted digest (lossless-projection guard). Inventory loaded once via load_parse_methods_inventory/1 and threaded through observed.
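
The lossless-projection guard boils down to a set difference between inventory method names and digest keys. A minimal sketch, assuming the inventory for one exchange is a list of method-name strings (`CoverageSketch` and `missing_methods/2` are hypothetical names, not the invariant's real helpers):

```elixir
defmodule CoverageSketch do
  # Returns the inventory method names that are absent from the digest, sorted.
  # A nil inventory (exchange not present in parse_methods.json) and a
  # non-map digest are both skipped instead of raising.
  def missing_methods(_digest, nil), do: []
  def missing_methods(digest, _inventory) when not is_map(digest), do: []

  def missing_methods(digest, inventory) when is_list(inventory) do
    digest_keys = digest |> Map.keys() |> MapSet.new()

    inventory
    |> MapSet.new()
    |> MapSet.difference(digest_keys)
    |> Enum.sort()
  end
end

# "parseTrade" is in the inventory but not in the digest, so it is reported.
CoverageSketch.missing_methods(%{"parseTicker" => %{}}, ["parseTicker", "parseTrade"])
# returns ["parseTrade"]
```

Sorting the result keeps findings deterministic across runs, which matters when the output is diffed against a baseline.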

Acceptance criteria

Check	Status
`mix format --check-formatted`	✅ clean
`mix compile --warnings-as-errors`	✅ clean
`mix credo --strict`	✅ TODO/FIXME exit-2 only (carve-out per AGENTS.md)
`mix doctor`	✅ 100% (105 modules)
`mix dialyzer`	✅ 0 warnings
`mix sobelow --exit Low`	⚠️ line-fingerprint drift (see Reviewer note)
`mix test.json --quiet`	✅ same failure set as the `development` baseline (15 pre-existing cached-integration / corpus-staleness failures, 0 new)
`mix test.json --cover` (Normalization module)	✅ 95.24% on the new module
`mix ccxt_extract.contract_test`	✅ both new invariants 0 findings (correctly skip on v3-shaped output)
`mix ccxt_extract.validate --schema-target=4` (tier1+tier2)	✅ 19/19 exchanges validate clean
v3 emit byte-identical to `development` for binance/deribit/okx	✅ MD5 match
binance v4 spec size	✅ ~5 KB delta vs v3 (no Hex-cap regression)
PR opened non-draft	✅

Reviewer note (`.sobelow-skips`)

Per linear-workflow § "Code-Only PRs + Required Acceptance Criteria":

.sobelow-skips exception: for repos with sobelow line-fingerprint drift, the harness fails-loud-with-diff if drift is detected; commit-review applies the regen at merge in the same post-merge commit. Agent never touches the file.

This PR's edits to lib/ccxt_extract/contract_test.ex (added Normalization alias + parse_methods_inventory baseline plumbing) and lib/ccxt_extract/pipeline.ex (added Normalization alias + 8-line v4 carrier wiring) shifted line numbers, invalidating 8 fingerprints and producing 8 new ones. Local mix sobelow --exit Low flags one finding (validation.ex:184 — the same finding skip 666 covers, just with a slightly different context hash). Reviewer regenerates .sobelow-skips in the post-merge bookkeeping commit.

The CI gate (mix sobelow with no --exit) returns 0 regardless of findings, so harness CI passes.

Out of scope (deferred to commit-review's post-merge bookkeeping commit)

ROADMAP.md (Task 129 status flip ⬜ → ✅)
CHANGELOG.md ([Unreleased] entry)
SCHEMA.md (Normalization carrier description for v4 section)
.sobelow-skips (line-fingerprint regen)

Cross-repo

../ccxt_client/ROADMAP.md Phase 12 parser tasks remain blocked until field_maps are populated (Tasks 74–82 / 83). This carrier is the load-bearing prerequisite — no ccxt_client task flips on this PR alone.

Scoring

[D:3/B:6/U:7 → Eff:2.17] 🚀

Linear Issue: INE-61

Summary by CodeRabbit

New Features
- v4 schema now emits a populated normalization block with parse-method digests, field maps, and response envelopes.
- Emitted provenance marks normalization paths as derived.
Validation
- New runtime checks validate normalization shape and ensure parse-method digests cover discovered methods.
Tests
- Added unit and integration tests covering normalization construction, validation, and provenance.

linear-code · 2026-05-08T10:19:29Z

INE-61

coderabbitai · 2026-05-08T10:19:36Z

📝 Walkthrough

This PR introduces the Normalization module for v4 schemas: it builds a compact normalization block (digest of parse methods, scaffolded field_maps and response_envelopes) from discovery data, integrates it into the pipeline and schema, validates structure via two new contract invariants, and extends test coverage across modules, contracts, and integration tests.

Changes

Normalization v4 Schema Construction & Validation

Layer / File(s)	Summary
Data Shape & Contract Definitions `lib/ccxt_extract/normalization.ex`, `priv/schema/exchange_v4.json`	JSON schema expanded from empty placeholder to strict structure requiring `parse_methods_digest`, `field_maps`, `response_envelopes` with new `$defs` types (`NormalizationDigestRecord`, `NormalizationStubRecord`, `NormalizationStubValue`); Normalization module provides parser type vocabulary and required key contract functions.
Core Builder Implementation `lib/ccxt_extract/normalization.ex`	`build/2` derives compact AST-free digest (method name → params, return_type, async, statement_count) from `parse_methods`; normalizes params lists and coerces statement counts to non-negative; scaffolds `field_maps` and `response_envelopes` with `_unresolved_reason` markers.
Pipeline & Schema Integration `lib/ccxt_extract/pipeline.ex`, `lib/ccxt_extract/schema.ex`, `lib/ccxt_extract/provenance.ex`	Pipeline constructs normalization from `data.parse_methods[id]` and injects into v4 options; Schema emits normalization in v4 output replacing prior empty object; Provenance registers `/normalization/parse_methods_digest`, `/normalization/field_maps`, `/normalization/response_envelopes` as derived pointers.
Contract Invariants & Validation `lib/ccxt_extract/contract_test.ex`	Registers `check_normalization_shape_valid/2` validating required/extra keys and field types; registers `check_parse_methods_digest_covers_inventory/2` ensuring digest includes all discovery inventory methods; loads `priv/discoveries/parse_methods.json` into baseline inventory.
Unit Tests: Normalization Module `test/ccxt_extract/normalization_test.exs`	Covers `build/2` digest derivation, scaffold structure, required keys, type validity, method-name inventory preservation, and defensive handling of malformed params/statements/non-map inputs.
Contract Invariant Tests `test/ccxt_extract/contract_test_test.exs`	Tests `check_normalization_shape_valid/2` for missing required keys, extra keys, digest record type errors, field_maps/response_envelopes malformations; tests `check_parse_methods_digest_covers_inventory/2` for skip conditions, exact match, and missing inventory methods with deterministic sorting.
Pipeline, Provenance & Integration Tests `test/ccxt_extract/pipeline_test.exs`, `test/integration/cached/schema_v4_emit_cached_test.exs`	Updates v4 emit test to assert normalization scaffold; adds new tests for digest projection from `parse_methods.json`, v3 exclusion of normalization, stub structure validation; extends provenance test to require normalization pointers; integration test validates emitted normalization structure and unresolved markers.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

ZenHive/ccxt_extract#7: Introduces the v4 emission path that this PR extends by building and threading a concrete normalization value into Pipeline → Schema.build_exchange_v4/4 / Provenance code paths.

Poem

🐰 A tiny rabbit coded through the night,
Turning parse methods into a digest bright.
Stubs stand ready with reasons not yet done,
Invariants check each method, one by one.
Hooray — v4 normalization, tidy and light!

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately describes the main change: introduction of a v4 normalization block carrier that projects parse_methods into a digest and scaffolds field_maps and response_envelopes.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

Copilot

Pull request overview

This PR scaffolds a new v4-only normalization block in the emitted exchange schema, including an AST-free parse_methods_digest plus stub field_maps and response_envelopes, while keeping v3 output unchanged.

Changes:

Add CcxtExtract.Normalization to build the v4 normalization carrier (digest + scaffolds).
Wire the pipeline/schema to include normalization only for --schema-target=4, and extend v4 provenance pointers accordingly.
Extend v4 JSON Schema and add/expand tests + contract invariants to validate the new block.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.

Show a summary per file

File	Description
test/integration/cached/schema_v4_emit_cached_test.exs	Asserts v4 emits `normalization` with required keys and unresolved scaffolds.
test/ccxt_extract/pipeline_test.exs	Updates v4 pipeline expectations; adds targeted tests for digest projection and v3 absence.
test/ccxt_extract/normalization_test.exs	New unit tests for digest/scaffold generation and defensive behavior.
test/ccxt_extract/contract_test_test.exs	Adds tests for new normalization-related contract invariants.
priv/schema/exchange_v4.json	Defines the `Normalization` block, digest record, and stub record/value schema.
lib/ccxt_extract/schema.ex	Emits `normalization` in v4 using provided/derived scaffold instead of `%{}`.
lib/ccxt_extract/provenance.ex	Adds derived provenance pointers for `/normalization/*` in v4.
lib/ccxt_extract/pipeline.ex	Computes `normalization` only on the v4 path and threads it into schema build opts.
lib/ccxt_extract/normalization.ex	New module implementing digest projection and stub scaffold generation.
lib/ccxt_extract/contract_test.ex	Adds invariants for normalization shape and digest coverage vs inventory; loads parse_methods inventory.

+  defp digest_findings(_id, nil), do: []
+
+  defp digest_findings(id, digest) when is_map(digest) do
+    digest


+  defp stub_record_findings(_id, _path, nil), do: []
+


+    if Map.has_key?(exchange, "normalization") do
+      digest = get_in(exchange, ["normalization", "parse_methods_digest"]) || %{}
+      inventory = Map.get(observed[:parse_methods_inventory] || %{}, id)
+
+      digest_inventory_findings(id, digest, inventory)
+    else


    auth_sections = structure_data["authenticated_sections"]
    sign_method = structure_data["sign_method"]
    describe_api = structure_data["describe_api"]
+    normalization = Keyword.get(opts, :normalization) || CcxtExtract.Normalization.build(nil)



chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e86b1d667f

chatgpt-codex-connector · 2026-05-08T10:23:52Z

+  defp digest_inventory_findings(id, digest, inventory) when is_list(inventory) do
+    digest_keys = digest |> Map.keys() |> MapSet.new()


Guard non-map digests before checking inventory

If a v4 output has a malformed normalization.parse_methods_digest value (for example a list or string) and the parse-method inventory contains this exchange, this invariant crashes at Map.keys/1 instead of reporting findings; run_invariants/2 does not rescue invariant errors, so one bad exchange aborts the entire contract-test run. The shape invariant already reports parse_methods_digest as non-map, so this coverage check should skip or emit its own finding unless digest is a map.

chatgpt-codex-connector · 2026-05-08T10:23:52Z

    "Normalization": {
      "type": "object",
-      "description": "Response-normalization surface. Empty placeholder until Task 129 ships parse_methods_digest (compact, no AST body), Phase 12 ships field_maps, and Task 83 ships response_envelopes. Permissive (additionalProperties: true) so future tasks can populate without a schema break.",
-      "additionalProperties": true,
-      "properties": {}
+      "description": "Response-normalization surface. Task 129 ships parse_methods_digest (compact, no AST body — re-emitting bodies regresses the 91.6% Hex-cap reduction Task 117 shipped), and stub field_maps + response_envelopes scaffolds. Phase 12 sub-bundles (Tasks 74–83) populate the field maps and envelope paths on top of this scaffold.",
+      "required": [
+        "parse_methods_digest",
+        "field_maps",
+        "response_envelopes"
+      ],


Document the new normalization contract

This changes the emitted v4 JSON shape from an open empty normalization placeholder to a required, closed parse_methods_digest/field_maps/response_envelopes contract, but the commit does not update SCHEMA.md. The repository AGENTS.md documentation invariant explicitly requires SCHEMA.md updates when the emitted JSON shape changes; without that, consumers reading the contract docs will still see stale normalization guidance.

coderabbitai

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

lib/ccxt_extract/schema.ex (1)

192-218: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Teach validate_v4/1 about the injected normalization shape.

build_exchange_v4/4 now accepts caller-provided :normalization, but validate_v4/1 never descends into that section. A payload with %{"normalization" => %{}} still passes preflight, so malformed v4 output can slip through Pipeline.extract/1 until a later full-schema run.

Minimal follow-up

+  @required_normalization_keys_v4 ~w(parse_methods_digest field_maps response_envelopes)
...
       |> check_required_keys(data["auth"], @required_auth_keys_v4, "auth")
       |> check_required_keys(data["errors"], @required_errors_keys_v4, "errors")
+      |> check_required_keys(data["normalization"], @required_normalization_keys_v4, "normalization")
       |> check_required_keys(data["markets"], @required_markets_keys_v4, "markets")
       |> check_required_keys(data["raw"], @required_raw_keys_v4, "raw")
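
To make the effect of that extra pipeline step concrete, here is a standalone sketch of a required-keys check. The four-argument `check_required_keys` shape is taken from the diff above, but its body, the error strings, and the `PreflightSketch` module are assumptions for illustration; the real helper in `schema.ex` may accumulate errors differently.

```elixir
defmodule PreflightSketch do
  @required_normalization_keys ~w(parse_methods_digest field_maps response_envelopes)

  # Appends one error per required key that is absent from `section`.
  def check_required_keys(errors, section, keys, label) when is_map(section) do
    missing = Enum.reject(keys, fn key -> Map.has_key?(section, key) end)
    errors ++ Enum.map(missing, fn key -> "#{label}: missing required key #{key}" end)
  end

  # A section that is not a map at all is reported as a single error.
  def check_required_keys(errors, _section, _keys, label) do
    errors ++ ["#{label}: expected a map"]
  end

  # Hypothetical entry point covering only the normalization section.
  def validate_normalization(data) when is_map(data) do
    check_required_keys([], data["normalization"], @required_normalization_keys, "normalization")
  end
end

# An empty normalization map now yields three errors instead of passing preflight.
PreflightSketch.validate_normalization(%{"normalization" => %{}})
```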

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/ccxt_extract/schema.ex` around lines 192 - 218, validate_v4/1 must also
descend into the injected "normalization" section added by build_exchange_v4/4;
update validate_v4/1 to fetch the "normalization" key from the payload and
validate it against the Normalization module (e.g. call
CcxtExtract.Normalization.validate/1 or the module's schema/validator) so that a
caller-provided %{"normalization" => %{}} is rejected, and wire any validation
errors into the same preflight error reporting path as the other sections.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 6baf393e-f4a2-4212-8b2c-48bc50565e2f

📥 Commits

Reviewing files that changed from the base of the PR and between b65a526 and e86b1d6.

📒 Files selected for processing (10)

lib/ccxt_extract/contract_test.ex
lib/ccxt_extract/normalization.ex
lib/ccxt_extract/pipeline.ex
lib/ccxt_extract/provenance.ex
lib/ccxt_extract/schema.ex
priv/schema/exchange_v4.json
test/ccxt_extract/contract_test_test.exs
test/ccxt_extract/normalization_test.exs
test/ccxt_extract/pipeline_test.exs
test/integration/cached/schema_v4_emit_cached_test.exs

coderabbitai · 2026-05-08T10:25:56Z

+    if Map.has_key?(exchange, "normalization") do
+      digest = get_in(exchange, ["normalization", "parse_methods_digest"]) || %{}
+      inventory = Map.get(observed[:parse_methods_inventory] || %{}, id)
+
+      digest_inventory_findings(id, digest, inventory)


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard non-map digests before calling Map.keys/1.

If a broken emit writes normalization.parse_methods_digest as a scalar, this invariant raises and aborts the whole contract run instead of returning a finding. normalization_shape_valid already handles malformed sections defensively, so this check should short-circuit the same way.

One safe way to harden it

   def check_parse_methods_digest_covers_inventory(exchange, observed) do
     id = exchange_id(exchange)

     if Map.has_key?(exchange, "normalization") do
-      digest = get_in(exchange, ["normalization", "parse_methods_digest"]) || %{}
+      digest = get_in(exchange, ["normalization", "parse_methods_digest"])
       inventory = Map.get(observed[:parse_methods_inventory] || %{}, id)

       digest_inventory_findings(id, digest, inventory)
     else
       []
     end
   end

   defp digest_inventory_findings(_id, _digest, nil), do: []
+  defp digest_inventory_findings(_id, digest, _inventory) when not is_map(digest), do: []

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@lib/ccxt_extract/contract_test.ex` around lines 1185 - 1189, The code assumes `digest` is a map before passing it to `digest_inventory_findings`, which can crash if `normalization.parse_methods_digest` is a scalar; after retrieving `digest` with get_in(exchange, ["normalization", "parse_methods_digest"]) (currently assigned to `digest`), guard it with an is_map check and coerce non-maps to an empty map (or bail out and return the same kind of finding as normalization_shape_valid does) before calling digest_inventory_findings(id, digest, inventory); this avoids Map.keys/1 errors when malformed digests are emitted.

coderabbitai · 2026-05-08T10:25:56Z

+  defp digest_from_entry(nil), do: %{}
+  defp digest_from_entry(%{"parse_methods" => methods}) when is_map(methods), do: digest_from_methods(methods)
+  defp digest_from_entry(_), do: %{}
+
+  defp digest_from_methods(methods) do
+    Map.new(methods, fn {name, ast} ->
+      {name, digest_record(ast)}
+    end)
+  end
+
+  defp digest_record(ast) when is_map(ast) do
+    %{
+      "params" => normalize_params(Map.get(ast, "params")),
+      "return_type" => Map.get(ast, "return_type"),
+      "async" => Map.get(ast, "async", false) == true,
+      "statement_count" => normalize_statement_count(Map.get(ast, "statements"))
+    }
+  end
+
+  defp digest_record(_) do
+    %{
+      "params" => [],
+      "return_type" => nil,
+      "async" => false,
+      "statement_count" => 0
+    }
+  end
+
+  defp normalize_params(nil), do: []
+
+  defp normalize_params(list) when is_list(list) do
+    Enum.map(list, &normalize_param/1)
+  end
+
+  defp normalize_params(_), do: []
+
+  defp normalize_param(%{"name" => name} = param) when is_binary(name) do
+    %{"name" => name, "type" => Map.get(param, "type")}
+  end
+
+  defp normalize_param(_), do: %{"name" => "", "type" => nil}
+
+  defp normalize_statement_count(n) when is_integer(n) and n >= 0, do: n
+  defp normalize_statement_count(_), do: 0


🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Add @specs for the private helpers.

This new module leaves every defp untyped, which breaks the repo rule for Elixir modules and makes Dialyzer regressions here harder to catch.

As per coding guidelines: every Elixir function must have a @spec, def and defp alike. No exceptions for 'trivial' helpers.
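
As a rough illustration of what that would look like, the helpers quoted above can carry specs like the ones below. The bodies are copied from the diff (with the `nil` clause of `normalize_params/1` folded into the catch-all, which behaves the same); the `SpecSketch` module name and the exact type choices are assumptions, and `digest_record/1` is made public here only so the sketch can be called directly.

```elixir
defmodule SpecSketch do
  # A defp in the PR; public here so the sketch is callable from outside.
  @spec digest_record(term()) :: map()
  def digest_record(ast) when is_map(ast) do
    %{
      "params" => normalize_params(Map.get(ast, "params")),
      "return_type" => Map.get(ast, "return_type"),
      "async" => Map.get(ast, "async", false) == true,
      "statement_count" => normalize_statement_count(Map.get(ast, "statements"))
    }
  end

  def digest_record(_) do
    %{"params" => [], "return_type" => nil, "async" => false, "statement_count" => 0}
  end

  # @spec is accepted on private functions as well as public ones.
  @spec normalize_params(term()) :: [map()]
  defp normalize_params(list) when is_list(list), do: Enum.map(list, &normalize_param/1)
  defp normalize_params(_), do: []

  @spec normalize_param(term()) :: map()
  defp normalize_param(%{"name" => name} = param) when is_binary(name) do
    %{"name" => name, "type" => Map.get(param, "type")}
  end

  defp normalize_param(_), do: %{"name" => "", "type" => nil}

  @spec normalize_statement_count(term()) :: non_neg_integer()
  defp normalize_statement_count(n) when is_integer(n) and n >= 0, do: n
  defp normalize_statement_count(_), do: 0
end

# Malformed input falls back to safe defaults rather than raising.
SpecSketch.digest_record(%{"params" => [42], "statements" => -3})
```

Using `term()` for the argument type reflects that every helper has a catch-all clause; a narrower argument type would contradict the defensive clauses.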

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@lib/ccxt_extract/normalization.ex` around lines 117 - 160, Add `@spec` annotations for each private helper to satisfy Dialyzer and the repo rule: annotate digest_from_entry/1 (accepts nil or map and returns map()), digest_from_methods/1 (map() -> map()), digest_record/1 (map() | any() -> map()), normalize_params/1 (nil | list() | any() -> list()), normalize_param/1 (map() | any() -> map()), and normalize_statement_count/1 (integer() | any() -> non_neg_integer()). Use concrete Elixir types (map(), list(), integer(), non_neg_integer(), nil) and ensure specs match the existing return shapes (maps/lists/integers) for functions like digest_record/1 and normalize_param/1.

coderabbitai · 2026-05-08T10:25:56Z

+        "_unresolved_reason": {
+          "oneOf": [
+            {
+              "type": "null"
+            },
+            {
+              "type": "string",
+              "minLength": 1
+            }
+          ],
+          "description": "Closed-vocabulary scaffold tag. \"not_yet_derived\" is the Task 129 default. Phase 12 sub-bundles flip it to null when every parser-type entry is populated."
+        }


🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Pin _unresolved_reason to the scaffold sentinel.

This field is documented as a closed-vocabulary scaffold tag, but the schema currently accepts any non-empty string. That makes unexpected sentinel values schema-valid and weakens the new v4 contract.

Proposed schema tightening

         "_unresolved_reason": {
           "oneOf": [
             {
               "type": "null"
             },
             {
               "type": "string",
-              "minLength": 1
+              "enum": [
+                "not_yet_derived"
+              ]
             }
           ],
           "description": "Closed-vocabulary scaffold tag. \"not_yet_derived\" is the Task 129 default. Phase 12 sub-bundles flip it to null when every parser-type entry is populated."
         }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@priv/schema/exchange_v4.json` around lines 348 - 359, The _unresolved_reason field currently allows any non-empty string; tighten it to the closed vocabulary scaffold sentinel by replacing the "type":"string","minLength":1 branch with an enum of allowed sentinel(s) (e.g. ["not_yet_derived"]) so the oneOf becomes either null or that specific sentinel value; update the schema for _unresolved_reason accordingly (refer to the _unresolved_reason property in the diff).

…ld_maps scaffold) (INE-61)

Lands the Phase 12 prerequisite: scaffolds the v4 `normalization` section under `--schema-target=4` only. v3 emission stays byte-identical (verified on binance/deribit/okx).

* New `CcxtExtract.Normalization` module: `build/2` projects the per-exchange `priv/discoveries/parse_methods.json` entry into a compact, AST-free digest (`{params, return_type, async, statement_count}`) and scaffolds `field_maps` + `response_envelopes` keyed by parser type. No AST body: re-emitting full ESTree bodies regresses the 91.6% Hex-cap reduction Task 117 shipped (binance v4 spec is only ~5 KB larger than v3).
* `Pipeline.build_exchange_data/3` calls `Normalization.build/2` only on the v4 path; v3 path is untouched.
* `Schema.build_exchange_v4/4` reads `:normalization` from opts and emits the new block; the old empty placeholder `%{}` is gone.
* `priv/schema/exchange_v4.json`: adds `Normalization`, `NormalizationDigestRecord`, `NormalizationStubRecord`, `NormalizationStubValue` defs. Strict on top-level keys, permissive on stub values (oneOf null/object) so Phase 12 sub-bundles can populate without a schema break.
* `Provenance.build_default_v4/0` adds three new derived pointers: `/normalization/parse_methods_digest`, `/normalization/field_maps`, `/normalization/response_envelopes`.
* Two new contract invariants:
  - `normalization_shape_valid`: top-level + per-record shape gate; skipped on v3-shaped output (no top-level `normalization` key).
  - `parse_methods_digest_covers_inventory`: every method in the per-exchange parse_methods.json entry must surface in the emitted digest. Inventory loaded once via `load_parse_methods_inventory/1` and threaded through `observed`.

Tests: new `test/ccxt_extract/normalization_test.exs` (round-trip + shape + defensive cases, ~95% coverage on the new module), new contract-test cases for both invariants, extended `pipeline_test.exs` v4 emit asserts, extended cached `schema_v4_emit_cached_test.exs` to assert the new block on tier 1 priority exchanges (binance / deribit / okx).

Validated against `exchange_v4.json` for all 19 tier1+tier2 exchanges (0 failures). v3 binance/deribit/okx emission MD5-identical between `development` and this branch.

`.sobelow-skips` line-fingerprint drift expected from line shifts in contract_test.ex / pipeline.ex; per linear-workflow § "Code-Only PRs" the agent does not touch the file; commit-review applies the regen in the post-merge bookkeeping commit.

Co-authored-by: E.FU <e-fu@users.noreply.github.com>

coderabbitai

Actionable comments posted: 3

♻️ Duplicate comments (1)

lib/ccxt_extract/contract_test.ex (1)

1424-1428: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard non-map digests before Map.keys/1.

If normalization.parse_methods_digest is emitted as a scalar, this path still crashes the whole contract run instead of returning a finding or skipping malformed data.

Suggested hardening

   def check_parse_methods_digest_covers_inventory(exchange, observed) do
     id = exchange_id(exchange)

     if Map.has_key?(exchange, "normalization") do
-      digest = get_in(exchange, ["normalization", "parse_methods_digest"]) || %{}
+      digest = get_in(exchange, ["normalization", "parse_methods_digest"])
       inventory = Map.get(observed[:parse_methods_inventory] || %{}, id)

       digest_inventory_findings(id, digest, inventory)
     else
       []
     end
   end

   defp digest_inventory_findings(_id, _digest, nil), do: []
+  defp digest_inventory_findings(_id, digest, _inventory) when not is_map(digest), do: []

   defp digest_inventory_findings(id, digest, inventory) when is_list(inventory) do

Also applies to: 1436-1437

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/ccxt_extract/contract_test.ex` around lines 1424 - 1428, The code assumes
normalization.parse_methods_digest is a map and later calls Map.keys/1 which
will crash on scalars; update the guards so before calling
digest_inventory_findings (and the analogous block at lines 1436-1437) you
verify the digest is a map (use is_map(digest)) and if not either coerce it to
an empty map (digest = %{}) or return/record a finding for malformed digest, so
Map.keys/1 is never invoked on a non-map value; apply the same check for
observed[:parse_methods_inventory] where appropriate.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@lib/ccxt_extract/contract_test.ex`:
- Around line 1381-1396: stub_value_findings/3 currently only iterates
Normalization.parser_types() so a scalar _unresolved_reason slips through;
update the function to validate _unresolved_reason as well by either including
"_unresolved_reason" in the checked keys or adding an explicit check after the
Enum.flat_map that inspects Map.fetch(record, "_unresolved_reason") and, if
{:ok, other} and not nil and not a map, returns a normalization_finding(id,
"#{path}._unresolved_reason", "must be null or a map, got #{inspect(other)}");
keep the same message format and return shape so callers of
stub_value_findings/3 behave unchanged.
- Around line 1279-1286: The current Map.get(record, "...") calls (used when
computing digest_findings, field_maps_findings, response_envelopes_findings)
swallow present-but-null values, so explicit "key": null doesn't produce a shape
finding; replace those Map.get/2 usages with Map.fetch(record, "key") (or an
equivalent that distinguishes present-nil from missing) and pass the {:ok,
value} result through to stub_record_findings (so {:ok, nil} yields a finding),
while letting :error continue to be treated as the missing-key case already
reported; apply the same change where Map.get/2 is used in the other occurrences
mentioned (lines 1299-1308 and 1364-1378).
- Around line 1792-1804: The current JsonIO.read_json(path) branch silently
returns %{} for any error or wrong shape, hiding malformed parse_methods.json;
change the logic in the case handling (the block calling JsonIO.read_json) so
that only a missing-file error (e.g., {:error, :enoent} or whatever JsonIO uses
for not-found) returns %{}, while any other {:error, reason} or an {:ok, value}
that does not match %{"exchanges" => entries} should raise or propagate an error
(or at minimum log and rethrow) so malformed JSON or unexpected top-level shape
surfaces and does not silently disable parse_methods_digest_covers_inventory.



ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 854fdfa8-6a93-4764-a924-dd881e377b59

📥 Commits

Reviewing files that changed from the base of the PR and between e86b1d6 and c5ad67f.

📒 Files selected for processing (10)

lib/ccxt_extract/contract_test.ex
lib/ccxt_extract/normalization.ex
lib/ccxt_extract/pipeline.ex
lib/ccxt_extract/provenance.ex
lib/ccxt_extract/schema.ex
priv/schema/exchange_v4.json
test/ccxt_extract/contract_test_test.exs
test/ccxt_extract/normalization_test.exs
test/ccxt_extract/pipeline_test.exs
test/integration/cached/schema_v4_emit_cached_test.exs

✅ Files skipped from review due to trivial changes (1)

lib/ccxt_extract/provenance.ex

🚧 Files skipped from review as they are similar to previous changes (7)

lib/ccxt_extract/pipeline.ex
test/integration/cached/schema_v4_emit_cached_test.exs
test/ccxt_extract/normalization_test.exs
test/ccxt_extract/contract_test_test.exs
lib/ccxt_extract/normalization.ex
lib/ccxt_extract/schema.ex
priv/schema/exchange_v4.json

coderabbitai · 2026-05-08T13:10:40Z

+    digest_findings =
+      digest_findings(id, Map.get(record, "parse_methods_digest"))
+
+    field_maps_findings =
+      stub_record_findings(id, "normalization.field_maps", Map.get(record, "field_maps"))
+
+    response_envelopes_findings =
+      stub_record_findings(id, "normalization.response_envelopes", Map.get(record, "response_envelopes"))


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't let present-but-null normalization sections pass.

Map.get/2 plus the nil clauses here means "parse_methods_digest": null, "field_maps": null, and "response_envelopes": null produce no shape finding. Missing keys are already reported above; explicit null should still fail this invariant.

Also applies to: 1299-1308, 1364-1378
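
To illustrate the distinction this comment relies on, here is a small sketch of a per-section check built on `Map.fetch/2`. It assumes the missing-key case is reported elsewhere, as the comment states. The module name, `section_findings/3`, and the finding map are hypothetical.

```elixir
defmodule NullSectionSketch do
  # Hypothetical finding shape for illustration only.
  defp finding(id, path, message), do: %{exchange: id, path: path, message: message}

  # Returns shape findings for one normalization section.
  def section_findings(id, record, key) do
    path = "normalization.#{key}"

    case Map.fetch(record, key) do
      # Key absent: assumed to be reported by the separate missing-key check.
      :error ->
        []

      # Present and map-shaped: nothing to report at this level.
      {:ok, value} when is_map(value) ->
        []

      # Present but null or scalar: Map.get/2 would have hidden the null case.
      {:ok, other} ->
        [finding(id, path, "must be a map, got #{inspect(other)}")]
    end
  end
end
```

With `Map.get/2`, a missing key and `"field_maps": null` both yield `nil`; `Map.fetch/2` returns `:error` for the first and `{:ok, nil}` for the second, which is what lets the explicit null fail.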

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@lib/ccxt_extract/contract_test.ex` around lines 1279 - 1286, The current Map.get(record, "...") calls (used when computing digest_findings, field_maps_findings, response_envelopes_findings) swallow present-but-null values, so explicit "key": null doesn't produce a shape finding; replace those Map.get/2 usages with Map.fetch(record, "key") (or an equivalent that distinguishes present-nil from missing) and pass the {:ok, value} result through to stub_record_findings (so {:ok, nil} yields a finding), while letting :error continue to be treated as the missing-key case already reported; apply the same change where Map.get/2 is used in the other occurrences mentioned (lines 1299-1308 and 1364-1378).

coderabbitai · 2026-05-08T13:10:40Z

+  defp stub_value_findings(id, path, record) do
+    Enum.flat_map(Normalization.parser_types(), fn key ->
+      case Map.fetch(record, key) do
+        {:ok, nil} ->
+          []
+
+        {:ok, v} when is_map(v) ->
+          []
+
+        {:ok, other} ->
+          [normalization_finding(id, "#{path}.#{key}", "must be null or a map, got #{inspect(other)}")]
+
+        :error ->
+          []
+      end
+    end)


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate _unresolved_reason values too.

stub_value_findings/3 only walks Normalization.parser_types/0, so _unresolved_reason can hold any scalar and still pass once the key-set check succeeds. The docstring says every stub value should be null or a map.
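
One way to close this gap, sketched under assumptions: append `_unresolved_reason` to the keys being walked so it goes through the same null-or-map check. `@parser_types` below is a shortened stand-in for `Normalization.parser_types/0`, and the finding map is hypothetical.

```elixir
defmodule StubValueSketch do
  # Assumption: a shortened stand-in for Normalization.parser_types/0.
  @parser_types ["ticker", "trade", "ohlcv"]

  # Hypothetical finding shape for illustration only.
  defp finding(id, path, message), do: %{exchange: id, path: path, message: message}

  def stub_value_findings(id, path, record) do
    # Checking the extra key alongside the parser types keeps a single code path.
    Enum.flat_map(@parser_types ++ ["_unresolved_reason"], fn key ->
      case Map.fetch(record, key) do
        {:ok, nil} ->
          []

        {:ok, v} when is_map(v) ->
          []

        {:ok, other} ->
          [finding(id, "#{path}.#{key}", "must be null or a map, got #{inspect(other)}")]

        :error ->
          []
      end
    end)
  end
end
```

This follows the review's wording (null or a map). The follow-up commit described later on this page settles on a stricter rule, null or the sentinel `"not_yet_derived"`, which would need its own clause.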

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@lib/ccxt_extract/contract_test.ex` around lines 1381 - 1396, stub_value_findings/3 currently only iterates Normalization.parser_types() so a scalar _unresolved_reason slips through; update the function to validate _unresolved_reason as well by either including "_unresolved_reason" in the checked keys or adding an explicit check after the Enum.flat_map that inspects Map.fetch(record, "_unresolved_reason") and, if {:ok, other} and not nil and not a map, returns a normalization_finding(id, "#{path}._unresolved_reason", "must be null or a map, got #{inspect(other)}"); keep the same message format and return shape so callers of stub_value_findings/3 behave unchanged.

coderabbitai · 2026-05-08T13:10:40Z

+    case JsonIO.read_json(path) do
+      {:ok, %{"exchanges" => entries}} when is_list(entries) ->
+        Map.new(entries, fn entry ->
+          methods = Map.get(entry, "parse_methods") || %{}
+          {entry["id"], Map.keys(methods)}
+        end)
+
+      {:ok, _malformed} ->
+        %{}
+
+      {:error, _reason} ->
+        %{}
+    end


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fail loudly on malformed parse_methods.json.

This returns %{} for any read or shape error, which silently disables parse_methods_digest_covers_inventory. Skipping only when the discovery file is missing makes sense for v3-only runs; malformed JSON or an unexpected top-level shape should surface instead.

Suggested tightening

     case JsonIO.read_json(path) do
       {:ok, %{"exchanges" => entries}} when is_list(entries) ->
         Map.new(entries, fn entry ->
           methods = Map.get(entry, "parse_methods") || %{}
           {entry["id"], Map.keys(methods)}
         end)

-      {:ok, _malformed} ->
-        %{}
+      {:ok, _malformed} ->
+        raise "invalid parse_methods inventory at #{path}: expected %{\"exchanges\" => [...]}"

-      {:error, _reason} ->
+      {:error, {:missing_input, _}} ->
         %{}
+
+      {:error, reason} ->
+        raise "failed to load parse_methods inventory at #{path}: #{inspect(reason)}"
     end
   end

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

-    case JsonIO.read_json(path) do
-      {:ok, %{"exchanges" => entries}} when is_list(entries) ->
-        Map.new(entries, fn entry ->
-          methods = Map.get(entry, "parse_methods") || %{}
-          {entry["id"], Map.keys(methods)}
-        end)
-
-      {:ok, _malformed} ->
-        %{}
-
-      {:error, _reason} ->
-        %{}
-    end
+    case JsonIO.read_json(path) do
+      {:ok, %{"exchanges" => entries}} when is_list(entries) ->
+        Map.new(entries, fn entry ->
+          methods = Map.get(entry, "parse_methods") || %{}
+          {entry["id"], Map.keys(methods)}
+        end)
+
+      {:ok, _malformed} ->
+        raise "invalid parse_methods inventory at #{path}: expected %{\"exchanges\" => [...]}"
+
+      {:error, {:missing_input, _}} ->
+        %{}
+
+      {:error, reason} ->
+        raise "failed to load parse_methods inventory at #{path}: #{inspect(reason)}"
+    end

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@lib/ccxt_extract/contract_test.ex` around lines 1792 - 1804, The current JsonIO.read_json(path) branch silently returns %{} for any error or wrong shape, hiding malformed parse_methods.json; change the logic in the case handling (the block calling JsonIO.read_json) so that only a missing-file error (e.g., {:error, :enoent} or whatever JsonIO uses for not-found) returns %{}, while any other {:error, reason} or an {:ok, value} that does not match %{"exchanges" => entries} should raise or propagate an error (or at minimum log and rethrow) so malformed JSON or unexpected top-level shape surfaces and does not silently disable parse_methods_digest_covers_inventory.
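
The same classification can be sketched as a pure function over the read result, which makes each branch easy to exercise without touching the filesystem. This is only an illustration: `InventoryLoadSketch` and `to_inventory/2` are hypothetical names, and the `{:error, {:missing_input, _}}` shape is taken from the suggestion above rather than confirmed against `JsonIO`.

```elixir
defmodule InventoryLoadSketch do
  # Takes the result of a JSON read so the sketch does not depend on JsonIO.
  # Assumption: a missing file is reported as {:error, {:missing_input, _}}.

  # Well-formed inventory: map each exchange id to its parse method names.
  def to_inventory({:ok, %{"exchanges" => entries}}, _path) when is_list(entries) do
    Map.new(entries, fn entry ->
      methods = Map.get(entry, "parse_methods") || %{}
      {entry["id"], Map.keys(methods)}
    end)
  end

  # Parsed fine but has the wrong top-level shape: surface it.
  def to_inventory({:ok, _malformed}, path) do
    raise ArgumentError, "invalid parse_methods inventory at #{path}"
  end

  # Missing file: the only case that quietly yields an empty inventory.
  def to_inventory({:error, {:missing_input, _}}, _path), do: %{}

  # Any other read error (for example corrupt JSON): surface it.
  def to_inventory({:error, reason}, path) do
    raise ArgumentError,
          "failed to load parse_methods inventory at #{path}: #{inspect(reason)}"
  end
end
```

A caller would pass `JsonIO.read_json(path)` as the first argument; only the missing-file branch leaves `parse_methods_digest_covers_inventory` with nothing to check.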

…rrier

Hardens the v4 normalization block against malformed input and tightens the contract surface that CodeRabbit + Codex flagged on PR #10.

- contract_test: guard `digest_inventory_findings/3` against non-map digests so `Map.keys/1` no longer crashes the run on a scalar
- contract_test: validate `_unresolved_reason` in `stub_value_findings/3` (must be null or the sentinel `"not_yet_derived"`)
- contract_test: drop nil-clauses in `digest_findings/2` and `stub_record_findings/3` so explicit `"key": null` produces a finding instead of silently passing
- contract_test: distinguish `:missing_input` from `:invalid_json` / malformed-shape in `load_parse_methods_inventory/1`; corrupt JSON or wrong top-level shape now raises with regen instructions instead of silently disabling `parse_methods_digest_covers_inventory`
- normalization: add `@spec` on six private helpers per the every-function-gets-a-spec mandate
- schema: `validate_v4/1` now descends into `/normalization` so a caller-provided `%{}` fails preflight
- exchange_v4.json: tighten `_unresolved_reason` from open-string to enum `["not_yet_derived"]` (closed-vocabulary scaffold sentinel)


Copilot AI review requested due to automatic review settings May 8, 2026 10:19

Copilot started reviewing on behalf of e-fu May 8, 2026 10:19 View session

Copilot AI reviewed May 8, 2026


chatgpt-codex-connector Bot reviewed May 8, 2026


coderabbitai Bot reviewed May 8, 2026


e-fu force-pushed the cursor/task-129-normalization-carrier-293e branch from e86b1d6 to c5ad67f Compare May 8, 2026 13:04

coderabbitai Bot reviewed May 8, 2026


e-fu merged commit 7c5ee73 into development May 8, 2026
2 checks passed

e-fu deleted the cursor/task-129-normalization-carrier-293e branch May 8, 2026 13:10

e-fu mentioned this pull request May 8, 2026

fix(INE-61): bot-review hardening on Task 129 normalization carrier #15

Merged

4 tasks

This was referenced May 8, 2026

Task 89 — Rate-limit bucket config (axes / refill / size) (INE-65) #11

Merged

Task 78 — parseOHLCV field map (pure-array scope) #17

Merged

Task 74: parseTicker field map + coercion + enums #21

Merged

   defp digest_inventory_findings(id, digest, inventory) when is_list(inventory) do
     digest_keys = digest |> Map.keys() |> MapSet.new()
