Skip to content

Task 129 — v4 normalization block carrier (parse_methods digest + field_maps scaffold) (INE-61)#10

Merged
e-fu merged 1 commit intodevelopmentfrom
cursor/task-129-normalization-carrier-293e
May 8, 2026
Merged

Task 129 — v4 normalization block carrier (parse_methods digest + field_maps scaffold) (INE-61)#10
e-fu merged 1 commit intodevelopmentfrom
cursor/task-129-normalization-carrier-293e

Conversation

@e-fu
Copy link
Copy Markdown
Contributor

@e-fu e-fu commented May 8, 2026

Summary

Lands the Phase 12 prerequisite: scaffolds the v4 normalization section under --schema-target=4 only. v3 emission stays byte-identical (verified on binance/deribit/okx — MD5-identical between development tip and this branch).

What ships

New CcxtExtract.Normalization module (lib/ccxt_extract/normalization.ex)

  • build/2 projects the per-exchange priv/discoveries/parse_methods.json entry into a compact, AST-free digest ({params, return_type, async, statement_count}) and scaffolds field_maps + response_envelopes keyed by parser type (ticker/trade/ohlcv/order/position/balance/market/transaction/deposit_address).
  • No AST body. Re-emitting full ESTree bodies regresses the 91.6% Hex-cap reduction Task 117 shipped — binance v4 spec is only ~5 KB larger than v3 (carrier ~7.4 KB out of ~2 MB).

Pipeline wiring (lib/ccxt_extract/pipeline.ex)

  • Pipeline.build_exchange_data/3 calls Normalization.build/2 only on the v4 path; v3 path is untouched.
  • Schema.build_exchange_v4/4 reads :normalization from opts and emits the new block; the old empty placeholder %{} is gone.

Schema (priv/schema/exchange_v4.json)

  • New Normalization, NormalizationDigestRecord, NormalizationStubRecord, NormalizationStubValue defs.
  • Strict on top-level keys; permissive on stub values (oneOf [null, object]) so Phase 12 sub-bundles can populate without a schema break.

Provenance (lib/ccxt_extract/provenance.ex)

  • Three new derived pointers in v4: /normalization/parse_methods_digest, /normalization/field_maps, /normalization/response_envelopes.

Two new contract invariants (lib/ccxt_extract/contract_test.ex)

  • normalization_shape_valid — top-level + per-record shape gate. Mirrors testnet_urls_shape_valid. Skipped on v3-shaped output.
  • parse_methods_digest_covers_inventory — every method in the per-exchange parse_methods.json entry must surface in the emitted digest (lossless-projection guard). Inventory loaded once via load_parse_methods_inventory/1 and threaded through observed.

Acceptance criteria

Check Status
mix format --check-formatted ✅ clean
mix compile --warnings-as-errors ✅ clean
mix credo --strict ✅ TODO/FIXME exit-2 only (carve-out per AGENTS.md)
mix doctor ✅ 100% (105 modules)
mix dialyzer ✅ 0 warnings
mix sobelow --exit Low ⚠️ line-fingerprint drift (see Reviewer note)
mix test.json --quiet ✅ identical failure set as development baseline (15 pre-existing cached-integration / corpus-staleness failures, 0 new)
mix test.json --cover (Normalization module) ✅ 95.24% on the new module
mix ccxt_extract.contract_test ✅ both new invariants 0 findings (correctly skip on v3-shaped output)
mix ccxt_extract.validate --schema-target=4 (tier1+tier2) ✅ 19/19 exchanges validate clean
v3 emit byte-identical to development for binance/deribit/okx ✅ MD5 match
binance v4 spec size ✅ ~5 KB delta vs v3 (no Hex-cap regression)
PR opened non-draft

Reviewer note (.sobelow-skips)

Per linear-workflow § "Code-Only PRs + Required Acceptance Criteria":

.sobelow-skips exception: for repos with sobelow line-fingerprint drift, the harness fails-loud-with-diff if drift is detected; commit-review applies the regen at merge in the same post-merge commit. Agent never touches the file.

This PR's edits to lib/ccxt_extract/contract_test.ex (added Normalization alias + parse_methods_inventory baseline plumbing) and lib/ccxt_extract/pipeline.ex (added Normalization alias + 8-line v4 carrier wiring) shifted line numbers, invalidating 8 fingerprints and producing 8 new ones. Local mix sobelow --exit Low flags one finding (validation.ex:184 — the same finding skip 666 covers, just with a slightly different context hash). Reviewer regenerates .sobelow-skips in the post-merge bookkeeping commit.

The CI gate (mix sobelow with no --exit) returns 0 regardless of findings, so harness CI passes.

Out of scope (deferred to commit-review's post-merge bookkeeping commit)

  • ROADMAP.md (Task 129 status flip ⬜ → ✅)
  • CHANGELOG.md ([Unreleased] entry)
  • SCHEMA.md (Normalization carrier description for v4 section)
  • .sobelow-skips (line-fingerprint regen)

Cross-repo

../ccxt_client/ROADMAP.md Phase 12 parser tasks remain blocked until field_maps are populated (Tasks 74–82 / 83). This carrier is the load-bearing prerequisite — no ccxt_client task flips on this PR alone.

Scoring

[D:3/B:6/U:7 → Eff:2.17] 🚀

Linear Issue: INE-61

Open in Web Open in Cursor 

Summary by CodeRabbit

  • New Features

    • v4 schema now emits a populated normalization block with parse-method digests, field maps, and response envelopes.
    • Emitted provenance marks normalization paths as derived.
  • Validation

    • New runtime checks validate normalization shape and ensure parse-method digests cover discovered methods.
  • Tests

    • Added unit and integration tests covering normalization construction, validation, and provenance.

Copilot AI review requested due to automatic review settings May 8, 2026 10:19
@linear-code
Copy link
Copy Markdown

linear-code Bot commented May 8, 2026

INE-61

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 8, 2026

Review Change Stack

📝 Walkthrough

Walkthrough

This PR introduces the Normalization module for v4 schemas: it builds a compact normalization block (digest of parse methods, scaffolded field_maps and response_envelopes) from discovery data, integrates it into the pipeline and schema, validates structure via two new contract invariants, and extends test coverage across modules, contracts, and integration tests.

Changes

Normalization v4 Schema Construction & Validation

Layer / File(s) Summary
Data Shape & Contract Definitions
lib/ccxt_extract/normalization.ex, priv/schema/exchange_v4.json
JSON schema expanded from empty placeholder to strict structure requiring parse_methods_digest, field_maps, response_envelopes with new $defs types (NormalizationDigestRecord, NormalizationStubRecord, NormalizationStubValue); Normalization module provides parser type vocabulary and required key contract functions.
Core Builder Implementation
lib/ccxt_extract/normalization.ex
build/2 derives compact AST-free digest (method name → params, return_type, async, statement_count) from parse_methods; normalizes params lists and coerces statement counts to non-negative; scaffolds field_maps and response_envelopes with _unresolved_reason markers.
Pipeline & Schema Integration
lib/ccxt_extract/pipeline.ex, lib/ccxt_extract/schema.ex, lib/ccxt_extract/provenance.ex
Pipeline constructs normalization from data.parse_methods[id] and injects into v4 options; Schema emits normalization in v4 output replacing prior empty object; Provenance registers /normalization/parse_methods_digest, /normalization/field_maps, /normalization/response_envelopes as derived pointers.
Contract Invariants & Validation
lib/ccxt_extract/contract_test.ex
Registers check_normalization_shape_valid/2 validating required/extra keys and field types; registers check_parse_methods_digest_covers_inventory/2 ensuring digest includes all discovery inventory methods; loads priv/discoveries/parse_methods.json into baseline inventory.
Unit Tests: Normalization Module
test/ccxt_extract/normalization_test.exs
Covers build/2 digest derivation, scaffold structure, required keys, type validity, method-name inventory preservation, and defensive handling of malformed params/statements/non-map inputs.
Contract Invariant Tests
test/ccxt_extract/contract_test_test.exs
Tests check_normalization_shape_valid/2 for missing required keys, extra keys, digest record type errors, field_maps/response_envelopes malformations; tests check_parse_methods_digest_covers_inventory/2 for skip conditions, exact match, and missing inventory methods with deterministic sorting.
Pipeline, Provenance & Integration Tests
test/ccxt_extract/pipeline_test.exs, test/integration/cached/schema_v4_emit_cached_test.exs
Updates v4 emit test to assert normalization scaffold; adds new tests for digest projection from parse_methods.json, v3 exclusion of normalization, stub structure validation; extends provenance test to require normalization pointers; integration test validates emitted normalization structure and unresolved markers.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

  • ZenHive/ccxt_extract#7: Introduces the v4 emission path that this PR extends by building and threading a concrete normalization value into PipelineSchema.build_exchange_v4/4 / Provenance code paths.

Poem

🐰 A tiny rabbit coded through the night,
Turning parse methods into a digest bright.
Stubs stand ready with reasons not yet done,
Invariants check each method, one by one.
Hooray — v4 normalization, tidy and light!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: introduction of a v4 normalization block carrier that projects parse_methods into a digest and scaffolds field_maps and response_envelopes.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch cursor/task-129-normalization-carrier-293e

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR scaffolds a new v4-only normalization block in the emitted exchange schema, including an AST-free parse_methods_digest plus stub field_maps and response_envelopes, while keeping v3 output unchanged.

Changes:

  • Add CcxtExtract.Normalization to build the v4 normalization carrier (digest + scaffolds).
  • Wire the pipeline/schema to include normalization only for --schema-target=4, and extend v4 provenance pointers accordingly.
  • Extend v4 JSON Schema and add/expand tests + contract invariants to validate the new block.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
test/integration/cached/schema_v4_emit_cached_test.exs Asserts v4 emits normalization with required keys and unresolved scaffolds.
test/ccxt_extract/pipeline_test.exs Updates v4 pipeline expectations; adds targeted tests for digest projection and v3 absence.
test/ccxt_extract/normalization_test.exs New unit tests for digest/scaffold generation and defensive behavior.
test/ccxt_extract/contract_test_test.exs Adds tests for new normalization-related contract invariants.
priv/schema/exchange_v4.json Defines the Normalization block, digest record, and stub record/value schema.
lib/ccxt_extract/schema.ex Emits normalization in v4 using provided/derived scaffold instead of %{}.
lib/ccxt_extract/provenance.ex Adds derived provenance pointers for /normalization/* in v4.
lib/ccxt_extract/pipeline.ex Computes normalization only on the v4 path and threads it into schema build opts.
lib/ccxt_extract/normalization.ex New module implementing digest projection and stub scaffold generation.
lib/ccxt_extract/contract_test.ex Adds invariants for normalization shape and digest coverage vs inventory; loads parse_methods inventory.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +1060 to +1063
defp digest_findings(_id, nil), do: []

defp digest_findings(id, digest) when is_map(digest) do
digest
Comment on lines +1125 to +1126
defp stub_record_findings(_id, _path, nil), do: []

Comment on lines +1185 to +1190
if Map.has_key?(exchange, "normalization") do
digest = get_in(exchange, ["normalization", "parse_methods_digest"]) || %{}
inventory = Map.get(observed[:parse_methods_inventory] || %{}, id)

digest_inventory_findings(id, digest, inventory)
else
Comment on lines 189 to 193
auth_sections = structure_data["authenticated_sections"]
sign_method = structure_data["sign_method"]
describe_api = structure_data["describe_api"]
normalization = Keyword.get(opts, :normalization) || CcxtExtract.Normalization.build(nil)

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e86b1d667f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +1197 to +1198
defp digest_inventory_findings(id, digest, inventory) when is_list(inventory) do
digest_keys = digest |> Map.keys() |> MapSet.new()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Guard non-map digests before checking inventory

If a v4 output has a malformed normalization.parse_methods_digest value (for example a list or string) and the parse-method inventory contains this exchange, this invariant crashes at Map.keys/1 instead of reporting findings; run_invariants/2 does not rescue invariant errors, so one bad exchange aborts the entire contract-test run. The shape invariant already reports parse_methods_digest as non-map, so this coverage check should skip or emit its own finding unless digest is a map.

Useful? React with 👍 / 👎.

Comment on lines 241 to +248
"Normalization": {
"type": "object",
"description": "Response-normalization surface. Empty placeholder until Task 129 ships parse_methods_digest (compact, no AST body), Phase 12 ships field_maps, and Task 83 ships response_envelopes. Permissive (additionalProperties: true) so future tasks can populate without a schema break.",
"additionalProperties": true,
"properties": {}
"description": "Response-normalization surface. Task 129 ships parse_methods_digest (compact, no AST body — re-emitting bodies regresses the 91.6% Hex-cap reduction Task 117 shipped), and stub field_maps + response_envelopes scaffolds. Phase 12 sub-bundles (Tasks 74–83) populate the field maps and envelope paths on top of this scaffold.",
"required": [
"parse_methods_digest",
"field_maps",
"response_envelopes"
],
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Document the new normalization contract

This changes the emitted v4 JSON shape from an open empty normalization placeholder to a required, closed parse_methods_digest/field_maps/response_envelopes contract, but the commit does not update SCHEMA.md. The repository AGENTS.md documentation invariant explicitly requires SCHEMA.md updates when the emitted JSON shape changes; without that, consumers reading the contract docs will still see stale normalization guidance.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
lib/ccxt_extract/schema.ex (1)

192-218: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Teach validate_v4/1 about the injected normalization shape.

build_exchange_v4/4 now accepts caller-provided :normalization, but validate_v4/1 never descends into that section. A payload with %{"normalization" => %{}} still passes preflight, so malformed v4 output can slip through Pipeline.extract/1 until a later full-schema run.

Minimal follow-up
+ `@required_normalization_keys_v4` ~w(parse_methods_digest field_maps response_envelopes)
...
       |> check_required_keys(data["auth"], `@required_auth_keys_v4`, "auth")
       |> check_required_keys(data["errors"], `@required_errors_keys_v4`, "errors")
+      |> check_required_keys(data["normalization"], `@required_normalization_keys_v4`, "normalization")
       |> check_required_keys(data["markets"], `@required_markets_keys_v4`, "markets")
       |> check_required_keys(data["raw"], `@required_raw_keys_v4`, "raw")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/ccxt_extract/schema.ex` around lines 192 - 218, validate_v4/1 must also
descend into the injected "normalization" section added by build_exchange_v4/4;
update validate_v4/1 to fetch the "normalization" key from the payload and
validate it against the Normalization module (e.g. call
CcxtExtract.Normalization.validate/1 or the module's schema/validator) so that a
caller-provided %{"normalization" => %{}} is rejected, and wire any validation
errors into the same preflight error reporting path as the other sections.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@lib/ccxt_extract/contract_test.ex`:
- Around line 1185-1189: The code assumes `digest` is a map before passing it to
`digest_inventory_findings`, which can crash if
`normalization.parse_methods_digest` is a scalar; after retrieving `digest` with
get_in(exchange, ["normalization", "parse_methods_digest"]) (currently assigned
to `digest`), guard it with an is_map check and coerce non-maps to an empty map
(or bail out and return the same kind of finding as normalization_shape_valid
does) before calling digest_inventory_findings(id, digest, inventory); this
avoids Map.keys/1 errors when malformed digests are emitted.

In `@lib/ccxt_extract/normalization.ex`:
- Around line 117-160: Add `@spec` annotations for each private helper to satisfy
Dialyzer and the repo rule: annotate digest_from_entry/1 (accepts nil or map and
returns map()), digest_from_methods/1 (map() -> map()), digest_record/1 (map() |
any() -> map()), normalize_params/1 (nil | list() | any() -> list()),
normalize_param/1 (map() | any() -> map()), and normalize_statement_count/1
(integer() | any() -> non_neg_integer()). Use concrete Elixir types (map(),
list(), integer(), non_neg_integer(), nil) and ensure specs match the existing
return shapes (maps/lists/integers) for functions like digest_record/1 and
normalize_param/1.

In `@priv/schema/exchange_v4.json`:
- Around line 348-359: The _unresolved_reason field currently allows any
non-empty string; tighten it to the closed vocabulary scaffold sentinel by
replacing the "type":"string","minLength":1 branch with an enum of allowed
sentinel(s) (e.g. ["not_yet_derived"]) so the oneOf becomes either null or that
specific sentinel value; update the schema for _unresolved_reason accordingly
(refer to the _unresolved_reason property in the diff).

---

Outside diff comments:
In `@lib/ccxt_extract/schema.ex`:
- Around line 192-218: validate_v4/1 must also descend into the injected
"normalization" section added by build_exchange_v4/4; update validate_v4/1 to
fetch the "normalization" key from the payload and validate it against the
Normalization module (e.g. call CcxtExtract.Normalization.validate/1 or the
module's schema/validator) so that a caller-provided %{"normalization" => %{}}
is rejected, and wire any validation errors into the same preflight error
reporting path as the other sections.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 6baf393e-f4a2-4212-8b2c-48bc50565e2f

📥 Commits

Reviewing files that changed from the base of the PR and between b65a526 and e86b1d6.

📒 Files selected for processing (10)
  • lib/ccxt_extract/contract_test.ex
  • lib/ccxt_extract/normalization.ex
  • lib/ccxt_extract/pipeline.ex
  • lib/ccxt_extract/provenance.ex
  • lib/ccxt_extract/schema.ex
  • priv/schema/exchange_v4.json
  • test/ccxt_extract/contract_test_test.exs
  • test/ccxt_extract/normalization_test.exs
  • test/ccxt_extract/pipeline_test.exs
  • test/integration/cached/schema_v4_emit_cached_test.exs

Comment on lines +1185 to +1189
if Map.has_key?(exchange, "normalization") do
digest = get_in(exchange, ["normalization", "parse_methods_digest"]) || %{}
inventory = Map.get(observed[:parse_methods_inventory] || %{}, id)

digest_inventory_findings(id, digest, inventory)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard non-map digests before calling Map.keys/1.

If a broken emit writes normalization.parse_methods_digest as a scalar, this invariant raises and aborts the whole contract run instead of returning a finding. normalization_shape_valid already handles malformed sections defensively, so this check should short-circuit the same way.

One safe way to harden it
   def check_parse_methods_digest_covers_inventory(exchange, observed) do
     id = exchange_id(exchange)

     if Map.has_key?(exchange, "normalization") do
-      digest = get_in(exchange, ["normalization", "parse_methods_digest"]) || %{}
+      digest = get_in(exchange, ["normalization", "parse_methods_digest"])
       inventory = Map.get(observed[:parse_methods_inventory] || %{}, id)

       digest_inventory_findings(id, digest, inventory)
     else
       []
     end
   end

   defp digest_inventory_findings(_id, _digest, nil), do: []
+  defp digest_inventory_findings(_id, digest, _inventory) when not is_map(digest), do: []
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/ccxt_extract/contract_test.ex` around lines 1185 - 1189, The code assumes
`digest` is a map before passing it to `digest_inventory_findings`, which can
crash if `normalization.parse_methods_digest` is a scalar; after retrieving
`digest` with get_in(exchange, ["normalization", "parse_methods_digest"])
(currently assigned to `digest`), guard it with an is_map check and coerce
non-maps to an empty map (or bail out and return the same kind of finding as
normalization_shape_valid does) before calling digest_inventory_findings(id,
digest, inventory); this avoids Map.keys/1 errors when malformed digests are
emitted.

Comment on lines +117 to +160
defp digest_from_entry(nil), do: %{}
defp digest_from_entry(%{"parse_methods" => methods}) when is_map(methods), do: digest_from_methods(methods)
defp digest_from_entry(_), do: %{}

defp digest_from_methods(methods) do
Map.new(methods, fn {name, ast} ->
{name, digest_record(ast)}
end)
end

defp digest_record(ast) when is_map(ast) do
%{
"params" => normalize_params(Map.get(ast, "params")),
"return_type" => Map.get(ast, "return_type"),
"async" => Map.get(ast, "async", false) == true,
"statement_count" => normalize_statement_count(Map.get(ast, "statements"))
}
end

defp digest_record(_) do
%{
"params" => [],
"return_type" => nil,
"async" => false,
"statement_count" => 0
}
end

defp normalize_params(nil), do: []

defp normalize_params(list) when is_list(list) do
Enum.map(list, &normalize_param/1)
end

defp normalize_params(_), do: []

defp normalize_param(%{"name" => name} = param) when is_binary(name) do
%{"name" => name, "type" => Map.get(param, "type")}
end

defp normalize_param(_), do: %{"name" => "", "type" => nil}

defp normalize_statement_count(n) when is_integer(n) and n >= 0, do: n
defp normalize_statement_count(_), do: 0
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Add @specs for the private helpers.

This new module leaves every defp untyped, which breaks the repo rule for Elixir modules and makes Dialyzer regressions here harder to catch.

As per coding guidelines, Every Elixir function must have a @SPEC — def and defp alike. No exceptions for 'trivial' helpers.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/ccxt_extract/normalization.ex` around lines 117 - 160, Add `@spec`
annotations for each private helper to satisfy Dialyzer and the repo rule:
annotate digest_from_entry/1 (accepts nil or map and returns map()),
digest_from_methods/1 (map() -> map()), digest_record/1 (map() | any() ->
map()), normalize_params/1 (nil | list() | any() -> list()), normalize_param/1
(map() | any() -> map()), and normalize_statement_count/1 (integer() | any() ->
non_neg_integer()). Use concrete Elixir types (map(), list(), integer(),
non_neg_integer(), nil) and ensure specs match the existing return shapes
(maps/lists/integers) for functions like digest_record/1 and normalize_param/1.

Comment on lines +348 to +359
"_unresolved_reason": {
"oneOf": [
{
"type": "null"
},
{
"type": "string",
"minLength": 1
}
],
"description": "Closed-vocabulary scaffold tag. \"not_yet_derived\" is the Task 129 default. Phase 12 sub-bundles flip it to null when every parser-type entry is populated."
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Pin _unresolved_reason to the scaffold sentinel.

This field is documented as a closed-vocabulary scaffold tag, but the schema currently accepts any non-empty string. That makes unexpected sentinel values schema-valid and weakens the new v4 contract.

Proposed schema tightening
         "_unresolved_reason": {
           "oneOf": [
             {
               "type": "null"
             },
             {
               "type": "string",
-              "minLength": 1
+              "enum": [
+                "not_yet_derived"
+              ]
             }
           ],
           "description": "Closed-vocabulary scaffold tag. \"not_yet_derived\" is the Task 129 default. Phase 12 sub-bundles flip it to null when every parser-type entry is populated."
         }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
"_unresolved_reason": {
"oneOf": [
{
"type": "null"
},
{
"type": "string",
"minLength": 1
}
],
"description": "Closed-vocabulary scaffold tag. \"not_yet_derived\" is the Task 129 default. Phase 12 sub-bundles flip it to null when every parser-type entry is populated."
}
"_unresolved_reason": {
"oneOf": [
{
"type": "null"
},
{
"type": "string",
"enum": [
"not_yet_derived"
]
}
],
"description": "Closed-vocabulary scaffold tag. \"not_yet_derived\" is the Task 129 default. Phase 12 sub-bundles flip it to null when every parser-type entry is populated."
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@priv/schema/exchange_v4.json` around lines 348 - 359, The _unresolved_reason
field currently allows any non-empty string; tighten it to the closed vocabulary
scaffold sentinel by replacing the "type":"string","minLength":1 branch with an
enum of allowed sentinel(s) (e.g. ["not_yet_derived"]) so the oneOf becomes
either null or that specific sentinel value; update the schema for
_unresolved_reason accordingly (refer to the _unresolved_reason property in the
diff).

…ld_maps scaffold) (INE-61)

Lands the Phase 12 prerequisite: scaffolds the v4 `normalization` section
under `--schema-target=4` only. v3 emission stays byte-identical (verified
on binance/deribit/okx).

* New `CcxtExtract.Normalization` module — `build/2` projects the per-
  exchange `priv/discoveries/parse_methods.json` entry into a compact,
  AST-free digest (`{params, return_type, async, statement_count}`) and
  scaffolds `field_maps` + `response_envelopes` keyed by parser type.
  No AST body: re-emitting full ESTree bodies regresses the 91.6% Hex-cap
  reduction Task 117 shipped (binance v4 spec is only ~5 KB larger than v3).
* `Pipeline.build_exchange_data/3` calls `Normalization.build/2` only on
  the v4 path; v3 path is untouched.
* `Schema.build_exchange_v4/4` reads `:normalization` from opts and emits
  the new block; the old empty placeholder `%{}` is gone.
* `priv/schema/exchange_v4.json`: adds `Normalization`,
  `NormalizationDigestRecord`, `NormalizationStubRecord`,
  `NormalizationStubValue` defs. Strict on top-level keys, permissive
  on stub values (oneOf null/object) so Phase 12 sub-bundles can populate
  without a schema break.
* `Provenance.build_default_v4/0` adds three new derived pointers:
  `/normalization/parse_methods_digest`, `/normalization/field_maps`,
  `/normalization/response_envelopes`.
* Two new contract invariants:
  - `normalization_shape_valid` — top-level + per-record shape gate;
    skipped on v3-shaped output (no top-level `normalization` key).
  - `parse_methods_digest_covers_inventory` — every method in the
    per-exchange parse_methods.json entry must surface in the emitted
    digest. Inventory loaded once via `load_parse_methods_inventory/1`
    and threaded through `observed`.

Tests: new `test/ccxt_extract/normalization_test.exs` (round-trip + shape
+ defensive cases, ~95% coverage on the new module), new contract-test
cases for both invariants, extended `pipeline_test.exs` v4 emit asserts,
extended cached `schema_v4_emit_cached_test.exs` to assert the new block
on tier 1 priority exchanges (binance / deribit / okx).

Validated against `exchange_v4.json` for all 19 tier1+tier2 exchanges
(0 failures). v3 binance/deribit/okx emission MD5-identical between
`development` and this branch.

`.sobelow-skips` line-fingerprint drift expected from line shifts in
contract_test.ex / pipeline.ex; per linear-workflow §
"Code-Only PRs" the agent does not touch the file — commit-review
applies the regen in the post-merge bookkeeping commit.

Co-authored-by: E.FU <e-fu@users.noreply.github.com>
@e-fu e-fu force-pushed the cursor/task-129-normalization-carrier-293e branch from e86b1d6 to c5ad67f Compare May 8, 2026 13:04
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (1)
lib/ccxt_extract/contract_test.ex (1)

1424-1428: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard non-map digests before Map.keys/1.

If normalization.parse_methods_digest is emitted as a scalar, this path still crashes the whole contract run instead of returning a finding or skipping malformed data.

Suggested hardening
   def check_parse_methods_digest_covers_inventory(exchange, observed) do
     id = exchange_id(exchange)

     if Map.has_key?(exchange, "normalization") do
-      digest = get_in(exchange, ["normalization", "parse_methods_digest"]) || %{}
+      digest = get_in(exchange, ["normalization", "parse_methods_digest"])
       inventory = Map.get(observed[:parse_methods_inventory] || %{}, id)

       digest_inventory_findings(id, digest, inventory)
     else
       []
     end
   end

   defp digest_inventory_findings(_id, _digest, nil), do: []
+  defp digest_inventory_findings(_id, digest, _inventory) when not is_map(digest), do: []

   defp digest_inventory_findings(id, digest, inventory) when is_list(inventory) do

Also applies to: 1436-1437

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/ccxt_extract/contract_test.ex` around lines 1424 - 1428, The code assumes
normalization.parse_methods_digest is a map and later calls Map.keys/1 which
will crash on scalars; update the guards so before calling
digest_inventory_findings (and the analogous block at lines 1436-1437) you
verify the digest is a map (use is_map(digest)) and if not either coerce it to
an empty map (digest = %{}) or return/record a finding for malformed digest, so
Map.keys/1 is never invoked on a non-map value; apply the same check for
observed[:parse_methods_inventory] where appropriate.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@lib/ccxt_extract/contract_test.ex`:
- Around line 1381-1396: stub_value_findings/3 currently only iterates
Normalization.parser_types() so a scalar _unresolved_reason slips through;
update the function to validate _unresolved_reason as well by either including
"_unresolved_reason" in the checked keys or adding an explicit check after the
Enum.flat_map that inspects Map.fetch(record, "_unresolved_reason") and, if
{:ok, other} and not nil and not a map, returns a normalization_finding(id,
"#{path}._unresolved_reason", "must be null or a map, got #{inspect(other)}");
keep the same message format and return shape so callers of
stub_value_findings/3 behave unchanged.
- Around line 1279-1286: The current Map.get(record, "...") calls (used when
computing digest_findings, field_maps_findings, response_envelopes_findings)
swallow present-but-null values, so explicit "key": null doesn't produce a shape
finding; replace those Map.get/2 usages with Map.fetch(record, "key") (or an
equivalent that distinguishes present-nil from missing) and pass the {:ok,
value} result through to stub_record_findings (so {:ok, nil} yields a finding),
while letting :error continue to be treated as the missing-key case already
reported; apply the same change where Map.get/2 is used in the other occurrences
mentioned (lines 1299-1308 and 1364-1378).
- Around line 1792-1804: The current JsonIO.read_json(path) branch silently
returns %{} for any error or wrong shape, hiding malformed parse_methods.json;
change the logic in the case handling (the block calling JsonIO.read_json) so
that only a missing-file error (e.g., {:error, :enoent} or whatever JsonIO uses
for not-found) returns %{}, while any other {:error, reason} or an {:ok, value}
that does not match %{"exchanges" => entries} should raise or propagate an error
(or at minimum log and rethrow) so malformed JSON or unexpected top-level shape
surfaces and does not silently disable parse_methods_digest_covers_inventory.

---

Duplicate comments:
In `@lib/ccxt_extract/contract_test.ex`:
- Around line 1424-1428: The code assumes normalization.parse_methods_digest is
a map and later calls Map.keys/1 which will crash on scalars; update the guards
so before calling digest_inventory_findings (and the analogous block at lines
1436-1437) you verify the digest is a map (use is_map(digest)) and if not either
coerce it to an empty map (digest = %{}) or return/record a finding for
malformed digest, so Map.keys/1 is never invoked on a non-map value; apply the
same check for observed[:parse_methods_inventory] where appropriate.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 854fdfa8-6a93-4764-a924-dd881e377b59

📥 Commits

Reviewing files that changed from the base of the PR and between e86b1d6 and c5ad67f.

📒 Files selected for processing (10)
  • lib/ccxt_extract/contract_test.ex
  • lib/ccxt_extract/normalization.ex
  • lib/ccxt_extract/pipeline.ex
  • lib/ccxt_extract/provenance.ex
  • lib/ccxt_extract/schema.ex
  • priv/schema/exchange_v4.json
  • test/ccxt_extract/contract_test_test.exs
  • test/ccxt_extract/normalization_test.exs
  • test/ccxt_extract/pipeline_test.exs
  • test/integration/cached/schema_v4_emit_cached_test.exs
✅ Files skipped from review due to trivial changes (1)
  • lib/ccxt_extract/provenance.ex
🚧 Files skipped from review as they are similar to previous changes (7)
  • lib/ccxt_extract/pipeline.ex
  • test/integration/cached/schema_v4_emit_cached_test.exs
  • test/ccxt_extract/normalization_test.exs
  • test/ccxt_extract/contract_test_test.exs
  • lib/ccxt_extract/normalization.ex
  • lib/ccxt_extract/schema.ex
  • priv/schema/exchange_v4.json

Comment on lines +1279 to +1286
digest_findings =
digest_findings(id, Map.get(record, "parse_methods_digest"))

field_maps_findings =
stub_record_findings(id, "normalization.field_maps", Map.get(record, "field_maps"))

response_envelopes_findings =
stub_record_findings(id, "normalization.response_envelopes", Map.get(record, "response_envelopes"))
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't let present-but-null normalization sections pass.

Map.get/2 plus the nil clauses here means "parse_methods_digest": null, "field_maps": null, and "response_envelopes": null produce no shape finding. Missing keys are already reported above; explicit null should still fail this invariant.

Also applies to: 1299-1308, 1364-1378

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/ccxt_extract/contract_test.ex` around lines 1279 - 1286, The current
Map.get(record, "...") calls (used when computing digest_findings,
field_maps_findings, response_envelopes_findings) swallow present-but-null
values, so explicit "key": null doesn't produce a shape finding; replace those
Map.get/2 usages with Map.fetch(record, "key") (or an equivalent that
distinguishes present-nil from missing) and pass the {:ok, value} result through
to stub_record_findings (so {:ok, nil} yields a finding), while letting :error
continue to be treated as the missing-key case already reported; apply the same
change where Map.get/2 is used in the other occurrences mentioned (lines
1299-1308 and 1364-1378).

Comment on lines +1381 to +1396
defp stub_value_findings(id, path, record) do
Enum.flat_map(Normalization.parser_types(), fn key ->
case Map.fetch(record, key) do
{:ok, nil} ->
[]

{:ok, v} when is_map(v) ->
[]

{:ok, other} ->
[normalization_finding(id, "#{path}.#{key}", "must be null or a map, got #{inspect(other)}")]

:error ->
[]
end
end)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate _unresolved_reason values too.

stub_value_findings/3 only walks Normalization.parser_types/0, so _unresolved_reason can hold any scalar and still pass once the key-set check succeeds. The docstring says every stub value should be null or a map.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/ccxt_extract/contract_test.ex` around lines 1381 - 1396,
stub_value_findings/3 currently only iterates Normalization.parser_types() so a
scalar _unresolved_reason slips through; update the function to validate
_unresolved_reason as well by either including "_unresolved_reason" in the
checked keys or adding an explicit check after the Enum.flat_map that inspects
Map.fetch(record, "_unresolved_reason") and, if {:ok, other} and not nil and not
a map, returns a normalization_finding(id, "#{path}._unresolved_reason", "must
be null or a map, got #{inspect(other)}"); keep the same message format and
return shape so callers of stub_value_findings/3 behave unchanged.

Comment on lines +1792 to +1804
case JsonIO.read_json(path) do
{:ok, %{"exchanges" => entries}} when is_list(entries) ->
Map.new(entries, fn entry ->
methods = Map.get(entry, "parse_methods") || %{}
{entry["id"], Map.keys(methods)}
end)

{:ok, _malformed} ->
%{}

{:error, _reason} ->
%{}
end
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fail loudly on malformed parse_methods.json.

This returns %{} for any read or shape error, which silently disables parse_methods_digest_covers_inventory. Skipping only when the discovery file is missing makes sense for v3-only runs; malformed JSON or an unexpected top-level shape should surface instead.

Suggested tightening
     case JsonIO.read_json(path) do
       {:ok, %{"exchanges" => entries}} when is_list(entries) ->
         Map.new(entries, fn entry ->
           methods = Map.get(entry, "parse_methods") || %{}
           {entry["id"], Map.keys(methods)}
         end)

-      {:ok, _malformed} ->
-        %{}
+      {:ok, _malformed} ->
+        raise "invalid parse_methods inventory at #{path}: expected %{\"exchanges\" => [...]}"

-      {:error, _reason} ->
+      {:error, {:missing_input, _}} ->
         %{}
+
+      {:error, reason} ->
+        raise "failed to load parse_methods inventory at #{path}: #{inspect(reason)}"
     end
   end
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
case JsonIO.read_json(path) do
{:ok, %{"exchanges" => entries}} when is_list(entries) ->
Map.new(entries, fn entry ->
methods = Map.get(entry, "parse_methods") || %{}
{entry["id"], Map.keys(methods)}
end)
{:ok, _malformed} ->
%{}
{:error, _reason} ->
%{}
end
case JsonIO.read_json(path) do
{:ok, %{"exchanges" => entries}} when is_list(entries) ->
Map.new(entries, fn entry ->
methods = Map.get(entry, "parse_methods") || %{}
{entry["id"], Map.keys(methods)}
end)
{:ok, _malformed} ->
raise "invalid parse_methods inventory at #{path}: expected %{\"exchanges\" => [...]}"
{:error, {:missing_input, _}} ->
%{}
{:error, reason} ->
raise "failed to load parse_methods inventory at #{path}: #{inspect(reason)}"
end
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/ccxt_extract/contract_test.ex` around lines 1792 - 1804, The current
JsonIO.read_json(path) branch silently returns %{} for any error or wrong shape,
hiding malformed parse_methods.json; change the logic in the case handling (the
block calling JsonIO.read_json) so that only a missing-file error (e.g.,
{:error, :enoent} or whatever JsonIO uses for not-found) returns %{}, while any
other {:error, reason} or an {:ok, value} that does not match %{"exchanges" =>
entries} should raise or propagate an error (or at minimum log and rethrow) so
malformed JSON or unexpected top-level shape surfaces and does not silently
disable parse_methods_digest_covers_inventory.

@e-fu e-fu merged commit 7c5ee73 into development May 8, 2026
2 checks passed
@e-fu e-fu deleted the cursor/task-129-normalization-carrier-293e branch May 8, 2026 13:10
e-fu added a commit that referenced this pull request May 8, 2026
…rrier

Hardens the v4 normalization block against malformed input and tightens
the contract surface that CodeRabbit + Codex flagged on PR #10.

- contract_test: guard `digest_inventory_findings/3` against non-map
  digests so `Map.keys/1` no longer crashes the run on a scalar
- contract_test: validate `_unresolved_reason` in `stub_value_findings/3`
  (must be null or the sentinel `"not_yet_derived"`)
- contract_test: drop nil-clauses in `digest_findings/2` and
  `stub_record_findings/3` so explicit `"key": null` produces a finding
  instead of silently passing
- contract_test: distinguish `:missing_input` from `:invalid_json` /
  malformed-shape in `load_parse_methods_inventory/1` — corrupt JSON or
  wrong top-level shape now raises with regen instructions instead of
  silently disabling `parse_methods_digest_covers_inventory`
- normalization: add @SPEC on six private helpers per the
  every-function-gets-a-spec mandate
- schema: `validate_v4/1` now descends into `/normalization` so a
  caller-provided `%{}` fails preflight
- exchange_v4.json: tighten `_unresolved_reason` from open-string to
  enum `["not_yet_derived"]` (closed-vocabulary scaffold sentinel)
e-fu added a commit that referenced this pull request May 8, 2026
…rrier (#15)

Hardens the v4 normalization block against malformed input and tightens
the contract surface that CodeRabbit + Codex flagged on PR #10.

- contract_test: guard `digest_inventory_findings/3` against non-map
  digests so `Map.keys/1` no longer crashes the run on a scalar
- contract_test: validate `_unresolved_reason` in `stub_value_findings/3`
  (must be null or the sentinel `"not_yet_derived"`)
- contract_test: drop nil-clauses in `digest_findings/2` and
  `stub_record_findings/3` so explicit `"key": null` produces a finding
  instead of silently passing
- contract_test: distinguish `:missing_input` from `:invalid_json` /
  malformed-shape in `load_parse_methods_inventory/1` — corrupt JSON or
  wrong top-level shape now raises with regen instructions instead of
  silently disabling `parse_methods_digest_covers_inventory`
- normalization: add @SPEC on six private helpers per the
  every-function-gets-a-spec mandate
- schema: `validate_v4/1` now descends into `/normalization` so a
  caller-provided `%{}` fails preflight
- exchange_v4.json: tighten `_unresolved_reason` from open-string to
  enum `["not_yet_derived"]` (closed-vocabulary scaffold sentinel)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants