Task 129 — v4 normalization block carrier (parse_methods digest + field_maps scaffold) (INE-61)#10
Conversation
📝 WalkthroughWalkthroughThis PR introduces the Normalization module for v4 schemas: it builds a compact ChangesNormalization v4 Schema Construction & Validation
Estimated code review effort🎯 4 (Complex) | ⏱️ ~75 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Pull request overview
This PR scaffolds a new v4-only normalization block in the emitted exchange schema, including an AST-free parse_methods_digest plus stub field_maps and response_envelopes, while keeping v3 output unchanged.
Changes:
- Add
CcxtExtract.Normalizationto build the v4normalizationcarrier (digest + scaffolds). - Wire the pipeline/schema to include
normalizationonly for--schema-target=4, and extend v4 provenance pointers accordingly. - Extend v4 JSON Schema and add/expand tests + contract invariants to validate the new block.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| test/integration/cached/schema_v4_emit_cached_test.exs | Asserts v4 emits normalization with required keys and unresolved scaffolds. |
| test/ccxt_extract/pipeline_test.exs | Updates v4 pipeline expectations; adds targeted tests for digest projection and v3 absence. |
| test/ccxt_extract/normalization_test.exs | New unit tests for digest/scaffold generation and defensive behavior. |
| test/ccxt_extract/contract_test_test.exs | Adds tests for new normalization-related contract invariants. |
| priv/schema/exchange_v4.json | Defines the Normalization block, digest record, and stub record/value schema. |
| lib/ccxt_extract/schema.ex | Emits normalization in v4 using provided/derived scaffold instead of %{}. |
| lib/ccxt_extract/provenance.ex | Adds derived provenance pointers for /normalization/* in v4. |
| lib/ccxt_extract/pipeline.ex | Computes normalization only on the v4 path and threads it into schema build opts. |
| lib/ccxt_extract/normalization.ex | New module implementing digest projection and stub scaffold generation. |
| lib/ccxt_extract/contract_test.ex | Adds invariants for normalization shape and digest coverage vs inventory; loads parse_methods inventory. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| defp digest_findings(_id, nil), do: [] | ||
|
|
||
| defp digest_findings(id, digest) when is_map(digest) do | ||
| digest |
| defp stub_record_findings(_id, _path, nil), do: [] | ||
|
|
| if Map.has_key?(exchange, "normalization") do | ||
| digest = get_in(exchange, ["normalization", "parse_methods_digest"]) || %{} | ||
| inventory = Map.get(observed[:parse_methods_inventory] || %{}, id) | ||
|
|
||
| digest_inventory_findings(id, digest, inventory) | ||
| else |
| auth_sections = structure_data["authenticated_sections"] | ||
| sign_method = structure_data["sign_method"] | ||
| describe_api = structure_data["describe_api"] | ||
| normalization = Keyword.get(opts, :normalization) || CcxtExtract.Normalization.build(nil) | ||
|
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e86b1d667f
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| defp digest_inventory_findings(id, digest, inventory) when is_list(inventory) do | ||
| digest_keys = digest |> Map.keys() |> MapSet.new() |
There was a problem hiding this comment.
Guard non-map digests before checking inventory
If a v4 output has a malformed normalization.parse_methods_digest value (for example a list or string) and the parse-method inventory contains this exchange, this invariant crashes at Map.keys/1 instead of reporting findings; run_invariants/2 does not rescue invariant errors, so one bad exchange aborts the entire contract-test run. The shape invariant already reports parse_methods_digest as non-map, so this coverage check should skip or emit its own finding unless digest is a map.
Useful? React with 👍 / 👎.
| "Normalization": { | ||
| "type": "object", | ||
| "description": "Response-normalization surface. Empty placeholder until Task 129 ships parse_methods_digest (compact, no AST body), Phase 12 ships field_maps, and Task 83 ships response_envelopes. Permissive (additionalProperties: true) so future tasks can populate without a schema break.", | ||
| "additionalProperties": true, | ||
| "properties": {} | ||
| "description": "Response-normalization surface. Task 129 ships parse_methods_digest (compact, no AST body — re-emitting bodies regresses the 91.6% Hex-cap reduction Task 117 shipped), and stub field_maps + response_envelopes scaffolds. Phase 12 sub-bundles (Tasks 74–83) populate the field maps and envelope paths on top of this scaffold.", | ||
| "required": [ | ||
| "parse_methods_digest", | ||
| "field_maps", | ||
| "response_envelopes" | ||
| ], |
There was a problem hiding this comment.
Document the new normalization contract
This changes the emitted v4 JSON shape from an open empty normalization placeholder to a required, closed parse_methods_digest/field_maps/response_envelopes contract, but the commit does not update SCHEMA.md. The repository AGENTS.md documentation invariant explicitly requires SCHEMA.md updates when the emitted JSON shape changes; without that, consumers reading the contract docs will still see stale normalization guidance.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
lib/ccxt_extract/schema.ex (1)
192-218:⚠️ Potential issue | 🟠 Major | ⚡ Quick winTeach
validate_v4/1about the injectednormalizationshape.
build_exchange_v4/4now accepts caller-provided:normalization, butvalidate_v4/1never descends into that section. A payload with%{"normalization" => %{}}still passes preflight, so malformed v4 output can slip throughPipeline.extract/1until a later full-schema run.Minimal follow-up
+ `@required_normalization_keys_v4` ~w(parse_methods_digest field_maps response_envelopes) ... |> check_required_keys(data["auth"], `@required_auth_keys_v4`, "auth") |> check_required_keys(data["errors"], `@required_errors_keys_v4`, "errors") + |> check_required_keys(data["normalization"], `@required_normalization_keys_v4`, "normalization") |> check_required_keys(data["markets"], `@required_markets_keys_v4`, "markets") |> check_required_keys(data["raw"], `@required_raw_keys_v4`, "raw")🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@lib/ccxt_extract/schema.ex` around lines 192 - 218, validate_v4/1 must also descend into the injected "normalization" section added by build_exchange_v4/4; update validate_v4/1 to fetch the "normalization" key from the payload and validate it against the Normalization module (e.g. call CcxtExtract.Normalization.validate/1 or the module's schema/validator) so that a caller-provided %{"normalization" => %{}} is rejected, and wire any validation errors into the same preflight error reporting path as the other sections.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@lib/ccxt_extract/contract_test.ex`:
- Around line 1185-1189: The code assumes `digest` is a map before passing it to
`digest_inventory_findings`, which can crash if
`normalization.parse_methods_digest` is a scalar; after retrieving `digest` with
get_in(exchange, ["normalization", "parse_methods_digest"]) (currently assigned
to `digest`), guard it with an is_map check and coerce non-maps to an empty map
(or bail out and return the same kind of finding as normalization_shape_valid
does) before calling digest_inventory_findings(id, digest, inventory); this
avoids Map.keys/1 errors when malformed digests are emitted.
In `@lib/ccxt_extract/normalization.ex`:
- Around line 117-160: Add `@spec` annotations for each private helper to satisfy
Dialyzer and the repo rule: annotate digest_from_entry/1 (accepts nil or map and
returns map()), digest_from_methods/1 (map() -> map()), digest_record/1 (map() |
any() -> map()), normalize_params/1 (nil | list() | any() -> list()),
normalize_param/1 (map() | any() -> map()), and normalize_statement_count/1
(integer() | any() -> non_neg_integer()). Use concrete Elixir types (map(),
list(), integer(), non_neg_integer(), nil) and ensure specs match the existing
return shapes (maps/lists/integers) for functions like digest_record/1 and
normalize_param/1.
In `@priv/schema/exchange_v4.json`:
- Around line 348-359: The _unresolved_reason field currently allows any
non-empty string; tighten it to the closed vocabulary scaffold sentinel by
replacing the "type":"string","minLength":1 branch with an enum of allowed
sentinel(s) (e.g. ["not_yet_derived"]) so the oneOf becomes either null or that
specific sentinel value; update the schema for _unresolved_reason accordingly
(refer to the _unresolved_reason property in the diff).
---
Outside diff comments:
In `@lib/ccxt_extract/schema.ex`:
- Around line 192-218: validate_v4/1 must also descend into the injected
"normalization" section added by build_exchange_v4/4; update validate_v4/1 to
fetch the "normalization" key from the payload and validate it against the
Normalization module (e.g. call CcxtExtract.Normalization.validate/1 or the
module's schema/validator) so that a caller-provided %{"normalization" => %{}}
is rejected, and wire any validation errors into the same preflight error
reporting path as the other sections.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 6baf393e-f4a2-4212-8b2c-48bc50565e2f
📒 Files selected for processing (10)
lib/ccxt_extract/contract_test.exlib/ccxt_extract/normalization.exlib/ccxt_extract/pipeline.exlib/ccxt_extract/provenance.exlib/ccxt_extract/schema.expriv/schema/exchange_v4.jsontest/ccxt_extract/contract_test_test.exstest/ccxt_extract/normalization_test.exstest/ccxt_extract/pipeline_test.exstest/integration/cached/schema_v4_emit_cached_test.exs
| if Map.has_key?(exchange, "normalization") do | ||
| digest = get_in(exchange, ["normalization", "parse_methods_digest"]) || %{} | ||
| inventory = Map.get(observed[:parse_methods_inventory] || %{}, id) | ||
|
|
||
| digest_inventory_findings(id, digest, inventory) |
There was a problem hiding this comment.
Guard non-map digests before calling Map.keys/1.
If a broken emit writes normalization.parse_methods_digest as a scalar, this invariant raises and aborts the whole contract run instead of returning a finding. normalization_shape_valid already handles malformed sections defensively, so this check should short-circuit the same way.
One safe way to harden it
def check_parse_methods_digest_covers_inventory(exchange, observed) do
id = exchange_id(exchange)
if Map.has_key?(exchange, "normalization") do
- digest = get_in(exchange, ["normalization", "parse_methods_digest"]) || %{}
+ digest = get_in(exchange, ["normalization", "parse_methods_digest"])
inventory = Map.get(observed[:parse_methods_inventory] || %{}, id)
digest_inventory_findings(id, digest, inventory)
else
[]
end
end
defp digest_inventory_findings(_id, _digest, nil), do: []
+ defp digest_inventory_findings(_id, digest, _inventory) when not is_map(digest), do: []🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@lib/ccxt_extract/contract_test.ex` around lines 1185 - 1189, The code assumes
`digest` is a map before passing it to `digest_inventory_findings`, which can
crash if `normalization.parse_methods_digest` is a scalar; after retrieving
`digest` with get_in(exchange, ["normalization", "parse_methods_digest"])
(currently assigned to `digest`), guard it with an is_map check and coerce
non-maps to an empty map (or bail out and return the same kind of finding as
normalization_shape_valid does) before calling digest_inventory_findings(id,
digest, inventory); this avoids Map.keys/1 errors when malformed digests are
emitted.
| defp digest_from_entry(nil), do: %{} | ||
| defp digest_from_entry(%{"parse_methods" => methods}) when is_map(methods), do: digest_from_methods(methods) | ||
| defp digest_from_entry(_), do: %{} | ||
|
|
||
| defp digest_from_methods(methods) do | ||
| Map.new(methods, fn {name, ast} -> | ||
| {name, digest_record(ast)} | ||
| end) | ||
| end | ||
|
|
||
| defp digest_record(ast) when is_map(ast) do | ||
| %{ | ||
| "params" => normalize_params(Map.get(ast, "params")), | ||
| "return_type" => Map.get(ast, "return_type"), | ||
| "async" => Map.get(ast, "async", false) == true, | ||
| "statement_count" => normalize_statement_count(Map.get(ast, "statements")) | ||
| } | ||
| end | ||
|
|
||
| defp digest_record(_) do | ||
| %{ | ||
| "params" => [], | ||
| "return_type" => nil, | ||
| "async" => false, | ||
| "statement_count" => 0 | ||
| } | ||
| end | ||
|
|
||
| defp normalize_params(nil), do: [] | ||
|
|
||
| defp normalize_params(list) when is_list(list) do | ||
| Enum.map(list, &normalize_param/1) | ||
| end | ||
|
|
||
| defp normalize_params(_), do: [] | ||
|
|
||
| defp normalize_param(%{"name" => name} = param) when is_binary(name) do | ||
| %{"name" => name, "type" => Map.get(param, "type")} | ||
| end | ||
|
|
||
| defp normalize_param(_), do: %{"name" => "", "type" => nil} | ||
|
|
||
| defp normalize_statement_count(n) when is_integer(n) and n >= 0, do: n | ||
| defp normalize_statement_count(_), do: 0 |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win
Add @specs for the private helpers.
This new module leaves every defp untyped, which breaks the repo rule for Elixir modules and makes Dialyzer regressions here harder to catch.
As per coding guidelines, Every Elixir function must have a @SPEC — def and defp alike. No exceptions for 'trivial' helpers.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@lib/ccxt_extract/normalization.ex` around lines 117 - 160, Add `@spec`
annotations for each private helper to satisfy Dialyzer and the repo rule:
annotate digest_from_entry/1 (accepts nil or map and returns map()),
digest_from_methods/1 (map() -> map()), digest_record/1 (map() | any() ->
map()), normalize_params/1 (nil | list() | any() -> list()), normalize_param/1
(map() | any() -> map()), and normalize_statement_count/1 (integer() | any() ->
non_neg_integer()). Use concrete Elixir types (map(), list(), integer(),
non_neg_integer(), nil) and ensure specs match the existing return shapes
(maps/lists/integers) for functions like digest_record/1 and normalize_param/1.
| "_unresolved_reason": { | ||
| "oneOf": [ | ||
| { | ||
| "type": "null" | ||
| }, | ||
| { | ||
| "type": "string", | ||
| "minLength": 1 | ||
| } | ||
| ], | ||
| "description": "Closed-vocabulary scaffold tag. \"not_yet_derived\" is the Task 129 default. Phase 12 sub-bundles flip it to null when every parser-type entry is populated." | ||
| } |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win
Pin _unresolved_reason to the scaffold sentinel.
This field is documented as a closed-vocabulary scaffold tag, but the schema currently accepts any non-empty string. That makes unexpected sentinel values schema-valid and weakens the new v4 contract.
Proposed schema tightening
"_unresolved_reason": {
"oneOf": [
{
"type": "null"
},
{
"type": "string",
- "minLength": 1
+ "enum": [
+ "not_yet_derived"
+ ]
}
],
"description": "Closed-vocabulary scaffold tag. \"not_yet_derived\" is the Task 129 default. Phase 12 sub-bundles flip it to null when every parser-type entry is populated."
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| "_unresolved_reason": { | |
| "oneOf": [ | |
| { | |
| "type": "null" | |
| }, | |
| { | |
| "type": "string", | |
| "minLength": 1 | |
| } | |
| ], | |
| "description": "Closed-vocabulary scaffold tag. \"not_yet_derived\" is the Task 129 default. Phase 12 sub-bundles flip it to null when every parser-type entry is populated." | |
| } | |
| "_unresolved_reason": { | |
| "oneOf": [ | |
| { | |
| "type": "null" | |
| }, | |
| { | |
| "type": "string", | |
| "enum": [ | |
| "not_yet_derived" | |
| ] | |
| } | |
| ], | |
| "description": "Closed-vocabulary scaffold tag. \"not_yet_derived\" is the Task 129 default. Phase 12 sub-bundles flip it to null when every parser-type entry is populated." | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@priv/schema/exchange_v4.json` around lines 348 - 359, The _unresolved_reason
field currently allows any non-empty string; tighten it to the closed vocabulary
scaffold sentinel by replacing the "type":"string","minLength":1 branch with an
enum of allowed sentinel(s) (e.g. ["not_yet_derived"]) so the oneOf becomes
either null or that specific sentinel value; update the schema for
_unresolved_reason accordingly (refer to the _unresolved_reason property in the
diff).
…ld_maps scaffold) (INE-61)
Lands the Phase 12 prerequisite: scaffolds the v4 `normalization` section
under `--schema-target=4` only. v3 emission stays byte-identical (verified
on binance/deribit/okx).
* New `CcxtExtract.Normalization` module — `build/2` projects the per-
exchange `priv/discoveries/parse_methods.json` entry into a compact,
AST-free digest (`{params, return_type, async, statement_count}`) and
scaffolds `field_maps` + `response_envelopes` keyed by parser type.
No AST body: re-emitting full ESTree bodies regresses the 91.6% Hex-cap
reduction Task 117 shipped (binance v4 spec is only ~5 KB larger than v3).
* `Pipeline.build_exchange_data/3` calls `Normalization.build/2` only on
the v4 path; v3 path is untouched.
* `Schema.build_exchange_v4/4` reads `:normalization` from opts and emits
the new block; the old empty placeholder `%{}` is gone.
* `priv/schema/exchange_v4.json`: adds `Normalization`,
`NormalizationDigestRecord`, `NormalizationStubRecord`,
`NormalizationStubValue` defs. Strict on top-level keys, permissive
on stub values (oneOf null/object) so Phase 12 sub-bundles can populate
without a schema break.
* `Provenance.build_default_v4/0` adds three new derived pointers:
`/normalization/parse_methods_digest`, `/normalization/field_maps`,
`/normalization/response_envelopes`.
* Two new contract invariants:
- `normalization_shape_valid` — top-level + per-record shape gate;
skipped on v3-shaped output (no top-level `normalization` key).
- `parse_methods_digest_covers_inventory` — every method in the
per-exchange parse_methods.json entry must surface in the emitted
digest. Inventory loaded once via `load_parse_methods_inventory/1`
and threaded through `observed`.
Tests: new `test/ccxt_extract/normalization_test.exs` (round-trip + shape
+ defensive cases, ~95% coverage on the new module), new contract-test
cases for both invariants, extended `pipeline_test.exs` v4 emit asserts,
extended cached `schema_v4_emit_cached_test.exs` to assert the new block
on tier 1 priority exchanges (binance / deribit / okx).
Validated against `exchange_v4.json` for all 19 tier1+tier2 exchanges
(0 failures). v3 binance/deribit/okx emission MD5-identical between
`development` and this branch.
`.sobelow-skips` line-fingerprint drift expected from line shifts in
contract_test.ex / pipeline.ex; per linear-workflow §
"Code-Only PRs" the agent does not touch the file — commit-review
applies the regen in the post-merge bookkeeping commit.
Co-authored-by: E.FU <e-fu@users.noreply.github.com>
e86b1d6 to
c5ad67f
Compare
There was a problem hiding this comment.
Actionable comments posted: 3
♻️ Duplicate comments (1)
lib/ccxt_extract/contract_test.ex (1)
1424-1428:⚠️ Potential issue | 🟠 Major | ⚡ Quick winGuard non-map digests before
Map.keys/1.If
normalization.parse_methods_digestis emitted as a scalar, this path still crashes the whole contract run instead of returning a finding or skipping malformed data.Suggested hardening
def check_parse_methods_digest_covers_inventory(exchange, observed) do id = exchange_id(exchange) if Map.has_key?(exchange, "normalization") do - digest = get_in(exchange, ["normalization", "parse_methods_digest"]) || %{} + digest = get_in(exchange, ["normalization", "parse_methods_digest"]) inventory = Map.get(observed[:parse_methods_inventory] || %{}, id) digest_inventory_findings(id, digest, inventory) else [] end end defp digest_inventory_findings(_id, _digest, nil), do: [] + defp digest_inventory_findings(_id, digest, _inventory) when not is_map(digest), do: [] defp digest_inventory_findings(id, digest, inventory) when is_list(inventory) doAlso applies to: 1436-1437
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@lib/ccxt_extract/contract_test.ex` around lines 1424 - 1428, The code assumes normalization.parse_methods_digest is a map and later calls Map.keys/1 which will crash on scalars; update the guards so before calling digest_inventory_findings (and the analogous block at lines 1436-1437) you verify the digest is a map (use is_map(digest)) and if not either coerce it to an empty map (digest = %{}) or return/record a finding for malformed digest, so Map.keys/1 is never invoked on a non-map value; apply the same check for observed[:parse_methods_inventory] where appropriate.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@lib/ccxt_extract/contract_test.ex`:
- Around line 1381-1396: stub_value_findings/3 currently only iterates
Normalization.parser_types() so a scalar _unresolved_reason slips through;
update the function to validate _unresolved_reason as well by either including
"_unresolved_reason" in the checked keys or adding an explicit check after the
Enum.flat_map that inspects Map.fetch(record, "_unresolved_reason") and, if
{:ok, other} and not nil and not a map, returns a normalization_finding(id,
"#{path}._unresolved_reason", "must be null or a map, got #{inspect(other)}");
keep the same message format and return shape so callers of
stub_value_findings/3 behave unchanged.
- Around line 1279-1286: The current Map.get(record, "...") calls (used when
computing digest_findings, field_maps_findings, response_envelopes_findings)
swallow present-but-null values, so explicit "key": null doesn't produce a shape
finding; replace those Map.get/2 usages with Map.fetch(record, "key") (or an
equivalent that distinguishes present-nil from missing) and pass the {:ok,
value} result through to stub_record_findings (so {:ok, nil} yields a finding),
while letting :error continue to be treated as the missing-key case already
reported; apply the same change where Map.get/2 is used in the other occurrences
mentioned (lines 1299-1308 and 1364-1378).
- Around line 1792-1804: The current JsonIO.read_json(path) branch silently
returns %{} for any error or wrong shape, hiding malformed parse_methods.json;
change the logic in the case handling (the block calling JsonIO.read_json) so
that only a missing-file error (e.g., {:error, :enoent} or whatever JsonIO uses
for not-found) returns %{}, while any other {:error, reason} or an {:ok, value}
that does not match %{"exchanges" => entries} should raise or propagate an error
(or at minimum log and rethrow) so malformed JSON or unexpected top-level shape
surfaces and does not silently disable parse_methods_digest_covers_inventory.
---
Duplicate comments:
In `@lib/ccxt_extract/contract_test.ex`:
- Around line 1424-1428: The code assumes normalization.parse_methods_digest is
a map and later calls Map.keys/1 which will crash on scalars; update the guards
so before calling digest_inventory_findings (and the analogous block at lines
1436-1437) you verify the digest is a map (use is_map(digest)) and if not either
coerce it to an empty map (digest = %{}) or return/record a finding for
malformed digest, so Map.keys/1 is never invoked on a non-map value; apply the
same check for observed[:parse_methods_inventory] where appropriate.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 854fdfa8-6a93-4764-a924-dd881e377b59
📒 Files selected for processing (10)
lib/ccxt_extract/contract_test.exlib/ccxt_extract/normalization.exlib/ccxt_extract/pipeline.exlib/ccxt_extract/provenance.exlib/ccxt_extract/schema.expriv/schema/exchange_v4.jsontest/ccxt_extract/contract_test_test.exstest/ccxt_extract/normalization_test.exstest/ccxt_extract/pipeline_test.exstest/integration/cached/schema_v4_emit_cached_test.exs
✅ Files skipped from review due to trivial changes (1)
- lib/ccxt_extract/provenance.ex
🚧 Files skipped from review as they are similar to previous changes (7)
- lib/ccxt_extract/pipeline.ex
- test/integration/cached/schema_v4_emit_cached_test.exs
- test/ccxt_extract/normalization_test.exs
- test/ccxt_extract/contract_test_test.exs
- lib/ccxt_extract/normalization.ex
- lib/ccxt_extract/schema.ex
- priv/schema/exchange_v4.json
| digest_findings = | ||
| digest_findings(id, Map.get(record, "parse_methods_digest")) | ||
|
|
||
| field_maps_findings = | ||
| stub_record_findings(id, "normalization.field_maps", Map.get(record, "field_maps")) | ||
|
|
||
| response_envelopes_findings = | ||
| stub_record_findings(id, "normalization.response_envelopes", Map.get(record, "response_envelopes")) |
There was a problem hiding this comment.
Don't let present-but-null normalization sections pass.
Map.get/2 plus the nil clauses here means "parse_methods_digest": null, "field_maps": null, and "response_envelopes": null produce no shape finding. Missing keys are already reported above; explicit null should still fail this invariant.
Also applies to: 1299-1308, 1364-1378
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@lib/ccxt_extract/contract_test.ex` around lines 1279 - 1286, The current
Map.get(record, "...") calls (used when computing digest_findings,
field_maps_findings, response_envelopes_findings) swallow present-but-null
values, so explicit "key": null doesn't produce a shape finding; replace those
Map.get/2 usages with Map.fetch(record, "key") (or an equivalent that
distinguishes present-nil from missing) and pass the {:ok, value} result through
to stub_record_findings (so {:ok, nil} yields a finding), while letting :error
continue to be treated as the missing-key case already reported; apply the same
change where Map.get/2 is used in the other occurrences mentioned (lines
1299-1308 and 1364-1378).
| defp stub_value_findings(id, path, record) do | ||
| Enum.flat_map(Normalization.parser_types(), fn key -> | ||
| case Map.fetch(record, key) do | ||
| {:ok, nil} -> | ||
| [] | ||
|
|
||
| {:ok, v} when is_map(v) -> | ||
| [] | ||
|
|
||
| {:ok, other} -> | ||
| [normalization_finding(id, "#{path}.#{key}", "must be null or a map, got #{inspect(other)}")] | ||
|
|
||
| :error -> | ||
| [] | ||
| end | ||
| end) |
There was a problem hiding this comment.
Validate _unresolved_reason values too.
stub_value_findings/3 only walks Normalization.parser_types/0, so _unresolved_reason can hold any scalar and still pass once the key-set check succeeds. The docstring says every stub value should be null or a map.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@lib/ccxt_extract/contract_test.ex` around lines 1381 - 1396,
stub_value_findings/3 currently only iterates Normalization.parser_types() so a
scalar _unresolved_reason slips through; update the function to validate
_unresolved_reason as well by either including "_unresolved_reason" in the
checked keys or adding an explicit check after the Enum.flat_map that inspects
Map.fetch(record, "_unresolved_reason") and, if {:ok, other} and not nil and not
a map, returns a normalization_finding(id, "#{path}._unresolved_reason", "must
be null or a map, got #{inspect(other)}"); keep the same message format and
return shape so callers of stub_value_findings/3 behave unchanged.
| case JsonIO.read_json(path) do | ||
| {:ok, %{"exchanges" => entries}} when is_list(entries) -> | ||
| Map.new(entries, fn entry -> | ||
| methods = Map.get(entry, "parse_methods") || %{} | ||
| {entry["id"], Map.keys(methods)} | ||
| end) | ||
|
|
||
| {:ok, _malformed} -> | ||
| %{} | ||
|
|
||
| {:error, _reason} -> | ||
| %{} | ||
| end |
There was a problem hiding this comment.
Fail loudly on malformed parse_methods.json.
This returns %{} for any read or shape error, which silently disables parse_methods_digest_covers_inventory. Skipping only when the discovery file is missing makes sense for v3-only runs; malformed JSON or an unexpected top-level shape should surface instead.
Suggested tightening
case JsonIO.read_json(path) do
{:ok, %{"exchanges" => entries}} when is_list(entries) ->
Map.new(entries, fn entry ->
methods = Map.get(entry, "parse_methods") || %{}
{entry["id"], Map.keys(methods)}
end)
- {:ok, _malformed} ->
- %{}
+ {:ok, _malformed} ->
+ raise "invalid parse_methods inventory at #{path}: expected %{\"exchanges\" => [...]}"
- {:error, _reason} ->
+ {:error, {:missing_input, _}} ->
%{}
+
+ {:error, reason} ->
+ raise "failed to load parse_methods inventory at #{path}: #{inspect(reason)}"
end
end📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| case JsonIO.read_json(path) do | |
| {:ok, %{"exchanges" => entries}} when is_list(entries) -> | |
| Map.new(entries, fn entry -> | |
| methods = Map.get(entry, "parse_methods") || %{} | |
| {entry["id"], Map.keys(methods)} | |
| end) | |
| {:ok, _malformed} -> | |
| %{} | |
| {:error, _reason} -> | |
| %{} | |
| end | |
| case JsonIO.read_json(path) do | |
| {:ok, %{"exchanges" => entries}} when is_list(entries) -> | |
| Map.new(entries, fn entry -> | |
| methods = Map.get(entry, "parse_methods") || %{} | |
| {entry["id"], Map.keys(methods)} | |
| end) | |
| {:ok, _malformed} -> | |
| raise "invalid parse_methods inventory at #{path}: expected %{\"exchanges\" => [...]}" | |
| {:error, {:missing_input, _}} -> | |
| %{} | |
| {:error, reason} -> | |
| raise "failed to load parse_methods inventory at #{path}: #{inspect(reason)}" | |
| end |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@lib/ccxt_extract/contract_test.ex` around lines 1792 - 1804, The current
JsonIO.read_json(path) branch silently returns %{} for any error or wrong shape,
hiding malformed parse_methods.json; change the logic in the case handling (the
block calling JsonIO.read_json) so that only a missing-file error (e.g.,
{:error, :enoent} or whatever JsonIO uses for not-found) returns %{}, while any
other {:error, reason} or an {:ok, value} that does not match %{"exchanges" =>
entries} should raise or propagate an error (or at minimum log and rethrow) so
malformed JSON or unexpected top-level shape surfaces and does not silently
disable parse_methods_digest_covers_inventory.
…rrier Hardens the v4 normalization block against malformed input and tightens the contract surface that CodeRabbit + Codex flagged on PR #10. - contract_test: guard `digest_inventory_findings/3` against non-map digests so `Map.keys/1` no longer crashes the run on a scalar - contract_test: validate `_unresolved_reason` in `stub_value_findings/3` (must be null or the sentinel `"not_yet_derived"`) - contract_test: drop nil-clauses in `digest_findings/2` and `stub_record_findings/3` so explicit `"key": null` produces a finding instead of silently passing - contract_test: distinguish `:missing_input` from `:invalid_json` / malformed-shape in `load_parse_methods_inventory/1` — corrupt JSON or wrong top-level shape now raises with regen instructions instead of silently disabling `parse_methods_digest_covers_inventory` - normalization: add @SPEC on six private helpers per the every-function-gets-a-spec mandate - schema: `validate_v4/1` now descends into `/normalization` so a caller-provided `%{}` fails preflight - exchange_v4.json: tighten `_unresolved_reason` from open-string to enum `["not_yet_derived"]` (closed-vocabulary scaffold sentinel)
…rrier (#15) Hardens the v4 normalization block against malformed input and tightens the contract surface that CodeRabbit + Codex flagged on PR #10. - contract_test: guard `digest_inventory_findings/3` against non-map digests so `Map.keys/1` no longer crashes the run on a scalar - contract_test: validate `_unresolved_reason` in `stub_value_findings/3` (must be null or the sentinel `"not_yet_derived"`) - contract_test: drop nil-clauses in `digest_findings/2` and `stub_record_findings/3` so explicit `"key": null` produces a finding instead of silently passing - contract_test: distinguish `:missing_input` from `:invalid_json` / malformed-shape in `load_parse_methods_inventory/1` — corrupt JSON or wrong top-level shape now raises with regen instructions instead of silently disabling `parse_methods_digest_covers_inventory` - normalization: add @SPEC on six private helpers per the every-function-gets-a-spec mandate - schema: `validate_v4/1` now descends into `/normalization` so a caller-provided `%{}` fails preflight - exchange_v4.json: tighten `_unresolved_reason` from open-string to enum `["not_yet_derived"]` (closed-vocabulary scaffold sentinel)
Summary
Lands the Phase 12 prerequisite: scaffolds the v4
normalizationsection under--schema-target=4only. v3 emission stays byte-identical (verified on binance/deribit/okx — MD5-identical betweendevelopmenttip and this branch).What ships
New
CcxtExtract.Normalizationmodule (lib/ccxt_extract/normalization.ex)build/2projects the per-exchangepriv/discoveries/parse_methods.jsonentry into a compact, AST-free digest ({params, return_type, async, statement_count}) and scaffoldsfield_maps+response_envelopeskeyed by parser type (ticker/trade/ohlcv/order/position/balance/market/transaction/deposit_address).Pipeline wiring (
lib/ccxt_extract/pipeline.ex)Pipeline.build_exchange_data/3callsNormalization.build/2only on the v4 path; v3 path is untouched.Schema.build_exchange_v4/4reads:normalizationfrom opts and emits the new block; the old empty placeholder%{}is gone.Schema (
priv/schema/exchange_v4.json)Normalization,NormalizationDigestRecord,NormalizationStubRecord,NormalizationStubValuedefs.oneOf [null, object]) so Phase 12 sub-bundles can populate without a schema break.Provenance (
lib/ccxt_extract/provenance.ex)/normalization/parse_methods_digest,/normalization/field_maps,/normalization/response_envelopes.Two new contract invariants (
lib/ccxt_extract/contract_test.ex)normalization_shape_valid— top-level + per-record shape gate. Mirrorstestnet_urls_shape_valid. Skipped on v3-shaped output.parse_methods_digest_covers_inventory— every method in the per-exchangeparse_methods.jsonentry must surface in the emitted digest (lossless-projection guard). Inventory loaded once viaload_parse_methods_inventory/1and threaded throughobserved.Acceptance criteria
mix format --check-formattedmix compile --warnings-as-errorsmix credo --strictmix doctormix dialyzermix sobelow --exit Lowmix test.json --quietdevelopmentbaseline (15 pre-existing cached-integration / corpus-staleness failures, 0 new)mix test.json --cover(Normalization module)mix ccxt_extract.contract_testmix ccxt_extract.validate --schema-target=4(tier1+tier2)developmentfor binance/deribit/okxReviewer note (
.sobelow-skips)Per linear-workflow § "Code-Only PRs + Required Acceptance Criteria":
This PR's edits to
lib/ccxt_extract/contract_test.ex(addedNormalizationalias + parse_methods_inventory baseline plumbing) andlib/ccxt_extract/pipeline.ex(addedNormalizationalias + 8-line v4 carrier wiring) shifted line numbers, invalidating 8 fingerprints and producing 8 new ones. Localmix sobelow --exit Lowflags one finding (validation.ex:184— the same finding skip 666 covers, just with a slightly different context hash). Reviewer regenerates.sobelow-skipsin the post-merge bookkeeping commit.The CI gate (
mix sobelowwith no--exit) returns 0 regardless of findings, so harness CI passes.Out of scope (deferred to commit-review's post-merge bookkeeping commit)
ROADMAP.md(Task 129 status flip ⬜ → ✅)CHANGELOG.md([Unreleased] entry)SCHEMA.md(Normalization carrier description for v4 section).sobelow-skips(line-fingerprint regen)Cross-repo
../ccxt_client/ROADMAP.mdPhase 12 parser tasks remain blocked until field_maps are populated (Tasks 74–82 / 83). This carrier is the load-bearing prerequisite — no ccxt_client task flips on this PR alone.Scoring
[D:3/B:6/U:7 → Eff:2.17] 🚀
Linear Issue: INE-61
Summary by CodeRabbit
New Features
Validation
Tests