
test(workflow-core): add unit test coverage for ResultSchema #4786

Closed
aglinxinyuan wants to merge 4 commits into apache:main from aglinxinyuan:xinyuan-test-result-schema-spec

Conversation

Contributor

@aglinxinyuan aglinxinyuan commented May 3, 2026

What changes were proposed in this PR?

Add ResultSchemaSpec pinning the canonical column layout of ResultSchema:

  • runtimeStatisticsSchema exposes the canonical columns in order
  • runtimeStatisticsSchema pins the type of every column (operatorId STRING; time TIMESTAMP; inputTupleCnt/inputTupleSize/outputTupleCnt/outputTupleSize/dataProcessingTime/controlProcessingTime/idleTime LONG; numWorkers/status INTEGER) so a type change in any slot fails the spec
  • consoleMessagesSchema exposes a single STRING message column
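
As a rough illustration of what "pinning" order and types means, here is a minimal sketch written against stand-in types rather than the real Texera classes. `AttrType`, `Attr`, `SimpleSchema`, `StatsSchemaSketch`, and `drift` are hypothetical names introduced only for this example, and the column order is assumed to follow the order in which the columns are listed above.

```scala
// Hypothetical stand-ins for illustration; NOT the real Texera Schema/AttributeType classes.
sealed trait AttrType
object AttrType {
  case object STRING extends AttrType
  case object TIMESTAMP extends AttrType
  case object LONG extends AttrType
  case object INTEGER extends AttrType
}

final case class Attr(name: String, tpe: AttrType)
final case class SimpleSchema(attributes: List[Attr])

object StatsSchemaSketch {
  import AttrType._

  // Stand-in for runtimeStatisticsSchema (column order is an assumption, see lead-in).
  val schema: SimpleSchema = SimpleSchema(
    List(
      Attr("operatorId", STRING),
      Attr("time", TIMESTAMP),
      Attr("inputTupleCnt", LONG),
      Attr("inputTupleSize", LONG),
      Attr("outputTupleCnt", LONG),
      Attr("outputTupleSize", LONG),
      Attr("dataProcessingTime", LONG),
      Attr("controlProcessingTime", LONG),
      Attr("idleTime", LONG),
      Attr("numWorkers", INTEGER),
      Attr("status", INTEGER)
    )
  )

  // The layout the spec expects, written out independently of the schema definition.
  val expectedLayout: List[(String, AttrType)] = List(
    "operatorId" -> STRING,
    "time" -> TIMESTAMP,
    "inputTupleCnt" -> LONG,
    "inputTupleSize" -> LONG,
    "outputTupleCnt" -> LONG,
    "outputTupleSize" -> LONG,
    "dataProcessingTime" -> LONG,
    "controlProcessingTime" -> LONG,
    "idleTime" -> LONG,
    "numWorkers" -> INTEGER,
    "status" -> INTEGER
  )

  // Describes every slot whose name or type differs from the expected layout.
  def drift(actual: SimpleSchema, expected: List[(String, AttrType)]): List[String] = {
    val actualPairs = actual.attributes.map(a => (a.name, a.tpe))
    val sizeNote =
      if (actualPairs.size != expected.size)
        List(s"expected ${expected.size} columns, found ${actualPairs.size}")
      else Nil
    val slotNotes = actualPairs.zip(expected).zipWithIndex.collect {
      case ((a, e), i) if a != e =>
        s"slot $i: expected ${e._1}:${e._2}, found ${a._1}:${a._2}"
    }
    sizeNote ++ slotNotes
  }
}
```

In this sketch an empty result from `drift` corresponds to a passing spec; retyping or reordering any column produces at least one entry naming the affected slot.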

Any related issues, documentation, discussions?

Closes #4785

How was this PR tested?

sbt "WorkflowCore/testOnly org.apache.texera.amber.core.storage.result.ResultSchemaSpec" — 3/3 tests pass.

Was this PR authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Claude Opus 4.7)

Add ResultSchemaSpec pinning the canonical column layout of
runtimeStatisticsSchema (column order and per-column types) and
consoleMessagesSchema, guarding against silent drift.

Closes apache#4785

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 3, 2026 01:51
@github-actions github-actions Bot added the common label May 3, 2026
Contributor

Copilot AI left a comment


Pull request overview

This PR adds a new unit spec for ResultSchema in workflow-core to pin the expected layout of runtime-statistics and console-messages result documents, helping catch schema drift before it breaks downstream consumers.

Changes:

  • Add ResultSchemaSpec covering the runtime-statistics column order.
  • Add assertions for selected runtime-statistics column types and the console-messages schema shape.
  • Introduce focused unit coverage for a schema contract used by result-storage readers.



codecov-commenter commented May 3, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 42.69%. Comparing base (a5b8957) to head (743cfb5).
⚠️ Report is 28 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #4786      +/-   ##
============================================
+ Coverage     42.13%   42.69%   +0.56%     
- Complexity     2001     2038      +37     
============================================
  Files           957      957              
  Lines         34094    34094              
  Branches       3753     3753              
============================================
+ Hits          14364    14556     +192     
+ Misses        18952    18748     -204     
- Partials        778      790      +12     
Flag                               Coverage Δ
access-control-service             28.12% <ø> (ø)
amber                              41.56% <ø> (+1.23%) ⬆️
computing-unit-managing-service     0.00% <ø> (ø)
config-service                      0.00% <ø> (ø)
file-service                       33.24% <ø> (ø)
workflow-compiling-service         47.72% <ø> (ø)


@aglinxinyuan aglinxinyuan requested a review from Yicong-Huang May 3, 2026 01:59
Per Copilot feedback on apache#4786: include `operatorId`, `inputTupleSize`,
`outputTupleSize`, `dataProcessingTime`, and `controlProcessingTime`
in the type assertions. Downstream readers deserialize this schema
positionally and cast every slot, so a type change in any column
should fail the spec.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -0,0 +1,70 @@
/*
Contributor


Do you think this file needs a test? It seems to only define three static schemas. Can we skip it?

Contributor


Let's check the coverage for this change. I don't quite believe the numbers currently shown.

Contributor Author

@aglinxinyuan aglinxinyuan May 3, 2026


Fair concern. The motivation is that downstream readers deserialize this schema positionally and cast each slot to a concrete type, so a silent reorder or retype would break consumers without anything in the changed file making that obvious in review (the original Copilot thread above made the same point). Pinning the column list and types here lets CI flag the drift instead of waiting for a runtime cast failure.
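
To make the failure mode concrete, here is a minimal sketch of a positional reader. It is not Texera code: `PositionalReaderSketch`, `Row`, `Stats`, and `read` are hypothetical names, and the two-column row is a simplification of the real layout.

```scala
// Hypothetical positional reader for illustration; NOT Texera code.
object PositionalReaderSketch {
  // A row as a downstream consumer might receive it: untyped slots in schema order.
  type Row = Vector[Any]

  final case class Stats(operatorId: String, inputTupleCnt: Long)

  // Reads each slot by index and casts it to the type the reader assumes.
  // Nothing here consults column names, so it relies entirely on order and types.
  def read(row: Row): Stats =
    Stats(row(0).asInstanceOf[String], row(1).asInstanceOf[Long])
}
```

With this reader, `read(Vector("op-1", 5L))` succeeds, while a reordered row such as `Vector(5L, "op-1")` or a retyped slot such as `Vector("op-1", 5)` fails with a `ClassCastException` only when the row is actually read, which is the runtime failure the spec is meant to pre-empt.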

Contributor Author


Expanded the spec in 743cfb5 so it goes beyond redeclaring the schema. New runtimeStatisticsSchema coverage: stable name → index mapping for positional readers; unknown-name lookup throws and the error message names the missing column; containsAttribute returns false for unknown names; column names are unique; toRawSchema → fromRawSchema round-trips names + types intact (the cross-language serialization contract that Python and external consumers actually depend on); singleton-val identity. Parallel coverage added for consoleMessagesSchema. 12 tests total, all passing.

Let me know if this changes your mind on closing — happy to either keep it or close if it still feels like ceremony.
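
The lookup behaviors described in this comment can be sketched roughly as follows. `NamedSchema` is a hypothetical stand-in, not the real Texera `Schema`; the method names `getIndex` and `hasUniqueNames`, the exception type, and the message wording are assumptions made for illustration, and only `containsAttribute` is a name taken from the thread.

```scala
// Hypothetical stand-in for a schema with name-based lookup; NOT the real Texera Schema.
final case class NamedSchema(names: List[String]) {
  private val indexByName: Map[String, Int] = names.zipWithIndex.toMap

  def containsAttribute(name: String): Boolean = indexByName.contains(name)

  // Throws on an unknown name, with a message that names the missing column.
  def getIndex(name: String): Int =
    indexByName.getOrElse(
      name,
      throw new NoSuchElementException(s"$name is not contained in the schema")
    )

  // True when no column name appears more than once.
  def hasUniqueNames: Boolean = names.distinct.size == names.size
}
```

A spec built on these helpers would assert a fixed index per column name, a `false` result and a descriptive exception for an unknown name, and uniqueness of the name list.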

aglinxinyuan and others added 2 commits May 3, 2026 01:38
Drop the awkward "expose ... canonical ..." phrasing in favor of plain
"list its columns in the declared order" / "have a single STRING column"
per Yicong-Huang's review note.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Address Yicong-Huang's review on apache#4786: the previous version mostly
re-stated the source of truth. Lift the spec from "schema definition
parrot" to "schema contract pinning" by exercising real lookup,
serialization, and identity behaviors that downstream consumers depend on.

Pull the (name, type) layout into a single `runtimeStatsLayout` source
of truth and drive multiple tests off it. New behaviors covered for
runtimeStatisticsSchema:

- name → index mapping is stable for positional readers
- getAttribute on an unknown name throws and the error message names it
- containsAttribute returns false for an unknown name
- column names are unique (no accidental dupes)
- toRawSchema → fromRawSchema round-trips names + types intact (the
  cross-language serialization contract that Python / external consumers
  actually depend on)
- the schema is a singleton val (same instance per access)

Parallel coverage added for consoleMessagesSchema: index of `message` is
0, unknown-name lookup throws, toRawSchema equals `{message: STRING}`,
and singleton identity.
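
A minimal sketch of the round-trip idea, assuming a raw form made of ordered (name, type-name) string pairs. `RoundTripSketch`, `ColType`, `Col`, `toRaw`, and `fromRaw` are hypothetical; the real `toRawSchema` / `fromRawSchema` signatures are not shown in this PR and may differ.

```scala
// Hypothetical stand-ins for illustration; NOT the real Texera toRawSchema/fromRawSchema.
object RoundTripSketch {
  sealed trait ColType
  case object STRING extends ColType
  case object LONG extends ColType

  final case class Col(name: String, tpe: ColType)

  // Raw form: plain strings only, so a consumer in another language could parse it.
  def toRaw(cols: List[Col]): List[(String, String)] =
    cols.map(c => c.name -> c.tpe.toString)

  // Rebuilds typed columns from the raw form; rejects type names it does not know.
  def fromRaw(raw: List[(String, String)]): List[Col] = raw.map {
    case (name, "STRING") => Col(name, STRING)
    case (name, "LONG")   => Col(name, LONG)
    case (name, other) =>
      throw new IllegalArgumentException(s"unknown type $other for column $name")
  }
}
```

The property a round-trip test checks is simply `fromRaw(toRaw(cols)) == cols`: names, types, and order all survive serialization.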

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
