test(workflow-core): add unit test coverage for ResultSchema#4786
test(workflow-core): add unit test coverage for ResultSchema#4786aglinxinyuan wants to merge 4 commits into
Conversation
Add ResultSchemaSpec pinning the canonical column layout of runtimeStatisticsSchema (column order and per-column types) and consoleMessagesSchema, guarding against silent drift. Closes apache#4785 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR adds a new unit spec for ResultSchema in workflow-core to pin the expected layout of runtime-statistics and console-messages result documents, helping catch schema drift before it breaks downstream consumers.
Changes:
- Add
ResultSchemaSpeccovering the runtime-statistics column order. - Add assertions for selected runtime-statistics column types and the console-messages schema shape.
- Introduce focused unit coverage for a schema contract used by result-storage readers.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #4786 +/- ##
============================================
+ Coverage 42.13% 42.69% +0.56%
- Complexity 2001 2038 +37
============================================
Files 957 957
Lines 34094 34094
Branches 3753 3753
============================================
+ Hits 14364 14556 +192
+ Misses 18952 18748 -204
- Partials 778 790 +12
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Per Copilot feedback on apache#4786: include `operatorId`, `inputTupleSize`, `outputTupleSize`, `dataProcessingTime`, and `controlProcessingTime` in the type assertions. Downstream readers deserialize this schema positionally and cast every slot, so a type change in any column should fail the spec. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
| @@ -0,0 +1,70 @@ | |||
| /* | |||
There was a problem hiding this comment.
do you think this file needs test? seems it only defined three static schemas. Can we skip it?
There was a problem hiding this comment.
let's check the coverage about this change. I kind of don't believe the current numbers shown.
There was a problem hiding this comment.
Fair concern. The motivation is that downstream readers deserialize this schema positionally and cast each slot to a concrete type, so a silent reorder/retype would break consumers without any local-file change being obvious in review (the OG Copilot thread above made the same point). Pinning the column list + types here lets a CI red flag catch the drift instead of waiting for a runtime cast failure.
There was a problem hiding this comment.
Expanded the spec in 743cfb5 so it goes beyond redeclaring the schema. New runtimeStatisticsSchema coverage: stable name → index mapping for positional readers; unknown-name lookup throws and the error message names the missing column; containsAttribute returns false for unknown names; column names are unique; toRawSchema → fromRawSchema round-trips names + types intact (the cross-language serialization contract that Python and external consumers actually depend on); singleton-val identity. Parallel coverage added for consoleMessagesSchema. 12 tests total, all passing.
Let me know if this changes your mind on closing — happy to either keep it or close if it still feels like ceremony.
Drop the awkward "expose ... canonical ..." phrasing in favor of plain "list its columns in the declared order" / "have a single STRING column" per Yicong-Huang's review note. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Address Yicong-Huang's review on apache#4786: the previous version mostly re-stated the source of truth. Lift the spec from "schema definition parrot" to "schema contract pinning" by exercising real lookup, serialization, and identity behaviors that downstream consumers depend on. Pull the (name, type) layout into a single `runtimeStatsLayout` source of truth and drive multiple tests off it. New behaviors covered for runtimeStatisticsSchema: - name → index mapping is stable for positional readers - getAttribute on an unknown name throws and the error message names it - containsAttribute returns false for an unknown name - column names are unique (no accidental dupes) - toRawSchema → fromRawSchema round-trips names + types intact (the cross-language serialization contract that Python / external consumers actually depend on) - the schema is a singleton val (same instance per access) Parallel coverage added for consoleMessagesSchema: index of `message` is 0, unknown-name lookup throws, toRawSchema equals `{message: STRING}`, and singleton identity. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
What changes were proposed in this PR?
Add
ResultSchemaSpecpinning the canonical column layout ofResultSchema:runtimeStatisticsSchemaexposes the canonical columns in orderruntimeStatisticsSchemapins the type of every column (operatorIdSTRING;timeTIMESTAMP;inputTupleCnt/inputTupleSize/outputTupleCnt/outputTupleSize/dataProcessingTime/controlProcessingTime/idleTimeLONG;numWorkers/statusINTEGER) so a type change in any slot fails the specconsoleMessagesSchemaexposes a single STRINGmessagecolumnAny related issues, documentation, discussions?
Closes #4785
How was this PR tested?
sbt "WorkflowCore/testOnly org.apache.texera.amber.core.storage.result.ResultSchemaSpec"— 3/3 tests pass.Was this PR authored or co-authored using generative AI tooling?
Generated-by: Claude Code (Claude Opus 4.7)