Skip to content

test(amber): add unit test coverage for record-storage cluster#5447

Merged
aglinxinyuan merged 2 commits into
apache:mainfrom
aglinxinyuan:test-record-storage-cluster
Jun 7, 2026
Merged

test(amber): add unit test coverage for record-storage cluster#5447
aglinxinyuan merged 2 commits into
apache:mainfrom
aglinxinyuan:test-record-storage-cluster

Conversation

@aglinxinyuan

@aglinxinyuan aglinxinyuan commented Jun 7, 2026

Copy link
Copy Markdown
Contributor

What changes were proposed in this PR?

Pin behavior of three previously-uncovered modules in engine/common/storage that sit on the checkpoint / fault-tolerance hot path via SequentialRecordStorage.getStorage(...). No production-code changes.

Spec Source class Tests
EmptyRecordStorageSpec EmptyRecordStorage 11
VFSRecordStorageSpec VFSRecordStorage 9
SequentialRecordStorageSpec SequentialRecordStorage (abstract + companion) 9

All three spec files follow the <srcClassName>Spec.scala one-to-one convention.

Behavior pinned

Surface Contract
SequentialRecordStorage.getStorage(None) dispatches to EmptyRecordStorage
SequentialRecordStorage.getStorage(Some(file://…)) dispatches to VFSRecordStorage and the returned instance round-trips a record
SequentialRecordWriter / SequentialRecordReader round-trip a sequence of records through the size-prefixed binary frame; the reader's inputStreamGen thunk supports re-reading the same byte stream
SequentialRecordStorage.fetchAllRecords yields the underlying iterator's records in order (and Iterable.empty when nothing was written)
VFSRecordStorage constructor auto-creates the target folder; leaves an existing folder + contents intact
VFSRecordStorage.getWriter / getReader round-trip records through a local file:// URI; produce empty iterator when the file has no records; multiple files under the same folder do not cross-pollinate
VFSRecordStorage.deleteStorage removes the on-disk folder created by the constructor
VFSRecordStorage.containsFolder distinguishes existing folder vs. existing file vs. missing entry
EmptyRecordStorage.containsFolder always returns false regardless of folder name
EmptyRecordStorage.deleteStorage is a safe no-op (idempotent)
EmptyRecordStorage.getReader yields zero records for any fileName; successive getReader calls produce independent iterators
EmptyRecordStorage.getWriter returns a writer whose flush() / close() work without writeRecord having been called; a second writer is unaffected by closing the first

Notes

  • The hdfs:// dispatch branch of getStorage is deliberately left out — HDFSRecordStorage's constructor calls FileSystem.get, which can block on DNS / network and is unit-test-hostile. The branch is a single line and any regression there would surface immediately in higher-level checkpoint / fault-tolerance suites that exercise hdfs:// URIs.
  • The serde-touching paths (SequentialRecordWriter.writeRecord / SequentialRecordReader's iterator) hard-code AmberRuntime.serde. The two specs that exercise this path (VFSRecordStorageSpec, SequentialRecordStorageSpec) own a suite-local ActorSystem and inject it into AmberRuntime via reflection, tearing it down in afterAll — same pattern as CheckpointSubsystemSpec / ClientEventSpec. EmptyRecordStorageSpec deliberately avoids writeRecord so it does not need the harness.

Any related issues, documentation, discussions?

Closes #5446.

How was this PR tested?

Pure unit-test additions; verified locally with:

  • sbt "WorkflowExecutionService/testOnly org.apache.texera.amber.engine.common.storage.EmptyRecordStorageSpec org.apache.texera.amber.engine.common.storage.SequentialRecordStorageSpec org.apache.texera.amber.engine.common.storage.VFSRecordStorageSpec" — 29 tests, all green
  • sbt scalafmtCheckAll — clean
  • CI to confirm

Was this PR authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Sonnet 4.5)

Pins behavior of `EmptyRecordStorage`, `VFSRecordStorage`, and
`SequentialRecordStorage` (abstract + companion) which previously
had no characterization tests despite being on the checkpoint /
fault-tolerance hot path via `getStorage(...)`.

Closes apache#5446
Copilot AI review requested due to automatic review settings June 7, 2026 04:27
@github-actions github-actions Bot added the engine label Jun 7, 2026

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds characterization/unit tests for Amber’s sequential record-storage implementations on the checkpoint/fault-tolerance hot path, increasing confidence in factory dispatch (getStorage) and record framing/round-trip behavior without changing production code.

Changes:

  • Add VFSRecordStorageSpec covering folder lifecycle, read/write round-trips, containsFolder, and deletion semantics for file:// storage.
  • Add SequentialRecordStorageSpec covering size-prefixed framing, iterator re-read behavior, fetchAllRecords, and getStorage dispatch for None and file://.
  • Add EmptyRecordStorageSpec covering null-object semantics for reader/writer lifecycle, idempotent delete, and containsFolder.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
amber/src/test/scala/org/apache/texera/amber/engine/common/storage/VFSRecordStorageSpec.scala New unit spec for VFSRecordStorage behavior (folder creation, IO round-trips, delete/containsFolder).
amber/src/test/scala/org/apache/texera/amber/engine/common/storage/SequentialRecordStorageSpec.scala New unit spec for framing + factory dispatch + fetchAllRecords composition.
amber/src/test/scala/org/apache/texera/amber/engine/common/storage/EmptyRecordStorageSpec.scala New unit spec for EmptyRecordStorage null-object contracts.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@codecov-commenter

codecov-commenter commented Jun 7, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 51.91%. Comparing base (f2f0564) to head (2dee6e3).
⚠️ Report is 2 commits behind head on main.
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #5447      +/-   ##
============================================
+ Coverage     51.85%   51.91%   +0.05%     
- Complexity     2468     2474       +6     
============================================
  Files          1067     1067              
  Lines         41258    41258              
  Branches       4437     4437              
============================================
+ Hits          21394    21418      +24     
+ Misses        18607    18576      -31     
- Partials       1257     1264       +7     
Flag Coverage Δ *Carryforward flag
access-control-service 42.22% <ø> (ø) Carriedforward from 30737ef
agent-service 33.76% <ø> (ø) Carriedforward from 30737ef
amber 52.91% <ø> (+0.15%) ⬆️
computing-unit-managing-service 1.65% <ø> (ø) Carriedforward from 30737ef
config-service 56.06% <ø> (ø) Carriedforward from 30737ef
file-service 38.32% <ø> (ø) Carriedforward from 30737ef
frontend 46.40% <ø> (ø) Carriedforward from 30737ef
pyamber 90.69% <ø> (ø) Carriedforward from 30737ef
python 90.83% <ø> (ø) Carriedforward from 30737ef
workflow-compiling-service 58.69% <ø> (ø) Carriedforward from 30737ef

*This pull request uses carry forward flags. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Address Copilot review feedback on apache#5447 — `Files.walk` returns a
closeable Stream backed by an open directory handle. Without the
explicit close, the handle stays open until GC, which can flake
temp-dir deletion on Windows. Wrap traversal in try/finally so the
stream is released even if iteration throws.
@aglinxinyuan aglinxinyuan requested a review from Yicong-Huang June 7, 2026 07:11
@aglinxinyuan aglinxinyuan added this pull request to the merge queue Jun 7, 2026
Merged via the queue into apache:main with commit 75b4619 Jun 7, 2026
18 checks passed
@aglinxinyuan aglinxinyuan deleted the test-record-storage-cluster branch June 7, 2026 23:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

add unit test coverage for record-storage cluster (Empty / VFS / Sequential)

4 participants