test(amber): add unit test coverage for record-storage cluster #5447
Pins the behavior of `EmptyRecordStorage`, `VFSRecordStorage`, and `SequentialRecordStorage` (abstract + companion), which previously had no characterization tests despite sitting on the checkpoint / fault-tolerance hot path via `getStorage(...)`. Closes apache#5446.
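To make the dispatch under test concrete, here is a minimal Scala sketch of a `getStorage`-style factory. It assumes the argument is an `Option[URI]` and uses hypothetical stand-in types (`StorageSketch`, `EmptyStorageSketch`, `VfsStorageSketch`, `HdfsStorageSketch`) rather than the real Amber classes. Rejecting unknown schemes is also an assumption, since the PR does not describe that branch.

```scala
import java.net.URI

// Hypothetical stand-ins for the real storage classes; only the dispatch shape is illustrated.
sealed trait StorageSketch
case object EmptyStorageSketch extends StorageSketch
final case class VfsStorageSketch(uri: URI) extends StorageSketch
final case class HdfsStorageSketch(uri: URI) extends StorageSketch

object StorageFactorySketch {
  // No location -> null-object storage; otherwise dispatch on the URI scheme.
  def getStorage(storageLocation: Option[URI]): StorageSketch =
    storageLocation match {
      case None => EmptyStorageSketch
      case Some(uri) =>
        uri.getScheme match {
          case "file" => VfsStorageSketch(uri)
          case "hdfs" => HdfsStorageSketch(uri)
          // Assumption: unsupported (or missing) schemes are rejected.
          case other =>
            throw new IllegalArgumentException(s"Unsupported storage scheme: $other")
        }
    }
}
```

Only the `None` and `file://` branches correspond to what these specs pin; the `hdfs` case is shown for shape only, consistent with the PR leaving that branch untested.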
Pull request overview
Adds characterization/unit tests for Amber’s sequential record-storage implementations on the checkpoint/fault-tolerance hot path, increasing confidence in factory dispatch (getStorage) and record framing/round-trip behavior without changing production code.
Changes:
- Add `VFSRecordStorageSpec` covering folder lifecycle, read/write round-trips, `containsFolder`, and deletion semantics for `file://` storage.
- Add `SequentialRecordStorageSpec` covering size-prefixed framing, iterator re-read behavior, `fetchAllRecords`, and `getStorage` dispatch for `None` and `file://`.
- Add `EmptyRecordStorageSpec` covering null-object semantics for reader/writer lifecycle, idempotent delete, and `containsFolder`.
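The "size-prefixed framing" and "iterator re-read behavior" items can be illustrated with a simplified Scala sketch. It assumes a 4-byte `Int` length prefix and raw `Array[Byte]` payloads; the real `SequentialRecordWriter` / `SequentialRecordReader` serialize through `AmberRuntime.serde` and expose an iterator, which this eager, hypothetical `FramingSketch` does not attempt to reproduce.

```scala
import java.io.{ByteArrayOutputStream, DataInputStream, DataOutputStream, EOFException, InputStream}
import scala.collection.mutable.ArrayBuffer

object FramingSketch {
  // Each record is written as a 4-byte big-endian length followed by its payload bytes.
  def writeRecords(records: Seq[Array[Byte]]): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out = new DataOutputStream(bytes)
    try {
      records.foreach { record =>
        out.writeInt(record.length)
        out.write(record)
      }
    } finally out.close()
    bytes.toByteArray
  }

  // `inputStreamGen` is a thunk: every call opens a fresh stream over the same bytes.
  def readRecords(inputStreamGen: () => InputStream): Vector[Array[Byte]] = {
    val in = new DataInputStream(inputStreamGen())
    val records = ArrayBuffer.empty[Array[Byte]]
    try {
      var atEnd = false
      while (!atEnd) {
        // Simplification: EOF while reading a length prefix (or a negative length) ends the stream.
        val length = try in.readInt() catch { case _: EOFException => -1 }
        if (length < 0) atEnd = true
        else {
          val payload = new Array[Byte](length)
          in.readFully(payload) // throws EOFException on a truncated payload
          records += payload
        }
      }
    } finally in.close()
    records.toVector
  }
}
```

Passing a generator rather than an already-open stream is what makes re-reading possible: a consumed `InputStream` cannot in general be rewound, but calling the thunk again yields a fresh one.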
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| amber/src/test/scala/org/apache/texera/amber/engine/common/storage/VFSRecordStorageSpec.scala | New unit spec for VFSRecordStorage behavior (folder creation, IO round-trips, delete/containsFolder). |
| amber/src/test/scala/org/apache/texera/amber/engine/common/storage/SequentialRecordStorageSpec.scala | New unit spec for framing + factory dispatch + fetchAllRecords composition. |
| amber/src/test/scala/org/apache/texera/amber/engine/common/storage/EmptyRecordStorageSpec.scala | New unit spec for EmptyRecordStorage null-object contracts. |
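A null-object storage of the kind `EmptyRecordStorageSpec` characterizes can be sketched in Scala as follows. The traits and signatures (`RecordStorageSketch`, `RecordWriterSketch`, `NullRecordStorageSketch`, the `fileName` / `folderName` parameters) are hypothetical and only mirror the contracts listed in this PR: writers accept `flush()` / `close()` without any prior write, readers are always empty, `containsFolder` is always `false`, and deletion is an idempotent no-op.

```scala
// Hypothetical interfaces; names mirror the PR's vocabulary, not the actual Amber API.
trait RecordWriterSketch[T] {
  def writeRecord(record: T): Unit
  def flush(): Unit
  def close(): Unit
}

trait RecordStorageSketch[T] {
  def getWriter(fileName: String): RecordWriterSketch[T]
  def getReader(fileName: String): Iterator[T]
  def containsFolder(folderName: String): Boolean
  def deleteStorage(): Unit
}

// Null object: every operation succeeds and nothing is ever stored.
final class NullRecordStorageSketch[T] extends RecordStorageSketch[T] {
  // Each call returns a new stateless writer, so closing one cannot affect another.
  def getWriter(fileName: String): RecordWriterSketch[T] = new RecordWriterSketch[T] {
    def writeRecord(record: T): Unit = () // record is discarded
    def flush(): Unit = ()
    def close(): Unit = ()
  }
  // Empty iterators carry no state, so separate getReader calls are independent.
  def getReader(fileName: String): Iterator[T] = Iterator.empty
  def containsFolder(folderName: String): Boolean = false
  def deleteStorage(): Unit = () // nothing to delete, so repeated calls are harmless
}
```

The usual motivation for a null object is that callers can hold a storage reference unconditionally instead of branching on whether storage is configured (presumably the `getStorage(None)` case).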
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## main #5447 +/- ##
============================================
+ Coverage 51.85% 51.91% +0.05%
- Complexity 2468 2474 +6
============================================
Files 1067 1067
Lines 41258 41258
Branches 4437 4437
============================================
+ Hits 21394 21418 +24
+ Misses 18607 18576 -31
- Partials 1257 1264 +7
*This pull request uses carry forward flags.
Address Copilot review feedback on apache#5447: `Files.walk` returns a closeable `Stream` backed by an open directory handle. Without an explicit close, the handle stays open until GC, which can make temp-dir deletion flaky on Windows. Wrap the traversal in try/finally so the stream is released even if iteration throws.
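A minimal Scala sketch of that fix, assuming the spec's temp-dir cleanup deletes recursively via `Files.walk` (the helper `TempDirCleanup.deleteRecursively` is hypothetical, not the spec's actual code). The stream is closed in `finally`, so its directory handles are released even if a delete throws.

```scala
import java.nio.file.{Files, Path}
import java.util.Comparator

object TempDirCleanup {
  // Recursively delete `root`; a missing root is a no-op.
  def deleteRecursively(root: Path): Unit =
    if (Files.exists(root)) {
      val stream = Files.walk(root)
      try {
        // Reverse lexical order visits children before their parent directories.
        stream
          .sorted(Comparator.reverseOrder[Path]())
          .forEach((p: Path) => Files.delete(p))
      } finally {
        stream.close() // release the directory handle(s) held by the walk
      }
    }
}
```

Reverse ordering works because a child path always sorts after its parent, so every directory is already empty by the time it is deleted.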
What changes were proposed in this PR?
Pin the behavior of three previously uncovered modules in `engine/common/storage` that sit on the checkpoint / fault-tolerance hot path via `SequentialRecordStorage.getStorage(...)`. No production-code changes.

| Spec | Class under test |
|---|---|
| `EmptyRecordStorageSpec` | `EmptyRecordStorage` |
| `VFSRecordStorageSpec` | `VFSRecordStorage` |
| `SequentialRecordStorageSpec` | `SequentialRecordStorage` (abstract + companion) |
All three spec files follow the one-to-one `<srcClassName>Spec.scala` naming convention.

Behavior pinned
| Surface | Pinned behavior |
|---|---|
| `SequentialRecordStorage.getStorage(None)` | Returns `EmptyRecordStorage` |
| `SequentialRecordStorage.getStorage(Some(file://…))` | Returns `VFSRecordStorage`, and the returned instance round-trips a record |
| `SequentialRecordWriter` / `SequentialRecordReader` | Size-prefixed framing round-trips records; the `inputStreamGen` thunk supports re-reading the same byte stream |
| `SequentialRecordStorage.fetchAllRecords` | Returns `Iterable.empty` when nothing was written |
| `VFSRecordStorage` constructor | Folder creation for `file://` storage |
| `VFSRecordStorage.getWriter` / `getReader` | Round-trip records through a `file://` URI; produce an empty iterator when the file has no records; multiple files under the same folder do not cross-pollinate |
| `VFSRecordStorage.deleteStorage` | Deletion semantics for `file://` storage |
| `VFSRecordStorage.containsFolder` | Folder-presence reporting |
| `EmptyRecordStorage.containsFolder` | Returns `false` regardless of folder name |
| `EmptyRecordStorage.deleteStorage` | Idempotent delete |
| `EmptyRecordStorage.getReader` | Repeated `getReader` calls produce independent iterators |
| `EmptyRecordStorage.getWriter` | `flush()` / `close()` work without `writeRecord` having been called; a second writer is unaffected by closing the first |

Notes
- The `hdfs://` dispatch branch of `getStorage` is deliberately left out: `HDFSRecordStorage`'s constructor calls `FileSystem.get`, which can block on DNS / network and is unit-test-hostile. The branch is a single line, and any regression there would surface immediately in the higher-level checkpoint / fault-tolerance suites that exercise `hdfs://` URIs.
- The record I/O paths (`SequentialRecordWriter.writeRecord` / `SequentialRecordReader`'s iterator) hard-code `AmberRuntime.serde`. The two specs that exercise these paths (`VFSRecordStorageSpec`, `SequentialRecordStorageSpec`) own a suite-local `ActorSystem`, inject it into `AmberRuntime` via reflection, and tear it down in `afterAll`, following the same pattern as `CheckpointSubsystemSpec` / `ClientEventSpec`. `EmptyRecordStorageSpec` deliberately avoids `writeRecord`, so it does not need the harness.

Any related issues, documentation, discussions?
Closes #5446.
How was this PR tested?
Pure unit-test additions; verified locally with:
- `sbt "WorkflowExecutionService/testOnly org.apache.texera.amber.engine.common.storage.EmptyRecordStorageSpec org.apache.texera.amber.engine.common.storage.SequentialRecordStorageSpec org.apache.texera.amber.engine.common.storage.VFSRecordStorageSpec"`: 29 tests, all green
- `sbt scalafmtCheckAll`: clean

Was this PR authored or co-authored using generative AI tooling?
Generated-by: Claude Code (Sonnet 4.5)