Background
Three classes in engine/common/storage currently lack a dedicated unit spec:
| Source class | Package | Purpose |
| --- | --- | --- |
| SequentialRecordStorage | org.apache.texera.amber.engine.common.storage | Abstract sequential-record reader/writer + getStorage factory |
| VFSRecordStorage | (same) | Apache Commons VFS concrete implementation |
| EmptyRecordStorage | (same) | Null-object implementation (no-op writer / EOF reader / always-false containsFolder) |
All three are reachable from production code (SequentialRecordStorage.getStorage is the factory used by checkpoint logging) but none have characterization tests. A regression in any of these would only surface as a downstream serde / replay failure.
What we want pinned
Behavior we want to lock in:
| Area | Contract |
| --- | --- |
| SequentialRecordStorage.getStorage(None) | returns an EmptyRecordStorage |
| SequentialRecordStorage.getStorage(Some(file://…)) | returns a VFSRecordStorage |
| SequentialRecordStorage.getStorage(Some(hdfs://…)) | dispatches to HDFSRecordStorage (path covered without actually opening an HDFS connection by asserting the constructor blows up on a non-resolvable host rather than silently returning VFSRecordStorage) |
| SequentialRecordWriter / SequentialRecordReader | round-trip a sequence of records through AmberRuntime.serde (size-prefixed framing) |
| SequentialRecordStorage.fetchAllRecords | iterates all records returned by the underlying reader |
| VFSRecordStorage constructor | auto-creates the target folder when it does not exist |
| VFSRecordStorage.getWriter / getReader | round-trip a record through a local file:// URI |
| VFSRecordStorage.deleteStorage | removes the on-disk folder created by the constructor |
| VFSRecordStorage.containsFolder | distinguishes existing folder vs. existing file vs. missing entry |
| EmptyRecordStorage.getWriter | returns a writer backed by NullOutputStream (writes are silently discarded) |
| EmptyRecordStorage.getReader | returns a reader that yields zero records |
| EmptyRecordStorage.deleteStorage / containsFolder | are no-op and always-false respectively |
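To make the "size-prefixed framing" contract concrete, here is a minimal sketch of the round-trip property the writer/reader tests would pin. It is an illustration only: it uses plain JDK streams and UTF-8 strings in place of AmberRuntime.serde, and the names FramingSketch, writeAll, and readAll are hypothetical, not part of the production code. The 4-byte length prefix is likewise an assumption about the frame layout.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream}
import java.nio.charset.StandardCharsets.UTF_8
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for the writer/reader pair; not the production API.
object FramingSketch {

  // Write each record as a 4-byte big-endian length followed by the payload bytes.
  def writeAll(records: Seq[String]): Array[Byte] = {
    val buffer = new ByteArrayOutputStream()
    val out = new DataOutputStream(buffer)
    records.foreach { record =>
      val payload = record.getBytes(UTF_8)
      out.writeInt(payload.length)
      out.write(payload)
    }
    out.flush()
    buffer.toByteArray
  }

  // Read frames until no bytes remain, returning the records in write order.
  def readAll(bytes: Array[Byte]): Seq[String] = {
    val in = new DataInputStream(new ByteArrayInputStream(bytes))
    val records = ArrayBuffer.empty[String]
    while (in.available() > 0) {
      val payload = new Array[Byte](in.readInt())
      in.readFully(payload)
      records += new String(payload, UTF_8)
    }
    records.toSeq
  }
}
```

A spec following this shape would assert that reading back what was written yields the same sequence in the same order, including the edge cases of an empty record and an empty stream.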
Scope
- New spec files (one per source class, per the spec-filename convention):
  - SequentialRecordStorageSpec.scala
  - VFSRecordStorageSpec.scala
  - EmptyRecordStorageSpec.scala
- No production-code changes.
- Tests use the production wire path (AmberRuntime.serde) the same way CheckpointSubsystemSpec / ClientEventSpec do (a suite-local ActorSystem injected into AmberRuntime via reflection, torn down in afterAll).
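For VFSRecordStorageSpec.scala, the folder / file / missing-entry cases need a throwaway directory that is cleaned up after the test. The sketch below shows one possible shape for that fixture using java.nio.file only. It does not call VFSRecordStorage or Apache Commons VFS; FolderProbeSketch, its containsFolder helper, and the entry names are hypothetical stand-ins for the behavior under test.

```scala
import java.nio.file.{Files, Path}

// Hypothetical fixture sketch; the real spec would exercise VFSRecordStorage instead.
object FolderProbeSketch {

  // Stand-in for the contract: true only when the entry exists and is a directory.
  def containsFolder(root: Path, name: String): Boolean =
    Files.isDirectory(root.resolve(name))

  // Checks an existing folder, an existing file, and a missing entry,
  // then removes everything that was created.
  def probe(): (Boolean, Boolean, Boolean) = {
    val root = Files.createTempDirectory("record-storage-sketch")
    try {
      Files.createDirectory(root.resolve("folder"))
      Files.createFile(root.resolve("file"))
      (
        containsFolder(root, "folder"),
        containsFolder(root, "file"),
        containsFolder(root, "missing")
      )
    } finally {
      Files.deleteIfExists(root.resolve("folder"))
      Files.deleteIfExists(root.resolve("file"))
      Files.deleteIfExists(root)
    }
  }
}
```

The three results map one-to-one onto the containsFolder row of the contract table: only the existing folder should report true.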