Implement iterating partitions in insertion order without global index#254
…ndex is specified Co-authored-by: albe <4259532+albe@users.noreply.github.com> Agent-Logs-Url: https://github.com/albe/node-event-storage/sessions/992c92a7-025b-417b-9626-2ce2390beadd
@copilot reconsidering the choice of partition scanning as the default behavior: this is probably a bad choice. The default should be the better behavior, which is using the global index if it is (still) there. Change the API so that in order to NOT use the global index, the index argument needs to be explicitly set to false. This also keeps the change backward compatible.
Co-authored-by: albe <4259532+albe@users.noreply.github.com> Agent-Logs-Url: https://github.com/albe/node-event-storage/sessions/ab2b998e-7f96-4809-825a-70ea29c0246c
Done in ff744f8. Default behavior ( |
@copilot Let's condense the ReadablePartition API surface such that we don't introduce a new method for reading with headers. We already do read the document header anyway in |
…emove readAllWithHeaders Co-authored-by: albe <4259532+albe@users.noreply.github.com> Agent-Logs-Url: https://github.com/albe/node-event-storage/sessions/09d5945b-45eb-4ae2-8637-fc9802c46b61
Done in fcd5390. Removed |
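The header exposure discussed in this exchange relies on an out-parameter: the caller passes an object that the read call fills with header fields, and callers that pass nothing pay no extra cost. A minimal sketch of that pattern follows. `SketchPartition` is a hypothetical in-memory stand-in, not the library's ReadablePartition; it uses array indices instead of byte offsets and drops the `size` argument for brevity.

```javascript
// Hypothetical in-memory stand-in for a partition (assumption, not the real class).
class SketchPartition {
    constructor(documents) {
        // Each entry: { data: string, sequenceNumber: number, time64: number }
        this.documents = documents;
    }

    // Returns the document data; fills headerOut only when the caller passes one.
    readFrom(position, headerOut = null) {
        const entry = this.documents[position];
        if (!entry) {
            return false;
        }
        if (headerOut !== null) {
            headerOut.dataSize = Buffer.byteLength(entry.data, 'utf8');
            headerOut.sequenceNumber = entry.sequenceNumber;
            headerOut.time64 = entry.time64;
        }
        return entry.data;
    }

    // Iterates all documents, reusing the caller's headerOut for every read.
    *readAll(after = 0, headerOut = null) {
        for (let position = after; position < this.documents.length; position++) {
            const data = this.readFrom(position, headerOut);
            if (headerOut !== null) {
                headerOut.position = position;
            }
            yield data;
        }
    }
}

const partition = new SketchPartition([
    { data: 'a', sequenceNumber: 0, time64: 1 },
    { data: 'bb', sequenceNumber: 1, time64: 2 },
]);

// One header object is allocated once and overwritten on each iteration.
const headerOut = {};
const seen = [];
for (const data of partition.readAll(0, headerOut)) {
    seen.push(`${headerOut.sequenceNumber}:${data}`);
}
```

Because the same object is overwritten on every read, a caller that needs the header values later has to copy them out before advancing the iterator.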
@copilot looking good, one last thing: consider the case mentioned in #264 where the primary index lags behind the actual documents in the partitions. In that case we want to use the functionality in this PR to bring the primary index (and eventually secondary indexes) back in sync by iterating documents by the sequence number in their document headers, then adding the missing index entries. This means we need to receive not only the document data, but also the document sequence number, partition name and partition offset.
…BySequenceNumber Co-authored-by: albe <4259532+albe@users.noreply.github.com> Agent-Logs-Url: https://github.com/albe/node-event-storage/sessions/97833e10-26b5-48f7-96a3-aa626b11ec94
Done in d87beda. Two minimal changes:
Callers doing reindexing can drive it via:

    for (const { document, sequenceNumber, partitionName, position } of storage.iteratePartitionsBySequenceNumber(0, Infinity)) {
        // rebuild index entries from partitionName + position + sequenceNumber
    }
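To make the reindexing idea concrete, here is a hedged sketch of how the yielded entries could be used to append missing index entries. The function `appendMissingIndexEntries` and the plain-array index are hypothetical; the sketch assumes that the index slot `i` corresponds to the 0-based sequence number `i`, which may not match how the real index stores entries.

```javascript
// Hypothetical helper: appends index entries for documents the index has not seen yet.
// `entries` is any iterable yielding { sequenceNumber, partitionName, position }.
// `index` is a plain array standing in for the primary index (assumption).
function appendMissingIndexEntries(entries, index) {
    let added = 0;
    for (const { sequenceNumber, partitionName, position } of entries) {
        if (sequenceNumber < index.length) {
            continue; // already indexed
        }
        if (sequenceNumber !== index.length) {
            // A hole in the sequence means the partitions themselves are incomplete.
            throw new Error(`Gap before sequence number ${sequenceNumber}`);
        }
        index.push({ partitionName, position });
        added++;
    }
    return added;
}

// Stubbed data in place of storage.iteratePartitionsBySequenceNumber(...).
const laggingIndex = [
    { partitionName: 'a', position: 0 },
    { partitionName: 'b', position: 0 },
];
const scanned = [
    { document: 'd1', sequenceNumber: 1, partitionName: 'b', position: 0 },
    { document: 'd2', sequenceNumber: 2, partitionName: 'a', position: 16 },
    { document: 'd3', sequenceNumber: 3, partitionName: 'b', position: 16 },
];
const addedCount = appendMissingIndexEntries(scanned, laggingIndex);
```

In practice the scan could presumably start at the current index length rather than at 0, so that only the lagging tail of the partitions is read.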
src/Storage/ReadableStorage.js
Outdated
    }

    if (!result.done && headerOut.sequenceNumber <= untilSeq) {
        iterators.push({ gen, headerOut, data: result.value, sequenceNumber: headerOut.sequenceNumber, position: headerOut.position, partitionName: partition.name });
@copilot code readability nitpick: rename the iterators variable to partitions and gen to reader here.
Name variables for what they are rather than for how they work.
Done in eb379fe — iterators → partitions, gen → reader throughout iteratePartitionsBySequenceNumber.
…eNumber Co-authored-by: albe <4259532+albe@users.noreply.github.com> Agent-Logs-Url: https://github.com/albe/node-event-storage/sessions/0a7ac0a1-f311-40c2-8415-a1aae70291c3
readRange with index=false now iterates all partitions directly in sequenceNumber order, allowing cross-partition insertion-order iteration without a global index (useful when the global index is unavailable or corrupted). The default behavior, which uses the global index, is unchanged.

Changes

ReadablePartition: header exposure via readFrom and readAll

Rather than introducing a new method, the document header fields are now exposed through the existing partition read API:

- readFrom(position, size, headerOut): Accepts an optional headerOut object. When provided, it is mutated in place with { dataSize, sequenceNumber, time64 } from the document header, so the default (null) code path incurs zero extra allocation.
- readAll(after, headerOut): Passes headerOut through to readFrom on each iteration, and additionally populates headerOut.position with the document's byte offset within the partition. Uses headerOut.dataSize for position advancement when available (avoids a Buffer.byteLength call).

ReadableStorage: partition scan mode

- readRange: Default behavior (index = null) uses the global index as before. Passing index = false explicitly triggers partition scan mode.
- iterateRange: When index === false, delegates to iteratePartitionsBySequenceNumber and yields only the document. Otherwise uses the provided index or falls back to the global index.
- iteratePartitionsBySequenceNumber(fromSeq, untilSeq) (public API): K-way merge across all partitions. Allocates one reusable headerOut object per partition and passes it to readAll. Yields { document, sequenceNumber, partitionName, position } per entry, providing all information needed to rebuild index entries. Advances each partition iterator to the first document with sequenceNumber >= fromSeq, then repeatedly yields the minimum-sequenceNumber document. Sequence numbers in headers are 0-based; from/until in the public API are 1-based (fromSeq = from - 1).

Breaking change

None. The default readRange(from, until) behavior is unchanged (uses the global index). Passing false as the third argument is new opt-in functionality.
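The K-way merge described under Changes can be sketched as follows. This is an illustration under stated assumptions, not the code from the PR: `readWithHeader` and `mergeBySequenceNumber` are hypothetical names, partitions are plain in-memory arrays already sorted by sequence number, `position` is an array index rather than a byte offset, and the minimum is found by a linear scan rather than a heap.

```javascript
// Hypothetical reader: yields document data and reports header fields via headerOut.
function* readWithHeader(documents, headerOut) {
    for (let position = 0; position < documents.length; position++) {
        headerOut.sequenceNumber = documents[position].sequenceNumber;
        headerOut.position = position;
        yield documents[position].data;
    }
}

// Merges several partitions into one stream ordered by sequenceNumber (inclusive bounds).
function* mergeBySequenceNumber(sources, fromSeq = 0, untilSeq = Infinity) {
    const partitions = [];
    for (const source of sources) {
        const headerOut = {}; // one reusable header object per partition
        const reader = readWithHeader(source.documents, headerOut);
        let result = reader.next();
        // Skip ahead to the first document with sequenceNumber >= fromSeq.
        while (!result.done && headerOut.sequenceNumber < fromSeq) {
            result = reader.next();
        }
        if (!result.done && headerOut.sequenceNumber <= untilSeq) {
            partitions.push({
                reader,
                headerOut,
                data: result.value,
                sequenceNumber: headerOut.sequenceNumber,
                position: headerOut.position,
                partitionName: source.name,
            });
        }
    }

    while (partitions.length > 0) {
        // Pick the partition whose current document has the smallest sequence number.
        let min = 0;
        for (let i = 1; i < partitions.length; i++) {
            if (partitions[i].sequenceNumber < partitions[min].sequenceNumber) {
                min = i;
            }
        }
        const current = partitions[min];
        yield {
            document: current.data,
            sequenceNumber: current.sequenceNumber,
            partitionName: current.partitionName,
            position: current.position,
        };
        // Advance that partition, or drop it once exhausted or past untilSeq.
        const result = current.reader.next();
        if (!result.done && current.headerOut.sequenceNumber <= untilSeq) {
            current.data = result.value;
            current.sequenceNumber = current.headerOut.sequenceNumber;
            current.position = current.headerOut.position;
        } else {
            partitions.splice(min, 1);
        }
    }
}

const sources = [
    { name: 'a', documents: [
        { sequenceNumber: 0, data: 'a0' },
        { sequenceNumber: 2, data: 'a2' },
        { sequenceNumber: 3, data: 'a3' },
    ] },
    { name: 'b', documents: [
        { sequenceNumber: 1, data: 'b1' },
        { sequenceNumber: 4, data: 'b4' },
    ] },
];
const merged = [...mergeBySequenceNumber(sources, 1, 3)];
```

A linear scan for the minimum costs O(k) per yielded document for k partitions; with many partitions a binary heap would reduce that to O(log k), at the price of more code.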