Conversation
…sting dash-separated ones Agent-Logs-Url: https://github.com/albe/node-event-storage/sessions/460d1a66-74b8-4837-b0bc-43906c29515f Co-authored-by: albe <4259532+albe@users.noreply.github.com>
…File-prefixed directory Agent-Logs-Url: https://github.com/albe/node-event-storage/sessions/460d1a66-74b8-4837-b0bc-43906c29515f Co-authored-by: albe <4259532+albe@users.noreply.github.com>
…; drop unnecessary sorts Agent-Logs-Url: https://github.com/albe/node-event-storage/sessions/c384a8a8-015c-4fbb-bc3b-a44c6d770679 Co-authored-by: albe <4259532+albe@users.noreply.github.com>
…orFilesSync utility Agent-Logs-Url: https://github.com/albe/node-event-storage/sessions/1e87526b-9919-4296-91c7-907f60b9b37a Co-authored-by: albe <4259532+albe@users.noreply.github.com>
 * @param {RegExp} regexPattern The pattern to match relative file paths against.
 * @param {function(string)} onEach Called with the first capturing group (or full match) for each matching path.
 */
function scanForFilesSync(directory, regexPattern, onEach) {
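The diff context above shows only the doc comment and signature. As a rough illustration of what a synchronous recursive scan with this signature could look like, here is a minimal sketch. The name `scanForFilesSyncSketch`, the extra `root` parameter, and the forward-slash normalization are assumptions for illustration and not the code from this PR; it also assumes `regexPattern` has no global flag.

```javascript
const fs = require('fs');
const path = require('path');

/**
 * Hypothetical sketch (not the PR's implementation): recursively scan
 * `directory` and call `onEach` for every file whose path relative to the
 * scan root matches `regexPattern`.
 */
function scanForFilesSyncSketch(directory, regexPattern, onEach, root = directory) {
    for (const entry of fs.readdirSync(directory, { withFileTypes: true })) {
        const fullPath = path.join(directory, entry.name);
        if (entry.isDirectory()) {
            // Descend, but keep matching against the path relative to the root.
            scanForFilesSyncSketch(fullPath, regexPattern, onEach, root);
            continue;
        }
        // Assumption: normalize separators so one pattern works on every platform.
        const relativePath = path.relative(root, fullPath).split(path.sep).join('/');
        const match = regexPattern.exec(relativePath);
        if (match) {
            // First capturing group if present, otherwise the full match.
            onEach(match[1] !== undefined ? match[1] : match[0]);
        }
    }
}
```

With a pattern such as `/^eventstore\.(stream-.*)\.index$/`, a file at `eventstore.stream-x/foo.index` below the scan root would be reported as `stream-x/foo`.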
@copilot I see you implemented scanForFilesSync so as not to change storage initialization, which so far has been a sync operation. I would rather unify things than duplicate code.
There are two ways I see: making Storage initialization async with a 'ready' event like EventStore does, or making Storage.open() async and letting it do the partition scanning lazily. However, I do not fully see how either of those would affect the rest of the code base. My gut feeling is that having Storage behave more like EventStore is preferable, though.
Investigated both approaches. Here's the concrete impact:
Option A – Async open() (move partition scan into open() with a callback)
- `ReadableStorage.open()`, `WritableStorage.open()`, and `ReadOnlyStorage.open()` all become callback-based.
- `EventStore.initialize()` needs to wait for the storage `'ready'` event (or an `open` callback) before accessing partitions in `checkUnfinishedCommits()`.
- `Storage.spec.js` has ~144 `.open()` calls; most of those tests would need to become async (callback or Promise) to remain correct.
- All direct Storage callers (Consumer.spec.js, stress tests) would also need updating.
Option B – Emit 'ready' from the constructor (like EventStore)
- Similar scope: every caller that currently treats Storage as "immediately usable after construction + open()" would need to listen for `'ready'` first.
- Adds a required event-listener pattern to every Storage user, which is a larger API break.
Both options are meaningful architectural improvements but require changes across ~5 files and ~150 call sites. My recommendation is to keep the current scanForFilesSync approach in this PR (it's an isolated, self-contained utility), and pursue the async Storage initialization as a separate, focused PR where the full test surface can be properly updated. Happy to implement either option in a follow-up if you'd prefer.
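To make the caller-side impact of Option B concrete, here is a minimal sketch of a storage-like object that emits `'ready'` after an asynchronous partition scan. `ReadyEmittingStorage`, `fakeScan`, and the `isReady` flag are hypothetical names used only for illustration; they are not the library's classes or the eventual implementation.

```javascript
const { EventEmitter } = require('events');

// Hypothetical stand-in for Storage; not the library's actual class.
class ReadyEmittingStorage extends EventEmitter {
    constructor(scanPartitions) {
        super();
        this.partitions = [];
        this.isReady = false;
        // Defer the scan so callers can attach listeners right after construction.
        setImmediate(() => {
            scanPartitions((error, partitions) => {
                if (error) {
                    this.emit('error', error);
                    return;
                }
                this.partitions = partitions;
                this.isReady = true;
                this.emit('ready');
            });
        });
    }
}

// Hypothetical asynchronous scan that reports two partitions.
function fakeScan(callback) {
    setImmediate(() => callback(null, ['x-foo', 'x/foo']));
}

const storage = new ReadyEmittingStorage(fakeScan);
// Callers can no longer use the storage immediately; they wait for 'ready'.
storage.once('ready', () => {
    console.log('partitions available:', storage.partitions.length);
});
```

The sketch shows why the change touches every call site: any code that reads `partitions` directly after construction would see an empty list until `'ready'` has fired.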
@copilot create a follow-up PR based on this branch and implement option B there. Given that Storage is a second-order API, a break might still be okay in a minor version, but that's t.b.d. I need to see the implementation and its impact first.
@copilot resolve the merge conflicts in this pull request
Agent-Logs-Url: https://github.com/albe/node-event-storage/sessions/c52381ea-b3e2-442c-8048-04072f3e7797 Co-authored-by: albe <4259532+albe@users.noreply.github.com>
Done in 33152df. Merged the three changes from main (PR #300) into this branch:
All 380 tests pass.
@copilot resolve the merge conflicts in this pull request
…from main (PR #302) Co-authored-by: albe <4259532+albe@users.noreply.github.com>
Done in 22b4a6c. Merged the large refactor from main (PR #302 — IndexMatcher/PartitionPool):
All 392 tests pass.
@copilot update the documentation to explain the new category system and provide examples of where this is useful. The typical case is managing entity streams where an arbitrarily large number of entity instances can exist, e.g. users on large platforms. In that case the stream category should contain the type prefix and a hash-based subfolder scheme with two or more levels.
…sed sharding patterns Agent-Logs-Url: https://github.com/albe/node-event-storage/sessions/fb471a1d-be20-4e54-8f1c-f25f9ef2e2b2 Co-authored-by: albe <4259532+albe@users.noreply.github.com>
Done in 1c72b19. Updated
…y prefixes Agent-Logs-Url: https://github.com/albe/node-event-storage/sessions/36a733ca-68eb-4c19-b02e-82ef3086c986 Co-authored-by: albe <4259532+albe@users.noreply.github.com>
Agent-Logs-Url: https://github.com/albe/node-event-storage/sessions/09c0bba1-6316-48b8-a0e6-f7deb6387399 Co-authored-by: albe <4259532+albe@users.noreply.github.com>
Summary
Adds a second stream-naming convention (`category/id`) that maps to a subdirectory layout on disk, while keeping all existing dash-separated flat-file streams (`category-id`) completely unchanged. `getEventStreamForCategory("x")` now unions both layouts.
Layout convention

| Stream name | Storage file | Index file |
| --- | --- | --- |
| `x-foo` | `$data/eventstore.x-foo` (unchanged) | `$streams/eventstore.stream-x-foo.index` (unchanged) |
| `x/foo` | `$data/eventstore.x/foo` (new) | `$streams/eventstore.stream-x/foo.index` (new) |
| `x/foo/bar` | `$data/eventstore.x/foo/bar` (new) | `$streams/eventstore.stream-x/foo/bar.index` (new) |

No migration required; existing flat files are untouched.
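The layout rule can be expressed as a small path-mapping function. The following is a sketch only: `layoutFor` and its parameters are hypothetical names, `dataDirectory` and `indexDirectory` stand in for `$data` and `$streams`, and the default `eventstore` prefix is taken from the example paths rather than from the library's configuration.

```javascript
const path = require('path');

// Hypothetical helper: maps a stream name to the file paths of the layout above.
function layoutFor(streamName, dataDirectory, indexDirectory, storageFile = 'eventstore') {
    const dataFile = path.join(dataDirectory, `${storageFile}.${streamName}`);
    const indexFile = path.join(indexDirectory, `${storageFile}.stream-${streamName}.index`);
    return {
        dataFile,
        indexFile,
        // Slash-named streams live in subdirectories that must exist before writing.
        needsSubdirectory: streamName.includes('/'),
        dataParent: path.dirname(dataFile),
        indexParent: path.dirname(indexFile),
    };
}

// A dash-named stream stays flat; a slash-named stream gains parent directories.
console.log(layoutFor('x-foo', 'data', 'streams'));
console.log(layoutFor('x/foo/bar', 'data', 'streams'));
```

For slash-named streams, `dataParent` and `indexParent` are the directories that would have to be created first, which is the role the description assigns to `ensureDirectory`.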
Behavior
- `getEventStreamForCategory("x")` returns events from all streams whose name starts with `x-` or `x/` (consistent with the existing dash semantics, which also return all descendants, not just direct children).
- `getEventStreamForCategory("x/foo")` narrows to streams starting with `x/foo/`, enabling hierarchical category queries.
Changes
src/util.js
- … `path` import.
- … `scanForFiles` recursive: it now descends into subdirectories and matches the relative path from the root against the regex pattern, so `eventstore.stream-x/foo.index` yields `stream-x/foo`. Error handling short-circuits immediately on first error.
- `scanForFilesSync`: a synchronous counterpart to `scanForFiles` using `readdirSync`, for use in synchronous initialization contexts.
src/EventStore.js
- … `ensureDirectory`.
- `createEventStream`: calls `ensureDirectory` on the index subdirectory for slash-named streams before calling `ensureIndex`.
- `getEventStreamForCategory`: filter extended to include streams starting with `categoryName + '/'` in addition to `categoryName + '-'`.
src/Storage/WritableStorage.js
- `getPartition`: calls `ensureDirectory` on the partition parent directory when the partition name contains a `/`.
src/Storage/ReadableStorage.js
- `scanPartitions`: replaced the inline recursive directory-scan closure with a call to the generic `scanForFilesSync` utility, using a regex pattern to match files by the storageFile prefix and filter out `.index`, `.branch`, and `.lock` files.
test/EventStore.spec.js
- … `fs.readdir` mock to handle both 2-arg and 3-arg call forms.
- … `getEventStreamForCategory("x/foo")`
- … (`a/b/c`, `a/b/d`) including async persistence
docs/streams.md
- … `category-id` layout can degrade at scale (large number of files in a single directory).
- … `category/id` layout with an on-disk directory tree example.
- … `getEventStreamForCategory` transparently unions both layouts.
- … (`type/XX/YY/id`) for very large entity populations (e.g. users on large platforms), showing a two-level hex-prefix scheme that distributes millions of streams across up to 65 536 balanced leaf directories.
- … (`getEventStreamForCategory("user/a3")`), all shards (`getEventStreamForCategory("user")`), reading a specific entity stream, and a UUID v4 variant using leading characters as the shard prefix.
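The sharding scheme and the category queries over it can be sketched as follows. This is an illustration under stated assumptions, not the content of docs/streams.md: the helper names are hypothetical, SHA-256 is an arbitrary choice of hash (the description does not name one), and `streamsInCategory` merely mirrors the prefix filter described for `getEventStreamForCategory` instead of calling the library.

```javascript
const crypto = require('crypto');

// Hypothetical helper: builds type/XX/YY/id, where XX and YY are the first four
// hex digits of a hash of the id. SHA-256 is an assumption; any uniform hash works.
function shardedStreamName(type, id) {
    const hash = crypto.createHash('sha256').update(String(id)).digest('hex');
    return `${type}/${hash.slice(0, 2)}/${hash.slice(2, 4)}/${id}`;
}

// UUID v4 variant: the leading hex characters are already random, so they can
// serve as the shard prefix without hashing.
function uuidShardedStreamName(type, uuid) {
    const hex = uuid.replace(/-/g, '').toLowerCase();
    return `${type}/${hex.slice(0, 2)}/${hex.slice(2, 4)}/${uuid}`;
}

// Mirrors the described category filter: a stream belongs to a category if its
// name starts with `category-` or `category/`.
function streamsInCategory(streamNames, categoryName) {
    return streamNames.filter((name) =>
        name.startsWith(categoryName + '-') || name.startsWith(categoryName + '/'));
}

// Illustrative stream names only; real shard prefixes depend on the hash.
const names = ['user/a3/f1/alice', 'user/a3/0c/bob', 'user/7e/11/carol', 'user-legacy'];
console.log(streamsInCategory(names, 'user/a3')); // one first-level shard
console.log(streamsInCategory(names, 'user'));    // all shards plus flat streams
console.log(shardedStreamName('user', 42));
```

Two hex levels give 256 × 256 = 65 536 possible leaf directories, which is where the figure in the description comes from. Because the shard prefix is derived from the id, the stream name for a specific entity can always be recomputed when reading it back.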