Support hierarchical slash-separated stream names alongside existing dash-separated category streams #301

Merged
albe merged 9 commits into main from copilot/add-folder-partitioning-scheme on May 1, 2026

Contributor

Copilot AI commented Apr 30, 2026

Summary

Adds a second stream-naming convention (category/id) that maps to a subdirectory layout on disk, while keeping all existing dash-separated flat-file streams (category-id) completely unchanged. getEventStreamForCategory("x") now unions both layouts.

Layout convention

| Stream name | Data file | Index file |
|---|---|---|
| x-foo | $data/eventstore.x-foo (unchanged) | $streams/eventstore.stream-x-foo.index (unchanged) |
| x/foo | $data/eventstore.x/foo (new) | $streams/eventstore.stream-x/foo.index (new) |
| x/foo/bar | $data/eventstore.x/foo/bar (new) | $streams/eventstore.stream-x/foo/bar.index (new) |

No migration required — existing flat files are untouched.
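
As a rough illustration of the table above, the mapping from stream name to file locations can be sketched as follows. `streamFiles` is a hypothetical helper, not a function from this PR; `dataDir` and `streamsDir` stand in for `$data` and `$streams`, and the `eventstore` prefix is taken from the table.

```javascript
const path = require('path');

// Hypothetical helper (not part of the PR): derives the on-disk file locations
// for a stream name, following the layout table. A slash in the stream name
// simply becomes a directory separator in both paths.
function streamFiles(streamName, dataDir, streamsDir, storageFile = 'eventstore') {
    return {
        data: path.posix.join(dataDir, `${storageFile}.${streamName}`),
        index: path.posix.join(streamsDir, `${storageFile}.stream-${streamName}.index`)
    };
}

console.log(streamFiles('x-foo', 'data', 'data/streams'));
console.log(streamFiles('x/foo/bar', 'data', 'data/streams'));
```

Flat names keep producing a single file next to the existing ones, which is why no migration is needed under this scheme.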

Behavior

  • getEventStreamForCategory("x") returns events from all streams whose name starts with x- or x/. This is consistent with the existing dash semantics, which also return all descendants and not just direct children.
  • getEventStreamForCategory("x/foo") narrows to streams starting with x/foo/, enabling hierarchical category queries.
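
The prefix rule described above can be sketched in isolation. `belongsToCategory` is a hypothetical predicate written for illustration; the actual filter inside getEventStreamForCategory may be structured differently.

```javascript
// Hypothetical predicate mirroring the described rule: a stream belongs to a
// category if its name starts with the category followed by '-' or '/'.
function belongsToCategory(streamName, categoryName) {
    return streamName.startsWith(categoryName + '-')
        || streamName.startsWith(categoryName + '/');
}

const streams = ['x-foo', 'x/foo', 'x/foo/bar', 'xy-1', 'y/foo'];
const inCategory = (category) => streams.filter((name) => belongsToCategory(name, category));

console.log(inCategory('x'));     // [ 'x-foo', 'x/foo', 'x/foo/bar' ]
console.log(inCategory('x/foo')); // [ 'x/foo/bar' ]
```

Requiring the separator after the category name is what keeps `xy-1` out of category `x`.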

Changes

src/util.js

  • Added path import.
  • Made scanForFiles recursive: it now descends into subdirectories and matches the relative path from the root against the regex pattern, so eventstore.stream-x/foo.index yields stream-x/foo. Error handling short-circuits immediately on first error.
  • Added scanForFilesSync: a synchronous counterpart to scanForFiles using readdirSync, for use in synchronous initialization contexts.
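
A minimal sketch of what such a recursive synchronous scan could look like, assuming the `(directory, regexPattern, onEach)` signature shown in the review thread. The fourth `prefix` parameter and the body are assumptions made for illustration, not the code merged in this PR.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Sketch only: walks `directory` recursively and tests each file's path,
// relative to the root and joined with '/', against `regexPattern`.
// `prefix` is an illustrative extra parameter used for the recursion.
function scanForFilesSync(directory, regexPattern, onEach, prefix = '') {
    const entries = fs.readdirSync(path.join(directory, prefix), { withFileTypes: true });
    for (const entry of entries) {
        const relative = prefix ? prefix + '/' + entry.name : entry.name;
        if (entry.isDirectory()) {
            scanForFilesSync(directory, regexPattern, onEach, relative);
            continue;
        }
        const match = regexPattern.exec(relative);
        // Pass the first capturing group if there is one, else the full match.
        if (match) onEach(match[1] !== undefined ? match[1] : match[0]);
    }
}

// Usage against a throwaway directory holding one flat and one nested index file.
const root = fs.mkdtempSync(path.join(os.tmpdir(), 'scan-'));
fs.mkdirSync(path.join(root, 'eventstore.stream-x'));
fs.writeFileSync(path.join(root, 'eventstore.stream-x-foo.index'), '');
fs.writeFileSync(path.join(root, 'eventstore.stream-x', 'foo.index'), '');

const found = [];
scanForFilesSync(root, /^eventstore\.(stream-.*)\.index$/, (name) => found.push(name));
console.log(found.sort()); // [ 'stream-x-foo', 'stream-x/foo' ]
```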

src/EventStore.js

  • Imported ensureDirectory.
  • createEventStream: calls ensureDirectory on the index subdirectory for slash-named streams before calling ensureIndex.
  • getEventStreamForCategory: filter extended to include streams starting with categoryName + '/' in addition to categoryName + '-'.
  • Updated docstring.

src/Storage/WritableStorage.js

  • getPartition: calls ensureDirectory on the partition parent directory when the partition name contains a /.
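
The directory handling for slash-named streams and partitions can be sketched as below. Both functions are stand-ins: `ensureDirectory` here is assumed to behave like `mkdir -p`, and `preparePartitionFile` is a hypothetical helper rather than the project's getPartition.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Stand-in for the project's ensureDirectory; assumed to create the
// directory and any missing parents.
function ensureDirectory(directory) {
    fs.mkdirSync(directory, { recursive: true });
}

// Hypothetical helper: resolves the file for a partition and, for slash
// names only, makes sure the parent directory exists before it is opened.
function preparePartitionFile(dataDirectory, storageFile, partitionName) {
    const fileName = path.join(dataDirectory, storageFile + '.' + partitionName);
    if (partitionName.includes('/')) {
        ensureDirectory(path.dirname(fileName));
    }
    return fileName;
}

const dataDir = fs.mkdtempSync(path.join(os.tmpdir(), 'partitions-'));
const file = preparePartitionFile(dataDir, 'eventstore', 'x/foo/bar');
fs.writeFileSync(file, '');
console.log(fs.existsSync(file)); // true
```

Flat names skip the directory check entirely, so the existing code path stays as it was.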

src/Storage/ReadableStorage.js

  • scanPartitions: replaced inline recursive directory-scan closure with a call to the generic scanForFilesSync utility, using a regex pattern to match files by the storageFile prefix and filter out .index, .branch, and .lock files.
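
One way such a pattern could be written is sketched here. `partitionPattern` is a hypothetical builder and the exact regex used in ReadableStorage.js is not shown in this conversation, so treat the expression as an assumption.

```javascript
// Hypothetical pattern builder: matches relative paths that start with
// "<storageFile>." and captures the partition name, while rejecting files
// that end in .index, .branch or .lock.
function partitionPattern(storageFile) {
    const escaped = storageFile.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    return new RegExp('^' + escaped + '\\.(?!.*\\.(?:index|branch|lock)$)(.+)$');
}

const pattern = partitionPattern('eventstore');
const names = [
    'eventstore.x-foo',
    'eventstore.x/foo',
    'eventstore.x-foo.index',
    'eventstore.x/foo.lock',
    'other.x-foo'
];
const partitions = names
    .map((name) => pattern.exec(name))
    .filter(Boolean)
    .map((match) => match[1]);

console.log(partitions); // [ 'x-foo', 'x/foo' ]
```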

test/EventStore.spec.js

  • Updated fs.readdir mock to handle both 2-arg and 3-arg call forms.
  • Added tests for hierarchical streams:
    • persistence across store re-open
    • category union of dash and slash streams
    • sub-category narrowing via getEventStreamForCategory("x/foo")
    • deep nesting (a/b/c, a/b/d) including async persistence
    • all-descendants semantics for category queries
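
For the fs.readdir mock, handling both call forms usually comes down to normalizing the optional `options` argument. The sketch below is an assumption about how such a mock could look; `mockReaddir` is a hypothetical helper and not the code in test/EventStore.spec.js.

```javascript
const fs = require('fs');

// Hypothetical test helper: replaces fs.readdir with a fake that accepts
// both readdir(path, callback) and readdir(path, options, callback).
// Returns a function that restores the original implementation.
function mockReaddir(fakeImplementation) {
    const original = fs.readdir;
    fs.readdir = (directory, options, callback) => {
        if (typeof options === 'function') {
            callback = options;
            options = undefined;
        }
        fakeImplementation(directory, options, callback);
    };
    return () => { fs.readdir = original; };
}

const restore = mockReaddir((directory, options, callback) => {
    callback(null, options && options.withFileTypes ? [] : ['a.index']);
});

let twoArgResult;
let threeArgResult;
fs.readdir('/any', (err, files) => { twoArgResult = files; });
fs.readdir('/any', { withFileTypes: true }, (err, files) => { threeArgResult = files; });
restore();

console.log(twoArgResult, threeArgResult); // [ 'a.index' ] []
```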

docs/streams.md

  • Expanded the Stream Categories section to document both naming conventions side by side.
  • Explains why the flat category-id layout can degrade at scale (large number of files in a single directory).
  • Introduces the hierarchical category/id layout with an on-disk directory tree example.
  • Documents that getEventStreamForCategory transparently unions both layouts.
  • Added a hash-based sharding pattern (type/XX/YY/id) for very large entity populations (e.g. users on large platforms), showing a two-level hex-prefix scheme that distributes millions of streams across up to 65 536 balanced leaf directories.
  • Includes examples for querying a single shard (getEventStreamForCategory("user/a3")), all shards (getEventStreamForCategory("user")), reading a specific entity stream, and a UUID v4 variant using leading characters as the shard prefix.
  • Added a depth-selection table mapping expected entity counts to recommended hash levels.
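
The type/XX/YY/id pattern can be sketched as follows. The helper names are hypothetical, and MD5 is used only as an assumed way to spread arbitrary IDs evenly; docs/streams.md may use a different hash.

```javascript
const crypto = require('crypto');

// Hypothetical helper: builds "type/XX/YY/id", where XX and YY are the first
// hex byte pairs of a hash of the ID. MD5 is an assumption chosen for even
// distribution, not for security.
function shardedStreamName(type, id, levels = 2) {
    const hex = crypto.createHash('md5').update(String(id)).digest('hex');
    const shards = [];
    for (let i = 0; i < levels; i++) {
        shards.push(hex.slice(i * 2, i * 2 + 2));
    }
    return [type, ...shards, id].join('/');
}

// UUID v4 variant: the leading characters are already random, so they can
// serve as the shard prefix without hashing.
function shardedUuidStreamName(type, uuid, levels = 2) {
    const hex = uuid.replace(/-/g, '').toLowerCase();
    const shards = [];
    for (let i = 0; i < levels; i++) {
        shards.push(hex.slice(i * 2, i * 2 + 2));
    }
    return [type, ...shards, uuid].join('/');
}

console.log(shardedStreamName('user', 42));
console.log(shardedUuidStreamName('user', 'a3f1c2d4-5b6e-4f70-8a91-b2c3d4e5f607'));
// second line prints: user/a3/f1/a3f1c2d4-5b6e-4f70-8a91-b2c3d4e5f607
```

With names built this way, getEventStreamForCategory("user/a3") would cover one first-level shard and getEventStreamForCategory("user") all of them.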

Copilot AI and others added 2 commits April 30, 2026 18:36
Owner

@albe albe left a comment

@copilot apply changes from comments

Comment thread test/EventStore.spec.js Outdated
Comment thread test/EventStore.spec.js Outdated
Comment thread src/util.js Outdated
Copilot finished work on behalf of albe May 1, 2026 06:02
Copilot AI requested a review from albe May 1, 2026 06:02
Comment thread src/Storage/ReadableStorage.js Outdated
Copilot finished work on behalf of albe May 1, 2026 06:38
Copilot AI requested a review from albe May 1, 2026 06:38
Comment thread src/util.js
* @param {RegExp} regexPattern The pattern to match relative file paths against.
* @param {function(string)} onEach Called with the first capturing group (or full match) for each matching path.
*/
function scanForFilesSync(directory, regexPattern, onEach) {
Owner

@copilot I see you implemented scanForFilesSync so that storage initialization, which has been a sync operation so far, does not have to change. I would rather unify things than duplicate code.

I see two ways: making Storage initialization async with a 'ready' event like EventStore does, or making Storage.open() async and letting it do the partition scanning lazily. However, I do not fully see how either of those would affect the rest of the code base. From a gut feeling, having Storage behave more like EventStore is preferable, though.

Contributor Author

Investigated both approaches. Here's the concrete impact:

Option A – Async open() (move partition scan into open() with a callback)

  • ReadableStorage.open(), WritableStorage.open(), and ReadOnlyStorage.open() all become callback-based.
  • EventStore.initialize() needs to wait for the storage 'ready' event (or an open callback) before accessing partitions in checkUnfinishedCommits().
  • Storage.spec.js has ~144 .open() calls — most of those tests would need to become async (callback or Promise) to remain correct.
  • All direct Storage callers (Consumer.spec.js, stress tests) would also need updating.

Option B – Emit 'ready' from the constructor (like EventStore)

  • Similar scope: every caller that currently treats Storage as "immediately usable after construction + open()" would need to listen for 'ready' first.
  • Adds a required event-listener pattern to every Storage user, which is a larger API break.

Both options are meaningful architectural improvements but require changes across ~5 files and ~150 call sites. My recommendation is to keep the current scanForFilesSync approach in this PR (it's an isolated, self-contained utility), and pursue the async Storage initialization as a separate, focused PR where the full test surface can be properly updated. Happy to implement either option in a follow-up if you'd prefer.
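
To make the scope of Option B more concrete, the listener pattern it would require from callers might look roughly like this. `AsyncInitStorage` is a hypothetical class written for illustration; it is not the project's Storage and not the implementation from the follow-up PR.

```javascript
const EventEmitter = require('events');

// Hypothetical class illustrating Option B: the constructor kicks off an
// asynchronous partition scan and emits 'ready' once it has finished.
class AsyncInitStorage extends EventEmitter {
    constructor(scanPartitions) {
        super();
        this.partitions = [];
        this.ready = false;
        // Deferred so callers can attach listeners after construction.
        setImmediate(() => scanPartitions((err, names) => {
            if (err) {
                this.emit('error', err);
                return;
            }
            this.partitions = names;
            this.ready = true;
            this.emit('ready');
        }));
    }
}

// Callers can no longer use the storage right after construction;
// they have to wait for 'ready' first.
const storage = new AsyncInitStorage((done) => done(null, ['x-foo', 'x/foo']));
storage.on('ready', () => console.log(storage.partitions));
```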

Owner

@copilot create a follow-up PR based on this branch and implement option B there. Given that Storage is a second-order API, a break might still be okay in a minor version, but that is to be decided. I need to see the implementation and impact first.

Copilot finished work on behalf of albe May 1, 2026 14:00
Copilot AI requested a review from albe May 1, 2026 14:00
Owner

albe commented May 1, 2026

@copilot resolve the merge conflicts in this pull request

Contributor Author

Copilot AI commented May 1, 2026

@copilot resolve the merge conflicts in this pull request

Done in 33152df. Merged the three changes from main (PR #300) into this branch:

  • WritableStorage.ensureIndex now accepts a reindex = true parameter and wraps the scan loop under it
  • EventStore.createEventStream forwards the reindex parameter to ensureIndex
  • EventStore.commit passes reindex = false when implicitly creating a new write stream
  • Added the corresponding new test

All 380 tests pass.

Copilot finished work on behalf of albe May 1, 2026 14:10
Owner

albe commented May 1, 2026

@copilot resolve the merge conflicts in this pull request

…from main (PR #302)

Co-authored-by: albe <4259532+albe@users.noreply.github.com>
Contributor Author

Copilot AI commented May 1, 2026

@copilot resolve the merge conflicts in this pull request

Done in 22b4a6c. Merged the large refactor from main (PR #302 — IndexMatcher/PartitionPool):

  • src/util.js: exports both scanForFilesSync (ours) and getPropertyAtPath (main)
  • src/EventStore.js: kept the slash-stream ensureDirectory block
  • src/Storage/ReadableStorage.js: scanPartitions now uses partitions.add() (PartitionPool API) instead of plain-object assignment
  • src/Storage/WritableStorage.js: kept the ensureDirectory for slash-named partition subdirectories while also using partitions.add()

All 392 tests pass.

Copilot finished work on behalf of albe May 1, 2026 15:05
Owner

albe commented May 1, 2026

@copilot update the documentation to explain the new category system and provide examples of where it is useful. A typical case is managing entity streams where an arbitrarily large number of entity instances can exist, e.g. users on large platforms. In that case the stream category should contain the type prefix and a hash-based subfolder scheme with two or more levels.

Contributor Author

Copilot AI commented May 1, 2026

@copilot update the documentation to explain the new category system and provide examples in where this is useful. Typically for managing entity streams where an arbitrarily large amount of entity ins...

Done in 1c72b19. Updated docs/streams.md — the Stream Categories section now covers:

  • Flat category-id layout — the existing convention with its on-disk file structure, and an explanation of why flat directories can degrade performance at large scale.
  • Hierarchical category/id layout — how slash-separated names map to a directory tree on disk, with the same category query API working across both layouts.
  • Hash-based sharding — a two-level hex-prefix pattern (type/XX/YY/id) that distributes millions of entity streams across 65 536 balanced leaf directories, including code for numeric IDs and UUID v4 keys.
  • Sub-category narrowing — how to use getEventStreamForCategory("user/a3") to scope queries to a single shard, or getEventStreamForCategory("user") for all shards.
  • Depth selection table — guidance on choosing 1, 2, or 3 levels based on the expected entity population.
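
The arithmetic behind the depth choice can be checked with a few lines. This assumes two hex characters per level, as in the type/XX/YY/id pattern; the function names and the sample population of 10 million are illustrative and not taken from the docs table.

```javascript
// Each level uses two hex characters, i.e. 256 possible directory names,
// so the number of leaf directories is 256 to the power of `levels`.
function leafDirectories(levels) {
    return Math.pow(256, levels);
}

// Average number of streams per leaf directory, assuming an even spread.
function averageStreamsPerLeaf(entityCount, levels) {
    return entityCount / leafDirectories(levels);
}

for (const levels of [1, 2, 3]) {
    console.log(levels, leafDirectories(levels), averageStreamsPerLeaf(10000000, levels));
}
```

Two levels give the 65 536 leaf directories mentioned above; each additional level multiplies that by 256.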

Copilot finished work on behalf of albe May 1, 2026 15:31

coveralls commented May 1, 2026

Coverage Status

coverage: 98.106% (-0.007%) from 98.113% — copilot/add-folder-partitioning-scheme into main

Comment thread docs/streams.md Outdated
Comment thread docs/streams.md Outdated
Copilot finished work on behalf of albe May 1, 2026 17:42
Copilot AI requested a review from albe May 1, 2026 17:42
Copilot finished work on behalf of albe May 1, 2026 17:44
Contributor Author

Copilot AI commented May 1, 2026

@albe I've opened a new pull request, #303, to work on those changes. Once the pull request is ready, I'll request review from you.

@albe albe marked this pull request as ready for review May 1, 2026 18:25
@albe albe merged commit 0087705 into main May 1, 2026
13 checks passed
@albe albe deleted the copilot/add-folder-partitioning-scheme branch May 1, 2026 18:26
3 participants