
First round of the V4 Async Daemon #1712

Merged
merged 40 commits into master from daemon on Jan 25, 2021
Conversation

jeremydmiller
Member

It's not all the way back to 100% of what the current async daemon is, but the foundation is in place and I'd like this pulled in soon so we can knock down some issues.

What is working:

  • The new EventProjection can run in the new async daemon
  • The new ViewProjection and the new AggregateProjection both work in the async daemon, with a slightly different set of optimizations compared to the more "inline" projection types
  • ProjectionAgent is the new equivalent to the old ProjectionTrack. What was Fetcher is now folded into ProjectionAgent, and it's more "pull-based" so that enough events stay queued up to keep the real projection humming along (a rough sketch of the idea follows this list)
  • Not tested yet, but the new async daemon can handle multi-tenanted events and documents
  • The new daemon uses the now-standard .NET ILogger abstraction
  • Closes 3-4 issues along the way
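
To make the "pull-based" queuing idea above concrete, here is a minimal, hypothetical sketch. This is not Marten's actual ProjectionAgent; the types, member names, and buffer size are invented purely for illustration of an agent keeping a bounded buffer topped up while the projection drains it at its own pace:

// Hypothetical sketch only -- not Marten's actual ProjectionAgent. It just
// illustrates the "pull-based" idea: a bounded buffer that the agent keeps
// topped up from the event store while the projection drains it.
using System;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

public record QueuedEvent(long Sequence, object Data);

public class PullBasedAgentSketch
{
    // Bounded so the agent never fetches unboundedly far ahead of the projection
    private readonly Channel<QueuedEvent> _buffer =
        Channel.CreateBounded<QueuedEvent>(new BoundedChannelOptions(1000)
        {
            SingleWriter = true,
            SingleReader = true
        });

    // Producer: pulls pages of events starting from the last seen sequence
    public async Task FetchLoop(Func<long, Task<QueuedEvent[]>> fetchPage, CancellationToken token)
    {
        long floor = 0;
        while (!token.IsCancellationRequested)
        {
            var page = await fetchPage(floor);
            if (page.Length == 0)
            {
                await Task.Delay(250, token); // nothing new yet, poll again shortly
                continue;
            }

            foreach (var e in page)
            {
                await _buffer.Writer.WriteAsync(e, token); // blocks when the buffer is full
                floor = e.Sequence;
            }
        }
    }

    // Consumer: the projection drains the buffer at its own pace
    public async Task ProjectLoop(Func<QueuedEvent, Task> apply, CancellationToken token)
    {
        await foreach (var e in _buffer.Reader.ReadAllAsync(token))
        {
            await apply(e);
        }
    }
}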


_subscription = _tracker.Subscribe(this);

_logger.LogInformation($"Projection agent for '{_projectionShard.ProjectionOrShardName}' has started from sequence {lastCommitted} and a high water mark of {tracker.HighWaterMark}");
Hawxy
Contributor

It'd be much appreciated if you used the structured logging overloads for the ILogger integration. It's useful for those of us using structured sinks and log aggregation platforms like Sumologic.

Suggested change
_logger.LogInformation($"Projection agent for '{_projectionShard.ProjectionOrShardName}' has started from sequence {lastCommitted} and a high water mark of {tracker.HighWaterMark}");
_logger.LogInformation("Projection agent for '{ProjectionOrShardName}' has started from sequence {lastCommitted} and a high water mark of {HighWaterMark}", _projectionShard.ProjectionOrShardName, lastCommitted, tracker.HighWaterMark);

jeremydmiller
Member Author

@Hawxy I think that's a great idea, and I had vaguely thought about that. Any chance you'd be interested in taking that on?

Hawxy
Contributor

Sure 👍

{
_tenant = tenant;

_findSafeSequence = new NpgsqlCommand($@"select min(seq_id) from {graph.DatabaseSchemaName}.mt_events where mt_events.timestamp >= :timestamp");
Collaborator

We should consider an indexing strategy for those columns (at least as opt-in). For a huge number of events, those queries may take some time. On the other hand, indexes will slow down appending...
We could also consider (or at least validate) whether dedicated materialized views could help here.

Collaborator

Also, I'm not sure (yet) about all the assumptions (as I didn't go through all of the changes), but we might think about introducing a numeric offset commit instead of using the timestamp. Timestamps are dangerous in multi-node environments, which can be hit by time skew issues between nodes (and that may lead, e.g., to skipping some events in the worst edge case). From my experience, using numbers is more predictable and easier to troubleshoot. It may also be useful for potential conflict management.

jeremydmiller
Member Author

The timestamp is assigned by the database, and the only possible usage of it is in this one case to detect stale, missing values in the sequence. We don't key off that otherwise.

And yes to the indexing. Another option is to shard the table by sequence range. I don't see how a materialized view would help, given the frequency at which this would be getting hit.
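
For reference, an opt-in index along the lines discussed above might look roughly like this. This is a hypothetical sketch, not part of the PR; it assumes the default "public" schema, an invented index name, and BRIN as one possible low-write-cost choice:

// Hypothetical sketch of the opt-in index idea, not something shipped in this PR.
// Assumes the default "public" schema; the index name and BRIN choice are illustrative.
using System.Threading.Tasks;
using Npgsql;

public static class TimestampIndexSketch
{
    public static async Task EnsureIndex(string connectionString)
    {
        await using var conn = new NpgsqlConnection(connectionString);
        await conn.OpenAsync();

        // BRIN suits an append-only, roughly time-ordered column and is cheap to
        // maintain, which matters because indexes slow down appending.
        var sql = @"create index concurrently if not exists idx_mt_events_timestamp
                    on public.mt_events using brin (timestamp)";

        using var cmd = new NpgsqlCommand(sql, conn);
        await cmd.ExecuteNonQueryAsync();
    }
}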

break;

case HighWaterStatus.Stale:
var safeHarborTime = _current.Timestamp.Add(_settings.LeadingEdgeBuffer);
Collaborator

Same consideration as above about timestamp vs. numeric offset and time skew: #1712 (comment).

jeremydmiller
Member Author

Same comment. This is to slide around "stale missing sequence" values. We're not using this in any way where we need precision. The "safe harbor" time would be something like "just assume anything older than 5 seconds is good". Easier to talk about this in Zoom.

When the HighWaterDetector "finds" the statistics, it starts looking from the last previously known sequence. Keep that in mind here.

So the case here is:

  1. fetch the statistics once, the last good number is 1000, but the sequence is at 1300.
  2. Fetch the statistics again on the regular polling interval: the last good number is still 1000, but the sequence has advanced to 1500. We can tell that there's stale data, where transactions are either finishing up really slowly (but have already reserved the sequence numbers) or have failed. The current async daemon sucks because you always have to be very careful about whether missing sequence numbers are just in flight or the result of a failure. The new tombstone event thing should make that problem mostly go away, but only mostly. So at some point we have to assume that it's safe to pick up any events older than the "safe harbor time".
  3. Wait a calculated moment, then look for the highest sequence number with no gaps newer than the "safe harbor" timestamp.

At no time are we using that timestamp as the determination for what events have been published or not, except for the stale sequence gap issue, which shouldn't happen unless you get funky database connectivity issues. And even there, you don't need a lot of precision in the timestamp values, so I'm not concerned with the node drift issue.
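
A simplified, hypothetical sketch of the "safe harbor" arithmetic described above, mirroring the shape of the _findSafeSequence query quoted earlier in this thread. The method and parameter names are invented, the default "public" schema is assumed, and this is not the PR's actual HighWaterDetector logic:

// Simplified sketch of the "safe harbor" idea -- hypothetical names, not the
// actual HighWaterDetector code. It mirrors the shape of the _findSafeSequence
// query: find the first event newer than the safe harbor cutoff and treat
// everything below it as settled.
using System;
using System.Threading.Tasks;
using Npgsql;

public static class SafeHarborSketch
{
    public static async Task<long> FindSafeHighWaterMark(
        NpgsqlConnection conn, long lastMark, DateTimeOffset safeHarborTime)
    {
        // Assumes the default "public" schema for mt_events
        var sql = "select min(seq_id) from public.mt_events where mt_events.timestamp >= :timestamp";

        using var cmd = new NpgsqlCommand(sql, conn);
        cmd.Parameters.AddWithValue("timestamp", safeHarborTime);

        var result = await cmd.ExecuteScalarAsync();

        // No events newer than the cutoff: keep the existing mark. Otherwise,
        // everything below the first "too new" event is assumed good, even if
        // there are gaps left by failed or abandoned transactions.
        return result is long firstUnsafe
            ? Math.Max(lastMark, firstUnsafe - 1)
            : lastMark;
    }
}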

return statistics;
}

private async Task calculateHighWaterMark(CancellationToken token, HighWaterStatistics statistics,
Collaborator

SUGGESTION: I think it would be good to keep those calculations in the HighWaterStatistics class and make the setters for those properties private. It would be easier to track what's changed where in the future.
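
For illustration, the kind of encapsulation being suggested might look roughly like this; the property names and the calculation itself are hypothetical, not the actual HighWaterStatistics class:

// Rough illustration of the suggestion above -- property names and logic are
// hypothetical, not the actual HighWaterStatistics shape. The point is that the
// class owns the calculation and the setters stay private.
using System;

public class HighWaterStatisticsSketch
{
    public long LastMark { get; private set; }
    public long HighestSequence { get; private set; }
    public long SafeStartMark { get; private set; }
    public DateTimeOffset Timestamp { get; private set; }

    // The calculation lives with the data it mutates, so callers can't
    // update the properties piecemeal from the detector.
    public void CalculateHighWaterMark(long highestSequence, long safeStartMark, DateTimeOffset timestamp)
    {
        HighestSequence = highestSequence;
        SafeStartMark = safeStartMark;
        Timestamp = timestamp;

        // If there are no gaps, the mark can advance straight to the head of the sequence
        LastMark = safeStartMark >= highestSequence ? highestSequence : safeStartMark;
    }
}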

jeremydmiller
Member Author

I completely understand why you're saying that, but I think that's just some extra complexity that would never pay off.

jeremydmiller merged commit a482722 into master on Jan 25, 2021
jeremydmiller deleted the daemon branch on January 25, 2021, 14:46
Leh2 added a commit to Leh2/marten that referenced this pull request Apr 27, 2022
jeremydmiller pushed a commit that referenced this pull request May 2, 2022