First round of the V4 Async Daemon #1712
…duplication in AggregateStream, deleted EventQueryHandler
…an aggregate with NodeAgent
```csharp
_subscription = _tracker.Subscribe(this);

_logger.LogInformation($"Projection agent for '{_projectionShard.ProjectionOrShardName}' has started from sequence {lastCommitted} and a high water mark of {tracker.HighWaterMark}");
```
It'd be much appreciated if you used the structured logging overloads for the ILogger integration. It's useful for those of us using structured sinks and log aggregation platforms like Sumologic.
```diff
- _logger.LogInformation($"Projection agent for '{_projectionShard.ProjectionOrShardName}' has started from sequence {lastCommitted} and a high water mark of {tracker.HighWaterMark}");
+ _logger.LogInformation("Projection agent for '{ProjectionOrShardName}' has started from sequence {lastCommitted} and a high water mark of {HighWaterMark}", _projectionShard.ProjectionOrShardName, lastCommitted, tracker.HighWaterMark);
```
@Hawxy I think that's a great idea, and had thought vaguely about that. Any chance you'd be interested in taking that on?
Sure 👍
```csharp
{
    _tenant = tenant;

    _findSafeSequence = new NpgsqlCommand($@"select min(seq_id) from {graph.DatabaseSchemaName}.mt_events where mt_events.timestamp >= :timestamp");
```
We should consider an indexing strategy for those columns (at least as an opt-in). For a huge number of events those queries may take some time. On the other hand, indexes will slow down appending...
We could also consider (or at least validate) whether dedicated materialized views could help here.
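For illustration, the kind of opt-in index this comment is asking about might look like the following DDL. This is a sketch, not anything Marten actually generates; the schema name is an example, and whether `CONCURRENTLY` is appropriate depends on the deployment:

```sql
-- Hypothetical opt-in index to speed up the min(seq_id) scans over the
-- timestamp column; CONCURRENTLY avoids blocking appends while the index
-- builds, at the cost of a slower build. Schema name is an assumption.
CREATE INDEX CONCURRENTLY IF NOT EXISTS mt_events_timestamp_idx
    ON public.mt_events (timestamp);
```

The trade-off noted above still applies: every index on `mt_events` adds write amplification to the append path, which is why the comment frames this as opt-in.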
Also, I'm not sure (yet) about all the assumptions (as I didn't go through all of the changes), but we might think about introducing some numeric offset commit instead of using the timestamp. Timestamps are dangerous in multi-node environments, which may be hit with time skew issues between nodes (that may lead, e.g., to skipping some events in the worst edge case). From my experience, using numbers is more predictable and easier to troubleshoot. They may also be useful for potential conflict management.
The timestamp is assigned by the database, and the only possible usage of it is in this one case to detect stale, missing values in the sequence. We don't key off that otherwise.
And yes to the indexing. Another option is to shard the table by sequence range. I don't see how a materialized view would help with the frequency that this would be getting hit.
```csharp
        break;

    case HighWaterStatus.Stale:
        var safeHarborTime = _current.Timestamp.Add(_settings.LeadingEdgeBuffer);
```
Same consideration as above about timestamp vs. numeric offset and time skew: #1712 (comment).
Same comment. This is to slide around "stale missing sequence" values. We're not using this in any way where we need precision. The "safe harbor" time would be something like "just assume anything older than 5 seconds is good". Easier to talk about this in Zoom.
When the HighWaterDetector "finds" the statistics, it starts looking from the last previously known sequence. Keep that in mind here.
So the case here is:
- Fetch the statistics once: the last good number is 1000, but the sequence is at 1300.
- Fetch the statistics again on the regular polling interval: the last good number is still 1000, but the sequence has advanced to 1500. We can tell that there's stale data where transactions are either *really* slow finishing up or have failed, but have already reserved the sequence numbers. The current async daemon sucks because you always have to be very careful about whether missing sequence numbers are real and just in flight, or the result of a failure. The new tombstone event thing should make that problem mostly go away, but only mostly. So at some point we have to assume that it's safe to pick up any events older than the "safe harbor time".
- Wait a calculated moment, and look for the highest sequence number without gaps that is older than the "safe harbor" timestamp.

At no time are we using that timestamp as the determination for which events have been published or not, except for the stale sequence gap issue, which shouldn't happen unless you get funky database connectivity issues. And even there, you don't need a lot of precision in the timestamp values, so I'm not concerned with the node drift issue.
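The gap-skipping idea described above can be sketched in isolation. This is a hypothetical illustration, not Marten's actual implementation; the type, method, and parameter names are all invented. The rule it encodes: advance the mark through contiguous committed sequences, and jump over a gap only when the next committed event's database-assigned timestamp is already older than the safe harbor cutoff, meaning the missing reservation has been dead long enough to write off.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of the "safe harbor" high water calculation discussed
// above. NOT Marten's real code; names and shapes are invented to show the
// idea only.
public static class HighWaterSketch
{
    // committed maps sequence number -> database-assigned commit timestamp.
    // Sequence numbers between lastKnownMark and highestReserved that are
    // absent from the map are gaps: reserved but not (yet) committed.
    public static long CalculateMark(
        IReadOnlyDictionary<long, DateTimeOffset> committed,
        long lastKnownMark,
        long highestReserved,
        DateTimeOffset safeHarborCutoff)
    {
        var mark = lastKnownMark;

        for (var seq = lastKnownMark + 1; seq <= highestReserved; seq++)
        {
            if (committed.ContainsKey(seq))
            {
                mark = seq; // contiguous, keep advancing
                continue;
            }

            // Gap: find the next committed sequence after it. If that event
            // predates the safe harbor cutoff, the gap has been open long
            // enough to write off as a dead reservation and skip over.
            long? next = null;
            for (var probe = seq + 1; probe <= highestReserved; probe++)
            {
                if (committed.ContainsKey(probe)) { next = probe; break; }
            }

            if (next.HasValue && committed[next.Value] < safeHarborCutoff)
            {
                continue; // dead gap; the mark advances at the next committed seq
            }

            break; // the gap may still be in flight; hold the mark here
        }

        return mark;
    }
}
```

This matches the point made in the reply: the timestamp only ages gaps, it never decides directly which events are published, so clock precision across nodes barely matters.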
```csharp
    return statistics;
}

private async Task calculateHighWaterMark(CancellationToken token, HighWaterStatistics statistics,
```
SUGGESTION: I think it would be good to keep those calculations in the HighWaterStatistics class and make the setters for those properties private. It would be easier to track what changed where in the future.
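What the reviewer is proposing might look something like this sketch. The property and method names are assumptions for illustration; the real `HighWaterStatistics` shape in the PR may differ, and as noted in the reply below, this encapsulation was ultimately judged not worth the extra complexity:

```csharp
using System;

// Hypothetical sketch of the reviewer's suggestion, not Marten's code:
// the statistics object owns its own mutation, so callers can only change
// state through a method on the class itself, making changes traceable.
public class HighWaterStatistics
{
    public long CurrentMark { get; private set; }
    public long HighestSequence { get; private set; }
    public DateTimeOffset Timestamp { get; private set; }

    public HighWaterStatistics(long currentMark, long highestSequence, DateTimeOffset timestamp)
    {
        CurrentMark = currentMark;
        HighestSequence = highestSequence;
        Timestamp = timestamp;
    }

    // The calculation lives here rather than in the detector, so every
    // state change funnels through this one guarded method.
    public void AdvanceTo(long newMark, DateTimeOffset observedAt)
    {
        if (newMark < CurrentMark)
            throw new InvalidOperationException("High water mark cannot move backwards");

        CurrentMark = newMark;
        Timestamp = observedAt;
    }
}
```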
I completely understand why you're saying that, but I think that's just some extra complexity that would never pay off.
It's not all the way back to 100% of what the current async daemon is, but the foundation is in place and I'd like this pulled in soon so we can knock down some issues.
What is working:

- `EventProjection` can run in the new async daemon
- `ViewProjection` and the new `AggregateProjection` both work in the async daemon. Little different set of optimizations compared to the more "inline" projection types
- `ProjectionAgent` is the new equivalent to the old `ProjectionTrack`. What was `Fetcher` is really combined into `ProjectionAgent`, and it's more "pull-based" to keep enough events queued up to keep the real projection humming along
- `ILogger` abstraction