
First round of the V4 Async Daemon #1712

Merged
merged 40 commits into master from daemon on Jan 25, 2021
Conversation

jeremydmiller
Member

It's not all the way back to 100% of what the current async daemon is, but the foundation is in place and I'd like this pulled in soon so we can knock down some issues.

What is working:

  • The new EventProjection can run in the new async daemon
  • The new ViewProjection and the new AggregateProjection both work in the async daemon, with a slightly different set of optimizations compared to the more "inline" projection types
  • ProjectionAgent is the new equivalent to the old ProjectionTrack. What was Fetcher is now folded into ProjectionAgent, and it's more "pull-based" so that enough events stay queued up to keep the real projection humming along (a rough sketch of the idea follows this list)
  • Not tested yet, but the new async daemon can handle multi-tenanted events and documents
  • The new daemon uses the now-standard .NET ILogger abstraction
  • Closes 3-4 issues along the way
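
To make the "pull-based" queuing idea above concrete, here is a minimal, hypothetical sketch. This is not Marten's actual ProjectionAgent; the types, member names, and buffer size are invented purely for illustration of an agent keeping a bounded buffer topped up while the projection drains it at its own pace:

// Hypothetical sketch only -- not Marten's actual ProjectionAgent. It just
// illustrates the "pull-based" idea: a bounded buffer that the agent keeps
// topped up from the event store while the projection drains it.
using System;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

public record QueuedEvent(long Sequence, object Data);

public class PullBasedAgentSketch
{
    // Bounded so the agent never fetches unboundedly far ahead of the projection
    private readonly Channel<QueuedEvent> _buffer =
        Channel.CreateBounded<QueuedEvent>(new BoundedChannelOptions(1000)
        {
            SingleWriter = true,
            SingleReader = true
        });

    // Producer: pulls pages of events starting from the last seen sequence
    public async Task FetchLoop(Func<long, Task<QueuedEvent[]>> fetchPage, CancellationToken token)
    {
        long floor = 0;
        while (!token.IsCancellationRequested)
        {
            var page = await fetchPage(floor);
            if (page.Length == 0)
            {
                await Task.Delay(250, token); // nothing new yet, poll again shortly
                continue;
            }

            foreach (var e in page)
            {
                await _buffer.Writer.WriteAsync(e, token); // blocks when the buffer is full
                floor = e.Sequence;
            }
        }
    }

    // Consumer: the projection drains the buffer at its own pace
    public async Task ProjectLoop(Func<QueuedEvent, Task> apply, CancellationToken token)
    {
        await foreach (var e in _buffer.Reader.ReadAllAsync(token))
        {
            await apply(e);
        }
    }
}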


_subscription = _tracker.Subscribe(this);

_logger.LogInformation($"Projection agent for '{_projectionShard.ProjectionOrShardName}' has started from sequence {lastCommitted} and a high water mark of {tracker.HighWaterMark}");
Hawxy
Contributor

It'd be much appreciated if you used the structured logging overloads for the ILogger integration. It's useful for those of us using structured sinks and log aggregation platforms like Sumologic.

Suggested change
_logger.LogInformation($"Projection agent for '{_projectionShard.ProjectionOrShardName}' has started from sequence {lastCommitted} and a high water mark of {tracker.HighWaterMark}");
_logger.LogInformation("Projection agent for '{ProjectionOrShardName}' has started from sequence {lastCommitted} and a high water mark of {HighWaterMark}", _projectionShard.ProjectionOrShardName, lastCommitted, tracker.HighWaterMark);

jeremydmiller
Member Author

@Hawxy I think that's a great idea, and I had vaguely thought about that. Any chance you'd be interested in taking that on?

Hawxy
Contributor

Sure 👍

{
_tenant = tenant;

_findSafeSequence = new NpgsqlCommand($@"select min(seq_id) from {graph.DatabaseSchemaName}.mt_events where mt_events.timestamp >= :timestamp");
Collaborator

We should consider an indexing strategy for those columns (at least as opt-in). For a huge number of events, those queries may take some time. On the other hand, indexes will slow down appending...
We could also consider (or at least validate) whether dedicated materialized views could help here.

Collaborator

Also, I'm not sure (yet) about all the assumptions (as I didn't go through all of the changes), but we might think about introducing a numeric offset commit instead of using the timestamp. Timestamps are dangerous in multi-node environments, which can be hit by time skew issues between nodes (and that may lead, e.g., to skipping some events in the worst edge case). From my experience, using numbers is more predictable and easier to troubleshoot. It may also be useful for potential conflict management.

jeremydmiller
Member Author

The timestamp is assigned by the database, and the only possible usage of it is in this one case to detect stale, missing values in the sequence. We don't key off that otherwise.

And yes to the indexing. Another option is to shard the table by sequence range. I don't see how a materialized view would help, given the frequency at which this would be getting hit.
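
For reference, an opt-in index along the lines discussed above might look roughly like this. This is a hypothetical sketch, not part of the PR; it assumes the default "public" schema, an invented index name, and BRIN as one possible low-write-cost choice:

// Hypothetical sketch of the opt-in index idea, not something shipped in this PR.
// Assumes the default "public" schema; the index name and BRIN choice are illustrative.
using System.Threading.Tasks;
using Npgsql;

public static class TimestampIndexSketch
{
    public static async Task EnsureIndex(string connectionString)
    {
        await using var conn = new NpgsqlConnection(connectionString);
        await conn.OpenAsync();

        // BRIN suits an append-only, roughly time-ordered column and is cheap to
        // maintain, which matters because indexes slow down appending.
        var sql = @"create index concurrently if not exists idx_mt_events_timestamp
                    on public.mt_events using brin (timestamp)";

        using var cmd = new NpgsqlCommand(sql, conn);
        await cmd.ExecuteNonQueryAsync();
    }
}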

break;

case HighWaterStatus.Stale:
var safeHarborTime = _current.Timestamp.Add(_settings.LeadingEdgeBuffer);
Collaborator

Same consideration as above about timestamp vs. numeric offset and time skew: #1712 (comment).

jeremydmiller
Member Author

Same comment. This is to slide around "stale missing sequence" values. We're not using this in any way where we need precision. The "safe harbor" time would be something like "just assume anything older than 5 seconds is good". Easier to talk about this in Zoom.

When the HighWaterDetector "finds" the statistics, it starts looking from the last previously known sequence. Keep that in mind here.

So the case here is:

  1. fetch the statistics once, the last good number is 1000, but the sequence is at 1300.
  2. Fetch the statistics again on the regular polling interval: the last good number is still 1000, but the sequence has advanced to 1500. We can tell that there's stale data, where transactions are either finishing up really slowly (but have already reserved the sequence numbers) or have failed. The current async daemon sucks because you always have to be very careful about whether missing sequence numbers are just in flight or the result of a failure. The new tombstone event thing should make that problem mostly go away, but only mostly. So at some point we have to assume that it's safe to pick up any events older than the "safe harbor time".
  3. Wait a calculated moment, then look for the highest sequence number with no gaps newer than the "safe harbor" timestamp.

At no time are we using that timestamp as the determination for what events have been published or not, except for the stale sequence gap issue, which shouldn't happen unless you get funky database connectivity issues. And even there, you don't need a lot of precision in the timestamp values, so I'm not concerned with the node drift issue.
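
A simplified, hypothetical sketch of the "safe harbor" arithmetic described above, mirroring the shape of the _findSafeSequence query quoted earlier in this thread. The method and parameter names are invented, the default "public" schema is assumed, and this is not the PR's actual HighWaterDetector logic:

// Simplified sketch of the "safe harbor" idea -- hypothetical names, not the
// actual HighWaterDetector code. It mirrors the shape of the _findSafeSequence
// query: find the first event newer than the safe harbor cutoff and treat
// everything below it as settled.
using System;
using System.Threading.Tasks;
using Npgsql;

public static class SafeHarborSketch
{
    public static async Task<long> FindSafeHighWaterMark(
        NpgsqlConnection conn, long lastMark, DateTimeOffset safeHarborTime)
    {
        // Assumes the default "public" schema for mt_events
        var sql = "select min(seq_id) from public.mt_events where mt_events.timestamp >= :timestamp";

        using var cmd = new NpgsqlCommand(sql, conn);
        cmd.Parameters.AddWithValue("timestamp", safeHarborTime);

        var result = await cmd.ExecuteScalarAsync();

        // No events newer than the cutoff: keep the existing mark. Otherwise,
        // everything below the first "too new" event is assumed good, even if
        // there are gaps left by failed or abandoned transactions.
        return result is long firstUnsafe
            ? Math.Max(lastMark, firstUnsafe - 1)
            : lastMark;
    }
}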

return statistics;
}

private async Task calculateHighWaterMark(CancellationToken token, HighWaterStatistics statistics,
Collaborator

SUGGESTION: I think it would be good to keep those calculations in the HighWaterStatistics class and make the setters for those properties private. It would be easier to track what's changed where in the future.
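
For illustration, the kind of encapsulation being suggested might look roughly like this; the property names and the calculation itself are hypothetical, not the actual HighWaterStatistics class:

// Rough illustration of the suggestion above -- property names and logic are
// hypothetical, not the actual HighWaterStatistics shape. The point is that the
// class owns the calculation and the setters stay private.
using System;

public class HighWaterStatisticsSketch
{
    public long LastMark { get; private set; }
    public long HighestSequence { get; private set; }
    public long SafeStartMark { get; private set; }
    public DateTimeOffset Timestamp { get; private set; }

    // The calculation lives with the data it mutates, so callers can't
    // update the properties piecemeal from the detector.
    public void CalculateHighWaterMark(long highestSequence, long safeStartMark, DateTimeOffset timestamp)
    {
        HighestSequence = highestSequence;
        SafeStartMark = safeStartMark;
        Timestamp = timestamp;

        // If there are no gaps, the mark can advance straight to the head of the sequence
        LastMark = safeStartMark >= highestSequence ? highestSequence : safeStartMark;
    }
}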

jeremydmiller
Member Author

I completely understand why you're saying that, but I think that's just some extra complexity that would never pay off.

jeremydmiller merged commit a482722 into master on Jan 25, 2021
jeremydmiller deleted the daemon branch on January 25, 2021, 14:46
Leh2 added a commit to Leh2/marten that referenced this pull request Apr 27, 2022
jeremydmiller pushed a commit that referenced this pull request May 2, 2022