
Overhaul Command Execution Pipeline #2887

Closed
jeremydmiller opened this issue Jan 3, 2024 · 9 comments
jeremydmiller commented Jan 3, 2024

I'm subsuming a couple of older issues here about retry policies (#2265), Hot Chocolate integration (#2738), and more advanced Npgsql usage (#2787).


jeremydmiller commented Jan 3, 2024

Goals

  • Make the retry mechanics more efficient. Depending on lambda closures generates a lot of object allocations that we could eliminate
  • Utilize ADO.NET command batching (NpgsqlBatch), as that's more efficient than our old CommandBuilder mechanism from Marten 1.0
  • Improve throughput by moving to positional parameters in the SQL construction. The only hurdle is the tenant id parameter mechanics
  • The current retry mechanics don't actually work because they don't reset the connection
  • Reevaluate the database connection lifecycle, especially the sticky connection-to-session mechanics from the 1.0 days
  • Improve Marten's usability within Hot Chocolate, which might mean ending the sticky connection behavior
  • Reevaluate whether and how we allow users to execute arbitrary SQL using a session. It's easy to queue up commands now with QueueSqlCommand(). See what folks actually do now.
  • Reevaluate whether or not we even care about Dapper integration anymore
  • Maybe encourage folks in the docs to use Task<int> ExecuteAsync(NpgsqlCommand command, CancellationToken token = default)
  • Use prepared statements on everything?
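To make the closure-allocation point concrete, here's a minimal sketch of the difference between capturing and state-passing delegate signatures. The names are hypothetical, not anything in Marten today:

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical sketch: the shape of the retry entry point decides whether
// callers allocate a closure on every execution.
public static class RetrySketch
{
    // Capturing version: calling this with () => DoWork(cmd) forces the
    // compiler to allocate a closure object to capture `cmd` on every call
    public static Task<int> Execute(Func<Task<int>> operation) => operation();

    // Allocation-friendly version: state is passed explicitly, so callers can
    // use a static lambda that the compiler caches in a single delegate instance
    public static Task<int> Execute<TState>(TState state, Func<TState, Task<int>> operation)
        => operation(state);
}
```

With the second shape, a call like `RetrySketch.Execute(cmd, static c => c.ExecuteNonQueryAsync())` captures nothing and allocates no closure.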

Changing Session Execution Mechanics

The execution of commands today is done with a combination of IConnectionLifetime implementations that do very low-level ADO.NET manipulation, with a couple of different flavors for Marten-owned transactions and ambient transactions, among other options.

Proposal:

Replace the existing IConnectionLifetime abstraction with this new one:

// Think of a better name maybe, but I'm proposing this for the
// next version
public interface ISessionExecutor
{
    // These methods are on IQuerySession today. Not sure, but it might be best
    // to have them here.

    // Wrap *ALL* transactional boundary stuff here, so this manages all
    // transactional boundaries
    Task ExecutePagesAsync(IReadOnlyList<OperationPage> pages, IMartenSessionLogger logger, CancellationToken token);

    // And also overloads of these four for NpgsqlBatch

    int Execute(NpgsqlCommand cmd);
    Task<int> ExecuteAsync(NpgsqlCommand command, CancellationToken token = default);
    DbDataReader ExecuteReader(NpgsqlCommand command);
    Task<DbDataReader> ExecuteReaderAsync(NpgsqlCommand command, CancellationToken token = default);
}

The big difference is that implementations of this new ISessionExecutor will completely manage the connection or DbDataSource mechanics and implement the transactional boundary mechanics. Any new retry policy implementation will work outside of this abstraction, so there will be no granular retry on just the commit or rollback calls. This approach may easily lead to some duplication between implementations as we pull a lot of the current workflow out of QuerySession.ExecuteDataReaderAsync()-type methods. I say that's worth it for the simplification.

As part of that change, the current methods in QuerySession for executing commands get a lot simpler, mostly just delegating to whatever the retry mechanics end up being and the inner ISessionExecutor.

As for ISessionExecutor, I think we have these implementations:

  • DefaultSessionExecutor -- replaces our old MartenControlledConnectionTransaction, but this time uses DbDataSource for all read queries and does a quick open connection, execute batch, commit or rollback, then close connection for applying updates. That bypasses the whole sticky connection thing we have today that causes us so much havoc with Hot Chocolate integration. It also makes Marten less problematic for folks who just can't get their crap together to call Dispose() on sessions.
  • SerializedSessionExecutor for folks who start a session with a serializable transaction
  • ExternalSessionExecutor to replace the existing ExternalTransaction that wraps around user-supplied connections and transactions. May or may not close the underlying connection
  • AmbientTransactionExecutor to replace AmbientTransactionLifetime
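To make the DefaultSessionExecutor idea concrete, here's a rough sketch of the short-lived write path described above, using Npgsql's DbDataSource API. The names and flow are illustrative only, not the final implementation:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Npgsql;

// Illustrative sketch of the "open, execute batch, commit or rollback, close"
// write path. No sticky connection: the connection lives only for this call
// and returns to the pool on dispose.
public static class DefaultExecutorSketch
{
    public static async Task ExecuteUpdatesAsync(
        NpgsqlDataSource source, NpgsqlBatch batch, CancellationToken token)
    {
        await using var conn = await source.OpenConnectionAsync(token);
        await using var tx = await conn.BeginTransactionAsync(token);
        try
        {
            batch.Connection = conn;
            batch.Transaction = tx;
            await batch.ExecuteNonQueryAsync(token);
            await tx.CommitAsync(token);
        }
        catch
        {
            await tx.RollbackAsync(token);
            throw;
        }
        // conn is disposed here even if the caller forgets Dispose() on the session
    }
}
```

Read queries would skip the transaction entirely and just borrow a pooled connection from the DbDataSource per query.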

Error Handling

This interface has to go. No graceful [Obsolete] period, just gone:

public interface IRetryPolicy
{
    void Execute(Action operation);
    TResult Execute<TResult>(Func<TResult> operation);
    Task ExecuteAsync(Func<Task> operation, CancellationToken cancellationToken);
    Task<TResult> ExecuteAsync<TResult>(Func<Task<TResult>> operation, CancellationToken cancellationToken);
}

We'll have to benchmark it to know for sure, but I'd guess that this is a relatively high drag on our performance because of all the object allocations caused by the captured closures. I have two possible suggestions:

  1. Use Polly internally and make sure we've got a way to allow users to configure the Polly policies for Marten. Build in a decent default. Make damn sure that we're using static lambdas whenever we do use Polly
  2. Rip off the error-handling IContinuation model from Wolverine, specifically how it handles errors in inline executions

I was leery of using Polly before because its ubiquity makes it a common diamond-dependency problem. Honestly, I'd rather just bake it in and make sure it has a default exponential backoff policy for common transient issues. That makes our documentation effort much simpler.
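If we go the Polly route, a sketch of what the baked-in default might look like with Polly v8's ResiliencePipeline. The attempt counts, delays, and predicate here are placeholders, not a proposed final policy:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Npgsql;
using Polly;
using Polly.Retry;

// Sketch only: a default pipeline with exponential backoff on transient
// Npgsql failures. Numbers are illustrative.
public static class MartenResilienceSketch
{
    public static readonly ResiliencePipeline Default = new ResiliencePipelineBuilder()
        .AddRetry(new RetryStrategyOptions
        {
            MaxRetryAttempts = 3,
            BackoffType = DelayBackoffType.Exponential,
            Delay = TimeSpan.FromMilliseconds(50),
            ShouldHandle = new PredicateBuilder().Handle<NpgsqlException>(e => e.IsTransient)
        })
        .Build();

    // Static lambda + explicit state avoids the closure allocations that the
    // old IRetryPolicy signatures force on every call
    public static ValueTask<int> ExecuteAsync(NpgsqlCommand command, CancellationToken token)
        => Default.ExecuteAsync(
            static async (cmd, ct) => await cmd.ExecuteNonQueryAsync(ct),
            command, token);
}
```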

Necessary Spikes

  • Play with NpgsqlBatch and see what its limitations are with a really big number of queries. Do we need to page it like we do today?
  • Use @gfoidl's BatchBuilder and see if that could work in Weasel around DbBatch
  • Plan on adding some kind of tags to CommandBuilder or the new BatchBuilder as a way of tracking whether or not we've used the tenant_id parameter or need to. That might reduce some overhead over what we do now
  • Try to use positional parameters in CommandBuilder or the new BatchBuilder. Figure out how to eliminate the current named parameter usage for tenant id
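For reference, a sketch of positional parameters with NpgsqlBatch: Npgsql binds unnamed parameters positionally to $1, $2, and so on. The table name and predicate here are illustrative:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Npgsql;

public static class PositionalSketch
{
    // Illustrative only: unnamed NpgsqlParameters bind by position to $1, $2, ...
    public static async Task ReadAsync(
        NpgsqlConnection connection, string tenantId, Guid userId, CancellationToken token)
    {
        var batch = new NpgsqlBatch(connection);
        var cmd = new NpgsqlBatchCommand(
            "select data from mt_doc_user where tenant_id = $1 and id = $2");
        cmd.Parameters.Add(new NpgsqlParameter { Value = tenantId });
        cmd.Parameters.Add(new NpgsqlParameter { Value = userId });
        batch.BatchCommands.Add(cmd);

        await using var reader = await batch.ExecuteReaderAsync(token);
        // ... read rows as usual
    }
}
```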


gfoidl commented Jan 4, 2024

Improve throughput by moving to positional parameters in the SQL construction. Only hurdle is the tenant id parameter mechanics

Naive idea: always use the tenant id, but if conjoined tenancy is disabled, use the *Default* tenancy.
Is that feasible?

QueueSqlCommand(). See what folks actually do now.

I use QueueSqlCommand and I hope it is kept, as it runs within the same transaction. Actually, it's the "same transaction" aspect that counts.

If outside of a session, then I just use the NpgsqlDataSource (from which I get the connection in Marten 6.x).

Use prepared statements on everything?

I'd use that only for event-related operations, not for everything by default. IIRC too much preparation can hurt too, but I might be wrong here.
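A middle ground worth noting: Npgsql's automatic preparation only prepares statements that get reused, rather than preparing everything up front. It's configured on the connection string; the values here are illustrative:

```csharp
using Npgsql;

// Sketch: limit preparation to hot statements via automatic preparation.
// `connectionString` is a placeholder for the application's real string.
var connectionString = "Host=localhost;Database=marten_testing";
var builder = new NpgsqlConnectionStringBuilder(connectionString)
{
    MaxAutoPrepare = 20,      // at most 20 automatically prepared statements
    AutoPrepareMinUsages = 5  // only prepare a statement after 5 uses
};
```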

@jeremydmiller (Member Author)

@gfoidl The tenant_id column doesn't exist unless you opt into conjoined tenancy, so no, that's not feasible. And even after that, there's the possibility of AnyTenant() queries, so it wouldn't work even if the column were always there. It's always something :-)

I agree with you about QueueSqlCommand(), but the main reason we still hang the Connection property off of the session is either for folks using Dapper with Marten or for folks who depended on it before QueueSqlCommand() existed.


oskardudycz commented Jan 5, 2024

@jeremydmiller, dumb idea: what if we add the tenant id to all tables? Is the issue with tenant id that we always need conditional code for whether to add it or not?

Regarding the scope of changes, I'm definitely fine with replacing the homebrewed IRetryPolicy with Polly, especially after its recent improvements and its internal use by Microsoft.

Regarding the other changes, I'm unsure, as I'd need to understand the proposal in more detail. What I can say is that creating a new connection is costly, and not sharing one can lead to performance degradation. Another thing to add is that we should be really careful about that in the default retry policy. I've seen a few times when wrong settings caused a ripple effect while the database was down, not even allowing it to come back up because it was being flooded with open connections. We should definitely be more explicit with our users about pooling, and test how it behaves both with the Npgsql pool and with PgBouncer.
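A sketch of the Npgsql pooling knobs that documentation and tests would need to cover; the values here are illustrative, not recommendations:

```csharp
using Npgsql;

// Sketch: pool behavior is controlled on the connection string.
// `connectionString` is a placeholder for the application's real string.
var connectionString = "Host=localhost;Database=marten_testing";
var builder = new NpgsqlConnectionStringBuilder(connectionString)
{
    Pooling = true,              // on by default
    MinPoolSize = 0,
    MaxPoolSize = 100,           // the Npgsql default; caps connections under retry storms
    ConnectionIdleLifetime = 300 // seconds before idle connections are pruned
};
```

Capping MaxPoolSize matters directly for the retry-storm scenario above: it bounds how hard a retrying application can hammer a recovering database.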

I'd be happy to get rid of the CommandBuilder static methods in Weasel. Using the static PostgreSQLProvider will bite us hard in the long term, so if this could help with that, I'd be more than happy.

So I'm not against the proposal; I'd just need to see more details. We'd need to spend time benchmarking it. I think it'd be good to take some inspiration from EF on how they're dealing with this, as I expect they have it optimised.

The other thing to consider (potentially out of scope) is distinguishing querying from update operations. By that I mean that if we know operations are read-only, it'd be great to be able to use read replicas or optimizations at the connection level. That can bring significant performance improvements. Maybe we should also include that in the API?

I'm also up for using prepared statements; they can increase performance. We'd need a good round of tests though, as wrongly cached statements can create random issues.

@jeremydmiller (Member Author)

@oskardudycz, thanks for the review and the comments!

  • I'll have to think on the tenant_id column by default. It jacks up the storage when it's not necessary, and I don't think it really helps us on the querying side. It would simplify some existing conditional logic and code generation today, but changing that is going to be a bit of overhead
  • I think it would be worthwhile to resurrect the old BenchmarkDotNet suite we had a million years ago. Start with some kind of baseline on the existing 6.* branch. I'd thought about that, but dread dealing with BenchmarkDotNet
  • "I'd be happy to get rid of the CommandBuilder static methods in Weasel" -- which static methods? The extension methods, or something else?
  • Regardless, we'll want a new BatchBuilder replacement for the existing CommandBuilder. I think this time we don't try to share code between SQL Server and PostgreSQL, especially since the PostgreSQL version will use positional parameters that I can't tell whether SQL Server even supports

"Also the other thing to consider (potentially out of scope) is to distinguish querying from update operations. I mean by that, that if we know that operations are read-only then it'd be great to be able to use read replicas or optimisation on the connection level. That can have significant performance improvements. Maybe we should also include that in the API?"

-- I'd just have to learn a bit more about what that would mean. Sounds good on the face of things


jeremydmiller commented Jan 9, 2024

Tasks on Just Batching / Positional Parameters (Jan. 8th)

  • Lift StartNewCommand() from BatchQuery to ICommandBuilder
  • Statement should call StartNewCommand() on apply if its parent is not a CTE
  • Switch all methods in CommandExtensions to build up an NpgsqlBatch with BatchBuilder
  • Create a new method in QuerySession to execute a reader from NpgsqlBatch
  • Reapply logging to QuerySession batch handling methods
  • Add Polly error handling to connection lifetime classes
  • CompiledQueryPlan will have to be batch aware. Ugh.
  • Adjust the code generation of each ICompiledQueryAwareFilter to use the parameters variable name
  • Change compiled query types to use BatchedQuery
  • Might have to change JasperFx.CodeGeneration to allow for many more constants


jeremydmiller commented Jan 19, 2024

Punchlist

  • Execute benchmarks against 6.4 and the latest code. Jeremy has a local project that's already started this work
  • Utilize Polly in all the IConnectionLifetime implementations. Be careful to use static lambdas in all cases
  • Fine tune the Polly predicate. Utilize exponential backoff in the default
  • Documentation in migration guide for connection lifecycle changes
  • Documentation for customizing Polly resiliency
  • New configuration for StoreOptions.CommandTimeout as an alternative to the connection string approach

@jeremydmiller (Member Author)

See this draft PR: #2918

It hasn't been rebased on master yet, and the punchlist above is still to go.

@jeremydmiller (Member Author)

There are separate issues now for the docs and last couple tasks. Closing for now.
