
Overhaul Command Execution Pipeline #2887

Closed
jeremydmiller opened this issue Jan 3, 2024 · 9 comments
jeremydmiller commented Jan 3, 2024

I'm subsuming a couple of older issues here about retry policies (#2265), Hot Chocolate integration (#2738), and more advanced Npgsql usage (#2787).


jeremydmiller commented Jan 3, 2024

Goals

  • Make the retry mechanics more efficient. Depending on lambda closures generates a lot of object allocations that we could eliminate
  • Utilize ADO.NET command batching (NpgsqlBatch), as that's more efficient than our old CommandBuilder mechanism from Marten 1.0
  • Improve throughput by moving to positional parameters in the SQL construction. The only hurdle is the tenant id parameter mechanics
  • The current retry mechanics don't actually work because they don't reset the connection
  • Reevaluate the database connection lifecycle, especially the sticky connection-to-session mechanics from the 1.0 days
  • Improve Marten's usability within Hot Chocolate, which might mean ending the sticky connection behavior
  • Reevaluate whether and how we allow users to execute arbitrary SQL using a session. It's easy to queue up commands now with QueueSqlCommand(). See what folks actually do now.
  • Reevaluate whether or not we even care about Dapper integration anymore
  • Maybe encourage folks in the docs to use Task<int> ExecuteAsync(NpgsqlCommand command, CancellationToken token = default)
  • Use prepared statements on everything?
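To make the closure-allocation point concrete, here's a minimal sketch of the difference between capturing and state-passing delegate signatures. The names are hypothetical, not anything in Marten today:

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical sketch: the shape of the retry entry point decides whether
// callers allocate a closure on every execution.
public static class RetrySketch
{
    // Capturing version: calling this with () => DoWork(cmd) forces the
    // compiler to allocate a closure object to capture `cmd` on every call
    public static Task<int> Execute(Func<Task<int>> operation) => operation();

    // Allocation-friendly version: state is passed explicitly, so callers can
    // use a static lambda that the compiler caches in a single delegate instance
    public static Task<int> Execute<TState>(TState state, Func<TState, Task<int>> operation)
        => operation(state);
}
```

With the second shape, a call like `RetrySketch.Execute(cmd, static c => c.ExecuteNonQueryAsync())` captures nothing and allocates no closure.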

Changing Session Execution Mechanics

The execution of commands today is done with a combination of IConnectionLifetime implementations that do very low-level ADO.NET manipulation, with a couple of different flavors for Marten-owned transactions and ambient transactions, among other options.

Proposal:

Replace the existing IConnectionLifetime abstraction with this new one:

// Think of a better name maybe, but I'm proposing this for the
// next version
public interface ISessionExecutor
{
    // These methods are on IQuerySession today. Not sure, but it might be best
    // to have them here.

    // Wrap *ALL* transactional boundary stuff here, so this manages all
    // transactional boundaries
    Task ExecutePagesAsync(IReadOnlyList<OperationPage> pages, IMartenSessionLogger logger, CancellationToken token);

    // And also overloads of these four for NpgsqlBatch

    int Execute(NpgsqlCommand cmd);
    Task<int> ExecuteAsync(NpgsqlCommand command, CancellationToken token = default);
    DbDataReader ExecuteReader(NpgsqlCommand command);
    Task<DbDataReader> ExecuteReaderAsync(NpgsqlCommand command, CancellationToken token = default);
}

The big difference is that implementations of this new ISessionExecutor will completely manage the connection or DbDataSource mechanics and implement the transactional boundary mechanics. Any new retry policy implementation will work outside of this abstraction, so there will be no granular retry on just the commit or rollback calls. This approach may easily lead to some duplication between implementations as we pull a lot of the current workflow out of QuerySession.ExecuteDataReaderAsync()-type methods. I say that's worth it for the simplification.

As part of that change, the current methods in QuerySession for executing commands get a lot simpler, mostly just delegating to whatever the retry mechanics end up being and the inner ISessionExecutor.

As for ISessionExecutor, I think we have these implementations:

  • DefaultSessionExecutor -- replaces our old MartenControlledConnectionTransaction, but this time uses DbDataSource for all read queries and does a quick open connection, execute batch, commit or rollback, then close connection for applying updates. That bypasses the whole sticky connection thing we have today that causes us so much havoc with Hot Chocolate integration. It also makes Marten less problematic for folks who just can't get their crap together to call Dispose() on sessions.
  • SerializedSessionExecutor for folks who start a session with a serializable transaction
  • ExternalSessionExecutor to replace the existing ExternalTransaction that wraps around user-supplied connections and transactions. May or may not close the underlying connection
  • AmbientTransactionExecutor to replace AmbientTransactionLifetime
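To make the DefaultSessionExecutor idea concrete, here's a rough sketch of the short-lived write path described above, using Npgsql's DbDataSource API. The names and flow are illustrative only, not the final implementation:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Npgsql;

// Illustrative sketch of the "open, execute batch, commit or rollback, close"
// write path. No sticky connection: the connection lives only for this call
// and returns to the pool on dispose.
public static class DefaultExecutorSketch
{
    public static async Task ExecuteUpdatesAsync(
        NpgsqlDataSource source, NpgsqlBatch batch, CancellationToken token)
    {
        await using var conn = await source.OpenConnectionAsync(token);
        await using var tx = await conn.BeginTransactionAsync(token);
        try
        {
            batch.Connection = conn;
            batch.Transaction = tx;
            await batch.ExecuteNonQueryAsync(token);
            await tx.CommitAsync(token);
        }
        catch
        {
            await tx.RollbackAsync(token);
            throw;
        }
        // conn is disposed here even if the caller forgets Dispose() on the session
    }
}
```

Read queries would skip the transaction entirely and just borrow a pooled connection from the DbDataSource per query.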

Error Handling

This interface has to go. No graceful [Obsolete] period, just gone:

public interface IRetryPolicy
{
    void Execute(Action operation);
    TResult Execute<TResult>(Func<TResult> operation);
    Task ExecuteAsync(Func<Task> operation, CancellationToken cancellationToken);
    Task<TResult> ExecuteAsync<TResult>(Func<Task<TResult>> operation, CancellationToken cancellationToken);
}

We'll have to benchmark it to know for sure, but I'd guess that this is a relatively high drag on our performance because of all the object allocations caused by the captured closures. I have two possible suggestions:

  1. Use Polly internally and make sure we've got a way to allow users to configure the Polly policies for Marten. Build in a decent default. Make damn sure that we're using static lambdas whenever we do use Polly
  2. Rip off the error-handling IContinuation model from Wolverine, specifically how it handles errors in inline executions

I was leery of using Polly before because its ubiquity makes it a common diamond-dependency problem. Honestly, I'd rather just bake it in and make sure it has a default exponential backoff policy for common transient issues. That makes our documentation effort much simpler.
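If we go the Polly route, a sketch of what the baked-in default might look like with Polly v8's ResiliencePipeline. The attempt counts, delays, and predicate here are placeholders, not a proposed final policy:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Npgsql;
using Polly;
using Polly.Retry;

// Sketch only: a default pipeline with exponential backoff on transient
// Npgsql failures. Numbers are illustrative.
public static class MartenResilienceSketch
{
    public static readonly ResiliencePipeline Default = new ResiliencePipelineBuilder()
        .AddRetry(new RetryStrategyOptions
        {
            MaxRetryAttempts = 3,
            BackoffType = DelayBackoffType.Exponential,
            Delay = TimeSpan.FromMilliseconds(50),
            ShouldHandle = new PredicateBuilder().Handle<NpgsqlException>(e => e.IsTransient)
        })
        .Build();

    // Static lambda + explicit state avoids the closure allocations that the
    // old IRetryPolicy signatures force on every call
    public static ValueTask<int> ExecuteAsync(NpgsqlCommand command, CancellationToken token)
        => Default.ExecuteAsync(
            static async (cmd, ct) => await cmd.ExecuteNonQueryAsync(ct),
            command, token);
}
```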

Necessary Spikes

  • Play with NpgsqlBatch and see what its limitations are with a really big number of queries. Do we need to page it like we do today?
  • Use @gfoidl's BatchBuilder and see if that could work in Weasel around DbBatch
  • Plan on adding some kind of tags to CommandBuilder or the new BatchBuilder as a way of tracking whether or not we've used the tenant_id parameter or need to. That might reduce some overhead over what we do now
  • Try to use positional parameters in CommandBuilder or the new BatchBuilder. Figure out how to eliminate the current named parameter usage for tenant id
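For reference, a sketch of positional parameters with NpgsqlBatch: Npgsql binds unnamed parameters positionally to $1, $2, and so on. The table name and predicate here are illustrative:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Npgsql;

public static class PositionalSketch
{
    // Illustrative only: unnamed NpgsqlParameters bind by position to $1, $2, ...
    public static async Task ReadAsync(
        NpgsqlConnection connection, string tenantId, Guid userId, CancellationToken token)
    {
        var batch = new NpgsqlBatch(connection);
        var cmd = new NpgsqlBatchCommand(
            "select data from mt_doc_user where tenant_id = $1 and id = $2");
        cmd.Parameters.Add(new NpgsqlParameter { Value = tenantId });
        cmd.Parameters.Add(new NpgsqlParameter { Value = userId });
        batch.BatchCommands.Add(cmd);

        await using var reader = await batch.ExecuteReaderAsync(token);
        // ... read rows as usual
    }
}
```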


gfoidl commented Jan 4, 2024

Improve throughput by moving to positional parameters in the SQL construction. Only hurdle is the tenant id parameter mechanics

Naive idea: always use the tenant id, but if conjoined tenancy is disabled, use the *Default* tenancy.
Is that feasible?

QueueSqlCommand(). See what folks actually do now.

I use QueueSqlCommand and I hope it is kept, as it runs within the same transaction. Actually, it's the "same transaction" aspect that counts.

If outside of a session, then I just use the NpgsqlDataSource (from which I get the connection in Marten 6.x).

Use prepared statements on everything?

I'd use that only for event-related operations, not for everything by default. IIRC too much preparation can hurt too, but I might be wrong here.
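A middle ground worth noting: Npgsql's automatic preparation only prepares statements that get reused, rather than preparing everything up front. It's configured on the connection string; the values here are illustrative:

```csharp
using Npgsql;

// Sketch: limit preparation to hot statements via automatic preparation.
// `connectionString` is a placeholder for the application's real string.
var connectionString = "Host=localhost;Database=marten_testing";
var builder = new NpgsqlConnectionStringBuilder(connectionString)
{
    MaxAutoPrepare = 20,      // at most 20 automatically prepared statements
    AutoPrepareMinUsages = 5  // only prepare a statement after 5 uses
};
```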

@jeremydmiller (Member Author)

@gfoidl The tenant_id column doesn't exist unless you opt into conjoined tenancy, so no, that's not feasible. And even after that, there's the possibility of AnyTenant() queries, so it wouldn't work even if the column were always there. It's always something :-)

I agree with you about QueueSqlCommand(), but the main reason we still hang the Connection property off of the session is either for folks using Dapper with Marten or for folks who depended on it before QueueSqlCommand() existed.


oskardudycz commented Jan 5, 2024

@jeremydmiller, dumb idea: what if we add the tenant id to all tables? Is the issue with tenant id that we always need conditional code for whether to add it or not?

Regarding the scope of changes, I'm definitely fine with replacing the homebrewed IRetryPolicy with Polly, especially after its recent improvements and its internal use by Microsoft.

Regarding the other changes, I'm unsure, as I'd need to understand the proposal in more detail. What I can say is that creating a new connection is costly, and not sharing one can lead to performance degradation. Another thing to add is that we should be really careful about that in the default retry policy. I've seen a few times when wrong settings caused a ripple effect while the database was down, not even allowing it to come back up because it was being flooded with open connections. We should definitely be more explicit with our users about pooling, and test how it behaves both with the Npgsql pool and with PgBouncer.
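A sketch of the Npgsql pooling knobs that documentation and tests would need to cover; the values here are illustrative, not recommendations:

```csharp
using Npgsql;

// Sketch: pool behavior is controlled on the connection string.
// `connectionString` is a placeholder for the application's real string.
var connectionString = "Host=localhost;Database=marten_testing";
var builder = new NpgsqlConnectionStringBuilder(connectionString)
{
    Pooling = true,              // on by default
    MinPoolSize = 0,
    MaxPoolSize = 100,           // the Npgsql default; caps connections under retry storms
    ConnectionIdleLifetime = 300 // seconds before idle connections are pruned
};
```

Capping MaxPoolSize matters directly for the retry-storm scenario above: it bounds how hard a retrying application can hammer a recovering database.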

I'd be happy to get rid of the CommandBuilder static methods in Weasel. Using the static PostgreSQLProvider will bite us hard in the long term, so if this could help with that, I'd be more than happy.

So I'm not against the proposal; I'd just need to see more details. We'd need to spend time benchmarking it. I think it'd be good to take some inspiration from EF on how they're dealing with this, as I expect they have it optimised.

The other thing to consider (potentially out of scope) is distinguishing querying from update operations. By that I mean that if we know operations are read-only, it'd be great to be able to use read replicas or optimizations at the connection level. That can bring significant performance improvements. Maybe we should also include that in the API?

I'm also up for using prepared statements; they can increase performance. We'd need a good round of tests though, as wrongly cached statements can create random issues.

@jeremydmiller (Member Author)

@oskardudycz, thanks for the review and the comments!

  • I'll have to think on the tenant_id column by default. It jacks up the storage when it's not necessary, and I don't think it really helps us on the querying side. It would simplify some existing conditional logic and code generation today, but changing that is going to be a bit of overhead
  • I think it would be worthwhile to resurrect the old BenchmarkDotNet suite we had a million years ago. Start with some kind of baseline on the existing 6.* branch. I'd thought about that, but dread dealing with BenchmarkDotNet
  • "I'd be happy to get rid of the CommandBuilder static methods in Weasel" -- which static methods? The extension methods, or something else?
  • Regardless, we'll want a new BatchBuilder replacement for the existing CommandBuilder. I think this time we don't try to share code between SQL Server and PostgreSQL, especially since the PostgreSQL version will use positional parameters that I can't tell whether SQL Server even supports

"Also the other thing to consider (potentially out of scope) is to distinguish querying from update operations. I mean by that, that if we know that operations are read-only then it'd be great to be able to use read replicas or optimisation on the connection level. That can have significant performance improvements. Maybe we should also include that in the API?"

-- I'd just have to learn a bit more about what that would mean. Sounds good on the face of things


jeremydmiller commented Jan 9, 2024

Tasks on Just Batching / Positional Parameters (Jan. 8th)

  • Lift StartNewCommand() from BatchQuery to ICommandBuilder
  • Statement should call StartNewCommand() on apply if its parent is not a CTE
  • Switch all methods in CommandExtensions to build up an NpgsqlBatch with BatchBuilder
  • Create a new method in QuerySession to execute a reader from NpgsqlBatch
  • Reapply logging to QuerySession batch handling methods
  • Add Polly error handling to connection lifetime classes
  • CompiledQueryPlan will have to be batch aware. Ugh.
  • Adjust the code generation of each ICompiledQueryAwareFilter to use the parameters variable name
  • Change compiled query types to use BatchedQuery
  • Might have to change JasperFx.CodeGeneration to allow for many more constants


jeremydmiller commented Jan 19, 2024

Punchlist

  • Execute benchmarks against 6.4 and the latest code. Jeremy has a local project that's already started this work
  • Utilize Polly in all the IConnectionLifetime implementations. Be careful to use static lambdas in all cases
  • Fine tune the Polly predicate. Utilize exponential backoff in the default
  • Documentation in migration guide for connection lifecycle changes
  • Documentation for customizing Polly resiliency
  • New configuration for StoreOptions.CommandTimeout as an alternative to the connection string approach

@jeremydmiller (Member Author)

See this draft PR: #2918

It hasn't been rebased on master yet, and the punchlist above is still to go.

@jeremydmiller (Member Author)

There are separate issues now for the docs and last couple tasks. Closing for now.
