Description
This proposal has been approved in principle, subject to confirmation by several database providers that the API shape is good.
Issues tracking provider implementations:
- SqlClient: Implement optimized version of the new ADO.NET batching API (dotnet/SqlClient#19)
- Npgsql: Implement new ADO.NET batching API (npgsql/npgsql#2317)
- MySqlConnector: Implement new ADO.NET batching API (mysql-net/MySqlConnector#650)
This issue is based on previous discussions in #15375 and #17445.
Background
Batching multiple SQL statements can be critical for database performance, as they can be executed in a single roundtrip rather than waiting for each statement to complete before sending the next one. System.Data.Common doesn't currently provide a first-class API for this, but some providers (e.g. SqlClient) allow batching statements by concatenating them together into DbCommand.CommandText, separated by semicolons:
```csharp
var cmd = connection.CreateCommand();
cmd.CommandText = "INSERT INTO table VALUES (1); INSERT INTO table VALUES (2)";
```
The problem with this approach is that most databases require a separate protocol message for each statement (e.g. PostgreSQL, MySQL), forcing the database's ADO.NET provider to parse the SQL client-side and split it on semicolons. This is both unreliable (parsing SQL is difficult) and bad for performance: ideally, an ADO.NET provider should simply forward user-provided SQL to the database, without parsing or rewriting it (see #25022 for a related issue forcing providers to parse SQL).
As a specific case, SQL Server does natively support multi-statement commands, but even there this approach has performance drawbacks, detailed by @GSPP in dotnet/SqlClient#19. Also, as @bgribaudo pointed out, the current concatenation-based approach doesn't allow invoking multiple stored procedures in a batched fashion. SqlClient does include SqlCommandSet, which performs efficient, non-concatenation-based batching, but that class is internal and currently usable only via SqlDataAdapter, not for general usage (NHibernate apparently accesses it via reflection). This proposal would allow exposing SqlCommandSet via a public API.
Goals
- Provide a structured way to execute multiple SQL statements in a single roundtrip, without any need for client-side parsing of SQL.
- Keep the API consistent with other ADO.NET APIs, and specifically close to DbCommand (both are "executable"), to reduce the conceptual complexity for adoption.
- Allow mixing in different types of statements in the same batch (insert, update, select). The current concatenation approach supports this, and so does our reader API (multiple resultsets).
- Provide non-aggregated access to the number of rows affected, for each individual command in the batch.
Proposed API
```csharp
public abstract class DbBatch : IDisposable, IAsyncDisposable
{
    public DbBatchCommandCollection BatchCommands => DbBatchCommands;
    protected abstract DbBatchCommandCollection DbBatchCommands { get; }

    public DbBatchCommand CreateBatchCommand() => CreateDbBatchCommand();
    protected abstract DbBatchCommand CreateDbBatchCommand();

    #region Execution (mirrors DbCommand)

    // Delegates to ExecuteDbDataReader
    public DbDataReader ExecuteReader(CommandBehavior behavior = CommandBehavior.Default);
    protected abstract DbDataReader ExecuteDbDataReader(CommandBehavior behavior);

    // Delegates to ExecuteDbDataReaderAsync
    public Task<DbDataReader> ExecuteReaderAsync(CancellationToken cancellationToken = default);
    public Task<DbDataReader> ExecuteReaderAsync(CommandBehavior behavior, CancellationToken cancellationToken = default);
    protected abstract Task<DbDataReader> ExecuteDbDataReaderAsync(CommandBehavior behavior, CancellationToken cancellationToken);

    public abstract int ExecuteNonQuery();
    public abstract Task<int> ExecuteNonQueryAsync(CancellationToken cancellationToken = default);

    public abstract object ExecuteScalar();
    public abstract Task<object> ExecuteScalarAsync(CancellationToken cancellationToken = default);

    #endregion

    #region Execution properties (mirror DbCommand)

    public abstract int Timeout { get; set; }

    // Delegates to DbConnection
    public DbConnection Connection { get; set; }
    protected abstract DbConnection DbConnection { get; set; }

    // Delegates to DbTransaction
    public DbTransaction Transaction { get; set; }
    protected abstract DbTransaction DbTransaction { get; set; }

    #endregion

    #region Other methods mirroring DbCommand

    public abstract void Prepare();
    public abstract Task PrepareAsync(CancellationToken cancellationToken = default);
    public abstract void Cancel();

    #endregion

    #region Standard dispose pattern

    public void Dispose() { ... }
    protected virtual void Dispose(bool disposing) {}

    #endregion
}
```
```csharp
public class DbBatchCommandCollection : Collection<DbBatchCommand>
{
}

public abstract class DbBatchCommand
{
    public abstract string CommandText { get; set; }
    public abstract CommandType CommandType { get; set; }
    public abstract int RecordsAffected { get; }

    public DbParameterCollection Parameters => DbParameterCollection;
    protected abstract DbParameterCollection DbParameterCollection { get; }
}

public class DbConnection
{
    public DbBatch CreateBatch() => CreateDbBatch();
    protected virtual DbBatch CreateDbBatch() => throw new NotSupportedException();

    public DbBatchCommand CreateBatchCommand() => CreateDbBatchCommand();
    protected virtual DbBatchCommand CreateDbBatchCommand() => throw new NotSupportedException();

    // Covers both CreateBatch and CreateBatchCommand
    public virtual bool CanCreateBatch => false;
}

public class DbProviderFactory
{
    public virtual DbBatch CreateBatch() => throw new NotSupportedException();
    public virtual DbBatchCommand CreateBatchCommand() => throw new NotSupportedException();

    // Covers both CreateBatch and CreateBatchCommand
    public virtual bool CanCreateBatch => false;
}

public class DbException
{
    public DbBatchCommand BatchCommand => DbBatchCommand;
    protected virtual DbBatchCommand DbBatchCommand => null;
}
```
General usage and examples
Usage is fairly trivial and aligned with the existing DbCommand APIs. Users first create a new DbBatch, either by calling DbProviderFactory.CreateBatch() (or DbConnection.CreateBatch()), or by instantiating one directly. Commands are added to the batch, execution properties (e.g. Timeout) are set on it, and finally one of the Execute*() methods is called on it. Connection, Transaction and Timeout are specified on the DbBatch, just as they are set on DbCommand for un-batched operations.
Here is a code sample using DbProviderFactory for database portability:
```csharp
using var batch = dbProviderFactory.CreateBatch();

var cmd1 = dbProviderFactory.CreateBatchCommand();
cmd1.CommandText = "UPDATE table SET f1=@p1 WHERE f2=@p2";
var p1 = dbProviderFactory.CreateParameter();
var p2 = dbProviderFactory.CreateParameter();
p1.ParameterName = "p1";
p1.Value = 8;
p2.ParameterName = "p2";
p2.Value = 9;
cmd1.Parameters.Add(p1);
cmd1.Parameters.Add(p2);
batch.BatchCommands.Add(cmd1);

var cmd2 = dbProviderFactory.CreateBatchCommand();
cmd2.CommandText = "SELECT * FROM table WHERE f2=@p3";
var p3 = dbProviderFactory.CreateParameter();
p3.ParameterName = "p3";
p3.Value = 8;
cmd2.Parameters.Add(p3);
batch.BatchCommands.Add(cmd2);

batch.Connection = conn;
batch.Transaction = transaction;
using var reader = batch.ExecuteReader();
// reader contains a single resultset, from the SELECT
```
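The returned reader would then be consumed like any other DbDataReader. A minimal sketch based on the proposed API (column name f1 comes from the sample SQL above; we assume that, as with concatenated commands today, non-query statements such as the UPDATE contribute no resultsets):

```csharp
// Sketch: consuming the reader returned by batch.ExecuteReader() above.
// The reader is positioned on the SELECT's resultset.
while (reader.Read())
{
    var f1 = reader["f1"];
    // ... process the row
}
// A batch containing multiple SELECTs would expose further resultsets via
// reader.NextResult(), exactly as with a multi-statement DbCommand today.
```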
The verbosity of this API corresponds to how ADO.NET currently looks; general usability improvements are planned for later.
Here is a suggested code sample working against a specific provider and using initializers for better readability and terseness:
```csharp
using var batch = new XyzBatch
{
    Connection = conn,
    Transaction = transaction,
    BatchCommands =
    {
        new XyzBatchCommand("UPDATE table SET f1=@p1 WHERE f2=@p2")
        {
            Parameters =
            {
                new XyzParameter("p1", 8),
                new XyzParameter("p2", 9),
            }
        },
        new XyzBatchCommand("SELECT * FROM table WHERE f2=@p1")
        {
            Parameters =
            {
                new XyzParameter("p1", 8),
            }
        }
    }
};
using var reader = batch.ExecuteReader();
// reader contains a single resultset, from the SELECT
```
Design and naming motivations
The original proposal had the batch holding a list of DbCommand instead of the new DbBatchCommand type. The change, originally proposed by @bgribaudo, was motivated by the following reasons:
- DbCommand would have had some dead properties when executed as part of a batch (Connection, Transaction, CommandTimeout).
- DbCommand is disposable, raising various questions around its lifespan (does the command get disposed with the batch or not, is it OK for DbException to reference a disposable object...).
The name DbBatchCommand will hopefully convey the similarity between that type and DbCommand (both hold SQL, parameters and a CommandType), while at the same time keeping them distinct (DbBatchCommand isn't executable outside of a DbBatch). The name DbStatement was also considered, although it seems this would introduce more confusion:
- It's not directly identifiable as a batch component ("why can't I execute it?")
- On some providers (e.g. SqlClient) a single DbBatchCommand can still contain concatenation batching, so a DbStatement would actually contain multiple statements.
Affected rows
DbCommand has APIs for finding out how many rows were affected by the command:
- The return value of ExecuteNonQuery[Async](), which is an int
- The DbDataReader.RecordsAffected property
However, both of these provide an aggregate number; if the command contained more than one statement, it's currently impossible to find out how many rows were affected by each statement separately.
Providing non-aggregated affected rows could be done via an overload of ExecuteNonQuery[Async]() which returns an int[], or which accepts a user-provided int[] and populates it. Both approaches have their complexities (perf, array management...), and as we're introducing a new DbBatchCommand type dedicated to batching, we propose to simply add a RecordsAffected property on it instead. The provider would populate this property for each DbBatchCommand to indicate how many rows were affected by that command.
As with DbDataReader.RecordsAffected, this property would contain -1 for SELECT statements, and 0 if no rows were affected or the statement failed.
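Under this design, per-statement counts would be read back from the batch commands after execution. A minimal sketch based on the proposed API (the batch is assumed to be already populated and connected):

```csharp
// Sketch: retrieving per-command affected rows after executing a batch.
var total = batch.ExecuteNonQuery(); // aggregate count, as with DbCommand

foreach (DbBatchCommand command in batch.BatchCommands)
{
    // Per the proposal: -1 for SELECT statements, 0 if no rows were
    // affected or the statement failed.
    Console.WriteLine($"{command.CommandText}: {command.RecordsAffected} row(s)");
}
```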
Command behavior
DbCommand.ExecuteReader() accepts a CommandBehavior enum which allows execution behavior to be tweaked; DbBatch.ExecuteReader() accepts the same enum and applies it to the batch as a whole, as shown in the API listing above.
An earlier revision of this proposal instead placed a CommandBehavior property on each DbBatchCommand, on the theory that users may need different behaviors for different batch commands. That design required any non-default behavior to be set on each and every batch command, and supporting a batch-wide default on top of it would have required distinguishing that default from the enum's default value (CommandBehavior.Default), e.g. by making the property nullable. Since the per-command scenario seems rare and the introduced complexity didn't seem worth it, CommandBehavior was moved to DbBatch.ExecuteReader() (see the 2021-06-29 entry in the edit history).
Note that not all providers will support all command behaviors on batched commands. Also, CommandBehavior.CloseConnection makes no sense on batched commands except the last, and the provider should probably throw on it.
DbException.Command
Since we now execute batches, we should provide a way of letting users know which command in the batch triggered an exception. We propose to do this by introducing a new virtual property on DbException, called BatchCommand, pointing to the relevant DbBatchCommand instance. In non-batched command execution this property would be left unassigned.
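A consumer could then pinpoint the failing command like so (a sketch based on the proposed API; the handling logic is illustrative):

```csharp
try
{
    batch.ExecuteNonQuery();
}
catch (DbException e) when (e.BatchCommand is not null)
{
    // e.BatchCommand points at the DbBatchCommand whose execution failed
    Console.WriteLine($"Batch command failed: {e.BatchCommand.CommandText}");
}
```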
Notes on batch size management
In some cases, there may be some value in having the provider's batching implementation implicitly manage batch sizes. Examples:
- There may be some database hard limit (on the number of statements, on the number of parameters). The implementation could break down the user-provided batch into several sub-batches, to save the user the trouble of doing that. This would mean wrapping multiple internal readers with a single reader, which isn't trivial (although we could provide an implementation).
- Note: it seems that the SQL Server 2100-parameter limit will not apply here, as it applies to each individual statement rather than to the batch. So we're not sure we have a good real-world example.
- It may be inefficient to batch statements below a certain number (because of overheads associated with batches). In this case the implementation would simply not batch.
- For example, with current concatenation-based batching on SQL Server, batching 2 statements performs worse than not batching.
The pro here is to free the user from knowing low-level, database-specific details, transparently increasing perf and reducing friction for everyone (we're assuming the user has no special knowledge which would assist in the decision-making). The con here is more complexity and a general uncertainty that this is needed - any examples from the community would be helpful.
Regardless, this can be viewed as an implementation detail of a provider; although we can provide guidelines, at the end of the day each provider will implement the behavior they desire.
Backwards-compatibility
As a new API, no breakage is being introduced, but it has been proposed to provide a backwards compatibility shim for the new API that either performs concatenation-based batching, or simply executes without batching at all.
As of now, the decision is to not provide any sort of shim: trying to use the new batching API with providers that haven't implemented support for it will throw a NotSupportedException. The reasons for this are:
- Silently falling back to non-batched execution would give the false expectation/illusion of batching without delivering on it.
- Concatenation-based batching doesn't work if the different batched commands have named parameters with the same name (since parameter lists would need to be unified and names must be unique). This is expected to be a common scenario as multiple identical statements are executed with different parameter values (but identical names).
- We want to avoid the complexity and effort of implementing the shim. For example, a non-batching shim would need to somehow combine the readers from the individual commands into a single reader interface for user consumption. We don't think this is worth the effort.
- Most applications work against a single database type, and so can either use the new API or not, based on knowledge of what the provider supports. Multi-database applications (and layers) are much more rare, and it's expected they can take the added burden of detecting whether batching is supported or not, and implementing two code paths.
Feature detection
As the API will throw for providers which don't implement it, a way must be provided for consumers to know whether the API is supported. The proposal is to add a CanCreateBatch bool property to both DbProviderFactory and DbConnection, alongside the CreateBatch() methods.
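For a multi-database application or layer, the runtime check could look like this (a sketch based on the proposed API; the fallback path is application-specific):

```csharp
// Sketch: branching on batching support at runtime.
if (dbProviderFactory.CanCreateBatch)
{
    using var batch = dbProviderFactory.CreateBatch();
    // ... populate batch.BatchCommands and execute in a single roundtrip
}
else
{
    // Fall back to executing individual DbCommands, one roundtrip each
}
```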
Additional Notes
- No guarantees are made with regards to batch transactionality (do the batch commands execute in an implicit transaction) or to error handling (if a command fails, are later commands automatically skipped). These details are provider-dependent. However, wherever possible, it is suggested that implementations provide both transactionality and skip-on-failure behavior.
- DbBatch is disposable, like DbCommand. This is to allow for cases where it maps to a native resource that needs to be freed, and may also help with pooling/recycling of instances (to be elaborated in a separate proposal).
- We hope to add a DbBatch.Add(string commandText) overload, which would create the DbBatchCommand internally and be a lighter API, but some design issues need to be resolved around parameter management. We're deferring this for now as it's non-critical sugar.
- An alternative proposal was made by @bgribaudo in "API Proposal: DbCommandSet -- allows multiple DbCommands to be transmitted to server in single message" (#28794), in which a separate DbDataReader is returned for each DbBatchCommand, instead of one covering the entire batch. The goal was to help users identify command boundaries in resultsets when commands return more than one resultset (or even a variable number of resultsets). Due to the assumed rareness of the problem and the existence of other workarounds, we think it's better to keep a single DbDataReader for the entire batch.
Open questions
- DbCommand and DbBatch share a lot of API surface. In theory there could be an interface capturing those similarities, allowing users to abstract away what's being executed (but this suffers from the usual versioning flaws of interfaces, modulo default interface methods).
- Does DbBatch need to extend System.ComponentModel.Component, like all other ADO.NET objects (for the Visual Studio designer)? There seems to be little chance that VS would be updated for this, but possibly it's worth doing for consistency.
Edit history
| Date | Modification |
|---|---|
| 2019-02-14 | DbCommandSet now implements IEnumerable&lt;DbCommand&gt; and includes the two GetEnumerator() methods, as per @bgrainger's suggestion |
| 2019-02-18 | Updated proposal with different concrete options for backwards compatibility, batch size management, and added a usage example |
| 2019-02-22 | Updated with the current decision on backwards compatibility (no shim), added feature detection discussion, added non-aggregated rows affected to goals |
| 2019-02-23 | Added missing DbConnection.CreateCommandSet() and renamed DbProviderFactory.CreateCommandSet() (it previously had a Db which doesn't belong). Added CanCreateCommandSet property to DbProviderFactory, DbConnection |
| 2019-02-25 | Added public non-virtual DbConnection.CreateCommandSet() alongside protected virtual CreateDbCommandSet() |
| 2019-03-04 | Added DbConnection, DbTransaction to DbCommandSet. Corrected ExecuteReaderAsync overloads, corrected some comments |
| 2019-03-06 | DbCommandSet now includes a Commands list instead of implementing IEnumerable&lt;DbCommand&gt; itself |
| 2019-03-12 | Renamed DbCommandSet to DbBatch, and changed it to hold a new DbBatchCommand type instead of DbCommand. Added readable/terse code sample, added explanatory notes and rearranged sections slightly |
| 2019-04-18 | Added CommandBehavior to DbBatchCommand and removed the commandBehavior parameter from DbBatch.ExecuteReader(); added a section to explain. Added RecordsAffected to DbBatchCommand and updated the affected rows section |
| 2019-04-20 | Final typo pass (thanks @bgrainger) and added sentence on CommandBehavior support |
| 2019-05-14 | Note on values of RecordsAffected (for SELECT, for failed statements), added standard Dispose pattern methods to DbBatch, fixed issues raised by @bgrainger |
| 2019-05-23 | Applied standard ADO.NET pattern to DbBatch with virtual ExecuteDbDataReader and non-virtual ExecuteReader |
| 2019-06-17 | Removed DbBatch.CancelAsync() following removal of DbCommand.CancelAsync() in #28596, thanks @Wraith2. Replaced IList&lt;DbBatchCommand&gt; with DbBatchCommandCollection |
| 2021-06-21 | Made DbException.DbBatchCommand protected |
| 2021-06-29 | Moved CommandBehavior from DbBatchCommand to DbBatch.ExecuteReader |
| 2021-07-01 | Removed the setter for DbBatchCommand.RecordsAffected |
| 2021-07-01 | Added DbBatch.CreateBatchCommand and CreateDbBatchCommand |