Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Batch split queries #10878

Open
roji opened this issue Feb 5, 2018 · 8 comments
Open

Batch split queries #10878

roji opened this issue Feb 5, 2018 · 8 comments

Comments

@roji
Copy link
Member

roji commented Feb 5, 2018

In some scenarios, EF Core internally generates more than one read in order to execute a single query. This is notably the case when executing (some?) joins when using a non-MARS provider. Performance could be (greatly!) improved by batching those multiple reads into a single batch internally. Unless I'm mistaken, in the case you generate multiple queries, the results for the earlier queries have to be buffered anyway, so there's no additional penalty in the buffering implied in the batching.

Note: this issue isn't about exposing a batching read API to the user (that's what #10879 is about).

NOTE: batching becomes impossible if we decide to go in the direction of #12776 (fetching dependents by foreign keys instead of reevaluating the principal query).

@ajcvickers
Copy link
Member

Note: see also #10465, which is very similar but comes from a slightly different angle.

@divega
Copy link
Contributor

divega commented Feb 8, 2018

See also #5952

@roji
Copy link
Member Author

roji commented Sep 25, 2019

Unless I'm mistaken, this is no longer relevant as we generate one SQL query per LINQ query since 3.0. Closing.

@roji
Copy link
Member Author

roji commented Apr 21, 2020

Reopening and assigning to @smitpatel at his request, could be done as part of the workaround for #18022.

@roji
Copy link
Member Author

roji commented May 1, 2020

I've checked what command batching does to isolation/consistency in SQL Server and PostgreSQL, and the results are negative: merely batching statements into the same DbCommand doesn't have any impact on isolation. In other words, different database state can be observed by different statements in the same DbCommand, if not within an explicit transaction.

It's still possible to have consistency by wrapping the batch in a serializable transaction (or snapshot in SQL Server) - see test code. We probably shouldn't do this implicitly, since it implies error or deadlocking behavior that doesn't occur otherwise.

Test code
[NonParallelizable]
public class BatchTransactionIsolationTest
{
    const string PostgresConnectionString = ...;
    const string SqlServerConnectionString = ...;

    [Test]
    public async Task SqlServer([Values(true, false)] bool wrapInSerializableTx)
    {
        using var conn1 = new SqlConnection(SqlServerConnectionString);
        using var conn2 = new SqlConnection(SqlServerConnectionString);
        conn1.Open();
        conn2.Open();

        var createSql = @"
IF OBJECT_ID('dbo.data', 'U') IS NOT NULL 
DROP TABLE data; 
CREATE TABLE data (id INT);
INSERT INTO data (id) VALUES (1)";
        
        using (var createCmd = new SqlCommand(createSql, conn1))
            createCmd.ExecuteNonQuery();

        var tx = wrapInSerializableTx
            ? conn1.BeginTransaction(IsolationLevel.Snapshot)
            : null;

        using var batchCommand = new SqlCommand(@"
SELECT * FROM data;
WAITFOR DELAY '00:00:05';
SELECT * FROM data", conn1, tx);
        
        using var modifyCommand = new SqlCommand(@"
UPDATE data SET id=2 WHERE id=1; 
INSERT INTO data (id) VALUES (10)", conn2);

        var t = batchCommand.ExecuteReaderAsync();
        Thread.Sleep(1000);
        modifyCommand.ExecuteNonQuery();

        var reader = await t;
        Console.WriteLine("Before wait:");
        while (reader.Read())
            Console.WriteLine(reader.GetInt32(0));

        Assert.True(reader.NextResult());
        
        Console.WriteLine("After wait:");
        while (reader.Read())
            Console.WriteLine(reader.GetInt32(0));
    }
    
    [Test]
    public async Task Postgres([Values(true, false)]bool wrapInSerializableTx)
    {
        using var conn1 = new NpgsqlConnection(PostgresConnectionString);
        using var conn2 = new NpgsqlConnection(PostgresConnectionString);
        conn1.Open();
        conn2.Open();

        var createSql = @"
DROP TABLE IF EXISTS data;
CREATE TABLE data (id INT);
INSERT INTO data (id) VALUES (1)";
        
        using (var createCmd = new NpgsqlCommand(createSql, conn1))
            createCmd.ExecuteNonQuery();

        if (wrapInSerializableTx)
            conn1.BeginTransaction(IsolationLevel.Serializable);
        
        using var batchCommand = new NpgsqlCommand(@"
SELECT * FROM data;
SELECT pg_sleep(5);
SELECT * FROM data", conn1);
        
        using var modifyCommand = new NpgsqlCommand(@"
UPDATE data SET id=2 WHERE id=1; 
INSERT INTO data (id) VALUES (10)", conn2);

        var t = batchCommand.ExecuteReaderAsync();
        Thread.Sleep(1000);
        modifyCommand.ExecuteNonQuery();

        var reader = await t;
        Console.WriteLine("Before wait:");
        while (reader.Read())
            Console.WriteLine(reader.GetInt32(0));

        Assert.True(reader.NextResult());
        Assert.True(reader.NextResult());  // pg_sleep returns a void resultset
        
        Console.WriteLine("After wait:");
        while (reader.Read())
            Console.WriteLine(reader.GetInt32(0));
    }
}

@ajcvickers
Copy link
Member

@roji Thanks for checking this. It's unfortunate, but I think it re-enforces that we need to retain both collection-include ehaviors for now.

@roji
Copy link
Member Author

roji commented May 2, 2020

I completely agree. I also think it points towards retaining single-query as the default behavior, but I think we all agree we need some way to opt into split query.

@smitpatel smitpatel removed their assignment Aug 27, 2020
@roji roji changed the title Batch multiple internally-generated reads Batch split queries Oct 20, 2021
@roji
Copy link
Member Author

roji commented Mar 4, 2024

Note: batching becomes impossible if we decide to go in the direction of #12776 (fetching dependents by foreign keys instead of reevaluating the principal query).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants