New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dapper doesn't use prepared statements #474

Open
buybackoff opened this Issue Feb 29, 2016 · 12 comments

Comments

Projects
None yet
4 participants
@buybackoff

buybackoff commented Feb 29, 2016

In most RDBMS that could be replaced by stored procedures, but SQLite doesn't have them. If I need to insert many rows, but cannot do so in a single transaction, I cannot reuse prepared statements. Please see this issue for additional details. In this test using Dapper is now c.25% slower than the manual call.

One idea is to cache queries by sql string, make them prepared, and just populate parameters on every subsequent call. That could be done with some additional overload with an optional parameter prepared = false. Are you interested in supporting this?

Dapper is awesome for automatic mapping, it will be even more awesome and faster with support of prepared statements!

@mgravell

This comment has been minimized.

Show comment
Hide comment
@mgravell

mgravell Feb 29, 2016

Contributor

I could kinda get behind that, but the big problem is concurrency. It
would need some non-trivial lease mechanism (with fallback strategy) around
the command. All doable, but not quite as easy as suggested.
On 29 Feb 2016 5:07 a.m., "Victor Baybekov" notifications@github.com
wrote:

In most RDBMS that could be replaced by stored procedures, but SQLite
doesn't have them. If I need to insert many rows, but cannot do so in a
single transaction, I cannot reuse prepared statements. Please see this
issue aspnet/Microsoft.Data.Sqlite#235 for
additional details. In this test
https://github.com/Spreads/Spreads/blob/master/tests/Spreads.Extensions.Tests/Storage/SQLite/SqlitePerformanceTest.cs#L40
using Dapper is now c.25% slower than the manual call.

One idea is to cache queries by sql string, make them prepared, and just
populate parameters on every subsequent call. That could be done with some
additional overload with an optional parameter prepared = false. Are you
interested in supporting this?

Dapper is awesome for automatic mapping, it will be even more awesome and
faster with support of prepared statements!


Reply to this email directly or view it on GitHub
#474.

Contributor

mgravell commented Feb 29, 2016

I could kinda get behind that, but the big problem is concurrency. It
would need some non-trivial lease mechanism (with fallback strategy) around
the command. All doable, but not quite as easy as suggested.
On 29 Feb 2016 5:07 a.m., "Victor Baybekov" notifications@github.com
wrote:

In most RDBMS that could be replaced by stored procedures, but SQLite
doesn't have them. If I need to insert many rows, but cannot do so in a
single transaction, I cannot reuse prepared statements. Please see this
issue aspnet/Microsoft.Data.Sqlite#235 for
additional details. In this test
https://github.com/Spreads/Spreads/blob/master/tests/Spreads.Extensions.Tests/Storage/SQLite/SqlitePerformanceTest.cs#L40
using Dapper is now c.25% slower than the manual call.

One idea is to cache queries by sql string, make them prepared, and just
populate parameters on every subsequent call. That could be done with some
additional overload with an optional parameter prepared = false. Are you
interested in supporting this?

Dapper is awesome for automatic mapping, it will be even more awesome and
faster with support of prepared statements!


Reply to this email directly or view it on GitHub
#474.

@buybackoff

This comment has been minimized.

Show comment
Hide comment
@buybackoff

buybackoff Feb 29, 2016

I guess some simple contended lock or MRE will still be cheaper than recreating a statement every time? E.g. ConcurrentDictionarysql:string,PreparedCommand:IDbCommand with MRE inside... I just cannot sleep well when there are 20% of free performance not gained yet :)

buybackoff commented Feb 29, 2016

I guess some simple contended lock or MRE will still be cheaper than recreating a statement every time? E.g. ConcurrentDictionarysql:string,PreparedCommand:IDbCommand with MRE inside... I just cannot sleep well when there are 20% of free performance not gained yet :)

@mgravell

This comment has been minimized.

Show comment
Hide comment
@mgravell

mgravell Feb 29, 2016

Contributor

It isn't quite as simple as an MRE, though; we don't usually want to queue
people. Lease with restore (with fallback of create new) is a possibility,
but it gets into fun questions like:

  • how many (max) do we pool and reuse? at most one? at most n?
  • in the case of fallbacks, do we prepare them or not?
  • what is the impact on GC of long-held prepared commands? is this a
    genuine concern vs the long-held dynamic methods?
  • for what providers is it advantageous/disadvantageous/neutral to prepare
    them? - and how do we tell (especially if wrapped in a decorator like
    mini-profiler)?
  • opt in? opt out?
  • are there scenarios where this would actively break things?
  • how does this play with string lengths? would we need the max lengths to
    be declared here? or ...?
  • how does this play with custom type providers?
  • what is the measured performance impact in different scenarios?
    (different RDBMS, different platform - i.e. .net vs dnx)
  • how could this hurt people? i.e. when it "works", people could get
    suboptimal query plans (from the atypical first hit problem) that they
    aren't currently seeing for plan-on-demand; note this won't impact all
    RDBMS - SqlServer works mostly the same with or without preparation

Lots to consider, is my point.

I fully take on board that we should look into this. Simply; there are some
changes I can hack in with 20 minutes work - this is not one of those. This
needs very careful consideration, implementation and testing.

On 29 February 2016 at 08:04, Victor Baybekov notifications@github.com
wrote:

I guess some simple contended lock or MRE will still be cheaper than
recreating a statement every time? E.g.
ConcurrentDictionarysql:string,PreparedCommand:IDbCommand with MRE
inside... I just cannot sleep well when there are 20% of free performance
not gained yet :)


Reply to this email directly or view it on GitHub
#474 (comment)
.

Regards,

Marc

Contributor

mgravell commented Feb 29, 2016

It isn't quite as simple as an MRE, though; we don't usually want to queue
people. Lease with restore (with fallback of create new) is a possibility,
but it gets into fun questions like:

  • how many (max) do we pool and reuse? at most one? at most n?
  • in the case of fallbacks, do we prepare them or not?
  • what is the impact on GC of long-held prepared commands? is this a
    genuine concern vs the long-held dynamic methods?
  • for what providers is it advantageous/disadvantageous/neutral to prepare
    them? - and how do we tell (especially if wrapped in a decorator like
    mini-profiler)?
  • opt in? opt out?
  • are there scenarios where this would actively break things?
  • how does this play with string lengths? would we need the max lengths to
    be declared here? or ...?
  • how does this play with custom type providers?
  • what is the measured performance impact in different scenarios?
    (different RDBMS, different platform - i.e. .net vs dnx)
  • how could this hurt people? i.e. when it "works", people could get
    suboptimal query plans (from the atypical first hit problem) that they
    aren't currently seeing for plan-on-demand; note this won't impact all
    RDBMS - SqlServer works mostly the same with or without preparation

Lots to consider, is my point.

I fully take on board that we should look into this. Simply; there are some
changes I can hack in with 20 minutes work - this is not one of those. This
needs very careful consideration, implementation and testing.

On 29 February 2016 at 08:04, Victor Baybekov notifications@github.com
wrote:

I guess some simple contended lock or MRE will still be cheaper than
recreating a statement every time? E.g.
ConcurrentDictionarysql:string,PreparedCommand:IDbCommand with MRE
inside... I just cannot sleep well when there are 20% of free performance
not gained yet :)


Reply to this email directly or view it on GitHub
#474 (comment)
.

Regards,

Marc

@buybackoff

This comment has been minimized.

Show comment
Hide comment
@buybackoff

buybackoff Feb 29, 2016

I was definitely thinking about a very explicit opt-in, so that any
existing features are not touched at all. I did not want to propose some
fundamental change of internals which would use prepared statements behind
the scenes. Just a way to use all the magic of mapping that Dapper provides
only in the places when I explicitly know that a prepared statement should
be reused.

On Mon, Feb 29, 2016 at 12:46 PM, Marc Gravell notifications@github.com
wrote:

It isn't quite as simple as an MRE, though; we don't usually want to queue
people. Lease with restore (with fallback of create new) is a possibility,
but it gets into fun questions like:

  • how many (max) do we pool and reuse? at most one? at most n?
  • in the case of fallbacks, do we prepare them or not?
  • what is the impact on GC of long-held prepared commands? is this a
    genuine concern vs the long-held dynamic methods?
  • for what providers is it advantageous/disadvantageous/neutral to prepare
    them? - and how do we tell (especially if wrapped in a decorator like
    mini-profiler)?
  • opt in? opt out?
  • are there scenarios where this would actively break things?
  • how does this play with string lengths? would we need the max lengths to
    be declared here? or ...?
  • how does this play with custom type providers?
  • what is the measured performance impact in different scenarios?
    (different RDBMS, different platform - i.e. .net vs dnx)
  • how could this hurt people? i.e. when it "works", people could get
    suboptimal query plans (from the atypical first hit problem) that they
    aren't currently seeing for plan-on-demand; note this won't impact all
    RDBMS - SqlServer works mostly the same with or without preparation

Lots to consider, is my point.

I fully take on board that we should look into this. Simply; there are some
changes I can hack in with 20 minutes work - this is not one of those. This
needs very careful consideration, implementation and testing.

On 29 February 2016 at 08:04, Victor Baybekov notifications@github.com
wrote:

I guess some simple contended lock or MRE will still be cheaper than
recreating a statement every time? E.g.
ConcurrentDictionarysql:string,PreparedCommand:IDbCommand with MRE
inside... I just cannot sleep well when there are 20% of free performance
not gained yet :)


Reply to this email directly or view it on GitHub
<
#474 (comment)

.

Regards,

Marc


Reply to this email directly or view it on GitHub
#474 (comment)
.

buybackoff commented Feb 29, 2016

I was definitely thinking about a very explicit opt-in, so that any
existing features are not touched at all. I did not want to propose some
fundamental change of internals which would use prepared statements behind
the scenes. Just a way to use all the magic of mapping that Dapper provides
only in the places when I explicitly know that a prepared statement should
be reused.

On Mon, Feb 29, 2016 at 12:46 PM, Marc Gravell notifications@github.com
wrote:

It isn't quite as simple as an MRE, though; we don't usually want to queue
people. Lease with restore (with fallback of create new) is a possibility,
but it gets into fun questions like:

  • how many (max) do we pool and reuse? at most one? at most n?
  • in the case of fallbacks, do we prepare them or not?
  • what is the impact on GC of long-held prepared commands? is this a
    genuine concern vs the long-held dynamic methods?
  • for what providers is it advantageous/disadvantageous/neutral to prepare
    them? - and how do we tell (especially if wrapped in a decorator like
    mini-profiler)?
  • opt in? opt out?
  • are there scenarios where this would actively break things?
  • how does this play with string lengths? would we need the max lengths to
    be declared here? or ...?
  • how does this play with custom type providers?
  • what is the measured performance impact in different scenarios?
    (different RDBMS, different platform - i.e. .net vs dnx)
  • how could this hurt people? i.e. when it "works", people could get
    suboptimal query plans (from the atypical first hit problem) that they
    aren't currently seeing for plan-on-demand; note this won't impact all
    RDBMS - SqlServer works mostly the same with or without preparation

Lots to consider, is my point.

I fully take on board that we should look into this. Simply; there are some
changes I can hack in with 20 minutes work - this is not one of those. This
needs very careful consideration, implementation and testing.

On 29 February 2016 at 08:04, Victor Baybekov notifications@github.com
wrote:

I guess some simple contended lock or MRE will still be cheaper than
recreating a statement every time? E.g.
ConcurrentDictionarysql:string,PreparedCommand:IDbCommand with MRE
inside... I just cannot sleep well when there are 20% of free performance
not gained yet :)


Reply to this email directly or view it on GitHub
<
#474 (comment)

.

Regards,

Marc


Reply to this email directly or view it on GitHub
#474 (comment)
.

@mgravell

This comment has been minimized.

Show comment
Hide comment
@mgravell

mgravell Feb 29, 2016

Contributor

just to note some implementation details - any lease-based mechanism here would most-likely be implemented using interlocked mechanisms, since we don't want to queue

impact on existing codebase is non-trivial; would require quite some rework (rewrite?) to how the parameter preparation code currently works, Fortunately, IDataParameterCollection does at least have an indexer.

Contributor

mgravell commented Feb 29, 2016

just to note some implementation details - any lease-based mechanism here would most-likely be implemented using interlocked mechanisms, since we don't want to queue

impact on existing codebase is non-trivial; would require quite some rework (rewrite?) to how the parameter preparation code currently works, Fortunately, IDataParameterCollection does at least have an indexer.

@mgravell

This comment has been minimized.

Show comment
Hide comment
@mgravell

mgravell Feb 29, 2016

Contributor

I was definitely thinking about a very explicit opt-in

global? or per command? and if per-command: how specified?

(that's an open question; you don't need to have the answer, but any thoughts are welcome)

Contributor

mgravell commented Feb 29, 2016

I was definitely thinking about a very explicit opt-in

global? or per command? and if per-command: how specified?

(that's an open question; you don't need to have the answer, but any thoughts are welcome)

@buybackoff

This comment has been minimized.

Show comment
Hide comment
@buybackoff

buybackoff Feb 29, 2016

In the linked test, I would very much like just an additional parameter:

for (int i = 0; i < 100000; i++) {
                    connection.Execute("INSERT INTO Numbers VALUES (@Key, @Value);",
                        new[] { new { Key = (long)i, Value = (double)i } }, usePrepared = true);
                }

It should be documented that such a command lives for the lifetime of connection, because it makes sense to reuse only those statements that are called repeatedly. Of there could be some explicit logic for disposing it.

Another option is just to add the .Execute(...) extension method on IDbCommand. Then we could reuse Dapper mapping functionality to set the parameters like in this example, but the statement preparation/disposal logic will be on a user, Dapper just does mapping work. So in this example, Dapper could give this functionality:

command.CommandText =
            "INSERT INTO Numbers VALUES (@Key, @Value);";
command.Prepare();
command.Execute(new[] { new { Key = (long)i, Value = (double)i } });

The second option looks simpler and better to me.

buybackoff commented Feb 29, 2016

In the linked test, I would very much like just an additional parameter:

for (int i = 0; i < 100000; i++) {
                    connection.Execute("INSERT INTO Numbers VALUES (@Key, @Value);",
                        new[] { new { Key = (long)i, Value = (double)i } }, usePrepared = true);
                }

It should be documented that such a command lives for the lifetime of connection, because it makes sense to reuse only those statements that are called repeatedly. Of there could be some explicit logic for disposing it.

Another option is just to add the .Execute(...) extension method on IDbCommand. Then we could reuse Dapper mapping functionality to set the parameters like in this example, but the statement preparation/disposal logic will be on a user, Dapper just does mapping work. So in this example, Dapper could give this functionality:

command.CommandText =
            "INSERT INTO Numbers VALUES (@Key, @Value);";
command.Prepare();
command.Execute(new[] { new { Key = (long)i, Value = (double)i } });

The second option looks simpler and better to me.

@mgravell

This comment has been minimized.

Show comment
Hide comment
@mgravell

mgravell Feb 29, 2016

Contributor

adding yet another optional parameter is slightly problematic, due to backwards compatibility and overload requirements; adding an additional parameter to an existing method is not compatible, so we'd need to duplicate all the existing overloads. Frankly, what we perhaps should do at some point is a "major bump" (2.0 release) that kills a lot of overloads that have been added for binary compatibility reasons. I'd be unlikely to support adding a parameter without a major bump, but it could perhaps be done via the CommandDefinition APIs - adding an additional constructor overload for CommandDefinition is a lot less invasive than adding overloads for every method. Just thinking aloud.

Re "Dapper just does mapping work" - note that this can already be done via GetRowParser<T> on IDataReader. Only one ExecuteReader() method away from what you mention.

Contributor

mgravell commented Feb 29, 2016

adding yet another optional parameter is slightly problematic, due to backwards compatibility and overload requirements; adding an additional parameter to an existing method is not compatible, so we'd need to duplicate all the existing overloads. Frankly, what we perhaps should do at some point is a "major bump" (2.0 release) that kills a lot of overloads that have been added for binary compatibility reasons. I'd be unlikely to support adding a parameter without a major bump, but it could perhaps be done via the CommandDefinition APIs - adding an additional constructor overload for CommandDefinition is a lot less invasive than adding overloads for every method. Just thinking aloud.

Re "Dapper just does mapping work" - note that this can already be done via GetRowParser<T> on IDataReader. Only one ExecuteReader() method away from what you mention.

@mgravell

This comment has been minimized.

Show comment
Hide comment
@mgravell

mgravell Feb 29, 2016

Contributor

Actually, this could be specified in the existing CommandFlags enum with no
API change. A good 2.0 thing to do would be to nuke the compat overloads
and add in an overload that takes CommandFlags so that it can be specified
without using CommandDefinition - this would replace the "buffered"
parameter.

On 29 Feb 2016 10:13, "Victor Baybekov" notifications@github.com wrote:

In the linked test, I would very much like just an additional parameter:

for (int i = 0; i < 100000; i++) {
connection.Execute("INSERT INTO Numbers VALUES (@key, @value);",
new[] { new { Key = (long)i, Value = (double)i } }, usePrepared = true);
}

It should be documented that such a command lives for the lifetime of
connection, because it makes sense to reuse only those statements that are
called repeatedly. Of there could some explicit logic for disposing it.

Another option is just to add the .Execute(...) extension method on
IDbCommand. Then we could reuse Dapper mapping functionality to set the
parameters like in this example
https://msdn.microsoft.com/en-us/library/system.data.idbcommand.prepare%28v=vs.110%29.aspx?f=255&MSPPError=-2147217396,
but the statement preparation/disposal logic will be on a user, Dapper just
does mapping work. So in this example, Dapper could give this functionality:

command.CommandText =
"INSERT INTO Numbers VALUES (@key, @value);";
command.Prepare();
command.Execute(new[] { new { Key = (long)i, Value = (double)i } });

The second option looks simpler and better to me.


Reply to this email directly or view it on GitHub
#474 (comment)
.

Contributor

mgravell commented Feb 29, 2016

Actually, this could be specified in the existing CommandFlags enum with no
API change. A good 2.0 thing to do would be to nuke the compat overloads
and add in an overload that takes CommandFlags so that it can be specified
without using CommandDefinition - this would replace the "buffered"
parameter.

On 29 Feb 2016 10:13, "Victor Baybekov" notifications@github.com wrote:

In the linked test, I would very much like just an additional parameter:

for (int i = 0; i < 100000; i++) {
connection.Execute("INSERT INTO Numbers VALUES (@key, @value);",
new[] { new { Key = (long)i, Value = (double)i } }, usePrepared = true);
}

It should be documented that such a command lives for the lifetime of
connection, because it makes sense to reuse only those statements that are
called repeatedly. Of there could some explicit logic for disposing it.

Another option is just to add the .Execute(...) extension method on
IDbCommand. Then we could reuse Dapper mapping functionality to set the
parameters like in this example
https://msdn.microsoft.com/en-us/library/system.data.idbcommand.prepare%28v=vs.110%29.aspx?f=255&MSPPError=-2147217396,
but the statement preparation/disposal logic will be on a user, Dapper just
does mapping work. So in this example, Dapper could give this functionality:

command.CommandText =
"INSERT INTO Numbers VALUES (@key, @value);";
command.Prepare();
command.Execute(new[] { new { Key = (long)i, Value = (double)i } });

The second option looks simpler and better to me.


Reply to this email directly or view it on GitHub
#474 (comment)
.

@roji

This comment has been minimized.

Show comment
Hide comment
@roji

roji Dec 21, 2016

Note that PostgreSQL is one example where prepared statements can have a pretty dramatic effect on performance (don't have figures available but can benchmark if needed). Also, the upcoming version 3.2 of the Npgsql provider will have automatic preparation of statements inside the driver, based on an LRU cache. The advantage of implementing this in the driver unlocks the performance benefits for all layers above it which don't use prepare (Dapper, Entity Framework..). This would also be an opt-in feature.

roji commented Dec 21, 2016

Note that PostgreSQL is one example where prepared statements can have a pretty dramatic effect on performance (don't have figures available but can benchmark if needed). Also, the upcoming version 3.2 of the Npgsql provider will have automatic preparation of statements inside the driver, based on an LRU cache. The advantage of implementing this in the driver unlocks the performance benefits for all layers above it which don't use prepare (Dapper, Entity Framework..). This would also be an opt-in feature.

@NickCraver

This comment has been minimized.

Show comment
Hide comment
@NickCraver

NickCraver Jan 28, 2017

Member

Planning the CommandFlags adjustments for v2, link will appear below shortly.

Member

NickCraver commented Jan 28, 2017

Planning the CommandFlags adjustments for v2, link will appear below shortly.

@NickCraver NickCraver added the v2.0 label Jan 28, 2017

@NickCraver NickCraver referenced this issue Jan 28, 2017

Open

Dapper v2 planning and discussion #688

4 of 28 tasks complete

@NickCraver NickCraver modified the milestone: v2.0 Jan 30, 2017

@mgravell

This comment has been minimized.

Show comment
Hide comment
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment