New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Async methods: Added QueryMultipleAsync and ExecuteAsync. #105
Conversation
…ble, as we are not interacting with sync context at this point.
Is there something specific holding this pull request up? I'd love to be able to use this in my projects. |
The original project seems to be somehow inactive - there have been no commits for almost 5 months. |
Simply: time, availability, etc; it is very much on my agenda to clear the backlog, and some change requests from internally came in this week, which is very much motivation to dust it off. |
Let me know if there's any way I can help. |
While testing ExecuteAsync with an IEnumerable param I was getting exceptions "The command execution cannot proceed due to a pending asynchronous operation already in progress". I had to wait each command before issuing the next. As in : total += await cmd.ExecuteNonQueryAsync().ConfigureAwait(false); Instead of: cmdTasks.Add(cmd.ExecuteNonQueryAsync()); foreach (var cmdTask in cmdTasks) |
You probably need to enable multiple active queries in the driver
connectionString="Server=MyServer;Database=MyDb;UID=john;pwd=doe;MultipleActiveResultSets=True;" Karl On 2/17/2014 1:08 PM, Roberto Martinez-Brunet wrote:
|
Hello Karl, Nope, tried that (and re-checked now) with two different servers. The code I'm runnung is:
|
I can duplicate this, though I have been using this code for a while now On 2/17/2014 3:54 PM, Roberto Martinez-Brunet wrote:
Karl Waclawek |
…ne async query in progress.
Hi Roberto, I committed a fix to my fork. Seems to work for me. Karl On 2/17/2014 3:54 PM, Roberto Martinez-Brunet wrote:
Karl Waclawek |
Hello Karl, I ran my test codes through your new implementation of ExecuteAsync and it works for me too. I have some comments though. 1.The difference between what you are doing and what I am (was) doing is that in my case the commands are processed sequentially while in yours they are going in parallel. Then, in the first case a MARS enable connection is not necessary while in the second it is. Considering that a MARS enabled connection is costly, I think there might be cases when the sequential approach is faster/better. Nevertheless, the second approach feels "more" async.
Consiodering these comments I created an ExcuteAsync implementation that seems to work but need more testing: Best Regards, Roberto
|
Optimization through cloning DbCommand instead of complete re-initialization.
Thanks for the feedback, I modified my fix. Further comments below. Best Regards, Karl On 2/18/2014 9:38 AM, Roberto Martinez-Brunet wrote:
I would say that the implementation should not force sequential
It worries me too - my mistake. I think it only works because it has no In my modification I have added a task continuation where the command is
Maybe yes, but the existing approach seems just as simple.
True.
Karl Waclawek |
Good, lets hope that Marc gets motivated to review all this stuff :-) |
I would not expect the commands to run in parallel for one pretty big reason: I never want one of the commands to be killed as the deadlock victim of another. Recovering from such a failure (outside of a transaction, maybe) with this model would be very difficult, when normally a deadlock victim command could usually just be retried. I would expect it to behave identically to the normal multi-exec, except that it would notify me when it is finished asynchronously. |
Yes, agree, that makes sense. |
…execution. The parametere must be an IEnumerable - but not it is checked if each element is a proper parameter.
Although I don't think this is an issue as it is the very nature of a It is called ExecuteAsyncSerialized. It assumes that the parameter is Karl On 3/3/2014 11:44 PM, John Gietzen wrote:
Karl Waclawek |
Karl - would you consider reversing your decision above so that running them sequentially is the norm and that running them in parallel is done via ExecuteConcurrentlyAsync. I agree with your comment that statements are often run concurrently in SQL but AFAIK it's quite rare against the same connection. My particular example is inserting rows that form a self-referrencing hierarchy and therefore have to be inserted in the order I have dictated otherwise foreign-key errors will be introduced. cheers, Gary |
Gary, I don't have a religious prejudice one way or another, so I can switch things around if this is the general consensus. At the end this is not my call anyway, since Marc will determine what ends up in his repository. Best thing would be if Marc lets us know... Btw, In my experience, with the async approach you will likely have multiple active statements on your connection unless you pay careful attention to your code. This is especially true when you send of multiple unrelated queries and want to shorten the overall latency, as in that case you don't want to send the next query off only after you received the results from the previous one. Karl |
I confess I don't know the full implications of calling exec-async concurrently like this; do they get stacked and pipelined by the provider? or does it end up running MARS-style? The latter sounds undesirable; the former probably isn't a problem. |
I think they end up running MARS style, but MARS does not imply parallel execution, especially not for updates and inserts. For my use case I would like multiple statements to be sent over the wire right after each other, without waiting for the results before sending the next one. If there are many statements, the aggregate latency might become noticeable. The proposed ExecuteAsync implementation does not really start the commands concurrently (no Parallel.For), they are submitted/started on the same thread and connection in sequence. So I assume the provider will send each statement over the wire right away in the order it is submitted. |
Marc: MSDN says they run MARS-style. Karl: I would vote for in-series to be the norm for these reasons:
In addition, I would say that the "Dapper" way of doing this would be a boolean flag, rather than a method overload. (e.g. |
I would try to achieve only asynchronicity not parallelism. I compiled the dll in both cases and tested with a bunch of inserts. When there were ~500 inserts all seemed to work fine in both cases. With 12K inserts it also worked fine.. most of the time; but sometime it was way slower in parallel. I cannot conclude that the problem was in the parallelism but a) such a large qty of tasks and b) a bunch of concurrent inserts fighting for the same table, makes me nervous. Then I reverted to only use the async-sequential approach without problems. |
but... if this is just pipelining non-query ops, I can't see a problem with running them async by default - it would also reduce latency costs. I'm a big fan of pipelining. It kinda makes sense for execute-non-query. However: I would not advocate the same for anything that returned grid data (execute-query / execute-scalar), because that would impose MARS. That would probably need to be syncronous and consumed. |
As far as I know there is no parallel execution here, so you are fine.
|
@rmbrunet indeed, getting async pipelining right is really very tricky - I've got around 25kloc in SE.Redis that can testify to that! |
@mgravell: I think it is running queries concurrently:
The first of the 3 queries returns fine, but this results in 2 exceptions:
Also, it returns after ~10 seconds, rather than after ~30 seconds. |
If you google MARS you will see it never executes queries concurrently, it What you are observing is that there is a short time when a subsequent
|
Maybe I'm just not understanding the debate. My point is that the queries are running concurrently. This should not be the default behavior, since it will lead to deadlocks, out-of-order operations, and extra memory usage. Add on top of that the user will have to change their connection string... |
Oh, wait, you're right. It actually isn't running them concurrently, and so it really is just a connection string change. I guess it isn't so bad. |
In light of this discussion, do we still need "ExecuteAsyncSerialized()" ? However, it may help when you don't want to add "MultipleActiveResultSets = true" to your connection string. |
@kwaclaw: If we do keep it, I would vote to make it a boolean option, just like |
If we want to parallelize to reduce latency when param is IEnumerable On Sun, Mar 23, 2014 at 2:03 PM, John Gietzen notifications@github.comwrote:
|
No, basically. Pipelining good, concurrency bad. They are very different things. If you used Parallel you should expect it to explode in a shower of sparks: it is not intended to be thread-safe. Note that TDS is naturally pipelineable, so if Async works by pipelining, it is fine: you cannot do any better on the connection than that. |
I think this would make the order of execution unpredictable, which On 3/23/2014 2:41 PM, Roberto Martinez-Brunet wrote:
Karl Waclawek |
At this point I the question becomes, "is there ever going to be a time when a developer should prefer to avoid pipelining the requests?" I believe that the answer to that question is yes, so we should keep both implementations. Also, have we considered how this behaves on Oracle, MySql, etc? We shouldn't rely on the implementation details of a particular This all leads me to believe that the default should be to serialize the queries. (e.g. |
@john: You make good points, but turning off pipelining by default only solves the issue for callers of ExecuteAsync, but not for those who use asynchronous calls in any other way. They will always have to consider the issues around pipelining, MultipleActiveResultSets and the particular database/driver they are using. What about this: public static async Task<int> MultiExecuteAsync(this IDbConnection cnn, string sql, IEnumerable param, IDbTransaction transaction = null, int? commandTimeout = null, CommandType? commandType = null)
{
...
}
public static async Task<int> MultiExecuteAsyncSerialized(this IDbConnection cnn, string sql, IEnumerable param, IDbTransaction transaction = null, int? commandTimeout = null, CommandType? commandType = null)
{
...
}
public static Task<int> ExecuteAsync(this IDbConnection cnn, string sql, dynamic param = null, IDbTransaction transaction = null, int? commandTimeout = null, CommandType? commandType = null, bool serialized = false)
{
IEnumerable multiExec = (object)param as IEnumerable;
if (multiExec != null && !(multiExec is string))
{
if (serialized)
return MultiExecuteAsyncSerialized(cnn, sql, multiExec, transaction, commandTimeout, commandType);
else
return MultiExecuteAsync(cnn, sql, multiExec, transaction, commandTimeout, commandType);
}
// nice and simple
CacheInfo info = null;
if ((object)param != null)
{
var identity = new Identity(sql, commandType, cnn, null, (object)param == null ? null : ((object)param).GetType(), null);
info = GetCacheInfo(identity);
}
return ExecuteCommandAsync(cnn, transaction, sql, (object)param == null ? null : info.ParamReader, (object)param, commandTimeout, commandType);
} |
👍 |
…o separate methods: MultiExecuteAsync and MultiExecuteAsyncSerialized, the latter awaits each sql command before calling the next, serializing their execution at the client level, while the former allows pipelining the commands at the server, requiring the driver to have specific support (MARS). - Modified ExecuteAsync with additional "serialized" argument, to pick which of the above to call.
@mgravell Mark, dude come on merge it :) |
Please? |
This is good stuff. Once we get this commited, we can move on to porting everything in Contrib to support async as well. |
Applied manually, along with inter-related changed; thanks |
Marc, the difference in performance between total += await cmd.ExecuteNonQueryAsync(); and cmdTasks.Add(cmd.ExecuteNonQueryAsync()); seems to be significant in some cases. There was a proposal of include a flag in the call to leave to the caller the responsibility to know if the connection supports pipelining. If a) the performance gains are substantial (as my informal tests seems to indicate) b) the use cases for these functionality are not exceptional and c) the usage of Dapper is mostly SQL Server (where the functionality is supported) then the use of a flag is a good alternative. The other alternative, kinda of "official answer" to the issue, would be that the use case is exceptional and should be coded in straight ado.net if needed. |
The idea of a caller-controller flag is viable. Because of some issues relating to overload resolution, optional parameters, and ambiguous method calls, I had to introduce a new state representation for some of the async stuff; I could move any bool flags (buffered, pipelined, etc) to a [Flags] enum, and then cope with it internally. Interesting. Will investigate. |
I would appreciate adding the flag back. AFAIK, MARS functionality has been supported by Oracle and DB2 for along Karl On Thu, May 8, 2014 at 10:02 AM, Marc Gravell notifications@github.comwrote:
|
The problem I have, though, is that DbCommand explicitly says " Do not
|
To me that just means that you have to use a new command object for each Which is why my fork is creating new command objects, and that is also how On Thu, May 8, 2014 at 5:53 PM, Marc Gravell notifications@github.comwrote:
|
This is now available; I tweaked the implementation a bit because the scenario we're using here involves millions of rows, so the list/array is not an option. Instead, it uses a queue - so it can buffer a finite amount, and then when full it will be enqueue/dequeue/await in tandem - then dequeue/await anything still outstanding at the end. It works pretty well ;p |
Looks awesome! On Wed, May 14, 2014 at 10:42 AM, Marc Gravell notifications@github.comwrote:
|
It seems to me that the methods QueryMultipleAsync and ExecuteAsync need to be included when one wants to use Dapper asynchronously.