Comparatively slow execution of parameterless Oracle SQL SELECT query #1945

Closed
kriewall opened this issue Aug 11, 2023 · 25 comments · Fixed by #1946

@kriewall

As-is behavior:

First-time execution of Dapper's QueryAsync<T> extension method consistently takes over 1 minute to return when supplied with a parameterless Oracle SQL query that returns 21,497 rows of 31 values each. The same query executed in SQL Developer on the same machine with the same connection string takes 12s. (In SQL Developer, spooling to file was used to avoid overly optimistic timings resulting from pagination.)

Expected behavior:

Dapper's QueryAsync<T> should return in roughly the same amount of time as the native query; no more than 2X slower, as a ballpark.

Environment:

  • Processor: Intel Xeon W-2102 2.9GHz
  • Memory: 64GB
  • OS: Windows 10 Enterprise 22H2
  • .NET: v7.0
  • Oracle.ManagedDataAccess.Core: v3.21.90
  • Dapper: v2.0.143

Code:

Argument values to QueryAsync (see the call sketch after this list):

  • T: See sanitized POCO below.
  • sql: see sanitized query below.
  • param: null
  • transaction: null
  • commandTimeout: 600
  • commandType: Text
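
For reference, a minimal sketch of the call as described above (the connection and sql variable names are illustrative):

    var rows = await connection.QueryAsync<DataRow>(
        sql,                      // the parameterless SELECT shown below
        param: null,
        transaction: null,
        commandTimeout: 600,
        commandType: CommandType.Text);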

Out of a total 1:06 execution time, the following block of code in SqlMapper.Async takes 1:02:

while (await reader.ReadAsync(cancel).ConfigureAwait(false))
{
    object val = func(reader);
    buffer.Add(GetValue<T>(reader, effectiveType, val));
}

Stepping through this block via symbol file indicates no observable performance lag. No branch other than the first is ever hit in the GetValue<T> method:

[MethodImpl(MethodImplOptions.AggressiveInlining)]
private static T GetValue<T>(DbDataReader reader, Type effectiveType, object val)
{
    if (val is T tVal)
    {
        return tVal;
    }
    else if (val == null && (!effectiveType.IsValueType || Nullable.GetUnderlyingType(effectiveType) != null))
    {
        return default;
    }
    else if (val is Array array && typeof(T).IsArray)
    {
        var elementType = typeof(T).GetElementType();
        var result = Array.CreateInstance(elementType, array.Length);
        for (int i = 0; i < array.Length; i++)
            result.SetValue(Convert.ChangeType(array.GetValue(i), elementType, CultureInfo.InvariantCulture), i);
        return (T)(object)result;
    }
    else
    {
        try
        {
            var convertToType = Nullable.GetUnderlyingType(effectiveType) ?? effectiveType;
            return (T)Convert.ChangeType(val, convertToType, CultureInfo.InvariantCulture);
        }
        catch (Exception ex)
        {
#pragma warning disable CS0618 // Type or member is obsolete
            ThrowDataException(ex, 0, reader, val);
#pragma warning restore CS0618 // Type or member is obsolete
            return default; // For the compiler - we've already thrown
        }
    }
}

Below is a mapping of the properties of the sanitized POCO class used as the generic type for QueryAsync/Query to the corresponding value types from the Oracle query:

    public class DataRow
    {
        public string Value1 { get; set; } // VARCHAR2(32) NULLABLE
        public DateTime Value2 { get; set; } // DATE
        public string Value3 { get; set; } // VARCHAR2(50) NULLABLE
        public string Value4 { get; set; } // VARCHAR2(150) 
        public string Value5 { get; set; } // VARCHAR2(32) NULLABLE
        public string Value6 { get; set; } // VARCHAR2(150)
        public string Value7 { get; set; } // VARCHAR2(200)
        public string Value8 { get; set; } // VARCHAR2(50) NULLABLE
        public string Value9 { get; set; } // VARCHAR2(500) NULLABLE
        public string Value10 { get; set; } // VARCHAR2(50)
        public string Value11 { get; set; } // VARCHAR2(50) NULLABLE
        public string Value12 { get; set; } // VARCHAR2(50) NULLABLE
        public long Value13 { get; set; } // NUMBER
        public long Value14 { get; set; } // NUMBER
        public string Value15 { get; set; } // VARCHAR2(601) [Calculated value from query]
        public double Value16 { get; set; } // NUMBER [Calculated value from query]
        public double Value17 { get; set; } // NUMBER [Calculated value from query]
        public string Value18 { get; set; } // VARCHAR2(50)
        public double? Value19 { get; set; } // NUMBER NULLABLE [Calculated value from query]
        public bool Value20 { get; set; } // VARCHAR2(5) NULLABLE
        public long? Value21 { get; set; } // NUMBER(38,0) NULLABLE [from query LEFT JOIN]
        public string Value22 { get; set; } // VARCHAR(200) NULLABLE
        public double? Value23 { get; set; } // NUMBER(38,0) NULLABLE
        public double? Value24 { get; set; } // NUMBER(38,0) NULLABLE
        public double? Value25 { get; set; } // NUMBER(38,0) NULLABLE
        public double? Value26 { get; set; } // NUMBER(38,0) NULLABLE
        public double? Value27 { get; set; } // FLOAT NULLABLE
        public double? Value28 { get; set; } // FLOAT NULLABLE
        public double? Value29 { get; set; } // FLOAT NULLABLE
        public double? Value30 { get; set; } // FLOAT NULLABLE
        public double? Value31 { get; set; } // NUMBER [Calculated value from query]
    }

Below is the sanitized SQL query:

SELECT 
    wc.wc_row_id AS value13
    ,SUBSTR(wc.wc1, 2) AS value1
    ,wc.wc2 AS value2
    ,wc.wc3 AS value3
    ,wc.wc4 AS value12
    ,wc.wc5 AS value4
    ,wc.wc6 AS value6
    ,wc.wc7 AS value5
    ,wc.wc8 AS value7
    ,wc.wc9 AS value8
    ,wc.wc10 AS value9
    ,wc.wc11 AS value11
    ,ic.ic1 AS value14
    ,ic.ic2 || '\' || ic.ic3 as value15
    ,ic.ic4 AS value10
    ,TRUNC(iss.ic5,2) AS value16
    ,TRUNC(100*iss.ic6 / iss.ic7,2) AS value17
    ,wz.wz1 AS value18
    ,GREATEST(ABS(wz.wz2), ABS(wz.wz3), ABS(wz.wz4), ABS(wz.wz5)) as value19
    ,wz.wz6 as value20
    ,adc.adc1 As value21
    ,dc.dc1 as value22
    ,adc.adc2 as value23
    ,adc.adc3 as value24
    ,adc.adc4 as value25
    ,adc.adc5 as value26
    ,adc.adc6 as value27
    ,adc.adc7 as value28
    ,adc.adc8 as value29
    ,adc.adc9 as value30
    ,TRUNC(COALESCE(COUNT(ad.adc1) OVER (PARTITION BY adc.adc1) / iss.ic7, 0),6) AS value31
FROM
    wc
    JOIN ic ON ic.wc_row_id = wc.wc_row_id
    LEFT JOIN iss ON iss.wc_row_id = wc.wc_row_id AND iss.ic_row_id = ic.ic_row_id
    LEFT JOIN wz ON wz.ic_row_id = ic.ic_row_id AND wz.wc_row_id = wc.wc_row_id 
                AND wz.wz8 = 'KURT'
                AND wz.wz9 = 'Original'
    LEFT JOIN adc ON adc.ic_row_id = ic.ic_row_id
    LEFT JOIN dc ON dc.dc_id = adc.dc_id
    LEFT JOIN ad ON ad.wc_row_id = adc.wc_row_id AND ad.ic_row_id = adc.ic_row_id AND ad.adc_row_id = adc.adc_row_id
WHERE 1=1 
    AND wc.wc1 IN 'ThisValue'
    AND wc.TIME>to_date('08/09/2023 10:00:00','mm/dd/yyyy hh24:mi:ss') 
    AND wc.TIME<=to_date('08/09/2023 10:10:00','mm/dd/yyyy hh24:mi:ss')  

Other observations:

  • Same duration using generic-less QueryAsync (not surprising since it is the same code under the hood; however, this suggests the problem isn't with my POCO).
  • Same duration using Query<T> (which also leverages the same while loop as above).
  • Response time improves as columns are eliminated, but there is no obvious indication that any given column(s) are problematic in particular. The delta is only obvious when a significant fraction of the columns are removed.
  • No spikes in CPU or memory consumption during execution of QueryAsync<T>. CPU ranges between 20-30% and memory was flat at 43%.
  • Subsequent execution is faster thanks to buffering, but first-time execution speed is critical since the query is parameterized during normal usage and repeat parameterization is rare.
  • The buffered parameter is not available with QueryAsync<T>; however, when I tried setting it to false with Query<T> (see the sketch after this list), the performance was about the same.
  • The queries used in actual production would need to pull 10-20X as much data in an efficient manner. The time to get data back from QueryAsync appears to scale super-linearly with data volume, although this is just a rough estimate.
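
For reference, the unbuffered variant mentioned in the list above is a one-argument change (a sketch; variable names are illustrative):

    // lazy enumeration: rows are yielded as they are read rather than buffered up front
    var rows = connection.Query<DataRow>(sql, buffered: false);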

Specific inquiries:

  • Is it reasonable to expect faster first-time response from QueryAsync or is a 6-7X degradation compared to the native SQL query expected?
  • If better performance is a reasonable expectation, what's your best guess as to the source of the problem: my query, the POCO, the Dapper code, some configuration?
  • What can I do to improve matters or get a better handle on benchmarking? (Would a stored procedure be likely to be more performant?)
  • Let me know what other information I can provide that would help make this actionable for you.

Really hoping someone can assist. Dapper is deeply embedded in our code but the performance especially for large queries is becoming critical.

@kriewall kriewall changed the title Comparatively slow execution of parameterless SQL SELECT query Comparatively slow execution of parameterless Oracle SQL SELECT query Aug 11, 2023
@mgravell
Member

Firstly, this is going to be virtually impossible for me to do much with without runnable code with illustrative data that actually reproduces the scenario, as this looks massively contextual. However, if we generalize, there are 6 main causes of possible bottlenecks here:

  1. raw database performance
  2. raw bandwidth issues
  3. internal buffering / resize overheads (in the buffer.Add)
  4. overheads in the ADO.NET provider etc
  5. overheads inside Dapper
  6. framework overheads

Let's discuss each separately

raw database performance

It is great that you've looked at timings in SQL Developer - that helps us establish some baseline numbers, however: we need to be a little careful. I don't know about Oracle (I'm simply not an Oracle person), but if I compare to SQL Server: there are actually some connection-mode SET flags that are set differently by default in SSMS (SQL Server's equivalent to SQL Developer) than in ADO.NET, and those flags can give different performance. I suspect you're probably right enough here, but: you might want to double check there.

raw bandwidth issues

I don't know what your data is, but if it contains a lot of large CLOB/BLOB data, then yes fetching that can cause perceived performance problems; and tools like SSMS (and by extension, possibly SQL Developer) often don't fetch the entire payload by default - they select the first "some bytes", usually enough to fill a cell with a "bkdjhfghdfb..." with the "..." telling you "this is bigger". Fetching the full values obviously requires more network bandwidth and internal database IO. You might want to verify whether this is giving misleading timings in SQL Developer. Also, is the test machine the same here between the SQL Developer and .NET runs? (as obviously different machines may have different routes and network capacities)

The fact that not selecting certain columns impacts the performance: makes me very suspicious that this might be a factor.

I suspect that this is the most likely cause of the difference, honestly.

internal buffering / resize overheads

I don't think this should be an issue, as it should reach a big size pretty efficiently; however, I wonder if we can try some things here. You mention that Query<T> with buffered: false takes about the same, which makes me doubt that this is a factor. For async, this API does now exist; see QueryUnbufferedAsync<T>. But it sounds like we can most likely eliminate this. As a side note, in the entirely experimental DapperAOT bits, there is meant to be an option to provide a size hint on queries (i.e. "assume 10k rows") which would help if that was a factor, but it looks like that got omitted in the recent big rework - I'll add that back, just mentioning for curiosity.
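
For illustration, a minimal sketch of the unbuffered async usage (note that QueryUnbufferedAsync is defined on DbConnection rather than IDbConnection):

    // rows stream back as an IAsyncEnumerable<T> instead of being buffered into a list
    await foreach (var row in connection.QueryUnbufferedAsync<DataRow>(sql))
    {
        // process each row as it arrives
    }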

overheads in the ADO.NET provider etc

I can't do anything about that; that would be an internal OracleCommand etc thing

overheads inside Dapper

There aren't a lot here; we've worked pretty hard to make things pretty efficient; one thing that might be interesting is to write the same code manually via raw ADO.NET and see if it behaves similarly, vs whether it is now magically faster. I suspect that if you tried this, you'd find that it behaves very similarly, but I can't do this without a runnable repro. The DapperAOT bits have some slightly more efficient data access API usage, which might be worth investigating but emphasis: this package is not ready for real usage. If I had a runnable sample, that's one thing I'd look at, purely as a quick way of finding things.
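
For illustration, the raw ADO.NET comparison might look roughly like this (a sketch, assuming connection is a DbConnection; the List<object[]> buffering stands in for Dapper's materialization):

    using var cmd = connection.CreateCommand();
    cmd.CommandText = sql;
    using var reader = await cmd.ExecuteReaderAsync();
    var buffer = new List<object[]>();
    while (await reader.ReadAsync())
    {
        var values = new object[reader.FieldCount];
        reader.GetValues(values);   // copy the raw column values for this row
        buffer.Add(values);
    }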

framework overheads

Again, outside of our control, but a possible cause of problems - in particular on the async path; so one thing I'd be keen to try is sync vs async; but I suspect there's no big problems here.


Again, without runnable code it is hard to say much, but my suspicion (and it can be no more than that, without a runnable scenario) is that this is simply CLOB/BLOB bandwidth caused by the necessity of fetching the entire payloads.

@mgravell
Member

For completeness, one other possible factor: "sync context". The easiest way of ruling this out is to try sync vs async. If sync is just as slow: it isn't that!

@kriewall
Author

Many thanks for the response @mgravell.

Acknowledged your point about the executable code. I'm hoping I can maybe run with your general guidance since getting a standalone package up and running is more of an undertaking than I can take on right now.

To your points:

  • raw database performance: Duly noted. I was not aware of the flag-setting behavior, I'll dig in and see if I can get more details.
  • raw bandwidth issues: This is actually one I feel close to ruling out. The comparison was indeed done on the same machine. Our schema has no BLOBs/CLOBs to speak of; the largest column that I access is a VARCHAR2(400) and the largest value returned is a concatenation of two columns, equivalent to a VARCHAR2(600). Regarding SQL Developer, I suspected it was cheating at first, so I wrapped the query in a spool command to force the output to CSV when running the query as a script:
SELECT sysdate from dual;
set term off;
set feed off;
spool "c:\temp\viewquery.csv";
set sqlformat csv;

-- My query here --

spool off;
SELECT sysdate from dual;

I confirmed that the CSV file has fully populated data. Curiously, the feed off setting didn't suppress the default 5000 lines of output, so I think the sysdate times that registered are actually inclusive of the print time, and hence conservative. Based on my observation of the execution, it looked like the steps were being executed sequentially, but I could be wrong. Let me know if you think this was a reasonable approach to characterizing the reference query time.

I should note that the DB connection I'm using is especially bad. I'm testing against it for precisely that reason, because it's the one that's going to annoy people the most. Under better network conditions Dapper would undoubtedly perform better, but I think this helps amplify any deltas between native and Dapper execution.

  • internal buffering / resize overheads: I'll check out QueryUnbufferedAsync, worthwhile for completeness' sake.

  • overheads in ADO.NET or Dapper: Rolling my own implementation around the Oracle provider is probably my next step. If the latency is comparable, then I think we can safely attribute it to the Oracle implementation; if ADO.NET is better that might suggest something on the Dapper side.

  • framework overheads / sync context: No difference between sync and async methods from what I can see, so I think we can rule this one out.

Maybe I missed it on your DapperAOT announcement issue, but what's the rough timeline for a stable v1?

@kriewall
Author

I double-checked the SQL Developer export by opening the exported CSV file in VSCode, wiping its contents, and watching the live refresh as the query executed. It does indeed complete with a full data set in about 12s. This is for a slightly altered time range to ensure the query isn't cached.

@mgravell
Member

Re v1: it builds on new compiler features (not language features) in net8; so: not before net8 which should be in November. It still targets all platforms - you don't need to use net8 - you just need an updated compiler.

@mgravell
Member

I agree the main thing to try here is a raw ADO.NET version, to see if that is faster.

@kriewall
Author

Okay, it does seem to be something with the Oracle ADO.NET provider. When I swap things out and just loop through DbDataReader.ReadAsync() without any action or mapping on the row results, the performance is every bit as slow as when wrapped by Dapper. Same if I use the sync methods; and changing the fetch size doesn't make any difference either.

I'm really mystified as to why there could be such a performance delta but I'll close this issue and won't trouble you further. Thanks for giving me some pointers to look into.

@kriewall
Author

I spoke too soon: fetch size does matter. OracleCommand has a custom property, FetchSize, which, if increased (I thought at first this was defined in terms of rows, but it is actually bytes), improves the response time dramatically. The only problem is that this is not defined on DbCommand; it is custom to OracleCommand. I realize this may fall into the 5% of cases that Dapper is not intended to handle, but is there any way for me to specify FetchSize (or set other custom OracleCommand properties) when executing QueryAsync<T> against an OracleConnection?
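
For concreteness, the kind of tweak I mean looks roughly like this in raw ADO.NET (a sketch; the 16 MB value is illustrative, and the cast assumes an underlying OracleConnection):

    using var cmd = (OracleCommand)connection.CreateCommand();
    cmd.CommandText = sql;
    cmd.FetchSize = 16 * 1024 * 1024; // bytes (not rows) fetched per database round trip
    using var reader = await cmd.ExecuteReaderAsync();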

@kriewall kriewall reopened this Aug 14, 2023
@mgravell
Member

We set InitialLONGFetchSize to -1, which is guidance we've been given previously:

    // .InitialLONGFetchSize = -1

Does this have the same effect? Is this completely / subtly different?

@mgravell
Member

Reading the docs, it sounds like FetchSize is slightly different; the tricky bit, though, is where to configure it. Would this need to be controlled fairly granularly, i.e. per query, or is a global setting sufficient?

@mgravell
Member

Not in Dapper, but in DapperAOT, I have an experiment that works:

    [CommandProperty<OracleCommand>(nameof(OracleCommand.FetchSize), 1024)]
    static void SomeCode(DbConnection connection, string bar)
    {
        _ = connection.Query<Customer>("def");
        _ = connection.Query<Customer>("def", new { Foo = 12, bar });
    }
    public class Customer
    {
        public int X { get; set; }
        public string Y;
        public double? Z { get; set; }
    }

This works for any command-level instance values, and ends up emitting tweaks as you would expect:

        public override global::System.Data.Common.DbCommand GetCommand(global::System.Data.Common.DbConnection connection,
            string sql, global::System.Data.CommandType commandType, object? args)
        {
            var cmd = base.GetCommand(connection, sql, commandType, args);
            if (cmd is global::Oracle.ManagedDataAccess.Client.OracleCommand cmd0)
            {
                cmd0.FetchSize = 1024;

            }
            return cmd;
        }

Would this approach work? I can't guarantee that I'm going to backport this to Dapper vanilla (see #1909), but: it could possibly be done.

@kriewall
Author

Yeah, FetchSize seems to be general-purpose, whereas InitialLONGFetchSize appears to apply to only certain column types. I did a quick check on the effects of increasing InitialLONGFetchSize and it had no effect, but the column data returned by my query is either VARCHAR2 or NUMBER.

For starters, I think a global setting for FetchSize would suffice to break the logjam, although I think ideally it should be configurable at a query level. I have a range of queries, some which are pretty light and others which return sizeable datasets, so I could see benefit to being able to tune on a per-query basis.

Regarding the DapperAOT example, I like the prospect of being able to decorate an object with the fetch size to be applied to the associated query; that said, it would be a significant refactor to associate the queries statically with the POCO. FWIW, I currently encapsulate my queries inside an object that implements an interface like this:

    /// <summary>
    /// Represents a parameterized SQL query
    /// </summary>
    public interface ISqlContent
    {
        /// <summary>
        /// Gets the parameterized query
        /// </summary>
        /// <returns>Parameterized query</returns>
        string GetContent();

        /// <summary>
        /// Gets the parameter substitution dictionary
        /// </summary>
        /// <returns></returns>
        IDictionary<string, object> GetParameters();
    }

The instance takes in all the arguments to be injected into the query, and I use reflection to create the parameter dictionary; the content and parameters are ultimately used with QueryAsync. In sum, I think I'm closer to the vanilla usage of Dapper (var dog = connection.Query<Dog>("select Age = @Age, Id = @Id", new { Age = (int?)null, Id = guid });) than I am to that above - but let me know if I'm missing something obvious.
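
For completeness, consumption of that interface looks roughly like this (a sketch; SomeQuery is a hypothetical ISqlContent implementation, and Dapper accepts an IDictionary<string, object> directly as the parameter object):

    ISqlContent query = new SomeQuery(args);
    var rows = await connection.QueryAsync<DataRow>(
        query.GetContent(),      // parameterized SQL text
        query.GetParameters());  // parameter name/value dictionary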

@mgravell
Member

mgravell commented Aug 14, 2023

Interesting. Configuration (and how/where to do it) is always fun. For Dapper vanilla, anything beyond "single global configuration" is probably going to be too hard to squeeze in. Curiously though, we already have a few options in DapperAOT even for your more nuanced scenario - for example, it is possible to write a custom "command factory" which can inspect the args and command, and do something based on, say, a known dictionary key or an interface your code knows about - so it can make any decisions it wants.

Does global help at all, or is that not enough control?

@kriewall
Author

I'd be delighted with global control. :)

mgravell added a commit to DapperLib/DapperAOT that referenced this issue Aug 15, 2023
@mgravell
Member

mgravell commented Aug 15, 2023

Added DapperAOT example ^^^; in particular see https://github.com/DapperLib/DapperAOT/blob/main/test/Dapper.AOT.Test/Interceptors/GlobalFetchSize.input.cs - the reality is that DapperAOT has much better plugin points for this kind of ad-hoc request; I'll see if we can squeeze something into Dapper core, but: more complex to do; that is on top of https://github.com/DapperLib/DapperAOT/blob/main/test/Dapper.AOT.Test/Interceptors/CommandProperties.input.cs, which provides another mechanism (again in AOT)

@kriewall
Author

Understood; sorry to ask you to revisit the original Dapper when you're trying to move on to something newer and better. If you come to the conclusion that it's more effort than it's worth, please let me know ASAP, as I'll need to figure out an alternative way to modulate the fetch size. Since DapperAOT is on the horizon, I think I can clone your code locally and put some hacks in it to get me by until November.

@mgravell
Member

I'm tempted to try and hack it in, regardless; I think I see a way to do it; my main reservation is actually the thought "this value should be contextual", but the way I figure: that applies to the existing default too, and that certainly isn't contextual - so maybe an override default is fine;

let me have a stab at it later - I know where to look and what to watch out for

@mgravell
Member

this very much shows some of the problems the AOT push is trying to resolve, though; right now, hooking in any change like this: is hella awkward and requires lots of knowledge of ref-emit; if we can get the AOT work "done", then we get much better options, where ref-emit becomes a fond but distant memory, and where tweaks are simple and provably work

mgravell added a commit that referenced this issue Aug 16, 2023
@mgravell
Member

@kriewall so... how do you feel about testing this? Reality is I'm simply not an Oracle person; in theory setting SqlMapper.Settings.FetchSize should do the right thing; note that you should only set this once - it isn't intended for constant changing.
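
That is, something like this once at startup (a sketch; the value is illustrative):

    // set once at application startup; flows through to OracleCommand.FetchSize (in bytes)
    SqlMapper.Settings.FetchSize = 16 * 1024 * 1024;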

Are you in a position to build locally, or should I setup a myget deploy?

@kriewall
Author

@mgravell Totally on board with testing. Looks like the changes are on the fetchsize branch. I'll try to find time today and do the validation.

@kriewall
Author

@mgravell Looks like it works! The only thing I noticed is that the default value for SqlMapper.Settings.FetchSize seems to be -1, which from an OracleCommand standpoint is an invalid entry.


That said, when I check the OracleCommand.FetchSize value at the time of usage it is equal to the default value of 131072. If I set a custom SqlMapper.Settings.FetchSize value >0 at startup I see that value reflected in OracleCommand.FetchSize when the command is constructed within Dapper. Large FetchSize values dramatically improve the performance of large queries.

Let me know if there's anything else you'd like me to do from a validation perspective. Will this change be published to NuGet soon, or is there a way I can get my hands on a prerelease version?

@mgravell
Member

mgravell commented Aug 17, 2023 via email

@mgravell
Member

Re NuGet: yes, we can get this merged quickly - probably Friday

(I replied on the -1 thing by email; it should appear here soon?)

@kriewall
Author

Clear on the -1 and its meaning. Seems reasonable and is consistent with the fact that the FetchSize is the Oracle default unless explicitly assigned.

Appreciate your support on this! I'll stay tuned for the package push.

mgravell added a commit that referenced this issue Aug 17, 2023
@mgravell
Member

https://github.com/DapperLib/Dapper/releases/tag/2.0.151
