Performance

This page has tips for getting better performance out of Nevermore.

Nevermore 12.0+ is generally faster for most queries, and with larger documents, uses much less memory. If you're on an older version, consider upgrading. See what's new in Nevermore 12.0+ for benchmarks.

Nevermore is optimized particularly around reading, with a focus on being fast (similar to hand-coded SqlCommand speed) and with very few large memory allocations. This section explains a bit about how it all works.

Reading

When querying or loading documents, Nevermore uses a different "strategy" depending on the type being queried (primitive, value tuple, arbitrary class, or a mapped document). Each strategy behaves a little differently, but the core idea is the same.

Reader behavior

When Nevermore calls ExecuteReader, it specifies the SequentialAccess command behavior for the data reader, which allows Nevermore to read and process data as it streams in from SQL Server without buffering the entire row. This is especially important for large documents. It also means each Nevermore strategy must read columns strictly in order (i.e., field 2 is read before field 3, and so on).
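
In plain ADO.NET terms, the reading path looks roughly like the sketch below. This is a simplified illustration rather than Nevermore's actual code; the command variable and the column layout are assumed for the example.

using System.Data;
using System.Data.Common;

static void ReadRows(DbCommand command)
{
    // SequentialAccess streams each row from SQL Server rather than buffering it,
    // which means columns must be read in index order and never revisited.
    using var reader = command.ExecuteReader(CommandBehavior.SequentialAccess);
    while (reader.Read())
    {
        var id = reader.GetString(0);         // column 0 first...
        var firstName = reader.GetString(1);  // ...then column 1, and so on
        // the (potentially large) JSON column comes last and is streamed
    }
}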

Compilation

Most strategies keep a cache of queries they've seen before, and a compiled func that knows how to execute the reader for that particular query.

  • For primitives, it creates a func for each type you query (int, string)
  • For ValueTuples, it creates a func for each tuple combination you query (ValueTuple<int, int>, ValueTuple<string, int>)
  • For arbitrary classes (without a document map) and for mapped documents, it creates one per combination of type and returned columns

The compiled func will look as much as possible as if it was written by hand. For example, when reading a document with two mapped columns, it will generate and compile something like this:

(DbDataReader reader, DocumentReaderContext context) =>
{
    var temp0 = reader.GetString(0);                                     // Id
    var temp1 = reader.GetString(1);                                     // FirstName
    var temp2 = reader.IsDBNull(2) ? (int?)null : reader.GetInt32(2);    // Age

    var result = context.DeserializeText<Contact>(reader, 7);

    result.Id = temp0;
    result.FirstName = temp1;
    result.Age = temp2;
    return result;
}

The above is overly simplified - there are extra null checks (e.g., calling IsDBNull() before GetString()), field assignments to keep track of which column is currently being read in case there's an error, and so on. But you get the picture. Note that each field is read into a temporary variable, because we can't deserialize the document until we reach the JSON column, and the JSON column is usually not at the start of the query.

If you're curious, look at DocumentReaderExpressionBuilder in the source code. Also, if your query causes an exception, the error message will include the source code for the generated reader in case it isn't what you expect.

The compiled func itself is fast, but building it is slow, so the result is cached and reused for future queries.

Deserialization

Nevermore can store JSON either as text, or in a compressed format.

For JSON as text, there are two strategies: one when we know there will always be small documents, and one if we think there could be large documents.

  • Small documents: Nevermore simply calls DbDataReader.GetString() and then passes it to JSON.NET for deserialization
  • Large documents: Nevermore uses a custom TextReader that calls through to DbDataReader.GetChars() which allows it to read and process the text directly into a buffer as needed, without loading the entire document into memory.

The benefit of the large document approach is great: Nevermore can completely avoid LOH allocations and keep overall memory usage down. However, for small documents (1K or smaller) it's slightly slower, which is why there are two strategies.

How does Nevermore know which strategy to use? It looks at an ExpectLargeDocuments property on the document map, which defaults to false. If it's true, Nevermore always uses the large document strategy.

By default, ExpectLargeDocuments is false - but don't worry. If Nevermore reads a string larger than 1K when deserializing, it will set ExpectLargeDocuments to true and remember it for next time (until the application restarts). So in this way, Nevermore is self-tuning.

This means:

  • If you know for sure that the table will have at least a few large documents, set ExpectLargeDocuments to true.
  • If you are very confident it won't, or at least for some customers it won't, leave it as false.

In Octopus Deploy's case, roughly 80% of documents fit the small category (1K or under). Looking at large customer databases, only certain tables tend to have clusters of large documents.
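
If you do want to opt a table in up front, setting the flag on its document map might look like the sketch below (the Contact class and columns are illustrative; the map follows Nevermore's usual DocumentMap pattern):

class ContactMap : DocumentMap<Contact>
{
    public ContactMap()
    {
        Column(c => c.FirstName);
        Column(c => c.Age);

        // This table is known to hold documents well over 1K, so opt in to the
        // large-document (streaming) deserialization strategy from the start.
        ExpectLargeDocuments = true;
    }
}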

For compressed JSON, Nevermore calls GetStream on the data reader and wraps it in a GZipStream, decompressing the data as it's passed to the JSON.NET deserializer.

Nevermore also makes use of array pools when possible to reduce allocations, though all buffers used are quite small anyway.

Streaming results

Once the expression is compiled, the actual reading is simply done one row at a time by calling the compiled func.

If you call transaction.Stream, rows are returned one at a time as they are read. This is good if you need to process the rows but don't need them all in memory at once.

If you use transaction.Query(), you have the choice of Stream(), ToList() and so on. Use Stream() if possible.
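
For example (a sketch - the Customer type and the SendEmail call are illustrative):

// Streams rows one at a time; the full result set is never held in memory.
foreach (var customer in transaction.Query<Customer>()
    .Where(c => c.LastName == "Smith")
    .Stream())
{
    SendEmail(customer);
}

// By contrast, ToList() buffers every row into a list before returning.
var all = transaction.Query<Customer>().Where(c => c.LastName == "Smith").ToList();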

Writing

In general, most optimization work in Nevermore is focused on reading data. For writing, the focus is mostly on memory use, especially with large documents.

When inserting or updating documents, Nevermore simply generates a SQL statement with the fields you'd expect, and generates command arguments, then executes it. The properties are read from the object into the SqlCommand using compiled funcs (one for each property, cached on the document map).

When serializing JSON (as text or compressed), Nevermore first has to serialize the document to a memory stream. For this, it uses RecyclableMemoryStream, which in testing performs much better, especially when documents are large. The memory stream is then passed to the SqlCommand as a stream - when the command is executed, the stream is read. Again, this means there should be no LOH allocations even when dealing with very large documents.
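
Conceptually, the write path looks something like the sketch below. This is not Nevermore's actual code - the raw ADO.NET types are used directly to illustrate the idea, RecyclableMemoryStreamManager comes from the Microsoft.IO.RecyclableMemoryStream package, and in Nevermore the manager would be a shared, long-lived instance.

using System.Data;
using System.IO;
using Microsoft.Data.SqlClient;
using Microsoft.IO;
using Newtonsoft.Json;

static void InsertDocument(SqlCommand command, JsonSerializer serializer, object document)
{
    // Created here only to keep the sketch self-contained.
    var manager = new RecyclableMemoryStreamManager();

    // Serialize the document into a pooled memory stream rather than a string.
    using var buffer = manager.GetStream();
    using (var writer = new StreamWriter(buffer, leaveOpen: true))
    {
        serializer.Serialize(writer, document);   // JSON.NET serialization to a stream
    }
    buffer.Position = 0;

    // SqlClient streams the TextReader while the command executes, so the JSON
    // never exists as one large string on the managed heap.
    command.Parameters.Add(new SqlParameter("@JSON", SqlDbType.NVarChar, -1)
    {
        Value = new StreamReader(buffer)
    });
    command.ExecuteNonQuery();
}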

When JSON is compressed, Nevermore uses GZipStream with the compression level set to Optimal by default. This is because we assume the SQL Server database will be on another server, and the goal is to reduce network I/O on large documents. Documents are written infrequently but read often, so this seems like a good tradeoff. You can, however, change the compression level if you need to.

Other

Beyond reading and writing documents, Nevermore has a few other operations - ExecuteQuery, ExecuteReader and so on. These all call the corresponding method on SqlCommand so there isn't much to optimize.

Queries & query plan cache

Whenever a query is run for the first time in SQL Server, it is compiled and a query plan is generated for the query. Every query requires a query plan before it is actually executed. This query plan is stored in SQL Server query plan cache. This way when that query is run again, SQL Server doesn't need to create another query plan; rather it uses the cached query plan, which improves database performance.

From https://www.sqlshack.com/understanding-sql-server-query-plan-cache/

What's not obvious is that a lot of things can determine whether SQL Server can re-use the query plan. These include:

  • The names of parameters (@firstname1 vs. @firstname2)
  • The types and sizes of parameters (@firstname1 nvarchar(32) vs. @firstname1 nvarchar(27))
  • Casing, whitespace, and other differences in the query text

These can make a significant difference to query performance. To make best use of the query plan cache, Nevermore does a few things:

Sizes of string variables

When you add a string parameter to a SqlCommand, by default it's defined with the length of the string value. For example, if you run two queries with different-length strings as parameters, you'll get two query plans.

To solve this, when adding string parameters, Nevermore sets an explicit string length, rounded up to one of a few bucket sizes. The buckets are:

nvarchar(100)
nvarchar(200)
nvarchar(600)
nvarchar(1000)

So if you do these queries:

transaction.Query<Customer>().Where(c => c.Email == "test1@octopus.com").FirstOrDefault();
transaction.Query<Customer>().Where(c => c.Email == "test17@octopus.com").FirstOrDefault();

The @email parameter will be passed as nvarchar(100) for both, and SQL can reuse the query plan.

IN queries

When you perform a query like this:

var surnames = new[] { "Robinson", "Smith" }; 
transaction.Query<Customer>().Where(c => c.LastName.In(surnames)).FirstOrDefault();

Nevermore will generate an IN query:

select * from dbo.Customer where LastName in (@lastname_0, @lastname_1)

Therefore, the number of values in the array determines the number (and names) of parameters in the query. This is a problem because it generates many query plans that cannot be reused.

Instead, Nevermore will attempt to "pad" the array into buckets of a set size, so that the query plan is more likely to be reused. It does this by repeating the last value in the array until it reaches the bucket size.

The bucket sizes are:

1, 2, 3, 4, 5, 10, 15, 20, 30, 50, 75, 100, 150, 200, 500, 750, 1000
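
For example, an array of seven surnames would be padded up to the next bucket (10) by repeating the seventh surname three more times, so any array of six to ten values produces the same ten-parameter statement (following the parameter naming shown above) and can share one query plan:

select * from dbo.Customer where LastName in (@lastname_0, @lastname_1, @lastname_2, @lastname_3, @lastname_4, @lastname_5, @lastname_6, @lastname_7, @lastname_8, @lastname_9)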

It should be noted that if you're doing queries with a hundred or more values, you should use table-valued parameters.

Predictable parameter names

Nevermore tries to use predictable names for SQL parameters - if you run the same query within a transaction (or between transactions), you should get the same names for command parameter values.

To detect cases where this doesn't happen, Nevermore has a feature you can enable in the configuration:

#if DEBUG
configuration.DetectQueryPlanThrashing = true;
#endif

We suggest doing this in debug builds of your application only, as it adds overhead to process each query, and it's better to go ahead with the query than break the application!

If it detects an issue, it will throw an exception with a message like this:

Nevermore.Diagnostics.DuplicateQueryException: 

Detected a SQL query that is otherwise a perfect duplicate of another 
query, except with different parameter names. This is likely to create 
thrashing of the query plan cache.

The statement being executed this time was: 

select * from dbo.Customer where Id = @id_1

The statement executed last time was:

select * from dbo.Customer where Id = @id

Rewrite your query to use more predictable parameter names, as this 
will allow the database to re-use the query plan for both queries.

Performance benchmarks

TODO: Include

Pitfalls and tips

This section includes a few tips around performance.

LoadMany and lots of parameters

A common pattern is to query a list of documents, then to discover you need an additional document for each one you just queried. This would be a typical "SELECT N+1" problem.

Nevermore transactions have a Load method that takes a list of document IDs. In old versions of Nevermore, this generated a where Id in (@id1, @id2, @id3...) query, and was to be avoided. In Nevermore 12+, it uses table-valued parameters to pass the list of IDs, then joins on it. This is much faster - Nevermore can load 50,000+ documents this way in a few milliseconds. So use this method often!
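
A sketch of the pattern (the Order and Customer types, the Status and CustomerId properties, and the exact LoadMany overload are illustrative assumptions):

// Instead of loading each customer individually (SELECT N+1), gather the IDs
// and load them in one round trip - Nevermore passes them as a table-valued
// parameter and joins on it.
var orders = transaction.Query<Order>().Where(o => o.Status == "Open").ToList();
var customerIds = orders.Select(o => o.CustomerId).Distinct().ToList();
var customers = transaction.LoadMany<Customer>(customerIds);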

In fact, when doing any other kind of query where you need hundreds of parameters, you can call CommandParameterValues.AddTable, which defines a table-valued parameter containing the list you give it, with a single column called ParameterValue. You can then join on it:

List<string> emails = GetHugeListOfEmailAddresses();

var args = new CommandParameterValues();
args.AddTable("emailList ", emails);

transaction.Stream<Person>(@"
    select p.* from Person p 
       inner join @emailList t 
       on t.[ParameterValue] = p.Email",
   args);

Learn more about Table-Valued Parameters in Nevermore.

Consider enabling compression

If the time spent reading and writing JSON from the database and transferring it over the network between the application and SQL Server is a concern, consider enabling compression. This makes a very big difference in bandwidth-constrained environments, especially for larger documents.
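
Compression is opted into per document map. The sketch below assumes the JsonStorageFormat property and its CompressedOnly value from Nevermore 12's compression support - treat these exact names as assumptions and verify them against your version; the DeploymentProcess class is illustrative.

class DeploymentProcessMap : DocumentMap<DeploymentProcess>
{
    public DeploymentProcessMap()
    {
        // Store the JSON gzip-compressed instead of as nvarchar text
        // (assumed property/enum names - check your Nevermore version).
        JsonStorageFormat = JsonStorageFormat.CompressedOnly;
    }
}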

Change the serializer?

Nevermore uses JSON.NET by default. You could instead use the System.Text.Json serializer, which seems to use less memory. You can do this by setting your own IDocumentSerializer on the store configuration. However, System.Text.Json assumes everything is UTF-8, while SQL Server returns nvarchar text as UTF-16, so you might waste time converting anyway.

Avoid stream with projection

There is an overload of Stream that takes a callback, which is used when doing a projection - when you want to do a query with a join, then map the result to a pair (or three or four) of documents. It works well when the result is small (or when you query an arbitrary class) but not for large documents. This is because the data reader cannot use SequentialAccess (as your projection may read columns out of order), so the entire row must be read into memory.

Large strings/arrays

Nevermore works hard to avoid any LOH allocations (objects 85K or more) by streaming and buffering everything it can. However, if your document includes one huge allocation anyway - e.g., because it has an 85K HTML string embedded in it, or a giant array of ints - then those allocations will still be needed.

For this kind of design, you might want to consider a separate table. One table could store the main JSON document (which is deserialized in full every time it's loaded), and the large data could live in a second table that you fetch only when it's actually needed (of course, maybe this data makes more sense to store on disk anyway!).

If you call IRelationalTransaction.ExecuteReader you'll get raw access to a data reader - you could use this, for example, to read the text using GetChars in chunks and write it to the HTML response directly, without ever allocating a large string.
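
For example, something like the following sketch streams a large text column to a TextWriter in 8K chunks. The query, column, pageId and output variables, and the exact ExecuteReader signature are assumptions for illustration; the chunked GetChars pattern is standard ADO.NET.

// Copy a large text column to an output writer without ever building the
// full string - reads 8K characters at a time into a reusable buffer.
using var reader = transaction.ExecuteReader(
    "select [Html] from dbo.Page where Id = @id",
    new CommandParameterValues { { "id", pageId } });

var buffer = new char[8192];
while (reader.Read())
{
    long offset = 0;
    long read;
    while ((read = reader.GetChars(0, offset, buffer, 0, buffer.Length)) > 0)
    {
        output.Write(buffer, 0, (int)read);
        offset += read;
    }
}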

Do not use varchar

SQL Server has two datatypes for representing strings - nvarchar and varchar. Nevermore must pass all strings as nvarchar, since the .NET string type can contain unicode. This results in implicit conversions when querying against varchar columns, which slows queries down.

For example, take this query:

var args = new CommandParameterValues
{
    { "ids", new List<string> { "MyId-18", "MyId-48" } }
};
var count = transaction.ExecuteScalar<int>(
    "select count(*) from dbo.SomeTable where Id in @ids", args);

The generated SQL looks like this:

exec sp_executesql N'select count(*) from dbo.SomeTable where Id in (@ids_1, @ids_2)',N'@ids_1 nvarchar(100),@ids_2 nvarchar(100)',@ids_1=N'MyId-18',@ids_2=N'MyId-48'

Notice how all string command values are passed to SQL with the N prefix, indicating nvarchar.

If the ID column is declared as nvarchar, we get a nice, efficient index seek.

However, if the ID column is declared as varchar, it results in an index scan, and a warning when you look at the execution plan.

[Execution plan screenshot: index scan caused by the implicit conversion]