
Incremental improvements to Npgsql #3

Open
divega opened this issue Nov 6, 2017 · 5 comments
@divega

divega commented Nov 6, 2017

@damageboy, @roji and @anpete have been discussing and looking at possible changes to Npgsql, as well as modifying the benchmark code, to try to get better performance (see npgsql/npgsql@c153d27, [insert a reference to "whole command caching"], and https://gist.github.com/damageboy/556ece30cf2368212773381ec01a8c06).

The results so far seem very promising.

Creating this issue so that we can hopefully centralize the conversation about these changes here (some of it has been happening over email, but it would be better for it to be in the open).

It would also be great if we could incorporate any modified benchmark code into one repository (e.g. this one) so anyone can play with it and try new changes. I believe the latest code from @damageboy is at https://github.com/damageboy/benchmarkdb.

@divega

divega commented Nov 6, 2017

@damageboy @roji are you cool with moving the conversation here? If yes, could you please summarize the current state of things?

@roji

roji commented Nov 7, 2017

Hey @divega, great to see things moving along.

Optimization efforts are currently taking place in Npgsql's dev branch, which will be released as 3.3.

Note that there are several ADO.NET-level API issues that also hurt Npgsql performance somewhat (e.g. https://github.com/dotnet/corefx/issues/25093); I'll post these here as well.

Npgsql builds on AppVeyor and Travis, and CI NuGet packages for dev are published to MyGet. The build is currently a bit unstable; I'll be working to stabilize things in the coming weeks, but you should already be able to grab packages and test (it's best to check which commit they correspond to, though).

@roji

roji commented Nov 7, 2017

@damageboy, maybe it's a good idea to post the latest performance run figures you got.

@damageboy

damageboy commented Nov 7, 2017

Hi all,

Here's where we are, performance-wise, with the latest iteration of the benchmarks:

Physical setup

I ran the benchmarks on two physical machines:

I ran this ugly thing to get an overall glimpse of these machines:

uname -a; cat /proc/cpuinfo | grep "model name" | sort -u; lspci -d 1924: ; ethtool sfn0 | grep Speed; sysctl net.core.rmem_max net.core.wmem_max; echo dotnet --version: $(dotnet --version)

Client machine info/stats:

Linux client 4.4.0-96-generic #119~14.04.1-Ubuntu SMP Wed Sep 13 08:40:48 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
model name      : Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz
03:00.0 Ethernet controller: Solarflare Communications Device 0a03 (rev 02)
03:00.1 Ethernet controller: Solarflare Communications Device 0a03 (rev 02)
Cannot get wake-on-lan settings: Operation not permitted
        Speed: 10000Mb/s
net.core.rmem_max = 8388608
net.core.wmem_max = 8388608
dotnet --version: 2.0.0

Server machine info/stats:

Linux server 4.4.0-96-generic #119~14.04.1-Ubuntu SMP Wed Sep 13 08:40:48 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
model name      : Intel(R) Core(TM) i7-5960X CPU @ 3.00GHz
03:00.0 Ethernet controller: Solarflare Communications Device 0a03 (rev 02)
03:00.1 Ethernet controller: Solarflare Communications Device 0a03 (rev 02)
Cannot get wake-on-lan settings: Operation not permitted
        Speed: 10000Mb/s
net.core.rmem_max = 8388608
net.core.wmem_max = 8388608
dotnet --version: 2.0.0

Postgres version:

dpkg -l | grep postgres:

ii  pgdg-keyring                                                     2017.1                                                    all          keyring for apt.postgresql.org
ii  postgresql-9.6                                                   9.6.5-1.pgdg14.04+1                                       amd64        object-relational SQL database, version 9.6 server
ii  postgresql-client                                                9.6+185.pgdg14.04+1                                       all          front-end programs for PostgreSQL (supported version)
ii  postgresql-client-9.6                                            9.6.5-1.pgdg14.04+1                                       amd64        front-end programs for PostgreSQL 9.6
ii  postgresql-client-common                                         185.pgdg14.04+1                                           all          manager for multiple PostgreSQL client versions
ii  postgresql-common                                                185.pgdg14.04+1                                           all          PostgreSQL database-cluster manager
ii  postgresql-contrib-9.6                                           9.6.5-1.pgdg14.04+1                                       amd64        additional facilities for PostgreSQL

Benchmark Code

I took @sebastienros's benchmark code, made some small changes to make it more reporting-friendly, and added a bunch of benchmarks.

The code is here: https://github.com/damageboy/benchmarkdb

Variants

Basically there are 4 sync + 4 async benchmark variants in my fork:

| Variant | Sync bench name | Async bench name |
| --- | --- | --- |
| Plain Vanilla | sync | async |
| Keep Connection Obj | sync+conn | async+conn |
| Keep Connection + Command Objs | sync+conn+cmd | async+conn+cmd |
| Keep Connection + Command Objs + Pre-allocate results | sync+conn+cmd+prealloc | async+conn+cmd+prealloc |

What each benchmark does per "transaction" (see the C# sketch after the table for the gist):

| Benchmark | Description |
| --- | --- |
| Plain vanilla | Creates a new ADO.NET connection object, sets its ConnectionString, calls Open{,Async}() on the connection, creates a command, sets its CommandText, calls Prepare(), calls ExecuteReader{,Async}(), and iterates over the results with Read{,Async}(), calling GetInt32(0) and GetString(1) on each row |
| Keep Connection Obj | Same as above, but the DbConnection object is allocated outside the benchmark loop, with its ConnectionString set and Open{,Async}() called once for the entire benchmark duration |
| Keep Connection + Command Objs | Same as above, but in addition the DbCommand object is allocated outside the benchmark loop, with its CommandText set and Prepare() called once for the entire benchmark duration |
| Keep Connection + Command Objs + Pre-allocate results | Same as above, but the results list + POCO objects are pre-allocated outside the main benchmark loop |
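To make the variants concrete, here is a minimal C# sketch (not the actual benchmark code) of one "transaction" in the plain-vanilla case versus the connection+command-reuse case. The connection string, the query, and the column types are assumptions for illustration; the real query and setup live in the repository linked above.

```csharp
using System.Collections.Generic;
using Npgsql;

static class VariantSketch
{
    // Placeholder values; the actual ones come from runme.sh in the benchmark repo.
    const string ConnString = "Host=10.0.0.2;Username=benchmarkdbuser;Password=bench;Database=bench";
    const string Query = "SELECT id, message FROM fortune"; // assumed: one int column, one string column

    // "Plain vanilla": everything is created, opened and prepared per transaction.
    public static List<(int Id, string Message)> PlainVanillaTransaction()
    {
        var results = new List<(int Id, string Message)>();
        using (var conn = new NpgsqlConnection(ConnString))
        {
            conn.Open();
            using (var cmd = new NpgsqlCommand(Query, conn))
            {
                cmd.Prepare();
                using (var reader = cmd.ExecuteReader())
                {
                    while (reader.Read())
                        results.Add((reader.GetInt32(0), reader.GetString(1)));
                }
            }
        }
        return results;
    }

    // "Keep Connection + Command Objs": the connection is opened and the command
    // prepared once outside the loop; only the reader work happens per transaction.
    public static List<(int Id, string Message)> ReusedTransaction(NpgsqlCommand preparedCmd)
    {
        var results = new List<(int Id, string Message)>();
        using (var reader = preparedCmd.ExecuteReader())
        {
            while (reader.Read())
                results.Add((reader.GetInt32(0), reader.GetString(1)));
        }
        return results;
    }
}
```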

The connection string for the benchmark is pretty straightforward:
https://github.com/damageboy/benchmarkdb/blob/b2ec07c7a48781df5772fa60d75562a094d5516d/BenchmarkDb/runme.sh#L11

But worth noting that:

  • the maximum pool size needs to be increased for the higher thread-count iterations
  • NoResetOnClose should be set to true so that Npgsql skips some costly connection resetting; this is an Npgsql-specific optimization that is probably appropriate for most users (see the sketch below)
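For illustration, a minimal sketch of what these two settings look like when built with NpgsqlConnectionStringBuilder; the host, user, password, and pool size here are assumptions, and the actual values are in runme.sh linked above.

```csharp
using Npgsql;

static class ConnectionStringSketch
{
    public static string Build() => new NpgsqlConnectionStringBuilder
    {
        Host = "10.0.0.2",        // assumption: the server's 10GbE address
        Username = "benchmarkdbuser",
        Password = "bench",       // assumption: matches the pgbench invocation below
        Database = "bench",
        MaxPoolSize = 200,        // raise above the highest thread count being tested
        NoResetOnClose = true     // Npgsql-specific: skip the reset when a connection returns to the pool
    }.ConnectionString;
}
```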

Running the benchmarks

There are no Docker images for these benchmarks for now, so to run them you roughly need the following:

On the server machine

  • Install postgres
  • Open up the connections through pg_hba.conf
  • Increase max_connections to 128 at least (I use 200)
  • Set up the benchmark user + password
  • Create the test data:
wget https://raw.githubusercontent.com/aspnet/benchmarks/dev/docker/postgres-techempower/create-postgres.sql
cat create-postgres.sql | psql -U benchmarkdbuser -h localhost bench

On the client machine:

  • Clone the repo
  • cd benchmarkdb/BenchmarkDB/
  • Use ./runme.sh <number-of-seconds>

This will clone Npgsql, compile it, compile the benchmark, hot-patch the Npgsql binaries, run the full suite of benchmarks on 1, 4, 8, 16, 32, 64, and 128 threads, and produce a results.md + results.csv file, which I used to produce the charts below...

Results

So if we run roughly everything (Npgsql 3.2.5 and 3.3-dev) on 1, 4, 8, 16, 32, 64, and 128 threads with all benchmark variants, we get roughly the following picture:
image

Things that stand out:

  • Sync is:
    • Faster than async for lower thread counts
    • Slower for higher thread counts
    • It's not clear why; it also doesn't happen on Windows for some reason (on Windows, sync is always faster than async)

If we focus more on the latest Npgsql (dev branch) and async:
image
Things that stand out:

  • Moving stuff (both allocations and Npgsql/ADO.NET operations) out of the benchmark loop helps a lot
  • We can hit 180K TPS with the hyper-optimized case
  • Performance improvements flatten out at higher thread counts without the DB seeing more load.

Things not in the benchmarks but that I have personally noticed:

  • At NO point when running the C#/dotnet benchmarks do I see the DB server or the network (10 GbE) overloaded (I have an asciinema recording of this for brave souls)
  • Removing the call to GetString(1) adds another 10%+ to the TPS numbers
  • The result list pre-allocation doesn't really do anything (that's why I didn't include it in the results)

PGBench

Running a similar benchmark with PGBench (for reference):

To run it:

cd benchmarkdb/BenchmarkDb
for t in 1 4 8 16 32; do PGPASSWORD=bench pgbench -h 1.1.1.2 -U benchmarkdbuser -M prepared -j $t -c $t -T 60 -n -f pgbench.sql bench; done

PGBench results

| Threads | TPS |
| --- | --- |
| 1 | 18425.11 |
| 4 | 74685.26 |
| 8 | 156070.63 |
| 16 | 243891.20 |
| 32 | 268168.44 |

@sebastienros

Here are my results in a controlled environment (Azure D3_v2 instances matching the TechEmpower cloud environment). The baseline is run using pgbench. Measurements were done on Windows Server 2016 DC. Load is generated by a separate machine on the same network.

Here I compared the current NuGet package with the MyGet version (as of 11/17/2017):

image

Then, after doubling the database server's capacity to see whether it is a bottleneck:

image

  • The peak result is 42% of the baseline
  • Sync is faster than async
  • The dev branch provides a 10% raw DB-access improvement under normal usage (no caching)

bgrainger added a commit to bgrainger/DataAccessPerformance that referenced this issue Mar 19, 2018