Down the Rabbit Hole

This is where we give away the recipe to the secret sauce. If we were smart, we'd leave it to those industrious enough to read the code, lest others copy what we have done too easily. But honestly, when you come in with benchmarks like these, there is a certain amount of skepticism that must be addressed.

We're in your bytecodez

In order to make HikariCP as fast as it is, we went down to bytecode-level engineering. We pulled out every trick we know to help the JIT help you. We tried to keep key routines smaller than the JIT inline threshold; we chased down and eliminated as many invokeinterface or invokespecial bytecode operations as possible; we flattened inheritance hierarchies, shadowed member variables, and eliminated casts. Trusting no one, we dissected the JVM classes and replaced them where necessary. We studied operating system thread schedulers and JIT compiler output. We think it shows in our results. But like rust, we never sleep; there are still a few unexplored paths we intend to go down in the future.
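To illustrate the first of these tricks, keeping hot routines small, here is a minimal sketch. It is not HikariCP's code: the class and method names are hypothetical, and no bytecode sizes were measured. The idea is to keep the common case in a tiny method and push the rarely executed work into a separate one, so the hot method remains a cheap candidate for inlining.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch: names and structure are illustrative, not HikariCP's code.
public class HotPathSketch {
    private final Deque<String> idle = new ArrayDeque<>();
    private int created;

    // Hot path: only a handful of bytecodes for the common case.
    public String borrow() {
        String item = idle.poll();
        return item != null ? item : borrowSlow();
    }

    // Cold path: rare, bulkier work lives here so it does not bloat borrow().
    private String borrowSlow() {
        created++;
        return "conn-" + created;
    }

    public void giveBack(String item) {
        idle.push(item);
    }

    public int createdCount() {
        return created;
    }
}
```

On HotSpot, the JIT's actual inlining decisions can be inspected by running with -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining.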

There is not only beauty in simplicity, but if done right, speed.

Javassist-generated Delegates

Pretty much every connection pool, dare we say every pool available, has to "wrap" your real Connection, Statement, PreparedStatement, etc. instances and intercept methods like close() so that the Connection isn't actually closed but instead is returned to the pool. Statement and its subclasses must also be wrapped, and any SQLException caught and inspected to see if it reflects a disconnection that warrants ejecting the Connection from the pool.

What this means is "delegation". The Connection wrapper cares about intercepting close() or execute(sql) for example, but for almost all of the other methods of Connection it simply delegates. Something like:

public Clob createClob() {
   return delegate.createClob();
}

The first iteration of HikariCP also did this, and it still provides a "fallback" mode. An interface like PreparedStatement contains some 50+ methods, only 4 of which we are interested in intercepting. Rather than creating a wrapper class that has 50+ "delegate" methods like the above, we use Javassist to generate all of the delegate methods. While this provides no inherent performance increase, it means that our "proxy" (wrapper) class only need contain the overridden methods. The Statement proxy class in HikariCP is only ~130 lines of code including comments, compared to 1100+ lines of code in other pools. This approach is in keeping with our minimalist ethos.
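The intercept-a-few, delegate-the-rest pattern can be sketched without writing 50+ methods by hand. The sketch below deliberately swaps in a JDK dynamic proxy (java.lang.reflect.Proxy) instead of Javassist, purely so that it runs with no extra dependencies; reflective dispatch is slower than generated delegate classes, so treat it as an illustration of the shape, not of the performance. The class name, the IDLE queue standing in for a pool, and the stub "driver" connection are all hypothetical.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative only: a JDK dynamic proxy, not Javassist-generated classes.
public class DelegationSketch {
    // Hypothetical stand-in for a pool: a queue of idle connections.
    public static final Deque<Connection> IDLE = new ArrayDeque<>();

    // Wraps a driver connection: close() is intercepted, all else is delegated.
    public static Connection wrap(Connection delegate) {
        InvocationHandler handler = (proxy, method, args) -> {
            if (method.getName().equals("close")) {
                IDLE.push(delegate);                   // return to the pool instead of closing
                return null;
            }
            try {
                return method.invoke(delegate, args);  // plain delegation
            } catch (InvocationTargetException e) {
                throw e.getCause();                    // rethrow the driver's own exception
            }
        };
        return (Connection) Proxy.newProxyInstance(
            DelegationSketch.class.getClassLoader(),
            new Class<?>[] {Connection.class}, handler);
    }

    // Stub "driver" connection that answers only the methods the demo calls.
    public static Connection stubDriverConnection() {
        boolean[] closed = {false};
        InvocationHandler handler = (proxy, method, args) -> {
            switch (method.getName()) {
                case "close":      closed[0] = true; return null;
                case "isClosed":   return closed[0];
                case "getCatalog": return "stub-catalog";
                default: throw new UnsupportedOperationException(method.getName());
            }
        };
        return (Connection) Proxy.newProxyInstance(
            DelegationSketch.class.getClassLoader(),
            new Class<?>[] {Connection.class}, handler);
    }

    // Summarizes one borrow/close cycle as "catalog/driver-state/idle-count".
    public static String demo() {
        IDLE.clear();
        try {
            Connection raw = stubDriverConnection();
            Connection pooled = wrap(raw);
            String catalog = pooled.getCatalog();      // delegated to the driver
            pooled.close();                            // intercepted by the wrapper
            return catalog + "/" + (raw.isClosed() ? "closed" : "open") + "/" + IDLE.size();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

After demo() runs, the driver connection is still open and sits in the idle queue, which is the whole point of intercepting close().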

Our delegates perform quite admirably:

Pool	Med (ms)	Avg (ms)	Max (ms)
BoneCP	5049	3249	6929
HikariCP	13	11	58

And yet, looking at the bytecode for all of the delegate methods, with their getfield, checkcast, and invokeinterface opcodes, it really touched a nerve. Is it possible to go faster?

Can we actually eliminate delegation itself?

Full-on Insanity

"I've always been mad, I know I've been mad,
like the most of us,
very hard to explain why you're mad,
even if you're not mad..." 
                       - Pink Floyd

But how? How could we eliminate delegation and still intercept the methods we need? Even more, we need to wrap every "delegate" method with a try..catch to interrogate SQLExceptions, which is actually interception now isn't it?
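As a rough picture of what "interrogating" a SQLException involves, here is a minimal sketch. The criteria HikariCP actually applies are not shown on this page; the sketch assumes only the standard convention that SQLSTATE class "08" denotes a connection exception, and real drivers may also signal disconnection through vendor-specific codes. The class, the SqlCall interface, and the evict callback are hypothetical.

```java
import java.sql.SQLException;

// Hypothetical sketch of the try..catch each delegated call would need.
public class DisconnectCheckSketch {
    // Functional interface for a JDBC call that may throw SQLException.
    public interface SqlCall<T> {
        T run() throws SQLException;
    }

    // True if any exception in the chain carries SQLSTATE class "08".
    public static boolean isDisconnect(SQLException e) {
        for (SQLException cur = e; cur != null; cur = cur.getNextException()) {
            String state = cur.getSQLState();
            if (state != null && state.startsWith("08")) {
                return true;
            }
        }
        return false;
    }

    // Runs the call; on a disconnect, triggers eviction, then rethrows unchanged.
    public static <T> T guarded(SqlCall<T> call, Runnable evict) throws SQLException {
        try {
            return call.run();
        } catch (SQLException e) {
            if (isDisconnect(e)) {
                evict.run();
            }
            throw e;
        }
    }

    // Helper for experimenting: reports whether a given SQLSTATE causes eviction.
    public static boolean evictedOn(String sqlState) {
        boolean[] evicted = {false};
        try {
            guarded(() -> { throw new SQLException("simulated failure", sqlState); },
                    () -> evicted[0] = true);
        } catch (SQLException expected) {
            // The caller still sees the original exception.
        }
        return evicted[0];
    }
}
```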

In order to eliminate delegation the user needs to run against the "bare metal" of their driver classes, yet we still need to intercept methods and wrap them with exception handlers. We were already using Javassist to generate our classes for delegation. Why not use Javassist to inject our code directly into the driver's classes?

However, the classes must be altered before they are loaded ... because convincing the JVM to reload classes is no trivial task. The answer lay in java.lang.instrument. We built an instrumentation "agent" that "instruments" the driver classes on the fly as they are loaded, injecting our code into them. The instrumentation agent is dynamically loaded and unloaded so that it doesn't spend time inspecting classes that have nothing to do with JDBC and no need for instrumentation.
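The skeleton of such an agent, using only java.lang.instrument, might look like the sketch below. It is an assumption-laden outline, not HikariCP's agent: the class name and the "org/example/jdbc/" package prefix are hypothetical, the way the agent gets loaded is not shown, and the actual bytecode rewriting (which the page attributes to Javassist) is left as a comment. What the sketch does show is the filter: returning null from transform() tells the JVM to leave a class untouched.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical agent skeleton; the bytecode rewriting itself is omitted.
public class DriverAgentSketch implements ClassFileTransformer {
    private final String targetPrefix;               // internal form, e.g. "org/example/jdbc/"
    public final AtomicInteger matched = new AtomicInteger();

    public DriverAgentSketch(String targetPrefix) {
        this.targetPrefix = targetPrefix;
    }

    @Override
    public byte[] transform(ClassLoader loader, String className,
                            Class<?> classBeingRedefined,
                            ProtectionDomain protectionDomain,
                            byte[] classfileBuffer) {
        if (className == null || !className.startsWith(targetPrefix)) {
            return null;                             // null means "no change to this class"
        }
        matched.incrementAndGet();
        // A real agent would return rewritten bytes here (for example, produced
        // with Javassist). This sketch changes nothing, so it also returns null.
        return null;
    }

    // Entry point used when an agent is attached to an already running JVM.
    public static void agentmain(String args, Instrumentation inst) {
        String prefix = args != null ? args : "org/example/jdbc/";
        DriverAgentSketch transformer = new DriverAgentSketch(prefix);
        inst.addTransformer(transformer);
        // ... trigger loading of the driver classes here, then stop inspecting:
        // inst.removeTransformer(transformer);
    }
}
```

Removing the transformer once the driver classes are loaded corresponds to the "unloaded" half of the description above: later class loads are no longer inspected.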

Pure Gold

As slim as our "delegate" proxies are, there is still a fair amount of code, especially in the Connection proxy. The prospect of "inlining" the bytecode, or worse, source code into the instrumentation code had a bad smell about it. We've already written the intercept code once in our proxies, can't we just use that somehow? But the code is in our classes, not in the target driver's classes.

This is where we think code can sometimes become art. We created an annotation, @HikariInject, and with it we annotate all of the fields and methods in the existing proxy classes that need to be injected. The instrumentation agent inspects our proxy classes and injects the fields and methods tagged with @HikariInject into the target driver class -- with some special logic for handling collisions. The pure gold is that the exact same class code used in "delegation" mode is the code injected in "instrumentation" mode. There is only one canonical source for both.
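The discovery half of this idea, finding which members of a proxy class are tagged for injection, can be sketched with plain reflection. The definition of @HikariInject is not shown on this page, so the sketch uses its own hypothetical @Inject annotation and a made-up StatementProxySketch class; copying the members into a driver class and resolving collisions are not attempted here.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: @Inject stands in for an annotation like @HikariInject.
public class InjectScanSketch {
    @Retention(RetentionPolicy.RUNTIME)              // must be visible to reflection
    @Target({ElementType.FIELD, ElementType.METHOD})
    public @interface Inject {}

    // Made-up proxy class: only the tagged members are meant to be injected.
    public static class StatementProxySketch {
        @Inject private boolean isClosed;
        private int bookkeeping;                     // not tagged, stays behind

        @Inject public void close() { isClosed = true; }
        public void helper() { bookkeeping++; }      // not tagged, stays behind
    }

    // Lists the members an injector would copy, sorted for a stable order.
    public static List<String> injectableMembers(Class<?> proxyClass) {
        List<String> names = new ArrayList<>();
        for (Field f : proxyClass.getDeclaredFields()) {
            if (f.isAnnotationPresent(Inject.class)) {
                names.add("field " + f.getName());
            }
        }
        for (Method m : proxyClass.getDeclaredMethods()) {
            if (m.isAnnotationPresent(Inject.class)) {
                names.add("method " + m.getName());
            }
        }
        Collections.sort(names);
        return names;
    }
}
```

Because the annotation only marks members and changes no behavior, the same class can still be compiled and used as an ordinary wrapper.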

The instrumenter is extremely robust, but if there is any kind of failure injecting the code, HikariCP drops back to delegation mode (and logs a message to that effect). The JVM is smart enough to know that if an instrumentation agent throws an exception, the class is loaded cleanly without it -- nothing can be corrupted. Injection takes place at pool startup time, and typically takes only about 200ms.

The result of this is:

Pool	Med (ms)	Avg (ms)	Max (ms)
BoneCP	5049	3249	6929
HikariCP	8	7	13

While going from 13ms (delegates) to 8ms (instrumentation) may not seem like much, it represents an improvement of nearly 40%.

Yeah, but still

Still, even without instrumentation, how do we get anywhere near 13ms for 60+ million JDBC API invocations? Well, we're obviously running against a stub implementation, so the JIT is doing a lot of inlining. However, the same inlining at the stub-level is occurring for BoneCP in the benchmark. So, no inherent advantage to us.

But inlining is part of the equation, and we will say that BoneCP has at least 10 methods flagged as "hot" by the JVM that the JIT considers too large to inline. At least two of these are on the critical path. HikariCP has none. Which brings us to another topic...

Scheduler quanta

Some light reading. TL;DR: Obviously, when you're running 400 threads "at once", you aren't really running them "at once" unless you have 400 cores. The operating system, using N cores, switches between your threads, giving each a small "slice" of time to run, called a quantum (plural: quanta).

But with 400 threads, when your time runs out (as a thread) it may be a "long time" before the scheduler gives you a chance to run again. With this many threads, if a thread cannot complete what it needs to get done during its time-slice, well, there is a performance penalty to be paid. And not a small one.
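To put a rough number on "a long time", here is a back-of-the-envelope sketch. It assumes an idealized round-robin scheduler with equal-priority threads that are always runnable, which real schedulers are not, and the 8 cores and 10 ms quantum in the usage note are made-up figures rather than measurements from the benchmark.

```java
// Back-of-the-envelope model only: idealized round-robin, all threads runnable.
public class QuantumWaitSketch {
    // Worst-case wait before a thread that just used up its slice runs again.
    public static long worstCaseWaitMillis(int threads, int cores, long quantumMillis) {
        int threadsPerCore = (threads + cores - 1) / cores;  // ceiling division
        return (long) (threadsPerCore - 1) * quantumMillis;  // everyone else goes first
    }
}
```

Under these assumptions, worstCaseWaitMillis(400, 8, 10) gives 490: with 50 threads sharing each core, a thread that does not finish within its slice waits for 49 other slices before it can continue.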

We have combed through HikariCP, crushing and optimizing the critical code paths to ensure they can fully execute any operation within a single quantum, with of course the exception of a truly blocked condition, such as no available connections. Actually, not just any operation -- 60+ million of them.

The fact is, with JIT inlining and execution path optimizations, a thread invoking against HikariCP can get through all 60+ million JDBC operations in the MixedBench benchmark within a single scheduler quantum.

Put that in your pipe and smoke it!

Can we go faster?

We don't know. Our original goal when moving from delegates to instrumentation was to reach sub-millisecond times for 60+ million JDBC API invocations. We continue to poke at the problem, and we're not sure where the theoretical maximum actually is, but we feel we're getting close. Maybe we're at "good enough" and it's time to take on another task.
