
Case 3: Systematic approach to benchmarking your code


How can a single switch statement ruin your performance?

As we are starting to focus on performance, we decided to do some checks on our tracing lib, TNT4J. You may not have noticed it yet, but every time you run JMH you're shown a warning:

REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
experiments, perform baseline and negative tests that provide experimental control, make sure
the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
Do not assume the numbers tell you what you want them to tell.

OK, first things first: establish a baseline. Oh well, the very first task, and it's not that easy. You can take your previous release as the base; that doesn't mean much on its own, but it tells you whether you're heading the right (or wrong) way along your commits. Happily, we had once encountered OpenTelemetry - New Relic and others seem to have joined forces to make a tracing lib that is, as they call it, "vendor agnostic". I may have a different opinion on that statement, but let's leave it at that for now.

So let's start exploring the OT stuff. After some research, I found out that it might work. The two are comparable: both build a similar data model. There is a uniquely identified trace with child entries and some mapped entries for various data. We call the trace an activity, while OT calls it a span. An activity can have child entries - events - and OT calls them events too. And last but not least, activities and events can carry key-value pair data: we call them properties, OT calls them attributes. So here it is - the nomenclature differs, but the idea behind it is pretty straightforward, and the same.
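To make the mapping concrete, here is a rough side-by-side sketch of both APIs. Treat it as an illustration only: the property/attribute calls are my own example, not code from the benchmark, so double-check them against the actual APIs.

// TNT4J (per the post's naming): activity -> events -> properties
TrackingActivity activity = logger.newActivity(OpLevel.INFO, "New");
TrackingEvent event = logger.newEvent("asda", "ASDA");
event.getOperation().addProperty(new Property("key", "value")); // a "property"
activity.add(event);
logger.tnt(activity);

// OpenTelemetry: span -> events -> attributes
Span span = tracer.spanBuilder("New").startSpan();
span.setAttribute("key", "value"); // an "attribute" ~ TNT4J property
span.addEvent("asda");             // an event, same word in both libs
span.end();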

This data model is then pushed to some collector; we usually use Nastel Xray as the data store/visualizer, while OT uses Jaeger/Prometheus. We call the output a "sink", OT calls it an "exporter". And here the fun part started to end. If I want to benchmark my code, I don't want to create a huge ecosystem of back-end databases, servlets, UIs and so on. That's not all that time consuming in this case, as both back ends ship as containers anyway, but I'd like to share my benchmark with other team members, and I wouldn't want three pages of instructions on how to benchmark a simple lib. Another drawback: I wouldn't know what I'm actually benchmarking - if I run both the back end and the front end on my computer, I might end up benchmarking how well my system shares resources between applications. On the other hand, if I run Jaeger/Prometheus/Xray on a server, I might end up benchmarking the network.

After all that thinking and research, I had to find something these libs have in common. And there is something - both support JUL logging as the output. Eureka! I can compare apples to apples and see what is what.
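For reference, here is a minimal sketch of how the OT side can be pointed at JUL, assuming the opentelemetry-sdk and opentelemetry-exporter-logging artifacts (LoggingSpanExporter writes finished spans through java.util.logging); on the TNT4J side, the JUL event sink is selected via tnt4j.properties:

import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.exporter.logging.LoggingSpanExporter;
import io.opentelemetry.sdk.OpenTelemetrySdk;
import io.opentelemetry.sdk.trace.SdkTracerProvider;
import io.opentelemetry.sdk.trace.export.SimpleSpanProcessor;

static Tracer setupOtTracer() {
    // Route finished spans to java.util.logging instead of a network
    // exporter - no Jaeger/Prometheus back end needed for the benchmark.
    SdkTracerProvider provider = SdkTracerProvider.builder()
            .addSpanProcessor(SimpleSpanProcessor.create(new LoggingSpanExporter()))
            .build();
    return OpenTelemetrySdk.builder()
            .setTracerProvider(provider)
            .build()
            .getTracer("trace-benchmark");
}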

A few days of research, and finally I have something to code - a benchmark at last. I'm using the whole cybench.io toolkit. The stakes are high - I am about to prove we're the best in the world, and I'd like all my benchmark runs recorded as proof. CyBench has a lib comparison feature: you just need the proper annotations and that's it - cybench.io will graph the comparison for you. Let's start coding now.

One benchmark for OT:

@Benchmark
@BenchmarkMetaData(key = "libSymbolicName", value = "OpenTelemetry")
@BenchmarkMetaData(key = "libVersion", value = "1.0.0")
@BenchmarkMetaData(key = "libDescription", value = "")
@BenchmarkMetaData(key = "libVendor", value = "OpenTelemetry")
@BenchmarkMetaData(key = "libUrl", value = "https://opentelemetry.io/")
@BenchmarkMetaData(key = "api", value = "OpenTelemetry")
@BenchmarkMetaData(key = "title", value = "OpenTelemetry create and log activity")
@BenchmarkTag(tag = "d81aa931-9c4d-4429-bd71-b343a3a7c09b")
public static void otLog() {
    Span span = tracer.spanBuilder("New").startSpan();
    span.addEvent("asda");
    span.end();
}

Just clean and simple. And another one for TNT4J:

@Benchmark
@BenchmarkMetaData(key = "libSymbolicName", value = "TNT4J")
@BenchmarkMetaData(key = "libVersion", value = "2.13.2")
@BenchmarkMetaData(key = "libDescription", value = "")
@BenchmarkMetaData(key = "libVendor", value = "Nastel")
@BenchmarkMetaData(key = "libUrl", value = "https://github.com/Nastel/TNT4J")
@BenchmarkMetaData(key = "api", value = "TNT4J")
@BenchmarkMetaData(key = "title", value = "TNT4J create and log activity")
@BenchmarkTag(tag = "9223353d-8cb9-41be-9f42-a154e527f7f8")
public static void tnt4jLog() {
    TrackingActivity trackingActivity = logger.newActivity();
    trackingActivity.add(logger.newEvent("asda", "ASDA"));
    logger.tnt(trackingActivity);
}
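Both methods read static tracer and logger handles. Here is a minimal sketch of how the enclosing class could wire them up - the class shape is my assumption; setupOtTracer() is the hypothetical helper sketched above, and setupTNT4Jtracer() matches the name used in the test further below:

import java.io.IOException;

import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

import com.jkoolcloud.tnt4j.TrackingLogger;
import io.opentelemetry.api.trace.Tracer;

@State(Scope.Benchmark)
public class TraceBenchmarks {

    static Tracer tracer;         // used by otLog()
    static TrackingLogger logger; // used by tnt4jLog()

    @Setup
    public void setup() {
        tracer = setupOtTracer();
        logger = setupTNT4Jtracer();
    }

    static TrackingLogger setupTNT4Jtracer() {
        try {
            // Source name is illustrative; sink selection (JUL here)
            // lives in tnt4j.properties.
            TrackingLogger tl = TrackingLogger.getInstance("com.gocypher.cybench.client");
            tl.open();
            return tl;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    // ... the two @Benchmark methods above go here ...
}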

That's it. Just run it. For development I am using the CyBench plugin for IntelliJ. Running a benchmark and getting the result stored in cybench.io is a few clicks away. And I did the clicking.

Doh!

Seems all the glory moved to the box named "trash". Our TNT4J lib scored half of what the competitor did.

Keep calm and make it faster. Concentrate, concentrate... For TNT4J there are actually only a few lines involved, so it shouldn't be too hard. I just wrote a test to figure out what's involved:

    @Test
    void tnt4jLog() throws InterruptedException {
        TraceBenchmarks.logger = TraceBenchmarks.setupTNT4Jtracer();
        TraceBenchmarks.tnt4jLog();
    }

Afterwards I ran the code in debug mode and set a breakpoint on the JUL output line in my JULEventSink:

		logger.log(level, msg, t);

And after a run I know the stack - in other words, every class involved in this performance issue:

_reportItem:209, TrackerImpl (com.jkoolcloud.tnt4j.tracker)
tnt:469, TrackerImpl (com.jkoolcloud.tnt4j.tracker)
tnt:854, TrackingLogger (com.jkoolcloud.tnt4j)
tnt4jLog:64, TraceBenchmarks (com.gocypher.cybench.client)
tnt4jLog:14, TraceBenchmarksTest (com.gocypher.cybench.client)
<.. jUnit lines>

Not that many, after all.

Now I commented out the OT benchmark and set the baseline for the TNT4J score. And the baseline score is 10.

My approach was to comment out lines in these methods and see what score I'd get in a run. I set up a quick run with the following settings:

Property                 Value
Measurement time         1
Measurement iterations   1
Warmup time              1
Warmup iterations        1
Include hw properties    false
Send report to CyBench   false
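These knobs map straight onto JMH's own annotations, so the same quick-run profile can also be pinned in code - a sketch, assuming seconds as the time unit:

import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Warmup;

@Warmup(iterations = 1, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 1, time = 1, timeUnit = TimeUnit.SECONDS)
@Fork(1)
public class TraceBenchmarks {
    // ... benchmarks as above ...
}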

Happily, the CyBench IntelliJ plug-in takes care of saving and keeping track of my results.

Let me comment on this picture a bit: each line represents a benchmark run, and I ran the benchmark every time I changed something in the code - usually after commenting a line out. There are some code features, like statistics collection, state checks, etc., that I could easily comment out and see how the code performs without them. Commenting these lines out, I found that they didn't impact my benchmark much - the score stayed at ~10 all the time. There are also some runs where I scored >100; those are the ones where I commented out the logging output itself, but even they helped me narrow down the cause.

And a few lines below, where I scored 14 - Eureka! I found a line causing an issue with a big impact. Let's look at the code:

	@Override
	protected void writeLine(OpLevel sev, LogEntry entry, Throwable t) {
		_checkState();

		// Experiment: the getLevel(sev) call is commented out and the level
		// hardcoded to INFO, so the method compiles without the switch-based mapping.
		//Level level = getLevel(sev);
		if (!logger.isLoggable(Level.INFO)) {
			return;
		}

		String msg = entry.getString();
		incrementBytesSent(msg.length());
		logger.log(Level.INFO, msg, t);
	}

So I commented out the severity mapping and got a high score of 14. Let me show you the code of the getLevel() method:

	private Level getLevel(OpLevel sev) {
		switch (sev) {
		case INFO:
			return Level.INFO;
		case DEBUG:
			return Level.FINE;
		case TRACE:
			return Level.FINER;
		case NOTICE:
		case WARNING:
			return Level.WARNING;
		case ERROR:
		case CRITICAL:
		case FAILURE:
		case FATAL:
		case HALT:
			return Level.SEVERE;
		case NONE:
			return Level.OFF;
		default:
			return Level.INFO;
		}
	}

So it doesn't do much - it just maps our own severity (OpLevel) to JUL's Level. But it's clearly causing an issue. Just Google "switch statement java performance".
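For what it's worth, one common way to sidestep a per-call switch is to precompute the mapping once. A sketch of that alternative is below - my illustration, not necessarily the fix the next part lands on:

import java.util.EnumMap;
import java.util.Map;
import java.util.logging.Level;

import com.jkoolcloud.tnt4j.core.OpLevel;

// Precomputed OpLevel -> JUL Level table; getLevel() becomes a single lookup.
private static final Map<OpLevel, Level> JUL_LEVELS = new EnumMap<>(OpLevel.class);
static {
	JUL_LEVELS.put(OpLevel.INFO, Level.INFO);
	JUL_LEVELS.put(OpLevel.DEBUG, Level.FINE);
	JUL_LEVELS.put(OpLevel.TRACE, Level.FINER);
	JUL_LEVELS.put(OpLevel.NOTICE, Level.WARNING);
	JUL_LEVELS.put(OpLevel.WARNING, Level.WARNING);
	JUL_LEVELS.put(OpLevel.ERROR, Level.SEVERE);
	JUL_LEVELS.put(OpLevel.CRITICAL, Level.SEVERE);
	JUL_LEVELS.put(OpLevel.FAILURE, Level.SEVERE);
	JUL_LEVELS.put(OpLevel.FATAL, Level.SEVERE);
	JUL_LEVELS.put(OpLevel.HALT, Level.SEVERE);
	JUL_LEVELS.put(OpLevel.NONE, Level.OFF);
}

private Level getLevel(OpLevel sev) {
	return JUL_LEVELS.getOrDefault(sev, Level.INFO);
}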

To be continued...