Presentation: Jackson Performance

Tatu Saloranta edited this page Jun 5, 2018 · 23 revisions

Turbo-charging Jackson

Although Jackson JSON Processor is fast out-of-the-box, with default settings and common usage patterns, there are ways to make it process things even faster.

This presentation looks at a couple of things you can use that can make a big difference in performance, especially for cases where every last drop of CPU power matters. The same techniques can also help reduce things like garbage generation, improving efficiency even when raw performance is not a pressing concern.

Basics: Things You Should Do Anyway

There are some basic ground rules to follow to ensure that Jackson processes things efficiently (close to optimal level). These are things that you should "do anyway", even if you do not have actual performance problems: think of them as an interpretation of the "Boy Scout Rule" ("Always leave the campground cleaner than you found it"). Note that guidelines are shown in loosely decreasing order of importance.

  1. Reuse heavy-weight objects: ObjectMapper (data-binding) and JsonFactory (streaming API)
    • To a lesser degree, you may also want to reuse ObjectReader and ObjectWriter instances -- this is just some icing on the cake, but they are fully thread-safe and reusable
  2. Close things that need to be closed: JsonParser, JsonGenerator
    • This helps reuse underlying things such as symbol tables and reusable input/output buffers
    • Nothing to close for ObjectMapper
  3. Use "unrefined" (least processed) forms of input and output: that is, do not try decorating input sources and output targets:
    • Input: byte[] is best if you have it; InputStream second best; followed by Reader -- and in every case, do NOT try reading input into a String!
    • Output: OutputStream is best; Writer second best; and calling writeValueAsString() is the least efficient (why construct an intermediate String?)
    • Rationale: Jackson is very good at finding the most efficient (sometimes zero-copy) way to consume/produce JSON encoded data -- allow it to do its magic
  4. If you need to re-process, then replay and don't re-parse
    • Sometimes you need to process things in multiple phases; for example, you may need to parse part of JSON encoded data to plan out further processing or data-binding rules, and/or modify intermediate presentation for further processing
    • Instead of writing out intermediate forms back as JSON (which will incur both JSON writing and reading overhead), it is better to use a more efficient intermediate form
    • The most efficient intermediate form is TokenBuffer (flat sequence of JSON Tokens); followed by JSON Tree model (JsonNode)
    • You may also want to use ObjectMapper.convertValue() to convert between Object types
  5. Use ObjectReader method readValues() for reading sequences of the same POJO type
    • Functionally equivalent to calling readValue() multiple times, but both more convenient AND (slightly) more efficient
  6. Use ObjectWriter method writeValues() for writing sequences of the same POJO type
    • Functionally equivalent to calling writeValue() multiple times on same JsonGenerator but both more convenient and (slightly) more efficient
  7. Prefer 'ObjectReader'/'ObjectWriter' over 'ObjectMapper'
    • ObjectReader and ObjectWriter are safer to use -- they are fully immutable and freely shareable between threads -- but they can also be a bit more efficient, since they can avoid some of the lookups that ObjectMapper has to do
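Several of the guidelines above can be sketched together. The following is a minimal illustration, assuming Jackson 2.x (jackson-databind) is on the classpath; the class `BasicsSketch`, its nested `Point` POJO, and the method names are hypothetical and exist only for this example.

```java
import com.fasterxml.jackson.databind.MappingIterator;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.ObjectReader;
import com.fasterxml.jackson.databind.ObjectWriter;
import com.fasterxml.jackson.databind.SequenceWriter;

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;

final class BasicsSketch {
    /** Hypothetical POJO, used only for illustration. */
    public static class Point {
        public int x, y;
    }

    // Guideline 1: construct the heavy-weight ObjectMapper once and share it.
    private static final ObjectMapper MAPPER = new ObjectMapper();

    // Guideline 7: immutable, thread-safe reader/writer built once per type.
    private static final ObjectReader POINT_READER = MAPPER.readerFor(Point.class);
    private static final ObjectWriter POINT_WRITER = MAPPER.writerFor(Point.class);

    /** Guidelines 3 and 5: read a sequence of Points straight from bytes. */
    static List<Point> readAll(byte[] json) {
        // try-with-resources closes the underlying parser (guideline 2).
        try (MappingIterator<Point> it = POINT_READER.readValues(json)) {
            return it.readAll();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Guidelines 3 and 6: write a sequence of Points to an OutputStream. */
    static byte[] writeAll(List<Point> points) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Closing the SequenceWriter flushes and closes the generator.
        try (SequenceWriter seq = POINT_WRITER.writeValues(out)) {
            for (Point p : points) {
                seq.write(p);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }
}
```

In a real service the byte sink would typically be the response OutputStream rather than a ByteArrayOutputStream; the in-memory stream is used here only to keep the sketch self-contained.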

Specific options for further improving performance

Once you have reviewed the basics discussed above, you may want to consider other tasks specifically aimed at further improving performance.

Ease vs Compatibility

There are two main criteria that differentiate the approaches listed below:

  1. Ease -- how much work is involved in making the change
  2. Compatibility -- is the resulting system interoperable with "Plain Old JSON" usage?

Compatible, not so easy: Use the Streaming API

The big benefit of the Jackson Databind API is the ease of use: with just a line or two of code you can convert between POJOs and JSON. But this convenience is not completely free: there is overhead involved in some of the automated processing, such as that of handling POJO property values using Java Reflection (compared to explicit calls to getters and setters).

So one straight-forward (if laborious) possibility is to rewrite data conversion to use the Jackson Streaming API. With the Streaming API one has to construct JsonParsers and JsonGenerators, and use low-level calls to read and write JSON as tokens.
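To give a feel for what such low-level code looks like, here is a minimal sketch of hand-written conversion for a simple two-field POJO, assuming Jackson 2.x (jackson-core) is on the classpath. The class `StreamingSketch` and its nested `Point` type are hypothetical names chosen for this example.

```java
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonGenerator;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class StreamingSketch {
    /** Hypothetical POJO, used only for illustration. */
    public static class Point {
        public int x, y;
    }

    // JsonFactory is heavy-weight and thread-safe: create once, reuse.
    private static final JsonFactory FACTORY = new JsonFactory();

    /** Writes a Point token by token; produces {"x":..,"y":..}. */
    static byte[] write(Point p) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (JsonGenerator gen = FACTORY.createGenerator(out)) {
            gen.writeStartObject();
            gen.writeNumberField("x", p.x);
            gen.writeNumberField("y", p.y);
            gen.writeEndObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    /** Reads a Point token by token, ignoring unknown properties. */
    static Point read(byte[] json) {
        try (JsonParser parser = FACTORY.createParser(json)) {
            if (parser.nextToken() != JsonToken.START_OBJECT) {
                throw new IOException("Expected START_OBJECT");
            }
            Point p = new Point();
            while (parser.nextToken() == JsonToken.FIELD_NAME) {
                String name = parser.getCurrentName();
                parser.nextToken(); // advance to the property value
                switch (name) {
                    case "x": p.x = parser.getIntValue(); break;
                    case "y": p.y = parser.getIntValue(); break;
                    default: parser.skipChildren(); // skip nested unknown values
                }
            }
            return p;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Even for two properties this is noticeably more code than a single databind call, which is the maintenance cost to weigh against the speed-up.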

If you explicitly rewrite all the conversions to use the Streaming API instead of data binding, you may be able to increase throughput by 30-40%; and this without any changes to actual JSON produced. But writing and maintaining the low-level code takes time and effort, so whether you want to do this depends on how much you want to invest in achieving a modest speed improvement.

One possible trade-off is that of only rewriting parts of the process; specifically, optimizing the most commonly used conversions. These are usually leaf-level classes (classes that have only primitive- or String-valued properties). You can achieve this by writing JsonSerializers and JsonDeserializers for a small number of types; Jackson can happily mix its own default POJO serializers and deserializers with custom overrides for specific types.
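A partial rewrite of this kind might look like the sketch below, assuming Jackson 2.x (jackson-databind) is on the classpath. The leaf type `Point` gets hand-written handlers, while the enclosing `Line` type is left to default databinding. All class names, the module name "leaf-types", and the `roundTrip` helper are hypothetical.

```java
import com.fasterxml.jackson.core.JsonGenerator;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;
import com.fasterxml.jackson.databind.DeserializationContext;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.SerializerProvider;
import com.fasterxml.jackson.databind.deser.std.StdDeserializer;
import com.fasterxml.jackson.databind.module.SimpleModule;
import com.fasterxml.jackson.databind.ser.std.StdSerializer;

import java.io.IOException;
import java.io.UncheckedIOException;

final class LeafTypeSketch {
    /** Hypothetical leaf-level POJO with hand-written handlers. */
    public static class Point { public int x, y; }

    /** Hypothetical container POJO, handled by default databinding. */
    public static class Line { public Point start, end; }

    static class PointSerializer extends StdSerializer<Point> {
        PointSerializer() { super(Point.class); }

        @Override
        public void serialize(Point p, JsonGenerator gen, SerializerProvider provider)
                throws IOException {
            gen.writeStartObject();
            gen.writeNumberField("x", p.x);
            gen.writeNumberField("y", p.y);
            gen.writeEndObject();
        }
    }

    static class PointDeserializer extends StdDeserializer<Point> {
        PointDeserializer() { super(Point.class); }

        @Override
        public Point deserialize(JsonParser parser, DeserializationContext ctxt)
                throws IOException {
            // Assumes the parser is positioned at START_OBJECT on entry.
            Point p = new Point();
            while (parser.nextToken() == JsonToken.FIELD_NAME) {
                String name = parser.getCurrentName();
                parser.nextToken(); // advance to the property value
                switch (name) {
                    case "x": p.x = parser.getIntValue(); break;
                    case "y": p.y = parser.getIntValue(); break;
                    default: parser.skipChildren();
                }
            }
            return p; // parser now points at the matching END_OBJECT
        }
    }

    /** Builds a mapper that uses the custom handlers for Point only. */
    static ObjectMapper newMapper() {
        SimpleModule module = new SimpleModule("leaf-types");
        module.addSerializer(Point.class, new PointSerializer());
        module.addDeserializer(Point.class, new PointDeserializer());
        return new ObjectMapper().registerModule(module);
    }

    /** Demo helper; Strings are used only to keep the example short. */
    static String roundTrip(String json) {
        try {
            ObjectMapper mapper = newMapper();
            Line line = mapper.readValue(json, Line.class);
            return mapper.writeValueAsString(line);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The JSON produced is the same as with default databinding, so the change stays invisible to clients.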

Non-compatible, easy: Smile binary "JSON"

Another kind of trade-off is to consider the Smile binary format.

Smile is a binary format that is 100% compatible with the logical JSON data model; similar to how "binary XML" (like Fast Infoset) is related to standard textual XML. This means that conversion between JSON and Smile can be done efficiently and without loss of information. It also means that the API for working with Smile-encoded data is nearly identical to the regular Jackson API: the only difference is that the underlying factory is of type SmileFactory instead of JsonFactory. SmileFactory is provided by the Jackson Smile Module (jackson-dataformat-smile).

Converting a service (or client) to use Smile is very easy: just create an ObjectMapper that uses SmileFactory. But the potential challenge is that such a change is visible to clients; this may or may not be a problem (depending on whether the content format can be auto-negotiated, as is done using JAX-RS). But it is a visible change either way.
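As a rough illustration of how small the change is, the sketch below builds a Smile-backed ObjectMapper. It assumes Jackson 2.x with jackson-databind and jackson-dataformat-smile on the classpath; the class `SmileSketch`, its nested `Point` type, and the method names are hypothetical.

```java
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.dataformat.smile.SmileFactory;

import java.io.IOException;
import java.io.UncheckedIOException;

final class SmileSketch {
    /** Hypothetical POJO, used only for illustration. */
    public static class Point {
        public int x, y;
    }

    // The only Smile-specific line: pass a SmileFactory to the mapper.
    private static final ObjectMapper SMILE_MAPPER = new ObjectMapper(new SmileFactory());

    /** Serializes a Point as Smile-encoded bytes. */
    static byte[] toSmile(int x, int y) {
        Point p = new Point();
        p.x = x;
        p.y = y;
        try {
            return SMILE_MAPPER.writeValueAsBytes(p);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Deserializes a Point from Smile-encoded bytes. */
    static Point fromSmile(byte[] data) {
        try {
            return SMILE_MAPPER.readValue(data, Point.class);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

All other databinding calls stay the same as with a JSON-backed mapper; only the bytes on the wire differ.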

Use of a binary format may be problematic more generally as well. Dealing with binary formats is very difficult from Javascript (this is true for ALL binary formats, including protobuf and thrift), and for Javascript specifically it is SLOWER than handling JSON. A binary format may also be problematic from languages that do not yet have a Smile codec available: currently Smile support is provided by the libsmile library written in C, in addition to the standard Java implementation. Finally, debugging binary formats is more difficult than debugging textual data formats, as some kind of reader is needed.

Performance improvements from using Smile are similar to those from using the Streaming API (30-50% improvement), but an additional bonus is that the size of the data will decrease as well, typically by a similar amount (30-50%). Note that performance improvements are more significant with redundant data like streams of similar Objects ("big data", such as Map/Reduce data streams); this is because Smile can use back-references to all but eliminate repeating property names and short String values (like Enumerated values).

Finally, note that as with JSON, you can also choose between the Streaming API and databinding when using Smile as the underlying format. Doing so combines multiple performance-improving tactics.

Or CBOR!

Alternatively, you may consider CBOR, a Smile-like binary JSON format. The backend implementation, jackson-dataformat-cbor, is also included in the Binary dataformats repo.

Non-compatible, easy: POJOs as JSON Arrays (Jackson 2.1)

An option introduced in Jackson 2.1 makes it possible to change the actual JSON structure used for serializing Java Objects. For example, consider the case of a hypothetical Point class:

public class Point {
  public int x, y;
}

which would typically be serialized as:

{"x":27,"y":15}

However, if one declares it as:

@JsonFormat(shape=JsonFormat.Shape.ARRAY)
@JsonPropertyOrder(alphabetic=true)
public class Point {
  public int x, y;
}

we would instead get:

[27,15]

which basically just eliminates property names by using positional values for indicating which property value is stored where. This can lead to a significant reduction in size of the serialized JSON content; and this translates quite directly to performance. It is also worth noting that this works equally well for "simple" non-repeating data (like request/response messages), as property names are simply eliminated.
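A minimal round-trip sketch of the array shape is shown below, assuming Jackson 2.1 or later (jackson-databind and jackson-annotations) on the classpath. The wrapper class `ArrayShapeSketch` and its helper methods are hypothetical; the annotated `Point` mirrors the declaration above.

```java
import com.fasterxml.jackson.annotation.JsonFormat;
import com.fasterxml.jackson.annotation.JsonPropertyOrder;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.io.IOException;
import java.io.UncheckedIOException;

final class ArrayShapeSketch {
    /** Same shape as the Point declaration shown in the text. */
    @JsonFormat(shape = JsonFormat.Shape.ARRAY)
    @JsonPropertyOrder(alphabetic = true)
    public static class Point {
        public int x, y;
    }

    private static final ObjectMapper MAPPER = new ObjectMapper();

    /** Serializes a Point positionally, e.g. [27,15] for x=27, y=15. */
    static String toJson(int x, int y) {
        Point p = new Point();
        p.x = x;
        p.y = y;
        try {
            return MAPPER.writeValueAsString(p);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads a Point back from its positional (array) form. */
    static Point fromJson(String json) {
        try {
            return MAPPER.readValue(json, Point.class);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because values are matched by position, an explicit and stable property order (here alphabetic) matters: both sides must agree on which slot holds which property.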

As with Smile, this change is directly visible to clients, and either requires that clients use Jackson, or that they implement similar functionality. Nonetheless, this format is slightly easier to read (or at least debug) and process with scripting languages than a binary format.

This feature has not been extensively performance-tested, but the initial results suggest that it can achieve improvements similar to use of Smile or hand-written Streaming API based converters. And this feature can be combined with use of the Smile format as well.

Compatible, easy: Afterburner

After going through a couple of compromises (easy OR compatible), there is one approach that is both easy AND compatible (yay!): the Jackson Afterburner Module.

The Afterburner module optimizes the underlying serializers and deserializers.

  • Uses byte code generation to replace Java Reflection calls (used for field and method access and constructor calls) with actual byte code -- similar to how one would write explicit field access and method calls from Java code
  • Inlines handling of a small set of basic types (String, int, long -- possibly more in future), so that if the default serializer/deserializer is used, calls are replaced by equivalent standard handling (which eliminates a couple of method calls, and possible argument/return value boxing)
  • Speculative "match/parsing" of ordered property names, using special matching calls in JsonParser -- this can eliminate symbol table lookups if field names are serialized in the same order that Jackson serializes them (which may be indicated by use of @JsonPropertyOrder)

Since these optimizations more or less mimic the more efficient patterns used by "hand-written" converters (i.e. our first option, use of the Streaming API), performance improvements could theoretically reach the level of such converters. In practice, we have observed improvements in the 60-70% range of this maximum (that is, Afterburner can eliminate about 2/3 of the overhead that standard databinding has over hand-written alternatives).
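Enabling Afterburner is typically a one-line change at mapper construction. The sketch below assumes Jackson 2.x with jackson-databind and jackson-module-afterburner on the classpath; the class `AfterburnerSketch`, its nested `Point` type, and the helper methods are hypothetical.

```java
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.module.afterburner.AfterburnerModule;

import java.io.IOException;
import java.io.UncheckedIOException;

final class AfterburnerSketch {
    /** Hypothetical POJO; public so generated accessors can reach it. */
    public static class Point {
        public int x, y;
    }

    // Registering the module is the only change; all calls stay the same.
    private static final ObjectMapper MAPPER =
            new ObjectMapper().registerModule(new AfterburnerModule());

    /** Serializes a Point using the Afterburner-enabled mapper. */
    static String toJson(int x, int y) {
        Point p = new Point();
        p.x = x;
        p.y = y;
        try {
            return MAPPER.writeValueAsString(p);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Deserializes a Point using the Afterburner-enabled mapper. */
    static Point fromJson(String json) {
        try {
            return MAPPER.readValue(json, Point.class);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The JSON read and written is unchanged, which is what makes this option compatible as well as easy.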

Maturity of approaches

The approaches discussed so far have different levels of maturity, and this may affect your decision process.

  • Streaming API - based converters ("hand-written"): The Streaming API has been available since the first Jackson release
  • Smile format: First introduced in Jackson 1.6, very stable, both format and parser/generator implementations
    • Significant amount of real, heavy production use by projects like Elastic Search
  • Afterburner: Has been available since Jackson 1.8, so it is stable at this point
  • POJO-as-array: included in Jackson 2.1

Or Do (almost) All of Above!

But do you need to choose just one approach? Absolutely not!

In fact, most of the approaches can be combined, and doing so can have big payoffs. Specifically:

  • The choice of Smile over JSON is compatible with all the other choices and can vary independently.
  • Choices of "POJO-as-array" and Afterburner are compatible with choices other than the Streaming API

So, you could consider combinations such as:

  • Use the Smile format, but write your code using the Streaming API. This is what some frameworks (such as Elastic Search) do
  • Use Afterburner with "POJO-as-Array"; either with regular JSON or Smile

Such combinations help reach optimum performance.

How fast is fast?

With "extreme" combinations such as those listed above, use of plain old JSON can meet or exceed the performance of fast binary formats such as protobuf, thrift or avro. And with Smile, both processing speed and data sizes can match or beat the alternatives (as small, and even faster!).

Although not all combinations discussed above are included, the JVM Serializers benchmark can give some idea for improvements, as it includes results for JSON/Smile, Streaming-API/databind/Afterburner combinations.
