1.3. Memory FAQ

Giuseppe Barbieri edited this page Oct 1, 2020 · 5 revisions

This page is a must-read for all LWJGL users.

Why does LWJGL use buffers so much?

LWJGL requires the use of off-heap memory when passing data to native libraries. Likewise, any buffers returned from native libraries are always backed by off-heap memory. This is not an LWJGL limitation. There are two issues with Java objects and arrays that live on the JVM heap:

  • It is not possible to control the layout of Java objects. Different JVMs and different JVM settings produce very different field layouts. Native libraries on the other hand expect data with very precisely defined layouts.
  • Any Java object or array may be moved by the GC at any time, concurrently with the execution of a native method call. All JNI methods are executed at a safepoint, so by definition they must not access heap data.

The standard approaches are:

  1. Using JNI functions to access Java objects, which is painfully slow.
  2. Using JNI functions to "pin" Java arrays (Get/ReleasePrimitiveArrayCritical or Hotspot Critical Natives) which is also inefficient for several reasons.

LWJGL, on the other hand, is designed to be used with direct (off-heap) java.nio buffer classes for passing data to and from native code. ByteBuffer and the other buffer classes are not the best possible abstraction for off-heap data and their API is not ideal, but they are the only officially supported way to access off-heap data in Java.

The easiest way to think of ByteBuffer is as a wrapper over a native C pointer, plus the array length (the buffer.capacity()). LWJGL maps C primitive types to the corresponding class in java.nio. Arrays of pointers are mapped to the org.lwjgl.PointerBuffer class. Pointers to structs are mapped to the corresponding struct class. Pointers to struct arrays are mapped to the corresponding <StructClass>.Buffer class. PointerBuffer and the struct Buffer classes have an API very similar to java.nio buffers.
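The "pointer plus length" picture can be tried out with the JDK alone. The following is a minimal sketch that uses only java.nio and no LWJGL calls; the class name DirectBufferViews and the helper allocateInts are made up for this illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.IntBuffer;

// Hypothetical demo class; uses only java.nio, no LWJGL API.
class DirectBufferViews {
    /** Allocates off-heap storage for count ints, roughly like a C "int *" plus a length. */
    static ByteBuffer allocateInts(int count) {
        return ByteBuffer.allocateDirect(count * Integer.BYTES)
                         .order(ByteOrder.nativeOrder());
    }

    public static void main(String[] args) {
        ByteBuffer bytes = allocateInts(4);
        IntBuffer ints = bytes.asIntBuffer(); // typed view over the same memory

        ints.put(2, 0xCAFE);                            // write through the int view...
        int readBack = bytes.getInt(2 * Integer.BYTES); // ...read through the byte view

        // capacity() is counted in elements of the buffer's own type: 16 bytes, 4 ints
        System.out.println(bytes.isDirect() + " " + bytes.capacity() + " " + ints.capacity());
        System.out.println(readBack == 0xCAFE);
    }
}
```

Both buffers refer to the same off-heap block, so no data is copied when switching between the byte view and the int view.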

What java.nio.ByteOrder should be used?

The buffer byte order must be set to ByteOrder.nativeOrder(). This is required for correct cross-platform behavior and also results in the best performance.

All buffer instances created by LWJGL are always set to the native byte order.
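This matters mostly for buffers you create yourself with the JDK, because a freshly created ByteBuffer starts out big-endian regardless of the platform. A small sketch using only java.nio (NativeOrderDemo and nativeDirect are hypothetical names for this example):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical demo class; uses only java.nio, no LWJGL API.
class NativeOrderDemo {
    /** Returns a direct buffer already switched to the platform's byte order. */
    static ByteBuffer nativeDirect(int capacity) {
        return ByteBuffer.allocateDirect(capacity).order(ByteOrder.nativeOrder());
    }

    public static void main(String[] args) {
        ByteBuffer raw = ByteBuffer.allocateDirect(4);
        System.out.println(raw.order()); // BIG_ENDIAN: the JDK default

        ByteBuffer fixed = nativeDirect(4);
        System.out.println(fixed.order() == ByteOrder.nativeOrder()); // true

        // duplicate() and slice() return byte buffers that are BIG_ENDIAN again,
        // so the order has to be reapplied on the new buffer.
        ByteBuffer copy = fixed.duplicate().order(ByteOrder.nativeOrder());
        System.out.println(copy.order() == ByteOrder.nativeOrder()); // true
    }
}
```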

How do I allocate and deallocate buffers?

After getting familiar with the above mappings, the next step is learning how to handle allocation of such buffers. This is a critical issue and LWJGL offers several options. The options are listed below, ordered from most to least efficient. Every time you make a decision on how to handle an allocation, you should consider the first option. If that's not applicable, consider the second option, and so on.

1. Stack allocation

Java does not support explicit stack allocation of Java objects and obviously does not support off-heap stack allocation either. In C it's very simple: you declare a variable inside a function and it's stack allocated. When the function returns, the variable's memory is reclaimed automatically (and without overhead). There's no such equivalent in Java.

Similarly, it is not possible in Java to call a native function that expects or returns a struct by value. Such functions in LWJGL bindings are wrapped and exposed with a pointer-to-struct parameter or return value.

This is a problem because it's very common to need small, short-lived allocations when calling native functions. For example, creating a vertex buffer object in OpenGL in C:

GLuint vbo;
glGenBuffers(1, &vbo); // very simple

and with LWJGL:

IntBuffer ip = ...; // need a 4-byte buffer here
glGenBuffers(ip);
int vbo = ip.get(0);

A real IntBuffer allocation in the above example, regardless of the implementation, would be vastly less efficient than the stack pointer in the equivalent C code.

The usual answer to this problem, in LWJGL 2 and other Java libraries, is to allocate the buffer once, cache it and reuse it in many method calls. This is an incredibly unsatisfying solution:

  • It leads to ugly code and wastes memory.
  • To avoid wasting memory, static buffers are usually used.
  • Using static buffers leads to either concurrency bugs or less than ideal performance (due to synchronization).

The LWJGL 3 answer is the org.lwjgl.system.MemoryStack API. It's been designed to be used with static imports and try-with-resources blocks. The above example becomes:

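// static imports assumed: org.lwjgl.system.MemoryStack.stackPush and org.lwjgl.opengl.GL15.glGenBuffers
int vbo;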
try (MemoryStack stack = stackPush()) {
    IntBuffer ip = stack.callocInt(1);
    glGenBuffers(ip);
    vbo = ip.get(0);
} // stack automatically popped, ip memory automatically reclaimed

It is obviously more verbose, but has the following advantages:

  • More than one allocation is usually required, but the try-with-resources boilerplate remains the same.
  • The semantics of the above code perfectly match the requirements. The stack memory is thread-local, just like a real C thread stack.
  • Performance is ideal. The stack push and pop are simple bumps of a pointer and the IntBuffer instance allocation is either eliminated with escape analysis or handled by the next minor/eden GC cycle (super efficiently).
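To make the "simple bumps of a pointer" point concrete, here is a toy allocator built only on java.nio. It is a sketch of the general idea under simplifying assumptions (fixed nesting depth, int allocations only) and is not LWJGL's MemoryStack implementation; ToyStack and all of its methods are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.IntBuffer;

// Hypothetical toy allocator for illustration only; it is not LWJGL's MemoryStack.
class ToyStack implements AutoCloseable {
    private final ByteBuffer memory;          // one long-lived off-heap block
    private final int[] frames = new int[16]; // saved pointers; fixed nesting depth for brevity
    private int frameIndex;
    private int pointer;                      // offset of the lowest allocated byte

    ToyStack(int sizeInBytes) {
        memory = ByteBuffer.allocateDirect(sizeInBytes).order(ByteOrder.nativeOrder());
        pointer = sizeInBytes;                // the stack grows downward from the end
    }

    /** Starts a frame by remembering the current pointer. */
    ToyStack push() {
        frames[frameIndex++] = pointer;
        return this;
    }

    /** Ends the frame: everything allocated since the matching push() is reclaimed at once. */
    @Override
    public void close() {
        pointer = frames[--frameIndex];
    }

    /** Bumps the pointer down by count ints and returns a zeroed view of that region. */
    IntBuffer callocInt(int count) {
        int bytes = count * Integer.BYTES;
        int start = (pointer - bytes) & ~(Integer.BYTES - 1); // align down to 4 bytes
        if (start < 0) {
            throw new OutOfMemoryError("toy stack exhausted");
        }
        pointer = start;

        ByteBuffer window = memory.duplicate();
        window.position(start);
        window.limit(start + bytes);
        IntBuffer ints = window.slice().order(ByteOrder.nativeOrder()).asIntBuffer();
        for (int i = 0; i < count; i++) {
            ints.put(i, 0);                   // memory is reused, so clear it explicitly
        }
        return ints;
    }

    int pointer() {
        return pointer;
    }

    public static void main(String[] args) {
        ToyStack stack = new ToyStack(64);
        try (ToyStack frame = stack.push()) {
            IntBuffer ip = frame.callocInt(1);
            ip.put(0, 123);
            System.out.println(ip.get(0) + " at offset " + frame.pointer());
        }
        System.out.println("pointer after pop: " + stack.pointer());
    }
}
```

The off-heap block is allocated once; push, allocate and pop only move an integer offset, which is why this pattern is cheap enough to use for every small, short-lived allocation.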

Note 1: The default stack size is 64 KB. It can be changed with -Dorg.lwjgl.system.stackSize or Configuration.STACK_SIZE.

Note 2: Structs and struct buffers can also be allocated on the MemoryStack.

Note 3: The static, thread-local MemoryStack API is just a convenience. There's additional API that lets you create and/or use MemoryStack instances as you see fit.

2. MemoryUtil (malloc/free)

Sometimes stack allocation cannot be used: the memory that must be allocated is too big, or the allocation is long-lived. In such cases, the next best option is explicit memory management, either via the org.lwjgl.system.MemoryUtil API or via a specific memory allocator (currently available in LWJGL: stdlib, jemalloc). Example:

// static import assumed: org.lwjgl.system.MemoryUtil.*
ByteBuffer buffer = memAlloc(2 * 1024 * 1024); // 2 MB
// use buffer...
memFree(buffer); // free when no longer needed

Note 1: Just like in C, the user is responsible for deallocating memory allocated with malloc using free.

Note 2: API for the standard functions calloc, realloc and aligned_alloc is also available.

Note 3: The Java objects allocated with the explicit memory management functions are also subject to escape analysis.

3. BufferUtils (ByteBuffer.allocateDirect)

Sometimes the explicit memory management API cannot be used either. Maybe a particular allocation is hard to track without complicating the code, or it might not be possible to know exactly when it is no longer required. Such cases are legitimate candidates for using org.lwjgl.BufferUtils. This class existed in older LWJGL versions with the same API. It uses ByteBuffer.allocateDirect to do the allocations, which has one major advantage: the user does not need to deallocate the off-heap memory explicitly, because the GC does it automatically.

On the other hand, it has the following disadvantages:

  • It is slow, much slower than a raw malloc call. It adds a lot of overhead on top of a function that is already slow.
  • It scales badly with concurrent allocations.
  • It arbitrarily limits the amount of allocated memory (-XX:MaxDirectMemorySize).
  • Like Java arrays, the allocated memory is always zeroed-out. This is not necessarily bad, but having the option would be better.
  • There's no way to deallocate the allocated memory on demand (without JDK-specific reflection hacks). Instead, a reference queue is used that usually requires two GC cycles to free the native memory. This may lead to OOM errors under pressure.

One example of LWJGL using BufferUtils internally is the allocation of the memory that backs the thread-local MemoryStack instances. It is a long-lived allocation that must be deallocated when the thread dies, so we let the GC take care of it.
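For reference, the JDK mechanism this option relies on can be sketched as follows. The helper is written in the spirit of BufferUtils but is not its source code; GcManagedBuffers and createFloatBuffer are names chosen for this example, and only java.nio is used.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

// Hypothetical helper in the spirit of BufferUtils; uses only java.nio.
class GcManagedBuffers {
    /** Allocates a zero-filled, native-order, direct FloatBuffer that the GC will release. */
    static FloatBuffer createFloatBuffer(int capacity) {
        return ByteBuffer.allocateDirect(capacity * Float.BYTES)
                         .order(ByteOrder.nativeOrder())
                         .asFloatBuffer();
    }

    public static void main(String[] args) {
        FloatBuffer vertices = createFloatBuffer(6);

        boolean allZero = true;
        for (int i = 0; i < vertices.capacity(); i++) {
            allZero &= vertices.get(i) == 0.0f; // allocateDirect always zeroes the memory
        }
        System.out.println(vertices.isDirect() + " " + allZero);

        vertices.put(new float[] {0f, 0f, 1f, 0f, 0f, 1f}).flip();
        // There is no free call here: the native memory is released some time after
        // 'vertices' becomes unreachable and the GC has processed it.
    }
}
```

The convenience is that nothing has to be tracked; the cost is that the moment of deallocation is out of the caller's hands.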

tl;dr

  1. Use org.lwjgl.system.MemoryStack if the allocation is small and short-lived; otherwise...
  2. Use org.lwjgl.system.MemoryUtil if the allocation can be tracked and its lifetime is known; otherwise...
  3. Use org.lwjgl.BufferUtils.

I want to know more, got something for me?

Yes, read the Memory Management in LWJGL 3 blog post.