Description
How we are using LightGBM:
We are using the LightGBM C API in our model hosting service, which is written in Go. We've written a cgo wrapper around the C API and are using the "lib_lightgbm.so" library file provided on GitHub.
Version Used:
3.1.1
Environment info:
Operating System: Observed on both Linux and MacOS
Architecture: x86_64
CPU model: Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
C++ compiler version: gcc version 7.2.0 (Debian 7.2.0-1)
CMake version: 3.9.0
GLIBC version: ldd (Debian GLIBC 2.28-10) 2.28
Context:
We load a few LightGBM models in our model hosting service and refresh them as soon as new ones are available. The models are loaded via LGBM_BoosterLoadModelFromString, provided by the API, and the older models are released with LGBM_BoosterFree.
We are hosting this service on GKE pods which have a fixed amount of memory.
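For context, this is roughly the shape of our cgo wrapper around those two calls. It is a simplified sketch, not our actual code: the package name, include/library paths, and error handling are placeholders.

```go
// Package lgbm is a simplified sketch of our cgo wrapper (real code differs).
package lgbm

/*
// Include/library paths below are placeholders; adjust to where LightGBM is installed.
#cgo CFLAGS: -I${SRCDIR}/include
#cgo LDFLAGS: -L${SRCDIR}/lib -l_lightgbm
#include <stdlib.h>
#include "LightGBM/c_api.h"
*/
import "C"

import (
	"fmt"
	"unsafe"
)

// Booster owns a LightGBM BoosterHandle.
type Booster struct {
	handle        C.BoosterHandle
	NumIterations int
}

// lastError returns the message from LGBM_GetLastError.
func lastError() string {
	return C.GoString(C.LGBM_GetLastError())
}

// LoadFromString builds a booster from a model string via LGBM_BoosterLoadModelFromString.
func LoadFromString(model string) (*Booster, error) {
	cModel := C.CString(model)
	defer C.free(unsafe.Pointer(cModel))

	var handle C.BoosterHandle
	var numIters C.int
	if C.LGBM_BoosterLoadModelFromString(cModel, &numIters, &handle) != 0 {
		return nil, fmt.Errorf("LGBM_BoosterLoadModelFromString: %s", lastError())
	}
	return &Booster{handle: handle, NumIterations: int(numIters)}, nil
}

// Free releases the underlying C-side booster via LGBM_BoosterFree.
func (b *Booster) Free() error {
	if b.handle == nil {
		return nil
	}
	if C.LGBM_BoosterFree(b.handle) != 0 {
		return fmt.Errorf("LGBM_BoosterFree: %s", lastError())
	}
	b.handle = nil
	return nil
}
```

The handle is nilled out after LGBM_BoosterFree, so releasing a model twice is a no-op.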
Issue:
We're seeing a gradual uptick in the RSS (Resident Set Size) of the service every time a model is refreshed. We measure RSS via Prometheus, which exposes process_resident_memory_bytes. This indicates that not all of the memory is freed when LGBM_BoosterFree is called. As a result, our service pods go down with OOM kills, and both pod lifetime and, in turn, service health have taken a hit.
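For completeness, process_resident_memory_bytes comes from the standard Prometheus Go process collector. A minimal sketch of how such a metrics endpoint is wired up (the port and registry setup here are placeholders, not our service code):

```go
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/collectors"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
	reg := prometheus.NewRegistry()
	// The process collector is what exposes process_resident_memory_bytes.
	reg.MustRegister(collectors.NewProcessCollector(collectors.ProcessCollectorOpts{}))
	http.Handle("/metrics", promhttp.HandlerFor(reg, promhttp.HandlerOpts{}))
	log.Fatal(http.ListenAndServe(":9090", nil))
}
```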
To rule out the Go side of the code contributing to the RSS, we looked at the Go heap, which returns to its pre-load value after a model is released.
To further confirm that Go is not the issue, we ran an experiment in which we continuously free and reload the same model, over and over. Nothing else is loaded in the service except the model file (no metadata or anything of that sort).
We observed a staircase pattern in the RSS metric:

For this experiment, the heap looked like this:

This confirms that something is going on in the C API when it frees the memory taken by the model. I started digging into the code, but the memory appears to be managed appropriately using unique_ptrs and destructors.
Then I stripped the situation down to its bare minimum: I load a model string (from disk), record the RSS, release the model, and record the RSS again. I do this multiple times, forcing the GC to run after every action and waiting 20 seconds for the RSS to settle so I get exact values. The results are a bit weird: RSS comes back down to the earlier value in some iterations but not in others, and the net result is a gradual increase.
Initial: 1524
C_API Model Loaded: 4576
Model Released: 1988
Initial: 1988
C_API Model Loaded: 5659
Model Released: 4012
Initial: 4012
C_API Model Loaded: 7263
Model Released: 5509
Initial: 5509
C_API Model Loaded: 9209
Model Released: 7561
Initial: 7561
C_API Model Loaded: 9281
Model Released: 7603
Initial: 7603
C_API Model Loaded: 9336
Model Released: 7658
Initial: 7658
C_API Model Loaded: 11329
Model Released: 9681
Initial: 9681
C_API Model Loaded: 11372
Model Released: 9724
Initial: 9724
C_API Model Loaded: 13395
Model Released: 11748
Initial: 11748
C_API Model Loaded: 13476
Model Released: 11828
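The stripped-down loop looks roughly like this. This is a sketch only: it reuses the hypothetical lgbm wrapper from above, rssKB reads VmRSS from /proc/self/status (Linux-only, reported in kB, and approximate), and the model path, import path, and iteration count are placeholders.

```go
package main

import (
	"fmt"
	"log"
	"os"
	"runtime"
	"strconv"
	"strings"
	"time"

	lgbm "example.com/lgbm" // placeholder import path for the wrapper sketched above
)

// rssKB reads VmRSS (in kB) from /proc/self/status.
func rssKB() int {
	data, err := os.ReadFile("/proc/self/status")
	if err != nil {
		return -1
	}
	for _, line := range strings.Split(string(data), "\n") {
		if strings.HasPrefix(line, "VmRSS:") {
			kb, _ := strconv.Atoi(strings.Fields(line)[1])
			return kb
		}
	}
	return -1
}

func main() {
	// Hypothetical path to the model string dumped on disk.
	modelStr, err := os.ReadFile("model.txt")
	if err != nil {
		log.Fatal(err)
	}

	for i := 0; i < 10; i++ {
		fmt.Println("Initial:", rssKB())

		booster, err := lgbm.LoadFromString(string(modelStr))
		if err != nil {
			log.Fatal(err)
		}
		runtime.GC()
		time.Sleep(20 * time.Second) // let RSS settle
		fmt.Println("C_API Model Loaded:", rssKB())

		if err := booster.Free(); err != nil {
			log.Fatal(err)
		}
		runtime.GC()
		time.Sleep(20 * time.Second)
		fmt.Println("Model Released:", rssKB())
	}
}
```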
I am at a loss as to how to debug this further. Is there something about Go that I am missing?
Minimal Reproducible Example:
Attaching a minimal reproducible example. Since this example cannot be a simple snippet of code, I have linked it in my git repo.