
XGBoost - hist + learning_rate decay memory usage #3579

Closed
dev7mt opened this issue Aug 10, 2018 · 12 comments · Fixed by #5153

Comments

dev7mt commented Aug 10, 2018

Hey,

I have been trying to implement an eta decay scheme in my project that is quite specific to my needs, but I kept running into out-of-memory errors. After a bit of digging, I found that setting the learning rate while using the "hist" tree_method seems to cause the same issue, which led me to believe that the callback itself is not the problem here.

I have reproduced this issue in multiple environments (two different Ubuntu setups, on-premise and cloud, as well as macOS), and it always produced similar errors.

The code below should reproduce the issue:

import numpy as np
import xgboost as xgb
from psutil import virtual_memory as vm
import matplotlib.pyplot as plt

def get_used_memory():
    # System memory currently in use, in GB.
    return vm().used / (1024 ** 3)

def generate_data():
    # Random regression data: gamma-distributed target, normal features.
    y = np.random.gamma(2, 4, OBS)
    X = np.random.normal(5, 2, [OBS, FEATURES])
    return X, y

def check_memory_callback(MEMORY_HISTORY):
    # Records (and prints) system memory usage after every boosting iteration.
    def callback(env):
        print(f"[{env.iteration}]/[{env.end_iteration}] Used: {get_used_memory():.2f} GB")
        MEMORY_HISTORY.append(get_used_memory())

    return callback

MAX_ITER = 10
ETA_BASE = 0.3
ETA_MIN = 0.1
ETA_DECAY = np.linspace(ETA_BASE, ETA_MIN, MAX_ITER).tolist()
OBS = 10 ** 6
FEATURES = 20
PARAMS = {
    'eta': ETA_BASE,
    "tree_method": "hist",
    "booster": "gbtree",
    "silient": 0,
}
NO_DECAY_HISTORY = []
DECAY_HISTORY = []
DECAY_APPROX_HISTORY = []

X_train, y_train = generate_data()
X_test, y_test = generate_data()
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)
evals_result = {}

model1 = xgb.train(
    maximize=True,
    params=PARAMS,
    dtrain=dtrain,
    num_boost_round=MAX_ITER,
    early_stopping_rounds=MAX_ITER,
    evals=[(dtest, 'test')],
    evals_result=evals_result,
    verbose_eval=True,
    callbacks=[check_memory_callback(NO_DECAY_HISTORY)]
)

model2 = xgb.train(
    maximize=True,
    params=PARAMS,
    dtrain=dtrain,
    num_boost_round=MAX_ITER,
    early_stopping_rounds=MAX_ITER,
    evals=[(dtest, 'test')],
    evals_result=evals_result,
    verbose_eval=True,
    callbacks=[check_memory_callback(DECAY_HISTORY)],
    learning_rates=ETA_DECAY
)

model3 = xgb.train(
    maximize=True,
    params={'eta': ETA_BASE, "tree_method": "approx", "booster": "gbtree", "silent": 0},
    dtrain=dtrain,
    num_boost_round=MAX_ITER,
    early_stopping_rounds=MAX_ITER,
    evals=[(dtest, 'test')],
    evals_result=evals_result,
    verbose_eval=True,
    callbacks=[check_memory_callback(DECAY_APPROX_HISTORY)],
    learning_rates=ETA_DECAY
)

plt.plot(np.linspace(1, MAX_ITER, MAX_ITER), NO_DECAY_HISTORY, label="no decay", color="green")
plt.plot(np.linspace(1, MAX_ITER, MAX_ITER), DECAY_HISTORY, label="with decay", color="red")
plt.plot(np.linspace(1, MAX_ITER, MAX_ITER), DECAY_APPROX_HISTORY, label="with approx and decay", color="blue")
plt.title("XGBoost - Memory usage over iterations")
plt.legend()
plt.ylabel("System memory GB used")
plt.xlabel("Iteration")
plt.show()

Attached is a plot from my run of the code above.
[Plot: system memory used (GB) vs. iteration for the three runs]

I have not dug into the underlying C++ code, but a memory leak seems plausible.
As I understand it this is not the desired behaviour, though perhaps this method simply requires that much memory.

hcho3 (Collaborator) commented Aug 10, 2018

Is this problem confined to tree_method=hist? Did you try exact or approx?

dev7mt (Author) commented Aug 10, 2018

I tried the approx method and it works fine, although the results are worse and training takes more time. This is the third model in the code above (the blue line on the plot):

model3 = xgb.train(
    params={'eta': ETA_BASE, "tree_method": "approx", "booster": "gbtree", "silent": 0},
    [...]
    learning_rates=ETA_DECAY
)

I did not try the exact method.

hcho3 (Collaborator) commented Aug 10, 2018

A memory leak seems probable. Let me look into it after the 0.80 release.

Denisevi4 commented:
I've had this issue before. I don't know exactly what is happening, but I found a workaround.

While studying it I found that the learning_rates parameter in xgb.train actually installs a reset_learning_rate callback. I then tried other custom callbacks and saw the same memory leak. It looks as if any callback other than the print callback causes the tree to re-initialize at every iteration.

My workaround was to add a "learning_rate_schedule" dmlc parameter and set the new learning rate at the beginning of each iteration. It involved quite a bit of modification of the C++ code. I also saw this problem with gpu_hist, so I edited the CUDA code too. In the end my solution resets the learning rate without callbacks, and it works.
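
For reference, here is a rough sketch of what the learning_rates argument does under the hood, assuming the pre-1.0 callback API (the actual implementation is xgboost's reset_learning_rate callback and differs in details; reset_learning_rate_sketch is just an illustrative name):

def reset_learning_rate_sketch(learning_rates):
    # Simplified sketch of the built-in reset_learning_rate callback.
    def callback(env):
        # env is the callback environment that xgb.train passes to callbacks.
        eta = learning_rates[env.iteration]
        # This set_param call is what ends up reconfiguring the booster on
        # every iteration, which is where the "hist" memory growth shows up.
        env.model.set_param("learning_rate", eta)
    return callback

Any custom callback that calls env.model.set_param in the same way would be expected to behave like learning_rates here.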

kretes commented Sep 6, 2018

@hcho3 0.80 has been released; did you have a chance to look into this leak?

@Denisevi4 can you share the code for that?

trivialfis (Member) commented Oct 15, 2018

@Denisevi4 For the CUDA gpu_hist updater, did you see the unusual memory usage in GPU memory, or just in CPU memory? I'm currently spending time on gpu_hist, so I'll see if I can dig something out.

hcho3 (Collaborator) commented Oct 17, 2018

@dev7mt @Denisevi4 @kretes @trivialfis I think I found the cause of the memory leak. When learning rate decay is enabled, FastHistMaker::Init() is called at every iteration, whereas it should be called only at the first iteration. FastHistMaker::Init() allocates new objects, hence the rising memory usage over time.

I'll try to come up with a fix so that FastHistMaker::Init() is called only once.

hcho3 (Collaborator) commented Oct 17, 2018

Here is a snippet of the diagnostic logs I injected:

Learning rate decay enabled:

xgboost/src/tree/updater_prune.cc:75: tree pruning end, 1 roots, 126 extra nodes, 0 pruned nodes, max_depth=6
[0]     test-rmse:7.7284
xgboost/src/c_api/c_api.cc:869: XGBoosterSetParam(): name = learning_rate, value = 0.2777777777777778
Tree method is selected to be 'hist', which uses a single updater grow_fast_histmaker.
xgboost/src/tree/updater_fast_hist.cc:50: FastHistMaker::Init()
xgboost/src/tree/updater_prune.cc:24: TreePruner()
xgboost/src/tree/updater_fast_hist.cc:72: FastHistMaker::Update(): is_gmat_initialized_ = false
xgboost/src/common/hist_util.cc:127: GHistIndexMatrix::Init()
xgboost/src/tree/../common/column_matrix.h:72: ColumnMatrix::Init()
xgboost/src/tree/updater_prune.cc:75: tree pruning end, 1 roots, 126 extra nodes, 0 pruned nodes, max_depth=6
[1]     test-rmse:6.82093
xgboost/src/c_api/c_api.cc:869: XGBoosterSetParam(): name = learning_rate, value = 0.25555555555555554
Tree method is selected to be 'hist', which uses a single updater grow_fast_histmaker.
xgboost/src/tree/updater_fast_hist.cc:50: FastHistMaker::Init()
xgboost/src/tree/updater_fast_hist.cc:72: FastHistMaker::Update(): is_gmat_initialized_ = false
xgboost/src/common/hist_util.cc:127: GHistIndexMatrix::Init()
xgboost/src/tree/../common/column_matrix.h:72: ColumnMatrix::Init()
xgboost/src/tree/updater_prune.cc:75: tree pruning end, 1 roots, 126 extra nodes, 0 pruned nodes, max_depth=6

Learning rate decay disabled:

xgboost/src/tree/updater_prune.cc:75: tree pruning end, 1 roots, 126 extra nodes, 0 pruned nodes, max_depth=6
[0]     test-rmse:7.72278
xgboost/src/tree/updater_fast_hist.cc:72: FastHistMaker::Update(): is_gmat_initialized_ = true
xgboost/src/tree/updater_prune.cc:75: tree pruning end, 1 roots, 126 extra nodes, 0 pruned nodes, max_depth=6
[1]     test-rmse:6.75087

hcho3 (Collaborator) commented Oct 17, 2018

Diagnosis: The learning rate decay callback function calls XGBoosterSetParam() to update the learning rate. The XGBoosterSetParam() function in turn calls Learner::Configure(), which re-initializes each tree updater (calling FastHistMaker::Init()). The FastHistMaker updater maintains extra objects that are meant to be recycled across iterations, and re-initialization wastes memory by duplicating those internal objects.
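
If this diagnosis is right, the leak should appear with a callback that merely re-sets the same eta via set_param, and not with a no-op callback. A minimal sketch on top of the reproduction script above (PARAMS, dtrain, MAX_ITER, ETA_BASE and check_memory_callback are taken from there; NOOP_HISTORY and SETPARAM_HISTORY are just fresh lists):

NOOP_HISTORY, SETPARAM_HISTORY = [], []

def noop_callback(env):
    # Does nothing; memory should stay flat even with tree_method="hist".
    pass

def set_same_eta_callback(env):
    # Re-sets the *same* eta. Per the diagnosis, this set_param call alone
    # triggers Learner::Configure() and FastHistMaker::Init() each iteration.
    env.model.set_param("learning_rate", ETA_BASE)

xgb.train(params=PARAMS, dtrain=dtrain, num_boost_round=MAX_ITER,
          callbacks=[noop_callback, check_memory_callback(NOOP_HISTORY)])
xgb.train(params=PARAMS, dtrain=dtrain, num_boost_round=MAX_ITER,
          callbacks=[set_same_eta_callback, check_memory_callback(SETPARAM_HISTORY)])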

hcho3 added a commit to hcho3/xgboost that referenced this issue Oct 17, 2018
**Diagnosis** The learning rate callback function calls `XGBoosterSetParam()`
to update the learning rate. The `XGBoosterSetParam()` function in turn calls
`Learner::Configure()`, which resets and re-initializes each tree updater,
calling `FastHistMaker::Init()`. The `FastHistMaker::Init()` function in turn
re-allocates internal objects that were meant to be recycled across iterations.
Thus memory usage increases over time.

**Fix** The learning rate callback should call a new function
`XGBoosterUpdateParamInPlace()`. The new function is designed so that no object
is re-allocated.

hcho3 (Collaborator) commented Oct 17, 2018

@dev7mt @Denisevi4 @kretes @trivialfis Fix is available at #3803.

hcho3 added a commit to hcho3/xgboost that referenced this issue Oct 18, 2018

hcho3 (Collaborator) commented Oct 22, 2018

@dev7mt @Denisevi4 @kretes The upcoming release (version 0.81) will not include a fix for this memory leak. The reason is that the fix is only temporary, adds a lot of maintenance burden, and will be supplanted by a future code refactor. For now, you should use approx or exact when using learning rate decay. Alternatively, check out the eta_decay_memleak branch from my fork.
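
For example, with the objects from the reproduction script above, the interim workaround is just to pair learning_rates with the exact (or approx, as in model3) updater; a sketch reusing dtrain, dtest, MAX_ITER, ETA_BASE and ETA_DECAY from that script:

# Interim workaround: learning rate decay with the "exact" updater, which
# (like "approx") does not show the growing memory usage reported here.
exact_params = {"eta": ETA_BASE, "tree_method": "exact", "booster": "gbtree", "silent": 0}
bst = xgb.train(
    params=exact_params,
    dtrain=dtrain,
    num_boost_round=MAX_ITER,
    evals=[(dtest, "test")],
    learning_rates=ETA_DECAY,
)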

trivialfis (Member) commented:

@hcho3

> The FastHistMaker updater maintains extra objects that are meant to be recycled across iterations, and re-initialization wastes memory by duplicating those internal objects.

Could you be more specific about which objects? I'm currently working on parameter updates, so I may just fix this along the way...

trivialfis self-assigned this Dec 19, 2019
trivialfis added a commit to trivialfis/xgboost that referenced this issue Dec 23, 2019
trivialfis added a commit that referenced this issue Dec 31, 2019
lock bot locked as resolved and limited conversation to collaborators Apr 2, 2020