
Latest version training crashes #10137

Closed
ganlei333 opened this issue Mar 21, 2024 · 12 comments
@ganlei333

ganlei333 commented Mar 21, 2024

My environment is Python 3.9, set up through Conda. The default installation of XGBoost is 1.7.0, and there is no problem training with that version. But when I upgraded XGBoost to 2.0.3 and trained with the same code and data, I hit a crash, and the crash is 100% reproducible. My hardware is a 2697AV4 CPU with 1 TB of DDR4-2400T memory and an SSD; training is in CPU mode. The nthread parameter is set to -1, and the dataset is 89 MB with 195 features for binary classification.

My code function is as follows:

import xgboost as xgb
from scipy.stats import randint, uniform
from sklearn.model_selection import RandomizedSearchCV


def optimize_hyperparameters(X_train, y_train):
    # Search space for the randomized hyperparameter search.
    param_dist = {
        'n_estimators': randint(100, 600),
        'learning_rate': uniform(0.01, 0.2),
        'subsample': uniform(0.5, 0.5),
        'max_depth': randint(20, 30),
        'colsample_bytree': uniform(0.5, 0.5),
        'min_child_weight': randint(1, 10)
    }

    # Weight the positive class by the negative/positive ratio to handle class imbalance.
    scale_pos_weight = len(y_train[y_train == 0]) / len(y_train[y_train == 1])

    xgb_clf = xgb.XGBClassifier(use_label_encoder=False, eval_metric='logloss',
                                scale_pos_weight=scale_pos_weight, nthread=-1)
    rscv = RandomizedSearchCV(xgb_clf, param_distributions=param_dist, n_iter=500,
                              scoring='roc_auc', cv=3, verbose=2, random_state=42, n_jobs=1)
    rscv.fit(X_train, y_train)

    print(f"Best parameters found: {rscv.best_params_}")
    return rscv.best_estimator_
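
For reference, a minimal sketch of how this function could be exercised on synthetic data: scikit-learn's make_classification stands in for the real 89 MB dataset (which is not attached), and the sample count and class weights below are arbitrary placeholders.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 195 features, imbalanced binary target.
X, y = make_classification(n_samples=50_000, n_features=195,
                           weights=[0.9, 0.1], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)

best_model = optimize_hyperparameters(X_train, y_train)
print(best_model.score(X_test, y_test))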
@ganlei333
Author

sklearn: 1.3.0

@trivialfis
Member

Hi, thank you for raising the issue. Could you please provide a reproducible example that I can run on my machine?

@ganlei333
Author

I have provided the training function, but the data is too large to upload successfully.

@trivialfis
Member

Could you please provide the code in a more complete form so that I can run it with synthesized data?

@trivialfis
Member

In addition, could you please share what you mean by crash? Is it running out of memory, or is it running into a segfault?

@ganlei333
Author

Uploading xgb.zip…
The crash I'm referring to is the server automatically restarting.

@trivialfis
Member

Feel free to close the issue if this is not related to XGBoost itself.

@ganlei333
Author

I have tested this further: training with XGBoost 2.0.3 in a Python 3.10 environment on Windows 2019 works fine, while in a Python 3.11 environment on Debian 12, the system restarts immediately as soon as training starts (the data-cleaning part runs normally; the restart is triggered once the training logic is entered). The operating system environment is very clean, and it's not a CPU temperature issue either.

@trivialfis
Member

By restart, do you mean the login session restart, or the whole OS got restarted?

@ganlei333
Author

The entire operating system restarts.

@trivialfis
Member

Then that's beyond me. The OS has a bug, likely in the kernel; you may want to upgrade your OS. I don't think this is related to XGBoost.

@trivialfis
Member

Feel free to reopen it if you need further information.
