joblib "Broken Pipe" using scikit-learn grid-search crossfold validation after importing Comet ML #543

DFuller134 · 2024-04-19T18:14:59Z

Describe the Bug

After importing comet_ml, a scikit-learn-based training script fails during GridSearchCV cross-validation with a BrokenPipeError ("Broken pipe") raised in joblib. The same script works fine without the comet_ml import.

Expected behavior

The training script should run to completion with comet_ml imported.

Where is the issue?

Third Party Integrations (scikit-learn). The stack trace shows calls into comet_ml's monkey-patching.

To Reproduce

Steps to reproduce the behavior:

1. Import comet_ml
2. Instantiate a Comet ML experiment
3. Make some exp.log... calls
4. Instantiate a scikit-learn GridSearchCV and call fit to start the search

Stack Trace

Fitting 5 folds for each of 36 candidates, totalling 180 fits
joblib.externals.loky.process_executor._RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/home/redacted/projects/redacted/redacted/redacted/venv/lib/python3.10/site-packages/joblib/externals/loky/process_executor.py", line 463, in _process_worker
    r = call_item()
  File "/home/redacted/projects/redacted/redacted/redacted/venv/lib/python3.10/site-packages/joblib/externals/loky/process_executor.py", line 291, in __call__
    return self.fn(*self.args, **self.kwargs)
  File "/home/redacted/projects/redacted/redacted/redacted/venv/lib/python3.10/site-packages/joblib/parallel.py", line 598, in __call__
    return [func(*args, **kwargs)
  File "/home/redacted/projects/redacted/redacted/redacted/venv/lib/python3.10/site-packages/joblib/parallel.py", line 598, in <listcomp>
    return [func(*args, **kwargs)
  File "/home/redacted/projects/redacted/redacted/redacted/venv/lib/python3.10/site-packages/sklearn/utils/parallel.py", line 129, in __call__
    return self.function(*args, **kwargs)
  File "/home/redacted/projects/redacted/redacted/redacted/venv/lib/python3.10/site-packages/sklearn/model_selection/_validation.py", line 949, in _fit_and_score
    print(end_msg)
BrokenPipeError: [Errno 32] Broken pipe
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/redacted/projects/redacted/redacted/redacted/redacted-final_model.py", line 994, in <module>
    main()
  File "/home/redacted/projects/redacted/redacted/redacted/redacted-final_model.py", line 972, in main
    run_experiment(
  File "/home/redacted/projects/redacted/redacted/redacted/redacted-final_model.py", line 771, in run_experiment
    pred_df, y_test, best_grid_rgr, X_train, X, y = run_xgb(df, pre_pipe, post_pipe, params)
  File "/home/redacted/projects/redacted/redacted/redacted/redacted-final_model.py", line 740, in run_xgb
    grid = grid.fit(X_train, y_train)
  File "/home/redacted/projects/redacted/redacted/redacted/venv/lib/python3.10/site-packages/comet_ml/monkey_patching.py", line 316, in wrapper
    raise exception_raised
  File "/home/redacted/projects/redacted/redacted/redacted/venv/lib/python3.10/site-packages/comet_ml/monkey_patching.py", line 287, in wrapper
    return_value = original(*args, **kwargs)
  File "/home/redacted/projects/redacted/redacted/redacted/venv/lib/python3.10/site-packages/sklearn/base.py", line 1474, in wrapper
    return fit_method(estimator, *args, **kwargs)
  File "/home/redacted/projects/redacted/redacted/redacted/venv/lib/python3.10/site-packages/sklearn/model_selection/_search.py", line 970, in fit
    self._run_search(evaluate_candidates)
  File "/home/redacted/projects/redacted/redacted/redacted/venv/lib/python3.10/site-packages/sklearn/model_selection/_search.py", line 1527, in _run_search
    evaluate_candidates(ParameterGrid(self.param_grid))
  File "/home/redacted/projects/redacted/redacted/redacted/venv/lib/python3.10/site-packages/sklearn/model_selection/_search.py", line 916, in evaluate_candidates
    out = parallel(
  File "/home/redacted/projects/redacted/redacted/redacted/venv/lib/python3.10/site-packages/sklearn/utils/parallel.py", line 67, in __call__
    return super().__call__(iterable_with_config)
  File "/home/redacted/projects/redacted/redacted/redacted/venv/lib/python3.10/site-packages/joblib/parallel.py", line 2007, in __call__
    return output if self.return_generator else list(output)
  File "/home/redacted/projects/redacted/redacted/redacted/venv/lib/python3.10/site-packages/joblib/parallel.py", line 1650, in _get_outputs
    yield from self._retrieve()
  File "/home/redacted/projects/redacted/redacted/redacted/venv/lib/python3.10/site-packages/joblib/parallel.py", line 1754, in _retrieve
    self._raise_error_fast()
  File "/home/redacted/projects/redacted/redacted/redacted/venv/lib/python3.10/site-packages/joblib/parallel.py", line 1789, in _raise_error_fast
    error_job.get_result(self.timeout)
  File "/home/redacted/projects/redacted/redacted/redacted/venv/lib/python3.10/site-packages/joblib/parallel.py", line 745, in get_result
    return self._return_or_raise()
  File "/home/redacted/projects/redacted/redacted/redacted/venv/lib/python3.10/site-packages/joblib/parallel.py", line 763, in _return_or_raise
    raise self._result
BrokenPipeError: [Errno 32] Broken pipe
COMET INFO: ---------------------------------------------------------------------------------------
COMET INFO: Comet.ml Experiment Summary
COMET INFO: ---------------------------------------------------------------------------------------
COMET INFO:   Data:
COMET INFO:     display_summary_level : 1
COMET INFO:     name                  : outside_cheese_2516
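For readers unfamiliar with the error itself: errno 32 (EPIPE) is what the operating system reports when a process writes to a pipe whose reading end has gone away. In the worker traceback above, the failing call is print(end_msg), so the worker's stdout appears to have been such a pipe. The sketch below reproduces only the error class with a plain OS pipe on a POSIX system. It does not involve joblib or comet_ml, and the function name is made up for illustration.

```python
import errno
import os


def write_to_closed_pipe() -> int:
    """Write to a pipe with no reader and return the errno that results."""
    read_fd, write_fd = os.pipe()
    os.close(read_fd)  # the reading side disappears
    try:
        # CPython ignores SIGPIPE by default, so this raises instead of killing the process.
        os.write(write_fd, b"[CV] END ...\n")
    except BrokenPipeError as exc:
        return exc.errno
    finally:
        os.close(write_fd)
    return 0


if __name__ == "__main__":
    print(write_to_closed_pipe() == errno.EPIPE)
```

This only shows where the exception type comes from. Why the worker's stdout would be closed after importing comet_ml is not established in this thread.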

Comet Debug Log

comet.log

Screenshots or GIFs

N/A

Additional context (code fragment - fails on grid.fit)

    # Instantiate & Fit Grid Search Object
    grid = GridSearchCV(rgr, params, cv=5, n_jobs=-1, scoring=scoring, verbose=5)
    grid = grid.fit(X_train, y_train)
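A possible way to narrow this down, offered as an assumption rather than a confirmed fix: the worker fails at print(end_msg) in _fit_and_score, which scikit-learn only reaches when verbose is greater than 1. Lowering verbose, or running the search in-process with n_jobs=1, should therefore avoid the failing write. The sketch below only builds the keyword arguments. rgr, params and scoring are the names from the fragment above and are not defined here.

```python
# Keyword arguments matching the fragment above.
original = {"cv": 5, "n_jobs": -1, "verbose": 5}

# Variant A: keep the parallel workers but suppress the per-fit messages
# that the workers print (assumed to be the failing write).
quiet = {**original, "verbose": 0}

# Variant B: keep the messages but run every fit in the parent process,
# so nothing is printed from a loky worker.
serial = {**original, "n_jobs": 1}

# Usage in the original script (not runnable here, names come from that script):
# grid = GridSearchCV(rgr, params, scoring=scoring, **quiet)
# grid = grid.fit(X_train, y_train)
```

If variant A succeeds and the original still fails, that would point at the workers' stdout rather than at the model fitting itself.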


dsblank · 2024-04-22T16:20:38Z

Looking through your log (search for "Traceback") I see this issue:

comet_ml.vendor.nvidia_ml.pynvml.NVMLError_NotSupported: Not Supported

but that shouldn't cause any issues. I also see:

[[13.3],
       [33.9],
       [54.5],
       [75.1],
       [95.7]]
ValueError: can only convert an array of size 1 to a Python scalar

which could be a Comet bug.

Also:

ModuleNotFoundError: No module named 'graphviz'

Run pip install graphviz (or install another dot package) to see if that helps.

DFuller134 · 2024-04-22T20:59:54Z

I addressed each of these issues except the NVML error (likely related to the GPU drivers needed to log GPU metrics). The ValueError cleared up when I set COMET_DISABLE_AUTO_LOGGING=1. I also installed graphviz to clear up that issue.

I would agree that the ValueError seems like a CometML bug.
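For anyone who prefers to set the COMET_DISABLE_AUTO_LOGGING=1 setting mentioned above from inside the script rather than the shell, a minimal sketch follows. It assumes the variable needs to be in the environment before comet_ml is imported, since the monkey-patching happens on import. The guarded import is only there so the snippet also runs where comet_ml is not installed.

```python
import os

# Set the flag before importing comet_ml (assumption: it is read at import time).
os.environ["COMET_DISABLE_AUTO_LOGGING"] = "1"

try:
    import comet_ml  # noqa: F401  imported after the flag is in place
except ImportError:
    comet_ml = None  # comet_ml is not installed in this environment

print(os.environ["COMET_DISABLE_AUTO_LOGGING"])
```

The shell equivalent is exporting the variable before launching the training script.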

dsblank · 2024-04-24T11:13:25Z

@DFuller134 thanks for your update! I'll pass on the details of the NVMLError_NotSupported error to our engineering team.

dsblank · 2024-04-24T12:07:19Z

Do you know what [[33.9], [54.5], [75.1], [95.7]] is? If you are trying to log a parameter (or step or epoch) value, it can't be a list of values. I believe that these are the only places that this error could come from.
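If the nested list does turn out to be a value that reaches parameter logging, one option is to break it into scalar entries first. The helper below is a hypothetical sketch, not part of comet_ml. The parameter name "threshold" is invented, and the values are the ones shown in the log excerpt earlier in the thread.

```python
from typing import Any, Dict


def scalar_params(name: str, value: Any) -> Dict[str, Any]:
    """Expand a (possibly nested) list or tuple into one scalar entry per leaf."""
    if isinstance(value, (list, tuple)):
        flat: Dict[str, Any] = {}
        for index, item in enumerate(value):
            flat.update(scalar_params(f"{name}_{index}", item))
        return flat
    return {name: value}


params = scalar_params("threshold", [[13.3], [33.9], [54.5], [75.1], [95.7]])
for key, val in params.items():
    print(key, val)

# Each entry is now a single value, so it could be logged individually,
# for example with experiment.log_parameters(params) on an existing experiment.
```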

dsblank added the question label Apr 22, 2024

dsblank added the bug label Apr 24, 2024

dsblank closed this as completed Jun 6, 2024
