joblib "Broken Pipe" using scikit-learn grid-search crossfold validation after importing Comet ML #543

Closed
DFuller134 opened this issue Apr 19, 2024 · 4 comments

@DFuller134

Describe the Bug

After importing comet_ml, a scikit-learn-based training script fails during GridSearchCV cross-validation with a BrokenPipeError ("Broken pipe") raised inside a joblib worker. The same script works fine without the comet_ml import.

Expected behavior

The training script should run to completion with comet_ml imported.

Where is the issue?

Third Party Integrations (scikit-learn). The stack trace shows the fit call passing through comet_ml's monkey_patching wrapper.

To Reproduce

Steps to reproduce the behavior:

  1. Import comet_ml
  2. Instantiate a Comet ML experiment
  3. Make some exp.log... calls
  4. Instantiate scikit-learn GridSearchCV and call fit to start the search

Stack Trace

Fitting 5 folds for each of 36 candidates, totalling 180 fits
joblib.externals.loky.process_executor._RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/home/redacted/projects/redacted/redacted/redacted/venv/lib/python3.10/site-packages/joblib/externals/loky/process_executor.py", line 463, in _process_worker
    r = call_item()
  File "/home/redacted/projects/redacted/redacted/redacted/venv/lib/python3.10/site-packages/joblib/externals/loky/process_executor.py", line 291, in __call__
    return self.fn(*self.args, **self.kwargs)
  File "/home/redacted/projects/redacted/redacted/redacted/venv/lib/python3.10/site-packages/joblib/parallel.py", line 598, in __call__
    return [func(*args, **kwargs)
  File "/home/redacted/projects/redacted/redacted/redacted/venv/lib/python3.10/site-packages/joblib/parallel.py", line 598, in <listcomp>
    return [func(*args, **kwargs)
  File "/home/redacted/projects/redacted/redacted/redacted/venv/lib/python3.10/site-packages/sklearn/utils/parallel.py", line 129, in __call__
    return self.function(*args, **kwargs)
  File "/home/redacted/projects/redacted/redacted/redacted/venv/lib/python3.10/site-packages/sklearn/model_selection/_validation.py", line 949, in _fit_and_score
    print(end_msg)
BrokenPipeError: [Errno 32] Broken pipe
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/redacted/projects/redacted/redacted/redacted/redacted-final_model.py", line 994, in <module>
    main()
  File "/home/redacted/projects/redacted/redacted/redacted/redacted-final_model.py", line 972, in main
    run_experiment(
  File "/home/redacted/projects/redacted/redacted/redacted/redacted-final_model.py", line 771, in run_experiment
    pred_df, y_test, best_grid_rgr, X_train, X, y = run_xgb(df, pre_pipe, post_pipe, params)
  File "/home/redacted/projects/redacted/redacted/redacted/redacted-final_model.py", line 740, in run_xgb
    grid = grid.fit(X_train, y_train)
  File "/home/redacted/projects/redacted/redacted/redacted/venv/lib/python3.10/site-packages/comet_ml/monkey_patching.py", line 316, in wrapper
    raise exception_raised
  File "/home/redacted/projects/redacted/redacted/redacted/venv/lib/python3.10/site-packages/comet_ml/monkey_patching.py", line 287, in wrapper
    return_value = original(*args, **kwargs)
  File "/home/redacted/projects/redacted/redacted/redacted/venv/lib/python3.10/site-packages/sklearn/base.py", line 1474, in wrapper
    return fit_method(estimator, *args, **kwargs)
  File "/home/redacted/projects/redacted/redacted/redacted/venv/lib/python3.10/site-packages/sklearn/model_selection/_search.py", line 970, in fit
    self._run_search(evaluate_candidates)
  File "/home/redacted/projects/redacted/redacted/redacted/venv/lib/python3.10/site-packages/sklearn/model_selection/_search.py", line 1527, in _run_search
    evaluate_candidates(ParameterGrid(self.param_grid))
  File "/home/redacted/projects/redacted/redacted/redacted/venv/lib/python3.10/site-packages/sklearn/model_selection/_search.py", line 916, in evaluate_candidates
    out = parallel(
  File "/home/redacted/projects/redacted/redacted/redacted/venv/lib/python3.10/site-packages/sklearn/utils/parallel.py", line 67, in __call__
    return super().__call__(iterable_with_config)
  File "/home/redacted/projects/redacted/redacted/redacted/venv/lib/python3.10/site-packages/joblib/parallel.py", line 2007, in __call__
    return output if self.return_generator else list(output)
  File "/home/redacted/projects/redacted/redacted/redacted/venv/lib/python3.10/site-packages/joblib/parallel.py", line 1650, in _get_outputs
    yield from self._retrieve()
  File "/home/redacted/projects/redacted/redacted/redacted/venv/lib/python3.10/site-packages/joblib/parallel.py", line 1754, in _retrieve
    self._raise_error_fast()
  File "/home/redacted/projects/redacted/redacted/redacted/venv/lib/python3.10/site-packages/joblib/parallel.py", line 1789, in _raise_error_fast
    error_job.get_result(self.timeout)
  File "/home/redacted/projects/redacted/redacted/redacted/venv/lib/python3.10/site-packages/joblib/parallel.py", line 745, in get_result
    return self._return_or_raise()
  File "/home/redacted/projects/redacted/redacted/redacted/venv/lib/python3.10/site-packages/joblib/parallel.py", line 763, in _return_or_raise
    raise self._result
BrokenPipeError: [Errno 32] Broken pipe
COMET INFO: ---------------------------------------------------------------------------------------
COMET INFO: Comet.ml Experiment Summary
COMET INFO: ---------------------------------------------------------------------------------------
COMET INFO:   Data:
COMET INFO:     display_summary_level : 1
COMET INFO:     name                  : outside_cheese_2516
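
For anyone reading the trace above: BrokenPipeError [Errno 32] is what Python raises when a process writes to a pipe whose reading end has already been closed. The inner traceback shows it coming from print(end_msg) inside a joblib worker, which suggests (an assumption, not something confirmed in this issue) that the worker's stdout was no longer connected to a live reader. The sketch below reproduces only the generic mechanism with the standard library on a POSIX system; it does not involve comet_ml, joblib, or scikit-learn, and the function name is made up for illustration.

```python
import errno
import os
from typing import Optional


def write_to_closed_pipe() -> Optional[OSError]:
    """Write to a pipe whose read end is closed and return the error raised."""
    read_fd, write_fd = os.pipe()
    os.close(read_fd)  # the reader goes away before the writer writes
    try:
        # Stand-in for a worker printing a progress message to stdout.
        os.write(write_fd, b"[CV 1/5] END ...\n")
    except BrokenPipeError as exc:
        return exc
    finally:
        os.close(write_fd)
    return None


err = write_to_closed_pipe()
print(type(err).__name__, err is not None and err.errno == errno.EPIPE)
```

If this is indeed what happens in the worker, any change that stops the worker from printing, or that keeps its stdout attached to a reader, would be expected to avoid the exception.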

Comet Debug Log

comet.log

Screenshots or GIFs

N/A

Additional context (code fragment - fails on grid.fit)

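    from sklearn.model_selection import GridSearchCV

    # Instantiate and fit the grid search object
    # (rgr, params, scoring, X_train and y_train are defined earlier in the script)
    grid = GridSearchCV(rgr, params, cv=5, n_jobs=-1, scoring=scoring, verbose=5)
    grid = grid.fit(X_train, y_train)  # BrokenPipeError is raised here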

@dsblank
Collaborator

dsblank commented Apr 22, 2024

Looking through your log (search for "Traceback") I see this issue:

comet_ml.vendor.nvidia_ml.pynvml.NVMLError_NotSupported: Not Supported

but that shouldn't cause any issues. I also see:

[[13.3],
       [33.9],
       [54.5],
       [75.1],
       [95.7]]
ValueError: can only convert an array of size 1 to a Python scalar

which could be a Comet bug.

Also:

ModuleNotFoundError: No module named 'graphviz'

Try pip install graphviz (or another dot package) to see if that helps.

@DFuller134
Author

I addressed each of these issues except the NVML error (likely related to the GPU drivers needed to log GPU metrics). The ValueError cleared up when I set COMET_DISABLE_AUTO_LOGGING=1. Installing graphviz cleared up the ModuleNotFoundError.

I agree that the ValueError looks like a Comet ML bug.
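
For reference, one way to apply the COMET_DISABLE_AUTO_LOGGING=1 setting from inside the script rather than the shell is sketched below. It assumes the variable has to be in the environment before comet_ml is imported, which is the conservative choice; the commented-out imports only mark where the real ones would go.

```python
import os

# Assumption: set the variable before comet_ml is imported so that the
# auto-logging hooks are never installed in the first place.
os.environ["COMET_DISABLE_AUTO_LOGGING"] = "1"

# import comet_ml  # import only after the variable is set
# from sklearn.model_selection import GridSearchCV

print(os.environ["COMET_DISABLE_AUTO_LOGGING"])
```

Exporting the variable in the shell before launching the script has the same effect and avoids depending on import order.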

@dsblank
Collaborator

dsblank commented Apr 24, 2024

@DFuller134 thanks for your update! I'll pass on the details of the NVMLError_NotSupported error to our engineering team.

dsblank added the bug label Apr 24, 2024

@dsblank
Collaborator

dsblank commented Apr 24, 2024

Do you know what [[33.9], [54.5], [75.1], [95.7]] is? If you are trying to log a parameter (or step or epoch) value, it can't be a list of values. I believe that these are the only places that this error could come from.
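
If the value does turn out to be an array that reaches a parameter, step, or epoch argument, a small guard before logging makes the problem explicit. The helper below is a hypothetical sketch, not part of comet_ml: as_scalar is an invented name, and it relies on duck typing (tolist) so that it works on NumPy arrays without importing NumPy.

```python
from typing import Any


def as_scalar(value: Any) -> Any:
    """Return value as a plain scalar, or raise if it holds several values."""
    # NumPy arrays and NumPy scalars expose tolist(); use it when available.
    if hasattr(value, "tolist"):
        value = value.tolist()
    # Unwrap single-element nesting such as [[13.3]] -> 13.3.
    while isinstance(value, (list, tuple)):
        if len(value) != 1:
            raise ValueError(
                f"expected a single value, got a sequence of {len(value)} elements"
            )
        value = value[0]
    return value


# Hypothetical usage with an existing experiment object `exp`:
# exp.log_parameter("threshold", as_scalar(threshold))
print(as_scalar([[13.3]]))
```

Called on a five-row column like the one in the log, this raises a ValueError at the call site in the user's script instead of deep inside the logging code.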

dsblank closed this as completed Jun 6, 2024