
Performance Degradation When Running PyCaret Model with n_jobs > 1 Inside Metaflow #1835

Open
sungreong opened this issue May 13, 2024 · 0 comments


Environment:

  • Metaflow Version: 2.8.1
  • PyCaret Version: 3.1.0
  • Operating System: Ubuntu 20.04
  • Python Version: 3.8.10

Issue Description:
I am experiencing significant performance degradation when executing a PyCaret model training script with n_jobs=5 inside a Metaflow step, compared to running the same script standalone. Inside Metaflow, the training process either slows down drastically or appears to hang.

Steps to Reproduce:

1. Set up a PyCaret environment and configure model training with n_jobs=5.
2. Run the training script directly in Python; it executes quickly and efficiently.
3. Integrate the same script into a Metaflow step and execute it; the process slows down significantly.

```python
from metaflow import FlowSpec, step
from pycaret.classification import setup, compare_models


class TrainFlow(FlowSpec):

    @step
    def start(self):
        self.next(self.train)

    @step
    def train(self):
        print('Executing train step')
        from pycaret.datasets import get_data
        dataset = get_data('diabetes')

        # Setting up environment in PyCaret
        # (the `silent` argument was removed in PyCaret 3.x)
        clf = setup(data=dataset, target='Class variable', n_jobs=5)

        # Comparing all models
        self.best_model = compare_models()

        self.next(self.evaluate)

    @step
    def evaluate(self):
        print('Evaluating model')
        self.next(self.end)

    @step
    def end(self):
        print('Training completed')


if __name__ == '__main__':
    TrainFlow()
```
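One variable worth ruling out when reproducing this: some launchers cap native thread pools through environment variables such as OMP_NUM_THREADS, which silently limits scikit-learn's parallelism regardless of n_jobs. Whether Metaflow sets any of these is an assumption to verify, not a confirmed cause; a minimal stdlib check:

```python
import os

# Environment variables that cap the native thread pools used by
# NumPy / scikit-learn under the hood. If a launcher sets any of
# these to 1, n_jobs=5 buys little.
THREAD_VARS = (
    'OMP_NUM_THREADS',
    'OPENBLAS_NUM_THREADS',
    'MKL_NUM_THREADS',
    'NUMEXPR_NUM_THREADS',
)


def thread_env_report():
    """Return the current value (or None if unset) of each thread-cap variable."""
    return {var: os.environ.get(var) for var in THREAD_VARS}


if __name__ == '__main__':
    print(thread_env_report())
```

Printing this at the top of the `train` step and again in the standalone run would show whether the two environments differ.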

Expected Behavior:
The model training inside Metaflow should perform comparably to when it's run in a standalone Python script.

Actual Behavior:
When executed inside Metaflow, the training process is much slower, or hangs indefinitely, particularly when using multiple jobs (n_jobs=5).

Additional Context:

  • Running the script directly uses all allocated cores efficiently.
  • When running inside Metaflow, system monitoring tools show markedly lower CPU utilization.
Could this issue be related to how Metaflow handles multiprocessing within a step, or possibly resource allocation conflicts between PyCaret and Metaflow? Any insights or suggestions would be greatly appreciated.
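To help narrow this down, a stdlib-only probe can be run both standalone and inside the `train` step to compare what each process is actually allowed to use. The hypothesis that CPU affinity differs between the two runs is an assumption, not a confirmed cause:

```python
import os


def cpu_report(requested_jobs=5):
    """Report CPU visibility for the current process.

    Comparing this output standalone vs. inside a Metaflow step can
    show whether the step's process is restricted to fewer cores,
    which would explain a slowdown with n_jobs > 1.
    """
    try:
        # Cores the OS scheduler lets this process run on (Linux only).
        allowed = len(os.sched_getaffinity(0))
    except AttributeError:
        # sched_getaffinity is unavailable on macOS/Windows.
        allowed = os.cpu_count() or 1
    return {
        'cpu_count': os.cpu_count(),
        'allowed_cores': allowed,
        'requested_n_jobs': requested_jobs,
        'oversubscribed': requested_jobs > allowed,
    }


if __name__ == '__main__':
    print(cpu_report())
```

If `allowed_cores` is smaller inside the step than standalone, the slowdown is likely oversubscription (5 workers contending for fewer cores) rather than PyCaret itself.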
