
Performance Degradation When Running PyCaret Model with n_jobs > 1 Inside Metaflow #1835

Open
sungreong opened this issue May 13, 2024 · 0 comments


Environment:

  • Metaflow Version: 2.8.1
  • PyCaret Version: 3.1.0
  • Operating System: Ubuntu 20.04
  • Python Version: 3.8.10

Issue Description:
I am experiencing significant performance degradation when executing a PyCaret model training script with n_jobs=5 inside a Metaflow step, compared to running the same script standalone. Inside Metaflow, the training process either slows down drastically or appears to hang.

Steps to Reproduce:

1. Set up a PyCaret environment and configure model training with n_jobs=5.
2. Run the training script directly in Python; it executes quickly and efficiently.
3. Integrate the same script into a Metaflow step and execute it; the process slows down significantly.

```python
from metaflow import FlowSpec, step
from pycaret.classification import setup, compare_models


class TrainFlow(FlowSpec):

    @step
    def start(self):
        self.next(self.train)

    @step
    def train(self):
        print('Executing train step')
        from pycaret.datasets import get_data
        dataset = get_data('diabetes')

        # Setting up environment in PyCaret
        # (the `silent` argument was removed in PyCaret 3.x)
        clf = setup(data=dataset, target='Class variable', n_jobs=5)

        # Comparing all models
        self.best_model = compare_models()

        self.next(self.evaluate)

    @step
    def evaluate(self):
        print('Evaluating model')
        self.next(self.end)

    @step
    def end(self):
        print('Training completed')


if __name__ == '__main__':
    TrainFlow()
```
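One variable worth ruling out when reproducing this: some launchers cap native thread pools through environment variables such as OMP_NUM_THREADS, which silently limits scikit-learn's parallelism regardless of n_jobs. Whether Metaflow sets any of these is an assumption to verify, not a confirmed cause; a minimal stdlib check:

```python
import os

# Environment variables that cap the native thread pools used by
# NumPy / scikit-learn under the hood. If a launcher sets any of
# these to 1, n_jobs=5 buys little.
THREAD_VARS = (
    'OMP_NUM_THREADS',
    'OPENBLAS_NUM_THREADS',
    'MKL_NUM_THREADS',
    'NUMEXPR_NUM_THREADS',
)


def thread_env_report():
    """Return the current value (or None if unset) of each thread-cap variable."""
    return {var: os.environ.get(var) for var in THREAD_VARS}


if __name__ == '__main__':
    print(thread_env_report())
```

Printing this at the top of the `train` step and again in the standalone run would show whether the two environments differ.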

Expected Behavior:
The model training inside Metaflow should perform comparably to when it's run in a standalone Python script.

Actual Behavior:
When executed inside Metaflow, the training process is much slower, or hangs indefinitely, particularly when using multiple jobs (n_jobs=5).

Additional Context:

  • Running the script directly uses all allocated cores efficiently.
  • When running inside Metaflow, system monitoring tools show markedly lower CPU utilization.
Could this issue be related to how Metaflow handles multiprocessing within a step, or possibly resource allocation conflicts between PyCaret and Metaflow? Any insights or suggestions would be greatly appreciated.
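To help narrow this down, a stdlib-only probe can be run both standalone and inside the `train` step to compare what each process is actually allowed to use. The hypothesis that CPU affinity differs between the two runs is an assumption, not a confirmed cause:

```python
import os


def cpu_report(requested_jobs=5):
    """Report CPU visibility for the current process.

    Comparing this output standalone vs. inside a Metaflow step can
    show whether the step's process is restricted to fewer cores,
    which would explain a slowdown with n_jobs > 1.
    """
    try:
        # Cores the OS scheduler lets this process run on (Linux only).
        allowed = len(os.sched_getaffinity(0))
    except AttributeError:
        # sched_getaffinity is unavailable on macOS/Windows.
        allowed = os.cpu_count() or 1
    return {
        'cpu_count': os.cpu_count(),
        'allowed_cores': allowed,
        'requested_n_jobs': requested_jobs,
        'oversubscribed': requested_jobs > allowed,
    }


if __name__ == '__main__':
    print(cpu_report())
```

If `allowed_cores` is smaller inside the step than standalone, the slowdown is likely oversubscription (5 workers contending for fewer cores) rather than PyCaret itself.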
