[tabular] Parallel Model Training #4215

Open
Innixma opened this issue May 21, 2024 · 1 comment
Labels: enhancement (New feature or request), feature: distributed (Related to Distributed AutoGluon), module: tabular

Innixma (Contributor) commented May 21, 2024

Related: #4213

We should add support for parallel model training via Ray that goes beyond the currently implemented parallel fold model fitting and parallel HPO.

Given a portfolio of 100 models, we should be able to train several of them simultaneously when sufficient resources are available.

This should also be extended to work with a distributed cluster.

Pseudocode Example

Current logic:

models = []
for model in portfolio:
    models.append(fit_model(model))

Code that does this in mainline:

for i, model in enumerate(models):
    if isinstance(model, str):
        # models may be passed by name; load the persisted model from disk
        model = self.load_model(model)
    elif self.low_memory:
        model = copy.deepcopy(model)
    # pick the per-model HPO configuration, if any
    if hyperparameter_tune_kwargs is not None and isinstance(hyperparameter_tune_kwargs, dict):
        hyperparameter_tune_kwargs_model = hyperparameter_tune_kwargs.get(model.name, None)
    else:
        hyperparameter_tune_kwargs_model = None
    # TODO: Only update scores when finished, only update model as part of final models if finished!
    # compute the remaining time budget for this model
    if time_split:
        time_left = time_limit_model_split
    else:
        if time_limit is None:
            time_left = None
        else:
            time_start_model = time.time()
            time_left = time_limit - (time_start_model - time_start)
    model_name_trained_lst = self._train_single_full(
        X, y, model, time_limit=time_left, hyperparameter_tune_kwargs=hyperparameter_tune_kwargs_model, **kwargs
    )
    if self.low_memory:
        del model
    models_valid += model_name_trained_lst
return models_valid

Proposed logic:

# Need smart logic to determine how to schedule the models and how many resources to give each one
models = ray_parallel(portfolio)
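
A minimal sketch (assumed helper names, not AutoGluon's actual API) of what ray_parallel could look like using plain Ray tasks. fit_model, portfolio, X, and y are hypothetical placeholders; a real implementation would also need per-model resource requests, time limits, and failure handling.

import ray

ray.init(ignore_reinit_error=True)

@ray.remote(num_cpus=4)  # resources reserved per model fit; a smart scheduler would set this per model type
def fit_model_remote(model_config, X, y):
    # placeholder for the actual per-model fitting logic
    return fit_model(model_config, X, y)

def ray_parallel(portfolio, X, y):
    # put the training data into the Ray object store once so all tasks share it
    X_ref = ray.put(X)
    y_ref = ray.put(y)
    # launch all fits; Ray runs as many concurrently as cluster resources allow,
    # and the same code works unchanged on a multi-node Ray cluster
    futures = [fit_model_remote.remote(model, X_ref, y_ref) for model in portfolio]
    return ray.get(futures)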

jmakov commented Jun 12, 2024

Ray Tune offers distributed HPO, and so does Optuna. There's also https://github.com/fugue-project/fugue, which offers an abstract interface supporting different distributed backends (Dask, Ray, Spark, etc.).
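
For reference, a minimal sketch of distributed HPO with Ray Tune plus the Optuna search algorithm mentioned above; train_fn and the search space are illustrative placeholders, not AutoGluon code.

from ray import tune
from ray.tune.search.optuna import OptunaSearch

def train_fn(config):
    # train a model with the sampled hyperparameters and return a validation score;
    # the objective below is a dummy stand-in for illustration
    score = 1.0 - config["learning_rate"]
    return {"score": score}

tuner = tune.Tuner(
    tune.with_resources(train_fn, {"cpu": 4}),  # resources per trial
    param_space={"learning_rate": tune.loguniform(1e-4, 1e-1)},
    tune_config=tune.TuneConfig(
        search_alg=OptunaSearch(),  # Optuna proposes hyperparameter configurations
        metric="score",
        mode="max",
        num_samples=20,  # trials are scheduled in parallel across the cluster
    ),
)
results = tuner.fit()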
