`MLFlowLogger`'s status is "RUNNING" even after training failed #12291

ritsuki1227 · 2022-03-10T14:32:37Z

🐛 Bug

If a trainer with an MLFlowLogger raises an error, the user should be able to see in the MLflow UI that the training has failed.
In the current implementation, the MLflow run status stays "RUNNING" even after trainer.fit raises an error, so the user cannot tell whether training is still in progress or has failed.

Current behavior when training finishes with an error:

Expected behavior:

To Reproduce

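# BoringModel's import path assumes pytorch_lightning>=1.7 (pytorch_lightning.demos).
from pytorch_lightning import Trainer
from pytorch_lightning.demos.boring_classes import BoringModel
from pytorch_lightning.loggers import MLFlowLogger

class CustomModel(BoringModel):
    def training_step(self, batch, batch_idx):
        super().training_step(batch, batch_idx)
        # Simulate a crash during training.
        raise BaseException

trainer = Trainer(logger=MLFlowLogger("test"))  # "test" is the MLflow experiment name
try:
    trainer.fit(CustomModel())
finally:
    # This should print 'FAILED', but currently prints 'RUNNING'.
    print(trainer.logger.experiment.get_run(trainer.logger.run_id).info.status)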
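
The expected lifecycle can be sketched in plain Python, assuming the logger exposes a finalize(status) hook that the training loop calls exactly once when fitting ends. RecordingLogger and fit_with_status are hypothetical names used only for illustration; they are not Lightning's API. The strings "FINISHED" and "FAILED" are MLflow run status names, and the point of the sketch is that any exception, including a bare BaseException as in the repro above, should end in "FAILED" rather than leaving the run "RUNNING".

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")

# Hypothetical mapping from trainer-level outcomes to MLflow run status names.
_STATUS_MAP = {"success": "FINISHED", "finished": "FINISHED", "failed": "FAILED"}


class RecordingLogger:
    """Hypothetical stand-in for an MLflow-backed logger.

    It only records the terminal status it would send to MLflow.
    """

    def __init__(self) -> None:
        self.status = "RUNNING"  # a freshly started MLflow run is RUNNING
        self.history: List[str] = []

    def finalize(self, status: str) -> None:
        # Translate the trainer-level outcome into an MLflow run status.
        self.status = _STATUS_MAP.get(status, status.upper())
        self.history.append(self.status)


def fit_with_status(logger: RecordingLogger, fit: Callable[[], T]) -> T:
    """Run `fit`, marking the run FAILED if it raises and FINISHED otherwise."""
    try:
        result = fit()
    except BaseException:
        # Catch BaseException too, so even KeyboardInterrupt-style exits
        # do not leave the run stuck in RUNNING; then re-raise unchanged.
        logger.finalize("failed")
        raise
    logger.finalize("success")
    return result
```

With a real MLFlowLogger, a manual workaround along the same lines would be to call trainer.logger.experiment.set_terminated(trainer.logger.run_id, status="FAILED") in an except block around trainer.fit, since trainer.logger.experiment is an MlflowClient and MlflowClient.set_terminated accepts a status argument.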

cc @Borda


ritsuki1227 mentioned this issue Mar 10, 2022

Set MLFlowLogger status to FAILED when training raises an error #12292

Merged

akihironitta added the "bug" and "logger: mlflow" labels Mar 10, 2022

carmocca added this to the 1.7 milestone Apr 6, 2022

carmocca added the "feature" label and removed the "bug" label Apr 6, 2022

carmocca moved this from the pl:1.7 milestone to pl:future Jul 19, 2022

awaelchli closed this as completed in #12292 Sep 20, 2022

awaelchli moved this from the pl:future milestone to pl:1.8 Sep 20, 2022

awaelchli assigned awaelchli and ritsuki1227 Sep 20, 2022