
Get experiment_id from MLFlow only once instead of each training loop. #3394

Merged
5 commits merged into Lightning-AI:master on Sep 9, 2020

Conversation

@patrickorlando (Contributor) commented Sep 8, 2020

What does this PR do?

When using the MLflow logger, the experiment id is retrieved from the MlflowClient every time logger.experiment is accessed. This adds overhead to the training and validation loops, which becomes dramatic when the tracking server is remote.

This PR checks whether the experiment_id has already been retrieved and, if so, skips the call to MLflow.

Fixes #3393
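
For context, a minimal sketch of the caching pattern described above. This is illustrative only, not the actual pytorch_lightning/loggers/mlflow.py implementation: the class below and its constructor arguments are simplified stand-ins, while get_experiment_by_name and create_experiment are real MlflowClient methods.

from mlflow.tracking import MlflowClient


class CachedExperimentLogger:
    # Simplified stand-in for MLFlowLogger: resolve the experiment id once
    # and reuse it, instead of querying the tracking server on every access.

    def __init__(self, experiment_name, tracking_uri=None):
        self._experiment_name = experiment_name
        self._experiment_id = None
        self._mlflow_client = MlflowClient(tracking_uri)

    @property
    def experiment(self):
        # Only hit the MLflow server if the id has not been resolved yet.
        if self._experiment_id is None:
            expt = self._mlflow_client.get_experiment_by_name(self._experiment_name)
            if expt is not None:
                self._experiment_id = expt.experiment_id
            else:
                self._experiment_id = self._mlflow_client.create_experiment(
                    name=self._experiment_name
                )
        return self._mlflow_client

With the id cached, every later access to logger.experiment skips the network round-trip, which is what the test added later in this PR verifies.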

Before submitting

  • Was this discussed/approved via a GitHub issue? (not needed for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together? Otherwise, we ask you to create a separate PR for every change.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?
  • Did you verify new and existing tests pass locally with your changes?
  • If you made a notable change (that affects users), did you update the CHANGELOG?

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

@mergify mergify bot requested a review from a team September 8, 2020 11:05
@awaelchli added labels: bug (Something isn't working), logger (Related to the Loggers), v1.0 allowed Sep 8, 2020
codecov bot commented Sep 8, 2020

Codecov Report

Merging #3394 into master will decrease coverage by 2%.
The diff coverage is 100%.

@@           Coverage Diff            @@
##           master   #3394     +/-   ##
========================================
- Coverage      85%     83%     -2%     
========================================
  Files          98     102      +4     
  Lines        8072    9159   +1087     
========================================
+ Hits         6897    7611    +714     
- Misses       1175    1548    +373     

@rohitgr7 (Contributor) left a comment

LGTM.

pytorch_lightning/loggers/mlflow.py (outdated review thread, resolved)
@mergify mergify bot requested a review from a team September 8, 2020 17:30
@Borda (Member) left a comment

LGTM

@mergify mergify bot requested a review from a team September 8, 2020 22:16
@Borda added the ready (PRs ready to be merged) label Sep 8, 2020
@awaelchli (Member) commented Sep 9, 2020

@patrickorlando This test fails on master; could you add it to tests/loggers/test_mlflow.py?

from unittest import mock
from mlflow.tracking import MlflowClient
from pytorch_lightning.loggers import MLFlowLogger

def test_mlflow_experiment_created_once(tmpdir):
    logger = MLFlowLogger('test', save_dir=tmpdir)
    get_experiment_name = logger.experiment.get_experiment_by_name
    # wrap the real method so behaviour is unchanged while calls are counted
    with mock.patch.object(MlflowClient, 'get_experiment_by_name', wraps=get_experiment_name) as mocked:
        _ = logger.experiment
        _ = logger.experiment
        _ = logger.experiment
        assert mocked.call_count == 1

Thanks

@awaelchli (Member) left a comment

Requesting a test to make the bugfix complete; see my comment above.

@mergify mergify bot requested a review from a team September 9, 2020 07:20
@patrickorlando (Contributor, Author) commented Sep 9, 2020

@awaelchli I've added the test case, but I had to change

get_experiment_name = logger.experiment.get_experiment_by_name

to

get_experiment_name = logger._mlflow_client.get_experiment_by_name

With the original line, get_experiment_by_name had already been called (accessing logger.experiment triggers it) before the mock was installed, so the test was failing.

I've checked that this test still fails on master (commit d438ad8a8db3e76d3ed4e3c6bc9b91d6b3266b8e):

    def test_mlflow_experiment_id_retrieved_once(tmpdir):
        logger = MLFlowLogger('test', save_dir=tmpdir)
        get_experiment_name = logger._mlflow_client.get_experiment_by_name
        with mock.patch.object(MlflowClient, 'get_experiment_by_name', wraps=get_experiment_name) as mocked:
            _ = logger.experiment
            _ = logger.experiment
            _ = logger.experiment
>           assert mocked.call_count == 1
E           AssertionError: assert 3 == 1
E            +  where 3 = <MagicMock name='get_experiment_by_name' id='140215984767184'>.call_count

tests/loggers/test_mlflow.py:53: AssertionError

@pep8speaks commented Sep 9, 2020

Hello @patrickorlando! Thanks for updating this PR.

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-09-09 09:13:44 UTC

@awaelchli (Member) commented

Oh I see, because I had added it at the top of the file. But this way is better 👍

mergify bot commented Sep 9, 2020

This pull request is now in conflict... :(

@Borda Borda merged commit 656c1af into Lightning-AI:master Sep 9, 2020
Labels
bug (Something isn't working), logger (Related to the Loggers), ready (PRs ready to be merged)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

MLFlow Logger slows training steps dramatically, despite only setting metrics to be logged on epoch
5 participants