PR: Fix Duplicate Metric Logging in MLFlowLogger to Prevent MLflow Database Errors #20871
Add `mlflow` in test requirements
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. If you need further help see our docs: https://lightning.ai/docs/pytorch/latest/generated/CONTRIBUTING.html#pull-request or ask the assistance of a core contributor here or on Discord. Thank you for your contributions.
This will maybe also fix #20902.
Co-authored-by: Nicki Skafte Detlefsen <skaftenicki@gmail.com>
@KAVYANSHTYAGI could you please check the failing tests?
What does this PR do?
This PR fixes a long-standing issue in PyTorch Lightning’s MLFlowLogger: logging the same metric (same name and same step) more than once in a run causes a unique constraint violation on certain MLflow backends (e.g., PostgreSQL).
With this change, MLFlowLogger tracks (metric, step) pairs and skips any duplicate metric logs within a run, which prevents these database errors and improves robustness.
This change also updates the class docstring to document this new behavior and adds a unit test to verify that duplicate metric logs are ignored as expected.
Fixes #20865
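The deduplication idea described above can be illustrated with a minimal, standalone sketch. This is not the code from the PR: `DedupMetricBuffer`, its `sent` list, and the `log_metrics` signature are hypothetical names chosen for illustration, and the list stands in for the calls that would otherwise go to the MLflow backend.

```python
from typing import Dict, List, Set, Tuple


class DedupMetricBuffer:
    """Hypothetical helper illustrating (metric, step) deduplication.

    It does not talk to MLflow; `sent` records what would be forwarded.
    """

    def __init__(self) -> None:
        # Every (metric name, step) pair already logged in this run.
        self._seen: Set[Tuple[str, int]] = set()
        # Stand-in for the backend: entries that would actually be written.
        self.sent: List[Tuple[str, float, int]] = []

    def log_metrics(self, metrics: Dict[str, float], step: int) -> None:
        for name, value in metrics.items():
            key = (name, step)
            if key in self._seen:
                # Same name and step were already logged: skip the duplicate
                # instead of letting the backend reject it.
                continue
            self._seen.add(key)
            self.sent.append((name, value, step))


buffer = DedupMetricBuffer()
buffer.log_metrics({"loss": 0.5}, step=0)
buffer.log_metrics({"loss": 0.5}, step=0)  # duplicate, ignored
buffer.log_metrics({"loss": 0.4}, step=1)  # new step, kept
print(buffer.sent)  # [('loss', 0.5, 0), ('loss', 0.4, 1)]
```

In this sketch the first value seen for a given (metric, step) pair wins and later values are dropped silently; whether the PR makes the same choice is not stated in the description.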
Motivation and Context
Dependencies
Does your PR introduce any breaking changes?
Other Checklist Items
Documentation updated: yes (see class docstring in MLFlowLogger)
New test added for deduplication: yes
Fun fact:
This change will help Lightning users avoid subtle training failures, especially with remote or production MLflow tracking servers!
📚 Documentation preview 📚: https://pytorch-lightning--20871.org.readthedocs.build/en/20871/