
Mlflow logging LR duplicate key issue with PostgreSQL DB #190 #20865

Open
@anaprietonem

Description


Bug description

What happened?
In the anemoi codebase we use the PyTorch Lightning MLFlowLogger, and we hit the following error when logging the learning rate to a remote MLflow tracking server backed by a PostgreSQL database:

mlflow.exceptions.RestException: BAD_REQUEST: (raised as a result of Query-invoked autoflush; consider using a session.no_autoflush block if this flush is occurring prematurely) (psycopg2.errors.UniqueViolation) duplicate key value violates unique constraint "metric_pk" DETAIL: Key (key, "timestamp", step, run_uuid, value, is_nan)=(lr-AdamW, 1741862947817, 60899, 843c1331fefa436bab56485cfc5bc16e, 0.000322870392593084, f) already exists.
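The DETAIL line shows that the "metric_pk" constraint spans every column of the metric row: (key, timestamp, step, run_uuid, value, is_nan). The insert therefore fails only when an identical data point (same key, millisecond timestamp, step, run and value) is written a second time for the same run. The sketch below reproduces this class of error using Python's built-in sqlite3 as a stand-in for PostgreSQL. The table definition is a simplified assumption modelled on the columns named in the error, not MLflow's actual schema, and the helper name is hypothetical.

```python
import sqlite3

# Simplified stand-in for the metrics table (assumption): a composite primary
# key over the same columns listed in the "metric_pk" DETAIL line.
SCHEMA = """
CREATE TABLE metrics (
    "key"       TEXT    NOT NULL,
    "timestamp" INTEGER NOT NULL,
    step        INTEGER NOT NULL,
    run_uuid    TEXT    NOT NULL,
    value       REAL    NOT NULL,
    is_nan      INTEGER NOT NULL,
    CONSTRAINT metric_pk PRIMARY KEY ("key", "timestamp", step, run_uuid, value, is_nan)
)
"""
INSERT = "INSERT INTO metrics VALUES (?, ?, ?, ?, ?, ?)"

# The row from the error message (is_nan = f is stored here as 0).
ROW = ("lr-AdamW", 1741862947817, 60899,
       "843c1331fefa436bab56485cfc5bc16e", 0.000322870392593084, 0)


def second_insert_accepted(first: tuple, second: tuple) -> bool:
    """Hypothetical helper: insert two rows, return False if the second collides."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(SCHEMA)
        conn.execute(INSERT, first)
        try:
            conn.execute(INSERT, second)
        except sqlite3.IntegrityError:
            return False  # the composite primary key already exists
        return True
    finally:
        conn.close()


if __name__ == "__main__":
    print(second_insert_accepted(ROW, ROW))  # False: exact duplicate
    bumped = ROW[:2] + (ROW[2] + 1,) + ROW[3:]
    print(second_insert_accepted(ROW, bumped))  # True: different step
```

Changing any single column (here the step) yields a distinct key, so the violation points at the same learning-rate point being submitted twice rather than at two different values clashing.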

Logger implementation in anemoi: https://github.com/ecmwf/anemoi-core/blob/main/training/src/anemoi/training/diagnostics/mlflow/logger.py
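One possible client-side mitigation, sketched here purely as an assumption and not as the fix adopted by anemoi, Lightning or MLflow, is to drop data points whose (key, timestamp, step, value) combination has already been sent for the current run before they reach the tracking server. The MetricPoint and MetricDeduplicator classes are hypothetical; a real logger would pass the filtered points on to MLflow.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricPoint:
    """Hypothetical client-side record of one metric data point."""
    key: str
    value: float
    timestamp_ms: int
    step: int


class MetricDeduplicator:
    """Hypothetical filter remembering which points were already sent for one run."""

    def __init__(self) -> None:
        self._seen: set[tuple] = set()

    def filter(self, points: list[MetricPoint]) -> list[MetricPoint]:
        fresh = []
        for p in points:
            # Mirror the server-side key columns; the run is implicit because
            # one deduplicator is used per run. NaN is replaced by 0.0 plus an
            # is_nan flag, because NaN never compares equal to itself.
            is_nan = math.isnan(p.value)
            ident = (p.key, p.timestamp_ms, p.step, 0.0 if is_nan else p.value, is_nan)
            if ident in self._seen:
                continue  # already sent: skip instead of re-inserting
            self._seen.add(ident)
            fresh.append(p)
        return fresh


if __name__ == "__main__":
    dedup = MetricDeduplicator()
    batch = [MetricPoint("lr-AdamW", 0.000322870392593084, 1741862947817, 60899)] * 2
    print(len(dedup.filter(batch)))  # 1
    print(len(dedup.filter(batch)))  # 0
```

The seen-set grows with every logged point, which matters for runs with tens of thousands of steps; a bounded variant might only remember recent steps. Deduplicating on the client also hides the underlying cause of the double submission instead of explaining it.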

Versions:

mlflow 2.22.0
pytorch-lightning 2.5.1

What version are you seeing the problem on?

v2.5

Reproduced in studio

No response

How to reproduce the bug

Error messages and logs


Environment

Current environment
#- PyTorch Lightning Version (e.g., 2.5.0):
#- PyTorch Version (e.g., 2.5):
#- Python version (e.g., 3.12):
#- OS (e.g., Linux):
#- CUDA/cuDNN version:
#- GPU models and configuration:
#- How you installed Lightning(`conda`, `pip`, source):
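Several of these fields can be collected programmatically. The sketch below uses only the standard library (importlib.metadata and platform) to print package, Python and OS versions in the template's format. The distribution names queried are assumptions about how the packages were installed, and the CUDA/cuDNN and GPU fields are not covered.

```python
import platform
from importlib import metadata


def package_version(name: str) -> str:
    """Return the installed version of a distribution, or 'not installed'."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"


def environment_report() -> dict[str, str]:
    """Collect version fields for the issue template (names are assumptions)."""
    return {
        "PyTorch Lightning": package_version("pytorch-lightning"),
        "lightning": package_version("lightning"),
        "PyTorch": package_version("torch"),
        "MLflow": package_version("mlflow"),
        "Python": platform.python_version(),
        "OS": platform.platform(),
    }


if __name__ == "__main__":
    for field, value in environment_report().items():
        print(f"#- {field}: {value}")
```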

More info

No response

Metadata

Assignees: No one assigned
Labels: bug (Something isn't working), needs triage (Waiting to be triaged by maintainers), ver: 2.5.x
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
