[BUG] UI Crash - Unterminated string in JSON at position 5000 for mlflow.log-model.history #12032

Piotr45 · 2024-05-17T13:32:58Z

Issues Policy acknowledgement

I have read and agree to submit bug reports in accordance with the issues policy

Where did you encounter this bug?

Local machine

Willingness to contribute

Yes. I would be willing to contribute a fix for this bug with guidance from the MLflow community.

MLflow version

Client: 2.10.0
Tracking server: 2.12.2

System information

OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 20.04
Python version: 3.8.10

Describe the problem

After running training jobs with the new version of MLflow, we are encountering crashes in the MLflow UI.

The experiment page (e.g. https://mlflow.server/#/experiments/7) displays "Something went wrong". In the browser developer tools console, we can see that MLflow encountered:

Unterminated string in JSON at position 5000

We were able to work around this issue by deleting the mlflow.log-model.history tag from the database:

DELETE FROM tags WHERE `key` = 'mlflow.log-model.history';

Once the tags are deleted, the issue disappears, but it reappears after the next training run.

As far as we can tell, in our case models are usually logged several times per run, whenever a metric improves. Each time a model is logged, its history is also recorded under the mlflow.log-model.history tag. When a model is logged multiple times in a single run, MLflow loads the existing mlflow.log-model.history tag and writes back the previous history with the new entry appended.

Example tag:

"tags": [
          {
            "key": "mlflow.log-model.history",
            "value": "[{\"run_id\": \"99c05a5a02514602ab518b369eb5893a\", \"artifact_path\": \"model_abcd\", \"utc_time_created\": \"2024-04-11 10:50:29.067285\", \"flavors\": {\"pytorch\": {\"model_data\": \"data\", \"pytorch_version\": \"2.0.1+cu117\", \"code\": null}, \"python_function\": {\"pickle_module_name\": \"mlflow.pytorch.pickle_module\", \"loader_module\": \"mlflow.pytorch\", \"python_version\": \"3.8.10\", \"data\": \"data\", \"env\": {\"conda\": \"conda.yaml\", \"virtualenv\": \"python_env.yaml\"}}}, \"model_uuid\": \"c6db59cf6d1d460b9b1d079b0fe6a485\", \"mlflow_version\": \"2.4.1\"}, {\"run_id\": \"99c05a5a02514602ab518b369eb5893a\", \"artifact_path\": \"model_abcd\", \"utc_time_created\": \"2024-04-11 10:52:52.823115\", \"flavors\": {\"pytorch\": {\"model_data\": \"data\", \"pytorch_version\": \"2.0.1+cu117\", \"code\": null}, \"python_function\": {\"pickle_module_name\": \"mlflow.pytorch.pickle_module\", \"loader_module\": \"mlflow.pytorch\", \"python_version\": \"3.8.10\", \"data\": \"data\", \"env\": {\"conda\": \"conda.yaml\", \"virtualenv\": \"python_env.yaml\"}}}, \"model_uuid\": \"aecad7c539564511951d34ca30077088\", \"mlflow_version\": \"2.4.1\"}, {\"run_id\": \"99c05a5a02514602ab518b369eb5893a\", \"artifact_path\": \"model_abcd\", \"utc_time_created\": \"2024-04-11 10:55:16.083233\", \"flavors\": {\"pytorch\": {\"model_data\": \"data\", \"pytorch_version\": \"2.0.1+cu117\", \"code\": null}, \"python_function\": {\"pickle_module_name\": \"mlflow.pytorch.pickle_module\", \"loader_module\": \"mlflow.pytorch\", \"python_version\": \"3.8.10\", \"data\": \"data\", \"env\": {\"conda\": \"conda.yaml\", \"virtualenv\": \"python_env.yaml\"}}}, \"model_uuid\": \"79062e837db845619a153397e69c095c\", \"mlflow_version\": \"2.4.1\"}, {\"run_id\": \"99c05a5a02514602ab518b369eb5893a\", \"artifact_path\": \"model_abcd\", \"utc_time_created\": \"2024-04-11 10:57:39.337207\", \"flavors\": {\"pytorch\": {\"model_data\": \"data\", \"pytorch_version\": 
\"2.0.1+cu117\", \"code\": null}, \"python_function\": {\"pickle_module_name\": \"mlflow.pytorch.pickle_module\", \"loader_module\": \"mlflow.pytorch\", \"python_version\": \"3.8.10\", \"data\": \"data\", \"env\": {\"conda\": \"conda.yaml\", \"virtualenv\": \"python_env.yaml\"}}}, \"model_uuid\": \"9454b0054c8d4c0389db6602d1f3d837\", \"mlflow_version\": \"2.4.1\"}, {\"run_id\": \"99c05a5a02514602ab518b369eb5893a\", \"artifact_path\": \"model_abcd\", \"utc_time_created\": \"2024-04-11 11:00:02.548966\", \"flavors\": {\"pytorch\": {\"model_data\": \"data\", \"pytorch_version\": \"2.0.1+cu117\", \"code\": null}, \"python_function\": {\"pickle_module_name\": \"mlflow.pytorch.pickle_module\", \"loader_module\": \"mlflow.pytorch\", \"python_version\": \"3.8.10\", \"data\": \"data\", \"env\": {\"conda\": \"conda.yaml\", \"virtualenv\": \"python_env.yaml\"}}}, \"model_uuid\": \"d736e22088af4dfc9e808365eca26910\", \"mlflow_version\": \"2.4.1\"}, {\"run_id\": \"99c05a5a02514602ab518b369eb5893a\", \"artifact_path\": \"model_abcd\", \"utc_time_created\": \"2024-04-11 11:02:25.823720\", \"flavors\": {\"pytorch\": {\"model_data\": \"data\", \"pytorch_version\": \"2.0.1+cu117\", \"code\": null}, \"python_function\": {\"pickle_module_name\": \"mlflow.pytorch.pickle_module\", \"loader_module\": \"mlflow.pytorch\", \"python_version\": \"3.8.10\", \"data\": \"data\", \"env\": {\"conda\": \"conda.yaml\", \"virtualenv\": \"python_env.yaml\"}}}, \"model_uuid\": \"205d35b4f0424325a015a2b9a61a2797\", \"mlflow_version\": \"2.4.1\"}, {\"run_id\": \"99c05a5a02514602ab518b369eb5893a\", \"artifact_path\": \"model_abcd\", \"utc_time_created\": \"2024-04-11 11:04:49.111657\", \"flavors\": {\"pytorch\": {\"model_data\": \"data\", \"pytorch_version\": \"2.0.1+cu117\", \"code\": null}, \"python_function\": {\"pickle_module_name\": \"mlflow.pytorch.pickle_module\", \"loader_module\": \"mlflow.pytorch\", \"python_version\": \"3.8.10\", \"data\": \"data\", \"env\": {\"conda\": \"conda.yaml\", 
\"virtualenv\": \"python_env.yaml\"}}}, \"model_uuid\": \"96e6702de31d4dd08d0bc0aac282fa09\", \"mlflow_version\": \"2.4.1\"}, {\"run_id\": \"99c05a5a02514602ab518b369eb5893a\", \"artifact_path\": \"model_abcd\", \"utc_time_created\": \"2024-04-11 11:09:35.769272\", \"flavors\": {\"pytorch\": {\"model_data\": \"data\", \"pytorch_version\": \"2.0.1+cu117\", \"code\": null}, \"python_function\": {\"pickle_module_name\": \"mlflow.pytorch.pickle_module\", \"loader_module\": \"mlflow.pytorch\", \"python_version\": \"3.8.10\", \"data\": \"data\", \"env\": {\"conda\": \"conda.yaml\", \"virtualenv\": \"python_env.yaml\"}}}, \"model_uuid\": \"9417e00fca8444239cbd8bd4b847c699\", \"mlflow_version\": \"2.4.1\"}, {\"run_id\": \"99c05a5a02514602ab518b369eb5893a\", \"artifact_path\": \"model_abcd\", \"utc_time_created\": \"2024-04-11 11:11:59.223981\", \"flavors\": {\"pytorch\": {\"model_data\": \"data\", \"pytorch_version\": \"2.0.1+cu117\", \"code\": null}, \"python_function\": {\"pickle_module_name\": \"mlflow.pytorch.pickle_module\", \"loader_module\": \"mlflow.pytorch\", \"python_version\": \"3.8.10\", \"data\": \"data\", \"env\": {\"conda\": \"conda.yaml\", \"virtualenv\": \"python_env.yaml\"}}}, \"model_uuid\": \"7a73c28cbab14560bbe03e0cdddf3984\", \"mlflow_version\": \"2.4.1\"}]"
          },
          ...
]
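To get a feel for how quickly this history approaches the limit, the sketch below appends records shaped like the entries in the example tag (which already holds nine entries) until the serialized array no longer fits. It is a rough estimate under stated assumptions: `makeEntry`, `serializedLength` and `entriesBeforeOverflow` are hypothetical helpers, the 5000-character limit is taken from the tag column size discussed later in this thread, and `JSON.stringify` omits the spaces after `:` and `,` that appear in the stored value, so real tags grow somewhat faster than this estimate.

```typescript
// Hypothetical shape of one history entry, mirroring the example tag above.
interface LoggedModelEntry {
  run_id: string;
  artifact_path: string;
  utc_time_created: string;
  flavors: Record<string, unknown>;
  model_uuid: string;
  mlflow_version: string;
}

// Assumed limit on tag values (the String(5000) column discussed in this thread).
const MAX_TAG_VALUE_LENGTH = 5000;

// Builds the i-th entry; every entry serializes to the same length.
function makeEntry(i: number): LoggedModelEntry {
  return {
    run_id: "99c05a5a02514602ab518b369eb5893a",
    artifact_path: "model_abcd",
    utc_time_created: "2024-04-11 10:50:29.067285",
    flavors: {
      pytorch: { model_data: "data", pytorch_version: "2.0.1+cu117", code: null },
      python_function: {
        pickle_module_name: "mlflow.pytorch.pickle_module",
        loader_module: "mlflow.pytorch",
        python_version: "3.8.10",
        data: "data",
        env: { conda: "conda.yaml", virtualenv: "python_env.yaml" },
      },
    },
    // Zero-padded hex index stands in for a 32-character model UUID.
    model_uuid: i.toString(16).padStart(32, "0"),
    mlflow_version: "2.4.1",
  };
}

// Length of the compact JSON array holding n entries.
function serializedLength(n: number): number {
  const entries = Array.from({ length: n }, (_, i) => makeEntry(i));
  return JSON.stringify(entries).length;
}

// Simulates repeated logging: append one entry per logged model and return
// how many entries still fit within the limit.
function entriesBeforeOverflow(limit: number = MAX_TAG_VALUE_LENGTH): number {
  const entries: LoggedModelEntry[] = [];
  while (true) {
    entries.push(makeEntry(entries.length));
    if (JSON.stringify(entries).length > limit) {
      return entries.length - 1;
    }
  }
}

const fits = entriesBeforeOverflow();
console.log(`${fits} entries fit; entry ${fits + 1} pushes the value past ${MAX_TAG_VALUE_LENGTH} characters`);
```

Because the history only ever grows within a run, any run that logs a model often enough will eventually cross the limit, which matches the observation that the crash returns after the next training run.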

Tracking information

Code to reproduce issue

Stack trace

SyntaxError: Unterminated string in JSON at position 5000 (line 1 column 5001)
    at JSON.parse (<anonymous>)
    at E.getLoggedModelsFromTags (Utils.tsx:945:27)
    at experimentPage.row-utils.ts:339:27
    at Array.map (<anonymous>)
    at m (experimentPage.row-utils.ts:277:39)
    at experimentPage.row-utils.ts:437:7
    at Object.Xa [as useMemo] (react-dom.production.min.js:179:119)
    at t.useMemo (react.production.min.js:25:208)
    at g (experimentPage.row-utils.ts:435:10)
    at ExperimentViewRuns.tsx:233:31
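The trace shows `JSON.parse` rejecting the tag value. The sketch below reproduces that kind of failure, assuming (as diagnosed later in this thread) that the stored value is cut off at 5000 characters; `truncateTagValue` and `parseError` are hypothetical helpers, not MLflow code. Any proper prefix of a serialized JSON array is invalid JSON, since the closing `]` is its last character, so truncation always yields a `SyntaxError`; the exact message (such as "Unterminated string in JSON") depends on where the cut lands and on the JavaScript engine.

```typescript
// Assumed limit, taken from the String(5000) tag column discussed in this thread.
const TAG_VALUE_LIMIT = 5000;

// Hypothetical helper: keeps only the first `limit` characters, mimicking the
// truncation described in this thread.
function truncateTagValue(value: string, limit: number = TAG_VALUE_LIMIT): string {
  return value.length > limit ? value.slice(0, limit) : value;
}

// Hypothetical helper: returns the error thrown by JSON.parse, or null on success.
function parseError(value: string): Error | null {
  try {
    JSON.parse(value);
    return null;
  } catch (e) {
    return e as Error;
  }
}

// A history-like JSON array that is well over the limit.
const historyEntries = Array.from({ length: 200 }, (_, i) => ({
  model_uuid: i.toString(16).padStart(32, "0"),
  artifact_path: "model_abcd",
}));
const fullValue = JSON.stringify(historyEntries);
const storedValue = truncateTagValue(fullValue);

const err = parseError(storedValue);
console.log(fullValue.length, storedValue.length, err?.name, err?.message);
```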

Other info / logs

What component(s) does this bug affect?

What interface(s) does this bug affect?

area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
area/windows: Windows support

What language(s) does this bug affect?

language/r: R APIs and clients
language/java: Java APIs and clients
language/new: Proposals for new client languages

What integration(s) does this bug affect?

integrations/azure: Azure and Azure ML integrations
integrations/sagemaker: SageMaker integrations
integrations/databricks: Databricks integrations


serena-ruan · 2024-05-20T09:37:09Z

@daniellok-db Are you aware of this issue? Should the latest UI sync fix this problem?
I checked the codebase: the logic that updates the "mlflow.log-model.history" tag is very old and we haven't updated it.

daniellok-db · 2024-05-20T09:56:07Z

Ah... I think what's happening here is that we're storing JSON-serialized history in this tag value, but tag values have a maximum length of 5000 characters:

mlflow/mlflow/store/tracking/dbmodels/models.py, line 259 (at commit 2e7bff1):

value = Column(String(5000), nullable=True)

Therefore the JSON gets truncated and becomes invalid. As a quick fix, I think we can catch the error and treat it as though there is no tag. It's not ideal, but probably better than crashing the UI. Unfortunately, I don't have much context about this feature, so I'm not sure what a better solution would look like here.
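A minimal sketch of the quick fix proposed here: catch the parse error and treat the tag as absent. `getLoggedModelsFromTagsSafe`, `RunTag` and `LoggedModelSummary` are hypothetical names and shapes chosen for illustration; the actual change made in MLflow's `Utils.tsx` may look different.

```typescript
// Hypothetical tag shape: tags keyed by name, each holding the raw string value.
interface RunTag {
  key: string;
  value: string;
}

// Hypothetical summary of one logged model extracted from the history tag.
interface LoggedModelSummary {
  artifactPath: string;
  flavors: string[];
}

const LOG_MODEL_HISTORY_TAG = "mlflow.log-model.history";

// Parses the log-model history tag. If the value is not valid JSON (for example
// because it was truncated), logs a warning and behaves as if the tag were absent.
function getLoggedModelsFromTagsSafe(tags: Record<string, RunTag>): LoggedModelSummary[] {
  const tag = tags[LOG_MODEL_HISTORY_TAG];
  if (!tag) {
    return [];
  }
  let parsed: unknown;
  try {
    parsed = JSON.parse(tag.value);
  } catch (e) {
    console.warn(`Ignoring unparseable ${LOG_MODEL_HISTORY_TAG} tag:`, e);
    return [];
  }
  if (!Array.isArray(parsed)) {
    return [];
  }
  return parsed.map((entry) => ({
    artifactPath: String(entry?.artifact_path ?? ""),
    flavors: Object.keys(entry?.flavors ?? {}),
  }));
}
```

Returning an empty list hides the logged models for the affected run instead of crashing the whole experiment page, which is the trade-off described in the comment above.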

serena-ruan · 2024-05-20T10:03:49Z

Thanks for the investigation! Let's discuss internally tomorrow whether we need to increase the tag value size limit or hide such long tags.

github-actions · 2024-05-25T00:12:51Z

@mlflow/mlflow-team Please assign a maintainer and start triaging this issue.

Piotr45 added the bug (Something isn't working) label on May 17, 2024.

github-actions bot added the area/tracking (Tracking service, tracking client APIs, autologging) and area/uiux (Front-end, user experience, plotting, JavaScript, JavaScript dev server) labels on May 17, 2024.

daniellok-db mentioned this issue on May 21, 2024 in the merged pull request "Catch JSON parse errors when deserializing log-model tags" #12075.
