[BUG] UI Crash - Unterminated string in JSON at position 5000 for mlflow.log-model.history #12032

Open
3 of 23 tasks
Piotr45 opened this issue May 17, 2024 · 4 comments
Labels

  • area/tracking: Tracking service, tracking client APIs, autologging
  • area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • bug: Something isn't working

Piotr45 commented May 17, 2024

Issues Policy acknowledgement

  • I have read and agree to submit bug reports in accordance with the issues policy

Where did you encounter this bug?

Local machine

Willingness to contribute

Yes. I would be willing to contribute a fix for this bug with guidance from the MLflow community.

MLflow version

  • Client: 2.10.0
  • Tracking server: 2.12.2

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 20.04
  • Python version: 3.8.10

Describe the problem

After running training jobs with the new version of MLflow, we are encountering crashes in the MLflow UI.

The experiment screen (e.g. https://mlflow.server/#/experiments/7) displays "Something went wrong". The browser's developer tools console shows that MLflow encountered:

Unterminated string in JSON at position 5000

We were able to work around this issue by deleting the mlflow.log-model.history tag from the database:

DELETE FROM tags WHERE `key` = 'mlflow.log-model.history';

Once the tags are deleted the issue disappears, but it reappears after the next training run.

As far as we can tell, in our case a model is logged several times per run, each time a metric improves. Every time a model is logged, its history is also written to the mlflow.log-model.history tag: MLflow loads the existing tag value, appends the new entry, and writes the combined history back. The tag value therefore grows with every model logged in the same run.
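The accumulation described above can be sketched roughly as follows. This is only an illustration of the load-append-store pattern, not MLflow's actual code: the helper name `append_history` and the shortened entry fields are assumptions made for the example.

```python
import json


def append_history(tag_value, entry):
    """Hypothetical helper: load the stored history, append one entry, re-serialize."""
    history = json.loads(tag_value) if tag_value else []
    history.append(entry)
    return json.dumps(history)


# Shortened stand-in for one logged-model entry (fields taken from the example tag).
entry = {
    "run_id": "99c05a5a02514602ab518b369eb5893a",
    "artifact_path": "model_abcd",
    "flavors": {"pytorch": {"model_data": "data", "pytorch_version": "2.0.1+cu117"}},
    "mlflow_version": "2.4.1",
}

tag_value = None
sizes = []
for i in range(10):
    # Each logged model gets its own uuid, but the rest of the entry repeats.
    tag_value = append_history(tag_value, dict(entry, model_uuid=f"{i:032x}"))
    sizes.append(len(tag_value))

# The serialized tag value gets longer with every model logged in the run.
print(sizes)
```

With full-size entries like the ones in our runs, only a handful of logged models are needed before the value becomes several thousand characters long.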

Example tag:

"tags": [
          {
            "key": "mlflow.log-model.history",
            "value": "[{\"run_id\": \"99c05a5a02514602ab518b369eb5893a\", \"artifact_path\": \"model_abcd\", \"utc_time_created\": \"2024-04-11 10:50:29.067285\", \"flavors\": {\"pytorch\": {\"model_data\": \"data\", \"pytorch_version\": \"2.0.1+cu117\", \"code\": null}, \"python_function\": {\"pickle_module_name\": \"mlflow.pytorch.pickle_module\", \"loader_module\": \"mlflow.pytorch\", \"python_version\": \"3.8.10\", \"data\": \"data\", \"env\": {\"conda\": \"conda.yaml\", \"virtualenv\": \"python_env.yaml\"}}}, \"model_uuid\": \"c6db59cf6d1d460b9b1d079b0fe6a485\", \"mlflow_version\": \"2.4.1\"}, {\"run_id\": \"99c05a5a02514602ab518b369eb5893a\", \"artifact_path\": \"model_abcd\", \"utc_time_created\": \"2024-04-11 10:52:52.823115\", \"flavors\": {\"pytorch\": {\"model_data\": \"data\", \"pytorch_version\": \"2.0.1+cu117\", \"code\": null}, \"python_function\": {\"pickle_module_name\": \"mlflow.pytorch.pickle_module\", \"loader_module\": \"mlflow.pytorch\", \"python_version\": \"3.8.10\", \"data\": \"data\", \"env\": {\"conda\": \"conda.yaml\", \"virtualenv\": \"python_env.yaml\"}}}, \"model_uuid\": \"aecad7c539564511951d34ca30077088\", \"mlflow_version\": \"2.4.1\"}, {\"run_id\": \"99c05a5a02514602ab518b369eb5893a\", \"artifact_path\": \"model_abcd\", \"utc_time_created\": \"2024-04-11 10:55:16.083233\", \"flavors\": {\"pytorch\": {\"model_data\": \"data\", \"pytorch_version\": \"2.0.1+cu117\", \"code\": null}, \"python_function\": {\"pickle_module_name\": \"mlflow.pytorch.pickle_module\", \"loader_module\": \"mlflow.pytorch\", \"python_version\": \"3.8.10\", \"data\": \"data\", \"env\": {\"conda\": \"conda.yaml\", \"virtualenv\": \"python_env.yaml\"}}}, \"model_uuid\": \"79062e837db845619a153397e69c095c\", \"mlflow_version\": \"2.4.1\"}, {\"run_id\": \"99c05a5a02514602ab518b369eb5893a\", \"artifact_path\": \"model_abcd\", \"utc_time_created\": \"2024-04-11 10:57:39.337207\", \"flavors\": {\"pytorch\": {\"model_data\": \"data\", \"pytorch_version\": 
\"2.0.1+cu117\", \"code\": null}, \"python_function\": {\"pickle_module_name\": \"mlflow.pytorch.pickle_module\", \"loader_module\": \"mlflow.pytorch\", \"python_version\": \"3.8.10\", \"data\": \"data\", \"env\": {\"conda\": \"conda.yaml\", \"virtualenv\": \"python_env.yaml\"}}}, \"model_uuid\": \"9454b0054c8d4c0389db6602d1f3d837\", \"mlflow_version\": \"2.4.1\"}, {\"run_id\": \"99c05a5a02514602ab518b369eb5893a\", \"artifact_path\": \"model_abcd\", \"utc_time_created\": \"2024-04-11 11:00:02.548966\", \"flavors\": {\"pytorch\": {\"model_data\": \"data\", \"pytorch_version\": \"2.0.1+cu117\", \"code\": null}, \"python_function\": {\"pickle_module_name\": \"mlflow.pytorch.pickle_module\", \"loader_module\": \"mlflow.pytorch\", \"python_version\": \"3.8.10\", \"data\": \"data\", \"env\": {\"conda\": \"conda.yaml\", \"virtualenv\": \"python_env.yaml\"}}}, \"model_uuid\": \"d736e22088af4dfc9e808365eca26910\", \"mlflow_version\": \"2.4.1\"}, {\"run_id\": \"99c05a5a02514602ab518b369eb5893a\", \"artifact_path\": \"model_abcd\", \"utc_time_created\": \"2024-04-11 11:02:25.823720\", \"flavors\": {\"pytorch\": {\"model_data\": \"data\", \"pytorch_version\": \"2.0.1+cu117\", \"code\": null}, \"python_function\": {\"pickle_module_name\": \"mlflow.pytorch.pickle_module\", \"loader_module\": \"mlflow.pytorch\", \"python_version\": \"3.8.10\", \"data\": \"data\", \"env\": {\"conda\": \"conda.yaml\", \"virtualenv\": \"python_env.yaml\"}}}, \"model_uuid\": \"205d35b4f0424325a015a2b9a61a2797\", \"mlflow_version\": \"2.4.1\"}, {\"run_id\": \"99c05a5a02514602ab518b369eb5893a\", \"artifact_path\": \"model_abcd\", \"utc_time_created\": \"2024-04-11 11:04:49.111657\", \"flavors\": {\"pytorch\": {\"model_data\": \"data\", \"pytorch_version\": \"2.0.1+cu117\", \"code\": null}, \"python_function\": {\"pickle_module_name\": \"mlflow.pytorch.pickle_module\", \"loader_module\": \"mlflow.pytorch\", \"python_version\": \"3.8.10\", \"data\": \"data\", \"env\": {\"conda\": \"conda.yaml\", 
\"virtualenv\": \"python_env.yaml\"}}}, \"model_uuid\": \"96e6702de31d4dd08d0bc0aac282fa09\", \"mlflow_version\": \"2.4.1\"}, {\"run_id\": \"99c05a5a02514602ab518b369eb5893a\", \"artifact_path\": \"model_abcd\", \"utc_time_created\": \"2024-04-11 11:09:35.769272\", \"flavors\": {\"pytorch\": {\"model_data\": \"data\", \"pytorch_version\": \"2.0.1+cu117\", \"code\": null}, \"python_function\": {\"pickle_module_name\": \"mlflow.pytorch.pickle_module\", \"loader_module\": \"mlflow.pytorch\", \"python_version\": \"3.8.10\", \"data\": \"data\", \"env\": {\"conda\": \"conda.yaml\", \"virtualenv\": \"python_env.yaml\"}}}, \"model_uuid\": \"9417e00fca8444239cbd8bd4b847c699\", \"mlflow_version\": \"2.4.1\"}, {\"run_id\": \"99c05a5a02514602ab518b369eb5893a\", \"artifact_path\": \"model_abcd\", \"utc_time_created\": \"2024-04-11 11:11:59.223981\", \"flavors\": {\"pytorch\": {\"model_data\": \"data\", \"pytorch_version\": \"2.0.1+cu117\", \"code\": null}, \"python_function\": {\"pickle_module_name\": \"mlflow.pytorch.pickle_module\", \"loader_module\": \"mlflow.pytorch\", \"python_version\": \"3.8.10\", \"data\": \"data\", \"env\": {\"conda\": \"conda.yaml\", \"virtualenv\": \"python_env.yaml\"}}}, \"model_uuid\": \"7a73c28cbab14560bbe03e0cdddf3984\", \"mlflow_version\": \"2.4.1\"}]"
          },
          ...
]

Tracking information

Code to reproduce issue

Stack trace

SyntaxError: Unterminated string in JSON at position 5000 (line 1 column 5001)
    at JSON.parse (<anonymous>)
    at E.getLoggedModelsFromTags (Utils.tsx:945:27)
    at experimentPage.row-utils.ts:339:27
    at Array.map (<anonymous>)
    at m (experimentPage.row-utils.ts:277:39)
    at experimentPage.row-utils.ts:437:7
    at Object.Xa [as useMemo] (react-dom.production.min.js:179:119)
    at t.useMemo (react.production.min.js:25:208)
    at g (experimentPage.row-utils.ts:435:10)
    at ExperimentViewRuns.tsx:233:31

Other info / logs


What component(s) does this bug affect?

  • area/artifacts: Artifact stores and artifact logging
  • area/build: Build and test infrastructure for MLflow
  • area/deployments: MLflow Deployments client APIs, server, and third-party Deployments integrations
  • area/docs: MLflow documentation pages
  • area/examples: Example code
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/recipes: Recipes, Recipe APIs, Recipe configs, Recipe Templates
  • area/projects: MLproject format, project running backends
  • area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • area/server-infra: MLflow Tracking server backend
  • area/tracking: Tracking Service, tracking client APIs, autologging

What interface(s) does this bug affect?

  • area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
  • area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • area/windows: Windows support

What language(s) does this bug affect?

  • language/r: R APIs and clients
  • language/java: Java APIs and clients
  • language/new: Proposals for new client languages

What integration(s) does this bug affect?

  • integrations/azure: Azure and Azure ML integrations
  • integrations/sagemaker: SageMaker integrations
  • integrations/databricks: Databricks integrations

Piotr45 added the bug label May 17, 2024
github-actions bot added the area/tracking and area/uiux labels May 17, 2024

@serena-ruan
Collaborator

@daniellok-db Are you aware of this issue? Should the latest UI sync fix this problem?
I checked the codebase: the logic that updates the "mlflow.log-model.history" tag is very old and we haven't changed it.

@daniellok-db
Collaborator

Ah, I think what's happening here is that we're trying to store the JSON-serialized history in this tag value, but tag values have a max length of 5000 characters:

value = Column(String(5000), nullable=True)

Therefore the JSON gets truncated and becomes invalid. I think as a quick fix we can catch the error and treat it as though there is no tag. It's not ideal but probably better than crashing the UI. Unfortunately I don't have a ton of context about this feature, so I'm not sure what a better solution would look like here.
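To illustrate the failure and the proposed quick fix, here is a minimal sketch in Python. The real parsing happens in the UI's TypeScript code (`getLoggedModelsFromTags` in the stack trace), so this only shows the idea. The function name `parse_logged_models` and the sample entries are made up for the example, and the truncation is simulated with a plain string slice; how a given database backend actually handles an over-long value may differ.

```python
import json

MAX_TAG_LENGTH = 5000  # mirrors the String(5000) column quoted above


def parse_logged_models(tag_value):
    """Hypothetical helper: return the parsed history, or [] if the tag is
    missing or does not contain a valid JSON list."""
    if not tag_value:
        return []
    try:
        parsed = json.loads(tag_value)
    except json.JSONDecodeError:
        # Truncated or otherwise corrupt value: behave as if there were no tag.
        return []
    return parsed if isinstance(parsed, list) else []


# Build a history that is longer than the column allows (made-up entries).
full = json.dumps(
    [{"artifact_path": "model_abcd", "note": "x" * 200} for _ in range(40)]
)
truncated = full[:MAX_TAG_LENGTH]  # simulated truncation

print(len(full) > MAX_TAG_LENGTH)           # the full value does not fit
print(len(parse_logged_models(full)))       # the untruncated value parses normally
print(parse_logged_models(truncated))       # the truncated value falls back to []
```

The trade-off is the one mentioned above: runs with a truncated tag would show no logged models in the UI instead of crashing the whole page.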

@serena-ruan
Collaborator

Thanks for the investigation! Let's discuss internally tomorrow about whether we need to increase the tag value size or hide such long tags.

@mlflow/mlflow-team Please assign a maintainer and start triaging this issue.
