[FR] Improve UI stability to corrupt metric files #12030

Closed
1 of 22 tasks
joncarter1 opened this issue May 17, 2024 · 2 comments
Labels
area/uiux Front-end, user experience, plotting, JavaScript, JavaScript dev server enhancement New feature or request

Comments

@joncarter1
Contributor

Willingness to contribute

No. I cannot contribute this feature at this time.

Proposal Summary

When using a file-system backend store, OS-level issues can sometimes cause corrupt metric files.
This has previously been reported in issues such as:
#3052

This issue can take down the UI for an entire experiment section, with an error message in the UI such as:
[image: screenshot of the UI error message]

It would be very helpful if these sorts of issues could be handled gracefully in the UI, i.e. without crashing the whole page.

Motivation

What is the use case for this feature?

Avoid crashing the whole UI when one or more file-based metrics are corrupted.

It can be difficult to track down the exact run(s)/file(s) that lead to the issue, and I've found myself building custom scripts to scan for, and fix, corrupt files whenever this occurs.
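
For illustration, a scan script of the kind described above might look roughly like the following. This is a minimal sketch, not MLflow code: it assumes the file store lays metrics out as `<store_root>/<experiment_id>/<run_id>/metrics/<metric_key>` with one `timestamp value [step]` record per line, and the helper names `is_valid_metric_line` and `find_corrupt_metric_files` are hypothetical.

```python
from pathlib import Path


def is_valid_metric_line(line: str) -> bool:
    """Return True if the line looks like 'timestamp value [step]'."""
    fields = line.strip().split(" ")
    if len(fields) not in (2, 3):
        return False
    try:
        int(fields[0])        # timestamp (assumed to be an integer)
        float(fields[1])      # metric value
        if len(fields) == 3:
            int(fields[2])    # step
    except ValueError:
        return False
    return True


def find_corrupt_metric_files(store_root: str) -> dict[str, list[int]]:
    """Map each suspicious metric file to the 1-based numbers of its bad lines."""
    corrupt = {}
    # Assumed layout: <store_root>/<experiment_id>/<run_id>/metrics/<metric_key>
    for path in sorted(Path(store_root).glob("*/*/metrics/**/*")):
        if not path.is_file():
            continue
        # errors="replace" keeps the scan going when a file holds invalid bytes.
        lines = path.read_text(errors="replace").splitlines()
        bad = [n for n, line in enumerate(lines, start=1)
               if not is_valid_metric_line(line)]
        if bad:
            corrupt[str(path)] = bad
    return corrupt
```

Calling `find_corrupt_metric_files("mlruns")` would return the offending paths together with the line numbers to inspect. Blank lines are treated as corrupt in this sketch, and it only reports problems; whether to delete or repair the bad lines is left to the user.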

Why is this use case valuable to support for MLflow users in general?

This has previously been reported as an issue by many members of the community:
#3052
#7932
The current alternative is to manually scan the file-based store for corrupt files and fix them.

Details

This could also help to identify the problematic runs/metrics, e.g. by skipping only the corrupt metrics for specific runs when plotting in the UI, and flashing a message to indicate which ones failed.
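
One way to picture this behaviour is a loader that collects per-metric failures instead of raising on the first one. The sketch below is an assumption-laden illustration in plain Python, not how MLflow or its UI is implemented: `read_metric_history` and `load_run_metrics` are hypothetical names, and the `timestamp value [step]` line format and `metrics/` directory layout are assumed.

```python
from pathlib import Path


def read_metric_history(path: Path) -> list[tuple[int, float, int]]:
    """Parse one metric file into (timestamp, value, step) tuples.

    Raises ValueError if any line is malformed.
    """
    history = []
    lines = path.read_text(errors="replace").splitlines()
    for line_no, line in enumerate(lines, start=1):
        fields = line.split()
        if len(fields) not in (2, 3):
            raise ValueError(
                f"line {line_no}: expected 2 or 3 fields, got {len(fields)}"
            )
        # Assumption: a missing step field defaults to 0.
        step = int(fields[2]) if len(fields) == 3 else 0
        history.append((int(fields[0]), float(fields[1]), step))
    return history


def load_run_metrics(run_dir: Path):
    """Return (histories, failures) so one bad metric does not fail the whole run."""
    histories, failures = {}, {}
    metrics_dir = run_dir / "metrics"
    for path in sorted(p for p in metrics_dir.rglob("*") if p.is_file()):
        key = path.relative_to(metrics_dir).as_posix()
        try:
            histories[key] = read_metric_history(path)
        except ValueError as exc:
            # Record which metric failed and why, then keep going.
            failures[key] = str(exc)
    return histories, failures
```

A caller could plot everything in `histories` and surface `failures` as a warning that names the affected metrics, which is the kind of message the proposal asks for.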

What component(s) does this bug affect?

  • area/artifacts: Artifact stores and artifact logging
  • area/build: Build and test infrastructure for MLflow
  • area/deployments: MLflow Deployments client APIs, server, and third-party Deployments integrations
  • area/docs: MLflow documentation pages
  • area/examples: Example code
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/recipes: Recipes, Recipe APIs, Recipe configs, Recipe Templates
  • area/projects: MLproject format, project running backends
  • area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • area/server-infra: MLflow Tracking server backend
  • area/tracking: Tracking Service, tracking client APIs, autologging

What interface(s) does this bug affect?

  • area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
  • area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • area/windows: Windows support

What language(s) does this bug affect?

  • language/r: R APIs and clients
  • language/java: Java APIs and clients
  • language/new: Proposals for new client languages

What integration(s) does this bug affect?

  • integrations/azure: Azure and Azure ML integrations
  • integrations/sagemaker: SageMaker integrations
  • integrations/databricks: Databricks integrations
@joncarter1 joncarter1 added the enhancement New feature or request label May 17, 2024
@github-actions github-actions bot added the area/uiux Front-end, user experience, plotting, JavaScript, JavaScript dev server label May 17, 2024
@serena-ruan
Collaborator

Filed a PR to finish the work left over from the last PR; please feel free to take a look, @joncarter1

@serena-ruan
Collaborator

Closing the issue as the PR has been merged.
