[new feature] capture information from training dynamics #974

Open
jwmueller opened this issue Jan 31, 2024 · 0 comments
Labels
enhancement (New feature or request), help-wanted (We need your help to add this, but it may be more challenging than a "good first issue")

Comments

@jwmueller
Member

jwmueller commented Jan 31, 2024

Goal: create a new module (for now, say: cleanlab.experimental.training_dynamics) that lets users provide model outputs/info at every iteration (i.e. checkpoint) of an iteratively trained model (e.g. a neural network).

Useful things to record at each checkpoint include:

  • prediction loss for each training datapoint
  • predicted probabilities for each training datapoint

The code to capture these should be a simple logger that users can easily integrate with arbitrary iterative ML models (Hugging Face, PyTorch, JAX, Keras, XGBoost, ...).
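To make the idea concrete, here is a rough sketch of what such a logger might look like. Everything in it is an assumption for discussion: the class name `TrainingDynamicsLogger`, its `log` method, and the array shapes are hypothetical and not part of any existing cleanlab API. It only assumes the user can convert per-example values to NumPy arrays at each checkpoint.

```python
import numpy as np


class TrainingDynamicsLogger:
    """Hypothetical framework-agnostic logger (a sketch, not an existing cleanlab class)."""

    def __init__(self):
        self._losses = []      # one array of shape (n_examples,) per checkpoint
        self._pred_probs = []  # one array of shape (n_examples, n_classes) per checkpoint

    def log(self, losses, pred_probs):
        """Record per-example losses and predicted probabilities for one checkpoint."""
        # np.array copies its input, so later in-place changes by the caller are not seen here.
        losses = np.array(losses, dtype=float)
        pred_probs = np.array(pred_probs, dtype=float)
        if losses.ndim != 1:
            raise ValueError("losses must be 1-D: one value per training datapoint")
        if pred_probs.ndim != 2 or pred_probs.shape[0] != losses.shape[0]:
            raise ValueError("pred_probs must have shape (n_examples, n_classes)")
        if self._pred_probs and pred_probs.shape != self._pred_probs[0].shape:
            raise ValueError("shapes must be the same at every checkpoint")
        self._losses.append(losses)
        self._pred_probs.append(pred_probs)

    @property
    def num_checkpoints(self):
        return len(self._losses)

    def losses(self):
        """Return an array of shape (n_checkpoints, n_examples)."""
        return np.stack(self._losses)

    def pred_probs(self):
        """Return an array of shape (n_checkpoints, n_examples, n_classes)."""
        return np.stack(self._pred_probs)
```

A user would call `logger.log(...)` once per epoch or checkpoint, after converting their framework's outputs to NumPy and putting the examples in a fixed order. Keeping the logger free of framework imports is one way to stay compatible with arbitrary training loops, though callbacks for specific frameworks could be layered on top.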

Once these values are captured, they can be used in various functions, such as ones implementing the methods from these papers:

TRIAGE: Characterizing and auditing training data for improved regression

Learning from Training Dynamics: Identifying Mislabeled Data beyond Manually Designed Features

Identifying Mislabeled Data using the Area Under the Margin Ranking

Dataset Cartography: Mapping and Diagnosing Datasets with Training Dynamics
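As an illustration of the kind of function that could consume the captured values, below is a minimal sketch that computes the per-example confidence, variability, and correctness statistics described in the Dataset Cartography paper, given predicted probabilities stacked across checkpoints. The function name `cartography_stats` and the input layout are hypothetical. The `mean_prob_margin` output is only a probability-based stand-in for the Area Under the Margin statistic: the AUM paper defines margins on logits, which are not among the values listed above, so this is an approximation rather than that paper's method.

```python
import numpy as np


def cartography_stats(pred_probs, labels):
    """Sketch of training-dynamics statistics (hypothetical helper, not a cleanlab API).

    pred_probs: array of shape (n_checkpoints, n_examples, n_classes)
    labels: integer array of shape (n_examples,) with the given (possibly noisy) labels
    """
    pred_probs = np.asarray(pred_probs, dtype=float)
    labels = np.asarray(labels, dtype=int)
    if pred_probs.ndim != 3 or labels.shape != (pred_probs.shape[1],):
        raise ValueError("expected pred_probs of shape (T, N, K) and labels of shape (N,)")

    idx = np.arange(labels.shape[0])
    # Probability assigned to the given label at each checkpoint: shape (T, N).
    true_probs = pred_probs[:, idx, labels]

    # Largest probability among the other classes at each checkpoint: shape (T, N).
    others = pred_probs.copy()
    others[:, idx, labels] = -np.inf
    margins = true_probs - others.max(axis=2)

    return {
        # Mean probability of the given label across checkpoints.
        "confidence": true_probs.mean(axis=0),
        # Standard deviation of that probability across checkpoints.
        "variability": true_probs.std(axis=0),
        # Fraction of checkpoints at which the prediction matches the given label.
        "correctness": (pred_probs.argmax(axis=2) == labels).mean(axis=0),
        # Probability-space analogue of the margin; NOT the logit-based AUM.
        "mean_prob_margin": margins.mean(axis=0),
    }
```

Examples with low confidence and low variability are the ones the Dataset Cartography paper describes as hard to learn, and these are natural candidates to review for label errors.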

@jwmueller jwmueller added enhancement New feature or request help-wanted We need your help to add this, but it may be more challenging than a "good first issue" labels Jan 31, 2024