[new feature] capture information from training dynamics #974

jwmueller · 2024-01-31T05:13:52Z

Goal: create a new module (tentatively cleanlab.experimental.training_dynamics) that lets users provide model outputs and other information at every iteration (i.e. checkpoint) of an iteratively trained model (e.g. a neural network).

Useful things to record at each checkpoint include:

prediction loss for each training datapoint
predicted probabilities for each training datapoint

The code to capture these should be a simple logger that users can easily integrate with arbitrary iterative ML models (Hugging Face, PyTorch, JAX, Keras, XGBoost, ...).
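To make the idea concrete, here is a rough sketch of what such a logger could look like. Everything in it is hypothetical: the class name `TrainingDynamicsLogger` and its `log` / `stacked` methods are placeholders for illustration, not an existing cleanlab API. It only assumes the user can produce NumPy-convertible arrays of per-example losses and predicted probabilities at each checkpoint, which keeps it independent of any particular framework.

```python
import numpy as np


class TrainingDynamicsLogger:
    """Hypothetical framework-agnostic logger (illustrative only, not a cleanlab API)."""

    def __init__(self):
        self._losses = []      # one (N,) array per checkpoint
        self._pred_probs = []  # one (N, K) array per checkpoint

    def log(self, losses, pred_probs):
        """Record per-example losses (N,) and predicted probabilities (N, K) for one checkpoint."""
        # np.array copies its input, so later in-place changes by the caller do not leak in.
        losses = np.array(losses, dtype=float)
        pred_probs = np.array(pred_probs, dtype=float)
        if losses.ndim != 1:
            raise ValueError("losses must be 1-D: one value per training datapoint")
        if pred_probs.ndim != 2 or pred_probs.shape[0] != losses.shape[0]:
            raise ValueError("pred_probs must have shape (N, K) with N == len(losses)")
        if self._pred_probs and pred_probs.shape != self._pred_probs[0].shape:
            raise ValueError("every checkpoint must cover the same datapoints and classes")
        self._losses.append(losses)
        self._pred_probs.append(pred_probs)

    @property
    def num_checkpoints(self):
        return len(self._losses)

    def stacked(self):
        """Return losses with shape (T, N) and pred_probs with shape (T, N, K)."""
        if not self._losses:
            raise ValueError("no checkpoints have been logged yet")
        return np.stack(self._losses), np.stack(self._pred_probs)
```

In a training loop, the user would call `logger.log(losses, pred_probs)` once per epoch (or per checkpoint), after evaluating the model on the training set in a fixed datapoint order. Converting framework tensors to NumPy is left to the caller in this sketch.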

Once these values are captured, they can be used in various functions, such as ones implementing the methods from these papers:

TRIAGE: Characterizing and auditing training data for improved regression

Learning from Training Dynamics: Identifying Mislabeled Data beyond Manually Designed Features

Identifying Mislabeled Data using the Area Under the Margin Ranking

Dataset Cartography: Mapping and Diagnosing Datasets with Training Dynamics
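As an example of a downstream function, below is a minimal sketch of the two main statistics from the Dataset Cartography paper: confidence (mean predicted probability of the given label across checkpoints) and variability (standard deviation of that probability across checkpoints). The function name `cartography_stats` and its argument layout are assumptions for illustration; it expects predicted probabilities stacked into a `(T, N, K)` array (checkpoints, datapoints, classes) and integer labels of shape `(N,)`.

```python
import numpy as np


def cartography_stats(pred_probs, labels):
    """Hypothetical helper: per-example confidence and variability over checkpoints.

    pred_probs: array of shape (T, N, K), predicted probabilities at each checkpoint.
    labels: integer array of shape (N,), the given (possibly noisy) label per datapoint.
    """
    pred_probs = np.asarray(pred_probs, dtype=float)
    labels = np.asarray(labels)
    if pred_probs.ndim != 3:
        raise ValueError("pred_probs must have shape (T, N, K)")
    num_examples = pred_probs.shape[1]
    if labels.shape != (num_examples,):
        raise ValueError("labels must have shape (N,)")

    # Probability assigned to the given label at every checkpoint: shape (T, N).
    given_label_probs = pred_probs[:, np.arange(num_examples), labels]

    confidence = given_label_probs.mean(axis=0)  # mean across checkpoints
    variability = given_label_probs.std(axis=0)  # population std across checkpoints
    return confidence, variability
```

Datapoints with low confidence and low variability are the ones that paper describes as hard to learn, and those are natural candidates for label-error review. Methods like AUM need logits rather than probabilities, so the logger may need to capture more than the two quantities listed above.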


jwmueller added the labels enhancement (New feature or request) and help-wanted (We need your help to add this, but it may be more challenging than a "good first issue") on Jan 31, 2024
