[new feature] capture information from training dynamics #974
Labels: enhancement, help-wanted
Goal: create a new module (for now, say `cleanlab.experimental.training_dynamics`) that allows users to provide model outputs/info at every iteration (aka checkpoint) of an iteratively trained model (e.g. a neural network). Useful things to record at each checkpoint include:
The code to capture these should be a simple logger that users can easily integrate with arbitrary iteratively trained ML models (Hugging Face, PyTorch, JAX, Keras, XGBoost, ...).
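As a rough illustration of how framework-agnostic such a logger could be, here is a minimal sketch that only depends on NumPy. The class name `TrainingDynamicsLogger` and its methods are hypothetical (not an existing cleanlab API), and it assumes the user calls `log(...)` once per checkpoint with the model's outputs (e.g. logits) for every training example, always in the same example order.

```python
import numpy as np


class TrainingDynamicsLogger:
    """Hypothetical logger of per-example model outputs across checkpoints.

    Framework-agnostic: anything convertible to a NumPy array of shape
    (n_examples, n_outputs) can be logged, whether it comes from PyTorch,
    JAX, Keras, XGBoost, etc.
    """

    def __init__(self):
        # One (n_examples, n_outputs) array per logged checkpoint.
        self._outputs = []

    def log(self, outputs):
        """Record the model outputs for all training examples at one checkpoint."""
        outputs = np.asarray(outputs, dtype=float)
        # All checkpoints must cover the same examples with the same output size.
        if self._outputs and outputs.shape != self._outputs[0].shape:
            raise ValueError(
                f"expected shape {self._outputs[0].shape}, got {outputs.shape}"
            )
        # Copy so later in-place changes by the caller don't alter the log.
        self._outputs.append(outputs.copy())

    @property
    def num_checkpoints(self):
        return len(self._outputs)

    def stacked(self):
        """Return all logged outputs as one (n_checkpoints, n_examples, n_outputs) array."""
        if not self._outputs:
            raise ValueError("no checkpoints logged yet")
        return np.stack(self._outputs, axis=0)


# Example: simulate 3 checkpoints of logits for 4 examples and 3 classes.
rng = np.random.default_rng(0)
logger = TrainingDynamicsLogger()
for epoch in range(3):
    # In real use, replace this with the model's outputs on the training set.
    logger.log(rng.normal(size=(4, 3)))
print(logger.stacked().shape)  # (3, 4, 3)
```

In a real training loop, the only integration point would be a single `logger.log(...)` call at the end of each epoch (or every k steps), after converting the framework's tensors to NumPy. Keeping the logger ignorant of the model is what lets it work with arbitrary libraries.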
Once these values are captured, they can be passed to various functions, such as those implementing the methods from these papers:
- TRIAGE: Characterizing and auditing training data for improved regression
- Learning from Training Dynamics: Identifying Mislabeled Data beyond Manually Designed Features
- Identifying Mislabeled Data using the Area Under the Margin Ranking
- Dataset Cartography: Mapping and Diagnosing Datasets with Training Dynamics
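To make concrete what downstream functions could compute from the recorded values, here is a minimal sketch of two of the scores above for classification, assuming the captured data is an array of logits of shape `(n_checkpoints, n_examples, n_classes)` plus the given labels. The function names are hypothetical, not cleanlab APIs. `area_under_margin` follows the Area Under the Margin idea (average over checkpoints of the assigned-label logit minus the largest other logit); the paper's threshold-sample procedure for choosing a cutoff is not included. `cartography_stats` computes the Dataset Cartography quantities: mean and standard deviation over checkpoints of the predicted probability of the given label, plus the fraction of checkpoints where the prediction is correct.

```python
import numpy as np


def softmax(logits):
    # Subtract the per-row max before exponentiating for numerical stability.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def area_under_margin(logits, labels):
    """Per-example AUM: mean over checkpoints of (assigned logit - largest other logit).

    logits: (n_checkpoints, n_examples, n_classes); labels: (n_examples,).
    Low or negative values suggest a possibly mislabeled example.
    """
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels, dtype=int)
    idx = np.arange(labels.shape[0])
    assigned = logits[:, idx, labels]  # (n_checkpoints, n_examples)
    # Mask out the assigned class, then take the max over the remaining classes.
    others = logits.copy()
    others[:, idx, labels] = -np.inf
    largest_other = others.max(axis=-1)  # (n_checkpoints, n_examples)
    return (assigned - largest_other).mean(axis=0)


def cartography_stats(logits, labels):
    """Per-example Dataset Cartography confidence, variability and correctness."""
    labels = np.asarray(labels, dtype=int)
    probs = softmax(np.asarray(logits, dtype=float))
    idx = np.arange(labels.shape[0])
    p_label = probs[:, idx, labels]  # prob. of the given label, (n_checkpoints, n_examples)
    confidence = p_label.mean(axis=0)
    variability = p_label.std(axis=0)  # population std across checkpoints
    correctness = (probs.argmax(axis=-1) == labels).mean(axis=0)
    return confidence, variability, correctness


# Toy example: 2 checkpoints, 2 examples, 2 classes, both labeled as class 0.
# Example 0 is consistently predicted as class 0; example 1 as class 1.
demo_logits = np.array([
    [[2.0, 0.0], [0.0, 1.0]],  # checkpoint 0
    [[3.0, 0.0], [0.0, 2.0]],  # checkpoint 1
])
demo_labels = np.array([0, 0])

print(area_under_margin(demo_logits, demo_labels).tolist())  # [2.5, -1.5]
confidence, variability, correctness = cartography_stats(demo_logits, demo_labels)
print(correctness.tolist())  # [1.0, 0.0]
```

Both scores only need the per-checkpoint outputs and the given labels, which is why a single shared logging format could serve several of these methods. The regression-oriented TRIAGE method and the learned approach in "Learning from Training Dynamics" would need additional pieces beyond this sketch.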