Commit

test data changes confusion
jwmueller committed Jun 20, 2024
1 parent cf8408c commit 8b146f7
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion docs/source/tutorials/improving_ml_performance.ipynb
@@ -15,7 +15,7 @@
"\n",
"1. [Preprocess](https://towardsdatascience.com/introduction-to-data-preprocessing-in-machine-learning-a9fa83a5dc9d) your training and test data to be suitable for ML. Use cleanlab to check for issues in the merged dataset like train/test leakage or drift.\n",
"2. Fit your ML model to your noisy training data and get its predictions/embeddings for your test data. Use these model outputs with cleanlab to detect issues in your **test** data.\n",
-"3. Manually review/correct cleanlab-detected issues in your test data. **We caution against blindly automated correction of test data**. Test data changes should be carefully verified to ensure they will lead to more accurate model evaluation. We also caution against comparing the performance of different ML models across different versions of your test data; performance comparions between models should be based on the same test data.\n",
+"3. Manually review/correct cleanlab-detected issues in your test data. **We caution against blindly automated correction of test data**. Changes to your test set should be carefully verified to ensure they will lead to more accurate model evaluation. We also caution against comparing the performance of different ML models across different versions of your test data; performance comparions between models should be based on the same test data.\n",
"4. Cross-validate a new copy of your ML model on your training data, and then use it with cleanlab to detect issues in the **training** dataset. Do not include test data in any part of this step to avoid leaking test set information into the training data curation.\n",
"5. You can try **automated techniques** to curate your training data based on cleanlab results, train models on the curated training data, and evaluate them on the cleaned test data.\n",
"\n",

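The caution in step 3 of the hunk above can be illustrated with a minimal sketch. It is plain Python and does not call cleanlab. The helper names (`apply_verified_corrections`, `accuracy`), the `verified` flag, and the toy labels are all assumptions made for illustration, not part of the tutorial or the library. The sketch applies only human-verified label changes to the test set, then scores two models against that one corrected version.

```python
def apply_verified_corrections(labels, corrections):
    """Return a copy of `labels` with only human-verified corrections applied.

    `corrections` is a hypothetical mapping: index -> (proposed_label, verified).
    Proposals that nobody has verified are ignored rather than applied blindly.
    """
    cleaned = list(labels)  # copy, so the original test labels stay untouched
    for idx, (proposed_label, verified) in corrections.items():
        if verified:
            cleaned[idx] = proposed_label
    return cleaned


def accuracy(predictions, labels):
    """Fraction of predictions that match the labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


# Toy test labels and proposed fixes (illustrative values only).
original_test_labels = [0, 1, 1, 0, 1]
proposed = {
    1: (0, True),   # a reviewer confirmed this label was wrong
    3: (1, False),  # flagged automatically but not yet reviewed: left as-is
}
cleaned_test_labels = apply_verified_corrections(original_test_labels, proposed)

# Both models are scored on the SAME version of the test labels.
model_a_preds = [0, 0, 1, 0, 1]
model_b_preds = [0, 1, 1, 1, 1]
acc_a = accuracy(model_a_preds, cleaned_test_labels)
acc_b = accuracy(model_b_preds, cleaned_test_labels)
print(f"model A: {acc_a:.2f}, model B: {acc_b:.2f}")
```

Scoring model A on the original labels and model B on the cleaned labels would mix two test set versions, which is the comparison the tutorial text warns against.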