Commit

link to define preprocess
jwmueller committed Jun 20, 2024
1 parent 05ed54c commit 4a32bd9
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion docs/source/tutorials/improving_ml_performance.ipynb
@@ -13,7 +13,7 @@
"\n",
"Here's how we recommend handling noisy training and test data (this tutorial walks through these steps):\n",
"\n",
-"1. Preprocess your training and test data. Use cleanlab to check for issues in the merged dataset like train/test leakage or drift.\n",
+"1. [Preprocess](https://towardsdatascience.com/introduction-to-data-preprocessing-in-machine-learning-a9fa83a5dc9d) your training and test data to be suitable for ML. Use cleanlab to check for issues in the merged dataset like train/test leakage or drift.\n",
"2. Fit your ML model to your noisy training data and get its predictions/embeddings for your test data. Use these model outputs with cleanlab to detect issues in your **test** data.\n",
"3. Manually review/correct cleanlab-detected issues in your test data. To avoid bias, **we caution against automated correction of test data**. Test data changes should be individually verified to ensure they will lead to more accurate model evaluation. We also caution against comparing the performance of different ML models across different versions of your test data; performance comparions between models should be based on the same test data.\n",
"4. Cross-validate a new copy of your ML model on your training data, and then use it with cleanlab to detect issues in the **training** dataset. Do not include test data in any part of this step to avoid leaking test set information into the training data curation.\n",
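
Step 1 in the hunk above mentions checking the merged dataset for train/test leakage. As a minimal sketch of what exact-duplicate leakage looks like, assuming each example is a plain dict of feature values, the standard-library Python below flags test rows that also appear in the training set. This is not cleanlab's implementation, which the tutorial uses and which covers more than exact matches; `row_key` and `find_exact_leakage` are hypothetical helper names introduced only for illustration.

```python
import hashlib
import json


def row_key(row):
    """Stable fingerprint of one example (a dict of feature name -> value)."""
    # sort_keys makes the fingerprint independent of dict insertion order.
    payload = json.dumps(row, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def find_exact_leakage(train_rows, test_rows):
    """Return (train_index, test_index) pairs whose features are identical."""
    seen = {}
    for i, row in enumerate(train_rows):
        seen.setdefault(row_key(row), []).append(i)

    pairs = []
    for j, row in enumerate(test_rows):
        for i in seen.get(row_key(row), []):
            pairs.append((i, j))
    return pairs


# Toy data (made up for this sketch): the second training row reappears in test.
train = [
    {"text": "great product", "len": 13},
    {"text": "broke in a week", "len": 15},
]
test = [
    {"len": 15, "text": "broke in a week"},
    {"text": "works fine", "len": 10},
]

leaks = find_exact_leakage(train, test)
print(leaks)
```

Any pair reported here is a test example that the model has already seen during training, so it would inflate the evaluation score. Near duplicates and distribution drift need a similarity- or model-based check rather than exact hashing.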
