link to define preprocess

cleanlab · Jun 20, 2024 · 4a32bd9
1 parent 05ed54c
Showing 1 changed file with 1 addition and 1 deletion.
diff --git a/docs/source/tutorials/improving_ml_performance.ipynb b/docs/source/tutorials/improving_ml_performance.ipynb
@@ -13,7 +13,7 @@
     "\n",
     "Here's how we recommend handling noisy training and test data (this tutorial walks through these steps):\n",
     "\n",
-    "1. Preprocess your training and test data. Use cleanlab to check for issues in the merged dataset like train/test leakage or drift.\n",
+    "1. [Preprocess](https://towardsdatascience.com/introduction-to-data-preprocessing-in-machine-learning-a9fa83a5dc9d) your training and test data to be suitable for ML. Use cleanlab to check for issues in the merged dataset like train/test leakage or drift.\n",
     "2. Fit your ML model to your noisy training data and get its predictions/embeddings for your test data. Use these model outputs with cleanlab to detect issues in your **test** data.\n",
     "3. Manually review/correct cleanlab-detected issues in your test data. To avoid bias, **we caution against automated correction of test data**. Test data changes should be individually verified to ensure they will lead to more accurate model evaluation. We also caution against comparing the performance of different ML models across different versions of your test data; performance comparisons between models should be based on the same test data.\n",
     "4. Cross-validate a new copy of your ML model on your training data, and then use it with cleanlab to detect issues in the **training** dataset. Do not include test data in any part of this step to avoid leaking test set information into the training data curation.\n",
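For readers of this diff, steps 2 and 4 of the list above can be sketched roughly as follows. This is a minimal illustration under assumptions, not the tutorial's code: the synthetic dataset, the `LogisticRegression` model, and the `suspect_idx` low-confidence heuristic are all hypothetical stand-ins. In the tutorial itself, the predicted probabilities would be handed to cleanlab (for example through `Datalab.find_issues(pred_probs=...)`) rather than ranked by hand.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split

# Hypothetical synthetic data standing in for a real (already preprocessed) dataset.
X, y = make_classification(n_samples=400, n_features=10, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Simulate label noise by flipping a few training labels (assumption for illustration).
rng = np.random.default_rng(0)
flipped = rng.choice(len(y_train), size=15, replace=False)
y_train_noisy = y_train.copy()
y_train_noisy[flipped] = 1 - y_train_noisy[flipped]

# Step 2: fit on the noisy training data and get predicted probabilities for the test data.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train_noisy)
test_pred_probs = model.predict_proba(X_test)

# Step 4: cross-validate a fresh copy of the model on the training data only,
# so every training example gets an out-of-sample prediction and no test data is involved.
train_pred_probs = cross_val_predict(
    LogisticRegression(max_iter=1000), X_train, y_train_noisy, cv=5, method="predict_proba"
)

# Simple stand-in for issue detection (NOT cleanlab's algorithm): rank training examples
# by the predicted probability of their given label and take the least confident ones.
self_confidence = train_pred_probs[np.arange(len(y_train_noisy)), y_train_noisy]
suspect_idx = np.argsort(self_confidence)[:15]
```

The key design point is that `train_pred_probs` come from held-out folds of the training set, which keeps test set information out of the training data curation, as step 4 requires.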