dask · npk7 · Jun 8, 2024 · Jun 8, 2024 · Jun 9, 2024 · Jun 9, 2024
diff --git a/docs/source/cross_validation.rst b/docs/source/cross_validation.rst
@@ -24,6 +24,47 @@ The interface for splitting Dask arrays is the same as scikit-learn's version.
 
    X_train.compute()[:3]
 
+Here is another illustration of performing k-fold cross validation purely in Dask. For more information on k-fold cross validation, see :class:`dask_ml.model_selection.KFold`:
+
+.. ipython:: python
+
+   import dask.array as da
+   from dask_ml.model_selection import KFold
+   from dask_ml.datasets import make_regression
+   from dask_ml.linear_model import LinearRegression
+   from statistics import mean
+
+   X, y = make_regression(n_samples=200,   # number of observations
+                          n_features=5,    # number of features
+                          random_state=0,  # random seed
+                          chunks=20)       # rows per chunk
+
+   train_scores: list[float] = []
+   test_scores: list[float] = []
+
+   model = LinearRegression()
+
+The Dask ``KFold`` class splits the data into k consecutive folds. Here we set k to 5, giving 5-fold cross validation:
+
+.. ipython:: python
+
+   kf = KFold(n_splits=5)
+   for train_idx, test_idx in kf.split(X):
+      X_train, X_test = X[train_idx], X[test_idx]
+      y_train, y_test = y[train_idx], y[test_idx]
+
+      model.fit(X_train, y_train)
+
+      train_score = model.score(X_train, y_train)
+      test_score = model.score(X_test, y_test)
+
+      train_scores.append(train_score)
+      test_scores.append(test_score)
+
+   print("mean training score:", mean(train_scores))
+   print("mean testing score:", mean(test_scores))
+
+
 
 While it's possible to pass dask arrays to :func:`sklearn.model_selection.train_test_split`, we recommend
 using the Dask version for performance reasons: the Dask version is faster