#### About

> Cross Validation

Cross-validation is a technique used in machine learning to estimate how well a model will perform on unseen data. In k-fold cross-validation, the available data is divided into k subsets (called "folds"). The model is trained on k-1 of the folds and evaluated on the one remaining fold. This process is repeated k times, so that each fold serves as the test set exactly once, and the k scores are then averaged.
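To make the fold rotation concrete, here is a minimal sketch of the splitting step using only NumPy. The helper name `k_fold_indices` is hypothetical and is not part of scikit-learn. It uses contiguous, unshuffled folds for readability, which differs from the shuffled `KFold` used in this notebook.

```python
import numpy as np

def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous, unshuffled folds."""
    indices = np.arange(n_samples)
    folds = np.array_split(indices, k)  # k nearly equal chunks of indices
    for i in range(k):
        test_idx = folds[i]  # fold i is held out for testing
        # every other fold is used for training
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, test_idx

splits = list(k_fold_indices(10, 5))
for train_idx, test_idx in splits:
    print("train:", train_idx, "test:", test_idx)
```

Each sample index appears in exactly one test fold, which is why every data point is used for evaluation once.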



In [7]:
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris
from sklearn.metrics import precision_recall_fscore_support

In [2]:
iris = load_iris()
X = iris.data
y = iris.target


In [3]:
model = LogisticRegression()


In [4]:
k = 5
kf = KFold(n_splits=k, shuffle=True, random_state=42)
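`KFold` splits by position only and does not look at the labels, so the class mix can differ from fold to fold. The sketch below counts the classes in each test fold and, purely for comparison, does the same with `StratifiedKFold`. `StratifiedKFold` is not used elsewhere in this notebook; it is shown here only as an assumption about what one might try for classification data.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, StratifiedKFold

X, y = load_iris(return_X_y=True)

kf = KFold(n_splits=5, shuffle=True, random_state=42)
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Class counts in each test fold: KFold ignores labels, StratifiedKFold balances them
kfold_counts = [np.bincount(y[test_idx], minlength=3) for _, test_idx in kf.split(X)]
strat_counts = [np.bincount(y[test_idx], minlength=3) for _, test_idx in skf.split(X, y)]

for i, (plain, strat) in enumerate(zip(kfold_counts, strat_counts), start=1):
    print(f"fold {i}: KFold {plain}  StratifiedKFold {strat}")
```

With 150 samples and 5 folds, every test fold holds 30 samples in both cases. Only the stratified split keeps the same number of samples per class in each fold.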

In [5]:
accuracy_scores = []
precision_scores = []
recall_scores = []
f1_scores = []

In [8]:
# Perform k-fold cross-validation
for train_index, test_index in kf.split(X):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    
    # Train the model on the training data
    model.fit(X_train, y_train)
    
    # Evaluate the model on the test data
    accuracy = model.score(X_test, y_test)
    precision, recall, f1, _ = precision_recall_fscore_support(y_test, model.predict(X_test), average='weighted')
    
    # Append the evaluation scores to the lists
    accuracy_scores.append(accuracy)
    precision_scores.append(precision)
    recall_scores.append(recall)
    f1_scores.append(f1)


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
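The convergence warning above suggests two remedies: scale the data or raise `max_iter`. A minimal sketch applying both is shown below, assuming a `StandardScaler` placed in a pipeline so that the scaling is fitted on the training folds only. The name `scaled_model` and the value `max_iter=1000` are illustrative choices, not settings taken from this notebook, and the resulting scores may differ from the ones reported here.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Scaling happens inside the pipeline, so each fold's scaler sees training data only
scaled_model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

kf = KFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(scaled_model, X, y, cv=kf)  # accuracy by default for classifiers

print("Per-fold accuracy:", scores)
print("Mean accuracy:", scores.mean())
```

Keeping the scaler inside the pipeline avoids leaking test-fold statistics into training, which would happen if the whole dataset were scaled before splitting.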


In [9]:
# Compute the average evaluation scores
avg_accuracy = np.mean(accuracy_scores)
avg_precision = np.mean(precision_scores)
avg_recall = np.mean(recall_scores)
avg_f1 = np.mean(f1_scores)

In [10]:
# Print the average evaluation scores
print("Average Accuracy: ", avg_accuracy)
print("Average Precision: ", avg_precision)
print("Average Recall: ", avg_recall)
print("Average F1-score: ", avg_f1)

Average Accuracy:  0.9733333333333334
Average Precision:  0.9770683760683762
Average Recall:  0.9733333333333334
Average F1-score:  0.9732912280701754
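The manual loop and averaging above can also be written more compactly with scikit-learn's `cross_validate`, which accepts several scorer names at once. The sketch below is one possible equivalent, not the notebook's own code. It sets `max_iter=1000` (an assumption, to avoid the convergence warning) and reports the standard deviation across folds alongside the mean, so its numbers are not guaranteed to match the output above.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_validate

X, y = load_iris(return_X_y=True)
kf = KFold(n_splits=5, shuffle=True, random_state=42)

# Built-in scorer names; the "_weighted" variants match average='weighted'
metrics = ["accuracy", "precision_weighted", "recall_weighted", "f1_weighted"]

results = cross_validate(LogisticRegression(max_iter=1000), X, y, cv=kf, scoring=metrics)

summary = {}
for name in metrics:
    fold_scores = results[f"test_{name}"]  # one score per fold
    summary[name] = (np.mean(fold_scores), np.std(fold_scores))
    print(f"{name}: {summary[name][0]:.4f} +/- {summary[name][1]:.4f}")
```

Reporting the spread as well as the mean gives a rough sense of how much the estimate depends on the particular split.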
