# Cross-validation

Cross-validation is a widely used technique in machine learning and statistics for assessing how well a predictive model generalizes.

It estimates how the model will perform on unseen data, and because the estimate is averaged over several different splits, it is more robust than one obtained from a single train-test split.

The basic idea is to split the dataset into several subsets (folds), repeatedly train the model on all but one fold and test it on the held-out fold, and then aggregate the per-fold results into an overall performance estimate.
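To make the split, train, test, and aggregate steps described above concrete, here is a minimal sketch in plain Python. The helper names `k_fold_indices` and `cross_validate_mean_model` are hypothetical, and the "model" is only a stand-in that predicts the mean of the training targets; it is meant to show the mechanics, not to replace a library implementation.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the sample indices and deal them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, starting at position i.
    return [indices[i::k] for i in range(k)]

def cross_validate_mean_model(y, k, seed=0):
    """Return the per-fold MSE of a model that always predicts the training mean."""
    fold_mse = []
    for test_idx in k_fold_indices(len(y), k, seed):
        held_out = set(test_idx)
        # "Train" on every sample outside the current fold.
        train_y = [y[i] for i in range(len(y)) if i not in held_out]
        prediction = sum(train_y) / len(train_y)
        # "Test" on the held-out fold.
        errors = [(y[i] - prediction) ** 2 for i in test_idx]
        fold_mse.append(sum(errors) / len(errors))
    return fold_mse

# Toy targets, chosen only for illustration
y_toy = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
fold_mse = cross_validate_mean_model(y_toy, k=3)

# Aggregate the per-fold results into a single estimate
mean_mse = sum(fold_mse) / len(fold_mse)
print(fold_mse, mean_mse)
```

Every sample lands in exactly one test fold, so each observation is used for testing once and for training k - 1 times.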

Common variants include k-fold cross-validation and leave-one-out cross-validation, which is the special case of k-fold where k equals the number of samples.
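As a rough comparison of the two variants, the sketch below runs both on the same synthetic data using scikit-learn's `KFold` and `LeaveOneOut` splitters. The dataset size, seeds, and the `_demo` variable names are arbitrary choices made for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, LeaveOneOut, cross_val_score

# Synthetic data for illustration only (size and seed are arbitrary)
X_demo, y_demo = make_regression(n_samples=20, n_features=2, noise=1, random_state=0)
model_demo = LinearRegression()

# 5-fold: five fits, each tested on 4 held-out samples
kfold_mse = -cross_val_score(
    model_demo, X_demo, y_demo,
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
    scoring="neg_mean_squared_error",
)

# Leave-one-out: 20 fits, each tested on a single held-out sample
loo_mse = -cross_val_score(
    model_demo, X_demo, y_demo,
    cv=LeaveOneOut(),
    scoring="neg_mean_squared_error",
)

print(f"5-fold: {len(kfold_mse)} scores, mean MSE = {kfold_mse.mean():.4f}")
print(f"LOO:    {len(loo_mse)} scores, mean MSE = {loo_mse.mean():.4f}")
```

Leave-one-out fits the model once per sample, so it becomes expensive on large datasets; k-fold with a small k is the usual compromise between cost and the amount of training data per fit.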

In [14]:
from sklearn.model_selection import KFold, cross_val_score
from sklearn.linear_model import LinearRegression
from sklearn.datasets import make_regression

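# Generate a small synthetic regression dataset (replace with your own X and y).
# Note: with 5 samples and 5 folds, each test fold holds a single sample,
# so this run is equivalent to leave-one-out cross-validation.
X, y = make_regression(n_samples=5, n_features=2, noise=1, random_state=42)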

# Create a machine learning model (replace with your model)
model = LinearRegression()

# Specify the number of folds (e.g., 5-fold cross-validation)
num_folds = 5

# Create a KFold object to control the cross-validation process
kf = KFold(n_splits=num_folds, shuffle=True, random_state=42)

# Perform cross-validation and specify the scoring metric (e.g., mean squared error)
scores = cross_val_score(model, X, y, cv=kf, scoring='neg_mean_squared_error')

# 'neg_mean_squared_error' returns the negated MSE (scikit-learn scorers follow
# a "higher is better" convention), so flip the sign to recover the MSE per fold
mse_scores = -scores
mean_mse = mse_scores.mean()

for i, mse in enumerate(mse_scores):
    print(f"Fold {i} | Mean Squared Error (MSE): {mse:.4f}")
    
print(f"\nMean Squared Error (MSE) across {num_folds} folds: {mean_mse:.4f}")

Fold 0 | Mean Squared Error (MSE): 0.8463
Fold 1 | Mean Squared Error (MSE): 0.0929
Fold 2 | Mean Squared Error (MSE): 3.3469
Fold 3 | Mean Squared Error (MSE): 1.5318
Fold 4 | Mean Squared Error (MSE): 1.7142

Mean Squared Error (MSE) across 5 folds: 1.5064


In [4]:
mse_scores

array([0.84628596, 0.09293514, 3.34686517, 1.53176163, 1.71417011])