# Cross-Validation Using Folds

- The basic idea behind k-fold cross-validation is to split the dataset into k roughly equal subsets, or "folds." The model is trained on k-1 folds and evaluated on the single fold that was held out. This is repeated k times, so that each fold serves as the test set exactly once. The k scores are then averaged, which gives an estimate of the model's performance that depends less on any one particular train/test split.
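The fold bookkeeping described above can be sketched in plain Python. This is a minimal illustration, not scikit-learn's implementation: the function name `kfold_indices` is hypothetical, and its shuffling uses Python's `random` module, so the same seed will not reproduce the splits that `KFold(random_state=5)` produces. It does follow the same sizing rule as `KFold`, where the first `n_samples % n_splits` folds receive one extra sample.

```python
import random


def kfold_indices(n_samples, n_splits, shuffle=True, seed=None):
    """Yield (train_idx, test_idx) pairs; every sample lands in exactly one test fold.

    Hypothetical helper for illustration; it does not reproduce sklearn's random splits.
    """
    indices = list(range(n_samples))
    if shuffle:
        random.Random(seed).shuffle(indices)
    # The first n_samples % n_splits folds get one extra sample.
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        size = base + (1 if fold < extra else 0)
        test_idx = indices[start:start + size]
        # Everything outside the held-out fold is used for training.
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


folds = list(kfold_indices(n_samples=10, n_splits=3, seed=5))
for train_idx, test_idx in folds:
    print(len(train_idx), len(test_idx))  # prints 6 4, then 7 3, then 7 3
```

Because the test folds partition the indices, each sample is scored exactly once across the k iterations.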


In [1]:
import pandas as pd
import numpy as np

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score, KFold

sales_df = pd.read_csv("datasets/advertising_and_sales_clean.csv")
print(sales_df.head())

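# Features: drop the target and the categorical "influencer" column (LinearRegression needs numeric inputs)
X = sales_df.drop(["sales", "influencer"], axis=1).values
# Target: sales
y = sales_df["sales"].values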

        tv     radio  social_media influencer      sales
0  16000.0   6566.23       2907.98       Mega   54732.76
1  13000.0   9237.76       2409.57       Mega   46677.90
2  41000.0  15886.45       2913.41       Mega  150177.83
3  83000.0  30020.03       6922.30       Mega  298246.34
4  15000.0   8437.41       1406.00      Micro   56594.18


In [3]:
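# Create a KFold object: 6 folds, shuffled with a fixed seed so the splits are reproducible
kf = KFold(n_splits=6, shuffle=True, random_state=5)

reg = LinearRegression()

# Compute 6-fold cross-validation scores
cv_scores = cross_val_score(reg, X, y, cv=kf)

# Print the score of each fold (R², the default score for LinearRegression)
print(cv_scores)

# Print the mean R² across the 6 folds
print(np.mean(cv_scores))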

[0.99894062 0.99909245 0.9990103  0.99896344 0.99889153 0.99903953]
0.9989896443678249
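To make explicit what `cross_val_score` computes in the cell above, the loop can be written out by hand. This is a sketch under stated assumptions: it uses NumPy least squares with an intercept column in place of `LinearRegression`, a hypothetical `manual_kfold_cv` helper, and synthetic stand-in data rather than the advertising dataset, so its scores are not the ones printed above. The R² formula matches what `LinearRegression.score` returns.

```python
import numpy as np


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot


def manual_kfold_cv(X, y, n_splits=6, seed=5):
    """Hypothetical helper: fit OLS on k-1 folds, score R² on the held-out fold."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(y))
    folds = np.array_split(indices, n_splits)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        # Prepend a column of ones so the fit includes an intercept, like LinearRegression.
        A_train = np.column_stack([np.ones(len(train_idx)), X[train_idx]])
        A_test = np.column_stack([np.ones(len(test_idx)), X[test_idx]])
        coef, *_ = np.linalg.lstsq(A_train, y[train_idx], rcond=None)
        scores.append(r2_score(y[test_idx], A_test @ coef))
    return np.array(scores)


# Synthetic stand-in data (NOT the advertising dataset): 3 features, linear target plus noise.
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 100, size=(120, 3))
y_demo = X_demo @ np.array([3.5, 0.5, 0.1]) + 10 + rng.normal(0, 5, size=120)

scores = manual_kfold_cv(X_demo, y_demo, n_splits=6, seed=5)
print(scores)                       # one R² per fold
print(scores.mean(), scores.std())  # average and spread across folds
```

With scikit-learn on hand, the same loop can iterate over `kf.split(X)`, which yields `(train_index, test_index)` arrays, and call `reg.fit(X[train_index], y[train_index])` followed by `reg.score(X[test_index], y[test_index])` for each fold.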
