# Cross-Validation in Machine Learning

Cross-validation is a technique for estimating how well a model generalizes to unseen data: the dataset is split into multiple parts, and the model is repeatedly trained on some parts and tested on the rest.

### 🔹 Why Use Cross-Validation?

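- Helps detect overfitting (a model that performs well on training data but poorly on unseen data). It does not prevent overfitting by itself, but it makes it visible.
- Tests the model on several different data splits, giving a more reliable performance estimate than a single train/test split.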
- Helps compare different models before selecting the best one.

### How Does Cross-Validation Work?

- Split the data into k roughly equal parts (folds).
- Train the model on k-1 folds and test on the remaining fold.
- Repeat the process k times, each time using a different fold as test data.
- Average the k results to get the final performance score.

**This procedure is K-Fold Cross-Validation, the most common type.**
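The steps above can be written out by hand. The following is a minimal sketch, assuming scikit-learn's `KFold` splitter and the breast cancer dataset; the choices of `shuffle=True`, `random_state=0`, and 50 trees are illustrative, not requirements.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold

X, y = load_breast_cancer(return_X_y=True)

# Step 1: split the data into k=5 folds (shuffling is an illustrative choice)
kf = KFold(n_splits=5, shuffle=True, random_state=0)

fold_scores = []
for train_idx, test_idx in kf.split(X):
    # Step 2: train a fresh model on k-1 folds, test on the held-out fold
    model = RandomForestClassifier(n_estimators=50, random_state=42)
    model.fit(X[train_idx], y[train_idx])
    preds = model.predict(X[test_idx])
    fold_scores.append(accuracy_score(y[test_idx], preds))
    # Step 3: the loop repeats this with a different held-out fold each time

# Step 4: average the per-fold results
mean_score = float(np.mean(fold_scores))
print(f"Per-fold accuracy: {np.round(fold_scores, 3)}")
print(f"Mean accuracy: {mean_score:.3f}")
```

A new model is created inside the loop so that no fitted state carries over from one fold to the next.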

```python
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_breast_cancer

# Load dataset
data = load_breast_cancer()
X, y = data.data, data.target

# Define model
model = RandomForestClassifier(n_estimators=100, random_state=42)

# Perform 5-Fold Cross-Validation
cv_scores = cross_val_score(model, X, y, cv=5)

# Print results
print(f'Cross-Validation Scores: {cv_scores}')
print(f'Average Accuracy: {cv_scores.mean():.2f}')
```

Output:

```
Cross-Validation Scores: [0.92105263 0.93859649 0.98245614 0.96491228 0.97345133]
Average Accuracy: 0.96
```


**What does this do?**

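- Splits the dataset into 5 folds. Because the model is a classifier and `cv` is an integer, scikit-learn stratifies the folds by class.
- Trains the model on 4 folds and tests it on the remaining fold.
- Repeats this process 5 times, so each fold is the test set exactly once. `cross_val_score` returns the 5 accuracy scores, and `cv_scores.mean()` averages them.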
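When the estimator is a classifier, scikit-learn interprets an integer `cv` as stratified K-fold without shuffling. The sketch below checks this by passing an explicit `StratifiedKFold` splitter and comparing the scores; the variable names `scores_int` and `scores_explicit` are introduced here only for illustration.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = load_breast_cancer(return_X_y=True)
model = RandomForestClassifier(n_estimators=100, random_state=42)

# Integer cv: scikit-learn picks the splitter for us
scores_int = cross_val_score(model, X, y, cv=5)

# Explicit splitter: the same folds, spelled out
scores_explicit = cross_val_score(model, X, y, cv=StratifiedKFold(n_splits=5))

same = bool(np.allclose(scores_int, scores_explicit))
print(f"Scores match: {same}")
```

Passing the splitter explicitly is useful when you want to control shuffling or the random seed of the folds.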


### 🔹 Types of Cross-Validation

**1️⃣ K-Fold Cross-Validation (Most Common)**

- Splits data into K roughly equal parts (e.g., 5-fold or 10-fold).
- Uses K-1 folds for training and 1 fold for testing.
- Repeats this K times, so every fold serves as the test set exactly once.
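To make the fold assignment concrete, here is a small sketch that prints the train/test indices `KFold` produces. The 10-sample array `X_toy` is hypothetical data used only to show the indices.

```python
import numpy as np
from sklearn.model_selection import KFold

X_toy = np.arange(10).reshape(-1, 1)  # 10 hypothetical samples

kf = KFold(n_splits=5)  # no shuffling: test folds are contiguous blocks
splits = [(train.tolist(), test.tolist()) for train, test in kf.split(X_toy)]

for i, (train, test) in enumerate(splits, start=1):
    print(f"Fold {i}: train={train} test={test}")
```

Each index appears in exactly one test fold, and in the training set of the other four rounds.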



**2️⃣ Stratified K-Fold Cross-Validation**

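- Ensures each fold preserves roughly the same class proportions as the full dataset (useful for imbalanced datasets, where a plain split could leave a fold with few or no samples of the minority class).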
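The difference from plain K-fold shows up clearly on imbalanced labels. The sketch below uses a hypothetical label vector `y_toy` (90 negatives followed by 10 positives) and counts how many positives land in each test fold; the helper `positives_per_fold` is defined here only for illustration.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

# Hypothetical imbalanced labels: 90 negatives followed by 10 positives
y_toy = np.array([0] * 90 + [1] * 10)
X_toy = np.zeros((100, 1))  # feature values do not affect the split

def positives_per_fold(splitter):
    """Count positive samples in each test fold produced by the splitter."""
    return [int(y_toy[test].sum()) for _, test in splitter.split(X_toy, y_toy)]

plain = positives_per_fold(KFold(n_splits=5))
stratified = positives_per_fold(StratifiedKFold(n_splits=5))

print("KFold positives per test fold:          ", plain)
print("StratifiedKFold positives per test fold:", stratified)
```

Without shuffling, plain `KFold` puts all 10 positives into the last test fold, while `StratifiedKFold` places 2 in each, matching the 10% positive rate of the whole dataset.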


**3️⃣ Leave-One-Out Cross-Validation (LOOCV)**

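- Uses a single sample as test data and trains on all the others, repeating this once per sample (n model fits for n samples).
- Uses almost all of the data for training in every round, but is slow for large datasets, and its estimate can have high variance.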
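A minimal LOOCV sketch, assuming scikit-learn's `LeaveOneOut` splitter. A k-nearest-neighbors classifier is swapped in here purely because it is cheap to fit many times; it is not the model used elsewhere in this notebook.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)

# Illustrative model choice: cheap to refit once per sample
model = KNeighborsClassifier(n_neighbors=5)

# One fit per sample; each score is 0.0 or 1.0 (the single test sample is wrong or right)
scores = cross_val_score(model, X, y, cv=LeaveOneOut())

print(f"Number of fits: {len(scores)}")
print(f"LOOCV accuracy: {scores.mean():.3f}")
```

The number of fits equals the number of samples, which is why LOOCV becomes impractical for large datasets or expensive models.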

### 🔹 When to Use Cross-Validation?

- ✔️ When the dataset is small and a single train/test split would give an unreliable estimate of generalization.
- ✔️ When tuning hyperparameters to get the best model.
- ✔️ When comparing different ML algorithms.
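The last two use cases can be sketched as follows, assuming scikit-learn's `GridSearchCV` for tuning and `cross_val_score` for comparison. The parameter grid and the two candidate models are illustrative choices, not recommendations.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Hyperparameter tuning: every grid candidate is scored with 5-fold CV
param_grid = {"n_estimators": [25, 50], "max_depth": [3, None]}  # illustrative grid
search = GridSearchCV(RandomForestClassifier(random_state=42), param_grid, cv=5)
search.fit(X, y)
print("Best params:", search.best_params_)
print(f"Best CV accuracy: {search.best_score_:.3f}")

# Algorithm comparison: score each candidate with the same 5-fold setup
candidates = {
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=42),
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
}
mean_scores = {
    name: float(cross_val_score(est, X, y, cv=5).mean())
    for name, est in candidates.items()
}
for name, score in mean_scores.items():
    print(f"{name}: {score:.3f}")
```

The best score reported by the search tends to be optimistic, because the same folds were used to pick the winner. A separate held-out test set gives a fairer final estimate.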