# Cross-Validation

A single **train/test** split can give an unreliable performance estimate, because the score depends on which samples happen to land in the test set. What if the test set happens to be too easy or too hard?
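As a rough illustration of this variability, the sketch below repeats one 80/20 split with ten different random seeds and records the test accuracy each time. The dataset (iris), the model (`LogisticRegression` with `max_iter=1000`), and the seed range are assumptions chosen for convenience, not part of the discussion above.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Repeat a single 80/20 split with different seeds (seed range is an assumption)
split_accuracies = []
for seed in range(10):
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)
    # Accuracy on the held-out 20% for this particular split
    split_accuracies.append(model.score(X_test, y_test))

print("Accuracy per split:", [round(a, 3) for a in split_accuracies])
print("Lowest:", min(split_accuracies), "Highest:", max(split_accuracies))
```

Any gap between the lowest and highest value comes only from which samples ended up in the test set, since the model and data are otherwise identical.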

To reduce this dependence on one particular split, we use **cross-validation**.

---
## 1. What is Cross-Validation?
- Cross-validation repeatedly splits the dataset into training and validation parts.
- The model is trained and evaluated once per split, each time on a different validation part.
- The results are averaged for a more reliable estimate.

---
## 2. K-Fold Cross-Validation
- The dataset is divided into **k parts (folds)**.
- Train on k-1 folds, validate on the remaining fold.
- Repeat k times so that each fold serves as the validation set exactly once.
- Final performance = average of the k validation scores.

Example: In 5-fold CV, each round trains on 80% of the data and validates on the remaining 20%, with a different fold held out each time.
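The steps above can be written out as an explicit loop. This is a minimal sketch, assuming the iris dataset and a `LogisticRegression` model purely for illustration; it uses `KFold.split` to produce the index sets and prints the size of each training and validation part.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

X, y = load_iris(return_X_y=True)  # 150 samples
kf = KFold(n_splits=5, shuffle=True, random_state=0)

fold_scores = []
fold_sizes = []
for fold, (train_idx, val_idx) in enumerate(kf.split(X), start=1):
    # Train on k-1 folds, validate on the held-out fold
    model = LogisticRegression(max_iter=1000)
    model.fit(X[train_idx], y[train_idx])
    score = model.score(X[val_idx], y[val_idx])
    fold_scores.append(score)
    fold_sizes.append((len(train_idx), len(val_idx)))
    print(f"Fold {fold}: train={len(train_idx)}, val={len(val_idx)}, accuracy={score:.3f}")

# Final performance is the average of the per-fold scores
print("Mean accuracy:", np.mean(fold_scores))
```

With 150 samples and k = 5, every round uses 120 samples for training and 30 for validation, which matches the 80%/20% split described above.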

---
## 3. Stratified K-Fold
- Ensures each fold keeps approximately the same class proportions as the full dataset.
- Useful for **classification problems** with imbalanced data.
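To make the difference concrete, the sketch below counts the classes in each validation fold for plain `KFold` and for `StratifiedKFold`. It assumes the iris dataset, which is balanced (50 samples per class) but stored sorted by class, so unshuffled plain folds end up with skewed class counts. The helper `validation_class_counts` is a hypothetical name introduced only for this example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, StratifiedKFold

X, y = load_iris(return_X_y=True)  # 3 classes, 50 samples each, sorted by class


def validation_class_counts(splitter):
    """Return the per-class sample counts of each validation fold (illustrative helper)."""
    return [
        np.bincount(y[val_idx], minlength=3).tolist()
        for _, val_idx in splitter.split(X, y)
    ]


# No shuffling here, so the effect of the sorted labels is visible
plain_counts = validation_class_counts(KFold(n_splits=5))
stratified_counts = validation_class_counts(StratifiedKFold(n_splits=5))

print("KFold validation class counts:          ", plain_counts)
print("StratifiedKFold validation class counts:", stratified_counts)
```

In this setup the first plain fold contains only class 0, while every stratified fold contains 10 samples of each class. On a genuinely imbalanced dataset the same mechanism keeps the minority class represented in every fold.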

---
## 4. Benefits of Cross-Validation
- More reliable performance estimation.
- Better use of limited data: every sample is used for both training and validation.
- Helps in **model selection and hyperparameter tuning**.
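As one possible illustration of the last point, scikit-learn's `GridSearchCV` runs cross-validation for each candidate hyperparameter value and keeps the one with the best mean score. The grid of `C` values, the model, and the dataset below are assumptions chosen for the example, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold

X, y = load_iris(return_X_y=True)

# Candidate regularization strengths (illustrative values)
param_grid = {"C": [0.01, 0.1, 1.0, 10.0]}

# Each candidate is scored with the same 5 stratified folds
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid,
    cv=cv,
    scoring="accuracy",
)
search.fit(X, y)

print("Best C:", search.best_params_["C"])
print("Best mean CV accuracy:", search.best_score_)
```

Because the best score was used to pick the hyperparameter, it tends to be optimistic; a separate held-out test set is still useful for the final estimate.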

---
Next: We will explore **evaluation metrics** to measure model performance.

```python
# Example: K-Fold Cross-Validation
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, KFold
from sklearn.linear_model import LogisticRegression

# Load dataset
X, y = load_iris(return_X_y=True)

# Define model
model = LogisticRegression(max_iter=200)

# Apply 5-Fold Cross-Validation
kf = KFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(model, X, y, cv=kf, scoring='accuracy')

print("Cross-validation scores:", scores)
print("Mean accuracy:", scores.mean())
```