Here is **general-purpose cross-validation code** using **`KFold` from scikit-learn**, which you can apply to **any scikit-learn-compatible classifier** (e.g., Logistic Regression, Decision Tree, Random Forest, XGBoost).

---

### **General K-Fold Cross-Validation Code**

```python
from sklearn.base import clone
from sklearn.model_selection import KFold
from sklearn.metrics import accuracy_score
import numpy as np

def cross_validate_model(model, X, y, k=5):
    """
    Performs K-Fold cross-validation on any sklearn-compatible classifier.
    
    Parameters:
        model: the machine learning model (e.g., RandomForestClassifier()).
        X: feature matrix (numpy array or DataFrame).
        y: target labels.
        k: number of folds (default: 5).
        
    Returns:
        List of accuracy scores for each fold.
    """
    # Convert to NumPy arrays so positional indexing also works for DataFrames/Series
    X, y = np.asarray(X), np.asarray(y)
    kf = KFold(n_splits=k, shuffle=True, random_state=42)
    scores = []

    for fold, (train_index, val_index) in enumerate(kf.split(X), 1):
        X_train, X_val = X[train_index], X[val_index]
        y_train, y_val = y[train_index], y[val_index]

        # Train an unfitted copy so no state carries over between folds
        fold_model = clone(model)
        fold_model.fit(X_train, y_train)
        predictions = fold_model.predict(X_val)

        acc = accuracy_score(y_val, predictions)
        scores.append(acc)
        print(f"Fold {fold} Accuracy: {acc:.4f}")

    print(f"\nAverage Accuracy: {np.mean(scores):.4f}")
    return scores
```

---

### **Usage Example**
```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris

# Load sample dataset
data = load_iris()
X = data.data
y = data.target

# Initialize model
model = RandomForestClassifier()

# Run cross-validation
cross_validate_model(model, X, y, k=5)
```

---

### **You Can Use This With:**
- `RandomForestClassifier`
- `LogisticRegression`
- `SVC`, `KNeighborsClassifier`
- `XGBClassifier`, `LGBMClassifier`
- Even regression models with minor tweaks (just change the metric)
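
As a rough illustration of the "minor tweaks" for regression, the sketch below swaps accuracy for mean squared error. The function name `cross_validate_regressor` is hypothetical, and the bundled diabetes dataset and `LinearRegression` are only stand-ins for your own data and model.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold

def cross_validate_regressor(model, X, y, k=5):
    """Hypothetical regression variant: returns the MSE of each fold."""
    X, y = np.asarray(X), np.asarray(y)
    kf = KFold(n_splits=k, shuffle=True, random_state=42)
    mse_scores = []
    for train_index, val_index in kf.split(X):
        fold_model = clone(model)  # unfitted copy for this fold
        fold_model.fit(X[train_index], y[train_index])
        predictions = fold_model.predict(X[val_index])
        # Only the metric changes compared with the classification version
        mse_scores.append(mean_squared_error(y[val_index], predictions))
    return mse_scores

# Example data: the diabetes regression dataset shipped with scikit-learn
X_reg, y_reg = load_diabetes(return_X_y=True)
mse_scores = cross_validate_regressor(LinearRegression(), X_reg, y_reg, k=5)
print(f"Mean MSE: {np.mean(mse_scores):.2f}")
```

Lower MSE is better, so unlike accuracy you would look for the smallest average.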

---
scikit-learn also has **built-in functions** for cross-validation, so you do not have to write the loop yourself:

### **1. `cross_val_score()`** – the most commonly used inbuilt function for classification and regression models.

---

### **Example: Classification with `cross_val_score()`**
```python
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris

# Load dataset
data = load_iris()
X, y = data.data, data.target

# Model
model = RandomForestClassifier()

# Perform 5-fold CV
scores = cross_val_score(model, X, y, cv=5, scoring='accuracy')

print("Accuracy scores for each fold:", scores)
print("Average Accuracy:", scores.mean())
```

---

### **Key Parameters**
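- `estimator`: any scikit-learn compatible model (the first argument, `model` in the example above)
- `X, y`: features and target
- `cv`: number of folds (e.g., `cv=5`) or a splitter object such as `KFold`; with an integer and a classifier, scikit-learn uses stratified folds
- `scoring`: metric name (e.g., `'accuracy'`, `'f1_macro'`, `'neg_mean_squared_error'`); `'f1'` and `'roc_auc'` expect a binary target, so for multiclass problems use variants such as `'f1_macro'` or `'roc_auc_ovr'`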
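
To make the `cv` and `scoring` parameters concrete, here is a small sketch that passes a splitter object instead of an integer and uses a multiclass-friendly metric. The choice of `StratifiedKFold`, `LogisticRegression(max_iter=1000)`, and `'f1_macro'` is only an illustrative assumption.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = load_iris(return_X_y=True)

# A splitter object gives control over shuffling and the random seed
splitter = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# 'f1_macro' averages the per-class F1 scores, so it works for the 3 iris classes
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                         cv=splitter, scoring='f1_macro')

print("Macro F1 per fold:", scores)
print("Average macro F1:", scores.mean())
```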

---

### **2. `cross_validate()`** – more flexible (returns multiple scores)

```python
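from sklearn.model_selection import cross_validate
from sklearn.linear_model import LogisticRegression

# Reuses X, y from the previous example.
# max_iter is raised so the solver converges without warnings.
scores = cross_validate(LogisticRegression(max_iter=1000), X, y, cv=5,
                        scoring=['accuracy', 'f1_macro'],
                        return_train_score=True)

print(scores)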
```
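
`cross_validate()` returns a dictionary of arrays rather than a single array. The sketch below, again on the iris data as an assumed example, shows one way to summarize it: each requested metric appears under `test_<name>` and, with `return_train_score=True`, also under `train_<name>`, alongside `fit_time` and `score_time`.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

X, y = load_iris(return_X_y=True)

results = cross_validate(LogisticRegression(max_iter=1000), X, y, cv=5,
                         scoring=['accuracy', 'f1_macro'],
                         return_train_score=True)

# Each value is an array with one entry per fold; print the mean of each
for key in sorted(results):
    print(f"{key}: {results[key].mean():.4f}")
```

A large gap between `train_accuracy` and `test_accuracy` is one hint that the model may be overfitting.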

---

## **Different Types of Cross-Validation**

---

### 1. **Cross-Validation (Generic) vs. Specific Techniques**
**Cross-validation** is the general idea; the specific techniques differ in how they split the data:

| Type | Description |
|------|-------------|
| **K-Fold Cross-Validation** | Splits data into *K* folds. Trains on K-1, tests on 1, repeated K times. |
| **Stratified K-Fold** | Same as K-Fold but maintains class distribution in each fold (for classification). |
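| **Leave-One-Out (LOO)** | Each sample is the test set exactly once, so *n* samples mean *n* model fits. Thorough but expensive on large datasets. |
| **Repeated K-Fold** | K-Fold repeated multiple times with different random splits. |
| **TimeSeriesSplit** | For time series data. Training samples always come before the test samples, which preserves temporal order. |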
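
Each row of the table corresponds to a splitter class in `sklearn.model_selection`. The sketch below builds them on a tiny made-up dataset (12 samples, two balanced classes; the names `X_toy` and `y_toy` are hypothetical) and reports how many train/test splits each one produces.

```python
import numpy as np
from sklearn.model_selection import (KFold, LeaveOneOut, RepeatedKFold,
                                     StratifiedKFold, TimeSeriesSplit)

# Toy data: 12 samples, 2 features, 6 samples per class
X_toy = np.arange(24).reshape(12, 2)
y_toy = np.array([0] * 6 + [1] * 6)

splitters = {
    "KFold": KFold(n_splits=3),
    "StratifiedKFold": StratifiedKFold(n_splits=3),
    "LeaveOneOut": LeaveOneOut(),
    "RepeatedKFold": RepeatedKFold(n_splits=3, n_repeats=2, random_state=0),
    "TimeSeriesSplit": TimeSeriesSplit(n_splits=3),
}

# Number of train/test splits each strategy generates on this data
n_splits = {name: s.get_n_splits(X_toy, y_toy) for name, s in splitters.items()}
for name, count in n_splits.items():
    print(f"{name}: {count} splits")

# TimeSeriesSplit never trains on samples that come after the test samples
for train_idx, test_idx in splitters["TimeSeriesSplit"].split(X_toy):
    print("train:", train_idx, "test:", test_idx)
```

Any of these objects can be passed as the `cv` argument of `cross_val_score()` or `cross_validate()`.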

---

### 2. **cross_val_score() vs. Manual Cross-Validation**
- `cross_val_score()` is a **built-in helper function** that runs the whole loop for you and returns one score per fold.
- Manual K-Fold using `KFold().split()` gives you more **flexibility** (e.g., saving models per fold, advanced metrics, visualizations, etc.).
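
As one possible example of that flexibility, the sketch below keeps the fitted model from every fold and accumulates a confusion matrix across folds. The classifier (`DecisionTreeClassifier`) and the variable names `fold_models` and `total_cm` are illustrative assumptions.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
base_model = DecisionTreeClassifier(random_state=0)
kf = KFold(n_splits=5, shuffle=True, random_state=42)

fold_models = []                         # one fitted model per fold
total_cm = np.zeros((3, 3), dtype=int)   # iris has 3 classes

for train_index, val_index in kf.split(X):
    fold_model = clone(base_model)
    fold_model.fit(X[train_index], y[train_index])
    fold_models.append(fold_model)

    # Fix the label order so every fold yields a 3x3 matrix
    predictions = fold_model.predict(X[val_index])
    total_cm += confusion_matrix(y[val_index], predictions, labels=[0, 1, 2])

print(total_cm)
```

Because every sample is validated exactly once, the entries of `total_cm` sum to the dataset size.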

---

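### 3. **Train/Validation/Test Split vs. Cross-Validation**
- A **train/validation/test split** uses fixed splits (e.g., 70/20/10), so the estimate depends on which samples happen to land in the validation set.
- **Cross-validation** rotates the validation fold so that every sample is used for validation once, which gives a more **robust performance estimate**.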
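
For comparison, a fixed 70/20/10 split can be sketched with two calls to `train_test_split`. Passing absolute sample counts as `test_size` and stratifying by class are choices made for this illustration, not requirements.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
n_test = round(0.10 * len(X))   # 10% held out for final testing
n_val = round(0.20 * len(X))    # 20% used for validation

# Step 1: carve off the test set
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=n_test, random_state=42, stratify=y)

# Step 2: split the remainder into training and validation sets
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=n_val, random_state=42, stratify=y_rest)

print(len(X_train), len(X_val), len(X_test))
```

A common pattern is to combine both ideas: hold out a test set once, then cross-validate on the remaining data.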

---

Cross-validation, grid search, and random search are **three core techniques** used in machine learning model evaluation and hyperparameter tuning. Here’s a clear breakdown:

---

## **1. Cross-Validation (CV)**

### **What It Is:**
Cross-validation is a **model evaluation** technique used to assess how well your model generalizes to unseen data.

### **How It Works:**
- Data is split into `k` folds.
- The model trains on `k-1` folds and validates on the remaining 1.
- This process repeats `k` times, and the average score is computed.

### **Use Case:**
- Evaluate the performance of a model.
- Detect overfitting or underfitting (for example, by comparing training and validation scores).

```python
from sklearn.model_selection import cross_val_score

# model, X, y as defined in the earlier examples
scores = cross_val_score(model, X, y, cv=5, scoring='accuracy')
```

---

## **2. Grid Search (`GridSearchCV`)**

### **What It Is:**
A **hyperparameter tuning** method that exhaustively tries all combinations of a given parameter grid.

### **How It Works:**
- You define a grid of hyperparameters.
- For each combination, it uses **cross-validation** to evaluate the model.
- After fitting, the best combination is available as `best_params_` and its mean cross-validated score as `best_score_`.

### **Use Case:**
- When you want **best hyperparameters** and can afford the time/computation cost.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# X, y as defined in the earlier examples
param_grid = {'n_estimators': [50, 100], 'max_depth': [5, 10]}
grid = GridSearchCV(RandomForestClassifier(), param_grid, cv=5, scoring='accuracy')
grid.fit(X, y)
```
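
Once the search has been fitted, its results can be read from a few attributes. The sketch below repeats the search on the iris data (an assumed example dataset, with `random_state` fixed for reproducibility) and inspects them.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

param_grid = {'n_estimators': [50, 100], 'max_depth': [5, 10]}
grid = GridSearchCV(RandomForestClassifier(random_state=42), param_grid,
                    cv=5, scoring='accuracy')
grid.fit(X, y)

# Every combination in the grid is evaluated: 2 x 2 candidates here
n_candidates = len(grid.cv_results_['params'])
print("Candidates tried:", n_candidates)
print("Best parameters:", grid.best_params_)
print("Best mean CV accuracy:", grid.best_score_)

# By default (refit=True) the best model is retrained on all of X, y
best_model = grid.best_estimator_
```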

---

## **3. Random Search (`RandomizedSearchCV`)**

### **What It Is:**
A **faster alternative** to Grid Search that randomly selects combinations of parameters to try.

### **How It Works:**
- Randomly samples a **fixed number** (`n_iter`) of parameter combinations from the lists or distributions you specify.
- Uses **cross-validation** to evaluate each.
- More efficient when the parameter space is large.

### **Use Case:**
- When the search space is big or Grid Search is too slow.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import randint

# randint(low, high) samples integers from low up to high - 1
param_dist = {'n_estimators': randint(50, 200), 'max_depth': randint(3, 15)}
rand_search = RandomizedSearchCV(RandomForestClassifier(), param_distributions=param_dist,
                                 n_iter=10, cv=5, random_state=42)
rand_search.fit(X, y)
```
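
The fitted search object exposes the same attributes as `GridSearchCV`. The sketch below uses deliberately small ranges and `n_iter=5` so it runs quickly; these values and the iris data are assumptions for illustration only.

```python
from scipy.stats import randint
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = load_iris(return_X_y=True)

# Small ranges keep this example fast; widen them for real tuning
param_dist = {'n_estimators': randint(20, 60), 'max_depth': randint(3, 15)}
rand_search = RandomizedSearchCV(RandomForestClassifier(random_state=42),
                                 param_distributions=param_dist,
                                 n_iter=5, cv=5, random_state=42)
rand_search.fit(X, y)

# Only n_iter sampled combinations are evaluated, not the full space
sampled = rand_search.cv_results_['params']
print("Combinations tried:", len(sampled))
print("Best parameters:", rand_search.best_params_)
print("Best mean CV score:", rand_search.best_score_)
```

Fixing `random_state` on the search makes the sampled combinations reproducible between runs.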

---

## **Summary Table**

| Feature                     | Cross-Validation        | GridSearchCV               | RandomizedSearchCV           |
|----------------------------|--------------------------|-----------------------------|-------------------------------|
| **Purpose**                | Evaluate model           | Tune hyperparameters        | Tune hyperparameters          |
| **Search Type**            | None                     | Exhaustive                  | Random sampling               |
| **Uses Cross-Validation**  | Yes                      | Yes                         | Yes                           |
| **Speed**                  | Fast (k fits)            | Slow (all combinations × k fits) | Faster (`n_iter` × k fits) |
| **Best For**               | General model evaluation | Small hyperparam spaces     | Large hyperparam spaces       |
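
To put rough numbers on the speed row, the sketch below counts model fits for a hypothetical grid of three hyperparameters with five values each. The grid, `cv=5`, and `n_iter=10` are made-up figures, and the extra final refit on the full data is ignored.

```python
from sklearn.model_selection import ParameterGrid

# Hypothetical search space: 3 hyperparameters x 5 candidate values
param_grid = {
    'n_estimators': [50, 100, 150, 200, 250],
    'max_depth': [3, 5, 7, 9, 11],
    'min_samples_split': [2, 4, 6, 8, 10],
}
cv = 5        # folds
n_iter = 10   # combinations sampled by random search

plain_cv_fits = cv                                       # one model, k folds
grid_search_fits = len(ParameterGrid(param_grid)) * cv   # every combination
random_search_fits = n_iter * cv                         # sampled combinations only

print("Plain cross-validation:", plain_cv_fits)
print("GridSearchCV:", grid_search_fits)
print("RandomizedSearchCV:", random_search_fits)
```

The grid search cost grows multiplicatively with every added hyperparameter, while the random search cost stays fixed by `n_iter`.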

---