# Cross-Validation in Machine Learning

## 1. What is Cross-Validation?
Cross-validation is a resampling technique for estimating how well a machine learning model will perform on unseen data. It helps with:
- Detecting overfitting
- Comparing and selecting models
- Tuning hyperparameters
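
As a rough sketch of the hyperparameter-tuning use listed above, the snippet below uses scikit-learn's `GridSearchCV` to pick a `Ridge` regularization strength by 5-fold cross-validation (k-fold is described in Section 3). The synthetic data, the coefficient vector, and the candidate `alpha` values are assumptions chosen for illustration only.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Synthetic data (assumed for illustration): linear signal plus small noise
rng = np.random.default_rng(0)
X = rng.random((100, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.1, size=100)

# Candidate hyperparameter values (arbitrary choices for this sketch)
param_grid = {"alpha": [0.01, 0.1, 1.0, 10.0]}

# Each candidate is scored by its mean R² over 5 folds
search = GridSearchCV(Ridge(), param_grid, cv=5, scoring="r2")
search.fit(X, y)

print("Best alpha:", search.best_params_["alpha"])
print("Best mean CV R²:", search.best_score_)
```

The selected `alpha` is the one with the highest mean validation score, so the score reported here is optimistic; a separate test set is still needed for a final performance estimate.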

## 2. Holdout Method
The holdout method is the simplest validation scheme. The dataset is split once into:
- **Training Set**: Used to train the model.
- **Test Set**: Used to evaluate model performance.

### Example in Python:
```python
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
import numpy as np

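# Generating synthetic data: a linear signal plus noise, seeded for reproducibility
# (the coefficients are arbitrary; purely random y would give R² near or below zero)
rng = np.random.default_rng(42)
X = rng.random((100, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.1, size=100)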

# Splitting data (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Training a model
model = LinearRegression()
model.fit(X_train, y_train)

# Evaluating on test set
score = model.score(X_test, y_test)
print("Test R² Score:", score)
```

## 3. K-Fold Cross-Validation
K-fold cross-validation divides the dataset into *k* folds of approximately equal size. The model is trained on *k-1* folds and tested on the remaining fold. This is repeated *k* times so that each fold serves as the test set exactly once, and the results are averaged.

### Formula:
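$$
\text{Error}_{\text{CV}} = \frac{1}{k} \sum_{i=1}^{k} \text{Error}_i
$$

Here $\text{Error}_i$ is the error measured on fold *i* when the model is trained on the other *k-1* folds.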
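
The averaging in the formula above can be written out as an explicit loop. The following is a minimal sketch, assuming mean squared error as the per-fold error and a synthetic linear dataset; both are choices made for illustration, not part of the definition.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold

# Synthetic data (assumed for illustration): linear signal plus small noise
rng = np.random.default_rng(0)
X = rng.random((100, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.1, size=100)

k = 5
kf = KFold(n_splits=k, shuffle=True, random_state=42)

fold_errors = []
for train_idx, test_idx in kf.split(X):
    # Train on k-1 folds, evaluate on the held-out fold
    fold_model = LinearRegression().fit(X[train_idx], y[train_idx])
    predictions = fold_model.predict(X[test_idx])
    fold_errors.append(mean_squared_error(y[test_idx], predictions))

# Average of the k per-fold errors
cv_error = sum(fold_errors) / k
print("Per-fold MSE:", fold_errors)
print("Cross-validated MSE:", cv_error)
```

In practice `cross_val_score` performs the same loop; writing it out by hand mainly helps when custom per-fold logic is needed.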

### Example in Python:
```python
# Reuses model, X, y, and np from the holdout example above
from sklearn.model_selection import KFold, cross_val_score

kf = KFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(model, X, y, cv=kf, scoring='r2')
print("K-Fold R² Scores:", scores)
print("Average R²:", np.mean(scores))
```

## 4. Leave-One-Out Cross-Validation (LOOCV)
A special case of K-Fold where *k* equals the number of data points. Each iteration trains on *n-1* samples and tests on 1 sample.

### Example in Python:
```python
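from sklearn.model_selection import LeaveOneOut

# R² is undefined on a single test sample, so score each left-out point with MSE instead
loo = LeaveOneOut()
scores = cross_val_score(model, X, y, cv=loo, scoring='neg_mean_squared_error')
print("LOOCV Average MSE:", -np.mean(scores))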
```
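
The statement that LOOCV is K-Fold with *k* equal to the number of samples can be checked directly by comparing the index splits that the two splitters produce. This is a small sketch on a toy array of six rows, chosen only to keep the output readable.

```python
import numpy as np
from sklearn.model_selection import KFold, LeaveOneOut

# Toy data (assumed for illustration): 6 samples, 2 features
X_toy = np.arange(12).reshape(6, 2)
n = X_toy.shape[0]

# Collect (train, test) index pairs from both splitters
loo_splits = [(tr.tolist(), te.tolist()) for tr, te in LeaveOneOut().split(X_toy)]
kf_splits = [(tr.tolist(), te.tolist()) for tr, te in KFold(n_splits=n).split(X_toy)]

for train_idx, test_idx in loo_splits:
    print("train:", train_idx, "test:", test_idx)

print("Identical splits:", loo_splits == kf_splits)
```

Because LOOCV fits the model *n* times, its cost grows linearly with the dataset size, which is why it is usually reserved for small datasets.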

## 5. Time Series Cross-Validation
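For time-dependent data, shuffling is inappropriate because it lets the model train on observations that come after the ones it is tested on. Instead, the dataset is split sequentially so that every training set strictly precedes its test set in time.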

### Example in Python:
```python
from sklearn.model_selection import TimeSeriesSplit

tscv = TimeSeriesSplit(n_splits=5)
scores = cross_val_score(model, X, y, cv=tscv, scoring='r2')
print("Time Series Cross-Validation Scores:", scores)
```
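
To see the sequential splitting described above, it can help to print the indices that `TimeSeriesSplit` generates. The sketch below uses 12 time-ordered samples and 3 splits; both numbers are arbitrary choices for readability.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Toy time-ordered data (assumed for illustration): row index doubles as the time step
X_time = np.arange(12).reshape(12, 1)

tscv_demo = TimeSeriesSplit(n_splits=3)
splits = list(tscv_demo.split(X_time))

for fold, (train_idx, test_idx) in enumerate(splits):
    # Every training index is earlier than every test index in the same fold
    print(f"Fold {fold}: train={train_idx.tolist()} test={test_idx.tolist()}")
```

The training window grows with each fold while the test window moves forward, so no fold is ever evaluated on data older than what it was trained on.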

## 6. Summary
| Method | Best for |
|--------|---------|
| Holdout | Large datasets, fast evaluation |
| K-Fold | General-purpose estimation, balancing reliability and compute cost |
| LOOCV | Small datasets, where fitting the model *n* times is affordable |
| Time Series | Temporal data |

Cross-validation gives a more reliable estimate of how a model generalizes to new data than a single train/test split, which makes model evaluation and selection more robust.
