# Cross Validation in Machine Learning

**Cross-validation is a technique for estimating how well a machine learning model performs on unseen data, which also helps detect overfitting. It works by:**

- Splitting the dataset into several parts.
- Training the model on some parts and testing it on the remaining part.
- Repeating this resampling process multiple times by choosing different parts of the dataset.
- Averaging the results from each validation step to get the final performance estimate.
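
The four steps above can be sketched without any library. This is a minimal illustration under stated assumptions, not a production implementation: it splits the sample indices into contiguous parts without shuffling, uses only the labels, and the "model" is a hypothetical majority-class predictor (`fit_majority`) chosen only to keep the example self-contained. The helper names `split_indices` and `k_part_cross_validate` are likewise illustrative.

```python
from collections import Counter


def split_indices(n_samples, n_parts):
    """Split indices 0..n_samples-1 into n_parts contiguous, near-equal parts."""
    base, extra = divmod(n_samples, n_parts)
    parts, start = [], 0
    for i in range(n_parts):
        size = base + (1 if i < extra else 0)  # the first `extra` parts get one more index
        parts.append(list(range(start, start + size)))
        start += size
    return parts


def fit_majority(labels):
    """Hypothetical toy model: 'training' just remembers the most frequent label."""
    return Counter(labels).most_common(1)[0][0]


def k_part_cross_validate(y, n_parts):
    """Return the accuracy on each held-out part and the average over all parts."""
    scores = []
    for test_idx in split_indices(len(y), n_parts):
        held_out = set(test_idx)
        train_labels = [label for i, label in enumerate(y) if i not in held_out]
        prediction = fit_majority(train_labels)               # train on the other parts
        correct = sum(y[i] == prediction for i in test_idx)   # test on the held-out part
        scores.append(correct / len(test_idx))
    return scores, sum(scores) / len(scores)                  # average the results


y = [0, 0, 0, 1, 0, 0, 1, 0, 0, 1]
scores, mean_score = k_part_cross_validate(y, n_parts=5)
print(scores, mean_score)
```

With these toy labels the five per-part accuracies are 1.0, 0.5, 1.0, 0.5 and 0.5, so the averaged estimate is 0.7.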

### 1. Holdout Validation
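In holdout validation, the dataset is split once into a training set and a test set, for example 50% of the data for training and 50% for testing. This makes it simple and quick to apply. Its major drawback is that only 50% of the data is used for training, so the model may miss important patterns in the other half, which can lead to high bias. Because the estimate comes from a single split, it can also change noticeably if the data is split differently.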
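
A minimal holdout sketch with scikit-learn, assuming the Iris dataset and the linear SVC used in the implementation later in this article. `test_size=0.5` reproduces a 50/50 split, and `random_state=42` is an arbitrary seed chosen for reproducibility.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# One random 50/50 split; random_state fixes the shuffle so the split is reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=42
)

model = SVC(kernel="linear").fit(X_train, y_train)
holdout_accuracy = model.score(X_test, y_test)  # a single number from a single split
print(f"Train size: {len(X_train)}, test size: {len(X_test)}")
print(f"Holdout accuracy: {holdout_accuracy:.4f}")
```

Re-running with a different `random_state` generally gives a different accuracy, which shows how much one split can move the estimate.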

### 2. LOOCV (Leave One Out Cross Validation)
In this method, the model is trained on the entire dataset except for one data point, which is used for testing. This process is repeated for each data point in the dataset.

- Almost all data points (n-1 of n) are used for training in each iteration, resulting in low bias.
- Testing on a single data point can cause high variance, especially if the point is an outlier.
- It can be very time-consuming for large datasets, because it requires one model fit per data point (n fits for n samples).
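
scikit-learn provides LOOCV as the `LeaveOneOut` splitter, which can be passed to `cross_val_score` like any other splitter. The sketch below assumes the same Iris data and linear SVC as the implementation later in this article.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
loo = LeaveOneOut()
print("Number of splits:", loo.get_n_splits(X))  # one split per sample

scores = cross_val_score(SVC(kernel="linear"), X, y, cv=loo)
# Each split tests a single point, so every score is either 0.0 or 1.0
print(f"Folds: {len(scores)}, mean accuracy: {scores.mean():.4f}")
```

With 150 samples this means 150 separate model fits; the mean of the 0/1 scores is the LOOCV accuracy estimate.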

### 3. Stratified Cross-Validation
Stratified cross-validation ensures that each fold has the same class distribution as the full dataset. This is useful for imbalanced datasets, where some classes are underrepresented.

- The dataset is divided into k folds, keeping class proportions consistent in each fold.
- In each iteration, one fold is used for testing and the remaining folds for training.
- This process is repeated k times so that each fold is used once as the test set.
- Because every fold preserves the overall class proportions, it gives a more reliable performance estimate for classification models, especially for minority classes.
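
To see what stratification changes, the sketch below compares the class counts in each test fold produced by plain `KFold` and by `StratifiedKFold`. The label vector is hypothetical (90 samples of class 0, 10 of class 1), and the features are placeholders because neither splitter's fold assignment depends on them. `fold_class_counts` is an illustrative helper, not a library function.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

# Hypothetical imbalanced labels: 90 samples of class 0, 10 of class 1 (10% minority)
y = np.array([0] * 90 + [1] * 10)
X = np.zeros((len(y), 1))  # placeholder features


def fold_class_counts(splitter):
    """Return [count of class 0, count of class 1] for each test fold."""
    return [np.bincount(y[test_idx], minlength=2).tolist()
            for _, test_idx in splitter.split(X, y)]


plain = fold_class_counts(KFold(n_splits=5))  # no shuffling, labels ignored
stratified = fold_class_counts(StratifiedKFold(n_splits=5, shuffle=True, random_state=42))
print("KFold:          ", plain)
print("StratifiedKFold:", stratified)
```

Plain `KFold` without shuffling puts all ten minority samples into the last test fold, so four of the five folds never test class 1. `StratifiedKFold` places two minority samples in every test fold, matching the 10% overall rate.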

### 4. K-Fold Cross Validation
K-Fold Cross Validation splits the dataset into k folds of approximately equal size. The model is trained on k-1 folds and tested on the remaining fold. This process is repeated k times, each time using a different fold for testing, and the k scores are averaged.
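
When the number of samples is not a multiple of k, the folds can only be approximately equal: scikit-learn's `KFold` gives the first `n_samples % n_splits` folds one extra sample. A small sketch with 10 toy samples and k = 3, without shuffling so the folds are easy to read:

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(10).reshape(-1, 1)  # 10 toy samples
kf = KFold(n_splits=3)            # no shuffling, so folds are contiguous

test_folds = []
for i, (train_idx, test_idx) in enumerate(kf.split(X), 1):
    print(f"Iteration {i}: train={train_idx.tolist()} test={test_idx.tolist()}")
    test_folds.append(test_idx.tolist())
```

The test folds have sizes 4, 3 and 3, and every index appears in exactly one test fold, so each sample is used for testing once and for training k-1 times.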

## Python implementation of k-fold cross-validation
### Step 1: Importing necessary libraries

```python
# Cross-validation utilities, the SVM classifier and the Iris dataset loader
from sklearn.model_selection import cross_val_score, KFold
from sklearn.svm import SVC
from sklearn.datasets import load_iris
```

### Step 2: Loading the dataset

```python
# Iris: 150 samples, 4 numeric features, 3 classes
iris = load_iris()
X, y = iris.data, iris.target
```

### Step 3: Creating an SVM classifier

```python
# Support vector classifier with a linear kernel
svm_classifier = SVC(kernel='linear')
```

### Step 4: Defining the number of folds for cross-validation

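```python
# 5 folds; shuffle before splitting (the Iris samples are ordered by class)
# and fix the seed so the folds are reproducible
num_folds = 5
kf = KFold(n_splits=num_folds, shuffle=True, random_state=42)
```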

### Step 5: Performing k-fold cross-validation

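```python
# Fit and evaluate once per fold; returns one score per fold
# (accuracy, the default score for a classifier)
cross_val_results = cross_val_score(svm_classifier, X, y, cv=kf)
```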
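
`cross_val_score` hides the per-fold loop. The sketch below spells out roughly the same steps by hand for the same classifier and `KFold` settings: clone the estimator, fit it on the training indices, and score it on the held-out fold. It is a simplified equivalent for illustration, not scikit-learn's actual internals, which also handle custom scoring, parallelism and error handling.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
svm_classifier = SVC(kernel="linear")
kf = KFold(n_splits=5, shuffle=True, random_state=42)

manual_scores = []
for train_idx, test_idx in kf.split(X):
    model = clone(svm_classifier)                                # fresh, unfitted copy per fold
    model.fit(X[train_idx], y[train_idx])                        # train on k-1 folds
    manual_scores.append(model.score(X[test_idx], y[test_idx]))  # accuracy on the held-out fold
manual_scores = np.array(manual_scores)

library_scores = cross_val_score(svm_classifier, X, y, cv=kf)
print("Manual: ", manual_scores)
print("Library:", library_scores)
```

Because `KFold` with an integer `random_state` produces the same splits on every call to `split`, the two arrays of scores should match.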

### Step 6: Evaluation metrics

```python
# Print the accuracy of each fold, then the mean across folds
print("Cross-Validation Results (Accuracy):")
for i, result in enumerate(cross_val_results, 1):
    print(f"  Fold {i}: {result * 100:.2f}%")

print(f'Mean Accuracy: {cross_val_results.mean() * 100:.2f}%')
```

Output:

```
Cross-Validation Results (Accuracy):
  Fold 1: 100.00%
  Fold 2: 100.00%
  Fold 3: 96.67%
  Fold 4: 93.33%
  Fold 5: 96.67%
Mean Accuracy: 97.33%
```
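
Besides the mean, the spread of the fold scores shows how much the estimate depends on the particular split. A short sketch, assuming the same setup as above, that also reports the standard deviation; `scoring="accuracy"` is passed explicitly here, although it matches the SVC's default `score` method.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
kf = KFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(SVC(kernel="linear"), X, y, cv=kf, scoring="accuracy")

# Mean and standard deviation of the per-fold accuracies
print(f"Mean accuracy: {scores.mean() * 100:.2f}% (std {scores.std() * 100:.2f}%)")
```

A small standard deviation relative to the mean suggests the estimate is stable across folds; a large one suggests that more folds or repeated cross-validation may be worth trying.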

