### Cross Validation in Machine Learning:
- Cross-validation is a technique for estimating how well a machine learning model performs on unseen data. It does not prevent overfitting by itself, but it helps detect it.
- It works by:
    1. Splitting the dataset into several parts.
    2. Training the model on some parts and testing it on the remaining part.
    3. Repeating this resampling process multiple times by choosing different parts of the dataset.
    4. Averaging the results from each validation step to get the final performance.
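The four steps above can be sketched with NumPy alone. This is a minimal illustration, not a library implementation: the helper names `manual_cross_validation` and `majority_class`, and the toy data, are assumptions introduced here.

```python
import numpy as np

def manual_cross_validation(X, y, fit_predict, k=5, seed=0):
    """Return the per-fold accuracies and their mean for a k-way split."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(X))       # shuffle the sample indices once
    folds = np.array_split(indices, k)      # step 1: split into k parts
    scores = []
    for i in range(k):                      # step 3: repeat with a different test part
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        # step 2: train on the other parts, predict on the held-out part
        preds = fit_predict(X[train_idx], y[train_idx], X[test_idx])
        scores.append(float(np.mean(preds == y[test_idx])))
    return scores, float(np.mean(scores))   # step 4: average the results

# Hypothetical toy "model": always predict the most frequent training label.
def majority_class(X_train, y_train, X_test):
    values, counts = np.unique(y_train, return_counts=True)
    return np.full(len(X_test), values[np.argmax(counts)])

X_demo = np.arange(20).reshape(-1, 1)
y_demo = np.array([0] * 14 + [1] * 6)
fold_scores, mean_score = manual_cross_validation(X_demo, y_demo, majority_class, k=5)
print(fold_scores, mean_score)
```

Any function with the same `(X_train, y_train, X_test)` signature could be passed in place of the toy model.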

### Types of Cross-Validation:
#### 1. Holdout Validation:
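- In this method the dataset is split once into a training set and a test set. Here, 50% of the data is used for training and the remaining 50% for testing (splits such as 70/30 or 80/20 are also common).
- It is simple and quick to apply because the model is trained only once.
- The major drawback is that with only 50% of the data used for training, the model may miss important patterns in the other half, which can lead to high bias. The estimate also depends on which points happen to land in each half.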
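A holdout split could look like the sketch below. It assumes scikit-learn's `train_test_split` with the Iris data and a linear SVM; the `stratify=y` argument and the seed are choices made for this illustration only.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# One 50/50 split; stratify=y keeps the class proportions equal in both halves.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=42
)

model = SVC(kernel="linear").fit(X_train, y_train)   # train once on the training half
holdout_accuracy = model.score(X_test, y_test)       # evaluate once on the held-out half
print(f"Holdout accuracy: {holdout_accuracy * 100:.2f}%")
```

Changing `random_state` changes which points land in each half, which is one way to see how much a single holdout estimate can move.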
#### 2. LOOCV (Leave-One-Out Cross-Validation):
- In this method the model is trained on the entire dataset except for one data point, which is used for testing.
- This process is repeated for each data point in the dataset.
  1. Each model is trained on all but one data point (n - 1 of n), resulting in low bias.
  2. Testing on a single data point makes each individual result noisy and can cause high variance, especially if that point is an outlier.
  3. It can be very time-consuming for large datasets, as it requires one training run per data point.
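As a rough sketch, LOOCV can be run with scikit-learn's `LeaveOneOut` splitter. The Iris data and the linear SVM are assumptions chosen for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

loo = LeaveOneOut()   # one split per sample: train on n - 1 points, test on 1
loo_scores = cross_val_score(SVC(kernel="linear"), X, y, cv=loo)

# Each score is 0.0 or 1.0 because every test set holds a single point.
print(f"Number of iterations: {len(loo_scores)}")
print(f"LOOCV accuracy: {loo_scores.mean() * 100:.2f}%")
```

With 150 samples the model is fitted 150 times, which illustrates why the cost grows quickly on larger datasets.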
#### 3. Stratified Cross-Validation:
- It is a technique that ensures each fold has approximately the same class distribution as the full dataset.
- This is useful for imbalanced datasets where some classes are underrepresented.
  1. The dataset is divided into k folds, keeping class proportions consistent in each fold.
  2. In each iteration, one fold is used for testing and the remaining folds for training.
  3. This process is repeated k times so that each fold is used once as the test set.
  4. It gives a more reliable performance estimate for classification models, because every test fold contains a representative share of each class.
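The effect of stratification can be checked with scikit-learn's `StratifiedKFold`. The imbalanced labels below (90 samples of class 0, 10 of class 1) are hypothetical data made up for this sketch.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical imbalanced labels: 90 samples of class 0, 10 of class 1.
y_imb = np.array([0] * 90 + [1] * 10)
X_imb = np.zeros((len(y_imb), 1))   # feature values do not affect the split

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
minority_per_fold = []
for fold, (train_idx, test_idx) in enumerate(skf.split(X_imb, y_imb), 1):
    n_minority = int(np.sum(y_imb[test_idx] == 1))   # class-1 samples in this test fold
    minority_per_fold.append(n_minority)
    print(f"Fold {fold}: {len(test_idx)} test samples, {n_minority} from class 1")
```

Every test fold receives the same 10% share of the minority class as the full dataset, which a plain random split would not guarantee.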
#### 4. K-Fold Cross Validation:
- It splits the data into k folds of (approximately) equal size.
- The model is trained on k-1 folds and tested on the remaining fold.
- This process is repeated k times, each time using a different fold for testing.
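To make the rotation of folds concrete, the sketch below prints the train and test indices that scikit-learn's `KFold` produces. The 10-sample dataset is hypothetical, and shuffling is left off so the folds are easy to read.

```python
import numpy as np
from sklearn.model_selection import KFold

X_small = np.arange(10).reshape(-1, 1)   # hypothetical 10-sample dataset
kf_demo = KFold(n_splits=5)              # no shuffling, so folds are contiguous blocks

test_folds = []
for fold, (train_idx, test_idx) in enumerate(kf_demo.split(X_small), 1):
    test_folds.append(test_idx.tolist())
    print(f"Fold {fold}: train={train_idx.tolist()} test={test_idx.tolist()}")
```

Each sample appears in exactly one test fold and in the training set of the other four iterations.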

### Python implementation of k-fold cross-validation
#### Step 1: Importing necessary libraries

In [1]:
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, KFold
from sklearn.svm import SVC


#### Step 2: Loading the dataset

In [2]:
iris = load_iris()
X = iris.data
y = iris.target

#### Step 3: Creating SVM classifier
- SVC(): the scikit-learn class used to build the Support Vector Machine (SVM) model.
- Here, we use a linear kernel, which is suitable for linearly separable data.

In [3]:
svm_classifier = SVC(kernel='linear')

#### Step 4: Defining the number of folds for cross-validation
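- We define 5 folds, meaning the dataset will be split into 5 parts. The model will train on 4 parts and test on 1, repeating this process 5 times so that every part is used for testing exactly once.
- shuffle=True shuffles the samples before splitting, and random_state=42 makes the split reproducible.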

In [5]:
num_folds = 5
kf = KFold(n_splits=num_folds, shuffle=True, random_state=42)

#### Step 5: Performing k-fold cross-validation
- cross_val_score(): splits the data, trains the model, and evaluates it across all folds automatically. It returns one score per fold; for a classifier such as SVC, the default score is accuracy.

In [6]:
cross_val_results = cross_val_score(svm_classifier, X, y, cv=kf)
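For intuition, the work done by cross_val_score() can be approximated with an explicit loop. This is a simplified sketch of the idea, not scikit-learn's internal code; it rebuilds the same data, classifier, and splitter so that it runs on its own.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
estimator = SVC(kernel="linear")
kf = KFold(n_splits=5, shuffle=True, random_state=42)

manual_scores = []
for train_idx, test_idx in kf.split(X):
    model = clone(estimator)                  # fresh, unfitted copy for each fold
    model.fit(X[train_idx], y[train_idx])     # train on the other k-1 folds
    manual_scores.append(model.score(X[test_idx], y[test_idx]))  # accuracy on the held-out fold

auto_scores = cross_val_score(estimator, X, y, cv=kf)
print("Scores match:", np.allclose(manual_scores, auto_scores))
```

Because the splitter uses a fixed random_state, both approaches see identical folds, so their per-fold scores can be compared directly.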

#### Step 6: Evaluation metrics

In [7]:
print("Cross-Validation Results (Accuracy):")
for i, result in enumerate(cross_val_results,1):
    print(f"Fold{i}:{result * 100 :.2f}%")
print(f"Mean Accuracy :{cross_val_results.mean()*100:.2f}%")

Cross-Validation Results (Accuracy):
Fold1:100.00%
Fold2:100.00%
Fold3:96.67%
Fold4:93.33%
Fold5:96.67%
Mean Accuracy :97.33%
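
Alongside the mean, the spread of the per-fold scores indicates how stable the estimate is. The sketch below simply reuses the fold accuracies printed above as hard-coded values (an assumption made to keep the example self-contained) and reports their standard deviation.

```python
import numpy as np

# Per-fold accuracies (in %) copied from the output above.
fold_accuracies = np.array([100.00, 100.00, 96.67, 93.33, 96.67])

mean_accuracy = fold_accuracies.mean()
std_accuracy = fold_accuracies.std()   # population standard deviation across folds

print(f"Mean accuracy: {mean_accuracy:.2f}%")
print(f"Standard deviation: {std_accuracy:.2f} percentage points")
```

A small standard deviation relative to the mean suggests the model performs consistently across folds.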
