## 📊 What is Cross-Validation Score?

**Cross-validation score** refers to the performance metric(s) you get when evaluating a model using cross-validation rather than a single train/test split. In `scikit-learn`, the helper function `cross_val_score()` automates this process.

---

### K-Fold Splitting
- Your dataset is split into **K** equally (or nearly equally) sized **folds**.
- For each of the **K** iterations:
  - One fold is held out as the **validation set**.
  - The model is trained on the remaining **K – 1** folds.
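The splitting scheme above can be sketched with scikit-learn's `KFold` on a toy array. The ten-sample array `X` and the choice of K = 5 are illustrative assumptions, not part of the notebook's data. Note that for classifiers, `cross_val_score()` with an integer `cv` uses the stratified variant (`StratifiedKFold`), which also keeps class proportions roughly equal in each fold.

```python
import numpy as np
from sklearn.model_selection import KFold

# Ten toy samples with one feature each (illustrative data only)
X = np.arange(10).reshape(-1, 1)

# K = 5 folds, no shuffling, so folds are contiguous blocks of indices
kf = KFold(n_splits=5)
splits = list(kf.split(X))

for i, (train_idx, val_idx) in enumerate(splits):
    # Each iteration holds out one fold and trains on the other K - 1
    print(f"Fold {i}: train={train_idx}, validation={val_idx}")
```

Every sample index appears in exactly one validation fold, so each sample is used for validation once and for training K - 1 times.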

---

### Model Evaluation
- After training, the model is scored on the validation fold using your chosen metric (e.g., R², mean squared error, accuracy).
- This process repeats for each fold, producing **K separate scores**.
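To make the per-fold evaluation concrete, here is a rough sketch of the loop written by hand and compared with `cross_val_score()`. The model, `max_iter=1000`, and the accuracy metric are illustrative choices for this sketch, not the settings used later in the notebook.

```python
import numpy as np
from sklearn import datasets
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = datasets.load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)  # illustrative setting
skf = StratifiedKFold(n_splits=5)          # deterministic: no shuffling

manual_scores = []
for train_idx, val_idx in skf.split(X, y):
    fold_model = clone(model)                    # fresh, unfitted copy per fold
    fold_model.fit(X[train_idx], y[train_idx])   # train on K - 1 folds
    preds = fold_model.predict(X[val_idx])       # predict on the held-out fold
    manual_scores.append(accuracy_score(y[val_idx], preds))

manual_scores = np.array(manual_scores)

# The helper performs the same loop internally
helper_scores = cross_val_score(model, X, y, cv=skf, scoring="accuracy")

print("Manual loop:     ", manual_scores)
print("cross_val_score: ", helper_scores)
```

Passing the same splitter object to both paths is what makes the two score arrays comparable fold by fold.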

---

### Aggregating Results
- `cross_val_score()` returns an array of the **K scores**.
- You typically compute:
  - The **mean score**: to estimate overall model performance.
  - The **standard deviation**: to assess score variability across folds.
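A minimal sketch of this aggregation step, assuming an `SVC` scored by accuracy on the Iris data (both are illustrative choices; the names `mean_score` and `std_score` are hypothetical):

```python
from sklearn import datasets
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

iris = datasets.load_iris()

# One score per fold, returned as a NumPy array of length K
scores = cross_val_score(SVC(), iris.data, iris.target, cv=5, scoring="accuracy")

mean_score = scores.mean()  # estimate of overall performance
std_score = scores.std()    # spread of the scores across folds

print(f"Accuracy: {mean_score:.3f} +/- {std_score:.3f}")
```

Reporting the two numbers together makes it easier to judge whether a difference between models is larger than the fold-to-fold noise.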

---

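Compared with a single train/test split, this gives a more reliable estimate of model performance, because every sample is used for validation exactly once. A large spread between fold scores is a sign that the model is unstable, i.e. sensitive to which samples it happens to be trained on.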


In [1]:
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

## Load the Iris Dataset from scikit-learn

In [2]:
iris = datasets.load_iris()

## Explore the Structure of the Iris Dataset

In [3]:
dir(iris)

['DESCR',
 'data',
 'data_module',
 'feature_names',
 'filename',
 'frame',
 'target',
 'target_names']
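The attributes listed above can be inspected directly. The following sketch (the variable name `class_counts` is only illustrative) prints the dataset's shape and class balance, which determine how many samples end up in each fold:

```python
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()

# 150 samples with 4 numeric features each
print("data shape:", iris.data.shape)
print("feature names:", iris.feature_names)

# Targets are integer codes 0, 1, 2 that index into target_names
print("target names:", list(iris.target_names))

# Number of samples per class
class_counts = np.bincount(iris.target)
print("samples per class:", class_counts)
```

With 150 samples split evenly across three classes, each fold of a stratified 5-fold split holds 30 samples, 10 per class.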

## 🌳 Cross-Validation with Random Forest Classifier

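In this step, we use `cross_val_score()` to evaluate a `RandomForestClassifier` with 35 decision trees (`n_estimators=35`) on the Iris dataset.
We perform 5-fold cross-validation (`cv=5`) with `scoring='r2'`, so the call returns one R² score per fold.
Note that R² is a regression metric: here it is computed on the integer class labels (0, 1, 2), so the values below are not classification accuracies. For classifiers, `scoring='accuracy'` is the more conventional choice.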


In [4]:
rf = cross_val_score(RandomForestClassifier(n_estimators=35), iris.data, iris.target, cv=5, scoring='r2')


In [5]:
print("Fold-wise scores:", rf)
print("Mean R² score:", rf.mean())

Fold-wise scores: [0.95 0.95 0.9  0.85 1.  ]
Mean R² score: 0.93
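Because the cell above passes `scoring='r2'`, the printed values are R² scores computed on the integer class labels rather than accuracies. As a hedged sketch, `cross_validate()` can report both metrics from the same folds. The `random_state=0` argument is an assumption added here for repeatability and is not in the original cell, so the numbers will not necessarily match the output above.

```python
import numpy as np
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate

iris = datasets.load_iris()

# random_state is an assumption for repeatability, not part of the original cell
forest = RandomForestClassifier(n_estimators=35, random_state=0)

# Score every fold with both metrics in a single pass
results = cross_validate(
    forest, iris.data, iris.target, cv=5, scoring=["accuracy", "r2"]
)

acc_scores = results["test_accuracy"]  # fraction of correct predictions per fold
r2_scores = results["test_r2"]         # R² on the integer labels per fold

print("Fold-wise accuracy:", acc_scores)
print("Fold-wise R²:      ", r2_scores)
print("Mean accuracy:", acc_scores.mean())
```

Each validation fold holds 30 samples, so accuracy moves in steps of 1/30, whereas R² on the labels penalizes a prediction by the squared distance between class codes.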


## 🧪 Cross-Validation with Support Vector Classifier (SVC)

Here, we evaluate an `SVC` (Support Vector Classifier) on the Iris dataset using `cross_val_score()`.

Because `cv` is left unset, `cross_val_score()` falls back to its default of 5 folds. The model is trained and scored once per fold to assess how well it generalizes.

Below, we print the R² score for each fold (the call uses `scoring='r2'`, so these are not accuracy values).


In [6]:
svm = cross_val_score(SVC(), iris.data, iris.target, scoring='r2')


In [7]:
print("Fold-wise scores:", svm)
print("Mean R² score:", svm.mean())

Fold-wise scores: [0.95 0.95 0.95 0.9  1.  ]
Mean R² score: 0.95


## 📦 Cross-Validation with Logistic Regression

In this section, we evaluate a `LogisticRegression` model on the Iris dataset using `cross_val_score()`.

The solver behind logistic regression is iterative and may stop before converging if its iteration limit is too low, so we raise the limit from the default of 100 to `max_iter=150` to give it more room.

The result is an array with one R² score per fold (the call uses `scoring='r2'`), which is a more robust measure of model performance than a single split.


In [8]:
lr = cross_val_score(LogisticRegression(max_iter=150), iris.data, iris.target, cv=5, scoring='r2')


In [10]:
print("Fold-wise scores:", lr)
print("Mean R² score:", lr.mean())

Fold-wise scores: [0.95 1.   0.9  0.95 1.  ]
Mean R² score: 0.96


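Logistic Regression achieved the highest mean score (0.96), ahead of SVC (0.95) and Random Forest (0.93). These are R² values rather than accuracies, and the gaps are small compared with the spread across folds, so the ranking should be read as tentative.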
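The comparison can also be run in a single loop that reports the mean and standard deviation for each model. This is a sketch under stated assumptions: it scores with accuracy instead of R², fixes `random_state=0` for the forest, and uses the hypothetical names `models` and `summary`. Its output is therefore not expected to reproduce the numbers above.

```python
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

iris = datasets.load_iris()

# Same three model families as above; random_state is an added assumption
models = {
    "Random Forest": RandomForestClassifier(n_estimators=35, random_state=0),
    "SVC": SVC(),
    "Logistic Regression": LogisticRegression(max_iter=150),
}

summary = {}
for name, model in models.items():
    # Every model is scored on the same deterministic 5-fold stratified split
    scores = cross_val_score(model, iris.data, iris.target, cv=5, scoring="accuracy")
    summary[name] = (scores.mean(), scores.std())

for name, (mean, std) in summary.items():
    print(f"{name:20s} accuracy = {mean:.3f} +/- {std:.3f}")
```

Looking at the standard deviation next to each mean helps decide whether one model is meaningfully better or whether the difference is within fold-to-fold variation.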