In this notebook we answer the following questions:

- Is there a statistically significant association between the iris type (the target) and the input features (petal and sepal length and width)?
- Is each kind of estimator able to detect such an association between the features and the target variable?

We use the `sklearn.datasets.load_iris` dataset and compare two classifiers: `sklearn.dummy.DummyClassifier` and `sklearn.ensemble.HistGradientBoostingClassifier`.

In [1]:
```python
# Imports for the notebook
from sklearn.datasets import load_iris
from sklearn.base import BaseEstimator
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.model_selection import permutation_test_score
import numpy as np
```

<h1>Load the data and print some information</h1>

In [2]:
```python
# Load the dataset once and keep the Bunch for its metadata
iris = load_iris()
X, y = iris.data, iris.target

# Number of samples, number of features and number of classes
n, d = X.shape
n_classes = len(np.unique(y))
print(f"n_samples: {n}, n_features: {d}, n_classes: {n_classes}", end="\n\n")

# Names of the features and standard deviation of each feature
print(f"Feature names: {iris.feature_names}")
print(f"Standard deviation of each feature: {X.std(axis=0)}", end="\n\n")

# Names of the classes and class distribution
print(f"Class names: {iris.target_names}")
print(f"Class distribution: {np.bincount(y)}")
```

```text
n_samples: 150, n_features: 4, n_classes: 3

Feature names: ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
Standard deviation of each feature: [0.82530129 0.43441097 1.75940407 0.75969263]

Class names: ['setosa' 'versicolor' 'virginica']
Class distribution: [50 50 50]
```


<h1>Remarks about the data</h1>

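- The data are measurements taken on different flowers. Nothing suggests a temporal or spatial correlation between samples, so it seems safe to assume i.i.d. samples, which is also what the permutation test below relies on (exchangeability of the samples).
- The features are homogeneous (all are lengths in cm) and of the same order of magnitude (standard deviations between about 0.43 and 1.76 cm), so scaling the data beforehand does not seem necessary. Moreover, tree-based models such as gradient boosting only depend on the ordering of each feature's values, and `DummyClassifier` ignores the features altogether.
- The classes are balanced (50 samples each). This matters because we plan to use a histogram-based gradient boosting classifier: if the classes were unbalanced, we should consider setting its `class_weight` argument to something other than `None` (for instance `"balanced"`).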

<h1> Computation of p-values for a cross-validated score with a permutation significance test </h1>

We compute these p-values for two classifiers:

- `sklearn.dummy.DummyClassifier`
- `sklearn.ensemble.HistGradientBoostingClassifier`

In [3]:
```python
# p-value of a classifier from a permutation test on its cross-validated accuracy
def permutation_p_value(clf: BaseEstimator) -> float:
    """Return the permutation-test p-value of `clf` on (X, y), using 5-fold CV accuracy."""
    _score, _permutation_scores, pvalue = permutation_test_score(
        clf, X, y, scoring="accuracy", n_permutations=100, random_state=0, n_jobs=-1
    )
    return pvalue


# Compute the p-value of each classifier
clf_dummy = DummyClassifier()
print(f"p-value for DummyClassifier: {permutation_p_value(clf_dummy):.2f}")

clf_boosting = HistGradientBoostingClassifier()
print(f"p-value for HistGradientBoostingClassifier: {permutation_p_value(clf_boosting):.2f}")
```

```text
p-value for DummyClassifier: 1.00
p-value for HistGradientBoostingClassifier: 0.01
```



<h1>Conclusion</h1>


<h3> Test </h3>

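- Null hypothesis: the input features and the labels are independent. Under this hypothesis, the cross-validated score obtained on the actual data is no better than the scores obtained after randomly permuting the labels: any apparent success of the classifier is due to chance.

- Alternative hypothesis: the classifier scores significantly better on the actual data than on label-permuted data, which shows that it has captured a dependency between the inputs and the labels.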
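The logic of this test can be sketched by hand: shuffle the labels, recompute the cross-validated accuracy, and count how often a shuffled score reaches the original one. This is a minimal illustration under our own assumptions (stratified 5-fold cross-validation, 100 permutations, and a hypothetical helper named `manual_permutation_test`); it mirrors the idea behind `permutation_test_score` rather than its exact implementation.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score


def manual_permutation_test(clf, X, y, n_permutations=100, seed=0):
    """Hypothetical helper: permutation test on the cross-validated accuracy."""
    rng = np.random.default_rng(seed)
    cv = StratifiedKFold(n_splits=5)
    # Score on the original (unshuffled) labels
    score = cross_val_score(clf, X, y, cv=cv, scoring="accuracy").mean()
    # Scores after breaking any link between X and y by shuffling the labels
    perm_scores = np.array([
        cross_val_score(clf, X, rng.permutation(y), cv=cv, scoring="accuracy").mean()
        for _ in range(n_permutations)
    ])
    # The original labelling counts as one of the permutations, hence the +1 terms
    p_value = (np.sum(perm_scores >= score) + 1) / (n_permutations + 1)
    return score, perm_scores, p_value


X, y = load_iris(return_X_y=True)
score, perm_scores, p = manual_permutation_test(DummyClassifier(), X, y)
print(f"accuracy={score:.3f}, p-value={p:.2f}")
```

With `DummyClassifier`, every permuted score equals the original one (each test fold holds 10 samples per class and the classifier always predicts the same class), so the p-value is exactly 1.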

<h3>Interpretation of the p-value</h3>

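- A large p-value (close to 1) means that the null hypothesis cannot be rejected: the classifier provides no statistical evidence of an association between the input features and the labels. This does not prove that no association exists; the classifier may simply be unable to exploit it.

- A small p-value indicates evidence against the null hypothesis and, consequently, statistical evidence of an association between the labels and the input features. For this test the p-value is `(C + 1) / (n_permutations + 1)`, where `C` is the number of permuted scores greater than or equal to the original score, so its minimal value is `1 / (n_permutations + 1)`, here `1 / 101 ≈ 0.0099` (displayed as `0.01` with two decimals).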


<h3>Conclusion for our study</h3>

| Classifier                        | p-value                                   | Conclusion                                                                                                   |
|-----------------------------------|-------------------------------------------|--------------------------------------------------------------------------------------------------------------|
| `DummyClassifier`                 | 1.00                                      | No statistical evidence of an association; expected, since this classifier ignores the input features       |
| `HistGradientBoostingClassifier`  | 0.01 (≈ 1/101, the smallest attainable)  | A pattern is learned: clear statistical evidence of an association between the input features and the labels |
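The contrast between the two rows comes from the estimators rather than from the data: `DummyClassifier` never looks at `X`, so its p-value would stay at 1 whatever the association. A small sketch (the random noise matrix is our own illustration) showing that its predictions are identical on the real features and on pure noise:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.dummy import DummyClassifier

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)
X_noise = rng.normal(size=X.shape)  # features with no link to y at all

# With the default strategy="prior", predict() always returns the class with the
# largest prior in the training labels, whatever the input features are
pred_real = DummyClassifier().fit(X, y).predict(X)
pred_noise = DummyClassifier().fit(X_noise, y).predict(X_noise)
print("Identical predictions:", np.array_equal(pred_real, pred_noise))
print("Distinct predicted classes:", np.unique(pred_real))
```

`DummyClassifier` is therefore a useful baseline for the permutation test, but it cannot answer the first question of this notebook; only an estimator that actually uses the features can.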
