Use VC theory to get a confidence interval on the true error rate
of the LDA classifier for the iris data (from the book web site).

The data may be found at
https://archive.ics.uci.edu/dataset/53/iris

In [67]:
import numpy as np
import pandas as pd

from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import confusion_matrix, zero_one_loss
from tabulate import tabulate

## Load the iris dataset

In [68]:
# Read the data into a pandas data frame
df = pd.read_csv('../data/iris.dat')

Y = df['class'].to_numpy()
X = df[['sepal_length', 'sepal_width', 'petal_length', 'petal_width']].to_numpy()

## Perform the linear discriminant analysis

In [69]:
model = LinearDiscriminantAnalysis().fit(X, Y)

## Report the results

In [70]:
def report_confusion_matrix(response, covariate, model):
    """Print the confusion matrix (rows: true class Y, columns: predicted class h)."""
    print("Confusion matrix:")
    print(tabulate(
        np.concatenate([
            [['Y = 0'], ['Y = 1'], ['Y = 2']],
            confusion_matrix(response, model.predict(covariate))
        ], axis=1),
        headers=['h = 0', 'h = 1', 'h = 2']
    ))

def report_misclassification_rate(response, covariate, model):
    """Print the fraction of observations that the model misclassifies."""
    print(f"Misclassification rate: {zero_one_loss(response, model.predict(covariate)):.3}")

In [73]:
report_misclassification_rate(Y, X, model)
report_confusion_matrix(Y, X, model)

Misclassification rate: 0.02
Confusion matrix:
         h = 0    h = 1    h = 2
-----  -------  -------  -------
Y = 0       50        0        0
Y = 1        0       48        2
Y = 2        0        1       49
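
The reported misclassification rate can be cross-checked against the confusion matrix: it is the share of counts that lie off the diagonal. The sketch below is only an illustration. The counts are copied by hand from the table above, and the helper name `error_rate_from_confusion` is hypothetical (it is not part of scikit-learn).

```python
# Confusion matrix counts copied by hand from the output above
# (rows: true class, columns: predicted class).
confusion = [
    [50, 0, 0],
    [0, 48, 2],
    [0, 1, 49],
]


def error_rate_from_confusion(matrix):
    """Return the fraction of observations that lie off the diagonal."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return (total - correct) / total


training_error = error_rate_from_confusion(confusion)
print(f"Training error rate: {training_error:.3}")
```

Since the model is evaluated on the same data it was fitted to, this is the observed (training) error rate $\hat L_n(h)$, not the true error rate.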


## Estimate the true error rate
LDA produces linear classifiers, and the set of *all* linear classifiers on $\mathbb{R}^d$ has VC dimension $d + 1$,
so its shatter coefficient is at most $n^{d+1} + 1$.
The Vapnik-Chervonenkis inequality therefore gives
$$
    \mathbb{P} \left( \sup_h \left| \hat L_n (h) - L (h) \right| > \varepsilon \right)
    \leq 8 \left( n^{d+1} + 1 \right) e^{-n \varepsilon^2 / 32} .
$$
Setting the right-hand side equal to $\alpha$ and solving for $\varepsilon$ shows that
$$
    \hat L_n (h) \pm \varepsilon ,\,
    \varepsilon = \sqrt{ \frac{32}{n} \log \left( \frac{8 \left( n^{d+1} + 1 \right)}{\alpha} \right) }
$$
is a $1-\alpha$ confidence interval for the true error rate.
Here $d = 4$ (since $X$ has *four* features) and $n = 150$,
so for $\alpha = 0.05$ this yields
$$
    \varepsilon = \sqrt{ \frac{32}{150} \log \left( \frac{8 \left( {150}^5 + 1 \right)}{0.05} \right) }
    \approx \sqrt{6.43} \approx 2.54
$$

In [79]:
# Computing the half-width of the confidence interval
n = len(Y)
d = X.shape[1]

alpha = 0.05

# The VC bound gives epsilon squared, so take the square root
epsilon = np.sqrt((32/n)*np.log(8*(n**(d + 1) + 1)/alpha))
print(f"Half-width = {epsilon:.3}")

Half-width = 2.54


## Conclusion
We note that the half-width is larger than 1, which means that the confidence interval
we obtain from it is **useless**, since it covers all of the interval $[0, 1]$.
In other words: all we are able to conclude is that the true error rate is between $0$ and $1$,
which we already knew; this particular confidence interval gives us no new information.

Note that, as shown below, in the same setting where $d = 4$ and we are using linear classifiers,
the confidence interval as constructed above would have a half-width of about $0.5$ only when $n \approx 6{,}250$
(i.e. roughly forty times more data points than we currently have).

In [84]:
n = 6250
d = 4

alpha = 0.05

epsilon = np.sqrt((32/n)*np.log(8*(n**(d + 1) + 1)/alpha))
print(f"Half-width = {epsilon:.3}")

Half-width = 0.5
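
The sample size needed for a given half-width can also be searched for directly rather than guessed. The following is a minimal sketch, assuming the VC bound $\varepsilon^2 = \frac{32}{n} \log \left( 8 \left( n^{d+1} + 1 \right) / \alpha \right)$ for linear classifiers. The helper names `vc_half_width` and `smallest_n` are hypothetical, and the search assumes that the half-width decreases as $n$ grows, which is the case for $d = 4$ and $\alpha = 0.05$.

```python
import math


def vc_half_width(n, d, alpha):
    """Half-width from eps^2 = (32/n) * log(8 * (n^(d+1) + 1) / alpha)."""
    return math.sqrt((32 / n) * math.log(8 * (n ** (d + 1) + 1) / alpha))


def smallest_n(target, d, alpha):
    """Smallest n whose half-width is at most `target`.

    Assumes the half-width is decreasing in n over the search range.
    """
    # Double n until the half-width drops to the target or below.
    hi = 1
    while vc_half_width(hi, d, alpha) > target:
        hi *= 2
    lo = hi // 2  # the half-width still exceeds the target here
    # Bisect between lo (too wide) and hi (narrow enough).
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if vc_half_width(mid, d, alpha) <= target:
            hi = mid
        else:
            lo = mid
    return hi


n_needed = smallest_n(target=0.5, d=4, alpha=0.05)
print(f"Smallest n with half-width <= 0.5: {n_needed}")
print(f"Half-width at that n: {vc_half_width(n_needed, 4, 0.05):.4f}")
```

Even a half-width of $0.5$ is of limited practical use, since the resulting interval still spans most of $[0, 1]$; a smaller `target` shows how quickly the required sample size grows.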
