## 0. Imports

In [3]:
import pandas as pd
import numpy as np

from interpret_extension import show
from interpret_extension.glassbox import LinearDiscriminantAnalysisClassifier

from sklearn.model_selection import train_test_split

## 1. Loading IRIS Dataset

Let's load the well-known Iris dataset and give its columns readable names:

In [4]:
iris = pd.read_csv('data/iris.csv')
iris.columns = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species']

Next, let's turn it into a binary problem and separate the features (**X**) from the target (**y**):

In [5]:
iris['species'] = np.where(iris['species'] == 'Iris-setosa', 1, 0)

X = iris.drop('species', axis=1)
y = iris['species']

Iris-versicolor and Iris-virginica now form a single class (the **negative class**, 0), and Iris-setosa is the **positive class** (1).
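As a quick sanity check of this encoding, here is a minimal sketch on a tiny hand-made table. The `toy` frame and its values are hypothetical stand-ins for `data/iris.csv`; only the `np.where` line mirrors the cell above.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature stand-in for data/iris.csv (two rows per species).
toy = pd.DataFrame({
    "petal_length": [1.4, 1.3, 4.7, 4.5, 6.0, 5.1],
    "species": ["Iris-setosa", "Iris-setosa",
                "Iris-versicolor", "Iris-versicolor",
                "Iris-virginica", "Iris-virginica"],
})

# Same encoding as in the cell above: setosa -> 1, everything else -> 0.
toy["species"] = np.where(toy["species"] == "Iris-setosa", 1, 0)

# Count how many rows ended up in each class.
class_counts = toy["species"].value_counts().to_dict()
print(class_counts)
```

If `data/iris.csv` is the standard 150-row dataset with 50 flowers per species, the same check on `iris` should report 100 negatives and 50 positives, so the binary problem is imbalanced 2:1.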

Finally, let's split the data into a training set (80%) and a test set (20%):

In [6]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
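To see what this call produces, here is a small sketch on placeholder arrays. `X_demo` and `y_demo` are hypothetical and only copy the shape of the Iris data (150 rows, 4 features); the `train_test_split` arguments are the same as above.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in with the same shape as the Iris features (150 x 4).
X_demo = np.arange(600).reshape(150, 4)
y_demo = np.array([1] * 50 + [0] * 100)

# Same call as above: 20% of the rows are held out for testing.
Xa_train, Xa_test, ya_train, ya_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42)

# Repeating the call with the same random_state gives the same split.
Xb_train, Xb_test, yb_train, yb_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42)

print(Xa_train.shape, Xa_test.shape)
print(np.array_equal(Xa_test, Xb_test))
```

With 150 rows this leaves 120 samples for training and 30 for testing, and fixing `random_state` keeps the split reproducible between runs.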

## 2. Linear Discriminant Analysis Model

Let's now use an LDA model. As we know, it is a linear classifier: it separates the classes with a linear decision boundary (a hyperplane in feature space).
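For reference, the textbook two-class LDA rule can be written as below. The symbols are standard notation rather than names from `interpret_extension`: $\mu_0, \mu_1$ are the class means, $\Sigma$ is the covariance matrix shared by both classes, and $\pi_0, \pi_1$ are the class priors.

$$
\delta(x) = w^\top x + b, \qquad
w = \Sigma^{-1}(\mu_1 - \mu_0), \qquad
b = -\tfrac{1}{2}\left(\mu_1^\top \Sigma^{-1} \mu_1 - \mu_0^\top \Sigma^{-1} \mu_0\right) + \log\frac{\pi_1}{\pi_0}
$$

A sample is assigned to the positive class when $\delta(x) > 0$, so the decision boundary $\delta(x) = 0$ is a hyperplane.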

In [7]:
LDA_model = LinearDiscriminantAnalysisClassifier()

Let's fit the model:

In [8]:
LDA_model.fit(X_train, y_train)

<interpret_extension.glassbox._lineardiscriminantanalysis.LinearDiscriminantAnalysisClassifier at 0x2049624e020>

What do the predictions on the test set look like?

In [9]:
pred = LDA_model.predict(X_test)
pred

array([0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0,
       1, 0, 0, 0, 0, 0, 1, 1], dtype=int64)

In [10]:
print(LDA_model.score(X_test, y_test))

1.0
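Assuming `score` follows the scikit-learn convention and returns the mean accuracy (an assumption about the `interpret_extension` wrapper, not something shown here), a value of 1.0 means that all 30 test samples were classified correctly. The sketch below computes accuracy and a confusion matrix by hand on hypothetical labels:

```python
import numpy as np

# Hypothetical labels and predictions, for illustration only.
y_true = np.array([0, 1, 0, 0, 1, 1, 0, 0])
y_hat = np.array([0, 1, 0, 1, 1, 0, 0, 0])

# Accuracy: fraction of predictions that match the true labels.
accuracy = float(np.mean(y_true == y_hat))

# 2x2 confusion matrix: rows = true class, columns = predicted class.
confusion = np.zeros((2, 2), dtype=int)
for t, p in zip(y_true, y_hat):
    confusion[t, p] += 1

print(accuracy)
print(confusion)
```

On an imbalanced problem like this one, the confusion matrix is worth a look in addition to the single accuracy number.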


Let's inspect some of the fitted LDA parameters used to make the predictions, namely the coefficients and the covariance matrix:

In [11]:
LDA_model.model.coef_

array([[  2.26813213,  13.13883125, -10.60256395,  -3.39451531]])
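To make these numbers easier to read, here is a small sketch that pairs each coefficient with its feature name and sorts by magnitude. It assumes that `LDA_model.model` behaves like a scikit-learn estimator, where a positive coefficient in a binary problem pushes the score towards class 1; the values are copied from the output above.

```python
features = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
# Coefficients copied from the output above.
coef = [2.26813213, 13.13883125, -10.60256395, -3.39451531]

# Sort features by absolute coefficient, largest first.
ranked = sorted(zip(features, coef), key=lambda fc: abs(fc[1]), reverse=True)

for name, c in ranked:
    direction = "raises" if c > 0 else "lowers"
    print(f"{name:>12}: {c:+8.3f}  -> a larger value {direction} the positive-class score")
```

Raw coefficients depend on the scale of each feature, so this ranking is only a rough guide to importance.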

In [12]:
LDA_model.model.covariance_

array([[0.33748229, 0.12122188, 0.30169583, 0.10831875],
       [0.12122188, 0.12445729, 0.10759583, 0.06044792],
       [0.30169583, 0.10759583, 0.45351458, 0.19006458],
       [0.10831875, 0.06044792, 0.19006458, 0.12114375]])
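To show how a covariance matrix and the class means turn into linear coefficients, here is a minimal NumPy sketch of two-class LDA on synthetic data. It uses the textbook formula w = Σ⁻¹(μ₁ − μ₀) with equal priors and is not necessarily how `interpret_extension` or its underlying estimator computes `coef_`; all names and data are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic two-class data with a shared covariance (illustration only).
shared_cov = np.array([[1.0, 0.3], [0.3, 0.5]])
X0 = rng.multivariate_normal([0.0, 0.0], shared_cov, size=200)
X1 = rng.multivariate_normal([4.0, 4.0], shared_cov, size=200)

# Class means and pooled within-class covariance.
mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
centered = np.vstack([X0 - mu0, X1 - mu1])
pooled_cov = centered.T @ centered / len(centered)

# Linear discriminant: w = Sigma^{-1} (mu1 - mu0), with equal priors.
w = np.linalg.solve(pooled_cov, mu1 - mu0)
b = -0.5 * (mu1 + mu0) @ w

# A sample is assigned to class 1 when w.x + b > 0.
X_all = np.vstack([X0, X1])
y_all = np.r_[np.zeros(200, dtype=int), np.ones(200, dtype=int)]
y_pred = (X_all @ w + b > 0).astype(int)
accuracy = float(np.mean(y_pred == y_all))
print(w, b, accuracy)
```

The pooled covariance plays the role of `covariance_` here, and `w` plays the role of `coef_`.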

Now let's see the InterpretML visualizations:

In [13]:
LDA_global_explanation = LDA_model.explain_global()
show(LDA_global_explanation)

We can see one linear function per feature, each indicating a clear tendency. Going by the signs of the coefficients printed above, higher values of **sepal_length** and **sepal_width** increase the probability of belonging to the positive class, while higher values of **petal_length** and **petal_width** decrease it. Equivalently, smaller values of **sepal_width** are associated with a lower likelihood of belonging to the positive class.

In [14]:
LDA_local_explanation = LDA_model.explain_local(X_test, y_test)
show(LDA_local_explanation)
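For a linear model, a local explanation is typically built from per-feature contributions of the form coefficient × feature value. The sketch below illustrates that idea with the coefficients printed earlier and a hypothetical setosa-like sample; whether `explain_local` in `interpret_extension` computes its bars exactly this way is an assumption, and the intercept is left out because it is not shown in this notebook.

```python
import numpy as np

features = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
# Coefficients copied from the LDA_model.model.coef_ output above.
coef = np.array([2.26813213, 13.13883125, -10.60256395, -3.39451531])

# Hypothetical flower with setosa-like measurements (cm), not from the test set.
sample = np.array([5.1, 3.5, 1.4, 0.2])

# Per-feature contribution to the linear score (intercept omitted).
contributions = coef * sample

for name, value in zip(features, contributions):
    print(f"{name:>12}: {value:+.3f}")
```

Positive contributions push this sample towards the positive class and negative ones push it away, which is how the bars in a local plot can be read.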