# Classgraphic Iris (multiclass) demo

For demos, classgraphic provides a convenience module, `essential`, so you can get started right away with a single `import *`:

In [None]:
from classgraphic.essential import *

## Settings

In [None]:
random_state = 42
max_iter = 200

## Data wrangling

We will load the Iris data set from plotly express, which is already available as `px` (thanks to `classgraphic.essential`). We then describe the dataframe and visually check for missing values.

In [None]:
# loading the data
df = px.data.iris()

# let's see what kind of data we have
describe(df, transpose=True).show()

# any missing?
missing(df)

In [None]:
# making one value NaN on purpose
df.iloc[2, 2] = np.nan

# any missing now?
missing(df)

Given the warning above, and since only one row is affected, we will drop it.

In [None]:
df.dropna(inplace=True)
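
For readers who want to see the same check without the classgraphic helpers, here is a minimal sketch in plain pandas. The `toy` frame and its values are made up for illustration (they are not the Iris data), and this is not a description of how `missing` is implemented.

```python
import numpy as np
import pandas as pd

# Small stand-in frame (hypothetical values, not the Iris data)
toy = pd.DataFrame(
    {
        "sepal_length": [5.1, 4.9, 4.7, 4.6],
        "petal_length": [1.4, 1.4, np.nan, 1.5],
    }
)

# Number of missing values per column
missing_per_column = toy.isna().sum()
print(missing_per_column)

# Rows that contain at least one missing value
rows_with_missing = toy[toy.isna().any(axis=1)]
print(rows_with_missing)

# Drop those rows; this returns a new frame instead of modifying in place
cleaned = toy.dropna()
print(len(toy), "->", len(cleaned))
```

Dropping is reasonable when only a handful of rows are affected, as here; with many missing values, imputation would usually be worth considering instead.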

## Building a model

We are now ready to assign our X (features) and y (target):

In [None]:
# features
X = df.drop(columns=["species", "species_id"])

# target
y = df["species"]

# let's check the classes we will be training on and predicting
class_imbalance_table(y, condition="all")

The classes are well balanced, even though we removed one observation. Our train/test split will have much more impact on the class balance, especially since we will not specify `stratify=y`...

In [None]:
# train / test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=random_state
)

# we want to see the total count for each class; bars are stacked by default, so that works
# we could also pass barmode="overlay" to class_imbalance if we prefer
class_imbalance(y_train, y_test, condition="train,test")
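
To make the effect of `stratify` concrete, the sketch below compares a plain split with a stratified one. It is an illustration only: it uses scikit-learn's own copy of Iris via `load_iris` (the full 150 rows, so without the row dropped above), and the `*_demo`, `*_plain` and `*_strat` names are introduced here.

```python
from collections import Counter

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# scikit-learn's copy of Iris: 150 rows, 50 per class (labels 0, 1, 2)
X_demo, y_demo = load_iris(return_X_y=True)

# Plain random split, same settings as in this notebook
_, _, y_train_plain, y_test_plain = train_test_split(
    X_demo, y_demo, test_size=0.5, random_state=42
)

# Stratified split: class proportions are preserved in both halves
_, _, y_train_strat, y_test_strat = train_test_split(
    X_demo, y_demo, test_size=0.5, random_state=42, stratify=y_demo
)

plain_counts = Counter(y_train_plain.tolist())
strat_counts = Counter(y_train_strat.tolist())
print("train counts, plain:     ", sorted(plain_counts.items()))
print("train counts, stratified:", sorted(strat_counts.items()))
```

With stratification, each class contributes 25 of the 75 training rows; without it, the counts depend on the random draw.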

We are now ready to train our model, a straightforward LogisticRegression classifier.

In [None]:
# model
model = LogisticRegression(max_iter=max_iter, random_state=random_state)
model.fit(X_train, y_train)

# predictions
y_score = model.predict_proba(X_test)
y_pred = model.predict(X_test)

## Evaluating the model performance

We will start with a confusion matrix. This will also give us the Matthews correlation coefficient (MCC):

In [None]:
confusion_matrix_table(model, y_test, y_pred)
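
As a point of reference, the same two quantities can be computed directly with scikit-learn. This is a minimal sketch on six hand-written labels (hypothetical data, not the notebook's predictions), and it does not claim to reproduce what `confusion_matrix_table` does internally.

```python
from sklearn.metrics import confusion_matrix, matthews_corrcoef

labels = ["setosa", "versicolor", "virginica"]

# Hypothetical labels: one versicolor is mistaken for virginica
y_true_demo = ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"]
y_pred_demo = ["setosa", "setosa", "versicolor", "virginica", "virginica", "virginica"]

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_true_demo, y_pred_demo, labels=labels)
print(cm)

# MCC summarizes the whole matrix in one number between -1 and 1
mcc = matthews_corrcoef(y_true_demo, y_pred_demo)
print(round(mcc, 3))
```

Off-diagonal entries are the errors: here the single non-zero one sits in the versicolor row and the virginica column.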

We can also look at several metrics, per class and as macro and weighted averages:

In [None]:
classification_table(model, y_test, y_pred)
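
The difference between macro and weighted averages is easiest to see on imbalanced labels. The sketch below uses recall and a small made-up label set (class 0 has six samples, classes 1 and 2 have two each); the names are introduced here for illustration.

```python
import numpy as np
from sklearn.metrics import recall_score

# Hypothetical, deliberately imbalanced labels
y_true_demo = np.array([0, 0, 0, 0, 0, 0, 1, 1, 2, 2])
y_pred_demo = np.array([0, 0, 0, 0, 0, 0, 1, 2, 2, 1])

# One recall value per class
per_class = recall_score(y_true_demo, y_pred_demo, average=None)

# Macro: plain mean of the per-class values, every class counts equally
macro = recall_score(y_true_demo, y_pred_demo, average="macro")

# Weighted: mean of the per-class values weighted by class support
weighted = recall_score(y_true_demo, y_pred_demo, average="weighted")

support = np.bincount(y_true_demo)
print(per_class, macro, weighted)
print(np.average(per_class, weights=support))  # same as `weighted`
```

The macro average is pulled down by the two small classes, while the weighted average is dominated by the large, well-predicted class.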

Let's have a look at the actual results:

In [None]:
prediction_table(model, y_test, y_pred)

We can also look at the coefficients by class or by feature:

In [None]:
feature_importance(model, y).show()
feature_importance(model, y, transpose=True).show()
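
For a multiclass `LogisticRegression`, the fitted coefficients live in `coef_`, with one row per class and one column per feature. The following sketch lays them out as a table in both orientations. It refits a model on scikit-learn's copy of Iris so that it runs on its own; `clf` and `coef_table` are names introduced here, and this is not a description of how `feature_importance` builds its output.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris = load_iris()
clf = LogisticRegression(max_iter=200, random_state=42)
clf.fit(iris.data, iris.target)

# One row per class, one column per feature
coef_table = pd.DataFrame(
    clf.coef_, index=iris.target_names, columns=iris.feature_names
)
print(coef_table.round(2))

# Transposed view: one row per feature, one column per class
print(coef_table.T.round(2))
```

Because the features are not scaled here, coefficient magnitudes are not directly comparable across features.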

Let's have a look at what the features looked like for the first 10 predictions (adjustable):

In [None]:
fig, data = view(X_test, y_test, y_pred, extended=True)
fig.show()

In [None]:
data

And we can visualize our class errors:

In [None]:
class_error(model, y_test, y_pred)

Other useful diagnostic plots include Precision/Recall, threshold (FPR, TPR) and ROC:

In [None]:
precision_recall(model, y_test, y_score).show()
threshold(model, y_test, y_score).show()
roc(model, y_test, y_score)
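
ROC curves are defined for binary problems, so a common way to apply them to three classes is one-vs-rest: each class in turn is treated as the positive class. The sketch below assumes that approach (the source does not say which strategy `roc` uses) and refits a model on scikit-learn's copy of Iris so that it runs on its own.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize

X_demo, y_demo = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.5, random_state=42
)
clf = LogisticRegression(max_iter=200, random_state=42).fit(X_tr, y_tr)
scores = clf.predict_proba(X_te)

# One indicator column per class: 1 where the true label is that class
y_te_bin = label_binarize(y_te, classes=clf.classes_)

roc_auc = {}
for k, label in enumerate(clf.classes_):
    # Class k against all others, scored by its predicted probability
    fpr, tpr, thresholds = roc_curve(y_te_bin[:, k], scores[:, k])
    roc_auc[int(label)] = auc(fpr, tpr)

print(roc_auc)
```

Each entry of `roc_auc` is the area under one class's curve; `fpr`, `tpr` and `thresholds` are the arrays one would plot for the ROC and threshold views.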

## Looking at probabilities

With linear classification models, we usually get a good spread of probabilities, unlike some other models whose outputs cluster at or near 0 and at or near 1. Here, we can visualize the probabilities by predicted or real class:

In [None]:
prediction_histogram(model, y_test, y_score)
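
To look at the spread numerically rather than visually, one option is to bin the highest predicted probability of each test row. This is a minimal sketch under the same assumptions as the model above (refit on scikit-learn's copy of Iris so that it runs on its own); the choice of ten bins is arbitrary.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X_demo, y_demo = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.5, random_state=42
)
clf = LogisticRegression(max_iter=200, random_state=42).fit(X_tr, y_tr)

# One row per sample, one column per class; each row sums to 1
proba = clf.predict_proba(X_te)

# Probability assigned to the predicted (most likely) class
top_proba = proba.max(axis=1)

# Count how many predictions fall in each of ten equal-width bins
counts, edges = np.histogram(top_proba, bins=10, range=(0.0, 1.0))
for lo, hi, n in zip(edges[:-1], edges[1:], counts):
    print(f"{lo:.1f}-{hi:.1f}: {n}")
```

With three classes, the highest probability in a row can never be below 1/3, so the lowest bins are always empty.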