# Model Understanding

Performance metrics alone are not enough to select a model and promote it to production. While developing an ML algorithm, it is important to understand how the model behaves on the data, to examine the key factors influencing its predictions, and to consider where it may be deficient. What "success" means for an ML project depends first and foremost on the user's domain expertise.

EvalML includes a variety of tools for understanding models.

First, let's train a pipeline on some data.

In [None]:
import evalml

class RFBinaryClassificationPipeline(evalml.pipelines.BinaryClassificationPipeline):
    component_graph = ['Simple Imputer', 'Random Forest Classifier']

X, y = evalml.demos.load_breast_cancer()

pipeline = RFBinaryClassificationPipeline({})
pipeline.fit(X, y)
print(pipeline.score(X, y, objectives=['log_loss_binary']))

## Feature Importance

We can get the importance associated with each feature of the resulting pipeline.

In [None]:
pipeline.feature_importance

We can also create a bar plot of the feature importances.

In [None]:
pipeline.graph_feature_importance()

## Permutation Importance

We can also compute and plot [the permutation importance](https://scikit-learn.org/stable/modules/permutation_importance.html) of the pipeline.

In [None]:
evalml.pipelines.calculate_permutation_importance(pipeline, X, y, 'log_loss_binary')

In [None]:
evalml.pipelines.graph_permutation_importance(pipeline, X, y, 'log_loss_binary')
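
To make the idea behind permutation importance concrete, here is a minimal pure-Python sketch. It is not EvalML's implementation; the names `permutation_importance`, `toy_predict`, `error_rate`, `X_toy`, and `y_toy` are hypothetical and exist only for illustration. The sketch shuffles one feature column at a time and records how much the loss increases relative to the unshuffled baseline.

```python
import random

def permutation_importance(predict, X, y, loss, n_repeats=5, seed=0):
    """Mean increase in loss when each feature column is shuffled.

    predict: callable mapping a list of rows to a list of predictions
    X: list of rows, each row a list of feature values
    loss: callable (y_true, y_pred) -> float, lower is better
    """
    rng = random.Random(seed)
    baseline = loss(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle only column j; every other column stays untouched.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(loss(y, predict(X_perm)) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances

# Hypothetical toy model that only looks at feature 0.
def toy_predict(rows):
    return [1 if row[0] > 0.5 else 0 for row in rows]

def error_rate(y_true, y_pred):
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)

X_toy = [[0.1, 5.0], [0.9, 3.0], [0.2, 8.0], [0.8, 1.0], [0.3, 2.0], [0.7, 9.0]]
y_toy = [0, 1, 0, 1, 0, 1]
toy_importances = permutation_importance(toy_predict, X_toy, y_toy, error_rate)
print(toy_importances)
```

Because the toy model ignores feature 1, shuffling that column never changes its predictions, so its importance is exactly zero. A feature the model relies on can only keep the loss the same or make it worse here, since the baseline error is zero.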

## Confusion Matrix

For binary or multiclass classification, we can view a [confusion matrix](https://en.wikipedia.org/wiki/Confusion_matrix) of the classifier's predictions.

In [None]:
y_pred = pipeline.predict(X)
evalml.pipelines.graph_utils.graph_confusion_matrix(y, y_pred)
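
As a rough illustration of what the plot summarizes, the sketch below counts (true label, predicted label) pairs by hand. The helper `confusion_matrix` and the toy labels are hypothetical and not part of EvalML; the sketch assumes rows correspond to true labels and columns to predicted labels.

```python
from collections import Counter

def confusion_matrix(y_true, y_pred, labels):
    """Rows are true labels, columns are predicted labels."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]

toy_true = ['benign', 'benign', 'malignant', 'malignant', 'benign']
toy_pred = ['benign', 'malignant', 'malignant', 'malignant', 'benign']
toy_matrix = confusion_matrix(toy_true, toy_pred, labels=['benign', 'malignant'])
print(toy_matrix)  # [[2, 1], [0, 2]]
```

The diagonal holds the correct predictions; the single off-diagonal count is a benign sample that was predicted as malignant.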

## Precision-Recall Curve

For binary classification, we can view the precision-recall curve of the pipeline.

In [None]:
# encode the labels as 0/1 with "benign" as the positive class, then get its predicted probabilities
y = y.map({'malignant': 0, 'benign': 1})
y_pred_proba = pipeline.predict_proba(X)["benign"]
evalml.pipelines.graph_utils.graph_precision_recall_curve(y, y_pred_proba)
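
Each point on a precision-recall curve comes from one decision threshold. The following sketch, with the hypothetical helper `precision_recall_at` and made-up scores, shows how a single point could be computed. It assumes labels are encoded as 0/1 with 1 as the positive class, and it treats precision as 1.0 when nothing is predicted positive, which is a convention rather than a rule.

```python
def precision_recall_at(y_true, scores, threshold):
    """Precision and recall when scores >= threshold are predicted positive."""
    predicted = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, predicted) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, predicted) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, predicted) if t == 1 and p == 0)
    # Assumed convention: precision is 1.0 when there are no positive predictions.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

pr_labels = [0, 0, 1, 1]
pr_scores = [0.1, 0.4, 0.35, 0.8]
pr_points = [precision_recall_at(pr_labels, pr_scores, t) for t in (0.1, 0.35, 0.4, 0.8)]
for point in pr_points:
    print(point)
```

Raising the threshold from 0.1 to 0.8 moves this toy classifier from perfect recall with 50% precision to perfect precision with 50% recall, which is the trade-off the curve visualizes.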

## ROC Curve

For binary and multiclass classification, we can view the [Receiver Operating Characteristic (ROC) curve](https://en.wikipedia.org/wiki/Receiver_operating_characteristic) of the pipeline.

In [None]:
# get the predicted probabilities associated with the "benign" label
y_pred_proba = pipeline.predict_proba(X)["benign"]
evalml.pipelines.graph_utils.graph_roc_curve(y, y_pred_proba)
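
For intuition about what the ROC plot shows, here is a small pure-Python sketch that builds the curve's points and the area under it with the trapezoidal rule. The helpers `roc_points` and `trapezoid_auc` and the toy data are hypothetical, not EvalML functions, and the sketch assumes 0/1 labels with at least one sample of each class.

```python
def roc_points(y_true, scores):
    """(false positive rate, true positive rate) pairs, strictest threshold first."""
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos  # assumes both classes are present
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points

def trapezoid_auc(points):
    """Area under a piecewise-linear curve given as (x, y) points sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))

roc_labels = [0, 0, 1, 1]
roc_scores = [0.1, 0.4, 0.35, 0.8]
toy_roc = roc_points(roc_labels, roc_scores)
toy_auc = trapezoid_auc(toy_roc)
print(toy_roc)
print(toy_auc)  # 0.75
```

An area of 0.5 corresponds to random ranking and 1.0 to a perfect ranking of positives above negatives; the toy scores above land in between because one negative sample is scored higher than one positive sample.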