# 5 - Explainability 

**Feature importances**

In [None]:
# For the Random Forest
importances = best_rf.feature_importances_
feat_names = X.columns
feat_imp_df = pd.DataFrame({'feature': feat_names, 'importance': importances}).sort_values(by='importance', ascending=False)

# Barplot
plt.figure(figsize=(10,6))
sns.barplot(x='importance', y='feature', data=feat_imp_df, palette='viridis')
plt.title("Feature Importances - Random Forest")
plt.xlabel("Importance")
plt.ylabel("Feature")
plt.tight_layout()
plt.show()


In [None]:
import shap

# Explainer for the Random Forest
explainer = shap.TreeExplainer(best_rf)
shap_values = explainer.shap_values(X_test)

# Summary plot (global view)
# Assumes shap_values is a list with one array per class; [1] = positive class (stroke)
shap.summary_plot(shap_values[1], X_test)
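
The indexing `shap_values[1]` above assumes that `shap_values` is a list with one array per class. Depending on the installed SHAP version, a tree explainer on a classifier may instead return a single array shaped `(n_samples, n_features, n_classes)`, in which case `[1]` would select the second sample rather than the positive class. A minimal sketch of a helper that handles both layouts follows; the function name `positive_class_shap` is hypothetical and not part of SHAP.

```python
import numpy as np


def positive_class_shap(shap_values, class_index=1):
    """Return a (n_samples, n_features) array of SHAP values for one class.

    Hypothetical helper (not part of the shap library).
    """
    # List layout: one (n_samples, n_features) array per class.
    if isinstance(shap_values, list):
        return np.asarray(shap_values[class_index])

    values = np.asarray(shap_values)

    # Single-array layout: (n_samples, n_features, n_classes).
    if values.ndim == 3:
        return values[:, :, class_index]

    # A 2-D array already holds a single output; return it unchanged.
    return values
```

With this helper, the summary plot call could be written as `shap.summary_plot(positive_class_shap(shap_values), X_test)`.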


In [None]:
# Example for the first sample of the test set
shap.initjs()
shap.force_plot(explainer.expected_value[1], shap_values[1][0,:], X_test.iloc[0,:])


The feature importances and SHAP plots indicate that **age, average glucose level, and BMI** are the most influential factors in predicting stroke. The SHAP summary plot confirms this globally, while the force plot illustrates how these features combine to influence the prediction for an individual patient.
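
The way features "combine" in a force plot follows from the additivity of SHAP values: for one patient, the base value plus the per-feature contributions adds up to the model output being explained. The notation below is introduced here for illustration and is not taken from the notebook: $f(x)$ is the explained output for patient $x$, $\phi_0$ corresponds to `explainer.expected_value[1]`, and $\phi_i(x)$ is the SHAP value of feature $i$ among $M$ features.

$$
f(x) = \phi_0 + \sum_{i=1}^{M} \phi_i(x)
$$

Features with a positive $\phi_i(x)$ push the prediction above the base value, and features with a negative $\phi_i(x)$ pull it below.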

**Importance of interpretability in medical applications:**
In healthcare, model predictions **are not enough on their own**: clinicians need to understand why a prediction was made in order to trust it and act on it. Interpretability techniques, like feature importances and SHAP, provide insight into which patient characteristics drive the prediction. This supports transparency, allows for clinical validation, and helps prevent decisions based solely on “black-box” models. Especially in critical scenarios such as stroke detection, this level of **explainability is essential** for patient safety, ethical responsibility, and regulatory compliance.

Overall, combining high-performing predictive models with clear interpretability makes this approach suitable for clinical decision support systems, while maintaining trust and accountability.

# 6 - What to conclude?

This notebook walked through a classification problem in the medical field. It was fairly easy to find a model with very high accuracy, but that accuracy came with a tendency to assign every sample to class 0 (non-stroke). If there is one thing to take away from this notebook, it is that we need to ask what we actually want our model to achieve.

For a predictive model in the medical field, accuracy may not be the most informative metric: our main aim is to detect everyone at risk of a stroke, which means prioritizing recall (sensitivity) on the stroke class, even if this generates more false positives.
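
To make this trade-off concrete, here is a small sketch comparing two hypothetical models on an imbalanced cohort. All numbers are invented for illustration and are not results from this notebook; `accuracy_and_recall` is a helper written for this example.

```python
def accuracy_and_recall(y_true, y_pred):
    """Compute accuracy and recall (sensitivity) for binary labels 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # strokes detected
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # healthy, not flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # strokes missed
    accuracy = (tp + tn) / len(pairs)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, recall


# Hypothetical cohort: 950 non-stroke patients (0) followed by 50 stroke patients (1).
y_true = [0] * 950 + [1] * 50

# Model A always predicts "no stroke".
always_negative = [0] * 1000

# Model B wrongly flags 100 healthy patients but catches 45 of the 50 strokes.
more_sensitive = [1] * 100 + [0] * 850 + [1] * 45 + [0] * 5

acc_a, rec_a = accuracy_and_recall(y_true, always_negative)
acc_b, rec_b = accuracy_and_recall(y_true, more_sensitive)

print(f"always negative: accuracy={acc_a:.3f}, recall={rec_a:.3f}")
# always negative: accuracy=0.950, recall=0.000
print(f"more sensitive:  accuracy={acc_b:.3f}, recall={rec_b:.3f}")
# more sensitive:  accuracy=0.895, recall=0.900
```

In this invented example, the model that never predicts a stroke has the higher accuracy yet misses every stroke patient, while the second model trades 100 false positives for detecting 45 of the 50 strokes.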

Are our models perfect? Obviously not, and there is certainly more that could be done to improve them. But the main point here was to highlight the importance of looking at more than one metric and of clearly understanding what capability we want from our classification model.