# XAI CODE DEMO

## Explainable AI Specialization on Coursera

If you experience high latency while running this notebook, you can open it in Google Colab:

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/explainable-machine-learning/interpretable-ml/blob/main/rulefit_interpretability.ipynb)

# RuleFit

* RuleFit learns a sparse linear model over the original features plus a number of new features that are decision rules
* These rule features capture interactions between the original features
* The rules are generated from decision trees trained to predict the outcome of interest

Steps:
1. Generate Rules
2. Create Sparse Linear Model

In this code demo, we will implement two versions of RuleFit. The first uses the imodels Python library; the second uses only scikit-learn. Both implementations use a gradient-boosted tree ensemble and a Lasso linear model.

In [None]:
!pip install --upgrade imodels scikit-learn --quiet

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
sklearn-compat 0.1.3 requires scikit-learn<1.7,>=1.2, but you have scikit-learn 1.7.0 which is incompatible.

In [None]:
import numpy as np
import pandas as pd
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import OneHotEncoder

from imodels import RuleFitRegressor

#### Dataset

We will use the Diabetes dataset for this demonstration: [LINK](https://www.geeksforgeeks.org/sklearn-diabetes-dataset/)

This dataset is from [Efron et al.](https://hastie.su.domains/Papers/LARS/LeastAngle_2002.pdf)

The features in this dataset are age, sex, bmi, blood pressure (bp), and six serum measurements (s1-s6).
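
A quick look at the data can make the later rule thresholds easier to read. The sketch below assumes the default scaled version returned by `load_diabetes`, in which each column is mean-centered and rescaled so that its squared values sum to 1; the variable names `X_demo` and `y_demo` are illustrative and not used elsewhere in this notebook.

```python
import numpy as np
from sklearn.datasets import load_diabetes

# Load the data as a pandas DataFrame (features) and Series (target)
X_demo, y_demo = load_diabetes(return_X_y=True, as_frame=True)

print(X_demo.shape)            # (number of patients, number of features)
print(list(X_demo.columns))    # feature names

# With the default scaling, the squared values of every column sum to about 1,
# so individual feature values (and split thresholds) are small numbers
print(np.round((X_demo ** 2).sum().to_numpy(), 3))
```

Because of this scaling, a threshold such as `bmi <= 0.0687` refers to the standardized BMI value, not to BMI in its original units.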

In [None]:
# Load the diabetes dataset
diabetes = load_diabetes()
X = pd.DataFrame(diabetes.data, columns=diabetes.feature_names)
y = diabetes.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

## RuleFit Implementation with *imodels*
[imodels RuleFit algorithm](https://csinva.io/imodels/rule_set/rule_fit.html#imodels.rule_set.rule_fit.RuleFit)

From documentation:
Linear model of tree-based decision rules based on the rulefit algorithm from Friedman and Popescu.

The algorithm can be used for predicting an output vector y given an input matrix X. In the first step a tree ensemble is generated with gradient boosting. The trees are then used to form rules, where the paths to each node in each tree form one rule. A rule is a binary decision if an observation is in a given node, which is dependent on the input features that were used in the splits. The ensemble of rules together with the original input features are then being input in a L1-regularized linear model, also called Lasso, which estimates the effects of each rule on the output target but at the same time estimating many of those effects to zero.
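
In symbols, the description above corresponds to a model of the following form. The notation ($a_0$, $a_k$, $b_j$, $r_k$, $\lambda$) is introduced here for illustration and is not taken from the imodels documentation; the exact scaling of the loss term depends on the Lasso solver.

$$
F(x) = a_0 + \sum_{k=1}^{K} a_k \, r_k(x) + \sum_{j=1}^{p} b_j \, x_j, \qquad r_k(x) \in \{0, 1\}
$$

$$
\min_{a_0,\, a,\, b} \; \sum_{i=1}^{N} \bigl(y_i - F(x_i)\bigr)^2 + \lambda \Bigl( \sum_{k=1}^{K} |a_k| + \sum_{j=1}^{p} |b_j| \Bigr)
$$

Here $r_k(x)$ is 1 when observation $x$ satisfies every condition of rule $k$ and 0 otherwise, $x_j$ are the original features, and the L1 penalty with weight $\lambda$ drives many of the coefficients $a_k$ and $b_j$ to exactly zero.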

### Initialize and train RuleFitRegressor

In [None]:
# Initialize the RuleFitRegressor
model = RuleFitRegressor()

# Train the model
model.fit(X_train, y_train)

| Parameter | Value |
|---|---|
| n_estimators | 100 |
| tree_size | 4 |
| sample_fract | 'default' |
| max_rules | 30 |
| memory_par | 0.01 |
| tree_generator | None |
| lin_trim_quantile | 0.025 |
| lin_standardise | True |
| exp_rand_tree_size | True |
| include_linear | True |


### Evaluate Model

In [None]:
# Make predictions on the test set
y_pred = model.predict(X_test)

# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
print(f'Mean Squared Error: {mse:.2f}')


Mean Squared Error: 2743.94


### Extract Rules

In [None]:
rule_df = model.visualize()
rule_df

| | rule | coef |
|---|---|---|
| 3 | bp | 22.54 |
| 18 | bmi <= 0.0687 and s5 <= 0.022 | -11.71 |
| 20 | bmi <= 0.01319 and s4 <= 0.05313 | -7.09 |
| 22 | s3 > -0.01579 | -0.44 |
| 17 | bmi <= 0.00942 and s3 > -0.0342 | -15.9 |
| 19 | bp <= 0.02359 and s2 <= 0.01732 | -1.35 |
| 14 | bmi <= 0.00942 and s4 <= 0.03062 | -9.6 |
| 12 | age <= 0.06895 and bmi <= 0.00565 and s3 > -0.07469 and s5 <= 0.0199 | -17.9 |
| 13 | bmi <= -0.00136 and bp <= 0.02704 | -2.85 |
| 25 | bp > -0.04814 and s5 > -0.00017 | 11.74 |
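
To read the output above: a rule contributes its coefficient to the prediction whenever all of its conditions hold, and contributes nothing otherwise. The sketch below checks three of the rules against a made-up patient; the patient values are hypothetical, and the intercept and the linear `bp` term are deliberately left out, so the sum is not a full prediction.

```python
import pandas as pd

# Three rules and their coefficients, copied from the output above
rules_and_coefs = [
    ("bmi <= 0.0687 and s5 <= 0.022", -11.71),
    ("bp > -0.04814 and s5 > -0.00017", 11.74),
    ("s3 > -0.01579", -0.44),
]

# Hypothetical patient (made-up values, in the dataset's scaled units)
patient = pd.DataFrame([{"bmi": 0.03, "s5": 0.01, "bp": 0.02, "s3": -0.02}])

contributions = {}
for rule, coef in rules_and_coefs:
    fires = bool(patient.eval(rule).iloc[0])   # True if every condition holds
    contributions[rule] = coef if fires else 0.0

for rule, value in contributions.items():
    print(f"{value:+7.2f}  {rule}")
print(f"Sum over these rules: {sum(contributions.values()):+.2f}")
```

For this patient the first two rules fire and nearly cancel each other, while the third does not fire because `s3` is below its threshold.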




---



## Implementing RuleFit without *imodels*

### Train a tree ensemble with gradient boosting

In [None]:
# Train the gradient boosting model
gb = GradientBoostingRegressor(n_estimators=100, max_depth=3, random_state=42)
gb.fit(X_train, y_train)

| Parameter | Value |
|---|---|
| loss | 'squared_error' |
| learning_rate | 0.1 |
| n_estimators | 100 |
| subsample | 1.0 |
| criterion | 'friedman_mse' |
| min_samples_split | 2 |
| min_samples_leaf | 1 |
| min_weight_fraction_leaf | 0.0 |
| max_depth | 3 |
| min_impurity_decrease | 0.0 |


### Extract rules from the tree ensemble

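Each root-to-leaf path in each tree becomes one rule, and duplicate rules are then removed. Note that this simplified version only keeps full root-to-leaf paths, whereas the description quoted earlier forms a rule from the path to every node.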

In [None]:
def extract_rules(tree, feature_names):
    """Return one rule (a list of condition strings) per root-to-leaf path."""
    rules = []
    tree_ = tree.tree_
    feature = tree_.feature
    threshold = tree_.threshold

    def traverse(node, rule):
        # scikit-learn stores -2 as the feature index of a leaf node
        if feature[node] != -2:
            name = feature_names[feature[node]]
            threshold_value = threshold[node]
            left_rule = rule + [f"{name} <= {threshold_value}"]
            right_rule = rule + [f"{name} > {threshold_value}"]
            traverse(tree_.children_left[node], left_rule)
            traverse(tree_.children_right[node], right_rule)
        else:
            # Leaf reached: store the conditions collected along the path
            rules.append(rule)

    traverse(0, [])
    return rules

# Extract rules from all trees
rules = []
for estimator in gb.estimators_:
    for tree in estimator:
        rules.extend(extract_rules(tree, X.columns))

# Deduplicate rules
rules = list(map(list, {tuple(rule) for rule in rules}))
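
Because a path can split on the same feature more than once, an extracted rule may contain redundant conditions, for example `bmi > 0.0687` together with `bmi > 0.149`. The sketch below shows one way to keep only the tightest bound per feature and direction. `simplify_rule` is a hypothetical helper written for this illustration, not part of scikit-learn or imodels, and it assumes feature names contain no spaces.

```python
def simplify_rule(rule):
    """Keep only the tightest bound for each (feature, operator) pair.

    `rule` is a list of strings such as "bmi > 0.0687", in the same
    "<name> <op> <value>" format that extract_rules produces.
    """
    tightest = {}
    for condition in rule:
        name, op, value = condition.split()
        value = float(value)
        key = (name, op)
        if key not in tightest:
            tightest[key] = value
        elif op == "<=":
            tightest[key] = min(tightest[key], value)  # smaller upper bound is stricter
        else:  # op == ">"
            tightest[key] = max(tightest[key], value)  # larger lower bound is stricter
    return [f"{name} {op} {value}" for (name, op), value in tightest.items()]


example_rule = ["s5 > -0.0433", "bmi > 0.0687", "bmi > 0.149"]
print(simplify_rule(example_rule))
```

Applying such a step before deduplication would shorten the printed rules without changing which observations they match.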


#### 🧠 What Does the traverse() Function Do?

The traverse function recursively walks through a decision tree, building the decision rules (paths) that lead to a prediction.

Each time it sees a non-leaf node, it:

1. Reads the feature being split
2. Gets the threshold value used for splitting
3. Recursively explores:
    * The left child (where feature ≤ threshold)
    * The right child (where feature > threshold)

If it reaches a leaf node, it saves the full rule path (i.e., the decisions made to get to that leaf).

#### 📊 Mini Dataset Example

Imagine this dataset:

| Age | Salary | Bought |
|---|---|---|
| 25 | 50000 | No |
| 40 | 90000 | Yes |
| 35 | 60000 | Yes |
| 22 | 40000 | No |

Let’s say a simple decision tree was trained on this data to predict whether someone will buy a product.

#### 🔍 Trained Tree Might Look Like This:

    Root: Is Age <= 30?
    ├── Yes → Leaf: Predict No
    └── No → Is Salary <= 75000?
         ├── Yes → Leaf: Predict Yes
         └── No → Leaf: Predict Yes

#### 🔁 How traverse() Works Here

We call `traverse(0, [])` to start at the root. Let’s walk through it:

* Node 0 → Age <= 30
    * Go Left: Path = [Age <= 30] → reaches Leaf → Save: ["Age <= 30"]
    * Go Right: Path = [Age > 30] → Go to Node 1
* Node 1 → Salary <= 75000
    * Go Left: Path = [Age > 30, Salary <= 75000] → Leaf → Save
    * Go Right: Path = [Age > 30, Salary > 75000] → Leaf → Save

#### ✅ Extracted Rules

So, traverse() builds these rules:

* "Age <= 30" → Predict No
* "Age > 30", "Salary <= 75000" → Predict Yes
* "Age > 30", "Salary > 75000" → Predict Yes

#### 🧩 Summary

The traverse() function builds all paths from root to leaves, capturing the exact conditions the tree uses to make predictions.

This is how tree models can be explained in human terms like:

* "If age is more than 30 and salary is less than or equal to 75,000 → Predict YES"
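
The walk-through above can be reproduced without training a model. The sketch below builds a hand-made stand-in for the `tree_` attribute of a fitted scikit-learn tree, using the same array layout (`feature`, `threshold`, `children_left`, `children_right`), and runs the same traversal logic on it. The names `toy_tree` and `extract_rules_demo` are hypothetical and exist only for this illustration.

```python
from types import SimpleNamespace

LEAF = -2  # scikit-learn uses -2 as the feature index (and threshold) of a leaf

# Node 0 is the root (Age split) and node 1 its left leaf. With depth-first,
# left-first numbering the Salary split is node 2 (called "Node 1" in the
# walk-through above), and nodes 3 and 4 are its leaves.
toy_tree = SimpleNamespace(tree_=SimpleNamespace(
    feature=[0, LEAF, 1, LEAF, LEAF],
    threshold=[30, LEAF, 75000, LEAF, LEAF],
    children_left=[1, -1, 3, -1, -1],
    children_right=[2, -1, 4, -1, -1],
))


def extract_rules_demo(tree, feature_names):
    """Same traversal as extract_rules, repeated so this cell runs on its own."""
    rules = []
    tree_ = tree.tree_

    def traverse(node, rule):
        if tree_.feature[node] != LEAF:
            name = feature_names[tree_.feature[node]]
            value = tree_.threshold[node]
            traverse(tree_.children_left[node], rule + [f"{name} <= {value}"])
            traverse(tree_.children_right[node], rule + [f"{name} > {value}"])
        else:
            rules.append(rule)  # leaf: keep the conditions collected so far

    traverse(0, [])
    return rules


toy_rules = extract_rules_demo(toy_tree, ["Age", "Salary"])
for rule in toy_rules:
    print(" AND ".join(rule))
```

Note that the traversal stores only the conditions along each path; the predicted class at the leaf is not part of the rule.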

### Convert rules to feature matrix

In [None]:
def rule_to_feature_matrix(rules, X):
    feature_matrix = np.zeros((X.shape[0], len(rules)), dtype=int)
    for i, rule in enumerate(rules):
        rule_conditions = " & ".join(rule)
        feature_matrix[:, i] = X.eval(rule_conditions).astype(int)
    return feature_matrix

rule_features_train = rule_to_feature_matrix(rules, X_train)
rule_features_test = rule_to_feature_matrix(rules, X_test)
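
To see what this matrix looks like, the sketch below applies the same `DataFrame.eval` approach to the mini Age/Salary dataset from the earlier explanation. The names `toy_X`, `toy_rules` and `toy_matrix` are illustrative only.

```python
import numpy as np
import pandas as pd

toy_X = pd.DataFrame({"Age": [25, 40, 35, 22],
                      "Salary": [50000, 90000, 60000, 40000]})
toy_rules = [["Age <= 30"],
             ["Age > 30", "Salary <= 75000"],
             ["Age > 30", "Salary > 75000"]]

# One column per rule: 1 where every condition of the rule holds, else 0
toy_matrix = np.column_stack([
    toy_X.eval(" & ".join(rule)).astype(int) for rule in toy_rules
])
print(toy_matrix)
```

Each row contains exactly one 1 here because the three rules are the leaves of a single tree, so every observation falls into exactly one of them. With rules pooled from many trees, a row will generally contain several 1s.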

### Train Lasso model

Combine rule-based features with original features

In [None]:
# Combine original features and rule-based features
X_train_combined = np.hstack([X_train, rule_features_train])
X_test_combined = np.hstack([X_test, rule_features_test])

# Train the Lasso model
lasso = Lasso(alpha=0.1)
lasso.fit(X_train_combined, y_train)

| Parameter | Value |
|---|---|
| alpha | 0.1 |
| fit_intercept | True |
| precompute | False |
| copy_X | True |
| max_iter | 1000 |
| tol | 0.0001 |
| warm_start | False |
| positive | False |
| random_state | None |
| selection | 'cyclic' |
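
One detail worth noting: the rule features are 0/1 indicators, while the original diabetes features take small values, so a single Lasso penalty does not treat the two groups alike. Friedman and Popescu suggest winsorizing each linear feature and rescaling it to a standard deviation of 0.4 before fitting, and the `lin_trim_quantile` and `lin_standardise` parameters of `RuleFitRegressor` shown earlier appear to correspond to this step. The sketch below is one possible version of that preprocessing on synthetic data; `scale_linear_terms` is a hypothetical helper, and whether it changes the results of this notebook has not been checked here.

```python
import numpy as np


def scale_linear_terms(X_fit, X_apply, trim_quantile=0.025):
    """Winsorize columns at quantiles estimated on X_fit, then rescale each
    column so the winsorized X_fit column has standard deviation 0.4."""
    X_fit = np.asarray(X_fit, dtype=float)
    X_apply = np.asarray(X_apply, dtype=float)
    lower = np.quantile(X_fit, trim_quantile, axis=0)
    upper = np.quantile(X_fit, 1 - trim_quantile, axis=0)
    fit_std = np.clip(X_fit, lower, upper).std(axis=0)
    return 0.4 * np.clip(X_apply, lower, upper) / fit_std


rng = np.random.default_rng(0)
X_small = rng.normal(scale=0.05, size=(200, 3))  # synthetic stand-in for scaled features
X_scaled = scale_linear_terms(X_small, X_small)
print(X_small.std(axis=0).round(3), X_scaled.std(axis=0).round(3))
```

In this notebook, the helper could be applied to `X_train` and `X_test` (fitting the quantiles on `X_train` in both cases) before stacking them with the rule features.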


### Evaluate model

In [None]:
# Make predictions on the test set
y_pred = lasso.predict(X_test_combined)

# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
print(f'Mean Squared Error: {mse:.2f}')

Mean Squared Error: 3701.72
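
This error is higher than the one from the imodels run. One plausible contributor, not verified here, is that a fixed `alpha=0.1` is applied to several hundred rule features, whereas the imodels model was limited to `max_rules=30`. A common alternative to a hand-picked `alpha` is to choose it by cross-validation with scikit-learn's `LassoCV`. The sketch below demonstrates the idea on synthetic data, not on the diabetes features, so its numbers say nothing about this notebook's results.

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)

# Synthetic stand-in for a rule feature matrix: 50 binary columns,
# of which only the first three influence the target
X_syn = rng.integers(0, 2, size=(300, 50)).astype(float)
y_syn = (5 * X_syn[:, 0] - 3 * X_syn[:, 1] + 2 * X_syn[:, 2]
         + rng.normal(scale=0.5, size=300))

# LassoCV tries a grid of alpha values and keeps the one with the best
# cross-validated error
lasso_cv = LassoCV(cv=5, random_state=0).fit(X_syn, y_syn)
n_selected = int(np.sum(lasso_cv.coef_ != 0))
print(f"chosen alpha: {lasso_cv.alpha_:.4f}")
print(f"non-zero coefficients: {n_selected} of {X_syn.shape[1]}")
```

In this notebook the same estimator could be fitted on `X_train_combined` and `y_train` in place of `Lasso(alpha=0.1)`.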


### Extract Rules

In [None]:
# Get the coefficients and feature names (original features + rule-based features)
coefs = lasso.coef_
original_feature_names = X.columns.tolist()
rule_feature_names = [f"Rule {i+1}" for i in range(len(rules))]
all_feature_names = original_feature_names + rule_feature_names

# Sort coefficients and feature names by absolute coefficient value
coef_feature_pairs = sorted(zip(coefs, all_feature_names), key=lambda x: abs(x[0]), reverse=True)

# Print rules
for coef, feature_name in coef_feature_pairs:
    if coef != 0:
        if "Rule" in feature_name:
            rule_index = int(feature_name.split()[1]) - 1
            rule_description = " AND ".join(rules[rule_index])
            print(f"{feature_name}: Coefficient = {coef:.4f}, Rule = {rule_description}")
        else:
            print(f"{feature_name}: Coefficient = {coef:.4f}")


Rule 223: Coefficient = 80.5429, Rule = s6 > 0.04241442494094372 AND s5 > 0.01368608744814992 AND s5 <= 0.015194719657301903
Rule 277: Coefficient = 44.0965, Rule = bp <= 0.09818317741155624 AND s1 <= -0.024272182025015354 AND s1 > -0.028400040231645107
Rule 193: Coefficient = 43.8745, Rule = s5 <= -0.0010538420465309173 AND s2 > 0.017944753170013428 AND age <= -0.0581863634288311
Rule 70: Coefficient = 40.7815, Rule = s5 > -0.04327731393277645 AND bmi > 0.06870198622345924 AND bmi > 0.14899898320436478
Rule 100: Coefficient = -40.2042, Rule = s6 <= -0.003148751042317599 AND bp > 0.016708109062165022 AND s5 <= -0.03781251050531864
Rule 203: Coefficient = -32.8557, Rule = s4 <= 0.05441997013986111 AND s5 > 0.08501111716032028 AND bp <= 0.03220093855634332
Rule 86: Coefficient = -30.4297, Rule = age <= 0.007199329789727926 AND s2 > 0.016535584814846516 AND age > 0.003566791128832847
Rule 66: Coefficient = 29.8529, Rule = s5 <= 0.016671447083353996 AND bp <= 0.054579468443989754 AND bp > 