# XAI CODE DEMO

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/AIPI-590-XAI/Duke-AI-XAI/blob/dev/interpretable-ml-example-notebooks/rulefit_interpretability.ipynb)

# RuleFit

* RuleFit learns a sparse linear model over the original features plus a set of new features that are decision rules
* These rule features capture interactions between the original features
* The rules are generated from decision trees trained to predict the outcome of interest

Steps:
1. Generate Rules
2. Create Sparse Linear Model
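
In equation form, the two steps above amount to fitting a model of roughly the following shape. This is a sketch in notation that does not appear in the notebook: $x_j$ are the $p$ original features, $r_k(x) \in \{0, 1\}$ are the $K$ rules produced in step 1, $\beta$ and $\gamma$ are the coefficients learned in step 2, and $\lambda$ is the L1 penalty strength (the `alpha` argument of scikit-learn's `Lasso`).

```latex
\hat{f}(x) = \beta_0 + \sum_{j=1}^{p} \beta_j x_j + \sum_{k=1}^{K} \gamma_k r_k(x),
\qquad
\min_{\beta,\,\gamma}\;
\frac{1}{2n} \sum_{i=1}^{n} \bigl( y_i - \hat{f}(x_i) \bigr)^2
+ \lambda \Bigl( \sum_{j=1}^{p} |\beta_j| + \sum_{k=1}^{K} |\gamma_k| \Bigr)
```

The L1 penalty is what makes the model sparse: most $\beta_j$ and $\gamma_k$ end up exactly zero, leaving a short list of features and rules to read.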

In this code demo, we will implement two versions of RuleFit. The first uses the imodels Python library; the second uses only the scikit-learn package. Both implementations use a gradient-boosted tree ensemble and a Lasso linear model.

In [1]:
!pip install --upgrade imodels scikit-learn --quiet

In [29]:
import numpy as np
import pandas as pd
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import OneHotEncoder

from imodels import RuleFitRegressor



#### Dataset

We will be using the Diabetes dataset for this demonstration: [overview of the scikit-learn Diabetes dataset](https://www.geeksforgeeks.org/sklearn-diabetes-dataset/)

This dataset is from [Efron et al.](https://hastie.su.domains/Papers/LARS/LeastAngle_2002.pdf)

The features in this dataset are age, sex, bmi, blood pressure (bp), and six serum measurements (s1-s6).
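
The rule thresholds later in this demo are small numbers such as `0.02` because scikit-learn ships these features already mean-centered and scaled. The check below is a minimal sketch of how one might confirm that; the names `X_check`, `y_check`, `col_means`, and `col_sum_sq` are illustrative and are not used elsewhere in the notebook.

```python
import numpy as np
from sklearn.datasets import load_diabetes

data = load_diabetes()
X_check = data.data      # NumPy array: one row per patient, one column per feature
y_check = data.target    # disease progression measure

# Each column is mean-centered and scaled so that its squared values sum to 1
col_means = X_check.mean(axis=0)
col_sum_sq = (X_check ** 2).sum(axis=0)

print(X_check.shape)                    # (442, 10)
print(np.allclose(col_means, 0.0))      # True
print(np.allclose(col_sum_sq, 1.0))     # True
```

Because of this scaling, a condition like `bmi <= 0.005` refers to a position relative to the sample mean, not to a raw BMI value.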

In [7]:
# Load the diabetes dataset
diabetes = load_diabetes()
X = pd.DataFrame(diabetes.data, columns=diabetes.feature_names)
y = diabetes.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)



## RuleFit Implementation with *imodels*
[imodels RuleFit algorithm](https://csinva.io/imodels/rule_set/rule_fit.html#imodels.rule_set.rule_fit.RuleFit)

From the documentation:
Linear model of tree-based decision rules based on the rulefit algorithm from Friedman and Popescu.

The algorithm can be used for predicting an output vector y given an input matrix X. In the first step a tree ensemble is generated with gradient boosting. The trees are then used to form rules, where the paths to each node in each tree form one rule. A rule is a binary decision if an observation is in a given node, which is dependent on the input features that were used in the splits. The ensemble of rules together with the original input features are then being input in a L1-regularized linear model, also called Lasso, which estimates the effects of each rule on the output target but at the same time estimating many of those effects to zero.

### Initialize and train RuleFitRegressor

In [8]:
# Initialize the RuleFitRegressor
model = RuleFitRegressor()

# Train the model
model.fit(X_train, y_train)



### Evaluate Model

In [9]:
# Make predictions on the test set
y_pred = model.predict(X_test)

# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
print(f'Mean Squared Error: {mse:.2f}')




Mean Squared Error: 2783.78


### Extract Rules

In [19]:
rule_df = model.visualize()
rule_df



| | rule | coef |
|---|---|---|
| 3 | bp | 72.75 |
| 23 | s5 > -0.03388 | 1.17 |
| 22 | bp <= 0.02359 | -3.41 |
| 20 | bmi <= 0.07894 and s5 <= 0.022 | -17.95 |
| 21 | bp <= 0.01614 | -6.44 |
| 17 | bmi <= 0.01319 and bp <= 0.08901 and s4 <= 0.05313 | -2.3 |
| 18 | s4 <= 0.06106 and s5 <= -0.00017 | -4.45 |
| 11 | bmi <= 0.00511 and s5 <= 0.01704 | -9.2 |
| 15 | bmi <= 0.00942 and s4 <= 0.03136 and s5 <= 0.02895 | -16.54 |
| 16 | bmi <= 0.00888 and s3 > -0.01947 | -13.61 |
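
One way to read these coefficients: whenever a rule's condition holds for a patient, its coefficient is added to the prediction. The sketch below works this out by hand for three of the rules listed above and a hypothetical patient (the values in `patient` are made up, not a row from the dataset). It leaves out the intercept, the linear `bp` term, and every other rule, so the result is only a partial shift and not a full prediction.

```python
# Three rules copied from the output above, as (condition, coefficient) pairs
rules_subset = [
    (lambda p: p["s5"] > -0.03388, 1.17),
    (lambda p: p["bmi"] <= 0.07894 and p["s5"] <= 0.022, -17.95),
    (lambda p: p["bmi"] <= 0.00511 and p["s5"] <= 0.01704, -9.20),
]

# Hypothetical patient with scaled feature values (an assumption for illustration)
patient = {"bmi": 0.0, "s5": 0.0}

# Keep the coefficient of every rule whose condition is satisfied
fired = [coef for condition, coef in rules_subset if condition(patient)]
partial_shift = round(sum(fired), 2)

print(fired)          # [1.17, -17.95, -9.2]
print(partial_shift)  # -25.98
```

All three rules fire for this patient, so together they lower the predicted value by about 26 units relative to a patient for whom none of them fire.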




---



## Implementing RuleFit without *imodels*

### Train a tree ensemble with gradient boosting

In [30]:
# Train the gradient boosting model
gb = GradientBoostingRegressor(n_estimators=100, max_depth=3, random_state=42)
gb.fit(X_train, y_train)



### Extract rules from the tree ensemble

Each root-to-leaf path in a tree becomes one rule. Rules that appear in more than one tree are then deduplicated.

In [31]:
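def extract_rules(tree, feature_names):
    """Return one rule (a list of condition strings) per root-to-leaf path."""
    rules = []
    tree_ = tree.tree_
    feature = tree_.feature
    threshold = tree_.threshold

    def traverse(node, rule):
        # scikit-learn marks leaf nodes with a feature index of -2
        if feature[node] != -2:
            name = feature_names[feature[node]]
            threshold_value = threshold[node]
            left_rule = rule + [f"{name} <= {threshold_value}"]
            right_rule = rule + [f"{name} > {threshold_value}"]
            traverse(tree_.children_left[node], left_rule)
            traverse(tree_.children_right[node], right_rule)
        elif rule:
            # Leaf reached: the accumulated conditions form one rule.
            # A tree that never split yields an empty rule, which is skipped.
            rules.append(rule)

    traverse(0, [])
    return rules

# Extract rules from all trees
rules = []
for estimator in gb.estimators_:
    for tree in estimator:
        rules.extend(extract_rules(tree, X.columns))

# Deduplicate rules. Set ordering is not stable across Python sessions,
# so the rule numbering used below can differ from run to run.
rules = list(map(list, {tuple(rule) for rule in rules}))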
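
To see what this kind of extraction produces, it can help to run the same idea on a tree small enough to check by hand. The sketch below is a simplified stand-in for `extract_rules`, not the notebook's function: it fits a depth-1 tree on four made-up points, and `X_toy`, `y_toy`, `stump`, and `leaf_paths` are illustrative names.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy data: the target jumps from 0 to 1 between x = 1 and x = 2
X_toy = np.array([[0.0], [1.0], [2.0], [3.0]])
y_toy = np.array([0.0, 0.0, 1.0, 1.0])

stump = DecisionTreeRegressor(max_depth=1, random_state=0).fit(X_toy, y_toy)
t = stump.tree_

def leaf_paths(node=0, path=()):
    """Yield the tuple of conditions along each root-to-leaf path."""
    if t.children_left[node] == -1:  # -1 marks a leaf in scikit-learn trees
        yield path
        return
    thr = t.threshold[node]
    yield from leaf_paths(t.children_left[node], path + (f"x <= {thr}",))
    yield from leaf_paths(t.children_right[node], path + (f"x > {thr}",))

toy_rules = list(leaf_paths())
print(toy_rules)  # [('x <= 1.5',), ('x > 1.5',)]
```

The single split lands halfway between the two groups, and each of the two leaves contributes one rule. A depth-3 tree works the same way, except that each rule chains up to three conditions.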




### Convert rules to feature matrix

In [32]:
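def rule_to_feature_matrix(rules, X):
    """Build a 0/1 matrix with one row per sample and one column per rule."""
    feature_matrix = np.zeros((X.shape[0], len(rules)), dtype=int)
    for i, rule in enumerate(rules):
        # Join the rule's conditions into a single boolean expression
        rule_conditions = " & ".join(rule)
        # 1 where the sample satisfies every condition in the rule, else 0
        feature_matrix[:, i] = X.eval(rule_conditions).astype(int)
    return feature_matrix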

rule_features_train = rule_to_feature_matrix(rules, X_train)
rule_features_test = rule_to_feature_matrix(rules, X_test)
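
The conversion above relies on `DataFrame.eval` treating `&` with the same precedence as `and`, which is why the joined conditions need no parentheses. A minimal sketch on a hypothetical three-row frame (the values in `df` are made up) shows one rule turning into one binary column:

```python
import pandas as pd

# Hypothetical scaled feature values for three patients
df = pd.DataFrame({"bmi": [0.00, 0.10, 0.03], "s5": [0.01, 0.01, 0.05]})

rule = ["bmi <= 0.07894", "s5 <= 0.022"]  # one rule = a list of conditions
expression = " & ".join(rule)             # "bmi <= 0.07894 & s5 <= 0.022"

# Inside DataFrame.eval, `&` binds like `and`, so each comparison is evaluated first
rule_column = df.eval(expression).astype(int)
print(rule_column.tolist())  # [1, 0, 0]
```

Only the first patient satisfies both conditions. The second fails the `bmi` condition and the third fails the `s5` condition.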



### Train Lasso model

Combine rule-based features with original features

In [33]:
# Combine original features and rule-based features
X_train_combined = np.hstack([X_train, rule_features_train])
X_test_combined = np.hstack([X_test, rule_features_test])

# Train the Lasso model
lasso = Lasso(alpha=0.1)
lasso.fit(X_train_combined, y_train)
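
The reason for using Lasso here is that its L1 penalty sets many coefficients to exactly zero, which is what keeps the final rule list short. The sketch below illustrates that behavior on synthetic data, not on the diabetes features; `X_syn`, `y_syn`, `sparse_model`, and the choice `alpha=0.5` are assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X_syn = rng.normal(size=(200, 5))  # five synthetic features
# Only the first feature influences the target; the rest are pure noise
y_syn = 3.0 * X_syn[:, 0] + rng.normal(scale=0.1, size=200)

sparse_model = Lasso(alpha=0.5).fit(X_syn, y_syn)

print(sparse_model.coef_)                     # noise features get a coefficient of 0
print(np.count_nonzero(sparse_model.coef_))   # number of features the model kept
```

The surviving coefficient is also shrunk below the true value of 3, which is the usual cost of L1 regularization. In the RuleFit setting, the same mechanism decides which of the several hundred rule columns stay in the model.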



### Evaluate model

In [34]:
# Make predictions on the test set
y_pred = lasso.predict(X_test_combined)

# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
print(f'Mean Squared Error: {mse:.2f}')

Mean Squared Error: 3695.81
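
MSE is expressed in squared target units, which makes the two results hard to compare by eye. Taking the square root gives an error in the same units as the target. The snippet below is simple arithmetic on the two MSE values printed in this notebook; the variable names are illustrative.

```python
import math

# Test-set MSE values printed by the two runs in this notebook
mse_imodels = 2783.78
mse_sklearn_only = 3695.81

rmse_imodels = math.sqrt(mse_imodels)
rmse_sklearn_only = math.sqrt(mse_sklearn_only)

print(f"imodels RuleFit RMSE:   {rmse_imodels:.2f}")       # 52.76
print(f"scikit-learn-only RMSE: {rmse_sklearn_only:.2f}")  # 60.79
```

On this train/test split, the scikit-learn-only version is off by roughly 8 more target units on average than the imodels version. The notebook does not investigate the cause of this gap.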




### Extract Rules

In [37]:
# Get the coefficients and feature names (original features + rule-based features)
coefs = lasso.coef_
original_feature_names = X.columns.tolist()
rule_feature_names = [f"Rule {i+1}" for i in range(len(rules))]
all_feature_names = original_feature_names + rule_feature_names

# Sort coefficients and feature names by absolute coefficient value
coef_feature_pairs = sorted(zip(coefs, all_feature_names), key=lambda x: abs(x[0]), reverse=True)

# Print rules
for coef, feature_name in coef_feature_pairs:
    if coef != 0:
        if "Rule" in feature_name:
            rule_index = int(feature_name.split()[1]) - 1
            rule_description = " AND ".join(rules[rule_index])
            print(f"{feature_name}: Coefficient = {coef:.4f}, Rule = {rule_description}")
        else:
            print(f"{feature_name}: Coefficient = {coef:.4f}")


Rule 86: Coefficient = 80.5387, Rule = s6 > 0.04241442494094372 AND s5 > 0.01368608744814992 AND s5 <= 0.015194719657301903
Rule 236: Coefficient = 54.0583, Rule = bmi > 0.14899898320436478
Rule 467: Coefficient = 44.1279, Rule = bp <= 0.09818317741155624 AND s1 <= -0.024272182025015354 AND s1 > -0.028400040231645107
Rule 446: Coefficient = 43.8329, Rule = s5 <= -0.0010538420465309173 AND s2 > 0.017944753170013428 AND age <= -0.0581863634288311
Rule 387: Coefficient = -40.2284, Rule = s6 <= -0.003148751042317599 AND bp > 0.016708109062165022 AND s5 <= -0.03781251050531864
Rule 540: Coefficient = -32.8381, Rule = s4 <= 0.05441997013986111 AND s5 > 0.08501111716032028 AND bp <= 0.03220093855634332
Rule 318: Coefficient = -30.4226, Rule = age <= 0.007199329789727926 AND s2 > 0.016535584814846516 AND age > 0.003566791128832847
Rule 452: Coefficient = 29.8514, Rule = s5 <= 0.016671447083353996 AND bp <= 0.054579468443989754 AND bp > 0.03736521489918232
Rule 220: Coefficient = 28.4261, Rule 

