<a href="https://colab.research.google.com/github/aghakishiyeva/Interpretable-ML-II/blob/main/AIPI590_Assignment_4.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

[![Open in GitHub](https://img.shields.io/static/v1.svg?logo=github&label=Repo&message=View%20On%20Github&color=lightgrey)](https://github.com/aghakishiyeva/Interpretable-ML/blob/main/AIPI590_Assignment_3.ipynb)

# Import Libraries, Load and Clean the Dataset

In [None]:
# Cell 1: Import the necessary libraries, then load, split, and scale the data
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score
from imodels import RuleFitClassifier, SkopeRulesClassifier, BoostedRulesClassifier

# Load the Breast Cancer dataset
data = load_breast_cancer()
X = data.data
y = data.target

# Perform a train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Scale the data
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Show dataset structure
print("Dataset structure:", X_train.shape)
print("Target classes:", np.unique(y))

Dataset structure: (398, 30)
Target classes: [0 1]




# RuleFit

RuleFit combines linear models with decision trees. It extracts decision rules from an ensemble of decision trees (for example, a random forest or gradient-boosted trees) and then fits a sparse linear model on those rules together with the original features. The model stays interpretable because the sparsity penalty keeps only a small number of rules and linear terms, each with a coefficient that shows the direction and size of its effect. This gives RuleFit a practical balance between interpretability and predictive performance.
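The two-stage idea can be sketched with plain scikit-learn. This is a simplified illustration, not the `imodels` implementation: it treats each tree leaf as one rule, leaves out the original linear terms, and the ensemble size, tree depth, and penalty strength `C` are arbitrary choices made for the example.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)

# Step 1: grow a small ensemble of shallow trees. Each leaf corresponds to one
# conjunctive rule (the conditions on the path from the root to that leaf).
trees = GradientBoostingClassifier(n_estimators=20, max_depth=2, random_state=0)
trees.fit(X_tr, y_tr)

# Step 2: encode "which leaf did the sample land in" as binary rule features.
# For binary problems apply() returns shape (n_samples, n_estimators, 1).
leaves_tr = trees.apply(X_tr)[:, :, 0]
leaves_te = trees.apply(X_te)[:, :, 0]
encoder = OneHotEncoder(handle_unknown="ignore")
rules_tr = encoder.fit_transform(leaves_tr)
rules_te = encoder.transform(leaves_te)

# Step 3: fit an L1-penalized (sparse) linear model on the rule features.
# The penalty drives many rule coefficients to exactly zero.
sparse_linear = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
sparse_linear.fit(rules_tr, y_tr)

n_rules = rules_tr.shape[1]
n_active = int(np.count_nonzero(sparse_linear.coef_))
sketch_acc = sparse_linear.score(rules_te, y_te)
print(f"{n_active} of {n_rules} candidate rules kept; test accuracy {sketch_acc:.3f}")
```

Lowering `C` strengthens the penalty and should leave fewer active rules, which is the knob that trades accuracy for a shorter, more readable model.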

In [None]:
# Initialize RuleFit model
rulefit_model = RuleFitClassifier()

# Fit and predict with RuleFit
rulefit_model.fit(X_train, y_train)
rulefit_preds = rulefit_model.predict(X_test)

# Calculate accuracy
rulefit_acc = accuracy_score(y_test, rulefit_preds)
print(f"Accuracy of RuleFit: {rulefit_acc}")



Accuracy of RuleFit: 0.9532163742690059


In [None]:
rule_df = rulefit_model.visualize()
rule_df



| index | rule | coef |
|---|---|---|
| 21 | X21 | -0.08 |
| 24 | X24 | -0.02 |
| 45 | X10 <= 0.55918 and X27 <= 0.44685 | 0.48 |
| 46 | X10 <= 0.65777 and X22 <= 0.23111 and X24 <= 1.9952 and X27 <= 0.61454 | 0.66 |
| 47 | X20 <= 0.10591 and X27 <= 0.66078 and X3 <= 0.10178 | 0.13 |
| 52 | X10 <= 0.76102 and X23 <= -0.08973 and X28 <= 1.1038 | 0.14 |
| 43 | X10 <= 0.76102 and X23 <= 0.15157 and X7 <= 0.07541 | 0.3 |
| 50 | X13 <= 0.16596 and X20 <= 0.10799 and X27 <= 0.64197 and X7 <= 0.19542 | 0.42 |
| 44 | X13 <= -0.09713 and X27 <= 0.44763 and X28 <= 0.83227 | 0.96 |
| 42 | X10 <= 0.6233 and X14 > -1.19126 and X23 <= -0.02394 and X27 <= 0.38886 | 0.39 |
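The rules above refer to features by position (`X10`, `X27`, ...) and their thresholds are in standardized units, because the model was trained on the scaled array. The sketch below rewrites one rule with real feature names and thresholds in the original units. It assumes that `X{i}` means column `i` of the input array, and `readable_rule` is a hypothetical helper written for this example, not part of `imodels`.

```python
import re
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Recreate the same split and scaler as in the first cell.
data = load_breast_cancer()
X_train, _, _, _ = train_test_split(
    data.data, data.target, test_size=0.3, random_state=42
)
scaler = StandardScaler().fit(X_train)

def readable_rule(rule, feature_names, scaler):
    """Replace 'X<i> <op> <scaled threshold>' with the feature name and raw threshold."""
    def convert(match):
        idx = int(match.group(1))
        op = match.group(2)
        scaled_threshold = float(match.group(3))
        # Undo standardization: raw = scaled * std + mean. The standard deviation
        # is positive, so the direction of the inequality is unchanged.
        raw_threshold = scaled_threshold * scaler.scale_[idx] + scaler.mean_[idx]
        return f"{feature_names[idx]} {op} {raw_threshold:.4g}"

    return re.sub(r"X(\d+)\s*(<=|>)\s*(-?\d+(?:\.\d+)?)", convert, rule)

rule = "X10 <= 0.55918 and X27 <= 0.44685"  # copied from the table above
readable = readable_rule(rule, data.feature_names, scaler)
print(readable)
```

Bare terms such as `X21` carry no threshold and are left untouched by this pattern; they can be looked up directly as `data.feature_names[21]`.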


# SkopeRules

SkopeRules is a rule-learning algorithm that extracts decision rules from an ensemble of trees (bagged decision trees in the reference implementation). It filters the candidate rules by precision and recall, then deduplicates them so that only the most informative ones remain. At prediction time a sample is scored by the selected rules that fire on it, rather than by a separately fitted linear model as in RuleFit. Because only high-precision rules are kept, the final rule set is short and easy to read, which can help on noisy datasets.
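The filtering step can be illustrated with a few hand-written candidate rules on synthetic data. This is only a sketch of the idea: the data, the candidate rules, and the thresholds `MIN_PRECISION` and `MIN_RECALL` are all hypothetical, and the rules are scored on the same data they are defined on, whereas the library is understood to use out-of-bag samples.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = (X[:, 0] > 0.5).astype(int)  # synthetic positives depend only on feature 0

# Candidate rules, each stored as a boolean mask of the samples it fires on.
candidate_rules = {
    "x0 > 0.5": X[:, 0] > 0.5,  # exact: high precision, high recall
    "x0 > 0.0": X[:, 0] > 0.0,  # too broad: low precision
    "x0 > 1.5": X[:, 0] > 1.5,  # too narrow: low recall
    "x1 > 0.0": X[:, 1] > 0.0,  # irrelevant feature: low precision
}

def precision_recall(fires, y):
    """Precision and recall of a rule that predicts the positive class when it fires."""
    true_positives = np.sum(fires & (y == 1))
    precision = true_positives / fires.sum() if fires.sum() else 0.0
    recall = true_positives / (y == 1).sum() if (y == 1).sum() else 0.0
    return float(precision), float(recall)

MIN_PRECISION, MIN_RECALL = 0.8, 0.5  # hypothetical thresholds

kept = {}
for name, fires in candidate_rules.items():
    precision, recall = precision_recall(fires, y)
    status = "keep" if precision >= MIN_PRECISION and recall >= MIN_RECALL else "drop"
    print(f"{name}: precision={precision:.2f} recall={recall:.2f} -> {status}")
    if status == "keep":
        kept[name] = (precision, recall)
```

If the thresholds are too strict for the data, `kept` can end up empty, in which case a classifier built on it has no rule that can ever fire.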

In [None]:
# Initialize SkopeRules model
skoperules_model = SkopeRulesClassifier()

# Fit and predict with SkopeRules
skoperules_model.fit(X_train, y_train)
skoperules_preds = skoperules_model.predict(X_test)

# Calculate accuracy
skoperules_acc = accuracy_score(y_test, skoperules_preds)
print(f"Accuracy of SkopeRules: {skoperules_acc}")



Accuracy of SkopeRules: 0.3684210526315789
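This accuracy is below 0.5, and in a binary problem always predicting the majority class scores at least 0.5, so the result is worse than a constant guess and deserves a closer look before it is compared with the other models. One possible cause, which the output alone does not confirm, is that the model predicts a single class for every sample. The hypothetical helper below compares accuracy with the majority-class baseline and counts the predicted classes; the demo arrays are made up for illustration.

```python
import numpy as np

def diagnose_predictions(y_true, y_pred):
    """Report accuracy, the majority-class baseline, and how often each class is predicted."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    _, true_counts = np.unique(y_true, return_counts=True)
    pred_classes, pred_counts = np.unique(y_pred, return_counts=True)
    return {
        "accuracy": float(np.mean(y_true == y_pred)),
        "majority_baseline": float(true_counts.max() / len(y_true)),
        "predicted_counts": dict(zip(pred_classes.tolist(), pred_counts.tolist())),
    }

# Made-up labels: 7 positives, 3 negatives, and a model that always predicts 0.
y_true_demo = np.array([1, 1, 1, 0, 0, 1, 1, 0, 1, 1])
y_pred_demo = np.zeros_like(y_true_demo)

report = diagnose_predictions(y_true_demo, y_pred_demo)
print(report)
```

In this notebook the call would be `diagnose_predictions(y_test, skoperules_preds)`. A single entry in `predicted_counts` would point to a degenerate model rather than a weak but working one.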


# Boosted Rule Set

Boosted Rule Set uses AdaBoost, an ensemble method, to fit a set of rules sequentially. Rules are added one at a time: after each rule is fitted, the training samples it misclassified receive more weight so that the next rule focuses on them, and each rule gets a weight that reflects its weighted accuracy. The final prediction is a weighted vote over all rules, which turns a collection of weak rules into a strong predictor.
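The boosting loop can be sketched from scratch with one-split rules (decision stumps). This is a minimal version of discrete AdaBoost on synthetic data, written for illustration; it is not the `imodels` implementation, and all function names here are hypothetical.

```python
import numpy as np

def stump_predict(X, feature, threshold, sign):
    """One-split rule: predict `sign` where X[:, feature] <= threshold, else -sign."""
    return np.where(X[:, feature] <= threshold, sign, -sign)

def fit_stump(X, y, weights):
    """Exhaustively pick the one-split rule with the lowest weighted error."""
    best = None
    for feature in range(X.shape[1]):
        for threshold in np.unique(X[:, feature]):
            for sign in (1, -1):
                pred = stump_predict(X, feature, threshold, sign)
                error = weights[pred != y].sum()
                if best is None or error < best[0]:
                    best = (error, feature, threshold, sign)
    return best

def adaboost_rules(X, y, n_rules=10):
    """Discrete AdaBoost over one-split rules; labels must be -1 or +1."""
    weights = np.full(len(y), 1.0 / len(y))
    rules = []
    for _ in range(n_rules):
        error, feature, threshold, sign = fit_stump(X, y, weights)
        error = np.clip(error, 1e-10, 1 - 1e-10)  # avoid division by zero
        alpha = 0.5 * np.log((1 - error) / error)  # rule weight: larger if more accurate
        pred = stump_predict(X, feature, threshold, sign)
        # Increase the weight of misclassified samples, decrease the rest.
        weights = weights * np.exp(-alpha * y * pred)
        weights = weights / weights.sum()
        rules.append((alpha, feature, threshold, sign))
    return rules

def predict_rules(X, rules):
    """Weighted vote over all rules."""
    score = sum(a * stump_predict(X, f, t, s) for a, f, t, s in rules)
    return np.where(score >= 0, 1, -1)

# Synthetic data with a diagonal boundary that no single one-split rule can match.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 2))
y_demo = np.where(X_demo[:, 0] + X_demo[:, 1] > 0, 1, -1)

rules = adaboost_rules(X_demo, y_demo, n_rules=10)
train_acc = float(np.mean(predict_rules(X_demo, rules) == y_demo))
for alpha, feature, threshold, sign in rules:
    label = "+1" if sign == 1 else "-1"
    print(f"weight {alpha:.2f}: if x{feature} <= {threshold:.2f} predict {label}, else the opposite")
print(f"training accuracy: {train_acc:.3f}")
```

The printed list is the whole model: each line is one rule with its vote weight, which is what keeps a boosted rule set readable.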

In [None]:
# Initialize Boosted Rules model
boostedrules_model = BoostedRulesClassifier()

# Fit and predict with Boosted Rules
boostedrules_model.fit(X_train, y_train)
boostedrules_preds = boostedrules_model.predict(X_test)

# Calculate accuracy
boostedrules_acc = accuracy_score(y_test, boostedrules_preds)
print(f"Accuracy of Boosted Rules: {boostedrules_acc}")



Accuracy of Boosted Rules: 0.9707602339181286
