<a href="https://colab.research.google.com/github/AaryaDesai1/Interpretable_ML_II/blob/main/Interpretable_ML_II.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# AIPI 590- XAI | Assignment #04
## Aarya Desai

# Imodels Interpretability Assignment

In this notebook, I'll demonstrate three algorithms from the `imodels` Python library: RuleFit, Boosted Rules, and OneR. These algorithms produce interpretable decision rule sets or rule lists that can be used to explain predictions.

I'll use the UCI Adult dataset for this demonstration, which is a binary classification task (predicting income level based on various attributes).

## Installation and Setup


In [2]:
!pip install --upgrade imodels
!pip install pandas scikit-learn matplotlib seaborn

Collecting imodels
  Downloading imodels-1.4.6-py3-none-any.whl.metadata (30 kB)
Downloading imodels-1.4.6-py3-none-any.whl (243 kB)
Installing collected packages: imodels
Successfully installed imodels-1.4.6


## Data Loading and Preprocessing

The UCI Adult dataset is read directly from the UCI Machine Learning Repository with `pandas`. The loading and preprocessing are in the following cells.


In [3]:
from sklearn.model_selection import train_test_split
import pandas as pd

# Load the UCI Adult dataset from the UCI repository
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data'
column_names = ['age', 'workclass', 'fnlwgt', 'education', 'education-num', 'marital-status',
                'occupation', 'relationship', 'race', 'sex', 'capital-gain', 'capital-loss',
                'hours-per-week', 'native-country', 'income']

# Fields are separated by a comma plus whitespace; "?" marks a missing value
data = pd.read_csv(url, names=column_names, sep=r',\s*', na_values="?", engine='python')


In [4]:
# Preprocessing
# Drop rows with missing values
data.dropna(inplace=True)

# Encode the target variable ('income')
data['income'] = data['income'].apply(lambda x: 1 if x == '>50K' else 0)

# Encode categorical features using one-hot encoding
categorical_columns = ['workclass', 'education', 'marital-status', 'occupation',
                       'relationship', 'race', 'sex', 'native-country']
data = pd.get_dummies(data, columns=categorical_columns)

# Split dataset into features and target
X = data.drop('income', axis=1)
y = data['income']

# Split dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)


# Model 1: RuleFit

`RuleFit` is an interpretable model that combines rule-based decision trees with linear models. It generates a list of decision rules from tree models and uses these rules along with linear terms to make predictions. This hybrid approach ensures that predictions are both interpretable and accurate.


In [6]:
# Import the evaluation metric and the RuleFit classifier
from sklearn.metrics import accuracy_score
from imodels import RuleFitClassifier

# Initialize RuleFitClassifier
model_rulefit = RuleFitClassifier(n_estimators=100, tree_size=4, max_rules=30)

# Train the model
model_rulefit.fit(X_train, y_train, feature_names=X.columns)

# Make predictions on the test set
y_pred = model_rulefit.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f'RuleFitClassifier Accuracy: {accuracy:.4f}')

# Visualize the learned rules
model_rulefit.visualize()




RuleFitClassifier Accuracy: 0.8455


| index | rule | coef |
|---|---|---|
| 1 | fnlwgt | 0.0 |
| 3 | capital-gain | 0.0 |
| 4 | capital-loss | 0.0 |
| 5 | hours-per-week | 0.02 |
| 33 | marital-status_Never-married | -0.14 |
| 39 | occupation_Exec-managerial | 0.45 |
| 40 | occupation_Farming-fishing | -0.04 |
| 43 | occupation_Other-service | -0.16 |
| 45 | occupation_Prof-specialty | 0.12 |
| 55 | relationship_Wife | 0.08 |


## Interpreting the RuleFit Results:

Here, the `rule` column lists the individual features or compound rules the model uses to predict the outcome, a binary variable indicating whether an individual earns more than 50,000 dollars. The `coef` column gives the direction and strength of each term's contribution. For example, one of the results is:
`capital-gain <= 7073.5 and capital-loss <= 1794.5` with a coefficient of -1.45. This means that an individual with a capital gain of at most 7073.5 dollars and a capital loss of at most 1794.5 dollars is **much less likely** to earn more than 50k according to this model.
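
To make the size of that coefficient concrete, here is a minimal sketch of how a single rule could shift a prediction. It assumes the coefficients add up on a log-odds scale, as in a logistic regression; the exact link that `imodels` uses may differ. The baseline of 0 (a 50% starting probability) and the helper names are hypothetical, chosen only for illustration.

```python
import math


def sigmoid(z: float) -> float:
    """Convert a log-odds score into a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def rule_fires(capital_gain: float, capital_loss: float) -> bool:
    """The rule quoted above, written as a plain boolean test."""
    return capital_gain <= 7073.5 and capital_loss <= 1794.5


RULE_COEF = -1.45      # coefficient reported for the rule in the first run
BASE_LOG_ODDS = 0.0    # hypothetical: all other terms are assumed to sum to 0


def probability(capital_gain: float, capital_loss: float) -> float:
    """Probability of earning >50K when only this one rule is considered."""
    contribution = RULE_COEF if rule_fires(capital_gain, capital_loss) else 0.0
    return sigmoid(BASE_LOG_ODDS + contribution)


p_rule = probability(capital_gain=0, capital_loss=0)          # rule fires
p_no_rule = probability(capital_gain=10000, capital_loss=0)   # rule does not fire
print(f"rule fires: {p_rule:.2f}, rule does not fire: {p_no_rule:.2f}")
```

Under these assumptions, satisfying the rule pulls the probability from 0.50 down to roughly 0.19, which is why a coefficient of -1.45 reads as "much less likely".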

### **NOTE**
The finding explained above comes from the first time I ran the cell. When I ran it again without setting a random seed, I got different results, so the rule quoted above may not appear in the table shown. This is a drawback of the model as configured here: the decision rules it finds can change from run to run, so the explanations lack the consistency and robustness you may need. Passing a fixed `random_state` to `RuleFitClassifier` should make the runs repeatable.

# Model 2: Boosted Rules

`BoostedRulesClassifier` is an interpretable ensemble model that uses boosting to build a collection of rules derived from decision trees. The weak learners are shallow decision trees (here, depth-1 stumps), fitted one after another and combined by a weighted vote to improve prediction accuracy. Because each stump tests a single feature against a single threshold, the ensemble can still be read as a list of simple rules.
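
As a rough illustration of how boosted stumps combine, the sketch below takes a weighted vote over three hand-written stumps. The features, thresholds, and weights are hypothetical and are not taken from the fitted model; the voting scheme is the generic AdaBoost-style sum, which I am assuming is close to what the classifier does internally.

```python
def make_stump(feature, threshold):
    """Return a depth-1 rule that votes +1 above the threshold and -1 otherwise."""
    def rule(row):
        return 1 if row[feature] > threshold else -1
    return rule


# (rule, weight) pairs; every number here is made up for illustration
stumps = [
    (make_stump("capital-gain", 5000.0), 1.2),
    (make_stump("education-num", 12.0), 0.7),
    (make_stump("hours-per-week", 40.0), 0.4),
]


def boosted_predict(row, stumps):
    """Sum the weighted votes and predict class 1 when the total is positive."""
    score = sum(weight * rule(row) for rule, weight in stumps)
    return (1 if score > 0 else 0), score


person_a = {"capital-gain": 0, "education-num": 13, "hours-per-week": 45}
person_b = {"capital-gain": 6000, "education-num": 10, "hours-per-week": 30}

label_a, score_a = boosted_predict(person_a, stumps)
label_b, score_b = boosted_predict(person_b, stumps)
print(label_a, round(score_a, 2))
print(label_b, round(score_b, 2))
```

The heavily weighted capital-gain stump outvotes the other two in both cases, which shows how a few strong rules can dominate the ensemble's decision.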

## Interpreting the BoostedRulesClassifier Results:

In the results generated by the `BoostedRulesClassifier`, each `Rule` is a depth-1 decision tree (a stump) that contributes to the final ensemble prediction. Each stump splits on one feature threshold to help predict the outcome variable, which in this case is a binary indicator of whether an individual earns more than $50,000.

For example, the output lists the fitted stumps, such as:
- `DecisionTreeClassifier(max_depth=1, random_state=1608637542)`

The printed representation shows only each stump's hyperparameters, not the feature and threshold it splits on, so the conditions themselves have to be read from the fitted trees. Each stump still contributes to the model's final predictions based on the features of the input data.
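
One way to recover the actual conditions is to read the split out of each fitted stump through scikit-learn's `tree_` attribute. The sketch below does this for a stump trained on a small toy dataset; `describe_stump` is a hypothetical helper, and the toy data is invented so the example runs on its own.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def describe_stump(stump, feature_names):
    """Summarise the single split of a fitted depth-1 binary classifier."""
    tree = stump.tree_
    if tree.node_count == 1:
        return None  # the tree never split, so there is no rule to report

    def p_positive(node):
        # Normalise so this works whether `value` stores counts or fractions.
        # Index 1 assumes two classes, with the positive class listed second.
        weights = tree.value[node][0]
        return float(weights[1] / weights.sum())

    return {
        "feature": feature_names[tree.feature[0]],
        "threshold": float(tree.threshold[0]),
        "p_if_below_or_equal": p_positive(tree.children_left[0]),
        "p_if_above": p_positive(tree.children_right[0]),
    }


# Toy data (hypothetical): the columns are capital-gain and age
X_toy = np.array([[0, 25], [0, 50], [0, 30], [6000, 45], [7000, 28], [8000, 52]])
y_toy = np.array([0, 0, 0, 1, 1, 1])
toy_stump = DecisionTreeClassifier(max_depth=1, random_state=0).fit(X_toy, y_toy)

summary = describe_stump(toy_stump, ["capital-gain", "age"])
print(summary)
```

Since the estimators in this notebook print as `DecisionTreeClassifier` objects, the same helper should apply to each entry of `model.estimators_` with `list(X_train.columns)` as the feature names, assuming the column order is preserved during fitting.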

### **NOTE**
The results displayed were obtained from the initial execution of the model. The cell below fixes `random_state=42`, so reruns should reproduce the same stumps; without a fixed random state, the `BoostedRulesClassifier`, like other ensemble methods, may yield different rules on each run. That variability affects the consistency of the model's interpretations, so it is worth checking how stable the rules are before relying on them in practice.

In [9]:
from imodels import BoostedRulesClassifier
from sklearn.tree import DecisionTreeClassifier

# Train the Boosted Rules model
model = BoostedRulesClassifier(estimator=DecisionTreeClassifier(max_depth=1), n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.2f}')

# Visualizing the rules
rules = model.estimators_  # This will contain the rules learned by the model
for i, rule in enumerate(rules):
    print(f'Rule {i + 1}: {rule}')




Accuracy: 0.86
Rule 1: DecisionTreeClassifier(max_depth=1, random_state=1608637542)
Rule 2: DecisionTreeClassifier(max_depth=1, random_state=1273642419)
Rule 3: DecisionTreeClassifier(max_depth=1, random_state=1935803228)
Rule 4: DecisionTreeClassifier(max_depth=1, random_state=787846414)
Rule 5: DecisionTreeClassifier(max_depth=1, random_state=996406378)
Rule 6: DecisionTreeClassifier(max_depth=1, random_state=1201263687)
Rule 7: DecisionTreeClassifier(max_depth=1, random_state=423734972)
Rule 8: DecisionTreeClassifier(max_depth=1, random_state=415968276)
Rule 9: DecisionTreeClassifier(max_depth=1, random_state=670094950)
Rule 10: DecisionTreeClassifier(max_depth=1, random_state=1914837113)
Rule 11: DecisionTreeClassifier(max_depth=1, random_state=669991378)
Rule 12: DecisionTreeClassifier(max_depth=1, random_state=429389014)
Rule 13: DecisionTreeClassifier(max_depth=1, random_state=249467210)
Rule 14: DecisionTreeClassifier(max_depth=1, random_state=1972458954)
Rule 15: DecisionTreeC

### BoostedRulesClassifier Results:
- **Accuracy**: 0.86
- **Rules**:
  - Rule 1: `DecisionTreeClassifier(max_depth=1, random_state=1608637542)`
  - ...
  - Rule 100: `DecisionTreeClassifier(max_depth=1, random_state=134489564)`

The accuracy is the proportion of test-set instances that were classified correctly (0.86, or 86%). Note that printing each estimator shows only its hyperparameters; to interpret the model's decisions, inspect the feature and threshold that each stump splits on.

# Model 3: OneRClassifier

`OneRClassifier` is a simple yet interpretable classification algorithm that builds a rule list based on only one feature. The model works by evaluating each feature independently and selecting the one that provides the best accuracy for classification. The resulting rule list consists of conditions on that single feature, making the model highly interpretable. Since the algorithm only uses one feature, it is limited in complexity but provides clear and easy-to-understand decision rules.
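
The feature-selection idea can be sketched in a few lines of plain Python. This is a simplified, single-threshold version of the classic OneR procedure on invented toy data, not the `imodels` implementation: it tries every midpoint threshold on every feature and keeps the one rule with the highest training accuracy.

```python
def one_r(rows, labels):
    """Return the best single-feature threshold rule found by exhaustive search."""
    best = None
    for feature in rows[0]:
        values = [row[feature] for row in rows]
        cuts = sorted(set(values))
        # Candidate thresholds are midpoints between consecutive distinct values
        for lo, hi in zip(cuts, cuts[1:]):
            threshold = (lo + hi) / 2
            hits = sum((v > threshold) == bool(y) for v, y in zip(values, labels))
            accuracy = hits / len(labels)
            # A rule that is mostly wrong becomes mostly right when flipped
            for acc, positive_above in ((accuracy, True), (1 - accuracy, False)):
                if best is None or acc > best["accuracy"]:
                    best = {"feature": feature, "threshold": threshold,
                            "positive_above": positive_above, "accuracy": acc}
    return best


# Toy data (hypothetical values, for illustration only)
rows = [
    {"age": 25, "capital-gain": 0},
    {"age": 50, "capital-gain": 0},
    {"age": 30, "capital-gain": 0},
    {"age": 45, "capital-gain": 6000},
    {"age": 28, "capital-gain": 7000},
    {"age": 52, "capital-gain": 8000},
]
labels = [0, 0, 0, 1, 1, 1]

best_rule = one_r(rows, labels)
print(best_rule)
```

On this toy data no threshold on `age` classifies more than four of the six rows correctly, while `capital-gain > 3000` classifies all six, so the search settles on `capital-gain`.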

In [10]:
from imodels import OneRClassifier

# Initialize the OneRClassifier
one_r_model = OneRClassifier(max_depth=5)

# Fit the model on the training data
one_r_model.fit(X_train, y_train)

# Make predictions on the test data
y_pred = one_r_model.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy of OneRClassifier: {accuracy:.2f}")

# View the rules
print("Learned Rules:")
for rule in one_r_model.rules_:
    print(rule)



Accuracy of OneRClassifier: 0.79
Learned Rules:
{'col': 'capital-gain', 'index_col': 3, 'cutoff': 5119.0, 'val': 0.2100274382639062, 'flip': False, 'val_right': 0.951310861423221, 'num_pts': 21113, 'num_pts_right': 1068}
{'col': 'capital-gain', 'index_col': 3, 'cutoff': 4243.5, 'val': 0.20877793436682102, 'flip': False, 'val_right': 0.3502824858757062, 'num_pts': 20045, 'num_pts_right': 177}
{'col': 'capital-gain', 'index_col': 3, 'cutoff': 3120.0, 'val': 0.0, 'flip': True, 'val_right': 0.21078306824533766, 'num_pts': 19868, 'num_pts_right': 19679}
{'val': 0.0, 'num_pts': 189}


## Interpreting the OneRClassifier Results:
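Here, the model identified `capital-gain` as the single most useful feature for predicting income and built all of its rules on it. The rules are applied in order: each one splits off a group of individuals and assigns that group a probability of earning more than $50,000 (`val_right`), and everyone else moves on to the next rule.

The output contains three splits plus a final default:
1. Cutoff = 5119: The 1,068 training individuals with a capital gain above 5119 have a very high probability (0.95) of earning more than 50k. The remaining 20,045 have a low probability (0.21).
2. Cutoff = 4243.5: Of those remaining, the 177 with a capital gain above 4243.5 have a probability of 0.35.
3. Cutoff = 3120 (`flip` is `True`, so this rule selects values below the cutoff): The 19,679 individuals with a capital gain below 3120 have a probability of 0.21.
4. Default: The last 189 individuals, whose capital gain lies between 3120 and 4243.5, have a probability of 0; none of them earn more than 50k in the training data.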
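
The printed rule dictionaries can also be applied by hand. The sketch below walks the list in order, under my reading of the fields: `val_right` is the probability for the group a rule splits off, a rule selects values at or above its cutoff unless `flip` is `True` (then it selects values below it), and the final dictionary is the default. The helper name is hypothetical, and the `index_col` field is omitted for brevity.

```python
# Rules copied from the printed output of one_r_model.rules_
rules = [
    {'col': 'capital-gain', 'cutoff': 5119.0, 'val': 0.2100274382639062, 'flip': False,
     'val_right': 0.951310861423221, 'num_pts': 21113, 'num_pts_right': 1068},
    {'col': 'capital-gain', 'cutoff': 4243.5, 'val': 0.20877793436682102, 'flip': False,
     'val_right': 0.3502824858757062, 'num_pts': 20045, 'num_pts_right': 177},
    {'col': 'capital-gain', 'cutoff': 3120.0, 'val': 0.0, 'flip': True,
     'val_right': 0.21078306824533766, 'num_pts': 19868, 'num_pts_right': 19679},
    {'val': 0.0, 'num_pts': 189},
]


def rule_list_probability(capital_gain, rules):
    """Return the probability assigned by the first rule that matches."""
    for rule in rules:
        if 'cutoff' not in rule:
            return rule['val']  # final default for everyone left over
        if rule['flip']:
            matches = capital_gain < rule['cutoff']
        else:
            matches = capital_gain >= rule['cutoff']
        if matches:
            return rule['val_right']
    # No explicit default: fall back to the remainder value of the last rule
    return rules[-1]['val']


for gain in (6000, 4500, 0, 3500):
    print(gain, round(rule_list_probability(gain, rules), 2))
```

Under this reading, a capital gain of 6000 maps to about 0.95, 4500 to 0.35, 0 to 0.21, and 3500 to 0.0, so the lowest probability belongs to the narrow 3120 to 4243.5 band rather than to people with no capital gain.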


### **NOTE**:
The main drawback of this model is the *oversimplification*. Since it selects only one feature (in this case, capital-gain) to base all its rules on, it may overlook important interactions and contributions from other features.