# Week 02 – Glassbox Models (part 2)

In this notebook we'll apply different glassbox models to the same dataset and compare them in terms of performance (accuracy and F1 score) and interpretability.

#### Step 1: Navigate to this week's directory 
```
cd <path_to_week_2_material>
```

#### Step 2: Create and activate a virtual environment (Python 3.11)
**macOS**
```
python3.11 -m venv part2_venv
source part2_venv/bin/activate
```

**Windows (cmd)**
```
python3.11 -m venv part2_venv
part2_venv\Scripts\activate
```

#### Step 3: Install required packages
First, install `ipykernel` to integrate your virtual environment with Jupyter.
```
pip install ipykernel
python -m ipykernel install --user --name=part2_venv
```

Next, install all necessary packages.
```
pip install ruleopt
```
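
Before running the notebook, it can help to confirm that the kernel actually sees the packages used below. The following is a minimal sketch, not part of the course material: `missing_packages` is a hypothetical helper, and the list of names is taken from the import statements in this notebook.

```python
import importlib.util

def missing_packages(names):
    """Return the names that cannot be found in the active environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# Import names, not pip names: scikit-learn is imported as "sklearn".
required = ["numpy", "pandas", "sklearn", "ruleopt"]
print("Missing packages:", missing_packages(required) or "none")
```

If anything is listed as missing, install it with `pip install <package>` inside the activated `part2_venv` environment.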

**import packages**

In [1]:
import warnings
warnings.filterwarnings("ignore")
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score, f1_score, confusion_matrix, roc_auc_score
import os
import sys
# import matplotlib.pyplot as plt

# Load the dataset

We are using the [Titanic dataset](https://www.kaggle.com/c/titanic/overview), which holds data about passengers of the Titanic and whether they survived or not. Passengers are described by 7 features. The response variable is binary (0 = died, 1 = survived).

Make sure the dataset files are saved under `../datasets/titanic/` (relative to this notebook), or adjust the file paths below.

We load the pre-processed datasets directly. Some algorithms require the data to be in binary form. Hence, we have two versions of X: `X_train` and `X_test`, with continuous features and one-hot encoded categorical features, and `X_train_bin` and `X_test_bin`, where all features have been one-hot encoded. For the binary version, continuous features were first transformed into categories. Check the code in `./01_intro/titanic_data_prep.ipynb` for more details on pre-processing.
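
To make the two-step binarization concrete, here is a minimal sketch on toy data. The bin edges, labels, and the `toy` frame are assumptions made up for illustration; the actual bins used for `X_train_bin` are defined in the pre-processing notebook and may differ.

```python
import pandas as pd

# Toy data, not taken from the Titanic files.
toy = pd.DataFrame({"Age": [4.0, 19.0, 35.0, 62.0], "Sex": [0, 1, 1, 0]})

# Step 1: turn the continuous feature into categories (bin edges are hypothetical).
toy["Age_bin"] = pd.cut(
    toy["Age"],
    bins=[0, 12, 30, 50, 100],
    labels=["child", "young", "adult", "senior"],
)

# Step 2: one-hot encode every feature so that each column is 0/1.
toy_bin = pd.get_dummies(
    toy[["Age_bin", "Sex"]].astype({"Sex": "category"}), dtype=int
)
print(toy_bin)
```

Every column of `toy_bin` is now a 0/1 indicator, which is the form that algorithms working on binary features expect.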

In [2]:
# X_train 
X_train = pd.read_csv('../datasets/titanic/encoded_titanic_X_train.csv')
X_train_bin = pd.read_csv('../datasets/titanic/bin_titanic_X_train.csv')

# X_test
X_test = pd.read_csv('../datasets/titanic/encoded_titanic_X_test.csv')
X_test_bin = pd.read_csv('../datasets/titanic/bin_titanic_X_test.csv')

# y_train and y_test
y_train = pd.read_csv('../datasets/titanic/titanic_y_train.csv')
y_test = pd.read_csv('../datasets/titanic/titanic_y_test.csv')

# take a look at the data
X_train.head()

Unnamed: 0,Age,SibSp,Parch,Fare,Sex_1,Embarked_1,Embarked_2,Pclass_1,Pclass_2
0,29.699118,0,0,7.6292,0,1,0,0,1
1,29.699118,0,0,8.05,1,0,1,0,1
2,29.699118,0,0,7.75,0,1,0,0,1
3,51.0,1,0,77.9583,0,0,1,0,0
4,21.0,0,0,7.7333,1,1,0,0,1


## About the dataset

- `Age` – age of the passenger in years (float)
- `SibSp` – number of siblings or spouses of the passenger **onboard** (int)
- `Parch` – number of parents or children of the passenger **onboard** (int)
- `Fare` – ticket price (float)
- `Sex` – sex of the passenger (categorical/binary)
- `Embarked` – port where the passenger boarded. There are three possible values: Southampton, Cherbourg, and Queenstown (categorical)
- `Pclass` – passenger class: 1, 2, or 3 (categorical)
- `Survived` – whether the passenger survived the sinking of the ship (binary). Fewer than 40% of passengers survived. This is the **outcome** to predict.


From the original dataset and from preprocessing the data, we know the following about the **categorical features**:

- `Sex` has two values `['female','male']`, which were encoded `[0,1]`, respectively. Then, after applying one-hot encoding, we have `Sex_1` which indicates `male` if 1, `female` otherwise.
- `Embarked` has three values `['C', 'Q', 'S']`, which were encoded `[0,1,2]`, respectively. Hence,
    - `Embarked_1 = 1` indicates `Q` 
    - `Embarked_2 = 1` indicates `S`
    - `Embarked_1 = 0` and `Embarked_2 = 0` indicates `C`
- `Pclass` has three values `[1,2,3]`, which were encoded `[0,1,2]`, respectively. Hence, after encoding, we have:
    - `Pclass_1 = 1` indicates `2`
    - `Pclass_2 = 1` indicates `3`
    - `Pclass_1 = 0` and `Pclass_2 = 0` indicates `1`
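
The encoding above can be reversed when reading rules or samples. The sketch below is one way to do it, assuming the first level of each categorical feature was dropped as described; `decode_dropped_first` is a hypothetical helper and not part of the course code.

```python
import pandas as pd

def decode_dropped_first(df, prefix, labels):
    """Recover the original category from dummies whose first level was dropped.

    labels[0] is the baseline (all dummies are 0); labels[k] maps to `<prefix>_k`.
    """
    decoded = pd.Series(labels[0], index=df.index, dtype=object)
    for k, label in enumerate(labels[1:], start=1):
        decoded[df[f"{prefix}_{k}"] == 1] = label
    return decoded

# Dummy columns of rows 0, 1 and 3 from the X_train preview above.
rows = pd.DataFrame({
    "Embarked_1": [1, 0, 0], "Embarked_2": [0, 1, 1],
    "Pclass_1": [0, 0, 0], "Pclass_2": [1, 1, 0],
})
print(decode_dropped_first(rows, "Embarked", ["C", "Q", "S"]).tolist())  # ['Q', 'S', 'S']
print(decode_dropped_first(rows, "Pclass", [1, 2, 3]).tolist())          # [3, 3, 1]
```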

In [3]:
d = {
    'target': 'Survived',
    'numerical':['Age', 'SibSp', 'Parch', 'Fare'],
    'categorical':['Sex', 'Embarked', 'Pclass']
}

# save feature names
feature_names = X_train.columns
target_names = list(y_train[d['target']].unique())

# create a dataframe to save y values in 
y_results = pd.DataFrame()
y_results['y_test'] = y_test

---
---

# 5. Decision Rules

## 5.2 Rule Generation (RUG)

We propose this algorithm in our paper, which you can access here: https://arxiv.org/abs/2104.10751

In [4]:
# sys.path.insert(1, os.path.dirname(os.getcwd())+'/RuleDiscovery')
from ruleopt import RUGClassifier
from ruleopt.rule_cost import Length, Gini
from ruleopt.explainer import Explainer
# import ruxg


In [None]:
# solver = ORToolsSolver()
rule_cost = Length()

# Initialize the RUGClassifier with specific parameters
rug = RUGClassifier(
    random_state=100,
    max_rmp_calls=8,
    rule_cost=rule_cost,
    max_depth=3,
    threshold=0.05)

In [None]:
# rug = RUGClassifier(max_depth=3, rule_length_cost=False,
#                     solver='gurobi', random_state=0, max_RMP_calls = 8, threshold = 0.05)
rug.fit(X_train, y_train[d['target']])
y_results['rug_pred'] = rug.predict(np.array(X_test))

## inspect performance

In [None]:
# Confusion matrix
cm = pd.crosstab(y_results['y_test'], y_results['rug_pred'])
print ("Confusion matrix : \n", cm)

print('\nAccuracy  = %.4f' % accuracy_score(y_results['y_test'], y_results['rug_pred']))
print('F1 score  = %.4f' % f1_score(y_results['y_test'], y_results['rug_pred']))

Confusion matrix : 
 rug_pred    0   1
y_test           
0         102   8
1          29  40

Accuracy  = 0.7933
F1 score  = 0.6838
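
Both scores can be recomputed by hand from the confusion matrix, which also exposes precision and recall separately. This is a small worked example using only the four counts printed above, with class 1 (survived) as the positive class.

```python
# Counts read off the confusion matrix above (rows: y_test, columns: rug_pred).
tn, fp = 102, 8
fn, tp = 29, 40

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)   # share of predicted survivors who did survive
recall = tp / (tp + fn)      # share of actual survivors the model found
f1 = 2 * precision * recall / (precision + recall)

print(f"Accuracy  = {accuracy:.4f}")   # 0.7933
print(f"Precision = {precision:.4f}")  # 0.8333
print(f"Recall    = {recall:.4f}")     # 0.5797
print(f"F1 score  = {f1:.4f}")         # 0.6838
```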


### Questions

**Q 5.2.1 – Evaluate the model performance.**

...

**Q 5.2.2 – Explain the meaning of `rule_cost = Length()`.**

...

## model interpretation

---
### GLOBAL

In [None]:
exp = Explainer(rug)
summary = exp.summarize_rule_metrics()

Total number of rules: 18
Average rule length: 2.28


In [None]:
rules = exp.retrieve_rule_details(list(feature_names))

RULE 0:
2.50      < Age       <= inf       or null
-inf      < Pclass_2  <= 0.50      or null
-inf      < Sex_1     <= 0.50      and not null
Class: 1
Scaled rule weight: 1.0000

RULE 1:
-inf      < Age       <= 7.00      and not null
-inf      < SibSp     <= 2.50      or null
Class: 1
Scaled rule weight: 0.6667

RULE 2:
-inf      < Sex_1     <= 0.50      and not null
8.04      < Fare      <= 14.85     and not null
Class: 0
Scaled rule weight: 0.6667

RULE 3:
0.96      < Age       <= 2.50      or null
-inf      < Pclass_1  <= 0.50      or null
Class: 0
Scaled rule weight: 0.6667

RULE 4:
-inf      < Age       <= 1.50      and not null
Class: 1
Scaled rule weight: 0.6667

RULE 5:
-inf      < Age       <= 11.50     and not null
-inf      < Embarked_2 <= 0.50      and not null
Class: 0
Scaled rule weight: 0.6667

RULE 6:
13.50     < Age       <= inf       or null
-inf      < Fare      <= 26.14     or null
0.50      < Sex_1     <= inf       or null
Class: 0
Scaled rule weight: 0.3333

... (output truncated; 18 rules in total)
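
To see how several rules could combine into one prediction, the sketch below sums the scaled weights of the covering rules per class and picks the class with the largest total. This assumes the weighted-vote scheme described in the RUG paper; the `predict` method in `ruleopt` may differ in details such as tie-breaking or the treatment of uncovered samples. `weighted_vote` and the example sample are hypothetical.

```python
from collections import defaultdict

# (class, scaled weight) for two rules from the listing above.
rules = {0: (1, 1.0000), 2: (0, 0.6667)}

def weighted_vote(covering_rule_ids, rules):
    """Sum the weights of the covering rules per class; return the heaviest class."""
    totals = defaultdict(float)
    for rule_id in covering_rule_ids:
        label, weight = rules[rule_id]
        totals[label] += weight
    if not totals:
        return None, {}  # uncovered sample: the fallback is left open here
    return max(totals, key=totals.get), dict(totals)

# Hypothetical sample covered by rules 0 and 2,
# e.g. a 30-year-old woman in second class who paid a fare of 13.
print(weighted_vote([0, 2], rules))  # (1, {1: 1.0, 0: 0.6667})
```

Under this assumption, rule 0 outweighs rule 2, so the sample would be assigned class 1.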

### Questions

**Q 5.2.2 – Evaluate the global interpretability of the model.**

...

---
### LOCAL

In [None]:
rule_coverage_metrics = exp.evaluate_rule_coverage_metrics(X_test, info=True)

Number of instances not covered by any rule: 12
Average number of rules per sample: 1.37
Average length of rules per sample: 2.53


In [None]:
i = 9
print(f'True and predicted values for sample at index {i}:')
print(y_results.loc[i,:], '\n')
print(f'Sample {i} features:')
X_test.loc[[i]]

True and predicted values for sample at index 9:
y_test      1
rug_pred    0
Name: 9, dtype: int64 

Sample 9 features:


Unnamed: 0,Age,SibSp,Parch,Fare,Sex_1,Embarked_1,Embarked_2,Pclass_1,Pclass_2
9,19.0,0,0,8.05,1,0,1,0,1


In [None]:
exp.find_applicable_rules_for_samples(X_test.iloc[[i]], feature_names=list(feature_names), info=True)

Rules for instance 0
RULE 6:
13.50     < Age       <= inf       or null
-inf      < Fare      <= 26.14     or null
0.50      < Sex_1     <= inf       or null
Class: 0
Scaled rule weight: 0.3333



[[6]]
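
The result can be checked by hand: a rule covers a sample when every condition `low < value <= high` holds. The sketch below re-implements that check for rule 6 and sample 9 using plain dictionaries. This representation is an assumption for illustration, not how `ruleopt` stores rules, and it ignores the `or null` / `and not null` handling of missing values.

```python
import math

# Rule 6 as printed above: feature -> (exclusive lower bound, inclusive upper bound).
rule_6 = {
    "Age": (13.50, math.inf),
    "Fare": (-math.inf, 26.14),
    "Sex_1": (0.50, math.inf),
}

# Sample 9 from the test set, as shown above.
sample_9 = {"Age": 19.0, "SibSp": 0, "Parch": 0, "Fare": 8.05, "Sex_1": 1,
            "Embarked_1": 0, "Embarked_2": 1, "Pclass_1": 0, "Pclass_2": 1}

def covers(rule, sample):
    """True if every condition `low < value <= high` holds for the sample."""
    return all(low < sample[feature] <= high
               for feature, (low, high) in rule.items())

print(covers(rule_6, sample_9))  # True
```

Rule 6 is the only rule that applies to this sample and it votes for class 0, which matches `rug_pred = 0`.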

### Questions

**Q 5.2.3 – Take a look at the local explanation for the sample with index 7 of the test set. Explain how the prediction is made.**

...

**Q 5.2.4 – Explain the main difference between RUG and the algorithm here that produces a decision list.**

...