<a href="https://colab.research.google.com/github/VarunBabbar/SPLIT-ICML/blob/updated_readme/SPLIT_Usage.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## **Installing system dependencies (this may take ~5-10 minutes)**


In [1]:
!git clone -b updated_readme --single-branch https://github.com/VarunBabbar/SPLIT-ICML.git
!apt-get install -y libtbb-dev cmake ninja-build pkg-config libgmp-dev
!cd SPLIT-ICML && pip install split/ resplit/

Cloning into 'SPLIT-ICML'...
remote: Enumerating objects: 924, done.[K
remote: Counting objects: 100% (91/91), done.[K
remote: Compressing objects: 100% (74/74), done.[K
remote: Total 924 (delta 45), reused 24 (delta 9), pack-reused 833 (from 2)[K
Receiving objects: 100% (924/924), 3.55 MiB | 14.24 MiB/s, done.
Resolving deltas: 100% (353/353), done.
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
cmake is already the newest version (3.22.1-1ubuntu1.22.04.2).
The following packages were automatically installed and are no longer required:
  libbz2-dev libpkgconf3 libreadline-dev
Use 'apt autoremove' to remove them.
The following additional packages will be installed:
  libgmpxx4ldbl libtbb12 libtbbmalloc2
Suggested packages:
  gmp-doc libgmp10-doc libmpfr-dev libtbb-doc
The following packages will be REMOVED:
  pkgconf r-base-dev
The following NEW packages will be installed:
  libgmp-dev libgmpxx4ldbl libtbb-dev libtbb12 libtbbmalloc2

## **Import packages and load dataset**

In [3]:
import sys
import os
import time
from split import SPLIT, LicketySPLIT
from resplit import RESPLIT
import pandas as pd

In [4]:
dataset_path = "SPLIT-ICML/resplit/example/compas.csv"

In [5]:
df = pd.read_csv(dataset_path) # dataset is already binarized

## **Parameters for SPLIT and LicketySPLIT**

In [6]:
lookahead_depth = 2
depth_buget = 5
regularization = 0.001

## **Train SPLIT**

In [7]:

X,y = df.iloc[:,:-1], df.iloc[:,-1]
# y should correspond to a binary class label.
model_split = SPLIT(lookahead_depth_budget=lookahead_depth, reg=regularization, full_depth_budget=depth_buget, verbose=False, binarize=True,time_limit=100)
# set binarize = True if dataset is not binarized
start_time = time.perf_counter()
model_split.fit(X,y)
end_time = time.perf_counter()
y_pred = model_split.predict(X)
tree = model_split.tree
print(tree)
print("Training_time: {}".format(end_time-start_time))


{ feature: 1 [ left child: { feature: 4 [ left child: { prediction: 1, loss: 0.05814739689230919 }, right child: { feature: 5 [ left child: { prediction: 1, loss: 0.13286003470420837 }, right child: { feature: 7 [ left child: { prediction: 1, loss: 0.005070993676781654 }, right child: { prediction: 0, loss: 0.13218390941619873 }] }] }] }, right child: { feature: 5 [ left child: { feature: 7 [ left child: { feature: 8 [ left child: { prediction: 0, loss: 0.012756689451634884 }, right child: { prediction: 1, loss: 0.04884878918528557 }] }, right child: { prediction: 0, loss: 0.05040448158979416 }] }, right child: { prediction: 0, loss: 0.18139390647411346 }] }] }
Training_time: 0.8152883660000043


## **Train LicketySPLIT**

In [8]:
model_licketysplit = LicketySPLIT(reg = regularization, full_depth_budget=depth_buget, binarize=False)
start_time = time.perf_counter()
model_licketysplit.fit(X,y)
end_time = time.perf_counter()
y_pred = model_split.predict(X)
tree = model_split.tree
loss = (y_pred!= y).mean()
print(tree)
print("Training_time: {}".format(end_time-start_time))
print("Loss", loss)

{ feature: 1 [ left child: { feature: 4 [ left child: { prediction: 1, loss: 0.05814739689230919 }, right child: { feature: 5 [ left child: { prediction: 1, loss: 0.13286003470420837 }, right child: { feature: 7 [ left child: { prediction: 1, loss: 0.005070993676781654 }, right child: { prediction: 0, loss: 0.13218390941619873 }] }] }] }, right child: { feature: 5 [ left child: { feature: 7 [ left child: { feature: 8 [ left child: { prediction: 0, loss: 0.012756689451634884 }, right child: { prediction: 1, loss: 0.04884878918528557 }] }, right child: { prediction: 0, loss: 0.05040448158979416 }] }, right child: { prediction: 0, loss: 0.18139390647411346 }] }] }
Training_time: 0.45865357900015624
Loss 0.3101101749837978


## **Find Rashomon set using RESPLIT**

In [9]:
config = {
    "regularization": 0.005,
    "rashomon_bound_multiplier": 0.01, # Sets the Rashomon set threshold as the set of all models which are within `(1+ε)L*` of the best loss `L*`.
    "depth_budget": 5,
    'cart_lookahead_depth': 3,
    "verbose": False
}
rset = RESPLIT(config, fill_tree = "treefarms")

# Options for fill_tree: "treefarms", "optimal", "greedy".
# "treefarms" will fill each leaf of each prefix with another TreeFARMS Rashomon set.
# "optimal" will complete prefixes using GOSDT.
# "greedy" will do so using greedy completions.
rset.fit(X,y)
i = 0
tree = rset[i] # get the ith tree


Found set of near optimal prefixes. Filling in their leaves now.


100%|██████████| 23/23 [00:18<00:00,  1.21it/s]

23





In [11]:
print(tree)
y_pred = rset.predict(X,i) # predictions for the ith tree

{ feature: 28 [ left child: { feature: 7 [ left child: { prediction: 0, loss: 0.16121192482177576 }, right child: { prediction: 1, loss: 0.008587167854828257 }] }, right child: { feature: 1 [ left child: { feature: 30 [ left child: { prediction: 0, loss: 0.02624756966947505 }, right child: { prediction: 1, loss: 0.03775113415424498 }] }, right child: { prediction: 1, loss: 0.0865197666882696 }] }] }


In [13]:
print(len(rset))

91
