## Import Packages 

In [1]:
import dataset
import tree as miptree
from sklearn import tree
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

## Set Parameters

In [2]:
timelimit = 600  # solver time limit in seconds
seed = 42        # random seed for the data splits
d = 2            # maximum tree depth

In [3]:
train_ratio = 0.5
val_ratio = 0.25
test_ratio = 0.25
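
The three ratios are meant to partition the data, so they should sum to 1, and the second-stage split below divides only the held-out part. A minimal sanity check, with the values copied from the cell above (the `ratios` dict and `second_stage_test_size` are illustrative names, not part of the original notebook):

```python
import math

# Values copied from the cell above.
train_ratio, val_ratio, test_ratio = 0.5, 0.25, 0.25

ratios = {"train": train_ratio, "val": val_ratio, "test": test_ratio}
total = sum(ratios.values())
# Compare with a tolerance: ratios such as 0.7/0.15/0.15 may not sum exactly in floating point.
assert math.isclose(total, 1.0), f"ratios sum to {total}, expected 1.0"

# Fraction of the held-out data that goes to the test set in the second split.
second_stage_test_size = test_ratio / (test_ratio + val_ratio)
print(second_stage_test_size)
```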

## Load Data 

In [4]:
x, y = dataset.loadData('house-votes-84')

In [5]:
x_enc = dataset.oneHot(x)
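
The implementation of `dataset.oneHot` is not shown in this notebook. As a general illustration only, one-hot encoding of categorical columns can be sketched with scikit-learn's `OneHotEncoder`. The toy array `x_toy` and the choice of `drop="if_binary"` (which keeps a two-valued column as a single 0/1 column) are assumptions made for this example, not a description of what `dataset.oneHot` does:

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Toy stand-in for categorical vote data: 'y' or 'n' per feature.
x_toy = np.array([["y", "n"],
                  ["n", "n"],
                  ["y", "y"]])

# drop="if_binary" encodes a two-category column as one indicator column.
encoder = OneHotEncoder(drop="if_binary")
# The encoder returns a sparse matrix by default; convert it to a dense array.
x_toy_enc = encoder.fit_transform(x_toy).toarray()
print(x_toy_enc)
```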

In [6]:
# Split raw and one-hot encoded features together so their rows stay aligned.
x_train, x_test, x_train_enc, x_test_enc, y_train, y_test = train_test_split(
    x, x_enc, y, test_size=1-train_ratio, random_state=seed)
# Split the held-out part into validation and test sets.
x_val, x_test, x_val_enc, x_test_enc, y_val, y_test = train_test_split(
    x_test, x_test_enc, y_test, test_size=test_ratio/(test_ratio+val_ratio), random_state=seed)
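
The raw and encoded features must end up in the same rows of each split. With a fixed `random_state`, `train_test_split` selects rows by position, so separate calls on equally long arrays and one joint call over all arrays should agree. A small check on synthetic stand-in data (the `*_demo` arrays are made up for this sketch and only mimic the 232-row size implied by the logs):

```python
import numpy as np
from sklearn.model_selection import train_test_split

seed = 42
rng = np.random.default_rng(0)
x_demo = rng.integers(0, 2, size=(232, 16))  # stand-in for the raw features
x_demo_enc = x_demo * 10                     # stand-in for an encoded view, row-aligned with x_demo
y_demo = rng.integers(0, 2, size=232)        # stand-in for the labels

# Two separate calls with the same seed ...
a_tr, a_te, ya_tr, ya_te = train_test_split(x_demo, y_demo, test_size=0.5, random_state=seed)
b_tr, b_te, yb_tr, yb_te = train_test_split(x_demo_enc, y_demo, test_size=0.5, random_state=seed)

# ... and one joint call over all three arrays.
j_tr, j_te, j_tr_enc, j_te_enc, yj_tr, yj_te = train_test_split(
    x_demo, x_demo_enc, y_demo, test_size=0.5, random_state=seed)

same_rows = (np.array_equal(a_tr, j_tr) and np.array_equal(b_tr, j_tr_enc)
             and np.array_equal(ya_tr, yb_tr) and np.array_equal(ya_tr, yj_tr))
print(same_rows, len(j_tr), len(j_te))
```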

## Optimal Classification Tree

In [7]:
octree = miptree.optimalDecisionTreeClassifier(max_depth=d, min_samples_split=0, alpha=0.01, timelimit=timelimit)
octree.fit(x_train, y_train)

Training data include 116 instances, 16 features.
Academic license - for non-commercial use only - expires 2021-06-13
Using license file C:\Users\Apocrypse\gurobi.lic
Parameter outputFlag unchanged
   Value: 1  Min: 0  Max: 1  Default: 1
Parameter LogToConsole unchanged
   Value: 1  Min: 0  Max: 1  Default: 1
Changed value of parameter timelimit to 600.0
   Prev: inf  Min: 0.0  Max: inf  Default: inf
Parameter threads unchanged
   Value: 0  Min: 0  Max: 1024  Default: 0
Gurobi Optimizer version 9.1.1 build v9.1.1rc0 (win64)
Thread count: 4 physical cores, 8 logical processors, using up to 8 threads
Optimize a model with 1552 rows, 546 columns and 16105 nonzeros
Model fingerprint: 0xfcf2b093
Variable types: 19 continuous, 527 integer (527 binary)
Coefficient statistics:
  Matrix range     [1e+00, 1e+02]
  Objective range  [1e-02, 2e-02]
  Bounds range     [1e+00, 1e+00]
  RHS range        [1e+00, 1e+02]

User MIP start produced solution with objective 0.0815385 (0.01s)
Loaded user MIP s

In [8]:
y_train_pred = octree.predict(x_train)
accuracy_score(y_train, y_train_pred)

0.9655172413793104

In [9]:
y_test_pred = octree.predict(x_test)
accuracy_score(y_test, y_test_pred)

0.9655172413793104

## Optimal Classification Tree with Binary Encoding

In [10]:
boct = miptree.binOptimalDecisionTreeClassifier(max_depth=d, min_samples_split=0, timelimit=timelimit)
boct.fit(x_train, y_train)

Training data include 116 instances, 16 features.
Parameter outputFlag unchanged
   Value: 1  Min: 0  Max: 1  Default: 1
Parameter LogToConsole unchanged
   Value: 1  Min: 0  Max: 1  Default: 1
Changed value of parameter timelimit to 600.0
   Prev: inf  Min: 0.0  Max: inf  Default: inf
Parameter threads unchanged
   Value: 0  Min: 0  Max: 1024  Default: 0
Gurobi Optimizer version 9.1.1 build v9.1.1rc0 (win64)
Thread count: 4 physical cores, 8 logical processors, using up to 8 threads
Optimize a model with 179 rows, 528 columns and 8472 nonzeros
Model fingerprint: 0x441ee230
Variable types: 472 continuous, 56 integer (56 binary)
Coefficient statistics:
  Matrix range     [1e+00, 1e+02]
  Objective range  [2e-02, 2e-02]
  Bounds range     [1e+00, 1e+00]
  RHS range        [1e+00, 1e+02]

User MIP start produced solution with objective 0.0615385 (0.02s)
Loaded user MIP start with objective 0.0615385

Presolve removed 27 rows and 99 columns
Presolve time: 0.03s
Presolved: 152 rows, 429 col

In [11]:
y_train_pred = boct.predict(x_train)
accuracy_score(y_train, y_train_pred)

0.9655172413793104

In [12]:
y_test_pred = boct.predict(x_test)
accuracy_score(y_test, y_test_pred)

0.9655172413793104

## Max Flow Classification Tree

In [13]:
mfoct = miptree.maxFlowOptimalDecisionTreeClassifier(max_depth=d, alpha=0, timelimit=timelimit)
mfoct.fit(x_train_enc, y_train)

Training data include 116 instances, 16 features.
Parameter outputFlag unchanged
   Value: 1  Min: 0  Max: 1  Default: 1
Parameter LogToConsole unchanged
   Value: 1  Min: 0  Max: 1  Default: 1
Changed value of parameter timelimit to 600.0
   Prev: inf  Min: 0.0  Max: inf  Default: inf
Parameter threads unchanged
   Value: 0  Min: 0  Max: 1024  Default: 0
Changed value of parameter lazyConstraints to 1
   Prev: 0  Min: 0  Max: 1  Default: 0
Gurobi Optimizer version 9.1.1 build v9.1.1rc0 (win64)
Thread count: 4 physical cores, 8 logical processors, using up to 8 threads
Optimize a model with 14 rows, 185 columns and 86 nonzeros
Model fingerprint: 0xacf718a6
Variable types: 116 continuous, 69 integer (69 binary)
Coefficient statistics:
  Matrix range     [1e+00, 1e+00]
  Objective range  [2e-02, 2e-02]
  Bounds range     [1e+00, 1e+00]
  RHS range        [1e+00, 1e+00]

User MIP start did not produce a new incumbent solution

Presolve removed 6 rows and 6 columns
Presolve time: 0.00s
Pre

In [14]:
y_train_pred = mfoct.predict(x_train_enc)
accuracy_score(y_train, y_train_pred)

0.9655172413793104

In [15]:
y_test_pred = mfoct.predict(x_test_enc)
accuracy_score(y_test, y_test_pred)

0.9655172413793104

## scikit-learn Decision Tree

In [16]:
clf = tree.DecisionTreeClassifier(max_depth=d)
clf.fit(x_train, y_train)

DecisionTreeClassifier(max_depth=2)
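
To see which splits a fitted scikit-learn tree uses, `sklearn.tree.export_text` prints its rules as text. A sketch on synthetic data, where the label is simply a copy of the first feature (the `*_demo` names and the data are assumptions for illustration; the notebook's `clf` could be passed in the same way):

```python
import numpy as np
from sklearn import tree

# Synthetic binary features; the label equals feature 0, so one split suffices.
rng = np.random.default_rng(0)
x_demo = rng.integers(0, 2, size=(100, 4))
y_demo = x_demo[:, 0]

clf_demo = tree.DecisionTreeClassifier(max_depth=2, random_state=42)
clf_demo.fit(x_demo, y_demo)

# Without feature_names, columns are reported as feature_0, feature_1, ...
rules = tree.export_text(clf_demo)
print(rules)
```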

In [17]:
y_train_pred = clf.predict(x_train)
accuracy_score(y_train, y_train_pred)

0.9655172413793104

In [18]:
y_test_pred = clf.predict(x_test)
accuracy_score(y_test, y_test_pred)

0.9655172413793104
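
The same two accuracy cells are repeated for every model above. A small helper could collect both scores in one call; the sketch below assumes only that a model has a scikit-learn style `predict` method, and it is demonstrated on synthetic data with a scikit-learn tree (the name `evaluate` and the `*_demo` variables are hypothetical):

```python
import numpy as np
from sklearn import tree
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

def evaluate(model, x_tr, y_tr, x_te, y_te):
    """Return train and test accuracy of an already fitted model."""
    return {
        "train": accuracy_score(y_tr, model.predict(x_tr)),
        "test": accuracy_score(y_te, model.predict(x_te)),
    }

# Synthetic data in which the label equals the first feature.
rng = np.random.default_rng(0)
x_demo = rng.integers(0, 2, size=(200, 16))
y_demo = x_demo[:, 0]
x_tr, x_te, y_tr, y_te = train_test_split(x_demo, y_demo, test_size=0.5, random_state=42)

clf_demo = tree.DecisionTreeClassifier(max_depth=2, random_state=42).fit(x_tr, y_tr)
scores = evaluate(clf_demo, x_tr, y_tr, x_te, y_te)
print(scores)
```

In the notebook, the same call would take the fitted `octree`, `boct`, `mfoct` (with the encoded features) or `clf` in place of `clf_demo`.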