# Timing Isolation Forest libraries

This notebook times the fitting of Isolation Forest (iForest) and Extended Isolation Forest models on 3 datasets of varying sizes, using the libraries [IsoTree](https://github.com/david-cortes/isotree), [scikit-learn](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.IsolationForest.html), and [EIF](https://github.com/sahandha/eif).

All models are fit with 100 trees, using the same fixed sample sizes across libraries (256, 1024, and either the full dataset or 10,000 rows). Each configuration is run single-threaded and, where the library supports it, multi-threaded. The CPU is an AMD Ryzen 7 2700 running at 3.2 GHz with 16 threads.

Timing sections:
* [Satellite (6,435 rows, 36 columns)](#p1)
* [CovType (581,012 rows, 54 columns)](#p2)
* [RCV1 (804,414 rows, 47,236 columns)](#p3)
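
The timings in this notebook come from IPython's `%%timeit` cell magic. To reproduce a measurement outside a notebook, a rough equivalent can be sketched with the standard-library `timeit` module. The helper name `time_fit` and its defaults are illustrative assumptions, not part of any of the benchmarked libraries; also note that `%%timeit` chooses its loop count automatically, whereas this sketch takes it as an argument.

```python
import statistics
import timeit


def time_fit(fit_fn, repeat=7, number=1):
    """Time a zero-argument callable; return (mean, std. dev.) in seconds per loop.

    Hypothetical helper: it produces a "mean ± std. dev. of 7 runs" style
    summary, but with a fixed, caller-supplied loop count.
    """
    # timeit.repeat returns the total seconds for `number` calls, once per run.
    totals = timeit.repeat(fit_fn, repeat=repeat, number=number)
    per_loop = [t / number for t in totals]
    return statistics.mean(per_loop), statistics.pstdev(per_loop)


# Stand-in workload; in practice, pass a function that builds and fits a model.
mean_s, std_s = time_fit(lambda: sum(range(10_000)), repeat=7, number=100)
print(f"{mean_s * 1e3:.3f} ms ± {std_s * 1e3:.3f} ms per loop")
```

Passing something like `lambda: IsolationForestSKL(n_estimators=100, max_samples=256, n_jobs=1).fit(X)` as `fit_fn` would time the same work as the corresponding cell, since model construction is included in each timed cell here.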

In [1]:
import numpy as np
from isotree import IsolationForest as IsolationForestIsoTree
from sklearn.ensemble import IsolationForest as IsolationForestSKL
from eif import iForest
from scipy.io import loadmat

<a id="p1"></a>
### Small dataset: Satellite (6,435 rows, 36 columns)

Data was taken from the [ODDS repository](http://odds.cs.stonybrook.edu/satellite-dataset/).

In [2]:
satellite = loadmat("satellite.mat")
X = np.asfortranarray(satellite["X"]).astype(np.float64)
X.shape

(6435, 36)

Timing isotree, single-variable model:

In [3]:
%%timeit
iso = IsolationForestIsoTree(ntrees=100, sample_size=256,
                             ndim=1, nthreads=1,
                             missing_action="fail")
iso.fit(X)

5.31 ms ± 375 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [4]:
%%timeit
iso = IsolationForestIsoTree(ntrees=100, sample_size=1024,
                             ndim=1, nthreads=1,
                             missing_action="fail")
iso.fit(X)

10.7 ms ± 39.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [5]:
%%timeit
iso = IsolationForestIsoTree(ntrees=100, sample_size=6435,
                             ndim=1, nthreads=1,
                             missing_action="fail")
iso.fit(X)

52.2 ms ± 355 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [6]:
%%timeit
iso = IsolationForestIsoTree(ntrees=100, sample_size=256,
                             ndim=1, nthreads=-1,
                             missing_action="fail")
iso.fit(X)

875 µs ± 35.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


In [7]:
%%timeit
iso = IsolationForestIsoTree(ntrees=100, sample_size=1024,
                             ndim=1, nthreads=-1,
                             missing_action="fail")
iso.fit(X)

1.64 ms ± 8.18 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


In [8]:
%%timeit
iso = IsolationForestIsoTree(ntrees=100, sample_size=6435,
                             ndim=1, nthreads=-1,
                             missing_action="fail")
iso.fit(X)

6.41 ms ± 17 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


Timing isotree, extended model:

In [9]:
%%timeit
iso = IsolationForestIsoTree(ntrees=100, sample_size=256,
                             ndim=2, nthreads=1,
                             missing_action="fail")
iso.fit(X)

11 ms ± 50 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [10]:
%%timeit
iso = IsolationForestIsoTree(ntrees=100, sample_size=1024,
                             ndim=2, nthreads=1,
                             missing_action="fail")
iso.fit(X)

31.7 ms ± 442 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [11]:
%%timeit
iso = IsolationForestIsoTree(ntrees=100, sample_size=6435,
                             ndim=2, nthreads=1,
                             missing_action="fail")
iso.fit(X)

186 ms ± 1.52 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [12]:
%%timeit
iso = IsolationForestIsoTree(ntrees=100, sample_size=256,
                             ndim=2, nthreads=-1,
                             missing_action="fail")
iso.fit(X)

2.24 ms ± 15.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [13]:
%%timeit
iso = IsolationForestIsoTree(ntrees=100, sample_size=1024,
                             ndim=2, nthreads=-1,
                             missing_action="fail")
iso.fit(X)

5.63 ms ± 48.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [14]:
%%timeit
iso = IsolationForestIsoTree(ntrees=100, sample_size=6435,
                             ndim=2, nthreads=-1,
                             missing_action="fail")
iso.fit(X)

25.4 ms ± 136 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)


Timing scikit-learn:

In [15]:
%%timeit
iso = IsolationForestSKL(n_estimators=100, max_samples=256,
                         n_jobs=1)
iso.fit(X)

166 ms ± 9.48 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [16]:
%%timeit
iso = IsolationForestSKL(n_estimators=100, max_samples=1024,
                         n_jobs=1)
iso.fit(X)

170 ms ± 1.87 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [17]:
%%timeit
iso = IsolationForestSKL(n_estimators=100, max_samples=6435,
                         n_jobs=1)
iso.fit(X)

233 ms ± 5.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [18]:
%%timeit
iso = IsolationForestSKL(n_estimators=100, max_samples=256,
                         n_jobs=-1)
iso.fit(X)

305 ms ± 660 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [19]:
%%timeit
iso = IsolationForestSKL(n_estimators=100, max_samples=1024,
                         n_jobs=-1)
iso.fit(X)

305 ms ± 540 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [20]:
%%timeit
iso = IsolationForestSKL(n_estimators=100, max_samples=6435,
                         n_jobs=-1)
iso.fit(X)

277 ms ± 44.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


Timing eif, single-variable model:

In [21]:
%%timeit
iso = iForest(X, ntrees=100, sample_size=256, ExtensionLevel=0, seed=1)

98.9 ms ± 382 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [22]:
%%timeit
iso = iForest(X, ntrees=100, sample_size=1024, ExtensionLevel=0, seed=1)

325 ms ± 3.43 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [23]:
%%timeit
iso = iForest(X, ntrees=100, sample_size=6435, ExtensionLevel=0, seed=1)

2.18 s ± 8.23 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


Timing eif, extended model:

In [24]:
%%timeit
iso = iForest(X, ntrees=100, sample_size=256, ExtensionLevel=1, seed=1)

94.3 ms ± 337 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [25]:
%%timeit
iso = iForest(X, ntrees=100, sample_size=1024, ExtensionLevel=1, seed=1)

333 ms ± 2.78 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [26]:
%%timeit
iso = iForest(X, ntrees=100, sample_size=6435, ExtensionLevel=1, seed=1)

2.21 s ± 16.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
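
One way to read the Satellite numbers is as a parallel speedup: the single-threaded time divided by the all-threads time for the same configuration. The sketch below does that arithmetic for the isotree single-variable model and for scikit-learn, using the mean timings copied from the cells above. The dictionary layout and variable names are only illustrative, and the ratios ignore the reported standard deviations.

```python
# Mean fit times in milliseconds, copied from the Satellite cells above.
# Inner keys are sample sizes; values are (single-threaded, all threads).
satellite_ms = {
    "isotree (ndim=1)": {256: (5.31, 0.875), 1024: (10.7, 1.64), 6435: (52.2, 6.41)},
    "scikit-learn": {256: (166.0, 305.0), 1024: (170.0, 305.0), 6435: (233.0, 277.0)},
}

# Speedup = single-threaded time / multi-threaded time (below 1 means slower).
speedups = {
    lib: {n: single / multi for n, (single, multi) in by_size.items()}
    for lib, by_size in satellite_ms.items()
}

for lib, by_size in speedups.items():
    for n, ratio in by_size.items():
        print(f"{lib:18s} sample_size={n:5d}  speedup={ratio:.2f}x")
```

On this small dataset, isotree's fit is roughly 6x to 8x faster with all threads, while every scikit-learn ratio comes out below 1, meaning `n_jobs=-1` was slower than `n_jobs=1`.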


<a id="p2"></a>
### Mid-sized dataset: CovType (581,012 rows, 54 columns)

In [27]:
from sklearn.datasets import fetch_covtype

X, y = fetch_covtype(return_X_y=True)
X = np.asfortranarray(X).astype(np.float64)
X.shape

(581012, 54)

Timing isotree, single-variable model:

In [28]:
%%timeit
iso = IsolationForestIsoTree(ntrees=100, sample_size=256,
                             ndim=1, nthreads=1,
                             missing_action="fail")
iso.fit(X)

7.52 ms ± 222 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [29]:
%%timeit
iso = IsolationForestIsoTree(ntrees=100, sample_size=1024,
                             ndim=1, nthreads=1,
                             missing_action="fail")
iso.fit(X)

28.4 ms ± 215 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [30]:
%%timeit
iso = IsolationForestIsoTree(ntrees=100, sample_size=10000,
                             ndim=1, nthreads=1,
                             missing_action="fail")
iso.fit(X)

326 ms ± 2.57 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [31]:
%%timeit
iso = IsolationForestIsoTree(ntrees=100, sample_size=256,
                             ndim=1, nthreads=-1,
                             missing_action="fail")
iso.fit(X)

1.61 ms ± 7.38 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


In [32]:
%%timeit
iso = IsolationForestIsoTree(ntrees=100, sample_size=1024,
                             ndim=1, nthreads=-1,
                             missing_action="fail")
iso.fit(X)

6.31 ms ± 23.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [33]:
%%timeit
iso = IsolationForestIsoTree(ntrees=100, sample_size=10000,
                             ndim=1, nthreads=-1,
                             missing_action="fail")
iso.fit(X)

84.8 ms ± 128 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)


Timing isotree, extended model:

In [34]:
%%timeit
iso = IsolationForestIsoTree(ntrees=100, sample_size=256,
                             ndim=2, nthreads=1,
                             missing_action="fail")
iso.fit(X)

13.9 ms ± 23.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [35]:
%%timeit
iso = IsolationForestIsoTree(ntrees=100, sample_size=1024,
                             ndim=2, nthreads=1,
                             missing_action="fail")
iso.fit(X)

53.2 ms ± 721 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [36]:
%%timeit
iso = IsolationForestIsoTree(ntrees=100, sample_size=10000,
                             ndim=2, nthreads=1,
                             missing_action="fail")
iso.fit(X)

604 ms ± 2.47 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [37]:
%%timeit
iso = IsolationForestIsoTree(ntrees=100, sample_size=256,
                             ndim=2, nthreads=-1,
                             missing_action="fail")
iso.fit(X)

3.26 ms ± 17.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [38]:
%%timeit
iso = IsolationForestIsoTree(ntrees=100, sample_size=1024,
                             ndim=2, nthreads=-1,
                             missing_action="fail")
iso.fit(X)

12.3 ms ± 38.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [39]:
%%timeit
iso = IsolationForestIsoTree(ntrees=100, sample_size=10000,
                             ndim=2, nthreads=-1,
                             missing_action="fail")
iso.fit(X)

168 ms ± 121 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)


Timing scikit-learn:

In [40]:
%%timeit
iso = IsolationForestSKL(n_estimators=100, max_samples=256,
                         n_jobs=1)
iso.fit(X)

10.1 s ± 162 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [41]:
%%timeit
iso = IsolationForestSKL(n_estimators=100, max_samples=1024,
                         n_jobs=1)
iso.fit(X)

10.6 s ± 155 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [42]:
%%timeit
iso = IsolationForestSKL(n_estimators=100, max_samples=10000,
                         n_jobs=1)
iso.fit(X)

11.1 s ± 170 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [43]:
%%timeit
iso = IsolationForestSKL(n_estimators=100, max_samples=256,
                         n_jobs=-1)
iso.fit(X)

8.3 s ± 90.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [44]:
%%timeit
iso = IsolationForestSKL(n_estimators=100, max_samples=1024,
                         n_jobs=-1)
iso.fit(X)

8.01 s ± 71.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [45]:
%%timeit
iso = IsolationForestSKL(n_estimators=100, max_samples=10000,
                         n_jobs=-1)
iso.fit(X)

6.89 s ± 188 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


Timing eif, single-variable model:

In [46]:
%%timeit
iso = iForest(X, ntrees=100, sample_size=256, ExtensionLevel=0, seed=1)

149 ms ± 1.09 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [47]:
%%timeit
iso = iForest(X, ntrees=100, sample_size=1024, ExtensionLevel=0, seed=1)

398 ms ± 3.01 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [48]:
%%timeit
iso = iForest(X, ntrees=100, sample_size=10000, ExtensionLevel=0, seed=1)

4.99 s ± 73.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


Timing eif, extended model:

In [49]:
%%timeit
iso = iForest(X, ntrees=100, sample_size=256, ExtensionLevel=1, seed=1)

160 ms ± 865 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [50]:
%%timeit
iso = iForest(X, ntrees=100, sample_size=1024, ExtensionLevel=1, seed=1)

428 ms ± 8.84 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [51]:
%%timeit
iso = iForest(X, ntrees=100, sample_size=10000, ExtensionLevel=1, seed=1)

5.06 s ± 6.37 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


<a id="p3"></a>
### Large sparse dataset: RCV1 (804,414 rows, 47,236 columns)

**(Note: the EIF library does not currently support sparse input types, so it is not timed on this dataset)**
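
To get a rough sense of why sparse support matters here, consider what a dense copy of RCV1 would require. The back-of-the-envelope calculation below uses only the shape quoted in the heading and 8 bytes per `float64` value; it is an estimate that ignores any array overhead.

```python
n_rows, n_cols = 804_414, 47_236  # RCV1 shape quoted in the heading
bytes_per_value = 8               # size of one float64 value

dense_bytes = n_rows * n_cols * bytes_per_value
print(f"dense float64 copy: {dense_bytes / 1e9:.0f} GB")  # about 304 GB
```

At roughly 304 GB, a dense array of this shape is far beyond the memory of a typical workstation, which is why this section sticks to libraries that accept sparse matrices.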

In [52]:
from sklearn.datasets import fetch_rcv1
from scipy.sparse import csc_matrix

X, y = fetch_rcv1(return_X_y=True)
X = csc_matrix(X).astype(np.float64)

Timing isotree, single-variable model:

In [53]:
%%timeit
iso = IsolationForestIsoTree(ntrees=100, sample_size=256,
                             ndim=1, nthreads=1,
                             missing_action="fail")
iso.fit(X)

67.7 ms ± 1.09 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [54]:
%%timeit
iso = IsolationForestIsoTree(ntrees=100, sample_size=1024,
                             ndim=1, nthreads=1,
                             missing_action="fail")
iso.fit(X)

118 ms ± 6.44 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [55]:
%%timeit
iso = IsolationForestIsoTree(ntrees=100, sample_size=10000,
                             ndim=1, nthreads=1,
                             missing_action="fail")
iso.fit(X)

490 ms ± 3.33 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [56]:
%%timeit
iso = IsolationForestIsoTree(ntrees=100, sample_size=256,
                             ndim=1, nthreads=-1,
                             missing_action="fail")
iso.fit(X)

45.6 ms ± 42.3 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [57]:
%%timeit
iso = IsolationForestIsoTree(ntrees=100, sample_size=1024,
                             ndim=1, nthreads=-1,
                             missing_action="fail")
iso.fit(X)

51.3 ms ± 84.6 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [58]:
%%timeit
iso = IsolationForestIsoTree(ntrees=100, sample_size=10000,
                             ndim=1, nthreads=-1,
                             missing_action="fail")
iso.fit(X)

97.7 ms ± 1.23 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


Timing isotree, extended model:

In [59]:
%%timeit
iso = IsolationForestIsoTree(ntrees=100, sample_size=256,
                             ndim=2, nthreads=1,
                             missing_action="fail")
iso.fit(X)

152 ms ± 7.04 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [60]:
%%timeit
iso = IsolationForestIsoTree(ntrees=100, sample_size=1024,
                             ndim=2, nthreads=1,
                             missing_action="fail")
iso.fit(X)

249 ms ± 2.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [61]:
%%timeit
iso = IsolationForestIsoTree(ntrees=100, sample_size=10000,
                             ndim=2, nthreads=1,
                             missing_action="fail")
iso.fit(X)

844 ms ± 19.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [62]:
%%timeit
iso = IsolationForestIsoTree(ntrees=100, sample_size=256,
                             ndim=2, nthreads=-1,
                             missing_action="fail")
iso.fit(X)

58.7 ms ± 69.8 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [63]:
%%timeit
iso = IsolationForestIsoTree(ntrees=100, sample_size=1024,
                             ndim=2, nthreads=-1,
                             missing_action="fail")
iso.fit(X)

71.1 ms ± 861 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [64]:
%%timeit
iso = IsolationForestIsoTree(ntrees=100, sample_size=10000,
                             ndim=2, nthreads=-1,
                             missing_action="fail")
iso.fit(X)

145 ms ± 1.66 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


Timing scikit-learn:

**(Note: scikit-learn runs out of memory on this machine when run with all available threads, so the multi-threaded runs below use `n_jobs=4`)**

In [65]:
%%timeit
iso = IsolationForestSKL(n_estimators=100, max_samples=256,
                         n_jobs=1)
iso.fit(X)

30.9 s ± 48.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [66]:
%%timeit
iso = IsolationForestSKL(n_estimators=100, max_samples=1024,
                         n_jobs=1)
iso.fit(X)

31.6 s ± 64.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [67]:
%%timeit
iso = IsolationForestSKL(n_estimators=100, max_samples=10000,
                         n_jobs=1)
iso.fit(X)

32.8 s ± 126 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [68]:
%%timeit
iso = IsolationForestSKL(n_estimators=100, max_samples=256,
                         n_jobs=4)
iso.fit(X)

17.8 s ± 160 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [69]:
%%timeit
iso = IsolationForestSKL(n_estimators=100, max_samples=1024,
                         n_jobs=4)
iso.fit(X)

18.1 s ± 317 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [70]:
%%timeit
iso = IsolationForestSKL(n_estimators=100, max_samples=10000,
                         n_jobs=4)
iso.fit(X)

18.5 s ± 104 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
