Support for isolation forests #322

david-cortes · 2021-10-28T03:24:41Z

Fixes #321

This PR adds support for isolation forest model types by:

Adding a prediction transformation function that follows the formula this model type uses to obtain a standardized score for outlier detection.
Adding the extra parameter this formula needs (a standardizing constant).
Adding converters for the scikit-learn implementation.
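
For readers unfamiliar with the formula mentioned above: the reference paper defines the score as s(x, n) = 2^(-E[h(x)] / c(n)), where E[h(x)] is the mean isolation depth of a point across trees and c(n) = 2H(n-1) - 2(n-1)/n is the standardizing constant for a subsample of size n. The following is only a minimal sketch of that definition, not the code added by this PR. The function names are hypothetical, and the harmonic number is summed exactly here rather than approximated.

```python
def harmonic(n):
    """Exact harmonic number H(n), summed directly (fine for small n)."""
    return sum(1.0 / k for k in range(1, n + 1))


def average_path_length(n):
    """Standardizing constant c(n) from the isolation forest paper.

    Hypothetical helper for illustration; not Treelite's implementation.
    """
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * harmonic(n - 1) - 2.0 * (n - 1) / n


def anomaly_score(mean_depth, n):
    """Standardized score in (0, 1]; higher means more anomalous."""
    return 2.0 ** (-mean_depth / average_path_length(n))


c = average_path_length(256)
print(anomaly_score(c, 256))        # a point with average depth
print(anomaly_score(0.5 * c, 256))  # isolated quickly: higher score
```

A point whose mean depth equals c(n) scores exactly 0.5, and shallower points score closer to 1.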

A small note about the scikit-learn implementation: this PR adds a prediction type different from what scikit-learn outputs when calling decision_function. What this PR does is based on the reference paper that introduced the algorithm, and it follows the convention "higher score -> more anomalous" (the opposite of scikit-learn, but what every other implementation does). Scikit-learn uses the same formula internally to calculate its decision_function, but the results will not match this PR beyond a few decimal places, because scikit-learn currently uses a less precise formula than this PR (explained in scikit-learn/scikit-learn#19087).
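
To make the sign convention concrete: scikit-learn documents score_samples as the opposite of the paper's anomaly score, and decision_function as score_samples minus the fitted offset_ attribute (which is -0.5 when contamination="auto"). Under those documented relations, a scikit-learn output can be mapped back to the "higher is more anomalous" scale roughly as sketched below. The helper names are hypothetical, and, as noted above, the values would only be expected to agree with this PR to a few decimal places.

```python
def paper_score_from_score_samples(score_samples_value):
    """score_samples is the negated paper score, so flip the sign."""
    return -score_samples_value


def paper_score_from_decision_function(decision_value, offset=-0.5):
    """decision_function = score_samples - offset_, so add the offset back
    before flipping the sign. The default offset assumes contamination="auto".
    """
    return -(decision_value + offset)


# A decision_function value of 0.0 sits on scikit-learn's inlier/outlier
# boundary, which corresponds to a paper score of 0.5.
print(paper_score_from_decision_function(0.0))
```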

I'm not sure whether this PR modifies enough files for the documentation to update, and I don't know whether the non-Python interfaces will work.

codecov · 2021-10-28T03:46:22Z

Codecov Report

Merging #322 (7df248b) into mainline (e524893) will decrease coverage by 0.87%.
The diff coverage is 89.56%.

@@              Coverage Diff               @@
##             mainline     #322      +/-   ##
==============================================
- Coverage       85.19%   84.31%   -0.88%     
+ Complexity         43       42       -1     
==============================================
  Files             108      108              
  Lines            8240     8329      +89     
  Branches          469      469              
==============================================
+ Hits             7020     7023       +3     
- Misses           1196     1283      +87     
+ Partials           24       23       -1

Impacted Files	Coverage Δ
include/treelite/frontend.h	`90.00% <ø> (ø)`
...main/java/ml/dmlc/treelite4j/java/TreeliteJNI.java	`45.45% <ø> (ø)`
tests/cpp/test_serializer.cc	`100.00% <ø> (ø)`
src/gtil/pred_transform.cc	`90.21% <20.00%> (-4.04%)`	⬇️
src/compiler/native/typeinfo_ctypes.h	`46.77% <41.66%> (-1.23%)`	⬇️
src/compiler/native/pred_transform.h	`81.48% <87.50%> (+0.65%)`	⬆️
include/treelite/predictor.h	`72.09% <100.00%> (+1.36%)`	⬆️
include/treelite/tree.h	`94.39% <100.00%> (ø)`
include/treelite/tree_impl.h	`90.67% <100.00%> (-0.04%)`	⬇️
python/treelite/sklearn/importer.py	`94.05% <100.00%> (+1.85%)`	⬆️
... and 31 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

python/treelite/sklearn/__init__.py

hcho3

LGTM. Thank you for your contribution.

src/compiler/pred_transform.cc

Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>

hcho3 · 2021-11-03T17:53:42Z

Thanks!

tzemicheal · 2021-12-03T17:11:01Z

Thanks, @david-cortes, for the contribution. I tried running a small inference test but couldn't run the code. Could you provide a small test case that runs isolation forest in FIL inference?

from sklearn.ensemble import IsolationForest
import treelite
import treelite.sklearn

X = [[-1.1], [0.3], [0.5], [100]]
clf = IsolationForest(random_state=0).fit(X)
clf.predict([[0.1], [0], [90]])
clf.decision_function([[0.1], [0], [90]])


model = treelite.sklearn.import_model(clf)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-8-be8734c89694> in <module>
----> 1 model = treelite.sklearn.import_model(clf)

/opt/conda/envs/rapids/lib/python3.7/site-packages/treelite/sklearn/importer.py in import_model(sklearn_model)
    133
    134     if isinstance(sklearn_model, IsolationForest):
--> 135         ratio_c = expected_depth(sklearn_model.max_samples)
    136
    137     node_count = []

/opt/conda/envs/rapids/lib/python3.7/site-packages/treelite/sklearn/importer.py in expected_depth(n_remainder)
     47 def expected_depth(n_remainder):
     48     """Calculates the expected isolation depth for a remainder of uniform points"""
---> 49     if n_remainder <= 1:
     50         return 0
     51     if n_remainder == 2:

TypeError: '<=' not supported between instances of 'str' and 'int'
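
The traceback above arises because IsolationForest's constructor default is max_samples="auto", so the max_samples attribute holds a string when it reaches the numeric comparison. scikit-learn documents "auto" as min(256, n_samples), an int as an absolute count (capped at the number of samples), and a float as a fraction of the samples; the fitted estimator also exposes the resolved value as max_samples_. The sketch below only illustrates that documented resolution rule with a hypothetical helper. It is not necessarily how the follow-up fix is implemented.

```python
import numbers


def resolve_max_samples(max_samples, n_samples):
    """Turn a max_samples setting into a concrete subsample size.

    Hypothetical helper mirroring scikit-learn's documented semantics.
    """
    if isinstance(max_samples, str):
        if max_samples != "auto":
            raise ValueError(f"unsupported max_samples: {max_samples!r}")
        return min(256, n_samples)
    if isinstance(max_samples, numbers.Integral):
        # An int larger than the data set means "use all samples".
        return min(int(max_samples), n_samples)
    # Otherwise treat it as a fraction of the training set.
    return int(max_samples * n_samples)


# The four-row data set from the report resolves "auto" to 4.
print(resolve_max_samples("auto", 4))
```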

hcho3 · 2021-12-03T17:36:09Z

@tzemicheal This PR is not yet part of the stable release of Treelite (2.1.0). Did you install Treelite from the source?

david-cortes · 2021-12-03T17:37:18Z

@tzemicheal Thanks for pointing this out. I've submitted a small PR which should fix it.

tzemicheal · 2021-12-03T18:06:37Z

> @tzemicheal This PR is not yet part of the stable release of Treelite (2.1.0). Did you install Treelite from the source?

Yes, I did install it from the source.

The 2.2.0 version of Treelite incorporates the following major improvements:
* dmlc/treelite#314
* dmlc/treelite#322, dmlc/treelite#327
* dmlc/treelite#325
* dmlc/treelite#332
* dmlc/treelite#330
* dmlc/treelite#333
* dmlc/treelite#334
* dmlc/treelite#304
* dmlc/treelite#335
In particular, dmlc/treelite#332, dmlc/treelite#330, dmlc/treelite#333 are required for #4447. Requires rapidsai/integration#412.
EDIT. Using 2.2.1 patch release, to incorporate a hotfix (dmlc/treelite#340).
Authors:
- Philip Hyunsu Cho (https://github.com/hcho3)
Approvers:
- AJ Schmidt (https://github.com/ajschmidt8)
- Dante Gama Dessavre (https://github.com/dantegd)
URL: #4484

david-cortes added 3 commits October 28, 2021 00:15

support for isolation forests

ade2ff3

linter

2f46f1d

fix java

fb485fd

david-cortes added 4 commits October 28, 2021 00:50

linter

84dbbd5

linter

a638d30

linter

bfadac1

linter

df80f53

hcho3 reviewed Oct 28, 2021

python/treelite/sklearn/__init__.py (outdated)

david-cortes added 4 commits October 28, 2021 01:29

remove iforest from 'import_model_with_model_builder'

49c7d86

remove unused

aef6aad

linter

46db915

linter

d3c6270

hcho3 requested changes Oct 28, 2021

src/compiler/pred_transform.cc

Update src/compiler/pred_transform.cc

7df248b

Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>

hcho3 merged commit 2994be4 into dmlc:mainline Nov 3, 2021

hcho3 mentioned this pull request Nov 24, 2021

[FEA] Isolation Forest implementation with FIL inference capability rapidsai/cuml#3838

Open

david-cortes mentioned this pull request Dec 3, 2021

Fix IsolationForest with max_samples="auto" #327

Merged

This was referenced Jan 13, 2022

Release 2.2.0 #338

Merged

Upgrade Treelite to 2.2.1 rapidsai/cuml#4484

Merged

Fix PyBuffer serializer #340

Merged

hcho3 mentioned this pull request Feb 22, 2022

Test GTIL with IsolationForest #370

Merged
