Support for isolation forests #322

Merged
merged 12 commits into from
Nov 3, 2021

david-cortes
Contributor

Fixes #321

This PR adds support for isolation forest model types by:

  • Adding a prediction transformation function which follows the formula used by this model type for obtaining a standardized score used for outlier detection.
  • Adding extra parameters as needed for this formula to work (a standardizing constant).
  • Adding converters from the scikit-learn implementation.
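
For readers unfamiliar with the formula the bullets refer to, here is a minimal sketch of the standardized score from the isolation forest paper: s(x, n) = 2^(-E[h(x)] / c(n)), where E[h(x)] is the average isolation depth of a point across trees and c(n) is the standardizing constant for a subsample of size n. The function name `expected_depth` mirrors the one visible in the traceback later in this thread, but the bodies below are an illustrative assumption, not the code merged in this PR; `harmonic` and `anomaly_score` are hypothetical helper names.

```python
import math


def harmonic(n: int) -> float:
    """Exact n-th harmonic number H(n) = 1 + 1/2 + ... + 1/n."""
    return math.fsum(1.0 / k for k in range(1, n + 1))


def expected_depth(n: int) -> float:
    """Standardizing constant c(n): expected isolation depth for n uniform points.

    The paper writes c(n) = 2*H(n-1) - 2*(n-1)/n, which simplifies
    algebraically to 2*(H(n) - 1).
    """
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (harmonic(n) - 1.0)


def anomaly_score(avg_path_length: float, n_samples: int) -> float:
    """Paper-style score in (0, 1]: higher means more anomalous."""
    c = expected_depth(n_samples)
    if c == 0.0:
        raise ValueError("n_samples must be at least 2")
    return 2.0 ** (-avg_path_length / c)


if __name__ == "__main__":
    n = 256
    # A point isolated at exactly the expected depth scores 0.5;
    # shallower (easier to isolate) points score closer to 1.
    print(anomaly_score(expected_depth(n), n))
    print(anomaly_score(2.0, n))
```

A score near 0.5 corresponds to an unremarkable point, and scores approaching 1 indicate points that are isolated after very few splits.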

A note on the scikit-learn implementation: the prediction type added here differs from what scikit-learn returns from decision_function. This PR follows the reference paper that introduced the algorithm, where a higher score means more anomalous (the opposite of scikit-learn, but consistent with every other implementation). Scikit-learn uses the same formula internally to compute its decision_function, but its results will match this PR only to a few decimal places, because scikit-learn currently uses a less precise formula than this PR (explained in scikit-learn/scikit-learn#19087).
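
To make the precision remark concrete, the sketch below compares the exact standardizing constant with the logarithmic estimate of the harmonic number, H(i) ≈ ln(i) + 0.5772156649, which the paper itself suggests. That this estimate is what scikit-learn used at the time is my reading of the linked issue, so treat it as an assumption; all function names here are hypothetical. The last line also shows the sign convention: scikit-learn's score_samples is the negated paper score, and decision_function additionally subtracts an offset.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def c_exact(n: int) -> float:
    """c(n) = 2*(H(n) - 1), with H(n) summed exactly."""
    if n <= 1:
        return 0.0
    return 2.0 * (math.fsum(1.0 / k for k in range(1, n + 1)) - 1.0)


def c_log_approx(n: int) -> float:
    """c(n) = 2*H(n-1) - 2*(n-1)/n, with H(i) estimated as ln(i) + gamma."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1.0) + EULER_GAMMA) - 2.0 * (n - 1.0) / n


def paper_score(avg_depth: float, c: float) -> float:
    """Paper convention: higher score means more anomalous."""
    return 2.0 ** (-avg_depth / c)


if __name__ == "__main__":
    n, avg_depth = 256, 5.0  # illustrative values only
    s_exact = paper_score(avg_depth, c_exact(n))
    s_approx = paper_score(avg_depth, c_log_approx(n))
    print("c exact :", c_exact(n))
    print("c approx:", c_log_approx(n))
    print("score gap:", abs(s_exact - s_approx))
    # scikit-learn-style sign: lower means more anomalous.
    print("negated score:", -s_approx)
```

The two constants agree to about two decimal places for n = 256, which is enough to make the resulting scores diverge after a few decimals.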

I'm not sure whether this PR modifies enough files for the documentation to update, and I don't know whether the non-Python interfaces will work.

@codecov

codecov bot commented Oct 28, 2021

Codecov Report

Merging #322 (7df248b) into mainline (e524893) will decrease coverage by 0.87%.
The diff coverage is 89.56%.

@@              Coverage Diff               @@
##             mainline     #322      +/-   ##
==============================================
- Coverage       85.19%   84.31%   -0.88%     
+ Complexity         43       42       -1     
==============================================
  Files             108      108              
  Lines            8240     8329      +89     
  Branches          469      469              
==============================================
+ Hits             7020     7023       +3     
- Misses           1196     1283      +87     
+ Partials           24       23       -1     
Impacted Files                                          Coverage Δ
include/treelite/frontend.h                             90.00% <ø> (ø)
...main/java/ml/dmlc/treelite4j/java/TreeliteJNI.java   45.45% <ø> (ø)
tests/cpp/test_serializer.cc                            100.00% <ø> (ø)
src/gtil/pred_transform.cc                              90.21% <20.00%> (-4.04%) ⬇️
src/compiler/native/typeinfo_ctypes.h                   46.77% <41.66%> (-1.23%) ⬇️
src/compiler/native/pred_transform.h                    81.48% <87.50%> (+0.65%) ⬆️
include/treelite/predictor.h                            72.09% <100.00%> (+1.36%) ⬆️
include/treelite/tree.h                                 94.39% <100.00%> (ø)
include/treelite/tree_impl.h                            90.67% <100.00%> (-0.04%) ⬇️
python/treelite/sklearn/importer.py                     94.05% <100.00%> (+1.85%) ⬆️
... and 31 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update e524893...7df248b.

Collaborator

@hcho3 hcho3 left a comment

LGTM. Thank you for your contribution.

src/compiler/pred_transform.cc (resolved)
Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
@hcho3 hcho3 merged commit 2994be4 into dmlc:mainline Nov 3, 2021
@hcho3
Collaborator

hcho3 commented Nov 3, 2021

Thanks!

@tzemicheal

Thanks, @david-cortes, for the contribution. I tried running a small inference but wasn't able to run the code. Could you provide a small test case for running an isolation forest in FIL inference?

from sklearn.ensemble import IsolationForest
import treelite
import treelite.sklearn

X = [[-1.1], [0.3], [0.5], [100]]
clf = IsolationForest(random_state=0).fit(X)
clf.predict([[0.1], [0], [90]])
clf.decision_function([[0.1], [0], [90]])


model = treelite.sklearn.import_model(clf)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-8-be8734c89694> in <module>
----> 1 model = treelite.sklearn.import_model(clf)

/opt/conda/envs/rapids/lib/python3.7/site-packages/treelite/sklearn/importer.py in import_model(sklearn_model)
    133
    134     if isinstance(sklearn_model, IsolationForest):
--> 135         ratio_c = expected_depth(sklearn_model.max_samples)
    136
    137     node_count = []

/opt/conda/envs/rapids/lib/python3.7/site-packages/treelite/sklearn/importer.py in expected_depth(n_remainder)
     47 def expected_depth(n_remainder):
     48     """Calculates the expected isolation depth for a remainder of uniform points"""
---> 49     if n_remainder <= 1:
     50         return 0
     51     if n_remainder == 2:

TypeError: '<=' not supported between instances of 'str' and 'int'

@hcho3
Collaborator

hcho3 commented Dec 3, 2021

@tzemicheal This PR is not yet part of the stable release of Treelite (2.1.0). Did you install Treelite from the source?

@david-cortes
Contributor Author

@tzemicheal Thanks for pointing this out. I've submitted a small PR which should fix it.
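
For context on the error above: the traceback shows `sklearn_model.max_samples` reaching a numeric comparison as a string, which is consistent with IsolationForest's default of `max_samples="auto"`. A fitted estimator also exposes the resolved integer as `max_samples_`. The sketch below reproduces scikit-learn's resolution rule in plain Python; `resolve_max_samples` is a hypothetical helper written for illustration, and I am not claiming it is what the follow-up PR does.

```python
import numbers


def resolve_max_samples(max_samples, n_samples: int) -> int:
    """Turn the constructor argument into the actual per-tree sample count.

    Mirrors the rule scikit-learn applies when fitting:
      "auto" -> min(256, n_samples)
      int    -> that many samples, capped at n_samples
      float  -> that fraction of n_samples, truncated to an int
    """
    if isinstance(max_samples, str):
        if max_samples != "auto":
            raise ValueError(f"unsupported max_samples: {max_samples!r}")
        return min(256, n_samples)
    if isinstance(max_samples, numbers.Integral):
        return min(int(max_samples), n_samples)
    return int(max_samples * n_samples)


if __name__ == "__main__":
    # The four-row dataset from the report, with the default "auto".
    print(resolve_max_samples("auto", 4))
    print(resolve_max_samples("auto", 10_000))
    print(resolve_max_samples(0.5, 100))
```

With an integer in hand, the `n_remainder <= 1` comparison in `expected_depth` no longer mixes `str` and `int`.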

@tzemicheal

> @tzemicheal This PR is not yet part of the stable release of Treelite (2.1.0). Did you install Treelite from the source?

Yes, I did install it from the source.

rapids-bot bot pushed a commit to rapidsai/cuml that referenced this pull request Jan 25, 2022
The 2.2.0 version of Treelite incorporates the following major improvements:

* dmlc/treelite#314
* dmlc/treelite#322, dmlc/treelite#327
* dmlc/treelite#325
* dmlc/treelite#332
* dmlc/treelite#330
* dmlc/treelite#333
* dmlc/treelite#334
* dmlc/treelite#304
* dmlc/treelite#335

In particular, dmlc/treelite#332, dmlc/treelite#330, dmlc/treelite#333 are required for #4447.

Requires rapidsai/integration#412.

EDIT. Using 2.2.1 patch release, to incorporate a hotfix (dmlc/treelite#340).

Authors:
  - Philip Hyunsu Cho (https://github.com/hcho3)

Approvers:
  - AJ Schmidt (https://github.com/ajschmidt8)
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: #4484
vimarsh6739 pushed a commit to vimarsh6739/cuml that referenced this pull request Oct 9, 2023
Development

Successfully merging this pull request may close these issues.

Request: predicting for isolation forests