Support for isolation forests #322
Codecov Report
@@ Coverage Diff @@
## mainline #322 +/- ##
==============================================
- Coverage 85.19% 84.31% -0.88%
+ Complexity 43 42 -1
==============================================
Files 108 108
Lines 8240 8329 +89
Branches 469 469
==============================================
+ Hits 7020 7023 +3
- Misses 1196 1283 +87
+ Partials 24 23 -1
Continue to review full report at Codecov.
LGTM. Thank you for your contribution.
Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
Thanks!
Thanks, @david-cortes, for the contribution. I tried running a small inference test but couldn't get the code to run. Could you provide a small test case that runs an isolation forest through FIL inference? Here is what I ran:
from sklearn.ensemble import IsolationForest
import treelite
import treelite.sklearn
# Toy 1-D data set; 100 is the obvious outlier
X = [[-1.1], [0.3], [0.5], [100]]
clf = IsolationForest(random_state=0).fit(X)
clf.predict([[0.1], [0], [90]])
clf.decision_function([[0.1], [0], [90]])
# Converting the fitted model to Treelite raises the TypeError below
model = treelite.sklearn.import_model(clf)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-8-be8734c89694> in <module>
----> 1 model = treelite.sklearn.import_model(clf)
/opt/conda/envs/rapids/lib/python3.7/site-packages/treelite/sklearn/importer.py in import_model(sklearn_model)
133
134 if isinstance(sklearn_model, IsolationForest):
--> 135 ratio_c = expected_depth(sklearn_model.max_samples)
136
137 node_count = []
/opt/conda/envs/rapids/lib/python3.7/site-packages/treelite/sklearn/importer.py in expected_depth(n_remainder)
47 def expected_depth(n_remainder):
48 """Calculates the expected isolation depth for a remainder of uniform points"""
---> 49 if n_remainder <= 1:
50 return 0
51 if n_remainder == 2:
TypeError: '<=' not supported between instances of 'str' and 'int'
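The traceback points at sklearn_model.max_samples: in scikit-learn, that IsolationForest constructor parameter defaults to the string "auto", so comparing it with an integer fails. After fitting, scikit-learn stores the resolved integer subsample size in the max_samples_ attribute. The sketch below is a hypothetical, dependency-free helper (resolve_max_samples is not a Treelite or scikit-learn function) that mirrors scikit-learn's documented resolution rules. It assumes "auto" means min(256, n_samples), a float means a fraction of the training rows, and an integer is capped at n_samples. With a fitted model, one would simply read clf.max_samples_ instead.

```python
import numbers


def resolve_max_samples(max_samples, n_samples):
    """Hypothetical helper: turn IsolationForest's max_samples into an int.

    Mirrors scikit-learn's documented behaviour (an assumption, not Treelite code):
    "auto" -> min(256, n_samples); float -> fraction of n_samples;
    int -> capped at n_samples.
    """
    if isinstance(max_samples, str):
        if max_samples != "auto":
            raise ValueError(f"unsupported max_samples: {max_samples!r}")
        return min(256, n_samples)
    if isinstance(max_samples, numbers.Integral):
        # scikit-learn warns and falls back to n_samples when the int is too large
        return min(int(max_samples), n_samples)
    if isinstance(max_samples, numbers.Real):
        if not 0.0 < max_samples <= 1.0:
            raise ValueError("float max_samples must be in (0, 1]")
        return int(max_samples * n_samples)
    raise TypeError(f"unsupported type for max_samples: {type(max_samples)}")


# The repro above fits on 4 rows with the default max_samples="auto",
# so the subsample size used by each tree is min(256, 4) = 4.
print(resolve_max_samples("auto", 4))
```

In an importer, one plausible fix along these lines is to pass the fitted attribute (sklearn_model.max_samples_) to expected_depth rather than the raw constructor argument.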
@tzemicheal This PR is not yet part of the stable release of Treelite (2.1.0). Did you install Treelite from the source?
@tzemicheal Thanks for pointing this out. I've submitted a small PR which should fix it.
Yes, I did install it from the source.
The 2.2.0 version of Treelite incorporates the following major improvements:
* dmlc/treelite#314
* dmlc/treelite#322, dmlc/treelite#327
* dmlc/treelite#325
* dmlc/treelite#332
* dmlc/treelite#330
* dmlc/treelite#333
* dmlc/treelite#334
* dmlc/treelite#304
* dmlc/treelite#335
In particular, dmlc/treelite#332, dmlc/treelite#330, and dmlc/treelite#333 are required for #4447.
Requires rapidsai/integration#412.
EDIT: Using the 2.2.1 patch release to incorporate a hotfix (dmlc/treelite#340).
Authors:
- Philip Hyunsu Cho (https://github.com/hcho3)
Approvers:
- AJ Schmidt (https://github.com/ajschmidt8)
- Dante Gama Dessavre (https://github.com/dantegd)
URL: #4484
Fixes #321
This PR adds support for isolation forest model types by:
A small note about the scikit-learn implementation: this PR adds a prediction type that differs from what scikit-learn outputs when calling decision_function. What this PR does is based on the reference paper that introduced this algorithm, and it follows the convention "higher score -> more anomalous" (the opposite of scikit-learn, but what every other implementation does). Scikit-learn uses this same formula internally to calculate its decision_function, but the results will not match this PR beyond a few decimal places, because scikit-learn currently uses a less precise formula than this PR (explained in scikit-learn/scikit-learn#19087).

I'm not sure whether this PR modifies enough files for the documentation to be updated, and I don't know whether the non-Python interfaces will work.
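For reference, the isolation forest paper normalizes path lengths by c(n) = 2H(n-1) - 2(n-1)/n, the average path length of an unsuccessful search in a binary search tree built from n points, and defines the anomaly score s(x) = 2^(-E[h(x)]/c(n)), so values near 1 indicate anomalies. The sketch below assumes that the precision gap mentioned above comes from approximating the harmonic number H(n-1) by ln(n-1) + gamma, as scikit-learn does, and compares that approximation with a more accurate harmonic number. It also assumes scikit-learn's default contamination="auto", for which scikit-learn documents offset_ = -0.5, so that decision_function(x) = -s(x) + 0.5. All function names here are illustrative only, not Treelite's API.

```python
import math

EULER_GAMMA = 0.5772156649015329


def harmonic(n):
    """H(n) = 1 + 1/2 + ... + 1/n: exact sum for small n, asymptotic series otherwise."""
    if n < 50:
        return math.fsum(1.0 / k for k in range(1, n + 1))
    return (math.log(n) + EULER_GAMMA + 1.0 / (2 * n)
            - 1.0 / (12 * n ** 2) + 1.0 / (120 * n ** 4))


def expected_depth(n):
    """c(n): expected isolation depth for a remainder of n uniform points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * harmonic(n - 1) - 2.0 * (n - 1) / n


def expected_depth_log_approx(n):
    """Same quantity with H(n-1) ~ ln(n-1) + gamma (the cruder approximation)."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def anomaly_score(mean_depth, n):
    """Paper's score s = 2^(-E[h(x)] / c(n)); higher means more anomalous."""
    return 2.0 ** (-mean_depth / expected_depth(n))


def to_sklearn_decision(score):
    """Assumes contamination="auto" (offset_ = -0.5): negative means outlier."""
    return -score + 0.5


n = 256  # scikit-learn's default subsample size
print(expected_depth(n), expected_depth_log_approx(n))
# A point isolated at exactly the average depth scores 0.5, i.e. decision 0.0
print(anomaly_score(expected_depth(n), n), to_sklearn_decision(0.5))
```

For n = 256 the two versions of c(n) differ by roughly 0.004, which is small relative to c(256) (about 10.2) but enough to make the scores disagree after the first few decimal places.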