Feature/local outlier factor #164
pablomm commented Sep 12, 2019
- Created LocalOutlierFactor (a wrapper around the multivariate scikit-learn version)
- Added a gallery example of outlier detection
- Added a new real dataset used in the example (fetch_octane)
- Added tests and doctests
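The description says the estimator wraps the multivariate scikit-learn version. Below is a minimal sketch of how such a wrapper can be applied to curves, assuming all curves are sampled on a common grid and compared with an L2 distance approximated by the trapezoidal rule. `l2_distance_matrix` and `functional_lof` are hypothetical helpers for illustration, not the code added in this PR; only `sklearn.neighbors.LocalOutlierFactor` is a real class.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor


def l2_distance_matrix(curves, grid):
    """Hypothetical helper: pairwise L2 distances between discretized curves.

    curves has shape (n_curves, n_points), sampled on the common grid; the
    integral of the squared difference is approximated by the trapezoidal rule.
    """
    diffs = curves[:, None, :] - curves[None, :, :]
    sq = diffs ** 2
    dx = np.diff(grid)
    integral = np.sum(0.5 * (sq[..., 1:] + sq[..., :-1]) * dx, axis=-1)
    return np.sqrt(integral)


def functional_lof(curves, grid, n_neighbors=5):
    """Hypothetical wrapper: run scikit-learn's LOF on a functional distance."""
    dist = l2_distance_matrix(curves, grid)
    lof = LocalOutlierFactor(n_neighbors=n_neighbors, metric="precomputed")
    labels = lof.fit_predict(dist)          # -1 marks outliers, 1 inliers
    scores = -lof.negative_outlier_factor_  # LOF scores, larger = more outlying
    return labels, scores


# Toy data: 20 noisy sine curves plus one vertically shifted curve.
rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 50)
inliers = np.sin(2 * np.pi * grid) + 0.05 * rng.standard_normal((20, grid.size))
shifted = np.sin(2 * np.pi * grid) + 2.0
curves = np.vstack([inliers, shifted])

labels, scores = functional_lof(curves, grid, n_neighbors=5)
print(labels[-1], scores.argmax())
```

Passing `metric="precomputed"` lets scikit-learn work with any distance, so switching to another functional metric only changes how the distance matrix is built.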
Codecov Report
@@ Coverage Diff @@
## develop #164 +/- ##
===========================================
+ Coverage 72.74% 73.06% +0.32%
===========================================
Files 40 41 +1
Lines 3900 3947 +47
===========================================
+ Hits 2837 2884 +47
Misses 1063 1063
We have discussed this method, and we are not sure whether it is valid at the population level in a functional data context (it will probably work, but no theory has been published). Thus, we will keep this PR "on hold" until we have a better understanding of the theory. If you have more information about the use of this method in FDA, please tell us.
Local Outlier Factor
--------------------
Brief explanation missing.
I didn't do much research. I read the original paper, and then some others in which it was applied to time series. Since it is based on proximity, I extended it in the same way as the rest of the k-NN estimators, and it worked for the FDA case. Apart from that, and the tests I ran on different outlier datasets comparing the results with the other methods we have implemented, I did not investigate further.
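Since LOF only needs pairwise distances, extending it "in the same way as the rest of the k-NN estimators" can be read as feeding it a functional distance. Below is a numpy-only sketch of the LOF score computed from a precomputed distance matrix, following the definitions of the original LOF paper. `lof_from_distances` is a hypothetical name, not part of the PR. As a simplification, exactly k neighbours are used even when there are ties at the k-th distance, and a small constant guards against division by zero for duplicated points.

```python
import numpy as np


def lof_from_distances(dist, k):
    """Local outlier factor of every point, given a square distance matrix.

    Simplification: exactly k neighbours are used even when several points
    tie at the k-th distance.
    """
    d = np.array(dist, dtype=float)           # copy so the input is untouched
    n = d.shape[0]
    idx = np.arange(n)
    np.fill_diagonal(d, np.inf)               # a point is not its own neighbour
    neighbors = np.argsort(d, axis=1)[:, :k]  # k nearest neighbours of each point
    k_distance = d[idx, neighbors[:, -1]]     # distance to the k-th neighbour
    # reach-dist_k(p, o) = max(k-distance(o), d(p, o)) for each o in N_k(p)
    reach = np.maximum(k_distance[neighbors], d[idx[:, None], neighbors])
    # local reachability density: inverse of the mean reachability distance
    lrd = 1.0 / (reach.mean(axis=1) + 1e-10)
    # LOF: mean ratio of the neighbours' lrd to the point's own lrd
    return lrd[neighbors].mean(axis=1) / lrd


# Usage with 1-D points and |x_i - x_j|; an L2 distance matrix between
# discretized curves could be passed in exactly the same way.
x = np.array([0.0, 0.1, 0.2, 0.3, 0.4, 5.0])
dist = np.abs(x[:, None] - x[None, :])
scores = lof_from_distances(dist, k=2)
print(np.round(scores, 2))
```

The evenly spaced points get scores close to 1, while the isolated point at 5.0 gets a score far above 1, because its neighbours are much denser than its own neighbourhood.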
The thing is that one theoretical motivation for nearest-neighbor methods is the estimation of probability densities, which in general do not exist for functional data (there is no Lebesgue measure in infinite-dimensional spaces). I was told that the other nearest-neighbor methods (and probably this one) can be given a more rigorous foundation in FDA based on Radon-Nikodym derivatives, which sometimes do exist in FDA and can be seen as the equivalent of a ratio of densities. But the fact is that, so far, no one has tried to extend the local outlier factor to FDA, so I prefer to err on the side of caution.
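To make the density argument concrete: in $\mathbb{R}^d$, nearest neighbors give the classical estimate below, where $V_d$ is the volume of the unit ball and $r_k(x)$ the distance from $x$ to its $k$-th nearest neighbor. It relies on the volume of a $d$-dimensional ball, so it has no direct counterpart for curves. LOF, in the standard definitions from the original paper, only uses pairwise distances $d(p, o)$ through the local reachability density (lrd), which acts as a density surrogate. This is why it can be computed for functional data, even though its density-based justification does not automatically carry over.

```latex
\begin{align*}
\hat{f}_k(x) &= \frac{k}{n \, V_d \, r_k(x)^{d}} \\
\mathrm{reach\text{-}dist}_k(p, o) &= \max\{\, k\text{-}\mathrm{distance}(o),\ d(p, o) \,\} \\
\mathrm{lrd}_k(p) &= \Bigl( \frac{1}{|N_k(p)|} \sum_{o \in N_k(p)} \mathrm{reach\text{-}dist}_k(p, o) \Bigr)^{-1} \\
\mathrm{LOF}_k(p) &= \frac{1}{|N_k(p)|} \sum_{o \in N_k(p)} \frac{\mathrm{lrd}_k(o)}{\mathrm{lrd}_k(p)}
\end{align*}
```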
Okay, I understand your point of view. Please keep me informed of any progress on this. I understand that this branch will be blocked for quite some time. I want to add some efficiency improvements to the k-NN estimators to finish the work on this module. To keep everything easily up to date, I was thinking of adding those changes without adding this estimator to the documentation, leaving it in the private neighbors module. What do you think?
Ok, as long as the
Done.