Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/local outlier factor #164

Merged
merged 12 commits into from Sep 23, 2019
Merged

Feature/local outlier factor #164

merged 12 commits into from Sep 23, 2019

Conversation

pablomm
Copy link
Member

@pablomm pablomm commented Sep 12, 2019

  • Created LocalOutlierFactor (which wraps scikit-learn multivariate version)
  • Example in gallery of detection of outliers
  • New real dataset employed in the example (fetch_octane)
  • Test and Doctests added

@codecov
Copy link

codecov bot commented Sep 12, 2019

Codecov Report

Merging #164 into develop will increase coverage by 0.32%.
The diff coverage is 100%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop     #164      +/-   ##
===========================================
+ Coverage    72.74%   73.06%   +0.32%     
===========================================
  Files           40       41       +1     
  Lines         3900     3947      +47     
===========================================
+ Hits          2837     2884      +47     
  Misses        1063     1063
Impacted Files Coverage Δ
skfda/_neighbors/base.py 100% <ø> (ø) ⬆️
skfda/_neighbors/regression.py 100% <ø> (ø) ⬆️
skfda/_neighbors/classification.py 100% <ø> (ø) ⬆️
skfda/_neighbors/unsupervised.py 100% <ø> (ø) ⬆️
skfda/_neighbors/outlier.py 100% <100%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e81577b...1640735. Read the comment docs.

@vnmabus vnmabus added the pending theory The theoretical properties of the methods implemented are not yet understood in FDA. label Sep 18, 2019
Copy link
Member

@vnmabus vnmabus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have discussed this method and we are not sure if it works for the population in a functional data context (it will probably work, but there is no theory published). Thus, we will keep this PR "on hold" until we have a better understanding on the theory. If you have more info about the usage of this method in FDA, please tell us.

Local Outlier Factor
--------------------

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Brief explanation missing.

@pablomm
Copy link
Member Author

pablomm commented Sep 19, 2019

We have discussed this method and we are not sure if it works for the population in a functional data context (it will probably work, but there is no theory published). Thus, we will keep this PR "on hold" until we have a better understanding on the theory. If you have more info about the usage of this method in FDA, please tell us.

I didn't do much research, I read the original paper, and then I saw others in which it was applied for time series, and as it is based on proximity I extended it in the same way as the rest of the k-nn estimators, in which it worked for the fda case.

Apart from that and the tests I did with different datasets outliers comparing the results with the other methods we have implemented I did not investigate further.

@vnmabus
Copy link
Member

vnmabus commented Sep 19, 2019

The thing is that one theoretical motivation of the nearest neighbors methods is the estimation of probability densities, which do not exist in functional data. They told me that the other nearest neighbors methods (and probably this one) can have a more rigorous foundation in FDA based in Radon-Nikodym derivatives, which sometimes do exist in FDA, and can be seen as the equivalent of a quotient of densities. But the fact is that no one has tried to extend the local outlier factor to FDA right now, so I prefer to err on the side of caution.

@pablomm
Copy link
Member Author

pablomm commented Sep 21, 2019

Okay, I understand your point of view. Keep me informed of progress in this regard.

I understand that this branch will be blocked for quite some time. I want to add some enhancements to the efficiency of the knn estimators to finish the work on this module. To keep everything updated easily I had thought to add the changes but without adding this estimator to the documentation, leaving it alone in the private neighbor module, what do you think?

@vnmabus
Copy link
Member

vnmabus commented Sep 21, 2019

Okay, I understand your point of view. Keep me informed of progress in this regard.

I understand that this branch will be blocked for quite some time. I want to add some enhancements to the efficiency of the knn estimators to finish the work on this module. To keep everything updated easily I had thought to add the changes but without adding this estimator to the documentation, leaving it alone in the private neighbor module, what do you think?

Ok, as long as the LocalOutlierFactor object is private. I would also create an issue so that we will not forget about it.

@pablomm
Copy link
Member Author

pablomm commented Sep 21, 2019

Ok, as long as the LocalOutlierFactor object is private. I would also create an issue so that we will not forget about it.

Done.

@pablomm pablomm merged commit d293e5a into develop Sep 23, 2019
@pablomm pablomm deleted the feature/local-outlier-factor branch September 23, 2019 15:30
DavidGarciaFer pushed a commit that referenced this pull request Jun 21, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement pending theory The theoretical properties of the methods implemented are not yet understood in FDA.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants