-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FixedRectangularBinning
symbolization and Diversity
probabilities
#127
Conversation
Codecov Report
@@ Coverage Diff @@
## main #127 +/- ##
==========================================
+ Coverage 80.17% 81.61% +1.43%
==========================================
Files 35 36 +1
Lines 792 854 +62
==========================================
+ Hits 635 697 +62
Misses 157 157
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
(tag me explicitly when you think this is final and ready to review) |
@Datseris If the tests pass and the documentation looks good after the latest commit, this is ready for reviewing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is RectangularBinMapping
? Where does it come from? Does it replace the RectangularBinEncoder
I have created? If so, why is the git diff like this? It never shows the RectangularBinEncoder
being replcaed by the Mapping
.
p.s.: The fact that this new FixedRectangularBinning
was written in the same file as the normal binning and was written in the lines before the normal binning made the git diff unecessarily difficult to review. If you check, it seems like most of the RectangularBinning
code is overwritten/changed in this PR.
function probabilities(x::AbstractVector{T}, est::Diversity) where T <: Real | ||
ds, binning = similarities_and_binning(x, est) | ||
bin_estimator = ValueHistogram(binning) | ||
|
||
return probabilities(ds, bin_estimator.binning) | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused here. We get a bin_estimator
that contains the binning
we give to ValueHistogram, but then in the end we actually only use the binning
...? What's the purpose of bin_estiamtor
then...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Furthermore I thought we completely dissallowed use probabilities
with anything else besides <: ProbaiblitiesEstimator
. So how does this call using a binning
works?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Furthermore I thought we completely dissallowed use probabilities with anything else besides <: ProbaiblitiesEstimator.
I can't remember anymore where the discussion ended up. There were probably some good arguments for/against in a PR/issue somewhere.
So how does this call using a binning works?
It just does
function probabilities(x::Vector_or_Dataset, binning::Floating_or_Fixed_RectBinning)
fasthist!(x, binning)[1]
end
which is convenient, because fasthist
isn't exported and writing probabilities(x, ValueHistogram(RectangularBinning())
is verbose.
If we don't want to have these convenience methods, it's straight-forward to just remove them and let
function probabilities(x::AbstractVector{T}, est::Diversity) where T <: Real
ds, binning = similarities_and_binning(x, est)
return fasthist!(ds, binning)[1]
end
and the same for probabilities_and_outcomes
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ValueHistogram(RectangularBinning()) is verbose. Yes, but we have allowed ValueHistogram(ε)
to initialize the binning internally. I think we should stick with not allowing probabilities
to take anything other than an estimator.
Co-authored-by: George Datseris <datseris.george@gmail.com>
Co-authored-by: George Datseris <datseris.george@gmail.com>
Co-authored-by: George Datseris <datseris.george@gmail.com>
Co-authored-by: George Datseris <datseris.george@gmail.com>
The
The git diff became weird when I merged |
@@ -1,4 +1,5 @@ | |||
using Entropies.DelayEmbeddings | |||
using DelayEmbeddings: genembed, Dataset |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should make Entropies
reexport DelayEmbeddings
I guess
But these aren't intuitive names though. OrdinalPatterns is more intuitive than OrdinalMapping. I would argue that once again we violate the "one word per concept principle". Why are we using |
Ok, I think I addressed all your comments now, @Datseris. The CI docs still fail to run, though, and it seems to be related to the use of ChaosTools in the docs (but they look good locally). |
ill take care of the docs in a different PR |
What's new in this PR?
This PR implements the diversity entropy (Wang et al., 2020), and its underlying probability estimator.
Changes/additions:
FixedRectangularBinning
symbolization scheme, which is required for the new method (fixes Rectangular binning with custom (fixed) ranges #118).alphabet_length
implemented for bothRectangularBinning
andFixedRectangularBinning
(fixes Implementalphabet_length
for histograms #103).Diversity
probability estimator, with analytical tests from the paper.FixedRectangularBinning
andDiversity
.RectangularBinning
to highlight how it is different fromFixedRectangularBinning
.References
Wang, X., Si, S., & Li, Y. (2020). Multiscale diversity entropy: A novel dynamical measure for fault diagnosis of rotating machinery. IEEE Transactions on Industrial Informatics, 17(8), 5419-5429.