FixedRectangularBinning symbolization and Diversity probabilities #127

Merged: 26 commits into main from diversity_probs on Oct 29, 2022

@kahaaga
Member

kahaaga commented Oct 12, 2022

What's new in this PR?

This PR implements the diversity entropy (Wang et al., 2020), and its underlying probability estimator.

Changes/additions:

  • FixedRectangularBinning symbolization scheme, which is required for the new method (fixes #118, "Rectangular binning with custom (fixed) ranges").
  • alphabet_length implemented for both RectangularBinning and FixedRectangularBinning (fixes #103, "Implement alphabet_length for histograms").
  • Added Diversity probability estimator, with analytical tests from the paper.
  • Documentation entries for the new FixedRectangularBinning and Diversity.
  • Modified docstring for RectangularBinning to highlight how it is different from FixedRectangularBinning.
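
For readers unfamiliar with the method, here is a minimal, self-contained sketch of the idea as I understand it from Wang et al. (2020): embed the series, take the cosine similarity between consecutive embedding vectors, and histogram those similarities over the fixed range [-1, 1]. The names `fixed_bin_counts` and `diversity_probabilities`, and the defaults `m = 3`, `τ = 1`, `nbins = 5`, are assumptions made for illustration and are not the functions added in this PR.

```julia
using LinearAlgebra: dot, norm

# Hypothetical helper (not the package API): count how many values fall into
# each of `nbins` equal-width bins on the fixed interval [lo, hi].
function fixed_bin_counts(values, lo, hi, nbins)
    counts = zeros(Int, nbins)
    width = (hi - lo) / nbins
    for v in values
        # Clamp so that v == hi (or round-off just outside the range)
        # lands in an edge bin instead of out of bounds.
        i = clamp(floor(Int, (v - lo) / width) + 1, 1, nbins)
        counts[i] += 1
    end
    return counts
end

# Sketch of diversity probabilities: cosine similarity between consecutive
# delay-embedding vectors, histogrammed over the fixed range [-1, 1].
# Zero-norm embedding vectors are not handled here.
function diversity_probabilities(x::AbstractVector{<:Real}; m = 3, τ = 1, nbins = 5)
    n = length(x) - (m - 1) * τ              # number of embedding vectors
    n ≥ 2 || throw(ArgumentError("time series too short for m = $m, τ = $τ"))
    embed(i) = [x[i + k * τ] for k in 0:(m - 1)]
    sims = map(1:(n - 1)) do i
        a, b = embed(i), embed(i + 1)
        dot(a, b) / (norm(a) * norm(b))
    end
    counts = fixed_bin_counts(sims, -1.0, 1.0, nbins)
    return counts ./ sum(counts)
end

x = sin.(0.3 .* (1:200))
p = diversity_probabilities(x; m = 3, τ = 1, nbins = 5)
println(p, "  (length = ", length(p), ")")
```

Because the bin edges are fixed in advance rather than derived from the data, the probability vector always has one entry per bin, including empty bins, so the number of possible outcomes is known before any data is seen.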

References

Wang, X., Si, S., & Li, Y. (2020). Multiscale diversity entropy: A novel dynamical measure for fault diagnosis of rotating machinery. IEEE Transactions on Industrial Informatics, 17(8), 5419-5429.

@kahaaga added the enhancement, new probabilities, binning, and hacktoberfest-accepted labels Oct 12, 2022
@kahaaga added the documentation label Oct 12, 2022
@kahaaga requested a review from Datseris October 12, 2022 07:08
codecov bot commented Oct 12, 2022

Codecov Report

Merging #127 (8a625b4) into main (e85b0fb) will increase coverage by 1.43%.
The diff coverage is 84.78%.

@@            Coverage Diff             @@
##             main     #127      +/-   ##
==========================================
+ Coverage   80.17%   81.61%   +1.43%     
==========================================
  Files          35       36       +1     
  Lines         792      854      +62     
==========================================
+ Hits          635      697      +62     
  Misses        157      157              
Impacted Files                                             Coverage Δ
src/complexity_measures/sample_entropy.jl                  80.55% <ø> (ø)
src/encoding/outcomes.jl                                   0.00% <0.00%> (ø)
...imators/permutation_ordinal/SymbolicPermutation.jl      72.97% <ø> (ø)
...imators/permutation_ordinal/spatial_permutation.jl      91.30% <ø> (ø)
...ities_estimators/histograms/rectangular_binning.jl      81.33% <78.94%> (-11.53%) ⬇️
.../complexity_measures/reverse_dispersion_entropy.jl      100.00% <100.00%> (ø)
src/encoding/gaussian_cdf.jl                               100.00% <100.00%> (ø)
src/encoding/ordinal_pattern.jl                            100.00% <100.00%> (ø)
.../probabilities_estimators/dispersion/dispersion.jl      76.19% <100.00%> (ø)
...rc/probabilities_estimators/diversity/diversity.jl      100.00% <100.00%> (ø)
... and 7 more


@kahaaga mentioned this pull request Oct 20, 2022
@Datseris
Member

(tag me explicitly when you think this is final and ready to review)

@kahaaga
Member Author

kahaaga commented Oct 24, 2022

> (tag me explicitly when you think this is final and ready to review)

@Datseris If the tests pass and the documentation looks good after the latest commit, this is ready for review.

Member

@Datseris left a comment

What is RectangularBinMapping? Where does it come from? Does it replace the RectangularBinEncoder I created? If so, why does the git diff look like this? It never shows RectangularBinEncoder being replaced by the Mapping.

P.S.: Because the new FixedRectangularBinning was written in the same file as the normal binning, and on the lines before it, the git diff was unnecessarily difficult to review. If you check, it looks as if most of the RectangularBinning code is overwritten or changed in this PR.

src/probabilities_estimators/diversity/diversity.jl (outdated, resolved)
docs/src/examples.md (outdated, resolved)
src/probabilities_estimators/diversity/diversity.jl (outdated, resolved)
Comment on lines 62 to 67
function probabilities(x::AbstractVector{T}, est::Diversity) where T <: Real
    ds, binning = similarities_and_binning(x, est)
    bin_estimator = ValueHistogram(binning)

    return probabilities(ds, bin_estimator.binning)
end
Member

I'm confused here. We construct a bin_estimator that wraps the binning we pass to ValueHistogram, but in the end we only use the binning. What is the purpose of bin_estimator, then?

Member

Furthermore, I thought we had completely disallowed calling probabilities with anything other than a <: ProbabilitiesEstimator. So how does this call with a binning work?

Member Author

> Furthermore, I thought we had completely disallowed calling probabilities with anything other than a <: ProbabilitiesEstimator.

I can't remember where that discussion ended up. There were probably some good arguments for and against in a PR or issue somewhere.

> So how does this call with a binning work?

It just calls this method:

function probabilities(x::Vector_or_Dataset, binning::Floating_or_Fixed_RectBinning)
    fasthist!(x, binning)[1]
end

which is convenient, because fasthist isn't exported and writing probabilities(x, ValueHistogram(RectangularBinning())) is verbose.

If we don't want these convenience methods, it's straightforward to remove them and instead define

function probabilities(x::AbstractVector{T}, est::Diversity) where T <: Real
    ds, binning = similarities_and_binning(x, est)
    return fasthist!(ds, binning)[1]
end

and do the same for probabilities_and_outcomes.

Member

> ValueHistogram(RectangularBinning()) is verbose.

Yes, but we have allowed ValueHistogram(ε) to initialize the binning internally. I think we should stick with not allowing probabilities to take anything other than an estimator.
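
To make the design point in this thread concrete, here is a toy sketch of an "estimators only" dispatch rule. All types and functions below are stand-ins defined for illustration (including the hypothetical `FixedBinning` and `histogram_probs`); they are not the Entropies.jl definitions, and `histogram_probs` merely plays the role that the unexported histogram routine plays in the discussion above.

```julia
# Toy stand-ins for illustration only; these are NOT the Entropies.jl definitions.
abstract type ProbabilitiesEstimator end

struct FixedBinning          # hypothetical binning description: range and bin count
    lo::Float64
    hi::Float64
    nbins::Int
end

struct ValueHistogram{B} <: ProbabilitiesEstimator
    binning::B
end

# Internal histogram routine, kept out of the public interface.
function histogram_probs(x, b::FixedBinning)
    counts = zeros(Int, b.nbins)
    width = (b.hi - b.lo) / b.nbins
    for v in x
        counts[clamp(floor(Int, (v - b.lo) / width) + 1, 1, b.nbins)] += 1
    end
    return counts ./ sum(counts)
end

# The only public entry point: the second argument must be an estimator.
probabilities(x, est::ValueHistogram) = histogram_probs(x, est.binning)

x = [0.1, 0.2, 0.6, 0.9]
b = FixedBinning(0.0, 1.0, 2)

p = probabilities(x, ValueHistogram(b))      # accepted: wrapped in an estimator
rejected = try
    probabilities(x, b)                      # bare binning: no matching method
    false
catch err
    err isa MethodError
end
println(p, "  bare binning rejected: ", rejected)
```

With this layout, internal code such as a Diversity-style estimator can still call the histogram routine directly, while users only ever pass estimators to probabilities.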

src/probabilities_estimators/diversity/diversity.jl (outdated, resolved)
kahaaga and others added 4 commits October 29, 2022 00:46
Co-authored-by: George Datseris <datseris.george@gmail.com>
Co-authored-by: George Datseris <datseris.george@gmail.com>
Co-authored-by: George Datseris <datseris.george@gmail.com>
Co-authored-by: George Datseris <datseris.george@gmail.com>
@kahaaga
Member Author

kahaaga commented Oct 28, 2022

> What is RectangularBinMapping? Where does it come from? Does it replace the RectangularBinEncoder I created? If so, why does the git diff look like this? It never shows RectangularBinEncoder being replaced by the Mapping.

RectangularBinMapping is exactly RectangularBinEncoder, just renamed so that all encoding schemes follow a common naming convention (like GaussianMapping and OrdinalMapping). Nothing else should have changed.

> P.S.: Because the new FixedRectangularBinning was written in the same file as the normal binning, and on the lines before it, the git diff was unnecessarily difficult to review. If you check, it looks as if most of the RectangularBinning code is overwritten or changed in this PR.

The git diff became weird when I merged main into the diversity_probs branch. I guess that was because, as you said, the new lines were added before the existing ones?

@@ -1,4 +1,5 @@
using Entropies.DelayEmbeddings
using DelayEmbeddings: genembed, Dataset
Member

We should make Entropies reexport DelayEmbeddings I guess

@Datseris
Member

> The RectangularBinMapping is exactly RectangularBinEncoder, just renamed, so that all encoding schemes have a common naming convention (like GaussianMapping and OrdinalMapping).

But these aren't intuitive names. OrdinalPatterns is more intuitive than OrdinalMapping. I would argue that once again we violate the "one word per concept" principle. Why are we using Mapping here? Aren't these Encodings? We decided on the word Encoding, so we might as well use it. Hence, RectangularBinEncoding, OrdinalPatternEncoding, etc.

@kahaaga
Member Author

kahaaga commented Oct 29, 2022

Ok, I think I addressed all your comments now, @Datseris. The CI docs still fail to run, though, and it seems to be related to the use of ChaosTools in the docs (but they look good locally).

@Datseris Datseris merged commit a61c3a7 into main Oct 29, 2022
@Datseris Datseris deleted the diversity_probs branch October 29, 2022 12:39
@Datseris
Member

I'll take care of the docs in a different PR.
