Permutation entropy in higher dimensions (2D) #74

kahaaga · 2022-08-31T22:05:42Z

What is this PR?

A permutation entropy estimator for 2D arrays with arbitrary element type (matrices of numbers, strings , or any other custom type which can be sorted will work).

Interface

I use a flexible stencil-based approach, as discussed in #73. The new estimator has two signatures:

PermutationStencil(stencil::AbstractArray{T, D})
PermutationStencil(m::Int, D::Int)

where the first once takes any (D-dimensional) (hyper)rectangular stencil as input, and the second one is a convenience constructor for square stencils of size m*m in D dimensions. I've only implemented the estimator for D == 2 (for use on images, matrices).

The 3D case requires a bit more though, and I will submit that in a separate PR, once we've decided on the final api design (see below).

Basic usage

The interface is the same as for most other estimators. Below is a reproduction of the example in Riberio et al. (2012)

julia> x = [1 2 1; 8 3 4; 6 7 5]
3×3 Matrix{Int64}:
 1  2  1
 8  3  4
 6  7  5

julia> m, D = 2, 2
(2, 2)

julia> est = PermutationStencil(m, D)
PermutationStencil{2}(Bool[1 1; 1 1])

julia> # Compute the entropy to base 2 and order 1
       h = genentropy(x, est, base = 2, q = 1)
0.32715643797829735

julia> # Only compute probabilites
       ps = probabilities(x, est)
3-element Probabilities{Float64}:
 0.5
 0.25
 0.25

Generators for repeated application

Again, I find that the generator approach discussed in #70 is necessary for efficient repeated computations.
I here use both probabilitygenerator/ProbabilityGenerator and entropygenerator/EntropyGenerator. I'm favoring the approach in this PR, where probability generation and entropy computation is done separately.

# A collection of ten square matrices
xs = [rand(["a", "b", "c"], 5, 5) for i = 1:10]

# Probability generator that uses the first matrix as template. Any array with the same type and size can then be 
# used as input to to generator to obtain probabilities.
m, D = 2, 2
est = PermutationStencil(m, D)
pgen = probabilitygenerator(first(xs), est);

# Compute probabilities for each matrix:
[pgen(xi) for xi in xs]

# Entropy generators work the same way. Internally, they pre-allocate based on a template matrix, such that
# entropy may be computed efficiently for many arrays of the same size and element type.
hgen = entropygenerator(first(xs), est);

[hgen(xi, base = 2, q = 2) for xi in xs]

Tests

I have implemented generic tests, and successfully reproduce the example from Riberio et al. (2012)

What needs to be discussed before merging?

We need to decide on

If we want generators that pre-allocate (I vote yes, because it is much more efficient that repeated initialization)
What the api should look like (I'm happy with what I've implemented here (based on TimeseriesSurrogates.jl)
If we want ProbabilityGenerators and EntropyGenerators, or just have EntropyGenerators.

I definitely favor having both ProbabilityGenerators and EntropyGenerators, because only probabilities (and not entropies) are needed for #75. By separating, we can compute both entropy and other measures that require probabilities without double the computations. The generators are also trivially extendable to other methods, if wanted/needed.

References

Ribeiro, H. V., Zunino, L., Lenzi, E. K., Santoro, P. A., & Mendes, R. S. (2012). Complexity-entropy causality plane as a complexity measure for two-dimensional patterns.

Add random to deps Separate probabilities and entropy estimators

codecov · 2022-08-31T22:10:01Z

Codecov Report

Merging #74 (f4bd4d2) into main (02fbdcc) will decrease coverage by 1.84%.
The diff coverage is 66.00%.

@@            Coverage Diff             @@
##             main      #74      +/-   ##
==========================================
- Coverage   79.32%   77.48%   -1.85%     
==========================================
  Files          21       27       +6     
  Lines         624      724     +100     
==========================================
+ Hits          495      561      +66     
- Misses        129      163      +34

Impacted Files	Coverage Δ
src/disequilibrium/disequilibrium.jl	`0.00% <0.00%> (ø)`
src/disequilibrium/generator.jl	`0.00% <0.00%> (ø)`
src/symbolic/symbolic.jl	`0.00% <ø> (ø)`
.../symbolic/higher_dim/stencil/PermutationStencil.jl	`33.33% <33.33%> (ø)`
src/symbolic/higher_dim/stencil/stencil_2d.jl	`82.35% <82.35%> (ø)`
src/api.jl	`100.00% <100.00%> (ø)`
src/symbolic/higher_dim/utils.jl	`100.00% <100.00%> (ø)`

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Datseris · 2022-09-03T18:01:50Z

Thanks for your effort @kahaaga . If it is okay with you, I'd like to implement a different solution to #73, that is both more performant, and in my opinion a simpler interface with less source code. I've written code that iterates over "neighborhoods" of multi dimensional arrays several times, and refined it over three different versions, starting from TimeseriesPrediction.jl, into its fastest possible form that exists currently in Agents.jl.

Here is my code outline, and I show below an example of the API as seen by a user.

Stencils are Vector{CartesianIndex}. The user just gives the indices of offsets of each entry of the array that defines the stencil. This is simpler, takes less memory, allows for efficient iterations, allows as arbitrary stencils as the current code, and naturally integrates with view so that there are no unecessary allocations.
An iterator (that already exists in Agents.jl) scans through the original array and iterates using the stencil. It views the array, and efficiently uses vec on the view.
I do not see yet the advantage of introducing new API functions that are the "generator" approaches. With the solution I've outlined above, there is nothing to be pre-allocated because there are no allocations done anyways.
Given that we iterate over vector-like views of an array, this should lead to large code re-use of the existing permutation entropy methods.
The code in Agents.jl also allows for iteration with periodic boundary conditions, or terminating boundary conditions. A user may choose if they want periodic warping or not. If not, points that do not fullfill the stencil extent will be skipped.
The solution I propose is dimension agnostic. The exact same source code will work for any dimensionality of input data.
The ordering of the stencil is clear. It is the same as the order of the offsets given in the array. If a user makes a stencil using a 2x2 matrix, there is ambiguity whther the top right or bottom left entry comes first. No such ambiguity with my proposed solution.

Here's how it looks like. The following defines a 2x2 "square" stencil that goes from the pixel of the array to the bottom and right.

stencil = PermutationStencil([CartesianIndex(0,1), CartesianIndex(0,1), CartesianIndex(1,1)]; periodic = true)
# stencils always include the 0,0 cartesian index.

data = [rand(50,50) for _ in 1:50]

perment = genentropy(data, stencil)

So, given my familiarity with the codebase of Agents.jl, I'll open a second PR where I implement this interface. The amount of new API stuff is definitely much much smaller. Whether it is much much faster as well, we will have to see once I open it. I have not read any other PRs yet, like #70, so I don't know yet about this generator interface you want. It will take some time to go to the other stuff.

kahaaga · 2022-09-03T18:54:42Z

Thanks for your effort @kahaaga . If it is okay with you, I'd like to implement a different solution to #73, that is both more performant, and in my opinion a simpler interface with less source code. I've written code that iterates over "neighborhoods" of multi dimensional arrays several times, and refined it over three different versions, starting from TimeseriesPrediction.jl, into its fastest possible form that exists currently in Agents.jl.

Hey @Datseris! Faster, less-code approaches are always welcome. I feel no particular urge to stick with my implementation if you've got something more efficient. I recently started using Agents.jl myself and was amazed by both the performance and simplicity of the code/api, so more of that, please 😁

I suggest we leave this PR open until you've submitted your version, and then compare performance.

I do not see yet the advantage of introducing new API functions that are the "generator" approaches. With the solution I've outlined above, there is nothing to be pre-allocated because there are no allocations done anyways.

For the permutation entropy-related stuff here, it may not be necessary with a generator approach if your implementation is just as efficient. However, it is definitely a viable approach to increase performance for other methods.

I have not read any other PRs yet, like #70, so I don't know yet about this generator interface you want.

My main take-away from working on the currently open PRs there is a lot to be gained from the pre-allocation approach using a generator. A slow-performing (allocation-a-lot) approach can be the difference between waiting hours to waiting weeks for numerical experiments to finish.

The pre-allocation/generator approach becomes highly relevant in the context null hypothesis testing for the methods in CausalityTools and friends. A user might trigger hundreds of thousands or millions of entropy calculations under the hood for a single experiment. For such uses cases, pre-allocation things like histogram containers will benefit performance greatly.

Again, perhaps this is not relevant for permutation-based methods, but for other methods, the performance gain can be pretty substantial.

However, I do like the spirit of keeping everything JuliaDynamics nice and clean API-wise. The generator functionality I want doesn't necessarily fit best in Entropies.jl. I could also move the performance-critical stuff to CausalityToolsBase, or create a EntropyGenerator package or something like that for it.

Let's revisit this when you've had time to look at the other PRs too, and your solution to #73 is ready.

It will take some time to go to the other stuff.

No worries. I've got plenty of other stuff to do in the meanwhile that doesn't hinge on any of my currently open PRs :D

Datseris · 2022-09-05T06:34:45Z

Shouldn't we close this in favor of #78 ...?

Symbolic permutation 2D

10178cb

Add random to deps Separate probabilities and entropy estimators

kahaaga marked this pull request as draft August 31, 2022 22:05

kahaaga mentioned this pull request Aug 31, 2022

Implement disequilibrium #75

Closed

Stencil approach

d67ca4c

kahaaga changed the title ~~Symbolic permutation 2D~~ Permutation entropy in higher dimensions (only 2D for now) Sep 1, 2022

kahaaga changed the title ~~Permutation entropy in higher dimensions (only 2D for now)~~ Permutation entropy in higher dimensions (2D) Sep 1, 2022

kahaaga added 2 commits September 3, 2022 12:22

Fix erroneously hard-coded value

be7eaa1

Fix double import

85190bc

Datseris mentioned this pull request Sep 3, 2022

Permutation Entropy for 2D and 3D Objects #73

Closed

remove default value

435937b

Disequilibrium

f4bd4d2

Datseris mentioned this pull request Sep 4, 2022

Spatiotemporal Permutation Entropy #78

Merged

kahaaga closed this Sep 5, 2022

kahaaga deleted the symbolic2d branch August 25, 2023 13:40

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Permutation entropy in higher dimensions (2D) #74

Permutation entropy in higher dimensions (2D) #74

kahaaga commented Aug 31, 2022 •

edited

Loading

codecov bot commented Aug 31, 2022 •

edited

Loading

Datseris commented Sep 3, 2022 •

edited

Loading

kahaaga commented Sep 3, 2022 •

edited

Loading

Datseris commented Sep 5, 2022

Permutation entropy in higher dimensions (2D) #74

Permutation entropy in higher dimensions (2D) #74

Conversation

kahaaga commented Aug 31, 2022 • edited Loading

What is this PR?

Interface

Basic usage

Generators for repeated application

Tests

What needs to be discussed before merging?

References

codecov bot commented Aug 31, 2022 • edited Loading

Codecov Report

Datseris commented Sep 3, 2022 • edited Loading

kahaaga commented Sep 3, 2022 • edited Loading

Datseris commented Sep 5, 2022

kahaaga commented Aug 31, 2022 •

edited

Loading

codecov bot commented Aug 31, 2022 •

edited

Loading

Datseris commented Sep 3, 2022 •

edited

Loading

kahaaga commented Sep 3, 2022 •

edited

Loading