Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Permutation entropy in higher dimensions (2D) #74

Closed
wants to merge 6 commits into from
Closed

Conversation

kahaaga
Copy link
Member

@kahaaga kahaaga commented Aug 31, 2022

What is this PR?

A permutation entropy estimator for 2D arrays with arbitrary element type (matrices of numbers, strings , or any other custom type which can be sorted will work).

Interface

I use a flexible stencil-based approach, as discussed in #73. The new estimator has two signatures:

PermutationStencil(stencil::AbstractArray{T, D})
PermutationStencil(m::Int, D::Int)

where the first once takes any (D-dimensional) (hyper)rectangular stencil as input, and the second one is a convenience constructor for square stencils of size m*m in D dimensions. I've only implemented the estimator for D == 2 (for use on images, matrices).

The 3D case requires a bit more though, and I will submit that in a separate PR, once we've decided on the final api design (see below).

Basic usage

The interface is the same as for most other estimators. Below is a reproduction of the example in Riberio et al. (2012)

julia> x = [1 2 1; 8 3 4; 6 7 5]
3×3 Matrix{Int64}:
 1  2  1
 8  3  4
 6  7  5

julia> m, D = 2, 2
(2, 2)

julia> est = PermutationStencil(m, D)
PermutationStencil{2}(Bool[1 1; 1 1])

julia> # Compute the entropy to base 2 and order 1
       h = genentropy(x, est, base = 2, q = 1)
0.32715643797829735

julia> # Only compute probabilites
       ps = probabilities(x, est)
3-element Probabilities{Float64}:
 0.5
 0.25
 0.25

Generators for repeated application

Again, I find that the generator approach discussed in #70 is necessary for efficient repeated computations.
I here use both probabilitygenerator/ProbabilityGenerator and entropygenerator/EntropyGenerator. I'm favoring the approach in this PR, where probability generation and entropy computation is done separately.

# A collection of ten square matrices
xs = [rand(["a", "b", "c"], 5, 5) for i = 1:10]

# Probability generator that uses the first matrix as template. Any array with the same type and size can then be 
# used as input to to generator to obtain probabilities.
m, D = 2, 2
est = PermutationStencil(m, D)
pgen = probabilitygenerator(first(xs), est);

# Compute probabilities for each matrix:
[pgen(xi) for xi in xs]

# Entropy generators work the same way. Internally, they pre-allocate based on a template matrix, such that
# entropy may be computed efficiently for many arrays of the same size and element type.
hgen = entropygenerator(first(xs), est);

[hgen(xi, base = 2, q = 2) for xi in xs]

Tests

I have implemented generic tests, and successfully reproduce the example from Riberio et al. (2012)

What needs to be discussed before merging?

We need to decide on

  • If we want generators that pre-allocate (I vote yes, because it is much more efficient that repeated initialization)
  • What the api should look like (I'm happy with what I've implemented here (based on TimeseriesSurrogates.jl)
  • If we want ProbabilityGenerators and EntropyGenerators, or just have EntropyGenerators.

I definitely favor having both ProbabilityGenerators and EntropyGenerators, because only probabilities (and not entropies) are needed for #75. By separating, we can compute both entropy and other measures that require probabilities without double the computations. The generators are also trivially extendable to other methods, if wanted/needed.

References

Ribeiro, H. V., Zunino, L., Lenzi, E. K., Santoro, P. A., & Mendes, R. S. (2012). Complexity-entropy causality plane as a complexity measure for two-dimensional patterns.
 


Add random to deps


Separate probabilities and entropy estimators
@kahaaga kahaaga marked this pull request as draft August 31, 2022 22:05
@codecov
Copy link

codecov bot commented Aug 31, 2022

Codecov Report

Merging #74 (f4bd4d2) into main (02fbdcc) will decrease coverage by 1.84%.
The diff coverage is 66.00%.

@@            Coverage Diff             @@
##             main      #74      +/-   ##
==========================================
- Coverage   79.32%   77.48%   -1.85%     
==========================================
  Files          21       27       +6     
  Lines         624      724     +100     
==========================================
+ Hits          495      561      +66     
- Misses        129      163      +34     
Impacted Files Coverage Δ
src/disequilibrium/disequilibrium.jl 0.00% <0.00%> (ø)
src/disequilibrium/generator.jl 0.00% <0.00%> (ø)
src/symbolic/symbolic.jl 0.00% <ø> (ø)
.../symbolic/higher_dim/stencil/PermutationStencil.jl 33.33% <33.33%> (ø)
src/symbolic/higher_dim/stencil/stencil_2d.jl 82.35% <82.35%> (ø)
src/api.jl 100.00% <100.00%> (ø)
src/symbolic/higher_dim/utils.jl 100.00% <100.00%> (ø)

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@kahaaga kahaaga mentioned this pull request Aug 31, 2022
@kahaaga kahaaga changed the title Symbolic permutation 2D Permutation entropy in higher dimensions (only 2D for now) Sep 1, 2022
@kahaaga kahaaga changed the title Permutation entropy in higher dimensions (only 2D for now) Permutation entropy in higher dimensions (2D) Sep 1, 2022
@Datseris
Copy link
Member

Datseris commented Sep 3, 2022

Thanks for your effort @kahaaga . If it is okay with you, I'd like to implement a different solution to #73, that is both more performant, and in my opinion a simpler interface with less source code. I've written code that iterates over "neighborhoods" of multi dimensional arrays several times, and refined it over three different versions, starting from TimeseriesPrediction.jl, into its fastest possible form that exists currently in Agents.jl.

Here is my code outline, and I show below an example of the API as seen by a user.

  1. Stencils are Vector{CartesianIndex}. The user just gives the indices of offsets of each entry of the array that defines the stencil. This is simpler, takes less memory, allows for efficient iterations, allows as arbitrary stencils as the current code, and naturally integrates with view so that there are no unecessary allocations.
  2. An iterator (that already exists in Agents.jl) scans through the original array and iterates using the stencil. It views the array, and efficiently uses vec on the view.
  3. I do not see yet the advantage of introducing new API functions that are the "generator" approaches. With the solution I've outlined above, there is nothing to be pre-allocated because there are no allocations done anyways.
  4. Given that we iterate over vector-like views of an array, this should lead to large code re-use of the existing permutation entropy methods.
  5. The code in Agents.jl also allows for iteration with periodic boundary conditions, or terminating boundary conditions. A user may choose if they want periodic warping or not. If not, points that do not fullfill the stencil extent will be skipped.
  6. The solution I propose is dimension agnostic. The exact same source code will work for any dimensionality of input data.
  7. The ordering of the stencil is clear. It is the same as the order of the offsets given in the array. If a user makes a stencil using a 2x2 matrix, there is ambiguity whther the top right or bottom left entry comes first. No such ambiguity with my proposed solution.

Here's how it looks like. The following defines a 2x2 "square" stencil that goes from the pixel of the array to the bottom and right.

stencil = PermutationStencil([CartesianIndex(0,1), CartesianIndex(0,1), CartesianIndex(1,1)]; periodic = true)
# stencils always include the 0,0 cartesian index.

data = [rand(50,50) for _ in 1:50]

perment = genentropy(data, stencil)

So, given my familiarity with the codebase of Agents.jl, I'll open a second PR where I implement this interface. The amount of new API stuff is definitely much much smaller. Whether it is much much faster as well, we will have to see once I open it. I have not read any other PRs yet, like #70, so I don't know yet about this generator interface you want. It will take some time to go to the other stuff.

@kahaaga
Copy link
Member Author

kahaaga commented Sep 3, 2022

Thanks for your effort @kahaaga . If it is okay with you, I'd like to implement a different solution to #73, that is both more performant, and in my opinion a simpler interface with less source code. I've written code that iterates over "neighborhoods" of multi dimensional arrays several times, and refined it over three different versions, starting from TimeseriesPrediction.jl, into its fastest possible form that exists currently in Agents.jl.

Hey @Datseris! Faster, less-code approaches are always welcome. I feel no particular urge to stick with my implementation if you've got something more efficient. I recently started using Agents.jl myself and was amazed by both the performance and simplicity of the code/api, so more of that, please 😁

I suggest we leave this PR open until you've submitted your version, and then compare performance.

I do not see yet the advantage of introducing new API functions that are the "generator" approaches. With the solution I've outlined above, there is nothing to be pre-allocated because there are no allocations done anyways.

For the permutation entropy-related stuff here, it may not be necessary with a generator approach if your implementation is just as efficient. However, it is definitely a viable approach to increase performance for other methods.

I have not read any other PRs yet, like #70, so I don't know yet about this generator interface you want.

My main take-away from working on the currently open PRs there is a lot to be gained from the pre-allocation approach using a generator. A slow-performing (allocation-a-lot) approach can be the difference between waiting hours to waiting weeks for numerical experiments to finish.

The pre-allocation/generator approach becomes highly relevant in the context null hypothesis testing for the methods in CausalityTools and friends. A user might trigger hundreds of thousands or millions of entropy calculations under the hood for a single experiment. For such uses cases, pre-allocation things like histogram containers will benefit performance greatly.

Again, perhaps this is not relevant for permutation-based methods, but for other methods, the performance gain can be pretty substantial.

However, I do like the spirit of keeping everything JuliaDynamics nice and clean API-wise. The generator functionality I want doesn't necessarily fit best in Entropies.jl. I could also move the performance-critical stuff to CausalityToolsBase, or create a EntropyGenerator package or something like that for it.

Let's revisit this when you've had time to look at the other PRs too, and your solution to #73 is ready.

It will take some time to go to the other stuff.

No worries. I've got plenty of other stuff to do in the meanwhile that doesn't hinge on any of my currently open PRs :D

@Datseris
Copy link
Member

Datseris commented Sep 5, 2022

Shouldn't we close this in favor of #78 ...?

@kahaaga kahaaga closed this Sep 5, 2022
@kahaaga kahaaga deleted the symbolic2d branch August 25, 2023 13:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants