Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add permentropy and its tests #20

Merged
merged 8 commits into from
Jan 27, 2018

Conversation

tkf
Copy link
Contributor

@tkf tkf commented Jan 27, 2018

@tkf tkf mentioned this pull request Jan 27, 2018
Just to be in the same style as the other docstrings that accept timeseries and cite papers
@Datseris
Copy link
Member

Looking good!!! Before we go into performance discussions, may I suggest a name change to permentropy, to have a more similar feel to genentropy ?

@Datseris Datseris self-requested a review January 27, 2018 09:47
@tkf
Copy link
Contributor Author

tkf commented Jan 27, 2018

Yea, I thought about that too. Working on it.

Copy link
Member

@Datseris Datseris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome this is a job well done!

I only had performance improvement comments.

Oh I also have this comment about base, what do you think?

nonzero = [c for c in count if c != 0]

p = nonzero ./ sum(nonzero)
return - sum(p .* log.(base, p))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will it not be much faster if one does

scale = sum(c)
return -sum( (p/scale)*log(p/scale) for p in c if p != 0)

? This version does no allocations, the original does 3 allocations, once when defining nonzero, once when defining p and once when doing p .* log.(base, p) (this also allocates a new array)

function permentropy(
time_series::AbstractArray{T, 1}, order::UInt8,
interval::Integer = 1;
base=Base.e) where {T}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having the "base" option is cool of course, but the thing is that all other entropies assume base e. Maybe we should add a keyword to all of them?

(Or just use base-e everywhere, which is my suggestion)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think base=2 is also very popular. In fact, base=2 is used in the original permutation entropy paper. I'd like to have base option everywhere. Or something like genentropy2.

@tkf
Copy link
Contributor Author

tkf commented Jan 27, 2018

Yea it allocates memory but it executed only once. For long time series I don't think it matters.

perms = map(PermType, permutations(1:order)) likely allocates much more memory and doing some optimization for sum(p log(p)) doesn't buy us match, unless we improve perms. I wouldn't worry about it if I were you.

@tkf
Copy link
Contributor Author

tkf commented Jan 27, 2018

Regarding further (memory/CPU) performance improvement around perms representation, probably using some kind of tree structure would help. Maybe this?:
https://www.sciencedirect.com/science/article/pii/0898122194001782

But I'd say this implementation is OK for some moderate order.

@Datseris
Copy link
Member

The Julia people are super good on that stuff, if you care about it a lot you can post a question on Discourse and people will immediately help. I am sure there are special structures that can add a lot of performance.

At the moment it is not crucial for me and I can merge this and then open an issue about its performance.

What about the base thing?

@tkf
Copy link
Contributor Author

tkf commented Jan 27, 2018

I commented on base things as the review comment. Personally, I'd like to keep it and have base option everywhere.

@Datseris
Copy link
Member

I commented on base things as the review comment.

Can't find it! Regardless, you are right. Seems like the best option. I will do the change of including base on this PR, is that okay? I will also add my performance benefit, because even if it is small it still exists and does not reduce code clarity.

@tkf
Copy link
Contributor Author

tkf commented Jan 27, 2018

I just wrote:

I think base=2 is also very popular. In fact, base=2 is used in the original permutation entropy paper. I'd like to have base option everywhere. Or something like genentropy2.

@tkf
Copy link
Contributor Author

tkf commented Jan 27, 2018

Re: performance

I like the saying "If you don't benchmark, you don't care the performance." We are not doing any benchmarks here so talking about it is kind of nonsense.

OK. Not fair. I was the first one to bringing that up. But I knew that computational complexity of the innner most loop of the original code was O(N) where N=order! which was ridiculous. I just switched to searchsortedfirst (which uses the binary search) since that would give us O(log N).

@tkf
Copy link
Contributor Author

tkf commented Jan 27, 2018

Probably setting up benchmarks using PkgBenchmark.jl would help us doing further performance improvement.

Copy link
Contributor Author

@tkf tkf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops! I haven't submitted my review comments. Here they are:

# To compute `p log(p)` correctly for `p = 0`, we first discard
# cases with zero occurrence. They don't contribute to the final
# sum hence to the entropy:
nonzero = [c for c in count if c != 0]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Datseris You commented that this is not necessary but I think you need this. Please see my inline comment above.


for t in 1:length(time_series) - interval * order + 1
sample = @view time_series[t:interval:t + interval * (order - 1)]
i = searchsortedfirst(perms, PermType(sortperm(sample)))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding the performance consideration I worried in #18, I think searchsortedfirst is much better than brute-force match. It can be improved, but I think this is a good starting point. See also the comment on PermType above.


## References

[1] : Bandt, C., & Pompe, B., Phys. Rev. Lett. **88** (17), pp 174102 (2002)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really? Don't you want to click the doi to get to the paper on browser? (OK, probably consistency matters as well so probably better to do it at once.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Daaaaamn how did I not consider to that everywhere!

function permentropy(
time_series::AbstractArray{T, 1}, order::UInt8,
interval::Integer = 1;
base=Base.e) where {T}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think base=2 is also very popular. In fact, base=2 is used in the original permutation entropy paper. I'd like to have base option everywhere. Or something like genentropy2.

@Datseris
Copy link
Member

Datseris commented Jan 27, 2018

Probably setting up benchmarks using PkgBenchmark.jl would help us doing further performance improvement.

Yeah sounds like an excellent plan. So far I have developed myself and benchmarked on the same computer with e.g. @btime but now is the time for upgrade.

Edit: #21

@Datseris
Copy link
Member

Alright, I added base to genentropy. Assuming I did not have 10s of typos and tests pass, this is merged as-is. A separate issue will be created for its performance.

@tkf tkf changed the title Add permutation_entropy and its tests Add permentropy and its tests Jan 27, 2018
@tkf
Copy link
Contributor Author

tkf commented Jan 27, 2018

Thanks!

BTW, it looks like Travis is failing

@Datseris Datseris merged commit 6dc5155 into JuliaDynamics:master Jan 27, 2018
@Datseris
Copy link
Member

merged!

@tkf
Copy link
Contributor Author

tkf commented Jan 27, 2018

Thanks!

@Datseris
Copy link
Member

thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Permutation entropy
2 participants