-
Notifications
You must be signed in to change notification settings - Fork 35
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add permentropy and its tests #20
Conversation
87310f2
to
f13a0ad
Compare
Just to be in the same style as the other docstrings that accept timeseries and cite papers
Looking good!!! Before we go into performance discussions, may I suggest a name change to |
Yea, I thought about that too. Working on it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome this is a job well done!
I only had performance improvement comments.
Oh I also have this comment about base
, what do you think?
nonzero = [c for c in count if c != 0] | ||
|
||
p = nonzero ./ sum(nonzero) | ||
return - sum(p .* log.(base, p)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will it not be much faster if one does
scale = sum(c)
return -sum( (p/scale)*log(p/scale) for p in c if p != 0)
? This version does no allocations, the original does 3 allocations, once when defining nonzero
, once when defining p
and once when doing p .* log.(base, p)
(this also allocates a new array)
function permentropy( | ||
time_series::AbstractArray{T, 1}, order::UInt8, | ||
interval::Integer = 1; | ||
base=Base.e) where {T} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having the "base" option is cool of course, but the thing is that all other entropies assume base e
. Maybe we should add a keyword to all of them?
(Or just use base-e everywhere, which is my suggestion)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think base=2
is also very popular. In fact, base=2
is used in the original permutation entropy paper. I'd like to have base option everywhere. Or something like genentropy2
.
Yea it allocates memory but it executed only once. For long time series I don't think it matters.
|
Regarding further (memory/CPU) performance improvement around But I'd say this implementation is OK for some moderate |
The Julia people are super good on that stuff, if you care about it a lot you can post a question on Discourse and people will immediately help. I am sure there are special structures that can add a lot of performance. At the moment it is not crucial for me and I can merge this and then open an issue about its performance. What about the base thing? |
I commented on base things as the review comment. Personally, I'd like to keep it and have base option everywhere. |
Can't find it! Regardless, you are right. Seems like the best option. I will do the change of including base on this PR, is that okay? I will also add my performance benefit, because even if it is small it still exists and does not reduce code clarity. |
I just wrote:
|
Re: performance I like the saying "If you don't benchmark, you don't care the performance." We are not doing any benchmarks here so talking about it is kind of nonsense. OK. Not fair. I was the first one to bringing that up. But I knew that computational complexity of the innner most loop of the original code was |
Probably setting up benchmarks using PkgBenchmark.jl would help us doing further performance improvement. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops! I haven't submitted my review comments. Here they are:
# To compute `p log(p)` correctly for `p = 0`, we first discard | ||
# cases with zero occurrence. They don't contribute to the final | ||
# sum hence to the entropy: | ||
nonzero = [c for c in count if c != 0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Datseris You commented that this is not necessary but I think you need this. Please see my inline comment above.
|
||
for t in 1:length(time_series) - interval * order + 1 | ||
sample = @view time_series[t:interval:t + interval * (order - 1)] | ||
i = searchsortedfirst(perms, PermType(sortperm(sample))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding the performance consideration I worried in #18, I think searchsortedfirst
is much better than brute-force match. It can be improved, but I think this is a good starting point. See also the comment on PermType
above.
src/dimensions/entropies.jl
Outdated
|
||
## References | ||
|
||
[1] : Bandt, C., & Pompe, B., Phys. Rev. Lett. **88** (17), pp 174102 (2002) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really? Don't you want to click the doi to get to the paper on browser? (OK, probably consistency matters as well so probably better to do it at once.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Daaaaamn how did I not consider to that everywhere!
function permentropy( | ||
time_series::AbstractArray{T, 1}, order::UInt8, | ||
interval::Integer = 1; | ||
base=Base.e) where {T} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think base=2
is also very popular. In fact, base=2
is used in the original permutation entropy paper. I'd like to have base option everywhere. Or something like genentropy2
.
Yeah sounds like an excellent plan. So far I have developed myself and benchmarked on the same computer with e.g. Edit: #21 |
Alright, I added base to |
Thanks! BTW, it looks like Travis is failing |
merged! |
Thanks! |
thank you! |
closes #18
Here is a demo: https://gist.github.com/tkf/31cb31564dc4f61c085d013656f1a9e9