Compatibility with Entropies v2: passing `ProbabilitiesEstimator`s is broken #183
Comments
To demonstrate the concept, I will implement it solely for the mutual information and conditional mutual information and keep everything else for future PRs. That will be more efficient.
Fixed binning doesn't require input data, so I am not sure what you are saying.
Is the problem that you need one binning for x or y, and another binning for the joint x and y histogram?
By the way, this is exactly what I was trying to tell you in JuliaDynamics/ComplexityMeasures.jl#107: to actually test the new interface in upstream packages, so that something like this doesn't catch us after we have finished writing a new API, but rather during writing it, so we can think of alternatives.
An alternative is to revert the decision that estimators need a well-defined outcome space. Alternatively, just make a special dispatch method for the affected estimator types.
That completely went over my head. Sorry about that.
Yes. To compute mutual info based on a sum of entropies, one would explicitly have to provide three pre-initialized estimators.
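For reference, the decomposition is MI(X;Y) = H(X) + H(Y) - H(X,Y), so a user-side computation would look roughly like this (a hedged sketch; the constructor and `entropy` signatures are assumptions based on the errors quoted in this thread):

```julia
# Hedged sketch: with v2-style data-dependent constructors, each entropy
# term in MI(X;Y) = H(X) + H(Y) - H(X,Y) needs its own separately
# initialized estimator.
using Entropies, DelayEmbeddings

x, y = rand(1000), rand(1000)
X, Y, XY = Dataset(x), Dataset(y), Dataset(x, y)

est_x  = ValueHistogram(RectangularBinning(4), X)   # for H(X)
est_y  = ValueHistogram(RectangularBinning(4), Y)   # for H(Y)
est_xy = ValueHistogram(RectangularBinning(4), XY)  # for H(X,Y)

mi = entropy(est_x, X) + entropy(est_y, Y) - entropy(est_xy, XY)
```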
The problem is that the following doesn't work any longer:

```julia
julia> est = ValueHistogram(RectangularBinning(4))
ERROR: MethodError: no method matching ValueHistogram(::RectangularBinning{Int64})
Closest candidates are:
  ValueHistogram(::RectangularBinning, ::Any) at ~/.julia/packages/Entropies/BQq7h/src/probabilities_estimators/value_histogram.jl:43
  ValueHistogram(::FixedRectangularBinning) at ~/.julia/packages/Entropies/BQq7h/src/probabilities_estimators/value_histogram.jl:47
  ValueHistogram(::Union{Real, Vector}, ::Any) at ~/.julia/packages/Entropies/BQq7h/src/probabilities_estimators/value_histogram.jl:42
  ...
Stacktrace:
 [1] top-level scope
   @ REPL[3]:1
```

This is not a problem for `NaiveKernel`, right?
Yes, dispatching here in a clever way is the idea, so that the example above becomes a non-issue. However, we need to keep it generic: dispatching on one particular estimator type doesn't scale. Also, mutual info is not the only method that can use binning-based entropies. There is conditional mutual info, relative entropy, conditional entropy, extropy - the list goes on. So the extra special-case code is then multiplied across all of these methods. Therefore we either need to write that special-cased dispatch for every measure, or introduce wrapper types here in CausalityTools.
I propose here to do the latter.
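To make the duplication concern concrete, a special-cased `mutualinfo` might look like this hypothetical sketch (signatures reused from the sketch further up; none of this is the actual CausalityTools code):

```julia
using Entropies, DelayEmbeddings

# Hypothetical special case: accept a bare binning and build the three
# estimators internally, hiding the v2 data requirement from the user.
function mutualinfo(b::RectangularBinning, x, y)
    X, Y, XY = Dataset(x), Dataset(y), Dataset(x, y)
    return entropy(ValueHistogram(b, X), X) +
           entropy(ValueHistogram(b, Y), Y) -
           entropy(ValueHistogram(b, XY), XY)
end

# The same boilerplate would be needed again for conditional mutual info,
# relative entropy, conditional entropy, extropy, ... times every
# estimator type whose construction requires data.
```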
Nope, not any longer:

```julia
julia> est = NaiveKernel(0.1)
ERROR: ArgumentError: NaiveKernel constructor requires input data as the first argument. Do `NaiveKernel(x, ϵ)`.
Stacktrace:
 [1] NaiveKernel(ϵ::Float64; kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
   @ Entropies ~/.julia/packages/Entropies/BQq7h/src/probabilities_estimators/kernel_density.jl:52
 [2] NaiveKernel(ϵ::Float64)
   @ Entropies ~/.julia/packages/Entropies/BQq7h/src/probabilities_estimators/kernel_density.jl:52
 [3] top-level scope
   @ REPL[5]:1
```
It does work, but you do have to provide the input data. Anyway, let me think about this for a moment before implementing anything; there might be an easy solution.
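That is, something like this, following the hint in the error message (a sketch; the input type is an assumption):

```julia
# Signature taken from the error message above: data first, then ϵ.
using Entropies, DelayEmbeddings

x = Dataset(rand(1000), rand(1000))
est = NaiveKernel(x, 0.1)
```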
Any probabilities estimator, now and in the future, that requires input data to instantiate it will not work out of the box. But that is not necessarily a problem in itself. For example, if any methods here are going to work with the spatial entropy estimators, we would have to figure out a way around it anyway, because those demand input data for performance reasons.
I don't think we should revert this decision. It is elegant, intuitive and there are good reasons for keeping it this way. We can easily solve the issue by providing wrappers here.
Not all information methods necessarily demand equal-length input for the input variables.
If we both brainstorm a bit, we should be able to come up with a good solution. Let's use mutual info as an example, because it extends easily to other methods. I'll have to do this tomorrow though. My brain is fried, and I need to let this sink in and experiment a bit.
I'm just listing potential solutions here as they come to mind. I have no opinion yet on which one (if any) is the best.
Alright, I've thought about this extensively for the last hour, and the solution appears simple to me. First some general comments:
I believe that during package development, downstream packages should use only the public API, as if they know nothing about the internals of a dependency. So, try to use a package as if you haven't written it yourself. Over time this will lead to clearer design and more contributions.
Well, as a user, I actually never found it elegant... I was always weirded out by having to provide input data twice in practically the same function call. The second question to ask is: are the reasons for making this change really that good? I think arguments favor both approaches. On the one hand, you have the specificity of the outcome space. On the other, one would argue that a probability estimator may be a more abstract concept. The NaiveKernel highlights this well: the probabilities come from counting nearest neighbors. This is a solid concept that shouldn't care about the length of the input trajectory. I now realize that the change I implemented, of estimators forcing a known outcome space, was just a bad choice. It clearly clutters usage simplicity in realistic application scenarios, given the discussion of this issue.

So here is the solution I favor: we revert estimators demanding a well-defined outcome space. Instead, we shift the focus of a well-defined outcome space to the `total_outcomes` function:

```julia
total_outcomes(est::ProbEst, x) = total_outcomes(est)
total_outcomes(est::ProbEst) = error("total out not defined for est, try total_out(est, x) in case input is needed for knowing outcome space concretely.")
```

as before. This means that for estimators that have a well-defined outcome space by construction, things work as they do now. For others, they have to extend the two-argument method. If you think about this, there are only two functions of Entropies.jl actually affected by this. This will allow everything you want here to work. The changes for implementing this solution are very small, and will take me at most half an hour, so I'll go ahead and do this now.
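Under this scheme, the two kinds of estimators would hook in roughly like this (a sketch building on the fallbacks above; the concrete methods and field names below are illustrative assumptions):

```julia
using Entropies
import Entropies: total_outcomes  # extend the proposed interface

# Outcome space known by construction: the one-argument method suffices.
# (Illustrative: ordinal patterns of order m have m! possible outcomes.)
total_outcomes(est::SymbolicPermutation) = factorial(est.m)

# Outcome space depends on the data: extend the two-argument method instead.
# (`count_bins` and the `binning` field are hypothetical placeholders.)
total_outcomes(est::ValueHistogram, x) = count_bins(est.binning, x)
```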
For ValueHistogram, it becomes like before: it doesn't instantiate a binning until it is actually given input data.
I guess it seemed like a good idea in the heat of the moment to me too. But perhaps it got a bit too heated. I see both sides, but the practical argument should severely outweigh any theoretical considerations, unless there are extremely good reasons for doing so. The major pros and cons of this approach are, in my opinion, as follows.
It seems obvious that it is better, both for users and package maintainers, to keep the simplicity of the modular approach. If that comes at the cost of having to manage two signatures of `total_outcomes`, that is a price worth paying.
Yes. And this basic issue is probably only the start. I haven't even started trying to incorporate the meta-methods, such as the automatic embedding stuff and null hypothesis testing (e.g. surrogates).
Alright. I'll revert the state of the codebase again, and try out JuliaDynamics/ComplexityMeasures.jl#229.
Ok, I fixed some issues in the PR you linked, and everything works again. Thanks for the effort!
@Datseris
Because input data are now explicitly required to construct a `ProbabilitiesEstimator`, passing `ProbabilitiesEstimator`s to upstream methods is no longer possible. This completely got lost on me due to the many and rapid changes of the past weeks. This currently breaks all upstream methods (on the dev branch) that depend on `ProbabilitiesEstimator`s. BUT, it is not a problem for CausalityTools v1.X, so no worries there.

Example
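A minimal sketch of the kind of call in question, assuming the v1-style signature `mutualinfo(est, x, y)`:

```julia
# Assumed v1-style usage that breaks under Entropies v2: the estimator
# can no longer be constructed without input data.
using CausalityTools

x, y = rand(1000), rand(1000)
est = ValueHistogram(RectangularBinning(4))  # MethodError under Entropies v2
mutualinfo(est, x, y)
```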
The above won't work any longer, because the estimator requires input data. Behind the scenes, `mutualinfo` computes three separate entropies, each now requiring its own, uniquely initialized instance of `ValueHistogram`. Therefore, naively just using `ProbabilitiesEstimator`s, the user would have to supply three estimators, which makes no sense.

I absolutely do not have time to do a re-write of anything `Entropies`-related in the foreseeable future, and I don't think we should, because it is in a very good state. So here's what I propose:

Solution
There are only three or four (I think) probabilities estimators that handle multivariate data. Therefore, I will make some basic `CausalityTools`-only estimators that just wrap relevant concepts from `Entropies` (names below are completely arbitrary):
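For illustration only, such wrappers could look something like this (every name and field below is a hypothetical placeholder, not the actual proposal):

```julia
# Hypothetical CausalityTools-only wrapper types. Each wrapper stores
# only the parameters; the measure constructs the per-variable
# Entropies.jl estimators internally once it has the input data.
struct BinnedProbabilities{B}
    binning::B    # e.g. RectangularBinning(4)
end

struct KernelProbabilities{T}
    bandwidth::T  # forwarded internally to NaiveKernel(x, bandwidth)
end

# Usage would then stay as simple as in v1 (assumed signature):
# mutualinfo(BinnedProbabilities(RectangularBinning(4)), x, y)
```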
These structs accept relevant parameters and initialize the needed entropy estimators behind the scenes, removing any confusion from the user side, and removing any need for the user to know about `Entropies.jl`.

This is not an ideal solution, and it might have to change in the long run, but I reeeeeally don't have time to start fresh if I want to get any of my applications ready before the deadlines in February. We also need to focus on getting a working V2 of `CausalityTools` out, so workshop preparations can start.

Therefore, I'm almost tempted to veto anything but minor changes to the proposal in this issue. If re-doing more stuff now, I think it will be the fourth time I'm rewriting everything 2.0-related based on `Entropies` changes, and I simply don't have more time to waste.

Take-away: introduce simple probability-type wrappers that are compatible with higher-level measures.
What do you think? Any immediate thoughts?