Adjust `GaussianCDFEncoding` and related stuff for new encoding API #199

I have now adjusted the codebase so that `GaussianCDFEncoding` obeys the encoding API. This addresses another point in #190.

- `GaussianCDFEncoding` was only applicable to vector-valued input, meaning that we had to encode an entire timeseries in bulk. By adjusting the type from `GaussianCDFEncoding(; c::Int = 3)` to `GaussianCDFEncoding(; μ, σ, c::Int = 3)`, we now require the user to explicitly specify the mean and standard deviation of the desired normal CDF. Thus, `encode` and `decode` now both handle scalar-valued inputs (i.e. no need for an entire timeseries, because `μ` and `σ` must be pre-computed).
- In the `Dispersion` struct, instead of explicitly constructing an encoding (i.e. `GaussianCDFEncoding`), we just use `c::Int`, which dictates how many bins to divide the range of the CDF into. However, we add a field `encoding`, which specifies a type of encoding, and can be any encoding type that accepts the keyword `c`. The default is `GaussianCDFEncoding`. It is now trivial to replace `GaussianCDFEncoding` with `WhateverCDFEncoding`, by just doing e.g. `Dispersion(; encoding = WhateverCDFEncoding, c = 5)`. Internally, this will instantiate a `WhateverCDFEncoding(; c = 5)` in the relevant location in the code. However, the `encoding` field is not yet part of the public API. I don't think it should be public yet either, because I have an idea to make this even more generic.
- The symbolization steps for `Dispersion` are delegated to `symbolize_for_dispersion`, which also instantiates the encoding.
- The mapping from the `(0, 1)` range of the CDF to the integers `1, 2, ..., c` now just uses an equidistant binning. This is essentially a `FixedRectangularBinning(0, 1, c)` (but without explicitly constructing a `RectangularBinEncoding`, since it would have to be constructed once per encoding, which is expensive). This differs slightly from the original paper, but the choice is now well documented and makes much more intuitive sense than the previous mapping.

The alternative to this entire approach would be to extend the encoding API to be defined for vector-valued inputs, which I don't want to do, because it complicates things unnecessarily. I think the approach here is more elegant anyway, because it is now trivial to use some other encoding by just specifying the `encoding` keyword to `Dispersion` (again, this is not in the public API atm).

Other stuff:
- Fixed some more `NaiveKernel` tests.

Tests pass.
Documentation generation is successful.
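
To make the scalar `encode`/`decode` behaviour and the equidistant binning described above concrete, here is a minimal, self-contained sketch. It is not the code in this PR: `ToyGaussianCDFEncoding`, `erf_approx`, `normal_cdf` and the interval returned by `decode` are all made up for illustration, and the error function is approximated (Abramowitz-Stegun formula 7.1.26) only so that the snippet needs no packages.

```julia
# Hypothetical stand-in for the encoding described in this PR (not the package code).
struct ToyGaussianCDFEncoding
    μ::Float64
    σ::Float64
    c::Int
end
# Keyword constructor mirroring the signature `(; μ, σ, c::Int = 3)`.
ToyGaussianCDFEncoding(; μ, σ, c::Int = 3) = ToyGaussianCDFEncoding(μ, σ, c)

# Abramowitz-Stegun 7.1.26 approximation of erf (absolute error around 1.5e-7).
# Used here only to keep the sketch dependency-free.
function erf_approx(x::Real)
    t = 1 / (1 + 0.3275911 * abs(x))
    poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741 +
           t * (-1.453152027 + t * 1.061405429))))
    return sign(x) * (1 - poly * exp(-x^2))
end

# CDF of a normal distribution with mean μ and standard deviation σ.
normal_cdf(x::Real, μ::Real, σ::Real) = 0.5 * (1 + erf_approx((x - μ) / (σ * sqrt(2))))

# Encode ONE scalar: evaluate the CDF, then pick one of `c` equal-width bins
# on the unit interval (bin i covers [(i-1)/c, i/c)).
function encode(enc::ToyGaussianCDFEncoding, x::Real)
    y = normal_cdf(x, enc.μ, enc.σ)
    return clamp(floor(Int, y * enc.c) + 1, 1, enc.c)
end

# Decode a symbol back to the CDF-space interval it stands for.
# Returning the interval bounds is an assumption made for this sketch.
function decode(enc::ToyGaussianCDFEncoding, i::Integer)
    1 <= i <= enc.c || throw(ArgumentError("symbol must be in 1:$(enc.c)"))
    return ((i - 1) / enc.c, i / enc.c)
end

# μ and σ are given up front, so no timeseries is needed to encode a value.
enc = ToyGaussianCDFEncoding(μ = 0.0, σ = 1.0, c = 4)
symbols = [encode(enc, x) for x in (-3.0, -0.5, 0.0, 0.5, 3.0)]
interval = decode(enc, 3)
```

Because `μ` and `σ` are fixed at construction time, each value can be encoded on its own, which is the point of the change.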
@Datseris I will merge this immediately when CI is done, so I can finish up #126 (which depends on this stuff). If you have any comments, just leave them here (or open issues if necessary), and I will address them while working on #126.
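
For reference, the swappable `encoding` keyword on `Dispersion` described above could look roughly like the sketch below. Everything in it is hypothetical (`ToyDispersion`, `symbolize_for_toy_dispersion`, `cdf_value`, and the two toy encodings); the only thing taken from the description is the pattern of storing an encoding type plus `c` and instantiating `encoding(; c = c)` internally. How `μ` and `σ` would be supplied to `GaussianCDFEncoding` is deliberately left out.

```julia
# Hypothetical toy encodings; each one only needs to accept the keyword `c`.
abstract type ToyEncoding end

struct LogisticCDFEncoding <: ToyEncoding
    c::Int
end
LogisticCDFEncoding(; c::Int = 3) = LogisticCDFEncoding(c)
cdf_value(::LogisticCDFEncoding, x::Real) = 1 / (1 + exp(-x))

struct UniformCDFEncoding <: ToyEncoding
    c::Int
end
UniformCDFEncoding(; c::Int = 3) = UniformCDFEncoding(c)
cdf_value(::UniformCDFEncoding, x::Real) = clamp(x, 0, 1)

# Shared scalar encoding: equal-width binning of the CDF value into 1, 2, ..., c.
encode(enc::ToyEncoding, x::Real) =
    clamp(floor(Int, cdf_value(enc, x) * enc.c) + 1, 1, enc.c)

# Toy estimator: stores the encoding TYPE and the number of bins,
# not an already constructed encoding.
struct ToyDispersion
    encoding::DataType
    c::Int
end
ToyDispersion(; encoding = LogisticCDFEncoding, c::Int = 3) = ToyDispersion(encoding, c)

# The encoding is instantiated here, at the point where it is needed.
function symbolize_for_toy_dispersion(est::ToyDispersion, x::AbstractVector{<:Real})
    enc = est.encoding(; c = est.c)
    return [encode(enc, xi) for xi in x]
end

x = [-2.0, 0.0, 0.3, 2.0]
logistic_symbols = symbolize_for_toy_dispersion(ToyDispersion(; c = 5), x)
uniform_symbols = symbolize_for_toy_dispersion(
    ToyDispersion(; encoding = UniformCDFEncoding, c = 5), x)
```

Swapping the encoding is then a one-keyword change at the call site, and the estimator never needs to know which CDF is used.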