Adjust `GaussianCDFEncoding` and related stuff for new encoding API #199

kahaaga · 2022-12-20T17:37:24Z

I have now adjusted the codebase so that GaussianCDFEncoding obeys the encoding API. This addresses another point in #190.

Previously, GaussianCDFEncoding was only applicable to vector-valued input, meaning that we had to encode an entire timeseries in bulk. By adjusting the type from GaussianCDFEncoding(; c::Int = 3) to GaussianCDFEncoding(; μ, σ, c::Int = 3), we now require the user to explicitly specify the mean and standard deviation of the desired normal CDF. Thus, encode and decode now both handle scalar-valued inputs (i.e. no need for entire timeseries, because μ and σ must be pre-computed).
In the Dispersion struct, instead of explicitly constructing an encoding (i.e. GaussianCDFEncoding), we just use c::Int, which dictates how many bins to divide the range of the CDF into. However, we add a field encoding, which specifies a type of encoding, and can be any encoding type that accepts the keyword c. The default is GaussianCDFEncoding. It is now trivial to replace GaussianCDFEncoding with WhateverCDFEncoding, by just doing e.g. Dispersion(; encoding = WhateverCDFEncoding, c = 5). Internally, this will instantiate a WhateverCDFEncoding(; c = 5) in the relevant location in the code. However, the encoding field is not yet part of the public API. I don't think it should be public yet either, because I have an idea to make this even more generic.
Empirical mean and standard deviation computations for Dispersion are delegated to symbolize_for_dispersion, which also instantiates the encoding.
I let the linear mapping from the (0, 1) range of the CDF to the integers 1, 2, ..., c to just use an equidistant binning. This is essentially a FixedRectangularBinning(0, 1, c) (but without explicitly constructing a `RectangularBinEncoding, since it would have to be constructed once per encoding, which is expensive). This differs slightly from the original paper, but makes much more sense. This choice is now well documented, and makes much more intuitive sense than the previous mapping.

The alternative to this entire approach would be to extend the encoding API to be defined for vector-valued inputs, which I don't want to do, because it complicates things unnecessarily. I think the approach here is more elegant anyways, because it is now trivial to user some other encoding by just specifying the encoding keyword to Dispersion (again, this is not in the public API atm).

Other stuff:

Fixed some more NaiveKernel tests.
Test pass.
Documentation generation is successful.

@Datseris I will merge this immediately when CI is done, so I can finish up #126 (which depends on this stuff). If you have any comments, just leave them here (or open issues if necessary), and I will address them while working on #126.

Datseris · 2022-12-20T19:02:25Z

it's okay, I have planned that we have to do a full review of the entire code base before publish anyways. I just haven't told you about this plan yet xD xD xD

kahaaga · 2022-12-20T21:43:31Z

I figured as much 🤖

kahaaga added 2 commits December 20, 2022 18:18

Adjust GaussianCDFEncoding and related stuff for new encoding API

3910b05

Fix KernelDensity tests

16f651c

kahaaga merged commit 93686aa into main Dec 20, 2022

kahaaga deleted the update_gaussiancdfencoding branch December 20, 2022 18:05

kahaaga mentioned this pull request Dec 20, 2022

Some outstanding TODOs for v2 #190

Closed

6 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Adjust `GaussianCDFEncoding` and related stuff for new encoding API #199

Adjust `GaussianCDFEncoding` and related stuff for new encoding API #199

kahaaga commented Dec 20, 2022 •

edited

Loading

Datseris commented Dec 20, 2022

kahaaga commented Dec 20, 2022

Adjust GaussianCDFEncoding and related stuff for new encoding API #199

Adjust GaussianCDFEncoding and related stuff for new encoding API #199

Conversation

kahaaga commented Dec 20, 2022 • edited Loading

Datseris commented Dec 20, 2022

kahaaga commented Dec 20, 2022

Adjust `GaussianCDFEncoding` and related stuff for new encoding API #199

Adjust `GaussianCDFEncoding` and related stuff for new encoding API #199

kahaaga commented Dec 20, 2022 •

edited

Loading