Skip to content

Conversation

@pdobacz
Copy link
Contributor

@pdobacz pdobacz commented Aug 17, 2022

I somehow managed to miss this altogether in here, probably left for "later" and shifted focus.

This comes in with a bit of an ugly moment when accumulating per-AID-instance results and calculating the not_enough_aid_values condition. Since this is the only condition calculated jointly for both legs (positive and negative), it goes against the result accumulator structure we are trying to reuse. I'm still hesitant if this is clean enough to be kept or should we embark on some larger refactoring of the aggregation code.

@pdobacz
Copy link
Contributor Author

pdobacz commented Aug 17, 2022

Another comment - I've been trying to validate this using prop-tests but I have been running into many problems associated with rounding errors in float processing. I've run ~tens of complicated queries only, instead of hundreds. The issues found I've investigated were ones attributable to floating point arithmetic rounding combined with inherent "instability" due to the couple of max functions our numbers go through.

For example: an epsilon-ish difference causes another AID-instance to be chosen for flattening and the results are off by a lot.

@pdobacz pdobacz marked this pull request as ready for review August 17, 2022 08:51
@pdobacz pdobacz requested a review from edongashi August 17, 2022 08:51
@cristianberneanu
Copy link
Collaborator

cristianberneanu commented Aug 17, 2022

Since both systems use the same floating point architecture results should match in theory. Could the cause instead be a different ordering of the operations or of the data between them?

edongashi
edongashi previously approved these changes Aug 17, 2022
@pdobacz
Copy link
Contributor Author

pdobacz commented Aug 17, 2022

Since both systems use the same floating point architecture results should match in theory. Could the cause instead be a different ordering of the operations or of the data between them?

I can give an detailed rundown in some other thread, but there are a couple of ways the calculations diverge - all of them related to floating point:

  • SQLite and PostgreSQL differ in the float values they hand off, for a reason I finally gave up on tracing down
  • pg_diffix and reference sum the contributions in different order (so yes - order of operations matters) - the former sum them during transitioning of the aggregate function as they appear and the latter sums it up by AID first and later over all the AIDs. I haven't investigated further, but float point arithmetic has a slightly different error in both cases, which is later inflated by some "instable" step like max or ceil.

edongashi
edongashi previously approved these changes Aug 17, 2022
@pdobacz pdobacz force-pushed the piotr/negative-summands branch from ea9e39d to 6cf58cc Compare August 17, 2022 12:14
@pdobacz pdobacz merged commit 25ef336 into master Aug 17, 2022
@pdobacz pdobacz deleted the piotr/negative-summands branch August 17, 2022 12:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants