
Strange behaviour with aggregate function #213

Open
johnnymoretti opened this issue Nov 17, 2021 · 8 comments

@johnnymoretti

Hi,
I'm using a local instance of Blazegraph 2.1.6 RC, and I've noticed that if I submit a query with an aggregate function like GROUP_CONCAT, it hangs unless I add a LIMIT clause.

The following query hangs:

PREFIX ontolex: <http://www.w3.org/ns/lemon/ontolex#>

SELECT ?lemma (GROUP_CONCAT(DISTINCT ?wr ; separator=", ") AS ?wrs)
WHERE {
  ?lemma ontolex:writtenRep ?wr
} GROUP BY ?lemma

This other one does not hang:

PREFIX ontolex: <http://www.w3.org/ns/lemon/ontolex#>

SELECT ?lemma (GROUP_CONCAT(DISTINCT ?wr ; separator=", ") AS ?wrs)
WHERE {
  ?lemma ontolex:writtenRep ?wr
} GROUP BY ?lemma
LIMIT 1000000000

Obviously my dataset is far smaller than 1000000000 rows: it contains 138245 triples.

Any suggestions?

@thompsonbry
Contributor

thompsonbry commented Nov 17, 2021 via email

@johnnymoretti
Author

Thank you for your reply; I understand the problem now.
To correct this behavior, and to confirm whether the problem is really related to materialization, is there a simple way to force the materialization?

@johnnymoretti
Author

OK, it is definitely a materialization problem. I set the materializeProjectionInQuery variable in the AST2BOpContext class to false, thereby implicitly forcing the projection to be materialized outside the plan: this way the queries work. However, the underlying materialization issue remains open. Thanks a lot for your help.

@johnnymoretti
Author

I have run into another strange thing.
The following query now works with the previously mentioned trick:

PREFIX ontolex: <http://www.w3.org/ns/lemon/ontolex#>

SELECT ?lemma (GROUP_CONCAT(DISTINCT ?wr ; separator=", ") AS ?wrs)
WHERE {
  ?lemma ontolex:writtenRep ?wr
} GROUP BY ?lemma

but this other one gets stuck again:

PREFIX ontolex: <http://www.w3.org/ns/lemon/ontolex#>

SELECT ?lemma (GROUP_CONCAT(DISTINCT ?wr ; separator=", ") AS ?wrs)
WHERE {
  ?lemma ontolex:writtenRep ?wr
} GROUP BY ?lemma
ORDER BY ?wrs

To fix this problem I have to write the query like this:

PREFIX ontolex: <http://www.w3.org/ns/lemon/ontolex#>
SELECT * {
  SELECT ?lemma (GROUP_CONCAT(DISTINCT ?wr ; separator=", ") AS ?wrs)
  WHERE {
    ?lemma ontolex:writtenRep ?wr
  } GROUP BY ?lemma
}
ORDER BY ?wrs

I think there is something wrong with the subquery nesting optimization, or with something related to ORDER BY handling.

Can you please help me?

@thompsonbry
Contributor

thompsonbry commented Nov 23, 2021 via email

@mschmidt00
Contributor

Looking at the first question again in more detail, I am attaching the two explains (for a data set with 20k triples, all having distinct subjects and objects but the same predicate). As suggested initially, there is a difference in the plan: the slow version (explain-wo-limit) contains a dedicated ChunkedMaterializationOp at the end, which clearly dominates the runtime.

explain-w-limit

explain-wo-limit

Looking at the code, I guess the problem happens here: https://github.com/blazegraph/database/blob/master/bigdata-core/bigdata-rdf/src/java/com/bigdata/bop/rdf/join/ChunkedMaterializationOp.java#L375

We add the constructed TermIds (all with termId=0L, indicating that they are mocked) to the idToConstMap. I confirmed that all of them collide on the same hash code (which is 0 as well).
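The collision effect described above can be reproduced outside Blazegraph with a hypothetical stand-in for TermId (MockTermId below is an illustrative sketch, not the real class): every mocked id hashes to 0, so all map entries land in the same HashMap bucket and each insert or lookup must compare colliding entries via equals() (modern JDKs treeify large buckets, which softens but does not eliminate the cost for non-Comparable keys):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for Blazegraph's TermId, mirroring the behavior
// described above: mocked ids carry termId == 0L plus a cached value, and
// hashCode() is derived from the termId alone, so every mocked id hashes to 0.
final class MockTermId {
    final long termId;   // 0L marks a mocked/constructed value
    final String value;  // the cached (materialized) value

    MockTermId(long termId, String value) {
        this.termId = termId;
        this.value = value;
    }

    @Override
    public int hashCode() {
        return Long.hashCode(termId); // always 0 for mocked ids
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof MockTermId)) return false;
        final MockTermId other = (MockTermId) o;
        return termId == other.termId && Objects.equals(value, other.value);
    }
}

public class CollisionDemo {
    public static void main(String[] args) {
        final Map<MockTermId, String> idToConstMap = new HashMap<>();
        for (int i = 0; i < 20_000; i++) {
            // Every key hashes to 0, so all 20k entries share one bucket.
            idToConstMap.put(new MockTermId(0L, "value-" + i), "const-" + i);
        }
        // De-duplication still works (equals() compares the cached value),
        // but only after paying the collision cost on every operation.
        System.out.println(idToConstMap.size()); // prints 20000
    }
}
```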

Possible ideas for fixes would be:

1.) Create a decorator for the idToConstMap (see e.g. https://stackoverflow.com/questions/41461515/hashmap-implementation-that-allows-to-override-hash-and-equals-method).
2.) Wrap the keys in dummy objects that override hashCode(), refining the method to return a hash code different from 0 for TermIds that have termId 0L and a value set (i.e., just the hash code of the value).
3.) Leave TermIds with termId 0L out of the map in the first place and handle them separately (IIUC, the main reason for using the hash map is de-duplication, which may be of limited value for constructed values anyway; though I would consider this a less preferred option).
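Fix idea (2) could be sketched roughly as follows. MockTermId is a hypothetical stand-in for Blazegraph's TermId (the real class has a different API), and TermIdKey is the dummy wrapper that refines hashCode() without touching TermId itself; equals() delegates to the wrapped id, so the map's de-duplication semantics are unchanged:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for TermId: termId == 0L means mocked, with the
// materialized value cached; its hashCode() collides on 0 for all mocked ids.
final class MockTermId {
    final long termId;
    final String value;
    MockTermId(long termId, String value) { this.termId = termId; this.value = value; }
    @Override public int hashCode() { return Long.hashCode(termId); }
    @Override public boolean equals(Object o) {
        return o instanceof MockTermId
                && termId == ((MockTermId) o).termId
                && Objects.equals(value, ((MockTermId) o).value);
    }
}

// Fix idea (2): a wrapper key that hashes mocked ids on their cached value
// instead of on the (always-zero) termId.
final class TermIdKey {
    final MockTermId id;
    TermIdKey(MockTermId id) { this.id = id; }
    @Override public int hashCode() {
        if (id.termId == 0L && id.value != null) return id.value.hashCode();
        return id.hashCode();
    }
    @Override public boolean equals(Object o) {
        return o instanceof TermIdKey && id.equals(((TermIdKey) o).id);
    }
}

public class WrapperKeyDemo {
    public static void main(String[] args) {
        final Map<TermIdKey, String> idToConstMap = new HashMap<>();
        idToConstMap.put(new TermIdKey(new MockTermId(0L, "foo")), "c1");
        idToConstMap.put(new TermIdKey(new MockTermId(0L, "bar")), "c2");
        // Distinct cached values now land in distinct buckets; duplicates
        // are still collapsed because equals() is unchanged.
        idToConstMap.put(new TermIdKey(new MockTermId(0L, "foo")), "c3");
        System.out.println(idToConstMap.size()); // prints 2
    }
}
```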

It may also be possible to change hashCode() in TermId directly, i.e., if termId==0L, always compute the hash code from the cached value. While this would be the "clean" way to fix the issue, it sounds quite risky and may cause problems with other code paths. I cannot come up with a scenario where it would be problematic, and I believe it is the right change conceptually, but it is really difficult to assess the blast radius of such an invasive change, both in terms of correctness and performance.

@mschmidt00
Contributor

Regarding your proposal: "I set the materializeProjectionInQuery variable in the AST2BOpContext class to false and thus I implicitly forced the projection to be materialized outside the plan." Yes, I think that should be okay as well, but it is more of a workaround (and may negatively impact the performance of other queries).

@mschmidt00
Contributor

Regarding your second question, here is the explain for the variant with ORDER BY (executed on some dummy data). As you can see, it contains two expensive ChunkedMaterializationOps, for the same reasons discussed above (with your materializeProjectionInQuery change there would probably be only one):

explain-order-by-variant

In light of this variant, my recommendation would be to fix the root cause (the hash collisions) rather than change materializeProjectionInQuery.
