Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unprojected columns can be dropped from InUnion plan comparison keys #2728

Open
alecgrieser opened this issue May 17, 2024 · 0 comments
Open
Labels
bug Something isn't working planner Related to the query planner

Comments

@alecgrieser
Copy link
Contributor

If we have a Cascades query that looks a bit

[ ORDER BY b ] <- [SELECT b, d FROM $q WHERE a IN (a1, a2, ... an)  ] <- q: [ TypeFilter(T) ]

This is roughly equivalent to a SQL query:

SELECT b, d FROM T WHERE a IN (a1, a2, ..., an) ORDER BY b

Then suppose we have an index on (a, b, c, d). This can be used to execute this query. For example, this could be done by:

  1. For each value in a's in-predicate, do a (covering) index scan of the (a, b, c, d) index limited to that value a
  2. Then use an IN-union plan to merge those different index scans together
  3. Project the b and d columns out from the original index scan

So, something like:

map (∪ [qun IN (a1, a2, ... an)] Covering(Index(abcd [EQUALS $qun])) ON [b, c, d, primary_key]) [_.b, _.d]

Importantly, the IN-union plan uses a comparison key function to properly merge the different child plans together in the right order. In this case, the order needs to be on something like [_.b, _.c, _.d, _.primary_key] (that is, the index ordering components after _.a) to ensure that each unique entry from the original scan is returned.

However, the plan we get is instead more like:

∪ [qun IN (a1, a2, ... an)] map(Covering(Index(abcd [EQUALS $qun])) [b, d]) ON [_.b]

That is, it:

  1. For each value in the a's in-preidcate, does a covering index scan of the (a, b, c, d) index limited to that value a (so far so good)
  2. Then uses a map plan to project b and d from each inner scan
  3. And finally merges the different plans together using an IN-union plan with only b in the comparison key

This means that if two legs of the union have the same b values, then one can end up being marked as a dupe of the other, even if it has a different d value.

Note that if we project more columns out, this problem isn't hit. For example, if we include c, we get the original [_.b, _.c, _.d, _.primary_key] comparison key. Or if we include the comparison column a, we get the comparison key [_.b, _.a]. That latter comparison key function actually also works. This is because when we de-dupe using the comparison key, we only consider one child from each leg of the union, and we only remove it if there is another element that exactly matches the comparison key. With the comparison key [_.b, _.a], at no point are there ever children from different children of the union with the same comparison keys.

@alecgrieser alecgrieser added bug Something isn't working planner Related to the query planner labels May 17, 2024
alecgrieser added a commit to alecgrieser/fdb-record-layer that referenced this issue May 17, 2024
…rom InUnion plans

This adds a test case for FoundationDB#2728. It showcases the way that the unprojected columns get dropped, and the final results of the query are missing data. There are comments in the test case about how a more correct solution would work.
@alecgrieser alecgrieser changed the title Unprojected columns can be dropped from InUnion plans Unprojected columns can be dropped from InUnion plan comparison keys May 20, 2024
alecgrieser added a commit to alecgrieser/fdb-record-layer that referenced this issue May 22, 2024
…rom InUnion plans

This adds a test case for FoundationDB#2728. It showcases the way that the unprojected columns get dropped, and the final results of the query are missing data. There are comments in the test case about how a more correct solution would work.
alecgrieser added a commit to alecgrieser/fdb-record-layer that referenced this issue May 22, 2024
…rom InUnion plans

This adds a test case for FoundationDB#2728. It showcases the way that the unprojected columns get dropped, and the final results of the query are missing data. There are comments in the test case about how a more correct solution would work.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working planner Related to the query planner
Projects
None yet
Development

No branches or pull requests

1 participant