Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Query: Identifying columns in the case of distinct #15873

Closed
smitpatel opened this issue May 31, 2019 · 6 comments · Fixed by #21990
Closed

Query: Identifying columns in the case of distinct #15873

smitpatel opened this issue May 31, 2019 · 6 comments · Fixed by #21990
Assignees
Labels
area-query breaking-change closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. punted-for-3.0 type-enhancement
Milestone

Comments

@smitpatel
Copy link
Member

No description provided.

@smitpatel
Copy link
Member Author

If collection is applying distinct and not projecting required identifying columns then we should throw exception saying that collection.Distinct() is not allowed.

@smitpatel
Copy link
Member Author

Consider scenario in #11178

@smitpatel
Copy link
Member Author

Also TVF

@maumar
Copy link
Contributor

maumar commented Jun 8, 2020

example: #20505

@smitpatel
Copy link
Member Author

Example, GroupBy aggregate in the projection which is not projecting the correlation key column.
See test Select_nested_collection_with_groupby

maumar added a commit that referenced this issue Aug 6, 2020
Added validation step for AddCollectionJoin which checks that if subquery contains Distinct or GroupBy, the projection contains all identifying columns needed to correctly bucket the results during materialization.

Fixes #15873
maumar added a commit that referenced this issue Aug 6, 2020
Added validation step for AddCollectionJoin which checks that if subquery contains Distinct or GroupBy, the projection contains all identifying columns needed to correctly bucket the results during materialization.

Fixes #15873
maumar added a commit that referenced this issue Aug 6, 2020
Added validation step for AddCollectionJoin which checks that if subquery contains Distinct or GroupBy, the projection contains all identifying columns needed to correctly bucket the results during materialization.

Fixes #15873
maumar added a commit that referenced this issue Aug 7, 2020
Added validation step for AddCollectionJoin which checks that if subquery contains Distinct or GroupBy, the projection contains all identifying columns needed to correctly bucket the results during materialization.
Also making sure that identifying columns can be correctly propagated during pushdown and joining - if they are not we mark them as such (by removing identifying columns altogether), so that we can throw exception when these columns are actually needed.

Fixes #15873
maumar added a commit that referenced this issue Aug 17, 2020
Added validation step for AddCollectionJoin which checks that if subquery contains Distinct or GroupBy, the projection contains all identifying columns needed to correctly bucket the results during materialization.
Also making sure that identifying columns can be correctly propagated during pushdown and joining - if they are not we mark them as such (by removing identifying columns altogether), so that we can throw exception when these columns are actually needed.

Fixes #15873
Fixes #20184
@ajcvickers
Copy link
Member

@maumar @smitpatel This issue is marked closed_fixed in rc1, but is still open and doesn't seem to be tracked by a PR.

maumar added a commit that referenced this issue Sep 1, 2020
Added validation step for AddCollectionJoin which checks that if subquery contains Distinct or GroupBy, the projection contains all identifying columns needed to correctly bucket the results during materialization.
Also making sure that identifying columns can be correctly propagated during pushdown and joining - if they are not we mark them as such (by removing identifying columns altogether), so that we can throw exception when these columns are actually needed.

Fixes #15873
Fixes #20184
Pilchie pushed a commit that referenced this issue Sep 1, 2020
…21990)

Added validation step for AddCollectionJoin which checks that if subquery contains Distinct or GroupBy, the projection contains all identifying columns needed to correctly bucket the results during materialization.
Also making sure that identifying columns can be correctly propagated during pushdown and joining - if they are not we mark them as such (by removing identifying columns altogether), so that we can throw exception when these columns are actually needed.

Fixes #15873
Fixes #20184
@ajcvickers ajcvickers modified the milestones: 5.0.0-rc1, 5.0.0 Nov 7, 2020
maumar added a commit that referenced this issue Feb 27, 2021
…s when projecting a subset of column and adding distinct

As fix to #15873 we started blocking some scenarios that used to work (by accident) - when we have a subquery using Distinct or GroupBy that doesn't happen to have any duplicates.

Fix is to enable those scenarios (and others) by modifying identifier columns in case of distinct and group by, if the original identifiers are not already present. In case of distinct, the entire projection becomes unique identifier, as distinct guarantees it to be unique.
In case of groupby, the grouping key becomes the identifier - since we only support grouping key or group aggregate in the projection, we are also guaranteed to have 1 row per unique grouping key.

Also fix to #24288 - Query: add collection join tries to convert correlated collection from APPLY to JOIN for subqueries with Distinct and GroupBy, which is incorrect

we would always try to convert subquery with groupby and distinct from apply to join, however we can only do this if the projection already contains the join key. Otherwise, adding the join key to the projection would change the meaning of operation in case of distinct and create invalid query in case of group by (projecting column that is not part of grouping key or aggregate).

Fixes #22049
Fixes #24288
maumar added a commit that referenced this issue Feb 27, 2021
…s when projecting a subset of column and adding distinct

As fix to #15873 we started blocking some scenarios that used to work (by accident) - when we have a subquery using Distinct or GroupBy that doesn't happen to have any duplicates.

Fix is to enable those scenarios (and others) by modifying identifier columns in case of distinct and group by, if the original identifiers are not already present. In case of distinct, the entire projection becomes unique identifier, as distinct guarantees it to be unique.
In case of groupby, the grouping key becomes the identifier - since we only support grouping key or group aggregate in the projection, we are also guaranteed to have 1 row per unique grouping key.

Also fix to #24288 - Query: add collection join tries to convert correlated collection from APPLY to JOIN for subqueries with Distinct and GroupBy, which is incorrect

we would always try to convert subquery with groupby and distinct from apply to join, however we can only do this if the projection already contains the join key. Otherwise, adding the join key to the projection would change the meaning of operation in case of distinct and create invalid query in case of group by (projecting column that is not part of grouping key or aggregate).

Fixes #22049
Fixes #24288
maumar added a commit that referenced this issue Mar 2, 2021
…s when projecting a subset of column and adding distinct

As fix to #15873 we started blocking some scenarios that used to work (by accident) - when we have a subquery using Distinct or GroupBy that doesn't happen to have any duplicates.

Fix is to enable those scenarios (and others) by modifying identifier columns in case of distinct and group by, if the original identifiers are not already present. In case of distinct, the entire projection becomes unique identifier, as distinct guarantees it to be unique.
In case of groupby, the grouping key becomes the identifier - since we only support grouping key or group aggregate in the projection, we are also guaranteed to have 1 row per unique grouping key.

Also fix to #24288 - Query: add collection join tries to convert correlated collection from APPLY to JOIN for subqueries with Distinct and GroupBy, which is incorrect

we would always try to convert subquery with groupby and distinct from apply to join, however we can only do this if the projection already contains the join key. Otherwise, adding the join key to the projection would change the meaning of operation in case of distinct and create invalid query in case of group by (projecting column that is not part of grouping key or aggregate).

Fixes #22049
Fixes #24288
maumar added a commit that referenced this issue Mar 2, 2021
…s when projecting a subset of column and adding distinct

As fix to #15873 we started blocking some scenarios that used to work (by accident) - when we have a subquery using Distinct or GroupBy that doesn't happen to have any duplicates.

Fix is to enable those scenarios (and others) by modifying identifier columns in case of distinct and group by, if the original identifiers are not already present. In case of distinct, the entire projection becomes unique identifier, as distinct guarantees it to be unique.
In case of groupby, the grouping key becomes the identifier - since we only support grouping key or group aggregate in the projection, we are also guaranteed to have 1 row per unique grouping key.

Also fix to #24288 - Query: add collection join tries to convert correlated collection from APPLY to JOIN for subqueries with Distinct and GroupBy, which is incorrect

we would always try to convert subquery with groupby and distinct from apply to join, however we can only do this if the projection already contains the join key. Otherwise, adding the join key to the projection would change the meaning of operation in case of distinct and create invalid query in case of group by (projecting column that is not part of grouping key or aggregate).

Fixes #22049
Fixes #24288
maumar added a commit that referenced this issue Mar 2, 2021
…s when projecting a subset of column and adding distinct

As fix to #15873 we started blocking some scenarios that used to work (by accident) - when we have a subquery using Distinct or GroupBy that doesn't happen to have any duplicates.

Fix is to enable those scenarios (and others) by modifying identifier columns in case of distinct and group by, if the original identifiers are not already present. In case of distinct, the entire projection becomes unique identifier, as distinct guarantees it to be unique.
In case of groupby, the grouping key becomes the identifier - since we only support grouping key or group aggregate in the projection, we are also guaranteed to have 1 row per unique grouping key.

Also fix to #24288 - Query: add collection join tries to convert correlated collection from APPLY to JOIN for subqueries with Distinct and GroupBy, which is incorrect

we would always try to convert subquery with groupby and distinct from apply to join, however we can only do this if the projection already contains the join key. Otherwise, adding the join key to the projection would change the meaning of operation in case of distinct and create invalid query in case of group by (projecting column that is not part of grouping key or aggregate).

Fixes #22049
Fixes #24288
maumar added a commit that referenced this issue Mar 2, 2021
…s when projecting a subset of column and adding distinct

As fix to #15873 we started blocking some scenarios that used to work (by accident) - when we have a subquery using Distinct or GroupBy that doesn't happen to have any duplicates.

Fix is to enable those scenarios (and others) by modifying identifier columns in case of distinct and group by, if the original identifiers are not already present. In case of distinct, the entire projection becomes unique identifier, as distinct guarantees it to be unique.
In case of groupby, the grouping key becomes the identifier - since we only support grouping key or group aggregate in the projection, we are also guaranteed to have 1 row per unique grouping key.

Also fix to #24288 - Query: add collection join tries to convert correlated collection from APPLY to JOIN for subqueries with Distinct and GroupBy, which is incorrect

we would always try to convert subquery with groupby and distinct from apply to join, however we can only do this if the projection already contains the join key. Otherwise, adding the join key to the projection would change the meaning of operation in case of distinct and create invalid query in case of group by (projecting column that is not part of grouping key or aggregate).

Fixes #22049
Fixes #24288
maumar added a commit that referenced this issue Mar 2, 2021
…s when projecting a subset of column and adding distinct

As fix to #15873 we started blocking some scenarios that used to work (by accident) - when we have a subquery using Distinct or GroupBy that doesn't happen to have any duplicates.

Fix is to enable those scenarios (and others) by modifying identifier columns in case of distinct and group by, if the original identifiers are not already present. In case of distinct, the entire projection becomes unique identifier, as distinct guarantees it to be unique.
In case of groupby, the grouping key becomes the identifier - since we only support grouping key or group aggregate in the projection, we are also guaranteed to have 1 row per unique grouping key.

Also fix to #24288 - Query: add collection join tries to convert correlated collection from APPLY to JOIN for subqueries with Distinct and GroupBy, which is incorrect

we would always try to convert subquery with groupby and distinct from apply to join, however we can only do this if the projection already contains the join key. Otherwise, adding the join key to the projection would change the meaning of operation in case of distinct and create invalid query in case of group by (projecting column that is not part of grouping key or aggregate).

Fixes #22049
Fixes #24288
maumar added a commit that referenced this issue Mar 2, 2021
…s when projecting a subset of column and adding distinct

As fix to #15873 we started blocking some scenarios that used to work (by accident) - when we have a subquery using Distinct or GroupBy that doesn't happen to have any duplicates.

Fix is to enable those scenarios (and others) by modifying identifier columns in case of distinct and group by, if the original identifiers are not already present. In case of distinct, the entire projection becomes unique identifier, as distinct guarantees it to be unique.
In case of groupby, the grouping key becomes the identifier - since we only support grouping key or group aggregate in the projection, we are also guaranteed to have 1 row per unique grouping key.

Also fix to #24288 - Query: add collection join tries to convert correlated collection from APPLY to JOIN for subqueries with Distinct and GroupBy, which is incorrect

we would always try to convert subquery with groupby and distinct from apply to join, however we can only do this if the projection already contains the join key. Otherwise, adding the join key to the projection would change the meaning of operation in case of distinct and create invalid query in case of group by (projecting column that is not part of grouping key or aggregate).

Fixes #22049
Fixes #24288
maumar added a commit that referenced this issue Mar 2, 2021
…s when projecting a subset of column and adding distinct

As fix to #15873 we started blocking some scenarios that used to work (by accident) - when we have a subquery using Distinct or GroupBy that doesn't happen to have any duplicates.

Fix is to enable those scenarios (and others) by modifying identifier columns in case of distinct and group by, if the original identifiers are not already present. In case of distinct, the entire projection becomes unique identifier, as distinct guarantees it to be unique.
In case of groupby, the grouping key becomes the identifier - since we only support grouping key or group aggregate in the projection, we are also guaranteed to have 1 row per unique grouping key.

Also fix to #24288 - Query: add collection join tries to convert correlated collection from APPLY to JOIN for subqueries with Distinct and GroupBy, which is incorrect

we would always try to convert subquery with groupby and distinct from apply to join, however we can only do this if the projection already contains the join key. Otherwise, adding the join key to the projection would change the meaning of operation in case of distinct and create invalid query in case of group by (projecting column that is not part of grouping key or aggregate).

Fixes #22049
Fixes #24288
maumar added a commit that referenced this issue Mar 5, 2021
…s when projecting a subset of column and adding distinct

As fix to #15873 we started blocking some scenarios that used to work (by accident) - when we have a subquery using Distinct or GroupBy that doesn't happen to have any duplicates.

Fix is to enable those scenarios (and others) by modifying identifier columns in case of distinct and group by, if the original identifiers are not already present. In case of distinct, the entire projection becomes unique identifier, as distinct guarantees it to be unique.
In case of groupby, the grouping key becomes the identifier - since we only support grouping key or group aggregate in the projection, we are also guaranteed to have 1 row per unique grouping key.

Also fix to #24288 - Query: add collection join tries to convert correlated collection from APPLY to JOIN for subqueries with Distinct and GroupBy, which is incorrect

we would always try to convert subquery with groupby and distinct from apply to join, however we can only do this if the projection already contains the join key. Otherwise, adding the join key to the projection would change the meaning of operation in case of distinct and create invalid query in case of group by (projecting column that is not part of grouping key or aggregate).

Fixes #22049
Fixes #24288
maumar added a commit that referenced this issue Mar 5, 2021
…s when projecting a subset of column and adding distinct

As fix to #15873 we started blocking some scenarios that used to work (by accident) - when we have a subquery using Distinct or GroupBy that doesn't happen to have any duplicates.

Fix is to enable those scenarios (and others) by modifying identifier columns in case of distinct and group by, if the original identifiers are not already present. In case of distinct, the entire projection becomes unique identifier, as distinct guarantees it to be unique.
In case of groupby, the grouping key becomes the identifier - since we only support grouping key or group aggregate in the projection, we are also guaranteed to have 1 row per unique grouping key.

Also fix to #24288 - Query: add collection join tries to convert correlated collection from APPLY to JOIN for subqueries with Distinct and GroupBy, which is incorrect

we would always try to convert subquery with groupby and distinct from apply to join, however we can only do this if the projection already contains the join key. Otherwise, adding the join key to the projection would change the meaning of operation in case of distinct and create invalid query in case of group by (projecting column that is not part of grouping key or aggregate).

Fixes #22049
Fixes #24288
maumar added a commit that referenced this issue Mar 5, 2021
…s when projecting a subset of column and adding distinct

As fix to #15873 we started blocking some scenarios that used to work (by accident) - when we have a subquery using Distinct or GroupBy that doesn't happen to have any duplicates.

Fix is to enable those scenarios (and others) by modifying identifier columns in case of distinct and group by, if the original identifiers are not already present. In case of distinct, the entire projection becomes unique identifier, as distinct guarantees it to be unique.
In case of groupby, the grouping key becomes the identifier - since we only support grouping key or group aggregate in the projection, we are also guaranteed to have 1 row per unique grouping key.

Also fix to #24288 - Query: add collection join tries to convert correlated collection from APPLY to JOIN for subqueries with Distinct and GroupBy, which is incorrect

we would always try to convert subquery with groupby and distinct from apply to join, however we can only do this if the projection already contains the join key. Otherwise, adding the join key to the projection would change the meaning of operation in case of distinct and create invalid query in case of group by (projecting column that is not part of grouping key or aggregate).

Fixes #22049
Fixes #24288
maumar added a commit that referenced this issue Mar 5, 2021
…s when projecting a subset of column and adding distinct

As fix to #15873 we started blocking some scenarios that used to work (by accident) - when we have a subquery using Distinct or GroupBy that doesn't happen to have any duplicates.

Fix is to enable those scenarios (and others) by modifying identifier columns in case of distinct and group by, if the original identifiers are not already present. In case of distinct, the entire projection becomes unique identifier, as distinct guarantees it to be unique.
In case of groupby, the grouping key becomes the identifier - since we only support grouping key or group aggregate in the projection, we are also guaranteed to have 1 row per unique grouping key.

Also fix to #24288 - Query: add collection join tries to convert correlated collection from APPLY to JOIN for subqueries with Distinct and GroupBy, which is incorrect

we would always try to convert subquery with groupby and distinct from apply to join, however we can only do this if the projection already contains the join key. Otherwise, adding the join key to the projection would change the meaning of operation in case of distinct and create invalid query in case of group by (projecting column that is not part of grouping key or aggregate).

Fixes #22049
Fixes #24288
maumar added a commit that referenced this issue Mar 6, 2021
…s when projecting a subset of column and adding distinct

As fix to #15873 we started blocking some scenarios that used to work (by accident) - when we have a subquery using Distinct or GroupBy that doesn't happen to have any duplicates.

Fix is to enable those scenarios (and others) by modifying identifier columns in case of distinct and group by, if the original identifiers are not already present. In case of distinct, the entire projection becomes unique identifier, as distinct guarantees it to be unique.
In case of groupby, the grouping key becomes the identifier - since we only support grouping key or group aggregate in the projection, we are also guaranteed to have 1 row per unique grouping key.

Also fix to #24288 - Query: add collection join tries to convert correlated collection from APPLY to JOIN for subqueries with Distinct and GroupBy, which is incorrect

we would always try to convert subquery with groupby and distinct from apply to join, however we can only do this if the projection already contains the join key. Otherwise, adding the join key to the projection would change the meaning of operation in case of distinct and create invalid query in case of group by (projecting column that is not part of grouping key or aggregate).

Fixes #22049
Fixes #24288
maumar added a commit that referenced this issue Mar 9, 2021
…s when projecting a subset of column and adding distinct

As fix to #15873 we started blocking some scenarios that used to work (by accident) - when we have a subquery using Distinct or GroupBy that doesn't happen to have any duplicates.

Fix is to enable those scenarios (and others) by modifying identifier columns in case of distinct and group by, if the original identifiers are not already present. In case of distinct, the entire projection becomes unique identifier, as distinct guarantees it to be unique.
In case of groupby, the grouping key becomes the identifier - since we only support grouping key or group aggregate in the projection, we are also guaranteed to have 1 row per unique grouping key.

Also fix to #24288 - Query: add collection join tries to convert correlated collection from APPLY to JOIN for subqueries with Distinct and GroupBy, which is incorrect

we would always try to convert subquery with groupby and distinct from apply to join, however we can only do this if the projection already contains the join key. Otherwise, adding the join key to the projection would change the meaning of operation in case of distinct and create invalid query in case of group by (projecting column that is not part of grouping key or aggregate).

Fixes #22049
Fixes #24288
maumar added a commit that referenced this issue Mar 9, 2021
…s when projecting a subset of column and adding distinct

As fix to #15873 we started blocking some scenarios that used to work (by accident) - when we have a subquery using Distinct or GroupBy that doesn't happen to have any duplicates.

Fix is to enable those scenarios (and others) by modifying identifier columns in case of distinct and group by, if the original identifiers are not already present. In case of distinct, the entire projection becomes unique identifier, as distinct guarantees it to be unique.
In case of groupby, the grouping key becomes the identifier - since we only support grouping key or group aggregate in the projection, we are also guaranteed to have 1 row per unique grouping key.

Also fix to #24288 - Query: add collection join tries to convert correlated collection from APPLY to JOIN for subqueries with Distinct and GroupBy, which is incorrect

we would always try to convert subquery with groupby and distinct from apply to join, however we can only do this if the projection already contains the join key. Otherwise, adding the join key to the projection would change the meaning of operation in case of distinct and create invalid query in case of group by (projecting column that is not part of grouping key or aggregate).

Fixes #22049
Fixes #24288
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-query breaking-change closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. punted-for-3.0 type-enhancement
Projects
None yet
3 participants