Skip to content

IGNITE-20164 Sql. Incorrect propagation of RelCollation trait for Sort-based map/reduce aggregates#2461

Merged
korlov42 merged 3 commits intoapache:mainfrom
gridgain:IGNITE-20164
Sep 1, 2023
Merged

IGNITE-20164 Sql. Incorrect propagation of RelCollation trait for Sort-based map/reduce aggregates#2461
korlov42 merged 3 commits intoapache:mainfrom
gridgain:IGNITE-20164

Conversation

@lowka
Copy link
Copy Markdown
Contributor

@lowka lowka commented Aug 17, 2023

Updates grouping set columns on REDUCE phase of a sort aggregate to fix collation propagation.

  • Updates some javadocs describing trait propagation.
  • Migrates code that creates mapping for MAP/REDUCE aggregates to Commons::trimmingMapping.
  • Updates Commons::trimmingMapping to use MappingType.INVERSE_SURJECTION to make usage of Commons::inverseTrimmingMapping redundant because trimmingMapping::getSource returns the same values.

https://issues.apache.org/jira/browse/IGNITE-20164


Thank you for submitting the pull request.

To streamline the review process of the patch and ensure better code quality
we ask both an author and a reviewer to verify the following:

The Review Checklist

  • Formal criteria: TC status, codestyle, mandatory documentation. Also make sure to complete the following:
    - There is a single JIRA ticket related to the pull request.
    - The web-link to the pull request is attached to the JIRA ticket.
    - The JIRA ticket has the Patch Available state.
    - The description of the JIRA ticket explains WHAT was made, WHY and HOW.
    - The pull request title is treated as the final commit message. The following pattern must be used: IGNITE-XXXX Change summary where XXXX - number of JIRA issue.
  • Design: new code conforms with the design principles of the components it is added to.
  • Patch quality: patch cannot be split into smaller pieces, its size must be reasonable.
  • Code quality: code is clean and readable, necessary developer documentation is added if needed.
  • Tests code quality: test set covers positive/negative scenarios, happy/edge cases. Tests are effective in terms of execution time and resources.

Notes

@lowka lowka force-pushed the IGNITE-20164 branch 8 times, most recently from 79208fe to f2285cc Compare August 22, 2023 10:38
* @see org.apache.calcite.rel.core.AggregateCall#transform(TargetMapping)
*/
public static Mappings.TargetMapping trimmingMapping(int sourceSize, ImmutableBitSet requiredElements) {
public static Mapping trimmingMapping(int sourceSize, ImmutableBitSet requiredElements) {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mappings.apply(mapping, ImmutableBitSet) accepts Mapping instead of Mappings.TargetMapping.

* @see #trimmingMapping(int, ImmutableBitSet)
*/
public static Mappings.TargetMapping inverseTrimmingMapping(int sourceSize, ImmutableBitSet requiredElements) {
public static Mapping inverseTrimmingMapping(int sourceSize, ImmutableBitSet requiredElements) {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To have access to getSource in PlanUtils::createHashAggRowType

@lowka lowka force-pushed the IGNITE-20164 branch 5 times, most recently from 5f26578 to 636730b Compare August 22, 2023 14:01
@lowka lowka marked this pull request as ready for review August 22, 2023 14:57
@lowka lowka changed the title [IGNITE-20164] Sql. Incorrect propagation of RelCollation trait for Sort-based map/reduce aggregates. IGNITE-20164 Sql. Incorrect propagation of RelCollation trait for Sort-based map/reduce aggregates. Aug 23, 2023
@lowka lowka changed the title IGNITE-20164 Sql. Incorrect propagation of RelCollation trait for Sort-based map/reduce aggregates. IGNITE-20164 Sql. Incorrect propagation of RelCollation trait for Sort-based map/reduce aggregates Aug 23, 2023
*/
public static Mappings.TargetMapping trimmingMapping(int sourceSize, ImmutableBitSet requiredElements) {
public static Mapping trimmingMapping(int sourceSize, ImmutableBitSet requiredElements) {
Mapping mapping = Mappings.create(MappingType.PARTIAL_FUNCTION, sourceSize, requiredElements.cardinality());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you don't mind, let's change PARTIAL_FUNCTION to INVERSE_SURJECTION. That way we can get rid of inverseTrimmingMapping method, because created mapping start to work both ways

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@MethodSource("provideRules")
public void testAggDistinctGroupSet(String[] rules) {
try {
sql("CREATE TABLE testMe (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER);");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we really need to create new table here? Currently, creation of a new table is a pretty time consuming operation (it takes about 10sec on my laptop). Does it make sense to use any table from initTestData?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved all CREATE TABLE statements in this file to initTestData.

* </pre>
*
* <p>Otherwise this operator is unable to fully satisfy required ordering and we require collation trait based
* columns from the grouping set, and the rest of of the requirements is going to be satisfied by enforcer operator:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* columns from the grouping set, and the rest of of the requirements is going to be satisfied by enforcer operator:
* columns from the grouping set, and the rest of the requirements is going to be satisfied by enforcer operator:

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

* </pre>
*
* @param nodeTraits Required relational node output traits.
* @param inputTraits Required relational node input traits.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that is not required traits but actual traits of the input

methods for pass through and derive have similar signature, but different semantic:

  • for pass through we are saying "hey, we want you to satisfy nodeTraits. Btw, here is your input, do you want to adjust it as well?"
  • for derive we are saying "hey, let's see what your child is offering to you (inputTraits). Btw, it's you (nodeTraits)"

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

agg.getGroupSet(),
agg.getGroupSets(),
outTraits.replace(IgniteDistributions.single()),
convert(map, inTraits.replace(IgniteDistributions.single())),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say, input of reduce phase should have the same traits as its output

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

*
* <p>Distribution hash(0)
*/
CASE_24_1A("SELECT COUNT(val0), COUNT(DISTINCT(val1)) from test", schema(hash()))
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's also check plans for hash distribution: hash(val1)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added test cases for both hash(val0) and hash(val1).

@lowka lowka requested review from AMashenkov and korlov42 August 29, 2023 05:47
lowka added 2 commits August 31, 2023 16:40
…rt-based map/reduce aggregates

Aggregate planner tests: Add test cases for identity distribution.
…rt-based map/reduce aggregates

Fix checkstyle issues.
@lowka lowka requested review from korlov42 and zstan September 1, 2023 08:34
@korlov42 korlov42 merged commit 950a3c0 into apache:main Sep 1, 2023
@korlov42 korlov42 deleted the IGNITE-20164 branch September 1, 2023 11:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants