Skip to content

Set explain attributes after the query is prepared#14490

Merged
zachjsh merged 6 commits intoapache:masterfrom
abhishekrb19:add_withas_support
Jul 6, 2023
Merged

Set explain attributes after the query is prepared#14490
zachjsh merged 6 commits intoapache:masterfrom
abhishekrb19:add_withas_support

Conversation

@abhishekrb19
Copy link
Contributor

@abhishekrb19 abhishekrb19 commented Jun 27, 2023

In this PR:

  • Set the explain plan attributes after the query is prepared, when we've the finalized output names in rootQueryRel. This also gives us the benefit of going through all validation checks in the prepare phase.
  • Simplify resolveClusteredByColumnsToOutputColumns function for explain attributes as things are already validated and we just have one source list, rootQueryRel, containing the output column names; the existing unit tests work right out of the box. Added new unit tests using JOIN and WITH...AS to prove that the logic no longer depends on individual SqlNode types.
  • Add a negative ordinal check for clustered by columns as a workaround to Calcite's limitation; added unit tests to validate this.
Key changed/added classes in this PR
  • DruidSqlParserUtils.java
  • QueryHandler.java
  • DruidSqlParserUtilsTest.java
  • CalciteInsertDmlTest.java

This PR has:

  • been self-reviewed.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • been tested in a test Druid cluster.

@gianm
Copy link
Contributor

gianm commented Jun 28, 2023

This change would make the "explain clustered by" feature work in more cases, but it definitely still won't work in all cases, and IMO we don't want to make the entire EXPLAIN statement blow up if we can't figure out the clustered by columns. So, I think we should go in one of two directions for this work:

  1. Figure out a way to make the "explain clustered by" feature always work, which might potentially involve moving this logic somewhere else, or refactoring something.
  2. Embrace that it might not always work, and have that not fail the EXPLAIN, but instead return some kind of signifier in the response that we can't figure out the clustered-by columns. Callers would need to look at that to be able to differentiate "no clustered by" from "can't figure out clustered by".

- Set the explain plan attributes after the query is prepared when
the query is planned and we've the finalized output names in the root
source rel node.
- Adjust tests; add unit test for negative ordinal case.
- Remove the exception / error handling logic from resolveClusteredBy
function since the validations now happen before it comes to the function
@abhishekrb19
Copy link
Contributor Author

@gianm, thanks for the review. Yes, I went ahead with 1 -- moving this logic to after the query is prepared enables us to extract the necessary output columns from rootQueryRel always - instead of relying on the different SqlNode types in the pre-prepare phase. Another benefit of doing it after the query preparation phase is that it already goes through the validation checks. IMO, the code looks simpler.

@abhishekrb19 abhishekrb19 changed the title Add support for WITH clause in explain plan clustered by columns Set explain attributes after the query is prepared Jun 30, 2023
@zachjsh zachjsh self-requested a review July 5, 2023 23:21
Copy link
Contributor

@zachjsh zachjsh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

@zachjsh zachjsh merged commit d02bb8b into apache:master Jul 6, 2023
@abhishekrb19 abhishekrb19 deleted the add_withas_support branch July 18, 2023 16:35
@abhishekagarwal87 abhishekagarwal87 added this to the 27.0 milestone Jul 19, 2023
sergioferragut pushed a commit to sergioferragut/druid that referenced this pull request Jul 21, 2023
* Add support for DML WITH AS.

* One more UT for with as subquery.

* Add a test with join query

* Use root query prepared node instead of individual SqlNode types.

- Set the explain plan attributes after the query is prepared when
the query is planned and we've the finalized output names in the root
source rel node.
- Adjust tests; add unit test for negative ordinal case.
- Remove the exception / error handling logic from resolveClusteredBy
function since the validations now happen before it comes to the function

* Update comment.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants