New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PHOENIX-7014: Query compiler/optimizer changes along with some PHOENIX-7015 changes #1766
PHOENIX-7014: Query compiler/optimizer changes along with some PHOENIX-7015 changes #1766
Conversation
… debug code and tests
…of the function definition
… type of ProjectedColumnExpression instead of KeyValueColumnExpression
…ompletely. Make index as the physical table for CDC
private Map<ImmutableBytesPtr, String> dataColQualNameMap; | ||
private Map<ImmutableBytesPtr, PDataType> dataColQualTypeMap; | ||
// Map<dataRowKey: Map<TS: Map<qualifier: Cell>>> | ||
private Map<ImmutableBytesPtr, Map<Long, Map<ImmutableBytesPtr, Cell>>> dataRowChanges = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is pretty complicated. Maybe you could introduce some abstraction which represents a change as a object. Then you could have a collection of changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was meant to be a quick PoC sort of code and should go away as @TheNamesRai is already reworking the logic to build the row without having to use this intermediate data structure.
conn.createStatement().execute("UPSERT INTO " + tableName + " (k, v1) VALUES (1, 101)"); | ||
conn.commit(); | ||
String cdcName = generateUniqueName(); | ||
String cdc_sql = "CREATE CDC " + cdcName |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Everytime we run CREATE CDC
do we need to pass PHOENIX_ROW_TIMESTAMP
? If that is something mandatory we shouldn't make it part of the interface and the system should do it implicitly under the hood.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also it seems the uncovered index that is being created as part of CREATE CDC is being created synchronously. That will only work for small tables. For bigger tables the index needs to be created async and then built explicitly using IndexTool.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Everytime we run CREATE CDC do we need to pass PHOENIX_ROW_TIMESTAMP ? If that is something mandatory we shouldn't make it part of the interface and the system should do it implicitly under the hood.
The feature allows any timestamp like column to be used instead of the PHOENIX_ROW_TIMESTMAP that is why this flexibility is being allowed, however I am not 100% sure about the use cases, I will have an offline discussion on this, thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For bigger tables the index needs to be created async and then built explicitly using IndexTool.
If we do this, won't we also need to project the index status via CDC object, since we intend to keep the index hidden as an implementation detail? Do you think we should have an offline mode for CREATE CDC itself?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we do this, won't we also need to project the index status via CDC object
I think that can be later improvement but supporting creation of async index is necessary because creating sync index for large table will anyways be too expensive or not possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we do this, won't we also need to project the index status via CDC object
@haridsv I didn't quite understand what you mean by index status here ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't quite understand what you mean by index status here ?
I was referring to the async index creation lifecycle, which should be attributed to the CDC so that user can query and know when it is done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tkhurana In any case, I am not planning to address these as part of this PR, so are you OK to merge this as is?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, we can tackle that in a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tkhurana if you are good with the current state with any additional comments to be taken care of in subsequent PRs against feature branch (or final patch against master branch), we can merge this. |
With follow-ups to come in separate PRs, merging this one. |
7420443
into
apache:PHOENIX-7001-feature
PR #1794 is the follow up that implements the query functionality to the spec. |
Address the changes in the CREATE CDC spec in PHOENIX-7001 Also includes some review feedback changes across the changes in PRs apache#1681, apache#1703 and apache#1766
Address the changes in the CREATE CDC spec in PHOENIX-7001 Also includes some review feedback changes across the changes in PRs apache#1681, apache#1703 and apache#1766
Address the changes in the CREATE CDC spec in PHOENIX-7001 Also includes some review feedback changes across the changes in PRs apache#1681, apache#1703 and apache#1766
This commit includes a squash of all the below changes from the PRs apache#1662, apache#1694, down by removing non-functional changes for the ease of review. On top of it, the changes have been reworked for the changed state of master, especially the new submodules. * 4c9827a Hari.. PHOENIX-7008 Shallow grammar support for CREATE CDC (apache#1662) * e5220e0 TheN.. PHOENIX-7054 Shallow grammar support for DROP CDC and ALTER CDC (apache#1694) * 581e613 Hari.. PHOENIX-7008 Implementation for CREATE CDC (apache#1681) * e2ef886 Hari.. PHOENIX-7008 Tweaks, fixes and additional test coverage for CREATE CDC (apache#1703) * 5d3fd40 TheN.. PHOENIX-7074 DROP CDC Implementation (apache#1713) * 7420443 Hari.. PHOENIX-7014: Query compiler/optimizer changes along with some PHOENIX-7015 changes (apache#1766) * da6ddad Kadi.. Add an extra delete mutation for CDC * 93d586e Kadi.. Add an extra delete mutation during rebuild for CDC index * f07898f Hari.. PHOENIX-7008: Addressing Jira spec and review feedback changes (apache#1802) * 87a2ea1 Hari.. PHOENIX-7008: Fix for parser gap and fix for failing test (apache#1812) * e395780 TheN.. PHOENIX-7015 Implementing CDCGlobalIndexRegionScanner (apache#1813) Co-authored-by: Saurabh Rai <saurabh.rai@salesforce.com>
…ture This commit includes a squash of all the below changes from the PRs apache#1662, apache#1694, down by removing non-functional changes for the ease of review. On top of it, the changes have been reworked for the changed state of master, especially the new submodules. * 4c9827a Hari.. PHOENIX-7008 Shallow grammar support for CREATE CDC (apache#1662) * e5220e0 TheN.. PHOENIX-7054 Shallow grammar support for DROP CDC and ALTER CDC (apache#1694) * 581e613 Hari.. PHOENIX-7008 Implementation for CREATE CDC (apache#1681) * e2ef886 Hari.. PHOENIX-7008 Tweaks, fixes and additional test coverage for CREATE CDC (apache#1703) * 5d3fd40 TheN.. PHOENIX-7074 DROP CDC Implementation (apache#1713) * 7420443 Hari.. PHOENIX-7014: Query compiler/optimizer changes along with some PHOENIX-7015 changes (apache#1766) * da6ddad Kadi.. Add an extra delete mutation for CDC * 93d586e Kadi.. Add an extra delete mutation during rebuild for CDC index * f07898f Hari.. PHOENIX-7008: Addressing Jira spec and review feedback changes (apache#1802) * 87a2ea1 Hari.. PHOENIX-7008: Fix for parser gap and fix for failing test (apache#1812) * e395780 TheN.. PHOENIX-7015 Implementing CDCGlobalIndexRegionScanner (apache#1813) Co-authored-by: Saurabh Rai <saurabh.rai@salesforce.com>
This PR replaces #1738 to include some experimental scanner changes for PHOENIX-7015.
This has the changes needed for PHOENIX-7013 and PHOENIX-7014. It short-circuits the existing optimizer logic to produce a plan that is suitable for CDC queries. It also reimplements the query hints so that it is simpler and easier to maintain using regex. It includes a PoC query test and some debug test cases for short term reference (can be removed before merging to master).
This PR also goes one step further to generating a mock JSON string (PoC in #1738) to be able to generate results from the actual data (though, not per the expected format). I temporary added Gson as a dependency to generate the result, but this will be replaced with BSON once the JSON support is available. Here is a quick demo of how this works: