[BEAM-6861] Add support to specify a query in CassandraIO #8090
@iemejia - this is a very simple PR, would you have time to check it? Or can you recommend someone who would have time?
Mmm, I hadn't seen how this has evolved. It is not good to keep adding more options and methods to the main API for something that in the end only builds a query. It may be worth checking whether we can generalize this to allow passing the full query, akin to how JdbcIO does withQuery(String query), and refactor the existing code to use it. We could also provide an external builder class to build the query (we are doing this in a recent PR on MongoDbIO), but that last part (the helper builder) is probably worth a separate PR.
@@ -483,6 +508,11 @@ private CassandraIO() {}
.collect(Collectors.joining(","));

List<BoundedSource<T>> sources = new ArrayList<>();
String selectFields = "*";
if (spec.selectFields() != null) {
selectFields = String.join(",", spec.selectFields().get());
Can you please ensure that this produces a valid string, e.g. "*", if selectFields is an empty List? I know this won't happen because of the precondition, but it is better to make the code more robust.
Will do.
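A minimal sketch of the fallback requested above, assuming the projection arrives as a plain List<String>. The class and method names here are hypothetical and are not part of CassandraIO; the real code reads the list from spec.selectFields().get().

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of CassandraIO: builds the projection part of a SELECT.
final class SelectFieldsSketch {

  /** Returns a comma-separated projection, or "*" when no fields were requested. */
  static String buildSelectFields(List<String> fields) {
    // Treat both null and an empty list as "select everything",
    // so the generated query is always syntactically valid.
    if (fields == null || fields.isEmpty()) {
      return "*";
    }
    return String.join(",", fields);
  }

  public static void main(String[] args) {
    System.out.println(buildSelectFields(null));
    System.out.println(buildSelectFields(Collections.emptyList()));
    System.out.println(buildSelectFields(Arrays.asList("id", "writetime(person_name)")));
  }
}
```

With this shape the precondition can stay in place, and the query builder no longer depends on it for correctness.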
for (Scientist sci : input) {
assertNotNull(sci.id);
assertNull(sci.name);
}
What happens if you read nameTs here? Since objects are mapped, I was wondering if it will be zero. It is probably worth asserting that, to validate that the values were not 'filled' (given that they were not projected).
nameTs will be null: it is not returned in the Row, so it is not mapped.
Agree on the assert.
}

@Test
public void testSelectFieldsWithFunction() throws Exception {
Please remove this method; it is almost identical to the previous one. Or is there something I am missing?
It is testing UDFs (writetime(person_name)), but I can merge this into one test
Yes, better to have only one test.
@stankiewicz WDYT about the
@iemejia, thanks for the feedback. The challenge with withQuery in JdbcIO is that it is expected to be a full query, like
The select is generated in Cassandra only to add the ranges, no? I mean, I don't see a huge difference from MongoDbIO, which does something similar. As for the field selection, it could be more user friendly to have the method, but it also adds maintenance work that would be avoided entirely by supporting the full query. We would even support unforeseen cases.
So what do you think about a withQuery like in Sqoop, where the user has to provide a $CONDITIONS tag so the library can insert range filters? I'm happy to rewrite it with that approach, using a full select statement with a placeholder for the internal filters. I will support the old where method as well and mark it as deprecated.
To summarise the change: if the user provides withQuery, I will support the where field; otherwise I will produce the query based on the schema, table name and where field (if it exists).
If the user provides
OK, @iemejia, I've made the change. I still feel it's not perfect with this $CONDITIONS part, but I can't find a more elegant way.
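For readers following the thread, the Sqoop-style placeholder idea can be sketched roughly as below. This is only an illustration under stated assumptions: the class, the method name and the example range filter are hypothetical, and this is not the code that was merged into CassandraIO.

```java
// Hypothetical sketch of a Sqoop-style $CONDITIONS placeholder, not the merged CassandraIO code.
final class ConditionsPlaceholderSketch {

  // The tag the user is expected to put in the WHERE clause of the full query.
  static final String PLACEHOLDER = "$CONDITIONS";

  /** Substitutes the library-generated range filter into the user-supplied query. */
  static String applyRangeFilter(String userQuery, String rangeFilter) {
    if (!userQuery.contains(PLACEHOLDER)) {
      throw new IllegalArgumentException("Query must contain " + PLACEHOLDER);
    }
    // String.replace does literal (non-regex) substitution, so the '$' needs no escaping.
    return userQuery.replace(PLACEHOLDER, "(" + rangeFilter + ")");
  }

  public static void main(String[] args) {
    String query = "SELECT id, writetime(person_name) FROM ks.scientist WHERE $CONDITIONS";
    // Assumed example of a token-range filter that one split might use.
    System.out.println(applyRangeFilter(query, "token(id) > 0 AND token(id) <= 100"));
  }
}
```

The parentheses around the inserted filter keep it from changing the precedence of any other predicates the user wrote alongside the placeholder.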
Sorry for barging in on this. WDYT?
Hi @srfrnk! If we figure out serialisation then we can clean this up a little bit. In the meantime I would try to merge this.
@iemejia, WDYT about the current implementation?
@stankiewicz Sorry for the slow review, I will check ASAP, probably today or Tuesday at the latest. Thanks.
@iemejia, no problem :) Thanks for all the effort!
I am reviewing this. Mostly LGTM, I will probably do the missing minor fixes and merge, will confirm later on. Thanks.
@iemejia - thanks a lot!
@iemejia, Ismael, any chance of closing that before the cut-off, or are we already too late? Thanks, Radek
LGTM. Merging manually to do some final touches.
Thanks a lot @stankiewicz!
CassandraIO.Read currently selects all fields and maps them to a POJO.
With this change it is possible to select only a subset of fields, or to add computed fields, which allows reading things like the write timestamp or the results of other functions.