Support WHERE clauses in custom queries in incremental mode #112

Open
shikhar opened this Issue Aug 22, 2016 · 9 comments

6 participants
@shikhar
Contributor

shikhar commented Aug 22, 2016

Currently TimestampIncrementingTableQuerier appends a WHERE clause, which produces invalid SQL if the query already contains one.

One approach could be to detect whether there is already a WHERE, but that seems error-prone.

I propose we allow the query to contain placeholder text like "${incrementalClause}". If it is present, it gets replaced with the timestamp/ID conditions. If it is absent, we append a "WHERE .. conditions .. " as we do currently.
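
A minimal sketch of the placeholder idea, for illustration only. It is not the connector's code: the class and method names are hypothetical, and it models only the predicate, leaving out anything else the querier appends.

```java
// Hypothetical sketch of the proposed placeholder handling; not the connector's code.
final class IncrementalClauseSketch {
    // Placeholder text taken from the proposal above.
    static final String PLACEHOLDER = "${incrementalClause}";

    /**
     * Applies the incremental conditions to a custom query.
     * `conditions` is the bare predicate (for example "id > ?"), without a leading WHERE.
     */
    static String apply(String query, String conditions) {
        if (query.contains(PLACEHOLDER)) {
            // The query author decides where the conditions go.
            // String.replace does a literal (non-regex) replacement.
            return query.replace(PLACEHOLDER, conditions);
        }
        // No placeholder: fall back to appending a WHERE clause.
        return query + " WHERE " + conditions;
    }

    public static void main(String[] args) {
        System.out.println(apply(
            "SELECT * FROM orders WHERE status = 'OPEN' AND ${incrementalClause}",
            "id > ?"));
        System.out.println(apply("SELECT * FROM orders", "id > ?"));
    }
}
```

Because the placeholder is opt-in, queries that work today would keep their current behaviour under this scheme.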

@ewencp

Member

ewencp commented Aug 23, 2016

Yeah, this seems like a reasonable solution if we go with string interpolation more generally across the framework/connectors. I had avoided handling this so far because I didn't really want to try to robustly handle the SQL syntax/parsing.

@teq0

teq0 commented Dec 20, 2016

I'd be very keen for this to go ahead, it would open up a lot of possibilities that just aren't possible right now. Can I suggest that maybe a few variables could be exposed, not just the ${incrementalClause}, which would only work for straight queries. Exposing, say, ${currentIncrementValue} and ${currentTimestampValue} would allow them to be passed to stored procedures as well.
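
One way the extra variables could work, sketched under assumptions: each named variable is rewritten to a JDBC `?` marker and the binding order is recorded, so the values can be bound with a prepared or callable statement. The class name, the `Prepared` holder, and the `fetch_changes` procedure are hypothetical; only the two variable names come from the comment above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: turn named template variables into positional JDBC markers.
final class TemplateVariablesSketch {
    // Matches only the two variable names suggested in the comment above.
    private static final Pattern VARIABLE =
        Pattern.compile("\\$\\{(currentIncrementValue|currentTimestampValue)\\}");

    /** SQL with '?' markers, plus the variable names in the order they must be bound. */
    static final class Prepared {
        final String sql;
        final List<String> bindOrder;

        Prepared(String sql, List<String> bindOrder) {
            this.sql = sql;
            this.bindOrder = bindOrder;
        }
    }

    static Prepared prepare(String template) {
        Matcher m = VARIABLE.matcher(template);
        StringBuffer sql = new StringBuffer();
        List<String> order = new ArrayList<>();
        while (m.find()) {
            order.add(m.group(1));        // remember which value goes in this position
            m.appendReplacement(sql, "?"); // replace the variable with a JDBC marker
        }
        m.appendTail(sql);
        return new Prepared(sql.toString(), order);
    }

    public static void main(String[] args) {
        // fetch_changes is a made-up stored procedure name.
        Prepared p = prepare(
            "{call fetch_changes(${currentTimestampValue}, ${currentIncrementValue})}");
        System.out.println(p.sql);
        System.out.println(p.bindOrder);
    }
}
```

Binding the values as parameters, instead of pasting them into the SQL text, avoids quoting and formatting problems with timestamps.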

@yangfeiran

yangfeiran commented Feb 13, 2017

#191, this is my pull request for the change I made to solve your problem.
You have to use lower case "where" if you use my change.

@teq0

teq0 commented Feb 15, 2017

Could you please make it case-insensitive? It only gets called once for any task, the overhead shouldn't matter.

This doesn't solve the stored procedure problem though, that requires using the templated variables discussed above. I'll have a go at it myself when I get a minute.
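
For reference, a case-insensitive check could look like the sketch below. This is not the code from #191; the class and method names are hypothetical, and the approach has the weaknesses mentioned at the top of the thread.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of case-insensitive WHERE detection; not the code from #191.
final class WhereDetectionSketch {
    // \b keeps identifiers such as "nowhere" from matching.
    private static final Pattern WHERE =
        Pattern.compile("\\bwhere\\b", Pattern.CASE_INSENSITIVE);

    static boolean hasWhere(String query) {
        return WHERE.matcher(query).find();
    }

    /** Appends the conditions with AND if a WHERE keyword is found, else with WHERE. */
    static String appendConditions(String query, String conditions) {
        String keyword = hasWhere(query) ? " AND " : " WHERE ";
        return query + keyword + conditions;
    }

    public static void main(String[] args) {
        System.out.println(appendConditions("SELECT * FROM t Where a = 1", "id > ?"));
        System.out.println(appendConditions("SELECT * FROM t", "id > ?"));
    }
}
```

The check is still fooled by a "where" inside a string literal or a subquery, and appending with AND changes the meaning of an existing condition that contains an unparenthesized OR.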

@miketzian

miketzian commented Mar 11, 2017

Another way would be to add a 'query.condition' config parameter, which would replace the normal where clause generated by TimestampIncrementingTableQuerier, (in this case for a stored proc):

query.condition=@from_ts = ?, @to_ts = ?

The developer could then write whatever where clause they need based on their use of incrementing and/or timestamp columns.
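
A rough sketch of how such a setting might be consumed, assuming the configured condition simply replaces the generated clause. `query.condition` is the key proposed above, not an existing option; the class name, the method, and the `get_changes` procedure are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: prefer a user-supplied condition over the generated WHERE clause.
final class QueryConditionSketch {
    static String buildStatement(Map<String, String> config, String generatedWhere) {
        String query = config.get("query");
        // Proposed key from the comment above; not an existing connector option.
        String custom = config.get("query.condition");
        String suffix = (custom != null) ? custom : generatedWhere;
        return query + " " + suffix;
    }

    public static void main(String[] args) {
        Map<String, String> config = new HashMap<>();
        config.put("query", "EXEC get_changes"); // made-up stored procedure
        config.put("query.condition", "@from_ts = ?, @to_ts = ?");
        System.out.println(buildStatement(config, "WHERE ts > ?"));
    }
}
```

The connector would still need to know how many `?` markers the condition contains and which values to bind to them.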

yangfeiran added a commit to yangfeiran/kafka-connect-jdbc that referenced this issue Jul 25, 2017

fix "Support WHERE clauses in custom queries in incremental mode #112"
…#191

This pull request can easily solve this issue. It allows people to add a "where" condition in the query part of a jdbc sink, so that they can use any kind of offset for their jdbc sink instead of querying data from the beginning of a table or view.

yangfeiran added a commit to yangfeiran/kafka-connect-jdbc that referenced this issue Jul 26, 2017

fix "Support WHERE clauses in custom queries in incremental mode #112"

This pull request can easily solve this issue. It allows people to add a "where" condition in the query part of a jdbc sink, so that they can use any kind of offset for their jdbc sink instead of querying data from the beginning of a table or view.

@kgeis

kgeis commented Jul 27, 2017

There are (limited) workarounds for this, depending on the level of SQL support in the underlying database. Examples:

SELECT * FROM (SELECT... WHERE...)

WITH a AS (
    SELECT * FROM b
    WHERE ...
)
SELECT * FROM a

@yangfeiran

yangfeiran commented Jul 28, 2017

You are absolutely right, but the right syntax is "SELECT * FROM (SELECT * FROM table WHERE ...) as a"

@kgeis

kgeis commented Jul 28, 2017

@yangfeiran, either I'm absolutely right or I'm not. No buts! :)

The alias is necessary on some databases (like PostgreSQL) but not on others (like Oracle).
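
To make the workaround concrete, here is a small sketch of the SQL that results when the custom query is wrapped in a derived table and a WHERE clause is appended afterwards. The class and method names are hypothetical. The alias is written without the AS keyword on the assumption that this form is accepted by both PostgreSQL and Oracle, since Oracle rejects AS in front of a table alias.

```java
// Hypothetical sketch of the subquery workaround; names are illustrative only.
final class SubqueryWrapSketch {
    /** Wraps a query as a derived table. The alias has no AS keyword for portability. */
    static String wrap(String query, String alias) {
        return "SELECT * FROM (" + query + ") " + alias;
    }

    /** Shows the statement produced once a WHERE clause is appended to the wrapped query. */
    static String withIncrementalClause(String query, String conditions) {
        return wrap(query, "a") + " WHERE " + conditions;
    }

    public static void main(String[] args) {
        System.out.println(
            withIncrementalClause("SELECT * FROM b WHERE x = 1", "id > ?"));
    }
}
```

The inner WHERE stays inside the parentheses, so the appended clause applies to the outer SELECT and the statement remains valid.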

@yangfeiran

yangfeiran commented Jul 28, 2017

@kgeis cool
