[Feature][connector-jdbc-base] avoid consume historical data from db to consume latest db updates events #1300

Closed
3 tasks done
baisui1981 opened this issue Oct 11, 2022 · 3 comments · Fixed by #1301
Labels
feature-request this is a feature requests on the product

Comments

@baisui1981
Contributor

baisui1981 commented Oct 11, 2022

Search before asking

  • I had searched in the issues and found no similar feature requirement.

Description

When the Chunjun JDBC source connector starts consuming a DB table, the table sometimes already contains a massive amount of historical data. To skip that historical data today, we have to set the startLocation parameter to a value that corresponds to increColumn, which is extremely troublesome.

To simplify Chunjun task configuration, there could be a flag that tells the source task to skip the historical data in the DB and consume only the latest update events, so that no concrete startLocation value has to be supplied.

In my opinion, startLocation could accept a special token value, for example 'latest', handled in JdbcInputFormat:

https://github.com/DTStack/chunjun/blob/master/chunjun-connectors/chunjun-connector-jdbc-base/src/main/java/com/dtstack/chunjun/connector/jdbc/source/JdbcInputFormat.java#L146-L164

If startLocation is 'latest', call getMaxValueFromDb() to get the current maximum value of increColumn, then use that value to create the JdbcInputSplit. Pseudocode:

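public JdbcInputSplit[] createSplitsInternalBySplitMod(int minNumSplits, String startLocation) {
    JdbcInputSplit[] splits = new JdbcInputSplit[minNumSplits];
    if (StringUtils.isNotBlank(startLocation)) {
        if ("latest".equals(startLocation)) {
            // Skip historical rows: start from the current max value of increColumn.
            String maxValueFromDb = this.getMaxValueFromDb();

            return new JdbcInputSplit[] {
                new JdbcInputSplit(
                        0,
                        minNumSplits,
                        0,
                        maxValueFromDb,
                        null,
                        null,
                        null,
                        "mod",
                        jdbcConf.isPolling())
            };
        }
        // Existing handling of an explicit startLocation is omitted here.
    }
    // Existing handling when startLocation is not set is omitted here.
    return splits;
}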
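
The decision described above (use the configured value, or look up the current maximum when the token is 'latest') can be sketched in isolation. This is only an illustration under stated assumptions: StartLocationResolver, buildMaxValueQuery and resolve are hypothetical names that do not exist in Chunjun, and the SELECT MAX(...) query is an assumed shape for what a getMaxValueFromDb()-style lookup might run, not the connector's actual SQL.

```java
import java.util.function.Supplier;

/** Hypothetical helper, not part of Chunjun: picks the effective start location. */
final class StartLocationResolver {
    static final String LATEST = "latest";

    private StartLocationResolver() {}

    /** Assumed query shape for fetching the current max of the increment column. */
    static String buildMaxValueQuery(String table, String increColumn) {
        return "SELECT MAX(" + increColumn + ") FROM " + table;
    }

    /**
     * Returns null when no start location is configured (read everything),
     * the looked-up max value when the token is "latest",
     * and the configured value unchanged otherwise.
     */
    static String resolve(String startLocation, Supplier<String> maxValueLookup) {
        if (startLocation == null || startLocation.trim().isEmpty()) {
            return null;
        }
        if (LATEST.equals(startLocation.trim())) {
            // The lookup is only invoked for the special token, so an explicit
            // startLocation never costs an extra round trip to the database.
            return maxValueLookup.get();
        }
        return startLocation;
    }
}
```

Passing the lookup as a Supplier keeps the database call lazy and makes the branch easy to test without a live connection.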

Use case

No response

Related issues

No response

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@baisui1981 baisui1981 added the feature-request this is a feature requests on the product label Oct 11, 2022
@FlechazoW
Member

It's a nice idea, and I prefer using another parameter, instead of using 'startLocation'.

@baisui1981
Contributor Author

> It's a nice idea, and I prefer using another parameter, instead of using 'startLocation'.

Which one would you prefer?

@mggger
Contributor

mggger commented Oct 12, 2022

flink-cdc uses a StartupMode enum for this:

public enum StartupMode {
    INITIAL,

    EARLIEST_OFFSET,

    LATEST_OFFSET,

    SPECIFIC_OFFSETS
}
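
If a separate parameter is preferred over overloading startLocation, one possible shape is a small enum parsed from the task configuration. This is a sketch only: JdbcStartupMode, its two constants and fromConfig are hypothetical names chosen for illustration, and nothing here reflects the parameter that was eventually merged.

```java
import java.util.Locale;

/** Hypothetical startup-mode parameter for a polling JDBC source. */
enum JdbcStartupMode {
    /** Read existing rows first, then keep polling for new ones. */
    INITIAL,
    /** Skip existing rows and start from the current max of the increment column. */
    LATEST;

    /** Parses a config string; a missing or blank value falls back to INITIAL. */
    static JdbcStartupMode fromConfig(String value) {
        if (value == null || value.trim().isEmpty()) {
            return INITIAL;
        }
        // Accept "latest", "LATEST" and the flink-cdc style "latest-offset".
        String normalized = value.trim().toUpperCase(Locale.ROOT).replace('-', '_');
        switch (normalized) {
            case "INITIAL":
                return INITIAL;
            case "LATEST":
            case "LATEST_OFFSET":
                return LATEST;
            default:
                throw new IllegalArgumentException("Unknown startup mode: " + value);
        }
    }
}
```

Defaulting to INITIAL keeps existing task configurations behaving as before, while an unknown value fails fast instead of silently reading the full history.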

Paddy0523 added a commit to Paddy0523/chunjun that referenced this issue Oct 12, 2022
…hat the pollingMode starts with the maximum value of the incrementColumn
@Paddy0523 Paddy0523 linked a pull request Oct 12, 2022 that will close this issue
9 tasks
FlechazoW added a commit that referenced this issue Oct 18, 2022
… pollingMode starts with the maximum value of the incrementColumn (#1301)

Co-authored-by: FlechazoW <35768015+FlechazoW@users.noreply.github.com>
lyzeo pushed a commit to lyzeo/chunjun that referenced this issue Oct 19, 2022
…hat the pollingMode starts with the maximum value of the incrementColumn (DTStack#1301)

Co-authored-by: FlechazoW <35768015+FlechazoW@users.noreply.github.com>
3 participants