Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flink 3750 fixed #1941

Closed
wants to merge 10 commits into from
Closed

Conversation

fpompermaier
Copy link
Contributor

Thanks for contributing to Apache Flink. Before you open your pull request, please take the following check list into consideration.
If your changes take all of the items into account, feel free to open your pull request. For more information and/or questions please refer to the How To Contribute guide.
In addition to going through the list, please provide a meaningful description of your changes.

  • General
    • The pull request references the related JIRA issue
    • The pull request addresses only one issue
    • Each commit in the PR has a meaningful commit message
  • Documentation
    • Documentation has been added for new functionality
    • Old documentation affected by the pull request has been updated
    • JavaDoc for public methods has been added
  • Tests & Build
    • Functionality added by the pull request is covered by tests
    • mvn clean verify has been executed successfully locally or a Travis build has passed

New cleaned PR aligned with the current master


public JDBCInputFormat() {
}

@Override
public void configure(Configuration parameters) {
//called once per inputFormat (on open)
try {
establishConnection();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use the new method in RichInputFormat (openInputFormat() and closeInputFormat())

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I forgotto include the new connection management when I closed the previous branch..

@fhueske
Copy link
Contributor

fhueske commented May 2, 2016

Thanks for the PR @fpompermaier.
I think the new format is a bit too much tailored towards certain query templates (BETWEEN predicate on integer column). Also modifying queries that users provide, is a bit risky, IMO.

To make it more general I would propose to:

  • Accept query templates with markers, similar to parameter makers in prepared statements: SELECT address FROM people WHERE name = ? AND birthday BETWEEN ? AND ?.
  • Let users explicitly provide bounds. Users should know their data best and can provide bounds which take skewed distributions into account. Parameter values can be provided as Object[], one array for each parameter. We can provide some utility methods to help users generating uniformly distributed parameter values.
  • Let InputSplit not provide two bound values but the index for the parameter value array. So each instance can build the query by substituting the parameters by values.

What do you think?

/**
* Query parameters container
*/
public class QueryParamInputSplit implements InputSplit {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need a special InputSplit.
If we store the binding values in a Object[][] member variable of the JDBCInputFormat, we only need an index to look up the respective values. So we can use GenericInputSplit which has a split number that can be used as index.

@fhueske
Copy link
Contributor

fhueske commented May 4, 2016

Thanks for the update @fpompermaier. I added a few comments and suggestions.

* This splits generator allows the apply user defined splits (computed outside the IF)
*
* */
public class GenericSplitsGenerator implements JDBCInputSplitsGenerator {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please rename to GenericParameterValueProvider

@@ -81,25 +134,51 @@ public void configure(Configuration parameters) {
* @throws IOException
*/
@Override
public void open(InputSplit ignored) throws IOException {
public void open(InputSplit inputSplit) throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JavaDocs need to be updated for new parameter name

@@ -352,6 +318,10 @@ public JDBCInputFormatBuilder setResultSetConcurrency(int resultSetConcurrency)
format.resultSetConcurrency = resultSetConcurrency;
return this;
}
public JDBCInputFormatBuilder setParametersProvider(ParameterValuesProvider parameterValuesProvider) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add new line above

@fhueske
Copy link
Contributor

fhueske commented May 11, 2016

Thanks for the update @fpompermaier. Overall the PR looks good. I added a few comments and suggestions. Thanks, Fabian

*
* WARNING: this may fail if the JDBC driver doesn't handle null correctly and no column types specified in the SqlRow
* WARNING: this may fail when no column types specified (because a best effort approach is attempted in order to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This warning should also go into the JavaDocs of the JdbcOutputFormat class.

@fhueske
Copy link
Contributor

fhueske commented May 18, 2016

Hi @fpompermaier, I added one more comment. I'm not sure if you noticed the other two comments I made a fews day back to JDBCFullTest and JdbcInputFormat.nextRecord() after your last update of the PR. Thanks, Fabian

* query template (i.e. a valid {@link PreparedStatement}) and a {@link ParameterValuesProvider}
* which provides binding values for the query parameters.
*
* A valid RowTypeInfo must be properly configured in the builder, e.g.: </br>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the nice JavaDocs!

@fhueske
Copy link
Contributor

fhueske commented May 18, 2016

Thanks for the update @fpompermaier. Just had a few minor comments. After that this PR should be good to merge.

@fhueske
Copy link
Contributor

fhueske commented May 18, 2016

Thanks for the update. Looks good. Will merge it later :-)

@fhueske
Copy link
Contributor

fhueske commented May 18, 2016

Merging

@asfgit asfgit closed this in 09b428b May 18, 2016
mbode pushed a commit to mbode/flink that referenced this pull request May 27, 2016
…ormat.

- New Input- and OutputFormat use Row instead of Tuple types to support null values.
- JdbcInputFormat supports parallel input due to PreparedStatement and binding values for parameters.

This closes apache#1941
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants