Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BEAM-6675] Generate JDBC statement and preparedStatementSetter automatically when schema is available #8962

Merged
merged 12 commits into from
Jul 16, 2019

Conversation

JawadHyder
Copy link
Contributor

@JawadHyder JawadHyder commented Jun 28, 2019

Added functionality to automatically generate JDBC insert statement and preparedStatementSetter when there is schema attached to the PCollection. This functionality is triggered when the user has not provided withStatement and withPreparedStatementSetter during JdbcIO.write() operation.


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Choose reviewer(s) and mention them in a comment (R: @username).
  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

Post-Commit Tests Status (on master branch)

Lang SDK Apex Dataflow Flink Gearpump Samza Spark
Go Build Status --- --- Build Status --- --- Build Status
Java Build Status Build Status Build Status Build Status
Build Status
Build Status
Build Status Build Status Build Status
Build Status
Python Build Status
Build Status
--- Build Status
Build Status
Build Status --- --- Build Status

Pre-Commit Tests Status (on master branch)

--- Java Python Go Website
Non-portable Build Status Build Status Build Status Build Status
Portable --- Build Status --- ---

See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.

@JawadHyder
Copy link
Contributor Author

retest this please

@JawadHyder
Copy link
Contributor Author

R: @reuvenlax
R: @jbonofre

@JawadHyder
Copy link
Contributor Author

Run Java PreCommit

@JawadHyder
Copy link
Contributor Author

Run JavaPortabilityApi PreCommit

@salmanVD
Copy link
Contributor

salmanVD commented Jul 1, 2019

Run Java PreCommit


/** PreparedStatementSetCaller for Schema Field types. * */
private static Map<Schema.TypeName, JdbcIO.PreparedStatementSetCaller> typeNamePsSetCallerMap =
new EnumMap<Schema.TypeName, JdbcIO.PreparedStatementSetCaller>(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems missing handling some types in Schema?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

O NVM. There is a separate section to handle other types (array, logical types, etc.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually BYTES is not handled?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

case DATE:
return (element, ps, i, fieldWithIndex) -> {
ps.setDate(
i + 1, new Date(element.getDateTime(fieldWithIndex.getIndex()).getMillis()));
Copy link
Contributor

@amaliujia amaliujia Jul 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: is it correct to set epoch millis for Date and Time type?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The constructor for Date & Time takes a long value so I don't see any issue setting epoch for date & time. Do you have any other thoughts on this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per my understanding we should use these constructors instead of epoch
Time - Time(int hour, int minute, int second)
Date - Date(int year, int month, int day)
but both are deprecated as per docs, are you referring to the same thing @amaliujia ?

Copy link
Contributor

@amaliujia amaliujia Jul 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something similar, for example:

The date components should be set to the "zero epoch" value of January 1, 1970 and should not be accessed.

I am not sure if current construction for TIME is doing this since the millis is used directly.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated for both date and time objects
done

return (element, ps, i, fieldWithIndex) -> {
ps.setTimestamp(
i + 1,
new Timestamp(element.getDateTime(fieldWithIndex.getIndex()).getMillis()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As this is a timezoned type, is time zone handled here?

Copy link

@snaeemk snaeemk Jul 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@amaliujia there is a method in PreparedStatement.java setTimestamp which takes Calendar instance as third argument. Can we use this to set UTC timezone? of any other suggestion you have

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am thinking element.getDateTime will return a Joda Datetime, which has timezone information already?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

timestamp is the moment in time I don't think timezone is needed here because when user is going to read data from DB (timestamp field) then it needs to be converted at that point.
Please check this link https://stackoverflow.com/questions/15994450/joda-datetime-to-timestamp-conversion/15994657#15994657
can you share something related to your point? may be I am missing something here

Copy link
Contributor

@amaliujia amaliujia Jul 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are two TIMESTAMP: TIMESTAMP without TIMEZONE(sometimes I saw it's called DATETIME), TIMESTAMP with TIMEZONE.

In this implementation, it's processing a TIMESTAMP with TIMEZONE type, which definitely has a timezone information that matters. I think here are two concepts matter (which is also covered by your link): an absolute point in time and an offset. Offset (specified by timezone) should not be lost for TIMESTAMP with TIMEZONE

I find this SO question is useful: https://stackoverflow.com/questions/14070572/is-java-sql-timestamp-timezone-specific

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

used setTimestamp with calendar instance for timezone
done


private static void validateLogicalTypeLength(Schema.Field field, Integer length) {
if (field.getType().getTypeName().isLogicalType()
&& length >= Integer.parseInt(field.getType().getLogicalType().getArgument())) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think logical type argument is not always set. If the argument is not set as an Integer, maybe we can further check base type?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@amaliujia the case is if the string length of input field exceeds from the table max limit then exception should be raised, base type will still be same even if the length differs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If so, could we check input field to see if it is bytes or string?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Input fields are checked in the switch case then called createStringCaller() for string for bytes it's createBytesCaller()
https://github.com/JawadHyder/beam/blob/9bb7280a65a4ea506f36d8597aaee3500633ef5a/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcUtil.java#L155
If only type check is enough then wouldn't it truncate the string/bytes if it exceeds the max limit from the table?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

O I see. Then that's good enough. I wasn't sure how this function is called. E.g. If this function is called to verify other types like Date, Time, etc.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No these methods are called for string and bytes due to variable length. So one thing, if the argument is not set should we throw exception (the field details like argument is fetched from table schema so I don't think it can be null) or let it proceed with the insertion?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Argument is an optional parameter. So I think if it is not set, we can proceed to insertion.

Copy link

@snaeemk snaeemk Jul 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So there are 3 cases:

  1. If argument is not set - bypass the check and proceed with insertion (can we use a warning message here as data might truncate due to length mismatch?)
  2. If argument is set and it is not integer - do same as above
  3. If argument is set and it is integer - throw exception if length mismatch

Let's finalize these cases or any other case you think might possible or we can raise separate jira for this and remove checks from here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to all of these. It would be backward-compatible to allow bypassing checking in 1 and 2 such that existing types won't break if setting an argument.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@snaeemk
Copy link

snaeemk commented Jul 9, 2019

Run Java PreCommit

@snaeemk
Copy link

snaeemk commented Jul 12, 2019

@amaliujia can you please review the comments again?

@amaliujia
Copy link
Contributor

Thanks @sehrish-vd and @JawadHyder for working on this PR! Responded to comments!

@snaeemk
Copy link

snaeemk commented Jul 15, 2019

Run Java PreCommit

@amaliujia
Copy link
Contributor

LGTM

@lgajowy
Copy link
Contributor

lgajowy commented Jul 16, 2019

Run Java JdbcIO Performance Test

@lgajowy lgajowy merged commit 41478d0 into apache:master Jul 16, 2019
@amaliujia
Copy link
Contributor

Thanks @lgajowy!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants