-
Notifications
You must be signed in to change notification settings - Fork 4.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BEAM-6675] Generate JDBC statement and preparedStatementSetter automatically when schema is available #8962
Conversation
retest this please |
R: @reuvenlax |
Run Java PreCommit |
Run JavaPortabilityApi PreCommit |
…atically when schema is available
Run Java PreCommit |
|
||
/** PreparedStatementSetCaller for Schema Field types. * */ | ||
private static Map<Schema.TypeName, JdbcIO.PreparedStatementSetCaller> typeNamePsSetCallerMap = | ||
new EnumMap<Schema.TypeName, JdbcIO.PreparedStatementSetCaller>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems missing handling some types in Schema?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
O NVM. There is a separate section to handle other types (array, logical types, etc.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually BYTES is not handled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
case DATE: | ||
return (element, ps, i, fieldWithIndex) -> { | ||
ps.setDate( | ||
i + 1, new Date(element.getDateTime(fieldWithIndex.getIndex()).getMillis())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: is it correct to set epoch millis
for Date and Time type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The constructor for Date & Time takes a long value so I don't see any issue setting epoch for date & time. Do you have any other thoughts on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess here we should obey java.sql.Date
and java.sql.Time
's specs.
https://docs.oracle.com/javase/7/docs/api/java/sql/Date.html
https://docs.oracle.com/javase/8/docs/api/index.html?java/sql/Time.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As per my understanding we should use these constructors instead of epoch
Time - Time(int hour, int minute, int second)
Date - Date(int year, int month, int day)
but both are deprecated as per docs, are you referring to the same thing @amaliujia ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something similar, for example:
The date components should be set to the "zero epoch" value of January 1, 1970 and should not be accessed.
I am not sure if current construction for TIME
is doing this since the millis is used directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated for both date and time objects
done
return (element, ps, i, fieldWithIndex) -> { | ||
ps.setTimestamp( | ||
i + 1, | ||
new Timestamp(element.getDateTime(fieldWithIndex.getIndex()).getMillis())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As this is a timezoned
type, is time zone handled here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@amaliujia there is a method in PreparedStatement.java setTimestamp which takes Calendar instance as third argument. Can we use this to set UTC timezone? of any other suggestion you have
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am thinking element.getDateTime
will return a Joda Datetime, which has timezone information already?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
timestamp is the moment in time I don't think timezone is needed here because when user is going to read data from DB (timestamp field) then it needs to be converted at that point.
Please check this link https://stackoverflow.com/questions/15994450/joda-datetime-to-timestamp-conversion/15994657#15994657
can you share something related to your point? may be I am missing something here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are two TIMESTAMP: TIMESTAMP without TIMEZONE
(sometimes I saw it's called DATETIME
), TIMESTAMP with TIMEZONE
.
In this implementation, it's processing a TIMESTAMP with TIMEZONE
type, which definitely has a timezone information that matters. I think here are two concepts matter (which is also covered by your link): an absolute point in time and an offset. Offset (specified by timezone) should not be lost for TIMESTAMP with TIMEZONE
I find this SO question is useful: https://stackoverflow.com/questions/14070572/is-java-sql-timestamp-timezone-specific
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
used setTimestamp with calendar instance for timezone
done
|
||
private static void validateLogicalTypeLength(Schema.Field field, Integer length) { | ||
if (field.getType().getTypeName().isLogicalType() | ||
&& length >= Integer.parseInt(field.getType().getLogicalType().getArgument())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think logical type argument is not always set. If the argument is not set as an Integer, maybe we can further check base type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@amaliujia the case is if the string length of input field exceeds from the table max limit then exception should be raised, base type will still be same even if the length differs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If so, could we check input field to see if it is bytes or string?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Input fields are checked in the switch case then called createStringCaller() for string for bytes it's createBytesCaller()
https://github.com/JawadHyder/beam/blob/9bb7280a65a4ea506f36d8597aaee3500633ef5a/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcUtil.java#L155
If only type check is enough then wouldn't it truncate the string/bytes if it exceeds the max limit from the table?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
O I see. Then that's good enough. I wasn't sure how this function is called. E.g. If this function is called to verify other types like Date, Time, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No these methods are called for string and bytes due to variable length. So one thing, if the argument is not set should we throw exception (the field details like argument is fetched from table schema so I don't think it can be null) or let it proceed with the insertion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Argument is an optional parameter. So I think if it is not set, we can proceed to insertion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So there are 3 cases:
- If argument is not set - bypass the check and proceed with insertion (can we use a warning message here as data might truncate due to length mismatch?)
- If argument is set and it is not integer - do same as above
- If argument is set and it is integer - throw exception if length mismatch
Let's finalize these cases or any other case you think might possible or we can raise separate jira for this and remove checks from here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good to all of these. It would be backward-compatible to allow bypassing checking in 1 and 2 such that existing types won't break if setting an argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Run Java PreCommit |
@amaliujia can you please review the comments again? |
Thanks @sehrish-vd and @JawadHyder for working on this PR! Responded to comments! |
…added method for date and time only
Run Java PreCommit |
LGTM |
Run Java JdbcIO Performance Test |
Thanks @lgajowy! |
Added functionality to automatically generate JDBC insert statement and
preparedStatementSetter
when there is schema attached to thePCollection
. This functionality is triggered when the user has not providedwithStatement
andwithPreparedStatementSetter
duringJdbcIO.write()
operation.Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
R: @username
).[BEAM-XXX] Fixes bug in ApproximateQuantiles
, where you replaceBEAM-XXX
with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.Post-Commit Tests Status (on master branch)
Pre-Commit Tests Status (on master branch)
See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.