Convert Spark In filter to iceberg IN Expression #749
Conversation
@rdblue can you help review it? Thanks!
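For readers following along: per the PR title, the change maps Spark's `In(attribute, values)` source filter onto Iceberg's IN expression (`org.apache.iceberg.expressions.Expressions.in(name, values...)`). The sketch below illustrates only the shape of that mapping, using a hypothetical stand-in type rather than the real Spark and Iceberg classes, which are not reproduced in this excerpt:

```java
import java.util.Arrays;
import java.util.List;

public class InFilterConversionSketch {
    // Hypothetical stand-in for Iceberg's IN predicate; the actual PR would
    // build one via org.apache.iceberg.expressions.Expressions.in(name, values).
    static final class InPredicate {
        final String column;
        final List<Object> literals;

        InPredicate(String column, List<Object> literals) {
            this.column = column;
            this.literals = literals;
        }
    }

    // Sketch of the conversion: a Spark In(attribute, values) filter becomes
    // a single IN predicate on the referenced column, carrying all the values.
    static InPredicate convertIn(String attribute, Object[] values) {
        return new InPredicate(attribute, Arrays.asList(values));
    }

    public static void main(String[] args) {
        InPredicate p = convertIn("id", new Object[] {5, 6, 7});
        System.out.println(p.column + " IN " + p.literals); // id IN [5, 6, 7]
    }
}
```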
spark/src/test/java/org/apache/iceberg/spark/source/TestFilteredScan.java
@@ -543,11 +579,11 @@ private File buildPartitionedTable(String desc, PartitionSpec spec, String udf,

  private List<Record> testRecords(org.apache.avro.Schema avroSchema) {
    return Lists.newArrayList(
-        record(avroSchema, 0L, timestamp("2017-12-22T09:20:44.294658+00:00"), "junction"),
+        record(avroSchema, 0L, timestamp("2017-12-22T09:20:44.294+00:00"), "junction"),
Why was it necessary to change these values? This doesn't change the hour partitions the values are stored in.
@rdblue It is because the java.sql.Timestamp constructor uses a milliseconds time value.
There is a deprecated java.sql.Timestamp constructor that takes year, month, date, hour, minute, second, and nanos, but using it means we would also need to take care of the timezone issue (a Java timestamp is always UTC).
So, to avoid the deprecated method and keep the test straightforward, I just updated the two records to millisecond precision.
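For context, a quick standalone illustration of the millisecond limitation being discussed (not code from this PR): a java.sql.Timestamp built from an epoch-milliseconds long cannot carry the microsecond part of the original test values, whereas Timestamp.from(Instant) preserves it in the nanos field:

```java
import java.sql.Timestamp;
import java.time.Instant;

public class TimestampPrecisionDemo {
    public static void main(String[] args) {
        Instant micros = Instant.parse("2017-12-22T09:20:44.294658Z");

        // Constructing from epoch millis silently drops the microsecond part.
        Timestamp fromMillis = new Timestamp(micros.toEpochMilli());
        System.out.println(fromMillis.getNanos());  // 294000000 (only .294 survives)

        // Timestamp.from(Instant) keeps sub-millisecond precision in the nanos field.
        Timestamp fromInstant = Timestamp.from(micros);
        System.out.println(fromInstant.getNanos()); // 294658000
    }
}
```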
These don't use the constructor that only supports milliseconds, so these should be precise to microseconds. But the test only uses Timestamp to create a filter that gets converted to a partition filter, so the timestamps used to create the Spark filter and these timestamps shouldn't need to match. Doesn't the test pass if these are unchanged?
The test fails because the partitions picked in the tests contain only one value each (the lower bound equals the upper bound), so the Timestamp must match exactly.
To avoid changing those values, I will update the test to use the 2017-12-21T15 partition, which contains two records, so any Timestamp between them will match.
Thanks, @jun-he! I think that's a better solution to the problem.
Issue #748