-
Notifications
You must be signed in to change notification settings - Fork 470
[common]Introduce Predicate to do filter and partition push down. #515
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| /** A {@link CompoundPredicate.Function} to eval and. */ | ||
| public class And extends CompoundPredicate.Function { | ||
|
|
||
| private static final long serialVersionUID = 1L; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is recommended to use the built-in serialver tool in the JDK to generate a unique UID, rather than uniformly using the constant value 1, as this helps detect serialization and deserialization compatibility issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've discussed with Jark, and we concluded that we should follow the Flink guide https://flink.apache.org/how-to-contribute/code-style-and-quality-java/#java-serialization, because serialver has strict compatibility checks that restrict developers' flexibility.
| /** A reference to a field in an input. */ | ||
| public class FieldRef implements Serializable { | ||
|
|
||
| private static final long serialVersionUID = 1L; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As previously suggested, it is not advisable to uniformly use the constant value 1 for serialVersionUID.
wuchong
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I rebased and appended a commit to mention the mentioned comments and fix some license and code sytle problems.
| // String for compare | ||
| if (v1 instanceof BinaryString) { | ||
| v1 = ((BinaryString) v1).toString(); | ||
| } | ||
| if (v2 instanceof BinaryString) { | ||
| v2 = ((BinaryString) v2).toString(); | ||
| } | ||
| return ((Comparable<Object>) v1).compareTo(v2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BinaryString supports comparable, we don't need to cast to String.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because BinaryString can not serializable ,in predicate use string instead, so in this method need cast value(BinaryString) to string
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Alibaba-HZY If BinaryString is not serializable, the predicate should use String literals, and the comparison parameters should be of type String rather than BinaryString. That means we still don't need to cast BinaryString to String.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Alibaba-HZY I created #1495 to fix BinaryString supports Java serialization. Then we can use BinaryString in Predicate and compare BinaryString directly.
| * @see PredicateBuilder | ||
| * @since 0.4.0 | ||
| */ | ||
| public interface Predicate extends Serializable { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Considering we have already implemented the statistic-based test() method, we should also expose it on the Predicate, in order to have complete test on the implementations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can introduce the statistic-based test() method after ArrayType support PR merged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@platinumhamburg I added the statistic-based test() using Long[]. We can change it to InternalArray when it is introduced, but it's just an optimization.
| } | ||
|
|
||
| public static Predicate partitions( | ||
| List<Map<String, String>> partitions, RowType rowType, String defaultPartValue) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can remove the defaultPartValue parameter, because Fluss doesn't support default values for partitions. Otherwise, it's hard to use this method in Fluss.
| import static org.assertj.core.api.Assertions.assertThat; | ||
|
|
||
| /** Test for {@link Predicate}s. */ | ||
| public class PredicateTest { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should add statistic based tests on Predicate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The statistic tests seems like based on PR of ArrayType support.
|
@Alibaba-HZY @platinumhamburg If you are good for the changes in 938e26b, I will merge this PR. |
lgtm , left a comment |
|
Merging |
[common]
Purpose
Because Flink Expression and Spark v1 Filter is not serializable, so fluss introduces Predicate to do filter and partition push down.
Tests
com.alibaba.fluss.predicate.PredicateBuilderTest
com.alibaba.fluss.predicate.PredicateTest
API and Format
no
Documentation
no