Skip to content

Conversation

@Alibaba-HZY
Copy link
Contributor

[common]

Purpose
Because Flink Expression and Spark v1 Filter is not serializable, so fluss introduces Predicate to do filter and partition push down.

Tests
com.alibaba.fluss.predicate.PredicateBuilderTest
com.alibaba.fluss.predicate.PredicateTest

API and Format
no

Documentation
no

/** A {@link CompoundPredicate.Function} to eval and. */
public class And extends CompoundPredicate.Function {

private static final long serialVersionUID = 1L;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is recommended to use the built-in serialver tool in the JDK to generate a unique UID, rather than uniformly using the constant value 1, as this helps detect serialization and deserialization compatibility issues.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've discussed with Jark, and we concluded that we should follow the Flink guide https://flink.apache.org/how-to-contribute/code-style-and-quality-java/#java-serialization, because serialver has strict compatibility checks that restrict developers' flexibility.

/** A reference to a field in an input. */
public class FieldRef implements Serializable {

private static final long serialVersionUID = 1L;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As previously suggested, it is not advisable to uniformly use the constant value 1 for serialVersionUID.

Copy link
Member

@wuchong wuchong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I rebased and appended a commit to mention the mentioned comments and fix some license and code sytle problems.

Comment on lines 36 to 35
// String for compare
if (v1 instanceof BinaryString) {
v1 = ((BinaryString) v1).toString();
}
if (v2 instanceof BinaryString) {
v2 = ((BinaryString) v2).toString();
}
return ((Comparable<Object>) v1).compareTo(v2);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BinaryString supports comparable, we don't need to cast to String.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because BinaryString can not serializable ,in predicate use string instead, so in this method need cast value(BinaryString) to string

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Alibaba-HZY If BinaryString is not serializable, the predicate should use String literals, and the comparison parameters should be of type String rather than BinaryString. That means we still don't need to cast BinaryString to String.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Alibaba-HZY I created #1495 to fix BinaryString supports Java serialization. Then we can use BinaryString in Predicate and compare BinaryString directly.

* @see PredicateBuilder
* @since 0.4.0
*/
public interface Predicate extends Serializable {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considering we have already implemented the statistic-based test() method, we should also expose it on the Predicate, in order to have complete test on the implementations.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can introduce the statistic-based test() method after ArrayType support PR merged.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@platinumhamburg I added the statistic-based test() using Long[]. We can change it to InternalArray when it is introduced, but it's just an optimization.

}

public static Predicate partitions(
List<Map<String, String>> partitions, RowType rowType, String defaultPartValue) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can remove the defaultPartValue parameter, because Fluss doesn't support default values for partitions. Otherwise, it's hard to use this method in Fluss.

import static org.assertj.core.api.Assertions.assertThat;

/** Test for {@link Predicate}s. */
public class PredicateTest {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add statistic based tests on Predicate.

Copy link
Contributor

@platinumhamburg platinumhamburg Aug 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The statistic tests seems like based on PR of ArrayType support.

@wuchong
Copy link
Member

wuchong commented Aug 6, 2025

@Alibaba-HZY @platinumhamburg If you are good for the changes in 938e26b, I will merge this PR.

@Alibaba-HZY
Copy link
Contributor Author

@Alibaba-HZY @platinumhamburg If you are good for the changes in 938e26b, I will merge this PR.

lgtm , left a comment

@wuchong
Copy link
Member

wuchong commented Aug 7, 2025

Merging

@wuchong wuchong merged commit 74ab1f1 into apache:main Aug 7, 2025
3 checks passed
platinumhamburg pushed a commit to platinumhamburg/fluss that referenced this pull request Aug 11, 2025
polyzos pushed a commit to Alibaba-HZY/fluss that referenced this pull request Aug 31, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Introduce Fluss Predicate expressions

3 participants