Skip to content

Conversation

@viirya
Copy link
Member

@viirya viirya commented Apr 1, 2015

Now trait StringComparison is a BinaryExpression. In fact, it should be a BinaryPredicate.

By making StringComparison as BinaryPredicate, we can throw error when a expressions.Predicate can't translate to a data source Filter in function selectFilters.

Without this modification, because we will wrap a Filter outside the scanned results in pruneFilterProjectRaw, we can't detect about something is wrong in translating predicates to filters in selectFilters.

The unit test of #5285 demonstrates such problem. In that pr, even expressions.Contains is not properly translated to sources.StringContains, the filtering is still performed by the Filter and so the test passes.

Of course, by doing this modification, all expressions.Predicate classes need to have its data source Filter correspondingly.

There is a small bug in FilteredScanSuite for doing StringEndsWith filter. This pr also fixes it.

@SparkQA
Copy link

SparkQA commented Apr 1, 2015

Test build #29540 has finished for PR 5309 at commit caf2347.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class UDFRegistration(object):
  • This patch does not change any dependencies.

@SparkQA
Copy link

SparkQA commented Apr 1, 2015

Test build #29543 has finished for PR 5309 at commit 275a493.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's right to throw exception here, just in case people implement its own expression by inheriting from the Predicate.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I agree. It is reasonable to have predicates that can't be pushed down for various reasons. Another issue is this check could often be a runtime error instead of some static compile time check.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. I am also not very sure about this modification. But seems there is no other proper approach to check possible error in this place.

I will revert this part first.

@chenghao-intel
Copy link
Contributor

I am not so strong to support that mixin the type StringComparison with BinaryPredicate, as the predicate kind of like the logical operators, but StringComparison is more specific for string operators. What do you think?

For the unit test part of change, can we just keep adding more unit testing, instead of replacing the existed ones? Particularly for those sqlTests.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this dataType is technically redundant now.

@marmbrus
Copy link
Contributor

marmbrus commented Apr 2, 2015

I don't feel too strongly about BinaryPredicate, but it seems reasonable to me. Thanks for adding these tests!

@viirya
Copy link
Member Author

viirya commented Apr 3, 2015

@chenghao-intel as the comment of StringComparison is:

A base trait for functions that compare two strings, returning a boolean

Because it compares two strings and returns a boolean value, it is more like a BinaryPredicate other than just a BinaryExpression.

For the unit tests, the original unit tests are wrong. E.g., the test "SELECT a, b, c FROM oneToTenFiltered WHERE c like '%d'" should be used to test StringEndsWith, but %d should be d% here.

The original test data for column c is some strings like aaaaaaaa, bbbbbbbb, etc. So %d and d% both works to get the expected row. It is modified to something like aaaaAAAA for differentiating the tests of StringStartsWith, StringEndsWith and StringContains.

@viirya viirya changed the title [SPARK-6647][SQL] Make trait StringComparison as BinaryPredicate and throw error when Predicate can't translate to data source Filter [SPARK-6647][SQL] Make trait StringComparison as BinaryPredicate and fix unit tests of string data source Filter Apr 3, 2015
@SparkQA
Copy link

SparkQA commented Apr 3, 2015

Test build #29678 has finished for PR 5309 at commit b176385.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@marmbrus
Copy link
Contributor

marmbrus commented Apr 3, 2015

Thanks! Merged to master.

@asfgit asfgit closed this in 26b415e Apr 3, 2015
@viirya viirya deleted the translate_predicate branch December 27, 2023 18:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants