Make expression binding support a case sensitivity flag #82

xabriel · 2019-01-17T23:14:08Z

Iceberg currently treats column names as case sensitive, which hinders usability, since most SQL users expect case insensitivity by default. A query like the following succeeds with other Spark readers but fails on Iceberg:

SELECT COUNT(*)
FROM iceTable
WHERE year = 2017
  AND MONTH = 11 -- Notice how MONTH has different casing than other predicates
  AND day = 01

This will fail with a stack trace similar to:

com.google.common.util.concurrent.UncheckedExecutionException: com.netflix.iceberg.exceptions.ValidationException: Cannot find field 'MONTH' in struct: struct<...>
...

In this PR, we solve this by making iceberg-api case-insensitive when binding expressions.

Some further notes:

We could also solve this in iceberg-spark; however, that implies we would have to solve it again in every other engine that supports Iceberg (Presto, Pig, etc.).
Enabling case insensitivity implies that a table cannot have two columns where LOWER(a) == LOWER(b). I have a TODO in the PR, as this change may make more sense behind a feature flag.
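
To illustrate the LOWER(a) == LOWER(b) constraint, here is a minimal sketch of how colliding column names could be detected. It is not code from this PR: the class `ColumnNameCollisions` and its `find` method are hypothetical names, and column names are modeled as plain strings rather than Iceberg schema fields.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Iceberg: groups column names by their
// lower-cased form and keeps only the groups that would be ambiguous under
// case-insensitive matching.
final class ColumnNameCollisions {
  private ColumnNameCollisions() {}

  static Map<String, List<String>> find(List<String> columnNames) {
    Map<String, List<String>> byLowerName = new LinkedHashMap<>();
    for (String name : columnNames) {
      // Locale.ROOT avoids locale-specific lower-casing surprises.
      byLowerName
          .computeIfAbsent(name.toLowerCase(Locale.ROOT), key -> new ArrayList<>())
          .add(name);
    }
    // Drop names that are unique even when case is ignored.
    byLowerName.values().removeIf(names -> names.size() < 2);
    return byLowerName;
  }
}
```

A table with columns `Month` and `MONTH` would be reported under the key `month`, which is the situation a case-insensitive binder could not resolve.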

rdblue · 2019-01-18T19:01:48Z

Thanks for catching this, @xabriel. Looks like a reasonable set of changes to me.

The only problem I have is that this should be configurable for systems that are case sensitive. If Iceberg should not impose case sensitivity on processing engines then it shouldn't impose case insensitivity either.

I think this needs that configuration before the binder behavior change can be committed. The binder should probably accept a case sensitivity boolean argument, and that should be passed through to make this work in Spark.

If you want, we could separate this into the changes here other than updating the behavior of Binder, then add the rest in a follow-up. Up to you how you want to move forward.
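
As a rough illustration of a lookup that takes a case sensitivity argument, the sketch below resolves a referenced name against a list of field names. It is not Iceberg's implementation: `FieldLookup` and `find` are hypothetical, and a struct is reduced to a list of strings.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical stand-in for a struct's field lookup; not Iceberg's actual API.
final class FieldLookup {
  private final List<String> fieldNames;

  FieldLookup(List<String> fieldNames) {
    this.fieldNames = fieldNames;
  }

  // Returns the field name as declared in the struct, or empty if no field matches.
  Optional<String> find(String name, boolean caseSensitive) {
    for (String field : fieldNames) {
      boolean matches = caseSensitive ? field.equals(name) : field.equalsIgnoreCase(name);
      if (matches) {
        return Optional.of(field);
      }
    }
    return Optional.empty();
  }
}
```

With fields `year`, `month`, `day`, looking up `MONTH` finds nothing when `caseSensitive` is true and resolves to `month` when it is false, which mirrors the failing query in the description.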

rdblue · 2019-01-18T19:02:35Z

I should also note that we could update Spark to pass the correct case back to v2 sources. That's probably a good idea either way.

xabriel · 2019-01-19T00:00:30Z

Thanks for the feedback, @rdblue.

> If Iceberg should not impose case sensitivity on processing engines then it shouldn't impose case insensitivity either.

Agreed.

I'll go ahead and update the PR to:

Update Binder#bind to accept a caseSensitive parameter. Update tests accordingly.
Not change default behavior.

As suggested, I'll submit a follow-up PR that uses the new functionality.

rdblue · 2019-01-22T18:34:15Z

Thanks for the update, @xabriel!

Adding a caseSensitive flag requires quite a few updates elsewhere, and method calls are less readable because of the boolean argument (we'll forget what true signals). What do you think about adding versions of these methods that don't have the flag and default to case sensitive?

The drawback is that it will be easy to stop paying attention later and use case-sensitive matching where it shouldn't be used. But I think that would be okay. At least for tests, a package-private version that defaults to case sensitive would cut down on a lot of code changes.

What do you think?

xabriel · 2019-01-22T22:55:21Z

> What do you think about adding versions of these methods that don't have the flag and default to case sensitive?

I do think that users of Binder#bind should be forced to make a choice. Otherwise Iceberg will evolve, get new committers, and some future PR will use the default behavior of Binder#bind(StructType struct, Expression expr), inadvertently not honoring the new configuration flag that will come in a follow-up PR to this one.

> At least for tests, a package-private version that defaults to case sensitive would cut down on a lot of code changes.

A package-private version is OK with me if you think it will make this PR easier to follow. No problem.

So to summarize:

We will have Binder#bind(StructType struct, Expression expr) as package-private, thus existing tests will change minimally.
Binder#bind(StructType struct, Expression expr, boolean caseSensitive) to be used by all non-test callers.

Agreed?
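
The overload arrangement summarized above could look roughly like the following sketch. It is a simplification under stated assumptions, not the merged code: `SketchBinder` is a hypothetical class, and a list of field names plus a referenced name stand in for the real StructType and Expression arguments.

```java
import java.util.List;

// Hypothetical, simplified binder; the real Binder#bind takes a StructType and
// an Expression. The class is left non-public only to keep the sketch in one
// file; real code would make the class public.
final class SketchBinder {
  private SketchBinder() {}

  // Package-private convenience overload intended for tests in the same
  // package. It defaults to case-sensitive matching, so existing tests keep
  // their current behavior.
  static String bind(List<String> fieldNames, String ref) {
    return bind(fieldNames, ref, true);
  }

  // Overload for all non-test callers, who must state the case sensitivity
  // they want instead of silently inheriting a default.
  public static String bind(List<String> fieldNames, String ref, boolean caseSensitive) {
    for (String field : fieldNames) {
      if (caseSensitive ? field.equals(ref) : field.equalsIgnoreCase(ref)) {
        return field;
      }
    }
    throw new IllegalArgumentException(
        "Cannot find field '" + ref + "' in struct: " + fieldNames);
  }
}
```

Keeping the two-argument overload package-private is the point of the design: code outside the package cannot reach the defaulting version, so it cannot skip the flag by accident.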

rdblue · 2019-01-23T00:17:54Z

@xabriel, sounds good to me. Thanks!

rdblue · 2019-01-23T19:20:01Z

Merged. Thank you for contributing this, @xabriel!

This doesn't change default behavior. Configuring case sensitivity for processing engines will be added in future commits.

xabriel mentioned this pull request Jan 17, 2019

Make Iceberg support case insensitivity #83

Closed

Make expression binding support a case sensitivity flag. (apache#83)

606ab12

xabriel force-pushed the make-expression-binding-case-insensitive branch from 48407e3 to 606ab12 on January 19, 2019, 02:31

xabriel changed the title ~~Make expression binding case insensitive~~ Make expression binding support a case sensitivity flag Jan 19, 2019

rdblue reviewed Jan 23, 2019

api/src/main/java/com/netflix/iceberg/expressions/Evaluator.java

Allow existing tests to use a package-private Binder#bind. (apache#83)

1d18549

rdblue merged commit 022fe36 into apache:master Jan 23, 2019

xabriel mentioned this pull request Jan 30, 2019

Make read-path Evaluators honor case sensitivity flag. Expose flag in Spark Reader. #89

Merged

rdblue referenced this pull request in rdblue/iceberg Apr 10, 2019

Add case sensitivity flag to expression binding (#82)

1370245

This doesn't change default behavior. Configuring case sensitivity for processing engines will be added in future commits.

rdblue referenced this pull request in rdblue/iceberg May 14, 2019

Add case sensitivity flag to expression binding (#82)

ecc2525

This doesn't change default behavior. Configuring case sensitivity for processing engines will be added in future commits.
