Skip to content

Conversation

@kosiew
Copy link
Contributor

@kosiew kosiew commented Sep 16, 2025

Which issue does this PR close?


Rationale for this change

A test failure was observed when building the DataFusion Substrait integration with only the nested_expressions feature enabled in datafusion/substrait/Cargo.toml. The UnicodeFunctionPlanner was not registered when unicode_expressions was disabled, causing the SUBSTR / substring function to be unplannable during Substrait logical plan conversion. This produced the error shown in the test:

NotImplemented("Substring could not be planned by registered expr planner. Hint: enable the unicode_expressions")

This PR updates the dev-dependency for datafusion within datafusion/substrait/Cargo.toml to include the unicode_expressions feature so that the simple_scalar_function_substr test can run reliably and the Unicode function planner is available in test builds.

This change is limited to test/build configuration (dev-deps) and does not alter runtime behavior for downstream users. It ensures the test-suite accurately reflects the capabilities exercised by the tests and prevents false negatives during CI.


What changes are included in this PR?

  • Modify datafusion/substrait/Cargo.toml dev-dependency for datafusion to add the unicode_expressions feature alongside nested_expressions.

Before:

[dev-dependencies]
datafusion = { workspace = true, features = ["nested_expressions"] }

After:

[dev-dependencies]
datafusion = { workspace = true, features = ["nested_expressions", "unicode_expressions"] }

No source code changes are required; this is a test/build configuration change.


Are these changes tested?

  • Yes — the change is intended to fix existing unit/integration tests that exercise SUBSTR/substring planning under the Substrait converter. In particular, the simple_scalar_function_substr test which failed with the NotImplemented error should now run with the UnicodeFunctionPlanner registered and pass.

  • CI: this change is expected to make CI green for the Substrait test suite where the previous configuration caused a failure. Please run the repository test matrix locally or rely on CI to validate across all relevant feature combinations.

If maintainers prefer not to enable unicode_expressions in dev-deps, an alternative is to mark the test as conditionally skipped when unicode_expressions is not enabled or to add a fallback planning path for ASCII-only substring expressions. Both are noted below.


Are there any user-facing changes?

  • No. This change only affects development/test configuration and does not change public APIs or runtime behavior for consumers who do not enable unicode_expressions explicitly.

Test plan / How to validate

  1. Apply the patch (update datafusion/substrait/Cargo.toml).
  2. Run the Substrait tests, or run cargo test -p datafusion-substrait --test substrait_integration
  3. Confirm the simple_scalar_function_substr test passes and that no other tests regress.
  4. Observe CI (GitHub Actions) to ensure the change resolves the previously failing job(s).

Context

This issue was uncovered here

@kosiew
Copy link
Contributor Author

kosiew commented Sep 16, 2025

Thanks @adriangb for the review.

@kosiew kosiew merged commit c61ac33 into apache:main Sep 16, 2025
32 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

substrait Changes to the substrait crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Substring not planned under Substrait when unicode_expressions` feature is disabled

2 participants