Once divergences #1 through #4 are resolved, change `CometParseUrl.getSupportLevel` from `Incompatible` to `Compatible` and remove the `expect_fallback` queries in `spark/src/test/resources/sql-tests/expressions/url/parse_url.sql`.
Expand the native test (`parse_url_native.sql`) to cover the URL shapes currently in the fallback file (e.g. trailing-slash PATH, percent-encoded query value) so we have positive coverage once each divergence is fixed.
Consider extending `parse_url` test coverage to invalid URLs in non-ANSI mode (Spark returns NULL regardless of the requested part).
Background
`parse_url` was added in #4152. The native implementation (datafusion-spark's `spark_parse_url`) diverges from Spark on several edge cases, so `CometParseUrl` is marked `Incompatible` and falls back to Spark by default. Users opt into the native path with `spark.comet.expression.ParseUrl.allowIncompatible=true`.
This issue tracks the work to bring the native implementation closer to Spark and reduce the surface that requires opt-in.
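For anyone who wants to try the native path while this is open, here is a minimal sketch of passing the opt-in flag on the command line. The config key is the one quoted above; the helper name `parse_url_opt_in_args` is hypothetical and exists only for illustration.

```python
from typing import List

# Config key taken from the issue text above.
PARSE_URL_ALLOW_INCOMPAT = "spark.comet.expression.ParseUrl.allowIncompatible"


def parse_url_opt_in_args(enabled: bool = True) -> List[str]:
    """Build spark-submit arguments that set the opt-in flag (hypothetical helper)."""
    value = "true" if enabled else "false"
    return ["--conf", f"{PARSE_URL_ALLOW_INCOMPAT}={value}"]


# Print the arguments as they would appear on a spark-submit command line.
print(" ".join(parse_url_opt_in_args()))
```

The same key and value can also be set on a PySpark session with `SparkSession.builder.config(key, value)`.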
Known divergences
Tracked upstream at apache/datafusion#21943.
The 2-arg `parse_url(_, 'QUERY')` does match Spark; only the 3-arg form with a key diverges.
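To make the two call shapes concrete, here is a rough Python model of what the 2-arg and 3-arg `QUERY` forms are expected to return. It is only a sketch of the semantics, not the Spark, Comet, or datafusion-spark implementation. The function name `parse_url_query` and the sample URL are hypothetical, the key is treated as a literal string, and edge cases such as percent-encoding, empty query strings, and invalid URLs are deliberately not modeled.

```python
import re
from typing import Optional
from urllib.parse import urlsplit


def parse_url_query(url: str, key: Optional[str] = None) -> Optional[str]:
    """Rough model of parse_url(url, 'QUERY'[, key]); None stands in for SQL NULL."""
    try:
        query = urlsplit(url).query
    except ValueError:
        return None
    if not query:
        # Simplification: a missing and an empty query string are treated alike.
        return None
    if key is None:
        # 2-arg form: return the whole raw query string.
        return query
    # 3-arg form: return the value of the first `key=value` pair, if any.
    match = re.search(r"(?:&|^)" + re.escape(key) + r"=([^&]*)", query)
    return match.group(1) if match else None


url = "https://example.com/path?query=1&name=abc#frag"
print(parse_url_query(url))           # whole query string
print(parse_url_query(url, "name"))   # value for one key
print(parse_url_query(url, "other"))  # key not present
```

A model like this may be handy for enumerating candidate cases for `parse_url_native.sql`, but the expected values in the SQL tests should always come from Spark itself.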
Suggested work plan
Related