[EPIC] Spark-compatible cast / try_cast operations #286

Open
15 of 37 tasks
andygrove opened this issue Apr 18, 2024 · 6 comments
Labels
enhancement (New feature or request), help wanted (Extra attention is needed)

Comments

@andygrove
Member

andygrove commented Apr 18, 2024

What is the problem the feature request solves?

Comet currently delegates to DataFusion for many cast operations, and the behavior is not guaranteed to match Spark. This epic is to track fully implementing Spark-compatible cast and try_cast operations in Comet, with support for ANSI mode.

For each item in this list to be considered complete, we should have Scala tests demonstrating that cast and try_cast produce the same results as Spark, both with ANSI mode enabled and with it disabled, using fuzz testing to find edge cases. We can update this list with links to issues as we make progress.
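To make the three behaviors concrete, here is a minimal Rust sketch of a string to boolean cast under non-ANSI cast, ANSI cast, and try_cast. It is not Comet's implementation: `EvalMode`, `CastError`, and `cast_string_to_boolean` are hypothetical names, and the accepted literals reflect my understanding of Spark's behavior, which should be confirmed against Spark itself.

```rust
/// How a cast treats input it cannot convert (illustrative names only).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum EvalMode {
    /// ANSI mode disabled: invalid input becomes NULL.
    Legacy,
    /// ANSI mode enabled: invalid input raises an error.
    Ansi,
    /// try_cast: invalid input becomes NULL regardless of ANSI mode.
    Try,
}

/// Hypothetical error type carrying a human-readable message.
#[derive(Debug, PartialEq)]
pub struct CastError(pub String);

/// Casts an optional string to an optional boolean.
/// `None` models SQL NULL on both the input and the output side.
pub fn cast_string_to_boolean(
    input: Option<&str>,
    mode: EvalMode,
) -> Result<Option<bool>, CastError> {
    // NULL in, NULL out, in every mode.
    let Some(raw) = input else { return Ok(None) };

    // Assumption: trimming and ASCII lowercasing approximate Spark's
    // normalization; Spark's exact whitespace rules may differ.
    let normalized = raw.trim().to_ascii_lowercase();

    match normalized.as_str() {
        "t" | "true" | "y" | "yes" | "1" => Ok(Some(true)),
        "f" | "false" | "n" | "no" | "0" => Ok(Some(false)),
        // Only ANSI mode turns an invalid value into an error.
        _ => match mode {
            EvalMode::Ansi => Err(CastError(format!("cannot cast '{raw}' to BOOLEAN"))),
            EvalMode::Legacy | EvalMode::Try => Ok(None),
        },
    }
}
```

With this sketch, `cast_string_to_boolean(Some("maybe"), EvalMode::Ansi)` returns an error, while the same input under `EvalMode::Legacy` or `EvalMode::Try` returns `Ok(None)`. A fuzz test would feed the same generated strings to Spark and to Comet in each mode and compare the results.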

For cast operations that we cannot easily support with full compatibility, we should either fall back to Spark or provide a configuration that the user can enable to allow the operation to run in Comet. We should also provide documentation explaining any differences in behavior compared to Spark.
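One way to picture that decision is the following Rust sketch. Everything in it is hypothetical (`CastSupport`, `Plan`, `plan_cast`, and the `allow_incompatible` flag are names invented for illustration, not Comet APIs or configuration keys); it only shows the shape of "run natively, or fall back unless the user opted in".

```rust
/// Hypothetical classification of a cast between two types.
#[derive(Debug, Clone, PartialEq)]
pub enum CastSupport {
    /// Results are known to match Spark.
    Compatible,
    /// Implemented, but with documented differences from Spark.
    Incompatible(&'static str),
    /// Not implemented natively at all.
    Unsupported,
}

/// Hypothetical outcome of planning a single cast expression.
#[derive(Debug, PartialEq)]
pub enum Plan {
    RunInComet,
    /// Carries the reason so it can be surfaced to the user.
    FallBackToSpark(String),
}

/// Decides where a cast runs. `allow_incompatible` stands in for a
/// user-facing configuration switch (assumed, not a real setting).
pub fn plan_cast(support: &CastSupport, allow_incompatible: bool) -> Plan {
    match support {
        CastSupport::Compatible => Plan::RunInComet,
        // The user explicitly accepted the documented differences.
        CastSupport::Incompatible(_) if allow_incompatible => Plan::RunInComet,
        CastSupport::Incompatible(reason) => {
            Plan::FallBackToSpark(format!("incompatible cast: {reason}"))
        }
        // Opting in cannot help when there is no native implementation.
        CastSupport::Unsupported => {
            Plan::FallBackToSpark("cast not implemented".to_string())
        }
    }
}
```

Keeping the reason string on the fallback path is a deliberate choice in this sketch: it gives the documentation of behavioral differences a natural place to be reported at plan time.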

In addition to the above tasks, we also need to do the following:

andygrove added the enhancement (New feature or request) label Apr 18, 2024
viirya added the help wanted (Extra attention is needed) label Apr 18, 2024
@edmondop
Contributor

@andygrove do we already have a framework for fuzz testing in Scala (e.g. ScalaCheck)? Should others wait until you are done with the first ones so that you establish a pattern?

@andygrove
Member Author

> @andygrove do we already have a framework for fuzz testing in Scala (e.g. ScalaCheck)? Should others wait until you are done with the first ones so that you establish a pattern?

There is a CometCastSuite, and I am working on a PR right now to improve it. I am also implementing cast string to boolean (an easy one) so that there is an example for others to learn from. I should have a draft PR up on Monday.

@andygrove
Member Author

I am now working on cast string -> integral types. I will have a PR up later this week.

@edmondop
Contributor

@andygrove can I take care of "Implement a mechanism where we can selectively fall back to Spark for specific cast operations"? I was looking at the top of the list, but everything was quickly taken.

@andygrove
Member Author

@edmondop feel free to pick up other items on the list that don't have issues yet (I will start filing more!)

I added an example of falling back to Spark for cast string to timestamp in #337, so we do have an approach. I will update that item in this epic.

@andygrove
Member Author

@edmondop also just added #350

3 participants