
Expose Spark functions #1482

@timsaucer

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

The upstream DataFusion repository includes functions that were built for Spark compatibility. Making these functions available in datafusion-python would expand the usability of our project and make it easier for new users to convert their workflows from Spark to datafusion-python.

Describe the solution you'd like

  • Evaluate the features in the upstream repository and create a list of the ones we wish to expose. I have not spent much time reviewing the upstream Spark code, so whoever takes ownership of this issue should take time to understand everything that is available and in progress.
  • Create a method to register all the Spark functions with the session context.
  • Expose all of the functions via the DataFrame API.
  • Add test coverage.
  • Add documentation to the user site explaining how to enable the Spark functions and why they are not enabled by default.
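
To make the opt-in registration idea concrete, here is a minimal sketch in plain Python. It does not use the datafusion-python API: `ToySessionContext`, `register_spark_functions`, and the function names `shared_fn` and `spark_only_fn` are all hypothetical placeholders, and the real design may look quite different.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class ToySessionContext:
    """Stand-in for a session context. NOT the datafusion-python class."""

    functions: Dict[str, Callable] = field(default_factory=dict)

    def register(self, name: str, fn: Callable) -> None:
        # A later registration under the same name replaces the earlier one.
        self.functions[name] = fn

    def call(self, name: str, *args):
        return self.functions[name](*args)


# Placeholder implementations; the names are illustrative only.
def default_shared_fn(x):
    return f"default:{x}"


def spark_shared_fn(x):
    return f"spark:{x}"


def spark_only_fn(x):
    return f"spark-only:{x}"


# Hypothetical table of Spark-compatible functions to expose.
SPARK_FUNCTIONS = {
    "shared_fn": spark_shared_fn,      # would replace a default function
    "spark_only_fn": spark_only_fn,    # has no default counterpart
}


def new_context() -> ToySessionContext:
    """Build a context with only the default functions registered."""
    ctx = ToySessionContext()
    ctx.register("shared_fn", default_shared_fn)
    return ctx


def register_spark_functions(ctx: ToySessionContext) -> ToySessionContext:
    """Opt-in step: layer the Spark functions on top of the defaults."""
    for name, fn in SPARK_FUNCTIONS.items():
        ctx.register(name, fn)
    return ctx
```

In this sketch a fresh context keeps the default behavior, and only an explicit call to `register_spark_functions(ctx)` switches `shared_fn` to the Spark variant. That explicit step is the behavior the documentation item would need to explain.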

Describe alternatives you've considered

We could release these as a separate Python package.

Additional context

Some of these functions replace the default functions and some are unique. For users of the DataFrame API this is likely not a problem, because we can put them under a Spark-specific module such as datafusion.spark.functions or datafusion.functions.spark. The bigger difference is for those who use the SQL approach: we will probably want a function that updates the session context to register the Spark functions on top of the default functions.
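
As a rough aid for the evaluation step, the split between replacing and unique functions could be computed from two lists of names. The helper below is a hypothetical sketch, not part of any existing API, and the names passed to it are placeholders rather than real DataFusion or Spark functions.

```python
from typing import Dict, Iterable, List


def classify_spark_functions(
    default_names: Iterable[str], spark_names: Iterable[str]
) -> Dict[str, List[str]]:
    """Split Spark function names by whether they collide with a default name."""
    default_set = set(default_names)
    spark_set = set(spark_names)
    return {
        # Registering these would change what an existing SQL name resolves to.
        "replaces_default": sorted(spark_set & default_set),
        # These add new names and do not affect existing queries.
        "spark_only": sorted(spark_set - default_set),
    }


# Placeholder names for illustration only.
report = classify_spark_functions(
    default_names=["shared_fn", "default_only_fn"],
    spark_names=["shared_fn", "spark_only_fn"],
)
```

The `replaces_default` group is the one SQL users would need to be warned about, since registering those functions changes the behavior of queries that already run.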

Labels: enhancement
