Document how to use rust UDF extensions of datafusion-python #792

@timsaucer

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

When working on new features or data analytics, we often find recurring patterns in our code, which we typically abstract into a Python package to share across the team. Sometimes we must use UDFs to process data because the existing pyarrow operations are insufficient for our needs. These UDFs perform poorly because they have to convert the data to Python objects.

I would like to author these UDFs in Rust and use them from Python. Currently there is a blocker: data cannot be shared from one Rust crate to another when each is built as its own Python extension module. PyO3/pyo3#1444 is a tracking issue that discusses this in more detail.

Describe the solution you'd like

At first glance, the PyCapsule approach appears to be the best way forward, based on examples in pyo3_arrow and rust-numpy.
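
To make the PyCapsule idea concrete, here is a minimal sketch in pure Python that uses `ctypes` to stand in for two separately compiled extension modules. It is not the datafusion-python or pyo3 API. The `UdfHandle` layout, the capsule name `example.udf_handle`, and the `export_udf` / `import_udf` helpers are all hypothetical names invented for illustration. The point is only that a capsule carries an opaque pointer plus a name, so a consumer that agrees on the name and the C layout can recover the function without sharing any Python class with the producer.

```python
import ctypes

# Hypothetical C-compatible layout that producer and consumer agree on.
UNARY_FN = ctypes.CFUNCTYPE(ctypes.c_double, ctypes.c_double)


class UdfHandle(ctypes.Structure):
    _fields_ = [("abi_version", ctypes.c_int), ("call", UNARY_FN)]


# The capsule stores this pointer without copying it, so it must stay alive.
CAPSULE_NAME = b"example.udf_handle"

# Declare the signatures of the CPython capsule functions we call.
_new = ctypes.pythonapi.PyCapsule_New
_new.restype = ctypes.py_object
_new.argtypes = [ctypes.c_void_p, ctypes.c_char_p, ctypes.c_void_p]

_is_valid = ctypes.pythonapi.PyCapsule_IsValid
_is_valid.restype = ctypes.c_int
_is_valid.argtypes = [ctypes.py_object, ctypes.c_char_p]

_get_pointer = ctypes.pythonapi.PyCapsule_GetPointer
_get_pointer.restype = ctypes.c_void_p
_get_pointer.argtypes = [ctypes.py_object, ctypes.c_char_p]

# The capsule holds only a raw address, so the producer keeps the struct
# and its callback alive for as long as capsules may be in use.
_KEEP_ALIVE = []


def export_udf(func):
    """Producer side: wrap a unary float function in a named capsule."""
    callback = UNARY_FN(func)
    handle = UdfHandle(1, callback)
    _KEEP_ALIVE.append((handle, callback))
    return _new(ctypes.addressof(handle), CAPSULE_NAME, None)


def import_udf(capsule):
    """Consumer side: check the capsule name, then recover the function."""
    if not _is_valid(capsule, CAPSULE_NAME):
        raise TypeError("expected a capsule named 'example.udf_handle'")
    address = _get_pointer(capsule, CAPSULE_NAME)
    handle = ctypes.cast(address, ctypes.POINTER(UdfHandle)).contents
    if handle.abi_version != 1:
        raise ValueError("unsupported handle version")
    return handle.call


capsule = export_udf(lambda x: x * x)
square = import_udf(capsule)
print(square(3.0))  # 9.0
```

In a real Rust setup the producer and consumer would each be a pyo3 crate and the struct would be a `#[repr(C)]` type. The name check and version field are the only safeguards in this sketch, so both sides would still need to agree on the layout out of band.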

I have a minimal example that I will try to put up in a repo later this week to demonstrate what I would like to accomplish. Basically, I should be able to define a UDF in Rust so that it is highly performant, add a pyo3 wrapper function around it, and use the resulting Python module alongside the existing datafusion-python.

Describe alternatives you've considered

At this point I am not sure there are other options besides a very painful re-export of the entire datafusion Python module.

Additional context

I will try to push up my minimal demonstration repo later this week.
