Workplan for OmnisciDB UDFs #92

Closed
11 of 14 tasks
pearu opened this issue Feb 10, 2020 · 5 comments
@pearu
Contributor

pearu commented Feb 10, 2020

Background

OmnisciDB UDF support allows users to define custom functions (UDFs) and register them with the OmnisciDB server so that they can be used within SQL queries to the server. A UDF implementation is given as an LLVM IR module string that is linked into the server's SQL engine at runtime. The OmnisciDB server defines a Thrift API for managing the UDFs.

The RBC project provides a Python package, rbc, that lets users define and register UDFs from ordinary Python function definitions. The rbc package uses numba to generate the LLVM IR module string and registers the UDFs with the OmnisciDB server via its Thrift API.
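A minimal sketch of the decorator-style registration flow described above. The `UDFRegistry` class and the signature string format are hypothetical stand-ins for illustration, not the rbc API: the sketch only records the function and its signature, whereas the real package would generate LLVM IR with numba and send it to the server over Thrift at this point.

```python
from typing import Callable, Dict, Tuple


class UDFRegistry:
    """Hypothetical client-side registry; NOT the rbc API."""

    def __init__(self) -> None:
        # Maps function name -> (signature string, Python callable).
        self.functions: Dict[str, Tuple[str, Callable]] = {}

    def __call__(self, signature: str) -> Callable[[Callable], Callable]:
        # Calling the registry with a signature returns a decorator.
        def decorator(func: Callable) -> Callable:
            # A real client would compile `func` to LLVM IR for the
            # given signature and register it with the server here.
            self.functions[func.__name__] = (signature, func)
            return func

        return decorator


registry = UDFRegistry()


@registry("double(double)")
def fahrenheit2celsius(f):
    # Plain Python definition of a scalar (row-wise) function.
    return (f - 32) * 5 / 9
```

Once registered on a real server, such a function would be callable by name from SQL; here it stays an ordinary Python function that can be tested locally.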

Features

  • Retrieve device (CPU, GPU) information from the server required for LLVM IR generation.
  • Supported client platforms: Linux, Windows, macOS.
  • Packaging: conda-forge (https://github.com/conda-forge/rbc-feedstock/), PyPI (https://pypi.org/project/rbc-project/)
  • Row-wise UDFs - functions taking scalar inputs and returning scalar values are applied to a database table row by row.
  • User-defined table functions (UDTF) - functions applied to database table columns and returning non-scalar outputs
    • prototype, uses pointers to columns and data sizes
    • design and implement a user-friendly UI, one array-like argument per table column
  • Array UDFs - functions taking array inputs and returning scalar values are applied to a database table row by row.
  • Variable-length Array UDFs - functions taking array inputs and returning array values, WIP.
  • Geo UDFs - functions taking Geo type inputs
  • Generalized calls within UDFs to external C/C++ libraries (Stan, XGBoost, CUML, etc).
    • Basic idea: calling an external library function from a UDF definition boils down to allowing undefined symbols in the LLVM IR module, which are then resolved on the server side. The server must be linked against the external library and must report this to RBC via the device information retrieval Thrift API.
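To make the difference between the function kinds in the list above concrete, here is a pure-Python sketch of how each kind is applied to table data. The helper names (`apply_row_wise`, `apply_array_udf`, `apply_table_function`) are hypothetical and only model the calling semantics; they say nothing about how the server actually executes UDFs.

```python
from itertools import accumulate
from typing import Callable, List, Sequence


def apply_row_wise(udf: Callable[[float], float],
                   column: Sequence[float]) -> List[float]:
    # Row-wise UDF: scalar in, scalar out, called once per row.
    return [udf(value) for value in column]


def apply_array_udf(udf: Callable[[Sequence[float]], float],
                    column: Sequence[Sequence[float]]) -> List[float]:
    # Array UDF: each row holds an array; the UDF reduces it to a scalar.
    return [udf(array) for array in column]


def apply_table_function(udtf: Callable[[List[float]], List[float]],
                         column: Sequence[float]) -> List[float]:
    # Table function (UDTF): called once with the whole column and
    # returns a non-scalar result (here, a new column).
    return udtf(list(column))


incremented = apply_row_wise(lambda x: x + 1.0, [1.0, 2.0, 3.0])
row_sums = apply_array_udf(sum, [[1.0, 2.0], [3.0, 4.0, 5.0]])
running_total = apply_table_function(lambda col: list(accumulate(col)),
                                     [1.0, 2.0, 3.0])
```

The running total is something a row-wise UDF cannot express, because each call sees only one row; that is the motivation for a separate table-function interface.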

Work-in-progress

Work planned

Possible breakers/blockers

@pearu
Contributor Author

pearu commented Mar 18, 2020

RBC defines milestones:

@pearu
Contributor Author

pearu commented Apr 2, 2020

@pearu
Contributor Author

pearu commented Apr 2, 2020

Resolving numba/numba#4546 is on hold because we have not run into any problems caused by that issue so far.

@pearu
Contributor Author

pearu commented Apr 2, 2020

@pearu
Contributor Author

pearu commented Sep 21, 2020

Will close as obsolete:

  • Geo UDF - not planned
  • Generalized calls within UDFs to external C/C++ libraries - ongoing research
  • Run notebooks under RBC test-suite - low priority

@pearu pearu closed this as completed Sep 21, 2020