[WIP] Dynamic backend #159

Merged: 22 commits merged into master from dynamic_backend on Jul 29, 2019

Conversation

@lantiga (Contributor) commented Jul 16, 2019

This PR addresses #36 (and #47)

It introduces the AI.CONFIG command with a LOADBACKEND subcommand (the only subcommand for now, but more will likely follow).

The backend code in the src/backends/ directory is compiled into individual .so libraries and is not loaded when the AI module is loaded. Instead, the user loads backends on demand by specifying the backend type (TF, TORCH or ONNX) and the library for that backend (which can point to any CPU or GPU version thereof, provided it was built by linking against the backend).

To avoid pollution, we allow only one backend to be loaded per backend type. That is, you can't have two versions of TORCH loaded in the same instance at the same time. This also relates to what happens when CUDA kernels are loaded on the GPU. Any more sophisticated behavior will have to be tested. For the same reason, unloading hasn't been implemented yet.

Backend loading works as follows:

r = redis.Redis(host='localhost', port=6379)

ret = r.execute_command('AI.CONFIG', 'LOADBACKEND', 'TF', 'redisai_tensorflow.so')

model_filename = 'test/test_data/graph.pb'

with open(model_filename, 'rb') as f:
    model_pb = f.read()

ret = r.execute_command('AI.MODELSET', 'm', 'TF', 'CPU', 'INPUTS', 'a', 'b', 'OUTPUTS', 'mul', model_pb)

r.execute_command('AI.TENSORSET', 'a', 'FLOAT', 2, 2, 'VALUES', 2, 3, 2, 3)
r.execute_command('AI.TENSORSET', 'b', 'FLOAT', 2, 2, 'VALUES', 2, 3, 2, 3)

r.execute_command('AI.MODELRUN', 'm', 'INPUTS', 'a', 'b', 'OUTPUTS', 'c')

r.execute_command('AI.TENSORSET', 'd', 'FLOAT', 2, 2, 'VALUES', 2, 3, 2, 3)
ret = r.execute_command('AI.TENSORGET', 'd', 'VALUES')
print(ret)
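
As noted above, only one backend per type can be loaded into a given instance. A minimal sketch of what a client would see when trying to load the same backend type twice (assuming the module rejects the second load with an error reply; the exact error text isn't specified here):

import redis

r = redis.Redis(host='localhost', port=6379)

# first load succeeds
r.execute_command('AI.CONFIG', 'LOADBACKEND', 'TF', 'redisai_tensorflow.so')

try:
    # a second load of the same backend type is expected to be rejected
    r.execute_command('AI.CONFIG', 'LOADBACKEND', 'TF', 'redisai_tensorflow.so')
except redis.exceptions.ResponseError as e:
    print('TF backend already loaded:', e)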

WIP
The PR is mostly done, but I get a segfault when trying to call RedisModule_Calloc from within the dynamically loaded backend (both on macOS and Linux). I need to create a minimal repro (e.g. based on the hello world module examples in Redis) so that it's easier for anyone who'd like to help to understand what's going on.
UPDATE: solved, thanks @MeirShpilraien!

Tests haven't been updated yet (so they will fail).

@lantiga (Contributor, Author) commented Jul 17, 2019

Alright, tests pass! Only the packaging fails (see below)

So, right now the build process includes a make install phase that will result in the following layout:

install/
install/redisai.so
install/backends/
install/backends/redisai_tensorflow.so
install/backends/redisai_torch.so
install/backends/redisai_onnxruntime.so
install/backends/redisai_tensorflow
install/backends/redisai_tensorflow/lib
install/backends/redisai_torch
install/backends/redisai_torch/lib
install/backends/redisai_onnxruntime
install/backends/redisai_onnxruntime/lib

Everything is built with RPATH set both on macOS and Linux, so every backend will have its own dependencies in a known location. RPATH is set so that the libraries are fully relocatable.

Given this layout it's easy to introduce multiple versions of each backend. We just need to choose proper names for the backend .so files and their respective directories. I'll do this in a subsequent PR.

As I mentioned earlier, the only missing bit for this to be merged is updating the Makefile and fixing the packaging: @rafie I could really use your contribution here!

/cc @gkorland @K-Jo

@lantiga (Contributor, Author) commented Jul 17, 2019

Backends can now be loaded through command-line arguments, e.g.:

redis-server --loadmodule install/redisai.so TF install/backends/redisai_tensorflow.so TORCH install/backends/redisai_torch.so ONNX install/backends/redisai_onnxruntime.so

@lantiga requested a review from @rafie on Jul 18, 2019
@rafie (Contributor) commented Jul 22, 2019

Regarding RPATH/RUNPATH:
The implemented scheme looks solid.
However, the $ORIGIN trick (in Linux) will only work if the RUNPATH dtag (rather than RPATH) is set.
There is a way of making CMake do that, but it seems to keep ignoring the supplied path and replacing it with whatever it thinks best.
Therefore, we have to fall back to setting RUNPATH via a post-link or post-CMake step.
That is, running something like:

patchelf --set-rpath '$ORIGIN' redisai_tensorflow.so

Some of the backend .so files should be patched similarly.
As for macOS, I need to investigate a little to verify its behavior.

@rafie (Contributor) commented Jul 22, 2019

Regarding the deployment scheme:
I see two options here.
The first option is to have an independent backend release model.
In that case, the deployment scheme would look something like:

install/redisai-1.2.3/redisai.so

install/tensorflow-1.12.0-variant/redisai_tensorflow.so
install/tensorflow-1.12.0-variant/libtensorflow.so
install/tensorflow-1.12.0-variant/libtensorflow_framework.so

install/libtorch-1.11.0/redisai_torch.so
install/libtorch-1.11.0/redisai_tensorflow.so
install/libtorch-1.11.0/libc10.so
install/libtorch-1.11.0/libcaffe2.so
install/libtorch-1.11.0/libcaffe2_detectron_ops.so
install/libtorch-1.11.0/libcaffe2_module_test_dynamic.so
install/libtorch-1.11.0/libfoxi.so
install/libtorch-1.11.0/libfoxi_dummy.so
install/libtorch-1.11.0/libgomp-8bba0e50.so.1
install/libtorch-1.11.0/libonnxifi.so
install/libtorch-1.11.0/libonnxifi_dummy.so
install/libtorch-1.11.0/libtorch.so.1

install/onnxruntime-0.4.0/redisai_onnxruntime.so
install/onnxruntime-0.4.0/libonnxruntime.so.0.4.0

From this, it's easy to see how we run into an unclear build/release process, not to mention backend packaging.

The other option is a RedisAI-centric release model, where we place all artifacts under one versioned directory:

install/redisai-1.2.3/*.so

This will greatly simplify release and deployment, making mixing and matching backends a little harder, but not impossible.

@lantiga (Contributor, Author) commented Jul 22, 2019

Hi Rafi,
thank you for the feedback.

We can't put all the .so files in a single directory, because we need to be able to choose between alternative versions of, say, libtensorflow (CPU vs GPU, for instance).

I was thinking something in-between:

install/redisai-1.2.3/redisai.so
install/redisai-1.2.3/backends/redisai_tensorflow-1.12.0-cpu.so
install/redisai-1.2.3/backends/redisai_tensorflow-1.12.0-cpu/libtensorflow.so
install/redisai-1.2.3/backends/redisai_tensorflow-1.12.0-cpu/libtensorflow_framework.so
install/redisai-1.2.3/backends/redisai_tensorflow-1.12.0-gpu.so
install/redisai-1.2.3/backends/redisai_tensorflow-1.12.0-gpu/libtensorflow.so
install/redisai-1.2.3/backends/redisai_tensorflow-1.12.0-gpu/libtensorflow_framework.so

So every RedisAI version has its own set of backends, which in turn are versioned by release and variant.
We will maintain a strict policy where backends are compatible across patch versions (1.0.x), and we will re-release a minor version if the backend API changes (infrequent but possible).

This layout would keep things tidy and expose just the files users are interested in (since they will need to LOADBACKEND the .so file).
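
For example, with this layout the user would load the CPU TensorFlow backend with something like the following (the path below is just the hypothetical layout sketched above; adjust it to wherever the install directory actually lives):

import redis

r = redis.Redis(host='localhost', port=6379)

# path follows the proposed layout above; use an absolute path or one
# relative to the install directory as appropriate for the deployment
r.execute_command('AI.CONFIG', 'LOADBACKEND', 'TF',
                  'install/redisai-1.2.3/backends/redisai_tensorflow-1.12.0-cpu.so')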

@rafie (Contributor) commented Jul 23, 2019

Wouldn't the following be more concise?

install/redisai-1.2.3/redisai.so
install/redisai-1.2.3/redisai_tensorflow-1.12.0-cpu/redisai_tensorflow.so
install/redisai-1.2.3/redisai_tensorflow-1.12.0-cpu/libtensorflow.so
install/redisai-1.2.3/redisai_tensorflow-1.12.0-cpu/libtensorflow_framework.so

I just wanted all the shared objects of a given backend (of a particular version/variant, including their dependent libraries) to reside in the same directory, to minimize the complexity of RUNPATH hacking.

I was also wondering if it would be possible to introduce the patchelf post-build steps after we integrate my PR with the ARM work and build system modifications.

rafie previously approved these changes Jul 23, 2019
@lantiga (Contributor, Author) commented Jul 23, 2019

Hi @rafie, thank you.

It does mean fewer directories, but it would be more challenging for users to find the right backend .so file to load. I think the other way would make it easier to point users to the right file.

By the way, with the libraries shipped by vendors (libtensorflow etc.) I didn't find a need to modify their rpath, only the rpath of the backend .so (redisai_tensorflow.so).
It seems the libraries are already rpath-enabled and relocatable.

Tests pass on Linux, so I was wondering if the patchelf step is a special need for the ARM build or a general one (sorry, I'm not an expert in this).

Last: would it be possible for you to fix the packaging step? Otherwise I won't be able to merge.

Thank you for your help!

@rafi commented Jul 24, 2019

off-topic, rafi != rafie

@lantiga (Contributor, Author) commented Jul 24, 2019

Ouch, sorry @rafi!

@lantiga (Contributor, Author) commented Jul 28, 2019

So @rafie, this is my final proposal:

install/redisai-1.2.3/redisai.so
install/redisai-1.2.3/backends/redisai_tensorflow-1.12.0-cpu/redisai_tensorflow.so
install/redisai-1.2.3/backends/redisai_tensorflow-1.12.0-cpu/lib/libtensorflow.so
install/redisai-1.2.3/backends/redisai_tensorflow-1.12.0-cpu/lib/libtensorflow_framework.so

It's basically your last proposal, but we place the library dependencies under lib, and we keep backends in a backends directory (which is easier to clean by wiping all its contents).

Are you ok with this?

@rafie (Contributor) commented Jul 29, 2019

We can modify CMakeLists.txt as follows to replace RPATH with RUNPATH:

diff --git a/CMakeLists.txt b/CMakeLists.txt
index 3f4c59c..b6d7606 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -19,6 +19,14 @@ SET(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -fPIC")
 SET(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fPIC")
 SET(CMAKE_C_STANDARD 99)
 
+FUNCTION(ADD_LDFLAGS _TARGET NEW_FLAGS)
+       GET_TARGET_PROPERTY(LD_FLAGS ${_TARGET} LINK_FLAGS)
+       IF(LD_FLAGS)
+               SET(NEW_FLAGS "${LD_FLAGS} ${NEW_FLAGS}")
+       ENDIF()
+       SET_TARGET_PROPERTIES(${_TARGET} PROPERTIES LINK_FLAGS ${NEW_FLAGS})
+ENDFUNCTION()
+
 FIND_LIBRARY(TF_LIBRARIES NAMES tensorflow libtensorflow.so
              PATHS ${depsAbs}/libtensorflow/lib)
 IF (NOT TF_LIBRARIES)
@@ -69,6 +77,7 @@ SET_TARGET_PROPERTIES(redisai_tensorflow PROPERTIES SUFFIX ".so")
 IF (APPLE)
   SET_TARGET_PROPERTIES(redisai_tensorflow PROPERTIES INSTALL_RPATH "@loader_path/lib")
 ELSE ()
+  ADD_LDFLAGS(redisai_tensorflow "-Wl,--enable-new-dtags")
   SET_TARGET_PROPERTIES(redisai_tensorflow PROPERTIES INSTALL_RPATH "\$ORIGIN/lib")
 ENDIF()
 INSTALL(TARGETS redisai_tensorflow LIBRARY DESTINATION backends/redisai_tensorflow)
@@ -81,6 +90,7 @@ SET_TARGET_PROPERTIES(redisai_torch PROPERTIES SUFFIX ".so")
 IF (APPLE)
   SET_TARGET_PROPERTIES(redisai_torch PROPERTIES INSTALL_RPATH "@loader_path/lib")
 ELSE ()
+  ADD_LDFLAGS(redisai_torch "-Wl,--enable-new-dtags")
   SET_TARGET_PROPERTIES(redisai_torch PROPERTIES INSTALL_RPATH "\$ORIGIN/lib")
 ENDIF()
 INSTALL(TARGETS redisai_torch LIBRARY DESTINATION backends/redisai_torch)
@@ -93,6 +103,7 @@ SET_TARGET_PROPERTIES(redisai_onnxruntime PROPERTIES SUFFIX ".so")
 IF (APPLE)
   SET_TARGET_PROPERTIES(redisai_onnxruntime PROPERTIES INSTALL_RPATH "@loader_path/lib")
 ELSE ()
+  ADD_LDFLAGS(redisai_onnxruntime "-Wl,--enable-new-dtags")
   SET_TARGET_PROPERTIES(redisai_onnxruntime PROPERTIES INSTALL_RPATH "\$ORIGIN/lib")
 ENDIF()
 INSTALL(TARGETS redisai_onnxruntime LIBRARY DESTINATION backends/redisai_onnxruntime)

@lantiga (Contributor, Author) commented Jul 29, 2019

@rafie Done

rafie previously approved these changes Jul 29, 2019
lantiga merged commit 25f0b73 into master on Jul 29, 2019
lantiga added a commit that referenced this pull request May 6, 2020
 
[WIP] Dynamic backend (#159)

* Cleanup backend headers

* Move backends to libraries

* Load backends dynamically

* Add error handling

* Make RAI_backends non-static

* Block backends from being loaded if already loaded previously

* Load missing symbols in backend wrappers. Thanks to @MeirShpilraien for the solution.

* Add backend loaded logs

* Reorganize backend layout and RPATH management

* Update CI to reflect latest changes

* Rename install directories

* Update README and doc

* Allow loading backends as command-line arguments to module

* Update Dockerfiles

* New directory layout

* Fix ramp package

* Fix path to artifact on CI

* Copy recursively

* Enable new-dtags for Linux build

* Update documentation

* Add LOADBACKEND to DEL tests
gkorland deleted the dynamic_backend branch on October 6, 2020