[WIP] Dynamic backend #159

Merged: 22 commits merged into master from dynamic_backend on Jul 29, 2019

Conversation

@lantiga (Contributor) commented Jul 16, 2019

This PR addresses #36 (and #47)

It introduces the AI.CONFIG command with a LOADBACKEND subcommand (the only subcommand for now, but more will likely follow).

The backend code in the src/backends/ directory is compiled into individual .so libraries and is not loaded when the AI module is loaded. Instead, the user loads backends on demand by specifying the backend type (TF, TORCH or ONNX) and the library for that backend (which can point to any CPU or GPU version thereof, provided it was built by linking against the backend).

To avoid pollution, we allow only one backend to be loaded per backend type. That is, you can't have two versions of TORCH loaded in the same instance at the same time. This also relates to what happens when CUDA kernels are loaded on the GPU. Any more sophisticated behavior will have to be tested. For the same reason, unloading hasn't been implemented yet.

Backend loading works as follows:

r = redis.Redis(host='localhost', port=6379)

ret = r.execute_command('AI.CONFIG', 'LOADBACKEND', 'TF', 'redisai_tensorflow.so')

model_filename = 'test/test_data/graph.pb'

with open(model_filename, 'rb') as f:
    model_pb = f.read()

ret = r.execute_command('AI.MODELSET', 'm', 'TF', 'CPU', 'INPUTS', 'a', 'b', 'OUTPUTS', 'mul', model_pb)

r.execute_command('AI.TENSORSET', 'a', 'FLOAT', 2, 2, 'VALUES', 2, 3, 2, 3)
r.execute_command('AI.TENSORSET', 'b', 'FLOAT', 2, 2, 'VALUES', 2, 3, 2, 3)

r.execute_command('AI.MODELRUN', 'm', 'INPUTS', 'a', 'b', 'OUTPUTS', 'c')

r.execute_command('AI.TENSORSET', 'd', 'FLOAT', 2, 2, 'VALUES', 2, 3, 2, 3)
ret = r.execute_command('AI.TENSORGET', 'd', 'VALUES')
print(ret)
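
As noted above, only one backend per type can be loaded into a given instance. A minimal sketch of what a client would see when trying to load the same backend type twice (assuming the module rejects the second load with an error reply; the exact error text isn't specified here):

import redis

r = redis.Redis(host='localhost', port=6379)

# first load succeeds
r.execute_command('AI.CONFIG', 'LOADBACKEND', 'TF', 'redisai_tensorflow.so')

try:
    # a second load of the same backend type is expected to be rejected
    r.execute_command('AI.CONFIG', 'LOADBACKEND', 'TF', 'redisai_tensorflow.so')
except redis.exceptions.ResponseError as e:
    print('TF backend already loaded:', e)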

WIP
The PR is mostly done, but I get a segfault when trying to call RedisModule_Calloc from within the dynamically loaded backend (both on macOS and Linux). I need to create a minimal repro (e.g. based on the hello world module examples in Redis) so that it's easier for anyone who'd like to help to understand what's going on.
UPDATE: solved, thanks @MeirShpilraien!

Tests haven't been updated yet (so they will fail).

@lantiga (Contributor, Author) commented Jul 17, 2019

Alright, tests pass! Only the packaging fails (see below)

So, right now the build process includes a make install phase that will result in the following layout:

install/
install/redisai.so
install/backends/
install/backends/redisai_tensorflow.so
install/backends/redisai_torch.so
install/backends/redisai_onnxruntime.so
install/backends/redisai_tensorflow
install/backends/redisai_tensorflow/lib
install/backends/redisai_torch
install/backends/redisai_torch/lib
install/backends/redisai_onnxruntime
install/backends/redisai_onnxruntime/lib

Everything is built with RPATH set both on macOS and Linux, so every backend will have its own dependencies in a known location. RPATH is set so that the libraries are fully relocatable.

Given this layout it's easy to introduce multiple versions of each backend. We just need to choose proper names for the backend .so files and their respective directories. I'll do this in a subsequent PR.

As I mentioned earlier, the only missing bit for this to be merged is updating the Makefile and fixing the packaging: @rafie I could really use your contribution here!

/cc @gkorland @K-Jo

@lantiga (Contributor, Author) commented Jul 17, 2019

Backends can now be loaded through command-line arguments, e.g.:

redis-server --loadmodule install/redisai.so TF install/backends/redisai_tensorflow.so TORCH install/backends/redisai_torch.so ONNX install/backends/redisai_onnxruntime.so

@lantiga requested a review from @rafie on Jul 18, 2019
@rafie (Contributor) commented Jul 22, 2019

Regarding RPATH/RUNPATH:
The implemented scheme looks solid.
However, the $ORIGIN trick (in Linux) will only work if the RUNPATH dtag (rather than RPATH) is set.
There is a way of making CMake do that, but it seems to keep ignoring the supplied path and replacing it with whatever it thinks best.
Therefore, we have to fall back to setting RUNPATH via a post-link or post-CMake step.
That is, running something like:

patchelf --set-rpath '$ORIGIN' redisai_tensorflow.so

Some of the backend .so files should be patched similarly.
As for macOS, I need to investigate a little to verify its behavior.

@rafie (Contributor) commented Jul 22, 2019

Regarding the deployment scheme:
I see two options here.
The first option is to have an independent backend release model.
In that case, the deployment scheme would look something like:

install/redisai-1.2.3/redisai.so

install/tensorflow-1.12.0-variant/redisai_tensorflow.so
install/tensorflow-1.12.0-variant/libtensorflow.so
install/tensorflow-1.12.0-variant/libtensorflow_framework.so

install/libtorch-1.11.0/redisai_torch.so
install/libtorch-1.11.0/redisai_tensorflow.so
install/libtorch-1.11.0/libc10.so
install/libtorch-1.11.0/libcaffe2.so
install/libtorch-1.11.0/libcaffe2_detectron_ops.so
install/libtorch-1.11.0/libcaffe2_module_test_dynamic.so
install/libtorch-1.11.0/libfoxi.so
install/libtorch-1.11.0/libfoxi_dummy.so
install/libtorch-1.11.0/libgomp-8bba0e50.so.1
install/libtorch-1.11.0/libonnxifi.so
install/libtorch-1.11.0/libonnxifi_dummy.so
install/libtorch-1.11.0/libtorch.so.1

install/onnxruntime-0.4.0/redisai_onnxruntime.so
install/onnxruntime-0.4.0/libonnxruntime.so.0.4.0

From this, it's easy to see how we run into an unclear build/release process, not to mention backend packaging.

The other option is a RedisAI-centric release model, where we place all artifacts under one versioned directory:

install/redisai-1.2.3/*.so

This will greatly simplify release and deployment, making mixing and matching backends a little harder, but not impossible.

@lantiga (Contributor, Author) commented Jul 22, 2019

Hi Rafi,
thank you for the feedback.

We can't put all the .so files in a single directory, because we need to be able to choose between alternative versions of, say, libtensorflow (CPU vs GPU, for instance).

I was thinking something in-between:

install/redisai-1.2.3/redisai.so
install/redisai-1.2.3/backends/redisai_tensorflow-1.12.0-cpu.so
install/redisai-1.2.3/backends/redisai_tensorflow-1.12.0-cpu/libtensorflow.so
install/redisai-1.2.3/backends/redisai_tensorflow-1.12.0-cpu/libtensorflow_framework.so
install/redisai-1.2.3/backends/redisai_tensorflow-1.12.0-gpu.so
install/redisai-1.2.3/backends/redisai_tensorflow-1.12.0-gpu/libtensorflow.so
install/redisai-1.2.3/backends/redisai_tensorflow-1.12.0-gpu/libtensorflow_framework.so

So every RedisAI version has its own set of backends, which in turn are versioned by release and variant.
We will maintain a strict policy where backends are compatible across patch versions (1.0.x), and we will re-release a minor version if the backend API changes (infrequent but possible).

This layout would keep things tidy and expose just the files users are interested in (since they will need to LOADBACKEND the .so file).
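
For example, with this layout the user would load the CPU TensorFlow backend with something like the following (the path below is just the hypothetical layout sketched above; adjust it to wherever the install directory actually lives):

import redis

r = redis.Redis(host='localhost', port=6379)

# path follows the proposed layout above; use an absolute path or one
# relative to the install directory as appropriate for the deployment
r.execute_command('AI.CONFIG', 'LOADBACKEND', 'TF',
                  'install/redisai-1.2.3/backends/redisai_tensorflow-1.12.0-cpu.so')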

@rafie (Contributor) commented Jul 23, 2019

Wouldn't the following be more concise?

install/redisai-1.2.3/redisai.so
install/redisai-1.2.3/redisai_tensorflow-1.12.0-cpu/redisai_tensorflow.so
install/redisai-1.2.3/redisai_tensorflow-1.12.0-cpu/libtensorflow.so
install/redisai-1.2.3/redisai_tensorflow-1.12.0-cpu/libtensorflow_framework.so

I just wanted all the shared objects of a given backend (of a particular version/variant, including their dependent libraries) to reside in the same directory, to minimize the complexity of RUNPATH hacking.

I was also wondering if it would be possible to introduce the patchelf post-build steps after we integrate my PR with the ARM work and build system modifications.

rafie previously approved these changes Jul 23, 2019
@lantiga (Contributor, Author) commented Jul 23, 2019

Hi @rafie, thank you.

It does mean fewer directories, but it would be more challenging for users to find the right backend .so file to load. I think the other way would make it easier to point users to the right file.

By the way, with the libraries shipped by vendors (libtensorflow etc.) I didn't find a need to modify their rpath, only the rpath of the backend .so (redisai_tensorflow.so).
It seems the libraries are already rpath-enabled and relocatable.

Tests pass on Linux, so I was wondering if the patchelf step is a special need for the ARM build or a general one (sorry, I'm not an expert in this).

Last: would it be possible for you to fix the packaging step? Otherwise I won't be able to merge.

Thank you for your help!

@rafi commented Jul 24, 2019

off-topic, rafi != rafie

@lantiga (Contributor, Author) commented Jul 24, 2019

Ouch, sorry @rafi!

@lantiga (Contributor, Author) commented Jul 28, 2019

So @rafie, this is my final proposal:

install/redisai-1.2.3/redisai.so
install/redisai-1.2.3/backends/redisai_tensorflow-1.12.0-cpu/redisai_tensorflow.so
install/redisai-1.2.3/backends/redisai_tensorflow-1.12.0-cpu/lib/libtensorflow.so
install/redisai-1.2.3/backends/redisai_tensorflow-1.12.0-cpu/lib/libtensorflow_framework.so

It's basically your last proposal, but we place the library dependencies under lib, and we keep backends in a backends directory (which is easier to clean by wiping all its contents).

Are you ok with this?

@rafie (Contributor) commented Jul 29, 2019

We can modify CMakeLists.txt as follows to replace RPATH with RUNPATH:

diff --git a/CMakeLists.txt b/CMakeLists.txt
index 3f4c59c..b6d7606 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -19,6 +19,14 @@ SET(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -fPIC")
 SET(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fPIC")
 SET(CMAKE_C_STANDARD 99)
 
+FUNCTION(ADD_LDFLAGS _TARGET NEW_FLAGS)
+       GET_TARGET_PROPERTY(LD_FLAGS ${_TARGET} LINK_FLAGS)
+       IF(LD_FLAGS)
+               SET(NEW_FLAGS "${LD_FLAGS} ${NEW_FLAGS}")
+       ENDIF()
+       SET_TARGET_PROPERTIES(${_TARGET} PROPERTIES LINK_FLAGS ${NEW_FLAGS})
+ENDFUNCTION()
+
 FIND_LIBRARY(TF_LIBRARIES NAMES tensorflow libtensorflow.so
              PATHS ${depsAbs}/libtensorflow/lib)
 IF (NOT TF_LIBRARIES)
@@ -69,6 +77,7 @@ SET_TARGET_PROPERTIES(redisai_tensorflow PROPERTIES SUFFIX ".so")
 IF (APPLE)
   SET_TARGET_PROPERTIES(redisai_tensorflow PROPERTIES INSTALL_RPATH "@loader_path/lib")
 ELSE ()
+  ADD_LDFLAGS(redisai_tensorflow "-Wl,--enable-new-dtags")
   SET_TARGET_PROPERTIES(redisai_tensorflow PROPERTIES INSTALL_RPATH "\$ORIGIN/lib")
 ENDIF()
 INSTALL(TARGETS redisai_tensorflow LIBRARY DESTINATION backends/redisai_tensorflow)
@@ -81,6 +90,7 @@ SET_TARGET_PROPERTIES(redisai_torch PROPERTIES SUFFIX ".so")
 IF (APPLE)
   SET_TARGET_PROPERTIES(redisai_torch PROPERTIES INSTALL_RPATH "@loader_path/lib")
 ELSE ()
+  ADD_LDFLAGS(redisai_torch "-Wl,--enable-new-dtags")
   SET_TARGET_PROPERTIES(redisai_torch PROPERTIES INSTALL_RPATH "\$ORIGIN/lib")
 ENDIF()
 INSTALL(TARGETS redisai_torch LIBRARY DESTINATION backends/redisai_torch)
@@ -93,6 +103,7 @@ SET_TARGET_PROPERTIES(redisai_onnxruntime PROPERTIES SUFFIX ".so")
 IF (APPLE)
   SET_TARGET_PROPERTIES(redisai_onnxruntime PROPERTIES INSTALL_RPATH "@loader_path/lib")
 ELSE ()
+  ADD_LDFLAGS(redisai_onnxruntime "-Wl,--enable-new-dtags")
   SET_TARGET_PROPERTIES(redisai_onnxruntime PROPERTIES INSTALL_RPATH "\$ORIGIN/lib")
 ENDIF()
 INSTALL(TARGETS redisai_onnxruntime LIBRARY DESTINATION backends/redisai_onnxruntime)

@lantiga (Contributor, Author) commented Jul 29, 2019

@rafie Done

rafie previously approved these changes Jul 29, 2019
lantiga merged commit 25f0b73 into master on Jul 29, 2019
lantiga added a commit that referenced this pull request May 6, 2020
 
[WIP] Dynamic backend (#159)

* Cleanup backend headers

* Move backends to libraries

* Load backends dynamically

* Add error handling

* Make RAI_backends non-static

* Block backends from being loaded if already loaded previously

* Load missing symbols in backend wrappers. Thanks to @MeirShpilraien for the solution.

* Add backend loaded logs

* Reorganize backend layout and RPATH management

* Update CI to reflect latest changes

* Rename install directories

* Update README and doc

* Allow loading backends as command-line arguments to module

* Update Dockerfiles

* New directory layout

* Fix ramp package

* Fix path to artifact on CI

* Copy recursively

* Enable new-dtags for Linux build

* Update documentation

* Add LOADBACKEND to DEL tests
gkorland deleted the dynamic_backend branch on October 6, 2020