# Pluggable Graph Optimizer for TensorFlow

| Status        | Proposed                                              |
| :------------ | :---------------------------------------------------- |
| **RFC #**     | [262](https://github.com/tensorflow/community/pull/262) |
| **Author(s)** | Yang Sheng (yang.sheng@intel.com), Zhoulong Jiang (zhoulong.jiang@intel.com), Yiqiang Li (yiqiang.li@intel.com), Eric Lin (eric.lin@intel.com), Jianhui Li (jian.hui.li@intel.com) |
| **Sponsor**   | Rasmus Larsen (rmlarsen@google.com)                   |
| **Updated**   | 2020-10-20                                            |

## **Objective**

TensorFlow currently provides a C++ API for registering a custom graph optimizer in Grappler. This project aims to create a modular/plugin-based TensorFlow implementation with C APIs, so that plugins can register custom graph optimizers. Users only need to install the plugin library in a specified directory, and the mechanism discovers and plugs in the capabilities it offers.

This RFC is based on the Modular TensorFlow [RFC](https://github.com/tensorflow/community/blob/master/rfcs/20190305-modular-tensorflow.md), which aims at extending the TensorFlow design to plug in capabilities like adding a new graph optimizer.

## **Motivation**

When extending TensorFlow to support a graph optimizer, one needs to derive a new optimizer from `CustomGraphOptimizer`. However, there are no ABI-stable APIs provided.
The Modular TensorFlow RFC designs a plugin architecture for several TensorFlow components (`Networking`, `Filesystems`, `Kernel`, `Graph` and `Accelerator backends`) through a stable ABI. This RFC describes the `Graph` module on the TensorFlow proper side, introducing a pluggable custom graph optimizer into the TensorFlow Grappler classes.
Pluggable graph optimizer discovery and initialization are transparent to end users. As long as a graph plugin library follows the design described in this RFC, it can be plugged into TensorFlow proper and add a new graph optimizer to TensorFlow Grappler.

The proposed C API in its current form will not be compatible with the new TensorFlow runtime ([TFRT](https://blog.tensorflow.org/2020/04/tfrt-new-tensorflow-runtime.html)) and graph compiler.
1. The nature and order of passes in the new compiler will differ significantly, probably allowing only limited reuse of algorithms or patterns from plugins developed for the existing runtime.
2. The graph compiler will not communicate with the plugin using a GraphDef based format, but some TBD format, likely based on serialized [MLIR](https://www.tensorflow.org/mlir).


## **User Benefit**

This RFC provides a plugin infrastructure for TensorFlow to optimize graphs with custom optimizers, provided users install the graph plugin properly.

## **Design Proposal**

### Design Overview

This RFC provides a new mechanism for custom graph optimizers, along with C APIs that let users register and implement their own pluggable graph optimizers in Grappler.
The C APIs follow the current C++ API; `TF_Buffer*` and the related proto files form the interface between TensorFlow proper and the plugin.
At initialization, TensorFlow loads the plugin and registers a new graph optimizer into Grappler. In the [Optimize](https://github.com/tensorflow/tensorflow/blob/r2.3/tensorflow/core/grappler/optimizers/graph_optimizer.h#L58) function, plugin authors deserialize the incoming `TF_Buffer` into a `plugin::GraphDef` object, perform graph transformations, and serialize the optimized `plugin::GraphDef` object back into a `TF_Buffer` as output.


<p align="center">
<img src="20201020-pluggable-graph-optimizer-for-tensorflow/flow.png" height="400"/>
</p>

### Graph Optimization function

The graph optimization function is the main part that plugin authors need to implement. The C++ API is:
```cpp
Status Optimize(Cluster* cluster, const GrapplerItem& item, GraphDef* optimized_graph);
```
The equivalent C API is shown below. Both input and output graphs are represented by serialized `TF_Buffer` objects:
```cpp
void MyOptimizer_Optimize(void* optimizer, TF_Buffer* graph_buf, TF_Buffer* optimized_graph_buf, TF_Status* s);
```
#### TF_Buffer and protobuf class

Grappler uses `GraphDef` to represent a graph and its operations. It is a C++ object generated by the protobuf toolchain from the predefined structure in graph.proto. `TF_Buffer` is a C struct holding a pointer to a block of data and its associated length, so it can carry a serialized protobuf object across the C API.
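For reference, here is a sketch of the `TF_Buffer` struct (modeled on its definition in `tensorflow/c/c_api.h`) along with two illustrative helpers showing how serialized bytes would cross the boundary. The helper names are ours, not part of the proposed API:

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <string>

// Sketch of the TF_Buffer definition from tensorflow/c/c_api.h.
typedef struct TF_Buffer {
  const void* data;
  size_t length;
  void (*data_deallocator)(void* data, size_t length);
} TF_Buffer;

static void FreeBufferData(void* data, size_t) { std::free(data); }

// Copy already-serialized protobuf bytes into a TF_Buffer
// (roughly what MessageToBuffer does after SerializeToString).
TF_Buffer MakeBufferFromSerialized(const std::string& serialized) {
  void* copy = std::malloc(serialized.size());
  std::memcpy(copy, serialized.data(), serialized.size());
  return TF_Buffer{copy, serialized.size(), &FreeBufferData};
}

// Recover the serialized bytes on the other side of the C API
// (roughly what BufferToMessage does before ParseFromString).
std::string SerializedFromBuffer(const TF_Buffer& buf) {
  return std::string(static_cast<const char*>(buf.data), buf.length);
}
```

The `data_deallocator` lets ownership of the byte block pass safely across the C boundary between proper and plugin.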

On the plugin side, plugin authors should first deserialize the `TF_Buffer` into a `plugin::GraphDef` object, and then transform the graph. To deserialize the buffer into exactly the same object as on the proper side, plugin authors must keep a copy of `graph.proto` in the plugin, along with all the other proto files needed to build a graph. The required files are:
- attr_value.proto
- cost_graph.proto
- function.proto
- graph.proto
- node_def.proto
- op_def.proto
- resource_handle.proto
- tensor.proto
- tensor_shape.proto
- types.proto
- versions.proto

After optimizing, plugin authors need to serialize the optimized `GraphDef` object back into `TF_Buffer` as output.
The serialization function `MessageToBuffer` is already defined; a deserialization function `BufferToMessage` will be added in proper to convert the plugin-passed `TF_Buffer` back into proper's `GraphDef`. On the plugin side, plugin authors need to define `P_MessageToBuffer` and `P_BufferToMessage`, which are the same as the proper versions except for the protobuf namespace.


Proper:
```cpp
Status MessageToBuffer(const tensorflow::protobuf::MessageLite& in, TF_Buffer* out);
Status BufferToMessage(const TF_Buffer* in, tensorflow::protobuf::MessageLite& out);
```
Plugin:
```cpp
Status P_MessageToBuffer(const plugin::protobuf::MessageLite& in, TF_Buffer* out);
Status P_BufferToMessage(const TF_Buffer* in, plugin::protobuf::MessageLite& out);
```
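A minimal sketch of what this conversion pair does, using a stand-in message type in place of the protobuf-generated classes and a simplified two-field `TF_Buffer`. The names `FakeMessage` and the function bodies are ours; the real implementations call `SerializeToString`/`ParseFromString` on a `MessageLite`:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <string>

// Simplified stand-in for TF_Buffer (deallocator omitted).
struct TF_Buffer { const void* data; size_t length; };

// Stand-in for a protobuf MessageLite: only the two methods the
// conversion helpers rely on.
struct FakeMessage {
  std::string payload;
  bool SerializeToString(std::string* out) const { *out = payload; return true; }
  bool ParseFromString(const std::string& in) { payload = in; return true; }
};

// Sketch of P_MessageToBuffer: serialize and hand the bytes across the C API.
bool P_MessageToBuffer(const FakeMessage& in, TF_Buffer* out) {
  std::string serialized;
  if (!in.SerializeToString(&serialized)) return false;
  char* copy = static_cast<char*>(std::malloc(serialized.size()));
  std::memcpy(copy, serialized.data(), serialized.size());
  out->data = copy;
  out->length = serialized.size();
  return true;
}

// Sketch of P_BufferToMessage: parse the bytes back into a message object.
bool P_BufferToMessage(const TF_Buffer* in, FakeMessage& out) {
  return out.ParseFromString(
      std::string(static_cast<const char*>(in->data), in->length));
}
```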

### Supported User Scenarios

This section describes user scenarios for plugin graph optimization.
Plugin graph optimization targets backend-device-specific optimization, and only one optimizer is allowed per device type. The device type is therefore used as a key: TensorFlow proper decides whether to run an optimizer by checking the graph's device type against the registered device type. To simplify coordination among multiple optimizers and avoid optimization conflicts, multiple optimizers cannot register to the same device type. If more than one optimizer registers to the same device type, their initialization fails due to a registration conflict, and users need to manually select which optimizer they want by unloading the conflicting plugin.
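The registration-conflict rule could be enforced with a registry keyed by device type. This is a minimal sketch under our own names, not the actual `CustomGraphOptimizerRegistry` implementation:

```cpp
#include <cassert>
#include <map>
#include <string>

// Sketch: registry mapping device type -> registered optimizer name.
static std::map<std::string, std::string>& OptimizerRegistry() {
  static std::map<std::string, std::string> registry;
  return registry;
}

// Returns false (registration conflict) if an optimizer is already
// registered for this device type; otherwise records the registration.
bool RegisterPluginOptimizer(const std::string& device_type,
                             const std::string& optimizer_name) {
  return OptimizerRegistry().emplace(device_type, optimizer_name).second;
}
```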

#### Selecting custom optimizers

Front-end Python users can list all registered custom optimizers through a Python API and choose which of them to turn on:


```python
>> opts = tf.config.optimizer.list_custom_optimizers()
>> print(opts)
['gpu_my_optimizer', 'shuffle_and_repeat_fusion', ...]

>> config = tf.compat.v1.ConfigProto()
>> my_optimizer = config.graph_options.rewrite_options.custom_optimizers.add()
>> my_optimizer.name = "gpu_my_optimizer"

```

#### Configuring existing optimizers

If a pluggable graph optimizer is registered to a device type, e.g., GPU, plugin authors need to decide whether some GPU-specific optimizations in proper should be turned on or off. When registering the optimizer, plugin authors populate flags in `P_RegistrationParams`; this configuration is what the plugin authors recommend, and users are free to apply it or override it with their own.


```cpp
// Plugin:
void InitGraphPlugin(P_RegistrationParams* params, TF_Status* status) {
  // Plugin authors can turn some proper optimizers on/off
  params->remapping = false;
  params->auto_mixed_precision = true;
  // ...
}

// Proper:
// In meta_optimizer, proper uses an additional plugin_remapping flag to decide
// whether the remapper should be turned on/off.
if (cfg_.remapping() != RewriterConfig::OFF &&
    CustomGraphOptimizerRegistry::plugin_remapping(device_type)) {
  optimizers->push_back(MakeUnique<Remapper>(cfg_.remapping()));
}
```
If this configuration differs from the defaults in proper, e.g., the remapper is on by default in proper but the plugin turns it off, a warning will be raised to remind users. Users can apply the plugin-recommended configuration as follows:
```python
>> tf.config.optimizer.set_plugin_optimizer_params("gpu_my_optimizer")

```

#### Execution order

Custom optimizers will be registered at the end of Grappler’s meta-optimizer. Plugin authors should be aware of the restrictions of their pass running at this specific point in the execution pipeline.

### Detailed C APIs

#### Registration

```cpp
typedef struct P_RegistrationParams {
  char* name;
  char* device;
  // Flags indicating whether existing optimizers should be turned on/off
  bool remapping;
  bool auto_mixed_precision;
} P_RegistrationParams;

void InitGraphPlugin(P_RegistrationParams* params, TF_Status* status);

// Struct for the optimizer builder. Plugin authors must provide an optimize
// function. Creation and deletion functions are optional.
typedef struct TF_OptimizerBuilder {
  void* (*create_func)();
  void (*optimize_func)(void*, TF_Buffer*, TF_Buffer*, TF_Status*);
  void (*delete_func)(void*);
} TF_OptimizerBuilder;

// Optimizer registration API
void TF_RegisterOptimizer(TF_OptimizerBuilder* builder, P_RegistrationParams* params, TF_Status* status);
```
#### Graph optimization
```cpp
// TF_GrapplerItem represents a combination of a graph, one or more fetch nodes,
// and potentially a set of nodes to feed.
typedef struct TF_GrapplerItem TF_GrapplerItem;
// Get the TF_GrapplerItem from a TF_Buffer.
TF_GrapplerItem* TF_GetGrapplerItem(TF_Buffer* buffer);
// Get the set of node names that must be preserved. This includes feed and
// fetch nodes, keep_ops, and init_ops.
void TF_GetNodesToPreserve(TF_GrapplerItem* item, void** val, int* length, int max_size);
// Get node names for fetch nodes.
void TF_GetFetch(TF_GrapplerItem* item, void** val, int* length, int max_size);

// TF_GraphProperties is used to infer tensor properties from a graph, e.g. tensor shape.
typedef struct TF_GraphProperties TF_GraphProperties;
TF_GraphProperties* TF_NewGraphProperties(TF_GrapplerItem* item);
void TF_DeleteGraphProperties(TF_GraphProperties* p);
// Infer tensor shapes of each node.
void TF_InferStatically(TF_GraphProperties* g_prop,
                        bool assume_valid_feeds,
                        bool aggressive_shape_inference,
                        bool include_input_tensor_values,
                        bool include_output_tensor_values,
                        TF_Status* s);
// Get a list of input properties, including data types and shapes.
void TF_GetInputPropertiesSize(TF_GraphProperties* g_prop, const char* name, int* max_size);
void TF_GetInputProperties(TF_GraphProperties* g_prop, const char* name, TF_Buffer** t_prop, int max_size);

// TF_FunctionLibraryDefinition is used to look up an OpDef by type name.
typedef struct TF_FunctionLibraryDefinition TF_FunctionLibraryDefinition;
TF_FunctionLibraryDefinition* TF_NewFunctionLibraryDefinition(TF_Buffer* graph_buf);
void TF_DeleteFunctionLibraryDefinition(TF_FunctionLibraryDefinition* f_lib);
// Shorthand for calling LookUp to get the OpDef.
void TF_LookUpOpDef(TF_FunctionLibraryDefinition* f_lib, const char* name, TF_Buffer* buf, TF_Status* s);
```
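Several of the APIs above (e.g. `TF_GetNodesToPreserve` and the `TF_GetInputPropertiesSize`/`TF_GetInputProperties` pair) follow the common C two-call convention: query the size first, then fill a caller-allocated array. A self-contained sketch of that convention; the data and function names here are ours, purely illustrative:

```cpp
#include <cassert>
#include <cstring>

// Stand-in backing data for what a TF_GrapplerItem might hold.
static const char* kPreservedNodes[] = {"feed", "fetch", "init_op"};
static const int kNumPreservedNodes = 3;

// First call: report how many entries the caller must allocate for.
void GetNodesToPreserveSize(int* size) { *size = kNumPreservedNodes; }

// Second call: fill the caller-allocated array, up to max_size entries.
void GetNodesToPreserve(const char** val, int max_size) {
  for (int i = 0; i < max_size && i < kNumPreservedNodes; ++i) {
    val[i] = kPreservedNodes[i];
  }
}
```

This convention keeps allocation on the caller's side of the ABI boundary, so proper and plugin never free each other's memory.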

TensorFlow proper provides a series of helper functions in the Grappler utils folder to make graph modification more convenient. Since creating C APIs for these functions would be very messy, they are not included in the C APIs. Plugin authors can copy this code into the plugin side, or write their own utility functions.

### Usage example
#### Plugin
Define create, optimize, and delete functions for the custom optimizer.

```cpp
typedef struct MyOptimizer { ... } MyOptimizer;

// Plugin authors must provide an optimize function. Creation and deletion functions are optional.
static void* MyOptimizer_Create() {
  auto* optimizer = new MyOptimizer;
  return (void*)optimizer;
}

static void MyOptimizer_Delete(void* optimizer) {
  delete static_cast<MyOptimizer*>(optimizer);
}

static void MyOptimizer_Optimize(
    void* optimizer, TF_Buffer* graph_buf,
    TF_Buffer* optimized_graph_buf, TF_Status* s) {
  // 1. Get TF_GrapplerItem from graph_buf
  TF_GrapplerItem* item = TF_GetGrapplerItem(graph_buf);

  // 2. Deserialize graph_buf into plugin::GraphDef
  plugin::GraphDef graph_def;
  P_BufferToMessage(graph_buf, graph_def);

  // 3. Infer shapes
  TF_GraphProperties* g_prop = TF_NewGraphProperties(item);
  TF_InferStatically(g_prop, false, false, false, false, s);
  int max_size;
  TF_GetInputPropertiesSize(g_prop, "node1", &max_size);
  TF_Buffer** in_prop_buf = new TF_Buffer*[max_size];
  for (int i = 0; i < max_size; i++) {
    in_prop_buf[i] = TF_NewBuffer();
  }
  TF_GetInputProperties(g_prop, "node1", in_prop_buf, max_size);
  plugin::OpInfo::TensorProperties in_prop;
  for (int i = 0; i < max_size; i++) {
    P_BufferToMessage(in_prop_buf[i], in_prop);
    TF_DeleteBuffer(in_prop_buf[i]);
  }
  delete[] in_prop_buf;

  // 4. Get OpDef
  TF_FunctionLibraryDefinition* f_lib = TF_NewFunctionLibraryDefinition(graph_buf);
  plugin::OpDef op_def;
  TF_Buffer* op_buf = TF_NewBuffer();
  plugin::NodeDef node_def;
  TF_LookUpOpDef(f_lib, node_def.name().c_str(), op_buf, s);
  P_BufferToMessage(op_buf, op_def);
  TF_DeleteBuffer(op_buf);
  DataType dt = op_def.input_arg(0).type();

  // 5. Transform graph

  // 6. Serialize the optimized graph into the output buffer.
  P_MessageToBuffer(graph_def, optimized_graph_buf);
  TF_DeleteGraphProperties(g_prop);
  TF_DeleteFunctionLibraryDefinition(f_lib);
}
```
Define `InitGraphPlugin` that TensorFlow will call when registering the plugin:
```cpp
void InitGraphPlugin(P_RegistrationParams* params, TF_Status* status) {
  params->name = "my_optimizer";
  params->device = "GPU";
  // Set flags indicating whether existing optimizers should be turned on/off
  params->remapping = false;
  params->auto_mixed_precision = true;
  // Create a new builder and register it with TF_RegisterOptimizer
  TF_OptimizerBuilder builder = {&MyOptimizer_Create, &MyOptimizer_Optimize,
                                 &MyOptimizer_Delete};
  TF_RegisterOptimizer(&builder, params, status);
  if (TF_GetCode(status) != TF_OK) { /* handle errors */ }
}
```

#### TensorFlow proper

During plugin library initialization, TensorFlow proper calls the `InitGraphPlugin` API (part of the Graph C API). It is defined in the plugin; plugin authors implement it to register a new custom graph optimizer.

```cpp
static Status InitGraphModule(void* dso_handle) {
  void* dso_symbol;
  tensorflow::Env* env = tensorflow::Env::Default();
  TF_RETURN_IF_ERROR(
      env->GetSymbolFromLibrary(dso_handle, "InitGraphPlugin", &dso_symbol));

  using InitGraphPlugin = void (*)(P_RegistrationParams*, TF_Status*);
  auto init_plugin_fn = reinterpret_cast<InitGraphPlugin>(dso_symbol);

  P_RegistrationParams params;
  TF_Status* status = TF_NewStatus();
  init_plugin_fn(&params, status);
  Status s = StatusFromTF_Status(status);
  TF_DeleteStatus(status);
  return s;
}
```
`TF_RegisterOptimizer` C API relies on `CCustomGraphOptimizer`, which might look as follows:
```cpp
class CCustomGraphOptimizer : public CustomGraphOptimizer {
 public:
  explicit CCustomGraphOptimizer(
      const char* device,
      void* (*create_func)(),
      void (*delete_func)(void*),
      void (*optimize_func)(void*, TF_Buffer*, TF_Buffer*, TF_Status*))
      : optimize_func_(optimize_func), delete_func_(delete_func) {
    if (create_func != nullptr) {
      c_optimizer_ = (*create_func)();
    } else {
      c_optimizer_ = nullptr;
    }
  }

  Status Optimize(Cluster* cluster, const GrapplerItem& item,
                  GraphDef* optimized_graph_def) override {
    // Call the C optimize_func
  }

 private:
  void (*optimize_func_)(void*, TF_Buffer*, TF_Buffer*, TF_Status*);
  void (*delete_func_)(void*);
  void* c_optimizer_;
};
```

### Testing

Minor TensorFlow releases may break graph optimizations in plugins, since op versions and the graph patterns used to implement a particular public TensorFlow Python API are not covered by TensorFlow's compatibility guarantees. Therefore, plugin authors have to run both end-to-end Python tests and golden-graph tests to ensure that their optimizations work as expected.

### **Alternatives Considered**

### **Performance Implications**

The roundtrip serialization of protobuf objects is a performance risk, but it should be acceptable since it is only done once per session or concrete function instantiation.

### **Dependencies**

* It depends on third-party library [ProtoBuf](https://developers.google.com/protocol-buffers/)

* It depends on a series of proto files defined in TensorFlow. Plugin authors must keep a copy of those files in plugin.

* It depends on Modular TensorFlow [RFC](https://github.com/tensorflow/community/blob/master/rfcs/20190305-modular-tensorflow.md)

### **Engineering Impact**

* The impact on binary size / startup time / build time / test times is minimal.

* The TensorFlow team will maintain this code. Graph C API will be packaged along with other C APIs that TensorFlow currently has.

### **Platforms and Environments**

* The pluggable graph mechanism is based on `LoadLibrary()`, so it should work on all the platforms supported by `LoadLibrary`. The other enhancements to TensorFlow proper are platform-independent.

### **Best Practices**

* This works with Modular TensorFlow, which will be the only way to integrate new custom graph optimizers into the current TensorFlow stack.

### **Compatibility**

* The RFC promotes the current TensorFlow ecosystem as it supports plugging new graph optimizers to TensorFlow.

* We don't expect this proposal to impact other parts of the TensorFlow ecosystem. It doesn't support TFLite or the new TensorFlow runtime (TFRT). It should not impede distribution strategies and would not interact with tf.function and SavedModel.