# Serving the Model Online


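The model saved in 03-split-to-sub-graph.ipynb is in `TensorFlow`'s standard [SavedModel](https://www.tensorflow.org/guide/saved_model) format. Building on that demo, this section describes how to deploy the offline model online so that it can serve predictions.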

### 1. Compile the model
- Compile the model with `saved_model_cli aot_compile_cpu`.

  The input to the compiler is a directory in SavedModel format, as shown below.

In [6]:
!tree /tmp/wide-deep-test/model/tmp

/tmp/wide-deep-test/model/tmp
├── assets
├── saved_model.pb
└── variables
    ├── variables.data-00000-of-00001
    └── variables.index

2 directories, 3 files


In [35]:
!/da1/s/yaolei/anaconda3/bin/saved_model_cli aot_compile_cpu \
                                 --dir /tmp/wide-deep-test/model/tmp  \
                                 --tag_set serve \
                                 --output_prefix /tmp/model/online_serving/graph/graph \
                                 --cpp_class Graph

2020-08-26 14:27:15.353717: I tensorflow/core/grappler/devices.cc:60] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0 (Note: TensorFlow was not compiled with CUDA support)
2020-08-26 14:27:15.354013: I tensorflow/core/grappler/clusters/single_machine.cc:356] Starting new session
2020-08-26 14:27:15.366369: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 2593325000 Hz
2020-08-26 14:27:15.368276: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7f7194f85a90 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-08-26 14:27:15.368314: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-08-26 14:27:15.480529: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:563] model_pruner failed: Invalid argument: Invalid input graph.
2020-08-26 14:27:15.485756: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:797] Optimization results
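
When the compile step has to run outside the notebook, the same invocation can be assembled in Python. The sketch below only builds the argument list from the four flags used in the cell above. The helper name `build_aot_command` is made up for illustration, and `saved_model_cli` is assumed to be on `PATH`.

```python
import shlex


def build_aot_command(model_dir, output_prefix, cpp_class, tag_set="serve"):
    """Return the argv list for the `saved_model_cli aot_compile_cpu` call (hypothetical helper)."""
    return [
        "saved_model_cli", "aot_compile_cpu",
        "--dir", model_dir,
        "--tag_set", tag_set,
        "--output_prefix", output_prefix,
        "--cpp_class", cpp_class,
    ]


cmd = build_aot_command(
    model_dir="/tmp/wide-deep-test/model/tmp",
    output_prefix="/tmp/model/online_serving/graph/graph",
    cpp_class="Graph",
)
# Print a copy-pasteable command line; to execute it, pass `cmd` to subprocess.run(cmd, check=True).
print(shlex.join(cmd))
```

Keeping the arguments as a list avoids shell quoting problems if a path contains spaces.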

In [36]:
!tree /tmp/model/online_serving/graph/

/tmp/model/online_serving/graph/
├── graph.h
├── graph_makefile.inc
├── graph_metadata.o
└── graph.o

0 directories, 4 files
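
A quick check that all four artifacts were written can catch a failed compile before the later build steps. This is a minimal sketch: the suffix list is read off the directory listing above, and `missing_aot_outputs` is a hypothetical helper, not part of TensorFlow or tensornet.

```python
from pathlib import Path
import tempfile

# Suffixes observed in the listing above for --output_prefix .../graph
AOT_SUFFIXES = (".h", "_makefile.inc", "_metadata.o", ".o")


def missing_aot_outputs(output_prefix):
    """Return the names of expected AOT artifacts that do not exist for this prefix."""
    expected = [Path(output_prefix + suffix) for suffix in AOT_SUFFIXES]
    return [p.name for p in expected if not p.is_file()]


# Demo on a throwaway directory that contains only the header.
with tempfile.TemporaryDirectory() as tmp:
    prefix = str(Path(tmp) / "graph")
    Path(prefix + ".h").write_text("// placeholder\n")
    demo_missing = missing_aot_outputs(prefix)
    print(demo_missing)
```

An empty list means every file from the listing is present.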


**Tip**:

TensorFlow must be built with the `--define=with_xla_support=true` option; otherwise compiling the model fails with the following error:

```bash
Traceback (most recent call last):
  File "/da1/s/yaolei/anaconda3/bin/saved_model_cli", line 8, in <module>
    sys.exit(main())
  File "/da1/s/yaolei/anaconda3/lib/python3.7/site-packages/tensorflow/python/tools/saved_model_cli.py", line 1153, in main
    args.func(args)
  File "/da1/s/yaolei/anaconda3/lib/python3.7/site-packages/tensorflow/python/tools/saved_model_cli.py", line 811, in aot_compile_cpu
    enable_multithreading=args.enable_multithreading)
  File "/da1/s/yaolei/anaconda3/lib/python3.7/site-packages/tensorflow/python/tools/saved_model_aot_compile.py", line 258, in aot_compile_cpu_meta_graph_def
    raise _pywrap_tfcompile_import_error
ImportError: Unable to import _pywrap_tfcompile; you must build TensorFlow with XLA.  You may need to build tensorflow with flag --define=with_xla_support=true.  Original error: cannot import name '_pywrap_tfcompile' from 'tensorflow.python' (/da1/s/yaolei/anaconda3/lib/python3.7/site-packages/tensorflow/python/__init__.py)
```
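
Whether the installed TensorFlow has this support can be probed before running the CLI. The sketch below tries to import the module named in the traceback. The module path is taken from that error message and may differ in other TensorFlow versions, and `has_tfcompile` is a hypothetical helper name.

```python
import importlib


def has_tfcompile():
    """Return True if the module that the traceback above fails to import is available."""
    try:
        importlib.import_module("tensorflow.python._pywrap_tfcompile")
    except ImportError:
        # Covers both a missing TensorFlow install and a build without XLA AOT support.
        return False
    return True


if not has_tfcompile():
    print("aot_compile_cpu unavailable: TensorFlow needs --define=with_xla_support=true at build time")
```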

### 2. Call the model
- Write graph.cc to call the model.

  A script under examples generates graph.cc in one step.

In [12]:
 !cd /da1/s/yaolei/tensornet/examples/online_serving/ && sh graph_cc_generator.sh

++ grep 'static constexpr size_t kNumArgs' /da1/s/yaolei/tensornet/examples/online_serving/graph.h
++ awk '-F;' '{print $1}'
++ awk '{print $NF}'
+ slot_num=8
+ python3 gen_graph.py 8
+ '[' 0 -ne 0 ']'
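
The trace shows the script reading `kNumArgs` from graph.h with `grep` and `awk`, then passing the value to gen_graph.py. The same extraction can be sketched in Python. The sample header line is a stand-in written to match the `grep` pattern, not a copy of the real graph.h, and `read_num_args` is a hypothetical name.

```python
import re

# Pattern mirrors the grep expression shown in the trace of graph_cc_generator.sh.
_KNUMARGS = re.compile(r"static constexpr size_t kNumArgs\s*=\s*(\d+)\s*;")


def read_num_args(header_text):
    """Extract the kNumArgs value from the text of an AOT-generated header."""
    match = _KNUMARGS.search(header_text)
    if match is None:
        raise ValueError("kNumArgs not found in header")
    return int(match.group(1))


# Stand-in for the relevant line of graph.h.
sample = "  static constexpr size_t kNumArgs = 8;\n"
slot_num = read_num_args(sample)
print(slot_num)  # 8, the same value as slot_num=8 in the trace
```

Raising an error on a missing match makes a changed header format visible immediately, instead of passing an empty value on to the generator.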


In [17]:
!cat /da1/s/yaolei/tensornet/examples/online_serving/graph.cc

// This file is MACHINE GENERATED! Do not edit.
// Source file is gen_graph.py

#include "examples/online_serving/graph.h"

#include <vector>

#define EIGEN_USE_THREADS
#define EIGEN_USE_CUSTOM_THREAD_POOL

#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"

extern "C" int Run(const std::vector<std::vector<std::vector<float>>>& input,
                   std::vector<float>& output) {
    Eigen::ThreadPool tp(std::thread::hardware_concurrency());
    Eigen::ThreadPoolDevice device(&tp, tp.NumThreads());
    Graph graph;
    graph.set_thread_pool(&device);

    std::vector<int> dim = {1, 8};
    for (size_t i = 0; i < input.size(); ++i) {
        if (input[i].size() != Graph::kNumArgs / 2) {
            std::cerr << "TFFeaValues size is wrong, expected " << Graph::kNumArgs / 2 << " but get" << input[i].size() << std::endl;
            return -1;
        }
        if (input[i][0].size() != dim[0] + dim[1]) {
            std::cerr << "embedding size is wr

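### 3. Build the shared library used in production

- Write the bazel build rules. **Note** that the AOT-compiled graph.o is renamed to graph_c.o here, to avoid a conflict when graph.cc is compiled.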
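
The rename can be scripted when the AOT outputs are copied next to graph.cc. This is a sketch under the assumption that graph.h and graph.o are copied from the AOT output directory into examples/online_serving, which is where the BUILD file looks for them. `stage_aot_files` is a hypothetical helper.

```python
import shutil
import tempfile
from pathlib import Path


def stage_aot_files(aot_dir, dest_dir):
    """Copy graph.h unchanged and graph.o under the new name graph_c.o."""
    aot_dir, dest_dir = Path(aot_dir), Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy2(aot_dir / "graph.h", dest_dir / "graph.h")
    shutil.copy2(aot_dir / "graph.o", dest_dir / "graph_c.o")
    return sorted(p.name for p in dest_dir.iterdir())


# Demo with placeholder files in temporary directories.
with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
    (Path(src) / "graph.h").write_text("// header\n")
    (Path(src) / "graph.o").write_bytes(b"\x00")
    staged = stage_aot_files(src, dst)
    print(staged)
```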

In [18]:
!cat /da1/s/yaolei/tensornet/examples/BUILD

filegroup(
    name = "graph",
    srcs = [
        "online_serving/graph.cc",
        "online_serving/graph_c.o",
        "online_serving/graph.h",
    ],
)

cc_binary(
    name = "libmodel.so",
    srcs = [":graph"],
    deps = [
        "@org_tensorflow//tensorflow/compiler/tf2xla:xla_compiled_cpu_function",
        "@org_tensorflow//tensorflow/core:framework_lite",
        "@org_tensorflow//tensorflow/compiler/xla/service/cpu:runtime_conv2d",
        "@org_tensorflow//tensorflow/compiler/xla/service/cpu:runtime_key_value_sort",
        "@org_tensorflow//tensorflow/compiler/xla/service/cpu:runtime_matmul",
        "@org_tensorflow//tensorflow/compiler/xla/service/cpu:runtime_single_threaded_conv2d",
        "@org_tensorflow//tensorflow/compiler/xla/service/cpu:runtime_single_threaded_matmul",
        "@org_tensorflow//third_party/eigen3:eigen3",
    ],
    linkshared = 1,
    linkopts = ["-lpthread"],
    copts = ["-fPIC"],
)

cc_binary(
    name = "tf_se

In [15]:
!cd /da1/s/yaolei/tensornet && ./bazel build -c opt examples:libmodel.so

Loading:
Loading: 0 packages loaded
Analyzing: target //examples:libmodel.so (0 packages loaded, 0 targets configured)
INFO: Analyzed target //examples:libmodel.so (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
Target //examples:libmodel.so up-to-date:
  bazel-bin/examples/libmodel.so
INFO: Elapsed time: 2.446s, Critical Path: 0.00s
INFO: 0 processes.
INFO: Build completed successfully, 1 total action

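### 4. Call libmodel.so online for prediction
- Online prediction can call the library directly through dlopen; the `Run` symbol corresponds to the `Run` function defined in graph.cc. See `/da1/s/yaolei/tensornet/examples/online_serving/main.cc` for the calling code.

  **Note:** the example below does not implement embedding lookup; it uses random numbers in place of real embeddings.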
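
Before wiring the library into a service, it can help to confirm that it loads and exports `Run`. The sketch below does this through `ctypes`, which wraps dlopen and dlsym on POSIX systems. It only resolves the symbol: `Run` takes `std::vector` references, so it cannot be called from Python this way. The library path is the one printed by bazel, relative to the repository root, and `exports_symbol` is a hypothetical helper.

```python
import ctypes


def exports_symbol(lib_path, symbol):
    """Return True if the shared library loads and exports `symbol`."""
    try:
        lib = ctypes.CDLL(lib_path)  # dlopen under the hood
    except OSError:
        return False
    # Attribute lookup performs the symbol lookup; a missing symbol raises AttributeError.
    return hasattr(lib, symbol)


# `Run` is declared extern "C" in graph.cc, so its name is not mangled.
print(exports_symbol("bazel-bin/examples/libmodel.so", "Run"))
```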

In [22]:
!cat /da1/s/yaolei/tensornet/examples/online_serving/main.cc

#include <iostream>
#include <chrono>
#include <fstream>
#include <string>
#include <vector>
#include <map>
#include <boost/algorithm/string.hpp>
#include <dlfcn.h>

#include "random.h"

using namespace std::chrono;

typedef int (*RUN_FUNC)(const std::vector<std::vector<std::vector<float>>>&,
                        const std::vector<int>&,
                        std::vector<float>&);

const int k_batch_size = 32;

void InitWeight(int dim, std::vector<float>& weight) {
    auto& reng = tensornet::local_random_engine();                                                                                         
    auto distribution = std::normal_distribution<float>(0, 1 / sqrt(dim));

    for (int i = 0; i < dim; ++i) {
        weight.push_back(distribution(reng) * 0.001);
    }   
}

int combine_fea(std::vector<std::vector<float>> emb_feas, std::vector<float>& merged_feas) {
    if (emb_feas.size() % 2 != 0) {
        std::cerr << "combine_fea error." << std

In [21]:
!cd /da1/s/yaolei/tensornet && ./bazel build -c opt examples:tf_serving

Loading:
Loading: 0 packages loaded
Analyzing: target //examples:tf_serving (0 packages loaded, 0 targets configured)
INFO: Analyzed target //examples:tf_serving (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
Target //examples:tf_serving up-to-date:
  bazel-bin/examples/tf_serving
INFO: Elapsed time: 0.141s, Critical Path: 0.00s
INFO: 0 processes.
INFO: Build completed successfully, 1 total action

- Prepare the runtime environment: place the built libmodel.so and tf_serving in the test directory below.

In [23]:
!tree /da1/s/yaolei/tensornet/examples/online_serving/test_env/

/da1/s/yaolei/tensornet/examples/online_serving/test_env/
├── data
│   ├── feature.data
│   ├── libmodel.so
│   └── slot.data
└── tf_serving

1 directory, 4 files


  The slot order in slot.data must match the order of WIDE_SLOTS and DEEP_SLOTS in wide_deep.py.

In [25]:
!cat /da1/s/yaolei/tensornet/examples/online_serving/test_env/data/slot.data

1,2,3,4
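
A mismatch in slot order can be detected before serving. The sketch below assumes slot.data holds a single comma-separated line, as in the output above, and compares it against one slot list. The `WIDE_SLOTS` value here is a placeholder rather than the contents of wide_deep.py, and both helper names are hypothetical.

```python
def parse_slots(text):
    """Parse the comma-separated slot line of slot.data into a list of strings."""
    return [s.strip() for s in text.strip().split(",") if s.strip()]


def first_mismatch(actual, expected):
    """Return the index of the first differing slot, or None if the order matches."""
    for i, (a, e) in enumerate(zip(actual, expected)):
        if a != e:
            return i
    if len(actual) != len(expected):
        # All shared positions agree, so the difference starts where the shorter list ends.
        return min(len(actual), len(expected))
    return None


# Placeholder list; in practice, import WIDE_SLOTS and DEEP_SLOTS from wide_deep.py.
WIDE_SLOTS = ["1", "2", "3", "4"]
slots = parse_slots("1,2,3,4\n")
print(first_mismatch(slots, WIDE_SLOTS))
```

The same comparison can be repeated against `DEEP_SLOTS`.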


  Construct feature.data in the format that main.cc parses. Some special separator characters are not displayed correctly in the output below.

In [28]:
!cat /da1/s/yaolei/tensornet/examples/online_serving/test_env/data/feature.data

0	1-1956697246319764053	2-5730244542641024933	3-9118175470622903910-8448113875518360108	4-2457261431940944054
0	1-1956697246319764053	2-1160342140770244045	3-9282460046502382416746402536336743089	48069461642963552018
0	1-1956697246319764053	2-7395780378584928338	3-81982606186909154355518680928552928316	41517509480232003656
0	1-1956697246319764053	25418959032072182947	3-73280572481107405052638487947984888231	4-3889211362256458670
0	1-1956697246319764053	2-1447814788092430700	313141364156924037955677372407420628171	48002349267150817951
0	1-1956697246319764053	2-3444864941673928379	343676846390070809168517447599233938262	47249151252390487352
0	1-1956697246319764053	2-25308817390645703	3-7048581416920648308415903991668524816	41517509480232003656
0	1-1956697246319764053	2-3444864941673928379	353805355444342988164526355427571578606	4-4517525648429849306
0	1-1956697246319764053	2-3444864941673928379	353805355444342988165299058237759487470	4

  Run `tf_serving` to print the predictions. Because this demo does not use real embeddings, the predicted values are not meaningful.

In [29]:
!cd /da1/s/yaolei/tensornet/examples/online_serving/test_env/ && ./tf_serving

0.517147
0.517147
0.517104
0.517199
0.517208
0.517205
0.517218
0.517072
0.517067
