# Using the Model Online


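The model saved in [03-split-to-sub-graph.ipynb](./03-split-to-sub-graph.ipynb) is in TensorFlow's standard [SavedModel](https://www.tensorflow.org/guide/saved_model) format. Building on the demo from the previous sections, this section describes how to deploy the offline model online to serve predictions.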

## 1. Build TensorFlow with XLA support

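TensorFlow installed with `pip install` does not have XLA enabled by default. To use XLA, you need to build TensorFlow from source and install it yourself. The build command is: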

    bazel build --config=opt --define=with_xla_support=true //tensorflow/tools/pip_package:build_pip_package

## 2. Compile the model

The model used here is the one saved in [03-split-to-sub-graph.ipynb](./03-split-to-sub-graph.ipynb), shown below:

In [1]:
MODEL_DIR = '/tmp/wide-deep-test/model'

In [2]:
!tree $MODEL_DIR/saved_model

/tmp/wide-deep-test/model/saved_model
├── assets
├── saved_model.pb
└── variables
    ├── variables.data-00000-of-00001
    └── variables.index

2 directories, 3 files


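Installing the TensorFlow package also installs `saved_model_cli` into Anaconda's bin directory by default. Add that bin directory to your `PATH` environment variable so that the command can be found.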
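
Before running the compile step, it can help to confirm that the command is actually reachable. The following is a minimal sketch using only the Python standard library; the helper name `find_saved_model_cli` and its `extra_bin_dir` parameter are hypothetical and not part of TensorFlow or tensornet.

```python
import os
import shutil
from typing import Optional


def find_saved_model_cli(extra_bin_dir: Optional[str] = None) -> Optional[str]:
    """Return the full path of saved_model_cli, or None if it is not found.

    extra_bin_dir is a hypothetical convenience parameter: a directory
    (for example an Anaconda bin directory) that is searched before PATH.
    """
    search_path = os.environ.get("PATH", "")
    if extra_bin_dir:
        search_path = extra_bin_dir + os.pathsep + search_path
    return shutil.which("saved_model_cli", path=search_path)


if __name__ == "__main__":
    location = find_saved_model_cli()
    print(location or "saved_model_cli not found; add the Anaconda bin directory to PATH")
```

If the helper returns `None`, the `!saved_model_cli ...` cell below would fail with a "command not found" error, so fixing `PATH` first saves a round trip.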

In [3]:
TENSORNET_SOURCE_CODE_DIR='/da2/zhangyansheng/tensornet' # change this to the location of your tensornet source tree
GRAPH_HEADER_OUTPUT_DIR=TENSORNET_SOURCE_CODE_DIR + '/' + 'examples/online_serving' 

In [5]:
!saved_model_cli aot_compile_cpu \
             --dir $MODEL_DIR/saved_model  \
             --tag_set serve \
             --output_prefix $GRAPH_HEADER_OUTPUT_DIR/graph \
             --cpp_class Graph

2020-09-08 10:11:23.691866: I tensorflow/core/grappler/devices.cc:60] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0 (Note: TensorFlow was not compiled with CUDA support)
2020-09-08 10:11:23.692053: I tensorflow/core/grappler/clusters/single_machine.cc:356] Starting new session
2020-09-08 10:11:23.702353: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 2593325000 Hz
2020-09-08 10:11:23.704053: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7f38338cfa30 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-09-08 10:11:23.704083: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-09-08 10:11:23.783678: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:563] model_pruner failed: Invalid argument: Invalid input graph.
2020-09-08 10:11:23.788314: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:797] Optimization results

In [6]:
!tree $GRAPH_HEADER_OUTPUT_DIR

/da2/zhangyansheng/tensornet/examples/online_serving
├── BUILD
├── graph.cc
├── graph.h
├── graph_makefile.inc
├── graph_metadata.o
├── graph.o
├── main.cc
├── random.h
└── test_env
    └── data
        ├── feature.data
        └── slot.data

2 directories, 10 files
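
In the listing above, `graph.h`, `graph.o`, `graph_metadata.o` and `graph_makefile.inc` share the `graph` output prefix and appear to be the files produced by the compile step, while `graph.cc`, `main.cc`, `BUILD` and `random.h` belong to the example itself. A small sketch for checking that the compile step left all four files behind could look like this; the helper name `missing_aot_outputs` and the suffix list are assumptions taken from that listing, not an official API.

```python
import os
from typing import List

# Suffixes appended to --output_prefix, as seen in the directory listing above.
GENERATED_SUFFIXES = (".h", ".o", "_metadata.o", "_makefile.inc")


def missing_aot_outputs(output_prefix: str) -> List[str]:
    """Return the expected generated files that do not exist on disk."""
    expected = [output_prefix + suffix for suffix in GENERATED_SUFFIXES]
    return [path for path in expected if not os.path.isfile(path)]


if __name__ == "__main__":
    # Example prefix relative to the tensornet source tree.
    prefix = os.path.join("examples", "online_serving", "graph")
    for path in missing_aot_outputs(prefix):
        print("missing:", path)
```

An empty result means every expected file is present and the later `bazel build` steps have what they need from this stage.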


## 3. Call the model

Write `graph.cc` to call the model. An example is already provided in the `examples/online_serving` directory of the source tree and can be used as a reference.

In [7]:
!cat $GRAPH_HEADER_OUTPUT_DIR/graph.cc


#include "examples/online_serving/graph.h"

#include <vector>

#define EIGEN_USE_THREADS
#define EIGEN_USE_CUSTOM_THREAD_POOL

#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"

extern "C" int Run(const std::vector<std::vector<std::vector<float>>>& input,
                   std::vector<float>& output) {
    Eigen::ThreadPool tp(std::thread::hardware_concurrency());
    Eigen::ThreadPoolDevice device(&tp, tp.NumThreads());
    Graph graph;
    graph.set_thread_pool(&device);

    std::vector<int> dim = {1, 8};
    for (size_t i = 0; i < input.size(); ++i) {
        if (input[i].size() != Graph::kNumArgs / 2) {
            std::cerr << "TFFeaValues size is wrong, expected " << Graph::kNumArgs / 2 << " but get" << input[i].size() << std::endl;
            return -1;
        }
        if ((int)input[i][0].size() != dim[0] + dim[1]) {
            std::cerr << "embedding size is wrong, expected " << dim[0] + dim[1] << " but get" << input[i][0].size() << st

## 4. Build the final shared library libmodel.so

- Run the following command to build libmodel.so

```bash
cd $TENSORNET_SOURCE_CODE_DIR && bazel build -c opt //examples/online_serving:libmodel.so
```

## 5. Build tf_serving and call libmodel.so for prediction

- Build tf_serving
    
```bash
cd $TENSORNET_SOURCE_CODE_DIR && bazel build -c opt //examples/online_serving:tf_serving
```

To run predictions, load `libmodel.so` with `dlopen` and call its `Run` symbol, which corresponds to the `Run` function defined in graph.cc. For the exact calling code, see `$TENSORNET_SOURCE_CODE_DIR/examples/online_serving/main.cc`.

**Note:** The example below does not implement embedding lookup; it uses random numbers in place of real embeddings.

In [8]:
!cat $TENSORNET_SOURCE_CODE_DIR/examples/online_serving/main.cc

#include <iostream>
#include <chrono>
#include <fstream>
#include <string>
#include <vector>
#include <map>
#include <boost/algorithm/string.hpp>
#include <dlfcn.h>

#include "random.h"

using namespace std::chrono;

typedef int (*RUN_FUNC)(const std::vector<std::vector<std::vector<float>>>&,
                        std::vector<float>&);

const int k_batch_size = 32;

void InitWeight(int dim, std::vector<float>& weight) {
    weight.clear();
    auto& reng = tensornet::local_random_engine();
    auto distribution = std::normal_distribution<float>(0, 1 / sqrt(dim));

    for (int i = 0; i < dim; ++i) {
        weight.push_back(distribution(reng) * 0.001);
    }
}

int combine_fea(std::vector<std::vector<float>> emb_feas, std::vector<float>& merged_feas) {
    if (emb_feas.size() % 2 != 0) {
        std::cerr << "combine_fea error." << std::endl;
        return -1;
    }
    float wide_lr = 0.0;
    std::vector<float> dnn_vec(8, 0.0);
    for (size_t i 

## 6. Run

Some test data is already provided in the `$TENSORNET_SOURCE_CODE_DIR/examples/online_serving/test_env/` directory. Copy the compiled `tf_serving` and `libmodel.so` into that same directory so that they can be run there.

In [9]:
!cp -f $TENSORNET_SOURCE_CODE_DIR/bazel-bin/examples/online_serving/libmodel.so $TENSORNET_SOURCE_CODE_DIR/examples/online_serving/test_env/
!cp -f $TENSORNET_SOURCE_CODE_DIR/bazel-bin/examples/online_serving/tf_serving $TENSORNET_SOURCE_CODE_DIR/examples/online_serving/test_env/

In [10]:
!tree $TENSORNET_SOURCE_CODE_DIR/examples/online_serving/test_env/

/da2/zhangyansheng/tensornet/examples/online_serving/test_env/
├── data
│   ├── feature.data
│   └── slot.data
├── libmodel.so
└── tf_serving

1 directory, 4 files


The slot order in slot.data must match the order of WIDE_SLOTS and DEEP_SLOTS in wide_deep.py.

In [11]:
!cat $TENSORNET_SOURCE_CODE_DIR/examples/online_serving/test_env/data/slot.data

 1,2,3,4
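
A mismatch in slot order is easy to miss, so a quick check before deployment can be worthwhile. The sketch below assumes that WIDE_SLOTS and DEEP_SLOTS each list the same slots as the single line in slot.data; that is an assumption, as are the helper names `parse_slot_line` and `slot_order_problems`, so adjust it to the actual contents of your wide_deep.py.

```python
from typing import List, Sequence


def parse_slot_line(line: str) -> List[str]:
    """Split a comma-separated slot line such as ' 1,2,3,4' into slot names."""
    return [item.strip() for item in line.split(",") if item.strip()]


def slot_order_problems(line: str,
                        wide_slots: Sequence[str],
                        deep_slots: Sequence[str]) -> List[str]:
    """Describe every offline slot list whose order differs from slot.data."""
    online = parse_slot_line(line)
    problems = []
    for name, offline in (("WIDE_SLOTS", wide_slots), ("DEEP_SLOTS", deep_slots)):
        if online != [str(slot) for slot in offline]:
            problems.append(
                f"{name} order {list(offline)} differs from slot.data order {online}")
    return problems


if __name__ == "__main__":
    # Hypothetical values; copy the real lists from wide_deep.py.
    example_slots = ["1", "2", "3", "4"]
    print(slot_order_problems(" 1,2,3,4", example_slots, example_slots))
```

An empty list means the orders agree under the stated assumption; any entry names the list that needs to be reordered.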


feature.data only needs to follow the format that main.cc parses. Note that some special separator characters are not displayed correctly below.

In [12]:
!cat $TENSORNET_SOURCE_CODE_DIR/examples/online_serving/test_env/data/feature.data

0	1-1956697246319764053	2-5730244542641024933	3-9118175470622903910-8448113875518360108	4-2457261431940944054
0	1-1956697246319764053	2-1160342140770244045	3-9282460046502382416746402536336743089	48069461642963552018
0	1-1956697246319764053	2-7395780378584928338	3-81982606186909154355518680928552928316	41517509480232003656
0	1-1956697246319764053	25418959032072182947	3-73280572481107405052638487947984888231	4-3889211362256458670
0	1-1956697246319764053	2-1447814788092430700	313141364156924037955677372407420628171	48002349267150817951
0	1-1956697246319764053	2-3444864941673928379	343676846390070809168517447599233938262	47249151252390487352
0	1-1956697246319764053	2-25308817390645703	3-7048581416920648308415903991668524816	41517509480232003656
0	1-1956697246319764053	2-3444864941673928379	353805355444342988164526355427571578606	4-4517525648429849306
0	1-1956697246319764053	2-3444864941673928379	353805355444342988165299058237759487470	4

Run `tf_serving` to print the predictions. Because this demo does not use real embeddings, the predictions are not meaningful.

In [13]:
!cd $TENSORNET_SOURCE_CODE_DIR/examples/online_serving/test_env/ && ./tf_serving

0.500834
0.500769
0.500813
0.5008
0.500809
0.500836
0.500773
0.500765
0.500878
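
One plausible reason the outputs all sit near 0.5 is the size of the stand-in embeddings: `InitWeight` in main.cc draws from a normal distribution with standard deviation `1 / sqrt(dim)` and then multiplies by 0.001. The sketch below reproduces that draw in Python and passes the sum through a sigmoid. Treating the sum as a logit and assuming a sigmoid output are simplifications made for illustration; the real graph applies its trained layers.

```python
import math
import random
from typing import List


def init_weight(dim: int, rng: random.Random) -> List[float]:
    """Python counterpart of InitWeight in main.cc: N(0, 1/sqrt(dim)) scaled by 0.001."""
    std = 1.0 / math.sqrt(dim)
    return [rng.gauss(0.0, std) * 0.001 for _ in range(dim)]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


rng = random.Random(0)  # fixed seed so the sketch is repeatable
weights = init_weight(8, rng)
# Hypothetical stand-in for a logit; not what the compiled graph computes.
logit = sum(weights)
print("largest |weight|:", max(abs(w) for w in weights))
print("sigmoid(sum of weights):", sigmoid(logit))
```

With inputs this small, a sigmoid barely moves away from its midpoint, which is consistent with the tightly clustered values printed by `tf_serving`.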


## Summary

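The steps above show how to use XLA for online prediction. The code is for reference and learning only; real online use requires further optimization. For online prediction, note that: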

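1. The embedding lookup sub graph is left out, which makes the online implementation easier to embed in business code.
2. The embedding data is stored separately in a dictionary, and the online service looks it up itself.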
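
The second point can be sketched as a plain dictionary lookup. This is only an illustration under stated assumptions: an in-memory `dict` stands in for whatever dictionary store is used online, each vector is assumed to hold 1 wide value followed by 8 deep values (matching the `dim = {1, 8}` check in graph.cc), and mapping unseen features to zeros is a choice made here, not something the source prescribes. The name `lookup_embeddings` is hypothetical.

```python
from typing import Dict, List, Sequence

WIDE_DIM = 1  # dim[0] in graph.cc
DEEP_DIM = 8  # dim[1] in graph.cc
EMB_SIZE = WIDE_DIM + DEEP_DIM


def lookup_embeddings(table: Dict[int, Sequence[float]],
                      signs: Sequence[int]) -> List[List[float]]:
    """Fetch one vector per feature sign; unseen signs map to zeros (an assumption)."""
    result = []
    for sign in signs:
        vec = table.get(sign)
        if vec is None:
            vec = [0.0] * EMB_SIZE
        elif len(vec) != EMB_SIZE:
            raise ValueError(
                f"embedding for sign {sign} has size {len(vec)}, expected {EMB_SIZE}")
        result.append(list(vec))
    return result


if __name__ == "__main__":
    # Hypothetical table with a single known feature sign.
    table = {42: [0.1] * EMB_SIZE}
    for row in lookup_embeddings(table, [42, 7]):
        print(row)
```

The size check mirrors the validation in graph.cc, so a malformed dictionary entry is caught before the vectors reach the `Run` function.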

The section [05-export-feature-embedding.ipynb](./05-export-feature-embedding.ipynb) explains how to convert the sparse embedding data into a dictionary for online use.