
need op kernel for Operator #2790

Closed · jacquesqiao opened this issue Jul 10, 2017 · 2 comments

jacquesqiao (Member) commented Jul 10, 2017

There are two ways to implement an Op:

  • Without kernels. The Op takes template parameters, and a separate Op type is registered for each type; creating (new-ing) an Op requires producing an Op of that specific type.
  • With kernels. The Op takes no template parameters, and each Op type is registered only once, while several kernels are registered per Op; creating an Op does not require a specific type, and which kernel to run is decided before or at run time.

Differences between the two approaches:

  • Whether device information is needed when the op is created.
    • Without kernels: device information must be supplied when the op is created, and the Op's type is fixed after construction and cannot change. Graph construction becomes more complex, since device information must be managed for every op, and adjusting or optimizing at run time is awkward.
    • With kernels: no device information is needed when the op is created; after construction, which kernel to run is decided as circumstances require. This is flexible, users need not worry about device information up front, it is convenient for run-time optimization, and it is consistent with Paddle's current approach.
  • The registration mechanism.

Comparison of different frameworks:

Problems introduced by implementing kernels:

  • How kernels are bound to an Op.
    • A simple approach is for each Op to keep a kernel map, as in the demo below.
  • How to switch kernels at run time according to the context.

A comparison of several frameworks:

  • Most frameworks use kernels.
  • Implementing the kernel-based version is not complicated.
  • With kernels, it is easier to optimize the whole computation (graph).

A demo op implementation:

typedef std::function<void(OpContext*)> ComputeFun;

// Simple kernels: one per device, both instantiated for float below.
template <typename T>
void CosineCPU(OpContext* ctx) {
  printf("run cosine op CPU kernel, scale = %f\n", ctx->op->GetAttr<T>("scale"));
  printf("%s\n", ctx->op->DebugString().c_str());
}

template <typename T>
void CosineGPU(OpContext* ctx) {
  printf("run cosine op GPU kernel, scale = %f\n", ctx->op->GetAttr<T>("scale"));
  printf("%s\n", ctx->op->DebugString().c_str());
}

class CosOp : public OperatorBase {
 public:
  CosOp() {
    kernels_["CPU"] = CosineCPU<float>;
    kernels_["GPU"] = CosineGPU<float>;
  }

  // Pick a kernel at run time based on the device context.
  void Run(OpContext* ctx) const override {
    auto dev_ctx = dynamic_cast<CPUDeviceContext*>(ctx->device_context);
    if (dev_ctx != nullptr) {
      kernels_.at("CPU")(ctx);
    } else {
      kernels_.at("GPU")(ctx);
    }
  }

 private:
  std::map<std::string, ComputeFun> kernels_;
};

Running a simple kernel-based Op:

  DeviceContext* cpu_ctx = new CPUDeviceContext();
  DeviceContext* gpu_ctx = new CUDADeviceContext();
  auto scope = std::make_shared<Scope>();

  OperatorBase* op = paddle::framework::OpRegistry::CreateOp(op_desc);

  // will run the CPU kernel
  op->Run(scope, cpu_ctx);

  // will run the GPU kernel
  op->Run(scope, gpu_ctx);
jacquesqiao added a commit that referenced this issue Jul 11, 2017
Add OperatorBase.

issue: #2790

Paddle designs the Operator with kernels. An OperatorBase carries no type or device information when created; one Operator can have multiple kernels, and the Operator chooses which kernel to run according to the context. A kernel is bound to the Operator before or while the Operator runs.
Superjomn (Contributor) commented Jul 12, 2017

Passing an OpContext directly into the Kernel's Run feels a bit odd.

Ideally, the OpBase would take the OpContext, collect all the tensors, perform any copies needed from other devices, and only then hand things to the Kernel.

The Kernel should be responsible only for computation: all inputs and outputs should already be collected and output shapes already determined, and the tensors handed to it directly, e.g. Kernel.Run(dev_ctx, inputs, outputs). The Kernel should not have to manage tensor copies and should not be able to modify shapes.

But in the current design, the Kernel seems responsible for copying, shape modification, and everything else.

reyoung (Collaborator) commented Aug 1, 2017

Done.
