
need op kernel for Operator #2790

Closed · jacquesqiao opened this issue Jul 10, 2017 · 2 comments

jacquesqiao (Member) commented Jul 10, 2017

There are two ways to implement an Op:

  • Without kernels. The Op takes template parameters, and a separate Op type is registered for each type; creating (new-ing) an Op requires producing an Op of that specific type.
  • With kernels. The Op takes no template parameters, and each Op type is registered only once, while several kernels are registered per Op; creating an Op does not require a specific type, and which kernel to run is decided before or at run time.

Differences between the two approaches:

  • Whether device information is needed when the op is created.
    • Without kernels: device information must be supplied when the op is created, and the Op's type is fixed after construction and cannot change. Graph construction becomes more complex, since device information must be managed for every op, and adjusting or optimizing at run time is awkward.
    • With kernels: no device information is needed when the op is created; after construction, which kernel to run is decided as circumstances require. This is flexible, users need not worry about device information up front, it is convenient for run-time optimization, and it is consistent with Paddle's current approach.
  • The registration mechanism.

Comparison of different frameworks:

Problems introduced by implementing kernels:

  • How kernels are bound to an Op.
    • A simple approach is for each Op to keep a kernel map, as in the demo below.
  • How to switch kernels at run time according to the context.

A comparison of several frameworks:

  • Most frameworks use kernels.
  • Implementing the kernel-based version is not complicated.
  • With kernels, it is easier to optimize the whole computation (graph).

A demo op implementation:

typedef std::function<void(OpContext*)> ComputeFun;

// Simple kernels: one per device, both instantiated for float below.
template <typename T>
void CosineCPU(OpContext* ctx) {
  printf("run cosine op CPU kernel, scale = %f\n", ctx->op->GetAttr<T>("scale"));
  printf("%s\n", ctx->op->DebugString().c_str());
}

template <typename T>
void CosineGPU(OpContext* ctx) {
  printf("run cosine op GPU kernel, scale = %f\n", ctx->op->GetAttr<T>("scale"));
  printf("%s\n", ctx->op->DebugString().c_str());
}

class CosOp : public OperatorBase {
 public:
  CosOp() {
    kernels_["CPU"] = CosineCPU<float>;
    kernels_["GPU"] = CosineGPU<float>;
  }

  // Pick a kernel at run time based on the device context.
  void Run(OpContext* ctx) const override {
    auto dev_ctx = dynamic_cast<CPUDeviceContext*>(ctx->device_context);
    if (dev_ctx != nullptr) {
      kernels_.at("CPU")(ctx);
    } else {
      kernels_.at("GPU")(ctx);
    }
  }

 private:
  std::map<std::string, ComputeFun> kernels_;
};

Running a simple kernel-based Op:

  DeviceContext* cpu_ctx = new CPUDeviceContext();
  DeviceContext* gpu_ctx = new CUDADeviceContext();
  auto scope = std::make_shared<Scope>();

  OperatorBase* op = paddle::framework::OpRegistry::CreateOp(op_desc);

  // will run the CPU kernel
  op->Run(scope, cpu_ctx);

  // will run the GPU kernel
  op->Run(scope, gpu_ctx);
jacquesqiao added a commit that referenced this issue Jul 11, 2017
Add OperatorBase.

issue: #2790

Paddle designs the Operator with kernels. An OperatorBase carries no type or device information when created; one Operator can have multiple kernels, and the Operator chooses which kernel to run according to the context. A kernel is bound to the Operator before or while the Operator runs.
Superjomn (Contributor) commented Jul 12, 2017

Passing an OpContext directly into the Kernel's Run feels a bit odd.

Ideally, the OpBase would take the OpContext, collect all the tensors, perform any copies needed from other devices, and only then hand things to the Kernel.

The Kernel should be responsible only for computation: all inputs and outputs should already be collected and output shapes already determined, and the tensors handed to it directly, e.g. Kernel.Run(dev_ctx, inputs, outputs). The Kernel should not have to manage tensor copies and should not be able to modify shapes.

But in the current design, the Kernel seems responsible for copying, shape modification, and everything else.

reyoung (Collaborator) commented Aug 1, 2017

Done.
