
add compile vs runtime discussion #3728

Closed

Conversation

@jacquesqiao (Member) commented Aug 28, 2017:

A pull request is better for review than an issue. The discussion process can also serve as a record.

1. InferShape only needs to be implemented once; the same function is called at both configuration (compile) time and run time (see the sketch at the end of this thread).

#### Disadvantages
1. Implementing Clone may not be simple, e.g., how to synchronize memory across device types (Scope for CPU vs Scope for GPU).
Contributor commented:

We could add the issue of InferShape inside RNNOp, where each time step has a variable length; also, the output of cond_op only has its concrete shape known after Run.

Member Author (@jacquesqiao) replied:

👌
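To make the quoted point concrete (that InferShape only needs to be implemented once), here is a minimal C++ sketch. InferShapeContext, CompileTimeContext, RunTimeContext, and CopyOpInferShape are toy stand-ins rather than Paddle's actual classes: one InferShape body is written against a shared context interface and is called once over VarDesc-like metadata at compile time and again over runtime shapes.

```cpp
// Toy sketch: one InferShape implementation, two calling contexts.
// These types are illustrative stand-ins, not Paddle's actual API.
#include <cassert>
#include <map>
#include <string>
#include <vector>

using Shape = std::vector<int>;

// Shared interface that both the compile-time and runtime contexts implement.
struct InferShapeContext {
  virtual ~InferShapeContext() = default;
  virtual Shape GetInputShape(const std::string& name) const = 0;
  virtual void SetOutputShape(const std::string& name, const Shape& shape) = 0;
};

// The single InferShape for a toy "copy" op: output shape equals input shape.
void CopyOpInferShape(InferShapeContext* ctx) {
  ctx->SetOutputShape("Out", ctx->GetInputShape("X"));
}

// Compile-time context: backed by a shape map standing in for VarDesc.
struct CompileTimeContext : InferShapeContext {
  std::map<std::string, Shape>* var_descs = nullptr;
  Shape GetInputShape(const std::string& n) const override { return var_descs->at(n); }
  void SetOutputShape(const std::string& n, const Shape& s) override { (*var_descs)[n] = s; }
};

// Runtime context: backed by another map standing in for Scope + Tensor dims.
struct RunTimeContext : InferShapeContext {
  std::map<std::string, Shape>* tensors = nullptr;
  Shape GetInputShape(const std::string& n) const override { return tensors->at(n); }
  void SetOutputShape(const std::string& n, const Shape& s) override { (*tensors)[n] = s; }
};

int main() {
  std::map<std::string, Shape> var_descs{{"X", {32, 128}}};
  CompileTimeContext compile_ctx;
  compile_ctx.var_descs = &var_descs;
  CopyOpInferShape(&compile_ctx);            // called once at compile time
  assert(var_descs.at("Out") == Shape({32, 128}));

  std::map<std::string, Shape> tensors{{"X", {8, 128}}};  // real batch differs
  RunTimeContext run_ctx;
  run_ctx.tensors = &tensors;
  CopyOpInferShape(&run_ctx);                // the very same function at run time
  assert(tensors.at("Out") == Shape({8, 128}));
  return 0;
}
```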

1. Switching Scope is simple: just pass a new Scope when the Op runs, and the framework creates the corresponding Vars in it from the global VarDesc map before running (see the sketch below).
2. Storing metadata in VarDesc makes graph optimization easier.
3. InferShape no longer needs a Scope parameter, because all modified VarDescs live in the global map.
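A minimal sketch of point 1 above, with toy stand-ins for Scope, Variable, and VarDesc rather than Paddle's real classes: before an Op runs in a fresh Scope, the framework creates the variables it needs from a global name-to-VarDesc map, so switching Scopes requires nothing else.

```cpp
// Toy sketch: create an Op's variables in any Scope from a global VarDesc map.
// Scope, Variable, and VarDesc here are illustrative, not Paddle's real types.
#include <iostream>
#include <map>
#include <memory>
#include <string>
#include <vector>

struct VarDesc { std::vector<int> shape; };   // compile-time metadata
struct Variable { std::vector<int> shape; };  // runtime object living in a Scope

struct Scope {
  std::map<std::string, std::unique_ptr<Variable>> vars;
  Variable* Var(const std::string& name) {
    auto& v = vars[name];
    if (!v) v.reset(new Variable());
    return v.get();
  }
};

// Global description of every variable in the program.
std::map<std::string, VarDesc> g_var_descs = {{"X", {{32, 128}}}, {"Out", {{32, 10}}}};

// Before running an Op in a (possibly brand-new) Scope, create the variables
// it needs from the global descriptions; nothing else is required to switch Scopes.
void PrepareScope(Scope* scope, const std::vector<std::string>& names) {
  for (const auto& name : names) scope->Var(name)->shape = g_var_descs.at(name).shape;
}

int main() {
  Scope step0, step1;                          // e.g. two RNN time steps
  PrepareScope(&step0, {"X", "Out"});
  PrepareScope(&step1, {"X", "Out"});          // the same Op can now Run in either Scope
  std::cout << step0.Var("Out")->shape[1] << "\n";   // prints 10
  return 0;
}
```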

Contributor commented:

Adding a point from this afternoon's discussion: VarDesc makes it possible to serialize the model (description), which has the following benefits:
1. It ensures data security on Paddle Cloud: a user-submitted script can only run with the corresponding serialized model, and this can be checked on the server side.
2. For business lines with strict source-code security requirements, the deployed model is isolated from the training source code.

1. Implementing InferShape is complex: at compile time InferShape is based on VarDesc, but at run time InferShape and resize() are still needed, because
   a. the size may be modified by the user at run time;
   b. some Op implementations also require InferShape at run time (e.g., RNN).

Contributor commented:

As Yu Yang (于洋) mentioned, one check becomes three, so the complexity goes up.

Also, when checking in set_size, do we allow users to change sizes other than batch_size?

@QiJune (Member) commented Aug 29, 2017:

Please consider this by the way: #3717 (comment)

@QiJune (Member) commented Aug 29, 2017:

Please consider the design of the add operator at https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/backward.cc#L145. This add actually sums a variable number of input Tensors to produce one Tensor.

The actual number of inputs is decided at run time by the user's data; at compile time, how should we design the corresponding AddOpProto and AddOperator so that they can accept runtime parameters?
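One possible shape of an answer, sketched with toy types (Tensor and AddOperator here are illustrative, and the "repeatable input slot" design is an assumption, not Paddle's confirmed proto): the proto declares a single repeatable input, and the operator sums however many tensors are actually bound to that slot at run time.

```cpp
// Toy sketch: an "add" operator whose input count is only known at run time.
// Tensor and AddOperator are illustrative stand-ins, not Paddle's real classes.
#include <cassert>
#include <cstddef>
#include <vector>

using Tensor = std::vector<float>;

struct AddOperator {
  // At compile time the proto only says "X is a repeatable input"; the
  // concrete number of tensors arrives with the user's data at run time.
  Tensor Run(const std::vector<const Tensor*>& inputs) const {
    assert(!inputs.empty());
    Tensor out(inputs[0]->size(), 0.0f);
    for (const Tensor* in : inputs) {
      assert(in->size() == out.size());        // runtime size check
      for (std::size_t i = 0; i < out.size(); ++i) out[i] += (*in)[i];
    }
    return out;
  }
};

int main() {
  Tensor a{1, 2, 3}, b{10, 20, 30}, c{100, 200, 300};
  AddOperator add;
  Tensor out = add.Run({&a, &b, &c});          // three inputs this time; could be any number
  assert(out[2] == 333.0f);
  return 0;
}
```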

- After compilation, if the user manually modifies the dimensions of some parameters and we do not check, the error cannot be detected.
- Some dimensions can only be determined after seeing the real data, e.g., for condition ops (if-else, while): the data size after branch selection can only be determined once the real data is known.

**Conclusion**: InferShape needs to be called at both compile time and run time. At compile time it mainly checks whether the configured sizes are correct; at run time it must both check the sizes and do some resizing based on the real data.
Contributor commented:

InferShape and check-size are two different issues; infer and check can be explained separately.

Right now it feels like sometimes we are talking about infer and sometimes about check.

Member Author (@jacquesqiao) replied:

OK, then let's treat check-size and infer-size/resize as two separate things and explain them separately (a sketch of the split follows below).
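A minimal sketch of that split, using toy types and assumed function names rather than Paddle's actual API: a compile-time pass that only checks the configured sizes, and a runtime pass that checks the real data against the description and then produces the resized output shape.

```cpp
// Toy sketch: compile-time size check vs. runtime check + resize.
// Shapes and function names here are illustrative assumptions.
#include <cassert>
#include <stdexcept>
#include <vector>

using Shape = std::vector<int>;   // -1 marks a dimension unknown until run time

// Compile time: only verify that the configured sizes are consistent,
// e.g. a [batch, K] x [K, N] matmul requires the two K's to match.
void CompileTimeCheck(const Shape& x_desc, const Shape& w_desc) {
  if (x_desc[1] != w_desc[0]) throw std::runtime_error("configured sizes do not match");
}

// Run time: check the real input against the description, then compute the
// real output shape (the resize) from the actual batch size.
Shape RunTimeInferShape(const Shape& x_real, const Shape& x_desc, const Shape& w_desc) {
  if (x_real[1] != x_desc[1]) throw std::runtime_error("fixed dimension was changed");
  return {x_real[0], w_desc[1]};               // output resized to the real batch
}

int main() {
  Shape x_desc{-1, 128}, w_desc{128, 10};
  CompileTimeCheck(x_desc, w_desc);            // K checked; batch still unknown (-1)
  Shape out = RunTimeInferShape({64, 128}, x_desc, w_desc);
  assert(out == Shape({64, 10}));
  return 0;
}
```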

4. In the distributed setting, the graph needs to be serialized and sent to other machines for execution, which ultimately requires serializing the Variables' attributes as well. A benefit of this is that jobs executed in the cloud are controllable: what the user sends is a serialized graph rather than script source code, which helps data security.

#### Disadvantages
1. Implementing InferShape is complex: at compile time InferShape is based on VarDesc, but at run time InferShape and resize() are still needed, because
Contributor commented:

For a two-level variable-length RNN, the shape of the outer RNN's output is hard to determine before Run finishes, so a natural idea is to set the output's shape after Run.
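A minimal sketch of that idea with toy types (not the real RNNOp interface): Run first produces whatever amount of data the variable-length loop yields, and only afterwards sets the output's shape.

```cpp
// Toy sketch: the output shape of a variable-length loop is set after Run.
// Tensor and RunOuterRnn are illustrative stand-ins, not Paddle's real code.
#include <cassert>
#include <utility>
#include <vector>

struct Tensor {
  std::vector<float> data;
  std::vector<int> shape;                       // still unset before Run
  void Resize(std::vector<int> s) { shape = std::move(s); }
};

// Each sequence contributes a different number of steps, so the total output
// length is only known once the loop has finished.
void RunOuterRnn(const std::vector<int>& steps_per_seq, Tensor* out) {
  for (int steps : steps_per_seq)
    for (int t = 0; t < steps; ++t) out->data.push_back(0.0f);  // fake step output
  out->Resize({static_cast<int>(out->data.size()), 1});         // shape set after Run
}

int main() {
  Tensor out;
  RunOuterRnn({3, 5, 2}, &out);
  assert(out.shape[0] == 10);
  return 0;
}
```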
