Feature/design of v2 layer converter #2104
Conversation
Add example of branching topology
doc/design/v2_layer.md
## What are the problems
Paddle V2 API gives a flexible way to configure neural network topology. The user can create a neural network topology layer by layer. We use the final layer to represent the neural network topology.
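To make the quoted description concrete, here is a toy sketch (illustrative names only, not Paddle's actual classes): each layer keeps references to its inputs, so the final layer object implicitly carries the whole plain topology, recoverable by walking back through `inputs`.

```python
class Layer:
    """Toy stand-in for a v2-style layer; `inputs` records the connections."""
    def __init__(self, name, inputs=()):
        self.name = name
        self.inputs = list(inputs)

def fc(name, input):
    # A fully connected layer is just a node pointing back at its input.
    return Layer(name, [input])

data = Layer("data")
hidden = fc("hidden", data)
cost = fc("cost", hidden)  # the final layer stands in for the whole topology

def collect(layer, seen=None):
    # Walk back through `inputs` to recover every reachable layer.
    seen = [] if seen is None else seen
    if layer not in seen:
        seen.append(layer)
        for i in layer.inputs:
            collect(i, seen)
    return seen

print([l.name for l in collect(cost)])  # ['cost', 'hidden', 'data']
```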
neural network => Neural Network
The user => Users
Done.
doc/design/v2_layer.md
* Memory. We use memory layers in Recurrent Neural Network; a memory layer represents the output of some layer in the last time step. However, the memory layer connects to its input layer implicitly by sharing the same name. We also cannot traverse to memory layer because maybe there is no layer using this memory layer.
* Recurrent Group. The recurrent group is a sub-topology config using in Recurrent Neural Network. It represents the layers in each recurrent time-step. We could traverse back to some topology in a recurrent group, but the sub-topology is non-splittable. The recurrent group should be either entirely in the topology or not in the topology.

## Thinking how to resolve these problems
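The name-sharing problem in the Memory bullet can be shown with the same kind of toy model (illustrative names, not Paddle's real classes): the memory node is linked to another layer only by sharing its name, with no `input` edge in either direction, so a walk back from the final layer can never visit it.

```python
class Layer:
    def __init__(self, name, inputs=()):
        self.name = name
        self.inputs = list(inputs)

hidden = Layer("state")
mem = Layer("state")          # toy "memory": same name, but no input edge
out = Layer("out", [hidden])  # final layer

def reachable(layer):
    # Iterative back-traversal over explicit `inputs` edges only.
    seen, stack = set(), [layer]
    while stack:
        l = stack.pop()
        if l not in seen:
            seen.add(l)
            stack.extend(l.inputs)
    return seen

# Traversal from `out` finds `hidden`, but the memory node is invisible to it.
print(mem in reachable(out))  # False
```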
Thinking how to resolve these problems => How to resolve this problem
Done.
I am not familiar with how PaddlePaddle represents and runs RNN, so I am not yet ready to judge whether (1) we should introduce "tape" or (2) we should change the way we represent RNN, to handle the problem stated in this PR -- "some layers are not traversable".
Also, I am curious: if RNN layers are not traversable, how could we have those RNN-based examples on book.paddlepaddle.org?
doc/design/v2_layer.md
We use `cost` to represent the entire topology of the neural network. We use the last node to traverse the entire topology by `depth first search` algorithm. The connection between each layer is represented by the `input` parameter. It is fit for representing a plain neural network topology. However, there are some special layers in Paddle, which are not connected explicitly by `input` parameter. They are:

* Evaluator. An evaluator is used to compute metrics(such as error rate, f1 score) in Paddle. An evaluator can not be the input of other layers. So we cannot access evaluators by simply traversing back from the final layer.
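The evaluator problem discussed in this thread can be illustrated with a toy graph (hypothetical names): the evaluator references a layer through its own input, but no layer takes the evaluator as input, so a depth-first walk that starts only from the cost never reaches it.

```python
class Node:
    def __init__(self, name, inputs=()):
        self.name, self.inputs = name, list(inputs)

data = Node("data")
fc = Node("fc", [data])
cost = Node("cost", [fc])
evaluator = Node("error_rate", [fc])  # reads fc, but nothing reads it

def dfs(node, seen=None):
    # Depth-first back-traversal along `inputs` edges.
    seen = set() if seen is None else seen
    if node not in seen:
        seen.add(node)
        for i in node.inputs:
            dfs(i, seen)
    return seen

# Starting from the cost alone misses the evaluator...
assert evaluator not in dfs(cost)
# ...unless the evaluator itself is treated as an extra "final" node.
assert evaluator in dfs(cost) | dfs(evaluator)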
The logic here is broken. From the first statement
An evaluator can not be the input of other layers.
we cannot derive the conclusion
So we cannot access evaluators by simply traversing back from the final layer.
because logically an evaluator could be the "final layer".
Done.
doc/design/v2_layer.md
* Memory. We use memory layers in Recurrent Neural Network; a memory layer represents the output of some layer in the last time step. However, the memory layer connects to its input layer implicitly by sharing the same name. We also cannot traverse to memory layer because maybe there is no layer using this memory layer.
Is the problem here that we shouldn't have a memory layer at all? Or, should the concept of memory not be implemented as a layer?
Memory is a very fundamental concept in Paddle topology, and it is currently just a special layer.
doc/design/v2_layer.md
* Recurrent Group. The recurrent group is a sub-topology config using in Recurrent Neural Network. It represents the layers in each recurrent time-step. We could traverse back to some topology in a recurrent group, but the sub-topology is non-splittable. The recurrent group should be either entirely in the topology or not in the topology.
I just realized that I am not familiar with how PaddlePaddle represents RNN yet. Any suggestions on how I can get familiar with it so I could help review this design doc? @reyoung
I added demo RNN code in this PR; I hope it helps explain the problem.
Force-pushed from 8437fd5 to ea3581d.
@@ -0,0 +1,151 @@
# Using tape to refactor current paddle.v2 configuration parsing
Using tape to refactor current paddle.v2 configuration parsing
=> Using Tape to Refactor Current Paddle.V2 Configuration Parsing
http://www.titlecase.com is a useful tool; thanks to @helinwang for the recommendation.
Some understanding about Tape.
```python
with Tape():
    paddle.train(topology(False))
```
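One way to read the snippet above: a Tape could be a context manager that records every layer/operation in the order the user's config code runs, so the framework can replay the recording instead of traversing back from a final layer. A minimal sketch under that assumption (none of these names are Paddle's real API):

```python
_current_tape = None

class Tape:
    """Toy tape: records operations in user-definition order."""
    def __init__(self):
        self.entries = []
    def __enter__(self):
        global _current_tape
        _current_tape = self
        return self
    def __exit__(self, *exc):
        global _current_tape
        _current_tape = None

def record(op_name):
    # Every layer constructor would call this; order == definition order.
    if _current_tape is not None:
        _current_tape.entries.append(op_name)

with Tape() as t:
    record("data")
    record("fc")
    record("cost")

print(t.entries)  # ['data', 'fc', 'cost'] -- definition order, no traversal
```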
After reading this, here are some of my thoughts; I hope to keep discussing with @reyoung.

My understanding
- From this design doc I can understand the core idea of V2's config parsing and the problems that derive from it. Introducing Tape makes the current core problems easier to solve.
- But I cannot yet judge whether introducing the Tape concept is strictly necessary.

Why introduce the Tape concept
- My understanding is that, within V2's current BFS approach, the three problems raised in this doc could also be solved, but each would be patched with ad-hoc manual rules; without the right design, such patching may never end.
- Introducing Tape can solve three problems:
  - Error messages contain huge call stacks; the real error is buried, so once something goes wrong it is very hard to locate.
  - During BFS there are currently three kinds of layers with special behavior (logically they are all just layers in the neural network): (1) evaluator; (2) memory; (3) recurrent_layer_group. Their handling requires manually added rules.
  - Special layers not yet discovered, or introduced in the future, may each need dedicated handling logic, which makes the code hard to maintain.

Advantages of Tape
I can see that the advantages of Tape include:
- Tape processes the network topology in the same order as the intuitive logic of defining a neural network (i.e., the order in which the user defines the network in the config). So when something goes wrong, a normal error can be reported instead of today's "abnormal errors" (the user knows nothing about the BFS logic, yet all current errors are traversal errors), and the error messages are easy to understand.
- There is no need to consider complex traversal logic, which simplifies the handling of recurrent_layer_group.

Solving the three problems within the BFS approach
- Evaluator
  - A neural network is a directed acyclic graph. Evaluator and Cost occupy equivalent positions in the network and only appear at its end. Following the breadth-first-search logic, Evaluators should be creatable; it is just that, without explicitly distinguishing them from Costs, we may not know the network's optimization criterion.
  - Why can't we keep using Cost (possibly multiple costs) to represent the network topology, and explicitly designate Evaluators to distinguish them from Costs?
- About Memory
  - Memory is a special layer and an important concept in recurrent neural networks; it has special functionality and cannot be removed. Memory corresponds to special layers (the various Agent Layers).
  - Memory has no inputs; it does not take other layers' outputs as input.
  - Memory's inputs differ from other layers': they are not obtained through input, but through:
    - boot_layer, which specifies the input at time step 0 (already handled)
    - name, which specifies the input from time step 1 onward (not considered during traversal; currently there are bugs where some layers cannot be created)
  - We could directly add handling rules to fix error 2 above.
- About recurrent_layer_group
  - recurrent_layer_group is a submodel in PaddlePaddle and cannot be split; this must be taken into account. I am not entirely sure whether v2 currently considers it.
  - recurrent_layer_group can be nested twice in PaddlePaddle (the step function of a recurrent_layer_group can itself be a recurrent_layer_group); v2 currently does not implement this nesting. The logic would surely be convoluted, though it could certainly be written; I want to look further at v2's current handling.

My two questions
There are two questions I have not fully understood:
- Does the Tape concept need to be exposed to users? If so, which logical interfaces are exposed to them? Or is Tape a concept users will never perceive?
- With Tape, do we still need the BFS process? Is Tape meant to assist BFS, or to abandon BFS entirely?
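The boot_layer/name distinction in the Memory bullets above can be sketched as a simple lookup rule (hypothetical helper, not real Paddle code): at step 0 the memory reads its boot layer's output; from step 1 on it reads the previous-step output of the layer that shares its name.

```python
def memory_input(step, boot_value, prev_outputs, name):
    """Resolve what a memory reads at a given time step.

    boot_value:   value fed at step 0 (the boot_layer's output)
    prev_outputs: {layer_name: output} from the previous time step
    """
    if step == 0:
        return boot_value
    return prev_outputs[name]  # implicit link: same name, previous step

# Toy run: a "state" memory over three time steps.
outputs = {}
history = []
for t in range(3):
    m = memory_input(t, 0.0, outputs, "state")
    outputs = {"state": m + 1}  # the same-named layer's new output
    history.append(m)

print(history)  # [0.0, 1.0, 2.0]
```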
> Evaluator and Cost occupy equivalent positions in the network

I don't think this is right; an Evaluator can actually appear at any position in the neural network.

> name, which specifies the input from time step 1 onward (not considered during traversal; currently there are bugs where some layers cannot be created)

If traversal never reaches the layer sharing the Memory's name, there is no other way to reach that layer: the original implementation records no extra information anywhere, so we can only rely on traversing back from the output layer.

> Does the Tape concept need to be exposed to users? If so, which logical interfaces are exposed to them? Or is Tape a concept users will never perceive?

To support Paddle's existing functionality, Tape does not need to be exposed to ordinary users.
However, to implement dynamic neural networks, users would be allowed to clear or reset the tape. The API could be:

with Tape():
    ...

> With Tape, do we still need the BFS process? Is Tape meant to assist BFS, or to abandon BFS entirely?

With Tape, no DFS search is needed.
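The reply above suggests that dynamic neural networks would let the user clear or reset the tape. A toy sketch of that semantics (assumed behavior; `Tape` here is a stand-in, not an implemented Paddle class), where re-entering the context discards the previous recording:

```python
class Tape:
    """Toy tape whose recording is cleared each time the context is entered."""
    def __init__(self):
        self.entries = []
    def __enter__(self):
        self.entries.clear()  # reset: the old recording is discarded
        return self
    def __exit__(self, *exc):
        return False

tape = Tape()
with tape as t:
    t.entries.append("step-1 graph")
with tape as t:  # dynamic net: rebuild the graph on each iteration
    t.entries.append("step-2 graph")

print(tape.entries)  # ['step-2 graph'] -- only the latest recording survives
```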
> I don't think this is right; an Evaluator can actually appear at any position in the neural network.

- My understanding is this: in current PaddlePaddle, an Evaluator, like a Cost, never becomes the input of a following Layer/Evaluator. Both Evaluator and Cost are nodes of the directed acyclic graph with only incoming edges and no outgoing ones (out-degree 0), but an Evaluator does not take part in optimization and has to be designated explicitly.
- One point I do not quite understand: "use one variable (currently it appears to be a cost) to represent the Topology". Why emphasize a single variable?
  - If a network has multiple Costs, how is the topology represented at the moment?
  - A network may have multiple Costs, and Evaluators currently also connect to other layers through inputs, with no exception.
  - In my understanding it is more reasonable to define the topology by the last layers of the directed acyclic graph (including multiple Costs and Evaluators); why is it not done that way?
  - Potentially, if a Cost can be followed by another Cost (e.g. a sum), and an Evaluator by another Evaluator, that would break the assumption that Cost/Evaluator is the last layer.
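The out-degree argument above can be checked mechanically on a toy DAG (illustrative names only): costs and evaluators are all sinks with out-degree 0, so graph structure alone cannot say which sinks drive optimization; an explicit marker is needed.

```python
edges = [                  # (from, to) edges of a toy topology
    ("data", "fc"),
    ("fc", "cost_a"),
    ("fc", "cost_b"),      # multiple costs are possible
    ("fc", "error_rate"),  # evaluator: also a sink
]

out_degree = {}
for src, dst in edges:
    out_degree[src] = out_degree.get(src, 0) + 1
    out_degree.setdefault(dst, 0)

sinks = sorted(n for n, d in out_degree.items() if d == 0)
print(sinks)  # ['cost_a', 'cost_b', 'error_rate'] -- structure can't split them

# Only an explicit marker distinguishes optimization targets from metrics.
is_cost = {"cost_a": True, "cost_b": True, "error_rate": False}
targets = [n for n in sinks if is_cost[n]]
print(targets)  # ['cost_a', 'cost_b']
```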
Is anyone writing the code for this now?
Thank you for contributing code to PaddlePaddle. Since Paddle V1/V2 is no longer maintained and the related code has been removed from the develop branch, we are closing this PR. You are welcome to contribute to the latest version of Paddle, Fluid.
Redesign of the v2 layer converter.
This design will fix the following defects:
The demo code is in #2096; it may be easier to review there.