
Feature/design of v2 layer converter #2104

Conversation

@reyoung (Collaborator) commented May 11, 2017:


## What are the problems

Paddle V2 API gives a flexible way to configure a neural network topology. The user can create a neural network topology layer by layer. We use the final layer to represent the entire neural network topology.
Contributor:

neural network => Neural Network

The user => Users

Author (@reyoung):

Done.
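For context, a minimal sketch of this layer-by-layer style, assuming the usual paddle.v2 API as seen in the official examples; the layer names and sizes here are illustrative:

```python
import paddle.v2 as paddle

# Build the topology layer by layer; the final layer (`cost`) is the only
# handle we keep to represent the whole network.
x = paddle.layer.data(name='x', type=paddle.data_type.dense_vector(784))
label = paddle.layer.data(name='label',
                          type=paddle.data_type.integer_value(10))
hidden = paddle.layer.fc(input=x, size=128, act=paddle.activation.Relu())
prediction = paddle.layer.fc(input=hidden, size=10,
                             act=paddle.activation.Softmax())
cost = paddle.layer.classification_cost(input=prediction, label=label)
```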


## Thinking how to resolve these problems
Contributor:

Thinking how to resolve these problems => How to resolve this problem

Author (@reyoung):

Done.

@wangkuiyi (Collaborator) left a comment:

I am not familiar with how PaddlePaddle represents and runs RNNs, so I am not yet ready to judge whether (1) we should introduce a "tape" or (2) we should change the way we represent RNNs, to handle the problem stated in this PR -- that some layers are not traversable.

Also, I am curious: if RNN layers are not traversable, how could we have those RNN-based examples on book.paddlepaddle.org?


We use `cost` to represent the entire topology of the neural network: starting from this last node, we traverse the entire topology with a depth-first search. The connection between layers is represented by the `input` parameter. This works well for a plain neural network topology. However, some special layers in Paddle are not connected explicitly by the `input` parameter. They are:

* Evaluator. An evaluator is used to compute metrics (such as error rate or F1 score) in Paddle. An evaluator cannot be the input of other layers. So we cannot access evaluators by simply traversing back from the final layer.
Collaborator:

There is broken logic here. From the first statement

> An evaluator cannot be the input of other layers.

we cannot derive the conclusion

> So we cannot access evaluators by simply traversing back from the final layer.

because logically an evaluator could be the "final layer".

Author (@reyoung):

Done.
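To make the traversal problem concrete, a small sketch of a back-traversal that follows only explicit `input` edges; the `inputs` attribute is hypothetical, standing in for however the config parser reaches a layer's predecessors:

```python
def traverse(final_layer):
    """Depth-first walk back from the final layer, following only explicit
    `input` edges. `inputs` is a hypothetical attribute, not the actual
    paddle.v2 internals."""
    visited, stack = set(), [final_layer]
    while stack:
        layer = stack.pop()
        if id(layer) in visited:
            continue
        visited.add(id(layer))
        yield layer
        stack.extend(getattr(layer, 'inputs', []))

# An evaluator attached to `prediction` has no outgoing edge, so a call
# like traverse(cost) never yields it -- the problem described above.
```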

* Memory. We use memory layers in recurrent neural networks; a memory layer represents the output of some layer in the previous time step. However, a memory layer connects to its input layer implicitly, by sharing the same name. We also cannot traverse to a memory layer, because there may be no layer that uses it.
Collaborator:

Is the problem here that we shouldn't have a memory layer at all? Or, that the concept of memory shouldn't be implemented as a layer?

Author (@reyoung):

Memory is a very fundamental concept in Paddle topology, and currently it is just a special layer.
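A hedged sketch of the name-sharing link, assuming the v2 `paddle.layer.memory` API; the step function, sizes, and names are illustrative:

```python
import paddle.v2 as paddle

def step(input_word):
    # `memory` refers to the previous-step output of the layer named
    # 'rnn_state' -- the link is by name, not by an explicit `input` edge.
    mem = paddle.layer.memory(name='rnn_state', size=128)
    return paddle.layer.fc(input=[input_word, mem],
                           size=128,
                           act=paddle.activation.Tanh(),
                           name='rnn_state')
```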


* Recurrent Group. A recurrent group is a sub-topology configuration used in recurrent neural networks. It represents the layers in each recurrent time step. We can traverse back to some of the topology in a recurrent group, but the sub-topology is not splittable: a recurrent group must be either entirely inside the topology or entirely outside it.
Collaborator:

I just realized that I am not yet familiar with how PaddlePaddle represents RNNs. Any suggestion on how I can get familiar with it, so that I could help review this design doc? @reyoung

Author (@reyoung):

I added demo code about RNN in this PR; I hope it helps explain the problem.
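In the same spirit, a hedged sketch of how such a step function is wrapped in a recurrent group with the v2 API; `step` is assumed to be a function like the one sketched above, and the data and embedding layers are illustrative:

```python
import paddle.v2 as paddle

# Assumes `step` from the earlier sketch and a sequence input.
word = paddle.layer.data(
    name='word', type=paddle.data_type.integer_value_sequence(10000))
emb = paddle.layer.embedding(input=word, size=128)

# The whole `step` sub-topology must enter or leave the parsed
# configuration together -- it cannot be split.
rnn = paddle.layer.recurrent_group(step=step, input=emb)
last = paddle.layer.last_seq(input=rnn)
```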

@reyoung force-pushed the feature/design_of_v2_layer_converter branch from 8437fd5 to ea3581d on May 15, 2017 03:50
@@ -0,0 +1,151 @@
# Using tape to refactor current paddle.v2 configuration parsing
Contributor:
Using tape to refactor current paddle.v2 configuration parsing => Using Tape to Refactor Current Paddle.V2 Configuration Parsing

http://www.titlecase.com is a useful tool; thanks to @helinwang for the recommendation.

@reyoung reyoung added this to Current Week ToDo in Defects board May 16, 2017
@reyoung reyoung moved this from Current Week ToDo to Doing in Defects board May 16, 2017
@lcy-seso (Contributor) left a comment:

Some understanding about Tape.


```
with Tape():
    paddle.train(topology(False))
```
@lcy-seso (Contributor) commented May 16, 2017:

After reading this, I want to share some thoughts and keep discussing with @reyoung.

My understanding

* From this design doc I can understand the core idea of V2's configuration parsing, and the problems that derive from it. Introducing Tape would make the current core problems easier to solve.
* But I still cannot judge whether introducing the Tape concept is strictly necessary.

Why introduce the Tape concept

* My understanding is that, within V2's current BFS approach, the three problems mentioned in this document could also be solved, but each would be patched with a hand-written special-case rule; without the right design, such patching may never end.
* Introducing Tape can solve three problems:
  1. Error messages contain a huge call stack; the real error is drowned out, so once something goes wrong it is very hard to locate the bug.
  2. During the BFS there are currently three kinds of layers with special behavior (logically each is just a layer of the neural network): (1) evaluator; (2) memory; (3) recurrent_layer_group. Each requires hand-written rules to handle.
  3. Special layers not yet discovered, or introduced in the future, may each need their own handling logic, which makes the code hard to maintain.

Advantages of Tape

The advantages of Tape that I can see include:

1. Tape processes the network topology in the same order as the user's intuition when defining the network (that is, the order in which the user defines it in the configuration). So when something goes wrong, it can report a normal error instead of the current "abnormal" ones (users know nothing about the BFS, yet the errors reported now are all traversal errors); error messages become easy to understand.
2. There is no need for complex traversal logic, which simplifies the handling of recurrent_layer_group.

Solving the three problems within the BFS approach

1. Evaluator
   * A neural network is a directed acyclic graph. Evaluators and costs occupy equivalent positions in the network and only appear at its ends. Following the BFS logic, evaluators should be creatable; the only issue is that if they are not explicitly distinguished from costs, we may not know the network's optimization criterion.
   * Why can't we keep using costs (possibly several) to represent the network topology, and explicitly mark evaluators to distinguish them from costs?
2. Memory
   * Memory is a special layer; it is an important concept in recurrent neural networks, has special functionality, and cannot be removed. Memory corresponds to special layers (the various agent layers).
   * A memory has no `inputs`; it does not take other layers' outputs as input.
   * Unlike other layers, a memory's inputs are obtained not through `input` but through (see the sketch after this comment):
     1. `boot_layer`, which specifies the input at time step 0 (already handled)
     2. `name`, which specifies the input from time step 1 onward (not considered during traversal; there are currently bugs where some layers cannot be created)
   * We could directly add a handling rule to fix the kind of error described in item 2 above.
3. recurrent_layer_group
   * In PaddlePaddle, a recurrent_layer_group is a submodel and cannot be split apart; this must be taken into account. Whether V2 currently does so, I am not entirely sure.
   * In PaddlePaddle, recurrent_layer_group can be nested two levels deep (the step function of a recurrent_layer_group may itself be a recurrent_layer_group). V2 has not implemented nesting of recurrent_layer_group; presumably the logic would be very convoluted, though it could certainly be written. I want to take another look at V2's current handling.

My two questions

There are two questions I have not yet fully understood:

1. Does the Tape concept need to be exposed to users? If so, which logical interfaces are exposed? Or is Tape a concept that users never perceive?
2. With Tape, is a BFS pass still needed? Is Tape meant to assist the BFS, or to abandon the BFS pass entirely?
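As referenced in the Memory item above, a hedged sketch of the two implicit connections, assuming the v2 `paddle.layer.memory` signature with a `boot_layer` argument; names and sizes are illustrative:

```python
import paddle.v2 as paddle

# `initial_state` stands in for any layer computing the step-0 state.
initial_state = paddle.layer.data(
    name='initial_state', type=paddle.data_type.dense_vector(128))

mem = paddle.layer.memory(
    name='rnn_state',          # steps >= 1: previous output of 'rnn_state'
    size=128,
    boot_layer=initial_state)  # step 0: the explicitly given boot layer
```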

Author (@reyoung):

> Evaluators and costs occupy equivalent positions in the network

I don't think that is right; an evaluator can actually appear at any position in the neural network.

> `name`, which specifies the input from time step 1 onward (not considered during traversal; there are currently bugs where some layers cannot be created)

If the traversal never reaches a layer with the same name as the memory, there is no other way to reach that layer, because the original implementation records no extra information anywhere; we can only trace back from the output layer.

> Does the Tape concept need to be exposed to users? If so, which logical interfaces are exposed? Or is Tape a concept that users never perceive?

To support Paddle's existing functionality, we do not need to expose Tape to ordinary users.

However, if we implement dynamic neural networks, we would let users clear or reset the tape. The API could be:

```
with Tape():
    ...
```

> With Tape, is a BFS pass still needed? Is Tape meant to assist the BFS, or to abandon the BFS pass entirely?

With Tape, the DFS search is not needed at all.
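A minimal sketch of the tape idea itself, not actual PaddlePaddle code: every layer created while a tape is active records itself, so the whole topology, including evaluators and memories, is captured in definition order and no back-traversal is needed:

```python
class Tape(object):
    current = None  # the active tape, if any

    def __init__(self):
        self.layers = []

    def __enter__(self):
        Tape.current = self
        return self

    def __exit__(self, *exc_info):
        Tape.current = None

def record(layer):
    """Hypothetical hook called by every layer constructor: it appends
    the layer to the active tape in definition order."""
    if Tape.current is not None:
        Tape.current.layers.append(layer)
    return layer
```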

@lcy-seso (Contributor) commented May 18, 2017:

> I don't think that is right; an evaluator can actually appear at any position in the neural network.

* My understanding is this: in current PaddlePaddle, an evaluator, like a cost, never becomes the input of a subsequent layer or evaluator. Evaluators and costs are both nodes of the directed acyclic graph with incoming edges only and out-degree 0, but an evaluator does not take part in optimization and must be specified separately.
* One thing I do not quite understand: "use one variable (currently it looks like a cost) to represent the topology." Why emphasize a single variable?
* If the network has multiple costs, how is the topology represented now?
  * A network may have multiple costs, and evaluators are currently also connected to other layers through `inputs`, with no exception.
  * In my understanding, it would be more reasonable to define the topology by the last layers of the directed acyclic graph (including multiple costs and evaluators); why is it not done that way?
  * Potentially, if a cost could be followed by another cost (e.g., a sum), and an evaluator by another evaluator, that would break the assumption that a cost/evaluator is the last layer.

@emailweixu (Collaborator):

Is anyone writing the code for this now?

@emailweixu emailweixu mentioned this pull request May 26, 2017
@luotao1 (Contributor) commented Feb 1, 2019:
Thanks for contributing to PaddlePaddle! Since V1/V2 will not be maintained anymore, and the related code has been deleted from the develop branch as well, we are closing this PR. Welcome to contribute to Fluid, the latest version of PaddlePaddle.

@luotao1 luotao1 closed this Feb 1, 2019