
metaphor between programming language and refactored Paddle #3570

Closed
reyoung opened this issue Aug 18, 2017 · 6 comments

@reyoung (Collaborator) commented Aug 18, 2017

I think our design is very close to a functional programming language, for the following reasons:

  1. The basic element of the refactored Paddle is the Operator, which is essentially a function without side effects.
  2. Every Paddle Operator returns one or more values, including if-else and while.
  3. Paddle's computation is deferred and lazy: Operators are defined first, and when some output is requested, only the Operators that output depends on are run; unrelated Operators are not processed (see the sketch below).

Essentially every functional programming language, e.g. Lisp, Haskell, or Scala, has these three properties. Accordingly, concepts from programming languages, especially functional ones, map to the refactored Paddle as follows.
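
To make the third point concrete, here is a minimal deferred-evaluation sketch in Scala. The names (Expr, Const, Add, Mul, LazyDemo) are illustrative only, not Paddle classes: the graph is merely described up front, and evaluating one output runs just the Operators it depends on.

sealed trait Expr { def eval: Int }

case class Const(v: Int) extends Expr {
  def eval: Int = v
}

case class Add(a: Expr, b: Expr) extends Expr {
  def eval: Int = { println("running Add"); a.eval + b.eval }
}

case class Mul(a: Expr, b: Expr) extends Expr {
  def eval: Int = { println("running Mul"); a.eval * b.eval }
}

object LazyDemo extends App {
  val x      = Const(2)
  val sum    = Add(x, Const(3))   // defined here, but nothing runs yet
  val unused = Mul(x, Const(9))   // never requested, so never executed
  println(sum.eval)               // only Add and its inputs run, printing 5
}

Only requesting sum.eval triggers any computation, and unused is never run, which is exactly the lazy behavior described in point 3.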

All Ops return Expressions

An important difference between imperative and functional programming languages is that every line in a functional language is an Expression, while every line in an imperative language is a Statement. The difference between the two is that an Expression always has a return value.

The functional style is especially well suited to describing operations such as RNN and IfElseOp. For example, the following if-else Scala code

var hidden: Int = 39
// The whole if-else is an expression: both branches yield an Int.
hidden = if (hidden % 2 == 0) {
  var tmp = hidden / 2
  tmp = tmp * tmp
  tmp               // value of the block, returned by this branch
} else {
  hidden + 1
}

and the following for-loop code (an RNN analogue)

val x1 = Array(1, 2, 3)
val x2 = Array(4, 5, 6)

// `for ... yield` is an expression: it collects the per-step results
// into a single Array, just as an RNN gathers its step outputs.
val rnn = for (all <- x1 zip x2) yield {
  all._1 * all._2
}

both match the logic of our If-Else and RNNOp. That is, the two branches of If-Else process data separately, but in the end the two jointly return a single value; RNNOp likewise combines its final outputs into a single returned value.

StepScope and IfElse Scope

As the Scala code above shows, each time step of an RNN and each branch of an IfElse has its own child Scope, and these Scopes inherit from the outer Scope. In Scala terms, every {} introduces a new variable scope, while the inner scope can still access variables of the outer one.
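
For instance, here is a minimal Scala sketch of that scoping rule (plain language scoping, not Paddle's Scope class):

val outer = 10

val result = {
  val inner = outer + 1   // the inner scope can read the outer variable
  inner * 2               // the block's value, bound to `result`
}
// `inner` is no longer visible here; only `result` escapes the block.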

Note, however, that NetOp does not necessarily correspond one-to-one to a Block in a programming language. A NetOp's purpose is to group several commonly co-occurring operations together, so it is more like a function defined via a macro. For example:

// Macro for a fully connected (FC) layer
#define FC(input) \
  sigmoid(rowwise_add(mul_op(input, 'W'), 'b'))

Whether a new Block is created depends only on whether a new Scope is created. RNN and IfElse both create a new Scope by default, just as for-loops and if-else statements are followed by curly braces. But where other curly braces go is not something this framework should decide.

@wangkuiyi (Collaborator) commented Aug 21, 2017

Turing-complete

I agree that our system is similar to functional programming languages. The reason is simple -- our system should be able to represent any Turing-complete computation, just like a programming language.

A Super Programming Language

More than that, PaddlePaddle must be able to create and run a backward pass for any computation described by our users.

The latter requirement implies that the execution of a PaddlePaddle application program requires not only a stack, but a stack forest, or the Scope-hierarchy -- the one created in #3116.

Block, instead of NetOp

It also suggests that we rename the class operators::NetOp to framework::Block. A block is like a segment of C++/Java code enclosed in a pair of curly braces, in which local variables are defined:

{
  int tmp1 = 10, tmp2 = 20;
  printf("%d", tmp1 + tmp2);
}

or a let-block in Lisp:

(let ((tmp1 10) (tmp2 20))
  (+ tmp1 tmp2))

Scope-hierarchy, instead of a Stack

In order to execute a program, the runtime needs to maintain a stack -- when the execution enters a block, a frame is pushed to hold block-local variables, and when the execution leaves the block, the frame is popped.

In order to execute a DL program, we cannot pop stack frames, because they are needed for the backward computation. With old frames never popped but new frames still being created, we get a stack forest, which can be represented by the Scope-hierarchy.
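
As a rough illustration, here is a toy Scala sketch of such a Scope hierarchy; the names (Scope, newChild, set, get) are hypothetical, not the actual class from #3116. Every Scope keeps a reference to its parent, variable lookup falls back to the enclosing Scope, and child Scopes are retained instead of being popped:

import scala.collection.mutable

// A toy "stack forest": frames (Scopes) link to their parent and are never popped.
class Scope(val parent: Option[Scope] = None) {
  private val vars     = mutable.Map.empty[String, Float]
  private val children = mutable.ListBuffer.empty[Scope]

  def newChild(): Scope = {
    val c = new Scope(Some(this))
    children += c                  // kept alive for the backward pass
    c
  }

  def set(name: String, v: Float): Unit = vars(name) = v

  // Lookup falls back to the enclosing Scope, like lexical scoping.
  def get(name: String): Option[Float] =
    vars.get(name).orElse(parent.flatMap(_.get(name)))
}

object ScopeDemo extends App {
  val global = new Scope()
  global.set("W", 0.5f)
  val step0 = global.newChild()    // e.g. one child Scope per RNN time step
  step0.set("h", 1.0f)
  println(step0.get("W"))          // Some(0.5): resolved in the parent Scope
}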

@QiJune (Member) commented Aug 21, 2017

One question: in functional languages, variables are immutable, whereas in a neural network the values of variables are updated repeatedly across iterations.

@Superjomn (Contributor)

Could we analyze the Python syntax of TensorFlow and PyTorch to see whether functional programming is really necessary, and if it is, how they achieve it?

As far as I know, tf has tf.scope to achieve a scope-like effect.

TensorFlow, PyTorch, and MXNet share some common ground in how models are written, e.g. the use of Variable and ops. Perhaps we can summarize and borrow from them; our design should not differ too much from theirs, which would also give users a general and familiar interface.

@reyoung @wangkuiyi @QiJune

@wangkuiyi (Collaborator) commented Aug 22, 2017

@QiJune As I see it, variables in a neural network are not updated repeatedly. Quite the opposite: during the forward pass the variables are assigned once, and during the backward pass another set (the gradient variables) is assigned once. In this respect it is the same as in a functional programming language.
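
A tiny Scala sketch of that reading, assuming a scalar model y = w * x (all names here are made up for illustration): each pass binds its values exactly once, and the parameter update appears as a new binding rather than a mutation.

// One training iteration in functional style: every value is bound exactly once.
val w0    = 0.5f                // parameter entering this iteration
val x     = 3.0f
val y     = w0 * x              // forward pass: `y` is assigned once
val gradW = x                   // backward pass: dy/dw, assigned once
val w1    = w0 - 0.1f * gradW   // the "update" is a new binding for the next iteration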

@wangkuiyi (Collaborator)

@Superjom Sounds good. Would you take a look?

@JiayiFeng (Collaborator)

@wangkuiyi I think what @QiJune means is that the parameters are updated in every mini-batch iteration. That is, the gradient variables produced by the computation are in the end used to update the original variables.
