multitask training cost calculation is wrong (does not affect training) #1757

helinwang · 2017-04-07T01:57:18Z

One can do multitask training of cost = e0*cost_a + e1*cost_b by:

cost_a = paddle.v2.layer.cross_entropy(name='cost_a', input=out1, coeff=0.3) 
cost_b = paddle.v2.layer.cross_entropy(name='cost_b', input=out2, coeff=0.5)
trainer = paddle.v2.trainer.SGD(cost=[cost_a, cost_b], ...)
trainer.train(...)

This gives the correct result in training (the backward pass), but when the cost is calculated in the forward pass, the coefficients e0 and e1 are not used. So the reported cost is wrong.
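
To make the mismatch concrete, here is a minimal pure-Python sketch that does not call Paddle at all. The function names `intended_cost` and `reported_cost` and the numbers are hypothetical, and the sketch assumes the forward pass simply adds the raw per-task costs; the issue only states that the coefficients are not used.

```python
# Hypothetical per-task costs for one batch (made-up numbers).
cost_a, cost_b = 2.0, 4.0
# Coefficients, matching the coeff= values in the snippet above.
e0, e1 = 0.3, 0.5


def intended_cost(a, b):
    """Weighted objective: cost = e0*cost_a + e1*cost_b."""
    return e0 * a + e1 * b


def reported_cost(a, b):
    """Assumed forward-pass behavior: the coefficients are ignored."""
    return a + b


print("intended:", intended_cost(cost_a, cost_b))
print("reported:", reported_cost(cost_a, cost_b))
```

The two values differ whenever a coefficient is not 1, which is why the logged cost can look wrong even though training itself is unaffected.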

qingqing01 · 2017-04-07T02:02:14Z

It would be better to print every cost, such as cost_a and cost_b. That helps with analyzing the problem.

reyoung · 2017-04-07T02:44:23Z

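The cost in trainer.train should not accept multiple costs; it should accept only one cost.

Because:

So-called multiple costs are just a combination of several costs, and there are many possible ways to combine them. The most common is a weighted sum.
A cost is a scalar produced at the end of the computation, and backward can start from that scalar. Anything that is a scalar should be able to run backward. In other words, the final objective of a neural network is to minimize a single scalar.

To combine multiple costs, one can use: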

cost_a = paddle.layer.xe(name="cost_a", ...)
cost_b = paddle.layer.xe(...)

cost_sum = paddle.layer.add_to(cost_a, cost_b)  # We could overload the '+' and '*' operators so that users can simply write `cost_sum = 0.3 * cost_a + 0.5 * cost_b`, which would be nicer.

paddle.train(cost=cost_sum, ...)

To print a particular cost, one can use:

cost_a = paddle.layer.xe(...)
paddle.evaluators.print(cost_a)
...

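The advantages of this approach are:

The logic becomes very clear: any Layer that returns a scalar can be optimized. (A scalar means Layer.size == 1.)
Combining costs is left entirely to the user, which gives more flexibility.

The problem with this approach is:

Paddle currently does not support this style of cost design. The CostLayer concept should be removed from Paddle entirely, so that every Layer is an ordinary Layer.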

qingqing01 · 2017-04-07T02:51:24Z

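The backward pass of a cost layer is handled differently from that of an ordinary layer, so whether the following works without any code changes needs to be verified.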

cost_sum = paddle.layer.add_to(cost_a, cost_b)

helinwang · 2017-04-07T02:54:09Z

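@reyoung Agreed, I also think this is simpler and clearer. I strongly agree with:

The cost in trainer.train should not accept multiple costs; it should accept only one cost.

I think paddle.layer.add_to could be renamed to make it clearer that the inputs must all be scalars, for example paddle.layer.sum_scalar(). We also need to think about how the interface takes the coefficients. What I came up with (not necessarily good) is: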

cost_sum = paddle.layer.sum_scalar((cost_a, 5), cost_b)

lcy-seso · 2017-04-07T02:55:24Z

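One more issue: with a sum_cost-style summation, the log only shows the cost after summation.

From a model-training perspective, when multiple costs are attached, users want to see the value of each cost.
In multi-task learning you need to check how each task is learning. If one task's cost falls while another's does not, the total cost still falls, but that may not be a good situation.
Printing each task's cost shows whether the two tasks stay balanced during training.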
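
A small pure-Python sketch of the kind of log line meant here, with each task's cost shown next to the weighted total. The helper name `format_costs`, the weights, and the cost history are all made up for illustration; nothing here is a Paddle API.

```python
def format_costs(weights, costs):
    """Return one log line with each task's cost and the weighted total."""
    total = sum(weights[name] * costs[name] for name in costs)
    parts = ["%s=%.4f" % (name, costs[name]) for name in sorted(costs)]
    parts.append("total=%.4f" % total)
    return " ".join(parts)


weights = {"cost_a": 0.3, "cost_b": 0.5}
# Made-up history: cost_a falls while cost_b stays flat.
history = [
    {"cost_a": 2.0, "cost_b": 1.0},
    {"cost_a": 1.0, "cost_b": 1.0},
]
for step, costs in enumerate(history):
    print(step, format_costs(weights, costs))
```

In this made-up history the total falls from one step to the next even though cost_b has not moved, which only the per-task columns reveal.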

lcy-seso · 2017-04-07T02:57:07Z

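A layer like add_to in paddle cannot be used as the last layer, can it? Do we need to write a new add_cost layer?

add_to needs another layer to pass gradients back to it, and it then passes that gradient on equally to each input. Only a cost layer produces an error, so the current add_to layer probably cannot add costs directly.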

reyoung · 2017-04-07T03:00:53Z

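I think paddle.layer.add_to could be renamed to make it clearer that the inputs must all be scalars, for example paddle.layer.sum_scalar(). We also need to think about how the interface takes the coefficients. What I came up with, not necessarily good, is:

Actually, we could overload the operators, so that what users really write is: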

paddle.train(cost=0.5*cost_1 + 0.3*cost_2, ...)
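
As a rough illustration of the operator-overloading idea, here is a minimal pure-Python sketch. `ScalarCost` is a hypothetical stand-in for a scalar-valued layer, not a Paddle class; it only records the weights, which is the part the overloaded operators would need to capture.

```python
class ScalarCost:
    """Hypothetical stand-in for a layer whose output is a single scalar."""

    def __init__(self, terms):
        # terms: list of (weight, cost_name) pairs making up this cost.
        self.terms = list(terms)

    @classmethod
    def named(cls, name):
        return cls([(1.0, name)])

    def __mul__(self, factor):
        # Scaling a cost scales every term's weight.
        return ScalarCost([(w * factor, n) for w, n in self.terms])

    __rmul__ = __mul__  # allows 0.5 * cost as well as cost * 0.5

    def __add__(self, other):
        # Adding two costs concatenates their weighted terms.
        return ScalarCost(self.terms + other.terms)

    def evaluate(self, values):
        """Weighted sum, given a dict mapping cost names to values."""
        return sum(w * values[n] for w, n in self.terms)


cost_1 = ScalarCost.named("cost_1")
cost_2 = ScalarCost.named("cost_2")
total = 0.5 * cost_1 + 0.3 * cost_2
print(total.terms)
print(total.evaluate({"cost_1": 2.0, "cost_2": 10.0}))
```

Because the combined object still carries the individual weights, the same expression could in principle drive both the forward cost and the backward pass.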

reyoung · 2017-04-07T03:02:01Z

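A layer like add_to in paddle cannot be used as the last layer, can it? Do we need to write a new add_cost layer?

I think this is a problem with Paddle.

A neural network should be able to optimize any scalar; as long as the last layer is a scalar, it can be optimized.

I think Paddle's implementation should not have the concept of a Cost Layer. As long as the last layer is a scalar, backward should work correctly.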

reyoung · 2017-04-07T03:02:49Z

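From a model-training perspective, when multiple costs are attached, users want to see the value of each cost.

If you really need the output, you can attach a print evaluator. If a particular Layer must always be printed, a PrintEvaluator can be attached to it by default.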

pengwangucla · 2017-05-04T21:12:36Z

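If you really need the output, you can attach a print evaluator. If a particular Layer must always be printed, a PrintEvaluator can be attached to it by default.

So for now we still have to attach it ourselves, right?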

xinghai-sun closed this as completed Aug 2, 2017

hong19860320 mentioned this issue Aug 27, 2019

Detailed explanation of paddle.trainer.SGD in paddle v2 #19437

Closed

heavengate pushed a commit to heavengate/Paddle that referenced this issue Aug 16, 2021

Remove CN comments in roadsign config. (PaddlePaddle#1791)

58f7e85

* Update QUICK_STARTED_cn.md (PaddlePaddle#1757) fix typo * cherry-pick PaddlePaddle#1757 and PaddlePaddle#1473, test=document_fix
