Incorrect topological parsing with memory-layer referencing. #2061

Closed

xinghai-sun opened this issue May 8, 2017 · 2 comments

@xinghai-sun (Contributor)

It seems that the PaddlePaddle V2 APIs only consider explicit layer connections (made through the "input" argument) when parsing the network topology, and ignore the fact that memory-layer referencing (through the "name" argument of paddle.layer.memory) should also be treated as an implicit connection. As a result, a layer whose output is referenced only by a memory layer, and is not explicitly connected to any final cost/output layer, is never created during the backward traversal of the topological graph.

Here is a simple example:

import paddle.v2 as paddle

def main():
    hidden_size = 128
    dict_size = 30000
    paddle.init(use_gpu=False, trainer_count=1)

    words = paddle.layer.data(
        name="words",
        type=paddle.data_type.integer_value_sequence(dict_size))
    next_words = paddle.layer.data(
        name='next_words',
        type=paddle.data_type.integer_value_sequence(dict_size))

    def recurrent_step(embedding):
        last_memory = paddle.layer.memory(name="memory", size=hidden_size)
        # memory_update is referenced only implicitly, through the memory
        # named "memory"; it is not connected to the cost layer explicitly.
        memory_update = paddle.layer.fc(
            name="memory", input=[last_memory, embedding], size=hidden_size)
        # predict reads last_memory (the previous step's memory), not
        # memory_update, so memory_update never lies on a path to the cost.
        predict = paddle.layer.fc(
            input=[embedding, last_memory],
            size=dict_size,
            act=paddle.activation.Softmax())
        return predict

    predict_seq = paddle.layer.recurrent_group(
        step=recurrent_step,
        input=[paddle.layer.embedding(input=words, size=hidden_size)])
    cost = paddle.layer.classification_cost(
        input=predict_seq, label=next_words)

    parameters = paddle.parameters.create(cost)
    optimizer = paddle.optimizer.Adam(learning_rate=5e-5)
    trainer = paddle.trainer.SGD(
        cost=cost, parameters=parameters, update_equation=optimizer)

if __name__ == '__main__':
    main()

It fails with the following error:

Traceback (most recent call last):
  File "bug.py", line 39, in <module>
    main()
  File "bug.py", line 32, in main
    parameters = paddle.parameters.create(cost)
  File "/usr/local/lib/python2.7/site-packages/paddle/v2/parameters.py", line 19, in create
    topology = Topology(layers)
  File "/usr/local/lib/python2.7/site-packages/paddle/v2/topology.py", line 69, in __init__
    layers, extra_layers=extra_layers)
  File "/usr/local/lib/python2.7/site-packages/paddle/v2/layer.py", line 96, in parse_network
    return __parse__(__real_func__)
  File "/usr/local/lib/python2.7/site-packages/paddle/trainer_config_helpers/config_parser_utils.py", line 32, in parse_network_config
    config = config_parser.parse_config(network_conf, config_arg_str)
  File "/usr/local/lib/python2.7/site-packages/paddle/trainer/config_parser.py", line 3597, in parse_config
    trainer_config()
  File "/usr/local/lib/python2.7/site-packages/paddle/v2/layer.py", line 89, in __real_func__
    real_output = [each.to_proto(context=context) for each in output_layers]
  File "/usr/local/lib/python2.7/site-packages/paddle/v2/config_base.py", line 109, in to_proto
    context=context)
  File "/usr/local/lib/python2.7/site-packages/paddle/v2/config_base.py", line 116, in to_proto
    ret_val = self.to_proto_impl(**kwargs)
  File "/usr/local/lib/python2.7/site-packages/paddle/v2/layer.py", line 398, in to_proto_impl
    RecurrentLayerGroupEnd(name=self.__recurrent_name__)
  File "/usr/local/lib/python2.7/site-packages/paddle/trainer/config_parser.py", line 419, in RecurrentLayerGroupEnd
    layer = g_layer_map[pair.layer_name]
KeyError: u'memory@__recurrent_group_0__'

I think this is because the memory_update layer is never created, so PaddlePaddle cannot find any layer named "memory" to resolve the reference made by last_memory. The likely reason is that memory_update is not explicitly connected to the cost layer, which misleads PaddlePaddle into skipping it when creating layers.

However, memory_update is in fact connected to the cost layer of the next time step, indirectly, through the paddle.layer.memory component, and of course it should never be ignored.

I suspect that any recurrent model whose cost layer depends on the previous-step memory rather than the just-updated current-step memory will hit the same problem, because the layer that updates the memory then has no connection to the cost layer within the current time step.

To verify this, I changed a single line so that the cost layer depends on the current-step memory instead of the previous-step memory: replacing last_memory with memory_update in predict's inputs (so that memory_update is explicitly connected to the final cost). With this change the model works fine.

From

        predict = paddle.layer.fc(
            input=[embedding, last_memory],
            size=dict_size,
            act=paddle.activation.Softmax())

to

        predict = paddle.layer.fc(
            input=[embedding, memory_update],
            size=dict_size,
            act=paddle.activation.Softmax())
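
For reference, the whole recurrent_step after this one-line change (identical to the reproduction code above except for predict's input):

    def recurrent_step(embedding):
        last_memory = paddle.layer.memory(name="memory", size=hidden_size)
        memory_update = paddle.layer.fc(
            name="memory", input=[last_memory, embedding], size=hidden_size)
        # memory_update now lies on an explicit path to the cost layer, so it
        # is created during topology parsing and the memory named "memory"
        # can be resolved.
        predict = paddle.layer.fc(
            input=[embedding, memory_update],
            size=dict_size,
            act=paddle.activation.Softmax())
        return predict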

A Neural Turing Machine model that reads first and writes next (rather than the reverse) will also hit this problem. Demos such as vanilla LSTM/GRU do not run into it only because their cost or softmax output happens, luckily, to depend on the updated memory (hidden state or cell state) instead of the previous memory.

Besides, this problem did not exist in the V1 APIs.

Is this a bug? Could anyone help solve this issue?

@xinghai-sun (Contributor, Author)

I also tried adding one line of code (as suggested by qingqing01 and jacquesqiao):

memory_last.append_child(memory_update, parent_names=[memory_last.name])

But the same error still occurs.

@xinghai-sun xinghai-sun changed the title Incorrect topological parsing with memory-layer referencing (with implicit connection)? Incorrect topological parsing with memory-layer referencing. May 9, 2017
@lcy-seso lcy-seso added the Bug label May 9, 2017
@wwhu (Contributor) commented May 9, 2017

I encountered the same problem with scheduled sampling.
During training, scheduled sampling needs to remember the word predicted at the previous time step, so I tried to use a memory layer to hold the predicted word inside the recurrent group.

    def gru_decoder_with_attention_train(enc_vec, enc_proj, true_word, true_token_flag):
        generated_word_memory = paddle.layer.memory(
            name='generated_word', size=target_dict_dim, boot_with_const_id=0)

        # embedding lookup and GRU state update (code omitted here)
        ......

        # calculate the softmax output
        with paddle.layer.mixed(
                size=target_dict_dim,
                bias_attr=True,
                act=paddle.activation.Softmax()) as out:
            out += paddle.layer.full_matrix_projection(input=gru_step)

        paddle.layer.max_id(input=out, name='generated_word')

        return out

paddle.layer.max_id is used only to update the memory and does not appear in the topological graph.
The above code fails with KeyError: u'generated_word@decoder_group'.

To work around this, I used the softmax output itself as the memory layer and then applied max_id to extract the generated word. Since the softmax output is used to compute the cost, it appears in the final topological graph. This approach works.
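
Roughly, the step function of this working version looks like the sketch below (not runnable as-is: the embedding/GRU update and the memory boot settings are still omitted, and the layer name generated_prob is just a placeholder):

    def gru_decoder_with_attention_train(enc_vec, enc_proj, true_word, true_token_flag):
        # Remember the previous step's softmax distribution instead of the
        # word id; the remembered layer is the same one that feeds the cost.
        last_prob = paddle.layer.memory(
            name='generated_prob', size=target_dict_dim)

        # Recover the previously generated word from the remembered distribution.
        generated_word = paddle.layer.max_id(input=last_prob)

        # embedding lookup and GRU state update using generated_word (code omitted)
        ......

        # The softmax output carries the name referenced by the memory above
        # and is also returned as the step output, so it stays in the topology.
        with paddle.layer.mixed(
                name='generated_prob',
                size=target_dict_dim,
                bias_attr=True,
                act=paddle.activation.Softmax()) as out:
            out += paddle.layer.full_matrix_projection(input=gru_step)

        return out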

@emailweixu emailweixu mentioned this issue May 26, 2017
@luotao1 luotao1 closed this as completed Jun 2, 2017