
Residual connection fix (and highway layer) #11

Merged
merged 2 commits on May 29, 2017

Conversation

Spotlight0xff
Contributor

Hi again,

This PR is two-fold.

Residual connection

In my opinion, the residual connection in the CBHG has to be applied from the pre-net output, not from the inputs (the character embedding) directly.

There are two arguments that speak for this interpretation:

  • The residual connection is mentioned in section 3.1 on the CBHG module, not in the description of the overall model architecture. This indicates that it is part of the CBHG module, so the connection should be applied from the CBHG's own input, which is the pre-net output.
  • You have used width=256 in the encoder CBHG, but the paper mentions 128. As I understand it, 128 doesn't work if the residual connection comes from the inputs (the 256-dimensional character embedding). But if we take the connection from the pre-net output, it works as described in the paper and we can use 128 in the whole encoder (the last pre-net layer has 128 units); see the sketch below.
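
To make the size argument concrete, here is a minimal TF 1.x sketch of where the residual would be taken from under this reading. The function and variable names are mine for illustration, not the repo's exact code, and the conv1d bank/max-pooling stages are left out:

```python
import tensorflow as tf

def encoder_cbhg_sketch(embedded_inputs, is_training=True):
    """Minimal sketch (TF 1.x): take the residual from the pre-net output,
    not from the 256-dim character embedding, so the encoder can use 128 units
    throughout as in the paper. The conv1d bank and max-pooling are omitted."""
    # Pre-net: 256 -> 128 units, with dropout as in the paper.
    h = tf.layers.dense(embedded_inputs, 256, activation=tf.nn.relu)
    h = tf.layers.dropout(h, rate=0.5, training=is_training)
    prenet_out = tf.layers.dense(h, 128, activation=tf.nn.relu)
    prenet_out = tf.layers.dropout(prenet_out, rate=0.5, training=is_training)

    # Conv1D projections back down to 128 units (conv bank/pooling omitted).
    proj = tf.layers.conv1d(prenet_out, filters=128, kernel_size=3,
                            padding="same", activation=tf.nn.relu)
    proj = tf.layers.conv1d(proj, filters=128, kernel_size=3, padding="same")

    # Residual connection from the pre-net output: both tensors are 128-dim,
    # so no width=256 workaround is needed.
    return proj + prenet_out
```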

Hope this convinces you, if you have any concerns please comment.

Highway-Net

The second part of this PR is aimed at the code quality of the highway-net construction.
In particular, it uses tf.layers.dense instead of the custom dense function, and it defaults to the number of units of the input if not specified otherwise.

Using tf.layers.dense simplifies the code, in my opinion (we could replace the custom dense in the pre-net as well).
Also, I haven't added batch normalization to the highway-net construction: I haven't seen batchnorm used much in fully-connected layers (and didn't notice much difference), and the paper only mentions batch norm for the conv nets.
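
For reference, a rough sketch (not the exact diff) of what a tf.layers.dense-based highway layer looks like, with the default number of units taken from the input's last dimension:

```python
import tensorflow as tf

def highwaynet(inputs, num_units=None, scope="highwaynet"):
    """Rough sketch of a highway layer built on tf.layers.dense.
    If num_units is not given, it defaults to the input's last dimension."""
    if num_units is None:
        num_units = inputs.get_shape().as_list()[-1]
    with tf.variable_scope(scope):
        # Transform path H and gate T. Initializing the gate bias negative
        # (as in the highway-networks paper) makes the layer start out close
        # to an identity mapping; this detail is optional.
        H = tf.layers.dense(inputs, num_units, activation=tf.nn.relu, name="H")
        T = tf.layers.dense(inputs, num_units, activation=tf.nn.sigmoid, name="T",
                            bias_initializer=tf.constant_initializer(-1.0))
        return H * T + inputs * (1.0 - T)
```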

Oh, and I dropped the hard-coded hp.embed_size, so the code is more modular (and it's also wrong IMO; see the first part of this PR).

Cheers,
André

@Kyubyong
Owner

For Residual Connection:
Yes, I think you're right. I also noticed that the layer size doesn't match, which I couldn't understand at that time.

For Highway-Net:
According to the original paper (https://arxiv.org/abs/1502.03167), batch normalization works well for dense layers, too. However, I agree that it doesn't make a big difference.

Thanks so much, André!

@Kyubyong Kyubyong merged commit 9adf6ba into Kyubyong:master May 29, 2017
@ethson

ethson commented May 30, 2017

Have you guys tested these changes and seen an improvement on results?

@Spotlight0xff
Contributor Author

I've made a few changes, including this one (and increased the mini-batch size to 32 as in the paper), and got to about loss=0.15. The samples were still incomprehensible, but I only trained for a few hours (it seemed to converge...).

I guess we need to do more work/debugging.

@onyedikilo

I've also made some changes to the variables, trying to match them to the paper after applying the code as it is in this PR. The loss is around 0.10 at the moment; I'll run it longer to see if it gets better. My batch size is 6, and the learning rate is currently 0.0005.

With André's changes, I think we also need to update the code wherever embed_size is used. It was 256 before and the code is based on that; after changing it to 128, some modifications are needed to reproduce the paper's values for the network.

@Spotlight0xff Spotlight0xff deleted the residual_connection_fix branch May 31, 2017 07:19