Add lookahead row convolution layer. #2228

Closed
xinghai-sun opened this issue May 22, 2017 · 3 comments

@xinghai-sun (Contributor)

  • Add a lookahead row convolution layer, for both the CPU and GPU versions.
  • Details can be found in the DS2 paper.
  • Please add a design doc here first.
@qingqing01 (Contributor) commented May 25, 2017

Bidirectional RNN models are challenging to deploy in an online, low-latency setting because they operate on an entire sample at once. This row convolution layer is used to build a unidirectional model containing forward-only RNN layers without any loss in accuracy. The layer uses a future context of T steps, which makes it well suited to a deployment system. The details can be found in the papers.

Difference from sequence_conv in PaddlePaddle.

In PaddlePaddle, we can use paddle.layers.context_projection and paddle.layers.fc to do sequence convolution, which is a 1D convolution. The following figure shows this connection. Assume the context length (or filter size) of the convolution kernel is 3, and the hidden size of each time step is d for both the input and output features of this operation. Thus, the weight dimension is 3d x d, and the weights are shared across all time steps.
(figure: text_conv)
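For clarity, here is a minimal NumPy sketch of that connection (not the actual PaddlePaddle implementation); the function name and the `context_start` default are assumptions for illustration:

```python
import numpy as np

def sequence_conv(h, W, context_start=-1):
    """Sequence (1D) convolution as context projection + shared FC.

    h: (T, d) input sequence.
    W: (3*d, d) weight, shared across all time steps.
    For every step t, concatenate the 3 context rows starting at
    t + context_start (zero-padded outside the sequence) into a 3*d
    vector and project it down to d with W.
    """
    T, d = h.shape
    k = W.shape[0] // d                       # context length, 3 here
    padded = np.zeros((T + k - 1, d))
    padded[-context_start:-context_start + T] = h
    out = np.empty((T, d))
    for t in range(T):
        ctx = padded[t:t + k].reshape(-1)     # concatenated context, length 3*d
        out[t] = ctx @ W                      # same 3d x d projection at every step
    return out
```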

Different from the sequence convolution, the row convolution operates row-wise on both the weight (W) and the hidden state (h): each feature column is convolved independently across time. This connection is shown in the following figure. The weight dimension is 3 x d, and the weights are shared across all time steps.

(figure: row_conv_op)
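A corresponding NumPy sketch of the row-oriented operation (again only a sketch, not the real op); for simplicity it computes only the steps that have a full future context, and the boundary is handled below:

```python
import numpy as np

def row_conv_valid(h, W):
    """Row (lookahead) convolution over the steps with full future context.

    h: (T, d) hidden states.
    W: (k, d) weight, shared across all time steps.
    out[t, i] = sum_{j=0..k-1} W[j, i] * h[t + j, i]
    i.e. each feature column is convolved with its own length-k filter.
    """
    T, d = h.shape
    k = W.shape[0]
    out = np.zeros((T - k + 1, d))
    for j in range(k):
        out += W[j] * h[j:j + T - k + 1]      # elementwise, column by column
    return out
```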

How to deal with the boundary.

Assume the context length (or filter size) of the convolution kernel is k. The paper does not mention how to do the lookahead convolution for the last k-1 time steps. I think we can pad k-1 rows of zeros after the last time step of the input feature.
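A sketch of that zero-padding scheme (an assumption, since the paper leaves the boundary unspecified), which keeps the output length equal to T:

```python
import numpy as np

def row_conv(h, W):
    """Row convolution with k-1 zero rows padded after the last time step."""
    T, d = h.shape
    k = W.shape[0]
    padded = np.vstack([h, np.zeros((k - 1, d), dtype=h.dtype)])  # pad at the end
    out = np.zeros((T, d), dtype=h.dtype)
    for j in range(k):
        out += W[j] * padded[j:j + T]         # missing future steps contribute zero
    return out
```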

@xinghai-sun (Contributor, Author)

Great doc! @qingqing01 Could you please add this part to the DS2 design doc? Thanks!

@qingqing01 (Contributor)

@xinghai-sun ok. I'll do it.
