Support other decoding methods #42

Open
baudm opened this issue Sep 27, 2022 · 2 comments
Labels
enhancement New feature or request

Comments

@baudm
Owner

baudm commented Sep 27, 2022

In the paper, only the two contrasting decoding methods were shown: AR (one character at a time) and non-AR (all characters at once). But in theory, PARSeq supports semi-autoregressive (SAR) decoding and anything in between the two extremes.

Some ideas:

  1. Ensembled AR outputs - use random AR masks (similar to training), say 3 different ones, then greedily choose the character for each position (see the sketch after this list).
  2. Masked cloze refinement - based on the current prediction (can be from AR or NAR), the characters with low confidence are masked out, i.e. final mask = cloze mask & confidence mask. This way, low-confidence characters would not be considered as context for the other characters. In other words, only the low-confidence predictions will be refined further.
  3. SAR decoding - start with... decoding two characters at a time?
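
A minimal sketch of idea 1 (ensembled AR decoding), under some assumptions: `decode_with_perm(image_feats, perm)` is a hypothetical helper that runs a full AR pass following the position order `perm` and returns per-position logits; it is not part of this repository's API, `num_perms` / `max_len` are illustrative values, and averaging the per-permutation distributions is just one possible ensembling rule.

```python
import torch

def ensembled_ar_decode(decode_with_perm, image_feats, num_perms=3, max_len=25):
    """Run AR decoding under several random permutation orders (similar to the
    random AR masks used in training), then greedily pick the most probable
    character for each position from the averaged distributions."""
    all_probs = []
    for _ in range(num_perms):
        perm = torch.randperm(max_len)                # random decoding order for this pass
        logits = decode_with_perm(image_feats, perm)  # (N, max_len, charset_size)
        all_probs.append(logits.softmax(-1))
    avg_probs = torch.stack(all_probs).mean(0)        # ensemble by averaging distributions
    return avg_probs.argmax(-1)                       # greedy choice per position
```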
@FaltingsA

Hello @baudm, thanks for your great work. I have some questions about the decoding methods.

  1. Is the second idea, "Masked cloze refinement", similar to the cloze mask in ABINet?
  2. If I want to try the cloze mask / confidence mask, do I need to re-train the model with the corresponding decoding method?
  3. I notice that PARSeq uses an internal LM, while ABINet uses an external LM. Could permuted language modeling be applied to ABINet's language model part?

Looking forward to your reply!

@baudm
Owner Author

baudm commented Nov 5, 2022

Hi @FaltingsA

  1. The second idea is meant as an improvement over my current implementation of bidirectional (cloze) refinement. For both PARSeq and ABINet's combined model, refinement is modeled as $$\prod_{t=1}^{T} P(y_t \mid \mathbf{y}_{\neq t}, \mathbf{x})$$ where $\mathbf{y}$ is the text label of the image $\mathbf{x}$. I think the refinement process could be further improved by restricting the context to high-confidence characters only, preventing initial wrong predictions from corrupting the refinement (see the sketch below this list).
  2. The cloze mask is already being used when you run iterative refinement. Please see Sec 3.3 of the paper.
  3. It can't, because in PARSeq, as in similar prior work with internal LMs, there is no clear separation between the "vision" and "language" parts of the model.
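
For concreteness, here is a minimal sketch of the confidence-restricted refinement described in point 1. It is a sketch under assumptions: `decode(tokens, attn_mask)` stands in for the model's cloze decoding call (hypothetical signature, not the repository's actual API), and `conf_threshold` is an assumed hyperparameter.

```python
import torch

def confidence_masked_refinement(decode, logits, num_iters=1, conf_threshold=0.9):
    """Iterative refinement where low-confidence characters are hidden from the
    context, i.e. final mask = cloze mask & confidence mask."""
    T = logits.shape[1]  # logits: (N, T, charset_size), initial AR or NAR prediction
    # Standard cloze mask: position t may attend to every position except itself.
    cloze_mask = ~torch.eye(T, dtype=torch.bool)                      # (T, T)
    for _ in range(num_iters):
        probs = logits.softmax(-1)
        conf, tokens = probs.max(-1)                                  # (N, T)
        # Hide low-confidence characters so they are not used as context;
        # they are still re-predicted against the high-confidence ones.
        conf_mask = conf >= conf_threshold                            # (N, T)
        attn_mask = cloze_mask.unsqueeze(0) & conf_mask.unsqueeze(1)  # (N, T, T)
        logits = decode(tokens, attn_mask)
    return logits
```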
