Support other decoding methods #42

Open
baudm opened this issue Sep 27, 2022 · 2 comments
Labels
enhancement New feature or request

Comments

@baudm
Owner

baudm commented Sep 27, 2022

In the paper, only the two contrasting decoding methods were shown: AR (one character at a time) and non-AR (all characters at once). But in theory, PARSeq supports semi-autoregressive (SAR) decoding and anything in between the two extremes.

Some ideas:

  1. Ensembled AR outputs - use random AR masks (similar to training), say 3 different ones, then greedily choose the character for each position (see the sketch after this list).
  2. Masked cloze refinement - based on the current prediction (can be from AR or NAR), the characters with low confidence are masked out, i.e. final mask = cloze mask & confidence mask. This way, low-confidence characters would not be considered as context for the other characters. In other words, only the low-confidence predictions will be refined further.
  3. SAR decoding - start with... decoding two characters at a time?
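
A minimal sketch of idea 1 (ensembled AR decoding), under some assumptions: `decode_with_perm(image_feats, perm)` is a hypothetical helper that runs a full AR pass following the position order `perm` and returns per-position logits; it is not part of this repository's API, `num_perms` / `max_len` are illustrative values, and averaging the per-permutation distributions is just one possible ensembling rule.

```python
import torch

def ensembled_ar_decode(decode_with_perm, image_feats, num_perms=3, max_len=25):
    """Run AR decoding under several random permutation orders (similar to the
    random AR masks used in training), then greedily pick the most probable
    character for each position from the averaged distributions."""
    all_probs = []
    for _ in range(num_perms):
        perm = torch.randperm(max_len)                # random decoding order for this pass
        logits = decode_with_perm(image_feats, perm)  # (N, max_len, charset_size)
        all_probs.append(logits.softmax(-1))
    avg_probs = torch.stack(all_probs).mean(0)        # ensemble by averaging distributions
    return avg_probs.argmax(-1)                       # greedy choice per position
```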
@FaltingsA

Hello @baudm, thanks for your great work. I have some questions about the decoding methods.

  1. Is the second idea, "Masked cloze refinement", similar to the cloze mask in ABINet?
  2. If I want to try the cloze mask / confidence mask, do I need to re-train the model with the corresponding decoding method?
  3. I notice that PARSeq uses an internal LM, while ABINet uses an external LM. Could permuted language modeling be applied to ABINet's language model part?

Looking forward to your reply!

@baudm
Owner Author

baudm commented Nov 5, 2022

Hi @FaltingsA

  1. The second idea is meant as an improvement over my current implementation of bidirectional (cloze) refinement. For both PARSeq and ABINet's combined model, refinement is modeled as $$\prod_{t=1}^{T} P(y_t \mid \mathbf{y}_{\neq t}, \mathbf{x})$$ where $\mathbf{y}$ is the text label of the image $\mathbf{x}$. I think the refinement process could be further improved by restricting the context to high-confidence characters only, preventing initial wrong predictions from corrupting the refinement (see the sketch below this list).
  2. The cloze mask is already being used when you run iterative refinement. Please see Sec 3.3 of the paper.
  3. It can't, because in PARSeq, as in similar prior work with internal LMs, there is no clear separation between the "vision" and "language" parts of the model.
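
For concreteness, here is a minimal sketch of the confidence-restricted refinement described in point 1. It is a sketch under assumptions: `decode(tokens, attn_mask)` stands in for the model's cloze decoding call (hypothetical signature, not the repository's actual API), and `conf_threshold` is an assumed hyperparameter.

```python
import torch

def confidence_masked_refinement(decode, logits, num_iters=1, conf_threshold=0.9):
    """Iterative refinement where low-confidence characters are hidden from the
    context, i.e. final mask = cloze mask & confidence mask."""
    T = logits.shape[1]  # logits: (N, T, charset_size), initial AR or NAR prediction
    # Standard cloze mask: position t may attend to every position except itself.
    cloze_mask = ~torch.eye(T, dtype=torch.bool)                      # (T, T)
    for _ in range(num_iters):
        probs = logits.softmax(-1)
        conf, tokens = probs.max(-1)                                  # (N, T)
        # Hide low-confidence characters so they are not used as context;
        # they are still re-predicted against the high-confidence ones.
        conf_mask = conf >= conf_threshold                            # (N, T)
        attn_mask = cloze_mask.unsqueeze(0) & conf_mask.unsqueeze(1)  # (N, T, T)
        logits = decode(tokens, attn_mask)
    return logits
```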
