training configuration of STR model #11

Closed
tzm-tora opened this issue Dec 15, 2021 · 2 comments

Comments

@tzm-tora

Hi, thank you for open-sourcing your work.
I have a question about the training configuration of the STR model used in your paper.
How did you set the sensitive (case-sensitive character) mode and the data_filtering_off option for the BEST model?

@moonbings
Collaborator

moonbings commented Dec 17, 2021

Hi, thank you for your interest.

We ran our experiments with this code: https://github.com/clovaai/deep-text-recognition-benchmark
In the training phase, we added the --imgW 256 --sensitive options.
In the test phase, we added the --imgW 256 --sensitive --data_filtering_off options.
(With the --sensitive option, the vocab automatically contains 94 characters.)
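
For reference, the 94-character count can be checked directly. This is a minimal sketch, assuming the --sensitive vocab is Python's printable ASCII set with the six whitespace characters removed (digits, letters of both cases, and punctuation); the variable name `character` is only illustrative.

```python
import string

# Assumption: the case-sensitive vocab is printable ASCII minus the
# six trailing whitespace characters in string.printable.
character = string.printable[:-6]

print(len(character))      # 10 digits + 52 letters + 32 punctuation marks
print('\\' in character)   # the backslash is part of this vocab
```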

In addition, we modified the code to work around two minor issues.

Label noise in benchmark dataset

Because of label noise in the benchmark dataset, we modified the code to always evaluate case-insensitively in the validation and test phases. We moved lines 149-150 outside the if statement (to line 146).
(See https://github.com/clovaai/deep-text-recognition-benchmark/blob/68a80fe97943a111ff1efaf52a63ad8f0f1c0e5d/test.py#L149)

pred = pred.lower()
gt = gt.lower()
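
The effect of that change can be sketched as a small normalization step. This is only an illustration of the behavior described here, not the exact code in test.py: the function name `normalize` and its arguments are hypothetical. Lowercasing always happens, while stripping non-alphanumeric characters happens only when both --sensitive and --data_filtering_off are set (the test-phase configuration).

```python
import re


def normalize(pred, gt, sensitive, data_filtering_off):
    """Hypothetical helper: prepare a prediction/label pair for comparison."""
    # Always compare case-insensitively (validation and test).
    pred, gt = pred.lower(), gt.lower()
    # Only in the test configuration: keep alphanumeric characters only.
    if sensitive and data_filtering_off:
        non_alnum = r'[^0-9a-z]'
        pred = re.sub(non_alnum, '', pred)
        gt = re.sub(non_alnum, '', gt)
    return pred, gt


# Validation-style call: case is ignored, punctuation still counts.
print(normalize('Hello!', 'HELLO', sensitive=True, data_filtering_off=False))
# Test-style call: case and punctuation are both ignored.
print(normalize('Hello!', 'HELLO', sensitive=True, data_filtering_off=True))
```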

Data loader bug

Although the vocab contains the \ character, this code excludes data containing it.
We fixed lines 170 and 213 as follows, escaping the vocab characters before building the pattern.
(See https://github.com/clovaai/deep-text-recognition-benchmark/blob/68a80fe97943a111ff1efaf52a63ad8f0f1c0e5d/dataset.py#L170)
(See https://github.com/clovaai/deep-text-recognition-benchmark/blob/68a80fe97943a111ff1efaf52a63ad8f0f1c0e5d/dataset.py#L213)

escaped_chars = re.escape(self.opt.character)
out_of_char = f'[^{escaped_chars}]'
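
A small standalone check shows why the escaping matters. This is a sketch under the assumption that the vocab is `string.printable[:-6]` and that the unescaped pattern was built by interpolating it directly into a negated character class; the variable names are illustrative. Without escaping, the `\` in the vocab is read as an escape for the `]` that follows it, so `\` itself never becomes a member of the class and any label containing it is treated as out-of-vocab.

```python
import re
import string

character = string.printable[:-6]  # assumed 94-character vocab

# Unescaped: the backslash escapes the following ']' inside the class.
unescaped = f'[^{character}]'
# Escaped: every vocab character is taken literally.
escaped = f'[^{re.escape(character)}]'

label = 'a\\b'  # a label containing a single backslash

# A match means "contains an out-of-vocab character", i.e. the sample is dropped.
dropped_before_fix = re.search(unescaped, label) is not None
dropped_after_fix = re.search(escaped, label) is not None

print(dropped_before_fix, dropped_after_fix)
```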

Summary

  • We trained the model in case-sensitive mode.
  • We validated the model in case-insensitive mode.
  • We tested the model in case-insensitive mode, considering only alphanumeric characters.

Thanks.

@tzm-tora
Author

Thank you for your detailed and kind response!
