Text normalization takes too much time for a string which contains a lot of dates #3451

Merged
8 commits merged into main from fix/normalization_of_dates
Jan 18, 2022

Conversation


@PeganovAnton PeganovAnton commented Jan 16, 2022

This PR reduces the number of permutations processed in this loop.

  1. Count the number of permutations that each token produces.
  2. Split the token sequence into smaller parts so that the number of permutations from any one part does not exceed a set limit.
  3. Verbalize each part.
  4. Join the verbalizations of the parts.
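
The four steps above can be sketched roughly as follows. This is an illustrative sketch only, not the code from this PR: the function names `split_by_permutation_budget` and `verbalize_in_parts`, the `is_valid` callback, and the representation of a token as a list of candidate strings are all assumptions made for the example. It assumes the permutation count of a part is the product of the per-token counts and uses a simple greedy split.

```python
from itertools import product
from typing import Callable, List, Sequence


def split_by_permutation_budget(counts: Sequence[int], max_permutations: int) -> List[List[int]]:
    """Greedily group token indices so that the product of the per-token
    permutation counts in each group stays within max_permutations.
    A single token that alone exceeds the budget gets its own group."""
    parts: List[List[int]] = []
    current: List[int] = []
    current_product = 1
    for index, count in enumerate(counts):
        if current and current_product * count > max_permutations:
            parts.append(current)
            current, current_product = [], 1
        current.append(index)
        current_product *= count
    if current:
        parts.append(current)
    return parts


def verbalize_in_parts(
    token_options: Sequence[Sequence[str]],
    is_valid: Callable[[str], bool],
    max_permutations: int,
) -> str:
    """Verbalize each part separately and join the results.
    token_options[i] holds the hypothetical candidate forms of token i."""
    counts = [len(options) for options in token_options]  # step 1
    pieces = []
    for part in split_by_permutation_budget(counts, max_permutations):  # step 2
        # Step 3: search only the permutations of this part.
        for candidate in product(*(token_options[i] for i in part)):
            text = " ".join(candidate)
            if is_valid(text):
                pieces.append(text)
                break
        else:
            raise ValueError(f"no valid verbalization for tokens {part}")
    return " ".join(pieces)  # step 4


if __name__ == "__main__":
    options = [["jan", "january"], ["5", "fifth"], ["2022", "twenty twenty two"]]
    has_no_digits = lambda text: not any(ch.isdigit() for ch in text)
    print(verbalize_in_parts(options, has_no_digits, max_permutations=4))
```

With a budget of 4, the three two-way tokens in the usage example are searched as two parts (at most 4 + 2 candidates) instead of one joint search over 8. The saving grows quickly for long strings, because the joint search space is a product while the split search space is a sum.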

Closes #3450

Signed-off-by: PeganovAnton <peganoff2@mail.ru>
@PeganovAnton PeganovAnton added the bug Something isn't working label Jan 16, 2022
@PeganovAnton PeganovAnton self-assigned this Jan 16, 2022
@PeganovAnton PeganovAnton marked this pull request as draft January 16, 2022 15:30
@PeganovAnton PeganovAnton marked this pull request as ready for review January 16, 2022 18:07
@yzhang123 yzhang123 requested a review from ekmb January 18, 2022 17:16

@yzhang123 yzhang123 left a comment

Thanks!

@yzhang123 yzhang123 merged commit bc3da74 into main Jan 18, 2022
@yzhang123 yzhang123 deleted the fix/normalization_of_dates branch January 18, 2022 19:08
@yzhang123 yzhang123 removed the request for review from ekmb January 18, 2022 19:08
nithinraok pushed a commit that referenced this pull request Feb 2, 2022
…lot of dates (#3451)

* Fix RANK env variable check in global rank check

Signed-off-by: PeganovAnton <peganoff2@mail.ru>

* fix: split complex token sequences

Signed-off-by: PeganovAnton <peganoff2@mail.ru>

* fix: docstring improvements

Signed-off-by: PeganovAnton <peganoff2@mail.ru>

* Fix code style

Signed-off-by: PeganovAnton <peganoff2@mail.ru>

* Remove leading space

Signed-off-by: PeganovAnton <peganoff2@mail.ru>

* Increase max allowed permutations

Signed-off-by: PeganovAnton <peganoff2@mail.ru>

Co-authored-by: Yang Zhang <yzhang123@users.noreply.github.com>