New indexing #2496

Merged
merged 5 commits into from
Oct 26, 2023

Conversation

vince62s
Member

@vince62s vince62s commented Oct 25, 2023

This PR has the following purposes:

  • Remove the old batch["indices"] key, which was not actually used.
  • Introduce the following keys:
    batch["cid"]: the corpus id of the example
    batch["cid_line_number"]: the line number of the example within that corpus
    These two keys will help retrieve the lines at which a training run stopped, so that training can resume from those lines.
    One other key:
    batch["ind_in_bucket"]: de facto replaces "indices", but represents the index of the example in the current bucket
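
To illustrate how the two corpus keys could support resuming, here is a minimal sketch. It assumes each batch is a plain dict whose "cid" and "cid_line_number" values are parallel Python lists; the real batch layout may differ (for example, tensors). The helper name `last_seen_lines` and the corpus names are hypothetical and not part of this PR.

```python
from typing import Dict, Iterable, List


def last_seen_lines(batches: Iterable[Dict[str, List]]) -> Dict[str, int]:
    """Return, for each corpus id, the highest line number seen so far.

    Hypothetical helper: assumes batch["cid"] and batch["cid_line_number"]
    are parallel lists with one entry per example.
    """
    progress: Dict[str, int] = {}
    for batch in batches:
        for cid, line_no in zip(batch["cid"], batch["cid_line_number"]):
            # Keep the maximum, since examples in a batch may be out of order.
            progress[cid] = max(progress.get(cid, -1), line_no)
    return progress


# Two toy batches mixing examples from two made-up corpora.
batches = [
    {"cid": ["europarl", "news"], "cid_line_number": [0, 0]},
    {"cid": ["europarl", "europarl"], "cid_line_number": [2, 1]},
]
resume_points = last_seen_lines(batches)
```

A resumed run could then skip each corpus forward to its recorded line; how that skipping is wired into the data loader is not covered here.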

As a result, this PR also sorts buckets at inference (previously this was done at training), which speeds up inference. After translation, the translations are reordered using these "ind_in_bucket" values.
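
The sort-then-reorder idea can be sketched as follows. This is an illustration under stated assumptions, not the code from this PR: the bucket is a plain list of source strings, length is counted in whitespace-separated tokens, and `fake_translate` is a hypothetical stand-in for the real model.

```python
from typing import Dict, List


def sort_bucket(bucket: List[str]) -> List[Dict]:
    """Tag each example with its position in the bucket, then sort by length."""
    indexed = [{"src": src, "ind_in_bucket": i} for i, src in enumerate(bucket)]
    # Grouping similar lengths reduces padding per batch.
    return sorted(indexed, key=lambda ex: len(ex["src"].split()))


def fake_translate(example: Dict) -> str:
    """Hypothetical stand-in for the model: upper-cases the source."""
    return example["src"].upper()


def translate_bucket(bucket: List[str]) -> List[str]:
    """Translate in length-sorted order, then restore the original order."""
    sorted_examples = sort_bucket(bucket)
    results = [(ex["ind_in_bucket"], fake_translate(ex)) for ex in sorted_examples]
    # Reorder the translations by their original index in the bucket.
    results.sort(key=lambda pair: pair[0])
    return [text for _, text in results]


bucket = ["a much longer source sentence", "short", "two words"]
output = translate_bucket(bucket)
```

The translations come back in the same order as the input lines, even though the examples were processed shortest first.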

@vince62s vince62s merged commit be13d12 into OpenNMT:master Oct 26, 2023
2 checks passed
@vince62s vince62s deleted the indexing branch November 17, 2023 10:41