[WIP] Introduce sentence weighting #1438

Open: wants to merge 6 commits into base: master

francoishernandez commented May 17, 2019

Basic idea: pass an additional file containing sentence weights at preprocessing time.

These weights will be stored in the torchtext Examples and will be used to weight the loss in _compute_loss.
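To illustrate what weighting the loss per sentence means, here is a minimal pure-Python sketch. It is not the actual `_compute_loss` code, which works on tensors; the function name `weighted_batch_loss` and the list-of-lists layout are assumptions made only for this example.

```python
from typing import List


def weighted_batch_loss(token_losses: List[List[float]],
                        weights: List[float]) -> float:
    """Sum per-token losses, scaling each sentence's tokens by its weight.

    Hypothetical helper: token_losses[i] holds the per-token losses of
    sentence i, and weights[i] is the weight stored with that example.
    """
    if len(token_losses) != len(weights):
        raise ValueError("need exactly one weight per sentence")
    total = 0.0
    for sentence_losses, weight in zip(token_losses, weights):
        total += weight * sum(sentence_losses)
    return total


# Two sentences; the second one is down-weighted to 0.5.
losses = [[1.0, 2.0], [4.0]]
print(weighted_batch_loss(losses, [1.0, 0.5]))  # 1.0 * 3.0 + 0.5 * 4.0 = 5.0
```

With all weights equal to 1, this reduces to the usual unweighted sum, which matches the default behaviour described for corpora without a weight file.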

I introduce the -sentence_weights option, which takes one or more text files containing a weight for each sentence / example. If several corpora are passed (following the #1413 changes), a weight file should be passed for each of them. If we only have weights for some of the corpora in the list, we can pass None/none in place of the filename: argparse will cast it to Python None, and a weight of 1 will be assigned to every example of that corpus.
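As a rough sketch of how the None/none handling could look with argparse: the `none_or_str` converter, the `-train_src` option, the `weights_for` helper, and the one-weight-per-line file format are all assumptions for illustration, not the code in this branch.

```python
import argparse
from typing import List, Optional


def none_or_str(value: str) -> Optional[str]:
    """Map the literal strings 'None' / 'none' to Python None (hypothetical)."""
    return None if value in ("None", "none") else value


def weights_for(path: Optional[str], n_examples: int) -> List[float]:
    """Return one weight per example; default to 1 when no file is given."""
    if path is None:
        return [1.0] * n_examples
    # Assumed format: one float per line, aligned with the corpus lines.
    with open(path) as f:
        return [float(line) for line in f]


parser = argparse.ArgumentParser()
parser.add_argument("-train_src", nargs="+", default=[])
parser.add_argument("-sentence_weights", nargs="+",
                    type=none_or_str, default=[None])

# Two corpora, but weights are only available for the first one.
opt = parser.parse_args(["-train_src", "a.src", "b.src",
                         "-sentence_weights", "a.weights", "none"])
print(opt.sentence_weights)   # ['a.weights', None]
print(weights_for(None, 3))   # [1.0, 1.0, 1.0]
```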

I did some tests on basic translation / speech / image runs, and it seems to be running without any issue.

francoishernandez force-pushed the francoishernandez:sentence_weighting branch from c816158 to f6846f6 on May 20, 2019
if maybe_weights is not None:
    weight_shards = split_corpus(maybe_weights, opt.shard_size)
else:
    weight_shards = cycle(iter([cycle(iter([1]))]))

francoishernandez Oct 31, 2019

could be replaced by: repeat(cycle(iter([1])))
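A quick standalone check that the two expressions behave the same way. Both are endless streams of "shards", each of which is an endless stream of weight 1. The `peek` helper is only introduced here for the comparison.

```python
from itertools import cycle, islice, repeat

# Expression currently in the diff.
original = cycle(iter([cycle(iter([1]))]))
# Suggested replacement.
suggested = repeat(cycle(iter([1])))


def peek(shards, n_shards=2, n_weights=3):
    """Take the first few weights from the first few shards."""
    return [list(islice(shard, n_weights))
            for shard in islice(shards, n_shards)]


print(peek(original))   # [[1, 1, 1], [1, 1, 1]]
print(peek(suggested))  # [[1, 1, 1], [1, 1, 1]]
```

In both versions every "shard" is the same inner iterator object, which is harmless here because it is infinite and always yields 1.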
