Read from STDIN #30

Open
mjpost opened this issue Oct 18, 2021 · 6 comments
Labels
enhancement New feature or request

Comments

@mjpost
Contributor

mjpost commented Oct 18, 2021

🚀 Feature

It would be really nice if COMET could read input from STDIN, e.g.,

# three fields triggers comet-ref
$ paste source.txt hyps.txt ref.txt | comet [args]

# two fields -> comet-src
$ paste source.txt hyps.txt | comet [args]

Motivation

This is consistent with standard UNIX usage. It is also slightly less cumbersome, and allows comet to be used in settings without writing files to disk.
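A minimal sketch of the field-count dispatch proposed above, assuming tab-separated input as produced by `paste`. The helper name `read_tsv_segments` and the mode labels `"ref"` and `"src"` are hypothetical and not part of COMET.

```python
import io


def read_tsv_segments(stream):
    """Read tab-separated lines and infer the scoring mode from the field count.

    Hypothetical helper: three fields (source, hypothesis, reference) select a
    reference-based mode, two fields (source, hypothesis) a source-only mode.
    """
    rows = [line.rstrip("\n").split("\t") for line in stream]
    if not rows:
        raise ValueError("no input received")
    width = len(rows[0])
    if width not in (2, 3):
        raise ValueError(f"expected 2 or 3 tab-separated fields, got {width}")
    # Every line must have the same number of fields as the first one.
    for number, row in enumerate(rows, start=1):
        if len(row) != width:
            raise ValueError(f"line {number}: expected {width} fields, got {len(row)}")
    mode = "ref" if width == 3 else "src"
    return mode, rows


# Stand-in for the output of `paste source.txt hyps.txt ref.txt`.
demo = io.StringIO("Hallo\tHello\tHello\nWelt\tWorld\tWorld\n")
mode, rows = read_tsv_segments(demo)
print(mode, len(rows))
```

In a command-line tool, `sys.stdin` would be passed in place of the `StringIO` object.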

@mjpost mjpost added the enhancement New feature or request label Oct 18, 2021
@ricardorei
Collaborator

I'll investigate this. It seems like a good feature.

Do you have support for this in sacrebleu?

If so, I can start by looking at the sacrebleu implementation.

@mjpost
Contributor Author

mjpost commented Oct 19, 2021

Sacrebleu supports STDIN for the system output, but not for the ref (and doesn’t use the source). So the COMET-style use would be

$ cat system.txt | comet -s source.txt -r ref.txt

which is different from what I proposed.
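For comparison, a rough sketch of the sacrebleu-style variant, where only the system output arrives on STDIN and the source and reference are already loaded from files. The function name `build_samples` is hypothetical, and the `"src"`/`"mt"`/`"ref"` dictionary layout is an assumption about what the scorer expects.

```python
import io


def build_samples(hyp_stream, sources, references):
    """Pair hypotheses read from a stream with source and reference segments."""
    hyps = [line.rstrip("\n") for line in hyp_stream]
    # All three inputs must be aligned segment by segment.
    if not (len(hyps) == len(sources) == len(references)):
        raise ValueError(
            f"segment counts differ: {len(hyps)} hypotheses, "
            f"{len(sources)} sources, {len(references)} references"
        )
    return [
        {"src": src, "mt": mt, "ref": ref}
        for src, mt, ref in zip(sources, hyps, references)
    ]


# Stand-in for `cat system.txt | comet -s source.txt -r ref.txt`.
samples = build_samples(
    io.StringIO("Hello\nWorld\n"), ["Hallo", "Welt"], ["Hello", "World"]
)
print(samples[0])
```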

But it gives me another idea, which would be to add support for sacrebleu-style builtin test sets, e.g.,

# one option
$ cat system.txt | comet -t wmt20 -l de-en [other args]

# another option
$ cat system.txt | comet --sacrebleu-testset wmt20/de-en
$ cat system.txt | comet --sacrebleu-testset mtedx/valid/pt-es

You could accomplish this by just using sacrebleu as a library. It’s pretty easy:

from sacrebleu.utils import get_source, get_references, get_files

# Trigger a sacrebleu test set when neither source nor references are given.
# Make these arguments optional: nargs="?" for argparse.
if args.source is None and args.references is None:
    if args.sacrebleu_dataset is None:
        raise ValueError("provide source and references, or a sacrebleu test set")

    # Some test sets are hierarchical, e.g., "mtedx/valid", so split on the last "/".
    test_set, langpair = args.sacrebleu_dataset.rsplit("/", maxsplit=1)
    source = get_source(test_set, langpair)
    ref = get_references(test_set, langpair)

    # Alternative to the two calls above:
    # source, ref, _ = get_files(test_set, langpair)
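The argument handling this relies on could be wired up roughly as follows. This is a sketch only: the program name `comet-sketch`, the helper names, and the exact flag spellings are assumptions taken from the examples in this thread, not an existing COMET interface.

```python
import argparse


def build_parser():
    """Build a parser where source, references, and test set are all optional."""
    parser = argparse.ArgumentParser(prog="comet-sketch")
    parser.add_argument("-s", "--source", default=None)
    parser.add_argument("-r", "--references", default=None)
    parser.add_argument(
        "--sacrebleu-testset", dest="sacrebleu_dataset", default=None
    )
    return parser


def split_testset(spec):
    """Split 'wmt20/de-en' or 'mtedx/valid/pt-es' into (test_set, langpair)."""
    if "/" not in spec:
        raise ValueError(f"expected TESTSET/LANGPAIR, got {spec!r}")
    # Split on the last "/" because test set names may themselves contain "/".
    test_set, langpair = spec.rsplit("/", maxsplit=1)
    return test_set, langpair


args = build_parser().parse_args(["--sacrebleu-testset", "mtedx/valid/pt-es"])
print(split_testset(args.sacrebleu_dataset))
```

Splitting on the last slash keeps hierarchical names such as `mtedx/valid` intact while still isolating the language pair.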

@mjpost
Contributor Author

mjpost commented Oct 19, 2021

(Note that, via @ozancaglayan, sacrebleu also supports tab-delimited system output on STDIN, and will then do significance testing among them, e.g.,

$ paste system1.txt system2.txt | sacrebleu -t wmt20 -l de-en

Just something else to consider in terms of CLI).
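To illustrate the multi-system case, here is a small sketch that turns tab-delimited lines into one list of segments per system. The function name `read_systems` is hypothetical, and this is not how sacrebleu itself implements it.

```python
import io


def read_systems(stream):
    """Split tab-delimited lines into one list of segments per system."""
    rows = [line.rstrip("\n").split("\t") for line in stream]
    if not rows:
        raise ValueError("no input received")
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("all lines must have the same number of fields")
    # zip(*rows) transposes rows (segments) into columns (systems).
    return [list(column) for column in zip(*rows)]


# Stand-in for the output of `paste system1.txt system2.txt`.
systems = read_systems(io.StringIO("a1\tb1\na2\tb2\n"))
print(len(systems), systems[0])
```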

@ricardorei
Collaborator

Thanks, Matt! These are nice features indeed! Do you want to submit a PR 😁 ?

If not, I can still try to set aside some time to implement them before the new release.

@mjpost
Contributor Author

mjpost commented Oct 22, 2021 via email

@ricardorei
Collaborator

We were planning the release for the end of November or the beginning of December.


2 participants