Proposed approach for testing CLI arg parsing #1566

veekaybee · 2024-03-12T17:35:39Z

See discussion here: #1518

Here's an approach to start testing CLI argument parsing:

- Move the argument parser setup out of parse_eval_args into a separate method, setup_parser, that parse_eval_args calls
- Create unit tests that call the parser for each of the command-line arguments
- Add specific TypeError exceptions at each argument entry point in the cli_evaluate method
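
The first two steps could look roughly like the sketch below. Only the names setup_parser and parse_eval_args come from the proposal; the two flags, their defaults, and the optional argv parameter are assumptions made for illustration, not the repo's actual argument list.

```python
import argparse
from typing import List, Optional


def setup_parser() -> argparse.ArgumentParser:
    """Build the parser only; nothing is parsed here, so tests can reuse it."""
    parser = argparse.ArgumentParser(formatter_class=argparse.RawTextHelpFormatter)
    # Illustrative arguments (assumed for this sketch).
    parser.add_argument("--model", type=str, default="hf", help="Model name")
    parser.add_argument("--num_fewshot", type=int, default=None, help="Few-shot count")
    return parser


def parse_eval_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Thin wrapper: build the parser, then parse argv (sys.argv[1:] when None)."""
    return setup_parser().parse_args(argv)


def test_num_fewshot_is_parsed_as_int() -> None:
    # The test talks to the parser directly, without running an evaluation.
    args = setup_parser().parse_args(["--num_fewshot", "5"])
    assert args.num_fewshot == 5
    assert isinstance(args.num_fewshot, int)
```

Keeping parser construction separate from parsing is what lets a test inspect or drive the parser without touching sys.argv.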

Let me know what you think about this approach. If it seems reasonable, I'll add the tests for the rest of the methods and exceptions where it's reasonable.

@LSinev @haileyschoelkopf

LSinev · 2024-03-12T18:17:00Z

One option is to combine HFArgumentParser from transformers, with args set up through a dataclass as in https://github.com/huggingface/transformers/blob/main/examples/research_projects/wav2vec2/run_asr.py#L343, with a __post_init__ value check as in this video (link with timecode): https://youtu.be/zN4VCb0LbQI?t=592
But this still may not solve the points that follow.
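
As a rough illustration of the __post_init__ idea only, here is a standard-library sketch that does not use HFArgumentParser. The class name EvalArgs, its fields, and the specific checks are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EvalArgs:
    """Hypothetical container for parsed CLI values (field names are assumed)."""

    model: str = "hf"
    num_fewshot: Optional[int] = None

    def __post_init__(self) -> None:
        # Dataclasses do not enforce annotations at runtime, so check explicitly.
        if not isinstance(self.model, str):
            raise TypeError(f"model must be str, got {type(self.model).__name__}")
        if self.num_fewshot is not None:
            # bool is a subclass of int, so reject it separately.
            if isinstance(self.num_fewshot, bool) or not isinstance(self.num_fewshot, int):
                raise TypeError("num_fewshot must be an int or None")
            if self.num_fewshot < 0:
                raise ValueError("num_fewshot must be non-negative")
```

Constructing EvalArgs(num_fewshot="five") raises TypeError as soon as the object is created, which moves the check to one place instead of each call site.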

As for the current code, testing the parser the way presented seems to test ArgumentParser itself, not the code of this repo's module. We pass 5 to something that should be a number and it works. It might be more useful to check that parsing always fails when the input is something like --numshots five. Which cases would fail in the newly written tests but would not already fail inside ArgumentParser?
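
To make the point concrete, this small sketch shows that argparse already rejects a non-numeric value once type=int is declared. The flag name is a stand-in chosen for the example, and parses_ok is a hypothetical helper.

```python
import argparse

parser = argparse.ArgumentParser()
# Hypothetical flag name, used only for this illustration.
parser.add_argument("--num_fewshot", type=int, default=None)


def parses_ok(argv):
    """Return True if argparse accepts argv, False if it exits with an error."""
    try:
        parser.parse_args(argv)
    except SystemExit:
        # argparse reports the problem on stderr and calls sys.exit(2).
        return False
    return True
```

Here parses_ok(["--num_fewshot", "five"]) is False without any repo code being involved, so a test of this kind mostly exercises the standard library.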

The `try... except` example here seems to overreact to an already solved case and does not prevent new failures. Some future failures may be prevented once mypy checks are turned back on (even for tests), though this hypothesis should be tested by restoring the failing code and rechecking.

veekaybee · 2024-03-14T14:29:35Z

Thanks for the feedback @LSinev ! You're right that these cases don't necessarily cover what we'd like. After thinking about this and checking the videos and the links, I decided to take a different approach and unit test whether each CLI argument, with the exception of booleans, has a type.

That way, if you add an argument without a type, the unit tests won't pass, and if it's a boolean you'll have to declare a default anyway. Let me know what you think about this approach.
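
One way such a check could be written is sketched below. It relies on the private parser._actions attribute, and the helper name and the sample parser are assumptions for illustration rather than the code in this PR.

```python
import argparse


def arguments_missing_type(parser: argparse.ArgumentParser) -> list:
    """Return dests of value-taking arguments that were declared without type=."""
    missing = []
    for action in parser._actions:  # private attribute; acceptable in a test sketch
        if action.nargs == 0:
            # Flags such as store_true/store_false and --help consume no value.
            continue
        if action.type is None:
            missing.append(action.dest)
    return missing


# Hypothetical parser used to exercise the check.
parser = argparse.ArgumentParser()
parser.add_argument("--model", type=str, default="hf")
parser.add_argument("--limit")  # no type: should be reported
parser.add_argument("--trust_remote_code", action="store_true")
```

A unit test can then assert that arguments_missing_type(setup_parser()) is empty, so adding an untyped argument fails CI.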

LSinev · 2024-03-14T15:00:11Z

This seems to be a much better approach.
By the way, some boolean CLI arguments may also be set like this:

    parser.add_argument(
        "--some_boolean_arg",
        type=bool,
        default=True,
        help="do something good",
        action=argparse.BooleanOptionalAction,  # type: ignore[attr-defined]
    )

which also adds --no-some_boolean_arg. Mentioning this in case you want to check those too.
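
A minimal sketch of how that pair of flags behaves, assuming Python 3.9 or newer. The type= argument is left out here because the flag takes no value.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--some_boolean_arg",
    default=True,
    help="do something good",
    action=argparse.BooleanOptionalAction,  # requires Python 3.9+
)

# Three ways to invoke: flag absent, positive flag, generated negative flag.
default_value = parser.parse_args([]).some_boolean_arg
enabled = parser.parse_args(["--some_boolean_arg"]).some_boolean_arg
disabled = parser.parse_args(["--no-some_boolean_arg"]).some_boolean_arg
```

With this action, default=True still leaves the user a way to switch the option off.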

veekaybee · 2024-03-14T15:56:51Z

Thanks!

    parser.add_argument(
        "--some_boolean_arg",
        type=bool,
        default=True,
        help="do something good",
        action=argparse.BooleanOptionalAction,  # type: ignore[attr-defined]
    )

I checked these and decided not to add a test for them, since we generally use the store_true pattern in all our arguments and it makes sense to standardize on it. What do you think?

LSinev · 2024-03-14T17:37:25Z

Standardization is good for future improvements and development. Moreover, after reading the documentation I see that BooleanOptionalAction is only available since Python 3.9, so it is of no use here, as this repo should support 3.8 as well. But I am not sure whether this store_true pattern with default=True is OK:

    parser.add_argument(
        "--trust_remote_code",
        default=True,
        action="store_true",
        help="Sets trust_remote_code to True to execute code to create HF Datasets from the Hub",
    )

With or without this argument, the code is trusted by default. I don't know whether this pattern adds a --no-trust_remote_code option (or whether that depends on the Python version).

veekaybee · 2024-03-14T19:29:31Z

The behavior of store_true seems somewhat confusing in general. We override the default to True and respect the user's setting, but if we don't set the default to True, it defaults to False, at least in 3.9: https://gist.github.com/veekaybee/2c8769789a90f219dc83a9e681773000

Ironically, this is the default behavior of the module 😅. I figured that, from that perspective, it was better to set it explicitly (explicit is better than implicit, per the Zen of Python), even though we handle it later downstream. I can also check 3.8 if that helps.

LSinev · 2024-03-14T20:26:54Z

I think your gist example/test would be more insightful if it parsed the same set of args (including the case where no args are provided) with all three defined parsers.

I am a bit confused here. As far as I understand, once this PR is merged (with default=True for some boolean store_true arguments), calling lm_eval from the command line will affect the datasets invocation, with respect to --trust_remote_code, like this:

| Command | Trust in remote code state |
| --- | --- |
| (some arguments but no `--trust_remote_code` at all) | `True` |
| `--trust_remote_code` | `True` |
| `--trust_remote_code false` | `False` |
| `--trust_remote_code true` | `True` |
| `--trust_remote_code 0` | `False` |
| `--trust_remote_code 1` | `True` |

Also, I suppose that (in this case) a user or any system calling from the command line would treat all the typical ways of setting True (1, true, T, True, TRUE, on, On, ON, Y, y, yes, Yes, YES) and False (0, false, F, False, FALSE, off, Off, OFF, N, n, no, No, NO) as equivalent. Is this implied somehow, or can it be tested as well?

I checked one case with IPython on Python 3.8:

In [1]: import argparse

In [2]: import os

In [3]: parser = argparse.ArgumentParser(formatter_class=argparse.RawTextHelpFormatter)

In [4]: parser.add_argument(
   ...:     "--trust_remote_code",
   ...:     default=True,
   ...:     action="store_true",
   ...:     help="Sets trust_remote_code to True to execute code to create HF Datasets from the Hub",
   ...: )
Out[4]: _StoreTrueAction(option_strings=['--trust_remote_code'], dest='trust_remote_code', nargs=0, const=True, default=True, type=None, choices=None, help='Sets trust_remote_code to True to execute code to create HF Datasets from the Hub', metavar=None)

and then

In [20]: parser.parse_args([])
Out[20]: Namespace(trust_remote_code=True)

In [21]: parser.parse_args(['--trust_remote_code'])
Out[21]: Namespace(trust_remote_code=True)

It seems there is no way to turn off trust in remote code. I tried several ways to set it to False and didn't find any.

Without default=True, I expected it to behave like this:
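
A short sketch that confirms this: store_true consumes no value, so a trailing `false` is never read as the flag's value. The helper exits_with_error is a hypothetical name introduced for the example.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--trust_remote_code", default=True, action="store_true")

# parse_known_args returns leftover tokens instead of exiting on them.
namespace, leftover = parser.parse_known_args(["--trust_remote_code", "false"])


def exits_with_error(argv):
    """Return True if parse_args rejects argv (argparse calls sys.exit(2))."""
    try:
        parser.parse_args(argv)
    except SystemExit:
        return True
    return False
```

The word `false` ends up in leftover while the namespace value stays True, and the strict parse_args call rejects the same input as an unrecognized argument.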

| Command | Trust in remote code state |
| --- | --- |
| (some arguments but no `--trust_remote_code` at all) | `False` |
| `--trust_remote_code` | `True` |

No confusion for the user: setting the flag turns something on, and remote code is not trusted if the flag is not specified.

I found a big discussion with many ways to implement this (still no pre-commit check, which is what I was actually looking for): https://stackoverflow.com/questions/15008758/parsing-boolean-values-with-argparse

veekaybee · 2024-03-14T20:43:08Z

As you mention, we might want to test this flag specifically. I'll see what I can add from a testing perspective to cover store_true flags as they are currently implemented (with the intention of keeping code/behavior changes as minimal as possible).

Based on the thread you posted, this looks like the easiest and most accepted answer https://stackoverflow.com/a/59579733.

In looking at how HF implements this, they take a similar approach: https://github.com/huggingface/transformers/blob/11bbb505c77a1d29370cf16a964cfe73b7a76340/src/transformers/hf_argparser.py#L34C5-L34C19

so we could go this way too if we wanted.
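
For reference, a string-to-bool converter of the kind discussed here could be sketched as follows. The name str2bool and the accepted spellings are assumptions for illustration; this is not copied from the linked answer or from transformers.

```python
import argparse


def str2bool(value):
    """Hypothetical converter mapping common spellings to a bool."""
    if isinstance(value, bool):
        return value
    text = value.strip().lower()
    if text in {"1", "true", "t", "yes", "y", "on"}:
        return True
    if text in {"0", "false", "f", "no", "n", "off"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean value, got {value!r}")


parser = argparse.ArgumentParser()
parser.add_argument(
    "--trust_remote_code",
    type=str2bool,
    nargs="?",      # the value after the flag is optional
    const=True,     # bare flag means True
    default=False,  # flag absent means False
)
```

This keeps the bare `--trust_remote_code` form working while also accepting an explicit value, at the cost of changing how the flag is declared.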

LSinev · 2024-03-14T20:56:26Z

> the easiest and most accepted answer

If sorted by highest score, there are more interesting answers.

Leaving the argument as it was before seems best to me for now:

    parser.add_argument(
        "--trust_remote_code",
        action="store_true",
        help="Sets trust_remote_code to True to execute code to create HF Datasets from the Hub",
    )

no argument — no trust, and that's all.

veekaybee · 2024-03-14T21:02:42Z

👍 Works for me. I just changed the two args that take it, but I'm keeping the ones I added for args whose action is not store_true, such as model.

haileyschoelkopf · 2024-03-15T12:22:37Z

Hi @veekaybee ! This approach looks good to me.

And I agree we should leave the store_true args as they are, as was decided here! The desired behavior is for passing --trust_remote_code to set it to True, and for it to be False if not provided.
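
The agreed behavior can be pinned down with two small tests, sketched below. The setup_parser here is a stand-in reduced to the single flag under discussion, not the repo's full parser.

```python
import argparse


def setup_parser() -> argparse.ArgumentParser:
    """Stand-in for the repo's parser, reduced to the one flag under test."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--trust_remote_code",
        action="store_true",  # no explicit default: argparse uses False
        help="Sets trust_remote_code to True to execute code to create HF Datasets from the Hub",
    )
    return parser


def test_trust_remote_code_defaults_to_false() -> None:
    assert setup_parser().parse_args([]).trust_remote_code is False


def test_trust_remote_code_flag_enables_it() -> None:
    assert setup_parser().parse_args(["--trust_remote_code"]).trust_remote_code is True
```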

veekaybee · 2024-03-15T13:14:52Z

Thanks so much both for your discussion and comments! This PR is now ready for review.

haileyschoelkopf

Everything LGTM!

* New tests for CLI args
* fix spacing
* change tests for parsing
* add tests, fix parser
* remove defaults for store_true

veekaybee added 2 commits March 12, 2024 13:25

New tests for CLI args

17ed4d3

fix spacing

4251e24

change tests for parsing

5e96a8b

add tests, fix parser

fcf6988

veekaybee marked this pull request as ready for review March 14, 2024 16:06

veekaybee requested review from haileyschoelkopf and lintangsutawika as code owners March 14, 2024 16:06

remove defaults for store_true

9fe4570

haileyschoelkopf approved these changes Mar 15, 2024

haileyschoelkopf approved these changes Mar 17, 2024

haileyschoelkopf merged commit 92f30af into EleutherAI:main Mar 17, 2024
8 checks passed

veekaybee mentioned this pull request Mar 18, 2024

Write tests for calling CLI arguments downstream to ensure correctly-returned types #1518

Closed
