Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

specify default values in function definition #31

Closed
gregcaporaso opened this issue Dec 6, 2022 · 4 comments · Fixed by #43
Closed

specify default values in function definition #31

gregcaporaso opened this issue Dec 6, 2022 · 4 comments · Fixed by #43
Assignees

Comments

@gregcaporaso
Copy link
Contributor

At present, it looks like you tend to define default values implicitly, and it would be better to do this explicitly (i.e., when you define the function the action is mapped to).

I suspect that you are doing this so that the defaults are actually set by the underlying code if not overridden by the user, which makes sense intuitively but is not ideal for a few reasons. First, it could result in your help text becoming outdated and misleading (e.g., if you specify the default is 1, and the underlying code is changed so it's 16, the help text your user is referring to will be wrong). Next, and probably most importantly, if not specified explicitly the parameter values won't be stored in data provenance. And finally, if you define explicitly on function definition, the default value will autopopulate in the help text for your action.

As an example of how I recommend doing this, take a look at this snippet of the help text from dada2 denoise-single:

  --p-n-threads INTEGER  The number of threads to use for multithreaded
                         processing. If 0 is provided, all available cores
                         will be used.                            [default: 1]

That default value is specified here.

I recommend ultimately making this change across the whole code base.

@misialq
Copy link
Contributor

misialq commented Dec 7, 2022

Thanks for bringing that up, @gregcaporaso. Yeah, I was not sure how to go about this initially and your guess is correct: I felt like leaving those undefined and letting the tools themselves pick the default values. I was not so much concerned with the text becoming outdated as I thought that on a tool version update one would need to review the text anyways to see whether anything else changed (to keep the q2 help in-line with the underlying tool) - under this assumption I'd need to review something somewhere anyway... However, your provenance argument absolutely speaks to me much more - I'll update that in another PR. Thanks! 🙏

@gregcaporaso
Copy link
Contributor Author

on a tool version update one would need to review the text anyways

That's definitely a reasonable assumption and good practice, but in my experience these things have a way of getting out of date. For example, if someone else takes over maintenance of the plugin in the future, they may not be fully aware of where all the changes need to be made.

@nbokulich nbokulich self-assigned this May 2, 2023
@nbokulich
Copy link
Contributor

nbokulich commented May 2, 2023

how should we handle the presets param in megahit? This overrides a group of params to provide some generic settings. This sounds useful, but at the same time is bad news for provenance if you are passing params (and recording them in provenance) that are actually ignored. From the megahit docs:

 Presets parameters:

  --presets <str>  override a group of parameters;

    possible values:

    meta           '--min-count 2 --k-list 21,41,61,81,99'

          (generic metagenomes, default)

    meta-sensitive '--min-count 2 --k-list 21,31,41,51,61,71,81,91,99'

          (more sensitive but slower)

    meta-large     '--min-count 2 --k-list 27,37,47,57,67,77,87'      

          (large & complex metagenomes, like soil)

    bulk           '--min-count 3 --k-list 31,51,71,91,99 --no-mercy' 

          (experimental, standard bulk sequencing with >= 30x depth)

    single-cell '--min-count 3 --k-list 21,33,55,77,99,121 --merge_level 20,0.96'

          (experimental, single cell data)

The options I see:

  1. status quo: we leave it and set the default as None
  2. we remove this option altogether
  3. we keep it but raise an error when it overrides a parameter that is also explicitly set (which would be almost always)

Is there any way to edit provenance? So, e.g., if a preset is passed then the preset values get written into provenance instead of the param setting that was passed?

Maybe I am being overly alarmist because, after all, the preset would also be written to provenance so the effect is reproducible... but it just seems misleading.

EDIT: for now I am going with "option 3" in #43 , i.e., to warn whenever the presets parameter is used. But it would be interesting to know if we can overwrite provenance to write in the presets.

@gregcaporaso
Copy link
Contributor Author

gregcaporaso commented May 10, 2023

@nbokulich, I agree with the approach you're taking now, and that it would be good to better support this type of situation in the future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants