Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add config datalad.ui.interactive and allow non-interactive special remotes #7344

Open
wants to merge 1 commit into
base: maint
Choose a base branch
from

Conversation

bpoldrack
Copy link
Member

@bpoldrack bpoldrack commented Mar 22, 2023

  • Makes is_interactive configurable via datalad.ui.interactive. Defaults to current behavior of detection via isatty(). The detection result is propagated to datalad subprocesses via env var (ping Passing configuration to subprocesses #7352).
  • Adds a non-dialog annex backend variant and makes the UISwitcher use it, based on is_interactive when asked to set the annex backend.

Closes #7345
Closes #7349

@bpoldrack bpoldrack force-pushed the fix-interactive-sr branch 6 times, most recently from 7b7db45 to e4ccb8c Compare March 23, 2023 14:21
@codecov
Copy link

codecov bot commented Mar 23, 2023

Codecov Report

Patch coverage: 87.50% and project coverage change: +1.97 🎉

Comparison is base (a8d7c63) 88.74% compared to head (93d5ac4) 90.71%.

❗ Current head 93d5ac4 differs from pull request most recent head 5417429. Consider uploading reports for the commit 5417429 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##            maint    #7344      +/-   ##
==========================================
+ Coverage   88.74%   90.71%   +1.97%     
==========================================
  Files         327      327              
  Lines       44629    44649      +20     
  Branches     5913     5916       +3     
==========================================
+ Hits        39605    40505     +900     
+ Misses       5009     4129     -880     
  Partials       15       15              
Impacted Files Coverage Δ
datalad/__init__.py 98.00% <ø> (+16.00%) ⬆️
datalad/cli/tests/test_utils.py 95.65% <ø> (ø)
datalad/utils.py 87.33% <62.50%> (-0.18%) ⬇️
datalad/ui/tests/test_base.py 98.30% <80.00%> (-1.70%) ⬇️
datalad/ui/dialog.py 92.70% <90.90%> (-0.31%) ⬇️
datalad/cli/helpers.py 80.50% <100.00%> (-0.17%) ⬇️
datalad/cli/utils.py 73.33% <100.00%> (ø)
datalad/customremotes/__init__.py 91.17% <100.00%> (ø)
datalad/interface/common_cfg.py 100.00% <100.00%> (ø)
datalad/log.py 88.75% <100.00%> (+0.04%) ⬆️
... and 4 more

... and 5 files with indirect coverage changes

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

Copy link
Member

@mih mih left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just flew over the code, so this may be a bit too rough to be useful

datalad/__init__.py Show resolved Hide resolved
datalad/interface/common_cfg.py Outdated Show resolved Hide resolved
datalad/interface/common_cfg.py Show resolved Hide resolved
@bpoldrack bpoldrack force-pushed the fix-interactive-sr branch 2 times, most recently from 87e25d6 to 6cfd40b Compare March 24, 2023 11:13
@bpoldrack bpoldrack changed the title debug UnderAnnexUI Add config datalad.ui.interactive and allow non-interactive special remotes Mar 24, 2023
@bpoldrack bpoldrack added the semver-patch Increment the patch version when merged label Mar 24, 2023
@bpoldrack bpoldrack marked this pull request as ready for review March 24, 2023 15:52
@yarikoptic
Copy link
Member

with such extensive change touching functionality I also wonder if may be worth/safer for master?

@bpoldrack
Copy link
Member Author

bpoldrack commented Mar 27, 2023

with such extensive change touching functionality I also wonder if may be worth/safer for master?

Main reason for maint to me is, that we (as in Jülich) need this released ASAP.

@bpoldrack
Copy link
Member Author

I think, this is ready, @yarikoptic. There's the git-annex died of signal 11 on macos github action, but this seems unrelated.

Copy link
Member

@yarikoptic yarikoptic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • would break current interactions with users while installing/getting data from some datasets which do require authentication (e.g. i tried datalad install -g ///crcns/aa-1). I feel that there should be some "deeper" fix for interactivity detection or annex-no-dialog should be used only when explicitly requested with config setting (e.g. if explicitly asked to be non-interactive)
    • note that we have datalad.tests.ui.backend config to control which UI backend to use during tests but we seems to have no datalad.ui.backend
  • for a bug fix, PR IMHO unnecessarily changes API by flipping from is_interactive function to a module attribute, and now confusingly imported from various levels (datalad and datalad.ui). Was there really a need for such a change if we are aiming for fixing some issue? (made review harder)
  • code duplication should be reduced IMHO

datalad/__init__.py Outdated Show resolved Hide resolved
datalad/interface/common_cfg.py Outdated Show resolved Hide resolved
datalad/interface/common_cfg.py Outdated Show resolved Hide resolved
datalad/interface/common_cfg.py Outdated Show resolved Hide resolved
datalad/tests/test_utils.py Show resolved Hide resolved
backend = 'dialog' if is_interactive() else 'no-progress'
backend = 'dialog' if is_interactive else 'no-progress'
if not is_interactive and backend == 'annex':
backend = 'annex-no-dialog'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unfortunately this would break entirely ability of users to authenticate when possible... I have tested by removing crcns password and trying to datalad -l debug install -g ///crcns/aa-1 . I no longer is getting the prompt to enter the credentials for crcns to be able to download the data.

So I think the problem is in how we determine that session is interactive or not, and some (most? few? we don't know but so far we assumed that it is interactive) annex sessions are interactive although stdin is coming from git-annex so not a tty.

Copy link
Member Author

@bpoldrack bpoldrack Mar 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unfortunately this would break entirely ability of users to authenticate when possible

It doesn't break the ability enitrely, it changes it - you'd need to set the new config (as env var ATM). Because we are unable to correctly detect. And as far as I can tell, it's simply not possible. (see #7345)
Keeping the behavior of assuming interactivity by default, is not an option IMO, though. Because in non-interactive jobs, that just means stalling with no error reporting whatsoever. Users would have no clue at all.
Whereas in your case, there should at least be a hint that authentication wasn't possible. I'd rather err on the somewhat informative side than on the one that leaves users with nothing to act upon.

We could maintain the behavior to some extend, when we address #7352. Passing down the (accidently correct) detection from the super process, would lead to the same behavior in your case. I'd still argue it's wrong, since detection outside of a special remote process is not more correct. It's just slightly rarer to be an issue. (see #7354)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't break the ability enitrely, it changes it - you'd need to set the new config (as env var ATM).

well, not entirely means that it still breaks it regardless how you name it.

Not breaking at all would be whenever the default behavior stays as is at least for maint. Eg. could be done via config variable staying "auto" by default and then switch to True or False based on the utils.is_interactive(), while keeping using is_interactive() function, and allowing all CIs or whatnot which knows that there is no agent behind keyboard to say so. That would IMHO be better since it would be explicit and useful regardless on how we decide to trick/handle interactivity. Then in master we could be deliberated further or even change behavior.

Whereas in your case, there should at least be a hint that authentication wasn't possible.

why wasn't possible if it was and was working just fine before this change?

Copy link
Member Author

@bpoldrack bpoldrack Mar 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eg. could be done via config variable staying "auto" by default and then switch to True or False based on the utils.is_interactive(), while keeping using is_interactive() function

That is what is happening. The special remote process evaluates the detection in the state of this PR. But detection (auto) simply doesn't work. It only happens to do the right thing, when the process in question is the top-level process. But we are querying from within the special remote process.

why wasn't possible if it was and was working just fine before this change?

It didn't "work" before that change. It happened to do the right thing in interactive sessions and the wrong thing in non-interactive ones, b/c it always assumed to be in an interactive one.

datalad/ui/dialog.py Outdated Show resolved Hide resolved
datalad/utils.py Show resolved Hide resolved
@bpoldrack
Copy link
Member Author

bpoldrack commented Mar 27, 2023

@yarikoptic

I feel that there should be some "deeper" fix for interactivity detection

I spend hours on this and couldn't find one. That's why this solution.

or annex-no-dialog should be used only when explicitly requested with config setting

As I pointed out in the other spot: That means we are defaulting to stalling of non-interactive jobs.

Also: Does that mean, you want "regular" datalad processes to default to (broken) detection, but special remote process to default to assuming interactivity? This would mean to completely ignore, that this issue is not something special about special remotes. Any datalad process can find itself in the exact same situation as a special remote process.

The mitigation I can see, is #7352 - passing config manager's state down to subprocesses. So, if the top-level datalad process is set to default (detection) the result of that detection would be set for the special remote process as well (instead of running it's own detection which will always come back False).

datalad/ui/dialog.py Outdated Show resolved Hide resolved
@yarikoptic
Copy link
Member

yarikoptic commented Mar 27, 2023

The mitigation I can see, is #7352 - passing config manager's state down to subprocesses. So, if the top-level datalad process is set to default (detection) the result of that detection would be set for the special remote process as well (instead of running it's own detection which will always come back False).

yes -- if outside process datalad.ui.interactive set to "auto" , and it determines that it is interactive (or not), setting its env var DATALAD_UI_INTERACTIVE so any nested process also realizes that sounds like a viable way to go forward which would satisfy both of our desires?

edit: would need to set DATALAD_UI_INTERACTIVE envvar in any case, so config based options are overriden too in the children processes uniformly. What could go wrong? ;)

As I pointed out in the other spot: That means we are defaulting to stalling of non-interactive jobs.

... until realizing that the first stalled job is needing credentials (which it prompts for) and making sure that they have them in the future run which anyways would seems to be needed to do.

Again -- the other end of spectrum this PR introduces is that there is not prompt to the user who (may be due to incorrect assumptions and defaults) was getting a prompt and now all of a sudden would just get a crash that it cannot get to that data, and no instructions even on how to mitigate, or to mitigate by explicitly saying "be interactive" in the interactive shell.

@bpoldrack
Copy link
Member Author

@yarikoptic

yes -- if outside process datalad.ui.interactive set to "auto" , and it determines that it is interactive (or not), setting its env var DATALAD_UI_INTERACTIVE so any nested process also realizes that sounds like a viable way to go forward which would satisfy both of our desires?

Here is a test commit: 5417429

would need to set DATALAD_UI_INTERACTIVE envvar in any case, so config based options are overriden too in the children processes uniformly.

Not sure, whether I parse that right. You mean, you'd need the env var to overrule anything set in config files? Yes, that, or -c or datalad.cfg.set(..., scope='override') (With above commit would be passed down either way)

@yarikoptic
Copy link
Member

Not sure, whether I parse that right. You mean, you'd need the env var to overrule anything set in config files? Yes, that, or -c or datalad.cfg.set(..., scope='override') (With above commit would be passed down either way)

in effect - yes, it would overrule in the children processes, but not in the parent process where it would actually obtain initial value via any available means (config, env var, overwrites) and then set explicit bool value it into DATALAD_UI_INTERACTIVE so all subprocesses (and this parent process later on may be) use it. So, I do not immediately see any problem with the fact that children processes will not use value from the config value, and do not see a reason to utilize DATALAD_CONFIG_OVERRIDES_JSON for this specific aspect.

Note: this config is asked from the general datalad.cfg, i.e. not per dataset config.

@bpoldrack bpoldrack force-pushed the fix-interactive-sr branch 2 times, most recently from 72edace to fc0abd2 Compare March 30, 2023 11:30
@bpoldrack
Copy link
Member Author

@yarikoptic

Note: this config is asked from the general datalad.cfg, i.e. not per dataset config.

Yes, I added this aspect to the changelog. Dataset.repo evaluation from within the runner seems to much of a performance hit for this feature. I guess, the need to have something passed to a subprocess, while requiring it to be specific to a dataset, is rare - can't think of anything ATM. So, currently I think this is the best of all the imperfect solutions.

@bpoldrack
Copy link
Member Author

Verdict from devcall: Rip out generic passing down of configs again. Should only apply to datalad.ui.interactive for now.

@bpoldrack bpoldrack marked this pull request as draft April 11, 2023 08:54
special remotes

This patch introduces the config `datalad.ui.interactive` in order to
let users decide whether or not to run in interactive mode.
The config defaults to the former detection, except that this detection
is additionally safeguarded - any exception during that detection will
lead to non-interactive mode. In addition, the result of the detection
will be stored in `DATALAD_UI_INTERACTIVE`, thus passing down the result
to possible subprocesses rather than having them running their own
detection. Ultimately, this is about `ui`'s ability to talk to the
terminal via `getpass` and the detection does not work from within
subprocesses.

Closes datalad#7349

Furthermore, this adds a non-interactive UI backend for annex special
remotes, which previously always assumed to be in interactive mode, and
adds the respective capacity for the UI_Switcher.

Closes datalad#7345
@bpoldrack
Copy link
Member Author

FTR: Remaining failures in crippledFS and macos github actions look as if #7367 wasn't merged.

Could it be that restarting github actions is running on an outdated, cached merge commit while Travis and AppVeyor build anew?

@mih
Copy link
Member

mih commented Apr 26, 2023

Before this is merged (in particular the runner adjustment), I recommend looking at datalad/datalad-next#325 for an alternative -- which seems simpler, and also more effective.

mih added a commit to mih/datalad-next that referenced this pull request Jun 2, 2023
This achieve the main goal of making any `datalad -c ...` specification
affect not just the datalad-specific config in the main Python process,
but can now handle *any* Git config, and also impact the behavior of
any subprocesses.

Furthermore this handling is extended to cover also `DATALAD_...`
ENV variables, including `DATALAD_CONFIG_OVERRIDES_JSON`.

Within the session `ConfigManager` instance the behavior is now more
uniform. `ConfigManager.overrides` are now exclusively instance-specific
overrides -- matching their description and implementation. No
configuration override coming from CLI or process ENV is reflected in
`ConfigManager.overrides` anymore.

Closes datalad#325 -- although the scope is a bit broader.

This changeset defers the need to address datalad#397, but does not resolve it.
Ideally there would not be a need for any CLI specific behavior and
implementation -- everything should be done by the `ConfigManager`.
However, given the numerous conceptual and design limitations, it felt
necessary to address the override impact limitation separately.

Ping

- datalad/datalad#4119
- datalad/datalad#3456
- datalad/datalad#7344
mih added a commit to mih/datalad-next that referenced this pull request Jun 2, 2023
This achieve the main goal of making any `datalad -c ...` specification
affect not just the datalad-specific config in the main Python process,
but can now handle *any* Git config, and also impact the behavior of
any subprocesses.

Furthermore this handling is extended to cover also `DATALAD_...`
ENV variables, including `DATALAD_CONFIG_OVERRIDES_JSON`.

Within the session `ConfigManager` instance the behavior is now more
uniform. `ConfigManager.overrides` are now exclusively instance-specific
overrides -- matching their description and implementation. No
configuration override coming from CLI or process ENV is reflected in
`ConfigManager.overrides` anymore.

Closes datalad#325 -- although the scope is a bit broader.

This changeset defers the need to address datalad#397, but does not resolve it.
Ideally there would not be a need for any CLI specific behavior and
implementation -- everything should be done by the `ConfigManager`.
However, given the numerous conceptual and design limitations, it felt
necessary to address the override impact limitation separately.

Ping

- datalad/datalad#4119
- datalad/datalad#3456
- datalad/datalad#7344
mih added a commit to mih/datalad-next that referenced this pull request Jun 2, 2023
This achieve the main goal of making any `datalad -c ...` specification
affect not just the datalad-specific config in the main Python process,
but can now handle *any* Git config, and also impact the behavior of
any subprocesses.

Furthermore this handling is extended to cover also `DATALAD_...`
ENV variables, including `DATALAD_CONFIG_OVERRIDES_JSON`.

Within the session `ConfigManager` instance the behavior is now more
uniform. `ConfigManager.overrides` are now exclusively instance-specific
overrides -- matching their description and implementation. No
configuration override coming from CLI or process ENV is reflected in
`ConfigManager.overrides` anymore.

Closes datalad#325 -- although the scope is a bit broader.

This changeset defers the need to address datalad#397, but does not resolve it.
Ideally there would not be a need for any CLI specific behavior and
implementation -- everything should be done by the `ConfigManager`.
However, given the numerous conceptual and design limitations, it felt
necessary to address the override impact limitation separately.

Ping

- datalad/datalad#4119
- datalad/datalad#3456
- datalad/datalad#7344
Copy link
Member

@yarikoptic yarikoptic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some comments I found pending

"can be wrong, though, possibly making datalad wait for "
"user input, even though it is impossible to receive such "
"input."}),
'default': is_interactive_failsafe(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought I have expressed my opinion that here we better have "auto" instead of immediate definition, but I forgot why that would be better -- TODO

- Introduce new config `datalad.ui.interactive` to allow users to overrule detection of interactive sessions.
Faulty detection could lead to stalling, especially when subprocesses like git-annex special remotes where involved.
Fixes [#7345](https://github.com/datalad/datalad/issues/7345)
Fixes [#7349](https://github.com/datalad/datalad/issues/7349) via
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Fixes [#7349](https://github.com/datalad/datalad/issues/7349) via
and [#7349](https://github.com/datalad/datalad/issues/7349) via

Copy link
Member

@yarikoptic yarikoptic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

more comments

try:
interactive = all(_is_stream_tty(s)
for s in (sys.stdin, sys.stdout, sys.stderr))
except Exception as e:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would worry about overall capturing of all exceptions here -- e.g. if detection breaks with new python or something like that -- we would miss it. Needs investigation of git history if more information is provided or just go back to no exception handling for now IMHO.

# Raise log level to DEBUG in this case, though.
CapturedException(e, level=logging.DEBUG)
interactive = False
os.environ['DATALAD_UI_INTERACTIVE'] = str(interactive)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think it should overload if it is already defined.

@@ -369,14 +369,36 @@ def _is_stream_tty(stream: Optional[IO]) -> bool:


def is_interactive() -> bool:
"""Return True if all in/outs are open and tty.
from datalad import cfg
return cfg.obtain("datalad.ui.interactive")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if config is within local git repo config?

shouldn't we overload that DATALAD_UI_INTERACTIVE here? and if we are to decide for "auto" to be default -- then here call is_interactive_failsafe explicitly?

@synchon
Copy link

synchon commented Nov 21, 2023

Hi! I'm waiting for this fix to land in the main branch, is there an estimate when it can happen? It's a bit of a hassle to checkout a branch and then make it work. Thanks in advance! :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
semver-patch Increment the patch version when merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make is_interactive configurable UnderAnnexUI is always interactive
4 participants