Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expand the use of GIT_CONFIG_GLOBAL to Windows #6260

Merged
merged 6 commits into from
Dec 1, 2021

Conversation

adswa
Copy link
Member

@adswa adswa commented Nov 29, 2021

this change sits on top of #6251 and simply expands it to all systems with a high enough git installation (i.e., I removed the if not on_osx constraint). Things pass on locally for me both on Windows (6 more tests start to pass with this change, including the one where #6160 originally surfaces) and Debian, but let's see what breaks in the CIs.
I think that with the exception of a few tests (e.g. test_expanduser) this would avoid leaking cache dirs and the like into temporary repositories during testing. 🤞

bpoldrack and others added 5 commits November 28, 2021 17:25
System level configured git-credential helper osxkeychain on OSX hangs
if it can't find the Keyring under HOME, which we change to some bogus
dir for the tests. On OSX change the approach to make use of
GIT_CONFIG_GLOBAL instead of changing HOME.

Note, however, that this env var isn't supported by git until versions
2.32.
This change sits on top of dataladgh-6251 and expands its change to
all platforms with Git version higher than 2.32. This should
ensure that we do not override the users USERPROFILE and HOME
variable by default and for all anymore (datalad#6160). Instead, only
a small set of tests use this approach, and clean up after
themselves.
@adswa adswa added the semver-tests Changes only affect tests, no impact on version label Nov 29, 2021
@codecov
Copy link

codecov bot commented Nov 29, 2021

Codecov Report

Merging #6260 (60595cb) into master (a9f423a) will decrease coverage by 1.83%.
The diff coverage is 67.50%.

❗ Current head 60595cb differs from pull request most recent head cf3f005. Consider uploading reports for the commit cf3f005 to get more accurate results
Impacted file tree graph

@@            Coverage Diff             @@
##           master    #6260      +/-   ##
==========================================
- Coverage   89.62%   87.79%   -1.84%     
==========================================
  Files         323      323              
  Lines       41979    41996      +17     
==========================================
- Hits        37624    36870     -754     
- Misses       4355     5126     +771     
Impacted Files Coverage Δ
datalad/core/distributed/tests/test_push.py 98.72% <ø> (-0.22%) ⬇️
datalad/distributed/create_sibling_gin.py 83.87% <ø> (ø)
datalad/distributed/create_sibling_gitea.py 60.52% <ø> (ø)
datalad/distributed/create_sibling_github.py 52.94% <ø> (ø)
datalad/distributed/create_sibling_gitlab.py 68.32% <ø> (ø)
datalad/distributed/create_sibling_gogs.py 100.00% <ø> (ø)
datalad/distributed/create_sibling_ria.py 81.86% <ø> (ø)
datalad/distributed/tests/test_ora_http.py 100.00% <ø> (ø)
datalad/distribution/tests/test_get.py 95.25% <ø> (-4.75%) ⬇️
datalad/local/remove.py 97.01% <ø> (ø)
... and 116 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a9f423a...cf3f005. Read the comment docs.

@adswa adswa changed the title WIP: Expand the use of GIT_CONFIG_GLOBAL to Windows Expand the use of GIT_CONFIG_GLOBAL to Windows Nov 30, 2021
@bpoldrack
Copy link
Member

Given that things are basically completely broken and potentially destructive if someone runs the tests on Windows while not having git >=2.32, I think we should find a way to not run the tests in this case.
Just sticking with the legacy approach in that case is fine for this PR, but seems unsatisfying. Can anyone think of a way to be a little more protective, @datalad/developers ?

@adswa
Copy link
Member Author

adswa commented Nov 30, 2021

Given that things are basically completely broken and potentially destructive if someone runs the tests on Windows while not having git >=2.32, I think we should find a way to not run the tests in this case.

Honestly, its not more broken than it has been for the past several years - and there wasn't much momentum to fix the broken state of things in Windows when I initially raised them in #6160. The "potentially destructive" is also relative. Its only DataLad-destructive. It boils down to leaking a datalad.org or cookies.db into temporary directories and failing our test assumptions whenever we don't account for that. While it does change USERPROFILE, this variable is updated with every new process, and processes outside of the test battery don't see the temporary shim - its just our tests. While it does change HOME and HOMEDRIVE, those changes also affect only the environment within the tests, not outside. I've checked by running the tests on master and inspecting those env variables. They're fine.

Just sticking with the legacy approach in that case is fine for this PR, but seems unsatisfying. Can anyone think of a way to be a little more protective, @datalad/developers ?

We could maybe by hand identify all tests that would fail or would be otherwise affected by this, and add a test decorator that skips them if the Git version isn't sufficient. But to me personally that sounds like a complete overkill. I think I'd probably add a logger warning "Attention, your Git version is xxx and we can't set GIT_CONFIG_GLOBAL, you may see testfailures related to #6160". In a few months, when Git2.35 or 2.36 make 2.32 not seem as recent anymore this will slowly resolve itself anyway. Git has an "check daily for updates" feature under Windows. Also, you easily end up with a very recent Python 3.9, 3.10 or 3.11 on Windows if you've gotten it at any point in the last year. Due to its incompatibility with nose, running the tests under Windows is an ever greater pain than it already is, or even impossible. I would honestly be surprised if anyone other than @mih and me engage in serious Windows testing.

@bpoldrack
Copy link
Member

Honestly, its not more broken than it has been for the past several years

Yes, I'm aware. But we want to get to a better place, no?

The "potentially destructive" is also relative. Its only DataLad-destructive. It boils down to leaking a datalad.org or cookies.db into temporary directories and failing our test assumptions whenever we don't account for that. While it does change USERPROFILE, this variable is updated with every new process, and processes outside of the test battery don't see the temporary shim - its just our tests.

I don't think this assessment is entirely correct. Any subprocess started from within the test setup gets relative paths from platformdirs, which implies they are gonna be interpreted in relation CWD of that process. There's no guarantee whatsoever, that this will only ever effect temp dirs or that any such side-effect will lead to a test failure. It can also easily effect whatever dir the tests are being run from. Tests can only fail on things we happen to check. It's easily possible to run a subprocess that - either by bug or design - does not have its CWD within any of the temp dirs we create. Even more so, when extensions come into the picture where it's not even up to us to know or decide what is being called.

It may well be that ATM nobody else is running tests on windows, but the aim should be to change that. And as support improves it's gonna be more likely. I'd also not want to send the message to potential win-contributors: "Hey, we don't care we set it up in a way that just running the test may screw your CWD".

@bpoldrack
Copy link
Member

As for: How?

I agree that skipping particular tests or something is way to effortful, error prone and likely incomplete.
I thought of something like raising from with setup_package if we are on windows with too old a git. But @mih seemed to think of that as bad. Not sure whether there's a misunderstanding or I'm missing some implication of that approach.

@mih
Copy link
Member

mih commented Dec 1, 2021

Please correct me if I am wrong, but I think this is a no-brainer. This PR improves the situation for any Git >= 2.32. It maintains the present behavior for any other scenario. This means no downsides, some advantages.

What is being discussed here? A more better fix sometime later? If someone sees such a path, does this particular change block such developments? I cannot see how.

@adswa
Copy link
Member Author

adswa commented Dec 1, 2021

Yes, I'm aware. But we want to get to a better place, no?

This is what this PR does. It fixes the situation for every machine that fulfills the requirements to do so, and does not change anything on machines where we don't have this ability to fix things.

I don't think this assessment is entirely correct. Any subprocess started from within the test setup gets relative paths from platformdirs, which implies they are gonna be interpreted in relation CWD of that process. There's no guarantee whatsoever, that this will only ever effect temp dirs or that any such side-effect will lead to a test failure. It can also easily effect whatever dir the tests are being run from. Tests can only fail on things we happen to check. It's easily possible to run a subprocess that - either by bug or design - does not have its CWD within any of the temp dirs we create. Even more so, when extensions come into the picture where it's not even up to us to know or decide what is being called.

I believe that this is prevented by the way we run our tests - but even if it is not due to bugs in a test or an extension test that decides to populate cache/data/config dirs, the absolute worst thing that will happen is that CWD will gain a maximum of two probably empty directories that wouldn't even be found by any datalad process that would rely on the data, configurations or caches they may contain.

@bpoldrack
Copy link
Member

@mih No, there's nothing blocking here. As I said:

Just sticking with the legacy approach in that case is fine for this PR,

Making that switch just seems the right moment to think about it, from my POV.

the absolute worst thing that will happen is that CWD will gain a maximum of two probably empty directories

Not true.

❱ ll ~/.config/datalad
total 20
drwxr-xr-x  3 ben ben  4096 Nov  7  2019 .
drwx------ 54 ben ben  4096 Nov 30 18:12 ..
-rw-r--r--  1 ben ben 12288 Nov  7  2019 cookies.db
drwxr-xr-x  2 ben ben  4096 Nov 29 16:34 providers
❱ ll ~/.cache/datalad 
total 24
drwxr-xr-x  6 ben ben 4096 Apr  9  2021 .
drwxr-xr-x 59 ben ben 4096 Nov 25 12:08 ..
drwxr-xr-x  2 ben ben 4096 Jul 17  2019 crcns
drwxr-xr-x  2 ben ben 4096 Sep  1  2020 locks
drwx------  2 ben ben 4096 Nov 17 19:32 sockets
drwxr-xr-x  4 ben ben 4096 Apr  9  2021 tests

All of that (and potentially more) could be merged into a datalad/ tree under CWD. Who says I don't already have a dir called datalad under CWD, including something that may be called tests or crcns for that matter? In fact, I do have such dirs elsewhere (not in window, obv., in my case).

Anyways - nobody cares, feel free to proceed.

@adswa
Copy link
Member Author

adswa commented Dec 1, 2021

All of that (and potentially more) could be merged into a datalad/ tree under CWD. Who says I don't already have a dir called datalad under CWD, including something that may be called tests or crcns for that matter? In fact, I do have such dirs elsewhere (not in window, obv., in my case).
Anyways - nobody cares, feel free to proceed.

Just for the record: If one has a directory datalad or datalad.org with content that happens to have the exact same file or directory names that certain tests could overwrite when run, and then invokes those specific windows tests in that directory, and then those tests are also broken in that they do not confine themselves to the temporary test dataset but actually run and leak into CWD, and then the Git version used to run these test is below 2.32 - then yes.
And the one place where one could run this on Windows to have potential consequences for further DataLad operations is C:\Users\<user>\AppData\Local\. Which is a directory that by default is hidden to a user.

I'm not objecting a better approach - in fact, I'm actually just copying yours - but I do want to put the cases you are describing into context.

@mih
Copy link
Member

mih commented Dec 1, 2021

Anyways - nobody cares, feel free to proceed.

An ongoing conversation among three people is hardly a sign that nobody cares, quite the opposite. The question here is whether any of your thoughts in any way block this PR, and that does not seem to be the case. Nothing here discourages thinking. Anyone stopping to think after this PR is merged does so, because they choose to, not because it is enforced.

@mih mih merged commit f94f43e into datalad:master Dec 1, 2021
@adswa adswa deleted the tst-nohomeforwindows branch December 1, 2021 11:36
@bpoldrack
Copy link
Member

bpoldrack commented Dec 1, 2021

Just for the record: If one has a directory datalad or datalad.org with content that happens to have the exact same file or directory names that certain tests could overwrite when run, and then invokes those specific windows tests in that directory, and then those tests are also broken in that they do not confine themselves to the temporary test dataset but actually run and leak into CWD, and then the Git version used to run these test is below 2.32 - then yes.
And the one place where one could run this on Windows to have potential consequences for further DataLad operations is C:\Users<user>\AppData\Local. Which is a directory that by default is hidden to a user.

No, you're only thinking about what datalad tests could explicitly do. But that's not the entire picture. Any subprocess triggered by the tests (by any means) would be affected, if they happen to ask windows for those general paths (not even only using pathdirs).
Simple example: Any (number of customized) system-level credential helpers would be triggered by any call to git-clone (that requires auth) without us being in control of that whatsoever. System-level credential helpers are default on OSX (that's why I run into this being an issue on macs as well, although in a different way) and are common on windows for connecting git to windows' credential manager. Furthermore, I'd argue custom setups in that regard are certainly more common with people who'd actually run tests (devs). We have no way of knowing what (relative) paths such third party code would produce or mess with, when they are relying on being able to query win for those common paths, while we screwed it up.

I'm not opposing THIS approach, either. Not at all. I just think it's incomplete to address the actual issue fully. It's fine to do that separately or within this PR. I'm fine either way.

bpoldrack added a commit to bpoldrack/datalad that referenced this pull request Dec 1, 2021
As discussed in dataladgh-6160 and dataladgh-6260, running tests on OSX and most
importantly windows with a git version older than 2.32, can lead to
unintended side-effects on disc due to a faulty test setup.
Do not execute tests in this case, unless forced by the
`datalad.tests.force-fake-home` config variable.
bpoldrack added a commit to bpoldrack/datalad that referenced this pull request Dec 2, 2021
As discussed in dataladgh-6160 and dataladgh-6260, running tests on OSX and most
importantly windows with a git version older than 2.32, can lead to
unintended side-effects on disc due to a faulty test setup.
Do not execute tests in this case, unless forced by the
`datalad.tests.force-fake-home` config variable.
bpoldrack added a commit to bpoldrack/datalad that referenced this pull request Dec 2, 2021
As discussed in dataladgh-6160 and dataladgh-6260, running tests on OSX and most
importantly windows with a git version older than 2.32, can lead to
unintended side-effects on disc due to a faulty test setup.
Do not execute tests in this case, unless forced by the
`datalad.tests.force-fake-home` config variable.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
semver-tests Changes only affect tests, no impact on version
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Windows test setup overwrites USERPROFILE variable, resulting in leaked caches etc in test datasets
3 participants