New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adjust more tests to work on Windows #5202
Conversation
568616e
to
8aa048e
Compare
8aa048e
to
6c4fe20
Compare
(hope that rebase to maint worked out) |
datalad/plugin/tests/test_addurls.py
Outdated
eq_(subpaths, | ||
["adult", "kid", "adult/no", "adult/yes", "kid/no"]) | ||
[str(Path(p)) for p in ["adult", "kid", "adult/no", "adult/yes", "kid/no"]]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A better way would be:
["adult", "kid", os.path.join("adult", "no"), os.path.join("adult", "yes"), os.path.join("kid", "no")]
or, if you want the line to be as compressed as possible:
[os.path.join(*parts) for parts in [["adult"], ["kid"], ["adult", "no"], ["adult", "yes"], ["kid", "no"]]]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either way is fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes sense, thanks!
datalad/plugin/tests/test_addurls.py
Outdated
@@ -19,6 +19,7 @@ | |||
from unittest.mock import patch | |||
|
|||
from io import StringIO | |||
from pathlib import Path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd have to dig for past discussions, but I think the consensus is that Path
should be imported from datalad.utils
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, thanks a lot for the hint. I don't need the import in this PR anymore when I change to op.join
, but I'll try to keep it in mind!
datalad/plugin/tests/test_addurls.py
Outdated
@@ -277,7 +278,7 @@ def test_extract(): | |||
assert_dict_equal(d["meta_args"], expect) | |||
|
|||
eq_([d["subpath"] for d in info], | |||
["kid/no", "adult/yes", "adult/no", "kid/no"]) | |||
[str(Path(p)) for p in ["kid/no", "adult/yes", "adult/no", "kid/no"]]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
<blush> thanks for fixing my sloppiness
datalad/plugin/tests/test_addurls.py
Outdated
# On Windows, the most recent commit will be the 'adjusted branch' | ||
# commit of git-annex, therefore we need to take a look into the | ||
# default branch | ||
ok_startswith(ds.repo.format_commit('%b', DEFAULT_BRANCH), f"url_file='{json_file}'") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, the pain of two branches. This part is only relevant for master, after 60547c2 (ENH: addurl - store original urlfile, not resolved full path, 2020-10-24).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ahh, I see. thanks. I'll remove this change then. :)
datalad/plugin/tests/test_addurls.py
Outdated
else: | ||
# On Windows, there should be one additional "adjusted branch" | ||
# commit for each of the new commits | ||
eq_(n_annex_commits + 2 + 2, get_annex_commit_counts()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, something seems off here. If it's about adjusted branches, then this condition should be for that (e.g. is_managed_branch
) rather than windows. But when I run this under ./tools/eval_under_testloopfs
(which results in using adjusted branches), the original non-windows code path passes. I also wasn't aware that using adjusted branches modified the git-annex branch. Not sure yet what's going on...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, thanks. I think I must have jumped too quickly to conclusions.
I don't know why the commit count differs, but I can at least paste here how it differs. Maybe someone gets an insight based on this.
Here is the difference in the git log that I am seeing between the test repo on a Windows 10 machine and my Debian machine:
- On
dl-test-branch
:
#DEBIAN
adina@muninn in /tmp/datalad_temp_test_addurlsd2vmpg6m on git:dl-test-branch
❱ git log --oneline
4546b36 (HEAD -> dl-test-branch) [DATALAD] add files from URLs
ad68e47 [DATALAD] new dataset
# WINDOWS
(base) C:\Users\datalad\AppData\Local\Temp\datalad_temp_52vxkqaa>git log --oneline
6c8aba0 (HEAD -> dl-test-branch, refs/basis/adjusted/dl-test-branch(unlocked)) [DATALAD] add files from URLs
c24c000 [DATALAD] new dataset
4453fb6 commit before entering adjusted unlocked branch
- the git-annex branch has two extra commits:
#DEBIAN
adina@muninn in /tmp/datalad_temp_test_addurlsd2vmpg6m on git:git-annex
❱ git log --oneline
93f68be (HEAD -> git-annex) update
7ea4803 update
1832b07 update
0629230 branch created
# WINDOWS
(base) C:\Users\datalad\AppData\Local\Temp\datalad_temp_52vxkqaa>git log --oneline
b498e63 (HEAD -> git-annex) update
2eff062 update
311f930 update
d8bf699 update
74d1d99 update
c87051c branch created
The differences in the git-annex branches are that the most recent commit on Debian
commit 93f68be159eb43a9222482276a3a49e48d1e9bd6 (HEAD -> git-annex)
Author: DataLad Tester <test@example.com>
Date: Fri Dec 4 08:32:23 2020 +0100
update
diff --git a/24f/59a/MD5E-s9--71dd3e865d708f8f8e971309096cd676.log.met b/24f/59>
new file mode 100644
index 0000000..1da66c4
--- /dev/null
+++ b/24f/59a/MD5E-s9--71dd3e865d708f8f8e971309096cd676.log.met
@@ -0,0 +1 @@
+1607067143.327715633s name +b subdir +bar
diff --git a/6ea/286/MD5E-s9--9b72648021b70b8c522642e4490d7ac3.log.met b/6ea/28>
new file mode 100644
index 0000000..9b600f2
--- /dev/null
+++ b/6ea/286/MD5E-s9--9b72648021b70b8c522642e4490d7ac3.log.met
@@ -0,0 +1 @@
+1607067143.418799146s name +c subdir +foo
diff --git a/e21/1c3/MD5E-s9--3fb7c40c70b0ed19da713bd69ee12014.log.met b/e21/1c>
new file mode 100644
index 0000000..4687f74
--- /dev/null
+++ b/e21/1c3/MD5E-s9--3fb7c40c70b0ed19da713bd69ee12014.log.met
@@ -0,0 +1 @@
+1607067143.185355757s name +a subdir +foo
are three separate commits on Windows:
(base) C:\Users\datalad\AppData\Local\Temp\datalad_temp_52vxkqaa>git show 311f930
commit 311f93066d969a87ccc2b14f44866f33a29a49c2
Author: DataLad Tester <test@example.com>
Date: Fri Dec 4 08:18:18 2020 +0100
update
diff --git a/e21/1c3/MD5E-s9--3fb7c40c70b0ed19da713bd69ee12014.log.met b/e21/1c3/MD5E-s9--3fb7c40c70b0ed19da713bd69ee12014.log.met
new file mode 100644
index 0000000..589a79d
--- /dev/null
+++ b/e21/1c3/MD5E-s9--3fb7c40c70b0ed19da713bd69ee12014.log.met
@@ -0,0 +1 @@
+1607066297.6131111s name +a subdir +foo
(base) C:\Users\datalad\AppData\Local\Temp\datalad_temp_52vxkqaa>git show 2eff062
commit 2eff062e1390bdac86ce98b71be06fba9b09285e
Author: DataLad Tester <test@example.com>
Date: Fri Dec 4 08:18:20 2020 +0100
update
diff --git a/24f/59a/MD5E-s9--71dd3e865d708f8f8e971309096cd676.log.met b/24f/59a/MD5E-s9--71dd3e865d708f8f8e971309096cd676.log.met
new file mode 100644
index 0000000..77a946d
--- /dev/null
+++ b/24f/59a/MD5E-s9--71dd3e865d708f8f8e971309096cd676.log.met
@@ -0,0 +1 @@
+1607066299.4000661s name +b subdir +bar
(base) C:\Users\datalad\AppData\Local\Temp\datalad_temp_52vxkqaa>git show b498e63
commit b498e635e81511ba728a2908b2d4bb7547b08097 (HEAD -> git-annex)
Author: DataLad Tester <test@example.com>
Date: Fri Dec 4 08:18:21 2020 +0100
update
diff --git a/6ea/286/MD5E-s9--9b72648021b70b8c522642e4490d7ac3.log.met b/6ea/286/MD5E-s9--9b72648021b70b8c522642e4490d7ac3.log.met
new file mode 100644
index 0000000..e3a419a
--- /dev/null
+++ b/6ea/286/MD5E-s9--9b72648021b70b8c522642e4490d7ac3.log.met
@@ -0,0 +1 @@
+1607066301.1290616s name +c subdir +foo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In a debugger, on Debian and Windows have the same number of n_annex_commits
, but get_annex_commit_counts()
differ as expected from the git log:
# DEBIAN
(Pdb) p get_annex_commit_counts()
4
(Pdb) p n_annex_commits
2
# WINDOWS
(Pdb) p get_annex_commit_counts()
6
(Pdb) p n_annex_commits
2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for digging. So, it looks like you've determined that the metadata commits are the issue. Before trying to figure out where things are going wrong on our end, let's make sure that git-annex is respecting alwayscommit=false
on windows. Can you try (something like) this script on your windows machine and report if the final log command shows the expected output of a single commit?
script
set -eu
cd "$(mktemp -d "${TMPDIR:-/tmp}"/ga-XXXXXXX)"
alwayscommit=false
git init
git annex init
echo foo >foo
git annex add foo
git -c annex.alwayscommit=$alwayscommit annex metadata --set a=b foo
git -c annex.alwayscommit=$alwayscommit annex metadata --set c=d foo
git commit -mc
set -x
git log --oneline --stat git-annex -- '*.met'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm getting two commits:
+ git log --oneline --stat git-annex -- '*.met'
c44294d (git-annex) update
...b9d8014a0f9b1d61e21e796d78dccdf1352f23cd32812f4850b878ae4944c.log.met | 1 +
1 file changed, 1 insertion(+)
2395118 update
...b9d8014a0f9b1d61e21e796d78dccdf1352f23cd32812f4850b878ae4944c.log.met | 1 +
1 file changed, 1 insertion(+)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added your proposed change and adjusted the comment. Thanks for the help, @kyleam!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the report! I have a really stupid, basic question: How could I do a bug report, or comment on an existing bug report?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How could I do a bug report, or comment on an existing bug report?
If you go to https://git-annex.branchable.com/bugs/, you'll see a "Report a new bug titled:" box. After filling in the title and hitting edit, you'll see a login prompt where you can create an account (as well as some other options). For commenting, when you try to comment, you should also have a chance to log in or create a new account.
Once you file it, it will be committed to the main git-annex repo (example). You can select on option to get responses emailed to you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!!
@@ -184,7 +185,7 @@ def test_no_annex(path): | |||
# one is annex'ed, the other is not, despite no change in add call | |||
# importantly, also .gitattribute is not annexed | |||
eq_([opj('code', 'inannex')], | |||
ds.repo.get_annexed_files()) | |||
[str(Path(p)) for p in ds.repo.get_annexed_files()]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
git-annex-find
always reports posix paths?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I think you removed this in response to my question. I didn't doubt that you were seeing a failure on your end, and indeed appveyor shows it too now that this change is dropped. Looking at get_content_annexinfo
, which I think must handle windows paths okay overall since the core tests are passing on windows (and because Michael did it :>), I think the answer to my question is "yes".
datalad/datalad/support/annexrepo.py
Lines 3341 to 3350 in 7d01545
else: | |
cmd = 'find' | |
# stringify any pathobjs | |
if paths: | |
files = [str(p) for p in paths] | |
else: | |
opts.extend(['--include', '*']) | |
for j in self._run_annex_command_json(cmd, opts=opts, files=files): | |
path = self.pathobj.joinpath(ut.PurePosixPath(j['file'])) |
So, now I'm worried about get_annexed_files
and other spots that don't go through get_content_annexinfo
. Focusing on get_annexed_files
, things like
datalad/datalad/distribution/update.py
Lines 416 to 418 in 7d01545
present_files = [ | |
opj(ds_path, p) | |
for p in repo.get_annexed_files(with_content_only=True)] |
and
datalad/datalad/distributed/tests/test_create_sibling_ria.py
Lines 128 to 129 in 7d01545
assert_in(op.join('ds', 'file1.txt'), | |
installed_ds.repo.get_annexed_files()) |
There are some other spots in the tests.
Anyway, dealing with that definitely doesn't need to be wrapped up in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks! I can go through this in a separate PR (tomorrow) :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks! I can go through this in a separate PR (tomorrow) :)
Okay, sounds good. (I can also audit these sites, if you'd like.)
However, for this PR, I think you should restore the original change to datalad/plugin/tests/test_plugins.py
. It's fine to take care of it while not worrying about the whole code base, and with the appveyor.yml move in 7d01545 (TST Appveyor: move datalad.plugin tests to 'assorted other tests'), it's needed so that the build isn't red.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ahh, right. I've restored the change, and the tests pass now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
git-annex-find always reports posix paths?
Just for completeness, to provide evidence for this - I'm not sure if that is git-annex fault or simply Windows weirdness, but it does. I've tried on two machines that
this script returns a posix path (subdir/foo
) under windows
git init
git annex init
mkdir subdir
echo foo >subdir/foo
git annex add subdir/foo
git commit -m "addded a file"
git annex find
Based on looking through a recent appveyor run, I think this PR covers all of the tests needed to move |
Codecov Report
@@ Coverage Diff @@
## maint #5202 +/- ##
==========================================
- Coverage 89.90% 89.90% -0.01%
==========================================
Files 294 294
Lines 40953 40954 +1
==========================================
Hits 36820 36820
- Misses 4133 4134 +1
Continue to review full report at Codecov.
|
On adinas Windows computer, a meta data change is represented as three separate update commits in the git-annex branch instead of a single commit. TODO: Figure out why.
6c4fe20
to
d88cd45
Compare
Thanks for the review! I have adjusted the path handling to @jwodder's suggestion. The DEFAULT BRANCH change is dropped, I will open a separate PR against master. I have commented some details on the differences in git-annex commits (the metadata commit with three metadata changes is one commit on my debian machine, but three separate commits on my windows machine and presumably also appveyor), but I don't know why they happen. |
done (I'm cancelling all non-appveyor tests) |
A check based on commit counts on the git-annex branch fails due to what seems to be separate meta data commits on Windows. This change disables the test until a fix for this behavior is found. See datalad#5202 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great. Thanks Adina!
On maint, all of the plugin tests are now passing on appveyor (dataladgh-5202), so 7d01545 moved the plugin tests out of the known failures block. Unfortunately tests for master's `addurls --key` feature bring things back to a failing state (dataladgh-5213). Until this is resolved, move the plugin tests back to the known failures block in master's appveyor.yml so that every appveyor run doesn't need to be checked manually. [ci skip]
get_annexed_files() uses `git annex find`, which always returns POSIX paths. get_annexed_files() returning the paths as is may or may not have been intended, but at this point unconditionally changing it would likely break some callers. datalad#5202 (comment)
Another set of Windows adjustments that should fix the failures that occurred in #5199.
I'm not convinced I did it in the most clever way, in particular, I used Path() to convert hardcoded Posix paths to something that fits to the given operating system. If there are better alternatives, please let me know :)