Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adjust more tests to work on Windows #5202

Merged
merged 5 commits into from Dec 5, 2020
Merged

Conversation

adswa
Copy link
Member

@adswa adswa commented Dec 3, 2020

Another set of Windows adjustments that should fix the failures that occurred in #5199.
I'm not convinced I did it in the most clever way, in particular, I used Path() to convert hardcoded Posix paths to something that fits to the given operating system. If there are better alternatives, please let me know :)

@adswa adswa changed the base branch from master to maint December 3, 2020 17:07
@adswa
Copy link
Member Author

adswa commented Dec 3, 2020

(hope that rebase to maint worked out)

eq_(subpaths,
["adult", "kid", "adult/no", "adult/yes", "kid/no"])
[str(Path(p)) for p in ["adult", "kid", "adult/no", "adult/yes", "kid/no"]])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A better way would be:

["adult", "kid", os.path.join("adult", "no"), os.path.join("adult", "yes"), os.path.join("kid", "no")]

or, if you want the line to be as compressed as possible:

[os.path.join(*parts) for parts in [["adult"], ["kid"], ["adult", "no"], ["adult", "yes"], ["kid", "no"]]]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either way is fine.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense, thanks!

@@ -19,6 +19,7 @@
from unittest.mock import patch

from io import StringIO
from pathlib import Path
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd have to dig for past discussions, but I think the consensus is that Path should be imported from datalad.utils.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, thanks a lot for the hint. I don't need the import in this PR anymore when I change to op.join, but I'll try to keep it in mind!

@@ -277,7 +278,7 @@ def test_extract():
assert_dict_equal(d["meta_args"], expect)

eq_([d["subpath"] for d in info],
["kid/no", "adult/yes", "adult/no", "kid/no"])
[str(Path(p)) for p in ["kid/no", "adult/yes", "adult/no", "kid/no"]])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

<blush> thanks for fixing my sloppiness

# On Windows, the most recent commit will be the 'adjusted branch'
# commit of git-annex, therefore we need to take a look into the
# default branch
ok_startswith(ds.repo.format_commit('%b', DEFAULT_BRANCH), f"url_file='{json_file}'")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, the pain of two branches. This part is only relevant for master, after 60547c2 (ENH: addurl - store original urlfile, not resolved full path, 2020-10-24).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ahh, I see. thanks. I'll remove this change then. :)

else:
# On Windows, there should be one additional "adjusted branch"
# commit for each of the new commits
eq_(n_annex_commits + 2 + 2, get_annex_commit_counts())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, something seems off here. If it's about adjusted branches, then this condition should be for that (e.g. is_managed_branch) rather than windows. But when I run this under ./tools/eval_under_testloopfs (which results in using adjusted branches), the original non-windows code path passes. I also wasn't aware that using adjusted branches modified the git-annex branch. Not sure yet what's going on...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, thanks. I think I must have jumped too quickly to conclusions.
I don't know why the commit count differs, but I can at least paste here how it differs. Maybe someone gets an insight based on this.

Here is the difference in the git log that I am seeing between the test repo on a Windows 10 machine and my Debian machine:

  • On dl-test-branch:
#DEBIAN
adina@muninn in /tmp/datalad_temp_test_addurlsd2vmpg6m on git:dl-test-branch
❱ git log --oneline                                                       
4546b36 (HEAD -> dl-test-branch) [DATALAD] add files from URLs
ad68e47 [DATALAD] new dataset

# WINDOWS
(base) C:\Users\datalad\AppData\Local\Temp\datalad_temp_52vxkqaa>git log --oneline
6c8aba0 (HEAD -> dl-test-branch, refs/basis/adjusted/dl-test-branch(unlocked)) [DATALAD] add files from URLs
c24c000 [DATALAD] new dataset
4453fb6 commit before entering adjusted unlocked branch
  • the git-annex branch has two extra commits:
#DEBIAN 
adina@muninn in /tmp/datalad_temp_test_addurlsd2vmpg6m on git:git-annex
❱ git log --oneline
93f68be (HEAD -> git-annex) update
7ea4803 update
1832b07 update
0629230 branch created

# WINDOWS
(base) C:\Users\datalad\AppData\Local\Temp\datalad_temp_52vxkqaa>git log --oneline
b498e63 (HEAD -> git-annex) update
2eff062 update
311f930 update
d8bf699 update
74d1d99 update
c87051c branch created

The differences in the git-annex branches are that the most recent commit on Debian

commit 93f68be159eb43a9222482276a3a49e48d1e9bd6 (HEAD -> git-annex)
Author: DataLad Tester <test@example.com>
Date:   Fri Dec 4 08:32:23 2020 +0100

    update

diff --git a/24f/59a/MD5E-s9--71dd3e865d708f8f8e971309096cd676.log.met b/24f/59>
new file mode 100644
index 0000000..1da66c4
--- /dev/null
+++ b/24f/59a/MD5E-s9--71dd3e865d708f8f8e971309096cd676.log.met
@@ -0,0 +1 @@
+1607067143.327715633s name +b subdir +bar
diff --git a/6ea/286/MD5E-s9--9b72648021b70b8c522642e4490d7ac3.log.met b/6ea/28>
new file mode 100644
index 0000000..9b600f2
--- /dev/null
+++ b/6ea/286/MD5E-s9--9b72648021b70b8c522642e4490d7ac3.log.met
@@ -0,0 +1 @@
+1607067143.418799146s name +c subdir +foo
diff --git a/e21/1c3/MD5E-s9--3fb7c40c70b0ed19da713bd69ee12014.log.met b/e21/1c>
new file mode 100644
index 0000000..4687f74
--- /dev/null
+++ b/e21/1c3/MD5E-s9--3fb7c40c70b0ed19da713bd69ee12014.log.met
@@ -0,0 +1 @@
+1607067143.185355757s name +a subdir +foo

are three separate commits on Windows:

(base) C:\Users\datalad\AppData\Local\Temp\datalad_temp_52vxkqaa>git show 311f930
commit 311f93066d969a87ccc2b14f44866f33a29a49c2
Author: DataLad Tester <test@example.com>
Date:   Fri Dec 4 08:18:18 2020 +0100

    update

diff --git a/e21/1c3/MD5E-s9--3fb7c40c70b0ed19da713bd69ee12014.log.met b/e21/1c3/MD5E-s9--3fb7c40c70b0ed19da713bd69ee12014.log.met
new file mode 100644
index 0000000..589a79d
--- /dev/null
+++ b/e21/1c3/MD5E-s9--3fb7c40c70b0ed19da713bd69ee12014.log.met
@@ -0,0 +1 @@
+1607066297.6131111s name +a subdir +foo

(base) C:\Users\datalad\AppData\Local\Temp\datalad_temp_52vxkqaa>git show 2eff062
commit 2eff062e1390bdac86ce98b71be06fba9b09285e
Author: DataLad Tester <test@example.com>
Date:   Fri Dec 4 08:18:20 2020 +0100

    update

diff --git a/24f/59a/MD5E-s9--71dd3e865d708f8f8e971309096cd676.log.met b/24f/59a/MD5E-s9--71dd3e865d708f8f8e971309096cd676.log.met
new file mode 100644
index 0000000..77a946d
--- /dev/null
+++ b/24f/59a/MD5E-s9--71dd3e865d708f8f8e971309096cd676.log.met
@@ -0,0 +1 @@
+1607066299.4000661s name +b subdir +bar

(base) C:\Users\datalad\AppData\Local\Temp\datalad_temp_52vxkqaa>git show b498e63
commit b498e635e81511ba728a2908b2d4bb7547b08097 (HEAD -> git-annex)
Author: DataLad Tester <test@example.com>
Date:   Fri Dec 4 08:18:21 2020 +0100

    update

diff --git a/6ea/286/MD5E-s9--9b72648021b70b8c522642e4490d7ac3.log.met b/6ea/286/MD5E-s9--9b72648021b70b8c522642e4490d7ac3.log.met
new file mode 100644
index 0000000..e3a419a
--- /dev/null
+++ b/6ea/286/MD5E-s9--9b72648021b70b8c522642e4490d7ac3.log.met
@@ -0,0 +1 @@
+1607066301.1290616s name +c subdir +foo

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a debugger, on Debian and Windows have the same number of n_annex_commits, but get_annex_commit_counts() differ as expected from the git log:

# DEBIAN
(Pdb) p get_annex_commit_counts()
4
(Pdb) p n_annex_commits
2

# WINDOWS
(Pdb) p get_annex_commit_counts()
6
(Pdb) p n_annex_commits
2

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for digging. So, it looks like you've determined that the metadata commits are the issue. Before trying to figure out where things are going wrong on our end, let's make sure that git-annex is respecting alwayscommit=false on windows. Can you try (something like) this script on your windows machine and report if the final log command shows the expected output of a single commit?

script
set -eu

cd "$(mktemp -d "${TMPDIR:-/tmp}"/ga-XXXXXXX)"

alwayscommit=false

git init
git annex init

echo foo >foo
git annex add foo
git -c annex.alwayscommit=$alwayscommit annex metadata --set a=b foo
git -c annex.alwayscommit=$alwayscommit annex metadata --set c=d foo
git commit -mc

set -x
git log --oneline --stat git-annex -- '*.met'

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm getting two commits:

+ git log --oneline --stat git-annex -- '*.met'
c44294d (git-annex) update
 ...b9d8014a0f9b1d61e21e796d78dccdf1352f23cd32812f4850b878ae4944c.log.met | 1 +
 1 file changed, 1 insertion(+)
2395118 update
 ...b9d8014a0f9b1d61e21e796d78dccdf1352f23cd32812f4850b878ae4944c.log.met | 1 +
 1 file changed, 1 insertion(+)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added your proposed change and adjusted the comment. Thanks for the help, @kyleam!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the report! I have a really stupid, basic question: How could I do a bug report, or comment on an existing bug report?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How could I do a bug report, or comment on an existing bug report?

If you go to https://git-annex.branchable.com/bugs/, you'll see a "Report a new bug titled:" box. After filling in the title and hitting edit, you'll see a login prompt where you can create an account (as well as some other options). For commenting, when you try to comment, you should also have a chance to log in or create a new account.

Once you file it, it will be committed to the main git-annex repo (example). You can select on option to get responses emailed to you.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!!

@@ -184,7 +185,7 @@ def test_no_annex(path):
# one is annex'ed, the other is not, despite no change in add call
# importantly, also .gitattribute is not annexed
eq_([opj('code', 'inannex')],
ds.repo.get_annexed_files())
[str(Path(p)) for p in ds.repo.get_annexed_files()])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

git-annex-find always reports posix paths?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I think you removed this in response to my question. I didn't doubt that you were seeing a failure on your end, and indeed appveyor shows it too now that this change is dropped. Looking at get_content_annexinfo, which I think must handle windows paths okay overall since the core tests are passing on windows (and because Michael did it :>), I think the answer to my question is "yes".

else:
cmd = 'find'
# stringify any pathobjs
if paths:
files = [str(p) for p in paths]
else:
opts.extend(['--include', '*'])
for j in self._run_annex_command_json(cmd, opts=opts, files=files):
path = self.pathobj.joinpath(ut.PurePosixPath(j['file']))

So, now I'm worried about get_annexed_files and other spots that don't go through get_content_annexinfo. Focusing on get_annexed_files, things like

present_files = [
opj(ds_path, p)
for p in repo.get_annexed_files(with_content_only=True)]

and

assert_in(op.join('ds', 'file1.txt'),
installed_ds.repo.get_annexed_files())

There are some other spots in the tests.

Anyway, dealing with that definitely doesn't need to be wrapped up in this PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks! I can go through this in a separate PR (tomorrow) :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks! I can go through this in a separate PR (tomorrow) :)

Okay, sounds good. (I can also audit these sites, if you'd like.)

However, for this PR, I think you should restore the original change to datalad/plugin/tests/test_plugins.py. It's fine to take care of it while not worrying about the whole code base, and with the appveyor.yml move in 7d01545 (TST Appveyor: move datalad.plugin tests to 'assorted other tests'), it's needed so that the build isn't red.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ahh, right. I've restored the change, and the tests pass now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

git-annex-find always reports posix paths?

Just for completeness, to provide evidence for this - I'm not sure if that is git-annex fault or simply Windows weirdness, but it does. I've tried on two machines that

this script returns a posix path (subdir/foo) under windows

git init
git annex init

mkdir subdir
echo foo >subdir/foo
git annex add subdir/foo
git commit -m "addded a file"
git annex find

@kyleam
Copy link
Contributor

kyleam commented Dec 3, 2020

Based on looking through a recent appveyor run, I think this PR covers all of the tests needed to move datalad.plugin from the KNOWN2FAIL build into the "assorted other tests" one.

@codecov
Copy link

codecov bot commented Dec 3, 2020

Codecov Report

Merging #5202 (4af1fa5) into maint (f95b434) will decrease coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##            maint    #5202      +/-   ##
==========================================
- Coverage   89.90%   89.90%   -0.01%     
==========================================
  Files         294      294              
  Lines       40953    40954       +1     
==========================================
  Hits        36820    36820              
- Misses       4133     4134       +1     
Impacted Files Coverage Δ
datalad/plugin/tests/test_addurls.py 100.00% <100.00%> (ø)
datalad/plugin/tests/test_plugins.py 91.08% <100.00%> (+0.08%) ⬆️
datalad/utils.py 82.31% <0.00%> (-0.06%) ⬇️
datalad/downloaders/tests/test_http.py 88.59% <0.00%> (-0.06%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f95b434...4af1fa5. Read the comment docs.

On adinas Windows computer, a meta data change is represented as three
separate update commits in the git-annex branch instead of a single
commit. TODO: Figure out why.
@adswa
Copy link
Member Author

adswa commented Dec 4, 2020

Thanks for the review! I have adjusted the path handling to @jwodder's suggestion. The DEFAULT BRANCH change is dropped, I will open a separate PR against master. I have commented some details on the differences in git-annex commits (the metadata commit with three metadata changes is one commit on my debian machine, but three separate commits on my windows machine and presumably also appveyor), but I don't know why they happen.

@adswa
Copy link
Member Author

adswa commented Dec 4, 2020

Based on looking through a recent appveyor run, I think this PR covers all of the tests needed to move datalad.plugin from the KNOWN2FAIL build into the "assorted other tests" one.

done (I'm cancelling all non-appveyor tests)

A check based on commit counts on the git-annex branch fails due to
what seems to be separate meta data commits on Windows.
This change disables the test until a fix for this behavior is found.
See datalad#5202 (comment)
Copy link
Contributor

@kyleam kyleam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great. Thanks Adina!

@kyleam kyleam merged commit b7c9864 into datalad:maint Dec 5, 2020
kyleam added a commit to kyleam/datalad that referenced this pull request Dec 7, 2020
On maint, all of the plugin tests are now passing on appveyor
(dataladgh-5202), so 7d01545 moved the plugin tests out of the known
failures block.  Unfortunately tests for master's `addurls --key`
feature bring things back to a failing state (dataladgh-5213).

Until this is resolved, move the plugin tests back to the known
failures block in master's appveyor.yml so that every appveyor run
doesn't need to be checked manually.

[ci skip]
@adswa adswa deleted the win-addurltests branch December 8, 2020 07:49
@yarikoptic yarikoptic added this to the 0.13.6 milestone Dec 12, 2020
kyleam added a commit to kyleam/datalad that referenced this pull request Dec 14, 2020
get_annexed_files() uses `git annex find`, which always returns POSIX
paths.  get_annexed_files() returning the paths as is may or may not
have been intended, but at this point unconditionally changing it
would likely break some callers.

datalad#5202 (comment)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants