Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TST: Stop using file:// URLs in test repo annex setup #5332

Merged
merged 2 commits into from Apr 16, 2021

Conversation

mih
Copy link
Member

@mih mih commented Jan 15, 2021

git-annex is having issues with those:
#5277

But even if that is fixed, it is still suboptimal to use a URL scheme as
primary testing target that not only needs to be specifically enabled
first, but is also explicitly advised against.

From https://git-annex.branchable.com/git-annex:

List of URL schemes that git-annex is allowed to download content from. The default is "http https ftp".
Think very carefully before changing this; there are security implications. For example, if it's changed to allow "file" URLs, then anyone who can get a commit into your git-annex repository could git-annex addurl a pointer to a private file located outside that repository, possibly causing it to be copied into your repository and transferred on to other remotes, exposing its content.

This change introduces a global HTTP server datalad.test_http_server
that is persistently available at test runtime. It could be used to
replace all usage of file:// URLs with http:// URLs. However, in the
scope of this change, only git-annex related usage is modified, but
the use of file:// URLs for Git (submodules) is kept. Argueably, those
are more relevant in practice.

@codecov
Copy link

codecov bot commented Jan 15, 2021

Codecov Report

Merging #5332 (713fc94) into master (6d3ac7e) will decrease coverage by 1.73%.
The diff coverage is 81.91%.

❗ Current head 713fc94 differs from pull request most recent head 1816299. Consider uploading reports for the commit 1816299 to get more accurate results
Impacted file tree graph

@@            Coverage Diff             @@
##           master    #5332      +/-   ##
==========================================
- Coverage   90.17%   88.44%   -1.74%     
==========================================
  Files         299      305       +6     
  Lines       42528    42511      -17     
==========================================
- Hits        38351    37598     -753     
- Misses       4177     4913     +736     
Impacted Files Coverage Δ
datalad/cmdline/tests/test_main.py 95.17% <ø> (-0.69%) ⬇️
...talad/distributed/tests/test_export_to_figshare.py 100.00% <ø> (ø)
datalad/interface/__init__.py 100.00% <ø> (ø)
datalad/interface/common_cfg.py 100.00% <ø> (ø)
datalad/local/tests/test_check_dates.py 44.68% <ø> (ø)
datalad/local/tests/test_export_archive.py 100.00% <ø> (ø)
datalad/plugin/__init__.py 100.00% <ø> (+10.00%) ⬆️
datalad/plugin/add_readme.py 0.00% <0.00%> (-93.11%) ⬇️
datalad/plugin/addurls.py 0.00% <0.00%> (-95.50%) ⬇️
datalad/plugin/check_dates.py 0.00% <0.00%> (-95.72%) ⬇️
... and 98 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6d3ac7e...1816299. Read the comment docs.

@mih
Copy link
Member Author

mih commented Jan 15, 2021

And I have isolated the change that originally derailed #5303 @bpoldrack

@mih
Copy link
Member Author

mih commented Apr 12, 2021

I will rebase this on the current master and get a fresh run. I am not seeing any feedback on this PR yet, although I think I remember some discussion of it -- not remembering any conclusions, though.

git-annex is having issues with those:
datalad#5277

But even if that is fixed, it is still suboptimal to use a URL scheme as
primary testing target that not only needs to be specifically enabled
first, but is also explicitly advised against.

From https://git-annex.branchable.com/git-annex:

> List of URL schemes that git-annex is allowed to download content from. The default is "http https ftp".
> Think very carefully before changing this; there are security implications. For example, if it's changed to allow "file" URLs, then anyone who can get a commit into your git-annex repository could git-annex addurl a pointer to a private file located outside that repository, possibly causing it to be copied into your repository and transferred on to other remotes, exposing its content.

This change introduces a global HTTP server `datalad.test_http_server`
that is persistently available at test runtime. It could be used to
replace all usage of file:// URLs with http:// URLs. However, in the
scope of this change, only git-annex related usage is modified, but
the use of file:// URLs for Git (submodules) is kept. Argueably, those
are more relevant in practice.
prefix='httpserve',
)
test_http_server = HTTPPath(serve_path)
test_http_server.start()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HTTPPath.start patches the environment, so this interacts badly with the rest of the setup/teardown logic.

For example, running a random test locally (without explicitly setting DATALAD_LOG_LEVEL in the environment) gives this error at teardown:

======================================================================
ERROR: test suite for <module 'datalad.core.local.tests.test_save' from '/home/kyle/src/python/datalad/datalad/core/local/tests/test_save.py'>
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/kyle/src/python/venvs/datalad/lib/python3.7/site-packages/nose/suite.py", line 229, in run
    self.tearDown()
  File "/home/kyle/src/python/venvs/datalad/lib/python3.7/site-packages/nose/suite.py", line 352, in tearDown
    self.teardownContext(ancestor)
  File "/home/kyle/src/python/venvs/datalad/lib/python3.7/site-packages/nose/suite.py", line 368, in teardownContext
    try_run(context, names)
  File "/home/kyle/src/python/venvs/datalad/lib/python3.7/site-packages/nose/util.py", line 471, in try_run
    return func()
  File "/home/kyle/src/python/datalad/datalad/__init__.py", line 272, in teardown_package
    os.environ.pop(v)
  File "/usr/lib/python3.7/_collections_abc.py", line 795, in pop
    value = self[key]
  File "/usr/lib/python3.7/os.py", line 678, in __getitem__
    raise KeyError(key) from None
KeyError: 'DATALAD_LOG_LEVEL'

I think the most basic fix would be to move the above code to the end of setup_package. That way, setup_package modifies the environment before it is patched by HTTPPath.start, and HTTPPath.stop restores the original environment before teardown_package reverse setup_packages modifications.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thx for the suggestion! Done.

@mih
Copy link
Member Author

mih commented Apr 13, 2021

Test failures are the current brew issue on the mac and the familiar #5300

Copy link
Contributor

@kyleam kyleam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

@mih
Copy link
Member Author

mih commented Apr 16, 2021

Thx @kyleam ! Let's see where this will take us...

@mih mih merged commit 470d187 into datalad:master Apr 16, 2021
@mih mih deleted the tst-nofileurl branch April 16, 2021 05:12
mih added a commit to mih/datalad that referenced this pull request Apr 16, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants