Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sanitize keys before checking content availability to ensure correct value for keys with URL or custom backend #6665

Merged
merged 3 commits into from
May 6, 2022

Conversation

yarikoptic
Copy link
Member

CP #6663 to maint

While so far this code has only been used within ORA, sanitization
of annex keys is universally useful. This commit moves it into a
more domain-agnostic part of the code base.
#6630 reported a mismatch between annex own reports of file
availability and the reports of datalad status --annex all.
The root cause for this mismatch were faulty false content
availability marks (has_content: false when it should have
been has_content:true) arising from unsanitized URL keys.

For example, a file with the URL key
URL-s700145--https://arxiv.org/pdf/0904.3664v1.pdf was not
marked as 'has_content: true' even if its content was retrieved,
because due to the presence of an unsanitized URL in the key
_mark_content_availability was not able to construct a valid
path into the object tree.

To fix this, this change reuses the _sanitize_key helper
first introduced in ORA related functionality, which
reimplements annex' own way of URL quoting. Specifically,
it sanitizes the above URL key to
URL-s700145--https&c%%arxiv.org%pdf%0904.3664v1.pdf
which can be successfully used to locate available contents in the
object tree. Fixes #6630
@yarikoptic yarikoptic added the semver-patch Increment the patch version when merged label Apr 29, 2022
@adswa
Copy link
Member

adswa commented May 2, 2022

argh, thx @yarikoptic - completely forgot to base it against maint.

@bpoldrack
Copy link
Member

Ping @yarikoptic: Failures unrelated to the code changed here, but they are "real".

@codecov
Copy link

codecov bot commented May 5, 2022

Codecov Report

Merging #6665 (d05c291) into maint (3fc0f47) will increase coverage by 0.03%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##            maint    #6665      +/-   ##
==========================================
+ Coverage   91.14%   91.18%   +0.03%     
==========================================
  Files         353      353              
  Lines       44505    44518      +13     
==========================================
+ Hits        40566    40594      +28     
+ Misses       3939     3924      -15     
Impacted Files Coverage Δ
datalad/distributed/ora_remote.py 80.46% <100.00%> (-0.07%) ⬇️
datalad/support/annex_utils.py 100.00% <100.00%> (ø)
datalad/support/annexrepo.py 91.67% <100.00%> (+0.24%) ⬆️
datalad/support/tests/test_fileinfo.py 100.00% <100.00%> (ø)
datalad/support/tests/test_annexrepo.py 97.87% <0.00%> (+0.06%) ⬆️
datalad/tests/utils.py 89.44% <0.00%> (+0.63%) ⬆️
datalad/support/sshconnector.py 87.19% <0.00%> (+0.69%) ⬆️
datalad/core/local/tests/test_status.py 97.79% <0.00%> (+2.20%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3fc0f47...d05c291. Read the comment docs.

@bpoldrack
Copy link
Member

Restarted failing builds. Should pass now.

@bpoldrack bpoldrack merged commit 8017e42 into maint May 6, 2022
@bpoldrack bpoldrack deleted the enh-cp branch May 6, 2022 14:37
@github-actions
Copy link

🚀 PR was released in 0.16.3 🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
released semver-patch Increment the patch version when merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants