Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tollerate %xx in filenames #4953

Merged
merged 8 commits into from Sep 24, 2020
Merged

Tollerate %xx in filenames #4953

merged 8 commits into from Sep 24, 2020

Conversation

yarikoptic
Copy link
Member

To address #4950 , but also introducing %b4 into an obscure filename used by tests. Sits on top of #4949 . The actual fix in the last commit in the series was trivial and to the one of the most ancient pieces of code in the archive (current commit is f080379 ). Let's now just see if tests pass to signal that there were no some hidden assumption somewhere ;-)

previous explicit listing tries to do combinatorics manually. Well -- we cannot
actually try all possible combinations of tricky filename components, but we can
assume that we could test adding each tricky part individually to arrive at the
trickiest one.  That is what this PR does -- replaces static list with generation
from parts.

Also it adds commented out "%b5" identified to trigger issues with datalad-archive
special remote.  With this commit I just want to RF to see if no side-effect and then
we can see what else breaks if "urlencode" like component is added to the filename.
…most obscure file

so should be prepended to spaces etc
…e on windows

Leads to at least 4 tests to error out, although most likely filesystem does support
trailing whitespaces -- something on windows precludes correct operation in some
scenarios (e.g. may be could not be for directory names? did not check and do not
remember)
So we could verify that filenames are not unquoted where they should not be
Wow -- this must be one of the oldest buggy lines of code in the datalad from the
times when both earth and moon were flat: initial commit which introduced this unquote
is 0.0.1-48-g2170d1477 (2170d14).

Commit message there
 ENH/RF: Special remote provided optional path

though hints on possibility of handling unquoting somewhere in the special remotes code.
We will see if that is needed

Closes dataladgh-4950
Copy link
Collaborator

@kyleam kyleam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Subject: [PATCH 4/5] ENH: add %b5 into parts of an obscure filename

So we could verify that filenames are not unquoted where they should not be

Okay, so this shows up as failures in test_archives.test_basic_scenario and test_archives.test_annex_get_from_subdir. And ...

Subject: [PATCH 5/5] BF: do not urlunquote filename of a file from an archive

... both these failures are addressed by this commit.

@yarikoptic
Copy link
Member Author

travis fail from _DL_TMPDIR="/var/tmp/sym link" DATALAD_TESTS_OBSCURE_PREFIX=- DATALAD_LOG_LEVEL=INFO DATALAD_LOG_TRACEBACK=1 DATALAD_LOG_VMEM=1
======================================================================
ERROR: datalad.plugin.tests.test_plugins.test_wtf
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/nose/case.py", line 198, in runTest
    self.test(*self.arg)
  File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/datalad/tests/utils.py", line 554, in _wrap_with_tree
    return t(*(arg + (d,)), **kw)
  File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/datalad/plugin/tests/test_plugins.py", line 75, in test_wtf
    wtf(dataset=path, on_failure="ignore")
  File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/datalad/interface/utils.py", line 494, in eval_func
    return return_func(generator_func)(*args, **kwargs)
  File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/datalad/interface/utils.py", line 482, in return_func
    results = list(results)
  File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/datalad/interface/utils.py", line 413, in generator_func
    allkwargs):
  File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/datalad/interface/utils.py", line 586, in _process_results
    res_lgr(msg, *msgargs)
  File "/opt/python/3.5.6/lib/python3.5/logging/__init__.py", line 1297, in warning
    self._log(WARNING, msg, args, **kwargs)
  File "/opt/python/3.5.6/lib/python3.5/logging/__init__.py", line 1421, in _log
    self.handle(record)
  File "/opt/python/3.5.6/lib/python3.5/logging/__init__.py", line 1431, in handle
    self.callHandlers(record)
  File "/opt/python/3.5.6/lib/python3.5/logging/__init__.py", line 1493, in callHandlers
    hdlr.handle(record)
  File "/opt/python/3.5.6/lib/python3.5/logging/__init__.py", line 861, in handle
    self.emit(record)
  File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/nose/plugins/logcapture.py", line 82, in emit
    self.buffer.append(self.format(record))
  File "/opt/python/3.5.6/lib/python3.5/logging/__init__.py", line 836, in format
    return fmt.format(record)
  File "/opt/python/3.5.6/lib/python3.5/logging/__init__.py", line 573, in format
    record.message = record.getMessage()
  File "/opt/python/3.5.6/lib/python3.5/logging/__init__.py", line 336, in getMessage
    msg = msg % self.args
TypeError: not enough arguments for format string
-------------------- >> begin captured logging << --------------------
datalad.network: WARNING: RSS/VMS: 113148/392636 kB ...>network:634  ParseResults contains params '&%b5 ΔЙקم๗あ.datc ', which will be ignored

and the same in the abominable DATALAD_LOG_LEVEL=2 DATALAD_LOG_TARGET=/dev/null DATALAD_TESTS_PROTOCOLREMOTE=1 DATALAD_TESTS_DATALADREMOTE=1 DATALAD_LOG_CWD=1 DATALAD_LOG_OUTPUTS=1 DATALAD_LOG_ENV=1 DATALAD_LOG_STDIN=1 DATALAD_TESTS_UI_BACKEND=console DATALAD_TESTS_OBSCURE_PREFIX=- DATALAD_SEED=1 GIT_AUTHOR_DATE="Thu, 07 Apr 2005 22:13:13 +0200" GIT_AUTHOR_NAME=blah GIT_AUTHOR_EMAIL=committer@example.com GIT_COMMITTER_DATE="Thu, 07 Apr 2005 22:13:13 +0200" GIT_COMMITTER_NAME=blah GIT_COMMITTER_EMAIL=committer@example.com

So -- having % as embedded with filename in the logstring somewhere manages to get into the logs and thus cause such troubles! yet to look into it

@yarikoptic
Copy link
Member Author

yarikoptic commented Sep 23, 2020

grr -- failed to reproduce locally with

TMPDIR=/tmp/sym\ link DATALAD_TESTS_OBSCURE_PREFIX=- DATALAD_LOG_LEVEL=INFO DATALAD_LOG_TRACEBACK=1 DATALAD_LOG_VMEM=1 python -m nose -s -v datalad/plugin/tests/test_plugins.py

and thanks to all the decorators and "message passing" I have no clue from above stack about the origin of the problem besides that it is due to some warning (but not that one obviously which posted fine) somewhere while rendering results records.... I think I will just safeguard that invocation of the logger since it better not puke like that

@yarikoptic
Copy link
Member Author

ok, still don't know why it failed to replicate locally but I think I know what could have lead to it, so pushed the fix (we need to sanitize path before including into msg).

@codecov
Copy link

codecov bot commented Sep 24, 2020

Codecov Report

Merging #4953 into maint will increase coverage by 0.05%.
The diff coverage is 96.66%.

Impacted file tree graph

@@            Coverage Diff             @@
##            maint    #4953      +/-   ##
==========================================
+ Coverage   89.62%   89.68%   +0.05%     
==========================================
  Files         289      289              
  Lines       40500    40544      +44     
==========================================
+ Hits        36300    36362      +62     
+ Misses       4200     4182      -18     
Impacted Files Coverage Δ
datalad/tests/utils.py 86.79% <93.33%> (+0.27%) ⬆️
datalad/interface/base.py 91.37% <100.00%> (+0.02%) ⬆️
datalad/interface/tests/test_utils.py 100.00% <100.00%> (ø)
datalad/interface/utils.py 94.29% <100.00%> (+0.13%) ⬆️
datalad/support/archives.py 89.69% <100.00%> (ø)
datalad/tests/test_tests_utils.py 98.90% <100.00%> (+0.27%) ⬆️
datalad/support/gitrepo.py 90.45% <0.00%> (+0.07%) ⬆️
datalad/downloaders/tests/test_http.py 89.92% <0.00%> (+2.21%) ⬆️
datalad/downloaders/http.py 84.55% <0.00%> (+2.70%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a2e70de...37fa322. Read the comment docs.

@yarikoptic
Copy link
Member Author

I want to push more tune ups

  • RF get_obscure to simplify logic -- just add trailing spaces into parts
  • RF get_obscure a bit more to remove windows specific code by testing obscure also on mkdir
  • add {} to most obscure components. So sample local obscure name without prefix is ' |;&%b5{}ΔЙקم๗あ .datc '...
  • now I realized that the most evil would be do add ' and " into parts! that makes it into ' |;&%b5{}\'"ΔЙקم๗あ .datc ' on my system... will run tests locally now to see what booms

I could push here, but this one is primarily a BF and IMHO it is already doing BFing it needed to do, or could initiate a new PR on top (after or before we merge this one). @kyleam - what do you think?

@kyleam
Copy link
Collaborator

kyleam commented Sep 24, 2020

I want to push more tune ups [...] @kyleam - what do you think?

I haven't looked at the fixups since I last reviewed, but as I reviewer I would very much prefer that PRs aren't a frequently moving target.

@yarikoptic
Copy link
Member Author

ok, I will leave this one alone, and will submit a follow up whenever this one is merged (to not build on top of it with a more complicated diff etc) -- I do not think there is any urgency to that matter.

@kyleam kyleam merged commit 0987979 into datalad:maint Sep 24, 2020
2 checks passed
@yarikoptic yarikoptic added this to the 0.13.5 milestone Dec 12, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants