New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tollerate %xx in filenames #4953
Conversation
previous explicit listing tries to do combinatorics manually. Well -- we cannot actually try all possible combinations of tricky filename components, but we can assume that we could test adding each tricky part individually to arrive at the trickiest one. That is what this PR does -- replaces static list with generation from parts. Also it adds commented out "%b5" identified to trigger issues with datalad-archive special remote. With this commit I just want to RF to see if no side-effect and then we can see what else breaks if "urlencode" like component is added to the filename.
…most obscure file so should be prepended to spaces etc
…e on windows Leads to at least 4 tests to error out, although most likely filesystem does support trailing whitespaces -- something on windows precludes correct operation in some scenarios (e.g. may be could not be for directory names? did not check and do not remember)
So we could verify that filenames are not unquoted where they should not be
Wow -- this must be one of the oldest buggy lines of code in the datalad from the times when both earth and moon were flat: initial commit which introduced this unquote is 0.0.1-48-g2170d1477 (2170d14). Commit message there ENH/RF: Special remote provided optional path though hints on possibility of handling unquoting somewhere in the special remotes code. We will see if that is needed Closes dataladgh-4950
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Subject: [PATCH 4/5] ENH: add %b5 into parts of an obscure filename
So we could verify that filenames are not unquoted where they should not be
Okay, so this shows up as failures in test_archives.test_basic_scenario
and test_archives.test_annex_get_from_subdir
. And ...
Subject: [PATCH 5/5] BF: do not urlunquote filename of a file from an archive
... both these failures are addressed by this commit.
travis fail from _DL_TMPDIR="/var/tmp/sym link" DATALAD_TESTS_OBSCURE_PREFIX=- DATALAD_LOG_LEVEL=INFO DATALAD_LOG_TRACEBACK=1 DATALAD_LOG_VMEM=1======================================================================
ERROR: datalad.plugin.tests.test_plugins.test_wtf
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/nose/case.py", line 198, in runTest
self.test(*self.arg)
File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/datalad/tests/utils.py", line 554, in _wrap_with_tree
return t(*(arg + (d,)), **kw)
File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/datalad/plugin/tests/test_plugins.py", line 75, in test_wtf
wtf(dataset=path, on_failure="ignore")
File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/datalad/interface/utils.py", line 494, in eval_func
return return_func(generator_func)(*args, **kwargs)
File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/datalad/interface/utils.py", line 482, in return_func
results = list(results)
File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/datalad/interface/utils.py", line 413, in generator_func
allkwargs):
File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/datalad/interface/utils.py", line 586, in _process_results
res_lgr(msg, *msgargs)
File "/opt/python/3.5.6/lib/python3.5/logging/__init__.py", line 1297, in warning
self._log(WARNING, msg, args, **kwargs)
File "/opt/python/3.5.6/lib/python3.5/logging/__init__.py", line 1421, in _log
self.handle(record)
File "/opt/python/3.5.6/lib/python3.5/logging/__init__.py", line 1431, in handle
self.callHandlers(record)
File "/opt/python/3.5.6/lib/python3.5/logging/__init__.py", line 1493, in callHandlers
hdlr.handle(record)
File "/opt/python/3.5.6/lib/python3.5/logging/__init__.py", line 861, in handle
self.emit(record)
File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/nose/plugins/logcapture.py", line 82, in emit
self.buffer.append(self.format(record))
File "/opt/python/3.5.6/lib/python3.5/logging/__init__.py", line 836, in format
return fmt.format(record)
File "/opt/python/3.5.6/lib/python3.5/logging/__init__.py", line 573, in format
record.message = record.getMessage()
File "/opt/python/3.5.6/lib/python3.5/logging/__init__.py", line 336, in getMessage
msg = msg % self.args
TypeError: not enough arguments for format string
-------------------- >> begin captured logging << --------------------
datalad.network: WARNING: RSS/VMS: 113148/392636 kB ...>network:634 ParseResults contains params '&%b5 ΔЙקم๗あ.datc ', which will be ignored and the same in the abominable So -- having |
grr -- failed to reproduce locally with
and thanks to all the decorators and "message passing" I have no clue from above stack about the origin of the problem besides that it is due to some warning (but not that one obviously which posted fine) somewhere while rendering results records.... I think I will just safeguard that invocation of the logger since it better not puke like that |
ok, still don't know why it failed to replicate locally but I think I know what could have lead to it, so pushed the fix (we need to sanitize path before including into msg). |
Codecov Report
@@ Coverage Diff @@
## maint #4953 +/- ##
==========================================
+ Coverage 89.62% 89.68% +0.05%
==========================================
Files 289 289
Lines 40500 40544 +44
==========================================
+ Hits 36300 36362 +62
+ Misses 4200 4182 -18
Continue to review full report at Codecov.
|
I want to push more tune ups
I could push here, but this one is primarily a BF and IMHO it is already doing BFing it needed to do, or could initiate a new PR on top (after or before we merge this one). @kyleam - what do you think? |
I haven't looked at the fixups since I last reviewed, but as I reviewer I would very much prefer that PRs aren't a frequently moving target. |
ok, I will leave this one alone, and will submit a follow up whenever this one is merged (to not build on top of it with a more complicated diff etc) -- I do not think there is any urgency to that matter. |
To address #4950 , but also introducing
%b4
into an obscure filename used by tests. Sits on top of #4949 . The actual fix in the last commit in the series was trivial and to the one of the most ancient pieces of code in the archive (current commit is f080379 ). Let's now just see if tests pass to signal that there were no some hidden assumption somewhere ;-)