patool dislikes tricky filenames #3769

Closed
yarikoptic opened this issue Oct 11, 2019 · 5 comments

Comments

@yarikoptic
Member

@yarikoptic yarikoptic commented Oct 11, 2019

edit: bug in cPython: https://bugs.python.org/issue38449

While test-building 0.11.8 for Debian sid, the build failed: a number of tests hit the same error from patool (patoolib=1.12):

======================================================================
ERROR: datalad.tests.test_archives.test_ExtractedArchive
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/build/datalad-0.11.8/.pybuild/cpython3_3.7_datalad/build/datalad/tests/utils.py", line 440, in newfunc
    create_tree(d, tree, archives_leading_dir=archives_leading_dir)
  File "/build/datalad-0.11.8/.pybuild/cpython3_3.7_datalad/build/datalad/utils.py", line 2248, in create_tree
    archives_leading_dir=archives_leading_dir)
  File "/build/datalad-0.11.8/.pybuild/cpython3_3.7_datalad/build/datalad/utils.py", line 2211, in create_tree_archive
    compress_files([dirname], name, path=path, overwrite=overwrite)
  File "/build/datalad-0.11.8/.pybuild/cpython3_3.7_datalad/build/datalad/support/archives.py", line 251, in compress_files
    verbosity=100)
  File "/usr/lib/python3/dist-packages/patoolib/__init__.py", line 505, in _create_archive
    format, compression = get_archive_format(archive)
  File "/usr/lib/python3/dist-packages/patoolib/__init__.py", line 293, in get_archive_format
    raise util.PatoolError("unknown archive format for file `%s'" % filename)
patoolib.util.PatoolError: unknown archive format for file ` "';b&b&c `| .tar.gz'

I do not remember us changing any of those tests since the last release, and the patool version used for building 0.11.7, where no failures happened, was also 0.12 (checked: the Debian revision was 0.12-3 in both cases).

neurodebian@smaug ~/deb/builds/datalad/0.11.7-1 % grep test_ExtractedArchive datalad_0.11.7-1_amd64.build
datalad.tests.test_archives.test_ExtractedArchive ... ok

So I have no immediate idea and it would need some digging.

edit: also doesn't reproduce for me locally

@yarikoptic
Member Author

@yarikoptic yarikoptic commented Oct 11, 2019

It boils down to indigestion of ; in a filename (and I think it goes deeper, into the Python stdlib; upgrading my laptop atm):

root@smaug:/tmp/buildd/datalad-0.11.8# python3 -c 'import patoolib.util as ut; print(ut.guess_mime(r";b.tar.gz"))'
(None, None)
root@smaug:/tmp/buildd/datalad-0.11.8# python3 -c 'import patoolib.util as ut; print(ut.guess_mime(r"b.tar.gz"))'
('application/x-tar', 'gzip')
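
The same narrowing-down can be scripted instead of done by hand. The sketch below is not from the thread: `find_confusing_chars` is a hypothetical helper that prefixes each character of the obscure test name to a known-good archive name and reports which ones change what the stdlib `mimetypes` module guesses. On a Python without the regression it should report nothing.

```python
import mimetypes


def find_confusing_chars(obscure, suffix="b.tar.gz"):
    """Return characters of `obscure` that, when prefixed to `suffix`,
    change the (type, encoding) guessed by mimetypes.

    Hypothetical helper for illustration; not part of patool or datalad.
    """
    mimedb = mimetypes.MimeTypes(strict=False)
    expected = mimedb.guess_type(suffix)
    bad = []
    for ch in sorted(set(obscure)):
        if mimedb.guess_type(ch + suffix) != expected:
            bad.append(ch)
    return bad


# Characters taken from the failing archive name in the traceback above.
print(find_confusing_chars(" \"';b&b&c `| "))
```

An empty list means none of the characters affects the guess on the interpreter running the script.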

@yarikoptic
Member Author

@yarikoptic yarikoptic commented Oct 11, 2019

yep, old:

$> python3 -c 'import mimetypes; mimedb = mimetypes.MimeTypes(strict=False); print(mimedb.guess_type(";1.tar.gz"))'
('application/x-tar', 'gzip')

$> python3 --version
Python 3.7.3rc1

new (debian sid):

root@smaug:/tmp/buildd/datalad-0.11.8# python3 -c 'import mimetypes; mimedb = mimetypes.MimeTypes(strict=False); print(mimedb.guess_type(";1.tar.gz"))'
(None, None)
root@smaug:/tmp/buildd/datalad-0.11.8# python3 --version
Python 3.7.5rc1

@yarikoptic
Member Author

@yarikoptic yarikoptic commented Oct 11, 2019

since it is unlikely to be hit in real life, for now I will

  • patch our code to avoid ; in the filename if it confuses mimetypes
  • report issue to Debian
  • verify status with current cPython git (v3.8.0b1-1174-g2b7dc40b2af) if I manage to build it, and proceed from there
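
A minimal sketch of the first item, assuming the workaround is applied conditionally: probe `mimetypes` once and drop `;` from the obscure name only when the running interpreter is actually confused by it. The names `mimetypes_confused_by` and `safe_obscure_name` are made up for this illustration, and the actual patch may simply remove `;` unconditionally.

```python
import mimetypes


def mimetypes_confused_by(char, probe="1.tar.gz"):
    """True if prefixing `char` changes what mimetypes guesses for `probe`."""
    mimedb = mimetypes.MimeTypes(strict=False)
    return mimedb.guess_type(char + probe) != mimedb.guess_type(probe)


def safe_obscure_name(name, suspects=";"):
    """Drop suspect characters from `name`, but only those that confuse
    mimetypes on the interpreter running the tests."""
    for ch in suspects:
        if ch in name and mimetypes_confused_by(ch):
            name = name.replace(ch, "")
    return name


# The obscure name from the failing test; ';' is kept unless it causes trouble.
print(repr(safe_obscure_name(" \"';b&b&c `| ")))
```

Probing at runtime keeps the tricky character in the tests wherever it is harmless, so coverage shrinks only on affected Python builds.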

@yarikoptic
Member Author

@yarikoptic yarikoptic commented Oct 11, 2019

ok -- verified with upstream cpython...

$> ./python3.9 -c 'import mimetypes; mimedb = mimetypes.MimeTypes(strict=False); print(mimedb.guess_type(";1.tar.gz"))'                           
(None, None)

$> ./python3.9 -c 'import mimetypes; mimedb = mimetypes.MimeTypes(strict=False); print(mimedb.guess_type("1.tar.gz"))' 
('application/x-tar', 'gzip')

$> git describe
v3.8.0b1-1174-g2b7dc40b2af
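
The thread does not say why `;` matters. One plausible explanation, offered here as an assumption and not confirmed by the outputs above, is that the affected `guess_type` passed the name through `urllib.parse.urlparse`, which treats `;` in the last path segment as the start of URL parameters. The sketch below only demonstrates that `urlparse` behavior; it does not inspect `mimetypes` internals.

```python
from urllib.parse import urlparse

# With a leading ';' the whole file name lands in `params`,
# leaving an empty `path` with no extension to look up.
tricky = urlparse(";1.tar.gz")
print(repr(tricky.path), repr(tricky.params))

# Without ';' the name stays in `path`, extension intact.
plain = urlparse("1.tar.gz")
print(repr(plain.path), repr(plain.params))
```

If a lookup is done on `path` alone, an empty path would give `(None, None)`, matching what was observed on the newer interpreters.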

yarikoptic added a commit that referenced this issue Oct 14, 2019
0.11.8 (Oct 11, 2019) -- annex-we-are-catching-up

Fixes

- Our internal command runner failed to capture output in some cases.
  ([#3656][])
- Workaround in the tests for ';' in the filename confusing mimetypes
  in CPython >= 3.7.5 ([#3769][]) ([#3770][])

Enhancements and new features

- Prepared for upstream changes in git-annex, including support for
  the latest git-annex
  - 7.20190912 auto-upgrades v5 repositories to v7.  ([#3648][]) ([#3682][])
  - 7.20191009 fixed treatment of (larger/smaller)than in .gitattributes ([#3765][])

- The `cfg_text2git` procedure, as well as the `--text-no-annex` option
  of [create][], now configure .gitattributes so that empty files are
  stored in git rather than annex.  ([#3667][])

* tag '0.11.8': (27 commits)
  DOC: add CHANGELOG entry about mimetypes workaround, and regenerate changelog.rst
  RF: reuse fn*obscure* variables from test_archives for testing archives custom remote
  BF(TST,workaround): do not use ; in the test archive filenames
  Finalize changelog and boost version
  DOC: Adjust CHANGELOG for the fix of test
  RF(TST): use 'willgetshort' name to correctly reflect file behavior
  BF(TST): reflect the fact that since 7.20191009 file would jump from annex to git based on current size
  CHANGELOG.md: Add entry for gh-3667
  CHANGELOG.md: First batch for 0.11.8
  RF: simplify the expression for largefiles based on size
  ENH: exit with dedicated 99 exit code if installed annex is newer than -devel
  TST: known_failure_v6_or_later: Consider whether v5 is supported by git-annex
  BF(v7): gitrepo: Avoid adding files to annex
  BF: 3rdparty_analysis_workflow: Make example compatible with v6+
  ENH: annexrepo: Give more informative assertion error
  BF: annexrepo: Skip empty lines when expecting one output line
  TST: create: Adjust --text-no-annex test for aa6b8dc
  ENH: add file size rule to --text-no-annex
  TST: basic test for empty files in text2git ds
  ENH: exclude empty files from being annexed after text2git
  ...
@yarikoptic
Member Author

@yarikoptic yarikoptic commented Mar 17, 2020

Our report was taken very seriously: the offending code was reverted, and 3.8.0 was released with a "fix". Not sure there is any reason to keep this open.

@yarikoptic yarikoptic closed this Mar 17, 2020