Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A few py2 Unicode fixes #3597

Merged
merged 6 commits into from Aug 14, 2019

Conversation

@kyleam
Copy link
Member

commented Aug 12, 2019

These were all discovered when looking into gh-3563. I've only run a subset of the tests, so I'm crossing my fingers that the changes don't trigger failures elsewhere.

Fixes #3563.

kyleam added 6 commits Aug 12, 2019
RF: dataset: Delay formatting of debug message
In addition to doing unnecessary (though negligible) work, formatting
the message directly makes fixing the Unicode error in the next commit
harder.
BF(py2): require_dataset: Fix handling of non-ascii paths
Under Python 2, we're problematically working with a mix of unicode
and byte strings here.  Consistently work with unicode.

We'll test this, but we need to fix create() and chwpd() first.
BF(py2): chpwd: Use bytes for PWD on Python 2
os.environ expects bytes on Python 2 and unicode on Python 3.
TST: Test require_dataset() with obscure file name
Now that we have fixed create() and chpwd(), add a regression test for
the require_dataset() fix from a couple of commits back.
BF(py2): wtf: Fix encoding failures when running in non-ascii path
This is primarily a Python 2 fix, but the removal of assure_bytes()
fixes the Python 3 failure from gh-3563 in addition to fixing Python 2
handling of non-ascii strings.

Fixes #3563.
BF(py2): create: Fix Unicode error with non-ascii paths
Under Python 2, GitPython expects a byte string here:

    File "[...]/lib/python2.7/site-packages/git/repo/base.py", line 126, in __init__
        epath = str(epath)
    UnicodeEncodeError: 'ascii' codec can't encode characters in
    position 63-68: ordinal not in range(128)

Note that master's local/create.py also fails with a different Unicode
error under Python 2.  That will be fixed in a separate series.

@kyleam kyleam force-pushed the kyleam:pyperclip-brought-friends branch from 1f8748b to 71ec996 Aug 12, 2019

@codecov

This comment has been minimized.

Copy link

commented Aug 12, 2019

Codecov Report

Merging #3597 into 0.11.x will decrease coverage by 12.67%.
The diff coverage is 79.41%.

Impacted file tree graph

@@             Coverage Diff             @@
##           0.11.x    #3597       +/-   ##
===========================================
- Coverage    76.6%   63.93%   -12.68%     
===========================================
  Files         252      252               
  Lines       33700    33725       +25     
===========================================
- Hits        25817    21561     -4256     
- Misses       7883    12164     +4281
Impacted Files Coverage Δ
datalad/utils.py 40.32% <0%> (-13.88%) ⬇️
datalad/distribution/tests/test_dataset.py 100% <100%> (ø) ⬆️
datalad/plugin/tests/test_plugins.py 91.81% <100%> (+0.22%) ⬆️
datalad/tests/test_utils.py 95.85% <100%> (-0.24%) ⬇️
datalad/distribution/tests/test_create.py 100% <100%> (ø) ⬆️
datalad/plugin/wtf.py 19.16% <25%> (-56.89%) ⬇️
datalad/distribution/dataset.py 64.12% <50%> (-21.47%) ⬇️
datalad/support/gitrepo.py 34.82% <66.66%> (-28.08%) ⬇️
datalad/tests/utils_testdatasets.py 11.76% <0%> (-88.24%) ⬇️
datalad/interface/tests/test_annotate_paths.py 24.51% <0%> (-75.49%) ⬇️
... and 77 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2ee7b18...71ec996. Read the comment docs.

@yarikoptic

This comment has been minimized.

Copy link
Member

commented Aug 12, 2019

Thank you @kyleam for looking into those!!!

@kyleam

This comment has been minimized.

Copy link
Member Author

commented Aug 12, 2019

The failures that have shown up in the current build (so far) are related to gh-3598.

kyleam added a commit to kyleam/datalad that referenced this pull request Aug 14, 2019
BF(py2): interface: Don't assume that results are unicode
We call text_type() on the result's `path` because it may come in as a
pathlib object.  Ideally we'd work with unicode internally within IO
boundaries, but when default_result_renderer() gets the results,
_process_results() has already encoded unicode strings.  This means
our handling fails with non-ascii paths on Python 2:

    $ datalad create Δ0
    [ERROR ] 'ascii' codec can't decode byte 0xce in position 16:
    ordinal not in range(128)
    [utils.py:default_result_renderer:506] (UnicodeDecodeError)

Avoid the above error by ensuring we use bytes for `path` on Python 2.

Note that the added test is oddly parametrized with one case because
the next commit will add another case.

Re: datalad#3597
kyleam added a commit to kyleam/datalad that referenced this pull request Aug 14, 2019
@yarikoptic

This comment has been minimized.

Copy link
Member

commented Aug 14, 2019

The failures that have shown up in the current build (so far) are related to gh-3598.

let's merge then

@yarikoptic yarikoptic merged commit 2c7a9eb into datalad:0.11.x Aug 14, 2019

1 of 3 checks passed

continuous-integration/appveyor/pr AppVeyor build failed
Details
continuous-integration/travis-ci/pr The Travis CI build failed
Details
WIP Ready for review
Details

@kyleam kyleam deleted the kyleam:pyperclip-brought-friends branch Aug 14, 2019

kyleam added a commit to kyleam/datalad that referenced this pull request Aug 15, 2019
BF(py2): interface: Don't assume that results are unicode
We call text_type() on the result's `path` because it may come in as a
pathlib object.  Ideally we'd work with unicode internally within IO
boundaries, but when default_result_renderer() gets the results,
_process_results() has already encoded unicode strings.  This means
our handling fails with non-ascii paths on Python 2:

    $ datalad create Δ0
    [ERROR ] 'ascii' codec can't decode byte 0xce in position 16:
    ordinal not in range(128)
    [utils.py:default_result_renderer:506] (UnicodeDecodeError)

Avoid the above error by ensuring we use bytes for `path` on Python 2.

Note that the added test is oddly parametrized with one case because
the next commit will add another case.

Re: datalad#3597
yarikoptic added a commit that referenced this pull request Sep 6, 2019
Merge tag '0.11.7' into debian
0.11.7 (Sep 02, 2019) -- python2-we-still-love-you-but-...

Primarily bugfixes with some optimizations and refactorings.

 Fixes

- [addurls][]
  - now provides better handling when the URL file isn't in the
    expected format.  ([#3579][])
  - always considered a relative file for the URL file argument as
    relative to the current working directory, which goes against the
    convention used by other commands of taking relative paths as
    relative to the dataset argument.  ([#3582][])

- [run-procedure][]
  - hard coded "python" when formatting the command for non-executable
    procedures ending with ".py".  `sys.executable` is now used.
    ([#3624][])
  - failed if arguments needed more complicated quoting than simply
    surrounding the value with double quotes.  This has been resolved
    for systems that support `shlex.quote`, but note that on Windows
    values are left unquoted. ([#3626][])

- [siblings][] now displays an informative error message if a local
  path is given to `--url` but `--name` isn't specified.  ([#3555][])

- [sshrun][], the command DataLad uses for `GIT_SSH_COMMAND`, didn't
  support all the parameters that Git expects it to.  ([#3616][])

- Fixed a number of Unicode py2-compatibility issues. ([#3597][])

 Enhancements and new features

- The [annotate-paths][] helper now caches subdatasets it has seen to
  avoid unnecessary calls.  ([#3570][])

- A repeated configuration query has been dropped from the handling of
  `--proc-pre` and `--proc-post`.  ([#3576][])

- Calls to `git annex find` now use `--in=.` instead of the alias
  `--in=here` to take advantage of an optimization that git-annex (as
  of the current release, 7.20190730) applies only to the
  former. ([#3574][])

- [addurls][] now suggests close matches when the URL or file format
  contains an unknown field.  ([#3594][])

- Shared logic used in the setup.py files of Datalad and its
  extensions has been moved to modules in the _datalad_build_support/
  directory.  ([#3600][])

- Get ready for upcoming git-annex dropping support for direct mode
  ([#3631][])

* tag '0.11.7': (87 commits)
  DOC: Added an entry to changelogn on merged 3631
  ENH: finalizing changelog for 0.11.7
  TST: Update tests for a git-annex without direct mode
  TST: utils: Add decorator that skips when direct mode is unsupported
  ENH: annexrepo: Refuse to initialize in direct mode if unsupported
  ENH: annexrepo: Add check_direct_mode_support method
  BF+TST: Avoid leaking patched git-annex version
  TST+RF: test_annexrepo: Split up a test
  CHANGELOG.md: Second batch for 0.11.7
  TST: run_procedure: Mark test_spaces() as known Windows failure
  TST: run_procedure: Mark test_quoting as known windows failure
  TST: run_procedure: Test more arguments that need quoting
  BF(py2): run_procedure: Avoid encoding error in log message
  TST: add run_procedure test with spaces in file name
  TST/RF: non-hardcoded Python executable
  RF: newline at end of file
  RF: helper instead of conditional
  RF: remove superfluous imports
  BF/TST: remove quoting
  ENH: replace conditionals with helper function
  ...
yarikoptic added a commit that referenced this pull request Sep 6, 2019
Merge tag '0.11.7' into debian
0.11.7 (Sep 02, 2019) -- python2-we-still-love-you-but-...

Primarily bugfixes with some optimizations and refactorings.

 Fixes

- [addurls][]
  - now provides better handling when the URL file isn't in the
    expected format.  ([#3579][])
  - always considered a relative file for the URL file argument as
    relative to the current working directory, which goes against the
    convention used by other commands of taking relative paths as
    relative to the dataset argument.  ([#3582][])

- [run-procedure][]
  - hard coded "python" when formatting the command for non-executable
    procedures ending with ".py".  `sys.executable` is now used.
    ([#3624][])
  - failed if arguments needed more complicated quoting than simply
    surrounding the value with double quotes.  This has been resolved
    for systems that support `shlex.quote`, but note that on Windows
    values are left unquoted. ([#3626][])

- [siblings][] now displays an informative error message if a local
  path is given to `--url` but `--name` isn't specified.  ([#3555][])

- [sshrun][], the command DataLad uses for `GIT_SSH_COMMAND`, didn't
  support all the parameters that Git expects it to.  ([#3616][])

- Fixed a number of Unicode py2-compatibility issues. ([#3597][])

- [download-url][] now will create leading directories of the output path
  if they do not exist ([#3646][])

 Enhancements and new features

- The [annotate-paths][] helper now caches subdatasets it has seen to
  avoid unnecessary calls.  ([#3570][])

- A repeated configuration query has been dropped from the handling of
  `--proc-pre` and `--proc-post`.  ([#3576][])

- Calls to `git annex find` now use `--in=.` instead of the alias
  `--in=here` to take advantage of an optimization that git-annex (as
  of the current release, 7.20190730) applies only to the
  former. ([#3574][])

- [addurls][] now suggests close matches when the URL or file format
  contains an unknown field.  ([#3594][])

- Shared logic used in the setup.py files of Datalad and its
  extensions has been moved to modules in the _datalad_build_support/
  directory.  ([#3600][])

- Get ready for upcoming git-annex dropping support for direct mode
  ([#3631][])

* tag '0.11.7':
  Changelog entry for download-url paths handling
  ENH: downloaders: Ensure directories for target exist
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants
You can’t perform that action at this time.