Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consistently convert pathlib objects with text_type #3332

Merged
merged 5 commits into from Apr 16, 2019

Conversation

@kyleam
Copy link
Member

commented Apr 13, 2019

  • Avoid str(pathobj) to prevent py2-encoding issues.
  • Make runner.(..., env={...}) work when env has unicode values (py2).

A couple of notes:

  • This will very likely conflict with ongoing topics. I'm fine leaving this until those land and then dealing with the conflicts.

  • I've avoided OBSCURE_FILENAME for compatibility with older Windows, but I'm not sure that is something worth doing. I'm happy to rewrite the tests using OBSCURE_FILENAME.

  • This PR carries a change dealing with the runner. That commit is branched off of 0.11.x so that it can be merged into that line separately, but it is needed here too.

@kyleam kyleam added the do not merge label Apr 13, 2019

@kyleam

This comment has been minimized.

Copy link
Member Author

commented Apr 13, 2019

There are a couple of create failures on the py2 run. At some point I did a local py2 run that passed, but perhaps I introduced the culprit after that point. In any case, I'll have to look into it later.

https://travis-ci.org/datalad/datalad/jobs/519757021#L1781

@kyleam kyleam added the WIP label Apr 13, 2019

@yarikoptic

This comment has been minimized.

Copy link
Member

commented Apr 14, 2019

I've avoided OBSCURE_FILENAME for compatibility with older Windows, but I'm not sure that is something worth doing. I'm happy to rewrite the tests using OBSCURE_FILENAME.

Fwiw, it's value automatically chosen for the specific filesystem in use, so shouldn't cause any troubles at least at the filesystem level.

@kyleam

This comment has been minimized.

Copy link
Member Author

commented Apr 14, 2019

I've avoided OBSCURE_FILENAME for compatibility with older Windows, but I'm not sure that is something worth doing. I'm happy to rewrite the tests using OBSCURE_FILENAME.

Fwiw, it's value automatically chosen for the specific filesystem in use, so shouldn't cause any troubles at least at the filesystem level.

See gh-2929, which is referenced both in the commit messages and code comments of this PR.

@mih

This comment has been minimized.

Copy link
Member

commented Apr 14, 2019

FTR: I personally do not object a merge for reasons of potential conflicts alone.

kyleam added some commits Apr 13, 2019

BF(py2): runner: Avoid encoding error when env has unicode
Under Python 2, convert the current working directory value to bytes
because that may be added to env, which can lead to an encoding error

  py2> subprocess.call(":", env={"VAR": u"β"})
  [...]
  raise child_exception
  UnicodeEncodeError: 'ascii' codec can't encode character [...]

@kyleam kyleam force-pushed the kyleam:pathlib-unstr branch from 745c297 to 129a023 Apr 15, 2019

@codecov

This comment has been minimized.

Copy link

commented Apr 15, 2019

Codecov Report

Merging #3332 into master will decrease coverage by 0.17%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3332      +/-   ##
==========================================
- Coverage   91.13%   90.95%   -0.18%     
==========================================
  Files         263      263              
  Lines       34126    34150      +24     
==========================================
- Hits        31100    31061      -39     
- Misses       3026     3089      +63
Impacted Files Coverage Δ
datalad/tests/utils.py 91.75% <ø> (ø) ⬆️
datalad/core/local/status.py 97.91% <100%> (ø) ⬆️
datalad/tests/test_cmd.py 97.04% <100%> (-0.49%) ⬇️
datalad/distribution/dataset.py 94.77% <100%> (ø) ⬆️
datalad/core/local/tests/test_create.py 100% <100%> (ø) ⬆️
datalad/support/gitrepo.py 88.87% <100%> (-0.33%) ⬇️
datalad/support/annexrepo.py 87.52% <100%> (+0.01%) ⬆️
datalad/core/local/create.py 98.33% <100%> (-0.83%) ⬇️
datalad/cmd.py 94.69% <100%> (-1.38%) ⬇️
datalad/core/local/tests/test_save.py 100% <100%> (ø) ⬆️
... and 15 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f5b995f...6119b13. Read the comment docs.

@codecov

This comment has been minimized.

Copy link

commented Apr 15, 2019

Codecov Report

Merging #3332 into master will increase coverage by 44.72%.
The diff coverage is 98.11%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master    #3332       +/-   ##
===========================================
+ Coverage   45.41%   90.14%   +44.72%     
===========================================
  Files         260      263        +3     
  Lines       34105    34316      +211     
===========================================
+ Hits        15490    30933    +15443     
+ Misses      18615     3383    -15232
Impacted Files Coverage Δ
datalad/tests/utils.py 89.91% <ø> (+32.96%) ⬆️
datalad/core/local/status.py 97.91% <100%> (+4.16%) ⬆️
datalad/tests/test_cmd.py 97.63% <100%> (+97.63%) ⬆️
datalad/distribution/dataset.py 94.77% <100%> (+9.05%) ⬆️
datalad/core/local/tests/test_create.py 100% <100%> (+25.85%) ⬆️
datalad/support/gitrepo.py 89.12% <100%> (+16.34%) ⬆️
datalad/support/annexrepo.py 86.48% <100%> (+30.32%) ⬆️
datalad/core/local/create.py 99.16% <100%> (+2.52%) ⬆️
datalad/core/local/tests/test_save.py 99.7% <100%> (+22.19%) ⬆️
datalad/distribution/tests/test_dataset.py 99.67% <100%> (+28.94%) ⬆️
... and 194 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d0a4772...129a023. Read the comment docs.

@kyleam

This comment has been minimized.

Copy link
Member Author

commented Apr 15, 2019

There are a couple of create failures on the py2 run. At some point I did a local py2 run that passed, but perhaps I introduced the culprit after that point. In any case, I'll have to look into it later.

Indeed it was passing for me locally because I had a GitPython version before 2.1.11 in my py2 virtualenv. Dealt with in 129a023.

@kyleam kyleam added WIP and removed WIP do not merge labels Apr 15, 2019

kyleam added some commits Apr 13, 2019

BF(py2): pathlib: Consistently use text_type for conversion
Our code makes heavy use of str(pathobj), but under Python 2 that can
be problematic anytime the path has non-ascii characters.  Convert all
the str(pathobj) instances to text_type(pathobj).  Note that this
depends on the Python 2 pathlib objects working with unicode(), which
is the case as of f0822dc (ENH: utils: Make py2 Pathlib objects work
with unicode(), 2019-04-08).

Most of this is straight substitution or u-prefixing, but there is an
assure_unicode() call in the result renderer of status that's a bit
surprising.  The path would be expected to be a unicode string here.
On PY2 it is not, because interface.utils encodes the result values on
PY2 before they get to custom_result_renderer() as of
616e4e6 (Unicode fixes for test_add (PY2), 2018-04-13).

Note that we're using OBSCURE_FILENAME in the tests.  That sacrifices
compatibility with older Windows (see gh-2929), but, without it or
something that probes the system, the current appveyor run will fail:
https://ci.appveyor.com/project/mih/datalad/builds/23828771/job/w6wf4myq5o6nyna6#L696
BF(py2): Avoid instantiating GitPython repo with unicode path
Under Python 2, passing a unicode path to GitPython's Repo fails as of
GitPython v2.1.11, specifically 7f08b77 (Allow pathlib.Path in
Repo.__init__, 2018-07-12).

In an attempt to handle pathlib objects, GitPython converts the path
with str() if it isn't already a str.  Under Python 2, if a unicode
object is passed in, str() is called on it, leading to a possible
encoding error.  For Python 2, work around this by passing in an
encoded path.
TST: create: Add plain prefix to OBSCURE_FILENAME
Otherwise the 'git submodule add' call will fail in the
DATALAD_TESTS_OBSCURE_PREFIX=- test case (see [0]).  Our
add_submodule() should probably abort if the path starts with "-"
because the error by 'git submodule' is very misleading, and the issue
should arguably be fixed upstream, but in either case, this doesn't
have anything to do with this series, so work around this issue in the
test.

[0]: https://travis-ci.org/datalad/datalad/builds/520413270

@kyleam kyleam force-pushed the kyleam:pathlib-unstr branch from 129a023 to 6119b13 Apr 15, 2019

@kyleam kyleam removed the WIP label Apr 15, 2019

@mih

mih approved these changes Apr 16, 2019

Copy link
Member

left a comment

LGTM, thx.

@mih mih merged commit e79d4ea into datalad:master Apr 16, 2019

2 of 3 checks passed

continuous-integration/appveyor/pr AppVeyor build failed
Details
WIP Legacy commit status override — see details
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

mih added a commit to datalad/datalad-revolution that referenced this pull request Apr 16, 2019

@kyleam kyleam deleted the kyleam:pathlib-unstr branch Apr 16, 2019

kyleam added a commit that referenced this pull request Apr 16, 2019

Merge branch 'py2-runner-env' into 0.11.x
This brings in the 0.11.x component from gh-3332.

yarikoptic added a commit that referenced this pull request May 28, 2019

Merge tag '0.11.5' into debian
0.11.5 (May 23, 2019) -- stability is not overrated

Should be faster and less buggy, with a few enhancements.

 Fixes

- [create-sibling][]  ([#3318][])
  - Siblings are no longer configured with a post-update hook unless a
    web interface is requested with `--ui`.
  - `git submodule update --init` is no longer called from the
    post-update hook.
  - If `--inherit` is given for a dataset without a superdataset, a
    warning is now given instead of raising an error.
- The internal command runner failed on Python 2 when its `env`
  argument had unicode values.  ([#3332][])
- The safeguard that prevents creating a dataset in a subdirectory
  that already contains tracked files for another repository failed on
  Git versions before 2.14.  For older Git versions, we now warn the
  caller that the safeguard is not active.  ([#3347][])
- A regression introduced in v0.11.1 prevented [save][] from committing
  changes under a subdirectory when the subdirectory was specified as
  a path argument.  ([#3106][])
- A workaround introduced in v0.11.1 made it possible for [save][] to
  do a partial commit with an annex file that has gone below the
  `annex.largefiles` threshold.  The logic of this workaround was
  faulty, leading to files being displayed as typechanged in the index
  following the commit.  ([#3365][])
- The resolve_path() helper confused paths that had a semicolon for
  SSH RIs.  ([#3425][])
- The detection of SSH RIs has been improved.  ([#3425][])

 Enhancements and new features

- The internal command runner was too aggressive in its decision to
  sleep.  ([#3322][])
- The "INFO" label in log messages now retains the default text color
  for the terminal rather than using white, which only worked well for
  terminals with dark backgrounds.  ([#3334][])
- A short flag `-R` is now available for the `--recursion-limit` flag,
  a flag shared by several subcommands.  ([#3340][])
- The authentication logic for [create-sibling-github][] has been
  revamped and now supports 2FA.  ([#3180][])
- New configuration option `datalad.ui.progressbar` can be used to
  configure the default backend for progress reporting ("none", for
  example, results in no progress bars being shown).  ([#3396][])
- A new progress backend, available by setting datalad.ui.progressbar
  to "log", replaces progress bars with a log message upon completion
  of an action.  ([#3396][])
- DataLad learned to consult the [NO_COLOR][] environment variable and
  the new `datalad.ui.color` configuration option when deciding to
  color output.  The default value, "auto", retains the current
  behavior of coloring output if attached to a TTY ([#3407][]).
- [clean][] now removes annex transfer directories, which is useful
  for cleaning up failed downloads. ([#3374][])
- [clone][] no longer refuses to clone into a local path that looks
  like a URL, making its behavior consistent with `git clone`.
  ([#3425][])
- [wtf][]
  - Learned to fall back to the `dist` package if `platform.dist`,
    which has been removed in the yet-to-be-release Python 3.8, does
    not exist.  ([#3439][])
  - Gained a `--section` option for limiting the output to specific
    sections and a `--decor` option, which currently knows how to
    format the output as GitHub's `<details>` section.  ([#3440][])

* tag '0.11.5': (96 commits)
  [DATALAD RUNCMD] make update-changelog
  Version boost and finalize CHANGELOG.md record
  ENH: new Makefile rule linkissues-changelog to link issues, which now will also be prerequisite for update-changelog
  CHANGELOG.md: Add entries for recently merged PRs
  ENH: require "distro" for python >= 3.8
  ENH: compat with python 3.8 which removed .dist -- try distro
  CLN: wtf: Remove unused (and duplicated) import
  DOC: wtf: Avoid double period in -S's description
  ENH: -D|--decor html_details -- to make it ready for pasting to github issue without clutter
  BF: assure bytes while giving to pyperclip upon its demand (on Py2)
  RF: move always present path + type "section" into "location" section, retain order of sections from cmdline
  RF: switch from nargs="*" to action=append for wtf -S
  ENH: wtf -S to specify which sections to query/display (by default -- all)
  MNT: Avoid invalid escape sequences in strings
  BF: export_to_figshare: Don't test identity of string literal
  BF(TST): do not assume user naiveness - treat any url-like looking path as a path
  BF: Check for /, \ or # in the username@hostname part while detecting SSHRI
  CHANGELOG.md: Add entry for gh-3374
  BF: revert back (remove) check for path being PathRI
  BF: list annex-transfer also in cmdline opt choice for "what"
  ...

yarikoptic added a commit that referenced this pull request May 28, 2019

Merge tag '0.11.5' into debian -- got two more minor changes
0.11.5 (May 23, 2019) -- stability is not overrated

Should be faster and less buggy, with a few enhancements.

 Fixes

- [create-sibling][]  ([#3318][])
  - Siblings are no longer configured with a post-update hook unless a
    web interface is requested with `--ui`.
  - `git submodule update --init` is no longer called from the
    post-update hook.
  - If `--inherit` is given for a dataset without a superdataset, a
    warning is now given instead of raising an error.
- The internal command runner failed on Python 2 when its `env`
  argument had unicode values.  ([#3332][])
- The safeguard that prevents creating a dataset in a subdirectory
  that already contains tracked files for another repository failed on
  Git versions before 2.14.  For older Git versions, we now warn the
  caller that the safeguard is not active.  ([#3347][])
- A regression introduced in v0.11.1 prevented [save][] from committing
  changes under a subdirectory when the subdirectory was specified as
  a path argument.  ([#3106][])
- A workaround introduced in v0.11.1 made it possible for [save][] to
  do a partial commit with an annex file that has gone below the
  `annex.largefiles` threshold.  The logic of this workaround was
  faulty, leading to files being displayed as typechanged in the index
  following the commit.  ([#3365][])
- The resolve_path() helper confused paths that had a semicolon for
  SSH RIs.  ([#3425][])
- The detection of SSH RIs has been improved.  ([#3425][])

 Enhancements and new features

- The internal command runner was too aggressive in its decision to
  sleep.  ([#3322][])
- The "INFO" label in log messages now retains the default text color
  for the terminal rather than using white, which only worked well for
  terminals with dark backgrounds.  ([#3334][])
- A short flag `-R` is now available for the `--recursion-limit` flag,
  a flag shared by several subcommands.  ([#3340][])
- The authentication logic for [create-sibling-github][] has been
  revamped and now supports 2FA.  ([#3180][])
- New configuration option `datalad.ui.progressbar` can be used to
  configure the default backend for progress reporting ("none", for
  example, results in no progress bars being shown).  ([#3396][])
- A new progress backend, available by setting datalad.ui.progressbar
  to "log", replaces progress bars with a log message upon completion
  of an action.  ([#3396][])
- DataLad learned to consult the [NO_COLOR][] environment variable and
  the new `datalad.ui.color` configuration option when deciding to
  color output.  The default value, "auto", retains the current
  behavior of coloring output if attached to a TTY ([#3407][]).
- [clean][] now removes annex transfer directories, which is useful
  for cleaning up failed downloads. ([#3374][])
- [clone][] no longer refuses to clone into a local path that looks
  like a URL, making its behavior consistent with `git clone`.
  ([#3425][])
- [wtf][]
  - Learned to fall back to the `dist` package if `platform.dist`,
    which has been removed in the yet-to-be-release Python 3.8, does
    not exist.  ([#3439][])
  - Gained a `--section` option for limiting the output to specific
    sections and a `--decor` option, which currently knows how to
    format the output as GitHub's `<details>` section.  ([#3440][])

* tag '0.11.5':
  BF: make test for url download more reliable in cases where connection fails
  RF: remove stale commented out duecredit in setup.py.  It has now its own section
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.