-
Notifications
You must be signed in to change notification settings - Fork 110
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BF+RF: unlock: Rewrite using status() #3459
Conversation
kyleam
commented
May 25, 2019
If the default renderer _were_ to run with the status results, it would fail with relpath() argument must be str or bytes, not 'PosixPath' [genericpath.py:_check_arg_types:143] (TypeError) This codepath isn't accessible when calling the status() command because status() uses a tailored renderer as its custom renderer (see dataladgh-2481). However, if another command that uses the default renderer runs status() with on_failure="ignore" and then yields the result, as unlock() will do in the next commit, then we will hit the above failure.
I was on the fence about this part. Feedback welcome. |
Demo of main failure#!/bin/sh
set -eu
cd $(mktemp -dt dl-XXXXXXX)
datalad create
echo a >a
echo b >b
datalad save
datalad drop --nocheck a
datalad unlock Before:
After:
|
565ae46
to
a7551eb
Compare
Codecov Report
@@ Coverage Diff @@
## master #3459 +/- ##
==========================================
- Coverage 91.27% 90.99% -0.28%
==========================================
Files 272 272
Lines 34834 34855 +21
==========================================
- Hits 31796 31718 -78
- Misses 3038 3137 +99
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #3459 +/- ##
==========================================
- Coverage 91.27% 91.27% -0.01%
==========================================
Files 272 272
Lines 34834 34865 +31
==========================================
+ Hits 31796 31822 +26
- Misses 3038 3043 +5
Continue to review full report at Codecov.
|
c15383a
to
190c64c
Compare
If a file without content is passed to unlock(), it yields a result (with a status of "impossible") saying that the file doesn't have content. But if the files without content weren't explicitly provided as paths (e.g., no path was given or a directory containing the files was given as the path), then files without content lead to a CommandError from the underlying 'git annex unlock' call. (Or in v6+, they lead to the pointer being unlocked which is inconsistent with the error given when a file is explicitly specified in v6+ and also potentially confusing to callers.) To fix the "no paths specified, missing content" behavior, rely on the information provided by status() to restrict the unlock calls to files with content. This fix raises the question of how to report files without content when the path isn't explicitly given. Because 'datalad unlock' at the dataset or a directory level could potentially lead to a lot of results, avoid reporting "can't unlock" results when explicit paths aren't given. Doing this at the __call__ level rather than as a result filter means we can do the faster status(..., untracked=no") query unless explicit, non-directory paths are given. Aside from the fix above, this tries to match the behavior of the previous unlock(). The main behavioral change, shown by the test updates, is that unlock() no longer infers the dataset from a path when the current working directory is outside a dataset and no dataset is explicitly given. This behavioral change is consistent with, for example, distribution.add vs core.local.save. This nudges annotate_paths() closer to retirement (dataladgh-3368).
190c64c
to
2080df1
Compare
First of all: Awesome! Thx for approaching this! Quick note (primarily to myself, as I could not look at this in detail yet): Make sure that we make use of the Repo.diffstatus() caching ability for ls-files/tree calls, in order to avoid possibly duplicate calls. |
# Do the actual unlocking. | ||
for ds_path, files in to_unlock.items(): | ||
ds = Dataset(ds_path) | ||
for r in ds.repo.unlock(files): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have to recode file names to be relative to the repository. Otherwise, repos in symlinked paths will not work.
f.relative_to(ds.pathobj) for f in to_unlock...
if f
s are Path
instances.
If anyhow possible, it would be great to also incorporate AnnexRepo.unlock()
into this RF:
- shed
normalize_paths
deocrator -- which prevents actual generator-like behavior - call
git annex unlock
with--json --json-error-messages
and relay the full results, rather than manual parsing - strip
options
argument -- there are no non-generic options supported bygit annex unlock
, our API should reflect that, and not makeunlock
-specific exceptions. - strip
AnnexRepo.lock()
from the codebase -- unused, untested, no high-level API companion (a simplesave
does the job)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have to recode file names to be relative to the repository. Otherwise, repos in symlinked paths will not work.
Quick testing with a repo that had a symlink in the leading path, things seem to work, but I'm may not be testing the right case. I know you've dealt with this issue a good amount and I have no reason not to use a relative path, so I'll go with your suggestion.
If anyhow possible, it would be great to also incorporate
AnnexRepo.unlock()
into this RF [...]
Hmm looking into that hadn't occurred to me. Will do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Started to look into AnnexRepo.unlock()...
- shed
normalize_paths
deocrator -- which prevents actual generator-like behavior
I'm for dropping normalize_paths
(here and in general), but isn't any useful generator-like behavior also prevented by git annex unlock
not supporting --batch
? So, assuming we use _run_annex_command_json(), we get a list of json objects, and it doesn't seem particularly useful to, say, yield those in turn rather than just return the list.
- call
git annex unlock
with--json --json-error-messages
and relay the full results, rather than manual parsing
I agree we should use --json. --json-error-messages sounds like it'd be useful, but I'm not sure it is in its current form:
#!/bin/sh
set -eu
cd $(mktemp -dt gx-XXXXXXX)
git init
git annex init
git annex unlock --json --json-error-messages notthere
Initialized empty Git repository in /tmp/gx-rHLSVpQ/.git/
init ok
(recording state in git...)
git-annex: notthere not found
git-annex: unlock: 1 failed
- strip
options
argument -- there are no non-generic options supported bygit annex unlock
, our API should reflect that, and not makeunlock
-specific exceptions.
Sounds fine to me.
- strip
AnnexRepo.lock()
from the codebase -- unused, untested, no high-level API companion (a simplesave
does the job)
Also OK with me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I've pushed updates to AnnexRepo.lock(). It only partially addresses your suggestions.
It
- drops lock()
- removes unlock's
options
parameter. - uses
--json
(via_run_annex_command_json
).
It does not
- strip the
normalize_paths
decorator
I think it'd make more sense to do wholesale removal of it from methods rather than incrementally dropping it from existing methods. - bother using --json-error-messages for the reasons I noted above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FTR: I disagree with your conclusion re not stripping normalize_paths
. It silently changes the shape of the path going in an out. Personally I think there should be a single pattern: high-level code hands over path to a repo method in relative form, or absolute pointing inside ds.repo.path
. Repo method must never try to be clever and guess how a caller might want the results and report paths in full and leave any recoding to the top-level code.
That being said, not removing it doesn't make it worse than it is, and the PR is still a massive net improvement.
@mih Thanks for your feedback! |
For better handling of symlinks in the leading path. See <datalad#3459 (comment)>.
If the file is already unlocked, it is more appropriate to tell the caller to say that it is not needed rather than impossible. 2080df1 used impossible under the incorrect impression that this is what the old unlock() did.
This isn't used or tested, there's no high-level lock dataset method, and save() can be used to keep the modifications and switch back to a locked state, so let's drop this until it's clear there's a need for it. Re: datalad#3459 (comment)
It's not currently used. Let's drop it until it's clear we need it. Re: datalad#3459 (comment)
Use the json output rather than manually parsing the plain text output. This makes the handling less brittle, but it doesn't improve the efficiency (unlock doesn't support --batch) or the lack of error information (--json-error-messages doesn't doesn't result in an unlock error being represented in as a json record). Re: datalad#3459 (comment)
Cool! Thx much! |