```diff
@@           Coverage Diff           @@
##           master    #4430   +/-   ##
=======================================
  Coverage   88.95%   88.95%
=======================================
  Files         287      287
  Lines       38253    38253
=======================================
  Hits        34029    34029
  Misses       4224     4224
```
The first thing I tried was:

```
datalad create a
echo one >a/one
datalad -C a save
datalad create b
datalad copyfile a/one b/two
tree --charset=ascii a b | colrm 50
```
There might be good implementation or design reasons for not being able to specify a target with a different name via
I only played around with it a bit and did not look into the code, but here are the things that caught my attention:
If specifying a different target name is not possible, maybe warn or report more clearly on accidental attempts to copy identically named files?
After I saw @kyleam's comment, I wondered what would happen if I copied a file into a directory that already contains a file of the same name but with different contents.
It would be cool if the warning could state something along the lines of "File
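For the same-name, different-content case, here is a minimal sketch of a pre-copy check that could drive such a warning. `classify_target()` is a hypothetical helper, not part of DataLad; it only compares bytes with Python's `filecmp` and does not consider whether either file is annexed.

```python
import filecmp
import tempfile
from pathlib import Path


def classify_target(src: Path, dst: Path) -> str:
    """Hypothetical pre-copy check: what would copying src to dst do?

    Returns
      "new"       if dst does not exist yet (plain copy),
      "identical" if dst already holds byte-identical content (nothing to do),
      "conflict"  if dst exists with different content (worth a warning).
    """
    if not dst.exists():
        return "new"
    # shallow=False compares the actual bytes, not just os.stat() signatures
    if filecmp.cmp(src, dst, shallow=False):
        return "identical"
    return "conflict"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src, dst = Path(tmp) / "one", Path(tmp) / "b_one"
        src.write_text("one\n")
        dst.write_text("something else\n")
        print(classify_target(src, dst))
```

Note that a dangling annex symlink at `dst` would count as "new" here, because `Path.exists()` follows links; a real check would have to handle that case separately.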
Copying things into datasets without an annex -- maybe warn?
By chance, the very first thing I did was to copy an annexed file (with retrieved content) into a dataset with no annex (a few HCP subjects' files into the top-level
Would it maybe be good to warn when annexed content is detected being copied into a dataset without an annex? I'm thinking of someone copying a dataset with large data into a plain Git repository (e.g., one for their code) and then being unable to retrieve any data afterwards.
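A minimal sketch of such a guard, with an optional strict mode that fails instead of warning. Both functions are hypothetical, and the annex detection is an assumption for illustration (an initialized git-annex repository has a `.git/annex` directory), not DataLad's own check.

```python
import tempfile
import warnings
from pathlib import Path


def looks_annex_enabled(repo: Path) -> bool:
    # Assumption: an initialized git-annex repository has a .git/annex
    # directory. A heuristic for illustration, not DataLad's own test.
    return (repo / ".git" / "annex").is_dir()


def check_annex_target(src_is_annexed: bool, target_repo: Path,
                       strict: bool = False) -> bool:
    """Hypothetical guard before copying a file into target_repo.

    Returns True if the copy looks problematic (annexed source, no annex in
    the target). With strict=True it raises instead of warning.
    """
    if not src_is_annexed or looks_annex_enabled(target_repo):
        return False
    msg = (f"copying annexed content into {target_repo}, which has no annex; "
           "the file will not be managed by git-annex there")
    if strict:
        raise RuntimeError(msg)
    warnings.warn(msg, UserWarning)
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        plain = Path(tmp) / "code"
        (plain / ".git").mkdir(parents=True)
        check_annex_target(True, plain)  # emits a UserWarning
```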
Also: When I
I was confused that recursive copy did not preserve the directory structure.
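One way to preserve the structure is to derive each destination from the file's path relative to the source directory. The sketch below is hypothetical (`plan_recursive_copy()` is not DataLad code); it also skips anything inside a `.git` directory so that Git internals never end up in the target.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def plan_recursive_copy(src_dir: Path, target_dir: Path) -> list[tuple[Path, Path]]:
    """Hypothetical planner: map each file below src_dir to a destination that
    keeps its position relative to src_dir, i.e.
    src_dir/sub/f -> target_dir/<src_dir.name>/sub/f (like `cp -r`).
    """
    plan = []
    for src in sorted(src_dir.rglob("*")):
        # is_symlink() keeps annexed files whose content is absent
        # (dangling symlinks, for which is_file() is False)
        if not (src.is_file() or src.is_symlink()):
            continue
        rel = src.relative_to(src_dir)
        if ".git" in rel.parts:
            continue  # never copy Git internals
        plan.append((src, target_dir / src_dir.name / rel))
    return plan


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "src"
        (src / "sub").mkdir(parents=True)
        (src / "sub" / "f.txt").write_text("f")
        for s, d in plan_recursive_copy(src, Path(tmp) / "dest"):
            print(s.relative_to(tmp), "->", d.relative_to(tmp))
```

Nesting under `src_dir.name` mirrors `cp -r`. Copying only the directory's contents would simply drop that path component. Which convention the command should follow is a design decision this sketch leaves open.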
(I was also a bit confused that
Something I anticipate to be confusing, but can't see a way around
If I have a dataset with configurations that prevent files from being annexed, copying these files into a new dataset will annex them. This behavior makes sense as long as someone knows about files in the annex versus in Git, but for anyone using the command naively it may be confusing:
I don't think this can be easily prevented, but maybe it's something to keep in mind.
datalad#4430 (comment)

Now it looks like this:

```
datalad create a
echo one >a/one
datalad -C a save
datalad create b
datalad copyfile a/one b/two
tree --charset=ascii a b | colrm 50
a
`-- one -> .git/annex/objects/zZ/KF/MD5E-s4--5bbf
b
`-- two -> .git/annex/objects/zZ/KF/MD5E-s4--5bbf
```
I went another way.
This is not intentional. It should have copied the content, not the symlink. I will investigate.
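A minimal sketch of the intended behavior, assuming the source is a locked annexed file, i.e. a relative symlink into `.git/annex/objects` as in the `tree` output above: reading through the link instead of copying the link itself yields a regular file with the content. `copy_content()` is a hypothetical helper built on Python's `shutil.copyfile`, not DataLad's implementation, and it ignores how the target dataset then decides between annex and Git.

```python
import shutil
import tempfile
from pathlib import Path


def copy_content(src: Path, dst: Path) -> None:
    """Hypothetical helper: copy the bytes behind src, never the symlink.

    A relative symlink into .git/annex/objects dangles once it is placed in
    another dataset; following it produces a regular file instead.
    """
    dst.parent.mkdir(parents=True, exist_ok=True)
    # follow_symlinks=True is the default; spelled out because it is the point.
    # Raises FileNotFoundError if the annexed content is not present locally.
    shutil.copyfile(src, dst, follow_symlinks=True)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        a = Path(tmp) / "a"
        obj = a / ".git" / "annex" / "objects" / "KEY"
        obj.parent.mkdir(parents=True)
        obj.write_text("one\n")
        (a / "one").symlink_to(Path(".git") / "annex" / "objects" / "KEY")
        copy_content(a / "one", Path(tmp) / "b" / "two")
        print((Path(tmp) / "b" / "two").is_symlink())
```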
I had not thought about it this way so far, but I think that is a fair point: if a file was annexed in its original location, it probably was for a reason. Hence it might make sense to fail.
This might hint at the necessity to introduce a
The warning in this case says what I intended: the file should get annexed.
This is fixed now, and the prev behavior was unintentional:
I have to investigate that. I would really really like to avoid having to anticipate/inspect what would happen on
I was hoping to see more people chime in on the interface.
From my perspective, it looks nice, and the simple things I tried behaved as I expected. The one thing I notice from the examples is that common use cases would require repeating the dataset path as both the target directory and the dataset. There is a comment asking whether the target directory, if not given, should be set to the dataset. That would remove the need to repeat it in many cases, but then I guess there's an interaction between options that makes the interface conceptually more complicated. @mih, is that your main concern with making the target directory default to --dataset?
test_copy_file_prevent_dotgit_placement calls copy_file() with .git/config. This path gets passed to get_content_annexinfo(), which returns an empty dict because 'git ls-files -o' reasonably does not produce output for .git/config files [1]. This copy_file() call ends up signaling a KeyError [2] because we unconditionally call popitem() on the return value of get_content_annexinfo(). Guard against calling popitem() on an empty dict.

[1]: Note, though, that there was a longstanding regression where calling 'git ls-files -o' on .git/ files would return output. This was fixed in Git v2.25.0, specifically b9670c1f5e (dir: fix checks on common prefix directory, 2019-12-19).

[2]: https://ci.appveyor.com/project/mih/datalad/builds/32479988/job/6d90edw45gbc03p1#L1080
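The guard boils down to checking for an empty mapping before popping. A minimal sketch, where `pop_single_annexinfo()` is a hypothetical helper and the dict shape (path mapped to a properties dict) is only assumed to resemble what get_content_annexinfo() returns:

```python
from typing import Dict, Optional, Tuple


def pop_single_annexinfo(info: Dict[str, dict]) -> Optional[Tuple[str, dict]]:
    """Hypothetical helper: take the one (path, properties) entry out of an
    annexinfo-style mapping, or return None if nothing was reported.

    dict.popitem() raises KeyError on an empty dict, which is what happens
    for a path such as .git/config that 'git ls-files' does not report.
    """
    if not info:
        return None  # nothing reported for this path; let the caller decide
    return info.popitem()


if __name__ == "__main__":
    print(pop_single_annexinfo({}))                    # None
    print(pop_single_annexinfo({"one": {"size": 4}}))  # ('one', {'size': 4})
```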
…ormer not

This avoids duplication of input argument values (see examples) without changing the behavior of the command. All examples and unit test calls are minified to make use of this new feature.
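A minimal sketch of that fallback, as discussed in the review: an explicit target directory wins, and otherwise the dataset path is used, so it does not have to be given twice. `resolve_target_dir()` and its error for missing arguments are hypothetical, not the command's actual argument handling.

```python
from pathlib import Path
from typing import Optional


def resolve_target_dir(target_dir: Optional[str], dataset: Optional[str]) -> Path:
    """Hypothetical option resolution: prefer an explicit target directory,
    otherwise fall back to the dataset path."""
    if target_dir is not None:
        return Path(target_dir)
    if dataset is not None:
        return Path(dataset)
    raise ValueError("no target directory and no dataset given")


if __name__ == "__main__":
    print(resolve_target_dir(None, "b"))
    print(resolve_target_dir("b/sub", "b"))
```

Keeping this to a single fallback rule is one way to limit the option interaction raised in the review: the dataset argument still selects the dataset, and it only doubles as the target when no target directory is given.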
Drop the comment about setting target_dir to ds.pathobj because 6b227f4 took care of that. Drop part of the comment about guarding against .git/ copying, which was addressed by d2d7411 (BF: Prevent total breakage of target Git by copying Git-internals) and f77d5b9 (BF: Prevent placement of a rogue '.git' via recursion or madness).
I decided to rebase on master in order to strip
rather than adding explicit commits that revert these.
I left the