Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

new__cp_function #10888

Merged
merged 1 commit into from
Apr 25, 2015
Merged

new__cp_function #10888

merged 1 commit into from
Apr 25, 2015

Conversation

peter1000
Copy link

hi @tkelman

This is what I came up with to address issue: #10506

I took also a look at the older conversations at:

.

My own opinion differs from @simonster and I would prefer to remove the recursive kwarg altogehter and just copy directories always recursively or export cptree.

Anyway: I left it as it was: cp requires the recursive=true to be set to copy directories and helper function cptree is not exported.

To have an option to keep symlinks as symlinks I added a kwarg keepsym=false.

  • Python uses: If follow_symlinks is false and src is a symbolic link, a new symbolic
    link will be created instead of copying the file src points to. shutil.copy

On Linux (x86_64) all tests Succeeded

Thanks for your help.

PS:

  • currently files can be copied over existing files
  • but: directories can not be copied over existing directories

I kept it like that.

Question: should we add an option: force to overwrite also directories.
Python seems not to allow such: shutil.copytree
but that does not mean julia can't have it.
If so - should that also apply to files or leave file to be overwritten anyway


# issue #10506
@non_windowsxp_only begin
# Do: absulte and realtive directory links
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

absolute and relative

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

absolute and relative are still spelled wrong in a few of the comments

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry will fix that

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not too big a deal, but may as well fix before merging

@peter1000
Copy link
Author

Updated the PR.

Added NEWS.md entry inclusive the missing one for readlink.

@tkelman
Copy link
Contributor

tkelman commented Apr 19, 2015

I think a force keyword argument would be useful. Though it looks like we don't use a force keyword argument in other similar functions, we've generally been defaulting to force behavior? What happens here if a destination directory already exists?

@peter1000
Copy link
Author

I can add a force keyword: at the moment it raises an error like the one of the current implementation: something like: ERROR: SystemError: mkdir: File exists. Can't say exactley as I'm working on somthing else.

Anyway: I add a force keyword: it does not remove any existing dirs but just write into such.

what I mean: I have dirs:

[@evo TESTING_DELETE_LATER]$ tree test
test
├── juliatest.txt
└── test.txt

test2
└── juliatest.txt
└── NEWS.md
  1. And I cp("test", "test2"; force=true)

The final "test2" will be:

# this file is overwritten with any content of test/juliatest.txt. By sendfile
test2
├── juliatest.txt
├── NEWS.md
└── test.txt
  1. Or do you want to remove "test2" first and just plan recursive copy "test" so that the final "test2" is the same as "test"
test2
├── juliatest.txt
└── test.txt

@tkelman
Copy link
Contributor

tkelman commented Apr 19, 2015

I guess the merging into an existing destination is consistent with what cp -rfT test test2 would do. If you replace mkdir with something like isdir(dst) || mkdir(dst), would there be any error if we just default to force behavior?

@peter1000
Copy link
Author

If you replace mkdir with something like isdir(dst) || mkdir(dst), would there be any error if we just default to force behavior?

I have to check it through but had something like this in mind. but you still want the additional karg force=true or just overwrite per default like Base.sendfile or cp for file does?

@tkelman
Copy link
Contributor

tkelman commented Apr 19, 2015

I think it's easy enough to manually do "safe" no-force behavior via isdir(dst) || cp(src,dst) that defaulting to force behavior without needing a keyword argument could work. Though again, other opinions from everyone else very welcome here.

@peter1000
Copy link
Author

In this case maybe it would be a good point to add your suggestion to the docs.
I will change the default behavour to overwrite without a keyword.

Copy the file, link or directory from *src* to *dest*.
"recursive=true" is required to copy directories or links to directories.
Passing "keepsym=true" will enable copying src-symlinks as dest-symlinks.

Note: this function overwrites existing files, links or directories.
For a "safe" no-force behavior do `ispath(dst) || cp(src,dst)`

@peter1000
Copy link
Author

I nearly finished implementing the overwrite always but a number of question arise and I'm not sure if it is worthy to do it.

@tkelman If you replace mkdir with something like isdir(dst) || mkdir(dst), would there be any error if we just default to force behavior?

This is not enough for a number of reasons: e.g. if the src is a file and overwrites a path with the same name which is a symlink I had to do more because Fs.sendfile does not do this. So I needed to first remove the symlink etc.

The problem is not so much of implementing it (which I nearly finished) but of defining which result one wants in combination with keepsym especially for directories and links to directories.

EXAMPLE: IS CLEAR

cp(srcdir/mydir (being a **link to a dir**), dstdir/mydir (being a dir); recursive=true, **keepsym=true**)  
  • existing path in dstdir get overwritten with whatever scrdir has but if dstdir/mydir contains other items these are kept as is
    RESULT: dstdir/mydir (being a link to a dir)

EXAMPLE: NOT SO CLEAR

cp(srcdir/mydir (being a **link to a dir**), dstdir/mydir (being a dir); recursive=true, **keepsym=false**)
  • existing path in dstdir get overwritten with whatever scrdir has. (links are replaced with where the links points to)
    RESULT: dstdir/mydir (being a dir with the content where the srcdir/mydir link points to.)

    BUT: in such case any other items dstdir/mydir had are not kept as is - instead they are gone.

    Or are we supposed to: make first a tmp copy of dstdir/mydir resolving all links and
    after that overwrite any existing items with the content of srcdir/mydir

There are more combinations if the dstdir/mydir is a link or srcdir/mydir and dstdir/mydir are links.


To avoid confusion or the need to write a long doc just to explain exceptions I would propose 2 options for now:

1.

Because this addresses a current bug let's fix first the problem and merge it. I still will update the inconsistency with sendfile if the destination is a link to a file.

2.

Use overwrite as default but make an easy definition which does not have many exceptions to explain: existing destination path are always first removed.

example:

test
├── juliatest.txt
└── test.txt

test2
└── juliatest.txt
└── NEWS.md
  1. And I cp("test", "test2")

The final "test2" will be: the same as the src test

test2
├── juliatest.txt
└── test.txt

@tkelman
Copy link
Contributor

tkelman commented Apr 20, 2015

Yeah this is a bit messy. I'd say try not to make things too complicated to preserve merging semantics in all combinations of links and keepsym. Since this is a bit ambiguous, maybe we go the python route, error if dst already exists and force the user to decide what to do about it?

I'd like if it weren't just you and me making these decisions though.

@peter1000
Copy link
Author

I will update the PR now. I think the clearest and internally most consistend implementation would be like this.

cp(src, dst; recursive=true, rm=false, sym=false)
Copy the file, link or directory from *src* to *dest*.
"recursive=true" is required to copy directories or links to directories.

`rm=true` will first remove aan existing `dst` before coping `src`
If `sym=true` symbolic links in the `src` are represented as symbolic links in the `dst`.
If `sym=false` the linked files or directories are copied to the new `dst`.

Having rm will make the interface consistend for files and directories. Currently cp overwrites files but does raises an error if an directory exists.

Default cp(src, dst) should be enough for most situations

  • does not overwritting anything
  • works the same for files, and directories

@tkelman tkelman added the needs decision A decision on this change is needed label Apr 20, 2015
@tkelman
Copy link
Contributor

tkelman commented Apr 20, 2015

We could bikeshed the keyword argument names a little (except recursive since that one has precedent elsewhere), but I think I like the sound of that plan.

@peter1000
Copy link
Author

Updated PR..

@peter1000
Copy link
Author

Any progress on the merge for this?

@tkelman
Copy link
Contributor

tkelman commented Apr 23, 2015

I'd like other people to comment on the names (bikeshed please) of the rmdst and sym keyword arguments. A quick survey of names other languages use for similar functionality would be useful.

@peter1000
Copy link
Author

hi tkelman,

A quick survey of names other languages use for similar functionality would be useful.

Most of the others use multiple functions like copy_file, copy_tree ect.. and musch longer descriptive kwargs.

python shutil.copy

instead of my sym they use: follow_symlinks

shutil.copy (src, dst, *, follow_symlinks=True)

Copies the file src to the file or directory dst. src and dst should be strings. If dst specifies a directory, the file will be copied into dst using the base filename from src. Returns the path to the newly created file.

If follow_symlinks is false, and src is a symbolic link, dst will be created as a symbolic link.
If follow_symlinks is true and src is a symbolic link, dst will be a copy of the file src refers to.

ruby copy_entry

uses instead of my rmdst ruby: remove_destination otherwise quite similar to my default.

copy_entry(src, dest, preserve = false, dereference_root = false, 
                  remove_destination = false)

Copies a file system entry src to dest. If src is a directory, this method copies its contents recursively. This method preserves file types, c.f. symlink, directory… (FIFO, device files and etc. are not supported yet)

Both of src and dest must be a path name. src must exist, dest must not exist.

If preserve is true, this method preserves owner, group, and modified time. Permissions are copied regardless preserve.
If dereference_root is true, this method dereference tree root.

If remove_destination is true, this method removes each destination file before copy.

@pao
Copy link
Member

pao commented Apr 23, 2015

I'm in favor of longer, descriptively-named kwargs.

@peter1000
Copy link
Author

@pao I'm in favor of longer, descriptively-named kwargs

remove_destination and follow_symlinks taken from ruby and python would suite you?

what would you suggest?

@simonster
Copy link
Member

cp calls these options --remove-destination and --dereference, but I think follow_symlinks is clearer than the latter. So +1 to remove_destination and follow_symlinks.

@peter1000
Copy link
Author

@tkelman should I go ahead and rename them to rmdst => remove_destination and sym => follow_symlinks and update the docs?

And like python default follow_symlinks=true

@tkelman
Copy link
Contributor

tkelman commented Apr 23, 2015

remove_destination sounds good given the coreutils and ruby precedent. I'm not sure I like follow_symlinks as much (and it looks like the logic direction would need to be reversed for that vs your current sym?), but I tend to dislike most things from Python, I'm weird like that. @StefanKarpinski has said before that "we tend to follow Python's lead" for filesystem functions, except I guess where our recursive keyword argument works differently than having separate functions in Python...

@peter1000
Copy link
Author

So, I will change both and the docs and update the PR.

except I guess where our recursive keyword argument works differently than having separate functions in Python.

Would you prefer to export the helperfunction: cptree like python's ?

@pao
Copy link
Member

pao commented Apr 23, 2015

what would you suggest

Those would be fine. I'm not too picky, though (which is why I didn't offer a concrete recommendation).

@tkelman
Copy link
Contributor

tkelman commented Apr 23, 2015

Would you prefer to export the helperfunction: cptree like python's ?

I don't like the asymmetry that would introduce with rm. Maybe if our recursive rm were a separate rmtree function that would make sense (does Python do rm that way? edit: yes https://docs.python.org/2/library/shutil.html), but it isn't.

@peter1000
Copy link
Author

Ok - this could be simple exported at a later time if wanted. Send you later the update - Thanks for your help.

@peter1000
Copy link
Author

Seems this went well: travis-ci: i686 failur looks like something else?


* Added function `readlink` which returns the value of a symbolic link "path" ([#10714]).

* The `cp` function now accepts keyword arguments `recursive`, `remove_destination` and `follow_symlinks` ([#10506]).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small nitpick, I think this should refer to the PR in which it was implemented (here) rather than the issue it closes?

@pao
Copy link
Member

pao commented Apr 24, 2015

and symlinks instead of follow_symlinks?

The lack of a verb is a problem, though--since this is a file copy operation, that option sounds like "copy symlinks" vs. "don't copy symlinks", which is not what this option controls.

@StefanKarpinski
Copy link
Sponsor Member

The "follow" verb doesn't actually make it much clearer to me. What does that option control?

@pao
Copy link
Member

pao commented Apr 24, 2015

Whether copy...follows symlinks? Which is what it does. I thought that was the common nomenclature for moving through a symlink to where it points--I guess you could call it "dereference" by analogy to pointers but I don't know that I've seen anyone call it that.

UPDATE: the cp help uses both on the same line...

-L, --dereference            always follow symbolic links

UPDATE 2: and --remove-destination is the relevant flag in cp for the other option. There is no short flag.

@StefanKarpinski
Copy link
Sponsor Member

So it's the difference between copying the content of the symlink versus copying the symlink as though it actually were the thing it points to?

@StefanKarpinski
Copy link
Sponsor Member

(I'm sort of playing dumb here because explaining what things do often reveals good names.)

@pao
Copy link
Member

pao commented Apr 24, 2015

No problem (please note I've updated the above comment with excerpts from cp --help).

Yes; taking many liberties with the analogy and in pseudo-C (not bothering with dirents, etc.):

FileContents* ptrFile; // is a symlink

// Results in a new pointer, but only one copy of the file
FileContents* newPtrFile = ptrFile; // cp --no-dereference; cp(..., follow_symlinks=false);

// Results in a new copy of the file
newPtrFile = malloc(fileSize);
memcpy(newPtrFile, ptrFile, fileSize); //cp --dereference; cp(..., follow_symlinks=true);

@peter1000
Copy link
Author

Eventhough I prefered the short initial version rmdst (remove destination first) and sym (keep symbolic links or don't follow them) the current long verbose names remove_destination and follow_symlinks do describe the functionality rather well and are established names in ruby / python for similar functionalities.

My personal preference would be:

  • I do not see much merit in having the resursive kwarg. If one calls copy on a directory source usually one wants to copy it - if one only wants to copy files one can check that easily before hand without an extra kwarg and just don't call cp on directories.

Other languages approach like python to offer 2 different function is also ok: so helper function cptree could be exported if wanted.

It is true that: rm has a similar kwarg but personally I thing both of them are not required in general usage.

With the above discussion of more describtive kwargs this would be my prefered proposal.

cp(src, dst; remove_dst=true, follow_sym=false)
cp(src, dst; remove_dst=true, follow_sym=false)

   Copy the file, link, or directory from *src* to *dest*.
   \"remove_dst=true\" will first remove an existing `dst`.

   If `follow_sym=false`, and src is a symbolic link, dst will be created as a symbolic link.
   If `follow_symlinks=true` and src is a symbolic link, dst will be a copy of the file or directory
   `src` refers to.

@simonster
Copy link
Member

I feel pretty strongly that rm should not be recursive by default. Most of the time you do not actually need recursive rm, and disabling it unless you explicitly ask makes it harder for a typo to erase all your data. For cp, I feel that remove_destination (or whatever we're going to call it) should default to false for the same reason, although recursive cp does not seem as harmful.

@StefanKarpinski
Copy link
Sponsor Member

I'm happier about these somewhat shorter versions. An important difference between recursive copy and recursive remove is that in the remove case, it's pretty bad if you meant to call it on a single file and accidentally call it on a directory. In the case of copy, it's not so bad since the original will be left intact.

@pao
Copy link
Member

pao commented Apr 24, 2015

I suppose we'll just have to agree to disagree--the rarer something is used or encountered, the more descriptive its name should be. kwargs are the ultimate in "rarely used" since the whole point is to have a sane default. Shortening "destination" to "dst" and "symlink" to "sym"(bol? oh crap, I'm a Windows user, lemme look that up on Google...SanYang Motor Company?) just seems like clipping characters for the sake of it.

@peter1000
Copy link
Author

@pao I suppose we'll just have to agree to disagree--the rarer something is used or encountered, the more descriptive its name should be. kwargs are the ultimate in "rarely used" since the whole point is to have a sane default.

@simonster For cp, I feel that remove_destination (or whatever we're going to call it) should default to false for the same reason, although recursive cp does not seem as harmful.

@StefanKarpinski I'm happier about these somewhat shorter versions.

As this is first of all a fix for issue: #10506 we need to come to some conclusion.
I think the most common ground would be to keep the current implementation with descriptive sane default kwargs which are also used in other respected languages. (I'm quite sure favours of shorter kwargs like myself and StefanKarpinski will be able to embrace such in the end with the set defaults.

cp(src::AbstractString, dst::AbstractString; recursive::Bool=true, remove_destination::Bool=false, follow_symlinks::Bool=true)

   Copy the file, link, or directory from *src* to *dest*.
   \"recursive=true\" is required to copy directories or links to directories.
   \"remove_destination=true\" will first remove an existing `dst`.

   If `follow_symlinks=false`, and src is a symbolic link, dst will be created as a symbolic link.
   If `follow_symlinks=true` and src is a symbolic link, dst will be a copy of the file or directory
   `src` refers to.

I only use Unix and in such case I personally would prefer a default of: follow_symlinks=false and keep symlinks as symlinks when copying. But until there is a clear agreement I think we should leave it as it is and fix the bug.

Cheers
P

@StefanKarpinski
Copy link
Sponsor Member

I think I support removing the recursive option and just making it always recursive. The default should be not to delete or overwrite anything, however.

throw(ArgumentError(string("'$src' is a directory. `recursive=true` is required to ",
"copy directories or links to directories.")))
elseif ispath(dst)
remove_destination ?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would also make more sense with an if-else I think.

@tkelman
Copy link
Contributor

tkelman commented Apr 24, 2015

I think I support removing the recursive option and just making it always recursive.

For cp I'd be okay with this option. Better than having separate functions for files vs directories I think.

@StefanKarpinski
Copy link
Sponsor Member

I just can't think of a reason not to just do recursive all the time. As @peter1000 said, if you want to make sure you're copying a file, just check that it's a file first. Now mv is a harder argument to make since it doesn't leave the original file/directory intact.

@StefanKarpinski
Copy link
Sponsor Member

Actually, no, that's totally wrong – mv is easy to and does already default to "recursive" since you just rename the damned thing. The only one that's different is rm.

@peter1000
Copy link
Author

@StefanKarpinski I think I support removing the recursive option and just making it always recursive. The default should be not to delete or overwrite anything, however.
@tkelman For cp I'd be okay with this option.

I'm more than supportive for this.

What is your opinions about default: follow_symlinks=false and keep symlinks as symlinks when copying.

@StefanKarpinski
Copy link
Sponsor Member

Another API option would be using symbol values, e.g.

cp(src, dst, symlinks=:follow, destination=:replace)
cp(src, dst, destination=:merge)
cp(src, dst, destination=:noreplace, symlinks=:nofollow)

I'm not sure that's better, but I thought I'd throw it out there.

@peter1000
Copy link
Author

cp(src, dst, destination=:merge) what would this mean?
and what would it mean in cases where symlinks are involved. #10888 (comment)

And what about other combination: cp(src, dst, symlinks=:nofollow, destination=:replace)

I think in the end the approach with the kwargs might be prefered one.

  • What is your opinions about default: follow_symlinks=false and keep symlinks as symlinks when copying.

@StefanKarpinski
Copy link
Sponsor Member

Just a hypothetical third option besides replace and don't replace: it would merge the two directory structures. Of course, it's not a great example, since there's a further two options which is whether to favor versions in the source or destination.

@peter1000
Copy link
Author

Don't you think instead of packing all of it into cp one could have a general quite useful function merge directories whatever the name might be.

@peter1000
Copy link
Author

Based on the last common discussion points I updated the PR to:

@pao: I'm in favor of longer, descriptively-named kwargs

@simonster: For cp, I feel that remove_destination (or whatever we're going to call it) should default to false for the same reason

@StefanKarpinski: I think I support removing the recursive option and just making it always recursive. The default should be not to delete or overwrite anything, however.

@tkelman For cp I'd be okay with this option.

@peter1000 would prefer a default of: follow_symlinks=false src links will be created as a symbolic link in dst

cp(src::AbstractString, dst::AbstractString; remove_destination::Bool=false, follow_symlinks::Bool=false)

   Copy the file, link, or directory from *src* to *dest*.
   "remove_destination=true" will first remove an existing `dst`.

   If `follow_symlinks=false`, and src is a symbolic link, dst will be created as a symbolic link.
   If `follow_symlinks=true` and src is a symbolic link, dst will be a copy of the file or directory
   `src` refers to.

It seems to me that this is the best compromise of the 5 opinions involved in the discussion.

@tkelman
Copy link
Contributor

tkelman commented Apr 24, 2015

Glad this got some input from a few other people. Assuming it passes CI I'm in favor of merging peter1000@514be00

@StefanKarpinski
Copy link
Sponsor Member

Yes, this seems more than sufficiently bikeshedded. Thanks for your perseverance, @peter1000!

StefanKarpinski added a commit that referenced this pull request Apr 25, 2015
@StefanKarpinski StefanKarpinski merged commit 0c6287c into JuliaLang:master Apr 25, 2015
@peter1000 peter1000 deleted the new__cp_function branch April 25, 2015 13:37
peter1000 pushed a commit to peter1000/Lexicon.jl that referenced this pull request Apr 25, 2015
my new implementation of the `cp` function got merged
JuliaLang/julia#10888

sets `remove_destination=true` otherwise if you rerun you get an:

```
ERROR: ArgumentError: 'static/custom.css' exists.
`remove_destination=true` is required to remove
'static/custom.css' before copying.
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs decision A decision on this change is needed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants