Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FileCopier: avoid repeated copies of absolute paths. #5792

Merged
merged 1 commit into from Oct 6, 2019

Conversation

@mplatings
Copy link
Contributor

mplatings commented Sep 22, 2019

Changelog: Bugfix: Avoid repeated copies of absolute paths when using self.copy().
Docs: Omit

  • Refer to the issue that supports this Pull Request.
    No. I tried creating an issue but the "Submit new issue" button stayed greyed-out. However, hopefully this change is minor enough that creating an issue for it isn't necessary.
  • If the issue has missing info, explain the purpose/use case/pain/need that covers this Pull Request.
    My team have occasionally written conanfiles that do this:
    def imports(self): self.copy('foo', src=os.path.join(self.deps_cpp_info['bar'].rootpath, 'baz'))
    What isn't obvious to developers is that self.copy() will iterate over all dependencies and repeat the copy for each one. This is suboptimal. We've ended up using functions from distutils but we'd prefer to be able to use self.copy().
  • I've read the Contributing guide.
  • I've followed the PEP8 style guides for Python code.
  • I've opened another PR in the Conan docs repo to the develop branch, documenting this one.
    No. This change shouldn't be observable other than speeding things up in some cases.
@CLAassistant

This comment has been minimized.

Copy link

CLAassistant commented Sep 22, 2019

CLA assistant check
All committers have signed the CLA.

@memsharded

This comment has been minimized.

Copy link
Member

memsharded commented Oct 1, 2019

Hi @mplatings

I am trying to understand the use case, before checking the code. It seems that the case for

def imports(self): 
     self.copy('foo', src=os.path.join(self.deps_cpp_info['bar'].rootpath, 'baz'))

Is what can be achieved with:

def imports(self): 
     self.copy('foo', root_package="bar")

That will restrict the copy to the bar package, not all. Could you please clarify your use case? thanks!

@mplatings

This comment has been minimized.

Copy link
Contributor Author

mplatings commented Oct 2, 2019

Hi James, thanks for taking the time to look this over. I didn't spot the root_package option, looks very handy.

The example that comes up most frequently is:

def package(self):
    self.copy('include/*', src=self.source_folder)

Currently, this will copy everything in the include directory twice.

@memsharded

This comment has been minimized.

Copy link
Member

memsharded commented Oct 2, 2019

Hi @mplatings

I am not sure it is copying those files twice. The code is:

folders = [source_folder, build_folder] if source_folder != build_folder else [build_folder]
conanfile.copy = FileCopier(folders, package_folder)

So it will only execute the pattern twice if the source folder is different to the build folder, which is what happens if no_copy_source is True, and we need to package headers both from the "source" folder, but potentially headers that have been generated in the "build" folder as a result of the build.

Could you please double-check this duplicated copy, and provide some way to reproduce it? I cannot see it so far. Thanks!

@mplatings

This comment has been minimized.

Copy link
Contributor Author

mplatings commented Oct 2, 2019

Sorry and thank you for your patience.

def package(self):
    self.copy('include/*', src=os.path.join(self.source_folder, 'foo'))
@memsharded

This comment has been minimized.

Copy link
Member

memsharded commented Oct 2, 2019

No problem glad to help if possible :)

I have just debugged a package creation with that package() method, and still the copy is called just once, not duplicated. What makes you think that with such method things are being copied twice?

@mplatings

This comment has been minimized.

Copy link
Contributor Author

mplatings commented Oct 2, 2019

If build folder and source folder are the same then the double-copy won't occur. But if you do out-of-source builds (like my team does) then files are copied twice. Hopefully this reproducer makes it clear:

$ ls foo
thing.h
$ cat conanfile.py
import os
from conans import ConanFile
class MichaelConan(ConanFile):
    def package(self):
        print(self.copy('*', src=os.path.join(self.source_folder, 'foo')))
$ mkdir build && cd build
$ conan install ..
...
$ conan package ..
...
['.../thing.h', '.../thing.h']
...

The print statement shows that thing.h is in the list of copied files twice.

@memsharded

This comment has been minimized.

Copy link
Member

memsharded commented Oct 3, 2019

I see.

The mechanism for package() method and self.copy() is almost the same in the user space (conan install + conan package) than in the cache. The copies are iterated over 2 folders, the source folder (typically to be able to gather headers and other files), and the lib folder (to collect libraries, binaries, etc). In the local case, the FileCopier is being provided with such 2 folders:

source-folder =
build-folder = /build

(note that this is not purely an out-of source, as the build folder is inside the source)

The FileCopier has mechanisms to avoid iterating twice the same folder. But only as long as the src argument is relative, not absolute, when it is specified as an absolute path, Conan uses it always. It is better to be slow copying twice than filtering out a folder, and it is not possible to filter 100% correctly without breaking other users.

But as long as you use the relative path, it will copy the thing.h file just once. I have tried with:

import os
from conans import ConanFile
class MichaelConan(ConanFile):
    def package(self):
        print(self.copy('*', src='foo'))

And it works fine, it prints only 1 occurrence.

@mplatings

This comment has been minimized.

Copy link
Contributor Author

mplatings commented Oct 3, 2019

I was going to ask what the right answer is when you want to be certain only to copy files from the source directory and not the build directory (or vice versa). But I see that when running conan create those two folders are the same. So any conanfile that relies on the two folders being separate is broken.
For example, a conanfile that does self.copy('*', src=self.build_folder) is going to behave quite differently when running conan export-pkg from a build folder compared to conan create.
Maybe a warning in the docs to say that using absolute paths will lead to divergent behaviour would help others? Or even have Conan emit a warning if an absolute path is passed as src?
Thanks again for your help. I'm off to remove all the code that uses absolute paths with self.copy...

@mplatings mplatings closed this Oct 3, 2019
@mplatings

This comment has been minimized.

Copy link
Contributor Author

mplatings commented Oct 3, 2019

Actually, if no_copy_source=True is set in the conanfile then conan create keeps the source and build folders separate, and self.copy('*', src=self.build_folder) doesn't copy anything from the source folder.
So, what's the right answer for this scenario please?:

$ ls foo
thing.h
$ cat conanfile.py
import os
from conans import ConanFile
class MichaelConan(ConanFile):
    no_copy_source=True
    exports_sources = "foo*"
    def build(self):
        os.makedirs('foo', exist_ok=True)
        with open(os.path.join(self.build_folder, 'foo/top-secret-do-not-distribute.h'), 'w') as f:
            f.write("Area 51 entry code: 1234")
    def package(self):
        print(self.copy('*', src=os.path.join(self.source_folder, 'foo')))
$ conan create . test/0.1@
@mplatings mplatings reopened this Oct 3, 2019
@memsharded

This comment has been minimized.

Copy link
Member

memsharded commented Oct 4, 2019

Hi @mplatings

Of course there is always the possibility to not use self.copy(), it is just a helper, it doesn't do that much, so for special cases, might be the only solution.

But I think for your use case, maybe the excludes argument could help.
I have tried:

import os
from conans import ConanFile
from conans.tools import save

class MichaelConan(ConanFile):
    no_copy_source=True
    exports_sources = "foo*"

    def build(self):
        save(os.path.join(self.build_folder, 'foo/top-secret-do-not-distribute.h'),
            "Area 51 entry code: 1234")

    def package(self):
        print(self.copy('*', src='foo', excludes="*secret*"))

Seems to do the right thing in the different flows:

  • conan create
  • conan install + conan build + conan package
  • conan install + conan build + conan export-pkg

Please let me know if this helps.

@mplatings

This comment has been minimized.

Copy link
Contributor Author

mplatings commented Oct 4, 2019

That works, thanks. I still think there's value in the pull request - I can't see how it would break anything, but could speed things up in some cases. But if you disagree I won't object to closing the pull request.
Thanks again for taking the time :)

@memsharded

This comment has been minimized.

Copy link
Member

memsharded commented Oct 4, 2019

That works, thanks. I still think there's value in the pull request - I can't see how it would break anything, but could speed things up in some cases. But if you disagree I won't object to closing the pull request.
Thanks again for taking the time :)

You are totally right :) I think this can be merged, it might provide some value for some users (even if they could avoid it with other mechanisms), it is correct from the implementation point of view and it seems very low risk.

Sorry for all the lengthy discussion, but it is very important for us to understand the use cases, problems and approaches before merging code, because sometimes these changes are hiding issues that should be addressed elsewhere.

The only problem is that it seems that the test has been moved from that functional/old folder to the unittest/client folder, if you could please put the test there and update the PR, I will merge it for next release. Thanks very much for all your help!

@mplatings mplatings force-pushed the mplatings:develop branch from 012a440 to af96215 Oct 6, 2019
@mplatings

This comment has been minimized.

Copy link
Contributor Author

mplatings commented Oct 6, 2019

please put the test there and update the PR

Done. Thanks!

@memsharded memsharded added this to the 1.20 milestone Oct 6, 2019
@memsharded memsharded merged commit 5352a4e into conan-io:develop Oct 6, 2019
2 checks passed
2 checks passed
continuous-integration/jenkins/pr-head This commit looks good
Details
license/cla Contributor License Agreement is signed.
Details
@memsharded

This comment has been minimized.

Copy link
Member

memsharded commented Oct 6, 2019

Merged! Will be in 1.20, thanks for the feedback and the contribution!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.