New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Package runfiles along with C++ binaries in pkg_* rules #4383

Open
sharvil opened this Issue Jan 3, 2018 · 16 comments

Comments

Projects
None yet
7 participants
@sharvil
Copy link

sharvil commented Jan 3, 2018

Description of the problem / feature request / question:

(https://groups.google.com/forum/#!topic/bazel-discuss/5r_Ajw_j-ZI for context)

The current pkg_* rules don't package runfiles along with C++ binaries. This behavior makes it difficult to deploy an entire C++ application to machines that don't have source access.

Can we update the packaging rules to include runfiles along with binaries?

Environment info

  • Operating System:
    MacOS 10.13.2

  • Bazel version (output of bazel info release):
    release 0.8.1-homebrew

@hlopko

This comment has been minimized.

Copy link
Contributor

hlopko commented Jan 4, 2018

cc @lberki @ventrescadeatun
Lukacs do you know if this is by design? It looks broken to me. To me pkg_tar should tar everything that's needed to execute the cc_binary, am I wrong? :)

@lberki

This comment has been minimized.

Copy link
Contributor

lberki commented Jan 8, 2018

Yep, this is by design because the runfiles are not always necessary.

We internally have a pkg_runfiles rule that packages the runfiles of a binary, which should simply be open sourced.

@kgreenek

This comment has been minimized.

Copy link

kgreenek commented Jan 9, 2018

Major +1 for this

This also is the same for python. At area17, we want to use this for deployment, but it is difficult currently because there isn't a good way to package everything we need together (for both py and cc).

For example, for our python targets, we often depend on pip targets (which are installed within bazel via the pip_import rule). Those pip imports aren't packaged.

This would be a great feature!

@hlopko

This comment has been minimized.

Copy link
Contributor

hlopko commented Feb 2, 2018

@laszlocsomor Is this something your work on runfiles will make simple(r)? :)

@laszlocsomor

This comment has been minimized.

Copy link
Contributor

laszlocsomor commented Feb 2, 2018

@mhlopko : The runfiles library will make it very easy to use look up runfiles, once they are available one way or another. Packaging the runfiles is a separate issue; my work is unrelated to that.

@hlopko

This comment has been minimized.

Copy link
Contributor

hlopko commented Feb 2, 2018

Ok, thanks for the info!

@sharvil

This comment has been minimized.

Copy link
Author

sharvil commented Feb 4, 2018

Thanks for looking into this, @mhlopko and @lberki.

Does pkg_runfiles include the associated binary as well? If not, this feature request is for a slightly different behavior: to provide a packaging rule that produces a self-contained package of a binary and its runfiles.

@hlopko

This comment has been minimized.

Copy link
Contributor

hlopko commented Mar 8, 2018

Sorry for the silence. I just looked into the code and it looks like you should be able to write a skylark rule that will provide this behavior. https://docs.bazel.build/versions/master/skylark/lib/runfiles.html and https://docs.bazel.build/versions/master/skylark/rules.html#runfiles should help you get started.

@curtismuntz

This comment has been minimized.

Copy link

curtismuntz commented Jun 28, 2018

It looks like some commits have made it into 0.15 that 100% resolve this issue on c++ for me (f90ed65#diff-73cc3e84377e7c63ef4406039e060016), but doesn't necessarily fix this for python still.

The new include_runfiles parameter to the pkg_tar rule will copy over all the required files for both c++ and python, but don't correctly update the python runfile paths to reflect. The c++ paths work fine.

I created a repository that demonstrates the issue here https://github.com/curtismuntz/bazel_pkg_tar

After building via bazel build src:foo_tar and extracting the produced tarball, the following tree structure exists:

$ tree  -L 3
.
├── opt
│   ├── foo
│   └── foo.py
└── pypi__numpy_1_13_1
    ├── numpy
    │   ├── add_newdocs.py
    │   ├── compat
    │   ├── __config__.py
    │   ├── core
    │   ├── ctypeslib.py
    │   ├── _distributor_init.py
    │   ├── distutils
    │   ├── doc
    │   ├── dual.py
    │   ├── f2py
    │   ├── fft
    │   ├── _globals.py
    │   ├── _import_tools.py
    │   ├── __init__.py
    │   ├── lib
    │   ├── linalg
    │   ├── ma
    │   ├── matlib.py
    │   ├── matrixlib
    │   ├── polynomial
    │   ├── random
    │   ├── setup.py
    │   ├── testing
    │   ├── tests
    │   └── version.py
    ├── numpy-1.13.1.data
    │   └── scripts
    └── numpy-1.13.1.dist-info
        ├── DESCRIPTION.rst
        ├── METADATA
        ├── metadata.json
        ├── RECORD
        ├── top_level.txt
        └── WHEEL

Attempting to run opt/foo produces:

$ ./opt/foo   
Traceback (most recent call last):
  File "./foo", line 203, in <module>
    Main()
  File "./foo", line 139, in Main
    module_space = FindModuleSpace()
  File "./foo", line 86, in FindModuleSpace
    raise AssertionError('Cannot find .runfiles directory for %s' % sys.argv[0])
AssertionError: Cannot find .runfiles directory for ./foo

And trying to call opt/foo.py directly:

$ ./opt/foo.py 
Traceback (most recent call last):
  File "./foo.py", line 2, in <module>
    import numpy as np
ImportError: No module named 'numpy'

Pretty sure this is a bazel core py_binary issue, but is the plan for pkg_tar to provide this interface via include_runfiles? Or is it best practice to implement a skylark rule for this functionality?

@hlopko

This comment has been minimized.

Copy link
Contributor

hlopko commented Jul 5, 2018

castler added a commit to castler/bazel that referenced this issue Aug 11, 2018

[pkg_tar] Package py_binary runfiles correctly
For py_binary rules it is necessary that the packaging of the runfiles
is done correctly, since otherwise the py_binary might not be able to
be executed, since dependencies are missing.

Currently the runfiles are not packaged in a according <target>.runfiles
directory, which leads undefined depdencies.

This commit creats such a <target>.runfiles directory as done by bazel
in the bazel-bin folder structure.

Part of issue: bazelbuild#4383

castler added a commit to castler/bazel that referenced this issue Aug 31, 2018

[pkg_tar] Package py_binary runfiles correctly
For py_binary rules it is necessary that the packaging of the runfiles
is done correctly, since otherwise the py_binary might not be able to
be executed, since dependencies are missing.

Currently the runfiles are not packaged in a according <target>.runfiles
directory, which leads undefined depdencies.

This commit creats such a <target>.runfiles directory as done by bazel
in the bazel-bin folder structure.

In order to ensure that other rules e.g. cc_binary, are not affected,
we check if our rule is python related.

Part of issue: bazelbuild#4383
@laszlocsomor

This comment has been minimized.

Copy link
Contributor

laszlocsomor commented Oct 16, 2018

@lberki : how hard would it be to opensource pkg_runfiles? The question came up again: https://stackoverflow.com/questions/52823983

@laszlocsomor

This comment has been minimized.

Copy link
Contributor

laszlocsomor commented Oct 16, 2018

Actually, it should be easy to implement pkg_runfiles in Starlark:

  • DefaultInfo.data_runfiles (or DefaultInfo.default_runfiles?) contains the File objects for the runfiles
  • FilesToRunProvider.runfiles_manifest contains File object for the runfiles manifest
  • It should be easy to implement a custom C++ binary or Bash script that can parse a runfiles manifest and tars/zips up the runfiles it references
  • Using that binary/script, our hypothetical pkg_runfiles rule could create a ctx.actions.run / ctx.actions.run_shell with the binary/script, pass all the File objects for the runfiles and the manifest as inputs, and expect just a tar or zip as the output.

WDYT?

@benjaminp

This comment has been minimized.

Copy link
Collaborator

benjaminp commented Oct 16, 2018

FilesToRunProvider.runfiles_manifest contains File object for the runfiles manifest

I think this attribute should be removed from Starlark. The manifest contains absolute paths, so depending on this artifact is a good way to make your rule non-reproducible across machines. I raised this point on the mailing list, but it didn't generate much interest.

What would be helpful would be extending the Starlark runfiles API as I have proposed to allow complete introspection of runfiles objects. With my CL merged, it should be possible for Starlark to construct a runfiles tree identical to the one build-runfiles makes in whatever package is desired.

@laszlocsomor

This comment has been minimized.

Copy link
Contributor

laszlocsomor commented Oct 17, 2018

That's a good point.

Your proposal sounds good to me in general. My only question is how to distinguish runfiles types -- normal symlinks vs. empty files (__init__.py) vs. whatever else there could be? Implementing a pkg_runfiles rule would need that ability.

@laszlocsomor

This comment has been minimized.

Copy link
Contributor

laszlocsomor commented Oct 17, 2018

@benjaminp : let's continue this discussion on the thread: https://groups.google.com/d/msg/bazel-dev/uCfpNnVLJa4/Uomy07-iBgAJ

@lberki

This comment has been minimized.

Copy link
Contributor

lberki commented Oct 19, 2018

@laszlocsomor : depends on what you want to do -- we seem to have an API so that one can look into runfiles trees to see which symlinks there are and where they point. There are all sorts of little wrinkles, though, that need to be considered. They are mostly for Google-internal awful hacks we had the sense not to contaminate Bazel with, but still, some auditing will be required.

@hlopko hlopko added the bazel 1.0 label Nov 28, 2018

@hlopko hlopko removed their assignment Nov 28, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment