Feature request: pypi_package #699

Closed
cgrushko opened this issue Dec 10, 2015 · 29 comments
Labels
P2 We'll consider working on this in future. (Assignee optional) type: feature request

@cgrushko
Contributor

... similar to maven_jar, but downloads Python code from PyPI and makes it available in the build.

@cgrushko cgrushko changed the title Feature request: maven_pypi Feature request: pypi_library Dec 10, 2015
@cgrushko
Contributor Author

@Shaywei

@Shaywei
Contributor

Shaywei commented Dec 10, 2015

I am definitely interested in this!
Can't promise anything, but if I get some time to hack on Bazel, I'll definitely give this a shot!

@damienmg
Contributor

On the roadmap but might take some time, see https://docs.google.com/document/d/1jKbNXOVp2T1zJD_iRnVr8k5D0xZKgO8blMVDlXOksJg/

@damienmg damienmg added type: feature request P2 We'll consider working on this in future. (Assignee optional) external repositories and removed external repositories labels Dec 11, 2015
@xiongchiamiov

Registering interest in this as well.

In the meantime, https://github.com/mihaibivol/bazel_pipy_rules provides some hacks for getting PyPI-based libraries into the build system.

@damienmg
Contributor

/cc @meteorcloudy

@davidzchen
Member

This is more a matter of naming, but would it be confusing to have workspace rules called *_library that look like regular build rules? Would it be better to name this rule pypi_package instead?

@damienmg damienmg changed the title Feature request: pypi_library Feature request: pypi_package Mar 30, 2016
@damienmg damienmg modified the milestones: 0.5, 0.4 Jun 14, 2016
@yugui
Member

yugui commented Oct 4, 2016

I have prototyped a workspace rule for PyPI packages. I hope it helps us to implement this feature in the main Bazel repository.
https://github.com/gengo/rules_pypi

Currently this prototype does not support building extensions in the Bazel sandbox.
I am now trying to extract extension metadata from setup.py and let cc_library build the extensions, the way cgo support in rules_go does.

BTW, @damienmg, is there any good way to get paths to python2 and python3 interpreters in Skylark rules?

@damienmg
Contributor

damienmg commented Oct 4, 2016

There is not AFAIK :(

The pypi rules would be OK to contribute back. I am however a bit hesitant in using directly pypi to do it (rather than a more reproducible way, like download). Can we have confidence in what pypi does?

@yugui
Member

yugui commented Oct 4, 2016

@damienmg

I am however a bit hesitant in using directly pypi to do it

I agree. In the prototype, I used pip download just for ease of implementation.
But we can write a wrapper script around pip.index.PackageFinder to parse the PyPI page.
Then we can use ctx.download in Bazel.
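
As a rough illustration of the "parse the PyPI page, then hand the URL to ctx.download" idea, here is a minimal Python sketch that does not use pip.index.PackageFinder at all. It assumes the index page is a plain HTML list of anchors whose href carries a `#sha256=...` fragment (the layout described for simple repository pages); the class name, function name, sample page, and example.invalid URLs are all made up for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag


class SimpleIndexParser(HTMLParser):
    """Collects (filename, url, sha256) triples from <a> tags on an index page."""

    def __init__(self):
        super().__init__()
        self.files = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Split "https://.../demo-1.0.tar.gz#sha256=abc" into URL and fragment.
        url, fragment = urldefrag(href)
        prefix = "sha256="
        sha256 = fragment[len(prefix):] if fragment.startswith(prefix) else None
        filename = url.rsplit("/", 1)[-1]
        self.files.append((filename, url, sha256))


def parse_index(html):
    """Return every downloadable file listed on the page."""
    parser = SimpleIndexParser()
    parser.feed(html)
    return parser.files


# Hypothetical page content; a real page would be fetched from the index.
SAMPLE_PAGE = """
<html><body>
<a href="https://files.example.invalid/packages/demo-1.0.tar.gz#sha256=abc123">demo-1.0.tar.gz</a>
<a href="https://files.example.invalid/packages/demo-1.0-py3-none-any.whl#sha256=def456">demo-1.0-py3-none-any.whl</a>
</body></html>
"""

files = parse_index(SAMPLE_PAGE)
```

Each resulting triple carries a URL and a checksum, which is the kind of input a repository rule could then pass on to a download call so that the fetch stays reproducible.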

@damienmg
Contributor

damienmg commented Oct 4, 2016

Sounds like a good plan.


@yugui
Member

yugui commented Oct 4, 2016

There is not AFAIK :(

:(
It would be great if I could have ctx.fragments.python so that I can call the appropriate interpreters in repository rules.

@damienmg
Contributor

damienmg commented Oct 4, 2016

Repository rules do not even have access to that information :( I guess depending on a python target could work though. But this target does not exist in Bazel :(


@Zaspire

Zaspire commented Oct 5, 2016

I found third party build rules for that:
https://github.com/benley/bazel_rules_pex

@yugui
Member

yugui commented Oct 5, 2016

rules_pex appears to be mainly responsible for building a PEX from the given package.
It just records the given list of PyPI packages in the PEX manifest and lets PEX fetch and build the packages.
So I don't think rules_pex solves the issue we are trying to solve in this feature. Instead, rules_pex implements a simple workaround for the issue with some compromise of {fast, correct}.

@benley
Contributor

benley commented Oct 5, 2016

That's correct, rules_pex delegates pypi handling to the pex application. It does provide a mechanism for staying fast and correct, by using the eggs attribute to pass python eggs or wheels (downloaded by bazel http_file rules) to pex, at least. I have been trying to come up with a way of extracting pex's pypi handling to repository rules, so I'm quite pleased to see @yugui's implementation come along!

@yugui
Member

yugui commented Oct 6, 2016

I am now trying to extract extension metadata from setup.py and let cc_library build the extensions, the way cgo support in rules_go does.

Small update.
I tried to extract extension metadata from setup.py in a branch, but I realized that it is nearly impossible because the build setup in setup.py is fundamentally arbitrary code execution.

The basic architecture of setup.py is that (1) the package author gives a build step description to distutils.core.setup -- they can also use setuptools to collect such a description from the source file tree; (2) the build steps are then executed, driven by the description metadata.
However, some popular packages like numpy or psycopg2 customize the implementation of the build steps driven by that data. So we cannot expect that the given data is sufficient to construct the dependency graph which Bazel needs, nor that the description metadata is written in a certain common format. The only way to know what will happen with setup.py appears to be to actually execute it.
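
To make the limitation concrete, here is a small Python sketch (not taken from the prototype) that tries to read setup() keyword arguments statically with the ast module. The function name, the sample setup.py, and the `mybuild` module it imports are hypothetical; the point is only that literal values can be recovered without running anything, while computed values such as ext_modules or a custom cmdclass cannot.

```python
import ast


def static_setup_kwargs(source):
    """Split the keyword arguments of the first setup(...) call into
    literal values (recoverable without execution) and dynamic names."""
    literal, dynamic = {}, []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
        if name != "setup":
            continue
        for kw in node.keywords:
            if kw.arg is None:
                # A **kwargs expansion: nothing is known statically.
                dynamic.append("**")
                continue
            try:
                literal[kw.arg] = ast.literal_eval(kw.value)
            except (ValueError, TypeError, SyntaxError):
                # Function calls, names, class references, and so on.
                dynamic.append(kw.arg)
        break
    return literal, dynamic


# Hypothetical setup.py; it is only parsed here, never executed.
SAMPLE_SETUP_PY = '''
from setuptools import setup
from mybuild import CustomBuildExt, find_extensions

setup(
    name="demo",
    version="1.0",
    ext_modules=find_extensions(),
    cmdclass={"build_ext": CustomBuildExt},
)
'''

literal, dynamic = static_setup_kwargs(SAMPLE_SETUP_PY)
```

In this sample only name and version come back as literals; ext_modules and cmdclass land in the dynamic list, which is exactly the information a build system would need in order to plan extension builds.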

@yugui
Member

yugui commented Oct 11, 2016

Tried another approach that assumes wheel files are available locally or remotely.
https://github.com/gengo/rules_pypi/tree/feature/wheel

However this approach does not solve the problem of building wheels from sources as is.
Also it is not so easy to build wheels without resolving package dependencies -- e.g. pip wheel pyxDamerauLevenshtein fails unless we install numpy in a virtualenv.

@damienmg
Contributor

Can we provide a way to build a virtualenv? It would be awesome to have a comprehensive suite of tooling to work with pip.

/cc @meteorcloudy

@benley
Contributor

benley commented Oct 12, 2016

The pex rules use virtualenv in a genrule to bootstrap the pex builder:
https://github.com/benley/bazel_rules_pex/blob/master/pex/BUILD#L4

It's a bit gross, but I suspect it wouldn't be too hard to generalize into a skylark py_virtualenv rule. What I'm unsure about is how to handle the output in a way that makes it useful to other rules without turning it into a tarball.

@yugui
Member

yugui commented Oct 13, 2016

Here's our tentative solution.
https://gist.github.com/yugui/b7e80987dc2077754d91750b26eca3de

I have implemented it as a repository_rule so that the output is available as a plain py_library without archiving files -- it requires the rule to generate an unpredictable set of files.

@yugui
Member

yugui commented Oct 13, 2016

Findings from my prototype.

  1. It is possible to locate source archives or python wheels and download them with ctx.download.
  2. We cannot know what build steps are required to build the target PyPI package, because running setup.py is effectively arbitrary code execution even though it appears to be metadata-driven.
  3. We cannot build a PyPI package alone in a sandbox, because its setup.py often depends on another PyPI package. This is another reason why we need to actually install the package in a virtualenv.
  4. Not all PyPI packages are zip-safe. So,
  5. Some good examples of major PyPI packages.
    • Numpy -- it does not require any C libraries installed, but its build process is complicated. It is not zip-safe.
    • pyyaml -- it depends on a C library, libyaml.
    • pyxDamerauLevenshtein -- pip install pyxDamerauLevenshtein will fail unless numpy is installed beforehand. pip install numpy pyxDamerauLevenshtein will also fail.
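
The pyxDamerauLevenshtein case above is a build-order problem: numpy has to be installed before the other package's setup.py can run. A minimal Python sketch of ordering packages by their build-time requirements follows. The `build_requires` mapping is hand-written for illustration (nothing in the thread says this data can be obtained automatically), and `build_order` is a hypothetical helper name.

```python
def build_order(build_requires):
    """Return packages ordered so that each one comes after the packages
    its setup.py needs at build time (depth-first topological sort)."""
    order, visiting, done = [], set(), set()

    def visit(pkg):
        if pkg in done:
            return
        if pkg in visiting:
            raise ValueError("dependency cycle at %s" % pkg)
        visiting.add(pkg)
        for dep in sorted(build_requires.get(pkg, ())):
            visit(dep)
        visiting.discard(pkg)
        done.add(pkg)
        order.append(pkg)

    for pkg in sorted(build_requires):
        visit(pkg)
    return order


# Assumed build-time requirements, based on the examples in the list above.
order = build_order({
    "pyxDamerauLevenshtein": ["numpy"],
    "numpy": [],
    "pyyaml": [],
})
```

Installing one package at a time in this order into a virtualenv is one way to sidestep the failure of a single combined pip install.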

@benley
Contributor

benley commented Oct 13, 2016

One unfortunate thing I ran into while experimenting with your implementation: On MacOS, the default case-preserving-but-insensitive HFS+ filesystem causes setup.py builds to break. Setuptools (and presumably distutils) seems to insist on creating a directory called build/ in the source root, but bazel's BUILD file makes that impossible.

(edit: only applies when applying py_requirements to a source repo, not when using the -r requirements.txt style)
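
A quick way to spot this kind of clash ahead of time is to look for names that differ only by case. The Python sketch below is illustrative only: the helper name and the sample file list are made up, and it compares names with casefold(), which approximates but does not exactly reproduce what HFS+ does.

```python
from collections import defaultdict


def case_insensitive_collisions(paths):
    """Group paths that would map to the same name on a
    case-insensitive filesystem and return the clashing groups."""
    groups = defaultdict(list)
    for path in paths:
        groups[path.casefold()].append(path)
    return sorted(sorted(group) for group in groups.values() if len(group) > 1)


# Hypothetical source root: Bazel's BUILD file next to setuptools' build/ dir.
collisions = case_insensitive_collisions(["BUILD", "setup.py", "build", "src/demo.py"])
# collisions == [['BUILD', 'build']]
```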

@BillWSY

BillWSY commented Oct 28, 2016

@yugui Will you be able to share pypi_universal_repository, required by https://gist.github.com/yugui/b7e80987dc2077754d91750b26eca3de?

@trivigy

trivigy commented Nov 21, 2016

So I have been messing around with this for a while and came up with something very short. I loved @benley's implementation of pex_rules and ended up learning a ton from it and implementing my own, but the missing part was the pex, pip, and setuptools dependencies that had to come in during the analysis stage. After reviewing @yugui's repo for her implementation of the pypi_rules, I ended up creating something somewhat similar in concept.

Sorry for posting it in raw form. I am kinda short on time and just wanted to share for others.

WORKSPACE

load('//tools/build_rules:pypi_rules.bzl', 'pypi_repositories')

pypi_repositories(['pex', 'protobuf'])

The syntax for pypi_repositories is exactly the same as the one used with pip, since that function just pipes things into get-pip.py. One could use syntax like "setuptools==x.x.x" etc.

pypi_rules.bzl

_BUILD_FILE = """
filegroup(
    name = 'pip_tools',
    srcs = glob(
        include = ['bin/**/*', 'site-packages/**/*'],
        exclude = [
            # Illegal as Bazel labels but are not required by pip.
            "site-packages/setuptools/command/launcher manifest.xml",
            "site-packages/setuptools/*.tmpl",
        ]
    ),
    visibility = ['//visibility:public']
)
"""


def _pip_tools_impl(ctx):
    getpip = ctx.path(ctx.attr._getpip)
    tools = ctx.path('site-packages')

    command = ['python3', str(getpip)]
    command += list(ctx.attr.packages)
    command += ['--target', str(tools)]
    command += ['--install-option', '--install-scripts=%s' % ctx.path('bin')]
    command += ['--no-cache-dir']
    ctx.execute(command)
    ctx.file('BUILD', _BUILD_FILE, False)


_pip_tools = repository_rule(
    _pip_tools_impl,
    attrs={
        'packages': attr.string_list(),
        '_getpip': attr.label(
            default=Label('@getpip//file:get-pip.py'),
            allow_single_file=True,
            executable=True,
            cfg='host'
        )
    }
)


def pypi_repositories(packages=None):
    native.http_file(
        name="getpip",
        url="https://bootstrap.pypa.io/get-pip.py",
        sha256="19dae841a150c86e2a09d475b5eb0602861f2a5b7761ec268049a662dbd2bd0c"
    )

    _pip_tools(
        name="pypi",
        visibility=['//visibility:public'],
        packages=packages if packages else []
    )

    native.bind(
        name="pip_tools",
        actual="@pypi//:pip_tools",
    )
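
For anyone who wants to see the get-pip.py invocation that _pip_tools_impl builds without running Bazel, here is a plain Python mirror of the same argument list. The function name and its parameters are made up for this sketch; the flags and their order are copied from the rule above, and "setuptools==x.x.x" is the same placeholder used earlier, not a real version.

```python
def get_pip_command(getpip, packages, target_dir, scripts_dir, python="python3"):
    """Build the argv that the repository rule passes to ctx.execute."""
    command = [python, getpip]
    command += list(packages)  # pip-style requirement strings
    command += ["--target", target_dir]
    command += ["--install-option", "--install-scripts=%s" % scripts_dir]
    command += ["--no-cache-dir"]
    return command


command = get_pip_command(
    "get-pip.py",
    ["pex", "protobuf", "setuptools==x.x.x"],
    "site-packages",
    "bin",
)
```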

I am including the pex_rules just to show how it is actually used. Concentrate on the ctx.action() inside pex_binary_impl() and the _pip_tools attr in pex_bin_attrs to see how the dependencies end up getting propagated from the cache. I ended up using the --no-cache-dir flag for now, but obviously one could expand on this and make it more advanced by allowing it to actually check what is installed on the local machine, to save time instead of forcing a download every time a build is made.

pex_rules.bzl

pex_file_types = FileType([".py"])


def collect_transitive_srcs(ctx):
    transitive_srcs = set(order="compile")
    for dep in ctx.attr.deps:
        transitive_srcs += dep.transitive_srcs
    transitive_srcs += pex_file_types.filter(ctx.files.srcs)
    return transitive_srcs


def collect_transitive_reqs(ctx):
    transitive_reqs = set(order="compile")
    for dep in ctx.attr.deps:
        transitive_reqs += dep.transitive_reqs
    transitive_reqs += ctx.attr.reqs
    return transitive_reqs


def pex_library_impl(ctx):
    build_path = '/'.join(ctx.build_file_path.split('/')[:-1])
    transitive_srcs = collect_transitive_srcs(ctx)
    transitive_reqs = collect_transitive_reqs(ctx)
    transitive_reqs += set([build_path])
    return struct(
        files=set(),
        transitive_srcs=transitive_srcs,
        transitive_reqs=transitive_reqs
    )


def pex_binary_impl(ctx):
    build_path = '/'.join(ctx.build_file_path.split('/')[:-1])
    transitive_srcs = collect_transitive_srcs(ctx)
    transitive_reqs = collect_transitive_reqs(ctx)
    transitive_reqs += set([build_path])

    command = 'external/pypi/bin/pex ' + \
              '%s ' % ' '.join([f for f in transitive_reqs]) + \
              ('-v ' if ctx.attr.verbose else ' ') + \
              '--entry-point=%s ' % ctx.attr.entry_point + \
              '--output-file=%s ' % ctx.outputs.executable.path + \
              '--python=%s' % ctx.attr.interpreter

    ctx.action(
        mnemonic='PexCompile',
        inputs=list(transitive_srcs + ctx.attr._pip_tools.files),
        command=command,
        outputs=[ctx.outputs.executable],
        env={
            'PATH': '/bin:/usr/bin:/usr/local/bin',
            'PYTHONPATH': 'external/pypi/site-packages',
            'LANG': 'en_US.UTF-8',
            'PEX_ROOT': '.pex'
        },
    )

    return struct(files=set([ctx.outputs.executable]))


pex_attrs = {
    'srcs': attr.label_list(allow_files=True),
    'reqs': attr.string_list(),
    'deps': attr.label_list(
        providers=[
            'transitive_srcs',
            'transitive_reqs'
        ],
        allow_files=False
    )
}

pex_bin_attrs = pex_attrs + {
    'entry_point': attr.string(mandatory=True),
    'interpreter': attr.string(default='python3.5'),
    'verbose': attr.bool(default=False),
    "_pip_tools": attr.label(default=Label("//external:pip_tools"))
}

pex_library = rule(
    pex_library_impl,
    attrs=pex_attrs
)

pex_binary = rule(
    pex_binary_impl,
    attrs=pex_bin_attrs,
    executable=True
)
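
The command string in pex_binary_impl is built by concatenation, so a requirement or path containing a space would be split by the shell. As a sketch of an alternative, the Python below assembles the same pex arguments as a list and only quotes them when a single string is needed. The function name and the sample arguments are hypothetical; the flags are the ones used in the rule above, and shlex.join requires Python 3.8 or later.

```python
import shlex


def pex_command(reqs, entry_point, output_file, interpreter="python3.5", verbose=False):
    """Return the pex invocation as an argument list."""
    argv = ["external/pypi/bin/pex"]
    argv += list(reqs)
    if verbose:
        argv.append("-v")
    argv += [
        "--entry-point=%s" % entry_point,
        "--output-file=%s" % output_file,
        "--python=%s" % interpreter,
    ]
    return argv


# Hypothetical requirement list, entry point, and output path.
argv = pex_command(["protobuf", "app/server"], "app.server.main", "bazel-out/bin/app/server.pex")
command_line = shlex.join(argv)
```

Keeping the arguments as a list until the last moment means each element is quoted individually, which is harder to get wrong than manual string formatting.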

@tanin47

tanin47 commented Apr 12, 2017

Is there any progress on this?

@damienmg
Contributor

No sorry.

@tanin47

tanin47 commented Apr 13, 2017

Thanks @trivigy for the code snippet. I modified the code to work with python 2.7.

Here's the full working example: https://github.com/tanin47/bazel-pex-pip/blob/master/pypi.bzl

@mattmoor

Just found this thread, but I have a version of the rules (in progress) here built around the pip concept of .whl files.

Basically, it relies on pip wheel to translate requirements.txt into .whl files (either by fetching them, or by building them). Once in .whl form, it imports the content from each .whl into a py_library, importing dependency data from foo.dist-info/metadata.json.

I'd love feedback on the PR, or reports of any issues people may have with them.
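
As a rough picture of the "import dependency data from foo.dist-info/metadata.json" step, here is a self-contained Python sketch that reads that file out of a wheel. It is not the code from the PR. The `run_requires` layout (a list of entries, each with a `requires` list and optionally an `extra` or `environment` key) is an assumption about the metadata.json files that wheels of that era carried, and both function names and the sample wheel are invented for the example.

```python
import io
import json
import zipfile


def wheel_run_requires(wheel_bytes):
    """Return the unconditional runtime requirements listed in a wheel's
    *.dist-info/metadata.json, or [] if the file is missing."""
    with zipfile.ZipFile(io.BytesIO(wheel_bytes)) as whl:
        names = [n for n in whl.namelist() if n.endswith(".dist-info/metadata.json")]
        if not names:
            return []
        metadata = json.loads(whl.read(names[0]).decode("utf-8"))
    requires = []
    for entry in metadata.get("run_requires", []):
        # Skip requirements that only apply to an extra or an environment marker.
        if "extra" in entry or "environment" in entry:
            continue
        requires.extend(entry.get("requires", []))
    return requires


def make_sample_wheel():
    """Build a tiny in-memory wheel-like zip for demonstration purposes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as whl:
        whl.writestr("foo/__init__.py", "")
        whl.writestr("foo-1.0.dist-info/metadata.json", json.dumps({
            "name": "foo",
            "version": "1.0",
            "run_requires": [
                {"requires": ["six", "requests (>=2.0)"]},
                {"extra": "test", "requires": ["pytest"]},
            ],
        }))
    return buf.getvalue()


deps = wheel_run_requires(make_sample_wheel())
```

Each returned requirement string would then have to be mapped to the label of another imported wheel to become a deps entry on the generated py_library.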

@mattmoor

My PR is merged, so I'm going to close this :)
