first draft of conda-build variants docs #414

Merged Jun 22, 2017 (24 commits)

Conversation

msarahan
Contributor

Proposed docs for conda/conda-build#1585

@cio-docs please ignore for now, but would appreciate copy editing soon, after this functionality has undergone more review and testing.

@electronwill
Contributor

Sounds good! Just let us know when it's ready and we should be able to help with copy editing.

2. The dependency is listed by name (no pinning) in the requirements/run section
3. The ``pin_run_as_build`` key in the variant has a value that is a dictionary,
containing a key that matches the dependency name listed in the run
requirements. The value should be a pinning expression, or a tuple of two pinning expressions to

to ... ? Is there more to this item?

The result here is that the runtime numpy dependency will be pinned to
``>=(current numpy 1.11.x version),<1.12``
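A minimal sketch of a variant config that would produce this pin (file name and versions here are illustrative, not from the PR):

```yaml
# conda_build_config.yaml (hypothetical sketch)
numpy:
  - 1.11
pin_run_as_build:
  # "x.x" is a pinning expression: pin the run requirement to the
  # major.minor version of numpy that was present at build time
  numpy: x.x
```

With ``numpy`` listed unpinned in the run requirements, rendering would then emit a run dependency like ``numpy >=(current numpy 1.11.x version),<1.12``.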

Numpy is an interesting example here. It actually would not make a good case for

Maybe have a more straightforward example for pinning at the variant level and leave numpy as a note. Would boost be a better example?

u'source': {u'path': u'/Users/msarahan/code/conda-build/tests/test-recipes/split-packages/_rm_rf_stays_within_prefix'}}}


Extra Jinja2 functions

Would be good if arguments and their meaning were explained here.

@msarahan msarahan force-pushed the document_conda_build_variants branch from 7e98d45 to 5579dee Compare February 13, 2017 15:42
@msarahan
Contributor Author

Thanks @kerrywatson1 - I'm not sure, but I think your comments apply to the docs as a whole, not specifically to this PR. Can we separate your comments that apply to content outside of this PR to another PR?

@kerrywatson1
Contributor

kerrywatson1 commented Feb 13, 2017


@groutr left a comment

Some general comments and a few minor nitpicks (which I won't feel bad if you choose to ignore).

characters allowed). This example builds python 2.7 and 3.5 packages in one
build command:

variant config file like:

A nitpick here for consistency. I would suggest referring to the variant config by a filename (conda_build_config.yaml) here to be consistent with how you refer to meta.yaml below.

lists of versions to iterate over. These are aggregated as detailed in the
Aggregation of multiple variants section below.

2. Set the ``variant`` member of a Config object. This is just a simple

What is the difference between "just a dictionary" and "just a simple dictionary"?

Aggregation of multiple variants section below.

2. Set the ``variant`` member of a Config object. This is just a simple
dictionary. The values for fields should be strings, except "extended keys",

I don't think this is an accurate statement. The example variant in line 119 shows a field with a list for a value.

variant. The way this works is that all variants are evaluated, but if any
hashes are the same, then they are considered duplicates, and are deduplicated.
By omitting some packages from the build dependencies, we can avoid creating
unnecessarily unique hashes, and allow this deduplication.

I would probably word this "avoid creating unnecessarily specific hashes". Since hashes are meant to be unique, it was a little confusing to understand at first.



Extended keys
-------------

An example of extended keys would be great. The concept is still a little nebulous for me. What are the use cases?

expressing the compiler to be used. Three new Jinja2 functions are available when
evaluating ``meta.yaml`` templates:

* ``pin_compatible``: To be used as pin in run and/or test requirements. Takes

Another nitpick: An easier to read format for arguments might be:
pin_compatible(package)

@msarahan msarahan force-pushed the document_conda_build_variants branch from 5579dee to 1f582c5 Compare February 16, 2017 03:41
@msarahan
Contributor Author

Thanks @groutr. I incorporated your comments in the latest commit and tried to add more examples (an intro section that outlines broad use cases as a guide to deeper features). I also documented pin_downstream and pin_subpackage a little better.

@msarahan msarahan force-pushed the document_conda_build_variants branch from 1f582c5 to 4903252 Compare February 16, 2017 04:11

@cekees left a comment

Added a few questions and one typo correction.

we need to build binary packages (and any package containing binaries) with
potentially several variants to support different usage environments. For
example, using Numpy's C API means that a package must be used with the same
version of Numpy as runtime that was used at build time.

as runtime -> at runtime

@@ -439,6 +440,42 @@ emitted during the build process and the variable will remain
undefined.


.. _pin_downstream:

Pin downstream

I think this may be a need that doesn't arise in the hashdist design. I guess this is comparable to how hashdist allows upstream packages to specify information to downstream packages if the upstream package is a "build" dependency:

https://github.com/hashdist/hashstack/blob/master/pkgs/boost/boost.yaml#L108

For "run" dependencies, the upstream package's resources would be linked into the profile by default, and any change to the upstream source or configuration would result in a new downstream build artifact hash. I guess the pinning mechanism allows relaxation of the dependency, so changes to the upstream package within the constraints specified in the pinning expression would not generate a change to the downstream package.

@msarahan (Contributor Author)

Yeah, with conda's need to be more flexible, it's important to allow different packages to specify their minimum and maximum requirements. Otherwise we end up with unsatisfiable dependencies very easily. In your case, you are building a single unique system (and moreover, building is less of a concern for you than for consumers of conda), so it's less of an issue.

run:
- python {{ python }}



How would you use variants to express the different combinations of compilers, processor-specific optimizations, and critical host/vendor packages (e.g. the mpi and blas/lapack libraries that are paired with the compiler toolchain)?

@msarahan (Contributor Author)

I think you already figured this out below. The central idea is "everything is a package" - whether it is files or just simply environment activation scripts that configure compilers. I'll write more in later replies about exactly what goes into the hash.

numpy:
- 1.11
- 1.12


Can the variant descriptor be git commits or would we have to enforce that every package is a release?

@msarahan (Contributor Author)

It's anything that conda recognizes as a version. Git hashes are perfectly fine.

The information that goes into this hash is currently defined in conda-build's
metadata.py module; the _get_hash_contents member function. This function
captures the following information:


Does the package (build) hash come from hashing these entries in the meta file or what they point to? e.g. does it change when I modify the source section or when I modify the source of a package referenced in the section?

@msarahan (Contributor Author)

It's a few things. Relevant code is at https://github.com/conda/conda-build/pull/1585/files#diff-c86934c27c2f25579540ddb447fa45bdR767

  1. source, requirements, and build sections of the recipe. This captures inputs, as well as exact versions of packages present at build time. The build section has a few options that affect the output, so it's there too. A few fields in the build section are omitted, so that the hashes are more reproducible.
  2. Any other files in the recipe folder (alongside meta.yaml). Everything but meta.yaml is part of the hash.
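The two hash inputs described above can be illustrated with a rough sketch. This is NOT conda-build's actual ``_get_hash_contents`` code (the real implementation is linked in the comment above); it just shows the idea: hash the selected meta.yaml sections plus the bytes of every other recipe file.

```python
import hashlib
import json
import os

def recipe_hash(meta_sections, recipe_dir):
    """Illustrative sketch of a build-hash computation (not conda-build's
    real code): selected meta.yaml sections, plus the bytes of every
    recipe file other than meta.yaml itself."""
    h = hashlib.sha256()
    # 1. deterministic serialization of the source/requirements/build sections
    h.update(json.dumps(meta_sections, sort_keys=True).encode("utf-8"))
    # 2. every file alongside meta.yaml, except meta.yaml itself
    for name in sorted(os.listdir(recipe_dir)):
        if name == "meta.yaml":
            continue
        path = os.path.join(recipe_dir, name)
        if os.path.isfile(path):
            with open(path, "rb") as f:
                h.update(f.read())
    return h.hexdigest()[:7]  # truncated for display; the real length differs
```

Under this sketch, changing a build requirement or a build script changes the hash, while editing meta.yaml prose that is outside the hashed sections does not.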

compilers that are installed on the system. The analogous compiler packages on
Windows run any compiler activation scripts and set compiler flags instead of
actually installing anything.


I like this. It seems like the compiler package variants and more generally host/vendor package variants are critical in the HPC environment. Like the windows situation many of these packages would either run 'module load' commands or freeze and enforce the environment changes executed by the 'module' command.

@msarahan (Contributor Author)

Yeah, once I realized that everything really could be a package, then the problem became a lot more tractable. =)

* CONDA_PERL
* CONDA_LUA

Legacy CLI flags are also still available. These are sticking around for their
Contributor

I don't think these should be "legacy", especially if they aren't going to be removed. The environment variables and command line flags are much more useful for an audience that is just building a handful of packages. The above is overkill in those cases.

Contributor

I agree that legacy seems like the wrong word. It might be worth, somewhere in the larger set of conda build documentation, pointing out this simple way of pinning.

build:
pin_downstream:
- libstdc++ {{ pin_compatible('g++', 'x') }}

Contributor

Formatting issue: missing .. code-block:: yaml

@mwcraig (Contributor) left a comment

A short transition guide, perhaps consisting of a few complete before/after meta.yaml files, would be useful (or at least I would find it useful).

Overall I like the separation of the build variants from the recipe.

One question -- will all current recipes that use numpy x.x need to be updated for 3.0, or will it still recognize those pinnings?


requirements:
build:
- numpy
Contributor

This may be addressed later, but should this be numpy {{ numpy }} also?

When our two proper yaml config files are combined, ordinarily the recipe-local
variant would clobber the site-wide variant, yielding ``{'some_trait':
'pony'}``. However, with the extend_keys entry, we end up with what we've always
wanted: a dog *and* pony show: ``{'some_trait': ['dog', 'pony']}``
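The clobber-vs-extend behavior described above can be sketched in a few lines of Python (illustrative only, not conda-build's actual merge code):

```python
def combine(site, local, extend_keys=()):
    """Sketch of variant-config aggregation: a later config clobbers an
    earlier one, except for keys named in extend_keys, whose values
    accumulate into a list."""
    merged = dict(site)
    for key, value in local.items():
        if key in extend_keys:
            previous = merged.get(key, [])
            if not isinstance(previous, list):
                previous = [previous]
            addition = value if isinstance(value, list) else [value]
            merged[key] = previous + addition
        else:
            merged[key] = value
    return merged

print(combine({"some_trait": "dog"}, {"some_trait": "pony"}))
# clobbered: {'some_trait': 'pony'}
print(combine({"some_trait": "dog"}, {"some_trait": "pony"},
              extend_keys=("some_trait",)))
# extended: {'some_trait': ['dog', 'pony']}
```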
Contributor

💯

API, it would actually be better to not pin numpy with ``pin_run_as_build``.
Pinning it is over-constraining your requirements unnecessarily when you are not
using Numpy's C API. Instead, we should customize it for each recipe that uses
numpy. See also the `Avoiding unnecessary builds`_ section above.
Contributor

It would be helpful to have a clear statement at the beginning of this section that pin_run_as_build is intended only for those cases when all packages should be pinned (it seems like the most familiar case would be python if you are building python packages).

build:
- numpy {{ numpy }}
run:
- numpy {{ pin_compatible('numpy', lower_bound='1.10', upper_bound='3.0') }}
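When explicit bounds are not given, ``pin_compatible`` derives them from the version found at build time using pinning expressions. Here is a minimal sketch of that expansion (assumed illustrative defaults, not conda-build's actual implementation):

```python
def apply_pin(version, min_pin="x.x", max_pin="x"):
    """Sketch of expanding a build-time version into a run constraint.
    A pinning expression such as "x.x" keeps that many leading version
    fields; the upper bound increments the last field it keeps."""
    fields = version.split(".")

    def kept(expr):
        # "x.x" keeps two leading version fields, "x" keeps one, etc.
        return fields[: len(expr.split("."))]

    lower = ".".join(kept(min_pin))
    upper_fields = kept(max_pin)
    upper_fields[-1] = str(int(upper_fields[-1]) + 1)
    upper = ".".join(upper_fields)
    return ">={},<{}".format(lower, upper)

print(apply_pin("1.11.2", min_pin="x.x", max_pin="x.x"))  # >=1.11,<1.12
```

When explicit ``lower_bound``/``upper_bound`` arguments are given, as in the snippet above, they would presumably replace the derived bounds, yielding ``>=1.10,<3.0``.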
Contributor

This is very nice!


Any contents in ``recipe_append.yaml`` will add to the contents of meta.yaml.
List values will be extended, and string values will be concatenated.

Contributor

What is the use case for appending to recipes?


Since conflicts only need to be prevented within one version of a package, we
think this will be adequate. If you run into hash collisions with this limited
subspace, please file an issue on the conda-build issue tracker.
Contributor

Can we add a link to it?

* ``number``
* ``string``
* any other recipe files, such as bld.bat, build.sh, etc. Every file other than
meta.yaml is part of the hash.
Contributor

Does this include recipe_append and recipe_clobber?


* ``pin_run_as_build``: should be a dictionary. Keys are package names. Values
are "pinning expressions" - explained in more detail in `Customizing
compatibility`_. This is a generalization of the ``numpy x.x`` spec.
Contributor

Maybe don't mention numpy x.x here because later on you point out that it is a bad candidate for using with pin_run_as_build.

@msarahan
Contributor Author

@mwcraig thanks for the comments. Sorry it has taken me so long to review them. My latest PR should address all of them. Regarding your numpy x.x question - it should work in conda-build 3.0, but you should see a deprecation warning. It will be removed in some future version (4.0 or above).

requirements:
build:
- python
- numpy
Contributor

Does this example need to use the numpy variant value in the build requirements? I don't see how the variants do anything if you're missing it in the build requirements. My guess is that this example would build the package twice with the same version of numpy being used.

On this note, does conda-build at least warn you when you've put something in conda_build_config.yaml that isn't used in the rendering of the recipe?

Contributor

Never mind, after reading Avoiding unnecessary builds again, I think I understand what you're trying to convey. I'm going to submit a PR to your branch to help clarify this example.

Contributor

@msarahan I created the PR on your fork's branch msarahan#1

@msarahan (Contributor Author)

Thanks! Merged.

Make it clear that you would be building two different numpy dependent
packages by giving them different names and add a little bit more prose
to explain the first example that doesn't pin numpy.

outputs a single compiler package name. For example, this could be used to
compile outputs targeting x86_64 and arm in one recipe, with a variant.

There are default "native" compilers that and runtimes that are used when no
Contributor

You can drop "that" after compilers.

This is the only occurrence of "runtimes" and I assume it is what you meant by "detailed further" in .. _special_variant_keys.

  1. It would be helpful to put runtimes in backticks so that it stands out like a key from conda_build_config.yaml when following the link from .. _special_variant_keys.
  2. It would be nice to have a definition of what you consider runtimes and how they are different than compilers.

@msarahan (Contributor Author)

Thanks. This is a remnant of the time before pin_depends or run_exports. I'll certainly add a better explanation, too.

@183amir (Contributor) left a comment

Wow! Such amazing new features. We have been longing to write our CI with conda recipes but this was missing to make the migration feasible.
I added a few minor comments.

- {{ pin_subpackage('libxgboost', exact=True) }}
- python {{ python }}

- name: py-xgboost
Contributor

I think you meant to say r-xgboost here.


.. code-block:: python

variants = [{'numpy': ['1.10', '1.11'], 'ignore_version': ['numpy]}]
Contributor

The last numpy string here does not seem to close properly.

@msarahan
Contributor Author

msarahan commented Jun 8, 2017

Thanks for the review @183amir. Looking forward to seeing the fun things people do with these new capabilities.

@msarahan msarahan force-pushed the document_conda_build_variants branch 2 times, most recently from b5f5df4 to 7ae4481 Compare June 9, 2017 18:58
@msarahan msarahan force-pushed the document_conda_build_variants branch from 7ae4481 to 610c23f Compare June 9, 2017 19:22
@msarahan
Contributor Author

msarahan commented Jun 9, 2017

@cio-docs please review this when you can. I will be tagging the conda-build 3 release by Monday, and these docs are very important for people who start using the new version.

@kerrywatson1
Contributor

I chatted with @msarahan: outside editors have just finished editing the conda docs, which are now in final review. @cio-docs will ping @msarahan as soon as we receive the edited docs back from the outside team, and we'll incorporate these changes into the edited docs.

small style edits to meta-yaml

.. _pinning_expressions:

Pinning expressions
Contributor

@msarahan The code in the first example only seems to mention 1.11 but the comment mentions 1.11.2. Is this correct?

@msarahan (Contributor Author)

Yes. Added a small blurb explaining why.

Contributor

I understand much better now. Thanks!

requirements. The value should be a dictionary with up to 4 keys:
``min_pin``, ``max_pin``, ``lower_bound``, ``upper_bound``. The first two are
pinning expressions. The latter two are version numbers, overriding detection
of current version. (defaulting to None).
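A hypothetical config using the dictionary form described here (package and expressions are illustrative):

```yaml
# conda_build_config.yaml (hypothetical sketch)
pin_run_as_build:
  numpy:
    min_pin: x.x   # lower bound derived from the build-time version
    max_pin: x.x   # upper bound bumps the minor version
```

With numpy 1.11.x present at build time, this would yield a run pin along the lines of ``>=1.11,<1.12`` (exactly which version fields appear in the lower bound depends on ``min_pin``).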
Contributor

@msarahan what defaults to None? All four keys, the latter two, or the dictionary that contains the four keys?

@msarahan (Contributor Author)

I removed this comment and instead added a reference to a lower section where that function's default arguments are shown.

Contributor

Perfect. Thanks!

@electronwill
Contributor

The xgboost example and the clarity of its explanation are excellent! I also liked the dog and pony show example to demonstrate extend_keys. If none of my edits look like a problem to you, I'll merge this when the CI passes.

@github-actions

Hi there, thank you for your contribution!

This pull request has been automatically locked because it has not had recent activity after being closed.

Please open a new issue or pull request if needed.

Thanks!

@github-actions github-actions bot added the locked [bot] locked due to inactivity label Apr 15, 2022
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 15, 2022