Exclude test files from tarball #3451

Closed
carterbox opened this issue Mar 21, 2019 · 8 comments
Labels
locked [bot] locked due to inactivity

Comments

@carterbox
Contributor

Actual Behavior

Source files listed in the test section of meta.yaml are included in the packaged tarball. Is there a way to exclude them from the tarball, but still use them for running tests? I have only found how to exclude files from a package, but this method cannot be used because we are talking about files copied into the test environment.

Expected Behavior

Files copied from the source to the test environment are only used for post-build tests and are not distributed with the final package. This allows checking that the build is correct while not significantly increasing the final package size when test files are large.

Steps to Reproduce

If meta.yaml contains

test:
  source_files:
    - a_folder_for_tests/*  # large test files

then my_package.tar.bz2 contains /info/a_folder_for_tests/*
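One way to check what ended up under info/ is to list the archive members directly. The following is only a minimal sketch using Python's standard tarfile module; the helper names `files_under_info` and `report` are hypothetical, and it assumes the package is a bzip2-compressed tarball as described above.

```python
import tarfile


def files_under_info(tarball_path, prefix="info/"):
    """Return (name, size_in_bytes) for each regular file stored under `prefix`."""
    found = []
    with tarfile.open(tarball_path, "r:bz2") as tar:
        for member in tar.getmembers():
            # Some archives store names with a leading "./"; normalise that away.
            name = member.name[2:] if member.name.startswith("./") else member.name
            if member.isfile() and name.startswith(prefix):
                found.append((name, member.size))
    return found


def report(tarball_path):
    """Print every file under info/, largest first, followed by the total size."""
    entries = sorted(files_under_info(tarball_path), key=lambda e: e[1], reverse=True)
    for name, size in entries:
        print(f"{size:>12}  {name}")
    print(f"{sum(size for _, size in entries):>12}  total under info/")
```

Calling `report("my_package.tar.bz2")` on a built package would then show whether the test files are present and how much of the archive they account for.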

Output of conda info
(base) bash-4.2$ conda info

     active environment : base
    active env location : /home/carterbox/miniconda3
            shell level : 1
       user config file : /home/carterbox/.condarc
 populated config files : /home/carterbox/.condarc
          conda version : 4.6.8
    conda-build version : 3.17.8
         python version : 3.6.7.final.0
       base environment : /home/carterbox/miniconda3  (writable)
           channel URLs : https://conda.anaconda.org/conda-forge/linux-64
                          https://conda.anaconda.org/conda-forge/noarch
                          https://repo.anaconda.com/pkgs/main/linux-64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/free/linux-64
                          https://repo.anaconda.com/pkgs/free/noarch
                          https://repo.anaconda.com/pkgs/r/linux-64
                          https://repo.anaconda.com/pkgs/r/noarch
          package cache : /home/carterbox/miniconda3/pkgs
                          /home/carterbox/.conda/pkgs
       envs directories : /home/carterbox/miniconda3/envs
                          /home/carterbox/.conda/envs
               platform : linux-64
             user-agent : conda/4.6.8 requests/2.21.0 CPython/3.6.7 Linux/3.10.0-957.1.3.el7.x86_64 rhel/7.6 glibc/2.17
                UID:GID : 1992:3234
             netrc file : None
           offline mode : False

@mingwandroid
Contributor

This is unsupported. You have two options (well maybe 3).

  1. Move your expensive test files into a separate package and add those to test/requires.

  2. Rework your tests so they download the files that are needed before commencing the actual test.

  3. Stick your tests onto the end of build.sh, though this has the downside that your package cannot be atomically tested (or at least these tests cannot be run that way).
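As a rough illustration of option 2, a test script (for example a run_test.py placed next to meta.yaml) could fetch its data before running anything else. This is only a sketch under stated assumptions: the function name `fetch_test_data`, the destination path, and any URL or checksum passed to it are hypothetical and not part of conda-build.

```python
import hashlib
import urllib.request
from pathlib import Path


def fetch_test_data(url, dest, sha256=None):
    """Download `url` to `dest` unless it is already there, then verify it.

    `sha256` is an optional hex digest; a mismatch raises ValueError so the
    test run fails loudly instead of using corrupt or unexpected data.
    """
    dest = Path(dest)
    if not dest.exists():
        dest.parent.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(url, str(dest))
    if sha256 is not None:
        digest = hashlib.sha256(dest.read_bytes()).hexdigest()
        if digest != sha256:
            raise ValueError(f"checksum mismatch for {dest}: got {digest}")
    return dest
```

The actual tests would call `fetch_test_data(...)` first and then read the returned path. Pinning a checksum keeps the test tied to a specific version of the data even though the data no longer travels inside the package.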

@carterbox
Contributor Author

@mingwandroid Thanks for the work-around suggestions.

I'd also like to understand why conda-build is set up the way that it is presently. Is there a reason why someone would want to distribute the test files inside /info? AFAIK, there is no conda install --tests such that packages are tested on the end-user's machine. And if a package developer wanted the tests to be included in the distribution, then they would be included as part of the package not as conda metadata. Could this behavior be unintended consequences from when the source_files keyword was added as an option for meta.yaml? Can I open a pull request which prevents these files from being included in /info?

@mingwandroid
Contributor

mingwandroid commented Mar 21, 2019

I'd also like to understand why conda-build is set up the way that it is presently

Because being able to test that a package you built on computer X runs ok on computer X (with all the same temporary, build and host directories around not to mention the same distro/libc/OS version etc) is not nearly as good as being able to test that a package you built on computer X runs on a large range of computers which have never even seen the build environment. Also, the latter includes the former. You can test the tarball on computer X as well.

AFAIK, there is no conda install --tests such that packages are tested on the end-user's machine

There is: conda test (there are a few caveats to this, but we'll get round to fixing them at some point; namely, the dependencies all need to be in the same local channel). Also, all of our installers since Anaconda Distribution 5.0 can be passed a -t flag to run the tests embedded in each package in the installers. This is hugely useful for maintaining the quality of our releases.

We believe that being able to bundle tests atomically with a package is very powerful, especially when you are making software that attempts very broad compatibility. To be clear, when ArchLinux makes a new package, they only need to test it on one Linux distro, once, and they are free to assume that, modulo broken end-user hardware, the results would be the same everywhere. We are far from having such a luxury.

We were not concerned about package size bloat due to large test data because we have test/requires. This means you can either make packages with the testdata or add e.g. curl to test/requires and have your test scripts run curl to download that data at test time.

And if a package developer wanted the tests to be included in the distribution, then they would be included as part of the package not as conda metadata

How is this better? What you end up with is N different testing methods for N different packages and no hope of automation or of being able to do integration testing along with the other packages. I am all for packages having their own tests built-in too, but they should be accessible and run by conda tests.

Could this behavior be unintended consequences from when the source_files keyword was added as an option for meta.yaml

Do you mean, was this a mistake? No, absolutely not! This testing scheme was designed and discussed.

Can I open a pull request which prevents these files from being included in /info?

You can but it would be a huge amount of work since testing now requires the final tarball.

Can you explain why my workarounds are not appropriate for you?

@mingwandroid
Contributor

The other thing that independent testing brings us is the ability to cross-compile, then send the package to some different target hardware for testing.

@carterbox
Contributor Author

Thank you for that detailed response. I understand the value of testing for cross-compiled programs, but since I had not heard of conda-test before now, I didn't think it was a feature that was offered by conda. Is there documentation anywhere about this feature? I wasn't able to find it.

What you end up with is [...] no hope of automation or of being able to do integration testing along with the other packages

I'm not advocating for the termination of the test block from meta.yaml. I agree that running post-build tests automatically in this standard way makes sense. What I am suggesting is that the test (files) can be excluded from the tarball which most users download because I don't think most users "use the -t flag to run the tests embedded in each package". Additionally, I was offering to help reduce package bloat if I had, in fact, identified a bug.

Your workarounds are fine. In fact, I will probably implement your second suggestion: rework your tests so they download the files that are needed before commencing the actual test. However, it would be convenient for everyone if conda was capable of doing this automatically by packaging the files described in test/source_files separately and only downloading them if requested. In my case, this will cut the size of the package tarball in half; imagine if every package had a similar size reduction.

@mingwandroid
Contributor

I guess this comes down to a difference of opinion about what those tests in meta.yaml are and about testing philosophy. The things we speak of, these conda ecosystem tests and general software tests, are not mutually exclusive, and in most cases the conda ones build on the upstream software's existing tests.

You said earlier:

And if a package developer wanted the tests to be included in the distribution, then they would be included as part of the package not as conda metadata

These are not the (upstream) package developer's tests. These are the conda package packager's tests. To me they exist to ensure the quality and compatibility of the final binary artefact, and each packager is free to pick the weapons at their disposal that will best do that. In some cases it could mean installing a known-to-cause-trouble-before dependency alongside the test package (we have formalised this into downstream tests too) and running a few simple checks, but in other cases it could mean a lot more.

The upstream software developer is free to add their own tests to their source code and then, through their build system, make those tests installable if they wish. If they can't be made installable, then all we can do is run them during builds, as in make check in build.sh. However, if they can be installed, then it is eminently sensible to make them testable on end users' machines too (remember, we support a vast range of distros and OS versions), and this is what we do.

It also means any collection of these binary packages remains testable into the future (e.g. when new OS versions or hardware come out).

Clearly sometimes test files will be large.

Conda-build offers a solution for that, test/requires. I detailed two ways that can be used already. It also offers split-packages whereby multiple packages can be output from the same recipe and they can refer to each other as dependencies. You can use this to split your test data off easily.

It would be possible to split off the test data and scripts into a separate package, then make the tests in the real package simple stubs that defer to the test package. If someone were to make a patch for this, I wouldn't oppose it.
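To make the "simple stub" idea concrete, here is a hedged sketch of what such a stub might do on the Python side: locate data that a separate test-data package (listed in test/requires) has installed into the environment, and fail clearly if it is missing. The directory layout `share/my_package-testdata` and the function name `find_test_data` are purely hypothetical choices for illustration, not a conda-build convention.

```python
import sys
from pathlib import Path


def find_test_data(data_dir_name="my_package-testdata", prefix=None):
    """Return all files installed by the (hypothetical) test-data package.

    Looks under <prefix>/share/<data_dir_name>, defaulting to the prefix of
    the running interpreter, which is the test environment during a test run.
    """
    base = Path(prefix) if prefix is not None else Path(sys.prefix)
    root = base / "share" / data_dir_name
    if not root.is_dir():
        raise FileNotFoundError(
            f"no test data at {root}; is the test-data package in test/requires?"
        )
    return sorted(p for p in root.rglob("*") if p.is_file())
```

The stub in the main package would then just call `find_test_data()` and hand the resulting paths to the real test routines, so the main tarball carries only a few lines of test code.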

I guess I don't really see the point since chopping and changing how testing is done is so easy and we have so many options.

... in fact, in the past we used to collate the test data and scripts for many packages into one package, but this was very clumsy, so we changed it to the way it is now! Everything gets put in the package (unless you use test/requires), packages are atomically and collaboratively testable, and no one has to mess about with other files.

@carterbox
Contributor Author

Thanks for all your helpful discussion! In the short term, I have used test/requires to download test files for the corresponding release from GitHub using their svn API, but this requires that I maintain a long-lived branch for each minor release. In the long term, I will try to figure out how to build and package my tests separately using the same meta.yaml as the main package and without moving the tests to a separate repo.

@github-actions

Hi there, thank you for your contribution!

This issue has been automatically locked because it has not had recent activity after being closed.

Please open a new issue if needed.

Thanks!

@github-actions github-actions bot added the locked [bot] locked due to inactivity label Mar 10, 2022
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 10, 2022