Publishing nightly packages #76

Closed
jakirkham opened this issue Jul 27, 2020 · 18 comments

@jakirkham
Member

When doing active development, it can be quite useful to have nightly packages of things that are being developed in tandem. In the context of Dask, this comes up when developing a downstream library that depends on Dask (often requiring the very latest changes). I'm curious what others think about supplying nightly packages of dask and distributed?

@kkraus14
Member

FYI: This is an ongoing problem for us at cuDF, as there are often PRs of ours where we need to make matching changes in both cuDF and Dask, and we end up leaving things in a broken state until a new Dask release is pushed out. If there were nightly conda and PyPI packages, that would largely solve our problems.

@mrocklin
Member

mrocklin commented Jul 27, 2020 via email

I personally have no objections to this. Where would the packages live? A new Dask channel? Does conda-forge have a dev channel? Who would maintain this and keep it running (I'm guessing @jakirkham and the NVIDIA folks). Are there any other concerns that we should be aware of? John, you're probably the expert here.

@mrocklin
Member

mrocklin commented Jul 27, 2020 via email

I think that most people will pip install from master when they want an up-to-date version. I'm guessing that that doesn't work for RAPIDS development for some reason though.

@kkraus14
Member

I think that most people will pip install from master when they want an up-to-date version. I'm guessing that that doesn't work for RAPIDS development for some reason though.


I think the challenge here is that there's no clean way to integrate that into a conda package dependency. I.e., as we make a coordinated change in Dask and RAPIDS, once the PR is merged in Dask we could upgrade the conda pinning in the RAPIDS recipe to ensure we get the newer Dask version.
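
To make that concrete: conda treats alpha-suffixed versions as pre-releases, so once nightlies exist a downstream recipe or environment can pin past a just-merged change. A minimal sketch, assuming a dev label on the dask channel like the one adopted later in this thread (the version shown is invented for illustration):

  # A pre-release build such as 2020.7.0a200728 satisfies this constraint,
  # while the last stable release does not (channel and versions are hypothetical)
  conda install -c dask/label/dev -c conda-forge "dask>=2020.7.0a0"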

@quasiben
Member

One option for nightlies is that we could publish to the dask channel on anaconda:
https://anaconda.org/dask/
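
For reference, uploading there is a one-liner with anaconda-client; a minimal sketch, assuming an API token for the dask org and an already-built package (both are assumptions, not an agreed-upon setup):

  # Upload a locally built conda package to the dask channel on anaconda.org
  anaconda --token "$ANACONDA_TOKEN" upload --user dask noarch/dask-core-*.tar.bz2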

@jakirkham
Member Author

I personally have no objections to this.

Great!

Where would the packages live? A new Dask channel? Does conda-forge have a dev channel?

A dask or dask-nightly channel would make sense. conda-forge doesn't support nightlies.

Who would maintain this and keep it running (I'm guessing @jakirkham and the NVIDIA folks).

That's my guess as well. Though it would be nice to have someone from the cuDF side on the hook as well (since that's driving the demand 😉).

Are there any other concerns that we should be aware of? John, you're probably the expert here.

The main things to consider are where these get built, how to signal that these are dev packages, and how to communicate build issues to the relevant maintainers.

As far as building goes, doing this as part of Dask and Distributed CI seems like the natural place. This ensures that when new changes get integrated, packages are published in pretty rapid order. It also avoids adding a separate thing to maintain and coordinate.

The next thing would be communicating to users that these are only dev packages and they should go to more formal channels (conda-forge, defaults) for final releases. This may involve naming the channel accordingly or using a label to indicate these are development builds. A small blurb in the docs next to other install instructions might also be useful for relaying this information.
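
As a rough illustration of the label idea (assuming anaconda-client's `--label` flag and conda's channel/label install syntax):

  # Publish under a "dev" label so nightlies don't appear on the channel's main label
  anaconda --token "$ANACONDA_TOKEN" upload --user dask --label dev noarch/dask-core-*.tar.bz2
  # Users then have to opt in explicitly to receive development builds
  conda install -c dask/label/dev dask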

Lastly, it would be good to set up a GitHub team for those maintaining nightlies, to direct communication/issues in that direction.

Those are the additional points I can think of atm, though please feel free to raise others 🙂

@kkraus14
Member

That's my guess as well. Though it would be nice to have someone from the cuDF side on the hook as well (since that's driving the demand 😉).

FWIW it's more than just the cuDF side, cuML runs into these situations as well, albeit not as frequently as cuDF. I'm happy to be "on the hook".

The main things to consider are where these get built, how to signal that these are dev packages, and how to communicate build issues to the relevant maintainers.

As far as building goes, doing this as part of Dask and Distributed CI seems like the natural place. This ensures that when new changes get integrated, packages are published in pretty rapid order. It also avoids adding a separate thing to maintain and coordinate.

Agreed. It looks like there are Travis CI jobs that run after PRs are merged, which would likely be a good place for the uploads to happen given the packages are pure Python: https://travis-ci.org/github/dask/dask/builds
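
A sketch of how that post-merge upload could be guarded so PR builds never publish (the environment variables follow Travis conventions; the script itself is an assumption):

  # Only publish from post-merge builds of the default branch, never from PR builds
  if [ "$TRAVIS_BRANCH" = "master" ] && [ "$TRAVIS_PULL_REQUEST" = "false" ]; then
      anaconda --token "$ANACONDA_TOKEN" upload --user dask --label dev dist/noarch/*.tar.bz2
  fi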

@jakirkham
Member Author

Thanks Keith! Happy to have someone from cuML as well 🙂

@quasiben
Member

If we want to publish to https://anaconda.org/dask/ I have admin access and can generate tokens. @kkraus14 is there someone who can sit with John and me while we get this up and running?

@kkraus14
Member

If we want to publish to https://anaconda.org/dask/ I have admin access and can generate tokens. @kkraus14 is there someone who can sit with John and me while we get this up and running?

That someone is me 😄

@jsignell
Member

This would be on the dev label of the dask channel, right?

@TomAugspurger
Member

Part of the uploading job should probably delete older versions. The ones uploaded to https://anaconda.org/scipy-wheels-nightly/ are cleared after some number of days.

@charlesbluca
Member

Reviving this issue because there has recently been some demand from RAPIDS folk to publish nightlies of dask-sql:

https://github.com/dask-contrib/dask-sql

and the dask channel seems like a sensible place to host these. Would there be any objection to using dask-sql as a trial for publishing nightlies to the dask channel, and then pushing on Dask/Distributed nightlies once that's working?

@jrbourbeau
Member

No objection from me. I am curious why pip installing from main isn't sufficient, but I don't mean for this to be a blocking comment.

@jakirkham
Member Author

There are a lot of dependencies (some from the Java world) and more configuration involved. Ideally this would just be a self-contained conda package, to make things easier for people to work with.

@charlesbluca
Member

John and I got this working with dask-contrib/dask-sql#263, and now there are dask-sql nightlies uploaded with every commit:

https://anaconda.org/dask/dask-sql/files

Some thoughts on the process, in case we want to try this on Dask/distributed later on:

  • we should make sure the local conda recipe stays synced up with the conda-forge recipe; this can be done manually, but I'm interested in whether we could/should try to automate this
  • we still don't have anything in place to remove old builds; right now the plan is to purge them manually, maybe with each release, but depending on the flexibility of the Anaconda API it might make sense to have a scheduled job that runs `anaconda remove` on any files past a certain point (the cutoff could be a number of days or a number of files, depending on the repo's activity; see the sketch below)
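
A minimal sketch of that scheduled cleanup idea, assuming anaconda-client's `remove` subcommand (the spec format is user/package/version; the version shown is invented for illustration):

  # Drop all files belonging to an old pre-release version from the dask channel
  anaconda --token "$ANACONDA_TOKEN" remove --force dask/dask-sql/2021.11.0a211101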

jsignell pushed a commit to dask/dask that referenced this issue Dec 20, 2021
As part of addressing dask/community#76, this PR adds:

- A conda recipe in `continuous_integration` to build a nightly `dask-core` package
- A GHA workflow to build this nightly as a check for PRs, and upload this package to the Dask channel under the `dev` label for pushes to `main`
jsignell pushed a commit to dask/distributed that referenced this issue Jan 31, 2022
As part of dask/community#76 and following up on dask/dask#8469, this PR adds:

- Conda recipes in `continuous_integration` to build `distributed` and `dask` pre-release packages from the Git repo
- A GHA workflow to build the pre-release packages as a check for PRs, and to upload these packages to the Dask channel under the `dev` label for pushes to `main`
@charlesbluca
Member

With dask/dask#8469 and dask/distributed#5636 in, we now have Dask + Distributed pre-release packages being published:

conda install dask/label/dev::dask
...
  + dask               2022.01.1a220201  py38_gb581bb6f_6     dask/label/dev/noarch         5 KB
  + dask-core          2022.01.1a220131  py38_gfab25d4b_1     dask/label/dev/noarch       798 KB
  + distributed        2022.01.1a220201  py38_gb581bb6f_6     dask/label/dev/linux-64       1 MB

Some good next steps from here could be working on using the pre-release packages in Dask's (and other downstream libraries') CI, and seeing if there's a reasonable solution to automate the removal of the packages after a certain point in time.
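
For the first of those, a downstream project's CI could opt into the dev-label packages when setting up its test environment; a sketch, assuming the dev channel is listed first so its pre-release versions take priority over conda-forge:

  # Pull nightly dask/distributed ahead of the stable conda-forge builds in CI
  conda install -c dask/label/dev -c conda-forge dask distributed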

@jakirkham
Member Author

Thanks Charles! 😄

Going to go ahead and close this, as the original issue has since been resolved. If other libraries want to follow this workflow for nightlies, the work done here hopefully provides a good template.

Currently the dask channel has no storage limits and the packages themselves are quite small. That said, if this does become an issue, I agree we should look into cleaning up old packages after some point in time, though we can cross that bridge when we get there.
