Installing autogluon in conda #612
AutoGluon is not currently available via conda, but we plan to add it in the future. For now, please follow these instructions to install AutoGluon in a conda environment: https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-pkgs.html#installing-non-conda-packages
Is GPU supported for tabular data in AutoGluon?

#263 tracks GPU support for Tabular; I believe the Tabular Neural Network supports GPU at present.
I upvote this feature request.

I upvote this feature request. Actually, it could be conda-forge instead of conda: it's much easier to upload to, and many frameworks already do so (which means it's easier when a pip installation already exists). It really would be a major "minor" improvement to see autogluon on conda-forge. https://conda-forge.org/#add_recipe

upvote

+1 for this. Especially since autogluon doesn't support the latest dask version and thus collides with other packages that require it. Sadly, due to autogluon's structure it isn't possible to just run grayskull on the main package and publish that; all subpackages need to be packaged on conda-forge.
It is time to add autogluon to conda. This package has more than 5k stars. I'm going to reach out to the conda Gitter for advice on this. As @kelszo said, all the sub-packages must be on conda-forge. They probably are already there, as conda-forge has more than 20,000 packages. If they are not, like the dask version required by autogluon, we can make a list of packages that need to be updated in autogluon to be able to publish on conda-forge. I will do the following:
Funny, I was just looking for autogluon on conda-forge today.
I have published some packages on conda-forge before. I can help here. Who would like to be listed as the autogluon conda-forge recipe maintainers?

You can edit that later, but absolutely add them, especially @Innixma.
I have submitted a PR to add
This is awesome, and long overdue!! Thank you so much @giswqs for initiating this, and @arturdaraujo and @PertuyF for providing additional information. @gradientsky & @tonyhoo: please monitor this thread going forward; we should prioritize making AutoGluon available on conda-forge by v0.7 at the latest. Trying for v0.6.2 would be good to get us familiar. We should also investigate automating GitHub Actions to do the conda-forge release at the same time as the PyPI release.

Conda-forge has pretty neat CI/CD that may take care of this, at least in part, if combining packages from PyPI isn't too complex.

I use GitHub Actions to automatically release the package to PyPI. It is easy to set up. Here is a sample yml file.
I see that all packages live in the same repo and are cross-referenced at build time; however, the cross references are unreachable from the sdist published on PyPI. This would also ensure versions are in sync for each release.
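The "versions in sync for each release" property mentioned above can be enforced mechanically with a small release-time check. A minimal sketch (the submodule names and versions here are illustrative, not read from the actual repo):

```python
# Hypothetical release-time sanity check: every autogluon submodule should
# declare one identical version for a given release. Names/versions are
# placeholders for illustration.
SUBMODULE_VERSIONS = {
    "autogluon.core": "0.6.2",
    "autogluon.tabular": "0.6.2",
    "autogluon.multimodal": "0.6.2",
    "autogluon.timeseries": "0.6.2",
}

def versions_in_sync(versions):
    """Return True when all submodules declare the same version string."""
    return len(set(versions.values())) == 1
```

In a real release pipeline, the dictionary would be populated by reading each submodule's version file rather than hard-coded.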
Regarding those old dask and distributed dependencies we have in 0.6.2, I am looking into removing them entirely: #2691. I think those dependencies are a remnant of old, deleted code and are no longer necessary. Hopefully they will be gone in v0.7.
While debugging the multimodal stuff, I noticed that autogluon.core depends on
Pinning to a package that's over a year old is... bad. If you want tight & well-tested ranges, the flipside is that you need to keep them up to date more or less all the time. Obviously some slack will happen, but basically for every release, each upper bound should be double-checked and raised to the most recent version unless there are really substantial problems.

Numpy in particular should get a different treatment: the promise there (& very strictly held) is that on a warning-free build of
OK, I managed to figure out some things in conda-forge/autogluon.multimodal-feedstock#15. I'm categorizing the various dependencies into a couple of different classes, based on how I suggest you should handle this for 0.7.0. I've collected the most current version (in conda-forge) in a comment on the right. (Note: these are just the dependencies of autogluon.multimodal.)

**Hard incompatibilities**

With the existing pins, it was not possible to solve the environments, while it does work without an upper bound. I have not investigated where exactly the break between passing and failing is located. These are crucial to fix.
**Pytorch: ABI-dependent**

These are some of the hardest dependencies (in general, but especially for making sure the CUDA stuff works), because they need to be built for one globally uniform pytorch version (otherwise they become incompatible with each other), which in conda-forge is now 1.13. This will be less of an issue in the future (when existing autogluon packages will still work fine with older pytorch if a newer version becomes available), but for now it's equally crucial to lift the pins for these:

```diff
- - pytorch >=1.9,<1.13
- - torchvision <0.14.0
- - torchtext <0.14.0          # [not arm64]
- - fairscale >=0.4.5,<=0.4.6  # [not win]
+ - pytorch >=1.9,<1.14        # pytorch @ 1.13.1
+ - torchvision <0.15          # torchvision @ 0.14.1
+ - torchtext <0.15            # [not arm64]  # torchtext @ 0.14.1
+ - fairscale >=0.4.5,<0.4.14  # [not win]    # fairscale @ 0.4.13
```

**Breaking changes**

These should IMO be made compatible if possible. I found these because removing the upper bounds makes the import of
**Pytorch: secondary packages**

These only run-depend on pytorch, so it's less of an issue, but the pins should still be lifted if possible. As discussed above, nlpaug 1.1.11 is necessary to support pytorch CUDA in conda-forge (but mostly by a quirk of packaging).

**Other key packages in the ecosystem**

Aside from the specific comments on numpy, it's quite user-hostile to limit these key packages to anything less than their most current version.

**Far behind**

Just based on the version number, autogluon.multimodal is quite far behind with its upper bounds relative to what's available already. This is not ideal for both users & the solver, which has to go far back in time to pick them up (incurring potential other conflicts).

**A bit behind**

**Up to date**

Side note: hard pins (
@h-vetinari Thank you for the amazing work!! By building the artifacts locally using your recipe, I can confirm that installing autogluon.multimodal now works:

```shell
python build-locally.py
conda create -n agu -c "file://${PWD}/build_artifacts" -c conda-forge autogluon.multimodal
```
@h-vetinari Absolutely fantastic deep dive! Very useful. Regarding dask/distributed: These were old dependencies that are no longer necessary, in fact we didn't even use dask/distributed at all in v0.6, but we did not realize we could remove them. They have been fully removed in #2691, and won't be present in v0.7. I agree that our team should be careful to not let these old dependencies linger without updates. I think being in conda-forge will help force us to adopt best practices here.
I think your reasoning for numpy makes a lot of sense, and I'll consider increasing the upper limit beyond what is released. Are you suggesting that I should do the same treatment for scipy and scikit-learn? (Note: scikit-learn broke us with a minor release in the past, without prior warning.) For all core/tabular dependencies, I am tracking version updates for v0.7 here: #2813. For timeseries/multimodal dependencies, these are not as closely tracked by me due to having slightly less context. I think timeseries dependencies are largely OK. @sxjscience please refer to this comment and see if we can address concerns regarding multimodal dependencies for the v0.7 release.
@h-vetinari Thanks for the comments! @Innixma Do we need to loosen the bound in autogluon/core/src/autogluon/core/_setup_utils.py (lines 22 to 29 in dc21d0b)?
@sxjscience I am planning to loosen the bound on numpy. For scipy and scikit-learn, I will await a response from @h-vetinari on his thoughts. (I intend to upgrade to the latest; the question is whether to have version ranges go beyond what has been released.) For Pillow, this is entirely up to those working on multimodal, since that is the only module that depends on Pillow. I would recommend avoiding micro version caps though, such as
Happy to hear it 🙃
I'm not suggesting that you need to compromise your testing coverage or strategy; the main point is that people really eagerly want to use the latest numpy/scipy/pandas/scikit-learn version (features, fixes, etc.), and barring them from doing so should be avoided where possible (e.g. by ensuring you've tested against the newest versions of those packages available at the time of an autogluon release).

In my mind, there's a sort of sliding scale of how conservative projects are with their APIs, where numpy is most conservative, then scipy, then scikit-learn. I think for numpy it's fine to proceed as I described above. In contrast to numpy/scipy, pandas promises to use semver, so theoretically

As I mentioned above, all this is relevant mostly for the PyPI side of things, where the metadata in a released version is immutable (unless you yank the whole release). In conda-forge, we have the ability to introduce version caps for a given release after the fact, so that takes quite a bit of the pressure off (though it's still not fun to have to respond to, so by all means, cap with what you're comfortable with, but try to aim for as expansive as you can make it).
Yes, please! In general, try not to prohibit patch version updates (unless it's a zero-ver library where you have the expectation that things will break), but add
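As a hedged illustration of this pinning advice (the package names and version numbers below are made up for the example, not AutoGluon's real pins), a dependency table might cap each package at the next untested release series rather than at a micro version:

```python
# Illustrative only -- not AutoGluon's actual _setup_utils.py. The idea:
# lower bound = oldest version tested, upper bound = the first untested
# release series; never a micro/patch cap such as <9.4.1, which would also
# block pure bug-fix releases.
ILLUSTRATIVE_RANGES = {
    "numpy": (">=1.21", "<1.27"),
    "scipy": (">=1.5.4", "<1.11"),
    "scikit-learn": (">=1.1", "<1.3"),
}

def requirement_strings(ranges):
    """Render pip-style specifiers like 'numpy>=1.21,<1.27'."""
    return [f"{name}{lo},{hi}" for name, (lo, hi) in sorted(ranges.items())]
```

Rendering the table with `requirement_strings(ILLUSTRATIVE_RANGES)` produces strings suitable for `install_requires` in a setup script.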
@h-vetinari We have adopted this strategy for our dependency management (new ranges). Thanks again for the suggestion! We have also updated all version ranges to include the latest releases for all packages across all submodules (with the exception of networkx 3.0, which we will do in v0.8). The only version ranges which have not yet been upgraded are in AutoGluon TimeSeries. The upgrades for those dependencies are tracked in #2831 and are planned for the v0.7 release.

Great news, looking forward to the release! :) Thanks a lot for the work on this!
Finally, the last two dependencies (

- autogluon.core
- autogluon.tabular
- autogluon.multimodal
- autogluon.timeseries
- autogluon
```shell
mamba install -c conda-forge autogluon.tabular
```

```python
from autogluon.tabular import TabularPredictor, TabularDataset

if __name__ == '__main__':
    train_path = 'https://autogluon.s3.amazonaws.com/datasets/Inc/train.csv'
    test_path = 'https://autogluon.s3.amazonaws.com/datasets/Inc/test.csv'
    label = 'class'
    train_data = TabularDataset(train_path)
    test_data = TabularDataset(test_path)
    subsample_size = 10000  # subsample data for a faster demo; try much larger values
    if subsample_size is not None and subsample_size < len(train_data):
        train_data = train_data.sample(n=subsample_size, random_state=0)
    predictor = TabularPredictor(label=label).fit(train_data)
    predictor.persist_models('all')
    leaderboard = predictor.leaderboard(test_data)
```
Just to set expectations, this is blocked indefinitely until we find a way to build pytorch on windows: conda-forge/pytorch-cpu-feedstock#32 |
@h-vetinari Thanks for the heads up. Really appreciate your work on building these challenging packages on conda-forge! Without your help, we could not have made it this far. I don't expect the pytorch Windows build will be available any time soon. For the three remaining items on the list, I think

- autogluon.core
- autogluon.multimodal
Thanks @h-vetinari!
```shell
conda install -n base mamba -c conda-forge
mamba create -n ag autogluon python -c conda-forge
```

The Windows installation:

```shell
conda install -n base mamba -c conda-forge
mamba create -n ag autogluon.tabular autogluon.timeseries python -c conda-forge
```

This issue can be closed now.
FYI, instructions could be simplified to

```shell
mamba create -n ag autogluon python=3.9 -c conda-forge
```

The python version is optional; the most recent allowed by the solver will be installed. Hence if autogluon is built for 3.9 max, it is taken care of already during environment creation. In case you want to specifically provide guidance for mamba install, it's better to install it in the base environment, alongside conda itself:

```shell
conda install -n base mamba -c conda-forge
```

This way mamba is "centralized" and you don't have to install it in each environment.
An astounding amount of work has been put into adding AutoGluon to conda-forge. With 133 comments, this has more than double the comments of our 2nd-most-commented GitHub issue (51), so finally marking this as resolved is quite an exciting feeling! Kudos to everyone:
@PertuyF Thank you for the suggestion. I have simplified the installation instructions as follows. Thank you everyone for your support during this long journey! Special thanks to @PertuyF for the many suggestions and @h-vetinari for helping build some of the most challenging conda-forge recipes! This would not be possible without your help. Thank you.

For Linux and macOS:

```shell
conda install -n base mamba -c conda-forge
mamba create -n ag autogluon python -c conda-forge
```

For Windows:

```shell
conda install -n base mamba -c conda-forge
mamba create -n ag autogluon.tabular autogluon.timeseries python -c conda-forge
```
Hi, I am getting the following issue while installing autogluon in a conda environment. Can you please help?