conda forge package #1855
Hi, we could do that, but it would probably be low priority unless some other package (like OGB) wishes to be included in conda-forge. Also, I have noticed that PyTorch only has CPU builds on conda-forge, and TensorFlow 2.0 isn't included. AFAIK PyTorch ships (GPU) releases from its own conda repository and TensorFlow is included in the official Anaconda repository, so I guess our current practice of maintaining our own conda repository also sounds natural. |
ogb is now part of conda-forge (https://github.com/conda-forge/ogb-feedstock). conda-forge now allows the use of the … I am happy to give it a try if you guys are OK with that. |
I just noticed that dgl relies on a lot of external dependencies declared in .gitmodules (https://github.com/dmlc/dgl/blob/master/.gitmodules). They would need to be included in conda-forge before dgl can be added to it. |
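Listing those dependencies starts with reading the `[submodule "…"]` sections of `.gitmodules`. The sketch below is a minimal, hypothetical helper (not part of dgl or of any conda-forge tool) that parses that format with regular expressions; the sample text is an illustrative excerpt in the `.gitmodules` format, not a copy of dgl's actual file.

```python
import re

# Section header, e.g.: [submodule "third_party/dmlc-core"]
_SECTION = re.compile(r'^\s*\[submodule\s+"(?P<name>[^"]+)"\]\s*$')
# Key/value line, e.g.:     path = third_party/dmlc-core
_KEYVAL = re.compile(r"^\s*(?P<key>[\w.-]+)\s*=\s*(?P<value>.*?)\s*$")

# Illustrative .gitmodules excerpt (hypothetical, not dgl's real file).
SAMPLE_GITMODULES = """
[submodule "third_party/dmlc-core"]
    path = third_party/dmlc-core
    url = https://github.com/dmlc/dmlc-core.git
[submodule "third_party/METIS"]
    path = third_party/METIS
    url = https://github.com/KarypisLab/METIS.git
"""


def parse_gitmodules(text: str) -> dict[str, dict[str, str]]:
    """Map each submodule name to its settings (path, url, branch, ...)."""
    modules: dict[str, dict[str, str]] = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith(("#", ";")):
            continue  # skip blank lines and comments
        section = _SECTION.match(line)
        if section:
            current = modules.setdefault(section.group("name"), {})
            continue
        keyval = _KEYVAL.match(line)
        if keyval and current is not None:
            current[keyval.group("key")] = keyval.group("value")
    return modules


if __name__ == "__main__":
    for name, settings in parse_gitmodules(SAMPLE_GITMODULES).items():
        print(f"{settings['path']}: {settings['url']}")
```

Each resulting `path`/`url` pair is a candidate for either an existing conda-forge package or a new feedstock.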
Packages that should be included in conda-forge are:
(others are already available in conda-forge) |
I have opened an issue with conda-forge devs so they can provide us guidance about how to do that: conda-forge/staged-recipes#12537 |
A first attempt is being made at conda-forge/staged-recipes#12552 |
I was a bit verbose/explicit with dependencies. I based them on what is imported by modules within the package. `dgl` could be added as a dependency if it's added to conda-forge. See dmlc/dgl#1855 and #12522. Even though this is a pure Python package, I run a few of the fastest tests that exercise different files, because this package has some heavy dependencies (pytorch, pytorch_geometric, etc.) and I want that warm fuzzy feeling :) I get the tarball from GitHub instead of PyPI because the PyPI tarball does not (yet) contain the license file or the tests. Currently, the GitHub tarball is not much larger than the PyPI one.
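Whether a source tarball is usable for such a recipe comes down to which members it contains. Below is a minimal sketch of that check, assuming a hypothetical helper name (`missing_from_sdist`) and made-up archive layouts; it is not part of conda-build or any conda-forge tooling. It strips the top-level directory from each member name and reports which required files or directories are absent.

```python
import io
import tarfile


def missing_from_sdist(fileobj, files=("LICENSE",), dirs=("tests",)):
    """Return the required files/directories absent from a source tarball.

    Names are compared after dropping the top-level directory, so a
    member "pkg-1.0/LICENSE" counts as "LICENSE".
    """
    fileobj.seek(0)  # tarfile leaves an external file object open, so rewind first
    with tarfile.open(fileobj=fileobj, mode="r:*") as tar:
        names = [m.name.split("/", 1)[1] for m in tar.getmembers() if "/" in m.name]
    missing = [f for f in files if f not in names]
    missing += [d for d in dirs
                if not any(n == d or n.startswith(d + "/") for n in names)]
    return missing


def make_tarball(members):
    """Build an in-memory .tar.gz from {member name: bytes} (for trying the check)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in members.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    # Hypothetical layouts: a trimmed sdist vs. a full GitHub archive.
    pypi_like = make_tarball({"ogb-1.0/setup.py": b"", "ogb-1.0/ogb/__init__.py": b""})
    github_like = make_tarball({"ogb-1.0/LICENSE": b"MIT", "ogb-1.0/tests/test_x.py": b""})
    print(missing_from_sdist(pypi_like))    # ['LICENSE', 'tests']
    print(missing_from_sdist(github_like))  # []
```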
This issue has been automatically marked as stale due to lack of activity. It will be closed if no further activity occurs. Thank you |
I'm working on this right now, and the only blocker (so far!) I've run into is the third-party libraries that are included as git submodules. conda-forge wants to build from tarballs, not repos (https://conda-forge.org/docs/maintainer/adding_pkgs.html#build-from-tarballs-not-repos), which means I'll need to create a tarball that already includes the checked-out git submodules. The most straightforward way to do this would be for someone who has updated the git submodules locally to upload a copy of the repo as an asset on the releases page. This can be automated with a GitHub Action. For now I'm going to do this manually while I identify other potential issues. EDIT: We should be able to just use … |
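`git archive` does not include submodule contents, so such a release asset has to be packed from a working tree. Here is a minimal sketch of that packing step, assuming `git submodule update --init --recursive` has already been run; the function names and the `dgl-x.y.z` prefix are hypothetical. It uses `tarfile`'s `filter` callback to drop everything named `.git`.

```python
import tarfile
import tempfile
from pathlib import Path


def make_release_tarball(repo_dir, output, prefix):
    """Archive a checkout whose submodules are already populated as a .tar.gz.

    Every member is stored under `prefix/`, and anything named `.git`
    (the top-level .git directory and the .git files inside submodules)
    is skipped so the archive is a plain source tree.
    """
    def skip_git(info):
        # tarfile calls this for each member; returning None excludes it
        # (and, for a directory, everything below it).
        return None if ".git" in Path(info.name).parts else info

    with tarfile.open(output, "w:gz") as tar:
        tar.add(repo_dir, arcname=prefix, filter=skip_git)
    return Path(output)


def demo():
    """Build a throwaway tree that mimics a repo with one submodule and pack it."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "dgl"
        (root / ".git").mkdir(parents=True)
        (root / ".git" / "HEAD").write_text("ref: refs/heads/master\n")
        sub = root / "third_party" / "dmlc-core"
        sub.mkdir(parents=True)
        (sub / ".git").write_text("gitdir: ../../.git/modules/dmlc-core\n")
        (sub / "CMakeLists.txt").write_text("")
        (root / "CMakeLists.txt").write_text("")
        out = make_release_tarball(root, Path(tmp) / "dgl-full.tar.gz", "dgl-x.y.z")
        with tarfile.open(out) as tar:
            return sorted(tar.getnames())


if __name__ == "__main__":
    print("\n".join(demo()))
```

In a CI job, the same function could run right after the checkout step and its output could be attached to the release.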
@BarclayII here is the "in-progress" conda forge recipe for dgl: conda-forge/staged-recipes#18620 Do you think you could help with the vendored packages @mikemhenry has mentioned? |
I wasn't quite sure what I should do precisely on our side: is it that we need to copy all the source code in the submodules in the release tarball? |
@BarclayII Actually, I think conda-forge will be okay with using … What would be really helpful is if the CMake build system were set up so that it could use "system" versions of the packages in third_party (https://github.com/dmlc/dgl/tree/master/third_party), like you already do for NCCL (https://github.com/dmlc/dgl/blob/master/CMakeLists.txt#L184-L191). This is because conda-forge prefers to avoid packages vendoring other packages whenever possible. |
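Deciding between a "system" and a vendored copy amounts to checking whether the dependency is already installed in the build prefix. The sketch below approximates a few of the locations where CMake's config-mode `find_package()` looks (`<prefix>/lib/cmake/<Name>*/<Name>Config.cmake` or `<name>-config.cmake`); it is a hypothetical pre-build check, not how dgl or conda-forge actually decide, and the real CMake search procedure covers more paths. `GTest` and `METIS` are just example names.

```python
import tempfile
from pathlib import Path

# A few of the places where CMake's config-mode find_package() looks under
# a prefix; the real search procedure covers more locations than this.
_CONFIG_DIRS = ("lib/cmake", "lib64/cmake", "share/cmake")


def find_cmake_config(prefix, package):
    """Return the path of <Pkg>Config.cmake or <pkg>-config.cmake under prefix, or None."""
    candidates = (f"{package}Config.cmake", f"{package.lower()}-config.cmake")
    for sub in _CONFIG_DIRS:
        base = Path(prefix) / sub
        if not base.is_dir():
            continue
        for pkg_dir in sorted(base.iterdir()):
            if not (pkg_dir.is_dir() and pkg_dir.name.lower().startswith(package.lower())):
                continue
            for name in candidates:
                if (pkg_dir / name).is_file():
                    return pkg_dir / name
    return None


def plan_dependencies(prefix, packages):
    """Split packages into (use_system, use_vendored) lists."""
    system, vendored = [], []
    for pkg in packages:
        (system if find_cmake_config(prefix, pkg) else vendored).append(pkg)
    return system, vendored


def demo():
    """Fake a prefix where only GTest is installed, then plan GTest and METIS."""
    with tempfile.TemporaryDirectory() as prefix:
        cfg = Path(prefix) / "lib" / "cmake" / "GTest"
        cfg.mkdir(parents=True)
        (cfg / "GTestConfig.cmake").write_text("")
        return plan_dependencies(prefix, ["GTest", "METIS"])


if __name__ == "__main__":
    print(demo())  # (['GTest'], ['METIS'])
```

Libraries that ship no CMake config file would still need a find module or a vendored fallback.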
@BarclayII any updates on this? If there isn't bandwidth for the team to take care of this, would you accept a PR? |
@mikemhenry Please do. One caveat: I remember we previously bumped into a problem where we had to use CUB and thrust as set up in lines 46 to 61 (commit 4c14781).
Otherwise there seems to be a symbol collision problem, as described in #2758, and even running python examples/pytorch/gcn/train.py --dataset cora --gpu 0 will crash.
Once you have built the conda-forge packages against the system packages, could you run a test and see if it crashes? Thanks! |
@BarclayII Will do! In the worst case we might not be able to have a CUDA 10.2 build, but I think I might be able to include a vendored lib where it is needed to fix a bug with a particular CUDA version, so we will see how it goes! |
@BarclayII is there a plan to bring the …? Compiling dgl itself on conda-forge is quite easy (thanks, @mikemhenry, for the push with conda-forge/staged-recipes#18620), but the real bottleneck right now is that we need to package all the third-party libraries on conda-forge first (conda-forge/staged-recipes#18620 (comment)). The latest dgl 1.0.0 makes it even more challenging to install dgl CUDA packages, since they are hosted on different conda channels. At least for us internally, it's not practical to switch conda channels depending on the current CUDA version. If the dgl team has the bandwidth and is willing to help with packaging all the third-party libraries, that would be greatly appreciated. I am also happy to help, but I am unlikely to be able to tackle everything myself. The other benefit of a dgl conda-forge package is that, for any new Python, CUDA, or PyTorch version, the latest dgl package will be rebuilt against it automatically, lowering the packaging maintenance burden for the dgl team. So I feel it's a good thing to do: in the short term it will require a significant amount of work (packaging the third-party libraries), but in the long term, thanks to the conda-forge infrastructure, the packaging burden will be handled automatically by conda-forge itself. |
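The maintenance argument is mostly combinatorics: every supported Python, CUDA and PyTorch version multiplies the number of builds. The sketch below only illustrates that growth with a cartesian product; the function names and version lists are hypothetical and do not reflect conda-forge's actual pinning, which is driven by its global pinning files and migration bots.

```python
from itertools import product


def build_matrix(axes):
    """Expand {axis: [values]} into one dict per build variant (cartesian product)."""
    keys = list(axes)
    return [dict(zip(keys, combo)) for combo in product(*(axes[k] for k in keys))]


def added_variants(old_axes, new_axes):
    """Variants present in the new matrix but not in the old one."""
    seen = {tuple(sorted(v.items())) for v in build_matrix(old_axes)}
    return [v for v in build_matrix(new_axes) if tuple(sorted(v.items())) not in seen]


# Illustrative version lists only; not conda-forge's actual pinning.
OLD = {"python": ["3.9", "3.10"], "cuda": ["11.2", "11.8"], "pytorch": ["1.13"]}
NEW = {**OLD, "python": ["3.9", "3.10", "3.11"]}

if __name__ == "__main__":
    print(len(build_matrix(OLD)), len(build_matrix(NEW)))  # 4 6
    for variant in added_variants(OLD, NEW):
        print(variant)
```

Adding one Python version here adds one build per CUDA/PyTorch combination; on conda-forge those extra builds are scheduled by the infrastructure rather than by hand.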
It is on my backlog, but I am not a CMake wizard, so it would be much faster if someone from the … That should make it easy to patch them in the feedstock to use the third-party libraries that exist on conda-forge, and anything missing we can vendor (or package). |
@mikemhenry Is there an example conda-forge recipe that looks for conda-forge 3rd party libraries if available and uses its own 3rd party libraries if not? I can help on the CMake part but I'm not sure how to interact with conda-forge environments. |
@BarclayII I don't think this is specific to conda-forge, since conda-forge will always install all the dependencies of your package (the third-party libraries) in ${PREFIX}. It means that any standard … So the CMake modifications are mostly about adding a boolean flag that either looks for the installed third-party library or compiles and uses the copy from this repo. The logic could look something like this:

```bash
cmake .. \
    ${CMAKE_ARGS} \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_INSTALL_PREFIX=${PREFIX} \
    -DCMAKE_INSTALL_LIBDIR=lib \
    -DBUILD_SHARED_LIBS=ON \
    -DWITH_EXTERNAL_METIS=True \
    ${EXTRA_CMAKE_ARGS}
```

```cmake
option(WITH_EXTERNAL_METIS "Use an installed METIS instead of third_party/METIS" OFF)
message(STATUS "WITH_EXTERNAL_METIS=${WITH_EXTERNAL_METIS}")
if(WITH_EXTERNAL_METIS)
  # Use the METIS already installed in the environment (e.g. under ${PREFIX}).
  find_package(Metis REQUIRED)
  target_link_libraries(DGL PUBLIC Metis::Metis)
  target_include_directories(DGL SYSTEM PUBLIC ${METIS_INCLUDE_DIRS})
else()
  # Add the custom dgl logic to use METIS from the third_party folder.
endif()
```

That being said, not all the deps ship a … Beyond modifying the dgl CMake logic, I think the challenge is also in packaging all the third-party deps into conda-forge. |
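For scripting or testing a recipe, the same cmake invocation can be assembled as an argument list. This is a minimal sketch under the assumption that each externally provided dependency gets a `-DWITH_EXTERNAL_<NAME>=ON` flag; only `WITH_EXTERNAL_METIS` comes from the proposal above, so any other flag name is hypothetical, and `ON` is used in place of `True` (both are true values in CMake). `shlex.split` approximates the shell's word splitting of `${CMAKE_ARGS}`.

```python
import os
import shlex


def cmake_command(prefix, external=(), cmake_args="", extra_cmake_args="", source_dir=".."):
    """Assemble the argument list of the cmake call, in the same order as the shell version.

    `external` lists the third-party deps to take from the environment; each
    becomes -DWITH_EXTERNAL_<NAME>=ON (a hypothetical naming scheme beyond METIS).
    """
    cmd = ["cmake", source_dir, *shlex.split(cmake_args)]
    cmd += [
        "-DCMAKE_BUILD_TYPE=Release",
        f"-DCMAKE_INSTALL_PREFIX={prefix}",
        "-DCMAKE_INSTALL_LIBDIR=lib",
        "-DBUILD_SHARED_LIBS=ON",
    ]
    cmd += [f"-DWITH_EXTERNAL_{name.upper()}=ON" for name in external]
    cmd += shlex.split(extra_cmake_args)
    return cmd


def cmake_command_from_env(external=()):
    """Same, reading PREFIX, CMAKE_ARGS and EXTRA_CMAKE_ARGS from the environment."""
    env = os.environ
    return cmake_command(env["PREFIX"], external,
                         env.get("CMAKE_ARGS", ""), env.get("EXTRA_CMAKE_ARGS", ""))


if __name__ == "__main__":
    cmd = cmake_command("/opt/env", external=["metis"], cmake_args="-DCMAKE_AR=ar")
    print(shlex.join(cmd))
    # To actually configure: subprocess.run(cmd, check=True, cwd="build")
```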
@hadim Exactly!
If the package is already on conda-forge, then we don't need to worry about it. My proposal is:
|
I don't think I'll have time to work on those packages in the short term, but I did a package analysis in case it helps:
For any of those packages, it might be useful to get the following info from the DGL devs:
|
Wow this is way better than I expected! |
All these packages can use the latest version I think.
|
That is great news! The next step will be to make some CMake modifications so a user can tell CMake to look for system packages instead of the vendored ones. |
@hadim @mikemhenry I have been making some progress on the CMake changes. I'll post updates here as I go. The main issue is that I have had to work from the 0.8.2 tag, as dgl > 0.8.2 requires CUB >= 1.17 due to this file here. CUB 1.17 is not on conda-forge (and is unlikely to be, as the feedstock is now read-only). My understanding is that it has been moved inside … Working from that branch seems to be going well. I'll post an updated table soon. |
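Whether the CUB found in an environment satisfies the ">= 1.17" requirement can be read from its version header. The sketch below assumes that `cub/version.cuh` defines `CUB_VERSION` encoded as major * 100000 + minor * 100 + patch (so 1.17.0 is 101700); the helper names are hypothetical and this is not part of dgl's build.

```python
import re

_CUB_VERSION = re.compile(r"#\s*define\s+CUB_VERSION\s+(\d+)")


def cub_version(header_text):
    """Decode CUB_VERSION from the text of cub/version.cuh into (major, minor, patch).

    Assumes the encoding major * 100000 + minor * 100 + patch, e.g. 101700 -> 1.17.0.
    """
    match = _CUB_VERSION.search(header_text)
    if match is None:
        raise ValueError("no CUB_VERSION define found")
    v = int(match.group(1))
    return (v // 100000, v // 100 % 1000, v % 100)


def needs_vendored_cub(header_text, minimum=(1, 17, 0)):
    """True if the installed CUB is older than `minimum` (1.17 for dgl > 0.8.2)."""
    return cub_version(header_text) < minimum


if __name__ == "__main__":
    print(cub_version("#define CUB_VERSION 101700"))         # (1, 17, 0)
    print(needs_vendored_cub("#define CUB_VERSION 101502"))  # True
```

A check like this could let the build fall back to a vendored CUB only when the one available is too old.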
Awesome, I've been keeping tabs on getting cuda 12 on conda-forge, that is being tracked here: conda-forge/staged-recipes#21382 |
We are using CUB 1.17, but still with CUDA 10.2 and 11+ since it is a header-only library and it worked fine. Also DGL 0.8.2+ does not run on CUDA 12 for now. |
Right so a possible option could be to vendor |
The dependencies on |
I can look into getting those on conda-forge, @hmacdope this would be another argument for building a newer version |
Vendored |
This issue can be closed: we have 1.1.0 on conda-forge and will work on the newer versions now. |
Can you guys consider building your conda packages for conda-forge (https://github.com/conda-forge/staged-recipes)? It now accepts packages that depend on PyTorch and TensorFlow, so DGL could be built without issues.
The main motivation is to be able to add packages that depend directly on dgl, such as ogb (https://github.com/snap-stanford/ogb), to conda-forge.