Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

conda forge package #1855

Closed
hadim opened this issue Jul 23, 2020 · 31 comments
Closed

conda forge package #1855

hadim opened this issue Jul 23, 2020 · 31 comments
Labels
feature request Feature request

Comments

@hadim
Copy link

hadim commented Jul 23, 2020

Can you guys consider building your conda packages to conda-forge (https://github.com/conda-forge/staged-recipes)?

It now accepts packages that depend on pytorch and tf so DGL could be built without issues.

The main motivation is to be able to add to conda-forge, packages that depend directly on dgl such as ogb (https://github.com/snap-stanford/ogb).

@BarclayII
Copy link
Collaborator

Hi,

We could do that but it would probably be low-priority unless some other package (like OGB) wishes to be included in conda-forge but it is blocked by DGL not being in conda-forge.

Also, I have noticed that PyTorch only has CPU included in conda-forge, and TensorFlow doesn't have 2.0 included. AFAIK PyTorch ship (GPU) releases from their own conda repository and Tensorflow is included in Anaconda official repository, so I guess our current practice of maintaining our own conda repository also sounds natural.

@hadim
Copy link
Author

hadim commented Sep 2, 2020

ogb is now part of conda forge: https://github.com/conda-forge/ogb-feedstock

conda forge now allows the use of the anaconda channel which has tf >=2.0 and also cudatoolkit. cudatoolkit-dev is also available in conda-forge. That should allow you to build dgl in conda-forge.

I am happy to give it a try if you guys are ok with that.

@hadim
Copy link
Author

hadim commented Sep 2, 2020

I just noticed dgl relies on a lot of external deps from https://github.com/dmlc/dgl/blob/master/.gitmodules

They would need to be included in conda-forge before adding dgl to it.

@hadim
Copy link
Author

hadim commented Sep 2, 2020

Packages that should be included in conda forge are:

(others are already available in conda-forge)

@hadim
Copy link
Author

hadim commented Sep 2, 2020

I have opened an issue with conda-forge devs so they can provide us guidance about how to do that: conda-forge/staged-recipes#12537

@hadim
Copy link
Author

hadim commented Sep 3, 2020

A first attempt is being made at conda-forge/staged-recipes#12552

dopplershift pushed a commit to conda-forge/staged-recipes that referenced this issue Dec 15, 2020
I was a bit verbose/explicit with dependencies.  I base them on what is imported by modules within the package.

`dgl` could be added as a dependency if it's added to conda-forge.  See dmlc/dgl#1855 and #12522

Even though this is a pure Python package, I run a few of the fastest tests that exercise different files, because
this package has some heavy dependencies (pytorch, pytorch_geometric, etc) and I want that warm fuzzy feeling :)

I get the tarball from github instead of PyPI, because the PyPI tarball does not (yet) contain the license file or the tests.
Currently, the github tarball is not that much larger than from PyPI.
@github-actions
Copy link

This issue has been automatically marked as stale due to lack of activity. It will be closed if no further activity occurs. Thank you

@Rhett-Ying Rhett-Ying added feature request Feature request and removed stale-issue labels Mar 28, 2022
@mikemhenry
Copy link

mikemhenry commented Apr 11, 2022

I'm working on this right now, and the only blocker (so far!) I've run into is the 3rd party libraries that are included with git submodules. conda-forge wants to build from tarballs and not repos: https://conda-forge.org/docs/maintainer/adding_pkgs.html#build-from-tarballs-not-repos which means that I'll need to create a tarball that includes the git submodules already checked out.

The most straightforward way to do this would be to have someone upload a copy of the repo as an asset on the releases page that has locally updated the git submodules. This can be automated with a github action. For now I'm going to do this manually while I identify other potential issues.

EDIT: We should be able to just use git_url which will take care of the submodule "issues".

@hadim
Copy link
Author

hadim commented Apr 22, 2022

@BarclayII here is the "in-progress" conda forge recipe for dgl: conda-forge/staged-recipes#18620

Do you think you could help with the vendored packages @mikemhenry has mentioned?

@BarclayII
Copy link
Collaborator

I wasn't quite sure what I should do precisely on our side: is it that we need to copy all the source code in the submodules in the release tarball?

@mikemhenry
Copy link

@BarclayII Actually I think that conda-forge will be okay with using git_url which will pull the source code in the submodules. That is exactly what I was asking for, but I don't think it is strictly necessary.

What would be really helpful is if the cmake build system was setup in a way where it could grab "system" versions of the packages here:

https://github.com/dmlc/dgl/tree/master/third_party

Like you do here for NCCL: https://github.com/dmlc/dgl/blob/master/CMakeLists.txt#L184-L191

This is because conda-forge likes to avoid packages vendoring other packages if it can be done.
A header only library is okay, we will just need to make sure we package the license for it.

@mikemhenry
Copy link

@BarclayII any updates on this? If there isn't bandwidth for the team to take care of this, would you accept a PR?

@BarclayII
Copy link
Collaborator

@mikemhenry Please do.

One caveat is that I remember we previously bumped into a problem where we have to use CUB and thrust in third_party directory for older CUDA versions:

dgl/CMakeLists.txt

Lines 46 to 61 in 4c14781

if(USE_CUDA)
message(STATUS "Build with CUDA support")
project(dgl C CXX)
include(cmake/modules/CUDA.cmake)
if ((CUDA_VERSION_MAJOR LESS 11) OR
((CUDA_VERSION_MAJOR EQUAL 11) AND (CUDA_VERSION_MINOR EQUAL 0)))
# For cuda<11, use external CUB/Thrust library because CUB is not part of CUDA.
# For cuda==11.0, use external CUB/Thrust library because there is a bug in the
# official CUB library which causes invalid device ordinal error for DGL. The bug
# is fixed by https://github.com/NVIDIA/cub/commit/9143e47e048641aa0e6ddfd645bcd54ff1059939
# in 11.1.
message(STATUS "Detected CUDA of version ${CUDA_VERSION}. Use external CUB/Thrust library.")
cuda_include_directories(BEFORE "${CMAKE_SOURCE_DIR}/third_party/thrust")
cuda_include_directories(BEFORE "${CMAKE_SOURCE_DIR}/third_party/cub")
endif()
endif(USE_CUDA)

Otherwise there seem to be a symbol collision problem as described in #2758, and even running python examples/pytorch/gcn/train.py --dataset cora --gpu 0 will crash.

Once you built the conda-forge packages with the system packages, could you run a test and see if it crashes? Thanks!

@mikemhenry
Copy link

@BarclayII Will do! Worst case we just might not be able to have a CUDA 10.2 build, but I think I might be able to include a vendored lib that is needed to fix a bug for a version of CUDA, so we will see how it goes!

@hadim
Copy link
Author

hadim commented Feb 16, 2023

@BarclayII is there is a plan to bring the dgl package to conda-forge?

Compiling dgl itself on conda-forge is quite easy to do (thanks @mikemhenry, for the push with conda-forge/staged-recipes#18620) but the real bottleneck right now is that we need to package all the third-party libraries on conda forge first (conda-forge/staged-recipes#18620 (comment)).

The latest dgl 1.0.0 makes it even more challenging to install dgl CUDA packages since they are hosted on different conda channels. At least for us internally, it's not practical having to switch conda channels depending on the current CUDA version.

If the dgl team has bandwidth and will be willing to provide support on the task of packaging all the third-party libraries, that would be greatly appreciated. I am also happy to help, but I will unlikely be able to tackle everything myself.

The other benefit of a dgl conda forge package is that for any new python, CUDA and pytorch versions, the latest dgl package will be automatically rebuilt against those versions lowering down the packaging maintenance burden for the dgl team. So, I feel it's like a good thing to do since on the short term it will require a significant amount of work (packaging third-party libraries) but on the long term and thanks to the conda forge infrastructure, the packaging burden will be handled automatically by conda forge itself.

@mikemhenry
Copy link

It is on my backlog, but I am not a cmake wizard so it would be much faster if someone from the dgl could setup the cmake to look for the 3rd party libraries, and if it can't find them, use the packages as vendored.

That should make it easy to patch then on the feedstock to use the 3rd party libraries that exist on conda-forge and anything missing we can vendor (or package).

@BarclayII
Copy link
Collaborator

@mikemhenry Is there an example conda-forge recipe that looks for conda-forge 3rd party libraries if available and uses its own 3rd party libraries if not? I can help on the CMake part but I'm not sure how to interact with conda-forge environments.

@hadim
Copy link
Author

hadim commented Feb 23, 2023

@BarclayII I don't think this is specific to conda-forge here since conda-forge will always install all the dependencies of your package (the third-party libraries) in ${PREFIX}. It means that any standard FindXXX.cmake file will be able to find the correct dep (both include and .so files).

So, the CMake modifications are mostly about adding a boolean flag that will either look for the installed third party or either compile and use the one from this repo. The logic could look something like that:

cmake .. \
    ${CMAKE_ARGS} \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_INSTALL_PREFIX=${PREFIX} \
    -DCMAKE_INSTALL_LIBDIR=lib \
    -DBUILD_SHARED_LIBS=ON \
    -DWITH_EXTERNAL_METIS=True \
    ${EXTRA_CMAKE_ARGS}
message("WITH_EXTERNAL_METIS=${WITH_EXTERNAL_METIS}")
if( WITH_EXTERNAL_METIS )
  message("WITH_EXTERNAL_METIS evaluates to True")
  find_package(Metis)
  target_link_libraries(DGL PUBLIC Metis::Metis)
  target_include_directories(DGL SYSTEM PUBLIC ${METIS_INCLUDE_DIRS})
else()
	# add the custom dgl logic to use metis from the third-party folder.
endif()

That being said, not all the dep ship a FindXXX.cmake by default but usually a quick googling can help here. As an example, I cannot find FindMETIS.cmake in the METIS folder but after a quick search I have been able to find https://github.com/elemental/Elemental/blob/6eb15a0da2a4998bf1cf971ae231b78e06d989d9/cmake/modules/FindMETIS.cmake

Beyond modifying the dgl CMake logic, I think the challenge is also in packaging all the third-party deps into conda-forge.

@mikemhenry
Copy link

@hadim Exactly!

I think the challenge is also in packaging all the third-party deps into conda-forge.

If the package is already on conda-forge, then we don't need to worry about it.
If the package is a header only lib, conda-forge is okay with that (or at least can be convinced)
There also is an option for making a static lib output as well if we have to: https://github.com/conda-forge/cfep/blob/main/cfep-18.md

My proposal is:

  1. Identify all the 3rd party libs dgl uses/needs
  2. Check which ones are on conda-forge
  3. Put missing libs on conda-forge if possible (this can be in the same PR to get dgl on conda-forge)
  4. Update the cmake build as @hadim describes. Optionally, we can patch some of this behavior if we can't get the changes made upstream in dgl

@hadim
Copy link
Author

hadim commented Feb 24, 2023

I don't think I'll have time to work on those packages in the short term, but I did a packages analysis in case it can help:

Status DGL Name Conda forge Name Package on c-f? DGL version Correct version on c-f? Comment
🔴 METIS metis yes 5.1.1 no 5.1.1 old PR
🔴 gklib gklib yes 5.1.1 maybe Embedded into METIS 5.1.1?
🟢 dlpack dlpack yes 0.7 yes check whether latest (0.8) is ok?
🟢 dmlc-core dmlc yes in between 0.4 and 0.5 yes check latest can be used?
🟢 googletest not needed?
🟢 libxsmm libxsmm yes 1.16 yes check latest 1.17 can be used?
🟢 nanoflann nanoflann yes in between 1.3.2 and 1.4.0 yes check latest can be used?
🟢 nccl nccl yes i guess latest can just be used here (CUDA related)
🔴 pcg no not clear dgl git sha from 08/04/2022 but latest git tag from 2015
🟢 phmap parallel-hashmap yes 1.32 yes check latest 1.33 can be used?
🟠 tensorpipe is it shipped with the pytorch package?
🔴 thrust thrust yes 1.17.1 no latest is 1.16.0
🔴 tvm no in between 0.7 and 0.8 no
🔴 xbyak xbyak yes 6.00 no latest on cf is 5.992

For any of those packages, it might be useful to get infos from a DGL devs for the below:

  • Should/could we use the latest versions when the dgl git hash is older than the latest one?
  • Are all those deps required for packaging? (for example, I guess googletest is not required)

@mikemhenry
Copy link

mikemhenry commented Feb 24, 2023

Wow this is way better than I expected!
And amazing work @hadim

@BarclayII
Copy link
Collaborator

BarclayII commented Mar 2, 2023

  • Should/could we use the latest versions when the dgl git hash is older than the latest one?

All these packages can use the latest version I think.

  • Are all those deps required for packaging? (for example, I guess googletest is not required)

googletest is not required. I don't think TVM is required as well (unless @jermainewang wants to ship FeatGraph, which I don't think we do in our releases). So I think we don't need to package additional dependencies to conda-forge. Among these, I think thrust, pcg, phmap are header-only libraries. So I assume we don't need to detect system installations? @hadim

@mikemhenry
Copy link

That is great news, then the next step will be to make some CMake modifications so a user can tell cmake to look for system packages instead of the vendored ones

@hmacdope
Copy link
Contributor

@hadim @mikemhenry I have been making some progress on this with respect to the CMake changes. I'll update here as I go.

Main issue is that I have had to work from the 0.8.2 tag as dgl > 0.8.2 requires CUB >=1.17 due to this file here. CUB 1.17 is not on conda-forge (and is unlikely to be as the feedstock is now readonly). My understanding is that it has been moved inside CUDAToolkit for >= 12 which is not on conda-forge yet. Perhaps some DGL devs can confirm my logic here?

Working from that branch seems to be going well. I'll make an updated table soon.

@mikemhenry
Copy link

Awesome, I've been keeping tabs on getting cuda 12 on conda-forge, that is being tracked here: conda-forge/staged-recipes#21382

@BarclayII
Copy link
Collaborator

Main issue is that I have had to work from the 0.8.2 tag as dgl > 0.8.2 requires CUB >=1.17 due to this file here. CUB 1.17 is not on conda-forge (and is unlikely to be as the feedstock is now readonly). My understanding is that it has been moved inside CUDAToolkit for >= 12 which is not on conda-forge yet.

We are using CUB 1.17, but still with CUDA 10.2 and 11+ since it is a header-only library and it worked fine. Also DGL 0.8.2+ does not run on CUDA 12 for now.

@hmacdope
Copy link
Contributor

Right so a possible option could be to vendor thrust and cub and use a more updated version. Lets see how I go with my 0.82 branch first and then look into a more up to date version. The CMake changes should be not too bad to port across.

@jermainewang
Copy link
Member

The dependencies on cub, thrust and xbyak have been removed in 1.0 and we are on the way to removing other unnecessary dependencies such as tvm and tensorpipe. The remaining issue METIS and GKLib which are needed by some DGL APIs. How hard is it to have them on conda forge?

@mikemhenry
Copy link

I can look into getting those on conda-forge, @hmacdope this would be another argument for building a newer version

@hmacdope
Copy link
Contributor

hmacdope commented Jun 2, 2023

@jermainewang

The dependencies on cub, thrust and xbyak have been removed in 1.0 and we are on the way to removing other unnecessary dependencies such as tvm and tensorpipe. The remaining issue METIS and GKLib which are needed by some DGL APIs. How hard is it to have them on conda forge?

Vendored cub and thrust seem to still be being used with -DUSE_CUDA=ON at least from just looking at CMakeLists.txt is this not used in the actual compilation?

@mikemhenry
Copy link

mikemhenry commented Aug 18, 2023

This issue can be closed, we have 1.1.0 on conda forge and will work on the newer versions now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request Feature request
Projects
Status: Done
Development

No branches or pull requests

6 participants