Huge import latency caused by Term, GPUArrays, and CUDA #156

Open
Moelf opened this issue Jan 10, 2023 · 17 comments
Comments

@Moelf (Contributor) commented Jan 10, 2023

julia> @time_imports using XGBoost
     17.5 ms  SparseMatricesCSR
     17.1 ms  AbstractTrees
      7.5 ms  OrderedCollections
      0.1 ms  SnoopPrecompile
     68.5 ms  Parsers 11.99% compilation time
     12.0 ms  StructTypes
     71.2 ms  JSON3
      0.2 ms  DataValueInterfaces
      1.2 ms  DataAPI
      0.1 ms  IteratorInterfaceExtensions
      0.1 ms  TableTraits
     13.6 ms  Tables
      3.7 ms  DocStringExtensions 75.50% compilation time
     62.0 ms  Highlights
      1.3 ms  MyterialColors
      0.1 ms  UnPack
      0.3 ms  Parameters
      3.1 ms  ProgressLogging
      1.2 ms  UnicodeFun
      1.8 ms  CodeTracking
    958.0 ms  Term
      4.1 ms  CEnum
     20.1 ms  Preferences
      0.4 ms  JLLWrappers
      5.7 ms  LLVMExtra_jll 59.12% compilation time
    186.4 ms  LLVM 41.51% compilation time (100% recompilation)
      0.4 ms  ExprTools
    132.5 ms  TimerOutputs 11.86% compilation time
    551.1 ms  GPUCompiler 1.29% compilation time
      0.4 ms  Adapt
      0.1 ms  Reexport
      3.5 ms  GPUArraysCore
    876.0 ms  GPUArrays
      0.6 ms  Requires
      8.6 ms  BFloat16s
      0.7 ms  CompilerSupportLibraries_jll
     32.6 ms  RandomNumbers 33.48% compilation time (13% recompilation)
      7.7 ms  Random123
      1.2 ms  Compat
     82.5 ms  ChainRulesCore
     11.3 ms  AbstractFFTs
   1625.2 ms  CUDA 0.75% compilation time
     16.4 ms  CUDA_Driver_jll 94.73% compilation time
      0.2 ms  CUDA_Runtime_jll
      2.4 ms  XGBoost_jll
      9.5 ms  XGBoost
@Moelf Moelf changed the title Huge import latency caused Huge import latency caused by Term and CUDA Jan 10, 2023
@Moelf Moelf changed the title Huge import latency caused by Term and CUDA Huge import latency caused by Term, GPUArrays, and CUDA Jan 10, 2023
@ExpandingMan (Collaborator)

I suppose an argument can be made for removing Term, but how would you propose to remove CUDA without removing important GPU features?

@Moelf (Contributor, Author) commented Jan 10, 2023

I'm wondering if you just need GPUArrays.jl for the interface.

@ExpandingMan (Collaborator) commented Jan 10, 2023

Ah, that's a good point... technically this supports CuArray and not any other type of GPU array, but I suppose it wouldn't be so bad to use GPUArrays and let it crash in some other way if somebody tries to use something other than a CuArray (pretty sure libxgboost will crash somewhere).

Anyone else have opinions on this? I'm definitely willing to do the PR and make the change if there is consensus, but I'm not completely confident that nobody will take issue with it.

@Moelf (Contributor, Author) commented Jan 10, 2023

yeah, thanks for the GPU support, I'm just experiencing slow loading times on my poor cluster login node, so no rush

@devmotion (Contributor)

IMO even GPUArrays alone seems to increase loading times quite significantly. It's nice to have GPU support, but personally I usually do not have access to dedicated GPUs, so to me it seems more like an extension than a basic part of the package (also highlighted by the fact that GPU support was only added recently). Maybe it would be better to use weak dependencies on Julia >= 1.9 and Requires on older Julia versions (https://pkgdocs.julialang.org/dev/creating-packages/#Conditional-loading-of-code-in-packages-(Extensions)).
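For reference, the extension mechanism linked above amounts to declaring the GPU package under [weakdeps] and mapping it to an extension module in Project.toml. A hypothetical sketch (the extension name and file layout are illustrative, not taken from XGBoost.jl's actual manifest):

```toml
# Hypothetical Project.toml fragment (illustrative names):
[weakdeps]
# UUID of CUDA.jl; verify against the General registry
CUDA = "052768ef-5323-5732-b1bb-66c8b64840ba"

[extensions]
# ext/XGBoostCUDAExt.jl would only load once the user does `using CUDA`
# themselves; a plain `using XGBoost` stays CUDA-free.
XGBoostCUDAExt = "CUDA"

[compat]
CUDA = "4"
```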

@Moelf (Contributor, Author) commented Jan 11, 2023

It's nice to have GPU support but personally usually I do not have access to dedicated GPUs, so to me it rather seems

yeah I agree with this

@ExpandingMan (Collaborator)

I'm not sure exactly what can be done with weak dependencies, but I'm certainly open to exploring it once 1.9 is released.

Personally, I'm not too fond of the argument that stuff should be removed just because it has to compile. Compilation is just part of life; it's an issue for the compiler, not for individual packages. That dependencies should be trimmed where possible seems like a significantly better argument to me.

That said, I'm at least open to all the specific proposals made here: using only GPUArrays sounds good to me, I'm happy to embrace weak dependencies, and arguably the functionality I'm using Term for belongs in another package. On the other hand, I don't like the idea of having a separate package for GPU support at all.

Of course I'm not the only maintainer of this, so my opinion is hardly authoritative.

@andyferris commented Feb 1, 2023

I too am having issues with unexpectedly depending on CUDA. In my case I am using PackageCompiler - here's a (cut down) snippet:

julia> using PackageCompiler; create_app(".", "dist")
PackageCompiler: bundled artifacts:
  ├── CUDA_Runtime_jll - 1.858 GiB
  └── ...
  Total artifact file size: 2.134 GiB

I would really like my bundle to be 2GB smaller. I'm assuming just using GPUArrays will at least fix the worst of the problem for now, and we can look at weak dependencies later?

@tylerjthomas9 (Contributor) commented Feb 1, 2023

CUDA_Runtime_jll - 1.858 GiB

Are you running on a machine with CUDA 11? If so, you might be able to eliminate the large CUDA artifacts by switching. The CUDA_Runtime_jll dependency comes from XGBoost_jll when it is built with CUDA; the non-CUDA builds do not have it.

https://github.com/JuliaPackaging/Yggdrasil/blob/2b49bcda328332a02b0d6ee3926bafb3f031d026/X/XGBoost/build_tarballs.jl#L92

I wonder if we would be able to handle the CUDA deps with the new conditional package loading: https://pkgdocs.julialang.org/dev/creating-packages/#Conditional-loading-of-code-in-packages-(Extensions)

@devmotion (Contributor)

I wonder if we would be able to handle the CUDA deps with the new conditional package loading:

That was what I was referring to with weak dependencies in #156 (comment) 🙂 It's great, I'm already using it in a few packages, but it requires Julia 1.9 - if one wants to support the conditional features on older Julia versions one either has to add the weak dependencies as hard dependencies on these Julia versions or use Requires (which does not support precompilation). In the case of XGBoost, the most natural approach would seem to keep the already existing dependencies as hard dependencies on Julia < 1.9.
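Concretely, the dual-support pattern described here is usually written as a feature check in the package's top-level module. A minimal sketch, assuming the conditional code lives in ext/XGBoostCUDAExt.jl (names are illustrative, not XGBoost.jl's actual layout):

```julia
module XGBoost

# ... normal, CUDA-free package code ...

if !isdefined(Base, :get_extension)
    # Julia < 1.9: no package extensions available, so fall back to
    # Requires.jl. Note the conditionally loaded code is not precompiled.
    using Requires
    function __init__()
        @require CUDA = "052768ef-5323-5732-b1bb-66c8b64840ba" begin
            include("../ext/XGBoostCUDAExt.jl")
        end
    end
end
# On Julia >= 1.9 the [extensions] entry in Project.toml takes over and
# loads the same file automatically when the user does `using CUDA`.

end # module
```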

@nickrobinson251 commented Feb 3, 2023

Just to add that at RAI we're in exactly the same situation @andyferris mentioned above, where we're using PackageCompiler and do not want to include CUDA_Runtime_jll or CUDA_Driver_jll.

But given these deps seem to have been added in XGBoost_jll at v1.7.1+1, I don't think there's even a way to pin the version to avoid getting these? So what we're having to do is manually not check in the XGBoost-related changes to the Manifest.toml, which is really disruptive and error-prone.

e.g. when updating an unrelated package we see things like

(Repo) pkg> add --preserve=all XUnit @1.1.5
   Resolving package versions...
    Updating `~/Project.toml`
  [3e3c03f2] ↑ XUnit v1.1.4 ⇒ v1.1.5
    Updating `~/Manifest.toml`
  [3e3c03f2] ↑ XUnit v1.1.4 ⇒ v1.1.5
  [4ee394cb] + CUDA_Driver_jll v0.3.0+0
  [76a88914] + CUDA_Runtime_jll v0.3.0+2
⌃ [a5c6f535] ↑ XGBoost_jll v1.7.1+0 ⇒ v1.7.1+1
        Info Packages marked with ⌃ have new versions available and may be upgradable.
Precompiling project...
  ✓ XGBoost_jll
  ✓ XUnit
  ✓ XGBoost
  ✓ Repo
  4 dependencies successfully precompiled in 42 seconds. 259 already precompiled. 1 skipped during auto due to previous errors.

@ExpandingMan (Collaborator)

We will definitely demote both CUDA and Term to weak dependencies once 1.9 is officially released. In the meantime, if someone wants to replace CUDA with GPUArrays, we can probably make that work. Otherwise, it looks like it's just going to have to wait. Suffice it to say, this problem is hardly unique to this package.

@andyferris

i don't think there's even a way to pin the version to avoid getting these?

@nickrobinson251 I did manage to pin both XGBoost at 2.0.2 and XGBoost_jll at 1.6.2+0, and that seemed to work for me.

Looking forward to Julia 1.9 still :)
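For anyone else hitting this, the pin described above can also be done from the Pkg API; a sketch (version numbers taken from the comment, adjust to your registry state):

```julia
# Pin XGBoost and its JLL to the last builds without the CUDA runtime
# dependency, so Pkg.update() cannot pull the CUDA artifacts back in.
using Pkg
Pkg.add(PackageSpec(name="XGBoost", version="2.0.2"))
Pkg.pin(PackageSpec(name="XGBoost_jll", version="1.6.2"))
```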

@andyferris

As a follow up of this, I see there are some weakdep stuff in the Project.toml, and when installing on Julia 1.9.2 I get this:

(@v1.9) pkg> add XGBoost
   Resolving package versions...
    Updating `~/.julia/environments/v1.9/Project.toml`
  [009559a3] + XGBoost v2.3.0
    Updating `~/.julia/environments/v1.9/Manifest.toml`
  [fa961155] + CEnum v0.4.2
  [9a962f9c] + DataAPI v1.15.0
  [e2d170a0] + DataValueInterfaces v1.0.0
  [82899510] + IteratorInterfaceExtensions v1.0.0
  [692b3bcd] + JLLWrappers v1.4.1
  [0f8b85d8] + JSON3 v1.13.1
  [69de0a69] + Parsers v2.7.1
  [a0a7dd2c] + SparseMatricesCSR v0.6.7
  [856f2bd8] + StructTypes v1.10.0
  [3783bdb8] + TableTraits v1.0.1
  [bd369af6] + Tables v1.10.1
  [009559a3] + XGBoost v2.3.0
  [4ee394cb] + CUDA_Driver_jll v0.5.0+1
  [76a88914] + CUDA_Runtime_jll v0.6.0+0
  [1d63c593] + LLVMOpenMP_jll v15.0.4+0
  [a5c6f535] + XGBoost_jll v1.7.5+2
  [4af54fe1] + LazyArtifacts
  [37e2e46d] + LinearAlgebra
  [a63ad114] + Mmap
  [2f01184e] + SparseArrays
  [10745b16] + Statistics v1.9.0
  [4607b0f0] + SuiteSparse
  [8dfed614] + Test
  [e66e0078] + CompilerSupportLibraries_jll v1.0.5+0
  [4536629a] + OpenBLAS_jll v0.3.21+4
  [bea87d4a] + SuiteSparse_jll v5.10.1+6
  [8e850b90] + libblastrampoline_jll v5.8.0+0

I see that Term and CUDA aren't installed, which is great, but there seems to be a bunch of GPGPU stuff downloaded regardless (this is a lot of stuff for a Docker image intended for a machine that doesn't even have an NVIDIA GPU). Do we need to do the weakdep thing in XGBoost_jll too? (Will that work? Will the XGBoost_jll .so/.dll load and run fine in CPU mode if it can't find the CUDA .so/.dll files?)

@devmotion (Contributor)

Yes, the JLL still pulls in CUDA JLL dependencies. I asked about weakdeps/optional dependencies and GPU-/non-GPU binaries in some issue in Yggdrasil a while ago, but there does not seem to be a solution to this problem yet. (Some other libraries are built in both a GPU and a non-GPU version on Yggdrasil, but last time I checked there was no Julia package that actually tried to combine them; also, just depending on both would still pull in all the undesired binaries...)

@andyferris

Ah I see - thank you.
