Freeze into StackOverflow when JULIA_DEBUG=CUDA set #1721

Closed
ToucheSir opened this issue Jan 7, 2023 · 2 comments
Labels
bug Something isn't working


@ToucheSir
Contributor

Describe the bug

On CUDA.jl 3.12.1, invoking many functions (e.g. device()) seems to freeze the REPL and peg one CPU core for several minutes, after which the internal error shown below is reported. This only happens when JULIA_DEBUG=CUDA is set.
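
For context on what the variable does: JULIA_DEBUG enables @debug messages for the named module, which are otherwise dropped by a logger at the default Info threshold. A minimal sketch of that mechanism follows. The module Demo and the helper captured_log are hypothetical stand-ins, not part of CUDA.jl, and the variable is set from inside Julia here only to keep the example self-contained (normally it is set in the shell before starting Julia).

```julia
using Logging

# Hypothetical stand-in for a package that emits debug messages.
module Demo
noisy() = (@debug "inside Demo.noisy"; :done)
end

# Hypothetical helper: run `f` under a console logger with the default
# Info threshold and return whatever text the logger wrote.
function captured_log(f)
    io = IOBuffer()
    with_logger(f, ConsoleLogger(io, Logging.Info))
    return String(take!(io))
end

# Without JULIA_DEBUG, the Debug-level message is below the threshold.
delete!(ENV, "JULIA_DEBUG")
quiet = captured_log(Demo.noisy)

# Naming the module in JULIA_DEBUG lets its @debug messages through.
ENV["JULIA_DEBUG"] = "Demo"
verbose = captured_log(Demo.noisy)

println("without JULIA_DEBUG: ", isempty(quiet) ? "(nothing logged)" : quiet)
println("with JULIA_DEBUG=Demo:\n", verbose)
```

With JULIA_DEBUG=CUDA, every @debug call inside the CUDA module takes this enabled path, so code that only builds debug messages (such as the string(String, Module, Any...) call named in the error) actually runs, whereas without the variable it is skipped.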

To reproduce

$ env JULIA_DEBUG=CUDA julia --project
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.8.4 (2022-12-23)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> using CUDA

julia> cu([1])
Internal error: stack overflow in type inference of string(String, Module, Any...).
This might be caused by recursion over very long tuples or argument lists.
Internal error: stack overflow in type inference of setindex!(Base.RefValue{Union{Exception, String}}, StackOverflowError).
This might be caused by recursion over very long tuples or argument lists.
error: <unknown>:0: starting new .cfi frame before finishing the previous one

This is running on a fresh env on Cyclops.

Expected behavior

Debug messages should work.

Version info

Details on Julia:

julia> versioninfo()
Julia Version 1.8.4
Commit 00177ebc4fc (2022-12-23 21:32 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 12 × Intel(R) Xeon(R) CPU E5-2603 v4 @ 1.70GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, broadwell)
  Threads: 1 on 12 virtual cores
Environment:
  LD_LIBRARY_PATH = /usr/local/cuda/lib64

Details on CUDA:

# same as cyclops

Additional context

I tried to find out how far back this goes, but ran into precompile errors with 3.10 and 3.11. 3.8 does not appear to have this issue.

ToucheSir added the bug label on Jan 7, 2023
@maleadt
Member

maleadt commented Jan 7, 2023

Only on 3.12.1, not on 3.12? Can you bisect?

@ToucheSir
Contributor Author

ToucheSir commented Jan 7, 2023

Yes, just tested. Not sure how to bisect given the precompilation errors (#1558?) on intermediate versions.
