Precompilation error on the Nvidia Jetson platform #101

Closed
coezmaden opened this issue Apr 26, 2020 · 8 comments

Comments

@coezmaden

Hi, and first of all, thanks for your work on the package.

I'm trying to get some code using LoopVectorization.jl running on the NVIDIA Jetson AGX Xavier.

Unfortunately, it fails during precompilation in the REPL. Below is the complete stack trace, triggered simply by loading the package.

I'm using Julia 1.4.0 and LoopVectorization v0.6.30. The Jetson has a 64-bit ARM CPU.

(@v1.4) pkg> status
Status ~/.julia/environments/v1.4/Project.toml
.
.
[bdcacae8] LoopVectorization v0.6.30
.
.

julia> using LoopVectorization
[ Info: Precompiling LoopVectorization [bdcacae8-1622-11e9-2a5c-532679323890]
ERROR: LoadError: could not open file /home/coz/.julia/packages/VectorizationBase/FfrB7/src/cpu_info.jl
Stacktrace:
[1] include(::Module, ::String) at ./Base.jl:377
[2] include(::String) at /home/coz/.julia/packages/VectorizationBase/FfrB7/src/VectorizationBase.jl:1
[3] top-level scope at /home/coz/.julia/packages/VectorizationBase/FfrB7/src/VectorizationBase.jl:215
[4] include(::Module, ::String) at ./Base.jl:377
[5] top-level scope at none:2
[6] eval at ./boot.jl:331 [inlined]
[7] eval(::Expr) at ./client.jl:449
[8] top-level scope at ./none:3
in expression starting at /home/coz/.julia/packages/VectorizationBase/FfrB7/src/VectorizationBase.jl:215
ERROR: LoadError: Failed to precompile VectorizationBase [3d5dd08c-fd9d-11e8-17fa-ed2836048c2f] to /home/coz/.julia/compiled/v1.4/VectorizationBase/Dto5m_LPpax.ji.
Stacktrace:
[1] error(::String) at ./error.jl:33
[2] compilecache(::Base.PkgId, ::String) at ./loading.jl:1272
[3] _require(::Base.PkgId) at ./loading.jl:1029
[4] require(::Base.PkgId) at ./loading.jl:927
[5] require(::Module, ::Symbol) at ./loading.jl:922
[6] include(::Module, ::String) at ./Base.jl:377
[7] top-level scope at none:2
[8] eval at ./boot.jl:331 [inlined]
[9] eval(::Expr) at ./client.jl:449
[10] top-level scope at ./none:3
in expression starting at /home/coz/.julia/packages/LoopVectorization/zXjmq/src/LoopVectorization.jl:3
ERROR: Failed to precompile LoopVectorization [bdcacae8-1622-11e9-2a5c-532679323890] to /home/coz/.julia/compiled/v1.4/LoopVectorization/4TogI_LPpax.ji.
Stacktrace:
[1] error(::String) at ./error.jl:33
[2] compilecache(::Base.PkgId, ::String) at ./loading.jl:1272
[3] _require(::Base.PkgId) at ./loading.jl:1029
[4] require(::Base.PkgId) at ./loading.jl:927
[5] require(::Module, ::Symbol) at ./loading.jl:922

Any help would be appreciated.

@chriselrod
Member

If you're willing to help, I think we can get things to work.

First of all, it looks like VectorizationBase failed to build. While building, it creates a file (using CpuId.jl) with a bunch of constants describing the host CPU.

CpuId.jl appears to support only Intel and AMD (or x86 in general?).
So the first step would be a build script that detects an ARM processor and, if so, correctly fills out these fields:

const REGISTER_SIZE = $register_size # bytes per register
const REGISTER_COUNT = $register_count # how many floating point registers?
const REGISTER_CAPACITY = $register_capacity # not used, but it was size * count
const FP256 = $(cpufeature(CpuId.FP256)) # not used, was supposed to distinguish zen1
const CACHELINE_SIZE = $(cachelinesize()) # cache line size in bytes
const CACHE_SIZE = $cache_size # NTuple{3,Int} (L1, L2, L3), but I should generalize code that uses it for different numbers of cache levels 
const NUM_CORES = $num_cores # number of physical cores
const FMA3 = $(cpufeature(CpuId.FMA3)) # does it have fused multiply-add? This is used specifically for whether it has the `vfmadd231` instruction, to see if it can use asm call for that particular variant
const AVX2 = $(cpufeature(CpuId.AVX2)) # Does it have SIMD integer support?
const AVX512F = $(cpufeature(CpuId.AVX512F)) # Does it have AVX512?
const AVX512ER = $(cpufeature(CpuId.AVX512ER)) # does it have hardware exp2, and accurate hardware inverse and inverse square root?
const AVX512PF = $(cpufeature(CpuId.AVX512PF)) # avx512 prefetch extensions
const AVX512VL = $(cpufeature(CpuId.AVX512VL)) # do avx512 instructions work with shorter registers?
const AVX512BW = $(cpufeature(CpuId.AVX512BW)) # avx512 with 8- and 16-bit integer support?
const AVX512DQ = $(cpufeature(CpuId.AVX512DQ)) # avx512 with 32- and 64-bit integer support?
const AVX512CD = $(cpufeature(CpuId.AVX512CD)) # conflict detection, includes SIMD count-leading-zeroes

Do you happen to know a lot of low-level details about ARM? Or how to query them?
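As a rough sketch of the build-script idea above (not what VectorizationBase actually does), one could branch on `Sys.ARCH` and write fallback constants without calling CpuId.jl. The helper name `fallback_cpu_constants` is hypothetical, and the cache-line size and the non-ARM defaults are placeholder assumptions. The 16-byte register size and count of 32 correspond to the AArch64 Advanced SIMD (NEON) register file.

```julia
# Hypothetical helper (not part of VectorizationBase): returns the text of a
# cpu_info.jl-style file for hosts where CpuId.jl cannot be used.
function fallback_cpu_constants(arch::Symbol = Sys.ARCH; num_cores::Int = Sys.CPU_THREADS)
    if arch === :aarch64
        register_size  = 16   # bytes; AArch64 Advanced SIMD registers are 128 bits wide
        register_count = 32   # V0 through V31
    else
        # Conservative guess for any other architecture (assumption).
        register_size  = 16
        register_count = 16
    end
    return """
    const REGISTER_SIZE = $register_size
    const REGISTER_COUNT = $register_count
    const REGISTER_CAPACITY = $(register_size * register_count)
    const CACHELINE_SIZE = 64 # assumed, not queried
    const NUM_CORES = $num_cores # logical threads, used here as a stand-in
    const FMA3 = false
    const AVX2 = false
    const AVX512F = false
    """
end

# Show what would be written for an 8-core AArch64 host.
print(fallback_cpu_constants(:aarch64; num_cores = 8))
```

Note that `Sys.CPU_THREADS` counts logical threads, so it only approximates the physical core count the original constant asks for.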

Some of these constants could be renamed to generalize them across instruction sets, others can be split into general and specific versions (e.g., FMA3 for whether it has that specific instruction set and a more general HAS_FUSED_MULTIPLY_ADD which would be true for FMA4 on x86, and whatever the ARM equivalent is).
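The split into specific and general constants might look something like the following. This is only an illustration: `FMA4` and the function `has_fused_multiply_add` are hypothetical names, and the stand-in values would really come from the generated CPU-info file. It assumes that AArch64 always provides fused multiply-add through Advanced SIMD.

```julia
# Stand-in values for illustration; in practice these would come from the
# generated CPU-info file. FMA4 is a hypothetical constant, not one listed above.
const FMA3 = false
const FMA4 = false

# General query: true if any fused multiply-add is available, regardless of
# which instruction set provides it. AArch64 is assumed to always have one.
has_fused_multiply_add(arch::Symbol, fma3::Bool, fma4::Bool) =
    fma3 || fma4 || arch === :aarch64

const HAS_FUSED_MULTIPLY_ADD = has_fused_multiply_add(Sys.ARCH, FMA3, FMA4)
```

Code that only needs "some FMA" would check `HAS_FUSED_MULTIPLY_ADD`, while code that emits the `vfmadd231` asm call would keep checking `FMA3`.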

From there, we'd have to make sure SIMDPirates and SLEEFPirates work as intended.

@coezmaden
Author

Hi, thanks for your prompt response. Unfortunately, I have no practical experience with ARM chips or any low-level instruction set programming. However, I'm willing to help if needed. As you mentioned, the problem seems to lie within CpuId.jl.

julia> using CpuId
[ Info: Precompiling CpuId [adafc99b-e345-5852-983c-f28acb93d879]
error: couldn't allocate output register for constraint '{ax}'
ERROR: Failed to precompile CpuId [adafc99b-e345-5852-983c-f28acb93d879] to /home/coz/.julia/compiled/v1.4/CpuId/vMZBF_LPpax.ji.
Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] compilecache(::Base.PkgId, ::String) at ./loading.jl:1272
 [3] _require(::Base.PkgId) at ./loading.jl:1029
 [4] require(::Base.PkgId) at ./loading.jl:927
 [5] require(::Module, ::Symbol) at ./loading.jl:922

Maybe I should redirect this to their issue board?

@chriselrod
Member

You could, but it's likely that ARM is out of scope for CpuId.jl, in which case VectorizationBase would need an alternative means of getting info about the host computer.
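One possible alternative on Linux, sketched here as an assumption rather than as the approach VectorizationBase took, is to read the feature flags from `/proc/cpuinfo`, where ARM kernels report a `Features` line and x86 kernels a `flags` line. The helper names `parse_cpu_features` and `cpu_features` are hypothetical, and the sample text is made up for illustration, not taken from the Jetson.

```julia
# Hypothetical helper: collect the flags from the "Features" (ARM) or
# "flags" (x86) line of /proc/cpuinfo-style text.
function parse_cpu_features(text::AbstractString)
    for line in split(text, '\n')
        parts = split(line, ':'; limit = 2)
        length(parts) == 2 || continue
        key = lowercase(strip(parts[1]))
        if key == "features" || key == "flags"
            # Flags are whitespace-separated; the first matching line is enough.
            return Set(String.(split(parts[2])))
        end
    end
    return Set{String}()
end

# Query the running host; returns an empty set where /proc/cpuinfo is absent.
cpu_features() = isfile("/proc/cpuinfo") ?
    parse_cpu_features(read("/proc/cpuinfo", String)) : Set{String}()

# Illustrative sample text (not real output from the Jetson).
sample = "processor\t: 0\nFeatures\t: fp asimd evtstrm aes\n"
feats = parse_cpu_features(sample)
println("asimd" in feats)
```

A build script could then set a constant based on, for example, `"asimd" in cpu_features()`. This only works on Linux, so other platforms would still need their own query.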

@chriselrod
Member

Could you see if this works?
JuliaSIMD/VectorizationBase.jl#9

@coezmaden
Author

> Could you see if this works?
> chriselrod/VectorizationBase.jl#9

After updating, LoopVectorization v0.8.1 seems to work fine. Thank you!

(@v1.4) pkg> status
Status `~/.julia/environments/v1.4/Project.toml`
...
  [bdcacae8] LoopVectorization v0.6.30
...

(@v1.4) pkg> update
...
  [bdcacae8] ↑ LoopVectorization v0.6.30 ⇒ v0.8.1
...

julia> using LoopVectorization
[ Info: Precompiling LoopVectorization [bdcacae8-1622-11e9-2a5c-532679323890]

# pass 👍 

@DilumAluthge
Member

@chriselrod @ozmaden Is this issue resolved?

@chriselrod
Member

I think so, but I'd like to improve ARM support, especially as more SVE CPUs start appearing (recently Neoverse and A64FX).

@coezmaden
Author

I haven't had any problems since the last comment in May, so I think it is resolved.
