Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix reinterpret performance #28707

Merged
merged 1 commit into from
Aug 17, 2018
Merged

Fix reinterpret performance #28707

merged 1 commit into from
Aug 17, 2018

Conversation

Keno
Copy link
Member

@Keno Keno commented Aug 16, 2018

This fixes #25014 by making it more obvious what's going on to LLVM.
Instead of a memcpy loop, we use a new intrinsic that puts an actual
llvm.memcpy into the IR, which is enough for LLVM to fold everything
away. In the benchmark from #25014, we still see some regressions from
0.6, but that is because it needs to dereference through the pointers
in the reinterpret and reshape wrappers. In any real code, that
dereferencing should be loop-invariantly moved out of the inner loop.

@Keno Keno changed the title Fix reinterpret performnace Fix reinterpret performance Aug 16, 2018
@vtjnash
Copy link
Sponsor Member

vtjnash commented Aug 16, 2018

Glad to see this works. Can we just transparently make this he implementation of ccall(:memcpy) (it looks like we don’t need the extra alignment argument right now)

@tknopp
Copy link
Contributor

tknopp commented Aug 16, 2018

@Keno Does this make the alignment workaround

A = Mmap.mmap(fdio(fd), Array{UInt8,1}, prod(dims)*sizeof(T), offset)
reshape(reinterpret(T,A),dims)

as fast as

A = Mmap.mmap(fdio(fd), Array{T,dims}, dims, offset)

under 0.6? I needed to change this here
https://github.com/JuliaIO/HDF5.jl/blob/master/src/HDF5.jl#L1651
in order to make code work that failed in 0.7 due to the alignment issue.

@Keno
Copy link
Member Author

Keno commented Aug 16, 2018

Can we just transparently make this he implementation of ccall(:memcpy)

Ok, let's do it that way. We can always add the aligned version if we need it later.

@Keno
Copy link
Member Author

Keno commented Aug 16, 2018

Does this make the alignment workaround [...] fast

Probably

This fixes #25014 by making it more obvious what's going on to LLVM.
Instead of a memcpy loop, we use a ccall to :memcpy and turn this into
llvm.memcpy at the IR level, which is enough for LLVM to fold everything
away. In the benchmark from #25014, we still see some regressions from
0.6, but that is because it needs to dereference through the pointers
in the reinterpret and reshape wrappers. In any real code, that
dereferencing should be loop-invariantly moved out of the inner loop.
@Keno Keno merged commit 777810b into master Aug 17, 2018
@StefanKarpinski StefanKarpinski deleted the kf/reinterpretperf branch August 17, 2018 17:11
@StefanKarpinski
Copy link
Sponsor Member

This seems eligible for 1.0.1, right?

@Keno
Copy link
Member Author

Keno commented Aug 17, 2018

Yes, already has the backport label.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Must go faster
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Performance concerns with ReinterpretArray
5 participants