Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use Cstring when passing strings to C #550

Merged
merged 1 commit into from
Aug 14, 2019

Conversation

bicycle1885
Copy link
Member

@bicycle1885 bicycle1885 commented Apr 24, 2019

HDF5.jl currently uses Ptr{UInt8} instead of Cstring when it passes strings to HDF5 functions. This causes problems if we try to handle substrings such as:

using HDF5

# filepath will be a SubString object.
filepath, grouppath = split("test/tt1.h5:/x", ':')

# This is OK.
file = h5open(String(filepath))
close(file)

# But this is not.
file = h5open(filepath)
close(file)

This fix is a minimal patch to fix the problem demonstrated above. I think we need a more thorough investigation of the wrapper functions to completely fix that. If you like this idea, I will add more patches here.

@tknopp
Copy link
Contributor

tknopp commented Apr 24, 2019

LGTM: Could you add a test?

@bicycle1885
Copy link
Member Author

Sure. I will add unit tests. This is just a prototype to show my approach to fix the problem.

@musm
Copy link
Member

musm commented Apr 24, 2019

would also be great to see where else in the library this change can be made

@bicycle1885
Copy link
Member Author

Another fix would be to normalize string types to String in high-level functions. But it still makes it possible to call low-level functions in inappropriate ways. So, I think this is the safest and fundamental fix to the problem.

@musm
Copy link
Member

musm commented Apr 24, 2019

I don't think that is what we should do (i.e. normalization). The approach here, by fixing it in the low level functions is the most correct way to go about this.

@bicycle1885
Copy link
Member Author

Agreed. Anyway, we should use Cstring instead of Ptr{UInt8} for GC safety.

@yuyichao
Copy link
Contributor

The two has no difference in GC safety.

@bicycle1885
Copy link
Member Author

Oh, I don't understand Julia's GC rooting then. I thought that because Cstring generates a GC frame while Ptr{UInt8} does not like this:

julia> f(x) = ccall(:getenv, Cstring, (Cstring,), x)
f (generic function with 1 method)

julia> code_llvm(f, (String,))

;  @ REPL[1]:1 within `f'
define i64 @julia_f_12244(%jl_value_t addrspace(10)* nonnull) {
top:
  %1 = alloca %jl_value_t addrspace(10)*, i32 2
  %gcframe = alloca %jl_value_t addrspace(10)*, i32 3
  %2 = bitcast %jl_value_t addrspace(10)** %gcframe to i8*
  call void @llvm.memset.p0i8.i32(i8* %2, i8 0, i32 24, i32 0, i1 false)
  %thread_ptr = call i8* asm "movq %fs:0, $0", "=r"()
  %ptls_i8 = getelementptr i8, i8* %thread_ptr, i64 -15552
...

julia> f(x) = ccall(:getenv, Cstring, (Ptr{UInt8},), x)
f (generic function with 1 method)

julia> code_llvm(f, (String,))

;  @ REPL[3]:1 within `f'
define i64 @julia_f_12252(%jl_value_t addrspace(10)* nonnull) {
top:
; ┌ @ pointer.jl:59 within `unsafe_convert'
; │┌ @ pointer.jl:143 within `pointer_from_objref'
    %1 = addrspacecast %jl_value_t addrspace(10)* %0 to %jl_value_t addrspace(11)*
    %2 = addrspacecast %jl_value_t addrspace(11)* %1 to %jl_value_t*
; │└
; │┌ @ pointer.jl:155 within `+'
    %3 = bitcast %jl_value_t* %2 to i8*
    %4 = getelementptr i8, i8* %3, i64 8
    %5 = ptrtoint i8* %4 to i64
; └└
  %6 = call i64 inttoptr (i64 140308921968352 to i64 (i64)*)(i64 %5)
  ret i64 %6
}

@yuyichao
Copy link
Contributor

Oh, I don't understand Julia's GC rooting then. I thought that because Cstring generates a GC frame while Ptr{UInt8} does not like this:

Never use any of the code_* if you can't understand all of the output, especially not llvm and native.................................................................................

Just look at code_typed, you are just looking at the much less efficient code generated to scan the embeded NUL character in a string. x never need to be rooted in either case and it does NOT take that much code to root something.

@bicycle1885
Copy link
Member Author

Okay.

@musm
Copy link
Member

musm commented Apr 25, 2019

@bicycle1885 Tests would be great,
and yeah, it seems like most of the wrappers that take a pathname would benefit from similar change

@musm
Copy link
Member

musm commented Jun 12, 2019

bump

@musm musm merged commit bdf4625 into JuliaIO:master Aug 14, 2019
@musm
Copy link
Member

musm commented Aug 14, 2019

this is strictly better than what we have so merging. Hopefully we can address the improvements brought up in this PR discussion in a separate PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants