[RFC] c callbacks from Julia (proof-of-concept) #1208

wants to merge 2 commits into


4 participants

vtjnash commented Aug 24, 2012

reference issue #1096

This is a proof of concept showing more complete interop between Julia and C code: it allows Julia callbacks to be created and passed to arbitrary C code. Several optimizations are still possible (e.g. I would like to make this an intrinsic so that it doesn't have to generate code at run time), but the intended interface would stay the same.

Here's some example code that I've been using for testing:

ccallback(f::Function,t::Type,a::Tuple) = ccall(:ccallback, Ptr{Void}, (Function,Type,Tuple,Ptr{Void}), f, t, a, C_NULL)

f64 = ccall(:ccallback, Ptr{Void}, (Function,Type,Tuple,Nothing), float64, Float64, (Int,), nothing)
ccall(f64, Float64, (Int,), 1)

sh = ccall(:ccallback, Ptr{Void}, (Function,Type,Tuple,Nothing), show, Any, (Int,), nothing)
ccall(sh, Nothing, (Int,), 1)

sorti(a,b) = (b[] - a[])
qsort{T}(a::Vector{T}, f::Function) = ccall(:qsort, Void, (Ptr{T}, Int, Int, Ptr{Void}), a, length(a), sizeof(T), ccallback(f, Int32, (Ptr{T},Ptr{T})))
x = [3,12,5,25,2]

add_one = ccall(:ccallback, Ptr{Void}, (Function,Type,Tuple,Nothing), square, Int, (Int,), nothing)
x = ccall(:compute_something, Uint, (Ptr{Void},), add_one)
# in C
#extern "C" DLLEXPORT long compute_something(long (*x)(long)) {
#    return x(-2)+1;
#}

Note that the garbage collector is not fully aware of how these callbacks operate. Do not allow a Julia object passed as a parameter to go out of scope while a callback may still use it.
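For readers less familiar with what the C side expects, here is a minimal sketch of the two callback shapes used in the example above, written entirely in C. It is an illustration rather than code from this pull request: `square` and `demo_callbacks` are hypothetical stand-ins for the Julia function behind the pointer, and the element type is plain `int` for brevity.

```c
#include <assert.h>
#include <stdlib.h>

/* C counterpart of the Julia comparator sorti(a,b) = b[] - a[].
   qsort passes two pointers to elements; the callback dereferences them.
   Returning b - a sorts in descending order. Subtraction is fine for the
   small values used here but can overflow for arbitrary ints. */
static int sorti(const void *a, const void *b)
{
    return *(const int *)b - *(const int *)a;
}

/* The compute_something snippet from the example, written as plain C. */
long compute_something(long (*x)(long))
{
    return x(-2) + 1;
}

/* Hypothetical stand-in for the Julia function wrapped by ccallback. */
static long square(long v)
{
    return v * v;
}

/* Exercises both callbacks the way the C library would call them. */
static void demo_callbacks(void)
{
    int x[] = {3, 12, 5, 25, 2};
    qsort(x, sizeof x / sizeof x[0], sizeof x[0], sorti);
    assert(x[0] == 25 && x[4] == 2);          /* descending order */
    assert(compute_something(square) == 5);   /* (-2)^2 + 1 */
}
```

The function-pointer types here (`int (*)(const void *, const void *)` and `long (*)(long)`) are what the return type and argument tuple given to `ccallback` have to match.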

What's with this changing arrayref? It's not obvious to me why that would be affected. Otherwise this is a cool proof of concept.


vtjnash commented Aug 24, 2012

For convenience and efficiency, I added a definition to ref / arrayref that made dereferencing pointers easier. It wasn't strictly related, but it made calling qsort easier (and I've wanted to be able to more easily / natively dereference pointers).

So if a and b are pointers to numbers, I could rewrite this function

sorti(a,b) = (ai = pointer_to_array(a,(1,))[1]; bi = pointer_to_array(b,(1,))[1]; bi - ai)

more clearly as

sorti(a,b) = (b[] - a[])

without the unnecessary creation of an array (which wasn't really providing any helpful bounds checking anyway).

It would also be relatively straightforward to write the inverse operation (assign) to make writing to C global variables easier:

convert(Ptr{Ptr{Uint8}}, dlsym(jl_lib, prompt_string))[] = "julia$ ".data

(although, as written, this would have issues with the string being garbage collected)
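In C terms, the assignment above is a single pointer-sized store through a `char **`. The sketch below shows that store and one possible way around the lifetime problem, namely copying the bytes into memory the C side owns. It is an assumption-laden illustration: the global `prompt_string` is declared locally instead of being looked up with `dlsym`, and `pointer_set`, `set_prompt_copy`, and `demo_prompt` are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical C global standing in for the prompt_string symbol. */
static const char *prompt_string = "> ";

/* Equivalent of `p[] = value` for a pointer-to-pointer: one store. */
static void pointer_set(const char **slot, const char *value)
{
    *slot = value;
}

/* Copies the bytes into malloc'd memory before storing the pointer, so
   the stored value stays valid even if the caller's buffer is freed or
   collected. Returns 0 on success, -1 if allocation fails. The caller
   must free a previous heap-allocated value itself. */
static int set_prompt_copy(const char **slot, const char *value)
{
    size_t n = strlen(value) + 1;
    char *copy = malloc(n);
    if (copy == NULL)
        return -1;
    memcpy(copy, value, n);
    pointer_set(slot, copy);
    return 0;
}

/* Shows that the global survives changes to the source buffer. */
static void demo_prompt(void)
{
    char buf[] = "julia$ ";
    assert(set_prompt_copy(&prompt_string, buf) == 0);
    buf[0] = '?';                                /* source buffer changes */
    assert(strcmp(prompt_string, "julia$ ") == 0);
    free((char *)prompt_string);                 /* release our copy */
    prompt_string = "> ";                        /* restore the default */
}
```

Calling `pointer_set(&prompt_string, buf)` directly would reproduce the hazard described above: the global would point into `buf`, whose lifetime the C side does not control.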

This is absolutely the wrong place to put this. This is a primitive accessor function for a particular data type.


vtjnash replied Aug 24, 2012

Do you want me to put together a separate pull request with this done right (with new primitive functions jl_pointerref and jl_pointerset)? I put it here as a quick hack while I demonstrated ccallback (expecting that almost everything would need to be rewritten before this could be merged anyways).

It feels better to me to make these compiler intrinsics, since they naturally correspond to load and store instructions. It will probably also mean less code to write. Actually, I was close to scrapping jl_f_arrayref and making it an intrinsic too, but it is useful for bootstrapping.


vtjnash replied Aug 24, 2012

Other than not needing two versions of the functions, what's the difference between an intrinsic and a known_call?

An intrinsic corresponds directly to llvm code, so the types of the arguments must be known at compile time. The jl_f_* functions are there to allow certain things to be done with run-time checks.
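A rough C analogy for this distinction, under the assumption that a pointer dereference is the operation in question: the "intrinsic" version has its element type fixed at compile time and is nothing but a load, while the run-time version has to inspect a type tag first. The tag enum, the `boxed_ptr` struct, and all function names below are hypothetical and do not correspond to Julia's actual internals.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* "Intrinsic" flavor: the type is known at compile time, so this
   compiles to a single load with no checks. */
static inline int64_t pointerref_i64(const int64_t *p)
{
    return *p;
}

/* Hypothetical run-time type tags for a boxed pointer. */
typedef enum { TAG_INT64, TAG_FLOAT64 } value_tag;

typedef struct {
    value_tag tag;
    const void *ptr;
} boxed_ptr;

/* Run-time-checked flavor: the type is only known when the call
   happens, so the tag is tested before loading.
   Returns 0 on success, -1 on a null pointer or type mismatch. */
static int checked_pointerref_i64(boxed_ptr b, int64_t *out)
{
    if (b.ptr == NULL || b.tag != TAG_INT64)
        return -1;
    *out = *(const int64_t *)b.ptr;
    return 0;
}

/* Exercises both flavors on the same storage. */
static void demo_pointerref(void)
{
    int64_t v = 42, out = 0;
    boxed_ptr good = { TAG_INT64, &v };
    assert(pointerref_i64(&v) == 42);
    assert(checked_pointerref_i64(good, &out) == 0 && out == 42);
}
```

The checked version costs a branch per call but can report a mismatch, which is the kind of behavior a run-time function can offer and a bare load cannot.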


JeffBezanson commented Aug 24, 2012

Cool stuff! A couple comments on how to make this amazing:

  • It shouldn't depend on the address of the function, but indirect through the llvm function name so the code could potentially be saved.
  • The given argument types should be used to look up a specialized method with jl_get_specialization so the overhead of dynamic dispatch can be avoided.
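Both suggestions can be pictured with a small C sketch: the callback refers to its target by symbolic name, and the lookup is done once when the callback is created instead of on every call. This is only an analogy under stated assumptions. The method table, `lookup_method`, `make_callback`, and the lookup counter are hypothetical and are not how jl_get_specialization or LLVM function references actually work.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef long (*int_method)(long);

static long square_long(long v) { return v * v; }
static long negate_long(long v) { return -v; }

/* Hypothetical method table keyed by symbolic name, not by address. */
static const struct { const char *name; int_method fn; } method_table[] = {
    { "square", square_long },
    { "negate", negate_long },
};

static int lookup_count = 0;   /* counts how often the table is searched */

/* Resolves a name to a concrete function, or NULL if it is unknown. */
static int_method lookup_method(const char *name)
{
    lookup_count++;
    for (size_t i = 0; i < sizeof method_table / sizeof method_table[0]; i++)
        if (strcmp(method_table[i].name, name) == 0)
            return method_table[i].fn;
    return NULL;
}

/* Generic path: pays for a lookup on every single call. */
static long call_generic(const char *name, long arg)
{
    int_method fn = lookup_method(name);
    return fn ? fn(arg) : 0;
}

/* Specialized path: the name is kept for reference, and the function
   is resolved once when the callback is created. */
typedef struct { const char *name; int_method fn; } callback;

static callback make_callback(const char *name)
{
    callback cb = { name, lookup_method(name) };
    return cb;
}
```

With this layout, `call_generic("square", 3)` searches the table each time, while a `callback` made once with `make_callback("square")` can be invoked through `cb.fn` repeatedly without further lookups.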

vtjnash commented Aug 24, 2012

  • I'm not quite clear on how much the serializer can save of an llvm code block. Somewhere along the line, it needs to be converted into a pointer (possibly in ccall) that can be passed to arbitrary C code. I think it should be converted to a known_call, like ccall, to take better advantage of compiler optimizations (and earlier code generation). But then I'm really not sure what the return value would be.
  • Cool, thanks. I'll look into jl_get_specialization. I definitely want to avoid that overhead; I used jlapplygeneric_func while I tried to figure out how all the machinery interacted. (I started to make a very sparse jl_codectx_t in case this was needed.) As a ccall, this runs rather late in the code generation phase, so it seemed to lack access to some of the llvm information used by emit_call and the like. I think I just don't know llvm well enough right now, but I also think this could be generated at compile time, like a ccall.

JeffBezanson commented Aug 24, 2012

Currently our serializer doesn't save any llvm bitcode or native code at all. But in the future when we save bitcode or native code there will need to be symbolic references instead of bare addresses, which can be done I believe by using an llvm Function* as a value. Then I think llvm will take care of writing all the references properly.


ihnorton commented Oct 27, 2012

Any plan to finish/merge this pull? Or is a different approach intended?


JeffBezanson commented Nov 2, 2012

I have an approach in mind and will get to it soon.
