I would like advice on how to proceed with an enhancement to Julia and GPUArrays. I'd like to figure out whether BigFloat performance can be sped up, perhaps starting with a fixed-precision subset of BigFloat such as Float128. My first thought is to rewrite the MPFR routines directly in Julia to avoid the ccall overhead. It would be even better if the Julia implementation then also "just worked" with GPUArrays. Any advice on whether this is a worthwhile endeavor would be greatly appreciated.