On CUDA < 12, get_kernel() returns a cuFunction, which is tied to the context it was loaded in. Create an xfail test that launches the kernel on two distinct contexts.
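A rough sketch of what such a test could look like with pytest follows. Only `pytest.mark.xfail`, `pytest.fixture`, and `pytest.skip` are real APIs here. `CUDA_MAJOR`, `launch_on_each_context`, and the three fixtures are hypothetical placeholders, not names from this project, and need to be wired to the real compile, `get_kernel()`, and launch code.

```python
import pytest

# Assumption: the suite can determine the CUDA major version at import time.
# Hard-coded so the sketch stays self-contained; replace with a real query.
CUDA_MAJOR = 11


def launch_on_each_context(kernel, contexts, launch):
    """Launch the same kernel handle once per context, in order.

    `launch(kernel, ctx)` is a hypothetical callable that makes `ctx`
    current and launches `kernel` there.
    """
    for ctx in contexts:
        launch(kernel, ctx)


@pytest.fixture
def two_contexts():
    # Hypothetical fixture: should return two distinct contexts,
    # for example the primary contexts of two devices.
    pytest.skip("placeholder: provide two distinct CUDA contexts")


@pytest.fixture
def kernel_from_first_context(two_contexts):
    # Hypothetical fixture: should compile a trivial kernel and call
    # get_kernel() while the first context is current.
    pytest.skip("placeholder: compile and call get_kernel()")


@pytest.fixture
def launch():
    # Hypothetical fixture: should return a callable launch(kernel, ctx).
    pytest.skip("placeholder: provide a launch helper")


@pytest.mark.xfail(
    CUDA_MAJOR < 12,
    reason="get_kernel() returns a context-bound cuFunction on CUDA < 12",
    strict=True,  # an unexpected pass is reported as a failure
)
def test_launch_kernel_on_two_contexts(kernel_from_first_context, two_contexts, launch):
    # The second launch is the one expected to fail on CUDA < 12.
    launch_on_each_context(kernel_from_first_context, two_contexts, launch)
```

With `strict=True`, the test starts failing loudly once the kernel can be launched on both contexts, which signals that the marker can be removed. Until the placeholder fixtures are implemented, the test is reported as skipped.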