I am working on a project that involves a large network of small, identical ODE systems connected by linear equations, and I am wondering whether it is worth trying to modify DiffEqGPU.jl so that discrete and continuous callbacks can affect other sub-systems. I know that right now ensembleGPU is designed for running parameter and initial-condition searches in parallel, but I imagine it would be very useful to let these parallel systems interact. I know this would require synchronizing the threads at every step. The alternative, though, is essentially the same computation: instead of having one kernel that synchronizes itself, we would have to sync the threads and then launch a new kernel for each new step.
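To make the pattern concrete, here is a rough CPU-only sketch of what I mean by stepping all sub-systems in lock-step. It is plain Python for illustration only, not DiffEqGPU.jl code. The names `couple`, `local_rhs`, and `lockstep_euler`, the scalar decay ODE, and the fixed-step Euler update are all assumptions I made up for the example. The top of the loop body is where the per-step synchronization (or a fresh kernel launch) would go on a GPU.

```python
def couple(states, K):
    """Linear coupling: the input to sub-system i is sum_j K[i][j] * states[j]."""
    return [sum(k_ij * u_j for k_ij, u_j in zip(row, states)) for row in K]


def local_rhs(u, coupling_in, decay=1.0):
    """The same small ODE for every sub-system (hypothetical): du/dt = -decay*u + input."""
    return -decay * u + coupling_in


def lockstep_euler(states, K, dt, n_steps):
    """Advance all sub-systems together with fixed-step explicit Euler."""
    states = list(states)
    for _ in range(n_steps):
        # Sync point: every sub-system must have finished the previous step
        # before the coupling terms are evaluated.
        inputs = couple(states, K)
        # Independent per-sub-system work (one GPU thread each in the real setting).
        states = [u + dt * local_rhs(u, c) for u, c in zip(states, inputs)]
    return states


# Two sub-systems that feed half of their state into each other.
K = [[0.0, 0.5], [0.5, 0.0]]
final_states = lockstep_euler([1.0, 0.0], K, dt=0.01, n_steps=1000)
```

Whether the loop body lives inside one kernel with a grid-wide sync or is split into one launch per step, the data dependency is the same: the coupling terms need every sub-system's state from the previous step.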