#170
The latest profile, taken while solving with EnsembleGPUKernel, raises some questions:
Some overheads are discussed here as potential improvements to EnsembleGPUKernel with Tsit5.
Converting the solution back to CPU arrays.
The reason for this overhead (converting to CPU arrays) is to give users access to something like sol[i].u[j], where i and j are indices. Indexing directly would cause scalar indexing on ts and us, which are CuArrays.
Possible workaround: leave it to the user to convert to CPU arrays if they need to index the solution.
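As a sketch of that workaround (assuming CUDA.jl and a solver that leaves ts and us on the device; the array shapes here are illustrative, not the actual solver output):

```julia
using CUDA

# Illustrative device-resident solver output: 11 saved steps, 3 trajectories.
ts = CuArray(collect(0.0f0:0.1f0:1.0f0))
us = CUDA.rand(Float32, 11, 3)

# Scalar indexing such as us[5, 2] on a CuArray costs a device-to-host
# copy per element (and errors under CUDA.allowscalar(false)).
# If sol[i].u[j]-style access is needed, convert once in bulk instead:
us_cpu = Array(us)   # single device-to-host transfer
ts_cpu = Array(ts)
us_cpu[5, 2]         # cheap CPU scalar indexing from here on
```

Under this scheme the library returns GPU arrays as-is, and the conversion cost is paid only by users who actually index the solution.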
Ensemble problem creation for parameter parallelism
The probs creation within DiffEqGPU seems to be necessary, but maybe it could be pulled out of DiffEqGPU? Currently it is done to adhere to DiffEqGPU's way of handling things. This overhead did not show up in the previous benchmarks because ps was built separately and passed to vectorized_solve.
Possible workaround: have the user create ps or u0s and pass them into DiffEqGPU, instead of only specifying the trajectories and having the library build the problems itself.
If we don't convert to CPU arrays, we get good performance (~2x faster); additionally, if we let the user build ps (instead of asking for the trajectories and building it ourselves), we'll probably reach the desired benchmark.
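A rough sketch of what that user-facing flow might look like (the call below is hypothetical, loosely modeled on the vectorized_solve mentioned above; it is not the actual DiffEqGPU API):

```julia
using CUDA, StaticArrays

# User builds the parameter batch up front: one static parameter vector
# per trajectory, uploaded to the GPU as a single array.
ps = CuArray([@SVector(rand(Float32, 3)) for _ in 1:1024])

# Hypothetical lower-level call: the library consumes the batch directly,
# skipping per-trajectory prob construction inside DiffEqGPU.
# ts, us = vectorized_solve(prob, ps, GPUTsit5(); dt = 0.01f0)

# ts and us stay on the device; convert with Array(us) only when indexing.
```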
Possible workaround: leave it to the user to convert to CPU arrays if they need to index the solution.
We can make that be an option (with a val type). But we can also
The probs creation within DiffEqGPU seems to be necessary, but maybe it could be pulled out of DiffEqGPU? Currently it is done to adhere to DiffEqGPU's way of handling things. This overhead did not show up in the previous benchmarks because ps was built separately and passed to vectorized_solve.
I think for that, we can have a documented lower-level API for people who really want to pull as much speed out of it as possible. On that note, we should write some real docs.