#170
The latest profile, taken while solving with EnsembleGPUKernel, raises some questions:
Some overheads are discussed here as potential improvements to EnsembleGPUKernel with Tsit5.
Converting the solution back to CPU arrays.
The reason for this overhead (converting to CPU arrays) is to give users access to something like sol[i].u[j], where i and j are indices. Indexing directly would cause scalar indexing on ts and us, which are CuArrays.
Possible workaround: leave it to the user to convert to CPU arrays if they need to index the solution.
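As a sketch of that workaround (assuming CUDA.jl and a solver that leaves ts and us on the device; the array shapes here are illustrative, not the actual solver output):

```julia
using CUDA

# Illustrative device-resident solver output: 11 saved steps, 3 trajectories.
ts = CuArray(collect(0.0f0:0.1f0:1.0f0))
us = CUDA.rand(Float32, 11, 3)

# Scalar indexing such as us[5, 2] on a CuArray costs a device-to-host
# copy per element (and errors under CUDA.allowscalar(false)).
# If sol[i].u[j]-style access is needed, convert once in bulk instead:
us_cpu = Array(us)   # single device-to-host transfer
ts_cpu = Array(ts)
us_cpu[5, 2]         # cheap CPU scalar indexing from here on
```

Under this scheme the library returns GPU arrays as-is, and the conversion cost is paid only by users who actually index the solution.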
Ensemble problem creation for parameter parallelism
The probs creation within DiffEqGPU seems to be necessary, but maybe it could be pulled out of DiffEqGPU? Currently it is done to adhere to DiffEqGPU's way of handling things. This overhead did not show up in the previous benchmarks because ps was built separately and passed to vectorized_solve.
Possible workaround: have the user create ps or u0s and pass them into DiffEqGPU, instead of only specifying the trajectories and having the library build the problems itself.
If we don't convert to CPU arrays, we get good performance (~2x faster); additionally, if we let the user build ps (instead of asking for the trajectories and building it ourselves), we'll probably reach the desired benchmark.
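A rough sketch of what that user-facing flow might look like (the call below is hypothetical, loosely modeled on the vectorized_solve mentioned above; it is not the actual DiffEqGPU API):

```julia
using CUDA, StaticArrays

# User builds the parameter batch up front: one static parameter vector
# per trajectory, uploaded to the GPU as a single array.
ps = CuArray([@SVector(rand(Float32, 3)) for _ in 1:1024])

# Hypothetical lower-level call: the library consumes the batch directly,
# skipping per-trajectory prob construction inside DiffEqGPU.
# ts, us = vectorized_solve(prob, ps, GPUTsit5(); dt = 0.01f0)

# ts and us stay on the device; convert with Array(us) only when indexing.
```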
Possible workaround: leave it to the user to convert to CPU arrays if they need to index the solution.
We can make that be an option (with a val type). But we can also
The probs creation within DiffEqGPU seems to be necessary, but maybe it could be pulled out of DiffEqGPU? Currently it is done to adhere to DiffEqGPU's way of handling things. This overhead did not show up in the previous benchmarks because ps was built separately and passed to vectorized_solve.
I think for that, we can have a documented lower-level API for people who really want to pull as much speed out of it as possible. On that note, we should write some real docs.