Gregor Daiß edited this page Oct 9, 2020 · 14 revisions

Part 1 Prepare Octo-Tiger for Kokkos:

  • Adapt toolchain (use spack)
  • Adapt Octo-Tiger to compile with Kokkos (split main file, host-only blacklist, patch nvcc_wrapper, ...)
  • Remodel GPU kernel buffer management (create memory pools for arbitrary host/gpu/kokkos data - avoid device malloc)
  • Remodel GPU execution management (go from thread_local cuda_streams to executor pools - avoid stream creations)
  • Adapt CPU/GPU launch interface for the pools
  • Remove Vc from headers
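The buffer-management item above (memory pools that avoid repeated device malloc) can be illustrated with a minimal sketch. This is not Octo-Tiger's actual implementation: the class name buffer_pool and its methods are hypothetical, plain host heap memory stands in for device memory, and the sketch is not thread-safe. It only shows the recycling idea: a released buffer is kept and handed out again for the next request of the same size.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical recycling pool (illustration only, not the Octo-Tiger API).
// Host heap allocations stand in for expensive device allocations.
template <typename T>
class buffer_pool {
public:
    // Return a buffer of exactly n elements, reusing a cached one if available.
    std::unique_ptr<T[]> acquire(std::size_t n) {
        auto it = free_.find(n);
        if (it != free_.end() && !it->second.empty()) {
            std::unique_ptr<T[]> buf = std::move(it->second.back());
            it->second.pop_back();
            ++reused_;
            return buf;
        }
        ++allocated_;  // only this path performs a real allocation
        return std::unique_ptr<T[]>(new T[n]);
    }

    // Hand a buffer back to the pool instead of freeing it.
    void release(std::size_t n, std::unique_ptr<T[]> buf) {
        free_[n].push_back(std::move(buf));
    }

    std::size_t allocations() const { return allocated_; }
    std::size_t reuses() const { return reused_; }

private:
    // Free lists keyed by element count.
    std::map<std::size_t, std::vector<std::unique_ptr<T[]>>> free_;
    std::size_t allocated_ = 0;
    std::size_t reused_ = 0;
};
```

A kernel launch would acquire its buffers at the start and release them once the kernel has finished, so that steady-state execution performs no new allocations.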

Part 2 Port Gravity Module to Kokkos

  • Remove all thread_local workarounds in the gravity module
  • Adapt current CUDA implementation to work with the memory and executor pools (keeps those working with the rest)
  • Remove old CUDA management
  • Create Kokkos Kernel for the Monopole Interactions
  • Adapt Mikael's Kokkos executors and create a unified interface for launching Kokkos Kernels on the CPU and GPU (~ 1 week including cleanup)
  • Create Kokkos Kernel for the Multipole Interactions (~ 1-2 weeks)
  • Evaluate Kokkos vs CUDA performance (Concern: needless fencing in Kokkos) - In Progress
  • Bonus: Test on AMD - In Progress
  • Merge master into kokkos branch
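The executor pools mentioned in Parts 1 and 2 (replacing thread_local CUDA streams to avoid stream creation) can be sketched as follows. This is an assumption about the general pattern, not the project's code: fake_executor stands in for a real stream or Kokkos executor, and executor_pool and next() are hypothetical names. All executors are created once up front and then handed out round-robin.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a GPU stream or Kokkos executor.
struct fake_executor {
    std::size_t id;
};

// Hypothetical pool: creates a fixed number of executors once,
// then hands them out round-robin without creating new ones.
class executor_pool {
public:
    explicit executor_pool(std::size_t n) {
        assert(n > 0);
        executors_.reserve(n);
        for (std::size_t i = 0; i < n; ++i) {
            executors_.push_back(fake_executor{i});
        }
    }

    // Pick the next executor; the atomic counter lets several
    // threads call this concurrently.
    fake_executor& next() {
        std::size_t i =
            counter_.fetch_add(1, std::memory_order_relaxed) % executors_.size();
        return executors_[i];
    }

    std::size_t size() const { return executors_.size(); }

private:
    std::vector<fake_executor> executors_;
    std::atomic<std::size_t> counter_{0};
};
```

Round-robin is only the simplest selection policy; a real pool might instead prefer the executor with the least outstanding work.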

Part 3 Port flux method

  • Refactor flux scalar single core
  • Refactor flux to use explicit SIMD
  • Make data structure more GPU-friendly
  • Create first GPU (CUDA?) kernel
  • Integrate GPU kernel into existing launch infrastructure
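One common way to make a data structure more suitable for both explicit SIMD and GPUs is a structure-of-arrays layout. Whether that is the change planned here is an assumption, and the field names (rho, sx, egas) and types below are hypothetical. The sketch converts an array of per-cell structs into one contiguous array per field, so that neighbouring threads or SIMD lanes read neighbouring memory.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Array-of-structs: the fields of one cell sit next to each other.
// Field names are hypothetical.
struct cell_aos {
    double rho;
    double sx;
    double egas;
};

// Struct-of-arrays: each field is stored contiguously across all cells.
struct cells_soa {
    std::vector<double> rho;
    std::vector<double> sx;
    std::vector<double> egas;

    explicit cells_soa(std::size_t n) : rho(n), sx(n), egas(n) {}
    std::size_t size() const { return rho.size(); }
};

// Copy an AoS container into the SoA layout.
cells_soa to_soa(const std::vector<cell_aos>& in) {
    cells_soa out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        out.rho[i] = in[i].rho;
        out.sx[i] = in[i].sx;
        out.egas[i] = in[i].egas;
    }
    return out;
}
```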

Part 4 Port Reconstruct method

  • Switch data structure over to the flux way of doing things
  • Refactor reconstruct so the available parallelism is easier to use (at least 2 weeks, given my experience trying to port the last reconstruct; better to plan for extra time)
  • Create Basic reconstruct GPU kernel
  • Interface reconstruct kernel with the existing GPU infrastructure (see Part 2)
  • Evaluate need for optimizations