Kokkos

Adapt toolchain (use spack)
Adapt Octotiger to compile with Kokkos (split main file, host-only blacklist, patch nvcc_wrapper, ...)
Remodel GPU kernel buffer management (create memory pools for arbitrary host/GPU/Kokkos data to avoid device mallocs)
Remodel GPU execution management (go from thread_local cuda_streams to executor pools to avoid repeated stream creation)
Adapt the CPU/GPU launch interface for the pools
Remove Vc from headers
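The two pooling ideas above (reuse buffers instead of calling a device malloc per kernel, and hand out a fixed set of executors instead of creating streams per thread) can be sketched in plain C++ as follows. This is a minimal illustration under stated assumptions, not the Octotiger implementation: buffer_pool and executor_pool are hypothetical names, std::malloc stands in for a device allocator such as cudaMalloc, and the Executor template parameter stands in for whatever wraps a CUDA stream or Kokkos execution space instance.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdlib>
#include <map>
#include <mutex>
#include <vector>

// Hypothetical recycling buffer pool: released buffers go onto a free list
// keyed by size and are handed out again, so repeated kernel launches with the
// same buffer sizes stop allocating after the first round.
// std::malloc/std::free stand in for a device allocator (e.g. cudaMalloc/cudaFree).
class buffer_pool {
public:
    ~buffer_pool() {
        for (auto& entry : free_lists_)
            for (void* p : entry.second) std::free(p);
    }

    void* get(std::size_t bytes) {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = free_lists_.find(bytes);
        if (it != free_lists_.end() && !it->second.empty()) {
            void* p = it->second.back();  // reuse a previously released buffer
            it->second.pop_back();
            return p;
        }
        ++allocations_;                   // only reached when nothing is cached
        return std::malloc(bytes);
    }

    // Returns the buffer to the pool instead of freeing it.
    void release(void* p, std::size_t bytes) {
        std::lock_guard<std::mutex> lock(mutex_);
        free_lists_[bytes].push_back(p);
    }

    std::size_t allocations() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return allocations_;
    }

private:
    mutable std::mutex mutex_;
    std::map<std::size_t, std::vector<void*>> free_lists_;
    std::size_t allocations_ = 0;
};

// Hypothetical executor pool: a fixed number of executors (e.g. one per
// stream) is created once up front and handed out round-robin, replacing
// per-thread (thread_local) stream creation.
template <typename Executor>
class executor_pool {
public:
    explicit executor_pool(std::size_t n) : executors_(n) {}

    Executor& next() {
        std::size_t i = counter_.fetch_add(1, std::memory_order_relaxed);
        return executors_[i % executors_.size()];
    }

    std::size_t size() const { return executors_.size(); }

private:
    std::vector<Executor> executors_;
    std::atomic<std::size_t> counter_{0};
};
```

Round-robin selection is only one possible policy; a real pool might instead pick the least-loaded executor, and a device allocator would also need a per-device free list.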

Remove all thread_local workarounds in the gravity module
Adapt the current CUDA implementation to work with the memory and executor pools (keeps the existing CUDA kernels working with the rest of the code)
Remove old CUDA management
Create a Kokkos kernel for the monopole interactions
Adapt Mikael's Kokkos executors and create a unified interface for launching Kokkos kernels on the CPU and GPU (~1 week including cleanup)
Create a Kokkos kernel for the multipole interactions (~1-2 weeks)
Evaluate Kokkos vs. CUDA performance (concern: needless fencing in Kokkos) - In Progress
Bonus: Test on AMD GPUs - In Progress
Merge master into kokkos branch

Switch the data structures over to the flux way of doing things
Refactor reconstruct to make better use of the available parallelism (at least 2 weeks; given my experience porting the last reconstruct, it is better to plan for extra time)
Create a basic reconstruct GPU kernel
Interface the reconstruct kernel with the existing GPU infrastructure (see part 2)
Evaluate need for optimizations

Provide feedback