Introduce the DPC++ and LevelZero device driver #486

Open
wants to merge 3 commits into
base: master

Commits on Feb 13, 2023

  1. Since PR ICLDisco#482 we don't copy these files in the build directory, so we must point to the source ones
    therault committed Feb 13, 2023
    3b578c3
  2. Introduce the DPC++ and LevelZero device driver, enable this device in DTD and PTG.
    
    This branch is based on common_gpu and should be merged only after common_gpu.
    
    Add a new level_zero device (WIP)
    
     - copy device_cuda in device_level_zero and rename things
     - module_init and module_fini for level_zero
    
    Need to factorize a little bit more.
    
    Factorizing (need to do it in base)
    
    Port above new common
    
    Add DPC++ to the loop...
    
      - Add multiple CMake logic files and commands
      - jdf2c.c now generates dpcpp output files when needed
      - make DEV_DPCPP be an alias to DEV_LEVEL_ZERO
      - Command Lists for I/O (streams of id 0 and 1) are still immediate
      - Command Lists for computations (streams of id >= 2) are now normal lists connected to a queue
        that queue exists as a compute level-zero queue and as a DPC++ queue
      - Missing compilation logic to compile generated dpc++ code and link it with the target binary
    
    Risk: it is unclear whether the user can still push commands / events to the command list after it is closed,
    yet closing it is necessary to force the commands to be pushed to the queue. I might need to create a
    new command list after each close, and attach the command list to the event for garbage collection.
    
    Adapt findlevel-zero.cmake to support systems where pkg-config is broken
    
    Re-enable Level Zero test; update to latest level zero / oneAPI API
    
    Update wrapper to allow testing both CUDA and Level Zero with new Level Zero update
    
    use_cuda / use_cuda_index have been renamed to follow proper naming scheme; do the same for level_zero
    
    Try to automate DPCPP generated code compilation; fix ordinal of memory allocation request in wrapper.
    
    Command Lists need to be sent to the Command Queue if they are not created immediate (and they cannot be immediate if we want to get their Command Queue, which is necessary for the DPC++ interface)
    
    Typo and multiple CMake fixes to make CMake link with DPCPP generated files
    
    Add a standalone test for Level Zero capability and integration with DPC++ kernels
    
    Rebase the entire Level Zero driver based on the subsystem test
    
    Buffer interface is not required. We can use the USM oneMKL interface; it seems to work OK. Need to check performance.
    
    Apparently we cannot mix immediate and non-immediate command lists, or at least doing so makes the passing of command queues unreliable
    
    There is an exception in data.c in how we handle GPU copies; it must be ported to Level Zero too.
    
    The Level Zero runtime has an atexit procedure to delete command queues, and this seems to conflict with our own actions to delete the command queues...
    
    Porting of the DTD GEMM test to Level Zero
    
    NULL is not a valid MPI datatype when compiling with a clone of MPICH. The value doesn't matter in this case, just cast
    
    Manage LEVEL_ZERO devices in DTD
    
    Accept LEVEL_ZERO devices in the PTG generated code
    
    Some fixes in device level_zero
    
    Temp fix for termination detection -- tag size must be made portable. TODO!
    
    Support LEVEL_ZERO devices in the DSL tests
    
    Fix the subsystem test. Need to backport fixes in the MCA device
    
    Fully functional sketch for level zero
    
    Use level-zero fences to synchronize command lists and command queues, because command lists (or work) submitted to the command queues by SYCL (typically oneMKL) can complete in parallel with events belonging to other command lists.
    
    Define the set of globals in DPC++ code after the includes happen to avoid polluting their namespace; clean up some unused variables
    
    Install LevelZero driver files; set up the environment to find the same LevelZero library as at compile time in PaRSECConfig.cmake
    therault committed Feb 13, 2023
    ab86fa3
  3. 60ed5a1