
Gap analysis for Memory Management #151

Closed
8 tasks
PokhodenkoSA opened this issue Jan 14, 2021 · 5 comments · Fixed by #201
PokhodenkoSA commented Jan 14, 2021

Study the CUDA Memory Management section of the Numba documentation and document, with examples, the equivalent features in numba-dppy. Identify missing features, e.g. device arrays.

- Data transfer

  | CUDA | DPPY |
  | --- | --- |
  | `numba.cuda.device_array` | - |
  | `numba.cuda.device_array_like` | - |
  | `numba.cuda.to_device` | - |
  | `numba.cuda.as_cuda_array` (create a `DeviceNDArray` from any object that implements the CUDA array interface) | - |
  | `numba.cuda.is_cuda_array` | - |
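The last two entries rely on the `__cuda_array_interface__` protocol: `is_cuda_array` checks for the attribute, and `as_cuda_array` wraps any object exposing it without copying. A minimal sketch of that check (the `FakeDeviceBuffer` class and `is_cuda_array_like` helper are hypothetical illustrations, not Numba's implementation):

```python
def is_cuda_array_like(obj):
    """Sketch of the check numba.cuda.is_cuda_array performs:
    any object exposing __cuda_array_interface__ qualifies."""
    return hasattr(obj, "__cuda_array_interface__")

class FakeDeviceBuffer:
    """Hypothetical object exposing the CUDA array interface."""
    def __init__(self, shape, typestr, ptr):
        self.__cuda_array_interface__ = {
            "shape": shape,
            "typestr": typestr,
            "data": (ptr, False),  # (device pointer, read-only flag)
            "version": 3,
        }

buf = FakeDeviceBuffer((4, 4), "<f4", 0xDEAD)
print(is_cuda_array_like(buf))        # True
print(is_cuda_array_like([1, 2, 3]))  # False: plain lists don't expose it
```

A SYCL counterpart would need an analogous zero-copy protocol (e.g. something like dpctl's USM-based array interface) before `as_cuda_array`/`is_cuda_array` equivalents make sense in numba-dppy.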
- Device arrays in CUDA

  | CUDA | DPPY |
  | --- | --- |
  | `numba.cuda.cudadrv.devicearray.DeviceNDArray` | - |
  | `copy_to_host` | - |
  | `is_c_contiguous` | - |
  | `is_f_contiguous` | - |
  | `ravel` | - |
  | `reshape` | - |
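The `DeviceNDArray` methods above mirror the semantics of their NumPy counterparts; the host-side NumPy equivalents shown below behave the same way (a sketch run on the host, not on a device):

```python
import numpy as np

a = np.arange(12, dtype=np.float32).reshape(3, 4)

# Contiguity queries, analogous to is_c_contiguous / is_f_contiguous.
print(a.flags["C_CONTIGUOUS"])    # True: row-major layout
print(a.T.flags["F_CONTIGUOUS"])  # True: the transpose is column-major

# Flattening and reshaping, analogous to DeviceNDArray.ravel / .reshape.
flat = a.ravel()
print(flat.shape)                 # (12,)
b = a.reshape(4, 3)
print(b.shape)                    # (4, 3)
```

A DPPY device-array type would need the same set of queries and views to close this gap.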
- Pinned memory in CUDA / no explicit mechanism to request pinned memory in SYCL

  There are generally two ways in which host memory can be allocated:

  - When the `cl::sycl::property::buffer::use_host_pointer` property is not used, the SYCL runtime allocates host memory when required. It uses an implementation-specific mechanism, which may attempt to request pinned memory.
  - If the `cl::sycl::property::buffer::use_host_pointer` property is used, the SYCL runtime does not allocate host memory and instead uses the pointer provided when the buffer is constructed. In this case it is the user's responsibility to ensure that any requirements for pinned memory allocation are satisfied. Users can manually allocate pinned memory on the host and hand it over to the SYCL implementation. This usually involves allocating host memory with a suitable alignment and size multiple, and can sometimes be managed manually with OS-specific operations such as `mmap` and `munmap`.

  | CUDA | DPPY |
  | --- | --- |
  | `numba.cuda.pinned` | - |
  | `numba.cuda.pinned_array` | - |
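The manual route described above needs page-aligned host memory. A minimal sketch of obtaining such memory in Python: anonymous `mmap` mappings are page-aligned, so the resulting pointer could be handed to a SYCL buffer via `use_host_pointer` (actual pinning remains OS- and driver-specific, and is not performed here):

```python
import mmap
import numpy as np

# Allocate 1 MiB of anonymous, page-aligned host memory.
nbytes = 1 << 20                  # a multiple of the page size
buf = mmap.mmap(-1, nbytes)       # anonymous mapping, page-aligned
arr = np.frombuffer(buf, dtype=np.float32)

# The start address is aligned to the OS page size.
print(arr.ctypes.data % mmap.PAGESIZE)  # 0
print(arr.nbytes == nbytes)             # True
```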
- Streams in CUDA / Queue in SYCL

In a similar fashion to CUDA streams, SYCL queues submit command groups for execution asynchronously. However, SYCL is a higher-level programming model, and data transfer operations are implicitly deduced from the dependencies of the kernels submitted to any queue. Furthermore, SYCL queues can map to multiple OpenCL queues, enabling transparent overlapping of data-transfer and kernel execution. The SYCL runtime handles the execution order of the different command groups (kernel + dependencies) automatically across multiple queues in different devices.

  | CUDA | DPPY |
  | --- | --- |
  | `numba.cuda.stream` | `queue` |
  | `numba.cuda.default_stream` | - |
  | `numba.cuda.legacy_default_stream` | - |
  | `numba.cuda.per_thread_default_stream` | - |
  | `numba.cuda.external_stream` | - |
  | `numba.cuda.cudadrv.driver.Stream` | - |
  | `auto_synchronize` | - |
  | `synchronize` | `event` |
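The implicit dependency deduction described above can be modeled with a toy scheduler: each command group declares the buffers it reads and writes, and the runtime derives the execution order from writer-before-reader dependencies. All names below are hypothetical; this is not the dpctl/SYCL API:

```python
class ToyQueue:
    """Toy model of SYCL-style dependency deduction between
    command groups submitted to a queue."""

    def __init__(self):
        self.groups = []

    def submit(self, name, reads=(), writes=()):
        # Depend on every earlier group that wrote a buffer we touch.
        touched = set(reads) | set(writes)
        deps = [g for g in self.groups if set(g["writes"]) & touched]
        group = {"name": name, "writes": list(writes), "deps": deps}
        self.groups.append(group)
        return group

q = ToyQueue()
h2d = q.submit("copy host->device", writes=["A"])
kern = q.submit("kernel", reads=["A"], writes=["B"])
d2h = q.submit("copy device->host", reads=["B"])

# The kernel waits on the upload; the download waits on the kernel.
print([d["name"] for d in kern["deps"]])  # ['copy host->device']
print([d["name"] for d in d2h["deps"]])   # ['kernel']
```

Because ordering is derived from data dependencies rather than from explicit stream handles, a DPPY `queue` does not need per-thread or legacy default-stream variants.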
- Per-block shared memory and thread synchronization in CUDA / Local memory in SYCL

  | CUDA | DPPY |
  | --- | --- |
  | `numba.cuda.shared.array` | `dppy.local.static_alloc` |
  | `numba.cuda.syncthreads` | `dppy.barrier` |
- Per-thread local memory in CUDA / Private memory in SYCL

  | CUDA | DPPY |
  | --- | --- |
  | `numba.cuda.local.array` | - |
- Constant memory in CUDA / Constant memory in SYCL

  | CUDA | DPPY |
  | --- | --- |
  | `numba.cuda.const.array_like` | - |
- Deallocation Behavior

  https://numba.pydata.org/numba-doc/dev/cuda/external-memory.html#cuda-emm-plugin

  | CUDA | DPPY |
  | --- | --- |
  | `numba.cuda.defer_cleanup` | - |
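`numba.cuda.defer_cleanup` is a context manager that postpones deallocations until the block exits, so a burst of kernel launches is not interrupted by frees. A minimal pure-Python sketch of that pattern (`ToyAllocator` is hypothetical, not Numba's implementation):

```python
from contextlib import contextmanager

class ToyAllocator:
    """Sketch of the defer_cleanup pattern: frees requested inside
    the context are queued and flushed in one batch on exit."""

    def __init__(self):
        self.deferring = False
        self.pending = []
        self.freed = []

    def free(self, buf):
        if self.deferring:
            self.pending.append(buf)   # postpone the actual release
        else:
            self.freed.append(buf)

    @contextmanager
    def defer_cleanup(self):
        self.deferring = True
        try:
            yield
        finally:
            self.deferring = False
            self.freed.extend(self.pending)  # release the backlog
            self.pending.clear()

alloc = ToyAllocator()
with alloc.defer_cleanup():
    alloc.free("buffer0")
    alloc.free("buffer1")
    print(alloc.freed)   # []: nothing released inside the block
print(alloc.freed)       # ['buffer0', 'buffer1']: released on exit
```

A DPPY equivalent would hook the same pattern into whatever USM/buffer lifetime management the SYCL runtime exposes.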
1e-to commented Jan 15, 2021

SYCL: (screenshot attachment)

CUDA: (screenshot attachment)

1e-to commented Jan 19, 2021

- Data transfer: #162
- Device arrays: #163
- Memory: #164, #165, #166

1e-to commented Jan 19, 2021

Streams in CUDA / Queue in SYCL
https://github.com/IntelPython/numba-dppy/issues/167

1e-to commented Jan 22, 2021

PR with docs and examples
#201
