# Memory management

In [1]:
import numpy as np
import math
from numba import cuda

## Data transfer

Numba can transfer NumPy arrays to the device automatically, but it does so conservatively: it always copies device memory back to the host when a kernel finishes. To avoid that unnecessary copy for read-only arrays, use the following APIs to control transfers manually:

<style>
p.indent {margin-left: 1em}
</style>

> `numba.cuda.device_array(shape, dtype=np.float64, strides=None, order='C', stream=0)`

<p class="indent">
Allocate an empty device ndarray. Similar to `numpy.empty()`.
</p>

> `numba.cuda.device_array_like(ary, stream=0)`

<p class="indent">
Call `device_array()` with information from the array.
</p>

> `numba.cuda.to_device(obj, stream=0, copy=True, to=None)`

<p class="indent">
Allocate and transfer a numpy ndarray or structured scalar to the device.
</p>

To copy a NumPy array from host to device (host->device):

In [5]:
ary = np.arange(10)
d_ary = cuda.to_device(ary)

To enqueue the transfer to a stream:

In [6]:
stream = cuda.stream()
d_ary = cuda.to_device(ary, stream=stream)

The resulting `d_ary` is a `DeviceNDArray`.

To copy device->host:

In [7]:
hary = d_ary.copy_to_host()

To copy device->host to an existing array:

In [8]:
ary = np.empty(shape=d_ary.shape, dtype=d_ary.dtype)
d_ary.copy_to_host(ary)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

To enqueue the device->host transfer to a stream:

In [9]:
hary = d_ary.copy_to_host(stream=stream)

In addition to device arrays, Numba can consume any object that implements the CUDA Array Interface. Such objects can also be converted manually into a Numba device array, by creating a view of the GPU buffer with the following APIs:


> `numba.cuda.as_cuda_array(obj, sync=True)`

<p class="indent">
Create a `DeviceNDArray` from any object that implements the CUDA Array Interface.
</p>

<p class="indent">
A view of the underlying GPU buffer is created. No copying of the data is done. The resulting `DeviceNDArray` will acquire a reference from `obj`.
</p>

<p class="indent">
If `sync` is `True`, then the imported stream (if present) will be synchronized.
</p>

> `numba.cuda.is_cuda_array(obj)`

<p class="indent">
Test if the object has defined the `__cuda_array_interface__` attribute.
</p>

<p class="indent">
Does not verify the validity of the interface.
</p>
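To illustrate what "does not verify the validity" means in practice, here is a small pure-Python sketch. It assumes the test amounts to an attribute lookup, as the description suggests. The names `FakeGpuBuffer`, `Broken`, and `has_cuda_array_interface` are hypothetical and not part of Numba, and the placeholder pointer `0` does not refer to real GPU memory, so these objects must not be passed to `as_cuda_array()`.

```python
class FakeGpuBuffer:
    """Hypothetical stand-in: mimics the attribute only, owns no GPU memory."""

    def __init__(self, shape, typestr):
        self.__cuda_array_interface__ = {
            "shape": shape,
            "typestr": typestr,
            "data": (0, False),  # (device pointer, read-only flag); 0 is a placeholder
            "version": 2,
        }


class Broken:
    """Hypothetical object whose interface attribute is not a valid description."""

    __cuda_array_interface__ = None


def has_cuda_array_interface(obj):
    # Attribute presence only; the contents are never inspected.
    return hasattr(obj, "__cuda_array_interface__")


looks_ok = has_cuda_array_interface(FakeGpuBuffer((4,), "<f4"))
looks_ok_but_invalid = has_cuda_array_interface(Broken())
plain_list = has_cuda_array_interface([1, 2, 3])
```

A presence-only check accepts `Broken()` just as readily as a well-formed object, so a positive answer should be treated as "might be importable" rather than as a guarantee.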

### Device arrays

Device array references have the following methods. These methods are to be called in host code, not within CUDA-jitted functions.

<style>
p.indent {margin-left: 1em}
p.indent2x {margin-left: 2em}
</style>

> `class numba.cuda.cudadrv.devicearray.DeviceNDArray(shape, strides, dtype, stream=0, gpu_data=None)`

<p class="indent">
An on-GPU array type.
</p>

> > `copy_to_host(ary=None, stream=0)`

<p class="indent2x">
Copy `self` to `ary`, or create a new NumPy ndarray if `ary` is `None`.
</p>

<p class="indent2x">
If a CUDA `stream` is given, then the transfer will be made asynchronously as part of the given stream. Otherwise, the transfer is synchronous: the function returns after the copy is finished.
</p>

<p class="indent2x">
Always returns the host array.
</p>

Example:

In [14]:
import numpy as np
from numba import cuda

@cuda.jit
def my_kernel(io_array):
    pos = cuda.grid(1)
    if pos < io_array.size:
        io_array[pos] *= 2

arr = np.arange(10)
d_arr = cuda.to_device(arr)

my_kernel[100, 100](d_arr)

d_arr.copy_to_host()



array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])



> > `is_c_contiguous()`

<p class="indent2x">
Returns `True` if the array is C-contiguous.
</p>

> > `is_f_contiguous()`

<p class="indent2x">
Returns `True` if the array is Fortran-contiguous.
</p>

> > `ravel(order='C', stream=0)`

<p class="indent2x">
Flattens a contiguous array without changing its contents, similar to `numpy.ndarray.ravel()`. If the array is not contiguous, an exception is raised.
</p>

> > `reshape(*newshape, **kws)`

<p class="indent2x">
Reshape the array without changing its contents, similar to `numpy.ndarray.reshape()`.
</p>

Example:

In [16]:
d_arr = d_arr.reshape(2, 5, order='F')
d_arr.copy_to_host()

array([[ 0,  4,  8, 12, 16],
       [ 2,  6, 10, 14, 18]])
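The column-major layout in this output can be reproduced on the host with plain NumPy, which may help when checking what `order='F'` does before moving data to the device. This sketch assumes the same ten values that `d_arr` holds after the kernel above has doubled them; the variable names are illustrative.

```python
import numpy as np

# Same values as d_arr after the doubling kernel: 0, 2, ..., 18.
host = np.arange(0, 20, 2)

# order='F' fills columns first: element (i, j) comes from host[i + 2 * j].
col_major = host.reshape(2, 5, order='F')

# The default order='C' fills rows first instead.
row_major = host.reshape(2, 5)

# Flattening in the same order as the reshape recovers the original sequence.
round_trip = col_major.ravel(order='F')
```

Here `col_major` matches the device result shown above, while `row_major` holds `0..8` in its first row and `10..18` in its second.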

> Note<br>
> `DeviceNDArray` implements the CUDA Array Interface.

## Pinned memory



> `numba.cuda.pinned(*arylist)`

<p class="indent">
A context manager for temporarily pinning a sequence of host ndarrays.
</p>

> `numba.cuda.pinned_array(shape, dtype=np.float64, strides=None, order='C')`

<p class="indent">
Allocate an ndarray with a buffer that is pinned (page-locked). Similar to `np.empty()`.
</p>

> `numba.cuda.pinned_array_like(ary)`

<p class="indent">
Call `pinned_array()` with the information from the array.
</p>


## Mapped memory



> `numba.cuda.mapped(*arylist, **kws)`

<p class="indent">
A context manager for temporarily mapping a sequence of host ndarrays.
</p>

> `numba.cuda.mapped_array(shape, dtype=np.float64, strides=None, order='C', stream=0, portable=False, wc=False)`

<p class="indent">
Allocate a mapped ndarray with a buffer that is pinned and mapped onto the device. Similar to `np.empty()`.
</p>

<p class="indent">
Parameters
</p>
<p class="indent">
`portable` – a boolean flag that makes the allocated device memory usable from multiple devices.
</p>

<p class="indent">
`wc` – a boolean flag to enable write-combined allocation, which is faster to write by the host and to read by the device, but slower to read by the host and slower to write by the device.
</p>

> `numba.cuda.mapped_array_like(ary, stream=0, portable=False, wc=False)`

<p class="indent">
Call `mapped_array()` with the information from the array.
</p>
