In [2]:
import numpy as np
import numba.cuda as cuda

# 1. Data transfer

Numba can automatically transfer NumPy arrays to the device, but it does so conservatively: it always copies device memory back to the host when a kernel finishes. To avoid this unnecessary transfer for read-only arrays, you can control the transfers manually with the following APIs:

**Sample code**

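```
import numpy as np
from numba import cuda

@cuda.jit
def my_kernel(io_array):
    # Placeholder kernel (my_kernel is not defined in the original): doubles each element in place
    pos = cuda.grid(1)
    if pos < io_array.size:  # guard: more threads are launched than there are elements
        io_array[pos] *= 2

arr = np.arange(1000)
d_arr = cuda.to_device(arr)  # explicit host -> device copy

my_kernel[100, 100](d_arr)  # 100 blocks of 100 threads each

result_array = d_arr.copy_to_host()  # explicit device -> host copy
```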

-----

```
# Allocate an empty device ndarray. Similar to numpy.empty()
numba.cuda.device_array(shape, dtype=np.float64, strides=None, order='C', stream=0)

# Call cuda.device_array() with information from the array.
numba.cuda.device_array_like(ary, stream=0)

# Allocate and transfer a numpy ndarray or structured scalar to the device.
numba.cuda.to_device(obj, stream=0, copy=True, to=None)
```

**To copy a NumPy array host->device:**

In [3]:
ary = np.arange(10)
d_ary = cuda.to_device(ary)

In [4]:
d_ary

<numba.cuda.cudadrv.devicearray.DeviceNDArray at 0x7fed211b4320>

**To enqueue the host->device transfer to a stream:**

In [5]:
stream = cuda.stream()
d_ary = cuda.to_device(ary, stream=stream)

In [6]:
d_ary

<numba.cuda.cudadrv.devicearray.DeviceNDArray at 0x7fed211b4b00>

**To copy device->host:**

In [7]:
hary = d_ary.copy_to_host()

In [8]:
print(hary)

[0 1 2 3 4 5 6 7 8 9]


**To copy device->host to an existing array:**

In [11]:
ary = np.empty(shape=d_ary.shape, dtype=d_ary.dtype)
d_ary.copy_to_host(ary)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [12]:
print(ary)

[0 1 2 3 4 5 6 7 8 9]


**To enqueue the device->host transfer to a stream:**

In [13]:
hary = d_ary.copy_to_host(stream=stream)
stream.synchronize()  # wait for the enqueued copy to finish before reading hary
print(hary)

[0 1 2 3 4 5 6 7 8 9]


## Device arrays

# 2. Pinned memory

# 3. Streams

# 4. Shared memory and thread synchronization

# 5. Local memory

# 6. Constant memory

-----

# 7. SmartArrays

# 8. Deallocation Behavior