# Warp Core Tutorial: Basics

Warp is a Python framework for writing high-performance code. Warp takes regular Python functions and JIT compiles them to efficient kernel code that can run on the CPU or GPU.

This notebook showcases the essential features and capabilities that form the foundation of programming with Warp.

A more in-depth reference of the API can be found in the [official documentation](https://nvidia.github.io/warp/).

Prerequisites:

- Basic Python knowledge.
- Understanding of NumPy arrays.

In [3]:
import numpy as np
import warp as wp

wp.config.quiet = True

# Explicitly initializing Warp is not necessary but
# we do it here to ensure everything is good to go.
wp.init()

if not wp.get_cuda_device_count():
    print(
        "Some snippets in this notebook assume the presence of "
        "a CUDA-compatible device and won't run correctly without one."
    )

## Data Types

Warp offers a range of data types that covers the needs in common compute workflows.

### Boolean

The types `wp.bool` and `bool`, which are interchangeable, can be used to represent `True`/`False` values.

### Scalars

Signed/unsigned integer and floating-point numbers with different widths are supported.

<table>
    <tr>
        <th></th>
        <th>Integer</th>
        <th>Floating-Point</th>
    </tr>
    <tr>
        <td>8-bit</td>
        <td>wp.[u]int8</td>
        <td></td>
    </tr>
    <tr>
        <td>16-bit</td>
        <td>wp.[u]int16</td>
        <td>wp.float16</td>
    </tr>
    <tr>
        <td>32-bit</td>
        <td>wp.[u]int32</td>
        <td>wp.float32</td>
    </tr>
    <tr>
        <td>64-bit</td>
        <td>wp.[u]int64</td>
        <td>wp.float64</td>
    </tr>
</table>

Python's `int` and `float` types can also be used in place of `wp.int32` and `wp.float32`.

Note that typing in Warp is strict, and no integer promotion is done under the hood, so types need to be explicitly matched for operations to succeed.

In [4]:
# Operation between 32-bit integers.
print("\nx:")
x = 123 + 234
print(x)

# Operation between 32-bit floating-points.
print("\ny:")
y = 1.2 + 2.3
print(y)

# Operation between 8-bit integers.
print("\nz:")
z = wp.int8(1) + wp.int8(2)
print(z)

# Invalid operation, both integer types must match.
print("\nw:")
try:
    w = wp.int8(1) + wp.int16(2)
    print(w)
except Exception:
    print("invalid operation")


x:
357

y:
3.5

z:
3

w:
invalid operation


### Linear Algebra

Vector, matrix, and quaternion types are also provided, with the most common combinations of scalar type and size predefined.

<table>
    <tr>
        <th></th>
        <th colspan="4">Integer</th>
        <th colspan="3">Floating-Point</th>
    </tr>
    <tr>
        <th></th>
        <th>8-bit</th>
        <th>16-bit</th>
        <th>32-bit</th>
        <th>64-bit</th>
        <th>16-bit</th>
        <th>32-bit</th>
        <th>64-bit</th>
    </tr>
    <tr>
        <td>2D Vector</td>
        <td>wp.vec2[u]b</td>
        <td>wp.vec2[u]s</td>
        <td>wp.vec2[u]i</td>
        <td>wp.vec2[u]l</td>
        <td>wp.vec2h</td>
        <td>wp.vec2f</td>
        <td>wp.vec2d</td>
    </tr>
    <tr>
        <td>3D Vector</td>
        <td>wp.vec3[u]b</td>
        <td>wp.vec3[u]s</td>
        <td>wp.vec3[u]i</td>
        <td>wp.vec3[u]l</td>
        <td>wp.vec3h</td>
        <td>wp.vec3f</td>
        <td>wp.vec3d</td>
    </tr>
    <tr>
        <td>4D Vector</td>
        <td>wp.vec4[u]b</td>
        <td>wp.vec4[u]s</td>
        <td>wp.vec4[u]i</td>
        <td>wp.vec4[u]l</td>
        <td>wp.vec4h</td>
        <td>wp.vec4f</td>
        <td>wp.vec4d</td>
    </tr>
    <tr>
        <td>2x2 Matrix</td>
        <td></td>
        <td></td>
        <td></td>
        <td></td>
        <td>wp.mat22h</td>
        <td>wp.mat22f</td>
        <td>wp.mat22d</td>
    </tr>
    <tr>
        <td>3x3 Matrix</td>
        <td></td>
        <td></td>
        <td></td>
        <td></td>
        <td>wp.mat33h</td>
        <td>wp.mat33f</td>
        <td>wp.mat33d</td>
    </tr>
    <tr>
        <td>4x4 Matrix</td>
        <td></td>
        <td></td>
        <td></td>
        <td></td>
        <td>wp.mat44h</td>
        <td>wp.mat44f</td>
        <td>wp.mat44d</td>
    </tr>
    <tr>
        <td>Quaternion</td>
        <td></td>
        <td></td>
        <td></td>
        <td></td>
        <td>wp.quath</td>
        <td>wp.quatf</td>
        <td>wp.quatd</td>
    </tr>
    <tr>
        <td>Transformation</td>
        <td></td>
        <td></td>
        <td></td>
        <td></td>
        <td>wp.transformh</td>
        <td>wp.transformf</td>
        <td>wp.transformd</td>
    </tr>
</table>

The transformation types, as defined by Warp, combine a translation part `pos` and a rotation part `rot`, and are primarily intended to be used in the context of rigid bodies.

A few aliases defaulting to 32-bit floating-points are also available as `wp.vec2`, `wp.vec3`, `wp.vec4`, `wp.mat22`, `wp.mat33`, `wp.mat44`, `wp.quat`, and `wp.transform`.

In [5]:
# Rotate and scale a position vector.
print("\nnew_pos:")
pos = wp.vec3(1.0, 2.0, 3.0)
rot = wp.mat33(1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0)
scale = 0.5
new_pos = (pos * rot) * scale
print(new_pos)


new_pos:
[15.0, 18.0, 21.0]
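
As a check on the printed value, the arithmetic can be worked out by hand. This assumes that `pos * rot` treats `pos` as a row vector, which is what the output above implies.

```latex
\begin{aligned}
\mathbf{p}\,R
  &= \begin{bmatrix} 1 & 2 & 3 \end{bmatrix}
     \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} \\
  &= \begin{bmatrix} 1 + 8 + 21 & 2 + 10 + 24 & 3 + 12 + 27 \end{bmatrix}
   = \begin{bmatrix} 30 & 36 & 42 \end{bmatrix} \\
0.5\,(\mathbf{p}\,R)
  &= \begin{bmatrix} 15 & 18 & 21 \end{bmatrix}
\end{aligned}
```

Note that the matrix used here is only a placeholder: it is not orthonormal, so it does not represent a pure rotation.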


### Custom Linear Algebra Types

It is possible to create linear algebra types of other sizes using the functions `wp.vec(length, dtype)` and `wp.mat(shape, dtype)`.

In [6]:
# Create a 5D vector of 32-bit floating-points.
print("\nv:")
vec5f = wp.vec(length=5, dtype=wp.float32)
v = vec5f(1.0, 2.0, 3.0, 4.0, 5.0)
print(v)

# Create a 2x3 matrix of 32-bit floating-points.
print("\nm:")
mat23f = wp.mat(shape=(2, 3), dtype=wp.float32)
m = mat23f(1.0, 2.0, 3.0, 4.0, 5.0, 6.0)
print(m)


v:
[1.0, 2.0, 3.0, 4.0, 5.0]

m:
[[1.0, 2.0, 3.0],
 [4.0, 5.0, 6.0]]


## Arrays

Arrays are fixed-size multidimensional containers that store homogeneous elements of any Warp data type in either CPU or GPU memory.

They are designed to interoperate seamlessly with arrays from other frameworks, such as [NumPy](https://numpy.org), [PyTorch](https://pytorch.org), [JAX](https://jax.readthedocs.io), and others.

One gotcha of supporting both CPU and GPU data behind a unified interface is that accessing individual elements directly from Python's runtime isn't exposed, since doing so would encourage suboptimal performance, as explained in this [FAQ entry](https://nvidia.github.io/warp/faq.html#why-aren-t-assignments-to-warp-arrays-supported-outside-of-kernels).

Arrays can be initialized from multidimensional sequences of scalar data.

In [7]:
# Create a 1D array of integers.
print("\narr_int:")
arr_int = wp.array((1, 2, 3), dtype=int)
print(f"dtype={arr_int.dtype}, shape={arr_int.shape}")
print(arr_int)

# Create a 1D array of vectors.
print("\narr_vec:")
arr_vec = wp.array(((1, 2, 3), (4, 5, 6)), dtype=wp.vec3)
print(f"dtype={arr_vec.dtype}, shape={arr_vec.shape}")
print(arr_vec)

# Create a 2D array of floating-points.
print("\narr_2d:")
arr_2d = wp.array(((1, 2, 3), (4, 5, 6)), dtype=float)
print(f"dtype={arr_2d.dtype}, shape={arr_2d.shape}")
print(arr_2d)


arr_int:
dtype=<class 'warp.types.int32'>, shape=(3,)
[1 2 3]

arr_vec:
dtype=<class 'warp.types.vec3f'>, shape=(2,)
[[1. 2. 3.]
 [4. 5. 6.]]

arr_2d:
dtype=<class 'warp.types.float32'>, shape=(2, 3)
[[1. 2. 3.]
 [4. 5. 6.]]


A few utilities make it possible to initialize arrays with a given value, or to skip initialization altogether.

In [8]:
# Create an array filled with zeros.
print("\narr_zeros:")
arr_zeros = wp.zeros(3)
print(f"dtype={arr_zeros.dtype}, shape={arr_zeros.shape}")
print(arr_zeros)

# Create an array filled with ones.
print("\narr_ones:")
arr_ones = wp.ones(3)
print(f"dtype={arr_ones.dtype}, shape={arr_ones.shape}")
print(arr_ones)

# Create an uninitialized array; its contents are arbitrary.
print("\narr_empty:")
arr_empty = wp.empty(3)
print(f"dtype={arr_empty.dtype}, shape={arr_empty.shape}")
print(arr_empty)

# Create an array filled with a custom value.
print("\narr_custom:")
arr_custom = wp.full(3, 123)
print(f"dtype={arr_custom.dtype}, shape={arr_custom.shape}")
print(arr_custom)


arr_zeros:
dtype=<class 'warp.types.float32'>, shape=(3,)
[0. 0. 0.]

arr_ones:
dtype=<class 'warp.types.float32'>, shape=(3,)
[1. 1. 1.]

arr_empty:
dtype=<class 'warp.types.float32'>, shape=(3,)
[0. 0. 0.]

arr_custom:
dtype=<class 'warp.types.int32'>, shape=(3,)
[123 123 123]


Initializing arrays from NumPy objects, or other frameworks like Torch, is also supported.

In [9]:
# Initialize an array from NumPy.
print("\narr_from_np:")
rng = np.random.default_rng(seed=123)
arr_np = rng.standard_normal((4, 2)).astype(np.float16)
arr_from_np = wp.from_numpy(arr_np)
print(f"dtype={arr_from_np.dtype}, {arr_from_np.shape}")
print(arr_from_np)


arr_from_np:
dtype=<class 'warp.types.vector.<locals>.vec_t'>, (4,)
[[-0.9893 -0.3677]
 [ 1.288   0.194 ]
 [ 0.9204  0.577 ]
 [-0.636   0.542 ]]


## Structs

When composite types are desired, it's possible to define Python classes decorated with `@wp.struct`, where each field is a class member that must be annotated with a Warp data type.

Structs, like every other data type, are supported by arrays.

In [10]:
# Create a new data type made of 2 fields.
@wp.struct
class Obstacle:
    pos: wp.vec3
    radius: float


# Create a first instance.
print("\no1:")
o1 = Obstacle()
o1.pos = wp.vec3(1.0, 2.0, 3.0)
o1.radius = 0.75
print(o1)

# Create a second instance.
print("\no2:")
o2 = Obstacle()
o2.pos = wp.vec3(2.0, 3.0, 4.0)
o2.radius = 0.5
print(o2)

# Create an array with these instances.
print("\narr_struct:")
arr_struct = wp.array((o1, o2), dtype=Obstacle)
print(f"dtype={arr_struct.dtype}, shape={arr_struct.shape}")
print(arr_struct)


o1:
Obstacle(
	pos=[1.0, 2.0, 3.0],
	radius=0.75,
)

o2:
Obstacle(
	pos=[2.0, 3.0, 4.0],
	radius=0.5,
)

arr_struct:
dtype=<warp.codegen.Struct object at 0x7fadac82bf70>, shape=(2,)
[([1., 2., 3.], 0.75) ([2., 3., 4.], 0.5 )]


## Kernels

In a typical Warp program, Python's runtime is used to allocate data and orchestrate operations, whereas the computationally intensive tasks are expected to be implemented as kernels.

These kernels are functions decorated with `@wp.kernel`. One notable difference from regular Python functions is that they don't return values: all inputs and outputs must be defined as parameters with type annotations, and all output parameters must be arrays.

Passing data to these kernels and evaluating them on the desired device (CPU or GPU) is done with the `wp.launch()` function.

Additionally, `wp.launch()` expects a `dim` argument that sets how many times the kernel is executed in parallel, with one thread per execution. This is how the massively parallel architecture of modern GPUs, and the performance boost associated with it, can be leveraged.

The `dim` argument expects either a single integer or a tuple with up to 4 values for multidimensional launches. To know which thread is currently being evaluated, we can call `wp.tid()` from within the kernel, which returns a single index for a 1D launch, or one index per dimension otherwise.

In [11]:
# Define a kernel that performs a component-wise average of two arrays.
@wp.kernel
def avg_kernel(
    a: wp.array(dtype=float),
    b: wp.array(dtype=float),
    out_avg: wp.array(dtype=float),
):
    i = wp.tid()
    out_avg[i] = (a[i] + b[i]) * 0.5


# Initialize the arrays to operate on and the output one storing their average.
shape = (32,)
rng = np.random.default_rng(seed=123)
a = wp.array(rng.standard_normal(shape).astype(np.float32))
b = wp.array(rng.standard_normal(shape).astype(np.float32))
out_avg = wp.empty_like(a)

# Launch the kernel.
print("\navg:")
wp.launch(avg_kernel, dim=shape, inputs=(a, b), outputs=(out_avg,))
print(out_avg)


avg:
[ 0.11855397 -1.2699152   0.45888895  0.17917725  0.890056    1.1693826
  0.17843008  0.12521538  0.20576604 -0.79199475  0.76355296 -0.84120286
  0.25920346 -0.65507483  0.46945405 -0.1282319   1.9109714  -0.68907523
 -0.13959356  0.1829095  -1.0895995   0.44163364  0.53003377  0.2716996
 -0.05369544 -0.8170762   0.7491045   1.0292035  -0.07523166 -0.27675036
 -0.33912486 -0.845436  ]


## Devices

We mentioned earlier that arrays can live in either CPU or GPU memory and, similarly, that kernels can be evaluated on either device, but we didn't mention how to specify that.

Arrays, as well as many other functions from the API, come with a `device` parameter that can either be left at its default value of `None` or set to a value representing the target device. When set to `None`, the current default device is used; otherwise, `"cpu"` or `"cuda"` can be passed to pick either CPU or GPU memory. On configurations with multiple GPUs, it's also possible to specify the device index, such as `"cuda:0"`, `"cuda:1"`, and so on.

In [12]:
# Define a kernel that fills an array with range values.
@wp.kernel
def range_fill_kernel(
    out: wp.array(dtype=int),
):
    i = wp.tid()
    out[i] = i


# Retrieve the current default device.
print("\ncurrent_device:")
current_device = wp.get_device()
print(current_device)

# Fill an array on the current default device.
print("\narr:")
arr = wp.zeros(3, dtype=int)
wp.launch(range_fill_kernel, dim=arr.shape, outputs=(arr,))
print(f"device={arr.device}")
print(arr)

# Fill an array on a specified device.
print("\narr_explicit:")
device = "cpu"
arr_explicit = wp.zeros(3, dtype=int, device=device)
wp.launch(range_fill_kernel, dim=arr_explicit.shape, outputs=(arr_explicit,), device=device)
print(f"device={arr_explicit.device}")
print(arr_explicit)


current_device:
cuda:0

arr:
device=cuda:0
[0 1 2]

arr_explicit:
device=cpu
[0 1 2]


In applications where all compute is intended to run on the same device, it is recommended not to pass any `device` argument to individual API calls and, instead, to wrap all code within a `wp.ScopedDevice()` context that sets the default device for all the API calls within that scope.

In [13]:
# Define a kernel that fills an array with the Fibonacci sequence.
@wp.kernel
def fibonacci_fill_kernel(
    out: wp.array(dtype=int),
):
    i = wp.tid()
    sqrt_5 = wp.sqrt(5.0)
    p = (1.0 + sqrt_5) / 2.0
    # Rounded closed form: F(i) is the nearest integer to p^i / sqrt(5).
    out[i] = int(p ** float(i) / sqrt_5 + 0.5)


# Ensure that all nested code is set to operate on a specified device.
device = "cuda"
with wp.ScopedDevice(device):
    print("\narr_scoped:")
    arr_scoped = wp.zeros(8, dtype=int)
    wp.launch(fibonacci_fill_kernel, dim=arr_scoped.shape, outputs=(arr_scoped,))
    print(f"device={arr_scoped.device}")
    print(arr_scoped)


arr_scoped:
device=cuda:0
[ 0  1  1  2  3  5  8 13]
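
For reference, the rounded closed form of the Fibonacci numbers, where F(n) is the nearest integer to φ^n / √5, can be compared with the iterative definition in plain Python. The helper names below are illustrative only and are not part of Warp.

```python
import math


def fib_closed_form(n: int) -> int:
    """Nearest integer to phi**n / sqrt(5), with phi the golden ratio."""
    sqrt_5 = math.sqrt(5.0)
    phi = (1.0 + sqrt_5) / 2.0
    return int(phi**n / sqrt_5 + 0.5)


def fib_iterative(n: int) -> int:
    """Reference implementation: F(0) = 0, F(1) = 1, F(n) = F(n-1) + F(n-2)."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


closed = [fib_closed_form(n) for n in range(8)]
iterative = [fib_iterative(n) for n in range(8)]
print(closed)
print(iterative)
```

Both lists agree for these small indices. For large `n`, floating-point rounding eventually makes the closed form inexact.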


Transferring data between CPU and GPU memory is made easy across the API. For example, initializing an array on the GPU from an array on the CPU is handled seamlessly. More explicit functions are also exposed, such as `wp.copy()`, `wp.clone()`, or `wp.array.numpy()`.

In [14]:
# Clone a CPU array onto GPU memory.
print("\narr_clone_gpu:")
arr_clone_cpu = wp.array((1, 2, 3), dtype=int, device="cpu")
arr_clone_gpu = wp.clone(arr_clone_cpu, device="cuda")
print(f"device={arr_clone_gpu.device}")
print(arr_clone_gpu)


arr_clone_gpu:
device=cuda:0
[1 2 3]


## Built-Ins

Similarly to Python's built-in functions, Warp ships with a set of functions that aim to cover the most common operations in areas such as scalar math (e.g., `wp.min()`, `wp.abs()`, ...), vector math (e.g., `wp.dot()`, `wp.length()`, ...), quaternion math (e.g., `wp.quat_from_axis_angle()`, `wp.quat_rotate()`, ...), random numbers (e.g., `wp.noise()`, `wp.sample_unit_sphere()`, ...), and others.

Some math functions like `math.cos()` and `math.sin()` are available as part of Python's standard `math` module; however, only their Warp counterparts, such as `wp.cos()` and `wp.sin()`, can be used within Warp kernels.

All of these built-ins are available from kernels but, where possible, they can also be called directly from Python's runtime.

The full list of built-ins is available in the documentation: https://nvidia.github.io/warp/modules/functions.html

In [15]:
# Define a kernel that computes the sine of each element from an array.
@wp.kernel
def sine_kernel(
    values: wp.array(dtype=float),
    out_sine: wp.array(dtype=float),
):
    i = wp.tid()
    out_sine[i] = wp.sin(values[i])


# Launch the sine kernel, once for each element.
print("\nsine (kernel):")
values = wp.array((1.0, 2.0, 3.0), dtype=float)
out_sine = wp.empty_like(values)
wp.launch(sine_kernel, dim=out_sine.shape, inputs=(values,), outputs=(out_sine,))
print(out_sine)

# Try the same `wp.sin()` built-in from Python.
print("\nsine (runtime):")
x = wp.sin(1.0)
y = wp.sin(2.0)
z = wp.sin(3.0)
print(x, y, z)


sine (kernel):
[0.841471   0.90929747 0.14112   ]

sine (runtime):
0.8414709568023682 0.9092974066734314 0.14112000167369843


### Random Numbers

Random numbers are made available from within Warp kernels using the `wp.rand_init()` built-in to initialize the state of the generator, followed by any of `wp.randf()`, `wp.randi()`, or `wp.randn()` calls.

In [16]:
# Define a kernel that generates random numbers.
@wp.kernel
def rand_kernel(
    seed: int,
    out_rand: wp.array(dtype=float),
):
    i = wp.tid()
    rng = wp.rand_init(seed, i)
    out_rand[i] = wp.randf(rng)


# Launch the rand kernel.
print("\nrand:")
out_rand = wp.empty(3, dtype=float)
wp.launch(rand_kernel, dim=out_rand.shape, inputs=(123,), outputs=(out_rand,))
print(out_rand)


rand:
[0.1415146 0.9632247 0.6449367]


Geometric sampling is available through built-ins like `wp.sample_triangle()`, `wp.sample_unit_disk()`, `wp.sample_unit_sphere()`, `wp.sample_unit_cube()`, and others.

In [17]:
# Define a kernel that samples random points within a unit hemisphere.
@wp.kernel
def sample_unit_hemisphere_kernel(
    seed: int,
    out_sample: wp.array(dtype=wp.vec3),
):
    i = wp.tid()
    rng = wp.rand_init(seed, i)
    out_sample[i] = wp.sample_unit_hemisphere(rng)


# Launch the sampling kernel.
print("\nsample:")
out_sample = wp.empty(3, dtype=wp.vec3)
wp.launch(sample_unit_hemisphere_kernel, dim=out_sample.shape, inputs=(123,), outputs=(out_sample,))
print(out_sample)


sample:
[[-0.13402136  0.46108842  0.80380124]
 [-0.26024503 -0.7997964   0.03095155]
 [-0.6081554   0.10367007  0.23431608]]


Finally, Perlin-based noise functions are exposed using `wp.noise()`, `wp.pnoise()`, and `wp.curlnoise()`.

In [18]:
# Define a kernel that outputs a curl noise for a 2D value.
@wp.kernel
def noise_kernel(
    seed: int,
    out_noise: wp.array(dtype=wp.vec2),
):
    i = wp.tid()
    rng = wp.rand_init(seed, i)
    xy = wp.vec2(float(123 + i * 2), float(234 + i * 3))
    out_noise[i] = wp.curlnoise(rng, xy)


# Launch the noise kernel.
print("\nnoise:")
out_noise = wp.empty(3, dtype=wp.vec2)
wp.launch(noise_kernel, dim=out_noise.shape, inputs=(12,), outputs=(out_noise,))
print(out_noise)


noise:
[[-0.6399567  -0.768411  ]
 [ 0.64504147 -0.7641476 ]
 [ 0.99747217 -0.07105775]]


## User Functions

For a function to be available in kernels, it needs to be decorated with `@wp.func`. However, unlike kernels, these functions cannot be passed to `wp.launch()` directly. Instead, they are meant to be called either by a kernel or by another Warp function.

In [19]:
# Define a function that computes the product of the components of a 2D vector.
# Providing the return type hint is optional.
@wp.func
def product(
    v: wp.vec2,
) -> float:
    return v[0] * v[1]


# Define a kernel that multiplies together all the components of 2 vectors.
@wp.kernel
def product_kernel(
    v1: wp.vec2,
    v2: wp.vec2,
    out_product: wp.array(dtype=float),
):
    out_product[0] = product(v1) * product(v2)


# Launch the product kernel once.
print("\nproduct:")
v1 = wp.vec2(2.0, 4.0)
v2 = wp.vec2(3.0, 5.0)
out_product = wp.empty(1, dtype=float)
wp.launch(product_kernel, dim=1, inputs=(v1, v2), outputs=(out_product,))
print(out_product)


product:
[120.]
