Releases: NVIDIA/warp
v1.0.2
[1.0.1] - 2024-03-15
- Document `Device` `total_memory` and `free_memory`
- Documentation for allocators, streams, peer access, and generics
- Changed example output directory to current working directory
- Added `python -m warp.examples.browse` for browsing the examples folder
- Print where the USD stage file is being saved
- Added `examples/optim/example_walker.py` sample
- Make the drone example not specific to USD
- Reduce the time taken to run some examples
- Optimise rendering points with a single colour
- Clarify an error message around needing USD
- Raise exception when module is unloaded during graph capture
- Added `wp.synchronize_event()` for blocking the host thread until a recorded event completes
- Flush C print buffers when ending `stdout` capture
- Remove more unneeded CUTLASS files
- Allow setting mempool release threshold as a fractional value
[1.0.0] - 2024-03-07
- Add `FeatherstoneIntegrator` which provides more stable simulation of articulated rigid body dynamics in generalized coordinates (`State.joint_q` and `State.joint_qd`)
- Introduce `warp.sim.Control` struct to store control inputs for simulations (optional; by default the `Model` control inputs are used as before); integrators now have a different simulation signature: `integrator.simulate(model: Model, state_in: State, state_out: State, dt: float, control: Control)`
- `joint_act` can now behave in 3 modes: with `joint_axis_mode` set to `JOINT_MODE_FORCE` it behaves as a force/torque, with `JOINT_MODE_VELOCITY` it behaves as a velocity target, and with `JOINT_MODE_POSITION` it behaves as a position target; `joint_target` has been removed
- Add adhesive contact to Euler integrators via `Model.shape_materials.ka` which controls the contact distance at which the adhesive force is applied
- Improve handling of visual/collision shapes in URDF importer so visual shapes are not involved in contact dynamics
- Experimental JAX kernel callback support
- Improve module load exception message
- Add `wp.ScopedCapture`
- Removing `enable_backward` warning for callables
- Copy docstrings and annotations from wrapped kernels, functions, structs
[0.15.1] - 2024-03-05
- Add examples assets to the wheel packages
- Fix broken image link in documentation
- Fix codegen for custom grad functions calling their respective forward functions
- Fix custom grad function handling for functions that have no outputs
- Fix issues when `wp.config.quiet = True`
[0.15.0] - 2024-03-04
- Add thumbnails to examples gallery
- Apply colored lighting to examples
- Moved `examples` directory under `warp/`
- Add example usage to `python -m warp.tests --help`
- Adding `torch.autograd.function` example + docs
- Add error-checking to array shapes during creation
- Adding `example_graph_capture`
- Add a Diffsim Example of a Drone
- Fix `verify_fp` causing compiler errors and support CPU kernels
- Fix to enable `matmul` to be called in CUDA graph capture
- Enable mempools by default
- Update `wp.launch` to support tuple args
- Fix BiCGSTAB and GMRES producing NaNs when converging early
- Fix warning about backward codegen being disabled in `test_fem`
- Fix `assert_np_equal` when NaN's and tolerance are involved
- Improve error message to discern between CUDA being disabled or not supported
- Support cross-module functions with user-defined gradients
- Suppress superfluous CUDA error when ending capture after errors
- Make output during initialization atomic
- Add `warp.config.max_unroll`, fix custom gradient unrolling
- Support native replay snippets using `@wp.func_native(snippet, replay_snippet=replay_snippet)`
- Look for the CUDA Toolkit in default locations if the `CUDA_PATH` environment variable or `--cuda_path` build option are not used
- Added `wp.ones()` to efficiently create one-initialized arrays
- Rename `wp.config.graph_capture_module_load_default` to `wp.config.enable_graph_capture_module_load_by_default`
[0.14.0] - 2024-02-19
- Add support for CUDA pooled (stream-ordered) allocators
- Support memory allocation during graph capture
- Support copying non-contiguous CUDA arrays during graph capture
- Improved memory allocation/deallocation performance with pooled allocators
- Use `wp.config.enable_mempools_at_init` to enable pooled allocators during Warp initialization (if supported)
  - `wp.is_mempool_supported()` - check if a device supports pooled allocators
  - `wp.is_mempool_enabled()`, `wp.set_mempool_enabled()` - enable or disable pooled allocators per device
  - `wp.set_mempool_release_threshold()`, `wp.get_mempool_release_threshold()` - configure memory pool release threshold
- Add support for direct memory access between devices
  - Improved peer-to-peer memory transfer performance if access is enabled
  - Caveat: enabling peer access may impact memory allocation/deallocation performance and increase memory consumption
  - `wp.is_peer_access_supported()` - check if the memory of a device can be accessed by a peer device
  - `wp.is_peer_access_enabled()`, `wp.set_peer_access_enabled()` - manage peer access for memory allocated using default CUDA allocators
  - `wp.is_mempool_access_supported()` - check if the memory pool of a device can be accessed by a peer device
  - `wp.is_mempool_access_enabled()`, `wp.set_mempool_access_enabled()` - manage access for memory allocated using pooled CUDA allocators
- Refined stream synchronization semantics
  - `wp.ScopedStream` can synchronize with the previous stream on entry and/or exit (only sync on entry by default)
  - Functions taking an optional stream argument perform no implicit synchronization for max performance (e.g., `wp.copy()`, `wp.launch()`, `wp.capture_launch()`)
- Support for passing a custom `deleter` argument when constructing arrays
  - Deprecation of `owner` argument - use `deleter` to transfer ownership
- Optimizations for various core API functions (e.g., `wp.zeros()`, `wp.full()`, and more)
- Fix `wp.matmul()` to always use the correct CUDA context
- Fix memory leak in BSR transpose
- Fix stream synchronization issues when copying non-contiguous arrays
[0.13.1] - 2024-02-22
- Ensure that the results from the `Noise Deform` are deterministic across different Kit sessions
[0.13.0] - 2024-02-16
- Update the license to NVIDIA Software License, allowing commercial use (see `LICENSE.md`)
- Add `CONTRIBUTING.md` guidelines (for NVIDIA employees)
- Hash CUDA `snippet` and `adj_snippet` strings to fix caching
- Fix `build_docs.py` on Windows
- Add missing `.py` extension to `warp/tests/walkthrough_debug`
- Allow `wp.bool` usage in vector and matrix types
[0.12.0] - 2024-02-05
- Add a warning when the `enable_backward` setting is set to `False` upon calling `wp.Tape.backward()`
- Fix kernels not being recompiled as expected when defined using a closure
- Change the kernel cache appauthor subdirectory to just "NVIDIA"
- Ensure that gradients attached to PyTorch tensors have compatible strides when calling `wp.from_torch()`
- Add a `Noise Deform` node for OmniGraph that deforms points using a perlin/curl noise
[0.11.0] - 2024-01-23
- Re-release 1.0.0-beta.7 as a non-pre-release 0.11.0 version so it gets selected by `pip install warp-lang`.
- Introducing a new versioning and release process, detailed in `PACKAGING.md` and resembling that of Python itself:
  - The 0.11 release(s) can be found on the `release-0.11` branch.
  - Point releases (if any) go on the same minor release branch and only contain bug fixes, not new features.
  - The `public` branch, previously used to merge releases into and corresponding with the GitHub `main` branch, is retired.
[1.0.0-beta.7] - 2024-01-23
- Ensure captures are always enclosed in `try`/`finally`
- Only include .py files from the warp subdirectory into wheel packages
- Fix an extension's sample node failing at parsing some version numbers
- Allow examples to run without USD when possible
- Add a setting to disable the main Warp menu in Kit
- Add iterative linear solvers, see `wp.optim.linear.cg`, `wp.optim.linear.bicgstab`, `wp.optim.linear.gmres`, and `wp.optim.linear.LinearOperator`
- Improve error messages around global variables
- Improve error messages around mat/vec assignments
- Support conversion of scalars to native/ctypes, e.g.: `float(wp.float32(1.23))` or `ctypes.c_float(wp.float32(1.23))`
- Add a constant for infinity, see `wp.inf`
- Add a FAQ entry about array assignments
- Add a mass spring cage diff simulation example, see `examples/example_diffsim_mass_spring_cage.py`
- Add `-s`, `--suite` option for only running tests belonging to the given suites
- Fix common spelling mistakes
- Fix indentation of generated code
- Show deprecation warnings only once
- Improve `wp.render.OpenGLRenderer`
- Create the extension's symlink to the core library at runtime
- Fix some built-ins failing to compile the backward pass when nested inside if/else blocks
- Update examples with the new variants of the mesh query built-ins
- Fix type members that weren't zero-initialized
- Fix missing adjoint function for `wp.mesh_query_ray()`
[1.0.0-beta.6] - 2024-01-10
- Do not create CPU copy of grad array when calling `array.numpy()`
- Fix `assert_np_equal()` bug
- Support Linux AArch64 platforms, including Jetson/Tegra devices
- Add parallel testing runner (invoke with `python -m warp.tests`, use `warp/tests/unittest_serial.py` for serial testing)
- Fix support for function calls in `range()`
- `matmul` adjoints now accumulate
- Expand available operators (e.g. vector @ matrix, scalar as dividend) and improve support for calling native built-ins
- Fix multi-gpu synchronization issue in `sparse.py`
- Add depth rendering to `OpenGLRenderer`, document `warp.render`
- Make `atomic_min`, `atomic_max` differentiable
- Fix error reporting using the exact source segment
- Add user-friendly mesh query overloads, returning a struct instead of overwriting parameters
- Address multiple differentiability issues
- Fix backpropagation for returning array element references
- Support passing the return value to adjoints
- Add point basis space and explicit point-based quadrature for `warp.fem`
- Support overriding the LLVM project source directory path using `build_lib.py --build_llvm --llvm_source_path=`
- Fix the error message for accessing non-existing attributes
- Flatten faces array for Mesh constructor in URDF parser
[1.0.0-beta.5] - 2023-11-22
- Fix for kernel caching when function argument types change
- Fix code-gen ordering of dependent structs
- Fix for `wp.Mesh` build on MGPU systems
- Fix for name clash bug with adjoint code: #154
- Add `wp.frac()` for returning the fractional part of a floating point value
- Add support for custom native CUDA snippets using `@wp.func_native` decorator
- Add support for batched matmul with batch size > 2^16-1
- Add support for transposed CUTLASS `wp.matmul()` and additional error checking
- Add support for quad and hex meshes in `wp.fem`
- Detect and warn when C++ runtime doesn't match compiler during build, e.g.: libstdc++.so.6: version `GLIBCXX_3.4.30' not found
- Documentation update for `wp.BVH`
- Documentation and simplified API for runtime kernel specialization `wp.Kernel`
[1.0.0-beta.4] - 2023-11-01
- Add `wp.cbrt()` for cube root calculation
- Add `wp.mesh_furthest_point_no_sign()` to compute furthest point on a surface from a query point
- Add support for GPU BVH builds, 10-100x faster than CPU builds for large meshes
- Add support for chained comparisons, i.e.: `0 < x < 2`
- Add support for running `warp.fem` examples headless
- Fix for unit test determinism
- Fix for possible GC collection of array during graph capture
- Fix for `wp.utils.array_sum()` output initialization when used with vector types
- Coverage and documentation updates
[1.0.0-beta.3] - 2023-10-19
- Add support for code coverage scans (test_coverage.py), coverage at 85% in omni.warp.core
- Add support for named component access for vector types, e.g.: `a = v.x`
- Add support for lvalue expressions, e.g.: `array[i] += b`
- Add casting constructors for matrix and vector types
- Add support for `type()` operator that can be used to return type inside kernels
- Add support for grid-stride kernels to support kernels with > 2^31-1 thread blocks
- Fix for multi-process initialization warnings
- Fix alignment issues with empty `wp.struct`
- Fix for return statement warning with tuple-returning functions
- Fix for `wp.batched_matmul()` registering the wrong function in the Tape
- Fix and document for `wp.sim` forward + inverse kinematics
- Fix for `wp.func` to return a default value if function does not return on all control paths
- Refactor `wp.fem` support for new basis functions, decoupled function spaces
- Optimizations for `wp.noise` functions, up to 10x faster in most cases
- Optimizations for `type_size_in_bytes()` used in array construction
[1.0.0-beta.2] - 2023-09-01
- Fix for passing bool into `wp.func` functions
- Fix for deprecation warnings appearing on `stderr`, now redirected to `stdout`
- Fix for using `for i in wp.hash_grid_query(..)` syntax
[1.0.0-beta.1] - 2023-08-29
- Fix for `wp.float16` being passed as kernel arguments
- Fix for compile errors with kernels using structs in backward pass
- Fix for `wp.Mesh.refit()` not being CUDA graph capturable due to synchronous temp. allocs
- Fix for dynamic texture example flickering / MGPU crashes demo in Kit by reusing `ui.DynamicImageProvider` instances
- Fix for a regression that disabled bundle change tracking in samples
- Fix for incorrect surface velocities when meshes are deforming in `OgnClothSimulate`
- Fix for incorrect lower-case when setting USD stage "up_axis" in examples
- Fix for incompatible gradient types when wrapping PyTorch tensor as a vector or matrix type
- Fix for adding open edges when building cloth constraints from meshes in `wp.sim.ModelBuilder.add_cloth_mesh()`
- Add support for `wp.fabricarray` to directly access Fabric data from Warp kernels, see https://omniverse.gitlab-master-pages.nvidia.com/usdrt/docs/usdrt_prim_selection.html for examples
- Add support for user defined gradient functions, see `@wp.func_replay`, and `@wp.func_grad` decorators
- Add support for more OG attribute types in `omni.warp.from_omni_graph()`
- Add support for creating NanoVDB `wp.Volume` objects from dense NumPy arrays
- Add support for `wp.volume_sample_grad_f()` which returns the value + gradient efficiently from an NVDB volume
- Add support for LLVM fp16 intrinsics for half-precision arithmetic
- Add implementation of stochastic gradient descent, see `wp.optim.SGD`
- Add `warp.fem` framework for solving weak-form PDE problems (see https://nvidia.github.io/warp/_build/html/modules/fem.html)
- Optimizations for `omni.warp` extension load time (2.2s to 625ms cold start)
- Make all `omni.ui` dependencies optional so that Warp unit tests can run headless
- Deprecation of `wp.tid()` outside of kernel functions, users should pass `tid()` values to `wp.func` functions explicitly
- Deprecation of `wp.sim.Model.flatten()` for returning all contained tensors from the model
- Add support for clamping particle max velocity in `wp.sim.Model.particle_max_velocity`
- Remove dependency on `urdfpy` package, improve MJCF parser handling of default values
[0.10.1] - 2023-07-25
- Fix for large multidimensional kernel launches (> 2^32 threads)
- Fix for module hashing with generics
- Fix for unrolling loops with break or continue statements (will skip unrolling)
- Fix for passing boolean arguments to build_lib.py (previously ignored)
- Fix build warnings on Linux
- Fix for creating array of structs from NumPy structured array
- Fix for regression on kernel load times in Kit when using warp.sim
- Update `warp.array.reshape()` to handle `-1` dimensions
- Update margin used for mesh queries when using `wp.sim.create_soft_body_contacts()`
- Improvements to gradient handling with `warp.from_torch()`, `warp.to_torch()` plus documentation
[0.10.0] - 2023-07-05
- Add support for macOS universal binaries (x86 + aarch64) for M1+ support
- Add additional methods for SDF generation, please see the following new methods:
  - `wp.mesh_query_point_nosign()` - closest point query with no sign determination
  - `wp.mesh_query_point_sign_normal()` - closest point query with sign from angle-weighted normal
  - `wp.mesh_query_point_sign_winding_number()` - closest point query with fast winding number sign determination
- Add CSR/BSR sparse matrix support, see `warp.sparse` module:
  - `wp.sparse.BsrMatrix`
  - `wp.sparse.bsr_zeros()`, `wp.sparse.bsr_set_from_triplets()` for construction
  - `wp.sparse.bsr_mm()`, `wp.sparse_bsr_mv()` for matrix-matrix and matrix-vector products respectively
- Add array-wide utilities:
  - `wp.utils.array_scan()` - prefix sum (inclusive or exclusive)
  - `wp.utils.array_sum()` - sum across array
  - `wp.utils.radix_sort_pairs()` - in-place radix sort (key, value) pairs
- Add support for calling `@wp.func` functions from Python (outside of kernel scope)
- Add support for recording kernel launches using a `wp.Launch` object that can be replayed with low overhead, use `wp.launch(..., record_cmd=True)` to generate a command object
- Optimizations for `wp.struct` kernel arguments, up to 20x faster launches for kernels with large structs or number of params
- Refresh USD samples to use bundle based workflow + change tracking
- Add Python API for manipulating mesh and point bundle data in OmniGraph, see `omni.warp.nodes` module
  - See `omni.warp.nodes.mesh_create_bundle()`, `omni.warp.nodes.mesh_get_points()`, etc.
- Improvements to `wp.array`:
  - Fix a number of array methods misbehaving with empty arrays
  - Fix a number of bugs and memory leaks related to gradient arrays
  - Fix array construction when creating arrays in pinned memory from a data source in pageable memory
  - `wp.empty()` no longer zeroes-out memory and returns an uninitialized array, as intended
  - `array.zero_()` and `array.fill_()` work with non-contiguous arrays
  - Support wrapping non-contiguous NumPy arrays without a copy
  - Support preserving the outer dimensions of NumPy arrays when wrapping them as Warp arrays of vector or matrix types
  - Improve PyTorch and DLPack interop with Warp arrays of arbitrary vectors and matrices
  - `array.fill_()` can now take lists or other sequences when filling arrays of vectors or matrices, e.g. `arr.fill_([[1, 2], [3, 4]])`
  - `array.fill_()` now works with arrays of structs (pass a struct instance)
  - `wp.copy()` gracefully handles copying between non-contiguous arrays on different devices
- Add `wp.full()` and `wp.full_like()`, e.g., `a = wp.full(shape, value)`
- Add optional `device` argument to `wp.empty_like()`, `wp.zeros_like()`, `wp.full_like()`, and `wp.clone()`
- Add `indexedarray` methods `.zero_()`, `.fill_()`, and `.assign()`
- Fix `indexedarray` methods `.numpy()` and `.list()`
- Fix `array.list()` to work with arrays of any Warp data type
- Fix `array.list()` synchronization issue with CUDA arrays
- `array.numpy()` called on an array of structs returns a structured NumPy array with named fields
- Improve the performance of creating arrays
- Fix for `Error: No module named 'omni.warp.core'` when running some Kit configurations (e.g.: stubgen)
- Fix for `wp.struct` instance address being included in module content hash
- Fix codegen with overridden function names
- Fix for kernel hashing so it occurs after code generation and before loading to fix a bug with stale kernel cache
- Fix for `wp.BVH.refit()` when executed on the CPU
- Fix adjoint of `wp.struct` constructor
- Fix element accessors for `wp.float16` vectors and matrices in Python
- Fix `wp.float16` members in structs
- Remove deprecated `wp.ScopedCudaGuard()`, please use `wp.ScopedDevice()` instead