Release v0.7.0: add options for faster diffusion inference: shared variable caching, efficient bias fusion, and TF32 acceleration. · bytedance/Protenix

What's Changed

We’re excited to announce the open-source release of Protenix v0.7.0, supported by @yangyanpinghpc, featuring several performance optimizations for diffusion inference. This version introduces three new optional acceleration flags (all enabled by default at inference time) and improved support for batched inference:

--enable_cache
Precomputes and caches shared intermediate variables (pair_z, p_lm, c_l) across the N_sample and N_step dimensions.
--enable_fusion
Fuses bias transformations and normalization in the 24-layer diffusion transformer blocks at compile time.
--enable_tf32
Enables TF32 precision for matrix multiplications when using FP32 computation, trading slight numerical accuracy for speed.
Batched Diffusion Support (N_sample > 1)
Shares s_trunk and z_pair across the N_sample dimension during diffusion, reducing memory and compute overhead without affecting results.
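
To make the caching idea concrete, here is a minimal, pure-Python sketch of the general pattern: a quantity that depends only on trunk outputs is computed once and reused for every sample and every denoising step. This is not Protenix code. The names `project_pair`, `denoise_step`, and `run_diffusion` are hypothetical stand-ins, and the arithmetic is a toy chosen only so the example runs.

```python
# Illustrative sketch only; all function names here are hypothetical.

def project_pair(z_trunk):
    """Stand-in for an expensive projection that depends only on trunk
    outputs, not on the sample index or the denoising step."""
    return [2.0 * v for v in z_trunk]


def denoise_step(x, pair_z, step):
    """Toy update rule; a real diffusion step is far more involved."""
    return [xi - 0.1 * pz / (step + 1) for xi, pz in zip(x, pair_z)]


def run_diffusion(z_trunk, n_sample, n_step, enable_cache):
    """Return (outputs, number of times the projection was computed)."""
    n_calls = 0
    cached = None
    if enable_cache:
        # Compute the shared variable once, before both loops.
        cached = project_pair(z_trunk)
        n_calls += 1

    outputs = []
    for s in range(n_sample):
        x = [float(s)] * len(z_trunk)  # toy per-sample starting point
        for t in range(n_step):
            if enable_cache:
                pair_z = cached
            else:
                # Without caching, the same value is recomputed every step.
                pair_z = project_pair(z_trunk)
                n_calls += 1
            x = denoise_step(x, pair_z, t)
        outputs.append(x)
    return outputs, n_calls


if __name__ == "__main__":
    z = [0.5, 1.0, 1.5]
    out_cached, calls_cached = run_diffusion(z, n_sample=5, n_step=10, enable_cache=True)
    out_plain, calls_plain = run_diffusion(z, n_sample=5, n_step=10, enable_cache=False)
    print(calls_cached, calls_plain, out_cached == out_plain)
```

In this toy setting the cached and uncached runs produce identical outputs, while the projection count drops from N_sample × N_step to one, which is the kind of saving the caching and batched-sharing options are aiming for.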

You can run it using the following example command:
(Note: if not specified, --enable_cache, --enable_fusion, and --enable_tf32 default to true.)

protenix predict -i examples/example.json -o ./test_outputs/cmd/output_mini -s 105,106 -n "protenix_mini_default_v0.5.0" --triatt_kernel "torch" --trimul_kernel "torch" --enable_cache true --enable_fusion true --enable_tf32 true
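
When deciding whether to leave --enable_tf32 on, it can help to see roughly how much precision TF32 gives up. TF32 keeps FP32's 8-bit exponent but only 10 explicit mantissa bits instead of 23. The sketch below emulates that by truncating the low 13 mantissa bits of an FP32 value using only the Python standard library. It is an approximation for illustration: the helper `to_tf32` is hypothetical, real hardware may round rather than truncate, and nothing here reflects how Protenix itself enables TF32.

```python
import struct


def to_tf32(x: float) -> float:
    """Emulate TF32 input precision by truncation (an assumption; hardware
    rounding behavior may differ). Keeps the sign, the 8 exponent bits, and
    the top 10 of FP32's 23 mantissa bits."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]  # FP32 bit pattern
    bits &= 0xFFFFE000  # clear the low 13 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def dot(a, b, cast=float):
    """Dot product with each input passed through `cast` first.
    Accumulation happens in Python floats, not FP32."""
    return sum(cast(x) * cast(y) for x, y in zip(a, b))


if __name__ == "__main__":
    a = [0.1 * i for i in range(1, 9)]
    b = [1.0 / i for i in range(1, 9)]
    exact = dot(a, b)
    approx = dot(a, b, cast=to_tf32)
    print("relative error:", abs(approx - exact) / abs(exact))
```

With truncation, the relative error per input value stays below about 2^-10 (roughly 0.1%), which is consistent with the release note's description of a slight accuracy trade-off for speed.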
