Optimize sequence_gen and uniform_sequence_gen to reduce template instantiation depth #3585

tenpercent · 2026-01-16T03:17:37Z

Summary

Use compiler builtin __make_integer_seq for sequence_gen and uniform_sequence_gen
Reduces template instantiation depth from O(N) to O(1)
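
A minimal sketch of the idea, not the code from this PR: the builtin produces the indices 0..N-1 in one step, and a single pack expansion applies the functor to each index. `Sequence`, `index_t`, `make_indices`, `sequence_gen_impl`, and `Square` are simplified stand-ins chosen for illustration, and the functor here takes a plain `index_t`, which may differ from what the library passes. `__make_integer_seq` is a Clang builtin, so the sketch falls back to `std::make_integer_sequence` when it is not available.

```cpp
#include <type_traits>
#include <utility>

using index_t = int; // stand-in for the library's index type (assumption)

template <index_t... Is>
struct Sequence {};

// Detect the builtin; other compilers take the standard-library fallback.
#if defined(__has_builtin)
#if __has_builtin(__make_integer_seq)
#define SKETCH_USE_BUILTIN 1
#endif
#endif

namespace detail {
#ifdef SKETCH_USE_BUILTIN
// The builtin directly names std::integer_sequence<index_t, 0, ..., N-1>.
template <index_t N>
using make_indices = __make_integer_seq<std::integer_sequence, index_t, N>;
#else
template <index_t N>
using make_indices = std::make_integer_sequence<index_t, N>;
#endif

template <typename Indices, typename F>
struct sequence_gen_impl;

// One partial specialization: the whole result is built by a single
// pack expansion, with no recursion in user code.
template <index_t... Is, typename F>
struct sequence_gen_impl<std::integer_sequence<index_t, Is...>, F>
{
    using type = Sequence<F{}(Is)...>;
};
} // namespace detail

template <index_t N, typename F>
struct sequence_gen
{
    using type = typename detail::sequence_gen_impl<detail::make_indices<N>, F>::type;
};

// Hypothetical functor used only for this example.
struct Square
{
    constexpr index_t operator()(index_t i) const { return i * i; }
};

static_assert(std::is_same<sequence_gen<4, Square>::type, Sequence<0, 1, 4, 9>>::value,
              "functor is applied to indices 0..3");
```

The functor must be default-constructible and its call operator `constexpr`, because `F{}(Is)` is evaluated as a template argument.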

Motivation

The recursive sequence_gen_impl creates deep template chains for large sequences, increasing compile time and memory usage.

Test Plan

Waiting for full CI

PR Stack

#	PR	Description
1	#3585	sequence_gen with `__make_integer_seq`
2	#3588	generate_identity_sequences helper
3	#3589	Named functors in transform_tensor_descriptor
4	#3590	container_concat optimization
5	#3596	O(1) pack expansion rewrites
6	#3600	TensorDescriptor/TensorAdaptor lambda elimination

shumway · 2026-01-16T17:08:59Z

Do you want to add unit tests for this, or just rely on the tests of all the code that depends on this? If it's easy to add unit tests, that's generally better, but I'm also fine with moving fast to cut down compilation times.

shumway · 2026-01-16T17:14:51Z

include/ck/utility/sequence.hpp

-// generate sequence
-template <index_t NSize, typename F>
-struct sequence_gen
+// Four sequences: direct concatenation


I like these specializations. It will be interesting to get a survey of the code to see how often the specializations are used and if these four smallest cases are the most impactful ones.

I'm using the build traces to drive the optimizations. Removing the unused code may be another way to cut parsing times.

tenpercent · 2026-01-16T19:17:39Z

> Do you want to add unit tests for this, or just rely on the tests of all the code that depends on this? If it's easy to add unit tests, that's generally better, but I'm also fine with moving fast to cut down compilation times.

Let's move fast. If something important breaks, CI will catch it. Tests also need maintenance, so let's see how much of the metaprogramming is left after the initial sprint.

Replace recursive template instantiation with compiler intrinsic __make_integer_seq and pack expansion for O(1) instantiation depth.

Before: Maximum nesting depth of 90 levels with recursive divide-and-conquer
After: Maximum nesting depth of 26 levels using flat pack expansion

Performance improvements measured on example_grouped_conv_fwd_xdl_fp16:
- Template instantiation wall-clock time: 36.8s -> 18.7s (49% faster)
- Template instantiation cumulative time: 56.6s -> 25.8s (54% faster)
- Maximum nesting depth: 90 -> 26 (71% reduction)

The key changes:
- sequence_gen: Uses __make_integer_seq to generate indices 0..N-1, then applies functor F via pack expansion in a single step
- uniform_sequence_gen: Uses __make_integer_seq with pack expansion to generate N copies of a constant value

Co-Authored-By: Claude <noreply@anthropic.com>
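
The uniform_sequence_gen change described in this commit message can be sketched roughly as follows. This is an illustration under assumptions, not the committed code: `Sequence`, `index_t`, `make_indices`, `always`, and `uniform_impl` are names made up for the example. Each generated index is ignored and mapped to the same constant, so the expansion yields N copies of the value.

```cpp
#include <type_traits>
#include <utility>

using index_t = int; // stand-in for the library's index type (assumption)

template <index_t... Is>
struct Sequence {};

#if defined(__has_builtin)
#if __has_builtin(__make_integer_seq)
#define SKETCH_USE_BUILTIN 1
#endif
#endif

namespace detail {
#ifdef SKETCH_USE_BUILTIN
template <index_t N>
using make_indices = __make_integer_seq<std::integer_sequence, index_t, N>;
#else
// Fallback for compilers without the builtin.
template <index_t N>
using make_indices = std::make_integer_sequence<index_t, N>;
#endif

// Hypothetical helper: discards the index and always yields Value.
template <index_t /*Index*/, index_t Value>
struct always
{
    static constexpr index_t value = Value;
};

template <typename Indices, index_t Value>
struct uniform_impl;

// Expanding over Is produces one copy of Value per index.
template <index_t... Is, index_t Value>
struct uniform_impl<std::integer_sequence<index_t, Is...>, Value>
{
    using type = Sequence<always<Is, Value>::value...>;
};
} // namespace detail

template <index_t N, index_t Value>
struct uniform_sequence_gen
{
    using type = typename detail::uniform_impl<detail::make_indices<N>, Value>::type;
};

static_assert(std::is_same<uniform_sequence_gen<3, 7>::type, Sequence<7, 7, 7>>::value,
              "three copies of 7");
```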

Replace linear recursive instantiation with direct pack expansion for 1-4 sequences, and binary tree reduction for larger cases.

Before: O(N) depth for merging N sequences
After: O(log N) depth with O(1) for up to 4 sequences

This further reduces maximum nesting depth from 26 to 22 levels when combined with the previous sequence_gen optimization.

Co-Authored-By: Claude <noreply@anthropic.com>
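
The direct-concatenation specializations for one to four sequences might look like the sketch below. The names `Sequence`, `index_t`, and `sequence_merge` are assumed for illustration. Note that the fallback for five or more sequences here is deliberately simpler than what the commit describes: it peels off four sequences at a time, giving a depth of roughly N/4, and is not the O(log N) binary tree reduction.

```cpp
#include <type_traits>

using index_t = int; // stand-in for the library's index type (assumption)

template <index_t... Is>
struct Sequence {};

template <typename... Seqs>
struct sequence_merge;

// One sequence: nothing to merge.
template <index_t... As>
struct sequence_merge<Sequence<As...>>
{
    using type = Sequence<As...>;
};

// Two sequences: direct concatenation in a single expansion.
template <index_t... As, index_t... Bs>
struct sequence_merge<Sequence<As...>, Sequence<Bs...>>
{
    using type = Sequence<As..., Bs...>;
};

// Three sequences: direct concatenation.
template <index_t... As, index_t... Bs, index_t... Cs>
struct sequence_merge<Sequence<As...>, Sequence<Bs...>, Sequence<Cs...>>
{
    using type = Sequence<As..., Bs..., Cs...>;
};

// Four sequences: direct concatenation.
template <index_t... As, index_t... Bs, index_t... Cs, index_t... Ds>
struct sequence_merge<Sequence<As...>, Sequence<Bs...>, Sequence<Cs...>, Sequence<Ds...>>
{
    using type = Sequence<As..., Bs..., Cs..., Ds...>;
};

// Five or more: simplified chunked fallback (NOT the PR's binary tree).
// Merge the first four directly, merge the remainder, then join the two.
template <typename S1, typename S2, typename S3, typename S4, typename S5, typename... Rest>
struct sequence_merge<S1, S2, S3, S4, S5, Rest...>
{
    using type = typename sequence_merge<typename sequence_merge<S1, S2, S3, S4>::type,
                                         typename sequence_merge<S5, Rest...>::type>::type;
};

static_assert(std::is_same<sequence_merge<Sequence<1>, Sequence<2, 3>, Sequence<>,
                                          Sequence<4>, Sequence<5, 6>, Sequence<7>>::type,
                           Sequence<1, 2, 3, 4, 5, 6, 7>>::value,
              "six sequences are concatenated in order");
```

The four small cases need no nested instantiation at all, which is why they are worth specializing when most call sites merge only a few sequences.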

tenpercent force-pushed the tenpercent/old-ck-pack-rewrites branch from a477221 to 57c8cb1 Compare January 16, 2026 03:34

shumway reviewed Jan 16, 2026

tenpercent mentioned this pull request Jan 16, 2026

Rewrite O(N) recursive templates with O(1) pack expansion #3596

Open

tenpercent mentioned this pull request Jan 16, 2026

Replace nested static_for lambdas with compile-time search helper #3600

Open

tenpercent marked this pull request as ready for review January 17, 2026 03:41

tenpercent requested review from Snektron, ThomasNing, afagaj, andriy-ca, aosewski, asleepzzz, bartekxk, carlushuang, cgmillette, coderfeli, geyyer, illsilin, poyenc, qianfengz, vidyasagar-amd and vpietila-amd as code owners January 17, 2026 03:41

tenpercent and others added 2 commits January 16, 2026 21:45

tenpercent force-pushed the tenpercent/old-ck-pack-rewrites branch from 57c8cb1 to 3d46680 Compare January 17, 2026 03:51
