The VARYING subgroup path was buggy. Simplify it to only consider 32 bits at a time, and vary the tile size based on whether 32 or 64 threads are used instead.
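
A minimal sketch of the idea, not the actual change: the names `TileSize`, `tile_size_for_threads` and `set_bits`, and the 8x4 / 8x8 tile dimensions, are hypothetical and chosen only for illustration. It assumes one thread per tile element and shows a single 32-bit mask being walked on its own, with the thread count affecting only the tile shape.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical tile description; the dimensions used below are illustrative.
struct TileSize {
    uint32_t width;
    uint32_t height;
};

// Pick a tile so that one thread maps to one tile element.
// Assumption: 32 threads -> 8x4 tile, 64 threads -> 8x8 tile.
inline TileSize tile_size_for_threads(uint32_t threads) {
    assert(threads == 32 || threads == 64);
    return threads == 64 ? TileSize{8, 8} : TileSize{8, 4};
}

// Walk the set bits of a single 32-bit mask, lowest bit first.
// Only 32 bits are ever considered per call, regardless of thread count.
inline std::vector<uint32_t> set_bits(uint32_t mask) {
    std::vector<uint32_t> indices;
    for (uint32_t bit = 0; bit < 32; ++bit) {
        if (mask & (1u << bit))
            indices.push_back(bit);
    }
    return indices;
}
```

With this split, the mask handling has a single code path, and the 32 versus 64 thread difference is confined to the tile size selection.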