Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Decode image write block instruction #41

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

alyssarosenzweig
Copy link
Contributor

This instruction ("TODO.unkB1") is used to write out an entire block from local memory into an image. Because it is block based and not pixel based, in comparison to the regular image write instruction it works even if the destination image is compressed. It is tailor fit for use in the end-of-tile program, to blit tile memory to the framebuffer.

@TellowKrinkle
Copy link
Contributor

TellowKrinkle commented May 19, 2023

Texture extension is at 62 0001-imageblock-write-Texture-extension.patch

Test shader
using namespace metal;

struct Test {
	float4 yay;
};

struct ArgBuf {
	texture2d<float, access::write> tex[64];
};

kernel void test(metal::imageblock<Test> f, constant ArgBuf& texlist, ushort2 lid [[thread_position_in_threadgroup]], ushort2 gid [[thread_position_in_grid]]) {
	threadgroup_imageblock Test* t = f.data(lid);
	t->yay = float4(0, 1, 2, 3);
	threadgroup_barrier(mem_flags::mem_threadgroup_imageblock);
	if (all(lid == 0)) {
		imageblock_slice<float4, imageblock_layout_explicit> slice = f.slice(t->yay);
		for (int i = 0; i < 64; i++) {
			texlist.tex[i].write(slice, gid);
		}
	}
}
compute shader prolog:
   0: 0541040d00c73200     device_load      0, i32, xy, r104_r105, u2_u3, 0, signed, lsl 1
   8: 05a1144d00c43200     device_load      1, i32, xy, r20_r21, u2_u3, 1, signed, lsl 1
  10: 0591144d00c83200     device_load      1, i32, xy, r18_r19, u2_u3, 1, signed, lsl 2
  18: 0581344d00c43200     device_load      1, i32, xy, r16_r17, u2_u3, 3, signed, lsl 1
  20: 0571244d00c83200     device_load      1, i32, xy, r14_r15, u2_u3, 2, signed, lsl 2
  28: 0561544d00c43200     device_load      1, i32, xy, r12_r13, u2_u3, 5, signed, lsl 1
  30: 0551344d00c83200     device_load      1, i32, xy, r10_r11, u2_u3, 3, signed, lsl 2
  38: 0541744d00c43200     device_load      1, i32, xy, r8_r9, u2_u3, 7, signed, lsl 1
  40: 0531444d00c83200     device_load      1, i32, xy, r6_r7, u2_u3, 4, signed, lsl 2
  48: 0521940d00c43200     device_load      0, i32, xy, r4_r5, u2_u3, 9, signed, lsl 1
  50: 0511540d00c83200     device_load      0, i32, xy, r2_r3, u2_u3, 5, signed, lsl 2
  58: 0501b40d00c43200     device_load      0, i32, xy, r0_r1, u2_u3, 11, signed, lsl 1
  60: 0531640d00cb3200     device_load      0, i32, xy, r102_r103, u2_u3, 6, signed, lsl 2
  68: 0521d40d00c73200     device_load      0, i32, xy, r100_r101, u2_u3, 13, signed, lsl 1
  70: 0511740d00cb3200     device_load      0, i32, xy, r98_r99, u2_u3, 7, signed, lsl 2
  78: 0501f40d00c73200     device_load      0, i32, xy, r96_r97, u2_u3, 15, signed, lsl 1
  80: 3800                 wait             0
  82: 05f1840d00ca3200     device_load      0, i32, xy, r94_r95, u2_u3, 8, signed, lsl 2
  8a: 05e1140d01c63200     device_load      0, i32, xy, r92_r93, u2_u3, 17, signed, lsl 1
  92: 05d1940d00ca3200     device_load      0, i32, xy, r90_r91, u2_u3, 9, signed, lsl 2
  9a: 05c1340d01c63200     device_load      0, i32, xy, r88_r89, u2_u3, 19, signed, lsl 1
  a2: 05b1a40d00ca3200     device_load      0, i32, xy, r86_r87, u2_u3, 10, signed, lsl 2
  aa: 05a1540d01c63200     device_load      0, i32, xy, r84_r85, u2_u3, 21, signed, lsl 1
  b2: 0591b40d00ca3200     device_load      0, i32, xy, r82_r83, u2_u3, 11, signed, lsl 2
  ba: 0581740d01c63200     device_load      0, i32, xy, r80_r81, u2_u3, 23, signed, lsl 1
  c2: 3801                 wait             1
  c4: 0571c44d00ca3200     device_load      1, i32, xy, r78_r79, u2_u3, 12, signed, lsl 2
  cc: 0561944d01c63200     device_load      1, i32, xy, r76_r77, u2_u3, 25, signed, lsl 1
  d4: 0551d44d00ca3200     device_load      1, i32, xy, r74_r75, u2_u3, 13, signed, lsl 2
  dc: 0541b44d01c63200     device_load      1, i32, xy, r72_r73, u2_u3, 27, signed, lsl 1
  e4: 0531e44d00ca3200     device_load      1, i32, xy, r70_r71, u2_u3, 14, signed, lsl 2
  ec: 0521d44d01c63200     device_load      1, i32, xy, r68_r69, u2_u3, 29, signed, lsl 1
  f4: 0511f44d00ca3200     device_load      1, i32, xy, r66_r67, u2_u3, 15, signed, lsl 2
  fc: 0501f44d01c63200     device_load      1, i32, xy, r64_r65, u2_u3, 31, signed, lsl 1
 104: 3800                 wait             0
 106: 05f1040d01c93200     device_load      0, i32, xy, r62_r63, u2_u3, 16, signed, lsl 2
 10e: 05e1140d02c53200     device_load      0, i32, xy, r60_r61, u2_u3, 33, signed, lsl 1
 116: 05d1140d01c93200     device_load      0, i32, xy, r58_r59, u2_u3, 17, signed, lsl 2
 11e: 05c1340d02c53200     device_load      0, i32, xy, r56_r57, u2_u3, 35, signed, lsl 1
 126: 05b1240d01c93200     device_load      0, i32, xy, r54_r55, u2_u3, 18, signed, lsl 2
 12e: 05a1540d02c53200     device_load      0, i32, xy, r52_r53, u2_u3, 37, signed, lsl 1
 136: 0591340d01c93200     device_load      0, i32, xy, r50_r51, u2_u3, 19, signed, lsl 2
 13e: 0581740d02c53200     device_load      0, i32, xy, r48_r49, u2_u3, 39, signed, lsl 1
 146: 3801                 wait             1
 148: 0571444d01c93200     device_load      1, i32, xy, r46_r47, u2_u3, 20, signed, lsl 2
 150: 0561944d02c53200     device_load      1, i32, xy, r44_r45, u2_u3, 41, signed, lsl 1
 158: 0551544d01c93200     device_load      1, i32, xy, r42_r43, u2_u3, 21, signed, lsl 2
 160: 0541b44d02c53200     device_load      1, i32, xy, r40_r41, u2_u3, 43, signed, lsl 1
 168: 0531644d01c93200     device_load      1, i32, xy, r38_r39, u2_u3, 22, signed, lsl 2
 170: 0521d44d02c53200     device_load      1, i32, xy, r36_r37, u2_u3, 45, signed, lsl 1
 178: 0511744d01c93200     device_load      1, i32, xy, r34_r35, u2_u3, 23, signed, lsl 2
 180: 0501f44d02c53200     device_load      1, i32, xy, r32_r33, u2_u3, 47, signed, lsl 1
 188: 3800                 wait             0
 18a: 05f1840d01c83200     device_load      0, i32, xy, r30_r31, u2_u3, 24, signed, lsl 2
 192: 05e1140d03c43200     device_load      0, i32, xy, r28_r29, u2_u3, 49, signed, lsl 1
 19a: 05d1940d01c83200     device_load      0, i32, xy, r26_r27, u2_u3, 25, signed, lsl 2
 1a2: 05c1340d03c43200     device_load      0, i32, xy, r24_r25, u2_u3, 51, signed, lsl 1
 1aa: 05b1a40d01c83200     device_load      0, i32, xy, r22_r23, u2_u3, 26, signed, lsl 2
 1b2: c540803d00833000     uniform_store    2, i16, xy, 0, r104l_r104h, 8
 1ba: 62a1000000000030     mov_imm          r104, 0
 1c2: c540a03d00833000     uniform_store    2, i16, xy, 0, r104l_r104h, 10
 1ca: c5a0c03d00803000     uniform_store    2, i16, xy, 0, r20l_r20h, 12
 1d2: 05a1540d03c43200     device_load      0, i32, xy, r20_r21, u2_u3, 53, signed, lsl 1
 1da: c540e03d00833000     uniform_store    2, i16, xy, 0, r104l_r104h, 14
 1e2: c590003d01803000     uniform_store    2, i16, xy, 0, r18l_r18h, 16
 1ea: 0591b40d01c83200     device_load      0, i32, xy, r18_r19, u2_u3, 27, signed, lsl 2
 1f2: c540203d01833000     uniform_store    2, i16, xy, 0, r104l_r104h, 18
 1fa: c580403d01803000     uniform_store    2, i16, xy, 0, r16l_r16h, 20
 202: 0581740d03c43200     device_load      0, i32, xy, r16_r17, u2_u3, 55, signed, lsl 1
 20a: c540603d01833000     uniform_store    2, i16, xy, 0, r104l_r104h, 22
 212: c570803d01803000     uniform_store    2, i16, xy, 0, r14l_r14h, 24
 21a: 3801                 wait             1
 21c: 0571c44d01c83200     device_load      1, i32, xy, r14_r15, u2_u3, 28, signed, lsl 2
 224: c540a03d01833000     uniform_store    2, i16, xy, 0, r104l_r104h, 26
 22c: c560c03d01803000     uniform_store    2, i16, xy, 0, r12l_r12h, 28
 234: 0561944d03c43200     device_load      1, i32, xy, r12_r13, u2_u3, 57, signed, lsl 1
 23c: c540e03d01833000     uniform_store    2, i16, xy, 0, r104l_r104h, 30
 244: c550003d02803000     uniform_store    2, i16, xy, 0, r10l_r10h, 32
 24c: 0551d44d01c83200     device_load      1, i32, xy, r10_r11, u2_u3, 29, signed, lsl 2
 254: c540203d02833000     uniform_store    2, i16, xy, 0, r104l_r104h, 34
 25c: c540403d02803000     uniform_store    2, i16, xy, 0, r8l_r8h, 36
 264: 0541b44d03c43200     device_load      1, i32, xy, r8_r9, u2_u3, 59, signed, lsl 1
 26c: c540603d02833000     uniform_store    2, i16, xy, 0, r104l_r104h, 38
 274: c530803d02803000     uniform_store    2, i16, xy, 0, r6l_r6h, 40
 27c: 0531e44d01c83200     device_load      1, i32, xy, r6_r7, u2_u3, 30, signed, lsl 2
 284: c540a03d02833000     uniform_store    2, i16, xy, 0, r104l_r104h, 42
 28c: c520c03d02803000     uniform_store    2, i16, xy, 0, r4l_r4h, 44
 294: 0521d44d03c43200     device_load      1, i32, xy, r4_r5, u2_u3, 61, signed, lsl 1
 29c: c540e03d02833000     uniform_store    2, i16, xy, 0, r104l_r104h, 46
 2a4: c510003d03803000     uniform_store    2, i16, xy, 0, r2l_r2h, 48
 2ac: 0511f44d01c83200     device_load      1, i32, xy, r2_r3, u2_u3, 31, signed, lsl 2
 2b4: c540203d03833000     uniform_store    2, i16, xy, 0, r104l_r104h, 50
 2bc: c500403d03803000     uniform_store    2, i16, xy, 0, r0l_r0h, 52
 2c4: 0501f44d03c43200     device_load      1, i32, xy, r0_r1, u2_u3, 63, signed, lsl 1
 2cc: c540603d03833000     uniform_store    2, i16, xy, 0, r104l_r104h, 54
 2d4: c530803d03833000     uniform_store    2, i16, xy, 0, r102l_r102h, 56
 2dc: c540a03d03833000     uniform_store    2, i16, xy, 0, r104l_r104h, 58
 2e4: c520c03d03833000     uniform_store    2, i16, xy, 0, r100l_r100h, 60
 2ec: c540e03d03833000     uniform_store    2, i16, xy, 0, r104l_r104h, 62
 2f4: c510003d04833000     uniform_store    2, i16, xy, 0, r98l_r98h, 64
 2fc: c540203d04833000     uniform_store    2, i16, xy, 0, r104l_r104h, 66
 304: c500403d04833000     uniform_store    2, i16, xy, 0, r96l_r96h, 68
 30c: c540603d04833000     uniform_store    2, i16, xy, 0, r104l_r104h, 70
 314: c5f0803d04823000     uniform_store    2, i16, xy, 0, r94l_r94h, 72
 31c: c540a03d04833000     uniform_store    2, i16, xy, 0, r104l_r104h, 74
 324: c5e0c03d04823000     uniform_store    2, i16, xy, 0, r92l_r92h, 76
 32c: c540e03d04833000     uniform_store    2, i16, xy, 0, r104l_r104h, 78
 334: c5d0003d05823000     uniform_store    2, i16, xy, 0, r90l_r90h, 80
 33c: c540203d05833000     uniform_store    2, i16, xy, 0, r104l_r104h, 82
 344: c5c0403d05823000     uniform_store    2, i16, xy, 0, r88l_r88h, 84
 34c: c540603d05833000     uniform_store    2, i16, xy, 0, r104l_r104h, 86
 354: c5b0803d05823000     uniform_store    2, i16, xy, 0, r86l_r86h, 88
 35c: c540a03d05833000     uniform_store    2, i16, xy, 0, r104l_r104h, 90
 364: c5a0c03d05823000     uniform_store    2, i16, xy, 0, r84l_r84h, 92
 36c: c540e03d05833000     uniform_store    2, i16, xy, 0, r104l_r104h, 94
 374: c590003d06823000     uniform_store    2, i16, xy, 0, r82l_r82h, 96
 37c: c540203d06833000     uniform_store    2, i16, xy, 0, r104l_r104h, 98
 384: c580403d06823000     uniform_store    2, i16, xy, 0, r80l_r80h, 100
 38c: c540603d06833000     uniform_store    2, i16, xy, 0, r104l_r104h, 102
 394: c570803d06823000     uniform_store    2, i16, xy, 0, r78l_r78h, 104
 39c: c540a03d06833000     uniform_store    2, i16, xy, 0, r104l_r104h, 106
 3a4: c560c03d06823000     uniform_store    2, i16, xy, 0, r76l_r76h, 108
 3ac: c540e03d06833000     uniform_store    2, i16, xy, 0, r104l_r104h, 110
 3b4: c550003d07823000     uniform_store    2, i16, xy, 0, r74l_r74h, 112
 3bc: c540203d07833000     uniform_store    2, i16, xy, 0, r104l_r104h, 114
 3c4: c540403d07823000     uniform_store    2, i16, xy, 0, r72l_r72h, 116
 3cc: c540603d07833000     uniform_store    2, i16, xy, 0, r104l_r104h, 118
 3d4: c530803d07823000     uniform_store    2, i16, xy, 0, r70l_r70h, 120
 3dc: c540a03d07833000     uniform_store    2, i16, xy, 0, r104l_r104h, 122
 3e4: c520c03d07823000     uniform_store    2, i16, xy, 0, r68l_r68h, 124
 3ec: c540e03d07833000     uniform_store    2, i16, xy, 0, r104l_r104h, 126
 3f4: c510003d08823000     uniform_store    2, i16, xy, 0, r66l_r66h, 128
 3fc: c540203d08833000     uniform_store    2, i16, xy, 0, r104l_r104h, 130
 404: c500403d08823000     uniform_store    2, i16, xy, 0, r64l_r64h, 132
 40c: c540603d08833000     uniform_store    2, i16, xy, 0, r104l_r104h, 134
 414: c5f0803d08813000     uniform_store    2, i16, xy, 0, r62l_r62h, 136
 41c: c540a03d08833000     uniform_store    2, i16, xy, 0, r104l_r104h, 138
 424: c5e0c03d08813000     uniform_store    2, i16, xy, 0, r60l_r60h, 140
 42c: c540e03d08833000     uniform_store    2, i16, xy, 0, r104l_r104h, 142
 434: c5d0003d09813000     uniform_store    2, i16, xy, 0, r58l_r58h, 144
 43c: c540203d09833000     uniform_store    2, i16, xy, 0, r104l_r104h, 146
 444: c5c0403d09813000     uniform_store    2, i16, xy, 0, r56l_r56h, 148
 44c: c540603d09833000     uniform_store    2, i16, xy, 0, r104l_r104h, 150
 454: c5b0803d09813000     uniform_store    2, i16, xy, 0, r54l_r54h, 152
 45c: c540a03d09833000     uniform_store    2, i16, xy, 0, r104l_r104h, 154
 464: c5a0c03d09813000     uniform_store    2, i16, xy, 0, r52l_r52h, 156
 46c: c540e03d09833000     uniform_store    2, i16, xy, 0, r104l_r104h, 158
 474: c590003d0a813000     uniform_store    2, i16, xy, 0, r50l_r50h, 160
 47c: c540203d0a833000     uniform_store    2, i16, xy, 0, r104l_r104h, 162
 484: c580403d0a813000     uniform_store    2, i16, xy, 0, r48l_r48h, 164
 48c: c540603d0a833000     uniform_store    2, i16, xy, 0, r104l_r104h, 166
 494: c570803d0a813000     uniform_store    2, i16, xy, 0, r46l_r46h, 168
 49c: c540a03d0a833000     uniform_store    2, i16, xy, 0, r104l_r104h, 170
 4a4: c560c03d0a813000     uniform_store    2, i16, xy, 0, r44l_r44h, 172
 4ac: c540e03d0a833000     uniform_store    2, i16, xy, 0, r104l_r104h, 174
 4b4: c550003d0b813000     uniform_store    2, i16, xy, 0, r42l_r42h, 176
 4bc: c540203d0b833000     uniform_store    2, i16, xy, 0, r104l_r104h, 178
 4c4: c540403d0b813000     uniform_store    2, i16, xy, 0, r40l_r40h, 180
 4cc: c540603d0b833000     uniform_store    2, i16, xy, 0, r104l_r104h, 182
 4d4: c530803d0b813000     uniform_store    2, i16, xy, 0, r38l_r38h, 184
 4dc: c540a03d0b833000     uniform_store    2, i16, xy, 0, r104l_r104h, 186
 4e4: c520c03d0b813000     uniform_store    2, i16, xy, 0, r36l_r36h, 188
 4ec: c540e03d0b833000     uniform_store    2, i16, xy, 0, r104l_r104h, 190
 4f4: c510003d0c813000     uniform_store    2, i16, xy, 0, r34l_r34h, 192
 4fc: c540203d0c833000     uniform_store    2, i16, xy, 0, r104l_r104h, 194
 504: c500403d0c813000     uniform_store    2, i16, xy, 0, r32l_r32h, 196
 50c: c540603d0c833000     uniform_store    2, i16, xy, 0, r104l_r104h, 198
 514: 3800                 wait             0
 516: c5f0803d0c803000     uniform_store    2, i16, xy, 0, r30l_r30h, 200
 51e: c540a03d0c833000     uniform_store    2, i16, xy, 0, r104l_r104h, 202
 526: c5e0c03d0c803000     uniform_store    2, i16, xy, 0, r28l_r28h, 204
 52e: c540e03d0c833000     uniform_store    2, i16, xy, 0, r104l_r104h, 206
 536: c5d0003d0d803000     uniform_store    2, i16, xy, 0, r26l_r26h, 208
 53e: c540203d0d833000     uniform_store    2, i16, xy, 0, r104l_r104h, 210
 546: c5c0403d0d803000     uniform_store    2, i16, xy, 0, r24l_r24h, 212
 54e: c540603d0d833000     uniform_store    2, i16, xy, 0, r104l_r104h, 214
 556: c5b0803d0d803000     uniform_store    2, i16, xy, 0, r22l_r22h, 216
 55e: c540a03d0d833000     uniform_store    2, i16, xy, 0, r104l_r104h, 218
 566: c5a0c03d0d803000     uniform_store    2, i16, xy, 0, r20l_r20h, 220
 56e: c540e03d0d833000     uniform_store    2, i16, xy, 0, r104l_r104h, 222
 576: c590003d0e803000     uniform_store    2, i16, xy, 0, r18l_r18h, 224
 57e: c540203d0e833000     uniform_store    2, i16, xy, 0, r104l_r104h, 226
 586: c580403d0e803000     uniform_store    2, i16, xy, 0, r16l_r16h, 228
 58e: c540603d0e833000     uniform_store    2, i16, xy, 0, r104l_r104h, 230
 596: 3801                 wait             1
 598: c570803d0e803000     uniform_store    2, i16, xy, 0, r14l_r14h, 232
 5a0: c540a03d0e833000     uniform_store    2, i16, xy, 0, r104l_r104h, 234
 5a8: c560c03d0e803000     uniform_store    2, i16, xy, 0, r12l_r12h, 236
 5b0: c540e03d0e833000     uniform_store    2, i16, xy, 0, r104l_r104h, 238
 5b8: c550003d0f803000     uniform_store    2, i16, xy, 0, r10l_r10h, 240
 5c0: c540203d0f833000     uniform_store    2, i16, xy, 0, r104l_r104h, 242
 5c8: c540403d0f803000     uniform_store    2, i16, xy, 0, r8l_r8h, 244
 5d0: c540603d0f833000     uniform_store    2, i16, xy, 0, r104l_r104h, 246
 5d8: c530803d0f803000     uniform_store    2, i16, xy, 0, r6l_r6h, 248
 5e0: c540a03d0f833000     uniform_store    2, i16, xy, 0, r104l_r104h, 250
 5e8: c520c03d0f803000     uniform_store    2, i16, xy, 0, r4l_r4h, 252
 5f0: c540e03d0f833000     uniform_store    2, i16, xy, 0, r104l_r104h, 254
 5f8: c510003d00803001     uniform_store    2, i16, xy, 0, r2l_r2h, 256
 600: c540203d00833001     uniform_store    2, i16, xy, 0, r104l_r104h, 258
 608: c500403d00803001     uniform_store    2, i16, xy, 0, r0l_r0h, 260
 610: c540603d00833001     uniform_store    2, i16, xy, 0, r104l_r104h, 262
 618: 8800                 stop             

compute shader:
   0: f20a3100             get_sr           r2h.cache, sr49 (thread_position_in_threadgroup.y)
   4: f2083000             get_sr           r2l.cache, sr48 (thread_position_in_threadgroup.x)
   8: 7e0445088000         mov              r1l, r2h
   e: 7e0644088000         mov              r1h, r2l
  14: 621500000000         mov_imm          r5, 0
  1a: 62190000803f         mov_imm          r6, 1065353216
  20: 621d00000040         mov_imm          r7, 1073741824
  26: 622100004040         mov_imm          r8, 1077936128
  2c: 09150402fc048000     st_tile          r5_r6_r7_r8, i32, 1, xyzw, 64, 1, 4
  34: 6800                 threadgroup_barrier 
  36: e2000000             mov_imm          r0l.cache, 0
  3a: 5288c2020000         if_icmp          r0l, seq, r1.discard, 0, 1
  40: 20c016040000         jmp_exec_none    0x456
  46: 62060000             mov_imm          r1h, 0
  4a: 620a0000             mov_imm          r2h, 0
  4e: 72041004             get_sr           r1l, sr80 (thread_position_in_grid.x)
  52: 72081104             get_sr           r2l, sr81 (thread_position_in_grid.y)
  56: 7e0dc4098000         mov              r3, u130
  5c: 7e15c0098000         mov              r5, u128
  62: 7e19bc09800c         mov              r6, u126
  68: 7e1db809800c         mov              r7, u124
  6e: 7e21b409800c         mov              r8, u122
  74: 7e25b009800c         mov              r9, u120
  7a: 7e29ac09800c         mov              r10, u118
  80: 7e2da809800c         mov              r11, u116
  86: 7e31a409800c         mov              r12, u114
  8c: 7e35a009800c         mov              r13, u112
  92: 7e399c09800c         mov              r14, u110
  98: 7e3d9809800c         mov              r15, u108
  9e: 7e419409800c         mov              r16, u106
  a4: 7e459009800c         mov              r17, u104
  aa: 7e498c09800c         mov              r18, u102
  b0: 7e4d8809800c         mov              r19, u100
  b6: 7e518409800c         mov              r20, u98
  bc: 7e558009800c         mov              r21, u96
  c2: 7e59bc098008         mov              r22, u94
  c8: 7e5db8098008         mov              r23, u92
  ce: 7e61b4098008         mov              r24, u90
  d4: 7e65b0098008         mov              r25, u88
  da: 7e69ac098008         mov              r26, u86
  e0: 7e6da8098008         mov              r27, u84
  e6: 7e71a4098008         mov              r28, u82
  ec: 7e75a0098008         mov              r29, u80
  f2: 7e799c098008         mov              r30, u78
  f8: 7e7d98098008         mov              r31, u76
  fe: 7e0194098018         mov              r32, u74
 104: 7e0590098018         mov              r33, u72
 10a: 7e098c098018         mov              r34, u70
 110: 7e0d88098018         mov              r35, u68
 116: 7e1184098018         mov              r36, u66
 11c: 7e1580098018         mov              r37, u64
 122: 7e19bc098014         mov              r38, u62
 128: 7e1db8098014         mov              r39, u60
 12e: 7e21b4098014         mov              r40, u58
 134: 7e25b0098014         mov              r41, u56
 13a: 7e29ac098014         mov              r42, u54
 140: 7e2da8098014         mov              r43, u52
 146: 7e31a4098014         mov              r44, u50
 14c: 7e35a0098014         mov              r45, u48
 152: 7e399c098014         mov              r46, u46
 158: 7e3d98098014         mov              r47, u44
 15e: 7e4194098014         mov              r48, u42
 164: 7e4590098014         mov              r49, u40
 16a: 7e498c098014         mov              r50, u38
 170: 7e4d88098014         mov              r51, u36
 176: 7e5184098014         mov              r52, u34
 17c: 7e5580098014         mov              r53, u32
 182: 7e59bc098010         mov              r54, u30
 188: 7e5db8098010         mov              r55, u28
 18e: 7e61b4098010         mov              r56, u26
 194: 7e65b0098010         mov              r57, u24
 19a: 7e69ac098010         mov              r58, u22
 1a0: 7e6da8098010         mov              r59, u20
 1a6: 7e71a4098010         mov              r60, u18
 1ac: 7e75a0098010         mov              r61, u16
 1b2: 7e799c098010         mov              r62, u14
 1b8: 7e7d98098010         mov              r63, u12
 1be: 7e0194098020         mov              r64, u10
 1c4: 7e0590098020         mov              r65, u8
 1ca: 7e098c098020         mov              r66, u6
 1d0: 7e0d88098020         mov              r67, u4
 1d6: b1808280c64a20800100 image_write_block r0l, 130, 1, r67, 0, 37, i32, 0
 1e0: b1808280c44a20800100 image_write_block r0l, 130, 1, r66, 0, 37, i32, 0
 1ea: b1808280c24a20800100 image_write_block r0l, 130, 1, r65, 0, 37, i32, 0
 1f4: b1808280c04a20800100 image_write_block r0l, 130, 1, r64, 0, 37, i32, 0
 1fe: b1808280fe4a20400100 image_write_block r0l, 130, 1, r63, 0, 37, i32, 0
 208: b1808280fc4a20400100 image_write_block r0l, 130, 1, r62, 0, 37, i32, 0
 212: b1808280fa4a20400100 image_write_block r0l, 130, 1, r61, 0, 37, i32, 0
 21c: b1808280f84a20400100 image_write_block r0l, 130, 1, r60, 0, 37, i32, 0
 226: b1808280f64a20400100 image_write_block r0l, 130, 1, r59, 0, 37, i32, 0
 230: b1808280f44a20400100 image_write_block r0l, 130, 1, r58, 0, 37, i32, 0
 23a: b1808280f24a20400100 image_write_block r0l, 130, 1, r57, 0, 37, i32, 0
 244: b1808280f04a20400100 image_write_block r0l, 130, 1, r56, 0, 37, i32, 0
 24e: b1808280ee4a20400100 image_write_block r0l, 130, 1, r55, 0, 37, i32, 0
 258: b1808280ec4a20400100 image_write_block r0l, 130, 1, r54, 0, 37, i32, 0
 262: b1808280ea4a20400100 image_write_block r0l, 130, 1, r53, 0, 37, i32, 0
 26c: b1808280e84a20400100 image_write_block r0l, 130, 1, r52, 0, 37, i32, 0
 276: b1808280e64a20400100 image_write_block r0l, 130, 1, r51, 0, 37, i32, 0
 280: b1808280e44a20400100 image_write_block r0l, 130, 1, r50, 0, 37, i32, 0
 28a: b1808280e24a20400100 image_write_block r0l, 130, 1, r49, 0, 37, i32, 0
 294: b1808280e04a20400100 image_write_block r0l, 130, 1, r48, 0, 37, i32, 0
 29e: b1808280de4a20400100 image_write_block r0l, 130, 1, r47, 0, 37, i32, 0
 2a8: b1808280dc4a20400100 image_write_block r0l, 130, 1, r46, 0, 37, i32, 0
 2b2: b1808280da4a20400100 image_write_block r0l, 130, 1, r45, 0, 37, i32, 0
 2bc: b1808280d84a20400100 image_write_block r0l, 130, 1, r44, 0, 37, i32, 0
 2c6: b1808280d64a20400100 image_write_block r0l, 130, 1, r43, 0, 37, i32, 0
 2d0: b1808280d44a20400100 image_write_block r0l, 130, 1, r42, 0, 37, i32, 0
 2da: b1808280d24a20400100 image_write_block r0l, 130, 1, r41, 0, 37, i32, 0
 2e4: b1808280d04a20400100 image_write_block r0l, 130, 1, r40, 0, 37, i32, 0
 2ee: b1808280ce4a20400100 image_write_block r0l, 130, 1, r39, 0, 37, i32, 0
 2f8: b1808280cc4a20400100 image_write_block r0l, 130, 1, r38, 0, 37, i32, 0
 302: b1808280ca4a20400100 image_write_block r0l, 130, 1, r37, 0, 37, i32, 0
 30c: b1808280c84a20400100 image_write_block r0l, 130, 1, r36, 0, 37, i32, 0
 316: b1808280c64a20400100 image_write_block r0l, 130, 1, r35, 0, 37, i32, 0
 320: b1808280c44a20400100 image_write_block r0l, 130, 1, r34, 0, 37, i32, 0
 32a: b1808280c24a20400100 image_write_block r0l, 130, 1, r33, 0, 37, i32, 0
 334: b1808280c04a20400100 image_write_block r0l, 130, 1, r32, 0, 37, i32, 0
 33e: b1808280fe4a20000100 image_write_block r0l, 130, 1, r31, 0, 37, i32, 0
 348: b1808280fc4a20000100 image_write_block r0l, 130, 1, r30, 0, 37, i32, 0
 352: b1808280fa4a20000100 image_write_block r0l, 130, 1, r29, 0, 37, i32, 0
 35c: b1808280f84a20000100 image_write_block r0l, 130, 1, r28, 0, 37, i32, 0
 366: b1808280f64a20000100 image_write_block r0l, 130, 1, r27, 0, 37, i32, 0
 370: b1808280f44a20000100 image_write_block r0l, 130, 1, r26, 0, 37, i32, 0
 37a: b1808280f24a20000100 image_write_block r0l, 130, 1, r25, 0, 37, i32, 0
 384: b1808280f04a20000100 image_write_block r0l, 130, 1, r24, 0, 37, i32, 0
 38e: b1808280ee4a20000100 image_write_block r0l, 130, 1, r23, 0, 37, i32, 0
 398: b1808280ec4a20000100 image_write_block r0l, 130, 1, r22, 0, 37, i32, 0
 3a2: b1808280ea4a20000100 image_write_block r0l, 130, 1, r21, 0, 37, i32, 0
 3ac: b1808280e84a20000100 image_write_block r0l, 130, 1, r20, 0, 37, i32, 0
 3b6: b1808280e64a20000100 image_write_block r0l, 130, 1, r19, 0, 37, i32, 0
 3c0: b1808280e44a20000100 image_write_block r0l, 130, 1, r18, 0, 37, i32, 0
 3ca: b1808280e24a20000100 image_write_block r0l, 130, 1, r17, 0, 37, i32, 0
 3d4: b1808280e04a20000100 image_write_block r0l, 130, 1, r16, 0, 37, i32, 0
 3de: b1808280de4a20000100 image_write_block r0l, 130, 1, r15, 0, 37, i32, 0
 3e8: b1808280dc4a20000100 image_write_block r0l, 130, 1, r14, 0, 37, i32, 0
 3f2: b1808280da4a20000100 image_write_block r0l, 130, 1, r13, 0, 37, i32, 0
 3fc: b1808280d84a20000100 image_write_block r0l, 130, 1, r12, 0, 37, i32, 0
 406: b1808280d64a20000100 image_write_block r0l, 130, 1, r11, 0, 37, i32, 0
 410: b1808280d44a20000100 image_write_block r0l, 130, 1, r10, 0, 37, i32, 0
 41a: b1808280d24a20000100 image_write_block r0l, 130, 1, r9, 0, 37, i32, 0
 424: b1808280d04a20000100 image_write_block r0l, 130, 1, r8, 0, 37, i32, 0
 42e: b1808280ce4a20000100 image_write_block r0l, 130, 1, r7, 0, 37, i32, 0
 438: b1808280cc4a20000100 image_write_block r0l, 130, 1, r6, 0, 37, i32, 0
 442: b1808280ca4a20000100 image_write_block r0l, 130, 1, r5, 0, 37, i32, 0
 44c: b1808280c64a20000100 image_write_block r0l, 130, 1, r3, 0, 37, i32, 0
 456: 520e00000000         pop_exec         r0l, 1
 45c: 8800                 stop             

Looks like there's a CoordsDesc and TEX_TYPES in there too 0001-imageblock-write-coordsdesc-and-tex_type.patch (note: I haven't confirmed the flags bit on that)

Writing imageblocks to texturecube_arrays
using namespace metal;

struct Test {
	float4 yay;
};

kernel void test(metal::imageblock<Test> f, texturecube_array<float, access::write> tex, ushort2 lid [[thread_position_in_threadgroup]], ushort2 gid [[thread_position_in_grid]]) {
	threadgroup_imageblock Test* t = f.data(lid);
	t->yay = float4(0, 1, 2, 3);
	threadgroup_barrier(mem_flags::mem_threadgroup_imageblock);
	if (all(lid == 0)) {
		imageblock_slice<float4, imageblock_layout_explicit> slice = f.slice(t->yay);
		for (int i = 0; i < 64; i++) {
			tex.write(slice, gid + ushort2(0, i), 47, 48);
		}
	}
}
compute shader:
   0: f20a3100             get_sr           r2h.cache, sr49 (thread_position_in_threadgroup.y)
   4: f2083000             get_sr           r2l.cache, sr48 (thread_position_in_threadgroup.x)
   8: 7e0445088000         mov              r1l, r2h
   e: 7e0644088000         mov              r1h, r2l
  14: 621500000000         mov_imm          r5, 0
  1a: 62190000803f         mov_imm          r6, 1065353216
  20: 621d00000040         mov_imm          r7, 1073741824
  26: 622100004040         mov_imm          r8, 1077936128
  2c: 09150402fc048000     st_tile          r5_r6_r7_r8, i32, 1, xyzw, 64, 1, 4
  34: 6800                 threadgroup_barrier 
  36: e2000000             mov_imm          r0l.cache, 0
  3a: 5288c2020000         if_icmp          r0l, seq, r1.discard, 0, 1
  40: 20c0b6080000         jmp_exec_none    0x8F6
  46: e2060000             mov_imm          r1h.cache, 0
  4a: 620a0000             mov_imm          r2h, 0
  4e: 620e0000             mov_imm          r3h, 0
  52: f2081104             get_sr           r2l.cache, sr81 (thread_position_in_grid.y)
  56: f2041004             get_sr           r1l.cache, sr80 (thread_position_in_grid.x)
  5a: 0e4c014008000000     iadd             r19l, 1, r2l.cache
  62: 0e38024008000000     iadd             r14l, 2, r2l.cache
  6a: 0e28034008000000     iadd             r10l, 3, r2l.cache
  72: 0e18044004000000     iadd             r6l, 4, r2l
  7a: 7e4882088000         mov              r18l, r1l.cache
  80: 7e3482088000         mov              r13l, r1l.cache
  86: 7e2482088000         mov              r9l, r1l.cache
  8c: 620c3000             mov_imm          r3l, 48
  90: 7e1442088000         mov              r5l, r1l
  96: 62503000             mov_imm          r20l, 48
  9a: 623c3000             mov_imm          r15l, 48
  9e: 622c3000             mov_imm          r11l, 48
  a2: 621c3000             mov_imm          r7l, 48
  a6: 7e4a83088000         mov              r18h, r1h.cache
  ac: 7e4e83088000         mov              r19h, r1h.cache
  b2: 7e5283088000         mov              r20h, r1h.cache
  b8: 7e3683088000         mov              r13h, r1h.cache
  be: 7e3a83088000         mov              r14h, r1h.cache
  c4: 7e3e83088000         mov              r15h, r1h.cache
  ca: 7e2683088000         mov              r9h, r1h.cache
  d0: 7e2a83088000         mov              r10h, r1h.cache
  d6: 7e2e83088000         mov              r11h, r1h.cache
  dc: 7e1683088000         mov              r5h, r1h.cache
  e2: 7e1a83088000         mov              r6h, r1h.cache
  e8: 62102f00             mov_imm          r4l, 47
  ec: 7e1e43088000         mov              r7h, r1h
  f2: 62542f00             mov_imm          r21l, 47
  f6: 62402f00             mov_imm          r16l, 47
  fa: 62302f00             mov_imm          r12l, 47
  fe: 62202f00             mov_imm          r8l, 47
 102: b1808280004f20000100 image_write_block r0l, r1l_r1h_r2l_r2h_r3l_r3h_r4l, 1, 1, ts0, tex_cube_array, 9, i32, 0
 10c: b180a480004f20000100 image_write_block r0l, r18l_r18h_r19l_r19h_r20l_r20h_r21l, 1, 1, ts0, tex_cube_array, 9, i32, 0
 116: b1809a80004f20000100 image_write_block r0l, r13l_r13h_r14l_r14h_r15l_r15h_r16l, 1, 1, ts0, tex_cube_array, 9, i32, 0
 120: b1809280004f20000100 image_write_block r0l, r9l_r9h_r10l_r10h_r11l_r11h_r12l, 1, 1, ts0, tex_cube_array, 9, i32, 0
 12a: b1808a80004f20000100 image_write_block r0l, r5l_r5h_r6l_r6h_r7l_r7h_r8l, 1, 1, ts0, tex_cube_array, 9, i32, 0
 134: 0e58054008300000     iadd             r118l, 5, r2l.cache
 13c: 0e48064008300000     iadd             r114l, 6, r2l.cache
 144: 0e38074008300000     iadd             r110l, 7, r2l.cache
 14c: 0e28084008300000     iadd             r106l, 8, r2l.cache
 154: 0e18094008300000     iadd             r102l, 9, r2l.cache
 15c: 0e080a4008300000     iadd             r98l, 10, r2l.cache
 164: 0e780b4008200000     iadd             r94l, 11, r2l.cache
 16c: 0e680c4008200000     iadd             r90l, 12, r2l.cache
 174: 0e580d4008200000     iadd             r86l, 13, r2l.cache
 17c: 0e480e4008200000     iadd             r82l, 14, r2l.cache
 184: 0e380f4008200000     iadd             r78l, 15, r2l.cache
 18c: 0e28104008200000     iadd             r74l, 16, r2l.cache
 194: 0e18114008200000     iadd             r70l, 17, r2l.cache
 19c: 0e08124008200000     iadd             r66l, 18, r2l.cache
 1a4: 0e78134008100000     iadd             r62l, 19, r2l.cache
 1ac: 0e68144008100000     iadd             r58l, 20, r2l.cache
 1b4: 0e58154008100000     iadd             r54l, 21, r2l.cache
 1bc: 0e48164008100000     iadd             r50l, 22, r2l.cache
 1c4: 0e38174008100000     iadd             r46l, 23, r2l.cache
 1cc: 0e28184008100000     iadd             r42l, 24, r2l.cache
 1d4: 0e18194008100000     iadd             r38l, 25, r2l.cache
 1dc: 0e081a4008100000     iadd             r34l, 26, r2l.cache
 1e4: 0e781b4008000000     iadd             r30l, 27, r2l.cache
 1ec: 0e681c4008000000     iadd             r26l, 28, r2l.cache
 1f4: 0e581d4008000000     iadd             r22l, 29, r2l.cache
 1fc: 0e481e4008000000     iadd             r18l, 30, r2l.cache
 204: 0e381f4008000000     iadd             r14l, 31, r2l.cache
 20c: 0e28204008000000     iadd             r10l, 32, r2l.cache
 214: 0e18214004000000     iadd             r6l, 33, r2l
 21c: 7e5482088030         mov              r117l, r1l.cache
 222: 7e4482088030         mov              r113l, r1l.cache
 228: 7e3482088030         mov              r109l, r1l.cache
 22e: 7e2482088030         mov              r105l, r1l.cache
 234: 7e1482088030         mov              r101l, r1l.cache
 23a: 7e0482088030         mov              r97l, r1l.cache
 240: 7e7482088020         mov              r93l, r1l.cache
 246: 7e6482088020         mov              r89l, r1l.cache
 24c: 7e5482088020         mov              r85l, r1l.cache
 252: 7e4482088020         mov              r81l, r1l.cache
 258: 7e3482088020         mov              r77l, r1l.cache
 25e: 7e2482088020         mov              r73l, r1l.cache
 264: 7e1482088020         mov              r69l, r1l.cache
 26a: 7e0482088020         mov              r65l, r1l.cache
 270: 7e7482088010         mov              r61l, r1l.cache
 276: 7e6482088010         mov              r57l, r1l.cache
 27c: 7e5482088010         mov              r53l, r1l.cache
 282: 7e4482088010         mov              r49l, r1l.cache
 288: 7e3482088010         mov              r45l, r1l.cache
 28e: 7e2482088010         mov              r41l, r1l.cache
 294: 7e1482088010         mov              r37l, r1l.cache
 29a: 7e0482088010         mov              r33l, r1l.cache
 2a0: 7e7482088000         mov              r29l, r1l.cache
 2a6: 7e6482088000         mov              r25l, r1l.cache
 2ac: 7e5482088000         mov              r21l, r1l.cache
 2b2: 7e4442088000         mov              r17l, r1l
 2b8: 7e5c86088030         mov              r119l, r3l.cache
 2be: 7e4c86088030         mov              r115l, r3l.cache
 2c4: 7e3c86088030         mov              r111l, r3l.cache
 2ca: 7e2c86088030         mov              r107l, r3l.cache
 2d0: 7e1c86088030         mov              r103l, r3l.cache
 2d6: 7e0c86088030         mov              r99l, r3l.cache
 2dc: 7e7c86088020         mov              r95l, r3l.cache
 2e2: 7e6c86088020         mov              r91l, r3l.cache
 2e8: 7e5c86088020         mov              r87l, r3l.cache
 2ee: 7e4c86088020         mov              r83l, r3l.cache
 2f4: 7e3c86088020         mov              r79l, r3l.cache
 2fa: 7e2c86088020         mov              r75l, r3l.cache
 300: 7e1c86088020         mov              r71l, r3l.cache
 306: 7e0c86088020         mov              r67l, r3l.cache
 30c: 7e7c86088010         mov              r63l, r3l.cache
 312: 7e6c86088010         mov              r59l, r3l.cache
 318: 7e5c86088010         mov              r55l, r3l.cache
 31e: 7e4c86088010         mov              r51l, r3l.cache
 324: 7e3c86088010         mov              r47l, r3l.cache
 32a: 7e2c86088010         mov              r43l, r3l.cache
 330: 7e1c86088010         mov              r39l, r3l.cache
 336: 7e0c86088010         mov              r35l, r3l.cache
 33c: 7e7c86088000         mov              r31l, r3l.cache
 342: 7e6c86088000         mov              r27l, r3l.cache
 348: 7e5c86088000         mov              r23l, r3l.cache
 34e: 7e4c46088000         mov              r19l, r3l
 354: 7e5683088030         mov              r117h, r1h.cache
 35a: 7e5a83088030         mov              r118h, r1h.cache
 360: 7e5e83088030         mov              r119h, r1h.cache
 366: 7e4683088030         mov              r113h, r1h.cache
 36c: 7e4a83088030         mov              r114h, r1h.cache
 372: 7e4e83088030         mov              r115h, r1h.cache
 378: 7e3683088030         mov              r109h, r1h.cache
 37e: 7e3a83088030         mov              r110h, r1h.cache
 384: 7e3e83088030         mov              r111h, r1h.cache
 38a: 7e2683088030         mov              r105h, r1h.cache
 390: 7e2a83088030         mov              r106h, r1h.cache
 396: 7e2e83088030         mov              r107h, r1h.cache
 39c: 7e1683088030         mov              r101h, r1h.cache
 3a2: 7e1a83088030         mov              r102h, r1h.cache
 3a8: 7e1e83088030         mov              r103h, r1h.cache
 3ae: 7e0683088030         mov              r97h, r1h.cache
 3b4: 7e0a83088030         mov              r98h, r1h.cache
 3ba: 7e0e83088030         mov              r99h, r1h.cache
 3c0: 7e7683088020         mov              r93h, r1h.cache
 3c6: 7e7a83088020         mov              r94h, r1h.cache
 3cc: 7e7e83088020         mov              r95h, r1h.cache
 3d2: 7e6683088020         mov              r89h, r1h.cache
 3d8: 7e6a83088020         mov              r90h, r1h.cache
 3de: 7e6e83088020         mov              r91h, r1h.cache
 3e4: 7e5683088020         mov              r85h, r1h.cache
 3ea: 7e5a83088020         mov              r86h, r1h.cache
 3f0: 7e5e83088020         mov              r87h, r1h.cache
 3f6: 7e4683088020         mov              r81h, r1h.cache
 3fc: 7e4a83088020         mov              r82h, r1h.cache
 402: 7e4e83088020         mov              r83h, r1h.cache
 408: 7e3683088020         mov              r77h, r1h.cache
 40e: 7e3a83088020         mov              r78h, r1h.cache
 414: 7e3e83088020         mov              r79h, r1h.cache
 41a: 7e2683088020         mov              r73h, r1h.cache
 420: 7e2a83088020         mov              r74h, r1h.cache
 426: 7e2e83088020         mov              r75h, r1h.cache
 42c: 7e1683088020         mov              r69h, r1h.cache
 432: 7e1a83088020         mov              r70h, r1h.cache
 438: 7e1e83088020         mov              r71h, r1h.cache
 43e: 7e0683088020         mov              r65h, r1h.cache
 444: 7e0a83088020         mov              r66h, r1h.cache
 44a: 7e0e83088020         mov              r67h, r1h.cache
 450: 7e7683088010         mov              r61h, r1h.cache
 456: 7e7a83088010         mov              r62h, r1h.cache
 45c: 7e7e83088010         mov              r63h, r1h.cache
 462: 7e6683088010         mov              r57h, r1h.cache
 468: 7e6a83088010         mov              r58h, r1h.cache
 46e: 7e6e83088010         mov              r59h, r1h.cache
 474: 7e5683088010         mov              r53h, r1h.cache
 47a: 7e5a83088010         mov              r54h, r1h.cache
 480: 7e5e83088010         mov              r55h, r1h.cache
 486: 7e4683088010         mov              r49h, r1h.cache
 48c: 7e4a83088010         mov              r50h, r1h.cache
 492: 7e4e83088010         mov              r51h, r1h.cache
 498: 7e3683088010         mov              r45h, r1h.cache
 49e: 7e3a83088010         mov              r46h, r1h.cache
 4a4: 7e3e83088010         mov              r47h, r1h.cache
 4aa: 7e2683088010         mov              r41h, r1h.cache
 4b0: 7e2a83088010         mov              r42h, r1h.cache
 4b6: 7e2e83088010         mov              r43h, r1h.cache
 4bc: 7e1683088010         mov              r37h, r1h.cache
 4c2: 7e1a83088010         mov              r38h, r1h.cache
 4c8: 7e1e83088010         mov              r39h, r1h.cache
 4ce: 7e0683088010         mov              r33h, r1h.cache
 4d4: 7e0a83088010         mov              r34h, r1h.cache
 4da: 7e0e83088010         mov              r35h, r1h.cache
 4e0: 7e7683088000         mov              r29h, r1h.cache
 4e6: 7e7a83088000         mov              r30h, r1h.cache
 4ec: 7e7e83088000         mov              r31h, r1h.cache
 4f2: 7e6683088000         mov              r25h, r1h.cache
 4f8: 7e6a83088000         mov              r26h, r1h.cache
 4fe: 7e6e83088000         mov              r27h, r1h.cache
 504: 7e5683088000         mov              r21h, r1h.cache
 50a: 7e5a83088000         mov              r22h, r1h.cache
 510: 7e5e83088000         mov              r23h, r1h.cache
 516: 7e4643088000         mov              r17h, r1h
 51c: 7e6088088030         mov              r120l, r4l.cache
 522: 7e5088088030         mov              r116l, r4l.cache
 528: 7e4088088030         mov              r112l, r4l.cache
 52e: 7e3088088030         mov              r108l, r4l.cache
 534: 7e2088088030         mov              r104l, r4l.cache
 53a: 7e1088088030         mov              r100l, r4l.cache
 540: 7e0088088030         mov              r96l, r4l.cache
 546: 7e7088088020         mov              r92l, r4l.cache
 54c: 7e6088088020         mov              r88l, r4l.cache
 552: 7e5088088020         mov              r84l, r4l.cache
 558: 7e4088088020         mov              r80l, r4l.cache
 55e: 7e3088088020         mov              r76l, r4l.cache
 564: 7e2088088020         mov              r72l, r4l.cache
 56a: 7e1088088020         mov              r68l, r4l.cache
 570: 7e0088088020         mov              r64l, r4l.cache
 576: 7e7088088010         mov              r60l, r4l.cache
 57c: 7e6088088010         mov              r56l, r4l.cache
 582: 7e5088088010         mov              r52l, r4l.cache
 588: 7e4088088010         mov              r48l, r4l.cache
 58e: 7e3088088010         mov              r44l, r4l.cache
 594: 7e2088088010         mov              r40l, r4l.cache
 59a: 7e1088088010         mov              r36l, r4l.cache
 5a0: 7e0088088010         mov              r32l, r4l.cache
 5a6: 7e7088088000         mov              r28l, r4l.cache
 5ac: 7e6088088000         mov              r24l, r4l.cache
 5b2: 7e5048088000         mov              r20l, r4l
 5b8: b180aa80004f200c0100 image_write_block r0l, r117l_r117h_r118l_r118h_r119l_r119h_r120l, 1, 1, ts0, tex_cube_array, 9, i32, 0
 5c2: b180a280004f200c0100 image_write_block r0l, r113l_r113h_r114l_r114h_r115l_r115h_r116l, 1, 1, ts0, tex_cube_array, 9, i32, 0
 5cc: b1809a80004f200c0100 image_write_block r0l, r109l_r109h_r110l_r110h_r111l_r111h_r112l, 1, 1, ts0, tex_cube_array, 9, i32, 0
 5d6: b1809280004f200c0100 image_write_block r0l, r105l_r105h_r106l_r106h_r107l_r107h_r108l, 1, 1, ts0, tex_cube_array, 9, i32, 0
 5e0: b1808a80004f200c0100 image_write_block r0l, r101l_r101h_r102l_r102h_r103l_r103h_r104l, 1, 1, ts0, tex_cube_array, 9, i32, 0
 5ea: b1808280004f200c0100 image_write_block r0l, r97l_r97h_r98l_r98h_r99l_r99h_r100l, 1, 1, ts0, tex_cube_array, 9, i32, 0
 5f4: b180ba80004f20080100 image_write_block r0l, r93l_r93h_r94l_r94h_r95l_r95h_r96l, 1, 1, ts0, tex_cube_array, 9, i32, 0
 5fe: b180b280004f20080100 image_write_block r0l, r89l_r89h_r90l_r90h_r91l_r91h_r92l, 1, 1, ts0, tex_cube_array, 9, i32, 0
 608: b180aa80004f20080100 image_write_block r0l, r85l_r85h_r86l_r86h_r87l_r87h_r88l, 1, 1, ts0, tex_cube_array, 9, i32, 0
 612: b180a280004f20080100 image_write_block r0l, r81l_r81h_r82l_r82h_r83l_r83h_r84l, 1, 1, ts0, tex_cube_array, 9, i32, 0
 61c: b1809a80004f20080100 image_write_block r0l, r77l_r77h_r78l_r78h_r79l_r79h_r80l, 1, 1, ts0, tex_cube_array, 9, i32, 0
 626: b1809280004f20080100 image_write_block r0l, r73l_r73h_r74l_r74h_r75l_r75h_r76l, 1, 1, ts0, tex_cube_array, 9, i32, 0
 630: b1808a80004f20080100 image_write_block r0l, r69l_r69h_r70l_r70h_r71l_r71h_r72l, 1, 1, ts0, tex_cube_array, 9, i32, 0
 63a: b1808280004f20080100 image_write_block r0l, r65l_r65h_r66l_r66h_r67l_r67h_r68l, 1, 1, ts0, tex_cube_array, 9, i32, 0
 644: b180ba80004f20040100 image_write_block r0l, r61l_r61h_r62l_r62h_r63l_r63h_r64l, 1, 1, ts0, tex_cube_array, 9, i32, 0
 64e: b180b280004f20040100 image_write_block r0l, r57l_r57h_r58l_r58h_r59l_r59h_r60l, 1, 1, ts0, tex_cube_array, 9, i32, 0
 658: b180aa80004f20040100 image_write_block r0l, r53l_r53h_r54l_r54h_r55l_r55h_r56l, 1, 1, ts0, tex_cube_array, 9, i32, 0
 662: b180a280004f20040100 image_write_block r0l, r49l_r49h_r50l_r50h_r51l_r51h_r52l, 1, 1, ts0, tex_cube_array, 9, i32, 0
 66c: b1809a80004f20040100 image_write_block r0l, r45l_r45h_r46l_r46h_r47l_r47h_r48l, 1, 1, ts0, tex_cube_array, 9, i32, 0
 676: b1809280004f20040100 image_write_block r0l, r41l_r41h_r42l_r42h_r43l_r43h_r44l, 1, 1, ts0, tex_cube_array, 9, i32, 0
 680: b1808a80004f20040100 image_write_block r0l, r37l_r37h_r38l_r38h_r39l_r39h_r40l, 1, 1, ts0, tex_cube_array, 9, i32, 0
 68a: b1808280004f20040100 image_write_block r0l, r33l_r33h_r34l_r34h_r35l_r35h_r36l, 1, 1, ts0, tex_cube_array, 9, i32, 0
 694: b180ba80004f20000100 image_write_block r0l, r29l_r29h_r30l_r30h_r31l_r31h_r32l, 1, 1, ts0, tex_cube_array, 9, i32, 0
 69e: b180b280004f20000100 image_write_block r0l, r25l_r25h_r26l_r26h_r27l_r27h_r28l, 1, 1, ts0, tex_cube_array, 9, i32, 0
 6a8: b180aa80004f20000100 image_write_block r0l, r21l_r21h_r22l_r22h_r23l_r23h_r24l, 1, 1, ts0, tex_cube_array, 9, i32, 0
 6b2: b180a280004f20000100 image_write_block r0l, r17l_r17h_r18l_r18h_r19l_r19h_r20l, 1, 1, ts0, tex_cube_array, 9, i32, 0
 6bc: b1809a80004f20000100 image_write_block r0l, r13l_r13h_r14l_r14h_r15l_r15h_r16l, 1, 1, ts0, tex_cube_array, 9, i32, 0
 6c6: b1809280004f20000100 image_write_block r0l, r9l_r9h_r10l_r10h_r11l_r11h_r12l, 1, 1, ts0, tex_cube_array, 9, i32, 0
 6d0: b1808a80004f20000100 image_write_block r0l, r5l_r5h_r6l_r6h_r7l_r7h_r8l, 1, 1, ts0, tex_cube_array, 9, i32, 0
 6da: 0e58224008300000     iadd             r118l, 34, r2l.cache
 6e2: 0e48234008300000     iadd             r114l, 35, r2l.cache
 6ea: 0e38244008300000     iadd             r110l, 36, r2l.cache
 6f2: 0e28254008300000     iadd             r106l, 37, r2l.cache
 6fa: 0e18264008300000     iadd             r102l, 38, r2l.cache
 702: 0e08274008300000     iadd             r98l, 39, r2l.cache
 70a: 0e78284008200000     iadd             r94l, 40, r2l.cache
 712: 0e68294008200000     iadd             r90l, 41, r2l.cache
 71a: 0e582a4008200000     iadd             r86l, 42, r2l.cache
 722: 0e482b4008200000     iadd             r82l, 43, r2l.cache
 72a: 0e382c4008200000     iadd             r78l, 44, r2l.cache
 732: 0e282d4008200000     iadd             r74l, 45, r2l.cache
 73a: 0e182e4008200000     iadd             r70l, 46, r2l.cache
 742: 0e082f4008200000     iadd             r66l, 47, r2l.cache
 74a: 0e78304008100000     iadd             r62l, 48, r2l.cache
 752: 0e68314008100000     iadd             r58l, 49, r2l.cache
 75a: 0e58324008100000     iadd             r54l, 50, r2l.cache
 762: 0e48334008100000     iadd             r50l, 51, r2l.cache
 76a: 0e38344008100000     iadd             r46l, 52, r2l.cache
 772: 0e28354008100000     iadd             r42l, 53, r2l.cache
 77a: 0e18364008100000     iadd             r38l, 54, r2l.cache
 782: 0e08374008100000     iadd             r34l, 55, r2l.cache
 78a: 0e78384008000000     iadd             r30l, 56, r2l.cache
 792: 0e68394008000000     iadd             r26l, 57, r2l.cache
 79a: 0e583a4008000000     iadd             r22l, 58, r2l.cache
 7a2: 0e483b4008000000     iadd             r18l, 59, r2l.cache
 7aa: 0e383c4008000000     iadd             r14l, 60, r2l.cache
 7b2: 0e283d4008000000     iadd             r10l, 61, r2l.cache
 7ba: 0e183e4008000000     iadd             r6l, 62, r2l.cache
 7c2: 0e083f400c000000     iadd             r2l, 63, r2l.discard
 7ca: b180aa80004f200c0100 image_write_block r0l, r117l_r117h_r118l_r118h_r119l_r119h_r120l, 1, 1, ts0, tex_cube_array, 9, i32, 0
 7d4: b180a280004f200c0100 image_write_block r0l, r113l_r113h_r114l_r114h_r115l_r115h_r116l, 1, 1, ts0, tex_cube_array, 9, i32, 0
 7de: b1809a80004f200c0100 image_write_block r0l, r109l_r109h_r110l_r110h_r111l_r111h_r112l, 1, 1, ts0, tex_cube_array, 9, i32, 0
 7e8: b1809280004f200c0100 image_write_block r0l, r105l_r105h_r106l_r106h_r107l_r107h_r108l, 1, 1, ts0, tex_cube_array, 9, i32, 0
 7f2: b1808a80004f200c0100 image_write_block r0l, r101l_r101h_r102l_r102h_r103l_r103h_r104l, 1, 1, ts0, tex_cube_array, 9, i32, 0
 7fc: b1808280004f200c0100 image_write_block r0l, r97l_r97h_r98l_r98h_r99l_r99h_r100l, 1, 1, ts0, tex_cube_array, 9, i32, 0
 806: b180ba80004f20080100 image_write_block r0l, r93l_r93h_r94l_r94h_r95l_r95h_r96l, 1, 1, ts0, tex_cube_array, 9, i32, 0
 810: b180b280004f20080100 image_write_block r0l, r89l_r89h_r90l_r90h_r91l_r91h_r92l, 1, 1, ts0, tex_cube_array, 9, i32, 0
 81a: b180aa80004f20080100 image_write_block r0l, r85l_r85h_r86l_r86h_r87l_r87h_r88l, 1, 1, ts0, tex_cube_array, 9, i32, 0
 824: b180a280004f20080100 image_write_block r0l, r81l_r81h_r82l_r82h_r83l_r83h_r84l, 1, 1, ts0, tex_cube_array, 9, i32, 0
 82e: b1809a80004f20080100 image_write_block r0l, r77l_r77h_r78l_r78h_r79l_r79h_r80l, 1, 1, ts0, tex_cube_array, 9, i32, 0
 838: b1809280004f20080100 image_write_block r0l, r73l_r73h_r74l_r74h_r75l_r75h_r76l, 1, 1, ts0, tex_cube_array, 9, i32, 0
 842: b1808a80004f20080100 image_write_block r0l, r69l_r69h_r70l_r70h_r71l_r71h_r72l, 1, 1, ts0, tex_cube_array, 9, i32, 0
 84c: b1808280004f20080100 image_write_block r0l, r65l_r65h_r66l_r66h_r67l_r67h_r68l, 1, 1, ts0, tex_cube_array, 9, i32, 0
 856: b180ba80004f20040100 image_write_block r0l, r61l_r61h_r62l_r62h_r63l_r63h_r64l, 1, 1, ts0, tex_cube_array, 9, i32, 0
 860: b180b280004f20040100 image_write_block r0l, r57l_r57h_r58l_r58h_r59l_r59h_r60l, 1, 1, ts0, tex_cube_array, 9, i32, 0
 86a: b180aa80004f20040100 image_write_block r0l, r53l_r53h_r54l_r54h_r55l_r55h_r56l, 1, 1, ts0, tex_cube_array, 9, i32, 0
 874: b180a280004f20040100 image_write_block r0l, r49l_r49h_r50l_r50h_r51l_r51h_r52l, 1, 1, ts0, tex_cube_array, 9, i32, 0
 87e: b1809a80004f20040100 image_write_block r0l, r45l_r45h_r46l_r46h_r47l_r47h_r48l, 1, 1, ts0, tex_cube_array, 9, i32, 0
 888: b1809280004f20040100 image_write_block r0l, r41l_r41h_r42l_r42h_r43l_r43h_r44l, 1, 1, ts0, tex_cube_array, 9, i32, 0
 892: b1808a80004f20040100 image_write_block r0l, r37l_r37h_r38l_r38h_r39l_r39h_r40l, 1, 1, ts0, tex_cube_array, 9, i32, 0
 89c: b1808280004f20040100 image_write_block r0l, r33l_r33h_r34l_r34h_r35l_r35h_r36l, 1, 1, ts0, tex_cube_array, 9, i32, 0
 8a6: b180ba80004f20000100 image_write_block r0l, r29l_r29h_r30l_r30h_r31l_r31h_r32l, 1, 1, ts0, tex_cube_array, 9, i32, 0
 8b0: b180b280004f20000100 image_write_block r0l, r25l_r25h_r26l_r26h_r27l_r27h_r28l, 1, 1, ts0, tex_cube_array, 9, i32, 0
 8ba: b180aa80004f20000100 image_write_block r0l, r21l_r21h_r22l_r22h_r23l_r23h_r24l, 1, 1, ts0, tex_cube_array, 9, i32, 0
 8c4: b180a280004f20000100 image_write_block r0l, r17l_r17h_r18l_r18h_r19l_r19h_r20l, 1, 1, ts0, tex_cube_array, 9, i32, 0
 8ce: b1809a80004f20000100 image_write_block r0l, r13l_r13h_r14l_r14h_r15l_r15h_r16l, 1, 1, ts0, tex_cube_array, 9, i32, 0
 8d8: b1809280004f20000100 image_write_block r0l, r9l_r9h_r10l_r10h_r11l_r11h_r12l, 1, 1, ts0, tex_cube_array, 9, i32, 0
 8e2: b1808a80004f20000100 image_write_block r0l, r5l_r5h_r6l_r6h_r7l_r7h_r8l, 1, 1, ts0, tex_cube_array, 9, i32, 0
 8ec: b1808280004f20000100 image_write_block r0l, r1l_r1h_r2l_r2h_r3l_r3h_r4l, 1, 1, ts0, tex_cube_array, 9, i32, 0
 8f6: 520e00000000         pop_exec         r0l, 1
 8fc: 8800                 stop             

Want to make sure these match up with what you're seeing in end-of-tile programs?

@TellowKrinkle
Copy link
Contributor

TellowKrinkle commented May 19, 2023

Metal's headers pass __METAL_TEXTURE_WRITE_ROUNDING_MODE__ to their internal imageblock write functions, and indeed -ftexture-write-rounding-mode=rte flips bit 53. 0 => rte, 1 => rtz. Hijacking their builtin to pass values other than __METAL_TEXTURE_WRITE_ROUNDING_MODE__ results in rtz for all non-1 inputs (yes, the parameter passed to the builtin is the opposite of the one in the instruction).

They also have a lod input (not sure what it's for on a non-ms texture2d Edit I'm dumb, ms != mipmap), which seems to correspond a register defined by bits 24:29, 60:61 with bit 31 flipping lod between a 16-bit register (off) and an immediate (on). Weirdly immediates 256-511 overflow into bit 30, but that might just be them not expecting such large values, as 512 overflows to an instruction identical to 0.

Fun with the lod parameter
using namespace metal;

struct Test {
	float4 a;
};

kernel void test(metal::imageblock<Test> f, texture2d<float, access::write> tex, ushort2 lid [[thread_position_in_threadgroup]], ushort2 gid [[thread_position_in_grid]], constant uint* fun) {
	threadgroup_imageblock Test* t = f.data(lid);
	t->a = float4(0, 1, 2, 3);
	threadgroup_barrier(mem_flags::mem_threadgroup_imageblock);
	if (all(lid == 0)) {
		for (int i = 0; i < 64; i++) {
			imageblock_slice<float4, imageblock_layout_explicit> slice = f.slice(t->a);
			tex.write(slice, gid, i * 16);
		}
	}
}

(I didn't wire up bit 30, so enjoy some slightly broken decompilation)

compute shader:
   0: f20a3100             get_sr           r2h.cache, sr49 (thread_position_in_threadgroup.y)
   4: f2083000             get_sr           r2l.cache, sr48 (thread_position_in_threadgroup.x)
   8: 7e0445088000         mov              r1l, r2h
   e: 7e0644088000         mov              r1h, r2l
  14: 621500000000         mov_imm          r5, 0
  1a: 62190000803f         mov_imm          r6, 1065353216
  20: 621d00000040         mov_imm          r7, 1073741824
  26: 622100004040         mov_imm          r8, 1077936128
  2c: 09150402fc048000     st_tile          r5_r6_r7_r8, i32, 1, xyzw, 64, 1, 4
  34: 6800                 threadgroup_barrier 
  36: e2000000             mov_imm          r0l.cache, 0
  3a: 5288c2020000         if_icmp          r0l, seq, r1.discard, 0, 1
  40: 62060000             mov_imm          r1h, 0
  44: 620a0000             mov_imm          r2h, 0
  48: 72041004             get_sr           r1l, sr80 (thread_position_in_grid.x)
  4c: 72081104             get_sr           r2l, sr81 (thread_position_in_grid.y)
  50: 20c086020000         jmp_exec_none    0x2D6
  56: b1808280004a20000100 image_write_block r0l, r1_r2, r0l, 1, ts0, tex_2d, rtz, 9, i32, 0
  60: b1808290004a20000100 image_write_block r0l, r1_r2, r8l, 1, ts0, tex_2d, rtz, 9, i32, 0
  6a: b18082a0004a20000100 image_write_block r0l, r1_r2, r16l, 1, ts0, tex_2d, rtz, 9, i32, 0
  74: b18082b0004a20000100 image_write_block r0l, r1_r2, r24l, 1, ts0, tex_2d, rtz, 9, i32, 0
  7e: b1808280004a20100100 image_write_block r0l, r1_r2, r32l, 1, ts0, tex_2d, rtz, 9, i32, 0
  88: b1808290004a20100100 image_write_block r0l, r1_r2, r40l, 1, ts0, tex_2d, rtz, 9, i32, 0
  92: b18082a0004a20100100 image_write_block r0l, r1_r2, r48l, 1, ts0, tex_2d, rtz, 9, i32, 0
  9c: b18082b0004a20100100 image_write_block r0l, r1_r2, r56l, 1, ts0, tex_2d, rtz, 9, i32, 0
  a6: b1808280004a20200100 image_write_block r0l, r1_r2, r64l, 1, ts0, tex_2d, rtz, 9, i32, 0
  b0: b1808290004a20200100 image_write_block r0l, r1_r2, r72l, 1, ts0, tex_2d, rtz, 9, i32, 0
  ba: b18082a0004a20200100 image_write_block r0l, r1_r2, r80l, 1, ts0, tex_2d, rtz, 9, i32, 0
  c4: b18082b0004a20200100 image_write_block r0l, r1_r2, r88l, 1, ts0, tex_2d, rtz, 9, i32, 0
  ce: b1808280004a20300100 image_write_block r0l, r1_r2, r96l, 1, ts0, tex_2d, rtz, 9, i32, 0
  d8: b1808290004a20300100 image_write_block r0l, r1_r2, r104l, 1, ts0, tex_2d, rtz, 9, i32, 0
  e2: b18082a0004a20300100 image_write_block r0l, r1_r2, r112l, 1, ts0, tex_2d, rtz, 9, i32, 0
  ec: b18082b0004a20300100 image_write_block r0l, r1_r2, r120l, 1, ts0, tex_2d, rtz, 9, i32, 0
  f6: b18082c0004a20000100 image_write_block r0l, r1_r2, r0l, 1, ts0, tex_2d, rtz, 9, i32, 0
 100: b18082d0004a20000100 image_write_block r0l, r1_r2, r8l, 1, ts0, tex_2d, rtz, 9, i32, 0
 10a: b18082e0004a20000100 image_write_block r0l, r1_r2, r16l, 1, ts0, tex_2d, rtz, 9, i32, 0
 114: b18082f0004a20000100 image_write_block r0l, r1_r2, r24l, 1, ts0, tex_2d, rtz, 9, i32, 0
 11e: b18082c0004a20100100 image_write_block r0l, r1_r2, r32l, 1, ts0, tex_2d, rtz, 9, i32, 0
 128: b18082d0004a20100100 image_write_block r0l, r1_r2, r40l, 1, ts0, tex_2d, rtz, 9, i32, 0
 132: b18082e0004a20100100 image_write_block r0l, r1_r2, r48l, 1, ts0, tex_2d, rtz, 9, i32, 0
 13c: b18082f0004a20100100 image_write_block r0l, r1_r2, r56l, 1, ts0, tex_2d, rtz, 9, i32, 0
 146: b18082c0004a20200100 image_write_block r0l, r1_r2, r64l, 1, ts0, tex_2d, rtz, 9, i32, 0
 150: b18082d0004a20200100 image_write_block r0l, r1_r2, r72l, 1, ts0, tex_2d, rtz, 9, i32, 0
 15a: b18082e0004a20200100 image_write_block r0l, r1_r2, r80l, 1, ts0, tex_2d, rtz, 9, i32, 0
 164: b18082f0004a20200100 image_write_block r0l, r1_r2, r88l, 1, ts0, tex_2d, rtz, 9, i32, 0
 16e: b18082c0004a20300100 image_write_block r0l, r1_r2, r96l, 1, ts0, tex_2d, rtz, 9, i32, 0
 178: b18082d0004a20300100 image_write_block r0l, r1_r2, r104l, 1, ts0, tex_2d, rtz, 9, i32, 0
 182: b18082e0004a20300100 image_write_block r0l, r1_r2, r112l, 1, ts0, tex_2d, rtz, 9, i32, 0
 18c: b18082f0004a20300100 image_write_block r0l, r1_r2, r120l, 1, ts0, tex_2d, rtz, 9, i32, 0
 196: b1808280004a20000100 image_write_block r0l, r1_r2, r0l, 1, ts0, tex_2d, rtz, 9, i32, 0
 1a0: b1808290004a20000100 image_write_block r0l, r1_r2, r8l, 1, ts0, tex_2d, rtz, 9, i32, 0
 1aa: b18082a0004a20000100 image_write_block r0l, r1_r2, r16l, 1, ts0, tex_2d, rtz, 9, i32, 0
 1b4: b18082b0004a20000100 image_write_block r0l, r1_r2, r24l, 1, ts0, tex_2d, rtz, 9, i32, 0
 1be: b1808280004a20100100 image_write_block r0l, r1_r2, r32l, 1, ts0, tex_2d, rtz, 9, i32, 0
 1c8: b1808290004a20100100 image_write_block r0l, r1_r2, r40l, 1, ts0, tex_2d, rtz, 9, i32, 0
 1d2: b18082a0004a20100100 image_write_block r0l, r1_r2, r48l, 1, ts0, tex_2d, rtz, 9, i32, 0
 1dc: b18082b0004a20100100 image_write_block r0l, r1_r2, r56l, 1, ts0, tex_2d, rtz, 9, i32, 0
 1e6: b1808280004a20200100 image_write_block r0l, r1_r2, r64l, 1, ts0, tex_2d, rtz, 9, i32, 0
 1f0: b1808290004a20200100 image_write_block r0l, r1_r2, r72l, 1, ts0, tex_2d, rtz, 9, i32, 0
 1fa: b18082a0004a20200100 image_write_block r0l, r1_r2, r80l, 1, ts0, tex_2d, rtz, 9, i32, 0
 204: b18082b0004a20200100 image_write_block r0l, r1_r2, r88l, 1, ts0, tex_2d, rtz, 9, i32, 0
 20e: b1808280004a20300100 image_write_block r0l, r1_r2, r96l, 1, ts0, tex_2d, rtz, 9, i32, 0
 218: b1808290004a20300100 image_write_block r0l, r1_r2, r104l, 1, ts0, tex_2d, rtz, 9, i32, 0
 222: b18082a0004a20300100 image_write_block r0l, r1_r2, r112l, 1, ts0, tex_2d, rtz, 9, i32, 0
 22c: b18082b0004a20300100 image_write_block r0l, r1_r2, r120l, 1, ts0, tex_2d, rtz, 9, i32, 0
 236: b18082c0004a20000100 image_write_block r0l, r1_r2, r0l, 1, ts0, tex_2d, rtz, 9, i32, 0
 240: b18082d0004a20000100 image_write_block r0l, r1_r2, r8l, 1, ts0, tex_2d, rtz, 9, i32, 0
 24a: b18082e0004a20000100 image_write_block r0l, r1_r2, r16l, 1, ts0, tex_2d, rtz, 9, i32, 0
 254: b18082f0004a20000100 image_write_block r0l, r1_r2, r24l, 1, ts0, tex_2d, rtz, 9, i32, 0
 25e: b18082c0004a20100100 image_write_block r0l, r1_r2, r32l, 1, ts0, tex_2d, rtz, 9, i32, 0
 268: b18082d0004a20100100 image_write_block r0l, r1_r2, r40l, 1, ts0, tex_2d, rtz, 9, i32, 0
 272: b18082e0004a20100100 image_write_block r0l, r1_r2, r48l, 1, ts0, tex_2d, rtz, 9, i32, 0
 27c: b18082f0004a20100100 image_write_block r0l, r1_r2, r56l, 1, ts0, tex_2d, rtz, 9, i32, 0
 286: b18082c0004a20200100 image_write_block r0l, r1_r2, r64l, 1, ts0, tex_2d, rtz, 9, i32, 0
 290: b18082d0004a20200100 image_write_block r0l, r1_r2, r72l, 1, ts0, tex_2d, rtz, 9, i32, 0
 29a: b18082e0004a20200100 image_write_block r0l, r1_r2, r80l, 1, ts0, tex_2d, rtz, 9, i32, 0
 2a4: b18082f0004a20200100 image_write_block r0l, r1_r2, r88l, 1, ts0, tex_2d, rtz, 9, i32, 0
 2ae: b18082c0004a20300100 image_write_block r0l, r1_r2, r96l, 1, ts0, tex_2d, rtz, 9, i32, 0
 2b8: b18082d0004a20300100 image_write_block r0l, r1_r2, r104l, 1, ts0, tex_2d, rtz, 9, i32, 0
 2c2: b18082e0004a20300100 image_write_block r0l, r1_r2, r112l, 1, ts0, tex_2d, rtz, 9, i32, 0
 2cc: b18082f0004a20300100 image_write_block r0l, r1_r2, r120l, 1, ts0, tex_2d, rtz, 9, i32, 0
 2d6: 520e00000000         pop_exec         r0l, 1
 2dc: 8800                 stop             

@alyssarosenzweig
Copy link
Contributor Author

They also have a lod input (not sure what it's for on a non-ms texture2d Edit I'm dumb, ms != mipmap), which seems to correspond a register defined by bits 24:29, 60:61 with bit 31 flipping lod between a 16-bit register (off) and an immediate (on).

This matches regular image_write https://patch-diff.githubusercontent.com/raw/dougallj/applegpu/pull/26.patch ... they're very closely related instructions and execute on the same hw block so it makes sense.

@alyssarosenzweig
Copy link
Contributor Author

Metal's headers pass METAL_TEXTURE_WRITE_ROUNDING_MODE to their internal imageblock write functions, and indeed -ftexture-write-rounding-mode=rte flips bit 53. 0 => rte, 1 => rtz. Hijacking their builtin to pass values other than METAL_TEXTURE_WRITE_ROUNDING_MODE results in rtz for all non-1 inputs (yes, the parameter passed to the builtin is the opposite of the one in the instruction).

Also consistent with regular image write

This instruction ("TODO.unkB1") is used to write out an entire block
from local memory into an image. Because it is block based and not pixel
based, in comparison to the regular image write instruction it works
even if the destination image is compressed. It is tailor fit for use in
the end-of-tile program, to blit tile memory to the framebuffer.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
@TellowKrinkle
Copy link
Contributor

BTW you might want to change the class name from UnkB1InstructionDesc to something more known-sounding

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants