Add native code reflection. #96

Merged: maleadt merged 7 commits into main from tb/device_code_agx on Apr 12, 2023.

maleadt (Member) commented Feb 17, 2023

This uses the disassembler from https://github.com/dougallj/applegpu/ to show the native AGX code of a function:

julia> Metal.code_agx(identity, Tuple{Nothing})
   0: 8800                 stop

Or for something more realistic:

julia> include("examples/vadd.jl")

julia> @device_code_agx @metal threads=len vadd(d_a, d_b, d_c)
; CompilerJob of kernel #vadd(MtlDeviceMatrix{Float32, 1}, MtlDeviceMatrix{Float32, 1}, MtlDeviceMatrix{Float32, 1}) for GPUCompiler.MetalCompilerTarget

   0: 0529100d00c4f200     device_load      0, i32, xyzw, r5_r6_r7_r8, u0_u1, 1, signed, lsl 1
   8: 3800                 wait             0
   a: 9e0b8ea228000000     imadd            r2_r3.cache, r7.cache, r5.cache, 0
  12: 9e0dcec22cc60200     imadd            r3.cache, r7.discard, r6.discard, r3.discard
  1a: f2051004             get_sr           r1.cache, sr80 (thread_position_in_grid.x)
  1e: 9e15d0a22cc60200     imadd            r5.cache, r8.discard, r5.discard, r3.discard
  26: 8e0501202c000000     iadd             r1.cache, 1, r1.discard
  2e: 920c840200010150     icmpsel          ugt, r3l.cache, r2.cache, 0, 1, 0
  36: 92028a02000101d0     icmpsel          sgt, r0h.cache, r5.cache, 0, 1, 0
  3e: 92028a0200c6108c     icmpsel          seq, r0h.cache, r5.cache, 0, r3l.discard, r0h.discard
  46: 9210820200010190     icmpsel          seq, r4l.cache, r1.cache, 0, 1, 0
  4e: 920d81000000418c     icmpsel          seq, r3.cache, r0h.cache, 0, 0, r2.discard
  56: 9209c1000000a18c     icmpsel          seq, r2.cache, r0h.discard, 0, 0, r5.discard
  5e: 920cc62224010130     icmpsel          ult, r3l.cache, r3.discard, r1, 1, 0
  66: 9202840200010130     icmpsel          ult, r0h.cache, r2.cache, 0, 1, 0
  6e: 9202c40200c6108c     icmpsel          seq, r0h.cache, r2.discard, 0, r3l.discard, r0h.discard
  76: 9202c10000c81090     icmpsel          seq, r0h.cache, r0h.discard, 0, r4l.discard, 1
  7e: e2000000             mov_imm          r0l.cache, 0
  82: 5288c1000000         if_icmp          r0l, seq, r0h.discard, 0, 1
  88: 20c0e6000000         jmp_exec_none    0x16E
  8e: 0529140d00c4f200     device_load      0, i32, xyzw, r5_r6_r7_r8, u2_u3, 1, signed, lsl 1
  96: 3800                 wait             0
  98: 9e0b8ea228000000     imadd            r2_r3.cache, r7.cache, r5.cache, 0
  a0: 9e0dcec22cc60200     imadd            r3.cache, r7.discard, r6.discard, r3.discard
  a8: 9e15d0a22cc60200     imadd            r5.cache, r8.discard, r5.discard, r3.discard
  b0: 920c840200010150     icmpsel          ugt, r3l.cache, r2.cache, 0, 1, 0
  b8: 92028a02000101d0     icmpsel          sgt, r0h.cache, r5.cache, 0, 1, 0
  c0: 92028a0200c6108c     icmpsel          seq, r0h.cache, r5.cache, 0, r3l.discard, r0h.discard
  c8: 920d81000000418c     icmpsel          seq, r3.cache, r0h.cache, 0, 0, r2.discard
  d0: 9209c1000000a18c     icmpsel          seq, r2.cache, r0h.discard, 0, 0, r5.discard
  d8: 920cc62224010130     icmpsel          ult, r3l.cache, r3.discard, r1, 1, 0
  e0: 9202840200010130     icmpsel          ult, r0h.cache, r2.cache, 0, 1, 0
  e8: 9202c40200c6108c     icmpsel          seq, r0h.cache, r2.discard, 0, r3l.discard, r0h.discard
  f0: 9202c10000001190     icmpsel          seq, r0h.cache, r0h.discard, 0, 0, 1
  f8: 5288c1000000         if_icmp          r0l, seq, r0h.discard, 0, 1
  fe: 20c070000000         jmp_exec_none    0x16E
 104: 0529180d00c4f200     device_load      0, i32, xyzw, r5_r6_r7_r8, u4_u5, 1, signed, lsl 1
 10c: 3800                 wait             0
 10e: 9e0b8ea228000000     imadd            r2_r3.cache, r7.cache, r5.cache, 0
 116: 9e0dcec22cc60200     imadd            r3.cache, r7.discard, r6.discard, r3.discard
 11e: 9e15d0a22cc60200     imadd            r5.cache, r8.discard, r5.discard, r3.discard
 126: 920c840200010150     icmpsel          ugt, r3l.cache, r2.cache, 0, 1, 0
 12e: 92028a02000101d0     icmpsel          sgt, r0h.cache, r5.cache, 0, 1, 0
 136: 92028a0200c6108c     icmpsel          seq, r0h.cache, r5.cache, 0, r3l.discard, r0h.discard
 13e: 920d81000000418c     icmpsel          seq, r3.cache, r0h.cache, 0, 0, r2.discard
 146: 9209c1000000a18c     icmpsel          seq, r2.cache, r0h.discard, 0, 0, r5.discard
 14e: 9204c6222c010130     icmpsel          ult, r1l.cache, r3.discard, r1.discard, 1, 0
 156: 9202840200010130     icmpsel          ult, r0h.cache, r2.cache, 0, 1, 0
 15e: 9202c40200c2108c     icmpsel          seq, r0h.cache, r2.discard, 0, r1l.discard, r0h.discard
 166: 1202c10000001190     icmpsel          seq, r0h, r0h.discard, 0, 0, 1
 16e: 521600000000         pop_exec         r0l, 2
 174: 721d1004             get_sr           r7, sr80 (thread_position_in_grid.x)
 178: 0529000d00c43200     device_load      0, i32, xy, r5_r6, u0_u1, 0, signed, lsl 1
 180: 0509040d00c43200     device_load      0, i32, xy, r1_r2, u2_u3, 0, signed, lsl 1
 188: 3800                 wait             0
 18a: 0519ea0600c01200     device_load      0, i32, x, r3, r5_r6, r7, unsigned
 192: 0529e20600c01200     device_load      0, i32, x, r5, r1_r2, r7, unsigned
 19a: 0509080d00c43200     device_load      0, i32, xy, r1_r2, u4_u5, 0, signed, lsl 1
 1a2: 3800                 wait             0
 1a4: 2a8dc6a22c00         fadd32           r3, r3.discard, r5.discard
 1aa: 4519e20600c01200     device_store     0, i32, x, r3, r1_r2, r7, unsigned, 0
 1b2: 8800                 stop

WIP, as this depends on several unreleased components.

Closes #95

maleadt (Member, Author) commented Feb 17, 2023

FWIW, with @inbounds (which elides the bounds checks) the generated code is much better:

julia> Metal.code_agx(vadd, Tuple{MtlDeviceVector{Float32, 1}, MtlDeviceVector{Float32, 1}, MtlDeviceVector{Float32, 1}})
Disassembly of constant_program:
   0: 0501000d00c43200     device_load      0, i32, xy, r0_r1, u0_u1, 0, signed, lsl 1
   8: 3800                 wait             0
   a: c500c03d00803000     uniform_store    2, i16, xy, 0, r0l_r0h, 12
  12: c508e03d00803000     uniform_store    2, i16, xy, 0, r1l_r1h, 14
  1a: 0501040d00c43200     device_load      0, i32, xy, r0_r1, u2_u3, 0, signed, lsl 1
  22: 3800                 wait             0
  24: c500003d01803000     uniform_store    2, i16, xy, 0, r0l_r0h, 16
  2c: c508203d01803000     uniform_store    2, i16, xy, 0, r1l_r1h, 18
  34: 0501080d00c43200     device_load      0, i32, xy, r0_r1, u4_u5, 0, signed, lsl 1
  3c: 3800                 wait             0
  3e: c500403d01803000     uniform_store    2, i16, xy, 0, r0l_r0h, 20
  46: c508603d01803000     uniform_store    2, i16, xy, 0, r1l_r1h, 22
  4e: 8800                 stop

Disassembly of main:
   0: 72051004             get_sr           r1, sr80 (thread_position_in_grid.x)
   4: 05012c0e00c01200     device_load      0, i32, x, r0, u6_u7, r1, unsigned
   c: 0511200e10c01200     device_load      0, i32, x, r2, u8_u9, r1, unsigned
  14: 3800                 wait             0
  16: 2a81c0422c00         fadd32           r0, r0.discard, r2.discard
  1c: 4501240e10c01200     device_store     0, i32, x, r0, u10_u11, r1, unsigned, 0
  24: 8800                 stop

maleadt (Member, Author) commented Mar 17, 2023

As part of this, I should probably rename code_metal to code_air.

EDIT: done

maleadt marked this pull request as ready for review on April 12, 2023 08:34.
maleadt merged commit 71d41cd into main on April 12, 2023.
maleadt deleted the tb/device_code_agx branch on April 12, 2023 08:53.
Linked issue closed by this pull request: Native code reflection (#95)