# Reference Material

## Computational Weight

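- Operation count
  - Usually refers to floating-point operations (the four basic arithmetic operations)
  - The processing cost (hardware cycles) of an arithmetic operation depends on its type
  - Even for the same formula, the operation count differs with the choice of hardware (CPU), software (compiler), and options
    - For example, in one option evaluation on FX10...
      - +, -, ×: 1 flop
      - ÷: 8 flops (single precision), 13 flops (double precision)
      - abs(): 1 flop
- In user-declared mode, users can also report an operation count that reflects their own estimate of the computational "weight"
  - A complex operation other than the four basic arithmetic operations is counted as a single unit of computation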
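The weighting idea above can be sketched in a few lines of Python. This is only an illustration: the weights are the ones quoted for the FX10 option evaluation, while the dictionary layout, the function name `weighted_flops`, and the sample kernel are assumptions introduced here, not part of any measurement library.

```python
# Per-operation weights quoted above for one FX10 option evaluation.
# The table layout and all names below are illustrative assumptions.
FX10_WEIGHTS = {
    "single": {"add": 1, "sub": 1, "mul": 1, "div": 8, "abs": 1},
    "double": {"add": 1, "sub": 1, "mul": 1, "div": 13, "abs": 1},
}


def weighted_flops(op_counts, precision="double"):
    """Return the weighted flop total for a dict of {operation: count}."""
    weights = FX10_WEIGHTS[precision]
    return sum(weights[op] * n for op, n in op_counts.items())


# Hypothetical kernel: each iteration performs 2 adds, 1 multiply, 1 divide.
per_iteration = {"add": 2, "mul": 1, "div": 1}
n_iterations = 1000

print(weighted_flops(per_iteration, "double") * n_iterations)  # 16000
print(weighted_flops(per_iteration, "single") * n_iterations)  # 11000
```

The same kernel is reported as 16 or 11 flops per iteration depending on precision, which is why a declared operation count needs to state the weights it assumes.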

#### **HWPC**

- Processor-specific hardware performance counters (HWPC)
- HWPC on the K computer and the PAPI interface
- HWPC on Intel Xeon and the PAPI interface
- PAPI high-level and low-level interfaces

### About Hardware Counters

#### K computer / FX10 preset events

Available events and hardware information.

| Item | Value |
| --- | --- |
| Vendor string and code | Sun (7) |
| Model string and code | Fujitsu SPARC64 IXfx (141) |
| CPU Revision | 0.000000 |
| CPU Megahertz | 1650.000000 |
| CPU Clock Megahertz | 1650 |
| CPU's in this Node | 16 |
| Nodes in this System | 1 |
| Total CPU's | 16 |
| Number Hardware Counters | 8 |
| Max Multiplex Counters | 512 |

| Name | Code | Deriv | Description (Note) |
| --- | --- | --- | --- |
| PAPI\_L1\_DCM | 0x80000000 | No | Level 1 data cache misses |
| PAPI\_L1\_ICM | 0x80000001 | No | Level 1 instruction cache misses |
| PAPI\_L1\_TCM | 0x80000006 | Yes | Level 1 cache misses |
| PAPI\_L2\_TCM | 0x80000007 | Yes | Level 2 cache misses |
| PAPI\_CA\_INV | 0x8000000c | No | Requests for cache line invalidation |
| PAPI\_CA\_ITV | 0x8000000d | No | Requests for cache line intervention |
| PAPI\_TLB\_DM | 0x80000014 | No | Data translation lookaside buffer misses |
| PAPI\_TLB\_IM | 0x80000015 | No | Instruction translation lookaside buffer misses |
| PAPI\_TLB\_TL | 0x80000016 | Yes | Total translation lookaside buffer misses |
| PAPI\_MEM\_SCY | 0x80000022 | No | Cycles Stalled Waiting for memory accesses |
| PAPI\_STL\_ICY | 0x80000025 | No | Cycles with no instruction issue |
| PAPI\_FUL\_ICY | 0x80000026 | No | Cycles with maximum instruction issue |
| PAPI\_STL\_CCY | 0x80000027 | Yes | Cycles with no instructions completed |
| PAPI\_FUL\_CCY | 0x80000028 | Yes | Cycles with maximum instructions completed |
| PAPI\_HW\_INT | 0x80000029 | No | Hardware interrupts |
| PAPI\_BR\_MSP | 0x8000002e | No | Conditional branch instructions mispredicted |
| PAPI\_BR\_PRC | 0x8000002f | Yes | Conditional branch instructions correctly predicted |
| PAPI\_FMA\_INS | 0x80000030 | Yes | FMA instructions completed |
| PAPI\_TOT\_IIS | 0x80000031 | Yes | Instructions issued |
| PAPI\_TOT\_INS | 0x80000032 | No | Instructions completed |
| PAPI\_FP\_INS | 0x80000034 | Yes | Floating point instructions |
| PAPI\_LD\_INS | 0x80000035 | Yes | Load instructions |
| PAPI\_SR\_INS | 0x80000036 | Yes | Store instructions |
| PAPI\_BR\_INS | 0x80000037 | No | Branch instructions |
| PAPI\_VEC\_INS | 0x80000038 | Yes | Vector/SIMD instructions |
| PAPI\_TOT\_CYC | 0x8000003b | No | Total cycles |
| PAPI\_LST\_INS | 0x8000003c | No | Load/store instructions completed |
| PAPI\_L2\_TCH | 0x80000056 | Yes | Level 2 total cache hits |
| PAPI\_L2\_TCA | 0x80000059 | Yes | Level 2 total cache accesses |
| PAPI\_FP\_OPS | 0x80000066 | Yes | Floating point operations |
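Every code in the listing above has its top bit set, which is how PAPI marks preset events. The following Python sketch decodes such codes. It assumes the mask values `0x80000000` (preset) and `0x40000000` (native) as defined in PAPI's `papi.h`; the function name `classify_event` is hypothetical.

```python
# Assumed mask values, taken from PAPI's papi.h conventions.
PRESET_MASK = 0x80000000  # top bit set: preset event
NATIVE_MASK = 0x40000000  # next bit set: native (processor-specific) event


def classify_event(code):
    """Return (kind, index) for a 32-bit PAPI event code."""
    if code & PRESET_MASK:
        return ("preset", code & 0x7FFFFFFF)
    if code & NATIVE_MASK:
        return ("native", code & 0x3FFFFFFF)
    return ("other", code)


print(classify_event(0x80000066))  # ('preset', 102), i.e. PAPI_FP_OPS
print(classify_event(0x80000000))  # ('preset', 0), i.e. PAPI_L1_DCM
```

Because the preset index is the same on every platform, a code such as `0x80000066` refers to `PAPI_FP_OPS` on both SPARC64 and Xeon, even though the native events behind it differ.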

### About Hardware Counters

#### Intel Xeon E5 preset events

| Item | Value |
| --- | --- |
| Hdw Threads per core | 1 |
| Cores per Socket | 8 |
| Sockets | 2 |
| NUMA Nodes | 2 |
| CPUs per Node | 8 |
| Total CPUs | 16 |
| Running in a VM | no |
| Number Hardware Counters | 11 |
| Max Multiplex Counters | 32 |

| Name | Code | Deriv | Description (Note) |
| --- | --- | --- | --- |
| PAPI\_L1\_DCM | 0x80000000 | No | Level 1 data cache misses |
| PAPI\_L1\_ICM | 0x80000001 | No | Level 1 instruction cache misses |
| PAPI\_L2\_DCM | 0x80000002 | Yes | Level 2 data cache misses |
| PAPI\_L2\_ICM | 0x80000003 | No | Level 2 instruction cache misses |
| PAPI\_L1\_TCM | 0x80000006 | Yes | Level 1 cache misses |
| PAPI\_L2\_TCM | 0x80000007 | No | Level 2 cache misses |
| PAPI\_L3\_TCM | 0x80000008 | No | Level 3 cache misses |
| PAPI\_TLB\_DM | 0x80000014 | Yes | Data translation lookaside buffer misses |
| PAPI\_TLB\_IM | 0x80000015 | No | Instruction TLB misses |
| PAPI\_L1\_LDM | 0x80000017 | No | Level 1 load misses |
| PAPI\_L1\_STM | 0x80000018 | No | Level 1 store misses |
| PAPI\_L2\_STM | 0x8000001a | No | Level 2 store misses |
| PAPI\_STL\_ICY | 0x80000025 | No | Cycles with no instruction issue |
| PAPI\_BR\_UCN | 0x8000002a | Yes | Unconditional branch instructions |
| PAPI\_BR\_CN | 0x8000002b | No | Conditional branch instructions |
| PAPI\_BR\_TKN | 0x8000002c | Yes | Conditional branch taken |
| PAPI\_BR\_NTK | 0x8000002d | No | Conditional branch not taken |
| PAPI\_BR\_MSP | 0x8000002e | No | Conditional branch mispredicted |
| PAPI\_BR\_PRC | 0x8000002f | Yes | Conditional branch correctly predicted |
| PAPI\_TOT\_INS | 0x80000032 | No | Instructions completed |
| PAPI\_FP\_INS | 0x80000034 | Yes | Floating point instructions |
| PAPI\_LD\_INS | 0x80000035 | No | Load instructions |
| PAPI\_SR\_INS | 0x80000036 | No | Store instructions |
| PAPI\_BR\_INS | 0x80000037 | No | Branch instructions |
| PAPI\_TOT\_CYC | 0x8000003b | No | Total cycles |
| PAPI\_L2\_DCH | 0x8000003f | Yes | Level 2 data cache hits |
| PAPI\_L2\_DCA | 0x80000041 | No | Level 2 data cache accesses |
| PAPI\_L3\_DCA | 0x80000042 | Yes | Level 3 data cache accesses |
| PAPI\_L2\_DCR | 0x80000044 | No | Level 2 data cache reads |
| PAPI\_L3\_DCR | 0x80000045 | No | Level 3 data cache reads |
| PAPI\_L2\_DCW | 0x80000047 | No | Level 2 data cache writes |
| PAPI\_L3\_DCW | 0x80000048 | No | Level 3 data cache writes |
| PAPI\_L2\_ICH | 0x8000004a | No | Level 2 instruction cache hits |
| PAPI\_L2\_ICA | 0x8000004d | No | Level 2 instruction cache accesses |
| PAPI\_L3\_ICA | 0x8000004e | No | Level 3 instruction cache accesses |
| PAPI\_L2\_ICR | 0x80000050 | No | Level 2 instruction cache reads |
| PAPI\_L3\_ICR | 0x80000051 | No | Level 3 instruction cache reads |
| PAPI\_L2\_TCA | 0x80000059 | Yes | Level 2 total cache accesses |
| PAPI\_L3\_TCA | 0x8000005a | No | Level 3 total cache accesses |
| PAPI\_L2\_TCR | 0x8000005c | Yes | Level 2 total cache reads |
| PAPI\_L3\_TCR | 0x8000005d | Yes | Level 3 total cache reads |
| PAPI\_L2\_TCW | 0x8000005f | No | Level 2 total cache writes |
| PAPI\_L3\_TCW | 0x80000060 | No | Level 3 total cache writes |
| PAPI\_FDV\_INS | 0x80000063 | No | Floating point divide instructions |
| PAPI\_FP\_OPS | 0x80000066 | Yes | Floating point operations |
| PAPI\_SP\_OPS | 0x80000067 | Yes | Floating point operations; optimized to count scaled single precision vector operations |
| PAPI\_DP\_OPS | 0x80000068 | Yes | Floating point operations; optimized to count scaled double precision vector operations |
| PAPI\_VEC\_SP | 0x80000069 | Yes | Single precision vector/SIMD instructions |
| PAPI\_VEC\_DP | 0x8000006a | Yes | Double precision vector/SIMD instructions |
| PAPI\_REF\_CYC | 0x8000006b | No | Reference clock cycles |

## Hardware Counters: Xeon E5 preset and native events

On Intel Xeon E5, PAPI\_FP\_OPS and PAPI\_FP\_INS report the same content.

\$ papi\_avail -e PAPI\_FP\_OPS

Event name: PAPI\_FP\_OPS

Event Code: 0x80000066

Number of Native Events:

Short Description: |FP instructions|

Long Description: |Floating point instructions|

Developer's Notes:

Derived Type: |DERIVED\_ADD|

Postfix Processing String:

Native Code[0]: 0x4000001c |FP\_COMP\_OPS\_EXE:SSE\_SCALAR\_DOUBLE|

Native Event Description: |Counts number of floating point events, masks: Number of SSE double precision FP scalar uops executed|

Native Code[1]: 0x4000001d |FP\_COMP\_OPS\_EXE:SSE\_FP\_SCALAR\_SINGLE|

Native Event Description: |Counts number of floating point events, masks: Number of SSE single precision FP scalar uops executed|

\$ papi\_avail -e PAPI\_DP\_OPS

Event name: PAPI\_DP\_OPS

Event Code: 0x80000068

Number of Native Events:

Short Description: |DP operations|

Long Description:

Native Code[0]: 0x4000001c |FP\_COMP\_OPS\_EXE:SSE\_SCALAR\_DOUBLE|

Native Event Description: |Counts number of floating point events, masks: Number of SSE double precision FP scalar uops executed|

Native Code[1]: 0x40000020 |FP\_COMP\_OPS\_EXE:SSE\_FP\_PACKED\_DOUBLE|

Native Event Description: |Counts number of floating point events, masks: Number of SSE double precision FP packed uops executed|

Native Code[2]: 0x40000021 |SIMD\_FP\_256:PACKED\_DOUBLE|

Native Event Description: |Counts 256-bit packed floating point instructions, masks: Counts 256-bit packed double-precision|

\$ papi\_avail -e PAPI\_VEC\_DP

Event name: PAPI\_VEC\_DP

Event Code: 0x8000006a

Number of Native Events:

Short Description: |DP Vector/SIMD instr|

Long Description: |Double precision vector/SIMD instructions|

Native Code[0]: 0x40000020 |FP\_COMP\_OPS\_EXE:SSE\_FP\_PACKED\_DOUBLE|

Native Code[1]: 0x40000021 |SIMD\_FP\_256:PACKED\_DOUBLE|
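PAPI\_DP\_OPS is described as counting "scaled" double precision vector operations, while PAPI\_VEC\_DP counts instructions. The sketch below shows one plausible reading of that difference. The scale factors (1 for scalar, 2 for 128-bit packed, 4 for 256-bit packed) are an assumption based on how many doubles fit in each register width; they are not shown in the `papi_avail` output above. The counter values are made up for illustration.

```python
# Assumed number of double-precision results per counted uop/instruction.
# These factors are an assumption, not taken from the papi_avail listing.
DP_SCALE = {
    "FP_COMP_OPS_EXE:SSE_SCALAR_DOUBLE": 1,     # scalar: 1 double
    "FP_COMP_OPS_EXE:SSE_FP_PACKED_DOUBLE": 2,  # 128-bit: 2 doubles
    "SIMD_FP_256:PACKED_DOUBLE": 4,             # 256-bit: 4 doubles
}
VECTOR_EVENTS = (
    "FP_COMP_OPS_EXE:SSE_FP_PACKED_DOUBLE",
    "SIMD_FP_256:PACKED_DOUBLE",
)


def scaled_dp_ops(native):
    """Scaled operation count in the spirit of PAPI_DP_OPS."""
    return sum(DP_SCALE[name] * native[name] for name in DP_SCALE)


def vec_dp_instructions(native):
    """Unscaled vector instruction count in the spirit of PAPI_VEC_DP."""
    return sum(native[name] for name in VECTOR_EVENTS)


# Hypothetical native counter readings.
native_counts = {
    "FP_COMP_OPS_EXE:SSE_SCALAR_DOUBLE": 1000,
    "FP_COMP_OPS_EXE:SSE_FP_PACKED_DOUBLE": 200,
    "SIMD_FP_256:PACKED_DOUBLE": 50,
}

print(scaled_dp_ops(native_counts))        # 1600
print(vec_dp_instructions(native_counts))  # 250
```

Under these assumptions, 250 vector instructions account for 600 of the 1600 operations, which is the kind of gap between "instructions" and "operations" the two presets are meant to expose.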

## Hardware Counters: SPARC64 VIIIfx preset and native events

\$ papi\_avail -e PAPI\_FP\_OPS

Event name: PAPI\_FP\_OPS

Number of Native Events:

Short Description: | FP operations |

Long Description: |Floating point operations|

Derived Type: | DERIVED\_POSTFIX|

Native Code[0]: 0x40000010 |FLOATING\_INSTRUCTIONS|

Native Event Description: | Counts the number of committed floating-point operation instructions. |

Native Code[1]: 0x40000011 |FMA\_INSTRUCTIONS|

Native Event Description: | Counts the number of committed floating-point Multiply-and-Add operation instructions. |

Native Code[2]: 0x40000008 |SIMD\_FLOATING\_INSTRUCTIONS|

Native Event Description: | Counts the number of committed floating-point SIMD instructions of one operation in SIMD. |

Native Code[3]: 0x40000009 |SIMD\_FMA\_INSTRUCTIONS|

Native Event Description: | Counts the number of committed floating-point SIMD instructions of two operation in SIMD. |
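The listing above marks PAPI\_FP\_OPS as `DERIVED_POSTFIX` but does not show the postfix string itself. The sketch below illustrates how such a derived value could be evaluated. Everything specific in it is an assumption: the `|`-separated postfix syntax with `N0`, `N1`, ... placeholders follows the style used in PAPI event definitions, and the weights (1, 2, 2, 4 flops per instruction, assuming 2-wide SIMD) are a guess consistent with the native event descriptions, not the actual string used on the K computer.

```python
def eval_postfix(expr, counters):
    """Evaluate a '|'-separated postfix expression over native counters.

    Tokens of the form N<i> read counters[i]; other tokens are either
    integer constants or one of the operators + - * /.
    """
    stack = []
    for tok in filter(None, expr.split("|")):
        if tok in ("+", "-", "*", "/"):
            b = stack.pop()
            a = stack.pop()
            if tok == "+":
                stack.append(a + b)
            elif tok == "-":
                stack.append(a - b)
            elif tok == "*":
                stack.append(a * b)
            else:
                stack.append(a / b)
        elif tok.startswith("N"):
            stack.append(counters[int(tok[1:])])
        else:
            stack.append(int(tok))
    if len(stack) != 1:
        raise ValueError("malformed postfix expression")
    return stack[0]


# Hypothetical postfix string: N0 + 2*N1 + 2*N2 + 4*N3
ASSUMED_FP_OPS = "N0|N1|2|*|+|N2|2|*|+|N3|4|*|+|"

# Hypothetical readings, in Native Code[0..3] order:
# FLOATING, FMA, SIMD_FLOATING, SIMD_FMA instructions.
counters = [100, 50, 30, 20]

print(eval_postfix(ASSUMED_FP_OPS, counters))  # 340
```

With these made-up readings, 200 committed instructions yield 340 floating point operations, because FMA and SIMD instructions each perform more than one operation.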

## K computer processor and node

CPU: SPARC64™ VIIIfx processor



#### Specification

- 8 cores
- 6 MB shared L2 cache
- FMA × 4 (2 SIMD)/core
- 256 (64-bit) DP registers/core
- 2 GHz

#### Peak Performance

- FP Performance 128GFlop/s
- Memory Bandwidth 64GB/s
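The peak figure can be checked against the specification with a short calculation. The sketch assumes that "FMA×4 (2 SIMD)/core" means four fused multiply-add results per core per cycle, and that each FMA counts as two floating point operations; both readings are assumptions, chosen because they reproduce the stated 128 GFlop/s.

```python
cores = 8                   # from the specification
fma_per_cycle_per_core = 4  # assumed reading of "FMA x 4 (2 SIMD)/core"
flops_per_fma = 2           # one multiply plus one add
clock_ghz = 2.0             # from the specification

peak_gflops = cores * fma_per_cycle_per_core * flops_per_fma * clock_ghz
memory_bandwidth_gbs = 64.0  # from the peak performance list

print(peak_gflops)                         # 128.0
print(memory_bandwidth_gbs / peak_gflops)  # 0.5 bytes per flop
```

The resulting ratio of 0.5 bytes per flop is a useful reference when judging whether a measured kernel is limited by arithmetic or by memory traffic.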

#### Power Consumption

- 58 W (LINPACK Max)

Node: SPARC64 VIIIfx × 1, 4 nodes/board



### Intel Xeon E5 processor and node

Node: E5-2670 × 2, QPI, PCIe, ...

