

# Introduction to the XC30

For Users of Cray's XT5 and XK7

Aaron Vose



## XT5/XK7 to XC30 Changes: Overview

- Hardware changes:
  - CPU:
    - Move from AMD Istanbul/Interlagos to Intel SNB/IVB
    - AMD's CMT versus Intel's HyperThreading
  - Interconnect:
    - XT5's SeaStar2+ or XE/XK7's Gemini -> XC30's Aries
- Software and environment: very similar, with some changes



## XT5/XK7 AMD CPU Overview

- AMD Istanbul



- AMD Interlagos



Figure 1: Bulldozer 2-Core Processor Module Architecture

(images: amd.com)



## XC30 Intel CPU Overview

- Intel Sandy Bridge



(image: intel.com)



## AMD and Intel CPU Quick Comparison

|              | AMD Istanbul    | AMD Interlagos    | Intel Sandy Bridge   |
|--------------|-----------------|-------------------|----------------------|
| System       | Kraken (XT5)    | Titan (XK7)       | Eos (XC30)           |
| Socket/Node  | 2               | 1                 | 2                    |
| FPU/Socket   | 6               | 8                 | 8                    |
| INT/Socket   | 6               | 16                | 8                    |
| Threads/Core | 1               | 1                 | 1 or 2               |
| DP FP/FPU    | 4 /clock        | 8 /clock          | 8 /clock             |
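
As a rough worked example, the rows above can be multiplied out to compare CPU-only double-precision throughput per node per clock cycle. This is only a sketch of the arithmetic: the helper name `flops_per_clock_per_node` is made up for illustration, and clock rates (which the table does not list) are deliberately left out.

```shell
# DP FLOPs per clock per node = sockets/node * FPUs/socket * DP FLOPs per FPU per clock.
# All inputs are copied from the comparison table; CPU only, no clock rates.
flops_per_clock_per_node() {
  echo $(( $1 * $2 * $3 ))
}

echo "Kraken (XT5,  Istanbul):     $(flops_per_clock_per_node 2 6 4)"
echo "Titan  (XK7,  Interlagos):   $(flops_per_clock_per_node 1 8 8)"
echo "Eos    (XC30, Sandy Bridge): $(flops_per_clock_per_node 2 8 8)"
```

By this measure, an Eos node delivers 128 DP FLOPs per clock against 48 for Kraken and 64 for Titan's CPU.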



## Intel's HT versus AMD's CMT

- AMD CPU module contains 2 integer cores sharing one FPU:
  - Shared FPU (default): run 16 threads/CPU
  - Dedicated FPU per thread: run 8 threads/CPU
- Intel CPU can run 2 threads per core with "HyperThreading":
  - `aprun -j 2` -> Two ranks per core ("DualStream")
  - `aprun -j 1` -> One rank per core ("SingleStream"; the default)
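
The sketch below shows how the `-j` setting might change the rank counts on a launch line. It assumes 16 physical cores per XC30 node (2 sockets x 8 cores, per the comparison table) and that every hardware thread gets a rank. The helper `aprun_cmd` and the binary `./a.out` are placeholders. It prints the command instead of running it, so it can be tried off the machine.

```shell
# Hypothetical helper: print an aprun line for a node count and a -j value.
# Assumes 16 physical cores per node (2 sockets x 8 cores).
aprun_cmd() {
  nodes=$1
  j=$2
  cores_per_node=16
  ranks_per_node=$(( cores_per_node * j ))   # -N: ranks placed on each node
  total_ranks=$(( nodes * ranks_per_node ))  # -n: total ranks in the job
  echo "aprun -n ${total_ranks} -N ${ranks_per_node} -j ${j} ./a.out"
}

aprun_cmd 4 1   # SingleStream: one rank per core
aprun_cmd 4 2   # DualStream: two ranks per core
```

On 4 nodes this prints a 64-rank launch line for `-j 1` and a 128-rank line for `-j 2`.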



## XC30 Aries Interconnect: Nodes





## XC30 Aries Interconnect: Rank 1 & 2 Networks





## XC30 Aries Interconnect: Rank 3 Network



- Each two-cabinet group is connected to all others optically
- Similar to other levels, packets travel via intermediate hops



## XC30 Aries Interconnect: User Perspective

- Switch from 3D Torus to dragonfly network
- User Highlights:
  - Adaptive routing avoids network hotspots
  - Node placement not as important
  - Greatly increased global bandwidth
- Much more detailed Aries / Cascade presentation:
  - http://www.youtube.com/watch?v=XEdrIpeXQnw
    (by Cray's Nathan Wichmann)



## Software Changes: ACML -> MKL

- AMD Core Math Library (ACML) is gone; replaced with Intel's Math Kernel Library (MKL):
  - Update code:
    - `call vrda_exp(VL,RF(1,1),RF(1,1))` (ACML)
    - `call vdexp(VL,RF(1,1),RF(1,1))` (MKL)
  - CCE / GNU / PGI: `module load intel` and link with:
    - `-L$(MKLROOT)/lib/intel64/ -lmkl_intel_lp64 -lmkl_sequential -lmkl_core`
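
A minimal sketch of assembling that link line in a shell script follows. The `$(MKLROOT)` form above is Makefile syntax; in a shell the same variable is written `${MKLROOT}`. The fallback path, `app`, and `app.o` are placeholders, and the script only prints the command instead of invoking the compiler wrapper.

```shell
# MKLROOT is normally exported by `module load intel`; the fallback below is a
# hypothetical placeholder so the sketch can run anywhere.
MKLROOT="${MKLROOT:-/path/to/mkl}"

# Same libraries as listed above: LP64 interface, sequential layer, core.
MKL_LIBS="-L${MKLROOT}/lib/intel64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core"

# Dry run: print the link line rather than executing it.
echo "ftn -o app app.o ${MKL_LIBS}"
```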



## Software Changes: Intel Compiler

- Intel Compiler: `module load PrgEnv-intel`
  - `ftn`/`cc`/`CC` commands wrap `ifort`/`icc`/`icpc`
  - `man ifort` / `man icc` / `man icpc`
- Useful Flags:
  - `-openmp` (Enables OpenMP)
  - `-xAVX` (Enables AVX)
  - `-mkl` (Enables MKL)
  - `-vec-report1` (Reports successfully vectorized loops)
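
Put together, a build with the Intel environment might look like the sketch below. The source file `app.f90` and output name `app` are placeholders, and the commands are printed rather than executed so the snippet runs without the Cray environment.

```shell
# Flags taken from the list above: OpenMP, AVX code generation, and MKL.
INTEL_FLAGS="-openmp -xAVX -mkl"

# Dry run: show the commands instead of running them.
echo "module load PrgEnv-intel"
echo "ftn ${INTEL_FLAGS} -o app app.f90"
```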



## XT5/XK7 to XC30 Changes: Summary

- Hardware:
  - Move to Intel Sandy Bridge / Ivy Bridge
  - Intel HyperThreading
  - Aries Interconnect -- Dragonfly Topology
- Software and environment:
  - Intel MKL
  - Intel Compiler