# CS540 Practice Assignment 5

Dustin Ingram, Aaron Rosenfeld, Tom Wambold

December 9, 2011

# 1 Tasks

1. Determine machine parameters (CPU type, CPU speed, CPU details such as pipeline and functional units, memory, cache info).
2. Time and instrument matrix multiplication code.
3. Experiment with variants of matrix multiplication.
4. Install ATLAS and MKL (compare to Numerical Recipes).
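Task 3 does not name the variants to try. As one illustration, a common variant swaps the two inner loops so that the innermost loop walks both B and C with unit stride. The sketch below assumes square, row-major matrices of doubles; the function names `matmul_ijk` and `matmul_ikj` are hypothetical and are not taken from the assignment code.

```c
#include <assert.h>
#include <stddef.h>

/* Baseline i-j-k order: the inner loop strides through B by n doubles. */
void matmul_ijk(int n, const double *a, const double *b, double *c)
{
    assert(n > 0);
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            double sum = 0.0;
            for (int k = 0; k < n; k++)
                sum += a[(size_t)i * n + k] * b[(size_t)k * n + j];
            c[(size_t)i * n + j] = sum;
        }
    }
}

/* Variant i-k-j order: the inner loop reads B and updates C with unit stride. */
void matmul_ikj(int n, const double *a, const double *b, double *c)
{
    assert(n > 0);
    /* C is accumulated into, so it must start at zero. */
    for (size_t idx = 0; idx < (size_t)n * n; idx++)
        c[idx] = 0.0;
    for (int i = 0; i < n; i++) {
        for (int k = 0; k < n; k++) {
            double aik = a[(size_t)i * n + k];
            for (int j = 0; j < n; j++)
                c[(size_t)i * n + j] += aik * b[(size_t)k * n + j];
        }
    }
}
```

Both functions compute the same product, so either can be dropped into the same timing harness. With an older GCC the loop-scoped declarations need `-std=c99`.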

# 2 Results

## 2.1 System & Kernel Information

```
$ uname -a
Linux float.cs.drexel.edu 2.6.35-28-generic #50-Ubuntu SMP Fri Mar 18 18:42:20 UTC 2011 x86_64 GNU/Linux
```

## 2.2 GCC Version Information

```
$ gcc --version
gcc (Ubuntu/Linaro 4.4.4-14ubuntu5) 4.4.5
```

## 2.3 CPU Information

```
$ cat /proc/cpuinfo
processor       :
vendor_id       : GenuineIntel
cpu family      : 6
model           : 44
model name      : Intel(R) Xeon(R) CPU L5630 @ 2.13GHz
stepping        : 2
cpu MHz         : 1600.000
cache size      : 12288 KB
physical id     : 1
siblings        : 8
core id         : 10
cpu cores       : 4
apicid          : 53
initial apicid  : 53
fpu             : yes
fpu_exception   : yes
cpuid level     : 11
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
                  cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
                  pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon
                  pebs bts rep_good xtopology nonstop_tsc aperfmperf pni
                  pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16
                  xtpr pdcm dca sse4_1 sse4_2 popcnt aes lahf_lm ida arat dts
                  tpr_shadow vnmi flexpriority ept vpid
bogomips        : 4266.84
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:
```
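As a rough ceiling against which to read measured MFLOPS figures, the nominal clock rate above can be turned into a peak double-precision rate. The figure of 4 floating-point operations per cycle per core is an assumption about this microarchitecture (one two-wide SSE2 add and one two-wide SSE2 multiply issued per cycle); it does not come from the `/proc/cpuinfo` output.

$$
P_{\text{core}} = 2.13 \times 10^{9}\ \tfrac{\text{cycles}}{\text{s}} \times 4\ \tfrac{\text{flop}}{\text{cycle}} = 8.52 \times 10^{9}\ \tfrac{\text{flop}}{\text{s}} = 8520\ \text{MFLOPS}
$$

$$
P_{\text{socket}} = 4\ \text{cores} \times 8.52\ \text{GFLOPS} = 34.08\ \text{GFLOPS}
$$

The `cpu MHz` field reads 1600.000 rather than the nominal rate, which suggests the core was clocked down when it was sampled; at 1.6 GHz the same assumption gives 6400 MFLOPS per core.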

## 2.4 Memory Information

`$ papi_mem_info`

Memory Cache and TLB Hierarchy Information.

TLB Information. There may be multiple descriptors for each level of TLB if multiple page sizes are supported.

| TLB | Page Size | Number of Entries | Associativity |
|---|---|---|---|
| L1 Instruction TLB | 2048 KB | 7 | Full |
| L1 Instruction TLB | 4096 KB | 7 | Full |
| L1 Data TLB | 4 KB | 64 | 4 |
| L1 Data TLB | 2048 KB | 32 | 4 |
| L1 Data TLB | 4096 KB | 32 | 4 |
| L1 Instruction TLB | 4 KB | 64 | 4 |

Cache Information.

| Cache | Total Size | Line Size | Number of Lines | Associativity |
|---|---|---|---|---|
| L1 Data Cache | 32 KB | 64 B | 512 | 8 |
| L1 Instruction Cache | 32 KB | 64 B | 512 | 4 |
| L2 Unified Cache | 256 KB | 64 B | 4096 | 8 |
| L3 Unified Cache | 12288 KB | 64 B | 196608 | 16 |

`mem_info.c PASSED`
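The cache figures above can be cross-checked, and related to matrix sizes, with a little arithmetic. The sketch below is illustrative only: the helper names `cache_lines` and `max_resident_n` are hypothetical, and the working-set estimate assumes three square matrices of 8-byte doubles (A, B, and C) with nothing else in the cache and no associativity conflicts.

```c
#include <assert.h>
#include <stddef.h>

/* Number of cache lines = total size / line size (both in bytes). */
size_t cache_lines(size_t total_bytes, size_t line_bytes)
{
    assert(line_bytes > 0);
    return total_bytes / line_bytes;
}

/* Largest n for which three n x n matrices of doubles fit in a cache of
 * total_bytes. This is an idealized bound: it ignores associativity
 * conflicts and every other resident of the cache. */
size_t max_resident_n(size_t total_bytes)
{
    size_t n = 0;
    while (3 * (n + 1) * (n + 1) * sizeof(double) <= total_bytes)
        n++;
    return n;
}
```

With the reported sizes, `cache_lines` gives 512, 4096, and 196608 lines for L1, L2, and L3, matching the PAPI output. Under the stated assumptions `max_resident_n` gives n = 36 for L1, n = 104 for L2, and n = 724 for L3, which are plausible places to look for changes of slope in a performance plot.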

## 2.5 Timing of Matrix Multiplication Code

Performance Plot – MFLOPs per Second
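For reference, a rate of this kind can be obtained by timing a multiplication and dividing its operation count by the elapsed time. The following is a minimal sketch, not the code used to produce the plot: it assumes a naive triple loop over row-major matrices, counts 2n³ floating-point operations per multiplication, and uses the standard C `clock()` function, which measures CPU time at coarse resolution. The names `mflops` and `time_matmul` are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* MFLOPS for one n x n multiplication: 2*n^3 operations (one multiply and
 * one add per inner iteration) divided by the time in microseconds. */
double mflops(int n, double seconds)
{
    assert(n > 0 && seconds > 0.0);
    return 2.0 * n * n * n / (seconds * 1e6);
}

/* Time `reps` naive multiplications and return the average MFLOPS rate.
 * Returns -1.0 if allocation fails and 0.0 if the whole run was shorter
 * than the resolution of clock(). */
double time_matmul(int n, int reps)
{
    assert(n > 0 && reps > 0);
    size_t count = (size_t)n * n;
    double *a = malloc(count * sizeof *a);
    double *b = malloc(count * sizeof *b);
    double *c = malloc(count * sizeof *c);
    if (!a || !b || !c) {
        free(a); free(b); free(c);
        return -1.0;
    }

    /* Fill the inputs with arbitrary values in [0, 1]. */
    for (size_t i = 0; i < count; i++) {
        a[i] = (double)rand() / RAND_MAX;
        b[i] = (double)rand() / RAND_MAX;
    }

    clock_t start = clock();
    for (int r = 0; r < reps; r++) {
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                double sum = 0.0;
                for (int k = 0; k < n; k++)
                    sum += a[(size_t)i * n + k] * b[(size_t)k * n + j];
                c[(size_t)i * n + j] = sum;
            }
        }
    }
    clock_t stop = clock();

    /* Read the result so the compiler cannot discard the computation. */
    volatile double sink = c[count - 1];
    (void)sink;

    double seconds = (double)(stop - start) / CLOCKS_PER_SEC;
    free(a); free(b); free(c);
    if (seconds <= 0.0)
        return 0.0;
    return mflops(n, seconds / reps);
}
```

Calling `time_matmul(n, reps)` over a range of n yields one point per matrix size. Small matrices need a larger `reps` so that the run exceeds the timer resolution; a hardware-counter library or a higher-resolution clock would give finer measurements.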



## 2.6 Comparison with ATLAS and MKL



