# ECE571: Advanced Microprocessor Design – Homework 6 Spring 2016

Due: Friday 3 March 2016, 3:30pm

Create a document that contains the data and answers described in the sections below. A .pdf or .txt file is preferred, but I can accept MS Office or LibreOffice format if necessary.

#### 1. Bzip2 prefetch behavior on the x86 Haswell Machine

For this section, log into the Haswell machine just like in previous homeworks.

Run the bzip2 benchmark on the Haswell machine.

(a) Measure bzip2, in a single command, using the following events: L2\_RQSTS:ALL\_DEMAND\_REFERENCES (r53e724:u), which is the total L2 cache accesses; L2\_RQSTS:DEMAND\_DATA\_RD\_MISS (r532124:u), which is the total L2 cache misses; and L2\_RQSTS:ALL\_PF (r53f824:u), which is the total prefetches from the L2 cache.

```
perf stat -e r53e724:u,r532124:u,r53f824:u \
/opt/ece571/401.bzip2/bzip2 -k -f ./input.source
```

Calculate the L2 cache miss rate from the first two results (misses divided by accesses). Also note the total time.
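The miss-rate calculation can be sketched in a few lines of Python. This is only an illustration: the function name `l2_miss_rate` is hypothetical, and the counts in the usage example are placeholders, not measurements from the Haswell machine.

```python
def l2_miss_rate(accesses: int, misses: int) -> float:
    """Return the fraction of L2 accesses that missed (misses / accesses)."""
    if accesses <= 0:
        raise ValueError("accesses must be positive")
    return misses / accesses


if __name__ == "__main__":
    # Placeholder counts for illustration only; substitute the first two
    # counter values reported by perf stat.
    rate = l2_miss_rate(accesses=1_000_000, misses=250_000)
    print(f"L2 miss rate: {rate:.1%}")
```

The first argument corresponds to the r53e724:u count and the second to the r532124:u count.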

#### 2. Software Prefetching and bzip2 on Haswell

(a) Repeat the previous measurement on Haswell, but instead of running bzip2, run bzip2.swprefetch, which was compiled with -fprefetch-loop-arrays to enable software prefetch instructions.

Record the miss rate and total time.

```
perf stat -e r53e724:u,r532124:u,r53f824:u \
/opt/ece571/401.bzip2/bzip2.swprefetch -k -f ./input.source
```

#### 3. equake\_l prefetch behavior on the x86 Haswell Machine

(a) Run equake\_l:

```
perf stat -e r53e724:u,r532124:u,r53f824:u \
   /opt/ece571/equake_l.specomp/equake_l < \
   /opt/ece571/equake_l.specomp/inp.in
```

Calculate the L2 cache miss rate from the first two results. Also note the total time.

#### 4. equake\_l software prefetch behavior on the x86 Haswell Machine

(a) Run equake\_l with software prefetch enabled:

```
perf stat -e r53e724:u,r532124:u,r53f824:u \
   /opt/ece571/equake_l.specomp/equake_l.swprefetch < \
   /opt/ece571/equake_l.specomp/inp.in
```

Calculate the L2 cache miss rate from the first two results. Also note the total time.

#### 5. Hardware Prefetch Disabled

It is possible to disable hardware prefetch on modern Intel processors.

See https://software.intel.com/en-us/articles/disclosure-of-hw-prefetcher-control-on-some-intel-processors for details.

Doing so requires root permissions, so I have done the measurements for you. The results are included below.

| benchmark           | L2-total       | L2-miss        | L2-prefetches | time (s) |
|---------------------|----------------|----------------|---------------|----------|
| bzip2               | 291,991,138    | 128,965,209    | 155,118       | 3.47     |
| bzip2.swprefetch    | 291,643,272    | 129,178,945    | 151,348       | 3.39     |
| equake_l            | 28,041,463,608 | 19,000,119,990 | 3,370,942     | 159.9    |
| equake_l.swprefetch | 28,341,994,698 | 18,978,027,534 | 3,373,831     | 159.8    |
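As a rough sketch of how the table can be reduced to the quantities used in the other sections, the following Python derives a miss rate for each row and the relative time change between a benchmark and its software-prefetch variant. The names `RESULTS`, `summarize`, and `time_change` are hypothetical; the numbers are copied from the table above.

```python
# Counts copied from the table above (hardware prefetcher disabled).
# Tuple layout: (L2-total, L2-miss, L2-prefetches, time).
RESULTS = {
    "bzip2": (291_991_138, 128_965_209, 155_118, 3.47),
    "bzip2.swprefetch": (291_643_272, 129_178_945, 151_348, 3.39),
    "equake_l": (28_041_463_608, 19_000_119_990, 3_370_942, 159.9),
    "equake_l.swprefetch": (28_341_994_698, 18_978_027_534, 3_373_831, 159.8),
}


def summarize(results):
    """Return {benchmark: (miss_rate, time)} for every table row."""
    return {
        name: (misses / total, time)
        for name, (total, misses, _prefetches, time) in results.items()
    }


def time_change(results, base, variant):
    """Fractional change in time from base to variant (negative = faster)."""
    base_time = results[base][3]
    return (results[variant][3] - base_time) / base_time


if __name__ == "__main__":
    for name, (rate, time) in summarize(RESULTS).items():
        print(f"{name:20s} miss rate {rate:.1%}  time {time}")
    for base in ("bzip2", "equake_l"):
        change = time_change(RESULTS, base, base + ".swprefetch")
        print(f"{base}: time change with software prefetch {change:+.2%}")
```

The same two helpers can be reused on your own measurements from the earlier sections by adding rows to the dictionary.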

#### 6. Short Answer Questions

- (a) Did enabling software prefetch help on bzip2 (i.e. the results in question 1 vs question 2)?
- (b) Did enabling software prefetch help on equake\_l (i.e. the results in question 3 vs question 4)?
- (c) How did turning off the hardware prefetcher affect the bzip2 results (i.e. question 1 vs question 5)?
- (d) How did turning off the hardware prefetcher affect the equake\_l results (i.e. question 3 vs question 5)?
- (e) With the hardware prefetcher disabled, did enabling software prefetch help at all (question 5)?
- (f) Why do you think the software prefetch performance is so underwhelming?

#### 7. Submitting your work

- Create the document containing the data as well as answers to the questions asked.
- Please make sure your name appears in the document.
- E-mail the file to me by the homework deadline.