Added statistics for polluting prefetches #165

ngober · 2021-05-25T16:06:04Z

This patch adds support for counting polluting prefetches. Loosely defined, a polluting prefetch is one that evicts a useful block and is itself not useful. Such a prefetch costs 2 units of traffic (the prefetch fill and the re-fetch of the evicted block) and increases the number of misses by 1. It is therefore meaningful to distinguish polluting prefetches from merely useless prefetches, which add traffic but do not increase the number of misses.

sethpugsley · 2021-06-05T01:10:17Z

I'm not sure this can really be done without a full shadow cache that ignores prefetches. This seems to cover only the situation where a prefetch directly kicks a future-useful cache line out, and not the one where it starts a chain of events that eventually leads to a future-useful cache line being evicted. Also, I don't think it correctly handles the situation where the cache line would have been evicted anyway, even without prefetching.

Here's an example of the second issue:
Imagine a 2-way set (LRU replacement) in a cache that starts empty, with this access pattern: A B A C B A
Allocate A
Allocate B
Hit A
C evicts B
B evicts A
A evicts C

If we add a prefetch P into the middle, here: A B A C P B A, then we get
Allocate A
Allocate B
Hit A
C evicts B
P evicts A
B evicts C
A evicts P

Here P gets blamed for being a polluting prefetch, even though it was really only useless and not polluting: without P, A would have been evicted by B anyway.

Am I understanding how this code works correctly? Would it really mischaracterize P in this case?
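To make the trace concrete, here is a minimal sketch of a single LRU set with a naive direct-eviction blame rule. It is not the code in this PR: `simulate`, `demands`, and the blame rule itself (a prefetch is blamed if its victim later takes a demand miss and the prefetch never gets a demand hit) are assumptions made for illustration only.

```python
from collections import OrderedDict

def simulate(trace, ways):
    """Run one LRU set over `trace`, a list of (address, is_prefetch) pairs.

    Returns (demand_misses, blamed). `blamed` holds prefetches that a naive
    direct-eviction rule (an assumption, not the PR's logic) calls polluting.
    """
    lines = OrderedDict()   # address -> True while an unused prefetch; oldest first
    victim_of = {}          # evicted address -> prefetch that evicted it
    useful = set()          # prefetches that received a demand hit
    victim_missed = set()   # prefetches whose victim later missed
    demand_misses = 0
    for addr, is_prefetch in trace:
        if addr in lines:
            lines.move_to_end(addr)          # refresh LRU position
            if not is_prefetch and lines[addr]:
                useful.add(addr)
                lines[addr] = False
            continue
        if not is_prefetch:
            demand_misses += 1
            if addr in victim_of:
                victim_missed.add(victim_of.pop(addr))
        if len(lines) >= ways:
            victim, _ = lines.popitem(last=False)   # evict the LRU line
            if is_prefetch:
                victim_of[victim] = addr
        lines[addr] = is_prefetch
    return demand_misses, sorted(victim_missed - useful)

def demands(pattern):
    """Turn a string like "ABA" into demand accesses."""
    return [(addr, False) for addr in pattern]

baseline = demands("ABACBA")
with_prefetch = demands("ABAC") + [("P", True)] + demands("BA")

print(simulate(baseline, ways=2))       # (5, [])
print(simulate(with_prefetch, ways=2))  # (5, ['P'])
```

Under these assumptions both traces take 5 demand misses, yet the naive rule still blames P. That is the mischaracterization described above: comparing against a run without the prefetch shows P added no misses.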

Also, consider this example:

Imagine a data set that fits perfectly in an 8-way associative set, and we repeatedly scan through it and get non-stop hits. If a prefetch evicts one of the cache lines, then under LRU it will have a cascade effect that actually causes 8 subsequent cache misses, and not just the 1 to re-fetch the cache line that it evicted.

This PR would only say that the prefetch was responsible for 1 unit of pollution, when it was really the ultimate cause of 8 misses. Should some instances of pollution be worth more than others, and if so, how do we communicate that? If the situation were similar, but there were 2 prefetches that caused a streaming data object to just barely not fit in the cache, then what is the right way to attribute blame to them? Do the individual prefetches need to be blamed, or do we just need a running tally of all the misses that wouldn't have occurred if not for prefetching?
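The "running tally" option can be sketched by replaying the same trace through a shadow set that ignores prefetches and subtracting the miss counts. This is a toy model under assumed LRU replacement; `demand_misses` and the line names `L0`..`L7`, `P`, `Q` are hypothetical and not part of ChampSim.

```python
from collections import OrderedDict

def demand_misses(trace, ways, ignore_prefetches=False):
    """Count demand misses in one LRU set; optionally act as a shadow set."""
    lines = OrderedDict()  # resident addresses, oldest first
    misses = 0
    for addr, is_prefetch in trace:
        if is_prefetch and ignore_prefetches:
            continue                       # the shadow set never sees prefetches
        if addr in lines:
            lines.move_to_end(addr)        # refresh LRU position
            continue
        if not is_prefetch:
            misses += 1
        if len(lines) >= ways:
            lines.popitem(last=False)      # evict the LRU line
        lines[addr] = True
    return misses

scan = [(f"L{i}", False) for i in range(8)]   # data set that exactly fills the set

def extra_misses(prefetches):
    """Misses that would not have occurred without the given prefetches."""
    trace = scan * 2 + [(p, True) for p in prefetches] + scan * 2
    real = demand_misses(trace, ways=8)
    shadow = demand_misses(trace, ways=8, ignore_prefetches=True)
    return real - shadow

print(extra_misses(["P"]))        # 8
print(extra_misses(["P", "Q"]))   # 8
```

In this toy model one prefetch accounts for 8 extra misses, and two back-to-back prefetches also account for 8 in total. The aggregate tally is well defined, but it does not say how to split the blame between P and Q.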

ngober added 3 commits February 8, 2021 16:22

Added statistics for polluting prefetches

30ce266

Merge branch 'develop' into pollution_tracking

8a7f931

Fixed pollution tracking for eventually useful prefetches

4fa31be

ngober added the Low Priority label Dec 17, 2021
