Added statistics for polluting prefetches #165

Open
wants to merge 3 commits into
base: develop

@ngober (Collaborator) commented May 25, 2021

This patch adds support for measuring the number of polluting prefetches. Loosely defined, a polluting prefetch is one that evicts a useful block and is itself not useful. Such a prefetch incurs 2 units of traffic (one for the prefetch itself and one to re-fetch the evicted block) while increasing the number of misses by 1. It is therefore meaningful to distinguish polluting prefetches from useless prefetches, which do not increase the number of misses.
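As a rough illustration of the distinction drawn above, the definition can be written as a small classifier. This is only a sketch and is not taken from the patch: the names `classify_prefetch` and `COST` are hypothetical, and the single unit of traffic for a useless prefetch is inferred from the description rather than stated in it.

```python
def classify_prefetch(was_used, victim_was_redemanded):
    """Label one prefetch once its outcome is known.

    was_used: a demand access hit the prefetched block before it was evicted.
    victim_was_redemanded: the block it evicted was demanded again later.
    """
    if was_used:
        return "useful"
    return "polluting" if victim_was_redemanded else "useless"


# (units of traffic, change in demand misses) for the two categories compared
# above. The useless-prefetch traffic figure is an inference, not from the patch.
COST = {
    "useless": (1, 0),
    "polluting": (2, 1),
}

for label in ("useless", "polluting"):
    traffic, extra_misses = COST[label]
    print(f"{label}: traffic={traffic}, extra misses={extra_misses}")
```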

@sethpugsley (Collaborator)

I'm not sure this can really be done without a full shadow cache, which ignores prefetches. This seems to only cover the situation where a prefetch directly kicks a future-useful cache line out, and not where it causes a chain of events that eventually leads to a future-useful cache line being evicted. Also, I don't think it correctly handles the situation where the cache line would have been evicted anyway, even without prefetching.

Here's an example of the second issue:
Imagine a 2-way set with LRU replacement in a cache that starts empty, with this access pattern: A B A C B A
Allocate A
Allocate B
Hit A
C evicts B
B evicts A
A evicts C

If we add a prefetch P into the middle, here: A B A C P B A, then we get
Allocate A
Allocate B
Hit A
C evicts B
P evicts A
B evicts C
A evicts P

P gets blamed for being a polluting prefetch, even though it was really only useless, not polluting: in the original pattern, B evicts A anyway, so the miss on A would have happened even without the prefetch.

Am I understanding how this code works correctly? Would it really mischaracterize P in this case?
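To make the two traces above easy to replay, here is a small single-set LRU simulation. It is a sketch under stated assumptions and does not reproduce the PR's code: the blame rule (flag a prefetched line that is evicted unused after the line it evicted has missed again) is a guess at a per-eviction check, and `simulate` and the `X*` prefetch notation are made up for this example.

```python
from collections import OrderedDict


def simulate(trace, ways=2):
    """Run one LRU set over a space-separated trace; 'X*' marks a prefetch of X.

    Returns (demand_misses, blamed), where `blamed` lists prefetched lines
    flagged as polluting by the assumed naive rule.
    """
    lru = OrderedDict()     # resident lines, least recently used first
    victim_of = {}          # unused prefetched line -> line it evicted
    victim_missed = set()   # prefetched lines whose victim has since missed
    misses, blamed = 0, []
    for token in trace.split():
        is_prefetch = token.endswith("*")
        line = token.rstrip("*")
        if line in lru:
            lru.move_to_end(line)
            if not is_prefetch:
                # Demand hit on a prefetched line: it was useful, stop tracking.
                victim_of.pop(line, None)
                victim_missed.discard(line)
            continue
        if not is_prefetch:
            misses += 1
            # A demand miss on a recorded victim counts against its evictor.
            victim_missed.update(p for p, v in victim_of.items() if v == line)
        victim = None
        if len(lru) == ways:
            victim, _ = lru.popitem(last=False)
            if victim_of.pop(victim, None) is not None and victim in victim_missed:
                blamed.append(victim)
            victim_missed.discard(victim)
        lru[line] = True
        if is_prefetch and victim is not None:
            victim_of[line] = victim
    return misses, blamed


print(simulate("A B A C B A"))      # no prefetch
print(simulate("A B A C P* B A"))   # prefetch P inserted after C
```

Under these assumptions both traces take the same number of demand misses, yet P is still flagged in the second one, which is the mischaracterization being asked about.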

Also, consider this example:

Imagine a data set that fits perfectly in an 8-way associative set with LRU replacement, and we repeatedly scan through it and get non-stop hits. If a prefetch evicts one of the cache lines, then it will have a cascade effect that will actually cause 8 subsequent cache misses, and not just the 1 to re-fetch the cache line that it evicted.
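The cascade can be checked with a short simulation. This sketch assumes LRU replacement and that a prefetched line is inserted at the most-recently-used position. Neither is stated in the thread, and `demand_misses` is a hypothetical helper written for this example.

```python
from collections import OrderedDict


def demand_misses(accesses, ways=8):
    """Count demand misses in one LRU set.

    `accesses` is a list of (line, is_prefetch) pairs; prefetch fills occupy
    a way but are not counted as misses.
    """
    lru = OrderedDict()  # resident lines, least recently used first
    misses = 0
    for line, is_prefetch in accesses:
        if line in lru:
            lru.move_to_end(line)
            continue
        if not is_prefetch:
            misses += 1
        if len(lru) == ways:
            lru.popitem(last=False)  # evict the least recently used line
        lru[line] = True
    return misses


scan = [(n, False) for n in range(8)]   # one pass over the 8-line data set
baseline = scan * 4
polluted = scan * 2 + [("P", True)] + scan * 2

extra = demand_misses(polluted) - demand_misses(baseline)
print(extra)  # misses attributable to the single prefetch P
```

Each miss in the scan that follows the prefetch evicts the line needed next, so the damage only stops once P itself reaches the LRU position and is evicted.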

This PR would only say that the prefetch was responsible for 1 unit of pollution, when it was really the ultimate cause of 8 misses. Should some instances of pollution be worth more than others, and if so, how do we communicate that? If the situation were similar, but there were 2 prefetches that caused a streaming data object to just barely not fit in the cache, then what is the right way to attribute blame to them? Do the individual prefetches need to be blamed, or do we just need a running tally of all the misses that wouldn't have occurred if not for prefetching?
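One way to picture the "running tally" option is a shadow set that sees only demand accesses, run in lockstep with the real set. The sketch below is an assumption-laden illustration, not a proposal for the actual implementation: `LruSet` and `prefetch_induced_misses` are hypothetical names, replacement is assumed to be LRU, and only a single set is modeled.

```python
from collections import OrderedDict


class LruSet:
    """A single cache set with LRU replacement."""

    def __init__(self, ways):
        self.ways = ways
        self.lines = OrderedDict()  # least recently used first

    def access(self, line):
        """Touch `line`, filling it on a miss; return True on a hit."""
        if line in self.lines:
            self.lines.move_to_end(line)
            return True
        if len(self.lines) == self.ways:
            self.lines.popitem(last=False)
        self.lines[line] = True
        return False


def prefetch_induced_misses(accesses, ways):
    """Return (extra, saved) demand misses relative to a prefetch-free shadow.

    extra: demand misses in the real set that hit in the shadow set.
    saved: demand hits in the real set that miss in the shadow set.
    """
    real, shadow = LruSet(ways), LruSet(ways)
    extra = saved = 0
    for line, is_prefetch in accesses:
        real_hit = real.access(line)
        if is_prefetch:
            continue  # the shadow set never sees prefetches
        shadow_hit = shadow.access(line)
        if shadow_hit and not real_hit:
            extra += 1
        elif real_hit and not shadow_hit:
            saved += 1
    return extra, saved


scan = [(n, False) for n in range(8)]
two_prefetches = scan * 2 + [("P", True), ("Q", True)] + scan * 2
print(prefetch_induced_misses(two_prefetches, ways=8))
```

A tally like this reports the total number of misses that would not have occurred without prefetching, but it does not say how to split that total between P and Q, which is the open attribution question raised above.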
