About L3 evict #213
Comments
The L3 group measures traffic between L2 and L3; the same holds for the L2 group, which measures traffic between L1 and L2. The cache lines evicted by the L3 cache are closely related to the write bandwidth at the memory controllers. Most people want core-local data, and there is no "evicted from L3" event for cores. You can use the LLC segments, but they are per socket, not per core. L3 data volume is the amount of data loaded from and evicted to L3 from the perspective of the CPU cores, the processing units, thus between L2 and L3. On most Intel systems this equals the data flowing through L3, because the L3 is inclusive. For Skylake and later, the L3 is a victim cache, which is why the L3 group is somewhat misleading on these platforms: not all data flows through L3, since lines can be loaded into L2 directly from memory.
Thanks, Thomas. By "You can use the LLC segments", you mean the per-socket uncore events, right? Still, I think using "L2_LINES_IN_ALL" may not be very appropriate or meaningful from the perspective of LLC performance analysis. Let's just consider the inclusive cache architecture. As I said, if most demand requests from some core X that miss L2 hit L3, they actually won't cause inter-core interference, because they won't evict any cache lines belonging to other cores. These requests do increase this core's traffic between its private L2 and the shared L3, and so become part of the "L3 data volume" defined in likwid. Even with Skylake's non-inclusive LLC architecture, requests that miss L2 still need to access L3 (if I understand this correctly), so my claim above also holds for Skylake and its later architectures.
Yes, the CBo (Intel name) or CBOX (LIKWID name) counters. Should the L3 data volume also contain all line fills and evictions of L3 from/to the backend (memory, interconnect, network, ...)? Or do you mean traffic caused by remote L2 HITM accesses? Or is it just the name that bothers you, and if it were 'L2 <-> L3 data volume' we wouldn't be having this conversation? You are free to propose changes through pull requests. If you just need another group, you can create it yourself: https://github.com/RRZE-HPC/likwid/wiki/likwid-perfctr#defining-custom-performance-groups
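As a concrete illustration of the custom-group route mentioned above, a hypothetical group file could look roughly like the sketch below. The file name (`L2L3.txt`), counter assignments, and metric names are assumptions for illustration; the wiki page linked above documents the authoritative syntax, and likwid's shipped group files are the best templates.

```
SHORT L2 <-> L3 data volume

EVENTSET
FIXC0 INSTR_RETIRED_ANY
FIXC1 CPU_CLK_UNHALTED_CORE
PMC0  L2_LINES_IN_ALL
PMC1  L2_TRANS_L2_WB

METRICS
Runtime (RDTSC) [s] time
L2 to L3 load data volume [GBytes]  1.0E-09*PMC0*64.0
L2 to L3 evict data volume [GBytes] 1.0E-09*PMC1*64.0
L2 <-> L3 data volume [GBytes]      1.0E-09*(PMC0+PMC1)*64.0

LONG
Traffic between the private L2 caches and the shared L3, seen per core.
```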
In your first reply, you mentioned that "L3 data volume is the amount of data loaded and evicted from/to L3 from the perspective of CPU cores, the processing units, thus between L2 and L3". It is not related to remote L2 HITM accesses. I just want to say that the event "L2_LINES_IN_ALL" (cache lines filling L2) counts some lines that hit in L3 and thus never newly fill L3 (for example, a core's requests may miss L2 many times but always hit L3). So yes, "L2 <-> L3 data volume" is definitely a clearer name. In other words, if you call it "L2 <-> L3 data volume" or "L2 <-> L3 traffic", it makes sense to use "L2_LINES_IN_ALL"; but if you want a metric that also covers the line fills of L3 itself, "L2_LINES_IN_ALL" may not be accurate. (Instead, "MEM_LOAD_UOPS_RETIRED.LLC_MISS" or "MEM_LOAD_UOPS_RETIRED.L3_MISS" could be more accurate; in addition, "LLC Misses" (event 2EH, umask 41H) seems to be a more general event that counts LLC misses.) Please correct me if I'm wrong. Thank you again!
It's nice that you think about the event selection so mindfully. The names could be more specific, but until now it was clear to everyone what is measured with the L3 group. The most important part of the description you cited is "from the perspective of CPU cores". The CPU cores don't care what the L3 has to do to provide the cache lines (hit in some L3 segment, or miss and load from memory). The MEM_LOAD_UOPS* events are known to have problems: I tested the LLC Misses event (originally called LONGEST_LAT_CACHE.MISS) quite often, and it undercounts dramatically. If you need line fills and evictions to/from L3, use the MEM group and measure directly at the memory controllers. Those counts are highly accurate.
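The memory-controller route suggested here boils down to simple arithmetic: each CAS transaction at an IMC moves one 64-byte cache line, so data volume is the transaction count times the line size. A minimal sketch, with illustrative counter names and made-up readings (the exact event names and the group's scaling are defined by the tool, not by this snippet):

```python
CACHE_LINE_BYTES = 64  # one CAS transaction moves one cache line

def mem_data_volume_bytes(cas_count_rd: int, cas_count_wr: int) -> int:
    """Total bytes read and written at the memory controllers."""
    return (cas_count_rd + cas_count_wr) * CACHE_LINE_BYTES

# Made-up counter readings for illustration:
print(mem_data_volume_bytes(1_000_000, 250_000))  # 80000000
```

Dividing the result by the measurement time gives the memory bandwidth the MEM group reports per socket.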
Thanks TomTheBear, it's very clear to me now; I will close this issue.
Hi,
I noticed that many metrics about L3 eviction use the event "L2_TRANS_L2_WB". According to Chapter 19.6 of the "Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 3B", this event "counts L2 dirty (modified) cache lines evicted by a demand request." Obviously it does not count the lines that are evicted by the L3 cache. Is something wrong here, or is there a special reason for using it this way?
In addition, in the file "groups/sandybridge-ep/L3.txt", "L3 data volume" is calculated from "L2_LINES_IN_ALL" and "L2_TRANS_L2_WB". What does "L3 data volume" mean here? Does it mean the data flowing through L3? If so, there is a problem with using "L2_LINES_IN_ALL": if a load request misses L3, the fetched line is brought into both L3 and L2 from memory; but if the load misses L2 and hits L3, the line is fetched into L2 from L3, and no new line fills L3. I'm really confused by this formula in the .txt file.
Any help is appreciated.
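To make the derived metric in question concrete, here is a small sketch of how the two events combine into "L3 data volume" as discussed in this thread (the formula is reconstructed from the discussion; check the shipped group file for the exact constants and names):

```python
CACHE_LINE_BYTES = 64

def l3_volumes(l2_lines_in_all: int, l2_trans_l2_wb: int):
    """Split core-side L2<->L3 traffic into its two components.

    l2_lines_in_all: lines filling L2 (loads arriving from L3 or memory)
    l2_trans_l2_wb:  dirty L2 lines written back toward L3
    """
    load_bytes = l2_lines_in_all * CACHE_LINE_BYTES
    evict_bytes = l2_trans_l2_wb * CACHE_LINE_BYTES
    return load_bytes, evict_bytes, load_bytes + evict_bytes

# Made-up counter readings for illustration:
load, evict, total = l3_volumes(500_000, 100_000)
print(load, evict, total)  # 32000000 6400000 38400000
```

Note that, as the comments above explain, both events are counted at the L2 boundary, so the total is L2 <-> L3 traffic from the core's perspective, not the fill/evict traffic of the L3 itself.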