Add Linux perf JIT support (/tmp/perf-$pid.map) #1432
#ifdef USE_LINUX_PERF
if (getenv("DOLPHIN_PERF")) {
char filename[500];
Looks good. I really like the chart it generates.
A function name is possibly less than useful. We probably care more about which basic block is profiling badly than which PowerPC function is profiling badly.
The only way we can avoid this is by never using the same address space twice. In the future we may be able to lift the < 4 GB limitation, which would allow us to dedicate a huge chunk of the 64-bit address space just to jitted code. Another solution might be to hash the basic block before compiling it. If the code and the JIT parameters are the same, the exact same code will be generated as before, so we can compile it again at the same address and avoid wasting address space (or even keep a cached version around).
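The hashing idea above could look roughly like the following. This is only a sketch under stated assumptions: `HashBlock`, `BlockPlacement` and `BlockAddressCache` are hypothetical names that do not exist in Dolphin, the hash is a plain FNV-1a over the guest code bytes plus an options word, and hash collisions are ignored.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical: FNV-1a hash over the JIT options and the guest code bytes.
std::uint64_t HashBlock(const std::vector<std::uint8_t>& guest_code, std::uint32_t jit_options)
{
  std::uint64_t h = 0xcbf29ce484222325ULL;  // FNV-1a 64-bit offset basis
  auto mix = [&h](std::uint8_t byte) {
    h ^= byte;
    h *= 0x100000001b3ULL;  // FNV-1a 64-bit prime
  };
  for (int i = 0; i < 4; ++i)
    mix(static_cast<std::uint8_t>(jit_options >> (8 * i)));
  for (std::uint8_t byte : guest_code)
    mix(byte);
  return h;
}

struct BlockPlacement
{
  std::uintptr_t host_address;
  bool reused;  // true if this hash was already placed at host_address
};

// Hypothetical: hands out host addresses, reusing the old address when the
// same (code, options) hash is compiled again. Collisions are not handled.
class BlockAddressCache
{
public:
  explicit BlockAddressCache(std::uintptr_t base) : m_next_free(base) {}

  BlockPlacement Place(std::uint64_t hash, std::size_t host_size)
  {
    auto it = m_by_hash.find(hash);
    if (it != m_by_hash.end())
      return {it->second, true};

    const std::uintptr_t address = m_next_free;
    m_next_free += host_size;
    m_by_hash.emplace(hash, address);
    return {address, false};
  }

private:
  std::uintptr_t m_next_free;
  std::unordered_map<std::uint64_t, std::uintptr_t> m_by_hash;
};
```

With something like this, a block that is invalidated and later recompiled unchanged would land on its old address, so its existing perf map entry would stay valid.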
@dolphin-emu-bot rebuild
Just FYI, I'm interested in writing a Python script or something to convert this data into DWARF info (eww, I know) for use in Instruments etc. However, for this purpose I'd also like to emit a .s file for the JITted blocks and output "line number" info that maps native instructions to PPC instructions in the .s file. I can do this myself if I ever get around to it, but just mentioning it in case it interests anyone.
Jit64() : code_buffer(32000)
{
#ifdef USE_LINUx_PERF
perf_map_file = NULL;
@comex: Could you elaborate on the DWARF thing? You would like to generate DWARF info about the block ranges? How do you expect to use it? I thought about generating DWARF info in order to use this debugger JIT interface: this way GDB/LLDB would be JIT-aware as well. We might even (?) be able to add stack unwinding DWARF info in order to be able to unwind the stack. One possible application would be to use GDB to sample the stack instead of perf, with support for unwinding the stack after JITed code.
// This is very slow, can I do this efficiently?
// Symbol* symbol symbol = g_symbolDB.GetSymbolFromAddr(b->originalAddress);
std::fprintf(perf_map_file, "%llx %x JIT_PPC_b%x\n",
(long long int) b->normalEntry,
@randomstuff Hmm, I suppose being able to debug the code live would be an advantage of generating it directly in Dolphin rather than in some script. My plan, however, was to generate basic info about function spans and line info, and dump it into a Mach-O file to load after the fact into Instruments.app, which does not support any nice JIT-specific API. I doubt using GDB as a profiler would have anything other than significantly increased overhead compared to Dolphin's built-in profiler. Though it would be cool to be able to break into JIT code and randomly see the PPC stack... :p
When using a high sampling rate, a GDB-based profiler will not be very efficient. When using a low sampling rate, a GDB-based profiler might have lower overhead than the built-in profiler (which needs to instrument each basic block entry and exit).
@comex: I might be interested in the DWARF generation thing. (I already wrote some DWARF consuming code.)
I should probably move this into JitCache.cpp alongside its USE_OPROFILE and USE_VTUNE friends (in FinalizeBlock). It would be shared with the ARM JIT and JITIL.
if (perf_map_file.is_open())
{
perf_map_file << StringFromFormat(
"%" PRIx64 " %" PRIx32 " EmuCode_%" PRIx32 "\n",
Do we really need a compile time option at all?
@randomstuff Is it possible to group by ppc instruction type instead of by ppc address? Is it fine to just reuse the same name? Configurable, of course :) I think it would also be nice to see how much time is spent in e.g. floating point operations. I'm still surprised how little code this PR is, good job.
@degasus: What do you mean by "group by ppc instruction"? Would you like to see how much time is spent for each type of PowerPC opcode? We'd have to generate target IP ranges mapping to each individual PowerPC instruction. The main difficulty is that it would generate a lot of ranges. I'm not sure it would scale well at this level of granularity, but it might be worth trying. It might be more scalable if we could ask Dolphin at runtime to only generate ranges for the categories of opcode we are interested in ("only generate ranges for FP instructions"). Is that what you have in mind?
I don't think there is any problem with associating the same name with different IP ranges. We could also generate unique names (such as OP_$opcodename_$powerpcip). Then with the same dataset we could either group by both (opcode name + PowerPC IP) or by opcode name (by
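To illustrate grouping by opcode name when symbols follow the OP_$opcodename_$powerpcip scheme, here is a small post-processing sketch. It works on already-collected (symbol, sample count) pairs and is not part of perf or Dolphin; `OpcodeGroup` and `GroupSamplesByOpcode` are hypothetical helpers, and the symbol names in the usage note are made up.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical: drops the trailing "_<powerpcip>" part of a symbol name.
// Only the last underscore is used, so opcode names that themselves
// contain underscores are preserved.
std::string OpcodeGroup(const std::string& symbol)
{
  const std::size_t pos = symbol.rfind('_');
  return pos == std::string::npos ? symbol : symbol.substr(0, pos);
}

// Hypothetical: sums sample counts per opcode group.
std::map<std::string, std::uint64_t>
GroupSamplesByOpcode(const std::vector<std::pair<std::string, std::uint64_t>>& samples)
{
  std::map<std::string, std::uint64_t> totals;
  for (const auto& sample : samples)
    totals[OpcodeGroup(sample.first)] += sample.second;
  return totals;
}
```

For example, samples for OP_fmadd_804a33fc and OP_fmadd_804a3400 would both be added to the OP_fmadd total, while the unique names still allow per-address grouping on the same dataset.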
Yeah, that's what I meant. So if I use the same name twice, will these blocks be combined in perf?
@comex: Yes, the compile-time option is not really necessary. I initially wanted to use std::string filename = StringFromFormat("perf-%d.map", GetCurrentProcessId()).
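A runtime opt-in without a compile-time switch could be sketched as below. This is an illustration under assumptions, not the PR's code: `PerfMapPath` and `PerfMapPathFromEnv` are hypothetical helpers, and the process ID is passed in so the sketch stays platform-neutral. An unset or empty environment variable simply disables the feature.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical: builds "<dir>/perf-<pid>.map".
std::string PerfMapPath(const std::string& dir, long pid)
{
  return dir + "/perf-" + std::to_string(pid) + ".map";
}

// Hypothetical: returns the map file path if the given environment variable
// names a directory, or an empty string if the feature is not enabled.
std::string PerfMapPathFromEnv(const char* env_name, long pid)
{
  const char* dir = std::getenv(env_name);
  if (dir == nullptr || *dir == '\0')
    return std::string();
  return PerfMapPath(dir, pid);
}
```

A caller would open the file only when the returned path is non-empty, so builds without the variable set pay no cost beyond one `getenv` call at startup.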
{
if (original_address) {
perf_map_file << StringFromFormat(
"%" PRIx64 " %x %s_%x\n",
Can we change this over to a namespace instead of a bunch of functions declared in a header polluting the global namespace? This would allow us to remove the ugly extern-declared perf_map_file in the header as well.
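One way the namespace suggestion could look is sketched below. The `JitRegister` name comes from the files mentioned in this thread, but `Init`, `IsEnabled`, `Shutdown` and the exact `Register` signature are assumptions made for illustration. The file handle lives in an unnamed namespace, so nothing needs to be declared extern in a header.

```cpp
#include <cstdint>
#include <cstdio>
#include <fstream>
#include <string>

namespace JitRegister
{
namespace
{
// Internal linkage: invisible outside this translation unit.
std::ofstream s_perf_map_file;
}

// Assumed entry points, not confirmed by the PR.
void Init(const std::string& path)
{
  s_perf_map_file.open(path);
}

bool IsEnabled()
{
  return s_perf_map_file.is_open();
}

void Shutdown()
{
  if (s_perf_map_file.is_open())
    s_perf_map_file.close();
}

// Writes "<start hex> <size hex> <name>[_<original address hex>]".
void Register(const void* base, std::uint32_t size, const char* name,
              std::uint32_t original_address = 0)
{
  if (!s_perf_map_file.is_open())
    return;

  char symbol[100];
  if (original_address)
    std::snprintf(symbol, sizeof(symbol), "%s_%x", name, original_address);
  else
    std::snprintf(symbol, sizeof(symbol), "%s", name);

  s_perf_map_file << std::hex << reinterpret_cast<std::uintptr_t>(base) << ' '
                  << size << ' ' << symbol << '\n';
}
}  // namespace JitRegister
```

Callers would then write `JitRegister::Register(ptr, size, "EmuCode", address)` and never touch the stream directly.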
I brought VTUNE support in
You would need to add JitRegister.cpp/.h to the Visual Studio project file in Common as well.
if (original_address)
{
snprintf(buf, 100, "%s_%x", name, original_address);
symbol_name = buf;
@dolphin-emu-bot rebuild
Fixed the include order.
@dolphin-emu-bot rebuild
I was expecting the
'perf' is the standard builtin tool for performance analysis on recent Linux kernel. Its source code is shipped within the kernel repository.

'perf' has basic support for JIT. For each process, it can read a file named /tmp/perf-$PID.map. This file contains mapping from address range to function name in the format:

41187e2a 1a EmuCode_804a33fc

with the following entries:

1. beginning of the range (hexadecimal);
2. size of the range (hexadecimal);
3. name of the function.

We supply the PowerPC address of the basic block as function name.

Usage:

DOLPHIN_PERF_DIR=/tmp dolphin-emu &
perf record -F99 -p $(pgrep dolphin-emu) --call-graph dwarf
perf script | stackcollapse-perf.pl | grep EmuCode__ | flamegraph.pl > profile.svg

Issue:

perf does not have support for region invalidation. It reads the file in postprocessing. It probably does not work very well if a JIT region is reused for another basic block: wrong results should be expected in this case. Currently, nothing is done to prevent this.
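To make the three-field map format concrete, here is a small reader for a single map line. It is only a sketch of the format as described in this commit message, not perf's actual parser; `PerfMapEntry`, `ParsePerfMapLine` and `Contains` are hypothetical names, and the symbol name is read as one whitespace-delimited token.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical representation of one map line.
struct PerfMapEntry
{
  std::uint64_t start = 0;  // beginning of the range
  std::uint64_t size = 0;   // size of the range
  std::string name;         // symbol name
};

// Parses "<start hex> <size hex> <name>". Returns false on malformed input
// and leaves *out untouched in that case.
bool ParsePerfMapLine(const std::string& line, PerfMapEntry* out)
{
  std::istringstream in(line);
  PerfMapEntry entry;
  if (!(in >> std::hex >> entry.start >> entry.size >> entry.name))
    return false;
  *out = entry;
  return true;
}

// True if the instruction pointer falls inside [start, start + size).
bool Contains(const PerfMapEntry& entry, std::uint64_t ip)
{
  return ip >= entry.start && ip - entry.start < entry.size;
}
```

Because each entry is just an address range with a name, two entries written at different times for the same reused region cannot be told apart when the file is read afterwards, which is the invalidation issue noted above.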
Move the JITed function/basic-block registration logic out of the CPU subsystem in order to add JIT registration to JITed DSP and Video/VertexLoader code. This is necessary in order to add /tmp/perf-$pid.map support to other JITed code, as they need to write to the same file.
"/tmp" is hardcoded into perf.
@dolphin-emu-bot rebuild
Is there anything holding it from
Sorry, I've just missed this PR. Though there are still some unsupported features:
I'll try to look at this. I have a WIP (I'm not sure it's correct) branch
This one is likely also nice: https://github.com/dolphin-emu/dolphin/blob/master/Source/Core/Core/PowerPC/Jit64/JitAsm.cpp#L19
'perf' is the standard builtin tool for performance analysis on recent
Linux kernel. Its source code is shipped within the kernel repository.
'perf' has basic support for JIT. For each process, it can read a file
named /tmp/perf-$PID.map. This file contains mapping from address
range to function name in the format:
41187e2a 1a JIT_PPC_b804a33fc
with the following entries:
Currently, we supply the address of the basic block instead of a
function name.
Usage:
Issues:
file in postprocessing. It does not work very well if a JIT region
is reused for another basic block. Can we avoid this?
To be fixed:
Long term evolutions:
Generated FlameGraph :
(Here we are sampling on time but it is possible to sample cache-misses …)