
Add Linux perf JIT support (/tmp/perf-$pid.map) #1432

Merged
merged 3 commits into dolphin-emu:master on Dec 28, 2014

Conversation

@randomstuff (Contributor)

'perf' is the standard built-in tool for performance analysis on recent
Linux kernels. Its source code is shipped within the kernel repository.

'perf' has basic support for JIT. For each process, it can read a file
named /tmp/perf-$PID.map. This file contains mappings from address
ranges to function names in the format:

41187e2a 1a JIT_PPC_b804a33fc

with the following entries:

  1. address of the start of the range (hexadecimal);
  2. size of the range (hexadecimal);
  3. name of the function.

Currently, we supply the address of the basic block instead of a
function name.
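The map-file format above can be illustrated with a short sketch (helper names are hypothetical, this is not Dolphin's actual code) that formats and appends such entries:

```cpp
// Sketch only: produce /tmp/perf-<pid>.map entries in the format perf expects:
//   <start-addr-hex> <size-hex> <symbol-name>
#include <cstdint>
#include <cstdio>
#include <string>
#include <unistd.h>  // getpid()

// Hypothetical helper: build one perf map line.
std::string PerfMapLine(uintptr_t start, uint32_t size, const std::string& name)
{
    char buf[128];
    std::snprintf(buf, sizeof(buf), "%llx %x %s\n",
                  (unsigned long long)start, size, name.c_str());
    return buf;
}

// Hypothetical helper: append one entry to the file perf looks for.
void AppendPerfMapEntry(uintptr_t start, uint32_t size, const std::string& name)
{
    std::string path = "/tmp/perf-" + std::to_string(getpid()) + ".map";
    if (std::FILE* f = std::fopen(path.c_str(), "a"))
    {
        std::fputs(PerfMapLine(start, size, name).c_str(), f);
        std::fclose(f);
    }
}
```

For example, AppendPerfMapEntry(0x41187e2a, 0x1a, "JIT_PPC_b804a33fc") would emit exactly the sample line shown above.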

Usage:

DOLPHIN_PERF=1 dolphin-emu &
perf record -F99 -p $(pgrep dolphin-emu) --call-graph dwarf
perf script | stackcollapse-perf.pl | c++filt | flamegraph.pl > profile.svg

Issues:

  • perf does not have support for region invalidation. It reads the
    file during postprocessing. It does not work very well if a JIT
    region is reused for another basic block. Can we avoid this?

To be fixed:

  • Add a GUI option.

Long term evolutions:

  • add the function name as well in the file;
  • add support for ARM JIT;
  • add support for DSP JIT as well;
  • stack unwinding for JIT functions.

Generated FlameGraph:

(Here we are sampling on time but it is possible to sample cache-misses …)


#ifdef USE_LINUX_PERF
if (getenv("DOLPHIN_PERF")) {
char filename[500];


@phire (Member) commented Oct 28, 2014

Looks good. I really like the chart it generates.

Currently, we supply the address of the basic block instead of a
function name.

A function name is possibly less useful. We probably care more about which basic block is profiling badly than which PowerPC function is profiling badly.

perf does not have support for region invalidation. It reads the file in preprocessing. It does not work very well if a JIT region is reused for another basic block. Can we avoid this?

The only way we can avoid this is by avoiding using address space twice.
We could create a custom allocator which will return memory at a unique address each time it is called. Unfortunately due to the < 4gb limitation, we will still eventually run out of address space. But it's probably enough space to get decently sized profiles before it fails.

In the future we may be able to lift the < 4gb limitation which will allow us to dedicate a huge chunk of the 64bit address space just for jitted code.

Another solution might be to hash the basic block before compiling it. If the code and the JIT parameters are the same, the exact same code will be generated as before, so we can compile it again at the same address and avoid wasting address space (or even keep a cached version around).
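The never-reuse idea above can be sketched as a bump allocator that only models the address accounting (class and method names are hypothetical; real code would mmap executable pages):

```cpp
// Sketch of the idea above: hand out each code region at a fresh address
// and never recycle, so stale /tmp/perf-<pid>.map entries can't be
// re-bound to new blocks. Only the address bookkeeping is modeled here.
#include <cstdint>

class NonReusingCodeSpace
{
public:
    NonReusingCodeSpace(uintptr_t base, uintptr_t size)
        : m_next(base), m_end(base + size) {}

    // Returns a unique, never-reused address, or 0 once the space is
    // exhausted (the < 4 GB limitation mentioned above).
    uintptr_t Alloc(uintptr_t bytes)
    {
        if (m_end - m_next < bytes)
            return 0;
        uintptr_t addr = m_next;
        m_next += bytes;  // never rewinds, so addresses are never reused
        return addr;
    }

private:
    uintptr_t m_next;  // next free address
    uintptr_t m_end;   // one past the end of the reserved region
};
```

The trade-off is exactly as described: profiles stay accurate until the reserved region runs out.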

@phire (Member) commented Oct 28, 2014

@dolphin-emu-bot rebuild

@comex (Contributor) commented Oct 28, 2014

Just FYI, I'm interested in writing a Python script or something to convert this data into DWARF info (eww, I know) for use in Instruments etc. However, for this purpose I'd also like to emit a .s file for the JITted blocks and output "line number" info that maps native instructions to PPC instructions in the .s file. I can do this myself if I ever get around to it, but just mentioning it in case it interests anyone.

Jit64() : code_buffer(32000)
{
#ifdef USE_LINUX_PERF
perf_map_file = NULL;


@randomstuff (Contributor, Author)

@comex: Could you elaborate on the DWARF thing? You would like to generate DWARF info about the block ranges? How do you expect to use it?

I thought about generating DWARF info in order to use this debugger JIT interface: this way GDB/LLDB would be JIT aware as well. We might even (?) be able to add stack unwinding DWARF info in order to be able to unwind the stack.

One possible application would be to use GDB to sample the stack instead of perf, with support for unwinding the stack past JITed code.
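For reference, the debugger JIT interface mentioned here is GDB's documented JIT Compilation Interface: the JIT builds an in-memory symbol file (e.g. DWARF in an ELF container), links it into a list, and calls a magic function GDB sets a breakpoint on. The struct layouts and the __jit_debug_register_code hook are fixed by GDB; the RegisterSymfile helper is a hypothetical sketch:

```cpp
#include <cstdint>

extern "C" {

typedef enum { JIT_NOACTION = 0, JIT_REGISTER_FN, JIT_UNREGISTER_FN } jit_actions_t;

struct jit_code_entry
{
    jit_code_entry* next_entry;
    jit_code_entry* prev_entry;
    const char* symfile_addr;  // in-memory object file with debug info
    uint64_t symfile_size;
};

struct jit_descriptor
{
    uint32_t version;      // must be 1
    uint32_t action_flag;  // a jit_actions_t value
    jit_code_entry* relevant_entry;
    jit_code_entry* first_entry;
};

// GDB puts a breakpoint here; must not be inlined or optimized away.
void __attribute__((noinline)) __jit_debug_register_code() { __asm__ __volatile__(""); }

jit_descriptor __jit_debug_descriptor = { 1, JIT_NOACTION, nullptr, nullptr };

}  // extern "C"

// Hypothetical helper: announce one generated symbol file to the debugger.
void RegisterSymfile(jit_code_entry* entry, const char* symfile, uint64_t size)
{
    entry->symfile_addr = symfile;
    entry->symfile_size = size;
    entry->prev_entry = nullptr;
    entry->next_entry = __jit_debug_descriptor.first_entry;
    if (entry->next_entry)
        entry->next_entry->prev_entry = entry;
    __jit_debug_descriptor.first_entry = entry;
    __jit_debug_descriptor.relevant_entry = entry;
    __jit_debug_descriptor.action_flag = JIT_REGISTER_FN;
    __jit_debug_register_code();  // GDB inspects the descriptor here
}
```

With this in place a debugger attached to the process can resolve JITed symbols live, which is what would make GDB/LLDB JIT-aware.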

// This is very slow, can I do this efficiently?
// Symbol* symbol = g_symbolDB.GetSymbolFromAddr(b->originalAddress);
std::fprintf(perf_map_file, "%llx %x JIT_PPC_b%x\n",
(long long int) b->normalEntry,


@comex (Contributor) commented Oct 28, 2014

@randomstuff Hmm, I suppose being able to debug the code live would be an advantage of generating it directly in Dolphin rather than some script. My plan, however, was to generate basic info about function spans and line info, and dump it into a Mach-O file to load after the fact into Instruments.app, which does not support any nice JIT-specific API.

I doubt using GDB as a profiler would offer anything other than significantly increased overhead compared to Dolphin's built-in profiler. Though it would be cool to be able to break into JIT code and randomly see the PPC stack... :p

@randomstuff (Contributor, Author)

When using a high sampling rate, a GDB based profiler will not be very efficient. When using a low sampling rate, a GDB-based profiler might have lower overhead than the builtin profiler (which needs to instrument each basic block entry and exit).

@randomstuff (Contributor, Author)

@comex: I might be interested in the DWARF generation thing. (I already wrote some DWARF consuming code.)

@randomstuff (Contributor, Author)

I should probably move this into JitCache.cpp alongside its USE_OPROFILE and USE_VTUNE friends (in FinalizeBlock). It would be shared with the ARM JIT and JITIL.

if (perf_map_file.is_open())
{
perf_map_file << StringFromFormat(
"%" PRIx64 " %" PRIx32 " EmuCode_%" PRIx32 "\n",


@comex (Contributor) commented Oct 29, 2014

Do we really need a compile time option at all?

@degasus (Member) commented Oct 29, 2014

@randomstuff Is it possible to group by PPC instruction type instead of by PPC address? Is it fine to just reuse the same name? Configurable, of course :) I think it would also be nice to see how much time is spent in e.g. floating-point operations.

I'm still surprised by how little code this PR is, good job.

@randomstuff (Contributor, Author)

@degasus: What do you mean by "group by PPC instruction"? You would like to see how much time is spent in each type of PowerPC opcode? We'd have to generate target IP ranges mapping to each individual PowerPC instruction. The main difficulty is that it would generate a lot of ranges. I'm not sure it would scale so well at this level of granularity, but it might be worth trying. It might be more scalable if we can ask Dolphin at runtime to only generate ranges for the categories of opcodes we are interested in ("only generate ranges for FP instructions"). Is that what you have in mind?

@randomstuff (Contributor, Author)

I don't think there is any problem with associating the same name with different IP ranges. We could as well generate unique names (such as OP_$opcodename_$powerpcip). Then, with the same dataset, we could group either by both (opcode name + PowerPC IP) or by opcode name alone (e.g. with sed 's/OP_\([a-z.]*\)_[a-z]*/OP_\1/').
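The naming scheme discussed here can be sketched as follows (the helper names and the exact OP_ scheme are hypothetical, taken from the comment above): emit a unique symbol per (opcode, PPC address) pair, then strip the address suffix in postprocessing to regroup samples by opcode alone.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical: one unique symbol per (opcode, PowerPC address) pair,
// e.g. "OP_fmuls_804a33fc".
std::string MakeRangeName(const std::string& opcode_name, uint32_t ppc_addr)
{
    char buf[64];
    std::snprintf(buf, sizeof(buf), "OP_%s_%x", opcode_name.c_str(), ppc_addr);
    return buf;
}

// Postprocessing equivalent of the sed expression in the comment: drop
// the trailing address so samples for the same opcode merge.
std::string GroupByOpcode(const std::string& range_name)
{
    size_t last = range_name.rfind('_');
    return (last == std::string::npos) ? range_name : range_name.substr(0, last);
}
```

Grouping "OP_fmuls_804a33fc" and "OP_fmuls_80123450" both down to "OP_fmuls" is what would make a per-opcode-type FlameGraph possible from the same dataset.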

@degasus (Member) commented Oct 29, 2014

Yeah, that's what I meant. So if I use the same name twice, will these blocks be combined in perf?
But I think this is likely out of scope for this PR. Basic usage first.

@randomstuff (Contributor, Author)

@comex: Yes, the compile-time option is not really necessary. I initially wanted to use #ifdef __linux__ but in fact the only requirement is getpid() and, as you pointed out, it could be used on other OSes. I thought about #ifndef _WIN32 but it would be useful for Windows as well; I just need different code for the filename there. Maybe something like:

std::string filename = StringFromFormat("perf-%d.map", GetCurrentProcessId());
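The cross-platform filename idea can be sketched with standard calls only (the helper name is hypothetical; Dolphin's StringFromFormat is replaced by std::to_string here so the sketch stays self-contained):

```cpp
#include <string>
#ifdef _WIN32
#include <windows.h>  // GetCurrentProcessId()
#else
#include <unistd.h>   // getpid()
#endif

// Hypothetical helper: build "<dir>/perf-<pid>.map" on any OS.
std::string PerfMapFilename(const std::string& dir)
{
#ifdef _WIN32
    unsigned long pid = GetCurrentProcessId();
#else
    unsigned long pid = static_cast<unsigned long>(getpid());
#endif
    return dir + "/perf-" + std::to_string(pid) + ".map";
}
```

Passing "/tmp" as the directory recovers the exact path perf hardcodes; other consumers could point it elsewhere.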

{
if (original_address) {
perf_map_file << StringFromFormat(
"%" PRIx64 " %x %s_%x\n",


@Sonicadvance1 (Contributor)

Can we change this over to a namespace instead of a bunch of functions declared in a header, polluting the global namespace? That would also allow us to remove the ugly extern-declared perf_map_file in the header.
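The suggested shape might look like this (a sketch only; the interface is hypothetical and not the PR's final JitRegister.h). The point is that the file handle becomes an implementation detail of the .cpp rather than an extern global visible to every includer:

```cpp
#include <cstdint>
#include <fstream>
#include <string>

namespace JitRegister
{
// Only this translation unit sees the stream; nothing leaks into others.
static std::ofstream s_perf_map_file;

void Init(const std::string& filename)
{
    s_perf_map_file.open(filename);
}

// Emit one "<start-hex> <size-hex> <name>" line, the format perf reads.
void Register(const void* start, uint32_t size, const std::string& name)
{
    if (!s_perf_map_file.is_open())
        return;
    s_perf_map_file << std::hex << reinterpret_cast<uintptr_t>(start) << ' '
                    << size << ' ' << name << '\n';
}

void Shutdown()
{
    s_perf_map_file.close();
}
}  // namespace JitRegister
```

Callers then write JitRegister::Register(...) from any JIT (x86, ARM, DSP, vertex loader) without touching a shared global.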

@randomstuff (Contributor, Author)

I brought the VTune support into JitRegister. It seems the VTune JIT profiling API makes a copy of the method_name for itself, so we don't need to keep this string around.

@lioncash (Member) commented Nov 4, 2014

You would need to add JitRegister.cpp/.h to the Visual Studio project file in Common as well.

if (original_address)
{
snprintf(buf, 100, "%s_%x", name, original_address);
symbol_name = buf;


@Stevoisiak (Contributor)

@dolphin-emu-bot rebuild

@randomstuff (Contributor, Author)

Fixed the include order.

@Stevoisiak (Contributor)

@dolphin-emu-bot rebuild

@randomstuff (Contributor, Author)

I was expecting check-includes.py to exit with 0 when everything is OK; I was wrong. This time it should really be OK.

'perf' is the standard built-in tool for performance analysis on recent
Linux kernels. Its source code is shipped within the kernel repository.

'perf' has basic support for JIT. For each process, it can read a file
named /tmp/perf-$PID.map. This file contains mappings from address
ranges to function names in the format:

  41187e2a 1a EmuCode_804a33fc

with the following entries:

 1. beginning of the range (hexadecimal);
 2. size of the range (hexadecimal);
 3. name of the function.

We supply the PowerPC address of the basic block as function name.

Usage:

    DOLPHIN_PERF_DIR=/tmp dolphin-emu &
    perf record -F99 -p $(pgrep dolphin-emu) --call-graph dwarf
    perf script | stackcollapse-perf.pl | grep EmuCode__ | flamegraph.pl > profile.svg

Issue: perf does not have support for region invalidation. It reads
the file in postprocessing. It probably does not work very well if a
JIT region is reused for another basic block: wrong results should be
expected in this case. Currently, nothing is done to prevent this.

Move the JITed function/basic-block registration logic out of the CPU
subsystem in order to add JIT registration to JITed DSP and
Video/VertexLoader code.

This is necessary in order to add /tmp/perf-$pid.map support to other
JITed code as they need to write to the same file.

@Tilka (Member) commented Nov 24, 2014

"/tmp" is hardcoded into perf.

@randomstuff (Contributor, Author)

@Tilka: Yes, the idea was that it might be useful for people who are not using perf (see @comex) and might not need to store it in /tmp (and even for people on Windows, without /tmp).

@skidau (Contributor) commented Dec 7, 2014

@dolphin-emu-bot rebuild

@randomstuff (Contributor, Author)

Is there anything holding this back from master?

degasus added a commit that referenced this pull request Dec 28, 2014
Add Linux perf JIT support (/tmp/perf-$pid.map)
@degasus degasus merged commit c5a0b6b into dolphin-emu:master Dec 28, 2014
@degasus (Member) commented Dec 28, 2014

Sorry, I had just missed this PR. Though there are still some unsupported features:
We have a far code cache, used for rarely executed code and generated per block, so adding such an entry would be useful as well. We also have a vertex loader JIT which could get a symbol name. Maybe also our shared assembly.

@randomstuff (Contributor, Author)

I'll try to look at this. I have a WIP branch jit-register (I'm not sure it's correct) with support for the vertex loader JIT and the shared assembly JIT as well.
