Benchmarking

Alan Jeffrey edited this page Nov 3, 2015 · 15 revisions

Dromaeo

Running all the tests

Benchmarking should be done with a release build:

./mach build --release

There is a test runner which downloads and runs all the Dromaeo tests in headless mode:

./mach test-dromaeo

Once the tests are downloaded locally, you can run them inside Servo:

./target/release/servo tests/dromaeo/dromaeo/web/index.html

Running a single test

Before running individual tests, you need to create the test harness:

cp tests/dromaeo/dromaeo/web/htmlrunner.js tests/dromaeo/dromaeo/

The default test harness runs each test once. If you want to run each test more than once (and you probably do), edit the copied harness, tests/dromaeo/dromaeo/htmlrunner.js, for example:

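// Use the parent frame's hooks when they exist, otherwise fall back to local stubs.
var startTest = parent.startTest || function(){};
var test = parent.test || function(name, fn){
  console.log(name);
  // Run each test body 1000 times instead of once.
  for (var i = 0; i < 1000; i++) { fn(); }
};
var endTest = parent.endTest || function(){};
var prep = parent.prep || function(fn){ fn(); };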

You can then run individual test pages, for example:

./target/release/servo --exit tests/dromaeo/dromaeo/tests/dom-traverse.html 
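
To run several test pages in one go, the command above can be wrapped in a small shell function. This is only a sketch: run_pages and the SERVO override are hypothetical names introduced here, not part of Servo or mach.

```shell
# Hypothetical helper: run each given test page in turn with the release
# binary, using the same flags as the single-page command above.
# Set SERVO to point at a different binary.
run_pages() {
    servo=${SERVO:-./target/release/servo}
    for page in "$@"; do
        echo "== $page"
        # Report a failing page on stderr, then carry on with the next one.
        "$servo" --exit "$page" || echo "failed: $page" >&2
    done
}
```

For example, run_pages tests/dromaeo/dromaeo/tests/dom-*.html would run every page matching that glob; the glob assumes the other DOM test pages sit next to dom-traverse.html.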

Profiling with google-perftools

On a Debian-based system, install the google perftools package:

sudo apt-get install google-perftools

Run servo on a test page with the profiling library:

LD_PRELOAD=/usr/lib/libprofiler.so.0 \
CPUPROFILE=/tmp/servo-cpu.log \
target/release/servo --exit tests/dromaeo/dromaeo/tests/dom-traverse.html

Generate a call graph from the log:

google-pprof --svg --focus=dom target/release/servo \
/tmp/servo-cpu.log > /tmp/servo-cpu.svg

The flag --focus=dom restricts the call graph to paths that go through a function whose name matches the regular expression dom, which here means calls involving the dom namespace.
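
The two profiling steps can be combined into one function. This is a sketch that assumes the library path and tool names used above; profile_page and the SERVO, PPROF, PROFILER_LIB and OUT variables are hypothetical names introduced for this example.

```shell
# Hypothetical wrapper around the two commands above: record a CPU profile
# for one page, then render the call graph as SVG.
profile_page() {
    page=$1
    focus=${2:-dom}                      # regular expression passed to --focus
    servo=${SERVO:-target/release/servo}
    pprof=${PPROF:-google-pprof}
    lib=${PROFILER_LIB:-/usr/lib/libprofiler.so.0}
    out=${OUT:-/tmp/servo-cpu}           # writes $out.log and $out.svg

    # Step 1: run servo with the profiling library preloaded.
    LD_PRELOAD=$lib CPUPROFILE=${out}.log "$servo" --exit "$page" || return 1
    # Step 2: turn the profile log into a call graph.
    "$pprof" --svg --focus="$focus" "$servo" "${out}.log" > "${out}.svg"
}
```

Calling profile_page tests/dromaeo/dromaeo/tests/dom-traverse.html reproduces the commands above and leaves the graph in /tmp/servo-cpu.svg; a second argument selects a different focus expression.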

Core DOM performance

Servo and Firefox running the DOM Core tests:

[Figure: Dromaeo DOM Core results for Servo] [Figure: Dromaeo DOM Core results for Firefox]

Speculating about the cases where Firefox is getting significantly better performance:

  • getAttribute: a difference this significant is probably due to JIT optimization in SpiderMonkey, resulting in the attribute read being hoisted out of the loop (see https://github.com/servo/servo/pull/8040).

  • DOM Query: the implementation of HTMLCollection could benefit from caching to avoid traversing the document tree on every access; the difficulty is triggering cache invalidation when the DOM is modified (see https://github.com/servo/servo/issues/1916 and https://github.com/servo/servo/issues/3381). Caching and cache invalidation are implemented in https://github.com/servo/servo/pull/8227, which gets about a 1000x speed-up on the relevant DOM query tests.

  • DOM Traversal: From looking at the call graph, a surprising amount of time is spent in item and len. The issue with len appears to be rooting, since everything else compiles to a field access:

[Figure: Dromaeo dom-traverse call graph]

Looking at the generated x86, you can see Vec-growth code (a call into raw_vec's double) mixed in with what should really just be a pointer indirection:

gdb target/release/servo
disassemble 'dom::nodelist::_$LT$impl$GT$::len::hfe60577417cf4a8eGC7' 
Dump of assembler code for function _ZN3dom8nodelist13_$LT$impl$GT$3len20hfe60577417cf4a8eGC7E:
...
0x0000000000951474 <+228>:	mov    %r15,%rdi
0x0000000000951477 <+231>:	callq  0x62b8d0 <_ZN7raw_vec13_$LT$impl$GT$6double6double21h16855681377121452665E>
0x000000000095147c <+236>:	mov    0x10(%r15),%rbp
0x0000000000951480 <+240>:	jmpq   0x9513e0 <_ZN3dom8nodelist13_$LT$impl$GT$3len20hfe60577417cf4a8eGC7E+80>
...
End of assembler dump.
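
The same disassembly can be produced without an interactive session by using gdb's -batch and -ex options. This is a sketch; disassemble_symbol and the GDB override are hypothetical names, and the mangled symbol has to be passed in single quotes so that the shell does not try to expand $LT and $GT.

```shell
# Hypothetical helper: print the disassembly of one symbol and exit.
# $1 is the binary, $2 is the symbol name as gdb should see it.
disassemble_symbol() {
    gdb=${GDB:-gdb}
    # -batch exits after running the commands; -ex supplies one command.
    "$gdb" -batch -ex "disassemble '$2'" "$1"
}
```

For the function above this would be disassemble_symbol target/release/servo 'dom::nodelist::_$LT$impl$GT$::len::hfe60577417cf4a8eGC7'. The hash suffix is specific to the build shown here and will differ in other builds.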

Core JS performance

Comparing Servo's performance with Firefox's on the core JS tests, FF is about 30% faster on some tests:

[Figure: Dromaeo JS Core results for Firefox] [Figure: Dromaeo JS Core results for Servo]

These tests are pure JS tests, so they can be run in the JS shell. In an FF build, the shell is at obj-architecture/dist/bin/js, and in a Servo build, it's at target/release/build/mozjs_sys-hash/out/dist/bin/js. Looking at the call graphs, the main difference is in low-level operations like memchr:

[Figure: Dromaeo JS Core call graph for Firefox's JS shell] [Figure: Dromaeo JS Core call graph for Servo's JS shell]

The compiler command that generates the js executable is recorded, for Servo, in target/release/build/mozjs_sys-hash/output:

/home/ajeffrey/gdrive/scratch/servo/target/release/build/mozjs_sys-1a3ebeaec3a7c092/out/_virtualenv/bin/python /home/ajeffrey/gdrive/scratch/mozjs/mozjs/config/expandlibs_exec.py --uselist -- \
g++ -o js  -Wall -Wsign-compare -Wtype-limits -Wno-invalid-offsetof -Wcast-align \
-fno-rtti -fno-exceptions -fno-math-errno -std=gnu++0x -pthread -pipe  -DNDEBUG -DTRIMMED -g \
-freorder-blocks -O3 -fomit-frame-pointer  Unified_cpp_js_src_shell0.o   -lpthread  \
-Wl,-z,noexecstack -Wl,-z,text -Wl,--build-id -B /home/ajeffrey/gdrive/scratch/servo/target/release/build/mozjs_sys-1a3ebeaec3a7c092/out/build/unix/gold \
-Wl,-rpath-link,../../../dist/bin -Wl,-rpath-link,/usr/local/lib ../../../mozglue/build/libmozglue.a ../../../js/src/editline/libeditline.a ../../../js/src/libjs_static.a \
-lm -ldl  -lz -lm -ldl 

This uses libjs_static, which is built with:

/home/ajeffrey/gdrive/scratch/servo/target/release/build/mozjs_sys-1a3ebeaec3a7c092/out/_virtualenv/bin/python /home/ajeffrey/gdrive/scratch/mozjs/mozjs/config/expandlibs_exec.py --extract -- ar crs libjs_static.a \
RegExp.o Parser.o ExecutableAllocatorPosix.o jsarray.o jsatom.o jsmath.o jsutil.o pm_linux.o TraceLogging.o TraceLoggingGraph.o TraceLoggingTypes.o \
Unified_cpp_js_src0.o Unified_cpp_js_src1.o Unified_cpp_js_src10.o Unified_cpp_js_src11.o Unified_cpp_js_src12.o Unified_cpp_js_src2.o Unified_cpp_js_src3.o Unified_cpp_js_src4.o Unified_cpp_js_src5.o Unified_cpp_js_src6.o Unified_cpp_js_src7.o Unified_cpp_js_src8.o Unified_cpp_js_src9.o \
../../mozglue/build/libmozglue.a ../../config/external/icu/libicu.a ../../config/external/nspr/libnspr.a ../../config/external/zlib/libzlib.a 

The FF build doesn't keep a log, but it can be run with the --verbose flag, using script to save its output:

/home/ajeffrey/tmp/mozilla-central-235004/obj-x86_64-unknown-linux-gnu/_virtualenv/bin/python /home/ajeffrey/tmp/mozilla-central-235004/config/expandlibs_exec.py --extract -- ar crs libjs_static.a \
RegExp.o CTypes.o Library.o Parser.o ExecutableAllocatorPosix.o jsarray.o jsatom.o jsmath.o jsutil.o pm_linux.o TraceLogging.o TraceLoggingGraph.o TraceLoggingTypes.o \
Unified_cpp_js_src0.o Unified_cpp_js_src1.o Unified_cpp_js_src10.o Unified_cpp_js_src11.o Unified_cpp_js_src12.o Unified_cpp_js_src2.o Unified_cpp_js_src3.o Unified_cpp_js_src4.o Unified_cpp_js_src5.o Unified_cpp_js_src6.o Unified_cpp_js_src7.o Unified_cpp_js_src8.o Unified_cpp_js_src9.o \
../../config/external/ffi/libffi.a ../../config/external/icu/libicu.a ../../config/external/libmemory.a ../../config/external/nspr/libnspr.a ../../config/external/zlib/libzlib.a

The important difference is that the FF build links libmemory, and the Servo build doesn't. Libmemory is the shim SpiderMonkey uses to reach jemalloc, and indeed running strings on the binaries shows that the FF build has lots of je_malloc symbols, but the Servo build has none.
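
The two checks described above (saving the verbose build output and looking for je_malloc in a binary) might look like this. This is a sketch that assumes the build is driven by ./mach build, the util-linux version of script (which takes the command via -c), and binutils strings; the function names and the MACH, SCRIPT and STRINGS overrides are hypothetical.

```shell
# Hypothetical helper: run a verbose build under script(1) so that the
# compiler and ar command lines end up in a log file.
log_verbose_build() {
    log=${1:-build.log}
    mach=${MACH:-./mach}
    script=${SCRIPT:-script}
    "$script" -c "$mach build --verbose" "$log"
}

# Hypothetical helper: count the printable strings in a binary that
# mention je_malloc. Prints 0 when there are none.
count_jemalloc_strings() {
    strings=${STRINGS:-strings}
    "$strings" "$1" | grep -c 'je_malloc'
}
```

Running count_jemalloc_strings on obj-architecture/dist/bin/js and on target/release/build/mozjs_sys-hash/out/dist/bin/js (with the architecture and hash filled in for the local build) gives a quick comparison of the two shells.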

Enabling jemalloc for stand-alone js is discussed in https://bugzilla.mozilla.org/show_bug.cgi?id=1134039, and ported to Servo in https://github.com/servo/mozjs/pull/61. This change, together with enabling native regexps (https://github.com/servo/rust-mozjs/pull/210), brings Servo's performance on a par with FF:

[Figure: Dromaeo JS Core results for modified Servo]

One remaining issue is whether or not to disable the gczeal option in Servo (https://github.com/servo/mozjs/pull/60). Zealous GC is a useful debugging tool, but seems to give about a 5% hit in performance when run on allocation-heavy tests.
