Early benchmarking of rete versus greedy pattern rewriter #2
Conversation
bollu
commented
Dec 11, 2021
On `rand-program-seed-0.mlir`, the rete engine is roughly 10x slower than the greedy rewriter (2.69s vs 0.27s elapsed):

```
Performance counter stats for '/home/bollu/work/1-hoopl/build/release/bin/hoopl --bench-rete /home/bollu/work/1-hoopl/test/rand-program-seed-0.mlir':

       2,684.85 msec task-clock        #    0.998 CPUs utilized
            379      context-switches  #  141.163 /sec
              1      cpu-migrations    #    0.372 /sec
          5,149      page-faults       #    1.918 K/sec
 8,30,20,19,196      cycles            #    3.092 GHz
 6,66,40,28,715      instructions      #    0.80 insn per cycle
 1,62,77,59,783      branches          #  606.277 M/sec
      45,28,371      branch-misses     #    0.28% of all branches

    2.691142846 seconds time elapsed
    2.353686000 seconds user
    0.319053000 seconds sys
```

```
Performance counter stats for '/home/bollu/work/1-hoopl/build/release/bin/hoopl --bench-greedy /home/bollu/work/1-hoopl/test/rand-program-seed-0.mlir':

         268.81 msec task-clock        #    0.989 CPUs utilized
             59      context-switches  #  219.487 /sec
              0      cpu-migrations    #    0.000 /sec
          4,001      page-faults       #   14.884 K/sec
    83,33,77,093      cycles            #    3.100 GHz
    77,91,95,474      instructions      #    0.93 insn per cycle
    16,33,50,158      branches          #  607.683 M/sec
      25,91,387      branch-misses     #    1.59% of all branches

    0.271878822 seconds time elapsed
    0.122807000 seconds user
    0.146161000 seconds sys
```

The `perf report` for the rete run shows the time going into alpha-memory activation, beta join activation, and `std::list` removal:

```
36.98%  hoopl  hoopl             [.] alpha_memory_activation
31.69%  hoopl  hoopl             [.] BetaTokensMemory::join_activation
14.04%  hoopl  hoopl             [.] std::__cxx11::list<WME*, std::allocator<WME*> >::remove
 1.42%  hoopl  [kernel.vmlinux]  [k] syscall_exit_to_user_mode
 0.97%  hoopl  ld-2.33.so        [.] do_lookup_x
 0.74%  hoopl  [kernel.vmlinux]  [k] entry_SYSCALL_64
 0.73%  hoopl  [kernel.vmlinux]  [k] syscall_return_via_sysret
 0.46%  hoopl  [kernel.vmlinux]  [k] preempt_count_add
 0.43%  hoopl  [kernel.vmlinux]  [k] _raw_read_unlock_irqrestore
 0.41%  hoopl  ld-2.33.so        [.] strcmp
 0.40%  hoopl  [kernel.vmlinux]  [k] n_tty_write
 0.31%  hoopl  [kernel.vmlinux]  [k] ep_poll_callback
 0.30%  hoopl  [kernel.vmlinux]  [k] tty_write
 0.28%  hoopl  hoopl             [.] toRete
 0.28%  hoopl  [kernel.vmlinux]  [k] _raw_spin_lock_irqsave
 0.26%  hoopl  [kernel.vmlinux]  [k] preempt_count_sub
 0.25%  hoopl  [kernel.vmlinux]  [k] _raw_read_lock_irqsave
 0.24%  hoopl  [kernel.vmlinux]  [k] __wake_up_common
 0.23%  hoopl  [kernel.vmlinux]  [k] native_queued_spin_lock_slowpath
 0.18%  hoopl  libc-2.33.so      [.] _int_malloc
 0.17%  hoopl  [kernel.vmlinux]  [k] _raw_spin_unlock_irqrestore
 0.16%  hoopl  [kernel.vmlinux]  [k] queue_work_on
 0.14%  hoopl  [kernel.vmlinux]  [k] __fsnotify_parent
 0.14%  hoopl  [kernel.vmlinux]  [k] apparmor_file_permission
 0.14%  hoopl  [kernel.vmlinux]  [k] vfs_write
 0.14%  hoopl  [kernel.vmlinux]  [k] insert_work
 0.13%  hoopl  [kernel.vmlinux]  [k] __audit_syscall_exit
 0.13%  hoopl  [kernel.vmlinux]  [k] __check_object_size
 0.13%  hoopl  [kernel.vmlinux]  [k] update_rq_clock
 0.13%  hoopl  [kernel.vmlinux]  [k] try_to_wake_up
 0.12%  hoopl  [kernel.vmlinux]  [k] pty_write
 0.12%  hoopl  [kernel.vmlinux]  [k] resched_curr
 0.12%  hoopl  [kernel.vmlinux]  [k] tty_insert_flip_string_fixed_flag
 0.11%  hoopl  ld-2.33.so        [.] _dl_map_object
 0.11%  hoopl  [kernel.vmlinux]  [k] select_task_rq_fair
```
We accelerate the join node by keeping caches. For a test node (test-join `WME[wme-ix] == Token[tok-ix][tok-field-ix]`), keep two data structures:

- `val2WMEs: value -> set<WME>` (invariant: ∀ wme ∈ val2WMEs[v], wme[wme-ix] == v)
- `val2Toks: value -> set<Token>` (invariant: ∀ tok ∈ val2Toks[v], tok[tok-ix][tok-field-ix] == v)

This lets us process new tokens / new WMEs in O(# of real joins). With this change, the perf report on 100,000 nodes looks like:

```
43.01%  hoopl  hoopl             [.] std::__cxx11::list<WME*, std::allocator<WME*> >::remove
 5.89%  hoopl  [kernel.vmlinux]  [k] syscall_exit_to_user_mode
 3.54%  hoopl  ld-2.33.so        [.] do_lookup_x
 2.64%  hoopl  [kernel.vmlinux]  [k] syscall_return_via_sysret
 2.21%  hoopl  [kernel.vmlinux]  [k] entry_SYSCALL_64
 1.47%  hoopl  ld-2.33.so        [.] strcmp
 1.35%  hoopl  [kernel.vmlinux]  [k] n_tty_write
 1.23%  hoopl  [kernel.vmlinux]  [k] _raw_spin_lock_irqsave
 1.13%  hoopl  hoopl             [.] JoinNode::alpha_activation
...
```

`list::remove` now dominates, so the next step is to replace the huge `std::list` of WMEs in `ReteContext` with something that supports quick removal, like a set.
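A minimal C++ sketch of both fixes (the per-test-node value index, and a set-backed WME store for O(1) average removal). The `WME`/`Token` structs here are simplified placeholders, not the actual hoopl types, and the class names are hypothetical:

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Simplified placeholders for the real hoopl WME/Token types.
struct WME { std::vector<std::string> fields; };
struct Token { std::vector<const WME*> wmes; };

// Per-test-node index for the join WME[wmeIx] == Token[tokIx][tokFieldIx]:
// hash each side on the tested value, so a new token (or WME) only ever
// meets the WMEs (or tokens) it actually joins with.
struct JoinIndex {
  int wmeIx = 0;       // WME field under test
  int tokIx = 0;       // which WME of the token is under test
  int tokFieldIx = 0;  // field of that WME under test
  // Invariant: forall wme in val2WMEs[v], wme->fields[wmeIx] == v.
  std::unordered_map<std::string, std::unordered_set<const WME*>> val2WMEs;
  // Invariant: forall tok in val2Toks[v], tok->wmes[tokIx]->fields[tokFieldIx] == v.
  std::unordered_map<std::string, std::unordered_set<const Token*>> val2Toks;

  void addWME(const WME *w) { val2WMEs[w->fields[wmeIx]].insert(w); }
  void addToken(const Token *t) {
    val2Toks[t->wmes[tokIx]->fields[tokFieldIx]].insert(t);
  }
  // All WMEs a new token joins with: O(# of real joins), not a scan of
  // the whole alpha memory.
  std::unordered_set<const WME*> joinsFor(const Token *t) const {
    auto it = val2WMEs.find(t->wmes[tokIx]->fields[tokFieldIx]);
    return it == val2WMEs.end() ? std::unordered_set<const WME*>{} : it->second;
  }
};

// Likewise, storing the working memory as a hash set instead of a std::list
// gives O(1) average erase instead of list::remove's O(n) scan.
struct ReteContextWMEs {
  std::unordered_set<const WME*> wmes;
  void add(const WME *w) { wmes.insert(w); }
  void remove(const WME *w) { wmes.erase(w); }
};
```

The symmetric lookup (`val2Toks` queried on a new WME's tested field) follows the same shape, so both activation directions stay proportional to the number of real matches.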
Greedy: 0.45s / rete: 0.76s. I need to bench how much of the difference comes from `fromRete`, which spends a while rematerializing the internal rete state back out into MLIR.

```
Performance counter stats for '/home/bollu/work/1-hoopl/build/release/bin/hoopl --bench-greedy /home/bollu/work/1-hoopl/test/rand-program-seed-0.mlir':

         492.89 msec task-clock        #    0.999 CPUs utilized
              2      context-switches  #    4.058 /sec
              0      cpu-migrations    #    0.000 /sec
         20,683      page-faults       #   41.963 K/sec
  1,654,417,515      cycles            #    3.357 GHz
  2,501,363,745      instructions      #    1.51 insn per cycle
    533,254,192      branches          #    1.082 G/sec
      4,885,449      branch-misses     #    0.92% of all branches

    0.493355440 seconds time elapsed
    0.452724000 seconds user
    0.039877000 seconds sys
```

```
Performance counter stats for '/home/bollu/work/1-hoopl/build/release/bin/hoopl --bench-rete /home/bollu/work/1-hoopl/test/rand-program-seed-0.mlir':

         761.01 msec task-clock        #    0.999 CPUs utilized
              9      context-switches  #   11.826 /sec
              0      cpu-migrations    #    0.000 /sec
         41,724      page-faults       #   54.827 K/sec
  2,496,300,059      cycles            #    3.280 GHz
  3,568,834,203      instructions      #    1.43 insn per cycle
    772,742,616      branches          #    1.015 G/sec
      6,290,441      branch-misses     #    0.81% of all branches

    0.761639491 seconds time elapsed
    0.707385000 seconds user
    0.053313000 seconds sys
```
`--bench-pdl` currently aborts inside PDL bytecode generation with an `"invalid PDL Interpreter module"` assertion:

```
╭─siddu_druid@siddharth-lean ~/phd/mlir-hoopl-rete/build/bin ‹master›
╰─$ LLVM_SYMBOLIZER_PATH=`which llvm-symbolizer-7` ./rete --bench-pdl ../../test/simple.mlir
rete: /home/siddu_druid/phd/mlir-hoopl-rete/llvm-project/mlir/lib/Rewrite/ByteCode.cpp:489: void (anonymous namespace)::Generator::generate(mlir::ModuleOp): Assertion `matcherFunc && rewriterModule && "invalid PDL Interpreter module"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.  Program arguments: ./rete --bench-pdl ../../test/simple.mlir
#0 0x000000000049ad43 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (./rete+0x49ad43)
#1 0x00000000004989fe llvm::sys::RunSignalHandlers() (./rete+0x4989fe)
#2 0x000000000049b386 SignalHandler(int) (./rete+0x49b386)
#3 0x00007f5d7e0a5ef0 __restore_rt (/nix/store/563528481rvhc5kxwipjmg6rqrl95mdx-glibc-2.33-56/lib/libpthread.so.0+0x12ef0)
#4 0x00007f5d7dbc7baa __GI_raise (/nix/store/563528481rvhc5kxwipjmg6rqrl95mdx-glibc-2.33-56/lib/libc.so.6+0x3bbaa)
#5 0x00007f5d7dbb2523 __GI_abort (/nix/store/563528481rvhc5kxwipjmg6rqrl95mdx-glibc-2.33-56/lib/libc.so.6+0x26523)
#6 0x00007f5d7dbb241f _nl_load_domain.cold.0 (/nix/store/563528481rvhc5kxwipjmg6rqrl95mdx-glibc-2.33-56/lib/libc.so.6+0x2641f)
#7 0x00007f5d7dbc05f2 (/nix/store/563528481rvhc5kxwipjmg6rqrl95mdx-glibc-2.33-56/lib/libc.so.6+0x345f2)
#8 0x0000000001226e2d mlir::detail::PDLByteCode::PDLByteCode(mlir::ModuleOp, llvm::StringMap<std::function<mlir::LogicalResult (llvm::ArrayRef<mlir::PDLValue>, mlir::ArrayAttr, mlir::PatternRewriter&)>, llvm::MallocAllocator>, llvm::StringMap<std::function<void (llvm::ArrayRef<mlir::PDLValue>, mlir::ArrayAttr, mlir::PatternRewriter&, mlir::PDLResultList&)>, llvm::MallocAllocator>) (./rete+0x1226e2d)
#9 0x000000000121ef05 std::_MakeUniq<mlir::detail::PDLByteCode>::__single_object std::make_unique<mlir::detail::PDLByteCode, mlir::ModuleOp&, llvm::StringMap<std::function<mlir::LogicalResult (llvm::ArrayRef<mlir::PDLValue>, mlir::ArrayAttr, mlir::PatternRewriter&)>, llvm::MallocAllocator>, llvm::StringMap<std::function<void (llvm::ArrayRef<mlir::PDLValue>, mlir::ArrayAttr, mlir::PatternRewriter&, mlir::PDLResultList&)>, llvm::MallocAllocator> >(mlir::ModuleOp&, llvm::StringMap<std::function<mlir::LogicalResult (llvm::ArrayRef<mlir::PDLValue>, mlir::ArrayAttr, mlir::PatternRewriter&)>, llvm::MallocAllocator>&&, llvm::StringMap<std::function<void (llvm::ArrayRef<mlir::PDLValue>, mlir::ArrayAttr, mlir::PatternRewriter&, mlir::PDLResultList&)>, llvm::MallocAllocator>&&) (./rete+0x121ef05)
[1] 23174 abort  LLVM_SYMBOLIZER_PATH=`which llvm-symbolizer-7` ./rete --bench-pdl
```
Running the PDL bytecode test pass on `pdl-simple.mlir` with stock `mlir-opt` also crashes, segfaulting in `replaceAllUsesWith` during the rewrite (trailing frames were garbled in the paste and are kept as-is):

```
╭─siddu_druid@siddharth-lean ~/phd/mlir-hoopl-rete/test ‹master●›
╰─$ mlir-opt pdl-simple.mlir -allow-unregistered-dialect -test-pdl-bytecode-pass                                    148 ↵
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.  Program arguments: mlir-opt pdl-simple.mlir -allow-unregistered-dialect -test-pdl-bytecode-pass
#0 0x00000000008be623 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/home/siddu_druid/phd/mlir-hoopl-rete/llvm-project/build/bin/mlir-opt+0x8be623)
#1 0x00000000008bc2de llvm::sys::RunSignalHandlers() (/home/siddu_druid/phd/mlir-hoopl-rete/llvm-project/build/bin/mlir-opt+0x8bc2de)
#2 0x00000000008bec16 SignalHandler(int) (/home/siddu_druid/phd/mlir-hoopl-rete/llvm-project/build/bin/mlir-opt+0x8bec16)
#3 0x00007fad2398d730 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x12730)
#4 0x0000000000faa046 std::enable_if<!(std::is_convertible<mlir::ValueRange&, mlir::Operation*>::value), void>::type mlir::ResultRange::replaceAllUsesWith<mlir::ValueRange&>(mlir::ValueRange&) (/home/siddu_druid/phd/mlir-hoopl-rete/llvm-project/build/bin/mlir-opt+0xfaa046)
#5 0x0000000001862fac mlir::RewriterBase::replaceOp(mlir::Operation*, mlir::ValueRange) (/home/siddu_druid/phd/mlir-hoopl-rete/llvm-project/build/bin/mlir-opt+0x1862fac)
#6 0x00000000018d6ad5 (anonymous namespace)::ByteCodeExecutor::execute(mlir::PatternRewriter&, llvm::SmallVectorImpl<mlir::detail::PDLByteCode::MatchResult>*, llvm::Optional<mlir::Location>) (/home/siddu_druid/phd/mlir-hoopl-rete/llvm-project/build/bin/mlir-opt+0x18d6ad5)
#7 0x00000000018d8ec1 mlir::detail::PDLByteCode::rewrite(mlir::PatternRewriter&, mlir::detail::PDLByteCode::MatchResult const&, mlir::detail::PDLByteCodeMutableState&) const (/home/siddu_druid/phd/mlir-hoopl-rete/llvm-project/build/bin/mlir-opt+0x18d8ec1)
#8 0x00000000018f15b6 mlir::PatternApplicator::matchAndRewrite(mlir::Operation*, mlir::PatternRewriter&, llvm::function_ref<bool (mlir::Pattern const&)>, llvm::function_ref<void (mlir::Pattern const&)>, llvm::function_ref<mlir::LogicalResult (mlir::Pattern const&)>) (/home/siddu_druid/phd/mlir-hoopl-rete/llvm-project/build/bin/mlir-opt+0x18f15b6)
#9 0x00000000017aab4c mlir::applyPatternsAndFoldGreedily(llvm::MutableArrayRef<mlir::Region>, mlir::FrozenRewritePatternSet const&, mlir::GreedyRewriteConfig) (/home/siddu_druid/phd/mlir-hoopl-rete/llvm-project/build/bin/mlir-opt+0x17aab4c)
signed int, mlir::PassInstrumentor*, mlir::PassInstrumentation::PipelineParentInfo const*) (/home/siddu_druid/phd/mlir-hoopl-rete/llvm-project/build/bin/mlir-opt+0x174f404)
+0x1725bf0)
try&, llvm::ThreadPool*) (/home/siddu_druid/phd/mlir-hoopl-rete/llvm-project/build/bin/mlir-opt+0x1723dfa)
ol, bool, bool) (/home/siddu_druid/phd/mlir-hoopl-rete/llvm-project/build/bin/mlir-opt+0x1723aaa)
[2] 27957 segmentation fault  mlir-opt pdl-simple.mlir -allow-unregistered-dialect -test-pdl-bytecode-pass
╭─siddu_druid@siddharth-lean ~/phd/mlir-hoopl-rete/test ‹master●›
```