Early benchmarking of rete versus greedy pattern rewriter #2

Merged
merged 11 commits into from
Dec 15, 2021

Commits on Dec 11, 2021

  1. Early benchmarking: rete is much slower

    ------------
    Performance counter stats for
    '/home/bollu/work/1-hoopl/build/release/bin/hoopl
       --bench-rete
       /home/bollu/work/1-hoopl/test/rand-program-seed-0.mlir':
    
              2,684.85 msec task-clock                #    0.998 CPUs utilized
                   379      context-switches          #  141.163 /sec
                     1      cpu-migrations            #    0.372 /sec
                 5,149      page-faults               #    1.918 K/sec
        8,30,20,19,196      cycles                    #    3.092 GHz
        6,66,40,28,715      instructions              #    0.80  insn per cycle
        1,62,77,59,783      branches                  #  606.277 M/sec
             45,28,371      branch-misses             #    0.28% of all branches
    
           2.691142846 seconds time elapsed
    
           2.353686000 seconds user
           0.319053000 seconds sys
    ------------
    
    Performance counter stats for
    '/home/bollu/work/1-hoopl/build/release/bin/hoopl
      --bench-greedy
      /home/bollu/work/1-hoopl/test/rand-program-seed-0.mlir':
    
                268.81 msec task-clock                #    0.989 CPUs utilized
                    59      context-switches          #  219.487 /sec
                     0      cpu-migrations            #    0.000 /sec
                 4,001      page-faults               #   14.884 K/sec
          83,33,77,093      cycles                    #    3.100 GHz
          77,91,95,474      instructions              #    0.93  insn per cycle
          16,33,50,158      branches                  #  607.683 M/sec
             25,91,387      branch-misses             #    1.59% of all branches
    
           0.271878822 seconds time elapsed
    
           0.122807000 seconds user
           0.146161000 seconds sys
    ----------------------------------
    
      36.98%  hoopl    hoopl                                [.] alpha_memory_activation
      31.69%  hoopl    hoopl                                [.] BetaTokensMemory::join_activation
      14.04%  hoopl    hoopl                                [.] std::__cxx11::list<WME*, std::allocator<WME*> >::remove
       1.42%  hoopl    [kernel.vmlinux]                     [k] syscall_exit_to_user_mode
       0.97%  hoopl    ld-2.33.so                           [.] do_lookup_x
       0.74%  hoopl    [kernel.vmlinux]                     [k] entry_SYSCALL_64
       0.73%  hoopl    [kernel.vmlinux]                     [k] syscall_return_via_sysret
       0.46%  hoopl    [kernel.vmlinux]                     [k] preempt_count_add
       0.43%  hoopl    [kernel.vmlinux]                     [k] _raw_read_unlock_irqrestore
       0.41%  hoopl    ld-2.33.so                           [.] strcmp
       0.40%  hoopl    [kernel.vmlinux]                     [k] n_tty_write
       0.31%  hoopl    [kernel.vmlinux]                     [k] ep_poll_callback
       0.30%  hoopl    [kernel.vmlinux]                     [k] tty_write
       0.28%  hoopl    hoopl                                [.] toRete
       0.28%  hoopl    [kernel.vmlinux]                     [k] _raw_spin_lock_irqsave
       0.26%  hoopl    [kernel.vmlinux]                     [k] preempt_count_sub
       0.25%  hoopl    [kernel.vmlinux]                     [k] _raw_read_lock_irqsave
       0.24%  hoopl    [kernel.vmlinux]                     [k] __wake_up_common
       0.23%  hoopl    [kernel.vmlinux]                     [k] native_queued_spin_lock_slowpath
       0.18%  hoopl    libc-2.33.so                         [.] _int_malloc
       0.17%  hoopl    [kernel.vmlinux]                     [k] _raw_spin_unlock_irqrestore
       0.16%  hoopl    [kernel.vmlinux]                     [k] queue_work_on
       0.14%  hoopl    [kernel.vmlinux]                     [k] __fsnotify_parent
       0.14%  hoopl    [kernel.vmlinux]                     [k] apparmor_file_permission
       0.14%  hoopl    [kernel.vmlinux]                     [k] vfs_write
       0.14%  hoopl    [kernel.vmlinux]                     [k] insert_work
       0.13%  hoopl    [kernel.vmlinux]                     [k] __audit_syscall_exit
       0.13%  hoopl    [kernel.vmlinux]                     [k] __check_object_size
       0.13%  hoopl    [kernel.vmlinux]                     [k] update_rq_clock
       0.13%  hoopl    [kernel.vmlinux]                     [k] try_to_wake_up
       0.12%  hoopl    [kernel.vmlinux]                     [k] pty_write
       0.12%  hoopl    [kernel.vmlinux]                     [k] resched_curr
       0.12%  hoopl    [kernel.vmlinux]                     [k] tty_insert_flip_string_fixed_flag
       0.11%  hoopl    ld-2.33.so                           [.] _dl_map_object
       0.11%  hoopl    [kernel.vmlinux]                     [k] select_task_rq_fair
    bollu committed Dec 11, 2021
    bc96ee9

Commits on Dec 12, 2021

  1. 34d8537
  2. more gitignores

    bollu committed Dec 12, 2021
    cf9fc09
  3. 1e5905f
  4. 7d2c79e

Commits on Dec 15, 2021

  1. 2428998
  2. [WIP] accelerate join node by keeping caches

    For a join node with test (test-join WME[wme-ix] == Token[tok-ix][tok-field-ix]),
    keep two data structures:
    - val2WMEs: value -> set<WME>
         (invariant: ∀ wme ∈ val2WMEs[v], wme[wme-ix] == v)
    
    - val2Toks: value -> set<Token>
         (invariant: ∀ tok ∈ val2Toks[v], tok[tok-ix][tok-field-ix] == v)
    
    This lets us process each new token or new WME in O(# of real joins),
    instead of scanning every candidate on the other side of the join.
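
The two caches described above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the `WME` and `Token` layouts, the `IndexedJoin` struct, and the `matches` vector (standing in for activating child nodes) are all simplifying assumptions. Only the names `val2WMEs` and `val2Toks` and their invariants come from the commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical, simplified types: a WME is a triple of integer fields,
// and a Token is a sequence of WME pointers.
struct WME { int fields[3]; };
struct Token { std::vector<const WME*> wmes; };

// Sketch of a join node testing WME[wme_ix] == Token[tok_ix][tok_field_ix].
struct IndexedJoin {
  int wme_ix, tok_ix, tok_field_ix;
  // Invariant: every w in val2WMEs[v] has w->fields[wme_ix] == v.
  std::unordered_map<int, std::unordered_set<const WME*>> val2WMEs;
  // Invariant: every t in val2Toks[v] has
  // t->wmes[tok_ix]->fields[tok_field_ix] == v.
  std::unordered_map<int, std::unordered_set<const Token*>> val2Toks;
  // Successful joins so far (assumption: stands in for child activation).
  std::vector<std::pair<const Token*, const WME*>> matches;

  // New WME: index it, then visit only the tokens with the same key.
  void add_wme(const WME* w) {
    int v = w->fields[wme_ix];
    val2WMEs[v].insert(w);
    auto it = val2Toks.find(v);
    if (it == val2Toks.end()) return;
    for (const Token* t : it->second) matches.emplace_back(t, w);
  }

  // New token: index it, then visit only the WMEs with the same key.
  void add_token(const Token* t) {
    int v = t->wmes[tok_ix]->fields[tok_field_ix];
    val2Toks[v].insert(t);
    auto it = val2WMEs.find(v);
    if (it == val2WMEs.end()) return;
    for (const WME* w : it->second) matches.emplace_back(t, w);
  }
};

// Small demo: two tokens, two WMEs, exactly one pair satisfies the test.
inline std::size_t demo_indexed_join() {
  WME a{{1, 2, 3}}, b{{7, 8, 9}}, c{{3, 0, 0}}, d{{4, 0, 0}};
  Token t1{{&a}}, t2{{&b}};
  IndexedJoin j{0, 0, 2, {}, {}, {}};  // test: WME[0] == Token[0][2]
  j.add_token(&t1);                    // indexed under key 3
  j.add_token(&t2);                    // indexed under key 9
  j.add_wme(&c);                       // key 3: joins with t1
  j.add_wme(&d);                       // key 4: joins with nothing
  return j.matches.size();
}
```

Each insertion does one hash lookup plus work proportional to the number of entries stored under the same key, which is where the O(# of real joins) bound comes from.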
    
    This makes our perf report look like:
    
    ```
      43.01%  hoopl    hoopl                             [.] std::__cxx11::list<WME*, std::allocator<WME*> >::remove
       5.89%  hoopl    [kernel.vmlinux]                  [k] syscall_exit_to_user_mode
       3.54%  hoopl    ld-2.33.so                        [.] do_lookup_x
       2.64%  hoopl    [kernel.vmlinux]                  [k] syscall_return_via_sysret
       2.21%  hoopl    [kernel.vmlinux]                  [k] entry_SYSCALL_64
       1.47%  hoopl    ld-2.33.so                        [.] strcmp
       1.35%  hoopl    [kernel.vmlinux]                  [k] n_tty_write
       1.23%  hoopl    [kernel.vmlinux]                  [k] _raw_spin_lock_irqsave
       1.13%  hoopl    hoopl                             [.] JoinNode::alpha_activation
       ...
    ```
    
    Next step: replace the huge list of WMEs in `ReteContext` with something more
    efficient, such as a set, so that removal is quick.
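
A rough sketch of what a set-backed store could look like is below. `WorkingMemory` is a hypothetical stand-in for the relevant part of `ReteContext`, and the `WME` layout is assumed. `std::list::remove` compares against every element, while `std::unordered_set::erase` by key is constant time on average.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>

struct WME { int fields[3]; };  // hypothetical simplified WME

// Hypothetical stand-in for the WME storage inside ReteContext.
struct WorkingMemory {
  std::unordered_set<WME*> wmes;

  void add(WME* w) { wmes.insert(w); }

  // Average O(1) erase by pointer value; returns whether w was present.
  bool remove(WME* w) { return wmes.erase(w) == 1; }

  std::size_t size() const { return wmes.size(); }
};

// Small demo: removing the same WME twice only succeeds the first time.
inline std::size_t demo_working_memory() {
  WME a{{1, 2, 3}}, b{{4, 5, 6}};
  WorkingMemory wm;
  wm.add(&a);
  wm.add(&b);
  bool removed = wm.remove(&a);
  bool removed_again = wm.remove(&a);
  assert(removed && !removed_again);
  return wm.size();
}
```

One trade-off to keep in mind: an unordered set does not preserve insertion order, so this only fits if nothing depends on the order of the WME list.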
    bollu committed Dec 15, 2021
    9459eae
  3. Rete now keeps up with the greedy rewriter asymptotically.

    Greedy: 0.45 s user (0.49 s elapsed) / rete: 0.71 s user (0.76 s elapsed)
    
    I need to bench how much of the difference comes from `fromRete`, which
    spends a while rematerializing the internal rete state out into MLIR.
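
One way to isolate the cost of a single phase such as `fromRete` is to wrap it in a wall-clock timer. The helper below is a generic sketch using `std::chrono`; the name `time_phase_seconds` is made up for illustration and nothing here reflects how the benchmark harness is actually written.

```cpp
#include <cassert>
#include <chrono>
#include <utility>

// Runs `phase` once and returns the elapsed wall-clock time in seconds.
template <typename F>
double time_phase_seconds(F&& phase) {
  auto start = std::chrono::steady_clock::now();
  std::forward<F>(phase)();
  auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double>(stop - start).count();
}

// Small demo: time a trivial loop and check it actually ran.
inline bool demo_time_phase() {
  long sum = 0;
  double secs = time_phase_seconds([&] {
    for (int i = 0; i < 1000; ++i) sum += i;
  });
  assert(sum == 499500);
  return secs >= 0.0;
}
```

Wrapping the `fromRete` call in such a lambda and subtracting the result from the total rete time would give a first estimate of how much of the gap is rematerialization rather than matching.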
    
    ----
    
     Performance counter stats for '/home/bollu/work/1-hoopl/build/release/bin/hoopl --bench-greedy /home/bollu/work/1-hoopl/test/rand-program-seed-0.mlir':
    
                492.89 msec task-clock                #    0.999 CPUs utilized
                     2      context-switches          #    4.058 /sec
                     0      cpu-migrations            #    0.000 /sec
                20,683      page-faults               #   41.963 K/sec
         1,654,417,515      cycles                    #    3.357 GHz
         2,501,363,745      instructions              #    1.51  insn per cycle
           533,254,192      branches                  #    1.082 G/sec
             4,885,449      branch-misses             #    0.92% of all branches
    
           0.493355440 seconds time elapsed
    
           0.452724000 seconds user
           0.039877000 seconds sys
    
     -----
    
     Performance counter stats for '/home/bollu/work/1-hoopl/build/release/bin/hoopl --bench-rete /home/bollu/work/1-hoopl/test/rand-program-seed-0.mlir':
    
                761.01 msec task-clock                #    0.999 CPUs utilized
                     9      context-switches          #   11.826 /sec
                     0      cpu-migrations            #    0.000 /sec
                41,724      page-faults               #   54.827 K/sec
         2,496,300,059      cycles                    #    3.280 GHz
         3,568,834,203      instructions              #    1.43  insn per cycle
           772,742,616      branches                  #    1.015 G/sec
             6,290,441      branch-misses             #    0.81% of all branches
    
           0.761639491 seconds time elapsed
    
           0.707385000 seconds user
           0.053313000 seconds sys
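
For reference, the slowdown implied by the task-clock figures can be computed directly. The numbers below are copied from the perf stat output in this pull request: the first commit's run and the run above. Note that the greedy baseline itself differs between the two runs (268.81 msec versus 492.89 msec), so the two ratios are only a rough comparison.

```cpp
#include <cassert>

// Ratio of rete task-clock to greedy task-clock (both in msec).
inline double slowdown(double rete_msec, double greedy_msec) {
  return rete_msec / greedy_msec;
}

// Checks that the ratios land in the expected ranges.
inline bool demo_slowdown() {
  double before = slowdown(2684.85, 268.81);  // first commit: roughly 10x
  double after = slowdown(761.01, 492.89);    // this commit: roughly 1.5x
  assert(before > 9.9 && before < 10.0);
  assert(after > 1.5 && after < 1.6);
  return after < before;
}
```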
    bollu committed Dec 15, 2021
    4ccd0f3
  4. e80cff7
  5. switch to program of size 1e5

    bollu committed Dec 15, 2021
    12c9565
  6. call remove WME

    bollu committed Dec 15, 2021
    ebcc93b