
Consider mutable bindings map for perf #416

Closed
borkdude opened this issue Sep 19, 2020 · 12 comments

@borkdude (Collaborator)

See joinr@5b18289
Needs perf benchmarking.

cc @joinr

@borkdude (Collaborator, Author)

I have considered using mutable bindings, but this will complicate multi-threaded code. Not sure how to handle this and if it's worth it yet...


joinr commented Dec 20, 2020

I think a ConcurrentHashMap would probably be okay, but who knows.
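For illustration, here is a minimal sketch of what a single shared ConcurrentHashMap as a bindings store could look like. The names `push-binding!`, `pop-binding!`, and `lookup` are hypothetical, not SCI's API:

```clojure
;; Hypothetical sketch of a shared bindings store backed by a
;; ConcurrentHashMap. These names are illustrative, not SCI's API.
(import 'java.util.concurrent.ConcurrentHashMap)

(def ^ConcurrentHashMap bindings (ConcurrentHashMap.))

(defn push-binding! [sym v] (.put bindings sym v))
(defn pop-binding!  [sym]   (.remove bindings sym))
(defn lookup        [sym]   (.get bindings sym))

;; Interpreting (let [x 1] ...) would roughly become:
(push-binding! 'x 1)
(lookup 'x)       ;; => 1
(pop-binding! 'x) ;; after this, anything still resolving 'x gets nil
```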


borkdude commented Dec 20, 2020

If we would have one concurrent hash-map for the entire environment, what would happen in the following:

(let [x 1] ;; push bindings onto concurrent hash-map
  (future (Thread/sleep 1000) x) ;; try to resolve x from concurrent hash-map... but already has been popped?
  ) ;; end of body, pop bindings from concurrent hash-map


joinr commented Dec 21, 2020

Good point. I did not include futures and similar threading in my original scope of thought for the interpreter. Immutable bindings definitely simplify this process. For a single-threaded context, mutable bindings wouldn't be a problem. Alternately, detecting when threads are spawned and doing some copying (no idea how cost-prohibitive this would be). Hmm.


borkdude commented Jan 8, 2021

@joinr I think InheritableThreadLocal might offer a way here:

user=> (def tl (proxy [java.lang.InheritableThreadLocal] [] (childValue [v] (.clone ^java.util.Map v))))
#'user/tl
user=> (.set tl (java.util.HashMap. {:a 1 :b 2}))
nil
user=> (.get tl)
{:a 1, :b 2}
user=> (.put (.get tl) :b 3)
2
user=> (.get tl)
{:a 1, :b 3}
user=> @(future (.put (.get tl) :b 4) (.get tl))
{:a 1, :b 4}
user=> (.get tl)
{:a 1, :b 3}


borkdude commented Jan 24, 2022

A better approach is demonstrated in uclj by @erdos, and I think that approach can be successfully translated to SCI. It uses two arrays for a function: one contains the closed-over environment (let's call it env-array) and one contains both the closed-over environment and the function args (let's call it args-array). Argument and env lookups are then interpreted as direct lookups in the args-array using aget. Recur is implemented as a series of asets on the args-array.

This should give us a 3x-10x or so speedup for loops compared to the immutable map approach.
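As a rough sketch of how this two-array layout could play out (the slot assignments and names below are my assumptions, not the actual uclj or SCI implementation), here is a hand-compiled version of a function whose closed-over value `a` was resolved to slot 0 and whose loop variable `i` to slot 1:

```clojure
;; Rough sketch of the two-array idea; slot layout is an assumption,
;; not the actual uclj/SCI implementation.
(def env-array (object-array [:closed-over])) ;; closed-over values

(defn interpreted-fn
  "Hand-compiled sketch: lookups are agets, recur is an aset."
  [init]
  (let [args-array (object-array 2)]                ;; env + args/locals
    (aset args-array 0 (aget ^objects env-array 0)) ;; slot 0: closed-over a
    (aset args-array 1 init)                        ;; slot 1: loop var i
    (loop []
      (if (< (long (aget ^objects args-array 1)) 3)
        ;; recur = aset on the loop slot, then jump back
        (do (aset args-array 1 (inc (long (aget ^objects args-array 1))))
            (recur))
        [(aget ^objects args-array 0) (aget ^objects args-array 1)]))))

(interpreted-fn 0) ;; => [:closed-over 3]
```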


joinr commented Jan 25, 2022

I'm not an interpreter viking, but is there any tradeoff to going for arrays vs a mutable map? I guess if you can perfectly resolve all references and statically bind them to indices then you should be fine right? Arrays also open up some possibilities for primitive args and results.


joinr commented Jan 25, 2022

Ah, I think I see what's going on w.r.t. compile-time lexical analysis and variable resolution, similar to https://craftinginterpreters.com/local-variables.html
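A toy version of that resolution step (not SCI's actual analyzer), assuming the analyzer tracks locals as a vector of symbols in binding order, so each symbol can be replaced at analysis time by the array index it will occupy at run time:

```clojure
;; Toy compile-time slot resolution, in the spirit of the Crafting
;; Interpreters chapter on local variables; not SCI's actual analyzer.
(defn resolve-slot
  "scope is a vector of locals in binding order; returns the array slot
  the symbol will occupy at run time, or nil for an unresolved symbol."
  [scope sym]
  (let [idx (.indexOf ^java.util.List scope sym)]
    (when-not (neg? idx) idx)))

(resolve-slot '[a i] 'a) ;; => 0
(resolve-slot '[a i] 'i) ;; => 1
(resolve-slot '[a i] 'x) ;; => nil
```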


borkdude commented Jan 25, 2022

@joinr I suspect using pre-allocated arrays will have even better performance.

user=> (time (let [opts {:a :foo}] (dotimes [i (Math/pow 10 9)] (:a opts))))
"Elapsed time: 4674.451531 msecs"
nil
user=> (defrecord Foo [a])
user.Foo
user=> (time (let [opts (->Foo 1)] (dotimes [i (Math/pow 10 9)] (:a opts))))
"Elapsed time: 2153.796838 msecs"
nil
user=> (time (let [opts (java.util.HashMap. {:a 1})] (dotimes [i (Math/pow 10 9)] (.get ^java.util.Map opts :a))))
"Elapsed time: 3083.784156 msecs"
nil
user=> (time (let [opts (into-array [:foo :bar])] (dotimes [i (Math/pow 10 9)] (aget ^objects opts 0))))
"Elapsed time: 267.298268 msecs"

@borkdude

@joinr Kind of. The solution @erdos has is even more efficient: there is no pop necessary.


joinr commented Jan 26, 2022

IIRC it's always a bit faster to invoke the map as a function instead of keyword-as-function (I'm on java 8, ymmv):
some micros from this repo

(let [opts {:a :foo}] (c/quick-bench (:a opts)))
;;Evaluation count : 63058308 in 6 samples of 10509718 calls.
;;             Execution time mean : 7.798485 ns

(let [opts {:a :foo}] (c/quick-bench (opts :a)))
;;Evaluation count : 86278110 in 6 samples of 14379685 calls.
;;             Execution time mean : 5.187680 ns

(let [opts (java.util.HashMap. {:a 1})] (c/quick-bench (.get ^java.util.Map opts :a)))
;;Evaluation count : 67981740 in 6 samples of 11330290 calls.
;;             Execution time mean : 6.905192 ns

(let [opts (java.util.HashMap. {:a 1})] (c/quick-bench (.get ^java.util.Map opts :a)))
;;Evaluation count : 70373034 in 6 samples of 11728839 calls.
;;             Execution time mean : 6.857820 ns

(defrecord Foo [a])

(let [opts (->Foo 1)] (c/quick-bench (:a opts)))
;;Evaluation count : 93603156 in 6 samples of 15600526 calls.
;;Execution time mean : 4.542764 ns

(let [opts (->Foo 1)] (c/quick-bench (.a ^Foo opts)))
;;Evaluation count : 119118888 in 6 samples of 19853148 calls.
;;Execution time mean : 3.535142 ns

(r/fastrecord FooFast [a])

(let [opts (->FooFast 1)] (c/quick-bench (:a opts)))
;;Evaluation count : 93603156 in 6 samples of 15600526 calls.
;;Execution time mean : 4.542764 ns

(let [opts (->FooFast 1)] (c/quick-bench (opts :a)))
;;Evaluation count : 102556866 in 6 samples of 17092811 calls.
;;Execution time mean : 4.012662 ns

(let [opts (into-array [:foo :bar])]
  (c/quick-bench (aget ^objects opts 0)))
;;Evaluation count : 108937308 in 6 samples of 18156218 calls.
;;Execution time mean : 3.812408 ns

Arrays appear to edge out field access on classes a bit (though we're judging by sub-nanosecond differences...).

borkdude added a commit that referenced this issue Feb 14, 2022
Takes care of approx 2-4x speedup in loops.
@borkdude

Merged a first pass of using mutable arrays for bindings to master. Loops are roughly 2-4x faster (depending on the runtime: JVM, GraalVM, V8).
