Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] Using Rust's HashMap for a LispHashTable #276

Closed
DavidDeSimone opened this issue Jul 26, 2017 · 6 comments
Closed

[RFC] Using Rust's HashMap for a LispHashTable #276

DavidDeSimone opened this issue Jul 26, 2017 · 6 comments

Comments

@DavidDeSimone
Copy link
Collaborator

Rendered

@birkenfeld
Copy link
Collaborator

Very nice writeup! It would be cool to try (but that's easy to say when you're not the one doing it).

As for rooting: not that I oppose an API to do explicit rooting, but doesn't the GC scan the stack and mark any LispObjects found there anyway? (Remembering my brief past forays into Emacs C code, there used to be these GCPRO macros that C functions use to make sure their variables weren't collected, but they're completely gone now.)

@DavidDeSimone
Copy link
Collaborator Author

Thanks for the feedback @birkenfeld . It looks like you are correct that the lisp GC does scan the stack and mark what it believes are Lisp_Objects. The GC calls mark_threads defined in thread.c which comes back to mark_stack in alloc.c. I am going to play around with this to make sure that it is recognizing Rust LispObjects, but assuming that all works, I think we can remove the rooting API from the spec.

@Wilfred
Copy link
Collaborator

Wilfred commented Jul 30, 2017

This looks great, thanks for taking the time to write this up.

Performance

I would be interested in benchmarks, to see if Rust's HashMap is faster regardless of how a user tunes rehash-size and rehash-threshold. Naively I think HashMap may be faster, because it's had more people use it in more situations.

We should probably consider a hash function like fnv. I don't believe there's any DoS protection in the current hashmap implementation, so we should take the faster hash function.

Compatibility

rehash-size and rehash-threshold are exposed to user elisp, so we need to preserve backwards compatibility.

We should probably just return placeholder values, so something like:

;; ignore :rehash-size and :rehash-threshold
ELISP> (setq h (make-hash-table :rehash-size 1.9 :rehash-threshold 0.9))
;; don't bother showing rehash-size or rehash-threshold in the
;; printed representation
#s(hash-table size 65 test eql data ())

ELISP> (hash-table-rehash-size h)
nil ;; or 1.5?
ELISP> (hash-table-rehash-threshold h)
nil ;; or 0.8125?

Note that the default value of rehash-threshold has changed from 0.8 to 0.8125 in Emacs 26 (see commits 7207b63 and 83c9c6f).

Usage of rehash-size and rehash-threshold

Googling, I'm unable to find any examples of people actually customising these values. I can find lots of examples of tracebacks which happen to show hash tables:

#s(hash-table size 65 test eql rehash-size 1.5 rehash-threshold 0.8 data ())

But the only code specifying :rehash-size and :rehash-threshold I can find is:

;; https://github.com/sigma/pcache/blob/master/pcache.el#L189-L191
;; this is just creating a hash table with the same properties as another
    (oset cache :entries (make-hash-table :test test :rehash-size resize
                                          :rehash-threshold threshold
                                          :weakness weakness)))

;; https://www.emacswiki.org/emacs/c-eldoc.el
;; This just uses the default Emacs 25 values
  (defun* cache-make-cache (init-fun test-fun cleanup-fun
                                     &optional &key
                                     (test #'eql)
                                     (size 65)
                                     (rehash-size 1.5)
                                     (rehash-threshold 0.8)
                                     (weakness nil))

Core Emacs seems to be the only place where :rehash-size is used, and core doesn't use :rehash-threshold.

lisp/registry.el
153:	    (make-hash-table :size 10000 :rehash-size 2.0 :test 'equal)))
155:      (setq tracker (make-hash-table :size 100 :rehash-size 2.0)))))
186:		 (make-hash-table :size 800 :rehash-size 2.0 :test 'equal)

lisp/progmodes/verilog-mode.el
8035:	 (let ((ht (make-hash-table :test 'equal :rehash-size 4.0))
8036:	       (ht-not (make-hash-table :test 'equal :rehash-size 4.0))
8065:	 (let ((ht (make-hash-table :test 'equal :rehash-size 4.0))
8093:	 (let ((ht (make-hash-table :test 'equal :rehash-size 4.0))
9417:	    (make-hash-table :test 'equal :rehash-size 4.0)))))
10099:			 (make-hash-table :test 'equal :rehash-size 4.0)))

lisp/emacs-lisp/cl-print.el
246:  (let ((print-number-table (make-hash-table :test 'eq :rehash-size 2.0)))

@DavidDeSimone
Copy link
Collaborator Author

Thanks for the feedback @Wilfred . I will be working today/tomorrow on some bench marks to see what the performance characteristics look like. I'll post them once I have all the data, and I'll update the RFC as well.

@DavidDeSimone
Copy link
Collaborator Author

Performance Report:

So I wrote up a basic performance test around basic put/get functionality for a prototype hashmap vs the current implementation of a hashtable. I have attached the elisp code I used to this post. So far, the numbers are coming out ahead for using Rust's HashMap (specifically the Fnv hashing suggested by Wilfred.)

I disabled GC at the C layer in my build of remacs, since I haven't hooked up my prototype LispHashMap object into the GC yet, and so I didn't view it to be an apples to apples comparison if I have the LispHashTables generating garbage and causing GCs, while the LispHashMaps were not. This is why I am caring the result of benchmark-run and ignoring the gc section of that list.

Test one involved iteratively inserting a key/value pair, and then immediately getting it from the map. Test two involved putting a key into the map a single time, and then repeatedly getting it from the map. Test three involved putting a key into the map. The results were

Round One: 
"Difference between map and table for equal get/put: 12.216275%"
"Difference between map and table for iterative get: 7.466898%"
"Difference between map and table for iterative put: 35.321307%"

Round Two: 
"Difference between map and table for equal get/put: 15.066796%"
"Difference between map and table for iterative get: 6.155819%"
"Difference between map and table for iterative put: 39.435811%"

Round Three: 
"Difference between map and table for equal get/put: 18.082451%"
"Difference between map and table for iterative get: 7.627629%"
"Difference between map and table for iterative put: 34.178090%"

If I map the results of (random) to number between 0-2048 before inserting/getting from the table, I get a different picture:

Round 1:
"Difference between map and table for equal get/put: 5.926061%" 
"Difference between map and table for iterative get: 5.750925%"
"Difference between map and table for iterative put: 1.785505%"
Round 2: 
"Difference between map and table for equal get/put: 2.346474%" 
"Difference between map and table for iterative get: 9.283818%"
"Difference between map and table for iterative put: 5.088501%"
Round 3: 
"Difference between map and table for equal get/put: 5.992483%"
"Difference between map and table for iterative get: 8.711063%"
"Difference between map and table for iterative put: 1.645166%"

A far less pronounced, but still noticeable and measurable difference.

Based on these results, I think it makes sense to go forward on using Rust's HashMap over porting the current C layer hash map.

benchmark.zip

@DavidDeSimone
Copy link
Collaborator Author

After playing around with it most of the day, I am going to remove the Rust GC portion of this RFC, so that I can focus on the HashMap portion of the RFC. This PR will be large and different enough that we don't need to complicate it with the start of another large project. I will update the RFC to detail how I will be handling managing the LispHashTables even with it's rust allocation.

@DavidDeSimone DavidDeSimone changed the title [RFC] Using Rust's HashMap for a LispHashTable, and the start of porting the Lisp GC to Rust. [RFC] Using Rust's HashMap for a LispHashTable Aug 7, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants