Skip to content
Switch branches/tags
Go to file
Cannot retrieve contributors at this time
In this feature branch we fix hashtable problems and maybe reimplement token storing and indexing.
In master, there is a problem with hashtables that are used for token storing and indexing. Long
time ago we decided to compute hash keys for tokens and use these hash keys for storing tokens with
no checking of collisions. This was a hack, but it worked well for a long time with no bugs and thus
it was completely forgotten. Now with Mikko's stream processing examples we have finally started to
get collisions and it is time to fix this.
We should rethink how tokens are stored and indexed. Is the current approach sane?
Take 2: Trying to use
- First, we should test that this thing works. It should be pretty easy. For example (see src/util/hashtable-extension-test.lisp):
a) Make a hash function that return a few values: 0 for lists, 1 for numbers, 2 for other
b) Make a comparison function, which calls eql
(defun my-hash-table-test (a b) (eql a b))
(defun my-hash-function (x) (cond ((numberp x) 0)
((listp x) 1)
(t 2)))
(sb-ext:define-hash-table-test my-hash-table-test my-hash-function)
(defun my-hash-table-experiment (n)
(let ((table (make-hash-table :test #'my-hash-table-test)))
(loop for i from 0 below n
for item = (gethash i table)
when item
do (format t "~&!!!Got item = ~S~%" item)
else do (setf (gethash i table) (list i)))
(loop for i from 0 below n
for item = (gethash i table)
do (format t "~&Got item = ~S~%" item)
do (setf (cdr item) (list i)))
(loop for i from 0 below n
for item = (gethash i table)
do (format t "~&Got item = ~S~%" item))))
Seems to work
- Second, get rid of the mutable variables that are added to tokens. These come from
a) existence-start-node (active-p-var, counter-var)
add-initial-data: (make-token this (make-singleton-token) (list active-p-var counter-var) (list nil 0))
b) exists-start-node, optional-start-node
add-token: (make-token this token (list active-p-var counter-var) (list t 0))
(setf (token-value this new-token active-p-var) nil)
remove-token: (setf (token-value this stored-token (existence-active-p-var this)) t)
(setf (token-value this stored-token (existence-active-p-var this)) nil)
c) exists-end-node, optional-end-node
add-token: (incf (token-value this token (existence-counter-var start-node)))
remove-token (decf (token-value this token (existence-counter-var start-node)))
d) filter memory (prev-value-var)
(setf token (make-token this token (list prev-value-var) (list new-value)))
(setf (token-value this stored-token prev-value-var) new-value)
e) aggregate-join-node:
aggregate-join-get-group: (make-token this token (list (aggregate-join-group-var this)) (list group))
Items a-c can be fixed by adding token-indexed tables to nodes:
* existence-start-node-status: token -> (active-p, counter)
Item d similarly:
* filter-memory-previous-value: token -> value
Item e:
* aggregate-join-node-token-group: token -> group
A solution:
- Refactor into two token-related base classes: token-store and token-map. The former we get by renaming memory to token-store. For the latter,
token-map has a hashtable from tokens to items, which are (i) (active-p . counter) -pairs in existence start node and (ii) prev-value in filter-memory.
Rename filter-memory to filter-with-previous-value or something.
- Creating token-store: make-hash-table :test #'token= :hash-function #'token-hash
- Creating token-map: make-hash-table :test #'token= :hash-function #'token-hash
- Indices: make-hash-table :test #'rdf-term-list= :hash-function #'rdf-term-list-hash
Replace key in parameters of index functions by rdf-term-list
We can use 1) "Finding code to fix" below to.
Take 1: Did not complete this, because it would be messy.
1) Finding code to fix
To begin with, we collect all files that refer to hashtable or gethash. We get
src/main/compile-sparql-file.lisp: sane
src/parser/srx-parser.lisp: sane
src/rete/indices.lisp: tainted
src/rete/interpreter.lisp: tainted
- create-stores-and-indices: maybe fix calls
- add-initial-data: maybe fix calls
- uses join-beta-index and join-alpha-index, which are tainted
- initialize-reporting: maybe fix calls
- report-sizes: maybe fix calls
- add/remove(-alpha/beta)-token: Check index handling calls
- aggregate-join-get-group: Check index handling calls
- process-query-input: tainted
src/rete/store.lisp: tainted
- store-get-token: tainted
* We should use collision list here.
* The entry in the hashtable should contain a list of tokens with the same hashkey.
* We should filter the token from these entries.
* We can then (hopefully) drop all checks.
src/rete/triple-pattern-matcher.lisp: tainted
- gethash-or-else-update: tainted
* We should use collision list here.
* The entry in the hashtable should contain a list of tokens with the same hashkey.
* We should filter the token from these entries.
src/sparql/sparql-classes.lisp: tainted
- hashkeyed: tainted
- compute-hashkey: tainted
- get-hashkey: tainted
- rdf-term: tainted
- sparql-var: tainted
- make-uniquely-named-object: sane
- find-type-descriptor: sane
- add-sparql-op-library: sane
- find-sparql-op-library: sane
- add-sparql-op: sane
- find-sparql-op: sane
- list-sparql-ops: sane
src/sparql/sparql-helper-functions.lisp: tainted
- quad-hash-key: tainted
- rdf-graphs-isomorphic-p: tainted
src/sparql/sparql-macros.lisp: sane
- define-xsd-value-type: sane
src/util/misc.lisp: sane
- maph: sane
src/util/table-bindings.lisp: tainted
- hash-characters: tainted
- add-table-binding: tainted
- get-table-binding: tainted
To summarize, the tainted code uses sxhash-based numbers as keys to hashtables. This must be
accompanied by collision lists.
2) Order of fixing this.
Since the grievances have started in token store, maybe we should start there. It should be rather
easy, because all methods have token, not the dangerous sxhash number, as a parameter. Thus no changes
to calls should be required.
To be safe, lets find all the calls to the store methods to see how store is used.
The methods can be listed in zsh: fgrep defgeneric src/rete/store.lisp|awk '{print $2}':
To get the calls, we run in zsh: fgrep -fl <(fgrep defgeneric src/rete/store.lisp|awk '{print $2}') src/**/*.lisp
(src/cl-ll1/datastruct.lisp, wrong hit)
(src/cl-ll1/generator.lisp, wrong hit)
(src/parser/sparql-parser.lisp, wrong hit)
- The calls are OK.
(src/rete/rete-classes.lisp, wrong hit)
- The methods are defined here
A regexp for finding the uses in a file:
Before we start, lets compile and run tests. 586 tests passed as before.
Now we are ready. Lets add a collision list to the store.
Done. After making the code for store-tokens to produce a fresh list, passes 586 tests as before!
But the stuff does not work with Mikko's inputs! It seems that token-equal does not work.
NOTE! In add-token etc. we could replace the pattern (unless (store-get-token ..) (store-put-token ...) ...) by something that does the test only once.