Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

set-cdr! can cause list? to crash in safe mode #599

Closed
LiberalArtist opened this issue Dec 8, 2021 · 7 comments
Closed

set-cdr! can cause list? to crash in safe mode #599

LiberalArtist opened this issue Dec 8, 2021 · 7 comments

Comments

@LiberalArtist
Copy link
Contributor

This program (by @mflatt, from racket/racket#4072 (comment)) creates an engine that calls list? on a pair which is initially a list. While the engine is suspended, it uses set-cdr! to mutate one of the interior pairs so that it is no longer a list. If the interruption comes at just the right moment, resuming the engine will lead list? to perform an invalid memory reference: the program iteratively finds an amount of fuel that will reveal the error.

(define N 10000)

(define ml (let loop ([i N])
             (if (zero? i)
                 '()
                 (cons i (loop (sub1 i))))))
(define mid (list-tail ml (floor (* N 1/4))))
(define mid-rest (cdr mid))

(define Q 100)

(let find-crash ([j 0])
  (define e (make-engine (lambda () (list? ml))))
  (printf "~s\n" j)
  (set-cdr! mid mid-rest)       
  (e (floor (* (/ j Q) N))
     (lambda (t v) (printf "done! ~s\n" v))
     (lambda (e2)
       (set-cdr! mid 0)
       (e2 N (lambda (t v) (printf "~s\n" v)) (lambda (e2) "still not done"))
       (unless (= j Q)
         (find-crash (add1 j))))))

Example output:

Chez Scheme Version 9.5.4
Copyright 1984-2020 Cisco Systems, Inc.

0
#f
1
#f
2
#f
3
#f
4
#f
5
#f
6
#f
7
#f
8
#f
9
#f
10
#f
11
#f
12
#f
13
Exception: invalid memory reference.  Some debugging context lost
@jltaylor-us
Copy link
Contributor

Thanks for the interesting bug report!

The implementation of list? is not safe in the presence of concurrent modification (via threads or engines). Since it's part of the Chez standard libraries, it's compiled with optimize-level 3, so it really crashes when something goes wrong. If it were compiled at 02 or below then it would still error, but with Exception in cdr: 0 is not a pair.

A fix for this particular case is simple: add a check for (pair? tortoise) to the and just before calling (cdr tortoise). That's obviously less efficient in the common case, but necessary if the function is supposed to be thread or engine -safe. The same tortoise/hare idiom to detect circular lists is used several other places, and one could reasonably assume they all suffer from the same deficiency.

I think maintainers probably need a little discussion to decide whether these functions are intended to be correct in the presence of concurrent modification and if so to identify other functions with the same issue.

@LiberalArtist
Copy link
Contributor Author

I'd also point to some other nuances highlighted by @gus-massa in racket/racket#4072 (comment) (mcdr, mpair?, and mlist are Racket's spellings for cdr, pair?, and list with mutable pairs.):

I agree that checking (mpair? tortoise) would be nice. Anyway, some hard questions:

If the user replace a mcdr between the tortoise and the hare with ...

... null, then the result should be #t or #f?

... 7l, then the result should be #t or #f?

... a cyclic pseudo-mlist and the original object is another cyclic pseudo-mlist, then the check never ends. Is this fixable? Do we care?

@jltaylor-us
Copy link
Contributor

So far our general consensus is that there is no well-defined result possible in the face of concurrent modification. (My first proposed solution does have the distinction of being definitely bad, in that it could answer #f for a list whose only edit was to change a cell's tail from pointing to one proper list to another. A better solution would be to replace tortoise with hare if it suddenly became a non-pair.)

The only real well-defined behavior I've been able to come up with is to raise an exception if we happen to detect concurrent modification. I'll let you guess what the reaction was when I asked what sort of performance impact it would have to start each call to list? with a call1/cc.

So far it's leaning towards not changing the behavior, but other maintainers could still weigh in with different opinions.

@mnieper
Copy link
Contributor

mnieper commented Nov 4, 2022

NB: In SRFI 226, I propose a condition &concurrent-modification so that an implementation can raise an exception of this type when it detects a concurrent modification.

@Sshanfeng
Copy link

Sshanfeng commented Nov 4, 2022 via email

@mflatt
Copy link
Contributor

mflatt commented Dec 19, 2022

After returning to this issue, I'd like to advocate in favor of the change that @jltaylor-us describes.

I don't see how list operations in general can be well-defined if the list is modified concurrently — at least, not without something dramatically more complex/expensive, like tracking all reads and writes. The solution described above takes care to return #t if the list is extended in a particular way while the list? predicate is in flight, but it would still possible to expose the tortoise—hare internals by carefully moving a pair from the early in the list to later, so that the hare and tortoise are the same and #f is returned, even if all the intermediate states are lists.

So, if we agree that list operations are undefined in the presence of mutation, then list? is free to return a random boolean in the event of mutation. It's nice when safe mode can detect and report an error for undefined behavior, but that's not always so easy — as in this case, since allowing an exception interferes with optimizations — and so it's not always done. But avoiding memory faults is more of a priority.

mflatt added a commit to racket/ChezScheme that referenced this issue Dec 19, 2022
This is the solution described in cisco#599.

One extra test in the `list?` implementation is cheap, since it's a
highly predictable branch. The effect of that check is maybe less
important, since the behavior of list operations just has to be
undefined when the list is mutated concurrent to the operation. It's
nice to raise an exception when undefined behavior is detected in safe
mode, but sometimes that's too expensive; then, any convenient
behavior is ok, but avoiding memory faults is a priority.
mflatt added a commit to mflatt/racket that referenced this issue Dec 19, 2022
This is the solution described in cisco/ChezScheme#599.

One extra test in the `list?` implementation is cheap, since it's a
highly predictable branch. The effect of that check is maybe less
important, since the behavior of list operations just has to be
undefined when the list is mutated concurrent to the operation. It's
nice to raise an exception when undefined behavior is detected in safe
mode, but sometimes that's too expensive; then, any convenient
behavior can be ok, but avoiding memory faults is a priority.
mflatt added a commit to mflatt/racket that referenced this issue Dec 19, 2022
This is the solution described in cisco/ChezScheme#599.

One extra test in the `list?` implementation is cheap, since it's a
highly predictable branch. The effect of that check is maybe less
important, since the behavior of list operations just has to be
undefined when the list is mutated concurrent to the operation. It's
nice to raise an exception when undefined behavior is detected in safe
mode, but sometimes that's too expensive; then, any convenient
behavior can be ok, but avoiding memory faults is a priority.
mflatt added a commit to mflatt/ChezScheme that referenced this issue Oct 10, 2023
This is the solution described in cisco#599.

One extra test in the `list?` implementation is cheap, since it's a
highly predictable branch. The effect of that check is maybe less
important, since the behavior of list operations just has to be
undefined when the list is mutated concurrent to the operation. It's
nice to raise an exception when undefined behavior is detected in safe
mode, but sometimes that's too expensive; then, any convenient
behavior is ok, but avoiding memory faults is a priority.

Original commit: racket/ChezScheme@26f79b5
mflatt added a commit to mflatt/ChezScheme that referenced this issue Oct 10, 2023
This is the solution described in cisco#599.

One extra test in the `list?` implementation is cheap, since it's a
highly predictable branch. The effect of that check is maybe less
important, since the behavior of list operations just has to be
undefined when the list is mutated concurrent to the operation. It's
nice to raise an exception when undefined behavior is detected in safe
mode, but sometimes that's too expensive; then, any convenient
behavior is ok, but avoiding memory faults is a priority.

Original commit: racket/ChezScheme@0d9db98
@mflatt
Copy link
Contributor

mflatt commented Nov 21, 2023

The changes that @jltaylor-us described were made as part of the v9.9.9 merge.

@mflatt mflatt closed this as completed Nov 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants