Compiler (optimization?) bug for empty R6RS records #226

Closed
christoff-buerger opened this Issue Oct 22, 2017 · 11 comments

christoff-buerger commented Oct 22, 2017

I am using empty (i.e., field-less), opaque, sealed R6RS records in my RACR library to create unique key entities, for which any comparison with eq?, eqv? or equal? is #t only for the instance itself. The Chez Scheme compiler is broken for such empty record instances: it appears to reduce all created instances to a single unique instance. Petite Chez Scheme does not suffer from this bug. I implemented a workaround in my library (see christoff-buerger/racr#85 and christoff-buerger/racr@4eac958).

For example, in Petite Chez Scheme we have:

> (define-record-type atom (sealed #t) (opaque #t))
> (define r1 (make-atom))
> (define r2 (make-atom))
> (eq? r1 r2)
#f
> 

But in Chez Scheme (the compiler) we have:

> (define-record-type atom (sealed #t) (opaque #t))
> (define r1 (make-atom))
> (define r2 (make-atom))
> (eq? r1 r2)
#t
> 

My library works well with Racket, Larceny and Sagittarius, which agree that (eq? r1 r2) is #f for the above example.

dybvig commented Oct 24, 2017

This behavior results from cp0 folding calls to record constructors for immutable record types with constant arguments.

While the R6RS library document requires eqv? and eq? to return #f when applied to the results of two different calls to the same record constructor, with no exemption for immutable records, this seems to have been an editorial mistake. The R6RS Records SRFI leaves unspecified the value of eqv? in this situation for immutable records, and although the issue was discussed on the SRFI mailing list, I see nothing in the R6RS minutes or tickets where this decision was overturned, and one place where it was effectively confirmed implicitly (the response to formal ticket 155).

In any case, I do not want to disable the folding of such calls generally. If someone can convince me this isn't a mistake in R6RS, I would be open to having the versions of make-record-type-descriptor and define-record-type imported from (rnrs)---but not (chezscheme)---create record types whose constructor calls cannot be folded.

christoff-buerger commented Oct 24, 2017

Thanks a lot for your fast answer!

I am a bit puzzled now and would like to take a chance on "If someone can convince me this isn't a mistake in R6RS, I would be open [...]":

  1. It is not nice that the Petite interpreter and the Chez Scheme compiler disagree on the subject, particularly when the behaviour depends on the whim of optimisations that an interpreter might never apply. With respect to portability, one also does not want to rely on whether a Scheme system happens to perform some optimisation when it is free to choose.

  2. The Chez Scheme compiler does not consistently fold immutable records. For example, it does not fold the following simple case:

> (define-record-type atom (sealed #t) (opaque #t) (fields v))
> (define r1 (make-atom 1))
> (define r2 (make-atom 1))
> (eq? r1 r2)
#f
> 
  3. I always thought the idea of (opaque #t) for R6RS records was to prevent inspection of the fields of record entities (at least that is what opaque means in most languages: leave it alone). This is, for example, important to be able to use cyclic record entities as keys in eq?/eqv?/equal?-based hashtables. This is not an uncommon use case; consider for example a node record with an edges field used to represent arbitrary graphs (including multigraphs, where the same edge can exist several times between nodes). Hashing nodes is a reasonable scenario (see the sketch after this list). If most nodes are not constant, but some can be derived to be constant and therefore folded into a single entity, very nasty bugs result, like miscounting the number of multi-edges/nodes. To me, leaving opaque records alone includes not optimising them away, including the special case of records without mutable fields.

  4. The main reason I am puzzled is: how would you implement a portable unique key that works for whichever equality-comparing function is applied to it? A unique identity that cannot be recreated by accident. Vectors, lists, pairs etc. are disqualified, because their content is inspected by equal?, opening opportunities to create key copies by accident (by constructing an equal? vector, list or pair). As I see it, functions are not guaranteed to have unique identities (maybe I am wrong here). This problem is particularly important for huge programs depending on third-party libraries one has no control over. I depend on unique entities for security reasons, using them internally -- never exposed to library users -- to safeguard and check values that can be either user-given or internal flags, and to tell which of the two they are.
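
A minimal sketch of the graph-node scenario from item 3, using an eq?-based hashtable (the names node and edge-count are illustrative, not taken from RACR; the mutable edges field also keeps the constructor calls out of reach of the folding described above):

;; Nodes of a multigraph used as hashtable keys: each node must keep its
;; own identity, or per-node counts would silently be merged.
(define-record-type node
  (sealed #t)
  (opaque #t)
  (fields (mutable edges))) ; mutable field: constructor calls are not folded

(define a (make-node '()))
(define b (make-node '()))

(define edge-count (make-eq-hashtable))
(hashtable-update! edge-count a (lambda (n) (+ n 1)) 0)
(hashtable-update! edge-count b (lambda (n) (+ n 1)) 0)
(hashtable-ref edge-count a 0) ; expected 1: a and b are counted separately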

burgerrg commented Oct 24, 2017

Could you use a gensym to provide the unique key? For example,

(define atom '#{atom a36te8smo2f3wi8abindq3ds5p9j9t93-0})

But this particular construct may be Chez Scheme specific.
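
A possible run-time variant of the same idea (a sketch, still Chez-specific): create the generated symbol when the library is initialised instead of writing the #{...} literal into the source. Generated symbols have globally unique names, so such a key is eq?, eqv? and equal? only to itself.

(define atom (gensym)) ; a freshly generated, globally unique symbol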

burgerrg commented Oct 24, 2017

I suppose a portable way is to use a regular symbol with a UUID as its name:

(define atom 'atom-a36te8smo2f3wi8abindq3ds5p9j9t93)
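
A caveat with this approach: ordinary symbols are interned, so two occurrences of the same spelling denote the same object, and the key is only as unique as the secrecy of its name. For example:

(eq? 'atom-a36te8smo2f3wi8abindq3ds5p9j9t93
     'atom-a36te8smo2f3wi8abindq3ds5p9j9t93) ; => #t, the same interned symbol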

jltaylor-us commented Oct 24, 2017

The obvious way to avoid having the immutable record optimized away into a single instance is to give it a mutable field. I don't know if that makes the garbage collector work harder than it needs to sometimes.
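
A minimal sketch of that suggestion, assuming a hypothetical unique-key record type; the mutable field is never read, it exists only so that constructor calls cannot be folded:

(define-record-type unique-key
  (sealed #t)
  (opaque #t)
  (fields (mutable dummy))
  (protocol (lambda (new) (lambda () (new #f)))))

(define k1 (make-unique-key))
(define k2 (make-unique-key))
(eq? k1 k2) ; expected #f: each call allocates a distinct, mutable record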

dybvig commented Oct 24, 2017

I see I wasn't clear. I meant that someone would need to prove that the editors' intent is correctly reflected in the R6RS, not that the requirement is correct in some absolute sense. Otherwise it should be added to the R6RS errata.

However, I will give you my thoughts on the absolute correctness of the requirement as stated. The language leaves unspecified the eqv-ness of all other immutable objects, including the empty string, empty vector, empty bytevector, syntax objects, numbers, constant pairs, constant vectors, and constant strings. It is inconsistent for it to specify the eqv-ness of immutable records. While I agree that unspecified behavior can lead to bugs when programmers incorrectly extrapolate from the behavior of one or more programs in one implementation, I have never found this a compelling reason for overspecification of language semantics, which sometimes forces programs to be overspecified. In this case, I would find it strange that unique (or any) locations must be set aside for two immutable values simply so they can be compared for pointer equivalence when they are otherwise indistinguishable.

As @jltaylor-us points out, a mutable record would work for your purposes. So would a pair or a single-element vector, string or bytevector, since all you care about is pointer equivalence (by way of eq?) and there's no way to forge pointer equivalence. If you care about hiding the content, you can use an opaque mutable record, though the content is irrelevant to pointer equivalence.

Opaque simply means that you can't extract the record-type descriptor and use it to inspect the contents or create new instances. For this reason, the opaque marker actually gives the compiler more freedom to do what it pleases with the representation. It might, for example, decide not to represent fields that it determines are never referenced.

I can see why it looks to you like the compiler doesn't fold consistently and will try to explain what it does in a separate comment.
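
A small sketch of that last point using the R6RS inspection procedures (expected behaviour per the report; the record name secret is illustrative):

(import (rnrs)) ; record? and record-rtd from the inspection layer

(define-record-type secret
  (opaque #t)
  (fields (mutable v)))

(define s (make-secret 42))
(record? s) ; expected #f: opaque records do not expose themselves as records
;; (record-rtd s) would raise an &assertion exception, so the fields cannot be
;; reached reflectively -- but none of this affects eq? or pointer identity.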

christoff-buerger commented Oct 24, 2017

Thanks for this insightful rationale! I was not aware of the background that the R6RS standard's equivalence requirements (for eq?, eqv? and equal?) on immutable records are an editorial mistake.

If I understand right, an opaque record with a single mutable field can safely be assumed to be unique regarding any kind of standard equivalence test (eq?, eqv? and equal?) according to R6RS? And this approach should be portable across any standards-conforming R6RS system? That is what I need, as I am not always in control of the equivalence function used (I corrected my previous statement to use eq?/eqv?/equal?-based hashtables to make that clear).

dybvig commented Oct 25, 2017

If a record definition is local to a library, module, lambda, or other body, the record type is nongenerative, and the field arguments to the constructor are constant, the compiler will fold the constructor call, assuming the protocol is well-behaved, as is the default protocol. For example:

> (expand/optimize
    '(let ()
       (define-record-type atom (nongenerative) (fields v))
       (define (f) (make-atom 1))
       (define r1 (f))
       (define r2 (f))
       (eq? r1 r2)))
#t

In general, the compiler folds calls to record constructors for immutable records when it is able to determine the constructor, the record-type descriptor, and the field values at compile time, again assuming a well-behaved protocol. It sometimes can't do so or runs out of the time it allots itself for the attempt.

In your first example, the compiler doesn't actually fold the calls to make-atom, because make-atom is defined in the interactive top level, and its value can change at any time. That is, in this case the compiler doesn't even recognize (make-atom) as a constructor call. The compiler does, however, fold the call to record-constructor in the expansion of the define-record-type form down to a lambda expression that returns a quoted record instance:

> (print-gensym #f)
> (expand '(define-record-type atom (sealed #t) (opaque #t)))
(begin
  (set! make-atom
    (#2%r6rs:record-constructor
      '#<record constructor descriptor>))
  (set! atom? (#2%record-predicate '#<record type atom>))
  (#2%void))
> (expand/optimize '(define-record-type atom (sealed #t) (opaque #t)))
(begin
  (set! make-atom (lambda () '#<atom>))
  (set! atom?
    (lambda (g1) (#3%$sealed-record? g1 '#<record type atom>)))
  (#2%void))

The compiler can do this despite the record type being generative, because the expander takes advantage of the fact that a top-level expression is evaluated at most once to create the record-type descriptor at expansion time.

As a result, the two calls to make-atom return the same value, by eq?.

In your second example, the compiler doesn't fold the calls to make-atom, again because make-atom is defined in the interactive top level. Even if it did, the two calls would not produce the same value by eq?. The compiler does not attempt to commonize the results of two different calls, which would be a different optimization.
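
For contrast, a sketch of the same local-definition case with a mutable field; since only immutable record types are candidates for constructor folding, the two instances are expected to remain distinct:

> (let ()
    (define-record-type atom (nongenerative) (fields (mutable v)))
    (define (f) (make-atom 1))
    (eq? (f) (f)))
#f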

dybvig commented Oct 25, 2017

You are correct. You can also use an opaque record type with an immutable field if you can arrange to put a different value in the field each time a record is constructed. For example:

(define-record-type atom
  (sealed #t)
  (opaque #t)
  (fields counter)
  (protocol
    (let ([counter 0])
      (lambda (new)
        (lambda ()
          (set! counter (+ counter 1))
          (new counter))))))

This version is not thread-safe, however, if that matters.

I suggest adding a (nongenerative) clause to every record definition unless you need the record type to be generative (which is rare and shouldn't be the default) so the record-type descriptor can be constructed at compile time and be a constant in the code rather than a variable reference at run time.
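
A usage sketch for the counter-based version above (expected results): each constructor call stores a fresh counter value, so no two atoms compare as the same object, whichever standard equivalence predicate is used.

(define a1 (make-atom))
(define a2 (make-atom))
(eq? a1 a2)    ; expected #f
(eqv? a1 a2)   ; expected #f
(equal? a1 a2) ; expected #f: equal? falls back to eqv? for records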

christoff-buerger commented Oct 26, 2017

Your detailed answer is very informative! Thank you very much for the insights!

I will adapt my code, introducing mutable fields for portability and nongenerative where it fits. From my point of view the issue can be closed. I am sorry for spamming the issue tracker!

dybvig commented Nov 1, 2017

I discussed this with Mike Sperber. He agrees with me that the editors did not intend for eqv? to be required to return #f for two otherwise indistinguishable immutable records produced by different record constructor calls, and he has added an R6RS errata entry that corrects the overspecification.

dybvig closed this Nov 1, 2017
