Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

foreach loops and sequence comprehensions #9

Closed
wants to merge 15 commits into from

Conversation

robin-aws
Copy link
Member

@robin-aws robin-aws commented Mar 31, 2022

Design of the first step of dafny-lang/dafny#1753, and follow-up to #4

cc @samuelgruetter, @backtracking, @mschlaipfer, @parno, @RustanLeino

UPDATE: I'm planning on revising this soon based on ideas from the awesome discussion so far, so any interested parties should at least be aware of this comment, and may want to wait for the next revision (and I may even cut a fresh RFC to reset and avoid any confusion).

Copy link
Member

@MikaelMayer MikaelMayer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm leaving initial comments. Yay, that looks like a very useful feature, especially seq comprehensions

foreach right | right in s {
result := result + (left, right);
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would also be good to have the following

foreach left, right | left in s && right in s {
  result := result + (left, right);
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had that thought too - that's equally valid (and does illustrate the power of the more general foreach) but I was a little wary of implying this is always the more idiomatic way to do it. I felt more confident it made sense in the sequence comprehension version.

// Or even better, as an expression:
function AllPairs(s: seq<nat>): seq<(nat, nat)> {
seq left, right | left in s && right in s
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, that would be a nice addition to the language.

If you want to make your example more compelling (meaning no comprehension can emulate your basic example, you could consider doing two operations at once, like computing an out variable "maximum".

maximum := 0;
foreach left, right | left in s && right in s {
  result := result + (left, right);
  var tmpMax := left + right;
  maximum := if tmpMax > maximum then tmpMax else maximum;
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I'll put some thought into improving the motivating example - I think even the eventual user-facing documentation should have a good example of where a foreach loop is necessary because a sequence comprehension doesn't cut it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it seems to me that, even if foreach and sequence comprehensions were identical in what they could express, foreach would feel a lot more natural if the body is very imperative.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is part of why it feels right to me to design both features at once, as they are kind of two sides of the same fundamental coin: imperative and functional.

```dafny
foreach i | i in s {
print i, "\n";
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be soo compelling. I hope you can break this down further and obtain an iterator for a set.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a strong future possibility at least. Only supporting these higher-level operations over the elements of a set buys us a ton for now though, so I'm comfortable leaving it as future work.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really want to be able to write those three lines, too!

I'd also love to see set iterators in the future, but agree that it's not necessary in the first iteration.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In line with what @samuelgruetter said above, though: since in Dafny i in [1,2,3] <==> i in [3,2,1], shouldn't foreach i | i in [1,2,3] { … } be the same as foreach i | i in [3,2,1] { … }?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since foreach is not an expression but a statement, its semantics require an ordering, and we would syntactically extract this ordering from the expression. In my answer below, I explain how we could encode the ordering for the verifier, so that the two foreach statement you mention wouldn't be the same.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not right: var s: seq<int> :| set(s) == {1,2,3} is an expression, and it doesn't require an ordering.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does not require ordering because there is only one statement to execute :-)
To be fair, I should have said, is a statement that will execute other statements in some orders


```dafny
foreach x: nat
decreases 10 - x
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't you add the invariant that x is less or equal to 10?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, this is what happens when you can't actually run the verifier to catch your mistakes. :) Thanks.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What should be added is not an invariant, but a range expression. There would be two problems with an invariant x <= 10. One problem would be that this does not hold true of all values that the foreach statement will consider (since this example used a foreach loop with no range expression). The other problem is that the loop invariant of a foreach statement is not allowed to mention the index variable (this is also true of the for statement).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I need a better example here - it was questionable in the first place and trying to patch it with an invariant made it worse.

Note that the range expression is optional, and if omitted the loop will iterate over all
possible values of the types of the bound variables. This is only permitted if all such types
are enumerable, which is not true of the `real` type, for example. This supports an elegant
pattern for iterating over simple datatypes and other types with few values, including subset types.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a wonderful use case for testing !

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is cool!

* This means that the `&&` operation is not fully symmetrical, but this is already true as verification considers it to be "short-circuiting":
`b != 0 && a/b > 1` is valid but `a/b > 1 && b != 0` is not.
* This definition is chosen such that the multiplicities of results from `A && B` is the same as `B && A`, even if the ordering is not the same.
* For an expression of the form `A || B`, the enumeration will simply produce the values enumerated by `A` followed by those enumerated by `B`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This case was not covered even before for set comprehensions, it would complain with "heuristics cannot figure out blablah".
The solution is to manually compute the sequence on which to iterate. For example, replacing x in a || x in b by x in (a + b).
I think it would make sense to omit support for finding enumerations based on ||, and the IDE could suggest such rewritings.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, the existing heuristics only consider conjuncts and give up on disjuncts. I may do as you say and trim disjuncts out of scope initially - I just wanted to make sure there was a reasonable answer to the semantics of enumerating disjuncts so we have a firm foundation. If and when there's need for it, though, I'd prefer to just implement it rather than have users rewrite their code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MikaelMayer But part of the motivation here is efficiency, and merging sets isn't a great idea for performance.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup that's true. a + b at least strongly implies that a new value needs to be allocated at runtime, whereas enumerating x in a || x in b less so. There are definitely ways to optimize that (e.g. the lazy sequence concatenation logic in the C# and Java runtimes) but I like the idea of making it easier to infer the rough runtime cost just by looking at the structure of the code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if you wanted to enumerate the elements of a + b without duplication, you would explicitly write x in a || (x in b && !x in a), correct?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup that would work! It would only be efficient if both a and b had efficient in tests though.

0003-for-each.md Outdated Show resolved Hide resolved
0003-for-each.md Outdated Show resolved Hide resolved
0003-for-each.md Outdated Show resolved Hide resolved
0003-for-each.md Outdated
seq x: T | !A == seq x: T - seq x | A // Not an actual built-in operation, but a generalization of set subtraction.
// Only well-formed if T itself is enumerable.

seq x | x in [1, 2, 2] && x in [2, 2, 1] == [1, 2, 2, 2, 2]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not

Suggested change
seq x | x in [1, 2, 2] && x in [2, 2, 1] == [1, 2, 2, 2, 2]
seq x | x in [1, 2, 2] && x in [2, 2, 1] == [1, 2, 2, 2, 2, 1]

?
Also:

Suggested change
seq x | x in [1, 2, 2] && x in [2, 2, 1] == [1, 2, 2, 2, 2]
(seq x | x in [1, 2, 2] && x in [2, 2, 1]) == [1, 2, 2, 2, 2]

If you could fix parentheses below that'd be great. Parsing will always consider the seq expression to continue.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent question! The way I've defined the enumeration for conjuncts, the multiplicities are multiplied. The way this example breaks down is:

seq x | x in [1, 2, 2] && x in [2, 2, 1] 
== seq x | x in [1] && x in [2, 2, 1] 
 + seq x | x in [2, 2] && x in [2, 2, 1]
== [1] 
 + [2, 2, 2, 2]

And thanks for catching all the missing parentheses. We'll see if I'm sufficiently annoyed by them to improve that. :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh interesting. It's not obvious at all but it makes sense now. That would work the same way for a multiset, right?

Co-authored-by: Mikaël Mayer <MikaelMayer@users.noreply.github.com>
@robin-aws
Copy link
Member Author

Oh interesting. It's not obvious at all but it makes sense now. That would work the same way for a multiset, right?

(accepting the edit resolved the conversation and even un-resolving it still won't let me reply inline :)

Yes, the behaviour of multisets is similar:

(seq x | x in [1, 2, 2] && x in multiset{2, 1, 2}) == [1, 2, 2, 2, 2]
(seq x | x in multiset{2, 1, 2} && x in multiset{2, 1, 2}) == [2, 2, 1, 2, 2] // Or possibly other orderings

It's a bit unfortunate that although these rules are also well-defined for multiset comprehensions if we add those later, they don't line up with the semantics of multiset operations in Dafny: multiset intersection (*) produces the minimum cardinality rather than the product.


type SmallIndex = x: nat | 0 <= x < 8

foreach i: SmallIndex {
Copy link
Member

@keyboardDrummer keyboardDrummer Apr 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How are the enumeration values computed here? Is it possible to translate x: nat | 0 <= x < 8 to something that efficiently creates the right values of x ? And what if the condition is more complex, does it become an enumeration over all natural numbers at some point?

Does the developer need to know the default order of basic types? Could there be scenarios where the developer has an incorrect assumption about the order?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it definitely possible, and existing quantifiers and comprehensions already have a concept of "Bounded Pools" of values and how to iterate through them:

https://github.com/dafny-lang/dafny/blob/master/Source/Dafny/AST/DafnyAst.cs#L11349

If you use a set comprehension like set x: nat | 0 <= x < 8 in a compiled context, the resolver already detects that the range is a single IntBoundedPool, and the compiler will emit something like a foreach loop in the target language to build up the runtime set value: https://github.com/dafny-lang/dafny/blob/master/Source/Dafny/Compilers/SinglePassCompiler.cs#L4671-L4686

With the existing features, if the heuristics can't figure out a finite pool to search through, you get an error. That will be the case here as well unless you're using a foreach loop with an explicit decreases clause.

You're right that for some cases users will want to know the enumeration ordering of some of these pools. I make the argument in other places that we should already document all of this better, since it affects the runtime performance of several existing Dafny features like the above. That will likely be a beneficial side-effect of this work.

```
datatype Suit = Clubs | Diamonds | Hearts | Spades

foreach s: Suit {
Copy link
Member

@keyboardDrummer keyboardDrummer Apr 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if the type has an infinite size. How are the enumeration values generated?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my previous answer - does that cover it?

but is substantially more general and expressive. A very simple example only looks slightly different than expected:

```dafny
foreach x | x in [1, 2, 3, 4, 5] {
Copy link
Member

@keyboardDrummer keyboardDrummer Apr 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the time complexity of computing the enumeration values? Suppose I have

foreach x | x in xs && x in ys {

}

What time complexity should I expect?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question, and I should include that dimension in the more finalized definition of domain enumeration.

In that particular case it should be O(|xs| * L), where L is the cost of a in membership test in the ys collection. If ys is a set you'd expect it to be O(1) on average, but if it's a sequence you'd expect it to be O(|ys|).

In some cases we could likely optimize more cleverly, but we should be able to set upper bounds on the average and worst cases.

Copy link
Member

@keyboardDrummer keyboardDrummer Apr 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That xs serves as the value generating source here seems too implicit. You're saying that x in ys && x in xs could result in better performance if the membership test for xs is faster than for ys, but given that && is usually a lazy operator, my intuition would be to put the fastest membership test first.

which is generally a good default choice since this datatype offers constant-time access on average.
Although different copies of the same set must place each value into the same hashing bucket, multiple
values may be assigned to the same bucket, so it is necessary to use a secondary data structure of some kind for each bucket.
A common simple approach is to use a linked list,
Copy link
Member

@keyboardDrummer keyboardDrummer Apr 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the requirement of a deterministic ordering based only on the set of contained elements.

I'm confused, I would think that either:

  • You want to enumeration order over a collection to be defined, and then it must be deterministic as well
  • You don't define the enumeration order, and then it doesn't have to be deterministic either since apparently you don't care about it, likely because the order doesn't affect the output of your program. You might be summing the elements of a set for example.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need the order to be deterministic so that, for example, the identity S == S' ==> (seq x | x in S)[0] == (seq x | x in S')[0] always holds. That's critical for verification to be sound and agree with compiled runtime behavior. But you don't actually have to know whether seq x | x in {1, 2} is actually [1, 2] or [2, 1]. That's what I mean by "under-specified".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me the origin of the problem seems to be that you want to have an ordered iteration (like (seq x | x in S)) over an unordered collection.

Why not disallow an ordered iteration over an unordered collection?

[prior-art]: #prior-art

Sequence comprehensions bear a strong resemblance to list comprehensions in Python, which also include similar abilities to
bind multiple variables and filter the enumerated values: `[(x, y) for x in [1,2,3] for y in [3,1,4] if x != y]`
Copy link
Member

@keyboardDrummer keyboardDrummer Apr 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find there's a big difference to Python's comprehensions and the ones introduced in this RFC, namely that in Python a value generator has to be specified, while in this RFC the comprehension seems to extract a value generator from a type and a boolean expression.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely. It's really only related prior art in that Python's comprehensions are values expressed in a similar way rather than loops. Sequence comprehensions are definitely more declarative and less explicit about the implementation.


// After:
method AllPairs(s: seq<nat>) returns (result: seq<(nat, nat)>) {
foreach left | left in s {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not reuse the word "for"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As in still support the existing for <variable> := <lo> to <hi> { ... } syntax but also support for <variables> | <range> { ... }?

That should be doable in the parser AFAICT. I have a slight preference for how foreach reads but perhaps that's not worth the cost of yet another keyword. On the other hand communicating about the two different features will be easier if we can just say "for loops and foreach loops" versus "for loops over a range and for loops over a quantifier domain" or something like that. Open to opinions on this one.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's better to use different keywords for for and foreach loops. Their loop invariants are different, and (depending on the enumerator used) the latter may have an effect on the state beyond just the index variable.

foreach k, v | k in m.Keys && m[k] == v {
print "m[", k, "] == ", v;
}
```
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I like this!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I liked it even more when I just had

foreach k, v | m[k] == v {
  print "m[", k, "] == ", v;
}

but then realized I needed the k in m.Keys part. :)

Note that the existing `seq(size, i => i + 1)` syntax is inconsistently-named in the Dafny reference manual,
but we refer to it here as a "sequence construction" expression, to disambiguate it from the proposed sequence comprehension feature.
Sequence comprehensions are strictly more expressive, as any construction `seq(size, f)` can be rewritten as
`seq i | 0 <= i < size :: f(i)`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And we could eventually move seq(size, f) to a definition in the standard library, perhaps?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most likely - @RustanLeino was willing to fully deprecate the old version as long as we handled the equivalent sequence comprehension equally well in verification. Having an equivalent standard library version would make dropping support for it easier!

a boolean expression and iterates through all matching bindings. For example, the statement `foreach(a[int i] != 0) n += i;`
sums the indices of all non-zero elements of the array `a`. The JMatch version declares the bound variables inline in
the boolean expression, which is also proposed here as a future possibility below.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's another instance I'm aware of that provides a capability not available in this proposal. I don't know if it's worth the complexity of supporting, but perhaps worth considering (or perhaps including in here to say why it's not the right thing to do).

Cryptol has two sequence comprehension separators, sequential and parallel. The sequential separators work like those in Python (and Haskell), and like && in this proposal. For example, the cross product of two sequences is:

crossProduct xs ys = [ (x, y) | x <- xs, y <- ys ]

But you can also use a | instead of a comma to get this:

zip xs ys = [ (x, y) | x <- xs | y <- ys ]

You can also get this by zipping two iterators together first and then iterating over the result, which isn't much harder. The slightly more powerful (but sometimes confusing!) use case is when you alternate , and | in arbitrary ways.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting! Cryptol's version is at least worth naming as prior art as it's even closer to this proposal - thanks for the pointer!

My plan is more likely to handle that case when we support user-defined collection types. I included zipping as a key higher-order operation in these experiments, and I'm fairly convinced it's maximally "Dafny-like" to require the LHS and RHS have provably-equal lengths. Between that and the fact that there's no natural boolean operator to express zipping, I'd say it's it doesn't fit as well here.

@samuelgruetter
Copy link

It seems to me that this RFC introduces a notion of QuantifierDomain which mixes two things that all other languages I know keep separate:

  1. the source collections being iterated
  2. filtering predicates

If I understood correctly, the semantics of a seq comprehension depends on the syntactic structure of a QuantifierDomain, which looks like a normal Dafny expression, and there exist rewrites which, when performed on a normal Dafny expression do not change its semantics, but when performed on the same Dafny expression treated as a QuantifierDomain do change its semantics. This will be confusing both for users and for implementers of the verifier.

To give an example of such a rewrite, let's consider the expression

seq x | x in [1, 2, 3] && x in [3, 2, 1] && x < 3

If I understood the proposed semantics correctly, it would evaluate to [1, 2]:

seq x | x in [1, 2, 3] && x in [3, 2, 1] && x < 3
== seq x in [1] && x in [3, 2, 1] && x < 3
 + seq x in [2] && x in [3, 2, 1] && x < 3
 + seq x in [3] && x in [3, 2, 1] && x < 3
== [1] + [2] + []
== [1, 2]

Now add let's make the definition

function c(a: int, s1: seq<int>, s2: seq<int>): bool {
  a in s1 && a in s2
}

and ask: Does the seq comprehension

seq x | c(x, [1, 2, 3], [3, 2, 1]) && x < 3

produce the same result as the original one?
No matter whether the answer is Yes or No, I think there is a problem:

If the answer is No, it would mean that applying the rewrite lemma

forall x, c(x, [1, 2, 3], [3, 2, 1]) == x in [1, 2, 3] && x in [3, 2, 1]

in the original seq comprehension would change its result, which violates basic expectations about the behavior of equality (if A = B and A appears in C, replacing A by B in C does not change the result of C).

On the other hand, if the answer is Yes, what if I define

function c_swapped(a: int, s1: seq<int>, s2: seq<int>): bool {
  a in s2 && a in s1
}

and now consider the expression

seq x | c_swapped(x, [1, 2, 3], [3, 2, 1]) && x < 3

This one will produce [2, 1], because first, [3, 2, 1] is iterated, and then [1, 2, 3] serves as a filtering condition. But again, this is inconsistent with the rewrite:

forall x, c(x, [1, 2, 3], [3, 2, 1]) == c_swapped(x, [1, 2, 3], [3, 2, 1])

(note that to prove this equality, the Dafny verifier will unfold c and c_swapped and then use commutativity of &&, which doesn't hold in general in Dafny, but does hold in this particular example).

Now, a potential solution to this problem would be to say that no rewrites are allowed in QuantifierDomain expressions. However, this would also lead to surprises, because if we make the definition

function f(x: int): bool {
  x <= 2
}

and consider the modified seq comprehension

seq x | x in [1, 2, 3] && x in [3, 2, 1] && f(x)

Dafny should be able to prove that it's equivalent to the original one (otherwise proofs would become very cumbersome and confusing), but that does require applying the rewrite

forall x, (x < 3) == f(x)

in the original seq comprehension.

This problem would go away if we syntactically separated the source collection being iterated from the filtering predicates. The above example would then be written as

seq x <- [1, 2, 3] | x in [3, 2, 1] && x < 3

which would make it clear that [1, 2, 3] is the source collection, and x in [3, 2, 1] && x < 3 is the filtering predicate to be applied to each element.

Copy link
Member

@cpitclaudel cpitclaudel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this exceptional write-up, Robin.

I have left detailed comments. I have some high-level feedback, too.

  • On the performance aspect, I think we shouldn't require iterators to get good performance. There is no reason why removing one value from a set should be a slow operation. In fact, it should be fast, even if we are talking about pure sets. Let's fix that.
  • On determinism: at the moment, we don't need our sets, maps, etc. to have predictable ordering. Adding that is introduces a performance concern, and we should evaluate whether this is the best approach. Would it really be an issue if iteration order was unpredictable for sequences, even in expressions? You could still write sequence comprehensions in functions, and you could implement them "by method" as needed, allowing you to leverage the nondeterminism for performance and restore determinism as needed.

Note that this last point about sequence comprehensions in functions is already true today: in fact, we have sequence comprehensions already! It's already possible to ask Dafny (in a function context, where performance does not matter) for a var s: seq<T> :| forall t: T :: t in s <==> P(t) (which is essentially a sequence comprehension with filter P). It would be nice to have better syntax for this in ghost function contexts, but do we really need it in function method context?


// Or even better, as an expression:
function AllPairs(s: seq<nat>): seq<(nat, nat)> {
seq left, right | left in s && right in s
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But this code doesn't compute the same thing as above, does it? Is order guaranteed?

```

This is functionally correct, but because set subtraction requires creating a fresh copy,
when compiled this loop requires quadratic time and memory (or at least garbage-collection work) in the size of the set.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like a bug in the runtime, no?


A `foreach` loop closely resembles a `forall` statement, the key difference being that a `forall`
loop executes its body once for every tuple of quantified values simultaneously, whereas a `foreach` loop
executes its body once for each tuple of quantified values in sequence, one at a time.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The domain of a forall statement doesn't have to be enumerable, I think, so it's not superseded by foreach AFAICT.

but Dafny's heuristics can't figure out how to produce an enumeration for 'x'"`.

```dafny
foreach x: real | 0.0 <= x < 1.0 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this is not allowed, right?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope not.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add a comment to that effect just to be super clear :)

}
```

The domain is allowed to be potentially infinite if the loop is used in a compiled context and an explicit `decreases` clause is provided.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the domain here? If you have an invariant bounding the variable is that still an infinite domain?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. I think assuming that the invariant is allowed to refer to the bounds variables at all, I don't think it should be allowed to influence the domain enumeration.

Most mainstream programming languages include rich libraries for working with collections and iterators,
and define much the same operations such as "filter" and "flatMap".
The proposed Dafny features provide similar functionality at a higher level
of abstraction more amenable to verification: quantification expressions are used to succinctly declare the behavior of
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know about more amenable to verification: Do we think that the verifier will be better at dealing with seq expressions and the like than at dealing with plain for loops and/or recursive functions?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should have said here that they should be friendlier for users, in terms of how much effort it takes to prove properties of a sequence comprehension compared to an equivalent loop or recursive function. My hope here is that the verifier will already know enough about an expression like seq x | x in mySeq && P(x) such that no additional work is necessary.


```dafny
var setOfReals := {3.141, 2.718, 1.618};
var sortedList := seq x | x is real && x in setOfReals;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need x is real here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea was to provide a LHS that drives the desired ordering. I need to add this case (and more) to the ordering rules.


It would likely be prudent to limit these cases to those where the size of the domain
is easy to infer statically, so that the size of the array to allocate is known before
enumerating the values.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or we could say that these are equivalent to sequence comprehension + cast to array? I don't think we have a conversion to arrays from sequences, do we?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not directly, although it turns out there's a syntax I wasn't aware of similar to sequence constructions:

var arr := new int[10] (i => i * i);


This mirrors the `Collector` concept from the Java streaming APIs, and ideally the shape of the
parameter to `collect` expressions would define similar operations for merging intermediate results
to leave the door open for parallel or even distributed computation.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should look into the loop construct in lisp, which does all the above and more.

https://lispcookbook.github.io/cl-cookbook/iteration.html

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, thanks for the link! That's roughly the shape of what I'm thinking of.


One of the first milestones when implementing this RFC will be to prototype just enough verification
plumbing to evaluate how difficult proving such properties will be, and how many additional axioms
may need to be added to the Dafny prelude.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's a proof with Dafny as of today:

predicate method IsLower(s: set<int>, m: int) {
  forall y :: y in s ==> m <= y
}

function Min(s: set<int>) : (m: int)
  requires s != {}
  ensures m in s
  ensures IsLower(s, m)
{
  var m :| m in s;
  if y :| y in s && y < m then
    var s' := s - {m};
    assert s == s' + {m};
    Min(s')
  else m
}

predicate Increasing(s: seq<int>) {
  forall i, j | 0 <= i < j < |s| :: s[i] <= s[j]
}

lemma IncreasingConcat(hd: int, tl: seq<int>)
  requires Increasing(tl)
  requires forall x | x in tl :: hd <= x
  ensures Increasing([hd] + tl)
{
  var s := [hd] + tl;
  forall i, j | 0 <= i < j < |s| ensures s[i] <= s[j] {
    assert s[j] in tl;
  }
}

lemma IsLowerSeq(s: set<int>, sq: seq<int>, m: int)
  requires IsLower(s, m)
  requires multiset(sq) == multiset(s)
  ensures forall x | x in sq :: m <= x
{
  forall x | x in sq ensures m <= x {
    assert x in multiset(sq);
  }
}

function method Sort(sq: seq<int>) : (sq': seq<int>)
  ensures Increasing(sq')
  ensures multiset(sq') == multiset(sq)

lemma MultiSetAddInj<T>(b: multiset<T>, a1: multiset<T>, a2: multiset<T>)
  requires b + a1 == b + a2
  ensures a1 == a2
{
  assert forall k :: a1[k] == a2[k] by {
    forall k {
      calc {
        a1[k];
        (b[k] + a1[k]) - b[k];
        (b + a1)[k] - b[k];
        (b + a2)[k] - b[k];
        (b[k] + a2[k]) - b[k];
        a2[k];
      }
    }
  }
}

lemma SortedElements_Unique(s: set<int>, sq: seq<int>)
  requires multiset(sq) == multiset(s)
  requires Increasing(sq)
  ensures sq == SortedElements(s)
{
  if sq == [] {
    assert |sq| == |multiset(s)| == 0;
  } else {
    var hd := sq[0];
    var sq' := sq[1..];
    var s' := s - {hd};
    calc { hd in sq; hd in multiset(sq); hd in multiset(s); hd in s; }

    assert sq == [hd] + sq';
    assert multiset(sq) == multiset([hd]) + multiset(sq');
    assert s == {hd} + s';
    assert multiset(sq) == multiset([hd]) + multiset(s');
    MultiSetAddInj(multiset([hd]), multiset(sq'), multiset(s'));

    assert Increasing(sq');
    SortedElements_Unique(s', sq');
  }
}

function SortedElements(s: set<int>): (sq: seq<int>)
  ensures multiset(sq) == multiset(s)
  ensures Increasing(sq)
{
  if s == {} then []
  else
    var m := Min(s);
    var s := s - {m};
    var sq := SortedElements(s);
    IsLowerSeq(s, sq, m); IncreasingConcat(m, sq);
    [m] + sq
} by method {
  sq := [];
  var leftover := s;
  ghost var removed := {};
  while leftover != {}
    decreases leftover
    invariant leftover !! removed
    invariant multiset(sq) == multiset(removed)
    invariant multiset(leftover + removed) == multiset(s)
  {
    var x :| x in leftover;
    leftover := leftover - {x};
    removed := removed + {x};
    sq := sq + [x];
  }
  sq := Sort(sq);
  SortedElements_Unique(s, sq);
}

method Foo() {
  var smallSet := {1, 2};
  var smallSortedSeq := SortedElements(smallSet);
  // smallSortedSeq is fully specified, but can the verifier figure that out?
  assert multiset(smallSortedSeq) == multiset{1, 2};
  assert smallSortedSeq == [1, 2];
}

Can we think about how that proof would change?

@atomb
Copy link
Member

atomb commented Apr 1, 2022

I had another thought, related to @samuelgruetter's response, and perhaps conveying roughly the same idea in a different way. In this proposal, the in keyword is being used in two ways: as a binder and as a predicate. The existing languages I'm aware of avoid some of these semantic complexities by allowing a variable to be bound only once and providing another mechanism for membership testing. So, in Haskell, you could write [x | x <- xs, member x ys] to give you all elements of xs that are also in ys. But [x | x <- xs, x <- ys] wouldn't be allowed.

@MikaelMayer
Copy link
Member

To the excellent @samuelgruetter 's analysis of cases, I answer this:

  • Similar for set comprehension, it's not possible to put the expression x in s in an external predicate and have the resolution find a domain from it. If the predicate was o.pred, it would be changed in implementing traits, for example". Hence, I'm not worried about this: if there is a predicate in the expression, it will always use it as a filter and not as a collection source.
  • For sure, the verifier would not be able to prove sequence equality just by proving that the two predicates are equivalent, based on your examples. Although it was the case for set and multiset, it cannot be the case for seq because of ordering. Hence, unless we find clear rules to encode ordering into logical formulas, the only property we will have about each element is the predicate that the elements were taken from. We wouldn't have the axiom (forall x. P(x) == Q(x)) ==> (seq x | P(x)) == (seq x | Q(x)), but we would definitely have (forall x. P(x) == Q(x)) ==> set(seq x | P(x)) == set(seq x | Q(x)) or equivalent.
  • If we wanted to encode the problem of ordering in the verifier, we should definitely explicitly encode the collection as parsed to it. For example, consider your example again:

seq x | x in X && x in Y && P(x) :: F(x)

Based on Robin's proposal, Dafny's heuristics will decide that the collection to iterate on is X and that the rest x in Y && P(x) is a predicate, which is implicit, and for the verifier we need to make it explicit. However, we cannot simply assign a mapping for each index to its value, because they will be offset every time a predicate is false. This is more complicated than what we ever did before with multiset.
What we really want to achieve for the verifier is this unambiguous encoding:

var howManyindicesBefore = j => |(set j' | x' == X[j] && x' in Y && P(x')) && j' < j)|;
var seqImage := map i, x | i in X && (x == X[i] && x in Y && P(x)) :: howManyindicesBefore(i) := x
seq(|indices|, i => (var x := seqImage(i); F(x)))

That should take care of everything, including proving or disproving equality of seq expressions.


// After:
method AllPairs(s: seq<nat>) returns (result: seq<(nat, nat)>) {
foreach left | left in s {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's better to use different keywords for for and foreach loops. Their loop invariants are different, and (depending on the enumerator used) the latter may have an effect on the state beyond just the index variable.


// Or even better, as an expression:
function AllPairs(s: seq<nat>): seq<(nat, nat)> {
seq left, right | left in s && right in s
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The proposed seq comprehension on L47 does not have an analogous set comprehension today, because a set comprehension with more than one bound variable is required to include a :: Expr at the end. To look more like what we have today, L47 would be better rendered as

seq left, right | left in s && right in s :: (left, right)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, that's a typo. Will fix.

// s is a set<int>
var ss := s;
while ss != {}
decreases |ss|
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This decreases clause is not needed, since the default decreases ss works fine for this loop.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ha, don't tell me, tell the reference manual I copied it from. :) I can address it there separately.

The runtime implementation of this loop can use the native iteration features of the target language.

Going further, it is better to avoid writing any imperative loop at all where possible. Sticking to immutable data and
expressions rather than statements allows logic to be used in specifications as well as implementation,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Up until this point in the document, there are several feedback questions about the ordering. This will be answered later in the document, I presume, but it would be good to add a sentence somewhere here in the motivation section that reassures the reader that order will be dealt with later, and for now the examples are more intended in a dream-like fashion ("wouldn't it be nice if you could write something like this--and don't think too hard about the ordering yet").


```dafny
var myDoubleDeckerSeq: seq<seq<int>> := ...;
foreach x, y | x in myDoubleDeckerSeq && y in x && y != 0 {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I imagine that this particular range expression can be equivalently written as y != 0 && y in x && x in myDoubleDeckerSeq.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That version would be rejected, actually, since the rules for && say that the enumeration is nested left-to-right. y != 0 is enumerable (as a sub-domain of int) but not finite, and then y in x for each value of y would not even be enumerable (i.e. all possible sequences that contain y).


A `foreach` loop closely resembles a `forall` statement, the key difference being that a `forall`
loop executes its body once for every tuple of quantified values simultaneously, whereas a `foreach` loop
executes its body once for each tuple of quantified values in sequence, one at a time.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that the forall statement is not a loop. It is an aggregate statement. Its compiled form is like a generalization of simultaneous assignment. The beauty of the forall statement is that you use it without a loop invariant. :O This is made possible because--again--the forall statement is not a loop in the first place!

Loops (while, for, foreach) sequence their iterations. And to use them, you need a loop invariant, so that you can capture "what has been done so far".

Comment on lines +136 to +137
Similarly to set comprehensions or assign-such-that statements, the domain of a `foreach` loop
must be enumerable. The following loop would produce the error `"the domain of a foreach loop must be enumerable,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The domain has to be enumerable only if the set comprehension or assign-such-that statement is compiled. It is not a requirement in ghost contexts. Is that true also for foreach? It seems like a ghost foreach could support domains that are not enumerable, provided they still are ordered.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We just chatted about this and came to the conclusion that a ghost foreach should still require an enumerable domain, since otherwise it's hard to imagine the semantics of a loop where the body can observe the effects of the executions before it.

Furthermore it should also be finite, since otherwise there isn't a well-defined notion of the state of the program "after" the infinite loop. This isn't the case for a forall statement since there is no looping, and the body executes once for each set of variable bindings simultaneously.

but Dafny's heuristics can't figure out how to produce an enumeration for 'x'"`.

```dafny
foreach x: real | 0.0 <= x < 1.0 {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope not.


```dafny
foreach x: nat
decreases 10 - x
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What should be added is not an invariant, but a range expression. There would be two problems with an invariant x <= 10. One problem would be that this does not hold true of all values that the foreach statement will consider (since this example used a foreach loop with no range expression). The other problem is that the loop invariant of a foreach statement is not allowed to mention the index variable (this is also true of the for statement).

Comment on lines +193 to +194
There is no builtin support for automatically tracking the enumerated values so far, as there is for
`iterator` values (see https://dafny-lang.github.io/dafny/DafnyRef/DafnyRef#sec-iterator-types).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to be needed for every loop that you want to prove something about. It would be good to have built-in support for this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am actually hoping that many such loops can be expressed as sequence comprehensions instead, where that isn't necessary. I'm open to using the same xs trick if we have to, but trying to leave it out of the initial scope.

@robin-aws
Copy link
Member Author

Thank you SO much for all the thoughtful reviews and comments everyone! I honestly wasn't expecting this much to go through so quickly and it's going to take me multiple days to respond, but rest assured I have no intention of ignoring any questions or feedback. Quite the contrary, I'm deeply enjoying the discussion. :)

One thing I know already is that I need to make my sketch of the enumeration domain calculation more complete and rigorous, so I'm definitely going to tackle that as a response to several comments.

@robin-aws robin-aws self-assigned this Apr 5, 2022
@robin-aws
Copy link
Member Author

I was intending to clarify how I've defined the ordered enumeration of quantifier domains, but after thinking about some of the common feedback here I've decided it really would be better not to extract it implicitly out of the range expression. I believe I have an alternative that loses barely any useful expressivity, if any. It will take some time to revise the whole proposal, but in a nutshell:

  • The definition of bound variables for any QuantifierDomain will be extended to allow in expressions specifying the source of the ordered enumeration, such as seq x in [1, 2, 3] :: x * x. This is optional, and an undecorated variable declaration will be enumerated according to its type's natural ordering, as in seq x | 0 <= x < 10.
  • Such domains can still bind multiple variables, and the enumeration source of one can depend on the variables bound earlier, as in seq x in myDoubleDeckerSeq, y in x | y != 0 :: y.
  • The LHS of the _ in C syntax may be a CasePattern as in let expressions and variable declaration statements. This allows for easy destructuring, e.g. seq (k, v) in myMap.Items :: v, and can also be used to filter before binding variables: the example of extracting optional values becomes set Some(v) in myOptionals :: v or even just set Some(v) in myOptionals.

The only case this doesn't handle cleanly is the x in C1 || x in C2 application with the semantics of concatenating enumerations. There could be more syntax to handle this as well but I can't think of anything intuitive, and I don't think it's worth it.

@MikaelMayer
Copy link
Member

  • Such domains can still bind multiple variables, and the enumeration source of one can depend on the variables bound earlier, as in seq x in myDoubleDeckerSeq, y in x | y != 0 :: y.

That's a very interesting idea, but I think it raises new questions. For example:

seq i in [0, 1, 2, 3, 4], y in x[F(i+2)], z in w[y] | i % 3 != 0 && y % 2  == 0 && z %2 == 1 :: z

How would you compile that? You cannot emit the range after iterating over the three variables, that would be under-performing at best, or even cause crashes at worst (for example if the condition of F(i+2) requires input % 2 != 2.
You cannot emit the range before iterating over the three variables. Hence, you'll have to find heuristically an interleaving.
But now, for verifications, you'll need this same interleaving, which is going to be a nightmare.

Consider now the much lower complexity of your original proposal, both for verification and compilation:

seq i: int, y: int, z: int | i in [0, 1, 2, 3, 4] && i % 3 != 0 && y in x[F(i+2)] && y % 2  == 0 && z in w[y] &&  z %2 == 1 :: z

Now it's clear how it is going to be both compiled and verified, just following the order of the operands.

If you are really concerned about differentiating condition from enumeration, you could add the binary operator "<-" to the set of expressions, which translates to "in", except for comprehensions.
So for example, the above could be written as this:

seq i: int, y: int, z: int | i <- [0, 1, 2, 3, 4] && i % 3 != 0 && y <- x[F(i+2)] && y % 2  == 0 && z <- w[y] &&  z %2 == 1 :: z

That way, it would be obvious that the operator "<-" is not tolerated in expressions other than comprehensions so they cannot be refactored.

Moreover, you could easily distinguish an enumeration from a pure test, so you could actually have this:

set i: int | i <- A && i in B

does not mean the same computationally as

set i: int | i in A && i <- B

The first one will enumerate A (and ensure i in A) before checking at run-time that i in A && i in B (and the first check could be optimized away)
The second will surely enumerate B (and ensure i in B) before checking at run-time that i in A && i in B (and the second check could be optimized away) and a clever IDE would surely mention a warning about defining enumeration after a test, that we can still infer a meaning but it's not obvious.

@robin-aws
Copy link
Member Author

robin-aws commented Apr 12, 2022

@MikaelMayer Yes, the new form would sacrifice some expressive power, in favour of avoiding the confusion of overloading boolean expressions with ordering/multiplicity semantics. In your example as written, I would expect the F(i+2) subexpression to be rejected as not well-formed, if F had a precondition that was only satisfied if i % 3 != 0 as specified in the range. That can be worked around by pulling the necessary filters into nested sequence comprehensions though, which still don't necessarily need to be materialized as actual sequence values at runtime:

seq i in (seq i in [0, 1, 2, 3, 4] | i % 3 != 0), 
    y in (seq y in x[F(i+2)] | y % 2  == 0), 
    z in w[y] | z % 2 == 1 
    :: z

I think this aligns well with the fact that the LHS of an && expression can be assumed true on the RHS, but Dafny does not currently go out of its way to find other true expressions from elsewhere in the program to try to make the RHS well-formed.

I should also say I'm hoping I can weaken the statement that "in expressions on bound variables define the enumeration source" to "in expressions on bound variables define the ordering of results". That would give the heuristics more leeway to choose a more efficient enumeration strategy, and allow cases where the variable type is ordered but not enumerable (which was previously a less crisp Future Possibility, but seems more natural with this version):

var setOfReals := {3.141, 2.718, 1.618};
// Ordered by the implicit `real` type of `x`
// At runtime, setOfReals is the enumeration source,
// but the results are then sorted.
var sortedList := seq x | x in setOfReals;  
assert sortedList == [1.618, 2.718, 3.141];

@robin-aws
Copy link
Member Author

robin-aws commented Apr 13, 2022

@cpitclaudel - I've been meaning to respond to your top-level comments for a while, apologies for the delay. :)

  • On the performance aspect, I think we shouldn't require iterators to get good performance. There is no reason why removing one value from a set should be a slow operation. In fact, it should be fast, even if we are talking about pure sets. Let's fix that.

It's not a question of the runtimes being inefficient though, it's a more fundamental problem with sets being immutable values. If you have the statement s := s - {x}, the simplest implementation of that is to create a copy of s with x removed, which involves allocating a new set and copying over all the other values. You could implement the subtraction lazily as sequence concatenation is for some backends, but that often just shifts the O(n) work to later on, and a loop to remove all elements as in the example still ends up taking quadratic time. You could make the runtime representation of sets mutable, but then you need to know that no other code has a reference to the old value of s anywhere, so you need either sophisticated aliasing analysis in the resolver or to augment the Dafny language itself with some notion of linear typing.

  • On determinism: at the moment, we don't need our sets, maps, etc. to have predictable ordering. Adding that is introduces a performance concern, and we should evaluate whether this is the best approach. Would it really be an issue if iteration order was unpredictable for sequences, even in expressions? You could still write sequence comprehensions in functions, and you could implement them "by method" as needed, allowing you to leverage the nondeterminism for performance and restore determinism as needed.

Making the ordering unpredictable in expressions leads to unsoundness unfortunately (and thanks to @RustanLeino for the example, which I've tweaked to use a sequence comprehension instead of :|):

function method Pick(s: set<int>): int
  requires s != {}
{
  (seq x in s)[0]
}

method Main() {
  var s0 := {6, 28, 496};
  var s1 := {6, 496} + {28};
  assert s0 == s1;
  var x, y := Pick(s0), Pick(s1);
  assert x == y;
  if x != y {
    var badnessEnsues := 5 / 0;
  }
}

Note that this last point about sequence comprehensions in functions is already true today: in fact, we have sequence comprehensions already! It's already possible to ask Dafny (in a function context, where performance does not matter) for a var s: seq<T> :| forall t: T :: t in s <==> P(t) (which is essentially a sequence comprehension with filter P). It would be nice to have better syntax for this in ghost function contexts, but do we really need it in function method context?

You're correct that you can describe any notion of sequence comprehensions in a specification context (your expression is weaker than what I'm proposing since it doesn't specify ordering or multiplicity, but I'm sure there's a more complicated version that does as Mikael outlined earlier :). I've definitely seen a lot of compiled Dafny code that needs to process set data somehow though, and the expressions we're talking about here will at best be extremely inefficient to calculate at runtime, and more likely not compilable at all.

@cpitclaudel
Copy link
Member

the simplest implementation of that is to create a copy of s with x removed

Well, let's not use the simplest implementation then :) Pure sets are very fast in practice, and certainly don't need to be quadratic. A simple balanced tree guarantees O(log n) for all operations, and we can do much better if iteration order is left unspecified.

Making the ordering unpredictable in expressions leads to unsoundness unfortunately

Not more so than it does today with :|, which is sound and unpredictable. What am I missing?

@robin-aws
Copy link
Member Author

the simplest implementation of that is to create a copy of s with x removed

Well, let's not use the simplest implementation then :) Pure sets are very fast in practice, and certainly don't need to be quadratic. A simple balanced tree guarantees O(log n) for all operations, and we can do much better if iteration order is left unspecified.

My bad, I'm talking about the simplest way to compile the source language semantics to the target runtime, not the simplicity of the runtime representation of set<T> itself. Let's assume for the moment that the target language uses an implementation called TreeSet<T>. Suppose the source Dafny to compile is:

var s := {1, 2, 3};
var x := 2;
var oldS := s;
s := s - {x};
print oldS;
print s;

The way the code generation currently works would produce output something like:

var s = new TreeSet<int>({1, 2, 3});
var x = 2;
var oldS = s;
var sTmp = new TreeSet<int>(s);
sTmp.Remove(x);
s = sTmp;
Console.Write(oldS); // {1, 2, 3}
Console.Write(s);    // {1, 3}

Having to make the fresh copy of s so we can remove x is the part that dominates the asymptotic runtime, regardless of how efficient the Remove implementation is. But we have to make the copy in case there is a reference to the original value somewhere, so this translation would not be valid:

var s = new TreeSet<int>({1, 2, 3});
var x = 2;
var oldS = s;
s.Remove(x);
Console.Write(oldS); // {1, 3} <- incorrect
Console.Write(s);    // {1, 3}

I was describing alternate translation strategies above for optimizing this, such as creating a TreeSetWithOneRemoved(s, x) lazy proxy of some kind, or doing the analysis to detect when a reference like oldS is floating around. But none of them are easy or without their downsides. I'll add this detail to the proposal so it's clearer why providing a higher level feature for this iteration is a better tradeoff.

Making the ordering unpredictable in expressions leads to unsoundness unfortunately

Not more so than it does today with :|, which is sound and unpredictable. What am I missing?

The missing part is that a :| instance has to be predictable if compiled, because it has to produce an actual concrete value. If I used var x :| x in s; x instead of (seq x in s)[0] in my example, then the verifier would produce the error "to be compilable, the value of a let-such-that expression must be uniquely determined", precisely so you don't get the unsound runtime behavior the example shows. The "Compiling Hilbert's epsilon operator" paper describes this restriction in detail, but I don't see it anywhere in the reference manual and we need to fix that! :)

This is why I'm proposing an implementation strategy for making more domains predictably enumerable, which as a side-effect would allow us to remove this uniquely-determined restriction on compiled :| expressions. I am tempted to factor that out into future work though.

@MikaelMayer
Copy link
Member

MikaelMayer commented Apr 14, 2022 via email

@cpitclaudel
Copy link
Member

cpitclaudel commented Apr 14, 2022

Having to make the fresh copy of s so we can remove x is the part that dominates the asymptotic runtime, regardless of how efficient the Remove implementation is.

I'm not sure I agree. With the right TreeSet representation (specifically, an immutable treeset), copies are O(1); hence, the concern really is how efficient the remove operation is, and a decent implementation will give you log(n) on remove.

I was describing alternate translation strategies above for optimizing this, such as creating a TreeSetWithOneRemoved(s, x) lazy proxy of some kind, or doing the analysis to detect when a reference like oldS is floating around. But none of them are easy or without their downsides.

Eventually we do want to do this, but we don't need to right now: The C# ImmutableHashSet, for example, has O(1) copies and O(log n) removal. So even with our current implementation we can get good performance out of our existing loops. The default f# set, which is very similar to Dafny, is the same (https://github.com/fsharp/fsharp/blob/master/src/fsharp/FSharp.Core/set.fs).

We could even get rid of the log factor if we didn't have to be deterministic — but unfortunately we do.

This is why I'm proposing an implementation strategy for making more domains predictably enumerable, which as a side-effect would allow us to remove this uniquely-determined restriction on compiled :| expressions.

I think that would be bad: when I write a function with x :| P(x) I really mean to say something about any such x, not a specific one. Should I really be able to prove anything about x after writing x :| P(x), beyond the fact that P(x) holds?

@cpitclaudel
Copy link
Member

@MikaelMayer I'm not sold on 3) (in the operator and in the keyword being confusing): don't C# and Python do the same?

@MikaelMayer
Copy link
Member

@MikaelMayer I'm not sold on 3) (in the operator and in the keyword being confusing): don't C# and Python do the same?

Indeed, maybe just I am confused. You win 3) :-)

@robin-aws
Copy link
Member Author

robin-aws commented May 12, 2022

Closing in favour of the revised version at #9 #10.

I've tried very hard to make sure all the comments have been responded to, if they aren't now moot, but please let me knew on the new PR if I've missed any!

@robin-aws robin-aws closed this May 12, 2022
@MikaelMayer
Copy link
Member

Closing in favour of the revised version at #9.

You mean #10 ?

@robin-aws
Copy link
Member Author

Closing in favour of the revised version at #9.

You mean #10 ?

That's the one. :) Thanks, edited.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants