foreach loops and sequence comprehensions #9

robin-aws · 2022-03-31T05:00:05Z

Design of the first step of dafny-lang/dafny#1753, and follow-up to #4

cc @samuelgruetter, @backtracking, @mschlaipfer, @parno, @RustanLeino

UPDATE: I'm planning on revising this soon based on ideas from the awesome discussion so far, so any interested parties should at least be aware of this comment, and may want to wait for the next revision (and I may even cut a fresh RFC to reset and avoid any confusion).

MikaelMayer

I'm leaving initial comments. Yay, that looks like a very useful feature, especially seq comprehensions

MikaelMayer · 2022-03-31T14:34:56Z

0003-for-each.md

+    foreach right | right in s {
+      result := result + (left, right);
+    }
+  }


It would also be good to have the following

foreach left, right | left in s && right in s { result := result + (left, right); }

I had that thought too - that's equally valid (and does illustrate the power of the more general foreach) but I was a little wary of implying this is always the more idiomatic way to do it. I felt more confident it made sense in the sequence comprehension version.

MikaelMayer · 2022-03-31T14:39:11Z

0003-for-each.md

+// Or even better, as an expression:
+function AllPairs(s: seq<nat>): seq<(nat, nat)> {
+  seq left, right | left in s && right in s
+}


Indeed, that would be a nice addition to the language.

If you want to make your example more compelling (meaning no comprehension can emulate your basic example, you could consider doing two operations at once, like computing an out variable "maximum".

maximum := 0; foreach left, right | left in s && right in s { result := result + (left, right); var tmpMax := left + right; maximum := if tmpMax > maximum then tmpMax else maximum; }

Good point, I'll put some thought into improving the motivating example - I think even the eventual user-facing documentation should have a good example of where a foreach loop is necessary because a sequence comprehension doesn't cut it.

Yeah, it seems to me that, even if foreach and sequence comprehensions were identical in what they could express, foreach would feel a lot more natural if the body is very imperative.

This is part of why it feels right to me to design both features at once, as they are kind of two sides of the same fundamental coin: imperative and functional.

MikaelMayer · 2022-03-31T14:40:18Z

0003-for-each.md

+```dafny
+foreach i | i in s {
+  print i, "\n";
+}


That would be soo compelling. I hope you can break this down further and obtain an iterator for a set.

It's a strong future possibility at least. Only supporting these higher-level operations over the elements of a set buys us a ton for now though, so I'm comfortable leaving it as future work.

I really want to be able to write those three lines, too!

I'd also love to see set iterators in the future, but agree that it's not necessary in the first iteration.

In line with what @samuelgruetter said above, though: since in Dafny i in [1,2,3] <==> i in [3,2,1], shouldn't foreach i | i in [1,2,3] { … } be the same as foreach i | i in [3,2,1] { … }?

Since foreach is not an expression but a statement, its semantics require an ordering, and we would syntactically extract this ordering from the expression. In my answer below, I explain how we could encode the ordering for the verifier, so that the two foreach statement you mention wouldn't be the same.

That's not right: var s: seq<int> :| set(s) == {1,2,3} is an expression, and it doesn't require an ordering.

It does not require ordering because there is only one statement to execute :-)
To be fair, I should have said, is a statement that will execute other statements in some orders

MikaelMayer · 2022-03-31T14:54:20Z

0003-for-each.md

+
+```dafny
+foreach x: nat 
+  decreases 10 - x


Shouldn't you add the invariant that x is less or equal to 10?

Yup, this is what happens when you can't actually run the verifier to catch your mistakes. :) Thanks.

What should be added is not an invariant, but a range expression. There would be two problems with an invariant x <= 10. One problem would be that this does not hold true of all values that the foreach statement will consider (since this example used a foreach loop with no range expression). The other problem is that the loop invariant of a foreach statement is not allowed to mention the index variable (this is also true of the for statement).

I think I need a better example here - it was questionable in the first place and trying to patch it with an invariant made it worse.

MikaelMayer · 2022-03-31T14:55:30Z

0003-for-each.md

+Note that the range expression is optional, and if omitted the loop will iterate over all
+possible values of the types of the bound variables. This is only permitted if all such types
+are enumerable, which is not true of the `real` type, for example. This supports an elegant
+pattern for iterating over simple datatypes and other types with few values, including subset types.


That's a wonderful use case for testing !

Yeah, this is cool!

MikaelMayer · 2022-03-31T15:24:22Z

0003-for-each.md

+    * This means that the `&&` operation is not fully symmetrical, but this is already true as verification considers it to be "short-circuiting": 
+      `b != 0 && a/b > 1` is valid but `a/b > 1 && b != 0` is not.
+    * This definition is chosen such that the multiplicities of results from `A && B` is the same as `B && A`, even if the ordering is not the same.
+  * For an expression of the form `A || B`, the enumeration will simply produce the values enumerated by `A` followed by those enumerated by `B`.


This case was not covered even before for set comprehensions, it would complain with "heuristics cannot figure out blablah".
The solution is to manually compute the sequence on which to iterate. For example, replacing x in a || x in b by x in (a + b).
I think it would make sense to omit support for finding enumerations based on ||, and the IDE could suggest such rewritings.

True, the existing heuristics only consider conjuncts and give up on disjuncts. I may do as you say and trim disjuncts out of scope initially - I just wanted to make sure there was a reasonable answer to the semantics of enumerating disjuncts so we have a firm foundation. If and when there's need for it, though, I'd prefer to just implement it rather than have users rewrite their code.

@MikaelMayer But part of the motivation here is efficiency, and merging sets isn't a great idea for performance.

Yup that's true. a + b at least strongly implies that a new value needs to be allocated at runtime, whereas enumerating x in a || x in b less so. There are definitely ways to optimize that (e.g. the lazy sequence concatenation logic in the C# and Java runtimes) but I like the idea of making it easier to infer the rough runtime cost just by looking at the structure of the code.

So if you wanted to enumerate the elements of a + b without duplication, you would explicitly write x in a || (x in b && !x in a), correct?

Yup that would work! It would only be efficient if both a and b had efficient in tests though.

0003-for-each.md

MikaelMayer · 2022-03-31T15:27:22Z

0003-for-each.md

+seq x: T | !A == seq x: T - seq x | A       // Not an actual built-in operation, but a generalization of set subtraction.
+                                            // Only well-formed if T itself is enumerable.
+
+seq x | x in [1, 2, 2] && x in [2, 2, 1] == [1, 2, 2, 2, 2]


Why not

Suggested change

seq x | x in [1, 2, 2] && x in [2, 2, 1] == [1, 2, 2, 2, 2]

seq x | x in [1, 2, 2] && x in [2, 2, 1] == [1, 2, 2, 2, 2, 1]

?
Also:

Suggested change

seq x | x in [1, 2, 2] && x in [2, 2, 1] == [1, 2, 2, 2, 2]

(seq x | x in [1, 2, 2] && x in [2, 2, 1]) == [1, 2, 2, 2, 2]

If you could fix parentheses below that'd be great. Parsing will always consider the seq expression to continue.

Excellent question! The way I've defined the enumeration for conjuncts, the multiplicities are multiplied. The way this example breaks down is:

seq x | x in [1, 2, 2] && x in [2, 2, 1] == seq x | x in [1] && x in [2, 2, 1] + seq x | x in [2, 2] && x in [2, 2, 1] == [1] + [2, 2, 2, 2]

And thanks for catching all the missing parentheses. We'll see if I'm sufficiently annoyed by them to improve that. :)

Oh interesting. It's not obvious at all but it makes sense now. That would work the same way for a multiset, right?

Co-authored-by: Mikaël Mayer <MikaelMayer@users.noreply.github.com>

robin-aws · 2022-03-31T20:23:48Z

Oh interesting. It's not obvious at all but it makes sense now. That would work the same way for a multiset, right?

(accepting the edit resolved the conversation and even un-resolving it still won't let me reply inline :)

Yes, the behaviour of multisets is similar:

(seq x | x in [1, 2, 2] && x in multiset{2, 1, 2}) == [1, 2, 2, 2, 2]
(seq x | x in multiset{2, 1, 2} && x in multiset{2, 1, 2}) == [2, 2, 1, 2, 2] // Or possibly other orderings

It's a bit unfortunate that although these rules are also well-defined for multiset comprehensions if we add those later, they don't line up with the semantics of multiset operations in Dafny: multiset intersection (*) produces the minimum cardinality rather than the product.

keyboardDrummer · 2022-04-01T10:20:01Z

0003-for-each.md

+
+type SmallIndex = x: nat | 0 <= x < 8
+
+foreach i: SmallIndex {


How are the enumeration values computed here? Is it possible to translate x: nat | 0 <= x < 8 to something that efficiently creates the right values of x ? And what if the condition is more complex, does it become an enumeration over all natural numbers at some point?

Does the developer need to know the default order of basic types? Could there be scenarios where the developer has an incorrect assumption about the order?

Is it definitely possible, and existing quantifiers and comprehensions already have a concept of "Bounded Pools" of values and how to iterate through them:

https://github.com/dafny-lang/dafny/blob/master/Source/Dafny/AST/DafnyAst.cs#L11349

If you use a set comprehension like set x: nat | 0 <= x < 8 in a compiled context, the resolver already detects that the range is a single IntBoundedPool, and the compiler will emit something like a foreach loop in the target language to build up the runtime set value: https://github.com/dafny-lang/dafny/blob/master/Source/Dafny/Compilers/SinglePassCompiler.cs#L4671-L4686

With the existing features, if the heuristics can't figure out a finite pool to search through, you get an error. That will be the case here as well unless you're using a foreach loop with an explicit decreases clause.

You're right that for some cases users will want to know the enumeration ordering of some of these pools. I make the argument in other places that we should already document all of this better, since it affects the runtime performance of several existing Dafny features like the above. That will likely be a beneficial side-effect of this work.

keyboardDrummer · 2022-04-01T10:37:13Z

0003-for-each.md

+```
+datatype Suit = Clubs | Diamonds | Hearts | Spades
+
+foreach s: Suit {


What happens if the type has an infinite size. How are the enumeration values generated?

See my previous answer - does that cover it?

keyboardDrummer · 2022-04-01T10:51:22Z

0003-for-each.md

+but is substantially more general and expressive. A very simple example only looks slightly different than expected:
+
+```dafny
+foreach x | x in [1, 2, 3, 4, 5] {


What is the time complexity of computing the enumeration values? Suppose I have

foreach x | x in xs && x in ys { }

What time complexity should I expect?

Good question, and I should include that dimension in the more finalized definition of domain enumeration.

In that particular case it should be O(|xs| * L), where L is the cost of a in membership test in the ys collection. If ys is a set you'd expect it to be O(1) on average, but if it's a sequence you'd expect it to be O(|ys|).

In some cases we could likely optimize more cleverly, but we should be able to set upper bounds on the average and worst cases.

That xs serves as the value generating source here seems too implicit. You're saying that x in ys && x in xs could result in better performance if the membership test for xs is faster than for ys, but given that && is usually a lazy operator, my intuition would be to put the fastest membership test first.

keyboardDrummer · 2022-04-01T11:02:09Z

0003-for-each.md

+which is generally a good default choice since this datatype offers constant-time access on average.
+Although different copies of the same set must place each value into the same hashing bucket, multiple
+values may be assigned to the same bucket, so it is necessary to use a secondary data structure of some kind for each bucket.
+A common simple approach is to use a linked list,


the requirement of a deterministic ordering based only on the set of contained elements.

I'm confused, I would think that either:

You want to enumeration order over a collection to be defined, and then it must be deterministic as well

You don't define the enumeration order, and then it doesn't have to be deterministic either since apparently you don't care about it, likely because the order doesn't affect the output of your program. You might be summing the elements of a set for example.

You need the order to be deterministic so that, for example, the identity S == S' ==> (seq x | x in S)[0] == (seq x | x in S')[0] always holds. That's critical for verification to be sound and agree with compiled runtime behavior. But you don't actually have to know whether seq x | x in {1, 2} is actually [1, 2] or [2, 1]. That's what I mean by "under-specified".

To me the origin of the problem seems to be that you want to have an ordered iteration (like (seq x | x in S)) over an unordered collection.

Why not disallow an ordered iteration over an unordered collection?

keyboardDrummer · 2022-04-01T11:14:13Z

0003-for-each.md

+[prior-art]: #prior-art
+
+Sequence comprehensions bear a strong resemblance to list comprehensions in Python, which also include similar abilities to 
+bind multiple variables and filter the enumerated values: `[(x, y) for x in [1,2,3] for y in [3,1,4] if x != y]`


I find there's a big difference to Python's comprehensions and the ones introduced in this RFC, namely that in Python a value generator has to be specified, while in this RFC the comprehension seems to extract a value generator from a type and a boolean expression.

Definitely. It's really only related prior art in that Python's comprehensions are values expressed in a similar way rather than loops. Sequence comprehensions are definitely more declarative and less explicit about the implementation.

cpitclaudel · 2022-04-01T14:56:17Z

0003-for-each.md

+
+// After:
+method AllPairs(s: seq<nat>) returns (result: seq<(nat, nat)>) {
+  foreach left | left in s {


Why not reuse the word "for"?

As in still support the existing for <variable> := <lo> to <hi> { ... } syntax but also support for <variables> | <range> { ... }?

That should be doable in the parser AFAICT. I have a slight preference for how foreach reads but perhaps that's not worth the cost of yet another keyword. On the other hand communicating about the two different features will be easier if we can just say "for loops and foreach loops" versus "for loops over a range and for loops over a quantifier domain" or something like that. Open to opinions on this one.

I think it's better to use different keywords for for and foreach loops. Their loop invariants are different, and (depending on the enumerator used) the latter may have an effect on the state beyond just the index variable.

atomb · 2022-04-01T20:38:13Z

0003-for-each.md

+foreach k, v | k in m.Keys && m[k] == v {
+  print "m[", k, "] == ", v;
+}
+```


Oh, I like this!

I liked it even more when I just had

foreach k, v | m[k] == v { print "m[", k, "] == ", v; }

but then realized I needed the k in m.Keys part. :)

atomb · 2022-04-01T20:45:16Z

0003-for-each.md

+Note that the existing `seq(size, i => i + 1)` syntax is inconsistently-named in the Dafny reference manual, 
+but we refer to it here as a "sequence construction" expression, to disambiguate it from the proposed sequence comprehension feature.
+Sequence comprehensions are strictly more expressive, as any construction `seq(size, f)` can be rewritten as 
+`seq i | 0 <= i < size :: f(i)`.


And we could eventually move seq(size, f) to a definition in the standard library, perhaps?

Most likely - @RustanLeino was willing to fully deprecate the old version as long as we handled the equivalent sequence comprehension equally well in verification. Having an equivalent standard library version would make dropping support for it easier!

atomb · 2022-04-01T21:32:35Z

0003-for-each.md

+a boolean expression and iterates through all matching bindings. For example, the statement `foreach(a[int i] != 0) n += i;`
+sums the indices of all non-zero elements of the array `a`. The JMatch version declares the bound variables inline in
+the boolean expression, which is also proposed here as a future possibility below.
+


There's another instance I'm aware of that provides a capability not available in this proposal. I don't know if it's worth the complexity of supporting, but perhaps worth considering (or perhaps including in here to say why it's not the right thing to do).

Cryptol has two sequence comprehension separators, sequential and parallel. The sequential separators work like those in Python (and Haskell), and like && in this proposal. For example, the cross product of two sequences is:

crossProduct xs ys = [ (x, y) | x <- xs, y <- ys ]

But you can also use a | instead of a comma to get this:

zip xs ys = [ (x, y) | x <- xs | y <- ys ]

You can also get this by zipping two iterators together first and then iterating over the result, which isn't much harder. The slightly more powerful (but sometimes confusing!) use case is when you alternate , and | in arbitrary ways.

Interesting! Cryptol's version is at least worth naming as prior art as it's even closer to this proposal - thanks for the pointer!

My plan is more likely to handle that case when we support user-defined collection types. I included zipping as a key higher-order operation in these experiments, and I'm fairly convinced it's maximally "Dafny-like" to require the LHS and RHS have provably-equal lengths. Between that and the fact that there's no natural boolean operator to express zipping, I'd say it's it doesn't fit as well here.

samuelgruetter · 2022-04-01T22:36:50Z

It seems to me that this RFC introduces a notion of QuantifierDomain which mixes two things that all other languages I know keep separate:

the source collections being iterated
filtering predicates

If I understood correctly, the semantics of a seq comprehension depends on the syntactic structure of a QuantifierDomain, which looks like a normal Dafny expression, and there exist rewrites which, when performed on a normal Dafny expression do not change its semantics, but when performed on the same Dafny expression treated as a QuantifierDomain do change its semantics. This will be confusing both for users and for implementers of the verifier.

To give an example of such a rewrite, let's consider the expression

seq x | x in [1, 2, 3] && x in [3, 2, 1] && x < 3

If I understood the proposed semantics correctly, it would evaluate to [1, 2]:

seq x | x in [1, 2, 3] && x in [3, 2, 1] && x < 3
== seq x in [1] && x in [3, 2, 1] && x < 3
 + seq x in [2] && x in [3, 2, 1] && x < 3
 + seq x in [3] && x in [3, 2, 1] && x < 3
== [1] + [2] + []
== [1, 2]

Now add let's make the definition

function c(a: int, s1: seq<int>, s2: seq<int>): bool {
  a in s1 && a in s2
}

and ask: Does the seq comprehension

seq x | c(x, [1, 2, 3], [3, 2, 1]) && x < 3

produce the same result as the original one?
No matter whether the answer is Yes or No, I think there is a problem:

If the answer is No, it would mean that applying the rewrite lemma

forall x, c(x, [1, 2, 3], [3, 2, 1]) == x in [1, 2, 3] && x in [3, 2, 1]

in the original seq comprehension would change its result, which violates basic expectations about the behavior of equality (if A = B and A appears in C, replacing A by B in C does not change the result of C).

On the other hand, if the answer is Yes, what if I define

function c_swapped(a: int, s1: seq<int>, s2: seq<int>): bool {
  a in s2 && a in s1
}

and now consider the expression

seq x | c_swapped(x, [1, 2, 3], [3, 2, 1]) && x < 3

This one will produce [2, 1], because first, [3, 2, 1] is iterated, and then [1, 2, 3] serves as a filtering condition. But again, this is inconsistent with the rewrite:

forall x, c(x, [1, 2, 3], [3, 2, 1]) == c_swapped(x, [1, 2, 3], [3, 2, 1])

(note that to prove this equality, the Dafny verifier will unfold c and c_swapped and then use commutativity of &&, which doesn't hold in general in Dafny, but does hold in this particular example).

Now, a potential solution to this problem would be to say that no rewrites are allowed in QuantifierDomain expressions. However, this would also lead to surprises, because if we make the definition

function f(x: int): bool {
  x <= 2
}

and consider the modified seq comprehension

seq x | x in [1, 2, 3] && x in [3, 2, 1] && f(x)

Dafny should be able to prove that it's equivalent to the original one (otherwise proofs would become very cumbersome and confusing), but that does require applying the rewrite

forall x, (x < 3) == f(x)

in the original seq comprehension.

This problem would go away if we syntactically separated the source collection being iterated from the filtering predicates. The above example would then be written as

seq x <- [1, 2, 3] | x in [3, 2, 1] && x < 3

which would make it clear that [1, 2, 3] is the source collection, and x in [3, 2, 1] && x < 3 is the filtering predicate to be applied to each element.

cpitclaudel

Thanks for this exceptional write-up, Robin.

I have left detailed comments. I have some high-level feedback, too.

On the performance aspect, I think we shouldn't require iterators to get good performance. There is no reason why removing one value from a set should be a slow operation. In fact, it should be fast, even if we are talking about pure sets. Let's fix that.
On determinism: at the moment, we don't need our sets, maps, etc. to have predictable ordering. Adding that is introduces a performance concern, and we should evaluate whether this is the best approach. Would it really be an issue if iteration order was unpredictable for sequences, even in expressions? You could still write sequence comprehensions in functions, and you could implement them "by method" as needed, allowing you to leverage the nondeterminism for performance and restore determinism as needed.

Note that this last point about sequence comprehensions in functions is already true today: in fact, we have sequence comprehensions already! It's already possible to ask Dafny (in a function context, where performance does not matter) for a var s: seq<T> :| forall t: T :: t in s <==> P(t) (which is essentially a sequence comprehension with filter P). It would be nice to have better syntax for this in ghost function contexts, but do we really need it in function method context?

cpitclaudel · 2022-04-01T14:57:13Z

0003-for-each.md

+
+// Or even better, as an expression:
+function AllPairs(s: seq<nat>): seq<(nat, nat)> {
+  seq left, right | left in s && right in s


But this code doesn't compute the same thing as above, does it? Is order guaranteed?

cpitclaudel · 2022-04-01T14:58:33Z

0003-for-each.md

+```
+
+This is functionally correct, but because set subtraction requires creating a fresh copy,
+when compiled this loop requires quadratic time and memory (or at least garbage-collection work) in the size of the set.


Sounds like a bug in the runtime, no?

cpitclaudel · 2022-04-01T15:02:32Z

0003-for-each.md

+
+A `foreach` loop closely resembles a `forall` statement, the key difference being that a `forall`
+loop executes its body once for every tuple of quantified values simultaneously, whereas a `foreach` loop
+executes its body once for each tuple of quantified values in sequence, one at a time.


The domain of a forall statement doesn't have to be enumerable, I think, so it's not superseded by foreach AFAICT.

cpitclaudel · 2022-04-01T15:03:21Z

0003-for-each.md

+but Dafny's heuristics can't figure out how to produce an enumeration for 'x'"`.
+
+```dafny
+foreach x: real | 0.0 <= x < 1.0 {


So this is not allowed, right?

I hope not.

I'll add a comment to that effect just to be super clear :)

cpitclaudel · 2022-04-01T15:03:51Z

0003-for-each.md

+}
+```
+
+The domain is allowed to be potentially infinite if the loop is used in a compiled context and an explicit `decreases` clause is provided.


What is the domain here? If you have an invariant bounding the variable is that still an infinite domain?

Good question. I think assuming that the invariant is allowed to refer to the bounds variables at all, I don't think it should be allowed to influence the domain enumeration.

cpitclaudel · 2022-04-01T15:23:28Z

0003-for-each.md

+Most mainstream programming languages include rich libraries for working with collections and iterators,
+and define much the same operations such as "filter" and "flatMap".
+The proposed Dafny features provide similar functionality at a higher level
+of abstraction more amenable to verification: quantification expressions are used to succinctly declare the behavior of


I don't know about more amenable to verification: Do we think that the verifier will be better at dealing with seq expressions and the like than at dealing with plain for loops and/or recursive functions?

I should have said here that they should be friendlier for users, in terms of how much effort it takes to prove properties of a sequence comprehension compared to an equivalent loop or recursive function. My hope here is that the verifier will already know enough about an expression like seq x | x in mySeq && P(x) such that no additional work is necessary.

cpitclaudel · 2022-04-01T19:23:25Z

0003-for-each.md

+
+```dafny
+var setOfReals := {3.141, 2.718, 1.618};
+var sortedList := seq x | x is real && x in setOfReals;


Why do you need x is real here?

The idea was to provide a LHS that drives the desired ordering. I need to add this case (and more) to the ordering rules.

cpitclaudel · 2022-04-01T19:26:00Z

0003-for-each.md

+
+It would likely be prudent to limit these cases to those where the size of the domain
+is easy to infer statically, so that the size of the array to allocate is known before
+enumerating the values.


Or we could say that these are equivalent to sequence comprehension + cast to array? I don't think we have a conversion to arrays from sequences, do we?

Not directly, although it turns out there's a syntax I wasn't aware of similar to sequence constructions:

var arr := new int[10] (i => i * i);

cpitclaudel · 2022-04-01T19:27:43Z

0003-for-each.md

+
+This mirrors the `Collector` concept from the Java streaming APIs, and ideally the shape of the
+parameter to `collect` expressions would define similar operations for merging intermediate results
+to leave the door open for parallel or even distributed computation.


Maybe we should look into the loop construct in lisp, which does all the above and more.

https://lispcookbook.github.io/cl-cookbook/iteration.html

Interesting, thanks for the link! That's roughly the shape of what I'm thinking of.

cpitclaudel · 2022-04-01T22:26:19Z

0003-for-each.md

+
+One of the first milestones when implementing this RFC will be to prototype just enough verification
+plumbing to evaluate how difficult proving such properties will be, and how many additional axioms
+may need to be added to the Dafny prelude.


Here's a proof with Dafny as of today:

predicate method IsLower(s: set<int>, m: int) { forall y :: y in s ==> m <= y } function Min(s: set<int>) : (m: int) requires s != {} ensures m in s ensures IsLower(s, m) { var m :| m in s; if y :| y in s && y < m then var s' := s - {m}; assert s == s' + {m}; Min(s') else m } predicate Increasing(s: seq<int>) { forall i, j | 0 <= i < j < |s| :: s[i] <= s[j] } lemma IncreasingConcat(hd: int, tl: seq<int>) requires Increasing(tl) requires forall x | x in tl :: hd <= x ensures Increasing([hd] + tl) { var s := [hd] + tl; forall i, j | 0 <= i < j < |s| ensures s[i] <= s[j] { assert s[j] in tl; } } lemma IsLowerSeq(s: set<int>, sq: seq<int>, m: int) requires IsLower(s, m) requires multiset(sq) == multiset(s) ensures forall x | x in sq :: m <= x { forall x | x in sq ensures m <= x { assert x in multiset(sq); } } function method Sort(sq: seq<int>) : (sq': seq<int>) ensures Increasing(sq') ensures multiset(sq') == multiset(sq) lemma MultiSetAddInj<T>(b: multiset<T>, a1: multiset<T>, a2: multiset<T>) requires b + a1 == b + a2 ensures a1 == a2 { assert forall k :: a1[k] == a2[k] by { forall k { calc { a1[k]; (b[k] + a1[k]) - b[k]; (b + a1)[k] - b[k]; (b + a2)[k] - b[k]; (b[k] + a2[k]) - b[k]; a2[k]; } } } } lemma SortedElements_Unique(s: set<int>, sq: seq<int>) requires multiset(sq) == multiset(s) requires Increasing(sq) ensures sq == SortedElements(s) { if sq == [] { assert |sq| == |multiset(s)| == 0; } else { var hd := sq[0]; var sq' := sq[1..]; var s' := s - {hd}; calc { hd in sq; hd in multiset(sq); hd in multiset(s); hd in s; } assert sq == [hd] + sq'; assert multiset(sq) == multiset([hd]) + multiset(sq'); assert s == {hd} + s'; assert multiset(sq) == multiset([hd]) + multiset(s'); MultiSetAddInj(multiset([hd]), multiset(sq'), multiset(s')); assert Increasing(sq'); SortedElements_Unique(s', sq'); } } function SortedElements(s: set<int>): (sq: seq<int>) ensures multiset(sq) == multiset(s) ensures Increasing(sq) { if s == {} then [] else var m := Min(s); var s := s - {m}; var sq := SortedElements(s); IsLowerSeq(s, sq, m); IncreasingConcat(m, sq); [m] + sq } by method { sq := []; var leftover := s; ghost var removed := {}; while leftover != {} decreases leftover invariant leftover !! removed invariant multiset(sq) == multiset(removed) invariant multiset(leftover + removed) == multiset(s) { var x :| x in leftover; leftover := leftover - {x}; removed := removed + {x}; sq := sq + [x]; } sq := Sort(sq); SortedElements_Unique(s, sq); } method Foo() { var smallSet := {1, 2}; var smallSortedSeq := SortedElements(smallSet); // smallSortedSeq is fully specified, but can the verifier figure that out? assert multiset(smallSortedSeq) == multiset{1, 2}; assert smallSortedSeq == [1, 2]; }

Can we think about how that proof would change?

atomb · 2022-04-01T23:36:23Z

I had another thought, related to @samuelgruetter's response, and perhaps conveying roughly the same idea in a different way. In this proposal, the in keyword is being used in two ways: as a binder and as a predicate. The existing languages I'm aware of avoid some of these semantic complexities by allowing a variable to be bound only once and providing another mechanism for membership testing. So, in Haskell, you could write [x | x <- xs, member x ys] to give you all elements of xs that are also in ys. But [x | x <- xs, x <- ys] wouldn't be allowed.

MikaelMayer · 2022-04-04T15:01:30Z

To the excellent @samuelgruetter 's analysis of cases, I answer this:

Similar for set comprehension, it's not possible to put the expression x in s in an external predicate and have the resolution find a domain from it. If the predicate was o.pred, it would be changed in implementing traits, for example". Hence, I'm not worried about this: if there is a predicate in the expression, it will always use it as a filter and not as a collection source.
For sure, the verifier would not be able to prove sequence equality just by proving that the two predicates are equivalent, based on your examples. Although it was the case for set and multiset, it cannot be the case for seq because of ordering. Hence, unless we find clear rules to encode ordering into logical formulas, the only property we will have about each element is the predicate that the elements were taken from. We wouldn't have the axiom (forall x. P(x) == Q(x)) ==> (seq x | P(x)) == (seq x | Q(x)), but we would definitely have (forall x. P(x) == Q(x)) ==> set(seq x | P(x)) == set(seq x | Q(x)) or equivalent.
If we wanted to encode the problem of ordering in the verifier, we should definitely explicitly encode the collection as parsed to it. For example, consider your example again:

seq x | x in X && x in Y && P(x) :: F(x)

Based on Robin's proposal, Dafny's heuristics will decide that the collection to iterate on is X and that the rest x in Y && P(x) is a predicate, which is implicit, and for the verifier we need to make it explicit. However, we cannot simply assign a mapping for each index to its value, because they will be offset every time a predicate is false. This is more complicated than what we ever did before with multiset.
What we really want to achieve for the verifier is this unambiguous encoding:

var howManyindicesBefore = j => |(set j' | x' == X[j] && x' in Y && P(x')) && j' < j)|;
var seqImage := map i, x | i in X && (x == X[i] && x in Y && P(x)) :: howManyindicesBefore(i) := x
seq(|indices|, i => (var x := seqImage(i); F(x)))

That should take care of everything, including proving or disproving equality of seq expressions.

RustanLeino · 2022-04-04T18:37:13Z

0003-for-each.md

+
+// After:
+method AllPairs(s: seq<nat>) returns (result: seq<(nat, nat)>) {
+  foreach left | left in s {


I think it's better to use different keywords for for and foreach loops. Their loop invariants are different, and (depending on the enumerator used) the latter may have an effect on the state beyond just the index variable.

RustanLeino · 2022-04-04T18:40:03Z

0003-for-each.md

+
+// Or even better, as an expression:
+function AllPairs(s: seq<nat>): seq<(nat, nat)> {
+  seq left, right | left in s && right in s


The proposed seq comprehension on L47 does not have an analogous set comprehension today, because a set comprehension with more than one bound variable is required to include a :: Expr at the end. To look more like what we have today, L47 would be better rendered as

seq left, right | left in s && right in s :: (left, right)

Good catch, that's a typo. Will fix.

RustanLeino · 2022-04-04T18:41:39Z

0003-for-each.md

+// s is a set<int>
+var ss := s;
+while ss != {}
+  decreases |ss|


This decreases clause is not needed, since the default decreases ss works fine for this loop.

Ha, don't tell me, tell the reference manual I copied it from. :) I can address it there separately.

RustanLeino · 2022-04-04T18:48:12Z

0003-for-each.md

+The runtime implementation of this loop can use the native iteration features of the target language.
+
+Going further, it is better to avoid writing any imperative loop at all where possible. Sticking to immutable data and
+expressions rather than statements allows logic to be used in specifications as well as implementation,


Up until this point in the document, there are several feedback questions about the ordering. This will be answered later in the document, I presume, but it would be good to add a sentence somewhere here in the motivation section that reassures the reader that order will be dealt with later, and for now the examples are more intended in a dream-like fashion ("wouldn't it be nice if you could write something like this--and don't think too hard about the ordering yet").

RustanLeino · 2022-04-04T18:51:27Z

0003-for-each.md

+
+```dafny
+var myDoubleDeckerSeq: seq<seq<int>> := ...;
+foreach x, y | x in myDoubleDeckerSeq && y in x && y != 0 {


(I imagine that this particular range expression can be equivalently written as y != 0 && y in x && x in myDoubleDeckerSeq.)

That version would be rejected, actually, since the rules for && say that the enumeration is nested left-to-right. y != 0 is enumerable (as a sub-domain of int) but not finite, and then y in x for each value of y would not even be enumerable (i.e. all possible sequences that contain y).

RustanLeino · 2022-04-04T19:03:31Z

0003-for-each.md

+
+A `foreach` loop closely resembles a `forall` statement, the key difference being that a `forall`
+loop executes its body once for every tuple of quantified values simultaneously, whereas a `foreach` loop
+executes its body once for each tuple of quantified values in sequence, one at a time.


Note that the forall statement is not a loop. It is an aggregate statement. Its compiled form is like a generalization of simultaneous assignment. The beauty of the forall statement is that you use it without a loop invariant. :O This is made possible because--again--the forall statement is not a loop in the first place!

Loops (while, for, foreach) sequence their iterations. And to use them, you need a loop invariant, so that you can capture "what has been done so far".

RustanLeino · 2022-04-04T19:06:21Z

0003-for-each.md

+Similarly to set comprehensions or assign-such-that statements, the domain of a `foreach` loop
+must be enumerable. The following loop would produce the error `"the domain of a foreach loop must be enumerable,


The domain has to be enumerable only if the set comprehension or assign-such-that statement is compiled. It is not a requirement in ghost contexts. Is that true also for foreach? It seems like a ghost foreach could support domains that are not enumerable, provided they still are ordered.

We just chatted about this and came to the conclusion that a ghost foreach should still require an enumerable domain, since otherwise it's hard to imagine the semantics of a loop where the body can observe the effects of the executions before it.

Furthermore it should also be finite, since otherwise there isn't a well-defined notion of the state of the program "after" the infinite loop. This isn't the case for a forall statement since there is no looping, and the body executes once for each set of variable bindings simultaneously.

RustanLeino · 2022-04-04T19:06:52Z

0003-for-each.md

+but Dafny's heuristics can't figure out how to produce an enumeration for 'x'"`.
+
+```dafny
+foreach x: real | 0.0 <= x < 1.0 {


I hope not.

RustanLeino · 2022-04-04T19:09:42Z

0003-for-each.md

+
+```dafny
+foreach x: nat 
+  decreases 10 - x


What should be added is not an invariant, but a range expression. There would be two problems with an invariant x <= 10. One problem would be that this does not hold true of all values that the foreach statement will consider (since this example used a foreach loop with no range expression). The other problem is that the loop invariant of a foreach statement is not allowed to mention the index variable (this is also true of the for statement).

RustanLeino · 2022-04-04T19:16:41Z

0003-for-each.md

+There is no builtin support for automatically tracking the enumerated values so far, as there is for
+`iterator` values (see https://dafny-lang.github.io/dafny/DafnyRef/DafnyRef#sec-iterator-types). 


This is going to be needed for every loop that you want to prove something about. It would be good to have built-in support for this.

I am actually hoping that many such loops can be expressed as sequence comprehensions instead, where that isn't necessary. I'm open to using the same xs trick if we have to, but trying to leave it out of the initial scope.

robin-aws · 2022-04-05T04:27:09Z

Thank you SO much for all the thoughtful reviews and comments everyone! I honestly wasn't expecting this much to go through so quickly and it's going to take me multiple days to respond, but rest assured I have no intention of ignoring any questions or feedback. Quite the contrary, I'm deeply enjoying the discussion. :)

One thing I know already is that I need to make my sketch of the enumeration domain calculation more complete and rigorous, so I'm definitely going to tackle that as a response to several comments.

robin-aws · 2022-04-09T16:20:57Z

I was intending to clarify how I've defined the ordered enumeration of quantifier domains, but after thinking about some of the common feedback here I've decided it really would be better not to extract it implicitly out of the range expression. I believe I have an alternative that loses barely any useful expressivity, if any. It will take some time to revise the whole proposal, but in a nutshell:

The definition of bound variables for any QuantifierDomain will be extended to allow in expressions specifying the source of the ordered enumeration, such as seq x in [1, 2, 3] :: x * x. This is optional, and an undecorated variable declaration will be enumerated according to its type's natural ordering, as in seq x | 0 <= x < 10.
Such domains can still bind multiple variables, and the enumeration source of one can depend on the variables bound earlier, as in seq x in myDoubleDeckerSeq, y in x | y != 0 :: y.
The LHS of the _ in C syntax may be a CasePattern as in let expressions and variable declaration statements. This allows for easy destructuring, e.g. seq (k, v) in myMap.Items :: v, and can also be used to filter before binding variables: the example of extracting optional values becomes set Some(v) in myOptionals :: v or even just set Some(v) in myOptionals.

The only case this doesn't handle cleanly is the x in C1 || x in C2 application with the semantics of concatenating enumerations. There could be more syntax to handle this as well but I can't think of anything intuitive, and I don't think it's worth it.

MikaelMayer · 2022-04-11T16:14:59Z

Such domains can still bind multiple variables, and the enumeration source of one can depend on the variables bound earlier, as in seq x in myDoubleDeckerSeq, y in x | y != 0 :: y.

That's a very interesting idea, but I think it raises new questions. For example:

seq i in [0, 1, 2, 3, 4], y in x[F(i+2)], z in w[y] | i % 3 != 0 && y % 2  == 0 && z %2 == 1 :: z

How would you compile that? You cannot emit the range after iterating over the three variables, that would be under-performing at best, or even cause crashes at worst (for example if the condition of F(i+2) requires input % 2 != 2.
You cannot emit the range before iterating over the three variables. Hence, you'll have to find heuristically an interleaving.
But now, for verifications, you'll need this same interleaving, which is going to be a nightmare.

Consider now the much lower complexity of your original proposal, both for verification and compilation:

seq i: int, y: int, z: int | i in [0, 1, 2, 3, 4] && i % 3 != 0 && y in x[F(i+2)] && y % 2  == 0 && z in w[y] &&  z %2 == 1 :: z

Now it's clear how it is going to be both compiled and verified, just following the order of the operands.

If you are really concerned about differentiating condition from enumeration, you could add the binary operator "<-" to the set of expressions, which translates to "in", except for comprehensions.
So for example, the above could be written as this:

seq i: int, y: int, z: int | i <- [0, 1, 2, 3, 4] && i % 3 != 0 && y <- x[F(i+2)] && y % 2  == 0 && z <- w[y] &&  z %2 == 1 :: z

That way, it would be obvious that the operator "<-" is not tolerated in expressions other than comprehensions so they cannot be refactored.

Moreover, you could easily distinguish an enumeration from a pure test, so you could actually have this:

set i: int | i <- A && i in B

does not mean the same computationally as

set i: int | i in A && i <- B

The first one will enumerate A (and ensure i in A) before checking at run-time that i in A && i in B (and the first check could be optimized away)
The second will surely enumerate B (and ensure i in B) before checking at run-time that i in A && i in B (and the second check could be optimized away) and a clever IDE would surely mention a warning about defining enumeration after a test, that we can still infer a meaning but it's not obvious.

robin-aws · 2022-04-12T23:09:39Z

@MikaelMayer Yes, the new form would sacrifice some expressive power, in favour of avoiding the confusion of overloading boolean expressions with ordering/multiplicity semantics. In your example as written, I would expect the F(i+2) subexpression to be rejected as not well-formed, if F had a precondition that was only satisfied if i % 3 != 0 as specified in the range. That can be worked around by pulling the necessary filters into nested sequence comprehensions though, which still don't necessarily need to be materialized as actual sequence values at runtime:

seq i in (seq i in [0, 1, 2, 3, 4] | i % 3 != 0), 
    y in (seq y in x[F(i+2)] | y % 2  == 0), 
    z in w[y] | z % 2 == 1 
    :: z

I think this aligns well with the fact that the LHS of an && expression can be assumed true on the RHS, but Dafny does not currently go out of its way to find other true expressions from elsewhere in the program to try to make the RHS well-formed.

I should also say I'm hoping I can weaken the statement that "in expressions on bound variables define the enumeration source" to "in expressions on bound variables define the ordering of results". That would give the heuristics more leeway to choose a more efficient enumeration strategy, and allow cases where the variable type is ordered but not enumerable (which was previously a less crisp Future Possibility, but seems more natural with this version):

var setOfReals := {3.141, 2.718, 1.618};
// Ordered by the implicit `real` type of `x`
// At runtime, setOfReals is the enumeration source,
// but the results are then sorted.
var sortedList := seq x | x in setOfReals;  
assert sortedList == [1.618, 2.718, 3.141];

robin-aws · 2022-04-13T17:19:01Z

@cpitclaudel - I've been meaning to respond to your top-level comments for a while, apologies for the delay. :)

On the performance aspect, I think we shouldn't require iterators to get good performance. There is no reason why removing one value from a set should be a slow operation. In fact, it should be fast, even if we are talking about pure sets. Let's fix that.

It's not a question of the runtimes being inefficient though, it's a more fundamental problem with sets being immutable values. If you have the statement s := s - {x}, the simplest implementation of that is to create a copy of s with x removed, which involves allocating a new set and copying over all the other values. You could implement the subtraction lazily as sequence concatenation is for some backends, but that often just shifts the O(n) work to later on, and a loop to remove all elements as in the example still ends up taking quadratic time. You could make the runtime representation of sets mutable, but then you need to know that no other code has a reference to the old value of s anywhere, so you need either sophisticated aliasing analysis in the resolver or to augment the Dafny language itself with some notion of linear typing.

On determinism: at the moment, we don't need our sets, maps, etc. to have predictable ordering. Adding that is introduces a performance concern, and we should evaluate whether this is the best approach. Would it really be an issue if iteration order was unpredictable for sequences, even in expressions? You could still write sequence comprehensions in functions, and you could implement them "by method" as needed, allowing you to leverage the nondeterminism for performance and restore determinism as needed.

Making the ordering unpredictable in expressions leads to unsoundness unfortunately (and thanks to @RustanLeino for the example, which I've tweaked to use a sequence comprehension instead of :|):

function method Pick(s: set<int>): int
  requires s != {}
{
  (seq x in s)[0]
}

method Main() {
  var s0 := {6, 28, 496};
  var s1 := {6, 496} + {28};
  assert s0 == s1;
  var x, y := Pick(s0), Pick(s1);
  assert x == y;
  if x != y {
    var badnessEnsues := 5 / 0;
  }
}

Note that this last point about sequence comprehensions in functions is already true today: in fact, we have sequence comprehensions already! It's already possible to ask Dafny (in a function context, where performance does not matter) for a var s: seq<T> :| forall t: T :: t in s <==> P(t) (which is essentially a sequence comprehension with filter P). It would be nice to have better syntax for this in ghost function contexts, but do we really need it in function method context?

You're correct that you can describe any notion of sequence comprehensions in a specification context (your expression is weaker than what I'm proposing since it doesn't specify ordering or multiplicity, but I'm sure there's a more complicated version that does as Mikael outlined earlier :). I've definitely seen a lot of compiled Dafny code that needs to process set data somehow though, and the expressions we're talking about here will at best be extremely inefficient to calculate at runtime, and more likely not compilable at all.

cpitclaudel · 2022-04-13T17:28:52Z

the simplest implementation of that is to create a copy of s with x removed

Well, let's not use the simplest implementation then :) Pure sets are very fast in practice, and certainly don't need to be quadratic. A simple balanced tree guarantees O(log n) for all operations, and we can do much better if iteration order is left unspecified.

Making the ordering unpredictable in expressions leads to unsoundness unfortunately

Not more so than it does today with :|, which is sound and unpredictable. What am I missing?

robin-aws · 2022-04-13T20:16:25Z

the simplest implementation of that is to create a copy of s with x removed

Well, let's not use the simplest implementation then :) Pure sets are very fast in practice, and certainly don't need to be quadratic. A simple balanced tree guarantees O(log n) for all operations, and we can do much better if iteration order is left unspecified.

My bad, I'm talking about the simplest way to compile the source language semantics to the target runtime, not the simplicity of the runtime representation of set<T> itself. Let's assume for the moment that the target language uses an implementation called TreeSet<T>. Suppose the source Dafny to compile is:

var s := {1, 2, 3};
var x := 2;
var oldS := s;
s := s - {x};
print oldS;
print s;

The way the code generation currently works would produce output something like:

var s = new TreeSet<int>({1, 2, 3});
var x = 2;
var oldS = s;
var sTmp = new TreeSet<int>(s);
sTmp.Remove(x);
s = sTmp;
Console.Write(oldS); // {1, 2, 3}
Console.Write(s);    // {1, 3}

Having to make the fresh copy of s so we can remove x is the part that dominates the asymptotic runtime, regardless of how efficient the Remove implementation is. But we have to make the copy in case there is a reference to the original value somewhere, so this translation would not be valid:

var s = new TreeSet<int>({1, 2, 3});
var x = 2;
var oldS = s;
s.Remove(x);
Console.Write(oldS); // {1, 3} <- incorrect
Console.Write(s);    // {1, 3}

I was describing alternate translation strategies above for optimizing this, such as creating a TreeSetWithOneRemoved(s, x) lazy proxy of some kind, or doing the analysis to detect when a reference like oldS is floating around. But none of them are easy or without their downsides. I'll add this detail to the proposal so it's clearer why providing a higher level feature for this iteration is a better tradeoff.

Making the ordering unpredictable in expressions leads to unsoundness unfortunately

Not more so than it does today with :|, which is sound and unpredictable. What am I missing?

The missing part is that a :| instance has to be predictable if compiled, because it has to produce an actual concrete value. If I used var x :| x in s; x instead of (seq x in s)[0] in my example, then the verifier would produce the error "to be compilable, the value of a let-such-that expression must be uniquely determined", precisely so you don't get the unsound runtime behavior the example shows. The "Compiling Hilbert's epsilon operator" paper describes this restriction in detail, but I don't see it anywhere in the reference manual and we need to fix that! :)

This is why I'm proposing an implementation strategy for making more domains predictably enumerable, which as a side-effect would allow us to remove this uniquely-determined restriction on compiled :| expressions. I am tempted to factor that out into future work though.

MikaelMayer · 2022-04-14T20:52:17Z

The form you suggest is an interesting workaround, but 1) Unless you support iterator comprehensions, it will build all the intermediate data structures so we lose both performance and memory and 2) As a user, I don’t think it would be very friendly to write it this way, with a redundant “seq I in (seq I in” , and only for the last variable I can put a condition in the range and 3) “x in y” is still a predicate and people will still wonder why they can’t refactor it, it’s because as the same syntax as a predicate. 4) How would you encode integer ranges anyway, when you have the lower and upper bound? At this point, since this is a proposal, one might propose to just add the conditions between the ranges like this: seq i: int if 0 <= I <= 1000 && i % 3 != 0, y in x[F(i+2)] if y % 2 == 0), z in w[y] if z % 2 == 1 :: z But then, wait, this syntax is really isomorphic with your initial proposal, the only thing that differs is that we can explicit what is the range and what is the test in the presence of “in” expressions. Hence, I would love to have your feedback on introducing something like the new left arrow operator that indicates a “in” but with the hint that it’s going to be enumerated. From: Robin Salkeld ***@***.***> Sent: Tuesday, April 12, 2022 6:10 PM To: dafny-lang/rfcs ***@***.***> Cc: Mayer, Mikael ***@***.***>; Mention ***@***.***> Subject: Re: [dafny-lang/rfcs] foreach loops and sequence comprehensions (PR #9) @MikaelMayer<https://github.com/MikaelMayer> Yes, the new form would sacrifice some expressive power, in favour of avoiding the confusion of overloading boolean expressions with ordering/multiplicity semantics. In your example as written, I would expect the F(i+2) subexpression to be rejected as not well-formed, if F had a precondition that was only satisfied if i % 3 != 0 as specified in the range. That can be worked around by pulling the necessary filters into nested sequence comprehensions though, which still don't necessarily need to be materialized as actual sequence values at runtime: seq i in (seq i [0, 1, 2, 3, 4] | i % 3 != 0), y in (seq y in x[F(i+2)] | y % 2 == 0), z in w[y] | z % 2 == 1 :: z I think this aligns well with the fact that the LHS of an && expression can be assumed true on the RHS, but Dafny does not currently go out of its way to find other true expressions from elsewhere in the program to try to make the RHS well-formed. I should also say I'm hoping I can weaken the statement that "in expressions on bound variables define the enumeration source" to "in expressions on bound variables define the ordering of results". That would give the heuristics more leeway to choose a more efficient enumeration strategy, and allow cases where the variable type is ordered but not enumerable (which was previously a less crisp Future Possibility, but seems more natural with this version): var setOfReals := {3.141, 2.718, 1.618}; // Ordered by the implicit `real` type of `x` // At runtime, setOfReals is the enumeration source, // but the results are then sorted. var sortedList := seq x | x in setOfReals; assert sortedList == {1.618, 2.718, 3.141}; — Reply to this email directly, view it on GitHub<#9 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/AA3PFNYTYSA5COTMA3EB54TVEX7D7ANCNFSM5SEEJZWA>. You are receiving this because you were mentioned.Message ID: ***@***.***>

cpitclaudel · 2022-04-14T21:50:57Z

Having to make the fresh copy of s so we can remove x is the part that dominates the asymptotic runtime, regardless of how efficient the Remove implementation is.

I'm not sure I agree. With the right TreeSet representation (specifically, an immutable treeset), copies are O(1); hence, the concern really is how efficient the remove operation is, and a decent implementation will give you log(n) on remove.

I was describing alternate translation strategies above for optimizing this, such as creating a TreeSetWithOneRemoved(s, x) lazy proxy of some kind, or doing the analysis to detect when a reference like oldS is floating around. But none of them are easy or without their downsides.

Eventually we do want to do this, but we don't need to right now: The C# ImmutableHashSet, for example, has O(1) copies and O(log n) removal. So even with our current implementation we can get good performance out of our existing loops. The default f# set, which is very similar to Dafny, is the same (https://github.com/fsharp/fsharp/blob/master/src/fsharp/FSharp.Core/set.fs).

We could even get rid of the log factor if we didn't have to be deterministic — but unfortunately we do.

This is why I'm proposing an implementation strategy for making more domains predictably enumerable, which as a side-effect would allow us to remove this uniquely-determined restriction on compiled :| expressions.

I think that would be bad: when I write a function with x :| P(x) I really mean to say something about any such x, not a specific one. Should I really be able to prove anything about x after writing x :| P(x), beyond the fact that P(x) holds?

cpitclaudel · 2022-04-14T22:01:02Z

@MikaelMayer I'm not sold on 3) (in the operator and in the keyword being confusing): don't C# and Python do the same?

MikaelMayer · 2022-04-15T00:29:44Z

@MikaelMayer I'm not sold on 3) (in the operator and in the keyword being confusing): don't C# and Python do the same?

Indeed, maybe just I am confused. You win 3) :-)

robin-aws · 2022-05-12T20:41:44Z

Closing in favour of the revised version at #9 #10.

I've tried very hard to make sure all the comments have been responded to, if they aren't now moot, but please let me knew on the new PR if I've missed any!

MikaelMayer · 2022-05-16T13:59:58Z

Closing in favour of the revised version at #9.

You mean #10 ?

robin-aws · 2022-05-16T14:15:25Z

Closing in favour of the revised version at #9.

You mean #10 ?

That's the one. :) Thanks, edited.

robin-aws added 14 commits February 25, 2022 16:16

Notes so far

56a64bb

Tons more notes

10790f2

Backing up more work

8eb98a6

Progress on motivation and guide-level explanation

f0a48ef

Filled in most of guide-level explanation

f7bb69b

More detail

7ea59d7

Cleaning up content and more notes

ab02028

More back-half content

0a747d9

Starting to sketch out domain ordering rules

419236a

Starting to fill out reference guide explanation

aaf3208

Compilation strategy etc.

540a6f2

Expanded most remaining notes

89f85be

Remaining notes

7fb76e2

Add PR number

0347301

robin-aws requested a review from a team March 31, 2022 05:04

robin-aws mentioned this pull request Mar 31, 2022

sequence comprehensions are ugly dafny-lang/dafny#1332

Open

MikaelMayer reviewed Mar 31, 2022

View reviewed changes

Apply suggestions from code review

c286506

Co-authored-by: Mikaël Mayer <MikaelMayer@users.noreply.github.com>

keyboardDrummer reviewed Apr 1, 2022

View reviewed changes

cpitclaudel reviewed Apr 1, 2022

View reviewed changes

atomb reviewed Apr 1, 2022

View reviewed changes

cpitclaudel reviewed Apr 1, 2022

View reviewed changes

robin-aws mentioned this pull request Apr 1, 2022

Improve set/map implementation for Go dafny-lang/dafny#1897

Open

RustanLeino reviewed Apr 4, 2022

View reviewed changes

robin-aws self-assigned this Apr 5, 2022

This was referenced Apr 23, 2022

Iterating over sets in compiled code is slow dafny-lang/dafny#2062

Open

Support adapting a C# MemoryStream as a Dafny.Sequence dafny-lang/dafny#2067

Open

robin-aws mentioned this pull request May 12, 2022

Ordered quantification (foreach loops and sequence comprehensions - take 2) #10

Open

robin-aws closed this May 12, 2022

	seq x \| x in [1, 2, 2] && x in [2, 2, 1] == [1, 2, 2, 2, 2]
	seq x \| x in [1, 2, 2] && x in [2, 2, 1] == [1, 2, 2, 2, 2, 1]

	seq x \| x in [1, 2, 2] && x in [2, 2, 1] == [1, 2, 2, 2, 2]
	(seq x \| x in [1, 2, 2] && x in [2, 2, 1]) == [1, 2, 2, 2, 2]


		type SmallIndex = x: nat \| 0 <= x < 8

		foreach i: SmallIndex {

		Similarly to set comprehensions or assign-such-that statements, the domain of a `foreach` loop
		must be enumerable. The following loop would produce the error `"the domain of a foreach loop must be enumerable,

		There is no builtin support for automatically tracking the enumerated values so far, as there is for
		`iterator` values (see https://dafny-lang.github.io/dafny/DafnyRef/DafnyRef#sec-iterator-types).

foreach loops and sequence comprehensions #9

foreach loops and sequence comprehensions #9

Conversation

robin-aws commented Mar 31, 2022 • edited Loading

MikaelMayer left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

robin-aws commented Mar 31, 2022

keyboardDrummer Apr 1, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

keyboardDrummer Apr 1, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

keyboardDrummer Apr 1, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

keyboardDrummer Apr 4, 2022 • edited Loading

Choose a reason for hiding this comment

keyboardDrummer Apr 1, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

keyboardDrummer Apr 1, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

samuelgruetter commented Apr 1, 2022

cpitclaudel left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

robin-aws commented Mar 31, 2022 •

edited

Loading

keyboardDrummer Apr 1, 2022 •

edited

Loading

keyboardDrummer Apr 1, 2022 •

edited

Loading

keyboardDrummer Apr 1, 2022 •

edited

Loading

keyboardDrummer Apr 4, 2022 •

edited

Loading

keyboardDrummer Apr 1, 2022 •

edited

Loading

keyboardDrummer Apr 1, 2022 •

edited

Loading

atomb commented Apr 1, 2022 •

edited

Loading