Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Quick sort and related Arrays utilities #58

Open
wants to merge 33 commits into
base: master
Choose a base branch
from

Conversation

prvshah51
Copy link
Contributor

Fix #54
By submitting this pull request, I confirm that my contribution is made under the terms of the MIT license.

@prvshah51 prvshah51 changed the base branch from master to cpitclaudel_triggers October 21, 2022 20:39
@prvshah51 prvshah51 changed the base branch from cpitclaudel_triggers to master October 21, 2022 20:39
cmp(t0, t1)
}

predicate Complete??(t0: T, t1: T) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a big fan of the ?? suffix. We may want to engage in some bikeshedding about style here before we introduce it. Based on my experience even Valid? below is not common, Valid is much more common and is even baked into the language through {:autocontracts}.

I'm not necessarily to using ? in user-defined constructs, I'd just like to agree on concrete guidance in the style guide on it. :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay I see where the idea came from at least - you almost want to overload Complete? to accept either a pair or a set.

Could this be CompleteOnPair perhaps?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the ?? was due to have two layers. Happy with calling them Complete and Complete' (the OCaml style) or Complete1 (the Lisp style) or even Complete_ or Complete2. CompletePair is long.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, let's go with removing all of the single ? suffixes (except for Le? and Ge? above), and replacing all of the double ?? suffixes with a ' instead.

So in this case Complete?? will become Complete', and Complete? will become Complete.

Copy link
Member

@robin-aws robin-aws left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So naturally I've focussed most of my reviewing capacity on arguing about question marks. :) I do actually think it's a really important point for readability though, so I'd love to nail that down first before I look more deeply at the proofs.

src/Collections/Sets/Sets.dfy Outdated Show resolved Hide resolved
src/Collections/Arrays/Arrays.dfy Outdated Show resolved Hide resolved
src/Collections/Arrays/Arrays.dfy Outdated Show resolved Hide resolved
reads arr
ensures t in arr[lo..hi]
{
arr[lo] // TODO
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should either implement this or drop the function and just inline this choice of pivot above.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's implement it — it's not hard and it's an important part of making the algorithm safe.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per offline conversation we decided not to block on this, but in that case we should add a comment on QuickSort that it doesn't yet have the intended O(n log n) worst case runtime yet (which is what @cpitclaudel was referring to as "safe").

}
}

lemma Sortable_Slice(arr: array<T>, lo: int, hi: int, lo': int, hi': int)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
lemma Sortable_Slice(arr: array<T>, lo: int, hi: int, lo': int, hi': int)
lemma SortableSlice(arr: array<T>, lo: int, hi: int, lo': int, hi': int)

Only instance of an underscore I see here and it really sticks out.

Also consider SortableSubslice instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SortableSlice or SortableSubslice both makes sense putting Subslice describes it better.

import opened Comparison
import opened Wrappers

trait Sorter<T> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This definitely needs an example in examples to demonstrate using it. I assume you have to not only provide a concrete class that implements Sorter<T> and fills in Cmp, but perhaps even has to invoke a lemma or two from the trait in order to prove that it works.

Is it possible to also provide a PredicateSorter class that is constructed around a (T, T) -> Cmp function value? Perhaps it could even be a partial function if you also provide a ghost set of all values you intend to sort with it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to clarify, I'd strongly prefer to have an example, but the PredicateSorter idea can wait for a future PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yikes, I went to write an example just to play around with it and immediately got the A class may only extend a trait in the same module, unless the parent trait is annotated with {:termination false} error. That means this is currently useless unless we add {:termination false}.

Unfortunately we should treat that as a hard blocker given there's already been resistance to putting {:termination false} on any standard library code: #22 And especially given the discovery of dafny-lang/dafny#2500 I'm not inclined to push back on that.

We can think about whether there's a good alternative for this code that doesn't use traits, but it might be better to stick with this if it's good UX and at least wait for the solution to dafny-lang/dafny#2500.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not right: the proper API to use is PredicateSorter, which is a regular class, defined right below Sorter. That doesn't require :termination false.

Original code here, with examples: https://github.com/dafny-lang/compiler-bootstrap/blob/4f616822209828e48cf63d3da66ee1c010f689d4/src/Utils/Lib.Sort.dfy#L396-L420

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @cpitclaudel, I see the issue now is this PR doesn't include the PredicateSorter class. We should just add that as well.

src/Comparison.dfy Outdated Show resolved Hide resolved
Comment on lines +23 to +24
const Le? := this != Gt
const Ge? := this != Lt
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I definitely like this use of ? at least, since it's a very natural extension of the built-in Lt?/Eq?/Gt? discriminator syntax.

Complete?(ts) && /* Antisymmetric?(ts) && */ Transitive?(ts)
}

predicate Sorted(sq: seq<T>) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why wouldn't this have a ? as well then?

I'll be happiest if we come up with a simple rule for where ? belongs. I definitely don't think it should be every predicate, especially since changing Valid to Valid? will be very disruptive for arguably not a lot of benefit. It looks like the pattern here is if the predicate is "about" the receiver rather than the arguments, but that feels very fuzzy to me, and I don't find it improves readability personally.

Perhaps to start, the straw-person proposal could be to only use it in symbols that aren't used like function calls, such as the Le? and Ge? constants above.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my other comment.

cmp(t0, t1)
}

predicate Complete??(t0: T, t1: T) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay I see where the idea came from at least - you almost want to overload Complete? to accept either a pair or a set.

Could this be CompleteOnPair perhaps?

requires 0 <= lo <= hi <= arr.Length
reads arr
{
multiset(arr[lo..hi]) == old(multiset(arr[lo..hi]))
Copy link
Contributor Author

@prvshah51 prvshah51 Nov 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember adding this to almost every every function method in MergeSort.dfy will need to replace that with shuffled instead.

Co-authored-by: Robin Salkeld <salkeldr@amazon.com>
@prvshah51 prvshah51 self-assigned this Nov 29, 2022
import opened Comparison
import opened Wrappers

trait Sorter<T> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to clarify, I'd strongly prefer to have an example, but the PredicateSorter idea can wait for a future PR.

cmp(t0, t1)
}

predicate Complete??(t0: T, t1: T) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, let's go with removing all of the single ? suffixes (except for Le? and Ge? above), and replacing all of the double ?? suffixes with a ' instead.

So in this case Complete?? will become Complete', and Complete? will become Complete.

{}

}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
}
}

}

predicate {:opaque} Valid?(ts: set<T>) {
Complete?(ts) && /* Antisymmetric?(ts) && */ Transitive?(ts)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Complete?(ts) && /* Antisymmetric?(ts) && */ Transitive?(ts)
Complete?(ts) && */ Transitive?(ts)

Just following the general rule of not checking in commented out code. We could put a comment saying something like "we want to include Antisymmetric(ts) as well but it isn't necessary and makes verification timeout" (for example, I don't actually know in this case).

}

predicate Complete??(t0: T, t1: T) {
cmp(t0, t1) == cmp(t1, t0).Flip()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It threw me a bit that I didn't recognize this isn't one of the standard properties of total orderings, until I realized that "Transitive" really meant "cmp(...).Le? is a transitive binary relation" and similarly for "Reflexive". Is "complete" a standard term for this property of this kind of comparator?

I expect at the point where we try to connect Relations to this version, we'll have a lemma that proves that C.Valid(s) ==> Relations.TotalOrdering((t1, t2) => C.cmp(t1, t2).Le?)

ensures Shuffled(arr, lo, hi)
{}

twostate predicate SameElements(arr: array<T>, lo: int, hi: int)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider UnchangedExceptSliceShuffled instead.

reads arr
ensures t in arr[lo..hi]
{
arr[lo] // TODO
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per offline conversation we decided not to block on this, but in that case we should add a comment on QuickSort that it doesn't yet have the intended O(n log n) worst case runtime yet (which is what @cpitclaudel was referring to as "safe").

Complete?(ts) && /* Antisymmetric?(ts) && */ Transitive?(ts)
}

predicate Sorted(sq: seq<T>) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my other comment.

Copy link
Member

@robin-aws robin-aws left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm afraid I've realized this won't work as is and can't be made to work safely with current Dafny limitations - see my first last comment. We'll have to pause on this for a while.

cmp(t0, t1)
}

predicate CompleteonPair(t0: T, t1: T) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
predicate CompleteonPair(t0: T, t1: T) {
predicate Complete'(t0: T, t1: T) {

CompleteonPair was my earlier not-as-good suggestion, let's do Complete' instead :)


function method Cmp(t0: T, t1: T): Cmp

twostate predicate UnchangedSlice(arr: array<T>, lo: int, hi: int)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just food for thought, or a potential learning exercise: It might make all of this more readable if we had an actual ArraySlice datatype to pass around containing these values. It's a great example of the classic Design Pattern of introducing a type to hold collections of values that are always passed around together. Something like:

  datatype ArraySlice_<T> = ArraySlice_(arr: array<T>, lo: int, hi: int) {
    predicate Valid() {
      0 <= lo <= hi <= arr.Length
    }

    twostate predicate Unchanged() 
      requires Valid()
      reads arr
    {
      arr[lo..hi] == old(arr[lo..hi])
    }

    // etc.
  }

  type ArraySlice<T> = a: ArraySlice_<T> | a.Valid() witness *

AFAICT this would be useful even if we only used it in ghost code to avoid any runtime cost (and even then it's possible we could optimize the wrapper away in the future, just as we now do for single-field datatypes).

import opened Comparison
import opened Wrappers

trait Sorter<T> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yikes, I went to write an example just to play around with it and immediately got the A class may only extend a trait in the same module, unless the parent trait is annotated with {:termination false} error. That means this is currently useless unless we add {:termination false}.

Unfortunately we should treat that as a hard blocker given there's already been resistance to putting {:termination false} on any standard library code: #22 And especially given the discovery of dafny-lang/dafny#2500 I'm not inclined to push back on that.

We can think about whether there's a good alternative for this code that doesn't use traits, but it might be better to stick with this if it's good UX and at least wait for the solution to dafny-lang/dafny#2500.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Request: QuickSort (or any other more efficient alternative to MergeSort)
3 participants