Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

adding Merge Sort and Binary Search #49

Merged
merged 33 commits into from
Sep 16, 2022
Merged

Conversation

prvshah51
Copy link
Contributor

By submitting this pull request, I confirm that my contribution is made under the terms of the MIT license.

Copy link
Member

@robin-aws robin-aws left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great start! Here are some top-level suggestions (which will make some of the inline comments I made moot, but I added those anyway just for the sake of learning).

  1. This code should be under src rather than examples - the latter is for sample uses of the actual functions/methods. I would suggest src/Collections/Arrays/Arrays.dfy to align with the established layout. The additional total ordering predicates on compare would fit nicely into the existing src/Functions.dfy file.

  2. For sorting, I think proving HeapSort correct will take quite a bit of work. Why not steal the MergeSort implementation I did for the report generator instead? https://github.com/dafny-lang/dafny-reportgenerator/blob/main/src/StandardLibrary.dfy#L38 It definitely won't be as efficient given that it creates lots of intermediate subsequences (especially since most runtimes implement that as a copy instead of a reference), but it's already proven correct, and having that sorting algorithm here will still be an improvement over not having any sorting algorithm. :) You'll just have to replace calls to f(x) < f(y) with compare(x, y) just as you already have in BinarySearch.




method BinarySearch<T>(a: array<T>, key: T,compare: (T, T) -> bool) returns (r: int)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better to return an Option<nat> instead, so we don't have to use -1 as a special flag value.

Comment on lines 19 to 21
var temp:= a[0];
a[0]:= a[i];
a[i]:= temp;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
var temp:= a[0];
a[0]:= a[i];
a[i]:= temp;
a[0], a[i] := a[i], a[0];

// Implemetation of Heap sort
include "../src/Collections/Sequences/Seq.dfy"

method SortArray<T>(a: array<T>, compare: (T, T) -> bool)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd name this HeapSortArray to be more precise, since there are other ways to sort (and may even be more in this repo in the future)



method BinarySearch<T>(a: array<T>, key: T,compare: (T, T) -> bool) returns (r: int)
requires forall i,j :: 0 <= i < j < a.Length ==> compare(a[i],a[j])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're missing the pre-conditions that ensure compare is a total ordering, which is why some of the assertions about correctness aren't verifying. You should be able to copy them here with proper attribution (I know all of the authors on that file would be happy to have their code in this repo :) - https://github.com/aws/aws-encryption-sdk-dafny/blob/mainline/src/Util/Sorting.dfy#L126-L127

You'll need exactly the same requires clauses on the sorting methods too.

include "../src/Collections/Sequences/Seq.dfy"

method SortArray<T>(a: array<T>, compare: (T, T) -> bool)
requires multiset(a[..]) == multiset(old(a[..]))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be an ensures instead - it's describing the state of the array AFTER executing the method.

@prvshah51 prvshah51 requested a review from atomb August 16, 2022 20:16
{}

function method MergeSortBy<T>(a: seq<T>, compare: (T, T) -> bool ): (result :seq<T>)
ensures multiset(a) == multiset(result)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You also probably need a postcondition that the result is sorted.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In order to add SortedBy in a postcondition here, you'll need to export it from the module, as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And I think you'll also need these preconditions:

requires StandardLibrary.Transitive(compare)
requires Connected(compare)

src/Collections/Sequences/Sorting.dfy Outdated Show resolved Hide resolved
requires StandardLibrary.Transitive(compare)
requires Connected(compare)
{
MergeSortBy(s,compare)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because Sort is a method, you'll need something like result := MergeSortBy(s, compare);.

}

method BinarySearch<T>(a: array<T>, key: T,compare: (T, T) -> bool) returns (r : nat)
requires forall i,j :: 0 <= i < j < a.Length ==> compare(a[i], a[j])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might want to do this to allow the array to contain duplicates.

Suggested change
requires forall i,j :: 0 <= i < j < a.Length ==> compare(a[i], a[j])
requires forall i,j :: 0 <= i < j < a.Length ==> compare(a[i], a[j]) || a[i] == a[j]

ensures 0 <= r ==> r < a.Length && a[r] == key
ensures r < 0 ==> key !in a[..]
requires StandardLibrary.Transitive(compare)
requires Connected(compare)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this method requires an even more obscure property of the compare function. :)

requires StandardLibrary.Trichotomous(compare)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And it turns out that if you have that precondition you don't need Transitive or Connected.

{
}

method BinarySearch<T>(a: array<T>, key: T,compare: (T, T) -> bool) returns (r : nat)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way the spec is written right now, this method should return int instead of nat, though making it return Option<nat> might be even nicer, as I mention below.

@prvshah51 prvshah51 changed the title adding Heap Sort and Bianry Search adding Heap Sort and Binary Search Aug 17, 2022
@prvshah51 prvshah51 requested a review from atomb August 17, 2022 15:17
@prvshah51 prvshah51 marked this pull request as ready for review August 17, 2022 18:54
Copy link
Member

@atomb atomb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks nice! I just suggested a few small changes for consistency and removal of duplication.

Eventually, I suspect we should split up StandardLibrary.dfy to put some of its contents in the more specialized modules that already exist (and maybe some new ones). For now, though, this at least makes a bunch more definitions available.




module BinarySearch{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
module BinarySearch{
module BinarySearch {

import opened Wrappers = StandardLibrary.Wrappers


predicate Reflexive<T(!new)>(R: (T, T) -> bool) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
predicate Reflexive<T(!new)>(R: (T, T) -> bool) {
predicate Reflexive<T(!new)>(R: (T, T) -> bool) {

And I'd suggest similar changes throughout this file to ensure the indentation is consistent.

include "../../UInt.dfy"
include "../Sequences/Seq.dfy"

module sorting{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
module sorting{
module Sorting {

import opened UInt = StandardLibrary.UInt
import opened Seq

predicate Reflexive<T(!new)>(R: (T, T) -> bool) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This predicate and a bunch of those after it are duplicated between here and the BinarySearch module. Maybe they should go in StandardLibrary.dfy until we reorganize it to split it up a little more?

src/UInt.dfy Outdated

module StandardLibrary.UInt {

// TODO: Depend on types defined in dafny-lang/libraries instead
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment probably isn't needed anymore. :)


module BinarySearch {
export
provides StandardLibrary, UInt
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'll probably want provides BinarySearch, too, so that people can use it.

import opened UInt = StandardLibrary.UInt
import opened Wrappers = StandardLibrary.Wrappers

method BinarySearch<T>(a: array<T>, key: T, compare: (T, T) -> bool) returns (r:Option<nat>)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be nice to have this indented two spaces to be consistent with the imports above it.

@prvshah51 prvshah51 requested a review from atomb August 18, 2022 16:35
atomb
atomb previously approved these changes Aug 18, 2022
Copy link
Member

@atomb atomb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nicely done!

@atomb atomb dismissed robin-aws’s stale review August 18, 2022 17:11

I believe your comments have been addressed and that you'd prefer to have this merged sooner than later. :)

@robin-aws
Copy link
Member

@prvshah51 I've pushed several commits to remove more code we don't need (yet!), and made a few other changes to get it ready to merge IMO:

  • After discussion with @atomb I've removed Comparison.dfy and stuck to plain old (T, T) -> bool relations, which simplifies things for now.
  • Moved Relations.dfy to the top-level src directory (since we also have Functions.dfy there already)
  • Added an example that sorts and then searches, which also revealed that the "sorted" precondition on BinarySearch wasn't correct!

Let me know if you have any concerns, and @atomb / @MikaelMayer I'll leave it to you two to approve since I feel a little biased now that I've thrown my own commits on this. :)

@robin-aws robin-aws self-requested a review September 13, 2022 21:36
@robin-aws robin-aws dismissed their stale review September 13, 2022 21:37

Addressed own suggestions, recusing myself as biased :)

@robin-aws
Copy link
Member

@MikaelMayer I just noticed you had some style requests that were lost in the shuffle. I'll respond to the ones I see that still seem relevant, but please just provide a quick comment at the top level if there are others we missed.

Re: the naming convention for predicates, e.g. "IsTotalOrder" instead of "TotalOrder", that's potentially a good idea but the latter convention is already established here in predicates like Functions.Injective. I'd suggest if you still feel strongly about it that you cut a repo issue for discussion, similar to #47.

Re: "lessThan" instead of "less", I don't have a strong opinion so I'd be willing to change it, personally. @prvshah51?

@prvshah51
Copy link
Contributor Author

prvshah51 commented Sep 14, 2022

Re: "lessThan" instead of "less", I don't have a strong opinion so I'd be willing to change it, personally. @prvshah51?

Both options are fine with me.

import opened Relations

//Splits a sequence in two, sorts the two subsequences using itself, and merge the two sorted sequences using `MergeSortedWith`
function method MergeSortBy<T>(a: seq<T>, compare: (T, T) -> bool): (result :seq<T>)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
function method MergeSortBy<T>(a: seq<T>, compare: (T, T) -> bool): (result :seq<T>)
function method MergeSortBy<T>(a: seq<T>, lessEqThan: (T, T) -> bool): (result :seq<T>)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's perhaps worth noting that we could change this in the future without breaking client code. I'd maybe prefer lessEq or lessThanOrEq, to avoid the awkward reading of "equal than".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for suggestions , I am changing compare to lessThanOrEq.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MikaelMayer I just noticed you had some style requests that were lost in the shuffle. I'll respond to the ones I see that still seem relevant, but please just provide a quick comment at the top level if there are others we missed.

Re: the naming convention for predicates, e.g. "IsTotalOrder" instead of "TotalOrder", that's potentially a good idea but the latter convention is already established here in predicates like Functions.Injective. I'd suggest if you still feel strongly about it that you cut a repo issue for discussion, similar to #47.

Re: "lessThan" instead of "less", I don't have a strong opinion so I'd be willing to change it, personally. @prvshah51?

Could you use the attribute {:options "/functionSyntax:4"} and change all your predicate method to predicate ?

function method {:opaque} Sort(s: seq, compare: (T, T) -> bool)
instead of a method

Add the following comment to MergeSortBy
// Splits a sequence in two, sorts the two subsequences using itself, and merge the two sorted sequences using MergeSortedBy

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you use the attribute {:options "/functionSyntax:4"} and change all your predicate method to predicate ?

function method {:opaque} Sort(s: seq, compare: (T, T) -> bool)
instead of a method

I did that for MergeSort.dfy, but for Relations.dfy SortedBy was the only predicate method, and I actually made that a non-compiled predicate instead like the others. As per my comment on the original version in dafny-reportgenerator:

// TODO: Make this a predicate-by-method so it has linear runtime instead of quadratic.
// For now it's only used at runtime in tests.

This change doesn't need it to be compilable yet so I'd rather address that later.

Add the following comment to MergeSortBy
// Splits a sequence in two, sorts the two subsequences using itself, and merge the two sorted sequences using
MergeSortedBy

It's already there? :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trying to use {:options "/functionSyntax:4"} ran into what looks like a bug in Dafny 3.5.0 (which this repo is using) that I can't reproduce with the latest. I've reverted it for now and we can come back to that too: #55

atomb
atomb previously approved these changes Sep 14, 2022
Copy link
Member

@atomb atomb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a nice simplification of the original. :) Ordering continues to be a tricky topic, and I left a couple of comments related to it, though I'm not sure that any of them should hold up merging.

* Note that being trichotomous implies being irreflexive.
*/
predicate Trichotomous<T(!new)>(less: (T, T) -> bool) {
(forall x, y :: less(x, y) || x == y || less(y, x)) && // at least one of the three
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I commented earlier about defining this for a comparison result data type, @cpitclaudel pointed out very rightly that my first attempt wouldn't work for comparison based on projection (e.g., looking only at the first element of a tuple), and I think this definition suffers from the same problem, using == here. That's not necessarily fatal, but it's something to be aware of.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, but I notice this isn't actually used anywhere at this point. Should we remove it, given that issue?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I'd prefer to remove it too now that it's unused.

import opened Relations

method BinarySearch<T>(a: array<T>, key: T, less: (T, T) -> bool) returns (r: Option<nat>)
requires SortedBy(a[..], (x, y) => less(x, y) || x == y)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will have the same issue I commented on for Trichotomous, making it impossible to sort by projections. Would it be possible to make the proof go through if we required TotalOrdering instead of StrictTotalOrdering?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I attempted that at first, but I was discouraged not only by verification not immediately working, but also the fact that it makes the specification messier as well: instead of

ensures r.Some? ==> r.value < a.Length && a[r.value] == key

You would need

ensures r.Some? ==> r.value < a.Length && lessThanOrEqual(a[r.value], key) && lessThanOrEqual(key, a[r.value])

Which is just plain much harder to read and understand.

I think for that reason, it makes more sense to improve this at the same time that we add comparison functions that return CmpResult instead as per #53.

Copy link
Member

@MikaelMayer MikaelMayer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I approve but I still don't feel comfortable having a comparison function called twice in a binary search.

@robin-aws
Copy link
Member

I approve but I still don't feel comfortable having a comparison function called twice in a binary search.

I definitely don't love it either. Both of these implementations are far from ideal when it comes to efficiency at runtime. I'm personally much more dissatisfied with my own MergeSort here rather than the binary search: it's using s[..1] expressions to recurse, and with the current compilation behavior that involves making a new copy, so the runtime is far worse than log-linear.

This whole repository needs more work to ensure it compiles to efficient-enough code for each supported language (see #24). In fact we're not even guaranteeing it is compilable at all for each target - it doesn't work for Java and hence the existence of https://github.com/dafny-lang/libraries-without-variance.

In general, I consider it good forwards progress to have a verified but inefficient definition of a concept here, since many users can live with the inefficiency in their context. But I'm looking forward to adding a lot of by method's in the future! :)

@robin-aws robin-aws merged commit bfbdda8 into dafny-lang:master Sep 16, 2022
@prvshah51 prvshah51 mentioned this pull request Nov 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants