Add `/randomSeedIterations` flag for verification stability analysis #567

atomb · 2022-04-19T23:00:59Z

This PR adds a new flag, /randomSeedIterations:<n>, which instructs Boogie to send each verification condition to the SMT solver n times, with the goal of analyzing the stability of proof across those n iterations. When used with /vcsCores:<n>, the iterations can be evaluated in parallel.

From the user perspective, the output of /trace and /xml now includes iteration numbers along with assertion batch indices.

Fixes #494.

By submitting this pull request, I confirm that my contribution is made under the terms of the MIT license.

We want to be able to check one split with multiple checkers, and this fixed link between Checker and Split is easy to remove.

This doesn't yet include the logic to ensure that each attempt uses a different random seed, which will be necessary for real stability analysis.

As long as /randomSeed is provided. Otherwise, do no randomization.

This makes it easier to keep track of what’s going on when producing multiple results from a single split.

Previously, some would occasionally be lost.

keyboardDrummer · 2022-04-20T09:13:04Z

It's most useful in conjunction with /randomSeed

Is it any use without /randomSeed ?

in which case it'll use a different random seed for each iteration of Boogie -> SMT translation

Shouldn't it always do that, even if no /randomSeed is specified ?

keyboardDrummer · 2022-04-20T09:18:44Z

Source/VCGeneration/Checker.cs

@@ -153,8 +153,13 @@ private void SetRlimit(uint rlimit)
    /// </summary>
    private void Setup(Program prog, ProverContext ctx, Split split = null)
    {
-      SolverOptions.RandomSeed = split?.RandomSeed ?? Options.RandomSeed;
+      if (Options.VcsStabilityIterations < 2 || SolverOptions.RandomSeed == null) {


Do we need the Options.VcsStabilityIterations < 2 ?

Would SolverOptions.RandomSeed ??= split?.RandomSeed ?? Options.RandomSeed not work?

Consider not checking Options.VcsStabilityIterations:

private void Setup(Program prog, ProverContext ctx, Split split = null) { if (SolverOptions.RandomSeed == null) { SolverOptions.RandomSeed = split?.RandomSeed ?? Options.RandomSeed } else { SolverOptions.RandomSeed = new Random(SolverOptions.RandomSeed.Value).Next(); } var random = SolverOptions.RandomSeed == null ? null : new Random(SolverOptions.RandomSeed.Value); Program = prog; .. }

The first time I wrote it, I did exactly that, but it makes it impossible to use different random seeds for different methods using the {:random_seed N} attribute (which we fortunately have a test to check!).

keyboardDrummer · 2022-04-20T09:23:52Z

Source/VCGeneration/Checker.cs

@@ -153,8 +153,13 @@ private void SetRlimit(uint rlimit)
    /// </summary>
    private void Setup(Program prog, ProverContext ctx, Split split = null)


I wonder whether this code leads to nondeterministic behavior when you're running multiple splits in parallel and multiple stability iterations. The splits can share the same checker so they might influence each other's random seeds. That might not be a problem since we're doing stability analysis so determinism is not the goal, but not being able to get reproducible output might be concerning to some.

If you want deterministic behavior, update the RandomSeed of the Split on each iteration, and set SolverOptions.RandomSeed using that.

Ah, interesting idea. I'll look into that.

On further thought, I think that this approach would have the same issue that the comment above does: methods explicitly annotated with {:random_seed N} would instead have a random seed that would evolve over time (though in a predictable way). As it's currently written, the non-determinism you mention would only occur in the case where /vcsStabilityIterations is specified in which case I think it's at least arguably desirable.

I agree with your assessment, but still I believe that any CLI behavior which is significant, such as whether a Split has verification failures on some iterations, preferably is reproducible by calling the CLI again with the same arguments. I think you could achieve that with minor changes.

Yeah, I think you're right. I came up with something that should achieve that while having slightly simpler logic overall.

keyboardDrummer · 2022-04-20T09:28:57Z

Source/VCGeneration/ConditionGeneration.cs

@@ -524,14 +525,16 @@ void ObjectInvariant()
        Contract.Invariant(cce.NonNullElements(vcResults));
      }

-      public readonly List<Counterexample> /*!>!*/ examples = new List<Counterexample>();
-      public readonly List<VCResult> /*!>!*/ vcResults = new List<VCResult>();
+      public readonly ConcurrentQueue<Counterexample> /*!>!*/


Oof, this line cutoff is so narrow! What's the scenario in which we want 80 chars maximum?

I often view code in an 80-character terminal, though I don't have a strong preference about that. But the main reason for this is that I originally wrote it as a diff on top of the code that existed before you joined that line. ;)

Fixed by simplifying the initialization expressions to that they're on one <80 character line. :)

You can make the RHS of these statements just new ();, which saves quite a few characters!

Yep, that's what I did!

keyboardDrummer · 2022-04-20T09:29:42Z

Source/VCGeneration/ConditionGeneration.cs

+      public readonly ConcurrentQueue<Counterexample> /*!>!*/
+        examples = new ConcurrentQueue<Counterexample>();
+      public readonly ConcurrentQueue<VCResult> /*!>!*/
+        vcResults = new ConcurrentQueue<VCResult>();

      public override void OnCounterexample(Counterexample ce, string /*?*/ reason)


Mixing counterexamples from different iterations might be confusing. Can we avoid that?

I already find the current structure a bit confusing:

record VerificationResult(List<Counterexample> Errors, List<VCResult> VCResults); record VCResult(List<Counterexample> Counterexamples, List<AssertCmd> Asserts, TimeSpan RunTime);

There's two lists of Counterexample in there!

I'd prefer to be more explicit:

record VerificationResult(List<SplitResult> SplitResults); record SplitResult(List<AssertCmd> Asserts, List<IterationResult> IterationResult); record IterationResult(List<Counterexample> Counterexamples, TimeSpan RunTime);

With the option of later renaming Split and SplitResult to AssertionBatch and AssertionBatchResult

I like this approach. I'll give it a try.

Unfortunately, the implementation is a bit more intricately tied to that design than I expected. In particular, there's a notion of "recycled failing assertions" that I don't fully understand but that depends on counterexamples that aren't tied to our current notion of splits. Houdini also has some complex interactions with this list of counterexamples.

So, although I'd love to get rid of that duplicated list of counterexamples, I think it'll take a good bit more investigation to figure out how to do that without breaking things.

When using /vcsStabilityIterations, this does mean that errors (including counterexamples) will be reported more than once. However, after thinking about it a bit more, I think that's the right thing to do. Different iterations might produce different counterexamples, and those differences could be informative.

It is a bit unfortunate that all of these counterexamples are aggregated into a single list. However, it was already the case that counterexamples from different splits were aggregated into that list. So, although this PR doesn't fix that issue, I'd argue that it doesn't really make it worse, either.

When using /vcsStabilityIterations, this does mean that errors (including counterexamples) will be reported more than once. However, after thinking about it a bit more, I think that's the right thing to do.

When we previously looked at Excel reports to analyse Dafny verification stability, one of the key indicators we looked at was % of iterations that had errors. I think it would be good to be able to report the # of errors per iteration. Wouldn't concatenating errors from all iterations in one list prevent that?

Also, I don't see a use-case for showing errors including the detailed error message when using /vcsStabilityIterations. However, if you do it, then wouldn't you want to remove duplicates?

How about we leave the list of counterexamples in VerificationResult that is operated on by Houdini and RecycledFailedAssertions, but update the list that's only tied to Splits?

record VerificationResult(List<Counterexample> Counterexamples, List<SplitResult> SplitResults); // here we leave in the existing list. record SplitResult(List<AssertCmd> Asserts, List<IterationResult> IterationResult); record IterationResult(List<Counterexample> Counterexamples, TimeSpan RunTime);

Each VCResult has an outcome, so it'll still be easy to determine what % have errors. Any client that doesn't want the aggregate list of counterexamples can ignore it, and just use the shorter, nested lists. That aggregate list exists only for backward compatibility, and I hope we can eventually remove it, I just don't yet fully understand how to do so without breaking some existing functionality.

When it comes to what Boogie prints, my mental model of the feature, and the one I can imagine most easily communicating, is that it's like running Boogie multiple times, but interleaved. And with that model, seeing the same error multiple times makes sense.

I'm not 100% sure I understand your last comment. If what you're suggesting is that we package the aggregate list of counterexamples in the top-level VerificationResult, so it's not hanging out on its own, I think that makes sense.

More specifically, I'm not quite sure what you mean by "update the list that's only tied to Splits".

Oh, VerificationResult already includes the list of counterexamples, so there's nothing to do there, unless you had something different in mind with that last comment.

I haven't created a distinction between VCResult/SplitResult and IterationResult. Although doing so would make the data type slighlty more intuitive, it would make most of the clients of that data type more complex without a clear benefit. Grouping a list of VCResult by the vcNum field can achieve the same outcome, in cases where it's useful. I did just make a change to avoid duplicating identical lists of AssertCmd between VCResults, however.

Source/VCGeneration/SplitAndVerifyWorker.cs

atomb · 2022-04-20T16:19:00Z

It's most useful in conjunction with /randomSeed

Is it any use without /randomSeed ?

For debugging, it can be somewhat useful to remove the randomness factor. It is somewhat unfortunate that the current command-line option structure makes that the default, though.

in which case it'll use a different random seed for each iteration of Boogie -> SMT translation

Shouldn't it always do that, even if no /randomSeed is specified ?

Arguably, yes. Boogie would have to invent a random seed in that case, but either a fixed or random value could be reasonable.

This is in preparation for ultimately having all counterexamples live in the `VCResult` record, rather than in a separate list in a `VerificationResultCollector` instance. It’ll take some more serious refactoring to make that work, though.

atomb · 2022-04-20T18:46:31Z

Shouldn't it always do that, even if no /randomSeed is specified ?

Arguably, yes. Boogie would have to invent a random seed in that case, but either a fixed or random value could be reasonable.

I've now adapted it to assume /randomSeed:0 if there was no /randomSeed option. That way we always get some variation but deterministically so. The debugging use case probably isn't worth focusing on.

robin-aws · 2022-04-20T19:03:50Z

Do we expect to automatically vary any other parameters in the future across multiple runs? If we started adding semantics-preserving transformations, for example, how would a user configure how to apply exactly N different versions in an easy-to-understand way?

I'm wondering if it would be clearer and more accurate to just call this option /randomSeedIterations:<n>.

atomb · 2022-04-20T19:14:13Z

Do we expect to automatically vary any other parameters in the future across multiple runs? If we started adding semantics-preserving transformations, for example, how would a user configure how to apply exactly N different versions in an easy-to-understand way?

I'm wondering if it would be clearer and more accurate to just call this option /randomSeedIterations:<n>.

That'd be a reasonable name, IMO. Just to be clear, though, my understanding is that other transformations like that would likely exist for stability measurement, as well. Is that what you're thinking, too?

robin-aws · 2022-04-20T22:22:58Z

Do we expect to automatically vary any other parameters in the future across multiple runs? If we started adding semantics-preserving transformations, for example, how would a user configure how to apply exactly N different versions in an easy-to-understand way?
I'm wondering if it would be clearer and more accurate to just call this option /randomSeedIterations:<n>.

That'd be a reasonable name, IMO. Just to be clear, though, my understanding is that other transformations like that would likely exist for stability measurement, as well. Is that what you're thinking, too?

Definitely, I think there's tons of different ways of perturbing the verification across different runs. If the option is named something more general like /vcsStabilityIterations:<n> it implies that we'll need a lot more knobs in the future for exactly how we get those n variants. I'm a little worried that we'll want more precise other options like (don't take these too literally) /vcsArithRewriteDepth:<m> and /vcsRenamings:<k>, at which point that top-level n might be redundant because you'll end up with m * k variants based on the other options.

I could see this getting complicated enough that we start to want a configuration file of some kind. If we think that it's worth the efficiency gain to have a single Boogie process do multiple verification runs, the downside is we almost need a DSL to describe what an external script would have done with multiple calls to Boogie instead.

atomb · 2022-04-20T22:41:28Z

Do we expect to automatically vary any other parameters in the future across multiple runs? If we started adding semantics-preserving transformations, for example, how would a user configure how to apply exactly N different versions in an easy-to-understand way?
I'm wondering if it would be clearer and more accurate to just call this option /randomSeedIterations:<n>.

That'd be a reasonable name, IMO. Just to be clear, though, my understanding is that other transformations like that would likely exist for stability measurement, as well. Is that what you're thinking, too?

Definitely, I think there's tons of different ways of perturbing the verification across different runs. If the option is named something more general like /vcsStabilityIterations:<n> it implies that we'll need a lot more knobs in the future for exactly how we get those n variants. I'm a little worried that we'll want more precise other options like (don't take these too literally) /vcsArithRewriteDepth:<m> and /vcsRenamings:<k>, at which point that top-level n might be redundant because you'll end up with m * k variants based on the other options.

Ah, right! At first I thought you were concerned that the original name was too narrow, but you were concerned that it was too broad. I agree, and I've changed it to /randomSeedIterations.

keyboardDrummer · 2022-04-21T08:55:25Z

Source/ExecutionEngine/CommandLineOptions.cs

+                Attempt to prove each VC n times with n random seeds. If
+                /randomSeed has been provided, each proof attempt will use
+                a new random seed derived from this original seed. If not,
+                it will implicitly assume /randomSeed:0 to ensure difference


Consider saying use instead of assume

atomb added 7 commits April 13, 2022 06:35

Remove Checker and ProverTask from Split

82a0da2

We want to be able to check one split with multiple checkers, and this fixed link between Checker and Split is easy to remove.

Add option to prove each VC multiple times

cc68cfd

This doesn't yet include the logic to ensure that each attempt uses a different random seed, which will be necessary for real stability analysis.

Different random seed for each stability iteration

9752010

As long as /randomSeed is provided. Otherwise, do no randomization.

Use extra randomness only with stability analysis

f07ce42

Organize splits by index and iteration

3a74f4b

This makes it easier to keep track of what’s going on when producing multiple results from a single split.

Use concurrent collections for VC results, cels

ea8105e

Previously, some would occasionally be lost.

Merge remote-tracking branch 'upstream/master' into stability-analysis

e6bdae1

atomb requested review from keyboardDrummer and robin-aws April 19, 2022 23:01

atomb added 4 commits April 19, 2022 16:02

Remove merge artifact

b9d7236

Add test for /vcsStabilityIterations

891bb15

Remove errant comment prefix

b3ce16f

Fix MaxKeepGoingSplits test

ba6df6c

keyboardDrummer reviewed Apr 20, 2022

View reviewed changes

atomb added 2 commits April 20, 2022 10:49

Simplify field initializers

368261a

atomb requested a review from keyboardDrummer April 20, 2022 18:11

Remove splitIdx parameters; assume /randomSeed:0

9289de1

More deterministic random seeds per split

c522530

/vcsStabilityIterations -> /randomSeedIterations

5d4be65

Avoid duplicating lists of AssertCmd

a1423fc

atomb changed the title ~~Add /vcsStabilityIterations flag for verification stability analysis~~ Add /randomSeedIterations flag for verification stability analysis Apr 21, 2022

keyboardDrummer previously approved these changes Apr 22, 2022

View reviewed changes

Apply wording change from @keyboardDrummer

8c1e2a7

atomb dismissed keyboardDrummer’s stale review via 8c1e2a7 April 22, 2022 15:32

atomb requested a review from keyboardDrummer April 22, 2022 15:34

keyboardDrummer approved these changes Apr 22, 2022

View reviewed changes

Merge branch 'master' into stability-analysis

66c08a4

atomb mentioned this pull request May 2, 2022

Bump version to 2.14.2 and revert https://github.com/boogie-org/boogie/pull/561/files #570

Merged

Merge branch 'master' into stability-analysis

f33cd49

robin-aws merged commit ca37298 into boogie-org:master May 2, 2022

atomb deleted the stability-analysis branch January 4, 2024 17:28

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add `/randomSeedIterations` flag for verification stability analysis #567

Add `/randomSeedIterations` flag for verification stability analysis #567

atomb commented Apr 19, 2022 •

edited

Loading

keyboardDrummer commented Apr 20, 2022

keyboardDrummer Apr 20, 2022

atomb Apr 20, 2022

keyboardDrummer Apr 20, 2022 •

edited

Loading

atomb Apr 20, 2022

atomb Apr 20, 2022

keyboardDrummer Apr 20, 2022 •

edited

Loading

atomb Apr 20, 2022

keyboardDrummer Apr 20, 2022

atomb Apr 20, 2022

atomb Apr 20, 2022

robin-aws Apr 20, 2022

atomb Apr 20, 2022

keyboardDrummer Apr 20, 2022 •

edited

Loading

atomb Apr 20, 2022

atomb Apr 20, 2022

keyboardDrummer Apr 21, 2022 •

edited

Loading

atomb Apr 21, 2022

atomb Apr 21, 2022

atomb Apr 21, 2022

atomb commented Apr 20, 2022

atomb commented Apr 20, 2022

robin-aws commented Apr 20, 2022

atomb commented Apr 20, 2022

robin-aws commented Apr 20, 2022

atomb commented Apr 20, 2022

keyboardDrummer Apr 21, 2022

		@@ -153,8 +153,13 @@ private void SetRlimit(uint rlimit)
		/// </summary>
		private void Setup(Program prog, ProverContext ctx, Split split = null)

Add /randomSeedIterations flag for verification stability analysis #567

Add /randomSeedIterations flag for verification stability analysis #567

Conversation

atomb commented Apr 19, 2022 • edited Loading

keyboardDrummer commented Apr 20, 2022

Choose a reason for hiding this comment

Choose a reason for hiding this comment

keyboardDrummer Apr 20, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

keyboardDrummer Apr 20, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

keyboardDrummer Apr 20, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

keyboardDrummer Apr 21, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

atomb commented Apr 20, 2022

atomb commented Apr 20, 2022

robin-aws commented Apr 20, 2022

atomb commented Apr 20, 2022

robin-aws commented Apr 20, 2022

atomb commented Apr 20, 2022

Choose a reason for hiding this comment

Add `/randomSeedIterations` flag for verification stability analysis #567

Add `/randomSeedIterations` flag for verification stability analysis #567

atomb commented Apr 19, 2022 •

edited

Loading

keyboardDrummer Apr 20, 2022 •

edited

Loading

keyboardDrummer Apr 20, 2022 •

edited

Loading

keyboardDrummer Apr 20, 2022 •

edited

Loading

keyboardDrummer Apr 21, 2022 •

edited

Loading