-
Notifications
You must be signed in to change notification settings - Fork 108
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add /randomSeedIterations
flag for verification stability analysis
#567
Conversation
We want to be able to check one split with multiple checkers, and this fixed link between Checker and Split is easy to remove.
This doesn't yet include the logic to ensure that each attempt uses a different random seed, which will be necessary for real stability analysis.
As long as /randomSeed is provided. Otherwise, do no randomization.
This makes it easier to keep track of what’s going on when producing multiple results from a single split.
Previously, some would occasionally be lost.
Is it any use without /randomSeed ?
Shouldn't it always do that, even if no /randomSeed is specified ? |
Source/VCGeneration/Checker.cs
Outdated
@@ -153,8 +153,13 @@ private void SetRlimit(uint rlimit) | |||
/// </summary> | |||
private void Setup(Program prog, ProverContext ctx, Split split = null) | |||
{ | |||
SolverOptions.RandomSeed = split?.RandomSeed ?? Options.RandomSeed; | |||
if (Options.VcsStabilityIterations < 2 || SolverOptions.RandomSeed == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need the Options.VcsStabilityIterations < 2
?
Would SolverOptions.RandomSeed ??= split?.RandomSeed ?? Options.RandomSeed
not work?
Consider not checking Options.VcsStabilityIterations
:
private void Setup(Program prog, ProverContext ctx, Split split = null)
{
if (SolverOptions.RandomSeed == null) {
SolverOptions.RandomSeed = split?.RandomSeed ?? Options.RandomSeed
} else {
SolverOptions.RandomSeed = new Random(SolverOptions.RandomSeed.Value).Next();
}
var random = SolverOptions.RandomSeed == null ? null : new Random(SolverOptions.RandomSeed.Value);
Program = prog;
..
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The first time I wrote it, I did exactly that, but it makes it impossible to use different random seeds for different methods using the {:random_seed N}
attribute (which we fortunately have a test to check!).
Source/VCGeneration/Checker.cs
Outdated
@@ -153,8 +153,13 @@ private void SetRlimit(uint rlimit) | |||
/// </summary> | |||
private void Setup(Program prog, ProverContext ctx, Split split = null) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder whether this code leads to nondeterministic behavior when you're running multiple splits in parallel and multiple stability iterations. The splits can share the same checker so they might influence each other's random seeds. That might not be a problem since we're doing stability analysis so determinism is not the goal, but not being able to get reproducible output might be concerning to some.
If you want deterministic behavior, update the RandomSeed of the Split on each iteration, and set SolverOptions.RandomSeed
using that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, interesting idea. I'll look into that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On further thought, I think that this approach would have the same issue that the comment above does: methods explicitly annotated with {:random_seed N}
would instead have a random seed that would evolve over time (though in a predictable way). As it's currently written, the non-determinism you mention would only occur in the case where /vcsStabilityIterations
is specified in which case I think it's at least arguably desirable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with your assessment, but still I believe that any CLI behavior which is significant, such as whether a Split has verification failures on some iterations, preferably is reproducible by calling the CLI again with the same arguments. I think you could achieve that with minor changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think you're right. I came up with something that should achieve that while having slightly simpler logic overall.
@@ -524,14 +525,16 @@ void ObjectInvariant() | |||
Contract.Invariant(cce.NonNullElements(vcResults)); | |||
} | |||
|
|||
public readonly List<Counterexample> /*!>!*/ examples = new List<Counterexample>(); | |||
public readonly List<VCResult> /*!>!*/ vcResults = new List<VCResult>(); | |||
public readonly ConcurrentQueue<Counterexample> /*!>!*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oof, this line cutoff is so narrow! What's the scenario in which we want 80 chars maximum?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I often view code in an 80-character terminal, though I don't have a strong preference about that. But the main reason for this is that I originally wrote it as a diff on top of the code that existed before you joined that line. ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed by simplifying the initialization expressions to that they're on one <80 character line. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can make the RHS of these statements just new ();
, which saves quite a few characters!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, that's what I did!
public readonly ConcurrentQueue<Counterexample> /*!>!*/ | ||
examples = new ConcurrentQueue<Counterexample>(); | ||
public readonly ConcurrentQueue<VCResult> /*!>!*/ | ||
vcResults = new ConcurrentQueue<VCResult>(); | ||
|
||
public override void OnCounterexample(Counterexample ce, string /*?*/ reason) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mixing counterexamples from different iterations might be confusing. Can we avoid that?
I already find the current structure a bit confusing:
record VerificationResult(List<Counterexample> Errors, List<VCResult> VCResults);
record VCResult(List<Counterexample> Counterexamples, List<AssertCmd> Asserts, TimeSpan RunTime);
There's two lists of Counterexample in there!
I'd prefer to be more explicit:
record VerificationResult(List<SplitResult> SplitResults);
record SplitResult(List<AssertCmd> Asserts, List<IterationResult> IterationResult);
record IterationResult(List<Counterexample> Counterexamples, TimeSpan RunTime);
With the option of later renaming Split
and SplitResult
to AssertionBatch
and AssertionBatchResult
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this approach. I'll give it a try.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, the implementation is a bit more intricately tied to that design than I expected. In particular, there's a notion of "recycled failing assertions" that I don't fully understand but that depends on counterexamples that aren't tied to our current notion of splits. Houdini also has some complex interactions with this list of counterexamples.
So, although I'd love to get rid of that duplicated list of counterexamples, I think it'll take a good bit more investigation to figure out how to do that without breaking things.
When using /vcsStabilityIterations
, this does mean that errors (including counterexamples) will be reported more than once. However, after thinking about it a bit more, I think that's the right thing to do. Different iterations might produce different counterexamples, and those differences could be informative.
It is a bit unfortunate that all of these counterexamples are aggregated into a single list. However, it was already the case that counterexamples from different splits were aggregated into that list. So, although this PR doesn't fix that issue, I'd argue that it doesn't really make it worse, either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When using /vcsStabilityIterations, this does mean that errors (including counterexamples) will be reported more than once. However, after thinking about it a bit more, I think that's the right thing to do.
When we previously looked at Excel reports to analyse Dafny verification stability, one of the key indicators we looked at was % of iterations that had errors. I think it would be good to be able to report the # of errors per iteration. Wouldn't concatenating errors from all iterations in one list prevent that?
Also, I don't see a use-case for showing errors including the detailed error message when using /vcsStabilityIterations
. However, if you do it, then wouldn't you want to remove duplicates?
How about we leave the list of counterexamples in VerificationResult
that is operated on by Houdini and RecycledFailedAssertions, but update the list that's only tied to Splits?
record VerificationResult(List<Counterexample> Counterexamples, List<SplitResult> SplitResults); // here we leave in the existing list.
record SplitResult(List<AssertCmd> Asserts, List<IterationResult> IterationResult);
record IterationResult(List<Counterexample> Counterexamples, TimeSpan RunTime);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Each VCResult
has an outcome, so it'll still be easy to determine what % have errors. Any client that doesn't want the aggregate list of counterexamples can ignore it, and just use the shorter, nested lists. That aggregate list exists only for backward compatibility, and I hope we can eventually remove it, I just don't yet fully understand how to do so without breaking some existing functionality.
When it comes to what Boogie prints, my mental model of the feature, and the one I can imagine most easily communicating, is that it's like running Boogie multiple times, but interleaved. And with that model, seeing the same error multiple times makes sense.
I'm not 100% sure I understand your last comment. If what you're suggesting is that we package the aggregate list of counterexamples in the top-level VerificationResult
, so it's not hanging out on its own, I think that makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More specifically, I'm not quite sure what you mean by "update the list that's only tied to Splits".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, VerificationResult
already includes the list of counterexamples, so there's nothing to do there, unless you had something different in mind with that last comment.
I haven't created a distinction between VCResult
/SplitResult
and IterationResult
. Although doing so would make the data type slighlty more intuitive, it would make most of the clients of that data type more complex without a clear benefit. Grouping a list of VCResult
by the vcNum
field can achieve the same outcome, in cases where it's useful. I did just make a change to avoid duplicating identical lists of AssertCmd
between VCResults
, however.
For debugging, it can be somewhat useful to remove the randomness factor. It is somewhat unfortunate that the current command-line option structure makes that the default, though.
Arguably, yes. Boogie would have to invent a random seed in that case, but either a fixed or random value could be reasonable. |
This is in preparation for ultimately having all counterexamples live in the `VCResult` record, rather than in a separate list in a `VerificationResultCollector` instance. It’ll take some more serious refactoring to make that work, though.
I've now adapted it to assume |
Do we expect to automatically vary any other parameters in the future across multiple runs? If we started adding semantics-preserving transformations, for example, how would a user configure how to apply exactly N different versions in an easy-to-understand way? I'm wondering if it would be clearer and more accurate to just call this option |
That'd be a reasonable name, IMO. Just to be clear, though, my understanding is that other transformations like that would likely exist for stability measurement, as well. Is that what you're thinking, too? |
Definitely, I think there's tons of different ways of perturbing the verification across different runs. If the option is named something more general like I could see this getting complicated enough that we start to want a configuration file of some kind. If we think that it's worth the efficiency gain to have a single Boogie process do multiple verification runs, the downside is we almost need a DSL to describe what an external script would have done with multiple calls to Boogie instead. |
Ah, right! At first I thought you were concerned that the original name was too narrow, but you were concerned that it was too broad. I agree, and I've changed it to |
/vcsStabilityIterations
flag for verification stability analysis/randomSeedIterations
flag for verification stability analysis
Attempt to prove each VC n times with n random seeds. If | ||
/randomSeed has been provided, each proof attempt will use | ||
a new random seed derived from this original seed. If not, | ||
it will implicitly assume /randomSeed:0 to ensure difference |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider saying use
instead of assume
This PR adds a new flag,
/randomSeedIterations:<n>
, which instructs Boogie to send each verification condition to the SMT solvern
times, with the goal of analyzing the stability of proof across thosen
iterations. When used with/vcsCores:<n>
, the iterations can be evaluated in parallel.From the user perspective, the output of
/trace
and/xml
now includes iteration numbers along with assertion batch indices.Fixes #494.
By submitting this pull request, I confirm that my contribution is made under the terms of the MIT license.