Add an abstract interpreter 'history' that gives (local) per-path analysis and / or loop unwinding #5404

martin-cs · 2020-07-01T21:52:08Z

In the past users have had to modify the program before analysis to get this kind of precision. Doing so is expensive in terms of memory for both the program and the analysis. This gives a more efficient route. The is an equivalent of an unwinding limit that controls the number of histories that can be generated for each location. This is a better measure as it should scale memory linearly and time linearly(ish) while loop unwinding can have seriously non-linear effects when there are nested loops.

This is the enabling technology for a "loop unwinding" analysis that can automatically over-approximate loop bounds. It is also probably the last "core" change to the AI infrastructure before VSD can be merged in full.

Each commit message has a non-empty body, explaining why the change was made.
Methods or procedures I have added are documented, following the guidelines provided in CODING_STANDARD.md.
The feature or user visible behaviour I have added or modified has been documented in the User Guide in doc/cprover-manual/
Regression or unit tests are included, or existing tests cover the modified code (in this case I have detailed which ones those are in the commit message).
My commit message includes data points confirming performance improvements (if claimed).
My PR is restricted to a single feature or bugfix.
White-space or formatting changes outside the feature-related changed lines are in commits of their own.

codecov · 2020-07-02T02:05:14Z

Codecov Report

Merging #5404 into develop will increase coverage by 0.01%.
The diff coverage is 80.43%.

@@             Coverage Diff             @@
##           develop    #5404      +/-   ##
===========================================
+ Coverage    68.23%   68.25%   +0.01%     
===========================================
  Files         1178     1180       +2     
  Lines        97588    97720     +132     
===========================================
+ Hits         66589    66695     +106     
- Misses       30999    31025      +26

Flag	Coverage Δ
#cproversmt2	`42.80% <71.42%> (+<0.01%)`	⬆️
#regression	`65.42% <80.43%> (+0.02%)`	⬆️
#unit	`32.23% <71.42%> (+<0.01%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files	Coverage Δ
src/analyses/ai.h	`57.27% <0.00%> (-0.53%)`	⬇️
src/analyses/call_stack_history.h	`89.47% <ø> (ø)`
src/goto-analyzer/goto_analyzer_parse_options.h	`100.00% <ø> (ø)`
src/analyses/local_control_flow_history.cpp	`70.27% <70.27%> (ø)`
src/analyses/local_control_flow_history.h	`89.65% <89.65%> (ø)`
src/analyses/ai.cpp	`73.42% <100.00%> (-0.19%)`	⬇️
src/analyses/ai_history.cpp	`15.38% <100.00%> (+15.38%)`	⬆️
src/analyses/ai_history.h	`75.86% <100.00%> (ø)`
src/analyses/call_stack_history.cpp	`70.37% <100.00%> (+1.13%)`	⬆️
src/goto-analyzer/goto_analyzer_parse_options.cpp	`69.78% <100.00%> (+2.46%)`	⬆️
... and 2 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7d30335...e30da66. Read the comment docs.

martin-cs · 2020-07-07T12:10:48Z

Please can someone with access to the AWS CodeBuild outputs let me know what the problem is? Last time it was @chrisr-diffblue or @thk123 who could do this...

chrisr-diffblue · 2020-07-07T14:15:00Z

Looks like all of the failed AWS codebuild jobs failed really early due to infrastructure problems (failed to apt-get install dependencies). I've restarted all those failed jobs.

martin-cs · 2020-07-07T14:18:00Z

Thanks @chrisr-diffblue they are listed as "failed" rather than "pending" for me at the moment. I'll give them a minute...

martin-cs · 2020-07-07T14:54:42Z

These still look failed to me, plus the code coverage numbers are kinda wacky.

chrisr-diffblue · 2020-07-07T16:38:46Z

Unfortunately, it looks like although the restarted jobs seem to have passed, they haven't posted status back to github, sigh. You might have to go for the rebase/dummy commit + force push to make github start new jobs.

smowton · 2020-07-07T17:39:18Z

@chrisr-diffblue IIRC updating Github status after a rerun is a Codebuild option, @rjmunro knows more

rjmunro · 2020-07-07T18:58:43Z

@smowton @chrisr-diffblue Yes, there is an option "Report build statuses to source provider when your builds start and finish" in the "Edit" -> "Source" menu, hidden in "Additional configuration (Git clone depth, Git submodules)" (even though it is neither of those things). Without this, codebuild will report status to github only where the build was triggered by a github webhook. I can see it's not turned on, but I'd rather not change the setting myself as I'm not involved in this project. I'm happy to do it if someone officially asks me to.

Finding that setting was a huge win for us because it let us trigger codebuild jobs from other codebuild jobs, rather than from webhooks. This meant we could build a CI pipeline.

thk123 · 2020-07-09T09:44:40Z

@rjmunro I'm happy to do it - any reason to not have this on? Seems like a no-brainer...

rjmunro · 2020-07-13T08:56:54Z

@thk123 The guess the main reason not to have it on would be if you were testing your codebuild config and didn't want failures to block PRs or something, but in general, I would turn it on.

martin-cs · 2020-07-13T10:39:34Z

Thanks for the help all but ... I still can't see what is actually causing the failures. I'd really like to get this PR moving and I'm keen to fix the issues but without the info...

thk123 · 2020-07-16T08:45:54Z

@martin-cs as Chris said - the failures seem spurious - you should just be able to rebase this and push to retrigger the builds. I'll try and get the updated status thing working today.

martin-cs · 2020-07-16T14:51:12Z

Thank you @thk123 that seems to have done the trick.

Do any of the maintainers have time for a review of this PR?

smowton

Mostly lgtm, some requests for clarity, particularly clarifying either by naming a pass-through function or using a named nullptr when histories are being omitted because they're irrelevant, and when they're being omitted due to a strategic choice to ignore history and merge all states.

smowton · 2020-07-20T09:06:38Z

src/analyses/ai.cpp

    p_call,
    calling_function_id,
    l_return,
+    nullptr, // Not needed as we are skipping the function call


This and other nullptr args are pretty opaque -- how about #define CALLER_UNSPECIFIED nullptr?

I don't really agree with using a define for this. How about constexpr whatever* UnspecifiedCaller = nullptr;?

Added to ai_historyt as a member variable as @hannes-steffenhagen-diffblue suggests.

( Unfortunately constexpr doesn't actually work because ai_history_baset has a virtual destructor but a static const variable does. )

smowton · 2020-07-20T09:07:06Z

src/analyses/ai.cpp

      p_call,
      callee_function_id,
      l_begin,
+      nullptr,


Why is this one not given?

Because it is only used on function call return edges, this is a call, so p_call already has the caller info.

smowton · 2020-07-20T09:07:54Z

src/analyses/ai.h


-    auto step_return = p->step(n, *(storage->abstract_traces_before(n)));
+    auto step_return =
+      p->step(n, *(storage->abstract_traces_before(n)), nullptr);


Does the nullptr here imply we're conflating some states that we could potentially distinguish?

Good question but in this case no. The step is local so the history is not needed. I have added a comment to this effect.

( There is a separate conversation about whether such an interface is even meaningful w.r.t. function calls and goto instructions. )

smowton · 2020-07-20T09:14:55Z

src/analyses/local_control_flow_history.h

+    auto tmp = std::dynamic_pointer_cast<const local_control_flow_historyt<
+      track_forward_jumps,
+      track_backward_jumps>>(in);
+    PRECONDITION(tmp != nullptr);


POSTCONDITION?

I think it is a precondition of down_cast that this conversion works. To quote the comment above caller's responsibility to ensure correct types.

smowton · 2020-07-20T09:15:56Z

src/analyses/local_control_flow_history.h

+  std::shared_ptr<const local_control_flow_historyt<
+    track_forward_jumps,
+    track_backward_jumps>>
+  down_cast(trace_ptrt in) const


rename to_local_control_flow_history?

smowton · 2020-07-20T09:22:03Z

src/analyses/local_control_flow_history.cpp

+}
+
+bool local_control_flow_decisiont::
+operator<(const local_control_flow_decisiont &op) const


Looks like you intentionally want more informative orderings to be lower in the sort order? If so say so up top

Ummm... I am not entirely sure I have an idea of what constitutes informative in this context. Where the choice of ordering matters (both in this and in local_control_flow_historyt<track_forward_jumps, track_backward_jumps>:: operator<) it is commented and in one case checked by assertion (if there is a merge history it should sort last). Specifically this includes:

// Lower locations first, generally this means depth first // Branch not taken takes priority // This prioritises complete histories over merged ones

smowton · 2020-07-20T09:23:44Z

src/analyses/local_control_flow_history.cpp

+  return false;
+}
+
+// Instantiate the versions we need


Being all possible versions :)

That too! (although now I think about it maybe the version that doesn't track either isn't actually used... )

smowton · 2020-07-20T09:25:07Z

src/goto-analyzer/goto_analyzer_parse_options.cpp

    }
+    else if(cmdline.isset("loop-unwind"))
+    {
+      options.set_option("local-control-flow-history", true);


Considering there's some trickiness here, perhaps print a line describing the options you've inferred from the higher-level options provided

The options that are set are internal only and are intended to make the handling of these options simpler. I am reluctant to print anything here because the user can't set any of these options directly.

smowton · 2020-07-20T09:25:31Z

src/goto-analyzer/goto_analyzer_parse_options.cpp

+        "local-control-flow-history-limit", cmdline.get_value("loop-unwind"));
+      options.set_option("history set", true);
+    }
+    else if(cmdline.isset("branching"))


The else-ifs here imply --loop-unwind --branching won't work as expected, right? If so check for this case.

That is correct and in part why the combined option is known as --loop-unwind-and-branching. I am reluctant to address this issue in this PR because it affects almost all options to goto-analyze`, all of the major options have an implicit ordering and check for absence but not duplication.

hannes-steffenhagen-diffblue · 2020-07-31T08:39:41Z

src/analyses/ai.cpp

  trace_ptrt p,
  const irep_idt &to_function_id,
  locationt to_l,
+  trace_ptrt caller_history,


I do have a pretty strong preference for not using nullptr to indicate optional arguments – this isn't obvious from the signature, and frequently makes the logic inside more complicated if you need to disambiguate between pointers you're allowed to dereference and ones you aren't...

Also, are these trace_ptrts all shared_ptr? In that case it'd seem a bit wasteful to me, unless these are stored somewhere (I can't quite see it, maybe somewhere further down in the call stack)?

I have renamed as you suggested earlier. I hope the invariants in ai_history.h show when the optional argument is present. step kind of has to be a case switch because of the different types of instruction.

Yes, they are all shared_ptrs. They are stored in something that implements the ai_storaget interface, allowing you to store states and histories with varying degrees of sensitivity (per history, per location, soon per function and just once). It's not great but I had a solution with unique_ptr but it involved way too many copies.

martin-cs · 2020-08-03T16:11:12Z

@smowton and @hannes-steffenhagen-diffblue I have responded to and (mostly) implemented your review comments. They are two separate commits at the moment for simplicity. If you are happy I will squash them into the relevant earlier commits.

martin-cs · 2020-08-10T10:06:50Z

@smowton and @hannes-steffenhagen-diffblue sorry to hassle but please can you have a look at the changes I have made and see if they address the issues you raised?

smowton · 2020-08-11T19:37:02Z

I stand by my green tick :) (meaning I'm happy with this as-is but have suggestions)

smowton · 2020-08-11T19:37:55Z

Apologies for slowness btw, was on holiday the past week

martin-cs · 2020-08-12T09:44:43Z

@smowton Thanks! Hope you had a nice holiday and sorry for hassling.

@hannes-steffenhagen-diffblue @danpoe @peterschrammel @chrisr-diffblue Any chance you could help with a second green tick? This is kind of blocking for me and the next PR is something that might make your lives easier...

chrisr-diffblue

A couple of extremely minor nitpicks, but nothing blocking - apologies for the massive delay in me getting around to reviewing....

src/analyses/call_stack_history.cpp

src/analyses/local_control_flow_history.cpp

This is similar to a three-way merge in that it allows histories to not only use their state and histories in the next location but also to access the history at the call site. Doing so makes it possible to create "function local" history where all call-sites merge and drop their previous history but it is picked back up on function return. This is needed for managing scalability in some of the more involved histories.

Track every forwards and / or backwards jump within a function. This can give the same effect as loop unrolling without modifying the program. It can also give lazy merging of paths through if statements. For performance reasons it is not interprocedural.

These allow the user to track just forwards, just backwards or both kinds of jump. This gives per-path analysis (up to the limit of number of times visiting each location), loop unwinding or both.

martin-cs · 2020-08-12T16:25:45Z

I have addressed @chrisr-diffblue comments ( thanks for the review and no worries about the lag ) and squashed the changes in. I will merge once CI has passed, @hannes-steffenhagen-diffblue shout now if you aren't happy with how your comments have been resolved.

martin-cs requested review from chrisr-diffblue, danpoe, hannes-steffenhagen-diffblue, peterschrammel, smowton and thk123 as code owners July 1, 2020 21:52

martin-cs force-pushed the feature/ai-local-history branch from 162cd4f to ff6e64c Compare July 2, 2020 00:58

martin-cs force-pushed the feature/ai-local-history branch from ff6e64c to 9498bfd Compare July 3, 2020 20:08

martin-cs force-pushed the feature/ai-local-history branch from 9498bfd to 3407f45 Compare July 16, 2020 09:59

smowton approved these changes Jul 20, 2020

View reviewed changes

hannes-steffenhagen-diffblue reviewed Jul 31, 2020

View reviewed changes

martin-cs force-pushed the feature/ai-local-history branch from 3407f45 to afd7980 Compare August 3, 2020 16:09

martin-cs force-pushed the feature/ai-local-history branch from afd7980 to 1f36c23 Compare August 4, 2020 10:04

chrisr-diffblue approved these changes Aug 12, 2020

View reviewed changes

src/analyses/call_stack_history.cpp Outdated Show resolved Hide resolved

src/analyses/local_control_flow_history.cpp Outdated Show resolved Hide resolved

martin added 4 commits August 12, 2020 17:19

Add three new history options that use local_control_flow_historyt

ee6c32e

These allow the user to track just forwards, just backwards or both kinds of jump. This gives per-path analysis (up to the limit of number of times visiting each location), loop unwinding or both.

Tests for the new functionality in goto-analyzer

e30da66

martin-cs force-pushed the feature/ai-local-history branch from 1f36c23 to e30da66 Compare August 12, 2020 16:19

martin-cs merged commit 2bd790d into diffblue:develop Aug 13, 2020

Add an abstract interpreter 'history' that gives (local) per-path analysis and / or loop unwinding #5404

Add an abstract interpreter 'history' that gives (local) per-path analysis and / or loop unwinding #5404

Uh oh!

Conversation

martin-cs commented Jul 1, 2020

Uh oh!

codecov bot commented Jul 2, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

martin-cs commented Jul 7, 2020

Uh oh!

chrisr-diffblue commented Jul 7, 2020

Uh oh!

martin-cs commented Jul 7, 2020

Uh oh!

martin-cs commented Jul 7, 2020

Uh oh!

chrisr-diffblue commented Jul 7, 2020

Uh oh!

smowton commented Jul 7, 2020

Uh oh!

rjmunro commented Jul 7, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

thk123 commented Jul 9, 2020

Uh oh!

rjmunro commented Jul 13, 2020

Uh oh!

martin-cs commented Jul 13, 2020

Uh oh!

thk123 commented Jul 16, 2020

Uh oh!

martin-cs commented Jul 16, 2020

Uh oh!

smowton left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

martin-cs Aug 3, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

codecov bot commented Jul 2, 2020 •

edited

Loading

rjmunro commented Jul 7, 2020 •

edited

Loading

martin-cs Aug 3, 2020 •

edited

Loading

smowton commented Aug 11, 2020 •

edited

Loading