Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improvements to the shape abstract domain and to memory safety analysis #120

Merged
merged 70 commits into from
Aug 13, 2018

Conversation

viktormalik
Copy link
Collaborator

Multiple improvements and bug fixes mainly related to dynamic memory analysis. Changes are mostly related to the paper submitted to FMCAD'18.
Main contributions are:

  1. Analysis of pointer and free safety for abstract dynamic objects.
  2. Splitting abstract dynamic objects into multiple when it is needed for a sound analysis.
  3. Added examples that need combined reasoning about shape and value properties.

This is useful when analysing pointer and  memory safety, respectively.
If there is and incompatibility between the type of the template row expression
and the address of pointed object, use typecast.
This is currently used when dealing with CPROVER-specific pointer variables,
which are of (void *) type. Hence, we allow incompatible type returned
by the solver for these variables only.
When adding paths from other row, remove the other row as a destination if it
has been added to another path.
When clearing pointed rows, do not clear itself and remove poining row from
the pointed_by set.
This information is used to prevent creation of PHI nodes for dynamic objects
for 'if' branches in case where the object allocation occurs in one incoming
branch only.
… condition

This prevents from getting non-deterministic values of dynamic object fields
incoming from other branches (where the dynamic object was not allocated) during
analysis.
…ignment

This is needed for correct RHS concretisation.
This test demonstrates the power of the combination of pointer and interval
domains.
…object

The concrete object has "$co" suffix and there is always only one allocated.
It is created non-deterministically, but with the condition that no other
pointer in the loop points to this object (i.e. it has not been allocated
before).
…n template

These guards are used in the post-condition of template rows of the heap domain
that contain members of dynamic objects in the same loop where the object was
allocated. This is required to make sure that the object is analysed only when
it is created.
Modify assertions for dereferencing deallocated objects to perform checks on
concrete objects only. Since a concrete object allocated within a loop has a
non-deterministic position within the abstract object, this approach is sound.
…ression

A pair is stored as an "and" expression. This is used to track dependency
between a pair of pointer variables.
… pointers

This is useful to prove that a deallocated object is never dereferenced. Since
the assertions checks for this property are done on conrete objects only, we
need to prove that once a concrete object was deallocated, it cannot be reached
in the next iteration.
It shows that tracking the points-to relation is enough since we use abstract
objects and a path is expressed by field of an abstract object pointing to the
same object.
Tests check pointer safety for linked lists.
This version of the test did not previously work because of a bug related to
symbolic dereference objects occuring on both sides of an assignment.
…ides

The bug was related to finding the symbol at LHS (since there may be multiple
chained members and we need to get to the root object).
…s reachable

Loop guards collected in the SSA form are pairs (loop-select and loop head
guards). When checking the solver for symbolic path, both guards must be true
to set the appropriate loop-select guard in the sympath.
GOTO instructions having "false" as guard are removed since some of them were
backwards jumps treated as loops.
Previously, we set free::record to true, which in case of multiple frees in
a loop caused just the last free to be recorded into the invariant.
Now, we let free::record to be undefined, but instead update template rows for
__CPROVER_deallocated only if one of free::record in the loop is true.
Currently, this is a part of heap-interval domain. This should be refactored to
heap-polyhedra domain in future.
Disabling 'packet_filter' since it takes more than 500 seconds.
Moving failing version of running example from 'heap' to 'heap-data'.
Now, the format is:
malloc_value = cond1 ? (void *)&dyn_obj1 : (cond2 ? (void *)&dyn_obj2 ....)
Also use read_rhs when collecting allocation guards to obtain correct versions
of variables occuring in guards (this is needed when dealing with #co objects
since their guards contain other pointers).
Multiple tests end inconclusive due to dynamic object splitting.
For more detail see 6f8efca.
This strenghtens the condition for not creating a PHI node for a dynamic object
field when merging branches of a conditional in case the object was allocated
in one branch only.
We cannot iterate container while changing it.
We cannot iterate container while changing it.
The renaming commit probably forgot to delete the fmcad18 directory.
@@ -1,4 +1,4 @@
CORE
KNOWNBUG
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do these fail now? Please create an issue for them so that the bug can be fixed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These fail because we split abstract dynamic objects into multiple and such abstraction is not strong enough to reason about these programs. Should I create an issue for a stronger shape domain?

}

}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spurious newline (several occurrences)

@@ -1745,6 +1804,8 @@ void twols_parse_optionst::help()
" --zones use zone domain\n"
" --octagons use octagon domain\n"
" --heap-interval use heap domain with interval domain for values\n" // NOLINT(*)
" --heap-zones use heap domain with zones domain for values\n" // NOLINT(*)
" --heap-zones use heap domain with dynamically incrementing stregth of value domain\n" // NOLINT(*)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--heap-values-incremental?
stregth -> strength

@@ -37,7 +38,8 @@ class optionst;
"(version)" \
"(i386-linux)(i386-macos)(i386-win32)(win32)(winx64)(gcc)" \
"(ppc-macos)(unsigned-char)" \
"(havoc)(intervals)(zones)(octagons)(equalities)(heap)(heap-interval)"\
"(havoc)(intervals)(zones)(octagons)(equalities)"\
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put each option on a separate line to simplify merging in future

bool strategy_solver_heapt::is_cprover_symbol(const exprt &expr)
{
return expr.id()==ID_symbol &&
id2string(to_symbol_expr(expr).get_identifier()).find("__CPROVER_")==0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use has_prefix() and CPROVER_PREFIX


// Do not include CPROVER symbols
if(lhs.id()==ID_symbol &&
id2string(to_symbol_expr(lhs).get_identifier()).find("__CPROVER")!=
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see above

@@ -0,0 +1,318 @@
/*******************************************************************\

Module: Analysis of a number of instances of abstract dynamic objects.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the number ?

@@ -0,0 +1,192 @@
/*******************************************************************\

Module: Analysis of a number of instances of abstract dynamic objects.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see above

@@ -1,5 +1,6 @@
DIRS = heap nontermination termination kiki \
preconditions interprocedural invariants
DIRS = nontermination termination kiki \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Put every directory on a separate line

@@ -0,0 +1,6 @@
CORE
main.c
--heap-values-incremental --sympath --inline
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure -incremental is a good name, maybe better -refine? (or add a --refine-domains option -- see below)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the name to -refine. I suggest we leave implementation of a more generic solution with --refine-domains switch to Matej as a part of his work on generic domain combinations. This is closely related to it and I think it should be designed all together.

}

if(cmdline.isset("horn-encoding"))
while(!finished)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This must not be implemented here, but inside summary_checker_ait. I'd generalise this right away by adding a --refine-domains option.

if(options.get_bool_option("pointer-check"))
{
allow_record_malloc(goto_model);
// allow_record_free(goto_model);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this commented out? Use #if 0 plus a comment to justify the deactivation or remove this line.

@@ -1243,6 +1286,21 @@ bool twols_parse_optionst::process_goto_program(
inline_main(goto_model);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Weird indent in the lines above

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any bad indentation

bool found=false;
for(const auto &i : must_alias_relations)
{
unsigned long n;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

size_t? (further instances below)

\*******************************************************************/
class must_alias_setst:public union_find<exprt>
{
protected:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer the public section to come first.

friend class dynobj_instance_domaint;
};


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

superfluous blank line


void set_var_always_to_true(
goto_modelt &goto_model,
bool (*name_cond)(std::string &))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std::function preferred

@@ -10,6 +10,7 @@ Author: Daniel Kroening, kroening@kroening.com
#define CPROVER_2LS_SSA_SSA_VALUE_SET_H

#include <analyses/ai.h>
#include <iostream>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems unused, please remove

… domains.

Rename --heap-values-incremental to --heap-values-refine.
Move refinement into summary_checker_ai.
1. Move long and utility functions into dedicated modules
2. Add braces for multi-line statements and multi-line conditionals
3. Remove commented code
4. Use better types (size_t instead of unsigned)
5. Replace function pointer by std::function
Copy link
Member

@peterschrammel peterschrammel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a few nitpicks

" --heap-interval use heap domain with interval domain for values\n" // NOLINT(*)
" --sympath compute invariant for each symbolic path (only usable with --heap-interval switch)" // NOLINT(*)
" --heap-interval use heap domain with interval domain for"
"values\n"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd align values with use in the output

@@ -1744,8 +1773,14 @@ void twols_parse_optionst::help()
" --heap use heap domain\n"
" --zones use zone domain\n"
" --octagons use octagon domain\n"
" --heap-interval use heap domain with interval domain for values\n" // NOLINT(*)
" --sympath compute invariant for each symbolic path (only usable with --heap-interval switch)" // NOLINT(*)
" --heap-interval use heap domain with interval domain for"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

\n at the end of the line

" --sympath compute invariant for each symbolic path (only usable with --heap-interval switch)" // NOLINT(*)
" --heap-interval use heap domain with interval domain for"
"values\n"
" --heap-zones use heap domain with zones domain for values"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

\n at the end of the line

"values\n"
" --heap-zones use heap domain with zones domain for values"
"\n"
" --heap-values-refine use heap domain with a dynamically"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

\n at the end of the line

" --heap-zones use heap domain with zones domain for values"
"\n"
" --heap-values-refine use heap domain with a dynamically"
"refinement of the strength of the value domain\n"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

align output

"\n"
" --heap-values-refine use heap domain with a dynamically"
"refinement of the strength of the value domain\n"
" --sympath compute invariant for each symbolic path"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

\n at the end of the line

" --heap-values-refine use heap domain with a dynamically"
"refinement of the strength of the value domain\n"
" --sympath compute invariant for each symbolic path"
"(only usable with --heap-interval switch)"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

align output

@peterschrammel peterschrammel merged commit 6aa1f00 into diffblue:master Aug 13, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants