33
44# Folder util
55
6- \author Martin Brain, Owen Jones
6+ \author Martin Brain, Owen Jones, Chris Smowton
77
88\section data_structures Data Structures
99
@@ -12,17 +12,57 @@ CPROVER codebase.
1212
1313\subsection irept_section irept: a 4-tuple (data, named-sub, comments, sub)
1414
15- See documentation at \ref irept.
15+ See detailed documentation at \ref irept.
1616
17- As that documentation says, [ irept] (\ref irept)s are generic tree nodes. You
18- should think of them as having a single string ([ data] ( irept::data ) , actually
19- an \ref irep_idt) and lots of child nodes, some of which are numbered
17+ [ irept] (\ref irept)s are generic tree nodes. You
18+ should think of each node as holding a single string ([ data] ( irept::data ) ,
19+ actually an \ref irep_idt) and lots of child nodes, some of which are numbered
2020([ sub] (\ref irept::dt::sub)) and some of which are labelled, and the label
2121can either start with a “\# ” ([ comments] (\ref irept::dt::comments)) or without
2222one ([ named_sub] (\ref irept::dt::named_sub)). The meaning of the “\# ” is that
2323this child should not be considered important, for example it shouldn't be
2424counted when comparing two [ irept] (\ref irept)s for equality.
2525
26+ They are used to represent all manner of structured objects throughout the
27+ CPROVER codebase, such as expressions, types and code. An \ref exprt represents
28+ expressions, including for example \ref equal_exprt, an equality predicate, or
29+ \ref dereference_exprt, which represents the ` * ` unary dereference operator
30+ found in C. A \ref typet represents a type, and may have other \ref typet nodes
31+ as children: for example, \ref array_typet represents an array, and has a single
32+ numbered child that gives the type of the array elements. Finally, \ref codet
33+ represents imperative code, such as \ref code_assignt, which represents an
34+ imperative assignment. It has two numbered operands, its left- and right-hand
35+ sides.
36+
37+ The implementation of \ref irept allows saving memory by sharing instances of
38+ its internal storage using a
39+ [ copy-on-write] ( https://en.wikipedia.org/wiki/Copy-on-write ) scheme. For
40+ example, the statement ` irep1.sub()[0] = irep2; ` will share rather than copy
41+ ` irep2 ` and its children, saving memory unless either irept is subsequently
42+ modified, at which point a copy is made.
43+
44+ \subsection irept_discussion_section Discussion
45+
46+ Many other compiler infrastructures represent a polymorphic tree using nodes
47+ specialised to a particular expression type: for example, perhaps a binary
48+ addition operator could be represented using a tag plus two pointers to child
49+ expressions, rather than allocating a whole irept (including a variable-length
50+ expression vector and empty maps for storing named subexpressions). This may
51+ save memory, but inhibits ad-hoc annotations such as tagging the addition
52+ "does not overflow" without adding that field to addition operations globally
53+ or maintaining a shadow data structure to track that information. This is easier
54+ with a general irept structure that can store an arbitrary number of
55+ arbitrarily-named child nodes.
56+
57+ Another optimisation commonly seen when storing polymorphic trees is to use a
58+ uniform node data structure (like irept) but to keep the node compact, for
59+ example storing at most four pointers and transparently allocating extension
60+ nodes when necessary for an unusually large expression. This provides the best
61+ of both worlds, obtaining compact storage for common cases such as unannotated
62+ binary expressions while permitting deviation at times. The downside is that
63+ implementing such a system is more complex than straightforwardly using C++
64+ standard data structures as in irept.
65+
2666\subsection irep_idt_section Strings: dstringt, the string_container and the ID_ *
2767
2868Within cbmc, strings are represented using \ref irep_idt, or \ref irep_namet
0 commit comments