Document symbol table and namespace

smowton · smowton · commit 2773e688a32c · 2018-08-21T11:56:22.000+01:00
diff --git a/src/util/README.md b/src/util/README.md
@@ -101,19 +101,90 @@ of a while loop is a statement.
 
 \subsection symbolt_section symbolt, symbol_tablet, and namespacet
 
-To be documented.
-
-\subsubsection symbol_lifetimes_section Symbol lifetimes, symbol modes, name, base-name, pretty-name; semantic of lifetimes for symex?
-
-To be documented.
-
-\subsubsection storing_symbols_section Storing symbols and hiding symbols (namespacet)
-
-To be documented.
-
-\subsubsection ns_follow_section ns.follow
-
-To be documented.
+A symbol table is a mapping from symbol names to \ref symbolt objects, which
+store a symbol's name, attributes, type and perhaps value. They are used to
+describe local and global variables, type definitions and function prototypes
+and definitions.
+
+All symbols store a type (an instance of \typet). For function or method symbols
+these are \ref code_typet instances.
+
+Global variable symbols may have a value (an \ref exprt), in which case it is
+used to initialise the global. Method or function symbols may also have a value,
+in which case it is a \ref codet and gives the function definition. A method or
+function symbol without a value is a prototype (for example, it might be an
+`extern` declaration in C). Local variables' symbol values are always ignored;
+any initialiser must be explicitly assigned after they are instantiated by a
+declaration (\ref code_declt).
+
+Symbol expressions (\ref symbol_exprt) and types (\ref symbol_typet) refer to
+symbols stored in a symbol table. Symbol expressions can be thought of as
+referring to the table for more detail about a symbol (for example, is it a
+local or a global variable, or perhaps a function?), and have a type which must
+match the type given in the symbol table. Symbol types can be thought of as
+shorthands or aliases for a type given in more detail in the symbol table, for
+example permitting a shorthand for a large structure type, as well as permitting
+the construction of expressions referring to that type before its full
+definition is known.
+
+Note the implementation of \ref symbol_tablet is split into a base interface
+(\ref symbol_table_baset) and an implementation (\ref symbol_tablet). There is
+one alternate implementation (\ref journalling_symbol_tablet) which additionally
+maintains a log or journal of symbol creation, modification and deletions.
+
+Namespaces (\ref namespacet) provide a read-only view on one or more symbol
+tables, and provide helper functions that aid accessing the table. A namespace
+may layer exactly two symbol tables, in which case any lookup operation checks
+the 'top' symbol table before looking in the 'bottom' symbol table if nothing
+is found in the top. The namespace wrapper class also provides the
+\ref namespacet::follow operation, which dereferences a `symbol_typet` to
+retrieve the type it refers to, including following a symbol_typet which refers
+to another symbol which eventually refers to a 'real' type.
+
+\subsubsection symbolt_lifetime Symbol lifetimes
+
+Symbols with \ref symbolt::is_static_lifetime set are global variables; they
+are initialised before a program's entry-point is called and live until it ends.
+They may be marked dead using a \ref code_deadt instruction, but this does not
+render the variable inaccessible, it only indicates that the variable's current
+value will not be read without an intervening write.
+
+Symbols with function type (those whose type is a \ref code_typet
+Non-type, non-global symbols (those with \ref symbolt::is_static_lifetime and
+\ref symbolt::is_type not set) are local variables, and their lifespan
+description varies depending on context.
+
+In symbol table function definitions (the values of function-typed symbols),
+local variables are created using a \ref code_declt instruction, and live until
+their enclosing \ref code_blockt ends (similar to the C idiom
+`{ int x; ... } // x lifespan ends`). By contrast in GOTO programs locals are
+declared using a DECL instruction and live until a DEAD instruction referring
+to the same symbol. Explicit DEAD instructions are always used rather than
+killing implicitly by exiting a function.
+
+Multiple instances of the same local may be live at the same time by virtue of
+recursive function calls; a dead instruction or scope end always targets the
+most recently allocated instance.
+
+Scoping rules are particular to source languages and are not enforced by
+CPROVER. For example, in standard C it is not possible to refer to a local
+variable across functions without using a pointer, but in some possible source
+languages this is permitted.
+
+\subsubsection symbolt_details symbol modes, name, base-name, pretty-name
+
+Symbols have:
+* A mode, which indicates the source language frontend responsible for creating
+  them. This is mainly used in pretty-printing the symbol table, to indicate
+  the appropriate language frontend to use rendering the symbol's value and/or
+  type.
+* A base-name and pretty-name, which are a short and user-friendly version of
+  the symbol's definitive name respectively.
+* Several flags (see \ref symbolt for full details), including
+  \ref symbolt::is_static_lifetime (is this a global variable symbol?),
+  \ref symbolt::is_type (is this a type definition),
+  \ref symbolt::is_thread_local (of a variable, are there implicitly instances
+    of this variable per-thread?).
 
 \subsection cmdlinet