Add symtab/Symtab.h
hainest committed Apr 3, 2024
1 parent ba36b11 commit f86e777
Showing 8 changed files with 758 additions and 683 deletions.
1 change: 1 addition & 0 deletions docs/symtabAPI/developer/API.rst
@@ -31,6 +31,7 @@ SymtabAPI
Statement.h
StringTable.h
Symbol.h
Symtab.h
SymtabReader.h
Type-mem.h
Variable.h
294 changes: 294 additions & 0 deletions docs/symtabAPI/developer/Symtab.h.rst
@@ -0,0 +1,294 @@
.. _`sec-dev:Symtab.h`:

Symtab.h
########

.. cpp:namespace:: Dyninst::SymtabAPI::dev

.. cpp:class:: Symtab : public LookupInterface, public AnnotatableSparse

.. cpp:member:: private std::string member_name_

This will be either the name from the MappedFile _or_ the name of the ".a" file when
the Symtab is created during static re-writing (see :cpp:func:`Archive::parseMember`).

.. cpp:member:: const std::unique_ptr<symtab_impl> impl

Hides implementation details that are complex or that add large dependencies.

.. cpp:function:: static Symtab *findOpenSymtab(std::string filename)

Finds a previously opened ``Symtab`` whose file name is ``filename``.

.. cpp:function:: static bool closeSymtab(Symtab *s)

Destroys ``s`` and removes it from the cache of symtabs.

.. cpp:function:: bool hasStackwalkDebugInfo()

.. cpp:function:: bool getRegValueAtFrame(Address pc, Dyninst::MachRegister reg, Dyninst::MachRegisterVal &reg_result, \
MemRegReader *reader)

.. cpp:function:: bool addRegion(Offset vaddr, void *data, unsigned int dataSize, std::string name, Region::RegionType rType_, bool loadable = false, unsigned long memAlign = sizeof(unsigned), bool tls = false)

Creates a new region using the specified parameters and adds it to the file.

.. cpp:function:: bool addRegion(Region *newreg)

Adds the provided region to the file.

.. cpp:function:: bool getAllNewRegions(std::vector<Region *>&ret)

This method finds all regions that have been added to the object file.
Returns ``true``, with ``ret`` containing those regions, if at least one
region has been added; otherwise returns ``false``.

.. cpp:function:: void fixup_code_and_data(Offset newImageOffset, Offset newImageLength, Offset newDataOffset, Offset newDataLength)
.. cpp:function:: bool fixup_RegionAddr(const char* name, Offset memOffset, long memSize)
.. cpp:function:: bool updateRegion(const char* name, void *buffer, unsigned size)
.. cpp:function:: bool updateCode(void *buffer, unsigned size)
.. cpp:function:: bool updateData(void *buffer, unsigned size)
.. cpp:function:: bool updateFuncBindingTable(Offset stub_addr, Offset plt_addr)

.. cpp:function:: bool addSymbol(Symbol *newsym)

This method adds a new symbol ``newsym`` to all of the internal data
structures. The primary name of the ``newsym`` must be a mangled name.
Returns ``true`` on success and ``false`` on failure. A new copy of
``newsym`` is not made. ``newsym`` must not be deallocated after adding
it to symtabAPI. We suggest using ``createFunction`` or
``createVariable`` when possible.

.. cpp:function:: bool addSymbol(Symbol *newSym, Symbol *referringSymbol)

This method adds a new dynamic symbol ``newSym``, which refers to
``referringSymbol``, to all of the internal data structures. ``newSym``
must represent a dynamic symbol. The primary name of ``newSym`` must be
a mangled name. All the required version names are allocated
automatically. If ``referringSymbol`` belongs to a shared
library that is not currently a dependency, the shared library is
implicitly added to the list of dependencies. Returns ``true`` on success and
``false`` on failure. A new copy of ``newSym`` is not made. ``newSym``
must not be deallocated after adding it to symtabAPI.

.. cpp:function:: Function *createFunction(std::string name, Offset offset, size_t size, Module *mod = NULL)

This method creates a ``Function`` and updates all necessary data
structures (including creating Symbols, if necessary). The function has
the provided mangled name, offset, and size, and is added to the Module
``mod``. Symbols representing the function are added to the static and
dynamic symbol tables. Returns the pointer to the new ``Function`` on
success or ``NULL`` on failure.

.. cpp:function:: Variable *createVariable(std::string name, Offset offset, size_t size, Module *mod = NULL)

This method creates a ``Variable`` and updates all necessary data
structures (including creating Symbols, if necessary). The variable has
the provided mangled name, offset, and size, and is added to the Module
``mod``. Symbols representing the variable are added to the static and
dynamic symbol tables. Returns the pointer to the new ``Variable`` on
success or ``NULL`` on failure.

.. cpp:function:: bool deleteFunction(Function *func)

This method deletes the ``Function`` ``func`` from all of symtab’s data
structures. It will not be available for further queries. Returns
``true`` on success and ``false`` if ``func`` is not owned by the
``Symtab``.

.. cpp:function:: bool deleteVariable(Variable *var)

This method deletes the variable ``var`` from all of symtab’s data
structures. It will not be available for further queries. Returns
``true`` on success and ``false`` if ``var`` is not owned by the
``Symtab``.

.. cpp:function:: void setTruncateLinePaths(bool value)
.. cpp:function:: bool getTruncateLinePaths()
.. cpp:function:: std::string getDefaultNamespacePrefix() const

.. cpp:function:: Module* findModuleByOffset(Offset offset) const

Returns the module at the offset ``offset`` in the debug section (e.g., .debug_info).

.. cpp:function:: Module *getDefaultModule() const


.. cpp:function:: bool addType(Type *typ)

Adds a new type ``typ`` to symtabAPI. Returns ``true`` on success.

.. cpp:function:: static boost::shared_ptr<builtInTypeCollection>& builtInTypes()
.. cpp:function:: static boost::shared_ptr<typeCollection>& stdTypes()

.. cpp:function:: static void getAllstdTypes(std::vector<boost::shared_ptr<Type>>&)
.. cpp:function:: static std::vector<Type*>* getAllstdTypes()

Returns all the standard types that normally occur in a program.


.. cpp:function:: static void getAllbuiltInTypes(std::vector<boost::shared_ptr<Type>>&)
.. cpp:function:: static std::vector<Type*>* getAllbuiltInTypes()

Returns all the built-in types defined in the binary.

.. cpp:function:: virtual boost::shared_ptr<Type> findType(unsigned type_id, Type::do_share_t)

The same as :cpp:func:`Type* findType(unsigned i)`.

.. cpp:function:: Type* findType(unsigned i)

Returns the type at index ``i``.

Returns ``NULL`` if no type was found.

.. cpp:function:: bool addLine(string lineSource, unsigned int lineNo, unsigned int lineOffset, Offset lowInclusiveAddr, Offset highExclusiveAddr)

This method adds a new line to the line map. ``lineSource`` represents
the source file name. ``lineNo`` represents the line number. Returns
``true`` on success and ``false`` on error.

.. cpp:function:: bool addAddressRange(Offset lowInclusiveAddr, Offset highExclusiveAddr, string lineSource, unsigned int lineNo, unsigned int lineOffset = 0)

This method adds an address range
``[lowInclusiveAddr, highExclusiveAddr)`` for the line with line number
``lineNo`` in source file ``lineSource`` at offset ``lineOffset``.
Returns ``true`` on success and ``false`` on error.
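
The half-open convention means that ``highExclusiveAddr`` itself is not
covered by the range, so adjacent ranges can share a boundary without
overlapping. The following is a minimal, self-contained sketch of such a
lookup. ``LineRange`` and ``findLine`` are hypothetical names used only for
illustration and are not part of SymtabAPI.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for one line-map entry; not a SymtabAPI type.
struct LineRange {
    std::uint64_t lowInclusiveAddr;
    std::uint64_t highExclusiveAddr;
    std::string lineSource;
    unsigned lineNo;
};

// Looks up the line covering addr, treating every range as [low, high).
// Returns true and sets lineNo on a hit; returns false otherwise.
bool findLine(const std::vector<LineRange>& ranges, std::uint64_t addr,
              unsigned& lineNo) {
    for (const LineRange& r : ranges) {
        assert(r.lowInclusiveAddr <= r.highExclusiveAddr);
        if (addr >= r.lowInclusiveAddr && addr < r.highExclusiveAddr) {
            lineNo = r.lineNo;
            return true;
        }
    }
    return false;
}
```

With ranges ``[0x1000, 0x1010)`` and ``[0x1010, 0x1020)``, address ``0x1010``
belongs only to the second range, and ``0x1020`` belongs to neither.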

.. cpp:function:: bool emitSymbols(Object *linkedFile, std::string filename, unsigned flag = 0)

.. cpp:function:: bool emit(std::string filename, unsigned flag = 0)

Creates a new file using the specified name that contains all changes made by the user.

.. cpp:function:: void addDynLibSubstitution(std::string oldName, std::string newName)
.. cpp:function:: std::string getDynLibSubstitution(std::string name)

.. cpp:function:: Offset getFreeOffset(unsigned size)

Find a contiguous region of unused space within the file (which may be
at the end of the file) of the specified size and return an offset to
the start of the region. Useful for allocating new regions.
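
One way to picture this search is a first-fit scan over the occupied byte
ranges, falling back to the end of the file when no interior gap is large
enough. The sketch below only illustrates that idea under stated
assumptions. It is not Dyninst's implementation, and ``Extent`` and
``findFreeOffset`` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical description of an occupied byte range [offset, offset + size).
struct Extent {
    std::uint64_t offset;
    std::uint64_t size;
};

// First-fit search for `size` unused bytes in a file of `fileSize` bytes.
// If no gap is large enough, the result is the end of the file.
std::uint64_t findFreeOffset(std::vector<Extent> used, std::uint64_t fileSize,
                             std::uint64_t size) {
    std::sort(used.begin(), used.end(),
              [](const Extent& a, const Extent& b) { return a.offset < b.offset; });
    std::uint64_t cursor = 0;  // first byte not yet known to be occupied
    for (const Extent& e : used) {
        if (e.offset > cursor && e.offset - cursor >= size) {
            return cursor;  // the gap before this extent is large enough
        }
        cursor = std::max(cursor, e.offset + e.size);
    }
    // No interior gap fits: use trailing space if it suffices, else append.
    if (fileSize > cursor && fileSize - cursor >= size) {
        return cursor;
    }
    return std::max(cursor, fileSize);
}
```

Sorting by offset first keeps the scan correct even when the occupied ranges
are supplied out of order or overlap one another.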

.. cpp:function:: bool addLibraryPrereq(std::string libname)

Add a library dependence to the file such that when the file is loaded,
the library will be loaded as well. Cannot be used for static binaries.

.. cpp:function:: bool addSysVDynamic(long name, long value)
.. cpp:function:: bool addLinkingResource(Archive *library)
.. cpp:function:: bool getLinkingResources(std::vector<Archive *> &libs)
.. cpp:function:: bool addExternalSymbolReference(Symbol *externalSym, Region *localRegion, relocationEntry localRel)
.. cpp:function:: bool addTrapHeader_win(Address ptr)
.. cpp:function:: bool updateRelocations(Address start, Address end, Symbol *oldsym, Symbol *newsym)
.. cpp:function:: bool removeLibraryDependency(std::string lib)
.. cpp:function:: void rebase(Offset offset)
.. cpp:function:: Object *getObject()
.. cpp:function:: const Object *getObject() const
.. cpp:function:: void dumpModRanges()
.. cpp:function:: void dumpFuncRanges()
.. cpp:function:: Module *getOrCreateModule(const std::string &modName, const Offset modAddr)
.. cpp:function:: Offset getElfDynamicOffset()
.. cpp:function:: bool delSymbol(Symbol *sym)
.. cpp:function:: void getSegmentsSymReader(std::vector<SymSegment> &segs)
.. cpp:function:: bool deleteSymbol(Symbol *sym)

This method deletes the symbol ``sym`` from all of symtab’s data
structures. It will not be available for further queries. Returns
``true`` on success and ``false`` if ``sym`` is not owned by the
``Symtab``.

.. cpp:function:: static boost::shared_ptr<Type>& type_Error()
.. cpp:function:: static boost::shared_ptr<Type>& type_Untyped()
.. cpp:function:: bool getFuncBindingTable(std::vector<relocationEntry> &fbt) const
.. cpp:function:: bool findPltEntryByTarget(Address target_address, relocationEntry &result) const
.. cpp:function:: Offset getTOCoffset(Function *func = NULL) const
.. cpp:function:: Offset getTOCoffset(Offset off) const
.. cpp:function:: Offset fileToDiskOffset(Dyninst::Offset) const
.. cpp:function:: Offset fileToMemOffset(Dyninst::Offset) const
.. cpp:function:: bool canBeShared()



Notes
=====

An ELF object that can be loaded into memory to form an executable’s
image has one of two types: ET_EXEC or ET_DYN. ET_EXEC objects are
executables that are loaded at a fixed address determined at link time.
ET_DYN objects historically were shared libraries, which are loaded
at an arbitrary location in memory and contain position-independent code
(PIC). The ET_DYN object type was later reused for position-independent
executables (PIE), which can likewise be loaded at an
arbitrary location in memory. Although it is uncommon, an object
can be both a PIE and a shared library. Examples
include libc.so and the dynamic linker (ld.so). These objects
are generally used as shared libraries, so ``isExec()`` classifies
them based on their typical usage. The methods below use heuristics to
classify ET_DYN objects based on the properties of the
ELF object, and they classify most objects correctly. Due to the
inherent ambiguity of the ET_DYN type, the heuristics may fail to
classify some libraries that are also executables as executables. This
can happen when an object is both a shared library and an executable and its
entry point happens to be at the start of the .text section.

``isExecutable()`` is equivalent to elfutils’ ``elfclassify --program``
test with the refinement of the soname value and entry point tests.
Pseudocode for the algorithm is shown below:

- **if** (**not** loadable()) **return** *false*

- **if** (object type is ET_EXEC) **return** *true*

- **if** (has an interpreter (PT_INTERP segment exists)) **return**
*true*

- **if** (PIE flag is set in FLAGS_1 of the PT_DYNAMIC segment)
**return** *true*

- **if** (DT_DEBUG tag exists in PT_DYNAMIC segment) **return** *true*

- **if** (has a soname and its value is “linux-gate.so.1”) **return**
*false*

- **if** (entry point is in range .text section offset plus 1 to the
end of the .text section) **return** *true*

- **if** (has a soname and its value starts with “ld-linux”) **return**
*true*

- **otherwise return** *false*
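
The decision list above can be sketched as ordinary C++ over a plain summary
of the object. Everything here is an assumption made for illustration:
``ExecFacts`` and ``looksExecutable`` are hypothetical names, a real caller
would fill the fields from the ELF header, program headers, and dynamic
section, and the upper bound of the entry-point test is treated as exclusive.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical summary of the properties the pseudocode inspects.
struct ExecFacts {
    bool loadable = false;       // outcome of the loadable() test
    bool isEtExec = false;       // object type is ET_EXEC
    bool hasInterp = false;      // a PT_INTERP segment exists
    bool pieFlag = false;        // PIE flag set in FLAGS_1
    bool hasDtDebug = false;     // DT_DEBUG tag present
    std::string soname;          // empty when the object has no soname
    std::uint64_t entry = 0;     // entry point
    std::uint64_t textStart = 0; // start of the .text section
    std::uint64_t textSize = 0;  // size of the .text section
};

// Mirrors the ordered tests of the pseudocode; the first match decides.
bool looksExecutable(const ExecFacts& f) {
    if (!f.loadable) return false;
    if (f.isEtExec) return true;
    if (f.hasInterp) return true;
    if (f.pieFlag) return true;
    if (f.hasDtDebug) return true;
    if (f.soname == "linux-gate.so.1") return false;
    // Entry strictly after the start of .text (assumed exclusive upper bound).
    if (f.entry > f.textStart && f.entry < f.textStart + f.textSize) return true;
    if (f.soname.compare(0, 8, "ld-linux") == 0) return true;
    return false;
}
```

The order matters: the ``linux-gate.so.1`` rejection must come before the
entry-point test, otherwise that object could be accepted by the later check.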

``isSharedLibrary()`` is equivalent to elfutils’
``elfclassify --library``. Pseudocode for the algorithm is shown below:

- **if** (**not** loadable()) **return** *false*

- **if** (object type is ET_EXEC) **return** *false*

- **if** (there is no PT_DYNAMIC segment) **return** *false*

- **if** (PIE flag is set in FLAGS_1 of the PT_DYNAMIC segment)
**return** *false*

- **if** (DT_DEBUG tag exists in PT_DYNAMIC segment) **return** *false*

- **otherwise return** *true*
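
The same tests can be written as a short C++ function. This is a sketch
under assumptions, not the Dyninst source: ``LibFacts`` and
``looksSharedLibrary`` are hypothetical names, and the fields stand in for
values a real caller would read from the ELF file.

```cpp
#include <cassert>

// Hypothetical summary of the properties the pseudocode inspects.
struct LibFacts {
    bool loadable = false;    // outcome of the loadable() test
    bool isEtExec = false;    // object type is ET_EXEC
    bool hasDynamic = false;  // a PT_DYNAMIC segment exists
    bool pieFlag = false;     // PIE flag set in FLAGS_1
    bool hasDtDebug = false;  // DT_DEBUG tag present
};

// Every test is a rejection; only an object that passes all of them
// is treated as a shared library.
bool looksSharedLibrary(const LibFacts& f) {
    if (!f.loadable) return false;
    if (f.isEtExec) return false;
    if (!f.hasDynamic) return false;
    if (f.pieFlag) return false;
    if (f.hasDtDebug) return false;
    return true;
}
```

Unlike the executable test, nothing here looks at the soname or the entry
point, which is why an object such as the dynamic linker can satisfy both
classifications.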

ELF files can also store data that is neither an executable nor a shared
library, including object files, core files, and debug symbol files. To
distinguish these cases, the ``loadable()`` function is defined by the
pseudocode shown below and returns true if the file can be loaded into a
process’s address space:

- **if** (object type is neither ET_EXEC nor ET_DYN) **return** *false*

- **if** (there are no program segments of type PT_LOAD)
**return** *false*

- **if** (contains no sections) **return** *true*

- **if** (contains a section with the SHF_ALLOC flag set and a section
type of neither SHT_NOTE nor SHT_NOBITS) **return** *true*

- **otherwise return** *false*
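
A minimal sketch of this test follows. The types ``SectionKind``,
``SectionSummary``, and ``ObjectSummary`` and the function ``looksLoadable``
are hypothetical and exist only for this example; a real implementation
would read the same facts from the ELF header, program headers, and section
headers.

```cpp
#include <cassert>
#include <vector>

// Hypothetical section classification; only the two excluded kinds matter.
enum class SectionKind { Note, NoBits, Other };

struct SectionSummary {
    bool alloc;        // SHF_ALLOC flag is set
    SectionKind kind;  // SHT_NOTE, SHT_NOBITS, or anything else
};

struct ObjectSummary {
    bool isEtExec = false;        // object type is ET_EXEC
    bool isEtDyn = false;         // object type is ET_DYN
    bool hasLoadSegment = false;  // at least one PT_LOAD segment
    std::vector<SectionSummary> sections;
};

// Follows the pseudocode step by step.
bool looksLoadable(const ObjectSummary& o) {
    if (!o.isEtExec && !o.isEtDyn) return false;
    if (!o.hasLoadSegment) return false;
    if (o.sections.empty()) return true;  // no section headers to inspect
    for (const SectionSummary& s : o.sections) {
        if (s.alloc && s.kind != SectionKind::Note && s.kind != SectionKind::NoBits) {
            return true;  // some allocated section carries real contents
        }
    }
    return false;
}
```

Under this sketch, a file whose allocated sections are all ``SHT_NOBITS`` or
``SHT_NOTE`` is rejected, which appears to be how the test separates debug
symbol files from loadable objects.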
9 changes: 0 additions & 9 deletions docs/symtabAPI/developer/notes.rst

This file was deleted.

45 changes: 45 additions & 0 deletions docs/symtabAPI/developer/relocationEntry.h.rst
@@ -57,3 +57,48 @@ relocationEntry.h
.. cpp:enumerator:: relative
.. cpp:enumerator:: jump_slot
.. cpp:enumerator:: absolute

relocationEntry
---------------

This class represents object relocation information.

.. code-block:: cpp

    Offset target_addr() const

Specifies the offset that will be overwritten when relocations are
processed.

.. code-block:: cpp

    Offset rel_addr() const

Specifies the offset of the relocation itself.

.. code-block:: cpp

    Offset addend() const

Specifies the value added to the relocation; whether this value is used
or not is specific to the relocation type.

.. code-block:: cpp

    const std::string name() const

Specifies the user-readable name of the relocation.

.. code-block:: cpp

    Symbol *getDynSym() const

Specifies the symbol whose final address will be used in the relocation
calculation. How this address is used is specific to the relocation
type.

.. code-block:: cpp

    unsigned long getRelType() const

Specifies the platform-specific relocation type.
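
To make the roles of the target offset, the addend, and the symbol address
concrete, the sketch below applies one common kind of relocation, an
absolute 64-bit "symbol address plus addend" write, to a raw byte buffer. It
is an illustration under assumptions, not Dyninst code: ``SimpleReloc`` and
``applyAbsolute64`` are hypothetical names, and other relocation types
combine these values differently.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical, simplified view of one relocation; not the Dyninst class.
struct SimpleReloc {
    std::uint64_t targetOffset;  // offset that will be overwritten
    std::int64_t addend;         // constant added to the symbol address
};

// Writes symbolAddr + addend as 8 bytes (host byte order) at targetOffset.
// Returns false without writing if the value would not fit in the buffer.
bool applyAbsolute64(std::vector<unsigned char>& image, const SimpleReloc& rel,
                     std::uint64_t symbolAddr) {
    if (rel.targetOffset > image.size() ||
        image.size() - rel.targetOffset < sizeof(std::uint64_t)) {
        return false;
    }
    const std::uint64_t value = symbolAddr + static_cast<std::uint64_t>(rel.addend);
    std::memcpy(image.data() + rel.targetOffset, &value, sizeof(value));
    return true;
}
```

The bounds check runs before the write so that a malformed offset cannot
modify memory outside the image.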
24 changes: 24 additions & 0 deletions docs/symtabAPI/overview.rst
@@ -280,6 +280,30 @@ useful because libraries can load at different addresses in different
processes. Each AddressLookup instance is associated with, and provides
mapping for, one process.


.. _`sec:symtabapi-defensive`:

Defensive binaries
==================

Code reuse attacks are an increasingly popular technique for circumventing
traditional program protection mechanisms such as W ``xor`` X (e.g., Data Execution
Prevention (DEP)), and the security community has proposed a wide range of
approaches to protect against these attacks. However, many of these approaches
provide ad hoc solutions, relying on observed attack characteristics that are not
intrinsic to the class of attacks. In the continuing arms race against code reuse
attacks, we must construct defenses using a more systematic approach: good
engineering practices must combine with the best security techniques.
Any such approach must be engineered to cover the complete spectrum of
attack surfaces. While more general defensive techniques, such as Control Flow
Integrity or host-based intrusion detection, provide good technical solutions,
each is lacking in one or more features necessary to provide a comprehensive
and adoptable solution. We must develop defenses that can be effectively
applied to real programs.

See `Jacobson et al. 2014 <https://paradyn.org/papers/Jacobson14ROPStop.pdf>`_ for details.

.. _symtabapi-usage:

Usage