Add parseAPI/CodeSource.h
hainest committed Apr 3, 2024
1 parent 9697c0e commit 326fe48
Showing 4 changed files with 219 additions and 161 deletions.
1 change: 1 addition & 0 deletions docs/parseAPI/developer/API.rst
Original file line number Diff line number Diff line change
@@ -14,6 +14,7 @@ ParseAPI
CFGFactory.h
CFGModifier.h
CodeObject.h
CodeSource.h
debug_parse.h
dominator.h
IA_aarch64.h
132 changes: 132 additions & 0 deletions docs/parseAPI/developer/CodeSource.h.rst
@@ -0,0 +1,132 @@
.. _`sec-dev:CodeSource.h`:

CodeSource.h
############

.. cpp:namespace:: Dyninst::ParseAPI::dev

.. cpp:class:: CodeSource

Implementers of CodeSource can fill the following structures with available
information. Some of this information is optional.

.. cpp:member:: protected mutable std::map<Address, std::string> _linkage

Named external linkage table (e.g. PLT on ELF). Optional.

.. cpp:member:: protected Address _table_of_contents

Table of Contents for position independent references. Optional.

.. cpp:member:: protected std::vector<CodeRegion*> _regions

Code regions in the binary. At least one region is required for parsing.

.. cpp:member:: protected Dyninst::IBSTree<CodeRegion> _region_tree

Code region lookup. Must be consistent with the _regions vector. Mandatory.

.. cpp:member:: protected dyn_c_vector<Hint> _hints

Hints for where to begin parsing.

Required for the default parsing mode, but usage of one of the direct parsing
modes (parsing particular locations or using speculative methods) is supported
without hints.

.. cpp:member:: protected static dyn_hash_map<std::string, bool> non_returning_funcs

Lists of known non-returning functions by name.

.. cpp:member:: protected static dyn_hash_map<int, bool> non_returning_syscalls_x86

Lists of known non-returning functions by syscall number on x86.

.. cpp:member:: protected static dyn_hash_map<int, bool> non_returning_syscalls_x86_64

Lists of known non-returning functions by syscall number on x86_64.

.. cpp:function:: dyn_c_vector<Hint> const& hints() const
.. cpp:function:: std::vector<CodeRegion*> const& regions() const
.. cpp:function:: int findRegions(Address addr, std::set<CodeRegion*> & ret) const
.. cpp:function:: bool regionsOverlap() const
.. cpp:function:: Address getTOC() const
.. cpp:function:: virtual Address getTOC(Address) const

If the binary file type supplies per-function TOCs (e.g. ppc64 Linux), override this function.

.. cpp:function:: virtual void print_stats() const
.. cpp:function:: virtual bool have_stats() const
.. cpp:function:: virtual void incrementCounter(const std::string& name) const
.. cpp:function:: virtual void addCounter(const std::string& name, int num) const
.. cpp:function:: virtual void decrementCounter(const std::string& name) const
.. cpp:function:: virtual void startTimer(const std::string& name) const
.. cpp:function:: virtual void stopTimer(const std::string& name) const
.. cpp:function:: virtual bool findCatchBlockByTryRange(Address address, std::set<Address>&) const
.. cpp:function:: void addRegion(CodeRegion*)
.. cpp:function:: void removeRegion(CodeRegion*)

.. cpp:class:: SymtabCodeRegion : public CodeRegion

.. cpp:function:: SymtabCodeRegion(SymtabAPI::Symtab*, SymtabAPI::Region*)
.. cpp:function:: SymtabCodeRegion(SymtabAPI::Symtab*, SymtabAPI::Region*, std::vector<SymtabAPI::Symbol*> &symbols)
.. cpp:function:: void names(Address, std::vector<std::string>&)
.. cpp:function:: bool findCatchBlock(Address addr, Address& catchStart)
.. cpp:function:: bool isValidAddress(const Address) const
.. cpp:function:: void* getPtrToInstruction(const Address) const
.. cpp:function:: void* getPtrToData(const Address) const
.. cpp:function:: unsigned int getAddressWidth() const
.. cpp:function:: bool isCode(const Address) const
.. cpp:function:: bool isData(const Address) const
.. cpp:function:: bool isReadOnly(const Address) const
.. cpp:function:: Address offset() const
.. cpp:function:: Address length() const
.. cpp:function:: Architecture getArch() const
.. cpp:function:: Address low() const
.. cpp:function:: Address high() const
.. cpp:function:: SymtabAPI::Region* symRegion() const

.. cpp:class:: SymtabCodeSource : public CodeSource, public boost::lockable_adapter<boost::recursive_mutex>

.. cpp:function:: SymtabCodeSource(SymtabAPI::Symtab*, hint_filt, bool allLoadedRegions=false)
.. cpp:function:: SymtabCodeSource(SymtabAPI::Symtab*)
.. cpp:function:: SymtabCodeSource(const char*)
.. cpp:function:: bool nonReturning(Address func_entry)
.. cpp:function:: bool nonReturningSyscall(int num)
.. cpp:function:: bool resizeRegion(SymtabAPI::Region*, Address newDiskSize)
.. cpp:function:: Address baseAddress() const
.. cpp:function:: Address loadAddress() const
.. cpp:function:: Address getTOC(Address addr) const
.. cpp:function:: SymtabAPI::Symtab* getSymtabObject()
.. cpp:function:: bool isValidAddress(const Address) const
.. cpp:function:: void* getPtrToInstruction(const Address) const
.. cpp:function:: void* getPtrToData(const Address) const
.. cpp:function:: unsigned int getAddressWidth() const
.. cpp:function:: bool isCode(const Address) const
.. cpp:function:: bool isData(const Address) const
.. cpp:function:: bool isReadOnly(const Address) const
.. cpp:function:: Address offset() const
.. cpp:function:: Address length() const
.. cpp:function:: Architecture getArch() const
.. cpp:function:: void removeHint(Hint)
.. cpp:function:: static void addNonReturning(std::string func_name)
.. cpp:function:: void print_stats() const
.. cpp:function:: bool have_stats() const
.. cpp:function:: void incrementCounter(const std::string& name) const
.. cpp:function:: void addCounter(const std::string& name, int num) const
.. cpp:function:: void decrementCounter(const std::string& name) const
.. cpp:function:: void startTimer(const std::string& name) const
.. cpp:function:: void stopTimer(const std::string& name) const
.. cpp:function:: bool findCatchBlockByTryRange(Address, std::set<Address>&) const

.. cpp:struct:: SymtabCodeSource::hint_filt

.. cpp:function:: virtual bool operator()(SymtabAPI::Function* f)=0

.. cpp:struct:: SymtabCodeSource::try_block

.. cpp:member:: Address tryStart
.. cpp:member:: Address tryEnd
.. cpp:member:: Address catchStart

.. cpp:function:: try_block(Address ts, Address te, Address c)
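The ``try_block`` record and ``findCatchBlockByTryRange`` signature above suggest a range lookup. The following is a minimal stand-alone sketch of such a lookup, not the Dyninst implementation. It assumes that a try range is half-open, ``[tryStart, tryEnd)``, and that the boolean result means "at least one handler was found"; the free function ``find_catch_blocks_by_try_range`` and the ``Address`` alias are hypothetical stand-ins introduced for illustration.

```cpp
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical stand-in for Dyninst::Address.
using Address = std::uint64_t;

// Mirrors the three documented members of SymtabCodeSource::try_block.
struct try_block {
    Address tryStart;
    Address tryEnd;
    Address catchStart;
    try_block(Address ts, Address te, Address c)
        : tryStart(ts), tryEnd(te), catchStart(c) {}
};

// Collects the catch-block start of every try range that covers addr.
// Assumption: ranges are half-open, [tryStart, tryEnd).
// Returns true if at least one covering range was found.
bool find_catch_blocks_by_try_range(const std::vector<try_block>& blocks,
                                    Address addr,
                                    std::set<Address>& catch_starts) {
    bool found = false;
    for (const try_block& tb : blocks) {
        if (tb.tryStart <= addr && addr < tb.tryEnd) {
            catch_starts.insert(tb.catchStart);
            found = true;
        }
    }
    return found;
}
```

A linear scan keeps the sketch short; a real code source with many try ranges would more plausibly keep them sorted or in an interval tree.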
177 changes: 85 additions & 92 deletions docs/parseAPI/public/CodeSource.h.rst
@@ -1,128 +1,121 @@
.. _`sec:CodeSource.h`:

CodeSource.h
============
############

.. cpp:namespace:: Dyninst::ParseAPI

.. cpp:class:: CodeSource : public Dyninst::InstructionSource

**Retrieve binary code from an executable, library, or other binary code object**

It also can provide hints of function entry points (such as those derived from debugging
symbols) to seed the parser.

.. cpp:type:: dyn_c_hash_map<void*, CodeRegion*> RegionMap

.. cpp:function:: virtual bool nonReturning(Address func_entry)

Looks up, by entry address ``func_entry``, whether a function returns.

This information may be statically known for some code sources, and can lead
to better parsing accuracy.

.. cpp:function:: virtual bool nonReturning(std::string func_name)

Looks up, by name ``func_name``, whether a function returns.

This information may be statically known for some code sources, and can lead
to better parsing accuracy.

.. cpp:function:: virtual bool nonReturningSyscall(int number)

Looks up, by system call number ``number``, whether a system call returns.

This information may be statically known for some code sources, and can lead
to better parsing accuracy.
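As a rough illustration of the lookups described above, the sketch below models name-based and syscall-based non-returning tables with plain hash maps. It is not the ParseAPI implementation: the table names, the free functions, and the table contents are hypothetical. The entries are illustrative only (``exit`` and ``abort`` do not return; 60 and 231 are ``exit`` and ``exit_group`` on x86_64 Linux), and nothing here claims that Dyninst's own tables contain exactly these entries.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical tables for illustration; contents are assumptions.
static std::unordered_map<std::string, bool> non_returning_funcs_sketch = {
    {"exit", true},
    {"abort", true},
};

static std::unordered_map<int, bool> non_returning_syscalls_x86_64_sketch = {
    {60, true},   // exit on x86_64 Linux
    {231, true},  // exit_group on x86_64 Linux
};

// True if the named function is recorded as non-returning.
bool non_returning_by_name(const std::string& func_name) {
    auto it = non_returning_funcs_sketch.find(func_name);
    return it != non_returning_funcs_sketch.end() && it->second;
}

// True if the system call number is recorded as non-returning.
bool non_returning_syscall(int number) {
    auto it = non_returning_syscalls_x86_64_sketch.find(number);
    return it != non_returning_syscalls_x86_64_sketch.end() && it->second;
}

// Registers an additional non-returning function name.
void add_non_returning(const std::string& func_name) {
    non_returning_funcs_sketch[func_name] = true;
}
```

Anything absent from a table is treated as returning, so a missing entry costs parsing accuracy but does not cause a lookup failure.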

.. cpp:function:: virtual Address baseAddress()

Returns the base address of the code covered by this source.

.. cpp:namespace:: Dyninst::parseAPI
If the binary file type supplies non-zero base or load addresses (e.g. Windows PE),
implementations should override these functions.

Class CodeSource
----------------
.. cpp:function:: virtual Address loadAddress()

**Defined in:** ``CodeSource.h``
Returns the load address of the code covered by this source.

The CodeSource interface is used by the ParseAPI to retrieve binary code
from an executable, library, or other binary code object; it also can
provide hints of function entry points (such as those derived from
debugging symbols) to seed the parser. The ParseAPI provides a default
implementation based on the SymtabAPI that supports many common binary
formats. For details on implementing a custom CodeSource, see Appendix
`5 <#sec:extend>`__.
If the binary file type supplies non-zero base or load addresses (e.g. Windows PE),
implementations should override these functions.

.. code-block:: cpp
virtual bool nonReturning(Address func_entry) virtual bool
nonReturning(std::string func_name)
.. cpp:function:: std::map<Address, std::string>& linkage()

Looks up whether a function returns (by name or location). This
information may be statically known for some code sources, and can lead
to better parsing accuracy.
Returns the external linkage map.

.. code-block:: cpp
virtual bool nonReturningSyscall(int /*number*/)
This may be empty.

Looks up whether a system call returns (by system call number). This
information may be statically known for some code sources, and can lead
to better parsing accuracy.
.. cpp:struct:: CodeSource::Hint

.. code-block:: cpp
virtual Address baseAddress() virtual Address loadAddress()
**A starting point for parsing**

If the binary file type supplies non-zero base or load addresses (e.g.
Windows PE), implementations should override these functions.
.. note:: This class satisfies the C++ `Compare <https://en.cppreference.com/w/cpp/named_req/Compare>`_ concept.

.. code-block:: cpp
std::map< Address, std::string > & linkage()
.. cpp:member:: Address _addr
.. cpp:member:: int _size
.. cpp:member:: CodeRegion* _reg
.. cpp:member:: std::string _name

Returns a reference to the external linkage map, which may or may not be
filled in for a particular CodeSource implementation.
.. cpp:function:: Hint(Addr, CodeRegion*, std::string)
.. cpp:function:: std::vector< Hint > const& hints()

.. code-block:: cpp
struct Hint Address _addr; CodeRegion *_region; std::string _name;
Hint(Addr, CodeRegion *, std::string); std::vector< Hint > const&
hints()
Returns the currently-defined function entry hints.

Returns a vector of the currently defined function entry hints.
.. cpp:function:: std::vector<CodeRegion *> const& regions()

.. code-block:: cpp
std::vector<CodeRegion *> const& regions()
Returns a read-only vector of code regions within the binary represented
by this code source.

Returns a read-only vector of code regions within the binary represented
by this code source.
.. cpp:function:: int findRegions(Address addr, set<CodeRegion *> & ret)

.. code-block:: cpp
int findRegions(Address addr, set<CodeRegion *> & ret)
Finds all CodeRegion objects that overlap the provided address. Some
code sources (e.g. archive files) may have several regions with
overlapping address ranges; others (e.g. ELF binaries) do not.

Finds all CodeRegion objects that overlap the provided address. Some
code sources (e.g. archive files) may have several regions with
overlapping address ranges; others (e.g. ELF binaries) do not.
.. cpp:function:: bool regionsOverlap()

.. code-block:: cpp
bool regionsOverlap()
Indicates whether the CodeSource contains overlapping regions.

Indicates whether the CodeSource contains overlapping regions.
.. cpp:class:: CodeRegion

Class CodeRegion
----------------
**Divide a CodeSource into distinct regions**

**Defined in:** ``CodeSource.h``
This interface is mostly of interest to CodeSource implementors.

The CodeRegion interface is an accounting structure used to divide
CodeSources into distinct regions. This interface is mostly of interest
to CodeSource implementors.
.. cpp:function:: void names(Address addr, vector<std::string>& names)

.. code-block:: cpp
void names(Address addr, vector<std::string> & names)
Retrieves the names associated with the function at address ``addr`` in the
region, e.g. symbol names in an ELF or PE binary.

Fills the provided vector with any names associated with the function at
a given address in the region, e.g. symbol names in an ELF or PE binary.
.. cpp:function:: virtual bool findCatchBlock(Address addr, Address & catchStart)

.. code-block:: cpp
virtual bool findCatchBlock(Address addr, Address & catchStart)
Finds the exception handler associated with the address ``addr``, if one exists.

Finds the exception handler associated with an address, if one exists.
This routine is only implemented for binary code sources that support
structured exception handling, such as the SymtabAPI-based
SymtabCodeSource provided as part of the ParseAPI.
This routine is only implemented for binary code sources that support structured
exception handling.

.. code-block:: cpp
Address low()
.. cpp:function:: Address low()

The lower bound of the interval of address space covered by this region.
Returns the lower bound of the interval of the address space covered by this region.

.. code-block:: cpp
Address high()
.. cpp:function:: Address high()

The upper bound of the interval of address space covered by this region.
Returns the upper bound of the interval of the address space covered by this region.

.. code-block:: cpp
bool contains(Address addr)
.. cpp:function:: bool contains(Address addr)

Returns true if
:math:`\small \texttt{addr} \in [\small \texttt{low()},\small \texttt{high()})`,
false otherwise.
Checks if :cpp:func:`low` :math:`\le` ``addr`` :math:`\lt` :cpp:func:`high`.

.. code-block:: cpp
virtual bool wasUserAdded() const
.. cpp:function:: virtual bool wasUserAdded() const

Return true if this region was added by the user, false otherwise.
Returns true if this region was added by the user, false otherwise.
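The region queries documented above (``low``, ``high``, ``contains``, ``findRegions``, ``regionsOverlap``) can be modelled with a small stand-alone sketch. This is not the ParseAPI implementation: ``Region``, ``find_regions``, and ``regions_overlap`` are hypothetical stand-ins, and the sketch assumes that the integer returned by ``findRegions`` is the number of regions added to the result set, which the text above does not state.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical stand-in for Dyninst::Address.
using Address = std::uint64_t;

// Hypothetical stand-in for CodeRegion, covering the half-open
// interval [low(), high()).
struct Region {
    Address lo;
    Address hi;
    Address low() const { return lo; }
    Address high() const { return hi; }
    bool contains(Address addr) const { return low() <= addr && addr < high(); }
};

// Adds every region containing addr to ret.
// Assumption: the return value is the number of regions newly added.
int find_regions(const std::vector<Region*>& regions, Address addr,
                 std::set<Region*>& ret) {
    int count = 0;
    for (Region* r : regions) {
        if (r->contains(addr) && ret.insert(r).second) {
            ++count;
        }
    }
    return count;
}

// True if any two regions share at least one address.
bool regions_overlap(const std::vector<Region*>& regions) {
    for (std::size_t i = 0; i < regions.size(); ++i) {
        for (std::size_t j = i + 1; j < regions.size(); ++j) {
            if (regions[i]->low() < regions[j]->high() &&
                regions[j]->low() < regions[i]->high()) {
                return true;
            }
        }
    }
    return false;
}
```

The linear and pairwise scans are chosen for readability; the developer documentation mentions an ``IBSTree`` for region lookup, which would avoid scanning every region.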
