This repository has been archived by the owner on May 15, 2024. It is now read-only.

Writing Dissectors

ashdnazg edited this page Feb 27, 2014 · 2 revisions

Pyreshark dissectors are written as a .py file (e.g. my_protocol.py) containing a class named Protocol that inherits from ProtocolBase.

In order to be loaded, this file should be placed in <Wireshark-dir>\python\protocols.

When Pyreshark is initialized, it creates an instance of this class and uses said instance to generate the new protocol (and its fields, trees, etc.) and register it in Wireshark.

Let's have a look at the sample protocol (you can find it in \python\protocols\sample_protocol.py):

Sample Protocol

from cal.cal_types import ProtocolBase, FieldItem, PyFunctionItem, Subtree, TextItem
from cal.ws_consts import FT_UINT16, BASE_HEX, FT_UINT8, FT_ETHER, FT_IPv4

ETHERNET = 1
IP = 0x0800

ARPOP_REQUEST = 1
ARPOP_REPLY = 2

HW_TYPE_STRINGS = {ETHERNET : "Ethernet"}
PROTO_TYPE_STRINGS = {IP : "IP"}
OPCODE_STRINGS =   {ARPOP_REQUEST:  "request",
                    ARPOP_REPLY:    "reply"}

class Protocol(ProtocolBase):
    def __init__(self):
        self._name = "Pyreshark Sample Protocol (ARP)"
        self._filter_name = "pysample"
        self._short_name = "PYSAMPLE"
        self._items = [FieldItem("hw.type", FT_UINT16, "Hardware Type", strings = HW_TYPE_STRINGS),
                       FieldItem("proto.type", FT_UINT16, "Protocol Type", display = BASE_HEX, strings = PROTO_TYPE_STRINGS),
                       FieldItem("hw.size", FT_UINT8, "Hardware Size"),
                       FieldItem("proto.size", FT_UINT8, "Protocol Size"),
                       FieldItem("opcode", FT_UINT16, "Opcode", strings = OPCODE_STRINGS),
                       Subtree(TextItem("src", "Sender"), [PyFunctionItem(self.add_addresses, { "mac" : FieldItem("hw_mac", FT_ETHER, "Sender MAC Address"),
                                                                                                "ip" : FieldItem("proto_ipv4", FT_IPv4, "Sender IP Address"),})]),
                       Subtree(TextItem("dst", "Target"), [PyFunctionItem(self.add_addresses, { "mac" : FieldItem("hw_mac", FT_ETHER, "Target MAC Address"),
                                                                                                "ip" : FieldItem("proto_ipv4", FT_IPv4, "Target IP Address"),})]),
                       ]
        #self._register_under = { "ethertype": 0x0806} # UNCOMMENT THIS TO TEST THE PROTOCOL

    def add_addresses(self, packet):
        (hw_type, proto_type, hw_size, proto_size) = packet.unpack(">HHBB", 0)
        if hw_type == ETHERNET:
            packet.read_item("mac")
        else:
            packet.add_text("Unimplemented hardware type")
            packet.offset += hw_size
        
        if proto_type == IP:
            packet.read_item("ip")
        else:
            packet.add_text("Unimplemented protocol type")
            packet.offset += proto_size

This is a very thin and incomplete implementation of the ARP protocol. We shall now inspect the different parts of the code.

Imports

from cal.cal_types import ProtocolBase, FieldItem, PyFunctionItem, Subtree, TextItem
from cal.ws_consts import FT_UINT16, BASE_HEX, FT_UINT8, FT_ETHER, FT_IPv4

As you can see, two modules are imported, both from the cal package.

cal stands for C Abstraction Layer. It is the core of Pyreshark: it hides Wireshark's C API and provides you with its own "pythonic" one.

  • cal_types holds Pyreshark's API; almost everything you'll need is there.
  • ws_consts holds several constant values from Wireshark's C code that are necessary for protocol writing.

The Protocol Class

class Protocol(ProtocolBase):
    def __init__(self):
        self._name = "Pyreshark Sample Protocol (ARP)"
        self._filter_name = "pysample"
        self._short_name = "PYSAMPLE"
        self._items = [...]
        #self._register_under = { "ethertype": 0x0806}

The first thing you notice about this class is that it inherits ProtocolBase (from cal.cal_types). Removing this may cause various exceptions and errors. Try your best to just leave this line as is.

Now let's have a look at the various members initialized in the constructor:

| Variable | Description |
| --- | --- |
| _name | The full name of your protocol |
| _filter_name | The name of your protocol in the filter box |
| _short_name | The name of your protocol in the protocol column |
| _items | This is where you state the structure of your protocol, we'll discuss this properly later on |
| _register_under | Use this if you want your dissector to be called by an existing protocol, you can register it in the latter's table (find all available tables in the menu Internals->Dissector Tables) |
| _hidden | If you don't want the protocol to be automatically added to the tree, set this to True |

Note that _register_under is commented out so it doesn't override the original ARP protocol. To test the sample protocol just uncomment this line, start Wireshark and inspect any ARP packet.

There's one more thing that can be set in the constructor that doesn't appear in the sample:

  • set_next_dissector(dissector_name, length = REMAINING_LENGTH) - Used for dictating which protocol will be parsed after yours, and how many bytes it'll receive (omitting the second argument will pass all remaining bytes to the next dissector). For example:
self.set_next_dissector("tcp")

Calling this function in the constructor sets the default value for all dissected packets. Omitting this line will keep "data" as the default next protocol.
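
As a rough sketch of how these constructor settings might fit together, the hypothetical protocol below registers itself under a UDP port and hands the rest of every packet to another dissector. The protocol names, the "udp.port" table entry, the port number and the single field are illustrative assumptions. The stand-in classes in the except branch exist only so the snippet can be run outside Wireshark; they are not Pyreshark's implementation.

```python
try:
    # Inside Wireshark, Pyreshark provides the real classes.
    from cal.cal_types import ProtocolBase, FieldItem
    from cal.ws_consts import FT_UINT16
except ImportError:
    # Stand-ins (NOT Pyreshark code) so the sketch runs outside Wireshark.
    FT_UINT16 = "FT_UINT16"

    class FieldItem(object):
        def __init__(self, name, field_type, full_name=None, **kwargs):
            self.name = name
            self.field_type = field_type
            self.full_name = full_name

    class ProtocolBase(object):
        def set_next_dissector(self, dissector_name, length=None):
            self.next_dissector = (dissector_name, length)


class Protocol(ProtocolBase):
    def __init__(self):
        self._name = "My Tunnel Protocol"  # hypothetical protocol
        self._filter_name = "mytunnel"
        self._short_name = "MYTUNNEL"
        self._items = [FieldItem("session", FT_UINT16, "Session ID")]
        # Assumed table name and port; check Internals->Dissector Tables.
        self._register_under = {"udp.port": 9999}
        # Default for all packets: pass every remaining byte to "ip".
        self.set_next_dissector("ip")
```

Because the second argument is omitted, all bytes after the session field would go to the next dissector.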

Items

That's the fun part where you actually write your protocol's structure (note that we're still in the constructor):

self._items = [FieldItem("hw.type", FT_UINT16, "Hardware Type", strings = HW_TYPE_STRINGS),
               FieldItem("proto.type", FT_UINT16, "Protocol Type", display = BASE_HEX, strings = PROTO_TYPE_STRINGS),
               FieldItem("hw.size", FT_UINT8, "Hardware Size"),
               FieldItem("proto.size", FT_UINT8, "Protocol Size"),
               FieldItem("opcode", FT_UINT16, "Opcode", strings = OPCODE_STRINGS),
               Subtree(TextItem("src", "Sender"), [PyFunctionItem(self.add_addresses, { "mac" : FieldItem("hw_mac", FT_ETHER, "Sender MAC Address"),
                                                                                        "ip" : FieldItem("proto_ipv4", FT_IPv4, "Sender IP Address"),})]),
               Subtree(TextItem("dst", "Target"), [PyFunctionItem(self.add_addresses, { "mac" : FieldItem("hw_mac", FT_ETHER, "Target MAC Address"),
                                                                                        "ip" : FieldItem("proto_ipv4", FT_IPv4, "Target IP Address"),})]),
               ]

As you can see, _items is a list of several objects (all of which reside happily in cal.cal_types). During dissection these items are processed sequentially, starting at the beginning of the packet and advancing the offset as needed.

| Item | What happens during dissection |
| --- | --- |
| FieldItem | Reads a regular field in the packet's bytes, like an integer or an IP address, and adds it to the tree |
| TextItem | Adds a custom textual field to the tree |
| Subtree | Adds a sub-tree to the tree |
| PyFunctionItem | Calls a Python function to process the packet |

FieldItem

The constructor accepts a myriad parameters, most of which have very convenient defaults.

| Parameter | Description | Default value |
| --- | --- | --- |
| name | The name of the field. Used for generating the filter name. | - |
| field_type | Any of the FT_* from ws_consts.py (also Wireshark's ftypes.h). | - |
| full_name | The name that'll be shown in the tree. If it is set to None, full_name=name. | None |
| descr | A short description of the field. If it is set to None, descr=name. | None |
| encoding | Encoding for reading the field. See ws_consts.py. If it is set to None, a default encoding is picked from FIELD_TYPES_DICT in cal_consts.py. | None |
| mask | Bit mask. | NO_MASK=0 |
| display | How the field's value will be displayed in the tree. See ws_consts.py. If it is set to None, a default display is picked from FIELD_TYPES_DICT in cal_consts.py. | None |
| strings | A dictionary for translating the field's value into text. For boolean fields use True and False as keys. For integers use either the values directly or tuples of (min, max), but not both in the same dictionary! | None |
| length | Length of the field in bytes. If it is set to None, a default length is picked from FIELD_TYPES_DICT in cal_consts.py. | None |

Note that None means python's None and not the English word "none".
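
To illustrate the key styles the strings dictionary accepts, here is a small sketch of how such a translation could behave. The lookup_string helper, the port and flag dictionaries, and the treatment of (min, max) as an inclusive range are assumptions made for illustration; in a real dissector the translation is done for you.

```python
# Direct keys: each value maps to one label (as in the sample's OPCODE_STRINGS).
OPCODE_STRINGS = {1: "request", 2: "reply"}

# Range keys: each (min, max) tuple covers a range of values.
PORT_CLASS_STRINGS = {(0, 1023): "well-known",
                      (1024, 49151): "registered",
                      (49152, 65535): "dynamic"}

# Boolean fields use True and False as keys.
FLAG_STRINGS = {True: "Set", False: "Not set"}


def lookup_string(strings, value):
    """Hypothetical helper mimicking the value-to-text translation."""
    for key, text in strings.items():
        if isinstance(key, tuple):
            low, high = key
            if low <= value <= high:  # assumed inclusive on both ends
                return text
        elif key == value:
            return text
    return None  # no matching entry


print(lookup_string(OPCODE_STRINGS, 2))         # reply
print(lookup_string(PORT_CLASS_STRINGS, 8080))  # registered
print(lookup_string(FLAG_STRINGS, False))       # Not set
```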

Useful tips:

  • The item's filter name will be generated according to its position. If it's directly under the protocol root, it'll be named (protocol-filter-name).(item's name) (e.g. "pysample.opcode").
  • If it's under a tree, the tree's parent item will join the filter name as well (e.g. "pysample.src.hw_mac").
  • The offset is advanced after this item is dissected (according to the item's length).
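
The naming rule from the tips above can be sketched in a few lines. The build_filter_name helper is hypothetical and only mirrors the two examples given; Pyreshark generates these names itself.

```python
def build_filter_name(protocol_filter_name, parent_names, item_name):
    """Join the protocol, any enclosing tree parents, and the item name."""
    return ".".join([protocol_filter_name] + list(parent_names) + [item_name])


# An item directly under the protocol root.
print(build_filter_name("pysample", [], "opcode"))       # pysample.opcode
# An item inside the subtree whose parent item is named "src".
print(build_filter_name("pysample", ["src"], "hw_mac"))  # pysample.src.hw_mac
```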

TextItem

When you just need another line of text in the tree, this item is for you!

| Parameter | Description | Default value |
| --- | --- | --- |
| name | The name of the field. Used for generating the filter name. | - |
| text | The text that will be added to the tree. | - |
| length | Length of the field in bytes. | 0 |

Useful tips:

  • Note that the offset is not advanced.
  • Extremely handy as the parent item of a Subtree.

Subtree

If you want to have a sub-tree in your protocol's tree, it's easy and fun!

| Parameter | Description | Default value |
| --- | --- | --- |
| parent_item | The subtree's parent item. | - |
| item_list | The subtree's children: a list of items. | - |
| tree_name | Used by Wireshark for remembering which trees are expanded. Use AUTO_TREE to take the name of parent_item. | AUTO_TREE |

  • Nine times out of ten, you don't need to set tree_name.
  • You'd usually want to use a TextItem as parent_item.

PyFunctionItem

With the three items above we can create wonderful protocols, with a slight limitation: no dissection logic. That's where PyFunctionItem comes to the rescue. When this item is dissected, it calls a Python function of your choice, where you can happily program your protocol's logic in Python.

PyFunctionItem(self.add_addresses, { "mac" : FieldItem("hw_mac", FT_ETHER, "Sender MAC Address"),
                                     "ip" : FieldItem("proto_ipv4", FT_IPv4, "Sender IP Address"),})
 .
 .
 .

def add_addresses(self, packet):
    (hw_type, proto_type, hw_size, proto_size) = packet.unpack(">HHBB", 0)
    if hw_type == ETHERNET:
        packet.read_item("mac")
    else:
        packet.add_text("Unimplemented hardware type")
        packet.offset += hw_size
    
    if proto_type == IP:
        packet.read_item("ip")
    else:
        packet.add_text("Unimplemented protocol type")
        packet.offset += proto_size

| Parameter | Description | Default value |
| --- | --- | --- |
| dissection_func | A Python function. It'll be called with a single parameter: a Packet instance. | - |
| items_dict | A dictionary of all the items the function might read. The keys can be anything and will be used when the function calls packet.read_item(key). | - |

The Packet object your function receives contains your API for dissecting the packet.

  • packet.id - The packet's position in the capture.
  • packet.visited - Whether the packet was visited before, or it's our first time seeing it.
  • packet.buffer - The packet's bytes as a string.
  • packet.offset - The current offset in packet.buffer.
  • packet.add_text(text, length, offset) - Used for adding a line of text to the tree.

    | Parameter | Description | Default value |
    | --- | --- | --- |
    | text | The text to be added. | - |
    | length | The number of bytes that'll be marked when selecting the item. packet.offset is not advanced. | 0 |
    | offset | The beginning offset for the marked bytes. If set to None, offset=self.offset. | None |

  • packet.set_column_text(col_id, text) - Used for setting a column's text.

    | Parameter | Description | Default value |
    | --- | --- | --- |
    | col_id | The column's id (any COL_* from ws_consts.py). | - |
    | text | The new text of the column. | - |

  • packet.read_item(item_key) - Used for adding any item from the aforementioned items_dict to the tree.
    • Note that the offset is advanced according to the item read.

    | Parameter | Description | Default value |
    | --- | --- | --- |
    | item_key | The key of the item in the items_dict. | - |

  • packet.unpack(format, offset) - Used for reading values from the packet's bytes.
    • Note that the offset is not affected.

    | Parameter | Description | Default value |
    | --- | --- | --- |
    | format | A format string (see Python's documentation for the module struct). | - |
    | offset | The offset from which the values will be read. None will set it to the current offset. | None |
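
Since packet.unpack takes a struct format string, the format used by the sample protocol can be tried out with Python's struct module on its own. The bytes below are a hand-built ARP header; whether packet.unpack wraps struct.unpack_from internally is an assumption.

```python
import struct

# First 8 bytes of an ARP request, all fields big-endian.
arp_header = bytes(bytearray([0x00, 0x01,    # hardware type: Ethernet (1)
                              0x08, 0x00,    # protocol type: IP (0x0800)
                              0x06,          # hardware size: 6
                              0x04,          # protocol size: 4
                              0x00, 0x01]))  # opcode: request (1)

# ">HHBB": big-endian, two unsigned 16-bit values, then two unsigned 8-bit values.
hw_type, proto_type, hw_size, proto_size = struct.unpack_from(">HHBB", arp_header, 0)

# struct.calcsize reports how many bytes a format string covers.
header_bytes_read = struct.calcsize(">HHBB")  # 6
```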

There's another important function that can be called from here. It belongs to ProtocolBase and we have already met it:

  • set_next_dissector(dissector_name, length = REMAINING_LENGTH) - When being called from a function, it'll only change the next dissector for the current packet.

Useful tip:

  • The item after the PyFunctionItem (or the next dissector, if it is the last item) will be read beginning at packet.offset, so don't forget to set it to the right position if necessary.
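
One way to exercise a dissection function outside Wireshark is to hand it a stand-in for the Packet object. The FakePacket class below is a hypothetical test double that mimics only some of the members described above (buffer, offset, unpack, read_item, add_text). Its item lengths are supplied by hand, and it is not how Pyreshark implements Packet.

```python
import struct

ETHERNET = 1
IP = 0x0800


class FakePacket(object):
    """Hypothetical test double for the Packet object (not Pyreshark code)."""

    def __init__(self, buffer, offset, item_lengths):
        self.buffer = buffer
        self.offset = offset
        self.item_lengths = item_lengths  # item key -> length in bytes
        self.tree = []                    # records what would be added

    def unpack(self, format, offset=None):
        # Reads values without moving self.offset.
        start = self.offset if offset is None else offset
        return struct.unpack_from(format, self.buffer, start)

    def read_item(self, item_key):
        # Records the item and advances the offset by its length.
        self.tree.append(item_key)
        self.offset += self.item_lengths[item_key]

    def add_text(self, text, length=0, offset=None):
        # Records a line of text; the offset is left untouched.
        self.tree.append(text)


def add_addresses(packet):
    # Same logic as the sample protocol, written as a plain function.
    (hw_type, proto_type, hw_size, proto_size) = packet.unpack(">HHBB", 0)
    if hw_type == ETHERNET:
        packet.read_item("mac")
    else:
        packet.add_text("Unimplemented hardware type")
        packet.offset += hw_size

    if proto_type == IP:
        packet.read_item("ip")
    else:
        packet.add_text("Unimplemented protocol type")
        packet.offset += proto_size


# 8 header bytes, then sender MAC/IP and target MAC/IP (28 bytes in total).
arp = bytes(bytearray([0, 1, 8, 0, 6, 4, 0, 1]
                      + [0xAA] * 6 + [10, 0, 0, 1]
                      + [0x00] * 6 + [10, 0, 0, 2]))
packet = FakePacket(arp, 8, {"mac": 6, "ip": 4})
add_addresses(packet)  # sender addresses
add_addresses(packet)  # target addresses
print(packet.tree, packet.offset)  # ['mac', 'ip', 'mac', 'ip'] 28
```

After both calls the offset sits at the end of the 28-byte packet, which is where the next item or dissector would start reading.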

DissectorItem

An item that calls another dissector.

| Parameter | Description | Default value |
| --- | --- | --- |
| name | A protocol's name | - |
| length | The number of bytes to be dissected | REMAINING_LENGTH |

  • The item has a function set(name, length=REMAINING_LENGTH) that lets you change its parameters temporarily for the next time it is invoked. Only use it if you know what you're doing.

OffsetItem

An item that advances the offset.

| Parameter | Description | Default value |
| --- | --- | --- |
| length | Number of bytes by which to advance the offset. | - |
| encoding | One of ENC_*. Determines whether the length is read as big endian or little endian. | ENC_BIG_ENDIAN |
| flags | Any of OFFSET_FLAGS_*. Useful for length-preceded fields. | OFFSET_FLAGS_NONE |

The three flags available are:

| Flag | What does it do? |
| --- | --- |
| OFFSET_FLAGS_NONE | The offset is advanced length bytes. |
| OFFSET_FLAGS_READ_LENGTH | A uint of size length is read, and the offset is advanced by length + the uint's value |
| OFFSET_FLAGS_READ_LENGTH_INCLUDING | A uint of size length is read, and the offset is advanced by the uint's value |
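
The difference between the three flags is easiest to see with numbers. The sketch below models the table above in plain Python. The advance_offset helper and the placeholder flag values are hypothetical; the real constants come from Pyreshark and the real work is done in C.

```python
# Placeholder values for illustration only.
OFFSET_FLAGS_NONE = "none"
OFFSET_FLAGS_READ_LENGTH = "read_length"
OFFSET_FLAGS_READ_LENGTH_INCLUDING = "read_length_including"


def advance_offset(buffer, offset, length, flags=OFFSET_FLAGS_NONE, big_endian=True):
    """Return the offset after an OffsetItem-like step."""
    if flags == OFFSET_FLAGS_NONE:
        return offset + length
    # Read an unsigned integer occupying `length` bytes at the current offset.
    raw = bytearray(buffer[offset:offset + length])
    if not big_endian:
        raw.reverse()
    value = 0
    for byte in raw:
        value = (value << 8) | byte
    if flags == OFFSET_FLAGS_READ_LENGTH:
        return offset + length + value  # the value excludes the length field
    return offset + value               # the value includes the length field


# A 2-byte length field followed by 5 bytes of data ends at offset 7 either way.
print(advance_offset(b"\x00\x05hello", 0, 2, OFFSET_FLAGS_READ_LENGTH))            # 7
print(advance_offset(b"\x00\x07hello", 0, 2, OFFSET_FLAGS_READ_LENGTH_INCLUDING))  # 7
print(advance_offset(b"\x00\x05hello", 0, 2))                                      # 2
```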

ColumnItem

An item that changes a column's text.

| Parameter | Description | Default value |
| --- | --- | --- |
| col_id | The column's id (any COL_* from ws_consts.py). | - |
| text | The new text of the column. | - |

SubSource

An item that adds a new data source from which its sub-fields will be read. The source is created from a python string returned by a function passed as a parameter.

| Parameter | Description | Default value |
| --- | --- | --- |
| source_name | The name of the new source. | - |
| create_data_func | A Python function that returns the new source's bytes as a string. It'll be called with a single parameter: a Packet instance (see PyFunctionItem). | - |
| items_list | A list of the items that will be read from the new source. | - |
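
A create_data_func might, for instance, decode an obfuscated payload so that the decoded bytes can be dissected as a new source. The XOR scheme, the key and the FakePacket stand-in below are assumptions made purely for illustration; only the use of packet.buffer and packet.offset follows the Packet description in the PyFunctionItem section.

```python
XOR_KEY = 0x5A  # hypothetical obfuscation key


def create_decoded_source(packet):
    """Return the bytes from packet.offset onwards, XOR-decoded."""
    encoded = bytearray(packet.buffer[packet.offset:])
    return bytes(bytearray(byte ^ XOR_KEY for byte in encoded))


class FakePacket(object):
    """Minimal stand-in exposing only buffer and offset (not Pyreshark code)."""

    def __init__(self, buffer, offset):
        self.buffer = buffer
        self.offset = offset


# Two plain header bytes followed by an XOR-encoded "hello".
encoded_payload = bytes(bytearray(ord(c) ^ XOR_KEY for c in "hello"))
packet = FakePacket(b"\x01\x02" + encoded_payload, 2)
decoded = create_decoded_source(packet)  # the bytes of "hello"
```

In a dissector, a function like this would be passed as the create_data_func parameter of SubSource, and the items in items_list would then be read from the decoded bytes.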

General Tips

  • IMPORTANT: Even though you declare the items in Python, all items outside a PyFunctionItem will be dissected by C code! If you're worried about speed, avoid using PyFunctionItem unless necessary. Theoretically, if your protocol has no inner logic and contains no PyFunctionItems, there won't be any Python code running after Wireshark starts. I might write an explanation of how Pyreshark works later; in the meantime, have a look at the code.
  • You can pass information between different functions by storing it in the Protocol object (accessible through self). Just make sure you reset your value when dissecting a new packet, as the same Protocol object is used for dissecting all packets.
  • Don't make recursive dissectors (a dissector that contains a DissectorItem of itself). I'm not responsible for anything that might happen if you do.

That probably sums up the current possibilities and opportunities Pyreshark has to offer.

Good luck with your dissector(s)!