Parsing binary data

A list of generic tools for parsing binary data structures, such as file formats, network protocols or bitstreams.

Parser generators, parsing libraries and frameworks

  • Spicy (DSL, C/C++, Bro): a next-generation parser generator for network protocols and file formats
  • Nom (Rust): Rust parser combinator framework
  • Hammer (C): bit-oriented parsing library
  • Kaitai Struct (DSL): declarative language used to describe various binary data structures, laid out in files or in memory
  • Hachoir (Python): view and edit a binary stream field by field. Long list of parsers for all kinds of formats
  • Construct and Construct 3 (Python): library for parsing and building data structures (binary or textual). Define your data structures in a declarative manner
  • DataScript Tools (DSL): DataScript is a formal language for modelling binary datatypes, bitstreams or file formats (PDF)
  • Parsifal (OCaml): OCaml-based parsing engine. Paper: A pragmatic solution to the binary parsing problem. Olivier Levillain
  • Haka (Lua): open-source, security-oriented language for describing protocols and applying security policies to (live) captured traffic
  • BinData (Ruby): provides a declarative way to read and write structured binary data
  • Binary-parser (Node): binary parser builder library for Node.js that lets you write efficient parsers in a simple, declarative way
  • Gloss (Clojure): turn complicated byte formats into Clojure data structures and Clojure data structures into compact byte representations
  • Preon (Java): Bit syntax for Java. A declarative data binding framework for dealing with binary encoded data
  • attoparsec and attoparsec-binary (Haskell): fast parser combinator library, aimed particularly at dealing efficiently with network protocols and complicated text/binary file formats
  • Marpa (C/C++, Perl, Go): general BNF parser; its core library is libmarpa (C)
  • Scapy (Python): send, sniff, dissect and forge network packets. Usable interactively or as a library
  • libtins (C++): crafting, sending, sniffing and interpreting raw network packets
  • libcrafter (C++): high level library for C++ designed to create and decode network packets
  • scodec (Scala): combinator library for working with binary data
  • Daffodil (Scala/Java, XML Schema): an open-source implementation of DFDL (Data Format Description Language), capable of describing many industry and military standards. It parses data into an infoset, most commonly represented as XML or JSON, and can write it back to the native format.
  • binaryparse (Nim, DSL): in-language DSL for reading and writing binary data, supporting all sorts of patterns. Generates an efficient stream-based reader and writer for runtime execution.
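Several of the libraries above (Nom, Hammer, attoparsec and scodec among them) follow the parser combinator style: small parsers for fixed-width fields are composed into parsers for whole structures. The sketch below illustrates that idea in plain Python with the standard `struct` module. It does not use or mimic the API of any listed library, and the names `uint`, `tag`, `seq` and the `DEMO` header format are hypothetical.

```python
import struct
from typing import Any, Callable, Optional, Tuple

# A parser takes (data, offset) and returns (value, new_offset), or None on failure.
Parser = Callable[[bytes, int], Optional[Tuple[Any, int]]]

def uint(fmt: str) -> Parser:
    """Parser for one big-endian unsigned integer given by a struct format char."""
    size = struct.calcsize(">" + fmt)

    def parse(data: bytes, pos: int) -> Optional[Tuple[Any, int]]:
        if pos + size > len(data):
            return None  # not enough input left
        (value,) = struct.unpack_from(">" + fmt, data, pos)
        return value, pos + size

    return parse

u8, be_u16, be_u32 = uint("B"), uint("H"), uint("I")

def tag(expected: bytes) -> Parser:
    """Parser that matches a literal byte string, such as a magic number."""
    def parse(data: bytes, pos: int) -> Optional[Tuple[Any, int]]:
        if data[pos:pos + len(expected)] == expected:
            return expected, pos + len(expected)
        return None

    return parse

def seq(**fields: Parser) -> Parser:
    """Combinator: run the field parsers in order and collect results in a dict."""
    def parse(data: bytes, pos: int) -> Optional[Tuple[Any, int]]:
        out = {}
        for name, field in fields.items():
            result = field(data, pos)
            if result is None:
                return None  # propagate failure from any field
            out[name], pos = result
        return out, pos

    return parse

# Hypothetical header: magic "DEMO", 1-byte version, 2-byte count, 4-byte length.
header = seq(magic=tag(b"DEMO"), version=u8, count=be_u16, length=be_u32)

data = b"DEMO" + bytes([2]) + (3).to_bytes(2, "big") + (512).to_bytes(4, "big")
print(header(data, 0))
# ({'magic': b'DEMO', 'version': 2, 'count': 3, 'length': 512}, 11)
```

Because each parser returns the new offset, a wrong magic number or truncated input simply yields `None`; real combinator libraries usually return richer error values that say where and why parsing failed.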

Stand-alone software

Hex editors with grammars

Wireshark is a network protocol analyzer that includes dissectors for over two thousand protocols.

  • TShark: command line version, can easily be called from shell scripts.
  • Wireshark Generic Dissector: add-on that dissects a protocol based on a text description of its elements
  • Wireshark Lua: dissectors can be written in Lua (Examples)
  • pyreshark: plugin providing a simple interface for writing Wireshark dissectors in Python
  • Sharktools (Python, Matlab): Tools for programmatic parsing of packet captures using Wireshark functionality
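Since TShark can print selected fields as tab-separated text (`-T fields`, with one `-e` option per field), it is easy to drive from a script. The Python sketch below builds such a command line and splits the output into one dict per packet. The helper names are hypothetical, and the parser assumes TShark's default tab separator.

```python
import shutil
import subprocess
from typing import Dict, List

def tshark_fields_cmd(pcap_path: str, fields: List[str]) -> List[str]:
    """Build a TShark command that prints the given fields, one packet per line."""
    cmd = ["tshark", "-r", pcap_path, "-T", "fields"]
    for field in fields:
        cmd += ["-e", field]
    return cmd

def parse_fields_output(text: str, fields: List[str]) -> List[Dict[str, str]]:
    """Split tab-separated '-T fields' output into one dict per packet.

    A field missing from a packet shows up as an empty string.
    """
    return [dict(zip(fields, line.split("\t"))) for line in text.splitlines()]

def dump_fields(pcap_path: str, fields: List[str]) -> List[Dict[str, str]]:
    """Run TShark on a capture file and return the parsed rows."""
    if shutil.which("tshark") is None:
        raise RuntimeError("tshark not found on PATH")
    proc = subprocess.run(tshark_fields_cmd(pcap_path, fields),
                          capture_output=True, text=True, check=True)
    return parse_fields_output(proc.stdout, fields)
```

Calling `dump_fields` needs TShark installed; keeping the parsing step separate means it can be tested on captured output without running Wireshark at all.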

Other stand-alone software

  • Netzob: open source tool for reverse engineering, traffic generation and fuzzing of communication protocols
  • Cat Karat Packet Builder: packet generation tool for building custom packets for firewall or target testing
  • radare2 (C, with bindings/pipe for almost all languages): Unix-like reverse engineering framework and command-line tools. See Parsing a fileformat with radare2 and Types.
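Packet builders such as Cat Karat Packet Builder, like the Scapy, libtins and libcrafter libraries listed earlier, work in the opposite direction from a parser: they serialize field values into bytes and fill in derived fields such as lengths and checksums. As an illustration that is not based on any of these tools, the Python sketch below builds a minimal IPv4 header with the RFC 1071 Internet checksum; `internet_checksum` and `build_ipv4_header` are hypothetical names.

```python
import socket
import struct

def internet_checksum(data: bytes) -> int:
    """RFC 1071 one's-complement checksum over 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = sum(struct.unpack(f">{len(data) // 2}H", data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def build_ipv4_header(src: str, dst: str, payload_len: int,
                      proto: int = 17, ttl: int = 64, ident: int = 0) -> bytes:
    """Serialize a 20-byte IPv4 header (no options) and fill in its checksum."""
    header = struct.pack(
        ">BBHHHBBH4s4s",
        (4 << 4) | 5,        # version 4, IHL 5 (five 32-bit words)
        0,                   # DSCP/ECN
        20 + payload_len,    # total length: header + payload
        ident,               # identification
        0,                   # flags + fragment offset
        ttl,
        proto,               # 17 = UDP
        0,                   # checksum placeholder, computed below
        socket.inet_aton(src),
        socket.inet_aton(dst),
    )
    checksum = internet_checksum(header)
    # The checksum occupies bytes 10-11 of the header.
    return header[:10] + struct.pack(">H", checksum) + header[12:]

hdr = build_ipv4_header("192.0.2.1", "198.51.100.7", payload_len=8)
print(hdr.hex())
```

A quick sanity check on the result: recomputing the checksum over a header that already contains a correct checksum gives zero.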

Research papers

  • Nail: A Practical Tool for Parsing and Generating Data Formats. Julian Bangert and Nickolai Zeldovich
  • GAPA: Generic Application-Level Protocol Analyzer and its Language. Nikita Borisov, David J. Brumley, Helen J. Wang, Chuanxiong Guo
  • PADS/ML: a functional data description language. Y. Mandelbaum, K. Fisher, D. Walker, M. F. Fernandez, and A. Gleyzer.
  • PacketTypes: Packet types: Abstract specification of network protocol messages. P. J. McCann and S. Chandra.
  • Zebu: A Language-Based Approach for Improving the Robustness of Network Application Protocol Implementations. Laurent Burgy et al.
  • Zebra: Improving the Performance of Message Parsers for Embedded Systems. Jigar Solanki et al.
  • z2z: Automatic Generation of Network Protocol Gateways. Yerom-David Bromberg, Laurent Reveillere, Julia L. Lawall, Gilles Muller
  • Yakker: Semantics and Algorithms for Data-dependent Grammars. Trevor Jim, Yitzhak Mandelbaum, David Walker
  • BinPAC: Superseded by BinPAC++, which is now known as Spicy
  • FlowSifter: High-Speed Application Protocol Parsing and Extraction for Deep Flow Inspection. Alex X. Liu, Chad R. Meiners, Eric Norige, and Eric Torng
  • TSN.1: Transfer Syntax Notation One (TSN.1). A formal notation for describing messages in binary protocols
  • NetPDL: markup language for describing protocols from OSI layer 2 to OSI layer 7
  • Tupni: Automatic Reverse Engineering of Input Formats. Weidong Cui et al.
  • W. Underwood: Grammar-Based Specification and Parsing of Binary File Formats. William Underwood
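Yakker's data-dependent grammars address a pattern that is everywhere in binary formats: a value parsed earlier (typically a length field) decides how much of the following input belongs to the current element. The plain-Python sketch below parses a hypothetical type-length-value stream to show the idea; it is not based on Yakker or any other system listed above.

```python
import struct
from typing import List, Tuple

def parse_tlv(data: bytes) -> List[Tuple[int, bytes]]:
    """Parse a hypothetical TLV stream: 1-byte type, 2-byte big-endian length, payload.

    The grammar is data-dependent: how many payload bytes to consume is only
    known after the length field has been parsed.
    """
    records = []
    pos = 0
    while pos < len(data):
        if pos + 3 > len(data):
            raise ValueError(f"truncated record header at offset {pos}")
        rtype, length = struct.unpack_from(">BH", data, pos)
        pos += 3
        end = pos + length
        if end > len(data):
            raise ValueError(
                f"record at offset {pos - 3} claims {length} bytes, "
                f"only {len(data) - pos} left")
        records.append((rtype, data[pos:end]))
        pos = end
    return records

stream = bytes([1, 0, 3]) + b"abc" + bytes([2, 0, 0])
print(parse_tlv(stream))  # [(1, b'abc'), (2, b'')]
```

Checking the claimed length against the remaining input before slicing is what keeps a malformed length field from silently producing a short record.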

Lists of interesting binary formats

This is obviously rather subjective and definitely not supposed to be a complete list:

Related topics
