Skip to content

A utility for mapping the file formats embedded within a single file

License

Notifications You must be signed in to change notification settings

chrismattmann/polyfile

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

PolyFile


Build Status PyPI version Slack Status

A utility to identify and map the semantic structure of files, including polyglots, chimeras, and schizophrenic files. It can be used in conjunction with its sister tool PolyTracker for Automated Lexical Annotation and Navigation of Parsers, a backronym devised solely for the purpose of collectively referring to the tools as The ALAN Parsers Project.

Quickstart

In the same directory as this README, run:

pip3 install -e .

This will automatically install the polyfile executable in your path.

Usage

usage: polyfile [-h] [--filetype FILETYPE] [--list] [--html HTML]
                [--try-all-offsets] [--debug] [--quiet] [--version]
                [-dumpversion]
                [FILE]

A utility to recursively map the structure of a file.

positional arguments:
  FILE                  The file to analyze; pass '-' or omit to read from
                        STDIN

optional arguments:
  -h, --help            show this help message and exit
  --filetype FILETYPE, -f FILETYPE
                        Explicitly match against the given filetype (default
                        is to match against all filetypes)
  --list, -l            list the supported filetypes (for the `--filetype`
                        argument) and exit
  --html HTML, -t HTML  Path to write an interactive HTML file for exploring
                        the PDF
  --try-all-offsets, -a
                        Search for a file match at every possible offset; this
                        can be very slow for larger files
  --debug, -d           Print debug information
  --quiet, -q           Suppress all log output (overrides --debug)
  --version, -v         Print PolyFile's version information to STDERR
  -dumpversion          Print PolyFile's raw version information to STDOUT and
                        exit

To generate a JSON mapping of a file, run:

polyfile INPUT_FILE > output.json

You can optionally have PolyFile output an interactive HTML page containing a labeled, interactive hexdump of the file:

polyfile INPUT_FILE --html output.html > output.json

File Support

PolyFile can identify all 10,000+ file formats in the TrID database. It currently has support for parsing and semantically mapping the following formats:

For an example that exercises all of these file formats, run:

curl -v --silent https://www.sultanik.com/files/ESultanikResume.pdf | polyfile --html ESultanikResume.html - > ESultanikResume.json

Output Format

PolyFile outputs its mapping in an extension of the SBuD JSON format described in the documentation.

Current Status and Known Deficiencies

  • The instrumented Kaitai Struct parser generator implementation has only been tested on the JPEG/JFIF grammar; other KSY definitions may exercise portions of the KSY specification that have not yet been implemented

License and Acknowledgements

This research was developed by Trail of Bits with funding from the Defense Advanced Research Projects Agency (DARPA) under the SafeDocs program as a subcontractor to Galois. It is licensed under the Apache 2.0 lisense. The PDF parser is modified from the parser developed by Didier Stevens and released into the public domain. © 2019, Trail of Bits.

About

A utility for mapping the file formats embedded within a single file

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages

  • Python 86.5%
  • JavaScript 10.2%
  • HTML 1.7%
  • CSS 1.6%