PolyFile



A utility to identify and map the semantic structure of files, including polyglots, chimeras, and schizophrenic files. It can be used in conjunction with its sister tool PolyTracker for Automated Lexical Annotation and Navigation of Parsers, a backronym devised solely for the purpose of collectively referring to the tools as The ALAN Parsers Project.

Quickstart

In the same directory as this README, run:

pip3 install -e .

This will automatically install the polyfile executable in your path.

Usage

$ polyfile --help
usage: polyfile [-h] [--html HTML] [--debug] [--quiet] FILE

A utility to recursively map the structure of a file.

positional arguments:
  FILE                  The file to analyze

optional arguments:
  -h, --help            show this help message and exit
  --html HTML, -t HTML  Path to write an interactive HTML file for exploring
                        the PDF
  --debug, -d           Print debug information
  --quiet, -q           Suppress all log output (overrides --debug)

To generate a JSON mapping of a file, run:

polyfile INPUT_FILE > output.json

You can optionally have PolyFile output an interactive HTML page containing a labeled, interactive hexdump of the file:

polyfile INPUT_FILE --html output.html > output.json
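For scripted use, the two commands above can be wrapped from Python. The following is a minimal sketch rather than part of PolyFile: `map_file` and its `command` parameter are hypothetical names introduced here, and the code assumes only the command-line interface shown in the usage text (JSON on standard output, optional `--html` path). It makes no assumption about the structure of the JSON itself.

```python
import json
import subprocess


def map_file(path, html_path=None, command=("polyfile",)):
    """Run PolyFile on `path` and return its JSON mapping as Python data.

    `map_file` and `command` are illustrative names, not part of PolyFile.
    `command` exists so a different executable path can be substituted.
    """
    args = list(command)
    if html_path is not None:
        # --html is taken from the usage text above.
        args += ["--html", html_path]
    args.append(path)
    # The mapping is read from standard output;
    # check=True raises CalledProcessError on a non-zero exit code.
    result = subprocess.run(args, stdout=subprocess.PIPE, check=True)
    return json.loads(result.stdout)
```

Calling `map_file("INPUT_FILE", html_path="output.html")` mirrors the second command above, except that the JSON is returned to the caller instead of being redirected to output.json.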

File Support

PolyFile can identify all 10,000+ file formats in the TrID database. It currently has support for parsing and semantically mapping the following formats:

For an example that exercises all of these file formats, run:

curl -v --silent https://www.sultanik.com/files/ESultanikResume.pdf | polyfile --html ESultanikResume.html - > ESultanikResume.json

Current Status and Known Deficiencies

  • The instrumented Kaitai Struct parser generator implementation has only been tested on the JPEG/JFIF grammar; other KSY definitions may exercise portions of the KSY specification that have not yet been implemented
  • The JSON output schema will soon be replaced with the similar SBuD format

License and Acknowledgements

This research was developed by Trail of Bits with funding from the Defense Advanced Research Projects Agency (DARPA) under the SafeDocs program as a subcontractor to Galois. It is licensed under the Apache 2.0 license. The PDF parser is modified from the parser developed by Didier Stevens and released into the public domain. © 2019, Trail of Bits.
