hkerem edited this page Sep 14, 2010 · 15 revisions

Done

Kernel:

  • Protocol interception
  • Encrypted protocols
  • FSM modules to support protocols
  • ACLs based on IP ranges and matchers
  • Core configuration via .app and .config files
  • Other configuration in Mnesia (such as ACL and matcher configuration)
  • Text extraction from binary documents
  • Support for compressed files
  • Determining file types with python-magic
  • Usage logs and advanced diagnostics with SASL
  • Cached regex engine
  • Thrift → Python bridge for extra functionality
  • Thrift → PHP bridge for web UI bindings
  • Turkish NLP module
  • Trainer module
  • MySQL → Mnesia compiler for MyDLP ACL configurations
  • Matcher groups

Supported Protocols:

  • HTTP / HTTPS
  • SMTP
  • ICAP

Supported Document Types:

  • MS Word 97 – 2003 (*.doc)
  • MS Excel 97 – 2003 (*.xls)
  • MS PowerPoint 97 – 2003 (*.ppt)
  • RTF (*.rtf)
  • PDF (*.pdf)
  • PostScript (*.ps)
  • XML (*.xml) (OOXML and ODF are supported through this)
  • HTML (*.html)
  • Plain text

Supported Archive Types:

  • Zip (OOXML and ODF are supported through this)
  • RAR archives
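
OOXML and ODF documents are Zip containers holding XML parts, which is why Zip and XML support together cover them. The sketch below illustrates that idea only; it is not MyDLP's extraction code, and the helper name `extract_text_from_zip` and the naive tag-stripping regex are assumptions made for this example.

```python
import io
import re
import zipfile

# Naive tag stripper; a real extractor would use an XML parser.
XML_TAG = re.compile(r"<[^>]+>")


def extract_text_from_zip(data: bytes) -> str:
    """Return tag-stripped text from every .xml member of a Zip archive.

    Hypothetical helper for illustration; not part of MyDLP.
    """
    chunks = []
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for name in archive.namelist():
            if not name.endswith(".xml"):
                continue  # skip images, thumbnails and other binary parts
            xml = archive.read(name).decode("utf-8", errors="replace")
            text = XML_TAG.sub(" ", xml)
            # Collapse the whitespace left behind by removed tags.
            chunks.append(" ".join(text.split()))
    return "\n".join(chunk for chunk in chunks if chunk)
```

Passing the raw bytes of a .docx or .odt file to `extract_text_from_zip` yields the text of parts such as `word/document.xml` or `content.xml`, which the matchers could then scan.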

Implemented Matchers:

  • Mime-Type
  • File MD5 Hash
  • Regular Expressions
  • IBAN: Matches if document contains valid IBAN account numbers
  • Credit Card: Matches if document contains valid Credit Card numbers
  • Republic of Turkey ID No: Matches if document contains valid Republic of Turkey ID numbers (TC Kimlik No)
  • Social Security Number
  • Encrypted Archive
  • Encrypted File: Matches if, for any reason, get_text could not extract text from the file.
  • Sentence Hashes: Matches if a configured number or percentage of sentences already exist in a previously generated database.
  • Bayesian Classification: Uses bayeserl with an optional Turkish NLP normalizer.
  • Source code
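
The IBAN, Credit Card and Republic of Turkey ID matchers above all depend on telling a "valid" number from an arbitrary digit string. The sketch below shows the commonly published checksum rules for each: Luhn for card numbers, mod 97 for IBANs, and the two check digits of the TC Kimlik No. It is an illustration, not MyDLP's matcher code; the function names and the length limits are assumptions made for this example.

```python
def luhn_valid(number: str) -> bool:
    """Luhn check as used for payment card numbers."""
    digits = [int(c) for c in number if c.isdigit()]
    if len(digits) < 12:  # assumed lower bound for card-like numbers
        return False
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def iban_valid(iban: str) -> bool:
    """Mod-97 check: move the first four characters to the end, map A..Z to 10..35."""
    s = "".join(iban.split()).upper()
    if not (15 <= len(s) <= 34) or not (s.isascii() and s.isalnum()):
        return False
    rearranged = s[4:] + s[:4]
    numeric = "".join(str(int(c, 36)) for c in rearranged)
    return int(numeric) % 97 == 1


def tckn_valid(number: str) -> bool:
    """TC Kimlik No check: 11 digits, non-zero first digit, two check digits."""
    if len(number) != 11 or not (number.isascii() and number.isdigit()):
        return False
    d = [int(c) for c in number]
    if d[0] == 0:
        return False
    odd_sum = d[0] + d[2] + d[4] + d[6] + d[8]
    even_sum = d[1] + d[3] + d[5] + d[7]
    if (odd_sum * 7 - even_sum) % 10 != d[9]:
        return False
    return sum(d[:10]) % 10 == d[10]
```

A matcher built on these would first find candidate strings in the extracted text (for example with a regular expression) and then keep only the candidates that pass the checksum, which cuts down false positives from random digit runs. This sketch checks the checksum only; it does not verify per-country IBAN lengths or card issuer prefixes.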

Todo

Protocols:

  • MS Exchange
  • POP / IMAP
  • MSNMS / Jabber

Matchers: