Project Start Date
June 30, 2012
GNU GENERAL PUBLIC LICENSE Version 3, 29 June 2007
All software is provided as is. All software is covered under the GPL and is free for public redistribution. If unintended consequences result from use of this software, the user bears the resulting outcome. As a rule of thumb, properly test and validate all solutions before deploying them in a production environment. All solutions should be subject to public scrutiny and peer review.
The project is being spun up using the Python programming language. If you can code proficiently in Python, feel free to fork the project from my GitHub account and begin contributing. The scope of the project is such that one person would be hard pressed to reach milestones, or to complete and maintain it alone. It will require a community effort of committed, dedicated people interested in furthering Computer Network Defense (CND).
This is an open source project working toward a real time OSINT threat stream drawn from social media and DarkNet media. The idea is to go beyond anything contributed so far within the community toward a solid, proactive, real time intelligence threat stream. The reasoning behind open sourcing the framework is that such technology should not be exclusive or hard to obtain, so that organizations have a tool set to build on. The solution is SIEM agnostic: it should fit into whatever SIEM a customer uses. In this instance we are leveraging the ArcSight SIEM.
The project leverages social media APIs and Python web crawlers/spiders to gather OSINT within DarkNet space such as TorWeb. Other DarkNet entities will be targeted where access can be gained or subverted.
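The crawler/spider side can be sketched as a breadth-first walk that stays on one host, which suits crawling a single hidden service. This minimal sketch is written for Python 3 (the 2.7 stack listed below would need different imports). The `fetch` callable is an assumption: in practice it would perform the HTTP request, for example through a local Tor client's SOCKS proxy (Tor listens on 127.0.0.1:9050 by default), and return HTML or `None`. Here an in-memory dictionary stands in so the sketch runs offline.

```python
# Sketch: breadth-first, same-host crawler for collecting pages to analyze.
# Python 3. fetch(url) is an assumed callable returning HTML text or None.
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collect href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed, fetch, max_pages=50):
    """Crawl pages on the seed's host; return {url: html} for fetched pages."""
    host = urlparse(seed).netloc
    seen = {seed}
    queue = deque([seed])
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:  # fetch failed or page unavailable
            continue
        pages[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        parser.close()
        for href in parser.links:
            # Resolve relative links and drop #fragments before de-duplicating.
            link, _fragment = urldefrag(urljoin(url, href))
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages


if __name__ == "__main__":
    # In-memory stand-in for a hidden service, so the sketch runs offline.
    site = {
        "http://example.onion/": '<a href="/a">A</a> <a href="http://other.onion/">x</a>',
        "http://example.onion/a": '<a href="/#top">home</a>',
    }
    print(sorted(crawl("http://example.onion/", site.get)))
```

Keeping the network call behind `fetch` means the same crawl logic can be pointed at the clear web, Tor, or a test fixture without changes.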
AI/ML/NLP - Data Mining
Machine Learning - Artificial Intelligence - Natural Language Processing
Moving forward in the CND space, systems are growing so complex that the traditional Security Operations Center (SOC) model of Computer Network Defense cannot provide adequate protection, either reactively or proactively. As complexity keeps increasing over a long time horizon, the number of people needed to mount traditional SOC CND operations will become unfeasible. Hence the need for Artificial Intelligence via Machine Learning and Natural Language Processing. All of the tool sets and frameworks are freely available, from Python to Disco, MemSQL, and ZeroMQ. Disco with MapReduce provides a highly cost effective way to add a real time, proactive cyber analytic capability to a SOC's Computer Network Defense game plan.
Using these tool sets, an attribute-enhanced, post-analytic data stream can be fed into a SIEM architecture, where higher level analytics such as decision trees, neural networks over the post-processed data, and Natural Language Processing can be applied. The goal is to identify, uncover, and discover strategic focal data elements within the threat space in real time: for example, hacker/actor sentiment toward initiating an active operation against an organization, trust relationships between hacker/actor groups, and natural language translation of foreign language sites so they can be monitored in real time for indicators of interest. The idea is to use cutting edge AI/ML/NLP technology to take Computer Network Defense to the next level: active, proactive Computer Network Defense.
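The MapReduce idea behind the Disco cluster can be illustrated without a cluster: a map step emits key/value pairs per record, a shuffle groups them by key, and a reduce step combines each group. The sketch below counts watchlist-term mentions per actor, one small example of surfacing indicators of interest. It runs in a single process; on Disco, comparable map and reduce functions would be submitted as a job instead. The watchlist terms and the `(actor, text)` record shape are hypothetical.

```python
# Sketch: the map/reduce pattern run in-process (no Disco cluster needed).
# WATCHLIST and the (actor, text) record shape are hypothetical.
from itertools import chain, groupby
from operator import itemgetter

WATCHLIST = {"ddos", "dox", "sqli", "target"}


def map_post(post):
    """Map: emit ((actor, term), 1) for each watchlist term in one post."""
    actor, text = post
    for word in text.lower().split():
        word = word.strip(".,!?:;\"'#@")
        if word in WATCHLIST:
            yield (actor, word), 1


def reduce_counts(pairs):
    """Shuffle + reduce: sort/group the emitted pairs by key, sum the counts."""
    for key, group in groupby(sorted(pairs), key=itemgetter(0)):
        yield key, sum(count for _key, count in group)


def run_job(posts):
    """Apply map to every record, then reduce the combined intermediate output."""
    return dict(reduce_counts(chain.from_iterable(map_post(p) for p in posts)))


if __name__ == "__main__":
    posts = [
        ("actor_a", "Planning #DDoS on target tonight"),
        ("actor_b", "time to dox them"),
        ("actor_a", "ddos again!"),
    ]
    for (actor, term), count in sorted(run_job(posts).items()):
        print(actor, term, count)
```

Because each `map_post` call depends only on its own record, the map step is what a cluster can spread across nodes; only the much smaller per-key totals need to be brought together.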
The idea is not to replace the human analytic elements within a SOC CND operation, but to increase the strategic real time detection ability of human SOC CND analysts by providing them with a real time, proactive OSINT threat stream that has already been reduced and pre-analyzed by the Disco cluster. As stated, the stream presents data focal points of strategic value that empower a proactive CND strategy within a CND SOC, in place of the current failed reactive CND strategy. Modernization of SOC CND operations will only be realized through a proactive CND strategy: hunting the hunters rather than being hunted by them. Better to be a hunter of hunters than the hunted.
VirtualBox Development VM Provided for Collaborative Development
You will need VirtualBox https://www.virtualbox.org/wiki/Downloads
Development VM Download Location
Development VM Authentication & Identity Information
username: disco, password: osint2012
username: root, password: osint2012
Development VM Additional Instructions
Additional instructions are at the following $PATH. Read the files at this location; they include instructions for committing to your fork of the master Git repository.
Development Stack Includes
Fedora 17
Native Python 2.7.3 with the simplejson, numpy, disco, and pandas modules installed
Erlang R15B01
Disco - latest build from Git
ZeroMQ
MemSQL - 10GB developer license
MySQL client
Komodo IDE - community edition
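Before working in the VM, it can help to confirm that the Python modules named in the stack actually import. The sketch below checks only the four Python modules listed (simplejson, numpy, disco, pandas); Erlang, ZeroMQ, MemSQL, the MySQL client, and Komodo are not Python modules and are not covered. The script is an illustration, not part of the VM's provided instructions, and uses only calls available in both Python 2.7 and 3.

```python
# Sketch: report which of the stack's Python modules can be imported.
import importlib

# Python modules named in the development stack list.
STACK_MODULES = ["simplejson", "numpy", "disco", "pandas"]


def check_modules(names):
    """Return {module_name: True/False} depending on whether it imports."""
    status = {}
    for name in names:
        try:
            importlib.import_module(name)
            status[name] = True
        except ImportError:
            status[name] = False
    return status


if __name__ == "__main__":
    for name, ok in sorted(check_modules(STACK_MODULES).items()):
        print("%-12s %s" % (name, "ok" if ok else "MISSING"))
```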
DISCO OSINT Real Time Threat Stream High Level View
DISCO OSINT Real Time Threat Stream Detailed Data Flow Architecture
DISCO OSINT Real Time Threat Stream Detailed Data Flow Architecture Continued
OSINT Real Time Threat Stream Tool Set
Python Programming Language
Disco => Python/Erlang MapReduce Parallel Processing Framework, 100% Python on the frontend http://www.discoproject.com/
MemSQL => World's Fastest Database
MemSQL is the fastest way to ingest large volumes of data while simultaneously analyzing that data in real time. http://memsql.com/
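MemSQL speaks the MySQL wire protocol, so from Python it would typically be reached through a MySQL DB-API driver such as MySQLdb (MySQL-python). Whatever the driver, high-volume ingest usually means sending rows in batches rather than committing one at a time. The sketch below shows that pattern with the standard DB-API `executemany`; it uses the standard library's `sqlite3` as an in-memory stand-in so it runs without a server, and the `posts` table and its columns are hypothetical. MySQL drivers use `%s` placeholders instead of `?`.

```python
# Sketch: batched ingest through the Python DB-API (executemany + commit per batch).
# sqlite3 stands in for the real data tier; the "posts" table is hypothetical.
import sqlite3
from itertools import islice

INSERT_SQL = "INSERT INTO posts (source, actor, body) VALUES (?, ?, ?)"


def batches(iterable, size):
    """Yield successive lists of up to `size` items from any iterable."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def ingest(conn, records, batch_size=500):
    """Insert (source, actor, body) tuples in batches; return rows inserted."""
    cur = conn.cursor()
    total = 0
    for chunk in batches(records, batch_size):
        cur.executemany(INSERT_SQL, chunk)
        conn.commit()  # one commit per batch instead of per row
        total += len(chunk)
    return total


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE posts (source TEXT, actor TEXT, body TEXT)")
    rows = (("twitter", "actor_%d" % (i % 7), "post %d" % i) for i in range(1234))
    print(ingest(conn, rows), conn.execute("SELECT COUNT(*) FROM posts").fetchone()[0])
```

Because `records` can be a generator, the crawler or API readers can stream rows straight into `ingest` without holding the whole collection in memory.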
Alternative Data Tiers to Explore
ZeroMQ => The Intelligent Transport Layer, faster than TCP http://www.zeromq.org
Architecture Health Monitoring
HTTP => urllib, urllib2, urllib3, requests, httplib, httplib2
JSON => simplejson, json-py, cjson, pandas DataFrame
XML => ElementTree, xml.dom, xml.dom.minidom, xml.sax, lxml, simplexml
RSS/ATOM => FeedParser
Stats/R => pandas
HTML => BeautifulSoup, html5lib
URL => cgi, urllib, urlparse, requests
ML => milk, pybrain, scikit-learn
NLP => NLTK
AI => FANN + Python, scikit-learn, MDP, LibSVM Python Interface
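Several of the libraries above cover the same step of turning raw feeds into records for the stream. As a minimal sketch using only the standard library (ElementTree and json, both from the lists above), the function below pulls the title, link, and publication date from each item of an RSS 2.0 feed. It does not handle Atom namespaces or malformed feeds, which is where FeedParser is the more robust choice; the sample feed is made up.

```python
# Sketch: parse RSS 2.0 items into JSON-ready dicts with the standard library.
import json
import xml.etree.ElementTree as ET


def rss_items(xml_text):
    """Return [{'title', 'link', 'published'}] for each <item> in the feed."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "published": item.findtext("pubDate", default=""),
        })
    return items


if __name__ == "__main__":
    sample = """<rss version="2.0"><channel><title>Example feed</title>
    <item><title>Post one</title><link>http://example.org/1</link>
    <pubDate>Sat, 30 Jun 2012 12:00:00 GMT</pubDate></item>
    <item><title>Post two</title><link>http://example.org/2</link></item>
    </channel></rss>"""
    print(json.dumps(rss_items(sample), indent=2))
```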