Proactive Computer Network Defense Strategy - OSINT Real Time Threat Stream - Social/DarkNet - Data Mining


Project Start Date

June 30, 2012

License

GNU GENERAL PUBLIC LICENSE Version 3, 29 June 2007

Disclaimer

All software is provided as is. All software is covered under the GPL license and is free for public redistribution. If unintended consequences occur through use of this software, the user bears the resulting outcome. As a rule of thumb, properly test and validate every solution before implementing it in a production environment. All solutions should be subject to public scrutiny and peer review.

Requirements

The project is being built in the Python programming language. If you can code proficiently in Python, feel free to fork the project from my GitHub account and begin contributing. The scope of the project is such that one person will be hard pressed to reach milestones, or to complete and maintain it, alone. It will require a community effort of committed, dedicated people interested in furthering Computer Network Defense.

Vision

This is an open source project working toward a real-time OSINT threat stream drawn from social media and DarkNet media. The idea is to go beyond anything contributed within the community so far toward a solid, proactive, real-time intelligence threat stream. The framework is open sourced on the premise that this technology should not be exclusive or hard to obtain, so that organizations have a tool set to build from. The solution is SIEM agnostic: it should fit into whatever SIEM solution a customer uses. In this instance we are leveraging the ArcSight SIEM.
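As a rough illustration of what "fitting into a SIEM" could look like, the sketch below renders one threat-stream record as a Common Event Format (CEF) line, the text format ArcSight connectors commonly ingest. Nothing here comes from the project itself: the vendor, product, and version strings, the function names, and the choice of extension keys are all hypothetical, and the project may hand data to the SIEM in a different way.

```python
def _escape_header(value):
    # CEF header fields escape backslash and pipe with a backslash.
    return str(value).replace("\\", "\\\\").replace("|", "\\|")


def _escape_extension(value):
    # CEF extension values escape backslash, equals sign, and newlines.
    text = str(value).replace("\\", "\\\\").replace("=", "\\=")
    return text.replace("\n", "\\n")


def to_cef(signature_id, name, severity, extension):
    """Render one record as a CEF:0 line (hypothetical vendor/product)."""
    header = [
        "CEF:0",
        _escape_header("OSINTCND"),      # hypothetical device vendor
        _escape_header("ThreatStream"),  # hypothetical device product
        _escape_header("0.1"),           # hypothetical device version
        _escape_header(signature_id),
        _escape_header(name),
        _escape_header(severity),
    ]
    # Sort the keys so the output is deterministic.
    ext = " ".join(
        "{0}={1}".format(key, _escape_extension(val))
        for key, val in sorted(extension.items())
    )
    return "|".join(header) + "|" + ext


if __name__ == "__main__":
    print(to_cef("paste-match", "Watchlist term seen", 7,
                 {"cs1": "pastebin", "msg": "term found in new paste"}))
```

Keeping the formatter separate from collection and analysis is one way to stay SIEM agnostic: a different SIEM would only need a different output function.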

Data Collection

Data is collected by leveraging social media APIs and by developing Python web crawlers/spiders that gather OSINT within DarkNet space such as TorWeb. Other DarkNet entities will be targeted where access can be gained or subverted.
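A minimal sketch of the link-following core of such a crawler, assuming pages have already been fetched as HTML text (fetching, Tor routing, and rate limiting are left out). The class and function names are hypothetical and are not taken from the project's crawlers. The sketch uses Python 3 standard library module names; on the Python 2.7 stack described in this README the equivalents are the `HTMLParser` and `urlparse` modules.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect absolute URLs from the href attribute of <a> tags."""

    def __init__(self, base_url):
        HTMLParser.__init__(self)
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(base_url, html):
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


def next_urls(base_url, html, seen):
    """Return links not crawled before and record them in `seen`."""
    fresh = []
    for link in extract_links(base_url, html):
        if link not in seen:
            seen.add(link)
            fresh.append(link)
    return fresh


if __name__ == "__main__":
    page = '<a href="/forum">forum</a> <a href="post?id=1">post</a>'
    print(next_urls("http://example.invalid/index.html", page, set()))
```

The `seen` set is what keeps a spider from revisiting pages; a real crawler would persist it between runs.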

AI/ML/NLP - Data Mining

Machine Learning - Artificial Intelligence - Natural Language Processing

Moving forward in the CND space, systems are increasing in complexity to the extent that the traditional Security Operations Center model of Computer Network Defense cannot provide adequate protection, reactively or proactively. As complexity increases over a long time horizon, the number of humans needed to mount traditional SOC CND operations will become unfeasible. Hence the need for Artificial Intelligence, via Machine Learning and Natural Language Processing.

All of the tool sets and frameworks are freely available, from Python, to Disco, to MemSql, to ZeroMQ. Disco with MapReduce provides a highly cost-effective way to add a real-time proactive cyber analytic capability to a Security Operations Center's Computer Network Defense game plan. Leveraging these tool sets, an attribute-enhanced, post-analytic data stream can be introduced into a SIEM architecture to support higher-level analytics on the post-processed data: Decision Trees, Neural Networks, and Natural Language Processing. The goal is to identify, uncover, and discover strategic data focal elements within the threat space in real time, such as hacker/actor sentiment toward initiating an active operation against an organization, trust relationships between hacker/actor groups, and natural language translation of foreign-language sites so they can be monitored in real time for indicators of interest. The idea is to use cutting-edge AI/ML/NLP technology to take your Computer Network Defense to the next level: an active Proactive Computer Network Defense.

The idea is not to replace the human analytic elements within a SOC CND operation, but to increase the strategic real-time detection ability of human SOC CND analysts by providing them with a real-time proactive OSINT threat stream that has already been reduced and pre-analyzed by the Disco cluster. As stated, this presents data focal points of strategic value that empower a Proactive CND Strategy within a CND SOC, versus the current failed Reactive CND Strategy. Modernization of SOC CND operations will only be realized through a Proactive CND Strategy: hunting the hunters rather than being hunted by them. Better to be a hunter of hunters than the hunted.
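To make the "reduced and pre-analyzed" step concrete, here is a minimal sketch of a MapReduce-shaped job in plain Python that counts watchlist terms across collected posts. It only mirrors the map/reduce structure; it does not use the Disco API, and the watchlist terms and function names are hypothetical.

```python
from collections import defaultdict

WATCHLIST = {"ddos", "dump", "exploit"}  # hypothetical terms of interest


def map_post(post):
    """Map step: emit (term, 1) for every watchlist term in one post."""
    for word in post.lower().split():
        term = word.strip(".,!?:;\"'()")
        if term in WATCHLIST:
            yield term, 1


def reduce_counts(pairs):
    """Reduce step: sum the counts emitted for each term."""
    totals = defaultdict(int)
    for term, count in pairs:
        totals[term] += count
    return dict(totals)


def run_job(posts):
    # On a cluster the map calls would run in parallel across nodes;
    # here they are simply chained in one process.
    pairs = (pair for post in posts for pair in map_post(post))
    return reduce_counts(pairs)


if __name__ == "__main__":
    sample = ["New exploit posted, DDoS planned.",
              "Dump of creds; exploit inside"]
    print(run_job(sample))
```

The per-term totals are the kind of reduced output an analyst or a SIEM rule could act on, rather than the raw posts themselves.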

VirtualBox Development VM Provided for Collaborative Development

You will need VirtualBox https://www.virtualbox.org/wiki/Downloads

Development VM Download Location

https://www.dropbox.com/sh/vnhy5wq0pamk2cm/s5mEys7OBr

Development VM Authentication & Identity Information

username: disco password: osint2012

username: root password: osint2012

Development VM Additional Instructions

Additional instructions are in the files at the following path. They include instructions for committing to your fork of the master Git repository.

/home/disco/Desktop

Development Stack Includes

Fedora 17

Native Python 2.7.3 with simplejson, numpy, disco, pandas modules installed

Erlang R15B01

Disco latest build from Git

ZeroMQ

MemSql - 10GB developer license

MySql client

Komodo IDE - community edition

Collaboration Contact

proactivecndosint2012@gmail.com

Diagrams

DISCO OSINT Real Time Threat Stream High Level View

http://goo.gl/jbMGU

DISCO OSINT Real Time Threat Stream Detailed Data Flow Architecture

http://goo.gl/oowkW

DISCO OSINT Real Time Threat Stream Detailed Data Flow Architecture Continued

http://goo.gl/FnCk1

OSINT Real Time Threat Stream Tool Set

Python Programming Language

www.python.org

DISCO

Python/Erlang MapReduce parallel processing framework, 100% Python on the frontend. http://www.discoproject.com/

MemSQL => World's Fastest Database

MemSQL is the fastest way to ingest large volumes of data while simultaneously analyzing that data in real time. http://memsql.com/

Alternative Data Tiers to Explore

VoltDB

http://voltdb.com/products-services/downloads

JustOneDB

http://www.justonedb.com/

Clustrix

http://www.clustrix.com/thanks/devkit

ZeroMQ

Billed as "Faster than TCP" and "The Intelligent Transport Layer". http://www.zeromq.org

Architecture Health Monitoring

Collectd

http://collectd.org/features.shtml

Python Modules

HTTP => urllib, urllib2, urllib3, requests, httplib, httplib2

JSON => simplejson, json-py, cjson, pandas DataFrame

XML => ElementTree, xml.dom, xml.dom.minidom, xml.sax, lxml, simplexml

RSS/ATOM => FeedParser

Stats/R => pandas

HTML => BeautifulSoup, html5lib

URL => cgi, urllib, urlparse, requests

ML => milk, pybrain, scikit-learn

NLP => NLTK

AI => FANN + Python, scikit-learn, MDP, LibSVM Python Interface
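As a small example of how modules from the list above might combine, the sketch below parses RSS 2.0 text with ElementTree and emits one JSON object per item, a shape that is easy to pass on to later processing. It uses only the standard library (`json` stands in for simplejson, which offers the same `dumps` call), and the function names and sample feed are hypothetical.

```python
import json
import xml.etree.ElementTree as ET


def rss_items(xml_text):
    """Parse RSS 2.0 text and return one dict per <item>."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            # Missing child elements become empty strings.
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
        })
    return items


def to_json_lines(items):
    """Serialize items as newline-delimited JSON with stable key order."""
    return "\n".join(json.dumps(item, sort_keys=True) for item in items)


if __name__ == "__main__":
    feed = ("<rss version='2.0'><channel><title>feed</title>"
            "<item><title>Paste A</title>"
            "<link>http://example.invalid/a</link></item>"
            "</channel></rss>")
    print(to_json_lines(rss_items(feed)))
```

For real-world feeds, which are often malformed, FeedParser from the list above is likely the more forgiving choice.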
