Jan Wrona edited this page Jul 28, 2016 · 9 revisions

Abstract

The SecurityCloud project aims to develop an innovative solution for distributed IP flow storage and an accompanying query engine. Moreover, the SecurityCloud collector offers high-availability features: there is no single point of failure, and the distributed collector is able to recover from the failure of a single component. The SecurityCloud collector is scalable and introduces very low overhead compared with solutions based on big data platforms. The SecurityCloud project is a joint effort of CESNET, Flowmon Networks and Masaryk University.

Architecture overview

The SecurityCloud collector is distributed flow-processing software. Its architecture is based on the master/slave model, although we usually use the proxy/subcollector terminology, where the proxy works as the master and each subcollector as a slave (worker). Both proxy and subcollector nodes perform flow reception/storage and flow query handling simultaneously.

The SecurityCloud collector consists of two core tools: IPFIXcol and fdistdump. IPFIXcol receives, distributes and stores flow data, while fdistdump executes ad hoc user queries on the stored data. IPFIXcol must be instantiated as a proxy on the proxy node and as a subcollector on each subcollector node. fdistdump is a command-line tool that uses MPI to communicate with the subcollectors during query execution and the libnf library to read flow data.

Please note that a computer may serve as a proxy, as a subcollector, or as both at the same time. The last option is not ideal, though, because a computer serving as both proxy and subcollector can easily become overloaded.

Flow records reception and storage

As you can see in the picture below, the primary task of the proxy node is to split the flow records arriving from the exporter (IPFIX or NetFlow) and distribute them among the subcollector nodes. Each subcollector works as an independent collector: it receives records from the proxy (IPFIX or NetFlow) and stores them on local storage.

[Figure: Flow storage]
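The following Python sketch models the reception and storage path described above. It is an illustration only: `Subcollector`, `Proxy` and the dict-based flow records are hypothetical stand-ins, not IPFIXcol's API. The proxy here hands out individual records round-robin (the scheme the query section refers to); the real IPFIXcol proxy may balance load at a different granularity.

```python
from itertools import cycle

# Hypothetical flow record: a plain dict with a few typical flow fields.
FlowRecord = dict


class Subcollector:
    """Stand-in for a subcollector node: stores the records it receives locally."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.storage: list[FlowRecord] = []  # stands in for the node's local disk

    def receive(self, record: FlowRecord) -> None:
        self.storage.append(record)


class Proxy:
    """Stand-in for the proxy node: splits the incoming stream round-robin."""

    def __init__(self, subcollectors: list[Subcollector]) -> None:
        self.subcollectors = subcollectors
        self._next = cycle(subcollectors)  # endless round-robin iterator

    def receive(self, record: FlowRecord) -> None:
        # Each record from the exporter goes to exactly one subcollector.
        next(self._next).receive(record)


# Usage: seven records from the exporter spread over three subcollectors.
nodes = [Subcollector(f"sub{i}") for i in range(3)]
proxy = Proxy(nodes)
for i in range(7):
    proxy.receive({"src_ip": "10.0.0.1", "dst_port": 80, "bytes": 100 * i})
print([len(n.storage) for n in nodes])  # [3, 2, 2]
```

Because every record lands on exactly one subcollector, each node holds a disjoint share of the traffic, which is what lets a query later be split into independent per-node sub-queries.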

Flow records query

Each flow query consists of four phases:

  1. The user creates a query and sends it to the proxy node.
  2. Flow records from the requested time interval are spread across all the subcollectors (round-robin), so the proxy node creates a sub-query on each of them (map phase).
  3. Each sub-query is processed independently, with all the subcollectors working at the same time. The results are then sent back to the proxy node (reduce phase).
  4. The proxy node waits for all the results, performs post-processing and presents the final result to the user.

Although we call them map and reduce phases, this is not the same as the MapReduce programming model known from the Hadoop framework.

[Figure: Flow query]
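The four query phases above can be sketched in Python as follows. This is a simplified model under several assumptions: each subcollector's local storage is a hypothetical in-memory list of dict records, threads stand in for nodes (fdistdump itself communicates over MPI), and the example query, "top N source IPs by bytes within a time window", is chosen only for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical local storage of three subcollectors (lists of flow records).
SUBCOLLECTOR_STORAGE = [
    [{"src_ip": "10.0.0.1", "bytes": 500, "ts": 10},
     {"src_ip": "10.0.0.2", "bytes": 200, "ts": 20}],
    [{"src_ip": "10.0.0.1", "bytes": 300, "ts": 15},
     {"src_ip": "10.0.0.3", "bytes": 900, "ts": 99}],
    [{"src_ip": "10.0.0.2", "bytes": 400, "ts": 30}],
]


def sub_query(records, t_from, t_to):
    """Runs on one subcollector: filter by time, sum bytes per source IP."""
    partial = Counter()
    for rec in records:
        if t_from <= rec["ts"] < t_to:
            partial[rec["src_ip"]] += rec["bytes"]
    return partial


def query(t_from, t_to, top_n):
    """Proxy side: fan out sub-queries, collect partial results, post-process."""
    # Steps 2-3: one sub-query per subcollector, all running concurrently;
    # pool.map returns only after every partial result has come back.
    with ThreadPoolExecutor(max_workers=len(SUBCOLLECTOR_STORAGE)) as pool:
        partials = list(pool.map(lambda recs: sub_query(recs, t_from, t_to),
                                 SUBCOLLECTOR_STORAGE))
    # Step 4: merge the partial aggregates and keep the top N for the user.
    total = Counter()
    for partial in partials:
        total.update(partial)  # adds counts key by key
    return total.most_common(top_n)


print(query(0, 50, top_n=2))  # [('10.0.0.1', 800), ('10.0.0.2', 600)]
```

In this sketch each subcollector returns its full per-IP aggregate rather than only its local top N: since a source IP's traffic can be split across nodes, merging local top-N lists could miss an address that ranks highly only in total.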