Cybersec Toolkit

Overview

Enterprises deploy many point solutions to defend their networks. These solutions produce a wealth of data about enterprise assets and networks, but the data is difficult to analyze because there is no common repository and the events arrive in different formats. The Cybersec Toolkit is a pipeline that ingests, correlates, and prepares cybersecurity data for analytics. It leverages the Cloudera Data Platform to build a Security Data Lakehouse.

The Cybersec Toolkit ingests raw log events from a variety of sources, parses and normalizes them using a common schema, enriches them with reference data, scores and profiles them, and streams them to Kafka and a data lakehouse. To integrate with orchestration, investigation, or ticketing platforms, use Flink SQL (SQL Stream Builder) on the triaged event topic. Query the data lakehouse with SQL for visualizations and ad hoc queries, or with Spark for notebooks, investigations, and machine learning model training.

The Cybersec Toolkit is flexible and configurable, so ingestion can be changed with little or no code.

Ingestion Stages

  1. Parse
  2. Triage
  3. Index
  4. Profile

Tools

  1. Batch Enrichment Load
  2. Upsert Scoring Command
  3. Event Generation

Packaging

The Cybersec Toolkit includes a Cloudera Manager parcel and service for easier installation.

  1. Parcel
  2. Cloudera Service

Building from Source

Clone the repository

git clone https://github.com/cloudera/cybersec.git

Build with tests

cd cybersec/flink-cyber
mvn clean install

Build without running tests

cd cybersec/flink-cyber
mvn clean install -DskipTests
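
Build helper (sketch)

The clone and build steps above could be wrapped in a small shell helper. This is a minimal sketch, not part of the repository: the function names maven_args and build_cybersec are hypothetical, and it assumes git and mvn are on the PATH and that the clone should land in the current directory.

```shell
# Hypothetical helper functions; names are illustrative only.
REPO_URL="https://github.com/cloudera/cybersec.git"

# Print the Maven arguments for a build; pass "skip" to skip the tests.
maven_args() {
  if [ "${1:-}" = "skip" ]; then
    echo "clean install -DskipTests"
  else
    echo "clean install"
  fi
}

# Clone the repository if it is not already present, then build flink-cyber.
build_cybersec() {
  command -v git >/dev/null || { echo "git not found" >&2; return 1; }
  command -v mvn >/dev/null || { echo "mvn not found" >&2; return 1; }
  [ -d cybersec ] || git clone "$REPO_URL" || return 1
  # Word splitting of the argument string is intentional here.
  (cd cybersec/flink-cyber && mvn $(maven_args "${1:-}"))
}
```

After sourcing these definitions, build_cybersec runs the build with tests and build_cybersec skip runs it without them. Keeping the argument selection in its own function makes that choice easy to check without starting a Maven build.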