Substation

substation logo

Substation is a toolkit for creating highly configurable, no-maintenance, and cost-efficient serverless data pipelines.

What is Substation?

Originally designed to collect, normalize, and enrich security event data, Substation provides methods for achieving high-quality data through interconnected, serverless data pipelines.

Substation also provides Go packages for filtering and modifying JSON data.
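
As a rough illustration of what that means, the standalone sketch below filters and modifies a JSON event using only the Go standard library; it does not use Substation's packages or API, and the event and field names are made up.

package main

import (
	"encoding/json"
	"fmt"
)

// A minimal sketch of filtering and modifying a JSON event. This is not
// Substation's API; it only illustrates the kind of operation the toolkit
// performs on JSON data.
func main() {
	event := []byte(`{"action":"login","src_ip":"10.0.0.1"}`)

	var data map[string]any
	if err := json.Unmarshal(event, &data); err != nil {
		panic(err)
	}

	// Filter: drop events that are missing a required field.
	if _, ok := data["src_ip"]; !ok {
		return
	}

	// Modify: rename a field to a normalized name and add metadata.
	data["source.ip"] = data["src_ip"]
	delete(data, "src_ip")
	data["event.kind"] = "event"

	out, _ := json.Marshal(data)
	fmt.Println(string(out))
}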

Features

As an event-driven ingest, transform, and load application, Substation has these features:

  • real-time event filtering and processing
  • cross-dataset event correlation and enrichment
  • concurrent event routing to downstream systems (see the sketch after this list)
  • runs on containers, built for extensibility
    • support for new event filters and processors
    • support for new ingest sources and load destinations
    • supports creation of custom applications (e.g., multi-cloud)
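
The sketch below is a loose illustration of the concurrent routing idea (one producer fanning events out to several downstream sinks); it is not Substation's implementation, and the sink names are invented.

package main

import (
	"fmt"
	"sync"
)

// A rough sketch of fan-out routing: every event is delivered to every
// downstream "sink" concurrently.
func main() {
	events := make(chan string)
	sinks := map[string]chan string{
		"warehouse": make(chan string, 8),
		"siem":      make(chan string, 8),
	}

	var wg sync.WaitGroup
	for name, ch := range sinks {
		wg.Add(1)
		go func(name string, ch chan string) {
			defer wg.Done()
			for ev := range ch {
				fmt.Printf("%s received: %s\n", name, ev)
			}
		}(name, ch)
	}

	// Route events to all sinks, then close them when the source is drained.
	go func() {
		for ev := range events {
			for _, ch := range sinks {
				ch <- ev
			}
		}
		for _, ch := range sinks {
			close(ch)
		}
	}()

	events <- `{"action":"login"}`
	events <- `{"action":"logout"}`
	close(events)
	wg.Wait()
}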

As a package, Substation has these features:

Use Cases

Substation was originally designed to support the mission of achieving high-quality data for threat hunting, threat detection, and incident response, but it can be used to move data between many distributed systems and services. Here are some example use cases:

  • data availability: sink data to an intermediary streaming service such as AWS Kinesis, then concurrently sink it to a data lake, data warehouse, and SIEM
  • data consistency: normalize data across every dataset using a permissive schema such as the Elastic Common Schema
  • data completeness: enrich data by integrating AWS Lambda functions and building self-populating AWS DynamoDB tables for low-latency, real-time event context (see the sketch after this list)
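
As a loose illustration of the enrichment use case, the sketch below merges context from an in-memory lookup table into an event; in a real pipeline the lookup would be something like the self-populating DynamoDB table mentioned above, and all names here are illustrative.

package main

import (
	"encoding/json"
	"fmt"
)

// contextTable stands in for a metadata lookup such as a DynamoDB table.
// The keys and values are made up for this example.
var contextTable = map[string]map[string]string{
	"10.0.0.1": {"host.name": "laptop-01", "host.owner": "alice"},
}

func main() {
	event := []byte(`{"action":"login","source.ip":"10.0.0.1"}`)

	var data map[string]any
	if err := json.Unmarshal(event, &data); err != nil {
		panic(err)
	}

	// Enrich: if the source IP is known, merge its context into the event.
	if ip, ok := data["source.ip"].(string); ok {
		for k, v := range contextTable[ip] {
			data[k] = v
		}
	}

	out, _ := json.Marshal(data)
	fmt.Println(string(out))
}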

Example Data Pipelines

Simple

The simplest data pipeline is one with a single source (ingest), a single transform, and a single sink (load). The diagram below shows pipelines that ingest data from different sources and sink it unmodified to a data warehouse where it can be used for analysis.

graph TD
    sink(Data Warehouse)

    %% pipeline one
    source_a(HTTPS Source)
    processing_a[Transfer]

    %% flow
    subgraph pipeline X
    source_a ---|Push| processing_a
    end

    processing_a ---|Push| sink

    %% pipeline two
    source_b(Data Lake)
    processing_b[Transfer]

    %% flow
    subgraph pipeline Y
    source_b ---|Pull| processing_b
    end

    processing_b ---|Push| sink

Complex

The complexity of a data pipeline, including its features and how it connects with other pipelines, is up to the user. The diagram below shows two complex data pipelines that have these features:

  • both pipelines write unmodified data to intermediary streaming data storage (e.g., AWS Kinesis) to support concurrent consumers and downstream systems
  • both pipelines transform data by enriching it from their own intra-pipeline metadata lookup (e.g., AWS DynamoDB)
  • pipeline Y additionally transforms data by enriching it from pipeline X's metadata lookup
graph TD

    %% pipeline a
    source_a_http(HTTPS Source)
    sink_a_streaming(Streaming Data Storage)
    sink_a_metadata(Metadata Lookup)
    sink_a_persistent[Data Warehouse]
    processing_a_http[Transfer]
    processing_a_persistent[Transform]
    processing_a_metadata[Transform]

    %% flow
    subgraph pipeline Y
    source_a_http ---|Push| processing_a_http
    processing_a_http ---|Push| sink_a_streaming
    sink_a_streaming ---|Pull| processing_a_persistent
    sink_a_streaming ---|Pull| processing_a_metadata
    processing_a_persistent---|Push| sink_a_persistent
    processing_a_persistent---|Pull| sink_a_metadata
    processing_a_metadata ---|Push| sink_a_metadata
    end

    processing_a_persistent ---|Pull| sink_b_metadata

    %% pipeline b
    source_b_http(HTTPS Source)
    sink_b_streaming(Streaming Data Storage)
    sink_b_metadata(Metadata Lookup)
    sink_b_persistent(Data Warehouse)
    processing_b_http[Transfer]
    processing_b_persistent[Transform]
    processing_b_metadata[Transform]

    %% flow
    subgraph pipeline X
    source_b_http ---|Push| processing_b_http
    processing_b_http ---|Push| sink_b_streaming
    sink_b_streaming ---|Pull| processing_b_persistent
    sink_b_streaming ---|Pull| processing_b_metadata
    processing_b_persistent---|Push| sink_b_persistent
    processing_b_persistent---|Pull| sink_b_metadata
    processing_b_metadata ---|Push| sink_b_metadata
    end

As a toolkit, Substation makes no assumptions about how data pipelines are configured and connected. We encourage experimentation and outside-the-box thinking when it comes to pipeline design!

Quickstart

Follow the steps below to test Substation's functionality. We recommend running them in a Docker container (we've included Visual Studio Code configurations for developing and testing Substation in .devcontainer/ and .vscode/).

Step 0: Set Environment Variable

export SUBSTATION_ROOT=/path/to/repository

Step 1: Compile the File Binary

Run the commands below to compile the Substation file app.

cd $SUBSTATION_ROOT/cmd/file/substation/ && \
go build . && \
./substation -h

Step 2: Compile the quickstart Configuration File

Run the command below to compile the quickstart Jsonnet configuration files into a Substation JSON config.

cd $SUBSTATION_ROOT && \
sh build/scripts/config/compile.sh

Step 3: Test Substation

Run the command below to test Substation.

After this, we recommend reviewing the config documentation and running more tests with other event processors to learn how the app works.

cd $SUBSTATION_ROOT && \
./cmd/file/substation/substation -input examples/quickstart/data.json -config examples/quickstart/config.json

Users can continue exploring the system by iterating on the quickstart config, building and running custom example applications, and deploying a data pipeline in AWS.

Additional Documentation

More documentation about Substation can be found across the project.

Licensing

Substation and its associated code are released under the terms of the MIT License.
