
Component: Topologies

Erik Novak edited this page Nov 11, 2020 · 1 revision

A topology is an organization of nodes into a graph that determines the paths along which messages travel. These nodes are of two types:

  • Bolts. The nodes in a topology that receive input data from other nodes and emit new data into the topology. Read more in Component: Bolts.

  • Spouts. The nodes in a topology that read data from external sources and emit that data into the topology. Read more in Component: Spouts.
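To make the spout/bolt relationship concrete, here is a minimal TypeScript sketch that models a topology as a directed graph. It is not the project's or qtopology's configuration format: the `SpoutNode`, `BoltNode`, and `messageOrder` names are hypothetical, and the sketch assumes each bolt simply lists the names of the nodes it reads from. It checks that every input refers to an existing node and lists the nodes from upstream to downstream.

```typescript
// Hypothetical, simplified model of a topology (not the qtopology config format).
// Spouts have no inputs; bolts list the nodes whose messages they consume.
interface SpoutNode {
  kind: "spout";
  name: string;
}

interface BoltNode {
  kind: "bolt";
  name: string;
  inputs: string[]; // names of upstream spouts or bolts
}

type TopologyNode = SpoutNode | BoltNode;

// Orders node names so that every node appears after all of its inputs
// (Kahn's algorithm). Throws on duplicate names, unknown inputs, or cycles.
function messageOrder(nodes: TopologyNode[]): string[] {
  const names = new Set(nodes.map((n) => n.name));
  if (names.size !== nodes.length) {
    throw new Error("duplicate node names");
  }
  const indegree = new Map<string, number>();
  const downstream = new Map<string, string[]>();
  for (const n of nodes) {
    indegree.set(n.name, 0);
    downstream.set(n.name, []);
  }
  for (const n of nodes) {
    if (n.kind !== "bolt") continue; // spouts have no inputs
    for (const src of n.inputs) {
      if (!names.has(src)) {
        throw new Error(`bolt "${n.name}" reads from unknown node "${src}"`);
      }
      downstream.get(src)!.push(n.name);
      indegree.set(n.name, indegree.get(n.name)! + 1);
    }
  }
  // Start from nodes that consume nothing (normally the spouts).
  const queue = nodes.filter((n) => indegree.get(n.name) === 0).map((n) => n.name);
  const order: string[] = [];
  while (queue.length > 0) {
    const current = queue.shift()!;
    order.push(current);
    for (const next of downstream.get(current)!) {
      const remaining = indegree.get(next)! - 1;
      indegree.set(next, remaining);
      if (remaining === 0) queue.push(next);
    }
  }
  // Nodes left unvisited sit on a cycle; this sketch treats that as an error.
  if (order.length !== nodes.length) {
    throw new Error("topology contains a cycle");
  }
  return order;
}

// Hypothetical three-node pipeline: one spout feeding two chained bolts.
const example: TopologyNode[] = [
  { kind: "spout", name: "material-reader" },
  { kind: "bolt", name: "text-extractor", inputs: ["material-reader"] },
  { kind: "bolt", name: "index-writer", inputs: ["text-extractor"] },
];
console.log(messageOrder(example).join(" -> "));
// material-reader -> text-extractor -> index-writer
```

Rejecting cycles is a choice made for this sketch only; this page does not say whether a real topology may contain them.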

This page describes the topologies currently available in the project.

Prerequisites

NOTE: Before running a topology, make sure that all environment variables required by the chosen topology are set.
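A small pre-flight check can catch missing settings before the pipeline starts. This is a hedged sketch: `missingEnvVars` is a hypothetical helper, and `KAFKA_HOST` and `ELASTICSEARCH_HOST` are placeholder names, not the variables the project actually requires.

```typescript
// Hypothetical pre-flight check: returns the required variables that are
// unset or contain only whitespace.
function missingEnvVars(
  required: readonly string[],
  env: Record<string, string | undefined>,
): string[] {
  return required.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}

// Placeholder names; replace them with the ones your chosen topology needs.
const missing = missingEnvVars(["KAFKA_HOST", "ELASTICSEARCH_HOST"], process.env);
if (missing.length > 0) {
  console.error(`Missing environment variables: ${missing.join(", ")}`);
}
```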

Running a Topology

The framework is designed to make it easy to create document and text processing pipelines. Users can modify existing pipelines or create their own by defining a topology file in the /topologies folder. The topology can then be started with the following commands:

# navigate into the built distribution folder
cd ./dist
# start the pipeline
node ./pipeline -tn {topology-name} -tp {topology-path}

Here, topology-name is user-defined and can be any string, while topology-path is the relative path to the topology file (in the example below, relative to ./dist).

For example, the command

node ./pipeline -tn uuid.index-material-text -tp ../topologies/index-material-text

runs the topology that receives text material metadata from the corresponding Kafka topic and processes the materials, without translating them, so that they can be indexed in Elasticsearch.
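The same command can also be launched from a Node.js script, for example to start topologies from a custom launcher. This is a hedged sketch: `pipelineArgs` and `startTopology` are hypothetical helpers, and prefixing the name with a random UUID is only an assumption suggested by the `uuid.index-material-text` example name. The `startTopology` call is left commented out because it actually starts the pipeline.

```typescript
import { spawn } from "node:child_process";
import { randomUUID } from "node:crypto";

// Builds the arguments for `node ./pipeline -tn <name> -tp <path>`.
function pipelineArgs(topologyName: string, topologyPath: string): string[] {
  return ["./pipeline", "-tn", topologyName, "-tp", topologyPath];
}

// Hypothetical helper: runs the pipeline with the current Node.js binary from
// the built distribution folder, so topologyPath is resolved relative to it.
function startTopology(distDir: string, topologyName: string, topologyPath: string) {
  return spawn(process.execPath, pipelineArgs(topologyName, topologyPath), {
    cwd: distDir,
    stdio: "inherit", // forward the pipeline's output to this terminal
  });
}

// Assumption: a UUID prefix mirrors the `uuid.` in the example name; any name works.
const name = `${randomUUID()}.index-material-text`;
console.log(["node", ...pipelineArgs(name, "../topologies/index-material-text")].join(" "));
// startTopology("./dist", name, "../topologies/index-material-text");
```

Using `process.execPath` instead of a bare `node` keeps the child on the same Node.js version as the launcher.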

NOTE: To add new spouts and bolts, see the list of available components in the qtopology documentation.

Available Topologies
