How to Build a Pipeline

Erik Heller edited this page Apr 30, 2019 · 13 revisions

Once you have followed the How to Add a Service tutorial and connected your service to the ROXcomposer, this tutorial shows you how to build a pipeline with it.

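In this tutorial you will learn what a pipeline is, how to build one using the REST API or the GUI, how to add pipeline parameters, and how to save and restore sessions.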

What is a Pipeline

If you want to use your services, you will have to build pipelines that contain them, and then send messages with your data to the pipeline. Each service will process the data and pass it on to the next service in the pipeline.
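
The idea of data flowing from one service to the next can be sketched in a few lines of plain Python. This is only a conceptual illustration, not how the ROXcomposer is implemented: ordinary functions stand in for services, and the names `run_pipeline`, `strip_whitespace` and `to_lowercase` are made up for this example.

```python
from functools import reduce
from typing import Callable, Sequence


def run_pipeline(services: Sequence[Callable[[str], str]], message: str) -> str:
    """Pass the message through each service in order and return the final result."""
    return reduce(lambda data, service: service(data), services, message)


# Two stand-in "services" (hypothetical, for illustration only)
def strip_whitespace(text: str) -> str:
    return text.strip()


def to_lowercase(text: str) -> str:
    return text.lower()


result = run_pipeline([strip_whitespace, to_lowercase], "  Hello ROXcomposer  ")
print(result)  # hello roxcomposer
```

The order of the list matters: each service receives the output of the one before it.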

Use Case

Pipelines are useful if you want to reuse a service for different purposes. You might want to use a service that converts a text to lowercase in a sentiment analysis pipeline, but you can also reuse it for a topic modeling pipeline.
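
As a rough sketch of this reuse, the two pipelines could be described with the same name/services structure that the set_pipeline endpoint expects later in this tutorial. The service and pipeline names below are hypothetical; the services would have to exist and be running on your ROXcomposer.

```python
# Hypothetical service names; only the name/services shape comes from this tutorial.
sentiment_pipe = {"name": "sentiment_pipe", "services": ["lowercase", "sentiment_analyzer"]}
topic_pipe = {"name": "topic_pipe", "services": ["lowercase", "topic_modeler"]}


def shared_services(*pipes):
    """Return the service names that appear in every given pipeline definition."""
    common = set(pipes[0]["services"])
    for pipe in pipes[1:]:
        common &= set(pipe["services"])
    return sorted(common)


print(shared_services(sentiment_pipe, topic_pipe))  # ['lowercase']
```

The lowercase service is started once and referenced by both pipelines.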

Note, however, that you do not have to create a new service if you want to use the Labelizer with a different label file. To learn how to avoid that, read about pipeline parameters.

Good to Know

You can build a pipeline using the REST API or the GUI, but for either option you have to make sure that the services you want in your pipeline are running on the ROXcomposer.
This is because the ROXcomposer does not know about a service until it is started.

Building your pipeline anew each time you want to use the ROXcomposer can be time-consuming, which is why you have the option to save your session and load it afterwards.
A session is simply a JSON file containing information about the current pipelines and services on the composer and their parameters. Read more about this here.

How to Build a Pipeline Using REST API

To build a pipeline with the REST API, first start your service in the same way as you learned in the previous tutorial, then send the pipeline definition to the set_pipeline endpoint:

import json
import requests

# service parameters
labelizer_json = {
     "path": "/home/janabecker/PycharmProjects/jotb-services/jotb/services/topic_matcher.py",
     "params": {
         "ip": "127.0.0.1",
         "port": 4010,
         "name": "labelizer",
         "filepath": "/path/to/file/labeled_words.json"
     }
}

# send request to start_service endpoint
response = requests.post("http://localhost:7475/start_service", data=json.dumps(labelizer_json),
                         headers={"content-type": "application/json"})
print(response.text)

# the endpoint needs info about pipe name and the services that are in the pipeline (order is important)
pipe_data = {"name": "labelizer_pipe", "services": ["labelizer"]}

# send to the endpoint set_pipeline
response = requests.post("http://localhost:7475/set_pipeline", data=json.dumps(pipe_data),
                         headers={"content-type": "application/json"})
print(response.text)

This should yield the following responses:

{"message":"service [labelizer] created"}
{"message":"pipeline [labelizer_pipe] created"}
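
If you want your script to check these responses instead of only printing them, a small helper along the following lines could be used. It is a sketch that assumes success messages always follow the `<kind> [<name>] created` pattern of the two responses shown above; other responses, such as error messages, are not covered by this tutorial. The function name `created_ok` is made up for this example.

```python
import json


def created_ok(response_text, kind, name):
    """Return True if the response body reports that the given service or pipeline was created.

    Assumption: success messages look like '<kind> [<name>] created',
    as in the two responses shown in this tutorial.
    """
    try:
        message = json.loads(response_text).get("message", "")
    except (json.JSONDecodeError, AttributeError):
        # not valid JSON, or valid JSON that is not an object
        return False
    return message == f"{kind} [{name}] created"


print(created_ok('{"message":"service [labelizer] created"}', "service", "labelizer"))  # True
```

In the script above you would call it as `created_ok(response.text, "pipeline", "labelizer_pipe")` after the set_pipeline request.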

How to Build a Pipeline Using GUI

To build a pipeline in the GUI, navigate to the pipelines page and click the 'add pipeline' button.

add new pipe

Change the name of your pipeline (pipelines with the same name are overwritten).

edit name

Now add your running labelizer using the search bar in the lower left corner.

add service

Now you have a rudimentary pipeline containing the labelizer. Save your changes.

save

Nice! You're finally ready to send some data to the labelizer and see what comes out. Follow the How To Send a Message tutorial to learn how to do that.

Pipeline Parameters

An important aspect of pipelines is that you can add pipeline-specific parameters for each service. Parameters are a list of strings. You can encode a parameter in any way you like; this tutorial uses a key=value format, which is easy to parse.

Parameters can be useful if you do not want to create a new Labelizer each time you want to change the label file. You can build different pipelines, e.g. one for labeling vegetable-related texts with a label file containing labels for different vegetables and one for fruit-related texts with another label file, and simply add a parameter with the path to the respective label file for each pipeline.
Of course, you will need to change your labelizer code slightly so that the file is read from the pipeline parameter and not from the service-specific parameters JSON file.

To add custom parameters in the GUI click the small plus button on the labelizer card:

add parameter

Then change its content to a key=value string containing the path of the label file, for example file=path/to/labelfile.

edit parameter

Don't forget to save the changes.
Generally, if your service was already running and you have changed its code, you need to restart the service for the changes to reach the composer.

Now you will need to change the on_message function of your labelizer so that it can parse the parameter:

def on_message(self, msg, msg_id, parameters=None):
    labelfile = ""
    for parameter in parameters or []:
        # pipeline parameters use the key=value format, e.g. "file=path/to/labelfile"
        if parameter.startswith("file="):
            # split only on the first "=" so that paths containing "=" stay intact
            labelfile = parameter.split("=", 1)[1]
    if labelfile:
        # get the labels and their associated terms
        label_data = self.get_label_data(labelfile)
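
If a service accepts more than one parameter, the parsing step in on_message can be pulled out into a small helper that turns the whole parameter list into a dictionary. The following is a minimal sketch, assuming every parameter uses the key=value format chosen in this tutorial; the function name `parse_parameters` and the `lang` key are made up for illustration.

```python
def parse_parameters(parameters):
    """Turn a list of 'key=value' strings into a dict, ignoring malformed entries."""
    parsed = {}
    for parameter in parameters or []:
        # partition splits on the first "=" only, so values may contain "=" themselves
        key, separator, value = parameter.partition("=")
        if separator:  # entries without "=" are skipped
            parsed[key.strip()] = value.strip()
    return parsed


params = parse_parameters(["file=/path/to/labels=v2.json", "lang=en", "malformed"])
print(params)  # {'file': '/path/to/labels=v2.json', 'lang': 'en'}
```

Inside on_message you could then write `labelfile = parse_parameters(parameters).get("file", "")`.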

Sending the Parameters via REST API

You can also send the parameters using the REST API. Create a JSON object containing a service key (the service name) and a parameters key (a list of parameter strings), and use it in place of the plain service name in the pipeline definition:

import json
import requests

# define labelizer with pipeline specific parameters
labelizer = {
    "service": "labelizer",
    "parameters": ["file=path/to/labelfile"]
}

pipe_data = {"name": "labelizer_pipe", "services": [labelizer]}

# create pipeline
response = requests.post("http://localhost:7475/set_pipeline", data=json.dumps(pipe_data),
                         headers={"content-type": "application/json"})
print(response.text)
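
To build several pipelines that differ only in their parameters, such as the vegetable and fruit pipelines mentioned earlier, the payloads can be generated with two small helpers. This is a sketch: the helper names, pipeline names and label file paths are hypothetical, and only the service/parameters and name/services structure is taken from the example above.

```python
import json


def service_entry(name, **parameters):
    """Describe a service together with its pipeline-specific key=value parameters."""
    return {"service": name, "parameters": [f"{key}={value}" for key, value in parameters.items()]}


def pipeline_payload(name, services):
    """Build the body for the set_pipeline endpoint; the order of services is preserved."""
    return {"name": name, "services": list(services)}


# Hypothetical label files, one per pipeline
vegetable_pipe = pipeline_payload(
    "vegetable_pipe", [service_entry("labelizer", file="/path/to/vegetable_labels.json")]
)
fruit_pipe = pipeline_payload(
    "fruit_pipe", [service_entry("labelizer", file="/path/to/fruit_labels.json")]
)

print(json.dumps(fruit_pipe))
```

Each payload can then be sent to set_pipeline exactly as in the request above. Both pipelines use the same running labelizer service.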

Save and Restore Sessions

You do not have to rebuild your pipeline manually every time you use the ROXcomposer. Simply use the restore session functionality.

First, you will need a session object. Download it by clicking 'save session' in the lower right corner:

save session

Now you have a session that contains your pipelines and any parameters that you added to your services.
To restore the session, click the 'load session' button next to the 'save session' button. Choose the file that you downloaded and your session will be restored.

Note that the services used in that session must be running and must have the same names in the current ROXcomposer-GUI instance; otherwise the session cannot be restored.

It is also highly recommended to rename the downloaded session file to something meaningful (like labeling_session.json), so that you will be able to find the right session later on.
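
Renaming can also be scripted. The sketch below copies a downloaded session file to a meaningful name after checking that it is valid JSON. It makes no assumption about the internal structure of the session file; the function name `archive_session` and all paths are hypothetical.

```python
import json
import shutil
from pathlib import Path


def archive_session(downloaded, target_dir, label):
    """Copy a downloaded session file to <target_dir>/<label>_session.json.

    Raises json.JSONDecodeError if the file is not valid JSON.
    """
    source = Path(downloaded)
    with source.open(encoding="utf-8") as handle:
        json.load(handle)  # validation only; the content is not inspected further
    destination = Path(target_dir) / f"{label}_session.json"
    destination.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(source, destination)
    return destination


# Example call (paths are placeholders):
# archive_session("/path/to/downloaded/session.json", "/path/to/sessions", "labeling")
```

The returned path points to the renamed copy, which you can select later with the 'load session' button.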

