An orchestrator for ease of workflow management #47

Closed
s-minoo opened this issue Jun 16, 2022 · 9 comments
Labels
challenge technical problem applied to a use case proposal: changes needed 👷

Comments

@s-minoo

s-minoo commented Jun 16, 2022

This challenge has been split into 3 separate challenges: #50 #51 #52

Pitch

Undoubtedly, data will flow from pod to pod in the Solid ecosystem. Applications can create ad-hoc solutions to fetch and transfer data from one pod to another; however, interoperable orchestration of those data flows makes the solution more scalable. Think, for example, of a workflow that extracts Strava data from the Strava API, maps it to RDF with the RMLStreamer as an LDES data stream, and then bucketizes that stream to, for example, create aggregated statistics of how many runs you did last week, how many kilometers you covered, and so on. Without an orchestration component, this flow has to be re-implemented for each use case, again and again. An implementation-independent, interoperable solution is needed.

Existing workflow management frameworks, such as NiFi, Oozie, Airflow, and Dagster, restrict users to the context of the framework, be it in terms of programming language, limited API extensibility, or a fixed orchestration mechanism. DSL-based workflow management tools such as Toil and Snakemake, on the other hand, are limited in the tasks they support, which include only Bash scripts.

Nextflow solves the aforementioned problems of workflow management systems; however, it only supports file-based channels for data transfer. It cannot set up a workflow whose processors use arbitrary channels, such as Kafka, for data transfer.

The aforementioned tools also lack semi-automatic generation of a workflow plan and require the user to define the workflow plan explicitly.
Therefore, a generic and modular orchestrator that manages not only workflows but also the orchestration of different microservices/apps would be beneficial in the context of Solid, for example for setting up and orchestrating the different components needed for LDES generation. Furthermore, this would provide a strong foundation for a more modular data processing workflow architecture that does not rely on an existing data processing tech stack.

Desired solution

  • A model to describe the processors, channels and data types
    • Could be an extension of RML's Logical Source/Target
  • Simple configuration to set up the processors/apps/channels in the workflow
  • Semi-automatic generation of the workflow plan
  • Distribution of the workflow plan with any orchestration framework (e.g. Kubernetes)
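
To make the first bullet concrete, here is a minimal sketch of what such a model could look like. It is not the proposed ontology: the class names, field names, script names, and the `undeclared_channels` helper are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Channel:
    """A transport between processors, e.g. Kafka or plain files (hypothetical model)."""
    name: str
    start_script: str  # starts the service backing the channel, e.g. a Kafka broker
    stop_script: str


@dataclass(frozen=True)
class Processor:
    """One step in a workflow (hypothetical model)."""
    name: str
    start_script: str
    channels: frozenset        # names of the channel types the processor can use
    serializations: frozenset  # data formats the processor can read or write
    stop_script: Optional[str] = None  # the stop script is optional


def undeclared_channels(processors, channels):
    """Map each processor name to the channel names it uses that have no channel config."""
    known = {channel.name for channel in channels}
    missing = {}
    for processor in processors:
        unknown = set(processor.channels) - known
        if unknown:
            missing[processor.name] = unknown
    return missing


kafka = Channel("kafka", "start-kafka.sh", "stop-kafka.sh")
mapper = Processor(
    "rmlstreamer",
    "start-mapper.sh",
    frozenset({"kafka", "file"}),
    frozenset({"json", "n-quads"}),
)
print(undeclared_channels([mapper], [kafka]))  # {'rmlstreamer': {'file'}}
```

A check like this is the kind of validation a shared model makes possible before any plan is generated.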

Acceptance criteria

  • A CLI tool to generate a plan, run the generated workflow plan, and stop the workflow gracefully
  • Actual use of the orchestrator to set up LDES servers
  • A working stream processing workflow (e.g. Kafka -> JScript processing -> RMLStreamer -> LDES) started with the orchestrator.
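
As a rough illustration of the CLI in the first criterion, the command surface could be sketched with Python's `argparse` as below. The program name `orchestrator`, the subcommand names, and the arguments are assumptions for illustration; the issue does not define them, and the dispatch logic is left as a placeholder.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the three actions named in the acceptance criteria."""
    parser = argparse.ArgumentParser(prog="orchestrator")  # hypothetical tool name
    sub = parser.add_subparsers(dest="command", required=True)

    plan = sub.add_parser("plan", help="generate a workflow plan from config files")
    plan.add_argument("config_dir", help="directory with processor and channel configs")
    plan.add_argument("-o", "--output", default="plan.json", help="where to write the plan")

    run = sub.add_parser("run", help="run a generated workflow plan")
    run.add_argument("plan_file")

    stop = sub.add_parser("stop", help="gracefully stop a running workflow")
    stop.add_argument("plan_file")
    return parser


def main(argv=None) -> int:
    args = build_parser().parse_args(argv)
    # Placeholder: a real tool would dispatch to its plan/run/stop logic here.
    print(f"{args.command}: {vars(args)}")
    return 0
```

Calling `main(["plan", "configs/"])` would parse the `plan` subcommand with the default output file `plan.json`.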

Precondition

  • Configuration files for your processors

    • Start script: how to start the processor
    • Stop script (optional): how to stop the processor
    • Channels supported: list of channels supported by the processor
    • Serialization type: data formats supported by the processor
  • Configuration files for the channels used

    • Start script: how to start the **services** that enable the channels (e.g. Kafka brokers for Kafka channels)
    • Stop script: how to stop the services
    • Optional args: configuration of the channel that the processors use to connect
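
For illustration, a processor configuration with the fields listed above could be loaded and checked as in the sketch below. The JSON format, the key names (`start`, `stop`, `channels`, `serialization`), and the script path are assumptions; the issue does not prescribe a file format.

```python
import json

# Fields every processor config must provide; "stop" is optional.
REQUIRED = ("start", "channels", "serialization")


def load_processor_config(text: str) -> dict:
    """Parse a processor config and make sure the required fields are present."""
    config = json.loads(text)
    missing = [key for key in REQUIRED if key not in config]
    if missing:
        raise ValueError(f"missing required field(s): {', '.join(missing)}")
    config.setdefault("stop", None)  # no stop script given
    return config


# Hypothetical config for an RMLStreamer processor.
example = """
{
  "start": "./start-rmlstreamer.sh",
  "channels": ["kafka", "file"],
  "serialization": ["json", "n-quads"]
}
"""
config = load_processor_config(example)
print(config["stop"])  # None
```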

Demonstrator

In the context of workflow setups, developers need to connect individual components with each other to compose the workflow. For example, to generate LDES data from existing heterogeneous data sources, a typical workflow could look something like this:

  1. Data fetching from data source
  2. Mapping fetched data to RDF quads
  3. Feeding RDF quads to an LDES server

The developer runs the orchestrator with the provided config files for processors and channels to generate a workflow plan. The workflow plan could then be executed by the orchestrator, or tuned manually if desired before executing it with the orchestrator.
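
One way to picture the semi-automatic plan generation is the sketch below, which links consecutive steps over a channel and a data format that both sides support. It assumes a purely linear workflow, and the dictionary keys, step names, and the alphabetical tie-break are all hypothetical; a real orchestrator might choose differently, which is where manual tuning would come in.

```python
def generate_plan(steps):
    """Link consecutive steps over a channel and format that both sides support."""
    links = []
    for upstream, downstream in zip(steps, steps[1:]):
        channels = sorted(set(upstream["channels"]) & set(downstream["channels"]))
        formats = sorted(set(upstream["serialization"]) & set(downstream["serialization"]))
        if not channels or not formats:
            raise ValueError(f"cannot connect {upstream['name']} to {downstream['name']}")
        links.append({
            "from": upstream["name"],
            "to": downstream["name"],
            "channel": channels[0],        # arbitrary tie-break: alphabetically first
            "serialization": formats[0],
        })
    return links


# The three demonstrator steps, with made-up capabilities.
steps = [
    {"name": "fetcher", "channels": ["kafka"], "serialization": ["json"]},
    {"name": "mapper", "channels": ["kafka", "file"], "serialization": ["json", "n-quads"]},
    {"name": "ldes-server", "channels": ["file", "http"], "serialization": ["n-quads"]},
]
for link in generate_plan(steps):
    print(link)
```

With these made-up capabilities, the fetcher and mapper are linked over Kafka with JSON, and the mapper and LDES server over files with N-Quads.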

The orchestrator could also start the necessary services, such as Kafka brokers, and gracefully stop the running processors in the workflow.
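
A minimal sketch of that start/stop behaviour, assuming services must be up before processors and that everything is stopped in reverse order. The plan layout and script names are hypothetical, and the script runner is injectable so the ordering can be tried without running anything.

```python
import subprocess


def run_script(script: str) -> None:
    """Run a start or stop script and fail loudly if it exits non-zero."""
    subprocess.run(script, shell=True, check=True)


def start_all(plan, run=run_script):
    """Start services first, then processors; return what was started, in order."""
    started = []
    for component in plan["services"] + plan["processors"]:
        run(component["start"])
        started.append(component)
    return started


def stop_all(started, run=run_script):
    """Stop components in reverse start order, skipping those without a stop script."""
    for component in reversed(started):
        if component.get("stop"):
            run(component["stop"])


plan = {
    "services": [{"name": "kafka", "start": "start-kafka.sh", "stop": "stop-kafka.sh"}],
    "processors": [
        {"name": "mapper", "start": "start-mapper.sh", "stop": "stop-mapper.sh"},
        {"name": "ldes-server", "start": "start-ldes.sh"},  # no stop script
    ],
}

calls = []  # record the scripts instead of executing them
started = start_all(plan, run=calls.append)
stop_all(started, run=calls.append)
print(calls)
# ['start-kafka.sh', 'start-mapper.sh', 'start-ldes.sh', 'stop-mapper.sh', 'stop-kafka.sh']
```

Processors without a stop script are simply skipped here; a real orchestrator would probably have to track and signal their processes instead.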

Pointers

Scenarios

@s-minoo s-minoo added the challenge technical problem applied to a use case label Jun 16, 2022
@pheyvaer
Contributor

Hi @s-minoo

Great idea! Because of the different things that are described here, I think that this is better described as a scenario and that separate/smaller challenges are extracted from this scenario.

@s-minoo
Author

s-minoo commented Jun 17, 2022

Should I then split this into 3 separate challenges?

  1. A spec/ontology to describe the workflows used by the orchestrator
  2. A spec for the configuration of the processors using the ontology in step 1
  3. A CLI orchestrator tool that uses the configurations to execute the pipeline

@pheyvaer
Contributor

Yes that would indeed be a good start! We can always refine, adjust, add more challenges as work is done.

@RubenVerborgh
Contributor

Will need to be applied to a use case, so the task can be finished.

@pheyvaer
Contributor

pheyvaer commented Aug 2, 2022

@s-minoo Did you have the chance to look into making the necessary changes?

@s-minoo
Author

s-minoo commented Aug 4, 2022

This challenge has been split up into 3 smaller challenges #50 #51 #52.
Is it okay if I just refer to them?

@RubenVerborgh
Contributor

That's okay! We can update the description and/or close this one then.

@pheyvaer
Contributor

@s-minoo Can you either close this one or update its description? Thanks!

@s-minoo
Author

s-minoo commented Sep 14, 2022

Edited and I'll close this too!

@s-minoo s-minoo closed this as completed Sep 14, 2022

3 participants