An orchestrator for ease of workflow management #47
Comments
Hi @s-minoo Great idea! Because of the different things described here, I think this is better described as a scenario, with separate, smaller challenges extracted from that scenario.
Should I then split this into 3 separate challenges?
Yes that would indeed be a good start! We can always refine, adjust, add more challenges as work is done.
Will need to be applied to a use case, so the task can be finished.
@s-minoo Did you have the chance to look into making the necessary changes?
That's okay! We can update the description and/or close this one then.
@s-minoo Can you either close this one or update its description? Thanks!
Edited and I'll close this too!
This challenge has been split into 3 separate challenges: #50 #51 #52
Pitch
Undoubtedly, data will flow from pod to pod in the Solid ecosystem. Applications can create ad-hoc solutions to fetch and transfer data from one pod to another, but interoperable orchestration of those data flows makes such solutions more scalable. Think, for example, of a workflow that extracts Strava data from the Strava API, maps it with RML using the RMLStreamer to produce an LDES data stream, and then bucketizes that stream to create aggregated statistics, such as how many runs you did last week or how many kilometers you covered. Without an orchestration component, this flow has to be re-implemented for each use case, again and again. An implementation-independent, interoperable solution is needed.
Existing frameworks for workflow management, such as NiFi, Oozie, Airflow, and Dagster, restrict users to the context of the framework, be it in terms of programming language, limited API extensibility, or a fixed orchestration mechanism. On the other hand, DSL-based workflow management tools such as Toil and Snakemake are limited in the tasks they support, which include only Bash scripts.
Nextflow solves the aforementioned problems of these workflow management systems. However, it only supports file-based channels for data transfer: it cannot set up a workflow whose processors use arbitrary channels, such as Kafka, for data transfer.
The aforementioned tools also lack semi-automatic generation of a workflow plan and require the user to define the workflow plan explicitly.
Therefore, a generic and modular orchestrator that manages not only workflows but also the orchestration of different microservices/apps would be beneficial in the context of Solid, for example for setting up and orchestrating the different components needed for LDES generation. Furthermore, this would provide a strong foundation for a more modular data processing workflow architecture that does not rely on an existing data processing tech stack.
Desired solution
Acceptance criteria
Precondition
Configuration files for your processors
Configuration files for the channels used
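As a rough illustration of what these two kinds of configuration might contain, the sketch below parses a hypothetical JSON layout. The issue does not prescribe a format: the field names (`id`, `type`, `reads`, `writes`), the function `load_configs`, and the example processor and channel names are all assumptions made for illustration only.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ChannelConfig:
    id: str
    type: str  # hypothetical values, e.g. "file" or "kafka"


@dataclass(frozen=True)
class ProcessorConfig:
    id: str
    reads: tuple = ()   # ids of channels this processor consumes from
    writes: tuple = ()  # ids of channels this processor produces to


def load_configs(text):
    """Parse a (hypothetical) JSON config and check channel references."""
    raw = json.loads(text)
    channels = {c["id"]: ChannelConfig(c["id"], c["type"]) for c in raw["channels"]}
    processors = []
    for p in raw["processors"]:
        proc = ProcessorConfig(
            p["id"], tuple(p.get("reads", [])), tuple(p.get("writes", []))
        )
        # Reject processors that point at a channel nobody configured.
        for channel_id in proc.reads + proc.writes:
            if channel_id not in channels:
                raise ValueError(
                    f"processor {proc.id!r} references unknown channel {channel_id!r}"
                )
        processors.append(proc)
    return channels, processors


# Illustrative config loosely following the Strava example from the pitch.
EXAMPLE = """
{
  "channels": [
    {"id": "raw-activities", "type": "file"},
    {"id": "ldes-members", "type": "kafka"}
  ],
  "processors": [
    {"id": "strava-fetcher", "writes": ["raw-activities"]},
    {"id": "rml-mapper", "reads": ["raw-activities"], "writes": ["ldes-members"]},
    {"id": "bucketizer", "reads": ["ldes-members"]}
  ]
}
"""

channels, processors = load_configs(EXAMPLE)
```

Keeping channels as separate, typed entries is one way to let a processor stay unaware of whether its data arrives through a file or a Kafka topic.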
Demonstrator
In the context of workflow setups, developers need to connect individual components with each other to compose the workflow. For example, to generate LDES data from existing heterogeneous data sources, a typical workflow could look something like this:
The developer runs the orchestrator with the provided config files for processors and channels to generate a workflow plan. The workflow plan could then be executed by the orchestrator, or tuned manually if desired before executing it with the orchestrator.
The orchestrator could also start the necessary services, such as Kafka brokers, and gracefully stop the running processors in the workflow.
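One possible way to derive a workflow plan semi-automatically is to treat each channel as an edge from the processors that write to it to the processors that read from it, and then sort the processors topologically. The sketch below does this with Python's standard `graphlib` module (Python 3.9+). The function `build_plan`, the `reads`/`writes` keys, and the processor and channel names are hypothetical and not part of any existing orchestrator.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def build_plan(processors):
    """Return processor ids ordered so that producers come before consumers.

    `processors` maps a processor id to a dict with optional "reads" and
    "writes" lists of channel ids (a hypothetical config shape).
    """
    # Index which processors write to each channel.
    writers = {}
    for pid, cfg in processors.items():
        for channel in cfg.get("writes", []):
            writers.setdefault(channel, []).append(pid)

    # A processor depends on every other processor writing to a channel it reads.
    depends_on = {pid: set() for pid in processors}
    for pid, cfg in processors.items():
        for channel in cfg.get("reads", []):
            for writer in writers.get(channel, []):
                if writer != pid:
                    depends_on[pid].add(writer)

    # static_order() yields dependencies first and raises CycleError on cycles.
    return list(TopologicalSorter(depends_on).static_order())


# Illustrative processors loosely following the Strava example from the pitch.
EXAMPLE = {
    "bucketizer": {"reads": ["ldes-members"]},
    "rml-mapper": {"reads": ["raw-activities"], "writes": ["ldes-members"]},
    "strava-fetcher": {"writes": ["raw-activities"]},
}

plan = build_plan(EXAMPLE)
print(plan)  # ['strava-fetcher', 'rml-mapper', 'bucketizer']
```

Because the plan is a plain list, it could be written to a file and tuned by hand before the orchestrator executes it, and the same ordering could inform the sequence in which processors are stopped.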
Pointers
Scenarios