The increasing need for insights from vast data sources has given rise to data-driven business intelligence products that build and execute complex data workflows.
A data workflow is a set of inter-dependent, data-driven tasks. Simple solutions use a cron-based approach, which works well for simple workflows with few or no task dependencies. However, cron has no notion of dependencies between tasks, so it breaks down once a workflow requires tasks to run in a particular order.
At Cognitree, we build and execute complex data workflows for our customers to gather data insights. We built Kronos, an effective scheduling tool for our data pipelines, which adds features on top of
What is Kronos
Kronos is a Java-based replacement for cron to build, run, and monitor complex data pipelines, with flexible deployment options including an embedded mode. It handles dependency resolution, workflow management, and failures. Kronos is built on top of Quartz and uses a DAG (Directed Acyclic Graph) to manage the tasks within a workflow.
Examples of data pipelines include batch jobs, chains of multiple tasks, and machine learning jobs.
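
To make the DAG idea concrete, here is a minimal sketch of how task dependencies can be resolved into a valid execution order. It is an illustration only and does not use or reflect the Kronos API: the class `WorkflowDagSketch`, its methods, and the task names are all hypothetical, and the ordering technique shown (Kahn's topological sort) is an assumption about one reasonable way to do this, not a description of how Kronos does it internally.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; not part of the Kronos API. */
final class WorkflowDagSketch {
    // Task name -> names of the tasks it depends on (insertion order preserved).
    private final Map<String, List<String>> dependencies = new LinkedHashMap<>();

    void addTask(String name, String... dependsOn) {
        dependencies.put(name, List.of(dependsOn));
    }

    /** Returns the tasks ordered so that each one follows all of its dependencies. */
    List<String> executionOrder() {
        Map<String, Integer> pending = new LinkedHashMap<>();          // unmet dependency count
        Map<String, List<String>> dependents = new LinkedHashMap<>(); // reverse edges
        for (Map.Entry<String, List<String>> e : dependencies.entrySet()) {
            pending.put(e.getKey(), e.getValue().size());
            for (String dep : e.getValue()) {
                if (!dependencies.containsKey(dep)) {
                    throw new IllegalStateException("Unknown dependency: " + dep);
                }
                dependents.computeIfAbsent(dep, k -> new ArrayList<>()).add(e.getKey());
            }
        }

        // Start with every task that has no dependencies.
        Deque<String> ready = new ArrayDeque<>();
        pending.forEach((task, count) -> {
            if (count == 0) {
                ready.add(task);
            }
        });

        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String task = ready.remove();
            order.add(task);
            // Completing a task may unblock the tasks that depend on it.
            for (String next : dependents.getOrDefault(task, List.of())) {
                if (pending.merge(next, -1, Integer::sum) == 0) {
                    ready.add(next);
                }
            }
        }

        // Any task left unscheduled is part of a cycle, so the graph is not a DAG.
        if (order.size() != dependencies.size()) {
            throw new IllegalStateException("Workflow contains a dependency cycle");
        }
        return order;
    }

    public static void main(String[] args) {
        WorkflowDagSketch workflow = new WorkflowDagSketch();
        workflow.addTask("extract");
        workflow.addTask("transform", "extract");
        workflow.addTask("train-model", "transform");
        workflow.addTask("report", "transform");
        // Prints: [extract, transform, train-model, report]
        System.out.println(workflow.executionOrder());
    }
}
```

This is the kind of ordering that plain cron cannot express: `train-model` and `report` become runnable only after `transform` has finished, and a cyclic definition is rejected up front instead of silently never running.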
The architecture is flexible and extensible, with each component of Kronos designed to be pluggable.
- Dependency Management: Define/manage dependency among tasks in a workflow.
- Dynamic: Define/modify workflow at runtime.
- Extensible: Define custom task handlers and persistence store.
- Fault Tolerant: Handle system/process faults.
- Flexible deployment model: Embed as a library or deploy in standalone or distributed mode.
What next? Head over to the overview section to learn more about Kronos.