
The increasing need for insights from vast data sources has given rise to data-driven business intelligence products that build and execute complex data workflows.

A data workflow is a set of inter-dependent, data-driven tasks. Simple solutions take a cron-based approach, which works well for simple workflows with few or no task dependencies. However, cron falls short when tasks have complex dependencies between them.

At Cognitree, we build and execute complex data workflows for our customers to gather data insights. We built Kronos, an effective scheduling tool for our data pipelines that adds features on top of cron.

What is Kronos

Kronos is a Java-based replacement for cron to build, run and monitor complex data pipelines, with flexible deployment options including an embedded mode. It handles dependency resolution, workflow management and failure handling. Kronos is built on top of Quartz and uses a DAG (Directed Acyclic Graph) to manage the tasks within a workflow.
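To make the DAG idea concrete, here is a minimal, self-contained Java sketch of dependency resolution: tasks are modelled as a directed acyclic graph and run only after everything they depend on has completed. This is an illustration of the concept, not Kronos code; the task names and the simple ordering loop are invented for this example, and it assumes the graph is acyclic.

```java
// Illustrative only: models a workflow as a DAG and runs tasks in dependency order.
// This is NOT the Kronos API; it only demonstrates the underlying idea.
import java.util.*;

public class WorkflowDagDemo {

    public static void main(String[] args) {
        // Each task maps to the set of tasks it depends on.
        Map<String, Set<String>> dependsOn = new LinkedHashMap<>();
        dependsOn.put("extract", Set.of());
        dependsOn.put("clean", Set.of("extract"));
        dependsOn.put("train", Set.of("clean"));
        dependsOn.put("report", Set.of("clean", "train"));

        for (String task : dependencyOrder(dependsOn)) {
            System.out.println("running " + task);   // placeholder for real work
        }
    }

    // Repeatedly pick a task whose dependencies have all completed (assumes no cycles).
    static List<String> dependencyOrder(Map<String, Set<String>> dependsOn) {
        List<String> order = new ArrayList<>();
        Set<String> done = new HashSet<>();
        Deque<String> pending = new ArrayDeque<>(dependsOn.keySet());
        while (!pending.isEmpty()) {
            String task = pending.poll();
            if (done.containsAll(dependsOn.get(task))) {
                order.add(task);
                done.add(task);
            } else {
                pending.addLast(task);   // dependencies not finished yet, retry later
            }
        }
        return order;
    }
}
```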

Examples of data pipelines include batch jobs, chains of dependent tasks, machine learning jobs, etc.

Kronos can be compared with Oozie and Azkaban, which are targeted specifically at Hadoop workflows, whereas Kronos is flexible and can run any workflow, including big data pipelines.

The architecture is flexible and extensible, with each component of Kronos designed to be pluggable.

Why Kronos

  • Dependency Management: Define/manage dependency among tasks in a workflow.
  • Dynamic: Define/modify workflow at runtime.
  • Extensible: Define custom task handlers and a persistence store (see the sketch after this list).
  • Fault Tolerant: Handle system/process faults.
  • Flexible deployment model: Embed as a library or deploy in standalone or distributed mode.
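
As a rough illustration of the extensibility point above, the sketch below shows what a pluggable task handler could look like. The TaskHandler interface, the ShellTaskHandler class and the task property names are invented for this example and are not the actual Kronos API; they only show how a scheduler core can delegate task execution to handlers supplied at configuration time.

```java
// Hypothetical sketch only: names below are invented and do not reflect the real Kronos API.
// The idea: the scheduler hands each ready task to a pluggable handler, so new task types
// (shell commands, Spark jobs, HTTP calls, ...) can be added without touching the core.
import java.util.Map;

// Invented contract standing in for a pluggable task handler.
interface TaskHandler {
    void init(Map<String, Object> config);                   // one-time setup from configuration
    void handle(Map<String, Object> task) throws Exception;  // execute a single task
}

// Example plugin: runs the task's "command" property as a local shell command.
class ShellTaskHandler implements TaskHandler {
    @Override
    public void init(Map<String, Object> config) {
        // read handler-level settings here, e.g. a working directory
    }

    @Override
    public void handle(Map<String, Object> task) throws Exception {
        String command = String.valueOf(task.get("command"));
        Process process = new ProcessBuilder("sh", "-c", command)
                .inheritIO()
                .start();
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IllegalStateException("task failed with exit code " + exitCode);
        }
    }
}

public class TaskHandlerDemo {
    public static void main(String[] args) throws Exception {
        TaskHandler handler = new ShellTaskHandler();
        handler.init(Map.of());
        handler.handle(Map.of("command", "echo hello from a pluggable handler"));
    }
}
```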

What next? Head over to the overview section to learn more about Kronos.