Synthesizer is a lightweight cron-based task scheduler. It is designed to be completely decoupled from the code it is executing.
It aims to have as many of the (useful) features of other task orchestration frameworks (Airflow, DAGster, Prefect) without tying you to a specific language (Python) and encouraging more decoupled execution styles.
As the development of the MVP is still in-progress, the following is a "final vision" of what it is expected to look like. This is a pseudo design document guiding the development of the project
For better separation of concerns, the server
and cli
are developed as discrete crates, but with intertwined dependencies.
Syntherizer has a lightweight server with three distinct logical components:
- A REST API - This is used for registering new pipelines, listing pipelines, etc.
- A Scheduler - This component will continually check the current time against the scheduled next time of each pipeline. When a pipeline is "ready" for execution, it is marked as
ready
in the database. - An Executor - This is what actually allows the server to execute the scheduled tasks. This would be a pool of worker threads (or a single async thread polling remote execution?). It polls the database for tasks that are
ready
and then executes them accordingly.
To facilitate a better user-experience, a lightweight CLI wrapper around the API server will be created. It is expected that the CLI will be talking to a remote instance and therefore is configurable.
There will be a relatively simple HTMX front-end to make it easier to interact with the server. However, this does not mean that the CLI is a second-class citizen and should also have a great user experience for interacting with the server.
- A user would define a
pipeline
in YAML format, providing the required fields. - They would then use the CLI to
register
the pipeline(s) with the server. Something likesyn register
. It is considered best practice to keep all pipelines in a file in version control to make it easier to keep track of what pipelines are registered. - The server does an
upsert
with the registered pipelines - The scheduler does its thing, eventually marking the pipelines as
ready
- The executor picks up the new pipeline.
- The pipeline's tasks are then run either via local worker threads or remotely. Ideally, the remote execution mode would be "infinitely scalable" by leveraging async/await to poll for remote work completion. This would remove the need to have a thread-per-task execution model.
This section is for tips on development workflow
- To add a migration, run
sqlx migrate add -r "<name>"