Caio Cominato edited this page Oct 1, 2024 · 3 revisions

Harpy

Harpy is a parallel execution engine that is designed from the ground up to be simple to use and aims to be "type restrictive" (as much as possible).

The main goal of Harpy is to provide a simple engine for running your Python functions in parallel, with clear boundaries between any lambda / UDF functions.

When we say "type restrictive" (as much as possible), we mean that at the boundary of each function definition the user must declare the input and output types, and the engine enforces that these types match between connected functions.
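To make the idea concrete, here is a minimal sketch in plain Python of what such a boundary check could look like. It is not Harpy's API: the helpers `io_types`, `connect` and `run_parallel` are hypothetical, and the sketch assumes each function takes exactly one annotated argument and annotates its return type. Types are read with `typing.get_type_hints`, a mismatched connection raises `TypeError` before anything runs, and the composed function is then mapped over its inputs with a standard-library thread pool.

```python
import inspect
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Iterable, get_type_hints


# Hypothetical helper (not Harpy's API): read the declared input and output
# types of a single-argument function from its annotations.
def io_types(fn: Callable[..., Any]) -> tuple[Any, Any]:
    params = list(inspect.signature(fn).parameters)
    hints = get_type_hints(fn)
    if len(params) != 1 or params[0] not in hints or "return" not in hints:
        raise TypeError(
            f"{fn.__name__} must take one annotated argument and annotate its return type"
        )
    return hints[params[0]], hints["return"]


# Hypothetical helper: connect two functions only if the upstream output type
# equals the downstream input type; otherwise fail before anything runs.
def connect(upstream: Callable[..., Any], downstream: Callable[..., Any]) -> Callable[[Any], Any]:
    _, out_type = io_types(upstream)
    in_type, _ = io_types(downstream)
    if out_type != in_type:
        raise TypeError(
            f"{upstream.__name__} returns {out_type}, "
            f"but {downstream.__name__} expects {in_type}"
        )
    return lambda value: downstream(upstream(value))


# Hypothetical helper: apply a function to every input on a local thread pool.
# pool.map returns results in the same order as the inputs.
def run_parallel(fn: Callable[[Any], Any], inputs: Iterable[Any], max_workers: int = 4) -> list[Any]:
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fn, inputs))


def parse(line: str) -> int:
    return int(line)


def square(x: int) -> int:
    return x * x


def shout(text: str) -> str:
    return text.upper()


pipeline = connect(parse, square)  # str -> int feeds int -> int: types match
print(run_parallel(pipeline, ["1", "2", "3"]))  # [1, 4, 9]

try:
    connect(square, shout)  # an int output cannot feed a str input
except TypeError as err:
    print(err)
```

Threads keep the sketch short; for CPU-bound pure-Python functions, `concurrent.futures.ProcessPoolExecutor` is the usual swap. Neither spans multiple nodes, which is the gap Harpy is aiming at.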

What Harpy is not

Harpy is not a Spark or Dask replacement. It is not targeted at analytics or big data processing (although it could be used for this). It is designed to be a simple parallel execution engine for your Python functions.

What Harpy is

In its current state, Harpy is a concept and a WIP project. It is designed to be simple and to provide a scalable way to run Python-based processing without requiring a single node to do all the work.

Harpy can also be a good choice for a simple ML pipeline where you need to do data processing or model training in parallel.

What is the motivation behind Harpy?

The motivation stems from the idea that a single node can actually do a lot of work. However, when you have a lot of work to do, thread pools or multiprocessing can be cumbersome to manage across multiple nodes.

Harpy aims to provide a simple way to manage this across multiple nodes. It draws inspiration from many processing engines, and it also reflects the idea, popularized by DuckDB, that a single node can do a lot of work.

For this reason, Harpy comes pre-packaged with DuckDB and Pandas as its default data processing engines. This provides an out-of-the-box solution if your goal is to process data in a SQL-like manner.

As of now, what can Harpy do?

As of now, not much. Harpy is still in its infancy and it is being developed as a side project. The goal is to have a working prototype with a simple UI so that people can play around with it.
