
TatraDev/pipertool



Website | Docs | Chat (Community & Support) | Tutorials


Piper is an open-source platform for data science and machine learning prototyping. Concentrate only on your goals. Key features:

  1. A simple Python-context experience that helps you create and deploy pipelines, without depending on any proprietary online services.
  2. Connect modules into a pipeline and run it in Docker or a virtual environment. Then build the whole infrastructure using venv, Docker, or the cloud.
  3. Reduces routine, repetitive tasks and speeds up the path from idea to production.
  4. Well-tested and reproducible. Easily extended with your own Executor.
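To make the idea of "connecting modules into a pipeline" and "extending it with your own Executor" more concrete, here is a minimal, framework-free Python sketch. The names `Executor`, `Pipeline`, `Tokenize`, and `CountTokens` are hypothetical and chosen only for illustration; they are not Piper's actual API, which may work differently.

```python
from abc import ABC, abstractmethod
from typing import Any, List


class Executor(ABC):
    """Hypothetical base class for one pipeline step (not Piper's real API)."""

    @abstractmethod
    def run(self, data: Any) -> Any:
        """Process the input and return the output for the next step."""


class Pipeline:
    """Hypothetical pipeline: feeds each executor's output into the next one."""

    def __init__(self, executors: List[Executor]) -> None:
        self.executors = executors

    def run(self, data: Any) -> Any:
        for executor in self.executors:
            data = executor.run(data)
        return data


# Two illustrative user-defined executors.
class Tokenize(Executor):
    def run(self, data: str) -> List[str]:
        # Lowercase the text and split it on whitespace.
        return data.lower().split()


class CountTokens(Executor):
    def run(self, data: List[str]) -> int:
        return len(data)


if __name__ == "__main__":
    pipeline = Pipeline([Tokenize(), CountTokens()])
    print(pipeline.run("Concentrate only on your goals"))  # prints 5
```

Adding a new step in this sketch only means subclassing `Executor` and placing the instance in the list, which is the kind of extensibility the feature list describes.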

Piper aims to help data scientists and machine learning developers create and build the full infrastructure for their projects.


How Piper works

Quick start

Quick start: pipertool package (compose env)

From the project root directory, run the following in a terminal:

  • sudo -u root /bin/bash
  • create and activate a virtual environment (venv)
  • pip install -r requirements.txt
  • in configuration.py, update the path so that it points to your new directory
  • python setup.py install
  • piper --env-type compose start
  • FastAPI is available at 0.0.0.0:7585
  • Milvus Console is available at 0.0.0.0:9001 (login: minioadmin / password: minioadmin)
  • piper --env-type compose stop
  • pip uninstall piper
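After `piper --env-type compose start`, one way to confirm that the services listed above are up is to check whether their ports accept TCP connections. This is a minimal sketch using only the Python standard library. It assumes the services are reachable from the same machine via 127.0.0.1 (a service bound to 0.0.0.0 listens on all interfaces) on ports 7585 and 9001; the helper name `is_port_open` is hypothetical and not part of pipertool.

```python
import socket

# Ports from the quick start list; a service bound to 0.0.0.0 is reachable via 127.0.0.1.
SERVICES = {
    "FastAPI": 7585,
    "Milvus Console": 9001,
}


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


if __name__ == "__main__":
    for name, port in SERVICES.items():
        status = "up" if is_port_open("127.0.0.1", port) else "not reachable"
        print(f"{name} (port {port}): {status}")
```

An open port only shows that something is listening; it does not confirm that the service behind it is healthy.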

Quick start: running main.py (compose env)

From the project root directory, run the following in a terminal:

  • sudo -u root /bin/bash
  • create and activate a virtual environment (venv)
  • pip install -r requirements.txt
  • in configuration.py, update the path so that it points to your new directory
  • python main.py
  • press Ctrl+C to stop the compose env

Installation

pip (PyPI)

Comparison to related technologies

  1. Jupyter - It is the de facto experimental environment for most data scientists. However, it is best suited to writing experimental code.
  2. Data engineering tools such as Airflow or Luigi - These are very popular tools for building ML pipelines. Airflow can be connected to a Kubernetes cluster or can assemble tasks through a simple PythonOperator. The downside is that their functionality largely stops there: they do not provide ML modules out of the box. Moreover, all of your code still has to be wrapped for the scheduler, which is not always a trivial task. Still, we like them, and we use Airflow and Luigi as possible contexts for executors.
  3. Azure ML / Amazon SageMaker / Google Cloud - Cloud platforms let you assemble an entire system from ready-made modules and put it into operation relatively quickly. The drawbacks are high cost, lock-in to a specific cloud, and limited customization for specific business needs. For a large business, building ML infrastructure in the cloud is the most logical option. We also support cloud platforms as possible targets for the deployment step.
  4. DataRobot/Baseten - They offer an interesting but small set of ready-made modules. However, Baseten assumes that all integration happens in a Kubernetes cluster, which is not always convenient or necessary for a proof of concept. Piper, in contrast, provides an open-source framework in which you can build a truly customized pipeline from many modules. Such companies generally either do not provide an open-source framework or provide a heavily truncated set of modules for experiments, which limits the freedom, functionality, and applicability of their platforms. This is partly similar to the Hugging Face hub of models and datasets.
  5. MLflow / DVC - There are also many excellent projects for tracking experiments and for serving and storing machine learning models. However, they are mostly utilitarian and do not directly help speed up building a machine learning MVP. We plan to integrate Piper with the most popular frameworks used by DS and ML specialists.

Contributing

Contributions are welcome! Please see our Contributing Guide for more details. Thanks to all our contributors!


Mailing List

Copyright

This project is distributed under the Apache license version 2.0 (see the LICENSE file in the project root).

By submitting a pull request to this project, you agree to license your contribution under the Apache license version 2.0 to this project.

About

Platform for data science and machine learning prototyping. Developed by Tatradev.com
