Tangle is a web app that lets users build and run machine learning pipelines without having to set up a development environment.

Tangle

The Tangle system helps users create and run ML experiments and production pipelines. Any batch workflow that has a beginning and an end can be orchestrated via a pipeline.

Install the app

Installation

Try on local machine

  1. Install Docker and uv.
  2. Download the app code (needs to be done once):
git clone https://github.com/TangleML/tangle.git tangle/backend --branch stable
git clone https://github.com/TangleML/tangle-ui.git tangle/frontend_build --branch gh_pages_stable --single-branch --depth 1
  3. Start the app:

Linux and macOS:

cd tangle && backend/start_local.sh

Windows:

cd tangle && backend\start_local.cmd
  4. Once the "start_local: Starting the orchestrator" message appears in the terminal, open http://localhost:8000 in a web browser and start using the app. Click the "New Pipeline" button at the top to start building a new pipeline.

Try in Google Cloud Shell (free)

Google Cloud Shell is free (50 hours per week) and needs a Google Cloud account.

  1. Open Google Cloud Shell in a web browser.
  2. Download the app code (needs to be done once):
git clone https://github.com/TangleML/tangle.git tangle/backend --branch stable
git clone https://github.com/TangleML/tangle-ui.git tangle/frontend_build --branch gh_pages_stable --single-branch --depth 1
  3. Start the app:
cd tangle && backend/start_local.sh
  4. Once the "start_local: Starting the orchestrator" and "View app at" messages appear in the terminal, open https://shell.cloud.google.com/devshell/proxy?port=8000 in another browser tab and start using the app.

Concepts

A pipeline system like Tangle orchestrates containerized command-line programs. When a pipeline system runs a pipeline, it executes an interconnected graph of containerized programs locally or remotely (e.g. in the cloud) and facilitates the transfer of data between them.

A pipeline is a graph of interconnected component tasks.

A component describes a command-line program that runs inside a container. The component specification describes its signature (inputs and outputs), its metadata (name, description, and annotations), and its implementation, which specifies which container image to use, which program to start, and how to connect the inputs and outputs to the program's command-line arguments. Components can be written in any language. All Cloud Pipelines projects, including Tangle, support arbitrary containers and arbitrary programs.
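
To make the three parts of a component specification (signature, metadata, implementation) more concrete, here is a minimal Python sketch. The `ComponentSpec` class, the `{input:NAME}` / `{output:NAME}` placeholder syntax, and the `resolve_command` helper are illustrative assumptions and not Tangle's actual schema or API. The sketch only shows how inputs and outputs could be wired to a program's command-line arguments.

```python
from dataclasses import dataclass, field


@dataclass
class ComponentSpec:
    """Hypothetical, simplified model of a component specification."""
    name: str                       # metadata
    image: str                      # implementation: container image to use
    command: list[str]              # implementation: program and arguments
    inputs: list[str] = field(default_factory=list)   # signature
    outputs: list[str] = field(default_factory=list)  # signature
    description: str = ""           # metadata


def resolve_command(spec, input_paths, output_paths):
    """Replace the assumed placeholders with concrete file paths."""
    resolved = []
    for part in spec.command:
        for name in spec.inputs:
            part = part.replace("{input:" + name + "}", input_paths[name])
        for name in spec.outputs:
            part = part.replace("{output:" + name + "}", output_paths[name])
        resolved.append(part)
    return resolved


# Hypothetical component: the image, script name, and flags are made up.
train = ComponentSpec(
    name="Train model",
    image="python:3.12",
    command=["python", "train.py", "--data", "{input:dataset}",
             "--model", "{output:model}"],
    inputs=["dataset"],
    outputs=["model"],
)

argv = resolve_command(
    train,
    input_paths={"dataset": "/tmp/inputs/dataset"},
    output_paths={"model": "/tmp/outputs/model"},
)
print(argv)
```

Because the program only ever sees ordinary command-line arguments and file paths, it does not need to know anything about the pipeline system, which is what allows components to be written in any language.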

A task describes an instance of a component and specifies the input arguments for the component's inputs. Tasks are connected together into a graph by linking some upstream task outputs to some downstream task inputs.

The resulting graph of interconnected tasks is called a pipeline. A pipeline can be submitted for execution. During the pipeline execution, the pipeline's tasks are executed (in parallel, if possible) and produce output artifacts that are passed to downstream tasks.
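
The execution order described above can be sketched with Python's standard `graphlib` module. This is only an illustration of dependency-ordered execution, not Tangle's orchestrator: the task names are made up, and the "run" step is a placeholder callback. Tasks returned together in one batch have no unmet dependencies, so a real system could run them in parallel.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of upstream tasks
# whose outputs it consumes.
pipeline = {
    "download_data": set(),
    "preprocess": {"download_data"},
    "train": {"preprocess"},
    "compute_stats": {"preprocess"},
    "evaluate": {"train"},
}


def run_pipeline(graph, run_task):
    """Run tasks in dependency order and return the batches that were ready together."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # also raises CycleError if the graph has a cycle
    batches = []
    while sorter.is_active():
        # Every task in `ready` has all of its upstream tasks finished.
        ready = sorted(sorter.get_ready())
        for task in ready:
            run_task(task)
            sorter.done(task)
        batches.append(ready)
    return batches


executed = []
batches = run_pipeline(pipeline, executed.append)
for batch in batches:
    print(batch)
```

In this example `train` and `compute_stats` land in the same batch because both depend only on `preprocess`.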

Design

This backend consists of the API Server and the Orchestrator.

API Server

The API Server receives API requests and accesses the database to fulfill them. The API documentation can be accessed at http://localhost:8000/docs.

Orchestrator

The Orchestrator works independently of the API Server. It launches container executions and facilitates data passing between executions. The Orchestrator and the API Server communicate via the database. The Orchestrator launches container tasks using a specified Launcher, communicating with it via an abstract interface. This flexibility helps support different container execution systems and cloud providers.
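
The idea of an abstract launcher interface can be sketched as follows. All names here (`Launcher`, `LaunchedContainer`, `FakeLauncher`, `orchestrate`) are hypothetical and chosen for illustration. They are not the classes used in the Tangle code base. The point is only that the orchestrating code depends on the interface, so a different execution system can be swapped in without changing it.

```python
import abc
from dataclasses import dataclass


@dataclass
class LaunchedContainer:
    """Hypothetical handle for a launched container execution."""
    id: str
    status: str


class Launcher(abc.ABC):
    """Hypothetical abstract interface that the orchestrator programs against."""

    @abc.abstractmethod
    def launch(self, image: str, command: list[str]) -> LaunchedContainer:
        """Start a container and return a handle to it."""


class FakeLauncher(Launcher):
    """In-memory stand-in; a real launcher would call Docker or Kubernetes here."""

    def __init__(self):
        self.launched = []

    def launch(self, image, command):
        self.launched.append((image, command))
        return LaunchedContainer(id=f"fake-{len(self.launched)}", status="SUCCEEDED")


def orchestrate(launcher: Launcher, tasks):
    """Launch each (image, command) pair through whichever launcher was configured."""
    return [launcher.launch(image, command) for image, command in tasks]


results = orchestrate(FakeLauncher(), [("alpine", ["echo", "hello"])])
print(results[0])
```

Supporting another container execution system would then mean adding one more `Launcher` subclass, while `orchestrate` stays unchanged.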

Database

The backend uses SQLAlchemy to abstract database access, so any database engine supported by SQLAlchemy can be used. We officially support the SQLite and MySQL databases.

DB diagram

Launchers

Launchers launch container executions on a local or remote computer. Currently the following launchers are supported:

  • Local Docker using local storage
  • Local Kubernetes using local storage via HostPath volumes
  • Google Kubernetes Engine (GKE) using Google Cloud Storage

More launchers may be added in the future.

Credits

This Tangle Pipelines backend was created by Alexey Volkov as part of the Cloud Pipelines project. It is derived from the Cloud Pipelines SDK orchestrator and uses parts of it under the hood.
