Flyte

Flyte is a workflow automation platform for complex, mission-critical data and ML processes at scale

Home Page · Quick Start · Documentation · Features · Community & Resources · Changelogs · Components

💥 Introduction

Flyte is a structured programming and distributed processing platform that enables highly concurrent, scalable and maintainable workflows for Machine Learning and Data Processing. It is a fabric that connects disparate computation backends using a type safe data dependency graph. It records all changes to a pipeline, making it possible to rewind time. It also stores a history of all executions and provides an intuitive UI, CLI and REST/gRPC API to interact with the computation.

Flyte is more than a workflow engine -- it uses a workflow as a core concept and a task (a single unit of execution) as a top level concept. Multiple tasks arranged in a data producer-consumer order create a workflow.

Workflows and Tasks can be written in any language, with out of the box support for Python, Java and Scala.
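To make the task and workflow relationship concrete, here is a minimal sketch in plain Python. It does not use Flyte's SDK or API; `run_workflow`, the step tuples, and the three example tasks are hypothetical names invented for illustration. It only shows the idea that tasks arranged in producer-consumer order form a workflow.

```python
from typing import Callable, Dict, List, Tuple

# Illustration only: a "task" is a plain function, and a "workflow" is an
# ordered list of steps. Each step names its output, the task to run, and the
# upstream values it consumes. None of this is Flyte's actual API.
Step = Tuple[str, Callable[..., object], List[str]]


def run_workflow(steps: List[Step], inputs: Dict[str, object]) -> Dict[str, object]:
    """Run steps in order, passing each task the named values it depends on."""
    values = dict(inputs)
    for output_name, task, dependencies in steps:
        missing = [d for d in dependencies if d not in values]
        if missing:
            raise KeyError(f"{output_name} depends on undefined values: {missing}")
        values[output_name] = task(*[values[d] for d in dependencies])
    return values


def load_numbers(n: int) -> List[int]:
    return list(range(1, n + 1))


def square_all(xs: List[int]) -> List[int]:
    return [x * x for x in xs]


def total(xs: List[int]) -> int:
    return sum(xs)


# Each step consumes what an earlier step produced.
steps = [
    ("numbers", load_numbers, ["n"]),
    ("squares", square_all, ["numbers"]),
    ("result", total, ["squares"]),
]
result = run_workflow(steps, {"n": 4})
print(result["result"])
```

A real Flyte workflow would be authored with one of the SDKs and executed by the platform; the sketch only mirrors the data dependency structure.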

⏳ Five Reasons to Use Flyte

Kubernetes-Native Workflow Automation Platform
Ergonomic SDKs in Python, Java & Scala
Versioned & Auditable
Reproducible Pipelines
Strong Data Typing
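Strong data typing can be pictured as checking how tasks are wired together before anything runs. The sketch below is an assumption-laden illustration in plain Python, not Flyte's implementation: the `Task` class and `connect` function are hypothetical, and the interface is read from ordinary type annotations.

```python
from typing import Callable, get_type_hints


class Task:
    """Wraps a function and records its interface from type annotations."""

    def __init__(self, fn: Callable[..., object]) -> None:
        hints = get_type_hints(fn)
        self.name = fn.__name__
        self.output_type = hints.pop("return", None)
        self.input_types = hints  # remaining entries are the named inputs
        self.fn = fn


def connect(producer: Task, consumer: Task, input_name: str) -> None:
    """Validate one edge of the graph without running either task."""
    if input_name not in consumer.input_types:
        raise TypeError(f"{consumer.name} has no input named {input_name!r}")
    expected = consumer.input_types[input_name]
    if producer.output_type != expected:
        raise TypeError(
            f"{producer.name} produces {producer.output_type!r}, "
            f"but {consumer.name}.{input_name} expects {expected!r}"
        )


def read_length(text: str) -> int:
    return len(text)


def double(value: int) -> int:
    return value * 2


def shout(message: str) -> str:
    return message.upper()


# int output wired into an int input: accepted at declaration time.
connect(Task(read_length), Task(double), "value")

# int output wired into a str input: rejected before any task executes.
try:
    connect(Task(read_length), Task(shout), "message")
    declaration_error = None
except TypeError as exc:
    declaration_error = str(exc)

print(declaration_error)
```

The point of the design is that the mismatch surfaces when the pipeline is declared, so no compute is spent on a graph that could never succeed.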

🚀 Quick Start

With Docker installed, run the following command:

  docker run --rm --privileged -p 30081:30081 -p 30084:30084 cr.flyte.org/flyteorg/flyte-sandbox

This creates a local Flyte sandbox. Once the sandbox is ready, you should see the following message:

  Flyte is ready! Flyte UI is available at http://localhost:30081/console.

Visit http://localhost:30081/console to view the Flyte dashboard.
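If you start the sandbox from a script, you may want to wait until the console answers before continuing. The following is a small sketch using only the Python standard library; the helper names `console_is_up` and `wait_for_console`, the retry count, and the delay are assumptions chosen for illustration, and only the URL comes from the message above.

```python
import time
import urllib.request

# URL taken from the sandbox's "Flyte is ready!" message.
CONSOLE_URL = "http://localhost:30081/console"


def console_is_up(url: str) -> bool:
    """Return True if the URL answers with HTTP 200, False on any network error."""
    try:
        with urllib.request.urlopen(url, timeout=5) as response:
            return response.status == 200
    except OSError:  # URLError and HTTPError are subclasses of OSError
        return False


def wait_for_console(url=CONSOLE_URL, attempts=60, delay_s=5.0,
                     probe=console_is_up, sleep=time.sleep) -> bool:
    """Poll the console until it responds or the attempts run out."""
    for _ in range(attempts):
        if probe(url):
            return True
        sleep(delay_s)
    return False
```

Calling `wait_for_console()` after launching the container returns `True` once the console responds, or `False` if it never does within the allotted attempts. The `probe` and `sleep` parameters exist so the loop can be exercised without a running sandbox.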

To dig deeper into Flyte, refer to the Documentation.

🔥 Features

Used at Scale in production by 500+ users at Lyft with more than 1 million executions and 40+ million container executions per month
A data aware platform
Enables collaboration across your organization by:
- Executing distributed data pipelines/workflows
- Reusing tasks across projects, users, and workflows
- Making it easy to stitch together workflows from different teams and domain experts
- Backtracing to a specified workflow
- Comparing results of training workflows over time and across pipelines
- Sharing workflows and tasks across your teams
- Simplifying the complexity of multi-step, multi-owner workflows
Quick registration -- start locally and scale to the cloud instantly
Centralized Inventory constituting Tasks, Workflows and Executions
gRPC / REST interface to define and execute tasks and workflows
Type safe construction of pipelines -- each task has an interface which is characterized by its input and output, so illegal construction of pipelines fails during declaration rather than at runtime
Supports multiple data types for machine learning and data processing pipelines, such as Blobs (images, arbitrary files), Directories, Schema (columnar structured data), collections, maps, etc.
Memoization and Lineage tracking
Provides logging and observability
Workflow features:
- Start with one task, convert to a pipeline, attach multiple schedules, trigger using a programmatic API, or on-demand
- Parallel step execution
- Extensible backend for adding custom plugins (with a simplified user experience)
- Branching
- Inline subworkflows (a workflow can be embedded within one node of the top-level workflow)
- Distributed remote child workflows (a remote workflow can be triggered and statically verified at compile time)
- Array Tasks (map a function over a large dataset -- ensures controlled execution of thousands of containers)
- Dynamic workflow creation and execution with runtime type safety
- Container side plugins with first class support in Python
- PreAlpha: Arbitrary flytekit-less containers supported (RawContainer)
Guaranteed reproducibility of pipelines via:
- Versioned data, code and models
- Automatically tracked executions
- Declarative pipelines
Multi-cloud support (AWS, GCP and others)
Extensible, modular core with deep observability
No single point of failure; resilient by design
Automated notifications to Slack, Email, and PagerDuty
Multi K8s cluster support
Out of the box support to run Spark jobs on K8s, Hive queries, etc.
Snappy Console
Python CLI and Golang CLI (flytectl)
Written in Golang and optimized for the performance of large, long-running jobs
Grafana templates (user/system observability)
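The memoization feature listed above can be pictured as caching a task's output under a key derived from the task, a version, and its inputs. The sketch below is a plain-Python illustration under that assumption; `MemoizingRunner` and its key scheme are hypothetical and are not Flyte's actual caching implementation.

```python
import hashlib
import json


class MemoizingRunner:
    """Caches task outputs keyed by task name, version string, and inputs."""

    def __init__(self):
        self._cache = {}
        self.hits = 0

    def run(self, task, version, **inputs):
        # Inputs must be JSON-serializable so the key is stable and comparable.
        key_material = json.dumps(
            {"task": task.__name__, "version": version, "inputs": inputs},
            sort_keys=True,
        )
        key = hashlib.sha256(key_material.encode("utf-8")).hexdigest()
        if key in self._cache:
            self.hits += 1
            return self._cache[key]
        output = task(**inputs)
        self._cache[key] = output
        return output


calls = []


def expensive_sum(values):
    calls.append(values)  # record each real execution
    return sum(values)


runner = MemoizingRunner()
first = runner.run(expensive_sum, "v1", values=[1, 2, 3])   # executes the task
second = runner.run(expensive_sum, "v1", values=[1, 2, 3])  # served from the cache
third = runner.run(expensive_sum, "v2", values=[1, 2, 3])   # new version, executes again
print(first, second, third, len(calls), runner.hits)
```

Including a version in the key means that changing the task's logic can invalidate earlier results by bumping the version rather than by clearing the cache.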

In Progress

Helm chart for Flyte (coming soon - June)
Flink-K8s (coming soon - June)
One click deploy to AWS
Reactive pipelines & Events

🔌 Available Plugins

Containers
K8s Pods
AWS Batch Arrays
K8s Pod Arrays
K8s Spark (native PySpark and Java/Scala)
AWS Athena
Qubole Hive
Presto Queries
Distributed PyTorch (K8s Native) -- PyTorch Operator
SageMaker (built-in algorithms & custom models)
Distributed TensorFlow (K8s Native) -- TFOperator
Papermill notebook execution (Python and Spark)
Type-safe data checking for pandas DataFrames using Pandera
Versioned datastores using DoltHub and Dolt
Use SQLAlchemy to query any relational database
Build your own plugins that use library containers

📦 Component Repos

Repo	Language	Purpose	Status
flyte	Kustomize,RST	deployment, documentation, issues	Production-grade
flyteidl	Protobuf	interface definitions	Production-grade
flytepropeller	Go	execution engine	Production-grade
flyteadmin	Go	control plane	Production-grade
flytekit	Python	Python SDK and tools	Production-grade
flyteconsole	TypeScript	admin console	Production-grade
datacatalog	Go	manage input & output artifacts	Production-grade
flyteplugins	Go	flyte plugins	Production-grade
flytestdlib	Go	standard library	Production-grade
flytesnacks	Python	examples, tips, and tricks	Incubating
flytekit-java	Java/Scala	Java & Scala SDK for authoring Flyte workflows	Incubating
flytectl	Go	A standalone Flyte CLI	Incomplete

🔩 Production K8s Operators

Repo	Language	Purpose
Spark	Go	Apache Spark batch
Flink	Go	Apache Flink streaming

🤝 Community & Resources

Here are some resources to help you learn more about Flyte.

Communication Channels

Biweekly Community Sync

📣 Flyte OSS Community Sync happens every other Tuesday, 9am-10am PDT (check out the events calendar). Here's the Zoom link.
Meeting notes and the backlog of topics are captured in a doc.
If you'd like to revisit any community sync meeting that has happened, you can access the video recordings.

Conference Talks

KubeCon 2019 - Flyte: Cloud Native Machine Learning and Data Processing Platform video | deck
KubeCon 2019 - Running Large-Scale Stateful Workloads on Kubernetes at Lyft video
re:Invent 2019 - Implementing ML workflows with Kubernetes and Amazon SageMaker video
Cloud-native machine learning at Lyft with AWS Batch and Amazon EKS video
OSS + ELC NA 2020 splash
Datacouncil video | splash
FB AI@Scale Making MLOps & DataOps a reality
GAIC 2020

Blog Posts

Flyte blog site

Podcasts

TWIML&AI - Scalable and Maintainable ML Workflows at Lyft - Flyte
Software Engineering Daily - Flyte: Lyft Data Processing Platform
MLOps Coffee session - Flyte: an open-source tool for scalable, extensible, and portable workflows

💖 Top Contributors

A big thank you to the community for making Flyte possible!
