docs: revamp the README (#8052)
* docs: revamp the README

* +erin comments for style

Co-authored-by: Erin Cochran <erin.k.cochran@gmail.com>

rexledesma and erinkcochran87 committed May 26, 2022
1 parent 0e0bf21 commit e557f0e
Showing 1 changed file with 59 additions and 136 deletions.
195 changes: 59 additions & 136 deletions README.md
@@ -1,5 +1,5 @@
<p align="center">
<img src="assets/dagster-logo.png" />
<a href="https://dagster.io/"><img src="assets/dagster-logo.png"></a>
<br /><br />
<a href="https://twitter.com/dagsterio"><img src="https://img.shields.io/twitter/follow/dagsterio"></a>
<a href="https://dagster.io/slack"><img src="https://dagster-slackin.herokuapp.com/badge.svg"></a>
@@ -8,154 +8,77 @@

# [Dagster](https://dagster.io/) &middot; [![GitHub license](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://github.com/dagster-io/dagster/blob/master/LICENSE) [![PyPI Version](https://badge.fury.io/py/dagster.svg)](https://pypi.org/project/dagster/) [![Coveralls coverage](https://coveralls.io/repos/github/dagster-io/dagster/badge.svg?branch=master)](https://coveralls.io/github/dagster-io/dagster?branch=master)

An orchestration platform for the development, production, and observation of data assets.
Dagster is an orchestration platform for the development, production, and observation of data assets.

- **Develop and test locally, then deploy anywhere:** With Dagster, the same computations can run
in-process against your local file system or on a distributed work queue against your production
data lake. Choose to locally develop on your laptop, deploy on-premise, or run in any
cloud.
- **Model the data produced and consumed:** In your orchestration graph, Dagster models data
dependencies and handles how data passes between steps. Gradual typing on inputs and outputs catches
bugs early.
- **Link data to computations:** Dagster’s Asset Catalog tracks the data sets and ML models produced
by your jobs. Understand how they were generated and trace issues when asset declarations
do not match their materializations in storage.
- **Build a self-service data platform:** Dagster helps platform teams build systems for data
practitioners. Jobs are built from shared, reusable, configurable data processing components.
Dagit, Dagster’s web interface, lets anyone inspect these objects and discover how to use them.
- **Declare and isolate dependencies:** Dagster’s server model enables you to isolate codebases. Problems
in one job will not bring down the system or other jobs. Each job can have its own package
dependencies and Python version.
- **Debug jobs from a rich interface**: Dagit includes expansive facilities for understanding
the jobs it orchestrates. When inspecting a run of your job, you can query over logs, discover the
most time-consuming tasks via a Gantt chart, re-execute subsets of steps, and more.

## Installation

Dagster is available on PyPI and officially supports Python 3.6+.

Dagster lets you define jobs in terms of the data flow between reusable, logical components, then test locally and run anywhere. With a unified view of jobs and the assets they produce, Dagster can schedule and orchestrate Pandas, Spark, SQL, or anything else that Python can invoke.

Dagster is designed for data platform engineers, data engineers, and full-stack data scientists. Building a data platform with Dagster makes your stakeholders more independent and your systems more robust. Developing data pipelines with Dagster makes testing easier and deploying faster.

### Develop and test locally, then deploy anywhere

With Dagster’s pluggable execution, the same computations can run in-process against your local file system, or on a distributed work queue against your production data lake. You can set up Dagster’s web interface in a minute on your laptop, deploy it on-premise, or in any cloud.

### Model and type the data produced and consumed by each step

Dagster models data dependencies between steps in your orchestration graph and handles passing data between them. Optional typing on inputs and outputs helps catch bugs early.

### Link data to computations

Dagster’s Asset Manager tracks the data sets and ML models produced by your jobs, so you can understand how they were generated and trace issues when they don’t look how you expect.

### Build a self-service data platform

Dagster helps platform teams build systems for data practitioners. Jobs are built from shared, reusable, configurable data processing and infrastructure components. Dagit, Dagster’s web interface, lets anyone inspect these objects and discover how to use them.

### Avoid dependency nightmares

Dagster’s repository model lets you isolate codebases so that problems in one job don’t bring down the rest. Each job can have its own package dependencies and Python version. Jobs are run in isolated processes so user code issues can't bring the system down.

### Debug pipelines from a rich UI
```bash
$ pip install dagster dagit
```

Dagit, Dagster’s web interface, includes expansive facilities for understanding the jobs it orchestrates. When inspecting a run of your job, you can query over logs, discover the most time consuming tasks via a Gantt chart, re-execute subsets of steps, and more.
This installs two modules:

## Getting Started
- **Dagster**: The core programming model.
- **Dagit**: The web interface for developing and operating Dagster jobs. It includes a DAG browser,
a type-aware interface to launch runs, a live view for in-progress runs, a catalog to view your data
assets, and more.

### Installation
For a quick overview, check out our [Getting Started](https://docs.dagster.io/getting-started) page.

Dagster is available on PyPI, and officially supports Python 3.6+.
## Documentation

```bash
$ pip install dagster dagit
```
You can find the Dagster documentation [on the website](https://docs.dagster.io).

This installs two modules:
We've divided up the documentation into several sections:

- **Dagster**: the core programming model and abstraction stack; stateless, single-node,
single-process and multi-process execution engines; and a CLI tool for driving those engines.
- **Dagit**: the UI for developing and operating Dagster pipelines, including a DAG browser, a
type-aware config editor, and a live execution interface.
- [🌱 Tutorial](https://docs.dagster.io/tutorial/)
- [💡 Concepts](https://docs.dagster.io/concepts)
- [🚢 Deployment](https://docs.dagster.io/deployment)
- [🤝 Integrations](https://docs.dagster.io/integrations)
- [📖 Guides](https://docs.dagster.io/guides)

### Learn
## Community

Next, jump right into our [tutorial](https://docs.dagster.io/tutorial/), read our [complete documentation](https://docs.dagster.io), or check out our [GitHub Discussions](https://github.com/dagster-io/dagster/discussions). If you're actively using Dagster or have questions on
getting started, we'd love to hear from you:
Connect with thousands of other data practitioners building with Dagster. Share knowledge, get help,
and contribute to the open-source project. To see featured material and upcoming events, check out
our [Dagster Community](https://dagster.io/community) page.

<br />
<p align="center">
<a href="https://dagster.io/slack"><img src="https://user-images.githubusercontent.com/609349/63558739-f60a7e00-c502-11e9-8434-c8a95b03ce62.png" width=160px; /></a>
</p>
Join our community here:

- 🌟 [Star us on GitHub](https://github.com/dagster-io/dagster)
- 🐦 [Follow us on Twitter](https://twitter.com/dagsterio)
- 📺 [Subscribe to our YouTube Channel](https://www.youtube.com/channel/UCfLnv9X8jyHTe6gJ4hVBo9Q)
- 📚 [Read our Blog Posts](https://dagster.io/blog)
- 👋 [Join us on Slack](https://dagster.io/slack)
- ✏️ [Start a GitHub Discussion](https://github.com/dagster-io/dagster/discussions)

## Contributing

For details on contributing or running the project for development, check out our [contributing
guide](https://docs.dagster.io/community/contributing/). <br />

## Integrations

Dagster works with the tools and systems that you're already using with your data, including:

<table>
<thead>
<tr style="background-color: #ddd" align="center">
<td colspan=2><b>Integration</b></td>
<td><b>Dagster Library</b></td>
</tr>
</thead>
<tbody>
<tr>
<td align="center" style="border-right: 0px"><img style="vertical-align:middle" src="https://user-images.githubusercontent.com/609349/57987547-a7e36b80-7a37-11e9-95ae-4c4de2618e87.png"></td>
<td style="border-left: 0px"> <b>Apache Airflow</b></td>
<td><a href="https://docs.dagster.io/_apidocs/libraries/dagster-airflow" />dagster-airflow</a><br />Allows Dagster pipelines to be scheduled and executed, either containerized or uncontainerized, as <a href="https://github.com/apache/airflow">Apache Airflow DAGs</a>.</td>
</tr>
<tr>
<td align="center" style="border-right: 0px"><img style="vertical-align:middle" src="https://user-images.githubusercontent.com/609349/57987976-5ccc5700-7a3d-11e9-9fa5-1a51299b1ccb.png"></td>
<td style="border-left: 0px"> <b>Apache Spark</b></td>
<td><a href="https://docs.dagster.io/_apidocs/libraries/dagster-spark" />dagster-spark</a> &middot; <a href="https://docs.dagster.io/_apidocs/libraries/dagster-pyspark" />dagster-pyspark</a>
<br />Libraries for interacting with Apache Spark and PySpark.
</td>
</tr>
<tr>
<td align="center" style="border-right: 0px"><img style="vertical-align:middle" src="https://user-images.githubusercontent.com/609349/58348728-48f66b80-7e16-11e9-9e9f-1a0fea9a49b4.png"></td>
<td style="border-left: 0px"> <b>Dask</b></td>
<td><a href="https://docs.dagster.io/_apidocs/libraries/dagster-dask" />dagster-dask</a>
<br />Provides a Dagster integration with Dask / Dask.Distributed.
</td>
</tr>
<tr>
<td align="center" style="border-right: 0px"><img style="vertical-align:middle" src="https://user-images.githubusercontent.com/609349/58349731-f36f8e00-7e18-11e9-8a2e-86e086caab66.png"></td>
<td style="border-left: 0px"> <b>Datadog</b></td>
<td><a href="https://docs.dagster.io/_apidocs/libraries/dagster-datadog" />dagster-datadog</a>
<br />Provides a Dagster resource for publishing metrics to Datadog.
</td>
</tr>
<tr>
<td align="center" style="border-right: 0px"><img style="vertical-align:middle" src="https://user-images.githubusercontent.com/609349/57987809-bf245800-7a3b-11e9-8905-494ed99d0852.png" />
&nbsp;/&nbsp; <img style="vertical-align:middle" src="https://user-images.githubusercontent.com/609349/57987827-fa268b80-7a3b-11e9-8a18-b675d76c19aa.png">
</td>
<td style="border-left: 0px"> <b>Jupyter / Papermill</b></td>
<td><a href="https://docs.dagster.io/_apidocs/libraries/dagstermill" />dagstermill</a><br />Built on the <a href="https://github.com/nteract/papermill">papermill library</a>, dagstermill is meant for integrating productionized Jupyter notebooks into dagster pipelines.</td>
</tr>
<tr>
<td align="center" style="border-right: 0px"><img style="vertical-align:middle" src="https://user-images.githubusercontent.com/609349/57988016-f431aa00-7a3d-11e9-8cb6-1309d4246b27.png"></td>
<td style="border-left: 0px"> <b>PagerDuty</b></td>
<td><a href="https://docs.dagster.io/_apidocs/libraries/dagster-pagerduty" />dagster-pagerduty</a>
<br />A library for creating PagerDuty alerts from Dagster workflows.
</td>
</tr>
<tr>
<td align="center" style="border-right: 0px"><img style="vertical-align:middle" src="https://user-images.githubusercontent.com/609349/58349397-fcac2b00-7e17-11e9-900c-9ab8cf7cb64a.png"></td>
<td style="border-left: 0px"> <b>Snowflake</b></td>
<td><a href="https://docs.dagster.io/_apidocs/libraries/dagster-snowflake" />dagster-snowflake</a>
<br />A library for interacting with the Snowflake Data Warehouse.
</td>
</tr>
<tr style="background-color: #ddd">
<td colspan=2 align="center"><b>Cloud Providers</b></td>
<td><b></b></td>
</tr>
<tr>
<td align="center" style="border-right: 0px"><img style="vertical-align:middle" src="https://user-images.githubusercontent.com/609349/57987557-c2b5e000-7a37-11e9-9310-c274481a4682.png"> </td>
<td style="border-left: 0px"><b>AWS</b></td>
<td><a href="https://docs.dagster.io/_apidocs/libraries/dagster-aws" />dagster-aws</a>
<br />A library for interacting with Amazon Web Services. Provides integrations with Cloudwatch, S3, EMR, and Redshift.
</td>
</tr>
<tr>
<td align="center" style="border-right: 0px"><img style="vertical-align:middle" src="https://user-images.githubusercontent.com/609349/84176312-0bbb4680-aa36-11ea-9580-a70758b12161.png"> </td>
<td style="border-left: 0px"><b>Azure</b></td>
<td><a href="https://docs.dagster.io/_apidocs/libraries/dagster-azure" />dagster-azure</a>
<br />A library for interacting with Microsoft Azure.
</td>
</tr>
<tr>
<td align="center" style="border-right: 0px"><img style="vertical-align:middle" src="https://user-images.githubusercontent.com/609349/57987566-f98bf600-7a37-11e9-81fa-b8ca1ea6cc1e.png"> </td>
<td style="border-left: 0px"><b>GCP</b></td>
<td><a href="https://docs.dagster.io/_apidocs/libraries/dagster-gcp" />dagster-gcp</a>
<br />A library for interacting with Google Cloud Platform. Provides integrations with GCS, BigQuery, and Cloud Dataproc.
</td>
</tr>
</tbody>
</table>

This list is growing as we are actively building more integrations, and we welcome contributions!
guide](https://docs.dagster.io/community/contributing/).

## License

Dagster is [Apache 2.0 licensed](https://github.com/dagster-io/dagster/blob/master/LICENSE).
