Add quickstart example for dbt & Dagster (#21240)
## Summary & Motivation

This PR adds a quickstart example for dbt & Dagster. A user who wants
to skip the tutorial can copy and paste the code included in the example
and run `dagster dev`.

This example includes the new experimental class `DbtProject`. Another
PR will add more `DbtProject` examples to the dbt docs and note that the
feature is still experimental.

## How I Tested These Changes

BK + local
```
make mdx-full-format
make next-watch-build
```
maximearmstrong committed May 17, 2024
1 parent e221897 commit 0d005f5
Showing 5 changed files with 358 additions and 0 deletions.
4 changes: 4 additions & 0 deletions docs/content/_navigation.json
@@ -859,6 +859,10 @@
"title": "dbt",
"path": "/integrations/dbt",
"children": [
{
"title": "Quickstart",
"path": "/integrations/dbt/quickstart"
},
{
"title": "dbt & Dagster tutorial",
"path": "/integrations/dbt/using-dbt-with-dagster",
258 changes: 258 additions & 0 deletions docs/content/integrations/dbt/quickstart.mdx
@@ -0,0 +1,258 @@
---
title: "Quickstart: Run your dbt project with Dagster"
description: Get started swiftly with this simple dbt & Dagster example.
---

# Quickstart: Run your dbt project with Dagster

<Note>
<strong>Note</strong>: This guide uses the <code>DbtProject</code> class,
which is an experimental feature. Visit the{" "}
<a href="/\_apidocs/libraries/dagster-dbt#dagster_dbt.DbtProject">
dagster-dbt library API reference
</a>{" "}
for more info.
</Note>

---

## Set up your environment

<Note>
<strong>Note</strong>: We strongly recommend installing Dagster inside a
Python virtualenv. Visit the{" "}
<a href="/getting-started/install">Dagster installation docs</a> for more
info.
</Note>

Install dbt, Dagster, and the Dagster webserver. Run the following to install everything using pip:

```shell
pip install dagster-dbt dagster-webserver
```

The `dagster-dbt` library installs both `dbt-core` and `dagster` as dependencies. Refer to the [dbt](https://docs.getdbt.com/dbt-cli/install/overview) and [Dagster](/getting-started/install) installation docs for more info.

---

## Load your dbt project in Dagster

<TabGroup>
<TabItem name="Option 1: Create a single Dagster file">

You can run your dbt project with Dagster from a single file. For this example, consider a basic use case: you want to represent your dbt models as Dagster assets and run them daily at midnight.

With your text editor of choice, create a Python file in the same directory as your dbt project directory and add the following code. Note that since this file contains [all Dagster definitions required for your code location](/concepts/code-locations), it is recommended to name it `definitions.py`.

The following code assumes that your Python file and dbt project directory are adjacent in the same directory. If that's not the case, make sure to update the `RELATIVE_PATH_TO_MY_DBT_PROJECT` constant so that it points to your dbt project.

```python file=/integrations/dbt/quickstart/with_single_file.py startafter=start_example endbefore=end_example
from pathlib import Path

from dagster import AssetExecutionContext, Definitions
from dagster_dbt import (
    DbtCliResource,
    DbtProject,
    build_schedule_from_dbt_selection,
    dbt_assets,
)

RELATIVE_PATH_TO_MY_DBT_PROJECT = "./my_dbt_project"

my_project = DbtProject(
    project_dir=Path(__file__)
    .joinpath("..", RELATIVE_PATH_TO_MY_DBT_PROJECT)
    .resolve(),
)


@dbt_assets(manifest=my_project.manifest_path)
def my_dbt_assets(context: AssetExecutionContext, dbt: DbtCliResource):
    yield from dbt.cli(["build"], context=context).stream()


my_schedule = build_schedule_from_dbt_selection(
    [my_dbt_assets],
    job_name="materialize_dbt_models",
    cron_schedule="0 0 * * *",
    dbt_select="fqn:*",
)

defs = Definitions(
    assets=[my_dbt_assets],
    schedules=[my_schedule],
    resources={
        "dbt": DbtCliResource(project_dir=my_project),
    },
)
```
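
To see what the `project_dir` expression resolves to, the path arithmetic can be tried on its own. The following is a minimal sketch using only `pathlib`; the `/workspace/definitions.py` location and the `resolve_project_dir` helper are hypothetical stand-ins for illustration and are not part of `dagster-dbt`.

```python
from pathlib import Path

RELATIVE_PATH_TO_MY_DBT_PROJECT = "./my_dbt_project"


def resolve_project_dir(definitions_file: str, relative_path: str) -> Path:
    """Mirror the quickstart expression: step up from the file, then into the project."""
    # ".." cancels the file name, leaving the directory that holds the file.
    return Path(definitions_file).joinpath("..", relative_path).resolve()


# Hypothetical location of definitions.py; substitute your own path.
project_dir = resolve_project_dir(
    "/workspace/definitions.py", RELATIVE_PATH_TO_MY_DBT_PROJECT
)
print(project_dir)
```

If the printed directory is not your dbt project, adjust `RELATIVE_PATH_TO_MY_DBT_PROJECT` accordingly.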

</TabItem>
<TabItem name="Option 2: Create a new Dagster project">

You can create a Dagster project that wraps your dbt project by using the `dagster-dbt` command line interface.

To use the command, you'll need to provide two options:

- `--project-name`, the name of your Dagster project to be created, and
- `--dbt-project-dir`, the path to your dbt project.

In our example, our Dagster project is named `my_dagster_project` and the relative path to our dbt project is `./my_dbt_project`, meaning that we are in the directory where `my_dbt_project` is located.

```shell
dagster-dbt project scaffold --project-name my_dagster_project --dbt-project-dir ./my_dbt_project --use-experimental-dbt-project
```

This command creates a new directory called `my_dagster_project/` inside the current directory. The new `my_dagster_project/` directory will contain a set of files that define a Dagster project to load the dbt project provided in `./my_dbt_project`.
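
Before running the scaffold command, it may help to confirm that the path passed to `--dbt-project-dir` really points at a dbt project. The sketch below assumes only that a dbt project keeps a `dbt_project.yml` file at its root; the `looks_like_dbt_project` helper is a hypothetical name and not part of the `dagster-dbt` CLI.

```python
from pathlib import Path


def looks_like_dbt_project(path: str) -> bool:
    """Return True if the directory contains a dbt_project.yml file."""
    return Path(path, "dbt_project.yml").is_file()


if __name__ == "__main__":
    # Same relative path as in the scaffold command above.
    candidate = "./my_dbt_project"
    if looks_like_dbt_project(candidate):
        print(f"{candidate} contains dbt_project.yml")
    else:
        print(f"{candidate} has no dbt_project.yml; check the path")
```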

</TabItem>
<TabItem name="Option 3: Use an existing Dagster project">

You can use an existing Dagster project to run your dbt project. To do so, you'll need to add some objects using the [dagster-dbt library](/\_apidocs/libraries/dagster-dbt) and update your `Definitions` object. This example assumes that your existing Dagster project includes both `assets.py` and `definitions.py` files, among other required files.

```shell
my_dagster_project
├── __init__.py
├── assets.py
├── definitions.py
├── pyproject.toml
├── setup.cfg
└── setup.py
```

1. Change directories to the Dagster project directory:

```shell
cd my_dagster_project/
```

2. Create a Python file named `project.py` and add the following code. The `DbtProject` object is a representation of your dbt project that assists with managing `manifest.json` preparation.

```python file=/integrations/dbt/quickstart/with_project.py startafter=start_dbt_project_example endbefore=end_dbt_project_example
from pathlib import Path

from dagster_dbt import DbtProject

RELATIVE_PATH_TO_MY_DBT_PROJECT = "./my_dbt_project"

my_project = DbtProject(
    project_dir=Path(__file__)
    .joinpath("..", RELATIVE_PATH_TO_MY_DBT_PROJECT)
    .resolve(),
)
```

3. In your `assets.py` file, add the following code. Using the `dbt_assets` decorator allows Dagster to create a definition for how to compute a set of dbt resources, described by a `manifest.json`.

```python file=/integrations/dbt/quickstart/with_project.py startafter=start_dbt_assets_example endbefore=end_dbt_assets_example
from dagster import AssetExecutionContext
from dagster_dbt import DbtCliResource, dbt_assets

from .project import my_project

@dbt_assets(manifest=my_project.manifest_path)
def my_dbt_assets(context: AssetExecutionContext, dbt: DbtCliResource):
    yield from dbt.cli(["build"], context=context).stream()
```

4. In your `definitions.py` file, update your `Definitions` object to include the newly created objects.

```python file=/integrations/dbt/quickstart/with_project.py startafter=start_dbt_definitions_example endbefore=end_dbt_definitions_example
from dagster import Definitions
from dagster_dbt import DbtCliResource

from .assets import my_dbt_assets
from .project import my_project

defs = Definitions(
    ...,
    assets=[
        ...,
        # Add the dbt assets alongside your other assets
        my_dbt_assets,
    ],
    resources={
        ...: ...,
        # Add the dbt resource alongside your other resources
        "dbt": DbtCliResource(project_dir=my_project),
    },
)
```

With these changes, your existing Dagster project is ready to run your dbt project.
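
The `manifest.json` mentioned in step 3 is a plain JSON file, so you can inspect which dbt models Dagster will see as assets. The following is a rough sketch that assumes the manifest has a top-level `nodes` mapping whose entries carry `resource_type` and `name` keys; `list_dbt_models` is a hypothetical helper, not a `dagster-dbt` function.

```python
import json
from pathlib import Path
from typing import List


def list_dbt_models(manifest_path: Path) -> List[str]:
    """Return the sorted names of all model nodes in a dbt manifest.json."""
    manifest = json.loads(manifest_path.read_text())
    return sorted(
        node["name"]
        for node in manifest.get("nodes", {}).values()
        # Skip tests, seeds, snapshots, and other non-model nodes.
        if node.get("resource_type") == "model"
    )
```

For example, `list_dbt_models(Path("my_dbt_project/target/manifest.json"))` would list the model names, assuming your dbt project writes its manifest to the `target/` directory.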

</TabItem>
</TabGroup>

---

## Run your dbt project in Dagster's UI

Now that your code is ready, you can run Dagster's UI to take a look at your dbt project.

<TabGroup>
<TabItem name="Option 1: From a Dagster file">

1. Locate the Dagster file containing your definitions. If you followed our example to create a single Dagster file in the [previous section](#load-your-dbt-project-in-dagster), your file should be named `definitions.py`.

2. To start Dagster's UI, run the following:

```shell
dagster dev -f definitions.py
```

This results in output similar to:

```shell
Serving dagster-webserver on http://127.0.0.1:3000 in process 70635
```

</TabItem>
<TabItem name="Option 2: From a Dagster project">

1. Change directories to the Dagster project directory:

```shell
cd my_dagster_project/
```

2. To start Dagster's UI, run the following:

```shell
dagster dev
```

This results in output similar to:

```shell
Serving dagster-webserver on http://127.0.0.1:3000 in process 70635
```

</TabItem>
</TabGroup>

In your browser, navigate to <http://127.0.0.1:3000>. The page will display the asset graph for the job created by the schedule definition:

<!--
![Asset graph for your job in Dagster's UI, containing dbt models loaded as Dagster assets](/images/integrations/dbt/quickstart/asset_graph.png)
-->

<Image
alt="Asset graph for your job in Dagster's UI, containing dbt models loaded as Dagster assets"
src="/images/integrations/dbt/quickstart/asset_graph.png"
width={2688}
height={1680}
/>

In Dagster, running a dbt model corresponds to _materializing_ an asset. Because a schedule definition has been added to your code location, your assets will be materialized at the next cron tick. The assets can also be materialized manually by clicking the **Materialize all** button near the top right corner of the page. This will launch a run to materialize the assets.
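
The cron string `0 0 * * *` used in the single-file example means "minute 0, hour 0, every day", so the next tick is the upcoming midnight. As a small illustration, the sketch below computes that moment with the standard library. It assumes the schedule is evaluated in UTC, which may not match your deployment's timezone settings, and `next_midnight_tick` is a hypothetical helper rather than a Dagster API.

```python
from datetime import datetime, timedelta, timezone


def next_midnight_tick(now: datetime) -> datetime:
    """Return the first midnight strictly after `now`, in the same timezone."""
    tomorrow = now.date() + timedelta(days=1)
    return datetime(tomorrow.year, tomorrow.month, tomorrow.day, tzinfo=now.tzinfo)


# Assumption: the schedule runs in UTC.
now = datetime.now(timezone.utc)
print(f"Next tick: {next_midnight_tick(now).isoformat()}")
```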

---

## What's next?

Congratulations on successfully running your dbt project in Dagster!

In this example, we created a single file to handle a simple use case and run your dbt project. To learn more about our dbt integrations and how to handle more complex use cases, consider the following options:

- To find out more about integrating dbt with Dagster, start with the [dbt & Dagster tutorial](/integrations/dbt/using-dbt-with-dagster).
- For an end-to-end example, from project creation to deployment on Dagster+, check out the Dagster & dbt course in [Dagster University](https://courses.dagster.io).
@@ -0,0 +1,53 @@
# pyright: reportMissingImports=false
# ruff: isort: skip_file

# start_dbt_project_example
from pathlib import Path

from dagster_dbt import DbtProject

RELATIVE_PATH_TO_MY_DBT_PROJECT = "./my_dbt_project"

my_project = DbtProject(
    project_dir=Path(__file__)
    .joinpath("..", RELATIVE_PATH_TO_MY_DBT_PROJECT)
    .resolve(),
)
# end_dbt_project_example


# fmt: off
# start_dbt_assets_example
from dagster import AssetExecutionContext
from dagster_dbt import DbtCliResource, dbt_assets

from .project import my_project

@dbt_assets(manifest=my_project.manifest_path)
def my_dbt_assets(context: AssetExecutionContext, dbt: DbtCliResource):
    yield from dbt.cli(["build"], context=context).stream()
# end_dbt_assets_example
# fmt: on


# start_dbt_definitions_example
from dagster import Definitions
from dagster_dbt import DbtCliResource

from .assets import my_dbt_assets
from .project import my_project

defs = Definitions(
    ...,
    assets=[  # type: ignore
        ...,
        # Add the dbt assets alongside your other assets
        my_dbt_assets,
    ],
    resources={
        ...: ...,
        # Add the dbt resource alongside your other resources
        "dbt": DbtCliResource(project_dir=my_project),
    },
)
# end_dbt_definitions_example
@@ -0,0 +1,43 @@
# pyright: reportMissingImports=false
# ruff: isort: skip_file

# start_example
from pathlib import Path

from dagster import AssetExecutionContext, Definitions
from dagster_dbt import (
    DbtCliResource,
    DbtProject,
    build_schedule_from_dbt_selection,
    dbt_assets,
)

RELATIVE_PATH_TO_MY_DBT_PROJECT = "./my_dbt_project"

my_project = DbtProject(
    project_dir=Path(__file__)
    .joinpath("..", RELATIVE_PATH_TO_MY_DBT_PROJECT)
    .resolve(),
)


@dbt_assets(manifest=my_project.manifest_path)
def my_dbt_assets(context: AssetExecutionContext, dbt: DbtCliResource):
    yield from dbt.cli(["build"], context=context).stream()


my_schedule = build_schedule_from_dbt_selection(
    [my_dbt_assets],
    job_name="materialize_dbt_models",
    cron_schedule="0 0 * * *",
    dbt_select="fqn:*",
)

defs = Definitions(
    assets=[my_dbt_assets],
    schedules=[my_schedule],
    resources={
        "dbt": DbtCliResource(project_dir=my_project),
    },
)
# end_example

1 comment on commit 0d005f5

@github-actions

Deploy preview for dagster-docs ready!

✅ Preview
https://dagster-docs-asowf2qas-elementl.vercel.app
https://master.dagster.dagster-docs.io

Built with commit 0d005f5.
This pull request is being automatically deployed with vercel-action
