[du-dbt] updating-post-2-27-dogfooding (#20116)
## Summary & Motivation

Addressing the things that are checked off in the Linear ticket:
https://linear.app/dagster-labs/issue/DEV-147/227-post-dogfooding-action-items

Also snuck in some renames because Lesson 7 felt too focused on Dagster Cloud, when its content applies to OSS deployments, too.

## How I Tested These Changes

---------

Co-authored-by: Erin Cochran <erin.k.cochran@gmail.com>
tacastillo and erinkcochran87 committed Feb 28, 2024
1 parent 4333e0d commit 06961cd
Showing 12 changed files with 110 additions and 80 deletions.
6 changes: 3 additions & 3 deletions docs/dagster-university/pages/dagster-dbt.md
@@ -40,9 +40,9 @@ title: Dagster + dbt
- [Creating an incremental model](/dagster-dbt/lesson-6/2-creating-a-simple-incremental-model)
- [Creating a partitioned dbt asset](/dagster-dbt/lesson-6/3-creating-a-partitioned-dbt-asset)
- [Lesson recap](/dagster-dbt/lesson-6/4-lesson-recap)
- Lesson 7: Deploying to Dagster Cloud
- Lesson 7: Deploying to Production
- [Overview](/dagster-dbt/lesson-7/1-overview)
- [Pushing the project to GitHub](/dagster-dbt/lesson-7/2-pushing-the-project-to-github)
- [Setting up Dagster Cloud](/dagster-dbt/lesson-7/3-setting-up-dagster-cloud)
- [Creating the manifest with GitHub Actions](/dagster-dbt/lesson-7/4-creating-the-manifest-with-github-actions)
- [Preparing for a sucessful run](/dagster-dbt/lesson-7/5-preparing-for-a-successful-run)
- [Creating the manifest during deployment](/dagster-dbt/lesson-7/4-creating-the-manifest-during-deployment)
- [Preparing for a successful run](/dagster-dbt/lesson-7/5-preparing-for-a-successful-run)
@@ -30,5 +30,5 @@ Even if you’ve already completed the Dagster Essentials course, you should sti
Run the following to clone the project:

```bash
git clone https://github.com/dagster-io/project-dagster-university -b module/dagster-and-dbt-starter
git clone https://github.com/dagster-io/project-dagster-university -b module/dagster-and-dbt-starter dagster-and-dbt
```
@@ -29,7 +29,7 @@ setup(
"dbt-duckdb",
"geopandas",
"kaleido",
"pandas",
"pandas[parquet]",
"plotly",
"shapely",
"smart_open[s3]",
@@ -48,7 +48,7 @@ setup(
Then, run the following in the command line to rename the `.env.example` file and install the dependencies:

```bash
cd project_dagster_university
cd dagster-and-dbt
cp .env.example .env
pip install -e ".[dev]"
```
@@ -59,7 +59,11 @@ To confirm everything works:

1. Run `dagster dev` from the directory.
2. Navigate to the Dagster UI ([`http://localhost:3000`](http://localhost:3000/)) in your browser.
3. Open the asset graph by clicking **Assets > View global asset lineage**.
3. Click **Materialize all** to materialize all the assets in the project. **For partitioned assets**, you can materialize just the most recent partition:
3. Open the asset graph by clicking **Assets > View global asset lineage** and confirm the asset graph you see matches the graph below.

![The Asset Graph in the Dagster UI](/images/dagster-dbt/lesson-2/asset-graph.png)
![The Asset Graph in the Dagster UI](/images/dagster-dbt/lesson-2/asset-graph.png)

4. Let's confirm that you can materialize these assets by:
    1. Navigating to **Overview > Jobs**.
    2. Clicking the `trip_update_job` job and then **Materialize all...**.
    3. Selecting the most recent partition (`2023-03-01`) when prompted. This starts a run/backfill, and your assets should materialize successfully.
@@ -36,13 +36,15 @@ We’ll only create one `@dbt_assets` definition for now, but in a later lesson,
from dagster import AssetExecutionContext
from dagster_dbt import dbt_assets, DbtCliResource

import os

from .constants import DBT_DIRECTORY
```

3. The `@dbt_assets` decorator requires a path to the project’s manifest file, which is within our `DBT_DIRECTORY`. Use that constant to create a path to the `manifest.json` by copying and pasting the code below:

```python
dbt_manifest_path = DBT_DIRECTORY.joinpath("target", "manifest.json")
dbt_manifest_path = os.path.join(DBT_DIRECTORY, "target", "manifest.json")
```

Similar to how we built the path to the dbt project’s directory earlier, we’re joining path segments once again to reference `target/manifest.json` more precisely.
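The updated line swaps `pathlib`'s `joinpath` for `os.path.join`. As a quick sketch (assuming `DBT_DIRECTORY` is a `pathlib.Path`, with `"analytics"` used here only as a placeholder value), both spellings point at the same file; the difference is that `joinpath` returns a `Path` while `os.path.join` returns a plain string:

```python
import os
from pathlib import Path

# Placeholder for the project's DBT_DIRECTORY constant (assumed to be a Path)
DBT_DIRECTORY = Path("analytics")

# Removed line: Path.joinpath returns a new Path object
manifest_as_path = DBT_DIRECTORY.joinpath("target", "manifest.json")

# Added line: os.path.join accepts a Path (any os.PathLike) and returns a str
manifest_as_str = os.path.join(DBT_DIRECTORY, "target", "manifest.json")

# Same location on disk, different return types
assert Path(manifest_as_str) == manifest_as_path
```

Mixing the two styles in one module works, but sticking to one makes the types flowing into `@dbt_assets` easier to reason about.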
@@ -8,7 +8,7 @@ lesson: '3'

Once all the work above has been done, you’re ready to see your dbt models represented as assets! Here’s how you can find your models:

1. Run `dagster dev` and navigate to the asset graph.
1. If you haven't yet, run `dagster dev` in your command line, and then navigate to the asset graph in the UI.
2. Expand the `default` group in the asset graph.
3. You should see your two dbt models, `stg_trips` and `stg_zones`, converted as assets within your Dagster project!

@@ -22,8 +22,4 @@ lesson: '3'

4. Navigate to the details page for the run you just started, then look at the logs.

When finished, proceed to the next page.

{% callout %}
> **Important!** Before continuing, change `dbt_analytics` back to use `dbt run`.
{% /callout %}
When finished, proceed to the next page.
@@ -8,15 +8,21 @@ lesson: '4'

By now, you’ve had to run `dbt parse` and reload your code location quite frequently, which doesn’t feel like the cleanest developer experience.

Before we move on, we’ll reduce the number of steps in the feedback loop by automating the `dbt parse` command. We’ll also take advantage of a few other aspects of the `DbtCliResource` that we wrote earlier.
Before we move on, we’ll reduce the number of steps in the feedback loop. We'll automate the `dbt parse` command by taking advantage of the `DbtCliResource` that we wrote earlier.

---

## Automating running dbt parse in development

The first feature is that resources don’t need to be part of an asset to be executed. This means that once a `dbt_resource` is defined, you can use it to execute commands when your code location is being built. Rather than manually running `dbt parse`, let’s use the `dbt_resource` to run the command for us.
The first detail is that resources don’t need to be part of an asset to be executed. This means that once a `dbt_resource` is defined, you can use it to execute commands when your code location is being built. Rather than manually running `dbt parse`, let’s use the `dbt_resource` to run the command for us.

In `dbt.py`, above the `dbt_manifest_path` declaration, add this snippet to run `dbt parse`:
In `dbt.py`, import the `dbt_resource` with:

```python
from ..resources import dbt_resource
```

Afterward, above your `dbt_manifest_path` declaration, add this snippet to run `dbt parse`:

```python
dbt_resource.cli(["--quiet", "parse"]).wait()
@@ -68,5 +74,5 @@ This is great, however, it might feel a bit greedy and intensive to be constantl
        .target_path.joinpath("manifest.json")
    )
else:
    dbt_manifest_path = DBT_DIRECTORY.joinpath("target", "manifest.json")
    dbt_manifest_path = os.path.join(DBT_DIRECTORY, "target", "manifest.json")
```
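The parse-on-load toggle boils down to a small branch on the `DAGSTER_DBT_PARSE_PROJECT_ON_LOAD` environment variable. Here is a framework-free sketch of that branch; the `resolve_manifest_path` function and the `run_parse` callable are hypothetical stand-ins, with `run_parse` playing roughly the role of the `dbt_resource.cli(...)` chain in the lesson:

```python
import os
from pathlib import Path
from typing import Callable

# A callable that runs `dbt parse` and returns dbt's target directory.
# Hypothetical stand-in for the dbt_resource.cli([...]).wait().target_path chain.
ParseRunner = Callable[[], Path]


def resolve_manifest_path(dbt_directory: Path, run_parse: ParseRunner) -> Path:
    """Pick the manifest location, parsing on load only when the env var is set."""
    if os.getenv("DAGSTER_DBT_PARSE_PROJECT_ON_LOAD"):
        # Development: regenerate manifest.json every time the code location loads
        return run_parse().joinpath("manifest.json")
    # Deployment: rely on a manifest that was built ahead of time
    return dbt_directory.joinpath("target", "manifest.json")
```

Passing the parse step in as a callable keeps the branch testable without a dbt installation. Note that any non-empty value of the variable turns parsing on, because `os.getenv` returns the raw string.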
@@ -136,7 +136,7 @@ if os.getenv("DAGSTER_DBT_PARSE_PROJECT_ON_LOAD"):
        .target_path.joinpath("manifest.json")
    )
else:
    dbt_manifest_path = DBT_DIRECTORY.joinpath("target", "manifest.json")
    dbt_manifest_path = os.path.join(DBT_DIRECTORY, "target", "manifest.json")


@dbt_assets(
@@ -75,18 +75,20 @@ Let's start by adding a new string constant to reference when building the new a
In the `assets/constants.py` file, add the following to the end of the file:

```python
AIRPORT_TRIPS_FILE_PATH = Path(__file__).joinpath("..", "..", "outputs", "airport_trips.png").resolve()
AIRPORT_TRIPS_FILE_PATH = get_path_for_env(os.path.join("data", "outputs", "airport_trips.png"))
```


This creates a path to where we want to save the chart. The `get_path_for_env` utility function isn't part of Dagster; it's a helper we've defined in this file to support Lesson 7 (Deploying your Dagster and dbt project).
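The body of `get_path_for_env` isn't part of this diff. Purely as a hypothetical illustration of what an environment-aware path helper can look like, the sketch below assumes an environment variable named `DAGSTER_ENVIRONMENT` and a placeholder S3 prefix; the course's actual helper may behave differently:

```python
import os

# Hypothetical settings: the course's real helper is not shown in this diff.
ENVIRONMENT_VAR = "DAGSTER_ENVIRONMENT"      # assumed variable name
PROD_PATH_PREFIX = "s3://example-bucket/"    # placeholder remote prefix


def get_path_for_env(path: str) -> str:
    """Sketch: return local paths as-is, or prefix them for production."""
    if os.getenv(ENVIRONMENT_VAR) == "prod":
        # Object-store keys use forward slashes regardless of the local OS
        return PROD_PATH_PREFIX + path.replace(os.sep, "/")
    # Locally, keep the relative path (e.g. data/outputs/airport_trips.png)
    return path
```

With this design, `get_path_for_env(os.path.join("data", "outputs", "airport_trips.png"))` stays a local relative path during development and only changes when the assumed variable is set to `prod`.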

### Creating the airport_trips asset

Now we’re ready to create the asset!

1. Open the `assets/metrics.py` file.
2. At the end of the file, define a new asset called `airport_trips` with the context argument and the existing `DuckDBResource` named `database`:
2. At the end of the file, define a new asset called `airport_trips` that takes the existing `DuckDBResource` named `database` and returns a `MaterializeResult`, indicating that we'll be returning some metadata:

```python
def airport_trips(context, database: DuckDBResource):
def airport_trips(database: DuckDBResource) -> MaterializeResult:
```

3. Add the asset decorator to the `airport_trips` function and specify the `location_metrics` model as a dependency:
@@ -95,61 +97,61 @@ Now we’re ready to create the asset!
@asset(
deps=["location_metrics"],
)
def airport_trips(context, database: DuckDBResource):
def airport_trips(database: DuckDBResource) -> MaterializeResult:
```

**Note:** Because Dagster doesn’t discriminate and treats all dbt models as assets, you’ll add this dependency just like you would with any other asset.

4. Fill in the body of the function with the following code to follow a similar pattern to your project’s existing pipelines: query for the data, use a library to generate a chart, save the chart as a file, and embed the chart:

```python
@asset(
    deps=["location_metrics"],
)
def airport_trips(context, database: DuckDBResource):
    """
    A chart of where trips from the airport go
    """
    query = """
        select
            zone,
            destination_borough,
            trips
        from location_metrics
        where from_airport
    """
    with database.get_connection() as conn:
        airport_trips = conn.execute(query).fetch_df()
    fig = px.bar(
        airport_trips,
        x="zone",
        y="trips",
        color="destination_borough",
        barmode="relative",
        labels={
            "zone": "Zone",
            "trips": "Number of Trips",
            "destination_borough": "Destination Borough"
        },
    )
    pio.write_image(fig, constants.AIRPORT_TRIPS_FILE_PATH)
    with open(constants.AIRPORT_TRIPS_FILE_PATH, 'rb') as file:
        image_data = file.read()
    # Convert the image data to base64
    base64_data = base64.b64encode(image_data).decode('utf-8')
    md_content = f"![Image](data:image/jpeg;base64,{base64_data})"
    #TODO: Use `MaterializeResult` instead
    context.add_output_metadata({
        "preview": MetadataValue.md(md_content),
        "data": MetadataValue.json(airport_trips.to_dict(orient="records"))
    })
@asset(
    deps=["location_metrics"],
)
def airport_trips(database: DuckDBResource) -> MaterializeResult:
    """
    A chart of where trips from the airport go
    """

    query = """
        select
            zone,
            destination_borough,
            trips
        from location_metrics
        where from_airport
    """

    with database.get_connection() as conn:
        airport_trips = conn.execute(query).fetch_df()

    fig = px.bar(
        airport_trips,
        x="zone",
        y="trips",
        color="destination_borough",
        barmode="relative",
        labels={
            "zone": "Zone",
            "trips": "Number of Trips",
            "destination_borough": "Destination Borough"
        },
    )

    pio.write_image(fig, constants.AIRPORT_TRIPS_FILE_PATH)

    with open(constants.AIRPORT_TRIPS_FILE_PATH, 'rb') as file:
        image_data = file.read()

    # Convert the image data to base64
    base64_data = base64.b64encode(image_data).decode('utf-8')
    md_content = f"![Image](data:image/png;base64,{base64_data})"

    return MaterializeResult(
        metadata={
            "preview": MetadataValue.md(md_content)
        }
    )
```

5. Reload your code location to see the new `airport_trips` asset within the `metrics` group. Notice how the asset graph links the dependency between the `location_metrics` dbt asset and the new `airport_trips` chart asset.
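The preview metadata works by embedding the chart as a base64 data URI inside a Markdown image tag. As a sketch, a hypothetical `image_markdown` helper could derive the MIME type from the file extension instead of hard-coding one, so a `.png` chart is labeled `image/png`:

```python
import base64
import mimetypes


def image_markdown(image_bytes: bytes, filename: str) -> str:
    """Embed raw image bytes in a Markdown image tag using a base64 data URI."""
    # Infer the MIME type from the extension (e.g. .png -> image/png)
    mime_type, _ = mimetypes.guess_type(filename)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot infer an image MIME type from {filename!r}")
    # base64 output is ASCII, so decoding as UTF-8 is safe
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return f"![Image](data:{mime_type};base64,{encoded})"
```

Inside the asset, something like `image_markdown(image_data, "airport_trips.png")` would yield a string that can be passed to `MetadataValue.md`.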
@@ -8,7 +8,25 @@ lesson: '5'

Override the `get_group_name` method in your `CustomizedDagsterDbtTranslator` to group each dbt model by their layer (`marts` and `staging`).

**Hint:** `dbt_resource_props`
**Hint:** `dbt_resource_props` is a Python dictionary whose structure looks something like this:

```json
{
"database": "data",
"schema": "main",
"name": "stg_trips",
"resource_type": "model",
"package_name": "analytics",
"path": "staging/stg_trips.sql",
"original_file_path": "models/staging/stg_trips.sql",
"unique_id": "model.analytics.stg_trips",
"fqn": ["analytics", "staging", "stg_trips"],
"alias": "stg_trips",
... #other properties
}
```

`get_group_name` should return a string naming the group each dbt model belongs to. What property of `dbt_resource_props` can you access (and maybe even index!) to group the models by layer (e.g., `marts` or `staging`)?
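To see how the hint plays out, here is a plain-Python sketch against a trimmed copy of the sample dictionary. The `layer_for` function is hypothetical; in the exercise, the same lookup would sit inside your `get_group_name` override:

```python
# Trimmed copy of the sample dbt_resource_props shown in the hint
stg_trips_props = {
    "name": "stg_trips",
    "resource_type": "model",
    "path": "staging/stg_trips.sql",
    "fqn": ["analytics", "staging", "stg_trips"],
}


def layer_for(dbt_resource_props: dict) -> str:
    """Return the folder directly under models/ (e.g. 'staging' or 'marts')."""
    # fqn is [project_name, <subfolders...>, model_name]; index 1 is the top-level folder
    return dbt_resource_props["fqn"][1]
```

This assumes every model lives in a subfolder of `models/`. A model saved directly in `models/` has a two-element `fqn`, so index 1 would be the model's name rather than a layer.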

---

@@ -10,4 +10,4 @@ At this point, you have a fully integrated Dagster and dbt project! You’ve lea

In this lesson, we’ll deploy your Dagster+dbt project so it runs in both local and production environments. We’ll walk through some considerations involved in bundling your dbt project with Dagster.

You’ll learn how to prepare the project for deployment to Dagster Cloud, including pushing the project to GitHub and setting up CI/CD to factor in your dbt project. We’re using Dagster Cloud because it’s a standardized and controlled experience that we can walk you through, but all of the general patterns can be applied to however you deploy Dagster.
You’ll learn how to deploy your unified Dagster and dbt project to production, including pushing the project to GitHub and setting up CI/CD to factor in your dbt project. We’ll use Dagster Cloud because it’s a standardized and controlled experience that we can walk you through, but all of the general patterns apply however you deploy Dagster.
@@ -1,12 +1,14 @@
---
title: "Lesson 7: Creating the manifest with GitHub Actions"
title: "Lesson 7: Creating the manifest during deployment"
module: 'dbt_dagster'
lesson: '7'
---

# Creating the manifest with GitHub Actions
# Creating the manifest during deployment

To recap, our deployment failed in the last section because Dagster couldn’t find a dbt manifest file, which it needs to turn dbt models into Dagster assets. This is because we built this file by running `dbt parse` during local development. You ran this manually in Lesson 3 and improved the experience in Lesson 4. However, Dagster Cloud’s out-of-the-box `deploy.yml` GitHub Action isn’t aware that you’re also trying to deploy a dbt project with Dagster.
To recap, our deployment failed in the last section because Dagster couldn’t find a dbt manifest file, which it needs to turn dbt models into Dagster assets. This is because we built this file by running `dbt parse` during local development. You ran this manually in Lesson 3 and improved the experience in Lesson 4. However, you'll also need to build your dbt manifest file during deployment. We recommend adopting CI/CD to automate this process.

In this case, Dagster Cloud’s out-of-the-box `deploy.yml` GitHub Action isn’t aware that you’re also trying to deploy a dbt project with Dagster.

To get our deployment working, we need to add a step to our GitHub Actions workflow that runs the dbt commands required to generate the `manifest.json`. Specifically, we need to run `dbt deps` and `dbt parse` in the dbt project, just like you did during local development.
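The workflow step needs to run `dbt deps` followed by `dbt parse`. The same sequence can be sketched in Python, for instance as a local pre-deploy check. `build_dbt_manifest` is a hypothetical helper that assumes dbt's default `target/` output directory, and it fails fast if `manifest.json` is missing, which is the error that broke the deployment in the previous section:

```python
import subprocess
from pathlib import Path
from typing import Callable

# Anything with subprocess.run's calling convention (swappable in tests)
Runner = Callable[..., subprocess.CompletedProcess]


def build_dbt_manifest(project_dir: Path, run: Runner = subprocess.run) -> Path:
    """Run `dbt deps` then `dbt parse`, failing fast if no manifest is produced."""
    for command in (["dbt", "deps"], ["dbt", "parse"]):
        # check=True raises CalledProcessError if a dbt command exits non-zero
        run(command, cwd=project_dir, check=True)
    manifest = project_dir / "target" / "manifest.json"
    if not manifest.exists():
        raise FileNotFoundError(f"dbt parse did not produce {manifest}")
    return manifest
```

Running the commands in order matters: `dbt parse` can only resolve models from packages that `dbt deps` has already installed.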

@@ -15,7 +17,7 @@ To get our deployment working, we need to add a step to our GitHub Actions workf
3. Locate the `Checkout for Python Executable Deploy` step, which should be on or near line 38.
4. After this step, add the following:

```bash
```yaml
- name: Parse dbt project and package with Dagster project
  if: steps.prerun.outputs.result == 'pex-deploy'
  run: |

1 comment on commit 06961cd

@github-actions
Deploy preview for dagster-university ready!

✅ Preview
https://dagster-university-qyuaoe2bz-elementl.vercel.app

Built with commit 06961cd.
This pull request is being automatically deployed with vercel-action
