[du] full sending on type annotations (#21726)
## Summary & Motivation

## How I Tested These Changes
tacastillo committed May 13, 2024
1 parent 11011ab commit 7351009
Showing 20 changed files with 51 additions and 35 deletions.
Original file line number Diff line number Diff line change
@@ -32,7 +32,7 @@ In Lesson 9, you created the `adhoc_request` asset. During materialization, the
end_date: str

@asset
def adhoc_request(config: AdhocRequestConfig, taxi_zones, taxi_trips, database: DuckDBResource):
def adhoc_request(config: AdhocRequestConfig, taxi_zones, taxi_trips, database: DuckDBResource) -> None:
"""
The response to an request made in the `requests` directory.
See `requests/README.md` for more information.
@@ -148,7 +148,7 @@ class AdhocRequestConfig(Config):
end_date: str

@asset
def adhoc_request(config: AdhocRequestConfig, taxi_zones, taxi_trips, database: DuckDBResource):
def adhoc_request(config: AdhocRequestConfig, taxi_zones, taxi_trips, database: DuckDBResource) -> None:
"""
The response to an request made in the `requests` directory.
See `requests/README.md` for more information.
Original file line number Diff line number Diff line change
@@ -21,7 +21,7 @@ from dagster import MaterializeResult
@asset(
group_name="raw_files",
)
def taxi_zones_file():
def taxi_zones_file() -> None:
"""
The raw CSV file for the taxi zones dataset. Sourced from the NYC Open Data portal.
"""
Original file line number Diff line number Diff line change
@@ -23,7 +23,7 @@ Docstrings are defined by including a string, surrounded by triple quotes (`”
from dagster import asset

@asset
def taxi_zones_file():
def taxi_zones_file() -> None:
"""
The raw CSV file for the taxi zones dataset. Sourced from the NYC Open Data portal.
"""
@@ -49,7 +49,7 @@ from dagster import asset
@asset(
description="The raw CSV file for the taxi zones dataset. Sourced from the NYC Open Data portal."
)
def taxi_zones_file():
def taxi_zones_file() -> None:
"""
This will not show in the Dagster UI
"""
Original file line number Diff line number Diff line change
@@ -56,7 +56,7 @@ from dagster import asset
@asset(
group_name="raw_files",
)
def taxi_zones_file():
def taxi_zones_file() -> None:
"""
The raw CSV file for the taxi zones dataset. Sourced from the NYC Open Data portal.
"""
Original file line number Diff line number Diff line change
@@ -33,7 +33,7 @@ Let’s add metadata to the `taxi_trips_file` asset to demonstrate further. This
partitions_def=monthly_partition,
group_name="raw_files",
)
def taxi_trips_file(context):
def taxi_trips_file(context) -> None:
"""
The raw parquet files for the taxi trips dataset. Sourced from the NYC Open Data portal.
"""
@@ -65,8 +65,22 @@ Let’s add metadata to the `taxi_trips_file` asset to demonstrate further. This
)
```

5. Then, since we're now returning something, let's update the return type of the asset to `MaterializeResult`:

```python
from dagster import asset, MaterializeResult

@asset(
partitions_def=monthly_partition,
group_name="raw_files",
)
def taxi_trips_file(context) -> MaterializeResult:
```


Let’s break down what’s happening here:

- Rather than returning nothing, we return some information about the materialization that happened, using the `MaterializeResult` class.
- The `metadata` parameter accepts a `dict`, where the key is the label or name of the metadata and the value is the data itself. In this case, the key is `Number of records`. The value in this example is everything after `Number of records`.
- Using `MetadataValue.int`, the value of the `num_rows` variable is typed as an integer. This tells Dagster to render the data as an integer.

@@ -80,7 +94,7 @@ Let’s add metadata to the `taxi_trips_file` asset to demonstrate further. This
partitions_def=monthly_partition,
group_name="raw_files",
)
def taxi_trips_file(context):
def taxi_trips_file(context) -> MaterializeResult:
"""
The raw parquet files for the taxi trips dataset. Sourced from the NYC Open Data portal.
"""
Original file line number Diff line number Diff line change
@@ -12,7 +12,7 @@ To better understand how materialization works, let’s take another look at the

```python file=/dagster-university/lesson_3.py startafter=start_taxi_trips_file_asset endbefore=end_taxi_trips_file_asset
@asset
def taxi_trips_file():
def taxi_trips_file() -> None:
"""The raw parquet files for the taxi trips dataset. Sourced from the NYC Open Data portal."""
month_to_fetch = "2023-03"
raw_trips = requests.get(
Original file line number Diff line number Diff line change
@@ -22,7 +22,7 @@ The asset you built should look similar to the following code. Click **View answ

```python {% obfuscated="true" %}
@asset
def taxi_zones_file():
def taxi_zones_file() -> None:
"""
The raw CSV file for the taxi zones dataset. Sourced from the NYC Open Data portal.
"""
Original file line number Diff line number Diff line change
@@ -19,10 +19,10 @@ Your first asset, which you’ll name `taxi_trips_file`, will retrieve the yello
from . import constants
```

3. Below the imports, add the following code to create a function named `taxi_trips_file`:
3. Below the imports, let's define a function that takes no inputs and returns nothing (type-annotated with `None`). Add the following code to create this function, named `taxi_trips_file`:

```python
def taxi_trips_file():
def taxi_trips_file() -> None:
"""
The raw parquet files for the taxi trips dataset. Sourced from the NYC Open Data portal.
"""
@@ -51,7 +51,7 @@ Your first asset, which you’ll name `taxi_trips_file`, will retrieve the yello
from dagster import asset

@asset
def taxi_trips_file():
def taxi_trips_file() -> None:
"""
The raw parquet files for the taxi trips dataset. Sourced from the NYC Open Data portal.
"""
@@ -65,3 +65,5 @@ Your first asset, which you’ll name `taxi_trips_file`, will retrieve the yello
```

That’s it - you’ve created your first Dagster asset! Using the `@asset` decorator, you can easily turn any existing Python function into a Dagster asset.

**Questions about the `-> None` bit?** That's a Python feature called **type annotation**. In this case, it's saying that the function returns nothing. You can learn more about type annotations in the [Python documentation](https://docs.python.org/3/library/typing.html). We highly recommend using type annotations in your code to make it easier to read and understand.
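
As a small illustration of the note above, the following sketch shows how Python records a `-> None` annotation on a function without enforcing it at runtime. The function body and the `count_rows` helper are hypothetical stand-ins, not code from the course project.

```python
from typing import get_type_hints


def taxi_trips_file() -> None:
    """Stand-in for the asset body: does its work and returns nothing."""
    return None


def count_rows(rows: list) -> int:
    """Hypothetical helper: takes a list and returns its length as an int."""
    return len(rows)


# Annotations are stored on the function object; Python itself does not
# check them when the function is called.
print(taxi_trips_file.__annotations__)       # {'return': None}
print(get_type_hints(count_rows)["return"])  # <class 'int'>
```

Type checkers and editors can read these annotations to flag mismatches before the code runs.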
Original file line number Diff line number Diff line change
@@ -16,7 +16,7 @@ import requests
from dagster import asset

@asset
def taxi_trips_file():
def taxi_trips_file() -> None:
"""
The raw parquet files for the taxi trips dataset. Sourced from the NYC Open Data portal.
"""
Original file line number Diff line number Diff line change
@@ -37,7 +37,7 @@ Having all of your assets in one file becomes difficult to manage. Let’s separ
@asset(
deps=["taxi_trips", "taxi_zones"]
)
def manhattan_stats():
def manhattan_stats() -> None:
```

4. Now, let’s add the logic to calculate `manhattan_stats`. Update the `manhattan_stats` asset definition to reflect the changes below:
@@ -46,7 +46,7 @@ Having all of your assets in one file becomes difficult to manage. Let’s separ
@asset(
deps=["taxi_trips", "taxi_zones"]
)
def manhattan_stats():
def manhattan_stats() -> None:
query = """
select
zones.zone,
@@ -96,7 +96,7 @@ In this section, you’ll create an asset that depends on `manhattan_stats`, loa
@asset(
deps=["manhattan_stats"],
)
def manhattan_map():
def manhattan_map() -> None:
trips_by_zone = gpd.read_file(constants.MANHATTAN_STATS_FILE_PATH)

fig = px.choropleth_mapbox(trips_by_zone,
Original file line number Diff line number Diff line change
@@ -29,7 +29,7 @@ The asset you built should look similar to the following code. Click **View answ
@asset(
deps=["taxi_zones_file"]
)
def taxi_zones():
def taxi_zones() -> None:
sql_query = f"""
create or replace table zones as (
select
Original file line number Diff line number Diff line change
@@ -66,7 +66,7 @@ import pandas as pd
@asset(
deps=["taxi_trips"]
)
def trips_by_week():
def trips_by_week() -> None:
conn = duckdb.connect(os.getenv("DUCKDB_DATABASE"))

current_date = datetime.strptime("2023-03-01", constants.DATE_FORMAT)
Original file line number Diff line number Diff line change
@@ -21,7 +21,7 @@ Now that you have a query that produces an asset, let’s use Dagster to manage
@asset(
deps=["taxi_trips_file"]
)
def taxi_trips():
def taxi_trips() -> None:
"""
The raw taxi trips dataset, loaded into a DuckDB database
"""
Original file line number Diff line number Diff line change
@@ -30,7 +30,7 @@ We’ll assume your code looks like the following for the rest of the module. If
@asset(
deps=["taxi_zones_file"],
)
def taxi_zones(database: DuckDBResource):
def taxi_zones(database: DuckDBResource) -> None:
"""
The raw taxi zones dataset, loaded into a DuckDB database.
"""
@@ -69,7 +69,7 @@ Update the `manhattan_stats` asset:
@asset(
deps=["taxi_trips", "taxi_zones"]
)
def manhattan_stats(database: DuckDBResource):
def manhattan_stats(database: DuckDBResource) -> None:
"""
Metrics on taxi trips in Manhattan
"""
@@ -104,7 +104,7 @@ Update the `trips_by_week` asset:
@asset(
deps = ["taxi_trips"]
)
def trips_by_week(database: DuckDBResource):
def trips_by_week(database: DuckDBResource) -> None:

current_date = datetime.strptime("2023-01-01", constants.DATE_FORMAT)
end_date = datetime.now()
Original file line number Diff line number Diff line change
@@ -12,7 +12,7 @@ Throughout this module, you’ve used DuckDB to store and transform your data. E
@asset(
deps=["taxi_trips_file"],
)
def taxi_trips():
def taxi_trips() -> None:
...
conn = duckdb.connect(os.getenv("DUCKDB_DATABASE"))
...
Original file line number Diff line number Diff line change
@@ -30,7 +30,7 @@ from dagster import asset
@asset(
deps=["taxi_trips_file"],
)
def taxi_trips():
def taxi_trips() -> None:
sql_query = """
create or replace table taxi_trips as (
select
@@ -71,7 +71,7 @@ from dagster import asset
@asset(
deps=["taxi_trips_file"],
)
def taxi_trips(database: DuckDBResource):
def taxi_trips(database: DuckDBResource) -> None:
sql_query = """
create or replace table taxi_trips as (
select
Original file line number Diff line number Diff line change
@@ -12,7 +12,7 @@ Starting with `taxi_trips_file`, the asset code should currently look like this:

```python
@asset
def taxi_trips_file():
def taxi_trips_file() -> None:
"""
The raw parquet files for the taxi trips dataset. Sourced from the NYC Open Data portal.
"""
@@ -52,7 +52,7 @@ To add the partition to the asset:
@asset(
partitions_def=monthly_partition
)
def taxi_trips_file(context: AssetExecutionContext):
def taxi_trips_file(context: AssetExecutionContext) -> None:
```

**Note**: The `context` argument isn’t specific to partitions. However, this is the first time you've used it in Dagster University. The `context` argument provides information about how Dagster is running and materializing your asset. For example, you can use it to find out which partition Dagster is materializing, which job triggered the materialization, or what metadata was attached to its previous materializations.
@@ -63,7 +63,7 @@ To add the partition to the asset:
@asset(
partitions_def=monthly_partition
)
def taxi_trips_file(context):
def taxi_trips_file(context) -> None:
partition_date_str = context.partition_key
```

@@ -73,7 +73,7 @@ To add the partition to the asset:
@asset(
partitions_def=monthly_partition
)
def taxi_trips_file(context):
def taxi_trips_file(context) -> None:
partition_date_str = context.partition_key
month_to_fetch = partition_date_str[:-3]
```
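
The slice in the step above can be checked on its own. This sketch assumes the partition key is a date string in `YYYY-MM-DD` form; the literal value below is a hypothetical example rather than output from Dagster.

```python
# Hypothetical partition key, assuming a YYYY-MM-DD format
partition_date_str = "2023-03-01"

# Dropping the last three characters ("-01") leaves the year and month
month_to_fetch = partition_date_str[:-3]

print(month_to_fetch)  # 2023-03
```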
@@ -86,7 +86,7 @@ from ..partitions import monthly_partition
@asset(
partitions_def=monthly_partition
)
def taxi_trips_file(context):
def taxi_trips_file(context) -> None:
"""
The raw parquet files for the taxi trips dataset. Sourced from the NYC Open Data portal.
"""
Original file line number Diff line number Diff line change
@@ -44,7 +44,7 @@ from ..partitions import monthly_partitions
deps=["taxi_trips_file"],
partitions_def=monthly_partition,
)
def taxi_trips(context: AssetExecutionContext, database: DuckDBResource):
def taxi_trips(context: AssetExecutionContext, database: DuckDBResource) -> None:
"""
The raw taxi trips dataset, loaded into a DuckDB database, partitioned by month.
"""
Original file line number Diff line number Diff line change
@@ -26,7 +26,7 @@ from ..partitions import weekly_partition
deps=["taxi_trips"],
partitions_def=weekly_partition
)
def trips_by_week(context: AssetExecutionContext, database: DuckDBResource):
def trips_by_week(context: AssetExecutionContext, database: DuckDBResource) -> None:
"""
The number of trips per week, aggregated by week.
"""
Original file line number Diff line number Diff line change
@@ -32,7 +32,7 @@ Now that you’ve defined how the asset can be materialized, let’s create the
@asset(
deps=["taxi_zones", "taxi_trips"]
)
def adhoc_request(config: AdhocRequestConfig, database: DuckDBResource):
def adhoc_request(config: AdhocRequestConfig, database: DuckDBResource) -> None:
```

3. When the report is written to a file, it should have a similar name to the request. A template has been provided in `assets/constants.py` that contains a template named `REQUEST_DESTINATION_TEMPLATE_FILE_PATH` .
@@ -130,7 +130,7 @@ class AdhocRequestConfig(Config):
@asset(
deps=["taxi_zones", "taxi_trips"]
)
def adhoc_request(config: AdhocRequestConfig, database: DuckDBResource):
def adhoc_request(config: AdhocRequestConfig, database: DuckDBResource) -> None:
"""
The response to an request made in the `requests` directory.
See `requests/README.md` for more information.
