link to examples in concepts pages (#7753)
jamiedemaria committed May 10, 2022
1 parent b0e3ca2 commit 35a198b
Showing 15 changed files with 209 additions and 21 deletions.
18 changes: 18 additions & 0 deletions docs/content/concepts/assets/software-defined-assets.mdx
@@ -412,3 +412,21 @@ def my_asset(context):
## Further Reading

Interested in learning more about software-defined assets and working through a more complex example? Check out our [guide on software-defined assets](/guides/dagster/software-defined-assets) and our [example project](https://github.com/dagster-io/dagster/tree/master/examples/modern_data_stack_assets) that integrates software-defined assets with other Modern Data Stack tools.

## See it in action

For more examples of software-defined assets, check out the following in our [SDA Hacker News example](https://github.com/dagster-io/dagster/tree/master/examples/hacker_news_assets):

- [Defining an asset](https://github.com/dagster-io/dagster/blob/master/examples/hacker_news_assets/hacker_news_assets/activity_analytics/assets/activity_forecast.py)
- [Loading assets from dbt](https://github.com/dagster-io/dagster/blob/master/examples/hacker_news_assets/hacker_news_assets/activity_analytics/\__init\_\_.py)
- [Per-asset IO manager](https://github.com/dagster-io/dagster/blob/master/examples/hacker_news_assets/hacker_news_assets/core/assets/items.py)
- [Partitioned assets](https://github.com/dagster-io/dagster/blob/master/examples/hacker_news_assets/hacker_news_assets/core/assets/items.py)
- [AssetGroups](https://github.com/dagster-io/dagster/blob/master/examples/hacker_news_assets/hacker_news_assets/core/\__init\_\_.py)

Our [Modern Data Stack example](https://github.com/dagster-io/dagster/tree/master/examples/modern_data_stack_assets) also covers:

- [Defining assets](https://github.com/dagster-io/dagster/blob/master/examples/modern_data_stack_assets/modern_data_stack_assets/assets.py)
- [Loading assets from dbt](https://github.com/dagster-io/dagster/blob/master/examples/modern_data_stack_assets/modern_data_stack_assets/assets.py)
- [Loading assets from Airbyte](https://github.com/dagster-io/dagster/blob/master/examples/modern_data_stack_assets/modern_data_stack_assets/assets.py)

Our [Bollinger example](https://github.com/dagster-io/dagster/tree/master/examples/bollinger) also covers software-defined assets.
6 changes: 6 additions & 0 deletions docs/content/concepts/configuration/config-schema.mdx
@@ -219,3 +219,9 @@ resources:
my_str: foo
my_int: 1
```

## See it in action

For more examples of config schemas, check out the following in our [Hacker News example](https://github.com/dagster-io/dagster/tree/master/examples/hacker_news):

- [Config schema on a resource](https://github.com/dagster-io/dagster/blob/master/examples/hacker_news/hacker_news/resources/parquet_io_manager.py)
6 changes: 6 additions & 0 deletions docs/content/concepts/configuration/configured.mdx
@@ -149,3 +149,9 @@ def datasets():
sample_dataset()
full_dataset()
```

## See it in action

For more examples of the `configured` API, check out the following in our [Hacker News example](https://github.com/dagster-io/dagster/tree/master/examples/hacker_news):

- [Using configured() for resources](https://github.com/dagster-io/dagster/blob/master/examples/hacker_news/hacker_news/resources/\__init\_\_.py)
10 changes: 10 additions & 0 deletions docs/content/concepts/io-management/io-managers.mdx
@@ -319,3 +319,13 @@ class DataframeTableIOManagerWithMetadata(IOManager):
Any entries yielded this way will be attached to the `Handled Output` event for this output.

Additionally, if you have specified that this `handle_output` function will be writing to an asset by defining a `get_output_asset_key` function, these metadata entries will also be attached to the materialization event created for that asset. You can learn more about this functionality in the [Asset Docs](/concepts/assets/asset-materializations).

## See it in action

For more examples of IO Managers, check out the following in our [Hacker News example](https://github.com/dagster-io/dagster/tree/master/examples/hacker_news):

- [Snowflake IO Manager](https://github.com/dagster-io/dagster/blob/master/examples/hacker_news/hacker_news/resources/snowflake_io_manager.py)
- [Parquet IO Manager](https://github.com/dagster-io/dagster/blob/master/examples/hacker_news/hacker_news/resources/parquet_io_manager.py)
- [S3 IO Manager with custom bucket](https://github.com/dagster-io/dagster/blob/master/examples/hacker_news/hacker_news/resources/common_bucket_s3\_pickle_io_manager.py)

Our [Bollinger example](https://github.com/dagster-io/dagster/tree/master/examples/bollinger) and [New York Times example](https://github.com/dagster-io/dagster/tree/master/examples/nyt-feed) also cover writing custom IO Managers.
12 changes: 12 additions & 0 deletions docs/content/concepts/ops-jobs-graphs/jobs-graphs.mdx
@@ -539,3 +539,15 @@ def define_dep_dsl_graph() -> GraphDefinition:
path = os.path.join(os.path.dirname(__file__), "my_graph.yaml")
return construct_graph_with_yaml(path, [add_one, add_two, subtract])
```

## See it in action

For more examples of jobs, check out the following in our [Hacker News example](https://github.com/dagster-io/dagster/tree/master/examples/hacker_news):

- [Creating multiple jobs from a graph](https://github.com/dagster-io/dagster/blob/master/examples/hacker_news/hacker_news/jobs/dbt_metrics.py)
- [Specifying config on a job](https://github.com/dagster-io/dagster/blob/master/examples/hacker_news/hacker_news/jobs/hacker_news_api_download.py)

Our [New York Times example](https://github.com/dagster-io/dagster/tree/master/examples/nyt-feed) covers:

- [Conditional branching](https://github.com/dagster-io/dagster/blob/master/examples/nyt-feed/nyt_feed/nyt_feed_job.py)
- [Using the same op twice](https://github.com/dagster-io/dagster/blob/master/examples/nyt-feed/nyt_feed/nyt_feed_job.py)
8 changes: 8 additions & 0 deletions docs/content/concepts/ops-jobs-graphs/ops.mdx
@@ -177,3 +177,11 @@ def my_op_factory(

return my_inner_op
```

## See it in action

For more examples of ops, check out the following in our [Hacker News example](https://github.com/dagster-io/dagster/tree/master/examples/hacker_news):

- [Specifying Ins and Outs](https://github.com/dagster-io/dagster/blob/master/examples/hacker_news/hacker_news/ops/comment_stories.py)
- [Using resources](https://github.com/dagster-io/dagster/blob/master/examples/hacker_news/hacker_news/ops/download_items.py)
- [Per-Output IO Manager](https://github.com/dagster-io/dagster/blob/master/examples/hacker_news/hacker_news/ops/user_top_recommended_stories.py)
@@ -345,3 +345,10 @@ A few rules govern partition-to-partition dependencies:

- When the upstream asset and downstream asset have the same <PyObject object="PartitionsDefinition" />, each partition in the downstream asset depends on the same partition in the upstream asset.
- When the upstream asset and downstream asset are both time window-partitioned, each partition in the downstream asset depends on all partitions in the upstream asset that intersect its time window. For example, if an asset with a <PyObject object="DailyPartitionsDefinition" /> depends on an asset with an <PyObject object="HourlyPartitionsDefinition" />, then partition `2022-04-12` of the daily asset would depend on 24 partitions of the hourly asset: `2022-04-12-00:00` through `2022-04-12-23:00`.
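
The time-window rule above can be illustrated with a minimal plain-Python sketch. It does not call Dagster; the helper name `hourly_partitions_for_day` is hypothetical, and the partition key formats are taken from the example in the text.

```python
from datetime import datetime, timedelta
from typing import List


def hourly_partitions_for_day(day_key: str) -> List[str]:
    """Return the hourly partition keys whose windows intersect one daily window.

    Hypothetical helper for illustration only; Dagster computes this mapping itself.
    """
    day_start = datetime.strptime(day_key, "%Y-%m-%d")
    day_end = day_start + timedelta(days=1)

    keys = []
    hour_start = day_start
    # Each hourly window [hour_start, hour_start + 1h) lies inside [day_start, day_end).
    while hour_start < day_end:
        keys.append(hour_start.strftime("%Y-%m-%d-%H:%M"))
        hour_start += timedelta(hours=1)
    return keys


keys = hourly_partitions_for_day("2022-04-12")
print(len(keys), keys[0], keys[-1])
```

For the daily partition `2022-04-12`, this yields 24 keys, from `2022-04-12-00:00` through `2022-04-12-23:00`.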

## See it in action

For more examples of partitions, check out the following in our [Hacker News example](https://github.com/dagster-io/dagster/tree/master/examples/hacker_news):

- [Defining an hourly partitioned schedule](https://github.com/dagster-io/dagster/blob/master/examples/hacker_news/hacker_news/schedules/hourly_hn_download_schedule.py)
- [Specifying the partition to a job](https://github.com/dagster-io/dagster/blob/master/examples/hacker_news/hacker_news/jobs/hacker_news_api_download.py)
@@ -225,6 +225,13 @@ def test_configurable_job_schedule():

If your `@schedule`-decorated function doesn't have a context parameter, you don't need to provide one when invoking it.

## See it in action

For more examples of schedules, check out the following in our [Hacker News example](https://github.com/dagster-io/dagster/tree/master/examples/hacker_news):

- [Defining an hourly partitioned schedule](https://github.com/dagster-io/dagster/blob/master/examples/hacker_news/hacker_news/schedules/hourly_hn_download_schedule.py)
- [Specifying the partition to a job](https://github.com/dagster-io/dagster/blob/master/examples/hacker_news/hacker_news/jobs/hacker_news_api_download.py)

## Troubleshooting

Try these steps if you're trying to run a schedule and are running into problems.
@@ -612,3 +612,9 @@ def uses_db_connection():
```

If a resource you want to initialize has dependencies on other resources, those can be included in the dictionary passed to <PyObject object="build_resources"/>. For more in-depth usage, check out the [Initializing Resources Outside of Execution](/concepts/resources#initializing-resources-outside-of-execution) section.

## See it in action

For more examples of sensors, check out the following in our [Hacker News example](https://github.com/dagster-io/dagster/tree/master/examples/hacker_news):

- [Sensor factory](https://github.com/dagster-io/dagster/blob/master/examples/hacker_news/hacker_news/sensors/hn_tables_updated_sensor.py)
8 changes: 8 additions & 0 deletions docs/content/concepts/resources.mdx
@@ -260,3 +260,11 @@ When constructing a job that includes that op, we provide the resource `client`,
def connect():
get_client()
```

## See it in action

For more examples of resources, check out the following in our [Hacker News example](https://github.com/dagster-io/dagster/tree/master/examples/hacker_news):

- [Hacker News resource](https://github.com/dagster-io/dagster/blob/master/examples/hacker_news/hacker_news/resources/hn_resource.py)
- Using resources in ops [1](https://github.com/dagster-io/dagster/blob/master/examples/hacker_news/hacker_news/ops/download_items.py) [2](https://github.com/dagster-io/dagster/blob/master/examples/hacker_news/hacker_news/ops/dbt.py)
- [Specifying resources on jobs and supplying config](https://github.com/dagster-io/dagster/blob/master/examples/hacker_news/hacker_news/jobs/hacker_news_api_download.py)
4 changes: 4 additions & 0 deletions docs/content/concepts/types.mdx
@@ -113,3 +113,7 @@ Dagster types peacefully coexist with Python type annotations. In this example,
def double_even_with_annotations(num: int) -> int:
return num
```

## See it in action

For more examples of the Dagster type system, check out our [Bollinger example](https://github.com/dagster-io/dagster/tree/master/examples/bollinger).
107 changes: 107 additions & 0 deletions docs/next/__tests__/mdxExternalLinks.test.ts
@@ -0,0 +1,107 @@
import fs from "fs";
import path from "path";
import fg from "fast-glob";
import { Node } from "hast";
import visit from "unist-util-visit";
import matter from "gray-matter";

// remark
import mdx from "remark-mdx";
import remark from "remark";

const ROOT_DIR = path.resolve(__dirname, "../../");
const DOCS_DIR = path.resolve(ROOT_DIR, "content");
// Root of the repository checkout, one level above the docs directory
const DAGSTER_DIR = path.resolve(ROOT_DIR, "..");

interface LinkElement extends Node {
  type: "link" | "image";
  url: string;
}

test("No dead external MDX links", async () => {
  const allMdxFilePaths = await fg(["**/*.mdx"], { cwd: DOCS_DIR });

  const astStore: { [filePath: string]: Node } = {};
  const allExternalLinksStore: { [filePath: string]: Array<string> } = {};

  // Parse mdx files to find all external links and populate the store
  await Promise.all(
    allMdxFilePaths.map(async (relativeFilePath) => {
      const absolutePath = path.resolve(DOCS_DIR, relativeFilePath);
      const fileContent = await fs.promises.readFile(absolutePath, "utf-8");
      // separate content from front matter; only the content is parsed
      const { content } = matter(fileContent);
      astStore[relativeFilePath] = remark().use(mdx).parse(content);
    })
  );

  for (const filePath in astStore) {
    const externalLinks = collectExternalLinks(astStore[filePath]);
    allExternalLinksStore[filePath] = externalLinks;
  }

  const deadLinks: Array<{ sourceFile: string; deadLink: string }> = [];

  let linkCount = 0;

  for (const source in allExternalLinksStore) {
    const linkList = allExternalLinksStore[source];

    for (const link of linkList) {
      linkCount++;
      if (!isLinkLegit(link)) {
        deadLinks.push({
          sourceFile: path.resolve(DOCS_DIR, source),
          deadLink: link,
        });
      }
    }
  }

  // Sanity check to make sure the parser is working
  expect(linkCount).toBeGreaterThan(0);

  expect(deadLinks).toEqual([]);
});

// A link is legit if the path after "/master/" exists in the local checkout
function isLinkLegit(rawTarget: string): boolean {
  const splitter = new RegExp("/master/");

  // A link that ends at "/master" has no file path; treat it as the repo root
  // instead of passing undefined to path.resolve, which would throw
  const filePath = rawTarget.split(splitter)[1] || "";

  return fileExists(path.resolve(DAGSTER_DIR, filePath));
}

// traverse the mdx ast to find all links to our examples
function collectExternalLinks(tree: Node): Array<string> {
  const externalLinkRegex = /^(https?:\/\/github\.com\/dagster-io\/dagster\/.*\/master)/;
  const result: Array<string> = [];

  visit(tree, ["link", "image"], (node: LinkElement) => {
    const { url } = node;
    if (url.match(externalLinkRegex)) {
      result.push(url);
    }
  });

  return result;
}

function fileExists(filePath: string): boolean {
  try {
    fs.statSync(filePath);
    return true;
  } catch (_) {
    return false;
  }
}
Empty file.
19 changes: 2 additions & 17 deletions examples/hacker_news/hacker_news/jobs/hacker_news_api_download.py
@@ -1,12 +1,11 @@
-from datetime import datetime
-
 from hacker_news.ops.download_items import build_comments, build_stories, download_items
 from hacker_news.ops.id_range_for_time import id_range_for_time
 from hacker_news.resources import RESOURCES_LOCAL, RESOURCES_PROD, RESOURCES_STAGING
 from hacker_news.resources.hn_resource import hn_api_subsample_client, hn_snapshot_client
 from hacker_news.resources.partition_bounds import partition_bounds
+from hacker_news.schedules.hourly_hn_download_schedule import hourly_download_config

-from dagster import graph, hourly_partitioned_config, in_process_executor
+from dagster import graph, in_process_executor

DOWNLOAD_TAGS = {
"dagster-k8s/config": {
@@ -35,20 +34,6 @@ def hacker_news_api_download():
build_stories(items)


-@hourly_partitioned_config(start_date=datetime(2020, 12, 1))
-def hourly_download_config(start: datetime, end: datetime):
-    return {
-        "resources": {
-            "partition_bounds": {
-                "config": {
-                    "start": start.strftime("%Y-%m-%d %H:%M:%S"),
-                    "end": end.strftime("%Y-%m-%d %H:%M:%S"),
-                }
-            },
-        }
-    }
-
-
download_prod_job = hacker_news_api_download.to_job(
resource_defs={
**{
@@ -3,11 +3,15 @@
from dagster import hourly_partitioned_config


-@hourly_partitioned_config(start_date=datetime(2021, 1, 1))
-def hourly_download_schedule_config(start: datetime, end: datetime):
+@hourly_partitioned_config(start_date=datetime(2020, 12, 1))
+def hourly_download_config(start: datetime, end: datetime):
     return {
         "resources": {
-            "partition_start": {"config": start.strftime("%Y-%m-%d %H:%M:%S")},
-            "partition_end": {"config": end.strftime("%Y-%m-%d %H:%M:%S")},
+            "partition_bounds": {
+                "config": {
+                    "start": start.strftime("%Y-%m-%d %H:%M:%S"),
+                    "end": end.strftime("%Y-%m-%d %H:%M:%S"),
+                }
+            },
         }
     }
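
To see the run config that the relocated `hourly_download_config` would produce for one partition, here is a minimal sketch. It is a plain-Python stand-in, not the committed code: the `@hourly_partitioned_config` decorator is left out so the snippet runs without Dagster installed, and the function name `partition_bounds_config` is hypothetical.

```python
from datetime import datetime, timedelta

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def partition_bounds_config(start: datetime, end: datetime) -> dict:
    """Build the same config shape as the diff above (hypothetical, undecorated copy)."""
    return {
        "resources": {
            "partition_bounds": {
                "config": {
                    "start": start.strftime(TIME_FORMAT),
                    "end": end.strftime(TIME_FORMAT),
                }
            },
        }
    }


# First hourly window after the start date used in the diff (2020-12-01).
start = datetime(2020, 12, 1)
config = partition_bounds_config(start, start + timedelta(hours=1))
print(config["resources"]["partition_bounds"]["config"])
```

The change in the diff replaces two separate resources (`partition_start`, `partition_end`) with a single `partition_bounds` resource that carries both timestamps in its config.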

1 comment on commit 35a198b

@vercel vercel bot commented on 35a198b May 10, 2022