add metadata to jobs (#7849)
jamiedemaria committed Jun 1, 2022
1 parent ac59da6 commit cfe0766
Showing 23 changed files with 678 additions and 11 deletions.
4 changes: 4 additions & 0 deletions docs/content/_navigation.json
Original file line number Diff line number Diff line change
Expand Up @@ -118,6 +118,10 @@
{
"title": "Op Retries",
"path": "/concepts/ops-jobs-graphs/op-retries"
},
{
"title": "Metadata & Tags",
"path": "/concepts/ops-jobs-graphs/metadata-tags"
}
]
},
Expand Down
113 changes: 113 additions & 0 deletions docs/content/concepts/ops-jobs-graphs/metadata-tags.mdx
Original file line number Diff line number Diff line change
@@ -0,0 +1,113 @@
---
title: Metadata & Tags | Dagster
description: Metadata and tags provide two ways to attach information in Dagster
---

# Metadata & Tags

Metadata and tags provide two different ways for you to attach information to your jobs and the runs launched from those jobs. The main difference between metadata and tags is which "object" the information is attached to. Metadata is attached to a `JobDefinition` (specified using the `@job` decorator), while tags are attached to the runs that are created by executing a job.

## Metadata

Metadata allows you to attach information to a job. This information can be whatever you want, but possible use cases include:

- keeping track of the team responsible for maintaining a job
- linking to documentation or other resources
- displaying the git hash corresponding to the current job definition

Note: If you are running Dagster using separate Dagit and user code installations (more info [here](/deployment/overview)), then your Dagit installation must be >=0.14.18 to use metadata on jobs.

### Specifying Metadata

Metadata is attached to a job as a dictionary of key-value pairs. Each key must be a string, and each value can be any of the <PyObject object="MetadataValue" /> classes we provide. You can also use primitive Python types as values, and Dagster will convert them to the appropriate <PyObject object="MetadataValue" />.

```python file=/concepts/ops_jobs_graphs/metadata_tags.py startafter=start_metadata_on_job endbefore=end_metadata_on_job
@op
def my_op():
    return "Hello World!"


@job(
    metadata={
        "owner": "data team",  # will be converted to MetadataValue.text
        "docs": MetadataValue.url("https://docs.dagster.io"),
    }
)
def my_job_with_metadata():
    my_op()
```

In addition to adding metadata on the `@job` decorator, you can also add metadata using the <PyObject object="GraphDefinition" method="to_job" /> method.

```python file=/concepts/ops_jobs_graphs/metadata_tags.py startafter=start_metadata_on_graph_to_job endbefore=end_metadata_on_graph_to_job
@graph
def my_graph():
    my_op()


my_second_job_with_metadata = my_graph.to_job(
    metadata={"owner": "api team", "docs": MetadataValue.url("https://docs.dagster.io")}
)
```

### Viewing Metadata

After attaching metadata to a job, you can view it in Dagit by navigating to the job overview page. The metadata is displayed in the right pane.

<img
alt="job-metadata.png"
src="/images/concepts/ops-jobs-graphs/job-metadata.png"
/>

## Tags

Tags allow you to attach information to the run created when you execute a job. Tags can contain any information you want, and Dagster will also attach some tags to your runs automatically (we'll cover these later).

### Specifying Tags

You can specify tags that you want attached to every run by adding them to a job. Tags are specified as a dictionary of key-value pairs, where each key must be a string and each value must be either a string or a JSON value that can be serialized to a string.

```python file=/concepts/ops_jobs_graphs/metadata_tags.py startafter=start_tags_on_job endbefore=end_tags_on_job
@job(tags={"foo": "bar"})
def my_job_with_tags():
    my_op()
```

In addition to adding tags on the `@job` decorator, you can also add tags using the <PyObject object="GraphDefinition" method="to_job" /> method.

```python file=/concepts/ops_jobs_graphs/metadata_tags.py startafter=start_tags_on_graph_to_job endbefore=end_tags_on_graph_to_job
my_second_job_with_tags = my_graph.to_job(tags={"foo": "bar"})
```
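
The value rule described above (a string, or JSON that can be serialized to a string) can be illustrated with a small standard-library sketch. The helper `normalize_tags` is hypothetical and is not part of Dagster; it only shows one plausible way to turn non-string values into strings, assuming JSON encoding is the serialization used.

```python
import json


def normalize_tags(tags):
    """Return a copy of `tags` in which every value is a string.

    Hypothetical helper for illustration only; not a Dagster API.
    """
    normalized = {}
    for key, value in tags.items():
        if not isinstance(key, str):
            raise TypeError(f"tag key must be a string, got {type(key).__name__}")
        # Strings pass through unchanged; any other value is JSON-encoded.
        normalized[key] = value if isinstance(value, str) else json.dumps(value)
    return normalized


print(normalize_tags({"foo": "bar", "retries": {"max": 3}}))
```

A value that `json.dumps` cannot encode raises a `TypeError` in this sketch, which mirrors the requirement that tag values be serializable.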

When executing a job, you can add tags to the run using the Launchpad in Dagit.

<img
alt="tag-adder.png"
src="/images/concepts/ops-jobs-graphs/tag-adder.png"
/>

### Viewing Tags

You can view the tags that have been attached to runs by going to the Runs page in Dagit.

<img
alt="tags-viewer.png"
src="/images/concepts/ops-jobs-graphs/tags-viewer.png"
/>

### Dagster-provided tags

Dagster automatically adds tags to your runs in some cases, including:

- The solid selection for the run, if applicable
- The partition set and partition of the run, if applicable
- The schedule that triggered the run, if applicable
- The backfill ID, if applicable
- The parent run of a re-executed run
- The docker image tag

### Using tags to affect run execution

Some features of Dagster are controlled using the tags attached to a run. Some examples include:

- [Customizing Kubernetes config](/deployment/guides/kubernetes/customizing-your-deployment)
- [Specifying Celery config](/deployment/guides/kubernetes/deploying-with-helm-advanced#configuring-celery-queues)
- [Turning run memoization on or off](/guides/dagster/memoization#disabling-memoization)
- [Setting concurrency limits when using the QueuedRunCoordinator](/deployment/run-coordinator#usage)
- [Setting the priority of different runs](/deployment/run-coordinator#priorities)
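
As a rough illustration of how a tag can influence execution, the sketch below is a toy model of priority ordering. It assumes the tag key is `dagster/priority`, that values are integers stored as strings, that a missing tag means priority 0, and that higher values launch first; check these assumptions against the linked run coordinator page. The run dictionaries and the `run_priority` helper are hypothetical and do not reflect Dagster's implementation.

```python
PRIORITY_TAG = "dagster/priority"  # assumed tag key; see the run coordinator docs


def run_priority(tags):
    """Read the priority from a run's tags, treating a missing tag as 0."""
    return int(tags.get(PRIORITY_TAG, "0"))


# Hypothetical queued runs, each carrying a dictionary of string tags.
queued_runs = [
    {"run_id": "a", "tags": {"foo": "bar"}},
    {"run_id": "b", "tags": {PRIORITY_TAG: "3"}},
    {"run_id": "c", "tags": {PRIORITY_TAG: "-1"}},
]

# Higher priority first.
launch_order = sorted(
    queued_runs, key=lambda run: run_priority(run["tags"]), reverse=True
)
print([run["run_id"] for run in launch_order])
```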
3 changes: 3 additions & 0 deletions docs/screenshot_capture/screenshots.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -190,6 +190,9 @@
- select a partition from the partition selector
vetted: false

- path: concepts/ops-jobs-graphs/job-metadata.png
url: http://127.0.0.1:3000/workspace/toys_repository@dagster_test.graph_job_op_toys.repo/jobs/with_metadata
vetted: false

##################
# Concepts: Dagit
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,52 @@
from dagster import MetadataValue, graph, job, op

# start_metadata_on_job


@op
def my_op():
    return "Hello World!"


@job(
    metadata={
        "owner": "data team",  # will be converted to MetadataValue.text
        "docs": MetadataValue.url("https://docs.dagster.io"),
    }
)
def my_job_with_metadata():
    my_op()


# end_metadata_on_job


# start_metadata_on_graph_to_job


@graph
def my_graph():
    my_op()


my_second_job_with_metadata = my_graph.to_job(
    metadata={"owner": "api team", "docs": MetadataValue.url("https://docs.dagster.io")}
)

# end_metadata_on_graph_to_job

# start_tags_on_job


@job(tags={"foo": "bar"})
def my_job_with_tags():
    my_op()


# end_tags_on_job

# start_tags_on_graph_to_job

my_second_job_with_tags = my_graph.to_job(tags={"foo": "bar"})

# end_tags_on_graph_to_job
76 changes: 72 additions & 4 deletions integration_tests/test_suites/backcompat-test-suite/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,75 @@ This test suite ensures that the branch Dagster code can successfully communicat

In order to run, the `EARLIEST_TESTED_RELEASE` environment variable needs to be set.

- Set `EARLIEST_TESTED_RELEASE` to match the earliest release to test:
```bash
export EARLIEST_TESTED_RELEASE="0.12.8"
```
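
To make the requirement concrete, here is a minimal sketch of how a test helper might read the variable and fail early when it is missing. The function `earliest_tested_release` is hypothetical and is not taken from the test suite; only the variable name and the example value come from the text above.

```python
import os


def earliest_tested_release(environ=os.environ):
    """Return the configured release, or raise if the variable is unset or empty.

    Hypothetical helper for illustration; the real suite may read it differently.
    """
    value = environ.get("EARLIEST_TESTED_RELEASE")
    if not value:
        raise RuntimeError(
            'EARLIEST_TESTED_RELEASE must be set, e.g. '
            'export EARLIEST_TESTED_RELEASE="0.12.8"'
        )
    return value


# Pass a plain dict instead of os.environ to try the helper without exporting anything.
print(earliest_tested_release({"EARLIEST_TESTED_RELEASE": "0.12.8"}))
```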


If you are on macOS, ensure that Docker is running.

From `integration_tests/test_suites/backcompat-test-suite`, run any of the following commands:
* `pytest -m dagit-latest-release -xvv -ff tests/test_backcompat.py`
* `pytest -m dagit-earliest-release -xvv -ff tests/test_backcompat.py`
* `pytest -m user-code-latest-release -xvv -ff tests/test_backcompat.py`
* `pytest -m user-code-earliest-release -xvv -ff tests/test_backcompat.py`
* `tox dagit-latest-release`
* `tox dagit-earliest-release`
* `tox user-code-latest-release`
* `tox user-code-earliest-release`


where:
* dagit-latest-release: Dagit on most recent release and user code on current branch
* dagit-earliest-release: Dagit on earliest release to maintain backcompat for, and user code on current branch
* user-code-latest-release: Dagit on current branch and user code on latest minor release
* user-code-earliest-release: Dagit on current branch and user code on earliest release to maintain backcompat for


## Debugging tips

### Option 1:
To view the logs of the Docker containers that are spun up during testing, you'll need to comment out a line in the
test suite so that the containers are not removed. In `tests/test_backcompat.py`, the final lines of `docker_service_up()` are:
```python
try:
    yield
finally:
    subprocess.check_output(["docker-compose", "-f", docker_compose_file, "stop"])
    subprocess.check_output(["docker-compose", "-f", docker_compose_file, "rm", "-f"])
```
Change them to:
```python
try:
    yield
finally:
    subprocess.check_output(["docker-compose", "-f", docker_compose_file, "stop"])
    # subprocess.check_output(["docker-compose", "-f", docker_compose_file, "rm", "-f"])
```
After you run the backcompat test, you can list the Docker containers using `docker container ls -a` and view the logs for the container in
question using `docker logs <CONTAINER ID>`.

### Option 2:
Most of the tests run in subprocesses and inside Docker containers, so if you're having trouble debugging
in this setup, you can emulate what the test is doing using two clones of Dagster:

1. Create a new virtualenv running the same Python version you usually use.
2. Clone Dagster into a new folder. We'll call it `dagster_2` here, and we'll call your normal clone of Dagster that's on your user branch `dagster`.
3. Activate your new virtualenv and `cd` into `dagster_2`.
4. Check out the version of Dagster you want to test against (e.g. `git checkout release/0.14.17`).
5. Run `make dev_install` in `dagster_2`.
6. In `dagster`, start up a gRPC server pointing at `repo.py` in `dagit_service`: `dagster api grpc --python-file dagit_service/repo.py --host 0.0.0.0 --port 4266`
7. In `dagster_2`, update `integration_tests/test_suites/backcompat-test-suite/dagit_service/workspace.yaml` to tell Dagit that the gRPC service host is localhost and the port is 4266.
8. In `dagster_2`, run Dagit: `dagit -w integration_tests/test_suites/backcompat-test-suite/dagit_service/workspace.yaml`
9. In `dagster`, open a Python interpreter and run the following:
```python
from dagster_graphql import DagsterGraphQLClient

client = DagsterGraphQLClient("localhost", port_number=3000)
client.submit_pipeline_execution(pipeline_name="the_job", mode="default", run_config={})
```

10. You can modify the arguments to `submit_pipeline_execution` based on the test that you are debugging.

This setup should allow you to set breakpoints in both `dagster` and `dagster_2`.
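
For step 7, the sketch below prints what the gRPC entry in `workspace.yaml` might look like with the host and port from the steps above. It assumes the standard `load_from` / `grpc_server` workspace layout; the file in the test suite may contain additional keys, so treat this as a guide rather than an exact replacement. The helper `render_workspace_yaml` is hypothetical.

```python
def render_workspace_yaml(host, port):
    """Build a minimal workspace.yaml body that points Dagit at a gRPC server.

    Hypothetical helper; assumes the `load_from` / `grpc_server` layout.
    """
    return (
        "load_from:\n"
        "  - grpc_server:\n"
        f"      host: {host}\n"
        f"      port: {port}\n"
    )


print(render_workspace_yaml("localhost", 4266))
```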
4 changes: 4 additions & 0 deletions js_modules/dagit/packages/core/src/graphql/schema.graphql


Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@ import {useHistory, useParams} from 'react-router-dom';
import {PYTHON_ERROR_FRAGMENT} from '../app/PythonErrorInfo';
import {AssetGraphExplorer} from '../asset-graph/AssetGraphExplorer';
import {useDocumentTitle} from '../hooks/useDocumentTitle';
import {METADATA_ENTRY_FRAGMENT} from '../metadata/MetadataEntry';
import {Loading} from '../ui/Loading';
import {buildPipelineSelector} from '../workspace/WorkspaceContext';
import {RepoAddress} from '../workspace/types';
Expand Down Expand Up @@ -132,6 +133,9 @@ export const PIPELINE_EXPLORER_ROOT_QUERY = gql`
... on PipelineSnapshot {
id
name
metadataEntries {
...MetadataEntryFragment
}
...GraphExplorerFragment
solidHandle(handleID: $rootHandleID) {
Expand Down Expand Up @@ -160,6 +164,7 @@ export const PIPELINE_EXPLORER_ROOT_QUERY = gql`
...PythonErrorFragment
}
}
${METADATA_ENTRY_FRAGMENT}
${GRAPH_EXPLORER_FRAGMENT}
${GRAPH_EXPLORER_SOLID_HANDLE_FRAGMENT}
${GRAPH_EXPLORER_ASSET_NODE_FRAGMENT}
Expand Down
Original file line number Diff line number Diff line change
@@ -1,8 +1,9 @@
import {gql, useQuery} from '@apollo/client';
import {Box} from '@dagster-io/ui';
import {Box, MetadataTable} from '@dagster-io/ui';
import * as React from 'react';

import {PYTHON_ERROR_FRAGMENT} from '../app/PythonErrorInfo';
import {METADATA_ENTRY_FRAGMENT, MetadataEntry} from '../metadata/MetadataEntry';
import {PipelineSelector} from '../types/globalTypes';
import {Loading} from '../ui/Loading';
import {isThisThingAJob, useRepository} from '../workspace/WorkspaceContext';
Expand Down Expand Up @@ -42,6 +43,13 @@ export const SidebarPipelineOrJobOverview: React.FC<{

const modes = pipelineSnapshotOrError.modes;

const metadataRows = pipelineSnapshotOrError.metadataEntries.map((entry) => {
  return {
    key: entry.label,
    value: <MetadataEntry entry={entry} />,
  };
});

return (
<>
<SidebarSection title="Description">
Expand All @@ -58,6 +66,11 @@ export const SidebarPipelineOrJobOverview: React.FC<{
))}
</Box>
</SidebarSection>
<SidebarSection title="Metadata">
<Box padding={{vertical: 16, horizontal: 24}}>
<MetadataTable rows={metadataRows} />
</Box>
</SidebarSection>
</>
);
}}
Expand All @@ -76,6 +89,9 @@ const JOB_OVERVIEW_SIDEBAR_QUERY = gql`
id
...SidebarModeInfoFragment
}
metadataEntries {
...MetadataEntryFragment
}
}
... on PipelineNotFoundError {
message
Expand All @@ -86,6 +102,7 @@ const JOB_OVERVIEW_SIDEBAR_QUERY = gql`
...PythonErrorFragment
}
}
${METADATA_ENTRY_FRAGMENT}
${SIDEBAR_MODE_INFO_FRAGMENT}
${PYTHON_ERROR_FRAGMENT}
`;
