remove old asset lineage from docs (#8382)

dagster-io · Jun 14, 2022 · db08e07 · db08e07
1 parent a60e889
commit db08e07
Show file tree

Hide file tree

Showing 9 changed files with 0 additions and 584 deletions.
diff --git a/docs/content/concepts/assets/asset-materializations.mdx b/docs/content/concepts/assets/asset-materializations.mdx
@@ -187,152 +187,6 @@ def my_partitioned_asset_op(context):
     return remote_storage_path
 ```
 
-## Linking Op Outputs to Assets <Experimental />
-
-It is fairly common for an asset to correspond to an op output. In the following simplified example, our op produces a dataframe, persists it to storage, and then passes the dataframe along as an output:
-
-```python file=concepts/assets/materialization_ops.py startafter=start_simple_asset_op endbefore=end_simple_asset_op
-from dagster import op, Output, AssetMaterialization
-
-
-@op
-def my_asset_op(context):
-    df = read_df()
-    persist_to_storage(df)
-    context.log_event(AssetMaterialization(asset_key="my_dataset"))
-    return df
-```
-
-In this case, the <PyObject object="AssetMaterialization" /> and the <PyObject object="Output" /> events both correspond to the same data, the dataframe that we have created. With this in mind, we can simplify the above code, and provide useful information to the Dagster framework, by making this link between the `my_dataset` asset and the output of this op explicit.
-
-Just as there are two places in which you can log runtime <PyObject object="AssetMaterialization" /> events (within an op body and within an IO manager), we provide two different interfaces for linking an op output to to an asset. Regardless of which you choose, every time the op runs and logs that output, an <PyObject object="AssetMaterialization" /> event will automatically be created to record this information.
-
-If you use an <PyObject object="Output" /> event to yield your output, and specified any metadata entries on it, (see: [Op Event Docs](/concepts/ops-jobs-graphs/op-events#attaching-metadata-to-outputs)), these entries will automatically be attached to the materialization event for this asset.
-
-### Linking assets to an Output Definition <Experimental />
-
-For cases where you are storing your asset within the body of an op, the easiest way of linking an asset to an op output is with the `asset_key` parameter on the relevant <PyObject object="OutputDefinition"/> in your op. To do this, you may define a constant <PyObject object="AssetKey"/> that identifies the asset you are linking.
-
-```python file=/concepts/assets/materialization_ops.py startafter=start_output_def_mat_op_0 endbefore=end_output_def_mat_op_0
-from dagster import op, Output, Out, AssetKey
-
-
-@op(out=Out(asset_key=AssetKey("my_dataset")))
-def my_constant_asset_op(context):
-    df = read_df()
-    persist_to_storage(df)
-    return df
-```
-
-### Linking assets to outputs with an IOManager <Experimental />
-
-If you've defined a custom <PyObject object="IOManager"/> to handle storing your op's outputs, the <PyObject object="IOManager"/> will likely be the most natural place to define which asset a particular output will be written to. To do this, you can implement the `get_output_asset_key` function on your <PyObject object="IOManager"/>.
-
-Similar to the above interface, this function takes an <PyObject object="OutputContext"/> and returns an <PyObject object="AssetKey"/>. The following example functions nearly identically to `PandasCsvIOManagerWithMetadata` from the [runtime example](/concepts/assets/asset-materializations#example-io-manager) above.
-
-```python file=/concepts/assets/materialization_io_managers.py startafter=start_asset_def endbefore=end_asset_def
-from dagster import AssetKey, IOManager, MetadataEntry
-
-
-class PandasCsvIOManagerWithOutputAsset(IOManager):
-    def load_input(self, context):
-        file_path = os.path.join("my_base_dir", context.step_key, context.name)
-        return read_csv(file_path)
-
-    def handle_output(self, context, obj):
-        file_path = os.path.join("my_base_dir", context.step_key, context.name)
-
-        obj.to_csv(file_path)
-
-        yield MetadataEntry.int(obj.shape[0], label="number of rows")
-        yield MetadataEntry.float(obj["some_column"].mean(), "some_column mean")
-
-    def get_output_asset_key(self, context):
-        file_path = os.path.join("my_base_dir", context.step_key, context.name)
-        return AssetKey(file_path)
-```
-
-When an output is linked to an asset in this way, the generated <PyObject object="AssetMaterialization" /> event will contain any <PyObject object="MetadataEntry" /> information yielded from the `handle_output` function (in addiition to all of the `metadata` specified on the corresponding <PyObject object="Output" /> event).
-
-See the [IO manager docs](/concepts/io-management/io-managers#yielding-metadata-from-an-io-manager) for more information on yielding these entries from an IO manager.
-
-#### Specifying partitions for an output-linked asset
-
-If you are already specifying a `get_output_asset_key` function on your <PyObject object="IOManager" />, you can optionally specify a set of partitions that this manager will be updating or creating by also defining a `get_output_asset_partitions` function. If you do this, an <PyObject object="AssetMaterialization" /> will be created for each of the specified partitions. One useful pattern to pass this partition information (which will likely vary each run) to the manager, is to specify the set of partitions on the configuration of the output. You can do this by providing [per-output configuration](/concepts/io-management/io-managers#providing-per-output-config-to-an-io-manager) on the IO manager.
-
-Then, you can calculate the asset partitions that a particular output will correspond to by reading this output configuration in `get_output_asset_partitions`:
-
-```python file=/concepts/assets/materialization_io_managers.py startafter=start_partitioned_asset_def endbefore=end_partitioned_asset_def
-from dagster import AssetKey, IOManager, MetadataEntry
-
-
-class PandasCsvIOManagerWithOutputAssetPartitions(IOManager):
-    def load_input(self, context):
-        file_path = os.path.join("my_base_dir", context.step_key, context.name)
-        return read_csv(file_path)
-
-    def handle_output(self, context, obj):
-        file_path = os.path.join("my_base_dir", context.step_key, context.name)
-
-        obj.to_csv(file_path)
-
-        yield MetadataEntry.int(obj.shape[0], label="number of rows")
-        yield MetadataEntry.float(obj["some_column"].mean(), "some_column mean")
-
-    def get_output_asset_key(self, context):
-        file_path = os.path.join("my_base_dir", context.step_key, context.name)
-        return AssetKey(file_path)
-
-    def get_output_asset_partitions(self, context):
-        return set(context.config["partitions"])
-```
-
-## Asset Lineage <Experimental />
-
-When an op output is linked to an <PyObject object="AssetKey"/>, Dagster can automatically generate lineage information that describes how this asset relates to other output-linked assets.
-
-As a simplified example, imagine a two-op job that first scrapes some user data from an API, storing it to a table, then trains an ML model on that data, persisting it to a model store:
-
-```python file=/concepts/assets/materialization_jobs.py startafter=start_job_0 endbefore=end_job_0
-from dagster import op, job, AssetKey, Out
-
-
-@op(out=Out(asset_key=AssetKey("my_db.users")))
-def scrape_users():
-    users_df = some_api_call()
-    persist_to_db(users_df)
-    return users_df
-
-
-@op(out=Out(asset_key=AssetKey("ml_models.user_prediction")))
-def get_prediction_model(users_df):
-    my_ml_model = train_prediction_model(users_df)
-    persist_to_model_store(my_ml_model)
-    return my_ml_model
-
-
-@job
-def my_user_model_job():
-    get_prediction_model(scrape_users())
-```
-
-In this case, it's certainly fair to say that this ML model, which we have assigned the key `ml_models.user_prediction`, _depends on_ the table that we created, `my_db.users` (it uses the data in the table to train the model).
-
-Why is that? By specifying the structure of your job, you have already defined data depedencies between these ops. By linking the output of `scrape_users` to the input of `get_prediction_model`, we can now infer that whatever this second op outputs will be some function of its input. Furthermore, since we have linked each of these outputs to external assets, we can use this knowledge to say that the asset associated with the output of `get_prediction_model` depends on the asset associated with the output of `scrape_users`.
-
-This feature is still in its early stages, but for now, this lineage information is surfaced in the [Asset Catalog](/concepts/dagit/dagit#assets) page for each asset ("Parent assets"):
-
-<!-- This was generated with:
-    * `dagit -f materialization_jobs.py -p 3333` inside dagster/examples/docs_snippets/docs_snippets/concepts/assets directory
--->
-
-<Image
-alt="asset-lineage"
-src="/images/concepts/assets/asset-lineage.png"
-width={1607}
-height={1031}
-/>
-
 ## Software-defined assets vs. Asset Materializations
 
 When working with software-defined assets, the assets and their dependencies must be known at definition time. When you look at software-defined assets in Dagit, you can see exactly what assets are going to be materialized before any code runs.

diff --git a/docs/next/public/images/concepts/assets/asset-lineage.png b/docs/next/public/images/concepts/assets/asset-lineage.png
diff --git a/docs/screenshot_capture/screenshots.yaml b/docs/screenshot_capture/screenshots.yaml
@@ -97,10 +97,6 @@
     - View the asset details page for "my_dataset"
   vetted: false
 
-- path: concepts/assets/asset-lineage.png
-  url: https://demo.elementl.show/instance/assets/snowflake/hackernews/comments
-  vetted: false
-
 - path: concepts/logging/job-log-dagit.png
   defs_file: examples/docs_snippets/docs_snippets/concepts/logging/builtin_logger.py
   steps:

diff --git a/examples/docs_snippets/docs_snippets/concepts/assets/materialization_io_managers.py b/examples/docs_snippets/docs_snippets/concepts/assets/materialization_io_managers.py
diff --git a/examples/docs_snippets/docs_snippets/concepts/assets/materialization_jobs.py b/examples/docs_snippets/docs_snippets/concepts/assets/materialization_jobs.py