docs(sda): fix typos and edit wording (#7136)

dagster-io · Mar 21, 2022 · commit 98e99c8 · 1 parent 3509c98

Showing 7 changed files with 29 additions and 35 deletions.
diff --git a/docs/content/api/modules.json b/docs/content/api/modules.json
diff --git a/docs/content/api/searchindex.json b/docs/content/api/searchindex.json
diff --git a/docs/content/api/sections.json b/docs/content/api/sections.json
diff --git a/docs/content/guides/dagster/software-defined-assets.mdx b/docs/content/guides/dagster/software-defined-assets.mdx
@@ -7,14 +7,16 @@ description: The "software-defined asset" APIs sit atop of the graph/job/op APIs
 
 <CodeReferenceLink filePath="examples/software_defined_assets" />
 
-The "Software-defined asset" APIs sit atop of the graph/job/op APIs and enable a novel novel approach to orchestration that puts assets at the forefront. As a reminder, to Dagster, an "asset" is a data product: an object produced by a data pipeline, e.g. a table, ML model, or report.
+The software-defined asset APIs sit on top of the graph/job/op APIs and enable a novel approach to orchestration that puts assets at the forefront.
 
-Conceptually, software-defined assets invert the typical relationship between assets and computation. Instead of defining a graph of ops and recording which assets those ops end up materializing, you define a set of assets, each of which knows how to compute its contents from upstream assets.
+In Dagster, an "asset" is a data product: an object produced by a data pipeline, such as a table, a machine learning model, or a report.
+
+Conceptually, software-defined assets invert the typical relationship between assets and computation. Instead of defining a graph of ops and recording which assets those ops end up materializing, you define a set of assets. Each asset knows how to compute its contents from upstream assets.
 
 Taking a software-defined asset approach has a few main benefits:
 
-- **Write less code** - because each asset knows about the assets it depends on, you don't need to use `@graph` / `@job` to wire up dependencies between your ops.
-- **Track cross-job dependencies via asset lineage** - Dagit allows you to find the parents and children of any asset, even if they live in different jobs. This is useful for finding the sources of problems and for understanding the consequences of changing or removing an asset.
+- **Write less code** - Because each asset knows about the assets it depends on, you don't need to use `@graph` / `@job` to wire up dependencies.
+- **Track cross-job dependencies via asset lineage** - Dagster allows you to find the parents and children of any asset, even if they live in different jobs. This is useful for finding the sources of problems and for understanding the consequences of changing or removing an asset.
 - **Know when you need to take action on an asset** - In a unified view, Dagster compares the assets you've defined in code to the assets you've materialized in storage. For example, you can catch that you've deployed code for generating a new table but haven't yet materialized it, that you've deployed code that adds a column to a table but your stored table is still missing that column, or that you've removed an asset definition but the table still exists in storage.
 
 In this example, we'll define some tables with dependencies on each other. We have a table of temperature samples collected in five-minute increments, and we want to compute a table of the highest temperatures for each day.
@@ -23,7 +25,7 @@ In this example, we'll define some tables with dependencies on each other. We ha
 
 ### Defining the assets
 
-Here are our asset (aka table) definitions.
+Here are our asset definitions, which define the tables we want to materialize.
 
 ```python file=../../software_defined_assets/software_defined_assets/assets.py startafter=start_marker endbefore=end_marker
 import pandas as pd
@@ -53,17 +55,17 @@ def hottest_dates(daily_temperature_highs: DataFrame) -> DataFrame:
 
 `sfo_q2_weather_sample` represents our base temperature table. It's a <PyObject module="dagster" object="SourceAsset" />, meaning that we rely on it, but don't generate it.
 
-`daily_temperature_highs` represents a computed asset. It's derived by taking the `sfo_q2_weather_sample` table and applying the decorated function to it. Notice that it's defined using a pure function - i.e. a function with no side effects, just logical data transformation. The code for storing and retrieving the data in persistent storage will be supplied later on in an <PyObject object="IOManager" /> - that allows swapping in different implementations in different environments. E.g. we might want to store data in a local CSV file for easy testing, but store data a data warehouse in production.
+`daily_temperature_highs` represents a computed asset. It's derived by taking the `sfo_q2_weather_sample` table and applying the decorated function to it. Notice that it's defined using a pure function: a function with no side effects that only performs a logical data transformation. The code for storing and retrieving the data in persistent storage will be supplied later on in an <PyObject object="IOManager" />. This allows us to swap in different implementations in different environments. For example, in local development we might want to store data in a local CSV file for easy testing, while in production we would want to store data in a data warehouse.
 
-`hottest_dates` is a computed asset that depends on another computed asset - the `daily_temperture_highs` asset.
+`hottest_dates` is a computed asset that depends on another computed asset, `daily_temperature_highs`.
 
-The framework infers asset dependencies by looking at the names of the arguments to the decorated functions. E.g. the function that defines the `daily_temperature_highs` asset has an argument named `sfo_q2_weather_sample` - corresponding to the asset of the same name.
+The framework infers asset dependencies by looking at the names of the arguments to the decorated functions. The function that defines the `daily_temperature_highs` asset has an argument named `sfo_q2_weather_sample`, which corresponds to the asset definition of the same name.
 
 ### Combining the assets into a group
 
 Having defined some assets, we can combine them into an <PyObject object="AssetGroup" />, which allows working with them in Dagit. It also allows combining them with resources and IO managers that determine how they're stored and connect them to external services.
 
-It's common to use a utility like <PyObject object="AssetGroup" method="from_module" /> or \<PyObject object="AssetGroup" method='from_package_name" /> to pick up all the assets within a module or package, so you don't need to list them individually.
+It's common to use a utility like <PyObject object="AssetGroup" method="from_module" /> or <PyObject object="AssetGroup" method="from_package_name" /> to pick up all the assets within a module or package, so you don't need to list them individually.
 
 ```python file=../../software_defined_assets/software_defined_assets/weather_assets_group.py startafter=asset_group_start endbefore=asset_group_end
 # imports the module called "assets" from the package containing the current module
@@ -78,7 +80,7 @@ weather_assets = AssetGroup.from_modules(
 )
 ```
 
-The order that we supply the assets when constructing an <PyObject object="AssetGroup" /> doesn't matter - the dependencies are determined by what's declared inside each asset.
+The order in which we supply the assets when constructing an <PyObject object="AssetGroup" /> doesn't matter, because each asset definition declares its own dependencies.
 
 The functions we used to define our assets describe how to compute their contents, but not how to read and write them to persistent storage. For reading and writing, we define an <PyObject object="IOManager" />. In this case, our `LocalFileSystemIOManager` stores DataFrames as CSVs on the local filesystem:
 

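The mechanics the guide describes (dependencies inferred from argument names, definition order not mattering, and pure asset functions paired with a swappable IO manager) can be illustrated without Dagster. The following is a minimal stdlib-only sketch under stated assumptions, not Dagster's implementation: `InMemoryIOManager`, `infer_deps`, and `materialize` are hypothetical names, plain lists and dicts stand in for pandas DataFrames, and the sample readings are made up for illustration.

```python
import inspect
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Any, Callable, Dict, List


class InMemoryIOManager:
    """Hypothetical stand-in for an IO manager that keeps outputs in a dict.

    A class with the same two methods that wrote CSVs locally, or tables to a
    warehouse in production, could be swapped in without changing any asset.
    """

    def __init__(self) -> None:
        self.storage: Dict[str, Any] = {}

    def handle_output(self, name: str, obj: Any) -> None:
        self.storage[name] = obj

    def load_input(self, name: str) -> Any:
        return self.storage[name]


def infer_deps(assets: Dict[str, Callable]) -> Dict[str, List[str]]:
    """Treat each parameter name as the name of an upstream asset."""
    return {name: list(inspect.signature(fn).parameters) for name, fn in assets.items()}


def materialize(assets: List[Callable], sources: Dict[str, Any], io_manager: Any) -> None:
    by_name = {fn.__name__: fn for fn in assets}
    deps = infer_deps(by_name)
    # Source assets are stored as-is: we rely on them but never compute them.
    for name, value in sources.items():
        io_manager.handle_output(name, value)
    # static_order() yields every asset after its upstream assets, so the
    # order of the `assets` list does not matter.
    for name in TopologicalSorter(deps).static_order():
        if name in sources:
            continue
        if name not in by_name:
            raise KeyError(f"no asset or source named {name!r}")
        inputs = {dep: io_manager.load_input(dep) for dep in deps[name]}
        io_manager.handle_output(name, by_name[name](**inputs))


# Pure functions: no I/O, just transformations of their inputs.
def daily_temperature_highs(sfo_q2_weather_sample):
    highs: Dict[str, float] = {}
    for date, temp in sfo_q2_weather_sample:
        highs[date] = max(temp, highs.get(date, temp))
    return highs


def hottest_dates(daily_temperature_highs):
    # The two dates with the highest daily high, hottest first.
    return sorted(daily_temperature_highs, key=daily_temperature_highs.get, reverse=True)[:2]


# Made-up (date, temperature) readings standing in for the sample table.
samples = [("2021-04-01", 60.1), ("2021-04-01", 64.3), ("2021-04-02", 70.2), ("2021-04-03", 58.0)]
io_manager = InMemoryIOManager()
# hottest_dates is listed first on purpose; dependency order is inferred.
materialize([hottest_dates, daily_temperature_highs], {"sfo_q2_weather_sample": samples}, io_manager)
print(io_manager.storage["hottest_dates"])  # ['2021-04-02', '2021-04-01']
```

Swapping `InMemoryIOManager` for a hypothetical CSV-backed class with the same `handle_output`/`load_input` methods would change where the data lives without touching the asset functions. Dagster's real `IOManager` methods receive context objects rather than plain names; this sketch simplifies that away.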
diff --git a/docs/next/public/objects.inv b/docs/next/public/objects.inv
diff --git a/docs/sphinx/sections/api/apidocs/assets.rst b/docs/sphinx/sections/api/apidocs/assets.rst
@@ -15,6 +15,7 @@ A software-defined asset combines:
 .. autodecorator:: asset
 
 .. autoclass:: AssetGroup
+   :members:
 
 .. autodecorator:: multi_asset
 

diff --git a/python_modules/dagster/dagster/core/asset_defs/asset_group.py b/python_modules/dagster/dagster/core/asset_defs/asset_group.py
@@ -164,31 +164,22 @@ def build_job(
 
         Args:
             name (str): The name to give the job.
-            selection (Union[str, List[str]]): A single selection query or list of selection queries to execute. For example:
-                * ``['some_asset_key']``: selects ``some_asset_key`` itself.
-                * ``['*some_asset_key']``: select ``some_asset_key`` and all
-                    its ancestors (upstream dependencies).
-                * ``['*some_asset_key+++']``: select ``some_asset_key``, all
-                    its ancestors, and its descendants
-                    (downstream dependencies) within 3 levels down.
-                * ``['*some_asset_key', 'other_asset_key_a', 'other_asset_key_b
-                    +']``: select ``some_asset_key`` and all its
-                ancestors, ``other_asset_key_a`` itself, and
-                    ``other_asset_key_b`` and its direct child asset keys. When
-                    subselecting into a multi-asset, all of the asset keys in
-                    that multi-asset must be selected.
+            selection (Union[str, List[str]]): A single selection query or list of selection queries
+                to execute. For example:
+
+                    - ``['some_asset_key']``: selects ``some_asset_key`` itself.
+                    - ``['*some_asset_key']``: selects ``some_asset_key`` and all its ancestors (upstream dependencies).
+                    - ``['*some_asset_key+++']``: selects ``some_asset_key``, all its ancestors, and its descendants (downstream dependencies) within 3 levels down.
+                    - ``['*some_asset_key', 'other_asset_key_a', 'other_asset_key_b+']``: selects ``some_asset_key`` and all its ancestors, ``other_asset_key_a`` itself, and ``other_asset_key_b`` and its direct child asset keys. When subselecting into a multi-asset, all of the asset keys in that multi-asset must be selected.
+
             executor_def (Optional[ExecutorDefinition]): The executor
                 definition to use when executing the job. Defaults to the
                 executor on the AssetGroup. If no executor was provided on the
-                AssetGroup, then it defaults to
-                :py:class:`multi_or_in_process_executor`.
-            tags (Optional[Dict[str, Any]]): Arbitrary metadata for any
-                execution of the Job.
-                Values that are not strings will be json encoded and must meet t
-                he criteria that
-                `json.loads(json.dumps(value)) == value`.  These tag values may
-                be overwritten by tag
-                values provided at invocation time.
+                AssetGroup, then it defaults to :py:class:`multi_or_in_process_executor`.
+            tags (Optional[Dict[str, Any]]): Arbitrary metadata for any execution of the job.
+                Values that are not strings will be JSON-encoded and must meet the criterion that
+                ``json.loads(json.dumps(value)) == value``. These tag values may be overwritten by
+                tag values provided at invocation time.
             description (Optional[str]): A description of the job.
 
         Examples:
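The round-trip criterion in the `tags` description can be checked in plain Python. The helper below is hypothetical (`encode_tags` is not a Dagster API) and only illustrates which values pass; Dagster's own tag validation may differ in detail.

```python
import json
from typing import Any, Dict


def encode_tags(tags: Dict[str, Any]) -> Dict[str, str]:
    """Hypothetical helper: keep string values, JSON-encode the rest.

    Non-string values must satisfy json.loads(json.dumps(value)) == value,
    the criterion stated in the build_job docstring.
    """
    encoded: Dict[str, str] = {}
    for key, value in tags.items():
        if isinstance(value, str):
            encoded[key] = value
            continue
        try:
            round_trips = json.loads(json.dumps(value)) == value
        except TypeError:  # e.g. sets are not JSON-serializable at all
            round_trips = False
        if not round_trips:
            raise ValueError(f"tag {key!r} does not survive a JSON round trip: {value!r}")
        encoded[key] = json.dumps(value)
    return encoded


print(encode_tags({"team": "weather", "priority": 3, "owners": ["ada", "bo"]}))
# {'team': 'weather', 'priority': '3', 'owners': '["ada", "bo"]'}

# Tuples come back as lists and int dict keys come back as strings,
# so both values fail the round-trip check.
for bad in ({"window": (1, 2)}, {"limits": {1: "low"}}):
    try:
        encode_tags(bad)
    except ValueError as err:
        print(err)
```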