Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[4/n] add python api for replacing local file references with source control links #21675

Merged
merged 7 commits into from
May 15, 2024

Conversation

benpankow
Copy link
Member

@benpankow benpankow commented May 6, 2024

Summary

Adds the link_to_source_control utility fn which converts all local source code reference metadata in the passed assets to references to source control. These local paths are mapped to the path in source control using a user-passed local git root. The path from the git root to the file locally is used as the filepath within source control.

For example:

@asset 
def my_asset() -> pd.DataFrame:
  ...

@asset 
def my_other_asset() -> pd.DataFrame:
  ...

defs = Definitions(
  assets=link_to_source_control(
    with_source_code_references([my_asset, my_other_asset]),
    source_control_url="https://github.com/dagster-io/dagster",
    source_control_branch="master",
    repository_root_absolute_path=file_relative_path(__file__, "../../"),
  )
)

A stacked PR will introduce a utility method that will wrap both link_to_source_control and with_source_code_references and will decide whether to link to source control based on whether the definitions are being loaded in a cloud context.

Test Plan

Unit tests.

@benpankow benpankow force-pushed the benpankow/add-code-source-link-frontend branch from e362410 to 978e409 Compare May 7, 2024 17:15
@benpankow benpankow force-pushed the benpankow/first-pass-source-control-link branch from 8497f0b to 5e81cbf Compare May 7, 2024 17:15
@benpankow benpankow force-pushed the benpankow/add-code-source-link-frontend branch from 978e409 to 7b3ff2a Compare May 7, 2024 17:48
@benpankow benpankow force-pushed the benpankow/first-pass-source-control-link branch 2 times, most recently from 3e2b861 to 4e1ad1b Compare May 7, 2024 18:04
Copy link

github-actions bot commented May 7, 2024

Deploy preview for dagit-core-storybook ready!

✅ Preview
https://dagit-core-storybook-km41nn0l3-elementl.vercel.app
https://benpankow-first-pass-source-control-link.core-storybook.dagster-docs.io

Built with commit 52ed3c8.
This pull request is being automatically deployed with vercel-action

@benpankow benpankow marked this pull request as ready for review May 7, 2024 18:27
@benpankow benpankow requested review from sryza and rexledesma May 7, 2024 18:29
source_control_branch: str,
repository_root_absolute_path: Union[Path, str],
) -> Sequence[Union["AssetsDefinition", "SourceAsset", "CacheableAssetsDefinition"]]:
if "gitlab.com" in source_control_url:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I figure market share of these 2 is large enough for the fallback case not to matter too much, but worth flagging that we only explicitly can build code links for these right now

@benpankow benpankow requested a review from sryza May 7, 2024 23:23
@benpankow benpankow force-pushed the benpankow/add-code-source-link-frontend branch from 243a068 to bee4264 Compare May 7, 2024 23:23
@benpankow benpankow force-pushed the benpankow/first-pass-source-control-link branch from b5206d2 to 366b4d4 Compare May 7, 2024 23:23
@benpankow benpankow force-pushed the benpankow/add-code-source-link-frontend branch from bee4264 to 85750bb Compare May 8, 2024 15:43
Base automatically changed from benpankow/add-code-source-link-frontend to master May 8, 2024 15:45
Copy link
Member

@rexledesma rexledesma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moving ts files to another PR would be nice for easier review.

Main question is whether we can infer some of these parameters on behalf of the user.


@experimental
@whitelist_for_serdes
class CodeReferencesMetadataValue(DagsterModel, MetadataValue["CodeReferencesMetadataValue"]):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

naming: I think we can drop the s and make CodeReferences into just CodeReference?

"Reference" is one of those nice irregular nouns that can describe a plurality in its singular form (e.g. a "reference" can hold many "references")

Suggested change
class CodeReferencesMetadataValue(DagsterModel, MetadataValue["CodeReferencesMetadataValue"]):
class CodeReferenceMetadataValue(DagsterModel, MetadataValue["CodeReferencesMetadataValue"]):

Also need to update the docstring.

Copy link
Member Author

@benpankow benpankow May 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is showing as new because the stacked PRs were rebased - it's not new in this one - same w/ the .ts files unfortunately. Should be fixed now at least.

See this PR for brief naming discussion - personally I bias towards References because it makes clear that the value can hold more than one reference. When I see reference in the wild I typically interpret it as singular unless it's used as an adjective (e.g. reference material).

return self


def local_source_path_from_fn(fn: Callable[..., Any]) -> Optional[LocalFileCodeReference]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: use pathlib :)



@experimental
def link_to_source_control(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we potentially infer this information on behalf of the user by running a git command?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, or just walking the folder structure (since in some cases the user may not install git in the target image, in the cloud context at least). Stacked PR will have a layered utility fn that will have default behavior.

@benpankow benpankow force-pushed the benpankow/first-pass-source-control-link branch from 395cffb to df47725 Compare May 9, 2024 00:06
@benpankow benpankow requested a review from rexledesma May 9, 2024 00:12
Copy link
Contributor

@sryza sryza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The proliferation of these public helpers makes me a little nervous, especially because there's functionality that's specific to individual technologies. E.g. should this be part of dagster-github instead? I don't have a grand unifying philosophy of how they should be named which also makes me a little nervous.

So I think worth getting @schrockn 's thoughts too if he has bandwidth.

@benpankow thanks for bearing with the agonizing naming etc. process on these public APIs.

Copy link
Member

@schrockn schrockn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sryza thanks for flagging me on this one. I too share the concern about exposing all of these top-level exports. I didn't realize we were exposing them at the top-level. I'd rather not do that. I was assuming we were building these to get a heartbeat up and running for our own usage, rather than having them be publicly exposed APIs.

python_modules/dagster/dagster/__init__.py Outdated Show resolved Hide resolved
UrlMetadataValue as UrlMetadataValue,
link_to_source_control as link_to_source_control,
with_source_code_references as with_source_code_references,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't realize with_source_code_references was a top-level export. Has this shipped already?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it'll go out with next week's release, so we can definitely roll this back.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will drop the top-level export for now.

Comment on lines +159 to +171
def _convert_local_path_to_source_control_path_single_definition(
base_source_control_url: str,
repository_root_absolute_path: str,
assets_def: Union["AssetsDefinition", "SourceAsset", "CacheableAssetsDefinition"],
) -> Union["AssetsDefinition", "SourceAsset", "CacheableAssetsDefinition"]:
from dagster._core.definitions.assets import AssetsDefinition

# SourceAsset doesn't have an op definition to point to - cacheable assets
# will be supported eventually but are a bit trickier
if not isinstance(assets_def, AssetsDefinition):
return assets_def

metadata_by_key = dict(assets_def.metadata_by_key) or {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe I already tagged @smackesey in at some point where I saw similar code but it would be far preferable in my view to try to structure this to work on a canonicalized AssetsDefinition

@benpankow benpankow force-pushed the benpankow/first-pass-source-control-link branch from df47725 to 52ed3c8 Compare May 9, 2024 16:10
Copy link
Member

@schrockn schrockn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool makes sense to me. Thanks for working through this!

Copy link
Contributor

@sryza sryza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm as well!

@benpankow
Copy link
Member Author

benpankow commented May 15, 2024

Merge activity

  • May 15, 2:09 PM PDT: @benpankow started a stack merge that includes this pull request via Graphite.
  • May 15, 2:11 PM PDT: Graphite rebased this pull request as part of a merge.
  • May 15, 2:14 PM PDT: @benpankow merged this pull request with Graphite.

@benpankow benpankow force-pushed the benpankow/first-pass-source-control-link branch from 70dd254 to 395c0c6 Compare May 15, 2024 21:10
@benpankow benpankow merged commit 52cc1b8 into master May 15, 2024
1 check was pending
@benpankow benpankow deleted the benpankow/first-pass-source-control-link branch May 15, 2024 21:14
danielgafni pushed a commit to danielgafni/dagster that referenced this pull request Jun 18, 2024
…control links (dagster-io#21675)

## Summary

Adds the `link_to_source_control` utility fn which converts all local source code reference metadata in the passed assets to references to source control. These local paths are mapped to the path in source control using a user-passed local git root. The path from the git root to the file locally is used as the filepath within source control.

For example:

```python
@asset 
def my_asset() -> pd.DataFrame:
  ...

@asset 
def my_other_asset() -> pd.DataFrame:
  ...

defs = Definitions(
  assets=link_to_source_control(
    with_source_code_references([my_asset, my_other_asset]),
    source_control_url="https://github.com/dagster-io/dagster",
    source_control_branch="master",
    repository_root_absolute_path=file_relative_path(__file__, "../../"),
  )
)

```

A stacked PR will introduce a utility method that will wrap both `link_to_source_control` and `with_source_code_references` and will decide whether to link to source control based on whether the definitions are being loaded in a cloud context.

## Test Plan

Unit tests.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants