Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(ingestion/SageMaker): Remove deprecated apis and add stateful ingestion capability #10573

Merged
merged 2 commits into from
May 28, 2024

Conversation

TonyOuyangGit
Copy link
Contributor

This PR has the following improvements:

  1. remove deprecated APIs: Amazon Sagemaker Edge retired on April 26th, 2024, so the calls to any edge packaging-related jobs will fail with errors: https://docs.aws.amazon.com/sagemaker/latest/dg/edge-eol.html, we should clean up all edge packaging related code
  2. according to https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DescribeAutoMLJobV2.html, DescribeAutoMLJobV2 are the new versions of CreateAutoMLJob and DescribeAutoMLJob which offer backward compatibility, the PR updates this API call to use this newer API
  3. adds stateful ingestion capability to SageMaker ingestion

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

@github-actions github-actions bot added ingestion PR or Issue related to the ingestion of metadata community-contribution PR or Issue raised by member(s) of DataHub Community labels May 23, 2024
Copy link
Collaborator

@hsheth2 hsheth2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one small thing, otherwise looks good

class SagemakerSourceConfig(AwsSourceConfig):
class SagemakerSourceConfig(
AwsSourceConfig,
PlatformInstanceConfigMixin,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it doesn't look like this source supports platform instances, so having it in the config might be misleading

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! It's interesting that if I give platform_instance in the ingestion recipe, then I can see the platform instance available when browsing ML models, it's in the response of graphQL query getBrowseResultsV2 even though each ML models entity does not have platform_instance.

Do you think I can make some more improvements to support platform instances? Most of the make_ml_model..*_urn code is in mce_builder.py.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah adding support for it isn't too bad. it involves

  1. adding platform instance to the config (you already did this)
  2. adding the platform instance as a prefix to all urns. modifying the helpers in mce_builder.py to take an optional platform_instance is the right approach there
  3. emit dataPlatformInstance aspects for every entity

the browse path v2 aspects will get auto-generated based on everything else

@treff7es treff7es merged commit a5515c5 into datahub-project:master May 28, 2024
58 checks passed
hsheth2 added a commit that referenced this pull request May 28, 2024
sleeperdeep pushed a commit to sleeperdeep/datahub that referenced this pull request Jun 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
community-contribution PR or Issue raised by member(s) of DataHub Community ingestion PR or Issue related to the ingestion of metadata
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants