Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add AwsS3EntityProvider as replacement for AwsS3DiscoveryProcessor #10480

Merged

Conversation

pjungermann
Copy link
Contributor

@pjungermann pjungermann commented Mar 28, 2022

Add a new provider AwsS3EntityProvider as a replacement for the now deprecated
AwsS3DiscoveryProcessor.

The new provider will scan configured S3 buckets (with optional) prefix and
add Location entities for all discovered catalog files.

These Location entities will then be processed as usual.

At each execution, the provider will apply a full mutation, replacing all previous
entities with the new entities/state.

Relates-to: #10183
Signed-off-by: Patrick Jungermann Patrick.Jungermann@gmail.com

Hey, I just made a Pull Request!

✔️ Checklist

  • A changeset describing the change and affected packages. (more info)
  • Added or updated documentation
  • Tests for new functionality and regression tests for bug fixes
  • Screenshots attached (for UI changes)
  • All your commits have a Signed-off-by line in the message. (more info)

@pjungermann pjungermann requested a review from a team as a code owner March 28, 2022 13:33
@github-actions github-actions bot added awaiting-review area:catalog Related to the Catalog Project Area documentation Improvements or additions to documentation labels Mar 28, 2022
@github-actions
Copy link
Contributor

github-actions bot commented Mar 28, 2022

Changed Packages

Package Name Package Path Changeset Bump Current Version
@backstage/plugin-catalog-backend-module-aws plugins/catalog-backend-module-aws patch v0.1.4-next.1

@pjungermann pjungermann force-pushed the PJ_awsS3-provider branch 2 times, most recently from 09399fc to b05d156 Compare March 28, 2022 14:23
Copy link
Member

@jhaals jhaals left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work Patrick! 🙌 I've added some suggestions :)

.changeset/friendly-hairs-happen.md Show resolved Hide resolved
.changeset/friendly-hairs-happen.md Show resolved Hide resolved
plugins/catalog-backend-module-aws/api-report.md Outdated Show resolved Hide resolved
plugins/catalog-backend-module-aws/api-report.md Outdated Show resolved Hide resolved
@pjungermann
Copy link
Contributor Author

pjungermann commented Mar 29, 2022

I rebased on master since PR #10454 was merged to make it a bit more clean here

@benjdlambert
Copy link
Member

Hmm just wondering if we should skip the deprecation of the older stuff for now, just until we can confirm that the provider API's that we've settled on are right and do everything that we want before losing support for things that might have been supported in the Processor approach? Does that make sense for now at least?

@pjungermann
Copy link
Contributor Author

Hmm just wondering if we should skip the deprecation of the older stuff for now, just until we can confirm that the provider API's that we've settled on are right and do everything that we want before losing support for things that might have been supported in the Processor approach? Does that make sense for now at least?

@benjdlambert the provider does the same thing as the processor -- just that changes will not cause orphans. Also, deprecating it will not remove it yet. This breaking change could come at a later step.

But of course, deprecating it suggests users to migrate.

At the moment, I don't see a value of the processor anymore though. But maybe I miss something.

If you prefer to remove the deprecation it can be done, of course. Not sure about the docs then though. I would find it a bit confusing to have both there (as user).

@benjdlambert
Copy link
Member

Yeah sorry, I didn't mean in your implementation, I just meant from our side let's not get people to migrate until we're settled on Providers for the catalog. Maybe you could break the documentation up into Recommended and Other? @freben thoughts here on what we should be encouraging?

@pjungermann
Copy link
Contributor Author

@benjdlambert as a user, I would prefer providers due to the orphans created when using processors.

Also a reason why I contributed this one and plan to contribute a few more.

As next level, I would see event-based updates vs those (full) syncs. A combination like initial full sync + event-based updates + full syncs once in a while as auto-healing would work fine.

For event-based updates, webhook-based change events as from SCM systems or other evens like S3 events can be interesting, maybe buffered in a queue; so general JMS support might make sense. Inside backstage, this could be solved with adaptors for the different providers converting events into a unified event schema or triggering APIs directly.
I know there are a couple of thoughts floating around for this and all these need to go through an RFC first.
Just for context.

@pjungermann
Copy link
Contributor Author

processors are still great to enhance/modify entities. Ingestion I would leave to providers.

Copy link
Member

@freben freben left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Man, it's getting way too late, some early comments before reading through fully

.changeset/friendly-hairs-happen.md Outdated Show resolved Hide resolved
.changeset/friendly-hairs-happen.md Show resolved Hide resolved
docs/integrations/aws-s3/discovery.md Show resolved Hide resolved
plugins/catalog-backend-module-aws/config.d.ts Outdated Show resolved Hide resolved
@pjungermann
Copy link
Contributor Author

just added the option to omit the provider ID in the config for the single config use case using "default" as ID.

# app-config.yaml

catalog:
  providers:
    awsS3:
      # uses "default" as provider ID
      bucketName: sample-bucket
      prefix: prefix/ # optional
      region: us-east-2 # optional, uses the default region otherwise

…cessor`

Add a new provider `AwsS3EntityProvider` as a replacement for the now deprecated
`AwsS3DiscoveryProcessor`.

The new provider will scan configured S3 buckets (with optional) prefix and
add `Location` entities for all discovered catalog files.

These `Location` entities will then be processed as usual.

At each execution, the provider will apply a full mutation, replacing all previous
entities with the new entities/state.

Relates-to: backstage#10183
Signed-off-by: Patrick Jungermann <Patrick.Jungermann@gmail.com>
@pjungermann
Copy link
Contributor Author

@Rugvip @freben I believe it is ready to be merged :-)

Copy link
Member

@freben freben left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's merge and tweak!

// and the default integration will be added as second integration.
// In this case, we still want the first one though, but have no means to select it
// just from the bucket name (and region).
const integration = ScmIntegrations.fromConfig(configRoot).awsS3.list()[0];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thought for the future: this actually can never be undefined at the current point in time, but if we ever changed the behavior to not add a default entry silently, it might have been wise to be defensive here and throw an error if integration turned out to be undefined

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point. I can create a PR to add this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

follow-up PR #10697

return `${endpoint}/${bucketName}/${key}`;
}

return `https://${bucketName}.s3.${this.config.region}.amazonaws.com/${key}`;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we would have to check if the target was really an amazon one right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already check these cases in the lines before. If we don't point to the real AWS S3, we need to provide the endpoint config property (e.g., when LocalStack is used, etc.). So, I'd say it is already covered.

@freben freben merged commit 9f78f91 into backstage:master Apr 7, 2022
@pjungermann pjungermann deleted the PJ_awsS3-provider branch April 7, 2022 12:29
pjungermann added a commit to Bonial-International-GmbH/backstage that referenced this pull request Apr 7, 2022
Relates-to: PR backstage#10480
Signed-off-by: Patrick Jungermann <Patrick.Jungermann@gmail.com>
pjungermann added a commit to Bonial-International-GmbH/backstage that referenced this pull request Apr 7, 2022
Relates-to: PR backstage#10480
Signed-off-by: Patrick Jungermann <Patrick.Jungermann@gmail.com>
pjungermann added a commit to Bonial-International-GmbH/backstage that referenced this pull request Apr 7, 2022
Relates-to: PR backstage#10480
Signed-off-by: Patrick Jungermann <Patrick.Jungermann@gmail.com>
pjungermann added a commit to Bonial-International-GmbH/backstage that referenced this pull request Apr 7, 2022
Relates-to: PR backstage#10480
Signed-off-by: Patrick Jungermann <Patrick.Jungermann@gmail.com>
pjungermann added a commit to Bonial-International-GmbH/backstage that referenced this pull request Apr 8, 2022
Relates-to: PR backstage#10480
Signed-off-by: Patrick Jungermann <Patrick.Jungermann@gmail.com>
pjungermann added a commit to Bonial-International-GmbH/backstage that referenced this pull request Apr 8, 2022
Relates-to: PR backstage#10480
Signed-off-by: Patrick Jungermann <Patrick.Jungermann@gmail.com>
@pjungermann
Copy link
Contributor Author

I just realized that applying the "allow" catalog rule to locations emitted by an entity provider only supports the global config.

For the awsS3 processor, this was not really the case as the processor's Location can have a the Location-specific "allow" rule and itself only emits the entities contained within the S3 objects/catalog files. Hence, the rule was applied successfully.

For other processors which emit Locations as well like the SCM-related processors for github, gitlab, bitbucket, etc. should be impacted by this limitation as well.

In order to mitigate this, you need to ensure that to be added entities are allowed by the global rule.

Not yet sure what would be the right strategy to have a custom configuration for the provider (instance) and be more restrictive at the global rule. 🤔

@pjungermann
Copy link
Contributor Author

issue #12880 as follow up related to the "allow" rule

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:catalog Related to the Catalog Project Area documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants