🐛 Bug Report: AWS catalog module package missing entity processors export #24711

Open · niallthomson opened this issue May 9, 2024 · 8 comments
Labels: area:catalog · bug · help wanted

Comments

@niallthomson (Contributor)

📜 Description

When the catalog-backend-module-aws package was updated to support the new backend import mechanism, only the S3 entity provider was integrated. Processors such as AwsOrganizationCloudAccountProcessor and AwsEKSClusterProcessor are not included.

👍 Expected behavior

If I enable the catalog-backend-module-aws package in the new backend index.ts like so:

```ts
backend.add(import('@backstage/plugin-catalog-backend-module-aws'));
```

I expect to be able to use the entity processors in that package.

👎 Actual Behavior with Screenshots

I can only use the S3 entity provider without additional code.
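
For illustration, the "additional code" currently needed is roughly a hand-written backend module in the app that registers the processor through the catalog processing extension point. A minimal sketch, assuming the processor classes stay exported from the package and that AwsOrganizationCloudAccountProcessor still offers the fromConfig factory documented for the old backend (the exact factory signature may differ between releases):

```ts
// Adopter-side workaround module, not part of the package itself.
import { coreServices, createBackendModule } from '@backstage/backend-plugin-api';
import { catalogProcessingExtensionPoint } from '@backstage/plugin-catalog-node/alpha';
import { AwsOrganizationCloudAccountProcessor } from '@backstage/plugin-catalog-backend-module-aws';

// Hypothetical module id; wires the AWS Organizations processor into the catalog.
export const catalogModuleAwsOrgProcessor = createBackendModule({
  pluginId: 'catalog',
  moduleId: 'aws-org-processor',
  register(reg) {
    reg.registerInit({
      deps: {
        catalog: catalogProcessingExtensionPoint,
        config: coreServices.rootConfig,
        logger: coreServices.logger,
      },
      async init({ catalog, config, logger }) {
        // Assumed factory; mirrors the old-backend setup docs.
        catalog.addProcessor(
          AwsOrganizationCloudAccountProcessor.fromConfig(config, { logger }),
        );
      },
    });
  },
});
```

That module would then be passed to backend.add() in packages/backend/src/index.ts alongside the existing module imports.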

👟 Reproduction steps

  1. Create a Backstage app
  2. Import the AWS catalog backend module: `yarn workspace backend add @backstage/plugin-catalog-backend-module-aws`
  3. Edit `packages/backend/src/index.ts` to add `backend.add(import('@backstage/plugin-catalog-backend-module-aws'));` (see the sketch after this list)
  4. Run the application: `yarn dev`
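
After step 3, the relevant part of packages/backend/src/index.ts looks roughly like this (other backend plugins elided):

```ts
// packages/backend/src/index.ts (other backend plugins elided)
import { createBackend } from '@backstage/backend-defaults';

const backend = createBackend();

// Today this only registers the AwsS3EntityProvider; the processors in the
// same package are not wired up.
backend.add(import('@backstage/plugin-catalog-backend-module-aws'));

backend.start();
```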

📃 Provide the context for the Bug.

No response

🖥️ Your Environment

No response

👀 Have you spent some time to check if this bug has been raised before?

  • I checked and didn't find a similar issue

🏢 Have you read the Code of Conduct?

Are you willing to submit PR?

Yes I am willing to submit a PR!

@niallthomson added the bug label on May 9, 2024
@github-actions bot added the area:catalog label on May 9, 2024
@niallthomson (Contributor, Author)

I noticed that the GitHub and GitLab catalog modules have broken out their "organization" providers into separate modules/packages that basically just declare the backend module for those providers, while the implementation remains in the main GitHub/GitLab packages.

What I'm unclear about here is whether we should:

  1. Break these processors out to separate packages similar to above (seems like a lot of work?)
  2. Rebrand the existing catalogModuleAwsS3EntityProvider as something like catalogModuleAws and register the S3 entity provider plus the processors in the same module.
  3. Something else?

@vinzscam added the help wanted label on May 10, 2024
@vinzscam (Member)

Yes, the way to go is to create a new module for each of the processors.
It seems like more work from a plugin author's perspective, but it will be much easier for adopters to install the modules using the new backend system.
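
As a rough sketch of that direction, each processor would live in its own package whose default export is the backend module, so adopters get a one-line install (the package name and export below are hypothetical; the module body would look much like the workaround sketch further up):

```ts
// Hypothetical new package: @backstage/plugin-catalog-backend-module-aws-org
// src/index.ts
export { catalogModuleAwsOrgProcessor as default } from './module';
```

An adopter would then install it like any other module:

```ts
// packages/backend/src/index.ts in an adopter app
backend.add(import('@backstage/plugin-catalog-backend-module-aws-org'));
```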

@niallthomson (Contributor, Author) commented May 11, 2024

To play devil's advocate a bit: I'm not sure that's better for anyone. For the maintainers it's more overhead, and for the adopters it's more steps to import multiple packages and likely more documentation to look through. I'm also not 100% clear on the benefits of splitting them. I understand a boundary has to be drawn somewhere, and if this is the general guidance then I get where it's coming from, but looking at a specific case like this I'm not sure what anyone gains from the separate packages.

But if that's the recommended direction, I can start to figure out what this will look like.

@freben (Member) commented May 11, 2024

The way the backend system is made, it wants to consume the default export of packages. And there can only be one default export of course.

I wonder if this is the point where we could start talking about subpath exports. Ping @Rugvip

@niallthomson (Contributor, Author) commented May 17, 2024

Perhaps related to this: I was browsing #10183 and wondered whether these processors would also benefit from being converted to providers.

For example, for many AWS customers I expect that accounts change infrequently enough that being able to control the refresh interval for the AWS Organizations processor would reduce the number of API calls.

I also see that ingesting GKE clusters as Resource kinds is implemented as a provider (GkeEntityProvider) rather than a processor, while the EKS version is a processor.

It could be that the path here is to port these to providers and export those, rather than exporting the existing processors; the processors are largely undocumented anyway. However, I'm not sure what the downsides of switching to providers would be.
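
To make the comparison concrete, a provider version of the EKS processor might look something like the following skeleton (class name, provider name, and scheduling are hypothetical; the AWS calls and entity mapping are left as placeholders):

```ts
import { Entity } from '@backstage/catalog-model';
import {
  EntityProvider,
  EntityProviderConnection,
} from '@backstage/plugin-catalog-node';

// Hypothetical provider mirroring the shape of GkeEntityProvider.
export class AwsEksClusterEntityProvider implements EntityProvider {
  private connection?: EntityProviderConnection;

  getProviderName(): string {
    return 'aws-eks-cluster';
  }

  async connect(connection: EntityProviderConnection): Promise<void> {
    this.connection = connection;
  }

  // Intended to run on a schedule chosen by the adopter, which is what gives
  // control over the refresh interval.
  async refresh(): Promise<void> {
    if (!this.connection) {
      throw new Error('Not initialized');
    }

    // TODO: list EKS clusters via the AWS SDK and map each to a Resource entity.
    const entities: Entity[] = [];

    await this.connection.applyMutation({
      type: 'full',
      entities: entities.map(entity => ({
        entity,
        locationKey: `aws-eks-cluster:${this.getProviderName()}`,
      })),
    });
  }
}
```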

@Rugvip (Member) commented May 20, 2024

Yep, entity providers are the preferred pattern for ingesting additional entities. It definitely makes sense to migrate any such processors that don't currently have an equivalent provider.

In many cases we also keep supporting processors for backwards compatibility, and it may be that we want to start deprecating some of them.

Regarding the export structure in the new backend system, I think we can keep any processor modules on separate named exports, and stick to exporting the provider modules on the default exports. This effectively means that you need a bit of extra code to wire up a processor discovery module in the new system, but that's fine because it isn't the recommended approach.
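
Concretely, the extra wiring for a named-export processor module could look like this in an adopter's backend (catalogModuleAwsProcessors is a hypothetical export name):

```ts
// packages/backend/src/index.ts
import { createBackend } from '@backstage/backend-defaults';
// Hypothetical named export for the processor module; the provider module
// stays on the package's default export.
import { catalogModuleAwsProcessors } from '@backstage/plugin-catalog-backend-module-aws';

const backend = createBackend();

// Default export: the AwsS3EntityProvider module.
backend.add(import('@backstage/plugin-catalog-backend-module-aws'));

// Named export: the processor module, added explicitly.
backend.add(catalogModuleAwsProcessors);

backend.start();
```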

@niallthomson (Contributor, Author)

For my own education: are there negative consequences to just adding these processors to the existing module alongside the S3 provider, rather than introducing more modules?

@Rugvip (Member) commented May 23, 2024

@niallthomson Yep, right now you can register any type of location through a Location entity. That means that anyone with the ability to add a Location entity to the catalog can also set up any form of ingestion with the available processors in the catalog. Worst case, that can lead to information leakage, as the catalog might be authorized to read content that the user is not.

The typical setup of custom processors is through catalog.locations config though, and I think it would be very reasonable to lock down what location types you're able to define in a Location entity. If we did that, I don't see any reason we couldn't register all processors by default. Then again, discovery processors aren't really a thing we want to be encouraging in the first place; entity providers do the same thing, but better.
