🐛 Bug Report: [Catalog][Github Plugin] Etag not respected #14614

Closed
2 tasks done
angeliski opened this issue Nov 14, 2022 · 3 comments
Labels
area:catalog Related to the Catalog Project Area bug Something isn't working help wanted Help/Contributions wanted from community members

Comments

@angeliski
Contributor

📜 Description

I am seeing odd behavior when the catalog is updated using the GitHub provider.

First, I have 402 locations, and I expect that when the catalog processors run, only the locations whose ETag has changed consume rate limit.
image

This Image is before the processors run.
image
This is after:
image

Result: 4722 - 4320 = 402

So I believe the ETag is not respected.
P.S.: I understand that in some cases my entities really are changing, but I froze one of them just to run this test, so in the worst case the number should be 401, not 402.

👍 Expected behavior

I expect the catalog not to consume rate limit for unchanged entities. In the best case, my rate limit is the same after the processors run.
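The expectation above rests on HTTP conditional requests: the client sends the stored ETag in an `If-None-Match` header, and an unchanged resource comes back as `304 Not Modified`. GitHub's REST documentation describes such 304 responses as not counting against the primary rate limit for authorized requests. The following is a minimal sketch of that mechanism, not the catalog's actual implementation; `EtagCache`, `Fetcher`, and the fake server are hypothetical names invented for illustration.

```typescript
// Hypothetical shapes for illustration; not Backstage or GitHub client types.
interface CachedResponse {
  etag: string;
  body: string;
}

interface HttpResult {
  status: number;
  etag?: string;
  body?: string;
}

type Fetcher = (url: string, headers: Record<string, string>) => HttpResult;

class EtagCache {
  private readonly entries = new Map<string, CachedResponse>();

  // Returns the body plus whether the server confirmed the cached copy.
  read(url: string, fetcher: Fetcher): { body: string; notModified: boolean } {
    const cached = this.entries.get(url);
    // Send the stored ETag so the server can answer 304 Not Modified.
    const headers: Record<string, string> = cached
      ? { 'If-None-Match': cached.etag }
      : {};
    const res = fetcher(url, headers);

    if (res.status === 304 && cached) {
      return { body: cached.body, notModified: true };
    }
    if (res.status === 200 && res.etag !== undefined && res.body !== undefined) {
      this.entries.set(url, { etag: res.etag, body: res.body });
      return { body: res.body, notModified: false };
    }
    // Anything else (for example a 404) is surfaced to the caller.
    throw new Error(`Unexpected status ${res.status} for ${url}`);
  }
}

// Fake server: one unchanged file with a fixed ETag. It counts the
// responses that are not 304, standing in for "billable" requests.
let billable = 0;
const fakeServer: Fetcher = (_url, headers) => {
  if (headers['If-None-Match'] === '"abc"') {
    return { status: 304 };
  }
  billable += 1;
  return { status: 200, etag: '"abc"', body: 'kind: Component' };
};

const cache = new EtagCache();
const first = cache.read('https://example.invalid/catalog-info.yaml', fakeServer);
const second = cache.read('https://example.invalid/catalog-info.yaml', fakeServer);
console.log(first.notModified, second.notModified, billable); // false true 1
```

In this sketch only the first read is billable. A location that answers 404 never produces an ETag to store, so every refresh of it falls through to the error branch and costs a request.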

👎 Actual Behavior with Screenshots

Same observation as in the description above: with 402 locations, one processing run consumed 402 requests (4722 - 4320 = 402), including the one for the entity I froze for this test.

👟 Reproduction steps

I created a basic repository with the changes I used:
https://github.com/angeliski/backstage-test/compare/rate-limit-catalog-issue?expand=1

Note that my organization is 'resultadosDigitais' (@ResultadosDigitais), so my test is based on this org.
I am running this example with a GitHub token just for simplicity; I have the same issue using GitHub Apps (only the rate limit is different).

📃 Provide the context for the Bug.

I am having problems with the GitHub App rate limit, so I am trying to find out why this is happening.

We have roughly 400 repositories, but most of them are updated infrequently (excluding some big repositories), so the ETag behavior should be enough to avoid this, but it isn't working the way I expect.

I can't find where the ETag is stored: the ETags on the Component, on the Location, and the real ETag are all different, so I am a little confused about where this information is kept.

image

🖥️ Your Environment

yarn run v1.22.18
$ /home/ROGERIO.ANGELISKI/workspace/backstage-test/node_modules/.bin/backstage-cli info
OS: Linux 5.14.0-1054-oem - linux/x64
node: v16.14.0
yarn: 1.22.18
cli: 0.20.0 (installed)
backstage: 1.7.1

Dependencies:
@backstage/app-defaults 1.0.7
@backstage/backend-common 0.15.2
@backstage/backend-plugin-api 0.1.3
@backstage/backend-tasks 0.3.6
@backstage/catalog-client 1.1.1
@backstage/catalog-model 1.1.2
@backstage/cli-common 0.1.10
@backstage/cli 0.20.0
@backstage/config-loader 1.1.5
@backstage/config 1.0.3
@backstage/core-app-api 1.1.1
@backstage/core-components 0.11.2
@backstage/core-plugin-api 1.0.7
@backstage/errors 1.1.2
@backstage/integration-react 1.1.5
@backstage/integration 1.3.2
@backstage/plugin-api-docs 0.8.10
@backstage/plugin-app-backend 0.3.37
@backstage/plugin-auth-backend 0.17.0
@backstage/plugin-auth-node 0.2.6
@backstage/plugin-catalog-backend-module-github 0.1.8
@backstage/plugin-catalog-backend 1.5.0
@backstage/plugin-catalog-common 1.0.7
@backstage/plugin-catalog-graph 0.2.22
@backstage/plugin-catalog-import 0.9.0
@backstage/plugin-catalog-node 1.2.0
@backstage/plugin-catalog-react 1.2.0
@backstage/plugin-catalog 1.6.0
@backstage/plugin-github-actions 0.5.10
@backstage/plugin-org 0.5.10
@backstage/plugin-permission-common 0.7.0
@backstage/plugin-permission-node 0.7.0
@backstage/plugin-permission-react 0.4.6
@backstage/plugin-proxy-backend 0.2.31
@backstage/plugin-scaffolder-backend 1.7.0
@backstage/plugin-scaffolder-common 1.2.1
@backstage/plugin-scaffolder 1.7.0
@backstage/plugin-search-backend-module-pg 0.4.1
@backstage/plugin-search-backend-node 1.0.3
@backstage/plugin-search-backend 1.1.0
@backstage/plugin-search-common 1.1.0
@backstage/plugin-search-react 1.2.0
@backstage/plugin-search 1.0.3
@backstage/plugin-tech-radar 0.5.17
@backstage/plugin-techdocs-backend 1.4.0
@backstage/plugin-techdocs-module-addons-contrib 1.0.5
@backstage/plugin-techdocs-node 1.4.1
@backstage/plugin-techdocs-react 1.0.5
@backstage/plugin-techdocs 1.3.3
@backstage/plugin-user-settings 0.5.0
@backstage/release-manifests 0.0.6
@backstage/test-utils 1.2.1
@backstage/theme 0.2.16
@backstage/types 1.0.0
@backstage/version-bridge 1.0.1
Done in 0.51s.

👀 Have you spent some time to check if this bug has been raised before?

  • I checked and didn't find a similar issue

🏢 Have you read the Code of Conduct?

Are you willing to submit PR?

Yes I am willing to submit a PR!

@angeliski angeliski added the bug Something isn't working label Nov 14, 2022
@Rugvip Rugvip added area:catalog Related to the Catalog Project Area help wanted Help/Contributions wanted from community members labels Nov 17, 2022
@Rugvip
Member

Rugvip commented Nov 17, 2022

Unable to reproduce this with this minimal setup in the example app in this repo:

 import { EntityProvider } from '@backstage/plugin-catalog-node';
 import { ScaffolderEntitiesProcessor } from '@backstage/plugin-scaffolder-backend';
 import { Router } from 'express';
+import { GithubEntityProvider } from '@backstage/plugin-catalog-backend-module-github';
 import { PluginEnvironment } from '../types';
 
 export default async function createPlugin(
@@ -25,6 +26,18 @@ export default async function createPlugin(
   providers?: Array<EntityProvider>,
 ): Promise<Router> {
   const builder = await CatalogBuilder.create(env);
+
+  builder.addEntityProvider(
+    GithubEntityProvider.fromConfig(env.config, {
+      logger: env.logger,
+      schedule: env.scheduler.createScheduledTaskRunner({
+        frequency: { minutes: 2 },
+        timeout: { minutes: 3 },
+      }),
+    }),
+  );
+  builder.setProcessingIntervalSeconds(20);
+
   builder.addProcessor(new ScaffolderEntitiesProcessor());
   builder.addEntityProvider(providers ?? []);
   const { processingEngine, router } = await builder.build();

Along with this in config, and a plain access token:

catalog:
  providers:
    github:
      default:
        organization: '<test-org>' # <- small org with 4 repos and 1 entity each
        catalogPath: '/catalog-info.yaml'
        validateLocationsExist: true
        filters:
          branch: 'master'
          repository: '.*'

Along with the following separate curl to keep track of the rate limits:

curl -H 'Authorization: Bearer <token>' 'https://api.github.com/rate_limit' | jq '.resources.core, .resources.graphql'

I'm getting cache hits with the ETag, and added a bit of debug logging to verify that too. The core resource rate limit stays at 4, while the GraphQL one slowly increases as the provider checks the repos in the org.
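The bookkeeping behind this kind of check can be sketched as a comparison of two snapshots of one resource from the `/rate_limit` response. This is only an illustration: `RateLimitSnapshot` and `consumedBetween` are hypothetical names, the `limit` and `reset` values in the example are made up, and treating 4722 and 4320 from the original report as `remaining` counts is an assumption.

```typescript
// Subset of one entry under `resources` in the GET /rate_limit response.
interface RateLimitSnapshot {
  limit: number;
  remaining: number;
  reset: number; // epoch seconds at which the current window resets
}

// Requests consumed between two snapshots of the same resource.
// Returns undefined when the window reset in between, because the
// counters are then not comparable.
function consumedBetween(
  before: RateLimitSnapshot,
  after: RateLimitSnapshot,
): number | undefined {
  if (before.reset !== after.reset) {
    return undefined;
  }
  return before.remaining - after.remaining;
}

// Numbers from the report, assumed to be `remaining` values;
// `limit` and `reset` are placeholders.
const beforeRun: RateLimitSnapshot = { limit: 5000, remaining: 4722, reset: 1668450000 };
const afterRun: RateLimitSnapshot = { limit: 5000, remaining: 4320, reset: 1668450000 };
console.log(consumedBetween(beforeRun, afterRun)); // 402
```

A delta equal to the number of locations means every location cost a request. Working ETag caching should leave the core delta close to the number of files that actually changed, plus any locations that answer 404.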

Any obvious difference that you can spot? Does this setup work if you try it here in the main repo, or are you seeing the issue here too?

@angeliski
Contributor Author

Hey @Rugvip, your example helped me find the issue. The ETag is respected; my "problem" is the unnecessary locations being generated, which return 404 (and of course still spend rate limit).

I found old locations (from GithubOrgReaderProcessor) and repositories that don't match my provider config (our version doesn't have the validateLocationsExist feature yet; we need to upgrade to 1.8).

I will close this issue (it isn't really a bug), but could you help me understand how we can get a cleaner catalog?

Another question: is there a way to use different GitHub Apps for the Catalog and the Scaffolder? (To avoid impacting the creation process when the catalog is rate limited.)

@Rugvip
Member

Rugvip commented Nov 17, 2022

> I will close this issue (it isn't really a bug), but could you help me understand how we can get a cleaner catalog?

See #14574 as a general entry point to that issue 😅. We've got some ideas for solutions but nothing being worked on just yet.

> Another question: is there a way to use different GitHub Apps for the Catalog and the Scaffolder? (To avoid impacting the creation process when the catalog is rate limited.)

Not different apps AFAIK, only separate auth, for example by using user tokens for the scaffolding. It can be implemented, but it requires you to write a custom GithubCredentialsProvider.
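As a rough idea of what such a custom provider could look like, here is a minimal sketch that always hands out one fixed token. The `GithubCredentials` and `GithubCredentialsProvider` shapes below are local mirrors of what `@backstage/integration` is believed to export and should be checked against the installed version. `StaticTokenCredentialsProvider` and `credentialsFromToken` are hypothetical names, and wiring the provider into the scaffolder is not shown.

```typescript
// Local mirrors of the shapes assumed to be exported by @backstage/integration,
// reproduced here so the sketch compiles on its own. Verify before use.
interface GithubCredentials {
  headers?: { [name: string]: string };
  token?: string;
  type: 'app' | 'token';
}

interface GithubCredentialsProvider {
  getCredentials(opts: { url: string }): Promise<GithubCredentials>;
}

// Hypothetical helper: builds token-type credentials for a given token.
function credentialsFromToken(token: string): GithubCredentials {
  return {
    type: 'token',
    token,
    headers: { Authorization: `Bearer ${token}` },
  };
}

// Hypothetical provider that ignores the URL and always returns the same
// token, for example a user or service token reserved for scaffolding.
class StaticTokenCredentialsProvider implements GithubCredentialsProvider {
  private readonly token: string;

  constructor(token: string) {
    this.token = token;
  }

  async getCredentials(_opts: { url: string }): Promise<GithubCredentials> {
    return credentialsFromToken(this.token);
  }
}

// Placeholder token; in practice it would come from configuration or a secret.
const scaffolderCredentials = new StaticTokenCredentialsProvider('<scaffolder-token>');
```

The point of the design is that scaffolder requests draw on a different credential, and therefore a different rate-limit budget, than the GitHub App used by the catalog.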
