Kubernetes Plugin 500 errors #9942
Comments
@mclarke47 As mentioned, I have raised this as a GitHub issue. Thank you
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
@asaini11 I am going through old issues and just wanted to check in: what's your status on this one?
Hi @freben, this is still an issue, but we are on Backstage version 0.4.14. I am going to try updating to the latest and see if it still persists. Thank you.
I was getting this same error until I adjusted the ClusterRole attached to the ServiceAccount used for Backstage. It's not exactly clear what the required permissions are...
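For reference, a rough sketch of the kind of read-only ClusterRole and binding that usually resolves these permission errors for the Kubernetes plugin. This is not taken from this thread: the role name, the ServiceAccount name/namespace, and the exact resource list are assumptions, so check the plugin docs for the authoritative list.

```yaml
# Hypothetical example -- names and the resource list are assumptions.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: backstage-read-only            # hypothetical name
rules:
  - apiGroups: ['']
    resources: ['pods', 'pods/log', 'services', 'configmaps', 'limitranges']
    verbs: ['get', 'list', 'watch']
  - apiGroups: ['apps']
    resources: ['deployments', 'replicasets', 'statefulsets', 'daemonsets']
    verbs: ['get', 'list', 'watch']
  - apiGroups: ['batch']
    resources: ['jobs', 'cronjobs']
    verbs: ['get', 'list', 'watch']
  - apiGroups: ['autoscaling']
    resources: ['horizontalpodautoscalers']
    verbs: ['get', 'list', 'watch']
  - apiGroups: ['networking.k8s.io']
    resources: ['ingresses']
    verbs: ['get', 'list', 'watch']
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: backstage-read-only
subjects:
  - kind: ServiceAccount
    name: backstage                    # hypothetical ServiceAccount name
    namespace: backstage               # hypothetical namespace
roleRef:
  kind: ClusterRole
  name: backstage-read-only
  apiGroup: rbac.authorization.k8s.io
```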
@freben we have updated our Backstage version to 1.1.0 and we are still getting the same issue, so we would like to keep this open until we have a fix.
After updating to the latest version it was failing because
Thank you @chriscarpenter12, I'll give this a go. We're currently using a GCP IAM policy with the permissions role defined there, so I will try this way and let you know how I get on.
I have tagged @mclarke47 on Discord to see if we can get a permanent fix on this / further input.
I wonder does using the configuration option
K8s version is:
I think this is a bug; we see this sometimes as well, and we're on Azure, not Google. As we have 6 clusters, we sometimes get throttled by Azure and one or more requests may fail, which causes this banner to show and never disappear. We're doing a couple of things to mitigate these errors.
Hi @goenning, it's good to know that others are having the same issue with a different config! I think I will try these things to see if they reduce the errors. However, it sounds like you have tried a few things and it's still not fixed. As a workaround for a smoother developer experience, since we do get a 200 after the 500 error, it would be good if the message in the UI disappeared so the users don't see it. @mclarke47 it would also be good to get your thoughts on this.
Hello @chriscarpenter12, I tried the K8s service account but this didn't work for us; as we have 7 clusters, the Google Service Account works better for us (despite the error).
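For anyone weighing the two approaches discussed above, a rough sketch of how the per-cluster auth differs in app-config.yaml. The URLs, cluster names, and the DEV_K8S_SA_TOKEN / PROD_K8S_CONFIG_CA_DATA env vars are placeholders, not values from this issue:

```yaml
kubernetes:
  clusterLocatorMethods:
    - type: 'config'
      clusters:
        # Option A: per-cluster Kubernetes ServiceAccount token
        # (the approach @chriscarpenter12 describes; needs one token per cluster)
        - url: https://dev-cluster.example.com      # placeholder
          name: dev
          authProvider: 'serviceAccount'
          serviceAccountToken: ${DEV_K8S_SA_TOKEN}  # hypothetical env var
          caData: ${DEV_K8S_CONFIG_CA_DATA}
        # Option B: a single Google Service Account for all clusters
        # (the backend reads the key file pointed to by GOOGLE_APPLICATION_CREDENTIALS)
        - url: https://prod-cluster.example.com     # placeholder
          name: prod
          authProvider: 'googleServiceAccount'
          caData: ${PROD_K8S_CONFIG_CA_DATA}        # hypothetical env var
```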
Update: to limit the refresh rate temporarily, I have removed 4 of the 7 cluster configurations in the app-config file so that it only captures the important clusters. We are still seeing the issue, but it doesn't occur as quickly/often.
Expected Behavior
The K8s plugin shouldn't show 500 errors in the logs and UI when our config is correct. Alternatively, the error message should disappear from the UI once we receive a 200, or not be shown in the UI at all. We know our config is correct because we can see the pod information in the UI.
Actual Behavior
In the UI we can see the K8s plugin load successfully. After approximately 3 minutes we start seeing 500 timeout errors. These are handled by the plugin's error handling (code & code).
We would like to understand why this error handling was put in place and why the 500 errors are returned.
When we use developer tools in the browser (F12) we can see the 10-second refresh requests as defined by code. As we can see 200 status codes, we know our config is correct, and we can successfully see pod data in the UI. However, after about 3 minutes we get a 500 error in the logs and the UI shows the error.

Usually the K8s plugin refresh done after 10 seconds returns a 200 HTTP code, but the error remains in the UI (only the cluster IP updates following the next 500 error). The 500 error happens approximately every minute, but the error in the UI doesn't go away until the page is refreshed. Despite the error being shown in the UI, and the fact that we do get 200 status codes too, the pod information is still able to update. However, this does not give our devs a good user experience.

For example, in the attached screenshots you can see we get a 500 and then a 200 status code, followed by a few more 200s until we hit a 500 again. Ideally the UI should remove the error message once a 200 HTTP code has been received. As the error remains in the UI, we have to refresh the page roughly every 3 minutes for the errors to go away.
Errors in logs:
2022-03-02T09:47:25.428Z kubernetes error action=retrieveObjectsByServiceId service=core, error=Error: connect ETIMEDOUT ourClusterIP:443 type=plugin
where core is one of our services. Following this error we see this in the UI:
There was a problem retrieving some Kubernetes resources for the entity: core. This could mean that the Error Reporting card is not completely accurate.
Note: after some time we see the following in the logs:
2022-03-02T09:48:00.730Z kubernetes error action=retrieveObjectsByServiceId service=core, error=FetchError: request to https://www.googleapis.com/oauth2/v4/token failed, reason: socket hang up type=plugin
although this shouldn't happen, as we are using a Google service account as per Google's docs. From this error in the logs, we get the following message in the UI:
Errors: Request failed with 503, upstream connect error or disconnect/reset before headers. reset reason: connection termination
Again, a refresh in the browser fixes this temporarily.
We have also seen the following error in the logs:
kubernetes error action=retrieveObjectsByServiceId service=core, error=HttpError: HTTP request failed type=plugin
Steps to Reproduce
We have configured the K8s plugin using the docs with the below config, where we have mounted DEV_K8S_CONFIG_CA_DATA as a Kubernetes secret and have also defined and mounted GOOGLE_APPLICATION_CREDENTIALS, which is our GCP service account JSON file. We have 7 clusters (one of which is dev); the other clusters follow a similar config.
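A minimal sketch of what the dev cluster entry described here typically looks like in app-config.yaml, assuming the googleServiceAccount auth provider; the cluster URL is a placeholder, and this is not the reporter's exact config:

```yaml
kubernetes:
  serviceLocatorMethod:
    type: 'multiTenant'
  clusterLocatorMethods:
    - type: 'config'
      clusters:
        - url: https://dev-cluster-endpoint      # placeholder
          name: dev
          authProvider: 'googleServiceAccount'   # uses GOOGLE_APPLICATION_CREDENTIALS
          skipTLSVerify: false
          caData: ${DEV_K8S_CONFIG_CA_DATA}
        # ...plus six more cluster entries of the same shape
```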
Context
We would like a smooth developer experience, i.e. devs shouldn't need to keep refreshing the UI to remove the error message.
Your Environment
We are hosting Backstage in Kubernetes.
Backstage version 0.4.14
Related bug on GitHub
Chrome browser
yarn backstage-cli info: Run command locally.