
Backups failing with Failed to stream backup to remote kopia repository on Kopia API #258

Closed
allenporter opened this issue Jul 22, 2021 · 4 comments

Comments

@allenporter
Owner

status:
  state: Failed
  startTime: 2021-07-22T01:57:20Z
  endTime: 2021-07-22T02:26:05Z
  restorePoint:
    name: ""
  error:
    cause: '{"fields":[{"name":"message","value":"{\"message\":\"Failed to move data
      from
      source\",\"function\":\"kasten.io/k10/kio/kanister/function.(*moverBackupToServerFunc).Exec\",\"linenumber\":134,\"fields\":[{\"name\":\"dataSource\",\"value\":\"http://10.106.88.156:8000/v0/backup\"}],\"cause\":{\"message\":\"Failed
      to stream backup to remote kopia repository on Kopia API
@allenporter
Owner Author

From the executor logs, it's failing to talk to the data-mover job, which listens on port 51515.

Errors:[]*models.Error{(*models.Error){Cause:map[string]interface{}{"cause":map[string]interface{}{"message":"unable to get repository parameters: error running http request: Get \"https://10.101.221.60:51515/api/v1/repo/parameters\": round-trip error: can't find certificate matching SHA256 fingerprint \"XXXX\" (server had [YYYY])"}, "function":"kasten.io/k10/kio/kopiaclient.OpenRepository", "linenumber":json.Number("108"), "message":"Failed to open Kopia repository"}, Fields:[]*models.Field{}, Message:"Job failed to be executed", Retriable:false}}, GroupIndex:1, ID:strfmt.UUID("ac1d19d1-ea98-11eb-a46a-b69c932c624b"), Manifest:models.ItemID("9a982304-ea98-11eb-b5b8-3eaa9d356af5"), 
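The error says the client pins the data mover's TLS certificate by SHA-256 fingerprint and rejects the connection when the server presents a certificate with a different fingerprint. The Python sketch below models only that check. It assumes the fingerprint is the hex-encoded SHA-256 of the DER certificate, and the function names are hypothetical: this is not Kopia's or K10's API.

```python
import hashlib
from typing import Sequence


def sha256_fingerprint(der_cert: bytes) -> str:
    """Hex-encoded SHA-256 digest of a DER-encoded certificate (assumed format)."""
    return hashlib.sha256(der_cert).hexdigest()


def find_pinned(expected: str, presented: Sequence[bytes]) -> bytes:
    """Return the presented certificate whose fingerprint matches the pin.

    Raises ValueError listing the fingerprints the server actually offered,
    loosely mirroring the wording of the error in the log above.
    """
    seen = []
    for der in presented:
        fp = sha256_fingerprint(der)
        if fp == expected.lower():
            return der
        seen.append(fp)
    raise ValueError(
        f"can't find certificate matching SHA256 fingerprint {expected!r} "
        f"(server had {seen})"
    )


# Hypothetical usage: placeholder bytes stand in for real DER certificates.
old_cert = b"certificate the client pinned"
new_cert = b"certificate the restarted server presents"
pin = sha256_fingerprint(old_cert)
print(find_pinned(pin, [old_cert]) == old_cert)  # True
try:
    find_pinned(pin, [new_cert])
except ValueError as err:
    print(err)
```

Suppose a client records a fingerprint for one certificate and the server later presents a regenerated certificate. That client fails this check even though nothing else is wrong. This is one possible explanation for an intermittent failure. The thread does not confirm it as the root cause.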

@allenporter
Owner Author

The last run succeeded. The backups are fairly flaky, and the alert rules are noisy.

@allenporter
Owner Author

The rules are at https://docs.kasten.io/latest/operating/monitoring.html#generating-alerts. The problem is that they alert on every backup failure rather than on whether jobs are within policy.

The metric dashboardbff_compliance_count seems like it may be more useful, since it tracks the number of jobs out of compliance.
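The sketch below contrasts the two alerting styles. It fires only when the non-compliant count has stayed above zero for a hold period, which is similar in spirit to a Prometheus `for:` clause, instead of firing on every failed run. The sample format, the 3-hour hold, and the function names are assumptions for illustration. This is not K10's or Prometheus's actual logic.

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional, Tuple

Sample = Tuple[datetime, int]  # (scrape time, number of non-compliant jobs), assumed shape


def noncompliant_since(samples: Iterable[Sample]) -> Optional[datetime]:
    """Start of the current unbroken run of samples with count > 0, or None."""
    start = None
    for ts, count in sorted(samples):
        if count > 0:
            if start is None:
                start = ts
        else:
            start = None  # a compliant sample resets the streak
    return start


def should_alert(samples: Iterable[Sample], now: datetime,
                 hold: timedelta = timedelta(hours=3)) -> bool:
    """Fire only if jobs have been out of compliance for at least `hold` (assumed 3h)."""
    start = noncompliant_since(samples)
    return start is not None and now - start >= hold


t0 = datetime(2021, 7, 22, 0, 0)
h = timedelta(hours=1)
# One failed run that a later run fixes: a per-failure rule would fire, this does not.
print(should_alert([(t0, 0), (t0 + h, 1), (t0 + 2 * h, 0)], now=t0 + 2 * h))  # False
# Out of compliance for four hours straight: fires.
print(should_alert([(t0, 0), (t0 + h, 1), (t0 + 5 * h, 1)], now=t0 + 5 * h))  # True
```

With this design, a flaky but self-healing backup stays quiet. A policy that keeps missing its window still raises an alert once the hold period passes.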

allenporter added a commit that referenced this issue Jul 22, 2021
Track jobs that are out of compliance for more than a few hours

Issue #258
@allenporter
Owner Author

allenporter commented Aug 25, 2021

Going with benji instead of k10. #274
