Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(cli): Provide more details when scan job fails #328

Merged
merged 2 commits into from
Jan 13, 2021

Conversation

danielpacak
Copy link
Contributor

@danielpacak danielpacak commented Jan 11, 2021

It might happen that a scan job cannot be created. In such cases Starboard CLI hangs forever because it watches the job's status to terminate (JobComplete or JobFailed). However, some failures are reported as Kubernetes events associated with the scan job.

This change adds the events informer to watch events associated with a scan job, and terminate the program when the event of type EventTypeWarning (= error for Kubernetes) is detected.

To reproduce you can delete the starboard service account in the starborad namespace:

kubectl delete sa -n starboard starboard

Before

$ kubectl starboard generate vulnerabilityreports deploy/nginx -v 3
I0111 14:33:00.321886   35566 cert_rotation.go:137] Starting client certificate rotation controller
I0111 14:33:00.351359   35566 scanner.go:66] Getting Pod template for workload: {Deployment nginx default}
I0111 14:33:00.357111   35566 scanner.go:72] Scanning with options: {ScanJobTimeout:0s DeleteScanJob:true}
I0111 14:33:00.358993   35566 runner.go:79] Running task and waiting forever
I0111 14:33:00.359382   35566 runnable_job.go:72] Creating job "starboard/92506d31-46ed-4095-a1b1-822e5f076a42"
I0111 14:33:00.372751   35566 reflector.go:175] Starting reflector *v1.Job (30m0s) from pkg/mod/k8s.io/client-go@v0.18.6/tools/cache/reflector.go:125
I0111 14:33:00.372772   35566 reflector.go:211] Listing and watching *v1.Job from pkg/mod/k8s.io/client-go@v0.18.6/tools/cache/reflector.go:125
HANGS FOREVER !!!

After

$ ./bin/starboard generate vulnerabilityreports deploy/nginx -v 3
I0111 14:31:48.312623   35506 cert_rotation.go:137] Starting client certificate rotation controller
I0111 14:31:48.327276   35506 scanner.go:66] Getting Pod template for workload: {Deployment nginx default}
I0111 14:31:48.331739   35506 scanner.go:72] Scanning with options: {ScanJobTimeout:0s DeleteScanJob:true}
I0111 14:31:48.333309   35506 runner.go:79] Running task and waiting forever
I0111 14:31:48.333339   35506 runnable_job.go:73] Creating job "starboard/37dc6d47-e5bf-46c9-9888-400105710fc6"
I0111 14:31:48.339370   35506 reflector.go:175] Starting reflector *v1.Event (30m0s) from pkg/mod/k8s.io/client-go@v0.18.6/tools/cache/reflector.go:125
I0111 14:31:48.339388   35506 reflector.go:211] Listing and watching *v1.Event from pkg/mod/k8s.io/client-go@v0.18.6/tools/cache/reflector.go:125
I0111 14:31:48.339391   35506 reflector.go:175] Starting reflector *v1.Job (30m0s) from pkg/mod/k8s.io/client-go@v0.18.6/tools/cache/reflector.go:125
I0111 14:31:48.339409   35506 reflector.go:211] Listing and watching *v1.Job from pkg/mod/k8s.io/client-go@v0.18.6/tools/cache/reflector.go:125
I0111 14:31:48.439720   35506 runner.go:83] Stopping runner on task completion with error: warning event received: Error creating: pods "37dc6d47-e5bf-46c9-9888-400105710fc6-" is forbidden: error looking up service account starboard/starboard: serviceaccount "starboard" not found (FailedCreate)
error: running scan job: warning event received: Error creating: pods "37dc6d47-e5bf-46c9-9888-400105710fc6-" is forbidden: error looking up service account starboard/starboard: serviceaccount "starboard" not found (FailedCreate)

Resolves: #70

Signed-off-by: Daniel Pacak pacak.daniel@gmail.com

@codecov
Copy link

codecov bot commented Jan 11, 2021

Codecov Report

Merging #328 (1d3d89f) into main (5af029b) will increase coverage by 0.32%.
The diff coverage is 68.42%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #328      +/-   ##
==========================================
+ Coverage   65.70%   66.02%   +0.32%     
==========================================
  Files          57       57              
  Lines        2799     2820      +21     
==========================================
+ Hits         1839     1862      +23     
+ Misses        720      718       -2     
  Partials      240      240              
Impacted Files Coverage Δ
pkg/vulnerabilityreport/scanner.go 74.73% <ø> (+1.26%) ⬆️
pkg/kube/pod/manager.go 35.71% <40.00%> (+0.57%) ⬆️
pkg/kube/runnable_job.go 63.93% <78.57%> (+1.93%) ⬆️
pkg/configauditreport/writer.go 86.66% <0.00%> (ø)
pkg/polaris/scanner.go
pkg/polaris/converter.go
pkg/polaris/plugin.go 89.91% <0.00%> (ø)
pkg/configauditreport/scanner.go 80.55% <0.00%> (ø)
pkg/cmd/scan_configaudit.go 53.84% <0.00%> (+3.84%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5af029b...1d3d89f. Read the comment docs.

Resolves: #70

Signed-off-by: Daniel Pacak <pacak.daniel@gmail.com>
Copy link
Contributor

@lizrice lizrice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One tiny question but otherwise LGTM

@@ -159,7 +160,10 @@ func (pw *Manager) GetPodByJob(ctx context.Context, job *batch.Job) (*core.Pod,
if err != nil {
return nil, err
}
return &podList.Items[0], nil
if len(podList.Items) > 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any chance podList can be nil?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory if there's no error it won't be nil. However, we're practitioners so I added nil check to the podList 🤣

Signed-off-by: Daniel Pacak <pacak.daniel@gmail.com>
@danielpacak danielpacak merged commit 2a490f9 into main Jan 13, 2021
@danielpacak danielpacak deleted the cli_watch_events branch January 13, 2021 16:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Starboard could provide better error reporting when jobs fail
2 participants