feat: vulnerability rescan of K8s workloads based on report TTL (#879)
This patch adds the controller to manage TTL of vulnerability reports.
Based on the TTL annotations the controller deletes obsolete vulnerability reports,
which triggers rescan.

You must set the OPERATOR_VULNERABILITY_SCANNER_REPORT_TTL
environment variable to enable this feature.

Resolves: #537 

Signed-off-by: Edvin Norling <edvin.norling@xenit.se>
Co-authored-by: Daniel Pacak <pacak.daniel@gmail.com>
Edvin N and danielpacak committed Jan 12, 2022
1 parent 922ec04 commit ab3974f
Showing 13 changed files with 194 additions and 22 deletions.
1 change: 1 addition & 0 deletions CONTRIBUTING.md
@@ -239,6 +239,7 @@ basic development workflow. For other install modes see [Operator Multitenancy w
OPERATOR_VULNERABILITY_SCANNER_ENABLED=true \
OPERATOR_VULNERABILITY_SCANNER_SCAN_ONLY_CURRENT_REVISIONS=false \
OPERATOR_CONFIG_AUDIT_SCANNER_ENABLED=true \
OPERATOR_VULNERABILITY_SCANNER_REPORT_TTL="" \
OPERATOR_BATCH_DELETE_LIMIT=3 \
OPERATOR_BATCH_DELETE_DELAY="30s" \
go run cmd/starboard-operator/main.go
2 changes: 2 additions & 0 deletions deploy/helm/templates/deployment.yaml
@@ -75,6 +75,8 @@ spec:
value: {{ .Values.operator.kubernetesBenchmarkEnabled | quote }}
- name: OPERATOR_VULNERABILITY_SCANNER_ENABLED
value: {{ .Values.operator.vulnerabilityScannerEnabled | quote }}
- name: OPERATOR_VULNERABILITY_SCANNER_REPORT_TTL
value: {{ .Values.operator.vulnerabilityScannerReportTTL | quote }}
- name: OPERATOR_CONFIG_AUDIT_SCANNER_ENABLED
value: {{ .Values.operator.configAuditScannerEnabled | quote }}
- name: OPERATOR_VULNERABILITY_SCANNER_SCAN_ONLY_CURRENT_REVISIONS
2 changes: 2 additions & 0 deletions deploy/helm/values.yaml
@@ -35,6 +35,8 @@ operator:

# vulnerabilityScannerEnabled the flag to enable vulnerability scanner
vulnerabilityScannerEnabled: true
# vulnerabilityScannerReportTTL the flag to set how long a vulnerability report should exist. "" means that the vulnerabilityScannerReportTTL feature is disabled
vulnerabilityScannerReportTTL: ""
# configAuditScannerEnabled the flag to enable configuration audit scanner
configAuditScannerEnabled: true
# kubernetesBenchmarkEnabled the flag to enable CIS Kubernetes Benchmark scanner
2 changes: 2 additions & 0 deletions deploy/static/04-starboard-operator.deployment.yaml
@@ -79,6 +79,8 @@ spec:
value: "true"
- name: OPERATOR_VULNERABILITY_SCANNER_ENABLED
value: "true"
- name: OPERATOR_VULNERABILITY_SCANNER_REPORT_TTL
value: ""
- name: OPERATOR_CONFIG_AUDIT_SCANNER_ENABLED
value: "true"
- name: OPERATOR_VULNERABILITY_SCANNER_SCAN_ONLY_CURRENT_REVISIONS
39 changes: 20 additions & 19 deletions docs/operator/configuration.md
@@ -1,24 +1,25 @@
Configuration of the operator's Pod is done via environment variables at startup.

| NAME | DEFAULT | DESCRIPTION |
| ------------------------------------------------------------ | -------------------- | ----------------------------------------------------------------------------------------------------------------------------- |
| `OPERATOR_NAMESPACE` | N/A | See [Install modes](#install-modes) |
| `OPERATOR_TARGET_NAMESPACES` | N/A | See [Install modes](#install-modes) |
| `OPERATOR_SERVICE_ACCOUNT` | `starboard-operator` | The name of the service account assigned to the operator's pod |
| `OPERATOR_LOG_DEV_MODE` | `false` | The flag to use (or not use) development mode (more human-readable output, extra stack traces and logging information, etc). |
| `OPERATOR_SCAN_JOB_TIMEOUT` | `5m` | The length of time to wait before giving up on a scan job |
| `OPERATOR_CONCURRENT_SCAN_JOBS_LIMIT` | `10` | The maximum number of scan jobs created by the operator |
| `OPERATOR_SCAN_JOB_RETRY_AFTER` | `30s` | The duration to wait before retrying a failed scan job |
| `OPERATOR_BATCH_DELETE_LIMIT` | `10` | The maximum number of config audit reports deleted by the operator when the plugin's config has changed. |
| `OPERATOR_BATCH_DELETE_DELAY` | `10s` | The duration to wait before deleting another batch of config audit reports. |
| `OPERATOR_METRICS_BIND_ADDRESS` | `:8080` | The TCP address to bind to for serving [Prometheus][prometheus] metrics. It can be set to `0` to disable the metrics serving. |
| `OPERATOR_HEALTH_PROBE_BIND_ADDRESS` | `:9090` | The TCP address to bind to for serving health probes, i.e. `/healthz/` and `/readyz/` endpoints. |
| `OPERATOR_CIS_KUBERNETES_BENCHMARK_ENABLED` | `true` | The flag to enable CIS Kubernetes Benchmark scanner |
| `OPERATOR_VULNERABILITY_SCANNER_ENABLED` | `true` | The flag to enable vulnerability scanner |
| `OPERATOR_CONFIG_AUDIT_SCANNER_ENABLED` | `true` | The flag to enable configuration audit scanner |
| `OPERATOR_VULNERABILITY_SCANNER_SCAN_ONLY_CURRENT_REVISIONS` | `false` | The flag to enable vulnerability scanner to only scan the current revision of a deployment |
| `OPERATOR_LEADER_ELECTION_ENABLED` | `false` | The flag to enable operator replica leader election |
| `OPERATOR_LEADER_ELECTION_ID` | `starboard-lock` | The name of the resource lock for leader election |
| NAME | DEFAULT | DESCRIPTION |
| ------------------------------------------------------------ | -------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `OPERATOR_NAMESPACE` | N/A | See [Install modes](#install-modes) |
| `OPERATOR_TARGET_NAMESPACES` | N/A | See [Install modes](#install-modes) |
| `OPERATOR_SERVICE_ACCOUNT` | `starboard-operator` | The name of the service account assigned to the operator's pod |
| `OPERATOR_LOG_DEV_MODE` | `false` | The flag to use (or not use) development mode (more human-readable output, extra stack traces and logging information, etc). |
| `OPERATOR_SCAN_JOB_TIMEOUT` | `5m` | The length of time to wait before giving up on a scan job |
| `OPERATOR_CONCURRENT_SCAN_JOBS_LIMIT` | `10` | The maximum number of scan jobs created by the operator |
| `OPERATOR_SCAN_JOB_RETRY_AFTER` | `30s` | The duration to wait before retrying a failed scan job |
| `OPERATOR_BATCH_DELETE_LIMIT` | `10` | The maximum number of config audit reports deleted by the operator when the plugin's config has changed. |
| `OPERATOR_BATCH_DELETE_DELAY` | `10s` | The duration to wait before deleting another batch of config audit reports. |
| `OPERATOR_METRICS_BIND_ADDRESS` | `:8080` | The TCP address to bind to for serving [Prometheus][prometheus] metrics. It can be set to `0` to disable the metrics serving. |
| `OPERATOR_HEALTH_PROBE_BIND_ADDRESS` | `:9090` | The TCP address to bind to for serving health probes, i.e. `/healthz/` and `/readyz/` endpoints. |
| `OPERATOR_CIS_KUBERNETES_BENCHMARK_ENABLED` | `true` | The flag to enable CIS Kubernetes Benchmark scanner |
| `OPERATOR_VULNERABILITY_SCANNER_ENABLED` | `true` | The flag to enable vulnerability scanner |
| `OPERATOR_CONFIG_AUDIT_SCANNER_ENABLED` | `true` | The flag to enable configuration audit scanner |
| `OPERATOR_VULNERABILITY_SCANNER_SCAN_ONLY_CURRENT_REVISIONS` | `false` | The flag to enable vulnerability scanner to only scan the current revision of a deployment |
| `OPERATOR_VULNERABILITY_SCANNER_REPORT_TTL` | `""` | The flag to set how long a vulnerability report should exist. When an old report is deleted, a new one will be created by the controller. It can be set to `""` to disable the TTL for the vulnerability scanner. |
| `OPERATOR_LEADER_ELECTION_ENABLED` | `false` | The flag to enable operator replica leader election |
| `OPERATOR_LEADER_ELECTION_ID` | `starboard-lock` | The name of the resource lock for leader election |

## Install Modes

7 changes: 7 additions & 0 deletions docs/operator/getting-started.md
@@ -145,6 +145,13 @@ No resources found in default namespace.
!!! Tip
Use `vuln` and `configaudit` as short names for `vulnerabilityreports` and `configauditreports` resources.

To make sure that your vulnerability reports are up to date with the latest CVEs, you can define
how long a report may exist in the cluster before it is automatically deleted.
For example, setting `OPERATOR_VULNERABILITY_SCANNER_REPORT_TTL=24h` deletes a report after 24 hours.
When a vulnerability report is deleted, Starboard automatically creates a new scan job and scans the images again.
Assuming that your image scanner has updated its vulnerability database, the newly created report will contain the latest CVEs.
This feature is disabled by default.

## Infrastructure Scanning

The operator discovers also Kubernetes nodes and runs CIS Kubernetes Benchmark checks on each of them. The results are
4 changes: 4 additions & 0 deletions pkg/apis/aquasecurity/v1alpha1/common_types.go
@@ -1,5 +1,9 @@
package v1alpha1

const (
TTLReportAnnotation = "starboard.aquasecurity.github.io/report-ttl"
)

// Scanner is the spec for a scanner generating a security assessment report.
type Scanner struct {
// Name the name of the scanner.
96 changes: 96 additions & 0 deletions pkg/operator/controller/ttl_report.go
@@ -0,0 +1,96 @@
package controller

import (
	"context"
	"fmt"
	"time"

	"github.com/aquasecurity/starboard/pkg/apis/aquasecurity/v1alpha1"
	"github.com/aquasecurity/starboard/pkg/operator/etc"
	"github.com/aquasecurity/starboard/pkg/operator/predicate"
	"github.com/go-logr/logr"
	"k8s.io/apimachinery/pkg/api/errors"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/builder"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/reconcile"
)

type TTLReportReconciler struct {
	logr.Logger
	etc.Config
	client.Client
}

func (r *TTLReportReconciler) SetupWithManager(mgr ctrl.Manager) error {
	installModePredicate, err := predicate.InstallModePredicate(r.Config)
	if err != nil {
		return err
	}

	err = ctrl.NewControllerManagedBy(mgr).
		For(&v1alpha1.VulnerabilityReport{}, builder.WithPredicates(
			predicate.Not(predicate.IsBeingTerminated),
			installModePredicate)).
		Complete(r.reconcileReport())
	if err != nil {
		return err
	}
	return nil
}

func (r *TTLReportReconciler) reconcileReport() reconcile.Func {
	return func(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
		log := r.Logger.WithValues("report", req.NamespacedName)

		report := &v1alpha1.VulnerabilityReport{}
		err := r.Client.Get(ctx, req.NamespacedName, report)
		if err != nil {
			if errors.IsNotFound(err) {
				log.V(1).Info("Ignoring cached report that must have been deleted")
				return ctrl.Result{}, nil
			}
			return ctrl.Result{}, fmt.Errorf("getting report from cache: %w", err)
		}

		ttlReportAnnotationStr, ok := report.Annotations[v1alpha1.TTLReportAnnotation]
		if !ok {
			log.V(1).Info("Ignoring report without TTL set")
			return ctrl.Result{}, nil
		}

		reportTTLTime, err := time.ParseDuration(ttlReportAnnotationStr)
		if err != nil {
			return ctrl.Result{}, fmt.Errorf("failed parsing %v with value %v %w", v1alpha1.TTLReportAnnotation, ttlReportAnnotationStr, err)
		}
		creationTime := report.Report.UpdateTimestamp
		ttlExpired, durationToTTLExpiration, err := ttlIsExpired(reportTTLTime, creationTime.Time)
		if err != nil {
			return ctrl.Result{}, err
		}
		if ttlExpired {
			log.V(1).Info("Removing vulnerabilityReport with expired TTL")
			err := r.Client.Delete(ctx, report, &client.DeleteOptions{})
			if err != nil && !errors.IsNotFound(err) {
				return ctrl.Result{}, err
			}
			// Since the report is deleted there is no reason to requeue
			return ctrl.Result{}, nil
		}
		log.V(1).Info("RequeueAfter", "durationToTTLExpiration", durationToTTLExpiration)
		return ctrl.Result{RequeueAfter: durationToTTLExpiration}, nil
	}
}

func ttlIsExpired(reportTTL time.Duration, creationTime time.Time) (bool, time.Duration, error) {
	expiresAt := creationTime.Add(reportTTL)
	currentTime := time.Now()
	isExpired := currentTime.After(expiresAt)

	if isExpired {
		return true, time.Duration(0), nil
	}

	expiresIn := expiresAt.Sub(currentTime)
	return false, expiresIn, nil
}
28 changes: 28 additions & 0 deletions pkg/operator/controller/ttl_report_test.go
@@ -0,0 +1,28 @@
package controller

import (
	"testing"
	"time"

	"github.com/stretchr/testify/assert"
)

// A report updated just now with a 10h TTL must not be expired.
func TestTTLIsNotExpired(t *testing.T) {
	ttlReportAnnotationStr := "10h"
	ttlReportTime, _ := time.ParseDuration(ttlReportAnnotationStr)
	creationTime := time.Now()
	ttlExpired, _, err := ttlIsExpired(ttlReportTime, creationTime)
	assert.NoError(t, err)
	assert.False(t, ttlExpired)
}

// A report updated ten minutes ago with a 10s TTL must be expired.
func TestTTLIsExpired(t *testing.T) {
	ttlReportAnnotationStr := "10s"
	ttlReportTime, _ := time.ParseDuration(ttlReportAnnotationStr)
	creationTime := time.Now()
	then := creationTime.Add(time.Duration(-10) * time.Minute)
	ttlExpired, durationToTTLExp, err := ttlIsExpired(ttlReportTime, then)
	t.Logf("Duration to ttl expiration %s, we should reschedule check", durationToTTLExp)
	assert.NoError(t, err)
	assert.True(t, ttlExpired)
}
12 changes: 9 additions & 3 deletions pkg/operator/controller/vulnerabilityreport.go
@@ -360,12 +360,18 @@ func (r *VulnerabilityReportReconciler) processCompleteScanJob(ctx context.Conte
}
_ = logsStream.Close()

report, err := vulnerabilityreport.NewReportBuilder(r.Client.Scheme()).
reportBuilder := vulnerabilityreport.NewReportBuilder(r.Client.Scheme()).
Controller(owner).
Container(containerName).
Data(reportData).
PodSpecHash(podSpecHash).
Get()
PodSpecHash(podSpecHash)

if r.Config.VulnerabilityScannerReportTTL != nil {
reportBuilder.ReportTTL(r.Config.VulnerabilityScannerReportTTL)
}

report, err := reportBuilder.Get()

if err != nil {
return err
}
1 change: 1 addition & 0 deletions pkg/operator/etc/config.go
@@ -27,6 +27,7 @@ type Config struct {
LeaderElectionEnabled bool `env:"OPERATOR_LEADER_ELECTION_ENABLED" envDefault:"false"`
LeaderElectionID string `env:"OPERATOR_LEADER_ELECTION_ID" envDefault:"starboard-lock"`
VulnerabilityScannerScanOnlyCurrentRevisions bool `env:"OPERATOR_VULNERABILITY_SCANNER_SCAN_ONLY_CURRENT_REVISIONS" envDefault:"false"`
VulnerabilityScannerReportTTL *time.Duration `env:"OPERATOR_VULNERABILITY_SCANNER_REPORT_TTL"`
}

// GetOperatorConfig loads Config from environment variables.
10 changes: 10 additions & 0 deletions pkg/operator/operator.go
@@ -160,6 +160,16 @@ func Start(ctx context.Context, buildInfo starboard.BuildInfo, operatorConfig et
}).SetupWithManager(mgr); err != nil {
return fmt.Errorf("unable to setup vulnerabilityreport reconciler: %w", err)
}

if operatorConfig.VulnerabilityScannerReportTTL != nil {
if err = (&controller.TTLReportReconciler{
Logger: ctrl.Log.WithName("reconciler").WithName("ttlreport"),
Config: operatorConfig,
Client: mgr.GetClient(),
}).SetupWithManager(mgr); err != nil {
return fmt.Errorf("unable to setup TTLreport reconciler: %w", err)
}
}
}

if operatorConfig.ConfigAuditScannerEnabled {
12 changes: 12 additions & 0 deletions pkg/vulnerabilityreport/builder.go
@@ -143,6 +143,7 @@ type ReportBuilder struct {
container string
hash string
data v1alpha1.VulnerabilityReportData
reportTTL *time.Duration
}

func NewReportBuilder(scheme *runtime.Scheme) *ReportBuilder {
@@ -171,6 +172,11 @@ func (b *ReportBuilder) Data(data v1alpha1.VulnerabilityReportData) *ReportBuild
return b
}

func (b *ReportBuilder) ReportTTL(ttl *time.Duration) *ReportBuilder {
b.reportTTL = ttl
return b
}

func (b *ReportBuilder) reportName() string {
kind := b.controller.GetObjectKind().GroupVersionKind().Kind
name := b.controller.GetName()
@@ -199,6 +205,12 @@ func (b *ReportBuilder) Get() (v1alpha1.VulnerabilityReport, error) {
},
Report: b.data,
}

if b.reportTTL != nil {
report.Annotations = map[string]string{
v1alpha1.TTLReportAnnotation: b.reportTTL.String(),
}
}
err := kube.ObjectToObjectMetadata(b.controller, &report.ObjectMeta)
if err != nil {
return v1alpha1.VulnerabilityReport{}, err