docs(design): Caching Scan Results by Image Reference #740
Conversation
If the operator has permissions to the cluster, and the user has access to the operator (since it's namespaced), then effectively a user could get access to the cluster reports, which might contain reports for other namespaces (tenants)?
Not sure which user we're talking about here. In the proposed solution the Starboard operator pod runs with its own service account. A user who creates a workload is a different K8s RBAC subject; hence an explicit role binding has to be created to grant them view access to namespaced or cluster-scoped vulnerability reports.
> As you can see, Starboard eventually created two VulnerabilityReports by spinning up only one K8s Job.
Which problem are we trying to solve: 1. spinning up fewer K8s Jobs, or 2. creating fewer VulnerabilityReports?
I ask this because https://github.com/aquasecurity/starboard/pull/740/files#diff-fa88466a8291355bf7dddafd314bcf4277f669d956110f8fdfd319dd48f8147eR32-R35 mentions 2 Jobs and 2 VulnerabilityReports. With this solution all we've done is go from 2 Jobs to 1 Job, still with 2 VulnerabilityReports.
If long-running Jobs are the problem we're trying to solve, wouldn't using the Trivy Client/Server model be a better idea? (Reasons: less code, fewer bugs, a single cache, and ultimately faster-finishing Jobs.)
In this proposal we are just trying to create fewer K8s Jobs.
For VulnerabilityReports we'll have a separate design if needed. Changing the way we create VulnerabilityReports won't be backward compatible, so we have to plan for that carefully.
Starboard is already integrated with Trivy in Standalone and ClientServer mode.
By default we use Trivy in Standalone mode because it does not require deploying a Trivy server. It's simpler and good for demos and trying things out, but it is also the least efficient.
Even if you switch from Standalone to ClientServer mode, Starboard would still create K8s Jobs; the only difference is that the Jobs would complete faster. Leveraging a cache, on the other hand, allows us to skip K8s Job creation entirely if an image with the same digest has already been scanned.
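To make the cache-lookup idea concrete, here is a minimal sketch of how an operator could derive a deterministic ClusterVulnerabilityReport name from an image digest, so every workload referencing the same digest resolves to the same cached report. The `clusterReportName` function and the FNV hashing scheme are illustrative assumptions, not Starboard's actual naming code:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// clusterReportName derives a deterministic name for a cached
// ClusterVulnerabilityReport from an image digest. Because the name
// depends only on the digest, the operator can check whether a report
// with this name already exists and, if so, skip creating a scan Job.
// NOTE: this hashing scheme is a sketch, not Starboard's real one.
func clusterReportName(imageDigest string) string {
	h := fnv.New32a()
	h.Write([]byte(imageDigest))
	return fmt.Sprintf("image-%x", h.Sum32())
}

func main() {
	digest := "sha256:4a1c4b21597c1b4415bdbecb28a3296c6b5e23ca4f9feeb599860a1dac6a0108"
	// Two workloads referencing the same digest map to the same report
	// name, which is what lets the operator reuse the cached result.
	fmt.Println(clusterReportName(digest) == clusterReportName(digest)) // true
}
```

In a real controller the existence check would be a `Get` against the cluster-scoped CRD before enqueueing a scan Job; the sketch only shows the deterministic key derivation that makes that lookup possible.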
I like the idea. The issue I see is that today a developer who only has namespace access can delete a VulnerabilityReport, and Starboard will retrigger a new scan Job. Assuming Starboard is using an external Trivy server, Trivy could have received a new CVE database update in the meantime, so rescanning would actually be valuable.
This could probably be solved with a config value. If we add a TTL on the ClusterVulnerabilityReport that is, for example, 24 hours by default, I think we could skip the majority of Jobs, especially when creating a new ReplicaSet in a Deployment each day, which I think would be really useful.
One thing we would have to keep in mind is what should happen if we start the operator, or if two ReplicaSets are created very close together. Should we have some logic in the vulnerabilityreport controller that makes sure only one of the image scan Jobs gets started?
Gave this some more thought, read through the document again, and added a small comment that hopefully will help with the TODO mentioned.
In the design document I think it would be good to add a ClusterVulnerabilityReport example.
I assume it will be very similar to what a VulnerabilityReport looks like.
I would also suggest creating a new annotation for VulnerabilityReports that get generated from a ClusterVulnerabilityReport without running a separate Job.
For example: starboard.aquasecurity.github.io/ClusterVulnerabilityReportName: 84bcb5cd46
This way it would be very clear that the VulnerabilityReport was created from a ClusterVulnerabilityReport, which I think could help with debugging.
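A hypothetical sketch of what such a pair could look like. The schema is an assumption extrapolated from the existing VulnerabilityReport CRD, and the report name `84bcb5cd46` is reused from the annotation example above; none of this is a confirmed design:

```yaml
# Hypothetical cluster-scoped report keyed by image digest
# (schema assumed from the existing VulnerabilityReport CRD).
apiVersion: aquasecurity.github.io/v1alpha1
kind: ClusterVulnerabilityReport
metadata:
  name: 84bcb5cd46
report:
  artifact:
    repository: library/nginx
    digest: sha256:4a1c4b21597c1b4415bdbecb28a3296c6b5e23ca4f9feeb599860a1dac6a0108
  scanner:
    name: Trivy
---
# Namespaced report copied from the cached cluster-scoped report,
# carrying the suggested annotation for traceability.
apiVersion: aquasecurity.github.io/v1alpha1
kind: VulnerabilityReport
metadata:
  name: replicaset-nginx-6d4cf56db6-nginx
  namespace: default
  annotations:
    starboard.aquasecurity.github.io/ClusterVulnerabilityReportName: 84bcb5cd46
```

With the annotation in place, `kubectl get vulnerabilityreports -o jsonpath` could be used to trace any namespaced report back to the cluster-scoped report it was copied from.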
@danielpacak do you already have some code ready for this?
> ## TODO
>
> * Specify the life-cycle of ClusterVulnerabilityReport objects, e.g. how they are invalidated.
I assume the main goal is to delete the ClusterVulnerabilityReport after a specific time.
One way could be to follow the steps of #879, which is to have a separate controller sitting and listening for ClusterVulnerabilityReports. In this case we wouldn't do any requeue; instead we would just delete the existing ClusterVulnerabilityReport after a specific amount of time.
The good thing about this solution is that it would be very similar to what we do in #863.
The second way could be, when checking whether a Job should be started versus copying an existing ClusterVulnerabilityReport into a VulnerabilityReport, to simply check the age of the ClusterVulnerabilityReport. If it's older than x, we delete the existing ClusterVulnerabilityReport and run the Job as we normally would, thus generating a new VulnerabilityReport and ClusterVulnerabilityReport.
The good thing here is that we don't rely on another controller.
@NissesSenap I wrote some code initially to POC this idea, but then I dropped it and started with this design doc.
@danielpacak how do you want to proceed with this design document? If you want, I can start working on it. Or some other solution?
What if we merge this PR and then you can refine it? This way we'll have your PR to collect feedback and make progress. WDYT?
I think that is a great idea.
Signed-off-by: Daniel Pacak <pacak.daniel@gmail.com>
I marked it as draft and merged. We can refine it by opening follow-up PR(s).