Deduplication does not work for Whitesource Scan #1654

Closed
1 of 3 tasks
bilalk88 opened this issue Nov 15, 2019 · 8 comments

Comments

@bilalk88

bilalk88 commented Nov 15, 2019

Bug description
Deduplication does not work for Whitesource Scan

Steps to reproduce
Steps to reproduce the behavior:

  1. Run DefectDojo with Docker Compose
  2. Switch on deduplication:
    Go to System Settings
    Switch on Deduplicate findings
    Click Submit
  3. Add a product:
    Go to Add Product
    Fill the required fields.
    Click Submit
  4. Add a new engagement:
    Import scan results for the Whitesource tool multiple times

Actual behavior
All findings just have the status "Active"; none are marked as duplicates.

Expected behavior
Findings imported more than once have the status "Duplicate".

Deployment method (select with an X)

  • [ ] Kubernetes
  • [x] Docker
  • [ ] setup.bash / legacy-setup.bash

Environment information

  • Operating System: CentOS7
  • DefectDojo Commit Message: 42f1d4e: Update issue templates [2019-10-02 16:01:25 -0500]

Sample scan files (optional)

production-vulnerability-report.json.txt

bilalk88 added the bug label Nov 15, 2019
@ptrovatelli
Collaborator

Hello @bilalk88 ,
I'm glad to see that people take an interest in the deduplication system :)
I've been working on a system to make it configurable for each parser, because the current system that tries to work for all the parsers has flaws.
Your problem is similar to #1640
I have just pushed the working configuration for Whitesource in #1540: deduplicate on title + severity + description.
Severity is just added for safety, in case we have two identical titles with different severities, although that really doesn't seem probable.
We do need the description, because without it the same title sometimes appears for different libraries.
Example in your file:

  • CVE-2017-0249 | Edgeimport-Insights
    on both Microsoft.CodeDom.Providers.DotNetCompilerPlatform and Microsoft.Net.Compilers

Again, we could do a little better if the library name was put into "file_path" by the parser: we could deduplicate on cve + file_path + severity, which would allow cross-parser deduplication and would be more resilient to title and description changes due to Whitesource evolutions.
Still, it'll work in simple cases when uploading reports generated by the same version of Whitesource.
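
For reference, a minimal sketch of what this per-parser configuration looks like in the Django settings. The setting names follow DefectDojo's HASHCODE_FIELDS_PER_SCANNER mechanism discussed later in this thread; treat the exact keys and values as illustrative rather than the merged code itself:

    # Sketch of per-scanner dedup configuration (Django settings style).
    # Names follow DefectDojo's mechanism; values are illustrative.
    HASHCODE_FIELDS_PER_SCANNER = {
        # Dedupe Whitesource findings on title + severity + description.
        'Whitesource Scan': ['title', 'severity', 'description'],
    }
    DEDUPLICATION_ALGORITHM_PER_PARSER = {
        # Compare findings via the hash_code computed over the fields above.
        'Whitesource Scan': 'hash_code',
    }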

ptrovatelli pushed a commit to ptrovatelli/django-DefectDojo that referenced this issue Nov 25, 2019
…ow the default one)

  Tune the configuration to allow cross-parser dedupe sonar vs checkmarx
  Configure dedupe for: dependency check, npm audit scan (DefectDojo#1640), whitesource scan(DefectDojo#1654)
  fix full database hash_code recompute and deduplication
@ptrovatelli
Collaborator

Hello @bilalk88 ,
Can you check out the latest dev and see if that fixes your issue? #1665 was merged.

Thanks!

madchap pushed a commit to madchap/django-DefectDojo that referenced this issue Dec 23, 2019
…ow the default one)

  Tune the configuration to allow cross-parser dedupe sonar vs checkmarx
  Configure dedupe for: dependency check, npm audit scan (DefectDojo#1640), whitesource scan(DefectDojo#1654)
  fix full database hash_code recompute and deduplication
@dmeeetreee

dmeeetreee commented Mar 4, 2020

@ptrovatelli just adding in here, as we are in the process of uploading multiple scan results (Fortify, Burp, Sonar, ...) into Dojo, which we'll be integrating with JIRA as part of a full incident management pipeline. Deduplicating issues across multiple tools is a bit of a challenge.

A question I have regarding deduplication: with the new mechanism that allows specifying the fields used in the hash calculation, why do parsers (e.g. Burp) still run some deduplication (dupe_key = str(item.url) + item.severity + item.title) inside their own logic, which will conflict with different fields specified in HASHCODE_FIELDS_PER_SCANNER?

Or do I have that wrong?

@madchap
Collaborator

madchap commented Mar 4, 2020

> Again, we could do a little better if the library name was put into "file_path" by the parser: we could deduplicate on cve + file_path + severity, which would allow cross-parser deduplication and would be more resilient to title and description changes due to Whitesource evolutions.

#1854 could now help. The whitesource parser just needs to take advantage of that change.

Fyi @alles-klar
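
A minimal sketch of what taking advantage of that could look like: pull the library name out of each vulnerability entry and store it in file_path, so dedup can key on cve + file_path + severity. The JSON shape below is assumed for illustration, not the exact Whitesource report schema:

    import json

    # Assumed report shape -- the real Whitesource JSON may differ.
    report = json.loads("""
    {"vulnerabilities": [
        {"name": "CVE-2017-0249", "severity": "high",
         "library": {"filename": "Microsoft.Net.Compilers"}}
    ]}
    """)

    for vuln in report["vulnerabilities"]:
        # Store the library name as the finding's file_path so that
        # cve + file_path + severity can serve as the dedup key.
        file_path = vuln.get("library", {}).get("filename", "")
        print(vuln["name"], file_path, vuln["severity"])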

@ptrovatelli
Collaborator

@dmeeetreee that "dupe_key" logic in parsers should be understood as an aggregation of results. It is done by the parser before the deduplication is called, and the aggregation is not configurable.

The deduplication configuration should indeed be done with the aggregation in mind: if you aggregate on url + severity + title, you cannot deduplicate on fields other than those, because all the other fields will be either missing or incorrect (some parsers just pick one arbitrary value amongst the aggregated records).

I've fixed this behavior in the Checkmarx parser by building a concatenation of all the found values rather than picking an arbitrary one (if I recall, it was the last value amongst the aggregated records, which was quite misleading; that same logic is probably still present in some parsers).

I think we need to add this to the docs, as it's really not obvious.
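
To make the two stages concrete, here is a minimal, self-contained sketch of the in-parser aggregation pattern; the Item class and field names are illustrative stand-ins for a parsed Burp result, not the actual parser code:

    from dataclasses import dataclass

    @dataclass
    class Item:  # hypothetical stand-in for one parsed scan result
        url: str
        severity: str
        title: str
        description: str

    def aggregate(items):
        # Stage 1: in-parser aggregation, run before DefectDojo's
        # configurable deduplication ever sees the findings.
        dupes = {}
        for item in items:
            dupe_key = str(item.url) + item.severity + item.title
            if dupe_key not in dupes:
                # Fields outside the key keep only this record's values;
                # later records with the same key are silently collapsed.
                dupes[dupe_key] = item
        return list(dupes.values())

    items = [
        Item("https://example.com/a", "High", "XSS", "reflected"),
        Item("https://example.com/a", "High", "XSS", "stored"),
    ]
    # Prints 1: the second description is lost, which is why deduplicating
    # later on fields outside the aggregate key is unsafe.
    print(len(aggregate(items)))

The Checkmarx fix mentioned above amounts to concatenating the aggregated values (e.g. the descriptions) instead of keeping only one of them.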

@ptrovatelli
Collaborator

ptrovatelli commented Mar 4, 2020

Although, in some cases we could still use other fields in dedupe if they are present in one of the aggregate keys.
For example, if the title is built from the cve plus other data, it's valid to dedupe on cve.

Still, it would seem safer to add cve to the aggregate keys to make sure we don't lose data based on incorrect assumptions.
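
A small sketch of why that works, using an illustrative hash over the configured fields (the real computation lives in DefectDojo's Finding model; this is not its actual code):

    import hashlib

    def hash_code(finding, fields):
        # Illustrative: concatenate the configured fields and hash them.
        combined = "".join(str(finding.get(f) or "") for f in fields)
        return hashlib.sha256(combined.encode("utf-8")).hexdigest()

    f1 = {"cve": "CVE-2017-0249", "title": "CVE-2017-0249 | Edgeimport-Insights"}
    f2 = {"cve": "CVE-2017-0249", "title": "CVE-2017-0249 | Edge-Insights"}  # title drifted

    print(hash_code(f1, ["cve"]) == hash_code(f2, ["cve"]))      # True: cve still matches
    print(hash_code(f1, ["title"]) == hash_code(f2, ["title"]))  # False: a title change breaks dedupe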

@dmeeetreee

dmeeetreee commented Mar 5, 2020

Thanks @ptrovatelli for explaining. As I understand it, it's a two-stage process: aggregation in the parser, followed by deduplication.

Will have to talk to our business about integrating with JIRA as part of a full incident management pipeline: the idea was to have one JIRA ticket / Dojo Finding for each issue/location/line, so if a file had two occurrences of the same issue (e.g. CWE-798) we'd have separate incidents. (The Finding object/class inside the Dojo model has a line-number field, so this could be supported in theory.)

In practice, that would require making the line number part of the deduplication key and keeping it outside the aggregator. (Flawed in itself, as active development on a file would shift the line number of an incident, and we'd create duplicates ourselves.)

Are there any best practices that can be suggested regarding aggregation/deduplication for our requirement? That is, we are not only interested in reporting that a specific security risk was found; we also want to build a developer workflow/process around it.

@stale

stale bot commented Jun 3, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the stale label Jun 3, 2020
stale bot closed this as completed Jun 10, 2020